Technical Papers Parallel Session-V: Urdu optical character recognition technique using point feature matching; a generic approach

Abstract/Description

The complexity that Urdu fonts pose for OCR of newspapers is the subject of active research. An Urdu OCR system is typically tied to a single font size: to work with a font size of 12, for example, one has to create a database covering all characters/words at font size 12. To work with another size of the same Urdu font, the database has to cover all the characters/words at that size as well. An OCR technique should instead be generic, so that the font size does not matter. The objective was to create a technique that can be applied to Urdu script at any font size, without concern for the variation in characters/words caused by the dispersal of ink in Urdu newspaper clippings. In this paper the authors have developed a technique that uses point feature matching on cropped Urdu newspaper clippings set in the Jameel Noori Nastaleeq font and converts them into editable Unicode text.

Location

C-10, AMAN CED

Session Theme

Technical Papers Parallel Session-V (Information Retrieval)

Session Type

Parallel Technical Session

Session Chair

Dr. Imran Hayee

Start Date

13-12-2015 2:50 PM

End Date

13-12-2015 3:10 PM
