New Arabic OCR Technology with advanced learning of new fonts and characters
By
Eng. Ahmed Hossam El Din
Technology Overview
• The technology gives users a systematic way to recognize text. It can learn a new font through a factor-tuning process, and it can learn a new character within a given font.
• It relies mainly on geometric features: each segment is described by a vector of about 60 values, supplemented by a vector of flags.
• Overlapped characters, especially in traditional Arabic, are processed as a group of up to 3 connected characters.
• The technology is implemented in Visual Basic and is ready to be ported to any suitable language or technology.
Technology Details
1. Image processing
2. Zone startup & learning new font
3. Line startup
4. Word startup
5. Segmentation
6. Classification
7. Connection
8. Recognition
9. Correction and learning new character
1- Image processing
• Scan in black and white (1-bit color)
• Crop to the desired area
• Correct the tilt so that the text lines are horizontal
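The slides do not say how cropping and tilt correction are done. The following is a minimal Python sketch of one common approach, not the author's Visual Basic implementation: the function names, the threshold of 128, and the projection-profile search used to estimate the tilt are all assumptions made for illustration.

```python
import math

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1-bit: 1 = ink, 0 = paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def crop_to_ink(img):
    """Crop to the smallest rectangle that contains every ink pixel."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return []
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def shear(img, angle_deg):
    """Approximate a small rotation by shifting each column up or down."""
    h, w = len(img), len(img[0])
    t = math.tan(math.radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                ny = y + round((x - w / 2) * t)
                if 0 <= ny < h:
                    out[ny][x] = 1
    return out

def sharpness(img):
    """Horizontal text lines give a peaky row profile, hence a large value."""
    return sum(sum(row) ** 2 for row in img)

def estimate_tilt(img, max_deg=5.0, step=0.5):
    """Try candidate angles and keep the one with the sharpest row profile."""
    n = int(round(max_deg / step))
    angles = [i * step for i in range(-n, n + 1)]
    return max(angles, key=lambda a: sharpness(shear(img, a)))

# Hypothetical test page: a single stroke tilted by 3 degrees.
W, H = 40, 21
slope = math.tan(math.radians(3))
tilted = [[0] * W for _ in range(H)]
for x in range(W):
    tilted[10 + round((x - W / 2) * slope)][x] = 1

correction = estimate_tilt(tilted)
straight = crop_to_ink(shear(tilted, correction))
```

A real page would need a finer angle step and a proper rotation rather than a shear, but the search principle is the same.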
2- Zone startup & learning new font
• Assign predefined font factors
• Define the factors of a new font by following the hints shown beside the factors window (this is how a new font is learned).
3- Line startup
• Detect the start and end of the line
• Detect the upper and lower limits of the line
• Detect the middle line
• Detect the font thickness
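As an illustration of the four measurements above, here is a minimal Python sketch based on projection profiles. It is an assumption about how such values could be computed, not the method used in the original program: the middle line is taken as the row with the most ink, the font thickness as the most frequent vertical run of ink, and the line start as the rightmost ink column because Arabic is written right to left.

```python
from collections import Counter

def parse(rows):
    """Build a 1-bit image from strings: '#' = ink, anything else = paper."""
    return [[1 if c == "#" else 0 for c in r] for r in rows]

def line_metrics(img):
    """Return start/end columns, top/bottom rows, middle line and thickness."""
    h, w = len(img), len(img[0])
    row_ink = [sum(row) for row in img]
    rows = [y for y, n in enumerate(row_ink) if n]
    cols = [x for x in range(w) if any(img[y][x] for y in range(h))]
    if not rows:
        raise ValueError("line image contains no ink")
    top, bottom = rows[0], rows[-1]
    # Assumed definition: the middle line is the row carrying the most ink.
    middle = max(range(top, bottom + 1), key=lambda y: row_ink[y])
    # Assumed definition: thickness is the most common vertical ink run.
    runs = Counter()
    for x in range(w):
        run = 0
        for y in range(h):
            if img[y][x]:
                run += 1
            elif run:
                runs[run] += 1
                run = 0
        if run:
            runs[run] += 1
    thickness = runs.most_common(1)[0][0]
    return {"start": cols[-1], "end": cols[0], "top": top,
            "bottom": bottom, "middle": middle, "thickness": thickness}

# Hypothetical line image: a 2-pixel-thick stroke with bumps and a tail.
line = parse([
    "..........",
    "...#......",
    "...#...#..",
    ".########.",
    ".########.",
    ".....#....",
    "..........",
])
metrics = line_metrics(line)
```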
5- Segmentation
• Detect start of the segment
• Detect end of the segment
• (A segment is a bounded piece of the line image: any bump or dot above or below the middle line.)
• Record the boundary of the segment image.
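The segment definition above can be sketched in Python as follows. This is a hedged illustration, not the original algorithm: it assumes the middle line occupies `thickness` rows starting at row `middle`, treats every run of consecutive ink columns above or below that band as one segment, and returns segments right to left. The names `find_segments`, `middle`, and `thickness` are hypothetical.

```python
def parse(rows):
    """Build a 1-bit image from strings: '#' = ink, anything else = paper."""
    return [[1 if c == "#" else 0 for c in r] for r in rows]

def find_segments(img, middle, thickness):
    """Find bumps and dots above or below the middle-line band.

    Each segment records its start column, end column and row limits,
    i.e. the boundary of the segment image.
    """
    h, w = len(img), len(img[0])
    bands = {"above": range(0, middle),
             "below": range(middle + thickness, h)}
    segments = []
    for side, ys in bands.items():
        x = 0
        while x < w:
            if any(img[y][x] for y in ys):
                x0 = x                      # start of the segment
                while x < w and any(img[y][x] for y in ys):
                    x += 1
                x1 = x - 1                  # end of the segment
                inked = [y for y in ys if any(img[y][x0:x1 + 1])]
                segments.append({"side": side, "x0": x0, "x1": x1,
                                 "y0": inked[0], "y1": inked[-1]})
            else:
                x += 1
    # Arabic reads right to left, so the rightmost segment comes first.
    segments.sort(key=lambda s: -s["x1"])
    return segments

# Hypothetical line image with its middle line on rows 3-4.
line = parse([
    "..........",
    "...#......",
    "...#...#..",
    ".########.",
    ".########.",
    ".....#....",
    "..........",
])
segments = find_segments(line, middle=3, thickness=2)
boxes = [(s["side"], s["x0"], s["x1"], s["y0"], s["y1"]) for s in segments]
```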
6- Classification
• Extract the geometric features of a single segment as a vector, after thresholding and normalization.
• Extract the segment's vector of deduced flags.
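The slides do not list the roughly 60 geometric values or the flags. Purely as an illustration, the Python sketch below uses zone ink densities on an 8 x 8 grid (64 values) as stand-in features, applies a threshold and a normalization step, and derives three hypothetical flags. None of these choices comes from the source.

```python
def zone_features(seg, grid=8, threshold=0.1):
    """Describe a segment image by the ink density of grid x grid zones.

    Densities below `threshold` are zeroed (thresholding), then the vector
    is scaled to sum to 1 (normalization) so that size does not matter.
    """
    h, w = len(seg), len(seg[0])
    feats = []
    for gy in range(grid):
        y0 = gy * h // grid
        y1 = max((gy + 1) * h // grid, y0 + 1)   # at least one row per zone
        for gx in range(grid):
            x0 = gx * w // grid
            x1 = max((gx + 1) * w // grid, x0 + 1)
            ink = sum(seg[y][x] for y in range(y0, y1) for x in range(x0, x1))
            density = ink / ((y1 - y0) * (x1 - x0))
            feats.append(density if density >= threshold else 0.0)
    total = sum(feats)
    return [f / total for f in feats] if total else feats

def segment_flags(seg, side, thickness):
    """Hypothetical flags: above the line, below the line, dot-sized."""
    h, w = len(seg), len(seg[0])
    is_dot = h <= 2 * thickness and w <= 2 * thickness
    return [int(side == "above"), int(side == "below"), int(is_dot)]

# Usage on a hypothetical 8 x 8 solid blob and a 2 x 2 dot.
blob = [[1] * 8 for _ in range(8)]
blob_vector = zone_features(blob)
dot_flags = segment_flags([[1, 1], [1, 1]], side="above", thickness=2)
```

Keeping the flags in a separate vector mirrors the slides, which describe the flags as assisting the geometric vector rather than being part of it.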
7- Connection
• Connect successive segments into one character.
• Integrate their features into a single feature vector (the classifier).
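One way to picture the connection step is sketched below in Python. The grouping rule (segments whose column ranges overlap or touch belong to one character) and the fixed number of slots padded with zeros are assumptions for illustration; the slides do not describe the actual rule.

```python
def connect(segments, max_gap=0):
    """Group successive segments into characters, scanning right to left.

    A segment joins the current group when at most `max_gap` empty columns
    separate it from the group's left edge (overlap counts as joined).
    """
    groups = []
    left = None
    for seg in sorted(segments, key=lambda s: -s["x1"]):
        if groups and left - seg["x1"] - 1 <= max_gap:
            groups[-1].append(seg)
            left = min(left, seg["x0"])
        else:
            groups.append([seg])
            left = seg["x0"]
    return groups

def combine_features(group, slots=3, size=2):
    """Concatenate the segment vectors into one fixed-length classifier."""
    if len(group) > slots:
        raise ValueError("too many segments for one character")
    vector = []
    for seg in group:
        vector.extend(seg["features"])
    vector.extend([0.0] * (size * (slots - len(group))))  # pad unused slots
    return vector

# Hypothetical segments: a body, a dot above it, and a separate body.
segments = [
    {"x0": 10, "x1": 12, "features": [1.0, 0.0]},
    {"x0": 11, "x1": 11, "features": [0.0, 1.0]},
    {"x0": 3, "x1": 5, "features": [0.5, 0.5]},
]
groups = connect(segments)
classifier = combine_features(groups[0])
```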
8- Recognition
• Get the character classes that match the classifier.
• Sort the candidate characters by weight.
• Choose the most probable candidate.
• Link the character's position in the text to the corresponding character boundary in the image.
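The matching and ranking steps above resemble a nearest-template search. The Python sketch below shows that idea under stated assumptions: a weighted squared distance as the score, a `templates` dictionary mapping each character to its learned vectors, and a result record that keeps the image boundary next to the chosen character. The scoring function actually used by the technology is not given in the slides.

```python
def rank_candidates(vector, templates, weights=None):
    """Return (character, distance) pairs, most probable first."""
    weights = weights or [1.0] * len(vector)
    scored = []
    for char, examples in templates.items():
        best = min(
            sum(w * (a - b) ** 2 for w, a, b in zip(weights, vector, ex))
            for ex in examples
        )
        scored.append((best, char))
    scored.sort()
    return [(char, dist) for dist, char in scored]

def recognize(vector, box, templates, weights=None):
    """Pick the top candidate and tie it to its boundary in the image."""
    ranked = rank_candidates(vector, templates, weights)
    return {"char": ranked[0][0], "box": box, "candidates": ranked}

# Hypothetical three-value classifiers for three Arabic letters.
templates = {
    "ا": [[1.0, 0.0, 0.0]],
    "ب": [[0.0, 1.0, 0.0]],
    "ت": [[0.0, 1.0, 1.0]],
}
result = recognize([0.0, 0.9, 0.2], box=(7, 7, 2, 2), templates=templates)
order = [char for char, _ in result["candidates"]]
```

Storing the boundary with each recognized character is what later allows a position in the text to be mapped back to the image for correction.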
عبير الرسى وهانى عزت وحازم 0القاهرة
أبودومة تعقد األحزاب والقوى السياسية اجتماعا غدا ،
لمناقشة االقترا ح الذى تقدم به النائب وحيد عبدالمجيد،
بش وضع معايير جديدة لتشكيل الجمعية التأسيسية
يمثلون مختلف 23للدستور، وتشمل مائة عضو، من بينهم
من النقابات المهنية ، ومثلهم من 8ا حزاب السياسية ، و
شباب الثورة د ومثلى الطالب د واتحاد ات العمال
من خبراء القان 1تحادات النوعية ، و 9والفالحين، وا
من 3من المؤسسات الدينية ، و 9والهيئات القضائية ، و
من 71السلطة التنفيذية الجي ، والشرطة ، والحكومة ع ،
الشخصيات العامة واقترح سامح عاشورنقيب المحامين
رئيس الجلس االستشارى ورئيس الجبهة الوطنية
بما يجعل الجمعية التأسيسية 6الديمقراطية ،تعديل المادة د
تستمد سلطاتها من الدستور فقط ، وال تخضع لسلطات
Text after cleaning
Automatic recognition and cleaning produces better results
9- Correction and learning new character
• At the mouse position, the character is displayed both from the text and from the image. The user can correct it and thereby teach the character set of the chosen font. The learning process includes checks that prevent illogical learning.
(Screenshot labels: Character from text; Character from image; Input of new character; Learning of new character)
On a mouse click, the program shows the character from both the text and the image. The user can enter the correct character, and the program learns it.
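The slides do not explain which checks prevent illogical learning. The Python sketch below is only a guess at what such a guard might look like: it rejects empty labels, labels longer than 3 characters (the maximum size of a connected group mentioned in the overview), blank or wrongly sized vectors, and a shape already taught as a different character. The function `learn_character` and its parameters are hypothetical.

```python
def learn_character(templates, char, vector, vector_len, max_chars=3):
    """Add a corrected character to the font's templates if it looks sane.

    Returns True when the example was learned, False when it was rejected.
    """
    vec = list(vector)
    if not char or len(char) > max_chars:
        return False                      # empty or overlong label
    if len(vec) != vector_len or not any(vec):
        return False                      # wrong size or blank image
    for other, examples in templates.items():
        if other != char and vec in examples:
            return False                  # same shape, conflicting label
    templates.setdefault(char, []).append(vec)
    return True

# Hypothetical correction session for one font.
templates = {"ب": [[0.0, 1.0, 0.0]]}
learned = learn_character(templates, "ت", [0.0, 1.0, 1.0], vector_len=3)
conflict = learn_character(templates, "ن", [0.0, 1.0, 0.0], vector_len=3)
blank = learn_character(templates, "ث", [0.0, 0.0, 0.0], vector_len=3)
```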