A Proposed Arabic Handwritten Text Normalization Method

Tarik Abu-Ain, Siti Norul Huda Sheikh Abdullah, Khairuddin Omar, Ashraf Abu-Ein, Bilal Bataineh, Waleed Abu-Ain

Abstract


Text normalization is an important technique in document image analysis and recognition. It comprises several preprocessing stages, including slope correction, text padding, skew correction, and straightening of the writing line. Text normalization therefore plays an important role in subsequent procedures such as text segmentation, feature extraction, and character recognition. This article proposes a new method for baseline detection, straightening, and slant correction of Arabic handwritten text. The method comprises a sequence of steps: first, the text is segmented into components and each component is thinned; then the direction features of the skeletons are extracted and the candidate baseline regions are determined; next, the correct baseline region is selected; and finally, the baselines of all components are aligned with the writing line. The experiments are conducted on the IFN/ENIT benchmark Arabic dataset. The results show that the proposed method achieves promising performance.
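The general idea of aligning per-component baselines with a common writing line can be illustrated with a minimal sketch. This is not the authors' algorithm: it skips thinning and direction features entirely and substitutes a simple horizontal-projection peak as the baseline estimate for each connected component. The function names (`label_components`, `baseline_row`, `straighten`) and the input format (a binary image as a list of lists, 1 for ink) are assumptions made for illustration only.

```python
from collections import deque


def label_components(img):
    """Return connected components as lists of (row, col) ink pixels (8-connectivity)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def baseline_row(pixels):
    """Projection-peak heuristic (an assumption, not the paper's method):
    the row holding the most ink pixels; the lower row wins ties."""
    counts = {}
    for y, _ in pixels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts, key=lambda y: (counts[y], y))


def straighten(img, target_row):
    """Shift each component vertically so its estimated baseline sits on target_row."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for pixels in label_components(img):
        shift = target_row - baseline_row(pixels)
        for y, x in pixels:
            if 0 <= y + shift < h:  # pixels shifted outside the image are dropped
                out[y + shift][x] = 1
    return out
```

Calling `straighten(img, target_row)` on a word image whose components sit at different heights returns an image in which every component's densest row lies on `target_row`. A projection peak is a crude estimate for cursive handwriting; the method described in the abstract instead selects the baseline region from skeleton direction features.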





DOI: http://dx.doi.org/10.5614%2Fitbj.ict.res.appl.2013.7.2.5



Contact Information:

ITB Journal Publisher, LPPM – ITB, 

Center for Research and Community Services (CRCS) Building, 7th Floor,
Jl. Ganesha No. 10 Bandung 40132, Indonesia,

Tel. +62-22-86010080,

Fax.: +62-22-86010051;

e-mail: jictra@lppm.itb.ac.id.