A Scheme Towards Automatic Word Indexation System for Balinese Palm Leaf Manuscripts
DOI:
https://doi.org/10.5614/itbj.ict.res.appl.2021.15.2.1
Keywords:
Balinese palm leaf manuscript, patch extraction, text detection, transliteration, word indexing
Abstract
This paper proposes an initial scheme towards the development of an automatic word indexation system for Balinese lontar (palm leaf manuscript) collections. The scheme consists of two submodules: one that extracts patch images from the text areas of a lontar, and one that transliterates word images. This is the first word indexation system to be proposed for lontar collections. To detect the parts of a lontar image that contain text, a Gabor filter is used to provide initial information about the presence of text texture in the image. An adaptive sliding patch algorithm for extracting patch images from lontars is also proposed. The word image transliteration submodule was built using a long short-term memory (LSTM) model. The results showed that the patch extraction process succeeded in optimally detecting text areas in lontars and in extracting patch images at suitable positions. The proposed scheme successfully extracted between 20% and 40% of the keywords in lontars. It can therefore at least give prospective readers an initial description of the content of a lontar collection, or help them find which lontar collection contains certain keywords.
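The text detection idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel parameters, the row-energy threshold, and the function names `gabor_kernel`, `filter_response`, `detect_text_band`, and `sliding_patches` are all assumptions made for illustration. In particular, a fixed-stride window stands in for the adaptive sliding patch algorithm, whose details the abstract does not give.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) part of a Gabor kernel as a size x size list of lists."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction of the sinusoidal carrier.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Subtract the mean so that the kernel coefficients sum to zero.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def filter_response(image, kernel):
    """Absolute filter response per pixel, with zero padding at the borders."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                yy = i + ky - half
                if 0 <= yy < h:
                    for kx, kv in enumerate(krow):
                        xx = j + kx - half
                        if 0 <= xx < w:
                            acc += kv * image[yy][xx]
            out[i][j] = abs(acc)
    return out


def detect_text_band(image, kernel, rel_threshold=0.5):
    """Return (top, bottom) rows, bottom exclusive, spanning the textured rows.

    Assumption: the image holds a single text band; rows whose mean response
    reaches rel_threshold times the peak row response count as text.
    """
    response = filter_response(image, kernel)
    energy = [sum(row) / len(row) for row in response]
    peak = max(energy)
    if peak == 0.0:
        return None  # no texture found anywhere
    rows = [i for i, e in enumerate(energy) if e >= rel_threshold * peak]
    return rows[0], rows[-1] + 1


def sliding_patches(band, image_width, patch_width, stride):
    """Fixed-stride patch boxes (left, top, right, bottom) across the band."""
    top, bottom = band
    last_start = image_width - patch_width
    return [(x, top, x + patch_width, bottom)
            for x in range(0, last_start + 1, stride)]
```

On a real lontar scan, one would likely use a bank of orientations and wavelengths and an optimized convolution routine rather than these pure-Python loops, which are kept only so the sketch has no dependencies.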