Implementation of Kadazan Tagger Based on Brill's Method

Authors

  • Marylyn Alex, CAIT Research Group, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia
  • Lailatul Qadri Zakaria, CAIT Research Group, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia

DOI:

https://doi.org/10.5614/itbj.ict.res.appl.2013.7.3.1

Abstract

We present and evaluate an implementation of part-of-speech (POS) tagging for the Kadazan language using the transformation-based approach. The main purpose of this study is to develop an automatic POS tagger for Kadazan, which has not been developed before. Such a tagger can annotate a Kadazan corpus automatically and help reduce the ambiguity problem in this language. The transformation-based approach was adopted with the aim of achieving accuracy higher than, or at least similar to, that of other tagging approaches such as the statistical and the original rule-based approaches. It transforms tags according to a prescribed set of rules. Three objectives were set in order to achieve the main purpose: first, to apply lexical and contextual rules to this language; second, to implement Brill's algorithm based on this set of rules; and finally, to determine the effectiveness of Kadazan POS tagging using this approach. The tagging system was trained on four Kadazan corpora containing 5663 words in total. In the evaluation, the tagging system achieved around 93% accuracy.
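
To make the transformation-based idea concrete, the following is a minimal sketch of Brill-style tagging, not the authors' implementation. It assigns each word an initial tag from a lexicon, rewrites tags with ordered contextual rules, and measures accuracy against a gold standard. The names `ContextRule`, `initial_tags`, `apply_rules` and `accuracy`, the default tag, the English toy lexicon, and the single rule are all assumptions made for illustration; the abstract does not state the Kadazan tag set or rules.

```python
from typing import NamedTuple


class ContextRule(NamedTuple):
    """Hypothetical rule: change from_tag to to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(tokens, lexicon, default_tag="NN"):
    """Give each token its lexicon tag; unknown words get default_tag (an assumption)."""
    return [lexicon.get(token.lower(), default_tag) for token in tokens]


def apply_rules(tags, rules):
    """Apply each rule in order, scanning left to right; changes take effect immediately."""
    tags = list(tags)  # work on a copy so the initial tagging is preserved
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy English data, used only because the Kadazan lexicon is not given in the source.
LEXICON = {"the": "DT", "can": "MD", "rusted": "VBD"}
RULES = [ContextRule(from_tag="MD", to_tag="NN", prev_tag="DT")]

tokens = ["The", "can", "rusted"]
gold = ["DT", "NN", "VBD"]

start = initial_tags(tokens, LEXICON)  # "can" starts with its lexicon tag, MD
final = apply_rules(start, RULES)      # the rule retags "can" as NN after a determiner

print(start, accuracy(start, gold))
print(final, accuracy(final, gold))
```

In a full Brill tagger the rules are learned from an annotated training corpus by repeatedly choosing the transformation that removes the most errors; here the rule is written by hand to keep the sketch short.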

Published

2013-12-01

How to Cite

Alex, M., & Zakaria, L. Q. (2013). Implementation of Kadazan Tagger Based on Brill’s Method. Journal of ICT Research and Applications, 7(3), 177-190. https://doi.org/10.5614/itbj.ict.res.appl.2013.7.3.1
