Komodo-7B with Hybrid Retrieval and Q-LoRA for Indonesian Population Administration Question Answering

Anindo Saka Fitri; Abdul Rezha Efrat Najaf; Eko Wahyudi; Adelia Azizatul Haq; Sugiarto Sugiarto; I Gede Susrama Mas Diyasa

doi:10.5614/itbj.ict.res.appl.2026.20.1.5

Authors

Anindo Saka Fitri, Department of Information System, Faculty of Computer Science, University of Pembangunan Veteran Jawa Timur, Rungkut Madya, Surabaya 60294
Abdul Rezha Efrat Najaf, Department of Information System, Faculty of Computer Science, University of Pembangunan Veteran Jawa Timur, Rungkut Madya, Surabaya 60294
Eko Wahyudi, Department of Law, Faculty of Law, University of Pembangunan Veteran Jawa Timur, Rungkut Madya, Surabaya 60294
Adelia Azizatul Haq, Department of Data Science, Faculty of Computer Science, University of Pembangunan Veteran Jawa Timur, Rungkut Madya, Surabaya
Sugiarto Sugiarto, Department of Digital Business, Faculty of Computer Science, University of Pembangunan Veteran Jawa Timur, Rungkut Madya, Surabaya 60294
I Gede Susrama Mas Diyasa, Department of Master Information Technology, Faculty of Computer Science, University of Pembangunan Veteran Jawa Timur, Rungkut Madya, Surabaya 60294

DOI:

https://doi.org/10.5614/itbj.ict.res.appl.2026.20.1.5

Keywords:

Komodo-7B, large language models, Q-LoRA, question answering, retrieval-augmented generation, population administration, public services

Abstract

The public expects fast, responsive, and informative services from government agencies, including the Department of Population and Civil Registration (Disdukcapil) of Surabaya City. To improve service quality, this study developed a Question Answering (QA) system based on a Large Language Model (LLM) that answers public inquiries about Identity Card (ID) and Family Card (FC) services. The system uses the Komodo-7B model, fine-tuned with Quantized Low-Rank Adaptation (Q-LoRA) and combined with Retrieval-Augmented Generation (RAG) to improve the accuracy and relevance of the generated responses. Training used a real-world complaint dataset from Disdukcapil together with the open-source MS MARCO dataset. The RAG component encodes sentences as vectors with SentenceTransformer and retrieves context by cosine similarity. System performance was evaluated with the ROUGE and METEOR metrics across four scenarios: Komodo-7B Base, RAG Komodo-7B Base, Fine-Tuned Komodo-7B, and RAG Fine-Tuned Komodo-7B. The RAG Fine-Tuned Komodo-7B configuration performed best, achieving F1 scores of 0.3554 for ROUGE-1 and 0.3096 for ROUGE-L, and a METEOR score of 0.2886.
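The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the study encodes sentences with SentenceTransformer, whereas the `embed` function below is a toy bag-of-words stand-in so that the example runs without any model download. The function names, the prompt template, and the English sample passages are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (assumption): token counts.

    In the study this role is played by SentenceTransformer embeddings.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=2):
    """Return the top_k (score, passage) pairs ranked by cosine similarity."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, passages, top_k=2):
    """Prepend the retrieved context to the question (hypothetical template)."""
    context = "\n".join(p for _, p in retrieve(query, passages, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Invented sample passages, not taken from the Disdukcapil dataset.
PASSAGES = [
    "A lost identity card can be replaced by submitting a police report",
    "A family card must be updated after a birth or a change of address",
    "Office hours are Monday to Friday",
]

if __name__ == "__main__":
    print(build_prompt("how to replace a lost identity card", PASSAGES, top_k=1))
```

In a RAG pipeline of this kind, the resulting prompt would be passed to the language model, so that the answer is generated from the retrieved passages rather than from the model's parameters alone.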


