Komodo-7B with Hybrid Retrieval and Q-LoRA for Indonesian Population Administration Question Answering
DOI:
https://doi.org/10.5614/itbj.ict.res.appl.2026.20.1.5
Keywords:
Komodo-7B, large language models, Q-LoRA, question answering, retrieval-augmented generation, population administration, public services
Abstract
Fast, responsive, and informative public services are a societal demand that government agencies must meet, including the Department of Population and Civil Registration (Disdukcapil) of Surabaya City. To improve service quality, this study developed a Large Language Model (LLM)-based Question Answering (QA) system that addresses public inquiries about Identity Card (ID) and Family Card (FC) services. The proposed system uses the Komodo-7B model, customized through Quantized Low-Rank Adaptation (Q-LoRA) fine-tuning and integrated with a Retrieval-Augmented Generation (RAG) approach to improve the accuracy and relevance of the generated responses. Training used a real-world complaint dataset from Disdukcapil alongside the open-source MS MARCO dataset. The RAG implementation vectorizes sentences with SentenceTransformer and retrieves context by cosine similarity. System performance was evaluated with the ROUGE and METEOR metrics across four scenarios: Komodo-7B Base, RAG Komodo-7B Base, Fine-Tuned Komodo-7B, and RAG Fine-Tuned Komodo-7B. The RAG Fine-Tuned Komodo-7B configuration performed best, achieving F1-scores of 0.3554 for ROUGE-1, 0.3096 for ROUGE-L, and 0.2886 for METEOR.
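To make the reported ROUGE-1 F1-scores easier to interpret, the following is a minimal sketch of how a unigram-overlap F1 can be computed for a single question-answer pair. It is an illustration only, not the evaluation code used in the study: the function name `rouge1_f1`, the whitespace tokenization, the lowercasing, and the two example sentences are all assumptions, and standard ROUGE toolkits may additionally apply stemming or other normalization.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference answer and a generated answer.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, which is an assumption.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: a unigram is credited at most as many times
    # as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up example pair (not taken from the Disdukcapil dataset).
reference = "bring your family card to the district office"
candidate = "bring the family card to the office"
score = rouge1_f1(reference, candidate)
print(f"ROUGE-1 F1: {score:.4f}")
```

In this toy pair, six unigrams overlap after clipping, giving a precision of 6/7, a recall of 6/8, and an F1 of 0.8. A corpus-level score such as those reported above would typically be an average of this quantity over all test pairs.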


