Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
DOI:
https://doi.org/10.5614/itbj.ict.res.appl.2025.19.1.2
Keywords:
COVID-19, deep learning, fact-checking, natural language inference, knowledge graph, natural language
Abstract
Automated fact-checking is a key strategy to overcome the spread of COVID-19 misinformation on the internet. These systems typically leverage deep learning approaches through natural language inference (NLI) to verify the truthfulness of information based on supporting evidence. However, one challenge that arises in deep learning is performance stagnation due to a lack of knowledge during training. This study proposes using a knowledge graph (KG) as external knowledge to enhance NLI performance for automated COVID-19 fact-checking in the Indonesian language. The proposed model architecture comprises three modules: a fact module, an NLI module, and a classifier module. The fact module processes information from the KG, while the NLI module handles semantic relationships between the given premise and hypothesis. The representation vectors from both modules are concatenated and fed into the classifier module to produce the final result. The model was trained using the generated Indonesian COVID-19 fact-checking dataset and the COVID-19 KG Bahasa Indonesia. Our study demonstrates that incorporating KGs can significantly improve NLI performance in fact-checking, achieving a maximum accuracy of 0.8616. This suggests that KGs are a valuable component for enhancing NLI performance in automated fact-checking.
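The fusion step described in the abstract, where the representation vectors from the fact module and the NLI module are concatenated and passed to the classifier module, can be illustrated with a minimal sketch. This is not the authors' implementation: the vector sizes, the three-way label count, the random weights, and the names `concat_representations`, `LinearClassifier`, and `softmax` are all assumptions made for illustration, and the two input vectors are made-up stand-ins for the module outputs.

```python
import math
import random
from typing import List


def concat_representations(fact_vec: List[float], nli_vec: List[float]) -> List[float]:
    """Join the fact-module and NLI-module vectors into one feature vector."""
    return list(fact_vec) + list(nli_vec)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Hypothetical stand-in for the classifier module: one linear layer plus softmax."""

    def __init__(self, in_dim: int, num_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random weights; a trained model would learn these values.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def predict_proba(self, features: List[float]) -> List[float]:
        if len(features) != len(self.weights[0]):
            raise ValueError("feature size does not match classifier input size")
        logits = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)


# Made-up vectors standing in for the two module outputs (assumed sizes 3 and 4).
fact_vec = [0.2, -0.1, 0.4]
nli_vec = [0.7, 0.1, -0.3, 0.5]

features = concat_representations(fact_vec, nli_vec)
classifier = LinearClassifier(in_dim=len(features), num_classes=3)  # label count is an assumption
probs = classifier.predict_proba(features)
predicted_class = probs.index(max(probs))
```

Concatenation keeps the two sources of evidence separate until the final layer, so the classifier can weight KG-derived features and premise-hypothesis features independently.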