Fine-tuning NER for Triplet Extraction in Medical Knowledge Graph Construction
DOI:
https://doi.org/10.5614/itbj.ict.res.appl.2025.19.2.1
Keywords:
dependency parsing, medical knowledge graph, named entity recognition, part-of-speech tagging, triplets
Abstract
This study presents a new approach for constructing a medical knowledge graph. Named Entity Recognition (NER) identifies entities such as diseases, drugs, or medical procedures, while part-of-speech (POS) tagging and dependency parsing identify the words that function as verbs and roots. These extracted words then serve as relations between entities, forming triplets in the format (entity, relation, entity). While the knowledge graph provides a structured representation of medical information, the evaluation primarily reflects the performance of the underlying NLP pipeline (NER, POS tagging, and dependency parsing) used to generate the triplets. Quantitative evaluation used metrics such as precision, recall, and F1-score to assess the accuracy and completeness of entity and relation extraction. Qualitative evaluation involved medical domain experts, who assessed the relevance and validity of the derived relationships. The results indicate that fine-tuning a pre-trained model for NER and leveraging a pre-trained model for POS tagging and dependency parsing can effectively generate accurate triplets for constructing a medical knowledge graph. The approach achieved high scores in both the quantitative and qualitative evaluations.
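The triplet-forming step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the entity labels, POS tags, and dependency labels below are hand-written stand-ins for the output of a fine-tuned NER model and a pre-trained parser, and the rule of pairing each root verb with the nearest entity on either side is an assumption made for illustration. The names `Token`, `merge_entities`, `extract_triplets`, and `precision_recall_f1` are hypothetical, as is the gold set used for scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    pos: str      # POS tag, e.g. "VERB" (hand-written here, not model output)
    dep: str      # dependency label, e.g. "ROOT"
    entity: str   # NER label, or "" when the token is outside any entity


def merge_entities(tokens):
    """Merge consecutive tokens that share an entity label into spans."""
    spans = []  # each span is (start_index, end_index, text)
    i = 0
    while i < len(tokens):
        label = tokens[i].entity
        if not label:
            i += 1
            continue
        j = i
        while j + 1 < len(tokens) and tokens[j + 1].entity == label:
            j += 1
        spans.append((i, j, " ".join(t.text for t in tokens[i:j + 1])))
        i = j + 1
    return spans


def extract_triplets(tokens):
    """Assumed heuristic: link the nearest entities around each root verb."""
    spans = merge_entities(tokens)
    triplets = []
    for idx, tok in enumerate(tokens):
        if tok.pos == "VERB" and tok.dep == "ROOT":
            left = [s for s in spans if s[1] < idx]
            right = [s for s in spans if s[0] > idx]
            if left and right:
                triplets.append((left[-1][2], tok.text, right[0][2]))
    return triplets


def precision_recall_f1(predicted, gold):
    """Score predicted triplets against a gold set by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative sentence with hand-assigned annotations.
sentence = [
    Token("Metformin", "NOUN", "nsubj", "DRUG"),
    Token("treats", "VERB", "ROOT", ""),
    Token("type", "NOUN", "compound", "DISEASE"),
    Token("2", "NUM", "nummod", "DISEASE"),
    Token("diabetes", "NOUN", "dobj", "DISEASE"),
]
triplets = extract_triplets(sentence)

# Hypothetical gold set: one triplet the sketch finds, one it cannot.
gold = [
    ("Metformin", "treats", "type 2 diabetes"),
    ("Metformin", "lowers", "blood glucose"),
]
precision, recall, f1 = precision_recall_f1(triplets, gold)
print(triplets, precision, recall, round(f1, 3))
```

In a real pipeline the `Token` fields would be filled from model predictions rather than written by hand, and exact-match scoring is only one possible way to compare extracted triplets with a reference set.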


