Gene Family Abundance Visualization based on Feature Selection Combined Deep Learning to Improve Disease Diagnosis

Authors

  • Hai Thanh Nguyen, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000
  • Tai Tan Phan, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000
  • Tinh Cong Dao, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000
  • Phuc Vinh Dang Ta, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000
  • Cham Ngoc Thi Nguyen, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000
  • Ngoc Huynh Pham, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000
  • Hiep Xuan Huynh, College of Information Communication and Technology, Can Tho University Campus II, 3/2 Street, Ninh Kieu District, Can Tho city, 900000

DOI:

https://doi.org/10.5614/j.eng.technol.sci.2021.53.1.9

Keywords:

deep learning, disease prediction, feature selection, gene family abundance, metagenomic, personalized medicine

Abstract

Advances in machine learning, and in deep learning in particular, have achieved great success in numerous fields. In personalized medicine, frameworks derived from learning algorithms play an important role in helping scientists explore novel data sources such as metagenomic data, and in developing and evaluating methodologies to improve human healthcare. Challenges in processing this data type include its very high dimensionality and the complexity of the diseases involved. Metagenomic data that include gene families often have millions of features, which further increases processing complexity and requires a huge amount of computation time. In this study, we propose a method that combines feature selection using perceptron weight-based filters with synthetic image generation, leveraging advances in deep learning to predict various diseases from gene family abundance data. An experiment was conducted using gene family datasets of five diseases: liver cirrhosis, obesity, inflammatory bowel diseases, type 2 diabetes, and colorectal cancer. The proposed method not only provides a visualization of gene family abundance data but also achieves a promising level of performance.
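The two steps named in the abstract, a perceptron weight-based filter followed by synthetic image generation, can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the function names, the use of absolute weight magnitude as the ranking score, the value of k, the row-by-row zero-padded square layout, and the random toy data standing in for gene family abundances.

```python
import numpy as np


def perceptron_weights(X, y, epochs=20, lr=0.1, seed=0):
    """Train a single-layer perceptron on binary labels and return its weights."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            pred = 1 if X[i] @ w + b > 0 else 0
            update = lr * (y[i] - pred)  # zero when the sample is classified correctly
            w += update * X[i]
            b += update
    return w


def select_top_features(w, k):
    """Return indices of the k features with the largest absolute weight."""
    return np.argsort(-np.abs(w), kind="stable")[:k]


def to_square_image(values):
    """Fill the smallest square grid row by row, zero-padding unused cells."""
    side = int(np.ceil(np.sqrt(len(values))))
    img = np.zeros(side * side)
    img[: len(values)] = values
    return img.reshape(side, side)


# Toy data (hypothetical): 40 samples, 50 "gene family" abundance features.
rng = np.random.default_rng(42)
X = rng.random((40, 50))
y = (X[:, 3] + X[:, 7] > 1.0).astype(int)  # label depends on two features only

w = perceptron_weights(X, y)
top = select_top_features(w, k=10)   # filter step: keep 10 of 50 features
image = to_square_image(X[0, top])   # one sample rendered as a 4x4 grid
```

In a sketch like this, each sample becomes a small 2D array that could be passed to an image classifier such as a convolutional network. The filter step is what keeps the image size manageable when the raw data have millions of features.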


Published

2021-01-30

How to Cite

Nguyen, H. T., Phan, T. T., Dao, T. C., Ta, P. V. D., Nguyen, C. N. T., Pham, N. H., & Huynh, H. X. (2021). Gene Family Abundance Visualization based on Feature Selection Combined Deep Learning to Improve Disease Diagnosis. Journal of Engineering and Technological Sciences, 53(1), 210109. https://doi.org/10.5614/j.eng.technol.sci.2021.53.1.9
