Gene Family Abundance Visualization based on Feature Selection Combined Deep Learning to Improve Disease Diagnosis
DOI: https://doi.org/10.5614/j.eng.technol.sci.2021.53.1.9
Keywords: deep learning, disease prediction, feature selection, gene family abundance, metagenomic, personalized medicine
Abstract
Advances in machine learning in general, and in deep learning in particular, have achieved great success in numerous fields. In personalized medicine, frameworks built on learning algorithms help scientists explore novel data sources such as metagenomic data and develop and evaluate methods to improve human healthcare. Challenges in processing this data type include its very high dimensionality and the complexity of the diseases involved. Metagenomic data that include gene families often have millions of features, which further increases processing complexity and demands a large amount of computation time. In this study, we propose a method that combines feature selection using perceptron weight-based filters with synthetic image generation, leveraging advances in deep learning to predict various diseases from gene family abundance data. An experiment was conducted using gene family datasets for five diseases: liver cirrhosis, obesity, inflammatory bowel disease, type 2 diabetes, and colorectal cancer. The proposed method not only provides a visualization of gene family abundance data but also achieves promising performance.
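The two steps named in the abstract, a perceptron weight-based filter followed by synthetic image generation, can be illustrated with a minimal sketch. The abstract does not specify how the weights are ranked or how the image is laid out, so everything below is an assumption made for illustration: features are ranked by absolute perceptron weight, the selected abundances are written row by row into a zero-padded square grid, and the toy data, the function names (`train_perceptron`, `select_top_features`, `to_image`) and the parameter values are all hypothetical. The deep-learning classifier that would consume the images is omitted.

```python
import math
import random


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule; returns one weight per feature plus a bias."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # yi is +1 or -1
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified: nudge the weights
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def select_top_features(w, k):
    """Indices of the k features with the largest absolute weight (assumed filter)."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return sorted(ranked[:k])


def to_image(values):
    """Fill a square grid row by row, zero-padding unused cells (assumed layout)."""
    side = math.ceil(math.sqrt(len(values)))
    padded = list(values) + [0.0] * (side * side - len(values))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical toy data: 40 samples, 50 features, only features 3 and 7 informative.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = 1 if i % 2 == 0 else -1
    row = [rng.random() for _ in range(50)]
    row[3] += 2.0 * label
    row[7] -= 2.0 * label
    X.append(row)
    y.append(label)

w, _ = train_perceptron(X, y)
selected = select_top_features(w, k=9)         # keep 9 of the 50 features
image = to_image([X[0][j] for j in selected])  # 3x3 grid for the first sample
```

In this sketch each sample becomes a small 2D array of its selected abundances, which is the kind of input a convolutional network expects. With real gene family data the number of retained features and the grid size would be far larger and would need to be tuned.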