Genetic Links Between Common Lung Diseases and Lung Cancer Progression: Bioinformatics and Machine Learning Insights

Lung Cancer; Common Lung Disorders; Survival Curve; Cox PH Model; Classification Algorithms; PPI Network; Molecular Pathways.

Authors

  • Md Ali Hossain 1) Department of Computer Science & Engineering, Jahangirnagar University, Savar, Dhaka 1342, Bangladesh. 2) Health Informatics Lab, Department of Computer Science & Engineering, Daffodil International University, Dhaka 1216, Bangladesh.
  • Tania Akter Asa 2) Health Informatics Lab, Department of Computer Science & Engineering, Daffodil International University, Dhaka 1216, Bangladesh. 3) Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh.
  • Md. Zulfiker Mahmud Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
  • AKM Azad Department of Mathematics & Statistics, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia
  • Mohammad Zahidur Rahman
    rmzahid@juniv.edu
    Department of Computer Science & Engineering, Jahangirnagar University, Savar, Dhaka 1342, Bangladesh
  • Mohammad Ali Moni 5) Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Bathurst 2795, Australia. 6) Rural Health Research Institute, Charles Sturt University, Orange 2800, Australia.
  • Ahmed Moustafa 7) Department of Human Anatomy and Physiology, Faculty of Health Sciences, University of Johannesburg, Doornfontein, 2094, South Africa. 8) Centre for Data Analytics and School of Psychology, Bond University, Gold Coast, Queensland, 4229, Australia.

Lung cancer (LC) is one of the most frequently diagnosed cancers and remains the leading cause of cancer-related mortality worldwide. Although several common lung diseases (CLDs) are implicated in LC development, the mechanisms by which LC arises from CLDs remain poorly understood. To explore this progression, we integrated bioinformatics and machine learning, using data from the GEO and TCGA databases. We first identified differentially expressed genes (DEGs) in LC and CLDs. Our gene-disease network revealed, for the first time, DEGs shared between LC and tuberculosis (TB; 36), asthma (10), pneumonia (17), chronic obstructive pulmonary disease (COPD; 18), and idiopathic pulmonary fibrosis (IPF; 78), providing insight into potential connections between LC and CLDs. The analysis also identified significant pathways and, through a protein-protein interaction (PPI) network, hub proteins (SPTBN1, KCNA4, SCN7A, KCNQ3, GRIA1, and SDC1). RNA-seq and clinical data for the shared DEGs were then obtained from cBioPortal, and the integrated data were analyzed with univariate and multivariate Cox proportional hazards models to assess the influence of significant genes on LC patient survival. Co-expression networks among the common genes were explored using Weighted Gene Co-expression Network Analysis (WGCNA). We also developed and deployed a predictive model based on the identified hub genes, which demonstrated high accuracy in predicting LC progression. Finally, the hub genes were validated using the Human Protein Atlas (HPA) database and evaluated with several classification algorithms to assess their predictive power and diagnostic potential. The identified biomarkers and pathways hold promise for further translational research and as potential therapeutic targets, advancing understanding of LC development from CLDs.
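
The shared-DEG step described above amounts to intersecting the LC gene set with each CLD gene set. The following is a minimal sketch of that idea only, not the authors' pipeline: the gene identifiers (`G1`, `G2`, ...) are placeholders rather than results from the study, and the function name `shared_degs` is hypothetical.

```python
# Toy DEG sets. The identifiers are placeholders, NOT genes reported in the study.
lc_degs = {"G1", "G2", "G3", "G4", "G5"}
cld_degs = {
    "TB": {"G1", "G2", "G9"},
    "asthma": {"G3", "G7"},
    "COPD": {"G8"},
}


def shared_degs(cancer_degs, disease_degs):
    """Map each disease name to the sorted list of DEGs it shares with the cancer set."""
    return {
        disease: sorted(cancer_degs & genes)
        for disease, genes in disease_degs.items()
    }


shared = shared_degs(lc_degs, cld_degs)
for disease, genes in shared.items():
    # Each entry corresponds to one LC-disease edge in a gene-disease network,
    # weighted by the number of shared genes.
    print(f"LC-{disease}: {len(genes)} shared DEGs {genes}")
```

In a real analysis the input sets would come from a differential expression test on each dataset, with significance and fold-change thresholds chosen by the analyst; those thresholds are not stated in this abstract.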

 

DOI: 10.28991/ESJ-2025-09-02-021
