Comparative Assessment of Machine Learning Approaches for Early Lung Cancer Diagnosis

Keywords: Electronic Health Records; Ensemble Learning; Hybrid Metaheuristics Algorithm; Lung Cancer Diagnosis; Machine Learning

Lung cancer, a leading cause of cancer-related mortality worldwide, often escapes early detection because its initial stages lack distinct symptoms. This work investigates how Machine Learning (ML) might improve early diagnosis through the analysis of Electronic Health Records (EHR). Multiple ML models were developed and evaluated on a synthetic dataset designed to replicate real-world patient characteristics, which allows controlled experimentation while safeguarding privacy. The models were tuned using both conventional optimization methods and nature-inspired metaheuristics, with the aim of balancing predictive accuracy against computational cost. In experiments on the synthetic dataset, ensemble learners optimized with metaheuristic techniques reached accuracy approaching 99 percent while remaining computationally efficient, and they generally outperformed simpler baselines. The contribution of this work lies in exploring the integration of the GFO and WOA metaheuristics for feature selection and hyperparameter tuning of XGBoost, combined with a soft-voting ensemble. This approach offers an experimental pathway for improving predictive performance under computational constraints. However, because the dataset is synthetic, the conclusions remain preliminary; validation against clinical records will be essential before translation into practice.
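The abstract does not specify how the metaheuristics or the ensemble are implemented. Purely as an illustration, the Python sketch below shows two generic building blocks under stated assumptions. `soft_vote` combines classifiers by averaging their positive-class probabilities (soft voting). `woa_minimize` is a simplified Whale Optimization Algorithm, assuming WOA here denotes that algorithm, with scalar coefficients A and C per whale; it minimizes a hypothetical objective `toy_cv_error` that merely stands in for the cross-validated error of a model such as XGBoost. All function names, bounds, and the objective are hypothetical; GFO and the feature-selection step are not shown, and this is not the authors' pipeline.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft voting: weighted average of each model's positive-class probability per sample."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(n_samples)]


def woa_minimize(objective: Callable[[List[float]], float], bounds: Bounds,
                 n_whales: int = 10, n_iter: int = 50,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Simplified Whale Optimization Algorithm (hypothetical helper, scalar A and C per whale)."""
    rng = random.Random(seed)

    def clip(x: List[float]) -> List[float]:
        # Keep every coordinate inside its (low, high) search bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    scores = [objective(w) for w in whales]
    best_idx = min(range(n_whales), key=lambda i: scores[i])
    best, best_score = list(whales[best_idx]), scores[best_idx]
    spiral_b = 1.0  # constant defining the logarithmic spiral shape

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # shrinks linearly from 2 toward 0
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the current best (exploitation);
                # otherwise move relative to a random whale (exploration).
                target = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [tj - A * abs(C * tj - xj) for tj, xj in zip(target, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the best solution.
                ell = rng.uniform(-1.0, 1.0)
                new = [abs(bj - xj) * math.exp(spiral_b * ell) * math.cos(2 * math.pi * ell) + bj
                       for bj, xj in zip(best, x)]
            whales[i] = clip(new)
            score = objective(whales[i])
            if score < best_score:  # keep the best position seen so far
                best, best_score = list(whales[i]), score
    return best, best_score


# Hypothetical stand-in for cross-validated error over two hyperparameters
# (e.g. learning rate and tree depth); its minimum is at (0.1, 6.0).
def toy_cv_error(params: List[float]) -> float:
    lr, depth = params
    return (lr - 0.1) ** 2 + ((depth - 6.0) / 10.0) ** 2


best_params, best_err = woa_minimize(toy_cv_error, [(0.01, 0.3), (2.0, 10.0)])
print(best_params, best_err)
print(soft_vote([[0.9, 0.2], [0.7, 0.4]]))
```

In a real pipeline the objective would instead train and cross-validate the classifier for each candidate parameter vector, and integer parameters such as tree depth would be rounded before use; both steps are omitted here to keep the sketch self-contained.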