A Comparative Performance Analysis of Hybrid and Classical Machine Learning Method in Predicting Diabetes

Kalaiarasi Sonai Muthu Anbananthen, Mikail Bin Muhammad Azman Busst, Rajkumar Kannan, Subarmaniam Kannan

Abstract


Diabetes mellitus, a disease characterized by high blood glucose levels, is one of the most important research topics in medical science because of its severe consequences. Machine learning techniques make early detection possible, which in turn helps prevent complications. This study therefore aims to identify a machine learning approach that predicts diabetes more accurately, and it compares the performance of several classical machine learning models with that of hybrid machine learning approaches. The hybrid models fall into two groups: homogeneous models, comprising Random Forest, AdaBoost, XGBoost, Extra Trees, and Gradient Boosting, and a heterogeneous model built with the stacking ensemble method. The stacking ensemble, or stacked generalization, approach uses a meta-classifier that combines the predictions of multiple learners. The homogeneous hybrid models and Stacked Generalization are compared with classical machine learning methods: Naive Bayes, Multilayer Perceptron, k-Nearest Neighbour, and Support Vector Machine. Experimental analysis on the Pima Indians and early-stage diabetes datasets demonstrates that the hybrid models achieve higher accuracy in diagnosing diabetes than the classical models. Among the hybrid models, the heterogeneous model using the Stacked Generalization approach outperformed the others, achieving 83.9% and 98.5%.
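The stacked generalization idea described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses a synthetic dataset rather than the Pima Indians or early-stage diabetes data. The choice of base learners, the logistic-regression meta-classifier, and all parameter values are illustrative assumptions, not the configuration used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): binary labels, eight numeric features.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Heterogeneous base learners (illustrative choice, not the paper's setup).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("nb", GaussianNB()),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# The meta-classifier is trained on out-of-fold predictions of the base
# learners (cv=5), which is the core of stacked generalization.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)

# A single classical model on the same split, for comparison.
baseline = GaussianNB().fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)

print(f"Stacked ensemble accuracy: {stack_accuracy:.3f}")
print(f"Naive Bayes accuracy:      {baseline_accuracy:.3f}")
```

On synthetic data the gap between the two scores says nothing about diabetes prediction; the sketch only shows how base learners and a meta-classifier fit together.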

 

DOI: 10.28991/ESJ-2023-07-01-08


Keywords


Ensemble Learning; Stacked Generalization; Machine Learning; Prediction; Healthcare.




Copyright (c) 2022 Kalaiarasi Sonai Muthu Anbananthen, Mikail Muhammad Azman Busst, Rajkumar Kannan, Subarmaniam Kannan