Predicting Dropout in MENA STEM Higher Education Using Explainable AI: A Machine Learning Approach

Keywords: Machine Learning, SHAP, Explainable AI, Student Dropout, STEM Retention, Educational Data Mining, MENA

Vol. 9 (2025): Special Issue "Emerging Trends, Challenges, and Innovative Practices in Education"

This study develops an explainable machine learning–based early warning system to predict dropout risk among Science, Technology, Engineering, and Mathematics (STEM) students in the Middle East and North Africa (MENA) region. Using longitudinal data from 6,798 undergraduate STEM students enrolled at a major UAE university, we evaluated six supervised classifiers: XGBoost, Gradient Boosting Machine (GBM), Random Forest, CART, Logistic Regression, and K-Nearest Neighbors. Models were trained on institutional student information system (SIS) data spanning ten cohorts (2010–2019), with class imbalance addressed through ROSE (Random Over-Sampling Examples) sampling. The three top-performing models (XGBoost, GBM, and Random Forest) achieved AUC-ROC scores exceeding 0.91 and F1-scores above 0.84, significantly outperforming baseline models. Key predictors of dropout included the number of withdrawn semesters, second-term credit load, academic probation history, and performance in mathematics and physics. To improve interpretability, we applied SHapley Additive exPlanations (SHAP) analysis, which attributes predictions to features at both the global (whole-population) and individual-student level. The system offers scalable, real-time prediction using only routinely available SIS data, with no need for external surveys or learning management system inputs. The novelty of this research lies in integrating explainable AI into dropout prediction in a regional context, enabling early, transparent, and actionable interventions. These findings contribute to data-driven retention strategies in higher education systems where predictive tools remain underutilized.
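To make the idea of individual-level feature attribution concrete, the following is a minimal sketch of an exact Shapley value computation for a single student. It is not the study's model or pipeline: the feature names, the logistic weights, the baseline profile, and the helper functions `dropout_risk` and `shapley_values` are all hypothetical, chosen only to echo the kinds of predictors named in the abstract. It also enumerates every feature coalition by brute force, which is feasible only for a handful of features; SHAP libraries use faster algorithms for tree ensembles.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical feature names, loosely echoing predictors named in the abstract.
FEATURES = ["withdrawn_semesters", "term2_credits", "on_probation", "math_grade"]

# Made-up weights for illustration only; these are NOT the study's fitted model.
WEIGHTS = {"withdrawn_semesters": 0.9, "term2_credits": -0.15,
           "on_probation": 1.2, "math_grade": -0.5}
BIAS = 1.0


def dropout_risk(x):
    """Toy logistic risk score in (0, 1) for a feature dictionary x."""
    z = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical reference profile and at-risk student.
baseline = {"withdrawn_semesters": 0, "term2_credits": 15,
            "on_probation": 0, "math_grade": 3.0}
student = {"withdrawn_semesters": 2, "term2_credits": 9,
           "on_probation": 1, "math_grade": 1.5}

phi = shapley_values(dropout_risk, student, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:20s} {contribution:+.3f}")
```

The attributions sum to the difference between the student's predicted risk and the baseline risk, which is the additivity property that lets an advisor read each value as that feature's share of the elevated risk. Averaging the absolute attributions over many students would give a global importance ranking of the kind the abstract describes.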