A Comparative Analysis of Machine Learning Models for Predicting EFL Student Language Performance in Smart Learning Environments

Keywords: Machine Learning; Predictive Modeling; EFL; Student Performance; Ensemble Methods; Educational Analytics.

Authors

  • Banchakarn Sameephet, Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
  • Wirapong Chansanam
    wirach@kku.ac.th
    Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand http://orcid.org/0000-0001-5546-8485
  • Mahboubeh Rakhshandehroo, Faculty of Culture and Representation, Doshisha Women's College of Liberal Arts, Kyoto, Japan; Center for Multilingual Education, Osaka University, Osaka, Japan
  • Chawin Srisawat, Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
  • Kittichai Nilubol, Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
  • Arnon Jannok, Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
  • Bhirawit Satthamnuwong, Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
  • Kornwipa Poonpon, Smart Learning Innovation Research Center and Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand

Integrating smart learning environments into modern education systems creates opportunities to use data analysis to predict students' English language performance. This study evaluates several machine learning models for predicting the performance of English as a foreign language (EFL) students, with emphasis on data preprocessing and feature selection. The dataset was gathered from 181 students at eight middle schools in Thailand. The students' data were exported from the Smart Learning Project and include results from 14 PISA-like English quizzes covering 27 competencies. The study compares the predictive performance of Random Forest, Support Vector Regression, AdaBoost, Bayesian Ridge, K-Nearest Neighbors, ElasticNet, XGBoost, Gradient Boosting, and a Stacking Ensemble, using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²) as metrics. The results indicate that ensemble models, particularly XGBoost and the Stacking Ensemble, performed best in predicting students' English language performance. These models can efficiently capture complex relationships in educational data. Data preprocessing and feature selection also play a significant role in improving model performance. This study highlights the potential of advanced machine learning techniques in educational data analysis. The results can contribute to the development of personalized learning strategies and early intervention, and they support an efficient and adaptive education system that advances smart learning and data-driven instruction.
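For readers less familiar with the four evaluation metrics named above, the following is a minimal Python sketch of their standard definitions. It is an illustration only and is not taken from the study: the function name `regression_metrics` and the example scores are hypothetical, and the authors' actual evaluation code is not described here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R² for two equal-length sequences of numbers.

    Hypothetical helper for illustration; uses the standard textbook definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error: average of squared residuals.
    mse = sum(e * e for e in errors) / n
    # Root mean squared error: square root of MSE, in the units of the target.
    rmse = math.sqrt(mse)
    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # R² = 1 - SS_res / SS_tot, where SS_tot is the variation around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Made-up quiz scores and predictions, not data from the study.
    actual = [3.0, 5.0, 7.0, 9.0]
    predicted = [2.0, 5.0, 8.0, 9.0]
    for name, value in regression_metrics(actual, predicted).items():
        print(f"{name}: {value:.4f}")
```

For the made-up values in the sketch, MSE and MAE are both 0.5, RMSE is about 0.707, and R² is 0.9. Lower MSE, RMSE, and MAE and a higher R² indicate a better fit, which is how a comparison across models such as the one in this study would typically be read.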

 

DOI: 10.28991/ESJ-2025-09-02-07
