Educational Data Mining to Predict Bachelors Students’ Success

David Jacob, Roberto Henriques


Predicting academic success is essential in higher education because academic success is perceived as a critical driver of scientific and technological advancement and of countries’ economic and social development. This paper aims to identify the attributes most relevant to academic success by applying educational data mining (EDM) techniques to historical data on bachelor’s students at a Portuguese business school. We propose two predictive models that classify each student's academic success, one at enrolment and one at the end of the first academic year. We followed the SEMMA methodology and tested several machine learning algorithms, including decision trees, KNN, neural networks, and SVM. At enrolment, the best classifier was a random forest with an accuracy of 69%. At the end of the first academic year, the best performance was achieved by an MLP artificial neural network with an accuracy of 85%. The main findings show that, both at enrolment and at the end of the first year, grades, and thus the student’s previous education and engagement with the school environment, are decisive in achieving academic success.
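The classification task described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses a plain k-nearest-neighbours vote (one of the algorithm families the abstract names) in standard-library Python, and the feature names, records, and labels are invented purely for illustration. Accuracy is computed as the share of correct predictions, the metric the abstract reports.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical records, NOT from the paper:
# (entry grade on an assumed 0-20 scale, credits completed in the first year)
train_X = [(11.0, 20), (12.5, 30), (17.0, 60), (15.5, 54), (10.5, 12), (16.0, 58)]
train_y = [0, 0, 1, 1, 0, 1]  # 1 = academic success, 0 = no success (assumed coding)

test_X = [(16.5, 57), (11.5, 18)]
test_y = [1, 0]

# Features are left unscaled to keep the sketch short; a real study would
# normalise them so that no single attribute dominates the distance.
predictions = [knn_predict(train_X, train_y, x) for x in test_X]
test_accuracy = accuracy(test_y, predictions)
print(predictions, test_accuracy)
```

In a setting like the one the abstract describes, the same accuracy function would be applied to each candidate algorithm on held-out students, and the model with the highest score would be retained for each prediction point.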


DOI: 10.28991/ESJ-2023-SIED2-013



Keywords: Academic Success; Student Success; Educational Data Mining; Machine Learning.



Copyright (c) 2023 David Jacob, Roberto Henriques