Mathematics and Mother Tongue Academic Achievement: A Machine Learning Approach

Academic achievement is of great interest to education researchers and practitioners. Several academic achievement determinants have been described in the literature, mostly identified by analyzing primary (sample) data with classic statistical methods. Despite their superior predictive performance, machine learning methods have only recently started to be applied systematically in this context. However, even when this is the case, the ability to draw conclusions is greatly hampered by the "black-box" effect these methods entail. We contribute to the literature by combining the efficiency of machine learning methods, trained with data from virtually every public upper-secondary student of a European country, with the ability to quantify exactly how much each driver impacts academic achievement in Mathematics and mother tongue, through the use of prototypes. Our results indicate that the most important general inhibitor of academic achievement is previous retention. Legal guardians' education is a critical driver, especially in Mathematics, whereas gender is especially important for mother tongue, as female students perform better. Implications for research and practice are presented.


2-Theoretical Background
Academic Achievement (AA), the extent to which students are successful in school, is usually measured quantitatively in terms of performance: individual students' grades, school achievement exams, standardized test scores, and/or grade point average [8]. It is an expression of a cumulative learning experience across all school years. Understanding AA is crucial for educational decision-makers, since academic difficulties may lead to long-term dropout patterns, academic failure, and problems achieving a successful career in adulthood [9]. Struggling in school can also lead to externalizing problems [10], especially during adolescence and upper-secondary, which represent a critical period with considerable implications for AA. Even though upper-secondary completion is necessary to complete compulsory education in most developed countries, including Portugal, many students quit school during this period. School dropout is linked to poverty and social exclusion [11] and difficulties entering a successful career in adulthood [12]. Portugal's upper-secondary dropout rate currently exceeds the European average (12.6% vs. 10.7%), and a thorough understanding of the drivers of AA in upper-secondary can shed light on effective strategies to increase graduation rates and reduce dropout.
Most of the research done in AA has identified diverse factors that impact students' performance, namely characteristics of the students [13], parents [14], teachers [15], and the schools [16], the primary agents related to the educational process. Other researchers described some influences that contribute to learning and achievement: the student, the home, the school, the classroom, the curricula, the teacher, and teaching and learning approaches [4].

2-1-1-Students' Characteristics
Students' characteristics are understandably a major factor in explaining AA and students' cognitive ability has been identified as a major determinant of AA [17]. However, this identification has been questioned [18] as a much wider number of factors exist that determine school success. Personal psychological attributes such as personality traits, motivation, academic engagement, self-efficacy beliefs, and grit also determine students' pursuit of learning goals and work efforts and are positive predictors of AA [13].
Motivational and personal predictors stem mostly from student progress across all school years and previous academic results achieved. In fact, a student's track record is a rich source of information and a strong predictor of current achievement. Previous achievement is a positive predictor of AA [19,20], demonstrating the cumulative nature of learning and knowledge acquisition, and its effects on self-concept [21]. Conversely, previous retention does not seem to provide great benefits to students with academic difficulties [22]; it even increases the probability that the student will fail a subsequent academic year [5]. Grade retention negatively impacts self-esteem, academic self-concept, and homework completion while contributing to higher maladaptive motivation and absenteeism [23]. Retention has been linked to peer stigmatization during early adolescence and may aggravate behavioral and socioemotional adjustment problems [24].
Research also shows significant differences in AA according to demographic characteristics of students, such as age [5], nationality [25], and gender [26]. Older students, meaning that they were likely previously retained, typically perform more poorly [27]. The AA of non-national students or those whose parents are immigrants is contingent on how well they have adapted to school [25]. Concerning gender, girls typically attain higher scores than boys [28]. However, boys have better results in specific mathematics abilities [29], and girls outperform boys in reading [30], notably across OECD countries.
Computer use and internet access have increased in recent decades, especially among teenagers, with noticeable benefits in many areas. Their impact on AA is still controversial, as both benefits and losses have been identified in research. Although information and communication technologies (ICT) are an essential tool for education, their use for leisure has escalated, especially among children and teenagers. Excessive internet leisure use elicits lower academic results due to sleep disturbances and decreased study time [31]. However, other studies show that moderate computer use and video gaming have positively influenced upper-secondary AA [32]. Concerning internet access, studies have shown that students who use the internet more often tend to have higher scores [33]. Likewise, AA is higher for students who are not dependent solely on a cell phone for internet access (i.e., who have computer access) and for those with higher digital skills, mainly social media skills.

2-1-2-Parents' Characteristics
AA is influenced by family characteristics such as socioeconomic status and cultural capital. Family background is crucial for children's development and adjustment to school life. Students with socioeconomic disadvantages show lower communication, language, literacy, and mathematical development. Likewise, early motherhood, low maternal qualifications, low family income, and unemployment predict lower scores at school [28]. Children whose families receive financial support for educational purposes are more likely to have lower grades [34,35]. AA is positively linked to higher parental income, high education levels, and the parents' occupation type [36,37]. Young people with highly educated parents are likelier to attain higher AA levels than their peers with low-educated parents. Middle school students with university-educated parents achieve at much higher levels than their peers whose parents did not pursue postsecondary education [38], and parents from low educational backgrounds struggle to value education, give the necessary support, and provide the resources needed for improving AA [39,40].
Parental involvement in school affairs has a noticeable relationship with AA [41,42]. It enhances students' ability to cope with schooling activities and promotes appropriate behavioral attitudes that lead to success [43]. Parents who are more involved usually have higher expectations regarding their children's academic path, and these high expectations are linked to increased motivation for work and better results [42]. Participating in school activities seems to be more influential on AA for socioeconomically disadvantaged students [44,45], and schools play an essential role in helping develop parenting skills [46], helping to diminish the gap derived from pre-existing conditions of students' background.

2-1-3-Schools' Characteristics
Research has also explored the role of schools' characteristics on student performance. Smaller schools seemingly benefit students with lower socioeconomic status and learning difficulties [47]. Likewise, schools with an average number of students, a higher percentage of teachers on a permanent contract, and a lower proportion of economically disadvantaged students tend to yield better results [4]. AA in Math and Reading falls as school size increases, and these adverse effects of large schools appear to be more salient in higher grades, which is also when schools tend to be the largest. Attending a school with a higher proportion of students from less-educated families harms AA [38].
Having smaller classes has often been suggested as a way of boosting academic success. In fact, smaller classes seem to improve AA, but they may also be more effective in diminishing the achievement gaps [48]. Nevertheless, it is not a consensual effect, as other studies demonstrate that reducing class size is not directly linked to better AA [5,49,50].

2-1-4-Teachers' Characteristics
Teachers' influence is also among the most significant determinants explaining students' AA [51]. Adolescents' perception of teacher connection is the strongest predictor of growth in Math AA in middle and high school, particularly for adolescents from families with a lower socioeconomic background [52]. Teacher involvement is crucial for developing student perceptions of academic, social, and emotional support and for helping students' career planning and decision-making [53]. Teachers with higher expectations about their students' AA help to improve their academic self-concept, which increases AA [44].
Previous research in Portuguese schools assessed the impact of teacher gender, teacher employment situation, education level, and experience and found that female teachers have a more substantial influence on students' AA than male teachers. Teachers working away from home have significant adverse effects on AA. Advanced degree teachers (with MSc or PhD) seemed not to affect AA compared to those with an undergraduate degree. Teachers with more experience seem to be more effective at increasing AA [54]. However, younger teachers typically work away from home in a more precarious work situation, which can account for this difference in teacher age and its influence on AA.

2-2-Application of Machine Learning on AA
In recent years, researchers and practitioners have resorted to novel data science and artificial intelligence techniques to shed light on AA drivers. It should be noted that most studies in this regard focus on classic, traditional methods. As more sophisticated data science methods tend to perform better than classic ones, our work uses the former to better understand what drives AA at the upper-secondary level.
One of the first studies using data science and artificial intelligence methods is Musso et al. [55]. The authors used data from 655 students from private universities in Buenos Aires to examine their AA using neural networks with error backpropagation. Despite the high performance of neural networks, the authors' ability to quantify each AA driver with them is limited. Hence, to shed some light on AA drivers, some authors used more interpretable methods such as decision trees or random forests. Among these are Şen et al. [56], who used data from 5,000 8th grade Turkish students to examine placement test scores. ML methods were shown to be highly effective, particularly decision trees. Abad and López [57] also used decision trees to study the AA of 18,935 upper-secondary students from 99 Mexican schools; the method proved to be extremely efficient thanks to its simplicity and high interpretability in understanding the underlying conditions that drive AA. Asif et al. [58] provide another example of a study that used decision trees to examine AA drivers, this time from 210 Pakistani undergraduate students. Even in cases where decision trees or random forests are not the best-performing methods, they end up being chosen for interpretability reasons, given their advantages in terms of understanding the pattern between AA and its drivers. Delen [59], for example, compared which ML methods better predict school dropout, using data from 16,066 first-year students from a public university in the United States. As with other studies, although decision trees provided the second-best results, they were considered the most appropriate technique. Likewise, Miguéis et al. [60] used random forests, decision trees, and other variants to examine the AA of 2,459 students attending European public universities.
In addition to the methodological issue regarding the trade-off between prediction power and interpretability discussed above, it should also be noted that most studies using sophisticated data science methods use sample data. In fact, this is the case in every example mentioned above. Although using more sophisticated methods has proven to be worthwhile, it is still unquestionable that using sample data is an explicit limitation. There are a few studies using data that do not come from samples. One example is provided by Costa-Mendes et al. [61], who used neural networks, support vector machines, regression, random forests, and an extreme gradient boosting machine, discovering that the last yields the best performance. Another example is given by Cruz-Jesus et al. [62], who compared the performance of a classic method with ML techniques to understand the factors leading to the failure or success of a school year. Random forests proved to be the most accurate procedure.
By examining previous research on AA drivers using machine learning methods, two facts are immediately noticeable: first, despite the more sophisticated approach, several studies using machine learning methods in the context of AA still use samples, which brings some representativeness and generalization constraints; second, because of the machine learning "black-box" effect, none of these studies, to the best of the authors' knowledge, is able to quantify precisely how much each potential driver impacts AA. Our study bridges this gap by addressing both limitations, i.e., we use a machine learning method with data from virtually every Portuguese public upper-secondary student and provide an estimate of how much each AA driver impacts Mathematics and mother tongue grades, also providing a comparison between the two.

3-Methodology
This section describes the methodology used in this work and is divided into three subsections. The first one presents a brief overview of the well-known machine learning method used in this paper, neural networks. The second focuses on a typical data science problem related to feature selection, i.e., choosing a subset of the independent variables. Finally, we present the data used in this work. The research methodology is displayed in Figure 1.

3-1-Machine Learning Methods
Artificial neural networks, or just neural networks (NN), are biologically inspired methods mimicking the structure of a human brain [63].
Multilayer perceptron (MLP) is an architecture with an input layer, multiple hidden layers, and an output layer. All these layers are formed by neurons, and each neuron in the input layer connects to every neuron in the first hidden layer, which in turn connects to every neuron of the next hidden or output layer. The number of layers and their respective number of neurons define the topology of the neural network. Each connection is associated with a weight, and the set of weights represents the parameters of the model. Thus, the learning process aims at determining the optimal values for the set of weights.
The computation is performed as follows: the neurons in the input layer collect the predictor variables' values and send a signal to every neuron in the first hidden layer. At this point, the inputs of the first hidden layer neurons are linear combinations of the signals received plus a bias. The neurons in the first hidden layer produce, by using an activation function, an output vector that represents the input for the second hidden layer, and the feed-forward process goes on up to the output layer. The output layer, for the considered problem, contains a single neuron that is responsible for outputting a prediction for the input sample under consideration.
In the second step of the learning procedure (i.e., the backpropagation), the network's parameters (i.e., the weights) are updated based on the feedback signal obtained by comparing the predictions of the neural network against the target values. In particular, backpropagation aims at minimizing the following error function with respect to the neural network's weights:

$$E(X, \theta) = \frac{1}{2N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2$$

where $X$ is the set of samples, $\theta$ is the set of weights, $N$ denotes the number of observations, $\hat{y}_i$ the prediction of the model for the $i$-th observation, and $y_i$ is the target value. This is achieved by calculating, for each weight $w_{ij}^{k}$, the derivative of the error function with respect to that specific weight,

$$\frac{\partial E(X, \theta)}{\partial w_{ij}^{k}}$$

where $w_{ij}^{k}$ denotes the weight for the neuron $j$ in the layer $k$ for the incoming neuron $i$. When the calculation of the gradient of the error function is completed, the weights are modified by a quantity equal to

$$\Delta w_{ij}^{k} = -\alpha \frac{\partial E(X, \theta)}{\partial w_{ij}^{k}}$$

where $\alpha$ is a parameter called the learning rate. This process can be repeated until the error is below a predefined threshold or the maximum number of iterations is reached [63].
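The feed-forward and backpropagation steps described above can be sketched in a few lines of NumPy. This is only an illustration, not the implementation used in this work: the synthetic data, the single hidden layer of eight tanh neurons, and the learning rate of 0.05 are all assumptions, and the error function is assumed to be the halved mean squared error, $E = \frac{1}{2N}\sum_i (\hat{y}_i - y_i)^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): 200 samples, 3 predictors.
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)).reshape(-1, 1)
N = len(y)

# Assumed topology: 3 inputs -> 8 hidden neurons (tanh) -> 1 linear output neuron.
W1, b1 = rng.normal(scale=0.5, size=(3, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros((1, 1))
alpha = 0.05  # learning rate (assumed value)


def forward(X):
    """Feed-forward pass: returns hidden activations and predictions."""
    H = np.tanh(X @ W1 + b1)   # linear combination plus bias, then activation
    return H, H @ W2 + b2      # single output neuron


def error(y_hat, y):
    """Halved mean squared error, E = 1/(2N) * sum((y_hat - y)^2)."""
    return float(np.sum((y_hat - y) ** 2) / (2 * len(y)))


losses = []
for _ in range(500):
    H, y_hat = forward(X)
    losses.append(error(y_hat, y))

    # Backpropagation: derivative of E with respect to every weight.
    d_out = (y_hat - y) / N                      # dE/dy_hat
    grad_W2 = H.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hid = (d_out @ W2.T) * (1.0 - H ** 2)      # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ d_hid
    grad_b1 = d_hid.sum(axis=0, keepdims=True)

    # Weight update: delta_w = -alpha * dE/dw.
    W1 -= alpha * grad_W1
    b1 -= alpha * grad_b1
    W2 -= alpha * grad_W2
    b2 -= alpha * grad_b2

print(f"error before training: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

The loop stops after a fixed number of iterations; a threshold on the error, as mentioned in the text, could be added as an early exit.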

3-2-Feature Selection
Feature selection is the process of reducing the number of independent variables based on their relevance. This process allows for the creation of simpler models, shortens training times, increases precision, and avoids the curse of dimensionality.
First, Recursive Feature Elimination (RFE) [64] was used to select the optimal number of independent variables (N). Second, the top N features were selected according to their importance in predicting the target, using RFE, Lasso, and Ridge regressions. By combining the variables selected by these three methods, it was possible to reduce the number of variables in the datasets from over 50 to 12-15 independent variables, depending on the dataset.

3-2-1-Recursive Feature Elimination
RFE is a feature selection method that fits a model and removes the less relevant features until the specified number of features is reached. In particular, RFE selects features by recursively considering smaller and smaller sets of features. First, the model is trained on the initial set of features and the importance (e.g., the absolute value of the coefficients for a linear model) of each feature is obtained. Subsequently, the least important features are removed from the current set of features and the procedure is recursively repeated on the pruned set until the desired number of features to select is reached.
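As an illustration of the recursive procedure, the sketch below applies scikit-learn's `RFE` to synthetic data in which only the first three of ten features drive the target. The data, the linear base estimator, and the choice of three features to keep are assumptions made for the example, not the configuration used in this study.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 10 features, only columns 0, 1, 2 matter.
X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=300)

# At each round the feature with the smallest absolute coefficient is dropped
# (step=1) until only n_features_to_select features remain.
selector = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)   # indices of the kept features
print("selected features:", selected.tolist())
print("elimination ranking (1 = kept):", selector.ranking_.tolist())
```

Features with a ranking of 1 are the ones retained; larger ranks indicate earlier elimination.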

3-2-2-Ridge Regression
Ridge regression minimizes the prediction error while penalizing the size of the feature coefficients. The coefficients are shrunk toward zero (and toward each other), which reduces model complexity and the effect of multicollinearity. The ridge estimate is specified as:

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + t \sum_{j=1}^{p} \beta_j^2 \right\}$$

where $N$ denotes the number of samples, $p$ represents the number of variables, and $t \geq 0$ is a complexity parameter that controls the amount of shrinkage: the larger the value of $t$, the greater the amount of shrinkage [63]. The features selected to be part of the model are those with the largest coefficients in absolute value.
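The shrinkage effect can be seen in a small experiment. The sketch below, on synthetic data, fits scikit-learn's `Ridge` for increasing penalty values and records the norm of the coefficient vector; in scikit-learn the penalty is called `alpha` and plays the role of the complexity parameter t in the text. The data and the penalty grid are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 5 features, the last two are irrelevant.
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.0]) + 0.5 * rng.normal(size=200)

penalties = [0.01, 1.0, 100.0, 10000.0]   # assumed grid for the penalty
norms = []
for t in penalties:
    model = Ridge(alpha=t).fit(X, y)      # alpha is the ridge penalty
    norms.append(float(np.linalg.norm(model.coef_)))
    print(f"penalty={t:>8}: coefficients={np.round(model.coef_, 3)}")
```

As the penalty grows, every coefficient moves toward zero, but none is set exactly to zero; ranking features by absolute coefficient is what turns ridge into a selection tool here.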

3-2-3-Lasso Regression
Lasso is a form of linear regression that uses shrinkage: it minimizes the prediction error while also considering a constraint on the value of the coefficients. In particular, the shrinking process penalizes the regression coefficients, and the higher the penalty, the further the estimates are shrunk towards zero. After the shrinking process, the variables shown to be the weakest, those with zero coefficients, are eliminated. The Lasso estimate is the solution to the $\ell_1$ optimization problem:

$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

where $\lambda \geq 0$ is the parameter that determines the strength of the penalty; the larger the value of $\lambda$, the greater the amount of shrinkage [64]. In more detail, when $\lambda = 0$, no parameters are eliminated and, in this case, the estimate is equal to the one found with linear regression. On the other hand, as $\lambda$ increases, more and more coefficients are set to zero and eliminated.
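The elimination behaviour can be illustrated with scikit-learn's `Lasso` on synthetic data. The data and the three penalty values are assumptions chosen for the example. Note that scikit-learn scales the squared-error term by 1/(2N), so its `alpha` is not numerically identical to the penalty parameter in the text, although it plays the same role.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 10 features, only columns 0, 1, 2 matter.
X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=300)

kept = {}
for penalty in [0.001, 0.1, 2.5]:              # assumed penalty values
    model = Lasso(alpha=penalty).fit(X, y)
    # Features whose coefficient was not shrunk exactly to zero survive.
    kept[penalty] = np.flatnonzero(model.coef_).tolist()
    print(f"penalty={penalty}: surviving features {kept[penalty]}")
```

With a moderate penalty only the informative features keep a nonzero coefficient, and a large penalty removes weaker informative features as well.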

3-3-Data
An anonymized dataset from the 2018-2019 academic year provided by the Directorate-General of Statistics for Education and Science of the Portuguese Ministry of Education was used. The data include virtually every student attending a public upper-secondary year in Portugal. We include data from the mathematics and mother tongue (Portuguese) national exams used in Portugal for admission to tertiary education. The dataset includes the grades of 19,445 unique students who took 35,780 exams (14,207 for mathematics and 21,573 for Portuguese). We created three target variables measuring AA: the mathematics exam grade, the Portuguese exam grade, and an aggregate grade that combines the two.

4-Results
A parameter tuning phase was applied to all the data science methods so that the learning phase would yield better results. The parameters tested to tune the NNs were the hidden layers, the activation function, the solver, and the learning rate, using the grid search with cross-validation provided by scikit-learn to choose the optimal configuration. The models were then trained using the chosen parameters. Repeated ten-fold cross-validation was implemented to assess each technique's performance and to prevent overfitting. The performance of each method was assessed using the mean R-squared across all iterations for each model, i.e., across every fold of the cross-validation.
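A minimal sketch of this tuning and assessment workflow with scikit-learn follows. It is not the configuration used in this work: the synthetic data, the reduced parameter grid (hidden layer size and activation only, with the solver fixed to keep the run short), the five-fold split inside the grid search, and the two repetitions of the ten-fold assessment are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 150 samples, 4 predictors.
X = rng.normal(size=(150, 4))
y = 10 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * rng.normal(size=150)

pipe = make_pipeline(
    StandardScaler(),
    MLPRegressor(solver="lbfgs", max_iter=1000, random_state=0),
)

# Reduced grid for illustration; solver and learning rate could be added
# as further keys ("mlpregressor__solver", "mlpregressor__learning_rate_init").
param_grid = {
    "mlpregressor__hidden_layer_sizes": [(4,), (8,)],
    "mlpregressor__activation": ["tanh", "relu"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print("chosen parameters:", search.best_params_)

# Repeated ten-fold cross-validation of the tuned model, scored with R-squared.
cv = RepeatedKFold(n_splits=10, n_repeats=2, random_state=0)
scores = cross_val_score(search.best_estimator_, X, y, cv=cv, scoring="r2")
mean_r2 = float(scores.mean())
print(f"mean R-squared over {len(scores)} folds: {mean_r2:.3f}")
```

Each of the 20 scores comes from a model refitted on nine tenths of the data and evaluated on the held-out tenth, so the mean is an out-of-sample estimate.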

4-1-Feature Importance
To examine the features (i.e., AA drivers) included, we started by looking at the importance of the top 10 features of the best models. Feature importance was assessed with the permutation-based method available in scikit-learn, which randomly shuffles each feature and computes the resulting change in the model's performance. Features whose shuffling significantly degrades the model's performance are considered the most important (see Table 1).
As mentioned above, one of the main contributions of this work is that it provides an exact measure of how much each driver impacts AA. In this sense, while feature importance provides a measure regarding the importance of each feature, it does not show the extent to which each variable impacts the grade. In other words, although it allows us to have an ordinal measure (rank) of importance, it is not particularly useful for understanding how much AA is affected, and even less so for designing policies to foster AA. This limitation of data science methods is commonly known as the "black-box effect", which is why researchers sometimes prefer to use less powerful but more interpretable methods, as discussed in Subsection 2.2.
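The shuffling procedure described above corresponds to what scikit-learn exposes as `permutation_importance`. The sketch below shows how it could be called; the synthetic data, the small network, the train/test split, and the number of repetitions are assumptions for illustration only.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): feature 0 is strong, feature 1 is weaker,
# features 2 and 3 do not affect the target.
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.3 * rng.normal(size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=1000, random_state=0),
).fit(X_train, y_train)

# Each feature is shuffled n_repeats times; the importance is the mean drop in R-squared.
result = permutation_importance(
    model, X_test, y_test, scoring="r2", n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]
for j in ranking:
    print(f"feature {j}: mean R-squared drop = {result.importances_mean[j]:.3f}")
```

The output is a ranking only: it says which features the model relies on most, not by how many grade points a change in a feature moves the prediction.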
To mitigate this limitation, we use a new approach, creating a dataset made of "prototypes" to understand the importance of the variables in the neural networks' scoring process. A prototype dataset was built for each target (Portuguese, mathematics, and aggregated grades). The baseline is given by a prototype (fictional student) for whom all variables have the average dataset value. Then, for each variable in the dataset, we add two rows (prototypes): one with the variable one standard deviation above the mean, the other with the variable one standard deviation below. In both cases, all other variables are fixed at their average dataset values. For binary variables, the rows are added with the variable set to one or zero. As an example, to quantify the effect of age on AA, we add two prototypes with average values for all variables except age, which is set to one standard deviation below the mean in the first and one standard deviation above the mean in the second. For gender, as another example, we also created two prototypes (i.e., rows), one with gender=0 and the other with gender=1. In both cases, the baseline is given by a row with average values for all variables. The score, i.e., the predicted grade that the neural network assigns to each prototype, is then compared with that of the baseline. This difference gives us the weight that each AA driver has on the students' grades (see Table 1).
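The construction of the prototype dataset can be sketched as follows. This is an illustrative reading of the description above, not the code used in this work: the helper `build_prototypes`, the synthetic students, the coefficients used to generate the grades, and the small network are all hypothetical. The three column names are borrowed from the variable list of Table 1 purely as labels.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_prototypes(X, names, binary):
    """Baseline row (all means) plus two rows per variable (hypothetical helper).

    Continuous variables are set to mean -/+ one standard deviation,
    binary variables to 0 and 1; all other variables stay at their mean.
    """
    mean, sd = X.mean(axis=0), X.std(axis=0)
    rows, labels = [mean.copy()], ["baseline"]
    for j, name in enumerate(names):
        low, high = (0.0, 1.0) if name in binary else (mean[j] - sd[j], mean[j] + sd[j])
        for tag, value in (("low", low), ("high", high)):
            row = mean.copy()
            row[j] = value
            rows.append(row)
            labels.append(f"{name}:{tag}")
    return np.vstack(rows), labels


rng = np.random.default_rng(0)
n = 500
names = ["Stu_Age", "LG_Educ", "Stu_Fem"]
age = rng.normal(17.0, 1.0, n)
lg_educ = rng.normal(10.0, 4.0, n)
female = rng.integers(0, 2, n).astype(float)
X = np.column_stack([age, lg_educ, female])
# Synthetic grades on a 0-20 scale; the coefficients are made up for the example.
y = 12.0 - 1.0 * (age - 17.0) + 0.2 * (lg_educ - 10.0) + 0.5 * female + rng.normal(0, 0.5, n)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=2000, random_state=0),
).fit(X, y)

prototypes, labels = build_prototypes(X, names, binary={"Stu_Fem"})
scores = model.predict(prototypes)
# Effect of each prototype = predicted grade minus the baseline's predicted grade.
effects = {label: float(s - scores[0]) for label, s in zip(labels[1:], scores[1:])}
for label, effect in effects.items():
    print(f"{label:>14}: {effect:+.2f} points vs. baseline")
```

Because every prototype differs from the baseline in a single variable, each difference can be read directly in grade points, which is what a ranking of feature importance cannot provide.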
Note: Rate of students who failed the year in that school (Sch_FailR), student's age (Stu_Age), education years the legal guardian completed (LG_Educ), rate of students with school social support (Sch_SocSup), rate of teachers with MSc or PhD at the school (Sch_ProfDeg), the student being female (Stu_Fem), number of enrolments that the student attempted (Stu_EnrolNum), students having the highest level of social support (Stu_SocSup), the legal guardian being the mother (LG_mother), the father (LG_father), or being the student himself (LG_own), teachers holding an MSc or a PhD (Prof_MscPhD), schools offering elementary and upper-secondary (Sch_Elem&High), and student's access to the Internet (Stu_Net).

5-1-Discussion of Findings
We assessed the impact of several potential AA drivers in Portuguese public upper-secondary schools. To the best of our knowledge, this work is the first, at least in this research stream, to use prototypes to quantify the impact of each feature on a specific target, i.e., AA drivers on AA. Hence, we bring to sophisticated data science methods some advantages of traditional techniques, i.e., interpretability. Besides the methodological aspect, in this work we used data from the whole national reality and not data based on a sample, which is always a partial representation of a larger reality. We found that the three most crucial AA drivers are student age, the rate of failing students at the school, and the legal guardians' education. More importantly, we can indicate exactly by how much each driver affects mother tongue (PT) and mathematics (MAT) national exams.
Students' age is a proxy for a driver of utmost importance in AA research, i.e., previous retention, as students who are older than the class average have typically been retained before. We demonstrate that older students perform more poorly in terms of AA, especially in Mathematics (MAT = −1.5; PT = −1.1). This point is absolutely critical, as it seems that preventing a student from progressing in his/her school path does not prevent future retentions, perhaps the contrary. Moreover, it is known that students who experience retention also suffer from a psychosocial point of view. Although previous retention is a fairly well-known AA driver, in this case an inhibitor, our study sheds further light on this finding, as it indicates that retention is even more important for Mathematics. Besides the impact on individual students, it should be noted that the failing rate at the school level is also among the most important (negative) AA drivers we found (MAT = −0.8; PT = −0.5). Hence, one may question the need to revise the process of failing students across all school years from a pedagogical and policy point of view. It is interesting to notice that the effect of retention is coherent at the individual (student) level and at the school level, as in both cases it is particularly harmful for mathematics. We should note that we found evidence that retention has deleterious effects on AA both for the failing student's progress and for the entire school community.
Our results also indicate that the legal guardians' education level is among the most critical AA drivers. The legal guardian's education has a strong positive effect on exam results, as students whose legal guardians have more than 12 years of education (tertiary education) outperform their peers by as much as 1.5 points out of 20. This driver is especially significant in Mathematics (MAT = 1.5; PT = 0.9), an important result to consider as there is a high and rising demand for STEM (science, technology, engineering, and mathematics) skills in the job market [65]. It seems that difficulties in Mathematics are easier to overcome when the legal guardians have tertiary education. Likewise, when this is not the case, a student who lags behind in this subject is significantly penalized, especially considering the importance of mathematics to the job market and society [66]. Hence, we posit that enabling teachers to know the degree of education of the legal guardians may be an effective tool to prevent poor AA. We believe this finding is, perhaps, one of the most important of our study. The fact that the education of legal guardians (often parents) is a critical AA driver means that education can promote social mobility but can also inhibit it. A student whose legal guardian has a low level of education is also more likely to underperform in terms of AA, thus creating a snowball effect for further generations. As with previous retention, our study indicates this is especially true for mathematics.
School-related characteristics are also relevant in the AA context. Starting with the schools' sizes, our results show that in bigger schools, where middle and upper-secondary students are together, upper-secondary exam scores tend to be lower (MAT = −0.1; PT = −0.2), in line with previous research [67]. Schools with higher rates of underprivileged students (e.g., receiving government support) are more likely to present lower levels of AA (MAT = −0.1; PT = −0.2). Likewise, schools with a higher rate of teachers holding an MSc or PhD tend to present slightly higher AA levels, although only in Mathematics (MAT = 0.5). In this context, it is worth mentioning that in Portugal, (public) school enrolment is determined by students' home addresses because students should be as close to home as possible. This criterion is a disadvantage for those living in underprivileged neighborhoods, creating a snowball effect that is challenging to escape. One may question if it would make sense to develop specific learning programs in these locations to prevent the formation of underprivileged clusters.
Additional findings of this study show other relevant AA drivers. We find that girls tend to outperform boys, especially in Portuguese (MAT = 0.2; PT = 0.6). Previous studies also report this link [29]. The use of the internet seems to be a positive driver of AA, albeit with a small impact. Support for this link between the internet and AA is far from unanimous in the literature. Some authors find that the internet positively affects AA [32], while others report an adverse effect [68].
It is interesting to note that education itself plays a decisive role in AA. Note that the legal guardians' and teachers' education are among the leading AA drivers. Hence, students with more educated legal guardians and teachers are also more likely to perform better at school. It is therefore absolutely critical that education stakeholders (researchers, teachers, and policymakers) try to avoid a contagion effect in which underprivileged students will also be legal guardians of underprivileged students in the future. Because higher education levels lead to more personal development and improved life conditions (i.e., a better job, higher socioeconomic status, and cultural capital), and because legal guardians in these conditions also help their children succeed at school, promoting education is understandably the path for decision-makers in the education field to follow.

5-2-Theoretical Implications
Our results contribute to the literature by using sophisticated methods to understand AA at the upper-secondary level. We develop an innovative approach based on prototypes to quantify how much each driver influences AA, comparing the results between what are arguably the two most essential subjects in upper-secondary: mathematics and mother tongue.
Our results are mainly in line with previous research on this topic, as they indicate that the effects of school retention on AA itself are mainly adverse [69]. Our results demonstrate that older (previously retained) students perform more poorly, with a greater detriment in mathematics. Anderson et al. [24] also posit that retention is a strong predictor of upper-secondary school dropout, mainly because of the absence of effective strategies to increase competencies and because of peer stigmatization that may exacerbate behavioral and socioemotional adjustment problems. Any short-term positive academic results from retention tend to fade, and failing students fall further behind their promoted counterparts within a few years [27]. Retention is also related to lower academic self-concept and higher maladaptive motivation and absenteeism [23].
We also show that upper-secondary AA is affected by legal guardians' education, as students whose legal guardians have tertiary education achieve better results in final exams, particularly in mathematics. Although the literature shows evidence of the positive impact of legal guardians' education level [36], identifying a university or post-secondary degree as one of the most significant factors enhancing student AA is an important and novel finding. According to previous research, legal guardians' education seems to be a driver of AA across all levels, evidencing positive effects from as early as kindergarten [70]. Highly educated legal guardians are typically more involved and supportive in school matters, and they can help students in a way that parents with less cultural capital cannot, for instance by engaging in academic discussions with their children (Tan et al., 2019). Hence, students' home contexts play a decisive role in school experiences and AA. This dynamic has been studied both at the classroom level, where a higher education level among peers' parents improves AA [71], and at the school level, as in middle school, where attending a school with a higher proportion of students from educationally disadvantaged families harms AA [38].

5-3-Implications for Practice
Besides its theoretical implications, this study also helps shape education practices. By precisely quantifying the impact of each driver on AA, we provide meaningful insights for education policymakers and practitioners to implement substantial adjustments and improve school success. The first practical implication is related to the retention criteria and process. We believe these could be revisited, as school retention has detrimental effects on AA. Interestingly, retention policies have recently been under debate in Portugal. Retention rates have been falling, but our results suggest that further work is needed at the student level (specialized tutoring at school for students at risk of failing, for example) and at the institutional level, namely changing retention criteria. Eurydice [72] found a prevailing culture of retention in some European countries. In countries where this culture exists, the dominant belief is that repetition benefits learning, even though recent research and results show the contrary. Some teachers still share this belief, along with school communities and legal guardians, even though it has been repeatedly called into question. Therefore, the challenge may reside more in questioning such assumptions before implementing legislative changes.
Concerning legal guardians' education (a main positive driver of AA found in this study), some measures can be suggested. One is to consider the education of the legal guardian when assigning students to classes, creating heterogeneous groups that integrate students with low- and high-educated parents. Another is to provide teachers with information about parents' education level at the beginning of each academic year, helping to flag students who may need extra support in class, given the strong impact of parental education on exam grades.
The teacher education level (undergraduate degree vs. MSc or PhD) is also a promising positive driver of AA, so policymakers could facilitate career opportunities for teachers who pursue post-graduate education and motivate schools to hire post-graduate teachers. Policies at the school level can also be considered: our results show that internet use can help boost AA, and therefore schools could implement effective digital reinforcement, giving all students internet access, ensuring equality of access, and minimizing any prior disadvantages (e.g., students who do not have a computer or internet access at home). Likewise, because our results also show lower exam scores in schools with higher rates of students who receive an allowance, policymakers may want to implement measures to minimize the effects of living in socially and economically disadvantaged territories.

6-Conclusion
This study allowed us to understand the primary drivers of AA in upper-secondary public schools in Portugal. Analyzing data from virtually all Portuguese public upper-secondary schools, we verified the negative effect of previous retention on AA and, more importantly, identified new AA drivers, such as legal guardians' tertiary degrees and schools with highly educated teachers. AA has a contagion effect, as parents with university education and teachers with higher degrees (MSc or PhD) are positive AA drivers. Higher education levels lead to more personal development and improved life conditions (i.e., a better job, higher socioeconomic status, and more cultural capital) and seemingly equip parents to better help their children succeed at school. Our results highlight the importance of promoting AA. They provide valuable insights for theory, by identifying new determinants and pinpointing their effect, and for practice, by helping to define the most effective strategies to promote success, raising compulsory graduation rates and increasing university enrolments and graduates.

6-1-Limitations and Future Work
As with any other work of this nature, ours has some limitations that need to be acknowledged. First, although we use data from virtually every upper-secondary student attending a public school in Portugal, our data pertain to only one moment in time (a single academic year). Hence, changes in AA drivers over time may be partially overlooked. The second limitation is also related to the data: some potentially important AA drivers have not been included in our study. There is a trade-off between the depth and breadth of the data, i.e., the more observations we include, the fewer the available variables. Because we use data from the Portuguese Ministry of Education, we were not able to collect specific variables of interest, which could be collected only through surveys and, therefore, through samples. Future research should aim to include multiple years and seek to expand the types of variables used (e.g., sleeping habits or personality traits).

7-2-Data Availability Statement
Data were obtained from DGEEC-Direção Geral de Estatísticas da Educação e da Ciência and are available from the authors upon reasonable request with the permission of DGEEC-Direção Geral de Estatísticas da Educação e da Ciência.

7-4-Acknowledgements
We also gratefully acknowledge financial support from FCT-Fundação para a Ciência e a Tecnologia (Portugal), national funding through research grant Information Management Research Center-MagIC/NOVA IMS (UIDB/04152/2020).