Predicting Academic Performance with Artificial Intelligence (AI), a new tool for teachers and students
Abstract: Learning Analytics (LA) is an analysis toolset that enables the collection and study of students’ data and context data. In this study, artificial intelligence (AI) algorithms such as K-Nearest Neighbors and Random Forest were used to train a model that could predict the academic success of undergraduate engineering students. One finding of this study is that, although the prediction was not correct for every student, it gave a general picture of the performance of the group. This allowed the instructor to adapt their teaching technique to get better results. Finally, most students agree to take advantage of LA, and they think that knowing their predictive results at the beginning of the course will help them do better in class.

With this predictive-algorithmic base, we decided to replicate the study with Physics I and extend the predictions to Physics II courses for engineering students, taught by five different instructors. Thus, the objective of the research was to determine the predictive scope of the algorithm, as well as the perceptions of teachers and students about Artificial Intelligence (AI) in the educational field.
III. METHODOLOGY
For training the predictive model, the algorithm was fed with
the grades and log of the previous course of each instructor,
as well as the photographs of those past students. Once the
model was trained, the photographs of current students were
used as the only input to make the prediction for the first
evaluation period (we did not have any grades for these
students at the beginning of the course). The algorithms used
were K-Nearest Neighbors and Random Forest. For the second period, three predictions were made: one using photographs as the only input, another using photographs and the grades from the first evaluation period, and the third using only the grades from the first evaluation period.

To predict the grade of the second evaluation period, only Random Forest was used. To evaluate the accuracy of the forecasts, two error measures were used, and a very basic reference forecast was constructed for purposes of comparison. The error measures used were Mean Absolute Deviation (MAD) and Mean Absolute Percentage Error (MAPE). Equations 1 and 2 show how these measures are calculated, where n is the number of forecasts.

$$\mathrm{MAD} = \frac{1}{n} \sum \left| \mathrm{Actual} - \mathrm{Forecast} \right| \tag{1}$$

$$\mathrm{MAPE} = 100 \cdot \frac{1}{n} \sum \frac{\left| \mathrm{Actual} - \mathrm{Forecast} \right|}{\mathrm{Actual}} \tag{2}$$

Fig. 2. Predictive and real results of the first evaluation period. Photograph was the only input. The Physics I instructors are 1 and 2, the rest are from Physics II (blue = forecast, orange = real).

Even though only the photograph was used to make the forecast, it is worth noting that the difference between the real and the predicted results is small, particularly for instructors 3 and 4.
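As a minimal sketch of Equations 1 and 2, the two error measures can be computed as shown below. The function names and the sample grades are hypothetical and are not taken from the study; grades are assumed to be strictly positive so that the MAPE denominator is never zero.

```python
def mad(actual, forecast):
    """Mean Absolute Deviation (Equation 1): mean of |actual - forecast|."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error (Equation 2), expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        # Assumption: grades are strictly positive, as Equation 2 divides by Actual.
        raise ValueError("actual values must be strictly positive")
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical grades for four students (illustration only).
actual_grades = [80, 90, 70, 100]
forecast_grades = [85, 80, 77, 95]

print("MAD:", mad(actual_grades, forecast_grades))
print("MAPE (%):", round(mape(actual_grades, forecast_grades), 2))
```

MAD is expressed in grade points, while MAPE is relative to each student's actual grade, which makes it easier to compare across classes.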
B. Adaptive actions of instructors

Fig. 3. Summary of interviews with instructors. The Physics I instructors are 1 and 2, the rest are from Physics II.

The design of the course is the same for all instructors; however, each of them decided to work with different resources to adapt their course, adding their own personal style. For example, Instructor 2 preferred to project motivational videos to engage students, while Instructor 5 chose to seat students strategically to improve class attention.

C. Predictive results for second period through AI

After we had the first evaluation period results, a second forecast was made. Three forecasts were generated according to the type of input: using only the photograph, using the photograph and the grades of the first period, and using only the grades of the first period (Figure 4).

The best forecasts using the photograph and the first period grades were those of Instructor 1 and Instructor 3, while for Instructors 2, 4 and 5, the best forecast was generated using only the first period grades. Comparing the forecasts for the first and second period, the gap between forecast and real results is smaller in the second period, particularly in the results of instructors 2, 3 and 4.

D. Forecast validation

To validate the quality of the forecasts generated by the algorithm, a reference forecast (also called naïve forecast) was calculated as a basis for comparison [14]. This forecast is very simple, as the prediction for each student is simply the average of the forecasts of all the students with the same instructor. This reference forecast can also be called the do-nothing forecast because it does not differentiate between students in the same class.
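The reference forecast described in the forecast validation section can be sketched as follows, assuming each student's model forecast is stored alongside an instructor label. The function name, the labels, and the sample values are hypothetical; only the rule itself (every student receives the average forecast of their instructor's class) comes from the text.

```python
from collections import defaultdict


def reference_forecast(forecasts, instructors):
    """Give each student the mean model forecast of their instructor's class."""
    if len(forecasts) != len(instructors):
        raise ValueError("forecasts and instructors must have equal length")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, instructor in zip(forecasts, instructors):
        totals[instructor] += value
        counts[instructor] += 1
    # Every student in the same class gets the same value (the "do-nothing" forecast).
    return [totals[i] / counts[i] for i in instructors]


# Hypothetical model forecasts for four students in two classes.
model_forecasts = [80.0, 90.0, 70.0, 60.0]
class_labels = ["Instructor 1", "Instructor 1", "Instructor 2", "Instructor 2"]

print(reference_forecast(model_forecasts, class_labels))
```

Because the reference forecast is constant within a class, any error reduction achieved by the algorithm over it reflects how well the algorithm separates students in the same class.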
Figures 5 and 6 show the Mean Absolute Percentage Error of
the forecast for the second period generated by the algorithm
and the reference forecast. Random Forest forecasts had an
average error of 16.4%, while the reference forecast had an
average error of 17.8%. As expected, the reference forecast
has the higher average error for each of the 5 instructors when
we use the grades as the only input. However, the difference is
very slight (especially for Instructor 3, where the difference of
0.1% is hard to see in Figure 5). Surprisingly, when we use
the grades and photographs as input (Figure 6), the difference
between both forecasts is less obvious.
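To see how slight the gap is, the two average errors reported above can be compared in relative terms. This is a simple calculation derived only from the reported 16.4% and 17.8%, not an additional result of the study:

$$\frac{17.8 - 16.4}{17.8} = \frac{1.4}{17.8} \approx 0.079$$

That is, the Random Forest forecast lowers the average MAPE of the reference forecast by 1.4 percentage points, or roughly 8% in relative terms.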
Fig. 5. Forecast Mean Absolute Percentage Error (MAPE) by instructor (Instructors 1 to 5 and TOTAL). Input: first period grade.

Fig. 6. Forecast Mean Absolute Percentage Error (MAPE) by instructor (Instructors 1 to 5 and TOTAL). Input: first period grade and photograph.

During the second round of interviews, the teachers were asked about the usefulness of AI in their lessons. The teachers agreed that the algorithm might be able to predict the overall performance level of the group. Besides, they asserted that knowing this information before each evaluation period allowed them to adjust their teaching strategies to improve student performance. Nonetheless, four of the five teachers agreed to use the predictive algorithm as a teaching tool.

At the end of the program, the undergraduate students voluntarily answered a questionnaire to learn their own prediction and to give their opinion about the potential use of the algorithm as a teaching tool. A total of 45 students submitted their answers. The individual forecasts show that around 80% of the undergraduates agreed or strongly agreed with the assignment scores, the quizzes and the mid-term exam marks, which were in line with what they expected.

On the other hand, when they were asked about the chance of using AI to predict their final course exams, they declared that they would like to know the prediction of their own academic performance in their courses (see Figure 7).

It is interesting that, as shown in Figure 8, most students are eager to use predictive algorithms with the aim of improving or having greater control of their grades (red bar). At the same time, other students showed some resistance (purple bar). Some notable opinions of the students were the following:

“It could allow planning and detecting on time what are your weak topics, and you will know what grade you need to stay away from the risk of losing a scholarship”

“It would be interesting, but it could also be a little terrifying”
V. DISCUSSION

Predictive algorithms based on decision trees can offer a close approximation to the undergraduate students’ performance in the classroom, and the approximation improves as the algorithm is fed with more data. This can be seen in Figures 2 and 4, where the trend between mid-term and final evaluation is similar. This is in line with the findings of [15,16], who demonstrate that the students’ final academic performance can be predicted more accurately once a third of the semester has passed. Therefore, AI-based predictive algorithms could become a pedagogical tool to optimize adaptive learning in the classroom.

In this study, both teachers and students agree to use this type of technology as a tool, to improve teaching practice and academic performance, respectively. However, although today’s technology can support vast data processing, unlike the 1970s when AI began [6,7], there are serious limitations in its use due to data protection law and ethical aspects that strongly advise against using facial recognition, for instance, as an input for these algorithms [17].

In this regard, it is encouraging that three of the five predictions were more accurate using the student’s academic records alone. Therefore, colleges and universities could facilitate access to a greater amount of data and thereby improve the predictive power of algorithms by covering more possibilities and reducing the margin of error. In other words, if the algorithm is trained on students’ academic history, such as recorded homework, quizzes and exams, it could be possible to accurately predict that those with a good track record will get good grades. However, it might happen otherwise, as factors such as the teacher’s pedagogical style, students’ study habits or personal circumstances may influence the real results. For this reason, feeding the algorithm with a greater amount of data encompasses a greater number of possibilities for future teaching. For example, it would ease teachers’ work by forgoing initial evaluations of students’ aptitude or by spotting students with difficulties in the course of the teaching program.

VI. CONCLUSIONS

Based on the results of the research, the algorithm provided a forecast of the performance of each group in general, which makes it a potential resource for the instructor with respect to the design of adaptive routes. It is expected that the predictions made by the algorithm will improve even more once we have the final grades.

At the same time, it is expected that the forecasts will gain accuracy as the model is fed a greater amount of data. However, in face-to-face courses there are legal limitations on the use of personal data, which hinder obtaining a greater amount of data in a timely manner.

The forecasts of three out of five instructors had a smaller margin of error using only the academic information of the student, without the need for the face picture as an input. Even though students agreed with the use of their facial information for research purposes, ethical questions can still arise around this issue. Thus, the possibility of making the predictions without using facial recognition is left open.

AI has a lot of potential, especially for teaching assessment, as it enables teachers to have large amounts of data on each student without devoting too much time to the collection process. The combination of the automatic information given by the forecast algorithms with the teachers’ decision-making processes might have a strong impact on the learning of future generations of students.

Nonetheless, there are a few limitations on the use of AI in education. First, it is still an emerging technology that needs to be further developed, since there is room for miscalculations. Second, human behavior is complex and intricate, depending on multiple variables, and is therefore not easy to predict. Personal and social factors still need to be taken into account by the teachers.

VII. REFERENCES

[1] B. T. M. Wong, “Learning analytics in higher education: an analysis of case studies,” Asian Assoc. Open Univ. J., vol. 12, no. 1, pp. 21–40, 2017.
[2] C. Vieira, P. Parsons, and V. Byrd, “Visual learning analytics of educational data: A systematic literature review and research agenda,” Comput. Educ., vol. 122, no. March, pp. 119–135, 2018.
[3] V.K. Ayyadevara, “Random Forest”. In: Pro Machine Learning Algorithms. Apress, Berkeley, CA, 2018.
[4] G. Chirici, M. Mura, D. McInerney, E.O. Tomppo, L.T. Waser, D. Travaglini, R.E. McRoberts, “A meta-analysis and review of the literature on the k-Nearest Neighbors technique for forestry applications that use remotely sensed data”. Remote Sensing of Environment. Elsevier Inc, 2016. https://doi.org/10.1016/j.rse.2016.02.
[5] O. Olmos, M. Hernández, E. Avilés, I. Treviño, “Optimal Paths for academic performance supported by artificial intelligence”. Conference Proceedings of the 6th International Conference on Educational Innovation, CIIE 2018. Monterrey, Mexico, 2018.
[6] EduTrends, “Aprendizaje y evaluación adaptativos (adaptive learning and evaluation)”. Observatorio de Innovación Educativa del Tecnológico de Monterrey, July 2014.
[7] T. Smith, “How adaptive learning really works”. Tech & Learning, 37(3), 20–26, 2016.
[8] D. Gašević, S. Dawson, G. Siemens, “Let’s not forget: Learning analytics are about learning”. TechTrends, 59(1), 2015. https://doi.org/10.1007/s11528-014-0822-x
[9] Y. Koren, “The BellKor solution to the Netflix Grand Prize”. Netflix Prize Documentation, 1–10. August 2009. https://doi.org/10.1.1.162.2118
[10] T. Havens, “Netflix”. In: From Networks to Netflix, pp. 321–331. Routledge, 2019. https://doi.org/10.4324/9781315658643-30
[11] Y. Chen, X. Li, J. Liu, Z. Ying, “Recommendation System for Adaptive Learning”. Applied Psychological Measurement, 42(1), 24–41, 2018. https://doi.org/10.1177/0146621617697959
[12] M. Castañer, O. Camerino, M.T. Anguera, “Métodos mixtos en la investigación de las ciencias de la actividad física y el deporte”. Apuntes Educación Física y Deportes 112, 31–36, 2013.
[13] J.W. Creswell, “A concise introduction to mixed methods research”. SAGE, Thousand Oaks, 2015.
[14] M. Gilliland, “The Business Forecasting Deal”. John Wiley and Sons, New Jersey, USA, 2010.
[15] O. H. T. Lu, A.Y.Q. Huang, J.C.H. Huang, A.J.Q. Lin, H. Ogata, S.J.H. Yang, “Applying learning analytics for the early prediction of students’ academic performance in blended learning”. Educational Technology and Society, 21(2), 220–232, 2018. https://doi.org/10.2307/26388400
[16] D. T. Tempelaar, B. Rienties, and B. Giesbers, “In search for the most informative data for feedback generation: Learning analytics in a data-rich context,” Comput. Human Behav., vol. 47, pp. 157–167, 2015.
[17] S. Agarwal and D. P. Mukherjee, “Facial expression recognition through adaptive learning of local motion descriptor,” Multimed. Tools Appl., vol. 76, no. 1, pp. 1073–1099, 2017.