2015-Student Performance Prediction Using Machine Learning
ISSN: 2278-0181
Vol. 4 Issue 03, March-2015
B. Features

Bhardwaj and Pal [2] conducted a study on student performance by selecting 300 students from 5 different degree colleges conducting the BCA (Bachelor of Computer Application) course of Dr. R. M. L. Awadh University, Faizabad, India. Using the Bayesian classification method on 17 attributes, they found that factors such as students' grade in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income and family status were highly correlated with student academic performance.

In that study, variables whose probability values were greater than 0.70 were given due consideration, and the highly influencing variables with high probability values are shown in Table 1. These features were used for prediction model construction. For both variable selection and prediction model construction, the authors used MATLAB.

From the table, the variable with the second highest potential for students' performance is their living location, and the third is the medium of teaching. In Uttar Pradesh the mother tongue of students is Hindi. Hence, students tend to be more comfortable in Hindi and other languages than in English.

C. Uniqueness

The study conducted by Erkan Er [3] proved valuable in confirming the uniqueness of the proposed application. His work concluded that all current applications of machine learning in an academic setting were to predict dropout rates in a distance learning program. There is perhaps no application that attempts to predict the absolute performance of the student. If one does exist, it has not been published yet.

D. Inference

We analyzed the experiments and results of the aforementioned studies and drew two prominent inferences. The first is that Naive Bayes classification proves to be an excellent algorithm for predicting student performance in an academic setting, with neural networks a worthy contender for the same task. The second is that several factors contribute to a student's performance, apart from previous academic performance.

of this student. Octave was used for test purposes. The marks of 80 B.E. I.T (Bachelor of Engineering, Information Technology) students from semester 3 to semester 6 were used. The algorithm was trained on a training set of 60 students and tested on a cross-validation set of 10 students, to predict marks in 6 subjects. This was done 7 times, varying the training and test sets each time (k-fold cross validation). A prediction within plus or minus 8 marks of the actual value was considered accurate. The error statistics were as follows:

Average error = 6
Accurate = 296
Erroneous = 124
Accuracy Rate = 70.48%

Once it was confirmed that the data conforms well to a machine learning algorithm, we conducted a comparative study of neural networks and Bayesian classification, on the basis of varying training and test sets. The results were fairly surprising. In general, the neural networks tended to outperform Bayesian classification. This is somewhat justified once one realizes that the input provided to the algorithm was on a continuous range, and Bayesian classification traditionally requires discrete data.

Finally, an application was made that employed neural networks (Figure 2). The application reads data from and writes data to .csv (Comma Separated Values) files. When a prediction is required, it dynamically trains a network of 3 layers and provides a prediction of marks in discrete classes of 20 marks.

The training dataset size was increased in increments of 10, starting from 40, for 17 subjects. The test set was of 10 students, to predict a single subject. The accuracy results are summarized in Table 2.
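
The tolerance-based accuracy figure and the 20-mark prediction classes described in this section can be illustrated with a minimal Python sketch. It is not the authors' Octave code. The function names `tolerance_accuracy` and `marks_to_class`, the inclusive treatment of an error of exactly 8 marks, and the class boundaries (0-19, 20-39, ..., 80-100) are assumptions made for illustration. The last lines only check the reported arithmetic: 7 folds of 10 students in 6 subjects give 420 predictions, of which 296 were counted as accurate.

```python
def tolerance_accuracy(predicted, actual, tolerance=8):
    """Score predictions as accurate when they fall within +/- tolerance marks.

    Returns (accurate, erroneous, accuracy_rate, average_error).
    Assumption: an error of exactly `tolerance` marks counts as accurate.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    accurate = sum(1 for e in errors if e <= tolerance)
    erroneous = len(errors) - accurate
    return accurate, erroneous, accurate / len(errors), sum(errors) / len(errors)


def marks_to_class(marks, width=20, max_marks=100):
    """Map marks to a discrete class index of `width` marks.

    Assumption: classes are 0-19, 20-39, ..., with max_marks folded
    into the top class so that 100 does not form a class of its own.
    """
    top_class = max_marks // width - 1
    return min(int(marks) // width, top_class)


# Arithmetic check of the figures reported in the text.
total_predictions = 7 * 10 * 6             # folds x test students x subjects
reported_rate = 100 * 296 / total_predictions
print(total_predictions, round(reported_rate, 2))
```

For example, `tolerance_accuracy([50, 60, 70, 90], [55, 75, 62, 90])` counts three of the four predictions as accurate, because only the second differs from the actual mark by more than 8.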