review-on-predicting-student-academic-performance-using-data-mining-classification-algorithm-Rwuc
Journal of Computer Engineering & Information Technology
Mini Review
A SciTechnol journal
Review on Predicting Student Academic Performance using Data Mining Classification Algorithm

Wasyihun Sema Admass*

*Corresponding author: Wasyihun Sema Admass, Faculty of informatics and Department of information technology, University of Gondar, Gondar, Ethiopia, E-mail: [email protected]
Received: November 03, 2021 Accepted: November 17, 2021 Published: November 24, 2021
Citation: Admass WS (2021) Review on Predicting Student Academic Performance using Data Mining Classification Algorithm. J Comput Eng Inf Technol 10:11.
All articles published in Journal of Computer Engineering & Information Technology are the property of SciTechnol and are protected by copyright laws. Copyright © 2021, SciTechnol, All Rights Reserved.
International Publisher of Science, Technology and Medicine

Abstract
This paper reviews previous studies on predicting students’ performance with various analytical methods. Most researchers have used cumulative grade point average (CGPA) and internal assessment as data sets. Among prediction techniques, classification is the one most frequently used in educational data mining. Within classification, Neural Network and Decision Tree are the two methods most used by researchers for predicting students’ performance. In conclusion, this meta-analysis on predicting students’ performance has motivated us to carry out further research to be applied in our environment. It will help the educational system to monitor students’ performance in a systematic way.

Keywords
Student Performance, Prediction Technique, Data-Mining, Algorithms.

Introduction
The explanation and prediction of academic performance is widely researched, and the prediction of student performance is a topical debate in educational institutions. Research interest in applying data mining to education is increasing. Data mining techniques aim to discover knowledge from data and are used to uncover hidden or unknown information that is not apparent but potentially useful [1].

Data in educational institutions is growing rapidly, and researchers need to transform it into useful information and knowledge, so data mining techniques play a special role in extracting useful and hidden patterns from very large amounts of data. In education, educational data mining (EDM) has become an emerging area of research interest among scientists and researchers across the globe. EDM converts raw data from traditional and online education systems into important and useful information for educational institutes and research [2]. Different scholars have conducted research on predicting the academic performance of students. All of them agree that predicting students’ academic performance helps to identify the status of students as slow learner (poor), good learner (good), medium learner (average), very [...] to improve the system’s performance. Thus, the application of data mining techniques can be focused on specific needs with different entities. This systematic review sets out to answer the following questions:
• Which important attributes did the researchers focus on to predict students’ academic performance?
• What methods did different researchers use to predict students’ academic performance?
• What is proposed as future work in the reviewed articles?
• Which of the data mining algorithms were the most predictive of students’ academic performance?

Objectives
The objectives of this systematic review on the prediction of student academic performance using data mining classification techniques are the following:
• To identify the attributes used to predict the academic performance of students
• To identify the gaps in existing prediction work and indicate future work
• To identify the methods used in existing work to predict students’ performance.

Methodology
The reason for performing a meta-analysis style systematic review is to find suitable methods for existing parameters, to fill the gaps in existing research, and to place new research activity in a suitable context.
• Searching: A large number of articles have been published in the area of education under different titles. For a meta-analysis style review, searching by keyword is important for collecting multiple articles from different journals.

Usage of Data Mining to Predict Students’ Academic Performance
Many researchers have studied the prediction of students’ academic performance using different data mining techniques.

Surjeet Kumar Yadav and Saurabh Pal [3] conducted a study on 400 students to predict the academic performance of engineering students using decision tree algorithms (ID3, C4.5 and CART). The researchers use the past performance of students to predict whether a new student will perform or not, with the result predicted as pass or fail. The experiments are conducted to find the best classifier for predicting students’ performance in the first-year engineering exam. From the classifiers’ accuracy it is clear that the true positive rate of the model for the FAIL class is 0.786 for ID3 and
C4.5 decision trees, which means the model is successfully identifying the students who are likely to fail. These students can be considered for proper counselling to improve their results. The study also focuses on identifying students who need special attention.

Vrushali Mhetre and Prof. Mayura Nagar [4] focus on predicting academic performance as slow learner, fast learner and average learner. They apply various data mining techniques and compare accuracy based on student attributes. The work aims to identify the best feature selection and classification algorithms for distinguishing slow, average and fast learners in an education data set, and to find the best attributes by comparing the performance of various feature selection techniques when predicting learners with different classification algorithms such as Naïve Bayes, J48, ZeroR and Random Tree, using the WEKA tool. The idea is to identify slow learners, which helps faculty give special attention to individual students to improve their overall performance. The study finds that the Random Tree technique performs best, with an accuracy of 95.4545%, and identifies students who are slow learners, which provides a basis for deciding on special aid for them.

Sagardeep Roy and Anchal Garg [5] conducted research on predicting student academic performance using data mining techniques, with the goal of helping students improve their skills and of finding out what hinders students from achieving success and how to improve it. The study uses 32 student attributes with the Naïve Bayes classifier, J48 decision tree and MLP classification algorithms. The reported accuracies are 68.6% for Naïve Bayes, 73.92% for J48 and 51.13% for MLP, so J48 achieves the best accuracy of the three. The results identify the abilities of students, their interests and their weaknesses. Student performance can be influenced by different types of attributes, which can be social, demographic or school-related.

M. Mayilvaganan and D. Kalpanadevi [6] conducted research to predict student academic performance by classifying students as excellent, good, average or slow learners for diagnosis, using three main classification techniques: decision tree, Naïve Bayes and k-nearest neighbor. With the decision tree, the experiment shows 30% slow learners, 20% average learners, 40% good learners and 10% excellent learners. With the Naïve Bayes algorithm, which classifies on the basis of probabilistic inference, the performance analysis (Figure 1) shows 30% slow learners, 20% average learners, 40% good learners and 10% excellent learners. With the k-nearest neighbor algorithm, the result shows 45% slow learners, 10% average learners, 5% good learners and 40% excellent learners. From these experiments and the analysis of classification accuracy, k-nearest neighbor takes less time to classify student performance as excellent, good, average or slow learner. It has the best classification time compared with the other techniques, with examination results and other activities significantly affecting the rule set. This study is useful for identifying the proportion of slow learners so that failures can be addressed early and action taken to support weaker students.

V. Ramesh, P. Parkavi and K. Ramar [7] focus on identifying weak students so that they can be individually assisted by educators and perform better in the future. The study also investigates the accuracy of some classification techniques for predicting student performance. The researchers use four classification algorithms: Naïve Bayes, Multilayer Perceptron, J48 and REPTree. The experiments show that the multilayer perceptron (MLP) classifier is the most appropriate for predicting student performance, with a prediction accuracy of 72.38%, and the paper identifies the important school-related factors that affect student performance.

Sajadin Sembiring [8] conducted research on predicting student performance based on grade (GPA). The researcher grouped the grades into five groups (excellent, very good, good, average and poor) and categorized the value of each questionnaire item as high, medium or low. Two data mining techniques are used: SSVM and the kernel k-means clustering algorithm. The study covers a sample of 300 students, each described by ten parameters: five performance predictors proposed in the study and five demographic characteristics of the student. The experiments show that the average testing accuracy is lowest, at 61%, for predicting “good” performance
and highest, at 93.7%, for predicting “poor” performance. The results obtained are sufficient to show that the proposed rule model, which uses the study's predictors of student performance, is acceptable and good enough to serve as a predictor of student performance.

Ahmed Mueen [9] conducted research to achieve three basic objectives: the first was to predict student academic performance, the second was to reduce the number of attributes, and the last was to compare the classification accuracy of different classifiers. The researcher uses three classifiers to achieve these objectives: Naïve Bayes, Multilayer Perceptron (neural network) and C4.5 (J48). The experiments give an accuracy of 86% for Naïve Bayes, 82.7% for Multilayer Perceptron and 79.2% for the decision tree (J48). From this the researcher concludes that the Naïve Bayes classifier predicts student performance more accurately than the others. Finally, the researcher analyzed the data set to identify factors that cause students to lose their academic status because of academic performance. The study found that poor student performance was due to lack of participation in the online discussion forum.

Important Attributes used to Predict Student Performance
The systematic review helps to identify the important attributes used to predict student academic performance. Attributes that are used frequently and play a large role in predicting student academic performance are treated as important attributes (Table 1).

The most frequently used attributes are GPA and assessment. Researchers used GPA frequently, either directly or indirectly, to predict students’ academic performance. GPA is a good predictor because it is a tangible measure for future education and career mobility. CGPA is the most influential attribute in determining the survival of students in their studies, that is, whether they can complete their studies or not. In this review, assessment covers assignment marks, quizzes, lab work, class tests and attendance; all of these are grouped into one attribute called internal assessment. These attributes are the ones most used by researchers to predict students’ performance.

The next important attributes used to predict student academic performance are student demographic factors, which include gender, age, family background and disability. Researchers use demographic factors to identify which sex has a better attitude to learning and is more strategic in studying. Other attributes used by researchers are extracurricular activity and high school background. Several researchers have also used psychometric factors to predict students’ performance. Psychometric factors are identified as student interest, study behavior, engage time and family support. These attributes are used to make the system clear, simple and user friendly, and they help the lecturer to evaluate students’ achievement based on their personal interest and behavior. However, these attributes are rarely applied in predicting students’ performance because they rely more on qualitative data and it is hard to get valid data from respondents.

Prediction Methods used in Predicting Student Performance
Predictive modelling is used to predict student performance. In educational data mining, predictive modelling may involve different tasks such as classification, regression and categorization. The most popular task for predicting students’ performance is classification. The classification techniques used by the researchers are decision tree, artificial neural network, Naïve Bayes, k-nearest neighbor and support vector machine.

The specific applications of data mining techniques in predicting student performance, grouped by algorithm, are described in the following:

Decision Tree
Decision tree is the most widely used classification algorithm in data mining. Decision tree models are easily understood because of their reasoning process and can be directly converted into a set of IF-THEN rules. Six of the seven papers used decision tree algorithms (Table 2).

Naive Bayes
Naïve Bayes is the next option used by researchers to predict students’ academic performance. Five (5) of the seven (7) papers used the Naïve Bayes algorithm as the prediction method to estimate student performance. Table 3 shows the results predicted by the Naïve Bayes algorithm.

Neural Network
The next prediction method used by researchers to estimate student performance is the neural network, using the multilayer perceptron algorithm. Three of the seven papers used this technique. Table 4 shows the results estimated by neural network techniques.

K-Nearest Neighbor
Researchers have also used the k-nearest neighbor data mining algorithm as a prediction method. One of the seven papers used it. According to [6], the k-nearest neighbor algorithm classifies 45% of students as slow learners, 10% as average learners, 5% as good learners and 40% as excellent learners in the data set used.

REPTree, Random Tree and ZeroR
Some researchers also used these algorithms to predict student academic performance. According to [7], REPTree is used to estimate student performance based on demographic and psychometric attributes, with a predicted result of 60.13%. ZeroR is used by [4] and gives a result of 36.36%. Random Tree is also used by [4] and gives a result of 95.45% (Table 5).

Support Vector Machine and K-means Clustering
Sajadin Sembiring [8] used these two methods to estimate student performance. According to [8], after the data set is prepared it is passed to the k-means clustering algorithm. The number of clusters was set as an external parameter, and the data were grouped into five clusters. The researcher also used SVM as a prediction method; the average testing accuracy is lowest, at 61%, for predicting “good” performance and highest, at 93.7%, for predicting “poor” performance. The following graph indicates the best prediction methods used in this systematic review.
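The usage counts quoted for each algorithm (six of seven papers for decision trees, five for Naïve Bayes, three for neural networks, one for k-nearest neighbor) can be re-derived from the per-paper algorithm lists given in the literature summary. The following is a minimal Python sketch of that tally. The dictionary name is hypothetical, and grouping ID3, C4.5, CART and J48 under "decision tree" and MLP under "neural network" is an assumption made for counting purposes.

```python
from collections import Counter

# Algorithm families per reviewed paper, as listed in the literature summary.
# Assumption: ID3, C4.5, CART and J48 are grouped as "decision tree",
# and MLP is grouped as "neural network".
ALGORITHMS_BY_PAPER = {
    "[3]": {"decision tree"},
    "[4]": {"naive bayes", "decision tree", "zeror", "random tree"},
    "[5]": {"naive bayes", "decision tree", "neural network"},
    "[6]": {"decision tree", "naive bayes", "k-nearest neighbor"},
    "[7]": {"naive bayes", "neural network", "decision tree", "reptree"},
    "[8]": {"svm", "k-means"},
    "[9]": {"naive bayes", "neural network", "decision tree"},
}

# Count in how many papers each algorithm family appears.
usage = Counter(algo for algos in ALGORITHMS_BY_PAPER.values() for algo in algos)
total = len(ALGORITHMS_BY_PAPER)

for algo, count in usage.most_common():
    print(f"{algo}: {count} of {total} papers")
```

Under this grouping the tally agrees with the counts stated in the subsections above; a different grouping (for example, counting REPTree and Random Tree as decision trees) would leave the decision tree count unchanged, because the papers that use them also use J48.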
Attribute | Authors
Students Branch, Students grade in High School, Students grade in Senior Secondary, Medium of teaching, Living location of teaching, Student family size, Student family status, Family income, Family occupation, Result (Pass, Pro, Fail) | Surjeet Kumar Yadav and Saurabh Pal [1]
School, Type of Address, Parent's Cohabitation Status, family educational qualification, family employment type, Reason for opting a certain school, Time taken to travel to school, Weekly Study time, Educational support given by family, internet access, family relationship, free time out of school, workday alcohol consumption, weekly alcohol consumption, current health status, Absences in school, first year grade, second year grade, grades | Sagardeep Roy, Anchal Garg [5]
Speciality, lower class grade, Higher Class Grade, Extra Knowledge or skill, Attendance, hours spend to study, resources, seminar performance, result, class test grade (internal), lab work, exercise, homework, quiz, over all semester mark | Mayilvaganan, Kalpanadevi [6]
Grade obtained at secondary level, father occupation, mother occupation, school area at secondary level, school area at higher secondary level, private tuition at secondary level, group of study, student’s community, school area at elementary level, parent’s education |
Interest, Study Behaviour, Engage Time, Believe, and Family Support, and GPA as dependent variable |
Grade Point Average (GPA), quiz1, quiz2, quiz average, Assignment submit, Assignment delay, labtest1, labtest2, lab test average, final exam grade, total time spent, hours spent studying daily, methods of study used, city of birth, transport method, distance to the college, subjects interest, motivation level, difficulty doing homework, facilities in college, having home tuition, level of father education, level of mother education, attendance |

Algorithm | Accuracy | Authors
Decision Tree | ID3=62.2, C4.5=67.7, CART=62.2 | Surjeet Kumar Yadav and Saurabh Pal [1]
Naïve Bayes | 68.1818% | Vrushali Mhetre and Prof. Mayura Nagar [4]
Naïve Bayes | 85.7% | Ahmed Mueen, Bassam Zafar, Umar Manzoor [9]
Neural network (MLP) | 81.4% | Ahmed Mueen, Bassam Zafar, Umar Manzoor [9]

Table 5: Accuracy Result for REPTree, Random Tree and ZeroR Algorithm.
Random Tree | 36.36% | Vrushali Mhetre and Prof. Mayura Nagar [4]
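The tables above report classifier accuracy, and the discussion of [3] quotes a true positive rate for the FAIL class. As a reminder of how these two quantities are usually computed, here is a minimal Python sketch. The function names and the ten student labels are hypothetical and purely illustrative; they are not data from any of the reviewed papers.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def true_positive_rate(actual, predicted, positive):
    """Of the samples that truly belong to `positive`, the fraction predicted as such."""
    preds_for_positives = [p for a, p in zip(actual, predicted) if a == positive]
    if not preds_for_positives:
        raise ValueError("no samples of the positive class")
    hits = sum(p == positive for p in preds_for_positives)
    return hits / len(preds_for_positives)


# Hypothetical labels for ten students (illustrative only).
actual = ["FAIL", "FAIL", "FAIL", "FAIL", "PASS",
          "PASS", "PASS", "PASS", "PASS", "PASS"]
predicted = ["FAIL", "FAIL", "FAIL", "PASS", "PASS",
             "PASS", "PASS", "PASS", "FAIL", "PASS"]

print(accuracy(actual, predicted))                    # 8 of 10 correct -> 0.8
print(true_positive_rate(actual, predicted, "FAIL"))  # 3 of 4 FAIL found -> 0.75
```

A high true positive rate for the FAIL class matters for the use case described in this review, because it measures how many at-risk students the model actually flags, which overall accuracy alone does not show.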
Strong side
All the papers estimate students’ academic performance in order to identify weak or low-performing students, so that teachers can be alerted to focus on these students and prevent failure. They also indicate that teachers should be more interactive with students, provide proper guidance and motivate them. All of the researchers use tangible, demographic and psychometric variables to predict the performance of students (weak or low-performing students and strong or well-performing students).

4. Vrushali Mhetre (2017) Classification based data mining algorithms to predict slow, average and fast learners in educational system using Weka. IEEE International Conference on Computing Methodologies and Communication.
5. Sagardeep Roy (2017) Predicting Academic Performance of Student Using Classification Techniques.
6. Mayilvaganan (2014) Comparison of Classification Techniques for predicting the performance of Students Academic Environment. International Conference on Communication and Network Technologies (ICCNT).
7. Ramesh (2013) Predicting Student Performance: A Statistical and Data Mining Approach. Int J Comput Appl 63: 975-8887.