Multiclass Prediction Model For Student Grade Prediction Using Machine Learning
ABSTRACT Today, predictive analytics applications have become an urgent need in higher education institutions. Predictive analytics uses advanced analytics, encompassing machine learning, to derive high-quality and meaningful information for all education levels. Student grades are widely recognized as one of the key performance indicators that help educators monitor academic performance. During the past decade, researchers have proposed many variants of machine learning techniques in the education domain. However, handling imbalanced datasets remains a severe challenge when improving the performance of student grade prediction. Therefore, this paper presents a comprehensive analysis of machine learning techniques for predicting final student grades in first-semester courses with improved predictive accuracy. Two modules are highlighted in this paper. First, we compare the accuracy of six well-known machine learning techniques, namely Decision Tree (J48), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (kNN), Logistic Regression (LR) and Random Forest (RF), on a dataset of 1282 real student course grades. Second, we propose a multiclass prediction model that reduces the overfitting and misclassification caused by imbalanced multi-class data, based on the oversampling Synthetic Minority Oversampling Technique (SMOTE) combined with two feature selection methods. The results show that the proposed model integrated with RF gives a significant improvement, with the highest f-measure of 99.5%. The proposed model therefore offers comparable and promising results for enhancing the performance of imbalanced multi-class student grade prediction.
INDEX TERMS Machine learning, predictive model, imbalanced problem, student grade prediction,
multi-class classification.
insightful information related to student academic performance. Solomon et al. [1] indicated that determining student academic performance is a crucial challenge in HEIs. Because of this, many previous researchers have identified the factors that most strongly affect student academic performance [2]. However, the most commonly studied factors rely on socioeconomic background, demographics [3] and learning activities [4] rather than final student grades in the final examination [5]. For this reason, we observe that predicting student grades can be one of the applicable solutions for improving student academic performance [6].

Predictive analytics has demonstrated clear benefits in HEIs. It is a potential approach for the competitive educational domain to find hidden patterns and predict trends in vast databases [7]. It has been applied to several educational problems, including student performance, dropout prediction, academic early-warning systems, and course selection [8]. Moreover, the application of predictive analytics to predicting student academic performance has increased over the years [9].

The ability to predict student grades is one of the important areas that can help to improve student academic performance. Much previous research has applied various machine learning techniques to predict student academic performance. However, related work on mechanisms to improve the imbalanced multi-classification problem in student grade prediction is difficult to find [10], [11]. Therefore, in this study, a comparative analysis has been conducted to find the best prediction model for student grade prediction by addressing the following questions:

RQ1: Which predictive model among the selected machine learning algorithms achieves the highest accuracy in predicting students' final course grades?

RQ2: How can an imbalanced multi-classification dataset be addressed with the selected machine learning algorithms using the oversampling Synthetic Minority Oversampling Technique (SMOTE) and feature selection (FS) methods?

To address these questions, we collected students' final course grades from two core courses in the first-semester final examination results. We present a descriptive analysis of the student dataset to visualize student grade trends, which can support strategic planning and decision making for lecturers to help students more effectively. Then, we conduct a comparative analysis using six well-known machine learning algorithms, namely LR, NB, J48, SVM, kNN and RF, on real student data from the Diploma in Information Technology (Digital Technology) at one of the Malaysian polytechnics. To address the imbalanced multi-classification, we endeavor to enhance the performance of each predictive model with data-level solutions using oversampling SMOTE and FS. The novel contributions of this paper are summarized as follows:
• We propose a combination of modified oversampling SMOTE and two feature selection algorithms that automatically determines the sampling ratio with the best selected features to improve imbalanced multi-classification for student grade prediction.
• Our comparative analysis shows that the minority classes in an imbalanced dataset do not necessarily have to be brought to the same ratio as the majority class to obtain better performance in student grade prediction.
• Our proposed model shows a different impact on the performance of the student grade prediction model depending on which of the two feature selection algorithms is applied after SMOTE.

This paper is organized as follows. Section II describes the related research work that has been conducted on student grade prediction. Section III illustrates the methodology for developing the predictive models to predict final student grades, phase by phase. Section IV and Section V present the descriptive analysis and the prediction results of this study's findings, respectively. Section VI discusses the findings. Lastly, the main conclusions and some future directions are given in Section VII.

II. RELATED WORKS
Several studies have been conducted in HEIs on predicting student grades using various machine learning techniques. They involve the analytical processing of many attributes and data samples from a variety of sources for student grade prediction with different outcomes. However, the performance of predictive models for imbalanced datasets in education domains is still rarely discussed. Related to this issue, a study in [12] used discretization and oversampling SMOTE methods to improve the accuracy of students' final grade prediction. Several classification algorithms were applied, such as NB, DT and Neural Network (NN), to classify students' final grades into five categories: A, B, C, D and F. They showed that NN and NB applied with SMOTE and optimal equal-width binning outperformed the other methods, with a similar highest accuracy of 75%. However, NB was found to be preferable to NN, as the time needed to build the prediction models is shorter than for NN. Research conducted in [13] developed a method for predicting future course grades obtained from the Computer Science and Engineering (CSE) and Electrical and Computer Engineering (ECE) programs at the University of Minnesota. Based on the proposed methods, the results indicated that Matrix Factorization (MF) and Linear Regression (LinReg) produced more accurate predictions than the existing traditional methods. The authors also found that using a course-specific subset of the data can improve the accuracy of predicting future course grades. Another study [14] applied MF, Collaborative Filtering (CF) and Restricted Boltzmann Machine (RBM) techniques to real data of 225 undergraduate students to predict student grades in different courses. They observed that CF does not perform well, especially when the dataset is sparse, compared to MF. However, their overall findings show that the proposed RBM provides efficient learning and better prediction accuracy than CF and MF, with a minimum Root Mean Squared
Error (RMSE) of 0.3, especially for modeling tabular data. A study in [15] developed a predictive model that can predict students' final grades in introductory courses at an early stage of the semester. The authors compared eleven machine learning algorithms from five different categories, consisting of Bayes, Function, Lazy (IBK), Rules-Based (RB) and Decision Tree (DT), using WEKA. To reduce high dimensionality and unbalanced data, they performed correlation-based and information-gain feature selection during data pre-processing. They also applied SMOTE to balance the distribution of instances across three different classes. Among the 11 algorithms, the Decision Tree classifier (J48) had the highest accuracy of 88% compared to the other categories of algorithms. Al-Barrak [16] used the DT (J48) algorithm to discover classification rules for predicting students' final Grade Point Average (GPA) based on student grades in previous courses, using 236 students who graduated from the Computer Science College at King Saud University in 2012. They found that the classification rules produced by J48 can detect early predictors and extract useful knowledge about final student GPA from the grades in all mandatory courses, to improve students' performance. Another study [17] predicted students' grade performance using three different DT algorithms: Random Tree (RT), RepTree and J48. In this context, cross-validation was used to measure the performance of the predictive models. The results indicated that RT obtained the highest accuracy of 75.188%, better than the other algorithms. The accuracy of the predictive models could be improved by adding more samples and attributes to the dataset. The authors of [18] proposed a framework for predicting student academic performance at Universiti Sultan Zainal Abidin (UniSZA), Malaysia. The study applied 399 student records from the academic department database over eight years of intakes, containing student demographics, previous academic records and family background information. The results indicated that the Rule-Based (PART) classifier was the best model, with 71.3% accuracy, compared to DT and NB. However, the small sample size affected accuracy performance due to incomplete and missing values in the dataset. Anderson and Anderson [19] performed an experimental study on 683 students at the Craig School of Business at California State University from 2006 to 2015, applying three machine learning algorithms to predict student grades. The study found that SVM was the best classifier; it consistently outperformed a simple average approach and obtained the lowest error rate for each data class. The result could differ for larger datasets due to significant changes in the structure and format of the historical grade dataset. We have summarized the related studies in terms of sample size, data source, attributes, algorithm, best performance and limitations in Table 1.

III. FRAMEWORK OF MULTICLASS PREDICTION MODEL FOR STUDENT GRADE PREDICTION
This paper aims to identify the most effective predictive model, especially for addressing imbalanced multi-classification in student grade prediction. The framework, which consists of four main phases, is shown in Figure 1. The input of our framework contains students' final course grades, which we extract from the students' academic spreadsheet documents and the student academic repository. We apply two data-level solutions, using oversampling SMOTE and two FS methods, to reduce the overfitting and misclassification of the imbalanced multi-classification dataset. Then, we design our proposed model by combining both techniques with the selected machine learning classifiers and evaluate them using performance metrics. Finally, data visualization is used to visualize the trends in the dataset and the final classification results. A description of each phase is given in the following subsections.

FIGURE 1. The framework of the proposed multiclass prediction model for predicting final student grades.

A. DATA PREPARATION
The dataset we used was collected by the Department of Information and Communication Technology (JTMK) at one of the Malaysian polytechnics. The dataset contains 1282 instances, which are the total course grades of first-semester students taken from the final examinations during the June 2016 to December 2019 sessions. Students need to take compulsory, specialization and core course modules to qualify for the next academic semester. However, in this study we selected only two core courses that contain a percentage of final examination and course assessment marks. All features used for prediction are listed in Table 2.

TABLE 2. The information of the input features.

B. DATA PRE-PROCESSING AND DESIGN MODEL
In this phase, we applied data pre-processing to the collected dataset. For the convenience of data pre-processing, we have
ranked and grouped the students into five grade categories: Exceptional (A+), Excellent (A), Distinction (A−, B+, B), Pass (B−, C+, C, C−, D+, D) and Fail (E, E−, F). This grouping was created to be the output class of the prediction. However, the class distribution of the dataset indicated imbalanced class instances, containing 63 exceptional, 377 excellent, 635 distinction, 186 pass and 21 fail instances, a high imbalance ratio of approximately 3:18:30:9:1 that can lead to overfitted results. Therefore, a data-level solution using oversampling SMOTE and two FS methods, wrapper-based and filter-based, was used as the benchmark approach in this study to overcome the problem of the imbalanced multi-classification dataset. The experiments used the open-source tool Waikato Environment for Knowledge Analysis (WEKA) version 3.8.3, because it provides many machine learning algorithms with an easy graphical user interface for simple visualization [20], [21].
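As a concrete illustration of the grouping step above, the short Python sketch below maps the letter grades to the five target classes. The column name GRADE, the helper add_target_class and the demo rows are illustrative assumptions; the paper itself prepares the class label during pre-processing rather than with this exact code.

# Illustrative sketch of the five-class target construction described above.
# The column name "GRADE" and the demo data are assumptions, not the paper's schema.
import pandas as pd

GRADE_GROUPS = {
    "Exceptional": ["A+"],
    "Excellent": ["A"],
    "Distinction": ["A-", "B+", "B"],
    "Pass": ["B-", "C+", "C", "C-", "D+", "D"],
    "Fail": ["E", "E-", "F"],
}
# Invert the grouping into a letter-grade -> class lookup table.
LETTER_TO_CLASS = {g: cls for cls, grades in GRADE_GROUPS.items() for g in grades}

def add_target_class(df, grade_col="GRADE"):
    """Attach the five-level grade class (SG) used as the prediction output."""
    out = df.copy()
    out["SG"] = out[grade_col].str.strip().map(LETTER_TO_CLASS)
    return out

demo = pd.DataFrame({"GRADE": ["A+", "B", "C-", "E", "A"]})
print(add_target_class(demo)["SG"].value_counts())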
C. PERFORMANCE ANALYSIS
This paper aims to predict students' final grades based on their previous course performance records in the first semester's final examination. The proposed model applies different machine learning algorithms to evaluate which of the algorithms achieves the highest performance for predicting students' final grades. Three experiments were conducted in four distinct phases based on the five different classes. The accuracy is evaluated using ten-fold cross-validation, in which the dataset is repeatedly partitioned into 90% for the training set and 10% for the testing set [22]. Figure 2 illustrates the flowchart of the proposed multiclass prediction model applied in this study.
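To make this evaluation protocol concrete, the hedged sketch below compares the six classifiers under stratified ten-fold cross-validation and reports the macro-averaged f-measure. It relies on scikit-learn stand-ins for the WEKA implementations (for example, DecisionTreeClassifier in place of J48) and on synthetic placeholder data, so it illustrates the procedure rather than reproducing the paper's figures.

# Hedged sketch: six classifiers under stratified 10-fold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data with five imbalanced classes; replace with the real features/labels.
X, y = make_classification(n_samples=1282, n_features=8, n_informative=6,
                           n_classes=5, weights=[0.05, 0.29, 0.50, 0.14, 0.02],
                           random_state=42)

models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "NB": GaussianNB(),
    "J48 (CART)": DecisionTreeClassifier(random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "kNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=42),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    print(f"{name:>10}: macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")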
Algorithm 1 Algorithm for Multiclass Prediction Model (SFS)
Input: The training dataset
Output: The predicted student grade label, SG
1. Begin
2. Import the necessary library packages and select the dataset
3. Perform data pre-processing
   3.1 Select the filters for oversampling
   3.2 Set the SMOTE parameters (nearest neighbors, k = 10)
   3.3 Select features with the attribute evaluator and search method
   3.4 Select the attribute selection mode (use the full training set)
4. Use the classification models to predict the results
   4.1 Split the data into training and testing sets using 10-fold cross-validation
   4.2 Use the well-known classification models (J48, kNN, SVM, LR, NB, RF) to predict the SG (Exceptional, Excellent, Distinction, Pass, Fail)
5. Evaluate the accuracy of the well-known classification models
6. End
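The sketch below mirrors the structure of Algorithm 1 in Python under stated assumptions: imbalanced-learn's SMOTE stands in for the WEKA SMOTE filter, scikit-learn's SequentialFeatureSelector (wrapped around a decision tree) stands in for the wrapper attribute evaluator and search method, and the data are synthetic placeholders. It is an analogue of the SFS pipeline, not the authors' exact WEKA configuration.

# Hedged analogue of Algorithm 1 (SFS): SMOTE -> wrapper feature selection -> classifier.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1282, n_features=8, n_informative=6,
                           n_classes=5, weights=[0.05, 0.29, 0.50, 0.14, 0.02],
                           random_state=42)

sfs_model = Pipeline(steps=[
    # Steps 3.1-3.2 of Algorithm 1: oversample the minority classes (k = 10 neighbours).
    ("smote", SMOTE(k_neighbors=10, random_state=42)),
    # Steps 3.3-3.4: wrapper-style feature selection driven by a tree classifier.
    ("select", SequentialFeatureSelector(DecisionTreeClassifier(random_state=42),
                                         n_features_to_select=5, cv=3)),
    # Step 4.2: final classifier (RF shown; swap in J48/kNN/SVM/LR/NB to compare).
    ("clf", RandomForestClassifier(random_state=42)),
])

scores = cross_val_score(sfs_model, X, y,
                         cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42),
                         scoring="f1_macro")
print(f"SFS + RF macro-F1: {scores.mean():.3f}")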
In particular, the following theoretical models are used as the basis to construct our multiclass prediction model:
• Logistic Regression (LR) is based on a cost function that uses the logistic function as its mathematical model to solve classification problems. The model performs well in contextual analysis of categorical data to understand the relationships between variables [23].
• Naïve Bayes (NB) is based on Bayes' theorem and is widely used because it is simple and able to make fast predictions. It is suitable for small datasets, combining low complexity with a flexible probabilistic model [24].
• Decision Tree (J48) is widely used in multi-class classification and can handle missing values and high-dimensional data. It has been implemented effectively to give optimal accuracy with a minimum number of features [25].
• Support Vector Machine (SVM) is based on the notion of decision planes that define decision boundaries, which handles classification problems successfully [11]. Given a training dataset, it predicts which of two conceivable classes an instance belongs to, making the SVM a non-probabilistic binary linear classifier.
• K-Nearest Neighbor (kNN) is a non-parametric algorithm that classifies instances by computing the distances between instances in the dataset based on their nearest vectors, where k refers to the number of nearest neighbors considered in the n-dimensional space. It uses a distance function and is well suited to datasets with a small number of features [11].
• Random Forest (RF) is a classifier based on ensemble learning that uses a number of decision trees on various subsets to find the best features for high accuracy and to prevent overfitting. RF is relatively robust to outliers and noise and operates effectively in classification [26].

A confusion matrix helps to visualize the classification performance of each predictive model. Table 3 presents the confusion matrix used for student grade prediction, where A, B, C, D and E represent the student grade (SG) classes 'exceptional', 'excellent', 'distinction', 'pass' and 'fail', respectively. The class label is represented by the expression:

SG ∈ {A, B, C, D, E}   (1)
TABLE 3. Confusion matrix for student grade prediction classification.

The recall (R) is obtained from the per-class entries of the confusion matrix, for example for the 'pass' (D) and 'fail' (E) classes:

R = ⋯ + DD/(DA + DB + DC + DD + DE) + EE/(EA + EB + EC + ED + EE)   (4)

F-Measure = 2PR/(P + R)   (5)

where the f-measure is the weighted harmonic mean of precision (P) and recall (R).
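For reference, the following sketch computes these quantities directly from a confusion matrix laid out as in Table 3, with actual grades in rows and predicted grades in columns; the matrix entries are illustrative values, not results from the paper.

# Hedged sketch: per-class recall, macro precision/recall and F-measure from a 5x5 confusion matrix.
import numpy as np

cm = np.array([[50,  5,  3,  3,  2],    # actual Exceptional (A)
               [ 4, 60,  6,  5,  2],    # actual Excellent   (B)
               [ 2,  7, 80,  8,  3],    # actual Distinction (C)
               [ 1,  3,  6, 70,  5],    # actual Pass        (D)
               [ 0,  1,  2,  4, 65]])   # actual Fail        (E)

recall_per_class = np.diag(cm) / cm.sum(axis=1)     # e.g. DD / (DA+DB+DC+DD+DE)
precision_per_class = np.diag(cm) / cm.sum(axis=0)  # e.g. DD / (AD+BD+CD+DD+ED)

P, R = precision_per_class.mean(), recall_per_class.mean()  # macro averages
f_measure = 2 * P * R / (P + R)                             # Eq. (5)

print("per-class recall:", np.round(recall_per_class, 3))
print(f"macro precision={P:.3f}, macro recall={R:.3f}, F-measure={f_measure:.3f}")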
D. DATA VISUALIZATION
In this phase, after performing the data analysis, we extracted and visualized our findings to view the useful information and student grade performance trends in the different courses using Python. Data visualization allows all the features and insights of the student dataset to be explored, helping lecturers improve student academic performance and make better decisions in the future. We also compare the results of our proposed model in a graphical form to better understand the findings.

FIGURE 3. Mean and standard deviation of students' final marks against students' final grade achievement according to the courses taken.

FIGURE 5. Analysis of the average grade point trend for the ICS and CSA courses on a yearly basis.
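As a minimal example of this phase, the sketch below uses matplotlib to draw a grouped bar chart of the final-grade distribution for the two courses named in the figures above; the counts are placeholders rather than the paper's data.

# Hedged sketch: grouped bar chart of the grade distribution per course (placeholder counts).
import matplotlib.pyplot as plt
import numpy as np

classes = ["Exceptional", "Excellent", "Distinction", "Pass", "Fail"]
ics_counts = [40, 200, 310, 80, 5]    # illustrative values only
csa_counts = [23, 177, 325, 106, 16]  # illustrative values only

x = np.arange(len(classes))
width = 0.35
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(x - width / 2, ics_counts, width, label="ICS")
ax.bar(x + width / 2, csa_counts, width, label="CSA")
ax.set_xticks(x)
ax.set_xticklabels(classes)
ax.set_ylabel("Number of students")
ax.set_title("Final grade distribution by course")
ax.legend()
fig.tight_layout()
fig.savefig("grade_distribution.png")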
The final marks are calculated based on the total percentage of the continuous assessment marks evaluated during class and the final test marks in the final exam at the end of the semester. However, students must earn more than 40 marks across both assessments in order to pass both courses.

From the results, we recognize that there is a difference in student achievement between the CSA and ICS courses, where the students obtained higher marks in the ICS course compared to CSA. Figure 4 shows the normal trend of the final marks distribution achieved by the students. Out of the total number of failing students, we found that 3% of them were concentrated in CSA compared to the ICS course. From these findings, we observe that students who failed in both courses did not achieve the minimum passing marks in the final examination, although their final marks were classified as good and passing grades. The average grade point of ICS students is also higher than that of CSA. Therefore, from these findings, we conclude that the CSA course is more challenging for students who are weak in mathematics, whereas the ICS course is easier to understand for students who already have basic knowledge of computers before entering the polytechnic.

V. EXPERIMENTAL RESULTS
In this section, the results of this study are divided into two subsections according to the research questions. We conducted a comprehensive performance analysis with three experiments run on the real dataset. The experimental results of J48, kNN, NB, SVM, LR and RF were explored and compared. Then, we also compared and evaluated the impact of using the oversampling SMOTE and FS methods to improve the imbalanced multi-classification problem on the same dataset.
TABLE 5. Performance comparison of predictive models.

created while training the dataset. For generalizability purposes, further experiments dealing with these issues were conducted to reduce the ratio between the classes, as described in the next subsection.

B. RQ2: IMPACT OF OVERSAMPLING AND FEATURE SELECTION FOR THE IMBALANCED MULTI-CLASS DATASET
Here, we focus only on data-level solutions using oversampling SMOTE and two FS algorithms to address the imbalanced multi-classification dataset [27], [28]. To examine the performance of each predictive model, we performed three experiments on the six selected machine learning algorithms to reduce the imbalance problem. First, we applied SMOTE to our dataset with each of the six selected machine learning algorithms independently. Secondly, the dataset was processed with the two FS algorithms independently, using three different attribute evaluators. Thirdly, the proposed multiclass prediction model (SFS) was performed and tested on the same dataset with the six selected machine learning algorithms. For a better view of the prediction accuracy across the classes, additional performance metrics, namely precision, recall and f-measure, were used to ensure that our predictive models were fit to produce accurate results.

1) SMOTE OVERSAMPLING TECHNIQUE
SMOTE, the Synthetic Minority Oversampling Technique, is the most commonly used method to mitigate the overfitting problem caused by random oversampling [29]. It modifies an imbalanced dataset and generates new minority class instances from existing ones by synthetic sampling, making the distribution more balanced. This study takes this into consideration by increasing the default nearest-neighbors parameter (k): for a sample SG in the minority class, N samples are selected randomly and recorded as SGi. The new sample SGnew is defined by the following expression:

SGnew = SGorigin + rand × (SGi − SGorigin),  i = 1, 2, 3, . . . , n   (6)

where rand is a random value sampled within the range (0, 1), the index of the class value is 0, and the ratio of generating new samples is approximately 100%. In WEKA, we applied weka.filters.supervised.instance.SMOTE to insert synthetic instances between minority class samples and their neighbors in our dataset. We set the class value index parameter to 0 to auto-detect the non-empty minority class. Then, the nearest-neighbors parameter was set to k = 10 with the percentage of new instances set to 100%, and the SMOTE filter was applied for ten iterations. As a result of oversampling, the number of instances increased from 1282 to 2932, where the SG class distribution after SMOTE becomes 504 exceptional, 377 excellent, 635 distinction, 744 pass and 672 fail, reducing the ratio to approximately 1:1:2:2:2.
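For readers working outside WEKA, the following hedged sketch approximates the same oversampling end state with imbalanced-learn: instead of applying the filter repeatedly at 100%, the reported post-SMOTE class counts are requested directly through sampling_strategy, with k_neighbors = 10 and synthetic placeholder features. It mirrors the outcome described above, not the exact WEKA procedure.

# Hedged approximation of the oversampling step using imbalanced-learn instead of
# weka.filters.supervised.instance.SMOTE. Features are synthetic placeholders; only
# the class counts follow the paper (63/377/635/186/21 before SMOTE).
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
counts = {0: 63, 1: 377, 2: 635, 3: 186, 4: 21}  # Exceptional..Fail as labels 0-4

# Dummy 8-dimensional features, one Gaussian cluster per class.
X = np.vstack([rng.normal(loc=3 * label, scale=1.0, size=(n, 8))
               for label, n in counts.items()])
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

# Target counts for the classes that are oversampled (504, 744, 672);
# the excellent (377) and distinction (635) classes are left unchanged.
smote = SMOTE(sampling_strategy={0: 504, 3: 744, 4: 672},
              k_neighbors=10, random_state=42)
X_res, y_res = smote.fit_resample(X, y)

print("before:", dict(Counter(y)))      # {0: 63, 1: 377, 2: 635, 3: 186, 4: 21}
print("after :", dict(Counter(y_res)))  # 2932 instances, roughly a 1:1:2:2:2 ratio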
In Table 6 we present a detailed comparison of all the predictive models across all performance measures. When the classifiers were combined with oversampling SMOTE, we found that the effectiveness of all predictive models was consistently improved. Among these predictive models, RF generated the most promising f-measure of 99.5%, followed by kNN with 99.3%, J48 with 99.1%, SVM with 98.9%, LR with 98.8% and NB with 98.3%. This result was statistically significant at the 95% confidence level using the Paired T-Tester (corrected), as shown in Figure 6. We also observed that, when the SMOTE method was applied, the minority class instances increased to balance with the other classes according to the number of iterations and the value of k used on our dataset. A detailed analysis of the accuracy performance based on the confusion matrices is reported in Table 7.

FIGURE 6. Result of predictive model performance with SMOTE.

It can clearly be seen that the confusion matrices of all the predictive models derived from J48, NB, kNN, SVM, LR and RF show improved results for correctly classified 'Pass' and 'Fail' grades. However, there is a small decrease in performance for SVM, where the predictive model correctly classified 97.2% of students who obtained 'Pass' grades, compared to 99.5% when applied without SMOTE. For comparative analysis, Figure 7 and Figure 8 illustrate the actual scores and predictions for the five grade categories before and after applying SMOTE, respectively. Each predictive model shows a significant improvement for the majority classes, but not for the minority class.
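The significance check above relies on WEKA's Paired T-Tester (corrected), which implements the corrected resampled t-test of Nadeau and Bengio. Assuming that setting, the sketch below shows how the corrected statistic can be computed from two models' per-fold scores; the score vectors are made-up illustrations, not the paper's per-fold results.

# Hedged sketch of the corrected resampled/paired t-test (Nadeau & Bengio correction).
import numpy as np
from scipy import stats

def corrected_paired_ttest(scores_a, scores_b, n_train, n_test):
    """Corrected paired t-test over k paired per-fold scores."""
    d = np.asarray(scores_a) - np.asarray(scores_b)
    k = d.size
    variance = d.var(ddof=1)
    # Correction term (1/k + n_test/n_train) replaces the naive 1/k.
    t_stat = d.mean() / np.sqrt((1.0 / k + n_test / n_train) * variance)
    p_value = 2 * stats.t.sf(np.abs(t_stat), df=k - 1)
    return t_stat, p_value

# Example: hypothetical per-fold macro-F1 of RF vs. NB over 10-fold CV on 2932 instances
# (about 2639 training / 293 test instances per fold).
rf_f1 = [0.996, 0.994, 0.995, 0.993, 0.997, 0.995, 0.994, 0.996, 0.995, 0.995]
nb_f1 = [0.984, 0.981, 0.983, 0.982, 0.985, 0.983, 0.982, 0.984, 0.983, 0.983]
t, p = corrected_paired_ttest(rf_f1, nb_f1, n_train=2639, n_test=293)
print(f"t = {t:.2f}, p = {p:.4f}  (significant at the 95% level if p < 0.05)")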
2) FEATURE SELECTION
The other technique that we applied is feature selection (FS), which is effective in reducing dimensionality, removing irrelevant data and improving learning accuracy [30], [31]. In this experiment, two FS methods, wrapper-based and filter-based, were used as the benchmark methods to maximize the performance of the six predictive models. The wrapper FS algorithms used to identify the best feature set in this study consist of two attribute evaluators using the J48 classifier, WrapperSubsetEval (FS-1) and ClassifierSubsetEval (FS-2), with
TABLE 10. Classification performance of the proposed SFS in different predictive models.

improve the accuracy performance of RF, which might be due to the imbalanced dataset. Thus, we observed that FS enabled the predictive models to be interpreted more quickly, but the improvement did not depend on having only a few features [34].

Then, we attempted to reduce the overfitting and misclassification of the minority classes by combining SMOTE with a selection of appropriate features for all predictive models, by introducing the SFS model. Here, the overall performance indicated that the proposed SFS model with RF outperformed the previous studies conducted in [12], [15]. The best accuracy, obtained by RF with 99.5% and slightly higher than kNN and J48, shows that the RF algorithm was the ideal algorithm for predicting students' final grades. Meanwhile, kNN was the ideal solution when it could work with the best value of k and optimal features [35]. The experimental results revealed that the proposed SFS model had a more significant effect on kNN, depending on the selected FS algorithm. Certainly, these results are also similar to the best performance of kNN in handling imbalanced data in different case studies, as depicted in [36]. In this context, we also observe that most of the predictive models considered benefit from oversampling SMOTE, but integrating accurate features with different FS algorithms can influence the prediction effectiveness as well.

Despite these findings, we have identified several limitations: (1) the analysis is based on a single defined dataset, and other datasets should be tested for generalizability, which could affect the analysis results; (2) the analysis is only carried out with certain well-known algorithms, and could be extended with ensemble or advanced machine learning algorithms to compare their effectiveness for imbalanced multi-classification prediction; and (3) we used only one oversampling method, SMOTE, and more methods could be used to analyze whether they can improve the multi-class imbalance problem.

Therefore, this study still needs to be improved for predicting students' final grades by refining the sampling techniques for the imbalanced multi-class dataset, which might affect the accuracy of the prediction results. In addition, we also consider using an SVM ensemble as part of the analysis, since it has produced greater accuracy when predicting students' final grades, as mentioned in [37].
VII. CONCLUSION AND FUTURE DIRECTIONS
Predicting student grades is one of the key performance indicators that can help educators monitor academic performance. Therefore, it is important to have a predictive model that can reduce the level of uncertainty in the outcome for an imbalanced dataset. This paper proposes a multiclass prediction model with six predictive models to predict final student grades based on students' previous final examination results in the first-semester courses. Specifically, we have carried out a comparative analysis combining oversampling SMOTE with different FS methods to evaluate the accuracy of student grade prediction. We have also shown that the explored oversampling SMOTE consistently gives better overall improvement than using FS alone with all the predictive models. Moreover, our proposed multiclass prediction model performed more effectively than using oversampling SMOTE or FS alone, with some parameter settings that can influence the accuracy of all the predictive models. Our findings thus contribute a practical approach for addressing imbalanced multi-classification based on a data-level solution for student grade prediction.

In HEIs, predictive analytics plays a significant role in governance for deriving valuable information and developing the trusted decision-making that contributes to data science [38]. Determining the quality of the collected dataset to reduce the difficulties of imbalance and missing values is part of the challenge of selecting relevant and valuable predictive models [39]. Therefore, as future work, further investigation of appropriate emerging predictive techniques, such as advanced machine learning algorithms [40] and more ensemble algorithms, is recommended to optimize the results of predicting student grades. It is also essential to analyze several multi-class imbalanced datasets with appropriate sampling techniques and different evaluation metrics suitable for the imbalanced multi-class domain, such as Kappa, Weighted Accuracy and other measures. Thus, using machine learning in higher learning institutions for student grade prediction will ultimately enhance decision support systems to improve student academic performance in the future.

ACKNOWLEDGMENT
The authors are grateful for the support of Student Sebastien Mambou in consultations regarding application aspects.

REFERENCES
[1] D. Solomon, S. Patil, and P. Agrawal, "Predicting performance and potential difficulties of university student using classification: Survey paper," Int. J. Pure Appl. Math., vol. 118, no. 18, pp. 2703-2707, 2018.
[2] E. Alyahyan and D. Düştegör, "Predicting academic success in higher education: Literature review and best practices," Int. J. Educ. Technol. Higher Educ., vol. 17, no. 1, Dec. 2020.
[3] V. L. Miguéis, A. Freitas, P. J. V. Garcia, and A. Silva, "Early segmentation of students according to their academic performance: A predictive modelling approach," Decis. Support Syst., vol. 115, pp. 36-51, Nov. 2018.
[4] P. M. Moreno-Marcos, T.-C. Pong, P. J. Munoz-Merino, and C. D. Kloos, "Analysis of the factors influencing learners' performance prediction with learning analytics," IEEE Access, vol. 8, pp. 5264-5282, 2020.
[5] A. E. Tatar and D. Düştegör, "Prediction of academic performance at undergraduate graduation: Course grades or grade point average?" Appl. Sci., vol. 10, no. 14, pp. 1-15, 2020.
[6] Y. Zhang, Y. Yun, H. Dai, J. Cui, and X. Shang, "Graphs regularized robust matrix factorization and its application on student grade prediction," Appl. Sci., vol. 10, p. 1755, Jan. 2020.
[7] H. Aldowah, H. Al-Samarraie, and W. M. Fauzy, "Educational data mining and learning analytics for 21st century higher education: A review and synthesis," Telematics Informat., vol. 37, pp. 13-49, Apr. 2019.
[8] K. L.-M. Ang, F. L. Ge, and K. P. Seng, "Big educational data & analytics: Survey, architecture and challenges," IEEE Access, vol. 8, pp. 116392-116414, 2020.
[9] A. Hellas, P. Ihantola, A. Petersen, V. V. Ajanovski, M. Gutica, T. Hynninen, A. Knutas, J. Leinonen, C. Messom, and S. N. Liao, "Predicting academic performance: A systematic literature review," in Proc. 23rd Annu. Conf. Innov. Technol. Comput. Sci. Educ., Jul. 2018, pp. 175-199.
[10] L. M. Abu Zohair, "Prediction of student's performance by modelling small dataset size," Int. J. Educ. Technol. Higher Educ., vol. 16, no. 1, pp. 1-8, Dec. 2019, doi: 10.1186/s41239-019-0160-3.
[11] X. Zhang, R. Xue, B. Liu, W. Lu, and Y. Zhang, "Grade prediction of student academic performance with multiple classification models," in Proc. 14th Int. Conf. Natural Comput., Fuzzy Syst. Knowl. Discovery (ICNC-FSKD), Jul. 2018, pp. 1086-1090.
[12] S. T. Jishan, R. I. Rashu, N. Haque, and R. M. Rahman, "Improving accuracy of students' final grade prediction model using optimal equal width binning and synthetic minority over-sampling technique," Decis. Anal., vol. 2, no. 1, pp. 1-25, Dec. 2015.
[13] A. Polyzou and G. Karypis, "Grade prediction with models specific to students and courses," Int. J. Data Sci. Anal., vol. 2, nos. 3-4, pp. 159-171, Dec. 2016.
[14] Z. Iqbal, J. Qadir, A. N. Mian, and F. Kamiran, "Machine learning based student grade prediction: A case study," 2017, arXiv:1708.08744. [Online]. Available: https://arxiv.org/abs/1708.08744
[15] I. Khan, A. Al Sadiri, A. R. Ahmad, and N. Jabeur, "Tracking student performance in introductory programming by means of machine learning," in Proc. 4th MEC Int. Conf. Big Data Smart City (ICBDSC), Jan. 2019, pp. 1-6.
[16] M. A. Al-Barrak and M. Al-Razgan, "Predicting students final GPA using decision trees: A case study," Int. J. Inf. Educ. Technol., vol. 6, no. 7, pp. 528-533, 2016.
[17] E. C. Abana, "A decision tree approach for predicting student grades in research project using WEKA," Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 7, pp. 285-289, 2019.
[18] F. Ahmad, N. H. Ismail, and A. A. Aziz, "The prediction of students' academic performance using classification data mining techniques," Appl. Math. Sci., vol. 9, pp. 6415-6426, Apr. 2015.
[19] T. Anderson and R. Anderson, "Applications of machine learning to student grade prediction in quantitative business courses," Glob. J. Bus. Pedagog., vol. 1, no. 3, pp. 13-22, 2017.
[20] S. Hussain, N. A. Dahan, F. M. Ba-Alwib, and N. Ribata, "Educational data mining and analysis of students' academic performance using WEKA," Indonesian J. Electr. Eng. Comput. Sci., vol. 9, no. 2, pp. 447-459, 2018.
[21] A. Verma, "Evaluation of classification algorithms with solutions to class imbalance problem on bank marketing dataset using WEKA," Int. Res. J. Eng. Technol., vol. 6, no. 3, pp. 54-60, 2019.
[22] D. Berrar, "Cross-validation," Comput. Biol., vols. 1-3, pp. 542-545, Jan. 2018, doi: 10.1016/B978-0-12-809633-8.20349-X.
[23] M. Hussain, W. Zhu, W. Zhang, S. M. R. Abidi, and S. Ali, "Using machine learning to predict student difficulties from learning session data," Artif. Intell. Rev., vol. 52, no. 1, pp. 381-407, Jun. 2019.
[24] B. Predić, G. Dimić, D. Ranćić, P. Štrbac, N. Maček, and P. Spalević, "Improving final grade prediction accuracy in blended learning environment using voting ensembles," Comput. Appl. Eng. Educ., vol. 26, no. 6, pp. 2294-2306, Nov. 2018, doi: 10.1002/cae.22042.
[25] K. Srivastava, D. Singh, A. S. Pandey, and T. Maini, "A novel feature selection and short-term price forecasting based on a decision tree (J48) model," Energies, vol. 12, p. 3665, Jan. 2019.
[26] L. Breiman, "Random forests," Mach. Learn., vol. 45, pp. 5-32, Oct. 2001.
[27] T. M. Barros, P. A. Souza Neto, I. Silva, and L. A. Guedes, "Predictive models for imbalanced data: A school dropout perspective," Educ. Sci., vol. 9, no. 4, p. 275, Nov. 2019.
[28] T. Alam, C. F. Ahmed, S. A. Zahin, M. A. H. Khan, and M. T. Islam, "An effective recursive technique for multi-class classification and regression for imbalanced data," IEEE Access, vol. 7, pp. 127615-127630, 2019.
[29] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321-357, Jun. 2002.
[30] C. Jalota and R. Agrawal, Feature Selection Algorithms and Student Academic Performance: A Study, vol. 1165. Singapore: Springer, 2021.
[31] G. A. Sharifai and Z. Zainol, "Feature selection for high-dimensional and correlation based redundancy and binary," Genes, vol. 11, pp. 1-26, Jun. 2020.
[32] Buenaño-Fernández, D. Gil, and S. Luján-Mora, "Application of machine learning in predicting performance for computer engineering students: A case study," Sustain., vol. 11, no. 10, pp. 1-18, 2019.
[33] S. Chinna Gopi, B. Suvarna, and T. Maruthi Padmaja, "High dimensional unbalanced data classification vs SVM feature selection," Indian J. Sci. Technol., vol. 9, no. 30, Aug. 2016.
[34] R. Hasan, S. Palaniappan, S. Mahmood, A. Abbas, K. U. Sarker, and M. U. Sattar, "Predicting student performance in higher educational institutions using video learning analytics and data mining techniques," Appl. Sci., vol. 10, no. 11, p. 3894, Jun. 2020.
[35] S. Zhang, X. Li, M. Zong, X. Zhu, and D. Cheng, "Learning k for kNN classification," ACM Trans. Intell. Syst. Technol., vol. 8, no. 3, pp. 1-9, 2017.
[36] P. Nair and I. Kashyap, "Optimization of kNN classifier using hybrid preprocessing model for handling imbalanced data," Int. J. Eng. Res. Technol., vol. 12, no. 5, pp. 697-704, 2019.
[37] Brodic, A. Amelio, and R. Jankovic, "Comparison of different classification techniques in predicting a university course final grade," in Proc. 41st Int. Conv. Inf. Commun. Technol. Electron. Microelectron., 2018, pp. 1382-1387.
[38] P. Brous and M. Janssen, "Trusted decision-making: Data governance for creating trust in data science decision outcomes," Administ. Sci., vol. 10, no. 4, p. 81, Oct. 2020.
[39] H. Sun, M. R. Rabbani, M. S. Sial, S. Yu, J. A. Filipe, and J. Cherian, "Identifying big data's opportunities, challenges, and implications in finance," Mathematics, vol. 8, no. 10, p. 1738, Oct. 2020.
[40] M. Tsiakmaki, G. Kostopoulos, S. Kotsiantis, and O. Ragos, "Implementing autoML in educational data mining for prediction tasks," Appl. Sci., vol. 10, no. 1, pp. 1-27, 2020.

SITI DIANAH ABDUL BUJANG received the B.S. degree in science (computer science) and the M.S. degree in science from Universiti Teknologi Malaysia (UTM), in 2006 and 2010, respectively. She is currently pursuing the Ph.D. degree in software engineering with the Malaysia-Japan International Institute of Technology, UTM, Kuala Lumpur. Her thesis focuses on the application of predictive analytics to student grade prediction in a higher education institution. From 2010 to 2019, she was a Senior Lecturer with the Information and Communication Technology Department, Polytechnic Sultan Idris Shah, Sabak, Selangor, Malaysia. She has experience in developing the polytechnic curriculum for the Diploma in information technology (technology digital), a 2.5 years' program. She is one of the book authors contributing to the Department of Polytechnic and Community College Education. Her research interests include data analytics, predictive analytics, learning analytics, educational data mining, and machine learning.

ALI SELAMAT (Member, IEEE) has been the Dean of the Malaysia-Japan International Institute of Technology (MJIIT), an academic institution established under the cooperation of the Japanese International Cooperation Agency (JICA) and the Ministry of Education Malaysia (MOE) to provide the Japanese style of education in Malaysia, Universiti Teknologi Malaysia (UTM), Malaysia, since 2018. He is currently a Full Professor with UTM, where he is also a Professor with the Software Engineering Department, Faculty of Computing. He has published more than 60 IF research articles. His H-index is 20, and his number of citations in WoS is more than 800. His research interests include software engineering, software process improvement, software agents, web engineering, information retrievals, pattern recognition, genetic algorithms, neural networks, soft computing, computational collective intelligence, strategic management, key performance indicator, and knowledge management. He is on the Editorial Board of the journal Knowledge-Based Systems (Elsevier). He has been serving as the Chair of the IEEE Computer Society Malaysia, since 2018.

ROLIANA IBRAHIM (Member, IEEE) received the B.Sc. degree (Hons.) in computer studies from Liverpool John Moores University, the M.Sc. degree in computer science from Universiti Teknologi Malaysia (UTM), and the Ph.D. degree in systems engineering from Loughborough University. She is currently the Director of applied computing at the School of Computing, formerly known as the Faculty of Computing. Previously, she was the Head of the Information Systems Department, for three years, at the Faculty of Computing, UTM. She has been an Academic Staff at the Information Systems Department, since 1999. She was previously a Coordinator of the B.Sc. Computer Science (Bioinformatics) Program and the Master of Information Technology (IT Management).
ONDREJ KREJCAR received the Ph.D. degree in technical cybernetics from the Technical University of Ostrava, Czech Republic, in 2008, and the Ph.D. degree in applied informatics from the University of Hradec Kralove. He is focusing on lecturing on smart approaches to the development of information systems and applications in ubiquitous computing environments with the University of Hradec Kralove. He is a Full Professor of systems engineering and informatics at the Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic, and a Research Fellow at the Malaysia-Japan International Institute of Technology, University Technology Malaysia, Kuala Lumpur, Malaysia. From 2016 to 2020, he was the Vice-Dean of Science and Research at the Faculty of Informatics and Management, UHK. He has been the Vice-Rector of science and creative activities at the University of Hradec Kralove, since June 2020. He is also currently the Director of the Center for Basic and Applied Research, University of Hradec Kralove. His H-index is 20, with more than 1300 citations received in the Web of Science, where more than 100 IF journal articles are indexed in the JCR index. In 2018, he was the 14th Top Peer Reviewer in multidisciplinary in the world according to Publons and a Top Reviewer in the Global Peer Review Awards 2019 by Publons. His research interests include control systems, smart sensors, ubiquitous computing, manufacturing, wireless technology, portable devices, biomedicine, image segmentation and recognition, biometrics, and technical cybernetics. His second area of research interests includes biomedicine (image analysis), biotelemetric system architecture (portable device architecture and wireless biosensors), and the development of applications for mobile devices with the use of remote or embedded biomedical sensors. He is currently on the Editorial Board of the Sensors (MDPI) IF journal (Q1/Q2 at JCR) and several other ESCI indexed journals. He has been the Vice-Leader and a Management Committee Member at WG4 at Project COST CA17136, since 2018. He has also been a Management Committee Member Substitute at Project COST CA16226, since 2017. Since 2019, he has been the Chairman of the Program Committee of the KAPPA Program, Technological Agency of the Czech Republic, and was a Regulator of the EEA/Norwegian Financial Mechanism in the Czech Republic, from 2019 to 2024. Since 2020, he has been the Chairman of the Panel 1 (computer, physical and chemical sciences) of the ZETA Program, Technological Agency of the Czech Republic. From 2014 to 2019, he was the Deputy Chairman of the Panel 7 (processing industry, robotics, and electrical engineering) of the Epsilon Program, Technological Agency of the Czech Republic.

AI journals like IEEE TRANSACTIONS ON FUZZY SYSTEMS, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, Knosys, Applied Soft Computing, Fuzzy Optimization and Decision Making, Information Sciences, and Soft Computing. He has also been a Guest Lecturer in plenary lectures and tutorials in multiple national and international conferences related to artificial intelligence.

HAMIDO FUJITA (Life Senior Member, IEEE) received the title of Honorary Professor and the Doctor Honoris Causa degree from Óbuda University, Budapest, Hungary, in 2011 and 2013, respectively, and the Doctor Honoris Causa degree from Timisoara Technical University, Timişoara, Romania, in 2018. He is an Emeritus Professor with Iwate Prefectural University, Takizawa, Japan. He is currently the Executive Chairman of i-SOMET Incorporated Association, www.i-SOMET.org, Morioka, Japan. He is a Distinguished Research Professor at the University of Granada and an Adjunct Professor with Stockholm University, Stockholm, Sweden; the University of Technology Sydney, Ultimo, NSW, Australia; National Taiwan Ocean University, Keelung, Taiwan, and others. He has supervised Ph.D. students jointly with the University of Laval, Quebec City, QC, Canada; the University of Technology Sydney; Oregon State University, Corvallis, OR, USA; the University of Paris 1 Pantheon-Sorbonne, Paris, France; and the University of Genoa, Italy. He has four international patents in software systems and several research projects with Japanese industry and partners. He was a recipient of the Honorary Scholar Award from the University of Technology Sydney, in 2012. He was a Highly Cited Researcher in cross-field for the year 2019 and in the computer science field for the year 2020 by Clarivate Analytics. He headed a number of projects, including intelligent HCI, a project related to mental cloning for healthcare systems as an intelligent user interface between human-users and computers, and a SCOPE Project on virtual doctor systems for medical applications. He collaborated with several research projects in Europe. He is recently collaborating in the OLIMPIA Project supported by the Tuscany Region on therapeutic monitoring of Parkinson's disease. He has published more than 400 highly cited articles. He is the Emeritus Editor-in-Chief of Knowledge-Based Systems and currently the Editor-in-Chief of Applied Intelligence (Springer).