Applied Computational Intelligence and Soft Computing - 2024 - Ahmed - Student Performance Prediction Using Machine
Research Article
Student Performance Prediction Using Machine
Learning Algorithms
Esmael Ahmed
Information System, College of Informatics, Wollo University, Dessie 7200, Ethiopia
Received 2 January 2024; Revised 4 April 2024; Accepted 6 April 2024; Published 30 April 2024
Copyright © 2024 Esmael Ahmed. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Education is crucial for a productive life and providing necessary resources. With the advent of technologies like artificial intelligence, higher education institutions are incorporating technology into traditional teaching methods. Predicting academic success has gained interest in education, as a strong academic record improves a university's ranking and increases student employment opportunities. Modern learning institutions face challenges in analyzing performance, providing high-quality education, formulating strategies for evaluating students' performance, and identifying future needs. E-learning is a rapidly growing and advanced form of education in which students enroll in online courses. Platforms like Intelligent Tutoring Systems (ITS), learning management systems (LMS), and massive open online courses (MOOC) use educational data mining (EDM) to develop automatic grading systems, recommenders, and adaptive systems. However, e-learning is still considered a challenging learning environment due to the lack of direct interaction between students and course instructors. Machine learning (ML) is used in developing adaptive intelligent systems that can perform complex tasks beyond human abilities. Areas of application of ML algorithms include cluster analysis, pattern recognition, image processing, natural language processing, and medical diagnostics. In this research work, K-means, a clustering data mining technique, is used with the Davies-Bouldin method to obtain clusters and find the important features affecting students' performance. The study found that the SVM algorithm had the best prediction results after parameter adjustment, with a 96% accuracy rate. In this paper, the researchers have examined the functions of the Support Vector Machine, Decision Tree, Naïve Bayes, and KNN classifiers. The outcome of parameter adjustment greatly increased the accuracy of the four prediction models. The Naïve Bayes model's prediction accuracy is the lowest compared to the other prediction methods, as it assumes strong independence between features.
lack of standardized assessment measures, high dropout rates, and difficulty in predicting students' specialized needs due to the lack of direct communication. Long-term log data from e-learning platforms can be used for student and course assessment [3].

Numerous machine-learning algorithms have been found to be efficient for specific learning tasks. They are particularly helpful in poorly understood fields where people might lack the expertise necessary to create efficient knowledge-engineering algorithms [4]. In general, machine learning (ML) investigates algorithms that generalize from externally provided examples (the input set) to develop general hypotheses that make predictions about instances to come [5]. Data mining, in turn, is crucial in sifting through massive amounts of data to find relevant information, and it aids decision-making. Data mining has many important uses in the field of education [6]. Learning analytics focuses on the gathering and analysis of data from learners to optimize learning materials and enhance learners' learning experiences [7]. This need can be met, and potential improvements in course design and delivery can be suggested, by classifying students based on their profiles. To analyze the factors impacting student performance and student dropout, the major goal is to identify meaningful indicators or metrics in a learning context and to examine the interactions between these metrics using the ideas of learning analytics and educational data mining [8]. Finding noteworthy patterns in educational databases is a practice known as "educational data mining." It aids educators in foreseeing, enhancing, and assessing students' academic standing. Students can enhance learning activities, enabling management to enhance system performance [9]. Educational data mining (EDM) has significantly influenced recent developments in the education sector, providing new opportunities for technologically enhanced learning systems based on students' needs.

This research contributes significantly to the field of EDM by advancing the prediction of student performance using machine learning techniques. By addressing the challenges faced by modern learning institutions and leveraging innovative methodologies, the study offers valuable insights into enhancing academic outcomes. The research explores the integration of machine learning algorithms into traditional teaching methods, demonstrating how these can improve student performance analysis and educational outcomes. It uses K-means clustering with the Davies-Bouldin method to identify clusters and significant features influencing student performance, providing a deeper understanding of academic success factors. The study also compares several machine learning algorithms, including Support Vector Machine (SVM), Decision Tree, Naïve Bayes, and K-Nearest Neighbors (KNN), to evaluate their performance in predicting student outcomes. The research addresses technical gaps in predicting student performance by focusing on alternative algorithms over artificial neural networks (ANNs). The study employs rigorous methodologies, such as repeated k-fold cross-validation and hyperparameter optimization, to ensure robust and reliable prediction outcomes. The proposed model stands out with its innovative clustering technique, comprehensive comparative analysis, and practical application in forecasting student performance. This emphasizes the relevance and impact of the research findings in educational practice.

2. Related Works

EDM's state-of-the-art methods and application techniques play a central role in advancing the learning environment. The discipline explores, researches, and implements data mining (DM) methods, incorporating multi-disciplinary techniques for its success. It extracts valuable and intellectual insights from raw data to determine meaningful patterns that improve students' knowledge and academic institutions [10]. The acquired information is processed and analyzed using different machine-learning methods to improve usability and build interactive tools on the learning platform. Machine learning is part of artificial intelligence (AI), where ML systems learn from data, analyze patterns, and predict outcomes. The growing volumes of data, cheaper storage, and robust computational systems have led to the rebirth of machine learning, from pattern recognition algorithms to Deep Learning (DL) methods [11]. The University of Cordoba implemented a grammar-guided genetic programming algorithm, G3PMI, to predict student failure or success in a course; the algorithm has a 74.29% accuracy rate. The Vishwakarma Engineering Research journal created a platform for forecasting student performance using machine learning algorithms, using attendance and related subject marks [12]. Somiya College Mumbai developed a model for predicting student performance, which accurately expressed correlations with past academic results; as the data set grew, the neural network's output improved, reaching 70.48 percent precision. Artificial neural networks (ANNs) were used by Talwar et al. to forecast student success in exams, achieving a high precision of 85% [13]. Kotsiantis et al. estimated student success using machine learning techniques, finding the Naïve Bayes strategy to have a higher average accuracy of 73%. The Eindhoven University of Technology assessed the efficacy of machine learning for dropout student outcome prediction using various machine learning approaches, with the J48 classifier being the most effective model [14]. Researchers from three Indian universities analyzed a data set of university students using different algorithms, comparing accuracy and recall values; the ADT decision tree architecture provided the most correct outcomes. The University of Minho, Portugal, evaluated the accuracy of decision trees, random forests, support vector machines, and neural networks in evaluating students' success in math and Portuguese language subjects. Another paper predicted student success at the beginning of an academic cycle based on academic records, achieving an accuracy of 85% [15].

The study in [16] investigates machine learning (ML) approaches for predicting student performance in tertiary institutions. Using 29 studies, six ML models were identified: decision tree, artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), linear
4795, 2024, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2024/4067721, Wiley Online Library on [21/03/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
regression, and Naive Bayes (NB). ANN outperformed the other models and had higher accuracy levels. The analysis revealed an increasing amount of research in this domain and a broad range of ML algorithms applied, suggesting ML can be beneficial in identifying and improving academic performance areas [16].

Research in [17] aims to predict student performance using Artificial Intelligence, aiming to help students avoid poor results and groom them for future exams. By identifying dependencies and course requirements, teachers can provide appropriate advice to students. The system can help teachers monitor students and provide tailored assistance, reducing student lag. The research achieved a 94.88% accuracy rate, benefiting both students and teachers.

Research work stated in [18] presents a model for predicting students' academic performance using supervised machine learning algorithms such as support vector machine and logistic regression. The sequential minimal optimization algorithm outperforms logistic regression in accuracy. The research aims to help educational institutes predict future student behavior and identify impactful features like teacher performance and student motivation, ultimately reducing dropout rates.

As described in [19], student performance in the final exam can be affected by many factors. The study uses Support Vector Machines (SVM) and Random Forest (RF) algorithms to predict final grades in mathematics and Portuguese language courses. The results show that binary classification achieves a 93% accuracy rate, while regression has the lowest RMSE of 1.13 with RF. This early prediction can help educational organizations provide solutions for students with low performance, enhancing their academic results. The study aims to enhance the performance of educational organizations.

According to recent research [20], contemporary academic institutions have difficulties assessing student achievement, delivering high-quality instruction, and analyzing performance. According to a comprehensive analysis of the literature on EDM from 2009 to 2021, machine learning (ML) approaches are utilized to forecast risk and dropout rates among students. The majority of research employs data from online learning environments and student databases. To improve student performance and predict risk and dropout rates, machine learning techniques are essential. The researchers recommended that future studies concentrate on developing effective dynamic and ensemble techniques for predicting student performance and delivering automated corrective measures. This will support educators in developing appropriate solutions and meeting precision education goals.

Therefore, despite the aforementioned research works, much remains to be done on predicting student performance, because technical gaps were observed in existing works, such as less accurate predictions and undiscovered features. In EDM research, alternative algorithms such as decision trees, SVM, KNN, and Naïve Bayes are favored over ANNs for predicting student outcomes due to their accessibility and ease of use. While ANNs boast high prediction accuracy, their adoption is limited by the specialized technical skills required for effective implementation. Consequently, these more accessible algorithms are widely used in educational contexts, leading to the underutilization of ANNs. This study aims to enhance prediction accuracy by comparing and refining the performance of SVM, KNN, DT, and Naïve Bayes, which are commonly employed and easier to apply in EDM practices. Therefore, this research work presents a support vector machine with some performance enhancements. In addition, it presents a comparative study among KNN, SVM, decision trees, and Naïve Bayes. Compared to existing approaches, our proposed platform relies on more accurate student performance predictors. Moreover, our approach addresses low accuracy and undiscovered features using hyperparameter tuning with enhanced performance.

3. Materials and Methods

Nowadays, machine learning (ML) is used in developing adaptive intelligent systems that can perform complex tasks that are beyond human abilities [21]. Some of the areas of application of ML algorithms include cluster analysis, pattern recognition, image processing, natural language processing, and medical diagnostics, to mention just a few. Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for the specific outcome [22]. In this research work, K-means, a clustering data mining technique, is used with the Davies-Bouldin method to obtain clusters and find the important features affecting students' performance.

3.1. Methodology. The proposed model in this study has four components: data preprocessing, hyperparameter tuning, the recommender model, and model evaluation. However, these main components incorporate other elements. The general architecture of the model is presented in Figure 1.

Figure 1: General architecture of the proposed model (the recovered diagram labels include Dataset, Feature Selection, Clusterization, Decision Trees, and Naïve Bayes).

First, dataset collection involves collecting the data from the Wollo University learning management system called A+. Next, we utilized three stages of data preprocessing. The data preprocessing consists of data cleaning, categorization, and reduction to make the dataset ready to train the data mining algorithms. Then, we utilized feature extraction to determine the most informative features. After this, we used hyperparameter tuning for the enhancement of the algorithm. Hyperparameter tuning is the automatic enhancement of the hyperparameters of a model. Hyperparameters are all the parameters of a model that are not updated during learning and are used to configure the algorithm, for example the learning rate that lowers the cost function in the gradient descent algorithm. We apply this to the features which are fed into the algorithm. In this study, hyperparameter tuning is used to enhance the model-learning loop and find the set of hyperparameters leading to the lowest error on the validation set. Thus, a validation set has to be set apart, and a loss has to be defined. In this study, we clustered to predict a student's final result based on gender, region, entrance_result, num_of_prev_attempts, studied_credits, and disability using various prediction models and choosing the best prediction model. The clustering algorithm used in this study is K-Means. Model building involves developing a wide range of models using prediction methods. Finally, evaluating the model involves testing the validity of the models against each other and against the goals of the study. Using the model involves making it a part of the decision-making process.

3.2. Dataset. The dataset was gathered from Wollo University and the Kombolcha Institute of Technology. The students' data from the academic years 2017–2022 was exported from the student information portal system. There were 8 columns in the final dataset. The dataset's columns contain the student's ID, gender, region, entrance_result, num_of_prev_attempts, studied_credits, disability, and final_result. The dataset contained information on 32,582 students. These data are inconsistent and dirty, so data preprocessing has been done. Since the quality of input data has an impact on the predictive model, data preparation is of paramount importance. The researcher pre-processed the data using Python software. The major problems of the original data set requiring preprocessing are attributes with many missing values and duplicated records. After eliminating incomplete data, the dataset comprised 32,005 students. For instance, Figure 2 shows the region frequency distribution in the dataset.

3.3. Data Preprocessing. The dataset was preprocessed before being fitted into the models to guarantee the best possible performance from them. Our data was mostly non-numerical, so much preprocessing was needed. In this study, we used three stages of preprocessing. Firstly, we utilized data cleaning to detect missing values and noisy data that could corrupt the dataset. Next, we employed data categorization to put values into numerical form. Label coding was employed to standardize the data: the purpose of the label encoder was to convert categorical values such as distinction, pass, withdrawn, and fail into numeric values, since numerical values are more suitable for machine learning algorithms than categorical ones. Categorical data often take the form of strings, have a finite number of possible values, and fall into two categories. The first, ordinal data, has an inherent order; when encoding ordinal data, the information about the order of the categories is kept. The values of the column "entrance_result" have an ordinal relationship, so their ordinal equivalent numbers are used to map them. The second category is nominal data, which lack an inherent order. Nominal data is encoded with the presence or absence of features taken into account. In the table, "region," "disability," and "final_result" are nominal attributes. Finally, we performed data reduction to reduce and organize the data, simplifying its processing; in addition, because the matrix was sparse, columns whose elements were mostly zero were dropped.

3.4. Feature Selection. In our dataset, we encountered a mix of numerical and categorical variables, necessitating a thoughtful curation process. We treated numerical features and categorical features differently to accommodate their distinct characteristics. Each feature was accompanied by a brief description, providing context to aid in the subsequent analysis. To identify the most informative characteristics within this diverse dataset, we employed the random forest algorithm as our primary tool. This algorithm is well-suited for feature selection due to its ability to handle various types of features effectively. Our goal was to iteratively train a random forest model under a 5-fold cross-validation setup. This method not only helps in assessing the model's performance but also allows us to determine the optimal number of features. The choice of 5-fold cross-validation is both strategic and computationally effective. This technique
Figure 2: Region frequency in the dataset (count of students per region code).
involves partitioning the dataset into five subsets or "folds" and using four of them for training while reserving the fifth for validation. This process is repeated five times, with each fold taking a turn as the validation set. The smaller number of folds is particularly suitable for our dataset, ensuring computational efficiency while still providing robust insights. Moreover, employing a relatively modest number of folds is advantageous because it allows each fold to represent a meaningful subset of the data. Given the dataset's size, this approach ensures that each iteration captures a diverse and representative sample, contributing to the overall reliability of the model's performance evaluation.

Following the application of the random forest algorithm for feature selection, a comprehensive analysis identified the following attributes as the most informative for subsequent predictive modeling:

(i) Gender: the gender of the student.
(ii) Region: the geographic region associated with the student's place of origin.
(iii) Entrance Result: the outcome of the entrance examination undertaken by the student.
(iv) Number of Previous Attempts: the number of times the student has attempted the course or examination previously.
(v) Studied Credits: the total number of credits the student has completed or is currently undertaking.
(vi) Disability: the presence or absence of any disabilities reported by the student.
(vii) Final Result: the previous academic outcome achieved by the student.

These selected features were deemed to have the most significant impact on predicting student performance, based on the rigorous analysis conducted with the random forest algorithm. By leveraging this curated subset of attributes, we aimed to enhance the predictive accuracy of our subsequent modeling endeavors, thereby facilitating more informed decision-making in educational contexts.

3.5. Clusterization. Clustering is an unsupervised learning method that can be used to find hidden patterns or structures in the data. The data are divided into homogeneous groups via clustering, which makes the observations in one group more similar to one another than to the observations in other groups. Among the many partition-based clustering algorithms, we have utilized K-means clustering.

The k-means clustering algorithm is utilized in this study to cluster the student data. The K-means algorithm divides n observations into k clusters, each observation belonging to the cluster with the closest mean. Every iteration's k-means output clusters may differ, so K-means is run numerous times on the dataset to obtain trustworthy clusters, and the clusters are created based on all of the iteration results. After clustering the students, the 3 clusters are assigned Grades A, B, and C depending on the metric values of the features: the cluster with the highest metric values is assigned Grade A, the second-highest B, and the last cluster C. For the selection of K in K-means clustering, the Elbow method has been used. The elbow method is one of the most popular ways to find the optimal number of clusters [23]. This method uses the WCSS (within-cluster sum-of-squares) value. The elbow method formula is shown in equation (1) [23]:

WCSS = \sum_{P_i \in C_1} d(P_i, C_1)^2 + \sum_{P_i \in C_2} d(P_i, C_2)^2 + \cdots + \sum_{P_i \in C_k} d(P_i, C_k)^2, (1)

where each term \sum_{P_i \in C_j} d(P_i, C_j)^2 is the sum of the squared distances between the data points P_i assigned to cluster j and that cluster's centroid C_j. We have utilized the Euclidean distance to calculate the separation between the data points and the centroids. The plot of the derived WCSS values against the number of clusters shows a sharp bend, like an arm's elbow, and the K at this bend is considered the best value of K. Figure 3 depicts the elbow method graph [23].

Figure 3: The elbow method graph (inertia against the number of clusters).

We employed repeated k-fold cross-validation to reduce the bias related to the samples. The entire dataset is split into k equal-sized, mutually exclusive subsets for k-fold cross-validation. The classification and regression models are trained and tested k times, with each test being performed on the fold that was not used for training. One confusion matrix contains the prediction outcomes from the k experiments. The accuracy and other metrics are afterward calculated using this confusion matrix. In this investigation, the value of k was set at 10, i.e., 10-fold cross-validation, and it was carried out three times.

3.6. Hyperparameter Optimization. Algorithm parameter tuning, performed before presenting results or getting a system ready for production, is a crucial step for enhancing algorithm performance; it is also called optimization of hyperparameters [24]. The aim of machine learning is to make a computer system that can automatically create models from data without requiring laborious and time-consuming human involvement. Setting parameters for learning algorithms before using the models is one of the challenges [24].

In machine learning, finding the best hyperparameter settings is like searching for a needle in a haystack. In our research, we use grid search to navigate this complex search space. We fine-tune model parameters by comparing predictions to actual values, aiming for the highest accuracy. However, tweaking hyperparameters presents unique challenges, which can be addressed with techniques like dataset pruning.

Automated hyperparameter optimization (HPO) is essential in modern machine learning to simplify the process and improve model performance. Despite its importance, HPO faces significant hurdles, such as expensive function evaluations and unclear optimization goals. While grid search is a common method, it has limitations in handling complex spaces and continuous parameters. Although HPO shows promise for transforming machine learning, its widespread use is limited by ongoing challenges. Overcoming these obstacles is essential to fully leverage automated hyperparameter optimization in various fields, from industry to scientific research.

In this study, the following steps are involved. The first step is to select the appropriate type of model for predicting student performance; we choose using factors such as the nature of the data, the complexity of the problem, and the desired outcome. Therefore, we employed models including decision trees, SVM, KNN, and Naïve Bayes. Second, upon selecting the models, we examine their parameters and proceed to establish the hyperparameter space. These hyperparameters, encompassing factors like learning rate and regularization strength, significantly influence the learning process of the model. Utilizing our model particulars, we construct a hyperparameter space characterized by a spectrum of values for each parameter to be explored during the tuning phase. Third, to traverse the hyperparameter space, a grid search algorithm has been employed. Grid search meticulously explores all feasible combinations of hyperparameters within predetermined ranges, while random search randomly samples hyperparameter values within specified intervals; Bayesian optimization, on the other hand, employs probabilistic models to discern the most promising regions of the hyperparameter space for further exploration. Next, to evaluate model performance and mitigate overfitting, we employ a cross-validation scheme. This entails partitioning the data into multiple subsets and training and evaluating the model repeatedly, each time utilizing a different subset for validation. Through this iterative process, we can measure the model's ability to generalize to unseen data. Finally, we tuned hyperparameters via cross-validation, and then we assessed each model configuration's performance using predefined evaluation metrics. Table 1 shows the search values/ranges for the hyperparameters of each algorithm.

The configuration demonstrating optimal performance on the validation set is designated as the final model. It is imperative to scrutinize the model's performance on an
independent test set to safeguard against overfitting during the tuning process. Adhering to these systematic procedures enables us to effectively fine-tune the hyperparameters of our machine learning models, thereby enhancing their performance and yielding superior results on our dataset. Table 2 shows the optimal parameters used for model enhancement.

Table 2 presents the parameters that were selected based on the results obtained through grid search and optimization techniques. They represent the configurations that yielded the best performance for each respective algorithm.

3.7. Prediction Methods. Four prediction/classification algorithms are utilized in this study and contrasted with one another: KNN, Naive Bayes, decision trees, and support vector machines. These algorithms are employed due to their excellent modeling abilities for classification-type prediction problems. Short descriptions of the prediction techniques are provided below.

3.7.1. K-Nearest Neighbor (KNN). K-nearest neighbors (KNN) is a fundamental machine learning algorithm widely used for classification tasks. It relies on the principle of similarity, where new data points are classified based on the majority class of their nearest neighbors in the feature space. In the context of this study, KNN is applied to predict student performance categories, such as distinction, pass, withdraw, or fail, based on student features. The algorithm calculates the cosine similarity between the attributes of each student's record and those of other students in the dataset. Based on the classes of the k-nearest neighbors, the KNN algorithm classifies new data [25]. It involves finding the top K-nearest neighbors for the class of student performance (i.e., the final result categorized as distinction, pass, withdraw, fail, etc.); the classes of the nearest neighbors are then combined to predict the unknown class. The K-nearest neighbor classifier usually applies either the Euclidean distance or the cosine similarity between the training tuples and the test tuple; for this research work, the cosine similarity approach (equation (2)) has been applied in implementing the KNN model for our prediction model. The KNN algorithm for predicting student performance based on students' historical records works as follows:

Step 1: Compute the mean final result value of every student according to the user-student performance class matrix.
Step 2: Calculate similarity based on the distance function.
Step 3: Find the K neighbors of the class by searching for the K classes closest to a specific student performance class, i.e., most similar to a specific student in terms of attributes.
Step 4: Predict the top N similar student performance classes for similar students.

In the study, the value of k in the k-nearest neighbors (kNN) algorithm was determined through grid search, a technique used to train and evaluate models using different values of k. After employing 10-fold cross-validation, the optimal value of k was found to be 8 based on performance metrics.

sim(x, x_i) = \frac{\sum_{j=1}^{n} x_j \cdot x_{ij}}{\sqrt{\sum_{j=1}^{n} x_j^2} \cdot \sqrt{\sum_{j=1}^{n} x_{ij}^2}}, (2)

which computes the similarity between the data point to be classified (x) and each point in the training dataset (x_i).

3.7.2. Support Vector Machine (SVM). The goal of support vector machines (SVM), which are a subset of generalized linear models, is to make predictions based on a linear combination of features obtained from the variables [27]. SVM translates the input data to a high-dimensional feature space, where the input data becomes more comprehensible, using both linear and nonlinear kernel functions. SVM determines the mathematical definition of a hyperplane that divides the training data into classes, with data points from the same class located on the same side of the hyperplane. Once the best hyperplane is identified, it can be used to classify new data into one of the classes [27]. The decision boundary in SVM is represented by a hyperplane, as shown in the following equation:

w · x + b = 0, (3)

where w is the weight vector (the coefficients of the features), x is the input feature vector, and b is the bias term or intercept.
follows [26]: SVM aims to maximize the margin (equation (4)) which is
4795, 2024, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2024/4067721, Wiley Online Library on [21/03/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
the distance between the decision boundary and the nearest data points of each class:

Margin = 2/‖w‖. (4)

SVM can handle nonlinearly separable data by mapping the input features into a higher-dimensional space using kernel functions. The decision boundary in the higher-dimensional space becomes linear, even if it was nonlinear in the original feature space.

In this study, the linear kernel was selected for the support vector machine (SVM), as defined in equation (5). The choice of the linear kernel was made due to its simplicity and interpretability, making it easier to understand the decision boundary and the relationship between the features and the target variable. Additionally, linear kernels are computationally efficient and can perform well when the data is linearly separable or when the number of features is high compared to the number of samples. While the linear kernel is advantageous in terms of clarity, it may not capture complex, nonlinear relationships effectively. Therefore, to address this limitation and enhance the model's predictive performance, we employed hyperparameter tuning techniques, notably grid search.

The use of grid search facilitates a systematic exploration of various model configurations, including different kernel functions (such as linear, polynomial, or radial basis functions) and their associated parameters. This approach ensures that we strike a balance between interpretability and predictive accuracy, catering to the nuances present in the dataset while still maintaining clarity in decision-making. Essentially, the initial choice of a linear kernel reflects the need for intelligibility and simplicity, while the later use of grid search enables us to optimize the model configuration with complexity and performance in mind. The aim is to develop a model that can properly anticipate outcomes in real-world settings and analyze data efficiently. To this end, we include a grid search in the SVM model.

K(xi, xj) = xi · xj. (5)

Finally, to predict the class label of a new data point x, we simply plug it into the equation of the decision boundary:

Predicted class label = sign(w · x + b). (6)

3.7.3. Decision Trees (DTs). One of the most used methods for prediction is the decision tree. Decision trees are preferred by most researchers for the following reasons: (1) decision tree outputs are more accessible and clearer, making them more transparent to the user; (2) they can be simply incorporated into a decision support system by being transformed into a collection of IF-THEN rules. To build a tree with the maximum potential prediction accuracy, this technique recursively divides the data into branches. Different criteria, including information gain and chi-square statistics, are utilized to build the tree, and the variable for each node is chosen based on these results [28]. The complete tree is built by repeating this procedure for every node. Decision trees frequently produce outcomes that are easier to understand and more accurate in decision-making. The decision tree's initial node is known as the root node, and its subsequent nodes are known as leaf nodes; the end node refers to the tree's final node. The specific algorithm employed and the number of values of the chosen variable determine how many branches the decision tree will have [28].

Decision trees aim to find the optimal splits that maximize the information gain, or minimize the entropy, at each node. Entropy is a measure of impurity in a set of data points, and information gain quantifies the reduction in entropy achieved by a split. The equation for entropy is as follows:

Entropy H(S) = −Σ_{i=1..n} pi log2(pi), (7)

where pi is the proportion of data points in class i in the set S.

3.7.4. Naïve Bayes. A straightforward probabilistic classifier, the naive Bayes classifier is based on the application of Bayes' theorem with strong independence assumptions between the features. The scalability of the naive Bayes classifier is excellent [29]; the number of parameters needed is proportional to the number of variables in the learning problem. Simple Bayes and independent Bayes are additional names for naive Bayes models. Given the class variable, naïve Bayes assumes that a feature's value is independent of all other features [29]. Despite any potential relationships between the features, a naive Bayes classifier considers each feature to contribute independently to the likelihood.
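The entropy measure in equation (7), and the information gain that decision-tree splitting maximizes, can be illustrated with a short computation. This is an illustrative sketch with made-up class counts, not the study's data; the helper names are our own:

```python
import math
from collections import Counter

def entropy(labels):
    """Equation (7): H(S) = -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Reduction in entropy achieved by splitting `parent` into `splits`."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

# Made-up final-result labels: a split that separates the classes
# well has high gain; a split whose children are as mixed as the
# parent has zero gain.
parent = ["pass"] * 6 + ["fail"] * 6
good_split = [["pass"] * 6, ["fail"] * 6]       # pure children
poor_split = [["pass"] * 3 + ["fail"] * 3] * 2  # still half/half mixed
```

Here entropy(parent) is 1 bit (a balanced two-class set), the pure split recovers the full bit of information, and the mixed split recovers none, which is exactly the quantity a decision tree compares across candidate split variables.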
To predict the class label of a new data instance, naive Bayes calculates the posterior probability P(C | X1, X2, ..., Xn) for each class C and selects the class with the highest probability. In this study, using Bayes' theorem and the naive assumption, the posterior probability can be calculated using the following equation:

P(C | X1, X2, ..., Xn) ∝ P(C) × Π_{i=1..n} P(Xi | C), (8)

where P(C) is the prior probability of class C, P(Xi | C) is the likelihood of observing feature Xi given class C, and ∝ denotes proportionality, indicating that the probabilities are scaled to sum up to 1.

3.8. Performance Measures. In this work, the effectiveness of a classification strategy was summarized using a confusion matrix. When there are more than two classes in a dataset, or when there is not an equal number of observations in each class, classification accuracy alone might be deceptive. The confusion matrix provides a clearer picture of the classification model's successes and shortcomings. Performance is measured based on precision, recall, and accuracy. Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.

The easiest performance metric to understand is accuracy, which is simply the proportion of correctly predicted observations to all observations. One might believe that a model is the best if it has a high level of accuracy [30]. Accuracy is an excellent indicator, but only when the false positive and false negative rates are nearly equal in the dataset; therefore, one has to look at other parameters to evaluate the model's performance [30]. Recall is the ratio of correctly predicted positive observations to all observations in the actual class [31]. In this study, performance is measured based on the following parameters, as shown in equations (9)-(11):

Precision = TP/(TP + FP), (9)

Recall = TP/(TP + FN), (10)

Accuracy = (TP + TN)/(TP + TN + FP + FN), (11)

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

Additionally, we employed Cohen's kappa statistic, an indicator that effectively handles both multi-class and imbalanced-class issues; machine learning problems are often multi-class, which is why we employ this measurement. Cohen's kappa is defined as [32]:

k = (po − pe)/(1 − pe), (12)

where po is the observed agreement and pe is the expected agreement.

4. Results and Discussion

4.1. Environment Setup. This research used a machine with an 11th Gen Intel Core i7-1165G7 processor, 8.00 GB RAM, and a 64-bit operating system. Special tools and programs were used to conduct the experimentation, including the Anaconda distribution of Python 3, Jupyter Notebook for data visualization, and Microsoft Excel for data handling. Python was chosen due to its easy-to-learn syntax and the availability of libraries such as NumPy, Pandas, and Scikit-learn. NumPy calculates mean values; Pandas fetches data from files, creates data frames, and handles data frames. Scikit-learn, also known as Sklearn, contains machine learning tools for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. The study used these libraries for data preprocessing and model selection and applied popular machine learning algorithms: Naive Bayes, decision trees, KNN, and SVM.

4.2. Data Preprocessing. This study employed a dataset from Wollo University, Kombolcha Institute of Technology, which includes information on students from 2017 to 2022. The original dataset had numerous missing values and duplicated records, so it underwent rigorous preprocessing with Python to address these issues and ensure the quality of the input data for the subsequent predictive modeling tasks.

4.3. Data Visualization. Various visualizations were generated to provide insight into different aspects of the dataset. Figures such as the entrance exam distribution by final result, the distribution of final result classes, the regional distribution, disability frequency, and the gender distribution among students were presented to aid in understanding the dataset's characteristics and patterns.

The final result is the outcome of a student's academic performance or achievement at the end of a certain period, typically an academic year. It encompasses the overall assessment of the student's progress, including factors such as grades, credits earned, and any additional distinctions or qualifications attained. In the context of this study, the final result indicates the culmination of a student's academic endeavors within the specified time frame, providing a comprehensive measure of their performance and success. As Figure 4 indicates, the result equivalent to the minimum requirement to join the university is dominant over the others; however, the lower level also has a high number of occurrences. The researcher analyzes the data to get a sense of what additional work should be performed to quantify and extract insights from the data. The distribution of the final result class is presented in Figure 5.

Figure 6 shows the distribution of regions by final result. As it indicates, the southwestern region has a lower count compared to the others. However, the southern region and
[Figure 4: entrance exam results (HE Qualification, Lower Than A Level, A Level or Equivalent, Post Graduate Qualification) by final result.]

Figure 5: Final result class distribution (counts for Pass, Withdrawn, Fail, and Distinction).
Sidamo region have the highest counts. Researchers used this distribution to analyze the highest final results of the students in their home regions.

The distribution of students with a disability with respect to the number of attempts to pass the admission exam is inspected in Figure 7. After analysis and feature extraction, the disability column was dropped because it is insignificant in the scope of the study. Therefore, the proposed model considers only the significant features based on the feature selection outputs.

4.4. Cluster Analysis. Clusters were identified based on final result classes, allowing for a deeper understanding of the distribution of student performance across different categories. This analysis enabled the identification of distinct clusters and their characteristics, aiding in targeted interventions and support strategies. Based on the final results, we can classify the clusters as Grade A, Grade B, and Grade C, as shown in Table 3.

The majority of the Grade A students come from the Oromia Region, Addis Ababa, and the Amhara Region. Grade C students are mostly found in the South Region, Somalia Region, and Sidamo Region. It is noticed that the Grade B and Grade C students share common regions, namely the South Region, Somalia Region, and Sidamo Region. The connection between the regions of Grade B and C students can be understood by analyzing cluster C2. Table 4 shows the distribution of the region feature.

In cluster C2, as shown in Table 5, the majority of the males are clustered, but when the other clusters are taken into account, it is inferred that more females scored Grade A when compared to males.

Grade A and B students mostly have their entrance result as "A Level or Equivalent". Slightly more than half of the Grade C students have their entrance result as "Lower than A level", as shown in Table 6. This suggests that students in Grade C at this educational level find it difficult to understand their courses and hence drop them. It is inferred that as the educational
[Figure 6: regional distribution by final result; regions: Addis Ababa, Afar Region, Amhara Region, South Region, Tigrai Region, North Region, Somalia Region, Sidamo Region, Oromia Region.]

Figure 7: Disability frequency with the number of attempts.
level increases, people's understanding of the course also increases, and the dropout rate decreases. This is the same for failed students.

4.5. Algorithms Comparison. A comparative analysis of various machine learning algorithms, including decision trees, Naïve Bayes, support vector machine (SVM), and K-nearest neighbors (KNN), was conducted to evaluate their effectiveness in predicting student outcomes. The performance of each algorithm was assessed based on metrics such as precision, recall, accuracy, and the kappa statistic.

Decision trees (DT) are widely used for classification and prediction, including predicting student performance, dropout rates, and final GPA. Naive Bayes is a popular classification algorithm due to its simplicity, computational efficiency, and high accuracy. In educational settings, Naïve Bayes has been used to predict student performance based on previous semester results, achieving the highest accuracy in forecasting graduate students' GPAs. The support vector machine is another accurate technique for student performance prediction. Ramesh et al. [33] examined the accuracy of Naïve Bayes Simple, multilayer perceptron, SMO, J48, and REP tree techniques for predicting student performance, finding the multilayer perceptron to be the most appropriate algorithm, with SMO a competitive one. In this study, we conduct a comparative study among the KNN, SVM, decision tree, and naïve Bayes classifiers.

Table 7 shows the outcomes of the prediction models that were employed in this investigation, and Table 8 displays the outcomes of the prediction algorithms after adjusting the parameters. As shown by the findings, SVM with a linear kernel gave the best prediction results before parameter adjustment, with a 95.4% accuracy rate, followed by the decision tree with a 90.9% accuracy rate, and Naive Bayes with a 77.3% accuracy rate. Parameter adjustment greatly increased the accuracy of the prediction systems: SVM Linear's prediction accuracy increased from 95.4% to 96.0%, decision tree accuracy increased from 90.9% to 93.4%, and the Naïve Bayes model's prediction accuracy increased the most, from 77.3% to 83.3%.

The prediction accuracy of Naïve Bayes is the lowest when compared to the other prediction methods. This can be attributed to Naïve Bayes assuming strong independence between the features.

The findings of this study provide a comprehensive understanding of student performance prediction in higher education. By employing rigorous data preprocessing and feature selection techniques, the study establishes a robust predictive model, ensuring the reliability of subsequent analyses.

The comparative analysis of machine learning algorithms, including support vector machine (SVM), decision tree, Naïve Bayes, and K-nearest neighbors (KNN), confirms
Table 7: Prediction results of all methods before parameter tuning.

Prediction method   Precision   Recall   Accuracy   Kappa statistic
SVM linear          0.9402      0.9789   0.9541     0.9305368
Naïve Bayes         0.8186      0.8943   0.7738     0.6545808
Decision tree       0.8890      0.8784   0.9099     0.8632813
KNN                 0.8441      0.8584   0.8538     0.8232813

Table 8: Prediction results of all methods after parameter tuning.

Prediction method   Precision   Recall   Accuracy   Kappa statistic
SVM linear          0.9418      0.9843   0.9603     0.9398497
Naïve Bayes         0.8918      0.8943   0.8332     0.7428229
Decision tree       0.9018      0.9043   0.9341     0.9004462
KNN                 0.8941      0.8984   0.8738     0.7738735
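Tables of this form can be reproduced for any dataset by fitting the four classifiers and scoring them with the metrics of Section 3.8. The sketch below uses scikit-learn, the library named in the environment setup, but the feature matrix and labels are synthetic placeholders, and the weighted averaging for multi-class precision/recall is our assumption (the paper does not state which averaging was used):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_score, recall_score)

# Synthetic stand-in for the student dataset (the real features and
# final-result labels are not reproduced here).
rng = np.random.default_rng(42)
X = rng.random((400, 6))
y = rng.integers(0, 4, size=400)  # four final-result classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# The four classifiers compared in the study; k=8 and the cosine
# metric follow the KNN description, the linear kernel the SVM one.
models = {
    "SVM linear": SVC(kernel="linear"),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=8, metric="cosine"),
}

rows = {}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    rows[name] = (
        precision_score(y_te, y_pred, average="weighted", zero_division=0),
        recall_score(y_te, y_pred, average="weighted", zero_division=0),
        accuracy_score(y_te, y_pred),     # equation (11)
        cohen_kappa_score(y_te, y_pred),  # equation (12)
    )
```

On random labels such as these, accuracy hovers near chance and kappa near zero, which is precisely why the kappa column in Tables 7 and 8 is informative: it corrects the observed agreement for the agreement expected by chance.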
their effectiveness in predicting student outcomes. These findings align with the existing literature, validating the versatility and accuracy of these classifiers in educational settings.

In the study, we employed grid search, a method used to train and assess models with various values of k in the k-nearest neighbors (kNN) algorithm. Following the application of 10-fold cross-validation, we determined the optimal value of k to be 8, as it yielded the best performance metrics.

The study uncovers patterns in student performance across regions and demographic groups, highlighting disparities and intervention opportunities. The lower prediction accuracy of Naïve Bayes (83.3%) compared to SVM (96.0%) and the decision tree (93.4%) can be attributed to its strong independence assumption, sensitivity to feature correlations, and limited model flexibility. The detailed results are presented in Table 8.

SVM's superior performance (96.0% accuracy) stems from its margin maximization, ability to handle nonlinear relationships, and robustness to overfitting. Decision trees (93.4% accuracy) excel in interpretability, handling nonlinear relationships, and identifying feature importance, making them valuable predictors. In the study, the linear kernel was selected for the support vector machine (SVM). This decision was based on several factors: the linear kernel's simplicity and interpretability, its computational efficiency, and its ability to perform well with high-dimensional data or when the data is linearly separable. These qualities make the linear kernel a suitable choice for analyzing and interpreting the decision boundary and the relationship between the features and the target variable in SVM classification.

Moreover, hyperparameter tuning of these algorithms yields an improvement in model performance over the existing methods, as shown in Table 8.

Predicting a student's performance could be helpful in various contexts related to the university-level learning process. Numerous papers have been produced that analyze distinct characteristics or aspects crucial to comprehending and enhancing pupils' academic achievement. This study has developed a model that, with the aid of historical student records, can assist students in improving their exam performance by predicting student achievement. Therefore, it is clear that the issue is one of classification, and the suggested model assigns a student to a category depending on the information provided. The methodology used affects data mining success. To lessen sample-related bias in our investigation, we used repeated k-fold cross-validation; this is one of the causes of the accurate prediction outcomes. The accuracy of the prediction models was then further increased by parameter tuning, or hyperparameter optimization. The results showed an increase in accuracy after parameter adjustment. Additionally, researchers have looked into the functions of the K-nearest neighbor, Naive Bayes, decision tree, and support vector machine classifiers. Using the dataset, we develop models, after which we assess the students' performance. The findings indicate that the decision tree is the second-best predictor, with 93.4% accuracy, and the support vector machine is the best, with 96.0% accuracy. The accuracy of Naïve Bayes is the lowest, at 83.3%. Although the constructed model can offer accurate predictions, there is still much work to be done to incorporate these proposed methods into other predictive algorithms to generate better performance and user experience.

5. Conclusions

The methodology used affects data mining success. To lessen sample-related bias in our investigation, we used repeated k-fold cross-validation; this is one of the causes of the accurate prediction outcomes. The accuracy of the prediction models was then further increased by parameter tuning, or hyperparameter optimization. The results showed an increase in accuracy after parameter adjustment. This study demonstrated how data mining tools may forecast students' grades when used with a solid methodology. It explores the effectiveness of machine learning algorithms in predicting student outcomes in higher education. The results show that the support vector machine (SVM), decision tree, and K-nearest neighbors (KNN) classifiers are more versatile and accurate than Naïve Bayes (83.3%). Naïve Bayes' lower prediction accuracy can be attributed to several factors, including its strong independence assumption, sensitivity to feature correlations, limited expressiveness, imbalanced-class distribution, and lack of model flexibility. SVM achieved the highest accuracy, 96.0%, compared to the other classifiers, the decision tree, Naïve Bayes, and KNN. SVM's superior performance is due to margin maximization, nonlinear separability, robustness to overfitting, handling of high-dimensional data, effective kernel functions, and fewer hyperparameters. Decision trees, on the other hand, achieved the second-highest accuracy of 93.4% among the classifiers evaluated in the study. Decision trees provide a transparent and interpretable model, making it easier for users to understand the decision-making process. They can handle nonlinear relationships by recursively partitioning the feature space into subsets based on feature thresholds. They also rank features based on their importance in the classification process, identifying key predictors of the target variable and providing valuable insights into the underlying data distribution. Decision trees are robust to irrelevant features or noisy data because they selectively choose
features that contribute to improving classification accuracy. Their scalability allows them to handle large volumes of data efficiently while maintaining high predictive accuracy. The outcomes of parameter adjustment greatly increased the accuracy of the three prediction systems: SVM Linear's prediction accuracy increased from 95.4% to 96.0%, decision tree accuracy increased from 90.9% to 93.4%, and the Naïve Bayes model's prediction accuracy increased the most, from 77.3% to 83.3%.

The study uses advanced machine learning algorithms to predict student performance, enhancing accuracy and enabling early intervention. It also allows for personalized interventions based on individual needs, optimizing resource allocation. The study provides insights into the effectiveness of different ML algorithms, enabling informed decision-making for educators and policymakers. It also emphasizes continuous improvement through longitudinal studies and stakeholder feedback, ensuring the models remain relevant and effective in addressing evolving challenges in education and student support. However, it has limitations, including a small sample size, a single-institution focus, and parameter tuning sensitivity. Future research should focus on larger, more diverse datasets, longitudinal analysis, incorporating additional variables, improving model interpretability, and external validation. These could enhance the robustness and generalizability of predictive models, provide deeper insights into performance factors, and improve transparency and trust in predictive models. By addressing these limitations and pursuing future directions, researchers can contribute to the development of more accurate and actionable predictive models for improving student outcomes.

Data Availability

The data used to support the findings of this study are included within the supplementary information file(s).

Disclosure

The manuscript has been posted as a preprint with reference number [4433087] at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4433087 to share theories and findings, and it received comments from scholars [34].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

I would like to take this opportunity to express my heartfelt gratitude to the Wollo University ICT teams for their assistance with data collection.

Supplementary Materials

The dataset, originating from Wollo University's Kombolcha Institute of Technology, encompasses student data from 2017 to 2022. It contains 32,582 records with eight columns: student ID, gender, region, entrance result, number of previous attempts, studied credits, disability status, and final result. This dataset facilitates demographic studies, academic trend analysis, and identification of factors influencing student outcomes. (Supplementary Materials)

References

[1] Y. Baashar, G. Alkawsi, N. Ali, H. Alhussian, and H. T. Bahbouh, "Predicting student's performance using machine learning methods: a systematic literature review," in Proceedings of the 2021 International Conference on Computer and Information Sciences (ICCOINS), pp. 357–362, Kuching, Malaysia, June 2021.
[2] S. K. Yadav and S. Pal, "Data mining: a prediction for performance improvement of engineering students using classification," 2012, https://arxiv.org/abs/1203.3832.
[3] M. Liu and D. Yu, "Towards intelligent E-learning systems," Education and Information Technologies, vol. 28, no. 7, pp. 7845–7876, 2023.
[4] T. M. Mitchell, "The discipline of machine learning," Machine Learning, vol. 9, 2006.
[5] O. Fy, A. Jet, O. Awodele, J. O. Hinmikaiye, O. Olakanmi, and J. Akinjobi, "Supervised machine learning algorithms: classification and comparison," International Journal of Computer Trends and Technology, vol. 48, no. 3, pp. 128–138, 2017.
[6] N. Delavari, S. Phon-Amnuaisuk, and M. R. Beikzadeh, "Data mining application in higher learning institutions," Informatics in Education, vol. 7, no. 1, pp. 31–54, 2008.
[7] S. Nunn, J. T. Avella, T. Kanai, and M. Kebritchi, "Learning analytics methods, benefits, and challenges in higher education: a systematic literature review," Online Learning, vol. 20, no. 2, pp. 13–29, 2016.
[8] S. Bharara, S. Sabitha, and A. Bansal, "Application of learning analytics using clustering data mining for students' disposition analysis," Education and Information Technologies, vol. 23, no. 2, pp. 957–984, 2018.
[9] B. K. Baradwaj and S. Pal, "Mining educational data to analyze students' performance," 2012, https://arxiv.org/pdf/1201.3417.pdf.
[10] K. Aulakh, R. K. Roul, and M. Kaushal, "E-learning enhancement through Educational Data Mining with Covid-19 outbreak period in backdrop: a review," International Journal of Educational Development, vol. 101, Article ID 102814, 2023.
[11] J. M. Helm, A. M. Swiergosz, H. S. Haeberle et al., "Machine learning and artificial intelligence: definitions, applications, and future directions," Current Reviews in Musculoskeletal Medicine, vol. 13, no. 1, pp. 69–76, 2020.
[12] C. K. Suryadevara, "Predictive modeling for student performance: harnessing machine learning to forecast academic marks," International Journal of Applied Science and Engineering, vol. 8, no. 12, 2018.
[13] S. Talwar, M. Talwar, V. Tarjanne, and A. Dhir, "Why retail investors trade equity during the pandemic? An application of artificial neural networks to examine behavioral biases," Psychology and Marketing, vol. 38, no. 11, pp. 2142–2163, 2021.
[14] S. Kotsiantis, K. Patriarcheas, and M. Xenos, "A combinational incremental ensemble of classifiers as a technique for predicting students' performance in distance education," Knowledge-Based Systems, vol. 23, no. 6, pp. 529–535, 2010.
[15] B. A. Sani and H. Badamasi, "Machine learning algorithms to predict student's academic performance," Bakolori Journal of General Studies, vol. 12, no. 2, pp. 3656–3671, 2021.
[16] Y. A. Alsariera, Y. Baashar, G. Alkawsi, A. Mustafa, A. A. Alkahtani, and N. Ali, "Assessment and evaluation of different machine learning algorithms for predicting student performance," Computational Intelligence and Neuroscience, vol. 2022, pp. 1–11, 2022.
[17] H. M. R. Hasan, A. K. M. S. A. Rabby, M. T. Islam, and S. A. Hossain, "Machine learning algorithm for student's performance prediction," in Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–7, Kanpur, India, July 2019.
[18] E. S. Bhutto, I. F. Siddiqui, Q. A. Arain, and M. Anwar, "Predicting students' academic performance through supervised machine learning," in Proceedings of the 2020 International Conference on Information Science and Communication Technology (ICISCT), pp. 1–6, Karachi, Pakistan, April 2020.
[19] L. H. Alamri, R. S. Almuslim, M. S. Alotibi, D. K. Alkadi, I. Ullah Khan, and N. Aslam, "Predicting student academic performance using support vector machine and random forest," in Proceedings of the 2020 3rd International Conference on Education Technology Management, pp. 100–107, London, UK, June 2020.
[20] B. Albreiki, N. Zaki, and H. Alashwal, "A systematic literature review of student performance prediction using machine learning techniques," Education Sciences, vol. 11, no. 9, p. 552, 2021.
[21] C. Janiesch, P. Zschech, and K. Heinrich, "Machine learning and deep learning," Electronic Markets, vol. 31, no. 3, pp. 685–695, 2021.
[22] I. H. Sarker, "Machine learning: algorithms, real-world applications, and research directions," SN Computer Science, vol. 2, no. 3, pp. 160–221, 2021.
[23] M. Cui et al., "Introduction to the k-means clustering algorithm based on the elbow method," Auditing in Accounting, vol. 1, no. 1, pp. 5–8, 2020.
[24] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, "Algorithms for hyper-parameter optimization," Advances in Neural Information Processing Systems, vol. 24, 2011.
[25] A. Mucherino, P. J. Papajorgji, and P. M. Pardalos, "k-nearest neighbor classification," Data Mining in Agriculture, pp. 83–106, 2009.
[26] E. Ahmed and A. Letta, "Book recommendation using collaborative filtering algorithm," Applied Computational Intelligence and Soft Computing, vol. 2023, Article ID 1514801, 12 pages, 2023.
[27] J. H. Min and Y.-C. Lee, "Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters," Expert Systems with Applications, vol. 28, no. 4, pp. 603–614, 2005.
[28] P. K. Mall, R. K. Yadav, A. K. Rai, V. Narayan, and S. Srivastava, "Early warning signs of Parkinson's disease prediction using machine learning technique," Journal of Pharmaceutical Negative Results, vol. 15, pp. 4784–4792, 2022.
[29] K. M. Al-Aidaroos, A. A. Bakar, and Z. Othman, "Naive Bayes variants in classification learning," in Proceedings of the 2010 International Conference on Information Retrieval and Knowledge Management (CAMP), pp. 276–281, Selangor, Malaysia, June 2010.
[30] P. Baldi, S. Brunak, Y. Chauvin, C. A. F. Andersen, and H. Nielsen, "Assessing the accuracy of prediction algorithms for classification: an overview," Bioinformatics, vol. 16, no. 5, pp. 412–424, 2000.
[31] D. M. W. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," 2020, https://arxiv.org/abs/2010.16061.
[32] T. Byrt, J. Bishop, and J. B. Carlin, "Bias, prevalence and kappa," Journal of Clinical Epidemiology, vol. 46, no. 5, pp. 423–429, 1993.
[33] V. Ramesh, P. Parkavi, and K. Ramar, "Predicting student performance: a statistical and data mining approach," International Journal of Computer Applications, vol. 63, no. 8, 2013.
[34] E. Ahmed, "Analysing and Predicting Student Performance," 2023, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4433087.