
Predicting the Performance of Students in Massive Open Online Courses Using Machine Learning Algorithms

Abstract - In recent years, Massive Open Online Courses (MOOCs) and online education have seen a surge in
popularity. The considerable interest in online learning has brought to light numerous issues concerning student
commitment, performance, and retention. Many academics have investigated methods to predict student outcomes,
such as performance and failure in online courses, due to the increased demands and difficulties of online
education. MOOCs have been embraced by prestigious universities and institutions as an effective platform that
allows students from all over the world to take part in these courses. Set computer-marked assessments are used to
evaluate the learning progress of the students; after the student finishes the online tests, the computer provides
instant feedback. In addition to the degree of engagement, researchers assert that the success rate of students in an
online course can be linked to their performance in the previous session. The literature has not done enough to
assess whether student performance and engagement on earlier tests could have an impact on student achievement
on subsequent tests. This paper thoroughly reviews the state-of-the-art research that uses machine learning
techniques to analyze the data of online learners and forecast their results. The contributions of this study are
identifying and classifying the characteristics of online courses used to predict learners' outcomes, determining
prediction outputs, identifying strategies and feature extraction methodologies, describing evaluation metrics,
offering a taxonomy to analyze related studies, and summarizing the field's limitations and challenges. In this
study, the Decision Tree (90.64%) and Gradient Boosting (86.64%) algorithms show the highest accuracy, with
Gradient Boosting slightly lagging behind. SVM performed moderately, while Random Forest performed poorly, as
indicated by its significantly lower accuracy score.

Keywords: Machine Learning, Student Performance Prediction, MOOCs, Learning Analytics (LA), Tutor Marked
Assessments, Computer Marked Assessments, Recursive Feature Elimination.

I. INTRODUCTION

Education is an essential tool for a better life: it fosters self-assurance and provides for essential
needs. Due to changes in technology (such as AI & ML, IoT, etc.) and teaching methods, most educational
institutions are incorporating them into their traditional teaching methods [1]. The academic performance of the
student is the key indicator of any educational progress. Hence, institutions are trying to predict the academic
success of students based on factors like gender, age, staff, and literacy. Placements and institutional rankings
improve when student performance improves, as it is a primary factor analyzed by employers [2]. Higher
educational institutions face challenges in providing quality education, formulating techniques for predicting and
assessing student performance, and meeting requirements.

Advanced teaching methods and massive open online course (MOOC) platforms take advantage of
automatic grading systems, recommenders, and adaptive systems. The lack of standard evaluation tools, the cost of
the courses, and the complexity of identifying specific requirements due to the lack of direct communication are
the main issues associated with MOOC systems. Online platforms use long-term records to evaluate student
performance and the course [3]. Numerous scholars have proposed various machine learning models for effective
analysis of academic activities that are poorly understood, from which efficient data-engineering algorithms may
be produced [4]. In general, machine learning algorithms generalize from externally provided examples to develop
general hypotheses that make predictions about future cases [5].

On the other hand, data mining is key to filtering through a massive amount of data to find relevant
information, and decision-making is supported by it. Data mining is applicable in many ways in educational data
analysis [6]. Gathering and analysing data from learners to improve learning outcomes and enhance learning
abilities is the main task of learning analytics [7]. This need can be met, and improvements to course design and
delivery can be recommended, by classifying students based on their data. To scrutinize the factors impacting
student performance and motivation, the main goal is to identify significant criteria in an academic environment
and to examine the associations between these criteria using the ideas of learning analytics and educational data
mining [8].

The study also compares different machine learning algorithms, including SVM, Decision Tree,
Random Forest and Gradient Boosting, to estimate their predictive performance in predicting student outcomes.
The work addresses technical gaps in predicting student performance by focusing on alternative algorithms rather
than artificial neural networks. The proposed model stands out with its innovative clustering technique,
comprehensive comparative analysis, and practical application in forecasting student performance. The proposed
models offer new insight into determining the most critical learning activities and help educators keep track of
student performance in a timely manner. To the best of our knowledge, student performance has previously been
estimated in online courses using only two targets, "success" and "fail". Our model predicts performance with
three class labels: "success", "fail" and "withdrew".

II. RELATED WORK

In an online learning environment, improving student performance is a key application of machine
learning. Researchers have adopted different strategies to monitor performance [10]. The Field Analysis Method
(FAM) was proposed to predict student performance in an intelligent tutoring system (ITS), taking into account the
difficulty level of assessments based on the Item Response Theory concept [9][10]. The difficulty level of tasks can
provide an estimate of the relationship between students' performance and assessment questions. The outcomes
reveal that incorporating the latent factors into the estimates of student performance can significantly improve the
prediction [10].

In MOOCs, learner achievement is measured by how learners engage with and progress through their
learning. According to researchers, learning analytics and machine learning are practical tools for tracking student
information. They also believe that ML can help capture data about the academic process, giving researchers the
ability to both visualize and analyze the data collected from each learner cohort. Consequently, comparable
courses can be used to learn an accurate predictive model [11][12][13]. The final performance of students in an
online course is predicted from their quiz scores and marks on the first assessment, combined with social factors [14].

Two predictive models were presented. Logistic regression was used in the first model to predict
whether students received an ordinary or distinction certificate. Logistic regression was also used in the second
predictive model to predict whether or not students achieved certification. The results showed that the most
influential factor in earning a distinction is the number of peer assessments. The most reliable indicator for
receiving a certificate was found to be the average quiz results. The distinction and ordinary model accuracy rates
were 92.6% for the first model and 79.6% for the second model, respectively [14]. Logistic regression has some
limitations despite its advantages: it assumes that the independent variables have a linear relationship.

The association between Virtual Learning Environment (VLE) data and student performance has been
examined at the University of Maryland, Baltimore County (UMBC) [12]. LA was applied through the deployment
of the Check My Activity (CMA) tool. The courses showed that students who engage with the course frequently
are more likely to earn a grade of C or higher than those who do not [12]. Despite their various advantages, VLE
tools face challenges that limit their effectiveness. One major issue is the reliance on user engagement, as students
must consistently interact with the platform to benefit from its features.

III. PROPOSED METHODOLOGY

The process we followed to reach our results is presented and described in Figure 1. This
system aims to provide a robust methodology for early prediction and classification of student outcomes
such as "Pass," "Fail," or "Withdrawn". The prediction system is built on a foundation of ML models, namely
SVM, Random Forest (RF), Decision Tree (DT), and Gradient Boosting (GBM). Each model is evaluated using
key metrics such as Accuracy, Precision, Recall, F1-Score, and RMSE. The feature selection process employs
Recursive Feature Elimination (RFE) to prioritize the most influential variables, ensuring the models remain both
efficient and interpretable.
Figure 1: Demonstration of Proposed System
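As an illustration only, the following is a minimal sketch of such a pipeline using scikit-learn; the feature matrix X and three-class label vector y are assumed to be available after preprocessing, and the hyperparameters simply mirror the values reported later in this paper.

```python
# Minimal pipeline sketch (assumptions: scikit-learn; a preprocessed
# feature matrix X and three-class labels y already exist).
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Recursive Feature Elimination keeps the 4 most influential variables.
selector = RFE(estimator=DecisionTreeClassifier(random_state=42), n_features_to_select=4)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

models = {
    "SVM": SVC(kernel="rbf", C=2.0, gamma="scale"),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    y_pred = model.predict(X_test_sel)
    acc = accuracy_score(y_test, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")
    print(f"{name}: accuracy={acc:.3f}, precision={p:.3f}, recall={r:.3f}, f1={f1:.3f}")
```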

3.1. FEATURE EXTRACTION
The Chi-Square test identifies the top 4 features most highly correlated with the target variable, which are then fed
into the classification models. After feature extraction, the feature set is reduced to 4 key features, denoted by

X = [x1, x2, x3, x4] -------------------(1)
Y ∈ {0, 1} -------------------(2)

where X and Y represent the feature vector and the target variable, with Y = 0 denoting poor performance and
Y = 1 denoting successful performance. The Chi-Square statistic is

χ² = Σ_k (O_k − E_k)² / E_k ---------------(3)

where O_k and E_k are the observed frequency and the expected frequency, respectively.
The extracted features help the models to differentiate between students likely to perform well and those at
risk of dropping out.
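A hedged sketch of this selection step, assuming the features are non-negative (e.g., counts or one-hot encodings) as scikit-learn's chi2 scorer requires, and reusing the hypothetical X_train, y_train from the earlier sketch:

```python
# Chi-Square feature selection sketch (assumption: X_train holds
# non-negative feature values, as required by sklearn's chi2 scorer).
from sklearn.feature_selection import SelectKBest, chi2

selector = SelectKBest(score_func=chi2, k=4)      # keep the top 4 features
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)
print("Chi-square scores per feature:", selector.scores_)
```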

3.2. SUPPORT VECTOR MACHINE


The SVM module leverages the svm.SVC class configured with an RBF kernel, a cost parameter C = 2.0,
and gamma='scale' to handle non-linear data effectively. SVM finds the optimal hyperplane that maximizes
the margin between two classes. For linearly separable data, the hyperplane equation can be represented as:

w · x + b = 0 ---------------------------(4)

where w is the weight vector and b is the bias term.
Optimization:

min (1/2)||w||²  subject to  y_i (w · x_i + b) ≥ 1 for all i ----------------------(5)

Kernel function: for non-linearly separable data, a kernel function K(x_i, x_j) maps the data into higher
dimensions; with the RBF kernel used here,

K(x_i, x_j) = exp(−γ ||x_i − x_j||²) ------------------------------(6)

Decision function:

f(x) = sign(Σ_i α_i y_i K(x_i, x) + b) -----------------------------------(7)
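A minimal sketch of this configuration with scikit-learn's svm.SVC; the training matrices are assumed to be the selected features from Section 3.1:

```python
# SVM sketch with the configuration described above (RBF kernel,
# C=2.0, gamma='scale'); X_train_sel / X_test_sel come from Section 3.1.
from sklearn.svm import SVC

svm_clf = SVC(kernel="rbf", C=2.0, gamma="scale")
svm_clf.fit(X_train_sel, y_train)
svm_pred = svm_clf.predict(X_test_sel)
```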

3.3. RANDOM FOREST (RF)


In the training stage, several decision trees are constructed and their results are merged to obtain better
accuracy and reduce over-fitting. In the implementation, parameters such as n_estimators=1 and
max_depth=0.9 restrict the model's ensemble capability, causing it to behave similarly to a single tree.
Prediction aggregation for classification:

ŷ = majority_vote{T_1(x), T_2(x), …, T_K(x)} ---------------------------------(8)

where T_k(x) is the prediction from the k-th tree.
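A sketch of this setup is shown below. Note that the reported max_depth=0.9 is not a valid value in scikit-learn (max_depth must be a positive integer or None), so the sketch substitutes max_depth=1 as an assumption to reproduce the described single, heavily restricted tree.

```python
# Random Forest sketch. The paper reports n_estimators=1 and max_depth=0.9;
# scikit-learn requires an integer max_depth, so max_depth=1 is assumed here
# to mimic the described behaviour of a single, heavily restricted tree.
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(n_estimators=1, max_depth=1, random_state=42)
rf_clf.fit(X_train_sel, y_train)
rf_pred = rf_clf.predict(X_test_sel)
```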

3.4. DECISION TREE


Based on feature thresholds, the decision tree splits the dataset recursively. The splits maximize class
separation using Gini Impurity or Information Gain. The model is trained using parameters such as
n_estimators=10, learning_rate=0.2, and max_depth=2; the n_estimators and learning_rate settings apply to the
Gradient Boosting ensemble described below.
a. Gini Impurity

G = 1 − Σ_{i=1}^{C} p_i² ---------------------------------------------------(9)

where p_i is the proportion of samples belonging to class i and C is the total number of classes.
b. Information Gain

IG(S, A) = H(S) − Σ_v (|S_v| / |S|) H(S_v) ---------------------------------(10)

where H(S) is the entropy of set S and H(S_v) is the entropy of subset S_v.
c. Entropy

H(S) = −Σ_i p_i log2 p_i -------------------------------(11)

This process repeats recursively, creating branches until all data points are classified or a stopping criterion
is met (e.g., max depth).
• Input features: X′ = [f1′, f2′, f3′, f4′]
• Output: classification as good performer (y = 1) or poor performer (y = 0).
For Gradient Boosting, initialize the model with a constant value:

F_0(x) = argmin_c Σ_i L(y_i, c) ---------------------(12)

Additive model:

F_m(x) = F_{m−1}(x) + η h_m(x) ------------------(13)

where F_m(x) is the model at iteration m, η is the learning rate and h_m(x) is a weak learner. Gradient descent
minimizes the loss function:

g_i = −[∂L(y_i, F(x_i)) / ∂F(x_i)]_{F = F_{m−1}} ---------------(14)

The weak learner h_m(x) is fitted to the pseudo-residuals g_i.
Loss function: cross-entropy loss for classification:

L(y, ŷ) = −Σ_i [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)] ----------------------(15)
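A hedged sketch of the two tree-based models with the hyperparameters stated above; the Decision Tree uses the Gini criterion, while n_estimators and learning_rate configure the boosted ensemble:

```python
# Decision Tree and Gradient Boosting sketch with the stated hyperparameters
# (n_estimators=10, learning_rate=0.2, max_depth=2 for the boosted ensemble).
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

dt_clf = DecisionTreeClassifier(criterion="gini", random_state=42)
dt_clf.fit(X_train_sel, y_train)
dt_pred = dt_clf.predict(X_test_sel)

gb_clf = GradientBoostingClassifier(n_estimators=10, learning_rate=0.2,
                                    max_depth=2, random_state=42)
gb_clf.fit(X_train_sel, y_train)
gb_pred = gb_clf.predict(X_test_sel)
```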

3.5. EVALUATION METRICS


The performance of the classification model has been evaluated using various evaluation metrics
like accuracy, sensitivity, specificity, precision, recall, f1-measure, MSE, RMSE, MAE and ROC curve
(AUC).

Table 1. The performance metrics used for classification and regression

Metric          Formula
Precision (P)   TP / (TP + FP)
Recall (R)      TP / (TP + FN)
Accuracy        (TP + TN) / (TP + TN + FP + FN)
F1-score        2 · P · R / (P + R)
MSE             (1/n) Σ_i (y_i − ŷ_i)²
RMSE            √MSE
MAE             (1/n) Σ_i |y_i − ŷ_i|
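These metrics can be computed directly with scikit-learn; a short sketch, assuming y_test and a model's predictions y_pred from the previous sections:

```python
# Computing the metrics listed above; y_test and y_pred are assumed to come
# from one of the fitted models in the previous sections.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, mean_absolute_error)
from sklearn.preprocessing import LabelEncoder

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average="weighted", zero_division=0)
rec = recall_score(y_test, y_pred, average="weighted", zero_division=0)
f1 = f1_score(y_test, y_pred, average="weighted", zero_division=0)

# The error metrics need numeric values, so class labels are integer-encoded
# here purely for illustration.
enc = LabelEncoder().fit(list(y_test) + list(y_pred))
mse = mean_squared_error(enc.transform(y_test), enc.transform(y_pred))
rmse = np.sqrt(mse)
mae = mean_absolute_error(enc.transform(y_test), enc.transform(y_pred))
print(f"Accuracy={acc:.3f}, Precision={prec:.3f}, Recall={rec:.3f}, "
      f"F1={f1:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```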

IV. RESULTS AND DISCUSSION


This study was performed on the OULAD dataset to design predictive models for student performance in
assessments and overall course success. The dataset contains a total of 2,616 records, of which 2,092 are
allocated for training and 524 for testing. The dataset consists of 14 original features such as subject,
course, CRN, course title, A+, A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F and W, which were subsequently
reduced to 4 extracted features based on their relevance using feature selection techniques.

Figure 2: Sample Dataset


Figure 3: Results of i) Decision Tree, ii) Random Forest, iii) Gradient Boosting and iv) SVM

Table 2. Evaluation Results


Algorithm                AUC     CA (%)   Precision   Recall   F1-score
Decision Tree            0.94    89.50    0.91        0.91     0.91
Gradient Boosting        0.98    86.64    0.82        0.83     0.87
Random Forest            0.40    4.38     0.00        0.04     0.00
Support Vector Machine   0.97    69.27    0.81        0.63     0.73

The Decision Tree algorithm achieved the highest accuracy of 89.50%, with a macro-average F1-
score of 82% and a weighted-average F1-score of 91%. The Gradient Boosting algorithm also performed
well, achieving an accuracy of 86.64%. Its macro-average F1-score is 72%, while the weighted-average F1-
score is 87%. The Random Forest algorithm performed poorly, with an accuracy of only 4.39%. Both its
macro-average and weighted-average F1-scores were very low, indicating that it struggled to generalize on
this dataset. The SVM algorithm achieved an accuracy of 69.27%, with a macro-average F1-score of 41%
and a weighted-average F1-score of 69%. Its performance, while moderate, indicates that it struggles with
datasets containing imbalanced or non-linear relationships, as evidenced by its limited ability to capture all
performance classes accurately.

Figure 4: Confusion matrices of Decision Tree, Random Forest, Gradient Boosting and Support Vector Machine
Figure 5: Heat maps of Decision Tree, Random Forest, Gradient Boosting and Support Vector Machine

Figure 6: Classification of Student Performance Reason


Based on the trained models, the system classifies individual students. The output predicts either
Good Performance (y = 1) or the reason for Poor Performance (y = 0). For example: Predicted: "Reason for Poor
Performance: Dropout", with Extracted Features: 0 (indicating the key indicators for dropout). For each sample, the
model predicts whether the reason for poor performance is a "dropout" (Extracted Feature: 0) or associated with good
performance (Extracted Feature: 4). An illustrative prediction call is sketched below.
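The sketch queries one of the previously fitted models; the feature vector and the dt_clf model name are hypothetical and only indicate how such a per-student prediction might look:

```python
# Illustrative query of a trained model for a single student; the feature
# values below are made up and dt_clf refers to the earlier sketch.
sample = [[0.42, 0.10, 0.73, 0.55]]      # 4 selected feature values (hypothetical)
predicted = dt_clf.predict(sample)[0]
print("Predicted outcome:", predicted)   # e.g. good performance (y=1) or a
                                         # reason for poor performance (y=0)
```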

Figure 7: Accuracy of various prediction models


Figure 7 presents a visual comparison of the performance of the different machine learning
algorithms. From the figure it is observed that Decision Tree gives the highest accuracy (90.64%), followed by
Gradient Boosting (86.64%), Support Vector Machine (69.27%) and Random Forest (4.38%), which is very low.
The Decision Tree and Gradient Boosting algorithms show the highest accuracy, with Gradient Boosting
slightly lagging behind. SVM performed moderately, while Random Forest performed poorly, as indicated
by its significantly lower accuracy score.
Figure 8: Decision Tree, Random Forest, Gradient Boosting and Support Vector Machine ROC

Table 3. Comparative Summary of Models


Algorithm           Accuracy (%)   Key Characteristics
Decision Tree       89.50          Simple, interpretable, and handles non-linear relationships.
Gradient Boosting   86.64          Handles complex data and corrects errors iteratively; prone to overfitting without tuning.
Random Forest       4.39           Underperformed due to possible data imbalance or insufficient hyperparameter tuning.
SVM                 69.27          Effective for smaller datasets but limited by overlapping feature distributions.

The results validate the efficiency of feature selection in improving model performance,
and the Decision Tree algorithm emerged as the most reliable model for accurate predictions in
this application. Finally, the results are analyzed to uncover meaningful insights: understanding
how different features impact student performance, identifying potential areas of improvement in
the model, and visualizing the results for better interpretability.

V. CONCLUSION

In this research, we established a data-driven approach to predicting learning performance
on a MOOC platform. Nowadays, Massive Open Online Courses (MOOCs) and online education have
seen a surge in popularity. In a data-driven environment, ML is a tool for predicting student performance in
online courses, with the goal of enhancing educational outcomes. By leveraging various ML models, we
identify key factors that influence student success and develop predictive models that can forecast
academic performance. The results of the research demonstrated the effectiveness of these models in
providing accurate predictions, with Decision Tree and Gradient Boosting outperforming the other
algorithms in terms of accuracy, making them the preferred choices for predicting student success. A
critical finding of this research is the importance of prior academic performance in predicting future
success. The predictive models confirmed that factors such as previous grades, engagement with course
materials, and demographic features play significant roles in determining whether a student will pass, fail,
or withdraw from a course.

The analysis showed that the incorporation of both static features and dynamic features
significantly improved the accuracy of predictions, especially when using ensemble methods such as
Gradient Boosting. This paper also highlighted the challenges posed by class imbalance and
high-dimensional data, issues that were addressed through techniques like feature selection and data
balancing. This paper emphasizes the increasing relevance of Learning Analytics (LA) in education, which
utilizes data to understand student behaviours, predict outcomes, and implement timely interventions. The
ability to predict student performance early in the course provides an opportunity for educators to identify
at-risk students and offer targeted support, potentially improving retention rates and overall student success.
Furthermore, this paper highlights the need for effective model deployment in real-world educational
settings. By developing models that predict student performance, educational institutions can leverage these
tools to enhance course design, adapt content delivery, and allocate resources more efficiently. This
approach not only improves the learning experience for students but also ensures that educational
institutions can effectively support their students and optimize the learning process. In summary, this paper
demonstrates the power of machine learning in educational data mining, offering a powerful tool for
predicting student performance and supporting timely interventions. By combining predictive analytics
with adaptive learning strategies, educational institutions can foster more inclusive, effective, and data-
driven learning environments. As technology continues to advance, these models will evolve, further
enhancing their ability to support student success and contribute to the ongoing transformation of
education.
