
A Minor Project Report

on
STUDENT PERFORMANCE PREDICTION

Submitted to
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA
BHOPAL (MP)

MINOR PROJECT – II REPORT

Submitted by
Vibhuti Shrivastava 0111CS201191
Vidushi Deshmukh 0111CS201193

Department of Computer Science & Engineering

Technocrats Institute of Technology, Bhopal (MP)


Session 2020-2024
Technocrats Institute of Technology,
Bhopal (MP)
Department of Computer Science & Engineering

CERTIFICATE
This is to certify that the work embodied in this report entitled “Student Performance Prediction” has been satisfactorily completed by Vibhuti Shrivastava and Vidushi Deshmukh. It is a bona fide piece of work, carried out under our guidance in the department of computer science and information technology, Technocrats Institute of Technology, Bhopal, in partial fulfilment of the requirements for the degree of Bachelor of Technology during the academic session 2023.

Dr. Kiran Pandey Dr. Manoj Tyagi


(Professor CSE Department) (HOD CSE Department)
TECHNOCRATS INSTITUTE OF TECHNOLOGY, BHOPAL

Department of Computer Science & Engineering

DECLARATION

We, VIBHUTI SHRIVASTAVA and VIDUSHI DESHMUKH, students of BACHELOR of ENGINEERING in COMPUTER SCIENCE ENGINEERING, Session 2022-23, Technocrats Institute of Technology, Bhopal, M.P., hereby declare that the work presented in this project report entitled “STUDENT PERFORMANCE PREDICTION” is the outcome of our own work, is bona fide and correct to the best of our knowledge, and has been carried out taking care of Engineering Ethics.

VIBHUTI SHRIVASTAVA (0111CS201191)


VIDUSHI DESHMUKH (0111CS201193)

Date:
ACKNOWLEDGEMENT

I deem it my privilege to extend my profound gratitude and appreciation to all those who have been directly or indirectly involved in making this project a great success. It gives me immense pleasure to express my deepest sense of gratitude and sincere thanks to my respected guide, Dr. Kiran Pandey, for the valuable guidance, encouragement and help with this work, and for the keen interest, continued encouragement and support shown throughout.

I would also like to express my sincere thanks to Dr. Asif Ullah Khan, Director of Technocrats Institute of Technology, Bhopal, and Dr. Manoj Tyagi, Head of the Department of Computer Science and Engineering, for providing me with all the moral support and necessary help. My sincere appreciation and thanks also go to my family members and friends for their keen interest, continued encouragement and support.
VIBHUTI SHRIVASTAVA
(0111CS201191)
VIDUSHI DESHMUKH
(0111CS201193)

INDEX

1. Abstract
2. Introduction
2.1 Background and problem motivation
2.2 Overall aim
2.3 Problem statement
2.4 Research questions
2.5 Scope
3. Objective of the Project
4. Scope of the Project
5. Literature Review
6. Challenges
7. Problem Statement
8. Related Literature Survey
8.1 Related surveys
8.2 Text classification using machine learning methods
8.3 Text classification algorithms
8.4 Student performance evaluation in educational data mining
8.5 Using TensorFlow to support educational problems
8.6 Tracking and predicting student performance in degree programs
9. Software and Hardware Requirements
9.1 Proposed methodology
9.2 Machine learning
10. Software Model
11. Module Description
11.1 Scientific method description
11.2 Project method description
11.3 Evaluation method
12. Result Analysis
13. Conclusion
14. Application
15. Future Scope
15.1 Future work
15.2 The degrees of each course
15.3 Bigger dataset for SVM and ANN
15.4 Other types of algorithms
16. References
ABSTRACT

Student performance is a success factor in higher education institutions. The goal of any educational institution is to offer the best educational experience and knowledge to its students. Identifying the students who need extra support, and taking appropriate actions to enhance their performance, plays an important role in achieving that goal.
In this project, machine-learning techniques have been used to build a classifier that can predict the performance of the students.

Machine learning techniques in educational data mining aim to develop models that discover meaningful hidden patterns and extract useful information from educational settings. The dataset was fetched from Ladok and consisted of anonymized higher education student credit data from a multitude of courses. The algorithms were run on TensorFlow, with Keras as the API. They were built, trained, and evaluated entirely on Google Colab. The source code is written in Python. The study's non-technical goal was to find a prediction pattern for student performance and to provide a technical framework that gives feedback to students and university faculty.
INTRODUCTION

Educational data mining refers to data mining techniques used to analyze educational data. Educational institutions store a vast quantity of data in order to keep track of students, faculty, and courses. This data contains personal and academic information about students, such as gender, nationality, grade, semester, and so forth. Various universities and independent organizations have begun to use educational data mining to improve the lives of their students.

This project uses a script for data analysis and visualization. The script is built on popular Python data-analysis and machine-learning libraries such as Pandas, Seaborn, Matplotlib, and Scikit-learn. It reads a CSV file containing data about students' academic performance and behavior, and it lets the user generate various graphs and charts based on different aspects of the data. The user can choose to visualize the data by category, such as gender, nationality, grade, and semester. The user can also generate different types of plots, such as bar charts and count plots.
Additionally, the script includes some preprocessing steps in which certain columns are dropped from the data to prepare it for further analysis.
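The load, drop and count-plot steps described above can be sketched as follows. This is a minimal sketch only. It assumes a CSV with hypothetical columns `student_id`, `gender`, `nationality`, `grade`, `semester` and `performance`, because the real file's column names, and the columns the script actually drops, are not given in the text. The helper names `load_and_prepare`, `category_counts` and `plot_counts` are also illustrative. Seaborn and Matplotlib are imported only inside the plotting helper, so the data-preparation part can run without them.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file; the column names are
# assumptions chosen to match the categories named in the text.
SAMPLE_CSV = """student_id,gender,nationality,grade,semester,performance
1,M,KW,G-04,F,Medium
2,F,KW,G-04,F,High
3,M,Jordan,G-07,S,High
4,F,Jordan,G-07,S,Medium
5,M,KW,G-02,F,Low
"""


def load_and_prepare(source, drop_columns=("student_id",)):
    """Read the CSV and drop columns that are not needed for analysis."""
    df = pd.read_csv(source)
    # errors="ignore" leaves the frame unchanged if a listed column is absent.
    return df.drop(columns=list(drop_columns), errors="ignore")


def category_counts(df, column):
    """Number of students in each category of `column`."""
    return df[column].value_counts()


def plot_counts(df, column, hue=None):
    """Draw a count plot of `column`, optionally split by `hue`."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(data=df, x=column, hue=hue)
    ax.set_title(f"Students by {column}")
    plt.show()


if __name__ == "__main__":
    data = load_and_prepare(io.StringIO(SAMPLE_CSV))
    print(category_counts(data, "gender"))
    # plot_counts(data, "gender", hue="performance")  # needs seaborn/matplotlib
```

To use the real file, pass its path to `load_and_prepare` in place of the `StringIO` object.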

2.1 Background and problem motivation:

Some students drop out of university programs in their first year. According to a study in 2018, Sweden had a dropout rate of 29% among full-time bachelor's students. There are many possible factors behind this. In this study, motivation and progress perception are hypothesized to be two of the main causes.
Furthermore, the study investigates a possible way of providing constructive feedback and, in a sense, gamifying the study process to increase students' university program completion rate.
One way of investigating this pattern of behavior is with the help of Machine Learning (ML). This is certainly a broad subject, with many problems and challenges to be tackled. This study is interested in the general category of classification. According to [2], classification in ML is the process of determining which category an observation belongs to. In essence, it is a process that establishes a relationship between a dependent variable, which is categorical, and independent variables, which can be categorical or numerical.
Many decision-making problems fall within the category of classification. One study suggested that there are two empirical learning techniques for this type of problem. The first is Statistical Pattern Recognition (SPR). The second is Machine Learning Techniques (MLT), which create decision trees and production rules. This study looks at the latter category and compares different MLTs.

2.2 Overall aim:

This study aims to help students perform better academically by analyzing their higher education credit data and identifying the students who need help with their studies. Ideally, an ML-based framework could be created to help university staff and students with student performance and program completion rate.
This study will also look at possible patterns for students who eventually drop out or who will have difficulty completing the program. In a technical sense, the purpose of this quantitative study is to evaluate two or more ML algorithms/models in terms of accuracy, precision, recall, F1 score, and prediction. These parameters are explained in detail in the Method section.
The expected outcome is an ML model that recognizes patterns of student success based on previous higher education credit data. This research aims to help students and university staff assess students' progress. It also aims to predict future academic outcomes (dropout/completion rate) based on current performance. From a technical point of view, this study provides a comparison of different ML algorithms and a performance review in terms of the previously specified parameters.

2.3 Problem statement:

Students are known to drop out of university bachelor's programs at a certain rate. There are many reasons for this, but for the purposes of this study, the main interest is motivation and progress perception. In the technical sense, however, the study compares different ML models used in ML frameworks. The aim is to reach a consensus on which model performs better, in terms of the chosen parameters, at predicting student performance.

This study will test two or more ML models in ML frameworks, depending on how labor-intensive and time-consuming the process is. It will use higher education credit data and produce a comparison of the different ML models. This problem affects universities and students across the world, with many societal and economic consequences: for instance, social stigma, fewer job opportunities, and lower salaries, to name a few.

Therefore, the problem statement of this thesis is to determine to what extent student data and machine learning can be used to identify students who need help with their university academic performance.

2.4 Research Questions:

How is performance affected, in terms of accuracy, precision, recall, F1 score, and prediction, when running the SVM vs ANN models on TensorFlow using student datasets from Ladok?
This main research question is broken down further into the following research questions:

1. Are ML algorithms an appropriate means of finding these student patterns?

2. How can ML models (SVM and ANN) be implemented, and which model performs best in terms of accuracy, precision, recall, and F1 score on an open-source ML framework using higher education credit data?

3. How much data is needed to draw these conclusions when classifying the dataset?

The scientific knowledge to be gained from the thesis will be threefold:
1- Determining the “appropriateness”, or legitimacy, of using ML algorithms to predict student performance using only datasets from Ladok.

2- Seeing how SVM and ANN behave under these conditions, and providing insights on improving ML model performance for this type of data.

3- Providing an objective method for determining whether the size of the dataset is sufficient.

2.5 Scope:

There are many open-source ML frameworks to evaluate and analyze. This thesis focuses on two transformer models. A transformer is a deep learning model that uses the attention mechanism to weigh the significance of each element of the input data differently. Its primary applications are in Natural Language Processing (NLP) and Computer Vision (CV). This study is interested in NLP transformers and focuses on their use for binary text classification in open-source ML frameworks. Classification falls into the category of supervised ML. This study does not cover unsupervised ML. Nor does it cover other types of classification, such as multi-class, multi-label, or imbalanced classification, or regression.

OBJECTIVE OF THE PROJECT

The objective of this project is to develop a predictive model that can accurately predict a student's performance. The prediction is based on various factors such as demographic information, academic background, and other relevant features. Machine learning is used to discover models or patterns in data, which helps in decision-making.
The main aim of the system is to predict the future performance of a student using data about the student such as gender, nationality, grade, semester, and so forth.
The model should be able to identify students who are at risk of falling behind and provide insights into potential interventions that can help improve their academic outcomes.

Ultimately, the goal is to use this model to help educators and administrators make data-driven decisions that can improve the overall academic success of students.

SCOPE OF THE PROJECT

Educational organizations are an important part of our society and play a vital role in the growth and development of any nation. Educational data mining is the application of data mining to educational data. It is an emerging interdisciplinary research area that deals with developing methods to explore data originating in an educational context.
Educational data mining is an emerging trend. It is designed to automatically explore the unique types of data found in large repositories of education-related data. Quite often, this data is extensive, fine-grained, and precise. The main objective of this project is to use data mining methodologies to study students' performance in their courses. Data mining provides many tasks that could be used to study student performance. Here, the classification task is used to evaluate student performance. Of the many approaches available for data classification, the decision tree method is used.
Information such as gender, nationality, grade, and semester was collected from the student management system to predict students' performance. Faculty cannot easily assess students' abilities, interests, and academic performance; this project can help them do so and then help students improve. Left unaddressed, weak performance can lead to poor university results and can affect students' placements and careers. This in turn keeps the institute from fulfilling its mission and vision. If the project succeeds, it will be a great help to faculty and universities in enhancing the education system.
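The decision-tree classification described in this section can be illustrated with scikit-learn. This is a sketch under assumptions, not the project's code. The six records are invented, and the encoding choice (one-hot via `pandas.get_dummies`) is an assumption. Only the attribute names (gender, nationality, grade, semester) come from the text. `export_text` prints the fitted tree as nested IF-THEN style rules, which is what makes decision trees easy to read.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical records using attributes named in the text; values are illustrative only.
records = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M", "F"],
    "nationality": ["KW", "KW", "Jordan", "Jordan", "KW", "Jordan"],
    "grade": ["G-04", "G-04", "G-07", "G-07", "G-02", "G-02"],
    "semester": ["F", "S", "F", "S", "S", "F"],
    "performance": ["Low", "High", "Low", "High", "High", "Low"],
})
FEATURES = ["gender", "nationality", "grade", "semester"]

# scikit-learn trees need numeric inputs, so one-hot encode the categories.
X = pd.get_dummies(records[FEATURES]).astype(int)
y = records["performance"]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders the learned splits as nested IF-THEN style rules.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)

# Classify a new (hypothetical) student, aligning dummy columns with the training set.
new_student = pd.DataFrame(
    [{"gender": "F", "nationality": "KW", "grade": "G-07", "semester": "S"}]
)
new_X = pd.get_dummies(new_student).reindex(columns=X.columns, fill_value=0).astype(int)
print(tree.predict(new_X))
```

In this toy data the semester alone separates the classes, so the printed rules contain a single split on a `semester_*` column. Real data would give deeper trees, and `max_depth` is one knob for keeping the rules readable.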

LITERATURE REVIEW

The literature review focused on the application of machine learning algorithms to forecasting student performance. Various studies have used factors such as demographic information, prior academic performance, behavioral information, and other relevant features to predict student performance. These studies used decision trees, random forests, and support vector machines, among other algorithms. Overall, the findings suggest that machine learning models can forecast student performance reliably. They can also help identify students who are at risk, enabling targeted interventions and better outcomes. More study is required to assess the efficacy and scalability of these models in actual educational contexts. However, gathering reliable and thorough data remains difficult.

CHALLENGES

There are several challenges that can be faced while building a student performance prediction project. Some of these challenges include:

1. Data quality and quantity: The availability and quality of data can significantly impact the accuracy of the model. The data should be relevant, accurate, and up-to-date.

2. Feature selection and engineering: Selecting the right features and engineering them properly can be a challenging task, as it requires domain knowledge and expertise.

3. Overfitting and underfitting: Overfitting occurs when a model is too complex and fits the training data too well, resulting in poor performance on new data. Underfitting occurs when a model is too simple and cannot capture the underlying patterns in the data.

4. Lack of interpretability: Machine learning models can be difficult to interpret, making it challenging to understand why the model makes certain predictions.

5. Deployment and scalability: Deploying the model in a real-world setting and ensuring scalability can be a complex process, requiring expertise in software engineering and infrastructure management.
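Challenge 3 (overfitting and underfitting) can be seen on a toy problem. The sketch uses synthetic, noisy samples of a sine curve; it is not student data. It fits polynomials of increasing degree with NumPy's `Polynomial.fit`, and the degrees 1, 3 and 12 are arbitrary illustrative choices. Training error can only fall as the model grows more flexible. The gap between training and test error is what signals overfitting.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_curve(x):
    return np.sin(2 * np.pi * x)


# Synthetic, noisy samples of a smooth curve (illustrative only).
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0.03, 0.97, 15)
y_train = true_curve(x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = true_curve(x_test) + rng.normal(scale=0.2, size=x_test.size)


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


errors = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; a higher degree means a more complex model.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    errors[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    train_err, test_err = errors[degree]
    print(f"degree {degree:2d}: train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Degree 1 is too simple for a sine curve, so it underfits with high error on both sets. Degree 12 nearly memorizes the 15 training points, which is the typical overfitting pattern of very low training error without a matching drop in test error.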
PROBLEM STATEMENT

The objective of this project is to develop a machine-learning model that can accurately predict the performance of students based on their demographics, past academic records, and other relevant factors. Given a set of input variables, the model should be able to predict a student's final grade in a course. This can help educators identify students who are at risk of failing and provide them with targeted interventions to improve their performance. It can also assist in developing more personalized learning plans for each student, tailored to their individual strengths and weaknesses.
Students are known to drop out of university bachelor's programs at a certain rate. There are many reasons for this, but for the purposes of this study, the main interest is motivation and progress perception. In the technical sense, however, the study compares different ML models used in ML frameworks. The aim is to reach a consensus on which model performs better, in terms of the chosen parameters, at predicting student performance.

This study will test two or more ML models in ML frameworks, depending on how labor-intensive and time-consuming the process is. It will use higher education credit data and produce a comparison of the different ML models. This problem affects universities and students across the world, with many societal and economic consequences: for instance, social stigma, fewer job opportunities, and lower salaries, to name a few.

Therefore, the problem statement of this thesis is to determine to what extent student data and machine learning can be used to identify students who need help with their university academic performance.

RELATED LITERATURE SURVEY

Many research papers have been written on this topic. Research and surveys are under way at institutions worldwide, with leading scientists working on projects to develop such systems.
Some of the latest research papers apply machine learning to predicting student performance. Over the last decades, machine learning has been successfully applied to biological data mining, image analysis, face recognition, and many other fields.
8.1 Related Surveys:
The following are two surveys on text classification methods in ML.

8.2 Text Classification Using Machine Learning Methods - A Survey: Text categorization is a difficult process because the feature vector is high-dimensional and includes noisy and irrelevant data. Many feature reduction approaches have been developed to remove unimportant features and lower the dimension of the feature vector. This article discusses how to classify text using machine learning algorithms and which features to look for.

8.3 Text Classification Algorithms: A Survey
The number of complex documents and texts has increased exponentially in recent years. This requires a deeper understanding of machine learning methods to classify texts accurately in many applications. For researchers, finding appropriate structures, architectures, and methodologies for text categorization is a challenge. This review covers different text feature extraction methods, dimensionality reduction approaches, current algorithms and strategies, and evaluation methods.

Related work:
There is a large body of ML research and development on predicting student performance. Below are some works similar to the one conducted in this thesis.

8.4 Student Performance Evaluation in Educational Data Mining: In this study [18], the researchers created two machine learning models and compared their effectiveness. The first was a Random Forest (RF) model, and the second was an ANN model. TensorFlow was used as the backend for both. The researchers aimed to use geographic and evaluation data to predict student academic achievement. They tested whether an ANN can achieve state-of-the-art performance on a significant quantity of educational data, just as it does on data from other domains. They also compared the performance of the ANN with that of the RF.

The motivation for this study was the growing interest among academics in using data mining to analyze student data. The authors believed that these advancements in data mining should also benefit the field of education. In the conclusion chapter, the authors stated that, as future work, Recurrent Neural Networks (RNN) could be used to detect students who are about to withdraw or drop out. The present research builds on this research gap (future work). Our thesis differs in the type of ML algorithms used (ANN vs SVM).

8.5 Using TensorFlow to Support Educational Problems:
This work presents the outcomes of a systematic mapping of educational data mining projects that use the TensorFlow framework. The authors also aimed to explain which kinds of problems to focus on. They further set out to identify, present, and catalog the academic publications that have covered the topic, along with the methodologies used, such as neural networks and decision trees. The authors wanted to know how many papers existed on this topic and where they came from. They also examined the difficulties that the TensorFlow framework has tackled in education and the existing techniques for solving these issues.

8.6 Tracking and Predicting Student Performance in Degree Programs:
There is a rich literature on predicting student performance using data-driven approaches. However, there is a lack of research on predicting student performance in relation to program completion (in college programs, for example). This paper [20] specifies three new challenges. First, students differ in background and course selection. Second, not all courses taken are useful for an accurate student performance prediction. Third, the prediction must consider the students' evolving progress.

SOFTWARE AND HARDWARE REQUIREMENTS

9.1 PROPOSED METHODOLOGY

Accurate predictive modelling can be achieved with several techniques, such as regression, classification, and clustering. However, classification has been observed to be one of the most popular techniques for predicting students' academic performance. Many classification methods have been used for prediction. Among them are Artificial Neural Network (ANN), decision tree, Support Vector Machine (SVM), k-Nearest Neighbour (KNN), and Naive Bayes.

Artificial Neural Networks can model non-linear and complex relationships between input and output variables. Decision trees are often used because of their clarity and simplicity in discovering patterns and making predictions. Many scholars have found decision trees easy to understand since they are based on IF-THEN rules. Support Vector Machines handle small datasets well and often generalize better than other methods. Naive Bayes classifiers are highly scalable. They require a number of parameters that is linear in the number of attributes in the learning problem.
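The classification methods listed above can be compared side by side with cross-validation. The block below is a sketch under stated assumptions. It uses scikit-learn estimators as stand-ins: `MLPClassifier` for the ANN, `SVC` for the SVM, plus the decision tree, KNN and Gaussian Naive Bayes. The report itself describes running its models on TensorFlow/Keras. The data is synthetic, generated with `make_classification`, and the hyperparameters are illustrative defaults, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for student records (not the report's dataset).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)

# Distance/gradient-based models get feature scaling; trees and Naive Bayes do not need it.
candidates = {
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
}

# 5-fold cross-validated accuracy for each candidate.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, acc in sorted(mean_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {acc:.3f}")
```

Cross-validation is used so that every model is scored on held-out folds under the same splits. Rankings on synthetic data say nothing about which model will win on the real student records.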

ML can be an overwhelming subject to tackle, as it encompasses a large body of research on different algorithms for solving complex tasks and finding patterns. This chapter discusses several concepts and algorithms that will be useful to know as the reader moves forward.
9.2 Machine Learning (ML):
ML is considered a branch of Artificial Intelligence (AI). It aims to imitate the way human beings learn to solve problems from experience. One prominent family of ML models, artificial neural networks, loosely mimics the networks of neurons in the brain.
9.2.1 Supervised vs Unsupervised ML:
Supervised learning takes two forms: classification and regression. In classification, an algorithm assigns test data to specific categories, such as distinguishing apples from oranges. Regression, the other supervised approach, uses an algorithm to identify the relationship between dependent and independent variables. Linear regression, logistic regression (which, despite its name, is mostly used for classification), and polynomial regression are three prominent techniques.
Unsupervised learning uses machine learning methods to analyze and cluster unlabeled data sets. Unsupervised learning models are used for tasks such as clustering, association, and dimensionality reduction. Clustering is a data mining technique for grouping unlabeled data by their similarities and differences. In K-means clustering, K is the number of groups, which sets the granularity of the grouping. Each data point is assigned to the group whose centroid (mean) is nearest.
The difference is that in supervised learning, the algorithm learns from a labelled training dataset by making predictions and repeatedly correcting them against the right answers. Supervised learning models tend to be more accurate than unsupervised ones, but labelling the data requires human involvement. Unsupervised learning models, on the other hand, discover the structure of unlabeled data on their own.
(Figure: An illustration of ML classification.)
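The K-means procedure described above alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal NumPy sketch of that loop follows. The function name `k_means`, the random initialization from the data points, and the toy points are illustrative assumptions. Libraries such as scikit-learn's `KMeans` add smarter initialization and multiple restarts.

```python
import numpy as np


def k_means(points, k, n_iter=100, seed=0):
    """Plain K-means: returns (labels, centroids) for a 2-D array of points."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of the assigned points; keep the old centroid if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy groups of unlabeled points.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

No labels are given to the algorithm. The grouping emerges from distances alone, which is what distinguishes it from the supervised methods above.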
9.2.2 Transformers:
Transformer models use an evolving set of mathematical techniques, called attention or self-attention. These techniques detect subtle ways in which even distant data elements in a series influence and depend on each other. Transformers are powering a surge of machine learning advancements that some call "transformer AI". They interpret text and audio in real time, allowing hearing-impaired people to join meetings and courses. They are helping researchers better understand gene chains in DNA and amino acids in proteins, which can help speed up drug development. Transformers can recognize patterns and anomalies to prevent fraud, speed up production, provide online recommendations, or improve healthcare. Every time someone searches on Google or Microsoft Bing, they use transformers.
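The core of the attention mechanism mentioned here is scaled dot-product self-attention. Each element of a sequence scores every other element, the scores are turned into weights with a softmax, and the output is a weighted mix of the elements. The NumPy sketch below is illustrative only. It covers a single attention head, with random projection matrices `w_q`, `w_k`, `w_v` and arbitrary sizes. A real transformer learns these matrices and adds multiple heads, masking and feed-forward layers.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (seq_len x d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Similarity of every position with every other, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))  # e.g. 4 token embeddings (random here)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(np.round(weights, 3))  # how much each position attends to each other position
```

Row i of `weights` shows how strongly position i draws on every position in the sequence, however far apart they are. This is the "weighing the significance of each element" that the text describes.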
9.2.3 Text Classification
Text classification is the process of categorizing natural language texts by assigning them tags from a specified set of categories. Besides manual classification by humans, automated classification APIs are also used to categorize significant sentences in a document so that important terms can be identified.
9.2.4 Binary Text Classification: Binary text classification is a subclass of text classification. It is a supervised learning technique used to predict which of two categories a piece of text belongs to. The datasets used to train this type of binary classifier are labelled.
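A binary text classifier of the kind just described needs only labelled texts and a learning algorithm. The sketch below is deliberately simpler than the transformer models discussed in the Scope section. It swaps in TF-IDF features with logistic regression from scikit-learn as a baseline. The eight comments and their labels (1 = student may need help, 0 = on track) are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled comments: 1 = student may need help, 0 = on track.
texts = [
    "I failed the exam and feel lost",
    "struggling with assignments, need help",
    "I missed many lectures and fell behind",
    "cannot understand the course material",
    "passed all courses with good grades",
    "enjoying the course and keeping up",
    "finished every assignment on time",
    "scored well in the final exam",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each text into a weighted word-count vector; logistic regression
# then learns one weight per word to separate the two classes.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

predictions = classifier.predict(["need help, I feel lost", "good grades, finished on time"])
print(predictions)
```

A transformer classifier would replace the TF-IDF step with learned contextual embeddings. The supervised setup stays the same: labelled examples in, one of two labels out.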
9.2.5 Artificial Neural Networks (ANN): ANNs, also commonly known simply as Neural Networks, make up the backbone of many modern ML systems. An ANN can be seen as a composite function made up of three components. The first is the neurons, which perform the computation on the input data. The second is the weights (parameters), which are the values the ANN must learn in order to capture the pattern. The third is the biases, which are values the ANN adds to each neuron's weighted sum of inputs before applying the activation function.
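The three components above (neurons, weights and biases) can be traced in a two-layer forward pass. This NumPy sketch uses hand-picked weights so the arithmetic is easy to follow. In a trained ANN these values would be learned, for example by Keras in TensorFlow. The layer sizes and the choice of ReLU and sigmoid activations are illustrative assumptions.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense_layer(x, weights, biases, activation):
    """Each neuron: weighted sum of its inputs, plus its bias, passed through the activation."""
    return activation(x @ weights + biases)


x = np.array([[1.0, 2.0]])  # one example with two input features

# Hidden layer: 2 inputs -> 3 neurons (hand-picked weights and biases).
w1 = np.array([[0.5, -1.0, 0.0],
               [0.25, 0.5, 1.0]])
b1 = np.array([0.0, 0.5, -1.0])

# Output layer: 3 hidden neurons -> 1 output neuron.
w2 = np.array([[1.0], [1.0], [1.0]])
b2 = np.array([-1.0])

hidden = dense_layer(x, w1, b1, relu)          # [[1.0, 0.5, 1.0]]
output = dense_layer(hidden, w2, b2, sigmoid)  # sigmoid(1.5), roughly 0.82
print(hidden, output)
```

Working through the hidden layer: the weighted sums are 1.0, 0.0 and 2.0, and adding the biases gives 1.0, 0.5 and 1.0. The output neuron then sums these (2.5), adds its bias (-1.0), and applies the sigmoid to 1.5. Training an ANN means adjusting `w1`, `b1`, `w2` and `b2` so that such outputs match the labels.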

9.2.6 Support Vector Machine (SVM): SVM is a supervised machine learning technique that can be used for both classification and regression. The goal of the SVM method is to find a hyperplane in an N-dimensional space (where N is the number of input features). The hyperplane should clearly separate the data points of different classes with as large a margin as possible. The dimension of the hyperplane is determined by the number of inputs. If there are just two inputs, the hyperplane is merely a line. When there are three inputs, the hyperplane becomes a two-dimensional plane. When the number of inputs exceeds three, it becomes difficult to visualize.
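With two inputs, the SVM hyperplane is a line, so it can be inspected directly. The sketch fits scikit-learn's `SVC` with a linear kernel to six invented, clearly separable points. It then reads the learned hyperplane w·x + b = 0 from `coef_` and `intercept_`. The data and `C=1.0` are illustrative assumptions, not the report's setup.

```python
import numpy as np
from sklearn.svm import SVC

# Two input features, so the separating hyperplane is a line (toy, hypothetical data).
X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0.
w, b = svm.coef_[0], svm.intercept_[0]
print(f"hyperplane: {w[0]:.3f}*x1 + {w[1]:.3f}*x2 + {b:.3f} = 0")

# The sign of the decision function w . x + b gives the predicted class.
print(svm.decision_function(X))
print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))

# The points closest to the hyperplane (they define the margin).
print(svm.support_vectors_)
```

Only the support vectors determine where the line sits. This is one reason SVMs can work well on small datasets, as noted in the methodology section.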
9.2.7 Random Forest (RF): RF is a classification technique made up of many decision trees. Each tree is a data structure that learns rules/patterns from the input data, and the forest combines the predictions of its trees.
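The statement that a random forest is made of many decision trees can be checked directly in scikit-learn. There, the fitted trees are exposed as `estimators_`, and `predict_proba` averages the trees' class-probability estimates. The data below is synthetic, from `make_classification`, and `n_estimators=25` is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data standing in for student records.
X, y = make_classification(n_samples=200, n_features=6, n_informative=4, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# The fitted forest is a collection of decision trees, each trained on a bootstrap sample.
print(len(forest.estimators_), type(forest.estimators_[0]).__name__)

# scikit-learn combines the trees by averaging their predicted class probabilities.
tree_average = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
print(np.allclose(tree_average, forest.predict_proba(X)))
```

Averaging many trees trained on different bootstrap samples and feature subsets reduces the variance of a single deep tree. That is the main reason forests tend to overfit less.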

SOFTWARE MODEL
Figure: The pre-defined libraries used
Figure: The choices available to plot a graph
MODULE DESCRIPTION

• Data Preprocessing: This module is responsible for loading and cleaning the data, handling missing values, and transforming the categorical features into numerical values.
• Feature Selection: This module selects the most relevant features for predicting student performance using feature selection techniques such as correlation analysis, the chi-square test, and mutual information.
• Model Development: This module builds and trains the machine learning models for predicting student performance. It uses several models, including linear regression, decision tree, random forest, and neural network, and compares their performance.
• Model Evaluation: This module evaluates the performance of each model using evaluation metrics such as mean squared error, root mean squared error, and R-squared.
• Model Selection: This module selects the best model based on the evaluation metrics and deploys it for making student performance predictions.

Overall, these modules work together to preprocess the data, select relevant features, build and train machine learning models, evaluate their performance, and select the best model for predicting student performance.
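The five modules can be chained into a single scikit-learn `Pipeline`, as sketched below under several assumptions. The data is synthetic. The column names (`study_hours`, `absences`, `gender`, `semester`) and the target `final_grade` are hypothetical. The F-test (`f_regression`) stands in for correlation-based feature selection, and `k=3` is arbitrary. The neural-network model is left out for brevity. The evaluation uses the metrics named in the text (MSE, RMSE, R-squared), and model selection picks the lowest test RMSE.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

NUMERIC = ["study_hours", "absences"]
CATEGORICAL = ["gender", "semester"]

# Synthetic, hypothetical data standing in for the real student records.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "gender": rng.choice(["M", "F"], size=n),
    "semester": rng.choice(["F", "S"], size=n),
    "study_hours": rng.uniform(0, 20, size=n),
    "absences": rng.integers(0, 30, size=n).astype(float),
})
final_grade = 40 + 2.5 * data["study_hours"] - 0.8 * data["absences"] + rng.normal(0, 5, size=n)
data.loc[rng.choice(n, size=10, replace=False), "study_hours"] = np.nan  # simulate missing values


def build_pipeline(model):
    # Data preprocessing: impute missing numbers, one-hot encode categories.
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    # Feature selection (univariate F-test), then the model itself.
    return Pipeline([
        ("prep", preprocess),
        ("select", SelectKBest(f_regression, k=3)),
        ("model", model),
    ])


X_train, X_test, y_train, y_test = train_test_split(
    data, final_grade, test_size=0.25, random_state=0
)

candidates = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Model development and evaluation.
scores = {}
for name, model in candidates.items():
    predicted = build_pipeline(model).fit(X_train, y_train).predict(X_test)
    mse = mean_squared_error(y_test, predicted)
    scores[name] = {"mse": mse, "rmse": float(np.sqrt(mse)), "r2": r2_score(y_test, predicted)}

# Model selection: the lowest test RMSE wins.
best_name = min(scores, key=lambda name: scores[name]["rmse"])
for name, s in scores.items():
    print(f"{name:18s} MSE={s['mse']:.2f} RMSE={s['rmse']:.2f} R2={s['r2']:.3f}")
print("Selected:", best_name)
```

Keeping preprocessing inside the pipeline means the imputer and encoder are fitted on the training split only. This avoids leaking information from the test split into the evaluation.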
11.1 Scientific method description:
This study used a quantitative method to approach and answer the research questions. The first research question was whether ML algorithms are an appropriate means of finding patterns in student data. As this research question is general, the approach to answering it was also general. It was addressed by the main experiment of this study: cleaning the student data and then feeding it into the different ML models in TensorFlow to generate some sort of pattern. That pattern could be Supervised (Classification vs Regression) or Unsupervised (Clustering vs Dimensionality Reduction vs Association). Once a pattern was generated, the research question was considered answered.

The second research question in this study was to implement ML models (SVM and ANN) and examine which model performed better in terms of accuracy, precision, recall, and F1 score on TensorFlow using datasets from Ladok. The approach to this research question was to define the parameters precisely, then measure and compare their values. In this case, accuracy was defined as the fraction of predictions that an ML model got right:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
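Accuracy and the other metrics named in the research questions (precision, recall and F1 score) can all be computed from the counts of correct and incorrect predictions. The plain-Python sketch below spells out those formulas. The example labels (1 = student needs help, 0 = does not) are invented. Libraries such as scikit-learn provide the same quantities as `accuracy_score`, `precision_score`, `recall_score` and `f1_score`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task, computed from counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)  # correct predictions / total predictions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = student needs help, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
print(classification_metrics(y_true, y_pred))
```

Here 5 of 8 predictions are correct, so accuracy is 0.625. There are 3 true positives, 2 false positives and 1 false negative. That gives precision 3/5 = 0.6, recall 3/4 = 0.75, and an F1 score of about 0.667.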

The third research question was to determine how much data was needed to draw these conclusions (which students needed help and which did not) when classifying the datasets. To decide whether the data was sufficient, the study compared the accuracy achieved by the trained model with a baseline accuracy. If the achieved accuracy was higher than the baseline, the amount of data used was deemed sufficient. The baseline accuracy was calculated by dividing the number of data points in the larger class by the total number of data points. This is the accuracy obtained by always predicting the majority class. A reasoning-by-analogy approach was also used: previous similar studies in applied machine learning were examined, and their results were used to estimate the amount of data needed. Weighing a trained model's accuracy against the baseline accuracy for the same data is a common way to judge an ML model. If the trained accuracy is higher, the model is deemed good enough for the data used.
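The baseline comparison described above can be sketched as follows. The sketch implements the majority-class baseline (the size of the larger class divided by the total number of data points). It also implements the rule that the data are deemed sufficient when the trained accuracy exceeds that baseline. The function names and the sample labels are hypothetical. In the study, the trained accuracy would come from the evaluated SVM or ANN model.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent (majority) class."""
    if not labels:
        raise ValueError("labels must not be empty")
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def data_is_sufficient(trained_accuracy, labels):
    """Decision rule from the text: the data are deemed sufficient
    if the trained model beats the majority-class baseline."""
    return trained_accuracy > baseline_accuracy(labels)


# Hypothetical labels: 6 students on track (0), 4 needing help (1).
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(baseline_accuracy(labels))          # 0.6
print(data_is_sufficient(0.75, labels))   # True
print(data_is_sufficient(0.55, labels))   # False
```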

3.2 Project method description:


This study consisted of five phases in total. The first phase was a literature study of previous related work in the form of research articles, surveys, journals and e-books. This was done to familiarize the reader with the current state of the art and to identify a research gap that justifies the present research. Google Scholar was used to find these resources.
The second phase consisted of designing and planning the experiment. The student dataset received from senior lecturer Stefan Forsström at Mid Sweden University was analyzed, and different cleaning methods were considered to prepare it for ML training. Different frameworks for training and comparing the ML algorithms were then considered. Different scoring methods for evaluating the trained ML models were also considered.
The third phase (implementation) was closely related to the second. It was where the plan was executed and most of the work was done. It consisted of cleaning the student dataset in Excel, training the ML models on the dataset in TensorFlow, and scoring the models with different metric functions. Finally, the models were evaluated by generating metric values and graphs.
The fourth phase consisted of scoring, plotting and comparing the generated data, that is, organizing and presenting the scores produced by the ML models. Scoring, often called prediction, is the process of generating values from new input data using a trained machine learning model. The resulting values, or scores, can be used to predict future values, but they can also represent a predicted category or event. What a score means depends on the type of data submitted and on the model that was built.
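As an illustration of scoring as described above, the sketch below applies an already trained model to new input data. It assumes a simple logistic model whose parameters were learned earlier. The weights, the bias, the feature layout (one 0/1 flag per cleared course) and the 0.5 threshold are hypothetical values chosen for the example, not parameters from the study. The same score serves both as a continuous value (a probability) and, after thresholding, as a predicted category.

```python
import math

# Hypothetical parameters of an already trained logistic model.
# Each feature is 1 if the student cleared the corresponding course, else 0.
WEIGHTS = [1.2, 0.8, 1.5]
BIAS = -2.0


def score(features):
    """Return a probability-like score in (0, 1) for one new student."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def predict_category(features, threshold=0.5):
    """Turn the score into a category: 'on track' or 'needs help'."""
    return "on track" if score(features) >= threshold else "needs help"


# Scoring two new, hypothetical students.
for student in ([1, 1, 1], [0, 1, 0]):
    print(student, round(score(student), 3), predict_category(student))
```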
The fifth and final phase consisted of analyzing the results from the measured data. Because the TensorFlow framework was used, plenty of tools and articles on analyzing data were available. Different analysis tools and approaches were considered for the trained models, and some tools were chosen to visualize the models in the form of diagrams and charts.
3.3 Evaluation method:
Upon completion of the project, several aspects were reviewed. The first was how well defined the goals and research scope were. The second was how comprehensive and relevant the reference material (theory chapter) was, and whether it showed a broad understanding of the project. The third was the methodology used and the reasons for choosing it. The fourth was how well the conclusions were drawn from the material. The fifth was the contribution to scientific knowledge and the structure of the thesis. This was judged by how well the results align with the stated goals and how original the contribution is.

RESULT ANALYSIS

Depending on the user's choice from a menu, the code generates a different graph using the Seaborn and matplotlib libraries. The menu options include count plots for class levels, grade levels, gender, nationality, semester, section and topic. After generating the graphs, the code removes columns that are not needed for the analysis, such as gender, stageID and gradeID.
The purpose of the code is therefore to perform exploratory data analysis (EDA) on student performance data. The graphs generated with Seaborn and matplotlib help visualize and understand how the data is distributed across categories. Removing the unnecessary columns then cleans the data for further analysis.
For example, selecting option 1 produces the marks class count graph.
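In the project, the counts are drawn as bar charts with Seaborn and matplotlib. The dependency-free sketch below reproduces only the logic behind the menu-driven count plots (counting rows per category) and the column-removal step. The menu numbering, the column names and the sample rows are hypothetical and only loosely modelled on the column names mentioned in the text.

```python
from collections import Counter

# Hypothetical menu: option number -> (graph title, column to count).
MENU = {
    1: ("Marks class count", "Class"),
    2: ("Grade level count", "GradeID"),
    3: ("Gender count", "gender"),
    4: ("Nationality count", "NationalITy"),
    5: ("Semester count", "Semester"),
    6: ("Section count", "SectionID"),
    7: ("Topic count", "Topic"),
}

# Columns removed after plotting (names assumed).
DROP_COLUMNS = {"gender", "StageID", "GradeID"}


def category_counts(rows, option):
    """Counts behind one count plot: how many rows fall in each category."""
    title, column = MENU[option]
    return title, Counter(row[column] for row in rows)


def drop_columns(rows, columns=DROP_COLUMNS):
    """Return copies of the rows without the unneeded columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


rows = [  # hypothetical student records
    {"gender": "M", "StageID": "lowerlevel", "GradeID": "G-04", "Topic": "IT", "Class": "M"},
    {"gender": "F", "StageID": "MiddleSchool", "GradeID": "G-07", "Topic": "Math", "Class": "H"},
    {"gender": "M", "StageID": "lowerlevel", "GradeID": "G-04", "Topic": "IT", "Class": "L"},
    {"gender": "F", "StageID": "lowerlevel", "GradeID": "G-02", "Topic": "IT", "Class": "M"},
]
title, counts = category_counts(rows, 1)
print(title, dict(counts))
print(drop_columns(rows)[0])
```

With pandas and Seaborn installed, the counting step corresponds to a call such as `seaborn.countplot(data=df, x=column)`. The removal step corresponds to `df.drop(columns=[...])`.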
CONCLUSION

In conclusion, the project provides a menu-driven exploratory data analysis (EDA) of student performance data. Depending on the option the user chooses, the code uses the Seaborn and matplotlib libraries to draw count plots of class levels, grade levels, gender, nationality, semester, section and topic. These plots help visualize how the data is distributed across these categories. The code then cleans the data by removing columns that are not needed for the analysis, such as gender, stageID and gradeID.
APPLICATIONS

Predicting students' performance is very important in higher education and in the application of deep learning to educational data. Such predictions support course selection and the design of appropriate future study plans for students. They also help teachers and managers monitor students, so that they can support them and adapt training programs to obtain the best results.
• One benefit of predicting student performance is that it reduces the number of official academic warnings and of students expelled for poor performance. Prediction also supports students themselves in choosing courses and study plans appropriate to their abilities.
• Improved planning and accurate adjustments to education management strategies yield higher attainment rates for program learning outcomes.
• Prediction helps identify, track and improve student learning outcomes and their impact on classroom activities. For instance, prediction models could be tuned to classify student performance as low, average or high. Based on the classification results, education managers could take concerted measures to support low-performing students.
• Resources can be allocated to students based on their predicted performance. For instance, identifying high-performing students helps institutions estimate the number of scholarships to award.
• Prediction can help minimize student dropout rates. Dropout is considered a resource black hole that affects graduation rates, quality and even institutional ranking.

FUTURE SCOPE
8.1 Future Work
With more time, the following could have been done.
8.1.1 The grades of each course
A possible future work is to use the grade obtained in each course individually, rather than only whether the course was cleared or not. This would show how performance in each course affects future academic outcomes. The course grades could be used to train the same ML algorithms as in this study (SVM and ANN) or other ML classification algorithms.
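The difference between the binary feature used in this study and the grade-based feature proposed here can be sketched as follows. The letter scale (A to E passing, Fx and F failing) and its numeric mapping are assumptions made for illustration; the actual grade scale of the dataset is not stated here.

```python
# Assumed grade scale: A (best) to E are passing, Fx and F are failing.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "Fx": 0, "F": 0}


def cleared_features(grades):
    """Binary encoding as in the study: 1 if the course was cleared, else 0."""
    return [1 if GRADE_POINTS[g] > 0 else 0 for g in grades]


def grade_features(grades):
    """Proposed encoding: keep the grade level of each course (0 to 5)."""
    return [GRADE_POINTS[g] for g in grades]


student = ["A", "E", "Fx", "C"]  # hypothetical grades in four courses
print(cleared_features(student))  # [1, 1, 0, 1]
print(grade_features(student))    # [5, 1, 0, 3]
```

The binary encoding treats an A and an E identically. The grade encoding keeps that distinction, so a model can learn whether narrowly passing a course predicts later difficulties.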

8.1.2 Bigger dataset for SVM and ANN


Another possible future work is simply to feed a bigger dataset into the SVM and ANN models and see how, if at all, their results change.

8.1.3 Other types of algorithms


A future work could test other types of models, such as the classification algorithms KNN and Random Forest. Another option is to adapt CNNs and RNNs, which are typically used for image and sequence data respectively, to the same binary dataset used in this thesis and see how they perform. The limited dataset used in this study restricted the types of ML classification algorithms that could be applied.
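Of the alternatives listed, KNN is simple enough to sketch in a few lines. The version below is a plain-Python k-nearest-neighbours classifier that uses Euclidean distance and majority voting. It is applied to binary "course cleared" vectors like those used in this study. The training data, the labels (1 = needs help) and k = 3 are hypothetical. In practice, a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbours."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from x to every training example, paired with its label.
    distances = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Stable sort by distance only, so equally distant points keep their order.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote; on a tie, Counter keeps the label encountered first.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: one 0/1 "course cleared" flag per course;
# label 1 = student needs help, 0 = student is on track.
train_X = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1, 0, 1, 1]))  # 0
print(knn_predict(train_X, train_y, [0, 0, 1, 1]))  # 1
```

Using an odd k with two classes avoids ties in the vote. Because KNN stores the whole training set and compares every new student against it, it stays cheap on a small binary dataset like this one.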

REFERENCES

• "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron
• https://www.diva-portal.org/smash/get/diva2:1676626/FULLTEXT01.pdf
• https://slejournal.springeropen.com/articles/10.1186/s40561-022-00192-z
