Uploaded by

Vibhuti Shrivastava

A Minor Project Report

on
STUDENT PERFORMANCE PREDICTION

Submitted to
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA
BHOPAL (MP)

MINOR PROJECT – II REPORT

Submitted by
Vibhuti Shrivastava 0111CS201191
Vidushi Deshmukh 0111CS201193

Department of Computer Science & Engineering

Technocrats Institute of Technology, Bhopal (MP)

Session 2020-2024
Technocrats Institute of Technology,
Bhopal (MP)
Department of Computer Science & Engineering

CERTIFICATE
This is to certify that the work embodied in this report entitled “Student
Performance Prediction” has been satisfactorily completed by Vibhuti
Shrivastava, Vidushi Deshmukh. It is a bonafide piece of work, carried out
under our/my guidance in the department of computer science and information
technology, Technocrats Institute of Technology, Bhopal for the partial
fulfilment of the Bachelor of Technology during the academic session 2023.

Dr. Kiran Pandey Dr. Manoj Tyagi

(Professor CSE Department) (HOD CSE Department)
TECHNOCRATS INSTITUTE OF
TECHNOLOGY,
BHOPAL

Department of Computer Science & Engineering

DECLARATION

We, VIBHUTI SHRIVASTAVA and VIDUSHI DESHMUKH, students of
BACHELOR of ENGINEERING in COMPUTER SCIENCE
ENGINEERING, Session 2022-23, Technocrats Institute of
Technology, Bhopal (M.P.), hereby declare that the work presented in
this project report entitled “STUDENT PERFORMANCE
PREDICTION” is the outcome of our own work, is bonafide and
correct to the best of our knowledge, and has been carried
out taking care of Engineering Ethics.

VIBHUTI SHRIVASTAVA (0111CS201191)

VIDUSHI DESHMUKH (0111CS201193)

Date:
ACKNOWLEDGEMENT

I deem it my privilege to extend my profound
gratitude and appreciation to all those who
have directly or indirectly involved themselves in
making this project a great success. It gives me
immense pleasure to express my deepest sense
of gratitude and sincere thanks to my respected
guide, Dr. Kiran Pandey, for valuable
guidance, encouragement, and help with this work,
and for keen interest and continued support.

I would also like to express my sincere thanks to
Dr. Asif Ullah Khan, Director of Technocrats
Institute of Technology, Bhopal, and Dr. Manoj Tyagi,
Head of the Department of Computer Science and
Engineering, for providing me with all the moral
support and necessary help. My sincere
appreciation and thanks also go to my family
members and friends for their keen interest,
continued encouragement, and support.
VIBHUTI SHRIVASTAVA
(0111CS201191)
VIDUSHI DESHMUKH
(0111CS201193)

INDEX

1. Abstract
2. Introduction
2.1 Background and problem motivation
2.2 Overall aim
2.3 Problem statement
2.4 Research questions
2.5 Scope
3. Objective of the Project
4. Scope of the Project
5. Literature Review
6. Challenges
7. Problem Statement
8. Related Literature Survey
8.1 Related surveys
8.2 Text classification using machine learning
methods
8.3 Text classification algorithms
8.4 Student performance evaluation in educational
data mining
8.5 Using TensorFlow to support educational
problem
8.6 Tracking and predicting student performance
in degree programs
9. Software and Hardware Requirements
9.1 Proposed methodology
9.2 Machine learning
10. Software Model
11. Module Description
11.1 Scientific method description
11.2 Project method description
11.3 Evaluation method
12. Result Analysis
13. Conclusion
14. Application
15. Future Scope
15.1 Future work
15.2 The degrees of each course
15.3 Bigger dataset for SVM and ANN
15.4 Other types of algorithms
16. References
ABSTRACT

Student performance is a success factor in higher
education institutions. The goal of any educational
institution is offering the best educational experience
and knowledge to the students. Identifying the students
who need extra support and taking the appropriate
actions to enhance their performance plays an important
role in achieving that goal.
In this project, machine-learning techniques have been used to
build a classifier that can predict the performance of the
students.

Machine learning techniques in educational data mining
aim to develop a model for discovering meaningful
hidden patterns and exploring useful information from
educational settings. The dataset was fetched from
Ladok and consisted of anonymous higher education
student credit data from a multitude of courses. The
algorithms were run on TensorFlow with Keras as the
API and were built, trained, and evaluated
on Google Colab. The source code is written in Python.
The study’s non-technical goal was to find a prediction
pattern for student performance and to provide a technical
framework tool that gives feedback to students and
university faculty.
INTRODUCTION

Educational data mining refers to data mining
techniques used to analyze educational data.
Educational institutions store a vast quantity of data in
order to keep track of students, faculty, and courses.
This data contains personal and academic information
about students such as gender, nationality, grade,
semester, and so forth. Various universities and
independent organizations have begun to use
educational data mining to improve the lives of their
students.

This project uses a script for data analysis and
visualization built on popular data-analysis and machine-learning libraries
such as Pandas, Seaborn, Matplotlib, and Scikit-learn.
The script reads a CSV file containing data about
students' academic performance and behavior and
allows the user to generate various graphs and charts
based on different aspects of the data. The user can
choose to visualize the data according to different
categories, such as gender, nationality, grade, and
semester, and generate different types of plots, such as
bar charts and count plots.
Additionally, the script also includes some
preprocessing steps where certain columns are dropped
from the data to prepare it for further analysis.
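
The workflow described above (read a CSV file, count students per category, and drop columns before further analysis) can be sketched as follows. This is a minimal illustration that uses only the Python standard library as a stand-in for Pandas and Seaborn; the sample rows and the column names (including `section`) are hypothetical, since the report does not list the actual file contents.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; the real CSV file and its columns are not shown in the report.
SAMPLE_CSV = """gender,nationality,grade,semester,section
M,India,G-08,F,A
F,India,G-08,S,B
F,Jordan,G-07,F,A
M,Jordan,G-08,S,A
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per student."""
    return list(csv.DictReader(io.StringIO(text)))

def count_by(rows, column):
    """Count how many rows fall into each category of the given column."""
    return Counter(row[column] for row in rows)

def drop_columns(rows, columns):
    """Return copies of the rows without the listed columns (preprocessing step)."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

rows = load_rows(SAMPLE_CSV)
gender_counts = count_by(rows, "gender")   # the numbers a count plot would display
cleaned = drop_columns(rows, {"section"})  # drop a column not needed for analysis
print(dict(gender_counts))
print(sorted(cleaned[0].keys()))
```

The counts returned by `count_by` are the values a bar chart or count plot would show for the chosen category.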

2.1 Background and problem motivation :

Some students drop out of university programs
in their first year. According to a study in 2018,
Sweden had a dropout rate of 29% among full-time
bachelor’s students. There are many possible factors behind
this; motivation and progress perception
are the two that this study chiefly hypothesizes to
be causative.
Furthermore, the study investigates a possible way of
providing constructive feedback and making, in a sense,
a gamification of the study process to increase students’
university program completion rate.
One way of investigating this pattern of behavior is
with the help of Machine Learning (ML). This is
certainly a broad subject and there are many problems
and challenges to be tackled. This study is interested in
the general category of classification. According to [2],
classification in ML is the process of determining which
category an observation belongs to. In essence, it’s a
process that establishes relationships between a
dependent variable (which is categorical in nature) and an
independent variable (which can be categorical or numerical in
nature).
Many decision-making problems fall within the
category of classification. One study suggested that there are
two empirical learning approaches
to classification. The first is Statistical Pattern
Recognition (SPR) and the second is Machine
Learning Techniques (MLT), the latter of which create
decision trees and production rules. This study
looks at the latter category and compares different
MLTs.

2.2 Overall aim:

This study aims to help students perform better
academically by analyzing their higher education credit
data and identifying the students that need help with
their studies. Ideally, an ML-based
framework could be created to help university
staff and students with student performance and
university program completion rates.
This study will also look at possible patterns among
students who eventually drop out or who have
difficulty completing the program. In a technical sense,
the purpose of this quantitative study is to evaluate two
or more ML algorithms/models for performance in
terms of accuracy, precision, recall, f1 score, and
prediction. These parameters are explained in detail in
the Method section.
The expected outcome is an ML model that recognizes
patterns of students’ success based on previous data
from their higher education credit. This research aims to
help students and staff at universities to assess the
students’ progress and predict future academic outcome
(dropout/completion rate) based on current
performance. From a technical point of view, this study
provides a comparison of different ML algorithms and a
performance review in terms of the prior specified
parameters.

2.3 Problem statement:

It’s known that students drop out at a certain rate
from different university-level bachelor’s programs.
There are many reasons for this, but this study is
mainly interested in motivation and
progress perception. In the technical sense, however, it is
interested in comparing different ML models
used in ML frameworks to reach a conclusion on
which model performs better in terms of the
parameters used to predict student performance.

This study will test two or more ML models in ML
frameworks, depending on how labor-intensive and
time-consuming the process is, using student data from
higher education credit data, and generate a comparison
study between different ML models. This problem
affects universities and students across the world. There
are many societal and economic consequences, for
instance social stigma, fewer job opportunities, and
lower salaries, to name a few.

Therefore, the problem statement of this thesis is to
determine to what extent student data and machine
learning can be used to identify students that need help
with their university academic performance.

2.4 Research Questions:

How is performance affected in terms of accuracy,
precision, recall, f1 score, and prediction when running
the SVM and ANN models on TensorFlow using student
datasets from Ladok?
This main research question is broken down further into
the following research questions:

1. Are ML algorithms an appropriate way to find these
student patterns, and can they be used to do so?

2. How can ML models (SVM and ANN) be implemented, and
which model performs best in terms of accuracy,
precision, recall, and f1 score on an open-source ML
framework using student data from higher education
credit data?

3. How much data is needed to be able to draw these
conclusions for classifying the dataset?

The scientific knowledge to be gained from the thesis
will be threefold:

1- Determining the “appropriateness” or legitimacy of
using ML algorithms for predicting student
performance using only datasets from Ladok.

2- Seeing how SVM and ANN behave under these
conditions and providing insights on improving ML
model performance for this type of data.

3- Providing an objective method/view for determining
the sufficiency of the size of the dataset.

2.5 Scope :

There are many open-source ML frameworks to
evaluate and analyze. This thesis focuses on
two transformer models. A transformer is a deep
learning model that uses the attention mechanism to
weigh the significance of each element of the input data
differently. Its primary applications are in Natural
Language Processing (NLP) and Computer Vision
(CV). This study is interested in NLP transformers and
focuses on their use in binary text classification with
open-source ML frameworks. Classification falls into
the category of supervised ML. This study does not
cover the unsupervised category of ML, nor, for that
matter, other types of classification in
Machine Learning such as multi-class, multi-label, or
imbalanced classification, or regression.

OBJECTIVE OF THE PROJECT

The objective of this project is to develop a predictive
model that can accurately predict a student's
performance based on various factors such as
demographic information, academic background, and
other relevant features. Machine learning technology is
used to discover models or patterns of data, and it is
helpful in decision-making.
The main aim of the system is to predict the future
performance of the student using certain data about the
student such as gender, nationality, grade, semester, and
so forth.
The model should be able to identify students who are
at risk of falling behind and provide insights into
potential interventions that can help improve their
academic outcomes.

Ultimately, the goal is to use this model to help
educators and administrators make data-driven
decisions that can improve the overall academic success
of students.

SCOPE OF THE PROJECT

Educational organizations are one of the important parts
of our society and play a vital role in the growth and
development of any nation. Educational data mining is
the application of data mining. It is an emerging
interdisciplinary research area that deals with the
development of methods to explore data originating in
an educational context.
Educational data mining is an emerging trend, designed
for automatically exploring the unique types of data
from large repositories of educationally related data.
Quite often, this data is extensive, fine-grained, and
precise. The main objective of this project is to use data
mining methodologies to study students’ performance in
their courses. Data mining provides many tasks that
could be used to study student performance. Here, the
classification task is used to evaluate students’
performance, and although there are many approaches to
data classification, the decision tree method is
used.
Information such as gender, nationality, grade, and semester
was collected from the student management
system to predict performance. Faculty cannot
easily find out students’ abilities, interests, and academic
performance; this project can help them do so and act on it.
Left unaddressed, this gap can lead to poor university results,
placement, and careers for individuals, and it
keeps the institute from fulfilling its mission and vision.
If the project is successful, it will be a
great help for faculty and universities in enhancing the
education system.

LITERATURE REVIEW

The application of machine learning algorithms for
forecasting student performance was the focus of the
literature review. Several factors, such as demographic
information, prior academic performance, behavior
information, and other relevant features have been
utilized in various studies to predict student
performance. These studies made use of decision trees,
random forests, and support vector machines among
other algorithms. Overall, the findings demonstrate that
machine learning models can reliably forecast student
performance and assist in the identification of students
who are at risk, resulting in targeted interventions and
better outcomes. More study is required to assess the
efficacy and scalability of these models in actual
educational contexts. In addition, it is currently difficult
to gather reliable and thorough data.

CHALLENGES

There are several challenges that can be faced during
the making of a student performance prediction project.
Some of these challenges include:

1. Data quality and quantity: The availability and quality
of data can significantly impact the accuracy of the
model. The data should be relevant, accurate, and
up-to-date.

2. Feature selection and engineering: Selecting the right
features and engineering them properly can be a
challenging task, as it requires domain knowledge and
expertise.

3. Overfitting and underfitting: Overfitting occurs when
a model is too complex and fits the training data too
well, resulting in poor performance on new data.
Underfitting occurs when a model is too simple and
cannot capture the underlying patterns in the data.

4. Lack of interpretability: Machine learning models can
be difficult to interpret, making it challenging to
understand why the model is making certain
predictions.

5. Deployment and scalability: Deploying the model in a
real-world setting and ensuring scalability can be a
complex process, requiring expertise in software
engineering and infrastructure management.
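
The overfitting challenge listed above can be illustrated with a small sketch. It compares a 1-nearest-neighbour model, which memorizes the training data (including one noisy label), with a simple threshold rule. The data points, the noisy label, and the threshold of 5.0 are hypothetical values chosen only for illustration; underfitting is not shown.

```python
def one_nn_predict(train, x):
    """Predict the label of the closest training point (memorizes the data)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def threshold_predict(x, threshold=5.0):
    """A simpler rule: predict 1 when the feature is at least the threshold."""
    return 1 if x >= threshold else 0

def accuracy(predict, data):
    """Fraction of points whose predicted label matches the true label."""
    correct = sum(1 for x, label in data if predict(x) == label)
    return correct / len(data)

# Hypothetical data: the underlying rule is "label 1 when x >= 5";
# the training point (3.0, 1) carries a noisy label.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(2.9, 0), (3.2, 0), (5.5, 1), (7.5, 1)]

def memorizer(x):
    return one_nn_predict(train, x)

train_acc_1nn = accuracy(memorizer, train)           # perfect on seen data
test_acc_1nn = accuracy(memorizer, test)             # worse on unseen data
train_acc_rule = accuracy(threshold_predict, train)
test_acc_rule = accuracy(threshold_predict, test)
print(train_acc_1nn, test_acc_1nn, train_acc_rule, test_acc_rule)
```

A large gap between training and held-out accuracy, as the memorizing model shows here, is the usual warning sign of overfitting.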
PROBLEM STATEMENT

The objective of this project is to develop a
machine-learning model that can accurately predict the
performance of students based on their demographics,
past academic records, and other relevant factors. The
model should be able to predict the student's final grade
in a course, given a set of input variables. This can help
educators identify students who are at risk of failing and
provide them with targeted interventions to improve
their performance. It can also assist in the development
of more personalized learning plans for each student,
tailored to their individual strengths and weaknesses.
It’s known that students drop out at a certain rate
from different university-level bachelor’s programs.
There are many reasons for this, but this study is
mainly interested in motivation and
progress perception. In the technical sense, however, it is
interested in comparing different ML models
used in ML frameworks to reach a conclusion on
which model performs better in terms of the
parameters used to predict student performance.

This study will test two or more ML models in ML
frameworks, using student data from higher education
credit data, and generate a comparison study between
different ML models.

Therefore, the problem statement of this thesis is to
determine to what extent student data and machine
learning can be used to identify students that need help
with their university academic performance.

RELATED LITERATURE SURVEY

Many research papers have been written on this topic, and
research and surveys are under way at institutions
worldwide, with leading scientists working on
projects to develop such systems.
Some of the latest research papers apply
machine learning to the prediction of student
performance. Over the last decades machine learning has been
successfully applied to biological data mining, image analysis,
face recognition, and many more areas.

8.1 Related Surveys:
The following are two surveys on text classification
methods in ML.

8.2 Text Classification Using Machine Learning
Methods - A Survey:
Because the feature vector is high-dimensional and
includes noisy and irrelevant data, text
categorization is a difficult process. Many feature
reduction approaches have been developed to remove
unimportant features and lower the dimension of the
feature vector. This article discusses how to classify
text using machine learning algorithms and which
features to look for.

8.3 Text Classification Algorithms: A Survey
The number of complicated documents and texts has
increased exponentially in recent years, necessitating a
greater grasp of machine learning technologies to
categorize texts effectively in numerous applications. For
researchers, finding appropriate structures,
architectures, and methodologies for text categorization
is a problem. Different text feature extractions,
dimensionality reduction approaches, current
algorithms and strategies, and assessment methods are
all included in this review.

Related work:
There is a multitude of ML research and development
work on predicting student performance. Below are some
works that are similar to the one conducted in this thesis.

8.4 Student Performance Evaluation in Educational
Data Mining:
In this study [18], the researchers created two machine
learning models and compared their effectiveness.
The first was a Random Forest (RF)
model, while the second was an ANN model.
TensorFlow was used as the backend for both. They
aimed to use geography and evaluation data to predict
student academic achievement. They did so by testing
whether ANN can achieve state-of-the-art performance
on a significant quantity of educational data, just as it does
on data from other domains, and by comparing the
performance of ANN with that of RF.

The motivation for this study was the growing interest
among academics in using data mining to analyze
student data. The authors of the report believed that
these advancements in data mining should also aid the
field of education. In the conclusion chapter, the authors
stated that, for future work, Recurrent Neural Networks
(RNN) can be used for detecting students who are about
to withdraw or drop out. It is on this basis (research gap/
future work) that the current research is conducted.
This thesis differs in the type of ML
algorithms used (ANN vs SVM).

8.5 Using TensorFlow to Support Educational
Problems:
This work presents the outcomes of a systematic mapping
procedure on educational data mining projects that use
the TensorFlow framework. The authors intended to help
explain what kinds of problems to focus on, and to
identify, demonstrate, and catalog all
academic publications that have covered the topic, as well as
the methodologies used, such as neural networks and
decision trees. The writers wanted to know how many
papers were on this topic and where they came from.
They also examined the difficulties that the TensorFlow
framework has tackled in the realm of education and
identified existing approaches to solving these
issues.

8.6 Tracking and Predicting Student Performance in
Degree Programs:
There is a rich literature on predicting student
performance using data-driven approaches. However,
there is a lack of research on predicting student
performance relating to program completion (college
programs, for example). This paper [20] specifies
new challenges: students differ in
background and course selection; not all courses
taken are equally informative for predicting
student performance; and the
prediction must consider the students’ evolving
progress.

SOFTWARE AND HARDWARE REQUIREMENTS

9.1 PROPOSED METHODOLOGY

Accurate predictive modelling can be achieved with
several techniques such as regression, classification,
and clustering; however, classification is one of the
most popular techniques used
in predicting students’ academic performance. There
are many classification methods that have been
used for prediction. Among these are Artificial Neural
Network (ANN), decision tree, Support Vector Machine
(SVM), k-nearest Neighbour (KNN), and Naive Bayes.

An Artificial Neural Network can model non-linear and
complex relationships between different input and output
variables. Decision trees are often used due to their clarity
and simplicity in discovering patterns and predicting data.
Many scholars found that decision trees can be easily
understood since they are based on IF-THEN rules.
Support Vector Machine is good for handling a small
dataset and has a greater generalization ability
compared with other methods. Naive Bayes classifiers are
highly scalable and require a number of parameters that is
linear in the number of attributes of the learning problem.
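
To illustrate why decision trees are considered easy to read, the sketch below writes a tiny tree directly as nested IF-THEN rules. The feature names (`absence_days`, `raised_hands`), the thresholds, and the three class labels are hypothetical and are not taken from the project's trained model.

```python
def predict_performance(absence_days, raised_hands):
    """A hand-written decision tree: each path from root to leaf is one IF-THEN rule."""
    if absence_days > 7:
        # Rule: many absences and little participation -> "Low"
        if raised_hands < 20:
            return "Low"
        return "Medium"
    # Rule: few absences and high participation -> "High"
    if raised_hands >= 50:
        return "High"
    return "Medium"

# Each call follows exactly one rule path, which makes the prediction explainable.
print(predict_performance(absence_days=10, raised_hands=5))
print(predict_performance(absence_days=2, raised_hands=70))
```

A learned decision tree has the same shape; the training algorithm chooses the features and thresholds instead of a person.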
ML can be an overwhelming subject to tackle as it
encompasses a large subset of research and studies on
different algorithms for the purpose of solving complex
tasks/patterns. This chapter will discuss several
concepts and algorithms that will be useful to know as
the reader moves forward.
9.2 Machine Learning (ML):
ML is considered a branch of Artificial Intelligence
(AI). It aims to imitate the way human beings learn to
solve problems, for example by creating computational neural
networks that mimic neural networks in the brain.
9.1.1 Supervised vs Unsupervised ML:
There are two forms of supervised learning in machine
learning: classification and regression. Classification
uses an algorithm to assign test data to specific
categories, such as distinguishing apples from oranges.
Regression is the other supervised learning approach;
it employs an algorithm to identify the relationship
between dependent and independent variables. Linear
regression, logistic regression, and polynomial
regression are three prominent regression techniques.
Unsupervised learning uses machine learning methods
to analyze and cluster unlabeled data sets. Unsupervised
learning models are used for tasks like
clustering, association, and dimensionality reduction.
Clustering is
a data mining technique for grouping unlabeled data by
similarities and differences. K-means clustering, for
example, assigns similar data points to K groups, where
the value of K sets the number of groups and hence the
granularity.
The difference is that in supervised learning, the
algorithm learns from a labeled training dataset by
repeatedly making predictions and correcting them against
the right answer. While supervised learning models tend to be more
accurate than unsupervised learning models, they require
human involvement to label the data.
Unsupervised learning models, on the other hand,
discover the structure of unlabeled data on their own.
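
As a minimal sketch of the clustering idea described above, the following one-dimensional K-means groups unlabeled exam scores into K = 2 clusters. The scores and the starting centroids are hypothetical; a real project would more likely rely on a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids (K = len(centroids))."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

scores = [35, 40, 42, 78, 82, 85]                      # unlabeled data
centroids, clusters = kmeans_1d(scores, [30.0, 90.0])  # K = 2
print(centroids)
print(clusters)
```

No labels are supplied; the two groups emerge only from the distances between the values.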
9.1.2 Transformers:
Transformer models use an evolving set of
mathematical techniques, known as attention or
self-attention, to detect subtle ways in which even distant
data elements in a series influence and depend on one
another. They are powering a wave of machine learning
advances that some call "transformer AI".
Transformers interpret text and audio in near real
time, allowing hearing-impaired people to join meetings
and courses. They help researchers better
understand gene chains in DNA and amino acids in
proteins, which can help speed up drug
development. Transformers can recognize patterns and
anomalies to prevent fraud, streamline production, provide
online recommendations, or improve healthcare. Every time
someone searches on Google or Microsoft Bing, they
use transformers.
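
The attention mechanism mentioned above can be sketched in a few lines. The example below implements standard scaled dot-product attention, softmax(QK^T / sqrt(d)) V, in plain Python; the tiny query, key, and value vectors are hypothetical, and real transformer layers add learned projections and multiple heads that are omitted here.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted mix of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
out = attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]])
print(out)
```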
9.1.3 Text Classification
Text classification is the process of categorizing natural
language texts and assigning tags to them within a
specified set of categories. Apart from human
classification, automated Classification APIs are also
used to categorize significant sentences in a document
so that important terms may be utilized.
9.1.4 Binary Text Classification: Binary text
classification is a subclass of text classification and a
form of supervised learning that is used to
predict which of two categories a piece of text
belongs to. The datasets used to train this type
of binary classifier are labelled.
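
A minimal sketch of binary text classification is shown below: texts are turned into bag-of-words counts and a perceptron is trained on a handful of labelled examples. The four training sentences and their labels are hypothetical, and the perceptron is used only because it is short; it is not the method evaluated in this report.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text by the count of each lower-cased word."""
    return Counter(text.lower().split())

def score(weights, bias, features):
    """Weighted sum of word counts plus bias."""
    return bias + sum(weights.get(word, 0.0) * count for word, count in features.items())

def predict(weights, bias, text):
    """Return class 1 when the score is positive, otherwise class 0."""
    return 1 if score(weights, bias, bag_of_words(text)) > 0 else 0

def train_perceptron(samples, epochs=10):
    """Adjust word weights whenever a labelled sample is misclassified."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            error = label - predict(weights, bias, text)
            if error != 0:
                for word, count in bag_of_words(text).items():
                    weights[word] = weights.get(word, 0.0) + error * count
                bias += error
    return weights, bias

# Hypothetical labelled dataset: 1 = positive remark, 0 = negative remark.
samples = [
    ("good result pass", 1),
    ("excellent pass", 1),
    ("poor result fail", 0),
    ("bad fail", 0),
]
weights, bias = train_perceptron(samples)
print(predict(weights, bias, "pass good"), predict(weights, bias, "poor fail"))
```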
9.1.5 Artificial Neural Networks (ANN): ANNs, also
commonly known simply as neural networks, form
the backbone of many ML programs. An ANN can be seen
as a composite function made up of three
components: the neurons, which perform the
computation on the input data; the parameters (weights),
which are the values that the ANN must learn; and
the biases, which are values that the ANN adds to the
weighted input to shift the result.
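
The three components described above can be seen in a single dense layer. The sketch below computes the output of each neuron as a sigmoid of the weighted inputs plus a bias; the input values, weights, and biases are hypothetical, and the choice of sigmoid as activation is an assumption for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One neuron per weight row: activation(sum(w * x) + b)."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical layer with two inputs and two neurons.
inputs = [1.0, 2.0]
weights = [[1.0, -0.5], [0.5, 0.5]]  # learned parameters (fixed here by hand)
biases = [0.0, -1.5]                 # learned offsets (fixed here by hand)
outputs = dense_layer(inputs, weights, biases)
print(outputs)
```

Training an ANN means adjusting `weights` and `biases` so that the outputs match the known labels; stacking several such layers gives the composite function mentioned above.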

9.1.6 Support Vector Machine (SVM): SVM is a
supervised machine learning technique that may be
used for both classification and regression.
The goal of the SVM method is to find a
hyperplane in an N-dimensional space that clearly
separates the data points by category. The hyperplane's
dimension is determined
by the number of inputs. If there are just two inputs, the
hyperplane is merely a line. When there are three
inputs, the hyperplane becomes a two-dimensional
plane. When the number of inputs exceeds three, it
becomes difficult to visualize.
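
For two inputs, the separating hyperplane is a line w·x + b = 0, and a linear SVM classifies a point by the side of the line it falls on. The sketch below shows only this decision step with hand-picked, hypothetical values of `w` and `b`; finding the maximum-margin `w` and `b` from data is the training step and is not shown.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; zero means the point lies on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from the point to the hyperplane."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane: the line x1 + x2 = 3 in a two-input problem.
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [1.0, 1.0]), classify(w, b, [3.0, 2.0]))
print(distance_to_hyperplane(w, b, [3.0, 2.0]))
```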
9.1.7 Random Forest (RF): RF is a classification
technique built from many decision trees, which
are in turn data structures that derive rules/patterns
from the input data.
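
The way a random forest combines its trees can be sketched as a majority vote. In the example below, three hand-written one-rule "trees" stand in for trained decision trees; the feature names and thresholds are hypothetical, and the training of each tree on a random sample of the data is not shown.

```python
from collections import Counter

# Hypothetical stand-ins for trained trees: each looks at one feature.
def tree_attendance(student):
    return "pass" if student["attendance"] >= 75 else "fail"

def tree_prior_grade(student):
    return "pass" if student["prior_grade"] >= 50 else "fail"

def tree_assignments(student):
    return "pass" if student["assignments"] >= 5 else "fail"

FOREST = [tree_attendance, tree_prior_grade, tree_assignments]

def forest_predict(student, forest=FOREST):
    """Collect one vote per tree and return the most common class."""
    votes = Counter(tree(student) for tree in forest)
    return votes.most_common(1)[0][0]

student_a = {"attendance": 80, "prior_grade": 45, "assignments": 6}
student_b = {"attendance": 60, "prior_grade": 40, "assignments": 7}
print(forest_predict(student_a), forest_predict(student_b))
```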

SOFTWARE MODEL

[Figure: The pre-defined libraries used]
[Figure: The choices available to plot a graph]

MODULE DESCRIPTION

• Data Preprocessing: This module is responsible for
loading and cleaning the data, handling missing
values, and transforming the categorical features
into numerical values.
• Feature Selection: This module selects the most
relevant features for predicting student
performance by using various feature selection
techniques such as correlation analysis, chi-square
test, and mutual information.
• Model Development: This module builds and trains
the machine learning models for predicting student
performance. It uses several models, including
linear regression, decision tree, random forest, and
neural network, to evaluate the performance of
each model.
• Model Evaluation: This module evaluates the
performance of each model by using various
evaluation metrics such as mean squared error, root
mean squared error, and R-squared.
• Model Selection: This module selects the best
model based on the evaluation metrics and deploys
it for making student performance predictions.

Overall, these modules work together to preprocess
the data, select relevant features, build and train
machine learning models, evaluate their
performance, and select the best model for
predicting student performance.
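
The evaluation metrics named in the Model Evaluation module can be computed as in the following sketch. It uses the standard definitions of mean squared error, root mean squared error, and R-squared; the true and predicted grades are hypothetical values.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, in the same unit as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical final grades and model predictions.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mse, rmse, r2)
```

Model selection then amounts to comparing these values across candidate models: lower MSE and RMSE, and R-squared closer to 1, indicate a better fit.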
11.1 Scientific method description:
This study used a quantitative method to approach and
answer the research questions. The first research
question was to examine whether ML algorithms
are an appropriate way to find student data patterns,
and whether they can be used to do so. As this research question
is general, the approach to answering it was also
general. It was addressed by executing the
main experiment in this study, which was to clean the
student data and then feed it into the different ML
models in TensorFlow to generate some sort of pattern,
whether Supervised (Classification vs
Regression) or Unsupervised (Clustering vs Dimension
Reduction vs Association). When a pattern was
generated, the research question was considered answered.

The second research question in this study was to
implement ML models (SVM and ANN) and to
examine which model performed better in terms of
accuracy, precision, recall, and f1 score on TensorFlow
using datasets from Ladok. The approach to answering this
research question was to define the parameters exactly,
and then to measure and compare their values. In this case,
accuracy was defined as the fraction of predictions that
an ML model got right. Essentially:
\[ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \]
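
The accuracy definition above, together with the other comparison parameters (precision, recall, and f1 score), can be computed from the true and predicted labels as in the following sketch. The standard binary-classification definitions are used, with 1 as the positive class; the label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the positive class (1)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)           # correct / total predictions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical labels: 1 = needs help, 0 = does not need help.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```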

The third research question in this study was to discern
how much data was needed to be able to draw these
conclusions (who needed help and who didn't) when
classifying the datasets. This study compared the
achieved accuracy with the baseline accuracy to decide
whether the data was sufficient. If the baseline accuracy
was lower than the accuracy achieved by training the
model, then the amount of data used was deemed to be
sufficient. The baseline accuracy was calculated by
dividing the size of the larger class by the total number
of data points. A reasoning-by-analogy approach was
also used: previous similar studies in applied machine
learning and their results were examined to estimate the
amount of data that was needed. In general, one way to
judge an ML model is to weigh the accuracy of the
trained model against the baseline accuracy for the same
data. If the trained accuracy is higher, the model is
deemed to be good enough for the data used.
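The baseline comparison described above can be sketched as follows, assuming the baseline is the accuracy of always predicting the larger class. The helper names and the label list are hypothetical and only illustrate the rule.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    if not labels:
        raise ValueError("labels must not be empty")
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def beats_baseline(trained_accuracy, labels):
    """True if the trained model is more accurate than the majority-class baseline."""
    return trained_accuracy > baseline_accuracy(labels)


# Made-up labels: six students in one class, four in the other.
labels = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
baseline = baseline_accuracy(labels)
print(baseline, beats_baseline(0.75, labels))
```

Under this rule, a trained accuracy that only matches the baseline shows no gain over guessing the majority class, so the data would not be deemed sufficient.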

3.2 Project method description:

This study consisted of five phases in total. The first
phase consisted of a literature study. The study looked
at previous related work in the form of research articles,
surveys, journals, and e-books. This was done to
familiarize the reader with the current state of the art
and to show a research gap that justifies the research
being conducted. Google Scholar was used to find these
resources.
The second phase consisted of designing and planning
the experiment. The student dataset received from
faculty senior lecturer Stefan Forsström at Mid Sweden
University was analyzed and considered for different
cleaning methods to prepare it for ML training.
Afterwards, the study considered different frameworks
to train and compare the ML algorithms. Furthermore,
different “scoring” methods were considered to evaluate
the trained ML models.
The third phase (Implementation) was closely related to
the second phase. This was where the plan was executed
and most of the hard work was done. It consisted of
cleaning the student dataset using Excel, learning about
and training the ML models on the dataset in
TensorFlow, scoring the models by applying different
metric functions, and finally evaluating the models by
generating metric values and graphs.
The fourth phase consisted of scoring, plotting, and
comparing the generated data. This was the process of
organizing and presenting the results of the generated
scores from the ML models. Scoring, often known as
prediction, is the act of creating values from new input
data using a trained machine learning model. The
created values or scores can be used to predict future
values, but they can also be used to represent a
predicted category or event. The score's meaning is
determined by the type of data that is submitted and the
model that is built.
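To illustrate scoring in the sense just described, the sketch below turns new input into a numeric score and then into a predicted category. The weights, bias, feature layout, and category names are all hypothetical stand-ins for a trained model; they are not values from this study.

```python
import math

# Hypothetical parameters for three made-up input features.
# A real model would learn these during training.
WEIGHTS = [1.2, -0.8, 0.5]
BIAS = -0.3


def score(features):
    """Return a probability-like score in (0, 1) for one student record."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def predict_category(features, threshold=0.5):
    """Map the numeric score to a category label."""
    return "needs help" if score(features) >= threshold else "on track"


print(score([1, 0, 1]), predict_category([1, 0, 1]))
print(score([0, 1, 0]), predict_category([0, 1, 0]))
```

The same score therefore serves both uses mentioned above: as a predicted value in its own right, or, after thresholding, as a predicted category.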
The fifth and final phase consisted of analyzing the
results from the measured data. Since the TensorFlow
framework was used, there were plenty of tools and
articles on analyzing data. Different analysis tools and
approaches were considered for analyzing the trained
models, and some tools were chosen to visualize the
models in the form of diagrams and charts.

3.3 Evaluation method:

Upon completion of the project, several aspects were
reviewed. The first aspect was how well defined the
goals and research scope were. The second was how
comprehensive and relevant the reference material
(theory chapter) was: did it show a broad understanding
of the project? The third was the methodology used and
the reasons for choosing it. The fourth was how well
the conclusions were drawn from the material. The fifth
was the contribution to scientific knowledge and the
thesis structure, evaluated by how well the results align
with the stated goals and how original the contribution
is.

RESULT ANALYSIS

Depending on the user's choice from a menu, the code
generates a different graph using the Seaborn and
Matplotlib libraries. The menu options include count
plots for class levels, grade levels, gender,
nationalities, semester, section, and topics. After
generating the graphs, the code removes some columns
that may not be needed for the analysis, such as gender,
stageID, and gradeID, among others.
The purpose of the code is to perform exploratory data
analysis (EDA) on student performance data. The graphs
help in visualizing and understanding the distribution of
the data across categories, and the code also cleans the
data by removing unnecessary columns.
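The menu-driven counting and column removal can be sketched without any plotting library, as below. This is a standard-library stand-in and not the project's script: the records, the menu numbering, and the exact column names are assumptions made for illustration, and a text bar replaces the Seaborn count plot (with Seaborn, `seaborn.countplot` would draw the same counts).

```python
from collections import Counter

# Made-up records; column names loosely follow those mentioned in the text.
records = [
    {"gender": "F", "StageID": "lowerlevel", "GradeID": "G-04", "Topic": "Math", "Class": "M"},
    {"gender": "M", "StageID": "lowerlevel", "GradeID": "G-04", "Topic": "IT", "Class": "L"},
    {"gender": "M", "StageID": "MiddleSchool", "GradeID": "G-07", "Topic": "Math", "Class": "M"},
    {"gender": "F", "StageID": "HighSchool", "GradeID": "G-10", "Topic": "Science", "Class": "H"},
]

# Hypothetical menu: option number -> column to count.
MENU = {1: "Class", 2: "GradeID", 3: "gender", 4: "Topic"}


def count_values(rows, column):
    """Count how often each value of `column` occurs (the data behind a count plot)."""
    return Counter(row[column] for row in rows)


def drop_columns(rows, columns):
    """Return copies of the rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


choice = 1  # stands in for the user's menu selection
counts = count_values(records, MENU[choice])
for value, n in sorted(counts.items()):
    print(f"{value:>3} {'#' * n} ({n})")  # text bar in place of a plotted bar

cleaned = drop_columns(records, {"gender", "StageID", "GradeID"})
```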
Selecting option 1 produces the marks class count graph.
CONCLUSION

In conclusion, the user is asked to choose an
option from the menu. Depending on the user's
choice, the code generates a different graph using
the Seaborn and Matplotlib libraries. The menu
options include count plots for class levels, grade
levels, gender, nationalities, semester, section, and
topic. After generating the graphs, the code
removes some columns that may not be needed for
the analysis, such as gender, stageID, and gradeID,
among others.
The purpose of the code is to perform an
exploratory data analysis (EDA) on student
performance data. The graphs help in visualizing
and understanding the distribution of the data
across categories, and the code also cleans the
data by removing unnecessary columns.
APPLICATIONS

Predicting students’ performance is very important in
higher education and in the application of deep
learning to educational data. Prediction of students’
performance provides support in selecting courses and
designing appropriate future study plans for students. It
also helps teachers and managers to monitor students,
provide support to them, and integrate training
programs to obtain the best results.
• Student performance prediction can reduce the
number of official warnings and expulsions of
students for poor performance. Prediction also
supports the students themselves in choosing
courses and study plans appropriate to their abilities.
• Improved planning and accurate adjustments in
education management strategies yield enhanced
attainment rates in program learning outcomes.
• Identifying, tracking, and improving student learning
outcomes and their impact on classroom activities.
For instance, prediction models could be tuned to
classify student performance as low, average, or
high. Based on the classification results, concerted
measures may be taken by the education managers
to support low-performing students.
• Allocating resources to students based on their
predicted performance. For instance, identifying
and predicting high-performing students helps
institutions estimate the number of scholarships to
award.
• Minimizing student dropout rates, which are
considered a resource black hole that impacts
graduation rates, quality, and even institutional
ranking.

FUTURE SCOPE
8.1 Future Work
With more time, the following work could have been
done.
8.1.1 The grades of each course
A possible future work is to use the grade for each
course individually, not just whether the course was
cleared, and to see how performance in each course
affects future academic outcomes. The grades could be
used to train the same ML algorithms as in this study
(SVM and ANN) or perhaps other ML classification
algorithms.

8.1.2 Bigger dataset for SVM and ANN

Another possible future work is simply to feed a bigger
dataset into the SVM and ANN models and see how
the models react, if their results change at all.

8.1.3 Other types of algorithms

A future work could be to test other types of models
such as KNN or Random Forest (classification
algorithms), or to adapt CNN and RNN, which are
typically used on image and sequence data
respectively, to the same binary dataset used in this
thesis and see how they react. Indeed, the limited
dataset used in this study limited the types of ML
algorithms (classification algorithms) that could be
used.
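As a rough illustration of how one of the suggested alternatives, KNN, could be applied to binary course data, the sketch below classifies a student by majority vote among the nearest training records under Hamming distance. The feature vectors (1 = course cleared), the labels, and the choice of Hamming distance are assumptions for illustration only, not data or design decisions from this study.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up binary records: 1 = course cleared, 0 = not cleared.
train_x = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = student needs help

print(knn_predict(train_x, train_y, [1, 1, 1, 1]))
print(knn_predict(train_x, train_y, [0, 0, 1, 1]))
```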

