Prediction of Student Performance Using Linear Regression

This document describes a study that uses linear regression to predict student academic performance based on student background data. The study collects student data, pre-processes it by identifying key attributes, and then uses linear regression to develop a model that predicts students' academic scores based on attributes like gender, age, family details, test preparation, and absent days. Several related works that used different machine learning algorithms like SVM, decision trees, and regression to predict student performance are also reviewed. The methodology involves data collection, pre-processing to select important attributes, and using linear regression to build a predictive model.

2020 International Conference for Emerging Technology (INCET)

Belgaum, India. Jun 5-7, 2020

Prediction of Student Performance Using Linear Regression

Boddeti Sravani
Bachelor Student, Computer Science and Engineering
Institute of Aeronautical Engineering, Hyderabad, Telangana, India

Myneni Madhu Bala
Professor, Computer Science and Engineering
Institute of Aeronautical Engineering, Hyderabad, Telangana, India

Abstract— This paper examines how applications of machine learning can have a large impact on teaching and learning and help improve the learning environment in higher education. As student interest in online and digital courses has grown rapidly, websites such as Coursera and Udemy have become very influential. We apply machine learning to teaching and learning by considering the student's background, past academic scores, and other attributes. Because class sizes are large, it is difficult to assist each individual student in an open learning course, and this can raise the dropout rate by the end of the course. In this paper we implement linear regression, a machine learning algorithm, to predict students' academic performance.

Keywords— Classification Prediction, Machine Learning, Data Cleaning, Data Processing, Linear Regression.

I. INTRODUCTION

As computers and the internet are now used everywhere, the amount of data available for analysis has increased rapidly. The data can relate to anything: population, students' academic records, or people's interests. New data emerges all the time, and analysing it is a difficult task for humans. A computer can analyse the data more efficiently than humans because it stores the data digitally in a well-formatted way.

This is where machine learning emerged. Machine learning is the branch of Artificial Intelligence that gives machines the ability to learn automatically from past experience. Here the machines are not explicitly programmed. As the name suggests, it gives the computer an ability that makes humans and machines alike in the aspect of learning. On the basis of the nature of the learning signal, machine learning is classified into supervised learning and unsupervised learning. This study focuses on supervised learning, more specifically on predictive analysis. Predictive analysis plays an important role whenever future outcomes are predicted, and its range of applications is vast. Predicting students' academic performance is important because it can alert teachers to students who are likely to drop out of the course, so that additional assistance can be provided to the students who need to improve their academic performance.

This study concerns the implementation of machine learning in education. Its aim is to predict students' academic performance. Student data is used to develop a model that predicts a student's academic performance from background data about the student. The input for this study is a student dataset in tabular format containing information related to students (such as age, gender, academic record, and medical information). Various algorithms can be used to create a predictive model that produces this output. In this study we focus on how linear regression is applied to predict students' academic performance from the student dataset.

II. LITERATURE REVIEW

To give the work a well-structured basis, we have referred to many research papers on similar topics. The conclusions of a few of these papers are summarized below. This research study describes how the linear regression approach is used in predicting students' academic performance.

[1] The authors implemented the prediction using an SVM approach in Java, decision tree, C4.5, Naive Bayes, LibSVM, Logistic Regression, and the hybrid approach LMT, and compared the accuracy of performance prediction among the hybrid approaches. These methods were implemented by considering suitable attributes.

[2] The authors used some of the most popular classification and regression algorithms. The experiment was carried out using administrative data from the University of Porto covering 700 courses. The paper concludes that the best results are obtained by decision trees and SVM. The main contribution of this paper is a comparison of the accuracy levels of different algorithms.

[3] This research focuses on predicting students' performance using personalized analytics. The paper presents two different approaches. The first approach is a regression algorithm, which is one of the data mining functions. The error rate of the regression algorithm is also calculated, using the root mean square error.

[4] The authors worked on how to improve the prediction algorithms used to analyze and predict students' performance. The work is carried out using the decision tree algorithm.

[5] This paper proposed student academic performance prediction using Support Vector Machine. The authors compared SVM with other ML techniques such as linear regression, decision trees, and KNN, and concluded that SVM outperforms the other ML algorithms.

978-1-7281-6221-8/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: University of New South Wales. Downloaded on August 23,2020 at 14:07:37 UTC from IEEE Xplore. Restrictions apply.
III. METHODOLOGY

The first step in the implementation is to collect the data set required for the research work. The methodology is applied to a dataset containing information about the students. To reduce the analysis effort, we identify the attributes in the data set whose values are unique to each record and remove them, as they cannot be used for analysis. After the data is collected, it is transformed into the desired form. This process is called pre-processing of data. It is the most important step in obtaining the desired data from the raw data: the more accurate the pre-processing of the raw data, the more accurate the resulting data.

The next step after pre-processing is to find incomplete or irrelevant data in the dataset and remove it in order to obtain accurate results. This phase of removing unwanted data is called data cleaning. Next, we can choose any one of several algorithms, such as linear regression, support vector machine, Naive Bayes standard classification, or decision tree algorithms, for better classification.

In this paper the linear regression algorithm is chosen for the implementation. Further, we have to choose a training set from the dataset, identify the result attribute that decides the output, and start the prediction.

Fig. 1. Implementation of the model

IV. DATA DESCRIPTION

The data used in this paper is sample data. The sample dataset comprises 100 students, so in this study we deal with 100 instances and 10 attributes. All the dependent and independent variables are given in Fig. 2.

Fig. 2. Student related dataset

The attributes used in the dataset are gender, age, parent education, family size, test preparation, father job, mother job, absent days, parent status, travel time, and academic scores.

V. ALGORITHM USED

Many algorithms can be used to implement this work; in this study we use linear regression. Though all of them predict the dependent variable based on independent variables, they differ in how the algorithm is implemented.

A. Linear Regression

Linear regression is a machine learning algorithm based on supervised learning. It is widely known and easily understood even by people who are not very familiar with machine learning algorithms. As the name suggests, linear regression performs regression. It defines the relationship between two variables by fitting a regression line to the data. One of the two variables is the dependent variable, which depends on the other, called the independent variable. One should make sure that a relationship exists between the dependent and independent variables before modelling. The strength of the relationship between the variables can be assessed using a scatterplot.

The linear regression line is represented in the form:

Y = a*X + b

• Y: dependent variable
• a: slope
• X: independent variable
• b: intercept

With the best-fit regression line, the error between the predicted and true values is minimized. Linear regression is classified into two types. The first is simple linear regression, in which only one independent variable is used. The second is multiple linear regression, in which multiple independent variables are used; this is the type used in this study.

B. Implementation

Fig. 3. Description of variables
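To make the regression line Y = a*X + b from Section V-A concrete, the following minimal sketch fits a and b by ordinary least squares and reports the squared error that the best-fit line minimizes. The paper's own implementation is in R; Python's standard library is used here only to show the arithmetic. The function names and the absent-days and score values are hypothetical and are not taken from the study's dataset.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Sum of squared differences between true and predicted Y."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustration data: absent days (X) and academic score (Y).
absent_days = [0, 2, 4, 6, 8]
scores = [90, 85, 77, 72, 66]

a, b = fit_line(absent_days, scores)
sse = sum_squared_error(absent_days, scores, a, b)
print(f"slope a = {a:.3f}, intercept b = {b:.3f}, SSE = {sse:.3f}")
```

A negative slope in this toy example would mean that the predicted score falls as absent days increase; any other pair (a, b) gives a larger squared error on the same points.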

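Section V-A states that multiple linear regression is the variant used in this study. A minimal sketch of that case is given below, assuming the categorical attributes have already been encoded as 0/1 values. It solves the normal equations (X^T X) beta = X^T y with a small Gaussian elimination routine; R's lm() solves the same least-squares problem with a QR decomposition, so this is an illustration of the idea rather than the authors' code. The attribute encoding, function names, and data rows are hypothetical, and the scores are generated without noise from known coefficients so the fit can be checked by eye.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies keep the inputs unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_regression(rows, targets):
    """Return [intercept, b1, ..., bk] minimizing the squared error."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Predicted Y for one row of independent variables."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical, noise-free rows: (female, test_prep, absent_days).
# Scores were generated as 60 + 5*female + 8*test_prep - 2*absent_days,
# so a correct fit should recover those four numbers.
rows = [(0, 0, 0), (1, 0, 2), (0, 1, 1), (1, 1, 3),
        (0, 0, 5), (1, 0, 0), (0, 1, 4), (1, 1, 1)]
scores = [60, 61, 66, 67, 50, 65, 60, 71]

coefs = fit_multiple_regression(rows, scores)
new_student = (1, 1, 2)  # female, completed test prep, 2 absent days
predicted = predict(coefs, new_student)
print([round(c, 3) for c in coefs], round(predicted, 3))
```

With real data the scores contain noise, so the recovered coefficients are estimates and each comes with a standard error.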
The first step is to read the data set into R and explore its summary and structure. The summary() function provides information on each variable, such as the type of data (character or numerical) and, for numerical variables, basic descriptive statistics such as measures of central tendency and spread. It also provides information on missing values (NA values).

C. Data Visualization

The primary purpose of visualization is to find visual patterns. We plot academic score versus gender, age, and parent status using the GGPLOT package. GGPLOT requires three key components:

• Define the data in the form of a data frame.
• Describe the aesthetics for the visualization, that is, how to map the attributes.
• Define the geometry, or type of graphic, to be used.

Fig. 4. Data visualization between academic scores and gender of the students.

We can observe the percentage of the academic scores with respect to the gender of the student. The academic scores can be visualized against every attribute we considered using GGPLOT.

Fig. 5. Data visualization between the academic scores and age of the students.

D. Boxplot Representation

A box plot is a graph that shows measures of central tendency and spread and gives a visual indication of outliers:

• Median: the middle value of the data set.
• First quartile: the middle number between the smallest number and the median of the dataset.
• Third quartile: the middle value between the median and the highest value of the dataset.
• Interquartile range: the range from the 25th to the 75th percentile.
• Outliers, maximum, and minimum.

Summary of the boxplot visualizations:

• Students who completed the prep class had better academic scores.
• Female students scored higher than male students.

Fig. 6. Boxplot representation of the academics and the gender.

Fig. 7. Boxplot representation of academics and age of the student.
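The quantities a box plot displays can also be computed directly. The sketch below does so with Python's statistics module rather than with the R functions the paper relies on. The function name box_stats and the score values are hypothetical. The 1.5 x IQR rule for flagging outliers is a common box plot convention and is an assumption here, since the text does not state which rule was used. Quartile conventions also differ slightly between tools, so the values may not match a given plotting package exactly.

```python
import statistics


def box_stats(values):
    """Return the quantities a box plot draws for one group of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "whisker_high": max(inside),   # largest value that is not an outlier
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Hypothetical academic scores for one group of students.
scores = [68, 40, 62, 98, 60, 74, 65, 70, 78]
stats = box_stats(scores)
print(stats)
```

Running the same function once per group (for example, once for students who completed the prep class and once for those who did not) gives the numbers behind a side-by-side comparison such as the ones in Fig. 6 and Fig. 7.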

VI. RESULT

In this section we build a linear regression model to predict academic scores. Academic score is the dependent variable (Y); gender, age, mother job, parent education, test prep, father job, parent status, travel time, and absent days are the independent variables (X). First, we divide the dataset into two parts, a training set and a testing set. Then we apply the lm() function to the training data and the predict() function to the testing data, and create a visualization of the regression model with the regression line and 95% confidence intervals.

The prediction interval is calculated in much the same way as the confidence interval. The prediction interval equation is defined as:
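For reference, a common textbook form of the 95% prediction interval for a regression with a single independent variable is Y0 ± t * s * sqrt(1 + 1/n + (X0 - mean(X))^2 / sum((Xi - mean(X))^2)), where Y0 = a*X0 + b is the predicted value, s is the residual standard error with n - 2 degrees of freedom, and t is the matching t quantile. The confidence interval for the mean response has the same form without the leading 1 under the square root. The sketch below follows the workflow of this section (split the data, fit on the training part, predict on the testing part) in Python rather than with R's lm() and predict(). It uses one predictor for brevity, whereas the study uses several. The data, the function name, and the fixed split are hypothetical, and the t quantile for 6 degrees of freedom is hard-coded from a standard t table.

```python
import math

# Hypothetical data: absent days (X) and academic score (Y) for ten students.
absent_days = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
scores = [91, 88, 86, 81, 79, 76, 72, 70, 66, 64]

# Fixed split for reproducibility; in practice the rows would be shuffled first.
train_x, test_x = absent_days[:8], absent_days[8:]
train_y, test_y = scores[:8], scores[8:]

T_975_DF6 = 2.447  # two-sided 95% t quantile for 8 - 2 = 6 degrees of freedom


def fit_with_interval(xs, ys, t_quantile):
    """Fit Y = a*X + b and return a function giving (prediction, low, high)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b = mean_y - a * mean_x
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    def predict_interval(x0):
        y0 = a * x0 + b
        half_width = t_quantile * s * math.sqrt(
            1 + 1 / n + (x0 - mean_x) ** 2 / sxx
        )
        return y0, y0 - half_width, y0 + half_width

    return predict_interval


predict_interval = fit_with_interval(train_x, train_y, T_975_DF6)
results = [predict_interval(x0) for x0 in test_x]
for x0, actual, (y0, low, high) in zip(test_x, test_y, results):
    print(f"X={x0}: predicted {y0:.1f}, 95% PI [{low:.1f}, {high:.1f}], actual {actual}")
```

The interval widens as X0 moves away from the mean of the training values, which is why predictions far outside the observed range are less certain. In R, predict() on an lm fit accepts interval = "prediction" to return comparable bounds.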

TABLE I. VALUES OF THE VARIABLES

The table lists the estimated value, the error value, and the t value for each variable. The t value is the estimate divided by its standard error, that is, the size of the estimate relative to the variation in the data.

The p value indicates how statistically significant each independent variable is as a predictor. The p value and the impact of the variable on the output are inversely related: the smaller the p value, the stronger the evidence that the variable influences the output.

TABLE II. VALUES OF FITTING OF THE DATA

Fig. 8. Visualization of predicted scores

VII. CONCLUSION

The effectiveness of machine learning in the education field depends on the algorithm and on how the data is used. Choosing the algorithm for predicting students' performance is important, because the accuracy of the result depends on the machine learning algorithm. The algorithm used in this paper is linear regression. Present studies show that students' academic performance also depends on the student's background and other attributes. Many research works confirm that, apart from past academic performance, a student's background and other attributes have a significant influence on performance. Machine learning has an emerging role in every sector in recent times, and it can also be used effectively in academia. In the future, many applications with improved ability and efficiency may become an integrated part of every academic institution.

ACKNOWLEDGMENT
This research contribution is a part of undergraduate
research and content development at Institute of Aeronautical
Engineering.

REFERENCES

[1] Amandepp Kaur, Nitin Umesh, Barjinder Singh, "Machine Learning approach to predict Student Academic Performance", International Journal for Research in Applied Science Engineering Technology (IJRASET), Volume 6, Issue IV, April 2018.

[2] Pedro Strecht, Luis Cruz, Carlos Soares, João Mendes-Moreira and Rui Abreu, "A comparative study of classification and regression algorithms for Modelling student's Academic performance", Proceedings of the 8th International Conference on Educational Data Mining, 2015.

[3] G. Sujatha, S. Sindhu and P. Savaridassan, "Predicting student's performance using personalized analytics", Volume 119, No. 12, 2018.

[4] Ankitha A Nichat, Dr. Anjali B Raut, "Predicting and Analysis of student Performance Using Decision Tree Technique", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 5, Issue 4, April 2017.

[5] S.A. Oloruntoba, J.L. Akinode, "Student Academic Performance Prediction Using Support Vector Machine", December 2017.

[6] Dhanashree Mane, Pranali Namdas, Pooja Gargade, Dnyaneshwari Jagtap, S.S. Rathi, "Predicting student Performance Using Machine Learning Approach", VJER Vishwakarma Journal of Engineering Research, Volume 2, Issue 4, December 2018.

[7] Havan Agarwal, Harshil Mavani, "Student Performance Prediction Using Machine learning", International Journal of Engineering Research and Technology (IJERT), Vol. 4, Issue 03, March 2015.

[8] Raheela Asif, Agathe Merceron, and Mahmood K. Pathan, "Predicting student academic performance at degree level: A case study", International Journal of Intelligent Systems and Applications, Vol. 7, No. 1, 2014.

[9] Murat Pojon, "Using Machine Learning to Predict Student Performance", June 2017.

[10] Erkan Er, "Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100", International Journal of Machine Learning and Computing, Vol. 2, No. 4, August 2012.
