Prediction of Student Performance Using Linear Regression

This document describes a study that uses linear regression to predict student academic performance based on student background data. The study collects student data, pre-processes it by identifying key attributes, and then uses linear regression to develop a model that predicts students' academic scores based on attributes like gender, age, family details, test preparation, and absent days. Several related works that used different machine learning algorithms like SVM, decision trees, and regression to predict student performance are also reviewed. The methodology involves data collection, pre-processing to select important attributes, and using linear regression to build a predictive model.

2020 International Conference for Emerging Technology (INCET)

Belgaum, India. Jun 5-7, 2020

Prediction of Student Performance Using Linear Regression

Boddeti Sravani
Bachelor Student, Computer Science and Engineering
Institute of Aeronautical Engineering,
Hyderabad, Telangana, India
[email protected]

Myneni Madhu Bala
Professor, Computer Science and Engineering
Institute of Aeronautical Engineering,
Hyderabad, Telangana, India
[email protected]

Abstract— This paper examines how applications of machine learning affect teaching and learning and can further improve the learning environment in higher education. As student interest in online and digital courses has increased rapidly, websites such as Coursera and Udemy have become very influential. We apply machine learning to teaching and learning by considering the student's background, the student's past academic scores, and other attributes. Because class sizes are large, it is difficult to assist each individual student in each open learning course, and this can raise the dropout rate at the end of the course. In this paper we implement linear regression, a machine learning algorithm, to predict the student's performance in academics.

Keywords— Classification Prediction, Machine Learning, Data cleaning, Data Processing, Linear Regression.

I. INTRODUCTION
As computers and the internet are used everywhere, the availability of data that can be analysed has increased rapidly. Data can be anything related to population, academic data of students, or the interests of people, and new data emerges all the time. Analysing this data is a difficult task for humans. A computer can analyse the data more efficiently than humans because it stores the data digitally in a well-formatted way.

This is where machine learning emerged. Machine learning is the branch of Artificial Intelligence that provides the ability to learn automatically from past experience. Here the machines are not explicitly programmed. As the name suggests, it gives the computer an ability to learn that resembles human learning. On the basis of the nature of the learning signal, machine learning is classified into supervised learning and unsupervised learning. This study focuses on supervised learning, more specifically on predictive analysis. Predictive analysis plays an important role whenever future outcomes are predicted, and its range of applications is vast. Predicting a student's academic performance is important because it can show teachers which students are likely to drop out of the course, and it makes it possible to give additional assistance to the students who need to improve their academic performance.

This study is on the implementation of machine learning in education. The outcome of this study is a prediction of the student's academic performance. The data of students is used to develop a model that can predict the student's performance in academics from background data of the student. The input data for this study is a student dataset in tabular format that contains information related to the students (for example age, gender, academic record, medical information). Various algorithms can be used to create the model that produces this output, and many of them are used in predictive models. In this study we focus on how linear regression is applied to a student dataset to predict academic performance.

II. LITERATURE REVIEW
To give the thesis a well-structured basis, we referred to several research papers that address similar problems. The conclusions of a few of these papers are as follows. This research study describes how the linear regression approach is used in predicting a student's academic performance.

[1] In this research paper the authors implemented the prediction using an SVM approach in Java, decision tree, C4.5, Naive Bayes, LibSVM, Logistic Regression and the hybrid approach LMT, and compared the accuracy of performance prediction among the hybrid approaches. These methods are implemented by considering suitable attributes.

[2] In this research paper the authors used some of the most popular classification and regression algorithms. The experiment was carried out using administrative data from the University of Porto, considering 700 courses. The paper concludes that the best results are obtained by decision trees and SVM. The main contribution of this paper is a comparison of the accuracy levels of different algorithms.

[3] The research is focused on predicting student's performance using personalized analytics. This paper presents two different approaches to the problem. The first approach used by the authors is a regression algorithm, which is one of the data mining functions. The error rate of the regression algorithm is calculated using the root mean square error.

[4] In this paper the authors worked on how to improve the prediction algorithms that are used to analyze and predict the student's performance. The work of this paper is carried out using the decision tree algorithm.

[5] This paper proposed student academic performance prediction using Support Vector Machine. Here the authors compared SVM with other ML techniques such as linear regression, decision trees and KNN, and concluded that SVM outperforms the other ML algorithms.
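The root mean square error mentioned for [3] is the square root of the mean squared difference between true and predicted values. A minimal sketch in Python follows; the function name `rmse` and the sample scores are hypothetical and are not taken from any of the reviewed papers.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical true and predicted scores, for illustration only.
actual_scores = [70.0, 82.0, 65.0, 90.0]
predicted_scores = [72.0, 80.0, 65.0, 86.0]
error = rmse(actual_scores, predicted_scores)
```

A lower value means the predictions lie closer to the true scores, in the same units as the score itself.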

978-1-7281-6221-8/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: University of New South Wales. Downloaded on August 23,2020 at 14:07:37 UTC from IEEE Xplore. Restrictions apply.
III. METHODOLOGY
The first step in the implementation is to collect the data set required for the research work. The methodology is applied to a dataset containing the information of the students. To reduce the analysis effort, we identify the unique attributes in the data set and remove them, as those cannot be used for analysis. After the data is collected, it is transformed into the desired form. This process is called pre-processing of data. It is the most important step in obtaining the desired data from the raw data: the more accurate the pre-processing of the raw data, the more accurate the resulting data.

The next step after pre-processing is to find incomplete and irrelevant data in the dataset and remove it in order to obtain accurate results. This phase of removing unwanted data is called Data Cleaning. Next, we can choose any one of the algorithms such as linear regression, Support Vector Machine, Naive Bayes standard classification, or decision tree algorithms for better classification.

In this paper the linear regression algorithm is chosen for the implementation. Further, we have to choose a training set from the dataset, identify the result attributes that decide the output, and start classification.

Fig. 1. Implementation of the model

IV. DATA DESCRIPTION
The data used in this paper is sample data. The sample dataset comprises 100 students. In this study we deal with 100 instances and 10 attributes. All the dependent and independent variables are given in Fig. 2.

Fig. 2. Student related dataset

The attributes used in the dataset are gender, age, parent education, family size, test preparation, father job, mother job, absent days, parent status, travel time, and academic scores.

V. ALGORITHM USED
Many algorithms can be used to implement the thesis. Though they are all used to predict the dependent variable based on independent variables, they differ in how the algorithm is implemented. In this thesis we use linear regression.

A. Linear Regression
Linear regression is a machine learning algorithm based on supervised learning. It is one of the most widely known algorithms and is easily understood even by a person who is not familiar with machine learning algorithms. As the name suggests, linear regression performs regression. It defines the relationship between two variables by fitting a regression line to the data. One of the two variables is the dependent variable, which depends on the other variable, called the independent variable. One should make sure that a relationship exists between the dependent and independent variables before modelling. The strength of the relationship between the variables can be assessed using a scatterplot.

The linear regression line is represented in the form:
Y = a*X + b
• Y - Dependent Variable
• a - Slope
• X - Independent Variable
• b - Intercept

With the best-fit regression line, the error between the predicted and true values is minimized. Linear regression is classified into two types. The first is Simple Linear Regression, in which only one independent variable is used. The second is Multiple Linear Regression, in which multiple independent variables are used; this is the type used in this thesis.

B. Implementation

Fig. 3. Description of variables
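To make the regression line Y = a*X + b of Section V-A concrete, the following Python sketch estimates the slope a and intercept b by ordinary least squares for a single independent variable. It is an illustration under stated assumptions, not the authors' code: the function name `fit_line` and the absent-days and score values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for the line Y = a*X + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of X around its mean, and co-variation of X with Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Hypothetical data: absent days (X) against academic score (Y).
absent_days = [0, 2, 4, 6, 8]
scores = [90, 84, 78, 72, 66]
a, b = fit_line(absent_days, scores)
```

In this made-up data every two additional absent days cost six points, so the fitted slope is negative. Multiple linear regression extends the same idea to several independent variables at once.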

The first step is to read our data set into R and explore its summary and structure. The summary() function provides information on each variable, such as the type of data (character or numerical) and, if numerical, basic descriptive statistics such as measures of central tendency and spread. It also provides information on missing values (NA values).

C. Data Visualization
The primary purpose of visualization is to find visual patterns. We plot academic score versus gender, age, and parent status using the GGPLOT package. GGPLOT requires three key components:
• Define the data in the form of a data frame.
• Describe the aesthetics for the visualization, or how to map the attributes.
• Define the geometry, or type of graphics to be used.

Fig. 4. Data visualization between academic scores and gender of the students.

We can observe the percentage of the academic scores with respect to the gender of the student. We can visualize the academic scores against every attribute we considered using GGPLOT.

Fig. 5. Data visualization between the academic scores and age of the students

D. Boxplot Representation
A box plot is a graph that shows measures of central tendency and spread, and gives a visual of outliers:
• Median: the middle value of the data set.
• First Quartile: the middle number between the smallest number and the median of the dataset.
• Third Quartile: the middle value between the median and the highest value of the dataset.
• Interquartile range: 25th to the 75th percentile.
• Outliers, maximum, minimum.

Summary for boxplot visualizations:
• Students who completed the prep class had better academic scores.
• Female students scored more than the male students.

Fig. 6. Boxplot representation of the academics and the gender.

Fig. 7. Boxplot representation of academics and age of the student.
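The quantities a box plot displays can be computed directly, as in the Python sketch below. Two choices here are assumptions rather than statements from the paper: quartiles use the inclusive method of `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR rule. The function name `box_stats` and the scores are hypothetical.

```python
import statistics


def box_stats(values):
    """Return the summary statistics that a box plot displays."""
    ordered = sorted(values)
    # Quartiles by the inclusive method: the data are treated as the
    # whole population, so the minimum and maximum are the extremes.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical academic scores, for illustration only.
stats = box_stats([55, 60, 62, 65, 70, 72, 75, 78, 99])
```

For these nine scores the first quartile is the middle of the values from the smallest up to the median, matching the definition in the list above, and the score 99 falls outside the upper fence.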

VI. RESULT
In this section we build a linear regression model that predicts academic scores. Academic score is the dependent variable (Y); gender, age, mother job, parent education, test prep, father job, parent status, travel time, and absent days are the independent variables (X). First, we divide our dataset into two parts, a training set and a testing set. Then we apply the lm() function to the training data and the predict() function to the testing data, and create a visualization of our regression model with the regression line and 95% confidence intervals.

The prediction interval is rather similar to the confidence interval in calculation. The prediction interval equation is defined as:
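For reference, the textbook prediction interval for a simple linear regression with one predictor is sketched below. The notation is an assumption of this sketch and may differ from the authors' own: $\hat{y}_0$ is the predicted score at a new point $x_0$, $n$ is the number of training observations, $s$ is the residual standard error, and $t_{\alpha/2,\,n-2}$ is the critical value of the t distribution (with $\alpha = 0.05$ for a 95% interval).

```latex
\[
  \hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s\,
  \sqrt{\,1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,},
  \qquad
  s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
\]
```

The confidence interval for the mean response has the same form without the leading 1 under the square root, which is why the two intervals are similar in calculation while the prediction interval is always wider.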

TABLE I. VALUES OF THE VARIABLES

Fig. 8. Visualization of predicted scores
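The workflow of this section (split the data, fit on the training part, predict on the testing part) can be sketched as follows. The paper performs these steps with R's lm() and predict(); this plain-Python version is only an assumed stand-in that solves the least-squares normal equations directly. The helper names, the two predictors, the 80/20 split and the synthetic noise-free data are all hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coefs, rows):
    """Predicted scores for new rows of predictor values."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]


# Synthetic, noise-free data: (test_prep, absent_days) -> score.
rng = random.Random(0)
rows = [(rng.randint(0, 1), rng.randint(0, 10)) for _ in range(100)]
scores = [60.0 + 8.0 * prep - 2.0 * absent for prep, absent in rows]

# Assumed 80/20 split into training and testing sets.
indices = list(range(len(rows)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

coefs = fit([rows[i] for i in train_idx], [scores[i] for i in train_idx])
predicted = predict(coefs, [rows[i] for i in test_idx])
```

Because the synthetic scores contain no noise, the fitted coefficients recover the generating values and the test predictions match the true scores; real student data would leave residual error on the testing set.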

Table I describes the estimated value, error value (standard error), and t value of each variable. Here the t value is the estimate divided by its standard error, that is, the size of the difference relative to the variation in the data.

The p value indicates the statistical significance of each independent variable as a predictor. The p value and the impact of the variable on the output are inversely related: the smaller the p value, the stronger the evidence that the variable affects the output.

TABLE II. VALUES OF FITTING OF THE DATA

VII. CONCLUSION
The effectiveness of using machine learning in the education field depends on the algorithm and the usage of the data. Choosing the algorithm used to predict the students' performance is important, because the accuracy of the result depends on the machine learning algorithm. The algorithm used in this paper is Linear Regression. Present studies show that the academic performance of students also depends on the student's background and other attributes. Many research works confirm that, apart from past academic performance, the student's background and other attributes have a significant influence on students' performance. Machine learning has an emerging role in every sector, and it can also be used effectively in academia. In the future, many applications with improved ability and efficiency may become an integrated part of every academic institution.

ACKNOWLEDGMENT
This research contribution is a part of undergraduate
research and content development at Institute of Aeronautical
Engineering.

REFERENCES
[1] Amandepp Kaur, Nitin Umesh, Barjinder Singh, “Machine Learning approach to predict Student Academic Performance”, International Journal for Research in Applied Science Engineering Technology (IJRASET), Volume 6, Issue IV, April 2018.
[2] Pedro Strecht, Luis Cruz, Carlos Soares, João Mendes-Moreira and Rui Abreu, “A comparative study of classification and regression algorithms for Modelling student’s Academic performance”, Proceedings of the 8th International Conference on Educational Data Mining, 2015.
[3] G. Sujatha, S. Sindhu and P. Savaridassan, “Predicting student’s performance using personalized analytics”, Volume 119, No. 12, 2018.
[4] Ankitha A Nichat, Dr. Anjali B Raut, “Predicting and Analysis of student Performance Using Decision Tree Technique”, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 5, Issue 4, April 2017.
[5] S.A. Oloruntoba, J.L. Akinode, “Student Academic Performance Prediction Using Support Vector Machine”, December 2017.
[6] Dhanashree Mane, Pranali Namdas, Pooja Gargade, Dnyaneshwari Jagtap, S.S. Rathi, “Predicting student Performance Using Machine Learning Approach”, VJER Vishwakarma Journal of Engineering Research, Volume 2, Issue 4, December 2018.
[7] Havan Agarwal, Harshil Mavani, “Student Performance Prediction Using Machine learning”, International Journal of Engineering Research and Technology (IJERT), Vol. 4, Issue 03, March 2015.
[8] Raheela Asif, Agathe Merceron, and Mahmood K. Pathan, “Predicting student academic performance at degree level: A case study”, International Journal of Intelligent Systems and Applications, Vol. 7, No. 1, 2014.
[9] Murat Pojon, “Using Machine Learning to Predict Student Performance”, June 2017.
[10] Erkan Er, “Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100”, International Journal of Machine Learning and Computing, Vol. 2, No. 4, August 2012.
