Student Performance Regression
Section – B
Study Group – 4
P23085 – DEEPIKA S K
Step 1) Descriptive Stats - Descriptive statistics summarize the characteristics of a data set.
Their main purpose is to condense a large amount of data into a few useful measures, such as the
mean, median, and spread of each variable. Refer to the Excel sheet for the descriptive analysis
of our data.
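As a rough illustration of what such a summary contains, here is a minimal Python sketch using only the standard library. The function name `describe` and the sample values are assumptions made for this example; they are not taken from the Excel sheet.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation (n - 1)
        "min": ordered[0],
        "max": ordered[-1],
    }

# Hypothetical Performance Index values, for illustration only.
sample_pi = [91, 65, 45, 36, 66, 61]
summary = describe(sample_pi)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function can be applied to each column (hours studied, previous scores, and so on) to get one summary per variable.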
[Figure: scatter plot of Performance Index (y-axis) against Previous Score (x-axis), with the fitted regression line]
For our dataset, we plotted two scatter plots: a) PI vs Hours Studied, and b) PI vs Previous Score.
The regression line is a trend line that models the linear trend seen in a scatter plot; note that
some data show a relationship that is not linear. As the figure above shows, the regression line for
PI vs Previous Score is y = 1.0138x - 15.182, with R² = 0.8376.
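To make the trend line concrete, the sketch below shows how a least squares line and its R² can be computed, and how the reported line can be used to estimate PI from a previous score. The function names and the toy data are assumptions for illustration; only the slope 1.0138 and intercept -15.182 come from the report.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def predict_pi_from_previous(previous_score):
    """Apply the reported trend line y = 1.0138x - 15.182."""
    return 1.0138 * previous_score - 15.182

# Hypothetical toy data lying exactly on y = 2x + 1.
toy_x = [1, 2, 3, 4]
toy_y = [3, 5, 7, 9]
slope, intercept = fit_line(toy_x, toy_y)
print(slope, intercept, r_squared(toy_x, toy_y, slope, intercept))

# Estimated PI for a student with a previous score of 80.
print(predict_pi_from_previous(80))
```

An R² of 0.8376 means that this single-variable line accounts for roughly 84% of the variation in PI.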
Step 3) Correlation -
                                   Hours      Previous   Sleep       Sample Question   Extra Curricular  Performance
                                   Studied    Scores     Hours       Papers Practiced  Activities        Index
Hours Studied                      1
Previous Scores                    -0.01239   1
Sleep Hours                        0.001245   0.005944   1
Sample Question Papers Practiced   0.017463   0.007888   0.0039902   1
Extra Curricular Activities        0.003873   0.008369   -0.0232836  0.013102781       1
Performance Index                  0.37373    0.915189   0.0481058   0.043268327       0.024524947       1
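Each entry in the matrix above is a Pearson correlation coefficient. The following sketch shows one way to compute it by hand and checks that squaring the Previous Scores vs Performance Index correlation (0.915189) reproduces the R² of 0.8376 reported for the scatter plot. The function name `pearson_r` and the toy data are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: a perfectly increasing and a perfectly decreasing pair.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)

# For a one-variable linear regression, R squared equals the squared correlation.
r_prev_pi = 0.915189  # Previous Scores vs Performance Index, from the matrix
r2_from_r = round(r_prev_pi ** 2, 4)
print(r2_from_r)
```

The near-zero values between the predictors suggest that they are almost uncorrelated with one another, while Previous Scores and Hours Studied show the strongest correlations with PI.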
Step 4) Regression -
4 Checks:
We also attempted to combine a categorical variable with a continuous variable, pairing
extra-curricular activities with hours studied and extra-curricular activities with previous
scores, in an effort to improve the significance of our models. The exact values can be found in
the Excel sheet. However, the p-values of these combined terms were not significant
(p-value > 0.05), so we discarded this model.
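One common way to combine a categorical and a continuous variable is to multiply them into a new column (often called an interaction term) and add that column to the regression. The sketch below assumes that this is what was done and that extra-curricular activities are coded 1 = yes, 0 = no; the column names, rows, and p-value are hypothetical.

```python
def add_interaction(rows, categorical_key, continuous_key, new_key):
    """Return copies of rows with new_key = categorical value * continuous value."""
    result = []
    for row in rows:
        extended = dict(row)
        extended[new_key] = row[categorical_key] * row[continuous_key]
        result.append(extended)
    return result

def keep_term(p_value, alpha=0.05):
    """Keep a term only if its p-value is below the significance level."""
    return p_value < alpha

# Hypothetical rows; extra_curricular is coded 1 = yes, 0 = no.
students = [
    {"hours_studied": 7, "extra_curricular": 1},
    {"hours_studied": 4, "extra_curricular": 0},
    {"hours_studied": 8, "extra_curricular": 1},
]
with_interaction = add_interaction(
    students, "extra_curricular", "hours_studied", "extra_x_hours"
)
interaction_values = [row["extra_x_hours"] for row in with_interaction]
print(interaction_values)

# A made-up p-value above 0.05 leads to dropping the term.
print(keep_term(0.30))
```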
In the regression model, we interpreted that the Performance Index (PI) is most affected by the
number of hours studied: the more a student studies, the higher the predicted marks. After hours
studied, previous scores have the largest effect on PI: students who scored well in the past
generally tend to do well in later exams. Extra-curricular activities also have an effect on PI:
students who take part in them tend to have a slightly higher PI. After extra-curricular
activities, sleep also has an effect: students who get an adequate amount of sleep tend to
perform better. Finally, the number of sample papers solved has the least effect on PI according
to the data.
Interpretation: The regression model is represented by the equation
PI = -34 + 2.85 * Hrs_Studied + 1.02 * Prev_score + 0.48 * Sleep_hrs + 0.19 * Sample_papers + 0.61 * Extra_curriculars
From it, we observe the following impact on the Performance Index (PI).
The variable with the largest effect on PI per unit is the number of hours studied (coefficient 2.85).
This implies that an increase in study hours is associated with a higher PI, suggesting that more
dedicated study time contributes to better performance.
Following study hours, the next most influential factor is the previous score (coefficient 1.02).
Individuals who achieved higher scores in their previous examinations tend to perform better in
subsequent exams.
Moreover, engagement in extracurricular activities also has a discernible effect on PI. The data suggests
that individuals involved in extracurricular activities tend to have a higher PI.
Additionally, the amount of sleep influences PI, indicating that having an adequate sleep duration is
correlated with better performance.
Lastly, the analysis reveals that the impact of solving sample papers on PI is comparatively lower than
other factors considered in the model.
In summary, the regression model highlights the relative importance of various factors in determining
the Performance Index, with study hours and previous scores playing more substantial roles, followed by
extracurricular activities, sleep, and sample papers.
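The regression equation can be applied directly to estimate a student's PI. The sketch below uses the coefficients from the equation above; the dictionary keys, function names, and the example student are assumptions for illustration, and extra-curricular participation is assumed to be coded 1 = yes, 0 = no.

```python
# Coefficients taken from the reported regression equation.
INTERCEPT = -34.0
COEFFICIENTS = {
    "hours_studied": 2.85,
    "previous_score": 1.02,
    "sleep_hours": 0.48,
    "sample_papers": 0.19,
    "extra_curricular": 0.61,
}

def contributions(student):
    """Points each variable adds to the predicted PI for one student."""
    return {name: coef * student[name] for name, coef in COEFFICIENTS.items()}

def predict_performance_index(student):
    """Predicted PI = intercept + sum of coefficient * value."""
    return INTERCEPT + sum(contributions(student).values())

# Hypothetical student, for illustration only.
student = {
    "hours_studied": 7,
    "previous_score": 80,
    "sleep_hours": 8,
    "sample_papers": 5,
    "extra_curricular": 1,
}
parts = contributions(student)
predicted = predict_performance_index(student)
for name, points in parts.items():
    print(f"{name}: {points:.2f}")
print(f"predicted PI: {predicted:.2f}")
```

Each coefficient is the change in PI per one-unit change in its variable. Because previous scores span a much wider range than study hours, the previous score can contribute more points to a given prediction even though its per-unit coefficient is smaller.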