Data Set and Linear Regression Analysis
For this analysis, I chose a simple data set involving hours of study (X) and exam scores
(Y). I selected this data because it is a common and relatable example: it investigates the
relationship between the time spent studying and the resulting exam score. I wanted to test
whether more study hours are associated with higher scores, which many believe to be true.
Using Google Sheets, I plotted the data and performed linear regression by adding a trendline
and displaying the regression equation on the chart. I also used the SLOPE and INTERCEPT
functions to calculate the slope and y-intercept of the regression line. The resulting
regression equation is:
Y = 5X + 45
Where:
Slope (5): This value indicates that for every additional hour spent studying, the predicted
exam score increases by 5 points. The positive slope shows a positive relationship between
the two variables, meaning more study time generally goes with higher exam scores.
Y-intercept (45): This is the predicted exam score when 0 hours of study are done. It
suggests that even with no study time, the expected score is 45. This might represent
the baseline score based on prior knowledge or testing conditions.
The correlation coefficient (r) is 0.98, and the coefficient of determination (r²) is 0.96.
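The SLOPE and INTERCEPT calculations described above can be sketched in Python with the standard least-squares formulas. This is only an illustration: the function name fit_line is my own, and the five data points are placeholders chosen to lie exactly on Y = 5X + 45, not the data set used in this analysis.

```python
def fit_line(xs, ys):
    """Return the least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder data (NOT the original data set): points on Y = 5X + 45
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 60, 65, 70]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)  # 5.0 45.0
```

With real data the points would not fall exactly on the line, so the slope and intercept would be the values that minimize the squared vertical distances to the points.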
Model Confidence
The r-value of 0.98 indicates a very strong positive linear relationship between study hours
and exam scores. The r² value of 0.96 shows that about 96% of the variation in exam scores
can be explained by the number of hours studied, so the regression line fits this data set
closely. A close fit describes the data used to build the model; it does not by itself prove
that extra study time causes the higher scores.
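The link between r and r² can be checked with a short Python sketch. The function name pearson_r and the sample points are hypothetical and used only for illustration; they are not the data behind the reported 0.98. The last lines simply square the reported r to show that it is consistent with the reported r² of 0.96.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, slightly noisy data (NOT the original data set)
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 71]

r = pearson_r(hours, scores)
r_squared = r ** 2  # for simple linear regression, r² is the square of r
print(round(r, 2), round(r_squared, 2))

# Consistency check on the values reported in this analysis
reported_r = 0.98
print(round(reported_r ** 2, 2))  # 0.96
```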
Using the model, I can predict the exam score for someone who studies for 7 hours:
Y = 5(7) + 45 = 80
A score for 7 hours of study is not provided in the original data set, so the comparison here
is hypothetical. If the actual exam score for 7 hours of study were 82, the prediction of 80
would be off by 2 points. This discrepancy shows the model's limitation, but overall, the
prediction would still be quite close.
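The prediction and the 2-point discrepancy can be reproduced with a few lines of Python. The function and variable names are my own, and the score of 82 is the hypothetical value discussed above, not an observed data point.

```python
SLOPE = 5       # points gained per additional hour of study
INTERCEPT = 45  # predicted score with 0 hours of study


def predict_score(hours):
    """Predicted exam score from the regression line Y = 5X + 45."""
    return SLOPE * hours + INTERCEPT


predicted = predict_score(7)
hypothetical_actual = 82  # assumed value, not part of the data set
residual = hypothetical_actual - predicted

print(predicted, residual)  # 80 2
```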
Conclusion
The linear regression model appears effective for predicting exam scores from study hours,
given the high correlation coefficient and the close fit to the data. However, as with any
model, there may be exceptions or other factors influencing exam scores, such as prior
knowledge, study techniques, or exam difficulty.