Updated Lecture 7
• Residual ɛ is the distance between the predicted point (on the regression line) and the
actual point
• The objective of linear regression is to find the line that best predicts Y given X.
Linear Regression: Understand how to compute
a and b for a set of X and Y values. ….
Steps to Calculate 'a' and 'b':
1. Determine the Mean of X and Y:
Example:
• Suppose our model predicts student scores (Y_predicted) as 75, 80, 85, while the actual scores
(Y_actual) are 80, 85, 90. SSE would be (80−75)^2 + (85−80)^2 + (90−85)^2 = 25 + 25 + 25 = 75.
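The SSE arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch; the function name `sum_of_squared_errors` is illustrative and does not come from the lecture.

```python
def sum_of_squared_errors(actual, predicted):
    """Return the sum of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


# Scores from the example above
y_predicted = [75, 80, 85]
y_actual = [80, 85, 90]

sse = sum_of_squared_errors(y_actual, y_predicted)
print(sse)  # 75
```

Each prediction is off by 5 points, so each term contributes 5^2 = 25.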
Interpretation:
The calculated R-squared value of
approximately 0.527 means that about
52.7% of the variability in exam scores is
explained by the number of hours studied.
The remaining 47.3% is not explained by
our model.
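R-squared compares the model's squared error (SSE) with the total variation of Y around its mean (SST): R^2 = 1 − SSE/SST. The data behind the 0.527 figure is not shown in these notes, so the sketch below instead assumes the hours/scores table from Example 1 and the fitted line Y = 8.5⋅X + 44. The function name `r_squared` is illustrative.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE/SST, where SST is the variation of actual values around their mean."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Assumed data: the Example 1 table and its fitted line (not the data behind 0.527)
hours = [2, 3, 4, 5]
scores = [60, 70, 80, 85]
predicted = [8.5 * x + 44 for x in hours]

r2 = r_squared(scores, predicted)
print(round(r2, 4))  # 0.9797
```

For this small data set the line explains about 98% of the variability, a much closer fit than the 52.7% case interpreted above.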
Example 1: Predicting Exam Scores
Suppose we want to predict students' exam scores (dependent
variable) based on the number of hours they study (independent
variable).
Hours Studied (X) Exam Scores (Y)
2 60
3 70
4 80
5 85
Step 1: Calculate the Mean of X and Y
Mean of X = (2+3+4+5)/4 = 3.5
Mean of Y = (60+70+80+85)/4 = 73.75
Slope = 8.5
Intercept = 44.00
Predicted score for X = 6 hours studied:
Ypredicted=8.5⋅6+44=95
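The slope, intercept, and prediction for the hours/scores table can be reproduced as follows. The slide formulas for the slope and intercept are not in the text, so this sketch assumes the standard least-squares formulas: slope = Σ(x−mean_x)(y−mean_y) / Σ(x−mean_x)^2 and intercept = mean_y − slope⋅mean_x. Variable names are illustrative.

```python
hours = [2, 3, 4, 5]       # X: hours studied
scores = [60, 70, 80, 85]  # Y: exam scores

# Means of X and Y
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Slope and intercept from the standard least-squares formulas
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted exam score for x hours of study."""
    return slope * x + intercept


print(mean_x, mean_y)    # 3.5 73.75
print(slope, intercept)  # 8.5 44.0
print(predict(6))        # 95.0
```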
Step 4: Calculate Residual (Sum of Squared Errors)
Residual=Yactual−Ypredicted
Residuals=[60−(8.5⋅2+44),70−(8.5⋅3+44),80−(8.5⋅4+44),85−(8.5⋅5+44)]
Residuals=[-1,0.5,2,-1.5]
Sum of Squared Errors (SSE)= (-1)^2 +0.5^2+2^2+(-1.5)^2 = 1+0.25+4+2.25
SSE=7.5
This represents the total error in our predictions. The smaller the SSE, the
better the model fits the data.
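The residuals and SSE for this example can be checked in code. This sketch assumes the fitted line Y = 8.5⋅X + 44 and the four data points used above.

```python
hours = [2, 3, 4, 5]
scores = [60, 70, 80, 85]
slope, intercept = 8.5, 44.0  # fitted line from the example

# Residual = Y_actual - Y_predicted for each student
residuals = [y - (slope * x + intercept) for x, y in zip(hours, scores)]
sse = sum(r ** 2 for r in residuals)

print(residuals)       # [-1.0, 0.5, 2.0, -1.5]
print(sse)             # 7.5
print(sum(residuals))  # 0.0
```

The residuals themselves add up to zero, which always holds for a least-squares line with an intercept. That is why the errors are squared before summing: positive and negative residuals would otherwise cancel.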
Example 2: Investigating Academic Performance
A college professor is intrigued by the idea that there might be a correlation
between students' grades in internal examinations and their subsequent
performance in external examinations. To explore this hypothesis, the professor
selects a random sample of 15 students from the class.
Exploring the Relationship:
•The professor is interested in understanding whether a high grade in internal
exams tends to correlate with high grades in external exams.
Data Collection:
•A random sample of 15 students is chosen for the study.
•The professor gathers data on the grades of these students in both internal and
external examinations.
[Graph: the sample data with the fitted regression line]
As you can observe from the above graph,
the line does not predict the data exactly.
Instead, it just cuts through the data. Some
predictions are lower than the actual values, while
others are higher.
Hours Studied (X1) Practice Tests (X2) Exam Scores (Y)
2 1 60
3 2 70
4 2 80
5 3 85
This model allows us to predict the exam score based on the number of hours studied and the number
of practice tests taken. For example, if a student studies for 4 hours and takes 2 practice tests, the
predicted exam score would be:
Ypredicted=10⋅4+(−2.5)⋅2+43.75=78.75
(coefficients from a least-squares fit to the four rows above: 10 per hour studied, −2.5 per practice test, intercept 43.75)
So, according to our multiple linear regression model, the predicted exam score would be 78.75 for a
student who studies for 4 hours and takes 2 practice tests.
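The 78.75 figure can be checked by fitting a two-predictor least-squares model to the four rows of the table. This is a sketch under the assumption that the lecture's model is an ordinary least-squares fit of exam score on hours studied and practice tests. It solves the two normal equations by hand, and all variable names are illustrative.

```python
hours = [2, 3, 4, 5]       # X1: hours studied
tests = [1, 2, 2, 3]       # X2: practice tests taken
scores = [60, 70, 80, 85]  # Y: exam scores

n = len(scores)
mean_h = sum(hours) / n
mean_t = sum(tests) / n
mean_s = sum(scores) / n

# Deviations from the means
dh = [h - mean_h for h in hours]
dt = [t - mean_t for t in tests]
ds = [s - mean_s for s in scores]

# Sums of squares and cross-products
s_hh = sum(a * a for a in dh)
s_tt = sum(b * b for b in dt)
s_ht = sum(a * b for a, b in zip(dh, dt))
s_hy = sum(a * c for a, c in zip(dh, ds))
s_ty = sum(b * c for b, c in zip(dt, ds))

# Solve the 2x2 normal equations with Cramer's rule
det = s_hh * s_tt - s_ht ** 2
b_hours = (s_tt * s_hy - s_ht * s_ty) / det
b_tests = (s_hh * s_ty - s_ht * s_hy) / det
intercept = mean_s - b_hours * mean_h - b_tests * mean_t


def predict(h, t):
    """Predicted exam score for h hours studied and t practice tests."""
    return intercept + b_hours * h + b_tests * t


print(b_hours, b_tests, intercept)  # 10.0 -2.5 43.75
print(predict(4, 2))                # 78.75
```

With only four data points and two predictors that rise together, the negative coefficient on practice tests should not be read as evidence that practice tests lower scores.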
Linear Regression in Python
Importing Libraries
Building and Fitting the Model
Visualizing Results