Slides - Simple Linear Regression
Regression Analysis
Regression Analysis = drawing the “best” line through data
The equation of a line
𝑦 = 𝑚𝑥 + 𝑏
The equation of a line
Rearranging:
𝑦 = 𝑏 + 𝑚𝑥
Example: Y = 10 + 0.8X
• b is the y-intercept (here, 10)
• m is the slope (here, 0.8)
• The line crosses the y-axis at the point (0, 10).
• When x increases by 1, y increases by 0.8.
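As a quick check on the example line Y = 10 + 0.8X, the short sketch below evaluates it at x = 0 and measures the change in y for a one-unit increase in x. The function name `line` is an illustrative choice, not something from the slides.

```python
def line(x, b=10.0, m=0.8):
    """Return y = b + m*x for intercept b and slope m."""
    return b + m * x

# At x = 0 the line sits at the y-intercept, the point (0, 10).
y_at_zero = line(0)

# Moving x up by 1 changes y by the slope (0.8, up to float rounding).
change_per_unit = line(1) - line(0)

print(y_at_zero, round(change_per_unit, 2))
```

Changing `b` shifts the whole line up or down, while changing `m` makes it steeper or flatter.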
Simple Linear Regression
• β0 is the y intercept
• β1 is the slope
• However, we usually cannot find a slope and intercept that fit the data
perfectly, so the model includes an error term, εi.
𝑌𝑖 = β0 + β1 𝑋𝑖 + ε𝑖
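To see why an error term is needed, the sketch below uses three made-up points (hypothetical data, not from the slides) and checks whether a single straight line could pass through all of them. It compares the slope between the first pair of points with the slope between the second pair.

```python
# Hypothetical data: three (X, Y) points that do not lie on one line.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]

def slope(p, q):
    """Slope of the straight line through points p and q."""
    return (q[1] - p[1]) / (q[0] - p[0])

slope_first_pair = slope(points[0], points[1])   # 1.0
slope_second_pair = slope(points[1], points[2])  # 2.0

# A single line would need the same slope between every pair of points.
perfect_fit_exists = slope_first_pair == slope_second_pair
print(perfect_fit_exists)  # False
```

Because the slopes differ, any line drawn through these points misses at least one of them, and that miss is what εi captures.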
• The variable being predicted, explained, or affected is the dependent variable (Y).
• On the other hand, the variable that does the predicting, explaining, affecting, etc. is
the independent variable (X).
(Ex) A professor wants to know how well studying predicts test scores.
What are the dependent and independent variables?
The dependent variable is test scores; the independent variable is studying.
What do 𝛽0 and 𝛽1 tell us?
𝑌𝑖 = β0 + β1 𝑋𝑖 + ε𝑖
• They describe the relationship between the independent and dependent variables.
• β1, the slope coefficient, tells us how much Y changes when X increases by 1. For
example, if Y is annual sales and X is the number of customers, β1 is the change in
annual sales when the number of customers increases by 1.
• β0, the y-intercept coefficient, tells us the value of Y (annual sales in this example)
when X is exactly zero.
➢ Although this can sometimes be interesting, we are usually more interested
in the slope, β1.
Practice
Suppose you estimate the following relationship between a manager’s salary
(in thousands) and their age (in years):
𝑌𝑖 = β0 + β1 𝑋𝑖 + ε𝑖
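Rearranging gives the residual for each observation, and summing over all N observations:
$\sum_{i=1}^{N} \varepsilon_i = \sum_{i=1}^{N} (Y_i - \beta_0 - \beta_1 X_i)$
• Note that some residuals are negative and some are positive, so they offset each
other in this sum.
[Figure: scatter plot with the linear regression line; a positive error is marked.]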
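The residual formula can be tried out on a small made-up data set. The sketch below assumes hypothetical values for X, Y, β0 and β1 (none of them come from the slides) and computes each residual as Yi − β0 − β1Xi.

```python
# Hypothetical data and a candidate line, invented for illustration.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 6.0]
beta0, beta1 = 0.5, 1.4

# Residual for each observation: observed Y minus the value on the line.
residuals = [y - beta0 - beta1 * x for x, y in zip(X, Y)]

n_positive = sum(1 for e in residuals if e > 0)
n_negative = sum(1 for e in residuals if e < 0)
total = sum(residuals)

# Residuals are roughly 0.1, -0.3, 0.3, -0.1: two above the line, two below.
print(n_positive, n_negative, abs(total) < 1e-9)
```

For this line the positive and negative residuals cancel almost exactly, so the plain sum says little about how far the points are from the line.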
How do we define the “best” line through the data?
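• The "best" line is the one that makes the sum of the squared residuals as small as possible:
$\sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} (Y_i - \beta_0 - \beta_1 X_i)^2$
• This equation gives you the sum of the squared residuals (the error sum of squares,
SSE, under the definitions in Measures of Variation).
• Since Yi and Xi are data that we've observed (i.e., they can't be changed), we can
only adjust β0 and β1 to make this sum as small as possible.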
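The idea of adjusting β0 and β1 while holding the data fixed can be sketched as a function of the two coefficients. The data and both candidate lines below are hypothetical, chosen only to show that different coefficient pairs give different sums of squared residuals.

```python
# Hypothetical observed data (fixed; only the coefficients vary).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 6.0]

def sum_squared_residuals(beta0, beta1):
    """Sum over i of (Y_i - beta0 - beta1 * X_i)^2 for the data above."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(X, Y))

candidate_a = sum_squared_residuals(1.0, 1.0)  # residuals 0, 0, 1, 1
candidate_b = sum_squared_residuals(0.5, 1.4)  # residuals near +/-0.1, +/-0.3

# The line with the smaller sum is the better fit by this criterion.
print(candidate_b < candidate_a)  # True
```

Squaring keeps positive and negative residuals from cancelling, which is why this criterion can rank candidate lines.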
• $\bar{Y}$ is the mean of Y
• $\bar{X}$ is the mean of X
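The estimating formula these means belong to did not survive in the extracted slide text. As an assumption, the sketch below uses the standard least-squares formulas, b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1·X̄, on hypothetical data.

```python
# Hypothetical data, invented for illustration (not from the slides).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 6.0]
n = len(X)

x_bar = sum(X) / n  # mean of X
y_bar = sum(Y) / n  # mean of Y

# Standard least-squares slope: co-variation of X and Y over variation of X.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
s_xx = sum((x - x_bar) ** 2 for x in X)
b1 = s_xy / s_xx

# Intercept: forces the fitted line through the point (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(round(b1, 4), round(b0, 4))  # approximately 1.4 and 0.5
```

These are the values of b0 and b1 that minimize the sum of squared residuals for the data supplied.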
Suppose you want to know how the number of customers near your
store affects annual sales. You decide to use simple linear regression.
Estimate b1 and b0
• b1 = 2.07
• b0 = -1.21
$\hat{Y}_i = -1.21 + 2.07 X_i$
Interpretation
$\hat{Y}_i = -1.21 + 2.07 X_i$
• If the number of customers increases by 1 million, predicted annual sales
increase by 2.07 million.
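The interpretation can be checked numerically with the estimated coefficients b0 = -1.21 and b1 = 2.07. In the sketch below, the function name `predict_sales` and the customer counts of 3 and 4 million are illustrative assumptions, not values from the slides.

```python
B0 = -1.21  # estimated intercept from the slides
B1 = 2.07   # estimated slope from the slides

def predict_sales(customers_millions):
    """Predicted annual sales (millions) for a given number of customers (millions)."""
    return B0 + B1 * customers_millions

# Hypothetical customer counts: 3 million versus 4 million.
sales_at_3 = predict_sales(3.0)
sales_at_4 = predict_sales(4.0)

# One extra million customers changes the prediction by the slope.
increase = sales_at_4 - sales_at_3
print(round(increase, 2))  # 2.07
```

The same difference results from any pair of customer counts that are 1 million apart, because the slope is constant along the line.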
In other words…
[Figure: annotated plot; the surviving annotations read "But not here", "Or here", "Or here".]
Measures of Variation
• Just like with ANOVA, it can be helpful to describe the variation in the data
with 3 different measures.
1. Variation of the observed data around the mean ($\bar{Y} = 6.63$ in the plotted
example). This is the total sum of squares, or SST.
$SST = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$
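2. Variation of the predicted values around the mean ($\bar{Y} = 6.63$ in the plotted
example). This is the regression sum of squares, or SSR.
$SSR = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$
• Note: this is not the sum of the squared residuals. The sum of the squared residuals
measures variation around the predicted values and is the SSE (item 3).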
3. Variation of the observed values around the predicted values. This is the error sum
of squares, or SSE.
$SSE = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$
Note that SST = SSE + SSR
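The three measures can be computed side by side to confirm that they add up. The sketch below uses hypothetical data and fits the line with the standard least-squares formulas, which is an assumption here; the identity SST = SSE + SSR holds for a least-squares line that includes an intercept.

```python
# Hypothetical data, invented for illustration (not from the slides).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 6.0]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least-squares slope and intercept (standard closed-form formulas).
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

# Predicted value for each observation.
Y_hat = [b0 + b1 * x for x in X]

sst = sum((y - y_bar) ** 2 for y in Y)                # observed around the mean
ssr = sum((yh - y_bar) ** 2 for yh in Y_hat)          # predicted around the mean
sse = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))   # observed around predicted

print(round(sst, 6), round(ssr, 6), round(sse, 6))  # roughly 10.0, 9.8, 0.2
print(abs(sst - (sse + ssr)) < 1e-9)                # True
```

With these numbers most of the total variation sits in SSR, which means the fitted line accounts for nearly all of the spread in Y.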