Slides - Simple Linear Regression

This document provides an overview of linear regression. It defines linear regression as drawing the "best" line through data. It explains that the equation of a line is y=mx+b, where b is the y-intercept and m is the slope. Linear regression aims to estimate the parameters β0 and β1 in the model Yi = β0 + β1Xi + εi to find the line that best fits the data based on minimizing the sum of squared residuals. It discusses interpreting the estimated slope b1 and intercept b0 coefficients and making predictions within the relevant range of the data.


Linear Regression

Regression Analysis
Regression Analysis = drawing the “best” line through data
The equation of a line

y = mx + b

Rearranging:

y = b + mx

(Ex) Y = 10 + .8X

• b is the y-intercept: the line crosses the y-axis at the point (0, 10)

• m is the slope: y increases by .8 when x increases by 1
Simple Linear Regression

• Our goal is to use the data to estimate the "best" line through the data: specifically, the intercept and the slope.

• In linear regression, we typically use the symbol β for the parameters (the values of the slope and intercept).

• So we try to estimate the following β0 and β1:

Y = β0 + β1 X

• β0 is the y-intercept

• β1 is the slope
• However, we cannot find a slope and intercept that perfectly fit the data.

• For example, the slopes between any two observations are different.

• So we do the best we can at drawing ONE line through the data…

• …and the differences between the data and our line are called residuals.
Simple Linear Regression Model:

Yᵢ = β0 + β1 Xᵢ + εᵢ

• β0 is the y-intercept

• β1 is the slope

• εᵢ is the residual for observation i = 1, 2, …, N

• Xᵢ is the independent variable

• Yᵢ is the dependent variable
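To make the model concrete, here is a minimal Python sketch (the parameter values and the noise level are made up for illustration) that generates data from this model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" population parameters, chosen only for illustration
beta0, beta1 = 10.0, 0.8

N = 50
X = rng.uniform(0, 20, size=N)     # independent variable X_i
eps = rng.normal(0, 2.0, size=N)   # residuals epsilon_i
Y = beta0 + beta1 * X + eps        # dependent variable: Y_i = beta0 + beta1*X_i + eps_i
```

In practice we observe only X and Y; the point of regression is to recover estimates of β0 and β1 from them.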

Which Variable Do I Make Y, and Which Do I Make X?

Yᵢ = β0 + β1 Xᵢ + εᵢ

• You will need to carefully read the question to find key words that tell you which is the dependent variable (Y) and which is the independent variable (X).

• A variable that is being "predicted", "explained", "affected", "impacted", etc., is the dependent variable.

• On the other hand, the variable that does the predicting, explaining, affecting, etc., is the independent variable.

(Ex) A professor wants to know how well studying predicts test scores. What are the dependent and independent variables?

The dependent variable is test scores; the independent variable is studying.
What do β0 and β1 tell us?

Yᵢ = β0 + β1 Xᵢ + εᵢ

• They describe the relationship between the independent and dependent variables.

(Ex) Suppose Y is annual sales, and X is customers.

• If the number of customers increases by 1, β1 (the slope coefficient) tells us how much annual sales will change.

• β0, the y-intercept coefficient, tells us the annual sales when X is exactly zero.

➢ Although this can sometimes be interesting, we are usually more interested in the slope, β1.
Practice

Suppose you estimate the following relationship between a manager's salary (in thousands) and their age (in years):

predicted salary = 48.4 + 5.2 × Age

• What is the interpretation of the coefficient on Age?

• What is the interpretation of the y-intercept?
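A quick numeric check of the slope's meaning, as a small Python sketch using the estimates above (the function name is just for illustration):

```python
b0, b1 = 48.4, 5.2   # intercept and slope from the practice problem

def predicted_salary(age):
    """Predicted salary (in thousands) at a given age (in years)."""
    return b0 + b1 * age

# The slope: one additional year of age raises predicted salary
# by 5.2 thousand dollars.
print(predicted_salary(31) - predicted_salary(30))   # ≈ 5.2 (up to float rounding)
```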


Simple Linear Regression Model:

Yᵢ = β0 + β1 Xᵢ + εᵢ

• Note that β0 and β1 are population parameters.

• They represent the real relationship between X and Y.

• But just like with hypothesis testing, we typically only have a sample of data, so we use this to estimate the population parameters.

• The estimates of β0 and β1 are typically denoted b0 and b1.
How do we define the "best" line through the data?

• Intuitively, we would want to make the residuals as small as possible.

Yᵢ = β0 + β1 Xᵢ + εᵢ

• Solve for the residuals:

εᵢ = Yᵢ − β0 − β1 Xᵢ

• Add up the residuals for every observation:

Σᵢ εᵢ = Σᵢ (Yᵢ − β0 − β1 Xᵢ), summing over observations i = 1, …, N
• Note that some residuals are negative and some are positive.

• So when we add them together, the negative values partially offset the positive values.

• This is bad because it will underestimate the total distance between the data points and the line.

• To fix this problem, we square the residuals so they're all positive.

[Figure: scatter plot with the fitted line; points above the line have positive errors, points below have negative errors.]
How do we define the "best" line through the data?

Σᵢ εᵢ² = Σᵢ (Yᵢ − β0 − β1 Xᵢ)²

• This equation gives you the sum of the squared residuals, or SSR.

• Our goal is to minimize this value.

• Since Yᵢ and Xᵢ are data that we've observed (i.e., they can't be changed), we can only adjust β0 and β1 to achieve this goal.

• This is called the Least Squares Method of estimating β0 and β1.
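As an illustration of the Least Squares Method, the following Python sketch (with made-up toy data) minimizes the SSR numerically; the minimizing pair is the least squares estimate of (β0, β1):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 30)                    # toy data, for illustration only
Y = 10 + 0.8 * X + rng.normal(0, 1.5, 30)

def ssr(params):
    """Sum of squared residuals for a candidate intercept and slope."""
    b0, b1 = params
    return np.sum((Y - b0 - b1 * X) ** 2)

# Choose (b0, b1) to make the SSR as small as possible
b0_hat, b1_hat = minimize(ssr, x0=[0.0, 0.0]).x
print(b0_hat, b1_hat)   # close to the true values 10 and 0.8
```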
Estimation

• In practice, we use calculus to choose b0 and b1 to minimize the sum of the squared residuals.

• But for simple (one-variable) regression, there is an easy formula for the slope b1:

b1 = Cov(X, Y) / Var(X)

• After we compute the slope, we can use it to solve for the intercept with the following formula:

b0 = Ȳ − b1 X̄

• Ȳ is the mean of Y

• X̄ is the mean of X
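A minimal Python sketch of these two formulas, assuming X and Y are arrays of observed data:

```python
import numpy as np

def least_squares(X, Y):
    """Estimate the slope b1 and intercept b0 with the formulas above."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)   # b1 = Cov(X, Y) / Var(X)
    b0 = Y.mean() - b1 * X.mean()                         # b0 = Ybar - b1 * Xbar
    return b0, b1
```

Any consistent choice of ddof works here, since the same factor cancels in the Cov/Var ratio.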
Suppose you want to know how the number of customers near your store affects annual sales. You decide to use simple linear regression.

• Annual sales (in millions) is your dependent variable, Y.

• Number of customers (in millions) is your independent variable, X.

Estimate b1 and b0:

• b1 = 2.07

• b0 = −1.21

Our estimated regression line is:

Ŷᵢ = −1.21 + 2.07 Xᵢ
Interpretation

Ŷᵢ = −1.21 + 2.07 Xᵢ

• If the number of customers increases by 1 million, the annual sales increase by 2.07 million.

• If the number of customers is zero, the average annual sales are −1.21 million.
Predictions

Ŷᵢ = −1.21 + 2.07 Xᵢ

• What are the predicted sales if there are 4 million customers?

Ŷᵢ = −1.21 + 2.07 × 4 = 7.07 million dollars

• NOTE: Whenever possible, base your predictions off the exact coefficient estimates, rather than the rounded numbers.

➢ With the exact numbers in Excel, the prediction would be 7.09 million dollars. This is a slightly more accurate estimate.
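The same prediction as a short Python sketch, using the rounded coefficients from the slide:

```python
b0, b1 = -1.21, 2.07   # rounded estimates from the slide

def predict_sales(customers_millions):
    """Predicted annual sales (millions) for a given number of customers (millions)."""
    return b0 + b1 * customers_millions

print(predict_sales(4))   # ≈ 7.07; the exact, unrounded coefficients give about 7.09
```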
Your Predictions are Limited

• Only make predictions that are within the relevant range of your data.

• In other words, you can predict Y for values of X that are between the smallest and the largest values of X in your data.

• This is called Interpolation.

• Predicting values outside of your relevant range is called extrapolation, and should be avoided.
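One way to respect this rule in code is to refuse to predict outside the observed range of X. A minimal sketch (the function name is just for illustration):

```python
def predict_within_range(x, X_data, b0, b1):
    """Predict Y at x by interpolation only (x must lie inside the data's range)."""
    lo, hi = min(X_data), max(X_data)
    if not lo <= x <= hi:
        raise ValueError(f"x={x} is outside the relevant range [{lo}, {hi}]; "
                         "this would be extrapolation")
    return b0 + b1 * x
```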
In other words…

[Figure: scatter plot marking the observed range of X; "you can predict here" inside the range, "but not here" beyond either end.]
Measures of Variation

• Just like with ANOVA, it can be helpful to break the total variation in the data into 3 different groups.

1. Variation of the observed data around the mean. This is the total sum of squares, or SST:

SST = Σᵢ (Yᵢ − Ȳ)²

2. Variation of the predicted values around the mean. This is the regression sum of squares, or SSR:

SSR = Σᵢ (Ŷᵢ − Ȳ)²

• Note: despite the shared abbreviation, this SSR is not the "sum of the squared residuals" we minimized earlier; that earlier quantity corresponds to the SSE below.

3. Variation of the observed values around the predicted values. This is the error sum of squares, or SSE:

SSE = Σᵢ (Yᵢ − Ŷᵢ)²

Note that SST = SSR + SSE.

[Figure: scatter plot with the fitted line and the mean line Ȳ = 6.63, marking the SST, SSR, and SSE distances.]
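A Python sketch that computes all three sums of squares and checks the decomposition, assuming Y holds the observed values and Y_hat the fitted values from a least squares line:

```python
import numpy as np

def sums_of_squares(Y, Y_hat):
    """Return (SST, SSR, SSE) for observed Y and fitted Y_hat."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    Y_bar = Y.mean()
    SST = np.sum((Y - Y_bar) ** 2)       # total variation
    SSR = np.sum((Y_hat - Y_bar) ** 2)   # variation explained by the regression
    SSE = np.sum((Y - Y_hat) ** 2)       # unexplained (error) variation
    # SST = SSR + SSE holds when Y_hat comes from a least squares fit with intercept
    assert np.isclose(SST, SSR + SSE)
    return SST, SSR, SSE
```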
Two Ways to Evaluate a Model Using Variation

1. The coefficient of determination, R²:

R² = SSR / SST

• This measures the amount of variation in Y that is explained by X.

• A high R² means your independent variable, X, is a good predictor of Y.

(Ex) If your R² = .90, then your model explains 90% of the variation in Y. This is considered a very good fit, and should make relatively good predictions.
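As a sketch, R² computed directly from its definition:

```python
import numpy as np

def r_squared(Y, Y_hat):
    """Coefficient of determination: R^2 = SSR / SST."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    SSR = np.sum((Y_hat - Y.mean()) ** 2)   # explained variation
    SST = np.sum((Y - Y.mean()) ** 2)       # total variation
    return SSR / SST
```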
2. The standard error of the estimate, Sxy:

Sxy = √( SSE / (n − 2) )

• This is the standard deviation of observations around the prediction line.

• It tells you, on average, how far off a prediction will be.

(Ex) Say, for our previous example with annual sales and customers, we get Sxy = 1.5. Then, on average, our predictions are off by 1.5 (million) dollars.
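And a matching sketch for the standard error of the estimate:

```python
import numpy as np

def std_error_of_estimate(Y, Y_hat):
    """Standard error of the estimate: sqrt(SSE / (n - 2))."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    SSE = np.sum((Y - Y_hat) ** 2)   # error sum of squares
    n = len(Y)
    return np.sqrt(SSE / (n - 2))    # n - 2: two parameters (b0, b1) were estimated
```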
