Simple Linear Regression Model I
Tobias Broer
1. After this lecture, you will be familiar with the concept, and language, of “OLS regression analysis”
2. You will know how to estimate the coefficients in the simple linear regression model on the basis of a sample
3. You will know the properties of the OLS estimates, including goodness of fit
4. And you will be able to interpret the estimated coefficients, and the effects of changing units of measurement and of functional form
1. Definition of the Simple Regression Model
y = β0 + β1 x + u   (1)
Simple linear regression model

y = β0 + β1 x + u   (1)

I The slope parameter β1 measures the effect of x on y, holding the unobserved factors in u fixed:

∆y = β1 ∆x when ∆u = 0   (2)
I But: Is each year of education really worth the same dollar amount no
matter how much education one starts with?
Simple linear regression model

y = β0 + β1 x + u   (6)

I Assumption 2a: E (u) = 0.
I This is a normalization rather than a restriction: if instead E (u) = α0 ≠ 0, we can rewrite the model as

y = (β0 + α0 ) + β1 x + (u − α0 ),   (11)

where the intercept absorbs α0 and the new error u − α0 has mean zero. So in

y = β0 + β1 x + u,   (15)

we can always take E (u) = 0.
I Assumption 2b: E (u|x) = E (u) for all values of x, where E (u|x) means
“the expected value of u given x.”
I We say u is mean independent of x.
I Note: Full independence also suffices, as it implies mean independence.
I For this course, Cov (x, u) = 0 also suffices.
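I A minimal numerical sketch (not from the slides; the data-generating process and all numbers are invented for illustration) of what Assumption 2b rules in and rules out, in Python:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.uniform(8, 20, n)                 # e.g. years of education

    # Case 1: u drawn independently of x, so E(u|x) = E(u) holds.
    u_ok = rng.normal(0, 1, n)
    # Case 2: the mean of u shifts with x, violating Assumption 2b.
    u_bad = rng.normal(0.1 * (x - x.mean()), 1)

    # Compare the average of u across coarse bins of x.
    bins = np.digitize(x, [12, 16])           # three education groups
    for u, label in [(u_ok, "mean independent"), (u_bad, "violated")]:
        print(label, [round(u[bins == b].mean(), 3) for b in range(3)])

I In the first case the bin averages of u are all near zero; in the second they drift with x.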
Example 1: Fertilizer and yield
Example 2: Wage and education
I If u contains (unobserved) ability, Assumption 2b requires, for example,

E (abil|educ = 8) = E (abil|educ = 12) = E (abil|educ = 16),

so that the average ability is the same in the different portions of the population with an 8th grade education, a 12th grade education, and a four-year college education.
I Because people choose education levels partly based on ability, this assumption is almost certainly false.
Zero conditional mean and population regression function
yi = β0 + β1 xi + ui (19)
where the i subscript indicates a particular observation.
I Strategy: Use our assumptions about u to “identify” β0 , β1 from observed yi , xi .
Deriving the Ordinary Least Squares Estimates
1. E (u) = E (y − β0 − β1 x) = 0
2. E (xu) = E [x(y − β0 − β1 x)] = 0
I These are the two conditions in the population that determine β0 and β1 .
I Method of moments: Use their sample analogs to determine β̂0 and β̂1 ,
the estimates from the data (Note the hats!):
n^{-1} ∑_{i=1}^{n} (yi − β̂0 − β̂1 xi ) = 0   (20)

n^{-1} ∑_{i=1}^{n} xi (yi − β̂0 − β̂1 xi ) = 0   (21)
I These are the “OLS normal equations”, two linear equations in the two unknowns β̂0 and β̂1 .
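I Since the normal equations are linear in (β̂0 , β̂1 ), a computer can solve them directly as a 2×2 system. A sketch in Python with numpy, on simulated data (the “true” coefficients 2.0 and 0.5 are invented):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    x = rng.normal(10, 2, n)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, n)   # hypothetical population model

    # Sample analogs of E(u) = 0 and E(xu) = 0 as a 2x2 linear system:
    #   b0           + b1 * mean(x)   = mean(y)
    #   b0 * mean(x) + b1 * mean(x^2) = mean(x*y)
    A = np.array([[1.0, x.mean()],
                  [x.mean(), (x ** 2).mean()]])
    b = np.array([y.mean(), (x * y).mean()])
    beta0_hat, beta1_hat = np.linalg.solve(A, b)
    print(beta0_hat, beta1_hat)               # close to 2.0 and 0.5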
Deriving the Ordinary Least Squares Estimates

n^{-1} ∑_{i=1}^{n} (yi − β̂0 − β̂1 xi ) = 0   (22)

n^{-1} ∑_{i=1}^{n} xi (yi − β̂0 − β̂1 xi ) = 0   (23)

I Solve (22) to get β̂0 = ȳ − β̂1 x̄, where ȳ = n^{-1} ∑_{i=1}^{n} yi and x̄ = n^{-1} ∑_{i=1}^{n} xi are the sample averages.
I Plug β̂0 = ȳ − β̂1 x̄ into (23) and rearrange:

∑_{i=1}^{n} xi (yi − ȳ ) = β̂1 [ ∑_{i=1}^{n} xi (xi − x̄) ]   (25)
Deriving the Ordinary Least Squares Estimates

∑_{i=1}^{n} xi (yi − ȳ ) = β̂1 [ ∑_{i=1}^{n} xi (xi − x̄) ]   (26)

I Remember:

∑_{i=1}^{n} (xi − x̄) = 0

∑_{i=1}^{n} xi (yi − ȳ ) = ∑_{i=1}^{n} (xi − x̄)(yi − ȳ ) = ∑_{i=1}^{n} (xi − x̄) yi

∑_{i=1}^{n} xi (xi − x̄) = ∑_{i=1}^{n} (xi − x̄)²
I If ∑_{i=1}^{n} (xi − x̄)² > 0, we can write

β̂1 = [ ∑_{i=1}^{n} (xi − x̄)(yi − ȳ ) ] / [ ∑_{i=1}^{n} (xi − x̄)² ] = Sample Covariance(xi , yi ) / Sample Variance(xi )   (28)
Deriving the Ordinary Least Squares Estimates

β̂1 = [ ∑_{i=1}^{n} (xi − x̄)(yi − ȳ ) ] / [ ∑_{i=1}^{n} (xi − x̄)² ] = Sample Covariance(xi , yi ) / Sample Variance(xi )   (29)
I This formula for β̂1 shows us how to take the data we have and compute
the slope estimate. For reasons we will see, β̂1 is called the ordinary least
squares (OLS) slope estimate. We often refer to it as the slope estimate.
I It can be computed whenever the sample variance of the xi is not zero,
which only rules out the case where each xi is the same value. In other
words, we do not have to assume anything about the population to
calculate β̂1 .
I Once we have β̂1 , we compute β̂0 = ȳ − β̂1 x̄. This is the OLS intercept
estimate.
I These days, one lets a computer do the calculations, which can be tedious
even if n is small.
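I A sketch of that computation in Python; the simulated x and y stand in for any paired sample (the coefficients −5.0 and 1.5 are invented):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(12, 3, 200)
    y = -5.0 + 1.5 * x + rng.normal(0, 4, 200)    # invented coefficients

    # Slope: sample covariance of (x, y) over sample variance of x.
    beta1_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    # Intercept: beta0_hat = ybar - beta1_hat * xbar.
    beta0_hat = y.mean() - beta1_hat * x.mean()
    print(beta0_hat, beta1_hat)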
Why “Ordinary Least Squares Estimates”
I For any candidates β̂0 and β̂1 , define a fitted value for each data point i as

ŷi = β̂0 + β̂1 xi

We have n of these. It is the value we predict for yi given that x has taken on the value xi .
I The mistake we make is the residual:

ûi = yi − ŷi = yi − β̂0 − β̂1 xi
I The OLS estimates β̂0 and β̂1 are the candidates that minimize the sum of squared residuals:

∑_{i=1}^{n} ûi² = ∑_{i=1}^{n} (yi − β̂0 − β̂1 xi )²   (34)
I The first-order condition for β̂0 gives back β̂0 = ȳ − β̂1 x̄.
I The first-order condition for β̂1 is

FOC β̂1 : ∑_{i=1}^{n} 2(yi − β̂0 − β̂1 xi )(−xi ) = 0

which rearranges to

∑_{i=1}^{n} xi yi − β̂0 ∑_{i=1}^{n} xi − β̂1 ∑_{i=1}^{n} xi² = 0

I Substituting β̂0 = ȳ − β̂1 x̄ gives, as before,

∑_{i=1}^{n} xi (yi − ȳ ) = β̂1 [ ∑_{i=1}^{n} xi (xi − x̄) ]

β̂1 = [ ∑_{i=1}^{n} (xi − x̄)(yi − ȳ ) ] / [ ∑_{i=1}^{n} (xi − x̄)² ] = Sample Covariance(xi , yi ) / Sample Variance(xi )   (37)
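I To see the “least squares” label at work numerically, one can minimize the sum of squared residuals directly and compare with the closed-form answer. A sketch assuming scipy is available, on simulated data:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    x = rng.normal(0, 1, 300)
    y = 1.0 + 2.0 * x + rng.normal(0, 1, 300)     # invented truth

    def ssr(b):
        # Sum of squared residuals for candidate (b0, b1).
        return ((y - b[0] - b[1] * x) ** 2).sum()

    numeric = minimize(ssr, x0=np.zeros(2)).x     # numerical minimizer
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()                 # closed form
    print(numeric, (b0, b1))                      # the two should agree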
I Once we have the numbers β̂0 and β̂1 for a given data set, we write the OLS regression line as a function of x:

ŷ = β̂0 + β̂1 x
EXAMPLE: Effects of Education on Hourly Wage (WAGE2.DTA)
I Data are from 1991 on men only. wage is reported in dollars per hour, educ is the number of completed years of schooling.
I The estimated equation is

ŵage = −5.12 + 1.43 educ   (40)
n = 759   (41)

I Below we discuss the negative intercept. Literally, it says that wage is predicted to be −$5.12 when educ = 0!
I Each additional year of schooling is estimated to be worth $1.43.
EXAMPLE: Effects of Education on Hourly Wage (WAGE2.DTA)
ŵage = −5.12 + 1.43 educ   (42)
n = 759   (43)

I The function

ŵage = −5.12 + 1.43 educ   (44)

is the OLS regression line; we get the OLS fitted values by plugging the educi into the equation:

ŵagei = −5.12 + 1.43 educi
Algebraic Properties of OLS Statistics 1
From the first OLS normal equation (22):
I The OLS residuals sum to zero:

∑_{i=1}^{n} ûi = 0   (50)

I Dividing yi = ŷi + ûi by n and averaging,

n^{-1} ∑_{i=1}^{n} yi = n^{-1} ∑_{i=1}^{n} ŷi + n^{-1} ∑_{i=1}^{n} ûi

and so

ȳ = n^{-1} ∑_{i=1}^{n} ŷi   (51)

So the sample average of the actual yi is the same as the sample average of the fitted values.
Algebraic Properties of OLS Statistics 2
From the second OLS normal equation (23), n^{-1} ∑_{i=1}^{n} xi (yi − β̂0 − β̂1 xi ) = 0:
4. The sample covariance (and therefore the sample correlation) between the explanatory variables and the residuals is always zero:

∑_{i=1}^{n} xi ûi = 0   (52)

5. Because the ŷi are linear functions of the xi , the fitted values and residuals are uncorrelated, too:

∑_{i=1}^{n} ŷi ûi = ∑_{i=1}^{n} (β̂0 + β̂1 xi ) ûi = β̂0 ∑_{i=1}^{n} ûi + β̂1 ∑_{i=1}^{n} xi ûi = 0   (53)

I Properties (50) to (53) hold by construction: β̂0 and β̂1 were chosen to make them true.
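I These by-construction properties are easy to verify numerically; a sketch on simulated data (any OLS fit would do):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(5, 2, 100)
    y = 3.0 - 0.7 * x + rng.normal(0, 1, 100)     # invented model

    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    u_hat = y - y_hat

    print(u_hat.sum())                 # ~0, eq. (50)
    print(y.mean(), y_hat.mean())      # equal, eq. (51)
    print((x * u_hat).sum())           # ~0, eq. (52)
    print((y_hat * u_hat).sum())       # ~0, eq. (53)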
Algebraic Properties of OLS Statistics 3: ȳ = β̂0 + β̂1 x̄
I From (22), the point (x̄, ȳ ) is always on the OLS regression line. That is, if we plug in the average for x, we predict the sample average for y :

n^{-1} ∑_{i=1}^{n} (yi − β̂0 − β̂1 xi ) = 0 ⟹ ȳ = β̂0 + β̂1 x̄, i.e. β̂0 = ȳ − β̂1 x̄

I Writing each observation as yi = β̂0 + β̂1 xi + ûi and subtracting ȳ = β̂0 + β̂1 x̄ gives

yi − ȳ = β̂1 (xi − x̄) + ûi   (55)
I Sometimes this helps to interpret β̂1 . It also means we can estimate β1 from

n^{-1} ∑_{i=1}^{n} (xi − x̄)(yi − ȳ − β̂1 (xi − x̄)) = 0.
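I A quick check in Python that demeaning the data and regressing through the origin reproduces the same slope (data simulated):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.normal(0, 2, 150)
    y = 1.0 + 0.9 * x + rng.normal(0, 1, 150)     # invented model

    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()

    # Slope of a regression through the origin on the demeaned data:
    xd, yd = x - x.mean(), y - y.mean()
    b1_demeaned = (xd * yd).sum() / (xd ** 2).sum()
    print(np.isclose(b1, b1_demeaned))            # True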
2. Goodness-of-Fit
I Write each observation as yi = ŷi + ûi and define the total, explained, and residual sums of squares:

SST = ∑_{i=1}^{n} (yi − ȳ )², SSE = ∑_{i=1}^{n} (ŷi − ȳ )², SSR = ∑_{i=1}^{n} ûi²

I Then, using that the fitted values and residuals are uncorrelated, SST = SSE + SSR, and we define

R² = SSE/SST = 1 − SSR/SST   (62)

I Called the R-squared of the regression.
I It can be shown to equal the square of the correlation between yi and ŷi . Therefore,

0 ≤ R² ≤ 1   (63)
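I A sketch computing R² from the sums of squares and as the squared correlation between yi and ŷi (simulated data again):

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.normal(0, 1, 400)
    y = 0.5 + 1.2 * x + rng.normal(0, 2, 400)     # invented model

    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    u_hat = y - y_hat

    sst = ((y - y.mean()) ** 2).sum()
    sse = ((y_hat - y.mean()) ** 2).sum()
    ssr = (u_hat ** 2).sum()

    print(sse / sst, 1 - ssr / sst)               # eq. (62), both versions
    print(np.corrcoef(y, y_hat)[0, 1] ** 2)       # same number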
Goodness-of-Fit
ŝalary = 963.191 + 18.501 roe   (64)
n = 209, R² = .0132   (65)
Units of Measurement

ŝalary = 963.191 + 18.501 roe   (64)
n = 209, R² = .0132   (65)

I Here salary is measured in thousands of dollars and roe (return on equity) in percent. Measuring the return on equity as a decimal instead, roedec = roe/100, gives

ŝalary = 963.191 + 1850.1 roedec   (67)
n = 209, R² = .0132   (68)

I Now a one percentage point change in roe is the same as ∆roedec = .01, and so we get the same effect as before.
Units of Measurement
I Measuring salary in dollars instead, salarydol = 1000·salary, multiplies both coefficients by 1,000 and leaves R² unchanged:

ŝalarydol = 963191 + 18501 roe   (69)
n = 209, R² = .0132   (70)
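I The rescaling arithmetic is mechanical and easy to confirm; a sketch with invented data shaped like the salary/roe example (not the actual data set from the slides):

    import numpy as np

    def ols(x, y):
        b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
        return y.mean() - b1 * x.mean(), b1

    rng = np.random.default_rng(7)
    roe = rng.uniform(0, 50, 209)                       # percent
    salary = 900 + 20 * roe + rng.normal(0, 1300, 209)  # thousands of dollars

    print(ols(roe, salary))            # baseline estimates
    print(ols(roe / 100, salary))      # roedec: slope scaled up by 100
    print(ols(roe, salary * 1000))     # salarydol: both coefficients x 1000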
Using the Natural Logarithm in Simple Regression
ŵage = −5.12 + 1.43 educ   (71)
n = 759, R² = .133   (72)
I Might be an okay approximation, but unsatisfying for a couple of reasons.
First, the negative intercept is a bit strange (even though the equation
gives sensible predictions for education ranging from 8 to 20).
I Second reason is more important: the dollar value of another year of
schooling is constant. So the 16th year of education is worth the same as
the second. We expect additional years of schooling to be worth more, in
dollar terms, than previous years.
I How can we incorporate an increasing effect? One way is to postulate a
constant percentage effect. We can approximate percentage changes using
the natural log (log for me, but also ln is common).
Using the Natural Logarithm in Simple Regression
I Consider instead the model

log(wage) = β0 + β1 educ + u

I Holding u fixed, ∆ log(wage) = β1 ∆educ, so

β1 = ∆ log(wage)/∆educ   (75)
Using the Natural Logarithm in Simple Regression

β1 = ∆ log(wage)/∆educ   (76)

I For small changes, 100 · ∆ log(wage) ≈ %∆wage, so 100β1 is (approximately) the percentage change in wage from one more year of education.
I In this example, 100β1 is often called the return to education (just like an
investment).
I This measure is free of units of measurement of wage (currency, price
level).
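I A sketch of the log-level regression on simulated wage data (WAGE2.DTA itself is not loaded here; the 8% “return” is invented):

    import numpy as np

    rng = np.random.default_rng(8)
    educ = rng.integers(8, 21, 1000).astype(float)
    log_wage = 1.0 + 0.08 * educ + rng.normal(0, 0.4, 1000)

    b1 = (((educ - educ.mean()) * (log_wage - log_wage.mean())).sum()
          / ((educ - educ.mean()) ** 2).sum())
    print(f"estimated return to education: {100 * b1:.1f}% per year")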
Using the Natural Logarithm in Simple Regression
I We still want to explain wage in terms of educ! This gives us a way to get
a (roughly) constant percentage effect.
I Predicting wage is more complicated but not usually the most interesting
question.
I The next graph shows the relationship between wage and educ when
u = 0.
Using the Natural Logarithm in Simple Regression
I We can also use the log on both sides of the equation to get constant elasticity models. For example, if

log(y ) = β0 + β1 log(x) + u

then β1 = ∆ log(y )/∆ log(x) ≈ %∆y /%∆x is the elasticity of y with respect to x.
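I And the log-log case in the same style (the elasticity of 0.6 is invented):

    import numpy as np

    rng = np.random.default_rng(9)
    log_x = rng.normal(0, 1, 500)
    log_y = 2.0 + 0.6 * log_x + rng.normal(0, 0.3, 500)

    b1 = (((log_x - log_x.mean()) * (log_y - log_y.mean())).sum()
          / ((log_x - log_x.mean()) ** 2).sum())
    print(b1)   # ~0.6: a 1% rise in x predicts about a 0.6% rise in y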
Summary

y = β0 + β1 x + u   (81)

I We saw how we needed to assume E (u|x) = 0 to identify β0 , β1 from observations on x, y .
I Alternatively, we could have assumed Cov (x, u) = 0, or full independence of x and u.
I We derived the OLS estimators β̂0 = ȳ − β̂1 x̄ and

β̂1 = [ ∑_{i=1}^{n} (xi − x̄)(yi − ȳ ) ] / [ ∑_{i=1}^{n} (xi − x̄)² ] = Sample Covariance(xi , yi ) / Sample Variance(xi )   (82)