Lecture Topic 3: Multiple Regression
Fall 2024
Motivation for Multiple Regression
▶ The multiple linear regression (MLR) model is
  y = β0 + β1 x1 + · · · + βk xk + u,
where u is the unobserved error term.
Example: Wage Equation
▶ Suppose
wage = β0 + β1 educ + β2 exper + u
Example: Family Income and Family Consumption
▶ Suppose
  cons = β0 + β1 inc + β2 inc² + u,
so that the marginal effect of income on consumption is
  ∂cons/∂inc = β1 + 2β2 inc,
which depends on how much income is already there.
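▶ A small numerical illustration of this point. The coefficient values below are hypothetical (made up for the example, not estimates from any data set); the sketch simply evaluates β1 + 2β2·inc at a few income levels.

```python
# Marginal effect of income in the quadratic consumption model:
# cons = b0 + b1*inc + b2*inc**2 + u  =>  d cons / d inc = b1 + 2*b2*inc.
# The coefficient values are hypothetical, chosen only for illustration.
b1, b2 = 0.8, -0.002

def marginal_effect(inc):
    """Marginal effect of income on consumption at income level `inc`."""
    return b1 + 2 * b2 * inc

for inc in (10, 50, 100):   # income levels, e.g., in thousands of dollars
    print(f"inc = {inc:>3}: d cons/d inc = {marginal_effect(inc):.3f}")
```

The marginal effect falls from 0.76 at inc = 10 to 0.40 at inc = 100, illustrating that the effect of one more unit of income depends on the income level already reached.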
Mechanics and Interpretation of Ordinary Least Squares
Obtaining the OLS Estimates
▶ The OLS estimates β̂0, β̂1, · · · , β̂k minimize the sum of squared residuals
  Σᵢ (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²  (all sums over i = 1, · · · , n).
▶ The first-order conditions (FOCs) are Σᵢ ûi = 0 and Σᵢ xij ûi = 0 for j = 1, · · · , k, where ûi = yi − β̂0 − β̂1 xi1 − · · · − β̂k xik are the OLS residuals.
Interpreting the OLS Regression Equation
▶ In the MLR model y = β0 + β1 x1 + · · · + βk xk + u:
  βj = ∂y/∂xj,
which means “by how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant”.
▶ The multiple linear regression model manages to hold the values of
other explanatory variables fixed even if, in reality, they are
correlated with the explanatory variable under consideration.
▶ dy/dxj |(x1 ,··· ,xj−1 ,xj+1 ,··· ,xk) = βj + ∂u/∂xj.
Example: Determinants of College GPA
A “Partialling Out” Interpretation of Multiple Regression
▶ One can show that the estimated coefficient of an explanatory
variable in a multiple regression can be obtained in two steps:
1. Regress the explanatory variable on all other explanatory
variables.
2. Regress y on the residuals from this regression.
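▶ A minimal simulation sketch of this two-step procedure (the data-generating process and all names are made up for illustration). It checks that the slope from step 2 equals the coefficient on x1 from the full multiple regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated data: x1 and x2 are correlated, y follows an MLR model.
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients of y on [1, X] (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full multiple regression: y on (1, x1, x2).
beta = ols(np.column_stack([x1, x2]), y)

# Step 1: regress x1 on x2 and keep the residuals.
d = ols(x2, x1)
r1 = x1 - (d[0] + d[1] * x2)

# Step 2: simple regression of y on the residuals r1.
b_partial = ols(r1, y)

print(beta[1], b_partial[1])   # the two slope estimates coincide
```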
A “Partialling Out” Interpretation of Multiple Regression
▶ Consider the case with two regressors, and let xi1 = δ̂0 + δ̂1 xi2 + r̂i1 be the first-stage regression of x1 on x2, so that x̂i1 = δ̂0 + δ̂1 xi2 is the fitted value. Write ûi = yi − β̂0 − β̂1 xi1 − β̂2 xi2 for the OLS residuals (all sums below run over i = 1, · · · , n).
▶ The FOC w.r.t. β̂1 is Σᵢ xi1 (yi − β̂0 − β̂1 xi1 − β̂2 xi2) = 0
  =⇒ Σᵢ (δ̂0 + δ̂1 xi2 + r̂i1)(yi − β̂0 − β̂1 xi1 − β̂2 xi2)
    = δ̂0 Σᵢ ûi + δ̂1 Σᵢ xi2 ûi + Σᵢ r̂i1 (yi − β̂0 − β̂1 xi1 − β̂2 xi2)
    = −β̂0 Σᵢ r̂i1 − β̂2 Σᵢ xi2 r̂i1 + Σᵢ r̂i1 [yi − β̂1 (x̂i1 + r̂i1)]
    = Σᵢ r̂i1 (yi − β̂1 r̂i1) = 0,
where Σᵢ ûi = 0 and Σᵢ xi2 ûi = 0 are obtained from the FOCs w.r.t. β̂0 and β̂2, respectively, and Σᵢ r̂i1 = Σᵢ xi2 r̂i1 = Σᵢ x̂i1 r̂i1 = 0 because the r̂i1 are OLS residuals from the regression of x1 on x2.
▶ Hence β̂1 = Σᵢ r̂i1 yi / Σᵢ r̂i1².
Why Does This Procedure Work?
▶ The residuals from the first regression are the part of the explanatory
variable that is uncorrelated with the other explanatory variables.
▶ The slope coefficient of the second regression therefore represents
the isolated (or pure) effect of the explanatory variable on the
dependent variable.
▶ Recall that in the SLR model,
  β̂1 = Σᵢ (xi − x̄) yi / Σᵢ (xi − x̄)².
The multiple-regression coefficient β̂1 = Σᵢ r̂i1 yi / Σᵢ r̂i1² has exactly the same form, with the residuals r̂i1 playing the role of xi − x̄.
Properties of OLS on Any Sample of Data
▶ These properties are corollaries of the FOCs for the OLS estimates:
  – The sample average of the residuals is zero, Σᵢ ûi = 0, so the sample average of the fitted values equals ȳ.
  – The sample covariance between each regressor and the OLS residuals is zero, Σᵢ xij ûi = 0, j = 1, · · · , k.
  – The point (x̄1, · · · , x̄k, ȳ) always lies on the OLS regression line.
Goodness-of-Fit
▶ R-squared:
  R² = SSE/SST = 1 − SSR/SST.
▶ Alternative expression for R-squared [proof not required]:
  R² = [Σᵢ (yi − ȳ)(ŷi − ȳ)]² / [Σᵢ (yi − ȳ)² · Σᵢ (ŷi − ȳ)²] = Ĉov(y, ŷ)² / [V̂ar(y) V̂ar(ŷ)] = Ĉorr(y, ŷ)²,
i.e., the squared sample correlation between y and the fitted values ŷ (the fitted values have the same sample average ȳ as y).
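▶ A quick numerical check of these expressions on simulated data (the data-generating process is made up for illustration): SSE/SST, 1 − SSR/SST, and the squared sample correlation between y and ŷ all give the same number.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum(u_hat ** 2)                # residual sum of squares

print(sse / sst,
      1 - ssr / sst,
      np.corrcoef(y, y_hat)[0, 1] ** 2)  # all three agree
```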
R 2 Cannot Decrease When One More Regressor Is Added
▶ The SSR with k and with k + 1 regressors:
  SSRk = min over (β̂0, β̂1, · · · , β̂k) of Σᵢ (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²,
  SSRk+1 = min over (β̂0, β̂1, · · · , β̂k+1) of Σᵢ (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik − β̂k+1 xi,k+1)².
▶ The second minimization is over a larger set of candidates (it includes β̂k+1 = 0 as a special case), so SSRk+1 ≤ SSRk, and therefore R² cannot decrease.
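▶ A small simulation sketch of this fact (simulated data, made up for illustration): adding a regressor that is pure noise leaves R² unchanged or slightly higher, never lower.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)          # an irrelevant regressor, unrelated to y

def r_squared(X, y):
    """R-squared from an OLS fit of y on [1, X]."""
    X = np.column_stack([np.ones(len(y)), X])
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - np.sum(u ** 2) / np.sum((y - y.mean()) ** 2)

print(r_squared(x1, y))                             # k = 1 regressor
print(r_squared(np.column_stack([x1, noise]), y))   # k + 1 regressors: never smaller
```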
Example: Explaining Arrest Records
▶ The fitted regression line is
  n̂arr86 = 0.712 − 0.150 pcnv − 0.034 ptime86 − 0.104 qemp86,
  n = 2,725, R² = 0.0413.
Example: Explaining Arrest Records
  n̂arr86 = 0.707 − 0.151 pcnv + 0.007 avgsen − 0.037 ptime86 − 0.103 qemp86,
  n = 2,725, R² = 0.0422 (increases only slightly).
The Expected Value of the OLS Estimators
Standard Assumptions for the MLR Model
▶ Assumption MLR.1 (Linear in Parameters): the population model is
  y = β0 + β1 x1 + · · · + βk xk + u.
▶ Assumption MLR.2 (Random Sampling): we observe a random sample {(xi1, · · · , xik, yi) : i = 1, · · · , n} from the population model, so that
  yi = β0 + β1 xi1 + · · · + βk xik + ui.
Standard Assumptions for the MLR Model
▶ Assumption MLR.3 (No Perfect Collinearity): in the sample, none of the explanatory variables is constant, and there are no exact linear relationships among them.
  – Constant variables are ruled out (they are collinear with the constant regressor 1).
▶ This is an extension of the condition Σᵢ (xi − x̄)² > 0 in the SLR model. Why?
Example: Perfect Collinearity
Standard Assumptions for the MLR Model
▶ Assumption MLR.4 (Zero Conditional Mean):
E [u|x1 , x2 , · · · , xk ] = 0
Unbiasedness of OLS
▶ Explanatory variables that are correlated with u are called
endogenous variables; endogeneity is a violation of assumption
MLR.4.
▶ Explanatory variables that are uncorrelated with u are called
exogenous variables; MLR.4 holds if all explanatory variables are
exogenous.
▶ Exogeneity is the key assumption for a causal interpretation of the
regression, and for unbiasedness of the OLS estimators.
▶ Theorem (Unbiasedness of OLS): Under assumptions
MLR.1-MLR.4,
E [β̂j ] = βj , j = 0, 1, · · · , k,
for any value of the population parameter βj.
▶ Unbiasedness is an average property in repeated samples; in a given
sample, the estimates may still be far away from the true values.
Including Irrelevant Variables in a Regression Model
▶ Suppose
y = β0 + β1 x1 + β2 x2 + β3 x3 + u,
where β3 = 0, i.e., x3 is irrelevant to y .
▶ Including x3 causes no bias, because E[β̂3] = β3 = 0; it may, however, increase the sampling variance of the estimators.
Omitted Variable Bias: the Simple Case
▶ Suppose the true population model is
  y = β0 + β1 x1 + β2 x2 + u,
but we estimate the short regression
  y = α0 + α1 x1 + ε.
So, x2 is omitted.
▶ Suppose x1 and x2 are correlated, and assume the linear regression relationship
  x2 = δ0 + δ1 x1 + v.
Omitted Variable Bias: the Simple Case
▶ Then,
  y = β0 + β1 x1 + β2 (δ0 + δ1 x1 + v) + u
    = (β0 + β2 δ0) + (β1 + β2 δ1) x1 + (u + β2 v),
with α0 = β0 + β2 δ0, α1 = β1 + β2 δ1, and ε = u + β2 v. Hence
  E[α̂0] = β0 + β2 δ0,
  E[α̂1] = β1 + β2 δ1.
Why? The new error term ε = u + β2 v satisfies the zero conditional
mean assumption: E [u + β2 v |x1 ] = E [u|x1 ] + β2 E [v |x1 ] = 0.
▶ Obviously, if β2 δ1 = 0, α̂1 is an unbiased estimator of β1; otherwise, the omitted variable bias is E[α̂1] − β1 = β2 δ1.
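▶ A minimal Monte Carlo sketch of this result (all parameter values are made up for illustration): with β2 > 0 and δ1 > 0, the short-regression slope centers on β1 + β2 δ1 rather than on β1.

```python
import numpy as np

rng = np.random.default_rng(3)
b0, b1, b2 = 1.0, 2.0, 1.5    # true model:  y = b0 + b1*x1 + b2*x2 + u
d0, d1 = 0.0, 0.7             # x2 = d0 + d1*x1 + v  (positive correlation)

n, reps = 200, 2_000
alpha1 = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = d0 + d1 * x1 + rng.normal(size=n)
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    alpha1[r] = np.polyfit(x1, y, 1)[0]   # short regression: y on x1 only

print(alpha1.mean())   # close to b1 + b2*d1 = 2.0 + 1.5*0.7 = 3.05, not 2.0
```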
Omitted Variable Bias: the Simple Case
▶ The bias is determined by the sizes of β2 and δ1. In practice, since β2 is an unknown population parameter, we cannot be certain whether β2 is positive or negative. Nevertheless, we usually have a pretty good idea about the direction of the partial effect of x2 on y. Further, even though the sign of the correlation between x1 and x2 cannot be known if x2 is not observed, in many cases we can make an educated guess about whether x1 and x2 are positively or negatively correlated.
▶ In the wage equation, by definition, more ability leads to higher productivity and therefore higher wages: β2 > 0. Also, there are reasons to believe that educ and abil are positively correlated: on average, individuals with more innate ability choose higher levels of education.
▶ Summary of bias in α̂1 when x2 is omitted in estimating the equation:

                    Corr(x1, x2) > 0    Corr(x1, x2) < 0
  β2 > 0            positive bias       negative bias
  β2 < 0            negative bias       positive bias
Example: Omitting Ability in a Wage Equation
▶ Suppose the true model is
  wage = β0 + β1 educ + β2 abil + u,
but, because ability is unobserved, we estimate
  wage = α0 + α1 educ + ε.
▶ Suppose abil = δ0 + δ1 educ + v, where δ1 > 0.
▶ The return to education β1 will be overestimated because β2 δ1 > 0.
It will look as if people with many years of education earn very high
wages, but this is partly due to the fact that people with more
education are also more able on average.
Omitted Variable Bias: More General Cases
▶ Suppose the true model is y = β0 + β1 x1 + · · · + βk xk + u but xk is omitted from the estimated equation, and write xk = δ0 + δ1 x1 + · · · + δk−1 xk−1 + v.
  =⇒ E[α̂j] = βj + βk δj, j = 1, · · · , k − 1.
▶ α̂j is an unbiased estimator for βj only if βk = 0 or δj = 0.
Exercise
Solution
The Variance of the OLS Estimators
Standard Assumptions for the MLR Model
▶ Assumption MLR.5 (Homoskedasticity): Var[u|x1, · · · , xk] = σ².
Sampling Variances of the OLS Slope Estimators
▶ Theorem (Sampling Variances of the OLS Slope Estimators):
Under assumptions MLR.1-MLR.5,
  Var(β̂j) = σ² / [SSTj (1 − Rj²)], j = 1, · · · , k,
where SSTj = Σᵢ (xij − x̄j)² is the total sample variation in xj and Rj² is the R-squared from regressing xj on all other explanatory variables.
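▶ A numerical check of this formula on simulated data (the data-generating process is made up for illustration): σ²/[SSTj (1 − Rj²)] matches the corresponding diagonal element of the standard matrix formula σ²(X′X)⁻¹, and 1/(1 − Rj²) is the variance inflation factor discussed below.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)   # x1 is correlated with x2
sigma2 = 1.0                         # error variance, known here by construction

X = np.column_stack([np.ones(n), x1, x2])

# Conditional variance of beta_1_hat from sigma^2 * (X'X)^{-1}.
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# The same quantity from sigma^2 / (SST_1 * (1 - R_1^2)).
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])               # regress x1 on the other regressor
resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r1_sq = 1 - np.sum(resid ** 2) / sst1
var_formula = sigma2 / (sst1 * (1 - r1_sq))

print(var_matrix, var_formula, 1 / (1 - r1_sq))     # the last value is VIF_1
```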
The Components of OLS Variances
▶ The error variance σ²: a larger σ² increases the sampling variance of every β̂j.
▶ The total sample variation in xj, SSTj:
  – SSTj = Σᵢ (xij − x̄j)² = n V̂ar(xj), and V̂ar(xj) tends to be stable as n grows.
  – Increasing the sample size n is thus a way to get more precise estimates.
The Components of OLS Variances
▶ The linear relationships among the explanatory variables, Rj²: a larger Rj² (i.e., xj is well explained by the other regressors) increases Var(β̂j); this is the multicollinearity problem.
An Example for Multicollinearity
▶ Consider the following MLR model,
Discussion of the Multicollinearity Problem
▶ The variance inflation factor: VIFj = 1 / (1 − Rj²), so that Var(β̂j) = (σ²/SSTj) · VIFj.
[Figure: Var(β̂1) as a function of R1².]
▶ As we will see in Chapter 4, for statistical inference what ultimately matters is how large β̂j is relative to its standard deviation.
Variances in Misspecified Models
▶ Model 1 is
  y = β0 + β1 x1 + β2 x2 + u,
and model 2 is
ỹ = β̃0 + β̃1 x1 .
▶ It might be the case that the likely omitted variable bias of β̃1 in the misspecified model 2 is more than offset by its smaller sampling variance.
Variances in Misspecified Models
▶ Conditional on the sample, Var(β̃1) = σ²/SST1 ≤ σ²/[SST1 (1 − R1²)] = Var(β̂1), with strict inequality whenever x1 and x2 are correlated.
Estimating the Error Variance
Exercise
Solution
Efficiency of OLS: The Gauss-Markov Theorem
Efficiency of OLS
▶ In the SLR model, β̂1 is a linear estimator of β1: it can be written as β̂1 = Σᵢ wi1 yi with
  wi1 = (xi − x̄) / Σᵢ (xi − x̄)² = (xi − x̄) / SSTx,
which is a function of {xi : i = 1, · · · , n}. (How about β̂j in MLR?)
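▶ A short sketch addressing this question on simulated data (made up for illustration): in the SLR the slope is the weighted sum Σᵢ wi1 yi with wi1 = (xi − x̄)/SSTx, and in the MLR the analogous weights are r̂i1/Σᵢ r̂i1², built from the “partialling out” residuals; in both cases the weights depend only on the regressors, so β̂j is linear in the yi.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

# SLR: beta_1_hat as a weighted sum of the y_i.
w = (x1 - x1.mean()) / np.sum((x1 - x1.mean()) ** 2)   # weights w_i1
print(np.sum(w * y), np.polyfit(x1, y, 1)[0])          # identical slopes

# MLR analogue: weights built from the residuals of regressing x1 on the other x's.
x2 = 0.5 * x1 + rng.normal(size=n)
y2 = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]    # partialled-out x1
w_mlr = r1 / np.sum(r1 ** 2)                           # weights for the MLR slope
X = np.column_stack([np.ones(n), x1, x2])
print(np.sum(w_mlr * y2),                              # weighted sum of the y_i
      np.linalg.lstsq(X, y2, rcond=None)[0][1])        # OLS coefficient on x1
```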
The Gauss-Markov Theorem
The Gauss-Markov Theorem
▶ Under assumptions MLR.1-MLR.5, OLS is efficient in the class of linear, unbiased estimators: it is the best linear unbiased estimator (BLUE).
[Figure: nested classes of estimators: all estimators ⊃ unbiased estimators ⊃ linear unbiased estimators.]