EC212: Introduction To Econometrics Multiple Regression: Estimation (Wooldridge, Ch. 3)
Tatiana Komarova
Summer 2021
1. Motivation for multiple regression
(Wooldridge, Ch. 3.1)
Example: Wage equation
log(wage) = β0 + β1 educ + β2 IQ + u
where IQ is IQ score
Model with two regressors
y = β0 + β1 x1 + β2 x2 + u
Back to example
Model with k regressors
y = β0 + β1 x1 + · · · + βk xk + u
• Key assumption is
E (u|x1 , . . . , xk ) = 0
• Multiple regression allows us to incorporate different factors to explain the behavior of y
• We already know that 100 · β1 is (approximately) the percent change in wage when educ increases by one year. 100 · β2 has a similar interpretation (for a one-point increase in IQ)
• In a specification that also adds exper and exper² (so log(wage) = β0 + β1 educ + β2 IQ + β3 exper + β4 exper² + u), the marginal effect of experience is
  ∂log(wage)/∂exper = β3 + 2β4 exper
• Multiply by 100 to get the percentage effect
2. Mechanics and interpretation of OLS
(Wooldridge, Ch. 3.2)
OLS for multiple regression
• As in the simple regression case, there are different ways to derive the OLS estimator. We choose β̂0 , β̂1 , and β̂2 (so three unknowns) to minimize the sum of squared residuals
  Σ_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2 )²
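As an illustration, here is a minimal sketch of this minimization in Python/NumPy (not the Stata used on the slides), with simulated data; all variable names and coefficient values below are made up for the example.

import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data (illustrative; not the WAGE2.dta used on the slides)
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)      # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Design matrix: a column of ones (intercept), x1, x2
X = np.column_stack([np.ones(n), x1, x2])

# beta_hat = (b0, b1, b2) minimizes the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat                 # residuals
print("beta_hat:", beta_hat)
print("minimized SSR:", u_hat @ u_hat)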
• What if we “hold x2 fixed”? Then
  β̂1 = ∆ŷ/∆x1   if ∆x2 = 0
• Similarly,
  β̂2 = ∆ŷ/∆x2   if ∆x1 = 0
• We call β̂1 and β̂2 partial effects
Example: Regress log(wage) on educ (WAGE2.dta)
Multiple regression: log(wage) on educ and IQ
. reg lwage educ IQ
. corr educ IQ
(obs=935)

             educ       IQ
     educ  1.0000
       IQ  0.5157   1.0000
• Results (fitted regressions):
  log(wage)-hat = 5.973 + 0.060 educ
  log(wage)-hat = 5.658 + 0.039 educ + 0.0059 IQ
• Not surprisingly, there is a nontrivial positive correlation between educ and IQ: Corr (educi , IQi ) = 0.516
Fitted values and residuals
• For each observation i, the fitted value is
  ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik
and the residual is
  ûi = yi − ŷi
Algebraic properties
Goodness-of-fit
• As with simple regression, it can be shown that
  SST = SSE + SSR
where SST, SSE and SSR are the total, explained and residual sums of squares
• The R-squared of the regression is
  R² = SSE/SST = 1 − SSR/SST
• Property: 0 ≤ R² ≤ 1, but for the same dependent variable, R² never falls when another regressor is added to the regression (adding another x cannot increase SSR)
• The adjusted R-squared corrects for this by penalizing additional regressors:
  R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
• When more regressors are added, SSR falls, but so does df = n − k − 1, so R̄² can increase or decrease
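A small Python/NumPy sketch of these definitions on simulated data (illustrative names and numbers only), computing R² and R̄² from SST, SSE and SSR:

import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2

# Simulated outcome with k = 2 regressors (illustrative data)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
SSR = np.sum(u_hat ** 2)                 # residual sum of squares

R2 = 1 - SSR / SST                       # equals SSE/SST
R2_adj = 1 - (SSR / (n - k - 1)) / (SST / (n - 1))
print(f"R^2 = {R2:.4f}, adjusted R^2 = {R2_adj:.4f}")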
Compare simple and multiple regression estimates
ỹ = β̃0 + β̃1 x1
ŷ = β̂0 + β̂1 x1 + β̂2 x2
where a tilde (∼) denotes the simple regression and a hat (∧) denotes the multiple regression (estimated on the same data)
• Yes, but we need to define another simple regression. Let δ̃1 be the slope from the simple regression of xi2 on xi1
• One can then show that the two slope estimates are related by
  β̃1 = β̂1 + β̂2 δ̃1
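This relationship holds exactly in any sample (it is an algebraic identity). A quick numerical check in Python/NumPy on simulated data (names and coefficients are illustrative):

import numpy as np

rng = np.random.default_rng(2)
n = 1000

# x1 and x2 positively correlated, both affect y
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(Y, *cols):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(Y))] + list(cols))
    return np.linalg.lstsq(X, Y, rcond=None)[0]

b_tilde = ols(y, x1)          # simple regression: y on x1
b_hat = ols(y, x1, x2)        # multiple regression: y on x1 and x2
delta_tilde = ols(x2, x1)     # auxiliary regression: x2 on x1

# beta1_tilde should equal beta1_hat + beta2_hat * delta1_tilde
print(b_tilde[1], b_hat[1] + b_hat[2] * delta_tilde[1])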
Case 1: β̂2 > 0, x1 & x2 are positively correlated
Example: log(wage) = β0 + β1 educ + β2 IQ + u
. reg IQ educ
Case 2: β̂2 > 0, x1 & x2 are negatively correlated
Regression through the origin
ỹ = β̃1 x1 + · · · + β̃k xk
3. Expected value of OLS estimators
(Wooldridge, Ch. 3.3)
Statistical properties of OLS
Assumptions MLR (Multiple Linear Regression)
• Assumption MLR.1 (Linear in parameters) In population,
it holds
y = β0 + β1 x1 + · · · + βk xk + u
where the βj ’s are parameters and u is the error term
Assumption MLR.2 (Random sampling)
• We have a random sample of size n from the population, so that
  yi = β0 + β1 xi1 + · · · + βk xik + ui
for i = 1, . . . , n
Assumption MLR.3 (No perfect collinearity)
• In the sample, no regressor is constant and there are no exact linear relationships among the regressors
Perfect collinearity
Example: Same variable in different units
• Also be careful with functional forms
One more example
• Consider
• In equation like
Assumption MLR.4 (Zero conditional mean)
• E (u|x1 , . . . , xk ) = 0
Example: Effects of class size on student performance
Theorem: Unbiasedness of OLS
• Under Assumptions MLR.1-MLR.4, the OLS estimators are unbiased:
  E (β̂j ) = βj
for each j = 0, 1, . . . , k
Inclusion of irrelevant variables
• Consider a model in which one regressor, say x3 , actually has no effect on y:
  y = β0 + β1 x1 + β2 x2 + β3 x3 + u   with β3 = 0
• We automatically know from the unbiasedness result that
  E (β̂j ) = βj for j = 0, 1, 2   and, in particular,   E (β̂3 ) = β3 = 0
Omitted variable bias (OVB)
• Suppose the true population model is
  y = β0 + β1 x1 + β2 x2 + u
but we omit x2 and estimate the simple regression
  ỹ = β̃0 + β̃1 x1
Derivation of OVB
• As before, let δ̃1 be the slope from the simple regression of x2 on x1 , so that β̃1 = β̂1 + β̂2 δ̃1
• Now use the fact that β̂1 and β̂2 are unbiased conditional on X:
  E (β̂1 ) = β1 and E (β̂2 ) = β2
• Therefore, conditional on X,
  E (β̃1 ) = E (β̂1 ) + E (β̂2 ) δ̃1 = β1 + β2 δ̃1
When does β̃1 happen to be unbiased?
• From this formula, β̃1 is unbiased when β2 = 0 (x2 does not belong in the model) or when δ̃1 = 0 (x1 and x2 are uncorrelated in the sample)
• In practice we do not know β2 and only have a vague idea about the size of δ̃1 , but we can often guess the sign of the bias
Bias in simple regression estimator of β1
• Sign of the bias β2 δ̃1 :

                  Corr (x1 , x2 ) > 0    Corr (x1 , x2 ) < 0
     β2 > 0       Positive bias          Negative bias
     β2 < 0       Negative bias          Positive bias
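A small simulation sketch of the first cell of this table (Python/NumPy, made-up coefficients with β2 > 0 and Corr(x1, x2) > 0), showing that the simple regression slope is centered above β1:

import numpy as np

rng = np.random.default_rng(3)
beta1, beta2 = 1.0, 2.0                  # illustrative values, beta2 > 0
n, reps = 500, 2000

slopes = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.7 * x1 + rng.normal(size=n)   # Corr(x1, x2) > 0
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Simple regression slope of y on x1 (x2 omitted)
    slopes.append(np.cov(x1, y)[0, 1] / np.var(x1, ddof=1))

# Average slope exceeds beta1: positive omitted variable bias
print("mean of beta1_tilde:", np.mean(slopes), "vs true beta1:", beta1)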
Example: Omitted ability bias
• Consider the wage equation
  log(wage) = β0 + β1 educ + β2 abil + u
where abil (ability) is unobserved, so we regress log(wage) on educ alone. Plausibly β2 > 0, and educ and abil are positively correlated, so δ̃1 > 0
• In this scenario
E (β̃1 ) = β1 + β2 δ̃1
= β1 + (+)(+) > β1
Example: Effects of tutoring program on student performance
• Consider
GPA = β0 + β1 tutor + β2 abil + u
where tutor is hours spent in tutoring.
• Again β2 > 0. Suppose that students with lower ability tend to use more tutoring, so that tutor and abil are negatively correlated (δ̃1 < 0)
• In this scenario,
E (β̃1 ) = β1 + β2 δ̃1
= β1 + (+)(−) < β1
Assumptions so far
• MLR.1 (Linear in parameters): y = β0 + β1 x1 + · · · + βk xk + u
• MLR.2 (Random sampling)
• MLR.3 (No perfect collinearity)
• MLR.4 (Zero conditional mean): E (u|x1 , . . . , xk ) = 0
Assumption MLR.5 (Homoskedasticity)
• Var (u|x1 , . . . , xk ) = σ²
• MLR.1-4 imply
  E (y |x1 , . . . , xk ) = β0 + β1 x1 + · · · + βk xk
and MLR.5 adds that Var (y |x1 , . . . , xk ) = σ² does not depend on the regressors
Example: Savings equation
Formula for Var (β̂j )
• Focus on the slopes (a different formula is needed for the intercept):
  Var (β̂j ) = σ² / [SSTj (1 − Rj²)]
for j = 1, . . . , k, where
  SSTj = Σ_{i=1}^n (xij − x̄j )²
  Rj² = R² from the regression of xj on the other regressors
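A sketch checking this formula numerically in Python/NumPy: for simulated regressors and a known σ², the formula for Var(β̂1) is compared with the corresponding diagonal element of σ²(X′X)⁻¹ (everything here is illustrative, not part of the slides):

import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 400, 1.0

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)        # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

# Conditional-on-X variance of the OLS estimators: sigma^2 * (X'X)^{-1}
V = sigma2 * np.linalg.inv(X.T @ X)

# Formula from the slides for j = 1 (the coefficient on x1)
SST1 = np.sum((x1 - x1.mean()) ** 2)
# R_1^2: R-squared from regressing x1 on the other regressors (here just x2)
g = np.linalg.lstsq(np.column_stack([np.ones(n), x2]), x1, rcond=None)[0]
resid = x1 - np.column_stack([np.ones(n), x2]) @ g
R1sq = 1 - np.sum(resid ** 2) / SST1

print(V[1, 1], sigma2 / (SST1 * (1 - R1sq)))   # the two should match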
Remark on theorem
Remark on variance formula
• Variance formula:
  Var (β̂j ) = σ² / [SSTj (1 − Rj²)]
Effect of SSTj
• SSTj ≈ n σj² , where σj² is the population variance of xj , so SSTj grows roughly linearly with the sample size n
Effect of Rj²
• If Rj² = 0, i.e. xj is uncorrelated with the other regressors, the formula reduces to
  Var (β̂j ) = σ² / SSTj
As Rj² increases toward 1, Var (β̂j ) increases
Multicollinearity
• In fact, the formula is doing its job: it shows that if Rj² is “close” to one, Var (β̂j ) might be very large
• The value of Rj² per se is not important. What ultimately matters is Var (β̂j )
• In Var (β̂j ), a large Rj² can be offset by a large SSTj , which grows roughly linearly with the sample size n
Correlation among control variables
• Consider
y = β0 + β1 x1 + β2 x2 + β3 x3 + u
where β1 is the coefficient of interest. Assume x2 and x3 act as controls, included so that we can hope to get a good ceteris paribus estimate of the effect of x1 . Such controls are often highly correlated with each other (e.g. x2 and x3 are different test scores)
Example
percapproved = β0 + β1 percminority + β2 avginc + β3 avghouseval + u,
where β1 is of interest
Variance in misspecified models
• Consider
y = β0 + β1 x1 + β2 x2 + u
where Assumptions MLR.1-5 hold true
ỹ = β̃0 + β̃1 x1
ŷ = β̂0 + β̂1 x1 + β̂2 x2
• From the previous analysis, we know that, conditional on X,
  Var (β̂1 ) = σ² / [SST1 (1 − R1²)]
  Var (β̃1 ) = σ² / SST1 < σ² / [SST1 (1 − R1²)] = Var (β̂1 )
Two cases: y = β0 + β1 x1 + β2 x2 + u
• (1) If β2 ≠ 0, then
β̃1 is biased, β̂1 is unbiased, but Var (β̃1 ) < Var (β̂1 )
• (2) If β2 = 0, then
β̃1 and β̂1 are both unbiased and Var (β̃1 ) < Var (β̂1 )
Case 1
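A simulation sketch of case (1) in Python/NumPy with made-up coefficients; the regressors are drawn once and held fixed, since the variance comparison above is conditional on X. The short-regression slope is biased but less variable, while the long-regression slope is centered on β1 with a larger variance:

import numpy as np

rng = np.random.default_rng(6)
beta1, beta2 = 1.0, 1.0                  # illustrative values; beta2 != 0 (case 1)
n, reps = 200, 5000

# Hold the regressors fixed across replications (variances are conditional on X)
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
X_long = np.column_stack([np.ones(n), x1, x2])

b_short, b_long = [], []
for _ in range(reps):
    u = rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + u
    # Short regression: y on x1 only (slope = sample cov / sample var)
    b_short.append(np.cov(x1, y)[0, 1] / np.var(x1, ddof=1))
    # Long regression: y on x1 and x2
    b_long.append(np.linalg.lstsq(X_long, y, rcond=None)[0][1])

# Short slope: centered above beta1 (biased) but less variable;
# long slope: centered on beta1 but with larger variance
print("short: mean %.3f  var %.5f" % (np.mean(b_short), np.var(b_short)))
print("long : mean %.3f  var %.5f" % (np.mean(b_long), np.var(b_long)))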
Estimation of σ² and standard errors
• We still need to estimate σ² = Var (u). For multiple regression, its unbiased estimator is
  σ̂² = (1/(n − k − 1)) Σ_{i=1}^n ûi² = SSR/(n − k − 1)
• The standard error of β̂j is then
  se(β̂j ) = σ̂ / √[SSTj (1 − Rj²)]
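A Python/NumPy sketch putting the last two formulas together on simulated data (illustrative only): σ̂² and se(β̂1) from the slide formulas, checked against the matrix expression σ̂²(X′X)⁻¹:

import numpy as np

rng = np.random.default_rng(5)
n, k = 300, 2

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

# Unbiased estimator of sigma^2
SSR = u_hat @ u_hat
sigma2_hat = SSR / (n - k - 1)

# se(beta1_hat) via the slide formula
SST1 = np.sum((x1 - x1.mean()) ** 2)
g = np.linalg.lstsq(np.column_stack([np.ones(n), x2]), x1, rcond=None)[0]
R1sq = 1 - np.sum((x1 - np.column_stack([np.ones(n), x2]) @ g) ** 2) / SST1
se_formula = np.sqrt(sigma2_hat / (SST1 * (1 - R1sq)))

# se(beta1_hat) via sigma2_hat * (X'X)^{-1}
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])

print(se_formula, se_matrix)   # the two should agree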
5. Efficiency of OLS: Gauss-Markov theorem
(Wooldridge, Ch. 3.5)
Efficiency of OLS
• Why do we use the OLS estimators β̂j rather than some other estimation method, say β̌j ?
Gauss-Markov theorem
• The Gauss-Markov theorem says: under MLR.1-5, if we take any linear unbiased estimator β̃j , then conditional on X,
  Var (β̂j ) ≤ Var (β̃j )
for j = 0, . . . , k. In this sense OLS is BLUE (the best linear unbiased estimator)