Block 4
Instrumental variables
1 Not in the main (structural) equation: no effect on the
dependent variable after controlling for observed regressors.
2 Correlated (positively or negatively) with the endogenous
regressor (this can be tested).
3 Not correlated with the error term (in some cases this can
be tested; see the Sargan test discussed next).
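$y \leftarrow x$ ($x$ exogenous):
$$\frac{dy}{dx} = \beta_1 = \frac{\operatorname{cov}(y,x)}{\operatorname{var}(x)}$$
$y \leftarrow x$, with $u$ acting on $y$ and correlated with $x$ ($x$ endogenous):
$$\frac{dy}{dx} = \beta_1 + \frac{du}{dx}$$
$y \leftarrow x \leftarrow z$, with $u$ acting on $y$ but unrelated to $z$ ($z$ is an IV for $x$):
$$\beta_1 = \frac{\operatorname{cov}(z,y)}{\operatorname{cov}(z,x)}$$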
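A small simulation makes the three cases concrete. The sketch below uses made-up data (the coefficients, the shared shock `v` and the sample size are assumptions for illustration) in which x is endogenous and z is a valid instrument. The ratio cov(y, x)/var(x) then drifts away from β1, while cov(z, y)/cov(z, x) stays close to it.

```python
import numpy as np

# Hypothetical simulated data: x is endogenous (it shares the shock v with u),
# z shifts x but is unrelated to u.
rng = np.random.default_rng(0)
n = 100_000
beta1 = 2.0
z = rng.normal(size=n)
v = rng.normal(size=n)                 # common shock behind the endogeneity
x = 1.0 * z + v + rng.normal(size=n)   # x depends on z and on v
u = v + rng.normal(size=n)             # u is correlated with x through v
y = 1.0 + beta1 * x + u


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


b_ols = cov(y, x) / np.var(x, ddof=1)  # cov(y, x) / var(x): biased here
b_iv = cov(z, y) / cov(z, x)           # cov(z, y) / cov(z, x): close to beta1
print(f"OLS: {b_ols:.3f}, IV: {b_iv:.3f}, true: {beta1}")
```

With these assumed values var(x) = 3 and cov(x, u) = 1, so the OLS ratio settles near 2 + 1/3 rather than 2.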
IVR in MLRMs: $y_i = \mathbf{x}_i \beta + u_i$
$$\hat\beta_{OLS} = (X'X)^{-1}X'y$$
$$\hat\beta_{IV} = (Z'X)^{-1}Z'y$$
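The two matrix formulas can be compared directly on simulated data. This is a minimal sketch under assumed values: one endogenous regressor `y2`, one instrument `z1`, and an exogenous `x2` that (together with the constant) appears in both X and Z. `np.linalg.solve` is used instead of forming an explicit inverse.

```python
import numpy as np

# Hypothetical data: y = b0 + b1*y2 + b2*x2 + u, y2 endogenous, z1 an IV for y2.
rng = np.random.default_rng(1)
n = 50_000
x2 = rng.normal(size=n)
z1 = rng.normal(size=n)
v = rng.normal(size=n)
y2 = 0.5 + 1.0 * z1 + 0.5 * x2 + v + rng.normal(size=n)
u = v + rng.normal(size=n)             # u shares v with y2
beta = np.array([1.0, 2.0, -1.0])

ones = np.ones(n)
X = np.column_stack([ones, y2, x2])    # regressors
Z = np.column_stack([ones, z1, x2])    # instruments: y2 replaced by z1
y = X @ beta + u

# beta_OLS = (X'X)^{-1} X'y ;  beta_IV = (Z'X)^{-1} Z'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print("OLS:", b_ols.round(3))
print("IV: ", b_iv.round(3))
```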
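IV moment equations ($y_{i2}$ endogenous with instrument $z_{i1}$; exogenous regressors $x_{i2}, \ldots, x_{ik}$):
$$n^{-1}\sum_{i=1}^{n}\left(y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i2} - \cdots - \hat\beta_k x_{ik}\right) = 0$$
$$n^{-1}\sum_{i=1}^{n} z_{i1}\left(y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i2} - \cdots - \hat\beta_k x_{ik}\right) = 0$$
$$n^{-1}\sum_{i=1}^{n} x_{i2}\left(y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i2} - \cdots - \hat\beta_k x_{ik}\right) = 0$$
$$\vdots$$
$$n^{-1}\sum_{i=1}^{n} x_{ik}\left(y_{i1} - \hat\beta_0 - \hat\beta_1 y_{i2} - \hat\beta_2 x_{i2} - \cdots - \hat\beta_k x_{ik}\right) = 0$$
Compared with OLS, in the moment equations $y_{i2}$ is replaced by $z_{i1}$.
Exogenous regressors serve as their own instruments.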
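Because the k + 1 moment equations are linear in the coefficients, they can be solved as one linear system, (Z'X/n) b = Z'y/n. The sketch below (simulated data, k = 3, all values assumed) solves that system and checks that every sample moment is zero at the solution, with the constant, x2 and x3 acting as their own instruments.

```python
import numpy as np

# Hypothetical setup: regressors (1, y2, x2, x3), instruments (1, z1, x2, x3).
rng = np.random.default_rng(2)
n = 1_000
x2, x3, z1, v = rng.normal(size=(4, n))
y2 = z1 + x2 + v + rng.normal(size=n)
y1 = 1.0 + 0.5 * y2 - x2 + 2.0 * x3 + v + rng.normal(size=n)

X = np.column_stack([np.ones(n), y2, x2, x3])
Z = np.column_stack([np.ones(n), z1, x2, x3])

# Sample moments n^{-1} sum_i z_ij * (y_i1 - x_i b) = 0 for every column j of Z.
b_iv = np.linalg.solve(Z.T @ X / n, Z.T @ y1 / n)
resid = y1 - X @ b_iv
moments = Z.T @ resid / n   # one entry per moment equation
print(moments)              # all entries are zero up to rounding error
```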
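IVR estimator is consistent: if the consistency condition
$$\operatorname{plim}\left[\tfrac{1}{n} Z'u\right] = 0$$
holds (and $\operatorname{plim}\tfrac{1}{n} Z'X$ is finite and nonsingular),
$\hat\beta_{IV}$ is consistent.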
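Substituting $y = X\beta + u$ into $\hat\beta_{IV} = (Z'X)^{-1}Z'y$ shows where the consistency condition enters (standard derivation, assuming $\operatorname{plim}\tfrac{1}{n}Z'X$ exists and is nonsingular):
$$\hat\beta_{IV} = (Z'X)^{-1}Z'(X\beta + u) = \beta + \left(\tfrac{1}{n}Z'X\right)^{-1}\tfrac{1}{n}Z'u$$
$$\operatorname{plim}\hat\beta_{IV} = \beta + \left[\operatorname{plim}\tfrac{1}{n}Z'X\right]^{-1}\operatorname{plim}\tfrac{1}{n}Z'u = \beta$$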
$$y_{i1} = \beta_0 + \beta_1 y_{i2} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i, \qquad z_1, z_2, z_3 \text{ are IVs for } y_2$$
$$\hat X = Z(Z'Z)^{-1}Z'X$$
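With more IVs than endogenous regressors, the fitted values $\hat X$ are used as instruments, giving the 2SLS estimator $(\hat X'X)^{-1}\hat X'y$. The sketch below assumes simulated data with three instruments for `y2`. Z stacks the constant, z1, z2, z3 and the exogenous `x2`, since exogenous regressors serve as their own instruments. The helper name `tsls` is hypothetical.

```python
import numpy as np


def tsls(y, X, Z):
    """2SLS: X_hat = Z (Z'Z)^{-1} Z'X, then (X_hat' X)^{-1} X_hat' y."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)


# Hypothetical data: y2 endogenous, three instruments (over-identified case).
rng = np.random.default_rng(3)
n = 50_000
x2, z1, z2, z3, v = rng.normal(size=(5, n))
y2 = z1 + 0.5 * z2 - 0.5 * z3 + x2 + v + rng.normal(size=n)
beta = np.array([1.0, 2.0, -1.0])
ones = np.ones(n)
X = np.column_stack([ones, y2, x2])
Z = np.column_stack([ones, z1, z2, z3, x2])  # x2 instruments itself
y1 = X @ beta + v + rng.normal(size=n)       # error shares v with y2

b_2sls = tsls(y1, X, Z)
print(b_2sls.round(3))
```

Because the projection matrix is symmetric and idempotent, $\hat X'X = \hat X'\hat X$, so the same point estimates come from an OLS regression of y on $\hat X$. The standard errors printed by that second-stage OLS run are not the correct 2SLS ones.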
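$$\operatorname{plim}\hat\beta_{1,OLS} = \beta_1 + \operatorname{corr}(x,u)\cdot\frac{\sigma_u}{\sigma_x}$$
$$\operatorname{plim}\hat\beta_{1,IV} = \beta_1 + \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)}\cdot\frac{\sigma_u}{\sigma_x}$$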
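Plugging numbers into the two probability limits shows why a weak instrument is risky. A small corr(z, x) in the denominator magnifies even a slight corr(z, u). The values below are made up for illustration.

```python
def plim_ols(beta1, corr_xu, sigma_u, sigma_x):
    """Probability limit of the OLS slope: beta1 + corr(x,u) * sigma_u / sigma_x."""
    return beta1 + corr_xu * sigma_u / sigma_x


def plim_iv(beta1, corr_zu, corr_zx, sigma_u, sigma_x):
    """Probability limit of the IV slope: beta1 + corr(z,u)/corr(z,x) * sigma_u / sigma_x."""
    return beta1 + corr_zu / corr_zx * sigma_u / sigma_x


# Illustrative numbers: endogenous x with corr(x,u) = 0.2 versus a weak
# (corr(z,x) = 0.05) and only slightly invalid (corr(z,u) = 0.02) instrument.
ols = plim_ols(1.0, 0.2, 1.0, 1.0)        # approx. 1.2
iv = plim_iv(1.0, 0.02, 0.05, 1.0, 1.0)   # approx. 1.4: worse than OLS
print(ols, iv)
```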
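$y_2$ is endogenous $\Leftrightarrow \operatorname{corr}(y_2, u) \neq 0$.
Reduced form: $y_2 = \text{l.f.}(x_1, z_1) + \varepsilon \;\Rightarrow\; y_2 = \hat y_2 + \hat\varepsilon$ (l.f. denotes a linear function).
$$\operatorname{corr}(y_2, u) \neq 0 \;\wedge\; \operatorname{corr}(\hat y_2, u) = 0 \;\Rightarrow\; \operatorname{corr}(\hat\varepsilon, u) \neq 0$$
($y_1$ is always correlated with $u$.)
Hence, if $y_2$ is an endogenous regressor, $\hat\varepsilon$ is significant in the auxiliary regression
$$y_{i1} = \beta_0 + \beta_1 y_{i2} + \beta_2 x_{i1} + \delta\hat\varepsilon_i + u_i. \qquad (1)$$
The IV(s) being uncorrelated with $u$ is an essential condition for the DWH test to work.
1 Estimate the reduced form for $y_2$ by OLS and save the residuals $\hat\varepsilon$.
2 Estimate the modified equation (1) by OLS (use HC inference).
3 $H_0$ ($y_2$ is exogenous, i.e. $\delta = 0$) is rejected if $\hat\varepsilon$ in the modified equation (1) is
statistically significant (t-test).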
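The regression-based DWH steps can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the slides' own code: the HC0 sandwich estimator is chosen for the heteroskedasticity-robust variance (HC1 or HC3 are common alternatives), the helper names `ols` and `dwh_t` are hypothetical, and the data are simulated so that `y2` is endogenous.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b


def dwh_t(y1, y2, X_exog, Z_excl):
    """Regression-based DWH test of H0: y2 exogenous.
    X_exog: (n, k) included exogenous regressors; Z_excl: (n, m) excluded IVs.
    Returns the HC0-robust t statistic of delta, the coefficient on eps_hat."""
    n = len(y1)
    ones = np.ones((n, 1))
    # Step 1: reduced form of y2 on all exogenous variables; keep residuals.
    _, eps_hat = ols(y2, np.hstack([ones, X_exog, Z_excl]))
    # Step 2: add eps_hat to the structural equation and estimate by OLS.
    A = np.hstack([ones, y2[:, None], X_exog, eps_hat[:, None]])
    b, e = ols(y1, A)
    # HC0 sandwich covariance: (A'A)^{-1} A' diag(e^2) A (A'A)^{-1}.
    bread = np.linalg.inv(A.T @ A)
    meat = (A * e[:, None] ** 2).T @ A
    V = bread @ meat @ bread
    # Step 3: t statistic for delta (last column of A).
    return b[-1] / np.sqrt(V[-1, -1])


# Hypothetical data in which y2 is endogenous (v enters both y2 and u).
rng = np.random.default_rng(4)
n = 5_000
x1, z1, v = rng.normal(size=(3, n))
y2 = z1 + x1 + v + rng.normal(size=n)
y1 = 1.0 + 2.0 * y2 + x1 + v + rng.normal(size=n)
t_stat = dwh_t(y1, y2, x1[:, None], z1[:, None])
print(f"DWH t statistic: {t_stat:.2f}")
```

A large |t| leads to rejecting H0, so y2 should be treated as endogenous.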
Weak instruments
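Structural equation:
$$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 x_1 + \cdots + \beta_{k+1} x_k + u; \qquad \text{IVs: } z_1, z_2, \ldots, z_m$$
Reduced form:
$$y_2 = \pi_0 + \pi_1 x_1 + \pi_2 x_2 + \cdots + \pi_k x_k + \theta_1 z_1 + \cdots + \theta_m z_m + \varepsilon$$
$H_0: \theta_1 = \theta_2 = \cdots = \theta_m = 0$ (interpretation: "instruments are weak")
$H_1: \neg H_0$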
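A common way to test this H0 is an F test on the θ coefficients in the reduced form. A widely used rule of thumb treats F below about 10 as a warning sign. The sketch below uses the homoskedasticity-only F statistic computed from restricted and unrestricted sums of squared residuals. The helper names and the simulated data are assumptions.

```python
import numpy as np


def ssr(y, W):
    """Sum of squared OLS residuals from regressing y on W."""
    b = np.linalg.lstsq(W, y, rcond=None)[0]
    return float(np.sum((y - W @ b) ** 2))


def first_stage_F(y2, X_exog, Z_excl):
    """F statistic for H0: theta_1 = ... = theta_m = 0 in the reduced form
    (homoskedasticity-only version)."""
    n = len(y2)
    ones = np.ones((n, 1))
    W_r = np.hstack([ones, X_exog])           # restricted model (theta = 0)
    W_u = np.hstack([ones, X_exog, Z_excl])   # unrestricted reduced form
    m = Z_excl.shape[1]
    df = n - W_u.shape[1]
    return ((ssr(y2, W_r) - ssr(y2, W_u)) / m) / (ssr(y2, W_u) / df)


# Hypothetical data: two exogenous regressors, two instruments.
rng = np.random.default_rng(5)
n = 2_000
X_exog = rng.normal(size=(n, 2))
Z_excl = rng.normal(size=(n, 2))
e = rng.normal(size=n)
y2_strong = X_exog.sum(axis=1) + Z_excl.sum(axis=1) + e
y2_weak = X_exog.sum(axis=1) + 0.01 * Z_excl.sum(axis=1) + e
F_strong = first_stage_F(y2_strong, X_exog, Z_excl)
F_weak = first_stage_F(y2_weak, X_exog, Z_excl)
print(f"strong: F = {F_strong:.1f}, weak: F = {F_weak:.1f}")
```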
Sargan test of over-identifying restrictions ($H_0$: all IVs are uncorrelated with $u$). Testing algorithm:
1 Estimate the structural equation using IVR and save the residuals $\hat u$.
2 Use OLS to estimate the auxiliary regression $\hat u \leftarrow f(x, z)$ and
save its $R_a^2$.
3 Under $H_0$: $nR_a^2 \sim \chi^2_q$ (asymptotically), where
$q$ = (number of IVs) − (number of endogenous regressors),
i.e. $q$ is the number of over-identifying restrictions.
4 If the observed test statistic exceeds its critical value
(at a given significance level), we reject $H_0$.
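The four steps translate directly into numpy. This is a minimal sketch with simulated data in which one instrument (`z3`) is deliberately made invalid by letting it enter u. The function name `sargan` is hypothetical, and q is computed as the number of columns of Z minus the number of columns of X. That equals (number of IVs) − (number of endogenous regressors) because the constant and exogenous regressors appear in both matrices.

```python
import numpy as np


def sargan(y1, X, Z):
    """Sargan over-identification test.
    X: regressors (incl. constant); Z: all instruments (incl. constant and the
    exogenous regressors). Returns (n * R_a^2, q)."""
    n = len(y1)
    # Step 1: 2SLS (IVR) estimate and residuals u_hat.
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b = np.linalg.solve(X_hat.T @ X, X_hat.T @ y1)
    u_hat = y1 - X @ b
    # Step 2: OLS of u_hat on all exogenous variables; R^2 of that regression.
    g = np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    r2 = 1.0 - np.sum((u_hat - Z @ g) ** 2) / np.sum((u_hat - u_hat.mean()) ** 2)
    # Step 3: q = number of over-identifying restrictions.
    q = Z.shape[1] - X.shape[1]
    return n * r2, q


# Hypothetical data: z3 also enters u, so one instrument is invalid.
rng = np.random.default_rng(6)
n = 5_000
z1, z2, z3, v = rng.normal(size=(4, n))
y2 = z1 + z2 + z3 + v + rng.normal(size=n)
u = v + 0.3 * z3 + rng.normal(size=n)
y1 = 1.0 + 2.0 * y2 + u
ones = np.ones(n)
X = np.column_stack([ones, y2])
Z = np.column_stack([ones, z1, z2, z3])
stat_bad, q_bad = sargan(y1, X, Z)
# Step 4: compare with the chi^2_q critical value (about 5.99 for q = 2 at 5%).
print(f"n*R^2 = {stat_bad:.1f}, q = {q_bad}")
```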
IVR diagnostic tests: example
Wooldridge, bwght dataset
R code, {AER} package
Call:
ivreg(formula = lbwght ~ packs + male | faminc + motheduc + male,
    data = bwght)
(Left of |: regressors explicitly included in the equation; right of |: IVs. The exogenous regressor male appears on both sides, serving as its own instrument.)

Residuals:
     Min       1Q   Median       3Q      Max
-1.66291 -0.09793  0.01717  0.11616  0.82793

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.77419    0.01099 434.478  < 2e-16 ***
packs       -0.25584    0.07613  -3.361 0.000798 ***
male         0.02422    0.01048   2.311 0.021003 *
SEM: identification
Identification conditions
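$$h^s = \alpha_1 w + \beta_1 z_1 + u_1$$
$$h^d = \alpha_2 w + \beta_2 z_2 + u_2$$
Endogenous variables: $h$, $w$; exogenous variables: $z_1$, $z_2$.
$z_1$ and $u_1$ are the observed and unobserved supply shifters;
$z_2$ and $u_2$ are the observed and unobserved demand shifters.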
$$h_i = \alpha_1 w_i + \beta_1 z_{i1} + u_{i1}$$
$$h_i = \alpha_2 w_i + \beta_2 z_{i2} + u_{i2}$$
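In the generic form $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$, $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$, $y_2$ depends on $u_1$
(substitute the RHS of the 1st equation for $y_1$ in the 2nd equation; this requires $\alpha_2\alpha_1 \neq 1$):
$$\Rightarrow\; y_2 = \frac{\alpha_2\beta_1}{1-\alpha_2\alpha_1} z_1 + \frac{\beta_2}{1-\alpha_2\alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1-\alpha_2\alpha_1}$$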
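The substitution can be checked numerically for the system $y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$, $y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$. The sketch below uses hypothetical parameter values (with $\alpha_1\alpha_2 \neq 1$) and independent standard normal $z_1, z_2, u_1, u_2$. It generates $y_2$ from the reduced form and confirms that both structural equations hold. It also shows that cov($y_2$, $u_1$) is close to $\alpha_2/(1-\alpha_2\alpha_1)$, the source of simultaneity bias in OLS.

```python
import numpy as np

# Hypothetical parameter values; alpha1 * alpha2 != 1.
a1, b1, a2, b2 = 0.5, 1.0, -0.8, 2.0
rng = np.random.default_rng(7)
n = 100_000
z1, z2, u1, u2 = rng.normal(size=(4, n))   # independent, unit variance

# Reduced form for y2 obtained by substitution.
d = 1 - a2 * a1
y2 = (a2 * b1 / d) * z1 + (b2 / d) * z2 + (a2 * u1 + u2) / d
y1 = a1 * y2 + b1 * z1 + u1                # first structural equation

# The second structural equation holds exactly as well.
assert np.allclose(y2, a2 * y1 + b2 * z2 + u2)
# y2 depends on u1: the population covariance is a2 / (1 - a2 * a1).
print(np.cov(y2, u1)[0, 1], a2 / d)
```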
Structural and reduced form equations, 2SLS method
Structural equations (example)
$$y_1 = \beta_{10} + \beta_{11} y_2 + \beta_{12} z_1 + u_1$$
$$y_2 = \beta_{20} + \beta_{21} y_1 + \beta_{22} z_2 + u_2$$
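For the first structural equation, $z_2$ is excluded and can serve as the instrument for $y_2$. The sketch below simulates the system with hypothetical parameter values, solving the two equations jointly for $y_1$ and $y_2$. It then compares OLS with 2SLS computed as two explicit stages (first-stage fitted values, then OLS on them). Only the point estimates from the second stage are meaningful, not its standard errors.

```python
import numpy as np

# Hypothetical structural parameters (b11 * b21 != 1).
b10, b11, b12 = 1.0, 0.5, 1.0
b20, b21, b22 = -1.0, -0.8, 2.0
rng = np.random.default_rng(8)
n = 100_000
z1, z2, u1, u2 = rng.normal(size=(4, n))

# Solve the two structural equations jointly for (y1, y2).
d = 1 - b11 * b21
y1 = (b10 + b11 * b20 + b12 * z1 + b11 * b22 * z2 + u1 + b11 * u2) / d
y2 = b20 + b21 * y1 + b22 * z2 + u2

ones = np.ones(n)
X = np.column_stack([ones, y2, z1])   # regressors of equation 1
Z = np.column_stack([ones, z2, z1])   # z2 (excluded from eq. 1) instruments y2
b_ols = np.linalg.lstsq(X, y1, rcond=None)[0]
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]      # first stage: fitted values
b_2sls = np.linalg.lstsq(X_hat, y1, rcond=None)[0]    # second stage
print("OLS: ", b_ols.round(3))
print("2SLS:", b_2sls.round(3))
```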
Example 3: (Identification)
Identification problem in a SEM
Illustration
Identification conditions for a sample 2-equation SEM
(individual i subscripts omitted)
Example 4: (Identification)
Labor supply of married working women
Supply (workers):
Demand (enterprises):
$$\log(wage) = \alpha_2\, hours + \beta_{20} + \beta_{21}\, educ + \beta_{22}\, exper + \beta_{23}\, exper^2 + u_2$$
Labor supply of married working women (contd.)
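Identification in SEMs with more than two equations
$$C_t = \beta_0 + \beta_1 (Y_t - T_t) + \beta_2 r_t + u_{t1}$$
$$I_t = \gamma_0 + \gamma_1 r_t + u_{t2}$$
$$Y_t \equiv C_t + I_t + G_t$$
Endogenous: $C_t$, $I_t$, $Y_t$. Exogenous: $T_t$, $G_t$, $r_t$.
The order condition for identification is the same as for
two-equation systems; the rank condition is more complicated.
Order condition for equation $i$ ($K$: number of exogenous variables in the system; $K_i$: number of exogenous variables in equation $i$; $G_i$: number of endogenous variables in equation $i$):
$$K - K_i \geq G_i - 1$$
Complex models based on macroeconomic time series are
sometimes used. Problems with these models: the series are
usually not weakly dependent, and it is difficult to find enough
exogenous variables to serve as instruments. The question is whether any
macroeconomic variables are exogenous at all.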
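The order condition is a simple count, so it can be checked mechanically. The sketch below assumes the usual reading of the symbols: $G_i$ counts the endogenous variables in equation $i$ including its dependent variable, so $G_i - 1$ is the number of endogenous regressors. It applies the condition to the two behavioural equations of the macro example. The identity needs no estimation. The function name is hypothetical.

```python
def order_condition(K, K_i, G_i):
    """Order condition for equation i: K - K_i >= G_i - 1.
    K: exogenous variables in the system; K_i: exogenous variables in eq. i;
    G_i: endogenous variables in eq. i (including its dependent variable)."""
    return K - K_i >= G_i - 1


# Macro example: exogenous T, G, r, so K = 3.
# Consumption: exogenous T, r (K_i = 2); endogenous C, Y (G_i = 2).
print(order_condition(3, 2, 2))   # True: 1 >= 1
# Investment: exogenous r (K_i = 1); endogenous I only (G_i = 1).
print(order_condition(3, 1, 1))   # True: 2 >= 0
```

The order condition is necessary but not sufficient; the rank condition must still hold.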