
Block 4

Instrumental variable regression (IVR)


Two stage least squares (2SLS)
Simultaneous Equation Models

Advanced econometrics 1 4EK608


Pokročilá ekonometrie 1 4EK416

Vysoká škola ekonomická v Praze


Outline

1 Introduction & repetition from BSc courses


2 Instrumental variables
3 Two stage least squares
4 IVR diagnostic tests
Durbin-Wu-Hausman (endogeneity in regressors)
Weak instruments test
Sargan (exogeneity in IVs, over-identification only)
5 SEM: introduction
6 SEM identification
7 Identification conditions
8 Systems with more than two equations
Introduction: endogenous regressors

CS model: $y_i = x_i \beta + u_i$ and $E[x_i, u_i] \neq 0$.


Endogeneity arises if important regressors cannot be measured (and thus become part
of $u_i$) and are correlated with the observed regressors of the LRM.
Endogeneity can be caused by measurement errors.
Always present in simultaneous equations models (SEMs).
With endogenous regressors, OLS is biased & inconsistent.

Endogeneity in regressors can sometimes be solved:

By means of proxy variables (if uncorrelated with $u_i$).
By a more detailed (multi-equation) specification, if possible.
Using panel data methods (data availability permitting).
Using instrumental variable regression (IVR)
(we need “good” instruments, assumptions apply).
Introduction: instrumental variables

Example: $\log(wage_i) = \beta_0 + \beta_1 educ_i + [abil_i + u_i]$

Instrumental variables
1 Not in the main (structural) equation: no effect on the
dependent variable after controlling for observed regressors.
2 Correlated (positively or negatively) with the endogenous
regressor (this can be tested).
3 Not correlated with the error term (in some cases, this can
be tested, see Sargan test discussed next).

Possible IVs: father’s education, mother’s education,


number of siblings, etc.
Usually, IQ is not a good IV: it is often correlated with
abil, i.e. with the error term $[abil_i + u_i]$.
Instrumental variables

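SLRM with exogenous regressor x: $y_i = \beta_0 + \beta_1 x_i + u_i$

Diagram: x → y ← u (u affects y but is unrelated to x), and

$\frac{dy}{dx} = \beta_1 = \frac{\mathrm{cov}(y, x)}{\mathrm{var}(x)}$

MLRM with exogenous regressor(s): $y_i = x_i \beta + u_i$

$\hat{\beta} = (X'X)^{-1} X'y$ | substitute for y
$\hat{\beta} = (X'X)^{-1} X'(X\beta + u)$ | rearrange & take expectations
$E[\hat{\beta}] = \beta + E[(X'X)^{-1} X'u] = \beta$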

With exogenous regressors, OLS is unbiased.


Instrumental variables

SLRM with endogenous regressor x: $y_i = \beta_0 + \beta_1 x_i + u_i$

Diagram: x → y ← u, with u also related to x, and

$\frac{dy}{dx} = \beta_1 + \frac{du}{dx}$

MLRM with endogenous regressor(s): $y_i = x_i \beta + u_i$

$\hat{\beta} = (X'X)^{-1} X'y$ | substitute for y
$\hat{\beta} = (X'X)^{-1} X'(X\beta + u)$ | rearrange & take expectations
$E[\hat{\beta}] = \beta + E[(X'X)^{-1} X'u] \neq \beta$

With endogenous regressors, $E[(X'X)^{-1} X'u] \neq 0$.


Thus, OLS is biased (and asymptotically biased).
Instrumental variables
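IVR principle (SLRM): $y_i = \beta_0 + \beta_1 x_i + u_i$

Diagram: z → x → y ← u, with u also related to x (but not to z), and

$\beta_1 = \frac{\mathrm{cov}(z, y)}{\mathrm{cov}(z, x)}$

IVR in MLRMs: $y_i = x_i \beta + u_i$

$\hat{\beta}_{OLS} = (X'X)^{-1} X'y$
$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$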

where Z is a matrix of instruments, same dimensions as X.

Exact identification: # endogenous regressors = # IVs,

Z follows from X, each endogenous regressor (column) is
replaced by a unique instrument (full column ranks of X, Z),
in IVR, $R^2$ has no interpretation ($SST \neq SSE + SSR$),
for IVR, we use specialized robust standard errors,
the IVR estimator is biased but consistent.
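
A minimal simulation sketch of the SLRM formulas on this slide, comparing cov(y, x)/var(x) with cov(z, y)/cov(z, x). The data-generating process, the parameter values and the helper name `cov` are assumptions made for illustration only; they are not taken from the course material.

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

rng = random.Random(42)      # fixed seed, illustrative only
n = 20000
beta0, beta1 = 1.0, 2.0      # assumed "true" parameters

z = [rng.gauss(0, 1) for _ in range(n)]       # instrument
abil = [rng.gauss(0, 1) for _ in range(n)]    # unobserved, ends up in the error
# x depends on the instrument and on abil, so x is endogenous
x = [0.8 * zi + ai + rng.gauss(0, 1) for zi, ai in zip(z, abil)]
u = [ai + rng.gauss(0, 1) for ai in abil]     # error term contains abil
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

beta_ols = cov(x, y) / cov(x, x)   # cov(y, x) / var(x): inconsistent here
beta_iv = cov(z, y) / cov(z, x)    # cov(z, y) / cov(z, x): consistent

print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  true: {beta1}")
```

Under these assumptions the OLS slope should settle visibly above 2, while the IV ratio should be close to 2 in a sample of this size.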
Instrumental variables: IVR as MM estimator

Exogenous regressors:

MM: replace $E[X'(y - X\beta)] = 0$ by $\frac{1}{n}[X'(y - X\hat{\beta})] = 0$
and solve the moment equations.

OLS provides an identical estimate: $\hat{\beta}_{OLS} = (X'X)^{-1} X'y$

With endogenous regressors (exact identification), the moment conditions change:

MM: replace $E[Z'(y - X\beta)] = 0$ by $\frac{1}{n}[Z'(y - X\hat{\beta})] = 0$
and solve the moment equations.

IVR provides an identical estimate: $\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$


Instrumental variables: IVR as MM estimator

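$y_{i1} = \beta_0 + \beta_1 y_{i2} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$ | $z_1$ is IV for $y_2$

$n^{-1} \sum_{i=1}^{n} (y_{i1} - \hat{\beta}_0 - \hat{\beta}_1 y_{i2} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$
$n^{-1} \sum_{i=1}^{n} z_{i1} \cdot (y_{i1} - \hat{\beta}_0 - \hat{\beta}_1 y_{i2} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$
$n^{-1} \sum_{i=1}^{n} x_{i2} \cdot (y_{i1} - \hat{\beta}_0 - \hat{\beta}_1 y_{i2} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$
...
$n^{-1} \sum_{i=1}^{n} x_{ik} \cdot (y_{i1} - \hat{\beta}_0 - \hat{\beta}_1 y_{i2} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}) = 0$

In the moment equations, the multiplier $y_{i2}$ is replaced by $z_{i1}$ (the residual itself still contains $y_{i2}$).
Exogenous regressors serve as their own instruments.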
IVR estimator is consistent

$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$ | substitute for y
$\hat{\beta}_{IV} = (Z'X)^{-1} Z'(X\beta + u)$ | rearrange
$\hat{\beta}_{IV} = \beta + (Z'X)^{-1} Z'u$

If the consistency condition $\mathrm{plim}\left[\frac{1}{n} Z'u\right] = 0$ holds,
$\hat{\beta}_{IV}$ is consistent.

This can be seen from the expansion of $(Z'X)^{-1} Z'u$:

$\hat{\beta}_{IV} = \beta + (n^{-1} Z'X)^{-1} n^{-1} Z'u$


Instrumental variables: over-identification

$y_{i1} = \beta_0 + \beta_1 y_{i2} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$ | $z_1, z_2, z_3$ are IVs for $y_2$

By choosing any of the IVs $z_1, z_2, z_3$
(or any linear combination of them), we perform IVR.
$\hat{\beta}_{IV}$ values change as the IV in the moment equations changes.
We cannot “simply” use all three instruments:
if # columns in Z (l) > # columns in X (k),
$Z'X$ is $(l \times k)$ with rank k and has no inverse, so
$\hat{\beta}_{IV} = (Z'X)^{-1} Z'y$ cannot be calculated.

Solution: project X onto the column space of Z (GMM).
(X has an endogenous column, Z is purely exogenous).
Instrumental variables: over-identification

Projection matrices (exogenous X): repetition

$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Py$
$y = \hat{y} + \hat{u} = Py + My$, where
$M = I - X(X'X)^{-1}X' = I - P$

Projection of the columns of X onto the column space of Z:

$\hat{X} = Z(Z'Z)^{-1}Z'X = P_Z X$,
Columns of $\hat{X}$ are linear combinations of the columns of Z.
Exogenous columns in X are repeated in Z, hence they are
projected onto themselves and do not change between
X and $\hat{X}$.
General form of the IV estimator (over-identification):
$\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'y$
Instrumental variables: over-identification

Projection of the columns of X onto the column space of Z:

$\hat{X} = Z(Z'Z)^{-1}Z'X$,

It may be shown that IVR is equivalent to the OLS regression $y \leftarrow \hat{X}$:

$\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'y$
$= (X'(I - M_Z)X)^{-1} X'(I - M_Z)y$
$= (\hat{X}'\hat{X})^{-1}\hat{X}'y$

$y \leftarrow \hat{X}$ is part of the two-stage LS (2SLS) method
(discussed next).
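
The equivalence on this slide rests on two properties of the projection matrix. A short derivation sketch, writing $P_Z = Z(Z'Z)^{-1}Z' = I - M_Z$ (the symbol $P_Z$ is used here as shorthand for the projection onto the column space of Z):

```latex
\begin{align*}
P_Z' &= P_Z, \qquad P_Z P_Z = P_Z
  && \text{(symmetric and idempotent)} \\
\hat{X}'X &= (P_Z X)'X = X'P_Z X \\
\hat{X}'\hat{X} &= X'P_Z'P_Z X = X'P_Z X = \hat{X}'X \\
\hat{X}'y &= X'P_Z y \\
\Rightarrow\quad \hat{\beta}_{IV} &= (\hat{X}'X)^{-1}\hat{X}'y
  = (X'P_Z X)^{-1}X'P_Z y
  = (\hat{X}'\hat{X})^{-1}\hat{X}'y
\end{align*}
```

The last expression is exactly the OLS formula applied to the regressor matrix $\hat{X}$.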
Instrumental variables: identification conditions

In y = Xβ + u, multiple xj regressors may be endogenous.

Identification (estimability) conditions:


Order condition: We need at least as many IVs (excluded
exogenous variables) as there are included endogenous
regressors in the main (structural) equation.
This is a necessary condition for identification.

Rank condition: $\hat{X} = Z(Z'Z)^{-1}Z'X$ has full column
rank (k) so that $(\hat{X}'X)^{-1}$ or $(\hat{X}'\hat{X})^{-1}$ can be calculated in
the IV estimator $\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'y$ (discussed in
detail with respect to the 2SLS method and for SEM models).
This is a necessary and sufficient condition for identification.
Instrumental variables: statistical properties

SLRM: $y_{i1} = \beta_0 + \beta_1 x_{i1} + u_i$ | $x_{i1}$ endogenous, $z_{i1}$ exists

Asymptotic variance of the IV estimator decreases with increasing correlation between z and x.

IV-related routines & tests are implemented in R, . . .

Both endogenous explanatory variables and IVs can be binary variables.

$R^2$ can be negative and has no interpretation or relevance if IVR is used.
Instrumental variables: statistical properties

SLRM: $y_{i1} = \beta_0 + \beta_1 x_{i1} + u_i$ | $x_{i1}$ endogenous, $z_{i1}$ exists

In large samples, the IV estimator has an approximately normal distribution (MM/GMM properties).

For calculation of standard errors, we usually need the assumption of homoscedasticity conditional on the IV(s).
Alternatively, we calculate robust errors.

Asymptotic variance of the IV estimator is always higher than that of the OLS estimator:

$\mathrm{var}(\hat{\beta}_{1,IV}) = \frac{\hat{\sigma}^2}{SST_x \cdot R^2_{x,z}} > \mathrm{var}(\hat{\beta}_{1,OLS}) = \frac{\hat{\sigma}^2}{SST_x}$
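
A worked illustration of the variance comparison, holding $\hat{\sigma}^2$ fixed as the slide's formula does. The value $R^2_{x,z} = 0.1$ is an assumed example, not a figure from the course material.

```latex
\begin{align*}
\frac{\mathrm{var}(\hat{\beta}_{1,IV})}{\mathrm{var}(\hat{\beta}_{1,OLS})}
  &= \frac{\hat{\sigma}^2 / (SST_x \cdot R^2_{x,z})}{\hat{\sigma}^2 / SST_x}
   = \frac{1}{R^2_{x,z}} \\
R^2_{x,z} = 0.1 \;&\Rightarrow\; \text{variance ratio} = 10, \quad
  \text{s.e. ratio} = \sqrt{10} \approx 3.16
\end{align*}
```

So an instrument that explains only 10 % of the variation in x roughly triples the standard error relative to OLS.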
Instrumental variables: statistical properties

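SLRM: $y_{i1} = \beta_0 + \beta_1 x_{i1} + u_i$ | $x_{i1}$ endogenous, $z_{i1}$ exists

If (small) correlation between u and instrument z is possible, inconsistency in the IV estimator can be much higher than in the OLS estimator:

$\mathrm{plim}\,\hat{\beta}_{1,OLS} = \beta_1 + \mathrm{corr}(x, u) \cdot \frac{\sigma_u}{\sigma_x}$

$\mathrm{plim}\,\hat{\beta}_{1,IV} = \beta_1 + \frac{\mathrm{corr}(z, u)}{\mathrm{corr}(z, x)} \cdot \frac{\sigma_u}{\sigma_x}$

Weak instrument: if correlation between z and x is small.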
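
The two plim formulas can be evaluated directly. The sketch below does so for assumed correlation values (0.30, 0.05 and 0.10 are made up for illustration); the function names are hypothetical helpers, not part of any package.

```python
def asymptotic_bias_ols(corr_xu, sigma_u, sigma_x):
    """plim(beta_OLS) - beta_1 = corr(x, u) * sigma_u / sigma_x."""
    return corr_xu * sigma_u / sigma_x

def asymptotic_bias_iv(corr_zu, corr_zx, sigma_u, sigma_x):
    """plim(beta_IV) - beta_1 = corr(z, u) / corr(z, x) * sigma_u / sigma_x."""
    return corr_zu / corr_zx * sigma_u / sigma_x

# Assumed values: a clearly endogenous x, a slightly invalid and weak z.
bias_ols = asymptotic_bias_ols(corr_xu=0.30, sigma_u=1.0, sigma_x=1.0)
bias_iv = asymptotic_bias_iv(corr_zu=0.05, corr_zx=0.10, sigma_u=1.0, sigma_x=1.0)

print(f"OLS bias: {bias_ols:.2f}, IV bias: {bias_iv:.2f}")
```

With these numbers the IV inconsistency (0.05 / 0.10 = 0.5) exceeds the OLS inconsistency (0.3), even though corr(z, u) is six times smaller than corr(x, u): the weak first-stage correlation in the denominator inflates it.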


Instrumental variables: statistical properties

MLRM: $y = X\beta + u$ | valid Z exists

The IVR method is a “trick” for consistent estimation of the ceteris paribus effects, i.e. $\hat{\beta}_{j,IV}$.

Fitted values are generated as $\hat{y} = X\hat{\beta}_{IV}$
(NOT from $\hat{y} = \hat{X}\hat{\beta}_{IV}$).

Similarly: $\mathrm{var}(\hat{u}_i) = \hat{\sigma}^2 = \frac{1}{n-k} \sum_{i=1}^{n} (y_i - x_i \hat{\beta}_{IV})^2$
(the d.f. correction is superfluous, asymptotic use only).

$\mathrm{Asy.Var}(\hat{\beta}_{IV}) = \hat{\sigma}^2 (Z'X)^{-1}(Z'Z)(X'Z)^{-1}$
for the exactly identified & homoscedastic case.

With heteroscedasticity and/or over-identification, the
$\mathrm{Asy.Var}(\hat{\beta}_{IV})$ formula is complex and built into all SW
packages.
2SLS as a special case of IVR

$\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'y = (\hat{X}'\hat{X})^{-1}\hat{X}'y$

2SLS:
Structural equation (as in SEMs)
$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 x_2 + \dots + \beta_k x_k + u$ | $z_1$ exists

Reduced form for $y_2$: the endogenous variable as a function of
all exogenous variables (including IVs)
$y_2 = \pi_0 + \pi_1 z_1 + \pi_2 x_2 + \dots + \pi_k x_k + \varepsilon$
1st stage of 2SLS: estimate the reduced form by OLS.
Order condition for identification of the structural equation:
at least one instrument for each endogenous regressor.
If $z_1$ is an IV for $y_2$, its coefficient in the reduced form equation must not be zero (rank
condition for identification), see stage 2 of 2SLS.
2SLS as a special case of IVR

$\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'y = (\hat{X}'\hat{X})^{-1}\hat{X}'y$

2SLS:
Structural equation
$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 x_2 + \dots + \beta_k x_k + u$ | $z_1$ exists
1st stage of 2SLS: estimate the reduced form for $y_2$:
$\hat{y}_2 = \hat{\pi}_0 + \hat{\pi}_1 z_1 + \hat{\pi}_2 x_2 + \dots + \hat{\pi}_k x_k$
2nd stage of 2SLS: use $\hat{y}_2$ to estimate the structural equation:
$y_1 = \beta_0 + \beta_1 \hat{y}_2 + \beta_2 x_2 + \dots + \beta_k x_k + u$
Note that the RHS in the 2nd stage contains all exogenous
regressors repeated from X, while $\hat{y}_2$ is $y_2$ “projected” onto
Z and thus uncorrelated with u.
Order condition fulfilled. Rank condition explained: if
$\pi_1 = 0$, $\hat{y}_2$ is a perfect linear combination of the remaining
RHS regressors in the 2nd stage.
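
The two stages can be sketched by hand on simulated data. Everything in this example is an assumption for illustration: the data-generating process, the parameter values (true $\beta_1 = 2$, $\beta_2 = -1$) and the small helper `ols`, which solves the normal equations in pure Python. Standard errors from the manual second stage would not be valid, so none are computed.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

rng = random.Random(0)
n = 5000
z1 = [rng.gauss(0, 1) for _ in range(n)]   # instrument
x2 = [rng.gauss(0, 1) for _ in range(n)]   # exogenous regressor
a = [rng.gauss(0, 1) for _ in range(n)]    # common shock, source of endogeneity
y2 = [0.5 + 0.8 * z + 0.4 * x + s + rng.gauss(0, 1)
      for z, x, s in zip(z1, x2, a)]
u = [s + rng.gauss(0, 1) for s in a]
y1 = [1.0 + 2.0 * e - 1.0 * x + ui for e, x, ui in zip(y2, x2, u)]

# 1st stage: reduced form for y2 on all exogenous variables
pi = ols([[1.0, z, x] for z, x in zip(z1, x2)], y2)
y2_hat = [pi[0] + pi[1] * z + pi[2] * x for z, x in zip(z1, x2)]

# 2nd stage: structural equation with y2 replaced by its fitted values
b_2sls = ols([[1.0, yh, x] for yh, x in zip(y2_hat, x2)], y1)
# Naive OLS on the structural equation, for comparison
b_ols = ols([[1.0, yv, x] for yv, x in zip(y2, x2)], y1)

print("2SLS:", [round(v, 3) for v in b_2sls])
print("OLS: ", [round(v, 3) for v in b_ols])
```

Under these assumptions the 2SLS coefficient on $y_2$ should land near 2, while the naive OLS coefficient is pushed upward by the common shock.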
Instrumental variables

Instrumental variables: summary

Excluded from the main / structural equation


Must be correlated with endogenous regressor(s)
Must not be correlated with u

All IVs used in IVR / 2SLS estimation must fulfill the


conditions above.

In 2SLS, 1st stage is used to generate the “best” IV.


With multiple endogenous regressors, reduced forms for each
endogenous regressor must be constructed and estimated, rank
and order conditions apply.
Two stage least squares
2SLS properties

The standard errors from the OLS second stage regression


are biased and inconsistent estimators with respect to the
original structural equation (SW handles this problem
automatically).

If there is one endogenous variable and one instrument


then 2SLS = IVR

With multiple endogenous variables and/or multiple


instruments, 2SLS is a special case of IVR.
Example:
Consider MLRM with one endogenous regressor and 3 relevant IVs.
Choosing any IV (or any ad-hoc linear combination of IVs) results in
IVR (MM-type & consistent estimator). 2SLS (GMM-type approach)
provides the “best” IVR estimator – lowest variance in the 2nd stage
comes from best fit between IVs and endogenous regressor in 1st stage.
Two stage least squares

Statistical properties of the 2SLS/IV estimator

Under assumptions completely analogous to OLS, but


conditioning on zi rather than on xi , 2SLS/IV is
consistent and asymptotically normal.

2SLS/IV estimator is typically much less efficient than the


OLS estimator because there is more multicollinearity and
less explanatory variation in the second stage regression

Problem of multicollinearity is much more serious with


2SLS than with OLS
Two stage least squares

Statistical properties of the 2SLS/IV estimator

Corrections for heteroscedasticity/serial correlation


analogous to OLS

2SLS/IVR estimation easily extends to time series and panel data situations
IVR diagnostic tests: introduction

LRM: yi1 = β0 + β1 yi2 + β2 xi1 + ui ; z instruments exist

IV regression advantages for endogenous y2 :


→ β̂1,OLS is a biased and inconsistent estimator
(asymptotic errors)
→ β̂1,IV is a biased and consistent estimator (increased
sample size (n) lowers estimator bias and s.e.)

IVR disadvantages (price for the IVR):


s.e.(β̂1,IV ) > s.e.(β̂1,OLS )
β̂1,IV is biased, even if y2 is actually exogenous
β̂1,OLS is unbiased for exogenous regressors
(potentially, pending other G-M conditions).
IVR diagnostic tests: introduction

LRM: yi1 = β0 + β1 yi2 + β2 xi1 + ui ; z instruments exist

Is the regressor $y_2$ endogenous, i.e. $\mathrm{corr}(y_2, u) \neq 0$?
Is it meaningful to use IVR (considering IVR's “price”)?
Durbin-Wu-Hausman endogeneity test
Are the instruments actually helpful
(weakly or strongly correlated with endogenous regressors)?
Weak instruments test
Are the instruments really exogenous, i.e. $\mathrm{corr}(z_j, u) = 0$?
Sargan test (only applicable in case of over-identification)
Different types & specifications for IV-tests exist, often focusing on
the distribution of the difference between IVR and OLS estimators
(β̂IV − β̂OLS ) under the corresponding H0 .
Durbin-Wu-Hausman endogeneity test

$y_{i1} = \beta_0 + \beta_1 y_{i2} + \beta_2 x_{i1} + u_i$ | $z_{i1}$

DWH test motivation:
If $z_1$ is a proper instrument (uncorrelated with u), then $y_2$ is
endogenous (correlated with u) if and only if $\varepsilon$ (the error from the reduced
form equation) is correlated with u.

$y_2$ is endogenous ⇔ $\mathrm{corr}(y_2, u) \neq 0$
Reduced form: $y_2 = \mathrm{l.f.}(x_1, z_1) + \varepsilon$ ⇒ $y_2 = \hat{y}_2 + \hat{\varepsilon}$ (l.f. = linear function)
$\mathrm{corr}(y_2, u) \neq 0 \wedge \mathrm{corr}(\hat{y}_2, u) = 0 \Rightarrow \mathrm{corr}(\hat{\varepsilon}, u) \neq 0$
$y_1$ is always correlated with u.
Hence, $\hat{\varepsilon}$ is significant in the auxiliary regression
$y_{i1} = \beta_0 + \beta_1 y_{i2} + \beta_2 x_{i1} + \delta \hat{\varepsilon}_i + u_i$,
if $y_2$ is an endogenous regressor.
The IV(s) being uncorrelated with u is an essential condition for
the DWH test to “work”.

Note: other variants of the DWH test exist...


Durbin-Wu-Hausman endogeneity test
Structural equation:

yi1 = β0 + β1 yi2 + β2 xi1 + ui ; IVs: z1 and z2 (1)

Reduced form for y2 :

yi2 = π0 + π1 zi1 + π2 zi2 + π3 xi1 + εi (2)

$H_0$: $y_2$ is exogenous ↔ $\hat{\varepsilon}$ is not significant when added to equation (1)
$H_1$: $y_2$ is endogenous → OLS is not consistent for estimation of (1), use IVR (2SLS).
Testing algorithm:
1 Estimate equation (2) and save the residuals $\hat{\varepsilon}$.
2 Add the residuals $\hat{\varepsilon}$ into equation (1) and estimate using OLS
(use HC inference).
3 $H_0$ is rejected if $\hat{\varepsilon}$ in the modified equation (1) is
statistically significant (t-test).
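
The three steps of the testing algorithm can be sketched on simulated data. This is a simplified illustration under stated assumptions: the data-generating process is made up, `ols_with_se` is a hypothetical pure-Python helper, and it computes classical (homoscedastic) standard errors rather than the HC inference recommended on the slide. The 1.96 cut-off is the usual large-sample 5 % two-sided critical value.

```python
import random

def ols_with_se(X, y):
    """OLS coefficients, classical standard errors and residuals."""
    n, k = len(X), len(X[0])
    # Augment X'X with the identity so Gauss-Jordan yields (X'X)^{-1}.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, resid

rng = random.Random(1)
n = 2000
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
a = [rng.gauss(0, 1) for _ in range(n)]   # shock shared by y2 and u
y2 = [0.5 * p + 0.5 * q + 0.3 * x + s + rng.gauss(0, 1)
      for p, q, x, s in zip(z1, z2, x1, a)]
y1 = [1.0 + 2.0 * e + 0.5 * x + s + rng.gauss(0, 1)
      for e, x, s in zip(y2, x1, a)]

# Step 1: estimate the reduced form (2) and save the residuals
_, _, eps_hat = ols_with_se(
    [[1.0, p, q, x] for p, q, x in zip(z1, z2, x1)], y2)

# Step 2: add the residuals to the structural equation (1), estimate by OLS
b, se, _ = ols_with_se(
    [[1.0, e, x, r] for e, x, r in zip(y2, x1, eps_hat)], y1)

# Step 3: t-test on the coefficient of eps_hat
t_eps = b[3] / se[3]
reject_exogeneity = abs(t_eps) > 1.96
print(f"t-statistic on eps_hat: {t_eps:.2f}, reject H0: {reject_exogeneity}")
```

Because $y_2$ is endogenous by construction here, the t-statistic should be far from zero and $H_0$ should be rejected.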
Weak instruments

Motivation for Weak instruments and Sargan tests:


SLRM: $y_{i1} = \beta_0 + \beta_1 y_{i2} + u_i$; instrument z exists

IVR is consistent if $\mathrm{corr}(z, y_2) \neq 0$ and $\mathrm{corr}(z, u) = 0$.

If we allow for (weak) correlation between z and u, the
asymptotic error of the IV estimator is:
$\mathrm{plim}(\hat{\beta}_{1,IV}) = \beta_1 + \frac{\mathrm{corr}(z, u)}{\mathrm{corr}(z, y_2)} \cdot \frac{\sigma_u}{\sigma_{y_2}}$

If $\mathrm{corr}(z, y_2)$ is too weak (too close to zero in absolute value),
OLS may be better than IV. The asymptotic bias for OLS (LRM
with endogenous $y_2$):
$\mathrm{plim}(\hat{\beta}_{1,OLS}) = \beta_1 + \mathrm{corr}(y_2, u) \cdot \frac{\sigma_u}{\sigma_{y_2}}$

Rule of thumb: if $|\mathrm{corr}(z, y_2)| < |\mathrm{corr}(y_2, u)|$, do not use IVR.


Weak instruments

Structural equation:

y1 = β0 + β1 y2 + β2 x1 + · · · + βk+1 xk + u; IVs: z1 , z2 , . . . , zm

The reduced form for y2 :

y2 = π0 + π1 x1 + π2 x2 + · · · + πk xk + θ1 z1 + · · · + θm zm + ε

H0 : θ1 = θ2 = · · · = θ m = 0
interpretation: “instruments are weak”.
H1 : ¬ H0

Testing for weak instruments:


Use F -test (heteroscedasticity-robust).
Note: multiple testing approaches exist.
Sargan test (over-identification only)
Structural equation:

yi1 = β0 + β1 yi2 + β2 xi1 + ui ; IVs: z1 , z2 , . . . (3)

H0 : all IVs are uncorrelated with u


H1 : at least one instrument is endogenous

Testing algorithm:
1 Estimate equation (3) using IVR and save the residuals $\hat{u}$.
2 Use OLS to estimate the auxiliary regression $\hat{u} \leftarrow f(x, z)$ and
save its $R_a^2$.
3 Under $H_0$: $nR_a^2 \sim \chi^2_q$, where
q = (number of IVs) - (number of endogenous regressors),
i.e. q is the number of over-identifying variables.
4 If the observed test statistic exceeds its critical value
(at a given significance level), we reject $H_0$.
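
A sketch of the four steps for one endogenous regressor and two instruments, so that q = 1. The simulated data, the parameter `gamma` (which lets the second instrument enter the error term) and the helpers `ols`, `fitted`, `sargan_stat` and `simulate` are all assumptions made for illustration. The 5 % critical value of $\chi^2_1$, 3.841, is hard-coded.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def fitted(X, b):
    return [sum(bj * xj for bj, xj in zip(b, r)) for r in X]

def sargan_stat(y1, y2, Z):
    """n * R^2 from regressing the 2SLS residuals on all instruments Z."""
    n = len(y1)
    y2_hat = fitted(Z, ols(Z, y2))                   # 1st stage
    b = ols([[1.0, yh] for yh in y2_hat], y1)        # 2nd stage
    # Step 1: IV residuals use the actual y2, not the fitted values
    u_hat = [yi - b[0] - b[1] * xi for yi, xi in zip(y1, y2)]
    # Step 2: auxiliary regression of u_hat on the instruments
    aux_fit = fitted(Z, ols(Z, u_hat))
    mean_u = sum(u_hat) / n
    sst = sum((ui - mean_u) ** 2 for ui in u_hat)
    ssr = sum((ui - fi) ** 2 for ui, fi in zip(u_hat, aux_fit))
    return n * (1.0 - ssr / sst)                     # Step 3: n * R_a^2

def simulate(gamma, seed, n=3000):
    """gamma != 0 makes instrument z2 correlated with the error term."""
    rng = random.Random(seed)
    y1, y2, Z = [], [], []
    for _ in range(n):
        z1, z2, a = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = z1 + z2 + a + rng.gauss(0, 1)
        u = a + rng.gauss(0, 1) + gamma * z2
        y1.append(1.0 + 2.0 * x + u)
        y2.append(x)
        Z.append([1.0, z1, z2])
    return y1, y2, Z

CHI2_1_CRIT_5PCT = 3.841
stat_valid = sargan_stat(*simulate(gamma=0.0, seed=3))
stat_invalid = sargan_stat(*simulate(gamma=0.5, seed=3))
# Step 4: compare with the critical value
print(f"valid IVs:   {stat_valid:.2f}  reject: {stat_valid > CHI2_1_CRIT_5PCT}")
print(f"invalid IVs: {stat_invalid:.2f}  reject: {stat_invalid > CHI2_1_CRIT_5PCT}")
```

With `gamma=0.5` the statistic should be far above the critical value. With valid instruments it behaves like a draw from $\chi^2_1$, so it will still exceed 3.841 in roughly 5 % of samples.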
IVR diagnostic tests: example
Wooldridge, bwght dataset
R code, {AER} package
Call:
ivreg(formula = lbwght ~ packs + male | faminc + motheduc + male,
    data = bwght)

Residuals:
     Min       1Q   Median       3Q      Max
-1.66291 -0.09793  0.01717  0.11616  0.82793

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.77419    0.01099 434.478  < 2e-16 ***
packs       -0.25584    0.07613  -3.361 0.000798 ***
male         0.02422    0.01048   2.311 0.021003 *

Diagnostic tests:
                  df1  df2 statistic p-value
Weak instruments    2 1383    38.732  <2e-16 ***
Wu-Hausman          1 1383     5.385  0.0205 *
Sargan              1   NA     4.476  0.0344 *
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1

Residual std. error: 0.195 on 1384 d.f.
Multiple R-Squared: -0.04371, Adj R-sqr: -0.04522
Wald test: 8.342 on 2 and 1384 DF, p-value: 0.0002504

Annotations:
Formula: the part before “|” lists the regressors explicitly included in the equation, the part after “|” lists the IVs.
Weak instruments: reject H0: IVs are weak.
Wu-Hausman: reject H0: packs is exogenous (IVR adequate).
Sargan: !! reject H0: all IVs are uncorrelated with u (!DWH assumptions!).
Simultaneous equation model (SEM)

SEM: outline

SEM: identification

Identification conditions

SEMs with more than two equations


SEM: introduction

Simultaneity is another important form of endogeneity

Simultaneity occurs if at least two variables are jointly


determined. A typical case is when observed outcomes are the
result of separate behavioral mechanisms that are coordinated
in an equilibrium.
Prototypical case: a system of demand and supply equations:
D(p) how high would demand be if the price was set to p?
S(p) how high would supply be if the price was set to p?

Both mechanisms have a ceteris paribus interpretation.


Observed quantity and price will be determined in
equilibrium, where D(p) = S(p).

Simultaneous equations systems can be estimated by 2SLS/IVR


. . . Identification conditions apply.
SEM examples

Example 1: Labor supply and demand in agriculture

$h_s = \alpha_1 w + \beta_1 z_1 + u_1$ (supply)
$h_d = \alpha_2 w + \beta_2 z_2 + u_2$ (demand)
Endogenous variables: $h_s$, $h_d$, w; exogenous variables: $z_1$, $z_2$;
$z_1$ and $u_1$: observed and unobserved supply shifter,
$z_2$ and $u_2$: observed and unobserved demand shifter.

We have n regions; the market sets the equilibrium price and quantity in each. We observe the equilibrium values only:

$h_{is} = h_{id} \Rightarrow (h_i, w_i)$


SEM examples

Example 1: Labor supply and demand in agriculture (cont.)

hi = α1 wi + β1 zi1 + ui1
hi = α2 wi + β2 zi2 + ui2

If we have the same exogenous variables in each equation,


we cannot identify (distinguish) equations.

We assume independence between errors in structural


equations & exogenous regressors.
SEM examples

Example 1: Labor supply and demand in agriculture (cont.)

If we estimate the structural equation by OLS, the estimators will be biased (so-called “simultaneity bias”).
$y_1 = \alpha_1 y_2 + \beta_1 z_1 + u_1$
$y_2 = \alpha_2 y_1 + \beta_2 z_2 + u_2$

$y_2$ depends on $u_1$
(substitute the RHS of the 1st equation for $y_1$ in the 2nd equation):
$\Rightarrow y_2 = \frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1} z_1 + \frac{\beta_2}{1 - \alpha_2 \alpha_1} z_2 + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1}$
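
The substitution step can be written out in full. The last line adds one assumption that is not stated on the slide, namely that $u_1$ and $u_2$ are uncorrelated, in order to show the source of the simultaneity bias explicitly ($\sigma_1^2$ denotes the variance of $u_1$).

```latex
\begin{align*}
y_2 &= \alpha_2 (\alpha_1 y_2 + \beta_1 z_1 + u_1) + \beta_2 z_2 + u_2 \\
(1 - \alpha_2 \alpha_1)\, y_2 &= \alpha_2 \beta_1 z_1 + \beta_2 z_2 + \alpha_2 u_1 + u_2 \\
y_2 &= \frac{\alpha_2 \beta_1}{1 - \alpha_2 \alpha_1} z_1
     + \frac{\beta_2}{1 - \alpha_2 \alpha_1} z_2
     + \frac{\alpha_2 u_1 + u_2}{1 - \alpha_2 \alpha_1},
     \qquad \alpha_2 \alpha_1 \neq 1 \\
\mathrm{cov}(y_2, u_1) &= \frac{\alpha_2 \sigma_1^2}{1 - \alpha_2 \alpha_1} \neq 0
     \quad \text{if } \alpha_2 \neq 0 \text{ and } \mathrm{cov}(u_1, u_2) = 0
\end{align*}
```

Since $y_2$ is a regressor in the first structural equation and is correlated with its error $u_1$, OLS on that equation is inconsistent.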
Structural and reduced form equations, 2SLS method
Structural equations (example)
y1 = β10 + β11 y2 + β12 z1 + u1
y2 = β20 + β21 y1 + β22 z2 + u2

Reduced form equations


y1 = π10 + π11 z1 + π12 z2 + ε1 ⇒ ŷ1 by OLS
y2 = π20 + π21 z1 + π22 z2 + ε2 ⇒ ŷ2 by OLS

2SLS (a special case of IVR)


1st stage: Estimate reduced forms, get ŷ1 and ŷ2 .
2nd stage: Replace endogenous regressors in structural
equations by fitted values from 1st stage, estimate by OLS.
Estimation assumptions and “problems” involved:
. . . Identification of structural equations,
. . . Statistical inference in structural equations (2nd stage).
SEM examples

Example 2: (Structural equations)


Estimation of murder rates
murdpc = β10 + α1 polpc + β11 incpc + u1
polpc = β20 + α2 murdpc + β(other factors) + u2

1st equation describes the behaviour of murderers,


2nd one the behaviour of municipalities.
Each one has its ceteris paribus interpretation.

For the municipality policy, the 1st equation is interesting:


what is the impact of exogenous increase of police force on
the murder rate?
However, the number of police officers is not exogenous
(simultaneity problem).
SEM examples

SEM equation properties (for each equation):


Variables with proper ceteris paribus interpretation
Structural equations describe process from different
perspectives
Labor market: employees vs. employers
Criminality: authorities vs. “criminals”

Counter example: households’ saving and housing expenditures:


housing = β10 + β11 saving + β12 income + · · · + u1
saving = β20 + β21 housing + β22 income + · · · + u2

Both equations model household behavior


Both endogenous variables chosen by the same agent
Cannot reasonably change income and hold saving fixed (first
equation)
SEM identification

Example 3: (Identification)
Identification problem in a SEM

Example: Supply and demand for milk


Supply of milk: q = α1 p + β1 z1 + u1
Demand for milk: q = α2 p + u2

Supply of milk cannot be consistently estimated because we


do not have (at least) one exogenous variable “available” to
be used as instrument for p in the supply equation.

Demand for milk can be consistently estimated because we


can use exogenous variable z1 as instrument for p in the
demand equation.
SEM identification

Illustration
Identification conditions
Identification conditions for a sample 2-equation SEM
(individual i subscripts omitted)

y1 = β10 + α1 y2 + β11 z11 + β12 z12 + · · · + β1k z1k + u1


y2 = β20 + α2 y1 + β21 z21 + β22 z22 + · · · + β2k z2k + u2

Order condition (necessary): the 1st equation can be identified only
if at least one exogenous variable z is excluded from the 1st
equation (yet present in the SEM).
Rank condition (necessary and sufficient): 1st equation is
identified if and only if the second equation includes at
least one exogenous variable excluded from the first
equation with a nonzero coefficient, so that it actually
appears in the reduced form.
For the second equation, the conditions are analogous.
Some estimation approaches allow for identification
through IVs not explicitly included in the SEM.
Examples

Example 4: (Identification)
Labor supply of married working women

Supply (workers):

$hours = \alpha_1 \log(wage) + \beta_{10} + \beta_{11} educ + \beta_{12} age + \beta_{13} kidslt6 + \beta_{14} nwifeinc + u_1$

Demand (enterprises):

$\log(wage) = \alpha_2 hours + \beta_{20} + \beta_{21} educ + \beta_{22} exper + \beta_{23} exper^2 + u_2$

Order condition is fulfilled in both equations.


Examples

Example 4: (Identification)
Labor supply of married working women (cont.)

Identification of the first equation (supply): for the rank condition, at least one of the population coefficients $\beta_{22}$, $\beta_{23}$ (in the second equation) must be non-zero, so that $exper$, $exper^2$ (or both) can be used in the reduced form.

To evaluate the rank condition for the supply equation, we estimate the reduced form for log(wage) and test whether we can reject the null hypothesis that the coefficients for both $exper$ and $exper^2$ are zero.
If $H_0$ is rejected, the rank condition is fulfilled.

We would evaluate the rank condition for the demand equation analogously.
Estimation

We can consistently estimate identified equations with the


2SLS method.

In the 1st stage, we regress each endogenous variable on all


exogenous variables (“reduced forms”).

In the 2nd stage, we replace the endogenous regressors in the structural equations with their predictions from the 1st stage and estimate by OLS.

The reduced form can be always estimated (by OLS).

In the 2nd stage, we cannot estimate unidentified structural


equations.

With some additional assumptions, we can use a more


efficient estimation method than 2SLS: 3SLS.
Systems with more than two equations

Example 5: Keynesian macroeconomic model

Ct = β0 + β1 (Yt − Tt ) + β2 rt + ut1
It = γ0 + γ1 rt + ut2
Yt ≡ Ct + It + Gt

Endogenous: Ct , It , Yt Exogenous: Tt , Gt , rt
Order condition for identification is the same as for
two-equation systems, rank condition is more complicated.
Complex models based on macroeconomic time series are
sometimes used. Problems with these models: the series are
usually not weakly dependent, and it is difficult to find enough
exogenous variables as instruments. It is questionable whether any
macroeconomic variables are exogenous at all.
Identification in SEMs with more than two equations

yi = Xi β + ui is the i-th equation of a SEM.


K - number of exogenous/predetermined variables in the SEM,
$K_i$ - number of exogenous/predetermined variables in the i-th equation,
$G_i$ - number of endogenous variables in the i-th equation.

Order condition for the i-th equation:


necessary, not sufficient condition for identification

K − Ki ≥ Gi − 1

The condition evaluates as:

$K - K_i = G_i - 1$: equation i is just-identified,
$K - K_i > G_i - 1$: equation i is over-identified,
$K - K_i < G_i - 1$: equation i is not identified;
structural equation i cannot be estimated by 2SLS/IVR.
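
The counting rule can be sketched as a small function and applied to the milk supply and demand example from the identification slides (one exogenous variable $z_1$ in the system, no intercepts counted). The function name `order_condition` is a hypothetical helper. It checks only the necessary order condition, so a result other than "not identified" still has to be confirmed by the rank condition.

```python
def order_condition(K, K_i, G_i):
    """Classify equation i by the order condition K - K_i >= G_i - 1.

    K   : exogenous/predetermined variables in the whole SEM
    K_i : exogenous/predetermined variables in equation i
    G_i : endogenous variables in equation i
    """
    excluded = K - K_i   # exogenous variables excluded from equation i
    needed = G_i - 1     # endogenous regressors on the RHS of equation i
    if excluded < needed:
        return "not identified"
    if excluded == needed:
        return "just-identified"
    return "over-identified"

# Milk example: supply q = a1*p + b1*z1 + u1, demand q = a2*p + u2
supply = order_condition(K=1, K_i=1, G_i=2)
demand = order_condition(K=1, K_i=0, G_i=2)
print("supply:", supply)
print("demand:", demand)

# Labor supply of married women: 6 exogenous variables in the system
labor_supply = order_condition(K=6, K_i=4, G_i=2)
labor_demand = order_condition(K=6, K_i=3, G_i=2)
print("labor supply:", labor_supply)
print("labor demand:", labor_demand)
```

The results agree with the slides: the milk supply equation fails the order condition, the milk demand equation is just-identified, and both labor market equations satisfy it.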
Identification in SEMs with more than two equations
Rank condition: based on matrix algebra & IV estimator
Consider IVR for an identified i-th equation of SEM
yi = Xi β + ui
$X_i$ is an (n×k) matrix; it includes the intercept column and all
endogenous regressors of the i-th equation,
$\hat{X}_i$ is an (n×k) matrix; it includes the intercept column.
Exogenous regressors are repeated from $X_i$, endogenous ones are
projected onto the column space of Z, an (n×l) matrix of all
exogenous variables in the SEM.

Single equation (limited information) estimator for each i-th equation:

$\hat{\beta}_{IVR} = \hat{\beta}_{2SLS,i} = (\hat{X}_i'X_i)^{-1}\hat{X}_i'y_i$

GMM – moment equations can be used


Identification in SEMs with more than two equations

Rank condition: based on matrix algebra & IV estimator (cont.)

$\hat{\beta}_{IVR} = (\hat{X}_i'X_i)^{-1}\hat{X}_i'y_i$

Order condition: The necessary condition for the i-th
equation to be identified is that the number of columns
(exogenous variables of SEM) in Z should be no less than
the number of columns (explanatory variables) in $X_i$.

Rank condition: The necessary and sufficient condition
for identification of the i-th equation is that $\hat{X}_i$ has full
column rank, equal to the number of columns of $X_i$.
. . . this ensures the existence of $(\hat{X}_i'X_i)^{-1}$.
Identification in SEMs with more than two equations

Identification: recap & final remarks

Reduced form equations can always be estimated.


Structural equations can be estimated (IV/2SLS)
only if identified, i.e. if the rank condition is met.
With SW, checking the rank condition for $(\hat{X}_i'X_i)^{-1}$ is easy
for finite datasets.
Asymptotic identification may be “tricky”:
because some columns in $X_i$ are endogenous,
$\mathrm{plim}\, n^{-1}\hat{X}_i'X_i$
depends on the parameters of the DGP.
. . . see Davidson-MacKinnon (2009) Econometric theory and methods
