Endogeneity and Instrumental Variables
Presented by Justin Balthrop
September 28, 2015
Take the simple case of ordinary least squares regression with a single
explanatory variable:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
Before we worry about external validity and the big picture implications of
our results, we need to satisfy internal validity.
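The first-stage regression of X on the instrument Z gives estimates of $\pi_0$ and $\pi_1$, which are used to get predicted X values:
$\hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$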
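The two stages can be sketched by hand for one endogenous regressor and one instrument. This is a minimal illustration on simulated data, not the presenter's code: the helper names (`simple_ols`, `two_stage_least_squares`, `simulate`) and the data-generating values (true slope 2, first-stage slope 1) are assumptions made for the example.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    # Stage 1: regress X on Z and form the predicted X values.
    pi0, pi1 = simple_ols(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    # Stage 2: regress Y on the predicted X values.
    return simple_ols(x_hat, y)


def simulate(n=5000, seed=0):
    """Hypothetical data: the true slope is 2 and X is correlated with u."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    # The 0.5 * u term is what makes X endogenous in this toy example.
    x = [zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z


y, x, z = simulate()
print("OLS slope: ", round(simple_ols(x, y)[1], 3))
print("2SLS slope:", round(two_stage_least_squares(y, x, z)[1], 3))
```

In this setup the OLS slope should sit above the true value of 2 in large samples, while the 2SLS slope should sit close to it.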
Instrument validity
Why it works:
The second-stage standard errors need to be adjusted for the fact that the explanatory variables are estimated in the first stage
See Wooldridge for the math, Stata for the code: ivreg, robust
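A rough sketch of why the adjustment matters: running the second stage as a plain OLS computes residuals from the predicted X, whereas the corrected standard error uses residuals built from the actual X. The code below assumes homoskedastic errors, one regressor and one instrument; it does not reproduce the heteroskedasticity-robust correction. All function names and simulated values are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Return (intercept, slope, sum of squared deviations of x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / s_xx
    return mean_y - slope * mean_x, slope, s_xx


def tsls_slope_standard_errors(y, x, z):
    """Return (slope, naive_se, corrected_se) for the 2SLS slope."""
    n = len(y)
    pi0, pi1, _ = fit_line(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    b0, b1, s_hat = fit_line(x_hat, y)
    # Naive: residuals from the second-stage regression, built from x_hat.
    ssr_naive = sum((yi - b0 - b1 * xh) ** 2 for yi, xh in zip(y, x_hat))
    # Corrected: residuals built from the actual x.
    ssr_corrected = sum((yi - b0 - b1 * xi) ** 2 for yi, xi in zip(y, x))
    naive_se = math.sqrt(ssr_naive / (n - 2) / s_hat)
    corrected_se = math.sqrt(ssr_corrected / (n - 2) / s_hat)
    return b1, naive_se, corrected_se


def make_data(n=5000, seed=0):
    """Hypothetical data with true slope 2 and an endogenous X."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z


slope, naive_se, corrected_se = tsls_slope_standard_errors(*make_data())
print("slope:", round(slope, 3))
print("naive s.e.:", round(naive_se, 4), "corrected s.e.:", round(corrected_se, 4))
```

The two standard errors can differ in either direction depending on the data; in this particular simulation the naive one is expected to be noticeably larger.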
Other considerations:
Heteroskedasticity
Appropriate clustering
Instrument relevance: more relevant instruments give lower estimator variance and a higher R-squared in the first stage
With estimator:
The first-stage F-statistic tests the hypothesis that the instruments Z_i do not enter the first-stage regression
A small first-stage F-statistic (less than 10) indicates weak instruments
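With a single instrument, the first-stage F-statistic is the squared t-statistic on Z in the regression of X on Z. The sketch below computes it for a strong and a weak instrument on simulated data; the function names and the first-stage strengths (1.0 and 0.02) are assumptions chosen for illustration, and the standard error is the homoskedasticity-only version.

```python
import random


def first_stage_f(x, z):
    """F-statistic for H0: the coefficient on z in the first stage is zero."""
    n = len(x)
    mean_z, mean_x = sum(z) / n, sum(x) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    slope = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x)) / s_zz
    intercept = mean_x - slope * mean_z
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    var_slope = ssr / (n - 2) / s_zz
    # One restriction, so F equals the squared t-statistic.
    return slope ** 2 / var_slope


def simulate_first_stage(strength, n=1000, seed=1):
    """Hypothetical first stage: X = strength * Z + noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [strength * zi + rng.gauss(0, 1) for zi in z]
    return x, z


for label, strength in [("strong", 1.0), ("weak", 0.02)]:
    x, z = simulate_first_stage(strength)
    f_stat = first_stage_f(x, z)
    verdict = "looks relevant" if f_stat > 10 else "weak by the F < 10 rule of thumb"
    print(label, "instrument: F =", round(f_stat, 2), "->", verdict)
```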
Statistics: J-test
Step 1: Estimate the conditional expectation function using TSLS and both instruments
Step 5: Test the hypothesis that all coefficients on Z_i are zero, with J-statistic J = mF, where m is the number of instruments and F is the F-statistic for that hypothesis
If some instruments are exogenous and others endogenous, J-stat will be large, rejecting the
null that all instruments are exogenous
If the statistic ($nR^2$) exceeds the critical $\chi^2$ value, conclude that the instruments
are invalid.
In that case they are correlated with the error term and hence have some explanatory
power in the main equation.
Be very careful: The test assumes that at least one instrument is valid.
If none of the instruments fulfills the criterion Cov(z_i, u_i) = 0, then the test might
suggest that the instruments are valid, even when they are not
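The $nR^2$ form of the overidentification test can be sketched as follows for one endogenous regressor and two instruments: estimate by 2SLS, regress the 2SLS residuals on the instruments, and compare $nR^2$ with the $\chi^2$ critical value with (instruments minus endogenous regressors) = 1 degree of freedom, which is 3.841 at the 5% level. The helper names and the simulated design, in which the second instrument is made invalid by letting it enter the outcome equation directly, are assumptions for illustration.

```python
import random


def demean(a):
    m = sum(a) / len(a)
    return [ai - m for ai in a]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def fitted_on_two(y, x1, x2):
    """Demeaned fitted values from regressing y on a constant, x1 and x2."""
    y, x1, x2 = demean(y), demean(x1), demean(x2)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return [b1 * a + b2 * b for a, b in zip(x1, x2)]


def overid_statistic(y, x, z1, z2):
    """n * R^2 from regressing the 2SLS residuals on both instruments."""
    n = len(y)
    # Stage 1: predicted X from both instruments.
    x_hat = fitted_on_two(x, z1, z2)
    # Stage 2: slope of Y on predicted X.
    y_d, x_d = demean(y), demean(x)
    slope = dot(x_hat, y_d) / dot(x_hat, x_hat)
    # 2SLS residuals use the actual X; demeaning absorbs the intercept.
    resid = [yi - slope * xi for yi, xi in zip(y_d, x_d)]
    # Auxiliary regression of the residuals on the instruments.
    fit = fitted_on_two(resid, z1, z2)
    return n * dot(fit, fit) / dot(resid, resid)


def simulate_overid(n=2000, seed=2):
    """Hypothetical data: y_ok has valid instruments, y_bad has an invalid z2."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [a + b + 0.5 * ui + rng.gauss(0, 1) for a, b, ui in zip(z1, z2, u)]
    y_ok = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    y_bad = [yi + zi for yi, zi in zip(y_ok, z2)]  # z2 enters the error term
    return y_ok, y_bad, x, z1, z2


y_ok, y_bad, x, z1, z2 = simulate_overid()
for label, y in [("valid instruments", y_ok), ("z2 invalid", y_bad)]:
    stat = overid_statistic(y, x, z1, z2)
    print(label, ": nR^2 =", round(stat, 2), "reject at 5%:", stat > 3.841)
```

The caution above still applies: if both instruments were invalid in the same direction, the residuals could look unrelated to them and the statistic could stay small.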
Durbin-Wu-Hausman Test
If |DWH| > 1.96, then X is endogenous and IV is the preferred estimator despite its
inefficiency
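One common regression-based form of the Durbin-Wu-Hausman test is sketched below, assuming one regressor and one instrument: save the first-stage residuals, add them to the regression of Y on X, and look at the t-statistic on the residuals. Whether this matches the exact variant on the slide is an assumption; the function names and simulated values are hypothetical, and the standard error is the homoskedasticity-only version.

```python
import math
import random


def demean(a):
    m = sum(a) / len(a)
    return [ai - m for ai in a]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dwh_t_statistic(y, x, z):
    """t-statistic on the first-stage residuals in a regression of y on x and them."""
    n = len(y)
    y_d, x_d, z_d = demean(y), demean(x), demean(z)
    # First stage: residuals of X after projecting on Z.
    pi1 = dot(z_d, x_d) / dot(z_d, z_d)
    v_hat = [xi - pi1 * zi for xi, zi in zip(x_d, z_d)]
    # Augmented regression: Y on X and v_hat (intercept absorbed by demeaning).
    s11, s22, s12 = dot(x_d, x_d), dot(v_hat, v_hat), dot(x_d, v_hat)
    s1y, s2y = dot(x_d, y_d), dot(v_hat, y_d)
    det = s11 * s22 - s12 ** 2
    b_x = (s22 * s1y - s12 * s2y) / det
    b_v = (s11 * s2y - s12 * s1y) / det
    ssr = sum((yi - b_x * xi - b_v * vi) ** 2
              for yi, xi, vi in zip(y_d, x_d, v_hat))
    se_v = math.sqrt(ssr / (n - 3) * s11 / det)
    return b_v / se_v


def simulate_dwh(rho, n=2000, seed=3):
    """Hypothetical data; rho controls how strongly X depends on the error u."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rho * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, z


for label, rho in [("endogenous X", 0.5), ("exogenous X", 0.0)]:
    t_stat = dwh_t_statistic(*simulate_dwh(rho))
    print(label, ": t =", round(t_stat, 2), "|t| > 1.96:", abs(t_stat) > 1.96)
```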
The following example is taken from the University of Albany Center for Social and Demographic Analysis presentation on IV Estimation
The final 2SLS model interacted quarter of birth (QOB) with year of birth (30 instruments) and state of birth (150 instruments)
OLS: b = .0628 (s.e. = .0003)
2SLS: b = .0811 (s.e. = .0109)
Least squares estimate does not appear to be badly biased by omitted variables
But a replication effort identified some instructive pitfalls in this analysis
Small Cov(X,Z) introduces finite-sample bias, which will be exacerbated with the inclusion of many IVs
Even small Cov(Z,e) will cause inconsistency, and this will be exacerbated when Cov(X,Z) is small
QOB qualifies as a weak instrument that may be correlated with unobserved determinants of
wages (e.g., family income)
Even if the instrument is good, matters can be made far worse with IV as opposed to LS
Weak correlation between IV and endogenous regressor can pose severe finite-sample bias
And really large samples won't help, especially if there is even a weak correlation between the IV and the error
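The inconsistency points above follow from the standard large-sample limits for a single regressor X, a single instrument Z and error e. The numbers in the illustration that follows are hypothetical.

```latex
\operatorname{plim}\,\hat{\beta}_{OLS} = \beta + \frac{\operatorname{Cov}(X,e)}{\operatorname{Var}(X)},
\qquad
\operatorname{plim}\,\hat{\beta}_{IV} = \beta + \frac{\operatorname{Cov}(Z,e)}{\operatorname{Cov}(Z,X)}
```

For example, if Cov(Z,e) = 0.01 and Cov(Z,X) = 0.02, the IV estimate is off by 0.5 in the limit, however large the sample. A small denominator Cov(Z,X) magnifies even a tiny Cov(Z,e).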
Identification is achieved by having regressors that are uncorrelated with the product of
heteroskedastic errors
$Y_1 = X\beta_1 + Y_2\gamma_1 + \varepsilon_1$ (1)
$Y_2 = X\beta_2 + Y_1\gamma_2 + \varepsilon_2$ (2)