
Introduction to Instrumental Variables Methods
Dismas Alex
The Institute of Finance Management
Introduction
Motivation
• Suppose that we extend the model to include covariates/control variables (a sketch of such a model follows below).
• What would be the problem with this simple model?
• Omitted variable bias: T is then endogenous (this is true for many of the variables we use).
• We need to think about the best way to actually mitigate the issue.
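A minimal sketch of such a model, in illustrative notation not taken from the slide (y_i the outcome, T_i the treatment of interest, x_i the added controls, u_i the error term):

$$ y_i = \beta_0 + \tau T_i + \mathbf{x}_i'\boldsymbol{\gamma} + u_i $$

Omitted variable bias arises when something left in u_i (e.g. ability or motivation) is correlated with T_i.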
Motivation
• The problem of omitted variable bias, or unobserved heterogeneity, can be quite extensive.
• Often, important personal variables cannot be observed.
• The unobservables are correlated with the explanatory variable of interest, T.
• Thus T is endogenous.
The consequence of an endogenous T
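For the simple no-covariates case y_i = β0 + τ T_i + u_i (a minimal sketch, not taken from the slide), endogeneity means Cov(T_i, u_i) ≠ 0, and OLS is then biased and inconsistent:

$$ \operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(T_i, u_i)}{\operatorname{Var}(T_i)} \neq \tau. $$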

What are the solutions to OVB and unobserved heterogeneity?
• Ignore the problem – biased and inconsistent estimates of the coefficients.
• Find a suitable proxy variable for the unobserved variable, e.g. an IQ test for ability.
• Assume that the unobserved variable does not change over time and obtain panel data (a sketch follows below):
– Fixed effects, or
– First-differencing models
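A minimal sketch of the panel-data idea, in illustrative notation (a_i the time-invariant unobservable):

$$ y_{it} = \tau T_{it} + a_i + u_{it} \quad\Longrightarrow\quad \Delta y_{it} = \tau \Delta T_{it} + \Delta u_{it}, $$

so first-differencing (or the within/fixed-effects transformation) removes a_i, and with it this source of bias.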
Example 1: The Case of Job Training and Earnings
• Suppose we want to measure the impact of job training on earnings. We observe data on earnings for people who have and have not completed job training.

• We compare two groups: those who got trained and those who
didn’t.

• Want to infer the causal effect of job training on earnings

• What if people who are more “motivated” are more likely to get training and, on average, earn more than less “motivated” people?
‒ Difference between average earnings across the trained
and untrained confounds the effects of motivation and
training
‒ Omitted variables bias: Would like to control for unobserved
(and unobservable?) motivation
Example 1: The Case of Job Training and Earnings
• In this scenario, "motivation" acts as a potential confounding variable, as it influences both whether someone receives job training and their final earnings.
• This can indeed bias the observed relationship between training and earnings, making it difficult to infer a causal effect.
• Selection bias:
– If motivated individuals are more likely to pursue training, simply
comparing the earnings of those who trained vs. those who
didn't will be misleading.
– The trained group might have inherently higher earning potential
due to their motivation, not necessarily the training itself.
How to address this?
• Randomized controlled trials (RCTs):
– The gold standard for causal inference! If you randomly assign
individuals to receive training or not, any differences in earnings can be
attributed to the training, not pre-existing differences like motivation.

• Control variables:
– You can incorporate variables related to motivation (e.g., education level,
prior work experience) into your analysis. This statistically "controls" for
their influence, allowing you to isolate the effect of training while
accounting for motivation differences.

• Instrumental variables (IVs):


– Find a variable that influences the decision to train but not earnings
directly. This "instrument" can help identify the true causal effect of
training by separating it from the confounding influence of motivation.
Instrumental variables (IVs)
• A solution to the endogeneity problem is to find an instrumental variable (IV); the conditions an instrument must satisfy are sketched below.
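As a minimal sketch (notation assumed here, not taken from the slides), for the simple model y_i = β0 + τ T_i + u_i an instrument z_i must satisfy two conditions:

$$ \text{Relevance: } \operatorname{Cov}(z_i, T_i) \neq 0, \qquad \text{Exclusion/exogeneity: } \operatorname{Cov}(z_i, u_i) = 0. $$

In the just-identified, no-covariates case the IV estimator is

$$ \hat{\tau}_{IV} = \frac{\widehat{\operatorname{Cov}}(z, y)}{\widehat{\operatorname{Cov}}(z, T)}. $$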
IV Estimation in Multiple Regression
Two-Stage Least Squares (2SLS) Estimation
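A minimal Stata sketch of the two stages, using the job-training variable names from the example that follows (train_hat is a name introduced here for illustration):

* Stage 1: regress the endogenous treatment on the instrument and the exogenous controls
regress train offer x1-x13
predict train_hat, xb

* Stage 2: regress the outcome on the first-stage fitted values and the same controls
regress earnings train_hat x1-x13

* Caveat: standard errors from this manual second stage are not valid;
* in practice use ivregress 2sls (or ivreg), as in the results below.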
Example: Job Training
• OLS Results (from Stata):

regress earnings train x1-x13 , robust

Linear regression Number of obs = 5102


F( 14, 5087) = 38.35
Prob > F = 0.0000
R-squared = 0.0909
Root MSE = 18659

------------------------------------------------------------------------------
| Robust
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
train | 3753.362 536.3832 7.00 0.000 2701.82 4804.904
.
.
.

If the intuition about the source of endogeneity is correct, this should be an overestimate of the effect of training.
Example: Job Training
• First-Stage Results (from Stata):

regress train offer x1-x13 , robust

Linear regression Number of obs = 5102


F( 14, 5087) = 390.75
Prob > F = 0.0000
R-squared = 0.3570
Root MSE = .39619

------------------------------------------------------------------------------
| Robust
train | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
offer | .6088885 .0087478 69.60 0.000 .591739 .6260379
.
.
.

Strong evidence that E[z_i x_i] ≠ 0


Example: Job Training
• Reduced-Form Results (from Stata):

regress earnings offer x1-x13 , robust

Linear regression Number of obs = 5102


F( 14, 5087) = 34.19
Prob > F = 0.0000
R-squared = 0.0826
Root MSE = 18744

------------------------------------------------------------------------------
| Robust
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
offer | 970.043 545.6179 1.78 0.075 -99.60296 2039.689
.
.
.

Moderate evidence of a non-zero treatment effect (maintaining exclusion restriction).
Example: Job Training
• IV Results (from Stata):
Note: Some software reports R-squared after IV regression. This object is NOT meaningful and should not be used.
ivreg earnings (train = offer) x1-x13 , robust

Instrumental variables (2SLS) regression Number of obs = 5102


F( 14, 5087) = 34.38
Prob > F = 0.0000
R-squared = 0.0879
Root MSE = 18689

------------------------------------------------------------------------------
| Robust
earnings | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
train | 1593.137 894.7528 1.78 0.075 -160.9632 3347.238
.
.
.

Moderate evidence of a positive treatment effect (maintaining exclusion restriction). Substantially attenuated relative to OLS, consistent with intuition.
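As a check (assuming the standard just-identified 2SLS identity, with the same controls x1-x13 in every equation), the IV estimate equals the reduced-form coefficient on offer divided by the first-stage coefficient:

$$ \hat{\tau}_{2SLS} = \frac{970.043}{0.6088885} \approx 1593.1, $$

matching (up to rounding) the coefficient on train above.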
Example: Returns to Schooling
• OLS Results (from Stata):

xi: reg lwage educ i.yob i.sob , robust


i.yob _Iyob_30-39 (naturally coded; _Iyob_30 omitted)
i.sob _Isob_1-56 (naturally coded; _Isob_1 omitted)

Linear regression Number of obs = 329509


F( 60,329448) = 649.29
Prob > F = 0.0000
R-squared = 0.1288
Root MSE = .63366

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .067339 .0003883 173.40 0.000 .0665778 .0681001
.
.
.
If the intuition about the source of endogeneity is correct, this should be an overestimate of the effect of schooling.
Example: Returns to Schooling
• First-Stage Results (from Stata):
xi: regress educ i.qob i.sob i.yob , robust
Linear regression Number of obs = 329509
F( 62,329446) = 292.87
Prob > F = 0.0000
R-squared = 0.0572
Root MSE = 3.1863

------------------------------------------------------------------------------
| Robust
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iqob_2 | .0455652 .015977 2.85 0.004 .0142508 .0768797
_Iqob_3 | .1060082 .0155308 6.83 0.000 .0755683 .136448
_Iqob_4 | .1525798 .0157993 9.66 0.000 .1216137 .1835459

.
.
.
testparm _Iqob*

( 1) _Iqob_2 = 0
( 2) _Iqob_3 = 0
( 3) _Iqob_4 = 0
First-stage F-statistic:
F( 3,329446) = 36.06
Prob > F = 0.0000
(Well above the common rule-of-thumb threshold of 10, so the instruments do not appear weak.)
Example: Returns to Schooling
• Reduced-Form Results (from Stata):
xi: regress lwage i.qob i.sob i.yob , robust

Linear regression Number of obs = 329509


F( 62,329446) = 147.83
Prob > F = 0.0000
R-squared = 0.0290
Root MSE = .66899

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iqob_2 | .0028362 .0033445 0.85 0.396 -.0037188 .0093912
_Iqob_3 | .0141472 .0032519 4.35 0.000 .0077736 .0205207
_Iqob_4 | .0144615 .0033236 4.35 0.000 .0079472 .0209757
.
.

testparm _Iqob*

( 1) _Iqob_2 = 0
( 2) _Iqob_3 = 0
( 3) _Iqob_4 = 0

F( 3,329446) = 10.43
Prob > F = 0.0000
Example: Returns to Schooling
• 2SLS Results (from Stata):
xi: ivregress 2sls lwage (educ = i.qob) i.yob i.sob , robust

Instrumental variables (2SLS) regression Number of obs = 329509


Wald chi2(60) = 9996.12
Prob > chi2 = 0.0000
R-squared = 0.0929
Root MSE = .64652

------------------------------------------------------------------------------
| Robust
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1076937 .0195571 5.51 0.000 .0693624 .146025
.
.
.

Bigger than OLS?
