Instrumental variable in regression
Variables Methods
Dismas Alex
The Institute of Finance Management
Introduction
Motivation
• Suppose that we extend the model to
include covariates/control variables:
• We compare two groups: those who got trained and those who
didn’t.
• What if people who are more “motivated” are more likely to get
training and, on average, earn more than less “motivated” people?
‒ The difference in average earnings between the trained
and the untrained confounds the effects of motivation and
training.
‒ Omitted variable bias: we would like to control for unobserved
(and perhaps unobservable) motivation.
Example 1: The Case of Job Training and Earnings
• In this scenario, "motivation" acts as a potential
confounding variable, as it influences both whether
someone receives job training and their final earnings.
• This can bias the observed relationship between
training and earnings, making it difficult to infer a causal
effect.
• Selection bias:
– If motivated individuals are more likely to pursue training, simply
comparing the earnings of those who trained vs. those who
didn't will be misleading.
– The trained group might have inherently higher earning potential
due to their motivation, not necessarily the training itself.
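The selection problem described above can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: the true training effect (1000), the weight of motivation in earnings (3000), and the selection rule are all hypothetical numbers chosen for illustration, not estimates from any data set.

```python
import random

random.seed(0)

TRUE_EFFECT = 1000.0   # hypothetical causal effect of training on earnings
N = 50_000

def simulate_self_selection(n):
    """Return (trained, earnings) pairs where motivated people select into training."""
    rows = []
    for _ in range(n):
        motivation = random.gauss(0.0, 1.0)   # unobserved by the analyst
        # Assumed selection rule: higher motivation makes training more likely.
        trained = 1 if motivation + random.gauss(0.0, 1.0) > 0 else 0
        earnings = (20_000 + TRUE_EFFECT * trained
                    + 3_000 * motivation + random.gauss(0.0, 2_000))
        rows.append((trained, earnings))
    return rows

def difference_in_means(rows):
    """Naive comparison: mean earnings of trained minus mean earnings of untrained."""
    treated = [y for d, y in rows if d == 1]
    control = [y for d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

naive_gap = difference_in_means(simulate_self_selection(N))
print(f"true effect: {TRUE_EFFECT:.0f}, naive gap: {naive_gap:.0f}")
```

In this setup the naive gap is far larger than the true effect, because it also picks up the earnings advantage of the more motivated people who chose to train.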
How to address this?
• Randomized controlled trials (RCTs):
– The gold standard for causal inference! If you randomly assign
individuals to receive training or not, any differences in earnings can be
attributed to the training, not pre-existing differences like motivation.
• Control variables:
– You can include observed variables related to motivation (e.g., education level,
prior work experience) in the regression. This statistically "controls" for
their influence, but it isolates the effect of training only to the extent
that these proxies capture the differences in motivation.
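Why randomization works can be sketched with a simulation. The numbers below (a true effect of 1000, motivation adding 3000 per standard deviation) are hypothetical; the point is only that a coin-flip assignment is independent of motivation, so the simple difference in means should land close to the true effect.

```python
import random

random.seed(1)

TRUE_EFFECT = 1000.0   # hypothetical causal effect of training
N = 50_000

treated, control = [], []
for _ in range(N):
    motivation = random.gauss(0.0, 1.0)      # still unobserved
    trained = random.random() < 0.5          # coin flip, unrelated to motivation
    earnings = (20_000 + TRUE_EFFECT * trained
                + 3_000 * motivation + random.gauss(0.0, 2_000))
    (treated if trained else control).append(earnings)

# Under random assignment both groups have the same average motivation,
# so the gap in mean earnings estimates the training effect.
rct_gap = sum(treated) / len(treated) - sum(control) / len(control)
print(f"true effect: {TRUE_EFFECT:.0f}, RCT difference in means: {rct_gap:.0f}")
```

Motivation still affects earnings here, but it no longer differs systematically between the two groups, so it adds noise rather than bias.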
• OLS Results (from Stata): earnings regressed on train
------------------------------------------------------------------------------
             |               Robust
    earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   3753.362   536.3832     7.00   0.000      2701.82    4804.904
.
.
.
• First-Stage Results (from Stata): train regressed on offer
------------------------------------------------------------------------------
             |               Robust
       train |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       offer |   .6088885   .0087478    69.60   0.000      .591739    .6260379
.
.
.
• Reduced-Form Results (from Stata): earnings regressed on offer
------------------------------------------------------------------------------
             |               Robust
    earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       offer |    970.043   545.6179     1.78   0.075    -99.60296    2039.689
.
.
.
• IV Results (from Stata): earnings regressed on train, instrumented by offer
------------------------------------------------------------------------------
             |               Robust
    earnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   1593.137   894.7528     1.78   0.075    -160.9632    3347.238
.
.
.
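The four training tables above fit together arithmetically. Reading them as OLS (earnings on train), first stage (train on offer), reduced form (earnings on offer), and IV (earnings on train, with offer as the instrument), the IV coefficient with a single instrument should equal the reduced-form coefficient divided by the first-stage coefficient. The sketch below checks this using only the coefficients printed in the tables; the variable names and the reading of offer as the instrument are assumptions made for illustration.

```python
# Coefficients copied from the regression tables above.
reduced_form = 970.043     # earnings on offer
first_stage = 0.6088885    # train on offer
ols_estimate = 3753.362    # earnings on train (OLS)
iv_reported = 1593.137     # earnings on train (IV)

# With one instrument, the IV (Wald) estimate is reduced form / first stage.
wald = reduced_form / first_stage

print(f"reduced form / first stage = {wald:.3f}")
print(f"reported IV coefficient    = {iv_reported:.3f}")
print(f"OLS minus IV               = {ols_estimate - wald:.3f}")
```

The ratio reproduces the reported IV coefficient to three decimals. The IV estimate is less than half the OLS estimate, which is the direction one would expect if motivated people select into training.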
Example: Returns to Schooling
• OLS Results (from Stata): lwage regressed on educ
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .067339   .0003883   173.40   0.000     .0665778    .0681001
.
.
.
If the intuition about the source of endogeneity is correct, this OLS coefficient
should overestimate the effect of schooling.
Example: Returns to Schooling
• First-Stage Results (from Stata):
xi: regress educ i.qob i.sob i.yob , robust
Linear regression                               Number of obs     =    329509
                                                F( 62,329446)     =    292.87
                                                Prob > F          =    0.0000
                                                R-squared         =    0.0572
                                                Root MSE          =    3.1863
------------------------------------------------------------------------------
             |               Robust
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iqob_2 |   .0455652    .015977     2.85   0.004     .0142508    .0768797
     _Iqob_3 |   .1060082   .0155308     6.83   0.000     .0755683     .136448
     _Iqob_4 |   .1525798   .0157993     9.66   0.000     .1216137    .1835459
.
.
.
testparm _Iqob*
 ( 1)  _Iqob_2 = 0
 ( 2)  _Iqob_3 = 0
 ( 3)  _Iqob_4 = 0
First-stage F-statistic (joint test of the quarter-of-birth instruments):
       F(  3,329446) =   36.06
            Prob > F =    0.0000
Example: Returns to Schooling
• Reduced-Form Results (from Stata):
xi: regress lwage i.qob i.sob i.yob , robust
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iqob_2 |   .0028362   .0033445     0.85   0.396    -.0037188    .0093912
     _Iqob_3 |   .0141472   .0032519     4.35   0.000     .0077736    .0205207
     _Iqob_4 |   .0144615   .0033236     4.35   0.000     .0079472    .0209757
.
.
testparm _Iqob*
 ( 1)  _Iqob_2 = 0
 ( 2)  _Iqob_3 = 0
 ( 3)  _Iqob_4 = 0
       F(  3,329446) =   10.43
            Prob > F =    0.0000
Example: Returns to Schooling
• 2SLS Results (from Stata):
xi: ivregress 2sls lwage (educ = i.qob) i.yob i.sob , robust
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1076937   .0195571     5.51   0.000     .0693624     .146025
.
.
.
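As a rough sanity check on the 2SLS coefficient, each quarter-of-birth dummy can be treated as if it were the only instrument, dividing its reduced-form coefficient by its first-stage coefficient. With three instruments and additional controls, 2SLS is not a simple ratio, so this is only an informal cross-check and not how Stata computes the estimate. The numbers are copied from the first-stage, reduced-form, OLS, and 2SLS tables; the dictionary keys are illustrative names.

```python
# Coefficients copied from the first-stage and reduced-form tables.
first_stage = {"qob_2": 0.0455652, "qob_3": 0.1060082, "qob_4": 0.1525798}
reduced_form = {"qob_2": 0.0028362, "qob_3": 0.0141472, "qob_4": 0.0144615}
tsls_reported = 0.1076937   # educ coefficient from ivregress 2sls
ols_reported = 0.067339     # educ coefficient from OLS

# Single-instrument ratio (reduced form / first stage) for each quarter dummy.
ratios = {q: reduced_form[q] / first_stage[q] for q in first_stage}

for quarter, ratio in ratios.items():
    print(f"{quarter}: {ratio:.4f}")

in_range = min(ratios.values()) < tsls_reported < max(ratios.values())
print(f"2SLS estimate lies between the single-instrument ratios: {in_range}")
print(f"2SLS minus OLS: {tsls_reported - ols_reported:.4f}")
```

The three ratios are roughly 0.06, 0.13, and 0.09, and the reported 2SLS coefficient of about 0.108 falls inside that range. It is also larger than the OLS coefficient, which runs against the expectation that OLS overestimates the return to schooling, although the 2SLS standard error is much wider than the OLS one.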