ECON W3412: Introduction to Econometrics

Chapter 12. Instrumental Variables Regression (Part II)

Simon Lee

Department of Economics
Columbia University

Spring 2018

Outline

The General IV Regression Model

TSLS with a Single Endogenous Regressor

Checking Instrument Validity

The General IV Regression Model I

• So far we have considered IV regression with a single endogenous regressor ($X$) and a single instrument ($Z$).
• We need to extend this to:
  o multiple endogenous regressors ($X_1, \dots, X_k$).
  o multiple included exogenous variables ($W_1, \dots, W_r$), or control variables, which need to be included for the usual omitted-variable (OV) reason.
  o multiple instrumental variables ($Z_1, \dots, Z_m$). More (relevant) instruments can produce a smaller variance of TSLS: the $R^2$ of the first stage increases, so you have more variation in $\hat{X}$.

The General IV Regression Model II

• In general, a parameter is said to be identified if different values of the parameter produce different distributions of the data.
• In IV regression, whether the coefficients are identified depends on the relation between the number of instruments ($m$) and the number of endogenous regressors ($k$).
• Intuitively, if there are fewer instruments than endogenous regressors, we can't estimate $\beta_1, \dots, \beta_k$.

The General IV Regression Model III

• The coefficients $\beta_1, \dots, \beta_k$ are said to be:
  o exactly identified if $m = k$. There are just enough instruments to estimate $\beta_1, \dots, \beta_k$.
  o overidentified if $m > k$. There are more than enough instruments to estimate $\beta_1, \dots, \beta_k$. If so, you can test whether the instruments are valid (a test of the "overidentifying restrictions"); we'll return to this later.
  o underidentified if $m < k$. There are too few instruments to estimate $\beta_1, \dots, \beta_k$. If so, you need to get more instruments!
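The counting rule above can be written as a small helper. This is only an illustrative sketch: the function name `identification_status` is hypothetical, and it checks the count of instruments only (the instruments must still be relevant and exogenous for the estimates to be meaningful).

```python
def identification_status(m: int, k: int) -> str:
    """Compare the number of instruments (m) with the number of
    endogenous regressors (k), following the three cases in the list above."""
    if m < 0 or k < 1:
        raise ValueError("need m >= 0 instruments and k >= 1 endogenous regressors")
    if m == k:
        return "exactly identified"
    if m > k:
        return "overidentified"
    return "underidentified"


# One endogenous regressor with one, two, or zero instruments.
print(identification_status(m=1, k=1))
print(identification_status(m=2, k=1))
print(identification_status(m=0, k=1))
```

Only the overidentified case leaves room for the test of overidentifying restrictions discussed later.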

TSLS with a Single Endogenous Regressor I

• Consider

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, \quad i = 1, \dots, n.$$

• $m$ instruments: $Z_{1i}, \dots, Z_{mi}$
• First stage
  o Regress $X_1$ on all the exogenous regressors: regress $X_1$ on $W_1, \dots, W_r, Z_1, \dots, Z_m$, and an intercept, by OLS
  o Compute predicted values $\hat{X}_{1i}$, $i = 1, \dots, n$
• Second stage
  o Regress $Y$ on $\hat{X}_1, W_1, \dots, W_r$, and an intercept, by OLS
  o The coefficients from this second-stage regression are the TSLS estimators, but the SEs it reports are wrong, because they ignore that $\hat{X}_1$ was itself estimated in the first stage
• To get correct SEs, do this in a single step in your regression software
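The two stages can be traced by hand on simulated data. The sketch below is a simplified illustration, not the slides' own code: it assumes one endogenous regressor, one instrument, and no W variables, and all coefficients in the data-generating process (true slope 0.5, the 0.8 loading that makes X endogenous) are made-up values.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols_slope_intercept(x, y):
    """OLS of y on x and an intercept; returns (slope, intercept)."""
    slope = cov(x, y) / cov(x, x)
    return slope, mean(y) - slope * mean(x)


rng = random.Random(0)
n = 5000
true_beta1 = 0.5  # assumed value for the simulation

z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
v = [rng.gauss(0, 1) for _ in range(n)]   # first-stage error
e = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * ei for vi, ei in zip(v, e)]  # correlated with x through v
x = [zi + vi for zi, vi in zip(z, v)]              # endogenous regressor
y = [2.0 + true_beta1 * xi + ui for xi, ui in zip(x, u)]

# First stage: regress X on Z and an intercept, then form predicted values.
pi1, pi0 = ols_slope_intercept(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: regress Y on the predicted values.
beta1_tsls, beta0_tsls = ols_slope_intercept(x_hat, y)

# Comparisons: plain OLS, and the one-instrument IV formula cov(Z,Y)/cov(Z,X).
beta1_ols, _ = ols_slope_intercept(x, y)
beta1_ratio = cov(z, y) / cov(z, x)

print("OLS slope :", beta1_ols)
print("TSLS slope:", beta1_tsls)
print("IV ratio  :", beta1_ratio)
```

In this setup OLS is pulled away from 0.5 by the correlation between X and u, while the two-stage estimate stays close to it and coincides with the ratio formula. The sketch does not compute standard errors, which is the part that the single-command route in regression software handles correctly.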

TSLS with a Single Endogenous Regressor II

• Required assumptions
  o Instrument exogeneity: $\mathrm{corr}(Z_{1i}, u_i) = 0, \dots, \mathrm{corr}(Z_{mi}, u_i) = 0$.
  o Instrument relevance: with one endogenous regressor $X_{1i}$, the condition is that (a) at least one instrument must enter the population counterpart of the first-stage regression, and (b) the W's are not perfectly multicollinear.

Example: Demand for cigarettes, ctd.
Suppose income is exogenous (this is plausible; why?), and we also want to estimate the income elasticity:

$$\ln(Q_i^{\text{cigarettes}}) = \beta_0 + \beta_1 \ln(P_i^{\text{cigarettes}}) + \beta_2 \ln(\text{Income}_i) + u_i$$

We actually have two instruments:

$Z_{1i} = \text{general sales tax}_i$
$Z_{2i} = \text{cigarette-specific tax}_i$
• Endogenous variable: $\ln(P_i^{\text{cigarettes}})$ ("one X")
• Included exogenous variable: $\ln(\text{Income}_i)$ ("one W")
• Instruments (excluded exogenous variables): general sales tax, cigarette-specific tax ("two Zs")
• Is $\beta_1$ over-, under-, or exactly identified?
SW Ch. 12 12/30
Example: Cigarette demand, one instrument
IV: rtaxso = real overall sales tax in state
. ivreg lpackpc lperinc (lravgprs = rtaxso) if year==1995, r;
(Y = lpackpc, W = lperinc, X = lravgprs, Z = rtaxso)

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 8.19
Prob > F = 0.0009
R-squared = 0.4189
Root MSE = .18957

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.143375 .3723025 -3.07 0.004 -1.893231 -.3935191
lperinc | .214515 .3117467 0.69 0.495 -.413375 .842405
_cons | 9.430658 1.259392 7.49 0.000 6.894112 11.9672
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso
------------------------------------------------------------------------------
(STATA lists ALL the exogenous regressors as "instruments", slightly different terminology than we have been using.)
• Running IV as a single command yields the correct SEs
• Use , r for heteroskedasticity-robust SEs
Example: Cigarette demand, two instruments
. ivreg lpackpc lperinc (lravgprs = rtaxso rtax) if year==1995, r;
(Y = lpackpc, W = lperinc, X = lravgprs, Z1 = rtaxso, Z2 = rtax)

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 16.17
Prob > F = 0.0000
R-squared = 0.4294
Root MSE = .18786

------------------------------------------------------------------------------
| Robust
lpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lravgprs | -1.277424 .2496099 -5.12 0.000 -1.780164 -.7746837
lperinc | .2804045 .2538894 1.10 0.275 -.230955 .7917641
_cons | 9.894955 .9592169 10.32 0.000 7.962993 11.82692
------------------------------------------------------------------------------
Instrumented: lravgprs
Instruments: lperinc rtaxso rtax
------------------------------------------------------------------------------
(STATA lists ALL the exogenous regressors as "instruments", slightly different terminology than we have been using.)

TSLS estimates, Z = sales tax (m = 1)

$$\widehat{\ln(Q_i^{\text{cigarettes}})} = \underset{(1.26)}{9.43} - \underset{(0.37)}{1.14}\,\widehat{\ln(P_i^{\text{cigarettes}})} + \underset{(0.31)}{0.21}\,\ln(\text{Income}_i)$$

TSLS estimates, Z = sales tax & cig-only tax (m = 2)

$$\widehat{\ln(Q_i^{\text{cigarettes}})} = \underset{(0.96)}{9.89} - \underset{(0.25)}{1.28}\,\widehat{\ln(P_i^{\text{cigarettes}})} + \underset{(0.25)}{0.28}\,\ln(\text{Income}_i)$$

(Standard errors in parentheses.)

• Smaller SEs for m = 2. Using 2 instruments gives more information: more "as-if random variation".
• Low income elasticity (not a luxury good); income elasticity not statistically significantly different from 0
• Surprisingly high price elasticity

Checking Instrument Validity I

• We will focus on a single included endogenous regressor:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i, \quad i = 1, \dots, n.$$

• First-stage regression (with its own error term $v_i$, distinct from $u_i$):

$$X_{1i} = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i.$$

• The instruments are relevant if at least one of $\pi_1, \dots, \pi_m$ is nonzero.
• The instruments are said to be weak if all of $\pi_1, \dots, \pi_m$ are either zero or nearly zero.
• Weak instruments explain very little of the variation in $X$, beyond that explained by the W's.

Checking Instrument Validity II

• If instruments are weak, the sampling distribution of TSLS and its t-statistic are not (at all) normal, even with n large.
• To measure the strength of instruments in practice, consider the first-stage regression (one X): regress $X$ on $Z_1, \dots, Z_m, W_1, \dots, W_r$.
• The first-stage F-statistic tests the hypothesis that $Z_1, \dots, Z_m$ do not enter the first-stage regression.
• Weak instruments imply a small first-stage F-statistic.

[Figure: an example of the sampling distribution of the TSLS t-statistic with weak instruments. Dark line = irrelevant instruments; dashed light line = strong instruments.]
Checking Instrument Validity III

• Rule of thumb: if the first-stage F-statistic is less than 10, then the set of instruments is weak.
• If so, the TSLS estimator will be biased, and statistical inferences (standard errors, hypothesis tests, confidence intervals) can be misleading.
• Comparing the first-stage F to 10 tests for whether the bias of TSLS, relative to OLS, is less than 10%. If F is smaller than 10, the relative bias exceeds 10%; that is, TSLS can have substantial bias.
• What to do if you have weak instruments?
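The 10% figure in the rule of thumb can be motivated by an approximation from the Stock and Watson textbook that these slides follow. The formula below is quoted from memory as a sketch, not taken from the slides: here $\beta_1^{OLS}$ denotes the probability limit of the OLS estimator and $E(F)$ the expected value of the first-stage F-statistic.

```latex
E\big(\hat{\beta}_1^{TSLS}\big) - \beta_1 \;\approx\; \frac{\beta_1^{OLS} - \beta_1}{E(F) - 1},
\qquad\text{so with } E(F) = 10:\quad
\frac{E\big(\hat{\beta}_1^{TSLS}\big) - \beta_1}{\beta_1^{OLS} - \beta_1} \approx \frac{1}{9} \approx 0.11 .
```

Under this approximation, a first-stage F near 10 corresponds to a TSLS bias of roughly one tenth of the OLS bias, and smaller values of F correspond to a larger relative bias.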

Checking Instrument Validity IV

• How to check instrument exogeneity?
  o It is not testable if the model is exactly identified (e.g. one endogenous regressor with one instrument).
  o If there are more instruments than endogenous regressors, it is possible to test, partially, for instrument exogeneity.

Checking Instrument Validity V

• Testing overidentifying restrictions
  o Intuition is as follows.
  o Consider the simplest case:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

  o Suppose there are two valid instruments: $Z_{1i}, Z_{2i}$.
  o Then you could compute two separate TSLS estimates.
  o Intuitively, if these 2 TSLS estimates are very different from each other, then something must be wrong: one or the other (or both) of the instruments must be invalid.
  o The J-test of overidentifying restrictions makes this comparison in a statistically precise way.
  o This can only be done if #Z's > #X's (overidentified).
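The intuition of comparing two separate estimates can be seen on simulated data. The sketch below is an illustration under made-up assumptions, not the J-test itself: `z1` is exogenous by construction, `z2` is deliberately allowed to enter the error term, and the helper name `iv_slope` and all coefficient values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def iv_slope(z, x, y):
    """IV estimate of the slope using a single instrument z."""
    return cov(z, y) / cov(z, x)


rng = random.Random(1)
n = 5000
true_beta1 = 0.5  # assumed value for the simulation

z1 = [rng.gauss(0, 1) for _ in range(n)]  # valid instrument
z2 = [rng.gauss(0, 1) for _ in range(n)]  # invalid: it also enters u below
v = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]

u = [0.8 * vi + 0.6 * ei + 0.5 * z2i for vi, ei, z2i in zip(v, e, z2)]
x = [a + b + c for a, b, c in zip(z1, z2, v)]
y = [1.0 + true_beta1 * xi + ui for xi, ui in zip(x, u)]

estimate_with_z1 = iv_slope(z1, x, y)
estimate_with_z2 = iv_slope(z2, x, y)

print("IV estimate using z1 only:", estimate_with_z1)
print("IV estimate using z2 only:", estimate_with_z2)
```

The two estimates disagree by a wide margin, which is the signal the J-test formalizes. In real data, unlike in this simulation, the disagreement alone does not reveal which instrument is the invalid one.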

Application to the Demand for Cigarettes
(SW Section 12.4)

Why are we interested in knowing the elasticity of demand for cigarettes?
• Theory of optimal taxation. The optimal tax rate is inversely related to the price elasticity: the smaller the elasticity, the less quantity is affected by a given percentage tax, so the smaller is the change in consumption and deadweight loss.
• Externalities of smoking – role for government
intervention to discourage smoking
o Health effects of second-hand smoke? (non-monetary)
o monetary externalities
Panel data set
• Annual cigarette consumption, average prices paid by end
consumer (including tax), personal income, and tax rates
(cigarette-specific and general statewide sales tax rates)
• 48 continental US states, 1985-1995

Estimation strategy
• We need to use IV estimation methods to handle the
simultaneous causality bias that arises from the interaction
of supply and demand.
• State binary indicators = W variables (control variables)
which control for unobserved state-level characteristics
that affect the demand for cigarettes and the tax rate, as
long as those characteristics don’t vary over time.

Fixed-effects model of cigarette demand
$$\ln(Q_{it}^{\text{cigarettes}}) = \alpha_i + \beta_1 \ln(P_{it}^{\text{cigarettes}}) + \beta_2 \ln(\text{Income}_{it}) + u_{it}$$

• i = 1,…,48, t = 1985, 1986,…,1995
• $\mathrm{corr}(\ln(P_{it}^{\text{cigarettes}}), u_{it})$ is plausibly nonzero because of supply/demand interactions
• $\alpha_i$ reflects unobserved omitted factors that vary across states but not over time, e.g. attitude towards smoking
• Estimation strategy:
  o Use panel data regression methods to eliminate $\alpha_i$
  o Use TSLS to handle simultaneous causality bias
  o Use T = 2 with 1985 to 1995 changes ("changes" method): look at long-term response, not short-term dynamics (short- v. long-run elasticities)
The “changes” method (when T=2)
• One way to model long-term effects is to consider 10-year
changes, between 1985 and 1995
• Rewrite the regression in “changes” form:
$$\ln(Q_{i1995}^{\text{cigarettes}}) - \ln(Q_{i1985}^{\text{cigarettes}}) = \beta_1\big[\ln(P_{i1995}^{\text{cigarettes}}) - \ln(P_{i1985}^{\text{cigarettes}})\big] + \beta_2\big[\ln(\text{Income}_{i1995}) - \ln(\text{Income}_{i1985})\big] + (u_{i1995} - u_{i1985})$$

• Create "10-year change" variables, for example:
10-year change in log price = $\ln(P_{i1995}) - \ln(P_{i1985})$
• Then estimate the demand elasticity by TSLS using 10-year
changes in the instrumental variables
• This is equivalent to using the original data and including
the state binary indicators (“W” variables) in the regression
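The step from the fixed-effects model to the "changes" form can be spelled out by writing the model for each of the two years and subtracting. This is a sketch of that algebra; the "cigarettes" superscripts are dropped for brevity, and $\alpha_i$ denotes the state effect that does not vary over time.

```latex
\begin{align*}
\ln(Q_{i1995}) &= \alpha_i + \beta_1 \ln(P_{i1995}) + \beta_2 \ln(\text{Income}_{i1995}) + u_{i1995} \\
\ln(Q_{i1985}) &= \alpha_i + \beta_1 \ln(P_{i1985}) + \beta_2 \ln(\text{Income}_{i1985}) + u_{i1985} \\[4pt]
\ln(Q_{i1995}) - \ln(Q_{i1985}) &= (\alpha_i - \alpha_i)
  + \beta_1\big[\ln(P_{i1995}) - \ln(P_{i1985})\big] \\
  &\quad + \beta_2\big[\ln(\text{Income}_{i1995}) - \ln(\text{Income}_{i1985})\big]
  + (u_{i1995} - u_{i1985})
\end{align*}
```

Because $\alpha_i$ appears in both years with the same value, it cancels in the difference, which is why the state-level characteristics no longer need to be observed.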
STATA: Cigarette demand

First create "10-year change" variables

10-year change in log price = $\ln(P_{it}) - \ln(P_{i,t-10}) = \ln(P_{it}/P_{i,t-10})$

. gen dlpackpc = log(packpc/packpc[_n-10]);
(_n-10 is the 10-yr lagged value)


. gen dlavgprs = log(avgprs/avgprs[_n-10]);
. gen dlperinc = log(perinc/perinc[_n-10]);
. gen drtaxs = rtaxs-rtaxs[_n-10];
. gen drtax = rtax-rtax[_n-10];
. gen drtaxso = rtaxso-rtaxso[_n-10];

Use TSLS to estimate the demand elasticity by using the
“10-year changes” specification
. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso) , r;
(Y = dlpackpc, W = dlperinc, X = dlavgprs, Z = drtaxso)

IV (2SLS) regression with robust standard errors Number of obs = 48


F( 2, 45) = 12.31
Prob > F = 0.0001
R-squared = 0.5499
Root MSE = .09092

------------------------------------------------------------------------------
| Robust
dlpackpc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dlavgprs | -.9380143 .2075022 -4.52 0.000 -1.355945 -.5200834
dlperinc | .5259693 .3394942 1.55 0.128 -.1578071 1.209746
_cons | .2085492 .1302294 1.60 0.116 -.0537463 .4708446
------------------------------------------------------------------------------
Instrumented: dlavgprs
Instruments: dlperinc drtaxso
------------------------------------------------------------------------------
NOTE:
- All the variables – Y, X, W, and Z’s – are in 10-year changes
- Estimated elasticity = –.94 (SE = .21) – surprisingly elastic!
- Income elasticity small, not statistically different from zero
- Must check whether the instrument is relevant…
Check instrument relevance: compute first-stage F
. reg dlavgprs drtaxso dlperinc;

Source | SS df MS Number of obs = 48


-------------+------------------------------ F( 2, 45) = 23.86
Model | .191437213 2 .095718606 Prob > F = 0.0000
Residual | .180549989 45 .004012222 R-squared = 0.5146
-------------+------------------------------ Adj R-squared = 0.4931
Total | .371987202 47 .007914621 Root MSE = .06334
------------------------------------------------------------------------------
dlavgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drtaxso | .0254611 .0037374 6.81 0.000 .0179337 .0329885
dlperinc | -.2241037 .2119405 -1.06 0.296 -.6509738 .2027664
_cons | .5321948 .031249 17.03 0.000 .4692561 .5951334
------------------------------------------------------------------------------

. test drtaxso;
 ( 1) drtaxso = 0
 F( 1, 45) = 46.41
 Prob > F = 0.0000

We didn't need to run "test" here! With m = 1 instrument, the F-stat is the square of the t-stat: 6.81 × 6.81 ≈ 46.4 (46.41 when the unrounded t-stat is used).

First-stage F = 46.4 > 10, so the instrument is not weak
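The link between the t-statistic and the F-statistic for a single instrument can be checked from the coefficient and standard error reported for drtaxso in the first-stage regression above. This is a small arithmetic sketch; the variable names are illustrative.

```python
# Coefficient and standard error on drtaxso from the first-stage output.
coef_drtaxso = 0.0254611
se_drtaxso = 0.0037374

t_stat = coef_drtaxso / se_drtaxso
first_stage_f = t_stat ** 2  # with one restriction, F equals t squared

print("t-statistic  :", round(t_stat, 2))
print("first-stage F:", round(first_stage_f, 2))
print("exceeds the rule-of-thumb threshold of 10:", first_stage_f > 10)
```

Squaring the rounded value 6.81 gives about 46.38, so the 46.41 in the Stata output reflects the unrounded t-statistic.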


Can we check instrument exogeneity? No: m = k
Cigarette demand, 10 year changes – 2 IVs
. ivregress 2sls dlpackpc dlperinc (dlavgprs = drtaxso drtax) , vce(r);
(Y = dlpackpc, W = dlperinc, X = dlavgprs, Z1 = drtaxso, Z2 = drtax)

Instrumental variables (2SLS) regression Number of obs = 48


Wald chi2(2) = 45.44
Prob > chi2 = 0.0000
R-squared = 0.5466
Root MSE = .08836

------------------------------------------------------------------------------
| Robust
dlpackpc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dlavgprs | -1.202403 .1906896 -6.31 0.000 -1.576148 -.8286588
dlperinc | .4620299 .2995177 1.54 0.123 -.1250139 1.049074
_cons | .3665388 .1180414 3.11 0.002 .1351819 .5978957
------------------------------------------------------------------------------
Instrumented: dlavgprs
Instruments: dlperinc drtaxso drtax
------------------------------------------------------------------------------

drtaxso = general sales tax only
drtax = cigarette-specific tax only
Estimated elasticity is -1.2, even more elastic than using general sales tax only!

First-stage F – both instruments
. reg dlavgprs drtaxso drtax dlperinc ;
(X = dlavgprs, Z1 = drtaxso, Z2 = drtax, W = dlperinc)

Source | SS df MS Number of obs = 48


-------------+------------------------------ F( 3, 44) = 51.36
Model | .289359873 3 .096453291 Prob > F = 0.0000
Residual | .082627329 44 .001877894 R-squared = 0.7779
-------------+------------------------------ Adj R-squared = 0.7627
Total | .371987202 47 .007914621 Root MSE = .04333

------------------------------------------------------------------------------
dlavgprs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drtaxso | .013457 .0030498 4.41 0.000 .0073106 .0196033
drtax | .0075734 .0010488 7.22 0.000 .0054597 .009687
dlperinc | -.0289943 .1474923 -0.20 0.845 -.3262455 .2682568
_cons | .4919733 .0220923 22.27 0.000 .4474492 .5364973
------------------------------------------------------------------------------

. test drtaxso drtax;

 ( 1) drtaxso = 0
 ( 2) drtax = 0
 F( 2, 44) = 75.65
 Prob > F = 0.0000

75.65 > 10, so the instruments aren't weak.
With m > k, we can test the overidentifying restrictions…
Test the overidentifying restrictions
. predict e, resid;
(Computes the residuals from the most recently estimated regression, the previous TSLS regression)
. reg e drtaxso drtax dlperinc;
(Regress e on Z's and W's)

Source | SS df MS Number of obs = 48


-------------+------------------------------ F( 3, 44) = 1.64
Model | .037769176 3 .012589725 Prob > F = 0.1929
Residual | .336952289 44 .007658007 R-squared = 0.1008
-------------+------------------------------ Adj R-squared = 0.0395
Total | .374721465 47 .007972797 Root MSE = .08751

------------------------------------------------------------------------------
e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drtaxso | .0127669 .0061587 2.07 0.044 .000355 .0251789
drtax | -.0038077 .0021179 -1.80 0.079 -.008076 .0004607
dlperinc | -.0934062 .2978459 -0.31 0.755 -.6936752 .5068627
_cons | .002939 .0446131 0.07 0.948 -.0869728 .0928509
------------------------------------------------------------------------------
. test drtaxso drtax;
 ( 1) drtaxso = 0
 ( 2) drtax = 0
 F( 2, 44) = 2.47
 Prob > F = 0.0966

Compute the J-statistic, which is m × F, where F tests whether the coefficients on the instruments are zero: J = 2 × 2.47 ≈ 4.93 (using the unrounded F).
** WARNING: the Prob > F reported above uses the wrong d.f. **

The correct degrees of freedom for the J-statistic is $m - k$:
• J = mF, where F = the F-statistic testing the coefficients on $Z_{1i}, \dots, Z_{mi}$ in a regression of the TSLS residuals against $Z_{1i}, \dots, Z_{mi}, W_{1i}, \dots, W_{ri}$.
• Under the null hypothesis that all the instruments are exogenous, J has a chi-squared distribution with $m - k$ degrees of freedom
• Here, J = 4.93, distributed chi-squared with d.f. = 1; the 5% critical value is 3.84, so reject at the 5% significance level.
• In STATA:
. dis "J-stat = " r(df)*r(F) " p-value = " chiprob(r(df)-1,r(df)*r(F));
J-stat = 4.9319853 p-value = .02636401

J = 2 × 2.47 ≈ 4.93 (using the unrounded F); p-value from the chi-squared(1) distribution
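The p-value for the J-statistic can be reproduced without statistical software, because a chi-squared variable with one degree of freedom is the square of a standard normal. The sketch below uses the J value reported by Stata above; the helper name `chi2_upper_tail_df1` is hypothetical and is valid only for one degree of freedom.

```python
import math


def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-squared with 1 d.f. > x), using P(|N(0,1)| > sqrt(x)) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))


m, k = 2, 1                 # two instruments, one endogenous regressor
j_stat = 4.9319853          # J = m * F as reported in the Stata output
degrees_of_freedom = m - k  # 1, not the 2 used by the F-test p-value

p_value = chi2_upper_tail_df1(j_stat)
critical_value_5pct = 3.84  # 5% critical value of chi-squared(1), from the text

print("implied F (J / m):", j_stat / m)
print("p-value          :", p_value)
print("reject at 5%     :", j_stat > critical_value_5pct)
```

The result agrees with the p-value of about 0.026 shown in the Stata output, in contrast to the 0.0966 printed by the F-test, which uses the wrong degrees of freedom.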

Now what???

Tabular summary of these results:

How should we interpret the J-test rejection?
• J-test rejects the null hypothesis that both the instruments
are exogenous
• This means that either rtaxso is endogenous, or rtax is
endogenous, or both!
• The J-test doesn’t tell us which!! You must exercise
judgment…
• Why might rtax (cig-only tax) be endogenous?
o Political forces: history of smoking or lots of smokers ⇒ political pressure for low cigarette taxes
o If so, cig-only tax is endogenous
• This reasoning doesn’t apply to general sales tax
• ⇒ use just one instrument, the general sales tax

The Demand for Cigarettes: Summary of Empirical Results

• Use the estimated elasticity based on TSLS with the general sales tax as the only instrument:
Elasticity = -.94, SE = .21
• This elasticity is surprisingly large (not inelastic) – a 1%
increase in prices reduces cigarette sales by nearly 1%.
This is much more elastic than conventional wisdom in the
health economics literature.
• This is a long-run (ten-year change) elasticity. What would
you expect a short-run (one-year change) elasticity to be –
more or less elastic?
