
Gov 2002: 10. Instrumental Variables
Matthew Blackwell
November 5, 2015
1. IV setup

2. IV with constant treatment effects

3. IV with heterogeneous treatment effects

4. IV extensions
1/ IV setup

Where are we? Where are we going?

• We saw how to identify and estimate effects under no unmeasured confounding and with repeated measurements.
• What if we have neither? Are we doomed?
• Not necessarily, if we can identify some exogenous source of variation that drives the treatment.
• Instrumental variables allow for unmeasured confounding in the treatment-outcome relationship.
• We use the unconfounded variation in the instrument to help identify treatment effects.
Basic IV setup with DAGs

[DAG: 𝑍 → 𝐷 → 𝑌, with an unmeasured 𝑈 affecting both 𝐷 and 𝑌; the exclusion restriction rules out any 𝑍 → 𝑌 path other than through 𝐷.]

• 𝑍 is the instrument, 𝐷 is the treatment, and 𝑈 is the unmeasured confounder.
• Exclusion restriction:
  ▶ no common causes of the instrument and the outcome
  ▶ no direct or indirect effect of the instrument on the outcome except through the treatment.
• First-stage relationship: 𝑍 affects 𝐷.
An IV is only as good as its assumptions

• Finding a believable instrument is incredibly difficult, and some people never believe any IV setup.
• When effects vary, the IV approach estimates a “local” ATE that is specific to the particular instrument.
IVs in the field

• Angrist (1990): draft lottery as an IV for military service (income as outcome)
• Acemoglu et al. (2001): settler mortality as an IV for institutional quality (GDP per capita as outcome)
• Levitt (1997): being an election year as an IV for police force size (crime as outcome)
• Kern & Hainmueller (2009): having West German TV reception in East Berlin as an instrument for West German TV watching (outcome is support for the East German regime)
• Nunn & Wantchekon (2011): historical distance of an ethnic group to the coast as an instrument for the slave raiding of that ethnic group (outcome is trust attitudes today)
• Acharya, Blackwell, Sen (2015): cotton suitability as an IV for proportion slave in 1860 (outcome is white attitudes today)
2/ IV with constant treatment effects
IV with constant effects

• Let’s write down a causal model for 𝑌𝑖 with constant effects and an unmeasured confounder, 𝑈𝑖:

  𝑌𝑖(𝑑, 𝑢) = 𝛼 + 𝜏𝑑 + 𝛾𝑢 + 𝜂𝑖

• If we combine this with a consistency assumption, we get the regression form:

  𝑌𝑖 = 𝛼 + 𝜏𝐷𝑖 + 𝛾𝑈𝑖 + 𝜂𝑖

• Here we assume that 𝔼[𝐷𝑖𝜂𝑖] = 0, so if we measured 𝑈𝑖, we would be able to estimate 𝜏.
• But Cov(𝛾𝑈𝑖 + 𝜂𝑖, 𝐷𝑖) ≠ 0 because 𝑈 is a common cause of 𝐷 and 𝑌.
The role of the instrument

• If we have an instrument, 𝑍𝑖, that satisfies the exclusion restriction, then

  Cov(𝛾𝑈𝑖 + 𝜂𝑖, 𝑍𝑖) = 0

• The instrument must be independent of 𝑈𝑖, and it has no correlation with 𝜂𝑖 because neither does the treatment. Then:

  Cov(𝑌𝑖, 𝑍𝑖) = Cov(𝛼 + 𝜏𝐷𝑖 + 𝛾𝑈𝑖 + 𝜂𝑖, 𝑍𝑖)
             = Cov(𝛼, 𝑍𝑖) + Cov(𝜏𝐷𝑖, 𝑍𝑖) + Cov(𝛾𝑈𝑖 + 𝜂𝑖, 𝑍𝑖)
             = 0 + 𝜏Cov(𝐷𝑖, 𝑍𝑖) + 0
IV estimator with constant effects

  𝑌𝑖 = 𝛼 + 𝜏𝐷𝑖 + 𝛾𝑈𝑖 + 𝜂𝑖

• With this in hand, we can formulate an expression for the average treatment effect here:

  𝜏 = Cov(𝑌𝑖, 𝑍𝑖) / Cov(𝐷𝑖, 𝑍𝑖) = [Cov(𝑌𝑖, 𝑍𝑖)/𝕍[𝑍𝑖]] / [Cov(𝐷𝑖, 𝑍𝑖)/𝕍[𝑍𝑖]]

• Reduced-form coefficient: Cov(𝑌𝑖, 𝑍𝑖)/𝕍[𝑍𝑖]
• First-stage coefficient: Cov(𝐷𝑖, 𝑍𝑖)/𝕍[𝑍𝑖]
Weak instruments

• Natural estimator:

  𝜏̂𝐼𝑉 = Ĉov(𝑌𝑖, 𝑍𝑖) / Ĉov(𝐷𝑖, 𝑍𝑖)

• What happens with a weak first stage? One can show that this estimator converges in probability to:

  𝜏 + Cov(𝑍𝑖, 𝑈𝑖) / Cov(𝑍𝑖, 𝐷𝑖)

• If Cov(𝑍𝑖, 𝐷𝑖) is small, then even very small violations of the exclusion restriction (Cov(𝑍𝑖, 𝑈𝑖) ≠ 0) can lead to large inconsistencies and finite-sample bias.
• It is important to convey the strength of the first stage via a 𝑡-test, or an 𝐹-test with multiple instruments.
Wald Estimator

• A binary instrument leads to the Wald estimator:

  𝜏 = Cov(𝑌𝑖, 𝑍𝑖) / Cov(𝐷𝑖, 𝑍𝑖) = (𝔼[𝑌𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖|𝑍𝑖 = 0]) / (𝔼[𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝐷𝑖|𝑍𝑖 = 0])

• Intuitively:

  (effect of instrument on outcome) / (effect of instrument on treatment)
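A small simulation can make the Wald logic concrete. The sketch below is not from the lecture: it uses made-up numbers, draws a confounder 𝑈, a randomized binary instrument, a treatment affected by both, and an outcome with a true constant effect of 2. Naive OLS is biased, while the Wald ratio (in either form) recovers the effect.

  ## Simulated example (illustrative only): Wald estimator vs. naive OLS
  set.seed(2138)
  n <- 100000
  u <- rnorm(n)                                  # unmeasured confounder
  z <- rbinom(n, 1, 0.5)                         # randomized binary instrument
  d <- rbinom(n, 1, plogis(-0.5 + 1.5 * z + u))  # treatment depends on Z and U
  y <- 1 + 2 * d + 1.5 * u + rnorm(n)            # true constant effect tau = 2

  ## Naive OLS is biased because U confounds D and Y
  coef(lm(y ~ d))["d"]

  ## Wald estimator: ratio of covariances
  cov(y, z) / cov(d, z)

  ## Equivalent form: ratio of differences in means
  (mean(y[z == 1]) - mean(y[z == 0])) /
    (mean(d[z == 1]) - mean(d[z == 0]))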
What about covariates?

• No covariates up until now. What if we have a set of covariates 𝑋𝑖 that we are also conditioning on?
• Let’s start with linear models for both the outcome and the treatment:

  𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜏𝐷𝑖 + 𝜀𝑖
  𝐷𝑖 = 𝑋𝑖′𝛼 + 𝛾𝑍𝑖 + 𝜈𝑖

• Now we assume that 𝑋𝑖 is exogenous along with 𝑍𝑖:

  𝔼[𝑍𝑖𝜈𝑖] = 0,  𝔼[𝑍𝑖𝜀𝑖] = 0
  𝔼[𝑋𝑖𝜈𝑖] = 0,  𝔼[𝑋𝑖𝜀𝑖] = 0

• …but 𝐷𝑖 is endogenous: 𝔼[𝐷𝑖𝜀𝑖] ≠ 0.
Getting the reduced form

• We can plug the treatment equation into the outcome equation:

  𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜏[𝑋𝑖′𝛼 + 𝛾𝑍𝑖 + 𝜈𝑖] + 𝜀𝑖
     = 𝑋𝑖′𝛽 + 𝜏[𝑋𝑖′𝛼 + 𝛾𝑍𝑖] + [𝜏𝜈𝑖 + 𝜀𝑖]
     = 𝑋𝑖′𝛽 + 𝜏[𝑋𝑖′𝛼 + 𝛾𝑍𝑖] + 𝜀∗𝑖
     = 𝑋𝑖′𝛽 + 𝜏𝔼[𝐷𝑖|𝑋𝑖, 𝑍𝑖] + 𝜀∗𝑖

• The value in the brackets is the population fitted value of the treatment, 𝔼[𝐷𝑖|𝑋𝑖, 𝑍𝑖].
• Because 𝑍𝑖 and 𝑋𝑖 are uncorrelated with 𝜈𝑖 and 𝜀𝑖, this fitted value is also uncorrelated with 𝜀∗𝑖.
• Thus, the population regression coefficient of 𝑌𝑖 on [𝑋𝑖′𝛼 + 𝛾𝑍𝑖] is the average treatment effect, 𝜏.
Two-stage least squares

• Estimate 𝛼̂ and 𝛾̂ from OLS and form fitted values:

  𝔼̂[𝐷𝑖|𝑋𝑖, 𝑍𝑖] = 𝐷̂𝑖 = 𝑋𝑖′𝛼̂ + 𝛾̂𝑍𝑖.

• Regress 𝑌𝑖 on 𝑋𝑖 and 𝐷̂𝑖. Adding and subtracting 𝜏𝐷̂𝑖 in the structural equation gives:

  𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜏𝐷̂𝑖 + [𝜀𝑖 + 𝜏(𝐷𝑖 − 𝐷̂𝑖)]

• Key question: is 𝐷̂𝑖 uncorrelated with the error?
• 𝐷̂𝑖 is just a function of 𝑋𝑖 and 𝑍𝑖, so it is uncorrelated with 𝜀𝑖.
• 𝐷̂𝑖 is also uncorrelated with (𝐷𝑖 − 𝐷̂𝑖): OLS fitted values are orthogonal to the OLS residuals by construction.
Two-stage least squares

• Heuristic procedure:
  1. Run a regression of the treatment on the covariates and the instrument.
  2. Construct fitted values of the treatment.
  3. Run a regression of the outcome on the covariates and the fitted values.
• Note that this isn’t how we actually estimate 2SLS, because the standard errors from the second regression are wrong.
• The computer wants to calculate the standard errors based on 𝜀∗𝑖:

  𝜀∗𝑖 = 𝑌𝑖 − 𝑋𝑖′𝛽 − 𝜏𝐷̂𝑖

• but what we really want are standard errors based on 𝜀𝑖:

  𝜀𝑖 = 𝑌𝑖 − 𝑋𝑖′𝛽 − 𝜏𝐷𝑖
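A minimal sketch of this two-step heuristic on simulated data (the data-generating numbers and the covariate x are made up, not from the slides): the manual version reproduces the 2SLS point estimate, but the second-stage standard errors are not the right ones. Packaged routines, e.g. ivreg in the AER package (assuming it is installed), handle the standard errors correctly.

  ## Heuristic two-step 2SLS on simulated data (illustrative only)
  set.seed(2138)
  n <- 5000
  x <- rnorm(n)                        # exogenous covariate
  u <- rnorm(n)                        # unmeasured confounder
  z <- rbinom(n, 1, 0.5)               # instrument
  d <- 0.5 * x + z + u + rnorm(n)      # first stage; endogenous via u
  y <- 1 + 2 * d + x + 2 * u + rnorm(n)

  ## Step 1: first-stage regression and fitted values
  first <- lm(d ~ x + z)
  d_hat <- fitted(first)

  ## Step 2: regress the outcome on covariates and the fitted treatment
  second <- lm(y ~ x + d_hat)
  coef(second)["d_hat"]    # close to 2, but summary(second) reports the wrong SEs

  ## Packaged 2SLS with correct standard errors (assumes AER is installed):
  ## library(AER)
  ## summary(ivreg(y ~ d + x | z + x))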
Nunn & Wantchekon IV example

[Slide shows results from the Nunn & Wantchekon (2011) IV analysis; the table/figure is not reproduced in this text extraction.]
General 2SLS

• Notational convenience: combine 𝑋𝑖 and 𝐷𝑖 into one regressor vector, 𝑋𝑖, of length 𝑘, where one entry is 𝐷𝑖.
• The structural model, then, is:

  𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜀𝑖

• 𝑍𝑖 will be a vector of 𝑙 exogenous variables that includes any exogenous variables in 𝑋𝑖 plus any instruments.
• Key assumption on the instruments:

  𝔼[𝑍𝑖𝜀𝑖] = 0
Nasty Matrix Algebra

• The projection matrix projects values from the columns of 𝑍𝑖 to the columns of 𝑋𝑖:

  Π = (𝔼[𝑍𝑖𝑍𝑖′])⁻¹𝔼[𝑍𝑖𝑋𝑖′]   (projection matrix)
  𝑋̃𝑖 = Π′𝑍𝑖                 (fitted values)

• To derive the 2SLS estimator, take the fitted values, Π′𝑍𝑖, and multiply both sides of the outcome equation by them:

  𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜀𝑖
  Π′𝑍𝑖𝑌𝑖 = Π′𝑍𝑖𝑋𝑖′𝛽 + Π′𝑍𝑖𝜀𝑖
  𝔼[Π′𝑍𝑖𝑌𝑖] = 𝔼[Π′𝑍𝑖𝑋𝑖′]𝛽 + Π′𝔼[𝑍𝑖𝜀𝑖]
  𝔼[Π′𝑍𝑖𝑌𝑖] = 𝔼[Π′𝑍𝑖𝑋𝑖′]𝛽
  𝔼[𝑋̃𝑖𝑌𝑖] = 𝔼[𝑋̃𝑖𝑋𝑖′]𝛽
  𝛽 = (𝔼[𝑋̃𝑖𝑋𝑖′])⁻¹𝔼[𝑋̃𝑖𝑌𝑖]
How to estimate the parameters

• Collect 𝑋𝑖 into an 𝑛 × 𝑘 matrix 𝐗 = (𝑋1′, …, 𝑋𝑛′)′
• Collect 𝑍𝑖 into an 𝑛 × 𝑙 matrix 𝐙 = (𝑍1′, …, 𝑍𝑛′)′
• Let 𝐗̂ = 𝐙(𝐙′𝐙)⁻¹𝐙′𝐗 be the matrix of fitted values for 𝐗.
• Matrix party trick: 𝐗′𝐙/𝑛 = (1/𝑛) ∑ᵢ 𝑋𝑖𝑍𝑖′ →ᵖ 𝔼[𝑋𝑖𝑍𝑖′].
• Take the population formula for the parameters:

  𝛽 = (𝔼[𝑋̃𝑖𝑋𝑖′])⁻¹𝔼[𝑋̃𝑖𝑌𝑖]

• And plug in the sample values (the 𝑛 cancels out):

  𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′𝐲

• This is how R/Stata estimates the 2SLS parameters.
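The sample formula can be computed directly with a few lines of matrix algebra. A sketch on simulated data (variable names and numbers are illustrative, not from the slides):

  ## beta_hat = (X_hat' X)^{-1} X_hat' y, computed with base R matrices
  set.seed(2138)
  n <- 5000
  x1 <- rnorm(n); u <- rnorm(n); z <- rnorm(n)
  d  <- 0.5 * x1 + z + u + rnorm(n)             # endogenous treatment
  y  <- 1 + 2 * d + x1 + 2 * u + rnorm(n)

  X <- cbind(1, x1, d)      # n x k regressor matrix (includes the treatment)
  Z <- cbind(1, x1, z)      # n x l instrument matrix (exogenous vars + instrument)

  X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))   # Z (Z'Z)^{-1} Z'X
  beta_hat <- solve(crossprod(X_hat, X), crossprod(X_hat, y))
  beta_hat                  # coefficient on d should be close to 2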
Asymptotics for 2SLS

  𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′𝐲

• We can insert the true model for 𝐲:

  𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′(𝐗𝛽 + 𝜀)

• Using the matrix party trick and the fact that 𝐗̂′𝐗 = 𝐗̂′𝐗̂, we have

  𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′𝐗𝛽 + (𝐗̂′𝐗)⁻¹𝐗̂′𝜀
     = 𝛽 + (𝐗̂′𝐗̂)⁻¹𝐗̂′𝜀
     = 𝛽 + [𝑛⁻¹ ∑ᵢ 𝑋̂𝑖𝑋̂𝑖′]⁻¹ [𝑛⁻¹ ∑ᵢ 𝑋̂𝑖𝜀𝑖]

• Consistent because 𝑛⁻¹ ∑ᵢ 𝑋̂𝑖𝜀𝑖 →ᵖ 𝔼[𝑋̂𝑖𝜀𝑖] = 0.
Asymptotic variance for 2SLS

  √𝑛(𝛽̂ − 𝛽) = (𝑛⁻¹ ∑ᵢ 𝑋̂𝑖𝑋̂𝑖′)⁻¹ ((1/√𝑛) ∑ᵢ 𝑋̂𝑖𝜀𝑖)

• By the CLT, (1/√𝑛) ∑ᵢ 𝑋̂𝑖𝜀𝑖 converges in distribution to 𝑁(0, 𝐵), where 𝐵 = 𝔼[𝜀𝑖²𝑋̂𝑖𝑋̂𝑖′].
• By the LLN, 𝑛⁻¹ ∑ᵢ 𝑋̂𝑖𝑋̂𝑖′ →ᵖ 𝔼[𝑋̂𝑖𝑋̂𝑖′].
• Thus, √𝑛(𝛽̂ − 𝛽) has asymptotic variance:

  (𝔼[𝑋̂𝑖𝑋̂𝑖′])⁻¹ 𝔼[𝜀𝑖²𝑋̂𝑖𝑋̂𝑖′] (𝔼[𝑋̂𝑖𝑋̂𝑖′])⁻¹

• Replace these with sample quantities to get the robust 2SLS variance estimator:

  v̂ar(𝛽̂) = (𝐗̂′𝐗̂)⁻¹ (∑ᵢ 𝑢̂𝑖²𝑋̂𝑖𝑋̂𝑖′) (𝐗̂′𝐗̂)⁻¹,   where 𝑢̂𝑖 = 𝑌𝑖 − 𝑋𝑖′𝛽̂
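The sandwich formula above is also only a few lines of matrix code. A self-contained sketch (it re-simulates the same illustrative data as the earlier matrix example); note that the residuals use the original 𝑋𝑖, not the fitted values:

  ## Robust 2SLS variance: (X_hat'X_hat)^{-1} (sum u_i^2 X_hat_i X_hat_i') (X_hat'X_hat)^{-1}
  set.seed(2138)
  n <- 5000
  x1 <- rnorm(n); u <- rnorm(n); z <- rnorm(n)
  d  <- 0.5 * x1 + z + u + rnorm(n)
  y  <- 1 + 2 * d + x1 + 2 * u + rnorm(n)
  X  <- cbind(1, x1, d); Z <- cbind(1, x1, z)

  X_hat    <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  beta_hat <- solve(crossprod(X_hat, X), crossprod(X_hat, y))

  u_hat <- as.vector(y - X %*% beta_hat)     # residuals use X, not X_hat
  bread <- solve(crossprod(X_hat))           # (X_hat' X_hat)^{-1}
  meat  <- crossprod(X_hat * u_hat)          # sum over i of u_i^2 * X_hat_i X_hat_i'
  V_hat <- bread %*% meat %*% bread
  sqrt(diag(V_hat))                          # robust standard errors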
Overidentification

• What if we have more instruments than endogenous variables?
• When there are more instruments than causal parameters (𝑙 > 𝑘), the model is overidentified.
• When there are as many instruments as causal parameters (𝑙 = 𝑘), the model is just identified.
• With more than one instrument and constant effects, we can test the plausibility of the exclusion restriction(s) using an overidentification test.
• Is it plausible to find more than one instrument?
Overidentification tests

• Sargan-Hausman test:
  ▶ Under the null that all instruments are valid, using all instruments versus a subset should differ only by sampling variation.
  ▶ Regress the 2SLS residuals, 𝜀̂𝑖, on the exogenous variables and instruments, 𝑍𝑖, and calculate 𝑅𝑢² from this regression.
  ▶ Under the null (and homoskedasticity), 𝑁𝑅𝑢² ∼ 𝜒²(𝑙 − 𝑘).
  ▶ The degrees of freedom equal the number of overidentifying restrictions.
• If we reject the null hypothesis in these overidentification tests, it means that the exclusion restrictions for our instruments are probably incorrect.
• Note that the test won’t tell us which of them are incorrect, just that at least one is.
• These overidentification tests depend heavily on the constant effects assumption.
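A rough sketch of the Sargan statistic on simulated data with two valid instruments and one endogenous treatment (all numbers and names are illustrative; it assumes homoskedasticity, as on the slide):

  ## Sargan overidentification test: N * R^2 from regressing the 2SLS
  ## residuals on the instrument set; df = l - k
  set.seed(2138)
  n <- 5000
  u  <- rnorm(n)
  z1 <- rnorm(n); z2 <- rnorm(n)                 # two (valid) instruments
  d  <- z1 + 0.5 * z2 + u + rnorm(n)
  y  <- 1 + 2 * d + 2 * u + rnorm(n)

  X <- cbind(1, d); Z <- cbind(1, z1, z2)        # k = 2, l = 3: one overid restriction
  X_hat    <- Z %*% solve(crossprod(Z), crossprod(Z, X))
  beta_hat <- solve(crossprod(X_hat, X), crossprod(X_hat, y))
  eps_hat  <- as.vector(y - X %*% beta_hat)      # 2SLS residuals

  aux  <- lm(eps_hat ~ z1 + z2)                  # regress residuals on instruments
  stat <- n * summary(aux)$r.squared             # N * R^2
  pchisq(stat, df = ncol(Z) - ncol(X), lower.tail = FALSE)   # p-value, chi^2_{l-k}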
3/ IV with heterogeneous treatment effects
Instrumental Variables and Potential Outcomes

• Basic idea of IV:
  ▶ 𝐷𝑖 is not randomized, but 𝑍𝑖 is
  ▶ 𝑍𝑖 only affects 𝑌𝑖 through 𝐷𝑖
• 𝐷𝑖 now depends on 𝑍𝑖 ⇝ potential treatments: 𝐷𝑖(1) = 𝐷𝑖(𝑧 = 1) and 𝐷𝑖(0) = 𝐷𝑖(𝑧 = 0).
• Consistency:

  𝐷𝑖 = 𝑍𝑖𝐷𝑖(1) + (1 − 𝑍𝑖)𝐷𝑖(0)

• The outcome can depend on both the treatment and the instrument: 𝑌𝑖(𝑑, 𝑧) is the outcome if unit 𝑖 had received treatment 𝐷𝑖 = 𝑑 and instrument value 𝑍𝑖 = 𝑧.
Key assumptions

1. Randomization
2. Exclusion Restriction
3. First-stage relationship
4. Monotonicity

Randomization

• Need the instrument to be randomized:

  [{𝑌𝑖(𝑑, 𝑧), ∀𝑑, 𝑧}, 𝐷𝑖(1), 𝐷𝑖(0)] ⟂⟂ 𝑍𝑖

• We can weaken this to conditional ignorability.
• But why believe conditional ignorability for the instrument but not the treatment?
• The best instruments are truly randomized.
• Randomization identifies the intent-to-treat (ITT) effect:

  𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0] = 𝐸[𝑌𝑖(𝐷𝑖(1), 1) − 𝑌𝑖(𝐷𝑖(0), 0)]
Exclusion Restriction

• The instrument has no direct effect on the outcome, once we fix the value of the treatment:

  𝑌𝑖(𝑑, 1) = 𝑌𝑖(𝑑, 0) for 𝑑 = 0, 1

• Given this exclusion restriction, the potential outcomes for each treatment status only depend on the treatment, not the instrument:

  𝑌𝑖(1) ≡ 𝑌𝑖(1, 1) = 𝑌𝑖(1, 0)
  𝑌𝑖(0) ≡ 𝑌𝑖(0, 1) = 𝑌𝑖(0, 0)

• NOT A TESTABLE ASSUMPTION
The linear model with heterogeneous effects

• As usual, rewrite 𝑌𝑖 using consistency:

  𝑌𝑖 = 𝑌𝑖(0) + (𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖
     = 𝛼0 + 𝜏𝑖𝐷𝑖 + 𝜂𝑖

• Here, we have 𝛼0 = 𝐸[𝑌𝑖(0)] and 𝜏𝑖 = 𝑌𝑖(1) − 𝑌𝑖(0).
First Stage

• This next assumption is a little mundane, but it turns out to be very important: the instrument must have an effect on the treatment:

  𝐸[𝐷𝑖(1) − 𝐷𝑖(0)] ≠ 0

• Otherwise, what would we be doing? The instrument wouldn’t affect anything.
• Implies that Cov(𝐷𝑖, 𝑍𝑖) ≠ 0.
Monotonicity

• Lastly, we need to make another assumption about the relationship between the instrument and the treatment.
• Monotonicity says that the presence of the instrument never dissuades someone from taking the treatment:

  𝐷𝑖(1) − 𝐷𝑖(0) ≥ 0

• Note that if this holds in the opposite direction, 𝐷𝑖(1) − 𝐷𝑖(0) ≤ 0, we can always rescale 𝐷𝑖 to make the assumption hold.
Monotonicity means no defiers

• This is sometimes called no defiers.
• With a binary treatment and a binary instrument, there are four groups:

  Name            𝐷𝑖(1)   𝐷𝑖(0)
  Always-takers     1       1
  Never-takers      0       0
  Compliers         1       0
  Defiers           0       1

• These compliance groups are sometimes called principal strata.
• The monotonicity assumption removes the possibility of there being defiers in the population.
• Anyone with 𝐷𝑖 = 1 when 𝑍𝑖 = 0 must be an always-taker, and anyone with 𝐷𝑖 = 0 when 𝑍𝑖 = 1 must be a never-taker.
Local Average Treatment Effect (LATE)

• Under these four assumptions, the Wald estimator is equal to what we call the local average treatment effect (LATE) or the complier average treatment effect (CATE).
• This is the ATE among the compliers: those who take the treatment when encouraged to do so.
• That is, the LATE theorem states that:

  (𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]) / (𝐸[𝐷𝑖|𝑍𝑖 = 1] − 𝐸[𝐷𝑖|𝑍𝑖 = 0]) = 𝐸[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖(1) > 𝐷𝑖(0)]

• This fact was a massive intellectual jump in our understanding of IV.
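A simulation with explicit principal strata can illustrate the theorem. All numbers below are made up: compliers have an effect of 3, everyone else an effect of 1, so the Wald ratio recovers the complier effect rather than the overall ATE.

  ## Simulated principal strata: the Wald estimator targets the complier ATE
  set.seed(2138)
  n <- 200000
  type <- sample(c("complier", "always", "never"), n, replace = TRUE,
                 prob = c(0.5, 0.2, 0.3))            # no defiers (monotonicity)
  z  <- rbinom(n, 1, 0.5)
  d0 <- as.numeric(type == "always")                 # D_i(0)
  d1 <- as.numeric(type %in% c("always", "complier"))  # D_i(1)
  d  <- ifelse(z == 1, d1, d0)

  tau <- ifelse(type == "complier", 3, 1)            # heterogeneous effects
  y   <- rnorm(n) + tau * d                          # exclusion restriction holds

  wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
          (mean(d[z == 1]) - mean(d[z == 0]))
  wald                     # close to 3, the complier effect (LATE)
  mean(tau)                # the ATE (2) mixes complier and non-complier effects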
Proof of the LATE theorem

• Under the exclusion restriction and randomization,

  𝐸[𝑌𝑖|𝑍𝑖 = 1] = 𝐸[𝑌𝑖(0) + (𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1]
              = 𝐸[𝑌𝑖(0) + (𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖(1)]   (randomization)

• The same applies when 𝑍𝑖 = 0, so we have

  𝐸[𝑌𝑖|𝑍𝑖 = 0] = 𝐸[𝑌𝑖(0) + (𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖(0)]

• Thus,

  𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]
    = 𝐸[(𝑌𝑖(1) − 𝑌𝑖(0))(𝐷𝑖(1) − 𝐷𝑖(0))]
    = 𝐸[(𝑌𝑖(1) − 𝑌𝑖(0))(1)|𝐷𝑖(1) > 𝐷𝑖(0)] Pr[𝐷𝑖(1) > 𝐷𝑖(0)]
      + 𝐸[(𝑌𝑖(1) − 𝑌𝑖(0))(−1)|𝐷𝑖(1) < 𝐷𝑖(0)] Pr[𝐷𝑖(1) < 𝐷𝑖(0)]
    = 𝐸[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖(1) > 𝐷𝑖(0)] Pr[𝐷𝑖(1) > 𝐷𝑖(0)]

• The third equality comes from monotonicity: with this assumption, 𝐷𝑖(1) < 𝐷𝑖(0) never occurs.
Proof (continued)

  𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0] = 𝐸[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖(1) > 𝐷𝑖(0)] Pr[𝐷𝑖(1) > 𝐷𝑖(0)]

• We can use the same argument for the denominator:

  𝐸[𝐷𝑖|𝑍𝑖 = 1] − 𝐸[𝐷𝑖|𝑍𝑖 = 0] = 𝐸[𝐷𝑖(1) − 𝐷𝑖(0)] = Pr[𝐷𝑖(1) > 𝐷𝑖(0)]

• Dividing these two expressions gives the LATE.
Is the LATE useful?

• Once we allow for heterogeneous effects, all we can estimate with IV is the effect of treatment among compliers.
• This is an unknown subset of the data.
  ▶ Treated units are a mix of always-takers and compliers.
  ▶ Control units are a mix of never-takers and compliers.
• Without further assumptions, 𝜏𝐿𝐴𝑇𝐸 ≠ 𝜏𝐴𝑇𝐸.
• The complier group depends on the instrument ⇝ different IVs will lead to different estimands.
• 2SLS “cheats” by assuming that the effect is constant, so that it is the same for compliers and non-compliers.
Randomized trials with one-sided noncompliance

• Will the LATE ever be equal to a usual causal quantity?
• When noncompliance is one-sided, the LATE is equal to the ATT.
• Think of a randomized experiment:
  ▶ Randomized treatment assignment = instrument (𝑍𝑖)
  ▶ Non-randomized actual treatment taken = treatment (𝐷𝑖)
• One-sided noncompliance: only those assigned to the treatment can actually take the treatment (those assigned to control cannot). That is,

  𝐷𝑖(0) = 0 for all 𝑖 ⇝ Pr[𝐷𝑖 = 1|𝑍𝑖 = 0] = 0

• Maybe this is because only those assigned to treatment actually get the pills, or only they are invited to the job-training location.
Benefits of one-sided noncompliance

• One-sided noncompliance ⇝ no always-takers, and since there are no defiers:
  ▶ Treated units must be compliers.
  ▶ The ATT is the same as the LATE.
• Thus, we know that:

  𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]
    = 𝔼[𝑌𝑖(0) + (𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖(0)|𝑍𝑖 = 0]   (exclusion restriction + one-sided noncompliance)
    = 𝔼[𝑌𝑖(0)|𝑍𝑖 = 1] + 𝔼[(𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖(0)|𝑍𝑖 = 0]
    = 𝔼[𝑌𝑖(0)] + 𝔼[(𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖(0)]   (randomization)
    = 𝔼[(𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1]
    = 𝔼[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖 = 1, 𝑍𝑖 = 1] Pr[𝐷𝑖 = 1|𝑍𝑖 = 1]   (law of iterated expectations + binary treatment)
    = 𝔼[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖 = 1] Pr[𝐷𝑖 = 1|𝑍𝑖 = 1]   (one-sided noncompliance)
• Noting that Pr[𝐷𝑖 = 1|𝑍𝑖 = 0] = 0, the Wald estimator is just the ATT:

  (𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]) / Pr[𝐷𝑖 = 1|𝑍𝑖 = 1] = 𝐸[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖 = 1]

• Thus, under the additional assumption of one-sided noncompliance, we can estimate the ATT using the usual IV approach.
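A quick simulated check of this claim (hypothetical numbers, not from the slides): with no always-takers, the treated units are exactly the assigned compliers, so the Wald ratio and the ATT coincide.

  ## One-sided noncompliance: D_i(0) = 0 for everyone, so Wald = ATT
  set.seed(2138)
  n <- 200000
  type <- sample(c("complier", "never"), n, replace = TRUE, prob = c(0.6, 0.4))
  z <- rbinom(n, 1, 0.5)
  d <- as.numeric(z == 1 & type == "complier")   # only assigned compliers take treatment

  tau <- ifelse(type == "complier", 2.5, 0.5)    # heterogeneous effects
  y   <- rnorm(n) + tau * d

  wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
          (mean(d[z == 1]) - mean(d[z == 0]))
  att  <- mean(tau[d == 1])                      # effect among the actually treated
  c(wald = wald, att = att)                      # both close to 2.5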
4/ IV extensions

Falsification tests

• The exclusion restriction cannot be tested directly, but it can be falsified.
• Falsification test: test the reduced-form effect of 𝑍𝑖 on 𝑌𝑖 in situations where it is impossible or extremely unlikely that 𝑍𝑖 could affect 𝐷𝑖.
• Because 𝑍𝑖 can’t affect 𝐷𝑖 there, the exclusion restriction implies that this falsification test should show zero effect.
• Nunn & Wantchekon (2011): they use distance to the coast as an instrument in their African sample; as a falsification test, they examine the relationship between distance to the coast and trust in an Asian sample.
Nunn & Wantchekon falsification test

[Slide shows the Nunn & Wantchekon falsification results; the table/figure is not reproduced in this text extraction.]
Size, characteristics of the compliers

• While we cannot identify who is a complier and who is not in general, we can estimate the size of the complier group:

  Pr[𝐷𝑖(1) > 𝐷𝑖(0)] = 𝐸[𝐷𝑖(1) − 𝐷𝑖(0)] = 𝐸[𝐷𝑖|𝑍𝑖 = 1] − 𝐸[𝐷𝑖|𝑍𝑖 = 0]

• Can extend this to calculate features of the complier group:
  ▶ Covariate means, variances, etc.
  ▶ Abadie (2003) shows how to weight the data to estimate these quantities.
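In practice the complier share is just a difference in treatment means by instrument status. A short sketch reusing the illustrative principal-strata simulation from above (all names and numbers are made up):

  ## Complier share: E[D | Z = 1] - E[D | Z = 0]
  set.seed(2138)
  n <- 200000
  type <- sample(c("complier", "always", "never"), n, replace = TRUE,
                 prob = c(0.5, 0.2, 0.3))
  z <- rbinom(n, 1, 0.5)
  d <- ifelse(z == 1, as.numeric(type != "never"), as.numeric(type == "always"))

  complier_share <- mean(d[z == 1]) - mean(d[z == 0])
  complier_share        # close to 0.5, the true share of compliers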
Multiple instruments

• Different instruments ⇝ different LATEs:
  ▶ Instrument 1, 𝑍1𝑖, with LATE 𝜏1
  ▶ Instrument 2, 𝑍2𝑖, with LATE 𝜏2
• Use both in the first stage:

  𝐷̂𝑖 = 𝜋1𝑍1𝑖 + 𝜋2𝑍2𝑖.
2SLS as weighted average

• MHE shows that the 2SLS estimator using these two instruments is a weighted average of the two component LATEs:

  𝜌2𝑆𝐿𝑆 = 𝜓𝜏1 + (1 − 𝜓)𝜏2,

  where the weight is:

  𝜓 = 𝜋1Cov(𝐷𝑖, 𝑍1𝑖) / [𝜋1Cov(𝐷𝑖, 𝑍1𝑖) + 𝜋2Cov(𝐷𝑖, 𝑍2𝑖)]

• Thus, the 2SLS estimate is a weighted average of the causal effects for each instrument, where the weights are related to the strength of each first stage.
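The weight 𝜓 can be computed directly from the first-stage coefficients and sample covariances. A sketch with two simulated instruments (illustrative numbers; a constant effect is used here simply to check the algebraic identity):

  ## 2SLS with two instruments as a weighted average of instrument-specific IV estimates
  set.seed(2138)
  n <- 200000
  u  <- rnorm(n)
  z1 <- rbinom(n, 1, 0.5)
  z2 <- rbinom(n, 1, 0.5)
  d  <- rbinom(n, 1, plogis(-1 + 1.2 * z1 + 0.6 * z2 + u))
  y  <- 2 * d + u + rnorm(n)

  ## Instrument-specific IV (Wald-type) estimates
  tau1 <- cov(y, z1) / cov(d, z1)
  tau2 <- cov(y, z2) / cov(d, z2)

  ## First stage using both instruments, and the weight psi
  pi_hat <- coef(lm(d ~ z1 + z2))
  psi <- pi_hat["z1"] * cov(d, z1) /
         (pi_hat["z1"] * cov(d, z1) + pi_hat["z2"] * cov(d, z2))

  ## 2SLS point estimate via the two-step heuristic
  d_hat    <- fitted(lm(d ~ z1 + z2))
  tau_2sls <- coef(lm(y ~ d_hat))["d_hat"]

  c(tau_2sls = unname(tau_2sls),
    weighted = unname(psi * tau1 + (1 - psi) * tau2))   # the two should match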
Covariates and heterogeneous effects

• It might be the case that the above assumptions only hold conditional on some covariates, 𝑋𝑖. That is, instead of randomization, we might have conditional ignorability:

  [{𝑌𝑖(𝑑, 𝑧), ∀𝑑, 𝑧}, 𝐷𝑖(1), 𝐷𝑖(0)] ⟂⟂ 𝑍𝑖 | 𝑋𝑖

• We would also have exclusion conditional on the covariates:

  Pr[𝑌𝑖(𝑑, 0) = 𝑌𝑖(𝑑, 1)|𝑋𝑖] = 1 for 𝑑 = 0, 1

• Under these assumptions, with fully saturated first and second stages, 2SLS estimates a weighted average of the covariate-specific LATEs (very similar to regression).
• Abadie (2003) shows how to estimate the overall LATE using a weighting approach based on a “propensity score” for the instrument.
