s10 IV Handout
Instrumental Variables
Matthew Blackwell
November 5, 2015
1. IV setup
2. IV with constant treatment effects
3. IV with heterogeneous treatment effects
4. IV extensions
1/ IV setup
Where are we? Where are we going?
Basic IV setup with DAGs

[DAG: 𝑍 → 𝐷 → 𝑌, with an unobserved confounder 𝑈 affecting both 𝐷 and 𝑌; the absent 𝑍 → 𝑌 arrow is the exclusion restriction]
An IV is only as good as its assumptions

[DAG: 𝑍 → 𝐷 → 𝑌; the exclusion restriction rules out any direct 𝑍 → 𝑌 path]
IVs in the field
• Angrist (1990): draft lottery as an IV for military service (income as outcome)
• Acemoglu et al. (2001): settler mortality as an IV for institutional quality (GDP/capita as outcome)
• Levitt (1997): being an election year as an IV for police force size (crime as outcome)
• Kern & Hainmueller (2009): having West German TV reception in East Berlin as an instrument for West German TV watching (outcome is support for the East German regime)
• Nunn & Wantchekon (2011): historical distance of an ethnic group to the coast as an instrument for the slave raiding of that ethnic group (outcome is trust attitudes today)
• Acharya, Blackwell, Sen (2015): cotton suitability as an IV for proportion slave in 1860 (outcome is white attitudes today)
2/ IV with constant treatment effects
IV with constant effects
𝑌𝑖 (𝑑, 𝑢) = 𝛼 + 𝜏𝑑 + 𝛾𝑢 + 𝜂𝑖
𝑌𝑖 = 𝛼 + 𝜏𝐷𝑖 + 𝛾𝑈𝑖 + 𝜂𝑖
The role of the instrument
Cov(𝛾𝑈𝑖 + 𝜂𝑖 , 𝑍𝑖 ) = 0
IV estimator with constant effects
𝑌𝑖 = 𝛼 + 𝜏𝐷𝑖 + 𝛾𝑈𝑖 + 𝜂𝑖
Weak instruments
• Natural estimator:

𝜏̂𝐼𝑉 = Ĉov(𝑌𝑖, 𝑍𝑖) / Ĉov(𝐷𝑖, 𝑍𝑖)

• What happens with a weak first stage? Can show that this estimator converges in probability to:

𝜏̂𝐼𝑉 →𝑝 𝜏 + Cov(𝑍𝑖, 𝑈𝑖) / Cov(𝑍𝑖, 𝐷𝑖)

• If Cov(𝑍𝑖, 𝐷𝑖) is small, then even very small violations of the exclusion restriction (Cov(𝑍𝑖, 𝑈𝑖) ≠ 0) can lead to large inconsistencies and finite-sample bias.
• Important to convey the strength of the first stage via a 𝑡-test, or an 𝐹-test with multiple instruments.
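A minimal numpy sketch of this weak-instrument problem on entirely simulated (hypothetical) data: a small first-stage coefficient combined with a tiny correlation between 𝑍 and 𝑈 noticeably inflates the ratio-of-covariances estimate above the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical data-generating process with a mild exclusion violation:
# Z is weakly related to D, and slightly correlated with the confounder U.
U = rng.normal(size=n)
Z = 0.02 * U + rng.normal(size=n)       # Cov(Z, U) != 0: small violation
D = 0.05 * Z + U + rng.normal(size=n)   # weak first stage: Cov(Z, D) small
Y = 2.0 * D + U + rng.normal(size=n)    # true tau = 2

def iv_ratio(y, d, z):
    """Ratio-of-covariances IV estimator: Cov(y, z) / Cov(d, z)."""
    return np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]

tau_hat = iv_ratio(Y, D, Z)
# The small Cov(Z, U) is divided by the small Cov(Z, D), so the estimate
# lands well above the true value of 2 even with half a million draws.
```

The same code with a strong first stage (say, a coefficient of 1 on 𝑍 in the 𝐷 equation) would shrink the bias term dramatically, which is the point of the slide.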
Wald Estimator

𝜏𝑊𝑎𝑙𝑑 = (𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]) / (𝐸[𝐷𝑖|𝑍𝑖 = 1] − 𝐸[𝐷𝑖|𝑍𝑖 = 0])
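The Wald logic can be illustrated with simulated data (all parameter values hypothetical): the reduced-form difference in mean outcomes divided by the first-stage difference in mean take-up recovers the constant treatment effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical encouragement design: binary instrument Z shifts binary D,
# while the confounder U affects both take-up and the outcome.
Z = rng.integers(0, 2, size=n)
U = rng.normal(size=n)
D = (0.8 * Z + U + rng.normal(size=n) > 0.5).astype(float)
Y = 1.5 * D + U + rng.normal(size=n)    # constant effect tau = 1.5

def wald(y, d, z):
    """Wald estimator: reduced form divided by first stage."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # reduced form
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage
    return itt_y / itt_d

tau_wald = wald(Y, D, Z)   # close to the true effect of 1.5
```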
What about covariates?
𝔼[𝑍𝑖 𝜈𝑖 ] = 0 𝔼[𝑍𝑖 𝜀𝑖 ] = 0
𝔼[𝑋𝑖 𝜈𝑖 ] = 0 𝔼[𝑋𝑖 𝜀𝑖 ] = 0
• …but 𝐷𝑖 is endogenous: 𝔼[𝐷𝑖 𝜀𝑖 ] ≠ 0
Getting the reduced form
• We can plug the treatment equation, 𝐷𝑖 = 𝑋𝑖′𝛼 + 𝛾𝑍𝑖 + 𝜈𝑖, into the outcome equation, 𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜏𝐷𝑖 + 𝜀𝑖, to obtain the reduced form:

𝑌𝑖 = 𝑋𝑖′(𝛽 + 𝜏𝛼) + 𝜏𝛾𝑍𝑖 + (𝜀𝑖 + 𝜏𝜈𝑖)
Two-stage least squares
• Estimate 𝛼̂ and 𝛾̂ from OLS and form fitted values:

𝔼̂[𝐷𝑖|𝑋𝑖, 𝑍𝑖] = 𝐷̂𝑖 = 𝑋𝑖′𝛼̂ + 𝛾̂𝑍𝑖.

• Regress 𝑌𝑖 on 𝑋𝑖 and 𝐷̂𝑖. Add and subtract 𝜏𝐷̂𝑖:

𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜏𝐷̂𝑖 + [𝜀𝑖 + 𝜏(𝐷𝑖 − 𝐷̂𝑖)]

• Key question: is 𝐷̂𝑖 uncorrelated with the error?
• 𝐷̂𝑖 is just a function of 𝑋𝑖 and 𝑍𝑖, so it is uncorrelated with 𝜀𝑖.
• We also know that 𝐷̂𝑖 is uncorrelated with (𝐷𝑖 − 𝐷̂𝑖), since OLS fitted values are orthogonal to their residuals.
Two-stage least squares
• Heuristic procedure:
1. Run regression of treatment on covariates and instrument
2. Construct fitted values of treatment
3. Run regression of outcome on covariates and fitted values
• Note that this isn't how we actually estimate 2SLS because the standard errors are all wrong.
• The computer wants to calculate the standard errors based on 𝜀∗𝑖:

𝜀∗𝑖 = 𝑌𝑖 − 𝑋𝑖′𝛽̂ − 𝜏̂𝐷̂𝑖

when the correct residuals use the actual treatment:

𝜀𝑖̂ = 𝑌𝑖 − 𝑋𝑖′𝛽̂ − 𝜏̂𝐷𝑖
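A sketch of the heuristic two-step procedure in numpy (simulated data with hypothetical parameter values), including the contrast between the residuals the second-stage OLS computes and the correct residuals that plug the actual treatment back in:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Hypothetical setup: one covariate X, instrument Z, endogenous D.
U = rng.normal(size=n)
X = rng.normal(size=n)
Z = rng.normal(size=n)
D = 0.5 * X + 1.0 * Z + U + rng.normal(size=n)
Y = 1.0 + 2.0 * D + 0.7 * X + U + rng.normal(size=n)   # true tau = 2

# Step 1-2: first-stage regression and fitted values of the treatment.
W1 = np.column_stack([np.ones(n), X, Z])
alpha_hat = np.linalg.lstsq(W1, D, rcond=None)[0]
D_hat = W1 @ alpha_hat

# Step 3: second-stage regression of Y on covariates and fitted values.
W2 = np.column_stack([np.ones(n), X, D_hat])
beta_hat = np.linalg.lstsq(W2, Y, rcond=None)[0]
tau_hat = beta_hat[2]   # close to the true effect of 2

# Residuals the naive second stage "sees" (built from D_hat) versus the
# correct 2SLS residuals (built from the actual treatment D).
resid_naive = Y - W2 @ beta_hat
resid_correct = Y - np.column_stack([np.ones(n), X, D]) @ beta_hat
```

The naive residual variance is much larger than the correct one here, which is exactly why the two-step shortcut gets the standard errors wrong.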
Nunn & Wantchekon IV example
General 2SLS
𝑌𝑖 = 𝑋𝑖′ 𝛽 + 𝜀𝑖
𝔼[𝑍𝑖 𝜀𝑖 ] = 0
Nasty Matrix Algebra
• The matrix Π collects the coefficients from the population regression of 𝑋𝑖 on 𝑍𝑖, so Π′𝑍𝑖 gives the part of 𝑋𝑖 explained by the instruments:

Π = (𝔼[𝑍𝑖𝑍𝑖′])⁻¹𝔼[𝑍𝑖𝑋𝑖′]  (projection matrix)
𝑋̃𝑖 = Π′𝑍𝑖  (fitted values)

• To derive the 2SLS estimator, take the fitted values, Π′𝑍𝑖, and multiply both sides of the outcome equation by them:

𝑌𝑖 = 𝑋𝑖′𝛽 + 𝜀𝑖
Π′𝑍𝑖𝑌𝑖 = Π′𝑍𝑖𝑋𝑖′𝛽 + Π′𝑍𝑖𝜀𝑖
𝔼[Π′𝑍𝑖𝑌𝑖] = 𝔼[Π′𝑍𝑖𝑋𝑖′]𝛽 + 𝔼[Π′𝑍𝑖𝜀𝑖]
𝔼[Π′𝑍𝑖𝑌𝑖] = 𝔼[Π′𝑍𝑖𝑋𝑖′]𝛽 + Π′𝔼[𝑍𝑖𝜀𝑖]
𝔼[Π′𝑍𝑖𝑌𝑖] = 𝔼[Π′𝑍𝑖𝑋𝑖′]𝛽
𝔼[𝑋̃𝑖𝑌𝑖] = 𝔼[𝑋̃𝑖𝑋𝑖′]𝛽
𝛽 = (𝔼[𝑋̃𝑖𝑋𝑖′])⁻¹𝔼[𝑋̃𝑖𝑌𝑖]
How to estimate the parameters
• Collect 𝑋𝑖 into an 𝑛 × 𝑘 matrix 𝐗 = (𝑋1′, … , 𝑋𝑛′)
• Collect 𝑍𝑖 into an 𝑛 × 𝑙 matrix 𝐙 = (𝑍1′, … , 𝑍𝑛′)
• Let 𝐗̂ = 𝐙(𝐙′𝐙)⁻¹𝐙′𝐗 be the matrix of fitted values for 𝐗.
• Matrix party trick: 𝐗′𝐙/𝑛 = (1/𝑛) ∑𝑖 𝑋𝑖𝑍𝑖′ →𝑝 𝔼[𝑋𝑖𝑍𝑖′].
• Take the population formula for the parameters and plug in these sample analogues:

𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′𝐲
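This matrix formula translates directly into code; a minimal numpy sketch on simulated, just-identified data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical just-identified setup: X = (1, D), Z = (1, Z1).
U = rng.normal(size=n)
Z1 = rng.normal(size=n)
D = Z1 + U + rng.normal(size=n)          # strong first stage
Y = 2.0 * D + U + rng.normal(size=n)     # true coefficient on D is 2

X = np.column_stack([np.ones(n), D])     # regressors (incl. endogenous D)
Zm = np.column_stack([np.ones(n), Z1])   # instrument matrix

# X_hat = Z (Z'Z)^{-1} Z'X, then beta_hat = (X_hat'X)^{-1} X_hat'y.
X_hat = Zm @ np.linalg.solve(Zm.T @ Zm, Zm.T @ X)
beta_hat = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
```

With one instrument per endogenous regressor this reduces to the ratio-of-covariances estimator from earlier.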
Asymptotics for 2SLS

𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′𝐲
  = (𝐗̂′𝐗)⁻¹𝐗̂′(𝐗𝛽 + 𝜀)

• Using the matrix party trick and that 𝐗̂′𝐗 = 𝐗̂′𝐗̂, we have

𝛽̂ = (𝐗̂′𝐗)⁻¹𝐗̂′𝐗𝛽 + (𝐗̂′𝐗)⁻¹𝐗̂′𝜀
  = 𝛽 + (𝐗̂′𝐗̂)⁻¹𝐗̂′𝜀
  = 𝛽 + [𝑛⁻¹ ∑𝑖 𝑋̂𝑖𝑋̂𝑖′]⁻¹ 𝑛⁻¹ ∑𝑖 𝑋̂𝑖𝜀𝑖

• Consistent because 𝑛⁻¹ ∑𝑖 𝑋̂𝑖𝜀𝑖 →𝑝 𝔼[𝑋̂𝑖𝜀𝑖] = 0.
Asymptotic variance for 2SLS

√𝑛(𝛽̂ − 𝛽) = (𝑛⁻¹ ∑𝑖 𝑋̂𝑖𝑋̂𝑖′)⁻¹ (𝑛⁻¹/² ∑𝑖 𝑋̂𝑖𝜀𝑖)

• By the CLT, 𝑛⁻¹/² ∑𝑖 𝑋̂𝑖𝜀𝑖 converges in distribution to 𝑁(0, 𝐵), where 𝐵 = 𝔼[𝜀𝑖²𝑋̂𝑖𝑋̂𝑖′].
• By the LLN, 𝑛⁻¹ ∑𝑖 𝑋̂𝑖𝑋̂𝑖′ →𝑝 𝔼[𝑋̂𝑖𝑋̂𝑖′].
• Thus, we have that √𝑛(𝛽̂ − 𝛽) has asymptotic variance:

(𝔼[𝑋̂𝑖𝑋̂𝑖′])⁻¹ 𝔼[𝜀𝑖²𝑋̂𝑖𝑋̂𝑖′] (𝔼[𝑋̂𝑖𝑋̂𝑖′])⁻¹
Overidentification tests
• Sargan-Hausman test:
▶ Under the null that all instruments are valid, using all instruments versus a subset should only differ by sampling variation.
▶ Regress the 2SLS residuals, 𝜀𝑖̂, on the instruments 𝑍𝑖 and calculate 𝑅𝑢² from this regression.
▶ Under the null (and homoskedasticity), 𝑁𝑅𝑢² ∼ 𝜒²𝑙−𝑘.
▶ The degrees of freedom depend on how many overidentifying restrictions there are.
• If we reject the null hypothesis in these overidentification tests, it means that the exclusion restrictions for our instruments are probably incorrect.
• Note that the test won't tell us which instruments are invalid, just that at least one is.
• These overidentification tests depend heavily on the constant effects assumption.
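A sketch of the Sargan statistic on simulated data with one overidentifying restriction (two hypothetical instruments, both valid by construction, so the statistic should look like a single draw from a 𝜒² distribution with 1 degree of freedom):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Two hypothetical instruments for one endogenous regressor: l - k = 1.
U = rng.normal(size=n)
Z1 = rng.normal(size=n)
Z2 = rng.normal(size=n)
D = Z1 + 0.5 * Z2 + U + rng.normal(size=n)
Y = 2.0 * D + U + rng.normal(size=n)

Zm = np.column_stack([np.ones(n), Z1, Z2])   # instrument matrix (l = 3)
X = np.column_stack([np.ones(n), D])         # regressor matrix (k = 2)

# 2SLS using both instruments, then the 2SLS residuals.
X_hat = Zm @ np.linalg.solve(Zm.T @ Zm, Zm.T @ X)
beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
resid = Y - X @ beta

# Sargan statistic: n * R^2 from regressing the residuals on Zm.
g = np.linalg.lstsq(Zm, resid, rcond=None)[0]
r2 = (Zm @ g).var() / resid.var()
sargan = n * r2   # compare to chi^2 with l - k = 1 df under the null
```

Making one instrument invalid (e.g., adding `0.3 * Z2` directly to `Y`) would push the statistic far into the rejection region.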
3/ IV with heterogeneous treatment effects
Instrumental Variables and Potential Outcomes
• Basic idea of IV:
▶ 𝐷𝑖 not randomized, but 𝑍𝑖 is
▶ 𝑍𝑖 only affects 𝑌𝑖 through 𝐷𝑖
Key assumptions
1. Randomization
2. Exclusion Restriction
3. First-stage relationship
4. Monotonicity
Randomization
𝐸[𝑌𝑖 |𝑍𝑖 = 1] − 𝐸[𝑌𝑖 |𝑍𝑖 = 0] = 𝐸[𝑌𝑖 (𝐷𝑖 (1), 1) − 𝑌𝑖 (𝐷𝑖 (0), 0)]
Exclusion Restriction

𝑌𝑖(𝑑, 𝑧) = 𝑌𝑖(𝑑, 𝑧′) ≡ 𝑌𝑖(𝑑) for all 𝑑, 𝑧, 𝑧′: the instrument affects the outcome only through the treatment.
The linear model with heterogeneous effects
First Stage

𝔼[𝐷𝑖(1) − 𝐷𝑖(0)] ≠ 0: the instrument must actually move the treatment on average.
Monotonicity
𝐷𝑖 (1) − 𝐷𝑖 (0) ≥ 0
Monotonicity means no defiers
Local Average Treatment Effect (LATE)
Proof of the LATE theorem
• Under the exclusion restriction and randomization,
𝐸[𝑌𝑖 |𝑍𝑖 = 1]−𝐸[𝑌𝑖 |𝑍𝑖 = 0] = 𝐸[𝑌𝑖 (1)−𝑌𝑖 (0)|𝐷𝑖 (1) > 𝐷𝑖 (0)] Pr[𝐷𝑖 (1) > 𝐷𝑖 (0)]
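The LATE theorem can be checked numerically by simulating the principal strata directly (shares and effect sizes below are hypothetical): the Wald ratio recovers the complier effect, not the population average effect.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

# Hypothetical strata: compliers, always-takers, never-takers (no defiers,
# so monotonicity holds by construction).
types = rng.choice(["c", "a", "n"], size=n, p=[0.5, 0.3, 0.2])
Z = rng.integers(0, 2, size=n)
D = np.where(types == "a", 1, np.where(types == "n", 0, Z))

# Heterogeneous effects: compliers have effect 2, everyone else 0.5,
# and always-takers have a different baseline outcome level.
effect = np.where(types == "c", 2.0, 0.5)
Y = rng.normal(size=n) + (types == "a") + effect * D

late_hat = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (
    D[Z == 1].mean() - D[Z == 0].mean()
)
# late_hat is near the complier effect (2.0), while the population
# average effect is only 1.25 in this design.
```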
Is the LATE useful?
Randomized trials with one-sided noncompliance
• Will the LATE ever be equal to a usual causal quantity?
• When non-compliance is one-sided, then the LATE is equal to
the ATT.
• Think of a randomized experiment:
▶ Randomized treatment assignment = instrument (𝑍𝑖 )
▶ Non-randomized actual treatment taken = treatment (𝐷𝑖 )
• One-sided noncompliance: only those assigned to the treatment (control) can actually take the treatment (control). Here we focus on the case where 𝐷𝑖(0) = 0 for all 𝑖.
Benefits of one-sided noncompliance
• One-sided noncompliance ⇝ no “always-takers” and since
there are no defiers,
▶ Treated units must be compliers.
▶ ATT is the same as the LATE.
• Thus, we know that:

𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]
= 𝔼[𝑌𝑖(0) + (𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖(0)|𝑍𝑖 = 0]  (exclusion restriction + one-sided noncompliance)
= 𝔼[𝑌𝑖(0)|𝑍𝑖 = 1] + 𝔼[(𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖(0)|𝑍𝑖 = 0]
= 𝔼[𝑌𝑖(0)] + 𝔼[(𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1] − 𝔼[𝑌𝑖(0)]  (randomization)
= 𝔼[(𝑌𝑖(1) − 𝑌𝑖(0))𝐷𝑖|𝑍𝑖 = 1]
= 𝔼[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖 = 1, 𝑍𝑖 = 1] Pr[𝐷𝑖 = 1|𝑍𝑖 = 1]  (law of iterated expectations + binary treatment)
= 𝔼[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖 = 1] Pr[𝐷𝑖 = 1|𝑍𝑖 = 1]  (one-sided noncompliance)
• Noting that Pr[𝐷𝑖 = 1|𝑍𝑖 = 0] = 0, the Wald estimator is just the ATT:

(𝐸[𝑌𝑖|𝑍𝑖 = 1] − 𝐸[𝑌𝑖|𝑍𝑖 = 0]) / Pr[𝐷𝑖 = 1|𝑍𝑖 = 1] = 𝐸[𝑌𝑖(1) − 𝑌𝑖(0)|𝐷𝑖 = 1]
• Thus, under the additional assumption of one-sided noncompliance, we can estimate the ATT using the usual IV approach.
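A simulated illustration (hypothetical parameters): with one-sided noncompliance and confounded take-up, the Wald ratio matches the ATT while the naive treated-vs-untreated comparison is biased.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Hypothetical trial: only units assigned to treatment (Z = 1) can take it,
# and take-up depends on the confounder U, so D is not ignorable.
U = rng.normal(size=n)
Z = rng.integers(0, 2, size=n)
takeup = (U + rng.normal(size=n) > 0).astype(int)
D = Z * takeup                       # one-sided noncompliance: D_i(0) = 0

# Heterogeneous effects: would-be takers (takeup = 1) have effect 2, so
# the ATT is 2 even though the population average effect is smaller.
effect = 1.0 + takeup
Y = 1.0 + effect * D + U + rng.normal(size=n)

att_hat = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (
    D[Z == 1].mean() - D[Z == 0].mean()
)
naive = Y[D == 1].mean() - Y[D == 0].mean()  # biased upward through U
```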
4/ IV extensions
Falsification tests
[DAG: 𝑍 → 𝐷 → 𝑌 with unobserved confounder 𝑈; the missing 𝑍 → 𝑌 arrow is the exclusion restriction being probed]
Size, characteristics of the compliers
Pr[𝐷𝑖 (1) > 𝐷𝑖 (0)] = 𝐸[𝐷𝑖 (1)−𝐷𝑖 (0)] = 𝐸[𝐷𝑖 |𝑍𝑖 = 1]−𝐸[𝐷𝑖 |𝑍𝑖 = 0]
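This first-stage quantity is straightforward to estimate; a simulated check with known (hypothetical) strata shares:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical strata: 40% compliers, 35% never-takers, 25% always-takers.
types = rng.choice(["c", "n", "a"], size=n, p=[0.4, 0.35, 0.25])
Z = rng.integers(0, 2, size=n)
D = np.where(types == "a", 1, np.where(types == "n", 0, Z))

# First-stage difference in means estimates Pr[D_i(1) > D_i(0)],
# the complier share (0.4 in this design).
complier_share = D[Z == 1].mean() - D[Z == 0].mean()
```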
Multiple instruments

𝐷̂𝑖 = 𝜋1𝑍1𝑖 + 𝜋2𝑍2𝑖.
2SLS as weighted average
Covariates and heterogeneous effects
• It might be the case that the above assumptions only hold
conditional on some covariates, 𝑋𝑖 . That is, instead of
randomization, we might have conditional ignorability: