Canonical Research Designs I:
Difference-in-Differences
Paul Goldsmith-Pinkham
March 22, 2021
Revisiting Research Design
- Recall my attempt at a definition:
- A (causal) research design is a statistical and/or economic statement of how an empirical research paper will estimate a relationship between two (or more) variables that is causal in nature – X causing Y.
- The design should have a description for how some variation in X is either caused by or approximated by a randomized experiment.
- DiNardo and Lee (2011) have a famous handbook chapter entitled “Program Evaluation and Research Design” where they make a distinction between two types of research designs: D-conditions and S-conditions
Revisiting Research Design
- D-condition designs fall clearly into the “PGP” description of a research design – knowledge of the DGP leads to variation in the data that generates our identification
- S-condition designs fall less clearly into this context (as we will discuss).
- The relationship between X and Y can be clearly articulated, but how it is potentially approximated by a random experiment is less obvious
- This issue will become clearer as we discuss our first topic
Estimating causal effects in real settings
- In many applications, we want to estimate the effect of a policy across groups
- However, the policy assignment is not necessarily uncorrelated with group characteristics
- How can we identify the effect of the policy without being confounded by these level differences?
- Answer: Difference-in-differences! (DinD)
First, a warning
- This literature has had a certain amount of upheaval over the past 5-6 years
- Tension: provide context for how people currently and historically have studied diff-in-diff
- But also elaborate on concerns identified in recent papers
- The key issues boil down to two questions:
1. What is the counterfactual estimand?
- Does your estimator map to your estimand? (e.g. “Are you getting at what you meant to?”)
2. What are your structural assumptions and their implications?
- Do you need to assume functional forms? (e.g. “Is this really something that has an experimental analog?”)
- Papers have both pointed out these issues and provided solutions to almost all of the problems they’ve raised, so this is not something that should prevent you from using these tools
Basic setup
- Assume we have n units (i) and T time periods (t)
- Consider a binary policy Dit, and we are interested in estimating its effect on outcomes Yit
- The inherent problem is that Dit is not necessarily randomly assigned
- The historical key (and parametric) assumption underlying the potential outcomes model (one version):
Yit(Dit) = αi + γt + τi Dit, so that Yit(1) − Yit(0) = τi
- Implication? In the absence of the treatment, the Yit across units evolve in parallel – their γt are identical. Absent the policy, units may have different levels (αi) but their changes would evolve in parallel
- This is a key (parametric!) identifying assumption:
Yit(0) − Yi,t−k(0) = γt − γt−k,  Yit(0) − Yjt(0) = αi − αj
Basic 2x2 DinD setup
- Recall our typical estimand of interest is the ATE or the ATT:
τATE = E(Yit(1) − Yit(0)) = E(τi)
τATT = E(Yit(1) − Yit(0) | Dit = 1) = E(τi | Dit = 1)
- Since D is not randomly assigned and we only observe one time period, this model is inherently not identified without additional assumptions.
- Why? Di could be correlated with αi
- Recall that our plug-in estimator approaches need estimates for E(Yit(1)) and E(Yit(0))
- Where can we get unbiased estimates?
- With two time periods we can make a lot more progress!
2 × 2 DinD estimation
- Potential outcomes Yit(d) by period under the model:
            t = 0               t = 1
D = 0       γ0 + αi             γ1 + αi
D = 1       γ0 + αi + τi        γ1 + αi + τi
- Now consider the within-unit difference:
Yi1 − Yi0 = (γ1 − γ0) + τi(Di1 − Di0)
- Hence
E(Yi1 − Yi0 | Di1 − Di0 = 1) − E(Yi1 − Yi0 | Di1 − Di0 = 0) = E(τi | Di1 − Di0 = 1)
- Wait, you say, that’s a lot more notation than I was expecting.
- Simplifying assumption: treatment only goes one way in period 1 (“absorbing adoption”, e.g. Di0 = 0):
E(Yi1 − Yi0 | Di1 = 1) − E(Yi1 − Yi0 | Di1 = 0) = E(τi | Di1 = 1) = τATT
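This 2x2 logic is easy to check numerically. A minimal simulation sketch (all parameter values are made up for illustration): treatment is deliberately correlated with the unit effects αi, so comparing levels across groups is biased, but comparing within-unit changes recovers τ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 10_000, 2.0                    # units and a homogeneous treatment effect
alpha = rng.normal(0, 1, n)             # unit fixed effects alpha_i
d = (alpha + rng.normal(0, 1, n) > 0).astype(float)  # D correlated with alpha_i

y0 = alpha + 0.0 + rng.normal(0, 1, n)            # t = 0 (gamma_0 = 0), nobody treated yet
y1 = alpha + 0.5 + tau * d + rng.normal(0, 1, n)  # t = 1 (gamma_1 = 0.5), D = 1 treated

# Naive comparison of levels at t = 1: biased, since E(alpha | D = 1) > E(alpha | D = 0)
naive = y1[d == 1].mean() - y1[d == 0].mean()

# DinD: compare within-unit *changes* across groups; alpha_i and gamma_t difference out
did = (y1 - y0)[d == 1].mean() - (y1 - y0)[d == 0].mean()
print(f"naive: {naive:.2f}, DinD: {did:.2f}")  # naive well above 2, DinD close to 2
```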
An aside on our simplifying assumption
- The choice of focusing on take-up of a policy, such that Di1 ≥ Di0, is well-grounded in many policy settings
- However, there are cases where policies turn on and then turn off, and this can vary across units
- This can be challenging and potentially problematic with heterogeneous effects
- Need to think carefully about whether Di turning on is identical (but opposite in sign) to Di turning off
- Hull (2018) working paper on mover designs discusses this
- For today, we will ignore this issue
Estimation using linear regression
- A simple linear regression will identify E(τi | Di1 = 1) with two time periods:
Yit = αi + γt + Dit β + eit (1)
- This setup is sometimes referred to as the Two-way Fixed Effects (TWFE) estimator
- Note: we could have also estimated τ directly, with n1 treated and n0 untreated units:
τ̂ = (1/n1) Σi Di(Yi1 − Yi0) − (1/n0) Σi (1 − Di)(Yi1 − Yi0)
where the first term is the average change for the treated (∆Y1) and the second is the average change for the untreated (∆Y0)
- Intuitively, we generate a counterfactual for the treatment using the changes in the untreated units: E(Yi1 − Yi0 | Di = 0)
- Necessary: two time periods! What if we have more?
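With two periods, the TWFE regression in (1) and the differencing estimator coincide exactly. A simulated sketch of that equivalence, computing TWFE via the two-way within (demeaning) transformation rather than a regression package:

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 500, 1.5
alpha = rng.normal(0, 1, n)
d = (alpha > 0).astype(float)                # treatment correlated with unit effects
y = np.column_stack([
    alpha + rng.normal(0, 0.5, n),                  # t = 0, gamma_0 = 0
    alpha + 0.3 + tau * d + rng.normal(0, 0.5, n),  # t = 1, gamma_1 = 0.3
])
D = np.column_stack([np.zeros(n), d])        # D_it: treated units switch on at t = 1

# TWFE: OLS on two-way demeaned variables (partials out alpha_i and gamma_t)
def within(m):
    return m - m.mean(1, keepdims=True) - m.mean(0, keepdims=True) + m.mean()

beta = (within(D) * within(y)).sum() / (within(D) ** 2).sum()

# Differencing estimator: DinD of within-unit changes
dy = y[:, 1] - y[:, 0]
did = dy[d == 1].mean() - dy[d == 0].mean()
print(beta, did)                             # numerically identical with T = 2
```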
Multiple time periods in basic setup
- Let’s consider a policy that occurs all at once at t0 (e.g. a single timing, rolled out to treated units)
- More time periods help in several ways:
1. If we have multiple periods before the policy implementation, we can partially test the underlying assumptions
- Sometimes referred to as “pre-trends”
2. If we have multiple periods after the policy implementation, we can examine the timing of the effect
- Is it an immediate effect? Does it die off? Is it persistent?
- If you pool all time periods together into one “post” variable, this estimates the average effect. If the sample is not balanced, this can have unintended effects!
- How do we implement this?
Yit = αi + γt + Σ_{t=1,t≠t0}^{T} δt Dit + eit,
- One of the coefficients is fundamentally unidentified because of αi
- All coefficients measure the effect relative to period t0.
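The event-study specification above can be run as one OLS with unit dummies, time dummies, and treated × period interactions, omitting t0 as the baseline. A sketch on simulated data (all numbers illustrative; here the true effect is zero through t0 and appears afterward):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, t0 = 200, 6, 3                        # t0 is the omitted baseline period
periods = np.arange(1, T + 1)
treat = (rng.random(n) < 0.5).astype(float)

# Event-study dummies: treated x period, omitting t = t0
labels = [t for t in periods if t != t0]
event_cols = [(treat[:, None] * (periods == t)[None, :]).ravel() for t in labels]

unit_d = np.repeat(np.eye(n), T, axis=0)    # unit dummies (alpha_i)
time_d = np.tile(np.eye(T)[:, 1:], (n, 1))  # time dummies (gamma_t), drop t = 1
X = np.column_stack([unit_d, time_d] + event_cols)

# Simulate: no effect through t0, then effects of 1.0, 1.5, 1.5 in t = 4, 5, 6
alpha, gamma = rng.normal(0, 1, n), np.linspace(0, 1, T)
effect = np.array([0, 0, 0, 1.0, 1.5, 1.5])
Y = (alpha[:, None] + gamma[None, :] + treat[:, None] * effect[None, :]
     + rng.normal(0, 0.3, (n, T))).ravel()

coef = np.linalg.lstsq(X, Y, rcond=None)[0]
delta = dict(zip(labels, coef[-len(labels):]))
print({t: round(v, 2) for t, v in delta.items()})  # pre-period deltas near 0, post near 1.0, 1.5, 1.5
```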
Pre-testing and structural assumptions
- Note that for the above model, we made a stronger assumption about trends
- The DiNardo and Lee “S-assumptions” start to bite
- We assumed that Yit(d) − Yi,t−k(d) = γt − γt−k for all k and d
- This is testable pre-treatment (hence the pre-test)
- This is very powerful and has helped spark the growth in DinD regressions
- Visual demonstration of “pre-trends” helps support the validity of the design
- Worth doing!
- Two key issues:
1. Pre-testing can cause statistical problems
2. What does parallel trends even mean?
Pre-testing issues (Roth 2020)
- Consider T = 3 and think about what a pre-trend test is trying to do
- Testing whether the difference relative to t = 0 for t = −1 is significant
- Unconditionally, this is reasonable. However, Roth (2020) highlights that this is a form of pre-testing, and that low power in detecting pre-trends can be problematic
- By selecting on pre-trends that “pass”, we will tend to choose baseline realizations that satisfy pre-trends, but induce bias in the estimated effect
How to interpret this caution?
- First, don’t panic. Examining pre-trends is still an important diagnostic
- Important to realize that selecting your design based on pre-trends is constructing your counterfactual
- Pre-tests will cause you to potentially contaminate your design
- Suggested solution from Roth (2020): incorporate robustness to pre-trends into your analysis. Rambachan and Roth (2020) present results on testing sensitivity of DinD results to pre-trends
- Brief intuition follows
Rambachan and Roth (2020) suggestion
- Intuitive proposed solution for robustness. Note the post and pre effects:
- Parallel trends assumes these δ are zero. But pre-trends may not be zero.
- R&R say: we can use the info from our pre-trends to bound the post-trend
- Use a smoothness assumption, M, on the second derivative of the counterfactual trend (e.g. in the simplest case, the violation of parallel trends can deviate from a linear extrapolation of the pre-trend by at most M per period)
This approach adds more work but also more validity
- Need to select M, and results will likely be less strong
- However, a very powerful way to address concerns about pre-trends
- Code for applying this technique is available in R: https://github.com/asheshrambachan/HonestDiD
Parallel trends in what?
- A known issue that was historically not formalized: is the outcome specified in logs, or in levels?
- Hopefully it’s clear that if something satisfies parallel pre-trends in logs, it is unlikely to satisfy them in levels
- Recall that this is the issue of invariance we discussed with quantile treatment effects
- In our parametric setting, if there are time trends in the outcomes, parallel trends are unlikely to hold for all transformations of the variables.
- That could be problematic if you wanted to be agnostic about the model!
- Roth and Sant’Anna (2021) directly discuss this issue. Their suggestion:
Our results suggest that researchers who wish to point-identify the ATT should justify one of the following: (i) why treatment is as-if randomly assigned, (ii) why the chosen functional form is correct at the exclusion of others, or (iii) a method for inferring the entire counterfactual distribution of untreated potential outcomes.
Cases of DinD
- 1 treatment timing, Binary treatment, 2 periods
- Card and Krueger (AER, 1994)
- 1 treatment timing, Binary treatment, T periods
- Yagan (AER, 2015)
- 1 treatment timing, Continuous treatment
- Berger, Turner and Zwick (JF, 2020)
- Staggered treatment timing, Binary treatment
- Bailey and Goodman-Bacon (AER, 2015)
Card and Krueger (1994)
- Card and Krueger (1994) study the impact of New Jersey increasing the minimum wage from $4.25 to $5.05 an hour on April 1, 1992
- Key question: what impact does this have on employment?
- Need a counterfactual for NJ, and use Pennsylvania as a control
- Collected data on 410 fast food restaurants
- Called places and asked for employment and starting wage data
- Sampled data from Feb 1992 and Nov 1992
- Hence, Di is NJ vs PA, and t = 0 is Feb 1992 and t = 1 is Nov 1992
Stark Effect on Wages in Card and Krueger (1994)
Effect on Employment in Card and Krueger (1994)
- Despite a large increase in wages, seemingly no negative impact on employment
- In fact, a marginally significant positive impact
- Looking at the raw data, this positive impact is driven by a decline in PA
- This decline is reasonable if you think that PA is a good counterfactual, since 1992 is in the middle of a recession
- A second comparison can be run with stores whose starting wage in the pre-period was above the treatment cutoff
- These stores perform similarly to PA
Key considerations for thinking about Card and Krueger (1994)
- The treatment can’t really be thought of as randomly assigned
- Treatment is completely correlated within states
- As a result, any within-state correlation of errors will be correlated with treatment status
- Given the limited number of states, time periods, and treatments, it is more valuable to view this as a case study
- Under strong parametric assumptions, can infer causality!
- Card acknowledges this (Card and Krueger interview with Ben Zipperer)
Yagan (2015)
- Yagan (2015) tests whether the 2003 dividend tax cut stimulated corporate investment and increased labor earnings
- Big empirical question for corporate finance and public finance
- No direct evidence on the real effects of dividend tax cuts
- Real corporate outcomes are too cyclical to distinguish tax effects from business cycle effects, and the economy boomed
- Paper uses the distinction between “C” corp and “S” corp designation to estimate the effect
- Key feature of law: S-corps didn’t have dividend taxation
- Identifying assumption (from paper):
The identifying assumption underlying this research design is not random assignment of C- versus S-status; it is that C- and S-corporation outcomes would have trended similarly in the absence of the tax cut.
Investment Effects (none)
Employee + Shareholder effects (big)
Key Takeaway + threats
- Tax reform had zero impact on differential investment and employee compensation
- Challenges orthodoxy on estimates of the cost-of-capital elasticity of investment
- What are the underlying challenges to identification?
1. Have to assume (and try to show) that the only differential effect on S- vs C-corporations was through dividend tax changes
2. During 2003, could other shocks differentially impact them?
- Yes, accelerated depreciation – but Yagan shows it impacts them similarly.
- Key point: you have to make more assumptions to conclude that zero differential effect on investment implies zero aggregate effect.
Berger, Turner and Zwick (2019)
- This paper studies the impact of temporary fiscal stimulus (the First-Time Home Buyer tax credit) on housing markets
- Policy was differentially targeted towards first-time home buyers
- Define program exposure as “the number of potential first-time homebuyers in a ZIP code, proxied by the share of people in that ZIP in the year 2000 who are first-time homebuyers”
- The key threat to the design (from the paper):
The key threat to this design is the possibility that time-varying, place-specific shocks are correlated with our exposure measure.
- This measure is not binary – we are effectively just comparing areas with a low share vs. a high share. However, we have a dose-response framework in mind – as we increase the share, the effect size should grow.
First stage: Binary approximation
First stage: Regression coefficients
Final Outcome: Regression coefficients
Binary Approximation vs. Continuous Estimation
- Remember, our main equation did not necessarily specify that Dit had to be binary:
Yit = αi + γt + Σ_{t=1,t≠t0}^{T} δt Dit + eit, (2)
- However, if it is continuous, we are making an additional strong functional form assumption: that the effect of Dit on our outcome is linear.
- We make this linear approximation all the time in our regression analysis, but it is worth keeping in mind. It is partially testable in a few ways:
- Bin the continuous Dit into quartiles {D̃it,k}_{k=1}^{4} and estimate the effect across those groups:
Yit = αi + γt + Σ_{t=1,t≠t0}^{T} Σ_{k=1}^{4} δt,k D̃it,k + eit. (3)
- What does the ordering of δt,k look like? Is it at least monotonic?
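The quartile-binning check in (3) is straightforward to implement. A sketch with two periods for simplicity (numbers are illustrative): the true dose response is linear, so the binned means of the within-unit changes should be monotone and roughly evenly spaced.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
x = rng.uniform(0, 1, n)                    # continuous exposure D_i
# Within-unit change: common time shift (0.4) + linear dose response (slope 2.0).
# Unit effects alpha_i have already differenced out of the change.
dy = 0.4 + 2.0 * x + rng.normal(0, 0.5, n)

# Assign each unit to a quartile bin of exposure and compare mean changes
cuts = np.quantile(x, [0.25, 0.5, 0.75])
q = np.searchsorted(cuts, x, side="right")  # bin index 0..3
bin_means = np.array([dy[q == k].mean() for k in range(4)])
gaps = np.diff(bin_means)
print(bin_means.round(2))                   # increasing, with roughly equal gaps (~0.5)
```

Monotone, evenly spaced bin means are consistent with the linear functional form; uneven or non-monotone ordering would be a warning sign.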
Berger, Turner and Zwick implementation of linearity test
Takeaway
- When you have a continuous exposure measure, it can be intuitive and useful to present binned means for “high” and “low” groups
- However, it is best to also present regression coefficients that exploit the full range of the continuous measure, so that people don’t think you’re data mining
- Consider examining for non-monotonicities in your policy exposure measure
- This paper still has only one “shock” – one policy time period for implementation
Bailey and Goodman-Bacon (2015)
- Paper studies the impact of the rollout of Community Health Centers (CHCs) on mortality
- Idea is that CHCs can help lower mortality (esp. among the elderly) by providing accessible preventative care
- Exploit timing of implementation of CHCs:
Our empirical strategy uses variation in when and where CHC programs were established to quantify their effects on mortality rates. The findings from two empirical tests support a key assumption of this approach—that the timing of CHC establishment is uncorrelated with other determinants of changes in mortality.
- Issue is that CHC locations are not randomly chosen
- Since CHCs are started in different places in different time periods, we estimate effects in event-time, e.g. relative to initial rollout.
Negative effect on mortality
Negative effect on mortality, particularly among elderly
Key takeaways
- Since the policy changes are staggered, we are less worried about the effect being driven by one confounding macro shock.
- Easier to defend a story that has effects across different timings
- Also allows us to test for heterogeneity in the time series
- Still makes the exact same identifying assumptions – parallel trends in the absence of changes
But a big issue emerges when we exploit differential timing
- We have been extrapolating from the simple pre-post, treatment-control setting to broader cases
- e.g. multiple time periods of treatment
- In fact, in some applications, the policy eventually hits everyone – we are just exploiting differential timing.
- If we run the “two-way fixed effects” model for these types of DinD,
Yit = αi + γt + βDD Dit + eit (4)
what comparisons are we doing once we have lots of timings?
- Key point: is our estimator mapping to our estimand?
- Well, what’s our estimand?
What is our estimand with staggered timings?
- There is a huge host of papers touching on this question
- Callaway and Sant’Anna (2020) propose the following building-block estimand:
τATT(g, t) = E(Yit(1) − Yit(0) | Dit = 1 ∀ t ≥ g), (5)
the ATT in period t for those units whose treatment turns on in period g.
- In the 2x2 case, this was exactly our effect!
- This paper assumes absorbing treatment, but this can be weakened in other papers (de Chaisemartin and d’Haultfoeuille (2020) discuss this)
- It seems very reasonable that for our overall estimand, we want some weighted combination of these ATTs
- Callaway and Sant’Anna (2020) highlight two ways to identify the above estimand:
1. Parallel trends of the treatment group with a group that is “never-treated”
2. Parallel trends of the treatment group with the group of the “not yet treated”
- Using these building blocks, C&S provide a very natural set of potential ways to aggregate these estimands up
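The group-time estimand in (5) can be estimated with simple comparisons of means, e.g. against the not-yet-treated. A sketch on simulated staggered adoption (this shows the comparison-of-means logic only, not the full Callaway and Sant'Anna estimator with covariates or aggregation):

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 3000, 4
g = rng.choice([2, 3, 99], n)                 # adoption period per unit; 99 = never treated
alpha = rng.normal(0, 1, n)
gamma = np.array([0.0, 0.2, 0.5, 0.9])        # common time effects

def true_tau(gi, t):                          # effect grows with time since adoption
    return max(t - gi + 1, 0)

Y = np.empty((n, T))
for t in range(1, T + 1):
    tau_t = np.array([true_tau(gi, t) for gi in g])
    Y[:, t - 1] = alpha + gamma[t - 1] + tau_t + rng.normal(0, 0.5, n)

def att(gg, t):
    """ATT(g, t): change from period g-1 to t, adopters at g vs. the not-yet-treated at t."""
    base, cur = gg - 2, t - 1                 # 0-indexed columns for periods g-1 and t
    treated, nyt = g == gg, g > t
    return (Y[treated, cur] - Y[treated, base]).mean() - (Y[nyt, cur] - Y[nyt, base]).mean()

print(round(att(2, 2), 2), round(att(2, 3), 2), round(att(3, 3), 2))  # true values: 1, 2, 1
```

Each ATT(g, t) uses only clean comparisons: units already treated by t never serve as controls.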
Wait, what happened to TWFE?
- It turns out that the logic of the TWFE does not naturally extend to differential timings
- Recall from our discussion of linear regression that regression is great because it does a variance-weighted approximation:
τ = E(σ²D(Wi) τ(Wi)) / E(σ²D(Wi)),  where σ²D(Wi) = E((Di − E(Di | Wi))² | Wi)
- It turns out that in the panel setting with staggered timings, these weights are not necessarily positive
- Key insight from several papers: with staggered timings + heterogeneous effects, the TWFE approach to DinD (both using a single pooled estimator, or using an event study) can put large negative weight on certain groups’ estimands, and large positive weight on others
- Serious issue for interpretability
- Some example papers: Borusyak and Jaravel (2017), de Chaisemartin and D’Haultfœuille (2020), Goodman-Bacon (2019), Sun and Abraham (2020)
- Key point: this is solvable. It is merely an artifact of being overly casual with the estimator definition
Goodman-Bacon 2x2 comparisons
- Consider two staggered treatments and a never-treated group
- What does the TWFE estimator estimate?
Goodman-Bacon 2x2 comparisons
- Four potential comparisons can be made
- It turns out that the (pooled) TWFE DD estimator is a weighted average of all 2x2 comparisons
- These weights end up putting a high degree of weight on units treated in the middle of the sample (since they have the highest variance in the treatment indicator!)
Goodman-Bacon 2x2 comparisons
- The weighting becomes problematic if the effects vary over time – if the effects are instantaneous and time-invariant, the weights are all positive
- However, time-varying effects create bad counterfactual groups, and create negative weights
- Goodman-Bacon provides a way to assess the weights in a given TWFE design
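The problem is easy to produce in a deterministic example. A sketch with two adoption groups and effects that grow with time since adoption (parameters illustrative; no noise, one representative unit per group, so the arithmetic is exact): here the pooled TWFE coefficient comes out at essentially zero even though every treated observation has a strictly positive effect.

```python
import numpy as np

# Two adoption groups, T = 6: "early" adopts at t = 2, "late" at t = 5.
# Treatment effects grow with time since adoption: 1, 2, 3, ... per period treated.
T = 6
t_idx = np.arange(1, T + 1)
adopt = np.array([2, 5])                          # one representative unit per group
D = (t_idx[None, :] >= adopt[:, None]).astype(float)
Y = np.maximum(t_idx[None, :] - adopt[:, None] + 1, 0).astype(float)  # alpha_i = gamma_t = 0

def within(m):                                    # two-way demeaning
    return m - m.mean(1, keepdims=True) - m.mean(0, keepdims=True) + m.mean()

beta_twfe = (within(D) * within(Y)).sum() / (within(D) ** 2).sum()
att_avg = Y[D == 1].mean()                        # average effect across treated cells

print(beta_twfe, att_avg)                         # TWFE far below the average treated effect
```

The late adopters' late periods are compared against the early adopters, whose effects are still growing, which generates the negative weighting.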
What to do with staggered timing in DinD?
- There’s really no reason to use the baseline TWFE with staggered timings
- A perfect example wherein the estimator does not generate an estimate that maps to a meaningful estimand
- There are several approaches proposed in the literature that are just as good!
- Sun and Abraham (2020)
- de Chaisemartin and d’Haultfoeuille (2020)
- Borusyak and Jaravel (2017)
- Callaway and Sant’Anna (2020)
- These are all robust to this issue. I find Callaway and Sant’Anna quite intuitive, but your circumstances may vary slightly. Key piece to keep in mind that differs a bit across papers:
- Is my treatment absorbing?
- Irrespective of the exact paper, the key point is that we are generating a counterfactual and need to be careful that our estimator does so correctly
Finally, a discussion on inference
- First, let’s start with the old-school fact that you must know if you are working with panel data and DinD
- You must cluster on the unit of policy implementation if possible. See Bertrand, Duflo and Mullainathan (2004)
- Why? Outcomes and the treatment tend to be severely autocorrelated within unit
- I say “if possible” since clearly in Card and Krueger that is infeasible
- If the policy variation is implemented at the industry level, you should not cluster at the firm level
- If the policy variation is implemented at the firm level, you cannot just use heteroskedasticity-robust standard errors
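The Bertrand, Duflo and Mullainathan point can be seen with plain sandwich algebra. A sketch with a unit-level placebo policy and serially correlated errors (in practice you would use your regression package's clustered SEs; this just shows why the unclustered ones are far too small):

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 200, 10
d = np.repeat((rng.random(n) < 0.5).astype(float), T)  # unit-level placebo "policy"
u = np.repeat(rng.normal(0, 1, n), T)                  # unit component -> serial correlation
y = u + rng.normal(0, 1, n * T)                        # no true treatment effect

X = np.column_stack([np.ones(n * T), d])
XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)                        # OLS residuals

# Heteroskedasticity-robust variance: treats all n*T observations as independent
V_r = XtX_inv @ ((X.T * e**2) @ X) @ XtX_inv

# Cluster-robust variance: sum the scores X_it * e_it within each unit first
ids = np.repeat(np.arange(n), T)
S = np.column_stack([np.bincount(ids, weights=X[:, k] * e) for k in range(2)])
V_c = XtX_inv @ (S.T @ S) @ XtX_inv

se_robust, se_cluster = np.sqrt(V_r[1, 1]), np.sqrt(V_c[1, 1])
print(round(se_robust, 3), round(se_cluster, 3))       # cluster SE is much larger
```

With T periods and within-unit correlation ρ, the clustered SE is roughly sqrt(1 + (T − 1)ρ) times the unclustered one, so ignoring clustering badly overstates precision.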
Small clusters
- The Card and Krueger case was too extreme, but there are approaches for dealing with a small number of clusters
- These approaches typically involve bootstrapping, and can handle a small number of treated groups relative to the overall population
- See Andreas Hagemann’s work for a place to start
Uniform confidence intervals
- Finally, when considering event study graphs, pre-trend graphs should use uniform confidence intervals, rather than pointwise confidence intervals
- Advocated for by Freyaldenhoven et al (2018)
- Code available here thanks to Ryan Kessler: https://github.com/paulgp/simultaneous_confidence_bands
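The difference is just the critical value: instead of 1.96, use the 95th percentile of the maximum absolute t-statistic across all event-time coefficients, simulated from the estimated covariance matrix. A sketch of that sup-t construction (the covariance matrix here is made up for illustration; the linked code implements this properly):

```python
import numpy as np

rng = np.random.default_rng(6)
k = 8                                             # number of event-time coefficients
# Illustrative covariance of the coefficient estimates: SEs of 0.2, correlation 0.5
Sigma = 0.04 * (0.5 * np.eye(k) + 0.5 * np.ones((k, k)))
se = np.sqrt(np.diag(Sigma))

# Simulate the max |t| over the k coefficients under joint normality
draws = rng.multivariate_normal(np.zeros(k), Sigma, size=100_000)
c_uniform = np.quantile(np.abs(draws / se).max(axis=1), 0.95)

print(round(c_uniform, 2))                        # > 1.96: uniform bands are wider
# Band for coefficient j: estimate_j +/- c_uniform * se_j  (vs. +/- 1.96 * se_j pointwise)
```

The uniform band covers the whole coefficient path with 95% probability, which is what a visual "no pre-trend" claim actually requires.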
Conclusion
- Difference-in-differences is hugely powerful in applied settings
- Does not require random assignment, but rather implementation of policies that differentially impact different groups and are not confounded by other shocks at the same time.
- Can be a great application of big data, with convincing graphs that highlight your application
- Also allows for partial tests of identifying assumptions
- Worth carefully thinking about what your identifying assumptions are in each setting, and transparently highlighting them.
- Important to note that this always identifies a relative effect; to aggregate, you will typically need a model and additional strong assumptions (see Auclert, Dobbie and Goldsmith-Pinkham (2019) for an example in a macro setting).
My takeaways from the new literature
- Beware weak tests of pre-trends. Consider using R&R’s partial identification tests to assess robustness of results.
- Do not worry about the new literature on staggered timings if you only have one timing!
- Think carefully about your estimand if you’re using a staggered-timing DinD – what’s your counterfactual in each case?
- Software exists for many of these papers. This is doable!
- When plotting confidence intervals in event studies, you should plot uniform confidence intervals.