Lesson 4 - Diff in diff
ECON 398
Jonathan Graves
2025-03-11
Regression has a Problem
Regression is a powerful tool, but it has a significant drawback: it’s really hard to believe the
conditional independence assumption.
• In many situations, we’d like to identify an alternative assumption that will still allow
us to identify a causal effect of interest.
• Unfortunately, this is quite difficult in general: something very specific needs to hold,
and it’s usually hard to find anything more flexible than the CIA.
Our Framework
What we need is something to fix step (2). This requires situations that have a special feature:
time.
All econometric models rely on a special feature to create an identification condition (2).
In DD models, the identification condition relates to how time and treatment interact.
The Basics
• Observation: person 𝑖 observed in a particular time period 𝑡. This gives us the (horrible)
notation 𝑗 = 𝑖𝑡.
• There is a particular time period when person 𝑖 is treated; let’s call it 𝑔𝑖 (for “go-time”).
• Each person is also in a group: 𝐺𝑖𝑡 .
– Two groups: “never-treated” and “eventually-treated” groups.
– Medical framework: “control” and “treatment” groups.
– If 𝐺𝑖𝑡 = 0 this means that observation 𝑖𝑡 was in the control group.
• We also measure an outcome 𝑌𝑖𝑡 for person 𝑖 at time 𝑡.
At this point, there’s too much going on, so we’re going to simplify things.
Setting Up DD
1. One “go-time”: 𝑔𝑖 = 𝑔
2. No changing groups: 𝐺𝑖𝑡 = 𝐺𝑖
3. Two groups: 𝐺𝑖 = 1 or 0
4. Only treated if both in the eventually-treated group and treatment has started
• 𝐷𝑖𝑡 = 1 ⟺ 𝐺𝑖 = 1 and 𝑇𝑡 = 1.
These also mean we can write a group dummy, 𝐺𝑖, and a “before-after” dummy,
𝑇𝑡 = 𝟙(𝑡 > 𝑔).
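As a minimal sketch of how these dummies fit together, the snippet below builds a toy panel in plain Python. The people, the periods, and the go-time value are hypothetical and chosen only for illustration.

```python
# Hypothetical toy panel: names and values are illustrative, not from the lecture.
GO_TIME = 2  # g: the single go-time shared by everyone (assumption 1)

# Group membership G_i is fixed over time and takes two values (assumptions 2 and 3).
group = {"ana": 1, "ben": 0}
periods = [1, 2, 3, 4]


def after(t, g=GO_TIME):
    """Before-after dummy T_t = 1(t > g)."""
    return 1 if t > g else 0


def treated(G_i, t, g=GO_TIME):
    """Treatment dummy D_it: equals 1 exactly when G_i = 1 and T_t = 1 (assumption 4)."""
    return G_i * after(t, g)


panel = [
    {"i": i, "t": t, "G": G_i, "T": after(t), "D": treated(G_i, t)}
    for i, G_i in group.items()
    for t in periods
]

for row in panel:
    print(row)
```

Because 𝐷𝑖𝑡 is just the product 𝐺𝑖 × 𝑇𝑡 here, only one of the four group-by-period cells is ever treated.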
Connecting to POM
The last one is important: this is the difference between being treated and not being treated
at time 𝑡 for person 𝑖.
Δ𝑖𝑡 is Different
Cross-Sectional Difference
𝐴𝑇𝐸𝜏 = 𝐸[Δ𝑖𝜏 | 𝑡 = 𝜏]
However, in this set-up:
            Before         After
Control     Not treated    Not treated
Treatment   Not treated    Treated
Differences
𝜃𝐷𝐷 = 𝛿 = 𝛿1 − 𝛿0
(He said the thing!)
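A small numerical sketch of the difference-in-differences comparison, using four made-up cells of outcomes keyed by (𝐺, 𝑇). It assumes 𝛿1 is the before-after change in the treatment group and 𝛿0 the before-after change in the control group; the data are hypothetical.

```python
from statistics import mean

# Hypothetical outcomes keyed by (G, T); the numbers are made up for illustration.
outcomes = {
    (1, 0): [10.0, 12.0, 11.0],  # treatment group, before
    (1, 1): [16.0, 18.0, 17.0],  # treatment group, after
    (0, 0): [8.0, 9.0, 10.0],    # control group, before
    (0, 1): [10.0, 11.0, 12.0],  # control group, after
}

cell_mean = {cell: mean(ys) for cell, ys in outcomes.items()}

delta_1 = cell_mean[(1, 1)] - cell_mean[(1, 0)]  # change over time, treatment group
delta_0 = cell_mean[(0, 1)] - cell_mean[(0, 0)]  # change over time, control group
theta_dd = delta_1 - delta_0                     # difference in differences

print(delta_1, delta_0, theta_dd)
```

With these numbers the treatment group changes by 6 and the control group by 2, so the DD comparison is 4.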
In the POM
So, our comparison 𝛿 is:
𝜃𝐷𝐷 = 𝛿 = 𝐴𝑇𝑇 + 𝐵
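One way to see where this decomposition comes from is sketched below. It assumes the usual potential-outcomes notation, with 𝑌𝑖𝑡(1) and 𝑌𝑖𝑡(0) the treated and untreated outcomes, and it adds and subtracts the treatment group's unobserved untreated outcome in the after period.

```latex
\begin{align*}
\delta_1 &= E[Y_{it}(1) \mid G_i = 1, T_t = 1] - E[Y_{it}(0) \mid G_i = 1, T_t = 0] \\
\delta_0 &= E[Y_{it}(0) \mid G_i = 0, T_t = 1] - E[Y_{it}(0) \mid G_i = 0, T_t = 0] \\
\theta_{DD} &= \delta_1 - \delta_0 \\
&= \underbrace{E[Y_{it}(1) - Y_{it}(0) \mid G_i = 1, T_t = 1]}_{ATT} \\
&\quad + \underbrace{\big(E[Y_{it}(0) \mid G_i = 1, T_t = 1] - E[Y_{it}(0) \mid G_i = 1, T_t = 0]\big) - \delta_0}_{B}
\end{align*}
```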
What is 𝐵?
It’s not your classical selection bias, but it’s something else important:
• 𝐵 is the difference between what would have happened to the treatment group over time,
without treatment, and what actually happened to the control group over time.
• 𝐵 = 0 is called the common trends assumption (CTA).
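To make 𝐵 concrete, here is a sketch with made-up mean potential outcomes. It is only possible because the example invents the treatment group's untreated outcome in the after period; in real data that counterfactual is never observed, which is why 𝐵 = 0 has to be assumed.

```python
# Hypothetical mean potential outcomes; all numbers are made up for illustration.
# y0_* are untreated outcomes, y1_* is the treated outcome.
y0_treat_before, y0_treat_after = 10.0, 13.0  # untreated trend of +3 in the treatment group
y0_ctrl_before, y0_ctrl_after = 8.0, 10.0     # untreated trend of +2 in the control group
y1_treat_after = 18.0                         # what the treatment group actually experiences

# True effect on the treated, and the gap between the two untreated trends.
att = y1_treat_after - y0_treat_after
b = (y0_treat_after - y0_treat_before) - (y0_ctrl_after - y0_ctrl_before)

# What the researcher can actually compute from observed data.
delta_1 = y1_treat_after - y0_treat_before
delta_0 = y0_ctrl_after - y0_ctrl_before
theta_dd = delta_1 - delta_0

print(att, b, theta_dd)
```

Here the untreated trends differ by 1, so the DD comparison (6) overstates the ATT (5) by exactly 𝐵.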
In a Regression
𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑇𝑡 + 𝛽3 𝐺𝑖 × 𝑇𝑡 + 𝜖𝑖𝑡
1. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 1] = 𝛽0 + 𝛽1 + 𝛽2 + 𝛽3
2. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 0] = 𝛽0 + 𝛽1
3. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 1] = 𝛽0 + 𝛽2
4. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 0] = 𝛽0
• 𝛽1 : group effect
• 𝛽2 : time effect
• Where is the CTA?
𝜃𝐷𝐷 in a Regression
𝛿1 = 𝛽0 + 𝛽1 + 𝛽2 + 𝛽3 − (𝛽0 + 𝛽1) = 𝛽2 + 𝛽3
𝛿0 = 𝛽0 + 𝛽2 − (𝛽0) = 𝛽2
⟹ 𝜃𝐷𝐷 = 𝛿 = 𝛿1 − 𝛿0 = (𝛽2 + 𝛽3) − 𝛽2 = 𝛽3
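The algebra above can be checked numerically. The sketch below fits the regression on a small made-up dataset and confirms that the interaction coefficient equals the difference in differences of the cell means. The `ols` helper is a hypothetical pure-Python solver written for this example, not a library routine; in practice you would use a statistics package.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical observations (G, T, Y); the numbers are made up for illustration.
data = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 0, 11.0),
    (1, 1, 16.0), (1, 1, 18.0), (1, 1, 17.0),
    (0, 0, 8.0), (0, 0, 9.0), (0, 0, 10.0),
    (0, 1, 10.0), (0, 1, 11.0), (0, 1, 12.0),
]

# Columns: constant, G, T, and the interaction G x T.
X = [[1.0, G, T, G * T] for G, T, _ in data]
y = [yi for _, _, yi in data]

b0, b1, b2, b3 = ols(X, y)
print(b0, b1, b2, b3)
```

The cell means in this toy data are 11 and 17 for the treatment group and 9 and 11 for the control group, so the fitted 𝛽3 should match (17 − 11) − (11 − 9) = 4.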
You can create an appropriate version of the “standard” DD model for every time period:
Figure 1: Vienna hospital mortality calculations from Goodman-Bacon and Johnson (2023)
1. Show that there are common pre-trends: the trends are common before treatment.
2. Identify possible confounders or reasons for a trend violation, then rule them out.
3. Perform placebo tests to show that the result is not a spurious time correlation.
Figure 2: Sir Thomas Eyeball at Work
The more “scientific” version of Sir Thomas is to run a regression like:
𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑡 + 𝛽3 𝐺𝑖 × 𝑡 + 𝜖𝑖𝑡
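A minimal sketch of this pre-trend check, using only pre-treatment periods. Because the model interacts both the intercept and the slope with 𝐺𝑖, its OLS fit is the same as fitting a separate line to each group, so 𝛽3 is the treatment-group slope minus the control-group slope. The data are hypothetical, and the sketch omits the standard errors a formal test would need.

```python
def slope(ts, ys):
    """OLS slope of y on t from a simple linear regression."""
    t_bar = sum(ts) / len(ts)
    y_bar = sum(ys) / len(ys)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den


# Hypothetical pre-treatment data only; the values are made up for illustration.
pre_periods = [1, 2, 3, 4]
y_treat = [10.0, 11.1, 11.9, 13.0]  # treatment group outcomes by period
y_ctrl = [7.0, 8.0, 9.1, 9.9]       # control group outcomes by period

beta_2 = slope(pre_periods, y_ctrl)            # control-group time trend
beta_3 = slope(pre_periods, y_treat) - beta_2  # difference in pre-trends

print(beta_2, beta_3)
```

A 𝛽3 close to zero is what common pre-trends would look like; a clearly nonzero value suggests the groups were already diverging before treatment.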
1. You think that the DD estimate is “polluted” by some other event which affected the
treated group at the time of treatment.
2. You find a “placebo” group which is also plausibly affected by the “pollutant” but is not
affected by treatment.
3. You run your standard difference-in-differences analysis but using the placebo group
instead of the actual group.
Conditional Difference-in-Differences
• Sort of.
• People who are inexperienced with DD models often try to treat them like regular
regression models, by adding lots of covariates.
• This isn’t necessary, nor is it a good idea.
“Add Controls”
𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑇𝑡 + 𝛽3 𝐺𝑖 × 𝑇𝑡 + 𝛽4 𝑋𝑖 + 𝜖𝑖𝑡
• The time trends are not affected by 𝑋𝑖 . So how do they help with the common trends
assumption?
• The treatment effect (𝛽3 ) also does not vary by 𝑋𝑖 . So what is this doing here?
We want:
𝐴𝑇𝑇 = ∑_𝑥 𝐴𝑇𝑇(𝑥) 𝑃(𝑥 | 𝐷𝑖𝑡 = 1)
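A sketch of this weighted sum, assuming a single discrete covariate and made-up cell means. Each 𝐴𝑇𝑇(𝑥) is computed as a separate 2x2 DD within the cell where 𝑋𝑖 = 𝑥, then weighted by the share of treated observations with that value. The covariate labels and counts are hypothetical.

```python
# Hypothetical cell means by covariate value x, keyed by (G, T); all numbers are made up.
cell_means = {
    "young": {(1, 0): 10.0, (1, 1): 15.0, (0, 0): 8.0, (0, 1): 10.0},
    "old": {(1, 0): 20.0, (1, 1): 22.0, (0, 0): 18.0, (0, 1): 19.0},
}

# Hypothetical number of treated observations (D_it = 1) with each value of x.
n_treated = {"young": 30, "old": 10}


def dd(m):
    """2x2 difference-in-differences from a dict of cell means keyed by (G, T)."""
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# ATT(x): one DD per covariate cell.
att_x = {x: dd(m) for x, m in cell_means.items()}

# P(x | D_it = 1): distribution of x among the treated.
total = sum(n_treated.values())
p_x = {x: n / total for x, n in n_treated.items()}

att = sum(att_x[x] * p_x[x] for x in att_x)
print(att_x, p_x, att)
```

Note that the control-group trend differs across the two cells (+2 versus +1), which a single pooled DD would not account for.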
Conditional Common Trends
• This is your regular common trends assumption, but now specific to the group where
𝑋𝑖 = 𝑥.
• Even if you are not sure that the common trends assumption holds, you may believe that
conditional common trends holds.
• DD with controls can be used to estimate the different ATT(𝑥)s and then combine them into the overall ATT.
• You need 𝑌𝑖𝑡 (0) to be changing over time in the same way in both the treatment and
control groups.
• 𝑋𝑖𝑡 is also changing over time, and is present in different amounts and evolves differently
in each group.
You’re doomed!
Triple Differences
The idea of multiple differencing extends the logic from placebo tests to create new estimators.
𝜃𝐷𝐷 = 𝜏1 + Δ − 𝜏0
• CTA: 𝜏1 = 𝜏0 ⟹ 𝜃𝐷𝐷 = Δ
What happens if there’s a difference in the time effects?
• 𝜏1 = 𝜏0 + 𝛾.
• ⟹ 𝜃𝐷𝐷 = Δ + 𝛾 ≠ Δ
• What we need is a way to isolate the group effect.
• CTA′ : 𝑡1 = 𝑡0, where 𝑡1 and 𝑡0 are the time effects for the placebo and placebo-control groups
⟹ 𝜃′𝐷𝐷 = 𝑡1 + 𝛾 − 𝑡0 = 𝛾
• This is the group effect (or placebo effect in the medical literature).
𝜃𝐷𝐷𝐷 = 𝜃𝐷𝐷 − 𝜃′𝐷𝐷 = Δ + 𝛾 − 𝛾 = Δ
And we’re done! We’ve saved our model, at the expense of introducing a tongue-twisting new
estimator.
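The triple-difference arithmetic can be sketched with two 2x2 tables of made-up cell means: one for the actual treatment and control groups, and one for a placebo and placebo-control pair. The numbers are hypothetical and built so that the treatment effect is 4 and the extra group-specific shift 𝛾 is 1.5.

```python
def dd(m):
    """2x2 difference-in-differences from a dict of cell means keyed by (group, T)."""
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical cell means; all numbers are made up for illustration.
# Main comparison: treatment group (1) versus control group (0).
main = {(1, 0): 10.0, (1, 1): 17.5, (0, 0): 8.0, (0, 1): 10.0}

# Placebo comparison: placebo group (1) gets the same extra shift but no treatment;
# placebo-control group (0) gets neither.
placebo = {(1, 0): 5.0, (1, 1): 9.5, (0, 0): 4.0, (0, 1): 7.0}

theta_dd = dd(main)                      # treatment effect plus the extra shift
theta_dd_placebo = dd(placebo)           # the extra shift alone
theta_ddd = theta_dd - theta_dd_placebo  # triple difference

print(theta_dd, theta_dd_placebo, theta_ddd)
```

The control and placebo-control groups have different time trends here (+2 versus +3), and the estimator still recovers 4, because each DD only differences against its own control.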
Identifying Conditions
1. The control and treatment groups for 𝐺𝑖 have the same time trend.
2. The placebo-control and placebo groups for 𝐻𝑖 have the same time trend.
3. The group (placebo) effect is the same between the two sets of groups.
Comments
• Identifying assumption is not that the placebo and regular controls share a time trend.
• Sometimes, such as in a medical trial, the placebo-control and control groups are the
same.
• Not required, nor important!
• Why not DDDD or DDDDD?
A Family of Models
• For instance, if you had continuous 𝑡, how could you estimate a “change in trend”?
These all use the same logic as DD at a fundamental level, which makes it a powerful and
effective technique.
• Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. “How Much Should We
Trust Differences-in-Differences Estimates?” The Quarterly Journal of Economics 119.1
(2004): 249-275.