
Difference in Differences

ECON 398

Jonathan Graves

2025-03-11
Regression has a Problem

Regression is a powerful tool, but it has a significant drawback: it’s really hard to believe the
conditional independence assumption.

• In many situations, we’d like to identify an alternative assumption that will still allow
us to identify a causal effect of interest.

• Unfortunately, this is quite difficult in general: something very specific needs to hold,
and it’s usually hard to find anything more flexible than the CIA.

Our Framework

Returning to our framework for causal models, we can see that

1. What do you want to know? (ATT)


2. What are you willing to believe? (Not the CIA!)

3. What should you do to find it? (???)


4. Were your beliefs plausible? (???)
5. What did you learn? (Nothing!)

What we need is something to fix step (2). This requires situations that have a special feature:
time.

All econometric models rely on a special feature to create an identification condition (2).

• This is (generically) the CIA.


• However, in specific settings it can be more specific or stated differently.

In DD models, the identification condition relates to how time and treatment interact.

The Basics

The DD model will take place over time.

• Observation: person 𝑖 observed in a particular time period 𝑡. This gives us the (horrible)
  notation 𝑗 = 𝑖𝑡.
• There is a particular time period when person 𝑖 is treated; let's call it 𝑔𝑖 (the "go-time").
• Each person is also in a group: 𝐺𝑖𝑡 .
– Two groups: “never-treated” and “eventually-treated” groups.
– Medical framework: “control” and “treatment” groups.
– If 𝐺𝑖𝑡 = 0 this means that observation 𝑖𝑡 was in the control group.
• We also measure an outcome 𝑌𝑖𝑡 for person 𝑖 at time 𝑡.

At this point, there’s too much going on, so we’re going to simplify things.

Setting Up DD

1. One “go-time”: 𝑔𝑖 = 𝑔
2. No changing groups: 𝐺𝑖𝑡 = 𝐺𝑖
3. Two groups: 𝐺𝑖 = 1 or 0
4. Only treated if both in the eventually-treated group and treatment has started
• 𝐷𝑖𝑡 = 1 ⟺ 𝐺𝑖 = 1 and 𝑇𝑡 = 1.

• This is equivalent to writing 𝐷𝑖𝑡 = 𝑇𝑡 × 𝐺𝑖 .

These also mean we can write 𝐺𝑖 for group (a dummy) and a "before-after" dummy 𝑇𝑡 = 𝟙(𝑡 > 𝑔).

Connecting to POM

The fourth assumption is our link to the POM model:

• 𝑌𝑖𝑡 : outcome variable of interest.


• 𝑌𝑖𝑡 (𝐷𝑖𝑡 ): potential outcomes
– 𝑌𝑖𝑡 (0): outcome for person 𝑖 in time 𝑡 if person 𝑖 was not treated at 𝑡.
– 𝑌𝑖𝑡 (1): outcome for person 𝑖 in time 𝑡 if person 𝑖 was treated at 𝑡.
• 𝐷𝑖𝑡 : our treatment variable
• Δ𝑖𝑡 = 𝑌𝑖𝑡 (1) − 𝑌𝑖𝑡 (0)

The last one is important: this is the difference between being treated and not being treated
at time 𝑡 for person 𝑖.

Δ𝑖𝑡 is Different

Δ𝑖𝑡 = 𝑌𝑖𝑡 (1) − 𝑌𝑖𝑡 (0)

• Both 𝑖 and 𝑡 vary in this set-up!


• 𝐸[Δ𝑖𝑡 |𝑖 = 𝜄]: average over time for a specific person 𝜄
• 𝐸[Δ𝑖𝑡 |𝑡 = 𝜏 ]: average over people for a specific time 𝜏
• The LIE (law of iterated expectations) says we can just average these averages to get the ATE, but there are conceptually more ways to do that average than before!

Cross-Sectional Difference

We’ll call the cross-sectional difference:

𝐴𝑇 𝐸𝜏 = 𝐸[Δ𝑖𝜏 |𝑡 = 𝜏 ]

• This one is going to be pretty important.


• It’s the ATE at time 𝜏 .
• In our original model, with only one 𝑡, this was our old average treatment effect.

Post Hoc, Ergo Propter Hoc

Is time alone enough? No.

• Imagine everyone was in the eventually-treated group (𝐺𝑖 = 1).

𝜃𝐵𝐴 = 𝐸[𝑌𝑖𝑡 |𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝑇𝑡 = 0]

• Calculate the difference in outcome over time. This is:

𝜃𝐵𝐴 = 𝐸[𝑌𝑖𝑡 (1)|𝐷𝑖𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 0] = 𝐴𝑇 𝑇 + 𝐵

However, in this set-up:

𝐵 = 𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 0]

• Is there a reason why 𝐵 ≠ 0?

• Think carefully about what’s going on here!

. . .

• If there’s a time trend in that potential outcome, we have a serious problem


• The treatment 𝐷𝑖𝑡 and the time effect 𝑇𝑡 move together: treatment only switches on as time passes, so the two cannot be separated.
• Adding covariates does not really help either!

Back of the Envelope Model

Table 1: The structure of the DD model

|           | Before      | After       |
|-----------|-------------|-------------|
| Control   | Not treated | Not treated |
| Treatment | Not treated | Treated     |

• Just do the simplest thing possible.


• Calculate average outcomes for each group.

Differences

𝛿1 = 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 0]

𝛿0 = 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 0]

The comparison (the difference in differences; he said the thing!) is:

𝜃𝐷𝐷 = 𝛿 = 𝛿1 − 𝛿0

In the POM

𝜃𝐷𝐷 = 𝐸[𝑌𝑖𝑡 (1)|𝐺𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 0]

−(𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 0])

Let’s add and subtract:

−𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1] + 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1] = 0

The first part of the expression is:

𝐸[𝑌𝑖𝑡 (1)|𝐺𝑖 = 1, 𝑇𝑡 = 1]−𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1] = 𝐸[𝑌𝑖𝑡 (1)|𝐷𝑖𝑡 = 1]−𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1] = 𝐴𝑇 𝑇

The remaining part is:

𝐵 = 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 0]


−(𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 0])

. . .
So, our comparison 𝛿 is:

𝜃𝐷𝐷 = 𝛿 = 𝐴𝑇 𝑇 + 𝐵

What is 𝐵?

It’s not your classical selection bias, but it’s something else important:

𝐵 = 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 0]    (change in 𝑌𝑖𝑡 (0) for the treatment group)

− (𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 0])    (change in 𝑌𝑖𝑡 (0) for the control group)

• 𝐵 is the difference between what would have happened to the treatment group over time, without treatment, and what actually happened to the control group over time.
• 𝐵 = 0 is called the common trends assumption.

Definition 0.0.1 (Common Trends Assumption). A difference-in-differences model meets the common trends assumption when:

𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 0]
= 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 0]

In a Regression

DD is an excellent fit for regression, but we need to be careful. Consider:

𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑇𝑡 + 𝛽3 𝐺𝑖 × 𝑇𝑡 + 𝜖𝑖𝑡

• This model is saturated so 𝐸[𝜖𝑖𝑡 |𝑇𝑡 , 𝐺𝑖 ] = 0.


• Now, let’s look at the four terms from our “back of the envelope” analysis, earlier, after
plugging in the equation.

1. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 1] = 𝛽0 + 𝛽1 + 𝛽2 + 𝛽3
2. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 0] = 𝛽0 + 𝛽1
3. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 1] = 𝛽0 + 𝛽2
4. 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 0] = 𝛽0

• 𝛽1 : group effect
• 𝛽2 : time effect
• Where is the CTA?

𝜃𝐷𝐷 in a Regression

𝛿1 = 𝛽0 + 𝛽1 + 𝛽2 + 𝛽3 − (𝛽0 + 𝛽1 )

= 𝛽2 + 𝛽3

𝛿0 = 𝛽0 + 𝛽2 − (𝛽0 ) = 𝛽2

⟹ 𝜃𝐷𝐷 = 𝛿 = 𝛿1 − 𝛿0 = (𝛽2 + 𝛽3 ) − 𝛽2 = 𝛽3

Blowing Your Minds

• 𝑡 does not have to be time.


• 𝑇𝑡 was not special

. . .
You can create an appropriate version of the “standard” DD model for every time period:

𝐷𝐷𝜏 = (𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑡 = 𝜏 ] − 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑡 = 𝑔])


−(𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑡 = 𝜏 ] − 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑡 = 𝑔])

• You can even vary the condition 𝑡 = 𝑔 to be 𝑡 ≤ 𝑔 or any other combination.

Figure 1: Vienna hospital mortality calculations from Goodman-Bacon and Johnson (2023)

Testing the Common Trends

CTA is fundamentally untestable without additional assumptions: it's a statement about the evolution of an unobservable.

• However, you can still do indirect tests.


• These usually rely on other data or information.

1. Show that there are common pre-trends: the trends are common before treatment.
2. Identify possible confounders or reasons for a trend violation, then rule them out.
3. Perform placebo tests to show that the result is not a spurious time correlation.

Method 1: Common Pre-Trends

1. Plot evolution of 𝑌𝑖𝑡 over time.


2. Look at trends.
3. "Eh, eh, looks similar, right?"

Figure 2: Sir Thomas Eyeball at Work

The more “scientific” version of Sir Thomas is to run a regression like:

𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑡 + 𝛽3 𝐺𝑖 × 𝑡 + 𝜖𝑖𝑡

for 𝑡 ≤ 𝑔 and then go “eh, eh” at the coefficient 𝛽3 .

Method 2: Identifying and Ruling Out Violations

1. Identify reason why the common trends assumption might be violated.


2. Determine some kind of prediction or pattern that the violation suggests should be
present in observable data.
3. Check to see if the pattern is there. Hopefully it’s not.

Method 3: Placebo Tests

1. You think that the DD estimate is “polluted” by some other event which affected the
treated group at the time of treatment.
2. You find a “placebo” group which is also plausibly affected by the “pollutant” but is not
affected by treatment.
3. You run your standard difference-in-differences analysis but using the placebo group instead of the actual group.

Hopefully you don’t get a significant result.

Conditional Difference-in-Differences

What about covariates? Are they useful?

• Sort of.
• People who are inexperienced with DD models often try to treat them like regular regression models, by adding lots of covariates.
• This isn't necessary, nor is it a good idea.

Let’s see why.

“Add Controls”

𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑇𝑡 + 𝛽3 𝐺𝑖 × 𝑇𝑡 + 𝛽4 𝑋𝑖 + 𝜖𝑖𝑡

• The time trends are not affected by 𝑋𝑖 . So how do they help with the common trends
assumption?
• The treatment effect (𝛽3 ) also does not vary by 𝑋𝑖 . So what is this doing here?

The answer is “not much.”

A Viable Approach: Saturation

We want:

𝐴𝑇 𝑇 = 𝐸[𝑌𝑖𝑡 (1)|𝐷𝑖𝑡 = 1] − 𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1]

𝐴𝑇 𝑇 = ∑ₓ 𝐴𝑇 𝑇 (𝑥)𝑃 (𝑥|𝐷𝑖𝑡 = 1)

= ∑ₓ 𝑃 (𝑥|𝐷𝑖𝑡 = 1)𝐸[𝑌𝑖𝑡 (1)|𝐷𝑖𝑡 = 1, 𝑋𝑖 = 𝑥] − 𝑃 (𝑥|𝐷𝑖𝑡 = 1)𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1, 𝑋𝑖 = 𝑥]

= ∑ₓ 𝑃 (𝑥|𝐷𝑖𝑡 = 1)𝐸[𝑌𝑖𝑡 (1)|𝐷𝑖𝑡 = 1, 𝑋𝑖 = 𝑥] − ∑ₓ 𝑃 (𝑥|𝐷𝑖𝑡 = 1)𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1, 𝑋𝑖 = 𝑥]
  (the first sum is observable)

= 𝐸[𝑌𝑖𝑡 (1)|𝐷𝑖𝑡 = 1] − ∑ₓ 𝑃 (𝑥|𝐷𝑖𝑡 = 1)𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1, 𝑋𝑖 = 𝑥]

• 𝑃 (𝑥|𝐷𝑖𝑡 = 1) is observable from data, but 𝐸[𝑌𝑖𝑡 (0)|𝐷𝑖𝑡 = 1, 𝑋𝑖 = 𝑥] is not


• However, this is exactly what our common trends assumption was trying to find in
general.

Conditional Common Trends

𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 1, 𝑋𝑖 = 𝑥] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 1, 𝑇𝑡 = 0, 𝑋𝑖 = 𝑥] =

𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 1, 𝑋𝑖 = 𝑥] − 𝐸[𝑌𝑖𝑡 (0)|𝐺𝑖 = 0, 𝑇𝑡 = 0, 𝑋𝑖 = 𝑥]

• This is your regular common trends assumption, but now specific to the group where
𝑋𝑖 = 𝑥.
• Even if you are not sure that the common trends assumption holds, you may believe that conditional common trends holds.
• DD with controls can then be used to identify the conditional ATTs, 𝐴𝑇 𝑇 (𝑥), which are estimated cell-by-cell and aggregated, as sketched below.

Time Varying Controls?

𝑌𝑖𝑡 = 𝛽0 + 𝛽1 𝐺𝑖 + 𝛽2 𝑇𝑡 + 𝛽3 𝐺𝑖 × 𝑇𝑡 + 𝛽4 𝑋𝑖𝑡 + 𝜖𝑖𝑡

• You need 𝑌𝑖𝑡 (0) to be changing over time in the same way in both the treatment and
control groups.
• 𝑋𝑖𝑡 is also changing over time, and is present in different amounts and evolves differently
in each group.

You’re doomed!

Triple Differences

The idea of multiple differencing extends the logic from placebo tests to create new estimators.

𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 1, 𝑇𝑡 = 0] = 𝜏1 + Δ

𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝐺𝑖 = 0, 𝑇𝑡 = 0] = 𝜏0

𝜃𝐷𝐷 = 𝜏1 + Δ − 𝜏0

• CTA: 𝜏1 = 𝜏0 ⟹ 𝜃𝐷𝐷 = Δ

What happens if there’s a difference in the time effects?

• 𝜏1 = 𝜏0 + 𝛾.
• ⟹ 𝜃𝐷𝐷 = Δ + 𝛾 ≠ Δ
• What we need is a way to isolate the group effect.

The idea is just to do DD again, but specific to the group effect:

• You want a pair of groups, 𝐻𝑖 = 0, 1 which


1. Share a time trend, and
2. One of them shares the group effect (𝐻𝑖 = 1)
• These are the placebo and the placebo-control groups.

𝐸[𝑌𝑖𝑡 |𝐻𝑖 = 1, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝐻𝑖 = 1, 𝑇𝑡 = 0] = 𝑡1 + 𝛾

𝐸[𝑌𝑖𝑡 |𝐻𝑖 = 0, 𝑇𝑡 = 1] − 𝐸[𝑌𝑖𝑡 |𝐻𝑖 = 0, 𝑇𝑡 = 0] = 𝑡0

• CTA′ : 𝑡0 = 𝑡1

⟹ 𝜃′𝐷𝐷 = 𝑡1 + 𝛾 − 𝑡0 = 𝛾

• This is the group effect (or placebo effect in the medical literature).

This result means that the difference-in-difference-in-differences (DDD) is:

𝜃𝐷𝐷𝐷 = 𝜃𝐷𝐷 − 𝜃′𝐷𝐷 = Δ + 𝛾 − 𝛾 = Δ

And we’re done! We’ve saved our model, at the expense of introducing a tongue-twisting new
estimator.

Identifying Conditions

1. The control and treatment groups for 𝐺𝑖 have the same time trend.
2. The placebo-control and placebo groups for 𝐻𝑖 have the same time trend.
3. The group (placebo) effect is the same between the two sets of groups.

Comments

• The identifying assumption is not that the placebo and regular control groups share a time trend.
• Sometimes, such as in a medical trial, the placebo-control and control groups are the
same.
• Not required, nor important!
• Why not DDDD or DDDDD?

A Family of Models

These are just two examples of a whole family of models:

• For instance, if you had continuous 𝑡, how could you estimate a "change in trend"?

These all use the same logic as DD at a fundamental level, which makes it a powerful and
effective technique.

References and Readings

• Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. "How Much Should We Trust Differences-in-Differences Estimates?" The Quarterly Journal of Economics 119 (1): 249-275.
• Goodman-Bacon, Andrew, and Janna Johnson. 2023. "A Difference-in-Differences Roadmap (That Actually Works)."
