Applied Economics: Difference-in-Differences

Philipp Ager
University of Mannheim and CEPR

DiD Lecture Notes – October 21 - November 11, 2024

1 / 76
Difference-in-Differences Estimation
A non-technical introduction

Difference-in-differences (DD) is both the most common and


the oldest quasi-experimental research design

Snow’s (1855) analysis of the 1854 Broad Street cholera outbreak in London revealed that cholera is a waterborne disease (spread by a contaminated public water pump on Broad Street)

The DD design can be used for observational studies

One can think of the DD method as a quasi-experimental


design, which makes use of (panel) data from treatment
and control groups to obtain an appropriate counterfactual
to estimate a causal effect
2 / 76
False Counterfactuals

Before vs. After Comparisons:


Compares: same individuals/communities before and after
program
Issue: does not control for time trends

Participant vs. Non-Participant Comparisons:


Compares: participants to those not in the program
Issue: selection ⇒ why didn’t non-participants participate?

3 / 76
Randomized Experiments
Random assignment makes treatment independent of potential
outcomes, hence:
Selection effect is zeroed out
Treatment effect on the treated is equal to the average treatment
effect

Often not feasible (ethical issues, costs, compliance . . . )


4 / 76
However . . .

Difference-in-differences (or “diff-in-diff” or “DD”) estimation combines the (flawed) pre vs. post and participant vs. non-participant approaches
Can overcome the twin problems of [1] selection bias (on
fixed traits) and [2] time trends in the outcome of interest
The basic idea is to observe the (self-selected) treatment
group and a (self-selected) comparison group before and
after the program

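The difference-in-differences estimator is:

$$DD = \left(\bar{Y}^{\text{treatment}}_{\text{post}} - \bar{Y}^{\text{treatment}}_{\text{pre}}\right) - \left(\bar{Y}^{\text{control}}_{\text{post}} - \bar{Y}^{\text{control}}_{\text{pre}}\right)$$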

5 / 76
DD Estimation
Intuitively, DD estimation is just a comparison of 4 cell-level
means
Only one cell is treated: Treatment x Post-Program
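
The four-cell comparison can be sketched in a few lines of Python. The function name and the cell means below are hypothetical and chosen only for illustration; they are not taken from any study in these notes.

```python
def did_from_cell_means(treat_pre, treat_post, control_pre, control_post):
    """Difference-in-differences from the four cell-level means."""
    first_diff_treat = treat_post - treat_pre        # change over time, treatment group
    first_diff_control = control_post - control_pre  # change over time, control group
    return first_diff_treat - first_diff_control

# Hypothetical cell means (illustration only)
cells = {"treat_pre": 20.0, "treat_post": 21.0,
         "control_pre": 23.0, "control_post": 21.0}

dd = did_from_cell_means(**cells)
pre_post_only = cells["treat_post"] - cells["treat_pre"]       # ignores the time trend
cross_section_only = cells["treat_post"] - cells["control_post"]  # ignores fixed group differences
print(dd, pre_post_only, cross_section_only)
```

In this made-up example the two flawed comparisons (pre vs. post, treated vs. control) give different answers from the DD, because the control group changes over time and the groups differ in levels.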

6 / 76
DD Estimation
DD is typically used to estimate the causal effect of a specific
intervention, such as the enactment of certain policies (school
laws, minimum wages, trade agreements, etc.), the adoption
of new technologies, . . .
The basic idea in DD is to compare the changes in outcomes
over time (1. diff) between the treatment and control groups
(2. diff)
See next slide for graphical explanation
To set up a DD analysis, you need (as a minimum):
1 A before/after date (you need to know when the intervention happened)
2 Separable treatment and control groups (in some dimension): who was affected by the intervention and who was not?
7 / 76
DD Estimation – Graphical Approach
Visualization: DD on Twitter

8 / 76
DD Estimation – Graphical Approach

Y = β0 + β1 [Time] + β2 [Treated] + β3 [Time × Treated] + ϵ

Q: Can you calculate the "DD Estimator"?
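
One way to approach the question, assuming Time and Treated are both 0/1 dummies and $E[\epsilon \mid \text{Time}, \text{Treated}] = 0$: write out the four conditional means implied by the regression and difference them twice.

$$
\begin{aligned}
E[Y \mid \text{Treated}=0, \text{Time}=0] &= \beta_0 \\
E[Y \mid \text{Treated}=0, \text{Time}=1] &= \beta_0 + \beta_1 \\
E[Y \mid \text{Treated}=1, \text{Time}=0] &= \beta_0 + \beta_2 \\
E[Y \mid \text{Treated}=1, \text{Time}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\[4pt]
DD &= \left[(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2)\right] - \left[(\beta_0 + \beta_1) - \beta_0\right] = \beta_3
\end{aligned}
$$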


9 / 76
Classical Example: Card and Krueger (1994)
Classic question in Labor Economics: Effect of minimum
wage on employment

In a competitive labor market, a minimum wage would reduce employment, especially among the workers who were intended to benefit from the higher minimum wage

Card and Krueger (1994) use a substantial change in the


New Jersey state minimum wage to see if this is the case
“On April 1, 1992, New Jersey’s minimum wage rose from $4.25 to
$5.05 per hour. To evaluate the impact of the law we surveyed 410
fast-food restaurants in New Jersey and eastern Pennsylvania before
and after the rise. Comparisons of employment growth at stores in
New Jersey and Pennsylvania (where the minimum wage was constant)
provide simple estimates of the effect of the higher minimum wage.
[. . . ] We find no indication that the rise in the minimum wage reduced
employment.”
10 / 76
Card and Krueger (1994) – Restaurant Locations

11 / 76
Card and Krueger (1994) – Wages before policy
change

12 / 76
Card and Krueger (1994) – Wages after policy change

13 / 76
Card and Krueger (1994) – DD estimate
Potential outcomes:
1 $Y_{1ist}$ = fast food employment at restaurant i and period t if there is a high state minimum wage
2 $Y_{0ist}$ = fast food employment at restaurant i and period t if there is a low state minimum wage

Problem: in practice we observe only one of (1) or (2). For example, we observe $Y_{1ist}$ in NJ in November 1992

Heart of the DD setup is an additive structure for potential outcomes in the no-treatment state

We assume $E(Y_{0ist} \mid s, t) = \gamma_s + \lambda_t$

⇒ In the absence of a minimum wage change, employment is determined by the sum of a time-invariant state effect and a year effect that is common across states
14 / 76
Card and Krueger (1994) – DD estimate

Recall: $E(Y_{0ist} \mid s, t) = \gamma_s + \lambda_t$

What two implicit identifying assumptions are made here?
Selection bias relates to fixed characteristics of (in this case) states ($\gamma_s$) and is assumed not to change over time
The time trend ($\lambda_t$) is the same for treatment and control group

Both are necessary conditions for identification in DD estimation
⇒ together referred to as the common trends assumption (more on that later)

15 / 76
Card and Krueger (1994) – DD estimate
Let’s introduce a dummy for high-minimum wage states, $D_{st}$

$$Y_{ist} = \gamma_s + \lambda_t + \beta D_{st} + \epsilon_{ist} \tag{1}$$

Treatment effect = $\beta$ and $E(\epsilon_{ist} \mid s, t) = 0$

For the case of NJ and PA we get:
1 $E[Y_{ist} \mid s = PA, t = Nov] - E[Y_{ist} \mid s = PA, t = Feb] = \lambda_{Nov} - \lambda_{Feb}$
2 $E[Y_{ist} \mid s = NJ, t = Nov] - E[Y_{ist} \mid s = NJ, t = Feb] = \lambda_{Nov} - \lambda_{Feb} + \beta$

We obtain the population difference-in-differences by subtracting (1) from (2), which yields $\beta$, the causal effect of interest
Q’s
(a) Calculate treatment vs. control estimator (in Nov)
(b) What is the issue of a simple pre-vs-post estimator for restaurants in NJ?
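
A sketch of how both questions can be worked out from equation (1), taking the additive structure $\gamma_s + \lambda_t$ as given:

$$
\begin{aligned}
\text{(a)}\quad & E[Y_{ist} \mid s = NJ, t = Nov] - E[Y_{ist} \mid s = PA, t = Nov] = \beta + (\gamma_{NJ} - \gamma_{PA}) \\
\text{(b)}\quad & E[Y_{ist} \mid s = NJ, t = Nov] - E[Y_{ist} \mid s = NJ, t = Feb] = \beta + (\lambda_{Nov} - \lambda_{Feb})
\end{aligned}
$$

The treatment vs. control comparison in (a) is contaminated by fixed state differences, and the pre-vs-post comparison in (b) by the common time effect. Only the double difference removes both.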
16 / 76
Card and Krueger (1994) – Main Results

The relative gain (the “difference in differences” of the changes in employment) is


2.76 full-time-equivalent employees (or 13 percent), with a t-statistic of 2.03
Opposite of what one might expect if a higher minimum wage pushes businesses
up the labor demand curve
17 / 76
DD Estimation – Common Trends Assumption
Key identifying assumption: employment trends would
be the same in both states in the absence of treatment
Treatment induces a deviation from this common trend

18 / 76
DD Estimation – Common Trends Assumption

We do not observe the average post-period outcome for the


treated group in the absence of the treatment

Not directly testable, but we can look at whether trends are


parallel prior to treatment using multiple periods

This is often done by so-called event studies, or by simply


plotting the mean of the outcomes in the treatment and
control groups prior to treatment

One might also control for certain time trends directly (whether this is feasible depends on the data structure, and identification then relies on a functional form assumption)

Note: none of these approaches are possible with only two


periods of data
19 / 76
Example – Common Trends Assumption

20 / 76
Example – Common Trends Assumption

21 / 76
DD Estimation – Common Trends Assumption

Figure: Card and Krueger (2000, AER)


22 / 76
DD Estimation – Common Trends Assumption
Employment levels in New Jersey and Pennsylvania were
similar at the end of 1991
Yet employment in Pennsylvania fell relative to employment
in New Jersey over the next three years
Pennsylvania may not provide a very good measure of counterfactual employment rates in New Jersey in the absence of a policy change, and vice versa
A more encouraging example is Pischke (2007)
Looks at the effect of school term length on student performance
using variation generated by a sharp policy change in Germany
Beginning in the 1966-67 school year, the Spring-starters
moved to start school in the Fall (except in Bavaria)
The transition to a Fall start required two short school years
for affected cohorts, 24 weeks long instead of 37
23 / 76
DD Estimation – Common Trends Assumption

Graph provides strong visual evidence of treatment and control states with a
common underlying trend, and a treatment effect that induces a sharp but transitory
deviation from this trend
Shorter school year seems to have increased repetition rates for affected cohorts
24 / 76
DD Estimation: 2 Periods - 2 Groups

Let β denote the true impact of a policy intervention

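$$\beta = E(Y_{1ist} - Y_{0ist} \mid s, t)$$

The difference-in-differences estimator is:

$$DD = \left(\bar{Y}^{\text{treatment}}_{\text{post}} - \bar{Y}^{\text{treatment}}_{\text{pre}}\right) - \left(\bar{Y}^{\text{control}}_{\text{post}} - \bar{Y}^{\text{control}}_{\text{pre}}\right)$$

$$(1)\quad \bar{Y}^{\text{treatment}}_{\text{post}} - \bar{Y}^{\text{treatment}}_{\text{pre}} = E[\gamma_i \mid D_i = 1] + \beta + \lambda_2 - \left(E[\gamma_i \mid D_i = 1] + \lambda_1\right)$$

$$(2)\quad \bar{Y}^{\text{comparison}}_{\text{post}} - \bar{Y}^{\text{comparison}}_{\text{pre}} = E[\gamma_i \mid D_i = 0] + \lambda_2 - \left(E[\gamma_i \mid D_i = 0] + \lambda_1\right)$$

Subtracting (2) from (1), the DD estimation recovers the true impact of the policy change, $\beta$, on participants (as long as the common trends assumption isn’t violated)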
25 / 76
DD Estimation: Example Duflo (2001)

Duflo (2001) examines the impacts of a large school construction


program in Indonesia

26 / 76
DD Estimation: Example Duflo (2001)

The Sekolah Dasar INPRES program (1973-1978):


Oil revenues used to fund construction of primary schools
Close to 62,000 schools were built within 5 years
Circa 1 school built per 500 school-aged children (age 5-14)

Target: children who had not previously been enrolled in school

Allocation rule based on number of children not enrolled in


primary school in 1972

27 / 76
DD Estimation: Example Duflo (2001)

Research question: Do children who were born into areas


with more newly built INPRES primary schools get more
education? Do they earn more as adults?
Strategy: difference-in-differences estimation
Pre vs Post: children born before and after program
Children aged 12+ in 1974 did not benefit from program
Children aged 6 and under were young enough to be treated

Treatment vs Control Group: Children born in regions where


many schools were built (treatment) vs those where few schools
were built (control)
DD estimate of program impact compares pre vs. post differences
of individuals in treatment vs. control regions

28 / 76
DD Estimation: Example Duflo (2001)

29 / 76
DD Estimation: Example Duflo (2001)

Simple DD estimation compares change in years of schooling


and wages (i.e., the pre vs. post estimate) between both
types of regions (difference in number of schools constructed
per capita between high and low program regions is 0.9)

Key assumption: in the absence of the program, the increase


in educational attainment and wages would not have been
systematically different in low and high program regions

Estimates not significantly different from 0 (one school per


1,000 children contributed to an increase in education by
0.13 years and wages by 0.029 for children aged 2-6 when
the program was initiated).

30 / 76
DD Estimation – Regression Model
We can use a regression to obtain the DD estimate from
Card and Krueger (1994)

$$Y_{ist} = \alpha + \gamma\, NJ_s + \lambda\, d_t + \beta\,(NJ_s \times d_t) + \epsilon_{ist} \tag{2}$$

$NJ_s$ is a dummy for restaurants in New Jersey
$d_t$ is a time dummy equal to one in November
$NJ_s \times d_t$ is the interaction term for observations from NJ in Nov ($= D_{st}$)

Saturated model: the conditional mean function $E(Y_{ist} \mid s, t)$ takes on four possible values and there are four parameters
The link between the parameters in (1) and the DD-model:
$\alpha = E[Y_{ist} \mid s = PA, t = Feb] = \gamma_{PA} + \lambda_{Feb}$
$\gamma = E[Y_{ist} \mid s = NJ, t = Feb] - E[Y_{ist} \mid s = PA, t = Feb] = \gamma_{NJ} - \gamma_{PA}$
$\lambda = E[Y_{ist} \mid s = PA, t = Nov] - E[Y_{ist} \mid s = PA, t = Feb] = \lambda_{Nov} - \lambda_{Feb}$
$\beta = \{E[Y_{ist} \mid s = NJ, t = Nov] - E[Y_{ist} \mid s = NJ, t = Feb]\} - \{E[Y_{ist} \mid s = PA, t = Nov] - E[Y_{ist} \mid s = PA, t = Feb]\}$
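
A minimal numerical sketch of the saturated regression, assuming NumPy is available. The eight observations are hypothetical (they are not the Card and Krueger sample); the point is only that the OLS coefficient on the interaction coincides with the DD of the four cell means.

```python
import numpy as np

# Hypothetical restaurant-level data (illustration only)
y = np.array([23.0, 24.0, 21.0, 22.0, 20.0, 21.0, 21.0, 22.0])  # employment
nj = np.array([0, 0, 0, 0, 1, 1, 1, 1])                         # 1 = New Jersey
d = np.array([0, 0, 1, 1, 0, 0, 1, 1])                          # 1 = November

# Saturated model: constant, NJ dummy, time dummy, interaction
X = np.column_stack([np.ones_like(y), nj, d, nj * d])
alpha, gamma, lam, beta = np.linalg.lstsq(X, y, rcond=None)[0]

def cell_mean(s, t):
    """Mean outcome in the cell with state dummy s and time dummy t."""
    return y[(nj == s) & (d == t)].mean()

dd_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(beta, dd_means)
```

The regression form is convenient mainly because it delivers standard errors and extends directly to covariates and more groups or periods.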
31 / 76
DD Estimation – Regression Model

Estimating equations like (1) by regression provides a convenient way to construct DD estimates and standard errors

Generalization of equation (1): add, for example, additional


states or periods, and covariates

So far, we have only considered switched-on/switched-off


dummy variables

Continuous treatment: “treatment intensity”

32 / 76
DD Estimation –Treatment Intensity
Example Card (1992): exploits regional variation in the impact of the federal minimum wage, since some states are more affected by a change in a federal law than others (sample: 51 states in two periods (1989, 1990))
$$Y_{st} = \gamma_s + \lambda_t + \beta\,(FA_s \times d_t) + \epsilon_{st} \tag{3}$$

FAs : fraction of teenage labor force that earned less than $3.80 before the
wage increase
dt : dummy for observations in 1990 when the federal minimum wage increased
from $3.35 to $3.80
Interaction term FAs × dt (like NJs × dt ) takes on distinct values for each
observation in the data set

The estimates (next slide) are reported from a first-differences (FD) version of (3), i.e. $\Delta Y_s = \lambda^* + \beta FA_s + \Delta\epsilon_s$
Note: $\lambda_t$ turns into a constant ($\lambda^*$) and $\gamma_s$ cancels out when using the FD of (3)
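
The first-differencing step for the treatment-intensity model can be checked numerically. The sketch below assumes NumPy and uses made-up values for four states (not Card's data) with no noise, so the FD regression should return the assumed coefficient exactly.

```python
import numpy as np

# Hypothetical inputs (illustration only)
fa = np.array([0.10, 0.20, 0.30, 0.40])     # FA_s: fraction affected in state s
gamma_s = np.array([5.0, 3.0, 4.0, 6.0])    # state effects
lam = np.array([1.0, 1.5])                  # period effects for 1989 and 1990
beta_true = 2.0                             # assumed coefficient on FA_s x d_t

# Levels: Y_st = gamma_s + lambda_t + beta * (FA_s * d_t), with d_t = 1 in 1990
y_1989 = gamma_s + lam[0]
y_1990 = gamma_s + lam[1] + beta_true * fa

# First differences: gamma_s cancels, lambda_t becomes the constant lambda*
dy = y_1990 - y_1989
X = np.column_stack([np.ones_like(fa), fa])
lam_star, beta_fd = np.linalg.lstsq(X, dy, rcond=None)[0]
print(lam_star, beta_fd)
```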
33 / 76
DD Estimation – Treatment Intensity

Reported estimates are from a FD of (3)


One can easily add further state-and-time varying covariates (e.g., adult employment
rate) to reduce potential omitted variable bias (caution: potentially bad controls!)
Counterfactual employment: $E(Y_{0ist} \mid s, t, X_{st}) = \gamma_s + \lambda_t + X_{st}'\delta$
34 / 76
DD Estimation – Treatment Intensity

Main empirical specification in Duflo (2001):

$$S_{ijk} = c_1 + \alpha_{1j} + \beta_{1k} + (P_j T_i)\gamma_1 + (C_j T_i)\delta_1 + \epsilon_{ijk}$$

$S_{ijk}$ = education of individual i born in region j in year k
$\alpha_{1j}$ = region of birth fixed effect
$\beta_{1k}$ = year of birth fixed effect
$T_i$ = treatment dummy for young cohort (age 2-6 in 1974)
$P_j$ = INPRES schools per thousand school-aged children
$C_j$ = Vector of region-specific controls (that change over time)

35 / 76
DD Estimation – Treatment Intensity

36 / 76
DD Estimation – Anticipation effects

If the sample includes many years, the regression-DD model


lends itself to a test for (Granger) causality

The Granger idea is to see whether causes happen before


consequences and not vice versa

Suppose treatment Dst , changes at different points in times


in different states

In this context, Granger causality testing means a check


on whether, conditional on state and year effects, past Dst
predicts Yist while future Dst does not

37 / 76
DD Estimation – Anticipation effects

$$Y_{ist} = \gamma_s + \lambda_t + \sum_{\tau=0}^{m} \beta_{-\tau} D_{s,t-\tau} + \sum_{\tau=1}^{q} \beta_{\tau} D_{s,t+\tau} + X_{ist}\delta + \epsilon_{ist} \tag{4}$$

If Dst causes Yist but not vice versa, leads should not matter
Pattern of lagged effects interesting on its own
Autor (2003): Effect of employment protection on firms’ use of temporary workers
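
A small sketch of the leads-and-lags regression, assuming NumPy. The panel is simulated without noise: adoption dates, fixed effects and the lag pattern are all hypothetical, and the true leads are zero by construction. The sample is restricted to periods for which every lead and lag is observed.

```python
import numpy as np

n_periods, m, q = 12, 2, 2                 # m lags, q leads
adoption = {0: None, 1: 4, 2: 6, 3: 8}     # hypothetical adoption dates; state 0 never treated
state_fx = {0: 1.0, 1: 2.0, 2: 0.5, 3: 1.5}
time_fx = 0.3 * np.arange(n_periods)

def D(s, t):
    """Treatment dummy: 1 from the adoption date onwards."""
    a = adoption[s]
    return 1.0 if a is not None and t >= a else 0.0

states = sorted(adoption)
periods = list(range(m, n_periods - q))    # periods with all leads and lags available
rows, ys = [], []
for s in states:
    for t in periods:
        state_d = [1.0 if s == j else 0.0 for j in states]
        time_d = [1.0 if t == u else 0.0 for u in periods[1:]]   # first period omitted
        lags = [D(s, t - tau) for tau in range(0, m + 1)]
        leads = [D(s, t + tau) for tau in range(1, q + 1)]
        rows.append(state_d + time_d + lags + leads)
        # Assumed truth: effect of 1.0 on impact plus 1.0 more after one period, no anticipation
        ys.append(state_fx[s] + time_fx[t] + 1.0 * D(s, t) + 1.0 * D(s, t - 1))

X, y = np.array(rows), np.array(ys)
coef = np.linalg.lstsq(X, y, rcond=None)[0]
lag_coefs = coef[-(m + 1 + q):-q]
lead_coefs = coef[-q:]
print(lag_coefs, lead_coefs)
```

In real data the lead coefficients will not be exactly zero; the check is whether they are small and statistically insignificant.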

38 / 76
DD Estimation – Anticipation effects

Idea can also be used with cohorts like in Duflo (2001): we would expect the school program to have an impact only on the relevant cohorts

$d_{il}$ is a dummy that indicates whether individual i is age l in 1974 (a year-of-birth dummy)
Each coefficient $\gamma_{1l}$ can be interpreted as an estimate of the impact of the program on a given cohort
Individuals aged 24 in 1974 are the reference group (omitted category)

39 / 76
DD Estimation – Anticipation effects

Cohort-by-cohort contrasts (Duflo, 2001)

40 / 76
DD Estimation – State-specific time trends


$$Y_{ist} = \gamma_{0s} + \gamma_{1s} t + \lambda_t + \beta D_{s,t} + X_{ist}\delta + \epsilon_{ist} \tag{5}$$

$\gamma_{0s}$: state-specific intercept

$\gamma_{1s} t$: state-specific time trend (allows treatment and control states to follow different trends)

Note: at least 3 (usually more) periods are required to estimate a


model with state-specific trends

Check Table 5.2.3 in Angrist and Pischke textbook: inclusion of


state-specific time trends kills the labor regulation effect

NOTE: Card and Krueger (1994) is a typical example for a DD


setting in applied economics: states and time, but the method is
widely used for other applications as well
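
Why state-specific trends can matter is easy to see in a stylized simulation, assuming NumPy. All numbers are hypothetical: two states, six periods, treatment from period 3, and a treated state that drifts upward even without treatment. The data contain no noise.

```python
import numpy as np

periods = np.arange(6)                      # t = 0,...,5; treatment starts at t = 3
true_beta = 2.0                             # assumed treatment effect
trend = np.array([0.0, 0.5])                # assumed state-specific slopes (state 1 is treated)
lam = np.array([0.0, 0.3, 0.1, 0.4, 0.2, 0.6])   # common period effects

rows, y = [], []
for s in (0, 1):
    for t in periods:
        d = float(s == 1 and t >= 3)
        time_dummies = [float(t == u) for u in periods[1:]]
        # columns: constant, state dummy, time dummies, treated-state trend, treatment dummy
        rows.append([1.0, float(s)] + time_dummies + [float(s) * t, d])
        y.append(1.0 + 2.0 * s + trend[s] * t + lam[t] + true_beta * d)
X, y = np.array(rows), np.array(y)

beta_with_trend = np.linalg.lstsq(X, y, rcond=None)[0][-1]
X_no_trend = np.delete(X, -2, axis=1)       # drop the state-specific trend column
beta_no_trend = np.linalg.lstsq(X_no_trend, y, rcond=None)[0][-1]
print(beta_with_trend, beta_no_trend)
```

Without the trend term, the differential drift of the treated state is attributed to the treatment. The flip side is that the trend is identified from functional form, so it can also absorb genuine treatment dynamics.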
41 / 76
Further Diagnostics for Parallel Trends

Placebo test using previous periods:


Suppose DD with time periods t1 , t2 , t3 , where treatment
occurs in t3
Exclude data from t3 , assign t2 as “placebo” treatment period
and re-estimate DD

Placebo test using alternative groups:


Re-code some control groups as treated
Re-estimate DD with the placebo treated units & without
actual treated units

Placebo outcomes:
Find outcomes that, theoretically, should be unaffected by
the treatment
Re-estimate DD on these outcomes
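
The placebo test using previous periods can be sketched as follows. The helper function and the group means are hypothetical; treatment occurs in t3, and the placebo DD uses only t1 and t2.

```python
def dd(means, group_a, group_b, pre, post):
    """2x2 DD: change for group_a minus change for group_b between two periods."""
    change_a = means[group_a][post] - means[group_a][pre]
    change_b = means[group_b][post] - means[group_b][pre]
    return change_a - change_b

# Hypothetical group means by period (illustration only); treatment occurs in t3
means = {
    "treated": {"t1": 10.0, "t2": 11.0, "t3": 14.0},
    "control": {"t1": 8.0, "t2": 9.0, "t3": 10.0},
}

actual_dd = dd(means, "treated", "control", pre="t2", post="t3")
placebo_dd = dd(means, "treated", "control", pre="t1", post="t2")  # t3 excluded
print(actual_dd, placebo_dd)
```

A placebo DD close to zero is consistent with parallel pre-trends; a large placebo estimate is a warning sign.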

42 / 76
Further Diagnostics for Parallel Trends

Example: Duflo (2001)

43 / 76
Different Types of DD Regressions
1. Standard DD
Treatment happened at the same time
Intensity of the treatment is the same (treated yes/no)
One classical reference is Card and Krueger (1994, AER)

2. DD with different treatment groups/periods


Multiple groups and/or time periods
This is often used when there is a staggered implementation
of a policy (i.e., a roll-out)
One classical reference is Wolfers (2006, AER)

3. DD with different treatment intensities


Often used when there is no clear control group
Units are treated to different degrees by the intervention
One classical reference is Duflo (2001, AER)
44 / 76
DD Estimation – further important assumptions

a) Intervention unrelated to outcome at baseline (allocation of


intervention was not determined by outcome)
“Ashenfelter dip” : It was common to compare wage gains
among participants and non-participants in training programs
to evaluate the effect of training on earnings. Ashenfelter
(1978) noted that participants in training programs often experience
a dip in earnings just before they enter the program (probably
one reason why they participate). Since wages have a natural
tendency to mean reversion, this leads to an upward bias of
the DD estimate of the program effect.
Regional targeting: NGOs may target villages that appear
most promising (or that are the worst off)

45 / 76
DD Estimation – further important assumptions
b) No spillover effects
Control group should not at all be affected by treatment
This assumption is actually in many situations violated
For example, in disaster studies using DD, it’s quite likely that
one area is flooded (i.e., treatment) and the neighboring area
is not (i.e., control) but through economic interactions, the
control area is also affected by treatment (= the flood)

What should an applied researcher do about this?


Use control units that are less likely to be affected. In the
flood case, don’t use neighboring counties. However, this
could introduce some unwanted heterogeneity. So there is
a trade-off here!
New research on how to estimate DD when spillover effects
are present (e.g., DD Spillovers )
46 / 76
DD Estimation – further important assumptions

c) No future shocks correlated with treatment


There is really no test for this, but it implies that one should
not extrapolate the design too far into the future
Applied researchers also sometimes attempt to control for
time-varying confounders, however some of them are so-called
bad controls (i.e., variables which themselves are influenced
by treatment)

d) No compositional differences across time


In repeated cross-sections, the composition of the sample
may change between periods, e.g. due to migration
This may confound any DD estimate since “effect” may be
attributable to change in population

47 / 76
DD Estimation – Individual-Level Panel Data
Individual-level panel data is a powerful tool for estimating
policy effects
Let’s consider again only two periods and let $w_{it}$ be a binary indicator, which is unity if unit i participates in the program at time t

$$y_{it} = \alpha + \lambda\, d2_t + \beta w_{it} + c_i + u_{it}, \quad t = 1, 2$$

$d2_t = 1$ if $t = 2$ and zero otherwise, $c_i$ are individual fixed effects, $\beta$ denotes the treatment effect
Remove $c_i$ by first differencing

$$(y_{i2} - y_{i1}) = \lambda + \beta\,(w_{i2} - w_{i1}) + (u_{i2} - u_{i1})$$
OR ...
48 / 76
DD Estimation – Individual-Level Panel Data

$$\Delta y_i = \lambda + \beta \Delta w_i + \Delta u_i$$

If $E(\Delta w_i \Delta u_i) = 0$, that is, the change in treatment status is uncorrelated with changes in the idiosyncratic errors, then OLS estimation is consistent (but remember: serially correlated errors are an issue for the standard errors!)

The leading case is when $w_{i1} = 0$ for all i, so that no units were exposed to the program in the initial time period

The OLS estimator is: $\hat{\beta}_{FD} = \Delta\bar{y}^{\text{treat}} - \Delta\bar{y}^{\text{control}}$

Can be extended to many time periods and arbitrary treatment


patterns (further DD assumptions see slides below)
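
Why the first-difference OLS slope equals a difference in mean changes in the leading case: a short derivation, assuming $w_{i1} = 0$ for all i so that $\Delta w_i = w_{i2}$ is binary, and writing $p$ for the sample share of treated units (a symbol introduced here for illustration).

$$
\begin{aligned}
\widehat{\mathrm{Cov}}(\Delta y_i, \Delta w_i) &= p(1-p)\left(\Delta\bar{y}^{\text{treat}} - \Delta\bar{y}^{\text{control}}\right), \qquad \widehat{\mathrm{Var}}(\Delta w_i) = p(1-p) \\[4pt]
\hat{\beta}_{FD} &= \frac{\widehat{\mathrm{Cov}}(\Delta y_i, \Delta w_i)}{\widehat{\mathrm{Var}}(\Delta w_i)} = \Delta\bar{y}^{\text{treat}} - \Delta\bar{y}^{\text{control}}
\end{aligned}
$$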

49 / 76
Extension: DDD Estimation
Recall – Basic DD Estimation

y = α + γdB + λd2 + βd2 × dB + ϵ

Let A be the control group and B the treatment group

y is outcome of interest

dB captures possible differences between the treatment and


control groups prior to the policy change

d2 captures aggregate factors that would cause changes in


y over time even in the absence of the policy change

The coefficient of interest is β = (ȳB,2 − ȳB,1 ) − (ȳA,2 − ȳA,1 )

50 / 76
Extension: DDD Estimation

Let’s add a “third dimension”

E.g.: Change in state health care policy aimed at elderly

Use the same two groups from another (“untreated”) state


as additional control group

Let dE be a dummy equal to one for someone over 65

Let dB be a dummy for living in the “treatment” state

y = α + γ1 dB + γ2 dE + γ3 dB × dE + λd2 + . . .
+ β1 d2 × dB + β2 d2 × dE + β3 d2 × dB × dE + ϵ

51 / 76
Extension: DDD Estimation

Differences-in-differences-in-differences (DDD) estimator

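$$\hat{\beta}_3 = \left[(\bar{y}_{B,E,2} - \bar{y}_{B,E,1}) - (\bar{y}_{B,N,2} - \bar{y}_{B,N,1})\right] - \left[(\bar{y}_{A,E,2} - \bar{y}_{A,E,1}) - (\bar{y}_{A,N,2} - \bar{y}_{A,N,1})\right]$$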

The A subscript means the state not implementing the policy


and the N subscript represents the non-elderly

The triple difference estimator can be computed as the difference


between two difference-in-differences estimators

$\hat{\beta}_3$ is the difference-in-difference-in-differences (DDD) estimator
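
The triple difference as a difference between two DD estimators, sketched in Python. Cell keys follow the slide's notation (state A or B, group E or N, period 1 or 2); the function name and all cell means are hypothetical.

```python
def ddd(y):
    """Triple difference from eight cell means keyed by (state, age_group, period)."""
    def dd(state):
        elderly_change = y[(state, "E", 2)] - y[(state, "E", 1)]
        non_elderly_change = y[(state, "N", 2)] - y[(state, "N", 1)]
        return elderly_change - non_elderly_change
    # DD in the policy state B minus the DD in the non-policy state A
    return dd("B") - dd("A")

# Hypothetical cell means (illustration only)
cell_means = {
    ("B", "E", 1): 50.0, ("B", "E", 2): 58.0,
    ("B", "N", 1): 70.0, ("B", "N", 2): 73.0,
    ("A", "E", 1): 48.0, ("A", "E", 2): 51.0,
    ("A", "N", 1): 69.0, ("A", "N", 2): 71.0,
}
ddd_estimate = ddd(cell_means)
print(ddd_estimate)
```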

52 / 76
Extension: DDD Estimation

Why is a DDD estimation appealing?

1 Why not use only data on people in the state with the policy change, both before and after the change, with people under 65 as the control group and people 65 and older as the treatment group?
⇒ $\hat{\beta} = [(\bar{y}_{B,E,2} - \bar{y}_{B,E,1}) - (\bar{y}_{B,N,2} - \bar{y}_{B,N,1})]$

2 Why not use the elderly in another (non-policy) state as the control group?
⇒ $\hat{\beta} = [(\bar{y}_{B,E,2} - \bar{y}_{B,E,1}) - (\bar{y}_{A,E,2} - \bar{y}_{A,E,1})]$

53 / 76
Extension: DDD Estimation
DDD Estimate – Repeated Cross Sections
Problem with (1): Other factors unrelated to the state’s new
policy might affect the health of the elderly relative to the
younger population in the state
Problem with (2): Changes in the health of the elderly might
be systematically different across states due to, say, income
and wealth differences, rather than the policy change
DDD estimate accounts for those two potential confounding
factors
DDD estimate is the difference between the DD of interest
and the placebo DD (that is supposed to be zero)
One can add covariates to the DDD analysis to (hopefully)
control for compositional changes
54 / 76
Example DDD: Aaronson et al. (2014, AER)

Question: Does better access to education reduce fertility?

55 / 76
Example DDD: Aaronson et al. (2014, AER)

Case study Rosenwald Rural Schools Initiative:


Construction of almost 5,000 schools between 1913-1932 in
the US South—targeted at rural blacks (allows formation of
proper control groups)

Idea: How does access to Rosenwald schools affect fertility


of black households

Key empirical challenge: Rosenwald schools were not randomly


located

56 / 76
Example DDD: Distribution of Rosenwald Schools

Fraction of school-age black children in a county who could have been


seated in a Rosenwald school

57 / 76
Example DDD: Aaronson et al. (2014, AER)

Census data from IPUMS random samples for 1910-1930

Fertility: # surviving children under the age of ten


Rosenwald exposure for older cohorts (age 25-49)
$$E_{ct} = \frac{1}{10} \sum_{k=1}^{10} T_{c,t-k}$$
$T_{ct}$ = Rosenwald teachers multiplied by assumed classroom size of 45 divided by rural black children (age 7-13) in county c at time t
Expanded schooling opportunities that women of childbearing
age might expect for their children based on the Rosenwald
schools they observe in their community
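
The exposure measure can be sketched as follows. The function names and the county figures are hypothetical; only the class size of 45 and the ten-year average come from the slide.

```python
def rosenwald_coverage(teachers, rural_black_children, class_size=45):
    """T_ct: seats provided by Rosenwald teachers per rural black child."""
    return teachers * class_size / rural_black_children

def exposure(coverage_by_year, t, window=10):
    """E_ct: average of T over the `window` years preceding t."""
    return sum(coverage_by_year[t - k] for k in range(1, window + 1)) / window

# Hypothetical county: 900 rural black children, 4 Rosenwald teachers from 1920 on
coverage = {year: rosenwald_coverage(4 if year >= 1920 else 0, 900)
            for year in range(1910, 1931)}
e_1925 = exposure(coverage, 1925)
print(e_1925)
```

In this made-up county, coverage is 0.2 from 1920 onward, so a woman observed in 1925 has an average exposure of 0.1 over the preceding ten years.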

58 / 76
Example DDD: Aaronson et al. (2014, AER)
Empirical Specification: Older Cohorts of Women
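$$y_{ict} = \beta_0\, black_i + \beta_1\, rural_i + \beta_2 X_i + \beta_3\, age_{it} + \theta_t + \theta_c + \left(\gamma_0 + \gamma_1\, black_i + \gamma_2\, rural_i + \gamma_3\,(black_i \times rural_i)\right) \times E_{ct} + \epsilon_{ict}$$
where $\theta_t$ and $\theta_c$ denote year and county effects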

Preferred estimator (DDD): B-W rural - B-W urban (γ3 )


Any alternative explanation for the result (γ3 ) must reflect
confounding factors that affected only rural blacks and not
rural whites or urban blacks in the same county
Alternative estimators (DD): Black, rural-urban (γ2 + γ3 ) or
B-W rural (γ1 + γ3 )
$E_{ct}$ takes on values between 0 and 1, such that the coefficients ($\gamma_0$ to $\gamma_3$) can be interpreted as the effect of going from no Rosenwald exposure in a county to complete exposure
59 / 76
Example DDD: Aaronson et al. (2014, AER)

60 / 76
Example DDD: Aaronson et al. (2014, AER)
The preferred estimate shows that going from no exposure to complete exposure results in an increase of 0.055 children (Col. 1), which is not statistically significant
These positively signed point estimates appear to contradict
the prediction of the standard quantity-quality model
Along the extensive margin (Col. 2), the triple DD estimate
indicates that complete exposure to the schools increases
the probability that a woman had a child in the preceding
ten years by 5.0 percentage points
Among women who had at least one child in the preceding
ten years (Col. 3), full exposure leads to 0.100 (0.108) fewer
children (not statistically significant)
More black children grew up in smaller families as the distribution
of the number of children was “compressed” from both sides
61 / 76
Heterogeneous Difference in Differences

So far we have assumed that the treatment effect does not vary across groups or over time
⇒ Homogeneous treatment effects

But: what happens if individuals do not receive the treatment


at the same time (rollout designs)? Timing of treatment and
who the treated groups are compared to matter
⇒ Heterogeneous treatment effects

62 / 76
Heterogeneous Difference in Differences

We can use a two-way fixed effects regression if there is


staggered treatment:

$$Y_{st} = \gamma_s + \lambda_t + \beta^{DD} D_{s,t} + \epsilon_{st} \tag{6}$$

where γs and λt are state and time fixed effects and Ds,t the
treatment dummy that indicates when states get treated

We can add controls and specific time trends to this equation, but . . .

interpreting $\beta^{DD}$ is now more complicated

63 / 76
Heterogeneous Difference in Differences

I provide intuition and point to recent developments in the


econometrics literature that provide solutions to this problem

Goodman-Bacon Decomposition (2021):


Goodman-Bacon provides a helpful decomposition of the two-way fixed effects estimate of $\beta^{DD}$
Punchline: the two-way fixed effects estimator is a weighted average of all potential 2x2 DD estimates, where the weights are based on both group sizes and the variance in treatment
This is an issue if treatment effects are time-varying

64 / 76
Goodman-Bacon Decomposition
Assume we have three groups: two groups are treated at
different dates and one group is never treated

That means we have four comparisons and hence four difference-in-differences estimates that will contribute to $\beta^{DD}$
65 / 76
Goodman-Bacon Decomposition
First DD: Early-treated vs never treated

66 / 76
Goodman-Bacon Decomposition
Second DD: Late-treated vs never treated

67 / 76
Goodman-Bacon Decomposition
Third DD: Early-treated vs late-treated before $t_l^*$

68 / 76
Goodman-Bacon Decomposition
Fourth DD: Late-treated vs early-treated after $t_k^*$

69 / 76
Goodman-Bacon Decomposition
What we get as $\hat{\beta}^{DD}$ is a weighted average of all DDs, including some we don’t want (already-treated units act as controls even though they are treated)

70 / 76
Goodman-Bacon Decomposition

Weights s are a function of:


Size of each subgroup (what share of units (e.g., states) are in the treatment and control group for a given pair, and what share of time periods are used in a given 2x2 sub-sample)
Variance of treatment (how close to the beginning/end of the sub-sample window the treatment turns on)

One can show:


Given the weighting function, panel length alone can change the DD estimate substantially even when each 2x2 $\hat{\beta}^{DD}$ does not change
Groups treated closer to the middle of panel receive higher
weights than those treated earlier or later
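
The problem with time-varying effects can be illustrated with a tiny balanced panel, assuming NumPy. This is not the Goodman-Bacon decomposition itself, only a sketch of its consequence: three hypothetical units (never treated, early, late), six periods, no noise, and a two-way FE coefficient computed by double demeaning.

```python
import numpy as np

def twfe(y, d):
    """Two-way FE coefficient on d in a balanced panel via double demeaning."""
    def demean(a):
        return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    d_t, y_t = demean(d), demean(y)
    return (d_t * y_t).sum() / (d_t ** 2).sum()

periods = np.arange(6)
adoption = [None, 2, 4]   # hypothetical: never treated, early-treated, late-treated
d = np.array([[0.0 if a is None or t < a else 1.0 for t in periods] for a in adoption])
# Periods since adoption (1 in the adoption period), zero when untreated
exposure = np.array([[0.0 if a is None or t < a else t - a + 1.0 for t in periods]
                     for a in adoption])
base = np.array([[1.0], [2.0], [3.0]]) + 0.5 * periods   # unit effects plus common time path

y_constant = base + 2.0 * d         # constant effect of 2
y_dynamic = base + 1.0 * exposure   # effect grows by 1 each treated period

beta_constant = twfe(y_constant, d)
beta_dynamic = twfe(y_dynamic, d)
att_dynamic = exposure[d == 1].mean()   # average effect over all treated cells
print(beta_constant, beta_dynamic, att_dynamic)
```

With a constant effect the two-way FE estimate equals the effect. With a growing effect it falls below the average effect on treated cells, because the already-treated early group serves as a control for the late group while its own outcome is still rising.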

71 / 76
Goodman-Bacon Decomposition – Example

72 / 76
Goodman-Bacon Decomposition – Example

73 / 76
Goodman-Bacon Decomposition – Recap

Two-way FE don’t do what people thought they did!

The decomposition helps us understand where the DD is “coming from”, i.e., it shows which 2x2 comparisons matter most (source of variation) and how different the 2x2 DDs are (heterogeneity)

It provides insight into why estimates differ across specifications: is it the weights, the 2x2 DDs, or both?

Implemented in Stata: command bacondecomp

74 / 76
Solutions to staggered treatment adoption setups

Callaway and Sant’Anna (2021): Difference-in-differences


with multiple time periods

Sun and Abraham (2021): Estimating dynamic treatment


effects in event studies with heterogeneous treatment effects

For other solutions, see the surveys by Roth et al. (2023) and de Chaisemartin and D’Haultfœuille (2021)

Stata: Heterogeneous DD

75 / 76
Learning Outcomes

1 For what research designs is it appropriate to use a DD


estimation strategy?
2 What are the minimum requirements to conduct a DD analysis?
3 What is the key assumption of a DD analysis, and can it be tested?
4 What are other potential concerns that would invalidate a
DD analysis?
5 Why is it appealing to use a DDD estimation instead of a
DD estimation if data and the research question allow it?
6 What problems arise if there is treatment heterogeneity?

76 / 76