Applied Economics DD Lecture Notes
Philipp Ager
University of Mannheim and CEPR
Difference-in-Differences Estimation
A non-technical introduction
Randomized Experiments
Random assignment makes treatment independent of potential
outcomes, hence:
Selection effect is zeroed out
Treatment effect on the treated is equal to the average treatment
effect
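The two consequences above can be read off the usual decomposition of a raw difference in means. The sketch below uses standard potential-outcomes notation (\(Y_{1i}\), \(Y_{0i}\), \(D_i\)), which is an assumption here because the slide does not spell it out.

```latex
\begin{align*}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{treatment effect on the treated}} \\
  &\quad + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection effect}}
\end{align*}
```

Under random assignment, \(E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0]\), so the selection term is zero, and \(E[Y_{1i} - Y_{0i} \mid D_i = 1] = E[Y_{1i} - Y_{0i}]\), so the effect on the treated coincides with the average treatment effect.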
DD Estimation
Intuitively, DD estimation is just a comparison of 4 cell-level
means
Only one cell is treated: Treatment x Post-Program
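The four-cell comparison can be sketched in a few lines. The numbers below are made up for illustration and do not come from any study discussed in these notes; the function name is likewise hypothetical.

```python
# Illustrative (made-up) cell means: average outcome by group and period.
cell_means = {
    ("control", "pre"): 10.0,
    ("control", "post"): 12.0,
    ("treatment", "pre"): 11.0,
    ("treatment", "post"): 16.0,  # the only treated cell
}


def dd_from_cell_means(means):
    """Return (change in treatment group) minus (change in control group)."""
    change_treatment = means[("treatment", "post")] - means[("treatment", "pre")]
    change_control = means[("control", "post")] - means[("control", "pre")]
    return change_treatment - change_control


dd_estimate = dd_from_cell_means(cell_means)
print(dd_estimate)  # 3.0
```

The treatment group changes by 5 and the control group by 2, so the DD estimate attributes the remaining 3 to the program.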
DD is typically used to estimate the causal effect of a specific
intervention, such as the enactment of certain policies (school
laws, minimum wages, trade agreements, etc.), the adoption
of new technologies, . . .
The basic idea in DD is to compare the changes in outcomes
over time (first difference) between the treatment and control
groups (second difference)
See next slide for graphical explanation
To set up a DD analysis, you need (as a minimum):
1 A before/after date (you need to know when the intervention
happened)
2 A way to separate treatment and control groups (in some
dimension): who was affected by the intervention and who
was not?
DD Estimation – Graphical Approach
Visualization: DD on Twitter
Card and Krueger (1994) – Wages before policy
change
Card and Krueger (1994) – Wages after policy change
Card and Krueger (1994) – DD estimate
Potential outcomes:
1 \(Y_{1ist}\) = fast food employment at restaurant i and period t if
there is a high state minimum wage
2 \(Y_{0ist}\) = fast food employment at restaurant i and period t if
there is a low state minimum wage
Let’s introduce a dummy for high-minimum wage states, \(D_{st}\)
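A sketch of how the dummy enters the DD setup, in standard textbook notation. The additive structure, the constant effect \(\beta\), and the labels NJ (treated state) and PA (comparison state) are assumptions for illustration; the symbols \(\gamma_s\) and \(\lambda_t\) match the state and time effects used later in these notes.

```latex
\begin{align*}
E[Y_{0ist} \mid s, t] &= \gamma_s + \lambda_t
  && \text{(no-treatment outcome: state effect plus time effect)} \\
Y_{ist} &= \gamma_s + \lambda_t + \beta D_{st} + \epsilon_{ist},
  && E[\epsilon_{ist} \mid s, t] = 0 \\
\beta &= \big(E[Y_{ist} \mid s = NJ, t = \text{post}] - E[Y_{ist} \mid s = NJ, t = \text{pre}]\big) \\
      &\quad - \big(E[Y_{ist} \mid s = PA, t = \text{post}] - E[Y_{ist} \mid s = PA, t = \text{pre}]\big)
\end{align*}
```

Differencing over time removes \(\gamma_s\), and differencing across states removes \(\lambda_t\), which leaves only \(\beta\).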
DD Estimation – Common Trends Assumption
One might also control for certain time trends directly (whether
this is feasible depends on the data structure, and identification
then comes from a functional form assumption)
Example – Common Trends Assumption
DD Estimation – Common Trends Assumption
Graph provides strong visual evidence of treatment and control states with a
common underlying trend, and a treatment effect that induces a sharp but transitory
deviation from this trend
Shorter school year seems to have increased repetition rates for affected cohorts
DD Estimation: 2 Periods - 2 Groups
(2) \(\bar{Y}_{post} - \bar{Y}_{pre} = E[\gamma_i \mid D_i = 0] + \lambda_2 - (E[\gamma_i \mid D_i = 0] + \lambda_1)\)
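A small noise-free sketch of why neither single comparison identifies the effect, while their combination does. All parameter values are hypothetical, and the model (group effect plus period effect plus a constant treatment effect) is an assumption in the spirit of the notation above.

```python
# Hypothetical parameters (not from the notes).
gamma = {"treated": 5.0, "control": 3.0}  # E[gamma_i | D_i] for each group
lam = {1: 1.0, 2: 2.0}                    # period effects lambda_1, lambda_2
beta = 0.5                                # true effect, treated group in period 2 only


def mean_outcome(group, period):
    """Expected outcome: group effect + period effect (+ beta in the treated cell)."""
    treated_cell = group == "treated" and period == 2
    return gamma[group] + lam[period] + (beta if treated_cell else 0.0)


# Treated vs. control in the post period: contaminated by the gamma difference.
cross_section = mean_outcome("treated", 2) - mean_outcome("control", 2)
# Before vs. after within the treated group: contaminated by lambda_2 - lambda_1.
before_after = mean_outcome("treated", 2) - mean_outcome("treated", 1)
# The control group's before/after change isolates lambda_2 - lambda_1.
control_change = mean_outcome("control", 2) - mean_outcome("control", 1)
dd = before_after - control_change

print(cross_section, before_after, dd)  # 2.5 1.5 0.5
```

Only the last number equals the true effect of 0.5; the cross-sectional comparison picks up the difference in group effects, and the before/after comparison picks up the common time effect.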
DD Estimation: Example Duflo (2001)
DD Estimation – Regression Model
We can use a regression to obtain the DD estimate from
Card and Krueger (1994)
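A minimal sketch of the regression version, assuming the usual specification with a constant, a treated-state dummy, a post-period dummy, and their interaction. The data are made up (they are not the Card and Krueger sample), and the small OLS helper is written out in plain Python only to keep the example self-contained.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(x_rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up restaurant-level data: (treated state, post period, employment).
data = [
    (0, 0, 9.0), (0, 0, 11.0), (0, 1, 11.0), (0, 1, 13.0),
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 15.0), (1, 1, 17.0),
]
# Regressors: constant, treated-state dummy, post dummy, interaction.
x_rows = [[1.0, float(g), float(p), float(g * p)] for g, p, _ in data]
y = [emp for _, _, emp in data]

const, b_treat, b_post, b_dd = ols(x_rows, y)
print(round(b_dd, 6))  # 3.0
```

Because the model is saturated in the two dummies, the interaction coefficient reproduces the difference of the four cell means (here (16 - 11) - (12 - 10) = 3). The regression form mainly pays off when adding covariates or computing standard errors.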
DD Estimation – Treatment Intensity
Example Card (1992): exploits regional variation in the impact
of the federal minimum wage, since some states are more affected
by a change in federal law than others (sample: 51 states
in two periods, 1989 and 1990)
\(Y_{st} = \gamma_s + \lambda_t + \beta (FA_s \times d_t) + \epsilon_{st}\)  (3)
\(FA_s\): fraction of the teenage labor force that earned less than $3.80 before the
wage increase
\(d_t\): dummy for observations in 1990, when the federal minimum wage increased
from $3.35 to $3.80
The interaction term \(FA_s \times d_t\) works like \(NJ_s \times d_t\), but takes on a distinct
value for each observation in the data set
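With only two periods, first-differencing equation (3) removes the state effects and leaves a cross-sectional regression of the change in the outcome on the fraction affected. The sketch below assumes that setup; the four "states" and their values are made up and constructed so that the change equals 0.02 + 0.5 × FA exactly.

```python
# Made-up state-level data (not Card's): FA_s and the outcome in both periods.
states = {
    # state: (FA_s, Y_s in 1989, Y_s in 1990)
    "A": (0.10, 1.00, 1.07),
    "B": (0.20, 1.50, 1.62),
    "C": (0.30, 0.80, 0.97),
    "D": (0.40, 1.20, 1.42),
}


def slope_and_intercept(x, y):
    """Bivariate OLS of y on x; returns (slope, intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


fa = [v[0] for v in states.values()]
delta_y = [v[2] - v[1] for v in states.values()]  # differencing removes gamma_s

# The slope estimates beta; the intercept estimates the common time effect.
beta_hat, time_effect_hat = slope_and_intercept(fa, delta_y)
print(round(beta_hat, 6), round(time_effect_hat, 6))
```

The slope is identified from how much more the outcome changes in states where a larger share of workers was affected, rather than from a yes/no treatment split.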
DD Estimation – Anticipation effects
\(Y_{ist} = \gamma_s + \lambda_t + \sum_{\tau=0}^{m} \beta_{-\tau} D_{s,t-\tau} + \sum_{\tau=1}^{q} \beta_{\tau} D_{s,t+\tau} + X_{ist}'\delta + \epsilon_{ist}\)  (4)
If \(D_{st}\) causes \(Y_{ist}\) but not vice versa, leads should not matter
The pattern of lagged effects is interesting in its own right
Autor (2003): Effect of employment protection on firms’ use of temporary workers
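One way to build the lag and lead regressors in equation (4) is to shift a policy dummy backward and forward in time. The sketch assumes the policy is an absorbing on/off indicator (on from the adoption year onward); applied papers, including the one cited above, may code event-time dummies differently. Function names and the adoption year are hypothetical.

```python
def policy_dummy(year, adoption_year):
    """D_{s,t}: 1 once the policy is in force, 0 before (or if never adopted)."""
    return int(adoption_year is not None and year >= adoption_year)


def lead_lag_row(year, adoption_year, m, q):
    """Regressors D_{s,t-tau} (tau = 0..m) and D_{s,t+tau} (tau = 1..q) for one state-year."""
    row = {}
    for tau in range(0, m + 1):
        row[f"lag{tau}"] = policy_dummy(year - tau, adoption_year)
    for tau in range(1, q + 1):
        row[f"lead{tau}"] = policy_dummy(year + tau, adoption_year)
    return row


# Hypothetical state adopting in 2000, with m = 2 lags and q = 2 leads.
for year in range(1997, 2003):
    print(year, lead_lag_row(year, 2000, m=2, q=2))
```

The lead columns switch on before the adoption year, so nonzero lead coefficients would signal anticipation effects or diverging pre-trends, while the lag coefficients trace out how the effect evolves after adoption.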
The idea can also be used with cohorts, as in Duflo (2001): we would expect the
school program to have an impact only on the relevant cohorts
DD Estimation – State-specific time trends
\(Y_{ist} = \gamma_{0s} + \gamma_{1s} t + \lambda_t + \beta D_{s,t} + X_{ist}'\delta + \epsilon_{ist}\)  (5)
Placebo outcomes:
Find outcomes that, theoretically, should be unaffected by
the treatment
Re-estimate DD on these outcomes
Further Diagnostics for Parallel Trends
Different Types of DD Regressions
1. Standard DD
Treatment happened at the same time
Intensity of the treatment is the same (treated yes/no)
One classical reference is Card and Krueger (1994, AER)
DD Estimation – further important assumptions
b) No spillover effects
The control group should not be affected by the treatment at all
In practice, this assumption is violated in many situations
For example, in disaster studies using DD, it is quite likely that
one area is flooded (i.e., treatment) and the neighboring area
is not (i.e., control), but through economic interactions the
control area is also affected by the treatment (the flood)
DD Estimation – Individual-Level Panel Data
Individual-level panel data is a powerful tool for estimating
policy effects
Let’s again consider only two periods and let \(w_{it}\) be a binary
indicator that equals one if unit i participates in the program
at time t
OR ...
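One common way to write the two-period panel model is sketched below. This is standard textbook notation and not necessarily the form on the original slide: the unobserved unit effect \(c_i\), the error \(u_{it}\), and the coefficient names are assumptions, while \(d2_t\) is the second-period dummy that also appears in the DDD slides.

```latex
\begin{align*}
y_{it} &= \alpha + \lambda\, d2_t + \beta\, w_{it} + c_i + u_{it}, \qquad t = 1, 2 \\
\Delta y_i &= \lambda + \beta\, \Delta w_i + \Delta u_i
  && \text{(first difference removes } c_i \text{)} \\
\hat{\beta} &= \overline{\Delta y}_{\text{treated}} - \overline{\Delta y}_{\text{control}}
  && \text{(if nobody participates in period 1, so } \Delta w_i = w_{i2} \text{)}
\end{align*}
```

With panel data the same units are observed twice, so time-invariant unobserved differences between participants and non-participants drop out in the first difference.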
Extension: DDD Estimation
Recall – Basic DD Estimation
y is outcome of interest
\(y = \alpha + \gamma_1 dB + \gamma_2 dE + \gamma_3 (dB \times dE) + \lambda d2\)
\(\quad + \beta_1 (d2 \times dB) + \beta_2 (d2 \times dE) + \beta_3 (d2 \times dB \times dE) + \epsilon\)
1 Why not use only data on people in the state with the
policy change, both before and after the change, with the
control group being people under 65 and the treatment group
being people 65 and older?
⇒ \(\hat{\beta} = (\bar{y}_{B,E,2} - \bar{y}_{B,E,1}) - (\bar{y}_{B,N,2} - \bar{y}_{B,N,1})\)
2 Why not use another state for comparison, with the elderly
from the non-policy state as the control group?
⇒ \(\hat{\beta} = (\bar{y}_{B,E,2} - \bar{y}_{B,E,1}) - (\bar{y}_{A,E,2} - \bar{y}_{A,E,1})\)
DDD Estimate – Repeated Cross Sections
Problem with (1): Other factors unrelated to the state’s new
policy might affect the health of the elderly relative to the
younger population in the state
Problem with (2): Changes in the health of the elderly might
be systematically different across states due to, say, income
and wealth differences, rather than the policy change
DDD estimate accounts for those two potential confounding
factors
DDD estimate is the difference between the DD of interest
and the placebo DD (that is supposed to be zero)
One can add covariates to the DDD analysis to (hopefully)
control for compositional changes
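The "DD of interest minus placebo DD" reading can be sketched with eight cell means. The labels follow the notation above (state B has the policy change, state A does not; E = elderly, N = non-elderly; periods 1 and 2), and all numbers are made up for illustration.

```python
# Made-up mean health outcomes, keyed by (state, age group, period).
y_bar = {
    ("B", "E", 1): 50.0, ("B", "E", 2): 58.0,
    ("B", "N", 1): 60.0, ("B", "N", 2): 63.0,
    ("A", "E", 1): 48.0, ("A", "E", 2): 51.0,
    ("A", "N", 1): 59.0, ("A", "N", 2): 61.0,
}


def change(state, group):
    """Before/after change in the mean outcome for one state-by-group cell."""
    return y_bar[(state, group, 2)] - y_bar[(state, group, 1)]


# DD of interest: elderly vs. non-elderly within the policy state B.
dd_policy_state = change("B", "E") - change("B", "N")
# Placebo DD: the same comparison in state A, where no policy changed.
dd_placebo_state = change("A", "E") - change("A", "N")
# DDD: remove whatever elderly-specific change shows up even without the policy.
ddd = dd_policy_state - dd_placebo_state

print(dd_policy_state, dd_placebo_state, ddd)  # 5.0 1.0 4.0
```

In this example the placebo DD is not zero (1.0), so a plain within-state DD would overstate the effect; the DDD nets out that elderly-specific change.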
Example DDD: Aaronson et al. (2014, AER)
Example DDD: Distribution of Rosenwald Schools
Example DDD: Aaronson et al. (2014, AER)
Empirical Specification: Older Cohorts of Women
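\(y_{ict} = \beta_0 \text{black}_i + \beta_1 \text{rural}_i + \beta_2 X_i + \beta_3 \text{age}_{it} + t + c\)
\(\quad + (\gamma_0 + \gamma_1 \text{black}_i + \gamma_2 \text{rural}_i + \gamma_3 (\text{black}_i \times \text{rural}_i)) \times E_{ct} + \epsilon_{ict}\)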
Example DDD: Aaronson et al. (2014, AER)
The preferred estimate shows that going from no exposure
to complete exposure results in an increase of 0.055 children
(Col. 1), but the estimate is not statistically significant
These positively signed point estimates appear to contradict
the prediction of the standard quantity-quality model
Along the extensive margin (Col. 2), the triple DD estimate
indicates that complete exposure to the schools increases
the probability that a woman had a child in the preceding
ten years by 5.0 percentage points
Among women who had at least one child in the preceding
ten years (Col. 3), full exposure leads to 0.100 (0.108) fewer
children (not statistically significant)
More black children grew up in smaller families as the distribution
of the number of children was “compressed” from both sides
Heterogeneous Difference in Differences
where \(\gamma_s\) and \(\lambda_t\) are state and time fixed effects and \(D_{s,t}\) is the
treatment dummy that indicates when states get treated
Goodman-Bacon Decomposition
Assume we have three groups: two groups are treated at
different dates and one group is never treated
Second DD: Late-treated vs never treated
Third DD: Early-treated vs late-treated before \(t_l^*\)
Fourth DD: Late-treated vs early-treated after \(t_k^*\)
The estimate \(\hat{\beta}^{DD}\) is a weighted average of all these DDs,
including some we do not want (already-treated units act as
controls even though they are treated)
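A noise-free sketch of why the comparison with already-treated controls is a problem. It assumes three hypothetical groups (early adopters treated from period 2, late adopters from period 4, and a never-treated group) and a treatment effect that grows by one unit per treated period; all numbers are made up. It only illustrates one such 2x2 comparison and does not compute the decomposition weights.

```python
# Hypothetical groups: (group-specific level, first treated period or None).
groups = {"early": (2.0, 2), "late": (1.0, 4), "never": (0.0, None)}


def outcome(name, period):
    """Group level + common trend + a treatment effect that grows over time."""
    level, start = groups[name]
    common_trend = 0.5 * period
    if start is not None and period >= start:
        effect = float(period - start + 1)  # 1 in the first treated period, then 2, 3, ...
    else:
        effect = 0.0
    return level + common_trend + effect


def dd(treated, control, pre, post):
    """2x2 DD of `treated` against `control` between periods pre and post."""
    change_treated = outcome(treated, post) - outcome(treated, pre)
    change_control = outcome(control, post) - outcome(control, pre)
    return change_treated - change_control


# The late group's true effect in period 4 (its first treated period) is 1.
clean = dd("late", "never", pre=3, post=4)      # never-treated as control
forbidden = dd("late", "early", pre=3, post=4)  # already-treated as control

print(clean, forbidden)  # 1.0 0.0
```

The early group's outcome keeps rising because its own effect is still growing, so using it as a control subtracts part of the treatment effect and the comparison returns 0 instead of 1. With constant effects over time, the two comparisons would agree.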
Goodman-Bacon Decomposition – Example
Goodman-Bacon Decomposition – Recap
Solutions to staggered treatment adoption setups
Stata: Heterogeneous DD
Learning Outcomes