01 Introduction
Lecture 1: Introduction
Pedro H. C. Sant’Anna
Emory University
January 2025
Introduction and DiD Popularity
Importance of Empirical Research
What about Experiments (or A/B tests)?
Popularity of Difference-in-Differences methods
Recent popularity of DiD methods in empirical work
Goldsmith-Pinkham (2024) builds on Currie et al. (2020) and updates the analysis using
NBER working paper data through May 2024.
[Figure: two panels showing the fraction of papers (vertical axis) by year of NBER working paper (horizontal axis, 2000 to 2024).]
Popularity of Difference-in-Differences methods: by fields
Goldsmith-Pinkham (2024) builds on Currie et al. (2020) and documents the popularity of
DiD within economics.
Popularity of Difference-in-Differences methods
Why is DiD so popular?
Causality with Observational Data: What can I do?
■ With observational data, we have no choice but to rely on assumptions to talk about
causal inference.
■ Our job as researchers is to assess the pros and cons of each method in its ability
to answer the questions we (and the business/policy makers/stakeholders) care
about.
Causality with Observational Data: What can I do?
■ WHY?!
Causality with Observational Data: What else?
The appeal of Difference-in-Differences
■ DiD methods exploit variation in time (before vs. after) and across groups (treated vs.
untreated) to recover causal effects of interest.
■ Data Requirements: We need data from time periods before and after treatment to
use DiD (and some periods where no unit is treated).
Some DiD Examples
■ Card and Krueger (1994): Effect of minimum wage on employment.
▶ They compared the changes in wages, employment, and prices at stores in New Jersey
(which increased its minimum wage) relative to stores in Pennsylvania (where the minimum
wage remained fixed).
■ Dube, Lester and Reich (2010), Dube, Lester and Reich (2016), Callaway and
Sant’Anna (2021), and many others:
Effect of minimum wage on different measures of employment.
▶ Callaway and Sant’Anna (2021) exploit variation in the timing of state minimum wage
changes to understand their effect on teen employment.
Some DiD Examples
■ Meyer, Viscusi and Durbin (1995): Effect of weekly benefit amount on time out of work
due to injury.
▶ They compared high-earnings (affected by the policy change) and low-earnings (not
affected by the policy change) individuals injured before and after increases in the
maximum weekly benefit amount. They estimated effects in Kentucky and Michigan.
Some DiD Examples
■ Carey, Miller and Wherry (2020): Effect of Medicaid expansion on access to care and
utilization for those who are already insured.
▶ They compare different measures of insurance coverage and health care utilization
among states that opted to expand Medicaid eligibility in 2014 or 2015 with those states
that did not expand by 2015, before and after the expansion.
■ Assunção, Gandour, Rocha and Rocha (2020): Effect of rural credit on deforestation.
Some DiD Examples
■ Beck, Levine and Levkov (2010): Effect of bank branching deregulation on income
distribution in the US.
▶ They exploit staggered bank deregulation across states to understand its effect on the
Gini index (among other outcomes); see also Baker, Larcker and Wang (2022).
■ Venkataramani, Shah, O’Brien, Kawachi and Tsai (2017): Effect of US Deferred Action
for Childhood Arrivals (DACA) immigration program on health outcomes.
▶ They compared changes in health outcomes among individuals who met key DACA eligibility
criteria (based on age at immigration and at the time of policy implementation) before
and after program implementation versus changes in outcomes for individuals who did
not meet these criteria.
Canonical DiD Estimator
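The Canonical Difference-in-Differences estimator
The canonical DiD estimator is given by
$$\widehat{\theta}^{DiD}_n = \left(\bar{Y}_{g=\text{treated},t=\text{post}} - \bar{Y}_{g=\text{treated},t=\text{pre}}\right) - \left(\bar{Y}_{g=\text{untreated},t=\text{post}} - \bar{Y}_{g=\text{untreated},t=\text{pre}}\right),$$
where $\bar{Y}_{g=d,t=j}$ is the sample mean of the outcome $Y$ for units in group $d$ in time
period $j$,
$$\bar{Y}_{g=d,t=j} = \frac{1}{N_{g=d,t=j}} \sum_{i=1}^{N_{all}} Y_i \, 1\{G_i = d\} \, 1\{T_i = j\},$$
with
$$N_{g=d,t=j} = \sum_{i=1}^{N_{all}} 1\{G_i = d\} \, 1\{T_i = j\},$$
$G_i$ and $T_i$ are group and time dummies, respectively, and $Y_i$ is the “pooled” outcome
data.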
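The sample means above can be computed directly from pooled data. The following is a minimal sketch, assuming the group and time dummies are coded as 1 for treated/post and 0 for untreated/pre; the function names and the eight data points are hypothetical and serve only as illustration.

```python
import numpy as np

def group_time_mean(y, g, t, d, j):
    """Sample mean of y over observations with G_i = d and T_i = j."""
    cell = (g == d) & (t == j)      # 1{G_i = d} 1{T_i = j}
    n_cell = cell.sum()             # N_{g=d,t=j}
    if n_cell == 0:
        raise ValueError("no observations in this group-time cell")
    return (y * cell).sum() / n_cell

def canonical_did(y, g, t):
    """(treated post - treated pre) - (untreated post - untreated pre)."""
    y, g, t = (np.asarray(a) for a in (y, g, t))
    treated_change = group_time_mean(y, g, t, 1, 1) - group_time_mean(y, g, t, 1, 0)
    untreated_change = group_time_mean(y, g, t, 0, 1) - group_time_mean(y, g, t, 0, 0)
    return treated_change - untreated_change

# Hypothetical pooled data (assumed coding: g = 1 treated, t = 1 post).
y = np.array([10.0, 12.0, 11.0, 13.0, 10.0, 14.0, 19.0, 21.0])
g = np.array([0, 0, 0, 0, 1, 1, 1, 1])
t = np.array([0, 0, 1, 1, 0, 0, 1, 1])

did_estimate = canonical_did(y, g, t)
print(did_estimate)
```

In this made-up sample the treated mean moves from 12 to 20 and the untreated mean from 11 to 12, so the estimate is (20 - 12) - (12 - 11) = 7.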
Difference-in-Differences via graphs
[Figure, built up over four slides: "Raw Data" plot of the outcome (vertical axis) in the Pre and Post periods (horizontal axis) for the Comparison and Treated groups, with the annotations "Average outcome for those treated" and "Treatment Effect".]
But what kind of treatment effect?
We need to talk about:
1. Potential outcomes
2. Assumptions
Potential Outcomes
Causality with Potential Outcomes
■ We will adopt the Rubin Causal Model and define potential outcomes.
■ Potential outcomes will reflect the time you are first treated (we can “play” with this
later).
■ Let $Y_{i,t}(g)$ be the potential outcome for unit $i$, at time $t$, if this unit is first
treated at time period $g$.
■ $T$ periods: $t = 1, \dots, T$.
■ Let $G_i \in \mathcal{G} \subset \{1, \dots, T\} \cup \{\infty\}$ denote the time unit $i$ is
first treated, with the convention that if a unit is “never treated”, $G_i = \infty$.
■ We call a group “never treated” if this set of units remains untreated in all
time periods in our data.
■ With two time periods $t = 1, 2$, we call the group of units that are still not exposed to
treatment by time $t = 2$ the “never treated”.
▶ This is the case even if some of these units are eventually treated at time $t = 3$ (a
period for which we do not yet have data).
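The cohort coding described above can be sketched in a few lines. This is only an illustration: the unit labels, the helper names, and the rule that treatment stays on once it starts are assumptions made for the example, not part of the lecture.

```python
import math

def observed_cohort(first_treated, last_period):
    """Cohort G_i as seen within the data window 1..last_period.

    Units not yet treated by last_period are coded as never treated (inf),
    even if they become treated in a later, unobserved period.
    """
    if first_treated is None or first_treated > last_period:
        return math.inf
    return first_treated

def is_treated(g, t):
    """Treatment status at time t, assuming treatment stays on from period g."""
    return t >= g

# Hypothetical units: true first-treatment period (None = never treated).
true_first_treated = {"A": 2, "B": 3, "C": None}

# With data for t = 1, 2 only, unit B falls in the "never treated" group.
cohorts = {u: observed_cohort(s, last_period=2) for u, s in true_first_treated.items()}
status = {u: [is_treated(g, t) for t in (1, 2)] for u, g in cohorts.items()}
print(cohorts)
print(status)
```

Coding never-treated units as infinity keeps the comparison `t >= g` valid for every unit without a special case.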
Causality with Potential Outcomes in the canonical 2x2 DiD setup
“Traditionally”, we call these potential outcomes $Y_{i,t}(1)$ and $Y_{i,t}(0)$, instead of $Y_{i,t}(2)$ and $Y_{i,t}(\infty)$.
However, that notation is hard to extend to setups with variation in treatment timing.
Causality with Potential Outcomes in the canonical 2x2 DiD setup
■ Treatment Effect
▶ The treatment effect or causal effect of the treatment on the outcome of unit $i$ at time $t$
is the difference between its two potential outcomes:
$$Y_{i,t}(2) - Y_{i,t}(\infty)$$
■ Observed outcome
▶ At time $t$ we cannot observe both potential outcomes $Y_{i,t}(2)$ and $Y_{i,t}(\infty)$.
Fundamental problem of causal inference: Missing data problem
Data
Unit   Y_{i,t=1}(2)   Y_{i,t=2}(2)   Y_{i,t=1}(∞)   Y_{i,t=2}(∞)   G_i
1      ?              ?              ✓              ✓              ∞
2      ✓              ✓              ?              ?              2
3      ?              ?              ✓              ✓              ∞
4      ✓              ✓              ?              ?              2
⋮      ⋮              ⋮              ⋮              ⋮              ⋮
n      ✓              ✓              ?              ?              2
✓: Observed data
?: Missing data (unobserved counterfactuals)
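The table's pattern can be reproduced in a small simulation. The sketch below assumes a made-up data-generating process (six units, a constant effect of 5 at $t = 2$) in which both potential outcomes are known to the simulator. It then shows what the analyst sees: for each unit only the potential outcome matching its group is revealed, and the other is missing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohorts: 2 = first treated at t = 2, inf = never treated.
G = np.array([np.inf, 2, np.inf, 2, 2, np.inf])
n = G.size

# Both potential outcomes at t = 2 (known only inside a simulation).
y_inf = rng.normal(10.0, 1.0, n)   # Y_{i,t=2}(inf)
y_2 = y_inf + 5.0                  # Y_{i,t=2}(2), assumed constant effect of 5

treated = G == 2

# Observed outcome: the potential outcome corresponding to the unit's own group.
y_obs = np.where(treated, y_2, y_inf)

# What the analyst can see of each potential outcome (NaN = "?").
seen_y_2 = np.where(treated, y_2, np.nan)
seen_y_inf = np.where(treated, np.nan, y_inf)

for i in range(n):
    print(i + 1, seen_y_2[i], seen_y_inf[i], G[i])
```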
Causality with Potential Outcomes in the canonical 2x2 DiD setup
■ Problem:
▶ Causal inference is difficult because it involves missing data.
Causal Parameters of Interest
Target Parameters in the 2x2 DiD Setup
Parameters of interest in the 2x2 DiD Setup
■ ATT
The Average Treatment Effect on the Treated at time period $t = 2$ is
$$ATT = E\left[Y_{i,t=2}(2) - Y_{i,t=2}(\infty) \mid G_i = 2\right]$$
■ ATU
The Average Treatment Effect on the Untreated at time period $t = 2$ is
$$ATU = E\left[Y_{i,t=2}(2) - Y_{i,t=2}(\infty) \mid G_i = \infty\right]$$
■ ATE
The (overall) Average Treatment Effect at time period $t = 2$ is
$$ATE = E\left[Y_{i,t=2}(2) - Y_{i,t=2}(\infty)\right]$$
Parameters of interest in the 2x2 DiD Setup
■ ATT: What is the average effect of the policy/treatment among units that actually
received the treatment by time t = 2?
■ ATU: What is the average effect of the policy/treatment among units that did not
receive the treatment by time t = 2 if they were to receive the treatment?
■ ATE: What is the overall average effect of the policy/treatment if everybody were to
be treated at time t = 2?
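To see how the three parameters can differ, here is a minimal simulation sketch. It assumes a hypothetical population in which treated units have larger effects on average (about 2) than untreated units (about 1). Because both potential outcomes are simulated, the quantities below are "oracle" values that could not be computed from real data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical group membership: True means G_i = 2, False means G_i = inf.
treated = rng.random(n) < 0.4

# Simulated potential outcomes at t = 2 (both are known only in a simulation).
y_inf = rng.normal(0.0, 1.0, n)                                 # Y_{i,t=2}(inf)
effect = np.where(treated, 2.0, 1.0) + rng.normal(0.0, 0.1, n)  # assumed effects
y_2 = y_inf + effect                                            # Y_{i,t=2}(2)

att = (y_2 - y_inf)[treated].mean()    # average effect among the treated
atu = (y_2 - y_inf)[~treated].mean()   # average effect among the untreated
ate = (y_2 - y_inf).mean()             # average effect for everybody

# Law of total expectation: ATE = P(G=2) * ATT + P(G=inf) * ATU.
p_treated = treated.mean()
ate_from_parts = p_treated * att + (1.0 - p_treated) * atu
print(att, atu, ate, ate_from_parts)
```

The last line illustrates that the ATE is a weighted average of the ATT and the ATU, with weights given by the group shares.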
What if we have multiple groups?
Potential Parameters of interest in the multi-group DiD setups
■ $ATT(g, t)$
The average treatment effect of being first treated in period $g < \infty$ (compared to
never being treated), among units first treated in period $g$, at time period $t$ is
$$ATT(g, t) = E\left[Y_{i,t}(g) - Y_{i,t}(\infty) \mid G_i = g\right]$$
■ $ATU(g, t)$
The average treatment effect of being first treated in period $g$ (compared to
never being treated), among the never-treated units, at time period $t$ is
$$ATU(g, t) = E\left[Y_{i,t}(g) - Y_{i,t}(\infty) \mid G_i = \infty\right]$$
■ $ATE(g, t)$
The (overall) average treatment effect of being first treated in period $g$ (compared to
never being treated) at time period $t$ is
$$ATE(g, t) = E\left[Y_{i,t}(g) - Y_{i,t}(\infty)\right]$$
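The group-time parameters can be illustrated with simulated potential outcomes. The sketch below assumes a hypothetical design with four periods, cohorts first treated at 2 or 3 plus a never-treated group, and an effect of 1 + (t - g) from period g onward with no effect before g. None of these choices come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 900, 4
periods = np.arange(1, T + 1)

# Hypothetical cohorts: first treated at 2 or 3, or never treated (inf).
G = rng.choice(np.array([2.0, 3.0, np.inf]), size=n)

# Never-treated potential outcomes Y_{i,t}(inf), one column per period.
y_never = rng.normal(0.0, 1.0, (n, T))

def potential_outcome(g):
    """Y_{i,t}(g) under the assumed effect 1 + (t - g) for t >= g, else 0."""
    exposure = periods - g
    effect = np.where(exposure >= 0, 1.0 + exposure, 0.0)
    return y_never + effect            # broadcasts the effect over units

def att(g, t):
    """ATT(g, t): mean of Y_{i,t}(g) - Y_{i,t}(inf) among units with G_i = g."""
    in_cohort = G == g
    gap = potential_outcome(g)[:, t - 1] - y_never[:, t - 1]
    return gap[in_cohort].mean()

for g in (2, 3):
    print(g, [round(float(att(g, t)), 6) for t in periods])
```

Under this assumed design the effect grows with time since first treatment, which is why a separate parameter for each (g, t) pair is informative.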
But we do not need to fix the baseline!
Parameters of interest in the multi-group DiD setups
■ $ATT(g', g, t \mid g^*)$
The average treatment effect of switching first-treatment time from $g'$ to $g$, among
units first treated in period $g^*$, at time $t$ is
$$ATT(g', g, t \mid g^*) = E\left[Y_{i,t}(g) - Y_{i,t}(g') \mid G_i = g^*\right]$$
■ $ATU(g', g, t \mid \infty)$
The average treatment effect of switching first-treatment time from $g'$ to $g$, among
never-treated units, at time $t$ is
$$ATU(g', g, t \mid \infty) = E\left[Y_{i,t}(g) - Y_{i,t}(g') \mid G_i = \infty\right] = ATU(g, t) - ATU(g', t)$$
■ $ATE(g', g, t)$
The (overall) average treatment effect of switching first-treatment time from $g'$ to $g$,
at time period $t$ is
$$ATE(g', g, t) = E\left[Y_{i,t}(g) - Y_{i,t}(g')\right] = ATE(g, t) - ATE(g', t)$$
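The last equality in the ATE and ATU expressions follows from adding and subtracting the never-treated potential outcome. A short derivation for the ATE, using only the definitions above and linearity of expectations:

```latex
\begin{aligned}
ATE(g', g, t)
  &= E\left[Y_{i,t}(g) - Y_{i,t}(g')\right] \\
  &= E\left[\left(Y_{i,t}(g) - Y_{i,t}(\infty)\right)
          - \left(Y_{i,t}(g') - Y_{i,t}(\infty)\right)\right] \\
  &= E\left[Y_{i,t}(g) - Y_{i,t}(\infty)\right]
   - E\left[Y_{i,t}(g') - Y_{i,t}(\infty)\right] \\
  &= ATE(g, t) - ATE(g', t).
\end{aligned}
```

The same steps, conditioning on $G_i = \infty$ throughout, give the ATU decomposition. For the ATT version, both terms are conditioned on $G_i = g^*$, so in general they do not reduce to $ATT(g, t)$ and $ATT(g', t)$, whose conditioning groups are $G_i = g$ and $G_i = g'$, respectively.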
What if treatment can turn on and off?
What if treatment is multi-valued/continuous?
Exercise
Exercise with treatments turning on and off
■ Time to check how well we are following the principles of how to build causal
parameters in different setups.
Exercise with treatments turning on and off
■ Question 4: Define the overall average treatment effect at time t of taking a specific
treatment sequence compared to never being treated.
Exercise with continuous and multi-valued treatments
Exercise with continuous and multi-valued treatments
■ Question 11: The above marginal average treatment effects are “local” to a dosage d.
Can you think of a more aggregate treatment effect measure that may summarize the
above marginal average treatment effects across different dosages d?
References
Assunção, Juliano, Clarissa Gandour, Romero Rocha, and Rudi Rocha, “The Effect of Rural
Credit on Deforestation: Evidence from the Brazilian Amazon,” Economic Journal, 2020,
130 (626), 290–330.
Baker, Andrew C., David F. Larcker, and Charles C.Y. Wang, “How much should we trust
staggered difference-in-differences estimates?,” Journal of Financial Economics, 2022,
144 (2), 370–395.
Beck, Thorsten, Ross Levine, and Alexey Levkov, “Big bad banks? The winners and losers
from bank deregulation in the United States,” Journal of Finance, 2010, 65 (5), 1637–1667.
Callaway, Brantly and Pedro H. C. Sant’Anna, “Difference-in-Differences with Multiple
Time Periods,” Journal of Econometrics, 2021, 225 (2), 200–230.
Card, David and Alan Krueger, “Minimum Wages and Employment: A Case Study of the
Fast-Food Industry in New Jersey and Pennsylvania,” American Economic Review, 1994,
84 (4), 772–793.
Carey, Colleen M., Sarah Miller, and Laura R. Wherry, “The Impact of Insurance Expansions
on the Already Insured: The Affordable Care Act and Medicare,” American Economic
Journal: Applied Economics, 2020, 12 (4), 288–318.
Currie, Janet, Henrik Kleven, and Esmée Zwiers, “Technology and Big Data Are Changing
Economics: Mining Text to Track Methods,” AEA Papers and Proceedings, May 2020, 110,
42–48.
Dube, Arindrajit, T. William Lester, and Michael Reich, “Minimum Wage Effects across State
Borders: Estimates Using Contiguous Counties,” The Review of Economics and Statistics,
2010, 92 (4), 945–964.
Dube, Arindrajit, T. William Lester, and Michael Reich, “Minimum wage shocks, employment
flows, and labor market frictions,” Journal of Labor Economics, 2016, 34 (3), 663–704.
Goldsmith-Pinkham, Paul, “Tracking the Credibility Revolution across Fields,”
arXiv:2405.20604, 2024.
Malesky, Edmund J., Cuong Viet Nguyen, and Anh Tran, “The impact of recentralization on
public services: A difference-in-differences analysis of the abolition of elected
councils in Vietnam,” American Political Science Review, 2014, 108 (1), 144–168.
Meyer, Bruce D., W. Kip Viscusi, and David L. Durbin, “Workers’ Compensation and Injury
Duration: Evidence from a Natural Experiment,” The American Economic Review, 1995,
85 (3), 322–340.
Venkataramani, Atheendar S., Sachin J. Shah, Rourke O’Brien, Ichiro Kawachi, and
Alexander C. Tsai, “Health consequences of the US Deferred Action for Childhood
Arrivals (DACA) immigration programme: a quasi-experimental study,” The Lancet Public
Health, 2017, 2 (4), e175–e181.