lecture03-differencing-estimators
OBSERVATIONAL RESEARCH
Causal inference and observational research designs
• Randomized experiments are not always possible or practical
• Regression discontinuity designs require specific thresholds
• Now we move to the next viable alternatives: observational research
designs and quasi-experiments (admittedly a somewhat loose
term)
• First quasi-experimental design: differencing estimators
• To understand what quasi-experiments solve, we’ll first discuss
selection on observables designs, which are generally not considered
quasi-experiments
Selection on observables
Techniques
1. Regression adjustment
2. Non-parametric regression
3. Matching estimators
4. Propensity score estimators
5. Combinations of the above
Selection on observables
Conditional unconfoundedness
$Y_{i1}, Y_{i0} \perp D_i \mid X_i$
$Y_{i1}, Y_{i0} \perp D_i$
Selection on observables
When does selection on observables fail?
Selection on observables
Voter mobilization experiment
Quasi-experiments
Definition
DIFFERENCING ESTIMATORS
Differencing estimators
Roadmap
Differencing estimators
Our comparisons so far
Differencing estimators
What temporal comparisons deliver
• Key assumption so far: the treated group would have looked the same as the
control group in the absence of treatment (after including controls)
• This assumption is often hard to defend in non-randomized settings
• One way to probe it is progressive inclusion of controls: if point estimates
are sensitive to the controls you observe, they may also be sensitive
to controls you don’t observe
→ This is the idea that motivates Oster (2018), which
describes a method to bound bias from unobservables
• Alternatively, we can relax this assumption by exploiting temporal
comparisons alongside the cross-sectional comparison
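The progressive-inclusion check described above can be sketched on simulated data. Everything in this block is a hypothetical illustration (the variables `x1`, `x2`, the selection rule, and the effect size `tau_true` are assumptions, not from the lecture); it only shows how the point estimate moves as controls are added.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical selection rule: treatment is more likely when x1 and x2 are high.
D = (0.8 * x1 + 0.8 * x2 + rng.normal(size=n) > 0).astype(float)

tau_true = 1.0  # hypothetical treatment effect
y = tau_true * D + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)


def ols_tau(*controls):
    """OLS coefficient on D with an intercept and the given controls."""
    X = np.column_stack([np.ones(n), D, *controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


tau_none = ols_tau()          # no controls
tau_x1 = ols_tau(x1)          # add x1
tau_x1_x2 = ols_tau(x1, x2)   # add x1 and x2
print(round(tau_none, 2), round(tau_x1, 2), round(tau_x1_x2, 2))
```

In this simulation the estimate falls toward the true effect as controls are added. A researcher who only observed `x1` would see the estimate move and should worry about what else is missing.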
Difference-in-differences
Single vs. double differences
• One way to describe our comparisons thus far is that they are single
differences
• The estimated effect of a policy is simply the difference in expected
outcomes between treated and control groups:
$\tau = E[Y_{i1} \mid D_i = 1] - E[Y_{i0} \mid D_i = 0]$
What DID does is take the difference of two comparisons in three steps:
The name comes from the fact that we are taking the difference (3)
between two differences (1 and 2)
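The three steps themselves are not listed in the text above. One common way to write them is sketched here, using notation assumed for illustration rather than taken from the slides: $\bar{Y}_{g,t}$ is the mean outcome for group $g \in \{T, C\}$ (treated, control) in period $t \in \{0, 1\}$ (before, after).

```latex
\begin{align*}
\text{(1)}\quad \Delta_T &= \bar{Y}_{T,1} - \bar{Y}_{T,0} \\
\text{(2)}\quad \Delta_C &= \bar{Y}_{C,1} - \bar{Y}_{C,0} \\
\text{(3)}\quad \hat{\delta}^{DID} &= \Delta_T - \Delta_C
\end{align*}
```

The same number results if steps (1) and (2) are instead the treated-minus-control gaps after and before treatment, since the four means are simply regrouped.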
Difference-in-differences
Why use DID?
Conservation example
With state-specific and time-specific effects
Conservation example
Possible comparisons
There are two ways you could think about trying to estimate B using
differences:
Conservation example
The cross-sectional difference
$B + (NY + T_1) - (PA + T_1) = B + NY - PA$
Conservation example
The time series difference
$(B + NY + T_1) - (NY + T_0) = B + T_1 - T_0$
Conservation example
Difference-in-differences
• The time-series difference lets us control for all fixed determinants
within a state
• The cross-sectional difference lets us control for all period-specific
determinants common across all states
• Combining these two differences addresses both and lets us recover B,
the true effect of the policy!
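A small numerical sketch of the conservation example may make the three comparisons concrete. The numbers are hypothetical values chosen only for illustration; the symbols B, NY, PA, T0, and T1 follow the slides.

```python
# Hypothetical values, chosen only for illustration (not from the lecture).
B = -5.0              # true policy effect
NY, PA = 100.0, 80.0  # fixed state-specific effects
T0, T1 = 10.0, 14.0   # period-specific effects common to both states

# Outcomes: NY is treated in period 1 only; PA is never treated.
y = {
    ("NY", 0): NY + T0,
    ("NY", 1): B + NY + T1,
    ("PA", 0): PA + T0,
    ("PA", 1): PA + T1,
}

cross_section = y[("NY", 1)] - y[("PA", 1)]         # B + NY - PA
time_series = y[("NY", 1)] - y[("NY", 0)]           # B + T1 - T0
did = time_series - (y[("PA", 1)] - y[("PA", 0)])   # B

print(cross_section, time_series, did)  # prints: 15.0 -1.0 -5.0
```

Neither single difference equals B here: the cross-section picks up the NY-PA gap and the time series picks up the common period change. Only the double difference returns -5.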
Difference-in-differences
Not magic
Difference-in-differences
Add a differential trend
Difference-in-differences
Key identification assumption
• With this understanding of the structure of DID, you now have the
framework to assess basically any DID-style design
• Ask: are there any unobserved, time-varying drivers correlated with
both outcome and treatment?
→ Still workable, if we can control for those drivers in some way
• The parallel trends assumption is what assumes such drivers away
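To see what the parallel trends assumption rules out, the conservation example can be extended with a hypothetical New York-specific change $g$ between periods. The symbol $g$ is an assumption introduced here for illustration, not notation from the slides. A sketch of the resulting DID:

```latex
\begin{align*}
\text{NY: } & (B + NY + T_1 + g) - (NY + T_0) = B + (T_1 - T_0) + g \\
\text{PA: } & (PA + T_1) - (PA + T_0) = T_1 - T_0 \\
\text{DID: } & \big[B + (T_1 - T_0) + g\big] - (T_1 - T_0) = B + g
\end{align*}
```

Under this sketch DID recovers B only when $g = 0$, that is, when the two states would have followed the same trend in the absence of the policy.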
Difference-in-differences
Regression framework
Difference-in-differences
Visual parallel trends
Parallel trends
Difference-in-differences
Defending parallel trends
$Y_{it} = \alpha + \beta D_i + \phi_t + \delta T_{it} + \varepsilon_{it}$
Dynamic DID
Adding unit FE
$Y_{it} = \alpha + \phi_i + \phi_t + \delta T_{it} + \varepsilon_{it}$
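As a rough illustration of the unit and time fixed effects specification, the sketch below simulates a small balanced panel and estimates $\delta$ by OLS with dummy variables. The panel dimensions, effect size, and noise level are hypothetical assumptions, and plain `numpy` least squares stands in for whatever regression software one would normally use.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units, n_periods, treat_start = 40, 6, 3  # hypothetical panel dimensions
delta_true = 2.0                            # hypothetical treatment effect

unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
treated_unit = unit < n_units // 2          # first half of units are treated
T_it = (treated_unit & (time >= treat_start)).astype(float)

unit_fe = rng.normal(0, 1, n_units)[unit]
time_fe = rng.normal(0, 1, n_periods)[time]
y = unit_fe + time_fe + delta_true * T_it + rng.normal(0, 0.1, unit.size)

# Design matrix: treatment indicator, unit dummies, time dummies.
# The unit dummies absorb the intercept; one time dummy is dropped
# to avoid perfect collinearity with the unit dummies.
unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
time_dummies = (time[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([T_it, unit_dummies, time_dummies])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = coef[0]
print(round(delta_hat, 2))
```

The estimate should land close to `delta_true` because the simulated data satisfy parallel trends by construction.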
CURRIE AND WALKER (2011)
Motivation
• Road congestion creates well-documented time externalities
(congestion), but what about health externalities?
• Idling cars create SO2, which could affect health (esp. natal health)
• What is the real-world impact of a reduction in congestion on the health of
children born nearby?
• How can we test this? (Ethically)
Currie and Walker (2011)
E-ZPass
• i is mother, t is birth
• E-ZPass was not installed at the same time everywhere, so this isn’t a “pure” DID,
but installation mostly occurred between 1997 and 2000
DID checks on mother characteristics
DID results
What do we learn?
• E-ZPass led to significant and socially important improvements in
infant health
• Prematurity fell 6.7–9.1%, low birth weight fell 8.5–11.3%
• Research design gives high confidence in the estimate
• By contrast, a selection-on-observables approach would have had a hard
time accounting for correlations between pollution, housing location
choice, and infant health
What do we not learn?
• Measurement of pollution impact on maternal health
→ Only one pollution monitor near these E-ZPass locations, didn’t
monitor CO
→ More recent work focuses more on estimating this relationship
directly
• Impacts on populations that don’t live near highways
• What if we were interested in how effects evolve over time? Or
pretrends?
• Then we need an event study
EVENT STUDIES
Event study
• Event study is just another tweak on the DID/FE setup
• Allow the treatment effect to vary over time by including indicators for time
until/since treatment, interacted with the treatment indicator
• If we return to a fixed treatment time (suppose 3 periods), we can estimate
as follows:
$Y_{it} = \alpha + \beta D_i + \xi \text{Post}_t + \sum_{k=-1}^{1} \delta_k T_{tk} + \varepsilon_{it}$
Event study matrix goggles
i   t   D_i   Post_t   T   T_{-1}   T_0   T_1
A   1   1     0        0   1        0     0
A   2   1     1        1   0        1     0
A   3   1     1        1   0        0     1
B   1   1     0        0   1        0     0
B   2   1     1        1   0        1     0
B   3   1     1        1   0        0     1
C   1   0     0        0   0        0     0
C   2   0     1        0   0        0     0
C   3   0     1        0   0        0     0
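The matrix above can be rebuilt programmatically, which also makes one of its properties visible. The sketch below assumes treatment starts in period 2 (so event time is k = t - 2), which is inferred from the table rather than stated in the slides. It then checks the rank of the regressor matrix with and without one event-time dummy.

```python
import numpy as np

treated = {"A": 1, "B": 1, "C": 0}   # D_i for each unit in the table
periods = [1, 2, 3]
treat_period = 2                     # assumption: treatment starts in period 2
event_times = [-1, 0, 1]             # k = t - treat_period

rows = []
for unit_name, d in treated.items():
    for t in periods:
        post = int(t >= treat_period)
        k = t - treat_period
        event_dummies = [d * int(k == j) for j in event_times]
        rows.append([d, post, d * post] + event_dummies)

X = np.array(rows)   # columns: D, Post, T, T_{-1}, T_0, T_1
print(X)

const = np.ones((len(X), 1))
# Regressors as written in the equation: 1, D, Post, T_{-1}, T_0, T_1
full = np.hstack([const, X[:, [0, 1, 3, 4, 5]]])
# Same regressors with T_{-1} omitted as the baseline period
baseline_dropped = np.hstack([const, X[:, [0, 1, 4, 5]]])

rank_full = np.linalg.matrix_rank(full)
rank_dropped = np.linalg.matrix_rank(baseline_dropped)
print(rank_full, full.shape[1])
print(rank_dropped, baseline_dropped.shape[1])
```

In this table the three event-time dummies sum to D_i in every row, so the full set is rank deficient alongside D_i. Omitting one dummy (here k = -1, a common but not mandatory choice) restores full column rank, and the remaining coefficients are read relative to that baseline period.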
Event studies with time-varying treatment
• Many event studies don’t assume a fixed treatment period
→ In these cases, we use fixed effects and a time-varying indicator for
treatment: $T_{itk}$
$Y_{it} = \alpha + \phi_i + \phi_t + \sum_{k=-1}^{1} \delta_k T_{itk} + \varepsilon_{it}$
Parallel pre-trends
• In an event study we may have multiple periods before treatment
• We can estimate the effect of being in the treatment group in each of
these (pre-)periods
• If there is no trend in the effect, the pre-trends are parallel
• This gives us some comfort that the trends were likely to be parallel in
the post-period in the absence of treatment
• We never observe what would actually have happened, though: parallel
pre-trends are only supporting evidence that the parallel trends
assumption holds
• Hollingsworth and Rudik (2021) use an event study design to examine
the effect of lead on elderly mortality
Dynamic effects
• Reminder: an event study traces the change in the outcome over time due to the policy
• In this case, if lead emissions in year 1 only kill in year 1, we should only see
effects in year 1
• If lead emissions in year 1 also kill in years 2, 3, etc., we should see
effects in those years as well
• Summing these coefficients can yield cumulative effects
• Hollingsworth and Rudik (2021) provide a nice example of a paper that
studies health effects of lead with an event study
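The ideas of pre-period coefficients and cumulative effects can be sketched with a simulated event study. All numbers here are hypothetical assumptions (the event window, the effect path in `true_effect`, and the noise level), and k = -1 is chosen as the omitted baseline. This is not the specification or data of Hollingsworth and Rudik (2021).

```python
import numpy as np

rng = np.random.default_rng(1)

n_units = 60
event_times = np.arange(-3, 3)            # k = -3, ..., 2 (hypothetical window)
true_effect = {0: 1.0, 1: 0.5, 2: 0.25}   # hypothetical dynamic effects

unit = np.repeat(np.arange(n_units), event_times.size)
k = np.tile(event_times, n_units)
d = (unit < n_units // 2).astype(float)   # first half of units are treated

effect = np.array([true_effect.get(int(kk), 0.0) for kk in k]) * d
y = (rng.normal(0, 1, n_units)[unit]      # unit effects
     + 0.3 * k                            # trend common to both groups
     + effect
     + rng.normal(0, 0.05, unit.size))

# Event-time dummies interacted with treatment; k = -1 is the omitted baseline.
kept = [int(j) for j in event_times if j != -1]
event_dummies = np.column_stack([d * (k == j) for j in kept])
unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
time_dummies = (k[:, None] == event_times[1:]).astype(float)
X = np.column_stack([event_dummies, unit_dummies, time_dummies])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta = dict(zip(kept, coef[:len(kept)]))

pre_max = max(abs(delta[j]) for j in (-3, -2))   # pre-period coefficients
cumulative = sum(delta[j] for j in (0, 1, 2))    # summed post-period effects
print({j: round(float(v), 2) for j, v in delta.items()})
print(round(float(cumulative), 2))
```

Pre-period coefficients near zero are the simulated analogue of parallel pre-trends. The post-period coefficients trace the dynamic effect, and their sum approximates the cumulative effect over the window.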
HOLLINGSWORTH AND RUDIK (2021)
NASCAR, lead, and (bad) health
“The effect of leaded gasoline on elderly mortality: Evidence from regulatory
exemptions”
Research influencing policy
Approach
Map of treatment and control counties
          Before                                        After
Treated   Areas near NASCAR racetracks before 2007      Areas near NASCAR racetracks after 2007
Control   Areas far from NASCAR racetracks before 2007  Areas far from NASCAR racetracks after 2007
• They are comparing areas close vs far from racetracks, before vs after
deleading in 2007
Approach
Map of treatment and control counties
County map
Approach
Event study
• Here they estimate the effect of being in the treated group, relative to
some baseline period
• In the 2x2 DD this baseline period was “before”
• Conducting an event study has two benefits for the paper:
→ They can look at dynamic effects of lead over time (likely, since
lead does not disappear quickly)
→ It provides supporting evidence for the parallel trends
assumption: treated counties didn’t have declining mortality rates
before removal of lead from gasoline
Results
Event study figures
Figure 7
Paper takeaways:
What can we take away from HR (2021)?
TRIPLE DIFFERENCES
Triple differences (DDD)
• Like DID, but add one more layer of variation
• For example, suppose the DID compares attainment vs. non-attainment counties before
and after the Clean Air Act
• Then add a comparison between regulated and unregulated firms
• Source of variation is the difference in differences of differences (hence
triple differences)
• Another way to think about this: second DID is placebo test
DDD estimation
Intuition: Difference between two DID estimates
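$\hat{\delta}^{DID}_{Z=0} = (\bar{Y}_T^1 - \bar{Y}_T^0) - (\bar{Y}_C^1 - \bar{Y}_C^0)$
$\hat{\delta}^{DID}_{Z=1} = (\bar{Y}_T^1 - \bar{Y}_T^0) - (\bar{Y}_C^1 - \bar{Y}_C^0)$
$\hat{\delta}^{DDD} = \hat{\delta}^{DID}_{Z=1} - \hat{\delta}^{DID}_{Z=0}$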
Sample DDD in regression form
• $t = 0$, $t = 1$: before and after the Clean Air Act
• $D_i$: attainment status
• $Z_i$: regulated vs. unregulated
$y_{it} = \alpha + \beta_1 D_i + \beta_2 Z_i + \beta_3 \text{After}_t + \beta_4 [D_i \times Z_i] + \beta_5 [D_i \times \text{After}_t] + \beta_6 [Z_i \times \text{After}_t] + \delta [D_i \times Z_i \times \text{After}_t] + \varepsilon_{it}$
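To connect the regression form with the "difference between two DID estimates" intuition, the sketch below simulates data from the fully interacted model and computes the triple difference two ways. The coefficient values, sample size, and the random assignment of `D`, `Z`, and `After` are hypothetical assumptions made only so the example runs. They do not represent the Clean Air Act setting.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
D = rng.integers(0, 2, n)       # hypothetical group indicator D_i
Z = rng.integers(0, 2, n)       # hypothetical second-layer indicator Z_i
After = rng.integers(0, 2, n)   # hypothetical period indicator After_t

delta_true = 1.5                # hypothetical triple-difference effect
y = (1.0 + 0.5 * D + 0.3 * Z + 0.2 * After
     + 0.4 * D * Z + 0.6 * D * After - 0.7 * Z * After
     + delta_true * D * Z * After
     + rng.normal(0, 0.5, n))


def did(mask):
    """2x2 DID of cell means within the subsample selected by mask."""
    def cell_mean(d, a):
        return y[mask & (D == d) & (After == a)].mean()
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))


# Route 1: difference between the DID among Z = 1 and the DID among Z = 0.
ddd_means = did(Z == 1) - did(Z == 0)

# Route 2: coefficient on the triple interaction in the saturated regression.
X = np.column_stack([np.ones(n), D, Z, After,
                     D * Z, D * After, Z * After, D * Z * After])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ddd_ols = coef[-1]

print(round(float(ddd_means), 3), round(float(ddd_ols), 3))
```

Because the regression is saturated in the three binary indicators, the two routes give the same number up to floating-point error, and both should be close to `delta_true`.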
Environmental regulations and job transitions
Walker (2013)
Motivation and RQ
• Environmental (and other) regulations create winners and losers.
• Air quality regs:
→ Winners: beneficiaries of improved AQ, companies with cleaner
production processes.
→ Losers: firms and workers whose costs increase as a result of
regulation.
• Simple models assume job transitions are costless.
• But this doesn’t match reality: workers who lose employment will take
time to find new work and train to do it.
• Walker (2013) uses restricted-access census data to identify
transitional costs of CAAA regs.
Attainment areas
$Y_{cst} = \sum_{k=0}^{10} \eta_{k1} [N_c \times P_s \times 1(\tau_t > 0)] + \rho_{cs} + \lambda_t + n_{ct} + p_{st} + \varepsilon_c$
• $\eta_{k1}$ are the event study estimates, showing how employment among polluting
firms in newly-regulated counties changes in 1990.
• FEs stand in for first- and second-order terms
• Also estimates “cohort” effects to directly measure costs of the policy in
terms of lost earnings and transitional periods
Results
Sector employment (levels)
Differencing/event study wrap up
• Differencing estimators are very common in policy evaluation
• Identifying assumptions are weaker than those of selection on observables
techniques (weaker assumptions are preferred)
• But, as always, context and source of variation matter
• Parallel trends assumption can be supported, but is fundamentally
untestable
• Event studies allow non-simultaneous treatment / dynamic effects over
time
• Next lecture: Fixed effects and the impacts of climate change
References
Currie, Janet, and Reed Walker. 2011. “Traffic Congestion and Infant Health: Evidence from E-ZPass.”
American Economic Journal: Applied Economics 3 (1): 65–90. https://doi.org/10.1257/app.3.1.65.
Hollingsworth, Alex, and Ivan Rudik. 2021. “The Effect of Leaded Gasoline on Elderly Mortality: Evidence
from Regulatory Exemptions.” American Economic Journal: Economic Policy 13 (3): 345–73. https://doi.org/10.1257/pol.20190654.
Walker, W. Reed. 2013. “The Transitional Costs of Sectoral Reallocation: Evidence From the Clean Air
Act and the Workforce.” The Quarterly Journal of Economics 128 (4): 1787–1835. https://doi.org/10.1093/qje/qjt022.