05 Covariates
Pedro H. C. Sant’Anna
Emory University
January 2025
Summary of previous lectures
Canonical DiD setup
PS: From now on, we take SUTVA for granted (NOT without loss of generality, though).
“Brute force” DiD estimator
where $\bar{Y}_{g=d,t=j}$ is the sample mean of the outcome $Y$ for units in group $d$ in time period $j$,
$$\bar{Y}_{g=d,t=j} = \frac{1}{N_{g=d,t=j}} \sum_{i=1}^{N \cdot T} Y_i \, 1\{G_i = d\} \, 1\{T_i = j\},$$
with
$$N_{g=d,t=j} = \sum_{i=1}^{N \cdot T} 1\{G_i = d\} \, 1\{T_i = j\},$$
$G_i$ and $T_i$ denote the group and the time period of observation $i$, respectively, and $Y_i$ is the "pooled" outcome data.
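The brute-force estimator combines the four cell means as $\widehat{ATT} = (\bar{Y}_{g=2,t=2} - \bar{Y}_{g=2,t=1}) - (\bar{Y}_{g=\infty,t=2} - \bar{Y}_{g=\infty,t=1})$, assuming (as elsewhere in these slides) that $G = 2$ is the group treated in period 2 and $G = \infty$ the never-treated group. A minimal NumPy sketch on toy pooled data; `brute_force_did` is a hypothetical helper name:

```python
import numpy as np


def brute_force_did(y, g, t):
    """Brute-force DiD from pooled data.

    y: outcomes pooled over units and periods
    g: group of each observation (2 = treated in period 2, np.inf = never treated)
    t: time period of each observation (1 or 2)
    """
    y, g, t = map(np.asarray, (y, g, t))

    def cell_mean(d, j):
        # Sample mean of Y among observations with G = d and T = j
        return y[(g == d) & (t == j)].mean()

    return (cell_mean(2, 2) - cell_mean(2, 1)) - (cell_mean(np.inf, 2) - cell_mean(np.inf, 1))


# Hypothetical toy data: untreated trend = 3, treatment effect = 5, no noise
g = np.array([2, 2, np.inf, np.inf] * 2)
t = np.array([1, 1, 1, 1, 2, 2, 2, 2])
y = 1.0 + 2.0 * (g == 2) + 3.0 * (t == 2) + 5.0 * ((g == 2) & (t == 2))
print(brute_force_did(y, g, t))  # 5.0
```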
“TWFE” DiD estimator
What if the Parallel Trends
Conditional Parallel Trends
Unconditional parallel trends assumption
■ But what if units with different observed characteristics were to evolve differently in
the absence of treatment?
▶ Effect of minimum wage on employment: is it sensible to assume that, in the absence of treatment, employment in states in the Northeast of the US would have evolved similarly to employment in states in the South of the US?
Conditional Parallel Trends Assumption
■ In order to “relax” the PTA, we can assume that it holds only after conditioning on a vector of observed pre-treatment covariates.
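One standard way to write this conditional PT assumption, in the potential-outcome notation used for the conditional ATT below ($Y_t(\infty)$ is the untreated potential outcome in period $t$), is:

```latex
E\left[Y_{t=2}(\infty) - Y_{t=1}(\infty) \mid G = 2, X\right]
  = E\left[Y_{t=2}(\infty) - Y_{t=1}(\infty) \mid G = \infty, X\right] \quad \text{a.s.}
```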
Strong overlap
■ The covariates X here are the same as those used to justify the conditional PT
assumption!
■ For identification purposes, we can take $\epsilon = 0$. For (standard) inference, though, we would run into problems without relying on “extrapolation”; see, e.g., Khan and Tamer (2010).
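For reference, a common statement of the strong overlap assumption, consistent with the role of $\epsilon$ above (see, e.g., Sant’Anna and Zhao, 2020), is:

```latex
\text{for some } \epsilon > 0: \quad P(G = 2) > \epsilon
  \quad \text{and} \quad P(G = 2 \mid X) < 1 - \epsilon \quad \text{a.s.}
```

Taking $\epsilon = 0$ reduces this to $P(G = 2) > 0$ and $P(G = 2 \mid X) < 1$ a.s.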
How do the conditional PTA and
Identification of ATT under conditional parallel trends and overlap
Conditional Parallel Trends and the conditional ATT
■ Let’s define the conditional ATT: $ATT(X) \equiv E[Y_{t=2}(2) - Y_{t=2}(\infty) \mid G = 2, X]$.
■ Now, combining the results of previous slides, under the SUTVA, No-Anticipation, and Conditional PT assumptions, it follows that:
■ This also implies that the unconditional ATT is identified: all we have to do is integrate $ATT(X)$ over the distribution of $X$ among treated units:
$$ATT = E[ATT(X) \mid G = 2]$$
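The identification step behind the second bullet can be sketched as follows (a standard derivation written in this notation: SUTVA links observed and potential outcomes, No-Anticipation gives $Y_{t=1} = Y_{t=1}(\infty)$ for the treated, and conditional PT swaps $G = 2$ for $G = \infty$ in the untreated trend):

```latex
\begin{aligned}
ATT(X) &= E[Y_{t=2}(2) \mid G=2, X] - E[Y_{t=2}(\infty) \mid G=2, X] \\
&= E[Y_{t=2} \mid G=2, X] - E[Y_{t=1}(\infty) \mid G=2, X]
   - E[Y_{t=2}(\infty) - Y_{t=1}(\infty) \mid G=2, X] \\
&= E[Y_{t=2} \mid G=2, X] - E[Y_{t=1} \mid G=2, X]
   - E[Y_{t=2}(\infty) - Y_{t=1}(\infty) \mid G=\infty, X] \\
&= E[Y_{t=2} - Y_{t=1} \mid G=2, X] - E[Y_{t=2} - Y_{t=1} \mid G=\infty, X].
\end{aligned}
```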
Conditional Parallel Trends and the unconditional ATT
where the second equality follows from the Law of Iterated Expectations and from covariates and group indicators being stationary (which holds by construction in a balanced panel; we will come back to this in a bit).
Can we use a simple regression here?
Usage of simple TWFE linear regressions with covariates
The temptation
TWFE DiD estimator
■ Under unconditional PTA, we have shown that we can use the TWFE regression to
recover the ATT:
■ It is very tempting to “extrapolate” and use the “more general” TWFE regression
specification:
Is $\tilde{\beta}^{twfe}_0$ “similar” to the ATT?
Simulation exercise
Monte Carlo simulation exercise
■ In this particular exercise, we use a data generating process (DGP) similar to those of Kang and Schafer (2007).
■ Available data are $\{Y_{t=2}, Y_{t=1}, D, X\}_{i=1}^{n}$, where $D_i = 1\{G_i = 2\}$.
$D_i = 1\{p(X_i) \geq U_i\}$
We estimate $\tilde{\beta}^{twfe}_0$ from the following specification:
■ Average of $\hat{\tilde{\beta}}^{twfe}_0$ across the simulations: -16.36 (very biased!)
Simulation results
[Figure: simulated density of the Two-Way Fixed Effect Estimator]
Why is there so much bias here?
The problems of the simple TWFE specification with covariates
Simple TWFE DiD regression estimator with covariates
Simple TWFE regression estimator with covariates
Key to success:
How can we do it?
Alternative Estimands
Semi and nonparametric DiD procedures
Once we separate identification from estimation, we realize that DiD with covariates has many faces!
Regression adjustment
Regression adjustment procedure
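■ We have already seen that, under conditional PT, No-Anticipation, and SUTVA,
$$ATT = E\Big[\big(\underbrace{E[Y_{t=2} \mid G=2, X]}_{=m^{G=2}_{t=2}(X)} - \underbrace{E[Y_{t=1} \mid G=2, X]}_{=m^{G=2}_{t=1}(X)}\big) - \big(\underbrace{E[Y_{t=2} \mid G=\infty, X]}_{=m^{G=\infty}_{t=2}(X)} - \underbrace{E[Y_{t=1} \mid G=\infty, X]}_{=m^{G=\infty}_{t=1}(X)}\big) \,\Big|\, G = 2\Big]$$
$$= E\Big[\big(m^{G=2}_{t=2}(X) - m^{G=2}_{t=1}(X)\big) - \big(m^{G=\infty}_{t=2}(X) - m^{G=\infty}_{t=1}(X)\big) \,\Big|\, G = 2\Big].$$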
Regression adjustment procedure
■ For example, let $\mu^{G=g}_{t=s}(X) = X'\beta^{G=g}_{0,t=s}$ be a working model for $m^{G=g}_{t=s}(X)$.
■ We can then estimate the betas in each subsample using OLS, compute the fitted values using all covariate values among treated units, and then average the combination of these fitted values:
$$\widehat{ATT}^{reg}_n = E_n\Big[\big(\hat{\mu}^{G=2}_{t=2}(X) - \hat{\mu}^{G=2}_{t=1}(X)\big) - \big(\hat{\mu}^{G=\infty}_{t=2}(X) - \hat{\mu}^{G=\infty}_{t=1}(X)\big) \,\Big|\, G = 2\Big],$$
where $E_n[\cdot \mid G = 2]$ denotes the sample average over treated units.
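A minimal sketch of the regression-adjustment procedure above, assuming linear working models, (stationary) repeated cross-section data with periods coded 1 and 2, and a covariate matrix that includes an intercept column; `ols_coef` and `reg_did` are hypothetical helper names, not functions from any package:

```python
import numpy as np


def ols_coef(X, y):
    """OLS coefficients of y on X (X is assumed to include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def reg_did(y, d, t, X):
    """Regression-adjustment DiD with linear working models mu^{G=g}_{t=s}(X) = X'beta.

    y: outcomes; d: 1 if G = 2, 0 if G = inf; t: period (1 or 2); X: covariates.
    """
    y, d, t, X = map(np.asarray, (y, d, t, X))
    X_treated = X[d == 1]  # covariate values of all treated observations
    fitted = {}
    for g in (1, 0):
        for s in (1, 2):
            cell = (d == g) & (t == s)
            beta = ols_coef(X[cell], y[cell])  # fit within the (group, period) cell
            fitted[(g, s)] = X_treated @ beta  # predict at treated covariate values
    contrast = (fitted[(1, 2)] - fitted[(1, 1)]) - (fitted[(0, 2)] - fitted[(0, 1)])
    return contrast.mean()  # sample analogue of E[ . | G = 2]
```

If the four conditional means are truly linear in $X$, this recovers the ATT even when untreated trends vary with $X$, which is exactly what the simple TWFE specification cannot accommodate.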
Regression adjustment with panel data
■ Observing $Y_{t=1}$ and $Y_{t=2}$ for the same units allows us to simplify the formulas a lot!
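■ With repeated cross-section data, the formula can also be simplified (but not as much as in the case of panel data):
$$ATT = E\Big[\big(m^{G=2}_{t=2}(X) - m^{G=2}_{t=1}(X)\big) - \big(m^{G=\infty}_{t=2}(X) - m^{G=\infty}_{t=1}(X)\big) \,\Big|\, G = 2\Big]$$
$$= \big(E[Y \mid G=2, T=2] - E[Y \mid G=2, T=1]\big) - E\big[m^{G=\infty}_{t=2}(X) - m^{G=\infty}_{t=1}(X) \,\big|\, G = 2\big]$$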
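For panel data, the simplification mentioned in the first bullet above comes from observing $Y_{t=2} - Y_{t=1}$ for every unit: one averages the observed change among treated units and subtracts their fitted untreated change from a single regression among untreated units. A minimal sketch, assuming a linear working model with an intercept in `X`; the function name is hypothetical:

```python
import numpy as np


def reg_did_panel(y_post, y_pre, d, X):
    """Panel-data regression-adjustment DiD with a linear working model.

    Only the untreated outcome evolution m^{G=inf}_{t=2}(X) - m^{G=inf}_{t=1}(X) is modeled;
    the treated group's average change is computed directly from the data.
    """
    y_post, y_pre, d, X = map(np.asarray, (y_post, y_pre, d, X))
    dy = y_post - y_pre
    beta, *_ = np.linalg.lstsq(X[d == 0], dy[d == 0], rcond=None)
    # Observed change for treated units minus their predicted untreated change
    return np.mean(dy[d == 1] - X[d == 1] @ beta)
```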
Regression-adjusted DiD estimators
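■ The inverse probability weighting (IPW) approach of Abadie (2005) does not model the outcome regressions. Instead, it models the propensity score, i.e., the probability of belonging to group $G = 2$: $p(X) \equiv P(G = 2 \mid X) = P(D = 1 \mid X)$, where $D = 1\{G = 2\}$.
$$ATT^{ipw,p} = \frac{E\left[\left(D - \frac{(1-D)\,p(X)}{1-p(X)}\right)(Y_{t=2} - Y_{t=1})\right]}{E[D]},$$
$$ATT^{ipw,rc} = \frac{E\left[\left(D - \frac{(1-D)\,p(X)}{1-p(X)}\right)\frac{1\{T=2\} - \lambda}{\lambda(1-\lambda)}\, Y\right]}{E[D]},$$
where $\lambda = P(T = 2)$.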
■ These formulas suggest a simple two-step estimation procedure, too!
1. Choose your favorite method to estimate the unknown propensity score p(X).
2. Plug the fitted propensity score values into the ATT formula, and replace the population expectations with their sample analogues.
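A minimal sketch of step 2 for both Abadie (2005)-type formulas above, taking fitted propensity scores from step 1 as given; `ipw_did_panel`, `ipw_did_rc`, and the convention that periods are coded 1 and 2 are assumptions for illustration:

```python
import numpy as np


def ipw_did_panel(y_post, y_pre, d, ps):
    """Abadie (2005)-type IPW DiD with panel data, given fitted propensity scores ps."""
    y_post, y_pre, d, ps = map(np.asarray, (y_post, y_pre, d, ps))
    weights = d - (1 - d) * ps / (1 - ps)
    return np.mean(weights * (y_post - y_pre)) / np.mean(d)


def ipw_did_rc(y, d, t, ps):
    """Abadie (2005)-type IPW DiD with repeated cross-sections; t takes values 1 or 2."""
    y, d, t, ps = map(np.asarray, (y, d, t, ps))
    post = (t == 2).astype(float)
    lam = post.mean()  # sample analogue of lambda = P(T = 2)
    weights = (d - (1 - d) * ps / (1 - ps)) * (post - lam) / (lam * (1 - lam))
    return np.mean(weights * y) / np.mean(d)
```

As a sanity check, with a constant pscore equal to the sample treated share, `ipw_did_panel` reduces to the simple difference between the average changes of treated and untreated units.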
Inverse probability weighting procedures
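■ For example, let $\pi(X) = \Lambda(X'\gamma_0) \equiv \dfrac{\exp(X'\gamma_0)}{1 + \exp(X'\gamma_0)}$ be a working model for the propensity score.
■ Let $\hat{\pi}(X) = \dfrac{\exp(X'\hat{\gamma}_n)}{1 + \exp(X'\hat{\gamma}_n)}$ denote the fitted propensity score.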
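With this logit working model, $\hat{\gamma}_n$ is typically the maximum likelihood estimator. A minimal Newton-Raphson sketch in NumPy (the helper `logit_fit` is hypothetical; in practice one would rely on a statistics package), returning $\hat{\gamma}_n$ and the fitted values $\hat{\pi}(X)$:

```python
import numpy as np


def logit_fit(X, d, n_iter=50, tol=1e-10):
    """Logit MLE for pi(X) = Lambda(X'gamma) via Newton-Raphson.

    X: covariate matrix with an intercept column; d: 0/1 treatment indicator.
    Returns (gamma_hat, fitted propensity scores).
    """
    X, d = np.asarray(X, dtype=float), np.asarray(d, dtype=float)
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))  # Lambda(X'gamma)
        score = X.T @ (d - p)  # gradient of the log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])  # minus the Hessian
        step = np.linalg.solve(info, score)
        gamma += step
        if np.max(np.abs(step)) < tol:
            break
    p_hat = 1.0 / (1.0 + np.exp(-X @ gamma))
    return gamma, p_hat
```

At convergence the score condition $\sum_i X_i (D_i - \hat{\pi}(X_i)) = 0$ holds; with an intercept, this means the fitted pscores average to the sample treated share.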
Hájek-based Inverse probability weighting procedures
■ One potential drawback of Abadie’s IPW DiD estimator is that its weights are not “normalized”, i.e., they need not sum up to one in a given sample.
■ More formally, Abadie’s IPW DiD estimator is of the Horvitz and Thompson (1952)
type.
■ We know from the survey literature that Hájek (1971)-type estimators can be more
stable, as they use “normalized” weights.
■ Building on this insight and on Abadie (2005), Sant’Anna and Zhao (2020) considered Hájek (1971)-type IPW DiD estimands.
Hájek-based Inverse probability weighting with panel
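Sant’Anna and Zhao (2020) considered the following estimand when panel data are available:
$$ATT^{ipw,p}_{std} = E\left[\left(w^{p}_{G=2}(D) - w^{p}_{G=\infty}(D,X;p)\right)(Y_{t=2} - Y_{t=1})\right] = E\left[\left(\frac{D}{E[D]} - \frac{\frac{p(X)(1-D)}{1-p(X)}}{E\left[\frac{p(X)(1-D)}{1-p(X)}\right]}\right)(Y_{t=2} - Y_{t=1})\right],$$
where
$$w^{p}_{G=2}(D) = \frac{D}{E[D]}, \quad \text{and} \quad w^{p}_{G=\infty}(D,X;g) = \frac{\frac{g(X)(1-D)}{1-g(X)}}{E\left[\frac{g(X)(1-D)}{1-g(X)}\right]}.$$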
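A minimal sketch of the sample analogue of this Hájek-type panel estimand, given fitted pscores; the function name is hypothetical. Both sets of weights average to one by construction:

```python
import numpy as np


def std_ipw_did_panel(y_post, y_pre, d, ps):
    """Hajek-type (normalized-weight) IPW DiD with panel data, given fitted pscores ps."""
    y_post, y_pre, d, ps = map(np.asarray, (y_post, y_pre, d, ps))
    w_treat = d / np.mean(d)  # w^p_{G=2}(D)
    odds = ps * (1 - d) / (1 - ps)
    w_control = odds / np.mean(odds)  # w^p_{G=inf}(D, X; p), normalized
    return np.mean((w_treat - w_control) * (y_post - y_pre))
```

Because the control weights are normalized, multiplying all odds $p(X)/(1-p(X))$ by a common constant leaves the estimate unchanged, unlike the unnormalized version.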
Hájek-based Inverse probability weighting with repeated cross-section
Sant’Anna and Zhao (2020) considered the following estimand when stationary RCS data are available:
$$ATT^{ipw,rc}_{std} = E\left[\left(w^{rc}_{G=2}(D,T) - w^{rc}_{G=\infty}(D,T,X;p)\right) \cdot Y\right],$$
where
$$w^{rc}_{G=2}(D,T) = w^{rc}_{G=2,t=2}(D,T) - w^{rc}_{G=2,t=1}(D,T),$$
$$w^{rc}_{G=\infty}(D,T,X;g) = w^{rc}_{G=\infty,t=2}(D,T,X;g) - w^{rc}_{G=\infty,t=1}(D,T,X;g),$$
and, for $s = 1, 2$,
$$w^{rc}_{G=2,t=s}(D,T) = \frac{D \cdot 1\{T=s\}}{E[D \cdot 1\{T=s\}]}, \qquad w^{rc}_{G=\infty,t=s}(D,T,X;g) = \frac{\frac{g(X)(1-D) \cdot 1\{T=s\}}{1-g(X)}}{E\left[\frac{g(X)(1-D) \cdot 1\{T=s\}}{1-g(X)}\right]}.$$
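The repeated cross-section version normalizes the weights separately within each period. A minimal sketch under the same assumptions (hypothetical helper name, periods coded 1 and 2, fitted pscores supplied):

```python
import numpy as np


def std_ipw_did_rc(y, d, t, ps):
    """Hajek-type IPW DiD with stationary repeated cross-sections; t takes values 1 or 2."""
    y, d, t, ps = map(np.asarray, (y, d, t, ps))
    total = 0.0
    for s, sign in ((2, 1.0), (1, -1.0)):
        in_s = (t == s).astype(float)
        w_treat = d * in_s / np.mean(d * in_s)  # w^rc_{G=2,t=s}
        odds = ps * (1 - d) * in_s / (1 - ps)
        w_control = odds / np.mean(odds)  # w^rc_{G=inf,t=s}
        # Period-2 terms enter with +, period-1 terms with -
        total += sign * np.mean((w_treat - w_control) * y)
    return total
```

With a constant pscore, this collapses to the brute-force four-means DiD estimator.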
IPW-adjusted DiD estimators
Doubly robust DiD estimators
Doubly robust DiD procedures
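■ Combine the outcome regression and IPW approaches to form more robust estimators: they are consistent for the ATT if either the propensity score or the outcome regression working models (but not necessarily both) are correctly specified.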
Doubly robust DiD procedure with panel
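Sant’Anna and Zhao (2020) considered the following doubly robust estimand when panel data are available:
$$ATT^{dr,p} = E\left[\left(w^{p}_{G=2}(D) - w^{p}_{G=\infty}(D,X;p)\right)\left((Y_{t=2} - Y_{t=1}) - \left(m^{G=\infty}_{t=2}(X) - m^{G=\infty}_{t=1}(X)\right)\right)\right]$$
$$= E\left[\left(\frac{D}{E[D]} - \frac{\frac{p(X)(1-D)}{1-p(X)}}{E\left[\frac{p(X)(1-D)}{1-p(X)}\right]}\right)\left((Y_{t=2} - Y_{t=1}) - \left(m^{G=\infty}_{t=2}(X) - m^{G=\infty}_{t=1}(X)\right)\right)\right],$$
where
$$w^{p}_{G=2}(D) = \frac{D}{E[D]}, \quad \text{and} \quad w^{p}_{G=\infty}(D,X;g) = \frac{\frac{g(X)(1-D)}{1-g(X)}}{E\left[\frac{g(X)(1-D)}{1-g(X)}\right]}.$$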
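A minimal sketch of the sample analogue of $ATT^{dr,p}$, assuming a linear working model for the untreated outcome evolution $m^{G=\infty}_{t=2}(X) - m^{G=\infty}_{t=1}(X)$ fitted by OLS among untreated units (with an intercept in `X`), and fitted pscores supplied from any first step (e.g., a logit); the function name is hypothetical:

```python
import numpy as np


def dr_did_panel(y_post, y_pre, d, X, ps):
    """Doubly robust DiD with panel data, given fitted pscores ps.

    Outcome-evolution working model for the untreated: linear in X (X includes an intercept).
    """
    y_post, y_pre, d, X, ps = map(np.asarray, (y_post, y_pre, d, X, ps))
    dy = y_post - y_pre
    beta, *_ = np.linalg.lstsq(X[d == 0], dy[d == 0], rcond=None)
    resid = dy - X @ beta  # (Y_{t=2} - Y_{t=1}) minus fitted untreated evolution, all units
    w_treat = d / np.mean(d)
    odds = ps * (1 - d) / (1 - ps)
    w_control = odds / np.mean(odds)
    return np.mean((w_treat - w_control) * resid)
```

When the outcome model is correct, the untreated residuals are centered at zero whatever pscores are used; when instead the pscore model is correct, the normalized reweighting removes the (population) bias of a wrong outcome model.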
■ Sant’Anna and Zhao (2020) also show that $ATT^{dr,p}$ is semiparametrically (locally) efficient.
■ If all working models are correctly specified, the DR DiD estimator of $ATT^{dr,p}$ is “the most precise estimator” (it attains the minimum asymptotic variance) among all (regular) estimators that do not rely on additional functional form restrictions.
■ Sant’Anna and Zhao (2020) also discuss how to get further improved DR DiD
estimators by “carefully” choosing first-step estimators for the regression adjustment
and propensity score working models.
Doubly robust DiD procedure with repeated cross-section
Sant’Anna and Zhao (2020) considered two different doubly robust estimands when RCS data are available. The first one is
$$ATT^{dr,rc}_1 = E\left[\left(w^{rc}_{G=2}(D,T) - w^{rc}_{G=\infty}(D,T,X;p)\right) \cdot \left(Y - 1\{T=2\}\, m^{rc}_{G=\infty,t=2}(X) - 1\{T=1\}\, m^{rc}_{G=\infty,t=1}(X)\right)\right],$$
where
$$w^{rc}_{G=2}(D,T) = w^{rc}_{G=2,t=2}(D,T) - w^{rc}_{G=2,t=1}(D,T),$$
$$w^{rc}_{G=\infty}(D,T,X;g) = w^{rc}_{G=\infty,t=2}(D,T,X;g) - w^{rc}_{G=\infty,t=1}(D,T,X;g),$$
and, for $s = 1, 2$,
$$w^{rc}_{G=2,t=s}(D,T) = \frac{D \cdot 1\{T=s\}}{E[D \cdot 1\{T=s\}]}, \qquad w^{rc}_{G=\infty,t=s}(D,T,X;g) = \frac{\frac{g(X)(1-D) \cdot 1\{T=s\}}{1-g(X)}}{E\left[\frac{g(X)(1-D) \cdot 1\{T=s\}}{1-g(X)}\right]}.$$
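Sant’Anna and Zhao’s (2020) second DR DiD estimand also relies on outcome regression models for the treated units:
$$ATT^{dr,rc}_2 = ATT^{dr,rc}_1 + \left(E\left[m^{rc}_{G=2,t=2}(X) - m^{rc}_{G=\infty,t=2}(X) \mid D=1\right] - E\left[m^{rc}_{G=2,t=2}(X) - m^{rc}_{G=\infty,t=2}(X) \mid D=1, T=2\right]\right)$$
$$- \left(E\left[m^{rc}_{G=2,t=1}(X) - m^{rc}_{G=\infty,t=1}(X) \mid D=1\right] - E\left[m^{rc}_{G=2,t=1}(X) - m^{rc}_{G=\infty,t=1}(X) \mid D=1, T=1\right]\right),$$
where $m^{rc}_{G=g,t=s}(X) \equiv E[Y \mid G=g, T=s, X]$.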
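A minimal sketch of the sample analogues of both RCS estimands above, assuming linear outcome regressions fitted by OLS within each (group, period) cell, an intercept in `X`, fitted pscores supplied externally, and periods coded 1 and 2 (all names hypothetical):

```python
import numpy as np


def _ols_fit(X, y, mask):
    """Fit OLS on the rows in mask; return fitted values for every observation."""
    beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return X @ beta


def dr_did_rc(y, d, t, X, ps):
    """DR DiD with repeated cross-sections; returns (ATT_1^{dr,rc}, ATT_2^{dr,rc})."""
    y, d, t, X, ps = map(np.asarray, (y, d, t, X, ps))
    post, pre = t == 2, t == 1
    treat, ctrl = d == 1, d == 0
    # Working models m^{rc}_{G=g,t=s}(X), evaluated at every observation's X
    m_c_post = _ols_fit(X, y, ctrl & post)
    m_c_pre = _ols_fit(X, y, ctrl & pre)
    m_t_post = _ols_fit(X, y, treat & post)
    m_t_pre = _ols_fit(X, y, treat & pre)

    # Y minus the untreated-group regression for the observation's own period
    resid = y - np.where(post, m_c_post, m_c_pre)

    att1 = 0.0
    for in_s, sign in ((post, 1.0), (pre, -1.0)):
        s = in_s.astype(float)
        w_treat = d * s / np.mean(d * s)
        odds = ps * (1 - d) * s / (1 - ps)
        w_control = odds / np.mean(odds)
        att1 += sign * np.mean((w_treat - w_control) * resid)

    # Extra terms of ATT_2^{dr,rc}, which use the treated-group regressions
    diff_post = m_t_post - m_c_post
    diff_pre = m_t_pre - m_c_pre
    adj_post = diff_post[treat].mean() - diff_post[treat & post].mean()
    adj_pre = diff_pre[treat].mean() - diff_pre[treat & pre].mean()
    return att1, att1 + adj_post - adj_pre
```

Under stationarity the extra terms have population mean zero for any working model, which is why both estimands share the same consistency conditions while differing in efficiency.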
■ Both DR DiD estimators for RCS data are consistent for the ATT under the same
conditions:
■ Even if the regression model for the outcome evolution of the treated group is misspecified, $ATT^{dr,rc}_2$ is consistent for the ATT (provided that either the pscore or the regression models for the outcome evolution among untreated units are correctly specified).
■ However, in general, $ATT^{dr,rc}_2$ is more efficient than $ATT^{dr,rc}_1$.
■ In fact, Sant’Anna and Zhao (2020) show that $ATT^{dr,rc}_2$ is (locally) semiparametrically efficient.
Let’s see how these work in a
simulation exercise
Monte Carlo simulations
Simulations
■ Data generating processes are similar to those considered in the TWFE example.
■ Available data are $\{Y_{t=2}, Y_{t=1}, D, Z\}_{i=1}^{n}$, where $D_i = 1\{G_i = 2\}$.
■ We estimate the pscore assuming a logit specification, and the outcome regression models assuming a linear specification.
DGPs
■ DGP1:
■ DGP2:
■ DGP3:
■ DGP4:
Table 1: Monte Carlo Simulations, DGP1: Both pscore and OR are correctly specified
Figure 2: Monte Carlo for DID estimators, DGP1: Both pscore and OR are correctly specified
[Density plots; panels: Regression DID Estimator; DR DID Estimators (DR Tr., DR Imp); Std. IPW DID Estimator]
Table 2: Monte Carlo Simulations, DGP2: Only OR is correctly specified
Figure 3: Monte Carlo for DID estimators, DGP2: Only OR is correctly specified
[Density plots; panels: Regression DID Estimator; DR DID Estimators (DR Tr., DR Imp); Std. IPW DID Estimator]
Table 3: Monte Carlo Simulations, DGP3: Only PS is correctly specified
Figure 4: Monte Carlo for DID estimators, DGP3: Only PS is correctly specified
[Density plots; panels: Regression DID Estimator; DR DID Estimators (DR Tr., DR Imp); Std. IPW DID Estimator]
Table 4: Monte Carlo Simulations, DGP4: Both OR and PS are misspecified
Figure 5: Monte Carlo for DID estimators, DGP4: Both OR and PS are misspecified
[Density plots; panels: Regression DID Estimator; DR DID Estimators (DR Tr., DR Imp); Std. IPW DID Estimator]
Monte Carlo simulations for repeated cross-section data
Table 5: Monte Carlo Simulations, DGP1: Both the pscore and the OR are correctly specified
Figure 6: Monte Carlo for DID estimators, DGP1: Both the pscore and the OR are correctly specified
[Density plots; panels: DR (but not loc. eff.) DID Estimators; DR & Loc. Eff. DID Estimators; All DR DID Estimators]
Figure 7: Monte Carlo for DID estimators, DGP1: Both the pscore and the OR are correctly specified
[Density plot; legend: DR Trad., DR Imp., DR Trad. loc. eff., DR Imp. loc. eff.]
Figure 8: Monte Carlo for DID estimators, DGP1: Both the pscore and the OR are correctly specified
[Density plot]
Table 6: Monte Carlo Simulations, DGP2: Only the OR is correctly specified
Figure 9: Monte Carlo for DID estimators, DGP2: Only the OR is correctly specified
[Density plots; panels: DR (but not loc. eff.) DID Estimators; DR & Loc. Eff. DID Estimators; All DR DID Estimators; legend includes OR and Std. IPW]
Figure 10: Monte Carlo for DID estimators, DGP2: Only the OR is correctly specified
[Density plot]
Table 7: Monte Carlo Simulations, DGP3: Only the PS is correctly specified
Figure 11: Monte Carlo for DID estimators, DGP3: Only the PS is correctly specified
[Density plots; legend includes OR and Std. IPW]
Figure 12: Monte Carlo for DID estimators, DGP3: Only the PS is correctly specified
[Density plot; legend: DR Trad., DR Imp., DR Trad. loc. eff., DR Imp. loc. eff.]
Figure 13: Monte Carlo for DID estimators, DGP3: Only the PS is correctly specified
[Density plot; panel: DR & Loc. Eff. DID Estimators]
Table 8: Monte Carlo Simulations, DGP4: Both the OR and the PS are misspecified
Figure 14: Monte Carlo for DID estimators, DGP4: Both the OR and the PS are misspecified
[Density plots; legend includes OR and Std. IPW]
Figure 15: Monte Carlo for DID estimators, DGP4: Both the OR and the PS are misspecified
[Density plot; legend: DR Trad., DR Imp., DR Trad. loc. eff., DR Imp. loc. eff.]
Figure 16: Monte Carlo for DID estimators, DGP4: Both the OR and the PS are misspecified
[Density plot; panel: DR & Loc. Eff. DID Estimators]
What are the main take-away
messages?
Take-away messages
DiD procedures with covariates
References
Abadie, Alberto, “Semiparametric Difference-in-Differences Estimators,” The Review of
Economic Studies, 2005, 72 (1), 1–19.
Hájek, J., “Discussion of ‘An essay on the logical foundations of survey sampling, Part I’, by
D. Basu,” in V. P. Godambe and D. A. Sprott, eds., Foundations of Statistical Inference,
Toronto: Holt, Rinehart, and Winston, 1971.
Heckman, James, Hidehiko Ichimura, Jeffrey Smith, and Petra Todd, “Characterizing
Selection Bias Using Experimental Data,” Econometrica, 1998, 66 (5), 1017–1098.
Heckman, James J., Hidehiko Ichimura, and Petra E. Todd, “Matching As An Econometric
Evaluation Estimator: Evidence from Evaluating a Job Training Programme,” The Review
of Economic Studies, October 1997, 64 (4), 605–654.
Horvitz, D. G. and D. J. Thompson, “A Generalization of Sampling Without Replacement
From a Finite Universe,” Journal of the American Statistical Association, 1952, 47 (260),
663–685.
Kang, Joseph D. Y. and Joseph L. Schafer, “Demystifying Double Robustness: A
Comparison of Alternative Strategies for Estimating a Population Mean from
Incomplete Data,” Statistical Science, 2007, 22 (4), 569–573.
Khan, Shakeeb and Elie Tamer, “Irregular Identification, Support Conditions, and Inverse
Weight Estimation,” Econometrica, 2010, 78 (6), 2021–2042.
Sant’Anna, Pedro H. C. and Jun Zhao, “Doubly robust difference-in-differences estimators,”
Journal of Econometrics, November 2020, 219 (1), 101–122.