1 Introduction
There has been an explosion in recent advances in econometric methods for policy anal-
ysis. A particularly active area is that applied to estimating the impact of exposure to
some particular event or policy, when observations are available in a panel or repeated
cross section of groups and time (see, for example, the recent surveys by de Chaisemartin and
D’Haultfœuille (2022) and Roth et al. (2022)). A modelling
challenge in this setting is in determining what would have happened to exposed units
had they been left unexposed. Should such a counterfactual be estimable from under-
lying data, causal inference can be conducted by comparing outcomes in treated units
to those in theoretical counterfactual untreated states, under the potential outcome
framework (see for example the discussion in Holland (1986); Rubin (2005)).
A substantial number of empirical studies in economics and the social sciences more
generally seek to estimate effects in this setting using difference-in-difference (hereafter
DID) style designs. Here impacts are inferred by comparing treated to control units,
where time-invariant level differences between units are permitted as well as general
common trends. However, the drawing of causal inferences requires a parallel trend
assumption, which states that in the absence of treatment, treated units would have
followed parallel paths to untreated units. Whether this assumption is reasonable in
2 Synthetic Difference In Differences
where there is a single treated unit, multiple treatment units, and multiple treatment
periods. It reports treatment effects laid out in Arkhangelsky et al. (2021), additionally
implementing their proposed bootstrap, jackknife and placebo inference procedures. A
number of graphical output options are provided to examine the generation of the SDID
estimator and the underlying optimal weight matrices. While principally written to
conduct SDID estimation, the sdid command (and the SDID method) nests as possible
estimation procedures SC and DID, which can be easily generated to allow comparison
of estimation procedures and estimates.1
In introducing the command, we first provide a primer on the core methodological
points of SDID (as well as comparisons to DID and SC), and then describe how these
procedures extend to a setting where treatment adoption occurs over multiple time
periods. We then lay out the command syntax of sdid, as well as the elements which
are returned to the user. We provide a number of examples to illustrate the use of the
SDID method in Stata, both based upon a well-known example of California’s passage
of Proposition 99, an anti-smoking measure previously presented in Abadie et al. (2010);
Arkhangelsky et al. (2021) in which a single state adopts a treatment at a given time,
as well as an example where exposure to a policy occurs at multiple periods: the case of
parliamentary gender quotas studied by Bhalotra et al. (2022). We conclude by making
a number of practical points on the computational implementation of this estimator.
2 Methods
2.1 The Canonical Synthetic Difference-in-Differences Procedure
The synthetic DID procedure, hereafter SDID, is developed in Arkhangelsky et al.
(2021), and we lay out principal details here. As input, SDID requires a balanced
panel of N units or groups, observed over T time periods. An outcome, denoted Yit , is
observed for each unit i in each period t. Some, but not all, of these observations are
treated with a specific binary variable of interest, denoted Wit . This treatment variable
Wit = 1 if observation i is treated by time t, otherwise, Wit = 0 indicates that unit i is
untreated at time t. Here, we assume that there is a single adoption period for treated
units, which Arkhangelsky et al. (2021) refer to as a ‘block treatment assignment’. In
section 2.3, we extend this to a ‘staggered adoption design’ (Athey and Imbens 2022),
where treated units adopt treatment at varying points. A key element of both of these
designs is that once treated, units are assumed to remain exposed to treatment forever
thereafter. In the particular setting of SDID, no always-treated units can be included
in estimation. For estimation to proceed, we require at least two pre-treatment periods
from which to determine control units.
The goal of SDID is to consistently estimate the causal effect of receipt of policy
1 Code from the original paper was provided in R (Hirshberg undated), which can do many of the
procedures which sdid implements, and indeed, abstracting from differences in pseudo-random number
generation, give identical results in cases where similar procedures are possible. A number of useful
extensions are available in sdid, such as the implementation of estimates in cases where treatment
occurs in multiple periods, and alternative manners to include covariates.
or treatment Wit , (an average treatment effect on the treated, or ATT) even if we do
not believe in the parallel trends assumption between all treatment and control units
on average. Estimation of the ATT proceeds as follows:
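\[
\left(\hat{\tau}^{sdid}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \underset{\tau,\mu,\alpha,\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau\right)^2 \hat{\omega}_i^{sdid} \hat{\lambda}_t^{sdid} \right\} \qquad (1)
\]
where the estimand is the ATT, generated from a two-way fixed effect regression, with
optimally chosen weights $\hat{\omega}_i^{sdid}$ and $\hat{\lambda}_t^{sdid}$ discussed below. Note that here, this procedure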
flexibly allows for shared temporal aggregate factors given the estimation of time fixed
effects βt and time invariant unit-specific factors given the estimation of unit fixed
effects αi . As is standard in fully saturated fixed-effect models, one αi and one βt
fixed effect are normalized to zero to avoid multicollinearity. The presence of unit-fixed
effects implies that SDID will simply seek to match treated and control units on pre-
treatment trends, and not necessarily on both pre-treatment trends and levels, allowing
for a constant difference between treatment and control units.
In this setting, it is illustrative to consider how the SDID procedure compares to the
traditional synthetic control method of Abadie et al. (2010), as well as the baseline DID
procedure. The standard DID procedure consists of precisely the same two-way fixed
effect OLS procedure, simply assigning equal weights to all time periods and groups:
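\[
\left(\hat{\tau}^{did}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \underset{\tau,\mu,\alpha,\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau\right)^2 \right\}. \qquad (2)
\]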
The synthetic control, on the other hand, maintains optimally chosen unit-specific
weights ω (as laid out below), however does not seek to optimally consider time periods
via time weights, and omits unit fixed effects αi implying that the synthetic control and
treated units should maintain approximately equivalent pre-treatment levels, as well as
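trends.
\[
\left(\hat{\tau}^{sc}, \hat{\mu}, \hat{\beta}\right) = \underset{\tau,\mu,\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \beta_t - W_{it}\tau\right)^2 \hat{\omega}_i^{sc} \right\} \qquad (3)
\]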
From (2)-(3) it is clear that the SDID procedure offers greater flexibility than both the
DID and SC procedures; in the case of DID by permitting a violation of parallel trends
in aggregate data, and in the case of SC, by both additionally seeking to optimally
weight time periods when considering counterfactual outcomes, and allowing for level
differences between treatment and control groups.
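As a brief illustration of this nesting, the three estimators can be requested from the same command via the method() option documented in the syntax section of this paper. The sketch below assumes that the sdid package is installed and that the Proposition 99 panel used in the later examples (variables packspercapita, state, year and treated) is already in memory; placebo inference is requested because that example has a single treated unit.

```stata
* Assumes: sdid is installed and the Proposition 99 panel is loaded
* SDID, equation (1): unit weights, time weights and unit fixed effects
sdid packspercapita state year treated, vce(placebo) seed(1213) method(sdid)

* DID, equation (2): equal weights on all units and periods
sdid packspercapita state year treated, vce(placebo) seed(1213) method(did)

* SC, equation (3): unit weights only, no unit fixed effects
sdid packspercapita state year treated, vce(placebo) seed(1213) method(sc)
```

Comparing the three point estimates side by side gives a quick sense of how much the choice of weights and fixed effects matters in a given application.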
The selection of unit weights, ω, as inputs to (1) (and (3)) seeks to ensure that com-
parison is made between treated units and controls which were approximately following
parallel trends prior to the adoption of treatments. The selection of time weights, λ in
the case of SDID seeks to draw more weight from pre-treatment periods which are more
similar to post-treatment periods, in the sense of finding a constant difference between
each control unit’s post treatment average, and pre-treatment weighted averages across
all selected controls. Specifically, as laid out in Arkhangelsky et al. (2021), unit-specific
Susan Athey and Damian Clarke and Guido Imbens and Daniel Pailañir 5
where $\Omega = \left\{ \omega \in \mathbb{R}^N_+ : \sum_{i=1}^{N_{co}} \omega_i = 1 \text{ and } \omega_i = \frac{1}{N_{tr}} \text{ for all } i = N_{co}+1, \ldots, N \right\}$,
||ω||2 refers to the Euclidean norm and ζ is a regularization parameter laid out in
Arkhangelsky et al. (2021, pp. 4091-4092).2 This leads to a vector of Nco non-negative
weights plus an intercept $\omega_0$. The weights $\omega_i$ for all $i \in \{1, \ldots, N_{co}\}$ imply that the absolute
difference between trends in control and treated units should be minimized over all pre-treatment
periods, while $\omega_0$ initially allows for a constant difference between treatment
and controls over time. Together, these imply that units will follow parallel pre-trends,
though provided $\omega_0 \neq 0$, not identical pre-trends.
In the case of time weights, a similar procedure is followed, finding weights which
minimize the following objective function:
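\[
\left(\hat{\lambda}_0, \hat{\lambda}^{sdid}\right) = \underset{\lambda_0 \in \mathbb{R},\, \lambda \in \Lambda}{\arg\min} \sum_{i=1}^{N_{co}} \left( \lambda_0 + \sum_{t=1}^{T_{pre}} \lambda_t Y_{it} - \frac{1}{T_{post}} \sum_{t=T_{pre}+1}^{T} Y_{it} \right)^2 + \zeta^2 N_{co} \|\lambda\|_2^2 \qquad (5)
\]
where $\Lambda = \left\{ \lambda \in \mathbb{R}^T_+ : \sum_{t=1}^{T_{pre}} \lambda_t = 1 \text{ and } \lambda_t = \frac{1}{T_{post}} \text{ for all } t = T_{pre}+1, \ldots, T \right\}$,
and the final term in (5) is a very small regularization term to ensure uniqueness of
time weights, where $\zeta = 1 \times 10^{-6}\hat{\sigma}$, and $\hat{\sigma}$ is defined as in footnote 2.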
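To make the two regularization parameters concrete, the following Mata sketch computes the standard deviation of first differences among control units in pre-treatment periods, and from it the unit-weight parameter (N_tr × T_post)^{1/4} × sigma and the time-weight parameter 1e-6 × sigma. The outcome matrix and the values of Ntr and Tpost are toy numbers chosen purely for illustration, and the variable names are hypothetical; this is not the internal code of sdid.

```stata
mata:
// Toy control-unit outcomes: rows are N_co units, columns are T_pre periods
Yco  = (10, 12, 13, 15 \ 20, 21, 23, 24 \ 5, 7, 8, 8)
Nco  = rows(Yco)
Tpre = cols(Yco)

// First differences Delta_it = Y_i,t+1 - Y_it
D    = Yco[., (2..Tpre)] - Yco[., (1..(Tpre-1))]
Dbar = sum(D) / (Nco*(Tpre-1))

// Standard deviation of first differences
sigma = sqrt(sum((D :- Dbar):^2) / (Nco*(Tpre-1)))

// Assumed design: 1 treated unit and 12 post-treatment periods
Ntr   = 1
Tpost = 12
zeta_omega  = (Ntr*Tpost)^(1/4) * sigma   // regularization for unit weights
zeta_lambda = 1e-6 * sigma                // regularization for time weights
zeta_omega, zeta_lambda
end
```

The much smaller value used for time weights reflects the text's description of that term as serving only to guarantee uniqueness.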
This estimation procedure is summarized in Arkhangelsky et al. (2021, algorithm
1), reproduced in Appendix 1 here for ease of access. Arkhangelsky et al. (2021) also
prove that the estimator is asymptotically normal, suggesting that confidence intervals
on τ can be constructed as:
\[
\hat{\tau}^{sdid} \pm z_{\alpha/2} \sqrt{\hat{V}_{\tau}},
\]
where $z_{\alpha/2}$ refers to the standard normal critical value (the inverse of the normal cumulative distribution function evaluated at $1-\alpha/2$) should one
wish to compute $1-\alpha$ confidence intervals. These confidence intervals thus simply require
an estimate of the variance of $\hat{\tau}$, $\hat{V}_{\tau}$. Arkhangelsky et al. (2021) propose three specific
procedures to estimate this variance: a block bootstrap, a jackknife, or a permutation-
based approach.
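For concreteness, the interval above can be computed by hand once a point estimate and a variance estimate are available. The sketch below uses hypothetical values for both (they are not results from this paper) and Stata's built-in invnormal() function for the normal critical value.

```stata
* Hypothetical point estimate and variance, for illustration only
scalar tau_hat = -15.6
scalar V_tau   = 81
scalar alpha   = 0.05

* Standard normal critical value for a 1 - alpha interval
scalar z  = invnormal(1 - alpha/2)

* Lower and upper bounds of the confidence interval
scalar lb = tau_hat - z*sqrt(V_tau)
scalar ub = tau_hat + z*sqrt(V_tau)
display "95% CI: [" %9.3f lb ", " %9.3f ub "]"
```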
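2 For the sake of completeness, this regularization parameter is calculated as $\zeta = (N_{tr} \times T_{post})^{1/4}\hat{\sigma}$, where
\[
\hat{\sigma}^2 = \frac{1}{N_{co}(T_{pre}-1)} \sum_{i=1}^{N_{co}} \sum_{t=1}^{T_{pre}-1} \left(\Delta_{it} - \bar{\Delta}\right)^2, \quad \Delta_{it} = Y_{i,(t+1)} - Y_{it}, \quad \text{and} \quad \bar{\Delta} = \frac{1}{N_{co}(T_{pre}-1)} \sum_{i=1}^{N_{co}} \sum_{t=1}^{T_{pre}-1} \Delta_{it}.
\]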
The block (also known as clustered) bootstrap approach consists of taking some
large number, B, of bootstrap resamples over units, where units i are the resampled
blocks in the block bootstrap procedure. Provided that a given resample does not consist
entirely of treated, or entirely of control units, the quantity $\hat{\tau}^{sdid}$ is re-estimated,
and denoted as $\hat{\tau}^{sdid}_{(b)}$ for each bootstrap resample. The bootstrap variance $\hat{V}^{(b)}_{\tau}$ is then
calculated as the variance of resampled estimates $\hat{\tau}^{sdid}_{(b)}$ across all B resamples. The bootstrap
algorithm is defined in Arkhangelsky et al. (2021, algorithm 2), reproduced here
in appendix 1. This bootstrap procedure is observed in simulation to have particularly
good properties, but has two particular drawbacks, justifying alternative inference procedures.
The first is that it may be computationally costly, given that in each bootstrap
resample the entire synthetic DID procedure is re-estimated, including the estimation
of optimal weights. This is especially computationally expensive when working
with large samples, or where covariates are included, as discussed at more length below.
The second is that formal proofs of asymptotic normality rely on the number of treated
units being large, and as such, estimated variance, and confidence intervals, may be
unreliable when the number of treated units is small.
An alternative estimator which significantly reduces the computational burden inherent
in the bootstrap is estimating a jackknife variance for $\hat{\tau}^{sdid}$. This procedure
consists of iterating over all units in the data, in each iteration removing the given
unit, and recalculating $\hat{\tau}^{sdid}$, denoted $\hat{\tau}^{sdid}_{(-i)}$, maintaining fixed the optimal weights for ω
and λ calculated in the original SDID estimate. The jackknife variance, $\hat{V}_{\tau}$, is then
calculated based on the variance of all $\hat{\tau}^{sdid}_{(-i)}$ estimates, following Arkhangelsky et al.
(2021, Algorithm 3) (refer to Appendix 1 here). In this case, each iteration saves on recalculating
optimal weights, and as documented by Arkhangelsky et al. (2021), provides
a variance leading to conservative confidence intervals, without the computational burden
imposed by the bootstrap. Once again, asymptotic normality relies on there being
a large number of treated units, and in particular if only one treated unit is observed,
as is often the case in comparative case studies, the jackknife will not even be defined
given that a $\hat{\tau}^{sdid}_{(-i)}$ term will be undefined when removing the single treated unit.
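The final aggregation step of the jackknife can be sketched in a few lines of Mata. The leave-one-out estimates and the full-sample estimate below are hypothetical numbers used only to show the arithmetic, which scales the sum of squared deviations from the full-sample estimate by (N − 1)/N; the variable names are illustrative and do not correspond to objects returned by sdid.

```stata
mata:
// Hypothetical leave-one-out estimates tau_(-i), one per unit
tau_loo = (-15.1 \ -16.2 \ -15.8 \ -14.9 \ -16.0)
// Hypothetical full-sample estimate
tau_hat = -15.6
N = rows(tau_loo)

// Jackknife variance: (N-1)/N times the sum of squared deviations
V_jack  = ((N - 1)/N) * sum((tau_loo :- tau_hat):^2)
se_jack = sqrt(V_jack)
V_jack, se_jack
end
```

Because the weights are held fixed, each leave-one-out estimate requires only a weighted regression, which is where the computational saving relative to the bootstrap arises.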
Given limits to these inference options when the number of treated units is small,
an alternative placebo, or permutation-based, inference procedure is proposed. This
consists of, first, conserving just the control units, and then randomly assigning the
same treatment structure to these control units, as a placebo treatment. Based on
this placebo treatment, we then re-estimate $\hat{\tau}^{sdid}$, denoted $\hat{\tau}^{sdid}_{(p)}$. This procedure is
repeated many times, giving rise to a vector of estimates $\hat{\tau}^{sdid}_{(p)}$, and the placebo variance,
$\hat{V}_{\tau}$, can be estimated as the variance of this vector. This is formally defined in
Arkhangelsky et al. (2021, algorithm 4), and appendix 1 here. It is important to note
that in the case of this placebo-based variance, homoskedasticity across units is required,
given that the variance is based on placebo assignments of treatment made only within
the control group.
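The two ingredients of the placebo procedure, a random draw of control units to receive the placebo and the variance of the resulting estimates, can be sketched in Mata as follows. The numbers of control and treated units and the vector of placebo estimates are assumed values for illustration only; in practice each placebo estimate would come from re-running the SDID estimator on the control sample.

```stata
mata:
rseed(1213)
Nco = 38      // assumed number of control units
Ntr = 1       // assumed number of treated units

// Draw Ntr control units without replacement to receive the placebo
perm = jumble(1::Nco)
placebo_units = perm[1::Ntr]
placebo_units

// Variance across hypothetical placebo estimates (illustrative numbers)
tau_p = (2.1 \ -3.4 \ 0.8 \ -1.5 \ 2.0)
V_placebo = sum((tau_p :- mean(tau_p)):^2) / rows(tau_p)
V_placebo
end
```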
where $\hat{\beta}$ comes from regression of $Y_{it}$ on $X_{it}$. This procedure, in which the synthetic
DID process will be applied to the residuals $Y_{it}^{res}$, is different to the logic of synthetic
controls following Abadie et al. (2010). In Abadie et al.’s conception, when covariates
are included the synthetic control is chosen to ensure that these covariates are as closely
matched as possible between treated and synthetic control units. However, in the SDID
conception, covariate adjustment is viewed as a pre-processing task, which removes the
impact of changes in covariates from the outcome $Y_{it}$ prior to calculating the synthetic
control. Along with their paper, Arkhangelsky et al. (2021) provide an implementation
of their algorithm in R (Hirshberg undated), and in practice they condition out these
variables $X_{it}$ by finding $\hat{\beta}$ within an optimization procedure which additionally allows for
the efficient calculation of optimal weights ω and λ. In the sdid code described below,
we follow Hirshberg (undated) in implementing this efficient optimization procedure (the
Frank and Wolfe (1956) solver). However, there are a number of potential complications
which can arise in this manner of dealing with covariates, and as such, alternative
procedures are also available.
A first potential issue is purely numerical. In the Frank-Wolfe solver discussed above,
a minimum point is assumed to be found when successive iterations of the solver lead to
arbitrarily small changes in all parameters estimated.3 Where these parameters include
coefficients on covariates, in extreme cases the solution found for (1) can be sensitive
to the scaling of covariates. This occurs in particular when covariates have very large
magnitudes and variances. In such cases, the inclusion of covariates in (1) can cause
optimization routines to suggest solutions which are not actually globally optimal, given
that successive movements in $\hat{\beta}$ can be very small. In extreme cases, this can imply that
when multiplying all variables $X_{it}$ by a large constant, the estimated treatment
effect can vary. While this issue can be addressed by using smaller tolerances for defining
stopping rules in the optimization procedure, it can be addressed more simply
if all covariates are first standardized as Z-scores, implying that no very-high-variance
variables are included, while capturing the same underlying variation in covariates.
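The Z-score transformation itself is straightforward in Stata. The sketch below uses the auto dataset shipped with Stata, with price and weight standing in for covariates measured on large scales; these variables are placeholders and are unrelated to the examples in this paper. It illustrates the transformation only, since the option descriptions later in this paper indicate that sdid standardizes controls by default under the “optimized” method.

```stata
sysuse auto, clear

* price and weight stand in for covariates with large magnitudes
foreach x of varlist price weight {
    * egen's std() function returns (x - mean)/sd
    egen double z_`x' = std(`x')
}

* Standardized covariates have mean 0 and standard deviation 1
summarize z_price z_weight
```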
A second, potentially more complicated point is described by Kranz (2022). He
notes that in certain settings, specifically where the relationship between covariates
and the outcome varies over time differentially in treatment and control groups, the
procedure described above may fail to capture the true relationship between covariates
and the outcome of interest, and may subsequently lead to bias in estimated treatment
3 In the case of convex functions such as that in (1), the Frank-Wolfe solver finds a global minimum,
Estimation Unlike the block assignment case where a single pre- versus post-treatment
date can be used to conduct estimation, in the staggered adoption design, multiple
adoption dates are observed. Consider for example the treatment matrix below, con-
sisting of 8 units, 2 of which (1 and 2) are untreated, while the other 6 are treated,
This single staggered treatment matrix W can be broken down into adoption date
specific matrices, W1 , W2 and W3 , or generically, W1 , . . . , WA , where A indicates
the number of distinct adoption dates. Additionally, a row vector A consisting of A
elements contains these distinct adoption periods. In this specific setting where units
first adopt treatment in period t3 (units 7 and 8), t4 (units 5 and 6), and t7 (units 3
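and 4), the adoption date vector consists simply of periods 3, 4 and 7.
\[
\mathbf{A} = \begin{pmatrix} 3 & 4 & 7 \end{pmatrix}
\]
Finally, adoption-specific matrices W1 -W3 simply consist of pure control units, and
units which adopt in this specific period, as below:
\[
W_1 = \begin{array}{c|cccccccc}
  & t_1 & t_2 & t_3 & t_4 & t_5 & t_6 & t_7 & t_8 \\ \hline
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
4 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array},
\]
\[
W_2 = \begin{array}{c|cccccccc}
  & t_1 & t_2 & t_3 & t_4 & t_5 & t_6 & t_7 & t_8 \\ \hline
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
5 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
6 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{array},
\]
\[
W_3 = \begin{array}{c|cccccccc}
  & t_1 & t_2 & t_3 & t_4 & t_5 & t_6 & t_7 & t_8 \\ \hline
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
7 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \\
8 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1
\end{array}.
\]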
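The decomposition above can be reproduced mechanically. The Mata sketch below rebuilds the 8 × 8 treatment matrix from the description in the text (units 1 and 2 never treated, units 7 and 8 adopting in t3, units 5 and 6 in t4, units 3 and 4 in t7), recovers the adoption vector, and extracts one adoption-specific sample. It relies on treatment being absorbing, so that a unit's first treated period follows from its count of treated periods; variable names are illustrative and this is not the code used inside sdid.

```stata
mata:
// Rebuild the 8 x 8 treatment matrix described in the text
W = J(8, 8, 0)
W[|3,7 \ 4,8|] = J(2, 2, 1)   // units 3 and 4 adopt in t7
W[|5,4 \ 6,8|] = J(2, 5, 1)   // units 5 and 6 adopt in t4
W[|7,3 \ 8,8|] = J(2, 6, 1)   // units 7 and 8 adopt in t3

T      = cols(W)
ntreat = rowsum(W)             // number of treated periods per unit
adopt  = (T + 1) :- ntreat     // first treated period (T+1 if never treated)

// Row vector of distinct adoption periods among treated units
A = uniqrows(select(adopt, ntreat :> 0))'
A

// Adoption-specific sample: pure controls plus units adopting in t7
W_t7 = select(W, (ntreat :== 0) :| (adopt :== 7))
W_t7
end
```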
As laid out in Arkhangelsky et al. (2021, Appendix A), the average treatment effect
on the treated can then be calculated by applying the synthetic DID estimator to each
of these 3 adoption-specific samples, and calculating a weighted average of the three
estimators, where weights are assigned based on the relative number of treated units
and time periods in each adoption group. Generically, this ATT is calculated based on
adoption-specific SDID estimates as:
\[
\widehat{ATT} = \sum_{\text{for } a \in \mathbf{A}} \frac{T^{a}_{post}}{T_{post}} \times \hat{\tau}^{sdid}_{a} \qquad (7)
\]
where Tpost refers to the total number of post-treatment periods observed in treated
units. This estimation procedure is laid out formally in Algorithm 1 below.
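A small numerical sketch of this aggregation follows, using the three adoption groups from the example treatment matrix. The adoption-specific estimates are hypothetical, and the weights are taken to be each group's count of treated unit-periods (2 units × 6, 5 and 2 post-adoption periods for adoption in t3, t4 and t7 respectively) divided by the total, which is one reading of the description of weights based on the relative number of treated units and time periods.

```stata
mata:
// Hypothetical adoption-specific estimates for adoption periods 3, 4 and 7
tau_a = (1.5 \ 2.0 \ 0.5)

// Treated unit-periods per adoption group: 2 units x 6, 5 and 2 periods
Tpost_a = (12 \ 10 \ 4)

// Weights sum to one across adoption groups
weights = Tpost_a :/ sum(Tpost_a)

// Weighted average of adoption-specific estimates
ATT = sum(weights :* tau_a)
ATT
end
```

Groups treated earlier, or containing more units, contribute more treated observations and therefore receive more weight in the aggregate.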
Note that in this case, while the parameter of interest is likely the treatment effect
ATT or adoption-specific $\tau_a^{sdid}$ parameters, each adoption period is associated with an
optimal unit and time weight vector $\omega_a^{sdid}$ and $\lambda_a^{sdid}$, which can be returned following
Algorithm 1: Estimation of the ATT with staggered adoption
Data: Y, W, A.
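Result: Point estimate $\widehat{ATT}$ and adoption-specific values $\hat{\tau}_a^{sdid}$, $\hat{\omega}_a^{sdid}$ and $\hat{\lambda}_a^{sdid}$
for all a ∈ A.
for a ∈ A do
1. Subset Y and W to units who are pure controls, and who first adopt
treatment in period t = a ;
2. Compute regularization parameter ζ ;
3. Compute unit weights $\hat{\omega}_a^{sdid}$ ;
4. Compute time weights $\hat{\lambda}_a^{sdid}$ ;
5. Compute the SDID estimator via the weighted DID regression ;
\[
\left(\hat{\tau}_a^{sdid}, \hat{\mu}_a, \hat{\alpha}_a, \hat{\beta}_a\right) = \underset{\tau,\mu,\alpha,\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau\right)^2 \hat{\omega}_{a,i}^{sdid} \hat{\lambda}_{a,t}^{sdid} \right\}
\]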
Inference In the staggered adoption design, estimated treatment effects are simply a
multi-period extension of the underlying SDID algorithm, in each case working with
the relevant pure control and treated sub-sample. Thus, inference can be conducted in
the staggered adoption design under similar resample or placebo procedures. Here we
discuss inference following each of the bootstrap, jackknife, or placebo procedures laid
out in Arkhangelsky et al. (2021), applied to a staggered adoption setting. We note that
in this design, it is likely the case that one wishes to conduct inference on the treatment
effect, ATT from (7). Thus in the below, we propose inference details for this estimand,
additionally noting that standard errors and confidence intervals on adoption-specific
SDID parameters $\tau_a^{sdid}$ come built-in as part of these procedures.
Consider first the case of bootstrap inference. Suppose that one wishes to estimate
standard errors or generate confidence intervals on the global treatment effect ATT. A
bootstrap procedure can be conducted based on many clustered bootstrap resamples
over the entire initial dataset, where in each case, a resampled ATT estimate $\widehat{ATT}$
is generated, following Algorithm 1. Based on many such resampled estimates, the
bootstrap variance can be calculated as the variance of these resamples. We lay out the
bootstrap variance estimate below in Algorithm 2. Note that as in the block treatment
for b ← 1 to B do
1. Construct a bootstrap dataset $(Y^{(b)}, W^{(b)}, A^{(b)})$ by sampling N rows of
(Y, W) with replacement, and generating A as the corresponding adoption
vector ;
2. if the bootstrap sample has no treated units or no control units then
Discard resample and go to 1 ;
3. Compute SDID estimate $\widehat{ATT}^{(b)}$ following Algorithm 1 based on
$(Y^{(b)}, W^{(b)}, A^{(b)})$. Similarly, generate a vector of adoption-date specific
resampled SDID estimates $\tau_a^{(b)}$ for all $a \in A^{(b)}$ ;
4. Define $\hat{V}^{cb}_{ATT} = \frac{1}{B} \sum_{b=1}^{B} \left( \widehat{ATT}^{(b)} - \frac{1}{B} \sum_{b=1}^{B} \widehat{ATT}^{(b)} \right)^2$. Similarly, estimate
adoption-date specific variances for each $\tau_a^{sdid}$ estimate as the variance over
each $\tau_a^{(b)}$ ;
design, this bootstrap procedure requires the number of treated units to grow with N
within each adoption period. As such, if a very small number of treated units exist
for certain adoption periods, placebo inference is likely preferable. Similarly, as laid
out in the block treatment design, the bootstrap procedure re-estimates optimal weight
matrices in each resample, and can be computationally expensive in cases where samples
are large.
An alternative inference procedure, which is less computationally intensive but sim-
ilarly based on asymptotic arguments with a large number of states, and many treated
units, is based on the jackknife. Here, optimal weight matrices calculated for each
for i ← 1 to N do
1. Compute SDID estimate $\widehat{ATT}^{(-i)}$ following Algorithm 1 based on
$(Y^{(-i)}, W^{(-i)}, A)$. Similarly, generate a vector of adoption-date specific
resampled SDID estimates $\tau_a^{(-i)}$ for all a ∈ A;
2. Compute $\hat{V}^{jack}_{ATT} = (N-1)N^{-1} \sum_{i=1}^{N} \left( \widehat{ATT}^{(-i)} - \widehat{ATT} \right)^2$ ;
Finally, in cases where the number of treated units is small, and concerns related to
the validity of the previous variance estimators exist, the placebo inference procedure
defined in Algorithm 4 can be used. Here, this is defined for the staggered adoption case,
generalising Algorithm 4 of Arkhangelsky et al. (2021). To conduct this procedure,
placebo treatments are randomly assigned based on the actual treatment structure,
however only to the control units. Based on these placebo assignments, placebo values
for ATT are generated, which can be used to calculate the variance as laid out in
Algorithm 4. It is important to note that such a procedure will only be feasible in cases
where the number of control units is strictly larger than the number of treated units
(otherwise placebo assignments will not be feasible), and, as laid out in Arkhangelsky
et al. (2021) and Conley and Taber (2011), such a procedure relies on homoskedasticity
across units, as otherwise the variance of the treatment effect on the treated could not
be inferred from variation in assignment of placebo treatments to control units.
for b ← 1 to B do
1. Sample $N_{tr}$ out of the $N_{co}$ control units without replacement to ‘receive
the placebo’ ;
2. Construct a placebo treatment matrix $W_{co}^{(b)}$, for the controls ;
3. Compute SDID estimate $\widehat{ATT}^{(b)}$ following Algorithm 1 based on
$(Y_{co}, W_{co}^{(b)}, A^{(b)})$ ;
4. Define $\hat{V}^{placebo}_{ATT} = \frac{1}{B} \sum_{b=1}^{B} \left( \widehat{ATT}^{(b)} - \frac{1}{B} \sum_{b=1}^{B} \widehat{ATT}^{(b)} \right)^2$ ;
where depvar describes the dependent variable in a balanced panel of units (groupvar)
and time periods (timevar). The variable which indicates units which are treated
at each time period, which accumulates over time, is indicated as treatment. Note that
here, it is not necessary for users to specify whether the design is a block or staggered
adoption design, as this will be inferred from the data structure. Optionally, if and
in can be specified, provided that this does not result in imbalance in the panel. Required
and permitted options are discussed below, followed by a description of objects
returned by sdid.
to be constructed. In each case, inference follows the specific algorithm laid out in
Arkhangelsky et al. (2021). We allow the no inference option (noinference) should one
wish to simply generate the point estimator. This is useful if you wish to plot outcome
trends without the added computational time associated with inference procedures.
covariates(varlist, type) Covariates should be included as a varlist, and if specified,
treatment and control units will be adjusted based on covariates in the synthetic
difference-in-differences procedure. Optionally, type may be specified, which indicates
how covariate adjustment will occur. If the type is indicated as “optimized” (the
default) this will follow the method described in Arkhangelsky et al. (2021), footnote
4, where SDID is applied to the residuals of all units after regression adjustment.
However, this has been observed to be problematic at times (refer to Kranz (2022)),
and is also sensitive to optimization if covariates have high dispersion. Thus, an
alternative type is implemented (“projected”), which consists of conducting regression
adjustment based on parameters estimated only in untreated units. This type follows
the procedure proposed by Kranz (2022) (xsynth in R), and is observed to be more
stable in some implementations (and at times, considerably faster). sdid will run
simple checks on the covariates indicated and return an error if covariates are constant,
to avoid multicollinearity. However, prior to running sdid, you are encouraged to
ensure that covariates are not perfectly multicollinear with other covariates and state
and year fixed effects, in a simple two-way fixed effect regression. If perfectly multicollinear
covariates are included, sdid will execute without errors; however, where type is
“optimized”, the procedure may be sensitive to the inclusion of redundant covariates.
seed(#) Define the seed for pseudo-random numbers.
reps(#) Set the number of repetitions used in the calculation of bootstrap and placebo
standard errors. Default is 50 repetitions. Larger values should be preferred where
method(type) this option allows you to change the estimation method. The type must
be one of sdid, did or sc, where sdid refers to synthetic difference-in-differences, sc
refers to synthetic control, and did refers to difference-in-differences. By default, sdid
is enabled.
graph if this option is specified, graphs will be displayed showing unit and time weights
as well as outcome trends as per Figure 1 from Arkhangelsky et al. (2021).
g1on this option activates the unit-specific weight graph. By default g1 is off.
g1_opt(string) option to modify the appearance of the unit-specific weight graph. These
options adjust the underlying scatter plot, so should be consistent with twoway scatter
plots.
g2_opt(string) option to modify the appearance of the outcome trend graphs. These
options adjust the underlying line plot, so should be consistent with twoway line plots.
graph_export(string, type) Graphs will be saved as weightsYYYY and trendsYYYY
for each of the unit-specific weights and outcome trends respectively, where YYYY
refers to each treatment adoption period. Two graphs will be generated for each
treatment adoption period provided that g1on is specified, otherwise a single graph
will be generated for each adoption period. If this option is specified, type must be
specified, which refers to a valid Stata graph type (eg “.eps”, “.pdf”, and so forth).
Optionally, a stub can be specified, in which case this will be prepended to exported
graph names.
msize(markersizestyle) allows you to modify the size of the marker for graph 1.
unstandardized if controls are included and the “optimized” method is specified, con-
trols will be standardized as Z-scores prior to finding optimal weights. This avoids
problems with optimization when control variables have very high dispersion. If un-
standardized is specified, controls will simply be entered in their original units. This
option should be used with care.
mattitles requests labels to be added to the returned e(omega) weight matrix provid-
ing names (in string) for the unit variables which generate the synthetic control group
in each case. If mattitles is not indicated, the returned weight matrix (e(omega)) will
store these weights with a final column providing the numerical ID of units, where
this numerical ID is either taken from the unit variable (if this variable is a numerical
format), or arranged in alphabetical order based on the unit variable, if this variable
is in string format.
Returned Objects
The third line of this code excerpt quite simply implements the synthetic difference-
in-differences estimator, returning identical point estimates to those documented in
Table 1 of Arkhangelsky et al. (2021). Standard errors are slightly different, as these
are based on pseudo-random placebo reshuffling, though can be replicated as presented
here provided that the same seed is set in the seed option. Note that in this case,
given that a small number (1) of treated units is present, placebo inference is the only
appropriate procedure, as indicated in the vce() option.
[Figure 1 panels: (a) unit-specific weights, by state; (b) outcome trends in packs per capita for control and treated units.]
Figure 1: Proposition 99, example from Abadie et al. (2010); Arkhangelsky et al. (2021)
Should we wish to generate the same graphs as in Arkhangelsky et al. (2021), summa-
rizing both (a) unit specific weights, and (b) treatment and synthetic control outcome
trends along with time specific weights, this can be requested with the addition of the
graph option. This is displayed below, where we additionally modify plot aesthetics
via the g1_opt() and g2_opt() options for weight graphs (Figure 1(a)), and trend
graphs (Figure 1(b)) respectively. Finally, generated graphs can be saved to disk using
the graph_export() option, with a graph type (.eps below), and optionally a
pre-pended plot name. Output corresponding to the below command is provided in
Figure 1.
. sdid packspercapita state year treated, vce(placebo) seed(1213) graph g1on
> g2_opt(ylabel(0(25)150) ytitle("Packs per capita") scheme(sj))
> g1_opt(xtitle("") scheme(sj)) g1on graph_export(sdid_, .eps)
The sdid command returns multiple matrices containing treatment and control out-
come trends, weights, and other elements. These elements can be accessed simply
for use in post-estimation procedures or graphing. As a simple example, the following
code excerpt accesses treatment and synthetic control outcome trends (stored in
e(series)), and time weights (stored in e(lambda)), and uses these elements to replicate
the plot presented in Figure 1b. Thus, if one wishes to have further control over
the precise nature of plotting, beyond that provided in the graphing options available
in sdid’s command syntax, one can simply work with elements returned in the ereturn
list. In Appendix 2, we show that with slightly more effort, returned elements can be
[Figure panels: (a) SDID: Outcome Trends, (b) DID: Outcome Trends, (c) SC: Outcome Trends (packs per capita by year, 1970 to 2000); (d) SDID: Unit-Specific Weights, (e) DID: Unit-Specific Weights, (f) SC: Unit-Specific Weights (by state).]
. preserve
. clear
. matrix series=e(series)
. matrix lambda=e(lambda)
. qui svmat series, names(col)
. qui svmat lambda
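One way the excerpt above might be continued into a plot is sketched below. So that it runs without sdid installed, the sketch builds small stand-in matrices in place of e(series) and e(lambda). The numbers, the column names year, Yco and Ytr, and the treatment year 1989 used for the reference line are illustrative assumptions; the names that sdid actually returns can be checked with describe after svmat.

```stata
* Stand-in for e(series): time, synthetic control trend, treated trend
* (ASSUMED layout and made-up values, for illustration only)
matrix series = (1986, 100, 101 \ 1987, 97, 99 \ 1988, 95, 96 \ 1989, 93, 88 \ 1990, 92, 82)
matrix colnames series = year Yco Ytr
* Stand-in for e(lambda): one time weight per pre-treatment period
matrix lambda = (0.1 \ 0.3 \ 0.6)

clear
qui svmat series, names(col)   // creates year, Yco and Ytr
qui svmat lambda               // creates lambda1, missing after the pre-period
describe

* Trends on the left axis, time weights as bars on a hidden second axis
twoway (bar lambda1 year if !missing(lambda1), yaxis(2) color(gs12))   ///
       (line Yco year, yaxis(1) lpattern(dash))                        ///
       (line Ytr year, yaxis(1)),                                      ///
       yscale(off axis(2)) ylabel(0(1)3, axis(2)) xline(1989)          ///
       legend(order(2 "Control" 3 "Treated")) ytitle("Packs per capita")
```

Wrapping these lines in preserve and restore, as in the excerpt above, leaves the estimation data untouched.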
of which implement a parliamentary gender quota.5 For each of these countries, data
on the rates of women in parliament and the maternal mortality ratio are collected,
as well as a number of covariates.
This example presents a staggered adoption configuration, given that in the period
under study, quota adoption occurred in seven different yearly periods between 2000
and 2013. sdid handles a staggered adoption configuration seamlessly without any
particular changes to the syntax. In the code below, we implement the synthetic
difference-in-differences estimator using the bootstrap procedure to calculate standard
errors. The output by default reports the weighted ATT which is defined in (7)
above. However, as laid out in (7), this is based on each adoption-period specific
synthetic difference-in-differences estimate. These adoption-period specific estimates
are returned in the matrix e(tau), which is tabulated below the standard command output.
countries for which observations of women in parliament and maternal mortality exist for the full time
period, without missing observations.
r1 8.388868 2000
r2 6.967746 2002
r3 13.95226 2003
r4 -3.450543 2005
r5 2.749036 2010
r6 21.76272 2012
r7 -.8203235 2013
All other elements are identical to those documented in the case of a single adoption
period, however generalised to multiple adoptions. For example, if requesting graphi-
cal output, a single treatment versus synthetic control trend graph and corresponding
unit-level weight graph is provided for each adoption date. Similarly, ereturned matri-
ces such as e(lambda), e(omega) and e(series) provide columns for each particular
adoption period.
Adding Covariates As laid out in Section 2.2, covariates can be handled in synthetic
difference-in-differences in a number of ways. Below we document the inclusion of a
single covariate (the natural logarithm of GDP per capita). As sdid is based on a
balanced panel of observations, we must first ensure that there are no missing observa-
tions for all covariates, in this case dropping a small number of (control) countries for
which this measure is not available. We then include covariates via the covariates()
option. In the first case, this is conducted exactly following the procedure discussed
by Arkhangelsky et al. (2021), in which parameters on covariates are estimated within
the optimization routines in Mata. This is analogous to indicating covariates(,
optimized). Estimates in this particular case suggest that the inclusion of this con-
trol does little to dampen effects. After estimation, the coefficients on the covariates
can be inspected as part of e(beta), where an adoption-specific value for each co-
variate is provided, given that the underlying SDID estimate is calculated for each
adoption period.
. drop if lngdp==.
. sdid womparl country year quota, vce(bootstrap) seed(1213) covariates(lngdp)
Bootstrap replications (50). This may take some time.
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
Synthetic Difference-in-Differences Estimator
Here, results are slightly different, though quantitatively comparable to those when
using alternative procedures for conditioning out covariates. In this case, if examining
the e(beta) matrix, only a single coefficient will be provided, as the regression used to
estimate the coefficient vector is always based on the same sample. This additionally
offers a non-trivial speed up in the execution of the code. For example, on a particular
personal computer with Stata SE 15.1 and relatively standard specifications, using
the optimized method above requires 324 seconds of computational time while using
projected requires 61 seconds (compared with 58 seconds where covariates are not
included in sdid).
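A minimal sketch of requesting the projected procedure is given below. It assumes that sdid is installed, that the quota example data is in memory with observations missing lngdp already dropped, and it reuses the seed from the earlier call. The projected keyword is the one that appears later in this paper in covariates(lngdp, projected).

```stata
* ASSUMES sdid is installed and the quota example data is in memory,
* with observations missing lngdp already dropped
sdid womparl country year quota, vce(bootstrap) seed(1213) covariates(lngdp, projected)

* Inspect the covariate coefficient(s) stored after estimation
matrix list e(beta)
```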
and tabular output could be further enriched using additional options within esttab
if desired.
. webuse set
. webuse quota_example.dta, clear
. lab var quota "Parliamentary Gender Quota"
. drop if lngdp==.
. eststo sdid_2: sdid womparl country year quota, vce(bootstrap) seed(2022)
> covariates(lngdp, optimized)
In the following three code blocks we document bootstrap, placebo and jackknife
inference procedures. The difference in implementation in each case is very minor,
simply indicating either bootstrap, placebo or jackknife in the vce() option. For
example, in the case of bootstrap inference, where block bootstraps over the variable
country are performed, the syntax is as follows:
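A minimal sketch of what such calls could look like, assuming sdid is installed and the quota data from section 4.2 is in memory, and reusing the variable names and seed from the earlier examples:

```stata
* ASSUMES sdid is installed and the quota example data is in memory
* Block bootstrap over countries
sdid womparl country year quota, vce(bootstrap) seed(1213)

* Placebo inference: only the argument of vce() changes
sdid womparl country year quota, vce(placebo) seed(1213)
```

Jackknife inference is requested in the same way with vce(jackknife), provided the estimation sample contains more than one treated unit.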
6 As an example, with 50 replicates for bootstrap and placebo, and on a standard personal computer
running Stata SE 15.1, the execution time was 18.2 seconds for bootstrap, 10.09 seconds for placebo
permutations, and 0.7 seconds for jackknife. This time scales approximately linearly with the
number of replicates in the case of bootstrap and placebo. With 500 replicates the times were 178.1 and
101.6 seconds for bootstrap and placebo procedures respectively.
Figure 4: Outcome trends and event study style estimate of the impact of quotas on %
women in parliament
What such an analysis seeks to document is the differential evolution of treated and
(synthetic) control units, abstracting away from any baseline difference between the
groups. As an example, refer to Figure 4(a), which is based on the adoption of gender
quotas laid out in section 4.2, and in particular quota adoption year 2002. This is
standard output from sdid, presenting trends in rates of women in parliament in
countries which adopted quotas in 2002 (solid line), and synthetic control countries
which did not adopt quotas (dashed line). We will refer to the values plotted in these
trend lines as $\bar{Y}_t^{Tr}$ for treated units in year $t$, and $\bar{Y}_t^{Co}$ for synthetic control units in
year $t$. While this standard output allows us to visualize trends in the two groups in
a simple way, it is not immediately clear how the differences in these outcomes evolve
over time compared to baseline differences, nor the confidence intervals on any such
changes over time.
For this to resemble the logic of an event study analysis, we wish to consider, for
each period $t$, whether differences between treated units and synthetic controls have
changed when compared to baseline differences. Namely, for each period $t$, we wish
to calculate:
$$\left(\bar{Y}_t^{Tr} - \bar{Y}_t^{Co}\right) - \left(\bar{Y}_{baseline}^{Tr} - \bar{Y}_{baseline}^{Co}\right), \tag{8}$$
along with the confidence interval for this quantity. Here $\bar{Y}_{baseline}^{Tr}$ and $\bar{Y}_{baseline}^{Co}$ refer
to baseline (pre-treatment) means for treated and synthetic control units respectively.
In standard panel event studies, some arbitrary baseline period is chosen off of which
to estimate pre-treatment differences. This is often one year prior to treatment. In
the case of SDID where pre-treatment weights are optimally chosen as $\hat{\lambda}^{sdid}$ (refer to
section 2), this suggests an alternative quantity for $\bar{Y}_{baseline}^{Tr}$ and $\bar{Y}_{baseline}^{Co}$, namely:
$$\bar{Y}_{baseline}^{Tr} = \sum_{t=1}^{T_{pre}} \hat{\lambda}_t^{sdid}\,\bar{Y}_t^{Tr}, \qquad \bar{Y}_{baseline}^{Co} = \sum_{t=1}^{T_{pre}} \hat{\lambda}_t^{sdid}\,\bar{Y}_t^{Co}. \tag{9}$$
In words, these baseline outcomes are simply pre-treatment aggregates, where weights
are determined by optimal pre-treatment weights (indicated by the shaded gray area
in Figure 4(a)). The event study then plots the quantities defined in (8) for each time
period $t$.
An example of such an event study style plot is presented in Figure 4(b). Here,
blue points present the quantity indicated in (8) for each year. In this case, t ranges
from 1990 to 2015. While all these points are based off a simple implementation of
sdid comparing outcomes between treated and control units following (8), confidence
intervals documented in gray shaded areas of Figure 4(b) can be generated following
the resampling or permutation procedures discussed earlier in this paper. Specifically,
in the case of re-sampling, a block bootstrap can be conducted, and in each iteration
the quantity in (8) can be re-calculated for each t. The confidence interval associated
with each of these quantities can then be calculated based on its variance across many
(block)-bootstrap resamples.
Figure 4(b), and graphs following this principle more generally, can be generated
following the use of sdid. However, by default sdid simply provides output on trends
among the treated and synthetic control units (as displayed in Figure 4(a)). In the
code below, we lay out how one can move from these trends to the event study in
panel (b). As this procedure requires conducting the inference portion of the plot
manually (unlike most other procedures involving sdid where inference is conducted
automatically as part of the command) the code is somewhat more involved. For
this reason, we discuss the code below in a number of blocks, terminating with the
generation of the plot displayed in Figure 4(b).
In a first code block, we will open the parliamentary gender quota data which we
used in section 4.2, and keep the particular adoption period considered here (countries
which adopt quotas in 2002), as well as un-treated units:
. webuse set
. webuse quota_example.dta, clear
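The step that restricts the sample to one adoption period plus never-treated units is sketched below on a small made-up panel, so that the logic can be run in isolation. The variable names country, year and quota follow the example data, while adopt is a hypothetical helper variable introduced here; the same two commands could be applied after loading quota_example.dta.

```stata
* Toy panel (made-up values): A adopts in 2002, B in 2003, C never adopts
clear
input str1 country year quota
"A" 2001 0
"A" 2002 1
"A" 2003 1
"B" 2001 0
"B" 2002 0
"B" 2003 1
"C" 2001 0
"C" 2002 0
"C" 2003 0
end

* First year in which each country is treated (missing if never treated);
* adopt is a hypothetical helper variable, not part of the example data
egen adopt = min(cond(quota == 1, year, .)), by(country)

* Keep the 2002 adopters together with the never-treated units
keep if adopt == 2002 | missing(adopt)
list country year quota adopt, sepby(country)
```

Because the later call uses covariates(lngdp, projected), observations with missing lngdp would also need to be dropped, as was done in section 4.2.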
We can then implement the standard SDID procedure, additionally exporting the
trend graph which is displayed in Figure 4(a). This is done in the first line below,
after which a number of vectors are stored. These vectors allow us to calculate the
quantity $(\bar{Y}_{baseline}^{Tr} - \bar{Y}_{baseline}^{Co})$ indicated in (8), which is generated from $\hat{\lambda}^{sdid}$, from
the returned matrix e(lambda), and pre-treatment values for $\bar{Y}_t^{Tr}$ and $\bar{Y}_t^{Co}$, from
the returned matrix e(series). This baseline quantity is referred to as meanpre_o
below. Finally, the quantity of interest in (8) for each time period $t$ is generated as
the variable d, which is plotted below as the blue points on the event study in Figure
4(b).
. qui sdid womparl country year quota, vce(noinference) graph g2_opt(ylab(-5(5)20)
> ytitle("Women in Parliament") scheme(sj)) graph_export(groups, .pdf)
> covariates(lngdp, projected)
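The calculation of meanpre_o and d described above can be sketched as follows. To keep the example runnable on its own, small made-up matrices stand in for the returned e(series) and e(lambda). The assumed layout (time, control trend and treated trend in columns 1 to 3 of the series matrix, and one weight per pre-treatment period in the lambda vector) and the helper names gap, pregap, tcol and dmat are assumptions for illustration, and should be checked against matrix list e(series) and matrix list e(lambda) before being relied upon.

```stata
* Stand-ins for the returned matrices (made-up numbers, ASSUMED layout):
*   series plays the role of e(series): time, control trend, treated trend
*   lambda plays the role of e(lambda): one weight per pre-treatment period
matrix series = (2000, 9, 10 \ 2001, 10, 12 \ 2002, 11, 14 \ 2003, 11, 18 \ 2004, 12, 21)
matrix lambda = (0.2 \ 0.3 \ 0.5)
local npre = 3                      // number of pre-treatment periods

* Treated minus synthetic control in every period
matrix gap = series[1..., 3] - series[1..., 2]

* Baseline difference from (9): lambda-weighted sum of pre-treatment gaps
matrix pregap = gap[1..`npre', 1]
matrix aux = lambda'*pregap
scalar meanpre_o = aux[1,1]

* Quantity in (8): gap in each period net of the baseline difference
matrix tcol = series[1..., 1]
matrix dmat = (tcol, gap)
clear
svmat dmat
rename (dmat1 dmat2) (time d)
replace d = d - meanpre_o
list time d
```

With output from sdid, series and lambda would instead be copied from e(series) and e(lambda), and npre would be 12 for the 1990 to 2001 pre-treatment years of the 2002 adoption period.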
Perhaps the most complicated portion of code is that which implements the boot-
strap procedure. This is provided below, where for ease of replication we consider a
relatively small number of bootstrap resamples, which are set as the local B = 100.
In each bootstrap resample, we first ensure that both treatment and control units are
present (using the locals r1 and r2), and then re-estimate the sdid procedure with the
new bootstrap sample generated using Stata’s bsample command. This is precisely
the same block bootstrap procedure laid out by Arkhangelsky et al. (2021), and which
sdid conducts internally, however here we are interested in collecting, for each boot-
strap resample, the same quantity estimated above with the main sample as d, which
captures the estimate defined in (8) for each t. To do so, we simply follow an identical
procedure as that conducted above, however now save the resulting resampled values
of the quantities from (8) as a series of matrices d`b' for later processing to generate
confidence intervals in the event study.
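. local b = 1
. local B = 100
. while `b'<=`B' {
.     preserve
.     * block bootstrap: resample countries with replacement; c2 is the new unit id
.     bsample, cluster(country) idcluster(c2)
.     * check that the resample contains both control and treated observations
.     qui count if quota == 0
.     local r1 = r(N)
.     qui count if quota != 0
.     local r2 = r(N)
.     if (`r1'!=0 & `r2'!=0) {
.         qui sdid womparl c2 year quota, vce(noinference) graph covariates(lngdp, projected)
.         * store the quantity in (8) for this resample as matrix d`b'
.         * (ASSUMED layout: e(series) holds time, control and treated trends in
.         * columns 1-3; rows 1-12 of e(lambda) hold the 1990-2001 time weights)
.         matrix series_b = e(series)
.         matrix w_b = e(lambda)
.         matrix w_b = w_b[1..12,1]
.         matrix gap_b = series_b[1...,3] - series_b[1...,2]
.         matrix pre_b = gap_b[1..12,1]
.         matrix aux_b = w_b'*pre_b
.         matrix d`b' = gap_b - J(rowsof(gap_b),1,aux_b[1,1])
.         local ++b
.     }
.     restore
. }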
The final step is to calculate the standard deviation of each estimate from (8) based
on the bootstrap resamples, and then to generate confidence intervals for each pa-
rameter based on the estimates generated above (d), as well as their standard errors.
This is conducted in the first lines of the code below. For each of the B = 100 re-
samples conducted above, we import the vector of resampled estimates from (8), and
then using rowsd() calculate the standard deviation of the estimates across each time
period t. This is the bootstrap standard error, which is used below to calculate the
upper and lower bounds of 95% confidence intervals as [LCI;UCI]. Finally, based on
these generated elements (d, as blue points on the event study, and LCI, UCI as the
end points of confidence intervals) we generate the output for Figure 4(b) in the final
lines of code.
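In symbols, writing $d_t$ for the estimate of (8) in period $t$ and $d_t^{(b)}$ for its value in bootstrap resample $b$ (notation introduced here for illustration only), the bounds described above can be written as follows. The sketch assumes a normal approximation, so that 1.96 is the critical value for a 95% interval:

```latex
\widehat{se}(d_t) = \operatorname{sd}\left(d_t^{(1)}, \ldots, d_t^{(B)}\right),
\qquad
[LCI_t;\, UCI_t] = \left[\, d_t - 1.96\,\widehat{se}(d_t);\; d_t + 1.96\,\widehat{se}(d_t) \,\right]
```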
. preserve
. keep time d
. keep if time!=.
. forval b=1/`B' {
.     svmat d`b' // import each bootstrap replicate of difference between trends
. }
. * bootstrap standard error of (8) in each period, then 95% confidence bounds
. egen rsd = rowsd(d11-d`B'1)
. gen LCI = d + invnormal(0.025)*rsd
. gen UCI = d + invnormal(0.975)*rsd
. *generate plot
. tw rarea UCI LCI time, color(gray%40) || scatter d time, color(blue) m(d)
> xtitle("") ytitle("Women in Parliament") xlab(1990(1)2015, angle(45))
> legend(order(2 "Point Estimate" 1 "95% CI") pos(12) col(2))
> xline(2002, lc(black) lp(solid)) yline(0, lc(red) lp(shortdash))
> scheme(sj)
. graph export "event_sdid.pdf", replace
. restore
As noted above, the outcome of this graph is provided in Figure 4(b), where we
observe that, as expected, the synthetic difference-in-differences algorithm has resulted
in quite closely matched trends between the synthetic control and treatment group in
the pre-treatment period, given that all pre-treatment estimates lie close to zero. The
observed impact of quotas on women in parliament occurs from the treatment year
onward, where these differences are observed to be large and statistically significant.
This process of estimating an event study style plot is conducted here for a specific
adoption year. In the case of a block adoption design where there is only one adoption
period, this will be the only resulting event study to consider. However in a staggered
adoption design, a single event study could be generated for each adoption period.
Potentially, such event studies could be combined, but some way would be required to
deal with unbalanced lags and leads, and additionally some weighting function would
be required to group treatment lags and leads where multiple such lags and leads are
available. One such procedure has been proposed in Sun and Abraham (2021), and
could be a way forward here.
5 Conclusions
In this paper we have laid out the details behind Arkhangelsky et al. (2021)’s SDID
method, and discussed its implementation in Stata using the sdid command. We have
briefly discussed the methods behind this command, as well as laid out extensions into
a staggered adoption setting. We provide two empirical examples to demonstrate the
usage of the command.
It is important to note that given the nature of the algorithm, a number of require-
ments must be met for this to be applied to data. We lay these out below, as key
considerations for empirical researchers wishing to conduct estimation and inference
using the SDID estimator.
• First, and most importantly, a balanced panel of data is required that provides
outcomes and treatment measures for each unit in all periods under study. Should
missing values in such outcomes be present in the panel, these either must be
eliminated from the estimation sample, or data should be sought to fill in gaps.
• Second, no units can be considered if they were exposed to treatment from the
first period in which data is observed. If this occurs, there is no pre-treatment
period on which to generate synthetic control cohorts. If always treated units are
present in the data, these either need to be eliminated, or earlier data sought.
• Third, pure control units are required. At least some units must never be treated
in order to act as donor units. If all units are treated at some point in the panel,
no donor units exist, and synthetic controls cannot be generated.
• Fourth, in cases where covariates are included, these covariates must be present
in all observations. If missing observations are present in covariates, this will
generate similar problems as when outcomes or treatment measures are missing.
If missing observations are present, the affected units should be removed from the
estimation sample, or data should be sought to complete the covariate coverage.
• Finally, in the case of inference, a number of additional requirements must be
met. In the case of bootstrap or jackknife procedures, the number of treated units
should be larger than 1 (and ideally considerably larger than this). Should only 1
treated unit be present, placebo inference should be conducted. Additionally, in
the case of placebo inference, this can only be conducted if the number of control
units exceeds the number of treated units.
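The requirements listed above can be checked before estimation. The following is a minimal sketch on a made-up panel, with hypothetical variable names id (unit), year, y (outcome) and w (treatment indicator); assert stops execution if a requirement fails. A covariate, where used, could be checked for missing values in the same way as y and w.

```stata
* Toy panel with hypothetical names: id (unit), year, y (outcome), w (treatment)
clear
input id year y w
1 2000 1.0 0
1 2001 1.2 0
1 2002 1.1 1
2 2000 0.9 0
2 2001 1.0 0
2 2002 1.3 0
3 2000 0.8 0
3 2001 0.7 0
3 2002 0.9 0
end

* 1. Balanced panel without missing outcomes or treatment indicators
xtset id year
assert "`r(balanced)'" == "strongly balanced"
assert !missing(y, w)

* 2. No unit is treated in the first observed period
bysort id (year): gen byte w_first = w[1]
assert w_first == 0

* 3. At least one never-treated (donor) unit exists
bysort id: egen byte ever = max(w)
egen byte tag = tag(id)
qui count if tag & ever == 0
local Nco = r(N)
qui count if tag & ever == 1
local Ntr = r(N)
assert `Nco' > 0

* 4. Inference: bootstrap and jackknife need more than one treated unit,
*    placebo needs more control than treated units
if `Ntr' > 1 display "bootstrap or jackknife inference is feasible"
if `Nco' > `Ntr' display "placebo inference is feasible"
```

In this toy panel there is a single treated unit and two controls, so only the placebo message is displayed.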
Should a balanced panel of data be available, the SDID method, and the sdid
command described here, offers a flexible, easy to implement and robust option for
the analysis of impacts of policies or treatments in certain groups at certain times.
These methods provide clear graphical results to describe outcomes, and an explicit
description of how counterfactual outcomes are inferred. These methods are likely
well suited to a large body of empirical work in social sciences, where treatment
assignment is not random, and offer benefits over both difference-in-differences and
synthetic control methods.
6 References
Abadie, A., A. Diamond, and J. Hainmueller. 2010. Synthetic Control Methods for
Comparative Case Studies: Estimating the Effect of California’s Tobacco Control
Program. Journal of the American Statistical Association 105(490): 493–505.
Abadie, A., and J. Gardeazabal. 2003. The Economic Costs of Conflict: A Case Study
of the Basque Country. American Economic Review 93(1): 113–132.
Abadie, A., and J. L’Hour. 2021. A Penalized Synthetic Control Estimator for Disag-
gregated Data. Journal of the American Statistical Association 116(536): 1817–1834.
Ben-Michael, E., A. Feller, and J. Rothstein. 2021. The Augmented Synthetic Control
Method. Journal of the American Statistical Association 116(536): 1789–1803.
Bilinski, A., and L. A. Hatfield. 2018. Nothing to see here? Non-inferiority approaches
to parallel trends and other model assumptions. arXiv e-prints arXiv:1805.03273.
de Chaisemartin, C., and X. D’Haultfœuille. 2022. Two-way fixed effects and differences-
in-differences with heterogeneous treatment effects: a survey. The Econometrics
Journal Utac017.
Clarke, D., and K. Tapia-Schythe. 2021. Implementing the panel event study. The Stata
Journal 21(4): 853–884.
Conley, T. G., and C. R. Taber. 2011. Inference with “Difference in Differences” with
a Small Number of Policy Changes. The Review of Economics and Statistics 93(1):
Holland, P. W. 1986. Statistics and Causal Inference. Journal of the American Statistical
Association 81(396): 945–960.
Manski, C. F., and J. V. Pepper. 2018. How Do Right-to-Carry Laws Affect Crime
Rates? Coping with Ambiguity Using Bounded-Variation Assumptions. The Review
of Economics and Statistics 100(2): 232–244.
Orzechowski, W., and R. C. Walker. 2005. The Tax Burden on Tobacco. Historical
Compilation Volume 40, Arlington, VA.
Pailañir, D., and D. Clarke. 2022. SDID: Stata module to perform syn-
thetic difference-in-differences estimation, inference, and visualization. Sta-
tistical Software Components, Boston College Department of Economics.
1 Estimation Algorithms for the Block Design
In this appendix, we replicate the estimation algorithm and inference algorithms de-
fined in Arkhangelsky et al. (2021). These are referred to in the text, and follow the
same notation as in section 2 here.
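$$\left(\hat{\tau}, \hat{\mu}, \hat{\alpha}, \hat{\beta}\right) = \arg\min_{\tau,\mu,\alpha,\beta} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} \left(Y_{it} - \mu - \alpha_i - \beta_t - W_{it}\tau\right)^2 \hat{\omega}_i^{sdid}\, \hat{\lambda}_t^{sdid} \right\}$$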
for b ← 1 to B do
    1. Construct a bootstrap dataset $(Y^{(b)}, W^{(b)})$ by sampling N rows of
       $(Y, W)$ with replacement;
    2. if the bootstrap sample has no treated units or no control units then
       Discard resample and go to 1;
    3. Compute SDID estimate $\hat{\tau}^{(b)}$ following algorithm A1 based on
       $(Y^{(b)}, W^{(b)})$;
4. Define $\hat{V}_{\tau}^{cb} = \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\tau}^{(b)} - \frac{1}{B} \sum_{b=1}^{B} \hat{\tau}^{(b)} \right)^2$;
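for i ← 1 to N do
    1. Compute $\hat{\tau}^{(-i)}$: $\arg\min_{\tau, \{\alpha_j, \beta_t\}_{j \neq i, t}} \sum_{j \neq i, t} \left(Y_{jt} - \alpha_j - \beta_t - W_{jt}\tau\right)^2 \hat{\omega}_j \hat{\lambda}_t$;
2. Compute $\hat{V}_{\tau}^{jack} = (N-1) N^{-1} \sum_{i=1}^{N} \left( \hat{\tau}^{(-i)} - \hat{\tau} \right)^2$;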
for b ← 1 to B do
    1. Sample $N_{tr}$ out of the $N_{co}$ control units without replacement to ‘receive
       the placebo’;
    2. Construct a placebo treatment matrix $W_{co}$ for the controls;
    3. Compute SDID estimate $\hat{\tau}^{(b)}$ based on $(Y_{co}, W_{co})$;
4. Define $\hat{V}_{\tau}^{placebo} = \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\tau}^{(b)} - \frac{1}{B} \sum_{b=1}^{B} \hat{\tau}^{(b)} \right)^2$;
. preserve
. clear
. matrix omega=e(omega)
. svmat omega
. ren (omega1 omega2) (omega id)
. keep if id<=39
. tempfile domega
. save `domega'
. restore