Xu GeneralizedSyntheticControl 2017
Cambridge University Press and Society for Political Methodology are collaborating with
JSTOR to digitize, preserve and extend access to Political Analysis
Abstract
Difference-in-differences (DID) is commonly used for causal inference in time-series cross-sectional data.
It requires the assumption that the average outcomes of treated and control units would have followed
parallel paths in the absence of treatment. In this paper, we propose a method that not only relaxes
this often-violated assumption, but also unifies the synthetic control method (Abadie, Diamond, and
Hainmueller 2010) with linear fixed effects models under a simple framework, of which DID is a special case.
It imputes counterfactuals for each treated unit using control group information based on a linear interactive
fixed effects model that incorporates unit-specific intercepts interacted with time-varying coefficients. This
method has several advantages. First, it allows the treatment to be correlated with unobserved unit and
time heterogeneities under reasonable modeling assumptions. Second, it generalizes the synthetic control
method to the case of multiple treated units and variable treatment periods, and improves efficiency and
interpretability. Third, with a built-in cross-validation procedure, it avoids specification searches and thus is
easy to implement. An empirical example of Election Day Registration and voter turnout in the United States
is provided.
1 Introduction
Difference-in-differences (DID) is one of the most commonly used empirical designs in today’s
social sciences. The identifying assumptions for DID include the “parallel trends” assumption,
which states that in the absence of the treatment the average outcomes of treated and control
units would have followed parallel paths. This assumption is not directly testable, but researchers
have more confidence in its validity when they find that the average outcomes of the treated
and control units follow parallel paths in pretreatment periods. In many cases, however, parallel
pretreatment trends are not supported by data, a clear sign that the “parallel trends” assumption
is likely to fail in the posttreatment period as well. This paper attempts to deal with this problem
systematically. It proposes a method that estimates the average treatment effect on the treated
using time-series cross-sectional (TSCS) data when the “parallel trends” assumption is not likely
to hold.
The presence of unobserved time-varying confounders causes the failure of this assumption.
There are broadly two approaches in the literature to deal with this problem. The first is
to condition on pretreatment observables using matching methods, which may help balance
the influence of potential time-varying confounders between treatment and control groups. For
example, Abadie (2005) proposes matching before DID estimation. Although this method is easy

Political Analysis (2017), vol. 25: 57–76
DOI: 10.1017/pan.2016.2
Published 21 February 2017
Corresponding author: Yiqing Xu
Edited by R. Michael Alvarez
© The Author(s) 2017. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Author's note: The author is indebted to Matt Blackwell, Devin Caughey, Justin Grimmer, Jens Hainmueller, Danny Hidalgo,
Simon Jackman, Jonathan Katz, Luke Keele, Eric Min, Molly Roberts, Jim Snyder, Brandon Stewart, Teppei Yamamoto,
as well as seminar participants at the 2015 MPSA Annual Meeting and 2015 APSA Annual Meeting for helpful comments
and suggestions. The author thanks the editor, Mike Alvarez, and two anonymous reviewers for their extremely helpful
suggestions. He also thanks Jushan Bai for generously sharing the Matlab code used in Bai (2009) and Melanie Springer
for kindly providing the state-level voter turnout data (1920–2000). The source code and data used in the paper can be
downloaded from the Political Analysis Dataverse at dx.doi.org/10.7910/DVN/8AKACJ (Xu 2016) as well as the author's
website.
1 See Hsiao, Ching, and Wan (2012) and Angrist, Jordà, and Kuersteiner (2013) for alternative matching methods along this
line of thought.
2 To gauge the uncertainty of the estimated treatment effect, the synthetic control method compares the estimated
treatment effect with the “effects” estimated from placebo tests in which the treatment is randomly assigned to a control
unit.
3 See Campbell, Lo, and MacKinlay (1997) for applications of factor models in finance.
4 For more empirical applications of the IFE estimator, see Kim and Oka (2014) and Gaibulloev, Sandler, and Sul (2014).
Yiqing Xu ` Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models 58
2 Framework
Suppose $Y_{it}$ is the outcome of interest of unit $i$ at time $t$. Let $\mathcal{T}$ and $\mathcal{C}$ denote the sets of units in the
treatment and control groups, respectively. The total number of units is $N = N_{tr} + N_{co}$, where
$N_{tr}$ and $N_{co}$ are the numbers of treated and control units, respectively. All units are observed for
$T$ periods (from time 1 to time $T$). Let $T_{0,i}$ be the number of pretreatment periods for unit $i$, which
5 When the treatment effect is heterogeneous (as is almost always the case), an IFE model that imposes a constant
treatment effect assumption gives biased estimates of the average treatment effect because the estimation of the factor
space is affected by the heterogeneity in the treatment effect.
6 For example, Acemoglu et al. (2016), who estimate the effect of Tim Geithner connections on stock market returns, conduct
the synthetic control method repeatedly for each connected (treated) firm; Dube and Zipperer (2015) estimate the effect of
minimum wage policies on wage and employment by conducting the method for each of the 29 policy changes. The latter
also extend Abadie, Diamond, and Hainmueller (2010)’s original inferential method to the case of multiple treated units
using the mean percentile ranks of the estimated effects.
$$Y_{it} = \delta_{it} D_{it} + x_{it}'\beta + \lambda_i' f_t + \varepsilon_{it},$$
where the treatment indicator $D_{it}$ equals 1 if unit $i$ has been exposed to the treatment prior to time
$t$ and equals 0 otherwise (i.e., $D_{it} = 1$ when $i \in \mathcal{T}$ and $t > T_0$, and $D_{it} = 0$ otherwise).⁷ $\delta_{it}$ is the
heterogeneous treatment effect on unit $i$ at time $t$; $x_{it}$ is a $(k \times 1)$ vector of observed covariates;
$\beta = [\beta_1, \ldots, \beta_k]'$ is a $(k \times 1)$ vector of unknown parameters;⁸ $f_t = [f_{1t}, \ldots, f_{rt}]'$ is an $(r \times 1)$ vector of
unobserved common factors; $\lambda_i = [\lambda_{i1}, \ldots, \lambda_{ir}]'$ is an $(r \times 1)$ vector of unknown factor loadings; and
$\varepsilon_{it}$ represents unobserved idiosyncratic shocks for unit $i$ at time $t$ and has zero mean. Assumption 1
requires that the treated and control units are affected by the same set of factors and that the number
of factors is fixed during the observed time periods, i.e., no structural breaks are allowed.
The factor component of the model, $\lambda_i' f_t = \lambda_{i1} f_{1t} + \lambda_{i2} f_{2t} + \cdots + \lambda_{ir} f_{rt}$, takes a linear, additive
form by assumption. In spite of the seemingly restrictive form, it covers a wide range of unobserved
heterogeneities. First and foremost, conventional additive unit and time fixed effects are special
cases. To see this, set $f_{1t} = 1$ and $\lambda_{i2} = 1$ and rewrite $\lambda_{i1} = \alpha_i$ and $f_{2t} = \xi_t$; then
$\lambda_{i1} f_{1t} + \lambda_{i2} f_{2t} = \alpha_i + \xi_t$.⁹ Moreover, the term also incorporates cases ranging from unit-specific linear
or quadratic time trends to autoregressive components that researchers often control for when
analyzing TSCS data.¹⁰ In general, as long as an unobserved random variable can be decomposed
into a multiplicative form, i.e., $U_{it} = a_i \times b_t$, it can be absorbed by $\lambda_i' f_t$; the term cannot, however, capture
unobserved confounders that are independent across units.
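To make these special cases concrete, here is a minimal NumPy sketch using arbitrary simulated values. It writes additive unit and time effects, and unit-specific linear and quadratic trends, in the form $F\Lambda'$ (a $T \times N$ matrix whose $(t, i)$ entry is $\lambda_i' f_t$) and checks that the two representations coincide. All variable names are hypothetical and chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 6, 8

# Additive two-way fixed effects as a two-factor model:
# f_1t = 1, lambda_i2 = 1, lambda_i1 = alpha_i, f_2t = xi_t.
alpha = rng.normal(size=N)                       # unit effects alpha_i
xi = rng.normal(size=T)                          # time effects xi_t
F_fe = np.column_stack([np.ones(T), xi])         # (T x 2) factors
Lam_fe = np.column_stack([alpha, np.ones(N)])    # (N x 2) loadings
additive = alpha[None, :] + xi[:, None]          # entry [t, i] = alpha_i + xi_t
factor_form = F_fe @ Lam_fe.T                    # entry [t, i] = lambda_i' f_t

# Unit-specific linear and quadratic trends: f_1t = t, f_2t = t^2.
t = np.arange(1, T + 1, dtype=float)
g1, g2 = rng.normal(size=N), rng.normal(size=N)  # unit-specific trend coefficients
trends = g1[None, :] * t[:, None] + g2[None, :] * t[:, None] ** 2
F_tr = np.column_stack([t, t ** 2])
Lam_tr = np.column_stack([g1, g2])

print(np.allclose(additive, factor_form))        # True
print(np.allclose(trends, F_tr @ Lam_tr.T))      # True
```

In both cases the confounder matrix has rank at most 2, which is why a two-factor model absorbs it.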
To formalize the notion of causality, we also use the notation from the potential outcomes
framework for causal inference (Neyman 1923; Rubin 1974; Holland 1986). Let $Y_{it}(1)$ and $Y_{it}(0)$ be
the potential outcomes for individual $i$ at time $t$ when $D_{it} = 1$ or $D_{it} = 0$, respectively. We thus
have $Y_{it}(0) = x_{it}'\beta + \lambda_i' f_t + \varepsilon_{it}$ and $Y_{it}(1) = \delta_{it} + x_{it}'\beta + \lambda_i' f_t + \varepsilon_{it}$. The individual treatment effect
on treated unit $i$ at time $t$ is therefore $\delta_{it} = Y_{it}(1) - Y_{it}(0)$ for any $i \in \mathcal{T}$, $t > T_0$.
We can rewrite the DGP of each unit as:
$$Y_i = D_i \circ \delta_i + X_i \beta + F \lambda_i + \varepsilon_i, \quad i \in \{1, 2, \ldots, N_{co}, N_{co}+1, \ldots, N\},$$
where $Y_i = [Y_{i1}, Y_{i2}, \ldots, Y_{iT}]'$, $D_i = [D_{i1}, D_{i2}, \ldots, D_{iT}]'$, $\delta_i = [\delta_{i1}, \delta_{i2}, \ldots, \delta_{iT}]'$, and
$\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT}]'$ are $(T \times 1)$ vectors (the symbol "$\circ$" stands for the pointwise product);
$X_i = [x_{i1}, x_{i2}, \ldots, x_{iT}]'$ is a $(T \times k)$ matrix; and $F = [f_1, f_2, \ldots, f_T]'$ is a $(T \times r)$ matrix.
7 Cases in which the treatment switches on and off (or "multiple-treatment-time") can be easily incorporated in this
framework as long as we impose assumptions on how the treatment affects current and future outcomes. For example,
one can assume that the treatment affects only the current outcome but not future outcomes (no carryover effect), as
fixed effects models often do. In this paper, we do not impose such assumptions. See Imai and Kim (2016) for a thorough
discussion.
8 β is assumed to be constant across space and time mainly for the purpose of fast computation in the frequentist framework.
It is a limitation compared with more flexible and increasingly popular random coefficient models in Bayesian multi-level
analysis.
9 For this reason, additive unit and time fixed effects are not explicitly assumed in the model. An extended model that directly
imposes additive two-way fixed effects is discussed in the next section.
10 In the former case, we can set $f_{1t} = t$ and $f_{2t} = t^2$; in the latter case, for example, we can rewrite $Y_{it} = \rho Y_{i,t-1} + x_{it}'\beta + \varepsilon_{it}$
as $Y_{it} = Y_{i0} \cdot \rho^t + x_{it}'\beta + \nu_{it}$, in which $\nu_{it}$ is an AR(1) process and $\rho^t$ and $Y_{i0}$ are the unknown factor and factor loading,
respectively. See Gobillon and Magnac (2016) for more examples.
$$Y_{co} = X_{co}\beta + F\Lambda_{co}' + \varepsilon_{co}, \qquad (1)$$
in which $Y_{co} = [Y_1, Y_2, \ldots, Y_{N_{co}}]$ and $\varepsilon_{co} = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{N_{co}}]$ are $(T \times N_{co})$ matrices; $X_{co}$ is a
three-dimensional $(T \times N_{co} \times k)$ matrix; and $\Lambda_{co} = [\lambda_1, \lambda_2, \ldots, \lambda_{N_{co}}]'$ is an $(N_{co} \times r)$ matrix; hence,
the products $X_{co}\beta$ and $F\Lambda_{co}'$ are also $(T \times N_{co})$ matrices. To identify $\beta$, $F$, and $\Lambda_{co}$ in Equation (1),
more constraints are needed. Following Bai (2003, 2009), we add two sets of constraints on the
factors and factor loadings: (1) all factors are normalized, and (2) they are orthogonal to each other,
i.e., $F'F/T = I_r$ and $\Lambda_{co}'\Lambda_{co} = \text{diagonal}$.¹¹ For the moment, the number of factors $r$ is assumed
to be known. In the next section, we propose a cross-validation procedure that automates the
choice of $r$.
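These constraints can always be imposed by re-rotating an arbitrary factor decomposition (footnote 11). One convenient way to do so, assumed here for illustration rather than taken from Bai's procedure, is a singular value decomposition of the common component $F\Lambda_{co}'$. The sketch below (hypothetical helper `normalize_factors`) checks that the rotated factors satisfy $\tilde F'\tilde F/T = I_r$, that $\tilde\Lambda_{co}'\tilde\Lambda_{co}$ is diagonal, and that the common component itself is unchanged.

```python
import numpy as np

def normalize_factors(F, Lam):
    """Re-express F @ Lam.T (T x N) as F_t @ L_t.T with
    F_t.T @ F_t / T = I_r and L_t.T @ L_t diagonal."""
    T, r = F.shape
    M = F @ Lam.T                                   # common component, rank <= r
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    F_t = np.sqrt(T) * U[:, :r]                     # normalized, mutually orthogonal factors
    L_t = Vt[:r].T * s[:r] / np.sqrt(T)             # loadings with orthogonal columns
    return F_t, L_t

rng = np.random.default_rng(1)
T, N, r = 30, 40, 2
F = rng.normal(size=(T, r)) + 1.0                   # arbitrary, non-normalized factors
Lam = rng.normal(size=(N, r))
F_t, L_t = normalize_factors(F, Lam)

gram = L_t.T @ L_t
print(np.allclose(F_t.T @ F_t / T, np.eye(r)))      # True
print(np.allclose(gram, np.diag(np.diag(gram))))    # True
print(np.allclose(F_t @ L_t.T, F @ Lam.T))          # True: same common component
```

The rotated $\tilde F$ spans the same column space as $F$, which is the sense in which the factor space, but not the factors themselves, is identified.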
The main quantity of interest of this paper is the average treatment effect on the treated (ATT)
at time t (when t > T0 ):
$$ATT_{t, t > T_0} = \frac{1}{N_{tr}} \sum_{i \in \mathcal{T}} [Y_{it}(1) - Y_{it}(0)] = \frac{1}{N_{tr}} \sum_{i \in \mathcal{T}} \delta_{it}.^{12}$$
Note that in this paper, as in Abadie, Diamond, and Hainmueller (2010), we treat the treatment
effects $\delta_{it}$ as given once the sample is drawn.¹³ Because $Y_{it}(1)$ is observed for treated units in
posttreatment periods, the main objective of this paper is to construct counterfactuals for each
treated unit in posttreatment periods, i.e., $Y_{it}(0)$ for $i \in \mathcal{T}$ and $t > T_0$. The problem of causal
inference indeed turns into a problem of forecasting missing data.¹⁴
$$\varepsilon_{it} \perp\!\!\!\perp D_{js}, x_{js}, \lambda_j, f_s \quad \forall\, i, j, t, s.$$
Assumption 2 means that the error term of any unit at any time period is independent of treatment
assignment, observed covariates, and unobserved cross-sectional and temporal heterogeneities
11 These constraints do not lead to loss of generality because for an arbitrary pair of matrices $F$ and $\Lambda_{co}$, we can find an $(r \times r)$
invertible matrix $A$ such that $(FA)'(FA)/T = I_r$ and $\tilde\Lambda_{co}'\tilde\Lambda_{co}$ is a diagonal matrix, where $\tilde\Lambda_{co} = \Lambda_{co}(A^{-1})'$. To see this, rewrite
$F\lambda_i$ as $\tilde F\tilde\lambda_i$, in which $\tilde F = FA$ and $\tilde\lambda_i = A^{-1}\lambda_i$ for units in both the treatment and control groups, such that $\tilde F$ and $\tilde\Lambda_{co}$ satisfy
the above constraints. The total number of constraints is $r^2$, the dimension of the matrix space to which $A$ belongs. It is worth
noting that although the original factors $F$ may not be identifiable, the space spanned by $F$, an $r$-dimensional subspace of
the $T$-dimensional space, is identified under the above constraints, because any vector in the subspace spanned by
$\tilde F$ is also in the subspace spanned by the original factors $F$.
12 For a clear and detailed explanation of quantities of interest in TSCS analysis, see Blackwell and Glynn (2015). Using
their terminology, this paper intends to estimate the Average Treatment History Effect on the Treated given two specific
treatment histories: $\mathbb{E}[Y_{it}(a_1^t) - Y_{it}(a_0^t) \mid D_{i,t-1} = a_1^{t-1}]$, in which $a_0^t = (0, \ldots, 0)$ and $a_1^t = (0, \ldots, 0, 1, \ldots, 1)$, with $T_0$ zeros and
$(t - T_0)$ ones, indicate the histories of treatment statuses. We keep the current notation for simplicity.
13 We attempt to make inference about the ATT in the sample we draw, not the ATT of the population. In other words, we do
not incorporate uncertainty of the treatment effects δi t .
14 The idea of predicting treated counterfactuals in a DID setup is also explored by Brodersen et al. (2014) using a structural
Bayesian time-series approach.
Assumptions 3 and 4 (see the Online Appendix in Supplementary Materials for details) are needed
for the consistent estimation of $\beta$ and the space spanned by $F$ (or $F'F/T$). Similar, though slightly
weaker, assumptions are made in Bai (2009) and Moon and Weidner (2015). Assumption 3 allows
weak serial correlations but rules out strong serial dependence, such as unit root processes; errors
of different units are uncorrelated. A sufficient condition for Assumption 3 to hold is that the error
terms are not only independent of covariates, factors, and factor loadings, but also independent
both across units and over time, which is assumed in Abadie, Diamond, and Hainmueller (2010).
Assumption 4 specifies moment conditions that ensure the convergence of the estimator.
For valid inference based on a block bootstrap procedure discussed in the next section, we also
need Assumption 5 (see Online Appendix for details). Heteroscedasticity across time, however, is
allowed.
REMARK 1. Assumptions 3 and 5 allow the error terms $\varepsilon_{it}$ to be serially correlated.
Assumption 2 rules out dynamic models with lagged dependent variables; however, this is
mainly for the purpose of simplifying proofs (Bai 2009, p. 1243). The proposed method can
accommodate dynamic models as long as the error terms are not serially correlated.
3 Estimation Strategy
In this section, we first propose a GSC estimator for the treatment effect on each treated unit. It is
essentially an out-of-sample prediction method based on Bai's (2009) factor-augmented model.
15 Note that because $\varepsilon_{it}$ is independent of $D_{is}$ and $x_{is}$ for all $(t, s)$, Assumption 2 rules out the possibility that past outcomes
may affect future treatments, which is allowed by the so-called "sequential exogeneity" assumption. A directed acyclic
graph (DAG) representation is provided in the Online Appendix. See Blackwell and Glynn (2015) and Imai and Kim (2016)
for discussions on the difference between the strict ignorability and sequential ignorability assumptions. What is unique
here is that we condition on unobserved factors and factor loadings.
$$\text{s.t. } \tilde F'\tilde F/T = I_r \ \text{ and } \ \tilde\Lambda_{co}'\tilde\Lambda_{co} = \text{diagonal}.$$
We explain in detail how to estimate this model in the Online Appendix. The second step estimates
factor loadings for each treated unit by minimizing the mean squared error of the predicted treated
outcome in pretreatment periods:
in which $\hat\beta$ and $\hat F^0$ are from the first-step estimation and the superscripts "0" denote the
pretreatment periods. In the third step, we calculate treated counterfactuals based on $\hat\beta$, $\hat F$, and $\hat\lambda_i$:
An estimator for $ATT_t$ therefore is: $\widehat{ATT}_t = (1/N_{tr}) \sum_{i \in \mathcal{T}} [Y_{it}(1) - \hat Y_{it}(0)]$ for $t > T_0$.
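The three steps can be sketched in a few lines of NumPy under simplifying assumptions: no covariates (so $\beta$ drops out and the IFE fit on the control group reduces to principal components of the control outcomes), the same $T_0$ for all treated units, and a known $r$. The function `gsc_att` and the simulated data are hypothetical; this is not the author's released code, and the full method also estimates $\beta$ as in Bai (2009).

```python
import numpy as np

def gsc_att(Y, treated, T0, r):
    """Minimal GSC sketch without covariates.
    Y: (T x N) outcomes; treated: boolean mask of length N; T0: common
    number of pretreatment periods; r: number of factors."""
    T = Y.shape[0]
    Y_co, Y_tr = Y[:, ~treated], Y[:, treated]
    # Step 1: with no covariates the IFE fit on control data is principal
    # components; F_hat satisfies F_hat' F_hat / T = I_r.
    U, _, _ = np.linalg.svd(Y_co, full_matrices=False)
    F_hat = np.sqrt(T) * U[:, :r]
    # Step 2: OLS of each treated unit's pretreatment outcomes on F_hat^0.
    lam_tr, *_ = np.linalg.lstsq(F_hat[:T0], Y_tr[:T0], rcond=None)  # (r x N_tr)
    # Step 3: impute Y_it(0) for treated units and average the gaps.
    Y0_hat = F_hat @ lam_tr
    return (Y_tr - Y0_hat).mean(axis=1)       # ATT_t for every t; used for t > T0

rng = np.random.default_rng(2)
T, T0, N_co, N_tr, r = 30, 20, 50, 5, 2
F = rng.normal(size=(T, r))
Lam = rng.normal(size=(N_co + N_tr, r))
Lam[N_co:] += 1.0                             # treated loadings shifted: parallel trends fail
delta = np.zeros((T, N_tr))
delta[T0:] = np.arange(1, T - T0 + 1)[:, None] + rng.normal(size=(T - T0, N_tr))
Y = F @ Lam.T + 0.2 * rng.normal(size=(T, N_co + N_tr))
Y[:, N_co:] += delta
treated = np.r_[np.zeros(N_co, bool), np.ones(N_tr, bool)]

att_hat = gsc_att(Y, treated, T0, r)
att_true = delta[T0:].mean(axis=1)
print(np.round(att_hat[T0:] - att_true, 2))   # estimation errors for t > T0
```

Only the column space of $\hat F$ enters $\hat Y_{it}(0)$, so the particular rotation returned by the SVD does not affect the estimate.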
REMARK 2. In the Online Appendix, we show that, under Assumptions 1–4, the bias of the GSC estimator
shrinks to zero as the sample size grows, i.e., $\mathbb{E}_\varepsilon(\widehat{ATT}_t \mid D, X, \Lambda, F) \to ATT_t$ as $N_{co}, T_0 \to \infty$
($N_{tr}$ is taken as given), in which $D = [D_1, D_2, \ldots, D_N]$ is a $(T \times N)$ matrix, $X$ is a three-dimensional
$(T \times N \times k)$ matrix, and $\Lambda = [\lambda_1, \lambda_2, \ldots, \lambda_N]'$ is an $(N \times r)$ matrix. Intuitively, both
large $N_{co}$ and large $T_0$ are necessary for the convergence of $\hat\beta$ and the estimated factor space.
When $T_0$ is small, imprecise estimation of the factor loadings, or the "incidental parameters"
problem, will lead to bias in the estimated treatment effects. This is a crucial difference from
conventional linear fixed effects models.
Step 1. Start with a given number of factors $r$ and estimate an IFE model using the control group data
$\{Y_i, X_i\}_{i \in \mathcal{C}}$, obtaining $\hat\beta$ and $\hat F$;
Step 2. Start a cross-validation loop that goes through all $T_0$ pretreatment periods:
(a) In round $s \in \{1, \ldots, T_0\}$, hold back the data of all treated units at time $s$. Run an OLS
regression using the rest of the pretreatment data, obtaining factor loadings for each
treated unit $i$:
$$\hat\lambda_{i,-s} = (\hat F_{-s}^{0\prime} \hat F_{-s}^0)^{-1} \hat F_{-s}^{0\prime} (Y_{i,-s}^0 - X_{i,-s}^0 \hat\beta), \quad \forall\, i \in \mathcal{T},$$
in which the subscript "$-s$" stands for all pretreatment periods except for $s$.
$$\text{MSPE}(r) = \sum_{s=1}^{T_0} \sum_{i \in \mathcal{T}} e_{is}^2 / T_0.$$
Step 4. Repeat Steps 1–3 with different values of $r$ and obtain the corresponding MSPEs.
Step 5. Choose $r^*$ that minimizes the MSPE.
The basic idea of the above procedure is to hold back a small amount of data (e.g., one
pretreatment period of the treatment group) and use the rest of data to predict the held-back
information. The algorithm then chooses the model that on average makes the most accurate
predictions. A TSCS dataset with a DID data structure allows us to do so because (1) there exists
a set of control units that are never exposed to the treatment and therefore can serve as the basis
for estimating time-varying factors and (2) the pretreatment periods of treated units constitute
a natural validation set for candidate models. This procedure is computationally inexpensive
because for each $r$, the IFE model is estimated only once (Step 1); the other steps involve only
simple calculations. In the Online Appendix, we conduct Monte Carlo exercises and show that
the above procedure performs well in terms of choosing the correct number of factors even with
relatively small datasets.
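A minimal sketch of Algorithm 1 for a pure factor model with no covariates, so that Step 1 is principal components on the control group. The helper `cv_mspe` and the simulated data with two true factors are hypothetical. The steps between 2(a) and Step 3 are not reproduced above; this sketch assumes the natural choice of prediction error $e_{is} = Y_{is} - \hat f_s'\hat\lambda_{i,-s}$.

```python
import numpy as np

def cv_mspe(Y_co, Y_tr0, r):
    """Leave-one-period-out MSPE(r) for a factor model without covariates.
    Y_co: (T x N_co) control outcomes; Y_tr0: (T0 x N_tr) treated pretreatment outcomes."""
    T, T0 = Y_co.shape[0], Y_tr0.shape[0]
    U, _, _ = np.linalg.svd(Y_co, full_matrices=False)
    F0 = (np.sqrt(T) * U[:, :r])[:T0]            # Step 1: F_hat, pretreatment rows only
    sse = 0.0
    for s in range(T0):                          # Step 2: hold back period s
        keep = np.arange(T0) != s
        lam, *_ = np.linalg.lstsq(F0[keep], Y_tr0[keep], rcond=None)
        e_s = Y_tr0[s] - F0[s] @ lam             # prediction errors e_is for treated i
        sse += float(np.sum(e_s ** 2))
    return sse / T0                              # MSPE(r)

rng = np.random.default_rng(3)
T, T0, N_co, N_tr = 30, 20, 60, 20
F = rng.normal(size=(T, 2))                      # two true factors
Lam = rng.normal(size=(N_co + N_tr, 2))
Y = F @ Lam.T + 0.3 * rng.normal(size=(T, N_co + N_tr))
Y_co, Y_tr0 = Y[:, :N_co], Y[:T0, N_co:]         # only pretreatment treated data are used

mspe = {r: cv_mspe(Y_co, Y_tr0, r) for r in range(1, 5)}   # Step 4
r_star = min(mspe, key=mspe.get)                              # Step 5
print(r_star)
```

Omitting a true factor inflates the MSPE sharply, while each superfluous factor adds only a modest out-of-sample penalty, so the minimizer tends to land on the true number of factors.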
REMARK 3. Our framework can also accommodate DGPs that directly incorporate additive fixed
effects, known time trends, and exogenous time-invariant covariates, such as:
in which $l_t$ is a $(q \times 1)$ vector of known time trends that may affect each unit differently; $\gamma_i$
is a $(q \times 1)$ vector of unit-specific unknown parameters; $z_i$ is an $(m \times 1)$ vector of observed time-invariant
covariates; $\theta_t$ is an $(m \times 1)$ vector of unknown parameters; and $\alpha_i$ and $\xi_t$ are additive individual and
time fixed effects, respectively. We describe the estimation procedure of this extended model
in the Online Appendix.
3.2 Inference
We rely on a parametric bootstrap procedure to obtain the uncertainty estimates of the GSC
estimator (deriving the analytical asymptotic distribution of the GSC estimator is left for future
research). When the sample size is large, especially when $N_{tr}$ is large, a simple
nonparametric bootstrap procedure can provide valid uncertainty estimates. When the sample
size is small, especially when $N_{tr}$ is small, we are unable to approximate the DGP of the treatment
group by resampling the data nonparametrically. In this case, we simply lack information on
the joint distribution of $(X_i, \lambda_i, \delta_i)$ for the treatment group. However, we can obtain uncertainty
estimates conditional on observed covariates and unobserved factors and factor loadings using
a parametric bootstrap procedure that resamples the residuals. By resampling entire time series
of residuals, we preserve the serial correlation within units, thus avoiding underestimating
the standard errors due to serial correlation (Beck and Katz 1995). Our goal is to estimate the
conditional variance of the ATT estimator, i.e., $\text{Var}_\varepsilon(\widehat{ATT}_t \mid D, X, \Lambda, F)$. Notice that the only random
variable that is not being conditioned on is $\varepsilon_i$, which is assumed to be independent of treatment
in which $\tilde Y_i(0)$ is a vector of simulated outcomes in the absence of the treatment; $X_i\hat\beta + \hat F\hat\lambda_i$ is
the estimated conditional mean; and $\tilde\varepsilon_i^p$ and $\tilde\varepsilon_i$ are resampled residuals for unit $i$, depending
on whether it belongs to the treatment or control group. Because $\hat\beta$ and $\hat F$ are estimated using
only the control group information, $X_i\hat\beta + \hat F\hat\lambda_i$ fits $X_i\beta + F\lambda_i$ better for a control unit than for
a treated unit (as a result, the variance of $\tilde\varepsilon_i^p$ is usually bigger than that of $\tilde\varepsilon_i$). Hence, $\tilde\varepsilon_i$ and
$\tilde\varepsilon_i^p$ are drawn from different empirical distributions: $\tilde\varepsilon_i$ is the in-sample error of the IFE model
fitted to the control group data, and is therefore drawn from the empirical distribution of the
residuals of the IFE model, while $\tilde\varepsilon_i^p$ can be seen as the prediction error of the IFE model for treated
counterfactuals.¹⁷
Although we cannot observe treated counterfactuals, $Y_{it}(0)$ is observed for all control units.
With the assumptions that treated and control units follow the same factor model (Assumption 1)
and that the error terms are independent and homoscedastic across space (Assumption 5), we can
use a cross-validation method to simulate $\varepsilon_i^p$ based on the control group data (Efron 2012).
Specifically, each time we leave one control unit out (to be taken as a "fake" treated unit) and use
the rest of the control units to predict the outcome of the left-out unit. The difference between the
predicted and observed outcomes is a prediction error of the IFE model. $\varepsilon_i^p$ is drawn from the
empirical distribution of these prediction errors. Under Assumptions 1–5, this procedure provides
valid uncertainty estimates for the proposed method without making particular distributional
assumptions about the error terms. Algorithm 2 describes the entire procedure in detail.
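The sketch below mimics this procedure for the simplified case without covariates: leave-one-out prediction errors from the control group play the role of $\tilde\varepsilon_i^p$, in-sample control residuals play the role of $\tilde\varepsilon_i$, entire residual time series are resampled, and $\delta_i$ is set to 0 as in footnote 17. All function names, the number of draws `B`, and the simulated data are hypothetical; details of the author's Algorithm 2 (covariates, additive fixed effects, the exact interval construction) may differ.

```python
import numpy as np

def fit_factors(Y_co, r):
    """PCA estimate of F (T x r, F'F/T = I_r) and control loadings (r x N_co)."""
    T = Y_co.shape[0]
    U, _, _ = np.linalg.svd(Y_co, full_matrices=False)
    F = np.sqrt(T) * U[:, :r]
    return F, F.T @ Y_co / T                 # OLS loadings, because F'F/T = I_r

def impute(F, Y_pre, T0):
    """OLS loadings from pretreatment rows, then predicted Y(0) for all periods."""
    lam, *_ = np.linalg.lstsq(F[:T0], Y_pre, rcond=None)
    return F @ lam

def att(Y_co, Y_tr, T0, r):
    F, _ = fit_factors(Y_co, r)
    return (Y_tr - impute(F, Y_tr[:T0], T0)).mean(axis=1)

def parametric_bootstrap(Y_co, Y_tr, T0, r, B=200, seed=0):
    """Bootstrap draws of ATT_t, shape (B x T), with delta set to 0."""
    rng = np.random.default_rng(seed)
    T, N_co = Y_co.shape
    N_tr = Y_tr.shape[1]
    # Prediction errors: each control unit in turn plays a "fake" treated unit.
    pred_err = np.empty((T, N_co))
    for j in range(N_co):
        F_j, _ = fit_factors(np.delete(Y_co, j, axis=1), r)
        pred_err[:, j] = Y_co[:, j] - impute(F_j, Y_co[:T0, j], T0)
    # Estimated conditional means and in-sample residuals of the control fit.
    F, Lam_co = fit_factors(Y_co, r)
    mu_co = F @ Lam_co
    resid_co = Y_co - mu_co
    mu_tr = impute(F, Y_tr[:T0], T0)         # estimated Y(0) mean for treated units
    draws = np.empty((B, T))
    for b in range(B):
        # Resample whole residual columns (time series) to keep serial correlation.
        Yb_co = mu_co + resid_co[:, rng.integers(0, N_co, size=N_co)]
        Yb_tr = mu_tr + pred_err[:, rng.integers(0, N_co, size=N_tr)]
        draws[b] = att(Yb_co, Yb_tr, T0, r)
    return draws

# Usage with simulated data (hypothetical values; constant effect of 3).
rng = np.random.default_rng(4)
T, T0, N_co, N_tr, r = 25, 15, 40, 5, 2
F_true = rng.normal(size=(T, r))
Lam_true = rng.normal(size=(N_co + N_tr, r))
Y = F_true @ Lam_true.T + 0.5 * rng.normal(size=(T, N_co + N_tr))
Y[T0:, N_co:] += 3.0
Y_co, Y_tr = Y[:, :N_co], Y[:, N_co:]

att_hat = att(Y_co, Y_tr, T0, r)
atts = parametric_bootstrap(Y_co, Y_tr, T0, r)
se = atts.std(axis=0)                            # bootstrap standard error per period
lo = att_hat + np.percentile(atts, 2.5, axis=0)  # add back the point estimate
hi = att_hat + np.percentile(atts, 97.5, axis=0)
```

Shifting the zero-centred bootstrap draws by $\widehat{ATT}_t$ is one way to "add back" the point estimate; it is an assumption of this sketch and coincides with the basic bootstrap interval when the draws are symmetric.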
16 $\varepsilon_{it}$ may be correlated with $\hat\lambda_i$ when the errors are serially correlated because $\hat\lambda_i$ is estimated using the pretreatment data.
17 The treated outcome for unit $i$ can thus be drawn from $\tilde Y_i(1) = \tilde Y_i(0) + \delta_i$. We do not directly observe $\delta_i$, but since it is taken
as given, its presence will not affect the uncertainty estimates of $\widehat{ATT}_t$. Hence, in the bootstrap procedure, we use $\tilde Y_i(0)$ for
both the treatment and control groups to form bootstrapped samples (set $\delta_i = 0$ for all $i \in \mathcal{T}$). We will add back $\widehat{ATT}_t$
when constructing confidence intervals.
$$\text{Var}(\widehat{ATT}_t \mid D, X, \Lambda, F) = \frac{1}{B} \sum_{k=1}^{B} \left( \widehat{ATT}_t^{(k)} - \frac{1}{B} \sum_{j=1}^{B} \widehat{ATT}_t^{(j)} \right)^2$$
and its confidence interval using the conventional percentile method (Efron and Tibshirani
1993).
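The variance formula is the variance of the $B$ bootstrap draws normalized by $1/B$, and the percentile interval takes the $\alpha/2$ and $1-\alpha/2$ quantiles of the same draws. A minimal NumPy sketch with hypothetical helper names and toy draws; because the formula divides by $B$, it matches `np.var` with its default `ddof=0`.

```python
import numpy as np

def bootstrap_var(draws):
    """(1/B) * sum_k (ATT^(k) - mean_j ATT^(j))^2, column-wise for a (B x T) array."""
    draws = np.asarray(draws, dtype=float)
    return np.mean((draws - draws.mean(axis=0)) ** 2, axis=0)

def percentile_ci(draws, level=0.95):
    """Conventional percentile interval: alpha/2 and 1 - alpha/2 quantiles of the draws."""
    alpha = 1.0 - level
    draws = np.asarray(draws, dtype=float)
    return (np.percentile(draws, 100 * alpha / 2, axis=0),
            np.percentile(draws, 100 * (1 - alpha / 2), axis=0))

draws = np.array([1.0, 2.0, 3.0, 4.0])      # toy bootstrap draws of ATT_t
print(bootstrap_var(draws))                  # 1.25
lo, hi = percentile_ci(draws)
```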
$$Y_{it} = \delta_{it} D_{it} + x_{it,1} \cdot 1 + x_{it,2} \cdot 3 + \lambda_i' f_t + \alpha_i + \xi_t + 5 + \varepsilon_{it} \qquad (3)$$
where $f_t = (f_{1t}, f_{2t})'$ and $\lambda_i = (\lambda_{i1}, \lambda_{i2})'$ are time-varying factors and unit-specific factor loadings.
The covariates are (positively) correlated with both the factors and factor loadings: $x_{it,k} = 1 +
\lambda_i' f_t + \lambda_{i1} + \lambda_{i2} + f_{1t} + f_{2t} + \eta_{it,k}$, $k = 1, 2$. The error term $\varepsilon_{it}$ and the disturbances in covariates $\eta_{it,1}$
and $\eta_{it,2}$ are i.i.d. $N(0, 1)$. Factors $f_{1t}$ and $f_{2t}$, as well as time fixed effects $\xi_t$, are also i.i.d. $N(0, 1)$.
The treatment and control groups consist of $N_{tr}$ and $N_{co}$ units, respectively. The treatment starts to affect
the treated units at time $T_0 + 1$, and 10 posttreatment periods are observed ($q = 10$). The treatment
indicator is defined as in Section 2, i.e., $D_{it} = 1$ when $i \in \mathcal{T}$ and $t > T_0$, and $D_{it} = 0$ otherwise. The
heterogeneous treatment effect is generated by $\delta_{it, t > T_0} = \bar\delta_t + e_{it}$, in which $e_{it}$ is i.i.d. $N(0, 1)$ and $\bar\delta_t$ is
given by $[\bar\delta_{T_0+1}, \bar\delta_{T_0+2}, \ldots, \bar\delta_{T_0+10}] = [1, 2, \ldots, 10]$.
Factor loadings $\lambda_{i1}$ and $\lambda_{i2}$, as well as unit fixed effects $\alpha_i$, are drawn from uniform distributions
$U[-\sqrt{3}, \sqrt{3}]$ for control units and $U[\sqrt{3} - 2w\sqrt{3}, 3\sqrt{3} - 2w\sqrt{3}]$ for treated units ($w \in [0, 1]$). This
means that when $0 \le w < 1$, (1) the random variables have variance 1; (2) the supports of the factor
loadings of treated and control units do not perfectly overlap; and (3) the treatment indicator
and factor loadings are positively correlated.¹⁸
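For readers who want to reproduce the setup, a sketch of one draw from DGP (3) follows. The function name `simulate_dgp`, the seed handling, and the convention that the last $N_{tr}$ units are treated are assumptions of this sketch, not the author's simulation code; for general `q` it uses $\bar\delta_t = 1, \ldots, q$, which reduces to the stated values when $q = 10$.

```python
import numpy as np

def simulate_dgp(T0, N_tr, N_co, w, q=10, seed=0):
    """One sample from DGP (3); the last N_tr units are treated."""
    rng = np.random.default_rng(seed)
    T, N = T0 + q, N_tr + N_co
    s3 = np.sqrt(3.0)
    treated = np.arange(N) >= N_co
    # lambda_i1, lambda_i2, alpha_i ~ U[low_i, low_i + 2*sqrt(3)], variance 1
    low = np.where(treated, s3 - 2 * w * s3, -s3)
    draws = low[:, None] + 2 * s3 * rng.uniform(size=(N, 3))
    lam, alpha = draws[:, :2], draws[:, 2]
    f = rng.normal(size=(T, 2))                     # factors f_1t, f_2t ~ N(0, 1)
    xi = rng.normal(size=T)                         # time fixed effects ~ N(0, 1)
    common = f @ lam.T                              # entry [t, i] = lambda_i' f_t
    # x_{it,k} = 1 + lambda_i'f_t + lambda_i1 + lambda_i2 + f_1t + f_2t + eta_{it,k}
    base = 1 + common + lam.sum(axis=1)[None, :] + f.sum(axis=1)[:, None]
    x1 = base + rng.normal(size=(T, N))
    x2 = base + rng.normal(size=(T, N))
    D = np.zeros((T, N))
    D[T0:, treated] = 1.0
    delta = np.zeros((T, N))
    delta_bar = np.arange(1, q + 1, dtype=float)    # [1, 2, ..., q]
    delta[T0:, treated] = delta_bar[:, None] + rng.normal(size=(q, N_tr))
    eps = rng.normal(size=(T, N))
    Y = (delta * D + x1 * 1 + x2 * 3 + common
         + alpha[None, :] + xi[:, None] + 5 + eps)
    return Y, D, delta, lam

Y, D, delta, lam = simulate_dgp(T0=20, N_tr=5, N_co=50, w=0.8)
print(Y.shape, int(D.sum()))                        # (30, 55) 50
```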
18 The DGP specified here is modified based on Bai (2009) and Gobillon and Magnac (2016).
19 Standard deviation is defined as $SD(\widehat{ATT}_t) = \sqrt{\mathbb{E}[\widehat{ATT}_t^{(k)} - \mathbb{E}(\widehat{ATT}_t^{(k)})]^2}$, while root mean squared error is defined as
$RMSE(\widehat{ATT}_t) = \sqrt{\mathbb{E}(\widehat{ATT}_t^{(k)} - ATT_t^{(k)})^2}$. The superscript $(k)$ denotes the $k$th sample. We see that they are very close
because the bias of the GSC estimator shrinks to zero as the sample size grows.
the GSC estimator has limited bias even when $T_0$ and $N_{co}$ are relatively small, and the bias
goes away as $T_0$ and $N_{co}$ grow. As expected, both the SD and RMSE shrink when $T_0$ and $N_{co}$
become larger. Table 1 also reports the coverage probabilities of 95 percent confidence intervals
for $\widehat{ATT}_t$ at $t = T_0 + 5$ constructed by the parametric bootstrap procedure (Algorithm 2). For each pair of $T_0$
and $N_{co}$, the coverage probability is calculated based on 5,000 simulated samples, each of which
is bootstrapped 1,000 times. These numbers show that the proposed procedure can achieve
the correct coverage rate even when the sample size is relatively small (e.g., $T_0 = 15$, $N_{tr} = 5$,
$N_{co} = 80$).
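The Monte Carlo summaries discussed here can be computed from simulation output as sketched below (hypothetical helper `mc_summaries`, toy numbers). When the true ATT is the same in every sample, $RMSE^2 = SD^2 + \text{bias}^2$, so SD and RMSE are close exactly when the bias is small; with sample-specific $ATT_t^{(k)}$ this relation holds only approximately.

```python
import numpy as np

def mc_summaries(att_hat, att_true, ci_lo, ci_hi):
    """Summaries over K simulated samples (arrays of length K): bias,
    SD and RMSE as in footnote 19, and the empirical CI coverage rate."""
    att_hat = np.asarray(att_hat, dtype=float)
    att_true = np.asarray(att_true, dtype=float)
    bias = np.mean(att_hat - att_true)
    sd = np.sqrt(np.mean((att_hat - att_hat.mean()) ** 2))
    rmse = np.sqrt(np.mean((att_hat - att_true) ** 2))
    covered = (np.asarray(ci_lo) <= att_true) & (att_true <= np.asarray(ci_hi))
    return bias, sd, rmse, covered.mean()

# Toy example: two samples with a common true ATT of 2.
bias, sd, rmse, cov = mc_summaries(att_hat=[3.0, 5.0], att_true=[2.0, 2.0],
                                   ci_lo=[1.0, 2.5], ci_hi=[4.0, 6.0])
```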
In the Online Appendix, we run additional simulations and compare the proposed method
with several existing methods, including the DID estimator, the IFE estimator, and the synthetic
control method. We find that (1) the GSC estimator has less bias than the DID estimator in the
presence of unobserved, decomposable time-varying confounders; (2) it has less bias than the
IFE estimator when the treatment effect is heterogeneous; and (3) it is usually more efficient than
the original synthetic control estimator. It is worth emphasizing that these results are under
the premise of correct model specifications. To address the concern that the GSC method relies
on correct model specifications, we conduct additional tests and show that the cross-validation
scheme described in Algorithm 1 is able to choose the number of factors correctly most of the time
when the sample is large enough.
5 Empirical Example
In this section, we illustrate the GSC method with an empirical example that investigates the effect
of Election Day Registration (EDR) laws on voter turnout in the United States. Voting in the United States usually takes two
steps. Except in North Dakota, where no registration is needed, eligible voters throughout the
country must register prior to casting their ballots. Registration, which often requires a separate
trip from voting, is widely regarded as a substantial cost of voting and a culprit of low turnout
rates before the 1993 National Voter Registration Act (NVRA) was enacted (e.g., Highton 2004).
Against this backdrop, EDR is a reform that allows eligible voters to register on Election Day when
they arrive at polling stations. In the mid-1970s, Maine, Minnesota, and Wisconsin were the first
adopters of this reform in the hopes of increasing voter turnout; while Idaho, New Hampshire, and
Wyoming established EDR in the 1990s as a strategy to opt out of the NVRA (Hanmer 2009). Before
20 In the Online Appendix, we list the years during which EDR laws were enacted and first took effect in presidential elections.
21 See Wolfinger and Rosenstone (1980), Mitchell and Wlezien (1995), Rhine (1992), Highton (1997), Timpone (1998), Timpone
(2002), Huang and Shields (2000), Alvarez, Ansolabehere, and Wilson (2002), Brians and Grofman (2001), Hanmer (2009),
Burden et al. (2009), Cain, Donovan, and Tolbert (2011), Teixeira (2011) for examples. The results are especially consistent
for the three early adopters, Maine, Minnesota, and Wisconsin.
22 See, for example, Fenster (1994), King and Wambeam (1995), Knack and White (2000), Knack (2001), Neiheisel and Burden
(2012), Springer (2014).
23 The data from 1920 to 2000 are from Springer (2014). The data from 2004 to 2012 are from The United States Election
Project, http://www.electproject.org/. Indicators of other registration laws, including universal mail-in registration and
motor voter registration, also come from Springer (2014), with a few supplements. Replication files can be found in Xu
(2016).
24 We do not use the voting-eligible population (VEP) as the denominator because VEP data are not available for early years.
25 As is shown in the figure and has been pointed out by many, turnout rates are in general higher in states that have EDR laws
than states that have not, but this does not necessarily imply a causal relationship between EDR laws and voter turnout.
Outcome variable: Voter turnout (%)
                                           FE                  GSC
                                     (1)       (2)       (3)       (4)
Election Day Registration           0.87      0.78      5.13      4.90
                                   (3.01)    (3.31)    (2.27)    (2.27)
Universal mail-in registration               −0.94                0.15
                                             (1.80)              (0.80)
Motor voter registration                     −0.21               −1.05
                                             (1.45)              (0.79)
State fixed effects                    x         x         x         x
Year fixed effects                     x         x         x         x
Unobserved factors                   N/A       N/A         2         2
Observations                       1,128     1,128     1,128     1,128
Treated states                         9         9         9         9
Control states                        38        38        38        38
Note: Standard errors in columns (1) and (2) are based on 2,000 nonparametric bootstrap replications (blocked at the
state level). Standard errors in columns (3) and (4) are based on 2,000 parametric bootstrap replications (blocked at
the state level).
Next, we apply the GSC method to the same dataset. Table 2 columns (3) and (4) summarize the
result.26 Again, both specifications impose additive state and year fixed effects. In column (3), no
covariates are included, while in column (4), mail-in and motor voter registration are controlled
for (assuming that they have constant effects on turnout). With both specifications, the cross-
validation scheme finds two unobserved factors to be important, and after conditioning on both
the factors and additive fixed effects, the estimated ATT based on the GSC method is around
5 percentage points with a standard error of 2.3 percentage points.27 This means that EDR laws are associated with
a statistically significant increase in voter turnout, consistent with previous OLS results based on
individual-level data. The lower panel of Figure 2 shows the dynamics of the estimated ATT. Again,
in the left figure, averages are taken after the actual and predicted turnout rates are realigned to
the timing of the reform. With the GSC method, the average actual turnout and average predicted
turnout match well in pretreatment periods and diverge after EDR laws took effect. The right figure
shows that the gaps between the two lines are virtually flat in pretreatment periods and the effect
takes off right after the adoption of EDR.28
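The realignment behind the left panel, and the per-period gaps in the right panel, can be sketched as follows. This is a minimal illustration, assuming we already have actual and GSC-imputed turnout for each treated state as arrays of shape (treated states × years) plus the index of each state's first post-adoption period; the function names are hypothetical, and pooling all post-adoption state-years into one overall average is one simple aggregation choice, not necessarily the paper's.

```python
import numpy as np

def att_by_event_time(Y_actual, Y_pred, first_treated):
    """Average actual-minus-predicted gap by period relative to adoption.

    Y_actual, Y_pred: (N_tr, T) arrays for the treated units.
    first_treated: index of each unit's first treated period.
    Returns {relative period: average gap}; 0 is the first treated period
    and negative keys are pretreatment periods.
    """
    gaps = Y_actual - Y_pred
    n_units, n_periods = gaps.shape
    sums, counts = {}, {}
    for i in range(n_units):
        for t in range(n_periods):
            k = t - first_treated[i]          # realign to the reform date
            sums[k] = sums.get(k, 0.0) + gaps[i, t]
            counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

def att(Y_actual, Y_pred, first_treated):
    """Overall ATT: mean gap over all treated unit-periods (relative period >= 0)."""
    gaps = Y_actual - Y_pred
    post = np.arange(gaps.shape[1])[None, :] >= np.asarray(first_treated)[:, None]
    return gaps[post].mean()
```

Because units adopt at different dates, the number of units contributing to each relative period differs; the sketch simply averages whatever is available at each relative period.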
Figure 3 presents the estimated factors and factor loadings produced by the GSC method.29
Figure 3a depicts the two estimated factors. The x-axis is year and the y-axis is the magnitude
of the factors (rescaled by the square root of their corresponding eigenvalues to show their
relative importance). Figure 3b shows the estimated factor loadings of the treated (black, bold)
and control (gray) units, with the x- and y-axes indicating the magnitude of the loadings on the first
and second factors, respectively. Bearing in mind the caveat that estimated factors may not be
directly interpretable because they are, at best, linear transformations of the true factors, we find
that the estimated factors shown in this figure are meaningful. The first factor captures the sharp
increase in turnout in the southern states because of the 1965 Voting Rights Act, which removed
Jim Crow laws, such as poll taxes and literacy tests, that had suppressed turnout. As shown in the
26 Note that although the estimated ATT of EDR on voter turnout is presented in the same row as the coefficient of EDR from
the FE model, the GSC method does not assume the treatment effect to be constant. In fact, it allows the treatment effect
to differ both across states and over time. Predicted counterfactuals and individual treatment effects for each of the
nine treated states are shown in the Online Appendix.
27 The results are similar if additive state and year fixed effects are not directly imposed, although, not surprisingly, the
algorithm then selects an additional factor.
28 Although this is not guaranteed, it is not surprising, since the GSC method uses information from all past outcomes and
minimizes the gaps between actual and predicted turnout rates in pretreatment periods.
29 The results are essentially the same with or without controlling for the other two registration reforms.
Table 3.
Outcome variable: voter turnout (%)
                                         1st wave        2nd wave        3rd wave
                                            (1)             (2)             (3)
Election Day Registration                  7.27            2.17           −1.14
                                          (3.33)          (2.82)          (3.00)
Mail-in and motor voter registration        x               x               x
State fixed effects                         x               x               x
Year fixed effects                          x               x               x
Unobserved factors                          2               2               2
Observations                              1,128           1,128           1,128
Treated states                              3               3               3
                                       (ME, MN, WI)    (ID, NH, WY)    (MT, IA, CT)
Control states                             38              38              38
Note: Standard errors are based on 1,000 parametric bootstrap replications, blocked at the state level.
right panel, the eleven states with the largest loadings on the first factor are exactly the
eleven southern states that formerly belonged to the Confederacy.30 The labels of these states
are underlined in Figure 3b. The second factor, which is constrained to be orthogonal to the first, is
less interpretable. However, its nonnegligible magnitude reflects a strong downward trend in
voter turnout in many states in recent years. Another reassuring finding shown by Figure 3b is that
the estimated factor loadings of the nine treated units mostly lie within the convex hull of those of
the control units, which indicates that the treated counterfactuals are produced mostly by
interpolation, which is more reliable than extrapolation.
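The overlap check in Figure 3b can be made explicit. A minimal sketch, assuming two estimated factors (so each unit's loadings are a point in the plane) and at least three non-collinear control units; it uses Andrew's monotone-chain convex hull and a point-in-convex-polygon test, and the function names are hypothetical rather than part of gsynth.

```python
import numpy as np

def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull_2d(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, np.asarray(points, dtype=float)))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_control_hull(control_loadings, treated_loadings, tol=1e-12):
    """True where a treated unit's loadings lie inside (or on) the control hull."""
    hull = convex_hull_2d(control_loadings)
    result = []
    for p in np.asarray(treated_loadings, dtype=float):
        # inside a counter-clockwise convex polygon iff p is left of every edge
        inside = all(
            _cross(hull[j], hull[(j + 1) % len(hull)], p) >= -tol
            for j in range(len(hull))
        )
        result.append(inside)
    return np.array(result)
```

Treated units flagged `False` are those whose counterfactuals rely on extrapolation and deserve a closer look.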
Finally, we investigate heterogeneous treatment effects of EDR laws. Previous studies have
suggested that the motivations behind enacting these laws differed substantially between early
adopters and later ones. For example, Maine, Minnesota, and Wisconsin, which established
EDR in the mid-1970s, did so because officials in these states sincerely wanted turnout rates
to be higher, while the “reluctant adopters,” including Idaho, New Hampshire, and Wyoming,
introduced EDR as a means to avoid the NVRA because officials viewed the NVRA as “a more
costly and potentially chaotic system” (Hanmer 2009). Because of these different motivations, among
other reasons, we may expect the treatment effect of EDR laws to differ between states that
adopted them at different times.
The estimation of heterogeneous treatment effects is built into the GSC method, since it
gives individual treatment effects for all treated units in a single run. Table 3 summarizes the ATTs
of EDR on voter turnout for the three waves of EDR adopters. Again, additive state and year
fixed effects, as well as indicators of the two other registration systems, are controlled for. Table 3
shows that EDR laws had a large, positive effect in the early adopters (the estimate is about
7 percentage points with a standard error of about 3 percentage points), while they had no
statistically significant effect in the other six states.31 These differential outcomes may have two explanations.
First, the NVRA of 1993 substantially reduced the cost of registration: since almost everyone who
30 Although we can control for indicators of Jim Crow laws in the model, such indicators may not be able to capture the
heterogeneous impacts of these laws on voter turnout in each state.
31 In the Online Appendix, we show that the treatment effects are positive (and relatively large) for all three early adopting
states: Maine, Minnesota, and Wisconsin. Using a fuzzy regression discontinuity design, Keele and Minozzi (2013) show
that EDR has almost no effect on turnout in Wisconsin. The discrepancy with this paper may be mainly due to
differences in the estimands. Wisconsin’s two largest cities, Milwaukee and Madison, account for a large share of the state’s
electorate but have negligible influence on Keele and Minozzi’s local estimates. One advantage of Keele and Minozzi’s (2013)
approach over ours is its use of fine-grained municipal-level data.
6 Conclusion
In this paper, we propose the GSC method for causal inference with TSCS data. It attempts to
address the challenge that the “parallel trends” assumption often fails when researchers apply
fixed effects models to estimate the causal effect of a treatment. The GSC method estimates
the individual treatment effect on each treated unit semiparametrically. Specifically, it imputes
treated counterfactuals based on a linear interactive fixed effects model that incorporates time-varying
coefficients (factors) interacted with unit-specific intercepts (factor loadings). A built-in
cross-validation scheme automatically selects the model (the number of factors), reducing the risk of overfitting.
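The imputation and model-selection steps described above can be illustrated with a stripped-down sketch. It assumes a pure factor model with no additive fixed effects or covariates, a single common adoption period `T0` for all treated units, and factors estimated by principal components of the control outcomes; the full GSC method also handles additive fixed effects, covariates, and staggered adoption. The cross-validation loop is written in the spirit of the paper's scheme (hold out one pretreatment period, refit treated loadings on the rest, score the held-out prediction). All function names are hypothetical; this is not the gsynth implementation.

```python
import numpy as np

def estimate_factors(Y_co, r):
    """Principal-components estimate of r factors from control outcomes.

    Y_co: (T, N_co) outcomes of never-treated control units.
    Returns F with shape (T, r), normalized so that F.T @ F / T = I.
    """
    T = Y_co.shape[0]
    U, _, _ = np.linalg.svd(Y_co, full_matrices=False)  # left singular vectors
    return np.sqrt(T) * U[:, :r]

def impute_counterfactuals(Y_tr, Y_co, T0, r):
    """Impute treated units' untreated outcomes as F @ lambda_i.

    Y_tr: (T, N_tr) treated outcomes; rows 0..T0-1 are pretreatment periods.
    Each loading vector lambda_i is fit by least squares on pretreatment rows only.
    """
    F = estimate_factors(Y_co, r)
    lam, *_ = np.linalg.lstsq(F[:T0], Y_tr[:T0], rcond=None)  # shape (r, N_tr)
    return F @ lam

def cv_select_r(Y_tr, Y_co, T0, r_max):
    """Choose r in 1..r_max by leave-one-pretreatment-period-out prediction error."""
    mspe = {}
    for r in range(1, r_max + 1):
        F = estimate_factors(Y_co, r)
        errors = []
        for s in range(T0):
            keep = [t for t in range(T0) if t != s]          # hold out period s
            lam, *_ = np.linalg.lstsq(F[keep], Y_tr[keep], rcond=None)
            errors.append(Y_tr[s] - F[s] @ lam)              # predict period s
        mspe[r] = float(np.mean(np.square(errors)))
    best_r = min(mspe, key=mspe.get)
    return best_r, mspe
```

With `best_r` in hand, `Y_tr - impute_counterfactuals(Y_tr, Y_co, T0, best_r)` gives estimated individual effects in the post-treatment rows.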
This method is in the spirit of the original synthetic control method in that it uses data from
pretreatment periods as benchmarks to customize a reweighting scheme of control units in order
to make the best possible predictions for treated counterfactuals. It generalizes the synthetic
control method in two respects. First, it allows multiple treated units and differential treatment
timing. Second, it offers uncertainty estimates, such as standard errors and confidence intervals,
that are easy to interpret.
Monte Carlo exercises suggest that the proposed method performs well even with relatively
small $T_0$ and $N_{co}$, and show that it has advantages over several existing methods: (1) it has less bias
than the two-way fixed effects or DID estimators in the presence of decomposable time-varying
confounders; (2) it corrects the bias of the IFE estimator when the treatment effect is heterogeneous
across units; and (3) it is more efficient than the synthetic control method. To illustrate the
applicability of this method in political science, we estimate the effect of EDR laws on voter turnout
in the United States. We show that EDR laws increased turnout in early adopting states but not in
states that introduced them more recently.
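The first Monte Carlo claim, that DID is biased when a decomposable time-varying confounder is present, can be illustrated with a toy simulation. This is our own minimal design, not the paper's: a single trending factor whose loadings are higher for treated units, one common adoption period, and a simplified principal-components imputation standing in for the GSC estimator. All names and parameter values are hypothetical.

```python
import numpy as np

def did_estimate(Y, treated, T0):
    """Two-group, two-period DID on period means (rows = periods, columns = units)."""
    Yt, Yc = Y[:, treated], Y[:, ~treated]
    return (Yt[T0:].mean() - Yt[:T0].mean()) - (Yc[T0:].mean() - Yc[:T0].mean())

def factor_imputation_att(Y, treated, T0, r):
    """ATT from imputing treated Y(0) with r principal-component factors of the
    control outcomes and pretreatment least-squares loadings."""
    Yt, Yc = Y[:, treated], Y[:, ~treated]
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    F = U[:, :r]                      # scaling of F cancels in F @ lam
    lam, *_ = np.linalg.lstsq(F[:T0], Yt[:T0], rcond=None)
    return (Yt[T0:] - F[T0:] @ lam).mean()

def simulate(seed, T=30, T0=20, n_co=40, n_tr=5, effect=3.0, noise=0.1):
    """Y_it = alpha_i + lambda_i * (t / T) + effect * D_it + noise, where treated
    units load more heavily on the trend (a time-varying confounder)."""
    rng = np.random.default_rng(seed)
    n = n_co + n_tr
    treated = np.arange(n) >= n_co
    alpha = rng.normal(size=n)
    lam = rng.normal(size=n) + 2.0 * treated      # treated units trend faster
    trend = np.arange(T) / T
    Y = alpha[None, :] + np.outer(trend, lam) + noise * rng.normal(size=(T, n))
    Y[T0:, treated] += effect
    return Y, treated
```

Averaged over seeds, the DID error here is roughly the gap in mean trend loadings times the change in the trend's mean between periods, while the factor imputation, which learns the post-period trend from the controls, is close to unbiased.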
Two caveats are worth emphasizing. First, insufficient data (either a small $T_0$ or a small
$N_{co}$) cause bias in the estimated treatment effect. In general, users should be cautious when
$T_0 < 10$ or $N_{co} < 40$. Second, excessive extrapolation based on imprecisely estimated factors
and factor loadings can lead to erroneous results. To avoid this problem, we recommend the
following diagnostics when using this method: (1) plot the raw data of treated and control outcomes
as well as the imputed counterfactuals, and check whether the imputed values lie within reasonable
intervals; (2) plot the estimated factor loadings of both treated and control units and check their
overlap (as in Fig. 3). We provide the R package gsynth to implement the estimation
procedure as well as these diagnostic tests. When excessive extrapolations appear to happen,
32 Glynn and Quinn (2011) argue that traditional cross-sectional methods in general overestimate the effect of EDR laws on
voter turnout and suggest that EDR laws are likely to have minimal effect on turnout in non-EDR states (the ATC). In this
paper, we focus instead on the effect of EDR in EDR states (the ATT).
Supplementary material
For supplementary material accompanying this paper, please visit
https://doi.org/10.1017/pan.2016.2.
References
Abadie, Alberto. 2005. Semiparametric difference-in-differences estimators. The Review of Economic Studies
72(1):1–19.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. Synthetic control methods for comparative
case studies: Estimating the effect of California’s tobacco control program. Journal of the American
Statistical Association 105(490):493–505.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2015. Comparative politics and the synthetic
control method. American Journal of Political Science 59(2):495–510.
Acemoglu, Daron, Simon Johnson, Amir Kermani, James Kwak, and Todd Mitton. 2016. The value of
connections in turbulent times: Evidence from the United States. Journal of Financial Economics
121(2):368–391.
Alvarez, R. Michael, Stephen Ansolabehere, and Catherine H. Wilson. 2002. Election day voter registration in
the United States: How one-step voting can change the composition of the American electorate. Working
Paper, Caltech/MIT Voting Technology Project.
Angrist, Joshua D., Òscar Jordà, and Guido Kuersteiner. 2013. Semiparametric estimates of monetary policy
effects: String theory revisited. NBER Working Paper No. 19355.
Bai, Jushan. 2003. Inferential theory for factor models of large dimensions. Econometrica 71(1):135–171.
Bai, Jushan. 2009. Panel data models with interactive fixed effects. Econometrica 77(4):1229–1279.
Beck, Nathaniel, and Jonathan N. Katz. 1995. What to do (and not to do) with time-series cross-section data.
American Political Science Review 89(3):634–647.
Blackwell, Matthew, and Adam Glynn. 2015. How to make causal inferences with time-series cross-sectional
data. Harvard University. Mimeo.
Brians, Craig Leonard, and Bernard Grofman. 2001. Election day registration’s effect on US voter turnout.
Social Science Quarterly 82(1):170–183.
Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2014. Inferring causal
impact using Bayesian structural time-series models. Annals of Applied Statistics 9(1):247–274.
Burden, Barry C., David T. Canon, Kenneth R. Mayer, and Donald P. Moynihan. 2009. The effects and costs of
early voting, election day registration, and same day registration in the 2008 elections. University of
Wisconsin-Madison. Mimeo.
Burnham, Walter Dean. 1980. The appearance and disappearance of the American voter. In Electoral
participation: A comparative analysis, ed. Richard Rose. Beverly Hills, CA: Sage Publications.
Cain, Bruce E., Todd Donovan, and Caroline J. Tolbert. 2011. Democracy in the states: Experiments in election
reform. Washington, DC: Brookings Institution Press.
Campbell, John Y., Andrew W. Lo, and A. Craig MacKinlay. 1997. The econometrics of financial markets.
Princeton, NJ: Princeton University Press.
Dube, Arindrajit, and Ben Zipperer. 2015. Pooling multiple case studies using synthetic controls: An
application to minimum wage policies. IZA Discussion Paper No. 8944.
Efron, Brad. 2004. The estimation of prediction error: Covariance penalties and cross-validation. Journal of the
American Statistical Association 99(467):619–632.
Efron, Brad, and Rob Tibshirani. 1993. An introduction to the bootstrap. New York: Chapman & Hall.