Spillover


JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION

2021, VOL. 116, NO. 534, 632–644: Applications and Case Studies
https://doi.org/10.1080/01621459.2020.1775612

Causal Inference With Interference and Noncompliance in Two-Stage Randomized Experiments
Kosuke Imai (a), Zhichao Jiang (b), and Anup Malani (c, d)

(a) Department of Government and Department of Statistics, Institute for Quantitative Social Science, Harvard University, Cambridge, MA; (b) Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA; (c) Pritzker School of Medicine, University of Chicago Law School, Chicago, IL; (d) National Bureau of Economic Research, Cambridge, MA

ABSTRACT

In many social science experiments, subjects often interact with each other and as a result one unit’s treatment influences the outcome of another unit. Over the last decade, significant progress has been made toward causal inference in the presence of such interference between units. Researchers have shown that the two-stage randomization of treatment assignment enables the identification of average direct and spillover effects. However, much of the literature has assumed perfect compliance with treatment assignment. In this article, we establish the nonparametric identification of the complier average direct and spillover effects in two-stage randomized experiments with interference and noncompliance. In particular, we consider the spillover effect of the treatment assignment on the treatment receipt as well as the spillover effect of the treatment receipt on the outcome. We propose consistent estimators and derive their randomization-based variances under the stratified interference assumption. We also prove the exact relationships between the proposed randomization-based estimators and the popular two-stage least squares estimators. The proposed methodology is motivated by and applied to our own randomized evaluation of India’s National Health Insurance Program (RSBY), where we find some evidence of spillover effects. The proposed methods are implemented via an open-source software package. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.

ARTICLE HISTORY
Received November 2018
Accepted May 2020

KEYWORDS
Complier average causal effects; Encouragement design; Program evaluation; Randomization inference; Spillover effects; Two-stage least squares

1. Introduction

Early methodological research on causal inference has assumed no interference between units (e.g., Neyman 1923; Fisher 1935; Holland 1986; Rubin 1990). That is, spillover effects are assumed to be absent. In many social science experiments, however, subjects often interact with each other and as a result one unit’s treatment influences the outcome of another unit. Over the last decade, significant progress has been made toward causal inference in the presence of such interference between units (e.g., Sobel 2006; Rosenbaum 2007; Hudgens and Halloran 2008; Tchetgen Tchetgen and VanderWeele 2010; Aronow 2012; Vanderweele et al. 2013; Liu and Hudgens 2014; Hong 2015; Forastiere, Airoldi, and Mealli 2016; Aronow and Samii 2017; Athey, Eckles, and Imbens 2018; Baird et al. 2018; Basse and Feller 2018).

Much of this literature, however, has not addressed another common feature of social science experiments where some control units decide to take the treatment while others in the treatment group refuse to receive it. Such noncompliance often occurs in these experiments because for ethical and logistical reasons, researchers typically cannot force experimental subjects to adhere to experimental protocol. The existing methods either assume perfect compliance with treatment assignment or focus on intention-to-treat (ITT) analyses by ignoring the information about actual receipt of treatment.

Unfortunately, the ITT analysis is unable to tell, for example, whether a small causal effect arises due to ineffective treatment or low compliance. While researchers have developed methods to deal with noncompliance (e.g., Angrist, Imbens, and Rubin 1996), they are based on the assumption of no interference between units. This assumption may be unrealistic since there are multiple ways in which spillover effects could arise. For example, one unit’s treatment assignment may influence another unit’s decision to receive the treatment. It is also possible that one’s treatment receipt affects the outcomes of other units.

In this article, we show how to analyze two-stage randomized experiments with both interference and noncompliance (Section 3). In an influential paper, Hudgens and Halloran (2008) proposed two-stage randomized experiments as a general approach to causal inference with interference. We extend their framework so that it is applicable even in the presence of two-sided noncompliance. In particular, we define the complier average direct and spillover effects, propose consistent estimators, and derive their randomization-based variances under the stratified interference assumption. Like Aronow (2012), we follow Hudgens and Halloran (2008) by referring to the effect of one’s own treatment as a direct effect and the effect of another unit’s treatment as a spillover effect. In a closely related working paper, Kang and Imbens (2016) also analyzed two-stage randomized experiments with interference and noncompliance.

CONTACT Zhichao Jiang [email protected] Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA 01003.
Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA.
These materials were reviewed for reproducibility.
© 2020 American Statistical Association

We consider a more general pattern of interference by allowing for the spillover effect of the treatment assignment on the treatment receipt as well as the spillover effect of the treatment receipt on the outcome.

Finally, we prove the exact relationships between the proposed randomization-based estimators and the popular two-stage least squares estimators as well as those between their corresponding variance estimators. Our results build upon and extend the work of Basse and Feller (2018) to the case with noncompliance. We also conduct simulation studies to investigate the finite sample performance of the confidence intervals based on the proposed variance estimators (see Appendix D in the supplementary materials). The proposed methods are implemented via an open-source software package, experiment (Imai and Jiang 2018), which is available at https://cran.r-project.org/package=experiment.

The proposed methodology is motivated by our own randomized evaluation of India’s Health Insurance Scheme (known by the acronym RSBY), a study that employed the two-stage randomized design. In Section 2, we briefly describe the background and experimental design of this study. In Section 4, we apply the proposed methodology to this study. We present some evidence concerning the existence of positive spillover effects of treatment assignment on the enrollment in the RSBY. In addition, we estimate the complier average direct effect (CADE) to be positive under the “low” treatment assignment mechanism, where fewer households in a village are encouraged to enroll in the insurance program. Finally, Section 5 concludes.

2. A Motivating Empirical Application

In this section, we describe the randomized evaluation of the Indian health insurance program, which serves as our motivating empirical application. We provide a brief background of the evaluation and introduce its experimental design.[1]

[1] For a more detailed description of the design, see the preanalysis plan posted on the American Economic Association’s Registry at https://www.socialscienceregistry.org/trials/1793

2.1. Randomized Evaluation of the Indian Health Insurance Program

Each year, 150 million people worldwide face financial catastrophe due to spending on health. According to a 2010 study, more than one third of them live in India (Shahrawat and Rao 2011). Almost 63 million Indians fall below the poverty line (BPL) due to health spending (Berman, Ahuja, and Bhandari 2010). In 2008, the Indian government introduced its first national, public health insurance scheme, Rastriya Swasthya Bima Yojana (RSBY), to address the problem. Its aim was to provide coverage for hospitalization to its BPL population, comprising roughly 250 million persons. The program ran from 2013 to 2019.

RSBY provided access to an insurance plan that covered inpatient hospital care for up to five members of each household. The plan covered all pre-existing diseases and there was no age limit for beneficiaries. The rates of most surgical procedures were fixed by the government. Beneficiaries could obtain treatment at any hospital empaneled in the RSBY network. The insurance scheme was cashless, with the plan paying providers directly rather than reimbursing beneficiaries for expenses. The plan also covered INR 100 (or approximately USD 5.77 using the OECD’s purchasing-power parity adjusted exchange rate of INR 17.34/USD for 2013) of transportation costs per hospitalization. The coverage lasted one year starting the month after the first enrollment in a particular district, but was often extended without cost to beneficiaries. The insurance plan was provided by private insurance companies, but the premium was paid by the government. In Karnataka, the state in which the randomized evaluation was conducted, premiums were roughly INR 200 (USD 11.53) per year during the study. Households only had to pay a user fee of INR 30 (USD 1.73) per year to obtain an insurance card. There were no deductibles or co-payments and there was an annual cap of INR 30,000 (USD 1,730) per household.

We conducted a randomized controlled trial to determine whether RSBY increases access to hospitalization, and thus health, and reduces impoverishment due to high medical expenses. The findings are policy-relevant because the Indian government has announced a new scheme called the National Health Protection Scheme (known by the acronym PMJAY) that seeks to build on RSBY to provide coverage for nearly 500 million Indians, but has not yet finalized its design or how much to fund it.

In this evaluation, spillover effects are of concern because formal insurance may crowd out informal insurance, which is a substitute method of smoothing health care shocks (e.g., Jowett 2003; Lin, Liu, and Meng 2014). That is, the enrollment in RSBY by one household may depend on the treatment assignment of other households. In addition, we must address noncompliance because some households in the treatment group decided not to enroll in RSBY while others in the control group managed to join the insurance program.

2.2. Experimental Design

Our evaluation study is based on a total of 11,089 above poverty line (APL) households in two districts of Karnataka state who had no pre-existing health insurance coverage and lived within 25 km of an RSBY empaneled hospital. We selected APL households because they are not otherwise eligible for RSBY, but are candidates for any expansion of RSBY. The two districts were Gulbarga and Mysore, which are economically and culturally representative of central and southern India, respectively. We required proximity to a hospital as hospital insurance has little value if there is no local hospital at which to use the insurance.

Table 1. Two-stage randomization design.

Village-level arms                 Household-level arms
Mechanism   Number of villages     Treatment   Control   Number of households   Enrollment rate
High        219                    80%         20%       5714                   67.0%
Low         216                    40%         60%       5373                   46.2%

As shown in Table 1, we employed a two-stage randomized design to study both direct and spillover effects of RSBY. In the first stage, 219 randomly selected villages were assigned to

the “High” treatment assignment mechanism, whereas the rest of the villages were assigned to the “Low” treatment assignment mechanism. In the second stage, under the “High” assignment mechanism, 80% of the households within a cluster were completely randomly assigned to the treatment condition, whereas the rest of the households were assigned to the control group. In contrast, under the “Low” assignment mechanism, 40% of the households within a cluster were completely randomly assigned to the treatment condition. The households in the treatment group were given RSBY essentially for free, whereas some households in the control group were able to buy RSBY at the government price of roughly INR 200.[2]

[2] For the sake of simplicity, we analyze this dichotomized assignment. In the original experiment, households could be assigned to any of four groups; Group A was given RSBY for free, Group B was given RSBY for free and a cash transfer equal to the premium on insurance, Group C was sold RSBY for the same premium as the government paid for RSBY coverage, and Group D had no intervention. Here, the High assignment group consists of villages where 80% of households were assigned to Groups A and B, whereas the Low assignment group consists of villages with 60% of households assigned to Groups C and D.

Households were informed of the assigned treatment conditions and were given the opportunity to enroll in RSBY from April to May 2015. Approximately 18 months later, we carried out a post-treatment survey and measured a variety of outcomes. Policy makers are interested in the health and financial effects of RSBY. To evaluate the efficacy of RSBY, we must estimate the effects of actual treatment receipt as well as the ITT effects because some households in the treatment group may not enroll in RSBY while others in the control group may do so.

3. The Proposed Methodology

In this section, we first review the ITT analysis of two-stage randomized experiments proposed by Hudgens and Halloran (2008) and others. We then introduce a new causal quantity of interest, the CADE, present a nonparametric identification result, and propose a consistent estimator. We further consider the identification and inference of the CADE under the assumption of stratified interference, and derive the randomization-based variance of the proposed estimator. We also establish the direct connections between these randomization-based estimators and the two-stage least squares estimators. Finally, we present analogous results for another new causal quantity, the complier average spillover effect (CASE), in Appendix A in the supplementary materials.

3.1. Two-Stage Randomized Experiments

We consider a two-stage randomized experiment (Hudgens and Halloran 2008) with a total of $N$ units and $J$ clusters where each unit belongs to one of the clusters. We use $n_j$ to denote the number of units in cluster $j$ with $N = \sum_{j=1}^{J} n_j$. In a two-stage randomized experiment, we first randomly assign each cluster to one of the treatment assignment mechanisms, which in turn assigns different proportions of units within each cluster to the treatment condition. For the sake of simplicity, we consider two assignment mechanisms indicated by $A_j \in \{0, 1\}$ where $A_j = 1$ ($A_j = 0$) indicates that a high (low) proportion of units are assigned to the treatment within cluster $j$. In our application, $A_j = 1$ corresponds to the treatment assignment probability of 80%, whereas $A_j = 0$ represents 40%. We assume complete randomization, in which a total of $J_a$ clusters are assigned to the assignment mechanism $a$ for $a = 0, 1$ with $J_0 + J_1 = J$. Finally, $\mathbf{A} = (A_1, A_2, \ldots, A_J)$ denotes the vector of treatment assignment mechanisms for all clusters.

The second stage of randomization concerns the treatment assignment for each unit within cluster $j$ based on the assignment mechanism $A_j$. Let $Z_{ij}$ be the binary treatment assignment variable for unit $i$ in cluster $j$ where $Z_{ij} = 1$ ($Z_{ij} = 0$) implies that the unit is assigned to the treatment (control) condition. Let $\mathbf{Z}_j = (Z_{1j}, \ldots, Z_{n_j j})$ denote the vector of assigned treatments for the $n_j$ units in cluster $j$ and $\Pr(\mathbf{Z}_j = \mathbf{z}_j \mid A_j = a)$ represent the distribution of the treatment assignment vector when cluster $j$ is assigned to the assignment mechanism $A_j = a$. We assume complete randomization such that a total of $n_{jz}$ units in cluster $j$ are assigned to the treatment condition $z$ for $z = 0, 1$, where $n_{j1} + n_{j0} = n_j$.

Assumption 1 (Two-stage randomization).
1. Complete randomization of treatment assignment mechanism at the cluster level:
$$\Pr(\mathbf{A} = \mathbf{a}) = \frac{1}{\binom{J}{J_1}}$$
for all $\mathbf{a}$ such that $\mathbf{1}_J^\top \mathbf{a} = J_1$ where $\mathbf{1}_J$ is the $J$ dimensional vector of ones.
2. Complete randomization of treatment assignment within each cluster:
$$\Pr(\mathbf{Z}_j = \mathbf{z} \mid A_j = a) = \frac{1}{\binom{n_j}{n_{j1}}}$$
for all $\mathbf{z}$ such that $\mathbf{1}_{n_j}^\top \mathbf{z} = n_{j1}$.

Following the literature, we adopt the finite population framework, in which potential outcomes are treated as constants and randomness comes from treatment assignment alone. We consider two-stage randomized experiments with noncompliance, in which the actual receipt of treatment may differ from the treatment assignment. Let $D_{ij}$ be the treatment receipt for unit $i$ in cluster $j$ and $\mathbf{D}_j = (D_{1j}, \ldots, D_{n_j j})$ be the vector of treatment receipts for the $n_j$ units in the cluster. The outcome variable $Y_{ij}$ is observed for each unit and $\mathbf{Y}_j = (Y_{1j}, \ldots, Y_{n_j j})$ denotes the vector of observed outcomes for the $n_j$ units in cluster $j$.

We use the potential outcomes framework of causal inference (e.g., Neyman 1923; Holland 1986; Rubin 1990). For unit $i$ in cluster $j$, let $D_{ij}(\mathbf{z})$ represent the potential value of treatment receipt, when the treatment assignment vector for all $N$ units in the experiment equals $\mathbf{z}$. In addition, we use $Y_{ij}(\mathbf{z}; \mathbf{d})$ to denote the potential outcome, when the treatment assignment vector equals $\mathbf{z}$ and treatment receipt vector equals $\mathbf{d}$. Lastly, let $Y_{ij}(\mathbf{z})$ represent the potential value of outcome when the treatment assignment vector equals $\mathbf{z}$, that is, $Y_{ij}(\mathbf{z}) = Y_{ij}(\mathbf{z}; D_{ij}(\mathbf{z}))$. The observed values of treatment receipt and outcome are given by $D_{ij} = D_{ij}(\mathbf{Z})$ and $Y_{ij} = Y_{ij}(\mathbf{Z})$ where $\mathbf{Z}$ is the $N$ dimensional vector of treatment assignment for all units. If there

were no restriction on the pattern of interference, each unit has $2^N$ potential values of treatment receipt and outcome, making identification infeasible. Hence, following the literature (e.g., Hong and Raudenbush 2006; Sobel 2006; Hudgens and Halloran 2008), we only allow interference within each cluster.

Assumption 2 (Partial interference).
$$Y_{ij}(\mathbf{z}) = Y_{ij}(\mathbf{z}') \quad \text{and} \quad D_{ij}(\mathbf{z}) = D_{ij}(\mathbf{z}')$$
for all $\mathbf{z}$ and $\mathbf{z}'$ with $\mathbf{z}_j = \mathbf{z}'_j$.

Assumption 2 implies that although the treatment receipt and outcome of a unit can be influenced by the treatment assignment of another unit within the same cluster, they cannot be affected by units in other clusters. This assumption substantially reduces the number of potential values of treatment receipt and outcome for each unit in cluster $j$ from $2^N$ to $2^{n_j}$.

3.2. Intention-to-Treat Effects: A Review

We next review the previous results about the ITT analysis of two-stage randomized experiments under the partial interference assumption (Hudgens and Halloran 2008). Our analysis differs from the existing ones in that we weight each unit equally instead of giving an equal weight to each cluster as done in the literature.

3.2.1. Causal Quantities of Interest

We begin by defining preliminary average quantities. First, we define the average potential value of treatment receipt for unit $i$ in cluster $j$ when the unit is assigned to the treatment condition $z$ under the treatment assignment mechanism $a$. We do so by averaging over the distribution of treatment assignments for the other units within the same cluster,
$$\overline{D}_{ij}(z, a) = \sum_{\mathbf{z}_{-i,j} \in \mathcal{Z}_{-i,j}} D_{ij}(Z_{ij} = z, \mathbf{Z}_{-i,j} = \mathbf{z}_{-i,j}) \Pr(\mathbf{Z}_{-i,j} = \mathbf{z}_{-i,j} \mid Z_{ij} = z, A_j = a),$$
where $\mathbf{Z}_{-i,j} = (Z_{1j}, \ldots, Z_{i-1,j}, Z_{i+1,j}, \ldots, Z_{n_j j})$ represents the $(n_j - 1)$ dimensional subvector of $\mathbf{Z}_j$ with the entry for unit $i$ removed and $\mathcal{Z}_{-i,j} = \{(z_{1j}, \ldots, z_{i-1,j}, z_{i+1,j}, \ldots, z_{n_j j}) \mid z_{i'j} \in \{0, 1\} \text{ for } i' = 1, \ldots, i-1, i+1, \ldots, n_j\}$ is the set of all possible values of the assignment vector $\mathbf{Z}_{-i,j}$. Similarly, we define the average potential outcome for unit $i$ in cluster $j$ as,
$$\overline{Y}_{ij}(z, a) = \sum_{\mathbf{z}_{-i,j} \in \mathcal{Z}_{-i,j}} Y_{ij}(Z_{ij} = z, \mathbf{Z}_{-i,j} = \mathbf{z}_{-i,j}) \Pr(\mathbf{Z}_{-i,j} = \mathbf{z}_{-i,j} \mid Z_{ij} = z, A_j = a).$$

Given these unit-level average values of potential outcomes, we consider the cluster-level and population-level average potential values of treatment receipt and outcome,
$$\overline{D}_j(z, a) = \frac{1}{n_j} \sum_{i=1}^{n_j} \overline{D}_{ij}(z, a), \qquad \overline{D}(z, a) = \frac{1}{N} \sum_{j=1}^{J} n_j \overline{D}_j(z, a),$$
$$\overline{Y}_j(z, a) = \frac{1}{n_j} \sum_{i=1}^{n_j} \overline{Y}_{ij}(z, a), \qquad \overline{Y}(z, a) = \frac{1}{N} \sum_{j=1}^{J} n_j \overline{Y}_j(z, a).$$

We define the ITT effects, starting with the average direct effect of treatment assignment on the treatment receipt and outcome under the treatment assignment mechanism $a$, as
$$\mathrm{DED}_{ij}(a) = \overline{D}_{ij}(1, a) - \overline{D}_{ij}(0, a), \qquad \mathrm{DEY}_{ij}(a) = \overline{Y}_{ij}(1, a) - \overline{Y}_{ij}(0, a),$$
where DED and DEY stand for the average direct effect on $D$ and $Y$, respectively. These parameters quantify how the treatment assignment of a unit may affect its treatment receipt and outcome by averaging over the treatment assignments of other units within the same cluster under a specific assignment mechanism. Finally, averaging these unit-level quantities gives the following average direct effects of treatment assignment for each cluster and for the entire (finite) population,
$$\mathrm{DED}_j(a) = \frac{1}{n_j} \sum_{i=1}^{n_j} \mathrm{DED}_{ij}(a), \qquad \mathrm{DED}(a) = \frac{1}{N} \sum_{j=1}^{J} n_j \mathrm{DED}_j(a),$$
$$\mathrm{DEY}_j(a) = \frac{1}{n_j} \sum_{i=1}^{n_j} \mathrm{DEY}_{ij}(a), \qquad \mathrm{DEY}(a) = \frac{1}{N} \sum_{j=1}^{J} n_j \mathrm{DEY}_j(a).$$

Another quantity of interest is the spillover effect, which quantifies how one unit’s treatment receipt or outcome is affected by other units’ treatment assignments. Following Halloran and Struchiner (1995), we define the unit-level spillover effects on the treatment receipt and outcome as,
$$\mathrm{SED}_{ij}(z) = \overline{D}_{ij}(z, 1) - \overline{D}_{ij}(z, 0), \qquad \mathrm{SEY}_{ij}(z) = \overline{Y}_{ij}(z, 1) - \overline{Y}_{ij}(z, 0),$$
which compare the average potential values under two different assignment mechanisms, that is, $a = 1$ and $a = 0$, while holding one’s treatment assignment at $z$. We then define the spillover effects on the treatment receipt and outcome at the cluster and population levels,
$$\mathrm{SED}_j(z) = \frac{1}{n_j} \sum_{i=1}^{n_j} \mathrm{SED}_{ij}(z), \qquad \mathrm{SED}(z) = \frac{1}{N} \sum_{j=1}^{J} n_j \mathrm{SED}_j(z),$$
$$\mathrm{SEY}_j(z) = \frac{1}{n_j} \sum_{i=1}^{n_j} \mathrm{SEY}_{ij}(z), \qquad \mathrm{SEY}(z) = \frac{1}{N} \sum_{j=1}^{J} n_j \mathrm{SEY}_j(z).$$

The quantities defined above differ from those introduced in the literature in that we equally weight each unit (see Basse and Feller 2018). In contrast, Hudgens and Halloran (2008) gave an equal weight to each cluster regardless of its size. While our analysis focuses on the individual-weighted estimands rather than cluster-weighted estimands, our method can be generalized to any weighting scheme, and as such the proofs in the supplementary appendix are based on general weights.

Finally, in actual policy implementations, the treatment assignment is typically based on a deterministic criterion rather than randomization, suggesting that the causal quantities discussed above may not be of direct interest to policy makers. Even in this situation, however, these causal quantities can provide some policy implications by telling us whether or not spillover effects exist at all. We discuss this issue in the context of our application (see Section 4) and consider a model-based approach to further address this point (see Appendix E in the supplementary materials).
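
To make the unit-level averages of Section 3.2.1 concrete, the following Python sketch computes the average potential treatment receipt of one unit by brute-force enumeration and then forms its unit-level direct and spillover effects. It is illustrative only: the cluster size `N_J`, the numbers treated in `N_TREATED`, and the behavioral rule in `receipt` are hypothetical choices and are not taken from the study.

```python
from itertools import combinations

N_J = 5                    # hypothetical cluster size n_j
N_TREATED = {1: 4, 0: 2}   # hypothetical n_j1 under a = 1 ("High") and a = 0 ("Low")


def receipt(i, z):
    """Hypothetical potential treatment receipt D_ij(z) for unit i.

    Unit i takes the treatment if it is assigned to it, or if at least
    three other units in the cluster are assigned (a spillover channel).
    """
    others = sum(z) - z[i]
    return 1 if (z[i] == 1 or others >= 3) else 0


def avg_receipt(i, z_own, a):
    """Average of D_ij(z_own, z_-i) over the other units' assignments under mechanism a."""
    others = [k for k in range(N_J) if k != i]
    # Complete randomization fixes the total number treated in the cluster,
    # so given Z_ij = z_own the others receive N_TREATED[a] - z_own treatments.
    n_other_treated = N_TREATED[a] - z_own
    vals = []
    for treated in combinations(others, n_other_treated):
        z = [0] * N_J
        z[i] = z_own
        for k in treated:
            z[k] = 1
        vals.append(receipt(i, z))
    # Every admissible z_-i is equally likely, so a plain mean is the weighted sum.
    return sum(vals) / len(vals)


def ded(i, a):
    """Unit-level direct effect on treatment receipt under mechanism a."""
    return avg_receipt(i, 1, a) - avg_receipt(i, 0, a)


def sed(i, z_own):
    """Unit-level spillover effect on treatment receipt at own assignment z_own."""
    return avg_receipt(i, z_own, 1) - avg_receipt(i, z_own, 0)
```

Under this hypothetical rule, a unit that is not assigned still takes the treatment whenever the High mechanism is in force, so its direct effect is zero under $a = 1$ and one under $a = 0$, while its spillover effect is one at $z = 0$ and zero at $z = 1$. Enumeration is only feasible for small clusters; it is meant to clarify the definition rather than to serve as an estimation method.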

3.2.2. Nonparametric Identification

Hudgens and Halloran (2008) established the nonparametric identification of the ITT effects, which equally weight each cluster regardless of its size. Here, we present analogous results by weighting each unit equally as done above. Define the following quantities,
$$\widehat{D}(z, a) = \frac{\frac{1}{N} \sum_{j=1}^{J} n_j \widehat{D}_j(z, a) I(A_j = a)}{\frac{1}{J} \sum_{j=1}^{J} I(A_j = a)}, \qquad \widehat{Y}(z, a) = \frac{\frac{1}{N} \sum_{j=1}^{J} n_j \widehat{Y}_j(z, a) I(A_j = a)}{\frac{1}{J} \sum_{j=1}^{J} I(A_j = a)},$$
where
$$\widehat{D}_j(z, a) = \frac{\sum_{i=1}^{n_j} D_{ij} I(Z_{ij} = z)}{\sum_{i=1}^{n_j} I(Z_{ij} = z)}, \qquad \widehat{Y}_j(z, a) = \frac{\sum_{i=1}^{n_j} Y_{ij} I(Z_{ij} = z)}{\sum_{i=1}^{n_j} I(Z_{ij} = z)}.$$
Then, we can obtain the unbiased estimators of the direct effects and the spillover effects.

Theorem 1 (Unbiased estimation of the ITT effects). Define the following estimators,
$$\widehat{\mathrm{DED}}(a) = \widehat{D}(1, a) - \widehat{D}(0, a), \qquad \widehat{\mathrm{SED}}(z) = \widehat{D}(z, 1) - \widehat{D}(z, 0),$$
$$\widehat{\mathrm{DEY}}(a) = \widehat{Y}(1, a) - \widehat{Y}(0, a), \qquad \widehat{\mathrm{SEY}}(z) = \widehat{Y}(z, 1) - \widehat{Y}(z, 0).$$
Under Assumptions 1 and 2, these estimators are unbiased for the ITT effects,
$$E\{\widehat{\mathrm{DED}}(a)\} = \mathrm{DED}(a), \qquad E\{\widehat{\mathrm{SED}}(z)\} = \mathrm{SED}(z),$$
$$E\{\widehat{\mathrm{DEY}}(a)\} = \mathrm{DEY}(a), \qquad E\{\widehat{\mathrm{SEY}}(z)\} = \mathrm{SEY}(z).$$

Proof is straightforward and hence omitted.

3.3. Complier Average Direct Effects

We now address the issue of noncompliance in the presence of interference between units. In a seminal paper, Angrist, Imbens, and Rubin (1996) showed how to identify the complier average causal effect (CACE) in standard randomized experiments under the assumption of no interference. The CACE represents the average effect of treatment receipt among the compliers who would receive the treatment only when assigned to the treatment condition. Below, we introduce the CADE, which is a generalization of the CACE to settings with interference, and show how to nonparametrically identify and consistently estimate it using the data from two-stage randomized experiments.

3.3.1. Causal Quantity of Interest

We first generalize the definition of compliers to settings with interference between units. Under the assumption of no interference, compliers are those who receive the treatment only when assigned to the treatment condition. However, in the presence of partial interference, the treatment receipt is also affected by the treatment assignment of other units in the same cluster. Thus, the compliance status of a unit is a function of the treatment assignment of other units in the same cluster,
$$C_{ij}(\mathbf{z}_{-i,j}) = I\{D_{ij}(1, \mathbf{z}_{-i,j}) = 1, D_{ij}(0, \mathbf{z}_{-i,j}) = 0\}. \qquad (1)$$

We consider a measure of compliance behavior for each unit by averaging over the distribution of treatment assignments of the other units within the same cluster under the treatment assignment mechanism $a$. This general measure of compliance behavior ranges from 0 to 1 and is defined as,
$$C_{ij}(a) = \sum_{\mathbf{z}_{-i,j} \in \mathcal{Z}_{-i,j}} C_{ij}(\mathbf{z}_{-i,j}) \Pr(\mathbf{Z}_{-i,j} = \mathbf{z}_{-i,j} \mid A_j = a) \qquad (2)$$
for $a = 0, 1$. Given this compliance measure, we now define the CADE as the average direct effect of treatment assignment among compliers,
$$\mathrm{CADE}(a) = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} \mathrm{CDY}_{ij}(a)}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} C_{ij}(a)},$$
where
$$\mathrm{CDY}_{ij}(a) = \sum_{\mathbf{z}_{-i,j} \in \mathcal{Z}_{-i,j}} \{Y_{ij}(1, \mathbf{z}_{-i,j}) - Y_{ij}(0, \mathbf{z}_{-i,j})\} \times C_{ij}(\mathbf{z}_{-i,j}) \Pr(\mathbf{Z}_{-i,j} = \mathbf{z}_{-i,j} \mid A_j = a).$$
The definition requires that there exists at least one complier in the population. If units do not influence each other, we have $Y_{ij}(z_{ij}, \mathbf{z}_{-i,j}) = Y_{ij}(z_{ij})$ and $D_{ij}(z_{ij}, \mathbf{z}_{-i,j}) = D_{ij}(z_{ij})$. Hence, the compliance status for each unit in Equations (1) and (2) no longer depends on the treatment assignment of the other units. As a result, under this setting, the CADE equals the finite sample version of the complier average causal effect defined in Angrist, Imbens, and Rubin (1996). Finally, in the absence of noncompliance, that is, $C_{ij}(\mathbf{z}_{-i,j}) = 1$ for all $\mathbf{z}_{-i,j}$ and $i, j$, then $\mathrm{CADE}(a)$ asymptotically equals $\mathrm{DEY}(a)$ as the cluster size grows.

The CADE combines two causal pathways: a unit’s treatment assignment $Z_{ij}$ can affect its outcome $Y_{ij}$ either through its own treatment receipt $D_{ij}$ or that of the other units $\mathbf{D}_{-i,j} = (D_{1j}, \ldots, D_{i-1,j}, D_{i+1,j}, \ldots, D_{n_j j})$. If there is either no spillover effect of encouragement on treatment receipt or no spillover effect of treatment receipt on outcome, then the second causal pathway no longer exists. Under this scenario, the CADE corresponds to the average direct effect of one’s own treatment receipt among compliers because the treatment assignment is the same as the treatment receipt. In contrast, when both types of spillover effects exist, the CADE includes the indirect effect of one’s own encouragement on the outcome through the treatment receipt of other units in the same village as well as the direct effect of one’s own treatment receipt on the outcome. Unfortunately, without additional assumptions, the CADE is not identifiable. We therefore propose a set of assumptions for nonparametric identification. In addition, Appendix E.2 in the supplementary materials considers a model-based approach to the identification and estimation for further distinguishing the two causal pathways.

3.3.2. Nonparametric Identification

To establish the nonparametric identification of the CADE, we begin by generalizing the exclusion restriction of

Angrist, Imbens, and Rubin (1996), which assumes no interference between units.

Assumption 3 (Exclusion restriction with interference between units).
$$Y_{ij}(\mathbf{z}_j; \mathbf{d}_j) = Y_{ij}(\mathbf{z}'_j; \mathbf{d}_j) \quad \text{for any } \mathbf{z}_j, \mathbf{z}'_j \text{ and } \mathbf{d}_j.$$

Assumption 3 states that the outcome of a unit does not depend on the treatment assignment of any unit within the same cluster (including itself) so long as the treatment receipt for all the units of the cluster remains identical. In other words, the outcome of a unit depends only on the treatment receipt vector of all units within its own cluster. The assumption is violated if the outcome of one unit is influenced by its own treatment assignment or that of another unit within the same cluster even when the treatment receipts of all the units in the cluster including itself are held constant. In our application, the assumption is plausible since the encouragement to enroll in the RSBY is unlikely to affect the hospital expenditure other than through the actual enrollment itself.

Under Assumption 3, we can write the potential outcome as the function of treatment receipt alone, $Y_{ij}(\mathbf{d}_j)$. Thus, the observed outcome is written as $Y_{ij}(\mathbf{D}_j)$ where $\mathbf{D}_j = \mathbf{D}_j(\mathbf{Z}_j)$. We maintain Assumption 3 for the remainder of the article. To avoid confusion, we will explicitly write out the treatment receipt as the argument of potential outcome. For example, $Y_{ij}(\mathbf{D}_j = \mathbf{1}_{n_j})$ represents the potential outcome when $D_{i'j} = 1$ for $i' = 1, \ldots, n_j$, while $Y_{ij}(\mathbf{1}_{n_j})$ represents the potential outcome when $Z_{i'j} = 1$ for $i' = 1, \ldots, n_j$.

We next generalize the monotonicity assumption of Angrist, Imbens, and Rubin (1996).

Assumption 4 (Monotonicity with interference between units).
$$D_{ij}(1, \mathbf{z}_{-i,j}) \ge D_{ij}(0, \mathbf{z}_{-i,j}) \quad \text{for all } \mathbf{z}_{-i,j} \in \mathcal{Z}_{-i,j}.$$

The assumption states that being assigned to the treatment condition never negatively affects the treatment receipt of a unit, regardless of how the other units within the same cluster are assigned to the treatment/control conditions. Assumption 4 is plausible in our application because the encouragement is expected to increase the enrollment in the RSBY.

In the absence of interference between units, exclusion restriction and monotonicity are sufficient for the nonparametric identification of the complier average causal effect. However, when interference exists, an additional restriction on the interference structure is necessary. The reason is that there are two types of possible spillover effects: the spillover effect of treatment assignment on the treatment receipt and the spillover effect of treatment receipt on the outcome. As a result, even under exclusion restriction, the treatment assignment of a noncomplier can still affect its outcome through the treatment receipts of other units within the same cluster.

To address this problem, we propose the following identification assumption.

Assumption 5 (Restricted interference under noncompliance). For any unit $i$ in cluster $j$, if $D_{ij}(1, \mathbf{z}_{-i,j}) = D_{ij}(0, \mathbf{z}_{-i,j})$ for some $\mathbf{z}_{-i,j} \in \mathcal{Z}_{-i,j}$, then $Y_{ij}(\mathbf{D}_j(1, \mathbf{z}_{-i,j})) = Y_{ij}(\mathbf{D}_j(0, \mathbf{z}_{-i,j}))$ holds.

The assumption states that if the treatment receipt of a unit is not affected by its own treatment assignment (i.e., the unit is a noncomplier), then its outcome should also not be affected by its own treatment assignment through the treatment receipts of other units in the same cluster. Although Assumption 5 appears to be concerned only with the spillover effects of treatment receipt on the outcome, its plausibility also depends on the spillover effects of treatment assignment on the treatment receipt.

To facilitate the understanding of this assumption, we consider the following three scenarios under which Assumption 5 is satisfied. First, assume no spillover effect of treatment receipt on the outcome (Scenario I of Figure 1(a)),
$$Y_{ij}(d_{ij}, \mathbf{d}_{-i,j}) = Y_{ij}(d_{ij}, \mathbf{d}'_{-i,j}) \quad \text{for } d_{ij} = 0, 1, \text{ and any } \mathbf{d}_{-i,j}, \mathbf{d}'_{-i,j}. \qquad (3)$$
Testable conditions for this scenario are given in Appendix B.1 in the supplementary materials.

Figure 1. Three scenarios that imply Assumption 5: (a) no spillover effect of the treatment receipt on the outcome; (b) no spillover effect of the treatment assignment on the treatment receipt; (c) if treatment assignment of a unit does not affect its own treatment receipt, then it should not affect other units’ treatment receipts, that is, the dotted edges do not exist.

Second, suppose that the treatment assignment has no spillover effect on the treatment receipt (Scenario II of Figure 1(b)),

$$D_{ij}(z_{ij}, \mathbf{z}_{-i,j}) = D_{ij}(z_{ij}, \mathbf{z}'_{-i,j}) \quad \text{for } z_{ij} = 0, 1, \text{ and any } \mathbf{z}_{-i,j}, \mathbf{z}'_{-i,j}. \tag{4}$$

Such an assumption is made by Kang and Imbens (2016) in the context of online experiments, in which the assignment of treatment (e.g., social media messaging) can be individualized but units may interact with each other once they receive the treatment. We can test this scenario by estimating SED(1) and SED(0).

Third, we can weaken the condition in Equation (4) by considering an alternative condition that if a unit's treatment receipt is not affected by its own treatment assignment (i.e., the unit is a noncomplier), then the treatment assignment of this unit has no effect on the treatment receipts of the other units in the same cluster (the absence of dotted edges in Scenario III of Figure 1(c)),

$$\text{if } D_{ij}(1, \mathbf{z}_{-i,j}) = D_{ij}(0, \mathbf{z}_{-i,j}), \text{ then } \mathbf{D}_{-i,j}(1, \mathbf{z}_{-i,j}) = \mathbf{D}_{-i,j}(0, \mathbf{z}_{-i,j}).$$

In our application, this scenario is violated, for example, if a household that already has insurance and is not going to be affected by the encouragement influences the enrollment decision of another household by recommending the RSBY to it. To increase the plausibility of this scenario in our application, we excluded all the households with pre-existing insurance from the experiment. As a result, this scenario is plausible because one's encouragement is expected to have a much greater influence on his/her own enrollment than on the enrollment of another unit.

Although all three scenarios above satisfy Assumption 5, the interpretation of the CADE is different. In particular, under Scenarios I and II, we can interpret the CADE as the average direct effect of one's own treatment receipt on the outcome among compliers. In contrast, under Scenario III, the CADE also includes the average direct effect of one's own encouragement on the outcome through the treatment receipts of other units. Nevertheless, this combined direct effect of encouragement may be of interest to policy makers because most government programs including the RSBY are based on the encouragement design. In Appendix E.2 in the supplementary materials, we address this issue using a model-based approach.

The next theorem establishes the nonparametric identification of the CADE as the cluster size tends to infinity. Under Assumptions 1–5, we show that in the limit, the CADE equals the ratio of the average direct effects of treatment assignment on the outcome and on the treatment receipt while holding the treatment assignment mechanism fixed. Although the unbiased estimation of DEY(a) and DED(a) is readily available (Hudgens and Halloran 2008), for the consistent estimation of the CADE, we need an additional restriction on the structure of interference. We follow Sävje, Aronow, and Hudgens's (2017) result on the consistency of the average causal effect in the finite population framework, and assume that the average amount of interference per unit does not grow proportionally to the cluster size (see Appendix B.2 in the supplementary materials for a proof of the theorem and the details).

Theorem 2 (Nonparametric identification and consistent estimation of the CADE).

1. Under Assumptions 1–5, we have

$$\lim_{n_j \to \infty} \frac{\mathrm{DEY}(a)}{\mathrm{DED}(a)} = \lim_{n_j \to \infty} \mathrm{CADE}(a).$$

2. Suppose that the outcome is bounded and the restriction on interference in Sävje, Aronow, and Hudgens (2017) holds for both the treatment receipt and the outcome. Then, as both the cluster size $n_j$ and the number of clusters $J$ go to infinity, we can consistently estimate the CADE,

$$\operatorname*{plim}_{n_j \to \infty,\, J \to \infty} \frac{\widehat{\mathrm{DEY}}(a)}{\widehat{\mathrm{DED}}(a)} = \lim_{n_j \to \infty,\, J \to \infty} \mathrm{CADE}(a)$$

for each $a = 0, 1$.

The CADE is nonparametrically identifiable as the cluster size and the number of clusters tend to infinity, and can be consistently estimated by the ratio of two estimated ITT effects. The asymptotic properties are derived within the finite population framework, approximating the sampling distribution of an estimator by embedding it in an asymptotically stable sequence of finite populations (Hájek 1960; Lehmann 2004).

3.4. Stratified Interference

Unfortunately, as pointed out by Hudgens and Halloran (2008), a valid estimator of the variances of these ITT effect estimators is unavailable without an additional assumption. Hudgens and Halloran (2008) relied upon the stratified interference assumption that the outcome of one unit depends on the treatment assignment of other units only through the number of those who are assigned to the treatment condition within the same cluster. In other words, what matters is the number of units rather than which units are assigned to the treatment condition.

We assume that stratified interference applies to both the outcome and treatment receipt.

Assumption 6 (Stratified interference).

$$D_{ij}(\mathbf{z}_j) = D_{ij}(\mathbf{z}'_j) \ \text{ and } \ Y_{ij}(\mathbf{z}_j) = Y_{ij}(\mathbf{z}'_j) \quad \text{if } z_{ij} = z'_{ij} \text{ and } \sum_{i=1}^{n_j} z_{ij} = \sum_{i=1}^{n_j} z'_{ij}.$$

In our application, stratified interference for the treatment receipt requires that the enrollment decisions of households depend only on their own encouragement and the number of encouraged households in their village. Under the assumption of no spillover effect of treatment receipt on the outcome, stratified interference for the outcome holds so long as it is applicable to the treatment receipt. However, for more general scenarios, Assumption 6 may not be satisfied for the outcome even if it holds for the treatment receipt.

3.4.1. Nonparametric Identification

Under Assumption 6, we can simplify the CADE because the number of the units assigned to the treatment condition in each
cluster is fixed given treatment assignment mechanism. This implies that we can write $D_{ij}(\mathbf{z}_j)$ and $Y_{ij}(\mathbf{z}_j)$ as $D_{ij}(z, a)$ and $Y_{ij}(z, a)$, respectively, and as a result CADE(a) equals

$$\mathrm{CADE}(a) = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} \{Y_{ij}(1, a) - Y_{ij}(0, a)\}\, I\{D_{ij}(1, a) - D_{ij}(0, a) = 1\}}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} I\{D_{ij}(1, a) - D_{ij}(0, a) = 1\}},$$

where the complier status can also be simplified as a function of assignment mechanism alone, that is, $C_{ij}(a) = I\{D_{ij}(1, a) = 1, D_{ij}(0, a) = 0\}$.

We now present the results on nonparametric identification and consistent estimation under stratified interference.

Theorem 3 (Nonparametric identification and consistent estimation of the CADE under stratified interference). Suppose that the outcome is bounded. Then, under Assumptions 1–6, we have

$$\lim_{n_j \to \infty,\, J \to \infty} \mathrm{CADE}(a) = \operatorname*{plim}_{n_j \to \infty,\, J \to \infty} \frac{\widehat{\mathrm{DEY}}(a)}{\widehat{\mathrm{DED}}(a)}$$

for $a = 0, 1$.

Proof is in Appendix B.4 in the supplementary materials. Under the stratified interference assumption, the consistent estimation of CADE no longer requires the restrictions on interference in Sävje, Aronow, and Hudgens (2017).

3.4.2. Effect Decomposition

Under stratified interference, we can decompose the average direct effect of treatment assignment as the sum of the average direct effects for compliers and noncompliers,

$$\mathrm{DEY}(a) = \mathrm{CADE}(a) \cdot \pi_c(a) + \mathrm{NADE}(a) \cdot \{1 - \pi_c(a)\}, \tag{5}$$

where NADE(a) is the non-CADE and is defined as,

$$\mathrm{NADE}(a) = \frac{\sum_{j=1}^{J} \sum_{i=1}^{n_j} \{Y_{ij}(1, a) - Y_{ij}(0, a)\}\, I\{D_{ij}(1, a) = D_{ij}(0, a)\}}{\sum_{j=1}^{J} \sum_{i=1}^{n_j} I\{D_{ij}(1, a) = D_{ij}(0, a)\}},$$

and the proportion of compliers is given by,

$$\pi_c(a) = \frac{1}{N} \sum_{j=1}^{J} \sum_{i=1}^{n_j} I\{D_{ij}(1, a) = 1, D_{ij}(0, a) = 0\}.$$

According to the exclusion restriction given in Assumption 3, for compliers with $D_{ij}(1, a) = 1$ and $D_{ij}(0, a) = 0$, we can write the unit-level direct effect on the outcome as the sum of the direct effect through its own treatment receipt and the indirect effect through the treatment receipts of other units within the same cluster,

$$\begin{aligned}
Y_{ij}(Z_{ij} = 1, a) - Y_{ij}(Z_{ij} = 0, a)
&= \{Y_{ij}(D_{ij} = 1, \mathbf{D}_{-i,j}(Z_{ij} = 1, a)) - Y_{ij}(D_{ij} = 0, \mathbf{D}_{-i,j}(Z_{ij} = 1, a))\} \\
&\quad + \{Y_{ij}(D_{ij} = 0, \mathbf{D}_{-i,j}(Z_{ij} = 1, a)) - Y_{ij}(D_{ij} = 0, \mathbf{D}_{-i,j}(Z_{ij} = 0, a))\}.
\end{aligned} \tag{6}$$

Thus, the treatment assignment can affect its outcome either directly through its own treatment or indirectly through the treatment receipts of the other units in the same cluster.

For noncompliers ($D_{ij}(1, a) = D_{ij}(0, a) = d_{ij}$), the exclusion restriction implies,

$$Y_{ij}(Z_{ij} = 1, a) - Y_{ij}(Z_{ij} = 0, a) = Y_{ij}(D_{ij} = d_{ij}, \mathbf{D}_{-i,j}(Z_{ij} = 1, a)) - Y_{ij}(D_{ij} = d_{ij}, \mathbf{D}_{-i,j}(Z_{ij} = 0, a)). \tag{7}$$

The treatment assignment affects its own outcome only through the treatment receipt of the other units in the same cluster. Furthermore, Assumption 5 implies $Y_{ij}(D_{ij} = d_{ij}, \mathbf{D}_{-i,j}(Z_{ij} = 1, a)) = Y_{ij}(D_{ij} = d_{ij}, \mathbf{D}_{-i,j}(Z_{ij} = 0, a))$. Under this assumption, Equation (7) equals zero, implying NADE(a) = 0 and the identification of CADE(a).

3.4.3. Randomization-Based Variances

We derive the randomization-based variances of the proposed estimators within the finite population framework, in which the uncertainty comes solely from the two-stage randomization. As shown by Hudgens and Halloran (2008) in the context of ITT analysis, stratified interference enables the estimation of variance. Here, we first derive the variances of the ITT effect estimators and then derive the variance of the proposed CADE estimator. We begin by defining the following quantities,

$$\sigma_j^2(z, a) = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} \{Y_{ij}(z, a) - \overline{Y}_j(z, a)\}^2,$$

$$\sigma_{DE}^2(a) = \frac{1}{J - 1} \sum_{j=1}^{J} \left\{ \frac{n_j J}{N} \mathrm{DEY}_j(a) - \mathrm{DEY}(a) \right\}^2,$$

$$\omega_j^2(a) = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} \left[ \{Y_{ij}(1, a) - Y_{ij}(0, a)\} - \{\overline{Y}_j(1, a) - \overline{Y}_j(0, a)\} \right]^2,$$

where $\sigma_j^2(z, a)$ is the within-cluster variance of potential outcomes, $\sigma_{DE}^2(a)$ is the between-cluster variance of $\mathrm{DEY}_j(a)$, and $\omega_j^2(a)$ is the within-cluster variance of $\mathrm{DEY}_{ij}(a)$. Using this notation, we give the results for the ITT effects of treatment assignment on the outcome. The results for the ITT effects of treatment assignment on the treatment receipt can be obtained in the same way.

Theorem 4 (Randomization-based variances of the ITT effect estimators). Under Assumptions 1, 2, and 6, we have

$$\operatorname{var}\left\{\widehat{\mathrm{DEY}}(a)\right\} = \left(1 - \frac{J_a}{J}\right) \frac{\sigma_{DE}^2(a)}{J_a} + \frac{1}{J_a J} \sum_{j=1}^{J} \operatorname{var}\left\{ \frac{n_j J}{N} \widehat{\mathrm{DEY}}_j(a) \,\Big|\, A_j = a \right\},$$

where

$$\operatorname{var}\left\{\widehat{\mathrm{DEY}}_j(a) \mid A_j = a\right\} = \frac{\sigma_j^2(1, a)}{n_{j1}} + \frac{\sigma_j^2(0, a)}{n_{j0}} - \frac{\omega_j^2(a)}{n_j}.$$
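The within-cluster variance term of Theorem 4 has the same form as the classical variance of a difference in means under complete randomization. The following is a minimal sketch that checks this form numerically by enumerating every assignment in one small cluster. It assumes that exactly $n_{j1}$ of the $n_j$ units are treated with all such assignments equally likely, and that both potential outcomes of every unit are known, which is never the case with real data. The function names and toy numbers are illustrative and do not come from the article.

```python
from itertools import combinations

import numpy as np


def formula_variance(y1, y0, n1):
    """sigma_j^2(1)/n_j1 + sigma_j^2(0)/n_j0 - omega_j^2/n_j from full potential outcomes."""
    y1 = np.asarray(y1, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    n = len(y1)
    n0 = n - n1
    sigma2_1 = y1.var(ddof=1)          # within-cluster variance of Y(1, a)
    sigma2_0 = y0.var(ddof=1)          # within-cluster variance of Y(0, a)
    omega2 = (y1 - y0).var(ddof=1)     # within-cluster variance of unit-level effects
    return sigma2_1 / n1 + sigma2_0 / n0 - omega2 / n


def enumerated_variance(y1, y0, n1):
    """Exact randomization variance of the difference in means (hypothetical check)."""
    y1 = np.asarray(y1, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    n = len(y1)
    estimates = []
    for treated in combinations(range(n), n1):
        mask = np.zeros(n, dtype=bool)
        mask[list(treated)] = True
        estimates.append(y1[mask].mean() - y0[~mask].mean())
    # Every assignment is equally likely, so the population variance applies.
    return float(np.var(estimates))


# Toy cluster with six units, three of them assigned to treatment.
y1_toy = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y0_toy = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]
print(formula_variance(y1_toy, y0_toy, 3), enumerated_variance(y1_toy, y0_toy, 3))
```

The two printed numbers should agree up to floating-point error. The $\omega_j^2(a)$ term is the part that cannot be estimated from observed data, since it involves both potential outcomes of the same unit.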

Proof is given in Appendix B.5 in the supplementary materials. Because we cannot observe $Y_{ij}(1, a)$ and $Y_{ij}(0, a)$ simultaneously, no unbiased estimator exists for $\omega_j^2(a)$, implying that no unbiased estimation of the variances is possible. Thus, following Hudgens and Halloran (2008), we propose a conservative estimator,

$$\widehat{\operatorname{var}}\left\{\widehat{\mathrm{DEY}}(a)\right\} = \left(1 - \frac{J_a}{J}\right) \frac{\hat\sigma_{DE}^2(a)}{J_a} + \frac{1}{J_a J} \sum_{j=1}^{J} \frac{n_j^2 J^2}{N^2} \left\{ \frac{\hat\sigma_j^2(1, a)}{n_{j1}} + \frac{\hat\sigma_j^2(0, a)}{n_{j0}} \right\} I(A_j = a), \tag{8}$$

where

$$\hat\sigma_j^2(z, a) = \frac{\sum_{i=1}^{n_j} \{Y_{ij} - \widehat{Y}_j(z, a)\}^2 I(Z_{ij} = z)}{n_{jz} - 1},$$

$$\hat\sigma_{DE}^2(a) = \frac{\sum_{j=1}^{J} \left\{ \frac{n_j J}{N} \widehat{\mathrm{DEY}}_j(a) - \widehat{\mathrm{DEY}}(a) \right\}^2 I(A_j = a)}{J_a - 1}.$$

In Equation (8), $\hat\sigma_{DE}^2(a)$ represents the between-cluster sample variance, and $\hat\sigma_j^2(z, a)$ is the within-cluster variance in cluster $j$. Thus, the variance of the ITT direct effect estimator is a weighted average of the between-cluster sample variance and the within-cluster sample variance.

It can be shown that this variance estimator is on average no less than the true variance,

$$E\left[\widehat{\operatorname{var}}\left\{\widehat{\mathrm{DEY}}(a)\right\}\right] \ge \operatorname{var}\left\{\widehat{\mathrm{DEY}}(a)\right\},$$

where the inequality becomes equality when the unit-level direct effect, that is, $Y_{ij}(1, a) - Y_{ij}(0, a)$, is constant within each cluster (see Appendix B.7 in the supplementary materials for a proof). In Appendix B.3 in the supplementary materials, we provide the asymptotic normality result of the ITT effect estimators under additional regularity conditions based on the finite population central limit theorems in Hájek (1960), Ohlsson (1989), and Li and Ding (2017). These conditions are satisfied for a bounded outcome as the cluster size and the number of clusters go to infinity. See Chin (2018) for more refined results on the asymptotic normality of the ITT effect estimators without stratified interference.

We next derive the asymptotic randomization-based variance of the proposed estimator.

Theorem 5 (Randomization-based variance of the CADE estimator). Under Assumptions 1–6, the asymptotic variance of $\widehat{\mathrm{CADE}}(a)$ is

$$\frac{1}{\mathrm{DED}(a)^2} \left[ \operatorname{var}\left\{\widehat{\mathrm{DEY}}(a)\right\} - 2\, \frac{\mathrm{DEY}(a)}{\mathrm{DED}(a)} \operatorname{cov}\left\{\widehat{\mathrm{DEY}}(a), \widehat{\mathrm{DED}}(a)\right\} + \frac{\mathrm{DEY}(a)^2}{\mathrm{DED}(a)^2} \operatorname{var}\left\{\widehat{\mathrm{DED}}(a)\right\} \right].$$

Proof of Theorem 5 is a direct application of the Delta method based on the asymptotic normality of the ITT effect estimators shown in Appendix B.3 in the supplementary materials. Due to the space limitation, we give the expression of $\operatorname{cov}\{\widehat{\mathrm{DEY}}(a), \widehat{\mathrm{DED}}(a)\}$ in Appendix B.6 in the supplementary materials. Because the proposed CADE estimator is a ratio estimator, its variance blows up when DED is close to zero. This is similar to the weak instrument problem in the standard instrumental variable settings.

We obtain the following variance estimator by replacing each term in the brackets with its conservative estimator,

$$\widehat{\operatorname{var}}\left\{\widehat{\mathrm{CADE}}(a)\right\} = \frac{1}{\widehat{\mathrm{DED}}(a)^2} \left[ \widehat{\operatorname{var}}\left\{\widehat{\mathrm{DEY}}(a)\right\} - 2\, \frac{\widehat{\mathrm{DEY}}(a)}{\widehat{\mathrm{DED}}(a)} \widehat{\operatorname{cov}}\left\{\widehat{\mathrm{DEY}}(a), \widehat{\mathrm{DED}}(a)\right\} + \frac{\widehat{\mathrm{DEY}}(a)^2}{\widehat{\mathrm{DED}}(a)^2} \widehat{\operatorname{var}}\left\{\widehat{\mathrm{DED}}(a)\right\} \right], \tag{9}$$

where $\widehat{\operatorname{var}}\{\widehat{\mathrm{DED}}(a)\}$ and $\widehat{\operatorname{cov}}\{\widehat{\mathrm{DEY}}(a), \widehat{\mathrm{DED}}(a)\}$ are obtained by replacing $Y$ with $D$ and the sample variances with the sample covariances in Equation (8), respectively. Similar to the ITT analysis, each of the three terms in the brackets of $\widehat{\operatorname{var}}\{\widehat{\mathrm{CADE}}(a)\}$ is a weighted average of between-cluster and within-cluster sample variances.

Because the expectation of product is generally not equal to the product of expectations, unlike the ITT analysis, $\widehat{\operatorname{var}}\{\widehat{\mathrm{CADE}}(a)\}$ is not a conservative variance estimator in finite samples. In Appendix B.8 in the supplementary materials, however, we show that it is asymptotically conservative. Finally, to evaluate the robustness of the variance estimator based on Assumption 6, we conduct simulation studies and find that the proposed variance estimator works well so long as the number of clusters is relatively large (see Appendix D in the supplementary materials).

3.5. Connections to Two-Stage Least Squares Regression

In this section, we establish direct connections between the proposed estimator of the CADE and the two-stage least squares estimator, which is popular among applied researchers. Basse and Feller (2018) studied the relationships between the ordinary least squares and randomization-based estimators for the ITT analysis under a particular two-stage randomized experiment design (see also Baird et al. 2018). Here, we further extend these previous results.

3.5.1. Point Estimates

We begin with the ITT analysis. To account for different cluster sizes, we transform the treatment and outcome variables so that each unit, rather than each cluster, is equally weighted. Specifically, we multiply them by the weights proportional to the cluster size, that is, $D^*_{ij} = n_j J D_{ij}/N$ and $Y^*_{ij} = n_j J Y_{ij}/N$ (see Appendix C in the supplementary materials for the results with general weights). We consider the following linear models for the treatment receipt and outcome,

$$D^*_{ij} = \sum_{a=0,1} \gamma_a I(A_j = a) + \sum_{a=0,1} \gamma_{1a} Z_{ij} I(A_j = a) + \xi_{ij}, \tag{10}$$

$$Y^*_{ij} = \sum_{a=0,1} \alpha_a I(A_j = a) + \sum_{a=0,1} \alpha_{1a} Z_{ij} I(A_j = a) + \epsilon_{ij}, \tag{11}$$

where $\xi_{ij}$ and $\epsilon_{ij}$ are error terms.
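The conservative variance estimators in Equations (8) and (9) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the data layout (a list of dictionaries with keys `A`, `z`, `d`, `y`), the function names, and the simulated data are all hypothetical; the ITT estimator is taken to be the average over clusters with $A_j = a$ of $(n_j J / N)$ times the within-cluster difference in means, which is the form implied by $\hat\sigma_{DE}^2(a)$; and every arm of every cluster is assumed to contain at least two units.

```python
import numpy as np


def itt_and_cov(clusters, a, u, v):
    """Return (ITT estimate for u, ITT estimate for v, conservative covariance).

    With u == v the third value is the variance estimator of Equation (8);
    otherwise sample variances are replaced by sample covariances.
    """
    J = len(clusters)
    N = sum(len(c["z"]) for c in clusters)
    chosen = [c for c in clusters if c["A"] == a]
    Ja = len(chosen)
    scaled_u, scaled_v, within = [], [], 0.0
    for c in chosen:
        z = np.asarray(c["z"])
        w = len(z) * J / N                       # n_j J / N
        diff_u = diff_v = 0.0
        for arm, sign in ((1, 1.0), (0, -1.0)):
            xu = np.asarray(c[u], dtype=float)[z == arm]
            xv = np.asarray(c[v], dtype=float)[z == arm]
            diff_u += sign * xu.mean()
            diff_v += sign * xv.mean()
            # within-cluster sample (co)variance divided by the arm size n_jz
            within += w**2 * np.cov(xu, xv, ddof=1)[0, 1] / len(xu)
        scaled_u.append(w * diff_u)
        scaled_v.append(w * diff_v)
    between = np.cov(scaled_u, scaled_v, ddof=1)[0, 1]
    cov = (1 - Ja / J) * between / Ja + within / (Ja * J)
    return float(np.mean(scaled_u)), float(np.mean(scaled_v)), float(cov)


def cade_with_variance(clusters, a, y="y", d="d"):
    """Ratio estimate of the CADE and the variance estimate of Equation (9)."""
    dey, ded, cov_yd = itt_and_cov(clusters, a, y, d)
    _, _, var_y = itt_and_cov(clusters, a, y, y)
    _, _, var_d = itt_and_cov(clusters, a, d, d)
    cade = dey / ded
    var = (var_y - 2 * cade * cov_yd + cade**2 * var_d) / ded**2
    return cade, var


# Simulated example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
clusters = []
for j in range(8):
    n = 10 + j
    z = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n - n // 2)]).astype(int)
    d = (rng.random(n) < 0.2 + 0.6 * z).astype(float)
    y = 3.0 * d + rng.normal(size=n)
    clusters.append({"A": j % 2, "z": z, "d": d, "y": y})

print(cade_with_variance(clusters, a=1))
```

Because the variance in Equation (9) divides by the squared estimate of DED(a), the sketch becomes unstable when the estimated effect of assignment on receipt is near zero, which mirrors the weak instrument remark above.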

Unlike the two-step procedure in Basse and Feller (2018), we fit the weighted least squares regression with the following inverse probability weights,

$$w_{ij} = \frac{1}{J_{A_j}} \cdot \frac{1}{n_{jZ_{ij}}}. \tag{12}$$

The next theorem shows that the resulting weighted least squares estimators are equivalent to the randomization-based ITT effect estimators. Proof is given in Appendix C.1 in the supplementary materials.

Theorem 6 (Weighted least squares regression estimators for the ITT analysis). Let $\hat\gamma^{\mathrm{wls}}$ and $\hat\alpha^{\mathrm{wls}}$ be the weighted least squares estimators of the coefficients in the models given in Equations (10) and (11), respectively. The regression weights are given in Equation (12). Then,

$$\hat\gamma^{\mathrm{wls}}_{1a} = \widehat{\mathrm{DED}}(a), \quad \hat\gamma^{\mathrm{wls}}_{a} = \widehat{D}(0, a), \quad \hat\alpha^{\mathrm{wls}}_{1a} = \widehat{\mathrm{DEY}}(a), \quad \hat\alpha^{\mathrm{wls}}_{a} = \widehat{Y}(0, a).$$

For the CADE, we consider the weighted two-stage least squares regression where the weights are the same as before and given in Equation (12). In our setting, the first-stage regression model is given by Equation (10) while the second-stage regression is given by

$$Y^*_{ij} = \sum_{a=0,1} \beta_a I(A_j = a) + \sum_{a=0,1} \beta_{1a} D^*_{ij} I(A_j = a) + \eta_{ij}, \tag{13}$$

where $\eta_{ij}$ is an error term. The weighted two-stage least squares estimators of the coefficients for the model in Equation (13) can be obtained by first fitting the model in Equation (10) with weighted least squares and then fitting the model in Equation (13) again via weighted least squares, in which $D^*_{ij}$ is replaced by its predicted values based on the first-stage regression model. The following theorem establishes the equivalence between the resulting weighted two-stage least squares regression and randomization-based estimators. Proof is given in Appendix C.2 in the supplementary materials.

Theorem 7 (Weighted two-stage least squares regression estimator for the CADE). Let $\hat\beta^{\mathrm{w2sls}}_{a}$ and $\hat\beta^{\mathrm{w2sls}}_{1a}$ be the weighted two-stage least squares estimators of the coefficients for the model given in Equation (13). The first-stage regression model is given in Equation (10), and the regression weights are given in Equation (12). Then,

$$\hat\beta^{\mathrm{w2sls}}_{1a} = \widehat{\mathrm{CADE}}(a), \quad \hat\beta^{\mathrm{w2sls}}_{a} = \widehat{Y}(0, a) - \widehat{\mathrm{CADE}}(a) \cdot \widehat{D}(0, a).$$

3.5.2. Variances

Basse and Feller (2018) showed that the cluster-robust HC2 variance (Bell and McCaffrey 2002) is equal to the randomization-based variance of the average spillover effect estimator under the assumption of equal cluster size. We first generalize this equivalence result to the case where the cluster size varies and then propose a regression-based variance estimator for the CADE estimator that is equivalent to the randomization-based variance estimator.

We begin by introducing additional notation. Let $X_j = (X_{1j}, \ldots, X_{n_j j})^\top$ be the design matrix of cluster $j$ for the model given in Equations (10) and (11) with $X_{ij} = (I(A_j = 1), I(A_j = 0), Z_{ij} I(A_j = 1), Z_{ij} I(A_j = 0))^\top$. Let $X = (X_1^\top, \ldots, X_J^\top)^\top$ be the entire design matrix, $W_j = \operatorname{diag}(w_{1j}, \ldots, w_{n_j j})$ be the weight matrix in cluster $j$, and $W = \operatorname{diag}(W_1, \ldots, W_J)$ be the entire weight matrix. We use $\hat\epsilon_j = (\hat\epsilon_{1j}, \ldots, \hat\epsilon_{n_j j})^\top$ to denote the residual vector in cluster $j$ obtained from the weighted least squares fit of the model given in Equation (11), and $\hat\epsilon = (\hat\epsilon_1^\top, \ldots, \hat\epsilon_J^\top)^\top$ to represent the residual vector for the entire sample.

Using the weights, the cluster-robust generalization of HC2 variance, $\widehat{\operatorname{var}}^{\mathrm{cluster}}_{\mathrm{hc2}}(\hat\alpha^{\mathrm{wls}})$, is given by

$$(X^\top W X)^{-1} \left\{ \sum_{j=1}^{J} X_j^\top W_j \tilde{P}_j^{-1/2} \hat\epsilon_j \hat\epsilon_j^\top \tilde{P}_j^{-1/2} W_j X_j \right\} (X^\top W X)^{-1},$$

where $\tilde{P}_j$ is the following "annihilator" matrix,

$$\tilde{P}_j = I_{n_j} - W_j^{1/2} X_j (X^\top W X)^{-1} X_j^\top W_j^{1/2},$$

with $I_{n_j}$ being the $n_j \times n_j$ identity matrix.

It can be shown that $\widehat{\operatorname{var}}^{\mathrm{cluster}}_{\mathrm{hc2}}(\hat\alpha^{\mathrm{wls}}_{1a}) = \hat\sigma_{DE}^2(a)/J_a$, representing the between-cluster sample variance. However, as shown in Theorem 4, $\widehat{\operatorname{var}}\{\widehat{\mathrm{DEY}}(a)\}$ is a weighted average of between-cluster and within-cluster sample variances. Thus, unlike the results in Basse and Feller (2018), the cluster-robust HC2 variance no longer equals the randomization-based variance estimator, because it only takes into account the between-cluster variance.

To address this problem, we introduce the following individual-robust HC2 variance, $\widehat{\operatorname{var}}^{\mathrm{ind}}_{\mathrm{hc2}}(\hat\alpha^{\mathrm{wls}})$, given by

$$(X^\top W X)^{-1} \left\{ \sum_{j=1}^{J} \sum_{i=1}^{n_j} w_{ij}^2\, \hat\epsilon_{ij}^{*2}\, \tilde{P}_{ij}^{-1} X_{ij} X_{ij}^\top \right\} (X^\top W X)^{-1},$$

where $\tilde{P}_{ij} = 1 - w_{ij} X_{ij}^\top (X_j^\top W_j X_j)^{-1} X_{ij}$ is the individual annihilator and $\hat\epsilon^*_{ij} = \hat\epsilon_{ij} - \sum_{i'=1}^{n_j} \hat\epsilon_{i'j} I(Z_{i'j} = z)/n_{jz}$ is the adjusted residual for $Z_{ij} = z$ so that we have $X_j^\top \hat\epsilon^*_j = 0_4$. The next theorem establishes that the weighted average of the cluster-robust and individual-robust HC2 variance estimators is numerically equivalent to the randomization-based variance estimator.

Theorem 8 (Regression-based variance estimators for the ITT effects). The randomization-based variance estimator of the direct effect is a weighted average of the cluster-robust and individual-robust HC2 variances,

$$\widehat{\operatorname{var}}\left\{\widehat{\mathrm{DED}}(a)\right\} = \left(1 - \frac{J_a}{J}\right) \widehat{\operatorname{var}}^{\mathrm{cluster}}_{\mathrm{hc2}}(\hat\gamma^{\mathrm{wls}}_{1a}) + \frac{J_a}{J}\, \widehat{\operatorname{var}}^{\mathrm{ind}}_{\mathrm{hc2}}(\hat\gamma^{\mathrm{wls}}_{1a}),$$

$$\widehat{\operatorname{var}}\left\{\widehat{\mathrm{DEY}}(a)\right\} = \left(1 - \frac{J_a}{J}\right) \widehat{\operatorname{var}}^{\mathrm{cluster}}_{\mathrm{hc2}}(\hat\alpha^{\mathrm{wls}}_{1a}) + \frac{J_a}{J}\, \widehat{\operatorname{var}}^{\mathrm{ind}}_{\mathrm{hc2}}(\hat\alpha^{\mathrm{wls}}_{1a}).$$

Proof is given in Appendix C.3 in the supplementary materials.

To gain some intuition about the weighted average of two robust variances, consider the following model commonly used for split-plot designs,

$$Y^*_{ij} = \sum_{a=0,1} \alpha_a I(A_j = a) + \sum_{a=0,1} \alpha_{1a} Z_{ij} I(A_j = a) + B_j + W_{ij},$$

where $B_j$ represents the random effects for whole plots (or clusters), and $W_{ij}$ is the random effects for split-plots (or individuals). The cluster-robust HC2 variance is related to $B_j$ and the individual-robust HC2 variance is related to $W_{ij}$. In Appendix C.4 in the supplementary materials, we discuss the connection between the random effects model and the randomization-based inference and explain why the adjustment for $\hat\epsilon^*_{ij}$ is necessary.

Finally, we consider the weighted two-stage least squares regression given in Equations (10) and (13). Let $M_j = (M_{1j}, \ldots, M_{n_j j})^\top$ be the design matrix for cluster $j$ in the second-stage regression with $M_{ij} = (I(A_j = 1), I(A_j = 0), \widehat{D}^*_{ij} I(A_j = 1), \widehat{D}^*_{ij} I(A_j = 0))^\top$ where $\widehat{D}^*_{ij}$ represents the fitted value given in Equation (10). Let $M = (M_1^\top, \ldots, M_J^\top)^\top$ be the entire design matrix. We use $\hat\eta_j = (\hat\eta_{1j}, \ldots, \hat\eta_{n_j j})^\top$ to denote the residual vector in cluster $j$ obtained from the model given in Equation (13), whereas $\hat\eta = (\hat\eta_1^\top, \ldots, \hat\eta_J^\top)^\top$ represents the residual vector for the entire sample. We define the cluster-robust HC2 variance, $\widehat{\operatorname{var}}^{\mathrm{cluster}}_{\mathrm{hc2}}(\hat\beta^{\mathrm{w2sls}})$, as

$$(M^\top W M)^{-1} \left\{ \sum_{j=1}^{J} M_j^\top W_j \tilde{Q}_j^{-1/2} \hat\eta_j \hat\eta_j^\top \tilde{Q}_j^{-1/2} W_j M_j \right\} (M^\top W M)^{-1}, \tag{14}$$

where $\tilde{Q}_j$ is the cluster annihilator matrix,

$$\tilde{Q}_j = I_{n_j} - W_j^{1/2} M_j (M^\top W M)^{-1} M_j^\top W_j^{1/2}.$$

The individual-robust HC2 variance, $\widehat{\operatorname{var}}^{\mathrm{ind}}_{\mathrm{hc2}}(\hat\beta^{\mathrm{w2sls}})$, is given by

$$(M^\top W M)^{-1} \left\{ \sum_{j=1}^{J} \sum_{i=1}^{n_j} w_{ij}^2\, \hat\eta_{ij}^{*2}\, \tilde{Q}_{ij}^{-1} M_{ij} M_{ij}^\top \right\} (M^\top W M)^{-1},$$

where $\tilde{Q}_{ij} = 1 - w_{ij} M_{ij}^\top (M_j^\top W_j M_j)^{-1} M_{ij}$ is the individual annihilator and $\hat\eta^*_{ij} = \hat\eta_{ij} - \sum_{i'=1}^{n_j} \hat\eta_{i'j} I(Z_{i'j} = z)/n_{jz}$ for $Z_{ij} = z$ is the adjusted residual with $X_j^\top \hat\eta^*_j = 0_4$. As in the case of ITT analysis, we can show that the weighted average of cluster-robust and individual-robust variance estimators is numerically equivalent to the randomization-based variance estimator.

Theorem 9 (Regression-based variance estimator for the CADE). The randomization-based variance estimator of the average complier direct effect is a weighted average of the cluster-robust and individual-robust HC2 variances,

$$\widehat{\operatorname{var}}\left\{\widehat{\mathrm{CADE}}(a)\right\} = \left(1 - \frac{J_a}{J}\right) \widehat{\operatorname{var}}^{\mathrm{cluster}}_{\mathrm{hc2}}(\hat\beta^{\mathrm{w2sls}}_{1a}) + \frac{J_a}{J}\, \widehat{\operatorname{var}}^{\mathrm{ind}}_{\mathrm{hc2}}(\hat\beta^{\mathrm{w2sls}}_{1a}).$$

Proof is given in Appendix C.5 in the supplementary materials.

4. Empirical Analysis

In this section, we analyze the data introduced in Section 2 by applying the proposed methodology. We focus on the annual household hospital expenditure, which ranges from 0 to INR 500,000 (USD 28,831) with the median value of 1,000 (USD 58). The outcome is missing for 926 households, which is less than 10% of the sample. For simplicity, we discard the observations with missing data from the current analysis and leave the development of a method for analyzing two-stage randomized experiments with missing data to future research.

As expected, the enrollment rate in the villages assigned to the "High" assignment mechanism is 67.0%, whereas the enrollment rate in the villages under the "Low" assignment mechanism is just 46.2%. Because the encouragement proportion is 80% under the "High" assignment mechanism and 40% under the "Low" assignment mechanism, this implies the existence of two-sided noncompliance, in which some households in the treatment group did not receive the treatment and others in the control group managed to receive it.

Table 2 presents the estimates of ITT effects and complier average direct and spillover effects. We show the results for both the individual/household-weighted and cluster/village-weighted estimands. In the top row, we show the estimated average direct effect on enrollment in RSBY under the "High" treatment mechanism (DED(1)) and under the "Low" treatment mechanism (DED(0)) as well as the estimated average spillover effects under the treatment condition (SED(1)) and the control condition (SED(0)). The treatment assignment is estimated to increase the enrollment rate by more than 40 percentage points. This quantity represents the estimated proportions of compliers.

Table 2. Estimated intention-to-treat (ITT) and complier average direct and spillover effects.

Enrollment in RSBY       DED(1)           DED(0)          SED(1)               SED(0)
  Household-weighted     0.482 (0.023)    0.441 (0.021)   0.086 (0.053)        0.045 (0.028)
  Village-weighted       0.457 (0.019)    0.445 (0.017)   0.044 (0.018)        0.031 (0.021)

Hospital expenditure     DEY(1)           DEY(0)          SEY(1)               SEY(0)
  Household-weighted     −795 (514)       875 (530)       −1374 (823)          297 (858)
  Village-weighted       −222 (575)       1666 (734)      −1677 (972)          211 (761)

Hospital expenditure     CADE(1)          CADE(0)         CASE(1)              CASE(0)
  Household-weighted     −1649 (1061)     1984 (1215)     −15,900 (15,342)     6568 (18,305)
  Village-weighted       −485 (1258)      3752 (1652)     −38,341 (26,845)     6846 (25,042)

NOTE: For the "household-weighted" estimates, we equally weight households, whereas each village is equally weighted for the "village-weighted" estimates. The top row presents the average direct effects on enrollment in RSBY under the high treatment mechanism (DED(1)) and under the low treatment mechanism (DED(0)) as well as the average spillover effects under the treatment (SED(1)) and control (SED(0)) conditions. The middle row presents the same set of ITT estimates for hospital expenditure. Finally, the bottom row presents the complier average direct effect under the "High" (CADE(1)) and "Low" (CADE(0)) treatment assignment mechanisms as well as the complier average spillover effect under the treatment (CASE(1)) and control (CASE(0)) conditions.
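As a quick arithmetic check on Table 2, the CADE estimates in the bottom row should equal the ratio of the corresponding ITT estimates on expenditure and on enrollment. The sketch below recomputes those ratios from the rounded table entries. The reported CASE values also appear to follow the same ratio form (SEY over SED); that is an observation from the numbers in the table rather than a statement made in this section. Small discrepancies are expected because the inputs are rounded. The dictionary layout and names are illustrative.

```python
# (ITT effect on expenditure, ITT effect on enrollment, reported complier effect),
# all copied from Table 2.
table2 = {
    ("household", "CADE(1)"): (-795.0, 0.482, -1649.0),
    ("household", "CADE(0)"): (875.0, 0.441, 1984.0),
    ("household", "CASE(1)"): (-1374.0, 0.086, -15900.0),
    ("household", "CASE(0)"): (297.0, 0.045, 6568.0),
    ("village", "CADE(1)"): (-222.0, 0.457, -485.0),
    ("village", "CADE(0)"): (1666.0, 0.445, 3752.0),
    ("village", "CASE(1)"): (-1677.0, 0.044, -38341.0),
    ("village", "CASE(0)"): (211.0, 0.031, 6846.0),
}


def ratio_estimate(itt_outcome, itt_receipt):
    """Ratio of two ITT estimates, the form of the CADE estimator."""
    return itt_outcome / itt_receipt


def relative_gap(key):
    """Relative difference between the recomputed ratio and the reported value."""
    itt_outcome, itt_receipt, reported = table2[key]
    return abs(ratio_estimate(itt_outcome, itt_receipt) - reported) / abs(reported)


for key in table2:
    itt_outcome, itt_receipt, reported = table2[key]
    print(key, round(ratio_estimate(itt_outcome, itt_receipt), 1), reported)
```

The spillover ratios divide by estimated effects on enrollment that are close to zero (between 0.031 and 0.086), which is one way to see why the CASE estimates in the table are so imprecise.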

Interestingly, we find that the average spillover effects are estimated to be positive and, sometimes, statistically significant. In particular, the village-weighted average spillover effect on enrollment is 4.4 percentage points with the standard error of 1.8 under the treatment condition. The finding implies that assigning a greater proportion of households to the treatment condition makes another household of the same village more likely to enroll in RSBY especially if the latter is also encouraged to enroll.³

The middle row of Table 2 presents the estimated ITT effects on the outcome. The estimated average direct effect under the "Low" assignment mechanism (DEY(0)) tends to be positive where the village-weighted estimate is statistically significant. In contrast, the estimated average direct effect under the "High" assignment mechanism (DEY(1)) is negative although not statistically significant.⁴

One possible explanation for this difference is that the assignment to the treatment condition makes people visit hospitals more often and spend more on healthcare so long as fewer households within the same village are assigned to the treatment. When a large number of households within the same village are assigned to the treatment condition, the overcrowding of hospitals may reduce hospital visits of each treated household. We examine the plausibility of this explanation by estimating the direct effect of the treatment assignment on the number of hospital visits. The estimated direct effect under the "High" treatment assignment mechanism is −0.157, whereas that under the "Low" treatment assignment mechanism is 0.132. Although these estimates are not statistically significant, they provide suggestive evidence consistent with the overcrowding hypothesis.

The bottom row of Table 2 presents the estimates of the complier average direct and spillover effects. The village-weighted CADE under the "Low" assignment mechanism (CADE(0)) is positive and statistically significant, implying that enrollment in RSBY directly increases the household hospital expenditure when few households are assigned to the treatment condition. In contrast, the CADE under the "High" assignment mechanism (CADE(1)) is negative. This difference is consistent with the overcrowding hypothesis discussed above.

In addition, we also estimate the CASEs. Unfortunately, they are imprecisely estimated, making it difficult to draw a definite conclusion about whether or not the proportion of treated households in a village directly affects one's outcome among those who enroll in RSBY only when a greater proportion of households is encouraged to sign up for the insurance program.

Because most of the estimates are not statistically significant, it is difficult to draw a definitive conclusion. However, our analysis provides some suggestive policy recommendations. First, the estimated positive spillover effect of encouragement on enrollment suggests that the government could increase the enrollment rate by leveraging existing social networks among households within each village. Second, the estimated negative CADE under the "High" treatment assignment mechanism condition suggests that there might be overcrowding of local hospitals when many households newly enroll in the RSBY. The government can address this issue by increasing the capacity of local hospitals.

In addition to the quantities in our analysis above, we may also be interested in other quantities, for example, the average spillover effect of the treatment assignment when all households are assigned to the treatment condition versus when all households are assigned to the control condition, and the direct effect of one's own treatment receipt when the treatment receipts of other households are fixed at some constant levels. Unfortunately, without modeling assumptions, we are unable to identify these quantities. In Appendix E in the supplementary materials, we propose a model-based approach and estimate these quantities using our application data.

³ This result should be viewed as an illustration that spillovers are possible in this study. We leave a more complete examination of spillover effects to subsequent work based on the prespecified analysis.

⁴ This result should also be viewed as illustrative. Analysis pursuant to the prespecified analysis is left to future work.

5. Concluding Remarks

In this article, we consider two-stage randomized experiments with noncompliance and interference. We merge two strands of the causal inference literature, one on experiments with noncompliance and the other on experiments with interference. We introduce new causal quantities of interest, propose nonparametric identification results and consistent estimators, and derive their variances. We connect these randomization-based estimators to two-stage least squares regressions that are commonly used by applied researchers. We apply the proposed methodology to evaluate the efficacy of India's National Health Insurance program (RSBY) and find some evidence of spillover effects. We believe that the proposed methodology can help applied researchers make the best use of this effective experimental design for studying interference problems. In future research, it is of interest to relax the assumption of partial interference and allow for interference between units of different clusters.

Supplementary Materials

The supplementary appendix includes the mathematical proofs, additional simulation results, and an alternative parametric modeling strategy.

Acknowledgments

The proposed methodology is implemented via an open-source software package experiment (Imai and Jiang 2018), which is available at https://cran.r-project.org/package=experiment. We thank Naoki Egami for helpful comments. We thank the editor, associate editor, and three reviewers for careful reading and many constructive comments.

Funding

This study was partially funded by grants from the Law School, the MacLean Center for Bioethics, and the Becker-Friedman Institute at the University of Chicago, U.S.; the Department for International Development, U.K.; the International Growth Centre, U.K.; the Tata Trusts through the Tata Centre for Development at the University of Chicago; and SRM University, Andhra Pradesh, India.

ORCID

Kosuke Imai http://orcid.org/0000-0002-2748-1022
References

Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996), “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association, 91, 444–455. [632,636,637]
Aronow, P. M. (2012), “A General Method for Detecting Interference Between Units in Randomized Experiments,” Sociological Methods & Research, 41, 3–16. [632]
Aronow, P. M., and Samii, C. (2017), “Estimating Average Causal Effects Under General Interference,” Annals of Applied Statistics, 11, 1912–1947. [632]
Athey, S., Eckles, D., and Imbens, G. W. (2018), “Exact p-Values for Network Interference,” Journal of the American Statistical Association, 113, 230–240. [632]
Baird, S., Bohren, J. A., McIntosh, C., and Özler, B. (2018), “Optimal Design of Experiments in the Presence of Interference,” The Review of Economics and Statistics, 100(5), 844–860. [632,640]
Basse, G., and Feller, A. (2018), “Analyzing Two-Stage Experiments in the Presence of Interference,” Journal of the American Statistical Association, 113, 41–55. [632,633,635,640,641]
Bell, R. M., and McCaffrey, D. F. (2002), “Bias Reduction in Standard Errors for Linear Regression With Multi-Stage Samples,” Survey Methodology, 28, 169–181. [641]
Berman, P., Ahuja, R., and Bhandari, L. (2010), “The Impoverishing Effect of Healthcare Payments in India: New Methodology and Findings,” Economic and Political Weekly, 45, 65–71. [633]
Chin, A. (2018), “Central Limit Theorems via Stein’s Method for Randomized Experiments Under Interference,” arXiv no. 1804.03105. [640]
Fisher, R. A. (1935), The Design of Experiments, London: Oliver and Boyd. [632]
Forastiere, L., Airoldi, E. M., and Mealli, F. (2016), “Identification and Estimation of Treatment and Interference Effects in Observational Studies on Networks,” arXiv no. 1609.06245. [632]
Hájek, J. (1960), “Limiting Distributions in Simple Random Sampling From a Finite Population,” Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 361–374. [638,640]
Halloran, M. E., and Struchiner, C. J. (1995), “Causal Inference in Infectious Diseases,” Epidemiology, 6, 142–151. [635]
Holland, P. W. (1986), “Statistics and Causal Inference” (with discussion), Journal of the American Statistical Association, 81, 945–960. [632,634]
Hong, G. (2015), Causality in a Social World: Moderation, Mediation and Spill-Over, Chichester: Wiley. [632]
Hong, G., and Raudenbush, S. W. (2006), “Evaluating Kindergarten Retention Policy: A Case Study of Causal Inference for Multilevel Observational Data,” Journal of the American Statistical Association, 101, 901–910. [635]
Hudgens, M. G., and Halloran, M. E. (2008), “Toward Causal Inference With Interference,” Journal of the American Statistical Association, 103, 832–842. [632,634,635,636,638,639,640]
Imai, K., and Jiang, Z. (2018), “Experiment: R Package for Designing and Analyzing Randomized Experiments,” Comprehensive R Archive Network (CRAN), available at https://CRAN.R-project.org/package=experiment. [633,643]
Jowett, M. (2003), “Do Informal Risk Sharing Networks Crowd Out Public Voluntary Health Insurance? Evidence From Vietnam,” Applied Economics, 35, 1153–1161. [633]
Kang, H., and Imbens, G. (2016), “Peer Encouragement Designs in Causal Inference With Partial Interference and Identification of Local Average Network Effects,” arXiv no. 1609.04464. [632,638]
Lehmann, E. L. (2004), Elements of Large-Sample Theory, Boston, MA: Springer. [638]
Li, X., and Ding, P. (2017), “General Forms of Finite Population Central Limit Theorems With Applications to Causal Inference,” Journal of the American Statistical Association, 112, 1759–1769. [640]
Lin, W., Liu, Y., and Meng, J. (2014), “The Crowding-Out Effect of Formal Insurance on Informal Risk Sharing: An Experimental Study,” Games and Economic Behavior, 86, 184–211. [633]
Liu, L., and Hudgens, M. G. (2014), “Large Sample Randomization Inference of Causal Effects in the Presence of Interference,” Journal of the American Statistical Association, 109, 288–301. [632]
Neyman, J. (1923), “On the Application of Probability Theory to Agricultural Experiments: Essay on Principles, Section 9” (Translated in 1990), Statistical Science, 5, 465–480. [632,634]
Ohlsson, E. (1989), “Asymptotic Normality for Two-Stage Sampling From a Finite Population,” Probability Theory and Related Fields, 81, 341–352. [640]
Rosenbaum, P. R. (2007), “Interference Between Units in Randomized Experiments,” Journal of the American Statistical Association, 102, 191–200. [632]
Rubin, D. B. (1990), “Comments on ‘On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9’ by J. Splawa-Neyman Translated from the Polish and edited by D. M. Dabrowska and T. P. Speed,” Statistical Science, 5, 472–480. [632,634]
Sävje, F., Aronow, P. M., and Hudgens, M. G. (2017), “Average Treatment Effects in the Presence of Unknown Interference,” arXiv no. 1711.06399. [638,639]
Shahrawat, R., and Rao, K. D. (2011), “Insured Yet Vulnerable: Out-of-Pocket Payments and India’s Poor,” Health Policy and Planning, 27, 213–221. [633]
Sobel, M. E. (2006), “What Do Randomized Studies of Housing Mobility Demonstrate? Causal Inference in the Face of Interference,” Journal of the American Statistical Association, 101, 1398–1407. [632,635]
Tchetgen Tchetgen, E. J., and VanderWeele, T. J. (2010), “On Causal Inference in the Presence of Interference,” Statistical Methods in Medical Research, 21, 55–75. [632]
VanderWeele, T. J., Hong, G., Jones, S. M., and Brown, J. L. (2013), “Mediation and Spillover Effects in Group-Randomized Trials: A Case Study of the 4Rs Educational Intervention,” Journal of the American Statistical Association, 108, 469–482. [632]