0% found this document useful (0 votes)

80 views33 pages

Individualized Policy Evaluation and Learning Under Clustered Network Interference

Uploaded by

nwobodope

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

80 views33 pages

Individualized Policy Evaluation and Learning Under Clustered Network Interference

Uploaded by

nwobodope

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 33

Individualized Policy Evaluation and Learning under

Clustered Network Interference∗

Yi Zhang† Kosuke Imai‡

February 6, 2024
arXiv:2311.02467v2 [stat.ME] 4 Feb 2024

Abstract
While there now exists a large literature on policy evaluation and learning, much of prior
work assumes that the treatment assignment of one unit does not affect the outcome of another
unit. Unfortunately, ignoring interference may lead to biased policy evaluation and ineffective
learned policies. For example, treating influential individuals who have many friends can gen-
erate positive spillover effects, thereby improving the overall performance of an individualized
treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under
clustered network interference (also known as partial interference) where clusters of units are
sampled from a population and units may influence one another within each cluster. Unlike
previous methods that impose strong restrictions on spillover effects, the proposed methodol-
ogy only assumes a semiparametric structural model where each unit’s outcome is an additive
function of individual treatments within the cluster. Under this model, we propose an estimator
that can be used to evaluate the empirical performance of an ITR. We show that this estimator
is substantially more efficient than the standard inverse probability weighting estimator, which
does not impose any assumption about spillover effects. We derive the finite-sample regret
bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the
improved performance of learned policies. Finally, we conduct simulation and empirical studies
to illustrate the advantages of the proposed methodology.

Keywords: individualized treatment rules, partial interference, randomized experiments, spillover

effects

∗
We thank Georgia Papadogeorgou, Davide Viviano, and anonymous reviewers of the Alexander and Diviya
Magaro Peer Pre-Review Program for useful comments.
†
Ph.D. Student, Department of Statistics, Harvard University. 1 Oxford Street, Cambridge MA 02138. Email:
yi [email protected]
‡
Professor, Department of Government and Department of Statistics, Harvard University. 1737 Cambridge
Street, Institute for Quantitative Social Science, Cambridge MA 02138. Email: [email protected] URL:
https://imai.fas.harvard.edu

1
1 Introduction
Over the past decade, a number of scholars across various disciplines have studied the problem of
developing optimal individualized treatment rules (ITRs) that maximize the average outcome in a
target population (e.g., Imai and Strauss, 2011; Zhang et al., 2012; Zhao et al., 2012; Swaminathan
and Joachims, 2015; Kitagawa and Tetenov, 2018; Athey and Wager, 2021; Zhang et al., 2022).
Beyond the academia, these methods have played an essential role in the implementation of per-
sonalized medicine and micro-targeting in advertisement and political campaigns. Moreover, new
methodologies have been developed to evaluate the empirical performance of learned ITRs (Imai
and Li, 2023).
Much of this existing policy evaluation and learning literature assumes no interference between
units, i.e., one’s outcome is not affected by the treatments of others. Yet, in the real world appli-
cations, such spillover effects are the norm rather than an exception. This means that there is a
potential to exploit the existence of spillover effects when learning an optimal ITR by incorporating
the information about individuals and their network relations. For example, assigning influential
and well-connected students to an anti-bullying program may more effectively reduce the number
of conflicts within a school (Paluck et al., 2016). Another example is that individuals who are at
the center of social network can spread information more widely (Banerjee et al., 2019).
Despite these potential advantages, there exist some key methodological challenges when learn-
ing ITRs in the presence of interference between units. First, the structure of spillover effects is
often little understood. It is difficult to obtain the information about people’s relationships, and
the existence of unobserved networks can invalidate the performance evaluation of learned ITRs
(Egami, 2021). Second, the total number of possible treatment allocations increases exponentially,
leading to the difficulty of inferring causal effects of high-dimensional treatments. Thus, efficient
individualized policy learning and evaluation require an assumption that is sufficiently informa-
tive to constrain the structure of spillover effects. At the same time, we must avoid unrealistic
assumptions. In particular, many researchers assume anonymous (stratified) interference where
spillover effects are determined by the number of treated neighbors regardless of which neighbors
are treated (Hudgens and Halloran, 2008; Liu and Hudgens, 2014; Viviano, 2024). Yet, in the real
world, the way in which one unit’s treatment influences another unit’s outcome often depends on
their specific relationship. Our goal is to relax this anonymous interference assumption by allowing
for heterogeneous spillover effects for efficient policy evaluation and learning.
In this paper, we consider individualized policy evaluation and learning under clustered (partial)
network interference (see Section 2; Sobel, 2006; Hudgens and Halloran, 2008; Tchetgen Tchetgen
and VanderWeele, 2012). Under this setting, units are grouped into non-overlapping clusters and
interference arises within each of these disjoint clusters rather than between them. In other words,
the outcome of one unit is possibly affected by the treatments of the other units in the same cluster
but not by those of units in other clusters. We focus on the experimental settings where the treat-
ment assignment probabilities are known, though we briefly discuss an extension to observational
studies with unknown propensity scores.
We propose an individualized policy evaluation and learning methodology based on a semi-
parametric structural model (Section 3). Specifically, we assume that each individual’s conditional
mean outcome function is additive in the treatment vector of all individuals within the same clus-
ter. Importantly, under this additivity assumption, the proposed model uses individual-specific
nonparametric functions that place no restriction on the heterogeneity of spillover effects. The
model, for example, accommodates the possibility that well-connected units within a cluster have
a greater influence on other units. Indeed, this semiparametric model contains as a special case the
standard parametric model based on the anonymous (stratified) interference assumption.

2
Next, we introduce a new policy evaluation estimator that exploits the proposed semipara-
metric model (Section 4). Our estimator, which we call the additive inverse-probability weighting
(addIPW) estimator, leverages the semiparametric structural assumption of spillover effects with-
out the need to fit an outcome model. We show that the addIPW estimator is unbiased and is
more efficient than the standard IPW estimator, which makes no assumption about the structure of
within-cluster spillover effects (Hudgens and Halloran, 2008; Liu et al., 2016; Tchetgen and Vander-
Weele, 2012). Finally, using this addIPW estimator, we find an optimal ITR within a pre-specified
policy class. Following the previous works (Kitagawa and Tetenov, 2018; Athey and Wager, 2021;
Zhou et al., 2023), we show that the empirical policy optimization problem can be formulated as a
mixed-integer linear program, which can be solved with off-the-shelf optimization tools.
We theoretically and empirically evaluate the performance of the proposed individualized pol-
icy learning methodology. We establish a finite-sample regret and conduct simulation studies
(Section 6). Both of these analyses demonstrate the superiority of the proposed methodology over
the optimal policy learned with the standard IPW estimators. Furthermore, we briefly discuss an
extension of our methodology to observational studies, where treatment assignment probabilities
are unknown (Section 5). While complete development is left for future work, we introduce an effi-
cient semiparametric doubly robust estimator under our additivity assumption, which enables the
use of flexible machine learning methods to estimate unknown propensity score functions. Finally,
in Section 7, we apply our methodology and learn an optimal ITR to increase the household-level
school attendance among Colombian schoolchildren through its conditional cash transfer program.

Related work. Numerous scholars have studied the problem of interference between units (e.g.,
Liu et al., 2016; Aronow and Samii, 2017; Basse and Feller, 2017; Athey et al., 2018; Leung, 2020;
Imai et al., 2021; Sävje et al., 2021; Hu et al., 2022; Li and Wager, 2022; Puelz et al., 2022;
Gao and Ding, 2023; Chattopadhyay et al., 2023). Much of this literature, however, has focused
upon the estimation of various causal effects including spillover and diffusion effects. In contrast,
many researchers across fields have studied policy learning and evaluation with known treatment
assignment probabilities (e.g., Zhang et al., 2012; Zhao et al., 2012; Swaminathan and Joachims,
2015; Kitagawa and Tetenov, 2018; Imai and Li, 2023; Jin et al., 2022) or unknown propensity
scores (e.g., Kallus, 2018; Athey and Wager, 2021; Zhou et al., 2023; Chernozhukov et al., 2019). A
vast majority of this policy learning literature relies upon the assumption of no interference between
units. In contrast to these existing works, we consider the problem of policy learning and evaluation
under clustered network interference where units influence one another within each cluster.
A relatively small number of studies have addressed the challenge of interference when learning
optimal ITRs. Some utilized parametric outcome models (Kitagawa and Wang, 2023; Ananth,
2020), while others adopted an exposure mapping approach (Ananth, 2020; Viviano, 2024; Park
et al., 2023). These models often impose strong functional form assumptions on the conditional
mean outcome model. In particular, a vast majority of previous studies, if not all, rely on anony-
mous (or stratified) interference where spillover effects are assumed to be a function of the number
of treated units in a cluster (Viviano, 2024; Ananth, 2020; Park et al., 2023). Our approach avoids
placing these restrictive assumptions on the structure of spillover effects.
Two of the aforementioned studies are closely related to our work. First, Viviano (2024) assumes
anonymous interference but is able to develop an optimal policy learning methodology under a single
network setting. Our methodology avoids the anonymous interference assumption, but requires a
random sample of clusters from a target population. Second, Park et al. (2023) study policy learning
under the same clustered network interference settings. The authors consider an optimal cluster-
level treatment policy that suggests the minimum proportion of treated units required within a

3
cluster to achieve a pre-defined target average outcome level. A limitation of their methodology is
its inability to specify which individual units within a cluster should receive treatment. In contrast,
we propose an individualized policy learning methodology, which optimally assigns treatments to
individuals based on the information about the individual and network characteristics.
There also exists a literature on policy evaluation under clustered network interference. For
example, Tchetgen Tchetgen and VanderWeele (2012) and Liu and Hudgens (2014) study causal
effect estimates under a “type-B” policy, where units independently select to receive the treatment
with a uniform probability. In addition, Papadogeorgou et al. (2019) and Barkley et al. (2020)
propose policy-relevant causal estimands based on a shift in parametric propensity score models,
whereas Lee et al. (2022) introduce an incremental propensity score intervention that further relaxes
these parametric assumptions. In contrast, we propose an efficient policy evaluation estimator by
leveraging the semiparametric additivity assumption that places a relatively weak restriction on the
structure of spillover effects. Our semiparametric model is closer to the one recently considered by
Yu et al. (2022) who use the model to estimate treatment effects in a design-based single network
setting.
As discussed later, a fundamental challenge of policy learning and evaluation under clustered
network intereference is that the treatment assignment is high-dimensional. In particular, the
number of possible treatment combinations grow exponentially as the cluster size increases. The
problem of policy learning and evaluation with high-dimensional treatments has been studied in
different contexts. For example, Xu et al. (2023) examine policy learning with optimal combinations
of multiple treatments, while Chernozhukov et al. (2019) study policy learning with continuous
treatments.

2 The Problem Statement

We begin by describing the setup and notation of individualized policy learning under clustered
network interference.

2.1 Setup and Notation

Consider a setting where observed units can be partitioned into clusters, such as households, class-
rooms, or villages. Let Mi denote the number of units in cluster i ∈ {1, 2, . . . , n}. For unit
j ∈ {1, 2, . . . , Mi } in cluster i, we let Yij ∈ R represent the observed outcome, Aij ∈ {0, 1}
denote the binary treatment condition assigned to this unit, and Xij ∈ Rp be the vector of p-
dimensional pre-treatment covariates. We use Yi = (Yi1 , Yi2 , . . . , YiMi )⊤ , Ai = (Ai1 , Ai2 , . . . , AiMi )⊤
and Xi = (Xi1 , Xi2 , . . . , XiMi )⊤ to denote the cluster-level vectors of outcome and treatment, and
the cluster-level matrix of pre-treatment covariates, respectively. Lastly, we use A, X , and Y to
represent the support of Aij , Xij , and Yij . Thus, A (mi ) = {0, 1}mi is the set of all the possible
2mi combinations of individual-level treatment assignments within a cluster of size mi .
Throughout the paper, we assume clustered network interference, also called partial interference,
where interference between individuals only occur within a cluster rather than across different
clusters. This assumption is widely used in the literature when units are partitioned into non-
overlapping clusters (e.g., Sobel, 2006; Hudgens and Halloran, 2008; Liu et al., 2016; Basse and
Feller, 2017; Liu et al., 2019; Imai et al., 2021). Under this assumption, we define the potential
outcome of one unit as a function of their own treatment as well as the treatment of others in the
same cluster. Formally, we let Yij (ai ) denote the potential outcome of unit j in cluster i when
the cluster is assigned to the treatment vector ai = (ai1 , ai2 , . . . , aiMi ) ∈ A (Mi ) and define the
cluster-level potential outcome vector by Yi (ai ) = (Yi1 (ai ) , Yi2 (ai ) , . . . , YiMi (ai ))⊤ . Additionally,

4
we
P make the standard consistency assumption that the vector of observed outcome, i.e., Yi =
ai ∈A(Mi ) Yi (ai ) 1(Ai = ai ). Throughout, we assume that clusters are sampled from a super-
population according to a distribution O, where O represents the joint distribution of independently
and identically distributed random vectors Oi = {Yi (ai )}ai ∈A(Mi ) , Ai , Mi , Xi . For notational
simplicity, in the rest of the paper we will include Mi as one of the pre-treatment covariates in Xi
unless explicitly mentioned.
Consider the cluster-level (generalized) propensity score under clustered network interference
by jointly modelling all treatment assignments in a cluster, i.e., e (ai | Xi ) := P(Ai = ai | Xi ).
This represents the probability of observing treatment vector ai ∈ A (Mi ) given the cluster-level
covariates Xi ∈ X (Mi ), where X (Mi ) is the support of Xi for a cluster of size Mi . The following
assumption is maintained throughout this paper.

Assumption 1 (Strong Ignorability of Treatment Assignment). The following conditions hold for
all ai ∈ A (Mi ) and xi ∈ X (Mi ).

(a) Unconfoundedness: Yi (ai ) ⊥

⊥ Ai | Xi = xi

(b) Positivity: ∃ ϵ > 0 such that ϵ < e (ai | xi ) < 1 − ϵ

In the paper, we focus on the experimental setting, where the propensity score is known and hence
Assumption 1 can be satisfied by design. In Section 5, we briefly consider the extension of our
methodology to observational studies where the treatment assignment probabilities are unknown
but Assumption 1 is still satisfied.

2.2 Individualized Policy Learning Problem

Without loss of generality, we assume that a greater value of outcome is preferable. Thus, through-
out the paper, we focus on learning an optimal individualized treatment rule (ITR) that maximizes
the within-cluster average of outcome while allowing for possible interference between units within
each cluster. While the individual outcome of interest is aggregated to the cluster-level, the learned
policy will be a function of both individual-level and cluster-level characteristics, thereby providing
a guidance to policy makers about which individuals in each cluster should receive the treatment.
We define an ITR as a mapping from the individual-level covariates Xij to a binary treatment
decision for an individual unit, i.e., π : X → {0, 1}. The cluster-level vector of treatment assignment
under a policy π, therefore, is given by

{π(Xij )}M Mi
j=1 = (π(Xi1 ), . . . , π(XiMi )) ∈ {0, 1} .
i

In practice, researchers can also specify a class of policies Π that incorporate various constraints.
For example, the linear policy class is defined as,

Π = {π : X → {0, 1} | π(X) = 1(γ0 + X ⊤ γ ≥ 0), γ0 ∈ R, γ ∈ Rp }.

Other forms of policies, such as decision trees and decision tables, have also been considered (e.g.,
Athey and Wager, 2021; Ben-Michael et al., 2021; Jia et al., 2023; Zhou et al., 2023).
Unlike cluster-level policies considered in the literature (Park et al., 2023), our individualized
policy learning formulation enables different treatment decisions for units within the same cluster.
This potentially yields a much improved outcome and allows for individual-level cost, fairness, and
other considerations. Indeed, Xij may include not only the attributes of unit j, but also those of
its neighbors or friends within the same cluster i, as well as cluster-level characteristics such as

5
cluster size Mi or other network attributes. It is also possible to include the interactions between
these individual, network, and cluster-level characteristics.
While we allow policies to depend on any covariates at both individual and cluster levels, we
impose a restriction that each individual’s treatment decision does not directly depend on those of
others. Thus, individualization is achieved through covariates rather than formulating a different
treatment rule π for each individual. In other words, our policy class precludes any joint treatment
rules across individuals — e.g., assign treatment to no more than two siblings of a household —
even when such policies achieve better outcomes.
Although a policy class that includes joint treatment rules is more general, specifying and
optimizing such a policy when the cluster size varies leads to additional parameterization and
optimization challenges. In contrast, our individualized policies have a benefit of being easily
applicable to individuals in clusters of different sizes because their inputs have a fixed number
of dimensions. In addition, since our proposed policies employ a uniform decision rule for all
units, they are much easier to optimize than joint treatment assignment rules, which typically
involve high-dimensional products of different individual-level parameters and induce a complex
dependency structure among individual treatment decisions.
To evaluate a policy π ∈ Π, we follow the existing work and focus on the population mean of
the potential outcome distribution. A key departure from policy learning without interference is
that we must consider how each individual’s outcome depends on the treatment assignments of the
other units in the same cluster. Specifically, we define the value of policy π as follows:
 
Mi
1 X
V (π) = EO  Yij {π(Xij )}M i
j=1
, (1)
Mi
j=1

where the expectation is taken over the target super-population of clusters O. Given a pre-specified
policy class Π, we wish to find an optimal policy π ∗ within this class that maximizes the policy
value, i.e.,
π ∗ ∈ argmax V (π). (2)
π∈Π

As mentioned above, our framework does not allow for a policy class that directly constrains
joint treatment decisions across individuals. However, it is possible to discourage (or encourage)
certain joint treatment decisions by incorporating a treatment cost function that depends on the
treatment decisions of multiple individual units within the same cluster. For example, in Section 7,
we use a cost function that is proportional to the total number of units who receive treatment
within each cluster.
Finally, we define the regret of policy π as the difference in value between the optimal policy
π ∗ and the policy under consideration:

R(π) = V (π ∗ ) − V (π). (3)

Our goal is to learn a policy with the minimal regret within a policy class whose complexity is
bounded.

3 Policy Evaluation
Before we turn to the problem of learning an optimal ITR, we must identify and estimate the
policy value V (π) defined in Equation (1) for any given policy π. We first show that the high-

6
dimensional nature of treatments under clustered network interference makes the standard IPW
estimator inefficient. To address this challenge, we propose a semiparametric model that imposes
a constraint on the structure of spillover effects while allowing for unknown heterogeneity in how
units affect one another in the same cluster. We then propose a new efficient estimator of policy
value that exploits this semiparametric outcome model.

3.1 The Standard Inverse-probability-weighting (IPW) Estimator

Under Assumption 1, the policy value function in Equation (1) can be identified as
 
Mi
1 X h i
V (π) = E  E Yij | Ai = {π(Xij )}M i
j=1 , Xi
.
Mi
j=1

Then, the following standard IPW estimator can be used to estimate the policy value V (π),
n o
n 1 Ai = {π(Xij )}Mi
1 X j=1
Vb IPW (π) = Y i, (4)
n e({π(Xij )}M
i=1
i
j=1 | X i )
PMi
where Y i = j=1 Yij /Mi is the cluster-level average of the observed individual outcomes. The
weight for cluster i is given by the reciprocal of the cluster-level propensity score.
As noted earlier, we consider an experimental study, where the propensity score is known. Under
this setting, it is straightforward to show that this standard IPW estimator is both unbiased and
√
n−consistent. Similar IPW estimators have been applied to policy learning without interference
(e.g., Swaminathan and Joachims, 2015; Zhang et al., 2012; Zhao et al., 2012; Kitagawa and Tetenov,
2018). Indeed, the IPW estimator given in Equation (4) is a natural extension of the standard IPW
estimator to clustered network interference and has been used for the estimation of treatment effects
in the presence of interference (Tchetgen and VanderWeele, 2012; Liu et al., 2016; Papadogeorgou
et al., 2019; Imai et al., 2021).
While Assumption 1 is sufficient for causal identification, IPW estimators often suffer from a
large variability with a data set of moderate size. This issue is exacerbated under the clustered
network interference settings because Ai is a high-dimensional treatment vector. The positivity
assumption (Assumption 1(b)) means that the propensity score must have positive values for all
possible treatment combinations. However, because there exist a large number of treatment com-
binations, the probability of any treatment assignment combination can take an extremely small
value, leading to a high variance. As an example, consider an Bernoulli randomization design
where units are independently assigned to the treatment condition with probability q ≤ 0.5. Then,
the inverse propensity score for treating all units in the same cluster is given by 1/q Mi , with the
denominator scaling exponentially in cluster size Mi . This can directly lead to a large variance
of the IPW estimator for evaluating the value of the treat-everyone policy, even with a moderate
cluster size Mi . In practice, therefore, we may not even observe a single cluster whose treatment
vector aligns with the treatment assignment under a given policy especially when the number of
clusters is small but the cluster size is large. In such cases, the IPW estimator is not applicable as
one cannot estimate the policy value.
A large variance of the standard IPW estimator will negatively affect the performance of sub-
sequent policy learning, which optimizes the empirical estimate of the value Vb IPW (π) across all
policies in policy class Π. A simulation study in Section 6 demonstrates that a learned policy based
on Vb IPW (π) can exhibit a slow rate of learning and substantially deviate from an optimal policy.

7
An alternative is to use an efficient semiparametric estimator that is known to be asymptotically
efficient, and often exhibit an improved finite-sample performance relative to the standard IPW
estimator (Park and Kang, 2022). Unfortunately, these efficient semiparametric estimators may
still suffer from a large variance when the inverse-propensity weights are large.
Indeed, it is not possible to generally improve upon the IPW estimator without an additional
assumption given that the IPW estimator has been shown to be minimax optimal (up to some
constant factors) in the non-asymptotic regime (Wang et al., 2017). This observation motivates
our semiparametric modeling assumption that places a constraint on the structure of spillover
effects. We turn to the description of this proposed model next.

3.2 A Semiparametric Additive Outcome Model

To reduce a large estimation variance, we propose a semiparametric model that places a restriction
on the interference structure. We seek a relatively weak but still informative assumption that allows
for a sufficiently complex pattern of interference within each cluster, while significantly improving
the efficiency of policy value estimate. Specifically, we assume that each individual’s conditional
mean potential outcome is linear in the treatment vector of all individuals within the same cluster.

Assumption 2 (Heterogeneous Additive Outcome Model). The potential outcome model satisfies
Mi
(0) (k)
X
E [Yij (ai ) | Xi ] = gj (Xi ) + gj (Xi )aik (5)
k=1
⊤
(k) (0) (1) (m)
for all ai , where gj (·) : X (m) → R and gj (x) = gj (x), gj (x), . . . , gj (x) is an unknown
treatment effect function sets that may vary across individual units with m denoting a specific
realization of the cluster size Mi .

Remark 1. The expectation in Equation (5) is taken with respect to the sampling of clusters, but
the additive relationship holds for all units. In addition, this equality is invariant to the permutation
(k)
of unit index j within cluster i. The reason is that the coefficients gj (Xi ) for k = 0, 1, . . . , Mi is
completely unrestricted. Finally, this assumption only characterizes the conditional expectation of
potential outcomes given observed covariates of interest while allowing for the possible presence of
unmeasured effect modifiers that are not confounders.

A key feature of the proposed model is that it does not restrict the degree of heterogeneity
in spillover effects. This is important because how individuals affect one another may depend on
their specific relationships. For example, the influence of one’s close friend may be greater than
that of an acquaintance. Moreover, spillover effects may be asymmetric with one person exerting
greater effects on others without being influenced by them. In other words, the causal effect of one
unit’s treatment on another unit’s outcomes can depend on the characteristics of both units and
their relationship. Our model accommodates these and other possibilities by representing spillover
effects with a nonparametric function that is specific to a directed relationship from one unit to
another that depends on the whole cluster-level vector of characteristics.
The proposed model incorporates, as special cases, more restrictive assumptions on the structure
of interference considered in the literature. For example, scholars studied the following parametric
linear-in-means model (e.g., Liu et al., 2016, 2019; Park and Kang, 2022).

8
Example 1 (Linear-in-means model).

E [Yij (ai ) | Xi ] = γ1 + γ2 aij + γ3 ai(−j) + γ4⊤ Xij + γ5⊤ X i(−j) , (6)

where the potential outcome model is assumed to be a linear function of one’s own treatment
assignment P and characteristics, the proportion of treated units (other than yourself) in the cluster
a
Pi(−j) = k̸=j aik /(Mi − 1), and the cluster-level mean of other units’ characteristics X i(−j) =
k̸=j Xik /(Mi − 1), and γ1 , . . . , γ5 are the coefficients. We can show that this model is a special
(0)
case of the proposed model given in Equation (2) by setting gj (Xi ) = γ1 + γ4⊤ Xij + γ5⊤ X i(−j) ,
(j) (k)
gj (Xi ) = γ2 , and gj (Xi ) = γ3 (Mi − 1) for all k ̸= j.

In addition, the proposed model incorporates, as a special case, a model based on the anonymous
(stratified) interference assumption (e.g., Liu and Hudgens, 2014; Hudgens and Halloran, 2008;
Tchetgen and VanderWeele, 2012; Bargagli-Stoffi et al., 2020; Viviano, 2024; Park et al., 2023).

Example 2 (Additive nonparametric effect model under anonymous interference).

E [Yij (ai ) | Xi ] = h0 (Xij , Xi(−j) ) + h1 (Xij , Xi(−j) )aij + h2 (Xij , Xi(−j) )ai(−j) , (7)

where h := (h0 , h1 , h2 ) is a vector of nonparametric functions. This model is a nonparametric

extension of the parametric model given in Equation (6) under anonymous interference. We can
(k)
show that this model is a special case of Assumption (2) by setting gj (Xi ) = h2 (Xij , Xi(−j) ) for
all k ̸= j.

While the proposed model enables units to arbitrarily influence one another within each cluster,
it rules out an interaction between spillover effects. For example, the effect of treating one child in
a household cannot depend on whether their siblings are treated. In Section 3.4, we show that it
is possible to generalize the proposed model to a more complex, semiparametric polynomial model
that incorporates interaction terms. Such a general model, however, may yield a highly variable
estimate of policy value, worsening the performance of learned treatment rules. In Section 6, we find
that the proposed model serves as a good approximation to a more complex interference structure
and substantially outperforms the standard IPW estimator, which makes no structural assumption.

3.3 Identification and Estimation

We now propose an estimator of policy value function that leverages the additive structural assump-
tion (Assumption 2). Under this assumption, model complexity grows only linearly with cluster size
Mi rather than at the exponential rate. Before describing our estimator, we introduce an additional
assumption about the treatment assignment mechanism that replaces Assumption 1(b). Specifi-
cally, we assume that treatments are assigned to individuals independently within each cluster
conditional on the observed covariates.

Assumption 3. (Factored cluster-level propensity score) The cluster-level treatment probability

satisfies
YMi
P(Ai = ai | Xi ) = P(Aij = aij | Xi ).
j=1

In addition, there exists η > 0 such that η < P(Aij = aij | Xi ) < 1 − η for any aij ∈ {0, 1}.

9
Assumption 3 is satisfied under a Bernoulli randomized trial, i.e., each Aij ∼ Bern(pij ) for pij ∈
(0, 1). The assumption implies the conditional independence of treatment assignments across indi-
viduals within the same cluster (Tchetgen Tchetgen and VanderWeele, 2012), i.e., Aij ⊥ ⊥ Ai(−j) | Xi
where Ai(−j) ∈ A (Mi − 1) denotes the vector of treatment indicators for all units in cluster i other
than unit j. Recall that Assumption 1(b) demands strong overlap for every treatment combination,
which is unlikely to hold when Ai is a high-dimensional treatment vector. In contrast, Assump-
tion 3 only requires the individual-level propensity scores to be sufficiently bounded from zero. This
means that the constant, which satisfies Assumption 1(b), is likely to satisfy Assumption 3.
Under this setup, we introduce the following additive Inverse-Propensity-Weighting (addIPW)
estimator of the policy value V (π) for a given policy π ∈ Π,
 
n Mi
1 X X 1 {A ij = π(X ij )}
Vb addIPW (π) = Yi − 1 + 1 . (8)
n ej (π(Xij ) | Xi )
i=1 j=1

where ej (aij | Xi ) := P(Aij = aij | Xi ) is the individual-level propensity score, and subscript j
emphasizes the fact that units are allowed to have different propensity score models.
The proposed estimator is a weighted average of the cluster-level mean outcomes, where the
weight of each cluster equals the sum of individual inverse propensity scores up to a normalizing
constant Mi − 1. When there is only a single unit within each cluster, i.e., Mi = 1 for all i, the
estimator reduces to the standard IPW estimator under no interference settings.
Crucially, the proposed estimator leverages the linear additive assumption of the conditional
outcome regression by ensuring that the cluster-level weights scale linearly with the individual-level
inverse probability weights, which are typically of reasonable magnitude. In contrast, the standard
IPW estimator given in Equation (4) uses the product of individual-level inverse probability weights,
which tends to zero as cluster size increases, leading to a large variance.
We combine this factorized propensity scores with the semiparametric additive model to show
that if the propensity scores are known, the proposed estimator is unbiased for the policy value
function,
Proposition 1 (Unbiasedness). Under Assumptions 1(a), 2, and 3,

E[Vb addIPW (π)] = V (π)

for ∀ π ∈ Π.
Proof of this proposition is given in Appendix A. Below, we provide an additional intuition for
this result. First, using the law of iterated expectation and Assumption 1(a), we rewrite the value
function under Assumption 2 as follows,
 !
Mi Mi
1 X (0)
X (k)
V (π) = E  gj (Xi ) + gj (Xi )π(Xik )  . (9)
Mi
j=1 k=1

Thus, we can estimate the value function by substituting the unknown nuisance parameters gj (x) =
⊤
(0) (1) (m)
gj (x), gj (x), . . . , gj (x) with their empirical estimates.
Unconfoundedness (Assumption 1(a)) enables us to rewrite Equation (5) using observable quan-
tities. We can then view gj (Xi ) given Xi as the coefficients of the treatment vector Ãi :=
(1, Ai1 , . . . , AiMi )⊤ in a unit-specific OLS regression of Yij on Ãi . In principle, this regression
problem cannot be directly solved due to non-identifiability issues, as there is only one observation

10
for the Mi + 1 predictors. However, we can find the population solution gj (Xi ) that minimizes the
mean squared error (MSE), leading to the following MSE minimizer
h i−1 h i
g̊j (Xi ) = E Ãi Ã⊤
i | X i E Ã i Yij | X i . (10)
h i
Since the matrix E Ãi Ã⊤i | Xi is a function of known propensity scores, we can directly
compute it. Under Assumption 3, this matrix is invertible and its inverse is given by the following
expression,
h i−1
E Ãi Ã⊤
i | X i =
 P i ek (1|Xi ) 
1+ M k=1 1−ek (1|Xi )
1
− 1−e1 (1|Xi)
··· ··· − 1−eM 1(1|Xi )
i
 1 1 

 − 1−e1 (1|Xi ) e1 (1|Xi )(1−e1 (1|Xi )) 0 ··· 0 

 .. 1 .. 
.

 . 0 e2 (1|Xi )(1−e2 (1|Xi )) 0 . 
 .
.. .. .. 

 . 0 . 0 

− 1−eM 1(1|Xi ) 0 ··· 0 1
eMi (1|Xi )(1−eMi (1|Xi ))
i

h i (11)
Given that E Yij Ãi | Xi is unknown, we replace it with the single realized observation (Yij , Ãi )
in Equation (10), resulting in the following estimator,
h i−1
ĝj (Xi ) = E Ãi Ã⊤
i | Xi Ãi Yij . (12)

The unbiasedness of this estimator is immediate:

Therefore, the linearity of expectation implies the following unbiased estimator of V (π):
n M Mi
!
1 X 1 Xi (0)
X (k)
V (π) =
b ĝj (Xi ) + ĝj (Xi )π(Xik )
n Mi
i=1 j=1 k=1
n Mi i−1
1 X 1 X h
= π̃(Xi )⊤ E Ãi Ã⊤
i | Xi Ãi Yij , (13)
n Mi
i=1 j=1

where π̃(Xi ) := (1, π(Xi1 ), . . . , π(XiMi ))⊤ is a binary treatment assignment vector under a given
policy π. Finally, substituting Equation (11) into Equation (13) yields our estimator Vb addIPW (π)
given in Equation (8).
Equation (13) shows that the proposed estimator can also be written as a weighted average
of individual outcomes based on the inverse of individual-level propensity scores. Importantly, the
proposed estimator utilizes the data from a cluster whose realized treatment assignment does not
agree with the policy. This contrasts with the standard IPW estimator given in Equation (4) that
equals a weighted average of cluster-level mean outcome using the inverse of cluster-level propensity
scores, dropping any cluster whose realized treatment assignment does not match with the policy
under consideration. This difference explains why the variance of the proposed estimator is much
smaller than that of the standard IPW estimator. As demonstrated in Section 4.1, this efficiency

11
gain in policy evaluation leads to a better performance of policy learning.
The proposed estimator is derived by considering the unit-specific least squares regression of
outcome on a treatment vector. However, rather than explicitly fitting the outcome model for
estimation, it modifies the weights of the standard IPW estimator such that they are consistent with
the semiparametric outcome model. A similar technique has been used in the previous literature for
off-policy evaluation for online recommendation systems (Swaminathan et al., 2017), the estimation
of total treatment effect in a design-based single network interference setting (Cortez et al., 2022),
and the estimation of average total treatment effect in bipartite network experiments (Harshaw
et al., 2023). In Section 5, we provide an additional justification of the proposed estimator by
establishing its relation to the efficient semiparametric estimator under Assumption 2.
We emphasize that the validity of the proposed policy value estimator (as well as the generalized
estimator proposed below in Section 3.4) does not necessarily depend on the assumption of factored
propensity score h(Assumption i 3). In fact, the estimator remains valid so long as the experimental
⊤
design matrix E Ãi Ãi | Xi is invertible. In scenarios where the treatment assignment ensures
that every low-order treatment combination for a cluster has nonzero probability, this matrix is
likely be invertible. Even if the experimental design matrix is not invertible, we can use the Moore-
Penrose pseudoinverse in place of the matrix inverse in the proposed estimator, yielding an unbiased
estimator.

3.4 Semiparametric Model with Interactions

Before we discuss policy learning, we note that it is possible to extend our semiparametric additive
model by including interactions. Consider the following polynomial additive model,

E [Yij (ai ) | Xi ] = gj (Xi )⊤ ϕ(ai ), (14)

where we use the following augmented treatment vector that contains up to β-order interactions
between treatments of different units with β < mmax where mmax is an upper bound of Mi ,
⊤
ϕ(ai ) = 1, {aij }j , {aij1 aij2 }j1 ̸=j2 , . . . , {aij1 aij2 . . . aijβ }j1 ̸=...̸=jβ . (15)

Under this model, gj (x) represents the unknown heterogeneous effect function set whose size equals
the length of ϕ(ai ). We can derive an unbiased estimator for V (π) as before,
n M
1 X 1 Xi h i−1
Vb (π) = ϕ(π(Xi ))⊤ E ϕ(Ai )ϕ(Ai )⊤ | Xi ϕ(Ai )Yij . (16)
n Mi
i=1 j=1

The explicit form of this estimator can be obtained by calculating the inverse of E ϕ(Ai )ϕ(Ai )⊤ | Xi ,

which contains up to the β-order product of individual-level treatment probabilities. Since directly
computing the inverse of this matrix is tedious, one may obtain the weights for individual out-
comes by leveraging the linearity of expectation and unbiasedness property of the estimator (see
Yu et al., 2022, for a similar technique). Appendix C provides an explicit expression of the proposed
estimator under the general polynomial additive model.
In principle, ϕ(Ai ) can be extended to a vector of at most length M
P i Mi
k=0 k = 2Mi , which
allows for all possible treatment interactions within a cluster of size Mi . Under this extreme
scenario, which implies no assumption about spillover effects, the proposed estimator can be shown
to equal the standard IPW estimator given in Equation (4). In practice, however, researchers must
choose the value of polynomial order β by considering a bias-variance tradeoff (see also Cortez

12
et al., 2022, who proposes a similar low-order interaction model in a design-based single network
context). We can further extend our model by letting β depend on cluster size Mi . This approach
will allow for the inclusion of fewer interactions in smaller clusters.
Our experience suggests that in most cases the linear or quadratic additivity assumption is
sufficient for effective policy evaluation and learning. Formally, when the true model includes
higher-order interactions, our estimator based on the linear additive assumption can be interpreted
as the following approximation to the true policy value V (π) given in Equation (1), i.e.,
 
Mi
1
gjProj. (Xi )⊤ π̃(Xi ) ,
X
E (17)
Mi
j=1

where gjProj. (x) is the projection of each unit’s true outcome function µj (Ai , Xi ) = E[Yij | Ai , Xi ]
onto the linear treatment vector space,
2
Proj. ⊤
gj (Xi ) = arg inf E µj (Ai , Xi ) − g(Xi ) Ãi | Xi .
g

For simplicity, we assume the linear additivity semiparametric model (i.e., Assumption 2)
throughout this paper and leave the data-driven choice of β to future work.

4 Policy Learning
We now consider the problem of policy learning. Specifically, we solve the following empirical analog
of the optimization problem given in Equation (2) using our proposed estimator in Equation (8),

π̂ := argmax Vb addIPW (π). (18)

π∈Π

We first measure the learning performance of π̂ by deriving a non-asymptotic upper bound on the
true population regret of π̂ defined in Equation (3), i.e., R(π̂). We then show that the optimization
problem can be solved using a mixed-integer linear program formulation.

4.1 Regret Analysis

We establish a finite-sample regret bound for π̂, assuming that the propensity scores are known. In
Appendix B, we extend our results to the case of estimated propensity scores, where one can instead
use their plug-in estimates. In particular, we provide a bound that depends on the estimation error
of propensity scores, which may vanish as the sample size increases.
We begin by stating the following standard assumptions.

Assumption 4. The following statements hold:

(a) Bounded outcome: ∃ a constant B ≥ 0 such that |Yij (ai )| ≤ B for all ai ∈ A (Mi )

(b) Finite cluster size: ∃ mmax ∈ N such that Mi ≤ mmax almost surely

(c) Bounded complexity of policy class: consider a policy class Π of binary-valued functions
π : X → {0, 1} that has a finite VC dimension ν < ∞

13
Assumption 4(a) is standard in the literature. Assumption 4(b) restricts cluster size Mi to
be bounded, implying that cluster size is not too large relative to the number of clusters n. The
proposed methodology may not perform well when the number of clusters is small. Assumption 4(c)
restricts the complexity of the policy class Π of interest using the concept of VC dimension (Vapnik
and Chervonenkis, 2015). This assumption is often assumed in the existing policy learning literature
to avoid overfitting (e.g., Kitagawa and Tetenov, 2018; Athey and Wager, 2021). The assumption
holds for common policy classes such as linear and fixed-depth decision trees.

Theorem 1 (Finite-sample regret bound). Suppose Assumptions 1(a), and 2–4 hold. Define π̂ as
the solution to Equation (18). For any δ > 0, with probability at least 1 − δ, the regret of π̂ can
be upper bounded as, r r
4C Bmmax ν 2 1
R(π̂) ≤ √ + 4c0 + 2C log (19)
n η n n δ
where C = B[mmax × ( η1 − 1) + 1] and c0 is a universal constant.

Proof. See Appendix A.

Theorem 1 provides a finite-sample upper bound on the regret, under the case of known propen-
√
sity scores. The regret converges to zero at the rate of 1/ n, which matches the optimal regret
rate for i.i.d. policy learning (Kitagawa and Tetenov, 2018; Athey and Wager, 2021). Moreover,
the bound linearly depends on the maximal cluster size, and is inversely proportional to the lower
bound of the individual-level propensity score η. This result contrasts with the regret bound based
p
on the standard IPW estimator given in Equation (4), which is typically of order Op ηmBmax nν .
It is worth noting that even when the outcome model (Assumption 2) is mis-specified, the learned
optimal policy π̂ in Equation (18) still achieves a meaningful regret bound based on the best linear
semiparametric approximation of the true value function given in Equation (17).

4.2 Mixed-Integer Program Formulation

The above results hold for the exact solution to Equation (18). In general, solving Equation (18)
leads to a nonconvex optimization problem, which makes it difficult to apply the proposed indi-
vidualized policy learning methodology. For a certain policy class, however, we can considerably
simplify the optimization problem using a mixed-integer program (MIP) formulation. Such policy
classes include linear decision rules, fixed-depth decision trees, and treatment sets with piecewise
linear boundaries (e.g., Kitagawa and Tetenov, 2018; Viviano, 2024; Athey and Wager, 2021; Zhou
et al., 2023).
For example, consider a linear policy rule of the following form:

Π = {π : X → {0, 1} | π(X) = 1(X ⊤ γ ≥ 0), γ ∈ B}.

Following Kitagawa and Tetenov (2018), we introduce binary variable pij and write
⊤γ
Xij ⊤γ
Xij ⊤
< pij ≤ 1 + , Cij > sup Xij γ , pij ∈ {0, 1}.
Cij Cij γ∈B

⊤ γ is non-negative and zero otherwise, i.e., p = π(X ). We now

Then, pij is equal to one if Xij ij ij

14
can write the objective function (up to some constants) as,
n M
1 X Xi

Aij 1 − Aij
Yi − pij .
n ej (1 | Xi ) ej (0 | Xi )
i=1 j=1

This implies that Equation (18) can be equivalently represented as the following linear MIP, which
can be solved using an off-the-shelf algorithm:
n M
1 X Xi

Aij 1 − Aij
max Yi − pij .
γ∈B,{pij }∈R n ej (1 | Xi ) ej (0 | Xi )
i=1 j=1
⊤γ
Xij ⊤γ
Xij (20)
< pij ≤ 1 + for i = 1, . . . , n, and j = 1, . . . , Mi ,
s.t. Cij Cij
pij ∈ {0, 1},

⊤γ .
where constants Cij should satisfy Cij > supγ∈B Xij

5 Extension to Observational Studies

So far, we have focused on the experimental setting where the propensity score is known. In this
section, we briefly consider the extension of the proposed methodology to observational studies
where the propensity score is unknown. We begin by considering a plug-in approach that replaces
the true propensity score ej with its estimate êj in the policy value estimator given in Equation (8).
It can be shown that the resulting regret of π̂ depends on the estimation error of unknown propensity
scores (see Theorem 3 in Appendix B). This implies that in all but the simplest cases, the plug-in
√
approach will result in a sub-optimal slower-than-1/ n rate for the learned policy π̂.
Given this suboptimality of the plug-in approach, we develop an alternative efficient policy
learning methodology by building on the recent work of Chernozhukov et al. (2019), who studies
efficient policy learning with continuous actions. Specifically, we specialize their approach and
propose a doubly robust policy value estimator under our linear additive structural assumption. As
shown below, this approach is more robust to estimation errors of the propensity score function and
the outcome regression model and attains the semiparametric efficiency bound under Assumption 2.
To define the doubly robust estimator, let µ(a, x) = E [Yi | Ai = a, Xi = x] be the vector of
true conditional expected outcomes in cluster i. Using this notation, we rewrite Assumption 2 as,

µ(a, x) = G(x)ϕ(a) (21)

where ϕ(a) := (1, a1 , . . . , am )⊤ is the treatment assignment vector, and G(x) is a m × (m + 1)

unknown function matrix of the following form,
   (0) (m) 
g1 (x) g1 (x) . . . g1 (x)
G(x) :=  .. .. ..
= .
   
. . .
gm (x) (0) (m)
gm (x) . . . gm (x)

Note that following the strategy described in Section 3.4, it is possible to increase the model
complexity by augmenting the treatment vector with interaction terms (see Equation (14)). For
the sake of simplicity, we assume the linear additive model in this section.

15
We also define the conditional covariance matrix of ϕ(a) as Σ(x) = E[ϕ(a)ϕ(a)⊤ | x]. The
doubly robust estimator we propose below, will rely on h estimates iof these two nuisance functions.
Notice that Σ(Xi ) is exactly equal to the matrix E Ãi Ã⊤ i | Xi defined in Section 3.3, except
that it now involves unknown propensity scores. Due to the factorized propensity score assumption
(Assumption 3), Σ can be inverted where the resulting expression is given by Equation (11).
Based on Equation (21), the policy value is identified as V (π) = E w(Mi )⊤ G(Xi )ϕ(π(Xi )) ,

where w(Mi ) = M1 i 1Mi are the uniform weights for averaging the outcomes of units within the
cluster and π(Xi ) = (π(Xi1 ), . . . , π(XiMi ))⊤ is the vector of treatment assignment under policy π.
We propose the following doubly-robust estimator,
n
1X
Vb DR (π) = w(Mi )⊤ G
b DR (Yi , Xi )ϕ(π(Xi )), (22)
n
i=1

where
G
b DR (Yi , Xi ) = G(X b i )ϕ(Ai ) ϕ(Ai )⊤ Σ(X
b i ) + Yi − G(X b i )−1 (23)

can be viewed as an estimate of G(Xi ) based on a single observation, and G b and Σb are the
estimates for the nuisance quantities G and Σ. Equation (23) has a form similar to the standard
doubly robust estimators in the literature, which typically consist of an outcome regression estimate
plus an augmented weighted residual term. It can be easily seen that Vb DR (π) enjoys a doubly
robust property — it is consistent if either of the two nuisance models is consistently estimated,
therefore, it is more robust to the estimation errors of propensity score and outcome regression
models. In addition, if we substitute G b DR (Yi , Xi ) with GIPW (Yi , Xi ) = Yi ϕ(Ai )⊤ Σ(Xi )−1 given
in Equation (22), this policy value estimator reduces to our additive IPW estimator, providing an
additional justification of the proposed estimator under the experimental settings.
Under Assumption 2 and an additional assumption of homoskedastic error, the variance of the
doubly robust estimator achieves the semiparametric efficiency bound for estimating the policy
value of a given policy.

Assumption 5 (Homoskedasticity). Assume homoscedastic error in the outcome model, i.e.,

h i h i
E w(M )⊤ (Y − µ(A, X))(Y − µ(A, X))⊤ w(M ) | A, X = E w(M )⊤ (Y − µ(A, X))(Y − µ(A, X))⊤ w(M ) .

The assumption of homoskedastic error in the residual function is often seen in the semipara-
metric literature (Robinson, 1988; Ai and Chen, 2003; Chamberlain, 1992).

Theorem 2 (Semiparametric Efficiency). Under Assumptions 1(a), 2-3 and 5, Vb DR (π) is semi-
parametrically efficient for estimating V (π) for any given π.

Proof. See Appendix A.

In principle, one can perform policy optimization based on this doubly robust estimate, i.e.,
π̂ DR := argmax Vb DR (π). However, to implement this approach in practice, some challenges re-
π∈Π
main, which require further study. First, the proposed efficient estimation of policy value requires
estimates of the unknown nuisance functions G b and Σ.
b Existing machine learning methods or non-
parametric regression estimators may need adjustments in the context of partial interference with
dependent data to meet the required rate conditions (see Section 4 of Park and Kang (2022) for
examples).

16
Second, even if we use policy value estimates that are efficient, this does not necessarily im-
ply efficient learning of the optimal policy itself (e.g., Zhou et al., 2023). In observational studies
with no interference, Athey and Wager (2021) take a similar doubly robust approach to obtain a
variance-based regret bound, which is robust to the first-order estimation errors of nuisance func-
tions and whose leading term depends on the semiparametrically efficient variance. We conjecture
that a similar approach can be applied by appropriately regularizing a policy learning procedure,
as outlined in Chernozhukov et al. (2019). A complete analysis of these challenges is beyond the
scope of this paper and is left for future research.

6 Simulation Studies
We conduct simulation studies to assess the finite-sample performance of our proposed methodology
by comparing it against the oracle policy and the learned policy based on the standard IPW
estimator. We also examine the finite-sample performance of the proposed policy value estimator,
which can be used for policy evaluation.

6.1 Setup
We generate n ∈ {50, 100, 200, 400, 800} clusters, and for each cluster i, we randomly generate
cluster size Mi ∈ {5, 10, 15} with uniform probability. For each unit j, we independently sample
four covariates (Xij1 , . . . , Xij4 ) from the standard normal distribution and generate the treatment
variable Aij from independent Bernoulli distributions with success probability of 0.3. Thus, our
propensity score model satisfies Assumption 3. Throughout, we assume that the propensity score
is known. Lastly, we sample the outcome variable Yij from the following two models. The out-
come regression model under Scenario A satisfies Assumption 2, whereas the outcome model under
Scenario B does not as it includes interaction terms:
(A) Yij | Ai , Xi ∼ N (µij , 1) ,
P
j ′ ̸=j (Xij ′ 3 + Xij ′ 4 )Aij ′
µij = (Xij1 + 0.5Xij2 − Xij3 − 0.5Xij4 )Aij + 1.5 + 0.2Xij2 + 0.2Xij3 ,
Mi − 1
(B) Yij | Ai , Xi ∼ N (µij , 1) ,
P
j ′ ̸=j (Xij ′ 3 + Xij ′ 4 )Aij ′ 2 2
µij = (Xij1 + 0.5Xij2 − Xij3 − 0.5Xij4 )Aij + 1.5 − 0.5(Xij1 + Xij2 )Aij Ai(−j)
Mi − 1
+ 0.2Xij2 + 0.2Xij3 ,
(24)
We consider the following class of linear thresholding policies,

π(X) = 1(β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 ≥ 0).

We then find the best policy within this policy class based on our proposed estimator Vb addIPW (π)
given in Equation (8). For comparison, we consider the optimal policies learned based on two
standard IPW estimators. One is Vb IPW (π) given in Equation (4) that assumes no knowledge of
the interference structure. The other is the standard IPW estimator that assumes the absence of
interference (Kitagawa and Tetenov, 2018). We call them the IPW estimators with and without in-
terference, respectively. We slightly adjust the IPW estimator without interference to accommodate

17
our cluster setting, i.e.,
n M
NoInt 1 X 1 Xi 1 {Aij = π(Xij )}
V
b (π) = Yij .
n Mi ej (π(Xij ) | Xi )
i=1 j=1

For our proposed estimator and the IPW estimator without interference, we formulate the opti-
mization problem as a mixed-integer linear programming (MILP) problem. We use an off-the-shelf
R-pacakge Rglpk to solve this optimization problem. In contrast, n the IPW estimatoro with interfer-
Mi
ence involves the cluster-level policy indicator variable, i.e., 1 Ai = {π(Xij )}j=1 , which makes
optimization challenging. Following Fang et al. (2022), we apply a smooth Q i stochastic approxima-
tion to the deterministic ITR by adopting the logistic function f (Xi ) = M ⊤
j=1 {1 + exp(−Xij β)} .
−1

We then use an efficient L-BFGS-based optimization procedure to solve this approximate optimiza-
tion problem. Lastly, we also include the oracle estimator that directly optimizes the empirical
policy value based on the true outcome model given in Equation (24), using a similar stochastic
approximation approach.
For each simulation setup, we generate 4,000 independent data sets and conduct policy learning
based on the aforementioned four estimators (the proposed estimator, the IPW estimators with
and without interefernce, and the oracle estimator). Since the true policy value under each learned
policy cannot be easily calculated analytically, we separately obtain an additional sample of 10,000
clusters from the assumed outcome model and approximate the true policy value.

6.2 Results
The left panel of Figure 1 shows that under Scenario A where Assumption 2 is met, the proposed
policy learning methodology based on the addIPW estimator (red boxplot) outperforms the learned
policies based on the IPW estimator with (blue) and without (green) interference. Due to its un-
biasedness under this setting, the performance of our policy learning methodology approaches that
of the oracle as the sample size increases. While the consideration of interference substantially
improves the performance of policy learning based on the IPW estimator, its performance is sub-
stantially worse than the proposed methodology. In addition, the performance of IPW-based policy
learning is more variable than the proposed methodology.
The right panel of Figure 1 shows the results under Scenario B where Assumption 2 is violated
and hence our addIPW estimator is biased. As shown in Equation (17), however, the proposed
estimator still provides a reasonable linear approximation and captures a substantial proportion
of the spillover effects. We find that the proposed policy learning method still outperforms the
methods based on the two IPW estimators. While the difference between the proposed estimator
and the IPW estimator with interference is smaller under this scenario than under the previous one,
the former continues to exhibit a smaller variance than the latter. As before, the IPW estimator
with interference performs much better than the one without it, confirming that it is critical to
account spillover effects when learning policies.
Better policy learning requires better policy evaluation. We next examine the finite-sample per-
formance of the proposed estimator when estimating the policy values and compare its performance
with that of the IPW estimator with interference. We use the same two simulation scenarios and
consider the evaluation of three different policies: (i) linear policy: π(X) = 1(0.5X1 + X2 + X3 +
0.5X4 ≥ 2), (ii) depth-2 decision tree policy: π(X) = 1(X1 > 0.5, X2 > 0.5) and (iii) treat-nobody
policy: π(X) = 0.
Figure 2 shows the results, with each plot corresponding to a different combination of policy

18
Scenario A Scenario B
0.6
0.75

0.50
0.3
Policy Value

Policy Value
0.25
0.0

0.00

−0.3

−0.25

−0.6

50 100 200 400 800 50 100 200 400 800

Sample Size Sample Size

Method IPW.NoInt IPW AddIPW Oracle

Figure 1: Boxplots of the policy value V (π̂) under the learned policies based on the proposed
estimator (red), the standard IPW estimator with (blue) and without (green) interference, and the
oracle estimator (purple). Even when the model is misspecified (Scenario B), the proposed policy
learning methodology outperforms the two IPW estimators, and its estimated policy value is the
closest to the one based on the oracle estimator. The IPW estimator that does not account for
interference performs worst.

class (columns) and data generation scenario (rows). The results for the sample size of n = 50
are omitted due to extreme variability. Overall the proposed addIPW estimator (red) exhibits
a substantial advantage over the standard IPW estimator with interference in terms of standard
deviation (shown by the length of vertical lines), across the settings we have considered. In cases
where Assumption 2 is satisfied (Scenario A; upper panel), our estimator and the standard IPW
estimator provide unbiased results but the former exhibits a much less variability than the latter.
In contrast, when Assumption 2 is violated (Scenario B), the proposed estimator may deviate from
the true policy value due to model misspecification. The degree of bias varies across policies with
the most substantial bias arising for evaluating the treat-nobody policy. In contrast, the IPW
estimator with interference remains unbiased.
In addition, the differences in variability between the two estimators are much greater for linear
and decision-tree policies than those under the treat-nobody policy. This is because the variance
of policy value estimator depends on the deviation between the baseline policy (i.e., following
propensity score) and the (deterministic) policy under evaluation. In fact, the variance scales
exponentially for the standard IPW estimator and linearly for the proposed addIPW estimator
with the magnitude of {1{Aij = π(Xij )}/ej (Aij | Xi )}M i
j=1 .
If the true optimal policy significantly deviates from the baseline policy, the standard IPW
estimator can yield a highly inaccurate estimate of the optimal policy value. This in turn can
negatively impact the downstream empirical policy optimization. This bias-variance trade-off is
evident in our findings about policy learning (Figure 1), where learned policies based on the pro-

19
Scenario A Scenario A Scenario A
Linear Decision tree Treat−nobody
1.5
1.0 0.2
1.0
0.1
0.5 0.5
0.0
0.0 0.0
−0.1
−0.5
−0.5 −0.2
−1.0
Policy Value

Method
Scenario B Scenario B Scenario B
AddIPW
Linear Decision tree Treat−nobody
IPW
3
1.0
0.2
2
0.5 0.1
1
0.0
0
0.0
−0.1
−1
−0.5 −0.2
−2

100 200 400 800 100 200 400 800 100 200 400 800
Sample Size

Figure 2: The performance of policy evaluation based on the proposed estimator (red) and the IPW
estimator with interference (blue). The dots represent the average performance over simulations
while the lines indicate the one standard deviation above and below the mean. The true policy
value (dashed black line) is calculated based on Monte-Carlo simulations.

posed addIPW estimator is more robust than those based on the standard IPW estimator with
interference, even when Assumption 2 is not met.

7 Empirical Application
We illustrate our method by applying it to a randomized experiment from a conditional cash
transfer program in Colombia (Barrera-Osorio et al., 2011).

7.1 Setup
The experiment was conducted in two regions in Bogota, Colombia: San Cristobal and Suba. In
each region, the researchers recruited households that have one to five schoolchildren, and within
each household, the children were randomized to enroll in the cash transfer program. Specifically,
the researchers stratified children based on locality (San Cristobal or Suba), type of school (public
or private), gender, and grade level. While within each stratum, each child had an equal probability
of receiving treatment, the treatment assignment probabilities varied between children within each
household if they belong to different strata. The original treatment randomization probability for
each stratum is known, and is on average 0.63 for children in San Cristobal and 0.45 for those in
Suba. Since randomization was based on fine strata that include gender and grade level, almost
all children within each household belong to different strata. Therefore, we can assume that the

20
Estimated policy value
(Treated Proportion)
Method cost: 15% cost: 20% cost: 25%
Additive IPW 0.872 (0.506) 0.849 (0.565) 0.781 (0.401)
IPW with interference 0.830 (0.690) 0.766 (0.591) 0.713 (0.461)
IPW without interference 0.853 (0.342) 0.810 (0.213) 0.751 (0.098)

Table 1: Estimated values of the learned policies under different treatment costs. The proposed
individualized policy learning methodology (“Additive IPW”) outperforms the learned policies
based on the standard IPW estimators, yielding the highest policy values.

treatment assignment mechanisms across children are conditionally independent of one another and
satisfy Assumption 3.
Previous studies focused on estimating the effects of the conditional cash transfer program on
the attendance rate of students. The program was designed such that enrolled students received
cash subsidies if they attended school at least 80% of the time in a given month. For example,
the original study estimated the spillover effects on the siblings of an enrolled student in the same
household (Barrera-Osorio et al., 2011). In our application, we analyze the same dataset (a total of
1010 households with 2129 students) as the one examined by Park and Kang (2022) who developed
and applied semiparametric efficient estimators to estimate the direct and spillover effects of the
program on school attendance rates.
In contrast to these previous studies, however, we focus on learning an optimal individualized
treatment rule to maximize the average household-level school attendance rate. We consider the
following linear policy class based on three pre-treatment variables: student’s grade, household’s
size, and household’s poverty score (with lower scores indicating poorer households):

π(Xij ) = 1{β0 + β1 × student’s grade + β2 × household’s size + β3 × household’s poverty score ≥ 0}.

In order to evaluate the performance of the learned individualized policy, we randomly split
the data into K = 5 folds Ik , k = 1, . . . , K, and for each fold k, learn an optimal individualized
treatment policy π̂ (−k) (·) using all but the k-th fold data. We incorporate the cost of treatment into
the objective function to ensure that the optimal treatment rule is not trivial (Athey and Wager,
2021). We then evaluate the the learned policy using the kth fold, and compute the following
overall empirical policy value by averaging over the resulting five estimates. Specifically, we use
the following policy value estimator with a varying treatment cost C (15%, 20% and 25% of the
school-attendance rate benefit, which ranges from 0 to 1).
  PMi (−k)
K X Mi
!
(−k) (X )

1 1 A ij = π̂ ij j=1 π̂ (Xij )
X X
Vb = Yi (−k)
− 1 + 1 − C × . (25)
n
k=1 i∈Ik j=1
ej (π̂ (Xij ) | Xi ) Mi

7.2 Results
Table 1 presents the estimated values of learned individualized policies based on our proposed
estimator (“Additive IPW”) and those based on the standard IPW estimators with and without
interference. We find that the proposed policy learning methodology achieves the best performance
across all treatment cost specifications, yielding the highest estimated policy values. Interestingly,
the learned policies based on the standard IPW estimator perform better when ignoring interference

21
grade level household size poverty score
cost: 15%
Additive IPW −2.807 0.343 −4.077
IPW with interference −0.980 0.752 −0.517
IPW without interference −0.878 2.475 −2.225
cost: 20%
Additive IPW −2.063 1.103 −0.682
IPW with interference −0.958 0.865 −0.671
IPW without interference 0.000 0.334 0.000
cost: 25%
Additive IPW −1.003 1.129 −0.930
IPW with interference −2.009 2.544 −0.318
IPW without interference −0.756 1.319 −1.951

Table 2: Estimated coefficients of the linear optimal policy under all methodologies and all treat-
ment cost specifications, normalized by the absolute magnitude of intercept. The learned policies
tend to prioritize children who are in lower grade levels and live in larger and poorer households.

across all cases. The proportion of treated units (reported in parentheses) varies substantially from
one methodology to another with the standard IPW estimator that does not account for interference
treating the smallest proportion of units.
We also report the estimated coefficients of the linear policies in Table 2, normalized by the
absolute magnitude of estimated intercept |β̂0 |. Here, we use the entire data to learn a single policy
under each methodology. We find that, regardless of costs, the optimal ITR tends to negatively
depend on student’s grade and household’s poverty score while it has a positive relationship with
household’s size across all methodologies. This implies that the optimal policies tend to give a
priority to children who are in a lower grade level and reside in larger and poorer households.
However, the estimated optimal policy based on the standard IPW estimator that ignores
interference, can yield degenerate results. For instance, when the cost is 20%, the coefficients for
both grade level and poverty score are zero, indicating the failure to leverage heterogeneity due to
the underlying interference mechanism. In addition, although the standard IPW estimator with
interference yields the estimated coefficients of the same sign as those based on our methodology,
its statistical inefficiency may have yielded the lower policy values as shown in Table 1.

8 Concluding Remarks
In this paper, we propose new policy evaluation and learning methodologies under clustered net-
work interference. We introduce the semiparametric additive effect model that flexibly captures
heterogeneous spillover effects among individuals while avoiding restrictive assumptions used in the
existing literature. Our proposed estimator for policy evaluation exploits this structural assump-
tion and yields substantially improved statistical efficiency relative to the standard IPW estimator.
Theoretically, we provide a non-asymptotic regret bound for the learned optimal ITR that achieves
an optimal dependence on sample size, when the propensity scores are known.
The empirical results demonstrate the importance of considering individual-level network in-
formation in the policy learning problem. Consistent with our theoretical analysis, we find that
even when our assumption is violated, the proposed estimator outperforms the standard IPW es-
timators. Thus, the proposed methodology achieves a desirable degree of bias-variance tradeoff by

22
leveraging a structural assumption that is sufficiently informative but is not too restrictive.
There are several potential future directions. First, as briefly examined in Section 5, we can
extend our methodology to observational studies with unknown propensity scores using a doubly-
robust estimator. Second, it is of interest to extend our methodology to more general network
settings. In particular, we may consider weak dependencies between clusters or consider a single
network setup. These extensions, while methodologically challenging, will further enhance the effec-
tive use of individualized policy learning in real world applications with complex interdependencies.

References
Ai, C. and X. Chen (2003). Efficient estimation of models with conditional moment restrictions
containing unknown functions. Econometrica 71 (6), 1795–1843.

Ananth, A. (2020). Optimal treatment assignment rules on networked populations. Technical

report, working paper.

Aronow, P. M. and C. Samii (2017). Estimating average causal effects under general interference,
with application to a social network experiment.

Athey, S., D. Eckles, and G. W. Imbens (2018). Exact p-values for network interference. Journal
of the American Statistical Association 113 (521), 230–240.

Athey, S. and S. Wager (2021). Policy learning with observational data. Econometrica 89 (1),
133–161.

Banerjee, A., A. G. Chandrasekhar, E. Duflo, and M. O. Jackson (2019). Using gossips to spread in-
formation: Theory and evidence from two randomized controlled trials. The Review of Economic
Studies 86 (6), 2453–2490.

Bargagli-Stoffi, F. J., C. Tortu, and L. Forastiere (2020). Heterogeneous treatment and spillover
effects under clustered network interference. arXiv preprint arXiv:2008.00707 .

Barkley, B. G., M. G. Hudgens, J. D. Clemens, M. Ali, and M. E. Emch (2020). Causal inference
from observational studies with clustered interference, with application to a cholera vaccine study.

Barrera-Osorio, F., M. Bertrand, L. L. Linden, and F. Perez-Calle (2011). Improving the design of
conditional transfer programs: Evidence from a randomized education experiment in colombia.
American Economic Journal: Applied Economics 3 (2), 167–195.

Basse, G. and A. Feller (2017). Analyzing two-stage experiments in the presence of interference.
Journal of the American Statistical Association, forthcoming.

Ben-Michael, E., D. J. Greiner, K. Imai, and Z. Jiang (2021). Safe policy learning through extrap-
olation: Application to pre-trial risk assessment. Technical report, arXiv:2109.11679.

Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica: Journal

of the Econometric Society, 567–596.

Chattopadhyay, A., K. Imai, and J. R. Zubizarreta (2023). Design-based inference for generalized
network experiments with stochastic interventions.

23
Chernozhukov, V., M. Demirer, G. Lewis, and V. Syrgkanis (2019). Semi-parametric efficient policy
learning with continuous actions. Advances in Neural Information Processing Systems 32.

Cortez, M., M. Eichhorn, and C. L. Yu (2022). Exploiting neighborhood interference with low order
interactions under unit randomized design. arXiv preprint arXiv:2208.05553 .

Egami, N. (2021). Spillover effects in the presence of unobserved networks. Political Analysis 29 (3),
287–316.

Fang, E. X., Z. Wang, and L. Wang (2022). Fairness-oriented learning for optimal individualized
treatment rules. Journal of the American Statistical Association, 1–14.

Gao, M. and P. Ding (2023). Causal inference in network experiments: regression-based analysis
and design-based properties. arXiv preprint arXiv:2309.07476 .

Harshaw, C., F. Sävje, D. Eisenstat, V. Mirrokni, and J. Pouget-Abadie (2023). Design and
analysis of bipartite experiments under a linear exposure-response model. Electronic Journal of
Statistics 17 (1), 464–518.

Hu, Y., S. Li, and S. Wager (2022). Average direct and indirect causal effects under interference.
Biometrika.

Hudgens, M. and M. Halloran (2008). Toward causal inference with interference. Journal of the
American Statistical Association.

Imai, K., Z. Jiang, and A. Malani (2021). Causal inference with interference and noncompliance
in two-stage randomized experiments. Journal of the American Statistical Association 116 (534),
632–644.

Imai, K. and M. L. Li (2023). Experimental evaluation of individualized treatment rules. Journal

of the American Statistical Association 118 (541), 242–256.

Imai, K. and A. Strauss (2011, Winter). Estimation of heterogeneous treatment effects from ran-
domized experiments, with application to the optimal planning of the get-out-the-vote campaign.
Political Analysis 19 (1), 1–19.

Jia, Z., E. Ben-Michael, and K. Imai (2023). Bayesian safe policy learning with chance constrained
optimization: Application to military security assessment during the vietnam war.

Jin, Y., Z. Ren, Z. Yang, and Z. Wang (2022). Policy learning” without”overlap: Pessimism and
generalized empirical bernstein’s inequality. arXiv preprint arXiv:2212.09900 .

Kallus, N. (2018). Balanced policy evaluation and learning. Advances in neural information pro-
cessing systems 31.

Kitagawa, T. and A. Tetenov (2018). Who should be treated? empirical welfare maximization
methods for treatment choice. Econometrica 86 (2), 591–616.

Kitagawa, T. and G. Wang (2023). Who should get vaccinated? individualized allocation of vaccines
over sir network. Journal of Econometrics 232 (1), 109–131.

Lee, C., D. Zeng, and M. G. Hudgens (2022). Efficient nonparametric estimation of incremental
propensity score effects with clustered interference. arXiv preprint arXiv:2212.10959 .

24
Leung, M. P. (2020). Treatment and spillover effects under network interference. Review of Eco-
nomics and Statistics 102 (2), 368–380.

Li, S. and S. Wager (2022). Random graph asymptotics for treatment effect estimation under
network interference. The Annals of Statistics 50 (4), 2334–2358.

Liu, L. and M. Hudgens (2014). Large sample randomization inference of causal effects in the
presence of interference. Journal of the American Statistical Association.

Liu, L., M. G. Hudgens, and S. Becker-Dreps (2016). On inverse probability-weighted estimators

in the presence of interference. Biometrika 103 (4), 829–842.

Liu, L., M. G. Hudgens, B. Saul, J. D. Clemens, M. Ali, and M. E. Emch (2019). Doubly robust
estimation in observational studies with partial interference. Stat 8 (1), e214.

Paluck, E. L., H. Shepherd, and P. M. Aronow (2016). Changing climates of conflict: A social
network experiment in 56 schools. Proceedings of the National Academy of Sciences 113 (3),
566–571.

Papadogeorgou, G., F. Mealli, and C. M. Zigler (2019). Causal inference with interfering units for
cluster and population level treatment allocation programs. Biometrics 75 (3), 778–787.

Park, C., G. Chen, M. Yu, and H. Kang (2023). Minimum resource threshold policy under partial
interference. Journal of the American Statistical Association (just-accepted), 1–43.

Park, C. and H. Kang (2022). Efficient semiparametric estimation of network treatment effects
under partial interference. Biometrika 109 (4), 1015–1031.

Puelz, D., G. Basse, A. Feller, and P. Toulis (2022). A graph-theoretic approach to randomization
tests of causal effects under general interference. Journal of the Royal Statistical Society, Series
B 84 (1), 174–204.

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica: Journal of

the Econometric Society, 931–954.

Sävje, F., P. Aronow, and M. Hudgens (2021). Average treatment effects in the presence of unknown
interference. Annals of statistics 49 (2), 673.

Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate? causal inference
in the face of interference. Journal of the American Statistical Association 101 (476), 1398–1407.

Swaminathan, A. and T. Joachims (2015). Counterfactual risk minimization: Learning from logged
bandit feedback. In International Conference on Machine Learning, pp. 814–823. PMLR.

Swaminathan, A., A. Krishnamurthy, A. Agarwal, M. Dudik, J. Langford, D. Jose, and I. Zi-

touni (2017). Off-policy evaluation for slate recommendation. Advances in Neural Information
Processing Systems 30.

Tchetgen, E. J. T. and T. J. VanderWeele (2012). On causal inference in the presence of interference.

Statistical methods in medical research 21 (1), 55–75.

Tchetgen Tchetgen, E. J. and T. J. VanderWeele (2012). On causal inference in the presence of

interference. Statistical Methods in Medical Research 21 (1), 55–75. PMID: 21068053.

25
Vapnik, V. N. and A. Y. Chervonenkis (2015). On the uniform convergence of relative frequencies
of events to their probabilities. In Measures of complexity: festschrift for alexey chervonenkis,
pp. 11–30. Springer.

Viviano, D. (2024). Policy targeting under network interference. Review of Economic Studies,
forthcoming.

Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, Volume 48.

Cambridge University Press.

Wang, Y.-X., A. Agarwal, and M. Dudık (2017). Optimal and adaptive off-policy evaluation in
contextual bandits. In International Conference on Machine Learning, pp. 3589–3597. PMLR.

Xu, Q., H. Fu, and A. Qu (2023). Optimal individualized treatment rule for combination treatments
under budget constraints. arXiv preprint arXiv:2303.11507 .

Yu, C. L., E. M. Airoldi, C. Borgs, and J. T. Chayes (2022). Estimating the total treatment effect in
randomized experiments with unknown network structure. Proceedings of the National Academy
of Sciences 119 (44), e2208975119.

Zhang, B., A. A. Tsiatis, M. Davidian, M. Zhang, and E. Laber (2012). Estimating optimal
treatment regimes from a classification perspective. Stat 1 (1), 103–114.

Zhang, Y., E. Ben-Michael, and K. Imai (2022). Safe policy learning under regression discontinuity
designs. arXiv preprint arXiv:2208.13323 .

Zhao, Y., D. Zeng, A. J. Rush, and M. R. Kosorok (2012). Estimating individualized treatment
rules using outcome weighted learning. Journal of the American Statistical Association 107 (499),
1106–1118.

Zhou, Z., S. Athey, and S. Wager (2023). Offline multi-action policy learning: Generalization and
optimization. Operations Research 71 (1), 148–183.

26
Supplementary Appendix

Contents
A Proofs 27
A.1 Proof of Proposition 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
A.2 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
A.3 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

B Observational studies with unknown propensity scores 30

C The estimator for the general polynomial additive model 31

A Proofs
A.1 Proof of Proposition 1
Proof. For notational simplicity, we define the cluster-level average nuisance functions of Equa-
tion (5) as
M
1 Xi
(0) (1) (Mi )
⊤
g(Xi ) = gj (Xi ) := g (Xi ), g (Xi ), . . . , g (Xi ) .
Mi
j=1

We start by examining the expectation of the proposed estimator,

  
n Mi
1 X X 1 {A ij = π(X ij )} 
E[Vb addIPW (π)] =E  Yi −1 +1 
n  ej (π(Xij ) | Xi ) 
i=1 j=1
    (26)
n Mi
1 X X 1 {A ij = π(X ij )}
=E  E Y i −1 Xi  + E Y i | Xi  .
n ej (π(Xij ) | Xi )
i=1 j=1

We can further compute,

" #
1 {Aij = π(Xij )}
E Yi Xi
ej (π(Xij ) | Xi )

1 {Aij = π(Xij )}
= E Y i (Ai(−j) , Aij = π(Xij )) Xi
ej (π(Xij ) | Xi )

1 {Aij = π(Xij )}
= E Y i (Ai(−j) , Aij = π(Xij )) | Xi E Xi
ej (π(Xij ) | Xi )

= E E Y i (Ai(−j) , Aij = π(Xij )) | Ai(−j) , Aij = π(Xij ), Xi | Xi
Mi
X
(0)
=g (Xi ) + g (k) (Xi )ek (1 | Xi ) + g (j) (Xi )π(Xij ), (27)
k=1,k̸=j

where the first equality is due to consistency, the second equality follows from unconfoundedness
(Assumption 1(a)) and factorization of propensity scores (Assumption 3), the third equality follows

27
from the tower property and the fact that Yi (ai ) ⊥
⊥ Aij | Ai(−j) , Xi (due to Assumptions 1(a) and 3)
and the final equality holds under the additive outcome model assumption (Assumption 2). In
addition, Assumption 2 implies the following equality,
Mi
X
(0)
g (k) (Xi )ek (1 | Xi ).

E Y i | Xi = g (Xi ) + (28)
k=1

Plugging Equations (27) and(28) into Equation (26), we obtain the following desired result,

Mi
X Mi
X
addIPW (0) (k)
E[V
b (π)] = E Mi × g (Xi ) + (Mi − 1)
 g (Xi )ek (1 | Xi ) + g (j) (Xi )π(Xij )
k=1 j=1
Mi
!#
X
−(Mi − 1) g (0) (Xi ) + g (k) (Xi )ek (1 | Xi )
k=1
 
Mi
X
= E g (0) (Xi ) + g (j) (Xi )π(Xij )
j=1

= V (π).

where the last equality uses the law of iterated expectation under the modeling assumption (As-
sumption 2).

A.2 Proof of Theorem 1

Definition 1 (Rademacher complexity). Let F be a family of functions mapping O 7→ R. The
population Rademacher complexity of F is defined as
n
" #
1X
Rn (F) ≡ EO,ε sup εi f (Oi )
f ∈F n i=1

where εi ’s are i.i.d. Rademacher random variables, i.e., Pr (εi = 1) = Pr (εi = −1) = 1/2, and the
expectation is taken over both the Rademacher variables εi and the i.i.d. random variables Oi .

Proof. By definition, the regret of π̂ relative to π ∗ is given by

R(π̂) = V (π ∗ ) − V (π̂) = V (π ∗ ) − Vb (π ∗ ) + Vb (π ∗ ) − Vb (π̂) + Vb (π̂) − V (π̂)

≤ V (π ∗ ) − Vb (π ∗ ) + Vb (π̂) − V (π̂)
≤ 2 sup |Vb (π) − V (π)|.
π∈Π

Similar to the standard policy learning literature, our approach is to bound the above worst-case
estimation error, supπ∈Π |Vb (π) − V (π)|. Define
  
Mi
X 1 {Aij = π(Xij )} 
f (Oi ; π) = Y i − 1 + 1 ,
 ej (π(Xij ) | Xi ) 
j=1

28
and the function class on O as F = {f (·; π) | π ∈ Π}. Proposition 1 implies,
n
1X
sup |Vb (π) − V (π)| = sup f (Oi ) − E[f (O)] .
π∈Π f ∈F n i=1

From Assumptions 1(c) and 4, the class F is uniformly bounded by a constant C = B[mmax × ( η1 −
1) + 1]. Then by Theorem 4.10 in Wainwright (2019),
n
1X
sup f (Oi ) − E[f (O)] ≤ 2Rn (F) + δ
f ∈F n
i=1

nδ 2
with probability at least 1 − exp − 2C 2 . It suffices to bound the Rademacher complexity for F,

Rn (F), given by
  
n Mi
1 X  X 1 − Aij 
Rn (F) ≤ EO,ϵ  ϵi Y i −1 +1 
n  ej (0 | Xi ) 
i=1 j=1
 
n Mi
1 X X A ij 1 − A ij
+ sup EO,ε  ϵi Y i π(Xij ) − 
π∈Π n ej (1 | Xi ) e j (0 | X i)
i=1 j=1
 
Mi
" n # n
C X B 1 X X
≤ Eε εi + sup EX,M,ε  εi π(Xij )  .
n π∈Π η n
i=1 i=1 j=1
 
n Mi
C B 1 X X
≤ √ + sup EX,M,ε  εi π(Xij )  .
n π∈Π η n
i=1 j=1

To bound the second term above, we utilize the upper bound mmax on cluster size given in Assump-
tion 4(b). We append zeros to the cluster-level covariates Xi (i.e., Xij = ∅ for j = Mi +1, . . . , mmax )
so that each cluster has the same length of covariate vectors, and define π(∅) = 0. Therefore, we
can bound it by
   
n m max m max n
1 X X X 1 X
sup EX,M,ε  εi π(Xij )  ≤ sup EX,M,ε  εi π(Xij ) 
π∈Π n π∈Π n
i=1 j=1 j=1 i=1
m n
" #
max
X 1X
≤ sup EX,M,ε εi π(Xij )
n
j=1 π∈Π i=1
r
ν
≤c0 mmax ,
n
where the first inequality is due to triangle inequality, and the last inequality follows from the fact
p class G with finite VC dimension ν < ∞, the Rademacher complexity is bounded
that for a function
by, Rn (G) ≤ c0 ν/n, for some universal p constant c0 (Wainwright, 2019, §5.3.3). Combining the
C Bm ν
above, we obtain Rn (F) ≤ √n + c0 η max
n , and thus finish the proof.

29
A.3 Proof of Theorem 2
Proof. To derive the semiparametric efficiency bound for a policy value estimate, we use a result
from Chamberlain (1992) who derives the efficiency bound under conditional moment restrictions
with a unknown nonparametric component. Assumption 2 can be written as the following condi-
tional moment restriction,
E [Y − G(X)ϕ(A) | A, X] = 0.
Based on the results in Chamberlain (1992), we can directly show that under the additive semi-
parametric model assumption, the efficiency bound for estimating V (π) is:
h i
Var w(M )⊤ G(X)ϕ(π(X)) +
" −1 #
h i−1
E ϕ(π(X))⊤ E ϕ(A)Var w(M )⊤ (Y − G(X)ϕ(A)) | A, X ϕ(A)⊤ | X ϕ(π(X))

Under the homoskedastivity assumption, E w(M )⊤ (Y − µ(A, X))(Y − µ(A, X))⊤ w(M ) | A, X =

σ 2 where σ 2 is a constant. Thus, this efficiency bound becomes:

h i h i−1
⊤ 2 ⊤ ⊤
Var w(M ) G(X)ϕ(π(X)) + E σ ϕ(π(X)) E ϕ(A)ϕ(A) | X ϕ(π(X))
h i h i
=Var w(M )⊤ G(X)ϕ(π(X)) + E σ 2 ϕ(π(X))⊤ Σ(X)−1 ϕ(π(X))

which is equal to the asymptotic variance of the doubly robust estimator:

h i
Var w(M )⊤ GDR (Y , X, M )ϕ(π(X))
h i h i
=Var w(M )⊤ G(X)ϕ(π(X)) + Var w(M )⊤ (Y − G(X)ϕ(A)) ϕ(A)⊤ Σ(X)−1 ϕ(π(X))
h i
=Var w(M )⊤ G(X)ϕ(π(X)) +
h h i i
E Var w(M )⊤ Y − G(X)⊤ ϕ(A) | A, X (ϕ(A)⊤ Σ(X)−1 ϕ(π(X)))2
h i
=Var w(M )⊤ G(X)⊤ ϕ(π(X)) +
h i
E σ 2 ϕ(π(X))⊤ Σ(X)−1 E[ϕ(A)ϕ(A)⊤ | X, M ]Σ(X)−1 ϕ(π(X))
h i h i
=Var w(M )⊤ G(X)⊤ ϕ(π(X)) + E σ 2 ϕ(π(X))⊤ Σ(X)−1 ϕ(π(X)) .

B Observational studies with unknown propensity scores

We have so far focused on the experimental settings where the true propensity scores are known.
We now relax this assumption and consider observational studies, propensity scores must be esti-
mated. As shown in Kitagawa and Tetenov (2018), if we have consistent estimates êj (a | x) of the
propensities and plug them into Equation (8), we can still learn a consistent policy π̂ by maximizing

30
Vb (π). We define the new estimated policy as,
 
n Mi
1 X X 1 {Aij = π(Xij )} 
π̂ê := argmax Vbê where Vê :=
b Yi −1 +1 , (29)
π∈Π n  êj (π(Xij ) | Xi ) 
i=1 j=1

where êj is the estimated propensity score. We examine how the estimation of propensity scores
affect the performance of learned policy. The next assumption concerns the convergence rate of the
estimation error of the propensity score.

Assumption 6 (Estimation error of ej ). For some sequence ρn → ∞, assume that

 
n Mi
1 X X 1 1  = O ρ−1

E sup − n .
n a∈{0,1} ej (a | Xi ) êj (a | Xi )
i=1 j=1

We next discuss the regret guarantees when using the estimated propensity score. Theorem 3
shows that a simple plug-in approach based on the IPW-type estimator in Equation (8) typically
√
leads to a learned policy with a slower-than- n convergence rate.

Theorem 3 (Regret bound with estimated propensity score). Suppose Assumptions 1–4 and 6
hold. Define π̂ê as the solution in Equation (29). The regret of π̂ê can be upper bounded as,
1
R(π̂ê ) ≤ Op (n− 2 ∨ ρ−1
n ). (30)

Proof. By definition, the regret of π̂ê relative to π ∗ is given by

R(π̂ê ) = V (π ∗ ) − V (π̂ê ) = V (π ∗ ) − Vb (π ∗ ) + Vb (π ∗ ) − Vbê (π ∗ ) + Vbê (π ∗ ) − Vbê (π̂ê ) + Vbê (π̂ê ) − Vb (π̂ê ) + Vb (π̂ê ) − V (π̂ê )
≤ 2 sup |Vb (π) − V (π)| + 2 sup |Vb (π) − Vbê (π)|,
π∈Π π∈Π

where Vb (·) is the empirical policy value estimate using the true propensity score in Equation (8).
1
Due to Theorem 1, the first term is bounded by Op (n− 2 ) . We now study the second term.

n M
1 X Xi

1 {Aij = π(Xij )} 1 {Aij = π(Xij )}
sup |Vb (π) − Vbê (π)| = sup Yi −
π∈Π π∈Π n ej (π(Xij ) | Xi ) êj (π(Xij ) | Xi )
i=1 j=1

n Mi
1X X 1 1
≤B sup −
n a∈{0,1} ej (a | Xi ) ej (a | Xi )
i=1 j=1

= Op (ρ−1
n ).

Combining the above, we obtain the desired result.

C The estimator for the general polynomial additive model

Here, we give an explicit formula for the policy estimator under the general polynomial additive
structural model given in Equation (14). We first index the entries of E ϕ(Ai )ϕ(Ai )⊤ | Xi by the

sets J and K corresponding to the matrix row and column, where J and K contain the indices of

31
individuals associated with the respective entries of ϕ(Ai ). According to factored propensity score,
we have the following decomposition
h i Y
E ϕ(Ai )ϕ(Ai )⊤ | Xi = ej (Xi ).
J ,K
j∈J ∪K

We let Iiβ denote the power set of {∅, 1, . . . , Mi } with cardinality at most β. For simplicity, we
define ej (Xi ) := ej (1 | Xi ).
Now, we provide an explicit expression for the inverse of E ϕ(Ai )ϕ(Ai )⊤ | Xi by citing the

results of Lemma 1 of Cortez et al. (2022):

Lemma 1. The matrix Σ(Xi ) := E ϕ(Ai )ϕ(Ai )⊤ | Xi is invertible, with each entry of its inverse

matrix Σ−1 (Xi ) given by the following formula

Y −1 Y −1 X Y eℓ (Xi )
Σ−1 (Xi ) J ,K =

.
ej (Xi ) ek (Xi ) 1 − eℓ (Xi )
j∈J k∈K U ∈Iiβ ℓ∈U
(J ∪K)⊆U

Recalling Equation (14), the generalized form of the estimator is given by

n i−1
1X h
Vb (π) = Y i · ϕ(π(Xi ))⊤ E ϕ(Ai )ϕ(Ai )⊤ | Xi ϕ(Ai ),
n
i=1

Based on Lemma 1, we can directly calculate the cluster-level weights, resulting into
X X Y −Aij Y −π(Xik ) X Y eℓ (Xi )
ϕ(π(Xi ))⊤ Σ−1 (Xi )ϕ(Ai ) =
ej (Xi ) ek (Xi ) 1 − eℓ (Xi )
J ∈Ii K∈Ii j∈J
β β k∈K U ∈Iiβ ℓ∈U
(J ∪K)⊆U
X Y −Aij X Y −π(Xik ) X Y eℓ (Xi )
=
ej (Xi ) ek (Xi ) 1 − eℓ (Xi )
J ∈Ii j∈J
β β
K∈Ii k∈K U ∈Iiβ ℓ∈U
(J ∪K)⊆U
X Y −Aij X Y eℓ (Xi ) X Y −π(Xik )
=
ej (Xi ) 1 − eℓ (Xi ) ek (Xi )
J ∈Ii j∈J
β β
U ∈Ii ℓ∈U K⊆U k∈K
J ⊆U
X Y −Aij X Y eℓ (Xi ) Y π(Xik )

= 1−
ej (Xi ) 1 − eℓ (Xi ) ek (Xi )
β
J ∈Ii j∈J β
U ∈Ii ℓ∈U k∈U
J ⊆U

X Y eℓ (Xi ) Y π(Xik ) X Y −Aij
= 1−
1 − eℓ (Xi ) ek (Xi ) ej (Xi )
U ∈Iiβ ℓ∈U k∈U J ⊆U j∈J
X Y eℓ (Xi ) Y π(Xik ) Y

Aij

= 1− 1−
1 − eℓ (Xi ) ek (Xi ) ej (Xi )
β
U ∈Ii ℓ∈U k∈U j∈U
X Y (eℓ (Xi ) − Aiℓ )(eℓ (Xi ) − π(Xiℓ ))
=
eℓ (Xi )(1 − eℓ (Xi ))
β
U ∈Ii ℓ∈U

32
X Y (eℓ (Xi ) − Aiℓ )(eℓ (Xi ) − π(Xiℓ ))
=
eℓ (Xi )(1 − eℓ (Xi ))
β
U ∈Ii ℓ∈U
X Y 1 {Aiℓ = π(Xiℓ )}
= −1 .
eℓ (π(Xiℓ ) | Xi )
β
U ∈Ii ℓ∈U

Consequently, the explicit form of the final policy value estimator is given by:
n X Y 1 {Aiℓ = π(Xiℓ )}
1X
V (π) =
b Yi −1 .
n eℓ (π(Xiℓ ) | Xi )
i=1 β
U ∈Ii ℓ∈U

Exploring Marginal Treatment Effects Flexible Estimation Using Stata
100% (1)
Exploring Marginal Treatment Effects Flexible Estimation Using Stata
37 pages
Gauranga Das - The Art of Focus (2021, Penguin Random House India Private Limited) - Libgen - Li
67% (3)
Gauranga Das - The Art of Focus (2021, Penguin Random House India Private Limited) - Libgen - Li
253 pages
Fullz PDF
No ratings yet
Fullz PDF
3 pages
Docmine: Spare Parts Catalog
No ratings yet
Docmine: Spare Parts Catalog
83 pages
3 Happiness Exercises
No ratings yet
3 Happiness Exercises
20 pages
Markus Frölich, Stefan Sperlich - Impact Evaluation - Treatment Effects and Causal Analysis-Cambridge University Press (2019)
100% (3)
Markus Frölich, Stefan Sperlich - Impact Evaluation - Treatment Effects and Causal Analysis-Cambridge University Press (2019)
432 pages
Parasite SEO Secrets Revealed by Charles Floate
100% (1)
Parasite SEO Secrets Revealed by Charles Floate
73 pages
CausalML Book 2022
No ratings yet
CausalML Book 2022
500 pages
Hauz Khas Urban Village
No ratings yet
Hauz Khas Urban Village
7 pages
Derlon
100% (1)
Derlon
11 pages
09 Kbat Jawapan
88% (8)
09 Kbat Jawapan
40 pages
Paver Block Specification
No ratings yet
Paver Block Specification
8 pages
Testing MCQ
No ratings yet
Testing MCQ
59 pages
AC2 Engineering Utilities 2 Syllabus
No ratings yet
AC2 Engineering Utilities 2 Syllabus
16 pages
HD - Machine Learnind and Econometrics
No ratings yet
HD - Machine Learnind and Econometrics
185 pages
Pushpak Dagade - Python Algorithmic Trading Cookbook All The Recipes You Need To Implement Your Own Trading Strategies in Python (2020) (Z-Lib - Io)
No ratings yet
Pushpak Dagade - Python Algorithmic Trading Cookbook All The Recipes You Need To Implement Your Own Trading Strategies in Python (2020) (Z-Lib - Io)
2 pages
Surds PDF
No ratings yet
Surds PDF
12 pages
Asheshrambachan Harvard EconomicsPhD Dissertation Revised
No ratings yet
Asheshrambachan Harvard EconomicsPhD Dissertation Revised
254 pages
Shpit Ser 2016
No ratings yet
Shpit Ser 2016
34 pages
Explain The Physical Layer of The I2C Protocol
100% (1)
Explain The Physical Layer of The I2C Protocol
7 pages
Economic Growth and Development
100% (1)
Economic Growth and Development
20 pages
Causal Inference - A Statistical Learning Approach
No ratings yet
Causal Inference - A Statistical Learning Approach
247 pages
PES3701 Assignment 3
No ratings yet
PES3701 Assignment 3
3 pages
V51i13 PDF
No ratings yet
V51i13 PDF
35 pages
Imbens & Wooldridge (2009) Recent Developments in The Econometrics of Program Evaluation
No ratings yet
Imbens & Wooldridge (2009) Recent Developments in The Econometrics of Program Evaluation
83 pages
Topic - Four - Decisions - Are - Hard Aa
No ratings yet
Topic - Four - Decisions - Are - Hard Aa
85 pages
Knaus
No ratings yet
Knaus
112 pages
Final
No ratings yet
Final
9 pages
Heckman MTE
No ratings yet
Heckman MTE
83 pages
Difference-in-Differences With Interference: Ruonan Xu
No ratings yet
Difference-in-Differences With Interference: Ruonan Xu
65 pages
Causal Inference Under Interference Through Designed Markets
No ratings yet
Causal Inference Under Interference Through Designed Markets
63 pages
HD Econometrics
No ratings yet
HD Econometrics
197 pages
Potential Outcomes Framework
100% (1)
Potential Outcomes Framework
7 pages
Treatment Effects With Targeting Instruments: Sokbae Lee Bernard Salani e
No ratings yet
Treatment Effects With Targeting Instruments: Sokbae Lee Bernard Salani e
66 pages
CWP241717
No ratings yet
CWP241717
89 pages
WorkingPaper Spillovers 20160902
No ratings yet
WorkingPaper Spillovers 20160902
45 pages
Hernanrobins V2march52017
No ratings yet
Hernanrobins V2march52017
86 pages
Spillover Effects
No ratings yet
Spillover Effects
52 pages
Causal Without Control
No ratings yet
Causal Without Control
63 pages
Structural Equations... James J. Heckman and Edward Vytlacil
No ratings yet
Structural Equations... James J. Heckman and Edward Vytlacil
71 pages
Multi CATE Adaptation
No ratings yet
Multi CATE Adaptation
53 pages
Sunil IFPRI 23mar21 IV ESR PDFFormat
No ratings yet
Sunil IFPRI 23mar21 IV ESR PDFFormat
54 pages
Regression Adjustments For Estimating The Global Treatment Effect in Experiments With Interference
No ratings yet
Regression Adjustments For Estimating The Global Treatment Effect in Experiments With Interference
36 pages
1712 04802
No ratings yet
1712 04802
81 pages
Fixed-Population Causal Inference For Models of Equilibrium
No ratings yet
Fixed-Population Causal Inference For Models of Equilibrium
39 pages
Double Machine Learning For Causal Inference Under Shared-State Interference
No ratings yet
Double Machine Learning For Causal Inference Under Shared-State Interference
48 pages
Intro To PPE
No ratings yet
Intro To PPE
45 pages
Using Instrumental Variables For Inference About Policy Relevant Treatment Parameters
No ratings yet
Using Instrumental Variables For Inference About Policy Relevant Treatment Parameters
57 pages
Athey Imbens 2017 The State of Applied Econometrics Causality and Policy Evaluation
No ratings yet
Athey Imbens 2017 The State of Applied Econometrics Causality and Policy Evaluation
67 pages
Hernanrobins v2.17.14
No ratings yet
Hernanrobins v2.17.14
86 pages
Hernanrobins v2.17.18
No ratings yet
Hernanrobins v2.17.18
86 pages
Generalization Bounds and Representation Learning For Estimation of Potential Outcomes and Causal Effects
No ratings yet
Generalization Bounds and Representation Learning For Estimation of Potential Outcomes and Causal Effects
50 pages
Соц Эффект По Закону
No ratings yet
Соц Эффект По Закону
36 pages
Session67 DR Daniel C Suryadarma Impact Evaluation Methods I Impact Evaluation Methods II Rev
No ratings yet
Session67 DR Daniel C Suryadarma Impact Evaluation Methods I Impact Evaluation Methods II Rev
55 pages
Spillover Effects in Experimental Data
No ratings yet
Spillover Effects in Experimental Data
42 pages
Modeling Interference Using Experiment Roll-Out
No ratings yet
Modeling Interference Using Experiment Roll-Out
32 pages
Survey Learning Horner Skrzypacz PDF
No ratings yet
Survey Learning Horner Skrzypacz PDF
30 pages
21-Economics-2017 (Tamil) - Final - 1693223768823
No ratings yet
21-Economics-2017 (Tamil) - Final - 1693223768823
74 pages
NeuralHack Stage 2 Python
100% (1)
NeuralHack Stage 2 Python
2 pages
Econometric Evaluation and HCF 2024
No ratings yet
Econometric Evaluation and HCF 2024
61 pages
Lecture 5
No ratings yet
Lecture 5
34 pages
6824 Staggered Rollout Designs Enab
No ratings yet
6824 Staggered Rollout Designs Enab
13 pages
Guiding Principle:: Title: Training Guide For Dcws On Self Help Assessment
No ratings yet
Guiding Principle:: Title: Training Guide For Dcws On Self Help Assessment
33 pages
Centralized Approach To Modelling
No ratings yet
Centralized Approach To Modelling
24 pages
Factors of Production and Their Theories
No ratings yet
Factors of Production and Their Theories
19 pages
Economics As A Science
No ratings yet
Economics As A Science
15 pages
Cluster Randomized Designs For One-Sided Bipartite Experiments
No ratings yet
Cluster Randomized Designs For One-Sided Bipartite Experiments
13 pages
Marginal Treatment Effects (MTE)
No ratings yet
Marginal Treatment Effects (MTE)
30 pages
Соц Эффект По Закону2
No ratings yet
Соц Эффект По Закону2
29 pages
Athey StateAppliedEconometrics 2017
No ratings yet
Athey StateAppliedEconometrics 2017
31 pages
C E T D (A - ) O V: Ausal Stimation For EXT Ata With Ppar ENT Verlap Iolations
No ratings yet
C E T D (A - ) O V: Ausal Stimation For EXT Ata With Ppar ENT Verlap Iolations
20 pages
Large Sample Randomization Inference of Causal Effects in The Presence of Interference
No ratings yet
Large Sample Randomization Inference of Causal Effects in The Presence of Interference
15 pages
Journal of Econometrics: Yanqin Fan, Sang Soo Park
No ratings yet
Journal of Econometrics: Yanqin Fan, Sang Soo Park
15 pages
Fsds 26-06 Data Cleaning With Excel
No ratings yet
Fsds 26-06 Data Cleaning With Excel
7 pages
9222 - Calculating and Processing Devices (WEEK 4)
No ratings yet
9222 - Calculating and Processing Devices (WEEK 4)
11 pages
Current Affairs - Compendium - DMS - IIT - Delhi
No ratings yet
Current Affairs - Compendium - DMS - IIT - Delhi
28 pages
Pressure Groups
No ratings yet
Pressure Groups
12 pages
Policy Evaluation and Efficiency A Systematic Literature Review
No ratings yet
Policy Evaluation and Efficiency A Systematic Literature Review
23 pages
Learning When To Treat Policies
No ratings yet
Learning When To Treat Policies
19 pages
Machine Learning Methods For Estimating Heterogeneous Causal Effects
No ratings yet
Machine Learning Methods For Estimating Heterogeneous Causal Effects
25 pages
Ayitenew Determinantsof Internal Audit Effectiveness Evidencefrom Gurage Zone
No ratings yet
Ayitenew Determinantsof Internal Audit Effectiveness Evidencefrom Gurage Zone
12 pages
Detecting Interference Between Units
No ratings yet
Detecting Interference Between Units
22 pages
P.7 Math
No ratings yet
P.7 Math
12 pages
The Board of Regents of The University of Wisconsin System
No ratings yet
The Board of Regents of The University of Wisconsin System
23 pages
Spillover
No ratings yet
Spillover
13 pages
ImPPPact Nigeria Alliance People First PPPs For Sustainable Development
No ratings yet
ImPPPact Nigeria Alliance People First PPPs For Sustainable Development
9 pages
Targeted Optimal Treatment Regime Learning Using Summary Statistics
No ratings yet
Targeted Optimal Treatment Regime Learning Using Summary Statistics
20 pages
Standard Operating Procedure (SOP) For Program Facilitators
No ratings yet
Standard Operating Procedure (SOP) For Program Facilitators
7 pages
Processes of Legislation
No ratings yet
Processes of Legislation
9 pages
Candidate System - UHB NHS-Recruit
No ratings yet
Candidate System - UHB NHS-Recruit
3 pages
PHD Fee Waiver
No ratings yet
PHD Fee Waiver
3 pages
Physical Science - q4 - Slm13-Pages-Deleted
No ratings yet
Physical Science - q4 - Slm13-Pages-Deleted
5 pages
Causal Optimal Transport For Treatment Effect Estimation
No ratings yet
Causal Optimal Transport For Treatment Effect Estimation
13 pages
Introduction EMT357-upload Ver
No ratings yet
Introduction EMT357-upload Ver
19 pages
DVB Conditional Access System - CAS & SMS - CryptoGuard AB CryptoGuard AB
No ratings yet
DVB Conditional Access System - CAS & SMS - CryptoGuard AB CryptoGuard AB
7 pages
Foot-Surface-Structure Analysis Using A Smartphone-Based 3D Foot Scanner
No ratings yet
Foot-Surface-Structure Analysis Using A Smartphone-Based 3D Foot Scanner
7 pages
List of Government Colleges Affiliated To The University of Jammu (ACADEMIC SESSION 2020-21)
No ratings yet
List of Government Colleges Affiliated To The University of Jammu (ACADEMIC SESSION 2020-21)
9 pages
fml-g12s Ds en
No ratings yet
fml-g12s Ds en
7 pages
Premier League Data - Activity Questions: Part A: Sorting and Filtering
No ratings yet
Premier League Data - Activity Questions: Part A: Sorting and Filtering
3 pages
Task 3 - Instructions Sheet
No ratings yet
Task 3 - Instructions Sheet
4 pages
Q2 Lesson 1 Worksheet
No ratings yet
Q2 Lesson 1 Worksheet
2 pages
Dpi Reports
No ratings yet
Dpi Reports
2 pages
Filtermedia HSL HSL-C Uk
No ratings yet
Filtermedia HSL HSL-C Uk
2 pages

Individualized Policy Evaluation and Learning Under Clustered Network Interference

Uploaded by

Individualized Policy Evaluation and Learning Under Clustered Network Interference

Uploaded by

Individualized Policy Evaluation and Learning under

Clustered Network Interference∗

Keywords: individualized treatment rules, partial interference, randomized experiments, spillover

2 The Problem Statement

2.1 Setup and Notation

(a) Unconfoundedness: Yi (ai ) ⊥

(b) Positivity: ∃ ϵ > 0 such that ϵ < e (ai | xi ) < 1 − ϵ

2.2 Individualized Policy Learning Problem

Π = {π : X → {0, 1} | π(X) = 1(γ0 + X ⊤ γ ≥ 0), γ0 ∈ R, γ ∈ Rp }.

R(π) = V (π ∗ ) − V (π). (3)

3.1 The Standard Inverse-probability-weighting (IPW) Estimator

3.2 A Semiparametric Additive Outcome Model

E [Yij (ai ) | Xi ] = γ1 + γ2 aij + γ3 ai(−j) + γ4⊤ Xij + γ5⊤ X i(−j) , (6)

Example 2 (Additive nonparametric effect model under anonymous interference).

where h := (h0 , h1 , h2 ) is a vector of nonparametric functions. This model is a nonparametric

3.3 Identification and Estimation

Assumption 3. (Factored cluster-level propensity score) The cluster-level treatment probability

E[Vb addIPW (π)] = V (π)

The unbiasedness of this estimator is immediate:

3.4 Semiparametric Model with Interactions

E [Yij (ai ) | Xi ] = gj (Xi )⊤ ϕ(ai ), (14)

π̂ := argmax Vb addIPW (π). (18)

4.1 Regret Analysis

Assumption 4. The following statements hold:

Proof. See Appendix A.

4.2 Mixed-Integer Program Formulation

Π = {π : X → {0, 1} | π(X) = 1(X ⊤ γ ≥ 0), γ ∈ B}.

⊤ γ is non-negative and zero otherwise, i.e., p = π(X ). We now

5 Extension to Observational Studies

µ(a, x) = G(x)ϕ(a) (21)

where ϕ(a) := (1, a1 , . . . , am )⊤ is the treatment assignment vector, and G(x) is a m × (m + 1)

Assumption 5 (Homoskedasticity). Assume homoscedastic error in the outcome model, i.e.,

Proof. See Appendix A.

π(X) = 1(β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 ≥ 0).

50 100 200 400 800 50 100 200 400 800

Method IPW.NoInt IPW AddIPW Oracle

Ananth, A. (2020). Optimal treatment assignment rules on networked populations. Technical

Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica: Journal

Imai, K. and M. L. Li (2023). Experimental evaluation of individualized treatment rules. Journal

Liu, L., M. G. Hudgens, and S. Becker-Dreps (2016). On inverse probability-weighted estimators

Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica: Journal of

Swaminathan, A., A. Krishnamurthy, A. Agarwal, M. Dudik, J. Langford, D. Jose, and I. Zi-

Tchetgen, E. J. T. and T. J. VanderWeele (2012). On causal inference in the presence of interference.

Tchetgen Tchetgen, E. J. and T. J. VanderWeele (2012). On causal inference in the presence of

Wainwright, M. J. (2019). High-dimensional statistics: A non-asymptotic viewpoint, Volume 48.

B Observational studies with unknown propensity scores 30

C The estimator for the general polynomial additive model 31

We start by examining the expectation of the proposed estimator,

We can further compute,

A.2 Proof of Theorem 1

Proof. By definition, the regret of π̂ relative to π ∗ is given by

R(π̂) = V (π ∗ ) − V (π̂) = V (π ∗ ) − Vb (π ∗ ) + Vb (π ∗ ) − Vb (π̂) + Vb (π̂) − V (π̂)

σ 2 where σ 2 is a constant. Thus, this efficiency bound becomes:

which is equal to the asymptotic variance of the doubly robust estimator:

B Observational studies with unknown propensity scores

Assumption 6 (Estimation error of ej ). For some sequence ρn → ∞, assume that

Proof. By definition, the regret of π̂ê relative to π ∗ is given by

Combining the above, we obtain the desired result.

C The estimator for the general polynomial additive model

results of Lemma 1 of Cortez et al. (2022):

matrix Σ−1 (Xi ) given by the following formula

Recalling Equation (14), the generalized form of the estimator is given by

You might also like