
Modeling Interference Using Experiment Roll-out

Ariel Boyarsky1 Hongseok Namkoong1 Jean Pouget-Abadie2


1 Decision, Risk, and Operations Division, Columbia Business School, 2 Google Research
{aboyarsky26, namkoong}@gsb.columbia.edu, [email protected]

arXiv:2305.10728v2 [stat.ME] 16 Aug 2023

Abstract
Experiments on online marketplaces and social networks suffer from interference, where the
outcome of a unit is impacted by the treatment status of other units. We propose a framework
for modeling interference using a ubiquitous deployment mechanism for experiments, staggered
roll-out designs, which slowly increase the fraction of units exposed to the treatment to mitigate
any unanticipated adverse side effects. Our main idea is to leverage the temporal variations in
treatment assignments introduced by roll-outs to model the interference structure. Since there
are often multiple competing models of interference in practice we first develop a model selection
method that evaluates models based on their ability to explain outcome variation observed along
the roll-out. Through simulations, we show that our heuristic model selection method, Leave-
One-Period-Out, outperforms other baselines. Next, we present a set of model identification
conditions under which the estimation of common estimands is possible and show how these
conditions are aided by roll-out designs. We conclude with a set of considerations, robustness
checks, and potential limitations for practitioners wishing to use our framework.

1 Introduction
Experimentation is at the core of scientific decision-making in many online platforms. However,
many of the experiments run by these platforms suffer from interference, meaning an intervention
on a participant might impact the outcome of other participants. For example, interference is prob-
lematic in online marketplaces where strategic agents compete for scarce resources (Pouget-Abadie
et al., 2018; Ugander and Yin, 2020; Johari et al., 2020; Candogan et al., 2021; Brennan et al.,
2022; Bojinov et al., 2022a; Bojinov and Gupta, 2022; Bright et al., 2022), and in online social
networks (Eckles et al., 2016; Chang et al., 2022) where peers interact and influence each other. In-
terference among units leads to a violation of the Stable Unit Treatment Value Assumption (Rubin,
2005; Imbens and Rubin, 2015) and the causal effect may not even be identifiable in a randomized
experiment.
A common practical approach to estimating causal effects when interference is present is to
assume a specific potential outcome structure based on the network(s) between participants. To-
gether, the postulated potential outcome structure and the underlying network(s) constitute an
interference model, which can be used to estimate the true causal effect directly (Eckles et al.,
2017; Biswas and Airoldi, 2018), or modify the experimental design to mitigate the effects of in-
terference (Basse and Airoldi, 2018). As a motivating example, which we will explore further in
Section 2, consider an online auction platform incrementally rolling out an experiment that tests
the effectiveness of increasing reserve prices. Treating one auction is likely to affect each participating advertiser, which can in turn affect other auctions in which they participate. Practitioners
have postulated several working models of interference for how advertisers and auctions interact
with one another. Each modeling choice might lead to a different estimated treatment effect.

Such ad-hoc structural modeling, however, is often unreliable in practice since the interference
network is rarely known and the potential outcome structure is almost always misspecified. As a
result, selecting and validating models of interference is a crucial—but difficult—step of estimating
causal effects when interference is present. Furthermore, even when committing to a fixed inter-
ference model, estimating its parameters may be infeasible if the observed data does not contain
enough variation in treatment exposure across units.
We propose a framework for modeling interference using a ubiquitous deployment mechanism for
experiments, staggered roll-out designs. Online platforms rely on staggered roll-out designs (Kohavi
et al., 2009; Xu et al., 2015, 2018) as an early detection tool for any unintended consequences caused
by a new experiment or product launch, e.g., software bugs or adverse participant responses. Roll-
out designs follow a simple principle: instead of intervening on all participants marked for the
intervention all at once, the proportion of participants exposed to the intervention is increased
incrementally until all participants marked for the intervention have been intervened on. The
practice is nearly universal across standard experimentation infrastructures; for example, Xu et al.
(2018) notes that out of 5000+ experiments run annually at LinkedIn, “every experiment goes
through a ramp-up/roll-out process.”
In this paper, we propose to leverage existing roll-out designs for a new purpose: to better select
and estimate models of interference in causal estimation when interference is present. Despite its
ubiquity, the temporal variation in treatment exposure induced by roll-out designs is a promising
yet largely overlooked consequence of how experiments are implemented in practice. Variations in
treatment proportions, whether temporal (Bojinov et al., 2022b; Han et al., 2022) or spatial (Athey
et al., 2018; Baird et al., 2018), are key to modeling interference and validating modeling choices.
As our first main contribution, we utilize roll-out experimental designs to develop a model
selection mechanism, Leave-One-Period-Out, to select the best interference model and evaluate
this mechanism over a rich set of simulated examples. We show that, given an interference model,
roll-out designs allow the identification of interference parameters, which may be unidentified in
the absence of any temporal variation in treatment exposure. We theoretically characterize when
the causal estimand becomes identifiable with the help of roll-out designs and quantify the level of
temporal variation required to identify heterogeneous patterns across units. Finally, we quantify
the statistical efficiency gains resulting from a roll-out design.
In Section 2, we introduce our estimand of interest, define roll-out designs, and provide a
motivating example. In Section 3, we develop our heuristic modeling framework and provide ex-
perimental evidence supporting our model selection algorithm. In Section 4, we study the question
of identification, specifically how roll-outs help identify and estimate causal effects in the presence
of interference. Finally, in Section 5.1, we address challenges practitioners may face when applying
our framework and consider possible robustness checks that could be performed.

Related work
The literature on causal inference in the presence of interference is extensive (e.g., see the work of
Rosenbaum (2007); Hudgens and Halloran (2008); Sävje et al. (2017); Leung (2019); Farias et al.
(2022); Viviano (2020)). Our work is related to the subset of this literature that focuses on new
experimental designs to mitigate the effects of interference (Eckles et al., 2017; Baird et al., 2018;
Brennan et al., 2022); we repurpose roll-outs, a common design in practice, to model interfer-
ence. While several authors have proposed new estimands and estimators to better understand
the mechanism of interference and reduce its bias-inducing effects (Yuan et al., 2021; Karrer et al., 2021; Yu et al., 2022a; Zigler and Papadogeorgou, 2021), we study common models of interference (Aronow and Samii, 2017a; Basse et al., 2016) and focus on the familiar total treatment effect
estimand (Chin, 2019). Instead of suggesting new estimators and estimands, we leverage roll-out
designs to formulate a new method for validating and choosing from these previously introduced
models.
The operational benefits of roll-out designs have been well-documented in the context of online
platforms (Kohavi et al., 2009; Xu et al., 2015; Xiong et al., 2020). However, the study of roll-
out designs is sparse in the context of interference. Cortez et al. (2022b) recently studied roll-out
designs to develop unbiased estimates of treatment effects. The work is similar to ours in that both
frameworks do not assume the knowledge of the underlying network but make different assumptions
to make inference possible. Cortez et al. (2022b) take a design-based perspective: given a low-degree polynomial structure for interference, they design a roll-out that admits an unbiased estimator of the treatment effect. In contrast, our heuristic model selection framework applies to any roll-out
design and interference model. When the interference model is well-specified, we provide theoretical
guarantees quantifying how roll-out designs boost statistical efficiency.
Validating and choosing from potential outcome models that incorporate interference bears
many similarities to the task of detecting interference, which has historically relied on one of two
methods. The first set of methods is to compare two designs with different properties under SUTVA
and interference, often simultaneously using a hierarchical design structure (Sinclair et al., 2012;
Saveski et al., 2017; Pouget-Abadie et al., 2017). A second approach consists in running Fisher-
randomized-like tests on observed data to determine the significance of well-chosen estimators that
are non-zero if and only if interference is present (Aronow and Samii, 2017b; Athey et al., 2018; Basse
et al., 2019b). Both approaches seek to exploit fluctuations in a specific parameter of interference
to determine whether (1) interference is present and potentially (2) whether it occurs in the form
or through the channel of that parameter.
Roll-outs provide a third paradigm for detecting interference, as they introduce desirable fluc-
tuations in interference-sensitive parameters such as the global treatment fraction. In a concurrent
and independent work, Han et al. (2022) study how roll-outs can be used to detect interference.
They design randomization tests that can be used to detect cross-unit interference even in the
presence of temporal effects. In comparison, the present paper goes beyond detection: at the cost
of stronger modeling assumptions, we study the direct modeling of interference effects. Since in-
terference may be common in online platforms, rather than establishing its existence, we propose
estimation and model selection methods that allow contextualizing the operational significance of
interference effects compared to the direct treatment effect. While our modeling approach can also
be used as a heuristic test for interference, we do not establish its theoretical validity and instead
focus on identifiability and estimation guarantees.

2 Setting
We use the potential outcomes notation for a finite population of size N . We do not make the
Stable Unit Treatment Value Assumption (SUTVA) (Imbens and Rubin, 2015), so that, for any
treatment vector z ∈ {0, 1}N , the potential outcome Yi (z) of each unit i may depend on the
treatment status of other units due to interference. For concreteness, we focus on the identification

and estimation of the total treatment effect estimand
$$\mathrm{TTE} := \frac{1}{N} \sum_{i=1}^{N} \big( Y_i(\mathbf{1}) - Y_i(\mathbf{0}) \big). \tag{2.1}$$

In practice, the TTE is of particular relevance to online platforms whose goal is to determine
whether a product innovation is fruitful when it is completely adopted (Bond et al., 2012; Eckles
et al., 2017).
Roll-outs, also known as “ramp-ups”, are a common experimental practice in which, instead of assigning all treatments at once, assignment proceeds in incremental steps. While primarily instituted to improve
engineering reliability, they induce important temporal variation in a unit’s treatment exposure
that can be used to better model the effect of interference in randomized experiments. Formally,
a roll-out design with T periods consists of a sequence of treatment assignments {Z t }Tt=1 and
corresponding observed outcomes {Yi,t := Yi (Z t )}t∈[T ],i∈[N ] such that, once a unit is treated, it
remains treated for the remainder of the experiment. For typical experiments, T is a small number,
often equal to 5 or less. The completely randomized roll-out design considers a fixed proportion of
units to be newly treated at each period (Cortez et al., 2022a).
Definition 1 (Completely Randomized Roll-outs). A $T$-period completely randomized roll-out is an increasing set of random treatment assignments, $\{Z^1, \ldots, Z^T\}$, and a treatment allocation vector $\vec{p} = (p_1, \ldots, p_T)$ with $\sum_{t=1}^{T} p_t \le 1$, such that in each period $t$, $Z_i^t \in \{0,1\}$ is randomly chosen so that $\sum_{i=1}^{N} Z_i^t = N \sum_{j=1}^{t} p_j$, and if $Z_i^{t-1} = 1$ then $Z_i^t = 1$.
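To make the definition concrete, a completely randomized roll-out can be drawn with a single random permutation of the units: by period $t$, the first $N \sum_{j=1}^{t} p_j$ units in the permuted order are treated, which automatically makes treatment absorbing. A minimal Python sketch (numpy assumed; the function name and seed handling are our own):

```python
import numpy as np

def completely_randomized_rollout(N, p, seed=None):
    """Draw a T-period completely randomized roll-out (cf. Definition 1).

    `p` is the allocation vector (p_1, ..., p_T): after period t, exactly
    N * (p_1 + ... + p_t) units are treated. Treatment is absorbing, so
    once Z_i^t = 1 the unit stays treated in all later periods.
    Returns a T x N binary matrix whose t-th row is Z^t.
    """
    rng = np.random.default_rng(seed)
    assert sum(p) <= 1 + 1e-12, "allocations must sum to at most 1"
    order = rng.permutation(N)  # random order in which units enter treatment
    Z = np.zeros((len(p), N), dtype=int)
    cum = 0.0
    for t, pt in enumerate(p):
        cum += pt
        Z[t, order[: round(N * cum)]] = 1  # first N * sum(p_1..p_t) units treated
    return Z
```

For an even 50% roll-out over five periods, `completely_randomized_rollout(N, [0.1] * 5)` reproduces the design used in the simulations of Section 3.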

Another roll-out design is to specify an independent Bernoulli probability to treat each unit, known
as a Bernoulli randomized design. This design does not meaningfully change the conclusions of our
work, and we defer its definition to Section A of the appendix. Roll-out designs are characterized
by the proportion of newly treated individuals in each period: “even” (resp. “uneven”) roll-outs
treat the same (resp. different) incremental proportions of individuals in each period. Even among
even and uneven roll-outs, there are several possible roll-out mechanisms for assigning treatments,
corresponding to different joint distributions over Z t .

Example 1 (Linear-in-Means Models with Heterogeneity): As a motivating example, we consider an advertising auction system where bidders compete for limited items. We are interested in
measuring the impact of changing the reserve price—the minimum required bid to participate in the
auction—on advertisers’ spend (outcomes). The example models common operational concerns on
online platforms (Pouget-Abadie et al., 2018). In this two-sided marketplace with finite resources,
interference occurs when changing the reserve price for some items leads bidders to change their
bidding strategy, thus affecting the outcome of other auctions they participate in.
While a bipartite graph of bidders and auctions is usually used to represent the full market, in
statistical inference, it is common to reduce the bipartite market structure to a single interference
graph between the N items (Brennan et al., 2022). In this interference graph, edges represent a
notion of competition, e.g., substitutable keywords (goods). As we substantiate further in Section 3,
there are multiple ways to construct the item-to-item interference network in practice: we may
consider whether differences in advertising budgets should be taken into account or whether two
items are considered in competition if their co-bidders achieve a certain activity threshold.
For concreteness, consider two plausible and competing ways of defining “neighboring units”
for any given unit $i$, $G_1(i)$ and $G_2(i)$, based on two different interference networks. For each notion of “neighborhood”, we can posit a simple linear model of interference
$$Y_i^t(Z^t) = \alpha_i^\star + \tau^\star \cdot Z_i^t + \eta_1^\star \cdot \sum_{j \in G_1(i)} Z_j^t + \epsilon_{i,t} \tag{2.2a}$$
$$Y_i^t(Z^t) = \alpha_i^\star + \tau^\star \cdot Z_i^t + \eta_2^\star \cdot \sum_{j \in G_2(i)} Z_j^t + \epsilon_{i,t}. \tag{2.2b}$$

Linear models similar to the ones above have been previously studied by Eckles et al. (2017); Aronow
and Samii (2017a); Basse et al. (2019a). Given these two competing models of interference (2.2),
the better model can be selected by measuring each model’s ability to explain the variation in the
outcomes as treatment exposure increases in the roll-out. ⋄
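In practice, fitting either model in (2.2) reduces to a linear regression whose features are an intercept, the unit's own treatment, and a neighborhood-exposure count computed from the chosen graph. A small sketch of the feature construction (Python with numpy; the function is our own illustration, not from the paper):

```python
import numpy as np

def interference_features(Z, A):
    """Build the regression features implied by a linear interference model
    like (2.2a): an intercept, the own treatment Z_i^t, and the exposure
    sum_{j in G(i)} Z_j^t induced by the 0/1 adjacency matrix A.

    `Z` has shape (T, N); returns an array of shape (T, N, 3).
    """
    exposure = Z @ A.T  # entry (t, i) counts treated neighbors of unit i at time t
    ones = np.ones_like(Z, dtype=float)
    return np.stack([ones, Z.astype(float), exposure.astype(float)], axis=2)
```

Comparing models (2.2a) and (2.2b) then amounts to running the same regression twice, with feature tensors built from the adjacency matrices of $G_1$ and $G_2$ respectively.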

3 Model Selection
To motivate the model selection problem, we again consider the class of models defined in Example
1. This example captures how a standard linear model can be highly flexible, allowing us to
incorporate a wide range of interference structures. Its richness highlights the need to distinguish
between model instances that are useful in explaining interference and those that are not.
In this section, we propose a model selection mechanism inspired by leave-one-out cross-validation
to choose between models of interference. A unit’s outcome depends on its treatment exposure
which varies as we increase the treatment allocation throughout a roll-out. Our main observation
is that we can test whether the selected interference model, trained on a subset of treatment pe-
riods, is able to extrapolate to different levels of treatment exposure by evaluating its predictive
performance on observations from remaining periods. Hence, we use the mean-squared prediction
error for a given period t,
$$\mathrm{MSPE}_t(\hat{\theta}) := \frac{1}{N} \sum_{i=1}^{N} \big( X_i^t \hat{\theta} - Y_i^t \big)^2 = \frac{1}{N} \sum_{i=1}^{N} \big( \hat{Y}_i^t(\hat{\theta}) - Y_i^t \big)^2, \tag{3.1}$$

where $\hat{Y}_i^t(\hat{\theta}) = X_i^t \hat{\theta}$, $X_i^t \in \mathbb{R}^{1 \times K}$ refers to the row of features for unit $i$ at time $t$, and $\hat{\theta} \in \mathbb{R}^K$ are the estimated parameters. We can relate the MSPE to the estimation error of the TTE. For instance, suppose that we are given data for a new roll-out period, $s$, of the form $(X^s, Y^s)$ where $X^s \in \mathbb{R}^{N \times K}$ and $Y^s \in \mathbb{R}^N$ (cf. Eq. (4.4)). Using the previous roll-out periods to estimate $\hat{\theta}$, we predict $\hat{Y}^s(\hat{\theta}) = X^s \hat{\theta}$. Define the sample covariance matrix as $\Sigma^{(s)} = \frac{1}{N} \sum_{i=1}^{N} X_i^{s\top} X_i^s$. Then,

$$(\widehat{\mathrm{TTE}} - \mathrm{TTE})^2 = \big[ c^\top (\theta^\star - \hat{\theta}) \big]^2 \le \|c\|_2^2 \cdot \frac{\|\hat{\theta} - \theta^\star\|_{\Sigma^{(s)}}^2}{\lambda_{\min}(\Sigma^{(s)})} = \|c\|_2^2 \cdot \frac{\frac{1}{N} \sum_{i=1}^{N} \big( X_i^s \hat{\theta} - Y_i^s + \epsilon_{i,s} \big)^2}{\lambda_{\min}(\Sigma^{(s)})}$$
$$\le \frac{2 \|c\|_2^2}{\lambda_{\min}(\Sigma^{(s)})} \cdot \frac{1}{N} \sum_{i=1}^{N} \Big[ \big( X_i^s \hat{\theta} - Y_i^s \big)^2 + \epsilon_{i,s}^2 \Big] = \frac{2 \|c\|_2^2}{\lambda_{\min}(\Sigma^{(s)})} \cdot \Big( \mathrm{MSPE}_s(\hat{\theta}) + \frac{1}{N} \sum_{i=1}^{N} \epsilon_{i,s}^2 \Big),$$

where $\epsilon_{i,s} = Y_i^s - X_i^s \theta^\star$ and the second inequality follows from convexity. This provides a heuristic argument for using the MSPE criterion in our model selection procedure.
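The chain of inequalities can be sanity-checked numerically. The sketch below draws synthetic data from a linear model and verifies that the squared contrast error is dominated by the MSPE-based bound; all dimensions, parameter values, and the contrast $c$ are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 500, 4
X = rng.normal(size=(N, K))                    # features X^s for a held-out period s
theta_star = np.array([5.0, 2.0, -1.0, 0.5])   # "true" parameters (illustrative)
eps = rng.normal(size=N)
Y = X @ theta_star + eps                        # outcomes under the linear model
theta_hat = theta_star + rng.normal(scale=0.2, size=K)  # a noisy estimate
c = np.ones(K) / K                              # contrast c defining the estimand

Sigma_s = X.T @ X / N                           # sample covariance Sigma^(s)
lam_min = np.linalg.eigvalsh(Sigma_s)[0]        # smallest eigenvalue
mspe = np.mean((X @ theta_hat - Y) ** 2)        # MSPE_s(theta_hat)

lhs = (c @ (theta_star - theta_hat)) ** 2       # squared estimation error
rhs = 2 * (c @ c) / lam_min * (mspe + np.mean(eps ** 2))
assert lhs <= rhs                               # the bound above holds deterministically
```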

Figure 1: Graphical Representation of Leave-One-Period-Out

Algorithm 1 Leave-One-Period-Out Model Selection


1: Input: Data $D = \{\{Y_{it}^{(m)}, X_{it}^{(m)} = [W_{i,t}^{(m)}, Z_{it}, f_i(Z^t)]\}_{i \in [N], t \in [T]}\}_{m \in [M]}$, where $W_{i,t}^{(m)}$ are other model-$m$-specific features
2: for $m \in [M]$ do
3:     for $t \in [T]$ do
4:         Estimate $\hat{\theta}^{(m)}$ using data $D_{-t}^{(m)}$ (excluding period $t$)
5:         Predict $\hat{Y}^t(\hat{\theta}^{(m)}) \leftarrow X^{t,(m)} \hat{\theta}^{(m)}$ using data $D_t^{(m)}$
6:         Store mean squared prediction error: $\mathrm{MSPE}_t(\hat{\theta}^{(m)}) \leftarrow \frac{1}{N} \sum_{i=1}^{N} (\hat{Y}_i^t(\hat{\theta}^{(m)}) - Y_i^t)^2$
7:     end for
8:     $\mathrm{MODEL\_MSPES}_m \leftarrow \mathrm{Average}(\mathrm{MSPE})$
9: end for
10: Return $\arg\min_{m \in [M]} \mathrm{MODEL\_MSPES}_m$

Proposed method: Leave-One-Period-Out When interference is present, outcomes are non-stationary due to the increasing treatment allocation of a roll-out. For example, an estimator that
is fitted on the first period of a roll-out may not extrapolate to the last period. Our proposed
procedure, Leave-One-Period-Out (LOPO), leverages the fact that every period offers an op-
portunity to test an interference model’s ability to extrapolate outcomes from different levels of
treatment exposure. In turn, we leave out each period for testing and estimate parameters $\hat{\theta}$ on all other periods. After predicting outcomes $\hat{Y}$ for every period, we compute the MSPE for each prediction task and output the model that minimizes the average of these MSPEs. This
procedure is visualized in Figure 1 and formalized in Algorithm 1. While this is only a heuristic,
our empirical results in Section 3.1 demonstrate its effectiveness.
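Algorithm 1 admits a compact implementation for linear interference models, in which each candidate model $m$ supplies its own design tensor: parameters are fit by least squares on the pooled held-in periods and scored on the held-out one. This is our own sketch (numpy assumed); feature construction, e.g. the exposure maps $f_i(Z^t)$, is assumed to happen upstream:

```python
import numpy as np

def lopo_select(models):
    """Leave-One-Period-Out model selection (cf. Algorithm 1).

    `models` maps a model name to a pair (X, Y), where X has shape
    (T, N, K_m) and Y has shape (T, N). For each model, each period t is
    held out in turn: theta is fit by least squares on the pooled
    remaining periods and the MSPE is computed on period t. Returns the
    name of the model with the smallest average MSPE, plus all scores.
    """
    avg_mspe = {}
    for name, (X, Y) in models.items():
        T, N, K = X.shape
        mspes = []
        for t in range(T):
            keep = [s for s in range(T) if s != t]
            X_tr = X[keep].reshape(-1, K)      # pool all other periods
            Y_tr = Y[keep].reshape(-1)
            theta = np.linalg.lstsq(X_tr, Y_tr, rcond=None)[0]
            mspes.append(np.mean((X[t] @ theta - Y[t]) ** 2))
        avg_mspe[name] = float(np.mean(mspes))
    return min(avg_mspe, key=avg_mspe.get), avg_mspe
```

On data generated from a simple linear model, the procedure prefers a specification that includes the relevant covariate over one that omits it, mirroring how it distinguishes competing interference graphs.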

Baseline procedures To evaluate our proposed Leave-One-Period-Out (LOPO) procedure,


we compare its performance against reasonable baselines outlined in Table 1, including methods
that incorporate both temporal and network structure and those that do not. The No-Roll-out
procedure considers what happens when the sample size is increased to match that from a roll-
out design, but no temporal variation is generated in the interference structure because treatment
status remains constant. The procedure provides a fair comparison against our proposed LOPO
method in case the gains our method achieves are simply due to the increased effective sample size
from a roll-out. We also include a Pooled K-Fold procedure which pools the data across all
periods and conducts standard K-fold cross-validation. Pooled K-Fold evaluates how well our
proposed methodology works relative to standard cross-validation tools that implicitly assume all

| Method | Overview | Considers Temporal Variation? | Considers Network Structure? |
|---|---|---|---|
| No Roll-out | Simulates additional periods, all with 50% of units treated, then applies Algorithm 1. | No | Yes |
| Pooled K-Fold | K-fold cross-validation over units after pooling all periods together, with K = 10. | No | No |
| Train First | Estimates the model on the first T − 1 periods and evaluates it on period T. | Yes | Yes |
| Train Last | Estimates the model on the last T − 1 periods and evaluates it on the first period. | Yes | Yes |
| LOPO (Proposed) | Applies the procedure in Algorithm 1. | Yes | Yes |

Table 1: Model Selection Mechanisms

data points are exchangeable. Since the exchangeability assumption is violated due to interference,
the LOPO procedure circumvents this problem by exchanging whole periods. Finally, Train
First and Train Last are in the same spirit as the LOPO method, except they only consider the steps t = T and t = 0, respectively, of Algorithm 1. They preserve both network and temporal structures
of the experiment and are less computationally intensive. There are other model selection methods
not mentioned here that are similar to LOPO, e.g., train on the first and last periods and evaluate
on all other periods. We find that in most cases LOPO achieves the best performance, and omit
them from our comparisons.

3.1 Simulation Setup


Continuing from Example 1, we consider a two-sided marketplace between advertisers and auctions.
We anchor our experiments in a previously motivated setting where there are two competing in-
terference graphs that might describe the observed interactions across advertisers (Brennan et al.,
2022). For example, one graph might consider an advertiser’s historical spending for a certain
window of time, whereas another might consider only certain types of spend or a different window
of time. Even if both graphs consider the same bipartite graph to represent interactions between
advertisers and auctions, there are different ways to “fold” this graph into a one-sided interference
network of advertisers with other advertisers, as suggested by Brennan et al. (2022). Each folded
graph leads to different interference neighborhoods, which we define as $G_1(\cdot)$ and $G_2(\cdot)$.
We now generate data according to the true model of interference described by $G_1(\cdot)$:

$$Y_i^t(Z^t) = \alpha^\star + \tau^\star \cdot Z_i^t + \eta_1^\star \cdot \sum_{j \in G_1(i)} Z_j^t + \epsilon_{i,t}. \tag{3.2}$$

Let us define several competing models of interference to predict advertisers’ spend. We want to
test how reliably each model selection method can select the true model. We consider the following
competing models,

$$Y_i^t(Z^t) = \alpha^\star + \tau^\star \cdot Z_i^t + \epsilon_{i,t} \tag{3.3}$$

$$Y_i^t(Z^t) = \alpha^\star + \tau^\star \cdot Z_i^t + \eta_1^\star \cdot \sum_{j \in G_2(i)} Z_j^t + \epsilon_{i,t} \tag{3.4}$$

The model (3.3) assumes no interference and the model (3.4) considers an incorrect interference network defined by $G_2(\cdot)$ as in Example 1.

| Figure | No Roll-out | Pooled K-Fold | Train First | Train Last | LOPO (Proposed) |
|---|---|---|---|---|---|
| Figure 5a | 81.2 (80.4, 82.0) | 15.2 (14.5, 15.9) | 51.6 (50.6, 52.6) | 47.8 (46.8, 48.8) | 12.6 (12.0, 13.3) |
| Figure 5b | 81.4 (80.6, 82.2) | 19.6 (18.8, 20.4) | 45.8 (44.8, 46.8) | 46.2 (45.2, 47.2) | 15.0 (14.3, 15.7) |
| Figure 6a | 21.8 (21.0, 22.6) | 19.0 (18.2, 19.8) | 39.6 (38.6, 40.6) | 44.8 (43.8, 45.8) | 19.0 (18.2, 19.8) |
| Figure 6b | 45.8 (44.8, 46.8) | 11.0 (10.4, 11.6) | 19.8 (19.0, 20.6) | 18.6 (17.8, 19.4) | 19.6 (18.8, 20.4) |

Table 2. Percentage of Models Incorrectly Selected: Each row displays the percentage of incorrect model selections by each procedure for the simulation in the corresponding figure. 95% bootstrapped confidence intervals are displayed in parentheses.
In real-world applications, the effect of interference is typically thought to be smaller than the
direct effect of treatment (Blake and Coey, 2014). To capture this, we set $\tau^\star = 5$, $\eta_1^\star = 2$, and simulate data using $\epsilon_{i,t} \overset{\text{iid}}{\sim} N(0, 1)$. In each experiment, we observe outcome and treatment data
from a 50% completely randomized roll-out (cf. Def. 1), such that 50% of units are treated after
the last period. We fit a linear regression model associated with the selected model to estimate
the total treatment effect. Throughout, we assume T = 5 in each experiment which coincides with
many practical applications with few roll-out periods.
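The setup above can be reproduced with a short script: draw an interference graph for $G_1(\cdot)$, an even 50% completely randomized roll-out, and outcomes from the linear model (3.2) with $\tau^\star = 5$ and $\eta_1^\star = 2$. The Erdos-Renyi graph, the intercept value, and the edge probability are illustrative assumptions on our part, as the text does not pin them down:

```python
import numpy as np

def simulate_rollout_dgp(N=1000, T=5, tau=5.0, eta1=2.0, alpha=1.0,
                         edge_prob=0.1, seed=None):
    """Simulate outcomes from the linear interference model (3.2) under an
    even 50% completely randomized roll-out (10% newly treated per period).

    G1 is drawn as an Erdos-Renyi graph. Returns (Z, Y, A) with treatments
    Z and outcomes Y of shape (T, N) and the adjacency matrix A of shape
    (N, N).
    """
    rng = np.random.default_rng(seed)
    A = (rng.random((N, N)) < edge_prob).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                                     # symmetric, no self-loops
    order = rng.permutation(N)
    Z = np.zeros((T, N))
    for t in range(T):
        Z[t, order[: int(N * 0.1 * (t + 1))]] = 1   # absorbing treatment
    exposure = Z @ A.T                              # sum_{j in G1(i)} Z_j^t
    Y = alpha + tau * Z + eta1 * exposure + rng.normal(size=(T, N))
    return Z, Y, A
```

The resulting `(Z, Y, A)` triple is exactly the input needed to build design tensors for the competing models (3.2)-(3.4) and run the model selection procedures of Table 1.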

3.2 Simulation Results


In the experiments below, we consider several variations for establishing the effectiveness of the
LOPO methodology. We first consider both even and uneven 50% roll-outs. We then consider
introducing additional interference terms to the true model and allowing for individual heterogeneity
and time-varying effects. Lastly, we consider the effect of network sparsity by evaluating the
performance of our model selection mechanisms as the underlying network density increases.
We evaluate each method on two metrics: how often it selects the correct model of interference
and how well it minimizes the estimation error of the TTE regardless of whether it has chosen the
correct model. Table 2 summarizes the percentage of times each model selection procedure selects
the incorrect model. However, since an incorrectly selected model could still yield similar estimates
of the TTE, Figures 2 and 3 present the distribution of the relative absolute percent estimation
error of the TTE over 500 runs.

Even vs. uneven roll-outs. In this first experiment, we consider model selection in the even
vs. uneven roll-out setting. In the even setting, we consider treatment increments of 10%. The
uneven setting considers five treatment periods with proportions, p⃗ = [0.01, 0.09, 0.10, 0.15, 0.15],

8
(a) Even Roll-out (b) Uneven Roll-out

Figure 2. Even vs. Uneven Roll-outs: figures are generated by averaging the results of 500 experi-
ments. Each model is estimated with a sample size of N = 1000 and T = 5 periods. The plots show
the distribution of relative estimation error (%) for the model selection mechanism. The central tick
marks represent the median.

that sum to 0.50. We observe the proposed LOPO procedure performs the best in both the even
and uneven settings, verifying our intuition outlined above. The Pooled K-Fold procedure also
does a good job at minimizing estimation error but performs worse than LOPO on selecting the
correct model, as we observe in rows one and two of Table 2. This may emphasize the importance of
considering the underlying network structure in our selection procedure. In contrast, Train First,
Train Last, and No Roll-out have very different performances across simulation settings and
usually underperform relative to LOPO. LOPO tends to be more robust to changes in the roll-out
schedule, which is useful when the practitioner cannot control the roll-out schedule.

Varying the true interference model We now consider variations to the data-generating
model. Figure 3(a) considers a true potential outcome model where interference comes from both
first-order and second-order neighbors
$$Y_i^t(Z^t) = \alpha^\star + \tau^\star \cdot Z_i^t + \eta_1^\star \cdot \sum_{j \in G_1(i)} Z_j^t + \eta_2^\star \cdot \sum_{k \in G_1^{(2)}(i)} Z_k^t + \epsilon_{i,t}, \tag{3.5}$$

where $G_1^{(2)}(i)$ defines the set of neighbors-of-neighbors of unit $i$ under $G_1(\cdot)$. Figure 3(b) considers
the performance of our model selection methods when adding individual heterogeneity and time-
varying effects to the true potential outcomes model:
$$Y_i^t(Z^t) = \alpha_i^\star + \gamma_t^\star + \tau^\star \cdot Z_i^t + \eta_1^\star \cdot \sum_{j \in G_1(i)} Z_j^t + \epsilon_{i,t} \tag{3.6}$$

(a) Neighbors of Neighbors Interference (b) Individual Heterogeneity and Time Effects

Figure 3. Varying the True Interference Model: Figures are generated by averaging the results
of 500 experiments. Each model is estimated with a sample size of N = 1000 and T = 5 periods.
The coefficient on 2nd-order neighbor interference is η2⋆ = 2. The plots show the distribution of
relative estimation error (%) for the model selection mechanism. The central tick marks represent
the median.

To make the comparison fair, we add the additional terms of the true models in Figure 3 to all
the alternate interference models specified in (3.3)-(3.4). In each of these experiments, we use an
even roll-out with a 10% per period increment.
In Figure 3(a), all procedures tend to perform better in estimation error when adding the second
order neighbors interference term. This is likely because outcomes are now more correlated with
the underlying network since interference now has a larger spillover effect. On the other hand, all
model selection procedures tend to do worse when considering unit-fixed and time-fixed effects.
This is unsurprising since the additional individual and time-varying terms make it difficult to
distinguish between spillover effects and individual and temporal heterogeneity. In Figure 3(b), we
observe both LOPO and Pooled K-Fold perform better in terms of estimation error relative to
the other baselines.

Network sparsity Finally, we consider the performance of our procedure when we vary the
sparsity of the underlying network. As we discuss in Section 4, sparsity can greatly influence the
likelihood that the TTE is identified. Naturally, we expect this parameter to also influence model
selection. For example, we might expect graphs that are very sparse to generate little variation
preventing us from learning how to extrapolate treatment exposures to outcomes. On the other
hand, graphs that are very dense tend to generate collinear data, which complicates parameter
estimation.
In Figure 4, we generate a series of Erdos-Renyi graphs with an increasing probability of any two units having an edge, using each graph to define $G_1(\cdot)$. Figure 4 displays how often each model selection procedure fails to pick out the true model as we increase this probability.

Figure 4. Varying Network Sparsity: Figures are generated by averaging the results of 500 experiments. Each model is estimated with a sample size of N = 1000 and T = 5 periods. 95% bootstrapped confidence sets are displayed by the shaded region. We exclude probabilities of 0 and 1 due to multicollinearity issues.

We observe that
the procedures are not very sensitive to the underlying sparsity of the graph, with relatively flat
selection rates across each graph. Strikingly, the LOPO procedure outperforms every other model
selection procedure in selecting the true model at every point.

Polynomial Models In the previous experiments we have generated potential outcomes based
on linearly additive models. However, the LOPO procedure can be extended to non-linear outcome
models as well. We illustrate this using the polynomial outcome model of Cortez et al. (2022b),
which assumes the data-generating process
Y_i(\vec z) = c_{i,\emptyset} + \sum_{j \in \mathcal{N}(i)} \tilde c_{i,j} z_j + \sum_{l=2}^{\beta} \left( \frac{\sum_{j \in \mathcal{N}(i)} \tilde c_{i,j} z_j}{\sum_{j \in \mathcal{N}(i)} \tilde c_{i,j}} \right)^{l}, \quad (3.7)
where c_{i,\emptyset} \sim U[0, 1], \tilde c_{i,i} \sim U[0, 1], and for i \neq j, \tilde c_{i,j} = v_j |\mathcal{N}(i)| / \sum_{k : (k,j) \in E} |\mathcal{N}(k)| with v_j \sim U[0, r].
Here, r is a parameter that controls the magnitude of indirect effects; in our simulations, we set
r = 2. Like Cortez et al. (2022b) we do not include a noise term so that any error is due to the
misspecification of the model. We define N (i) according to a sparse Erdos-Renyi graph where the
probability of an edge between any two nodes is 0.1.
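To make the simulation concrete, the data-generating process above can be coded directly. The sketch below is our own illustration (the function and variable names are ours, not from Cortez et al. (2022b)), and it assumes each neighborhood N(i) contains the unit itself:

```python
import numpy as np

def polynomial_outcomes(Z, neighbors, beta=2, r=2.0, seed=0):
    """Simulate Y_i(z) under the polynomial model (3.7).

    Z         : length-N binary treatment vector
    neighbors : list of index arrays, neighbors[i] = N(i) (contains i)
    beta      : highest polynomial order
    r         : magnitude of the indirect effects
    """
    rng = np.random.default_rng(seed)
    N = len(Z)
    deg = np.array([len(nb) for nb in neighbors], dtype=float)
    v = rng.uniform(0, r, size=N)
    c0 = rng.uniform(0, 1, size=N)       # baseline effect c_{i,0}
    c_self = rng.uniform(0, 1, size=N)   # direct coefficient c~_{i,i}

    # denominator of c~_{i,j}: sum of |N(k)| over k whose neighborhood contains j
    weight = np.zeros(N)
    for k in range(N):
        weight[neighbors[k]] += deg[k]

    Y = np.empty(N)
    for i in range(N):
        nb = np.asarray(neighbors[i])
        c = np.where(nb == i, c_self[i], v[nb] * deg[i] / weight[nb])
        lin = c @ Z[nb]                   # sum_j c~_{i,j} z_j
        frac = lin / c.sum()              # normalized treated weight in (3.7)
        Y[i] = c0[i] + lin + sum(frac ** l for l in range(2, beta + 1))
    return Y
```

Varying `beta` here generates the family of candidate models that the selection procedure chooses among.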
An important consideration in Cortez et al. (2022b) is the choice of β controlling the number
of higher-order polynomial terms. Cortez et al. (2022b) take a design based perspective where
they choose T = β periods of roll-out based on a known β. Instead, we consider how a researcher
might select β given a T-period roll-out. This complementary perspective is important in many online platforms where roll-outs are frequently implemented independently of the model specification;
researchers are often tasked with evaluating the effects of an intervention ex-post. Figure 5 shows the
results of an experiment where we select β by applying LOPO to four variations of the interference model (3.2), including a model with no interference terms, a second-order term, and a third-order term. β can then be inferred by inspecting how many higher-order terms appear in the selected model.

Figure 5. Selecting the Degree of Polynomial Models: (a) selecting the polynomial order; (b) relative bias. Figures are generated by averaging the results of 100 experiments. In each experiment, the sample size N is given by the x-axis, with T = 4 and an even roll-out. We display bootstrapped 95% confidence intervals of the relative bias. We use the DGP given by Cortez et al. (2022b) in Eq. (3.7) with r = 2 and β = 2.

Notably, LOPO predominantly selects the correct model associated with β = 2, and the rate of accurate selections rises as N grows. After determining the appropriate value of β, researchers can employ the Lagrange interpolation estimation technique with T = β, as elaborated in Section 3 of Cortez et al. (2022b). The results of this procedure are displayed in the second panel of Figure 5, where we find that it yields minimal relative bias that vanishes with the sample size N. Additionally, in Section C, we delve into the impact of model misspecification on the bias of the TTE in the polynomial model.

3.3 Limitations and Extensions


LOPO tends to lead to better model selections and lower absolute error than other methods across
each experimental setting we have considered, especially relative to the No Roll-out procedure.
Our empirical findings align with our theoretical study to come, which quantifies the statistical
efficiency gains due to roll-outs and confirms our conjecture that temporal variations can be used
to better model interference effects. Interestingly, the Pooled K-Fold procedure performs very
well, even though it does not consider the underlying network structure. One explanation may be
that our interference networks induce neighborhoods that are relatively uniform across units. In this
case, K-Fold cross-validation does not risk leaving out highly central units; uniformity effectively
implies the particular network we used to generate outcomes satisfies exchangeability.

While the LOPO procedure is reliable overall, there are instances in which it is outperformed
by other procedures. As with any model selection procedure, we can always construct adversarial
examples in which LOPO will fail. For example, if all the variation in treatment exposure occurs in
the first and last periods, we would expect a procedure that only considers estimation and prediction
on these periods to do very well, while LOPO—which equally weights the middle periods—could perform worse. A more challenging example might be the case of threshold interference. In all of
our models, we have considered roll-outs only to 50%. If there is a thresholding effect whereby
interference only appears at the 51% treated level, then all of our procedures would fail since we
would have no variation in the interference effect to learn from.
There are possible extensions to our suggested procedure. For instance, we could consider a
weighted average of the MSPEs from each fold. Weights could be proportional to the number of
treated observations in the test set, for instance, so that variation in each period is considered.
Another approach might be to consider testing on multiple periods at once, e.g., leaving out all
combinations of two periods, training on the remaining periods, and testing on these two periods.
Such a procedure would maximize the variation we exploit but is computationally very costly for potentially marginal benefit. Still, for any of these extensions, an adversarial example is possible,
and we believe the LOPO procedure is a reasonable choice, being both intuitive and reliable in a
wide range of environments.
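For reference, the unweighted LOPO loop underlying these variants can be sketched in a few lines. This is our own illustration; `models` is any collection of objects exposing `fit`/`predict` methods, an interface we assume here:

```python
import numpy as np

def lopo_select(models, X_by_period, Y_by_period):
    """Leave-One-Period-Out model selection.

    For each candidate model, hold out one period at a time, fit on the
    remaining periods, and record the mean squared prediction error
    (MSPE) on the held-out period. The model with the lowest average
    MSPE across folds is selected.
    """
    T = len(X_by_period)
    avg_mspe = []
    for model in models:
        fold_errors = []
        for t in range(T):
            X_train = np.vstack([X_by_period[s] for s in range(T) if s != t])
            Y_train = np.concatenate([Y_by_period[s] for s in range(T) if s != t])
            model.fit(X_train, Y_train)
            resid = Y_by_period[t] - model.predict(X_by_period[t])
            fold_errors.append(np.mean(resid ** 2))
        avg_mspe.append(np.mean(fold_errors))
    return int(np.argmin(avg_mspe))
```

The weighted variant mentioned above would replace the plain mean over `fold_errors` with a weighted one, e.g. proportional to the number of treated observations in each held-out period.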

4 Model Identification and Estimation


In the previous section, we discussed how a model of interference may be selected using roll-out
experiments. We now take this selected model as given and characterize how roll-outs allow us to
identify causal effects. We begin by providing conditions under which the total treatment effect (2.1)
can be identified under the presence of interference. We demonstrate theoretically and empirically
that roll-outs help satisfy these identification conditions. Furthermore, we prove that when these
conditions are met, roll-outs provide gains in statistical efficiency for estimating the TTE.

4.1 Potential Outcomes Model and Estimation Framework


Causal estimation under interference requires structural assumptions. To ground our study, we
consider the following model class and associated estimator.

Linear additive models We assume potential outcomes are linearly additive in z_i and a known p-dimensional feature vector f_i : \{0,1\}^N \to \mathbb{R}^p:

Y_{it}(\vec z_t) = \alpha_i^\star + \gamma_t^\star + \psi_{g(i,t)}^\star + \tau^\star \cdot z_{i,t} + \eta^{\star\top} f_i(\vec z_t) + \epsilon_{i,t}. \quad (4.1)

The class (4.1) allows flexible modeling of interference effects. It subsumes commonly studied
models such as exposure mapping and two-way fixed effect models (Harshaw et al., 2023), as well
as models from Example 1.

• α⋆ ∈ RN is a vector of unit fixed effects allowing for individual heterogeneity.

• γ ⋆ ∈ RT models time-varying trends through period-fixed effects.

• ψ ⋆ ∈ RG models further unit-period heterogeneity where g : [N ] × [T ] → [G]. To make
estimation tractable, we assume G < N T is small enough such that we never have more
parameters than observations.
• τ ⋆ ∈ R is the direct effect of treatment on each unit.
• η ⋆ ∈ Rp models indirect treatment effects due to interference.
• {ϵi,t }i∈[N ],t∈[T ] are idiosyncratic noise, for which we will consider different regimes.
Letting K = N + T + G + p + 1, define the vector of model parameters

\theta^\star := [\alpha^\star, \gamma^\star, \psi^\star, \tau^\star, \eta^\star] \in \mathbb{R}^K, \quad (4.2)

assuming a normalization where f_i(0) = 0_p for all i. Under the data-generating process (4.1), the total treatment effect (2.1) can be rewritten as the linear combination

\mathrm{TTE} = c^\top \theta^\star \quad \text{where} \quad c = \Bigl[ 0_N, 0_T, 0_G, 1, \bar f(1) := \frac{1}{N} \sum_{i=1}^N f_i(1) \Bigr] \in \mathbb{R}^K. \quad (4.3)

Estimation approach To estimate the TTE under the above linearly additive model (4.1), we turn to simple linear regression estimators. We define our matrix of covariates for a single roll-out period as

X^t = [I_N, \mathbf{1}_t, \mathbf{1}_{g(i,t)}, Z^t, f(Z^t)], \quad (4.4)

where I_N is the N × N identity matrix representing the individual effects, \mathbf{1}_t \in \mathbb{R}^{N \times T} is a matrix indicating if an observation belongs to period t, and \mathbf{1}_{g(i,t)} \in \mathbb{R}^{N \times G} indicates if observation i is in cluster g = 1, \dots, G at period t. Letting X = [X^1, \dots, X^T]^\top and recalling the coefficients c from Eq. (4.3) and the parameters \theta from Eq. (4.2), we study the linear regression estimator

\widehat{\mathrm{TTE}} = c^\top \hat\theta \quad \text{where} \quad \hat\theta = (X^\top X)^{-1} X^\top Y, \quad (4.5)

whenever X^\top X is non-singular, in which case \widehat{\mathrm{TTE}} is a consistent and unbiased estimator. A major benefit of roll-outs is that they increase the likelihood that X^\top X will be non-singular, which is necessary for the total treatment effect to be identifiable and for \widehat{\mathrm{TTE}} to have the aforementioned statistical guarantees.
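As a concrete sketch (our own, not the authors' code), the estimator (4.5) can be assembled as follows. We omit the group effects for brevity and normalize γ₁ = 0 (dropping the first period dummy) so the unit and period dummies are not exactly collinear:

```python
import numpy as np

def tte_estimate(Z, F, Y, f_all_treated):
    """Regression estimate (4.5) of the TTE under model (4.1).

    Z : (T, N) treatment assignments      Y : (T, N) outcomes
    F : (T, N, p) features f_i(z_t)       f_all_treated : (N, p) f_i(1)
    Group effects are omitted, and the first period dummy is dropped
    (normalizing gamma_1 = 0) to avoid exact collinearity between the
    unit and period fixed effects.
    """
    T, N = Z.shape
    blocks = []
    for t in range(T):
        period_fe = np.zeros((N, T - 1))
        if t > 0:
            period_fe[:, t - 1] = 1.0
        blocks.append(np.hstack([np.eye(N), period_fe, Z[t][:, None], F[t]]))
    X = np.vstack(blocks)
    theta = np.linalg.lstsq(X, Y.ravel(), rcond=None)[0]
    # c = [0_N, 0_{T-1}, 1, mean_i f_i(1)] as in Eq. (4.3)
    c = np.concatenate([np.zeros(N + T - 1), [1.0], f_all_treated.mean(axis=0)])
    return float(c @ theta)
```

With group effects, the extra dummy columns would be appended analogously to the period dummies.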

4.2 Model Based Identification


We now consider conditions that allow us to identify the total treatment effect in the finite pop-
ulation setting and show how they are tied to X ⊤ X being invertible. We first consider the case
of a single interference term to build intuition, and then provide a more general condition for the
identification of the TTE. We conclude by showing empirically how roll-outs increase the likelihood
that these identification conditions hold in-sample.
In what follows we consider parametric identification of the TTE in the setting of finite popu-
lations. We say that a parameter is identified in this sense if there exists a consistent estimator for
that parameter where consistency is considered with respect to increasing population sizes. Results
of this nature are derived in Section 4.3. In the case of linear models with exogenous covariates, as
in our setting, a sufficient condition for identification of the parameter vector is non-singularity of
the design matrix, X ⊤ X (see Section 4.2.1 of Wooldridge (2010)). Section 4.2.1 provides sufficient
conditions for the non-singularity condition to hold.

4.2.1 Identification with a Single Interference Term
Proposition 1 below introduces a sufficient condition that ensures X⊤X is invertible when fi(⃗z) ∈ R.
We state the condition in the language of interference networks to illustrate how they relate to roll-
outs and interference. The key idea here is that if we can find some unit in the control group
connected to a treated unit and observe the spillover effect on this individual, then we have enough
information about the interference mechanism to extrapolate to the case of total treatment. Roll-
outs increase the probability that this sufficient condition holds by increasing the proportion of
treated individuals in each period. The proof of this result is given in Section B.1.

Proposition 1. Consider a T > 1 period roll-out under the following linearly additive model:

Y_{it}(\vec z) = \alpha^\star + \tau^\star \cdot z_i + \eta^\star \cdot f_i(\vec z) + \epsilon_{i,t}.

Assume there are t, t' \in [T] and i, j \in [N] with (i, t) \neq (j, t') such that Z_i^t = Z_j^{t'} = 0, f_i(Z^t) \neq 0, and f_i(Z^t) \neq f_j(Z^{t'}). Then X^\top X, as defined in Eq. (4.4), is non-singular.

The condition f_i(Z^t) \neq f_j(Z^{t'}) ensures we observe variation in the interference term so that, as the roll-out progresses, the interference effects vary. The condition f_i(Z^t) \neq 0 further ensures that we observe a spillover effect on an untreated unit. Together, these conditions are sufficient (but not necessary) for identifying the TTE by ensuring the invertibility of the Gram matrix, X^\top X.
Next, we apply Proposition 1 in the context of the interference graph from Example 1.

Corollary 1. Consider the model from Proposition 1 where the interference term is given by the model (2.2a) in Example 1, i.e., f_i(\vec z) = \sum_{j \in G_1(i)} z_j. Assume that the neighbor relation is symmetric, so that i \in G_1(j) implies j \in G_1(i). If, at some time t > 1, there is a treated unit j with an untreated neighbor (i \in G_1(j) with Z_i^t = 0), then X^\top X is non-singular.
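The sufficient condition of Corollary 1 is easy to audit in data. The helper below (our own illustration) scans each roll-out period for an untreated unit with at least one treated neighbor:

```python
import numpy as np

def corollary1_holds(Z, adjacency):
    """Check the sufficient condition of Corollary 1.

    Z         : (T, N) treatment matrix of the roll-out
    adjacency : (N, N) symmetric boolean matrix encoding G1
    Returns True if, at some period after the first, an untreated unit
    has at least one treated neighbor.
    """
    A = adjacency.astype(float)
    for t in range(1, Z.shape[0]):
        treated_neighbors = A @ Z[t]        # count of treated neighbors per unit
        exposed_controls = (Z[t] == 0) & (treated_neighbors > 0)
        if exposed_controls.any():
            return True
    return False
```

When the check fails (for instance, when every unit is treated in the same period), X⊤X may be singular and additional roll-out periods are needed.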

4.2.2 Identification with General Interference


In practice, we are only interested in estimating the TTE, not the entire parameter vector θ in the
model (4.1). Recalling the linear representation (4.3) for the TTE, we now show we can identify
the TTE even when the individual components of θ are not identifiable. Our result shows that the
TTE is identified so long as the linear transformation c that maps θ to the TTE (4.3) lies in the
space spanned by the covariates (4.4). Intuitively, this shows that we can identify the TTE under
general interference patterns whenever the linear transformation (4.3) can be represented by the
observed data. This is particularly useful in the small N and T regime where there may not be
enough variation to compute a typical least squares estimate.

Theorem 2. Under the data-generating model (4.1), recall the linear transformation c that maps
the vector of parameters to the TTE (4.2). If c ∈ span(X ⊤ ), then {c⊤ θ : X ⊤ Xθ = X ⊤ Y } is a
singleton.

To clarify the importance of Theorem 2, which we prove in Section B.3, we consider the following
example where X ⊤ X is singular but the TTE is still well-defined and identifiable via Theorem 2.

Figure 6. A disconnected network defined by G(·) and corresponding feature matrix X defined by
(4.6).

Example 2 (Identifying the TTE when θ is not identified): Consider the linear interference model of Example 1 with no individual heterogeneity, so that \alpha_i = \alpha for all i \in [N]:

Y_{it}(Z^t) = \alpha^\star + \tau^\star \cdot Z_i^t + \eta_1^\star \cdot \sum_{j \in G(i)} Z_j^t + \epsilon_{i,t}. \quad (4.6)

Let N = 3 and define the interference network G(·) by the graph in Figure 6. Suppose that T = 2, such that in the first period t = 1 no individuals are treated, and that in period t = 2 we treat observations A and B, generating the feature matrix X in Figure 6. We wish to estimate the TTE using the correctly specified model (4.6). Notice that X^\top X is singular because columns 2 and 3 of X are linearly dependent. The TTE in this model is given by \tau^\star + \eta_1^\star, implying c = [0, 1, 1]^\top, and c \in \mathrm{span}(X^\top) because X^\top v = c for v = [0, 0, 0, 0, 1, -1]^\top \in \mathbb{R}^6. Theorem 2 shows that the TTE is identifiable; in particular, we can estimate the TTE by looking at the difference in outcomes for unit A or B between periods t = 1 and t = 2. ⋄
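Example 2 can be verified numerically. Under our reading of Figure 6 (units A and B share an edge, C is isolated), the check c ∈ span(X⊤) reduces to a least-squares residual test, and the contrast c⊤θ is the same for every solution of the normal equations:

```python
import numpy as np

# Design matrix of Example 2: columns [1, z_i, sum_{j in G(i)} z_j],
# rows are (unit, period) pairs for A, B, C at t = 1 then t = 2.
X = np.array([
    [1., 0., 0.],   # t=1: all units untreated
    [1., 0., 0.],
    [1., 0., 0.],
    [1., 1., 1.],   # t=2: A treated, its neighbor B treated
    [1., 1., 1.],   # t=2: B treated, its neighbor A treated
    [1., 0., 0.],   # t=2: C untreated and isolated
])
c = np.array([0., 1., 1.])

# X'X is singular: columns 2 and 3 of X coincide.
singular = np.linalg.matrix_rank(X.T @ X) < 3

# Yet c lies in span(X'): the system X'v = c has an exact solution.
v, *_ = np.linalg.lstsq(X.T, c, rcond=None)
in_span = np.allclose(X.T @ v, c)

# Hence c'theta is invariant across least-squares solutions: the
# minimum-norm solution recovers the TTE even though theta* does not.
theta_star = np.array([0.5, 2.0, 1.0])   # hypothetical true [alpha, tau, eta]
Y = X @ theta_star                        # noiseless outcomes
theta_hat = np.linalg.pinv(X) @ Y         # minimum-norm solution
tte_hat = float(c @ theta_hat)            # equals tau* + eta* = 3
```

Note that `theta_hat` differs from `theta_star` (the split between the second and third coefficients is arbitrary), yet the contrast c⊤θ is pinned down exactly, which is the content of Theorem 2.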

Proposition 1 and Theorem 2 consider when it is possible to identify a linear combination of parameters from a linear regression. While it is possible to satisfy the conditions of Proposition 1 and Theorem 2 with variation in interference effects over a single period—sometimes called spatial variation (Aronow and Samii, 2017a)—in many cases, we also need temporal variation to achieve
identification, which roll-outs provide. For example, when we have individual heterogeneity param-
eters, {αi }i∈[N ] , temporal variation in individual responses is required for identification. Roll-outs
should not only help us in identifying individual effects but also enable us to identify interference
effects as well as the total treatment effect. Specifically, roll-outs provide added variation that in-
creases the probability that the conditions such as the ones outlined in Proposition 1 and Theorem
2 hold. Figure 7 provides evidence for this idea in a simulated setting where outcomes are sam-
pled according to (2.2a): as the number of roll-out periods increases, the probability of uniquely
identifying the total treatment effect increases very quickly, even in extremely sparse models with
an Erdos-Renyi parameter of 0.001. As expected, the higher the graph density, the likelier we
are to satisfy the conditions in Proposition 1 and Theorem 2. This is because increasing network
density also increases the probability that any two units are connected and so generally increases
the likelihood that an untreated unit will be connected to a treated unit.

4.3 Estimation of the Total Treatment Effect


In addition to guaranteeing identifiability, temporal variation in the covariates X as measured by
the spectrum of X ⊤ X can also reduce statistical error due to measurement noise. In this section, we

Figure 7. Probability that c ∈ span(X⊤) for an underlying Erdos-Renyi graph with varying edge-formation probabilities. Probabilities are computed over 100 experiments. We display 95% CIs. Over all realizations with a 5% probability of an edge (shown in grey), results are constant; hence there is no visible CI.

quantify how roll-outs provide gains in statistical efficiency under two settings. First, we consider
the case of completely correlated noise so that, in each period, outcomes only change as a function
of the treatment. This is similar to modeling unobserved individual heterogeneity, which can be
fully captured using unit-level fixed effects. Second, and at the other extreme, we consider fully
independent noise across individuals and roll-out periods. This setting arises when there are no
unobserved temporally persistent events. In both settings we expect roll-outs to improve the final
variance bound.
We begin by studying the bias and variance of our estimator under the assumption that our
model selection procedure (cf. Section 3) has correctly chosen an interference model. The unbi-
asedness of the estimator is clear from the randomized design because Z is drawn randomly in each
period independent of ϵ. To study the variance of our estimator, we make the following assumption
on how unobserved noise enters the model.

Assumption A (Time-invariant Individual Idiosyncrasies). Suppose \epsilon_{i,t} = \epsilon_i, where \{\epsilon_i\}_{i=1,\dots,N} are independent, mean-zero, and satisfy \mathbb{E}[\epsilon_i^2] \le \sigma_{\max}^2 < \infty.

Applying this assumption, we can control the mean squared error (MSE) of our estimator \widehat{\mathrm{TTE}}. Since \widehat{\mathrm{TTE}} is unbiased, we have \mathrm{MSE}(\widehat{\mathrm{TTE}}) = \mathbb{E}[(\widehat{\mathrm{TTE}} - \mathrm{TTE})^2] = \mathrm{Var}(\widehat{\mathrm{TTE}}), so we only have to control the variance term. As noise terms across periods are completely correlated under Assumption A, idiosyncrasies may persist indefinitely across periods, and we expect the variance to increase as a function of T. The variance reduction occurs as the population size N grows large, as in the classical linear regression setting. In the result below, this is captured by the geometry of X through λmin(X⊤X), the minimum eigenvalue of X⊤X.

Theorem 3. Under the data-generating model (4.1) and Assumption A, we have

\frac{1}{NT} \mathbb{E} \bigl\| X \hat\theta - X \theta^\star \bigr\|_2^2 \le \frac{4 \sigma_{\max}^2 K}{N},

and

\mathbb{E}[(\widehat{\mathrm{TTE}} - \mathrm{TTE})^2] = \mathbb{E}\bigl[\bigl(c^\top (\hat\theta - \theta^\star)\bigr)^2\bigr] \le 4KT \cdot \|c\|_2^2 \cdot \sigma_{\max}^2 \cdot \mathbb{E}\bigl[\lambda_{\min}(X^\top X)^{-1}\bigr].

It is useful to compare Theorem 3 to the standard linear regression setting with a single period
T = 1, where a similar analysis yields the same bound but without any dependence on T . Com-
paring these two results together, it may seem as though roll-outs increase the variance in the case
of time-invariant noise. However, we also need to consider how λmin (X ⊤ X) scales with N and T .
By way of illustration, consider the case of a complete graph so that the interference term can be deterministically quantified in relation to Z^t. In that setting, we find that the minimum eigenvalue grows linearly in NT, implying that we recover the classical 1/N rate. The following lemma captures this idea that a roll-out allows us to increase our effective data size even with a fixed population size and time-invariant errors. In particular, as we have seen in the previous subsection, roll-outs also tend to increase the likelihood that our X⊤X matrix will be full-rank.

Lemma 1. Consider the same model as in Proposition 1 and a completely randomized roll-out (cf. Definition 1) with allocation vector \vec p. Let Assumption A hold and let f_i be the linear-in-means map f_i(\vec z) = \frac{1}{|G(i)|} \sum_{j \in G(i)} z_j, where G(i) is defined by a complete graph. Then, fixing the sample size at N, there exists M \in \mathbb{R} large enough that for NT > M,

\mathbb{E}[(\widehat{\mathrm{TTE}} - \mathrm{TTE})^2] \le \frac{8 \bar C_1}{N},

where \bar C_1 is a constant that depends on K, \sigma_{\max}^2 from Assumption A, and the allocation vector \vec p.

Lemma 1 illustrates how roll-outs, even when observations are fully correlated across periods, still enable us to obtain the usual 1/N decay of the variance through λmin(X⊤X), the minimum eigenvalue of our Gram matrix. The proof can be found in Section B.5.
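The role of λmin(X⊤X) in Lemma 1 can also be seen numerically: each additional roll-out period adds a positive semidefinite term to the Gram matrix, so its smallest eigenvalue can only grow. The sketch below (our own, under the three-parameter model of Proposition 1 with a complete graph) compares a two-period and an eight-period roll-out:

```python
import numpy as np

def gram_lambda_min(Z):
    """lambda_min(X'X) for the model of Proposition 1 on a complete
    graph, where f_i(z) is the mean treatment among the other units."""
    T, N = Z.shape
    rows = []
    for t in range(T):
        f = (Z[t].sum() - Z[t]) / (N - 1)   # linear-in-means exposure
        rows.append(np.column_stack([np.ones(N), Z[t], f]))
    X = np.vstack(rows)
    return float(np.linalg.eigvalsh(X.T @ X)[0])  # smallest eigenvalue

rng = np.random.default_rng(0)
N, T = 200, 8
order = rng.permutation(N)                  # completely randomized order
# nested roll-out: treated share grows from 1/16 up to 1/2
Z = np.vstack([(order < int(N * (t + 1) / (2 * T))).astype(float)
               for t in range(T)])

lam_short = gram_lambda_min(Z[:2])          # first two periods only
lam_long = gram_lambda_min(Z)               # full eight-period roll-out
```

A larger λmin(X⊤X) directly tightens the variance bound of Theorem 3.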
We now turn to the case where individual noise is independent across periods. As we have noted
earlier, this setting arises when unobserved idiosyncrasies do not persist across several periods. Here
we make the following analogue to Assumption A.

Assumption B (Time-varying Individual Idiosyncrasies). \{\epsilon_{i,t}\}_{i \in [N], t \in [T]} are independent, mean-zero, and satisfy \mathbb{E}[\epsilon_{i,t}^2] \le \sigma_{\max}^2 < \infty.

Because Assumption B requires noise to be independent across time periods, we can achieve tighter control of the variance of our estimator. In particular, roll-outs decrease the MSE at a 1/T rate since idiosyncrasies are fully independent across time. Applying the same analysis as in our derivation of Theorem 3, we have the following result, which we prove in Section B.4.
Theorem 4. Under the data-generating model (4.1) and Assumption B, we have

\frac{1}{NT} \mathbb{E} \bigl\| X \hat\theta - X \theta^\star \bigr\|_2^2 \le \frac{4 \sigma_{\max}^2 K}{NT},

and

\mathbb{E}[(\widehat{\mathrm{TTE}} - \mathrm{TTE})^2] = \mathbb{E}\bigl[\bigl(c^\top (\hat\theta - \theta^\star)\bigr)^2\bigr] \le 4K \cdot \|c\|_2^2 \cdot \sigma_{\max}^2 \cdot \mathbb{E}\bigl[\lambda_{\min}(X^\top X)^{-1}\bigr].
Figure 8. Variance reduction due to roll-out under fixed and i.i.d. errors: (a) time-invariant individual idiosyncrasies; (b) time-varying individual idiosyncrasies. The no roll-out baseline is computed by assuming that we observe a 50% treated population at multiple periods in time with both fixed and random potential outcomes. We display 95% χ2-confidence intervals.

Similar to Lemma 1, in the case of Assumption B the next lemma provides a concrete bound for a complete interference graph. When the noise is fully independent across periods, we gain a reduction in variance as T grows. Hence, in this extreme, roll-outs produce an even faster variance reduction through the geometry of λmin(X⊤X).

Lemma 2. Consider the same model as in Proposition 1 and a completely randomized roll-out (cf. Definition 1) with allocation vector \vec p. Let Assumption B hold and let f_i be the linear-in-means map f_i(\vec z) = \frac{1}{|G(i)|} \sum_{j \in G(i)} z_j, where G(i) is defined by a complete graph. Then, fixing the sample size at N, there exists M \in \mathbb{R} large enough that for NT > M,

\mathbb{E}[(\widehat{\mathrm{TTE}} - \mathrm{TTE})^2] \le \frac{8 \bar C_2}{NT},

where \bar C_2 is a constant that depends on K, \sigma_{\max}^2 from Assumption B, and the allocation vector \vec p.

See Section B.5 for the proof.


Figure 8 shows the implications of Theorems 3 and 4 in both the time-varying and time-invariant
noise settings. In both cases, we see a non-trivial variance reduction relative to the no roll-out case,
which is emblematic of the variance gains from roll-outs quantified in this section.

5 Discussion
In this work, we leverage a universal experimentation design used throughout online platforms,
roll-outs, to model interference effects. We quantify how roll-outs induce temporal variation in
treatment exposure that facilitates the identification and estimation of the total treatment effect.
We propose a model selection procedure to help practitioners model interference and identify the total treatment effect. We conclude the paper by discussing robustness checks that can augment our model selection framework and possible pitfalls practitioners may face when applying our methodology. The heuristics we propose below are intended to help practitioners put these methods into practice.

5.1 Practical Considerations


Robustness checks A natural question that arises from this analysis is whether there are any
robustness checks that can provide evidence that our model selection procedure has chosen the
correct model of interference. Fundamentally, testing if a model is correct is not possible. However,
there are tests we can perform to build evidence that we are fully accounting for the variation in
outcomes caused by interference in our data sample.
The first recommendation we make to practitioners is to include a model without interference
terms in the model selection step. Including such a model in our procedure is equivalent to testing
for interference. If the model selection procedure chooses the no interference model, when there is
a strong prior for interference in the experiment, then this is good evidence that the models that
are being tested are inadequately capturing the effects of treatment exposure.
Another possible test uses the interference testing framework of Han et al. (2022). Their work
considers what gains roll-outs provide when attempting to detect interference. They provide several
permutation tests under a Bernoulli roll-out design that are able to effectively test for the presence
of interference. A key component of these tests is the candidate exposure of each unit, defined in their notation as hi(W−i,k) where, in our notation, Wi,k = Zit. A simple test is to define hi to be the interference term in our setting, that is, to set hi(W−i,k) = fi(Z t), where fi(·) is given by the selected model, and then conduct the proposed multiple experiment test of Han et al. (2022). If the test finds interference to be statistically significant, then this is good
evidence that the selected model of interference is capturing the effect of treatment exposure on
outcomes. While this test may still suffer from misspecification issues, comparing its results to the
permutation test proposed by Han et al. (2022) again can provide strong evidence in favor of the
selected model.
A final approach applies the test proposed by Pouget-Abadie et al. (2017). In this case, after model selection is completed, we compute adjusted outcomes by subtracting the estimated effects of interference from each outcome at every period. Next, we pool our data across all periods and use the
underlying interference network to create clusters of units, allowing us to compute a difference-in-
means estimate of the total treatment effect and a Horvitz-Thompson estimate under a cluster-based
design. We can now compute ∆ as the difference of these estimates and conduct the test proposed
in Pouget-Abadie et al. (2017). If we find that the estimates are similar, then this is again evidence
that our selected model is accurately capturing interference effects.

Other Considerations Since effect sizes are typically small in online platforms, a lack of statistical power may result in an inability to distinguish between similar potential outcome models. In many of the simulations we considered, we observed that different models yield similar estimates of the TTE, somewhat alleviating such concerns. When considering rich interference models, we recommend adding a LASSO penalty when conducting estimation.
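The LASSO recommendation above can be implemented with any standard solver; the self-contained coordinate-descent sketch below (our own illustration, not a library API) shows the idea of zeroing out interference features the data provide no evidence for:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent:
    minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                    # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with its partial residual
            rho = X[:, j] @ (resid + X[:, j] * b[j]) / n
            # soft-thresholding update for coordinate j
            b_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b
```

In our setting, the columns of X would collect the candidate interference features fi(Z t); the penalty shrinks irrelevant terms exactly to zero.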
There are often non-stationarities in interference effects that require time to equilibrate, and
the length of each period in a roll-out is an important design choice. Our procedure relies on the
fact that outcomes are observed after the full interference effects have been experienced. Different
experiments and settings will naturally require different time windows, and previous experiments and domain knowledge should guide these choices. Finally, some practitioners may want to pose
auto-regressive models in their experiments. Unfortunately, auto-regressive models pose challenges
as they introduce complicated interactions with interference terms, time effects, and individual
heterogeneity. While this is a well-known practical issue in the context of two-way fixed effects (Arellano and Bond, 1991), the consequences of auto-regressive terms under interference remain unclear; we leave this as a topic for future work.

5.2 Future Directions


We summarize several future directions of research. First, we have seen how roll-out schedules can
influence our ability to conduct model selection effectively. A theoretical study of model selection
requires a formal language for model misspecification in the presence of interference, which we
leave for future work. A close study of the design-based perspective posed in our work may yield
fruit. While requiring more engineering resources, the experimenter may sometimes be able to
adaptively choose a roll-out schedule that maximizes the information that can be learned from the
experiment before fully launching an intervention. Similarly, she may mitigate the non-stationarity
of interference effects by appropriately choosing periods in a roll-out. A third direction may consider
how to carefully incorporate auto-regressive terms under the presence of interference, which may
prove useful from a modeling perspective. Finally, while we have empirically shown that the LOPO
procedure tends to perform reliably, we mention some possible extensions in Section 3.3.

Acknowledgement We thank Kevin Han, Shuangning Li, Jialiang Mao, and Han Wu for their
thoughtful feedback.

References
Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte carlo evidence
and an application to employment equations. The Review of Economic Studies, 58(2):277–297.

Aronow, P. M. and Samii, C. (2017a). Estimating average causal effects under general interference,
with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912–
1947.

Aronow, P. M. and Samii, C. (2017b). Estimating average causal effects under general interference,
with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912 –
1947.

Athey, S., Eckles, D., and Imbens, G. W. (2018). Exact p-values for network interference. Journal
of the American Statistical Association, 113(521):230–240.

Baird, S., Bohren, J. A., McIntosh, C., and Özler, B. (2018). Optimal design of experiments in the
presence of interference. Review of Economics and Statistics, 100(5):844–860.

Basse, G., Ding, P., Feller, A., and Toulis, P. (2019a). Randomization tests for peer effects in group
formation experiments. arXiv.

Basse, G. W. and Airoldi, E. M. (2018). Model-assisted design of experiments in the presence of network-correlated outcomes. Biometrika, 105(4):849–858.

Basse, G. W., Feller, A., and Toulis, P. (2019b). Randomization tests of causal effects under
interference. Biometrika, 106(2):487–494.

Basse, G. W., Soufiani, H. A., and Lambert, D. (2016). Randomization and the pernicious effects of
limited budgets on auction experiments. In Artificial Intelligence and Statistics, pages 1412–1420.
PMLR.

Biswas, N. and Airoldi, E. M. (2018). Estimating peer-influence effects under homophily: Ran-
domized treatments and insights. In Complex Networks IX: Proceedings of the 9th Conference
on Complex Networks CompleNet 2018 9, pages 323–347. Springer.

Blake, T. and Coey, D. (2014). Why marketplace experimentation is harder than it seems: The role
of test-control interference. In Proceedings of the Fifteenth ACM Conference on Economics and
Computation, EC ’14, page 567–582, New York, NY, USA. Association for Computing Machinery.

Bojinov, I. and Gupta, S. (2022). Online experimentation: Benefits, operational and methodological
challenges, and scaling guide. Harvard Data Science Review, 4(3).

Bojinov, I., Simchi-Levi, D., and Zhao, J. (2022a). Design and analysis of switchback experiments.
Management Science.

Bojinov, I., Simchi-Levi, D., and Zhao, J. (2022b). Design and analysis of switchback experiments.
Management Science.

Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D. I., Marlow, C., Settle, J. E., and Fowler,
J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nat.,
489(7415):295–298.

Brennan, J. R., Mirrokni, V., and Pouget-Abadie, J. (2022). Cluster randomized designs for one-
sided bipartite experiments. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K., editors,
Advances in Neural Information Processing Systems.

Bright, I., Delarue, A., and Lobel, I. (2022). Reducing marketplace interference bias via shadow
prices.

Candogan, O., Chen, C., and Niazadeh, R. (2021). Correlated cluster-based randomized experi-
ments: Robust variance minimization.

Chang, S., Vrabac, D., Leskovec, J., and Ugander, J. (2022). Estimating geographic spillover effects
of covid-19 policies from large-scale mobility networks.

Chin, A. (2019). Regression adjustments for estimating the global treatment effect in experiments
with interference. Journal of Causal Inference, 7(2).

Cortez, M., Eichhorn, M., and Yu, C. L. (2022a). Exploiting neighborhood interference with low
order interactions under unit randomized design.

Cortez, M., Eichhorn, M., and Yu, C. L. (2022b). Graph agnostic estimators with staggered rollout
designs under network interference.

Eckles, D., Karrer, B., and Ugander, J. (2017). Design and analysis of experiments in networks:
Reducing bias from interference. Journal of Causal Inference, 5(1).

Eckles, D., Kizilcec, R. F., and Bakshy, E. (2016). Estimating peer effects in networks with peer
encouragement designs. Proceedings of the National Academy of Sciences, 113(27):7316–7322.

Farias, V. F., Li, A. A., Peng, T., and Zheng, A. (2022). Markovian interference in experiments.

Han, K., Li, S., Mao, J., and Wu, H. (2022). Detecting interference in a/b testing with increasing
allocation.

Harshaw, C., Sävje, F., Eisenstat, D., Mirrokni, V., and Pouget-Abadie, J. (2023). Design and
analysis of bipartite experiments under a linear exposure-response model. Electronic Journal of
Statistics, 17(1):464 – 518.

Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of
the American Statistical Association, 103(482):832–842. PMID: 19081744.

Imbens, G. W. and Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical
sciences. Cambridge University Press.

Johari, R., Li, H., Liskovich, I., and Weintraub, G. (2020). Experimental design in two-sided
platforms: An analysis of bias. arXiv.

Karrer, B., Shi, L., Bhole, M., Goldman, M., Palmer, T., Gelman, C., Konutgan, M., and Sun, F.
(2021). Network experimentation at scale. In Proceedings of the 27th ACM SIGKDD Conference
on Knowledge Discovery & Data Mining, pages 3106–3116.

Kohavi, R., Longbotham, R., Sommerfield, D., and Henne, R. M. (2009). Controlled experiments
on the web: survey and practical guide. Data mining and knowledge discovery, 18(1):140–181.

Leung, M. P. (2019). Causal inference under approximate neighborhood interference.

Pouget-Abadie, J., Mirrokni, V., Parkes, D. C., and Airoldi, E. M. (2018). Optimizing cluster-
based randomized experiments under monotonicity. In Proceedings of the 24th ACM SIGKDD
International Conference on Knowledge Discovery & Data Mining, pages 2090–2099.

Pouget-Abadie, J., Saveski, M., Saint-Jacques, G., Duan, W., Xu, Y., Ghosh, S., and Airoldi, E. M.
(2017). Testing for arbitrary interference on experimentation platforms.

Rigollet, P. and Hütter, J.-C. (2015). High dimensional statistics.

Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the
American Statistical Association, 102(477):191–200.

Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions.
Journal of the American Statistical Association, 100(469):322–331.

Saveski, M., Pouget-Abadie, J., Saint-Jacques, G., Duan, W., Ghosh, S., Xu, Y., and Airoldi, E. M.
(2017). Detecting network effects: Randomizing over randomized experiments. In Proceedings of
the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages
1027–1035.

Sinclair, B., McConnell, M., and Green, D. P. (2012). Detecting spillover effects: Design and
analysis of multilevel experiments. American Journal of Political Science, 56(4):1055–1069.

Sävje, F., Aronow, P. M., and Hudgens, M. G. (2017). Average treatment effects in the presence
of unknown interference.

Ugander, J. and Yin, H. (2020). Randomized graph cluster randomization.

Viviano, D. (2020). Policy design in experiments with unknown interference.

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. The MIT Press.

Xiong, R., Athey, S., Bayati, M., and Imbens, G. (2020). Optimal experimental design for staggered
rollouts. Management Science.

Xu, Y., Chen, N., Fernandez, A., Sinno, O., and Bhasin, A. (2015). From infrastructure to culture:
A/b testing challenges in large scale social networks. In Proceedings of the 21th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 2227–2236.

Xu, Y., Duan, W., and Huang, S. (2018). Sqr: Balancing speed, quality and risk in online ex-
periments. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, pages 895–904.

Yu, C. L., Airoldi, E. M., Borgs, C., and Chayes, J. T. (2022a). Estimating the total treatment
effect in randomized experiments with unknown network structure. Proceedings of the National
Academy of Sciences, 119(44):e2208975119.

Yu, C. L., Airoldi, E. M., Borgs, C., and Chayes, J. T. (2022b). Graph agnostic randomized
experimental design. arXiv.

Yuan, Y., Altenburger, K., and Kooti, F. (2021). Causal network motifs: Identifying heterogeneous
spillover effects in a/b tests. In Proceedings of the Web Conference 2021, pages 3359–3370.

Zigler, C. M. and Papadogeorgou, G. (2021). Bipartite causal inference with interference. Statistical
science: a review journal of the Institute of Mathematical Statistics, 36(1):109.

A Characterizing the Distribution of Roll-out Designs
In this section of the appendix, we analyze two different procedures to implement roll-out designs.
The first design, known as the completely randomized roll-out, is defined in Definition 1 and is
used throughout our paper and by contemporaneous work, e.g., (Cortez et al., 2022b). In each
period t, let St ⊆ [N ] be the set of newly treated units. The probability that a unit i is selected
for treatment is
\[
P[i \in S_t] = \frac{N p_t}{N - \left\lceil N \sum_{j=1}^{t-1} p_j \right\rceil}. \tag{A.1}
\]

Evidently, a completely randomized roll-out is a Markov chain whose state transitions occur according to
\[
Z_i^t = \begin{cases} 1 & \text{if } Z_i^{t-1} = 1 \text{ or } i \in S_t \\ 0 & \text{otherwise.} \end{cases}
\]
With this in hand, we can compute the marginal distribution of Z_i^t.

Lemma 3. The distribution of Z^t under Definition 1 is given by
\[
P[Z_i^t = 1] = \sum_{k=1}^{t} p_k. \tag{A.2}
\]

Proof  Let S_t ⊆ [N] be the set of newly treated units in period t. Then
\[
\begin{aligned}
P[Z_i^t = 1] &= P[Z_i^{t-1} = 1] + P[i \in S_t]\, P[Z_i^{t-1} = 0] \\
&= P[i \in S_1] + P[i \in S_2]\, P[Z_i^1 = 0] + \cdots + P[i \in S_t]\, P[Z_i^{t-1} = 0] \\
&= p_1 + \sum_{k=2}^{t} \frac{N p_k}{N - \left\lceil N \sum_{j=1}^{k-1} p_j \right\rceil} \left(1 - P[Z_i^{k-1} = 1]\right).
\end{aligned}
\]
Conclude
\[
P[Z_i^1 = 1] = p_1, \qquad
P[Z_i^2 = 1] = p_1 + \frac{N p_2}{N(1 - p_1)}(1 - p_1) = p_1 + p_2, \qquad \ldots, \qquad
P[Z_i^t = 1] = \sum_{k=1}^{t} p_k.
\]
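As a numerical sanity check (not part of the original derivation), the marginal in Lemma 3 can be verified by simulation. The function below is an illustrative implementation of Definition 1, treating N·p_t of the still-untreated units in each period.

```python
import numpy as np

def completely_randomized_rollout(N, p, rng):
    """Simulate one roll-out per Definition 1: in period t, N * p[t] of the
    still-untreated units are drawn uniformly at random; treatment is absorbing."""
    T = len(p)
    Z = np.zeros((T, N), dtype=int)
    treated = np.zeros(N, dtype=bool)
    for t in range(T):
        new = rng.choice(np.flatnonzero(~treated),
                         size=int(round(N * p[t])), replace=False)
        treated[new] = True
        Z[t] = treated
    return Z

rng = np.random.default_rng(0)
N, p = 1000, [0.1, 0.2, 0.3]
reps = 2000
freq = np.zeros(len(p))
for _ in range(reps):
    # Track the marginal treatment probability of a fixed unit (unit 0).
    freq += completely_randomized_rollout(N, p, rng)[:, 0] / reps
print(np.round(freq, 2))  # ≈ cumulative sums of p: [0.1, 0.3, 0.6]
```

By symmetry across units, the Monte Carlo frequencies track the cumulative allocation Σ_{k≤t} p_k, as Lemma 3 predicts.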

We now turn to the second roll-out implementation, which is based on Bernoulli trials. This design
appears in Han et al. (2022), and much of the analysis in this paper carries through under this
implementation as well.

Definition 2 (Bernoulli Roll-outs). A T-period Bernoulli roll-out is an increasing set of treatment
assignments {Z^1, ..., Z^T} and treatment proportions p = {p_1, ..., p_T} with \(\sum_{t=1}^T p_t = \bar{P} \le 1\)
such that the distribution of Z follows
\[
Z_i^1 \sim \mathrm{Bernoulli}(p_1) \quad \text{and} \quad
Z_i^t \sim \begin{cases} 1 & \text{if } Z_i^{t-1} = 1 \\ \mathrm{Bernoulli}(p_t) & \text{otherwise.} \end{cases}
\]
Lemma 4. Suppose that Z is a roll-out given by Definition 2. Then Z^t is a Markov chain with
1. \(P[Z_i^t = 0] = \prod_{j=1}^{t} (1 - p_j)\),
2. \(P[Z_i^t = 1] = p_1 + p_2(1 - p_1) + \cdots + p_t \prod_{j=1}^{t-1} (1 - p_j)\).

Proof  To see Property 1, apply the law of total probability and P[Z_i^t = 0 | Z_i^{t-1} = 1] = 0:
\[
P[Z_i^1 = 0] = 1 - p_1, \qquad
P[Z_i^2 = 0] = P[Z_i^2 = 0 \mid Z_i^1 = 0]\, P[Z_i^1 = 0] = (1 - p_2)(1 - p_1), \qquad \ldots, \qquad
P[Z_i^t = 0] = \prod_{j=1}^{t} (1 - p_j).
\]
To see Property 2, we again apply the law of total probability:
\[
\begin{aligned}
P[Z_i^1 = 1] &= p_1 \\
P[Z_i^2 = 1] &= P[Z_i^2 = 1 \mid Z_i^1 = 0]\, P[Z_i^1 = 0] + P[Z_i^2 = 1 \mid Z_i^1 = 1]\, P[Z_i^1 = 1] \\
&= p_2(1 - p_1) + 1 \cdot p_1 = p_1 + p_2(1 - p_1) \\
&\;\;\vdots \\
P[Z_i^t = 1] &= p_1 + p_2(1 - p_1) + \cdots + p_t \prod_{j=1}^{t-1} (1 - p_j).
\end{aligned}
\]
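The two marginals in Lemma 4 can likewise be checked by Monte Carlo. This sketch simulates Definition 2 directly; parameter choices are illustrative.

```python
import numpy as np

def bernoulli_rollout(N, p, rng):
    """Simulate Definition 2: each still-untreated unit is newly treated in
    period t with probability p[t], independently across units; treatment
    is absorbing, so OR-ing new draws leaves treated units treated."""
    T = len(p)
    Z = np.zeros((T, N), dtype=int)
    treated = np.zeros(N, dtype=bool)
    for t in range(T):
        treated |= rng.random(N) < p[t]
        Z[t] = treated
    return Z

rng = np.random.default_rng(1)
p = np.array([0.1, 0.2, 0.3])
Z = bernoulli_rollout(200_000, p, rng)
untreated_share = 1 - Z.mean(axis=1)   # estimates P[Z_i^t = 0]
print(np.round(untreated_share, 3))    # ≈ cumprod(1 - p) = [0.9, 0.72, 0.504]
```

The share of untreated units matches Property 1's running product, and one minus it matches Property 2's telescoping sum.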

B Proof of Identification Results in Section 4


B.1 Proof of Proposition 1
We begin by writing out
\[
X = \begin{bmatrix}
1 & Z_1^1 & f_1(Z^1) \\
\vdots & \vdots & \vdots \\
1 & Z_N^1 & f_N(Z^1) \\
\vdots & \vdots & \vdots \\
1 & Z_1^T & f_1(Z^T) \\
\vdots & \vdots & \vdots \\
1 & Z_N^T & f_N(Z^T)
\end{bmatrix} \in \mathbb{R}^{TN \times 3}.
\]
X^⊤X is non-singular when the columns of X are linearly independent. Suppose by way of contradiction that the columns of X are dependent: there is a nonzero vector λ ∈ R^3 such that
\[
\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 = 0.
\]
Recall from our hypothesis that there are i ∈ [N] and t ∈ [T] with Z_i^t = 0 and f_i(Z^t) ≠ 0.
From the linear dependence of the columns, we have
\[
\lambda_1 + \lambda_3 f_i(Z^t) = 0 \implies \lambda_1 = -\lambda_3 f_i(Z^t). \tag{B.1}
\]
Now, take j ∈ [N] and t′ ∈ [T] such that Z_j^{t′} = 0 and f_i(Z^t) ≠ f_j(Z^{t′}), as assumed in our hypothesis. Linear dependence again implies
\[
\lambda_1 + \lambda_3 f_j(Z^{t′}) = 0 \iff -\lambda_3 f_i(Z^t) + \lambda_3 f_j(Z^{t′}) = 0 \iff \lambda_3 \big(f_j(Z^{t′}) - f_i(Z^t)\big) = 0 \implies \lambda_3 = 0,
\]
where we use f_j(Z^{t′}) ≠ f_i(Z^t) in the final implication. Conclude λ_1 = 0 from Eq. (B.1).
Now, take any k ∈ [N] and q ∈ [T] such that Z_k^q = 1, which is guaranteed to exist by the
definition of the roll-out. Since λ_1 = λ_3 = 0, linear dependence of the columns implies
\[
\lambda_1 + \lambda_2 + \lambda_3 f_k(Z^q) = 0 \implies \lambda_2 = 0.
\]

This is a contradiction since λ ̸= 0.
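To illustrate Proposition 1 concretely, the following toy construction (hypothetical numbers, not from the paper) builds a small roll-out whose linear-in-means exposures satisfy the hypothesis, and confirms that the stacked design [1, Z, f(Z)] has full column rank.

```python
import numpy as np

# Toy roll-out: N = 6 units, T = 3 periods, treatment is absorbing.
N, T = 6, 3
Z = np.zeros((T, N))
Z[0, 0] = 1          # unit 0 treated from period 1
Z[1, :2] = 1         # unit 1 added in period 2
Z[2, :4] = 1         # units 2 and 3 added in period 3

# Linear-in-means exposure on a fully connected network: mean over the others.
f = (Z.sum(axis=1, keepdims=True) - Z) / (N - 1)

# Hypothesis of Proposition 1: an untreated (i, t) with f_i(Z^t) != 0, and two
# untreated pairs with different exposures, e.g. f_5(Z^1) = 0.2 vs f_5(Z^3) = 0.8.
X = np.column_stack([np.ones(N * T), Z.ravel(), f.ravel()])
print(np.linalg.matrix_rank(X.T @ X))  # → 3, so X^T X is nonsingular
```

Removing the variation (e.g., treating all units in period 1) collapses the rank, matching the role of the hypothesis in the proof.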

B.2 Proof of Corollary 1


Since Z_j^t = 1 and i ∈ G_1(j) ⇔ j ∈ G_1(i), unit i's interference term in model (2.2a) is positive:
\(f_i(\vec z) = \sum_{j \in G_1(i)} \vec z_j > 0\). Noting that unit i is untreated (Z_i^t = 0), the hypothesis of Proposition 1 is
satisfied for (i, t) and (i, 1).

B.3 Proof of Theorem 2


To prove this result, we consider two optimization problems that solve for the upper and lower
bounds of the total treatment effect:
\[
\max_{\theta \in \mathbb{R}^k} \; c^\top \theta \;\; \text{s.t.} \;\; X^\top X \theta = X^\top Y,
\qquad
\min_{\theta \in \mathbb{R}^k} \; c^\top \theta \;\; \text{s.t.} \;\; X^\top X \theta = X^\top Y. \tag{B.2}
\]
The dual problems are given by
\[
\min_{p \in \mathbb{R}^k} \; p^\top X^\top Y \;\; \text{s.t.} \;\; p^\top X^\top X = c^\top,
\qquad
\max_{p \in \mathbb{R}^k} \; p^\top X^\top Y \;\; \text{s.t.} \;\; p^\top X^\top X = c^\top. \tag{B.3}
\]
The dual problems are feasible if and only if c lies within span(X^⊤X) = span(X^⊤), where
span(·) refers to the column span. We show that this condition implies that the lower and upper
bounds (B.2) are equal, yielding a unique estimate of the total treatment effect. If X^⊤X is
nonsingular, this is evident. Otherwise, let
\[
\Theta_0 = \{\theta \in \mathbb{R}^K : X^\top X \theta = X^\top Y\}.
\]

Suppose X^⊤X is singular, so that there may exist θ, θ′ ∈ Θ_0 with θ ≠ θ′. Let \((\tilde\theta, \tilde p)\) and \((\underline\theta, \underline p)\)
be the optimal primal-dual pairs for the upper- and lower-bound problems, respectively. From
primal feasibility, \(\tilde\theta, \underline\theta \in \Theta_0\), and from dual feasibility, \(\tilde p, \underline p \in \{p : X^\top X p = c\}\). Strong duality gives
\[
\begin{aligned}
c^\top \tilde\theta = \tilde p^\top X^\top Y
&= \tilde p^\top X^\top X \underline\theta && (\underline\theta \in \Theta_0) \\
&= (X^\top X \tilde p)^\top \underline\theta && (X^\top X \text{ is symmetric}) \\
&= c^\top \underline\theta && (X^\top X \tilde p = c \text{ by dual feasibility}).
\end{aligned}
\]
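The conclusion of Theorem 2 can be illustrated numerically. In the rank-deficient example below (a hypothetical construction), θ is not point-identified, yet c^⊤θ is constant over all solutions of the normal equations whenever c lies in the row span of X.

```python
import numpy as np

rng = np.random.default_rng(2)
# Rank-deficient design: the third column duplicates the second, so X^T X is
# singular and the normal equations have infinitely many solutions.
A = rng.standard_normal((50, 2))
X = np.column_stack([A, A[:, 1]])
Y = X @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.standard_normal(50)

theta0 = np.linalg.pinv(X) @ Y       # minimum-norm solution of X^T X theta = X^T Y
null = np.array([0.0, 1.0, -1.0])    # null space of X (column 2 equals column 3)
theta1 = theta0 + 3.7 * null         # another solution of the normal equations

c = np.array([0.0, 1.0, 1.0])        # c in span(X^T) = span{e1, e2 + e3}
print(np.isclose(c @ theta0, c @ theta1))   # True: c^T theta is identified
d = np.array([0.0, 1.0, 0.0])        # d is NOT in the row span
print(np.isclose(d @ theta0, d @ theta1))   # False: this contrast is not identified
```

Shifting along the null space changes individual coefficients but not the identified contrast, exactly as the primal-dual argument shows.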
B.4 Proof of Theorems 3 and 4
By the definition of the minimum eigenvalue, we have the tautological bound
\[
\big\| \hat\theta - \theta^\star \big\|_2^2 \le \lambda_{\min}\big(X^\top X\big)^{-1} \big\| X\hat\theta - X\theta^\star \big\|_2^2 .
\]
Conditioning on X so that λ_min(X^⊤X) is deterministic, Cauchy-Schwarz gives
\[
\mathbb{E}_X \big|c^\top(\hat\theta - \theta^\star)\big|^2
\le \|c\|_2^2 \, \mathbb{E}_X \big\| \hat\theta - \theta^\star \big\|_2^2
\le \|c\|_2^2 \, \lambda_{\min}\big(X^\top X\big)^{-1} \cdot \mathbb{E}_X \big\| X\hat\theta - X\theta^\star \big\|_2^2 . \tag{B.4}
\]

To bound the final term in the preceding display, we use an adaptation of Rigollet and Hütter
(2015, Theorem 2.2); recall that X ∈ R^{NT×K} and Y = Xθ* + ε.
Lemma 5. Under the linear model of (4.1), suppose either Assumption A or B holds. Then the
least squares estimator θ̂ = (X^⊤X)^{-1}X^⊤Y satisfies
\[
\mathbb{E}_X \big\| X\hat\theta - X\theta^\star \big\|_2^2 \le
\begin{cases}
4\sigma_{\max}^2 KT & \text{under Assumption A} \\
4\sigma_{\max}^2 K & \text{under Assumption B.}
\end{cases} \tag{B.5}
\]

Applying the lemma, we have the desired result.

In the remainder of the section, we prove the bound (B.5). Since θ̂ is the least squares estimator,
we have
\[
\big\| Y - X\hat\theta \big\|_2^2 \le \big\| Y - X\theta^\star \big\|_2^2 = \| \epsilon \|_2^2 .
\]
As we assume a well-specified linear model Y = Xθ* + ε, this implies
\[
\| \epsilon \|_2^2 \ge \big\| Y - X\hat\theta \big\|_2^2 = \big\| X\theta^\star + \epsilon - X\hat\theta \big\|_2^2
= \big\| X\hat\theta - X\theta^\star \big\|_2^2 - 2\epsilon^\top X(\hat\theta - \theta^\star) + \| \epsilon \|_2^2 .
\]
Rearranging terms, we get
\[
\big\| X\hat\theta - X\theta^\star \big\|_2 \le 2\, \frac{\epsilon^\top X(\hat\theta - \theta^\star)}{\big\| X(\hat\theta - \theta^\star) \big\|_2} . \tag{B.6}
\]
Let Φ ∈ R^{NT×K} be an orthonormal basis for the column span of X. Then there exists ν ∈ R^K
such that X(θ̂ − θ*) = Φν. Letting B_K = {u ∈ R^K : ‖u‖_2 ≤ 1} be the unit ball in R^K, conclude
\[
\frac{\epsilon^\top X(\hat\theta - \theta^\star)}{\big\| X(\hat\theta - \theta^\star) \big\|_2}
= \frac{\epsilon^\top \Phi\nu}{\|\Phi\nu\|_2}
= \frac{\epsilon^\top \Phi\nu}{\|\nu\|_2} \quad \text{(by orthonormality of } \Phi\text{)}
\;\le\; \sup_{\mu \in B_K} \epsilon^\top \Phi\mu \quad \text{(since } \nu/\|\nu\|_2 \in B_K\text{)}.
\]

From Equation (B.6), we arrive at
\[
\big\| X\hat\theta - X\theta^\star \big\|_2^2 \le 4 \left( \sup_{\mu \in B_K} \epsilon^\top \Phi\mu \right)^2 . \tag{B.7}
\]
Since the last expression is maximized at \(\mu = \Phi^\top \epsilon / \|\Phi^\top \epsilon\|_2\),
\[
\mathbb{E}_X \left[ \left( \sup_{\mu \in B_K} \epsilon^\top \Phi\mu \right)^2 \right]
= \mathbb{E}_X \left[ \sum_{i=1}^{K} \big[\Phi^\top \epsilon\big]_i^2 \right]
\le K \cdot \max_{i=1,\ldots,K} \mathrm{Var}_X\big([\Phi^\top \epsilon]_i\big).
\]
We now consider the cases of perfectly correlated and fully independent errors separately.

Case for Assumption A (Theorem 3)  Recall that under Assumption A, ε_{i,t} are perfectly
correlated across t. For convenience, we define the vector \(\vec\sigma_i = [\sigma_{i,1}, \ldots, \sigma_{i,T}]^\top \in \mathbb{R}^T\) and let
\(\Sigma_i = \vec\sigma_i \vec\sigma_i^\top\). We begin by noting that
\[
\mathrm{Cov}_X(\Phi^\top \epsilon) = \Phi^\top \mathrm{Cov}(\epsilon)\, \Phi \preceq \lambda_{\max}(\mathrm{Cov}(\epsilon))\, I_{K \times K},
\]
where Assumption A imposes a block-diagonal structure on Cov(ε) such that each block is given
by Σ_i. Letting η = [η_1, ..., η_N]^⊤ ∈ R^{NT} so that η_i ∈ R^T, we bound the variational representation
of the maximum eigenvalue of Cov(ε):
\[
\begin{aligned}
\lambda_{\max}(\mathrm{Cov}(\epsilon))
&= \max_{\|\eta\|_2 = 1} \eta^\top \mathrm{Cov}(\epsilon)\, \eta
= \max_{\|\eta\|_2 = 1} \sum_{i=1}^{N} \eta_i^\top \Sigma_i \eta_i
= \max_{\|\eta\|_2 = 1} \sum_{i=1}^{N} \big(\eta_i^\top \vec\sigma_i\big)^2 \\
&\le \max_{\|\eta\|_2 = 1} \sum_{i=1}^{N} \|\eta_i\|_2^2 \|\vec\sigma_i\|_2^2
\le T\sigma_{\max}^2 \cdot \max_{\|\eta\|_2 = 1} \sum_{i=1}^{N} \|\eta_i\|_2^2
= T \cdot \sigma_{\max}^2 .
\end{aligned}
\]
Noting that \(\mathrm{Cov}_X(\Phi^\top \epsilon) \preceq T \cdot \sigma_{\max}^2 \cdot I_{K \times K}\), conclude
\[
\mathbb{E}_X \big\| X\hat\theta - X\theta^\star \big\|_2^2 \le 4\sigma_{\max}^2 KT.
\]
2

Case for Assumption B (Theorem 4)  Under Assumption B, ε_{i,t} is independent across both i
and t. In this case, Cov(ε) is a diagonal matrix with entries Var(ε_{i,t}), and evidently λ_max(Cov(ε)) ≤
σ²_max. As before, \(\mathrm{Cov}_X(\Phi^\top \epsilon) \preceq \sigma_{\max}^2 \cdot I_{K \times K}\), which implies
\[
\mathbb{E}_X \big\| X\hat\theta - X\theta^\star \big\|_2^2 \le 4\sigma_{\max}^2 K.
\]
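As a quick numerical illustration of Lemma 5 under Assumption B (hypothetical parameters, independent homoskedastic errors), the Monte Carlo estimate of E‖Xθ̂ − Xθ*‖₂² below concentrates at σ²K, well inside the 4σ²_max K bound.

```python
import numpy as np

rng = np.random.default_rng(3)
NT, K, sigma = 400, 3, 0.5
X = rng.standard_normal((NT, K))
theta_star = np.array([1.0, -2.0, 0.5])

errs = []
for _ in range(500):
    eps = sigma * rng.standard_normal(NT)   # independent errors (Assumption B)
    theta_hat = np.linalg.lstsq(X, X @ theta_star + eps, rcond=None)[0]
    errs.append(np.sum((X @ (theta_hat - theta_star)) ** 2))

mc = float(np.mean(errs))          # concentrates near sigma^2 * K = 0.75
print(mc <= 4 * sigma**2 * K)      # True: the Lemma 5 bound holds
```

For Gaussian errors the in-sample prediction risk is exactly σ²K (the trace of the hat matrix), so the factor of 4 in (B.5) is a conservative, assumption-free cushion.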

B.5 Proof of Lemma 1 and 2


Under the model in Proposition 1, X^⊤X can be computed as
\[
X^\top X = \begin{bmatrix}
NT & \sum_{i,t} Z_i^t & \sum_{i,t} f_i(Z^t) \\
\sum_{i,t} Z_i^t & \sum_{i,t} Z_i^t & \sum_{i,t} Z_i^t f_i(Z^t) \\
\sum_{i,t} f_i(Z^t) & \sum_{i,t} Z_i^t f_i(Z^t) & \sum_{i,t} f_i(Z^t)^2
\end{bmatrix},
\]
where Z^t = [Z_1^t, ..., Z_N^t]^⊤, f(Z^t) = [f_1(Z^t), ..., f_N(Z^t)]^⊤, and we use (Z_i^t)² = Z_i^t. Under a completely
randomized design with allocation vector p⃗ and linear-in-means interference, we can fully characterize the
design matrix X. Specifically, the first column is the vector 1 ∈ R^{NT} (i.e., the intercept term), and the
second column is the stacked treatment vector [Z^1, ..., Z^T]^⊤ ∈ R^{NT}. The last column is the interference
term, which can be computed exactly since \(f_i(Z^t) = \frac{1}{|G(i)|} \sum_{j \in G(i)} Z_j^t\) and G is defined by a fully
connected network, so that |G(i)| = N − 1 for every i ∈ [N]. If unit i is treated (Z_i^t = 1) in period t, we have
\[
f_i(Z^t) = \frac{\left(\sum_{k=1}^{t} p_k\right) N - 1}{N - 1}.
\]
Otherwise, if unit i is untreated (Z_i^t = 0) in period t, we have
\[
f_i(Z^t) = \frac{\left(\sum_{k=1}^{t} p_k\right) N}{N - 1}.
\]
From here we can easily compute the elements of the matrix X^⊤X, which are given by
\[
\sum_{i,t} Z_i^t = \sum_{i,t} f_i(Z^t) = N\big(T p_1 + (T-1) p_2 + \cdots + 1 \cdot p_T\big) = N \sum_{t=1}^{T} (T - t + 1)\, p_t,
\]
\[
\sum_{i,t} Z_i^t f_i(Z^t) = \sum_{J=1}^{T} \frac{\left(N \sum_{t=1}^{J} p_t\right) - 1}{N - 1} \left(N \sum_{t=1}^{J} p_t\right), \quad \text{and}
\]
\[
\sum_{i,t} f_i(Z^t)^2 = \sum_{J=1}^{T} \left[ \left( \frac{\left(\sum_{t=1}^{J} p_t\right) N - 1}{N - 1} \right)^2 \left(N \sum_{t=1}^{J} p_t\right)
+ \left( \frac{\left(\sum_{t=1}^{J} p_t\right) N}{N - 1} \right)^2 N\left(1 - \sum_{t=1}^{J} p_t\right) \right].
\]
Plugging these into our matrix X^⊤X, we can now solve the characteristic polynomial det(λI −
X^⊤X) = 0 as N → ∞ and T → ∞ and examine how the eigenvalues scale with NT. Due to the
analytic complexity of the problem, we compute this in Mathematica using the AsymptoticSolve
method, which yields that for NT large enough
\[
\lambda_i \asymp \frac{NT}{C_i(\vec p)} \quad \text{for all } i,
\]
where C_i(⃗p) is a function that depends only on the increment vector ⃗p.

Case for Assumption A (Lemma 1)  Define C̄_1 = K · σ²_max · C_{i*}(⃗p), where i* = argmin_i λ_i and
K is fixed under the model. Under this model, c = [0, 1, 1]^⊤, so ‖c‖²_2 = 2. Applying Theorem 3
and plugging in these values yields the desired result: for some M ∈ R and NT > M,
\[
\mathbb{E}\left[ \big|c^\top(\hat\theta - \theta^\star)\big|^2 \right]
\le 2 \cdot 4\sigma_{\max}^2 KT \cdot \left( \frac{NT}{C_{i^\star}(\vec p)} \right)^{-1} = \frac{8\bar C_1}{N}.
\]

Case for Assumption B (Lemma 2)  Define C̄_2 = K · σ²_max · C_{i*}(⃗p), where i* = argmin_i λ_i and
σ²_max is from Assumption B. Again, c = [0, 1, 1]^⊤, so ‖c‖²_2 = 2. Plugging in these values, and
noting that the factor of T is absent when using Theorem 4, conclude that for some M ∈ R large
enough, whenever NT > M,
\[
\mathbb{E}\left[ \big|c^\top(\hat\theta - \theta^\star)\big|^2 \right]
\le 2 \cdot 4\sigma_{\max}^2 K \cdot \left( \frac{NT}{C_{i^\star}(\vec p)} \right)^{-1} = \frac{8\bar C_2}{NT}.
\]

C Estimation of the TTE under Model Misspecification


In Figure 9, we assess the performance of our estimator based on the potential outcomes model
specified in Cortez et al. (2022b) (Sec. 5, Eq. 4) and reproduced in (3.7). We apply LOPO
to variations of the model in 3.2, including a model with a second-order term. In the following
experiment, we generate data using the model (3.7) with β = 2. We also consider a misspecified
Lagrange interpolation estimator of Cortez et al. (2022b) with β ≠ 2. To facilitate comparison,
we follow Cortez et al. (2022b) and do not include a noise term, so that any error is due to model
misspecification alone.

Figure 9. Relative Bias of Estimators of the TTE: figures are generated by averaging the results
of 100 experiments. In each experiment, the sample size N is given by the x-axis, with T = 4 and
an uneven roll-out. We display bootstrapped 95% confidence intervals of the relative bias. We use
the DGP given by Cortez et al. (2022b) in Eq. (3.7) with r = 2 and β = 2. Note that the correctly
specified PI estimator (with β = 2) has constant zero bias, hence there is no visible CI.

First, we validate that in the well-specified case we reproduce the results of Yu et al. (2022a),
which demonstrate that their estimator has zero bias. As the sample size increases, our misspecified
estimator using LOPO performs similarly to the Lagrange interpolation estimator. This
phenomenon illustrates how linear models can approximate polynomial models. It also underscores
the need to consider model misspecification in practice, as even slight changes in the estimated
interference model can result in large relative biases.

