Identification and estimation of causal effects using non-concurrent controls in platform trials

Michele Santacatterina (corresponding author: [email protected]), Federico Macchiavelli Giron, Xinyi Zhang, and Iván Díaz
This article is based upon work supported by the National Science Foundation under Grant No 2306556.
Division of Biostatistics
Department of Population Health
New York University School of Medicine

New York, NY 10016
Abstract

Platform trials are multi-arm designs that simultaneously evaluate multiple treatments for a single disease within the same overall trial structure. Unlike traditional randomized controlled trials, they allow treatment arms to enter and exit the trial at distinct times while maintaining a control arm throughout. This control arm comprises both concurrent controls, where participants are randomized concurrently to either the treatment or control arm, and non-concurrent controls, who enter the trial when the treatment arm under study is unavailable. While flexible, platform trials introduce the challenge of using non-concurrent controls, raising questions about estimating treatment effects. Specifically, which estimands should be targeted? Under what assumptions can these estimands be identified and estimated? Are there any efficiency gains? In this paper, we discuss issues related to the identification and estimation assumptions of common choices of estimand. We conclude that the most robust strategy to increase efficiency without imposing unwarranted assumptions is to target the concurrent average treatment effect (cATE), the ATE among only concurrent units, using a covariate-adjusted doubly robust estimator. Our studies suggest that, for the purpose of obtaining efficiency gains, collecting important prognostic variables is more important than relying on non-concurrent controls. We also discuss the perils of targeting the ATE, which relies on an untestable extrapolation assumption that will often be invalid. We provide simulations illustrating our points and an application to the ACTT platform trial, resulting in a 20% improvement in precision.


Keywords: adaptive trials; causality; doubly robust; efficiency; estimand

1 Introduction

Platform trials are multi-arm designs that simultaneously evaluate multiple treatments for a single disease within the same overall trial structure (Woodcock & LaVange, 2017; Berry et al., 2015; Park et al., 2022). Unlike traditional randomized controlled trials, they allow treatment arms to enter and exit the trial at distinct times while maintaining a control arm throughout. These trials have been instrumental in assessing the efficacy of treatments across various therapeutic areas (Barker et al., 2009; Foltynie et al., 2023; Wells et al., 2012, among others) and gained traction during the COVID-19 pandemic (Hayward et al., 2021; Angus et al., 2020; Kalil et al., 2021, among others). For instance, the Adaptive COVID-19 Treatment Trial (ACTT) (Kalil et al., 2021) was a platform trial that investigated treatments for hospitalized adult patients with COVID-19 pneumonia. ACTT comprised multiple stages, as depicted in Figure 1. In the initial stage (ACTT-1), the efficacy of remdesivir alone versus placebo was evaluated. Subsequently, in the second stage (ACTT-2), placebo was discontinued, and a new treatment, remdesivir plus baricitinib, was introduced while concurrently randomizing participants to remdesivir alone. Here, the remdesivir alone arm served as a shared arm between the ACTT-1 and ACTT-2 stages. The remdesivir alone arm is termed non-concurrent for remdesivir plus baricitinib during ACTT-1 and concurrent during ACTT-2. In this paper, we adhere to the terminology used in current literature (Bofill Roig et al., 2023; Lee & Wason, 2020) and designate the shared arm as control, irrespective of whether it is a placebo arm, an active control, or an experimental treatment. Thus, we consistently use the terms concurrent and non-concurrent controls regardless of the nature of the shared arm.

Figure 1: Adaptive COVID-19 Treatment Trial (ACTT) schema. Example of concurrent and non-concurrent arm

The central question revolves around the efficient utilization of non-concurrent controls to estimate treatment effects in platform trials. Specifically, what estimands should be targeted to evaluate the causal effect of a treatment versus a shared control? Under what assumptions can these estimands be identified and estimated? Does using non-concurrent controls lead to efficiency gains?

Addressing these questions requires careful consideration of how the timing of entry into the platform trial may introduce bias into the study results, which is referred to as “time drift”, “temporal drift” or “time trend”. Various methods have been proposed to control for it, including test-then-pool approaches (Viele et al., 2014), frequentist and Bayesian regression models (Lee & Wason, 2020; Sridhara et al., 2022; Bofill Roig et al., 2023; Saville et al., 2022), propensity-score-based methods (Yuan et al., 2019; Chen et al., 2020), and other approaches (Han et al., 2017; Collignon et al., 2020; Ibrahim & Chen, 2000; Neuenschwander et al., 2009; Banbeta et al., 2019; Gravestock et al., 2017; Bennett et al., 2021; Hobbs et al., 2011; Normington et al., 2020; Schmidli et al., 2020; Hupf et al., 2021; Jiang et al., 2023).

While these methods provide a statistical way to incorporate non-concurrent controls and control for “temporal drift” bias, they are “model-first”: they first posit a model for the outcome and then reverse-engineer interpretations of the estimated parameters in terms of causal effects. This approach conflicts with the recently advocated estimand framework (FDA, 2021) and the International Council for Harmonisation (ICH) E9(R1) guidance (International Council for Harmonisation, 2017), where the causal target of interest is first identified based solely on scientific discussions, and then the optimal statistical estimation method for that target is deployed. Existing methods for the use of non-concurrent controls lack a formal framework for characterizing causal effects and their identifying conditions, which makes it challenging to interpret the effect estimates from these procedures and to make recommendations regarding clinical practice. These concerns are underscored in recent reviews (Collignon et al., 2022; Koenig et al., 2024) and are discussed in the FDA estimand framework (FDA, 2021) and the ICH E9(R1) guidance (International Council for Harmonisation, 2017).

In this paper, we discuss the use of non-concurrent controls, using the estimand framework to guide our discussion and choices. We propose to target the concurrent average treatment effect of treatment arm $k$, $\mathsf{cATE}(k)$, as an estimand of interest in platform trials. Specifically, $\mathsf{cATE}(k)$ is the marginal average difference in outcomes for individuals who receive treatment $k$ compared to those in the shared control group, among the concurrent population. We then provide assumptions for its non-parametric identification and show that these assumptions are all feasible, in contrast to the assumptions required for identification of the average treatment effect, which are not testable. We develop several estimators for $\mathsf{cATE}(k)$, including outcome regression, inverse probability weighting, and doubly robust estimators. We show that efficiency gains can be obtained by leveraging non-concurrent controls for estimators based on outcome regression under correct model specification. Interestingly, we also show that, when treatment availability is a deterministic function of entry time, there are no asymptotic gains in efficiency from using non-concurrent controls with doubly robust estimators adjusted by time of entry into the trial. However, we show that efficiency gains can be obtained when treatment availability is a stochastic function of entry time.

In randomized trials, efficiency gains may come from multiple sources. For instance, one can attempt to gain efficiency by increasing the sample size, as illustrated by the use of non-concurrent controls. Alternatively, precision may be increased through adjustment for prognostic variables (see Colantuoni & Rosenblum, 2015; Benkeser et al., 2021, among others). Prognostic variables are often incorporated through regression models, which can then be mapped into conditional or marginal effect estimates. However, it is important to remember that outcome models may lead to biased results under certain types of misspecification. It is therefore important to use doubly robust estimators for covariate adjustment. Doubly robust estimators are consistent when either the treatment assignment model or the outcome model is correctly specified, a property we obtain “for free” in platform trials due to randomization. Therefore, a key takeaway when targeting $\mathsf{cATE}(k)$ in platform trials is to use a doubly robust estimator and to prioritize identifying strong prognostic baseline variables rather than relying on non-concurrent controls. When treatment availability is a deterministic function of entry time, non-concurrent controls provide no efficiency benefit for a robust estimator that does not rely on correctly specifying the outcome regression.

Finally, we further highlight the risks of targeting the average treatment effect using data for the entire duration of the trial including the non-concurrent period, due to its dependence on an untestable extrapolation assumption.

2 Notation and setup

For each study participant $i \in \{1, \ldots, n\}$, let $E_i$ denote the (random) entry time, after eligibility screening and consent, of the unit into the study; let $W_i$ denote a set of baseline variables; and let $A_i$ denote the randomized treatment taking values $k = 0, 1, \ldots, K$, where $0$ denotes the control arm and $1, \ldots, K$ denote the treatments of interest. Let $V_{k,i}$ denote an indicator of whether arm $k$ was available at time $E_i$, and define $V_i = (V_{0,i}, \ldots, V_{K,i})$. Let $Y_i$ denote a binary or numerical outcome measured at a fixed time after entry $E_i$.
The observed data are $D = (Z_1, \ldots, Z_n)$, where $Z_i$ represents the data for experimental unit $i$, i.e., $Z_i = (E_i, W_i, V_i, A_i, Y_i) \sim \mathsf{P}$. We define $V_0 = V_1 = \cdots = V_j = 1$ with probability one so that at least $j$ treatments plus control are available at the start of the trial. We also assume the data are ordered by time of study entry, in the sense that $E_1 \leq E_2 \leq \cdots \leq E_n$. Note that $\mathsf{P}(A_i = k \mid V_{k,i} = 0) = 0$ by design.
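The setup above can be made concrete with a small data structure. The following is a minimal Python sketch, not part of the original analysis: the class name `Unit`, the helpers `check_design` and `controls`, and the toy values are all hypothetical. It stores one record $Z_i$ per participant, checks the ordering by entry time and the design constraint $\mathsf{P}(A_i = k \mid V_{k,i} = 0) = 0$, and splits the shared arm into concurrent and non-concurrent controls for a given arm $k$.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Unit:
    """One record Z_i = (E_i, W_i, V_i, A_i, Y_i); field names are illustrative."""
    E: float              # entry time
    W: Tuple[float, ...]  # baseline covariates
    V: Tuple[int, ...]    # V[k] = 1 if arm k is available at time E (arm 0 is control)
    A: int                # randomized arm in {0, ..., K}
    Y: float              # outcome


def check_design(data: List[Unit]) -> None:
    """Raise ValueError if the records violate the constraints stated in the setup."""
    for prev, cur in zip(data, data[1:]):
        if prev.E > cur.E:
            raise ValueError("records must be ordered by entry time")
    for z in data:
        if z.V[z.A] != 1:
            raise ValueError("P(A = k | V_k = 0) = 0 is violated")


def controls(data: List[Unit], k: int, concurrent: bool) -> List[Unit]:
    """Shared-arm units that are concurrent (V_k = 1) or non-concurrent (V_k = 0) for arm k."""
    wanted = 1 if concurrent else 0
    return [z for z in data if z.A == 0 and z.V[k] == wanted]


# Toy data with K = 2: arm 2 opens between the second and third participants.
data = [
    Unit(E=0.1, W=(50.0,), V=(1, 1, 0), A=0, Y=1.0),
    Unit(E=0.4, W=(61.0,), V=(1, 1, 0), A=1, Y=0.0),
    Unit(E=0.7, W=(47.0,), V=(1, 1, 1), A=0, Y=1.0),
    Unit(E=0.9, W=(58.0,), V=(1, 1, 1), A=2, Y=0.0),
]
check_design(data)
n_concurrent = len(controls(data, k=2, concurrent=True))
n_non_concurrent = len(controls(data, k=2, concurrent=False))
```

In this toy dataset the first participant is a non-concurrent control for arm 2 and the third is a concurrent control for it.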

2.1 A structural causal model and associated DAG

To encapsulate the role of entry time and non-concurrent controls in platform trials, we posit a structural causal model, represented by the directed acyclic graph (DAG) (Pearl, 1995) in Figure 2 and, equivalently, by the non-parametric structural equation model in equation (1).

Figure 2: DAG associated with the structural equation model in equation (1). Nodes: $E$, $W$, $A$, $V_k$, $Y$.
$$
\begin{aligned}
E_i &= f_E(U_{E,i}), \\
W_i &= f_W(E_i, U_{W,i}), \\
V_{k,i} &= f_{V_k}(E_i, U_{V_k,i}), \\
A_i &= f_A(V_{k,i}, W_i, U_{A,i}), \\
Y_i &= f_Y(A_i, W_i, E_i, U_{Y,i}).
\end{aligned}
\tag{1}
$$

We now discuss some important features of Model (1). Model (1) allows all variables to depend, directly or through other variables, on entry time $E_i$, and therefore appropriately models temporal drift. It also allows the treatment assignment $A_i$ to depend on the participant's covariates $W_i$, thus accommodating study designs such as stratified randomization (Broglio, 2018). Model (1) also imposes some exclusion restrictions. First, the treatment assignment $A_i$ is not allowed to depend on the entry time $E_i$ other than through treatment availability $V_{k,i}$. In other words, a participant entering the study at time $E_i$ can only be assigned to treatments available at that time, but the randomization probability of a treatment that is available for assignment does not vary over time. Second, the outcome for unit $i$, $Y_i$, is not allowed to depend on the availability of treatments $V_{k,i}$ other than through the treatment actually given to unit $i$, but it is allowed to depend directly on unit $i$'s entry time. Third, the availability of treatments $V_{k,i}$ does not depend on covariates $W_i$.
The last two assumptions are reasonable since the treatments under evaluation do not often depend on trial data. In this paper, we assume that covariates $W_i$ depend on entry time $E_i$, and not the other way around. This assumption is reasonable because, in many platform trials, entry time does not depend on individual-level data. Furthermore, it can be shown that the results presented in the next sections also hold when entry time $E_i$ depends on $W_i$. In Model (1), the functions $f_E$, $f_W$, and $f_Y$ are completely unknown, thus making the model non-parametric, while the treatment assignment function, $f_A$, and the treatment availability function, $f_{V_k}$, are known by design. In addition, the random variables $U_{E,i}$, $U_{W,i}$, and $U_{Y,i}$ are unmeasured factors that impact the entry time, covariates, and outcomes, respectively.
The random variables $U_{A,i}$ control the randomization probabilities and are known by design. The random variables $U_{V_k}$ represent all factors that determine the availability of treatments for subjects in the trial.
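To illustrate how Model (1) generates temporal drift, the following Python sketch draws data from one toy instance of the model with a single treatment arm. Every functional form and constant here is an assumption made for illustration (the functions $f_E$, $f_W$, and $f_Y$ are unknown in the model), and the randomization in this sketch does not use $W_i$, although the model would allow it.

```python
import random


def simulate_unit(rng: random.Random, t_k: float = 0.5) -> dict:
    """Draw one unit from a toy instance of Model (1) with arms {0, 1}."""
    E = rng.random()                                   # E = f_E(U_E)
    W = rng.gauss(0.5 * E, 1.0)                        # W = f_W(E, U_W): covariate drift
    V1 = 1 if E > t_k else 0                           # V_1 = f_V(E): arm 1 opens at t_k
    A = 1 if (V1 == 1 and rng.random() < 0.5) else 0   # A = f_A(V_1, U_A): 1:1 once open
    Y = 1.0 * A + 0.8 * W + 2.0 * E + rng.gauss(0.0, 1.0)  # Y = f_Y(A, W, E, U_Y)
    return {"E": E, "W": W, "V1": V1, "A": A, "Y": Y}


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)
data = sorted((simulate_unit(rng) for _ in range(4000)), key=lambda z: z["E"])

# Temporal drift: control outcomes differ between the two periods even though
# nobody in either group received arm 1.
non_concurrent = mean([z["Y"] for z in data if z["A"] == 0 and z["V1"] == 0])
concurrent = mean([z["Y"] for z in data if z["A"] == 0 and z["V1"] == 1])
drift = concurrent - non_concurrent
```

Because the outcome depends on entry time both directly and through $W$, the mean control outcome is higher in the concurrent period than in the non-concurrent period, so naively pooling controls without adjusting for $E$ and $W$ would distort the comparison with arm 1.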

In the following section, under this model and its associated DAG, we define the concurrent average treatment effect as the causal estimand of interest and introduce its identification assumptions, aligning with the estimand framework advocated by the FDA (FDA, 2021).

3 Definition and identification of the concurrent average treatment effect

In this paper, we focus on endpoints measured at fixed time-points post-randomization. Additionally, we consider an intention-to-treat (ITT) analysis. Our results can be easily extended to binary endpoints. We define the concurrent average treatment effect of treatment $k$ against a shared control arm in terms of counterfactual variables (Pearl, 2010), $Y_i(k) = f_Y(k, W_i, E_i, U_{Y,i})$, where $k = 0, \ldots, K$, that would have been observed in a hypothetical world where treatment $A_i = k$ had been given, i.e., $\mathsf{P}(A_i = k) = 1$. We first define it and then discuss its non-parametric identification.

Definition 1 (Conditional and marginal average treatment effect of treatment arm $k$ compared to shared control among the concurrent population).
$$
\begin{aligned}
\mathsf{cCATE}(k, w, e) &= \mathsf{E}[Y(k) - Y(0) \mid W = w, E = e, V_k = 1], \\
\mathsf{cATE}(k) &= \mathsf{E}[\mathsf{cCATE}(k, W, E) \mid V_k = 1].
\end{aligned}
$$

$\mathsf{cATE}(k)$ is the ITT average treatment effect among concurrent units only, i.e., those with $V_k = 1$. $\mathsf{cCATE}(k, w, e)$ is its conditional version, conditioning on baseline variables $W$ and entry time $E$. We now provide assumptions to identify it.

3.1 Non-parametric identification

Non-parametric identification allows us to express the causal target quantity of interest in terms of the distribution of the observed data without relying on assumptions about the functional form of the distributions (Pearl, 1995). To discuss non-parametric identification of $\mathsf{cATE}(k)$, we introduce the following assumptions:

A1. Weak A-ignorability.

Assume
$$\mathsf{E}[Y(k) \mid W = w, E = e, V_k = 1] = \mathsf{E}[Y(k) \mid A = k, W = w, E = e, V_k = 1].$$

A2. Consistency.

Assume
$$\mathsf{P}(Y(k) \mid A = k, W = w, E = e, V_k = 1) = \mathsf{P}(Y \mid A = k, W = w, E = e, V_k = 1).$$

A3. Positivity of treatment assignment mechanism among concurrent units.

Assume
$\mathsf{P}(A = k \mid W = w, E = e, V_k = 1) > 0$ for all $w$ and $e$ such that $V_k = 1$.

A4. Positivity of shared arm assignment mechanism among all controls.

Assume
$\mathsf{P}(A = 0 \mid W = w, E = e) > 0$ for all $w$ and $e$.

A5. Pooling concurrent and non-concurrent controls.

Assume
$\mathsf{E}(Y \mid A = 0, W = w, E = e, V_k = 1) = \mathsf{E}(Y \mid A = 0, W = w, E = e)$ for all $e$ such that $V_k = 1$.

Assumption A1 is untestable because it involves counterfactuals, which are unobservable. It states that, once we control for $W$, $E$, and $V_k$, the counterfactual outcome under $k$ is mean-independent of the treatment assignment. We expect this to hold by design because of randomization.

Assumption A2 is a standard causal inference assumption stating that, under $V_k = v$ and once we control for $W$ and $E$, the distribution of the observed outcome under $A = k$ is the same as that of the counterfactual outcome $Y(k)$ for all $k$ in $\{0, \ldots, K\}$. This assumption is implied by the structural causal model (1), so we expect it also to hold by design.

Assumption A3 states that, once a treatment arm is available in the trial, all covariate profiles $(w, e)$ have a positive probability of receiving that treatment. Similarly, Assumption A4 states that all covariate profiles $(w, e)$ have a positive probability of being assigned to the shared control arm. (Note that assumption A4 is redundant given assumption A3, but it helps clarify our identification proofs.) These two assumptions hold by design in platform trials.

Assumption A5 states that, once we control for $W$ and $E$, the conditional expectation of the outcome $Y$ under control ($A = 0$) in the pooled concurrent and non-concurrent units ($V_k = 0$ and $V_k = 1$; right-hand side of assumption A5) is the same as that among concurrent units only ($V_k = 1$; left-hand side of assumption A5), for all values $e$ among the concurrent units. In other words, after conditioning on $W$ and $E$, what is learned from the pooled data can be used to predict conditional expectations among concurrent units only. In addition, both quantities depend only on observable data: by design, we have data for the left-hand side of assumption A5 for all $e$ such that $V_k = 1$, and for the right-hand side for all $e$ in the shared arm. Therefore, assumption A5 can be tested, as discussed in our practical guidelines in section 8.
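Because both sides of assumption A5 involve only observed data, they can be compared directly. The Python sketch below is a descriptive check under strong simplifying assumptions and is not the procedure from the practical guidelines: it assumes a binary covariate, three discrete entry periods, and an arm $k$ that opens at random during the middle period, all of which are hypothetical. It computes the mean control outcome in each $(w, e)$ cell, once among concurrent controls only and once among all controls, and reports the largest gap.

```python
import random
from collections import defaultdict


def simulate(n: int, seed: int = 0):
    """Toy trial with three entry periods; arm k opens at random during period 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        e = rng.randrange(3)                       # entry period 0, 1, or 2
        w = rng.randrange(2)                       # binary baseline covariate
        p_open = (0.0, 0.5, 1.0)[e]                # stochastic availability in period 1
        v = 1 if rng.random() < p_open else 0      # V_k
        a = 1 if (v == 1 and rng.random() < 0.5) else 0   # a = 1 codes arm k
        y = 1.0 * a + w + 0.5 * e + rng.gauss(0.0, 1.0)
        rows.append((e, w, v, a, y))
    return rows


def control_cell_means(rows, concurrent_only: bool):
    """Mean control outcome in each (w, e) cell, optionally restricted to V_k = 1."""
    total, count = defaultdict(float), defaultdict(int)
    for e, w, v, a, y in rows:
        if a == 0 and (v == 1 or not concurrent_only):
            total[(w, e)] += y
            count[(w, e)] += 1
    return {cell: total[cell] / count[cell] for cell in total}


rows = simulate(12000)
lhs = control_cell_means(rows, concurrent_only=True)    # left-hand side of A5
rhs = control_cell_means(rows, concurrent_only=False)   # right-hand side of A5
# Compare only cells that contain concurrent units, as A5 requires.
gaps = {cell: lhs[cell] - rhs[cell] for cell in lhs}
max_gap = max(abs(g) for g in gaps.values())
```

In this simulation availability in period 1 is unrelated to the outcome, so A5 holds and the gaps reflect sampling noise only. A formal assessment would also need to account for that noise, for example through standard errors for each gap.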

Remark 1.

Under Model (1) with $V_k$ a deterministic function of $E$, i.e., $V_{k,i} = \mathbb{1}[E_i > t_k]$, where $t_k$ is a positive scalar, assumption A5 always holds at the population level and for the true conditional outcome expectations.

In contrast, when $V_{k,i}$ is a stochastic function of $E$, i.e., $V_{k,i} = f_{V_k}(E_i, U_{V_k,i})$, an assumption is needed because we do not know $U_{V_k,i}$; for example, the unknown error can differ between the two expectations. In addition, while this remark always holds at the population level and for the true conditional outcome expectations, its validity for a given estimator depends on correct model specification. For instance, if the data are non-linear in $W$ and we fit linear models within the pooled dataset and within the concurrent dataset, the two fitted regressions will not be equal, because they capture the projection of the true non-linear expectation onto linear models over different subsets of the range of $W$. This underscores the importance of using non-parametric models for these regressions if data are to be pooled.
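The misspecification issue described in this remark can be reproduced with a few lines of Python. The sketch below is illustrative only: it assumes a single covariate whose distribution shifts from uniform on $[0, 1]$ in the non-concurrent period to uniform on $[1, 2]$ in the concurrent period, and a true control regression $\mathsf{E}[Y \mid W] = W^2$ that is the same in both periods. It then fits a straight line to the concurrent controls and to the pooled controls.

```python
import random


def ols_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)


def draw_controls(n, lo, hi):
    """Controls whose covariate W is uniform on [lo, hi] and E[Y | W] = W**2."""
    ws = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [w ** 2 + rng.gauss(0.0, 0.5) for w in ws]
    return ws, ys


w_nc, y_nc = draw_controls(4000, 0.0, 1.0)   # non-concurrent period
w_c, y_c = draw_controls(4000, 1.0, 2.0)     # concurrent period: W has drifted upward

intercept_c, slope_c = ols_line(w_c, y_c)                  # concurrent controls only
intercept_p, slope_p = ols_line(w_nc + w_c, y_nc + y_c)    # pooled controls
```

Although the true regression function is identical in both periods, the fitted lines differ: for $W$ uniform on $[a, b]$ the best linear approximation to $W^2$ has slope $a + b$, so the concurrent fit has slope near 3 and the pooled fit has slope near 2. A flexible regression that can capture the curvature would not show this discrepancy.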

We now provide an identification theorem, under assumptions A1-A5.

Theorem 1 (Identification of $\mathsf{cATE}$ in platform adaptive trials under Model (1)).

Assume Model (1) and assumptions A1-A3. Then we have:

  1. The parameter $\mathsf{cCATE}(k, w, e)$ is identified as

    $$\mathsf{E}(Y \mid A = k, W = w, E = e, V_k = 1) - \mathsf{E}(Y \mid A = 0, W = w, E = e, V_k = 1). \tag{2}$$

  2. Under assumptions A1-A5, $\mathsf{cCATE}(k, w, e)$ is also identified as

    $$\mathsf{E}(Y \mid A = k, W = w, E = e, V_k = 1) - \mathsf{E}(Y \mid A = 0, W = w, E = e). \tag{3}$$

Furthermore, $\mathsf{cATE}(k)$ is identified by taking the average of the above expressions for $\mathsf{cCATE}(k, W, E)$ over the distribution of $(W, E)$ conditional on $V_k = 1$.

Equivalent expressions based on weighting are provided in the appendix. A comparison between expressions (2) and (3) reveals why researchers have been historically motivated to use non-concurrent controls: they can be useful in estimating the outcome expectation for the controls, therefore potentially reducing the variance of the estimator.

In the next sections, we will show that, when $V_k$ is a deterministic function of $E$, efficiency gains only bear out for plug-in estimators based on parametric regressions, which can be biased if the models are misspecified. Doubly robust estimators, which are always consistent by virtue of randomization, will not benefit from these efficiency gains. However, we will also show that efficiency gains can be obtained when $V_k$ is a stochastic function of $E$.

3.2 On the identification of the average treatment effect

In this paper, we propose targeting $\mathsf{cATE}(k)$ in platform trials. However, many researchers are familiar with another estimand, the average treatment effect, defined as the expected difference between treatment $k$ and control $0$ in the entire trial population ($V_k=1$ and $V_k=0$). In formulas,

Definition 2 (Conditional and marginal average treatment effect of treatment arm $k$ compared to shared control).

$$\begin{aligned}
\mathsf{CATE}(k,w,e) &= \mathsf{E}[Y(k)-Y(0)\mid W=w,E=e]\\
\mathsf{ATE}(k) &= \mathsf{E}[\mathsf{CATE}(k,W,E)],
\end{aligned}$$

where $\mathsf{CATE}(k,w,e)$ is the conditional version of $\mathsf{ATE}(k)$, conditioning on baseline variables $W$ and entry time $E$. The familiarity of $\mathsf{ATE}(k)$ is partly because, in standard randomized controlled trials, the common statistical model used to evaluate treatment effects is a linear model that regresses the outcome on the treatment group and baseline covariates. In such a model, the canonical interpretation of the model coefficient for the treatment aligns with that of $\mathsf{ATE}(k)$. For instance, assuming the following linear model: $\mathsf{E}[Y\mid A,W,E]=g(A,W,E;\alpha)=\eta+\beta A+W\gamma_w+E\gamma_e$, we can show that

$$\begin{aligned}
\mathsf{ATE}(1)=\mathsf{E}[Y(1)-Y(0)] &= \mathsf{E}[\mathsf{E}[Y\mid A=1,W,E]]-\mathsf{E}[\mathsf{E}[Y\mid A=0,W,E]]\\
&= \mathsf{E}[\eta+1\beta+W\gamma_w+E\gamma_e]-\mathsf{E}[\eta+0\beta+W\gamma_w+E\gamma_e]\\
&= \beta.
\end{aligned}$$

We now explain why we believe that targeting $\mathsf{ATE}(k)$ in platform trials is dangerous. To do so, we start by introducing an additional assumption needed to identify $\mathsf{ATE}(k)$ in platform trials:

A6 Extrapolation of outcome mechanism among the treated.

Assume
$\mathsf{E}(Y\mid A=k,W=w,E=e,V_k=1)=\mathsf{E}(Y\mid A=k,W=w,E=e)$ for all $e$.

Assumptions A5 and A6 state that the outcome distributions among controls and treated, respectively, are exchangeable between patients for whom treatment $k$ is available and those for whom it is not, given patients' baseline variables and entry time. The two assumptions are similar in nature in that they assume exchangeability of the outcome mechanism for the control and treatment arms. However, there is a fundamental difference between them that makes identification based on A5 more reliable than identification based on A6. Assumption A5 is testable since it is based on observed data. In addition, as mentioned above, assumption A5 always holds when $V_k$ is a deterministic function of $E$, and its validity for a given estimator depends only on correct model specification. Consequently, whether pooling data in a specific regression algorithm is appropriate can be empirically checked, as shown in our practical guidelines.

On the other hand, assumption A6 is an identification assumption based on unobserved data, since it requires assuming that the conditional outcome expectation observed in patients who could hypothetically be randomized to treatment $k$ can be used to extrapolate to those who could not. In other words, A6 is an extrapolation assumption: it assumes that the expected outcome under treatment $A=k$ at times $E=e$ and baseline variables $W=w$ with no treatment availability ($V_k=0$) can be extrapolated from a model fit on times $E=e$ and baseline variables $W=w$ where the treatment was available. Consequently, assumption A6 cannot be empirically checked.
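A small, noise-free Python sketch may help to see what is at stake when A6 fails. The scenario is hypothetical: one patient per entry time, treatment $k$ available only after time `t`, and a treatment that has an effect only in the later period. The potential-outcome functions `y0` and `y1` and all numeric values are invented for illustration.

```python
from statistics import mean

# Hypothetical, noise-free illustration of how assumption A6 can fail.
# All numbers below are made up for exposition.

t = 5                                  # treatment k is available only after time t
entry_times = list(range(1, 11))       # one patient per entry time e = 1..10

def available(e):
    """V_k = 1{E > t}."""
    return 1 if e > t else 0

def y0(e):
    """Assumed potential outcome under control."""
    return 0.0

def y1(e):
    """Assumed potential outcome under treatment: effective only after t."""
    return 1.0 if e > t else 0.0

concurrent = [e for e in entry_times if available(e)]

# Treated outcomes can only be observed in the concurrent period.
mu1_hat = mean([y1(e) for e in concurrent])     # fitted E(Y | A=k, V_k=1)
mu0_hat = mean([y0(e) for e in entry_times])    # fitted E(Y | A=0), pooled

# Plug-in contrast averaged over concurrent units (targets cATE).
cate_plugin = mean([mu1_hat - mu0_hat for _ in concurrent])
# Same fitted model extrapolated to all units, as A6 would license (targets ATE).
ate_under_a6 = mean([mu1_hat - mu0_hat for _ in entry_times])

true_cate = mean([y1(e) - y0(e) for e in concurrent])
true_ate = mean([y1(e) - y0(e) for e in entry_times])

print(f"cATE: plug-in {cate_plugin:.2f}, truth {true_cate:.2f}")
print(f"ATE:  plug-in under A6 {ate_under_a6:.2f}, truth {true_ate:.2f}")
```

In this constructed example the concurrent contrast matches the true concurrent effect, while the extrapolated contrast overstates the effect in the full trial population, and nothing in the observed data reveals the discrepancy because no treated outcomes exist before time `t`.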

We now state, and show in the appendix, that identification of $\mathsf{ATE}(k)$ depends on the extrapolation assumption A6.

Theorem 2 (Identification of $\mathsf{ATE}(k)$ in platform adaptive trials under Model (1)).

Assume Model (1) and assumptions A1-A6. Then we have that $\mathsf{CATE}(k,w,e)$ is identified as (3). Consequently, $\mathsf{ATE}(k)$ is identified by taking the average of the above expression for $\mathsf{CATE}(k)$ over the marginal distribution of $(W,E)$.

In summary, unlike $\mathsf{cATE}(k)$, $\mathsf{ATE}(k)$ depends on an extrapolation assumption (A6), which can be risky. First, this assumption cannot be tested. Second, it is often unrealistic for novel diseases with a rapidly changing pathology and clinical landscape.

4 Relation to analytical approaches common in the literature

Regression models are often used to estimate $\mathsf{E}(Y\mid A=k,E=e,V_k=1)$ or $\mathsf{E}(Y\mid A=k,W=w,E=e,V_k=1)$ and then used to extrapolate to units where $V_k=0$, thus targeting $\mathsf{ATE}(k)$ as discussed in Section 3.2. Inferences are then made using the regression coefficient related to the treatment, whether within the frequentist (Lee & Wason, 2020; Bofill Roig, Krotka, Burman, Glimm, Gold & Hees, 2022) or Bayesian framework (Saville et al., 2022; Bofill Roig, König, Meyer & Posch, 2022; Ibrahim & Chen, 2000).

Matching techniques have been proposed to estimate the average treatment effect among the treated, $\mathsf{E}[Y(k)-Y(0)\mid A=k]$. The idea is to balance covariates $W$ between concurrent and non-concurrent controls by using, for instance, a matching algorithm based on the propensity score (Yuan et al., 2019).

Bayesian methods have been proposed to include non-concurrent controls. The idea is to learn a prior for the parameter of interest using non-concurrent controls only. Then, this prior is combined with the concurrent control data via Bayes' theorem. Meta-analytic priors (Schmidli et al., 2014) or elastic priors (Jiang et al., 2023) have been proposed. These methods rely on an exchangeability assumption for the control parameters, which relates to A5. Other Bayesian methods have been proposed (Neuenschwander et al., 2009; Bennett et al., 2021; Wei et al., 2024, among others). These methods, however, do not allow for the use of baseline covariates $W$, and it is not clear what estimands they target.

In this paper, to estimate $\mathsf{cATE}(k)$, we propose estimators based on outcome regression (OR) and inverse-probability-weighting (IPW), and doubly robust estimators.

5 Estimation of $\mathsf{cATE}(k)$

To build intuition, we start by introducing outcome regression (OR) and inverse probability weighting (IPW) estimators for $\mathsf{cATE}(k)$, considering $V_k$ a deterministic function of $E$. Since OR and IPW estimators are not robust to model misspecification, we then propose doubly robust (DR) estimators. To simplify notation, the following sections assume there are only two treatment arms, $k=1$ and $k=0$. Furthermore, we assume that $V_1=\mathds{1}\{E>t\}$ for some time $t$ such that treatment $A=1$ is only available for patients who entered the trial after time $t$.

5.1 Estimators based on parametric outcome regression

Based on the identification results presented in Theorem 1, eq. (2), the conditional mean $\mathsf{E}(Y\mid A=a,W=w,E=e,V_1=1)$, where $a\in\{0,1\}$, can be modelled as

$$\mathsf{E}(Y\mid A=a,W=w,E=e,V_1=1)=\mu_{\text{oc}}(a,w,e,1;\beta_a),$$

where (oc) stands for only-concurrent. Based on the identification results presented in Theorem 1, eq. (3), the conditional mean $\mathsf{E}(Y\mid A=0,W=w,E=e)$ can be modelled as

$$\mathsf{E}(Y\mid A=0,W=w,E=e)=\mu_{\text{all}}(0,w,e;\alpha_0).$$

Estimates of $\beta_a$ and $\alpha_0$ can then be obtained by maximum likelihood estimation (e.g., ordinary least squares): only among concurrent units ($V_k=1$) for $\mu_{\text{oc}}(a,w,e,1;\beta_a)$, and among all concurrent and non-concurrent controls, i.e., among $A=0$ only, for $\mu_{\text{all}}(0,w,e;\alpha_0)$. Let $\hat{\beta}_a$ and $\hat{\alpha}_0$ denote consistent estimators of $\beta_a$ and $\alpha_0$, respectively. We then propose

$$\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}=\frac{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}\,\mu_{\text{oc}}(1,w_i,e_i,1;\hat{\beta}_1)}{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}}-\frac{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}\,\mu_{\text{oc}}(0,w_i,e_i,1;\hat{\beta}_0)}{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}},$$

as an outcome regression estimator for $\mathsf{cATE}(k)$. Under Theorem 1, eq. (3), we propose the alternative outcome regression estimator for $\mathsf{cATE}(k)$,

$$\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}=\frac{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}\,\mu_{\text{oc}}(1,w_i,e_i,1;\hat{\beta}_1)}{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}}-\frac{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}\,\mu_{\text{all}}(0,w_i,e_i;\hat{\alpha}_0)}{\sum_{i=1}^{n}\mathds{1}\{V_{k,i}=1\}}.$$
Large sample properties.

We derived the asymptotic properties of $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}$ and $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}$ using the approach of M-estimation (Boos & Stefanski, 2013, Chapter 7). Under regularity conditions (Boos & Stefanski, 2013, Section 7.2), $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}$ and $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}$ are consistent and asymptotically Normal, with asymptotic variance derived in the appendix.
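The two outcome-regression estimators of this section can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the data are noise-free and generated from a made-up linear model with treatment effect 2, the working models are linear in $(w,e)$, and the helpers `fit_ols`, `predict`, and `fit_subset` are hypothetical names introduced here rather than code from the paper.

```python
# Minimal sketch of the two outcome-regression (plug-in) estimators.
# Data, coefficients, and helper names are hypothetical.

def fit_ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(p):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]

def predict(coef, w, e):
    """Fitted value of a working model that is linear in (1, w, e)."""
    return coef[0] + coef[1] * w + coef[2] * e

# Hypothetical noise-free trial: arm 1 is available only after time t = 6.
t = 6
data = []
for i in range(24):
    e = i // 2 + 1                          # entry times 1..12
    w = (7 * i) % 5                         # a baseline covariate
    v = 1 if e > t else 0                   # V_1 = 1{E > t}
    a = 1 if (v == 1 and i % 2 == 1) else 0
    y = 1.0 + 2.0 * a + 0.5 * w + 0.1 * e   # assumed linear truth, effect = 2
    data.append({"w": w, "e": e, "v": v, "a": a, "y": y})

def fit_subset(keep):
    """Fit the linear working model on the units selected by keep."""
    sub = [d for d in data if keep(d)]
    return fit_ols([[1.0, d["w"], d["e"]] for d in sub], [d["y"] for d in sub])

beta1 = fit_subset(lambda d: d["a"] == 1 and d["v"] == 1)   # treated, concurrent
beta0 = fit_subset(lambda d: d["a"] == 0 and d["v"] == 1)   # concurrent controls
alpha0 = fit_subset(lambda d: d["a"] == 0)                  # all controls

# Average the fitted contrasts over concurrent units only.
concurrent = [d for d in data if d["v"] == 1]
n_c = len(concurrent)
mean_mu1 = sum(predict(beta1, d["w"], d["e"]) for d in concurrent) / n_c
cate_or_oc = mean_mu1 - sum(predict(beta0, d["w"], d["e"]) for d in concurrent) / n_c
cate_or_all = mean_mu1 - sum(predict(alpha0, d["w"], d["e"]) for d in concurrent) / n_c

print(f"OR, concurrent controls only: {cate_or_oc:.4f}")
print(f"OR, all controls:             {cate_or_all:.4f}")
```

Because the assumed truth is linear and the working models are correctly specified, both estimators recover the same value here; the two would be expected to differ once the pooled control model is misspecified.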

5.2 An estimator based on parametric inverse probability weighting

Following standard procedures, we model the conditional probability of treatment assignment given $W$ and $E$ among only $V_k=1$ by using a logistic regression model,

$$\mathsf{E}(\mathds{1}\{A=1\}\mid W=w,E=e,V_k=1)=\pi_{\text{oc}}(w,e,1;\eta)=\frac{\exp(\eta^{T}x)}{1+\exp(\eta^{T}x)},$$

where $x=(w,e,1)$. We obtain an estimate of $\eta$ by using maximum likelihood estimation, only among concurrent units ($V_k=1$). We then propose

$$\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}=\frac{\sum_{i=1}^{n}\gamma^{1}_{i}\mathds{1}\{V_{k,i}=1\}y_i}{\sum_{i=1}^{n}\gamma^{1}_{i}}-\frac{\sum_{i=1}^{n}\gamma^{0}_{i}\mathds{1}\{V_{k,i}=1\}y_i}{\sum_{i=1}^{n}\gamma^{0}_{i}},$$

where $\gamma^{0}_{i}=\mathds{1}\{A_i=0\}/(1-\hat{\pi}_{\text{oc}})$, $\gamma^{1}_{i}=\mathds{1}\{A_i=1\}/\hat{\pi}_{\text{oc}}$, and, for clarity, we write $\hat{\pi}_{\text{oc}}=\pi_{\text{oc}}(w,e,1;\hat{\eta})$.

Large sample properties.

We derived the asymptotic properties of $\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}$ using the approach of M-estimation (Boos & Stefanski, 2013, Chapter 7). Under regularity conditions (Boos & Stefanski, 2013, Section 7.2), $\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}$ is consistent and asymptotically Normal, with asymptotic variance derived in the appendix.
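As a rough illustration of the weighting idea in this section, the following Python sketch computes a normalized IPW contrast among concurrent units. It departs from the text in two labeled ways: the logistic regression is replaced by a saturated propensity model for a single binary covariate (the treated share within each stratum), and both the numerators and the normalizing sums are restricted to concurrent units. The records and the helper names `propensity` and `weighted_mean` are hypothetical.

```python
# Minimal sketch of a normalized (Hajek-type) IPW estimator restricted to
# concurrent units. Data and the saturated propensity model are hypothetical.

# Each record: (w, v, a, y) with binary covariate w, availability v, arm a.
# Assumed noise-free outcomes y = 2*a + w, so the true effect is 2.
records = (
    # concurrent period (v = 1), with a chance imbalance of w across arms
    [(0, 1, 1, 2.0)] + [(0, 1, 0, 0.0)] * 3
    + [(1, 1, 1, 3.0)] * 3 + [(1, 1, 0, 1.0)]
    # non-concurrent controls (v = 0): not used by this estimator
    + [(0, 0, 0, 0.0)] * 2 + [(1, 0, 0, 1.0)] * 2
)

concurrent = [r for r in records if r[1] == 1]

def propensity(w):
    """Share treated among concurrent units with W = w (saturated model)."""
    stratum = [r for r in concurrent if r[0] == w]
    return sum(r[2] for r in stratum) / len(stratum)

def weighted_mean(arm):
    """Inverse-probability-weighted mean outcome for one arm."""
    num = den = 0.0
    for w, v, a, y in concurrent:
        if a != arm:
            continue
        p = propensity(w)
        weight = 1.0 / p if arm == 1 else 1.0 / (1.0 - p)
        num += weight * y
        den += weight
    return num / den

cate_ipw_oc = weighted_mean(1) - weighted_mean(0)

# Unadjusted difference in means among concurrent units, for comparison.
treated_y = [y for _, _, a, y in concurrent if a == 1]
control_y = [y for _, _, a, y in concurrent if a == 0]
unadjusted = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)

print(f"IPW (concurrent only): {cate_ipw_oc:.4f}")
print(f"unadjusted difference: {unadjusted:.4f}")
```

In this constructed dataset the weights undo the covariate imbalance between arms, so the weighted contrast differs from the unadjusted difference in means.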

5.3 Doubly robust estimators

Doubly robust (DR) estimators for average treatment effects combine outcome regression and IPW, and provide consistent estimates if either of the two models is correctly specified. To derive DR estimators of $\mathsf{cATE}(k)$, we follow the standard practice of constructing them based on efficient influence functions (EIFs) (Bickel et al., 1993; Fisher & Kennedy, 2021; Hines et al., 2022; Kennedy, 2022). Influence functions are a core component of classical statistical theory. They aid in constructing estimators with desirable properties such as double robustness, asymptotic normality, and fast rates of convergence. Additionally, they enable the incorporation of machine learning algorithms while preserving valid statistical inferences and providing insights into statistical efficiency, i.e., the best achievable performance for estimating an estimand. We provide efficiency considerations for the proposed estimators in Section 6. The next theorem provides these EIFs.

Theorem 3.

The efficient influence function, $\varphi(Z,\mathsf{cATE}(k))$, for $\mathsf{cATE}(k)$ under Model (1), is equal to

$$\begin{aligned}
\frac{\mathds{1}\{V_k=1\}}{\mathsf{P}(V_k=1)}\bigg[&\frac{2A-1}{\mathsf{P}(A\mid W,E,V_k=1)}\{Y-\mathsf{E}(Y\mid A,W,E,V_k=1)\}\\
&+\mathsf{E}(Y\mid A=1,W,E,V_k=1)-\mathsf{E}(Y\mid A=0,W,E,V_k=1)-\mathsf{cATE}(k)\bigg].
\end{aligned} \tag{4}$$

The efficient influence function, $\varphi(Z,\mathsf{cATE}(k))$, for $\mathsf{cATE}(k)$ under Model (1) and assuming A5, is equal to

πŸ™β’{Vk=1}𝖯⁒(Vk=1)⁒[A𝖯⁒(A∣Vk=1,W,E)⁒{Yβˆ’π–€β’(Y∣A,W,E,Vk=1)}]βˆ’1βˆ’A𝖯⁒(A∣W,E)⁒𝖯⁒(Vk=1∣E,W)𝖯⁒(Vk=1)⁒{Yβˆ’π–€β’(Y∣A,E,W)}+πŸ™β’{Vk=1}𝖯⁒(Vk=1)⁒[𝖀⁒(Y∣A=1,W,E,Vk=1)βˆ’π–€β’(Y∣A=0,W,E)]βˆ’π–Όπ– π–³π–€β’(k).1subscriptπ‘‰π‘˜1𝖯subscriptπ‘‰π‘˜1delimited-[]𝐴𝖯conditional𝐴subscriptπ‘‰π‘˜1π‘ŠπΈπ‘Œπ–€conditionalπ‘Œπ΄π‘ŠπΈsubscriptπ‘‰π‘˜11𝐴𝖯conditionalπ΄π‘ŠπΈπ–―subscriptπ‘‰π‘˜conditional1πΈπ‘Šπ–―subscriptπ‘‰π‘˜1π‘Œπ–€conditionalπ‘Œπ΄πΈπ‘Š1subscriptπ‘‰π‘˜1𝖯subscriptπ‘‰π‘˜1delimited-[]𝖀formulae-sequenceconditionalπ‘Œπ΄1π‘ŠπΈsubscriptπ‘‰π‘˜1𝖀conditionalπ‘Œπ΄0π‘ŠπΈπ–Όπ– π–³π–€π‘˜\frac{\mathds{1}{\{V_{k}=1\}}}{\mathsf{P}(V_{k}=1)}\bigg{[}\frac{A}{\mathsf{P}% (A\mid V_{k}=1,W,E)}\{Y-\mathsf{E}(Y\mid A,W,E,V_{k}=1)\}\bigg{]}-\\ \frac{1-A}{\mathsf{P}(A\mid W,E)}\frac{\mathsf{P}(V_{k}=1\mid E,W)}{\mathsf{P}% (V_{k}=1)}\{Y-\mathsf{E}(Y\mid A,E,W)\}+\\ \frac{\mathds{1}{\{V_{k}=1\}}}{\mathsf{P}(V_{k}=1)}\Big{[}\mathsf{E}(Y\mid A=1% ,W,E,V_{k}=1)-\mathsf{E}(Y\mid A=0,W,E)\Big{]}-\mathsf{cATE}(k).start_ROW start_CELL divide start_ARG blackboard_1 { italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 } end_ARG start_ARG sansserif_P ( italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 ) end_ARG [ divide start_ARG italic_A end_ARG start_ARG sansserif_P ( italic_A ∣ italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 , italic_W , italic_E ) end_ARG { italic_Y - sansserif_E ( italic_Y ∣ italic_A , italic_W , italic_E , italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 ) } ] - end_CELL end_ROW start_ROW start_CELL divide start_ARG 1 - italic_A end_ARG start_ARG sansserif_P ( italic_A ∣ italic_W , italic_E ) end_ARG divide start_ARG sansserif_P ( italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 ∣ italic_E , italic_W ) end_ARG start_ARG sansserif_P ( italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 ) end_ARG { italic_Y - sansserif_E ( italic_Y ∣ italic_A , italic_E 
, italic_W ) } + end_CELL end_ROW start_ROW start_CELL divide start_ARG blackboard_1 { italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 } end_ARG start_ARG sansserif_P ( italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 ) end_ARG [ sansserif_E ( italic_Y ∣ italic_A = 1 , italic_W , italic_E , italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 ) - sansserif_E ( italic_Y ∣ italic_A = 0 , italic_W , italic_E ) ] - sansserif_cATE ( italic_k ) . end_CELL end_ROW (5)

These influence functions suggest the following estimators,

\[
\begin{aligned}
\hat{\mathsf{cATE}}_{\text{DR}}^{\text{oc}}=\frac{1}{n}\sum_{i=1}^{n}\bigg(&\frac{\mathds{1}\{V_{k,i}=1\}}{n^{-1}\sum_{j=1}^{n}\mathds{1}\{V_{k,j}=1\}}\bigg[\frac{(2a_i-1)}{\pi_{\text{oc}}(w_i,e_i,1)}\{y_i-\mu_{\text{oc}}(a_i,w_i,e_i,1)\}\\
&+\mu_{\text{oc}}(1,w_i,e_i,1)-\mu_{\text{oc}}(0,w_i,e_i,1)\bigg]\bigg),\\
\hat{\mathsf{cATE}}_{\text{DR}}^{\text{all}}=\frac{1}{n}\sum_{i=1}^{n}\bigg(&\frac{\mathds{1}\{V_{k,i}=1\}}{n^{-1}\sum_{j=1}^{n}\mathds{1}\{V_{k,j}=1\}}\bigg[\frac{\mathds{1}\{A_i=1\}}{\pi_{\text{oc}}(w_i,e_i,1)}\{y_i-\mu_{\text{oc}}(1,w_i,e_i,1)\}\bigg]\\
&-\frac{\mathds{1}\{A_i=0\}}{1-\pi_{\text{all}}(w_i,e_i)}\frac{\nu(w_i,e_i)}{n^{-1}\sum_{j=1}^{n}\mathds{1}\{V_{k,j}=1\}}\{y_i-\mu_{\text{all}}(0,w_i,e_i)\}\\
&+\frac{\mathds{1}\{V_{k,i}=1\}}{n^{-1}\sum_{j=1}^{n}\mathds{1}\{V_{k,j}=1\}}\bigg[\mu_{\text{oc}}(1,w_i,e_i,1)-\mu_{\text{all}}(0,w_i,e_i)\bigg]\bigg),
\end{aligned}
\]

where $\pi_{\text{oc}}(w_i,e_i,1)$, $\mu_{\text{oc}}(1,w_i,e_i,1)$, $\mu_{\text{all}}(0,w_i,e_i)$, and $\pi_{\text{all}}(w_i,e_i)$ can be estimated using parametric and machine learning methods. Building on the results from the previous sections, we can leverage the linear regression models introduced earlier, $\mu_{\text{oc}}(0,w_i,e_i,1;\beta_0)$ and $\mu_{\text{all}}(0,w_i,e_i;\alpha_0)$, as outcome models.
Similarly, the logistic regression models $\pi_{\text{all}}(w,e;\eta)$ and $\nu(w,e;\xi)$ introduced previously for $\mathsf{P}[A=1\mid W=w,E=e]$ and $\mathsf{P}[V_k=1\mid W=w,E=e]$, respectively, can also be employed.
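To make the structure of the two estimators concrete, the following Python sketch evaluates them from nuisance estimates that are assumed to have been fitted elsewhere. The function name `dr_estimators` and its argument names are hypothetical and not part of the paper. The control-arm residual correction enters with a negative sign, following EIFs (4) and (5), and in the concurrent-only estimator the residual is weighted by the estimated probability of the arm actually received, which is our reading of $\mathsf{P}(A\mid W,E,V_k=1)$.

```python
import numpy as np

def dr_estimators(y, a, v, pi_oc, mu1_oc, mu0_oc, pi_all, nu, mu0_all):
    """Sketch of the DR-oc and DR-all point estimates (hypothetical helper).

    y, a, v   : outcome, binary treatment, concurrency indicator V_k
    pi_oc     : estimate of P(A=1 | W, E, V_k=1)
    mu1_oc    : estimate of E(Y | A=1, W, E, V_k=1)
    mu0_oc    : estimate of E(Y | A=0, W, E, V_k=1)
    pi_all    : estimate of P(A=1 | W, E)
    nu        : estimate of P(V_k=1 | W, E)
    mu0_all   : estimate of E(Y | A=0, W, E)
    """
    y, a, v, pi_oc, mu1_oc, mu0_oc, pi_all, nu, mu0_all = (
        np.asarray(x, dtype=float)
        for x in (y, a, v, pi_oc, mu1_oc, mu0_oc, pi_all, nu, mu0_all)
    )
    p_v = v.mean()  # empirical P(V_k = 1)

    # Concurrent-only estimator: probability and fitted mean of the observed arm.
    p_obs = np.where(a == 1, pi_oc, 1 - pi_oc)
    mu_obs = np.where(a == 1, mu1_oc, mu0_oc)
    terms_oc = v / p_v * ((2 * a - 1) / p_obs * (y - mu_obs) + mu1_oc - mu0_oc)

    # All-controls estimator: control residuals from every unit, weighted by nu.
    terms_all = (
        v / p_v * a / pi_oc * (y - mu1_oc)
        - (1 - a) / (1 - pi_all) * nu / p_v * (y - mu0_all)
        + v / p_v * (mu1_oc - mu0_all)
    )
    return terms_oc.mean(), terms_all.mean(), terms_oc, terms_all

# Toy inputs (illustrative only): two concurrent and two non-concurrent units.
est_oc, est_all, terms_oc, terms_all = dr_estimators(
    y=[3, 1, 2, 0], a=[1, 0, 0, 0], v=[1, 1, 0, 0],
    pi_oc=[0.5] * 4, mu1_oc=[2] * 4, mu0_oc=[1] * 4,
    pi_all=[0.5, 0.5, 0, 0], nu=[1, 1, 0, 0], mu0_all=[1] * 4,
)
```

In this toy input $V_k$ is a deterministic indicator, so the fitted $\nu$ equals $V_k$ and the two estimates coincide.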

Large sample properties.

Note that if $V_k$ is a deterministic function of $E$, so that $\mathsf{P}(V_k=1\mid E,W)=\mathds{1}\{E>t\}$, the two EIFs (4) and (5) are the same. In addition, the first influence function in Theorem 3 reduces to the standard influence function for the average treatment effect in the $V_k=1$ population. Therefore, it inherits the standard analysis of the one-step estimator for average treatment effects discussed in Kennedy et al. (2021, Section 4.1). A similar analysis can be conducted when $V_k$ is a stochastic function of $E$. In summary, if the outcome models are correctly specified, it can be shown that estimators of the form $\hat{\mathsf{cATE}}_{\text{DR}}^{\text{oc}}$ and $\hat{\mathsf{cATE}}_{\text{DR}}^{\text{all}}$ are root-$n$ consistent and asymptotically normal, with asymptotically valid 95% confidence intervals given by the closed-form expressions $\hat{\mathsf{cATE}}_{\text{DR}}^{\text{oc}}\pm 1.96\sqrt{\hat{\text{var}}\{\varphi(Z,\mathsf{cATE}(k))\}/n}$ and $\hat{\mathsf{cATE}}_{\text{DR}}^{\text{all}}\pm 1.96\sqrt{\hat{\text{var}}\{\varphi(Z,\mathsf{cATE}(k))\}/n}$, respectively, and are efficient in the local asymptotic minimax sense. If the models are misspecified, the confidence intervals will be conservative (Kennedy, 2022).
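The closed-form interval above can be computed directly from estimated influence-function values. The sketch below is illustrative only: `wald_ci` is a hypothetical helper, the toy numbers are not from the paper, and the variance of the influence function is estimated by the empirical variance of its plug-in values, which is one common choice and an assumption here.

```python
import numpy as np

def wald_ci(estimate, phi, z=1.96):
    """Wald interval: estimate +/- z * sqrt(var_hat(phi) / n)."""
    phi = np.asarray(phi, dtype=float)
    se = np.sqrt(phi.var() / phi.size)
    return estimate - z * se, estimate + z * se

# Toy values (illustrative only).
v = np.array([1.0, 1.0, 0.0, 0.0])        # concurrency indicator V_k
terms = np.array([6.0, 2.0, 0.0, 0.0])    # uncentred per-unit contributions
estimate = terms.mean()

# Centre the contributions as in EIF (4): subtract 1{V_k=1}/P(V_k=1) * estimate.
phi = terms - v / v.mean() * estimate
lower, upper = wald_ci(estimate, phi)
```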

Double robustness.

Similarly, if $V_k$ is a deterministic function of $E$, the two estimators reduce to the standard DR estimator for the average treatment effect in the $V_k=1$ population, and they therefore inherit the same double robustness property. This means that if either the outcome regression model ($\mu_{\text{oc}}(0,w_i,e_i,1;\beta_0)$, $\mu_{\text{all}}(0,w_i,e_i;\alpha_0)$) or the treatment assignment model ($\pi_{\text{oc}}(w_i,e_i,1;\eta)$, $\pi_{\text{all}}(w,e;\eta)$) is correctly specified (in a parametric sense), then the DR estimator is consistent; see section 4.2 of Kennedy (2022) for details. Recall that our proposed estimators are doubly robust due to randomization, i.e., the treatment assignment mechanism is known by design. We provide some empirical results on this property in our simulations in section 7.
In addition, it can be shown that if $V_k$ is a stochastic function of $E$, the same double robustness property holds, provided the model for $V_k$ is correctly specified.
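A small numerical illustration of this property follows, under assumptions that are ours rather than the paper's: a single concurrent population, an intercept-only (misspecified) outcome model in each arm, and the true treatment probabilities treated as known. The data-generating values, the seed, and the helper name `aipw` are hypothetical.

```python
import numpy as np

def aipw(y, a, pi, mu1, mu0):
    """Standard doubly robust (AIPW) estimate of an average treatment effect."""
    return np.mean(
        a / pi * (y - mu1) - (1 - a) / (1 - pi) * (y - mu0) + mu1 - mu0
    )

rng = np.random.default_rng(0)
n = 50_000
w = rng.normal(size=n)
pi_true = 1 / (1 + np.exp(-0.8 * w))         # treatment probability depends on W
a = rng.binomial(1, pi_true)
y = 0.8 * w + 0.8 * a + rng.normal(size=n)   # true effect set to 0.8

# Misspecified outcome model: arm-specific intercepts only, W is ignored.
mu1 = np.full(n, y[a == 1].mean())
mu0 = np.full(n, y[a == 0].mean())

or_estimate = np.mean(mu1 - mu0)             # outcome regression alone
dr_estimate = aipw(y, a, pi_true, mu1, mu0)  # same outcome model, correct pi
```

Under these assumptions the outcome-regression estimate absorbs the confounding by $W$, whereas the doubly robust estimate should be close to 0.8 because the treatment model is correct.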

6 Efficiency considerations

Estimators based on outcome regression.

As shown in the appendix, the influence function of the conditional expectation under control, $\varphi(Z_i,\hat{\mu}_0)$, depends on two components: the influence function of $\mu_0$ itself and that of the regression coefficients. Here, $\mu_0=\mathsf{E}[Y(0)\mid V_k=1]$, depending on the estimand under study. Therefore, a more precise estimation of the regression coefficients (the variance of the estimated regression coefficients is inversely proportional to the sample size) translates to a more precise estimation of $\mu_0$ and consequently of $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}$ compared to $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}$.

Doubly robust estimators.

As previously discussed, if $V_k$ is a deterministic function of $E$, the two EIFs presented in section 5.3 are the same. In this case, efficiency gains come solely from better fitting of the regression $\mathsf{E}(Y\mid A=0,W,E,V_k=1)$, which under assumption (A5) satisfies $\mathsf{E}(Y\mid A=0,W,E,V_k=1)=\mathsf{E}(Y\mid A=0,W,E)$ (because $V_k=\mathds{1}\{E>t\}$). The problem then becomes purely about getting this regression right, and these efficiency gains do not show up in the first-order analysis of the estimator. In addition, as mentioned above, efficiency gains can be obtained by leveraging prognostic variables, as discussed in Colantuoni & Rosenblum (2015) and Benkeser et al. (2021). We show some empirical results in our simulations in section 7. Interestingly, efficiency gains can also be achieved when using doubly robust estimators under Model (1) when $V_k$ is a non-deterministic function of $E$, as shown in our simulation setting. Under this scenario, we also expect to see efficiency gains when using an estimator based solely on inverse probability weighting.

7 Simulations

In this section we evaluate the performance of the proposed estimators with respect to bias squared, variance, mean square error, and coverage of the 95% confidence interval, across levels of the percentage of concurrent controls and of model misspecification, when estimating $\mathsf{cATE}(k)$. We do not compare our proposed estimators with the methods described in section 4, because it is not clear whether they target $\mathsf{cATE}(k)$.

Finally, our simulations aim to showcase the theoretical properties previously discussed, rather than evaluating them under complex real-world scenarios.

7.1 Setup

Aims

To evaluate the performance and gains in efficiency of our proposed estimators across levels of (1) percentage of concurrent controls (90% to 10%) and (2) model misspecification (correct outcome and treatment models; and misspecified outcome and correct treatment model), considering $V_k$ a deterministic function of $E$. In addition, we also evaluate efficiency gains by comparing the estimated variance of the outcome regression (Section 5.1) and doubly robust (Section 5.3) estimators that only use concurrent data with those that use all data, considering $V_k$ both a deterministic and a stochastic function of $E$.

Data-generating mechanisms

We generated data from Model (1). Specifically, we considered a sample size of $n=1{,}000$ and, for each subject $i=1,\dots,n$, we simulated the following data:

  • Step 1. the entry time $E\sim\text{Norm}(0,1)$ and a baseline covariate $W=-\kappa_1+0.8E+\text{Norm}(0,1)$, where $\kappa_1=n^{-1}\sum_{i=1}^{n}0.8E_i$;

  • Step 2. an indicator of whether treatment $k$ was available at time $E$, $V_k$, as a deterministic function of $E$ being less than a threshold describing the level of the percentage of concurrent controls;

  • Step 3. a binary treatment $A\sim\text{Bernoulli}(\pi(W))$, where $\pi(W)=\left(1+\exp\left(-(-\kappa_2+0.8W)\right)\right)^{-1}$ and $\kappa_2=n^{-1}\sum_{i=1}^{n}0.8W_i$, when $V_k=1$, and $A=0$ otherwise (participants for whom only control is available);

  • Step 4. two counterfactual outcomes, $Y(0)=0.8W+0.5E+\text{Norm}(0,1)$ and $Y(k)=Y(0)+\Delta$, with $\Delta=0.8$, and the observed outcome $Y=AY(k)+(1-A)Y(0)$. Since we consider a homogeneous treatment effect, $\Delta=\mathsf{cATE}(k)=0.8$.
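Steps 1 to 4 above can be sketched in Python as follows. This is a hedged illustration, not the authors' simulation code: the function name `simulate_trial`, the seed handling, and the choice of the threshold as an empirical quantile of $E$ (so that a share `pct_concurrent` of units has $V_k=1$) are assumptions, and the direction of the inequality follows the wording of Step 2.

```python
import numpy as np

def simulate_trial(n=1000, pct_concurrent=0.5, delta=0.8, seed=0):
    """Generate one dataset following Steps 1-4 (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Step 1: entry time and a baseline covariate centred by kappa_1.
    e = rng.normal(0.0, 1.0, n)
    kappa1 = np.mean(0.8 * e)
    w = -kappa1 + 0.8 * e + rng.normal(0.0, 1.0, n)

    # Step 2: V_k is a deterministic function of E.
    # Assumption: the threshold is the empirical pct_concurrent quantile of E.
    threshold = np.quantile(e, pct_concurrent)
    v = (e < threshold).astype(int)

    # Step 3: treatment is randomized only when V_k = 1; otherwise A = 0.
    kappa2 = np.mean(0.8 * w)
    pi = 1.0 / (1.0 + np.exp(-(-kappa2 + 0.8 * w)))
    a = np.where(v == 1, rng.binomial(1, pi), 0)

    # Step 4: counterfactual outcomes with a homogeneous effect delta.
    y0 = 0.8 * w + 0.5 * e + rng.normal(0.0, 1.0, n)
    yk = y0 + delta
    y = a * yk + (1 - a) * y0
    return {"e": e, "w": w, "v": v, "a": a, "y": y}

data = simulate_trial(n=1000, pct_concurrent=0.3, seed=1)
```

Varying `pct_concurrent` from 0.9 down to 0.1 would mimic the scenarios considered here, under the stated reading of the threshold.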

Estimands

The estimand of interest is $\mathsf{cATE}(k)$.

Methods

For each dataset, across levels of the percentage of concurrent controls and of misspecification, we used the methods summarized in Table 1.

Performance metrics

Bias squared, variance, mean square error (MSE), and coverage of the 95% confidence interval. In addition, we also considered the ratio of the estimated variances.
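As a sketch of how these metrics could be computed from the estimates and interval limits collected over simulation replicates, consider the following. The helper name `performance` and the toy replicate values are hypothetical; the population form of the variance (divisor equal to the number of replicates) is an assumption, chosen so that MSE equals bias squared plus variance exactly.

```python
import numpy as np

def performance(estimates, lowers, uppers, truth):
    """Bias squared, variance, MSE and CI coverage over simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(lowers, dtype=float)
    hi = np.asarray(uppers, dtype=float)
    return {
        "bias_sq": (est.mean() - truth) ** 2,
        "variance": est.var(),                      # divisor = number of replicates
        "mse": np.mean((est - truth) ** 2),
        "coverage": np.mean((lo <= truth) & (truth <= hi)),
    }

# Toy replicate results (illustrative only), with the true cATE(k) set to 0.8.
estimates = np.array([0.7, 0.9, 0.8, 1.0])
metrics = performance(estimates, estimates - 0.15, estimates + 0.15, truth=0.8)
```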

Scenarios

We considered levels of the percentage of concurrent controls between 10% and 90%, in steps of 10%. Misspecified models were set to include only an intercept, that is, they did not control for any covariates or entry time.

Table 1: Methods used in the estimation of $\mathsf{cATE}(k)$.

| Method | Acronym |
|---|---|
| Outcome regression using only concurrent data ($\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}$, Section 5.1) | OR-oc |
| Outcome regression using all data ($\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}$, $\hat{\mathsf{ATE}}_{\text{OR}}$, Section 5.1) | OR-ac |
| Weighting using only concurrent data ($\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}$, Section 5.2) | IPW |
| Doubly robust using only concurrent data ($\hat{\mathsf{cATE}}_{\text{DR}}^{\text{oc}}$, Section 5.3) | DR-oc |
| Doubly robust using all data ($\hat{\mathsf{cATE}}_{\text{DR}}^{\text{all}}$, Section 5.3) | DR-ac |

7.2 Results

7.2.1 Bias, variance, MSE, and coverage

Figures 3 and 4 show bias squared, variance, MSE, and coverage of the 95% confidence intervals in estimating $\mathsf{cATE}(k)$ across percentages of concurrent controls when both models are correct and when only the treatment model is correctly specified, respectively. When both the outcome and the treatment models are correct (Figure 3), bias squared is negligible across levels of concurrent controls for all methods. Variance increases as the percentage of concurrent controls decreases for all methods, with the variance of OR-ac being smaller than that of OR-oc, suggesting a gain in efficiency (more on this in the next section). Similar behavior can be seen for the MSE. Finally, all methods achieve desirable coverage levels. When the outcome model is misspecified (Figure 4), both estimators based on outcome regression show bias while maintaining a relatively small variance. MSE is consequently dominated by bias. The DR estimators and the IPW estimator maintain negligible levels of bias and relatively small variance.

Figure 3: Bias squared, variance, MSE and coverage of the 95% confidence interval of DR-ac, DR-oc, IPW, OR-ac and OR-oc under correct models. Note that DR-ac and DR-oc overlap in terms of bias squared, sampling variability and MSE.
Figure 4: Bias squared, variance, MSE and coverage of the 95% confidence interval of DR-ac, DR-oc, IPW, OR-ac and OR-oc under a misspecified outcome model and a correct treatment assignment model. Note that DR-ac and DR-oc overlap in terms of bias squared and MSE.

7.2.2 Efficiency gains

Figure 5 shows the ratio of the estimated standard errors of DR-oc over DR-ac and of OR-oc over OR-ac across levels of concurrent controls and misspecification.

The top panels of Figure 5 follow Model (1) where $V_k$ is a deterministic function of $E$. As discussed in Section 6, estimators based on outcome regression that use all controls seem to have a gain in efficiency, while DR estimators did not under correct models.

The bottom panels of Figure 5 follow Model (1) where $V_k$ is not a deterministic function of $E$. Specifically, we generated $V_k\sim\text{Bernoulli}(\pi(E))$, where $\pi(E)=\left(1+\exp\left(-(-\kappa_4+0.5E)\right)\right)^{-1}$, $\kappa_4=\kappa_3+n^{-1}\sum_{i=1}^{n}0.5E_i$, and $\kappa_3$ is the threshold discussed in Step 2 above. Under correct models (including the one for $V_k$), the entry time $E$ is a prognostic variable of $V_k$ and therefore improves efficiency compared to using only concurrent controls (bottom panels of Figure 5; DR-oc/DR-ac). These results suggest that efficiency gains can be obtained when using a doubly robust estimator under Model (1) when $V_k$ is not a deterministic function of $E$ and the model for $V_k$ is correct.
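The stochastic availability mechanism just described can be sketched as follows. The function name `draw_availability`, the value used for $\kappa_3$, and the random seed are hypothetical choices for illustration; only the logistic form and the centring by $n^{-1}\sum_{i}0.5E_i$ come from the text.

```python
import numpy as np

def draw_availability(e, kappa3, rng):
    """Draw V_k ~ Bernoulli(pi(E)) with pi(E) = expit(-kappa4 + 0.5 * E)."""
    e = np.asarray(e, dtype=float)
    kappa4 = kappa3 + np.mean(0.5 * e)
    prob = 1.0 / (1.0 + np.exp(-(-kappa4 + 0.5 * e)))
    return rng.binomial(1, prob), prob

rng = np.random.default_rng(0)
e = rng.normal(size=5000)                  # entry times as in Step 1
v, prob = draw_availability(e, kappa3=0.0, rng=rng)
```

Larger values of `kappa3` lower the availability probabilities and hence the expected share of concurrent units.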

Summary of results.

Methods based on outcome regression improve efficiency when using non-concurrent controls. However, they introduce bias when misspecified. In contrast, doubly robust estimators provide consistent estimates with relatively small variance when either the treatment or outcome model is correctly specified. In addition, doubly robust estimators have the potential to further improve efficiency when $V_k$ is not a deterministic function of $E$ under Model (1).

Figure 5: Ratio of standard errors DR-oc/DR-ac and OR-oc/OR-ac across model misspecifications considering $V_k$ a deterministic function of $E$ (top panels) and not (bottom panels). A ratio greater than 1 means efficiency gains.

8 Practical considerations

What estimand should we target?

In this paper, we introduce $\mathsf{cATE}(k)$, whereas it is not clear what estimand the current related literature targets. Given the more stringent and untestable assumptions needed to identify $\mathsf{ATE}(k)$, we suggest targeting $\mathsf{cATE}(k)$. Note that $\mathsf{ATE}(k)$ and $\mathsf{cATE}(k)$ coincide under the assumption of a homogeneous treatment effect. While this is true in theory, we would not expect this assumption to hold in practical settings, leading to different results, as shown in our case study in the next section.

Should we pool concurrent and non-concurrent controls? Testing assumption A5.

To evaluate assumption A5 under Model (1), we propose using the method introduced by Luedtke et al. (2019). Specifically, following the notation of the original paper, we suggest setting $R_P(o) \triangleq \mathsf{E}(Y \mid A=0, W, E, V_k=1) - \mathsf{E}(Y \mid A=0, W, E)$ and $S_P(o) \equiv 0$, where $o$ is the observed data, and $R_P$ and $S_P$ are elements of the space of univariate bounded real-valued measurable functions defined on the support of the distribution $P$ (Luedtke et al., 2019). We show an application of this testing procedure in our case study. Note that if $V_k$ is a deterministic function of $E$, as in our case study, this test concerns the statistical models, i.e., whether the projections onto the linear model differ, rather than the true expectations. In other words, it evaluates model misspecification and can therefore guide analysts on whether to use non-concurrent controls.
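To make the contrast $R_P$ concrete, the sketch below computes the difference between two linear working models for the control-arm outcome: one fit on concurrent controls only and one fit on all controls. It is not the omnibus test of Luedtke et al. (2019); it only illustrates the quantity that the test examines. The function name `pooling_contrast` and the simulated data are hypothetical.

```python
import numpy as np

def pooling_contrast(y, x, concurrent):
    """Difference between two linear fits of the control outcome.

    y, x, concurrent describe control-arm units only (A = 0). The first fit
    uses concurrent controls; the second pools all controls. The contrast is
    evaluated at the covariate values of the concurrent controls.
    """
    design = np.column_stack([np.ones(len(y)), x])
    beta_conc, *_ = np.linalg.lstsq(design[concurrent], y[concurrent], rcond=None)
    beta_all, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design[concurrent] @ (beta_conc - beta_all)

# Hypothetical control-arm data: V_k is a deterministic function of E.
rng = np.random.default_rng(2)
n = 4000
e = rng.uniform(size=n)
w = rng.normal(size=n)
v = e > 0.5
x = np.column_stack([w, e])
y_linear = 1.0 + w + 2.0 * e + rng.normal(size=n)              # linear time trend
y_curved = 1.0 + w + 4.0 * (e - 0.5) ** 2 + rng.normal(size=n)  # nonlinear time trend

r_linear = pooling_contrast(y_linear, x, v)
r_curved = pooling_contrast(y_curved, x, v)
print(np.abs(r_linear).mean(), np.abs(r_curved).mean())
```

When the linear working model is correct, the two projections agree up to sampling noise. When the time trend is nonlinear, they differ, which is the kind of misspecification the test is meant to flag in the deterministic case.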

Should we leverage prognostic baseline variables for additional precision?

Recent literature suggests that incorporating baseline prognostic variables can improve the precision of estimates (Colantuoni & Rosenblum, 2015). We propose following this approach by appropriately controlling for these variables in the analysis. Such adjustment may explain the increased precision observed with the DR-oc and OR-oc estimators compared to the naive estimator in our case study (presented in the next section), even though they are computed within the concurrent population only.

What estimator should we use?

Under Model (1) and assuming $V_k$ is a deterministic function of $E$, to estimate $\mathsf{cATE}(k)$ we recommend using $\hat{\mathsf{cATE}}_{\text{DR}}^{\text{oc}}$, the doubly robust estimator using only concurrent data (Section 5.3). We recommend this estimator because: 1) it has the same efficiency as the DR estimator that uses non-concurrent controls; 2) it does not require any additional assumptions, thus better aligning with the FDA recommendations (FDA, 2021, Section A.5.1); 3) it accommodates covariates that, if prognostic, can be leveraged to improve efficiency (Colantuoni & Rosenblum, 2015); and 4) it is doubly robust, meaning that it is consistent when either the treatment assignment model or the outcome model is correctly specified, a property we obtain “for free” in platform trials due to randomization. If $V_k$ is a stochastic function of $E$ and an analyst chooses to leverage non-concurrent controls, then we recommend the DR-ac estimator.
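As an illustration of the idea behind a doubly robust estimator restricted to concurrent units, the following sketch implements a generic augmented inverse probability weighted (AIPW) estimator with linear outcome models and a constant randomization probability. It is a simplified stand-in under these assumptions, not necessarily the exact form of the estimator in Section 5.3; the function name `aipw_concurrent` and the simulated data are hypothetical.

```python
import numpy as np

def aipw_concurrent(y, a, x, concurrent):
    """Generic AIPW estimate of the arm-1 vs arm-0 effect among concurrent units.

    y: outcomes; a: 0/1 arm indicator; x: (n, p) covariates (may include entry
    time); concurrent: boolean mask for V_k = 1. Returns (estimate, std_error).
    """
    y, a, x = y[concurrent], a[concurrent], x[concurrent]
    n = y.shape[0]
    design = np.column_stack([np.ones(n), x])

    # Outcome regressions: one linear model per arm, fit by least squares.
    beta1, *_ = np.linalg.lstsq(design[a == 1], y[a == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(design[a == 0], y[a == 0], rcond=None)
    m1, m0 = design @ beta1, design @ beta0

    # Treatment model: a constant randomization probability (assumption).
    p = a.mean()

    # Uncentered influence-function contributions; their mean is the estimate.
    phi = m1 - m0 + a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p)
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)

# Hypothetical data: non-concurrent units (E <= 0.5) are all controls.
rng = np.random.default_rng(1)
n = 4000
e = rng.uniform(size=n)
w = rng.normal(size=n)
v = e > 0.5
a = np.where(v, rng.binomial(1, 0.5, size=n), 0)
y = 1.0 + 2.0 * w + e - 1.5 * a + rng.normal(size=n)  # true effect is -1.5

est, se = aipw_concurrent(y, a, np.column_stack([w, e]), v)
print(est, se)
```

The standard error is taken from the sample variance of the influence-function contributions, mirroring the variance calculation described for DR-oc in the case study.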

Sample size calculation for a prospective trial with non-concurrent controls.

Our theoretical and methodological results suggest an efficiency gain when including non-concurrent controls with estimators based on regression models. While these results are promising, we suggest conducting standard sample size calculations as if the non-concurrent control data will not be available. At the analysis stage, precision can then be improved by using non-concurrent control data as previously described, with the caveat that the outcome model must be correctly specified.
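A standard calculation of this kind, ignoring non-concurrent controls, might look as follows for a continuous endpoint compared between two equally sized concurrent arms. This uses the textbook normal-approximation formula $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\delta^2$ per arm; the function name `n_per_arm` and the input values are illustrative assumptions, not values from the paper.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided, two-sample z-test with equal arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical inputs: detect a difference of 0.5 standard deviations.
print(n_per_arm(delta=0.5, sigma=1.0))
```

Any precision gained later from covariate adjustment or from non-concurrent controls then acts as a margin on top of this conservative design.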

Multiple comparisons.

Because our proposed methods yield valid 95% confidence intervals, test statistics, and p-values (as demonstrated in previous sections), standard multiplicity adjustment procedures, such as the Bonferroni or Benjamini-Hochberg corrections, can be used. This allows for straightforward application of these corrections in platform trials with, for instance, pre-planned interim analyses and multiple primary endpoints.
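Both corrections operate directly on the p-values produced by the estimators. The sketch below gives minimal implementations; note that Bonferroni controls the family-wise error rate, whereas Benjamini-Hochberg controls the false discovery rate. The p-values in the example are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: reject the r smallest p-values, where r is the
    largest rank with p_(r) <= r * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for four treatment-versus-control comparisons.
p_vals = [0.001, 0.012, 0.03, 0.2]
print(bonferroni(p_vals))
print(benjamini_hochberg(p_vals))
```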

Summary of practical guidelines.

Target $\mathsf{cATE}(k)$ as the estimand of interest. Use DR-oc to obtain consistent estimates of $\mathsf{cATE}(k)$. To improve efficiency, focus on prognostic baseline covariates rather than relying on non-concurrent controls. If leveraging non-concurrent controls is of interest, use DR-ac. Finally, conduct sample size calculations without considering non-concurrent controls and apply standard multiple comparison adjustments.

9 The Adaptive COVID-19 Treatment Platform Trial

In this section, we apply our proposed estimators to estimate $\mathsf{cATE}$ using data from the Adaptive COVID-19 Treatment Trial (ACTT) (Kalil et al., 2021). This was a platform trial that investigated treatments for hospitalized adult patients with COVID-19 pneumonia. The trial comprised multiple stages, as illustrated in Figure 1. The initial stage, ACTT-1, assessed the effectiveness of remdesivir alone compared to placebo. Subsequently, in the second stage (ACTT-2), the placebo was phased out, and a novel treatment combining remdesivir with baricitinib was introduced. In this stage, participants were randomized to receive either remdesivir alone or the combination therapy of remdesivir and baricitinib. Data were accessed using the NIAID Clinical Trials Data Repository (https://data.niaid.nih.gov/). We have a Data User Agreement in place for its use.

Study population and endpoint of interest.

We considered the combined participants of ACTT-1 and ACTT-2 as our study population. We followed the inclusion and exclusion criteria of the original study. The final study population comprised 1,379 participants, 541 from ACTT-1 and 1,033 from ACTT-2. We considered the time to recovery in days as our endpoint of interest.

Treatments under study and targeted causal estimand.

We considered two treatment arms: remdesivir alone (which served as the shared control arm) and remdesivir plus baricitinib. Our target estimand was $\mathsf{cATE}(1)$, where $1$ represents the remdesivir plus baricitinib arm and $0$ represents the remdesivir alone shared arm.

Baseline covariates.

We considered the following baseline covariates: age, sex assigned at birth (female, male), race (White, Black, Asian, Other: American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, Multiple), ethnicity (Hispanic or Latino, Not Hispanic or Latino), BMI, geographic region of study site (Asia, Europe, North America), disease severity stratum (mild, severe), duration of symptoms, and having any of these comorbidities: hypertension, coronary artery disease, congestive heart disease, chronic oxygen requirement, chronic respiratory disease, chronic liver disease, chronic kidney disease, diabetes type I, diabetes type II, obesity, cancer, immune deficiency, and asthma. We also included the entry time, which we normalized to be between 0 and 1.

Models setup.

We computed OR-oc, OR-ac, OR-ad, IPW, DR-oc and DR-ac (Table 1) by using linear and logistic regression models. We computed the naive estimator by taking the average difference in the endpoint between the two arms among only concurrent participants. Variances were obtained by using the sandwich estimator (for naive, OR-oc, OR-ac, IPW) and by taking the variance of the efficient influence function (for DR-oc and DR-ac). Wald 95% confidence intervals and Wald tests were constructed.

Results.

Table 2 shows the point estimates of $\mathsf{cATE}(1)$, standard errors, 95% confidence intervals, and p-values. The naive estimate of $\mathsf{cATE}(1)$ was -1.33 with a standard error of 0.58. This suggests that baricitinib plus remdesivir was superior to remdesivir alone in reducing recovery time, as in the original trial (Kalil et al., 2021). OR-oc, IPW, DR-oc and DR-ac improved precision while maintaining a similar point estimate. OR-ac improved precision the most (around 28% compared with the naive estimator); however, it yielded a different point estimate, -0.75, which led to a non-significant result. This suggests that the outcome model used to obtain OR-ac might be misspecified and therefore that assumption A5 does not hold. Using the omnibus test described in our practical guidelines in Section 8, we obtained a p-value $\leq 0.001$ using both the variance and the eigenvalue approach (Luedtke et al., 2019), which supports rejecting assumption A5 and therefore suggests not using non-concurrent controls. In contrast, doubly robust estimators improved precision while maintaining a point estimate similar to that of the naive estimator. We believe the improved precision observed in OR-oc, IPW, and DR-oc compared to the naive estimator stems from appropriately adjusting for baseline variables, as discussed in Colantuoni & Rosenblum (2015) and in the previous section. Note that a conditional analysis targeting $\mathsf{ATE}(1)$, using a standard linear regression model regressing the outcome in the full population on treatment arms, entry time, and baseline covariates, led to a non-significant point estimate of -0.46 (standard error 0.52).

Table 2: Estimated $\mathsf{cATE}(1)$ using the ACTT data.

| Method | $\hat{\mathsf{cATE}}(1)$ | SE | 95% CI | p-value | Ratio |
|--------|------|------|----------------|-------|------|
| OR-oc | -1.29 | 0.47 | (-2.21; -0.37) | <0.01 | 1.22 |
| OR-ac | -0.75 | 0.45 | (-1.63; 0.13) | 0.10 | 1.28 |
| IPW | -1.28 | 0.47 | (-2.20; -0.36) | <0.01 | 1.22 |
| DR-oc | -1.30 | 0.47 | (-2.22; -0.38) | <0.01 | 1.22 |
| DR-ac | -1.30 | 0.47 | (-2.22; -0.38) | <0.01 | 1.21 |
| naive | -1.33 | 0.58 | (-2.47; -0.19) | 0.02 | 1.00 |
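The Wald intervals and p-values in Table 2 can be reproduced, up to rounding, from the reported point estimates and standard errors. The short sketch below does this for three rows. It assumes that the Ratio column is the naive standard error divided by the standard error of each method; because it starts from rounded inputs, the recomputed ratio can differ from the table in the last digit. The helper name `wald_summary` is hypothetical.

```python
from statistics import NormalDist

def wald_summary(estimate, se, level=0.95):
    """Two-sided Wald confidence interval and p-value for H0: effect = 0."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (estimate - z * se, estimate + z * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return ci, p_value

# Rounded (estimate, SE) pairs copied from Table 2.
table2 = {"OR-oc": (-1.29, 0.47), "OR-ac": (-0.75, 0.45), "naive": (-1.33, 0.58)}
for method, (est, se) in table2.items():
    (lo, hi), p = wald_summary(est, se)
    ratio = table2["naive"][1] / se  # assumed definition of the Ratio column
    print(f"{method}: CI=({lo:.2f}; {hi:.2f}), p={p:.2f}, SE ratio={ratio:.2f}")
```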

10 Conclusion

In this paper, we introduced identification results and estimation techniques to identify and estimate concurrent average treatment effects $\mathsf{cATE}(k)$ in the presence of non-concurrent controls in platform trials. We argue that identifying and estimating $\mathsf{ATE}(k)$ relies on an extrapolation assumption that is both untestable and often too stringent, particularly in the context of platform trials, where multiple, potentially novel treatments or interventions are being evaluated and the outcome mechanism is poorly understood. Therefore, we advocate focusing primarily on $\mathsf{cATE}(k)$, where assumptions can be tested. By focusing on $\mathsf{cATE}(k)$ rather than $\mathsf{ATE}(k)$, we also open the door to leveraging non-parametric models based on machine and deep learning techniques for learning outcome and treatment assignment mechanisms under the proposed doubly robust estimators (Kennedy, 2022; Díaz, 2020; Hirshberg & Wager, 2021). In fact, while these methods can capture complex data relationships, potentially mitigating model misspecification, they may not be suitable for extrapolation. Furthermore, our proposed doubly robust estimator accommodates Bayesian techniques while retaining valid frequentist properties, as demonstrated in Shin & Antonelli (2023) and Antonelli et al. (2022).

A key takeaway of this paper is to target $\mathsf{cATE}(k)$ in platform trials and, under Model (1), to use a doubly robust estimator only among concurrent units, prioritizing the identification of strong prognostic baseline variables rather than relying on non-concurrent controls, especially when $V_k$ is a deterministic function of $E$.

In this paper, we presented results that can be used for continuous and binary endpoints. Estimators can be constructed for time-to-event endpoints under the non-parametric causal model introduced in eq. (1).

Finally, in this paper, we demonstrate results assuming a structural equation model where treatment assignment may depend on baseline covariates; however, similar identification and estimation results can be obtained without baseline covariates.

References

  • Angus et al. (2020) Angus, D. C., Derde, L., Al-Beidh, F., Annane, D., Arabi, Y., Beane, A., van Bentum-Puijk, W., Berry, L., Bhimani, Z., Bonten, M. et al. (2020), ‘Effect of hydrocortisone on mortality and organ support in patients with severe COVID-19: the REMAP-CAP COVID-19 corticosteroid domain randomized clinical trial’, JAMA 324(13), 1317–1329.
  • Antonelli et al. (2022) Antonelli, J., Papadogeorgou, G. & Dominici, F. (2022), ‘Causal inference in high dimensions: a marriage between Bayesian modeling and good frequentist properties’, Biometrics 78(1), 100–114.
  • Banbeta et al. (2019) Banbeta, A., Rosmalen, J., Dejardin, D. & Lesaffre, E. (2019), ‘Modified power prior with multiple historical trials for binary endpoints’, Stat Med. 38.
    https://doi.org/10.1002/sim.8019
  • Barker et al. (2009) Barker, A., Sigman, C., Kelloff, G. J., Hylton, N., Berry, D. A. & Esserman, L. (2009), ‘I-SPY 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy’, Clinical Pharmacology & Therapeutics 86(1), 97–100.
  • Benkeser et al. (2021) Benkeser, D., Díaz, I., Luedtke, A., Segal, J., Scharfstein, D. & Rosenblum, M. (2021), ‘Improving precision and power in randomized trials for COVID-19 treatments using covariate adjustment, for binary, ordinal, and time-to-event outcomes’, Biometrics 77(4), 1467–1481.
  • Bennett et al. (2021) Bennett, M., White, S., Best, N. & Mander, A. (2021), ‘A novel equivalence probability weighted power prior for using historical control data in an adaptive clinical trial design: A comparison to standard methods’, Pharm Stat. 20.
    https://doi.org/10.1002/pst.2088
  • Berry et al. (2015) Berry, S. M., Connor, J. T. & Lewis, R. J. (2015), ‘The platform trial: an efficient strategy for evaluating multiple treatments’, JAMA 313(16), 1619–1620.
  • Bickel et al. (1993) Bickel, P. J., Klaassen, C. A., Ritov, Y. & Wellner, J. A. (1993), Efficient and adaptive estimation for semiparametric models, Vol. 4, Springer.
  • Bofill Roig et al. (2023) Bofill Roig, M., Burgwinkel, C., Garczarek, U., Koenig, F., Posch, M., Nguyen, Q. & Hees, K. (2023), ‘On the use of non-concurrent controls in platform trials: a scoping review’, Trials 24(1), 1–17.
  • Bofill Roig, König, Meyer & Posch (2022) Bofill Roig, M., König, F., Meyer, E. & Posch, M. (2022), ‘Commentary: Two approaches to analyze platform trials incorporating non-concurrent controls with a common assumption’, Clin Trials. 19.
    https://doi.org/10.1177/17407745221112016
  • Bofill Roig, Krotka, Burman, Glimm, Gold & Hees (2022) Bofill Roig, M., Krotka, P., Burman, C. F., Glimm, E., Gold, S. M. & Hees, K. (2022), ‘On model-based time trend adjustments in platform trials with non-concurrent controls’, BMC Med Res Methodol. 22.
    https://doi.org/10.1186/s12874-022-01683-w
  • Boos & Stefanski (2013) Boos, D. D. & Stefanski, L. A. (2013), Essential statistical inference: theory and methods, Vol. 591, Springer.
  • Broglio (2018) Broglio, K. (2018), ‘Randomization in clinical trials: permuted blocks and stratification’, JAMA 319(21), 2223–2224.
  • Chen et al. (2020) Chen, W. C., Wang, C., Li, H., Lu, N., Tiwari, R. & Xu, Y. (2020), ‘Propensity score-integrated composite likelihood approach for augmenting the control arm of a randomized controlled trial by incorporating real-world data’, J Biopharm Stat. 30.
    https://doi.org/10.1080/10543406.2020.1730877
  • Colantuoni & Rosenblum (2015) Colantuoni, E. & Rosenblum, M. (2015), ‘Leveraging prognostic baseline variables to gain precision in randomized trials’, Statistics in Medicine 34(18), 2602–2617.
  • Collignon et al. (2022) Collignon, O., Schiel, A., Burman, C.-F., Rufibach, K., Posch, M. & Bretz, F. (2022), ‘Estimands and complex innovative designs’, Clinical Pharmacology & Therapeutics 112(6), 1183–1190.
  • Collignon et al. (2020) Collignon, O., Schritz, A., Senn, S. J. & Spezia, R. (2020), ‘Clustered allocation as a way of understanding historical controls: Components of variation and regulatory considerations’, Stat Methods Med Res. 29.
    https://doi.org/10.1177/0962280219880213
  • Díaz (2020) Díaz, I. (2020), ‘Machine learning in the estimation of causal effects: targeted minimum loss-based estimation and double/debiased machine learning’, Biostatistics 21(2), 353–358.
  • FDA (2021) FDA (2021), ‘E9(R1) statistical principles for clinical trials: addendum: estimands and sensitivity analysis in clinical trials’, Guidance for Industry.
    https://shorturl.at/clJL5
  • Fisher & Kennedy (2021) Fisher, A. & Kennedy, E. H. (2021), ‘Visually communicating and teaching intuition for influence functions’, The American Statistician 75(2), 162–172.
  • Foltynie et al. (2023) Foltynie, T., Gandhi, S., Gonzalez-Robles, C., Zeissler, M.-L., Mills, G., Barker, R., Carpenter, J., Schrag, A., Schapira, A., Bandmann, O. et al. (2023), ‘Towards a multi-arm multi-stage platform trial of disease modifying approaches in Parkinson’s disease’, Brain 146(7), 2717–2722.
  • Gravestock et al. (2017) Gravestock, I., Held, L. & consortium, C.-N. (2017), ‘Adaptive power priors with empirical Bayes for clinical trials’, Pharmaceutical Statistics 16(5), 349–360.
  • Han et al. (2017) Han, B., Zhan, J., John Zhong, Z., Liu, D. & Lindborg, S. (2017), ‘Covariate-adjusted borrowing of historical control data in randomized clinical trials’, Pharm Stat. 16.
    https://doi.org/10.1002/pst.1815
  • Hayward et al. (2021) Hayward, G., Butler, C. C., Yu, L.-M., Saville, B. R., Berry, N., Dorward, J., Gbinigie, O., Van Hecke, O., Ogburn, E., Swayze, H. et al. (2021), ‘Platform randomised trial of interventions against COVID-19 in older people (PRINCIPLE): protocol for a randomised, controlled, open-label, adaptive platform, trial of community treatment of COVID-19 syndromic illness in people at higher risk’, BMJ Open 11(6), e046799.
  • Hines et al. (2022) Hines, O., Dukes, O., Diaz-Ordaz, K. & Vansteelandt, S. (2022), ‘Demystifying statistical learning based on efficient influence functions’, The American Statistician 76(3), 292–304.
  • Hirshberg & Wager (2021) Hirshberg, D. A. & Wager, S. (2021), ‘Augmented minimax linear estimation’, The Annals of Statistics 49(6), 3206–3227.
  • Hobbs et al. (2011) Hobbs, B. P., Carlin, B. P., Mandrekar, S. J. & Sargent, D. J. (2011), ‘Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials’, Biometrics. 67.
    https://doi.org/10.1111/j.1541-0420.2011.01564.x
  • Hupf et al. (2021) Hupf, B., Bunn, V., Lin, J. & Dong, C. (2021), ‘Bayesian semiparametric meta-analytic-predictive prior for historical control borrowing in clinical trials’, Stat Med. 40.
    https://doi.org/10.1002/sim.8970
  • Ibrahim & Chen (2000) Ibrahim, J. G. & Chen, M. H. (2000), ‘Power prior distributions for regression models’, Stat Sci. 15.
    https://doi.org/10.1214/ss/1009212673
  • International Council for Harmonisation (2017) International Council for Harmonisation (2017), ‘ICH harmonised guideline E9(R1): Estimands and sensitivity analysis in clinical trials’.
    https://shorturl.at/arQT6
  • Jiang et al. (2023) Jiang, L., Nie, L. & Yuan, Y. (2023), ‘Elastic priors to dynamically borrow information from historical data in clinical trials’, Biometrics 79(1), 49–60.
  • Kalil et al. (2021) Kalil, A. C., Patterson, T. F., Mehta, A. K., Tomashek, K. M., Wolfe, C. R., Ghazaryan, V., Marconi, V. C., Ruiz-Palacios, G. M., Hsieh, L., Kline, S. et al. (2021), ‘Baricitinib plus remdesivir for hospitalized adults with COVID-19’, New England Journal of Medicine 384(9), 795–807.
  • Kennedy (2022) Kennedy, E. H. (2022), ‘Semiparametric doubly robust targeted double machine learning: a review’, arXiv preprint arXiv:2203.06469.
  • Kennedy et al. (2021) Kennedy, E. H., Balakrishnan, S. & Wasserman, L. (2021), ‘Semiparametric counterfactual density estimation’, arXiv preprint arXiv:2102.12034.
  • Koenig et al. (2024) Koenig, F., Spiertz, C., Millar, D., Rodríguez-Navarro, S., Machín, N., Van Dessel, A., Genescà, J., Pericàs, J. M., Posch, M., Sánchez-Montalva, A. et al. (2024), ‘Current state-of-the-art and gaps in platform trials: 10 things you should know, insights from EU-PEARL’, eClinicalMedicine 67.
  • Lee & Wason (2020) Lee, K. M. & Wason, J. (2020), ‘Including non-concurrent control patients in the analysis of platform trials: is it worth it?’, BMC Medical Research Methodology 20(1), 1–12.
  • Luedtke et al. (2019) Luedtke, A., Carone, M. & van der Laan, M. J. (2019), ‘An omnibus non-parametric test of equality in distribution for unknown functions’, Journal of the Royal Statistical Society Series B: Statistical Methodology 81(1), 75–99.
  • Neuenschwander et al. (2009) Neuenschwander, B., Branson, M. & Spiegelhalter, D. J. (2009), ‘A note on the power prior’, Stat Med. 28.
    https://doi.org/10.1002/sim.3722
  • Normington et al. (2020) Normington, J., Zhu, J., Mattiello, F., Sarkar, S. & Carlin, B. (2020), ‘An efficient Bayesian platform trial design for borrowing adaptively from historical control data in lymphoma’, Contemp Clin Trials. 89.
    https://doi.org/10.1016/j.cct.2019.105890
  • Park et al. (2022) Park, J. J., Detry, M. A., Murthy, S., Guyatt, G. & Mills, E. J. (2022), ‘How to use and interpret the results of a platform trial: users’ guide to the medical literature’, JAMA 327(1), 67–74.
  • Pearl (1995) Pearl, J. (1995), ‘Causal diagrams for empirical research’, Biometrika 82(4), 669–688.
  • Pearl (2010) Pearl, J. (2010), ‘An introduction to causal inference’, The International Journal of Biostatistics 6(2), 7.
  • Saville et al. (2022) Saville, B. R., Berry, D. A., Berry, N. S., Viele, K. & Berry, S. M. (2022), ‘The Bayesian time machine: accounting for temporal drift in multi-arm platform trials’, Clinical Trials 19(5), 490–501.
  • Schmidli et al. (2014) Schmidli, H., Gsteiger, S., Roychoudhury, S., O’Hagan, A., Spiegelhalter, D. & Neuenschwander, B. (2014), ‘Robust meta-analytic-predictive priors in clinical trials with historical control information’, Biometrics 70(4), 1023–1032.
  • Schmidli et al. (2020) Schmidli, H., Häring, D. A., Thomas, M., Cassidy, A., Weber, S. & Bretz, F. (2020), ‘Beyond randomized clinical trials: Use of external controls’, Clin Pharmacol Ther. 107.
    https://doi.org/10.1002/cpt.1723
  • Shin & Antonelli (2023) Shin, H. & Antonelli, J. (2023), ‘Improved inference for doubly robust estimators of heterogeneous treatment effects’, Biometrics 79(4), 3140–3152.
  • Sridhara et al. (2022) Sridhara, R., Marchenko, O., Jiang, Q., Pazdur, R., Posch, M., Berry, S., Theoret, M., Shen, Y. L., Gwise, T., Hess, L. et al. (2022), ‘Use of nonconcurrent common control in master protocols in oncology trials: report of an American Statistical Association Biopharmaceutical Section open forum discussion’, Statistics in Biopharmaceutical Research 14(3), 353–357.
  • Viele et al. (2014) Viele, K., Berry, S., Neuenschwander, B., Amzal, B., Chen, F. & Enas, N. (2014), ‘Use of historical control data for assessing treatment effects in clinical trials’, Pharm Stat. 13.
    https://doi.org/10.1002/pst.1589
  • Wei et al. (2024) Wei, W., Blaha, O., Esserman, D., Zelterman, D., Kane, M., Liu, R. & Lin, J. (2024), ‘A Bayesian platform trial design with hybrid control based on multisource exchangeability modelling’, Statistics in Medicine 43(12), 2439–2451.
  • Wells et al. (2012) Wells, A., Fisher, P., Myers, S., Wheatley, J., Patel, T. & Brewin, C. R. (2012), ‘Metacognitive therapy in treatment-resistant depression: A platform trial’, Behaviour Research and Therapy 50(6), 367–373.
  • Woodcock & LaVange (2017) Woodcock, J. & LaVange, L. M. (2017), ‘Master protocols to study multiple therapies, multiple diseases, or both’, New England Journal of Medicine 377(1), 62–70.
  • Yuan et al. (2019) Yuan, J., Liu, J., Zhu, R., Lu, Y. & Palm, U. (2019), ‘Design of randomized controlled confirmatory trials using historical control data to augment sample size for concurrent controls’, J Biopharm Stat. 29.
    https://doi.org/10.1080/10543406.2018.1559853

SUPPLEMENTARY MATERIAL
Identification and estimation of causal effects using non-concurrent controls in platform trials
Michele Santacatterina, Federico Macchiavelli Giron, Xinyi Zhang, and Iván Díaz

Division of Biostatistics, Department of Population Health,

New York University School of Medicine,

New York, NY, 10016

Proofs and M-estimation details:

Proofs of Theorems 1, 2, and 3, and M-estimation details for estimators based on outcome regression and parametric weighting.

Proofs and M-estimation details

Proof of Theorem 1

Figure 6: DAG associated with the structural equation model in equation (6), with nodes $E$, $W$, $A$, $V_k$, and $Y(k)$.

$$
\begin{aligned}
E_i &= f_E(U_{E,i}),\\
W_i &= f_W(E_i, U_{W,i}),\\
V_{k,i} &= f_{V_k}(E_i, U_V),\\
A_i &= f_A(V_i, W_i, U_{A,i}),\\
Y_i(k) &= f_{Y(k)}(k, W_i, E_i, U_{Y,i}).
\end{aligned}
\tag{6}
$$

Since we are interested in the effect of $A$ on $Y$, and in using non-concurrent controls ($V_k = 0$), we study paths from $A$ to $Y(k)$ and then apply d-separation. We start by listing the paths from $A$ to $Y(k)$, together with the sets of variables that block each path.

| Path | Blocking sets |
|------|---------------|
| $A \leftarrow V_k \leftarrow E \rightarrow Y(k)$ | $\{V_k\}$; $\{E\}$; $\{V_k, E\}$ |
| $A \leftarrow V_k \leftarrow E \rightarrow W \rightarrow Y(k)$ | $\{V_k\}$; $\{E\}$; $\{W\}$; $\{V_k, E\}$; $\{V_k, W\}$; $\{E, W\}$; $\{V_k, W, E\}$ |
| $A \leftarrow W \rightarrow Y(k)$ | $\{W\}$ |
| $A \leftarrow W \leftarrow E \rightarrow Y(k)$ | $\{W\}$; $\{E\}$; $\{W, E\}$ |
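The path analysis above can be checked mechanically. The sketch below encodes the DAG implied by the structural equation model (edges $E \to W$, $E \to V_k$, $E \to Y(k)$, $V_k \to A$, $W \to A$, $W \to Y(k)$), enumerates all paths between $A$ and $Y(k)$, and tests whether a candidate conditioning set blocks every one of them. The node labels `"V"` and `"Y"` and all function names are illustrative choices, not notation from the paper.

```python
# Directed edges (parent, child); "V" stands for V_k and "Y" for Y(k).
EDGES = {("E", "W"), ("E", "V"), ("E", "Y"), ("V", "A"), ("W", "A"), ("W", "Y")}

def neighbors(node):
    """Nodes adjacent to `node`, ignoring edge direction."""
    return {c for p, c in EDGES if p == node} | {p for p, c in EDGES if c == node}

def simple_paths(start, end, path=None):
    """Yield every path from start to end that visits no node twice."""
    path = [start] if path is None else path
    if path[-1] == end:
        yield path
        return
    for nxt in sorted(neighbors(path[-1])):
        if nxt not in path:
            yield from simple_paths(start, end, path + [nxt])

def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    out, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for p, c in EDGES:
            if p == cur and c not in out:
                out.add(c)
                stack.append(c)
    return out

def is_blocked(path, z):
    """d-separation rule for one path given the conditioning set z."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider:
            # A collider blocks unless it or one of its descendants is in z.
            if mid not in z and not (descendants(mid) & z):
                return True
        elif mid in z:
            # A non-collider blocks when it is conditioned on.
            return True
    return False

def d_separates(z):
    """True if z blocks every path between A and Y(k)."""
    return all(is_blocked(p, set(z)) for p in simple_paths("A", "Y"))

for path in simple_paths("A", "Y"):
    print(" - ".join(path))
print(d_separates({"V", "W", "E"}), d_separates({"W"}))
```

In this graph the full set $\{V_k, W, E\}$ blocks all four paths, whereas conditioning on $W$ alone leaves the path through $V_k$ and $E$ open.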

By applying d-separation, the set $\{V_k, W, E\}$ blocks every path from $A$ to $Y(k)$. This leads to the following assumptions:

A1 (Weak A-ignorability).

Let $a = 0, \ldots, K$. Assume
$\mathsf{E}[Y(k) \mid W=w, E=e, V_k=v] = \mathsf{E}[Y(k) \mid A=k, W=w, E=e, V_k=v]$.

A2 (Consistency).

Assume
$\mathsf{P}(Y(k) \mid A = k, W = w, E = e, V_k = v) = \mathsf{P}(Y \mid A = k, W = w, E = e, V_k = v)$.

A3 (Positivity of treatment assignment mechanism among concurrent units).

Assume
$\mathsf{P}(A = k \mid W = w, E = e, V_k = 1) > 0$ for all $w$ and $e$ such that $V_k = 1$.

A4 (Positivity of shared arm assignment mechanism among all controls).

Assume
$\mathsf{P}(A = 0 \mid W = w, E = e) > 0$ for all $w$ and $e$.

A5 (Pooling concurrent and non-concurrent controls).

Assume
$\mathsf{E}(Y \mid A = 0, W = w, E = e, V_k = 1) = \mathsf{E}(Y \mid A = 0, W = w, E = e)$ for all $e$ such that $V_k = 1$.

A6 (Conditional exchangeability of outcome mechanism among the treated).

Assume
$\mathsf{E}(Y \mid A = k, W = w, E = e, V_k = 1) = \mathsf{E}(Y \mid A = k, W = w, E = e)$ for all $e$.

Identification of concurrent ATE

Recall that

Definition 3 (Conditional and marginal average treatment effect of treatment arm $k$ compared to control among concurrent population).
\begin{align*}
\mathsf{cCATE}(k, w, e) &= \mathsf{E}[Y(k) - Y(0) \mid V_k = 1, W = w, E = e] \\
\mathsf{cATE}(k) &= \mathsf{E}[\mathsf{cCATE}(k, W, E) \mid V_k = 1].
\end{align*}

Identification based on the G-formula.

Proof. We start with treatment $k$. For brevity, we write $W, E$ in place of $W = w, E = e$, and (IE) denotes iterated expectation.

\begin{align*}
\mathsf{E}(Y(k) \mid V_k = 1)
&= \mathsf{E}(\mathsf{E}(Y(k) \mid V_k = 1, W, E) \mid V_k = 1) && \text{by (IE)} \\
&= \mathsf{E}(\mathsf{E}(Y(k) \mid A = k, V_k = 1, W, E) \mid V_k = 1) && \text{by (A1)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = k, V_k = 1, W, E) \mid V_k = 1) && \text{by (A2, A3)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}(\mathds{1}[V_k = 1] \, \mathsf{E}(Y \mid A = k, V_k = 1, W, E))
\end{align*}

We now show the proof under treatment 0.

\begin{align*}
\mathsf{E}(Y(0) \mid V_k = 1)
&= \mathsf{E}(\mathsf{E}(Y(0) \mid V_k = 1, W, E) \mid V_k = 1) && \text{by (IE)} \\
&= \mathsf{E}(\mathsf{E}(Y(0) \mid A = 0, V_k = 1, W, E) \mid V_k = 1) && \text{by (A1)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = 0, V_k = 1, W, E) \mid V_k = 1) && \text{by (A2, A4)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = 0, W, E) \mid V_k = 1) && \text{by (A5)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}(\mathds{1}[V_k = 1] \, \mathsf{E}(Y \mid A = 0, W, E))
\end{align*}

Consequently, under (A1)-(A4), $\mathsf{cCATE}(k, w, e)$ is non-parametrically identified as

\begin{equation*}
\mathsf{E}(Y \mid A = k, V_k = 1, W, E) - \mathsf{E}(Y \mid A = 0, V_k = 1, W, E). \tag{7}
\end{equation*}

In addition, under (A1)-(A5), $\mathsf{cCATE}(k, w, e)$ is identified as

\begin{equation*}
\mathsf{E}(Y \mid A = k, V_k = 1, W, E) - \mathsf{E}(Y \mid A = 0, W, E). \tag{8}
\end{equation*}

∎
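The two identification results (7) and (8) suggest plug-in estimators that differ only in which controls are used to fit the control outcome regression. The following is a minimal sketch under stated assumptions, not the authors' implementation: the data are simulated, arm $k$ is coded as `A == 1`, $V_k = \mathds{1}[E > 0.5]$ is deterministic, the outcome model is linear and correctly specified, and the simulated outcome has no period effect beyond $E$, so that (A5) holds. The function names `fit_ols` and `gformula_concurrent_ate` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def gformula_concurrent_ate(W, E, V, A, Y, pool_controls):
    """Plug-in estimate of cATE(k), averaging over units with V_k = 1.

    pool_controls=False fits the control regression on concurrent controls
    only, as in (7); pool_controls=True uses all controls, as in (8).
    """
    X = np.column_stack([np.ones_like(W), W, E])
    concurrent = V == 1
    treated = concurrent & (A == 1)
    controls = (A == 0) if pool_controls else (concurrent & (A == 0))
    beta_k = fit_ols(X[treated], Y[treated])
    beta_0 = fit_ols(X[controls], Y[controls])
    # Average the difference in predicted outcomes over the concurrent units.
    return float(np.mean(X[concurrent] @ (beta_k - beta_0)))


# Simulated platform-trial data (all values below are illustrative).
rng = np.random.default_rng(0)
n = 4000
E = rng.uniform(0, 1, n)                           # entry time
V = (E > 0.5).astype(int)                          # arm k available after t = 0.5
W = rng.normal(size=n)                             # baseline covariate
A = np.where(V == 1, rng.binomial(1, 0.5, n), 0)   # 1:1 randomization when V_k = 1
Y = 1.0 + 2.0 * A + W + 0.5 * E + rng.normal(size=n)  # true effect is 2

est_concurrent = gformula_concurrent_ate(W, E, V, A, Y, pool_controls=False)
est_pooled = gformula_concurrent_ate(W, E, V, A, Y, pool_controls=True)
print(est_concurrent, est_pooled)
```

In this simulation both estimators target the same quantity because (A5) holds by construction; if the control outcome mean shifted between periods in a way not captured by $E$, only the concurrent-only version would remain valid.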

Identification based on weighting.

Proof. We start with treatment $k$. For brevity, we write $W, E$ in place of $W = w, E = e$, and (IE) denotes iterated expectation.

\begin{align*}
\mathsf{E}(Y(k) \mid V_k = 1)
&= \mathsf{E}(\mathsf{E}(Y(k) \mid V_k = 1, W, E) \mid V_k = 1) && \text{by (IE)} \\
&= \mathsf{E}(\mathsf{E}(Y(k) \mid A = k, V_k = 1, W, E) \mid V_k = 1) && \text{by (A1)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = k, V_k = 1, W, E) \mid V_k = 1) && \text{by (A2)} \\
&= \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = k, V_k = 1] \, Y}{\mathsf{P}(A = k \mid V_k = 1, W, E)} \,\Big|\, W, E\right) \,\Big|\, V_k = 1\right) && \text{by (A3)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathds{1}[V_k = 1] \, \mathsf{E}\left(\frac{\mathds{1}[A = k, V_k = 1] \, Y}{\mathsf{P}(A = k \mid V_k = 1, W, E)} \,\Big|\, W, E\right)\right) \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = k, V_k = 1] \, Y}{\mathsf{P}(A = k \mid V_k = 1, W, E)} \,\Big|\, W, E\right)\right) \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = k] \, Y \, \mathsf{P}(V_k = 1 \mid W, E)}{\mathsf{P}(A = k \mid V_k = 1, W, E)} \,\Big|\, W, E\right)\right) && \text{by (IE)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\frac{\mathds{1}[A = k] \, Y \, \mathsf{P}(V_k = 1 \mid W, E)}{\mathsf{P}(A = k \mid V_k = 1, W, E)}\right)
\end{align*}

Note that if $V_k$ is deterministic, then $\mathsf{P}(V_k = 1 \mid W, E) = \mathds{1}[E > t] = \mathds{1}[V_k = 1]$ and therefore

\begin{equation*}
\frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\frac{\mathds{1}[A = k] \, Y \, \mathsf{P}(V_k = 1 \mid W, E)}{\mathsf{P}(A = k \mid V_k = 1, W, E)}\right) = \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\frac{\mathds{1}[A = k] \, Y \, \mathds{1}[V_k = 1]}{\mathsf{P}(A = k \mid V_k = 1, W, E)}\right).
\end{equation*}

We now show the proof under treatment 0.

\begin{align*}
\mathsf{E}(Y(0) \mid V_k = 1)
&= \mathsf{E}(\mathsf{E}(Y(0) \mid V_k = 1, W, E) \mid V_k = 1) && \text{by (IE)} \\
&= \mathsf{E}(\mathsf{E}(Y(0) \mid A = 0, V_k = 1, W, E) \mid V_k = 1) && \text{by (A1)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = 0, V_k = 1, W, E) \mid V_k = 1) && \text{by (A2)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = 0, W, E) \mid V_k = 1) && \text{by (A5)} \\
&= \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y}{\mathsf{P}(A = 0 \mid W, E)} \,\Big|\, W, E\right) \,\Big|\, V_k = 1\right) && \text{by (A4)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathds{1}[V_k = 1] \, \mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y}{\mathsf{P}(A = 0 \mid W, E)} \,\Big|\, W, E\right)\right) \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathsf{E}(\mathds{1}[V_k = 1] \mid W, E) \, \mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y}{\mathsf{P}(A = 0 \mid W, E)} \,\Big|\, W, E\right)\right) && \text{by (IE)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y \, \mathds{1}[V_k = 1]}{\mathsf{P}(A = 0 \mid W, E)} \,\Big|\, W, E\right)\right) \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y \, \mathsf{E}(\mathds{1}[V_k = 1] \mid W, E)}{\mathsf{P}(A = 0 \mid W, E)} \,\Big|\, W, E\right)\right) && \text{by (IE)} \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y \, \mathsf{P}(V_k = 1 \mid W, E)}{\mathsf{P}(A = 0 \mid W, E)} \,\Big|\, W, E\right)\right) \\
&= \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y \, \mathsf{P}(V_k = 1 \mid W, E)}{\mathsf{P}(A = 0 \mid W, E)}\right)
\end{align*}

Note that if $V_k$ is deterministic, then $\mathsf{P}(V_k = 1 \mid W, E) = \mathds{1}[E > t] = \mathds{1}[V_k = 1]$ and therefore

\begin{equation*}
\frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y \, \mathsf{P}(V_k = 1 \mid W, E)}{\mathsf{P}(A = 0 \mid W, E)}\right) = \frac{1}{\mathsf{P}(V_k = 1)} \mathsf{E}\left(\frac{\mathds{1}[A = 0] \, Y \, \mathds{1}[V_k = 1]}{\mathsf{P}(A = 0 \mid W, E)}\right).
\end{equation*}

∎
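The weighted identification formulas for the deterministic-$V_k$ case translate directly into sample averages. The sketch below is illustrative only and makes several assumptions that are not in the text: the data are simulated, arm $k$ is coded as `A == 1`, $V_k = \mathds{1}[E > 0.5]$, and the assignment probabilities are known by design (0.5 for each arm while arm $k$ is open, and probability 1 of control before that). The function name `ipw_concurrent_ate` is hypothetical.

```python
import numpy as np


def ipw_concurrent_ate(V, A, Y, pi_k, pi_0):
    """Weighted estimate of E[Y(k) - Y(0) | V_k = 1] for deterministic V_k.

    pi_k: P(A = k | V_k = 1, W, E) evaluated for each unit
    pi_0: P(A = 0 | W, E) evaluated for each unit
    """
    p_v = V.mean()  # sample estimate of P(V_k = 1)
    mu_k = np.mean((A == 1) * V * Y / pi_k) / p_v
    mu_0 = np.mean((A == 0) * V * Y / pi_0) / p_v
    return float(mu_k - mu_0)


# Simulated platform-trial data (all values below are illustrative).
rng = np.random.default_rng(1)
n = 20000
E = rng.uniform(0, 1, n)                           # entry time
V = (E > 0.5).astype(int)                          # arm k available after t = 0.5
W = rng.normal(size=n)                             # baseline covariate
A = np.where(V == 1, rng.binomial(1, 0.5, n), 0)   # 1:1 randomization when V_k = 1
Y = 1.0 + 2.0 * A + W + 0.5 * E + rng.normal(size=n)  # true effect is 2

# Known design probabilities. Values of pi_k for units with V_k = 0 are
# never used because those units are multiplied by the indicator V = 0.
pi_k = np.full(n, 0.5)
pi_0 = np.where(V == 1, 0.5, 1.0)

ipw_estimate = ipw_concurrent_ate(V, A, Y, pi_k, pi_0)
print(ipw_estimate)
```

Because the indicator $\mathds{1}[V_k = 1]$ multiplies both terms, non-concurrent controls receive zero weight in this form of the estimator, so any contribution from them has to enter through how $\mathsf{P}(A = 0 \mid W, E)$ is modeled.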

Proof of Theorem 2

Identification of ATE

Recall that

Definition 4 (Conditional and marginal average treatment effect of treatment arm $k$ compared to control).
\begin{align*}
\mathsf{CATE}(k, w, e) &= \mathsf{E}[Y(k) - Y(0) \mid W = w, E = e] \\
\mathsf{ATE}(k) &= \mathsf{E}[\mathsf{CATE}(k, W, E)].
\end{align*}

Identification based on the G-formula.

Proof. We show the proof for treatment $0$. For brevity, we write $W, E$ in place of $W = w, E = e$.

\begin{align*}
\mathsf{E}(Y(0)) &= \mathsf{E}(\mathsf{E}(Y(0) \mid W, E)) && \text{by (IE)} \\
&= \mathsf{E}(\mathsf{E}(Y(0) \mid A = 0, W, E)) && \text{by (A1)} \\
&= \mathsf{E}(\mathsf{E}(Y \mid A = 0, W, E)) && \text{by (A2, A4)}
\end{align*}

The proof for treatment $k$ follows the steps used to identify $\mathsf{E}(Y(k) \mid V_k = 1)$ in the section above, and then invokes (A6) in order to marginalize over concurrent and non-concurrent controls combined. Consequently, under (A1)-(A6), $\mathsf{CATE}(k, w, e)$ is identified as

\begin{equation*}
\mathsf{E}(Y \mid A = k, W, E, V_k = 1) - \mathsf{E}(Y \mid A = 0, W, E). \tag{9}
\end{equation*}

∎

M-estimation details

Here we provide details on the M-estimation approach used to obtain asymptotic variances for the outcome regression and weighted estimators. Recall that $Z_i$ represents the data for experimental unit $i$, i.e., $Z_i = (E_i, W_i, V_{k,i}, A_i, Y_i) \sim \mathsf{P}$, and let $X_i = (E_i, W_i)$.

Outcome regression

$\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}$. This estimator considers only concurrent controls. Define $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}} = \mu_k - \mu_0$, where $\mu_k$ and $\mu_0$ are the mean outcomes under treatment $k$ and under control in the concurrent population. We start with the controls, taking $\theta_0 = (\beta_0, \mu_0)$ and the following estimating equations

\begin{equation*}
\sum_{i=1}^{n} h(Z_i, \theta_0) = \sum_{i=1}^{n} \begin{pmatrix} h_1(Z_i, \beta_0) \\ h_2(Z_i, \mu_0) \end{pmatrix} = 0
\end{equation*}

where h1⁒(Zi,Ξ²0)=Xi⊀⁒Vi⁒(1βˆ’Ai)⁒(Yiβˆ’Xi⁒β0)subscriptβ„Ž1subscript𝑍𝑖subscript𝛽0superscriptsubscript𝑋𝑖topsubscript𝑉𝑖1subscript𝐴𝑖subscriptπ‘Œπ‘–subscript𝑋𝑖subscript𝛽0h_{1}(Z_{i},\beta_{0})=X_{i}^{\top}V_{i}(1-A_{i})(Y_{i}-X_{i}\beta_{0})italic_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Ξ² start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( 1 - italic_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ( italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_Ξ² start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and h2⁒(Xi,ΞΌ0)=Vi⁒(Zi⁒β0βˆ’ΞΌ0)subscriptβ„Ž2subscript𝑋𝑖subscriptπœ‡0subscript𝑉𝑖subscript𝑍𝑖subscript𝛽0subscriptπœ‡0h_{2}(X_{i},\mu_{0})=V_{i}(Z_{i}\beta_{0}-\mu_{0})italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_Ξ² start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) are the score functions for the model of the conditional mean and the the marginal mean under control, respectively. We consider the following Jacobian matrix of the estimating equations,

𝐆¯⁒(ΞΈ^0)¯𝐆subscript^πœƒ0\displaystyle\overline{\mathbf{G}}(\hat{\theta}_{0})overΒ― start_ARG bold_G end_ARG ( over^ start_ARG italic_ΞΈ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) =βˆ’1nβˆ‘i=1nβˆ‚h⁒(Zi,ΞΈ0)βˆ‚ΞΈ0⊀|ΞΈ0=ΞΈ^0\displaystyle=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial h(Z_{i},\theta_{0})}{% \partial\theta_{0}^{\top}}\biggr{\rvert}_{\theta_{0}=\hat{\theta}_{0}}= - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG βˆ‘ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT divide start_ARG βˆ‚ italic_h ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ΞΈ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) end_ARG start_ARG βˆ‚ italic_ΞΈ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT italic_ΞΈ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = over^ start_ARG italic_ΞΈ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT
=1nβ’βˆ‘i=1n(𝐆¯11πŸŽπ†Β―21𝐆¯22)absent1𝑛superscriptsubscript𝑖1𝑛matrixsubscript¯𝐆110subscript¯𝐆21subscript¯𝐆22\displaystyle=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}\overline{\mathbf{G}}_{1% 1}&\mathbf{0}\\ \overline{\mathbf{G}}_{21}&\overline{\mathbf{G}}_{22}\end{pmatrix}= divide start_ARG 1 end_ARG start_ARG italic_n end_ARG βˆ‘ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( start_ARG start_ROW start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT end_CELL start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG )
=1nβ’βˆ‘i=1n(Xi⊀⁒Vi⁒(1βˆ’Ai)⁒Xi0βˆ’Vi⁒XiVi)absent1𝑛superscriptsubscript𝑖1𝑛matrixsuperscriptsubscript𝑋𝑖topsubscript𝑉𝑖1subscript𝐴𝑖subscript𝑋𝑖0subscript𝑉𝑖subscript𝑋𝑖subscript𝑉𝑖\displaystyle=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}X_{i}^{\top}V_{i}(1-A_{i% })X_{i}&0\\ -V_{i}X_{i}&V_{i}\end{pmatrix}= divide start_ARG 1 end_ARG start_ARG italic_n end_ARG βˆ‘ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( 1 - italic_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL - italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG )

We then constructed the following influence functions

φ⁒(Zi,Ξ²^0)πœ‘subscript𝑍𝑖subscript^𝛽0\displaystyle\varphi(Z_{i},\hat{\beta}_{0})italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ² end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) =𝐆¯11βˆ’1⁒h1⁒(Zi,Ξ²^0)absentsuperscriptsubscript¯𝐆111subscriptβ„Ž1subscript𝑍𝑖subscript^𝛽0\displaystyle=\overline{\mathbf{G}}_{11}^{-1}h_{1}(Z_{i},\hat{\beta}_{0})= overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ² end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT )
φ⁒(Zi,ΞΌ^0)πœ‘subscript𝑍𝑖subscript^πœ‡0\displaystyle\varphi(Z_{i},\hat{\mu}_{0})italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_ΞΌ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) =𝐆¯22βˆ’1⁒(h2⁒(Zi,ΞΌ0)+(βˆ’π†Β―21)⁒φ⁒(Zi,Ξ²^0)),absentsuperscriptsubscript¯𝐆221subscriptβ„Ž2subscript𝑍𝑖subscriptπœ‡0subscript¯𝐆21πœ‘subscript𝑍𝑖subscript^𝛽0\displaystyle=\overline{\mathbf{G}}_{22}^{-1}\left(h_{2}(Z_{i},\mu_{0})+(-% \overline{\mathbf{G}}_{21})\varphi(Z_{i},\hat{\beta}_{0})\right),= overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) + ( - overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT ) italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ² end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ) ,

where $\hat{\beta}_{0}$ was obtained by ordinary least squares. We conducted a similar analysis for $\theta_{k}=(\beta_{k},\mu_{k})$. Finally, we obtained the variance of $\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}$ as

\begin{align*}
\hat{V}(\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}) &=\frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n}\varphi(Z_{i},\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}})\varphi(Z_{i},\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}})^{\top}\right),
\end{align*}

where $\varphi(Z_{i},\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}})=\varphi(Z_{i},\hat{\mu}_{k})-\varphi(Z_{i},\hat{\mu}_{0})$.
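As an illustration, the calculation above can be sketched numerically. The following Python function is a minimal sketch under stated assumptions, not the code used for the analyses in this paper: it assumes a binary indicator `A` for arm $k$, a binary concurrency indicator `V`, and a design matrix `X` whose first column is an intercept. The function name `or_oc_variance` is hypothetical, and the treated-arm estimating functions are assumed to mirror those of the control arm with $V_{i}(1-A_{i})$ replaced by $V_{i}A_{i}$.

```python
import numpy as np


def or_oc_variance(X, A, V, Y):
    """Outcome-regression cATE on concurrent units with an
    influence-function variance (illustrative sketch only).

    X : (n, p) design matrix including an intercept column
    A : (n,) 1.0 if assigned to arm k, 0.0 if control
    V : (n,) 1.0 if the unit is concurrent with arm k, else 0.0
    Y : (n,) outcome
    """
    n = len(Y)
    mu, phi_mu = {}, {}
    # S selects the rows used to fit each arm-specific linear model.
    for arm, S in (("0", V * (1 - A)), ("k", V * A)):
        XS = X * S[:, None]
        G11 = XS.T @ X / n                       # block G_11
        beta = np.linalg.solve(G11, XS.T @ Y / n)  # OLS: sum_i h1 = 0
        fitted = X @ beta
        # Marginal mean over concurrent units: sum_i h2 = 0.
        mu[arm] = np.sum(V * fitted) / np.sum(V)
        h1 = X * (S * (Y - fitted))[:, None]     # (n, p)
        h2 = V * (fitted - mu[arm])              # (n,)
        G21 = -(V[:, None] * X).mean(axis=0)     # block G_21, shape (p,)
        G22 = V.mean()                           # block G_22
        phi_beta = np.linalg.solve(G11, h1.T).T  # rows are G11^{-1} h1_i
        # phi_mu = G22^{-1} (h2 + (-G21) phi_beta)
        phi_mu[arm] = (h2 - phi_beta @ G21) / G22
    phi = phi_mu["k"] - phi_mu["0"]
    estimate = mu["k"] - mu["0"]
    variance = np.mean(phi ** 2) / n
    return estimate, variance
```

A Wald-type interval would then be `estimate ± 1.96 * np.sqrt(variance)`; the normal approximation behind that interval is itself an assumption of this sketch.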

𝖼𝖠𝖳𝖀^ORallsuperscriptsubscript^𝖼𝖠𝖳𝖀ORall\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}over^ start_ARG sansserif_cATE end_ARG start_POSTSUBSCRIPT OR end_POSTSUBSCRIPT start_POSTSUPERSCRIPT all end_POSTSUPERSCRIPT

. This estimator consider both concurrent and non concurrent controls when estimating 𝖀⁒(Y∣A=0,W=w,E=e)𝖀formulae-sequenceconditionalπ‘Œπ΄0formulae-sequenceπ‘Šπ‘€πΈπ‘’\mathsf{E}(Y\mid A=0,W=w,E=e)sansserif_E ( italic_Y ∣ italic_A = 0 , italic_W = italic_w , italic_E = italic_e ). Hence, the analysis for 𝖼𝖠𝖳𝖀^ORallsuperscriptsubscript^𝖼𝖠𝖳𝖀ORall\hat{\mathsf{cATE}}_{\text{OR}}^{\text{all}}over^ start_ARG sansserif_cATE end_ARG start_POSTSUBSCRIPT OR end_POSTSUBSCRIPT start_POSTSUPERSCRIPT all end_POSTSUPERSCRIPT looks the same as that for 𝖼𝖠𝖳𝖀^ORocsuperscriptsubscript^𝖼𝖠𝖳𝖀ORoc\hat{\mathsf{cATE}}_{\text{OR}}^{\text{oc}}over^ start_ARG sansserif_cATE end_ARG start_POSTSUBSCRIPT OR end_POSTSUBSCRIPT start_POSTSUPERSCRIPT oc end_POSTSUPERSCRIPT only changing the estimating equation for 𝖀⁒(Y∣A=0,W=w,E=e)𝖀formulae-sequenceconditionalπ‘Œπ΄0formulae-sequenceπ‘Šπ‘€πΈπ‘’\mathsf{E}(Y\mid A=0,W=w,E=e)sansserif_E ( italic_Y ∣ italic_A = 0 , italic_W = italic_w , italic_E = italic_e ), i.e., h1⁒(Zi,Ξ²0)=Zi⊀⁒(1βˆ’Ai)⁒(Yiβˆ’Zi⁒β0)subscriptβ„Ž1subscript𝑍𝑖subscript𝛽0superscriptsubscript𝑍𝑖top1subscript𝐴𝑖subscriptπ‘Œπ‘–subscript𝑍𝑖subscript𝛽0h_{1}(Z_{i},\beta_{0})=Z_{i}^{\top}(1-A_{i})(Y_{i}-Z_{i}\beta_{0})italic_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_Ξ² start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) = italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT ( 1 - italic_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ( italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_Ξ² start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ), while the conditional mean of the outcome among the treated remains computed within only concurrent, i.e., 𝖀⁒(Y∣A=k,Vk=1,W=w,E=e)𝖀formulae-sequenceconditionalπ‘Œπ΄π‘˜formulae-sequencesubscriptπ‘‰π‘˜1formulae-sequenceπ‘Šπ‘€πΈπ‘’\mathsf{E}(Y\mid 
A=k,V_{k}=1,W=w,E=e)sansserif_E ( italic_Y ∣ italic_A = italic_k , italic_V start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 , italic_W = italic_w , italic_E = italic_e ). Specifically, we started by considering controls, ΞΈ0=(Ξ±0,ΞΌ0)subscriptπœƒ0subscript𝛼0subscriptπœ‡0\theta_{0}=(\alpha_{0},\mu_{0})italic_ΞΈ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = ( italic_Ξ± start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and the following estimating equations

\begin{align*}
\sum_{i=1}^{n}h(Z_{i},\theta_{0}) &=\sum_{i=1}^{n}\begin{pmatrix}h_{1}(Z_{i},\alpha_{0})\\ h_{2}(Z_{i},\mu_{0})\end{pmatrix}=0
\end{align*}

where $h_{1}(Z_{i},\alpha_{0})=X_{i}^{\top}(1-A_{i})(Y_{i}-X_{i}\alpha_{0})$ and $h_{2}(Z_{i},\mu_{0})=V_{i}(X_{i}\alpha_{0}-\mu_{0})$ are the score functions for the model of the conditional mean and for the marginal mean under control, respectively. For the treated units, we considered $\theta_{k}=(\beta_{k},\mu_{k})$ and the following estimating equations

\begin{align*}
\sum_{i=1}^{n}h(Z_{i},\theta_{k}) &=\sum_{i=1}^{n}\begin{pmatrix}h_{1}(Z_{i},\beta_{k})\\ h_{2}(Z_{i},\mu_{k})\end{pmatrix}=0
\end{align*}

where $h_{1}(Z_{i},\beta_{k})=X_{i}^{\top}V_{i}A_{i}(Y_{i}-X_{i}\beta_{k})$ and $h_{2}(Z_{i},\mu_{k})=V_{i}(X_{i}\beta_{k}-\mu_{k})$. The derivation of the Jacobian matrix of these estimating equations is similar to the one above.
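The only change relative to the concurrent-only analysis is the set of controls entering the first estimating equation. The sketch below makes that switch explicit for the control arm; it is illustrative only, the function name `control_mean_if` and the flag `use_all_controls` are hypothetical, and `A`, `V`, and `X` are assumed to be coded as a binary arm indicator, a binary concurrency indicator, and a design matrix with an intercept.

```python
import numpy as np


def control_mean_if(X, A, V, Y, use_all_controls=True):
    """Marginal control mean over concurrent units and its estimated
    influence function (illustrative sketch only).

    With use_all_controls=True the outcome model is fit on every control,
    h1 = X^T (1 - A)(Y - X alpha); otherwise only concurrent controls are
    used, h1 = X^T V (1 - A)(Y - X alpha).
    """
    n = len(Y)
    S = (1 - A) if use_all_controls else V * (1 - A)
    XS = X * S[:, None]
    G11 = XS.T @ X / n
    alpha = np.linalg.solve(G11, XS.T @ Y / n)   # OLS on the selected controls
    fitted = X @ alpha
    # In both variants the marginal mean averages over concurrent units only.
    mu0 = np.sum(V * fitted) / np.sum(V)
    h1 = X * (S * (Y - fitted))[:, None]
    h2 = V * (fitted - mu0)
    G21 = -(V[:, None] * X).mean(axis=0)
    G22 = V.mean()
    phi_alpha = np.linalg.solve(G11, h1.T).T
    phi_mu0 = (h2 - phi_alpha @ G21) / G22
    return mu0, phi_mu0
```

Subtracting `phi_mu0` from the treated-arm influence function and averaging the squared difference, as in the variance formula above, gives the corresponding variance estimate.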

Parametric inverse probability weighting

𝖼𝖠𝖳𝖀^IPWocsuperscriptsubscript^𝖼𝖠𝖳𝖀IPWoc\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}over^ start_ARG sansserif_cATE end_ARG start_POSTSUBSCRIPT IPW end_POSTSUBSCRIPT start_POSTSUPERSCRIPT oc end_POSTSUPERSCRIPT

. This estimator consider only concurrent controls. Let’s define 𝖼𝖠𝖳𝖀^IPWoc=ΞΌkβˆ’ΞΌ0superscriptsubscript^𝖼𝖠𝖳𝖀IPWocsubscriptπœ‡π‘˜subscriptπœ‡0\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}=\mu_{k}-\mu_{0}over^ start_ARG sansserif_cATE end_ARG start_POSTSUBSCRIPT IPW end_POSTSUBSCRIPT start_POSTSUPERSCRIPT oc end_POSTSUPERSCRIPT = italic_ΞΌ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT. We started by considering controls, ΞΈ=(Ξ·,ΞΌ0,ΞΌ1)πœƒπœ‚subscriptπœ‡0subscriptπœ‡1\theta=(\eta,\mu_{0},\mu_{1})italic_ΞΈ = ( italic_Ξ· , italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_ΞΌ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) and the following estimating equations

\begin{align*}
\sum_{i=1}^{n}h(Z_{i},\theta) &=\sum_{i=1}^{n}\begin{pmatrix}h_{1}(Z_{i},\eta)\\ h_{2}(Z_{i},\mu_{0})\\ h_{3}(Z_{i},\mu_{k})\end{pmatrix}=0
\end{align*}

where $h_{1}(Z_{i},\eta)=X_{i}^{\top}V_{i}(A_{i}-\pi_{i})$, $h_{2}(Z_{i},\mu_{0})=V_{i}(\gamma^{0}_{i}Y_{i}-\mu_{0})$, and $h_{3}(Z_{i},\mu_{k})=V_{i}(\gamma^{k}_{i}Y_{i}-\mu_{k})$ are the score functions for the model of the conditional probability of treatment and for the marginal means under control and treatment, respectively, and where $\gamma^{0}_{i}=\mathds{1}\{A_{i}=0\}/(1-\pi_{i})$, $\gamma^{k}_{i}=\mathds{1}\{A_{i}=k\}/\pi_{i}$, and $\pi_{i}=\frac{\exp(X_{i}^{\top}\eta)}{1+\exp(X_{i}^{\top}\eta)}$. We consider the following Jacobian matrix of the estimating equations,

𝐆¯⁒(ΞΈ^0)¯𝐆subscript^πœƒ0\displaystyle\overline{\mathbf{G}}(\hat{\theta}_{0})overΒ― start_ARG bold_G end_ARG ( over^ start_ARG italic_ΞΈ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) =βˆ’1nβˆ‘i=1nβˆ‚h⁒(Zi,ΞΈ0)βˆ‚ΞΈ0⊀|ΞΈ=ΞΈ^\displaystyle=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial h(Z_{i},\theta_{0})}{% \partial\theta_{0}^{\top}}\biggr{\rvert}_{\theta=\hat{\theta}}= - divide start_ARG 1 end_ARG start_ARG italic_n end_ARG βˆ‘ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT divide start_ARG βˆ‚ italic_h ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ΞΈ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) end_ARG start_ARG βˆ‚ italic_ΞΈ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT italic_ΞΈ = over^ start_ARG italic_ΞΈ end_ARG end_POSTSUBSCRIPT
=1nβ’βˆ‘i=1n(𝐆¯11πŸŽπŸŽπ†Β―21𝐆¯22πŸŽπ†Β―31πŸŽπ†Β―33)absent1𝑛superscriptsubscript𝑖1𝑛matrixsubscript¯𝐆1100subscript¯𝐆21subscript¯𝐆220subscript¯𝐆310subscript¯𝐆33\displaystyle=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}\overline{\mathbf{G}}_{1% 1}&\mathbf{0}&\mathbf{0}\\ \overline{\mathbf{G}}_{21}&\overline{\mathbf{G}}_{22}&\mathbf{0}\\ \overline{\mathbf{G}}_{31}&\mathbf{0}&\overline{\mathbf{G}}_{33}\\ \end{pmatrix}= divide start_ARG 1 end_ARG start_ARG italic_n end_ARG βˆ‘ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( start_ARG start_ROW start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT end_CELL start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT end_CELL start_CELL bold_0 end_CELL start_CELL overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 33 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG )
=1nβ’βˆ‘i=1n(Xi⊀⁒Vi⁒exp⁑(Xi⊀⁒η)1+exp(Xi⊀η)2⁒Xi00(1βˆ’Ai)⁒Vi⁒Xi⁒Yi⁒exp⁑(Xi⊀⁒η)Vi0Ai⁒Vi⁒Xi⁒Yi⁒exp⁑(βˆ’Xi⊀⁒η)0Vi)\displaystyle=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}X_{i}^{\top}V_{i}\frac{% \exp(X_{i}^{\top}\eta)}{1+\exp(X_{i}^{\top}\eta)^{2}}X_{i}&0&0\\ (1-A_{i})V_{i}X_{i}Y_{i}\exp(X_{i}^{\top}\eta)&V_{i}&0\\ A_{i}V_{i}X_{i}Y_{i}\exp(-X_{i}^{\top}\eta)&0&V_{i}\\ \end{pmatrix}= divide start_ARG 1 end_ARG start_ARG italic_n end_ARG βˆ‘ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( start_ARG start_ROW start_CELL italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT divide start_ARG roman_exp ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_Ξ· ) end_ARG start_ARG 1 + roman_exp ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_Ξ· ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL ( 1 - italic_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_exp ( italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_Ξ· ) end_CELL start_CELL italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL italic_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_Y start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_exp ( - italic_X start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT 
start_POSTSUPERSCRIPT ⊀ end_POSTSUPERSCRIPT italic_Ξ· ) end_CELL start_CELL 0 end_CELL start_CELL italic_V start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG )

We then constructed the following influence functions

φ⁒(Zi,Ξ·^)πœ‘subscript𝑍𝑖^πœ‚\displaystyle\varphi(Z_{i},\hat{\eta})italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ· end_ARG ) =𝐆¯11βˆ’1⁒h1⁒(Zi,Ξ·^),absentsuperscriptsubscript¯𝐆111subscriptβ„Ž1subscript𝑍𝑖^πœ‚\displaystyle=\overline{\mathbf{G}}_{11}^{-1}h_{1}(Z_{i},\hat{\eta}),= overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ· end_ARG ) ,
φ⁒(Zi,ΞΌ^0)πœ‘subscript𝑍𝑖subscript^πœ‡0\displaystyle\varphi(Z_{i},\hat{\mu}_{0})italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_ΞΌ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) =𝐆¯22βˆ’1⁒(h2⁒(Zi,ΞΌ0)+(βˆ’π†Β―21)⁒φ⁒(Zi,Ξ·^)),absentsuperscriptsubscript¯𝐆221subscriptβ„Ž2subscript𝑍𝑖subscriptπœ‡0subscript¯𝐆21πœ‘subscript𝑍𝑖^πœ‚\displaystyle=\overline{\mathbf{G}}_{22}^{-1}\left(h_{2}(Z_{i},\mu_{0})+(-% \overline{\mathbf{G}}_{21})\varphi(Z_{i},\hat{\eta})\right),= overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ΞΌ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) + ( - overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT ) italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ· end_ARG ) ) ,
φ⁒(Zi,ΞΌ^k)πœ‘subscript𝑍𝑖subscript^πœ‡π‘˜\displaystyle\varphi(Z_{i},\hat{\mu}_{k})italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_ΞΌ end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) =𝐆¯33βˆ’1⁒(h3⁒(Zi,ΞΌk)+(βˆ’π†Β―31)⁒φ⁒(Zi,Ξ·^))absentsuperscriptsubscript¯𝐆331subscriptβ„Ž3subscript𝑍𝑖subscriptπœ‡π‘˜subscript¯𝐆31πœ‘subscript𝑍𝑖^πœ‚\displaystyle=\overline{\mathbf{G}}_{33}^{-1}\left(h_{3}(Z_{i},\mu_{k})+(-% \overline{\mathbf{G}}_{31})\varphi(Z_{i},\hat{\eta})\right)= overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 33 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_ΞΌ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + ( - overΒ― start_ARG bold_G end_ARG start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT ) italic_Ο† ( italic_Z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , over^ start_ARG italic_Ξ· end_ARG ) )

where $\hat{\eta}$ was obtained by maximum likelihood under the logistic model, i.e., by solving $\sum_{i=1}^{n}h_{1}(Z_{i},\eta)=0$. Finally, we obtained the variance of $\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}$ as

\begin{align*}
\hat{V}(\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}}) &=\frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^{n}\varphi(Z_{i},\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}})\varphi(Z_{i},\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}})^{\top}\right),
\end{align*}

where $\varphi(Z_{i},\hat{\mathsf{cATE}}_{\text{IPW}}^{\text{oc}})=\varphi(Z_{i},\hat{\mu}_{k})-\varphi(Z_{i},\hat{\mu}_{0})$.
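The inverse probability weighting calculation can be sketched in the same way. The function below is a minimal illustration under stated assumptions rather than the implementation used in this paper: the name `ipw_oc` is hypothetical, `A` and `V` are assumed to be binary indicators of arm $k$ and of concurrency, and the logistic model is fit by a few Newton-Raphson steps, a choice made here only to keep the example self-contained. The off-diagonal blocks are computed directly from the definition $\overline{\mathbf{G}}=-\frac{1}{n}\sum_{i}\partial h/\partial\theta^{\top}$.

```python
import numpy as np


def ipw_oc(X, A, V, Y, n_iter=25):
    """IPW cATE on concurrent units with an influence-function variance
    (illustrative sketch only).

    X : (n, p) design matrix including an intercept column
    A : (n,) 1.0 if assigned to arm k, 0.0 if control
    V : (n,) 1.0 if the unit is concurrent with arm k, else 0.0
    Y : (n,) outcome
    """
    n, p = X.shape
    eta = np.zeros(p)
    # Newton-Raphson for the logistic score sum_i X_i^T V_i (A_i - pi_i) = 0.
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ eta)))
        score = X.T @ (V * (A - pi)) / n
        G11 = (X * (V * pi * (1 - pi))[:, None]).T @ X / n
        eta = eta + np.linalg.solve(G11, score)
    pi = 1.0 / (1.0 + np.exp(-(X @ eta)))
    G11 = (X * (V * pi * (1 - pi))[:, None]).T @ X / n

    g0 = (1 - A) / (1 - pi)          # weight gamma^0_i
    gk = A / pi                      # weight gamma^k_i
    mu0 = np.sum(V * g0 * Y) / np.sum(V)
    muk = np.sum(V * gk * Y) / np.sum(V)

    h1 = X * (V * (A - pi))[:, None]
    h2 = V * (g0 * Y - mu0)
    h3 = V * (gk * Y - muk)

    # exp(X eta) = pi / (1 - pi) and exp(-X eta) = (1 - pi) / pi.
    G21 = -(X * (V * (1 - A) * Y * pi / (1 - pi))[:, None]).mean(axis=0)
    G31 = (X * (V * A * Y * (1 - pi) / pi)[:, None]).mean(axis=0)
    G22 = G33 = V.mean()

    phi_eta = np.linalg.solve(G11, h1.T).T
    phi_mu0 = (h2 - phi_eta @ G21) / G22
    phi_muk = (h3 - phi_eta @ G31) / G33
    phi = phi_muk - phi_mu0
    return muk - mu0, np.mean(phi ** 2) / n
```

Non-concurrent units have `V = 0`, so they contribute nothing to either the propensity fit or the weighted means, which matches the concurrent-only definition of this estimator.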

Proof of Theorem 3

We start by introducing some notation. We introduce an operator $\mathsf{IF}:\psi\rightarrow L_{2}(\mathds{P})$, where $\mathds{P}$ is a probability distribution assumed to lie in some nonparametric model $\mathcal{P}$, that maps functionals $\psi:\mathcal{P}\rightarrow\mathds{R}$ to their influence function $\varphi(z)\in L_{2}(\mathds{P})$, where $z$ denotes the observed data. Recall the following building blocks:

  • (bb1)

    the influence function of $\mu(x)=\mathsf{E}[Y|X=x]$ is $\mathsf{IF}(\mu(x))=\frac{\mathds{1}[X=x]}{\mathsf{P}(X=x)}(Y-\mathsf{E}[Y|X=x])$

  • (bb2)

    the influence function of $p(x)=\mathsf{P}(X=x)$ is $\mathsf{IF}(p(x))=\mathds{1}[X=x]-p(x)$

  • (bb3)

    $\mathsf{IF}(\psi_{1}\psi_{2})=\mathsf{IF}(\psi_{1})\psi_{2}+\psi_{1}\mathsf{IF}(\psi_{2})$ (product rule)

  • (bb4)

    $\mathsf{IF}(f(\psi))=f^{\prime}(\psi)\,\mathsf{IF}(\psi)$ (chain rule)

  • (bb5)

    $\mathsf{P}(A,B,C)=\mathsf{P}(A|B,C)\mathsf{P}(B,C)=\mathsf{P}(A|B,C)\mathsf{P}(B|C)\mathsf{P}(C)$

  • (bb6)

    $\sum_{x}\mathds{1}[A=k,X=x]=\mathds{1}[A=k]$.
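
As a short illustration of how these blocks combine (a sketch added for the reader, using generic functionals $\psi_{a}$ and $\psi_{b}$ that are hypothetical names and assuming $\psi_{b}\neq 0$), applying (bb4) with $f(\psi)=1/\psi$ and then (bb3) to the product $\psi_{a}\cdot(1/\psi_{b})$ gives a quotient rule, which is the form needed whenever a target parameter is a ratio of two functionals:

\begin{align*}
\mathsf{IF}(1/\psi_{b}) &= -\frac{\mathsf{IF}(\psi_{b})}{\psi_{b}^{2}}, \\
\mathsf{IF}(\psi_{a}/\psi_{b}) &= \frac{\mathsf{IF}(\psi_{a})}{\psi_{b}}-\frac{\psi_{a}}{\psi_{b}}\,\frac{\mathsf{IF}(\psi_{b})}{\psi_{b}}.
\end{align*}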

Finally, recall that the parameter of interest (under the aforementioned identification assumption) is

\begin{align*}
\mathsf{cATE}(k) &= \mathsf{E}\big[\mathsf{E}[Y|A=k,W=w,E=e,V_{k}=1]-\mathsf{E}[Y|A=0,W=w,E=e,V_{k}=1]\mid V_{k}=1\big],
\end{align*}

while in the nonparametric model that assumes (A5) it is

\begin{align*}
\mathsf{cATE}(k) &= \mathsf{E}\big[\mathsf{E}[Y|A=k,W=w,E=e,V_{k}=1]-\mathsf{E}[Y|A=0,W=w,E=e]\mid V_{k}=1\big].
\end{align*}
Theorem 3, eq. (4).

We define $(X=x)=(W=w,E=e)$ and pretend that the data is discrete. Recall that under discrete data

\begin{align*}
\mathsf{E}[\mathsf{E}[Y|A=1,X=x,V_{k}=1]\mid V_{k}=1] &= \frac{\mathsf{E}[\mathds{1}[V_{k}=1]\mathsf{E}[Y|A=1,X=x,V_{k}=1]]}{P(V_{k}=1)} \\
&= \frac{\sum_{x}\mathds{1}[V_{k}=1]\mathsf{E}[Y|A=1,X=x,V_{k}=1]P(X=x)}{P(V_{k}=1)} \\
&= \frac{\sum_{x}\mathds{1}[V_{k}=1]\mu_{\text{oc}}(1,x,1)p(x)}{P(V_{k}=1)} \\
&= \frac{\psi^{1}_{num}}{\psi_{den}}=\psi^{1}.
\end{align*}

We now analyze the influence function of $\psi^{1}_{num}$,

\begin{align*}
\varphi(Z,\psi^{1}_{num}) &\equiv \mathsf{IF}\{\psi^{1}_{num}\} \\
&= \mathsf{IF}\Big\{\sum_{x}\mathds{1}[V_{k}=1]\mu_{\text{oc}}(1,x,1)p(x)\Big\} \\
&= \mathds{1}[V_{k}=1]\sum_{x}\Big[\mathsf{IF}\{\mu_{\text{oc}}(1,x,1)\}p(x)+\mu_{\text{oc}}(1,x,1)\mathsf{IF}\{p(x)\}\Big] && \text{by (bb3)} \\
&= \mathds{1}[V_{k}=1]\sum_{x}\Bigg[\left(\frac{\mathds{1}[A=1,X=x,V_{k}=1]}{p(1,x,1)}\{Y-\mu_{\text{oc}}(1,x,1)\}\right)p(x) \\
&\quad +\mu_{\text{oc}}(1,x,1)\left(\mathds{1}[X=x]-p(x)\right)\Bigg] && \text{by (bb1, bb2)} \\
&= \mathds{1}[V_{k}=1]\sum_{x}\Bigg[\left(\frac{\mathds{1}[A=1,X=x,V_{k}=1]}{\mathsf{P}(A=1\mid X=x,V_{k}=1)\mathsf{P}(V_{k}=1|X=x)}\{Y-\mu_{\text{oc}}(1,x,1)\}\right) \\
&\quad +\mu_{\text{oc}}(1,x,1)\mathds{1}[X=x]-\mu_{\text{oc}}(1,x,1)p(x)\Bigg] && \text{by (bb5)} \\
&= \mathds{1}[V_{k}=1]\left[\left(\frac{\mathds{1}[A=1]}{\mathsf{P}(A=1\mid X=x,V_{k}=1)}\{Y-\mu_{\text{oc}}(1,x,1)\}\right)+\mu_{\text{oc}}(1,x,1)\right]-\psi^{1}_{num} && \text{by (bb6)}
\end{align*}

where in the last equality we also used the fact that $\mathsf{P}(V_{k}=1|X=x)=1$ under $\mathds{1}[V_{k}=1]$, and $\psi^{1}_{num}=\sum_{x}\mathds{1}[V_{k}=1]\mu_{\text{oc}}(1,x,1)p(x)$. Analogously we can compute the influence function of $\psi^{0}_{num}$,

\begin{align*}
\varphi(Z,\psi^{0}_{num}) &\equiv \mathsf{IF}\{\psi^{0}_{num}\} \\
&= \mathsf{IF}\Big\{\sum_{x}\mathds{1}[V_{k}=1]\mu_{\text{oc}}(0,x,1)p(x)\Big\} \\
&= \mathds{1}[V_{k}=1]\left[\left(\frac{\mathds{1}[A=0]}{\mathsf{P}(A=0\mid X=x,V_{k}=1)}\{Y-\mu_{\text{oc}}(0,x,1)\}\right)+\mu_{\text{oc}}(0,x,1)\right]-\psi^{0}_{num}
\end{align*}

We now compute the influence function of $\psi_{den}$,

\begin{align*}
\varphi(Z,\psi_{den}) \equiv \mathsf{IF}\{\psi_{den}\} = \mathds{1}[V_{k}=1]-\psi_{den}
\end{align*}

We now consider the influence function of $\frac{\psi^{1}_{num}}{\psi_{den}}=\psi^{1}$,

\begin{align*}
\varphi(Z,\psi^{1}) &\equiv \mathsf{IF}\{\psi^{1}\} = \frac{\mathsf{IF}\{\psi^{1}_{num}\}}{\psi_{den}}-\frac{\psi^{1}_{num}}{\psi_{den}}\frac{\mathsf{IF}\{\psi_{den}\}}{\psi_{den}} \\
&= \frac{1}{\psi_{den}}\left[\mathsf{IF}\{\psi^{1}_{num}\}-\frac{\psi^{1}_{num}}{\psi_{den}}\mathsf{IF}\{\psi_{den}\}\right] \\
&= \frac{1}{\mathsf{P}(V_{k}=1)}\Bigg[\Bigg(\mathds{1}[V_{k}=1]\Bigg[\left(\frac{\mathds{1}[A=1]}{\mathsf{P}(A=1\mid X=x,V_{k}=1)}\{Y-\mu_{\text{oc}}(1,x,1)\}\right)+\mu_{\text{oc}}(1,x,1)\Bigg]-\psi^{1}_{num}\Bigg) \\
&\quad -\frac{\psi^{1}_{num}}{\psi_{den}}\left(\mathds{1}[V_{k}=1]-\psi_{den}\right)\Bigg] \\
&= \frac{1}{\mathsf{P}(V_{k}=1)}\Bigg[\Bigg(\mathds{1}[V_{k}=1]\Bigg[\left(\frac{\mathds{1}[A=1]}{\mathsf{P}(A=1\mid X=x,V_{k}=1)}\{Y-\mu_{\text{oc}}(1,x,1)\}\right)+\mu_{\text{oc}}(1,x,1)\Bigg]-\psi^{1}_{num}\Bigg) \\
&\quad -\frac{\psi^{1}_{num}}{\psi_{den}}\mathds{1}[V_{k}=1]+\frac{\psi^{1}_{num}}{\psi_{den}}\psi_{den}\Bigg] \\
&= \frac{\mathds{1}[V_{k}=1]}{\mathsf{P}(V_{k}=1)}\left[\frac{\mathds{1}[A=1]}{\mathsf{P}(A=1\mid X=x,V_{k}=1)}\{Y-\mu_{\text{oc}}(1,x,1)\}+\mu_{\text{oc}}(1,x,1)-\psi^{1}\right]
\end{align*}

We can now combine $\frac{\psi^{1}_{num}-\psi^{0}_{num}}{\psi_{den}}=\mathsf{cATE}(k)$ to obtain

\begin{align*}
\varphi(Z,\mathsf{cATE}(k)) &\equiv \mathsf{IF}\{\mathsf{cATE}(k)\} \\
&= \frac{\mathds{1}\{V_{k}=1\}}{\mathsf{P}(V_{k}=1)}\bigg[\frac{2A-1}{\mathsf{P}(A\mid W,E,V_{k}=1)}\{Y-\mathsf{E}(Y\mid A,W,E,V_{k}=1)\} \\
&\quad +\mathsf{E}(Y\mid A=1,W,E,V_{k}=1)-\mathsf{E}(Y\mid A=0,W,E,V_{k}=1)-\mathsf{cATE}(k)\bigg].
\end{align*}
Theorem 3, eq. (5).

Under assumption (A5), we now target (among controls),

\begin{align*}
\mathsf{E}[\mathsf{E}[Y|A=0,X=x]\mid V_{k}=1] &= \frac{\mathsf{E}[\mathds{1}[V_{k}=1]\mathsf{E}[Y|A=0,X=x]]}{P(V_{k}=1)} \\
&= \frac{\mathsf{E}[\mathsf{E}[\mathds{1}[V_{k}=1]\mid X=x]\,\mathsf{E}[Y|A=0,X=x]]}{P(V_{k}=1)} \\
&= \frac{\sum_{x}\mathsf{P}(V_{k}=1\mid X=x)\mathsf{E}[Y|A=0,X=x]P(X=x)}{P(V_{k}=1)} \\
&= \frac{\sum_{x}\nu(x)\mu_{\text{all}}(0,x)p(x)}{P(V_{k}=1)} \\
&= \frac{\psi^{0}_{num}}{\psi_{den}}=\psi^{0},
\end{align*}

where $\nu(x)=\mathsf{P}(V_{k}=1\mid X=x)$ and $\mu_{\text{all}}(0,x)=\mathsf{E}[Y|A=0,X=x]$.

The influence function of $\psi^{0}_{num}$ is

\begin{align*}
\varphi(Z,\psi^{0}_{num}) &\equiv \mathsf{IF}\{\psi^{0}_{num}\} \\
&= \sum_{x}\Bigg[\frac{\mathds{1}[X=x]}{\mathsf{P}(X=x)}\left(\mathds{1}[V_{k}=1]-\nu(x)\right)\mu_{\text{all}}(0,x)p(x) \\
&\quad +\nu(x)\left(\frac{\mathds{1}[A=0,X=x]}{\mathsf{P}(A=0\mid X=x)\mathsf{P}(X=x)}\{Y-\mu_{\text{all}}(0,x)\}\right)p(x) \\
&\quad +\nu(x)\mu_{\text{all}}(0,x)\left(\mathds{1}[X=x]-p(x)\right)\Bigg] \\
&= \sum_{x}\Bigg[\frac{\mathds{1}[X=x]\mathds{1}[V_{k}=1]}{\mathsf{P}(X=x)}\mu_{\text{all}}(0,x)p(x)-\frac{\mathds{1}[X=x]}{\mathsf{P}(X=x)}\nu(x)\mu_{\text{all}}(0,x)p(x) \\
&\quad +\nu(x)\left(\frac{\mathds{1}[A=0,X=x]}{\mathsf{P}(A=0\mid X=x)\mathsf{P}(X=x)}\{Y-\mu_{\text{all}}(0,x)\}\right)p(x) \\
&\quad +\mathds{1}[X=x]\nu(x)\mu_{\text{all}}(0,x)-\nu(x)\mu_{\text{all}}(0,x)p(x)\Bigg] \\
&= \mathds{1}[V_{k}=1]\mu_{\text{all}}(0,x)+\nu(x)\left(\frac{\mathds{1}[A=0]}{\mathsf{P}(A=0\mid X=x)}\{Y-\mu_{\text{all}}(0,x)\}\right)-\psi^{0}_{num}
\end{align*}

As shown before, we can then compute the influence function of $\frac{\psi^{0}_{num}}{\psi_{den}}$, and finally that of $\mathsf{cATE}(k)$ under assumption (A5), which leads to

\begin{align*}
\varphi(Z,\mathsf{cATE}(k))\equiv\mathsf{IF}\{\mathsf{cATE}(k)\}
&=\frac{\mathds{1}\{V_{k}=1\}}{\mathsf{P}(V_{k}=1)}\bigg[\frac{A}{\mathsf{P}(A\mid V_{k}=1,W,E)}\{Y-\mathsf{E}(Y\mid A,W,E,V_{k}=1)\}\bigg]\\
&\quad-\frac{1-A}{\mathsf{P}(A\mid W,E)}\frac{\mathsf{P}(V_{k}=1\mid E,W)}{\mathsf{P}(V_{k}=1)}\{Y-\mathsf{E}(Y\mid A,E,W)\}\\
&\quad+\frac{\mathds{1}\{V_{k}=1\}}{\mathsf{P}(V_{k}=1)}\Big[\mathsf{E}(Y\mid A=1,W,E,V_{k}=1)-\mathsf{E}(Y\mid A=0,W,E)-\mathsf{cATE}(k)\Big]
\end{align*}