Causal Effects of Intervening Variables in Settings With Unmeasured Confounding
Abstract
We present new results on average causal effects in settings with unmeasured
exposure-outcome confounding. Our results are motivated by a class of estimands,
frequently of interest in fields such as medicine and public health, that are currently
not targeted by standard approaches for average causal effects. We recognize these
estimands as queries about the average causal effect of an intervening variable. We anchor
our introduction of these estimands in an investigation of the role of chronic pain and
opioid prescription patterns, and illustrate how conventional approaches will lead to
non-replicable estimates with ambiguous policy implications. We argue that our alternative
effects are replicable and have clear policy implications, and furthermore are
non-parametrically identified by the classical frontdoor formula. As an independent
contribution, we derive a new semiparametric efficient estimator of the frontdoor formula
with a uniform sample boundedness guarantee. This property is unique among
previously-described estimators in its class, and we demonstrate superior performance in
finite-sample settings. The theoretical results are applied to data from the National
Health and Nutrition Examination Survey.
1. Introduction
Unmeasured confounding and ill-defined interventions are major contributing factors to the
replication crisis for policy-relevant parameters, e.g., in medical research. When there is
unmeasured confounding, standard covariate-adjustment approaches will lead to biases that
would likely differ across studies. Further, when an analysis is based on variables that do
not correspond to well-defined interventions, it is nearly impossible for future analyses and
experiments to ensure that these variables are operationalized identically with the original
study. In each case, replication is infeasible.
To counteract these challenges, there exists a diverse set of strategies to confront unmeasured
confounding between exposure and outcome, which leverage the measurement of auxiliary
variables, including instruments and other proxies (Angrist et al., 1996; Lipsitch et al., 2010;
Tchetgen Tchetgen et al., 2020), as well as mediators (Pearl, 2009; Fulcher et al., 2020). At
the same time, there is a long history of calls for an “interventionist” approach to causal
analyses in statistics (Holland, 1986; Dawid, 2000; Richardson and Robins, 2013; Robins
et al., 2022) wherein investigators focus on the effects of manipulable, or intervenable,
variables. Use of such variables ensures that the targets of inference are clearly defined by
interventions that can be implemented in principle (see, e.g., Hernán, 2005; Hernán and
VanderWeele, 2011; Galea, 2013). Furthermore, the seemingly disparate challenges of
ill-defined interventions and unmeasured exposure-outcome confounding are often considered
separately, but systematically co-occur in practice: when an exposure variable does not
correspond to a well-defined intervention, investigators will often face exceptional challenges in
sufficiently describing and measuring the causes of that exposure. In such cases,
investigators will often also have little confidence in the assumption of no unmeasured confounding
(Hernán and Taubman, 2008).
In this article we consider jointly the twin challenges of ill-defined interventions and
unmeasured confounding. Building on new results for a generalized theory of separable effects
(Robins and Richardson, 2010; Shpitser and Sherman, 2018; Robins et al., 2022; Stensrud
et al., 2021), our contributions concern effects of an intervening variable: a manipulable
descendant of a (possibly ill-defined) exposure (or treatment) that precedes the outcome. We
argue that such average causal effects, rather than those of (possibly ill-defined) exposures,
are frequently of interest in practice. We give results on their interpretation,
identification and estimation. In doing so, we develop a novel semiparametric efficient estimator
for the canonical front-door functional with superior finite sample performance properties
compared with existing strategies.
Our results are related to previous work on identification of causal effects in the presence
of unmeasured confounding. As we expound in Section 3, the causal effect of an interven-
ing variable may be identified by the frontdoor formula, which coincidentally also allows
non-parametric identification of the average causal effect (ACE) of the exposure on the
Our results are also related to the work by Fulcher et al. (2020), who gave conditions under
which the frontdoor formula identifies the so-called Population Intervention Indirect Effect
(PIIE). Fulcher et al. (2020) interpreted the PIIE as a “contrast between the observed
outcome mean for the population and the population outcome mean if contrary to fact the
mediator had taken the value that it would have in the absence of exposure [page 200].”
Thus, unlike the conventional identification result for the ACE, the frontdoor formula can
be used to identify the PIIE even in the presence of a direct effect of the exposure on
the outcome not mediated by an intermediate variable (or intermediate variables). We
distinguish the assumptions for the two aforementioned estimands in Section 4. To fix ideas
about the relation to previous work on the frontdoor formula, we introduce the following
running example.
Example 1 (Chronic pain and opioid use, Inoue et al., 2022) Chronic pain is asso-
ciated with use and abuse of opioids, which subsequently can lead to death. Moreover,
chronic pain can affect mortality outside of its effect on opioid use (Dowell et al., 2016,
2022), e.g., by causing long-term stress and undesirable lifestyle changes. Inoue et al.
(2022) studied the effect of chronic pain (exposure to pain versus no pain) on mortality
(outcome) mediated by opioid use. Chronic pain is notoriously treatment-resistant, and
unmeasured confounders between chronic pain and mortality may include social, physio-
logic, and psychological factors (Inoue et al., 2022). Using data from the National Health
and Nutrition Examination Survey (NHANES) from 1999–2004 with linkage to mortality
databases through 2015, the investigators studied a causal effect related to the PIIE. Specif-
ically, Inoue et al. (2022) considered a “path-specific frontdoor effect” that they interpreted
as “the change in potential outcomes that follows a change in the mediator (opioid) which
was caused by changing the exposure.”
As indicated by Inoue et al. (2022) there likely exist unmeasured common causes of the
exposure and the outcome in Example 1. Relatedly, it is unclear how one could intervene
on chronic pain, or even whether any intervention on chronic pain could be well-defined.
Therefore, the interpretation of the PIIE in our example has dubious public health impli-
cations (Holland, 1986). In contrast, we suggest effects of intervening variables, which do
correspond to interventions that are feasible to implement in practice. To concretely moti-
vate our intervention, consider a doctor who determines if a patient should receive opioids
to relieve symptoms. As suggested in Inoue et al. (2022), the doctor’s decision to prescribe
opioids could be determined by their patient’s chronic pain status. Our intervention is
motivated by current guidelines of the Centers for Disease Control and Prevention (CDC)
(Dowell et al., 2016, 2022; Inoue et al., 2022), which suggest that chronic pain status should
no longer be used to determine opioid prescription. Thus, our interest is in evaluating the
efficacy of a modified prescription policy (the policy-relevant intervention of interest) such
that the doctor regards their patient as not having chronic pain in their decisions on opioid
prescriptions.
A realistic and implementable example of such a modified prescription policy is one where
doctors are mandated to participate in courses, e.g., as part of their Continuing Medical
Education (CME), where they are trained to base their prescription decisions on what
they would have done had they regarded their patient as not having chronic pain.
This modified policy does not require us to conceptualize
interventions on a patient’s chronic pain, an intervention that would have been hard, or
impossible, to specify. While a doctor’s beliefs about their patient’s chronic pain status
are not usually directly recorded in observed data, an analyst might reasonably assume a
deterministic relation between a patient’s chronic pain status and the doctor’s perception
of that chronic pain status. The doctor’s perception of the patient’s chronic pain status
during the prescription decision-making process is considered to be an intervening variable that
in turn determines the doctor’s prescription in the observed data.
The remainder of the article is organized as follows. In Section 2 we describe the observed
and counterfactual data structure. In Section 3 we precisely define our interventionist
estimand and derive identification results. In Section 4 we relate our identification results
to non-parametric conditions that allow identification of relevant estimands that have been
proposed in the past, and discuss the plausibility of these conditions. In Section 5 we present
a new sample-bounded semiparametric estimator of the frontdoor formula. In Section 6 we
give simulation results suggesting superior finite sample performance compared with existing
estimators of the frontdoor formula (Fulcher et al., 2020). In Section 7, we apply our new
results to study the effect of opioid prescription policies on chronic pain using data from
NHANES, and discuss the practical implications in Section 8.
In the analysis of the chronic pain example, Inoue et al. (2022) concluded that their findings
“highlight the importance of careful guideline-based chronic pain management to prevent
death from possibly inappropriate opioid prescriptions driven by chronic pain.” However,
this argument does not translate to an intervention on chronic pain A; rather, we interpret
their policy concern as one that directly involves a modifiable intervening variable: a care
provider’s perception of the patient’s chronic pain in standard-of-care pain management
decisions, AM . If we define AM as binary (taking values a† or a◦ ), and assume that a care
provider’s natural consideration corresponds exactly with a patient’s chronic pain experi-
ence, then in this setting, A=AM with probability one in the observed data. Despite this
feature of the observed data, we could nevertheless conceive an intervention that modifies
this intervening variable AM without changing the non-modifiable exposure A, and thus
appease any policy concerns at the causal estimand-formulation stage of the analysis. Our
consideration of the chronic pain example motivates estimands under interventions on the
modifiable intervening variable AM , not A. As we subsequently review, sufficient conditions
for identification include that (i) AM is deterministically equal to A in the observed data,
and (ii) AM captures the effects of A on Y through M . Analogous conditions have been
considered for the identification of separable effects (Robins and Richardson, 2010; Robins
et al., 2022; Stensrud et al., 2021, 2022a).
As in the separable effects literature, our chronic pain example is amenable to graphical
representation. Consider an extended causal directed acyclic graph (DAG) (Robins and
Richardson, 2010), which not only includes A but also the intervening variable AM . Figure
1a shows this extended DAG with node set V =(U,L,A,AM ,M,Y ). The bold arrow from
A to AM in Figure 1a represents the assumption of a deterministic relationship between
A and AM in the observed data: with probability one under f (v), either A=AM =a† or
A=AM =a◦ . These graphs will be used to illustrate our subsequent results, where we
give formal conditions under which the effects of an intervention that sets AM to aM ,
as represented in the Single World Interventions Graph (SWIG; Richardson and Robins,
2013) Figure 1b, can be identified. We adopt the convention that the absence of an arrow
in a SWIG encodes the absence of individual level effects in that context, as described
in Richardson and Robins (2013). Causal effects of intervening variables are of interest
beyond our running chronic pain example. In Appendix A, we consider an example on
racial disparity and an obstetrics example from Fulcher et al. (2020).
As in Robins and Richardson (2010), our formalized interventions on the intervening variable
are conceptually related to the edge interventions described by Shpitser and Sherman (2018).
To formally state our identifiability conditions of an intervening variable estimand, we
first invoke the assumption of a deterministic relationship between the exposure and the
intervening variable in the observed data.
Assumption 1 implies that A will almost always equal AM . In the chronic pain example, this
will be violated if there is a large number of chronic pain diagnoses that incorrectly capture
the true underlying pain status of the patients. Nevertheless, beyond patient-reported pain,
doctors can utilize many tools (e.g., electromyography, nerve conduction studies, reflex and
balance tests) to diagnose a patient’s pain status, and thus in our context, we believe that
Assumption 1 is reasonable. We also invoke a positivity condition which only involves
observable laws and requires that for all joint values of L, there is a positive probability of
observing A=a and M =m, ∀a,m.
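As a rough diagnostic for the positivity condition just described, one can tabulate the observed (L, A, M) combinations and flag empty cells. The sketch below is illustrative only: the function name `empty_cells` and the toy data are hypothetical, the variables are assumed to be discrete, and an absence of empty cells in a sample does not by itself establish positivity of the underlying law.

```python
from collections import Counter
from itertools import product

def empty_cells(l_vals, a_vals, m_vals):
    """Return every (l, a, m) combination of observed levels with zero observations."""
    counts = Counter(zip(l_vals, a_vals, m_vals))
    levels = [sorted(set(v)) for v in (l_vals, a_vals, m_vals)]
    # Counter returns 0 for unseen keys, so unobserved cells are easy to flag.
    return [cell for cell in product(*levels) if counts[cell] == 0]

# Toy data (hypothetical): L, A and M are all binary.
l_obs = [0, 0, 0, 0, 1, 1, 1]
a_obs = [0, 0, 1, 1, 0, 1, 1]
m_obs = [0, 1, 0, 1, 0, 0, 1]
missing = empty_cells(l_obs, a_obs, m_obs)
print(missing)
```

In this toy sample the only flagged cell is L=1, A=0, M=1, which would signal a potential positivity problem in that stratum.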
Following Stensrud et al. (2021), let “(G)” refer to a future trial where AM is randomly
assigned, and consider the following dismissible component conditions.
In the context of our chronic pain example, it is reasonable to assume that an opioid pre-
scription depends on a doctor’s perceived chronic pain status of a patient, AM . Moreover, it
is also reasonable to assume that AM (doctor’s perception of chronic pain) affects Y (mor-
tality) only through M (opioid prescription), because a doctor’s perception of a patient’s
chronic pain status can ultimately lead to overdosing events and thus death only via their
decision to prescribe opioids.3
1. We discuss additional scenarios involving a measured recanting witness as mentioned in Remark 1 below.
2. The dismissible component conditions are related to so-called partial isolation conditions; see, e.g.,
Stensrud et al. (2021) for more details.
3. AM may also affect usage of non-opioid pain-killers (e.g., acetaminophen), but these are generally safe
and unlikely to result in death.
Assumption 3 may fail if there exists a covariate, say anti-depressant medication prescrip-
tion, that potentially affects opioid prescription and mortality but is omitted in the data
analysis. Thus, it is important to measure all sufficient common causes of opioid prescrip-
tion and mortality (and similarly, common causes of chronic pain and opioid prescription)
in any data analysis of chronic pain studies.
Furthermore, we let $\Psi := \sum_{m,l} f(m \mid a^\dagger, l) f(l) \sum_{a} E(Y \mid L=l, A=a, M=m) f(a \mid l)$.
As we formally show in Appendix D, the causally manipulable estimand $E(Y^{a_M=a^\dagger})$ is identified by
(3) even when A is a direct cause of Y , not mediated by M . See also Theorem 2 and
Proposition 4 of Robins et al. (2022) for an alternative derivation of identification results
of a related estimand via the extended ID algorithm of Shpitser et al. (2022),$^5$ in addition
to identification of other interventionist estimands related to separable effects.
Henceforth, we refer to Ψ as the generalized frontdoor formula as it allows for some
baseline covariates L. Note that Fulcher et al. (2020) used the term generalized
front-door criterion (not formula) to describe the identification conditions for the PIIE, which
we discuss in Section 4. They argued that their “identification criterion generalizes Judea
Pearl’s front door criterion as it does not require no direct effect of exposure not mediated
by the intermediate variable”. However, their strategy requires an untestable cross-world
exchangeability assumption (see Assumption 9, subsequently reviewed), which is not
strictly necessary to identify the ACE. Thus, their approach should not be viewed as a
generalization.
4. Specifically, because AM is randomly assigned in G, we have that:
One sufficient way to ensure this is to presume that participation in the trial can only influence M and
Y through AM . That is, we define a trial (G) as one that precludes any participation effects on M
and Y not mediated through AM (see Appendix D).
5. See Shpitser (2013) and Shpitser and Sherman (2018) who also provide a complete algorithm for identi-
fying path-specific effects with hidden variables.
To motivate our novel estimation results in Section 5, we emphasize that, in the absence of
covariates L, the frontdoor formula can be expressed as
$$\begin{aligned}
\Psi &:= \sum_{m} f(m \mid a^\dagger) \sum_{a} E(Y \mid A=a, M=m) f(a) \\
&= P(A=a^\dagger) E(Y \mid A=a^\dagger) + P(A=a^\circ) \sum_{m} E(Y \mid A=a^\circ, M=m) f(m \mid a^\dagger).
\end{aligned} \tag{4}$$
Thus, $E(Y^{a_M=a^\dagger})$ is a weighted average of a conditional mean, $E(Y \mid A=a^\dagger)$, and a term that
appears in a known identification formula for separable effects (i.e., $E(Y^{a_M=a^\dagger, a_Y=a^\circ})$$^6$) of A
on Y (Robins and Richardson, 2010; Robins et al., 2022; Stensrud et al., 2021, 2022a). This
decomposition is instrumental in formulating new semiparametric estimators.
In Appendix B, we provide further intuition for this re-expression,
and in Appendix C, we give identification and estimation results for our new causally
manipulable estimand in the absence of L.
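The equality of the two lines of (4) can be checked numerically. The following is a minimal sketch on a made-up discrete law with binary A and M; the probabilities, the coding (index 0 for a† and index 1 for a◦) and all variable names are assumptions introduced purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical law of binary (A, M); index 0 plays the role of a-dagger,
# index 1 the role of a-circ.
p_a = np.array([0.4, 0.6])                 # f(a)
p_m_given_a = np.array([[0.7, 0.3],        # f(m | a-dagger)
                        [0.2, 0.8]])       # f(m | a-circ)
mean_y = rng.uniform(size=(2, 2))          # E(Y | A=a, M=m), rows a, columns m

# First line of (4): sum_m f(m | a-dagger) sum_a E(Y | a, m) f(a)
psi_frontdoor = sum(
    p_m_given_a[0, m] * sum(mean_y[a, m] * p_a[a] for a in range(2))
    for m in range(2)
)

# Second line of (4): P(a-dagger) E(Y | a-dagger) + P(a-circ) * cross term
mean_y_given_dagger = float(p_m_given_a[0] @ mean_y[0])  # E(Y | A=a-dagger)
cross_term = float(p_m_given_a[0] @ mean_y[1])           # sum_m E(Y | a-circ, m) f(m | a-dagger)
psi_decomposed = p_a[0] * mean_y_given_dagger + p_a[1] * cross_term

assert np.isclose(psi_frontdoor, psi_decomposed)
```

Because both terms in the second line are averages of outcome means, the result stays within the range of the conditional means, which is the feature the later estimators exploit.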
The ACE is identified by the generalized frontdoor formula (3) under Assumptions 2 and
5-8, detailed below.
$M^{a} \perp\!\!\!\perp A \mid L, \ \forall\, a \in \text{supp}(A).$
[Figure 1: (a) Extended DAG. (b) SWIG transformation of the extended graph in Figure 1a, corresponding to an intervention on AM .]
Assumption 7 ensures that A only affects Y through M which holds, for instance, if the
green arrow is removed from the graphs in Figure 2a–2c. Assumption 8 ensures that there
is no unmeasured mediator-outcome confounding, which holds, for instance, in the SWIG
in Figure 2c. Together, Assumptions 2 and 5–8 allow unmeasured confounders between
exposure and outcome given measured covariates such that $Y^{a} \not\perp\!\!\!\perp A \mid L$ as illustrated in the
SWIG in Figure 2b (see Appendix B for details).
The PIIE (Fulcher et al., 2020) is a contrast between the observed mean outcome and the
counterfactual outcome if, possibly contrary to fact, the mediator had taken the value that
it would have when exposure equals $a^\dagger$. An example of such a contrast is
$E(Y) - E(Y^{M^{a^\dagger},A})$. In the context of the chronic pain example,
$E(Y^{M^{a^\dagger=0},A}) = E(Y^{M^{a^\dagger=0}})$ is then interpreted as
the counterfactual cumulative risk had there been no intervention on a patient’s chronic pain
status, but had opioid prescription been set to the value it would have taken had chronic
pain been eliminated in all patients. This is thus considered a cross-world counterfactual
estimand. Inoue et al. (2022) studied a closely related estimand,
$E(Y^{M^{a^\dagger},A}) - E(Y^{M^{a^\circ},A})$,
which they called the path specific frontdoor effect.
Instead of imposing Assumptions 7–8, Fulcher et al. (2020) only relied on Assumptions 2,
5–6 and the following cross-world exchangeability assumption to identify the PIIE:
Assumption 9 (Cross-world exchangeability) $Y^{a,m} \perp\!\!\!\perp M^{a^\dagger} \mid A, L$ for all values of $a, a^\dagger, m$.
While the identification strategy proposed by Fulcher et al. (2020) permits the presence
of unmeasured common causes of exposure and the outcome, and a direct effect of the
exposure on the outcome, their remaining assumptions are not innocuous. First, consistency
(Assumption 5) is likely to be violated in our running example; for instance, it is not clear
[Figure 2: DAG and SWIGs of random variables under which average counterfactual outcomes can be identified. (a) DAG of the observed random variables. (b) SWIG of the counterfactual variables corresponding to Figure 2a with intervention on treatment or exposure A. (c) SWIG of the counterfactual variables corresponding to Figure 2a with intervention on mediator variable M .]
Remark 1 As the identifying formula for the PIIE and our proposed estimand coincide
under the intersection of the model in Theorem 1 and that proposed by Fulcher et al.
(2020), there is clearly a close connection between identification results for our proposed
estimands and the PIIE, specifically when there is no so-called recanting witness (Avin et al.,
2005). However, when there exists a recanting witness, the PIIE is no longer identified,$^7$
yet contrasts defined by $E(Y^{a_M=a^\dagger})$ can still be identified and meaningfully interpreted.$^8$
Such a scenario is given by the DAGs in Figure 3. These distinctions in identifiability at
laws in DAG models with a recanting witness have practical implications, as we will illus-
trate with our chronic pain example. Specifically, the use of Selective Serotonin Reuptake
Inhibitors (SSRIs) antidepressants for clinically diagnosed depression (L) could be a plausi-
ble recanting witness: suppose that the example is described by the DAG in Figure 3, where
a patient’s chronic pain status can affect SSRI use through diagnosed clinical depression,
but SSRI use is unaffected by the doctor’s perception of chronic pain (AM ). Then, SSRI
is a recanting witness and thus the PIIE is not identified, but $E(Y^{a_M=a^\dagger})$ is still identified
under our set of assumptions. In particular, under the intervention defining our estimand, a
doctor’s decision-making will proceed under the perception that the patient’s chronic pain
is absent while factoring in the patient’s (factual) SSRI usage. More broadly, this feature
7. The reason being that the PIIE involves $E(Y^{A,M^{a^\dagger}}) = E(Y^{A,L^{A},M^{a^\dagger,L^{a^\dagger}}})$, which involves both $L$ and
$L^{a^\dagger}$, and neither can be eliminated in the identifying formula (see also Avin et al., 2005; Robins and
Richardson, 2010; Robins et al., 2022).
8. This is because Assumption 4 (Dismissible component conditions) still holds in this setting.
[Figure 3: (a) DAG of the observed random variables with a recanting witness. (b) Modified extended DAG with a recanting witness.]
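illustrates the interventionist interpretation of $E(Y^{a_M=a^\dagger})$, which is analogous to
motivations for separable effects (Robins and Richardson, 2010; Didelez, 2019; Robins et al., 2022;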
tions for separable effects (Robins and Richardson, 2010; Didelez, 2019; Robins et al., 2022;
Stensrud et al., 2022b,a). These estimands can have an unambiguous interventionist inter-
pretation, regardless of the existence of, e.g., mediators and recanting witnesses, and are
meaningful without any engagement with conventional mediation analysis. As the exam-
ple of antidepressant use illustrates, the existence of a recanting witness is not a problem
for the interpretation of the intervention on AM . In this case, it would not even hinder
identification.
Because our proposed estimand $E(Y^{a_M})$ is identified by the generalized frontdoor formula,
we can apply existing estimators, such as the Augmented Inverse Probability Weighted
(AIPW) semiparametric estimator in Fulcher et al. (2020). However, existing approaches
do not guarantee that estimates are bounded by the parameter space. Such estimators have
been shown to result in poor performance in finite samples (Kang and Schafer, 2007; Robins
et al., 2007). This is a concern in Example 1, where the outcome is a binary indicator of
mortality. Hence, here we develop new semiparametric estimators that are guaranteed to
be bounded by the parameter space.
Suppose that the observed data $O=(L,A,M,Y)$ follow a law $P$ which is known to belong to
a model $\mathcal{M}=\{P_\theta : \theta \in \Theta\}$, where $\Theta$ is the parameter space. The efficient influence function
$\varphi^{\text{eff}}(O)$ for a causal parameter $\Psi \equiv \Psi(\theta)$ in a non-parametric model $\mathcal{M}_{np}$ that imposes no
restrictions on the law of $O$ other than positivity is given by $d\Psi(\theta_t)/dt|_{t=0} = E\{\varphi^{\text{eff}}(O) S(O)\}$,
where $d\Psi(\theta_t)/dt|_{t=0}$ is known as the pathwise derivative of the parameter $\Psi$ along any
parametric submodel of the observed data distribution indexed by $t$, and $S(O)$ is the score
function of the parametric submodel evaluated at $t=0$ (Newey, 1994; Van Der Vaart, 2000).
We will first re-express our causal estimand, identified by the generalized frontdoor formula,
as a weighted average of two terms given by $\psi_2$ and $\psi_3$:
$$\Psi := P(A=a^\dagger) \underbrace{E(Y \mid A=a^\dagger)}_{\psi_2} + P(A=a^\circ) \underbrace{\sum_{m,l} E(Y \mid M=m, L=l, A=a^\circ) f(m \mid a^\dagger, l) f(l \mid a^\circ)}_{\psi_3}.$$
Thus, using differentiation rules (Van Der Vaart, 2000; Ichimura and Newey, 2022; Kennedy,
2022), the efficient influence function of $\Psi$ can be realized by finding the efficient
influence function of the following parameters: $\psi_1 := P(A=a)$; $\psi_2 := E(Y \mid A=a^\dagger)$; and
$\psi_3 := \sum_{m,l} E(Y \mid M=m, L=l, A=a^\circ) f(m \mid a^\dagger, l) f(l \mid a^\circ)$. We can view $\psi_3$ as the identifying formula
for $E(Y^{a_M=a^\dagger, a_Y=a^\circ} \mid A=a^\circ)$, a conditional mean that would appear in estimands for
separable effects in the treated, which is identified in an extended DAG in the absence of U
(see Robins and Richardson, 2010; Robins et al., 2022; Stensrud et al., 2021, 2022a). The
following Theorem motivates semiparametric estimators based on this particular weighted
representation of Ψ.
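To make the weighted representation concrete, the sketch below evaluates the generalized frontdoor formula and the combination P(A=a†)ψ2 + P(A=a◦)ψ3 on a randomly generated discrete law and confirms that they agree. The function names, array layouts and the random law are hypothetical choices made for this illustration; index 0 codes a† and index 1 codes a◦.

```python
import numpy as np

def generalized_frontdoor(p_l, p_a_given_l, p_m_given_al, mean_y):
    """Psi = sum_{m,l} f(m|a_dagger,l) f(l) sum_a E(Y|l,a,m) f(a|l).

    Shapes: p_l (L,), p_a_given_l (L, 2), p_m_given_al (2, L, M), mean_y (2, L, M).
    """
    inner = np.einsum("alm,la->lm", mean_y, p_a_given_l)  # sum_a E(Y|l,a,m) f(a|l)
    return float(np.einsum("lm,l,lm->", p_m_given_al[0], p_l, inner))

def weighted_representation(p_l, p_a_given_l, p_m_given_al, mean_y):
    """Return (P(a_dagger) psi2 + P(a_circ) psi3, psi2, psi3)."""
    p_al = p_l[:, None] * p_a_given_l      # f(l, a), shape (L, 2)
    p_a = p_al.sum(axis=0)                 # P(A=a)
    p_l_given_a = (p_al / p_a).T           # f(l | a), shape (2, L)
    # psi2 = E(Y | A=a_dagger)
    psi2 = float(np.einsum("lm,lm,l->", mean_y[0], p_m_given_al[0], p_l_given_a[0]))
    # psi3 = sum_{m,l} E(Y|m,l,a_circ) f(m|a_dagger,l) f(l|a_circ)
    psi3 = float(np.einsum("lm,lm,l->", mean_y[1], p_m_given_al[0], p_l_given_a[1]))
    return p_a[0] * psi2 + p_a[1] * psi3, psi2, psi3

rng = np.random.default_rng(1)
n_l, n_m = 3, 4
p_l = rng.dirichlet(np.ones(n_l))
p_a_given_l = rng.dirichlet(np.ones(2), size=n_l)
p_m_given_al = rng.dirichlet(np.ones(n_m), size=(2, n_l))
mean_y = rng.uniform(size=(2, n_l, n_m))

psi = generalized_frontdoor(p_l, p_a_given_l, p_m_given_al, mean_y)
psi_w, psi2, psi3 = weighted_representation(p_l, p_a_given_l, p_m_given_al, mean_y)
assert np.isclose(psi, psi_w)
```

Both ψ2 and ψ3 are averages of conditional outcome means, so each lies within the range of the outcome; this is what makes the weighted representation a natural starting point for sample-bounded estimation.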
Theorem 2 The efficient influence function $\varphi^{\text{eff}}(O)$ of the generalized front-door formula
in $\mathcal{M}_{np}$ for $O=(L,A,M,Y)$ is given by
where in the equation, $b_0(M,L) := E(Y \mid M, L, A=a^\circ)$, $h^\dagger(L) := E(b_0(M,L) \mid A=a^\dagger, L)$ and the
expression in blue is proportional to the efficient influence function for $\psi_3$. The efficient
influence function (5) can also be re-expressed as
A proof can be found in Appendix E. Algebraic manipulation of (6) obtains the efficient
influence function representation given in Equation (5) in Theorem 1 of Fulcher et al. (2020).
As such, for any regular and asymptotically linear estimator $\hat\Psi$ of $\Psi$ in $\mathcal{M}_{np}$, it must be that
$\sqrt{n}(\hat\Psi - \Psi) = n^{-1/2} \sum_{i=1}^{n} \varphi^{\text{eff}}(O_i) + o_p(1)$. Furthermore, all regular and asymptotically linear
estimators in $\mathcal{M}_{np}$ with influence function equal to $\varphi^{\text{eff}}(O)$ are asymptotically
equivalent and attain the semiparametric efficiency bound (Bickel et al., 1998).
Writing the efficient influence function for the generalized frontdoor formula ($\Psi$) given in
Expressions (5) or (6) motivates estimators that guarantee sample-boundedness. A weighted
iterative conditional expectation (Weighted ICE) estimator with this property is presented
in Algorithm 1. In what follows, we let $\mathbb{P}_n(X) := n^{-1} \sum_{i=1}^{n} X_i$ and let $g^{-1}$ denote a known
inverse link function satisfying $\inf(\mathcal{Y}) \le g^{-1}(u) \le \sup(\mathcal{Y})$, for all $u$, where $\mathcal{Y}$ is the sample
space of $Y$ (e.g., a logit link for dichotomous $Y$).$^9$
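The rescaling for bounded continuous outcomes described in footnote 9 can be sketched as follows. The helper names and the toy outcome vector are hypothetical, and the sample mean on the transformed scale merely stands in for whatever estimate the algorithm would return on that scale.

```python
import numpy as np

def to_unit_interval(y, b, c):
    """Map outcomes in [b, c] to Y* = (Y - b) / (c - b), which lies in [0, 1]."""
    return (np.asarray(y, dtype=float) - b) / (c - b)

def from_unit_interval(estimate, b, c):
    """Back-transform an estimate on the Y* scale: multiply by (c - b), add b."""
    return estimate * (c - b) + b

def expit(u):
    """Inverse logit link g^{-1}(u), bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-np.asarray(u, dtype=float)))

y = np.array([12.0, 15.5, 20.0, 18.25])   # toy bounded outcome (hypothetical)
b, c = y.min(), y.max()                   # b and c: minimum and maximum of Y
y_star = to_unit_interval(y, b, c)

estimate_star = y_star.mean()             # stand-in for an estimate on the Y* scale
estimate = from_unit_interval(estimate_star, b, c)
```

Any estimate that lies in [0, 1] on the transformed scale, for instance one produced through the logit link `expit`, is mapped back into [b, c] by construction.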
$\dfrac{f(M \mid A=a^\dagger, L; \hat\gamma)}{f(M \mid A=a^\circ, L; \hat\gamma)}$
9. For a bounded continuous outcome, obtain a transformed outcome $Y^* = \frac{Y-b}{c-b}$, where $b$ and $c$ denote the
minimum and maximum of $Y$, respectively. The algorithm proceeds with $Y^*$ in place of $Y$, and the
estimate obtained from Step 7 of the algorithm is transformed back to the original scale by multiplying
it by $(c-b)$ and adding $b$; see also Gruber and van der Laan (2010).
if $\hat\gamma$ was estimated in the previous step, and $\phi(M,L)$ is a known function of $M$ and $L$.
More specifically, we solve for $\theta$ in the following estimating equations:
$$\mathbb{P}_n\left[ I(A=a^\circ)\, \phi(M,L)\, \hat W_1 \{Y - Q(M,L;\theta)\} \right] = 0.$$
4: In those whose $A=a^\dagger$, fit a model $R(L;\eta) := g^{-1}\{\eta^T \Gamma(L)\}$ for $h^\dagger(L)$ where the score
function for each observation is weighted by
$$\hat W_2 := \frac{P(A=a^\circ \mid L; \hat\kappa)}{P(A=a^\dagger \mid L; \hat\kappa)}.$$
Here, $\Gamma(L)$ is a known function of $L$. More specifically, we solve for $\eta$ in the following
estimating equations:
$$\mathbb{P}_n\left[ I(A=a^\dagger)\, \Gamma(L)\, \hat W_2 \left\{ Q(M,L;\hat\theta) - R(L;\eta) \right\} \right] = 0.$$
5: In those whose $A=a^\circ$, fit an intercept-only model $T(\beta) := g^{-1}(\beta)$ for $\psi_3$. More
specifically, we solve for $\beta$ in the following estimating equations:
In Algorithm 1, Steps 3, 4 and 5 ensure that the estimates for $\psi_3$ are sample bounded.
Moreover, in Step 6 it is clear that $\hat\Psi_{WICE}$ is a convex combination of $Y$ and estimates for
$\psi_3$, both of which are bounded by the range of the outcome $Y$. Thus, $\hat\Psi_{WICE}$ will also be
sample-bounded. In Appendix F, we prove that the proposed estimator, which is based on
the efficient influence function given by (5) and (6), is robust against three classes of model
misspecification scenarios.
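The boundedness argument can be illustrated with a generic weighted intercept-only fit through a logit link. This is a sketch under stated assumptions, not a reproduction of Algorithm 1: the pseudo-outcomes `r_hat` and weights `w` are random stand-ins for quantities the algorithm would compute, the coding (0 for a†, 1 for a◦) is hypothetical, and the final averaging step is assumed from the weighted representation of Ψ rather than taken from the algorithm itself.

```python
import numpy as np

def expit(u):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-u))

def fit_weighted_intercept(pseudo_outcome, weight):
    """Closed-form root in beta of sum_i w_i {q_i - expit(beta)} = 0.

    The root satisfies expit(beta) = weighted mean of q, so the fitted value
    cannot leave the range of the pseudo-outcomes when the weights are positive.
    """
    q_bar = np.sum(weight * pseudo_outcome) / np.sum(weight)
    return np.log(q_bar / (1.0 - q_bar))

rng = np.random.default_rng(2)
n = 500
a = rng.integers(0, 2, size=n)                  # 0 = a-dagger, 1 = a-circ (assumed coding)
y = rng.integers(0, 2, size=n).astype(float)    # binary outcome
r_hat = rng.uniform(0.05, 0.95, size=n)         # stand-in for fitted values in [0, 1]
w = rng.uniform(0.5, 3.0, size=n)               # stand-in for positive weights

circ = a == 1
beta_hat = fit_weighted_intercept(r_hat[circ], w[circ])
psi3_hat = expit(beta_hat)

# Assumed final step: average Y among A = a-dagger and psi3_hat among A = a-circ,
# a convex combination of quantities that each lie in [0, 1].
psi_hat = float(np.mean(np.where(circ, psi3_hat, y)))
```

Because `psi3_hat` is a weighted mean of values in [0, 1] and `psi_hat` averages it with observed binary outcomes, both stay inside the parameter space regardless of how extreme the weights are.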
Theorem 3 Under standard regularity conditions, the weighted ICE estimator $\hat\Psi_{WICE}$
where a model for $P(A=a \mid M,L)$ is specified will be consistent and asymptotically normal
under the union model $\mathcal{M}_{union} = \mathcal{M}_1 \cup \mathcal{M}_2 \cup \mathcal{M}_3$ where we define:
1. Model $\mathcal{M}_1$: working models for $P(A=a \mid M,L)$ and $P(A=a \mid L)$ are correctly specified.
2. Model $\mathcal{M}_2$: working models for $b_0(M,L)$ and $h^\dagger(L)$ are correctly specified.
3. Model $\mathcal{M}_3$: working models for $b_0(M,L)$ and $P(A=a \mid L)$ are correctly specified.
Moreover, $\hat\Psi_{WICE}$ is locally efficient in the sense that it achieves the semiparametric
efficiency bound for $\Psi$ in $\mathcal{M}_{np}$, i.e., $E\{\varphi^{\text{eff}}(O)^2\}$, at the intersection model given by
$\mathcal{M}_{intersection} = \mathcal{M}_1 \cap \mathcal{M}_2 \cap \mathcal{M}_3$.
Alternatively, the weighted ICE estimator $\hat\Psi_{WICE}$ where a model for $P(M=m \mid A,L)$ is
specified will be consistent and asymptotically normal under the union model $\mathcal{M}_{union} =
\mathcal{M}_1 \cup \mathcal{M}_2 \cup \mathcal{M}_3$ where we define (1) Model $\mathcal{M}_1$: working models for $P(M=m \mid A,L)$$^{10}$ and
$P(A=a \mid L)$ are correctly specified; (2) Model $\mathcal{M}_2$: working models for $b_0(M,L)$ and $h^\dagger(L)$
are correctly specified; and (3) Model $\mathcal{M}_3$: working models for $b_0(M,L)$ and $P(A=a \mid L)$
are correctly specified. Furthermore, this $\hat\Psi_{WICE}$ is also locally efficient in the sense that,
when all working models are correctly specified, it achieves the semiparametric efficiency
bound for estimating $\Psi$ in $\mathcal{M}_{np}$.
Remark 2 It is possible that the weighted ICE estimator is consistent when the models for
b0 (M,L) and E(M |A,L) are correctly specified. For instance, this would be the case when
Y and M are continuous, in which case the model R(L;η) for h† (L) can also be correctly
specified in this model specification scenario.
Unlike the estimator proposed by Fulcher et al. (2020), the estimator proposed here requires
specification of four models instead of three, and thus our proposed estimator offers a
different robustness property against model misspecification compared to that of the AIPW
estimator in Fulcher et al. (2020). By contrast, the AIPW estimator in Fulcher et al.
(2020), which requires specification of only three (vs. four) working models, is robust against
two (vs. three) classes of model misspecification scenarios. Specifically, it will be consistent
when at least (1) the models for b0 (M,L) and P (A=a|L) are correctly specified, or (2) the
model for P (M =m|A,L) is correctly specified. In Appendix G, we describe an iterative
algorithm that preserves the double robustness property of Fulcher et al. (2020) for binary
mediators. However, we do not pursue this iterative algorithm in our main manuscript as
it is more computationally intensive and can run into convergence issues in finite samples
(see van der Laan and Gruber, 2009; Van der Laan and Rose, 2018).
We believe that our proposed estimator will be particularly useful in cases where (i) M
is a continuous variable, as in the ‘Safer deliveries’ application in Fulcher et al. (2020),
and/or (ii) there are multiple mediator variables (M1 , M2 , M3 . . . ). In both of these cases,
the model(s) for the mediator variable(s) will be difficult to correctly specify. In Appendix
F, we prove that in the absence of L our weighted ICE estimator based on the efficient
influence function for the (non-generalized) frontdoor formula is doubly robust in the sense
that it will be consistent as long as the model for P (A=a|M ) – or P (M =m|A), depending
on the representation – or the model for E(Y |A=a◦ ,M ) is correctly specified. As such, for
the (non-generalized) frontdoor formula in the absence of L, the double robustness property
of our estimator is the same as that of Fulcher et al. (2020).
10. or model for the density ratio of P (M =m|A=a† ,L)/P (M =m|A=a◦ ,L)
4: A. In those whose A=a†, compute R̂(L) by regressing Q∗(M,L; δ̂) (as outcome, obtained
from the last step) on L. Here, R̂(L) is possibly estimated using machine
learning methods, and it denotes an initial estimate of h†(L).
B. Update the previous regression. Specifically, in those whose A=a†, fit an intercept-only
regression model $R^*(L;\nu) := g^{-1}[g\{\hat R(L)\}+\nu]$, where the score function for
each observation is weighted by Ŵ2 (defined previously). More specifically, we
solve for ν in the following estimating equation:
$$\mathbb{P}_n\left[ I(A=a^\dagger)\,\hat W_2 \left\{ Q^*(M,L;\hat\delta) - R^*(L;\nu) \right\} \right] = 0.$$
5: In those whose A=a◦ , fit another regression model T (β):=g −1 (β) for ψ3 with just an
intercept. More specifically, we solve for β in the following estimating equations:
Theorem 4 (Weak convergence of TMLE) Suppose that the conditions given in Ap-
pendix J hold, and further suppose that the following condition also holds:
The advantage of using TMLE (or the AIPW estimator of Fulcher et al., 2020) with machine
learning algorithms for the nuisance functions is that the estimator is still consistent and
asymptotically normal as long as the nuisance functions converge to the truth at rates faster
than $n^{-1/4}$ (Robins et al., 2008; Chernozhukov et al., 2018) (see Appendix J). Nevertheless,
in real-world applications there is no guarantee that such rates of convergence can be
attained when more flexible algorithms such as neural networks or random forests are used.
Moreover, there is no guarantee that these machine learning methods will exhibit more or
less bias compared with parametric models (Liu et al., 2020).
6. Simulation study
We conducted a simulation study to demonstrate that (1) our estimand, like the PIIE, is
robust to unmeasured confounding between exposure and outcome, (2) our proposed estima-
tor is more robust to model misspecification compared with estimators such as the Inverse
Probability Weighted (IPW) estimator and the ICE estimator (described in Appendix G),
and (3) unlike the AIPW estimator of Fulcher et al. (2020) our estimator is sample bounded
in any finite sample size setting.
The simulation study was based on 1000 simulated data sets of sample sizes n=100, 250
and 500. We compared the bias, standardized bias and efficiency of IPW, ICE, AIPW
and weighted ICE estimators. Standardized bias is 100×[(Average Estimate−True Param-
eter)/empirical standard deviation of the parameter estimates]. Larger standardized bias
will have a bigger impact on efficiency, coverage, and error rates. It has been suggested
that for n=500, anything greater than an absolute standardized bias of approximately 40%
will have a ‘noticeable adverse impact on efficiency, coverage, and error rates’ (Schafer and
Kang, 2008; Collins et al., 2001). The true value of Ψ=0.0144 was calculated by generating
a Monte Carlo sample of size $10^7$. We intentionally chose a rare outcome to compare the
performance of the estimators where the AIPW estimator had a non-trivial chance of falling
outside of the parameter space.
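The standardized bias defined above can be computed directly from a vector of simulation estimates. The following is a minimal sketch, assuming that the "empirical standard deviation" is the sample standard deviation (`ddof=1`) across simulated data sets; the function name and the example estimates are hypothetical and are not taken from the simulation study.

```python
import numpy as np

def standardized_bias(estimates, true_value):
    """Return 100 * (average estimate - true parameter) / empirical SD of the estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    # Assumption: the empirical SD is the sample SD (ddof=1) across simulated data sets.
    return 100.0 * bias / estimates.std(ddof=1)

# Hypothetical estimates from a handful of simulated data sets (illustrative only).
example_estimates = [0.010, 0.020, 0.015, 0.012, 0.018]
print(standardized_bias(example_estimates, true_value=0.0144))
```

Under the rule of thumb quoted above, an absolute value of this quantity above roughly 40 would be flagged as problematic for n=500.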
The data-generating mechanism for our simulations and model specifications are provided
in Table 1. We consider four scenarios to illustrate the robustness of our proposed estimator
to model misspecification: (1) all models are approximately correctly specified, (2) only the
models for b0 (M,L) and h† (L) are approximately correctly specified, (3) only the models
for b0 (M,L) and P (A=a|L) are approximately correctly specified, and (4) only the model
for P (M =m|A,L) is correctly specified and the model for P (A=a|L) is approximately
correctly specified. The correct mediator model in the specification scenarios is the one used
in the data generation process, and the exposure and outcome models are approximately
correctly specified by including pairwise interactions between all the variables to ensure
model flexibility.
Table 2 shows the results from the simulation study. Consistent with our theoretical deriva-
tions, when all of the working models are (approximately) correctly specified, all of the
estimators become nearly unbiased as the sample size increases. For n=500, the AIPW
estimator and our proposed weighted ICE estimator are also nearly unbiased in the three
model misspecification settings, whereas the IPW and ICE estimators are not unbiased in all of them.
In Appendix I, we show an additional simulation study where the variables are all binary
and thus the correct exposure and outcome models are saturated models that cannot be
misspecified. The results further show the robustness of our estimator in model misspecifi-
cation scenarios.
Finally, the ICE, AIPW and weighted ICE estimators have comparable standard errors when
all of the models are correctly specified, and all three had lower standard errors compared
with IPW. However, our results also show that the AIPW estimator has poorer finite sample
performance when n=100 compared with the weighted ICE estimator. Moreover, for n=100,
there were 90, 59 and 88 simulated data sets where estimates from the AIPW estimator
fell below 0 in scenarios 1, 3 and 4, respectively.
Table 1: Data generating mechanism and model misspecifications for scenarios in simulation
study.
Table 2: Simulation results: Bias, standard error (SE), and standardized bias (Bias_s) are
multiplied by 100, and true value was Ψ=0.0144. For n=100, 90, 59 and 88 simulated
data sets had estimates from the AIPW estimator that fell below 0 in scenarios 1, 3 and 4,
respectively. For n=250, 16, 38 and 14 simulated data sets had estimates that fell below
0 in scenarios 1, 3 and 4, respectively. For n=500, 5, 31, and 1 simulated data set(s) had
estimates that fell below 0 in scenarios 1, 3 and 4, respectively.
7. Application
Motivated by Example 1, we applied our results to study the effect of modified prescription
policies for opioids on mortality in patients with chronic pain. The intervention of interest is
a new prescription policy, in which the doctor is instructed to consider each patient’s chronic
pain as absent for the purpose of opioid prescription decisions, and otherwise use measured
covariates as usual (according to standard treatment guidelines). Thus, this policy is an
intervention on a doctor’s perception of chronic pain, AM . This intervention is consistent
with a recently proposed policy by the CDC that instructs practicing physicians to no longer
consider chronic pain as an indication for opioid therapy. While the doctors’ perceptions
are not explicitly measured in the observed data, we may reasonably assume that, in the
absence of an intervention, such perceptions perfectly correspond with the medical reality
of the patient, that is, A=AM almost surely.
We used the dataset from Inoue et al. (2022), which includes observations from the NHANES
study linked to a national mortality database (National Death Index). The NHANES study
consists of a series of in-depth in-person interviews, medical and physical examinations, and
laboratory tests aimed at understanding various emerging needs in public health and nu-
trition. Sample data are from 1999–2004 and include information on individuals’ chronic
pain status (A), opioid prescriptions (M ), mortality (Y ), and covariates (L) including age,
sex assigned at birth (male and female), race (non-Hispanic White, non-Hispanic Black,
Mexican-American, or others), education levels (less than high school, high school or Gen-
eral Education Degree, or more than high school), poverty-income ratio (the ratio of house-
hold income to the poverty threshold), health insurance coverage, marital status, smoking,
alcohol intake, and anti-depressant medication prescription. Let AM denote the intervening
variable: the doctor’s perception of the patient’s chronic pain status.
Our sample included 12037 individuals. Following Inoue et al. (2022), an individual is
considered to have chronic pain if they reported pain for at least three months by the Inter-
national Association for the Study of Pain criteria (Merskey and Bogduk, 1994). Moreover,
data on prescription medications for pain relief used in the past 30 days were collected in
the in-person interview. Opioids identified through the process include codeine, fentanyl,
oxycodone, pentazocine and morphine. About 16% of the individuals in the sample expe-
rienced chronic pain and approximately 5% of the individuals in the sample reported using
opioids. Detailed data description can be found in Inoue et al. (2022).
We estimated the cumulative incidence $E(Y^{a_M=0})$ and the causal contrast $E(Y)-E(Y^{a_M=0})$
using ICE, IPW and our weighted ICE estimator by specifying logistic regression models
for the outcome, mediator and exposure. We used the same logistic regression models as
those of Inoue et al. (2022) by adjusting for all the previously-listed measured covariates
as potential confounders. All 95% confidence intervals were based on the 2.5 and 97.5
percentiles of a non-parametric bootstrap procedure with 1000 bootstrapped samples.
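The percentile bootstrap interval described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: `percentile_bootstrap_ci` and its arguments are hypothetical names, and `estimator` stands for any function that maps a resampled data set to a point estimate (in the application it would refit all working models on each bootstrap sample).

```python
import numpy as np

def percentile_bootstrap_ci(data, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample rows with replacement, re-estimate,
    and take the (alpha/2, 1 - alpha/2) percentiles of the bootstrap estimates."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    boot_estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # indices sampled with replacement
        boot_estimates[b] = estimator(data[idx])  # assumed to refit any models internally
    alpha = 1.0 - level
    lower, upper = np.percentile(boot_estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)

# Toy usage with a simulated binary outcome and the sample mean as the estimator.
toy_outcome = np.random.default_rng(1).binomial(1, 0.05, size=500)
ci_lower, ci_upper = percentile_bootstrap_ci(toy_outcome, np.mean, n_boot=1000)
print(ci_lower, ci_upper)
```

With `level=0.95` the interval endpoints are the 2.5 and 97.5 percentiles, matching the description in the text.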
The ICE procedure estimated $E(Y^{a_M=0})$ to be 4.76% (95% CI=(4.36, 5.16)) in three
years and 8.55% (95% CI=(8.04, 9.09)) in five years. The IPW procedure estimated this
cumulative incidence to be 4.95% (95% CI=(4.57, 5.35)) in three years and 8.82% (95%
CI=(8.30, 9.35)) in five years. The weighted ICE procedure estimated this cumulative
incidence to be 4.76% (95% CI=(4.36, 5.16)) in three years and 8.55% (95% CI=(8.04, 9.09))
in five years.
Moreover, the ICE procedure estimated $E(Y)-E(Y^{a_M=0})$ to be 0.22% (95% CI=
(−0.32, 0.82)) in three years and 0.25% (95% CI=(0.09, 0.40)) in five years. The IPW
procedure estimated this causal contrast to be 0.02% (95% CI=(−0.38, 0.40)) in three
years and −0.03% (95% CI=(−0.55, 0.49)) in five years. Finally, the weighted ICE procedure
estimated this causal contrast to be 0.22% (95% CI=(−0.33, 0.82)) in three years and
0.25% (95% CI=(0.09, 0.40)) in five years.
We also applied the TMLE estimator described herein where the nuisance functions were
estimated using the Super Learner ensemble (library of candidates including generalized
additive models and multivariate adaptive regression splines) and found similar results.
For instance, the TMLE procedure estimated $E(Y)-E(Y^{a_M=0})$ to be 0.24% (95% CI=
(0.08, 0.39)) in five years.
The results from the data analysis illustrate that our causal estimand $E(Y)-E(Y^{a_M=0})$ has
a clear interpretation and is practically relevant. We interpret $E(Y^{a_M=0})$ as a counterfactual
cumulative incidence resulting from a policy where doctors are trained to consider their
patients as not having chronic pain and fully adhere to this training. Our analysis suggests
that under such an intervention, the cumulative incidence of death is almost identical to
the cumulative incidence in the observed data after three years, but decreases very slightly
after five years.
8. Discussion
We have derived identification results that justify the use of the frontdoor formula in new
settings, reflecting questions of practical interest, where unmeasured confounding is a seri-
ous concern. Our identification results do not rely on ill-defined interventions or cross-world
assumptions (see also Shpitser and Sherman, 2018; Robins et al., 2022). Specifically, we
proposed an estimand defined by an intervention on a modifiable descendant of an exposure
or treatment, which we call an intervening variable. Like the previously proposed PIIE, our
proposed estimand is identified by the frontdoor formula even in the presence of a direct
effect of the exposure on the outcome not mediated by an intermediate variable. But unlike
the PIIE, our estimand is identifiable under conditions that, in principle, are empirically
testable. In addition, we presented an example in which our proposed estimand is practi-
cally relevant. In this example, the exposure variable – chronic pain – was difficult, if not
impossible, to intervene on. However, we argued that interventions on the intervening vari-
able, rather than the exposure, are often of practical interest in settings where interventions
on exposures are ill-defined.
Existing estimators of the frontdoor formula, including the AIPW estimator of Fulcher
et al. (2020), can also be used to estimate estimands identified by the frontdoor formula,
including the one proposed in this manuscript. When our proposed estimand is identified
by the frontdoor formula in the absence of L, our proposed estimator and the existing
AIPW estimator of Fulcher et al. (2020) are both doubly robust in the sense that they are
consistent as long as (1) the model for P (A=a|M ) (or P (M =m|A)) is correctly specified
or (2) the model for b0 (M ) is correctly specified. For the generalized frontdoor formula,
the AIPW estimator is doubly robust when (1) the models for b0 (M,L) and P (A=a|L) are
correctly specified, or (2) the model for P (M =m|A,L) is correctly specified. Compared
with the existing AIPW estimator of Fulcher et al. (2020), the one disadvantage of our estimator
for the generalized frontdoor formula is that it requires specification of four models instead of three.
Nevertheless, our proposed estimator then offers three chances for consistent estimation and
is therefore triply robust. Specifically, it is consistent when (1) the models for P (A=a|M,L)
(or P (M =m|A,L)) and P (A=a|L) are correctly specified, (2) the models for b0 (M,L) and
h† (L) are correctly specified, or (3) the models for b0 (M,L) and P (A=a|L) are correctly
specified. A key advantage of our semiparametric estimator is that estimates of Ψ are
ensured to be bounded by the parameter space, regardless of the sample size and variability
of inverse probability weights.
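The robustness conditions listed in this paragraph for the generalized frontdoor formula can be encoded as simple set checks. This is only a bookkeeping sketch: the model labels (`"mediator"`, `"propensity"`, `"b0"`, `"h_dagger"`) are hypothetical shorthand for the working models for P(M=m|A,L) (or P(A=a|M,L)), P(A=a|L), b0(M,L) and h†(L), and a return value of `False` means only that consistency is not guaranteed by the stated conditions.

```python
def weighted_ice_consistent(correct):
    """True if the set of correctly specified working models satisfies one of the
    three conditions stated for the weighted ICE estimator."""
    return (
        {"mediator", "propensity"} <= correct   # condition (1)
        or {"b0", "h_dagger"} <= correct        # condition (2)
        or {"b0", "propensity"} <= correct      # condition (3)
    )

def aipw_consistent(correct):
    """True if one of the two conditions stated for the AIPW estimator holds."""
    return {"b0", "propensity"} <= correct or {"mediator"} <= correct

# Example: only the outcome-type models b0 and h_dagger are correctly specified.
scenario = {"b0", "h_dagger"}
print(weighted_ice_consistent(scenario), aipw_consistent(scenario))
```

Enumerating the subsets of the four labels in this way makes it easy to see which misspecification patterns are covered by each estimator.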
In practice it is advised to define richly parameterized models for P (A=a|L) and h† (L)
to ameliorate model incompatibility issues between P (A=a|L) and P (A=a|M,L) and be-
tween b0 (M,L) and h† (L) (Vansteelandt et al., 2007; Tchetgen Tchetgen and Shpitser,
2014). However, similar to the AIPW estimator in Fulcher et al. (2020), our proposed
TMLE estimator can also accommodate machine learning algorithms, which in principle
can achieve √n-consistency as long as the nuisance functions converge at sufficiently fast
rates. Moreover, when the mediator variable is continuous, the AIPW estimator of Fulcher
et al. (2020) involves a preliminary estimator of a density ratio. Direct estimation of the
density ratio is often cumbersome as correct specification of the probability density of M
is difficult. This is a particularly challenging task when data-adaptive estimators are used
for estimating high-dimensional nuisance parameters. Our proposal allows for an alterna-
tive estimation procedure to accommodate continuous mediator variables by modeling the
propensity score of exposure/treatment instead. Finally, our semiparametric estimator can
be applied whenever the frontdoor formula identifies the parameter of interest, which could
be, e.g., the ACE, the PIIE or our interventionist estimand. Our results also motivate future
methodological work. In particular, we aim to generalize our results to longitudinal settings,
involving time-varying treatments.
Acknowledgments
The authors thank Dr. Kosuke Inoue for the access to the NHANES dataset used in
the data application. Lan Wen is supported by the Natural Sciences and Engineering
Research Council of Canada (NSERC) Discovery Grant [RGPIN-2023-03641, DGECR-2023-
00455]. Mats J. Stensrud and Aaron L. Sarvet are supported by the Swiss National Science
Foundation, grant 200021 207436.
Example 2 (Race and job interview discrimination) Consider the extended graph
in Figure 1a, where A denotes the race of an individual, AM denotes the race that an
individual indicates on a job application for a company, M denotes the indicator that the
individual receives an interview, and Y denotes whether an individual is hired for the job
(1 if the individual is hired and 0 otherwise). Further, the unmeasured variable U includes
complex historical processes that ultimately affect a person’s perceived or declared
race on the job market and may also influence whether or not they are hired by the company.
Several studies have documented that resumé ‘whitening’ (e.g., deleting any references or
connotations to a non-White race) can increase an applicant’s chance of receiving an
interview (Kang et al., 2016; Gerdeman, 2017). However, it is unclear if this strategy leads to
a higher chance of actually being hired, as, e.g., unconscious bias can also affect whether
or not an individual is hired, even if the candidate’s background and qualifications are the
same as those of other candidates. Thus, in a group of individuals with the same qualifications,
the difference between $E(Y)$ and $E(Y^{a_M=\text{white}})$ could indicate the effect among non-White
candidates of resumé whitening in the screening process on the probability of being hired.
Example 3 (The safer deliveries program, (Fulcher et al., 2020)) The ‘Safer de-
liveries’ program was designed to reduce the relatively high rates of maternal and neonatal
mortality in Zanzibar, Tanzania. The program provided counselling to pregnant women
preparing for delivery. Women deemed to be in a high pregnancy “risk category”, based
on a mobile device algorithm, were instructed to deliver at a referral hospital, a specialty
healthcare resource that generally incurred higher expenses for the women’s family. Then,
given the recommended delivery location, the mobile device algorithm also calculated an
amount that they recommended the family should save in anticipation of future obstetric
care costs (i.e. a “tailored savings recommendation”). At a later point in the study, the
amount that the families actually saved for this purpose was recorded in the data (i.e. the
“actual savings”).
Using data from the ‘Safer deliveries’ program, Fulcher et al. (2020) aimed to evaluate “the
effectiveness of this tailored savings recommendation by risk category on actual savings”.
They reported estimates of the PIIE of delivery risk (high risk versus low/medium risk
exposure) on actual savings at delivery (outcome), mediated by a recommended savings
amount calculated by the mobile device algorithm. As noted in Fulcher et al. (2020), there
may be unmeasured confounding between a participant’s recommended risk category and
actual savings at delivery, for example by socioeconomic factors and individual’s health-
seeking behaviour.
Fulcher et al. (2020) argued that the PIIE in Example A.3 was an appropriate estimand
to study “the effectiveness of this tailored savings recommendation” for pregnant women.
However, it is not clear that the plain English justification translates to a PIIE defined by
interventions on a woman’s recommended risk category, an exposure that is non-intervenable
or of limited scientific interest. To interpret the results of their analysis we either have to
consider:
2. obstetric risk category to be defined as a conceptually distinct feature from the mea-
sured socio-demographic and clinical features used to compute it, and thus possibly
manipulable separately from these features. In this case the “risk category” variable
would simply be no more or less than the computed “risk category” that appears on
the screens of mobile devices, and these risk categories could be manipulated simply
by intervening on the software run on these devices. However, such interventions
would be of a substantively different nature with profound differences in interpreta-
tion and will have different implications for policy-makers. Moreover, the exposure
“risk category” would not be susceptible to unmeasured confounding.
Considering effects of intervening variables ameliorates this ambiguity and also clarifies
assumptions. Specifically, an intervention that avoids these difficulties would be to fix the
output value from the algorithm, so that it recommends a delivery location as usual, but
the patient’s recommended savings amount is based on the delivery location the original
algorithm would recommend if that patient had been deemed to be at low risk for obstetric
complications. We can explicitly define the algorithm’s computed risk category as AM (a
modifiable intervening variable) that is distinct from the patient’s non-modifiable embodied
risk category (A). In the observed data, A=AM with probability 1. However, we could
conceive an intervention that modifies this intervening variable AM without changing the
exposure A.
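The proof for the frontdoor identifying formula (4) is given as follows:
$$
\begin{aligned}
E(Y^{a^\dagger}) &= \sum_m E(Y^{a^\dagger} \mid M^{a^\dagger}=m)\,P(M^{a^\dagger}=m) \\
&= \sum_m E(Y^{a^\dagger} \mid M^{a^\dagger}=m)\,P(M=m \mid A=a^\dagger) && \text{(By A5, A6)} \\
&= \sum_m E(Y^{a^\dagger,m} \mid M^{a^\dagger}=m)\,p(m \mid a^\dagger) && \text{(By A5)} \\
&= \sum_{a,m} E(Y^{a^\dagger,m} \mid A=a, M^{a^\dagger}=m)\,p(a)\,p(m \mid a^\dagger) \\
&= \sum_m p(m \mid a^\dagger) \sum_a E(Y \mid A=a, M=m)\,p(a) && \text{(By A5, A7, A8).}
\end{aligned}
$$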
Aside from probability laws, we note the following conditions that are used in the proof
above: line 2 follows by conditional exchangeability of $Y^{a^\dagger}$ and A conditional on U seen in
the SWIG in Figure 2b and follows from Assumption 6¹²; line 3 follows by consistency that is
implied from recursive substitution of underlying NPSEMs; line 5 follows from Assumption
6¹³; line 6 follows from Assumptions 7 and 8 and can be seen from the conditional indepen-
12. Since there are no unmeasured common causes of the exposure and mediator by Assumption 6, the only path from
A to $Y^{a^\dagger}$ is a backdoor path via U.
13. If U has a direct arrow to M not through A, then this will violate Assumption 6.
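The frontdoor identity proved above can be checked numerically by exact enumeration in a small discrete model. The sketch below assumes a hypothetical binary structural model (U → A, A → M, M → Y, U → Y, with U unmeasured, no direct A → Y arrow and no U → M arrow); every probability in it is illustrative and none is taken from the paper.

```python
import itertools
import numpy as np

# Hypothetical binary model; all numbers are illustrative assumptions.
p_u = np.array([0.6, 0.4])        # P(U=u)
p_a1_u = np.array([0.2, 0.7])     # P(A=1 | U=u)
p_m1_a = np.array([0.3, 0.8])     # P(M=1 | A=a)
p_y1_mu = np.array([[0.1, 0.5],   # P(Y=1 | M=m, U=u); rows index m, columns index u
                    [0.4, 0.9]])

def bern(p_one, value):
    """Probability that a Bernoulli(p_one) variable equals `value`."""
    return p_one if value == 1 else 1.0 - p_one

# Observed-data quantities, obtained by summing U out of the joint law.
p_am = np.zeros((2, 2))      # P(A=a, M=m)
p_am_y1 = np.zeros((2, 2))   # P(A=a, M=m, Y=1)
for u, a, m in itertools.product([0, 1], repeat=3):
    weight = p_u[u] * bern(p_a1_u[u], a) * bern(p_m1_a[a], m)
    p_am[a, m] += weight
    p_am_y1[a, m] += weight * p_y1_mu[m, u]

p_a = p_am.sum(axis=1)               # P(A=a)
p_m_given_a = p_am / p_a[:, None]    # P(M=m | A=a)
e_y_given_am = p_am_y1 / p_am        # E(Y | A=a, M=m)

def frontdoor(a_dagger):
    """sum_m p(m | a_dagger) sum_a E(Y | A=a, M=m) p(a), from observed quantities only."""
    return sum(
        p_m_given_a[a_dagger, m] * sum(e_y_given_am[a, m] * p_a[a] for a in (0, 1))
        for m in (0, 1)
    )

def counterfactual_mean(a_dagger):
    """E(Y^{a_dagger}) computed from the structural model, using the unmeasured U."""
    return sum(
        p_u[u] * bern(p_m1_a[a_dagger], m) * p_y1_mu[m, u]
        for u in (0, 1) for m in (0, 1)
    )

def naive_conditional_mean(a_dagger):
    """E(Y | A=a_dagger), which is confounded by U."""
    return p_am_y1[a_dagger].sum() / p_a[a_dagger]

print(frontdoor(1), counterfactual_mean(1), naive_conditional_mean(1))
```

In this toy model the frontdoor functional reproduces the counterfactual mean, while the unadjusted conditional mean does not, which is the behaviour the identification argument predicts when the stated assumptions hold.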
We utilize this decomposition in deriving efficient influence functions for the frontdoor
formula. We note that both estimands $E(Y|A=a^\dagger)$ and $E(Y^{a_M=a^\dagger,a_Y=a^\circ})$ allow for a direct
path from A to Y not mediated through M. Moreover, $E(Y|A=a^\dagger)$ is identified even
if there are unmeasured common causes of A and Y. This decomposition implies that
when all individuals in the observed data take treatment a†, then the frontdoor formula
equals $E(Y|A=a^\dagger)$; when all individuals in the observed data take treatment a◦, then the
frontdoor formula equals $E(Y^{a_M=a^\dagger,a_Y=a^\circ})$.¹⁶ Clearly, when the frontdoor formula equals
$E(Y^{a_M=a^\dagger,a_Y=a^\circ})$, then it must be that $A=A_Y=a^\circ$ for all observations. But this is precisely
a scenario where intervening on $A_Y$ (and creating a mediated path $A \to A_Y \to Y$) is
unnecessary, since $A=A_Y$ necessarily. Thus, this decomposition serves as a motivation for
the construction of our proposed estimand.
14. Assumption 7 ensures that there is no direct path from A to Y not mediated by M. In addition, since
there are no unmeasured common causes of the mediator and outcome by Assumption 8, the only paths from A to
$Y^{a^\dagger}$ are a backdoor path via U and a frontdoor path via M.
15. Specifically in the observed data, A=AM =AY =1 with probability 1, and A=AM =AY =0 with proba-
bility 1.
16. although this would not be identified from observed data unless f (m|a† ) is known a priori for all m
Suppose that the observed data O=(A,M,Y ) follow a law P which is known to belong to
M={Pθ :θ∈Θ}, where Θ is the parameter space. The efficient influence function ϕeff (O) for
a causal parameter Ψ≡Ψ(θ) in a non-parametric model Mnp that imposes no restrictions
on the law of O other than positivity is given by dΨ(θt )/dt|t=0 =E{ϕeff (O)S(O)}, where
dΨ(θt )/dt|t=0 is known as the pathwise derivative of the parameter Ψ along any parametric
submodel of the observed data distribution indexed by t, and S(O) is the score function of
the parametric submodel evaluated at t=0 (Newey, 1994; Van Der Vaart, 2000).
and thus the efficient influence function can be broken into two components. Using the chain
rule, the efficient influence function of $\Psi=\sum_m f(m|a^\dagger)\sum_a E(Y|A=a,M=m)f(a)$ can be
derived by finding the efficient influence function of (1) $\psi_1=P(A=a)$, (2) $\psi_2=E(Y|A=a^\dagger)$
and (3) $\psi_3=\sum_m E(Y|A=a^\circ,M=m)f(m|a^\dagger)$. We will use the fact that $\psi_3$ is an established
identifying formula for $E(Y^{A_Y=a^\dagger,A_D=a^\circ})$, a term in the identification formula for separable
effects. This is equal to the same functional of the observed data law p(o) as $E(Y^{a^\circ}|A=a^\dagger)$
in the average treatment effect on the treated (ATT) when $a^\circ=0$ and $a^\dagger=1$ (Tchetgen and
Shpitser, 2012).
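Theorem 6 The efficient influence function $\phi^{\text{eff}}(O)$ of the frontdoor formula in $\mathcal{M}_{np}$ is
given by
$$
\begin{aligned}
\phi^{\text{eff}}(O) ={}& \left[ I(A=a^\dagger)-P(A=a^\dagger) \right]\psi_2 + P(A=a^\dagger)\,\frac{I(A=a^\dagger)}{P(A=a^\dagger)}\,(Y-\psi_2) \\
&+ \frac{P(A=a^\circ)}{P(A=a^\dagger)}\left[ I(A=a^\circ)\,\frac{P(A=a^\dagger|M)}{P(A=a^\circ|M)}\,\{Y-b_0(M)\} + I(A=a^\dagger)\{b_0(M)-\psi_3\} \right] \\
&+ \left[ I(A=a^\circ)-P(A=a^\circ) \right]\psi_3,
\end{aligned}
$$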
where the terms in red are the efficient influence function for $P(A=a^\dagger)\psi_2$, the terms in
blue are the efficient influence function for $P(A=a^\circ)\psi_3$, and $b_0(M)=E(Y|A=a^\circ,M)$. The
efficient influence function can be reduced to the following,
by realizing that $I(A=a^\circ)P(A=a^\circ)P(A=a^\dagger|M)\{P(A=a^\dagger)P(A=a^\circ|M)\}^{-1}=I(A=a^\circ)f(M|a^\dagger)\{f(M|a^\circ)\}^{-1}$.
After some algebra, it can be shown that (C.9) can be written as the form of the efficient
influence function for Ψ given by Equation (5) in Theorem 1 in Fulcher et al. (2020) with
C=∅. Following Theorem 1 in Fulcher et al. (2020), the semiparametric efficiency bound
for Ψ in $\mathcal{M}_{np}$ is given by $\mathrm{var}(\phi^{\text{eff}})$.
The efficient influence function for the frontdoor formula (Ψ) given in Expressions
(C.8) or (C.9) allows us to construct estimators that guarantee sample-boundedness. A
weighted iterative conditional expectation (weighted ICE) estimator that guarantees
sample-boundedness is given in the following algorithm. In what follows, we let
$\mathbb{P}_n(X)=n^{-1}\sum_{i=1}^n X_i$ and let $g^{-1}$ denote a known inverse link function.¹⁷
$$\frac{f(M|A=a^\dagger;\hat\gamma)}{f(M|A=a^\circ;\hat\gamma)}$$
4: In those whose A=a†, fit an intercept-only model $T(\beta)=g^{-1}(\beta)$ for $\psi_3=E\{b_0(M)|A=a^\dagger\}$,
where the score function for each observation is weighted by
$$\hat W_2=\frac{P(A=a^\circ)}{P(A=a^\dagger)}.$$
More specifically, we solve for β in the following estimating equation:
$$\mathbb{P}_n\left[ I(A=a^\dagger)\,\hat W_2 \left\{ Q(M;\hat\theta)-T(\beta) \right\} \right]=0.$$
Steps 3 and 4 ensure that the estimates for ψ3 =E{b0 (M )|A=a† } are sample bounded. Step
6 confirms that Ψ̂W ICE is a convex combination of Y and estimates for ψ3 , both of which
are bounded by the range of the outcome Y . Thus, Ψ̂W ICE will also be sample-bounded.
For instance if the outcome is binary, then Ψ̂W ICE will always be bounded between 0
and 1. Note that estimators based on Expression (C.8) are more convenient to construct
than estimators based on Expression (C.9) when (1) M is continuous, and/or (2) there are
multiple mediator variables (M1 , M2 , M3 . . . ).
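The sample-boundedness argument above can be illustrated with a short sketch. For an intercept-only model, the weighted estimating equation is solved by a weighted average of the predicted values, whatever the link function, so the fitted value cannot leave their range. The final plug-in step below is an assumed form based on the decomposition Ψ = P(A=a†)E(Y|A=a†) + P(A=a◦)ψ3; it is not quoted from the algorithm, and the function names, weights and toy data are hypothetical.

```python
import numpy as np

def fit_weighted_intercept(q, weights, in_subset):
    """Solve sum_i I_i * w_i * {q_i - T} = 0 for T = g^{-1}(beta).
    With positive weights, T is a weighted average of q over the subset."""
    q = np.asarray(q, dtype=float)
    w = np.asarray(weights, dtype=float) * np.asarray(in_subset, dtype=float)
    return np.sum(w * q) / np.sum(w)

def weighted_ice_plug_in(y, a, psi3_hat, a_dagger=1):
    """Assumed final step: average Y among A=a_dagger and psi3_hat among the rest,
    i.e. a convex combination of observed outcomes and the bounded psi3 estimate."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    return float(np.mean(np.where(a == a_dagger, y, psi3_hat)))

# Toy data (illustrative only): exposure, binary outcome, predicted b0(M), weights.
a = np.array([1, 1, 0, 0, 1, 0])
y = np.array([1, 0, 0, 1, 1, 0])
q_hat = np.array([0.2, 0.6, 0.3, 0.5, 0.9, 0.1])
w_hat = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])

psi3_hat = fit_weighted_intercept(q_hat, w_hat, a == 1)
psi_hat = weighted_ice_plug_in(y, a, psi3_hat)
print(psi3_hat, psi_hat)
```

Because `psi3_hat` is a weighted average of values in [0, 1] and `psi_hat` averages binary outcomes with `psi3_hat`, both stay in [0, 1] regardless of how variable the weights are.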
In Appendix F, we prove that an estimator based on the efficient influence function given
by (C.8) is doubly robust in the sense that it will be consistent as long as the model for
P (A=a|M ) or the model for b0 (M )=E(Y |A=a◦ ,M ) is correctly specified, and an estimator
based on the efficient influence function given by (C.9) is doubly robust in the sense that it
will be consistent as long as the model for P (M =m|A) or the model for b0 (M ) is correctly
specified.
Consider an extended causal DAG, which includes A and also the variable AM , where the
bold arrow from A to AM indicates a deterministic relationship. That is, Figure 1a is the
extended DAG of such a split with V =(U,L,A,AM ,M,Y ), and in the observed data, with
probability one under f (v), either A=AM =1 or A=AM =0.
Here and henceforth we use “(G)” to indicate that the variables refer to those realized in the
hypothetical trial where AM is randomly assigned (possibly dependent on L(G) and A(G)),
as illustrated in 1c. In particular, conditioning sets that include A(G)=a,AM (G)=a† refer
to the hypothetical trial G in which AM is randomly assigned (or where random assignment
is dependent on L and/or A). Then, by our presumed definition of G that satisfies the
distributional consistencies described in the main text, we have that
$$
\begin{aligned}
E(Y^{a_M=a^\dagger}) &= \sum_{l,a} E(Y(G) \mid A_M(G)=a^\dagger, L(G)=l, A(G)=a)\,P(A(G)=a \mid L(G)=l)\,P(L(G)=l) \\
&= \sum_{m,l,a} E(Y(G) \mid M(G)=m, A_M(G)=a^\dagger, L(G)=l, A(G)=a) \\
&\qquad \times P(M(G)=m \mid A_M(G)=a^\dagger, L(G)=l, A(G)=a)\,P(A(G)=a \mid L(G)=l)\,P(L(G)=l) \\
&= \sum_{m,l} P(M(G)=m \mid A_M(G)=a^\dagger, A(G)=a^\dagger, L(G)=l)\,P(L(G)=l) \\
&\qquad \times \sum_a E(Y(G) \mid M(G)=m, L(G)=l, A(G)=a, A_M(G)=a)\,P(A(G)=a \mid L(G)=l) \\
&= \sum_{m,l} P(M=m \mid A=a^\dagger, L=l)\,P(L=l) \sum_a E(Y \mid M=m, L=l, A=a)\,P(A=a \mid L=l).
\end{aligned}
$$
Aside from probability laws, we note the following conditions that are used in the proof
above: equality 1 follows by definition of G; equality 3 holds by the dismissible component
conditions such that the conditional independences $M(G) \perp\!\!\!\perp A(G) \mid A_M(G), L(G)$ and
$Y(G) \perp\!\!\!\perp A_M(G) \mid A(G), M(G), L(G)$ hold; and equality 4 holds by definition of G, consistency and
determinism in the observed data such that the event $\{A=a, A_M=a\}$ is the same as the
event $\{A=a\}$.
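$$\Psi := P(A=a^\dagger)E(Y|A=a^\dagger)+P(A=a^\circ)\sum_{m,l}E(Y|M=m,L=l,A=a^\circ)f(m|a^\dagger,l)f(l|a^\circ).$$
The efficient influence function in the non-parametric model $\mathcal{M}_{NP}$ is defined as the unique
mean zero, finite variance random variable $\phi^{\text{eff}}(O)$ such that
$$\left.\frac{d\Psi(\theta_t)}{dt}\right|_{t=0}=E\{\phi^{\text{eff}}(O)S(O)\}$$
$$\phi^{\text{eff}}(O)=\underbrace{[I(A=a^\dagger)-P(A=a^\dagger)]}_{(*)}\,\psi_2+\psi_2^{\text{eff}}P(A=a^\dagger)+\psi_3^{\text{eff}}P(A=a^\circ)+\underbrace{[I(A=a^\circ)-P(A=a^\circ)]}_{(**)}\,\psi_3$$
where expression (∗) is the efficient influence function of $P(A=a^\dagger)$ and expression (∗∗) is
the efficient influence function of $P(A=a^\circ)$. Moreover,
$$
\begin{aligned}
\left.\frac{d\psi_2(\theta_t)}{dt}\right|_{t=0} &= E\{Y S(Y|A=a^\dagger) \mid A=a^\dagger\} \\
&= E\left[\{Y-E(Y|A=a^\dagger)\}\,S(Y|A=a^\dagger) \mid A=a^\dagger\right] \\
&= E\left[\frac{I(A=a^\dagger)(Y-\psi_2)}{P(A=a^\dagger)}\,S(O)\right]
\end{aligned}
$$
and
$$
\begin{aligned}
\left.\frac{d\psi_3(\theta_t)}{dt}\right|_{t=0} ={}& \underbrace{E\left[E\left[E\{Y S(Y|A=a^\circ,M,L) \mid A=a^\circ,M,L\} \mid A=a^\dagger,L\right] \mid A=a^\circ\right]}_{A} \\
&+ \underbrace{E\left[E\left\{E(Y|A=a^\circ,M,L)\,S(M|A=a^\dagger,L) \mid A=a^\dagger,L\right\} \mid A=a^\circ\right]}_{B} \\
&+ \underbrace{E\left[E\left\{E(Y|A=a^\circ,M,L) \mid A=a^\dagger,L\right\} S(L|A=a^\circ) \mid A=a^\circ\right]}_{C}.
\end{aligned}
$$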
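$$
\begin{aligned}
&= E\left[E\left[E\left\{\frac{I(A=a^\circ)}{P(A=a^\circ|L,M)}\,Y S(Y|A,M,L) \,\Big|\, M,L\right\} \Big|\, A=a^\dagger,L\right] \Big|\, A=a^\circ\right] \\
&= E\left[E\left[E\left\{\frac{I(A=a^\circ)}{P(A=a^\circ|L,M)}\,Y S(Y|A,M,L) \,\Big|\, M,L\right\} \frac{I(A=a^\dagger)}{P(A=a^\dagger|L)} \,\Big|\, L\right] \frac{I(A=a^\circ)}{P(A=a^\circ)}\right] \\
&= E\left[E\left[E\left\{\frac{I(A=a^\circ)}{P(A=a^\circ|L,M)}\,Y S(Y|A,M,L) \,\Big|\, M,L\right\} \frac{P(A=a^\dagger|L,M)}{P(A=a^\dagger|L)} \,\Big|\, L\right] \frac{P(A=a^\circ|L)}{P(A=a^\circ)}\right] \\
&= E\left[E\left[E\left\{\frac{I(A=a^\circ)}{P(A=a^\circ|L,M)}\,\{Y-b_0(M,L)\}\,S(Y|A,M,L) \,\Big|\, M,L\right\} \frac{P(A=a^\dagger|L,M)}{P(A=a^\dagger|L)} \,\Big|\, L\right] \frac{P(A=a^\circ|L)}{P(A=a^\circ)}\right] \\
&= E\left[\{Y-b_0(M,L)\}\,\frac{I(A=a^\circ)P(A=a^\dagger|L,M)P(A=a^\circ|L)}{P(A=a^\circ|L,M)P(A=a^\dagger|L)P(A=a^\circ)}\,S(O)\right]
\end{aligned}
$$
$$
\begin{aligned}
&= E\left[\frac{I(A=a^\circ)}{P(A=a^\circ)}\,h^\dagger(L)\,S(L|A)\right] \\
&= E\left[\frac{I(A=a^\circ)}{P(A=a^\circ)}\,\{h^\dagger(L)-\psi_3\}\,S(O)\right].
\end{aligned}
$$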
Thus, putting everything together and after some further algebraic simplification, we can
see that the efficient influence function is indeed given by:
We show that an estimator based on Equation (C.8) is doubly robust in the sense that it
will be consistent as long as
and that an estimator based on Equation (C.9) is doubly robust in that it will be consistent
as long as
We consider an estimator based on Equation (C.8). Suppose that $\alpha^*$, $\theta^*$ and $\beta^*$ are the probability limits of the estimators of $\alpha$, $\theta$ and $\beta$, respectively. Furthermore, let $b_0^*(M) = Q(M;\theta^*)$, where as before $Q(M;\theta)$ is a model for $b_0(M) = E(Y \mid M, A=a^\circ)$, and let $\psi_3^* = T(\beta^*)$, where $T(\beta)$ is a non-parametric model for $\psi_3 = E\{b_0(M) \mid A=a^\dagger\}$. Under Equation (C.8), it suffices to show that the estimating function has mean zero under scenario (1), where $\alpha^* = \alpha$ and thus $P(A=a^\dagger \mid M;\alpha^*) = P(A=a^\dagger \mid M)$, or under scenario (2), where $\theta^* = \theta$ and thus $b_0^*(M) = b_0(M)$ and $\psi_3^* = \psi_3$.
Proof Suppose first that only the model for P (A=a|M ) is correctly specified. Then,
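$$\begin{aligned}
&= \sum_m \left( P(A=a^\dagger \mid M=m)\,E(Y \mid A=a^\dagger, M=m) + P(A=a^\circ \mid M=m)\,\psi_3^* + \frac{P(A=a^\circ)}{P(A=a^\dagger)}\left[P(A=a^\dagger \mid M=m;\alpha^*)\,b_0(m) - P(A=a^\dagger \mid M=m)\,\psi_3^*\right] \right) P(M=m) - \Psi \\
&= E(Y \mid A=a^\dagger)\,P(A=a^\dagger) + \sum_m P(A=a^\circ)\,P(M=m \mid A=a^\dagger)\,b_0(m) - \Psi \\
&= 0.
\end{aligned}$$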
Next, suppose that only the model for E(Y |A=a◦ ,M ) is correctly specified. Then,
The proof of double robustness for estimators based on Equation (C.9) is analogous to the proof shown above.
Our proposed estimator based on the efficient influence function given by (5) and (6) is
robust against 3 classes of model misspecification scenarios. Specifically, the weighted ICE
estimator where a model for P (A=a|M,L) is specified will be consistent when at least one
of the following holds:
The weighted ICE estimator where a model for P (M =m|A,L) is specified will be consistent
when at least one of the following holds
We will prove robustness for an estimator based on (6), where a model for $P(M=m \mid A, L)$ is specified. The proof of robustness for an estimator based on (5), where a model for $P(A=a \mid M, L)$ is specified, follows analogously.
under scenario (1) where γ ∗ =γ and κ∗ =κ and thus P (M =m|A,L;γ ∗ )=P (M =m|A,L)
and P (A=a† |L;κ∗ )=P (A=a† |L), or under scenario (2) where θ∗ =θ and η ∗ =η and thus
b∗0 (M,L)=b0 (M,L), h∗† (L)=h† (L) and ψ3∗ =ψ3 , or under scenario (3) where θ∗ =θ and κ∗ =κ
and thus b∗0 (M,L)=b0 (M,L) and P (A=a† |L;κ∗ )=P (A=a† |L).
Proof Suppose first that only the models for P (M =m|A,L) and P (A=a|L) are correctly
specified. Then,
m,l
=0.
Next, suppose that only the models for b0 (M,L) and h† (L) are correctly specified. Then,
m,l
=0.
Finally, suppose that only the models for b0 (M,L) and P (A=a|L) are correctly specified.
Then,
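$$\begin{aligned}
&= E\left( \sum_m P(A=a^\dagger \mid L)\,E(Y \mid A=a^\dagger, M=m, L)\,f(m \mid a^\dagger, L) + P(A=a^\circ \mid L;\kappa^*)\left\{\sum_m b_0^*(m, L)\,f(m \mid a^\dagger, L) - h_\dagger^*(L)\right\} + P(A=a^\circ \mid L)\,h_\dagger^*(L) \right) - \Psi \\
&= P(A=a^\dagger)\,E(Y \mid A=a^\dagger) + P(A=a^\circ) \sum_{m,l} b_0(m, l)\,f(m \mid a^\dagger, l)\,f(l \mid a^\circ) - \Psi \\
&= 0.
\end{aligned}$$
Here the two terms involving $h_\dagger^*(L)$ cancel because $\kappa^* = \kappa$ implies $P(A=a^\circ \mid L;\kappa^*) = P(A=a^\circ \mid L)$.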
We describe one class of inverse probability weighted estimator that was used in the sim-
ulation and data analysis (see Fulcher et al., 2020 for other inverse probability weighted
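estimators). Specifically, we can solve for $\Psi_{IPW}$ in the following IPW estimator to estimate $\Psi$:
$$P_n\left[\frac{I(A=a^\dagger)}{f(A \mid L;\hat\kappa)}\left\{\sum_a E(Y \mid A=a, M, L;\hat\theta)\,f(a \mid L;\hat\kappa) - \Psi_{IPW}\right\}\right] = 0$$
where $P_n(X) = n^{-1}\sum_{i=1}^n X_i$ and $E(Y \mid A, M, L;\hat\theta)$ is an estimate of $E(Y \mid A, M, L)$ such that $E(Y \mid A=a^\circ, M, L) = b_0(M, L)$.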
The ICE estimator that was used in the simulation and data analysis follows from the
weighted ICE procedure, whereby we set Ŵ =1 for all regression steps.
Herein, we describe an iterative TMLE algorithm that is doubly robust in the sense of
Fulcher et al. (2020) for binary mediators.
B. Define $\hat Q_{\text{diff}}^{(t+1)} = Q^{(t+1)}(m=1, L;\hat\delta^{(t+1)}) - Q^{(t+1)}(m=0, L;\hat\delta^{(t+1)})$. In those whose $A=a^\dagger$, fit a single covariate regression model (with no intercept) for the conditional distribution of $M$ given by
$$R^{(t+1)}(a^\dagger, L;\nu^{(t+1)}) = g^{-1}\left[g\{\hat R^{(t)}(a^\dagger, L)\} + \nu^{(t+1)}\,\hat Q_{\text{diff}}^{(t+1)}\right]$$
with observational weights given by $\hat W_2 = \hat P(A=a^\circ \mid L) / \hat P(A=a^\dagger \mid L)$. More specifically, we solve for $\nu^{(t+1)}$ in the following estimating equations:
$$P_n\left[I(A=a^\dagger)\,\hat W_2\,\hat Q_{\text{diff}}^{(t+1)}\left\{M - R^{(t+1)}(a^\dagger, L;\nu^{(t+1)})\right\}\right] = 0.$$
C. Update $\hat Q^{(t+1)}(M, L) := Q^{(t+1)}(M, L;\hat\delta^{(t+1)})$ and $\hat R^{(t+1)}(a^\dagger, L) := R^{(t+1)}(a^\dagger, L;\hat\nu^{(t+1)})$.
D. Set $t \leftarrow t+1$.
5: Upon convergence at iteration $K$, define $\hat b_0^{(K)}(m, L) = \hat Q^{(K)}(m, L)$ for $m=0,1$ and $\hat f^{(K)}(M \mid A=a^\dagger, L) = \hat R^{(K)}(a^\dagger, L)$. In those whose $A=a^\circ$, fit another regression model $T(\beta) = g^{-1}(\beta)$ for $\psi_3 = E\{h_\dagger(L) \mid A=a^\circ\}$ with just an intercept. More specifically, we solve for $\beta$ in the following estimating equations:
$$P_n\left[I(A=a^\circ)\left\{\sum_{m=0}^{1} \hat b_0^{(K)}(m, L)\,\hat f^{(K)}(m \mid a^\dagger, L) - T(\beta)\right\}\right] = 0.$$
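The mediator update in step B can be sketched in Python as a one-parameter root-finding problem. This is only an illustrative sketch under stated assumptions: the link $g$ is taken to be the logit (natural for a binary mediator, but not stated in the surviving text), the root is found by bisection rather than by a regression routine, and all names (`solve_mediator_fluctuation`, `is_a_dag`, `w2`, `q_diff`, `r_prev`) are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def solve_mediator_fluctuation(m, is_a_dag, w2, q_diff, r_prev,
                               max_expand=60, n_bisect=200):
    """Solve P_n[I(A=a_dag) W2 Qdiff {M - R(nu)}] = 0 for nu (sketch).

    R(nu) = expit(logit(r_prev) + nu * q_diff); the logit link is an assumption.
    The score is non-increasing in nu when w2 >= 0, so bisection applies.
    """
    offset = np.log(r_prev / (1.0 - r_prev))  # g{R_prev} with g = logit

    def score(nu):
        r_new = expit(offset + nu * q_diff)
        return np.mean(is_a_dag * w2 * q_diff * (m - r_new))

    # Expand a bracket [lo, hi] until score(lo) >= 0 >= score(hi).
    lo, hi = -1.0, 1.0
    for _ in range(max_expand):
        if score(lo) >= 0.0:
            break
        lo *= 2.0
    else:
        raise RuntimeError("no sign change found on the left")
    for _ in range(max_expand):
        if score(hi) <= 0.0:
            break
        hi *= 2.0
    else:
        raise RuntimeError("no sign change found on the right")

    # Plain bisection on the monotone score.
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid

    nu_hat = 0.5 * (lo + hi)
    return nu_hat, expit(offset + nu_hat * q_diff)
```

Because the update passes through the inverse logit, the updated values stay strictly inside (0, 1), which is the mechanism behind the boundedness of the resulting plug-in quantities.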
Extension to discrete exposure variables with more than two levels is straightforward. To see this, we can show that our estimand can be written as follows:
$$\Psi := P(A=a^\dagger)\,E(Y \mid A=a^\dagger) + \sum_{a^\circ \neq a^\dagger} P(A=a^\circ) \sum_{m,l} E(Y \mid M=m, L=l, A=a^\circ)\,f(m \mid a^\dagger, l)\,f(l \mid a^\circ).$$
It then follows that, in this extension, the efficient influence function $\varphi^{\text{eff}}(O)$ for $\Psi$, with $O = (L, A, M, Y)$, is given by
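$$\varphi^{\text{eff}}(O) = I(A=a^\dagger)\,Y + \sum_{a^\circ \neq a^\dagger} I(A=a^\circ)\,\psi_3 + \sum_{a^\circ \neq a^\dagger}\left[\frac{I(A=a^\circ)\,P(A=a^\dagger \mid M, L)\,P(A=a^\circ \mid L)}{P(A=a^\circ \mid M, L)\,P(A=a^\dagger \mid L)}\,\{Y - b_0(M, L)\} + \cdots\right]$$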
The weighted estimator will still be sample-bounded, but will need to be slightly modified
in the following way:
Algorithm 5 Algorithm for Weighted ICE (generalized frontdoor formula for discrete
exposure with more than two levels)
1: Non-parametrically compute P (A=a) for all values of a∈A.
2: Compute the MLEs κ̂ of κ from the observed data for the treatment model P (A=a|L;κ).
In addition, compute the MLEs α̂ of α from the observed data for the treatment model
P (A=a|M,L;α), or compute the MLEs γ̂ of γ from the observed data for the mediator
model P (M =m|A,L;γ)
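3: For each level $a^\circ \in \mathcal{A}$ that is not equal to $a^\dagger$, do the following:
$$\frac{f(M \mid A=a^\dagger, L;\hat\gamma)}{f(M \mid A=a^\circ, L;\hat\gamma)}$$
if $\hat\gamma$ was estimated in the previous step. Moreover, $\phi_{a^\circ}(M, L)$ is a known function of $M$ and $L$. More specifically, we solve for $\theta_{a^\circ}$ in the following estimating equations:
$$P_n\left[I(A=a^\circ)\,\phi_{a^\circ}(M, L)\,\hat W_{1,a^\circ}\,\{Y - Q_{a^\circ}(M, L;\theta_{a^\circ})\}\right] = 0.$$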
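B. In those whose $A=a^\dagger$, fit a regression model $R_{a^\circ}(L;\eta_{a^\circ}) = g^{-1}\{\eta_{a^\circ}^T \Gamma_{a^\circ}(L)\}$ for $h_\dagger(L) = E(b_{a^\circ}(M, L) \mid L, A=a^\dagger)$ where the score function for each observation is weighted by
$$\hat W_{2,a^\circ} = \frac{P(A=a^\circ \mid L;\hat\kappa)}{P(A=a^\dagger \mid L;\hat\kappa)}.$$
Here, $\Gamma_{a^\circ}(L)$ is a known function of $L$. More specifically, we solve for $\eta_{a^\circ}$ in the following estimating equations:
$$P_n\left[I(A=a^\dagger)\,\Gamma_{a^\circ}(L)\,\hat W_{2,a^\circ}\left\{Q_{a^\circ}(M, L;\hat\theta_{a^\circ}) - R_{a^\circ}(L;\eta_{a^\circ})\right\}\right] = 0.$$
C. In those whose $A=a^\circ$, fit another regression model $T(\beta_{a^\circ}) = g^{-1}(\beta_{a^\circ})$ for $\psi_3 = E\{h_{\dagger,a^\circ}(L) \mid A=a^\circ\}$ with just an intercept. More specifically, we solve for $\beta_{a^\circ}$ in the following estimating equations: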
The data-generating mechanism for our second simulation study and the model specifications are provided in Table 3. To illustrate the robustness of our proposed estimator to model misspecification, we consider four model specification scenarios: (1) all models are correctly specified, (2) only the models for $b_0(M, L)$ and $h_\dagger(L)$ are correctly specified, (3) only the models for $b_0(M, L)$ and $P(A=a \mid L)$ are correctly specified, and (4) only the models for $P(M=m \mid A, L)$ and $P(A=a \mid L)$ are correctly specified. The correct mediator model in the specification scenarios is the one used in the data generation process, and the exposure and outcome models are approximately correctly specified by including pairwise interactions between all the variables to ensure flexibility.
Table 4 shows the results from the simulation study. As expected from our theoretical derivations, when all of the working models are correctly specified, all of the estimators are nearly unbiased. The AIPW estimator and our proposed weighted ICE estimator are also nearly unbiased in the three model misspecification settings, whereas the IPW and ICE estimators are not unbiased in all of these settings.
Table 3: Data generating mechanism and model misspecifications for scenarios in simulation
study.
Table 4: Simulation results: Bias, standard error (SE), and standardized bias ($\text{Bias}_s$) are multiplied by 100.
From a first-order expansion of a singly-robust plug-in estimator (such as the IPW and ICE estimators), it can be shown that we require the nuisance parameter estimators to converge to the truth at rate $n^{-1/2}$. This rate is generally not attainable for non-parametric estimators of conditional mean functions. When doubly robust estimators are used with data-adaptive methods, however, this issue largely disappears, as doubly robust estimators enjoy the small bias property (Newey et al., 2004).
In this section we will examine the Remainder or Bias term from the following decomposi-
tion. For notational brevity, we suppress O in the equations below. For generality, suppose
that Ψ(P̂ ) is an estimator that solves the estimating equations based on the efficient influ-
ence function. We have that
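$$\begin{aligned}
\sqrt{n}\,(\Psi(\hat P) - \Psi(P)) &= \sqrt{n}\left[P_n(\varphi^{\text{eff}}(\hat P)) - P(\varphi^{\text{eff}}(\hat P))\right] + \sqrt{n}\left[\Psi(\hat P) + P(\varphi^{\text{eff}}(\hat P)) - \Psi(P)\right] \\
&= G_n(\varphi^{\text{eff}}(P)) + G_n[\varphi^{\text{eff}}(\hat P) - \varphi^{\text{eff}}(P)] + \sqrt{n}\left[\Psi(\hat P) + P(\varphi^{\text{eff}}(\hat P)) - \Psi(P)\right] \\
&= \underbrace{G_n(\varphi(P))}_{T_1} + \underbrace{G_n[\varphi(\hat P) - \varphi(P)]}_{T_2} + \underbrace{\sqrt{n}\left[\Psi(\hat P) + P(\varphi^{\text{eff}}(\hat P)) - \Psi(P)\right]}_{R}
\end{aligned}$$
where $G_n[X] = \sqrt{n}\,(P_n - P)(X)$ for any $X$ and we define $\varphi(O;\tilde P) = \varphi^{\text{eff}}(O;\tilde P) + \Psi(\tilde P)$ for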
any $\tilde P$. The first term, $T_1$, is a centered sample average that converges to a mean-zero Normal distribution by the central limit theorem. The second term, $T_2$, is known as an empirical process term, which can be shown to be $o_p(1)$ if we assume that the nuisance functions and their corresponding estimators are not too complex and belong to a Donsker class. Alternatively, one can use sample splitting and cross-fitting to overcome issues with overfitting (Chernozhukov et al., 2018).
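The sample-splitting idea mentioned here can be sketched generically in Python: each observation's nuisance prediction comes from a model fitted on the other folds. This is a schematic under assumptions, not the procedure used in the paper; the function names and the ordinary least squares learner (`fit_ols`, `predict_ols`) are placeholders for whatever data-adaptive method is actually used.

```python
import numpy as np


def cross_fit_predictions(x, y, fit, predict, n_folds=5, seed=0):
    """Return out-of-fold predictions of a nuisance function (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    out = np.empty(n, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit on the complement of the fold, predict on the held-out fold.
        model = fit(x[train_mask], y[train_mask])
        out[test_idx] = predict(model, x[test_idx])
    return out


def fit_ols(x, y):
    # Placeholder learner: least squares with an intercept.
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef
```

The out-of-fold predictions would then be plugged into the estimating equations in place of in-sample fitted values, so that each observation is evaluated with nuisance estimates that did not use it.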
Formally, we assume the following conditions for the first two terms:
C3. $\|\hat\varphi(O) - \varphi(O)\|_2^2 \xrightarrow{p} 0$
where $\hat\varphi(O)$ is defined analogously to $\varphi(O)$ but evaluated at $\hat P$, and where all nuisance function estimators are exactly the same as those in the TMLE estimator. Similarly, $\hat\varphi^{\text{eff}}(O)$ can be defined analogously. It is not hard to show that $P_n\{\hat\varphi^{\text{eff}}(O)\} = 0$ by construction.
The last term is known as the remainder or bias term. We will need to show that R=op (1)
under some conditions about the convergence rates of the nuisance functions.
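We examine the terms in blue (henceforth denoted as (A)) and the term in red (henceforth denoted as (B)) in detail. Starting with the term in (B):
$$\begin{aligned}
(B) &= E_P\left[I(A=a^\circ)\{\hat h_\dagger(L) - h_\dagger(L)\} + \frac{I(A=a^\dagger)\,\hat f(a^\circ \mid L)}{\hat f(a^\dagger \mid L)}\{\hat b_0(M, L) - \hat h_\dagger(L)\}\right] \\
&= E_P\left[I(A=a^\circ)\{\hat h_\dagger(L) - h_\dagger(L)\} + \frac{I(A=a^\dagger)\,\hat f(a^\circ \mid L)}{\hat f(a^\dagger \mid L)}\left\{E_P(\hat b_0(M, L) \mid A=a^\dagger, L) - \hat h_\dagger(L)\right\}\right] \\
&= E_P\Big[f(a^\circ \mid L)\{\hat h_\dagger(L) - h_\dagger(L)\} + \frac{f(a^\dagger \mid L)\,\hat f(a^\circ \mid L)}{\hat f(a^\dagger \mid L)}\left\{E_P(\hat b_0(M, L) \mid A=a^\dagger, L) - \hat h_\dagger(L)\right\} \\
&\qquad\quad + f(a^\circ \mid L)\left\{E_P(\hat b_0(M, L) \mid A=a^\dagger, L) - E_P(\hat b_0(M, L) \mid A=a^\dagger, L)\right\}\Big] \\
&= E_P\left[\left\{E_P(\hat b_0(M, L) \mid A=a^\dagger, L) - \hat h_\dagger(L)\right\}\left\{\frac{f(a^\dagger \mid L)\,\hat f(a^\circ \mid L)}{\hat f(a^\dagger \mid L)} - f(a^\circ \mid L)\right\}\right] \\
&\qquad + E_P\left[E_P\left\{\hat b_0(M, L) - b_0(M, L) \mid A=a^\dagger, L\right\} I(A=a^\circ)\right] \\
&= E_P\left[\left\{E_P(\hat b_0(M, L) \mid A=a^\dagger, L) - \hat h_\dagger(L) + h_\dagger(L) - h_\dagger(L)\right\}\left\{\hat f(a^\circ \mid L) - f(a^\circ \mid L)\right\}\frac{1}{\hat f(a^\dagger \mid L)}\right] \\
&\qquad + E_P\left[E_P\left\{\hat b_0(M, L) - b_0(M, L) \mid A=a^\dagger, L\right\} I(A=a^\circ)\right] \\
&= E_P\left[\{h_\dagger(L) - \hat h_\dagger(L)\}\left\{\hat f(a^\circ \mid L) - f(a^\circ \mid L)\right\}\frac{1}{\hat f(a^\dagger \mid L)}\right] \\
&\qquad + E_P\left[E_P(\hat b_0(M, L) - b_0(M, L) \mid A=a^\dagger, L)\left\{\hat f(a^\circ \mid L) - f(a^\circ \mid L)\right\}\frac{1}{\hat f(a^\dagger \mid L)}\right] \\
&\qquad + \underbrace{E_P\left[E_P\left\{\hat b_0(M, L) - b_0(M, L) \mid A=a^\dagger, L\right\} I(A=a^\circ)\right]}_{(B.2)}.
\end{aligned}$$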
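The step in the expansion of (B) that replaces the difference of propensity terms by a single ratio relies on the exposure being binary, so that $f(a^\dagger \mid L) = 1 - f(a^\circ \mid L)$. It also assumes that $\hat f$ is a proper conditional probability, so that $\hat f(a^\dagger \mid L) = 1 - \hat f(a^\circ \mid L)$; this second condition is an assumption made here for the worked step. Under these conditions the algebra is:

```latex
\begin{aligned}
\frac{f(a^\dagger \mid L)\,\hat f(a^\circ \mid L)}{\hat f(a^\dagger \mid L)} - f(a^\circ \mid L)
&= \frac{\{1 - f(a^\circ \mid L)\}\,\hat f(a^\circ \mid L) - f(a^\circ \mid L)\,\{1 - \hat f(a^\circ \mid L)\}}{\hat f(a^\dagger \mid L)} \\
&= \frac{\hat f(a^\circ \mid L) - f(a^\circ \mid L)}{\hat f(a^\dagger \mid L)}.
\end{aligned}
```

The cross terms $f(a^\circ \mid L)\,\hat f(a^\circ \mid L)$ cancel in the numerator, which is what produces the product-of-errors structure in the remaining terms.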
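We keep in mind the term in purple (term (B.2)) as we expand upon term (A):
$$\begin{aligned}
(A) &= E_P\left[\frac{I(A=a^\circ)\,\hat f(M \mid a^\dagger, L)}{\hat f(M \mid a^\circ, L)}\{Y - \hat b_0(M, L)\}\right] \\
&= E_P\left[\frac{I(A=a^\circ)\,\hat f(M \mid a^\dagger, L)}{\hat f(M \mid a^\circ, L)}\left\{b_0(M, L) - \hat b_0(M, L)\right\}\right] \\
&= E_P\left(I(A=a^\circ)\,E_P\left[\frac{\hat f(M \mid a^\dagger, L)\,f(M \mid a^\circ, L)}{\hat f(M \mid a^\circ, L)\,f(M \mid a^\dagger, L)}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right).
\end{aligned}$$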
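Now, adding (A) and (B.2) (term in purple) we get the following:
$$\begin{aligned}
& E_P\left(I(A=a^\circ)\,E_P\left[\left\{\frac{\hat f(M \mid a^\dagger, L)\,f(M \mid a^\circ, L)}{\hat f(M \mid a^\circ, L)\,f(M \mid a^\dagger, L)} - 1\right\}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right) \\
&= E_P\left(I(A=a^\circ)\,E_P\left[\left\{\frac{f(M \mid a^\circ, L)}{f(M \mid a^\dagger, L)}\,\frac{1}{\hat f(M \mid a^\circ, L)}\right\}\left\{\hat f(M \mid a^\dagger, L) - f(M \mid a^\dagger, L)\right\}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right) \\
&\quad + E_P\left(I(A=a^\circ)\,E_P\left[\left\{\frac{1}{\hat f(M \mid a^\circ, L)}\right\}\left\{f(M \mid a^\circ, L) - \hat f(M \mid a^\circ, L)\right\}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right).
\end{aligned}$$
Collecting these two terms together with the two remaining terms of (B) gives
$$\begin{aligned}
& E_P\left[\{h_\dagger(L) - \hat h_\dagger(L)\}\left\{\hat f(a^\circ \mid L) - f(a^\circ \mid L)\right\}\frac{1}{\hat f(a^\dagger \mid L)}\right] \\
&+ E_P\left[E_P(\hat b_0(M, L) - b_0(M, L) \mid A=a^\dagger, L)\left\{\hat f(a^\circ \mid L) - f(a^\circ \mid L)\right\}\frac{1}{\hat f(a^\dagger \mid L)}\right] \\
&+ E_P\left(I(A=a^\circ)\,E_P\left[\left\{\frac{f(M \mid a^\circ, L)}{f(M \mid a^\dagger, L)}\,\frac{1}{\hat f(M \mid a^\circ, L)}\right\}\left\{\hat f(M \mid a^\dagger, L) - f(M \mid a^\dagger, L)\right\}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right) \\
&+ E_P\left(I(A=a^\circ)\,E_P\left[\left\{\frac{1}{\hat f(M \mid a^\circ, L)}\right\}\left\{f(M \mid a^\circ, L) - \hat f(M \mid a^\circ, L)\right\}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right).
\end{aligned}$$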
for $\nu > 1/2$ and where $\|f(x)\| = \left\{\int |f(x)|^2\,dP(x)\right\}^{1/2}$, i.e. the $L_2(P)$ norm. Then, $\sqrt{n}\left\{\Psi(\hat P) + P(\varphi^{\text{eff}}(\hat P)) - \Psi(P)\right\} = o_p(1)$. This can be accomplished, for example, if the nuisance functions are each consistently estimated at a rate faster than $n^{-1/4}$.
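To see why a rate faster than $n^{-1/4}$ for each nuisance function is enough, consider one product term of the remainder as a worked example. The bound below assumes that $1/\hat f(a^\dagger \mid L) \le C$ almost surely for some constant $C$ (a positivity-type condition introduced here for illustration) and treats $\hat P$ as fixed, as under sample splitting. By the Cauchy-Schwarz inequality:

```latex
\begin{aligned}
\left| E_P\!\left[\frac{\{h_\dagger(L) - \hat h_\dagger(L)\}\{\hat f(a^\circ \mid L) - f(a^\circ \mid L)\}}{\hat f(a^\dagger \mid L)}\right] \right|
&\le C\,\|\hat h_\dagger - h_\dagger\|\;\|\hat f(a^\circ \mid \cdot) - f(a^\circ \mid \cdot)\| \\
&= o_p(n^{-1/4})\,o_p(n^{-1/4}) = o_p(n^{-1/2}).
\end{aligned}
```

Multiplying by $\sqrt{n}$ then gives an $o_p(1)$ contribution. The same argument applies to each of the other product terms, with the relevant density ratios bounded in place of $1/\hat f(a^\dagger \mid L)$.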
Note that $h_\dagger(L) = \sum_m b_0(m, L)\,f(m \mid a^\dagger, L)$. In our estimators, we propose estimating $h_\dagger(L)$
$$E_P\left(I(A=a^\circ)\,E_P\left[\left\{\frac{1}{\hat f(M \mid a^\circ, L)}\right\}\left\{f(M \mid a^\circ, L) - \hat f(M \mid a^\circ, L)\right\}\left\{b_0(M, L) - \hat b_0(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right).$$
Then we can show that if the model for f (M |A,L) is correctly specified, then the remainder
term reduces to the following asymptotically:
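$$\begin{aligned}
& E_P\left(I(A=a^\circ)\,E_P\left[\frac{f(M \mid a^\circ, L)}{f(M \mid a^\dagger, L)}\,\frac{1}{f^*(M \mid a^\circ, L)}\left\{f^*(M \mid a^\dagger, L) - f(M \mid a^\dagger, L)\right\}\left\{b_0(M, L) - b_0^*(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right) \\
&+ E_P\left(I(A=a^\circ)\,E_P\left[\frac{1}{f^*(M \mid a^\circ, L)}\left\{f(M \mid a^\circ, L) - f^*(M \mid a^\circ, L)\right\}\left\{b_0(M, L) - b_0^*(M, L)\right\} \Big|\, A=a^\dagger, L\right]\right) + o_p(1)
\end{aligned}$$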
where $f^*(M \mid A, L)$ and $b_0^*(M, L)$ denote the limiting values of $\hat f(M \mid A, L)$ and $\hat b_0(M, L)$. This gives intuition for why the augmented inverse probability weighted estimator proposed in Fulcher et al. (2020) is consistent when the models for $b_0(M, L)$ and $P(A=a \mid L)$ are correctly specified, or when the model for $P(M=m \mid A, L)$ is correctly specified.
for $\nu > 1/2$ and where $\|f(x)\| = \left\{\int |f(x)|^2\,dP(x)\right\}^{1/2}$, i.e. the $L_2(P)$ norm, then $\sqrt{n}\left\{\Psi(\hat P) + P(\varphi^{\text{eff}}(\hat P)) - \Psi(P)\right\} = o_p(1)$. This can be accomplished, for example, if the nuisance functions are each consistently estimated at a rate faster than $n^{-1/4}$.
References
Joshua D Angrist, Guido W Imbens, and Donald B Rubin. Identification of causal effects
using instrumental variables. Journal of the American Statistical Association, 91(434):
444–455, 1996.
Chen Avin, Ilya Shpitser, and Judea Pearl. Identifiability of path-specific effects. In IJCAI
International Joint Conference on Artificial Intelligence, pages 357–363, 2005.
L.M. Collins, J.L. Schafer, and C.M. Kam. A comparison of inclusive and restrictive strate-
gies in modern missing data procedures. Psychological Methods, 6(4):330–351, 2001.
A Philip Dawid. Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450):407–424, 2000.
Vanessa Didelez. Causal concepts and graphical models. In Handbook of graphical models,
pages 353–380. CRC Press, 2018.
Vanessa Didelez. Defining causal mediation with a longitudinal mediator and a survival
outcome. Lifetime data analysis, 25:593–610, 2019.
Deborah Dowell, Tamara M Haegerich, and Roger Chou. Cdc guideline for prescribing
opioids for chronic pain—united states, 2016. Jama, 315(15):1624–1645, 2016.
Deborah Dowell, Kathleen R Ragan, Christopher M Jones, Grant T Baldwin, and Roger
Chou. Cdc clinical practice guideline for prescribing opioids for pain—united states, 2022.
MMWR Recommendations and Reports, 71(3):1, 2022.
D. Gerdeman. Minorities who ’whiten’ job resumes get more interviews. Harvard Business
School: Working Knowledge, 2017.
Susan Gruber and Mark J van der Laan. A targeted maximum likelihood estimator of a
causal effect on a bounded continuous outcome. The International Journal of Biostatis-
tics, 6(1), 2010.
Miguel A Hernán and Sarah L Taubman. Does obesity shorten life? the importance of
well-defined interventions to answer causal questions. International journal of obesity, 32
(3):S8–S14, 2008.
Paul W Holland. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960, 1986.
Kosuke Inoue, Beate Ritz, and Onyebuchi A Arah. Causal effect of chronic pain on mortality
through opioid prescriptions: Application of the front-door formula. Epidemiology, 33(4):
572–580, 2022.
S. K. Kang, K. A. DeCelles, A. Tilcsik, and S. Jun. Whitened résumés: Race and self-
presentation in the labor market. Administrative Science Quarterly, 61(3):469–502, 2016.
Marc Lipsitch, Eric Tchetgen Tchetgen, and Ted Cohen. Negative controls: a tool for de-
tecting confounding and bias in observational studies. Epidemiology (Cambridge, Mass.),
21(3):383, 2010.
H. Merskey and N. Bogduk. Classification of chronic pain, IASP Task Force on Taxonomy,
1994.
Whitney K Newey, Fushing Hsieh, and James M Robins. Twicing kernels and a small bias
property of semiparametric estimators. Econometrica, 72(3):947–962, 2004.
Thomas S Richardson and James M Robins. Single world intervention graphs (SWIGs): A
unification of the counterfactual and graphical approaches to causality. CSSS, University
of Washington Series. Working Paper, 128(30):2013, 2013.
J. Robins, L. Li, E. Tchetgen, and A. van der Vaart. Higher order influence functions and
minimax estimation of nonlinear functionals. Probability and statistics: essays in honor
of David A. Freedman, 2:335–421, 2008.
J. M. Robins and T. S. Richardson. Alternative graphical causal models and the identi-
fication of direct effects. Causality and psychopathology: Finding the determinants of
disorders and their cures, 84:103–158, 2010.
J. Schafer and J. Kang. Average causal effects from nonrandomized studies: a practical
guide and simulated example. Psychological methods, 13(4):279, 2008.
Ilya Shpitser. Counterfactual graphical models for longitudinal mediation analysis with
unobserved confounding. Cognitive science, 37(6):1011–1035, 2013.
Ilya Shpitser and Eli Sherman. Identification of personalized effects associated with causal
pathways. In Uncertainty in artificial intelligence: proceedings of the... conference. Con-
ference on Uncertainty in Artificial Intelligence, volume 2018. NIH Public Access, 2018.
Mats J Stensrud, James M Robins, Aaron Sarvet, Eric J Tchetgen Tchetgen, and Jessica G
Young. Conditional separable effects. Journal of the American Statistical Association,
118(544):1–29, 2022a.
Mats J Stensrud, Jessica G Young, Vanessa Didelez, James M Robins, and Miguel A Hernán.
Separable effects for causal inference in the presence of competing events. Journal of the
American Statistical Association, 117(537):175–183, 2022b.
Eric J Tchetgen Tchetgen and Ilya Shpitser. Semiparametric theory for causal media-
tion analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Annals of
statistics, 40(3):1816, 2012.
Eric J Tchetgen Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, and Wang Miao. An intro-
duction to proximal causal learning. arXiv preprint arXiv:2009.10982, 2020.
M. J. Van der Laan and S. Rose. Targeted learning: causal inference for observational and
experimental data, volume 10. Springer, 2011.
M. J. Van der Laan and S. Rose. Targeted learning in data science. Springer, 2018.
Mark van der Laan and Susan Gruber. One-step targeted minimum loss-based estimation
based on universal least favorable one-dimensional submodels. International Journal of
Biostatistics, 12(1):351–378, 2009.
A. W. Van Der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000.
S. Vansteelandt, A. Rotnitzky, and J.M. Robins. Estimation of regression models for the
mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika,
94(4):841–860, 2007.