Cinelli - Hazlett - 2020 - Making Sense of Sensitivity Extending Omitted Variable Bias
B (2020)
82, Part 1, pp. 39–67
Summary. We extend the omitted variable bias framework with a suite of tools for sensitivity
analysis in regression models that does not require assumptions on the functional form of
the treatment assignment mechanism nor on the distribution of the unobserved confounders,
naturally handles multiple confounders, possibly acting non-linearly, exploits expert knowledge
to bound sensitivity parameters and can be easily computed by using only standard regression
results. In particular, we introduce two novel sensitivity measures suited for routine reporting. The
robustness value describes the minimum strength of association that unobserved confounding
would need to have, both with the treatment and with the outcome, to change the research
conclusions. The partial R2 of the treatment with the outcome shows how strongly confounders
explaining all the residual outcome variation would have to be associated with the treatment
to eliminate the estimated effect. Next, we offer graphical tools for elaborating on problematic
confounders, examining the sensitivity of point estimates and t-values, as well as ‘extreme
scenarios’. Finally, we describe problems with a common ‘benchmarking’ practice and introduce
a novel procedure to bound the strength of confounders formally on the basis of a comparison
with observed covariates. We apply these methods to a running example that estimates the
effect of exposure to violence on attitudes toward peace.
Keywords: Causal inference; Confounding; Omitted variable bias; Regression; Robustness
value; Sensitivity analysis
1. Introduction
Observational research often seeks to estimate causal effects under a ‘no-unobserved-
confounding’ or ‘ignorability’ (conditional on observables) assumption (see for example Rosen-
baum and Rubin (1983a), Pearl (2009) and Imbens and Rubin (2015)). When making causal
claims from observational data, investigators marshal what evidence they can to argue that their
result is not due to confounding. In ‘natural’ and ‘quasi’-experiments, this often includes a qual-
itative account for why the treatment assignment is ‘as if’ random conditional on a set of key
characteristics (see for example Angrist and Pischke (2008) and Dunning (2012)). Investigators
seeking to make causal claims from observational data are also instructed to show ‘balance
tests’ and ‘placebo tests’. Although, in some cases, null findings on these tests may be consistent
with the claim of no unobserved confounders, they are certainly not dispositive: it is unobserved
variables that we worry may be both ‘imbalanced’ and related to the outcome in problematic
ways. Fundamentally, causal inferences always require assumptions that are unverifiable from
the data (Pearl, 2009).
Thus, in addition to balance and placebo tests, investigators are advised to conduct ‘sensitivity
Address for correspondence: Chad Hazlett, Departments of Statistics and Political Science, University of Cali-
fornia, Los Angeles, 8125 Math Sciences Building, Los Angeles, CA 90095, USA.
(a) the ‘robustness value’ RV, which provides a convenient reference point to assess the overall
robustness of a coefficient to unobserved confounding. If the confounders’ association
to the treatment and to the outcome (measured in terms of partial R2 ) are both assumed
to be less than the robustness value, then such confounders cannot ‘explain away’ the
observed effect. And,
(b) the proportion of variation in the outcome explained uniquely by the treatment, R2Y ∼D|X ,
which reveals how strongly confounders that explain 100% of the residual variance of
the outcome would have to be associated with the treatment to eliminate the effect.
Both measures can be easily computed from standard regression output: one needs only the
estimate’s t-value and the degrees of freedom. To advance standard practice across a variety of
disciplines, we propose routinely reporting RV and R2Y ∼D|X in regression tables.
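As a rough illustration of how little input this requires, the sketch below recovers the partial R2 of the treatment with the outcome from a t-value and the degrees of freedom. It assumes the standard identity R2 = t^2 / (t^2 + df); the function name is hypothetical, and the inputs (t = 4.2, df = 783) are the rounded values that appear for the running example.

```python
def partial_r2_from_t(t_value: float, dof: int) -> float:
    """Partial R2 of a regressor with the outcome, from its t-value and residual df.

    Assumes the identity R2 = t^2 / (t^2 + df) for an OLS coefficient.
    """
    t_sq = t_value ** 2
    return t_sq / (t_sq + dof)


# Rounded values for the DirectHarm coefficient in the running example.
r2_yd_x = partial_r2_from_t(4.2, 783)
print(f"partial R2 of treatment with outcome: {100 * r2_yd_x:.1f}%")
```

Only two numbers from a regression table are needed, which is what makes the measure suitable for routine reporting.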
Next, we offer graphical tools that investigators can use to refine their sensitivity analyses.
The first is close in spirit to the proposal of Imbens (2003)—a bivariate sensitivity contour plot,
parameterizing the confounder in terms of partial R2 values. However, contrary to Imbens’s
(transcript from an interview taken by the Darfurian Voices team; interview code 03072009
118 cf2009008).
One can further argue that attacks were indiscriminate within village on the basis that the
violence that was promoted by the government was mainly used to drive people out rather than
to target individuals. Within village, the bombing was crude and the attackers had almost no
information about whom they would target, with one major exception: whereas both men and
women were often injured or killed, women were targeted for widespread sexual assault and
rape by the Janjaweed.
With this in mind, an investigator might claim that village and gender are sufficient for control
of confounding and estimate the linear model
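$$\text{PeaceIndex} = \hat{\tau}_{res}\,\text{DirectHarm} + \hat{\beta}_{f,res}\,\text{Female} + \text{Village}\,\hat{\beta}_{v,res} + X\hat{\beta}_{res} + \hat{\varepsilon}_{res}. \tag{1}$$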
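i.e. our earlier estimate $\hat{\tau}_{res}$ would differ from our target quantity $\hat{\tau}$: but how badly? How
$$\hat{\tau}_{res} = \frac{\operatorname{cov}(D^{\perp X}, Y^{\perp X})}{\operatorname{var}(D^{\perp X})}$$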
3.3. Using the traditional omitted variable bias for sensitivity analysis
If we know the signs of the partial correlations between the confounder with the treatment and
the outcome (the same as the signs of γ̂ and δ̂) we can argue whether our estimate is likely to
be underestimating or overestimating the quantity of interest. Arguments using correlational
direction are common practice in econometrics work (for an example, see Angrist and Pischke
(2017), pages 8–9). Often, though, discussing possible direction of the bias is not possible or not
sufficient, and magnitude must be considered. How strong would the confounder(s) have to be
to change the estimates in such a way to affect the main conclusions of a study?
[Figure: sensitivity contour plot with panel title ‘Impact of Centre/Periphery’; the unadjusted estimate (0.097) is marked.]
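where $\hat{Z}$ are the fitted values given by regressing Z on D. Note that the $R^2$ is symmetric, i.e. it is
invariant to whether we use the ‘forward’ or the ‘reverse’ regression since $R^2_{Z \sim D} = \operatorname{corr}(Z, D)^2 = \operatorname{corr}(D, Z)^2 = R^2_{D \sim Z}$. Extending this to the case with covariates X, we denote the partial $R^2$ from
regressing Z on D after controlling for X as $R^2_{Z \sim D|X}$. This has the same useful symmetry, with
$$
\begin{aligned}
|\text{bias}| &= |\hat{\delta}\hat{\gamma}| \\
&= \left| \frac{\operatorname{cov}(D^{\perp X}, Z^{\perp X})}{\operatorname{var}(D^{\perp X})}\, \frac{\operatorname{cov}(Y^{\perp X,D}, Z^{\perp X,D})}{\operatorname{var}(Z^{\perp X,D})} \right| \\
&= \left| \frac{\operatorname{corr}(D^{\perp X}, Z^{\perp X})\,\operatorname{sd}(Z^{\perp X})}{\operatorname{sd}(D^{\perp X})}\, \frac{\operatorname{corr}(Y^{\perp X,D}, Z^{\perp X,D})\,\operatorname{sd}(Y^{\perp X,D})}{\operatorname{sd}(Z^{\perp X,D})} \right| \\
&= \left| \frac{\operatorname{corr}(Y^{\perp X,D}, Z^{\perp X,D})\,\operatorname{corr}(D^{\perp X}, Z^{\perp X})}{\operatorname{sd}(Z^{\perp X,D})/\operatorname{sd}(Z^{\perp X})} \right| \frac{\operatorname{sd}(Y^{\perp X,D})}{\operatorname{sd}(D^{\perp X})}.
\end{aligned} \tag{7}
$$
Noting that $\operatorname{corr}(Y^{\perp X,D}, Z^{\perp X,D})^2 = R^2_{Y \sim Z|X,D}$, that $\operatorname{corr}(Z^{\perp X}, D^{\perp X})^2 = R^2_{D \sim Z|X}$ and that
$\operatorname{var}(Z^{\perp X,D})/\operatorname{var}(Z^{\perp X}) = 1 - R^2_{Z \sim D|X} = 1 - R^2_{D \sim Z|X}$, we can write equation (7) as
$$|\text{bias}| = \sqrt{\frac{R^2_{Y \sim Z|D,X}\,R^2_{D \sim Z|X}}{1 - R^2_{D \sim Z|X}}}\;\frac{\operatorname{sd}(Y^{\perp X,D})}{\operatorname{sd}(D^{\perp X})}. \tag{8}$$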
Equation (8) rewrites the OVB formula in terms that more conveniently rely on partial R2
measures of association rather than raw regression coefficients. Investigators may be interested
in how confounders alter inference as well, so we also examine the standard error. Let df denote
the regression’s degrees of freedom (for the restricted regression actually run). Noting that
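$$\operatorname{se}(\hat{\tau}_{res}) = \frac{\operatorname{sd}(Y^{\perp X,D})}{\operatorname{sd}(D^{\perp X})}\sqrt{\frac{1}{df}}, \tag{9}$$
$$\operatorname{se}(\hat{\tau}) = \frac{\operatorname{sd}(Y^{\perp X,D,Z})}{\operatorname{sd}(D^{\perp X,Z})}\sqrt{\frac{1}{df - 1}}, \tag{10}$$
whose ratio is
$$\frac{\operatorname{se}(\hat{\tau})}{\operatorname{se}(\hat{\tau}_{res})} = \frac{\operatorname{sd}(Y^{\perp X,D,Z})}{\operatorname{sd}(Y^{\perp X,D})}\,\frac{\operatorname{sd}(D^{\perp X})}{\operatorname{sd}(D^{\perp X,Z})}\sqrt{\frac{df}{df - 1}}, \tag{11}$$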
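A minimal sketch of how equations (8)-(11) might be applied to reported regression output is given below. It assumes that the ratio of residual standard deviations in equation (8) is recovered from equation (9) as se times the square root of df, and that the two standard deviation ratios in equation (11) equal sqrt(1 - R2_{Y~Z|D,X}) and 1/sqrt(1 - R2_{D~Z|X}) respectively. The function name is hypothetical, the standard error is backed out from the rounded estimate and t-value, and the confounder is assumed to pull the estimate toward zero.

```python
import math


def adjust_for_confounder(estimate, se, dof, r2_yz_dx, r2_dz_x):
    """Return (|bias|, adjusted estimate, adjusted se) for a hypothesized confounder.

    r2_yz_dx: partial R2 of the confounder with the outcome (given D and X).
    r2_dz_x:  partial R2 of the confounder with the treatment (given X).
    """
    if not (0.0 <= r2_yz_dx <= 1.0 and 0.0 <= r2_dz_x < 1.0):
        raise ValueError("partial R2 values must lie in [0, 1], with r2_dz_x < 1")
    # Equation (9) rearranged: sd(Y residual) / sd(D residual) = se * sqrt(df).
    sd_ratio = se * math.sqrt(dof)
    # Equation (8): absolute bias implied by the two partial R2 values.
    bias = math.sqrt(r2_yz_dx * r2_dz_x / (1.0 - r2_dz_x)) * sd_ratio
    # Equation (11) with the sd ratios written in terms of partial R2 (assumption).
    se_adj = se * math.sqrt((1.0 - r2_yz_dx) / (1.0 - r2_dz_x)) * math.sqrt(dof / (dof - 1))
    # Assumption: the confounder moves the estimate toward (and possibly past) zero.
    sign = 1.0 if estimate >= 0 else -1.0
    est_adj = sign * (abs(estimate) - bias)
    return bias, est_adj, se_adj


# Running example, rounded: estimate 0.097, t-value 4.2, df 783;
# confounder as strong as Female: R2 with outcome 12%, with treatment 1%.
se_reported = 0.097 / 4.2
bias, est_adj, se_adj = adjust_for_confounder(0.097, se_reported, 783, 0.12, 0.01)
print(f"bias={bias:.4f}, adjusted estimate={est_adj:.4f}, adjusted se={se_adj:.4f}")
```

The sketch takes only the estimate, its standard error and df as data inputs; everything else is a hypothesized sensitivity parameter.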
The numerator of the relative bias contains the partial Cohen’s f of the confounder with the
treatment, amortized by the partial correlation of that confounder with the outcome. (Cohen’s
$f^2$ can be written as $f^2 = R^2/(1 - R^2)$, so, for example, $f^2_{D \sim Z|X} = R^2_{D \sim Z|X}/(1 - R^2_{D \sim Z|X})$.)
Collectively this numerator could be called the bias factor of the confounder: $\text{BF} = |R_{Y \sim Z|D,X}\, f_{D \sim Z|X}|$,
which is determined entirely by the two sensitivity parameters $R^2_{Y \sim Z|D,X}$ and $R^2_{D \sim Z|X}$. To
determine the size of the relative bias, this is compared with how much variation of the outcome
is uniquely explained by the treatment assignment, in the form of the partial Cohen’s f of
the treatment with the outcome. Computationally, $f_{Y \sim D|X}$ can be obtained by dividing the
t-value of the treatment coefficient by the square root of the regression’s degrees of freedom,
$f_{Y \sim D|X} = t_{\hat{\tau}_{res}}/\sqrt{df}$. This enables us to assess easily the sensitivity to any confounder with a given
pair of partial R2 values; see Table 2 in the on-line supplement section D for an illustration
procedure.
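The comparison described above can be sketched in a few lines. The code assumes that the relative bias is the ratio of the bias factor BF to |f_{Y~D|X}|, which is how the text describes the numerator and the quantity it is compared with; the function names are hypothetical and the inputs are the rounded values of the running example.

```python
import math


def bias_factor(r2_yz_dx: float, r2_dz_x: float) -> float:
    """BF = |R_{Y~Z|D,X} * f_{D~Z|X}|, from the two sensitivity parameters."""
    f_dz_x = math.sqrt(r2_dz_x / (1.0 - r2_dz_x))  # partial Cohen's f with the treatment
    return math.sqrt(r2_yz_dx) * f_dz_x


def relative_bias(t_value: float, dof: int, r2_yz_dx: float, r2_dz_x: float) -> float:
    """Bias as a share of the estimate, assuming relative bias = BF / |f_{Y~D|X}|."""
    f_yd_x = abs(t_value) / math.sqrt(dof)  # partial Cohen's f of treatment with outcome
    return bias_factor(r2_yz_dx, r2_dz_x) / f_yd_x


# Confounder with partial R2 of 12% with the outcome and 1% with the treatment.
rel = relative_bias(4.2, 783, 0.12, 0.01)
print(f"relative bias: {100 * rel:.0f}% of the estimate")
```

A relative bias of 1 or more would mean that the hypothesized confounder is strong enough to eliminate the estimate entirely.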
Equation (14) also reveals that, given a particular confounder (which will fix BF), the only
property that is needed to determine the robustness of a regression estimate against that con-
founder is the partial R2 of the treatment with the outcome (via fY ∼D|X ). This serves to reinforce
the fact that robustness to confounding is an identification problem, impervious to sample size
considerations. Whereas t-values and p-values might be informative with respect to the statisti-
cal uncertainty (in a correctly specified model), robustness to misspecification is determined by
the share of variation of the outcome that the treatment uniquely explains.
A subtle but useful property of the partial R2 parameterization is that it reveals an asymme-
try in the role of the components of the bias factor. In the traditional OVB formulation, the
bias is simply a product of two terms with the same importance. The new formulation breaks
this symmetry: the effect of the partial R2 of the confounder with the outcome on the bias
factor is bounded at 1. By contrast, the effect of the partial R2 of the confounder with the
treatment on the bias factor is unbounded (via fD∼Z|X ). This enables us to consider extreme
scenarios, in which we suppose that the confounder explains all of the left-out variation of the
outcome, and to see what happens as we vary the partial R2 of the confounder with the treatment
(Section 5.3).
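An extreme scenario of this kind can be sketched as follows: the partial R2 of the confounder with the outcome is fixed at 1 and the partial R2 with the treatment is varied. The sketch assumes that the adjusted estimate equals the original estimate times (1 - relative bias), with the confounder acting to reduce the estimate; the function name and the grid of values are hypothetical.

```python
import math


def extreme_scenario_estimate(estimate, t_value, dof, r2_dz_x, r2_yz_dx=1.0):
    """Adjusted estimate when the confounder explains a share r2_yz_dx of the
    residual outcome variance (default: all of it)."""
    f_yd_x = abs(t_value) / math.sqrt(dof)
    bf = math.sqrt(r2_yz_dx) * math.sqrt(r2_dz_x / (1.0 - r2_dz_x))
    return estimate * (1.0 - bf / f_yd_x)


# Rounded values from the running example; hypothetical grid of R2 with the treatment.
for r2 in (0.0, 0.005, 0.01, 0.02, 0.03):
    adj = extreme_scenario_estimate(0.097, 4.2, 783, r2)
    print(f"R2 with treatment = {r2:.3f} -> adjusted estimate = {adj:+.4f}")
```

Under this assumption the adjusted estimate reaches zero exactly when the confounder's partial R2 with the treatment equals the partial R2 of the treatment with the outcome.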
Therefore, if needed, we can reason directly about sensitivity parameters R2Y ∼Z|X and R2D∼Z|X .
Finally, it may be beneficial to reason in terms of how much explanatory power is added by
including confounders. For this, recall that the partial R2 s are defined as
$$
R^2_{Y \sim Z|D,X} = \frac{R^2_{Y \sim D+X+Z} - R^2_{Y \sim D+X}}{1 - R^2_{Y \sim D+X}}, \qquad
R^2_{D \sim Z|X} = \frac{R^2_{D \sim X+Z} - R^2_{D \sim X}}{1 - R^2_{D \sim X}}, \tag{17}
$$
i.e. plausibility judgements about the partial R2 boil down to plausibility judgements about the
total (or added) explanatory power that we would have obtained in the treatment and the outcome
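Equation (17) translates a judgement about added explanatory power into a partial R2. A minimal sketch, with a hypothetical function name and purely illustrative numbers:

```python
def partial_r2(r2_full: float, r2_restricted: float) -> float:
    """Partial R2 of the added regressor(s), as in equation (17).

    r2_full:       R2 of the regression that includes the confounder.
    r2_restricted: R2 of the regression that omits it.
    """
    if not (0.0 <= r2_restricted < 1.0 and r2_restricted <= r2_full <= 1.0):
        raise ValueError("need 0 <= r2_restricted <= r2_full <= 1 and r2_restricted < 1")
    return (r2_full - r2_restricted) / (1.0 - r2_restricted)


# Illustrative only: a confounder that would raise the R2 from 0.40 to 0.46
# explains 10% of the variance left unexplained by the restricted model.
print(partial_r2(0.46, 0.40))
```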
(a) they can be routinely reported in standard regression tables, making the discussion of
sensitivity to unobserved confounding more accessible and standardized;
(b) they can be easily computed from quantities found in a regression table, enabling readers
and reviewers to initiate the discussion about unobserved confounders when reading
papers that did not formally assess sensitivity.
where fq := q|fY ∼D|X | is the partial Cohen’s f of the treatment with the outcome multiplied by
the proportion of reduction q on the treatment coefficient which would be deemed problematic.
Confounders that explain RVq % both of the treatment and of the outcome are sufficiently strong
to change the point estimate in problematic ways, whereas confounders with neither association
greater than RVq % are not.
The robustness value thus offers an interpretable sensitivity measure that summarizes how
robust the point estimate is to unobserved confounding. A robustness value that is close to 1
means that the treatment effect can handle strong confounders explaining almost all residual
variation of the treatment and the outcome. In contrast, a robustness value that is close to 0
means that even very weak confounders could eliminate the results. Note that the robustness
value can be easily computed from any regression table, recalling that $f_{Y \sim D|X}$ can be obtained
by simply dividing the treatment coefficient t-value by $\sqrt{df}$.
With minor adjustment, robustness values can also be obtained for t-values, or lower and
upper bounds of confidence intervals. Let $|t^*_{\alpha, df-1}|$ denote the t-value threshold for a t-test with
level of significance α and df − 1 degrees of freedom, and define $f^*_{\alpha, df-1} := |t^*_{\alpha, df-1}|/\sqrt{df - 1}$.
Now construct an adjusted $f_{q,\alpha}$, accounting for both the proportion of reduction q of the point
estimate and the boundary below which statistical significance is lost at the level of α:
$$f_{q,\alpha} := q|f_{Y \sim D|X}| - f^*_{\alpha, df-1}. \tag{19}$$
If $f_{q,\alpha} < 0$, then the robustness value is 0. If $f_{q,\alpha} > 0$, then a confounder with a partial $R^2$ of
$$RV_{q,\alpha} = \tfrac{1}{2}\left\{\sqrt{f^4_{q,\alpha} + 4f^2_{q,\alpha}} - f^2_{q,\alpha}\right\}, \tag{20}$$
would change the conclusion. Since we are considering sample uncertainty, $RV_{q,\alpha}$ is a more
conservative measure than $RV_q$. If we pick $|t^*_{\alpha, df-1}| = 0$ then $RV_{q,\alpha}$ reduces to $RV_q$. Also, for fixed
$|t^*_{\alpha, df-1}|$, $RV_{q,\alpha}$ converges to $RV_q$ when the sample size grows to ∞. See the on-line supplement
section A for details.
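Equations (19) and (20) can be evaluated from a regression table with a few lines of code. In the sketch below the function name is hypothetical and the critical value is supplied by the user; setting it to 0 gives $RV_q$, as noted above. The value 1.96 is only a large-sample approximation to the two-sided 5% threshold, used here to avoid depending on a statistics library.

```python
import math


def robustness_value(t_value: float, dof: int, q: float = 1.0, t_crit: float = 0.0) -> float:
    """Robustness value from equations (19)-(20).

    t_crit = 0 yields RV_q (point estimate); a positive critical value yields
    RV_{q,alpha} (loss of statistical significance).
    """
    f_q = q * abs(t_value) / math.sqrt(dof)      # q * |f_{Y~D|X}|
    f_star = abs(t_crit) / math.sqrt(dof - 1)    # f*_{alpha, df-1}
    f_qa = f_q - f_star                          # equation (19)
    if f_qa <= 0.0:
        return 0.0
    return 0.5 * (math.sqrt(f_qa ** 4 + 4.0 * f_qa ** 2) - f_qa ** 2)  # equation (20)


# Rounded values from the running example: t = 4.2, df = 783.
rv = robustness_value(4.2, 783)
rv_alpha = robustness_value(4.2, 783, t_crit=1.96)  # approximate 5% threshold
print(f"RV = {100 * rv:.1f}%, RV_alpha = {100 * rv_alpha:.1f}%")
```

With these rounded inputs the first value comes out close to the 13.9% reported for DirectHarm, and the second is smaller, reflecting that it is the more conservative measure.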
$$
k_D := \frac{R^2_{D \sim Z|X_{-j}}}{R^2_{D \sim X_j|X_{-j}}}, \qquad
k_Y := \frac{R^2_{Y \sim Z|X_{-j},D}}{R^2_{Y \sim X_j|X_{-j},D}}, \tag{21}
$$
where X−j represents the vector of covariates X excluding Xj , i.e. kD indexes how much variance
of the treatment assignment the confounder explains relative to how much Xj explains (after
controlling for the remaining covariates). To make things concrete, for example, if the researcher
where η is a scalar which depends on kY , kD and R2D∼Xj |X−j (see the on-line supplement section
B for details). These equations enable us to investigate the maximum effect that a confounder
at most ‘k times’ as strong as a particular covariate Xj would have on the coefficient estimate.
These results are also tight, in the sense that we can always find a confounder that makes the
second inequality an equality. Further, certain values for kD and kY may be ruled out by the
data (for instance, if R2D∼Xj |X−j = 50% then kD must be less than 1).
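The remark that some values of $k_D$ are ruled out by the data can be checked numerically. The sketch below rests on an assumption that is not stated in the text above: that the bound on the confounder's partial R2 with the treatment takes the form $k_D f^2_{D \sim X_j|X_{-j}}$, which is consistent with the 50% example just given. The function names are hypothetical.

```python
def implied_r2_dz_x(k_d: float, r2_dxj_xmj: float) -> float:
    """Assumed bound on R2_{D~Z|X}: k_D times the partial Cohen's f^2 of X_j with D."""
    f2 = r2_dxj_xmj / (1.0 - r2_dxj_xmj)
    return k_d * f2


def k_d_is_feasible(k_d: float, r2_dxj_xmj: float) -> bool:
    """A value of k_D is ruled out when the implied partial R2 would exceed 1."""
    return implied_r2_dz_x(k_d, r2_dxj_xmj) <= 1.0


# The example from the text: if X_j explains 50% of the residual treatment
# variance, then k_D = 1 sits exactly at the boundary and larger values fail.
print(k_d_is_feasible(1.0, 0.5), k_d_is_feasible(1.5, 0.5))
```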
Our bounding exercises can be extended to any subset of the covariates. For instance, the
researcher can bound the effect of a confounder as strong as all covariates X or any subset
thereof. The method can also be extended to allow different subgroups of covariates to bound
R2D∼Z|X and R2Y ∼Z|D,X —thus, if a group of covariates X1 is known to be the most important
driver of selection to treatment, and another group of covariates X2 is known to be the most
important determinant of the outcome, the researcher can exploit this fact.
5.1. Proposed minimal reporting: robustness value, $R^2_{Y \sim D|X}$ and bounds
Table 1 illustrates the type of reporting that we propose should accompany linear regression
models that are used for causal inference with observational data. Along with traditionally
reported statistics, we propose that researchers present
(a) the partial R2 of the treatment with the outcome and
(b) the robustness value RV, both for where the point estimate and the confidence interval
would cross zero, or another meaningful reference value (for convenience, we refer to RVq
or RVq,α with q = 1 as simply RV or RVα ).
Estimate  Standard error  t-value  R2Y∼D|X (%)  RV (%)  RVα=0.05 (%)
†df = 783; bound (Z as strong as Female), R2Y ∼Z|D,X = 12%, R2D∼Z|X = 1%.
Finally, to aid user judgement, we encourage researchers to provide plausible bounds on the
strength of the confounder. These may be based on bounds employing meaningful covariates
determined by the research context and design (Section 4.4), or in principle may be available
from theory and previous literature.
For our running example of violence in Darfur, Table 1 shows an augmented regression
table, including the robustness value RV of DirectHarm coefficient: 13.9%. This means that
unobserved confounders explaining at least 13.9% of the residual variance of both the treat-
ment and the outcome would explain away the estimated treatment effect. It also means that
any confounder explaining less than 13.9% of the residual variance of both the treatment
and the outcome would not be sufficiently strong to bring down the estimated effect to
[Figure: sensitivity contour plots for the point estimate and the t-value, with bounds for confounders 1x, 2x and 3x as strong as Female. Adjusted estimates: 0.08, 0.05 and 0.03 (unadjusted: 0.097). Adjusted t-values: 3.44, 2.6 and 1.63 (unadjusted: 4.2).]
6. Discussion
6.1. Making formal sensitivity analysis standard practice
Given that ruling out unobserved confounders is often difficult or impossible in observational
research, we might expect that sensitivity analyses would be a routine procedure in numerous
[Figure residue: axis label ‘Adjusted effect estimate’; contour plot marking ‘Unadjusted’ (0.5), ‘Informal benchmark’ (0.31) and ‘Proper bound’ (0).]
where $\hat{\lambda}$ and $\hat{\theta}$ are the coefficients of the regression $D = \hat{\theta}X + \hat{\lambda}Z + \hat{\varepsilon}_D$. Consequently, claims that
$\delta_{Oster} = 1$ implies that ‘the unobservable and observables are equally related to the treatment’
(Oster (2019), page 192) can lead researchers astray, as this quantity also depends on associations
with the outcome. To see how, let the variables be standardized to mean 0 and unit variance,
and pick $\hat{\beta} = \hat{\theta} = p$, $\hat{\gamma} = \hat{\lambda} = p/2$ and $\hat{\tau} = 0$. In this case, the confounder Z has either half or a
quarter of the explanatory power of X (as measured by standardized coefficients or variance
explained), yet $\delta_{Oster} = 1$.
Although researchers may be able to make arguments about relative explanatory power of
observables and unobservables in the treatment assignment process, the δOster -parameter does
not correspond directly to such claims. Indeed, arguments made by researchers applying Oster
(2019) suggest that they believe they are comparing the explanatory power of observables and
unobservables over treatment assignment in terms such as correlation or variance explained (e.g.
as in Jakiela and Ozier (2018), page 4, ‘Following the approach suggested by Altonji, Elder, and
Taber (2005) and Oster (2017), we estimate that unobservable country-level characteristics would
need to be 1.44 times more correlated with treatment than observed covariates to fully explain
the apparent impact of grammatical gender on the level of female labor force participation;
unobserved factors would need to be 3.23 times more closely linked to treatment to explain the
impact of grammatical gender on the gender gap in labor force participation’). By contrast, the
parameter kD that we introduced in our bounding procedure (Section 4.4) captures precisely
this notion of the relative explanatory power of the unobservable and observable over treatment
assignment, in terms of partial R2 or total R2 , depending on the investigator’s preference.
Such parameterization choices are more than notional when they drive a wedge between what
investigators can argue about and the values of the parameters that these arguments imply. It is
thus important that the sensitivity parameters that are used in these exercises be as transparent
as possible and match investigators’ conception of what they mean. Hence, we employ R2 -
based parameters, rather than t-values or quantities relating indices. The resulting sensitivity
parameters not only correspond more directly to what investigators can articulate and reason
about, but also lead to the rich set of sensitivity exercises that we have discussed. Of course,
further improvements may be possible and future research should investigate whether such
flexibility can be achieved with yet more meaningful parameterizations.
The tools that we propose here, like any other, have potential for abuse. We thus end with
important caveats, in particular emphasizing that sensitivity analysis should not be used for
automatic judgement, but as an instrument for disciplined arguments about confounding.
References
Altonji, J. G., Elder, T. E. and Taber, C. R. (2005) An evaluation of instrumental variable strategies for estimating
the effects of catholic schooling. J. Hum. Resour., 40, 791–821.
Angrist, J. D. and Pischke, J.-S. (2008) Mostly Harmless Econometrics: an Empiricist’s Companion. Princeton:
Princeton University Press.
Angrist, J. D. and Pischke, J.-S. (2017) Undergraduate econometrics instruction: through our classes, darkly.
Technical Report. National Bureau of Economic Research, Cambridge.
Blackwell, M. (2013) A selection bias approach to sensitivity analysis for causal effects. Polit. Anal., 22, 169–182.
Brumback, B. A., Hernán, M. A., Haneuse, S. J. and Robins, J. M. (2004) Sensitivity analyses for unmeasured
confounding assuming a marginal structural model for repeated measures. Statist. Med., 23, 749–767.
Carnegie, N. B., Harada, M. and Hill, J. L. (2016a) Assessing sensitivity to unmeasured confounding using a
simulated potential confounder. J. Res. Educ. Effect., 9, 395–420.
Carnegie, N., Harada, M. and Hill, J. (2016b) treatsens: a package to assess sensitivity of causal analyses to
unmeasured confounding.
Cinelli, C. and Hazlett, C. (2019) sensemakr: sensitivity analysis tools for OLS. R Package Version 0.1.2.
Cinelli, C., Kumor, D., Chen, B., Pearl, J. and Bareinboim, E. (2019) Sensitivity analysis of linear structural causal
models. Proc. Mach. Learn. Res., 97, 1252–1261.
Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B. and Wynder, E. L. (1959) Smoking
and lung cancer: recent evidence and a discussion of some questions. J. Natn. Cancer Inst., 22, 173–203.
Ding, P. and Miratrix, L. W. (2015) To adjust or not to adjust?: Sensitivity analysis of M-bias and butterfly-bias.
J. Causl Inf., 3, 41–57.
Dorie, V., Harada, M., Carnegie, N. B. and Hill, J. (2016) A flexible, interpretable framework for assessing
sensitivity to unmeasured confounding. Statist. Med., 35, 3453–3470.
Dunning, T. (2012) Natural Experiments in the Social Sciences: a Design-based Approach. New York: Cambridge
University Press.
Flint, J. and de Waal, A. (2008) Darfur: a New History of a Long War. London: Zed Books.
Frank, K. A. (2000) Impact of a confounding variable on a regression coefficient. Sociol. Meth. Res., 29, 147–194.
Frank, K. A., Maroulis, S. J., Duong, M. Q. and Kelcey, B. M. (2013) What would it take to change an inference?:
Using Rubin’s causal model to interpret the robustness of causal inferences. Educ. Evaln Poly Anal., 35, 437–460.
Frank, K. and Min, K.-S. (2007) Indices of robustness for sample representation. Sociol. Methodol., 37, 349–392.
Frank, K. A., Sykes, G., Anagnostopoulos, D., Cannata, M., Chard, L., Krause, A. and McCrory, R. (2008)
Does NBPTS certification affect the number of colleagues a teacher helps with instructional matters? Educ.
Evaln Poly Anal., 30, 3–30.
Franks, A., D’Amour, A. and Feller, A. (2019) Flexible sensitivity analysis for observational studies without
observable implications. J. Am. Statist. Ass., to be published.
Frisch, R. and Waugh, F. V. (1933) Partial time regressions as compared with individual trends. Econometrica, 1,
387–401.
Hazlett, C. (2019) Angry or weary?: The effect of personal violence on attitudes towards peace in Darfur. J. Conflct
Resoln, to be published.
Hong, G., Qin, X. and Yang, F. (2018) Weighting-based sensitivity analysis in causal mediation studies. J. Educ.
Behav. Statist., 43, 32–56.
Hosman, C. A., Hansen, B. B. and Holland, P. W. (2010) The sensitivity of linear regression coefficients’ confidence
limits to the omission of a confounder. Ann. Appl. Statist., 4, 849–870.
Imai, K., Keele, L. and Yamamoto, T. (2010) Identification, inference and sensitivity analysis for causal mediation
effects. Statist. Sci., 25, 51–71.
Imbens, G. W. (2003) Sensitivity to exogeneity assumptions in program evaluation. Am. Econ. Rev., 93, 126–132.
Imbens, G. W. and Rubin, D. B. (2015) Causal Inference in Statistics, Social, and Biomedical Sciences. New York:
Cambridge University Press.
Supporting information
Additional ‘supporting information’ may be found in the on-line version of this article:
‘Online supplementary material for “Making sense of sensitivity: extending omitted variable bias”’.