
Journal of Econometrics 178 (2014) 194–206


Robustness checks and robustness tests in applied economics✩


Xun Lu a,∗, Halbert White b
a Department of Economics, Hong Kong University of Science and Technology, Hong Kong
b Department of Economics, University of California, San Diego, United States

article info

Article history:
Available online 13 August 2013

JEL classification:
C18
C51

Keywords:
Robustness
Causal effect
Conditional exogeneity
Specification test
Combined estimator

abstract

A common exercise in empirical studies is a “robustness check”, where the researcher examines how certain “core” regression coefficient estimates behave when the regression specification is modified by adding or removing regressors. If the coefficients are plausible and robust, this is commonly interpreted as evidence of structural validity. Here, we study when and how one can infer structural validity from coefficient robustness and plausibility. As we show, there are numerous pitfalls, as commonly implemented robustness checks give neither necessary nor sufficient evidence for structural validity. Indeed, if not conducted properly, robustness checks can be completely uninformative or entirely misleading. We discuss how critical and non-critical core variables can be properly specified and how non-core variables for the comparison regression can be chosen to ensure that robustness checks are indeed structurally informative. We provide a straightforward new Hausman (1978)-type test of robustness for the critical core coefficients, additional diagnostics that can help explain why robustness test rejection occurs, and a new estimator, the Feasible Optimally combined GLS (FOGLeSs) estimator, that makes relatively efficient use of the robustness check regressions. A new procedure for Matlab, testrob, embodies these methods.

© 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jeconom.2013.08.016

1. Introduction

A now common exercise in empirical studies is a “robustness check”, where the researcher examines how certain “core” regression coefficient estimates behave when the regression specification is modified in some way, typically by adding or removing regressors. Leamer (1983) influentially advocated investigations of this sort, arguing that “fragility” of regression coefficient estimates is indicative of a specification error, and that sensitivity analyses (i.e., robustness checks) should be routinely conducted to help diagnose misspecification.

Such exercises are now so popular that the standard econometric software has modules designed to perform robustness checks automatically; for example, one can use the STATA commands rcheck or checkrob. A finding that the coefficients do not change much is taken to be evidence that these coefficients are “robust”.¹ If the signs and magnitudes of the estimated regression coefficients are also plausible, this is commonly taken as evidence that the estimated regression coefficients can be reliably interpreted as the true causal effects of the associated regressors, with all that this may imply for policy analysis and economic insight.

Examples are pervasive, appearing in almost every area of applied econometrics. For example, of the 98 papers published in The American Economic Review during 2009, 76 involve some data analysis. Of these, 23 perform a robustness check along the lines just described, using a variety of estimators suitable to the data, such as ordinary least squares, logit, instrumental variables, or panel methods (Adams et al., 2009; Alfaro and Charlton, 2009; Angelucci and De Giorgi, 2009; Angrist and Lavy, 2009; Ashraf, 2009; Boivin et al., 2009; Cai et al., 2009; Chen and Li, 2009; Chetty et al., 2009; Dobkin and Nicosia, 2009; Forbes and Lederman, 2009; Hargreaves Heap and Zizzo, 2009; Hendel et al., 2009; Lavy, 2009; Leaver, 2009; Makowsky and Stratmann, 2009; Mas and Moretti, 2009; Matsusaka, 2009; Miller, 2009; Oberholzer-Gee and Waldfogel, 2009; Sialm, 2009; Spilimbergo, 2009; Urquiola and Verhoogen, 2009).

But when and how can evidence of coefficient robustness and plausibility support the inference of structural validity?

✩ We are grateful to the participants at the International Symposium on Econometrics of Specification Tests in 30 Years at Xiamen University and the seminars at many universities where this paper was presented. We also thank the editor and two anonymous referees for their helpful comments. Lu gratefully acknowledges partial research support from Hong Kong RGC (Grant No. 643711).
∗ Corresponding author. Tel.: +852 2358 7616.
E-mail address: [email protected] (X. Lu).
¹ This usage of “robust” should not be confused with the concept of robustness in the statistics literature, which refers to the insensitivity of an estimator to adding or removing sample observations, typically extreme in some way. A more accurate terminology for “robustness” here might be “insensitivity to covariate selection”.
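To fix ideas, the informal practice just described can be sketched in a few lines. This is our own illustration, not part of the paper; the data-generating process and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data: D is the "core" regressor, Z and W are controls.
Z = rng.normal(size=n)
W = rng.normal(size=n)                     # extra control for the check
D = 0.6 * Z + rng.normal(size=n)           # D correlated with Z
Y = 1.0 * D + 0.5 * Z + rng.normal(size=n) # true effect of D is 1.0

def ols(y, *cols):
    """OLS with intercept via least squares; returns the slope vector."""
    X = np.column_stack([np.ones_like(y)] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

b_base = ols(Y, D, Z)[0]      # core coefficient on D, baseline spec
b_extra = ols(Y, D, Z, W)[0]  # core coefficient on D, with W added

print(f"beta_D baseline: {b_base:.3f}, with extra control: {b_extra:.3f}")
```

An informal “robustness check” simply eyeballs whether the two estimates are close. The paper's point is that such eyeballing is only a check; the Hausman-type robustness test introduced later turns the same comparison into a formal specification test.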


Our purpose here is to address this question in substantive detail. For maximum clarity and practical relevance, we consider a simple linear regression context. We show that even in this familiar framework, there are many opportunities to go astray. Some of these pitfalls can be avoided by paying attention to properties of regression analysis that should be well known. But even if they are, they are often ignored in practice. This neglect can have serious adverse consequences when checking robustness, so it is useful to emphasize these. Other pitfalls have not previously been recognized; these bear critically on properly interpreting empirical results, especially with regard to robustness checks.

It should be well known, but it is too often forgotten, that it is not necessary that all estimated coefficients make economic sense. See, e.g., Stock and Watson (2007, pp. 478–479), who stress this point. For easy reference and because it forms part of the foundation for the analysis to follow, we spell out why requiring coefficient signs and magnitudes to make economic sense is necessary only for a certain subset of the regressors with particular relevance for robustness checks: the critical core variables are precisely those whose effects are of primary interest and whose coefficients should be plausible. As should also be well known, plausibility of regression coefficients is not sufficient to permit attribution of causal effects, regardless of robustness. For easy reference, we also briefly spell this out here.

Robustness is necessary for valid causal inference, in that the coefficients of the critical core variables should be insensitive to adding or dropping variables, under appropriate conditions. But several pertinent questions have not so far been adequately addressed. Specifically, which variables besides the critical core variables should also be core variables? Which non-core variables should one add or remove? Should the latter satisfy certain conditions, or is it valid to compare regressions dropping or including any non-core variables? We show that the choice of core variables and the non-core variables included or excluded must obey specific restrictions dictated by the underlying structural equations. Ignoring these restrictions can lead to fundamentally flawed economic inference. We propose straightforward analysis and statistical procedures for distinguishing core and non-core variables.

Most importantly, why perform just a “check”, when it should be easy to conduct a true test of the hypothesis that the relevant coefficients do not change? At the least, this would permit the researcher to determine objectively whether the coefficients had changed too much. The specification testing principles articulated in Hausman's (1978) landmark work apply directly. Accordingly, we give a straightforward robustness test that turns informal robustness checks into true Hausman (1978)-type structural specification tests.

Suppose we find that the critical core coefficients are not robust. Does a robustness check or test provide insight into the reason for the failure? Are there additional steps that could be taken to gain further insight? We show how and why robustness can fail, and we discuss methods that can be used to gain deeper insight.

Or, suppose we do find that the critical core coefficients are robust and suppose we have other good reasons to believe that these validly measure economic effects of interest. Then there are multiple consistent estimators, and we would like to obtain the most precise of these. But which set of regressors should we use? A larger set or a smaller set? Or does it matter? We show that the choice of efficient regressors for causal inference is a deeper, context-dependent question that presents interesting opportunities for further investigation. We also propose relatively efficient feasible generalized least squares (FGLS) methods and a new estimator, the Feasible Optimally combined GLS (FOGLeSs) estimator, that can provide superior inference in practice.

In addressing each of these questions, we provide succinct answers and, where relevant, easily implemented procedures. Apart from some (important) details specific to robustness checking, our robustness test is standard; similarly, our FGLS and FOGLeSs estimators are straightforward. The contribution of this paper is therefore not some completely novel method. Rather, this paper is about the proper use of now common procedures that seem to be widely misapplied. Proper application of these, together with the new practical methods embodied here in our Hausman test-based procedure testrob, could considerably strengthen the validity and reliability of structural inference in economics. To make these methods easy to apply, Matlab code for testrob is freely available at http://ihome.ust.hk/~xunlu/code.html.

2. The structural data generating process

Economic theory justifies claims that an outcome or response of interest, Y, is structurally generated as

Y = r(D, Z, U),

where r is the unknown structural function, D of dimension k0 ∈ N+ represents observed causes of interest, Z of dimension kz ∈ N represents other observable drivers of Y, and U represents unobservable drivers of Y. In particular, U represents not just “shocks”, but all factors driving Y that are too costly or too difficult to observe precisely.

Our interest attaches to the effects of D on Y. For example, let Y be wage, let D be schooling, let Z be experience, and let U be unobserved ability. Then we are interested in the effects of schooling on wage. Or let Y be GDP growth, let D be lagged GDP growth and lagged oil price changes, let Z be other observed drivers of GDP growth (e.g., monetary policy), and let U represent unobserved drivers of growth, including shocks. In this case, we are considering a component of a structural VAR system, and we are interested in the effects of lagged GDP growth and lagged oil prices on GDP growth. When interest truly attaches to the effects of all observed drivers of Y, we assign these to D and omit Z.

For maximum clarity and practical relevance, we assume here that r is linear, so

Y = D′βo + Z′αo + U,   (1)

where βo and αo represent the effects of D and Z on Y, respectively. Later, we discuss some consequences of nonlinearity. Because D is the cause of interest, we are interested primarily in βo; we may have little or no interest in αo. Note that U has now become a scalar; in this linear case, we can view U as representing a linear combination of unobserved drivers of Y, without essential loss of generality.

A standard assumption that identifies βo is exogeneity: (D, Z) is independent of U, written (D, Z) ⊥ U, using the notation of Dawid (1979) (henceforth “D79”). In fact, this identifies both βo and αo. When we are mainly interested in βo, we do not need such a strong assumption. A weaker condition identifying βo is a conditional form of exogeneity:

Assumption A.1 (Conditional Exogeneity). Let W be an observable random vector of dimension kw ∈ N, and X = (Z′, W′)′ of dimension k ≡ kw + kz, such that D ⊥ U | X, and D and U are not measurable with respect to the sigma-field generated by X.

That is, D is independent of U given X, where X is a vector of “control variables” or “covariates”. This condition is actually much weaker than exogeneity, since it not only permits D and U to be correlated, but it also permits X to be correlated with U. A classic example is when W is an IQ score proxying for unobserved ability, as in Griliches (1977). Below and in the appendix, we discuss in detail where the W's come from.
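The role of the proxy W in Assumption A.1 can be illustrated with a small simulation in the spirit of the wage example. This sketch is ours, not the paper's, and all coefficient values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# U = unobserved ability, W = IQ score proxying for ability,
# D = schooling, Z = experience, Y = wage. True effect of D is 1.0.
Z = rng.normal(size=n)                       # experience
W = rng.normal(size=n)                       # IQ score (observed)
U = 0.9 * W + rng.normal(size=n)             # ability: predictable from W
D = 0.7 * W + 0.4 * Z + rng.normal(size=n)   # schooling responds to IQ
Y = 1.0 * D + 0.5 * Z + U                    # structural equation (1)

def ols(y, *cols):
    """OLS with intercept; returns the slope vector."""
    X = np.column_stack([np.ones_like(y)] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

b_no_W = ols(Y, D, Z)[0]      # D correlates with U: omitted-ability bias
b_with_W = ols(Y, D, Z, W)[0] # but D is independent of U given X = (Z, W)

print(f"beta_D without W: {b_no_W:.2f} (biased upward)")
print(f"beta_D with W:    {b_with_W:.2f} (close to the true 1.0)")
```

Here D and U are unconditionally correlated (both load on W), so the short regression is biased; conditioning on the proxy W restores identification of βo. Note also that the coefficient on W in the long regression is purely predictive, anticipating the point developed in Section 3.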

A still weaker condition identifying βo is conditional mean independence:

Assumption A.2 (Conditional Mean Independence). Let X = (Z′, W′)′ be an observable random vector such that E(U | D, X) = E(U | X), and D and U are not measurable with respect to the sigma-field generated by X.

For example, Stock and Watson (2007) use this assumption in their outstanding textbook. Wooldridge (2002, pp. 607–608) has an excellent discussion comparing this to A.1. An even weaker (in fact necessary) requirement is conditional non-correlation, E([D − ζ∗X]U) = 0, where ζ∗ ≡ E(DX′)E(XX′)−1 is the matrix of regression coefficients for the regression of D on X. Here, we work with A.1 for three main reasons. First, this permits us to closely link our analysis to the relevant literature on treatment effect estimation, which has substantive implications for choosing W. Second, estimator efficiency is a key focus here, and A.1 has straightforward and important implications for efficiency, discussed below. Third, this condition ensures valid structural inference in contexts extending well beyond linear regression (e.g. White and Chalak, 2013).²,³

3. Making economic sense

The conditional exogeneity relation (A.1) is key to determining the critical core variables, i.e., those variables whose coefficients should make economic sense and be robust. Here “making economic sense” means that the coefficients have causal interpretations, i.e., coincide with the parameters βo and αo in the structural Eq. (1). We begin by considering the regression of Y on D and X, E(Y | D, X). Defining the regression error ε ≡ Y − E(Y | D, X), we have the regression equation

Y = E(Y | D, X) + ε.

This is purely a predictive relation, with no necessary economic content. Observe that U and ε are distinct objects. Whereas ε is simply a prediction error with no causal content for Y, U represents unobserved drivers of Y.

The underlying structure does create a relationship between U and ε, however. Using the structural relation and conditional exogeneity, we have

E(Y | D, X) = E(D′βo + Z′αo + U | D, X)
            = D′βo + Z′αo + E(U | D, X)
            = D′βo + Z′αo + E(U | X).

The final equality holds because D ⊥ U | X implies E(U | D, X) = E(U | X). It follows that ε = U − E(U | X). Thus, ε is a function of U and X, an important fact that we rely on later.

We can now easily see which regression coefficients should make economic sense and which should not. Suppose for simplicity that

E(U | X) = X′δ∗ = Z′δz∗ + W′δw∗.   (2)

Then the regression equation has the form

Y = D′βo + Z′αo + X′δ∗ + ε
  = D′βo + Z′(αo + δz∗) + W′δw∗ + ε
  = D′βo + X′γ∗ + ε,

say, where γ∗ ≡ ((αo + δz∗)′, δw∗′)′. Because E(ε | D, X) = 0, the regression coefficients βo and γ∗ can be consistently estimated by ordinary or generalized least squares under standard conditions. Because βo represents the effects of the causes of interest, D, its signs and magnitudes clearly should make economic sense. But the coefficient on Z, the other observed drivers of Y, is γz∗ ≡ αo + δz∗. This is a mix of causal (αo) and predictive (δz∗) coefficients. Even though Z drives Y, the presence of δz∗ means there is no reason whatsoever that these coefficients should make economic sense. The same is true for the coefficients of W. These are purely predictive, so their signs and magnitudes have no economic content.

These facts should be well known. Indeed, Stock and Watson (2007, pp. 478–479) give a very clear discussion of precisely this point in their undergraduate textbook. But this deserves emphasis here for two related reasons. First, researchers and reviewers of research submitted for publication often overlook this. Reviewers may take researchers to task for “puzzling” or “nonsense” coefficients that are not the primary focus of interest; and researchers, anticipating such reactions (correctly or not), often (mis)direct significant effort to specification searches that will preempt such attacks. This often includes robustness checks for all core coefficients. This motivates the second reason for emphasizing this, directly relevant to our focus here: because only D should have economically sensible coefficients, only D's coefficients should be subject to robustness checking or testing. Thus, we call D critical core variables. Researchers should not robustness check all core coefficients, only the critical core coefficients. Similarly, reviewer demands that all regression coefficients be robust and make economic sense are unnecessary and unrealistic. Indeed, this stifles valid research and provides incentives for researchers to “cook” their results until naively demanding reviewers will find them palatable.

It is also easy to see here why regression coefficients “being close to the researcher's wanted results” is not sufficient to permit attribution of causal effects: if either linearity (Eq. (1) or Eq. (2)) or conditional exogeneity (A.1) fails, then the regression equation just gives the optimal linear prediction equation

Y = D′β∗ + X′γ∗ + ε,

where the optimal linear prediction coefficients are

(β∗)   (E(DD′)  E(DX′))−1 (E(DY))
(γ∗) ≡ (E(XD′)  E(XX′))   (E(XY)),

and ε ≡ Y − D′β∗ − X′γ∗ is the optimal linear prediction error. These coefficients could be anything; some of them could be close to the researcher's wanted results. But without the structural content of linearity and conditional exogeneity, this is, as Macbeth might say, a tale told by an idiot, full of sound and fury, signifying nothing but predictive optimality. Researchers and reviewers comforted by plausible regression coefficients but with little or no other evidence of correct structural specification may be easily led astray.

4. Robustness

To determine whether one has estimated effects of interest, βo, or only predictive coefficients, β∗, one can check or test robustness by dropping or adding covariates. As we show in the next three sections, however, there are important restrictions on which variables one may include or exclude when examining robustness. This means that letting STATA loose with checkrob, for example, is potentially problematic, as this module estimates a set of regressions where the dependent variable is regressed on the core variables (included in all regressions) and all possible combinations of other (non-core) variables, without regard to these restrictions.

We begin by considering what happens with alternative choices of covariates, say X1 and X2. In particular, suppose we take X1 = (Z′, W′)′ as above and form X2 = W by dropping Z.

² In fact, since we focus on the linear specification, for all the analysis except for that in Section 5, we only require A.2 and a conditional second moment independence condition: E(U² | D, X) = E(U² | X). These two conditions are implied by A.1.
³ In this paper, we focus on conditional exogeneity as the key identification assumption. In principle, our methods can be extended to instrumental variables easily. For example, if we have multiple plausible instrumental variables, our robustness test in Section 6 is just the standard Hausman test. Our FOGLeSs estimator in Section 8.3 can also be similarly constructed by taking a linear combination of multiple instrumental variable estimators.

In the latter case, suppose conditional exogeneity holds with D ⊥ U | W. Now

E(Y | D, X2) = E(D′βo + Z′αo + U | D, X2)
             = D′βo + E(Z′αo | D, W) + E(U | D, W).

Parallel to the prior analysis, we have E(U | D, W) = E(U | W), and we take E(U | W) = W′δw∗. But we must also account for E(Z′αo | D, W). This is a function of D and W, so suppose for simplicity that E(Z′ | D, W) = D′ζd∗ + W′ζw∗. This gives

E(Y | D, X2) = D′βo + D′ζd∗αo + W′ζw∗αo + W′δw∗
             = D′(βo + ζd∗αo) + W′(ζw∗αo + δw∗).

Now the coefficient on D is not βo but (βo + ζd∗αo). If robustness held, these coefficients would be identical. But these coefficients generally differ.⁴ A robustness check or test based on core variables D and comparing regressions including and excluding Z while including covariates W would therefore generally signal non-robustness of the D coefficients, despite the validity of the regression including Z and W for estimating βo. This shows that robustness checks or tests generally should not drop observed drivers Z of Y. That is, the Z's should be core variables for robustness checks or tests, in the sense that they generally should be included along with D in the regressions used for robustness checking.

On the other hand, even when Z is included as a core variable, dropping one or more elements of W could lead to failure of conditional exogeneity, again signaling non-robustness, despite the validity of the regression including Z and W for estimating βo.

Robustness is nevertheless necessary for valid causal inference, provided we have (at least) two alternate choices of covariates, say X1 and X2, both of which contain Z and both of which ensure conditional exogeneity of D:

D ⊥ U | X1 and D ⊥ U | X2.   (3)

A similar situation has been considered in a related treatment effect context by Hahn (2004) and White and Lu (2011). Nevertheless, those papers focus on a different issue, estimator efficiency; we will return to this below.

The reasoning above now gives two regression equations of similar form:

Y = D′βo + X1′γ1∗ + ε1
Y = D′βo + X2′γ2∗ + ε2,

where (γ1∗, ε1) and (γ2∗, ε2) are defined analogously to their counterparts above. Because conditional exogeneity holds for both sets of covariates, we can estimate βo consistently from both regressions. That is, whether we include D and X1 in the regression or D and X2, we will get similar critical core variable coefficient estimates, as both estimate the same thing, βo. This is the essence of the robustness property, and we have just shown that it is a consequence of correct structural specification, hence necessary.

Notice that even though Z is included in both regressions, its coefficients generally differ between the two. In the first regression, the Z coefficient is (αo + δ1z∗), say; it is (αo + δ2z∗) in the second. This reflects the fact that Z is playing a predictive role for U in these regressions, and this role changes when different W's are involved. Non-robustness of the Z coefficients does not signal structural misspecification. For this reason, the Z's are non-critical core variables.

When the structural assumptions fail, we still have optimal prediction regressions, say,

Y = D′β1∗ + X1′γ1∗ + ε1
Y = D′β2∗ + X2′γ2∗ + ε2.

A robustness test gains power against structural misspecification from the fact that when the structural assumptions fail, we typically have that β1∗ and β2∗ differ. Note, however, that such tests are not guaranteed to have good power, as β1∗ and β2∗ can be similar or even equal in the presence of structural misspecification, as when Xj = (Z′, Wj′)′ and E(DWj′), E(ZWj′) are close or equal to zero, j = 1, 2. We provide an example in Appendix A.

These properties are precisely those permitting application of a Hausman (1978)-type test. Below, we specify a Hausman-style robustness test based on differences in critical core coefficient estimates, such as β̂1n − β̂2n, where β̂1n and β̂2n are estimators of β1∗ and β2∗, respectively. When this test rejects, this implies that any or all of the following maintained hypotheses fail: (i) Y = D′βo + X′γ∗ + ε (regression linearity); (ii) D ⊥ U | X1 (conditional exogeneity of D w.r.t. X1); (iii) D ⊥ U | X2 (conditional exogeneity of D w.r.t. X2). Regression linearity can fail due to either structural nonlinearity (failure of Y = D′βo + Z′αo + U) or predictive nonlinearity (failure of E(U | X) = X′δ∗ = Z′δz∗ + W′δw∗) or both. Robustness tests are therefore non-specific; either exogeneity failures or nonlinearity may be responsible for rejection. In particular, if rejection is due to the failure of (iii) only, then consistent estimation of causal effects βo is still possible, as β1∗ = βo. The robustness test rejects because of a misleading choice of comparison covariates. A similar situation holds if rejection is due to the failure of (ii) only, as then β2∗ = βo.

To avoid rejections due to a misleading choice of comparison covariates, it is helpful to gain a better understanding of where the covariates W come from. We take this up next.

5. Selecting covariates

So far, we have seen that the covariates X should contain Z. They may also contain additional variables W, as X ≡ (Z′, W′)′. We now consider how these additional variables may arise. First, we consider how W may be chosen to ensure the validity of covariates X, that is, D ⊥ U | X. Then we consider how the core and non-core covariates potentially useful for examining robustness may be chosen.

5.1. Valid covariates

If strict exogeneity holds, i.e., (D, Z) ⊥ U, then D ⊥ U | Z (e.g., by Lemma 4.3 of D79). In this case, X = Z is a valid choice of covariates, and we need not include other variables W in the core covariates. Nevertheless, we still need valid non-core covariates W to perform a robustness check, so it is important to understand how these can arise. Further, when D or Z are endogenous (e.g., correlated with U), as in the classic example where schooling (D) may be correlated with ability (U), then W can play a crucial role by providing core covariates that ensure D ⊥ U | (Z, W).

With X = (Z′, W′)′, the condition D ⊥ U | X says that given X, D contains no predictively useful information for U. It also says that given X, U contains no predictively useful information for D. For the first criterion, we want X to be a good predictor of U, so that D has nothing more to contribute. For the second, we want X to be a good predictor of D, so that U has nothing more to contribute. Either or both of these properties may ensure D ⊥ U | X.

Appendix B contains a detailed discussion based on these heuristics, supporting the conclusion that W can contain proxies for U, observed drivers of D, or proxies for unobserved drivers of D. It should not contain outcomes driven by D.

⁴ No difference occurs when ζd∗αo = 0. For this, it suffices that αo = 0 (Z does not drive Y) or that ζd∗ = 0 (e.g., D ⊥ Z | W). In what follows, we use the qualifier “generally” to implicitly recognize these exceptions or other similar special circumstances. Actually, it is desirable to include Z even in this special case where Z is a driver of Y and D ⊥ Z | W, as including Z in general renders the estimation of βo more efficient. White and Lu (2011) discuss this for the case of a scalar binary D.
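A minimal version of the Hausman-style robustness comparison described above can be sketched as follows. This is our illustration with an invented DGP; the paper's testrob procedure derives the proper asymptotic covariance of β̂1n − β̂2n, whereas this sketch substitutes a simple pairs bootstrap for the variance of the coefficient difference.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# Invented DGP: both covariate sets X1 = (Z, W1) and X2 = (Z, W1, W2)
# satisfy conditional exogeneity, so beta_D should be robust across them.
Z = rng.normal(size=n)
W1 = rng.normal(size=n)
W2 = rng.normal(size=n)                       # valid but redundant covariate
D = 0.5 * Z + 0.5 * W1 + rng.normal(size=n)
U = 0.8 * W1 + rng.normal(size=n)
Y = 1.0 * D + 0.5 * Z + U

def beta_D(idx):
    """Return (beta_D from spec 1, beta_D from spec 2) on a subsample."""
    one = np.ones(len(idx))
    X1 = np.column_stack([one, D[idx], Z[idx], W1[idx]])
    X2 = np.column_stack([one, D[idx], Z[idx], W1[idx], W2[idx]])
    b1 = np.linalg.lstsq(X1, Y[idx], rcond=None)[0][1]
    b2 = np.linalg.lstsq(X2, Y[idx], rcond=None)[0][1]
    return b1, b2

b1, b2 = beta_D(np.arange(n))
diff = b1 - b2

# Pairs bootstrap for the standard error of the difference.
boot = [np.subtract(*beta_D(rng.integers(n, size=n))) for _ in range(200)]
se = np.std(boot, ddof=1)

stat = (diff / se) ** 2                       # Hausman-style Wald statistic
print(f"diff = {diff:.4f}, statistic = {stat:.2f} (5% chi2(1) cutoff: 3.84)")
```

Here both covariate sets are valid, so the statistic should typically fall below the 5% critical value; under misspecification that drives β1∗ and β2∗ apart, the statistic diverges with n.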

One can begin, then, by selecting an initial vector of covariates W satisfying these conditions. By themselves, these criteria do not guarantee D ⊥ U | X. On the other hand, however, W may indeed ensure this. We now discuss how to determine which of these possibilities holds and how this relates to specifying core and non-core covariates for robustness checking and testing.

5.2. Core covariates

Suppose W ensures D ⊥ U | X. Then W may contain more information than is minimally needed to identify βo. To obtain core covariates, we therefore seek a subvector W1 of W such that X1 = (Z′, W1′)′ generates what Heckman and Navarro-Lozano (2004) call the minimum relevant information set, the smallest information set giving D ⊥ U | X1. Note that W1 could have dimension zero; we could also have W1 = W.

Let X = (X1′, W2′)′. Given D ⊥ U | X, Lemma 4.3 of D79 yields two conditions that imply D ⊥ U | X1, so that W1 is core and W2 is not. These two conditions are:

Condition 1: D ⊥ W2 | X1 and Condition 2: U ⊥ W2 | X1.

If either condition holds, then we can drop the covariates W2 from the original covariates X. Below we discuss how these two conditions can be used to distinguish core and non-core covariates.

First, consider Condition 1: D ⊥ W2 | X1. This condition involves only observables, so it is straightforward to investigate empirically. When it is not rejected, we conclude that W2 is non-core. Condition 1 can be tested nonparametrically or parametrically. When the stochastic dependence can be fully captured by linear regression, one can regress D on X = (Z′, W′)′ to potentially reveal a subvector W1 of W such that D ⊥ U | X1. The matrix of regression coefficients for this regression is

ζ∗ = E(DX′)E(XX′)−1.

Partitioning ζ∗ (k0 × (kz + kw), say) as ζ∗ ≡ (ζz∗, ζw∗), suppose that there is (after a possible permutation) a partition of W, W = (W1′, W2′)′, such that, with ζw∗ ≡ (ζw∗1, ζw∗2), we have ζw∗2 = 0 (a k0 × kw2 matrix, say). Then D ⊥ W2 | (Z, W1), so we can rule out W2 as core covariates.

This suggests a straightforward, practical regression-based method for isolating non-core covariates: one can examine the sample regression coefficients from the regressions of each element of D on X = (Z′, W′)′ and identify non-core variables W2

is core or not. Alternatively, suppose we fail to reject ε11 ⊥ W12 | X11. Because this generally does not imply U ⊥ W12 | X11, we again cannot make a determination. Properties of the regression residuals ε11 do not provide definitive insight. Instead, however, one might use knowledge of the underlying structure as discussed in Sections 5.1 and 5.3 to justify dropping or retaining elements of W1. There, we argue that when X11 are proxies for the unobservable U, or observable drivers of W12, or proxies for unobservable drivers of W12, then U ⊥ W12 | X11 is plausible. One simple example is that U is unobservable ability, W12 is education, and X11 is an IQ test score, where the IQ test score is a proxy for unobservable ability. Specifying non-core elements W12 of W1 satisfying U ⊥ W12 | X11 unavoidably requires some judgment.

For practical purposes, then, we recommend using core covariates W1 determined by checking D ⊥ W2 | X1 as described above, adjusted by deleting any elements for which U ⊥ W12 | X11 is plausible a priori. Because we recommend erring on the side of inclusion, this may result in a set of core covariates larger than absolutely necessary; but these should generally be quite adequate for robustness checking and testing.

For reference later, we note that an important consequence of D ⊥ U | (X1, W2) and D ⊥ W2 | X1, following from Lemma 4.3 of D79, is that D ⊥ (U, W2) | X1. Similarly, we have that D ⊥ U | (X1, W2) and U ⊥ W2 | X1 imply (D, W2) ⊥ U | X1.

5.3. Non-core covariates

We began with an initial set of covariates W and described how to extract core covariates; for convenience, we now just denote these W1 and also write X1 = (Z′, W1′)′. To use a robustness check or test to see whether W1 does in fact ensure D ⊥ U | X1 and therefore identify βo, we require suitable non-core covariates for the comparison regressions.

We may already have some non-core covariates, namely any components of W not in W1. But this is not guaranteed, as we may have W1 = W. Even if we do have some initial non-core covariates, it can be useful to find others. These can enhance the power of robustness tests; they can also enhance the efficiency of the estimator for βo.

There are various ways to construct additional valid covariates. One effective way is to find covariates X2 such that D ⊥ U | X1 implies D ⊥ U | X2. White and Chalak (2010, Prop. 3) provide a relevant result, showing that if D ⊥ U | X1 and if
as those having estimated coefficients close to zero in every such
W2 = q(X1 , U , V ), where D ⊥ V | (U , X1 )
regression. One may proceed heuristically or conduct more formal
inference or model selection. Whatever the approach, one should for some unknown structural function q, then with X2 = (X1′ , W2′ )′ ,
err on the side of keeping rather than dropping variables, as re- we have
taining valid but non-essential covariates is much less costly than
D ⊥ U | X2 .
dropping true core covariates, as the former does not render esti-
mates of βo inconsistent, whereas the latter does. A leading example occurs when W2 is a proxy for U different
In the fully general (nonlinear) case, analogous but more elab- than W1 , subject to measurement errors V , say W2 = q(U , V ). If
orate procedures can reveal such a W1 . For simplicity, we restrict the measurement errors are independent of (D, U , X1 ), as is often
attention here to the linear case. plausible, then D ⊥ V | (U , X1 ) also holds.
Second, consider whether we can identify further elements of Because W2 can be a vector here, any subvector of W2 has the
W1 as non-core using Condition 2. For this, we seek a partition same properties. It follows that when W2 is constructed in this way,
of W1 , W1 = (W11 ′
, W12
′ ′
) such that U ⊥ W12 | X11 , where automated methods, along the lines of STATA’s rcheck and checkrob
X11 = (Z ′ , W11 ) . If such a partition exists, we can rule out W12
′ ′
modules, that treat D and X1 as core variables and W2 as non-core
as core variables, since we then have D ⊥ U | X11 . Since U is (now taking X = X1 in forming W2 ), will yield a potentially useful
unobservable, we cannot proceed directly, as above. Nevertheless, set of comparison regressions.
we might try to use regression residuals ε11 ≡ Y − E (Y | D, X11 ) to White and Lu (2011) provide various structural configura-
find such a partition. This turns out to be problematic, as we now tions sufficient for certain unconfoundedness assumptions used by
discuss. Hahn (2004) in studying the efficiency of treatment effect estima-
Specifically, observe that D ⊥ U | X11 implies that ε11 = U − tors. These structures can inform the choice of both initial and addi-
E (U | X11 ). It follows by Lemmas 4.1 and 4.2 of D79 that U ⊥ W12 | tional covariates; here we focus on the latter. Although D is a binary
X11 then implies ε11 ⊥ W12 | X11 . Now suppose we test and reject scalar in Hahn (2004) and White and Lu (2011), this is not required
ε11 ⊥ W12 | X11 . Then U ̸⊥ W12 | X11 , so it is not clear whether W12 here: D can be a vector with categorical or continuous elements.
X. Lu, H. White / Journal of Econometrics 178 (2014) 194–206 199

A condition sufficient for one of Hahn's unconfoundedness assumptions (his A.3(a)) is

D ⊥ (U, W2) | X1.

White and Lu (2011) give two different underlying structures for which this holds. The example above from White and Chalak (2010, Prop. 3) is another case where this holds. With X2 = (X1′, W2′)′, Lemma 4.2 of D79 gives the required condition (3),

D ⊥ U | X1 and D ⊥ U | X2.

Note that D ⊥ (U, W2) | X1 was implied for our initial covariates when D ⊥ W2 | X1. The covariates identified by White and Lu's (2011) structures may be the same as or different from those obtained from the initial covariates.

A condition sufficient for Hahn's other assumption (A.3(b)) is

(D, W2) ⊥ U | X1.

White and Lu (2011) give two different underlying structures for which this holds. With X2 = (X1′, W2′)′, this also implies

D ⊥ U | X1 and D ⊥ U | X2.

Note that (D, W2) ⊥ U | X1 was implied for our initial covariates when U ⊥ W2 | X1. Again, the covariates identified by White and Lu's (2011) structures may be the same as or different from those obtained from the initial covariates.

More generally, suppose that

(D, W21) ⊥ (U, W22) | X1.

With W2 = (W21′, W22′)′ and X2 = (X1′, W2′)′, Lemma 4.2 of D79 again gives

D ⊥ U | X1 and D ⊥ U | X2.

In these last three examples, we have X2 = (X1′, W2′)′, as in the first example. The structure of these results also ensures that a potentially useful comparison regression will result with core variables D and X1 and any subvector of W2 as non-core.

These examples are not necessarily the only possibilities, but they illustrate different ways to obtain additional non-core covariates potentially useful for robustness checking and testing. Significantly, the underlying economic structure plays a key role in determining both the non-critical core variables, X1, and the non-core covariates, W2. By failing to account for this structure, one may improperly specify core and non-core variables, making it easy to draw flawed economic inferences. Indeed, our characterization of some of these examples as yielding ''potentially'' useful non-core variables is meant to signal that these might not be useful after all, in the sense that their use in robustness checks or robustness tests provides no information about structural misspecification. The next section more explicitly identifies some of the pitfalls.

6. A robustness test for structural misspecification

Performing a robustness test is a completely standard procedure, so it is somewhat mystifying that such tests have not been routinely conducted, even if not focused on just the critical core coefficients. Here, we describe such a procedure, embodied in our testrob module. A Matlab routine for this can be found at http://ihome.ust.hk/~xunlu/code.html. We hope its availability will help make such tests standard.

To describe the robustness test, we first specify the robustness check regressions. Let the sample consist of n observations (Yi, Di, Zi, Wi), i = 1, ..., n. The core variables are those appearing in the core regression: the critical core variables, Di, and the non-critical core variables, X1i = (Zi′, W1i′)′, where W1i is a subvector of Wi. The remaining elements of Wi are the non-core variables.

There are J − 1 comparison regressions that include Di and Xji = (X1i′, Wji′)′, j = 2, ..., J, where Wji is a subvector of the non-core elements of Wi. Not all subvectors of the non-core variables need to appear in the comparison regressions.

The ordinary least squares (OLS) robustness check regression estimators are

δ̂jn ≡ (β̂jn′, γ̂jn′)′ ≡ [D′D  D′Xj ; Xj′D  Xj′Xj]−1 [D′ ; Xj′] Y,   j = 1, ..., J,

where D is the n × k0 matrix with rows Di′, Xj is the n × kj matrix with rows Xji′, and Y is the n × 1 vector with elements Yi.

Letting δ̂n ≡ (δ̂1n′, ..., δ̂Jn′)′, it follows under mild conditions (in particular, without assuming correct specification) that

√n (δ̂n − δ∗) →d N(0, M∗−1 V∗ M∗−1),

where δ∗ ≡ (δ1∗′, ..., δJ∗′)′ and δj∗ ≡ (βj∗′, γj∗′)′, with optimal prediction coefficients βj∗ and γj∗ as defined above, and where M∗ and V∗ are given by

M∗ ≡ diag(M1∗, ..., MJ∗) and V∗ ≡ [Vkj∗],

where, for k, j = 1, ..., J,

Mj∗ ≡ [E(DD′)  E(DXj′) ; E(Xj D′)  E(Xj Xj′)]  and
Vkj∗ ≡ [E(εk εj DD′)  E(εk εj DXj′) ; E(εk εj Xk D′)  E(εk εj Xk Xj′)].

See Chalak and White (2011) for regularity conditions in this context.

The critical core coefficient robustness hypothesis is

Ho: ∆S δ∗ = 0,

where S is a selection matrix that selects 2 ≤ K ≤ J subvectors βj∗ from δ∗ ≡ (δ1∗′, ..., δJ∗′)′ and ∆ is the (K − 1)k0 × Kk0 differencing matrix

∆ = [ I  −I   0  ···   0
      I   0  −I  ···   0
      ⋮              ⋱
      I   0   0  ···  −I ].

The proper choice of subvectors selected by S is crucial. We discuss this below.

Letting R ≡ ∆S, the robustness test statistic is

RKn ≡ n δ̂n′ R′ [R M̂n−1 V̂n M̂n−1 R′]−1 R δ̂n,

where M̂n and V̂n are consistent estimators of M∗ and V∗, respectively, and it is assumed that R M∗−1 V∗ M∗−1 R′ is nonsingular. The testrob routine estimates V∗ under the assumption that the regression errors εi ≡ (ε1i, ..., εJi)′ are uncorrelated across i.5

As is standard, under Ho,

RKn →d χ²(K−1)k0,

where χ²(K−1)k0 denotes the chi-squared distribution with (K − 1)k0 degrees of freedom.6

5 The test statistic can be constructed for the case where different sets of covariates have different sample sizes by adjusting the covariance matrix V∗ appropriately.

6 In practice, the number of degrees of freedom may need adjustment to accommodate certain linear dependences among the elements of R δ̂n. With linear dependences, the statistic becomes RKn ≡ n δ̂n′ R′ [R M̂n−1 V̂n M̂n−1 R′]− R δ̂n, where [R M̂n−1 V̂n M̂n−1 R′]− denotes a suitable generalized inverse of R M̂n−1 V̂n M̂n−1 R′, and RKn →d χ²k∗, where k∗ ≡ rk(R M̂n−1 V̂n M̂n−1 R′).

One rejects critical core coefficient robustness

at the α level if RKn exceeds the 1 − α percentile of the χ²(K−1)k0 distribution. Because this test is a completely standard parametric test, it has power against local alternatives at rate n−1/2.

Under the global alternative, HA: ∆S δ∗ ̸= 0, the test is consistent; that is, it will eventually detect any departure of β1∗ from any selected βj∗. This may leave some misspecified alternatives undetected, as explained above; also, rejection is non-specific, as it can signal regression nonlinearity and/or failures of conditional exogeneity. Nevertheless, the structure generating the non-core variables affects the alternatives detected. For example, with non-core variables generated as in the first example of Section 5.3, the test will detect the failure of core conditional exogeneity, i.e., D ⊥ U | X1, or the failure of D ⊥ V | (U, X1). (Since V can be anything, the assumption that W2 = q(X1, U, V) is essentially without loss of generality.)

In fact, the way the non-core variables are generated plays a key role in determining the choice of comparison regressions. To see why, recall that εj = U − E(U | Xj). Suppose that Xk = (Xj′, Wjk′)′ and that

U ⊥ Wjk | Xj.

Then

εk = U − E(U | Xk) = U − E(U | Xj, Wjk) = U − E(U | Xj) = εj.

It follows that using comparison regressions that include Xj as well as Wjk or its components introduces linear dependences that increase the opportunities for singularity of R M∗−1 V∗ M∗−1 R′ (hence R M̂n−1 V̂n M̂n−1 R′). We thus recommend caution in specifying comparison regressions of this sort.

As discussed in Section 5, when (D, W21) ⊥ (U, W22) | X1, we have D ⊥ U | (X1, s(W21), s(W22)) for any subvector s(W21) of W21 and subvector s(W22) of W22, by D79 Lemma 4.2. But including all subvectors of (W21, W22) may introduce linear dependences into R M∗−1 V∗ M∗−1 R′. Thus, testrob carefully assesses and handles singularities in R M̂n−1 V̂n M̂n−1 R′.

Also, when there are multiple sets of covariates, say Xj and Xk, such that D ⊥ U | Xj and D ⊥ U | Xk, it is important to recognize that it is generally not valid to use the combined vector (i.e., (Xj, Xk)) for robustness checking, as there is no guarantee that D ⊥ U | (Xj, Xk) holds. Algorithms that include all possible subsets of the non-core variables, such as rcheck or checkrob, ignore this restriction, leading to the likely appearance of misleading non-robustness in the critical core coefficients.

7. Gaining insight into robustness rejections

When the robustness test rejects, it would be helpful to know why. Is it regression nonlinearity, or is it failure of conditional exogeneity? There is an extensive literature on testing for neglected nonlinearity in regression analysis, ranging from Ramsey's (1969) classic RESET procedure to modern neural network or random field tests. (See, for example, Lee et al., 1993; Hamilton, 2001; Dahl and Gonzalez-Rivera, 2003.) These references, like those to follow, barely scratch the surface, but they can at least help in locating other relevant work. Such tests are easy and should be routinely performed; for example, one can use STATA's reset procedure.

If the nonlinearity test rejects, one may seek more flexible specifications. The relevant literature is vast, ranging over more flexible functional forms, including algebraic or trigonometric polynomials (e.g., Gallant, 1982), partially linear models (Engle et al., 1986; see Su and White, 2010 for a partial review), local polynomial regression (e.g., Cleveland, 1979; Ruppert and Wand, 1994), and artificial neural networks (e.g., White, 2006a).

On the other hand, if robustness is rejected but not linearity, then we have apparent evidence of the failure of conditional exogeneity. But the robustness test is sufficiently non-specific that it is helpful to employ further diagnostics to understand what may be driving the robustness rejection. At the heart of the matter is whether D ⊥ U | X1 holds for the core regression. There are now many approaches to testing this in the literature. Early procedures were developed for the case of scalar binary D. See Rosenbaum (1987), Heckman and Hotz (1989), and the review by Imbens (2004). Recently, White and Chalak (2010) have given tests complementary to these that permit D to be a vector of categorical or continuous variables. White and Lu (2010) give straightforward tests for conditional exogeneity that involve regressing transformed regression residuals on functions of the original regressors. In the next section, we discuss a special case involving squared residuals that emerges naturally in the context of efficient estimation.

Given the realities of the review and publication process in economics, it is perhaps not surprising that the literature contains plenty of robustness checks, but not nearly as much in the way of more extensive specification analysis. By submitting only results that may have been arrived at by specification searches designed to produce plausible results passing robustness checks, researchers can avoid having reviewers point out that this or that regression coefficient does not make sense or that the results might not be robust. And if this is enough to satisfy naive reviewers, why take a chance? Performing further analyses that could potentially reveal specification problems, such as nonlinearity or exogeneity failure, is just asking for trouble.

Of course, this is also a recipe for shoddy research. Specification searches designed to achieve the appearance of plausibility and robustness have no place in a scientific endeavor. But even in the absence of such data snooping, it must be recognized that robustness checks are not tests, so they cannot have power. And, although robustness tests may have power, they are non-specific and thus may lack power in specific important directions. By deploying true specification testing methods, such as the Hausman-style robustness test of the previous section, the further tests described in this section, or any of the methods available in the vast specification testing literature, to examine whether findings supported by robustness checks really do hold up, researchers can considerably improve the quality of empirical research. Reviewers of work submitted for publication can encourage this improvement, rather than hinder it by focusing on critical core coefficients only, by reorienting their demands toward testing robustness and specification.7

8. Estimator efficiency with correct structural specification

Suppose the robustness test does not reject, and that other specification tests (e.g., nonlinearity, conditional exogeneity) do not indicate specification problems. Then one has multiple consistent estimators for βo, and one must decide which to use. Should one just use the core regression, or instead use a regression including non-core variables? Or should one use some other estimator?

The criterion that resolves this question is estimator efficiency. We seek the most precise estimator of the critical core coefficients, as this delivers the tightest confidence intervals and the most powerful tests for the effects of interest. Recent work of Hahn (2004) shows that which estimator is efficient is not immediately obvious. As Hahn (2004) and White and Lu (2011) further show, this depends on the exact nature of the underlying structure.

7 As pointed out by one anonymous referee, ideally, the linearity and conditional exogeneity tests discussed in this section and the further specification test in Section 8.4 should be implemented before the robustness test.
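Before turning to efficiency, the mechanics of the robustness test of Section 6 can be made concrete. The sketch below simulates a correctly specified design with scalar D (so k0 = 1), a core regression, and one comparison regression (K = J = 2), then computes the stacked OLS estimates, the joint sandwich covariance with the cross-regression εk εj blocks, and the Wald statistic RKn. It is an illustration under an assumed data-generating process, not the testrob implementation; all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Correctly specified design: Y = 1.0*D + 0.5*Z + e; W2 is a valid non-core covariate.
Z, W2, e = rng.normal(size=(3, n))
D = 0.6 * Z + rng.normal(size=n)
Y = 1.0 * D + 0.5 * Z + e

R1 = np.column_stack([D, np.ones(n), Z])        # core regression: (D, X1)
R2 = np.column_stack([D, np.ones(n), Z, W2])    # comparison regression: (D, X1, W2)

def ols(Rj):
    d = np.linalg.lstsq(Rj, Y, rcond=None)[0]
    return d, Y - Rj @ d

d1, e1 = ols(R1)
d2, e2 = ols(R2)
delta = np.concatenate([d1, d2])

# Sandwich covariance M^-1 V M^-1 of sqrt(n)*(delta_hat - delta*):
# M is block diagonal; V has cross blocks E(eps_k eps_j x_k x_j').
regs, res = (R1, R2), (e1, e2)
k = [R.shape[1] for R in regs]
Minv = [np.linalg.inv(R.T @ R / n) for R in regs]
Sigma = np.zeros((sum(k), sum(k)))
for a in range(2):
    for b in range(2):
        Vab = (regs[a] * (res[a] * res[b])[:, None]).T @ regs[b] / n
        Sigma[sum(k[:a]):sum(k[:a + 1]), sum(k[:b]):sum(k[:b + 1])] = Minv[a] @ Vab @ Minv[b]

# R = Delta*S selects beta from each regression (position 0) and differences them.
Rmat = np.zeros((1, sum(k)))
Rmat[0, 0], Rmat[0, k[0]] = 1.0, -1.0
diff = Rmat @ delta
RK = float(n * diff @ np.linalg.solve(Rmat @ Sigma @ Rmat.T, diff))
print(f"beta_core={d1[0]:.3f}, beta_comp={d2[0]:.3f}, RKn={RK:.3f}")
# Under Ho, RKn ~ chi2 with (K-1)*k0 = 1 degree of freedom; the 5% critical value is 3.84.
```

With a correctly specified design such as this one, RKn should only rarely exceed the χ²(1) critical value; under misspecification (for example, dropping a true core covariate from one regression), the statistic diverges.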

8.1. Efficiency considerations

To understand the main issues in the present context, recall that the asymptotic covariance matrix for δ̂nj ≡ (β̂nj′, γ̂nj′)′, avar(δ̂nj), is Mj∗−1 Vj∗ Mj∗−1, where

Mj∗ ≡ [E(DD′)  E(DXj′) ; E(Xj D′)  E(Xj Xj′)]  and
Vj∗ ≡ [E(εj² DD′)  E(εj² DXj′) ; E(εj² Xj D′)  E(εj² Xj Xj′)].

Suppose for simplicity that σj² ≡ E(εj² | D, Xj) does not depend on D or Xj. Since this assumption is unrealistic, we remove it later; for now, it enables us to expose the key issues. With constant σj², we have the classical result that

Mj∗−1 Vj∗ Mj∗−1 = σj² [E(DD′)  E(DXj′) ; E(Xj D′)  E(Xj Xj′)]−1.

A little algebra shows that for the critical core coefficients β̂nj, we have

avar(β̂nj) ≡ σj² E(ηj ηj′)−1,

where ηj ≡ D − ζj∗ Xj is the vector of residuals from the linear regression of D on Xj, with regression coefficient matrix ζj∗ ≡ E(DXj′) E(Xj Xj′)−1. We now see the essential issue: estimator efficiency is determined by a trade-off between the residual variance of the Y regression, σj², and the residual covariance matrix of the D regression, E(ηj ηj′). Adding regressors beyond those of the core regression may either increase or decrease avar(β̂nj), depending on whether the included non-core variables reduce σj² enough to offset the corresponding increase in E(ηj ηj′)−1 resulting from the reduction in E(ηj ηj′).

The high-level intuition is that additional non-core covariates should be included if they tend to act as proxies for U (reducing σj²) and should not be included if they tend to act as drivers of D or proxies for unobserved drivers of D (reducing E(ηj ηj′)). Indeed, when for some W∗, W∗ ⊥ U | Xj holds, adding elements of W∗ to Xj can only increase E(ηj ηj′)−1, as adding W∗ leaves σj² unchanged.

A general investigation of the efficiency bounds, and of estimators attaining these bounds, relevant to selecting covariates in the current context is a fascinating topic. The work of Hahn (2004) for the case of scalar binary D is a good start, but the issues appear sufficiently challenging that a general analysis is beyond our scope here. Instead, we discuss practical methods that can in the meantime provide concrete benefits in applications.

One practical approach is to select the most efficient estimator among the robustness check regressions by simply comparing their asymptotic covariance matrix estimators. This is particularly straightforward for the estimators used in the robustness test, as their asymptotic covariances are already estimated as part of the test.

But one can do better, because the efficiency of the estimators used in the robustness check can be improved by using generalized least squares (GLS) instead of OLS. Further, the nature of these GLS corrections is of independent interest, as, among other things, this bears on the general analysis of efficient estimators.

8.2. GLS for the robustness check regressions

In line with our previous discussion, from now on we consider robustness check regressions where Xj contains X1. For regression j with independent or martingale difference regression errors εji, the GLS correction involves transforming each observation by σji−1 = E(εji² | Di, Xji)−1/2. Recall that D ⊥ U | Xj implies εj = U − E(U | Xj). Then

E(εj² | D, Xj) = E([U − E(U | Xj)]² | D, Xj)
= E(U² | D, Xj) − 2 E(U | D, Xj) E(U | Xj) + E(U | Xj)²
= E(U² | Xj) − E(U | Xj)²,

where the final equality holds because D ⊥ U | Xj. We see that with conditional exogeneity, conditional heteroskedasticity only depends on X and does not depend on D. Also, we see that conditional heteroskedasticity is typical, as X is chosen so that X and U are dependent. GLS is thus typically required for estimator efficiency.

The fact that E(εj² | D, Xj) does not depend on D has useful implications. First, it suggests another way to test conditional exogeneity. Second, it simplifies the GLS computations. We first take up GLS estimation and then discuss specification testing.

Typically, the conditional variances σji² = E(εji² | Di, Xji) are unknown; given suitable estimators σ̂ji, a feasible GLS (FGLS) estimator for robustness check regression j is

δ̃jn ≡ (β̃jn′, γ̃jn′)′ ≡ [D̃′D̃  D̃′X̃j ; X̃j′D̃  X̃j′X̃j]−1 [D̃′ ; X̃j′] Ỹ,

where D̃ is the n × k0 matrix with rows D̃i′ ≡ Di′/σ̂ji, X̃j is the n × kj matrix with rows X̃ji′ = Xji′/σ̂ji, and Ỹ is the n × 1 vector with elements Ỹi = Yi/σ̂ji. Regression linearity ensures that δ̃jn is consistent for δj∗. Conditions ensuring that FGLS estimators are asymptotically equivalent to GLS for nonparametric choice of σ̂ji are given by Robinson (1987), White and Stinchcombe (1991), and Andrews (1994), among others.

An interesting possibility discussed by White and Stinchcombe (1991) is to estimate σji² using artificial neural networks. This entails estimating a neural network regression model, such as

ε̂ji² = Σℓ=1..q ψ(τjℓ0 + Xji′ τjℓ) θjℓ + υji,    (4)

where ε̂ji ≡ Yi − Di′ β̂jn − Xji′ γ̂jn is the estimated residual for observation i from robustness check regression j, q = qn is the number of ''hidden units'' (with qn → ∞ as n → ∞), ψ is the hidden unit ''activation function'', e.g., the logistic CDF or PDF, (τjℓ0, τjℓ′, θjℓ′), ℓ = 1, ..., q, are parameters to be estimated, and υji is the regression error. When ψ is suitably chosen, nonlinear estimation can be avoided by choosing (τjℓ0, τjℓ′)′ randomly; see White (2006a). The estimates σ̂ji² are the fitted values from this regression. Observe that this estimation is simplified by the fact that Di need not be included on the right-hand side above. This conserves degrees of freedom, offering significant benefits for nonparametric estimation.

The FGLS critical core coefficient estimator β̃nj now has

avar(β̃nj) ≡ E(η̃j η̃j′)−1,

where η̃j ≡ (D − ζ̃j∗ Xj)/σj and ζ̃j∗ ≡ E(DXj′/σj²) E(Xj Xj′/σj²)−1. If j∗ indexes the efficient estimator, then for all j ̸= j∗, avar(β̃nj) − avar(β̃nj∗) will be positive semi-definite. The covariance matrices can be estimated in the obvious way, and one can search for an estimator with the smallest estimated covariance. In testrob, the covariance matrices are estimated using the heteroskedasticity-consistent method of White (1980), to accommodate the fact that the FGLS adjustment might not be fully successful.

8.3. Combining FGLS estimators

Even if one finds an FGLS estimator with smallest covariance matrix, there is no guarantee that this estimator makes efficient

use of the sample information. Usually, one can find a more efficient estimator by optimally combining the individual FGLS estimators. To describe this estimator, we note that, parallel to the OLS case, asymptotic normality holds for FGLS, i.e.,

√n (δ̃n − δ∗) →d N(0, M̃∗−1 Ṽ∗ M̃∗−1),

under mild conditions, with

M̃∗ ≡ diag(M̃1∗, ..., M̃J∗) and Ṽ∗ ≡ [Ṽkj∗],

where, for k, j = 1, ..., J,

M̃j∗ ≡ [E(DD′/σj²)  E(DXj′/σj²) ; E(Xj D′/σj²)  E(Xj Xj′/σj²)]  and
Ṽkj∗ ≡ [E(ε̃k ε̃j DD′/σk σj)  E(ε̃k ε̃j DXj′/σk σj) ; E(ε̃k ε̃j Xk D′/σk σj)  E(ε̃k ε̃j Xk Xj′/σk σj)],

with ε̃j ≡ (Y − D′βo − Xj′γj∗)/σj.

Letting S be the Jk0 × dim(δ∗) selection matrix that extracts β̃n ≡ (β̃n1′, ..., β̃nJ′)′ from δ̃n (so β̃n = S δ̃n), the asymptotic normality result for β̃n can be represented as an artificial regression

√n β̃n = √n I βo + υ,

where the Jk0 × k0 matrix of artificial regressors is I ≡ ι ⊗ Ik0, where ι is the J × 1 vector of ones and Ik0 is the identity matrix of order k0; and υ is the Jk0 × 1 artificial regression error, distributed as N(0, Ω∗) with Ω∗ ≡ S M̃∗−1 Ṽ∗ M̃∗−1 S′. For simplicity, we assume that Ω∗ is nonsingular; if not, we simply drop comparison regressions until nonsingularity holds.

The Feasible Optimally combined GLS (FOGLeSs) estimator is obtained by applying FGLS to this artificial regression:

β̃n∗ = (I′ Ω̂∗−1 I)−1 I′ Ω̂∗−1 β̃n,

where Ω̂∗ is a suitable consistent estimator of Ω∗. This estimator satisfies

√n (β̃n∗ − βo) →d N(0, (I′ Ω∗−1 I)−1).

One can consistently estimate the asymptotic covariance matrix (I′ Ω∗−1 I)−1 in a number of obvious ways.

The estimator β̃n∗ is optimal in the sense that for any other combination estimator β̃nĎ = A β̃n such that β̃nĎ →p βo, where A is a nonstochastic k0 × Jk0 weighting matrix, we have that avar(β̃nĎ) − avar(β̃n∗) is positive semi-definite.

Of course, this is only a relative efficiency result. That is, β̃n∗ makes relatively efficient use of the FGLS comparison regressions, but it need not be fully efficient. This is nevertheless useful, given that the relevant efficiency bound and the estimator attaining this bound are presently unknown. But note that if one of the comparison FGLS regressions is relatively (or even fully) efficient, then A∗ ≡ (I′ Ω∗−1 I)−1 I′ Ω∗−1 will signal this by taking the form

A∗ = [0, ..., 0, Ik0, 0, ..., 0],

where 0 is a k0 × k0 zero matrix and Ik0 appears in the j∗th position when β̃nj∗ is the relatively (fully) efficient estimator. It will thus be interesting to inspect the sample estimator of A∗ to see if it approximates this form. If so, one may prefer just to use the indicated estimator.

Given that we are considering making efficient use of information contained in a system of related equations, one might consider stacking the robustness check regressions and trying to construct an estimator analogous to the Seemingly Unrelated Regression (SUR) estimator. As it turns out, however, this strategy breaks down here. The problem is that a necessary condition for the consistency of this estimator here is that E(εj | D, Z, W) = 0, j = 1, ..., J, but this generally fails for the robustness check comparison regressions. Instead, we have only that E(εj | D, Xj) = 0, which generally does not imply E(εj | D, Z, W) = 0.

To close this section, we note that using some estimator other than the FOGLeSs estimator, for example the core regression OLS estimator, gives inefficient and therefore potentially flawed inferences about the critical core coefficients.

8.4. Further specification analysis

Observe that the FGLS and the FOGLeSs estimators are consistent for the same coefficients as the corresponding OLS estimators. Any apparent divergence among these estimators potentially signals misspecification (i.e., regression nonlinearity or failure of conditional exogeneity). These differences can be formally tested using a standard Hausman (1978) test. As this is completely standard, we do not pursue it here.

As noted above, the fact that E(εj² | D, Xj) does not depend on D suggests a convenient and informative way to test conditional exogeneity. Specifically, for any j, one can estimate a neural network regression of the form

ε̂ji² = Di′ λj + Σℓ=1..q ψ(τjℓ0 + Xji′ τjℓ) θjℓ + υji.    (5)

Under the null of conditional exogeneity, H0j: D ⊥ U | Xj, we have λj = 0. Full details of testing λj = 0 in this way can be found in White and Lu (2010, Section 6.3).

The natural point in the process to do these tests is immediately after performing the robustness test and before implementing FGLS. These results provide important diagnostics about where specification problems may lie that one should be aware of before going further. They also can be quickly adjusted for use in implementing FGLS.

The diagnostic information relates to whether D ⊥ U | Xj. Suppose the robustness test does not reject. If we nevertheless reject H0j, this signals a specification problem that the robustness test may lack power to detect. On the other hand, if the robustness test did reject, then this signals a possible reason for the rejection.

This diagnostic for the core regression, j = 1, is especially informative. First, we can test H01 even when a robustness test is not possible (for example, due to singularity of R M∗−1 V∗ M∗−1 R′). If we reject H01, then we have evidence that the core covariates are insufficient to identify βo; this is, after all, the paramount issue for the study at hand. Further, regardless of whether or not we reject H01, the comparison regression diagnostics indicate which comparison regressions cannot be validly used in the FOGLeSs estimator. If for any j we reject H0j, then we have evidence that βj∗ ̸= βo, so there is little point to computing the FGLS estimator β̃nj and including it in obtaining the FOGLeSs estimator. If we do have some comparison regressions with covariates that identify βo, we can just use these in the FOGLeSs estimator, regardless of the outcome of the robustness test. Examining the comparison regression diagnostics may help in finding covariates that do identify βo, as we will not reject H0j for these regressions. Care must be taken, however, because E(εj² | D, Xj) not depending on D does not imply D ⊥ U | Xj; ideally, judgment informed by structural insight will be applied in such cases.

9. A step-by-step summary

This section summarizes the robustness checking and testing process discussed here. We discuss the modeling steps required prior to using testrob, followed by those of the testrob module itself. The process is interactive, requiring the researcher to make a number of decisions along the way.
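As a numerical illustration of Sections 8.2 and 8.3, the sketch below computes two FGLS estimators, weighting each observation by 1/σ̂ji, with a simple quadratic variance regression standing in for the neural network sieve in (4), and then combines the two critical core estimates using the FOGLeSs weights A∗ = (ι′Ω̂−1ι)−1 ι′Ω̂−1. This is a hedged sketch on a hypothetical simulated design, not the testrob implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Hypothetical design: g is an unobserved driver of Y, W2 = g + noise is a valid
# non-core proxy for it; heteroskedasticity depends on Z only, as in Section 8.2.
Z, g, m, u = rng.normal(size=(4, n))
D = 0.6 * Z + rng.normal(size=n)
W2 = g + 0.7 * m
sig = np.sqrt(0.5 + 0.5 * Z**2)
Y = 1.0 * D + 0.5 * Z + g + sig * u

regs = [np.column_stack([D, np.ones(n), Z]),        # core: (D, X1)
        np.column_stack([D, np.ones(n), Z, W2])]    # comparison: (D, X1, W2)

def fgls(Rj):
    # First-stage OLS residuals, then a quadratic variance regression on Z
    # (a crude stand-in for the sieve in (4)); weight everything by 1/sigma_hat.
    d0 = np.linalg.lstsq(Rj, Y, rcond=None)[0]
    e2 = (Y - Rj @ d0) ** 2
    B = np.column_stack([np.ones(n), Z, Z**2])
    s2 = np.clip(B @ np.linalg.lstsq(B, e2, rcond=None)[0], 0.05, None)
    w = 1.0 / np.sqrt(s2)
    dt = np.linalg.lstsq(Rj * w[:, None], Y * w, rcond=None)[0]
    return dt, (Y - Rj @ dt) * w, Rj * w[:, None]

out = [fgls(R) for R in regs]
beta = np.array([o[0][0] for o in out])   # the two FGLS critical core estimates

# Joint sandwich covariance of the stacked FGLS estimates (cross blocks included),
# then select the entries corresponding to the two betas to form Omega_hat.
k = [o[2].shape[1] for o in out]
Minv = [np.linalg.inv(o[2].T @ o[2] / n) for o in out]
Sigma = np.zeros((sum(k), sum(k)))
for a in range(2):
    for b in range(2):
        Vab = (out[a][2] * (out[a][1] * out[b][1])[:, None]).T @ out[b][2] / n
        Sigma[sum(k[:a]):sum(k[:a + 1]), sum(k[:b]):sum(k[:b + 1])] = Minv[a] @ Vab @ Minv[b]
Omega = Sigma[np.ix_([0, k[0]], [0, k[0]])] / n    # cov of (beta_1, beta_2)

# FOGLeSs combination: beta* = (iota' Omega^-1 iota)^-1 iota' Omega^-1 beta.
iota = np.ones(2)
Oinv = np.linalg.inv(Omega)
wts = Oinv @ iota / (iota @ Oinv @ iota)
beta_star = float(wts @ beta)
print("FOGLeSs weights:", wts.round(3), "combined beta:", round(beta_star, 3))
```

By construction the weights sum to one; if one of the two FGLS regressions were fully efficient, its weight would approach one, mirroring the behavior of A∗ discussed in Section 8.3.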

We emphasize that what follows is not intended as a definitive implementation of robustness checking and testing as advocated here. Instead, it is designed to be a prototypical template that can be conveniently used in applications and that can serve as an accessible basis for variants that may be more sophisticated or better suited to specific contexts.

Before using testrob, we recommend that the researcher take the following steps:

1. Identify an outcome of interest, Y, and potential causes of interest, D. The latter are the critical core variables.

2. Formally or informally model not only the economic structure relating D to Y, but also that determining D. From the structure determining Y, one can ascertain the actual or potential drivers of Y other than D. Those drivers that can be accurately observed are non-critical core variables, Z. Those that cannot are the unobservables, U. Similarly, from the structure for D, one can ascertain its observed and unobserved drivers; call these Q and V, respectively.

3. Specify valid covariates, W. As discussed above, these can be proxies for U, observed drivers Q of D, and proxies for unobserved drivers V of D. They should not be variables driven by D. This specification should be comprehensive and in accord with Section 5.
Several subgroups of covariates must be separately designated for testrob: W0, the initial covariates, which will be used to construct the core covariates; and L > 0 groups of non-core covariates, Wℓ∗. The latter can be either groups of non-core covariates to be used as is in the robustness check comparison regression(s), or they can be groups of covariates and their subsets to be used in the comparison regressions. A "flag" to testrob specifies whether a group is to be used as is, or as a group together with specified subsets. The groups are specified to testrob by designating indexes of the W vector. The indexes of W0 should not overlap with those of the Wℓ∗'s. The indexes of the Wℓ∗'s can overlap but should not coincide.
We comment that there is no necessary virtue in proliferating comparison regressions, as a single comparison regression can signal structural misspecification. Having many comparison regressions can dilute the power of the robustness test, especially by introducing near singularities into the needed asymptotic covariance matrix. Also, the required computing time increases (possibly substantially) as a function of the number of comparison regressions. One possible benefit to having more comparison regressions is improved efficiency for the FOGLeSs estimator, but this can also be achieved by judicious choice of non-core covariates, particularly proxies for U that impact its conditional variance.

With these preparations, one is ready to invoke testrob.

4. The first task for testrob is to identify core covariates. One option is to declare that W0 contains no non-core covariates. This sets X1 = (Z′, W0′)′ as the vector of non-critical core covariates, and one proceeds to the next step. Otherwise, testrob regresses each element of D on X0 = (Z′, W0′)′. These results are presented to the user, together with a list of the variables in decreasing order of the p-value for the chi-squared statistic for the hypothesis that the coefficients of that variable are jointly zero. Recall that non-core variables will have approximately zero coefficients in all regressions, so the top variables in this list are those most plausible to be non-core. The testrob process then queries the user as to which, if any, covariates to treat as non-core. Denote the remaining initial covariates W̃0. At this point, one can declare that W̃0 contains no non-core covariates, which sets X1 = (Z′, W̃0′)′ as the vector of core covariates. One then proceeds to the next step.
If one proceeds with the current step, the testrob process then queries the user as to which additional covariates to treat as non-core. The remaining variables, W1, are used to construct the non-critical core covariates, X1 = (Z′, W1′)′. The non-core variables from this step, denoted W0∗, are treated in the same way as the user-specified core variables.

5. Next, testrob conducts the specified comparison regressions and the robustness test for the critical core coefficients. The non-core covariates in the comparison regressions include the Wℓ∗'s specified by the user, together with the non-core variables W0∗ identified in step 4. These results are presented to the user, together with information about which comparison regressions had to be dropped to achieve a nonsingular robustness test covariance matrix. To provide further insight, testrob also computes and reports diagnostics for regression nonlinearity and conditional exogeneity for each retained comparison regression. The conditional exogeneity test is that described in Section 8.4. The nonlinearity test is a neural network test for neglected nonlinearity described in Lee et al. (1993).8

6. If the results of step 5 suggest serious structural misspecification, the researcher may decide to terminate the testrob process and rethink his or her model specification. If, however, the results support regression linearity and conditional exogeneity for some subset of the comparison regressions, one may decide to continue. For this, the process queries the user as to which comparison regressions to use. Given these, testrob estimates the squared residual Eq. (4), computes the FGLS regressions, and reports the results to the user.

7. Finally, testrob computes the FOGLeSs estimator and associated statistics, reports these to the user, and terminates.

8 As mentioned in Footnote 7, the linearity and conditional exogeneity tests can be implemented before the robustness test.

Table 1a
Outcome of interest.

Name                     Description
Y   Difference in OROA   The three year average of industry- and performance-adjusted OROA after CEO transitions minus the three year average before CEO transitions

Table 1b
Potential cause of interest (critical core variable).

Name             Description
D   Family CEO   = 1 if the incoming CEO of the firm is related by blood or marriage to the departing CEO, to the founder, or to a larger shareholder
                 = 0 otherwise

10. An illustrative example

We illustrate the testrob procedure using the dataset analyzed by Pérez-González (2006). This is a rich dataset to which Pérez-González applies methods that correspond closely to our discussion in the introduction. Pérez-González is interested in the impact of inherited control on firm performance. He uses data from 335 management transitions of publicly traded U.S. corporations to examine whether firms with familially related incoming chief executive officers (CEOs) underperform in terms of operating profitability relative to firms with unrelated incoming CEOs. Thus, in this application, D is a binary variable that equals 1 if the incoming CEO is related to the departing CEO, to the founder, or to a large shareholder by blood or marriage and that equals 0 otherwise. Pérez-González uses operating return on assets (OROA) as a measure of firm performance. Specifically, Y here is the difference in OROA calculated as the three-year average after
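In miniature, steps 4–7 amount to re-estimating the coefficient on D under several comparison covariate sets and checking whether it moves. The sketch below (simulated data; the data-generating process, tolerances, and helper names such as `d_coefficient` are illustrative assumptions, not the testrob implementation) shows the pattern one hopes to see when the covariates are valid:

```python
# Miniature robustness check: the core coefficient on D is estimated under
# several comparison covariate sets and should barely move when covariates
# are valid. Illustrative sketch only, not the testrob code.
import random

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def partial_out(target, control):
    """Residuals of `target` after a simple regression (with intercept) on `control`."""
    slope = cov(target, control) / cov(control, control)
    m_t, m_c = sum(target) / len(target), sum(control) / len(control)
    return [t - m_t - slope * (c - m_c) for t, c in zip(target, control)]

random.seed(7)
n = 5000
d = [float(random.random() < 0.5) for _ in range(n)]  # cause of interest D
z = [random.gauss(0, 1) for _ in range(n)]            # valid core covariate
w = [random.gauss(0, 1) for _ in range(n)]            # irrelevant non-core covariate
y = [2.0 * d[i] + z[i] + random.gauss(0, 0.5) for i in range(n)]

def d_coefficient(y, d, controls):
    """Coefficient on D by sequentially partialling out controls
    (Frisch-Waugh; approximately exact here since z and w are independent)."""
    yy, dd = y[:], d[:]
    for c in controls:
        yy = partial_out(yy, c)
        dd = partial_out(dd, c)
    return cov(yy, dd) / cov(dd, dd)

for name, controls in [("{D}", []), ("{D, Z}", [z]), ("{D, Z, W}", [z, w])]:
    print(name, round(d_coefficient(y, d, controls), 2))  # all close to 2.0
```

When conditional exogeneity fails for one of the sets, the estimates spread out instead, which is what the formal robustness test of step 5 detects.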
succession minus the three-year average before succession. Precise definitions of D and Y are given in Tables 1a and 1b, respectively.

Table 1c
Covariates.

Name                                  Description
W(1)   Ln sales                       Logarithm of sales one year prior to the CEO transition
W(2)   Industry-adjusted OROA         Industry adjusted OROA one year prior to the CEO transition
W(3)   Industry-adjusted M-B          Industry adjusted market-to-book (M-B) ratio one year prior to the CEO transition
W(4)   Board ownership                The fraction of ownership held by officers and directors
W(5)   Family directors               = 1 if the fraction of family to total directors is higher than the median in the sample; = 0 otherwise
W(6)   Mean pre-transition industry- and performance-adjusted OROA   Three year pre-transition average of the industry- and performance-adjusted OROA
W(7)   Less selective college         = 1 if the college attended by the incoming CEO is not in "very competitive" or higher in Barron's ranking; = 0 otherwise
W(8)   Graduate school                = 1 if the incoming CEO attended a graduate school; = 0 otherwise
W(9)   Age promoted                   The age when the incoming CEO is appointed
W(10)  Woman                          = 1 if the incoming CEO is a woman; = 0 otherwise
W(11)  Positive R&D expenses          = 1 if the firm reported positive R&D expenses the year prior to the CEO transition; = 0 otherwise
W(12)  Nonretirements                 = 1 if the departing CEO was not reported to leave the firm due to a "retirement"; = 0 otherwise
W(13)  Early succession               = 1 if the departing CEO left his position before 65; = 0 otherwise
W(14)  Departing CEO remains as chairman   = 1 if the departing CEO continued as chairman after the CEO transition; = 0 otherwise
W(15)  CEO ownership                  The ownership share of the incoming CEO
W(16)–W(34)  Year dummy, 1981–1999    Year dummies

As stated in Pérez-González (2006, p. 1559), "the main argument against family succession in publicly traded firms is that competitive contest for top executive positions would rarely result in a family CEO". Nevertheless, Pérez-González (2006, p. 1560) also argues that family succession may benefit firm performance, for example, by "reducing agency problems", "facilitating firm specific investment", "easing cooperation and the transmission of knowledge within organizations", or "having a long-term focus that unrelated chief executives lack". Thus, it is of interest to examine the impact of family succession on firm performance empirically.

Although Pérez-González does not include error-free measures of other causes Z of Y, he does include a large number of covariates to control for unobserved drivers. We list these in Table 1c.

Next, we demonstrate implementing testrob as described above and present the results.

1. As discussed above, Y is the "Difference in OROA". D is "Family CEO".

2. Many factors affect OROA during the CEO transition period, for example, the size of the firm, how the firm performed in the past, the board characteristics, and how the firm invested in R&D. We assume that we do not observe these other true drivers of OROA, but observe their proxies as discussed in step 3 below. Thus we let Z be empty. There are also many factors influencing the CEO hiring decision, for example, characteristics of the incoming CEO, such as age, gender, and education. Also, as stated in Pérez-González (2006, p. 1578), "previous studies have shown that firm performance, size and board characteristics affect firms' hiring and firing decisions, as well as selection of internal relative to external candidates". For example, small firms may have difficulty in hiring competent unrelated managers. A departing CEO who overperforms relative to other firms in the same industry may "have power and influence to name an offspring as CEO".

3. We further classify the available covariates into different groups. We use "Ln sales" to proxy firm size, and "Industry adjusted OROA", "Industry-adjusted M-B", and "Mean pre-transition industry- and performance-adjusted OROA" to proxy for the firm's past performance. We use "board ownership" and "Family directors" to proxy for board characteristics. We use "Positive R&D expense" to proxy for the firm's R&D expenditure. Further, as pointed out in Pérez-González (p. 1582), "CEO separation conditions or the age at which the departing CEO retires, may reveal information about the state of affairs of a corporation that is not captured by firm characteristics". We use "Nonretirements", "Early succession", "Departing CEO remains as chairman", and "CEO ownership" to represent this. Further, we use "Less selective college", "Graduate school", "Age promoted", and "Woman" to proxy the incoming CEO's characteristics. Table 2 summarizes. As we discussed in step 2, firm size, firm's past performance, and firm's board characteristics affect both OROA during the CEO transition period and the selection of the CEO. Therefore, we include proxies for these among the core covariates. We also include year dummies as core covariates. Thus, the initial core covariates are W0 = {W(1), W(2), W(3), W(4), W(5), W(6), W(16)–W(34)}.9
We use the remaining three groups in the comparison regressions. That is, W1∗ = {W(7), W(8), W(9), W(10)}, W2∗ = {W(11)}, and W3∗ = {W(12), W(13), W(14), W(15)}. To avoid proliferating comparison regressions, we do not use subsets of W1∗, W2∗, or W3∗.

Table 2
Covariate classification.

Covariate group                                Covariates
Firm size                                      W(1)
Firm's past performance                        W(2), W(3), W(6)
Board characteristics                          W(4), W(5)
Firm's R&D expenditure                         W(11)
Departing CEO's separation conditions
  and incoming CEO's ownership                 W(12), W(13), W(14), W(15)
Incoming CEO's characteristics                 W(7), W(8), W(9), W(10)

4. In this step, we identify core covariates. We do not declare that the initial core covariates contain no non-core covariates.

9 This set of covariates is exactly that used in column 1 in Table 9 in Pérez-González (2006, p. 1581).
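The covariate groups just specified can also be organized programmatically. The sketch below (the function name `comparison_sets` and the set labels are illustrative assumptions) mirrors how each comparison regression is formed from {D}, the core covariates, and one non-core group at a time; the data-driven group W0∗ identified in step 4 would be appended in the same way:

```python
# Sketch: encoding the covariate groups of Table 2 and expanding them into
# comparison regressor sets. Names and labels are illustrative only.

W0 = [1, 2, 3, 4, 5, 6] + list(range(16, 35))  # initial core covariates
groups = {
    "W1*": [7, 8, 9, 10],     # incoming CEO's characteristics
    "W2*": [11],              # firm's R&D expenditure
    "W3*": [12, 13, 14, 15],  # separation conditions and CEO ownership
}

def comparison_sets(core, noncore):
    """Each comparison regression uses D, the core covariates, and at most
    one non-core group (no subsets, to avoid proliferating regressions)."""
    base = ["D"] + [f"W({i})" for i in core]
    sets = {"core only": base}
    for name, idx in noncore.items():
        sets[f"core + {name}"] = base + [f"W({i})" for i in idx]
    return sets

sets = comparison_sets(W0, groups)
print(len(sets))               # 4
print(sets["core + W2*"][-1])  # W(11)
```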
Thus, testrob regresses D on X0 = W0. Table 3 below presents the results. W(3), W(1), W(6), W(4), and W(2) all have high p-values. We could drop all of these from the core covariates. To be conservative, however, we decide to drop {W(3), W(1), W(6)}. Thus the core covariates are now W0 = {W(2), W(4), W(5)}. The new non-core covariates W0∗ = {W(1), W(3), W(6)} become a new comparison group. We only use the full group W0∗ in the comparison regressions.

Table 3
Regression of D on X0.

Covariates   W(3)    W(1)    W(6)    W(4)    W(2)    W(5)
p-values     0.973   0.885   0.766   0.471   0.334   0.000

Note: The results for year dummies are not reported. We include these in all regressions.

5. We now have 5 sets of comparison groups, as shown in Table 4. Testrob performs a singularity check; all five sets are retained. Then testrob implements the robustness test and performs linearity and conditional exogeneity tests for each set. Table 5 presents the results.

Table 4
Sets of comparison regressions.

Set     Regressors
Set 1   {D} ∪ W0
Set 2   {D} ∪ W0 ∪ W0∗
Set 3   {D} ∪ W0 ∪ W1∗
Set 4   {D} ∪ W0 ∪ W2∗
Set 5   {D} ∪ W0 ∪ W3∗

Table 5
Robustness and diagnostic tests.

Robustness test p-value                 0.768

Diagnostic tests                        Set 1   Set 2   Set 3   Set 4   Set 5
Linearity test p-values                 0.635   0.756   0.644   0.451   0.279
Conditional exogeneity test p-values    0.138   0.096   0.136   0.150   0.112

6. It appears that there is no structural misspecification detected in step 5. We thus decide to continue, using all five comparison groups. Testrob now estimates the coefficient on the critical core variable for each covariate group and also computes the FOGLeSs regression. Table 6 presents the results. Finally, testrob terminates.

This formally demonstrates that the results of Pérez-González (2006) are robust to different sets of covariates. The FOGLeSs estimate of the coefficient for the critical core variable, Family CEO, is −0.0292 with a t-statistic of −4.6557.

11. Summary and concluding remarks

Although robustness checks are common in applied economics, their use is subject to numerous pitfalls. If not implemented properly, they may be uninformative; at worst, they can be entirely misleading. Here, we discuss these pitfalls and provide straightforward methods that preserve the diagnostic spirit underlying robustness checks. We distinguish between critical and non-critical core variables, and we discuss how these can be properly specified. We also discuss how non-core variables for the comparison regressions can be chosen to ensure that robustness checks are indeed structurally informative.

Our formal robustness test is a Hausman-style specification test. We supplement this with diagnostics for nonlinearity and exogeneity that can help in understanding why robustness test rejection occurs or in identifying invalid sets of covariates. The Feasible Optimally combined GLS (FOGLeSs) estimator provides a relatively efficient combination of the estimators compared in performing the robustness test. A new Matlab procedure, testrob, freely available at http://ihome.ust.hk/~xunlu/code.html, embodies these methods. Our hope is that the ready availability of this procedure will encourage researchers to subject their analyses to more informative methods for identifying potential misspecification, strengthening the validity and reliability of structural inference in economics.

Appendix A. Equality of β1∗ and β2∗ inconsistent for βo

Consider vectors X1 and X2 = (X1, W), where X1 contains Z. Suppose that D ̸⊥ U | X1 and D ̸⊥ U | X2, so that both X1 and X2 are invalid covariates. Further, suppose W ⊥ D | X1. Letting conditional expectations be linear for simplicity, we have

E(Y | D, X2) = D′βo + Z′αo + E(U | D, X1, W)
             = D′βo + Z′αo + D′ζd∗ + X1′ζx1∗ + W′ζw∗
             = D′(βo + ζd∗) + Z′αo + X1′ζx1∗ + W′ζw∗,

E(Y | D, X1) = E[E(Y | D, X2) | D, X1]
             = D′(βo + ζd∗) + Z′αo + X1′ζx1∗ + E(W′ζw∗ | D, X1)
             = D′(βo + ζd∗) + Z′αo + X1′ζx1∗ + X1′δx1∗ζw∗
             = D′(βo + ζd∗) + Z′αo + X1′(ζx1∗ + δx1∗ζw∗).

This implies robustness of the D coefficients, despite the invalidity of both X1 and X2.

Appendix B. Choosing valid covariates

To examine how valid covariates W can be chosen, we invoke Reichenbach's (1956) principle of common cause: if two variables are correlated,10 then one causes the other or there is a third common cause of both. Chalak and White (2012) give formal conditions, applicable here, ensuring the validity of this principle. Here, we want X correlated with U and/or with D, so we focus on the properties of W that may enhance this correlation enough to yield D ⊥ U | X. What follows summarizes a more extensive discussion in White (2006b).

First, consider the relation between W and U. If these are correlated, then W causes U, U causes W, or there is an underlying common cause. Each possibility is useful in suggesting choices for W; however, these possibilities can be simplified. (a.i) If W is a cause of U, then by substitution, it is also a cause of Y, typically together with some other unobservables, say V. Now W can be considered a component of Z, and the issue is whether or not D ⊥ V | Z. This takes us back to the original situation. Thus, the possibility of substitution ensures that, without loss of generality, we can adopt the convention that W does not cause U. (a.ii) If U is a cause of W, then W acts as a proxy for U, as in the classic case where IQ score acts as a proxy for unobserved ability. Note that W's role is purely predictive for U, as W is not a structural factor determining Y. (a.iii) If there is an observable common cause of U and W, then by substitution we can include it in Z, just as in case (a.i). W becomes redundant in this case. If there is an unobservable common cause of U and W, say V, then by substitution we arrive back at case (a.ii). With these conventions, the single relevant case is that W is a proxy for U, caused by U. Such proxies are a main source of covariates W. An interesting possibility in time-series applications is that W may contain not only lags but also leads of variables driven by U, as discussed by White and Kennedy (2009) and White and Lu (2010).

10 For succinctness in what follows, we use "correlation" loosely to mean any stochastic dependence.
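The algebra of Appendix A can also be seen in a small simulation. In the sketch below, the data-generating process is an illustrative assumption chosen so that D ̸⊥ U | X1, W ⊥ D | X1, and conditional expectations are linear; the `ols` helper is not the testrob code. The D coefficient comes out nearly identical across the two covariate sets, yet both estimates are biased away from the true value of 1:

```python
# Simulation of the Appendix A setup: X1 and X2 = (X1, W) are both invalid,
# but the D coefficient is "robust" across them -- and inconsistent in both.
# The data-generating process and helper are illustrative only.
import random

def ols(y, X):
    """OLS coefficients via the normal equations with Gaussian elimination."""
    n, k = len(y), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv], b[p], b[piv] = A[piv], A[p], b[piv], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            A[r] = [A[r][q] - f * A[p][q] for q in range(k)]
            b[r] -= f * b[p]
    beta = [0.0] * k
    for p in reversed(range(k)):
        beta[p] = (b[p] - sum(A[p][q] * beta[q] for q in range(p + 1, k))) / A[p][p]
    return beta

random.seed(12345)
n = 20000
x1  = [random.gauss(0, 1) for _ in range(n)]
v   = [random.gauss(0, 1) for _ in range(n)]          # unobserved driver of both D and U
eta = [random.gauss(0, 1) for _ in range(n)]          # part of U unrelated to D
d = [x1[i] + v[i] + random.gauss(0, 1) for i in range(n)]
u = [v[i] + eta[i] for i in range(n)]                 # so D is not independent of U given X1
w = [eta[i] + random.gauss(0, 1) for i in range(n)]   # W independent of D given X1
y = [1.0 * d[i] + u[i] for i in range(n)]             # true coefficient on D is 1.0

b1 = ols(y, [[1.0, d[i], x1[i]] for i in range(n)])[1]        # covariates X1
b2 = ols(y, [[1.0, d[i], x1[i], w[i]] for i in range(n)])[1]  # covariates X2 = (X1, W)
print(round(b1, 2), round(b2, 2))  # both near 1.5: robust, yet biased
```

W absorbs only the η component of U, which is orthogonal to D given X1, so adding it leaves the (biased) D coefficient essentially unchanged, exactly as the derivation predicts.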
Table 6
FGLS and FOGLeSs results.

                          Set 1     Set 2     Set 3     Set 4     Set 5     FOGLeSs
Estimate                  −0.0286   −0.0327   −0.0210   −0.0286   −0.0291   −0.0292
Robust standard errors     0.0083    0.0075    0.0091    0.0078    0.0096    0.0062
Asymptotic t-statistic    −3.4481   −4.3603   −2.2930   −3.6740   −3.0148   −4.6557
FOGLeSs weight             0.1855    0.3873    0.1319    0.1897    0.1057    –
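Two properties of the FOGLeSs weights can be checked directly from Table 6, using only the reported figures: the weights sum to one (up to the table's rounding), and the FOGLeSs estimate is the corresponding weighted combination of the per-set FGLS estimates. The `weights_2x2` helper below is an illustrative sketch of the scalar-case optimal-weight formula, not the testrob implementation; its 2×2 covariance inputs are made up:

```python
# Arithmetic checks on Table 6, plus a sketch of the scalar-case optimal
# combination weights. Table values are as reported in the text.

estimates = [-0.0286, -0.0327, -0.0210, -0.0286, -0.0291]  # Sets 1-5
weights   = [ 0.1855,  0.3873,  0.1319,  0.1897,  0.1057]  # FOGLeSs weights

combined = sum(w * b for w, b in zip(weights, estimates))
print(round(sum(weights), 3), round(combined, 4))  # 1.0 -0.0292

def weights_2x2(v1, v2, c):
    """Optimal weights for two scalar estimators of the same coefficient with
    covariance matrix [[v1, c], [c, v2]]; the determinant factor cancels in
    the normalization, so only v2 - c and v1 - c matter."""
    a, b = v2 - c, v1 - c
    return a / (a + b), b / (a + b)

# If estimator 1 is fully efficient, cov(b1, b2) = var(b1) (the familiar
# Hausman structure), and the optimal weights collapse onto estimator 1 --
# the diagonal pattern for A* discussed in the text:
print(weights_2x2(1.0, 4.0, 1.0))  # (1.0, 0.0)
```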
Next, consider the relation between W and D. If these are correlated, then W causes D, D causes W, or there is an underlying common cause. Again, the possibilities simplify. (b.i) The causes W of D are another key source of potential covariates. (b.ii) The case where D causes W is problematic. As discussed by Rosenbaum (1984) and by Heckman and Navarro-Lozano (2004), including regressors (W) driven by the cause of interest (D) in a regression with D generally leads to inconsistent estimates of the effects of interest. Such regressions are not an appropriate basis for examining robustness, so we rule out variables caused by D as covariates. (b.iii) If there is an observable common cause of D and W, then, by substitution, that common cause is a cause of D, and we are back to case (b.i) by replacing the original W with its observable cause in common with D. If there is an unobservable common cause of D and W, say V, then W acts as a proxy for V, analogous to case (a.ii).

Summarizing, we see that W can either contain proxies for U, observed drivers of D, or proxies for unobserved drivers of D. It should not contain outcomes driven by D.

References

Adams, W., Einav, L., Levin, J., 2009. Liquidity constraints and imperfect information in subprime lending. American Economic Review 99, 49–84.
Alfaro, L., Charlton, A., 2009. Intra-industry foreign direct investment. American Economic Review 99, 2096–2119.
Andrews, D., 1994. Asymptotics for semiparametric econometric models via stochastic equicontinuity. Econometrica 62, 43–72.
Angelucci, M., De Giorgi, G., 2009. Indirect effects of an aid program: how do cash transfers affect ineligibles' consumption? American Economic Review 99, 486–508.
Angrist, J., Lavy, V., 2009. The effects of high stakes high school achievement awards: evidence from a randomized trial. American Economic Review 99, 1384–1414.
Ashraf, N., 2009. Spousal control and intra-household decision making: an experimental study in the Philippines. American Economic Review 99, 1245–1277.
Boivin, J., Giannoni, M., Mihov, I., 2009. Sticky prices and monetary policy: evidence from disaggregated US data. American Economic Review 99, 350–384.
Cai, H., Chen, Y., Fang, H., 2009. Observational learning: evidence from a randomized natural field experiment. American Economic Review 99, 864–882.
Chalak, K., White, H., 2011. An extended class of instrumental variables for the estimation of causal effects. Canadian Journal of Economics 44, 1–51.
Chalak, K., White, H., 2012. Causality, conditional independence, and graphical separation in settable systems. Neural Computation 24, 1611–1668.
Chen, Y., Li, S., 2009. Group identity and social preferences. American Economic Review 99, 431–457.
Chetty, R., Looney, A., Kroft, K., 2009. Salience and taxation: theory and evidence. American Economic Review 99, 1145–1177.
Cleveland, W., 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829–836.
Dahl, C., Gonzalez-Rivera, G., 2003. Testing for neglected nonlinearity in regression models based on the theory of random fields. Journal of Econometrics 114, 141–164.
Dawid, A.P., 1979. Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B (Methodological) 41, 1–31.
Dobkin, C., Nicosia, N., 2009. The war on drugs: methamphetamine, public health, and crime. American Economic Review 99, 324–349.
Engle, R., Granger, C.W.J., Rice, J., Weiss, A., 1986. Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association 81, 310–320.
Forbes, S., Lederman, M., 2009. Adaptation and vertical integration in the airline industry. American Economic Review 99, 1831–1849.
Gallant, R., 1982. Unbiased determination of production technologies. Journal of Econometrics 20, 285–323.
Griliches, Z., 1977. Estimating the returns to schooling: some econometric problems. Econometrica 45, 1–22.
Hahn, J., 2004. Functional restriction and efficiency in causal inference. Review of Economics and Statistics 86, 73–76.
Hamilton, J., 2001. A parametric approach to flexible nonlinear inference. Econometrica 69, 537–573.
Hargreaves Heap, S., Zizzo, D., 2009. The value of groups. American Economic Review 99, 295–323.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251–1271.
Heckman, J., Hotz, J., 1989. Choosing among alternative nonexperimental methods for estimating the impact of social programs: the case of manpower training. Journal of the American Statistical Association 84, 862–874.
Heckman, J., Navarro-Lozano, S., 2004. Using matching, instrumental variables, and control functions to estimate economic choice models. Review of Economics and Statistics 86, 30–57.
Hendel, I., Nevo, A., Ortalo-Magné, F., 2009. The relative performance of real estate marketing platforms: MLS versus FSBOMadison.com. American Economic Review 99, 1878–1898.
Imbens, G., 2004. Nonparametric estimation of average treatment effects under exogeneity: a review. Review of Economics and Statistics 86, 4–29.
Lavy, V., 2009. Performance pay and teachers' effort, productivity, and grading ethics. American Economic Review 99, 1979–2011.
Leamer, E., 1983. Let's take the con out of econometrics. American Economic Review 73, 31–43.
Leaver, C., 2009. Bureaucratic minimal squawk behavior: theory and evidence from regulatory agencies. American Economic Review 99, 572–607.
Lee, T., White, H., Granger, C.W.J., 1993. Testing for neglected nonlinearity in time series models: a comparison of neural network methods and alternative tests. Journal of Econometrics 56, 269–290.
Makowsky, M., Stratmann, T., 2009. Political economy at any speed: what determines traffic citations? American Economic Review 99, 509–527.
Mas, A., Moretti, E., 2009. Peers at work. American Economic Review 99, 112–145.
Matsusaka, J., 2009. Direct democracy and public employees. American Economic Review 99, 2227–2246.
Miller, N., 2009. Strategic leniency and cartel enforcement. American Economic Review 99, 750–768.
Oberholzer-Gee, F., Waldfogel, J., 2009. Media markets and localism: does local news en Español boost Hispanic voter turnout? American Economic Review 99, 2120–2128.
Pérez-González, F., 2006. Inherited control and firm performance. American Economic Review 96, 1559–1588.
Ramsey, J.B., 1969. Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society, Series B (Methodological) 31, 350–371.
Reichenbach, H., 1956. The Direction of Time. University of California Press, Berkeley.
Robinson, P.M., 1987. Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form. Econometrica 55, 875–891.
Rosenbaum, P., 1984. The consequences of adjusting for a concomitant variable that has been affected by treatment. Journal of the Royal Statistical Society, Series A 147, 656–666.
Rosenbaum, P., 1987. The role of a second control group in an observational study. Statistical Science 2, 292–306.
Ruppert, D., Wand, M.P., 1994. Multivariate locally weighted least squares regression. Annals of Statistics 22, 1346–1370.
Sialm, C., 2009. Tax changes and asset pricing. American Economic Review 99, 1356–1383.
Spilimbergo, A., 2009. Democracy and foreign education. American Economic Review 99, 528–543.
Stock, J., Watson, M., 2007. Introduction to Econometrics. Addison-Wesley, Boston.
Su, L., White, H., 2010. Testing structural change in partially linear models. Econometric Theory 26, 1761–1806.
Urquiola, M., Verhoogen, E., 2009. Class-size caps, sorting, and the regression-discontinuity design. American Economic Review 99, 179–215.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
White, H., 2006a. Approximate nonlinear forecasting methods. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economics Forecasting. Elsevier, New York, pp. 460–512.
White, H., 2006b. Time-series estimation of the effects of natural experiments. Journal of Econometrics 135, 527–566.
White, H., Chalak, K., 2010. Testing a conditional form of exogeneity. Economics Letters 109, 88–90.
White, H., Chalak, K., 2013. Identification and identification failure for treatment effects using structural systems. Econometric Reviews 32, 273–317.
White, H., Kennedy, P., 2009. Retrospective estimation of causal effects through time. In: Castle, J., Shephard, N. (Eds.), The Methodology and Practice of Econometrics: A Festschrift in Honour of David Hendry. Oxford University Press, Oxford, pp. 59–87.
White, H., Lu, X., 2010. Granger causality and dynamic structural systems. Journal of Financial Econometrics 8, 193–243.
White, H., Lu, X., 2011. Causal diagrams for treatment effect estimation with application to efficient covariate selection. Review of Economics and Statistics 93, 1453–1459.
White, H., Stinchcombe, M., 1991. Adaptive efficient weighted least squares with dependent observations. In: Stahel, W., Weisberg, S. (Eds.), Directions in Robust Statistics and Diagnostics, IMA Volumes in Mathematics and Its Applications. Springer-Verlag, New York, pp. 337–364.
Wooldridge, J., 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge.