Location-Scale and Compensated Effects in Unconditional
Quantile Regressions*
Julian Martinez-Iriarte† Gabriel Montes-Rojas‡ Yixiao Sun§
January 6, 2022
Abstract
This paper proposes an extension of the unconditional quantile regression analysis to (i)
location-scale shifts, and (ii) compensated shifts. The first case is intended to study a counter-
factual policy analysis aimed at increasing not only the mean or location of a covariate but also
its dispersion or scale. The compensated shift refers to a situation where a shift in a covariate
is compensated at a certain rate by another covariate. Not accounting for these possible scale
or compensated effects will result in an incorrect assessment of the potential policy effects
on the quantiles of an outcome variable. More general interventions and compensated shifts
are also considered. The unconditional policy parameters are estimated with simple semi-
parametric estimators, for which asymptotic properties are studied. Monte Carlo simulations
are implemented to study their finite sample performances, and the proposed approach is
applied to a Mincer equation to study the effects of a location-scale shift in education on the
unconditional quantiles of wages.
* For helpful conversations, we thank Javier Alejo, George Bulman and Augusto Nieto-Barthaburu. All errors
remain our own.
† Department of Economics, UC Santa Cruz. E-mail: [email protected]
‡ CONICET and Universidad de Buenos Aires. E-mail: [email protected]
§ Department of Economics, UC San Diego. E-mail: [email protected]
1 Introduction
In many research areas, it is important to assess the distributional effects of covariates on an
outcome variable. Several methods have been implemented in the literature to study this. A
prolific line of research is a combination of conditional mean and quantile regression models
together with micro simulation exercises, as in Autor, Katz, and Kearney (2005), Machado and
Mata (1995), and Melly (2005) (see Fortin, Lemieux, and Firpo (2011) for a review). A more recent
and popular method is the recentered influence function (RIF) regression of Firpo, Fortin, and
Lemieux (2009), which directly estimates the effect of a covariate change on a functional of the
unconditional distribution of the outcome variable. The functional of interest can be the mean,
quantile, or any other aspect of the unconditional distribution.
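Consider, as an example, the unconditional quantile of the outcome variable Y. Let FY be the
unconditional distribution function of Y, and let Qτ[Y] denote the τ-quantile of FY. The unconditional
quantile effect of a marginal shift in a covariate, which moves Y to a counterfactual outcome Yδ, is defined by
$$\Pi_\tau := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta] - Q_\tau[Y]}{\delta}.$$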
Firpo, Fortin, and Lemieux (2009) consider a pure location shift Xδ = X + δ. This shift affects
the entire unconditional distribution of Y = h( X, W, U ), moving it towards a counterfactual
distribution of Yδ = h( Xδ , W, U ). One of the main results in Firpo, Fortin, and Lemieux (2009,
p.958, eq. (6)) is that Πτ can be represented as an average derivative:
Πτ = E [ψ̇x ( X, W )] ,
where
$$\dot{\psi}_x(x, w) = \frac{\partial E[\psi(Y, \tau, F_Y) \mid X = x, W = w]}{\partial x},$$
ψ (y, τ, FY ) = [τ − 1 {y ≤ Qτ [Y ]}] / f Y ( Qτ [Y ]) is the influence function of the quantile functional,
and f Y ( Qτ [Y ]) is the unconditional density of Y evaluated at the τ-quantile Qτ [Y ]. The uncon-
ditional quantile effect Πτ can then be estimated by first running an unconditional quantile
regression (henceforth, UQR), which involves regressing the influence function ψ (Yi , τ, FY ) on
the covariates ( Xi , Wi ) and then taking an average of the partial derivatives of the regression
function with respect to X.
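The two-step procedure just described can be sketched as follows. This is only an illustration, not the semiparametric estimator proposed in this paper: it assumes a linear specification for the regression of the influence function on X (so that the average partial derivative is simply the OLS slope), a Gaussian kernel density estimate with a rule-of-thumb bandwidth, and simulated data from Y = X + U with independent standard normal X and U. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def sample_quantile(values, tau):
    """Empirical tau-quantile: smallest order statistic with ECDF >= tau."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[idx]


def kernel_density_at(values, point):
    """Gaussian kernel density estimate at one point (rule-of-thumb bandwidth)."""
    n = len(values)
    h = 1.06 * stdev(values) * n ** (-0.2)
    std_normal = NormalDist()
    return sum(std_normal.pdf((v - point) / h) for v in values) / (n * h)


def rif_ols_slope(y, x, tau):
    """Regress the quantile influence function on x by OLS and return the slope."""
    q = sample_quantile(y, tau)
    f_q = kernel_density_at(y, q)
    psi = [(tau - (1.0 if yi <= q else 0.0)) / f_q for yi in y]
    x_bar, psi_bar = fmean(x), fmean(psi)
    cov = sum((xi - x_bar) * (pi - psi_bar) for xi, pi in zip(x, psi))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


# Simulated data (assumption): Y = X + U, with X and U independent N(0, 1).
rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [xi + rng.gauss(0, 1) for xi in x]

slope_median = rif_ols_slope(y, x, 0.5)
print(f"estimated location effect at the median: {slope_median:.3f}")
```

In this design a unit location shift in X moves every unconditional quantile of Y by one, so the estimated slope should be close to 1 up to sampling and smoothing error.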
The same method is applicable to other functionals of interest — we only need to replace
ψ (Yi , τ, FY ) by the influence function underlying the functional we care about. This leads to
the general RIF regression of Firpo, Fortin, and Lemieux (2009). The simplicity and
flexibility of the methodology have motivated subsequent research that expands the use of RIF
regressions. On the empirical side, RIF regressions have become a popular method
for analyzing the distributional effects of changes in observed characteristics on outcomes
in areas such as labor economics, income and inequality, health economics,
and public policy. On the theoretical side, the RIF type of regression has been used to study
the effect of a change in a discrete covariate.1 More recent research on UQR and RIF regressions
includes the high-dimensional setting of Sasaki, Ura, and Zhang (2020) and the two-sample
problem of Inoue, Li, and Xu (2021).
This paper extends the UQR and RIF regression in several ways. First, we allow simultaneous
location and scale shifts in a continuous covariate. The main goal is to study a case where a
counterfactual policy analysis aiming at increasing the location or the mean of a covariate might
also affect its dispersion. For example, we may consider Xδ = X/(1 + δ) + δ. We find that in this
case, the marginal effect has a closed-form expression. In order to interpret the scale effect, we
introduce the quantile-standard deviation elasticity: the percentage change in the unconditional
quantiles of the outcome associated with a 1% change in the standard deviation of the target
covariate.
Second, we consider the case of compensated location changes in two covariates. This hap-
pens when a location shift in one covariate induces a location shift in another covariate. For
example, Y = h ( X1 , X2 , W, U ) for two scalar target covariates X1 and X2 , and the policy induces
X1δ = X1 + δ and X2δ = X2 − δ. We show that the compensated effect can be obtained as a linear
combination of individual effects obtained by considering one change at a time.
Third, while we focus mainly on location-scale and compensated location shifts, we consider a
general framework that includes these two types of shifts as special cases. In fact, our framework
allows for any smooth and invertible intervention of the target covariates.
Fourth, we allow the target covariates to be endogenous, and we characterize the asymptotic
bias of the unconditional effect estimator when the endogeneity is not appropriately accounted
for. We eliminate the endogeneity bias using a control function approach.
1 In such a case, we may consider a shift in the probability mass function. The discrete case was initially studied by
Firpo, Fortin, and Lemieux (2009). See Rothe (2012), Martinez-Iriarte (2020), and Martinez-Iriarte and Sun (2021b) for
further studies.
Fifth, as a complement to the existing literature that focuses on changing the marginal distribu-
tion of the target covariates, we consider changing the values of the target covariates directly. An
advantage of our approach is that the changes under consideration are directly implementable.
We note that it may not be easy to induce a desired shift in the marginal distribution, and when
possible, such a shift is often achieved via transforming the target covariates, which is what we
consider here.
Finally, we propose consistent and asymptotically normal semiparametric estimators of the
location-scale effect and the compensated effect. The estimators can be easily implemented in
empirical work using either a probit or logit specification of the conditional distribution func-
tion. We conduct an extensive Monte Carlo study evaluating the finite sample performances of
the location-scale effect estimator and the accuracy of the normal approximation. Simulation
results show that the estimator works reasonably well under different specifications and that the
standard normal distribution provides a good approximation to the finite sample distribution of
a studentized test statistic introduced in this paper.
As potential applications of our proposed approach, consider the following empirical exam-
ples to motivate its use.
Example 1. Effect of increasing education on wage inequality. In a Mincer equation, log wages Y are
modeled as a function of certain observable covariates such as education. A study of the effect of a shift in
education on wage inequality could be implemented using our proposed framework. We can accommodate
a counterfactual policy experiment where there may be not only a general increase in education but also a
change in its dispersion.
Example 2. Trade integration and skill distribution. Gu, Malik, Pozzoli, and Rocha (2019) document
the impact of trade integration on both the mean and the standard deviation of the skill distribution across
municipalities in Denmark. Moreover, as argued by Hanushek and Woessmann (2008), skills are related
to the income distribution. Thus, a quantification of the impact of a scale effect in the skills distribution on
the quantiles of the income distribution appears to be relevant.
Example 3. Financial return and risk. Consider the study of two assets X and W in a portfolio in-
vestment framework with stochastic returns Y. We are interested in how changes in the returns of asset
X affect the distribution of Y through its unconditional quantiles. A typical exercise involves analyzing
changes in the returns (location) and risk (scale) of X. Even if identification fails and the estimates lack a
structural interpretation, the proposed framework can still be used to decompose the relative contribution of each effect.
This could be applied to Value-at-Risk models; see, for instance, Engle and Manganelli (2004).
We illustrate the proposed method with an empirical application related to Example 1: the
effect of changing education on wage inequality, decomposing it into location and scale effects.
Empirical results reveal the contrasting nature of the two effects. The location effects are seen to
be positive and relatively similar across quantiles. On the other hand, the scale effects are highly
heterogeneous and monotonically decreasing across quantiles. Hence, the scale effects can more
than offset the location effects. This shows that not accounting for both shifts may result in a
biased assessment of the policy effects on the quantiles of the outcome variable.
The paper is organized as follows. Section 2 defines and studies the location-scale marginal
effects in one covariate. Section 3 proposes and studies a compensated change in two covari-
ates. Section 4 describes the estimators of the location-scale effect and the compensated effect
and studies their asymptotic properties. Section 5 reports the finite sample performance of the
location-scale effect estimator and the associated tests, and Section 6 presents the empirical ap-
plication. Section 7 concludes. The proofs are in the Appendix. Calculation details for two
theoretical examples are given in the Supplementary Appendix.
A word on notation: we use FY |X (y| x ) and f Y |X (y| x ) to denote the cumulative distribution
function and the probability density function of Y, respectively, conditional on X = x. For a
random variable Z, the unconditional τ-quantile is denoted by Qτ [ Z ], i.e., Pr( Z ≤ Qτ [ Z ]) = τ.
For a pair of random variables Z1 and Z2 , the conditional quantile is denoted by Qτ [ Z1 |z2 ], i.e.,
Pr( Z1 ≤ Qτ [ Z1 |z2 ]| Z2 = z2 ) = τ. We adopt the following notational conventions:
We start with a general structural model Y = h( X, W, U ), where the function h is unknown, and
we only observe ( X, W ) and Y. Here X is univariate and is our target variable. The dimension of
W is left unrestricted, and U collects all unobserved causal factors of Y. Consider the following
location-scale shift of X,
$$X_\delta = \frac{X - \mu}{s(\delta)} + \mu + \ell(\delta). \tag{1}$$
Here, µ is a known parameter, ℓ(δ) is the location shift, and s(δ) > 0 is the scale shift.2 Under (1)
with µ = µX for µX := E(X), we have E[Xδ] = µ + ℓ(δ) and V[Xδ] = s(δ)⁻² V[X].
In this case, ℓ(δ) affects only the location, and s(δ) affects only the scale. When µ ≠ µX, we have
E[Xδ] = µ + ℓ(δ) + s(δ)⁻¹ [E(X) − µ] and V[Xδ] = s(δ)⁻² V[X]. In this case, s(δ) affects both the
location and the scale. We allow for a general µ that includes, for example, µ = 0 and µ = µX as
special cases.
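The moment formulas above can be checked on a small sample. The sketch below is illustrative only: the covariate values, the choice s(δ) = 1 + δ and ℓ(δ) = δ, and all names are assumptions made for the example, with sample moments standing in for population moments.

```python
from statistics import fmean, pvariance


def location_scale_shift(x, delta, mu, s, ell):
    """Apply X_delta = (X - mu) / s(delta) + mu + ell(delta) to each value."""
    return [(xi - mu) / s(delta) + mu + ell(delta) for xi in x]


def s(d):
    return 1.0 + d  # scale function with s(0) = 1


def ell(d):
    return d  # location function with ell(0) = 0


x = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical covariate values
delta = 0.1
mu_x = fmean(x)

# Case mu = mu_X: the mean moves by ell(delta) only.
shifted_at_mean = location_scale_shift(x, delta, mu_x, s, ell)
# Case mu = 0: the scale function also moves the mean.
shifted_at_zero = location_scale_shift(x, delta, 0.0, s, ell)

print(fmean(shifted_at_mean), mu_x + ell(delta))
print(pvariance(shifted_at_mean), pvariance(x) / s(delta) ** 2)
print(fmean(shifted_at_zero), ell(delta) + mu_x / s(delta))
```

Each printed pair should agree up to floating-point error, matching E[Xδ] = µ + ℓ(δ) + s(δ)⁻¹[E(X) − µ] and V[Xδ] = s(δ)⁻² V[X].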
We view s(δ) and ℓ(δ) as functions of the scalar δ, and assume that they are continuously
differentiable. We further assume that s(0) = 1 and ℓ(0) = 0 so that X0 = X. The case studied
by Firpo, Fortin, and Lemieux (2009) amounts to setting s(δ) ≡ 1 and ℓ(δ) = δ; it thus does
not account for the scale effect and is independent of the choice of µ. To include the scale effect,
we could set s(δ) = 1 + δ and ℓ(δ) = δ. A special case of this model is the case with only a scale
shift (i.e., ℓ(δ) = 0) so that Xδ = (X − µ)/s(δ) + µ.
2 µ is given by the policy maker or calibrated. Note that if Qτ[X] is the τ-quantile of X, then
$$Q_\tau[X_\delta] = \frac{Q_\tau[X] - \mu}{s(\delta)} + \mu + \ell(\delta).$$
To allow for a more general policy function that includes the location-scale shift in (1) as a
special case, we consider the intervention:
Xδ = G( X; δ)
for some smooth function G(·; ·) that is invertible in its first argument. We will refer to G (·; ·) as
the policy function. We want to compare the quantiles of
Y = h( X, W, U ) (2)
to the quantiles of
Yδ = h( Xδ , W, U ) = h(G( X; δ), W, U ), (3)
where the distribution of ( X, W, U ) in (3) is held the same as that in (2). To understand the latter
condition, we can consider two parallel worlds: the worlds before and after the intervention.
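For each given δ, let $G^{-1}(x; \delta)$ be the inverse function of G(x; δ) such that $G(G^{-1}(x; \delta); \delta) = x$.
After applying the inverse transform to the target covariate in the post-intervention world, the
distribution of $(G^{-1}(X^\delta; \delta), W^\delta, U^\delta)$ in the post-intervention world is assumed to be the same as
that of (X, W, U) in the pre-intervention world.
The marginal effect of the intervention on the τ-quantile of the outcome is defined as
$$\Pi_\tau := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta] - Q_\tau[Y]}{\delta},$$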
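whenever this limit exists. For the location-scale shift that depends on µ, we write Πτ as $\Pi_\tau^{\mu}$.
Let $x^\delta$ be such that $G(x^\delta; \delta) = x$, and define
$$J(x^\delta; \delta) := \frac{\partial x^\delta}{\partial x} = \left[ \left. \frac{\partial G(x; \delta)}{\partial x} \right|_{x = x^\delta} \right]^{-1}.$$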
Then, the joint probability density functions of the covariate vector before and after the interven-
tion satisfy
f Xδ ,W ( x, w) = J ( x δ ; δ) · f X,W ( x δ , w).
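The change-of-variables relation can be verified in a case where both densities are known in closed form. The sketch below is an illustration under stated assumptions: there is no W, X is normal, the policy is the location-scale shift in (1) with s(δ) = 1 + δ and ℓ(δ) = δ, and all numerical values and names are hypothetical.

```python
from statistics import NormalDist

mu, delta = 1.0, 0.2
s, ell = 1.0 + delta, delta              # s(delta) and ell(delta) at this delta
x_dist = NormalDist(mu=2.0, sigma=1.5)   # hypothetical distribution of X


def inverse_policy(x):
    """x^delta: the pre-intervention value that the policy maps to x."""
    return mu + s * (x - mu - ell)


# Derivative of x^delta with respect to x for the location-scale shift.
jacobian = s

# X_delta = (X - mu) / s + mu + ell is again normal, so its density is known.
x_delta_dist = NormalDist(
    mu=(x_dist.mean - mu) / s + mu + ell,
    sigma=x_dist.stdev / s,
)

grid = [-1.0, 0.5, 2.0, 3.5]
direct = [x_delta_dist.pdf(x) for x in grid]
via_formula = [jacobian * x_dist.pdf(inverse_policy(x)) for x in grid]

for x, a, b in zip(grid, direct, via_formula):
    print(x, a, b)
```

The two columns coincide: the post-intervention density at x equals the Jacobian times the pre-intervention density evaluated at the inverse image of x.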
(i.b) G ( x; δ) is strictly increasing in x for each δ ∈ Nε .
(i.c) G ( x; 0) = x for all x ∈ X .
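(ii) For δ ∈ Nε, the conditional density of U satisfies $f_{U|X_\delta,W}(u|x, w) = f_{U|X,W}(u|x^\delta, w)$, and the
support $\mathcal{U}$ of U given X and W does not depend on (X, W).
(iii.a) x ↦ f X,W (x, w) is continuously differentiable for all w ∈ W and
$$\int_{\mathcal{W}} \int_{\mathcal{X}} \sup_{\delta \in N_\varepsilon} \left| \frac{\partial \left[ J(x^\delta; \delta) f_{X,W}(x^\delta, w) \right]}{\partial \delta} \right| dx\, dw < \infty.$$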
(iv) f X,W ( x, w) is equal to 0 on the boundary of the support of X given W = w for all w ∈ W .
(v) f Y ( Qτ [Y ]) > 0.
Remark 1. Assumption 1(i) imposes some restrictions on the policy function G ( x; δ) . It is reasonable
that G ( x; δ) is strictly increasing in x, as non-monotonic and non-invertible functions do not seem to
be practically relevant. The strictly increasing property implies that J ( x; δ) > 0 for all x ∈ X and
δ ∈ Nε . The condition that G ( x; 0) = x says that there is no intervention when δ = 0, and it implies
that J ( x; 0) = 1 for all x ∈ X . Assumption 1(ii) assumes that how U depends on the covariate vector is
maintained when we induce a change in the covariate vector. Note that Assumption 1(ii) is different from
$f_{U|X_\delta,W}(u|x, w) = f_{U|X,W}(u|x, w)$, which in general cannot hold when U depends on X and W. The
counterfactual model in (3) says that we maintain the structure of the causal system. Assumption 1(ii)
says that we also maintain how the unobservable depends on the observables. As discussed above, we also
implicitly assume that $(G^{-1}(X^\delta; \delta), W^\delta)$ has the same distribution as (X, W). The rest of Assumption 1
Remark 2. Assumption 1 does not assume that U is independent of ( X, W ) . It does not assume that
U is conditionally independent of X given W either. Assumption 2 below will impose identification
assumptions.
The following theorem characterizes the effects of the policy change on the distribution of Yδ
and its quantiles.
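$$\lim_{\delta \to 0} \frac{f_{X_\delta,W}(x, w) - f_{X,W}(x, w)}{\delta} = \frac{\partial}{\partial x} \left[ \kappa(x) f_{X,W}(x, w) \right],$$
where
$$\kappa(x) := \left. \frac{\partial x^\delta}{\partial \delta} \right|_{\delta = 0} = - \left. \frac{\partial G(x; \delta)}{\partial \delta} \right|_{\delta = 0}.$$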
(ii) As δ → 0, we have
where
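$$A_\tau = -E\left[ \frac{\partial E[\psi(Y, \tau, F_Y) \mid X, W]}{\partial X}\, \kappa(X) \right],$$
$$B_\tau = -E\left[ \psi(Y, \tau, F_Y)\, \frac{\partial \ln f_{U|X,W}(U \mid X, W)}{\partial X}\, \kappa(X) \right],$$
and
$$\psi(y, \tau, F_Y) = \frac{\tau - 1(y < Q_\tau[Y])}{f_Y(Q_\tau[Y])}.$$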
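Remark 3. To understand Theorem 1(i), we can write
$$f_{X_\delta,W}(x, w) - f_{X,W}(x, w) = \left[ f_{X_\delta,W}(x, w) - f_{X,W}(x^\delta, w) \right] + \left[ f_{X,W}(x^\delta, w) - f_{X,W}(x, w) \right].$$
It is quite intuitive that the second term is approximately $\delta \cdot \kappa(x) \cdot \partial f_{X,W}(x, w)/\partial x$ when δ is small. For the first
term, we note that Xδ = x if and only if X = $x^\delta$, and so this term reflects the effect from the Jacobian of the
transformation. Indeed, $f_{X_\delta,W}(x, w) - f_{X,W}(x^\delta, w) = \left[ J(x^\delta; \delta) - J(x^\delta; 0) \right] f_{X,W}(x^\delta, w)$ as $J(x^\delta; 0) = 1$.
The first term is then approximately equal to $\delta \cdot f_{X,W}(x, w) \cdot \partial J(x; \delta)/\partial \delta \,|_{\delta = 0}$. But
$$\left. \frac{\partial J(x; \delta)}{\partial \delta} \right|_{\delta = 0} = \left. \frac{\partial}{\partial \delta} \frac{\partial x^\delta}{\partial x} \right|_{\delta = 0} = \frac{\partial}{\partial x} \left. \frac{\partial x^\delta}{\partial \delta} \right|_{\delta = 0} = \frac{\partial \kappa(x)}{\partial x},$$
and hence the first term is approximately $\delta \cdot f_{X,W}(x, w) \cdot \partial \kappa(x)/\partial x$. Combining these two approximations yields
Theorem 1(i).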
effect is the marginal change in the policy function.
Remark 5. Theorem 1(iii) represents the structural parameter Πτ in terms of statistical objects. While the
first term Aτ is identifiable, the second term Bτ , which involves the conditional density of U given X and
W, is not. If we use Âτ, a consistent estimator of Aτ, as an estimator of Πτ, then the second term Bτ is the
asymptotic bias of Âτ. Similar results have been established in Martinez-Iriarte and Sun (2021a) but only
for location changes. Without an identification condition such as the one given in Assumption
2 below, Theorem 1(iii) allows us to bound Bτ and thereby infer the range of the policy
effect, or to conduct a sensitivity analysis similar to that in Martinez-Iriarte (2020).
Remark 6. While the paper focuses on the quantile functional, Theorem 1(iii) is formulated in a general
way. The result holds for any Hadamard differentiable functional and for the mean functional. We only
need to replace ψ (y, τ, FY ) by the influence function of the functional that we are interested in. For
example, for the mean functional, we can replace ψ (y, τ, FY ) by y − E(Y ), and Theorem 1(iii) remains
valid.
To identify Πτ , we make the following independence or conditional independence assump-
tion.
Assumption 2. For δ ∈ Nε , the unobservable U satisfies either f U |X,W (u| x, w) = f U |X,W (u| x δ , w) =
f U (u) or f U |X,W (u| x, w) = f U |X,W (u| x δ , w) = f U |W (u|w).
Under the above assumption, ∂ ln f U |X,W (u| x, w)/∂x = 0 and the second term Bτ in (4) van-
ishes. In this case, Πτ = Aτ and hence is identified.
For the location-scale shift given in (1), we have
$$\kappa(x) = \dot{s}(0)(x - \mu) - \dot{\ell}(0),$$
where $\dot{s}(\delta) = ds(\delta)/d\delta$ and $\dot{\ell}(\delta) = d\ell(\delta)/d\delta$. The corollary below then follows directly from
Theorem 1(iii).
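The expression for κ(x) under the location-scale shift can be checked in two lines by inverting (1) and differentiating with respect to δ. The worked step below uses only the definitions already given (s(0) = 1, ℓ(0) = 0, and κ(x) as the derivative of the inverse image at δ = 0).

```latex
% Invert x = (x^\delta - \mu)/s(\delta) + \mu + \ell(\delta) for x^\delta:
x^{\delta} = \mu + s(\delta)\,\bigl(x - \mu - \ell(\delta)\bigr),
\qquad
\frac{\partial x^{\delta}}{\partial \delta}
  = \dot{s}(\delta)\,\bigl(x - \mu - \ell(\delta)\bigr) - s(\delta)\,\dot{\ell}(\delta).
% Evaluate at \delta = 0 using s(0) = 1 and \ell(0) = 0:
\kappa(x) = \left.\frac{\partial x^{\delta}}{\partial \delta}\right|_{\delta = 0}
  = \dot{s}(0)\,(x - \mu) - \dot{\ell}(0).
```

For a pure location shift (ṡ(0) = 0 and ℓ̇(0) = 1) this gives κ(x) = −1, which does not depend on x or µ.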
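Corollary 1. Let Assumption 1 hold with Assumption 1(ii) strengthened to Assumption 2. Then
$$\Pi_\tau = -E\left[ \frac{\partial E[\psi(Y, \tau, F_Y) \mid X, W]}{\partial X}\, \kappa(X) \right] = \frac{1}{f_Y(Q_\tau[Y])} E\left[ \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid X, W)}{\partial X}\, \kappa(X) \right].$$
For the location and scale shift in (1) with ℓ(0) = 0, s(0) = 1, and s(δ) > 0, we have
$$\Pi_\tau^{\mu} = \Pi_{\tau,L} + \Pi_{\tau,S}^{\mu}, \tag{5}$$
where
$$\Pi_{\tau,L} = -\frac{\dot{\ell}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{W}} \int_{\mathcal{X}} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid x, w)}{\partial x} f_{X,W}(x, w)\, dx\, dw,$$
$$\Pi_{\tau,S}^{\mu} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{W}} \int_{\mathcal{X}} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid x, w)}{\partial x} (x - \mu) f_{X,W}(x, w)\, dx\, dw.$$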
Remark 7. Both conditions in Assumption 2 require that f U |X,W (u| x, w) = f U |X,W (u| x δ , w). This is
related to the assumption in Firpo, Fortin, and Lemieux (2009, pp.955-957), framed as “maintaining the
conditional distribution of Y given X unaffected.” In essence, Firpo, Fortin, and Lemieux (2009) requires
f U |X (u| x ) = f U |X (u| x δ ). When this condition fails, we may still have f U |X,W (u| x, w) = f U |X,W (u| x δ , w).
Such a condition has also been used in Hsu, Lai, and Lieli (2020) and Spini (2021) in a context of
extrapolation to populations with different distributions of the covariates.
After replacing W by W ∗ = (W, Wc ), Corollary 1 continues to hold. To see this, we can write the
structural function as h∗ ( X, W ∗ , U ), but h∗ ( X, W ∗ , U ) = h( X, W, U ). That is, we include the control
variables in the structural function and restrict the structural function to be a constant function of the
control variables. With such a conceptual change, our proof goes through without any change.
Remark 9. The second part of Corollary 1 is specific to the location-scale change. The overall effect $\Pi_\tau^{\mu}$
can be decomposed into the sum of $\Pi_{\tau,L}$ and $\Pi_{\tau,S}^{\mu}$. Here $\Pi_{\tau,L}$ is the location effect, and is the estimand
Remark 10. Corollary 1 shows that the scale effect under a general µ is linearly related to the location
effect and the scale effect under the specific µ = µX:
$$\Pi_{\tau,S}^{\mu} = \Pi_{\tau,S}^{\mu_X} + \tilde{\mu}\, \Pi_{\tau,L}, \tag{6}$$
where
$$\tilde{\mu} = \frac{\dot{s}(0)}{\dot{\ell}(0)} (\mu - \mu_X).$$
The slope µ̃ is proportional to µ − µX and independent of τ. We will refer to $\Pi_{\tau,S}^{\mu_X}$ as the pure scale effect,
as it is not related to the location effect.
In the rest of this section, we focus on the location-scale shift. To better understand the
location and scale effects in Corollary 1, consider the case that X and U are independent and
there is no W. Then
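$$\Pi_{\tau,L} = -\frac{\dot{\ell}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{X}} \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} f_X(x)\, dx,$$
$$\Pi_{\tau,S}^{\mu} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{X}} \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} (x - \mu) f_X(x)\, dx. \tag{7}$$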
Define
$$X_{\tau,F}(x) = \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} = \frac{\partial \Pr(Y \le Q_\tau[Y] \mid X = x)}{\partial x},$$
which measures how Pr(Y ≤ Qτ[Y] | X = x) will change when we induce a small change in x. By
definition,
$$X_{\tau,F}(x) = \lim_{\Delta \to 0} \frac{\Pr(Y \le Q_\tau[Y] \mid X = x + \Delta) - \Pr(Y \le Q_\tau[Y] \mid X = x)}{\Delta}. \tag{8}$$
Intuitively, when x is changed into x + ∆, the value of Y will cross Qτ [Y ] from above for a subset
of individuals, and the value of Y will cross Qτ [Y ] from below for another subset of individuals.
The difference in the fractions of individuals in these two subsets is the numerator of (8). Xτ,F ( x )
is then the limit value of the difference rescaled by the induced change in x.
Note that Xτ,F ( x ) is possibly a nonlinear function of x. For notational simplicity, let Xτ,F =
Xτ,F ( X ). To sign the location effect and the pure scale effect, consider the best linear prediction
of Xτ,F using X − µ X as the predictor:
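$$X_{\tau,F} = c^*_{0\tau} + (X - \mu_X)\, c^*_{1\tau} + e_\tau,$$
where $e_\tau$ is the prediction error, so that $c^*_{0\tau} = E[X_{\tau,F}]$ and $c^*_{1\tau} = \mathrm{cov}(X_{\tau,F}, X)/\mathrm{var}(X)$.
Therefore,
$$\Pi_{\tau,L} = -\dot{\ell}(0) \frac{1}{f_Y(Q_\tau[Y])}\, c^*_{0\tau} \quad \text{and} \quad \Pi_{\tau,S}^{\mu_X} = \dot{s}(0) \frac{\mathrm{var}(X)}{f_Y(Q_\tau[Y])}\, c^*_{1\tau}.$$
The signs of $\Pi_{\tau,L}$ and $\Pi_{\tau,S}^{\mu_X}$ can then be determined from the signs of the best predictive intercept
and slope coefficient.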
To sign the location effect Πτ,L, we can assess whether Pr(Y ≤ Qτ[Y] | X = x) is increasing in
x or not. If ℓ̇(0) > 0 and Pr(Y ≤ Qτ[Y] | X = x) is increasing in x on average, more precisely,
E[Xτ,F] ≥ 0, then Πτ,L ≤ 0. As an example, consider the case that h(x, u) is decreasing in x for
each u. In this case, Pr(Y ≤ Qτ[Y] | X = x) is increasing in x for all x ∈ X, and so Πτ,L ≤ 0 if
ℓ̇(0) > 0.
It is a bit more challenging to sign the pure scale effect $\Pi_{\tau,S}^{\mu_X}$. The best linear predictive
coefficient $c^*_{1\tau}$ depends not only on the functional form of $X_{\tau,F}(x)$ but also on the distribution of
X. We consider two examples below.
Example 4. Normal Location Model. Consider a typical linear model Y = α + Xβ + U, where X and
U are independent N(0, 1). We have $\Pi_{\tau,L} = \dot{\ell}(0)\beta$ and
$$\Pi_{\tau,S}^{\mu_X} = -\dot{s}(0) \frac{\beta^2}{\beta^2 + 1} (Q_\tau[Y] - \alpha) = -\dot{s}(0) \frac{\beta^2}{\sqrt{\beta^2 + 1}}\, Q_\tau[U].$$
See Section S.1 in the Supplementary Appendix for details. While the location effect is constant across τ,
the pure scale effect varies across quantiles and does not depend on the sign of β. The coefficient on the scale
effect (i.e., β²/(β² + 1)) has a “signal-to-noise-ratio” interpretation. Indeed, $\Pi_{\tau,S}^{\mu_X} = -\dot{s}(0) E[X\beta \mid Y = Q_\tau[Y]]$.
See Theorem 2 below. This can be regarded as an inverse prediction problem. Given Y = Qτ[Y],
we want to predict or extract the signal Xβ. The predictive coefficient, given by var(Xβ)/var(Y), is
precisely β²/(β² + 1).
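The closed forms in Example 4 can be checked directly, because the counterfactual outcome stays normal under a location-scale shift. The sketch below assumes µ = µX = 0, s(δ) = 1 + δ, and ℓ(δ) = δ (so ṡ(0) = ℓ̇(0) = 1); the parameter values and names are hypothetical, and the marginal effect is approximated by a central finite difference in δ.

```python
from statistics import NormalDist

alpha, beta, tau = 0.5, 2.0, 0.9
z_tau = NormalDist().inv_cdf(tau)  # Q_tau[U] for U ~ N(0, 1)


def quantile_y_delta(delta):
    """tau-quantile of Y_delta = alpha + X_delta * beta + U.

    With X_delta = X / (1 + delta) + delta and X ~ N(0, 1),
    Y_delta ~ N(alpha + beta * delta, beta^2 / (1 + delta)^2 + 1).
    """
    sd = (beta ** 2 / (1.0 + delta) ** 2 + 1.0) ** 0.5
    return alpha + beta * delta + sd * z_tau


# Central finite difference approximating the limit that defines the effect.
h = 1e-6
numerical_effect = (quantile_y_delta(h) - quantile_y_delta(-h)) / (2 * h)

location_effect = beta                                            # l'(0) * beta
pure_scale_effect = -beta ** 2 / (beta ** 2 + 1) ** 0.5 * z_tau   # with s'(0) = 1
closed_form = location_effect + pure_scale_effect

print(numerical_effect, closed_form)
```

With these values the pure scale effect is negative at τ = 0.9 and larger in magnitude than the location effect, so the total effect is negative even though β > 0.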
The next example represents the pure scale effect under increasingly restrictive assumptions,
culminating with a generalization of Example 4. Details are given in Section S.2 in the Supple-
mentary Appendix.
Example 5. Normal Covariates. Consider the linear model Y = α + Xβ + U where X and U are
independent. Suppose we only assume that X ∼ N(µX, σX²). We can use Stein’s lemma (see, for example,
Casella and Berger (2001, pp.124-125) and references therein) to gain some insight on the pure scale effect.
Stein’s lemma states that for a differentiable function m such that E[|m′(X)|] < ∞, E[m(X)(X − µX)] =
σX² E[m′(X)] whenever X ∼ N(µX, σX²). Taking m(x) = Xτ,F(x) and using Stein’s lemma, we can
express the pure scale effect as
$$\Pi_{\tau,S}^{\mu_X} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} E\left[ X_{\tau,F}(X)(X - \mu_X) \right] = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])}\, \sigma_X^2\, E\left[ \frac{\partial X_{\tau,F}(X)}{\partial X} \right] = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])}\, \sigma_X^2\, E\left[ \frac{\partial^2 F_{Y|X}(Q_\tau[Y] \mid X)}{\partial X^2} \right].$$
Therefore, when X is normal and ṡ(0) > 0, the pure scale effect is non-negative (non-positive) if $F_{Y|X}(Q_\tau[Y] \mid x)$
is a convex (concave) function of x. It is interesting to see that the location effect depends on the first-order
derivative of $F_{Y|X}(Q_\tau[Y] \mid x)$ while the pure scale effect depends on its second-order derivative.
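The Stein's lemma step can be checked numerically. The sketch below is an illustration under added assumptions: U is standard normal (so that the conditional distribution function is a normal cdf), expectations over X are approximated by a fine Riemann sum, and the parameter values and names are hypothetical. It compares the two sides of Stein's lemma and the resulting pure scale effect (divided by ṡ(0)) with the closed form −σX²β² Qτ[U]/√(σX²β² + 1) for the normal case.

```python
from statistics import NormalDist

alpha, beta, mu_x, sigma_x, tau = 0.5, -1.5, 2.0, 0.8, 0.25
std_normal = NormalDist()
x_dist = NormalDist(mu_x, sigma_x)

sd_y = (sigma_x ** 2 * beta ** 2 + 1.0) ** 0.5
z_tau = std_normal.inv_cdf(tau)               # Q_tau[U]
q_y = alpha + mu_x * beta + sd_y * z_tau      # Q_tau[Y]
f_y_at_q = std_normal.pdf(z_tau) / sd_y       # f_Y(Q_tau[Y])


def slope(x):
    """First derivative in x of Phi(Q_tau[Y] - alpha - x * beta)."""
    return -beta * std_normal.pdf(q_y - alpha - x * beta)


def slope_derivative(x):
    """Second derivative in x of Phi(Q_tau[Y] - alpha - x * beta)."""
    c = q_y - alpha - x * beta
    return -beta ** 2 * c * std_normal.pdf(c)


def expectation(g, n=20001, width=10.0):
    """Approximate E[g(X)] by a Riemann sum over mu_x +/- width * sigma_x."""
    lo = mu_x - width * sigma_x
    step = 2 * width * sigma_x / (n - 1)
    total = 0.0
    for i in range(n):
        xi = lo + i * step
        total += g(xi) * x_dist.pdf(xi)
    return total * step


lhs = expectation(lambda x: slope(x) * (x - mu_x))   # E[m(X)(X - mu_X)]
rhs = sigma_x ** 2 * expectation(slope_derivative)   # sigma_X^2 E[m'(X)]

pure_scale_over_sdot = lhs / f_y_at_q
closed_form = -sigma_x ** 2 * beta ** 2 / sd_y * z_tau

print(lhs, rhs)
print(pure_scale_over_sdot, closed_form)
```

The agreement does not depend on the sign of β or on the value of µX, in line with the discussion of the pure scale effect in this example.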
If $F_{Y|X}(Q_\tau[Y] \mid x) = G_\tau(a_\tau + b_\tau x)$ for some link function Gτ and parameters aτ and bτ, then
$$X_{\tau,F}(x) = b_\tau\, \dot{G}_\tau(a_\tau + b_\tau x)$$
and
$$\frac{\partial X_{\tau,F}(x)}{\partial x} = b_\tau^2\, \ddot{G}_\tau(a_\tau + b_\tau x),$$
where Ġτ and G̈τ are the first-order and second-order derivatives of Gτ. Hence
$$\Pi_{\tau,S}^{\mu_X} = \dot{s}(0) \frac{\sigma_X^2 b_\tau^2}{f_Y(Q_\tau[Y])}\, E\left[ \ddot{G}_\tau(a_\tau + b_\tau X) \right].$$
For example, when U is also normal so that Gτ is the standard normal cumulative distribution function
(cdf), we obtain a generalization of Example 4:
$$\Pi_{\tau,S}^{\mu_X} = -\dot{s}(0) \frac{\sigma_X^2 \beta^2}{\sqrt{\sigma_X^2 \beta^2 + 1}}\, Q_\tau[U].$$
The above result reduces to that of Example 4 when we set σX² = 1. Regardless of whether σX² = 1, $\Pi_{\tau,S}^{\mu_X}$
does not depend on µX. Thus, the mean of X does not play a role in the pure scale effect.
To understand why, in Example 5, the pure scale effect does not depend on µ X and sign ( β),
we write
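$$\mathrm{cov}\left( \frac{\partial F_{Y|X}(Q_\tau[Y] \mid X)}{\partial x}, X \right) = -\beta\, \mathrm{cov}\left( f_U(Q_\tau[Y] - \alpha - X\beta), X \right) = \mathrm{cov}\left( f_U(Q_\tau[U^\circ] + X^\circ), X^\circ \right),$$
where $X^\circ := -\beta(X - \mu_X)$ and $U^\circ := U + \beta(X - \mu_X)$, so that $Q_\tau[Y] = \alpha + \mu_X \beta + Q_\tau[U^\circ]$. Moreover,
$$f_Y(Q_\tau[Y]) = f_Y(Q_\tau[\alpha + X\beta + U]) = f_Y(Q_\tau[\alpha + (X - \mu_X)\beta + U + \mu_X \beta])$$
$$= f_Y(\alpha + Q_\tau[U^\circ + \mu_X \beta]) = f_Y(\alpha + Q_\tau[U^\circ] + \mu_X \beta) = f_{U^\circ}(Q_\tau[U^\circ]).$$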
If X − µ X is symmetrically distributed around zero, then the distribution of U ◦ does not depend
on µ X or sign( β) . Hence, f Y ( Qτ [Y ]) does not depend on µ X or sign( β) .
Since both the numerator and the denominator of $\Pi_{\tau,S}^{\mu_X}$ are invariant to µX and sign(β), we
obtain the following proposition immediately.
Consider a situation where we only care about the scale effect, that is, we set ℓ(δ) ≡ 0. Then, we
have Xδ = µ + (X − µ)/s(δ) and $\sigma_{X_\delta} = \sigma_X / s(\delta)$. To interpret $\Pi_{\tau,S}^{\mu}$, we assume Qτ[Yδ] ≠ 0 and
When s(0) = 1 and ṡ(0) ≠ 0, the elasticity at δ = 0 is
$$E_{\tau,\delta=0} = -\frac{\Pi_{\tau,S}^{\mu}}{\dot{s}(0)\, Q_\tau[Y]}. \tag{9}$$
Therefore, a 1% decrease in the standard deviation of X results in a $\Pi_{\tau,S}^{\mu} / (\dot{s}(0)\, Q_\tau[Y])$ % change
in the τ-quantile of Y.
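$$E_{\tau,\delta=0} = \frac{\sigma_X^2 \beta^2}{\sqrt{\sigma_X^2 \beta^2 + 1}} \frac{Q_\tau[U]}{Q_\tau[Y]} = \frac{\sigma_X^2 \beta^2\, Q_\tau[U]}{\sqrt{\sigma_X^2 \beta^2 + 1} \left( \alpha + \mu_X \beta + \sqrt{\sigma_X^2 \beta^2 + 1}\, Q_\tau[U] \right)}.$$
So, $E_{\tau,\delta=0}$ is positive if Qτ[U] and Qτ[Y] have the same sign. When α = 0 and µX = 0, $E_{\tau,\delta=0}$ =
σX²β²/(σX²β² + 1), which is positive for all quantile levels.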
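The elasticity formula above is simple to evaluate. The sketch below assumes the linear model with X ∼ N(µX, σX²) and U ∼ N(0, 1) independent, as in the normal case of Example 5; the function name and the parameter values are hypothetical.

```python
from statistics import NormalDist


def scale_elasticity(tau, alpha, beta, mu_x, sigma_x):
    """Quantile-standard deviation elasticity at delta = 0.

    Assumes Y = alpha + X * beta + U with X ~ N(mu_x, sigma_x^2)
    independent of U ~ N(0, 1).
    """
    z_tau = NormalDist().inv_cdf(tau)              # Q_tau[U]
    sd_y = (sigma_x ** 2 * beta ** 2 + 1.0) ** 0.5
    q_y = alpha + mu_x * beta + sd_y * z_tau       # Q_tau[Y]
    return sigma_x ** 2 * beta ** 2 * z_tau / (sd_y * q_y)


# alpha = 0 and mu_X = 0: the elasticity reduces to the signal share of var(Y).
centered = scale_elasticity(tau=0.9, alpha=0.0, beta=2.0, mu_x=0.0, sigma_x=1.5)

# Q_tau[U] < 0 but Q_tau[Y] > 0: the elasticity is negative.
shifted = scale_elasticity(tau=0.1, alpha=5.0, beta=2.0, mu_x=0.0, sigma_x=1.5)

print(centered, shifted)
```

In the centered case σX²β² = 9, so the elasticity equals 9/10: a 1% increase in the standard deviation of X raises the 0.9-quantile of Y by about 0.9%.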
$$\tilde{\Pi}_{\tau,S}^{\mu} := \lim_{\delta \to 0} \frac{Q_\tau[\log Y_\delta] - Q_\tau[\log Y]}{\delta}.$$
$$\tilde{\Pi}_{\tau,S}^{\mu} = \frac{1}{Q_\tau[Y]}\, \Pi_{\tau,S}^{\mu}.$$
Comparing this last expression to (9), we obtain that the elasticity at δ = 0 is
$$E_{\tau,\delta=0} = -\frac{\tilde{\Pi}_{\tau,S}^{\mu}}{\dot{s}(0)}.$$
This says that a 1% increase in the standard deviation of X results in a $-\tilde{\Pi}_{\tau,S}^{\mu} / \dot{s}(0)$ % change in
the τ-quantile of Y. When ṡ(0) = −1, the scale effect $\tilde{\Pi}_{\tau,S}^{\mu}$ (based on log(Y)) can be interpreted
In order to explore the relationship between conditional quantile regression coefficients and un-
conditional effects, we introduce a “conditional” version of the unconditional effect given in
Corollary 1. For a given ( x, w) ∈ X ⊗ W , this is defined as
$$\Pi_\tau(x, w) := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta \mid x, w] - Q_\tau[Y \mid x, w]}{\delta},$$
whenever this limit exists. In the above, Qτ [Y | x, w] is the conditional quantile of Y given X = x
and W = w, and Qτ [Yδ | x, w] is the conditional quantile of Yδ = h (G ( X, δ) , W, U ) given X = x
and W = w. Under essentially the same assumptions as in Theorem 1 and Corollary 1, we can
obtain
$$\Pi_\tau(x, w) = \frac{1}{f_{Y|X,W}(Q_\tau[Y \mid x, w] \mid x, w)} \left. \frac{\partial F_{Y|X,W}(Q_\tau[Y \mid x, w] \mid z, w)}{\partial z} \right|_{z = x} \kappa(x).$$
This function matches the unconditional quantile at quantile level τ with the conditional quantile
(conditioning on X = x, W = w) at the quantile level ξ τ ( x, w).
Theorem 2. If Πτ ( x, w) exists for all τ in the support of ξ τ ( X, W ), then the unconditional marginal
effect can be represented as
$$\Pi_\tau = E\left[ \frac{f_{Y|X,W}(Q_\tau[Y] \mid X, W)}{f_Y(Q_\tau[Y])}\, \Pi_{\xi_\tau(X,W)}(X, W) \right],$$
The first representation is the counterpart of Proposition 1(ii) of Firpo, Fortin, and Lemieux
(2009). The second representation appears to be new. It does not rely on any shape or dimension
restriction on the structural model Y = h( X, W, U ).
Because FY |X,W ( Qτ [Y | x, w]| x, w) = τ, by implicitly differentiating, we have
and
$$\Pi_\tau = -E\left\{ \left[ \left. \frac{\partial Q_{\xi_\tau(X,W)}[Y \mid z, W]}{\partial z} \right|_{z = X} \right] \kappa(X) \;\middle|\; Y = Q_\tau[Y] \right\}. \tag{12}$$
Example 6. Consider the location-scale shift with no covariate W. Suppose that the conditional quantiles
are linear: Qτ[Y | X = x] = aτ + x bτ. Then
$$\left. \frac{\partial Q_{\xi_\tau(X)}[Y \mid z]}{\partial z} \right|_{z = X} = b_{\xi_\tau(X)}$$
and so
$$\Pi_\tau^{\mu_X} = \underbrace{\dot{\ell}(0)\, E\left[ b_{\xi_\tau(X)} \mid Y = Q_\tau[Y] \right]}_{= \Pi_{\tau,L}} \; \underbrace{- \dot{s}(0)\, E\left[ b_{\xi_\tau(X)} (X - \mu_X) \mid Y = Q_\tau[Y] \right]}_{= \Pi_{\tau,S}^{\mu_X}}.$$
In Example 4, we have bτ = β for every τ, and ṡ(0) = 1. The pure scale effect is then
$$\Pi_{\tau,S}^{\mu_X} = -\beta E\left[ (X - \mu_X) \mid Y = Q_\tau[Y] \right] = -\beta \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)} (Q_\tau[Y] - \alpha) = -\frac{\beta^2}{\beta^2 + 1} (Q_\tau[Y] - \alpha),$$
3 Compensated Marginal Effects
In this section, we consider the case where a location shift in one covariate is compensated by
a location shift in another covariate. In a model Y = h( X1 , X2 , W, U ) where both X1 and X2 are
univariate, we consider the limiting effect of the simultaneous location shift X1δ = X1 + ℓ1(δ) and
X2δ = X2 + ℓ2(δ) for some smooth functions ℓ1(δ) and ℓ2(δ) satisfying ℓ1(0) = ℓ2(0) = 0. In the
simplest case, we have ℓ1(δ) = δ and ℓ2(δ) = −pδ for some p ≥ 0. Here, p can be interpreted as
the “relative price” of X1 in terms of X2 . An example is the following: a policy targeted towards
increasing the level of education can, at the same time, reduce the experience of workers. As
with the case of the scale shift, neglecting this possible side effect of the policy might lead to an
inconsistent estimator of its effect.
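The simplest compensated shift can be illustrated with a simulated example. The sketch below assumes a linear structural function Y = β1 X1 + β2 X2 + U, which is an assumption made only for this illustration, as are the coefficient values, the covariate distributions, and all names. Under linearity every unconditional quantile of Y moves by exactly (β1 − pβ2)δ, whereas ignoring the compensating change in X2 would suggest β1 δ.

```python
import random


def empirical_quantile(values, tau):
    """Order statistic at position floor(tau * (n - 1))."""
    ordered = sorted(values)
    return ordered[int(tau * (len(ordered) - 1))]


rng = random.Random(1)
n = 5000
beta1, beta2 = 0.10, 0.04   # hypothetical coefficients
p, delta = 1.0, 0.5         # hypothetical relative price and shift size

x1 = [rng.gauss(12, 2) for _ in range(n)]   # e.g. education
x2 = [rng.gauss(20, 5) for _ in range(n)]   # e.g. experience
u = [rng.gauss(0, 1) for _ in range(n)]

y = [beta1 * a + beta2 * b + e for a, b, e in zip(x1, x2, u)]
# Compensated shift: X1 rises by delta while X2 falls by p * delta.
y_delta = [beta1 * (a + delta) + beta2 * (b - p * delta) + e
           for a, b, e in zip(x1, x2, u)]

taus = [0.1, 0.5, 0.9]
shifts = [empirical_quantile(y_delta, t) - empirical_quantile(y, t) for t in taus]

naive = beta1 * delta                       # ignores the offsetting change in X2
compensated = (beta1 - p * beta2) * delta   # sum of the two location effects

print(shifts, compensated, naive)
```

With a nonlinear structural function the two location effects generally vary across quantiles, but the compensated effect remains their sum, as stated in the Introduction.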
With the above motivation, we now consider a more general setting that allows for a general
change in X1 and X2. We induce a change in X = (X1, X2)′ so that it becomes Xδ = (X1δ, X2δ)′.
We do not specify the exact form of the change, but we use the simultaneous location shift as a
working example. We assume that
$$X_\delta = G(X; \delta) = (G_1(X; \delta), G_2(X; \delta))'$$
for a smooth and invertible bivariate function G = (G1, G2)′. We allow X1δ and X2δ to depend on
both X1 and X2. A special case is that G1(X; δ) is a function of X1 only and G2(X; δ) is a function
of X2 only.
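In this general setting, the original outcome is given by
$$Y = h(X_1, X_2, W, U) = h(X, W, U),$$
and the counterfactual outcome is given by
$$Y_\delta = h(X_{1\delta}, X_{2\delta}, W, U) = h(X_\delta, W, U).$$
The distribution of (X, W, U) is kept the same in the above two equations. We want to identify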
the following quantity
$$\Pi_{\tau,C} := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta] - Q_\tau[Y]}{\delta}, \tag{14}$$
whenever this limit exists. We refer to Πτ,C as the compensated marginal effect for the τ-quantile.
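Let x = (x1, x2)′. As before, we define $x^\delta = (x_1^\delta, x_2^\delta)'$ such that $G(x^\delta; \delta) = x$. By construction,
$$J(x^\delta; \delta) := \frac{\partial x^\delta}{\partial x'} = \left[ \left. \frac{\partial G(x; \delta)}{\partial x'} \right|_{x = x^\delta} \right]^{-1},$$
where the second equality follows from differentiating $G(x^\delta; \delta) = x$ with respect to x and then
solving for $\partial x^\delta / \partial x'$.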
Assumption 3. (i) For some ε > 0, each component function of G(x; δ) is continuously differentiable on
X ⊗ Nε.
(i.b) G(x; δ) is an invertible function of x for each δ ∈ Nε.
(i.c) G(x; 0) = x for all x ∈ X.
(ii) For δ ∈ Nε, the conditional density of U satisfies $f_{U|X_\delta,W}(u|x, w) = f_{U|X,W}(u|x^\delta, w)$ and the
support $\mathcal{U}$ of U conditional on X and W does not depend on (X, W).
(iii) Assumption 1(iii.a) holds with $J(x^\delta; \delta)$ replaced by $\det[J(x^\delta; \delta)]$ and Assumption 1(iii.b) holds.
(iv) f X,W ( x, w) is equal to 0 on the boundary of the support of X1 given W = w and X2 = x2 for all
w ∈ W and x2 ∈ X2 , the support of X2 , and symmetrically, f X,W ( x, w) is equal to 0 on the boundary of
the support of X2 given W = w and X1 = x1 for all w ∈ W and x1 ∈ X1 , the support of X1 .
(v) f Y ( Qτ [Y ]) > 0.
Assumption 3 is a modified version of Assumption 1 adapted to the case with two target
covariates. Under Assumption 3(i.c), we have $J(x;0) = I_2$, the 2 × 2 identity matrix. Since
$\det[J(x;0)] = 1$, by continuity, $\det[J(x^\delta;\delta)] > 0$ when δ is small enough. Hence, there is no
need to take the absolute value of $\det[J(x^\delta;\delta)]$ when converting the pdf of (X, W) into that of
$(X_\delta, W)$.
Define the local change function as
\[ \kappa(x) = (\kappa_1(x), \kappa_2(x))' := \left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0}. \]
where, as before,
\[ \psi(y,\tau,F_Y) = \frac{\tau - 1(y < Q_\tau[Y])}{f_Y(Q_\tau[Y])}. \]
The theorem takes the same form as Theorem 1. Under the assumption that $X_{j\delta}$ is a function
of $X_j$ only for j = 1 and 2, $\kappa_j(x)$ depends on $x_j$ only, and the effect from changing $X_1$ into $X_{1\delta}$
and that from changing $X_2$ into $X_{2\delta}$ are additively separable.
\[ \Pi_{\tau,C} = -E\left[\frac{\partial E[\psi(Y,\tau,F_Y)\,|\,X,W]}{\partial X'}\,\kappa(X)\right]. \]
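For the case of a simultaneous location shift $X_{1\delta} = X_1 + \ell_1(\delta)$ and $X_{2\delta} = X_2 + \ell_2(\delta)$, we have
\[ \kappa(x) = -\big(\dot\ell_1(0),\, \dot\ell_2(0)\big)', \]
and so
\[
\begin{aligned}
\Pi_{\tau,C} ={}& -\frac{\dot\ell_1(0)}{f_Y(Q_\tau[Y])}\int_{\mathcal W}\int_{\mathcal X}\frac{\partial F_{Y|X,W}(Q_\tau[Y]\,|\,x,w)}{\partial x_1}\, f_{X,W}(x,w)\,dx\,dw \\
& -\frac{\dot\ell_2(0)}{f_Y(Q_\tau[Y])}\int_{\mathcal W}\int_{\mathcal X}\frac{\partial F_{Y|X,W}(Q_\tau[Y]\,|\,x,w)}{\partial x_2}\, f_{X,W}(x,w)\,dx\,dw.
\end{aligned}
\tag{16}
\]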
Corollary 2 shows that the compensated effect from the simultaneous location shift is a linear
combination of two location effects: one where the target variable is X1 and the other where the
target variable is X2 . Thus, we can write: Πτ,C = Πτ,L,1 + Πτ,L,2 . This additive result follows
because we have two unrelated location changes whose effects are, in essence, captured by the
sum of two partial derivatives. This is convenient since it immediately allows us to obtain the
bias if we omit the possible simultaneous change in a covariate different from the target variable.
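As a worked illustration of this decomposition, consider the simplest compensated shift ℓ1(δ) = δ and ℓ2(δ) = −pδ. The shorthand $\mathrm{UQPE}_{\tau,j}$ below is introduced only for this illustration (it is not notation from the paper) and denotes the location effect of a unit shift in $X_j$ alone.

```latex
% Shorthand introduced for this illustration: unit-shift location effect of X_j
\mathrm{UQPE}_{\tau,j} := -\frac{1}{f_Y(Q_\tau[Y])}\int_{\mathcal W}\int_{\mathcal X}
  \frac{\partial F_{Y|X,W}(Q_\tau[Y]\,|\,x,w)}{\partial x_j}\, f_{X,W}(x,w)\,dx\,dw,
  \qquad j = 1, 2.

% With \ell_1(\delta)=\delta and \ell_2(\delta)=-p\delta: \dot\ell_1(0)=1 and \dot\ell_2(0)=-p, so (16) gives
\Pi_{\tau,L,1} = \mathrm{UQPE}_{\tau,1}, \qquad
\Pi_{\tau,L,2} = -p\,\mathrm{UQPE}_{\tau,2}, \qquad
\Pi_{\tau,C} = \mathrm{UQPE}_{\tau,1} - p\,\mathrm{UQPE}_{\tau,2}.
```

In this case, reporting only $\mathrm{UQPE}_{\tau,1}$ and ignoring the compensating change in $X_2$ misses the term $-p\,\mathrm{UQPE}_{\tau,2}$.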
Corollary 1 in Firpo, Fortin, and Lemieux (2009) considers the case of a simultaneous location
shift in k covariates, and delivers a k × 1 vector of marginal effects. Theorem 3 and Corollary 2
complement such a result by showing how to interpret a linear combination of the entries of the
vector of marginal effects. Furthermore, Theorem 3 and Corollary 2 allow for the intervention
of a target covariate to depend on another target covariate. Here we consider only two target
covariates for ease of exposition. Our results can be easily extended to the case with more than
two target covariates.
Our framework can accommodate more complicated policy interventions, such as simultane-
ous location-scale shifts in two target variables. In a potential application, a compensated change
may substitute the mean of one target variable with the variance of another target variable. Given
the generality of G ( x; δ), Corollary 2 is general enough to accommodate various compensating
policies.
In this section, we focus on the estimation of $\Pi^{\mu}_{\tau}$ given in (5). The estimator involves several
preliminary steps. Firstly, for a given quantile, we need to estimate $Q_\tau[Y]$. This is given by
\[ \hat q_\tau = \arg\min_{q}\sum_{i=1}^{n}\left(\tau - 1\{Y_i \le q\}\right)(Y_i - q). \tag{17} \]
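The minimization in (17) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' Matlab code; the function names `check_loss` and `sample_quantile` and the simulated data are assumptions made for the example. It uses the fact that the ⌈nτ⌉-th order statistic is a minimizer of the check loss.

```python
import math
import random

def check_loss(q, ys, tau):
    """Objective in (17): sum_i (tau - 1{y_i <= q}) * (y_i - q)."""
    return sum((tau - (1.0 if y <= q else 0.0)) * (y - q) for y in ys)

def sample_quantile(ys, tau):
    """Order-statistic solution: the ceil(n * tau)-th smallest observation."""
    ordered = sorted(ys)
    k = max(1, math.ceil(len(ordered) * tau))
    return ordered[k - 1]

# Hypothetical data, used only to exercise the two functions.
random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(501)]
tau = 0.25

q_hat = sample_quantile(ys, tau)
# The objective is piecewise linear and convex, so its minimum over the real
# line is attained at a data point; compare against the best data point.
best_loss = min(check_loss(y, ys, tau) for y in ys)
```

Comparing `check_loss(q_hat, ys, tau)` with `best_loss` confirms that the order statistic attains the minimum of the objective on this sample.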
Next, we need to estimate the density of Y evaluated at $Q_\tau[Y]$. This can be estimated by
\[ \hat f_Y(\hat q_\tau) = \frac{1}{n}\sum_{i=1}^{n} K_h(Y_i - \hat q_\tau), \tag{18} \]
where $K_h(u) = h^{-1}K(h^{-1}u)$ for a given kernel K and a bandwidth h. For the average derivative
of the conditional cdf, we propose either a logit model as in Firpo, Fortin, and Lemieux (2009) or
a probit model. We model:
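where G(·) is either the cdf of a logistic random variable (logit) or a standard normal random
variable (probit). Let $Z_i = (X_i', W_i')'$ and $\theta_\tau = (\alpha_\tau', \beta_\tau')'$. We estimate $\theta_\tau$ by the maximum likelihood
estimator:
\[
\begin{aligned}
\hat\theta_\tau := (\hat\alpha_\tau, \hat\beta_\tau')' &= \arg\max_{\theta\in\Theta}\sum_{i=1}^{n} l_i(\theta;\hat q_\tau) \\
&= \arg\max_{\theta\in\Theta}\sum_{i=1}^{n}\Big\{ 1\{Y_i \le \hat q_\tau\}\log G(Z_i'\theta) + 1\{Y_i > \hat q_\tau\}\log\big[1 - G(Z_i'\theta)\big] \Big\},
\end{aligned}
\tag{20}
\]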
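where Θ is a compact parameter space that contains $\theta_\tau$ as an interior point. The estimator of $\Pi^{\mu}_{\tau}$
is then
\[ \hat\Pi^{\mu}_{\tau} = \hat\Pi_{\tau,L} + \hat\Pi^{\mu}_{\tau,S}, \]
where
\[ \hat\Pi_{\tau,L} = -\frac{\dot\ell(0)}{\hat f_Y(\hat q_\tau)}\,\frac{1}{n}\sum_{i=1}^{n} g(Z_i'\hat\theta_\tau)\,\hat\alpha_\tau, \tag{21} \]
\[ \hat\Pi^{\mu}_{\tau,S} = \frac{\dot s(0)}{\hat f_Y(\hat q_\tau)}\,\frac{1}{n}\sum_{i=1}^{n} g(Z_i'\hat\theta_\tau)\,\hat\alpha_\tau\,(X_i - \mu). \tag{22} \]
In the above, g is the derivative of G, that is, the logistic density or the standard normal density.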
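The steps (17), (18), (20), (21), and (22) can be strung together as in the following sketch. It is an illustration under stated assumptions, not the authors' implementation: it uses a probit link, a single covariate X, and an intercept as the only control (so the fitted index is c + aX); the probit is fitted by a hand-written Fisher scoring loop; the bandwidth is the 1.06 σ̂ n^{-1/4} rule used in the simulations; and the simulated design Y = X + U is only there to exercise the code. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev

STD = NormalDist()  # standard normal: G = STD.cdf, g = STD.pdf

def fit_probit(d, x, iters=30):
    """Fisher scoring for P(d = 1 | x) = Phi(c + a * x); returns (a, c)."""
    a, c = 0.0, 0.0
    for _ in range(iters):
        ga = gc = haa = hac = hcc = 0.0
        for di, xi in zip(d, x):
            idx = c + a * xi
            p = min(max(STD.cdf(idx), 1e-10), 1.0 - 1e-10)
            dens = STD.pdf(idx)
            s = (di - p) * dens / (p * (1.0 - p))  # score weight
            w = dens * dens / (p * (1.0 - p))      # expected-information weight
            ga += s * xi
            gc += s
            haa += w * xi * xi
            hac += w * xi
            hcc += w
        det = haa * hcc - hac * hac
        a += (hcc * ga - hac * gc) / det           # solve the 2x2 system by hand
        c += (haa * gc - hac * ga) / det
    return a, c

def location_scale_effects(y, x, tau, mu, l_dot=1.0, s_dot=1.0):
    n = len(y)
    q_hat = sorted(y)[max(1, math.ceil(n * tau)) - 1]               # (17)
    h = 1.06 * stdev(y) * n ** (-0.25)                              # bandwidth
    f_hat = sum(STD.pdf((yi - q_hat) / h) for yi in y) / (n * h)    # (18)
    d = [1.0 if yi <= q_hat else 0.0 for yi in y]
    a_hat, c_hat = fit_probit(d, x)                                 # (20)
    g = [STD.pdf(c_hat + a_hat * xi) for xi in x]
    pi_l = -l_dot / f_hat * sum(gi * a_hat for gi in g) / n         # (21)
    pi_s = s_dot / f_hat * sum(gi * a_hat * (xi - mu) for gi, xi in zip(g, x)) / n  # (22)
    return pi_l, pi_s

# Hypothetical simulated sample: Y = X + U with X, U independent N(0, 1).
random.seed(42)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [xi + random.gauss(0.0, 1.0) for xi in x]
pi_loc, pi_scale = location_scale_effects(y, x, tau=0.75, mu=0.0)
```

In this design the location estimate should be close to 1 and the scale estimate should be negative at τ = 0.75, up to sampling error and the smoothing bias of the kernel density estimate.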
In order to establish the asymptotic distribution of $\hat\Pi^{\mu}_{\tau}$, we need the following three sets of
assumptions.
Assumption 5. Logit/Probit. For G either the cdf of a logistic or a standard normal random variable,
we have
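(ii) For
\[ H_i(\theta;q) = \frac{\partial^2 l_i(\theta;q)}{\partial\theta\,\partial\theta'}, \]
the Hessian of observation i, the following holds
\[ \sup_{(\theta,q)\in\mathcal N}\left\| \frac{1}{n}\sum_{i=1}^{n} H_i(\theta;q) - E[H_i(\theta;q)] \right\| \overset{p}{\to} 0, \]
\[ \frac{1}{n}\sum_{i=1}^{n} s_i(\theta_\tau;\hat q_\tau) - E[s_i(\theta_\tau;q)]\big|_{q=\hat q_\tau} = \frac{1}{n}\sum_{i=1}^{n} s_i(\theta_\tau;Q_\tau[Y]) + o_p(n^{-1/2}), \]
(iv) For $\tilde X_i = (1, X_i)'$ and $\tilde Z_i = (1, Z_i')'$, the following uniform law of large numbers holds:
\[ \sup_{\theta\in\mathcal N_\theta}\left\| \frac{1}{n}\sum_{i=1}^{n}\dot g(Z_i'\theta)\tilde X_i\tilde Z_i' - E\big[\dot g(Z_i'\theta)\tilde X_i\tilde Z_i'\big] \right\| \overset{p}{\to} 0, \]
exists.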
Assumption 6. Density.
(i) The kernel function K(·) satisfies (i) $\int_{-\infty}^{\infty} K(u)\,du = 1$, (ii) $\int_{-\infty}^{\infty} u^2 K(u)\,du < \infty$, and (iii) $K(u) = K(-u)$,
and it is twice differentiable with Lipschitz continuous second-order derivative $K''(u)$
satisfying (i) $\int_{-\infty}^{\infty} K''(u)\,u\,du < \infty$ and (ii) there exist positive constants $C_1$ and $C_2$ such that
$|K''(u_1) - K''(u_2)| \le C_2 |u_1 - u_2|^2$ for $|u_1 - u_2| \ge C_1$.
\[ \hat q_\tau - Q_\tau[Y] = \frac{1}{n}\sum_{i=1}^{n}\frac{\tau - 1\{Y_i \le Q_\tau[Y]\}}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2}) = \frac{1}{n}\sum_{i=1}^{n}\psi(Y_i,\tau,F_Y) + o_p(n^{-1/2}). \]
See, for example, Serfling (1980). Assumption 5 is mostly necessary to deal with the preliminary
estimator q̂τ that enters the likelihood in (20). Assumption 6 is taken from Martinez-Iriarte and
Sun (2021b).
The following lemma contains the influence function for the maximum likelihood estimator
θ̂τ .
\[ \hat\theta_\tau - \theta_\tau = -H^{-1}\frac{1}{n}\sum_{i=1}^{n} s_i(\theta_\tau;Q_\tau[Y]) - H^{-1}H_Q\,\frac{1}{n}\sum_{i=1}^{n}\psi(Y_i,\tau,F_Y) + o_p(n^{-1/2}). \]
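Theorem 4. Under Assumptions 4, 5, and 6, the estimators given in (21) and (22) satisfy
\[
\begin{pmatrix} \hat\Pi_{\tau,L} \\ \hat\Pi^{\mu}_{\tau,S} \end{pmatrix}
- \begin{pmatrix} \Pi_{\tau,L} \\ \Pi^{\mu}_{\tau,S} \end{pmatrix}
= \frac{1}{n}\sum_{i=1}^{n}\Phi_{i,\tau} + O(h^2) + o_p(n^{-1/2}) + o_p(n^{-1/2}h^{-1/2}),
\]
where
\[
\begin{aligned}
\Phi_{i,\tau} ={}& \frac{1}{f_Y(Q_\tau[Y])}\, D_\mu\Big\{ g(Z_i'\theta_\tau)\alpha_\tau\tilde X_i - E\big[g(Z_i'\theta_\tau)\alpha_\tau\tilde X_i\big] \Big\} \\
& - \frac{1}{f_Y(Q_\tau[Y])}\, D_\mu M H^{-1} s_i(\theta_\tau;Q_\tau[Y]) \\
& - \left[ \begin{pmatrix} \Pi_{\tau,L} \\ \Pi^{\mu}_{\tau,S} \end{pmatrix}\frac{\dot f_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])}\, D_\mu M H^{-1} H_Q \right]\psi(Y_i,\tau,F_Y) \\
& - \begin{pmatrix} \Pi_{\tau,L} \\ \Pi^{\mu}_{\tau,S} \end{pmatrix}\frac{1}{f_Y(Q_\tau[Y])}\Big\{ K_h(Y_i - Q_\tau[Y]) - E K_h(Y_i - Q_\tau[Y]) \Big\},
\end{aligned}
\]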
Theorem 4 establishes the contribution from each estimation step. In particular, the last term
in $n^{-1}\sum_{i=1}^{n}\Phi_{i,\tau}$ is the contribution from estimating the density of Y non-parametrically. This
term converges at a non-parametric rate, which is slower than the rate of the other terms. As a result, the
asymptotic distribution of the location-scale effect estimator is determined by the last term in
$n^{-1}\sum_{i=1}^{n}\Phi_{i,\tau}$. However, we do not recommend dropping all other terms. Instead, we write the
asymptotic normality result in the form
\[
\left[ \frac{1}{n^2}\sum_{i=1}^{n}\hat\Phi_{i,\tau}\hat\Phi_{i,\tau}' \right]^{-1/2}
\left[ \begin{pmatrix} \hat\Pi_{\tau,L} \\ \hat\Pi^{\mu}_{\tau,S} \end{pmatrix} - \begin{pmatrix} \Pi_{\tau,L} \\ \Pi^{\mu}_{\tau,S} \end{pmatrix} \right]
\overset{d}{\to} N(0, I_2)
\tag{23}
\]
as $n \uparrow \infty$, $nh^3 \uparrow \infty$, and $nh^5 \downarrow 0$, where $\hat\Phi_{i,\tau}$ is a plug-in estimator of $\Phi_{i,\tau}$. In particular,
\[ \left[ n^{-2}\sum_{i=1}^{n}(l_1'\hat\Phi_{i,\tau})^2 \right]^{-1/2}\left( \hat\Pi_{\tau,L} - \Pi_{\tau,L} \right) \overset{d}{\to} N(0,1), \]
\[ \left[ n^{-2}\sum_{i=1}^{n}(l_2'\hat\Phi_{i,\tau})^2 \right]^{-1/2}\left( \hat\Pi^{\mu}_{\tau,S} - \Pi^{\mu}_{\tau,S} \right) \overset{d}{\to} N(0,1), \tag{24} \]
where $l_1 = (1, 0)'$ and $l_2 = (0, 1)'$. The above results hold under some additional but standard
regularity conditions such as the nonsingularity of the probability limit of $n^{-2}\sum_{i=1}^{n}\hat\Phi_{i,\tau}\hat\Phi_{i,\tau}'$. Inferences
based on these results account for the estimation errors from all estimation steps and
are more reliable in finite samples. This is supported by simulation evidence not reported here.
On the other hand, if we parametrize the density of Y and estimate it at the parametric $\sqrt{n}$-rate,
then the last term in $n^{-1}\sum_{i=1}^{n}\Phi_{i,\tau}$ will take a different form and will be of the same order as the
other terms. In this case, the location-scale effect estimator is $\sqrt{n}$-asymptotically normal, and all
the terms in Theorem 4 will contribute to the asymptotic variance. With an obvious modification
of the last term in $\Phi_{i,\tau}$, the asymptotic normality can be presented in the same way as in (23).
Let " #
0
∂FY |X,W ( Qτ [Y ]| X, W )
Γτ,S
µ
= Dµ,S E ( X − µ)
∂X
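be the numerator of $\Pi^{\mu}_{\tau,S}$. Then the scale effect $\Pi^{\mu}_{\tau,S}$ is zero if and only if $\Gamma^{\mu}_{\tau,S} = 0$. To test the null
hypothesis $H_0: \Pi^{\mu}_{\tau,S} = 0$, we can equivalently test the null hypothesis $H_0: \Gamma^{\mu}_{\tau,S} = 0$. Unlike $\Pi^{\mu}_{\tau,S}$,
$\Gamma^{\mu}_{\tau,S}$ can be estimated at the parametric rate even if $f_Y(\cdot)$ is not parametrically specified. More
specifically, it can be estimated by
\[ \hat\Gamma^{\mu}_{\tau,S} := D_{\mu,S}'\,\frac{1}{n}\sum_{i=1}^{n} g(Z_i'\hat\theta_\tau)\,\hat\alpha_\tau\,\tilde X_i, \]
where $D_{\mu,S} = (-\mu, 1)'$ upon setting $\dot s(0) = 1$ without loss of generality.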
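Under the assumptions of Theorem 4, we can show that
\[ \hat\Gamma^{\mu}_{\tau,S} - \Gamma^{\mu}_{\tau,S} = D_{\mu,S}'\,\frac{1}{n}\sum_{i=1}^{n}\Phi^{\Gamma}_{i,\tau} + o_p\!\left(\frac{1}{\sqrt{n}}\right), \]
where
\[
\begin{aligned}
\Phi^{\Gamma}_{i,\tau} ={}& g(Z_i'\theta_\tau)\alpha_\tau\tilde X_i - E\big[g(Z_i'\theta_\tau)\alpha_\tau\tilde X_i\big] \\
& - M H^{-1} s_i(\theta_\tau;Q_\tau[Y]) - M H^{-1} H_Q\,\psi(Y_i,\tau,F_Y).
\end{aligned}
\]
Define
\[ V_\tau = \lim_{n\to\infty}\frac{1}{n^2}\sum_{i=1}^{n} E\big(D_{\mu,S}'\Phi^{\Gamma}_{i,\tau}\big)^2. \]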
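Under some regularity conditions such as $V_\tau > 0$, we have $V_\tau^{-1/2}(\hat\Gamma^{\mu}_{\tau,S} - \Gamma^{\mu}_{\tau,S}) \overset{d}{\to} N(0,1)$. To test
the null hypothesis, we use the statistic
\[ t^{\mu}_{\tau,S} := \frac{\hat\Gamma^{\mu}_{\tau,S}}{\sqrt{\hat V_\tau}} \quad\text{for}\quad \hat V_\tau = \frac{1}{n^2}\sum_{i=1}^{n}\big(D_{\mu,S}'\hat\Phi^{\Gamma}_{i,\tau}\big)^2, \]
where
\[
\hat\Phi^{\Gamma}_{i,\tau} = g(Z_i'\hat\theta_\tau)\hat\alpha_\tau\tilde X_i - \frac{1}{n}\sum_{j=1}^{n} g(Z_j'\hat\theta_\tau)\hat\alpha_\tau\tilde X_j - \hat M\hat H^{-1}s_i(\hat\theta_\tau;\hat q_\tau) - \hat M\hat H^{-1}\hat H_Q\,\hat\psi(Y_i,\tau,F_Y).
\tag{25}
\]
In the above, $\hat\psi(Y_i,\tau,F_Y) = [\tau - 1\{Y_i \le \hat q_\tau\}]/\hat f_Y(\hat q_\tau)$ and the score $s_i(\hat\theta_\tau;\hat q_\tau)$ is obtained by evaluating
the expression given in (A.7) at $\theta = \hat\theta_\tau$ and $q = \hat q_\tau$. $\hat M$, $\hat H$, and $\hat H_Q$ are the sample versions
of M, H, and $H_Q$, respectively. Details are given in the proof of the corollary below.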
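Corollary 3. Let the assumptions of Theorem 4 hold. Assume that $V_\tau^{-1/2}(\hat\Gamma^{\mu}_{\tau,S} - \Gamma^{\mu}_{\tau,S}) \overset{d}{\to} N(0,1)$ for
some $V_\tau > 0$ and $\hat V_\tau / V_\tau \overset{p}{\to} 1$. Then, under the null hypothesis $H_0: \Pi^{\mu}_{\tau,S} = 0$,
\[ t^{\mu}_{\tau,S} \overset{d}{\to} N(0,1). \]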
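The studentization behind this t-test can be sketched in a few lines. The sketch assumes that the scalar values $D_{\mu,S}'\hat\Phi^{\Gamma}_{i,\tau}$ have already been computed for each observation; the function name and the toy inputs are hypothetical. As a stand-in for those values, the example uses the influence values of a sample mean, for which the same formula gives a familiar t-ratio.

```python
import math

def studentized_statistic(gamma_hat, influence_values):
    """Return (t, V_hat) with V_hat = n^{-2} * sum_i phi_i^2 and t = gamma_hat / sqrt(V_hat)."""
    n = len(influence_values)
    v_hat = sum(phi * phi for phi in influence_values) / n ** 2
    return gamma_hat / math.sqrt(v_hat), v_hat

# Toy stand-in (not the paper's estimator): for a sample mean, the estimated
# influence value of observation i is x_i - mean(x).
xs = [0.3, -1.2, 0.8, 2.1, -0.4, 1.5, 0.9, -0.7]
x_bar = sum(xs) / len(xs)
t_stat, v_hat = studentized_statistic(x_bar, [xi - x_bar for xi in xs])

# Two-sided test at the 5% level based on the N(0, 1) approximation.
reject = abs(t_stat) > 1.96
```

With the actual influence-function estimates in place of the toy values, the same two lines give $\hat V_\tau$ and the test statistic.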
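In this section, we focus on the estimation of $\Pi_{\tau,C}$ given in (16). We use the same estimators of the
quantile, the density of Y, and the parameters in the probit/logit model. We only need to make
some minor notational changes. As before $\theta_\tau = (\alpha_\tau', \beta_\tau')'$, $\hat\theta_\tau = (\hat\alpha_\tau', \hat\beta_\tau')'$ and $Z_i = (X_i', W_i')'$ but
now $\alpha_\tau = (\alpha_{\tau,1}, \alpha_{\tau,2})'$, $\hat\alpha_\tau = (\hat\alpha_{\tau,1}, \hat\alpha_{\tau,2})'$ and $X_i = (X_{1i}', X_{2i}')'$. As in the case with the location-scale
effect, we estimate $\Pi_{\tau,C}$ by
\[ \hat\Pi_{\tau,C} = \hat\Pi_{\tau,L,1} + \hat\Pi_{\tau,L,2} \]
where
\[ \hat\Pi_{\tau,L,1} = -\frac{\dot\ell_1(0)}{\hat f_Y(\hat q_\tau)}\,\frac{1}{n}\sum_{i=1}^{n} g(Z_i'\hat\theta_\tau)\,\hat\alpha_{\tau,1}, \tag{26} \]
\[ \hat\Pi_{\tau,L,2} = -\frac{\dot\ell_2(0)}{\hat f_Y(\hat q_\tau)}\,\frac{1}{n}\sum_{i=1}^{n} g(Z_i'\hat\theta_\tau)\,\hat\alpha_{\tau,2}. \tag{27} \]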
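Assumption 7. Logit/Probit II. Assumption 5 holds with (iv) replaced by the following:
\[ \sup_{\theta\in\mathcal N_\theta}\left\| \frac{1}{n}\sum_{i=1}^{n}\dot g(Z_i'\theta)Z_i' - E\big[\dot g(Z_i'\theta)Z_i'\big] \right\| \overset{p}{\to} 0, \]
\[ \sup_{\theta\in\mathcal N_\theta}\left| \frac{1}{n}\sum_{i=1}^{n}\dot g(Z_i'\theta) - E\big[\dot g(Z_i'\theta)\big] \right| \overset{p}{\to} 0, \]
where $\mathcal N_\theta$ is a neighborhood of $\theta_\tau$ and
\[ M_L = \Big( E\big[\dot g(Z_i'\theta_\tau)\alpha_\tau X_i' + g(Z_i'\theta_\tau)\big],\; E\big[\dot g(Z_i'\theta_\tau)\alpha_\tau W_i'\big] \Big) \]
exists.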
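Theorem 5. Under Assumptions 4, 6, and 7, the estimators given in (26) and (27) satisfy
\[
\begin{pmatrix} \hat\Pi_{\tau,L,1} \\ \hat\Pi_{\tau,L,2} \end{pmatrix}
- \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}
= \frac{1}{n}\sum_{i=1}^{n}\Phi^{L}_{i,\tau} + O(h^2) + o_p(n^{-1/2}) + o_p(n^{-1/2}h^{-1/2}),
\]
where
\[
\begin{aligned}
\Phi^{L}_{i,\tau} ={}& \frac{1}{f_Y(Q_\tau[Y])}\, D_L\Big\{ g(Z_i'\theta_\tau)\alpha_\tau - E\big[g(Z_i'\theta_\tau)\alpha_\tau\big] \Big\} \\
& - \frac{1}{f_Y(Q_\tau[Y])}\, D_L M_L H^{-1} s_i(\theta_\tau;Q_\tau[Y]) \\
& - \left[ \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}\frac{\dot f_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])}\, D_L M_L H^{-1} H_Q \right]\psi(Y_i,\tau,F_Y) \\
& - \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}\frac{1}{f_Y(Q_\tau[Y])}\Big\{ K_h(Y_i - Q_\tau[Y]) - E K_h(Y_i - Q_\tau[Y]) \Big\}
\end{aligned}
\]
and
\[ D_L = \begin{pmatrix} -\dot\ell_1(0) & 0 \\ 0 & -\dot\ell_2(0) \end{pmatrix}. \]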
For the asymptotic normality, the discussions after Theorem 4 are still applicable.
In the special case that ℓ1(δ) = δ and ℓ2(δ) = −pδ, we have ℓ̇1(0) = 1 and ℓ̇2(0) = −p, so it suffices to change DL to diag(−1, p).
It is possible that p, the relative price of X1 in terms of X2, has to be estimated by p̂ based on an
independent sample. In that case, the estimator of the compensated effect would be
If the sample size ñ of the independent sample for estimating p is much larger than n (i.e.,
ñ/n → ∞), then the expansion in Theorem 5 still holds.
Y = α + Xβ + U,
where $X \sim N(\mu_X, \sigma_X^2)$ and $U \sim N(0, 1)$. We set α = 0 and $\dot s(0) = \dot\ell(0) = 1$. Then, from the results
in Examples 4 and 5, the true location effect is $\Pi_{\tau,L} = \beta$, and the true scale effect is
\[ \Pi^{\mu_X}_{\tau,S} = -\frac{\sigma_X^2\beta^2}{\sqrt{\sigma_X^2\beta^2 + 1}}\, Q_\tau[U]. \]
We consider quantiles τ ∈ {0.10, 0.25, 0.50, 0.75, 0.90} and sample sizes n = 500 and n = 1000.
The number of simulations is set to 10,000 for each experiment.
We implement our estimators in Matlab. The unconditional quantile estimator in equation
(17) is easily computed as an order statistic. The density function is estimated as a kernel density
estimator as in equation (18) using a standard normal kernel. For the bandwidth choice in
the kernel density estimation, we use a modified version of Silverman’s rule of thumb. More
specifically, since we require $nh^3 \uparrow \infty$ and $nh^5 \downarrow 0$ as $n \uparrow \infty$, we take $h = 1.06\,\hat\sigma_Y\, n^{-1/4}$, where $\hat\sigma_Y$ is
the sample standard deviation of Y.
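The bandwidth rule and the kernel density estimator in (18) can be sketched as follows. This is a Python illustration using the standard library rather than the Matlab code used for the simulations; the function names and the simulated sample are assumptions made for the example.

```python
import math
import random
import statistics

def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def bandwidth(ys):
    """Modified rule of thumb from the text: h = 1.06 * sigma_hat * n^(-1/4)."""
    return 1.06 * statistics.stdev(ys) * len(ys) ** (-0.25)

def kde_at(point, ys, h):
    """Kernel density estimate (1/n) * sum_i K_h(y_i - point), with K_h(u) = K(u/h)/h."""
    return sum(gaussian_kernel((y - point) / h) for y in ys) / (len(ys) * h)

# Hypothetical sample from N(0, 1), whose true density at 0 is about 0.399.
random.seed(1)
ys = [random.gauss(0.0, 1.0) for _ in range(1000)]
h = bandwidth(ys)
f_hat = kde_at(0.0, ys, h)
```

With the exponent −1/4, $nh^3$ grows like $n^{1/4}$ and $nh^5$ shrinks like $n^{-1/4}$, which is why this rate satisfies both bandwidth conditions.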
In this subsection, we consider the biases, variances, and mean-squared errors of the proposed
location and scale effects estimators. For each effect estimator, we consider either a probit or a
logit specification for the conditional cdf FY |X ( Qτ [Y ]| X ). Under our data generating process, the
probit for FY |X ( Qτ [Y ]| X ) is correctly specified while the logit is misspecified.
The bias, variance, and mean-squared error are reported in Table 1 when µ X = 0, β = 1 and
σX2 = 1 so that the true location effect is 1 for any τ and the true scale effect is −0.707Qτ [U ]. To
save space, simulation results for other values of β and σX2 are omitted.
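The closed-form scale effect can be checked numerically in this Gaussian design. The sketch below assumes one particular scale intervention that reproduces the sign of the stated formula, namely $X_\delta = \mu_X + (X - \mu_X)/(1+\delta)$, so that a positive δ shrinks the dispersion of X; this specific form is an assumption made for the illustration and is not taken from the text. Under it, $Y_\delta = \beta X_\delta + U$ is normal, so its quantile is available exactly and can be differentiated numerically at δ = 0.

```python
from statistics import NormalDist

def quantile_after_scale_shift(tau, delta, beta=1.0, mu_x=0.0, sigma_x=1.0):
    """tau-quantile of Y_delta = beta * X_delta + U under the assumed intervention
    X_delta = mu_x + (X - mu_x) / (1 + delta), with U ~ N(0, 1) independent of X."""
    sd_y = (beta ** 2 * sigma_x ** 2 / (1.0 + delta) ** 2 + 1.0) ** 0.5
    return NormalDist(beta * mu_x, sd_y).inv_cdf(tau)

def closed_form_scale_effect(tau, beta=1.0, sigma_x=1.0):
    """-(sigma_x^2 beta^2 / sqrt(sigma_x^2 beta^2 + 1)) * Q_tau[U] for U ~ N(0, 1)."""
    b2s2 = beta ** 2 * sigma_x ** 2
    return -b2s2 / (b2s2 + 1.0) ** 0.5 * NormalDist().inv_cdf(tau)

eps = 1e-6
results = {}
for tau in (0.10, 0.25, 0.50, 0.75, 0.90):
    # Central finite difference of the quantile with respect to delta at delta = 0.
    numeric = (quantile_after_scale_shift(tau, eps)
               - quantile_after_scale_shift(tau, -eps)) / (2.0 * eps)
    results[tau] = (numeric, closed_form_scale_effect(tau))
```

For β = 1 and $\sigma_X^2 = 1$ the coefficient is $1/\sqrt{2} \approx 0.707$, so the effect is positive below the median, zero at the median, and negative above it.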
Table 1 shows that the effect estimator based on the probit specification outperforms that
based on the logit one. This is consistent with the correct specification of probit. For each
estimator, the bias decreases as the sample size n increases. The variance also decreases as the
sample size n increases, and as a result, the MSE also becomes smaller when the sample size
grows. For our purposes, the scale-effect estimator performs well. For non-central quantiles,
the difference in the scale-effect estimates under the probit and logit specifications is in general
larger than the difference in the location-effect estimates. For central quantiles, the probit and
logit specifications lead to more or less the same estimates for both the scale effect and the
location effect.
In this subsection, we investigate the finite sample accuracy of the normal approximation given
in (24). Using the same data generating process as in the previous subsection and employing the
Table 1: The biases, variances, and mean-squared errors of the location and scale effects estima-
tors with β = 1 and σX2 = 1.
We plot each distribution and compare it with the standard normal distribution. We consider
β ∈ {0.25, 0.50, 0.75, 1} and the same values τ as in the previous subsection. Simulation results
for the two sample sizes n = 500, and n = 1000 are qualitatively similar, and we report only the
case when n = 1000 here. Figures 1–4 report the (simulated) finite sample distributions when
σX2 = 1 and n = 1000 for some selected values of β and τ together with a standard normal
density that is superimposed on each figure. It is clear from these figures that the standard
normal distribution provides an accurate approximation to the distribution of the studentized
test statistic for both the location and scale effects.
Table 2 reports the empirical coverage of 95% confidence intervals for the location and scale
effects. The empirical coverage is close to the nominal coverage in all cases. This is consistent
with Figures 1–4. We may then conclude that the normal approximation can be reliably used for
[Figure 1 plot: four panels, each overlaying the simulated ("exact") density on the standard normal ("normal") density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 1: Finite sample exact distribution of the studentized location effect statistic when β =
0.25, σX2 = 1, and n = 1000.
[Figure 2 plot: four panels, each overlaying the simulated ("exact") density on the standard normal ("normal") density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 2: Finite sample exact distribution of the studentized location effect statistic when β =
0.75, σX2 = 1, and n = 1000.
[Figure 3 plot: four panels, each overlaying the simulated ("exact") density on the standard normal ("normal") density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 3: Finite sample exact distribution of the studentized scale effect statistic when β = 0.25,
σX2 = 1, and n = 1000.
[Figure 4 plot: four panels, each overlaying the simulated ("exact") density on the standard normal ("normal") density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 4: Finite sample exact distribution of the studentized scale effect statistic when β = 0.75,
σX2 = 1, and n = 1000.
Table 2: Empirical coverage of 95% confidence intervals for the location and scale effects when
σX2 = 1.
To investigate the power of the t-test proposed in Corollary 3, we simulate the following model:
Y = α + Xβ + U,
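where
\[ \begin{pmatrix} X \\ U \end{pmatrix} \sim N\left( \begin{pmatrix} 1 \\ 0 \end{pmatrix},\; \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right). \]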
Here we set α = 0, µ X = 1 and ṡ(0) = 1. When β = 0, X is excluded from the outcome equation
and thus the scale effect is 0. The null hypothesis of a zero scale effect corresponds to the case
that β = 0. The power of the test is obtained by varying β around 0 in a grid from −0.4 to 0.4
with an increment of 0.01.
Figure 5 graphs the size-adjusted power of the t-test for different quantile levels when n = 500
and when n = 1000. The power is calculated using the probit specification of FY |X ( Qτ [Y ]| X ). The
size adjustment is based on the empirical critical value such that the test rejects the null 5% of the
time. Figure 5 shows that the power increases as β deviates more from its null value of zero, and
that for a given nonzero value of β, the power increases with the sample size. Results not reported
here show that the test has a quite accurate size in that the empirical rejection probability under
the null is close to 5%, the nominal level of the test.
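The size adjustment described above can be sketched as follows. The sketch assumes a two-sided test, takes the empirical critical value as the 95th percentile of the absolute simulated statistics under the null, and uses hypothetical normal draws in place of simulated t-statistics; the function name is also an assumption.

```python
import math
import random

def size_adjusted_power(null_stats, alt_stats, level=0.05):
    """Empirical critical value from the null draws, then the rejection rate under the alternative."""
    abs_null = sorted(abs(t) for t in null_stats)
    k = math.ceil((1.0 - level) * len(abs_null))
    critical_value = abs_null[k - 1]          # empirical (1 - level) quantile of |t|
    rejections = sum(1 for t in alt_stats if abs(t) > critical_value)
    return critical_value, rejections / len(alt_stats)

# Hypothetical stand-ins for simulated t-statistics under the null and an alternative.
random.seed(7)
null_stats = [random.gauss(0.0, 1.0) for _ in range(10000)]
alt_stats = [random.gauss(2.0, 1.0) for _ in range(10000)]

crit, power = size_adjusted_power(null_stats, alt_stats)
null_rejection_rate = sum(1 for t in null_stats if abs(t) > crit) / len(null_stats)
```

By construction the test rejects in about 5% of the null replications, so differences in `power` across designs are not driven by size distortions.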
[Figure 5 plot: two panels of power curves; horizontal axis ticks from −0.3 to 0.3, vertical axis from 0 to 1.]
6 Empirical application
In order to illustrate the proposed approach, we use a household labor survey from Wooldridge
(2002) that can be accessed online for replication.3 The idea is to evaluate the effects of education
on the quantile of the unconditional distribution of log wage. In this application, Y = lwage,
which is log hourly wage, and X = educ, which is years of education. The controls are: W =
[exper tenure nonwhite female], where exper is years of working experience, tenure is years with
current employer, nonwhite is a dummy that equals 1 if the individual is non-white, and female
is a dummy that equals 1 if the individual is female. We assume that Assumption 2 holds for
this choice of W.
While the main goal is to study the scale effect, we also present results for the location effect.
For the mean of years of education µ X , we let µ X = 12.29 based on the Barro-Lee Data on
Educational Attainment.4 We set µ = µ X = 12.29 to study the location and scale effects. In
a similar fashion to the Monte Carlo analysis, we consider τ ∈ {0.10, 0.25, 0.50, 0.75, 0.90}. The
sample size for the household labor survey is n = 526, which is comparable to n = 500 in the
simulation exercises. We compute the standard errors using the approximation in (24).
The most interesting results in Table 3 appear in the unconditional scale effects. As discussed
in Section 2.2, the scale effects can be interpreted as percentage changes of the unconditional
quantiles. Consider the scale effect for τ = 0.10. Both the probit and logit specifications suggest
an effect of about .045. Then, using the quantile-standard deviation elasticity, a 1% decrease in the
standard deviation of education would produce a positive effect of .045% on the unconditional
quantile at the quantile level τ = 0.10. Given that the sample standard deviation of educ is 2.77,
3 See http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.des and http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.dta
for the data in the Stata data file format.
4 The dataset is available from https://databank.worldbank.org/reports.aspx?source=EducationStatistics.
We use the series “Barro-Lee: Average years of total schooling, age 25+, total” for the US between 1970-2010 and find
that the average years of schooling is 12.29.
Table 3: Effects of location-scale shifts in education on the unconditional quantiles of log-wage.
[Figure 6 plot: horizontal axis τ from 0.1 to 0.9, vertical axis from −0.4 to 0.2; legend: scale, location.]
Figure 6: Point and interval estimates of location and scale effects of education on the uncondi-
tional quantiles of log-wage based on the probit specification: Πτ,S (solid red) and Πτ,L (dashed
blue).
the 1% decrease is approximately a change in the standard deviation from 2.77 to 2.74. Consider
now the scale effect for τ = 0.50. In this case, both probit and logit specifications provide a
statistically insignificant effect (at the 5% level). Compare this with the results of Examples 4
and 5 where in the linear model Y = α + Xβ + U, the scale effect is proportional to Qτ [U ].
Thus, Π̂0.50,S ≈ 0 is consistent with a linear model and U symmetric around 0. Finally, consider
the scale effect for τ = 0.90, again using both probit and logit specifications. In this case, the
effects are negative, suggesting a 1% decrease in the standard deviation would reduce the upper
τ = 0.90 quantile by .20% (probit) and .23% (logit). Overall this analysis shows that the scale
effects are monotonically decreasing in τ. This can be seen in Figure 6 that plots, for a finer grid
of τ,5 the probit estimates for both the location (dashed blue) and scale (solid red) effects.
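The arithmetic behind this reading of the scale effects is small enough to spell out. The numbers below are the ones quoted in the text (probit point estimates at τ = 0.10 and τ = 0.90, and the sample standard deviation of educ); the variable names are hypothetical, and the calculation treats each estimate as an elasticity with respect to a 1% decrease in the standard deviation, as the text does.

```python
sd_educ = 2.77                       # sample standard deviation of educ (from the text)
decrease_pct = 1.0                   # size of the decrease in the standard deviation, in percent
sd_after = sd_educ * (1.0 - decrease_pct / 100.0)

# Probit scale-effect estimates quoted in the text, keyed by quantile level.
scale_effect = {0.10: 0.045, 0.90: -0.20}

# Implied percentage change in each unconditional quantile for the given decrease.
pct_change = {tau: effect * decrease_pct for tau, effect in scale_effect.items()}
```

Rounded to two decimals, `sd_after` reproduces the move from 2.77 to 2.74 mentioned in the text, and the signs of `pct_change` show the lower quantile rising while the upper quantile falls.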
How can this be interpreted? The location effects suggest that one more year of education
benefits the upper parts of the unconditional distribution of wages more than the lower parts.
The scale effects suggest the contrary: reducing the overall dispersion of education would
increase the lower quantiles of wages but reduce the upper ones.
7 Conclusion
This paper has provided a general procedure to analyze the distributional impact of changes
in covariates on an outcome variable. The standard unconditional quantile regression analysis
focuses on a particular impact coming from a pure location shift. We study a more general
location-scale model and show how to additively decompose the total effect into a location effect
and a scale effect. They can be separately analyzed and estimated. To complement the existing
results, we focus on how to define and estimate a change in the scale of a covariate. Additionally,
we consider the case of compensated location changes in different covariates. We show how this
can be obtained from the usual vector-valued unconditional quantile regressions. More generally,
we have provided a framework to study the unconditional policy effects generated by a smooth
and invertible intervention of one or more target variables.
References
Autor, D. H., L. S. Katz, and M. S. Kearney (2005): “Rising wage inequality: The role of
composition and prices,” NBER Working Paper 11628.
Casella, G., and R. L. Berger (2001): Statistical Inference, 2nd. edition. Duxbury, Pacific Grove,
CA.
Engle, R., and S. Manganelli (2004): “Conditional Autoregressive Value at Risk by Regression
Quantiles,” Journal of Business and Economic Statistics, 22(4), 367–381.
5 For Figure 6 we use τ = 0.10, 0.11, ..., 0.89, 0.90.
Firpo, S., N. Fortin, and T. Lemieux (2009): “Unconditional quantile regression,” Econometrica,
77(3), 953–973.
Fortin, N., T. Lemieux, and S. Firpo (2011): “Decomposition methods in economics,” in Handbook
of Labor Economics, ed. by O. Ashenfelter, and D. Card, vol. 4, pp. 1–12. Amsterdam: Elsevier.
Gu, G. W., S. Malik, D. Pozzoli, and V. Rocha (2019): “Trade-induced Skill Polarization,”
Economic Inquiry, 58(1), 241–259.
Hanushek, E. A., and L. Woessmann (2008): “The Role of Cognitive Skills in Economic Development,”
Journal of Economic Literature, 46(3), 607–668.
Hsu, Y.-C., T.-C. Lai, and R. P. Lieli (2020): “Counterfactual Treatment Effects: Estimation and
Inference,” Journal of Business and Economic Statistics, Forthcoming.
Inoue, A., T. Li, and Q. Xu (2021): “Two Sample Unconditional Quantile Effect,” arXiv:
https://arxiv.org/pdf/2105.09445.pdf.
Lee, Y.-Y. (2021): “Nonparametric Weighted Average Quantile Derivative,” Econometric Theory,
pp. 1–39.
Sasaki, Y., T. Ura, and Y. Zhang (2020): “Unconditional Quantile Regression with High Dimen-
sional Data,” Working Paper.
Spini, P. (2021): “Robustness, Heterogeneous Treatment Effects and Covariate Shifts,” Working
Paper.
van der Vaart, A. (1998): Asymptotic Statistics. Cambridge University Press, Cambridge.
Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data. MIT Press,
Cambridge, MA.
Appendix
A.1 Proof of Theorem 1
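and so
\[ f_{X_\delta,W}(x,w) = \frac{\partial x^\delta}{\partial x}\cdot f_{X,W}(x^\delta,w) = J(x^\delta;\delta)\, f_{X,W}(x^\delta,w). \]
Evaluated at δ = 0, $J(x^\delta;\delta)$ is 1 and $f_{X_\delta,W}(x,w)$ is $f_{X,W}(x,w)$. Given this, we expand $f_{X_\delta,W}(x,w) - f_{X,W}(x,w)$
around δ = 0, which is possible under Assumptions 1(i) and (iii.a). Observing that
\[ \frac{\partial J(x^\delta;\delta)}{\partial x^\delta} = \frac{\partial}{\partial x}\left(\frac{\partial x^\delta}{\partial x^\delta}\right) = 0, \]
\[ \left.\frac{\partial J(x^\delta;\delta)}{\partial\delta}\right|_{\delta=0} = \left.\frac{\partial}{\partial\delta}\left(\frac{\partial x^\delta}{\partial x}\right)\right|_{\delta=0} = \frac{\partial}{\partial x}\left(\left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0}\right) = \frac{\partial\kappa(x)}{\partial x}, \]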
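we have
\[
\begin{aligned}
& f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \\
&= J(x^\delta;\delta)\, f_{X,W}(x^\delta,w) - f_{X,W}(x,w) \\
&= \delta\left.\left[ \frac{\partial J(x^\delta;\delta)}{\partial x^\delta}\frac{\partial x^\delta}{\partial\delta} + \frac{\partial J(x^\delta;\delta)}{\partial\delta} \right]\right|_{\delta=0} f_{X,W}(x,w)
 + \delta J(x;0)\,\frac{\partial f_{X,W}(x,w)}{\partial x}\left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta\left.\frac{\partial J(x^\delta;\delta)}{\partial\delta}\right|_{\delta=0} f_{X,W}(x,w)
 + \delta J(x;0)\,\frac{\partial f_{X,W}(x,w)}{\partial x}\left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta\left[ \frac{\partial\kappa(x)}{\partial x} f_{X,W}(x,w) + \kappa(x)\frac{\partial f_{X,W}(x,w)}{\partial x} \right] + \delta R_1(x,w,\delta) \\
&= \delta\,\frac{\partial}{\partial x}\big[\kappa(x) f_{X,W}(x,w)\big] + \delta R_1(x,w,\delta),
\end{aligned}
\]
where, for $\tilde\delta(x,w)$ between 0 and δ,
\[
R_1(x,w,\delta) = \left.\frac{\partial\big[J(x^\delta;\delta) f_{X,W}(x^\delta,w)\big]}{\partial\delta}\right|_{\delta=\tilde\delta(x,w)} - \left.\frac{\partial\big[J(x^\delta;\delta) f_{X,W}(x^\delta,w)\big]}{\partial\delta}\right|_{\delta=0}.
\tag{A.1}
\]
By the continuity of the derivative of $J(x^\delta;\delta) f_{X,W}(x^\delta,w)$ with respect to δ, we have $R_1(x,w,\delta) = o(1)$
for each (x, w) as δ → 0.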
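It remains to show that $\kappa(x) = -\left.\frac{\partial G(x;\delta)}{\partial\delta}\right|_{\delta=0}$. Differentiating both sides of $G(x^\delta;\delta) = x$ with
respect to δ, we obtain that
\[ \frac{\partial x^\delta}{\partial\delta} = -\left(\frac{\partial G(x^\delta;\delta)}{\partial x^\delta}\right)^{-1}\frac{\partial G(x^\delta;\delta)}{\partial\delta} \]
and so
\[ \kappa(x) := \left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0} = -\left(\frac{\partial G(x;0)}{\partial x}\right)^{-1}\left.\frac{\partial G(x;\delta)}{\partial\delta}\right|_{\delta=0} = -\left.\frac{\partial G(x;\delta)}{\partial\delta}\right|_{\delta=0}, \]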
where for simplicity we have assumed that the support of X conditional on any W = w does not
depend on w and we have denoted the support by $\mathcal{X}$. By Assumption 1(ii), $f_{U|X_\delta,W}(u|x,w) = f_{U|X,W}(u|x^\delta,w)$.
So we can write
\[
\begin{aligned}
F_{Y_\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X_\delta,W}(u|x,w)\, f_{X_\delta,W}(x,w)\,du\,dx\,dw \\
&= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X,W}(u|x^\delta,w)\, f_{X_\delta,W}(x,w)\,du\,dx\,dw.
\end{aligned}
\]
Hence, we have
\[ \frac{F_{Y_\delta}(y) - F_Y(y)}{\delta} := G_{1,\delta}(y) + G_{2,\delta}(y), \]
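where
\[
\begin{aligned}
G_{1,\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X,W}(u|x,w)\,\frac{1}{\delta}\big[f_{X_\delta,W}(x,w) - f_{X,W}(x,w)\big]\,du\,dx\,dw \\
&= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\,\frac{1}{\delta}\big[f_{X_\delta,W}(x,w) - f_{X,W}(x,w)\big]\,dx\,dw,
\end{aligned}
\tag{A.2}
\]
and
\[
G_{2,\delta}(y) = \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}
\times \frac{1}{\delta}\big[f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w)\big]\, f_{X_\delta,W}(x,w)\,du\,dx\,dw.
\tag{A.3}
\]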
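We first consider the term $G_{1,\delta}(y)$. Using Part (i) and Assumption 1(iv), we have
\[
\begin{aligned}
G_{1,\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\,\frac{\partial[\kappa(x) f_{X,W}(x,w)]}{\partial x}\,dx\,dw
 + \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, R_1(x,w,\delta)\,dx\,dw \\
&= -\int_{\mathcal W}\int_{\mathcal X}\frac{\partial F_{Y|X,W}(y|x,w)}{\partial x}\,\kappa(x)\, f_{X,W}(x,w)\,dx\,dw
 + \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, R_1(x,w,\delta)\,dx\,dw,
\end{aligned}
\]
where the second equality follows from integration by parts. Under Assumption 1(iii.a), we can
use the dominated convergence theorem to obtain
\[ \lim_{\delta\to 0}\sup_{y\in\mathcal Y}\left| \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, R_1(x,w,\delta)\,dx\,dw \right| = 0. \]
uniformly in $y \in \mathcal{Y}$, as δ → 0.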
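Next, we consider $G_{2,\delta}(y)$. Using Assumption 1(iii.b), we have
\[
\begin{aligned}
& \big[f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w)\big]\, f_{X,W}(x^\delta,w) \\
&= \left.\left[ \frac{\partial f_{U|X,W}(u|x^\delta,w)}{\partial (x^\delta)'}\, f_{X,W}(x^\delta,w)\,\frac{\partial x^\delta}{\partial\delta} \right]\right|_{\delta=0}\cdot\delta + \delta R_2(u,x,w,\delta) \\
&= \frac{\partial f_{U|X,W}(u|x,w)}{\partial x'}\, f_{X,W}(x,w)\,\kappa(x)\,\delta + \delta R_2(u,x,w,\delta),
\end{aligned}
\]
where
\[
\begin{aligned}
R_2(u,x,w,\delta) ={}& \left.\frac{\partial\big[f_{U|X,W}(u|x^\delta,w) f_{X,W}(x^\delta,w)\big]}{\partial\delta}\right|_{\delta=\tilde\delta(u,x,w)} - \left.\frac{\partial\big[f_{U|X,W}(u|x^\delta,w) f_{X,W}(x^\delta,w)\big]}{\partial\delta}\right|_{\delta=0} \\
& - \left[ f_{U|X,W}(u|x,w)\left.\frac{\partial f_{X,W}(x^\delta,w)}{\partial\delta}\right|_{\delta=\tilde\delta(u,x,w)} - f_{U|X,W}(u|x,w)\left.\frac{\partial f_{X,W}(x^\delta,w)}{\partial\delta}\right|_{\delta=0} \right].
\end{aligned}
\]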
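Note that in the above, the transpose on x is not relevant but we keep it so that the same lines of
arguments can be used for proving Theorem 3. Hence
\[
\begin{aligned}
G_{2,\delta}(y) ={}& \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\,\frac{\partial f_{U|X,W}(u|x,w)}{\partial x'}\, f_{X,W}(x,w)\,\kappa(x)\,du\,dx\,dw \\
& + \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, R_2(u,x,w,\delta)\,du\,dx\,dw.
\end{aligned}
\]
Under Assumption 1(iii.b), we can invoke the dominated convergence theorem to get
\[ \lim_{\delta\to 0}\sup_{y\in\mathcal Y}\left| \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, R_2(u,x,w,\delta)\,du\,dx\,dw \right| = 0. \]
uniformly over $y \in \mathcal{Y}$ as δ → 0.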
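Part (iii). Note that $\psi(y,\tau,F_Y)$ is the influence function of the quantile functional. Using Part
(ii) and Assumption 1(v), we have
\[ \Pi_\tau = \int_{\mathcal Y}\psi(y,\tau,F_Y)\,dG(y) = \int_{\mathcal Y}\psi(y,\tau,F_Y)\,dG_{1,0}(y) + \int_{\mathcal Y}\psi(y,\tau,F_Y)\,dG_{2,0}(y) \]
and
\[
\begin{aligned}
\int_{\mathcal Y}\psi(y,\tau,F_Y)\,dG_{2,0}(y)
={}& \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U}\left[\int_{\mathcal Y}\psi(y,\tau,F_Y)\,d1\{h(x,w,u) \le y\}\right]\frac{\partial\ln f_{U|X,W}(u|x,w)}{\partial x'}\,\kappa(x) \\
& \times f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\,du\,dx\,dw \\
={}& \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U}\psi(h(x,w,u),\tau,F_Y)\,\frac{\partial\ln f_{U|X,W}(u|x,w)}{\partial x'}\,\kappa(x) \\
& \times f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\,du\,dx\,dw.
\end{aligned}
\]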
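Therefore,
\[
\begin{aligned}
\Pi_\tau ={}& -\int_{\mathcal W}\int_{\mathcal X}\frac{\partial E[\psi(Y,\tau,F_Y)\,|\,X = x, W = w]}{\partial x'}\,\kappa(x)\, f_{X,W}(x,w)\,dx\,dw \\
& + \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U}\psi(h(x,w,u),\tau,F_Y)\,\frac{\partial\ln f_{U|X,W}(u|x,w)}{\partial x'}\,\kappa(x) \\
& \times f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\,du\,dx\,dw.
\end{aligned}
\]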
Let $\xi_\tau(x,w)$ be the quantile implied by the matching function in (10). The conditional effect at
this particular quantile is then
\[ \Pi_{\xi_\tau(x,w)}(x,w) = \frac{1}{f_{Y|X,W}(Q_\tau[Y]\,|\,x,w)}\,\frac{\partial F_{Y|X,W}(Q_\tau[Y]\,|\,x,w)}{\partial x}\,\kappa(x). \]
\[ \frac{f_{Y|X,W}(Q_\tau[Y]\,|\,x,w)}{f_Y(Q_\tau[Y])}\, f_{X,W}(x,w) = f_{X,W|Y}(x,w\,|\,Q_\tau[Y]), \]
and so we obtain:
\[
\begin{aligned}
\Pi_\tau &= \int_{\mathcal W}\int_{\mathcal X}\Pi_{\xi_\tau(x,w)}(x,w)\, f_{X,W|Y}(x,w\,|\,Q_\tau[Y])\,dx\,dw \\
&= E\big[\Pi_{\xi_\tau(X,W)}(X,W)\,\big|\,Y = Q_\tau[Y]\big].
\end{aligned}
\]
The proof of this Theorem is very similar to the proof of Theorem 1. The following decomposition
still holds
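\[ \frac{F_{Y_\delta}(y) - F_Y(y)}{\delta} := G_{1,\delta}(y) + G_{2,\delta}(y), \]
where
\[ G_{1,\delta}(y) = \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\,\frac{\big[f_{X_\delta,W}(x,w) - f_{X,W}(x,w)\big]}{\delta}\,dx\,dw, \]
\[ G_{2,\delta}(y) = \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\,\frac{f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w)}{\delta}\, f_{X_\delta,W}(x,w)\,du\,dx\,dw. \]
We first consider the term $G_{1,\delta}(y)$. Under the assumptions given, we have
\[ f_{X_\delta,W}(x,w) = \det\big[J(x^\delta;\delta)\big]\, f_{X,W}(x^\delta,w). \]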
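Evaluated at δ = 0, $f_{X_\delta,W}(x,w)$ is $f_{X,W}(x,w)$. Given this, we expand $f_{X_\delta,W}(x,w) - f_{X,W}(x,w)$
around δ = 0, which is possible under Assumptions 3(i) and (iii). We have
\[
\begin{aligned}
& f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \\
&= \delta\left.\frac{\partial\big\{\det[J(x^\delta;\delta)]\, f_{X,W}(x^\delta,w)\big\}}{\partial\delta}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta\left.\frac{\partial\det[J(x^\delta;\delta)]}{\partial\delta}\right|_{\delta=0} f_{X,W}(x,w)
 + \delta\left.\left\{ \det[J(x^\delta;\delta)]\left(\frac{\partial x^\delta}{\partial\delta}\right)'\frac{\partial f_{X,W}(x^\delta,w)}{\partial x^\delta} \right\}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta\left.\frac{\partial\det[J(x^\delta;\delta)]}{\partial\delta}\right|_{\delta=0} f_{X,W}(x,w) + \delta\,\kappa(x)'\,\frac{\partial f_{X,W}(x,w)}{\partial x} + \delta R_1(x,w,\delta),
\end{aligned}
\tag{A.4}
\]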
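Using arguments similar to those in the proof of Theorem 1, we can show that $G_{1,\delta}(y)$
converges to
\[
\begin{aligned}
G_{1,0}(y) :={}& \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\left.\frac{\partial\det[J(x^\delta;\delta)]}{\partial\delta}\right|_{\delta=0} f_{X,W}(x,w)\,dx\,dw \\
& + \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\left(\left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0}\right)'\frac{\partial f_{X,W}(x,w)}{\partial x}\,dx\,dw \\
:={}& G^{(1)}_{1,0}(y) + G^{(2)}_{1,0}(y)
\end{aligned}
\]
uniformly in $y \in \mathcal{Y}$, as δ → 0.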
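Using Assumption 3 and the fact that $\left.\frac{\partial x_i^\delta}{\partial x_j}\right|_{\delta=0} = 1\{i = j\}$, we have
\[
\begin{aligned}
\left.\frac{\partial\det[J(x^\delta;\delta)]}{\partial\delta}\right|_{\delta=0}
&= \left.\frac{\partial}{\partial\delta}\left( \frac{\partial x_1^\delta}{\partial x_1}\frac{\partial x_2^\delta}{\partial x_2} - \frac{\partial x_1^\delta}{\partial x_2}\frac{\partial x_2^\delta}{\partial x_1} \right)\right|_{\delta=0} \\
&= \left.\left( \frac{\partial\kappa_1(x)}{\partial x_1}\frac{\partial x_2^\delta}{\partial x_2} + \frac{\partial x_1^\delta}{\partial x_1}\frac{\partial\kappa_2(x)}{\partial x_2} - \frac{\partial\kappa_1(x)}{\partial x_2}\frac{\partial x_2^\delta}{\partial x_1} - \frac{\partial x_1^\delta}{\partial x_2}\frac{\partial\kappa_2(x)}{\partial x_1} \right)\right|_{\delta=0} \\
&= \frac{\partial\kappa_1(x)}{\partial x_1} + \frac{\partial\kappa_2(x)}{\partial x_2}.
\end{aligned}
\]
So
\[ G^{(1)}_{1,0}(y) = \int_{\mathcal W}\int_{\mathcal X}\left[ \frac{\partial\kappa_1(x)}{\partial x_1} + \frac{\partial\kappa_2(x)}{\partial x_2} \right] F_{Y|X,W}(y|x,w)\, f_{X,W}(x,w)\,dx\,dw. \]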
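Next, note that
\[ \left(\left.\frac{\partial x^\delta}{\partial\delta}\right|_{\delta=0}\right)'\frac{\partial f_{X,W}(x,w)}{\partial x} = \kappa(x)'\,\frac{\partial f_{X,W}(x,w)}{\partial x}. \]
Using integration by parts, we can show that for j = 1 and 2,
\[
\int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\,\kappa_j(x)\,\frac{\partial f_{X,W}(x,w)}{\partial x_j}\,dx\,dw
= -\int_{\mathcal W}\int_{\mathcal X}\frac{\partial\big[F_{Y|X,W}(y|x,w)\,\kappa_j(x)\big]}{\partial x_j}\, f_{X,W}(x,w)\,dx\,dw.
\]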
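So
\[
\begin{aligned}
G^{(2)}_{1,0}(y) ={}& \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\,\kappa(x)'\,\frac{\partial f_{X,W}(x,w)}{\partial x}\,dx\,dw \\
={}& -\int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w)\left( \frac{\partial\big[F_{Y|X,W}(y|x,w)\kappa_1(x)\big]}{\partial x_1} + \frac{\partial\big[F_{Y|X,W}(y|x,w)\kappa_2(x)\big]}{\partial x_2} \right)dx\,dw \\
={}& -\int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w)\left( \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_1}\kappa_1(x) + \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_2}\kappa_2(x) \right)dx\,dw \\
& -\int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w)\, F_{Y|X,W}(y|x,w)\left[ \frac{\partial\kappa_1(x)}{\partial x_1} + \frac{\partial\kappa_2(x)}{\partial x_2} \right]dx\,dw.
\end{aligned}
\]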
Therefore,
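\[
\begin{aligned}
G_{1,0}(y)
&= - \int_{\mathcal{W}} \int_{\mathcal{X}} \left[ \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_1} \kappa_1(x) + \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_2} \kappa_2(x) \right] f_{X,W}(x,w) \, dx \, dw \\
&= - \int_{\mathcal{W}} \int_{\mathcal{X}} \left[ \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x'} \kappa(x) \right] f_{X,W}(x,w) \, dx \, dw \\
&= - E\left[ \frac{\partial F_{Y|X,W}(y|X,W)}{\partial X'} \kappa(X) \right].
\end{aligned}
\]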
For G2,δ (y), the proof of Theorem 1 remains valid, and we have that G2,δ (y) converges to
\[
G_{2,0}(y) := E\left[ \mathbf{1}\{h(X,W,U) \le y\} \frac{\partial \ln f_{U|X,W}(U|X,W)}{\partial X'} \kappa(X) \right]
\]
uniformly in y ∈ Y , as δ → 0.
Invoking the same argument as that in the proof of Theorem 1, we obtain the desired result.
The main complication in this lemma is that the dependent variable is 1 {Yi ≤ q̂τ }. This means
that the preliminary estimator q̂τ might affect the asymptotic distribution of α̂τ and β̂ τ .
As mentioned in the main text, under Assumption 4,
\[
\hat{q}_\tau - Q_\tau[Y] = \frac{1}{n} \sum_{i=1}^{n} \frac{\tau - \mathbf{1}\{Y_i \le Q_\tau[Y]\}}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2})
= \frac{1}{n} \sum_{i=1}^{n} \psi(Y_i, \tau, F_Y) + o_p(n^{-1/2}).
\]
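The linear representation above can be illustrated with a small simulation. The sketch below is only an illustration under assumptions made here: it draws standard normal data (so that Q_τ[Y] and f_Y are known in closed form), uses an order statistic as the sample quantile, and compares q̂τ − Qτ[Y] with the sample average of ψ. The function name and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def bahadur_check(n=100_000, tau=0.75, seed=0):
    """Return (q_hat - Q_tau, average of psi) for simulated N(0, 1) data."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    q_true = std_normal.inv_cdf(tau)   # population tau-quantile
    f_at_q = std_normal.pdf(q_true)    # density at the population quantile
    y = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    q_hat = y[math.ceil(tau * n) - 1]  # order-statistic sample quantile
    # psi(Y_i, tau, F_Y) = (tau - 1{Y_i <= Q_tau}) / f_Y(Q_tau)
    psi_bar = sum(tau - (yi <= q_true) for yi in y) / (n * f_at_q)
    return q_hat - q_true, psi_bar

if __name__ == "__main__":
    lhs, rhs = bahadur_check()
    print(lhs, rhs, lhs - rhs)
```

Both returned quantities are of order n^{-1/2}, while their difference should be noticeably smaller, in line with the o_p(n^{-1/2}) remainder.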
Recall that
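\[
\hat{\theta}_\tau = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \left\{ \mathbf{1}\{Y_i \le \hat{q}_\tau\} \log\left[ G(Z_i'\theta) \right] + \mathbf{1}\{Y_i > \hat{q}_\tau\} \log\left[ 1 - G(Z_i'\theta) \right] \right\}.
\]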
Let si (θ; q̂τ ) denote the score for observation i. Then, under Assumption 5(i), we have
\[
\frac{1}{n} \sum_{i=1}^{n} s_i(\hat{\theta}_\tau; \hat{q}_\tau) = 0.
\]
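A mean value expansion around $\theta_\tau$ gives
\[
\underbrace{\frac{1}{n} \sum_{i=1}^{n} s_i(\hat{\theta}_\tau; \hat{q}_\tau)}_{=0}
= \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; \hat{q}_\tau) + \frac{1}{n} \sum_{i=1}^{n} H_i(\tilde{\theta}_\tau; \hat{q}_\tau) \left( \hat{\theta}_\tau - \theta_\tau \right),
\]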
where θ̃τ is between θτ and θ̂τ and can be different for different rows of Hi . Under the assumption
of the uniform law of large numbers for the Hessian (i.e., Assumption 5(ii)), we obtain
\[
\frac{1}{n} \sum_{i=1}^{n} H_i(\tilde{\theta}_\tau; \hat{q}_\tau) \overset{p}{\to} E[H_i(\theta_\tau; Q_\tau[Y])] =: H.
\]
We have then
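\[
0 = \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; \hat{q}_\tau) + H \left( \hat{\theta}_\tau - \theta_\tau \right) + o_p\left( \left\| \hat{\theta}_\tau - \theta_\tau \right\| \right).
\tag{A.5}
\]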
\[
\frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; \hat{q}_\tau) - \left. E[s_i(\theta_\tau; q)] \right|_{q=\hat{q}_\tau}
= \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) + o_p(n^{-1/2}).
\]
Here we have used that E[si (θτ ; Qτ [Y ])] = 0: the score evaluated at the true quantile has expected
value 0. Plugging this back into (A.5), we obtain
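\[
0 = \left. E[s_i(\theta_\tau; q)] \right|_{q=\hat{q}_\tau} + \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) + H \left( \hat{\theta}_\tau - \theta_\tau \right) + o_p\left( \left\| \hat{\theta}_\tau - \theta_\tau \right\| \right).
\tag{A.6}
\]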
Here E [si (θτ ; q)] |q=q̂τ is random because we first compute the expectation E [si (θτ ; q)] for a fixed
q and then replace q by q̂τ , which is random. To show that E [si (θτ ; q)] |q=q̂τ is O p (n−1/2 ), we
observe that (see equation 15.18 in Wooldridge (2002))
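So
\[
H_Q = \left. \frac{\partial E[s_i(\theta_\tau; q)]}{\partial q} \right|_{q=Q_\tau[Y]}
= E\left[ \frac{g(Z_i'\theta_\tau) Z_i f_{Y|Z}(Q_\tau[Y]|Z_i)}{G(Z_i'\theta_\tau) \left( 1 - G(Z_i'\theta_\tau) \right)} \right].
\tag{A.8}
\]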
We have
\[
\left. E[s_i(\theta_\tau; q)] \right|_{q=\hat{q}_\tau} = H_Q \left( \hat{q}_\tau - Q_\tau[Y] \right) + o_p(n^{-1/2}),
\]
which implies that E [si (θτ ; q)] |q=q̂τ = O p (n−1/2 ). Going back to (A.6), we obtain
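\[
\left\| H \left( \hat{\theta}_\tau - \theta_\tau \right) + o_p\left( \left\| \hat{\theta}_\tau - \theta_\tau \right\| \right) \right\|
\le \left\| \left. E[s_i(\theta_\tau; q)] \right|_{q=\hat{q}_\tau} \right\| + \left\| \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) \right\|,
\]
which implies $\hat{\theta}_\tau - \theta_\tau = O_p(n^{-1/2})$, and therefore
\[
\begin{aligned}
\hat{\theta}_\tau - \theta_\tau
&= \underbrace{- H^{-1} \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y])}_{\text{Usual influence function}}
 \; \underbrace{- H^{-1} \left. E[s_i(\theta_\tau; q)] \right|_{q=\hat{q}_\tau}}_{\text{Contribution of } \hat{q}_\tau} + o_p(n^{-1/2}) \\
&= - H^{-1} \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) - H^{-1} H_Q \left( \hat{q}_\tau - Q_\tau[Y] \right) + o_p(n^{-1/2}) \\
&= - H^{-1} \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) - H^{-1} H_Q \frac{1}{n} \sum_{i=1}^{n} \psi(Y_i, \tau, F_Y) + o_p(n^{-1/2}).
\end{aligned}
\tag{A.9}
\]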
To establish the joint asymptotic distribution of the estimators of the location and scale effect, we
need to obtain the asymptotic distribution of fˆY (q̂τ ). By Lemma 6 in Martinez-Iriarte and Sun
(2021b), we have that
\[
\hat{f}_Y(y) - f_Y(y) = \frac{1}{n} \sum_{i=1}^{n} K_h(Y_i - y) - E[K_h(Y - y)] + B_f(y) + o_p(h^2),
\tag{A.10}
\]
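\[
\begin{aligned}
\hat{f}_Y(\hat{q}_\tau) - f_Y(Q_\tau[Y])
&= \hat{f}_Y(\hat{q}_\tau) - \hat{f}_Y(Q_\tau[Y]) + \hat{f}_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y]) \\
&= \dot{f}_Y(Q_\tau[Y]) \left( \hat{q}_\tau - Q_\tau[Y] \right) + \hat{f}_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y]) + o_p(n^{-1/2} h^{-1/2}).
\end{aligned}
\tag{A.11}
\]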
The first term captures the uncertainty associated with estimating the quantile, and the second
term captures the uncertainty associated with estimating the density.
Next, we can write the location and scale effects as
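\[
\begin{pmatrix} \hat{\Pi}_{\tau,L} \\ \hat{\Pi}_{\tau,S_\mu} \end{pmatrix}
- \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S_\mu} \end{pmatrix}
= D_\mu \left[ \frac{n^{-1} \sum_{i=1}^{n} g(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau \tilde{X}_i}{\hat{f}_Y(\hat{q}_\tau)}
- \frac{E\left[ g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \right]}{f_Y(Q_\tau[Y])} \right].
\]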
Now
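\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} g(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau \tilde{X}_i
&= \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \\
&\quad + \left( \frac{1}{n} \sum_{i=1}^{n} \dot{g}(Z_i'\tilde{\theta}_\tau) \tilde{\alpha}_\tau \tilde{X}_i Z_i' \right) (\hat{\theta}_\tau - \theta_\tau)
 + \left( \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\tilde{\theta}_\tau) \tilde{X}_i \right) (\hat{\alpha}_\tau - \alpha_\tau).
\end{aligned}
\]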
Using the uniform law of large numbers in Assumption 5(iv), we have
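\[
\frac{1}{n} \sum_{i=1}^{n} \dot{g}(Z_i'\tilde{\theta}_\tau) \tilde{\alpha}_\tau \tilde{X}_i Z_i' \overset{p}{\to} M_1 := \underbrace{E\left[ \dot{g}(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i Z_i' \right]}_{2 \times \dim(Z)}
\]
and
\[
\frac{1}{n} \sum_{i=1}^{n} g(Z_i'\tilde{\theta}_\tau) \tilde{X}_i \overset{p}{\to} M_2 := \underbrace{E\left[ g(Z_i'\theta_\tau) \tilde{X}_i \right]}_{2 \times 1}.
\]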
Therefore,
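\[
\begin{aligned}
&\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau \tilde{X}_i - E\left[ g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \right] \right) \\
&\quad = \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i - E\left[ g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \right] \right)
 + M_1 \sqrt{n} (\hat{\theta}_\tau - \theta_\tau) + M_2 \sqrt{n} (\hat{\alpha}_\tau - \alpha_\tau) + o_p(1).
\end{aligned}
\]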
The first term captures the uncertainty in estimating the expected value. The second and
third terms capture the uncertainty in estimating the logit/probit model; they already
incorporate the contribution of the preliminary estimator q̂τ of Qτ [Y ]. To ease notation, define
M := M1 + ( M2 , O), where O is a 2 × dim(W ) matrix of zeros. An explicit expression for M is
given in Assumption 5(iv). Thus, we can write:
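\[
\begin{aligned}
&\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau \tilde{X}_i - E\left[ g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \right] \right) \\
&\quad = \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i - E\left[ g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \right] \right)
 + M \sqrt{n} \left( \hat{\theta}_\tau - \theta_\tau \right) + o_p(1).
\end{aligned}
\tag{A.12}
\]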
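Plugging the asymptotic representation of $\sqrt{n} ( \hat{\theta}_\tau - \theta_\tau )$ in (A.9), we obtain:
\[
\begin{aligned}
\begin{pmatrix} \hat{\Pi}_{\tau,L} \\ \hat{\Pi}_{\tau,S_\mu} \end{pmatrix}
- \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S_\mu} \end{pmatrix}
&= D_\mu \frac{1}{f_Y(Q_\tau[Y])} \left[ \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i - E\left[ g(Z_i'\theta_\tau) \alpha_\tau \tilde{X}_i \right] \right] \\
&\quad - \frac{1}{f_Y(Q_\tau[Y])} D_\mu M H^{-1} \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) \\
&\quad - \left[ \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S_\mu} \end{pmatrix} \frac{\dot{f}_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])} D_\mu M H^{-1} H_Q \right] \frac{1}{n} \sum_{i=1}^{n} \psi(Y_i, \tau, F_Y) \\
&\quad - \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S_\mu} \end{pmatrix} \frac{\hat{f}_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2}) + o_p(n^{-1/2} h^{-1/2}).
\end{aligned}
\]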
The result has been proved in the main text. Here we give the expressions for M̂, Ĥ, and ĤQ .
For M̂ and Ĥ, we have
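\[
\hat{M} = \frac{1}{n} \sum_{i=1}^{n}
\begin{pmatrix}
\dot{g}(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau X_i' + g(Z_i'\hat{\theta}_\tau), & \dot{g}(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau W_i' \\
\dot{g}(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau X_i X_i' + g(Z_i'\hat{\theta}_\tau) X_i, & \dot{g}(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau X_i W_i'
\end{pmatrix}
\]
and
\[
\hat{H} = \frac{1}{n} \sum_{i=1}^{n} \frac{g(Z_i'\hat{\theta}_\tau)^2}{G(Z_i'\hat{\theta}_\tau) \left( 1 - G(Z_i'\hat{\theta}_\tau) \right)}
\begin{pmatrix}
X_i^2 & X_i W_i' \\
X_i W_i & W_i W_i'
\end{pmatrix}.
\]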
Let
\[
\Lambda(Z_i, \theta_\tau) := \frac{g(Z_i'\theta_\tau) Z_i}{G(Z_i'\theta_\tau) \left( 1 - G(Z_i'\theta_\tau) \right)}.
\]
Then
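\[
\begin{aligned}
H_Q &= E\left[ \Lambda(Z_i, \theta_\tau) f_{Y|Z}(Q_\tau[Y]|Z_i) \right] \\
&= \int_{\mathcal{Z}} \Lambda(z, \theta_\tau) f_{Y|Z}(Q_\tau[Y]|z) f_Z(z) \, dz \\
&= f_Y(Q_\tau[Y]) \int_{\mathcal{Z}} \Lambda(z, \theta_\tau) \frac{f_{Y,Z}(Q_\tau[Y], z)}{f_Y(Q_\tau[Y]) f_Z(z)} f_Z(z) \, dz \\
&= f_Y(Q_\tau[Y]) \int_{\mathcal{Z}} \Lambda(z, \theta_\tau) f_{Z|Y}(z|Q_\tau[Y]) \, dz \\
&= f_Y(Q_\tau[Y]) \, E\left[ \Lambda(Z, \theta_\tau) \mid Y = Q_\tau[Y] \right].
\end{aligned}
\]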
To estimate the conditional expectation, we may use a vector version of the Nadaraya-Watson
estimator:
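\[
\hat{E}\left[ \Lambda(Z, \hat{\theta}_\tau) \mid Y = \hat{q}_\tau \right] = \frac{\sum_{i=1}^{n} K_h(Y_i - \hat{q}_\tau) \Lambda(Z_i, \hat{\theta}_\tau)}{\sum_{i=1}^{n} K_h(Y_i - \hat{q}_\tau)},
\]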
where Kh is the rescaled kernel Kh (Yi − y) = h−1 K ((Yi − y)/h) for a kernel function K (·) . We
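can then estimate HQ by the plug-in analogue of the last expression for HQ above:
\[
\hat{H}_Q = \hat{f}_Y(\hat{q}_\tau) \, \hat{E}\left[ \Lambda(Z, \hat{\theta}_\tau) \mid Y = \hat{q}_\tau \right].
\]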
It is worth pointing out that, in the logistic case, G (z) = (1 + exp (−z))−1 , we have the convenient
identity g(z) = G (z)(1 − G (z)). Thus, Λ( Zi , θ̂τ ) = Zi and the estimation of H and HQ becomes
simpler.
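The logistic simplification and the kernel estimator of the conditional expectation can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the Gaussian kernel, the function names, and the toy inputs are assumptions made here. It checks numerically that Λ(Z, θ) reduces to Z under the logistic link and computes a vector Nadaraya-Watson estimate at a given point.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    # Density computed directly, without using the identity g = G(1 - G).
    e = math.exp(-z)
    return e / (1.0 + e) ** 2

def lam(z_vec, theta):
    """Lambda(Z, theta) = g(Z'theta) Z / [G(Z'theta)(1 - G(Z'theta))]."""
    index = sum(zj * tj for zj, tj in zip(z_vec, theta))
    big_g = logistic_cdf(index)
    scale = logistic_pdf(index) / (big_g * (1.0 - big_g))
    return [scale * zj for zj in z_vec]

def gaussian_kernel(u):
    # Assumed kernel choice; any kernel K could be used instead.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_conditional_mean(y, lam_values, q, h):
    """Vector Nadaraya-Watson estimate of E[Lambda | Y = q] with bandwidth h."""
    weights = [gaussian_kernel((yi - q) / h) / h for yi in y]  # K_h(Y_i - q)
    total = sum(weights)
    dim = len(lam_values[0])
    return [sum(w * lv[j] for w, lv in zip(weights, lam_values)) / total
            for j in range(dim)]

if __name__ == "__main__":
    z = [1.0, 2.0, -0.5]
    print(lam(z, [0.3, -0.1, 0.8]))  # close to z in the logistic case
    print(nw_conditional_mean([-1.0, 1.0], [[0.0, 2.0], [2.0, 0.0]], 0.0, 0.5))
```

With a logistic link, the values passed to the kernel estimator are simply the regressors Zi, so no evaluation of g or G is needed at that step.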
The proof of this theorem is similar to that of Theorem 4. We outline the main steps and omit
the details here. We have
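\[
\begin{pmatrix} \hat{\Pi}_{\tau,L,1} \\ \hat{\Pi}_{\tau,L,2} \end{pmatrix}
- \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}
= D_L \left[ \frac{n^{-1} \sum_{i=1}^{n} g(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau}{\hat{f}_Y(\hat{q}_\tau)}
- \frac{E\left[ g(Z_i'\theta_\tau) \alpha_\tau \right]}{f_Y(Q_\tau[Y])} \right].
\]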
But
Now
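\[
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} g(Z_i'\hat{\theta}_\tau) \hat{\alpha}_\tau
&= \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau
 + \left( \frac{1}{n} \sum_{i=1}^{n} \dot{g}(Z_i'\tilde{\theta}_\tau) \tilde{\alpha}_\tau Z_i' \right) (\hat{\theta}_\tau - \theta_\tau)
 + \left( \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\tilde{\theta}_\tau) \right) (\hat{\alpha}_\tau - \alpha_\tau) \\
&= \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau + M_L (\hat{\theta}_\tau - \theta_\tau) + o_p\left( n^{-1/2} \right).
\end{aligned}
\]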
Therefore,
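\[
\begin{aligned}
\begin{pmatrix} \hat{\Pi}_{\tau,L,1} \\ \hat{\Pi}_{\tau,L,2} \end{pmatrix}
- \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}
&= D_L \frac{1}{f_Y(Q_\tau[Y])} \left[ \frac{1}{n} \sum_{i=1}^{n} g(Z_i'\theta_\tau) \alpha_\tau - E\left[ g(Z_i'\theta_\tau) \alpha_\tau \right] \right] \\
&\quad - \frac{1}{f_Y(Q_\tau[Y])} D_L M_L H^{-1} \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) \\
&\quad - \left[ \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} \frac{\dot{f}_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])} D_L M_L H^{-1} H_Q \right] \frac{1}{n} \sum_{i=1}^{n} \psi(Y_i, \tau, F_Y) \\
&\quad - \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} \frac{\hat{f}_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2}) + o_p(n^{-1/2} h^{-1/2}).
\end{aligned}
\]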
Supplementary Appendix
S.1 Details of Example 4
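we have
\[
Y_\delta = \alpha + X_\delta \beta + U \sim N\left( \alpha + \beta \ell(\delta), \; 1 + \beta^2 s^{-2}(\delta) \right) := \alpha + \beta \ell(\delta) + \sqrt{1 + \beta^2 s^{-2}(\delta)} \, \varepsilon,
\]
and the τ-quantile $Q_\tau[Y_\delta]$ of $Y_\delta$ is $\alpha + \beta \ell(\delta) + \sqrt{1 + \beta^2 s^{-2}(\delta)} \, e_\tau$. Hence
\[
\begin{aligned}
\Pi_\tau &= \lim_{\delta \to 0} \frac{\beta \ell(\delta) + \sqrt{1 + \beta^2 s^{-2}(\delta)} \, e_\tau - \sqrt{1 + \beta^2} \, e_\tau}{\delta} \\
&= \dot{\ell}(0) \beta - \dot{s}(0) \frac{\beta^2}{\sqrt{\beta^2 + 1}} Q_\tau[U] \\
&= \dot{\ell}(0) \beta - \dot{s}(0) \frac{\beta^2}{\sqrt{\beta^2 + 1}} \frac{Q_\tau[Y] - \alpha}{\sqrt{\beta^2 + 1}} \\
&= \dot{\ell}(0) \beta - \dot{s}(0) \frac{\beta^2}{\beta^2 + 1} \left( Q_\tau[Y] - \alpha \right) \\
&:= \Pi_{\tau,L} + \Pi_{\tau,S},
\end{aligned}
\]
where $\Pi_{\tau,L} = \beta \dot{\ell}(0)$ is the location effect and $\Pi_{\tau,S} = -\dot{s}(0) \frac{\beta^2}{\beta^2 + 1} ( Q_\tau[Y] - \alpha )$ is the scale effect.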
Next, we have
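\[
E[X \mid Y = y] = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)} (y - \alpha) = \frac{\beta}{\beta^2 + 1} (y - \alpha).
\]
Taking $y = Q_\tau[Y]$ yields
\[
E[X \mid Y = Q_\tau[Y]] = \frac{\beta}{\beta^2 + 1} \left( Q_\tau[Y] - \alpha \right).
\]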
Therefore, we obtain the alternative expression Πτ,S = −ṡ (0) E[ Xβ|Y = Qτ [Y ]].
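The closed-form effects in this example can be checked against a direct finite-difference derivative of the quantile of Yδ. The sketch below is illustrative only: it assumes the linear paths ℓ(δ) = l_dot·δ and s(δ) = 1 + s_dot·δ (so that ℓ(0) = 0 and s(0) = 1), and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def quantile_y_delta(delta, tau, alpha, beta, l_dot, s_dot):
    """tau-quantile of Y_delta ~ N(alpha + beta*l(delta), 1 + beta^2 / s(delta)^2)."""
    ell = l_dot * delta        # assumed location path l(delta)
    s = 1.0 + s_dot * delta    # assumed scale path s(delta)
    e_tau = NormalDist().inv_cdf(tau)
    return alpha + beta * ell + math.sqrt(1.0 + beta ** 2 / s ** 2) * e_tau

def effects_closed_form(tau, alpha, beta, l_dot, s_dot):
    """Location and scale effects from the closed-form expressions."""
    q = quantile_y_delta(0.0, tau, alpha, beta, l_dot, s_dot)  # Q_tau[Y]
    location = beta * l_dot
    scale = -s_dot * beta ** 2 / (beta ** 2 + 1.0) * (q - alpha)
    return location, scale

def effect_finite_difference(tau, alpha, beta, l_dot, s_dot, eps=1e-5):
    """Central difference of Q_tau[Y_delta] in delta, around delta = 0."""
    up = quantile_y_delta(eps, tau, alpha, beta, l_dot, s_dot)
    dn = quantile_y_delta(-eps, tau, alpha, beta, l_dot, s_dot)
    return (up - dn) / (2.0 * eps)

if __name__ == "__main__":
    args = (0.9, 1.0, 0.8, 1.0, 0.5)  # tau, alpha, beta, l_dot, s_dot
    loc, sc = effects_closed_form(*args)
    print(loc + sc, effect_finite_difference(*args))
```

The sum of the two closed-form effects should match the finite-difference derivative up to numerical error.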
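\[
\Pi_{\tau,S_{\mu_X}} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} \sigma_X^2 b_\tau^2 E\left[ \dot{g}(a_\tau + b_\tau X) \right],
\tag{S.1}
\]
where
\[
g(y) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) \quad \text{and} \quad \dot{g}(y) = -g(y) y.
\]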
Therefore,
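\[
E\left[ \dot{g}(a_\tau + b_\tau X) \right] = -\frac{1}{2\pi\sigma_X} \int_{-\infty}^{\infty} (a_\tau + b_\tau x) \exp\left( -\frac{1}{2} \left[ (a_\tau + b_\tau x)^2 + \left( \frac{x - \mu_X}{\sigma_X} \right)^2 \right] \right) dx.
\]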
Define
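\[
\begin{aligned}
K_{1,\tau} &:= \left( b_\tau^2 + \frac{1}{\sigma_X^2} \right)^{-1}, \\
K_{2,\tau} &:= -K_{1,\tau} \left( a_\tau b_\tau - \frac{\mu_X}{\sigma_X^2} \right), \\
K_{3,\tau} &:= K_{1,\tau} \left( a_\tau^2 + \frac{\mu_X^2}{\sigma_X^2} \right).
\end{aligned}
\]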
Then, we have
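\[
\begin{aligned}
(a_\tau + b_\tau x)^2 + \left( \frac{x - \mu_X}{\sigma_X} \right)^2
&= K_{1,\tau}^{-1} \left[ x^2 - 2 K_{2,\tau} x + K_{3,\tau} \right] \\
&= K_{1,\tau}^{-1} \left[ x^2 - 2 K_{2,\tau} x + K_{2,\tau}^2 - K_{2,\tau}^2 + K_{3,\tau} \right] \\
&= K_{1,\tau}^{-1} (x - K_{2,\tau})^2 + K_{1,\tau}^{-1} \left( K_{3,\tau} - K_{2,\tau}^2 \right).
\end{aligned}
\]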
Next, we go back to the integral that we are interested in. For X ∼ N (K2,τ , K1,τ ), we have
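\[
\begin{aligned}
E\left[ \dot{g}(a_\tau + b_\tau X) \right]
&= -\frac{1}{\sqrt{2\pi}\sigma_X} \sqrt{K_{1,\tau}} \exp\left[ -\frac{1}{2} K_{1,\tau}^{-1} \left( K_{3,\tau} - K_{2,\tau}^2 \right) \right] \times E(a_\tau + b_\tau X) \\
&= -\frac{\sqrt{K_{1,\tau}} \exp\left[ -\frac{1}{2} K_{1,\tau}^{-1} \left( K_{3,\tau} - K_{2,\tau}^2 \right) \right]}{\sqrt{2\pi}\sigma_X} (a_\tau + b_\tau K_{2,\tau}).
\end{aligned}
\]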
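The closed form for E[ġ(aτ + bτ X)] with X ∼ N(µX, σX²) can be verified by numerical integration. The sketch below uses a plain trapezoid rule and carries the factor −1/2 in the exponent that comes from the Gaussian densities; the integration grid, function names, and parameter values are arbitrary choices made here for illustration.

```python
import math

def g(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def g_dot(y):
    """Derivative of the standard normal density: -g(y) * y."""
    return -g(y) * y

def expect_g_dot_numeric(a, b, mu, sigma, n=20001, width=12.0):
    """Trapezoid-rule approximation of E[g_dot(a + b X)], X ~ N(mu, sigma^2)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * g_dot(a + b * x) * g((x - mu) / sigma) / sigma
    return total * step

def expect_g_dot_closed(a, b, mu, sigma):
    """Closed form obtained by completing the square with K1, K2, K3."""
    k1 = 1.0 / (b * b + 1.0 / sigma ** 2)
    k2 = -k1 * (a * b - mu / sigma ** 2)
    k3 = k1 * (a * a + mu * mu / sigma ** 2)
    expo = math.exp(-0.5 * (k3 - k2 * k2) / k1)
    return -math.sqrt(k1) * expo / (math.sqrt(2.0 * math.pi) * sigma) * (a + b * k2)

if __name__ == "__main__":
    print(expect_g_dot_numeric(0.4, -0.7, 1.0, 1.5),
          expect_g_dot_closed(0.4, -0.7, 1.0, 1.5))
```

A quick special case: with bτ = 0 the closed form collapses to ġ(aτ), as it should, since the argument no longer depends on X.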
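Now, consider the case where $Y = \alpha + X\beta + U$, $X \perp U$, and $U$ is standard normal. Note
that
\[
F_{Y|X}(Q_\tau[Y]|x) = \Pr(\alpha + X\beta + U < Q_\tau[Y] \mid X = x)
= \Pr(U < Q_\tau[Y] - \alpha - x\beta \mid X = x) = G(Q_\tau[Y] - \alpha - x\beta),
\]
so that $a_\tau = Q_\tau[Y] - \alpha$ and $b_\tau = -\beta$. Then
\[
\begin{aligned}
K_{1,\tau} &:= \left( b_\tau^2 + \frac{1}{\sigma_X^2} \right)^{-1} = \left( \beta^2 + \frac{1}{\sigma_X^2} \right)^{-1}, \\
K_{2,\tau} &:= -K_{1,\tau} \left( a_\tau b_\tau - \frac{\mu_X}{\sigma_X^2} \right) = K_{1,\tau} \left( (Q_\tau[Y] - \alpha) \beta + \frac{\mu_X}{\sigma_X^2} \right), \\
K_{3,\tau} &:= K_{1,\tau} \left( a_\tau^2 + \frac{\mu_X^2}{\sigma_X^2} \right) = K_{1,\tau} \left( (Q_\tau[Y] - \alpha)^2 + \frac{\mu_X^2}{\sigma_X^2} \right),
\end{aligned}
\]
and
\[
K_{1,\tau}^{-1} \left( K_{3,\tau} - K_{2,\tau}^2 \right) = \frac{(Q_\tau[Y] - \alpha - \mu_X \beta)^2}{\sigma_X^2 \beta^2 + 1}.
\]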
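Thus, we have
\[
\begin{aligned}
E\left[ \dot{g}(a_\tau + b_\tau X) \right]
&= -(Q_\tau[Y] - \alpha) \frac{\sqrt{K_{1,\tau}} \exp\left[ -\frac{1}{2} \frac{(Q_\tau[Y] - \alpha - \mu_X \beta)^2}{\sigma_X^2 \beta^2 + 1} \right]}{\sqrt{2\pi}\sigma_X}
 + \beta \frac{\sqrt{K_{1,\tau}} \exp\left[ -\frac{1}{2} \frac{(Q_\tau[Y] - \alpha - \mu_X \beta)^2}{\sigma_X^2 \beta^2 + 1} \right]}{\sqrt{2\pi}\sigma_X} K_{2,\tau} \\
&= f_Y(Q_\tau[Y]) \left[ -(Q_\tau[Y] - \alpha) + \beta \frac{(Q_\tau[Y] - \alpha) \beta + \frac{\mu_X}{\sigma_X^2}}{\beta^2 + \frac{1}{\sigma_X^2}} \right] \\
&= f_Y(Q_\tau[Y]) \frac{\alpha + \mu_X \beta - Q_\tau[Y]}{\sigma_X^2 \beta^2 + 1},
\end{aligned}
\]
where we have used
\[
f_Y(Q_\tau[Y]) = \frac{\sqrt{K_{1,\tau}}}{\sqrt{2\pi}\sigma_X} \exp\left[ -\frac{1}{2} \frac{(Q_\tau[Y] - \alpha - \mu_X \beta)^2}{\sigma_X^2 \beta^2 + 1} \right],
\]
which holds because $Y \sim N(\alpha + \mu_X \beta, \sigma_X^2 \beta^2 + 1)$ and $\sqrt{K_{1,\tau}} / \sigma_X = (\sigma_X^2 \beta^2 + 1)^{-1/2}$. Therefore, by (S.1) and $b_\tau^2 = \beta^2$,
\[
\begin{aligned}
\Pi_{\tau,S_{\mu_X}} &= \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} \sigma_X^2 b_\tau^2 E\left[ \dot{g}(a_\tau + b_\tau X) \right] \\
&= \dot{s}(0) \sigma_X^2 \beta^2 \frac{\alpha + \mu_X \beta - Q_\tau[Y]}{\sigma_X^2 \beta^2 + 1} \\
&= \dot{s}(0) \sigma_X^2 \beta^2 \frac{\alpha + \mu_X \beta - \left[ \alpha + \mu_X \beta + \sqrt{\sigma_X^2 \beta^2 + 1} \, Q_\tau(U) \right]}{\sigma_X^2 \beta^2 + 1} \\
&= -\dot{s}(0) \frac{\sigma_X^2 \beta^2}{\sqrt{\sigma_X^2 \beta^2 + 1}} Q_\tau[U],
\end{aligned}
\]
where we have used $Q_\tau[Y] = \alpha + \mu_X \beta + \sqrt{\sigma_X^2 \beta^2 + 1} \, Q_\tau(U)$.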
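The chain of equalities leading to the final expression for the scale effect can be checked numerically. The sketch below evaluates (S.1) using the closed form of E[ġ(aτ + bτ X)] with bτ = −β, and compares it with the last line of the derivation. The function names and parameter values are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist

def scale_effect_via_s1(tau, alpha, beta, mu_x, sigma_x, s_dot):
    """Scale effect from (S.1), with E[g_dot] replaced by its closed form."""
    var_y = sigma_x ** 2 * beta ** 2 + 1.0
    y_dist = NormalDist(alpha + mu_x * beta, math.sqrt(var_y))
    q = y_dist.inv_cdf(tau)   # Q_tau[Y]
    f_q = y_dist.pdf(q)       # f_Y(Q_tau[Y])
    e_g_dot = f_q * (alpha + mu_x * beta - q) / var_y
    b_tau = -beta             # from F_{Y|X}(Q|x) = G(Q - alpha - x*beta)
    return s_dot / f_q * sigma_x ** 2 * b_tau ** 2 * e_g_dot

def scale_effect_final(tau, beta, sigma_x, s_dot):
    """Final expression: -s_dot * sigma^2 beta^2 / sqrt(sigma^2 beta^2 + 1) * Q_tau[U]."""
    var_y = sigma_x ** 2 * beta ** 2 + 1.0
    q_u = NormalDist().inv_cdf(tau)  # Q_tau[U], U standard normal
    return -s_dot * sigma_x ** 2 * beta ** 2 / math.sqrt(var_y) * q_u

if __name__ == "__main__":
    print(scale_effect_via_s1(0.8, 1.0, 0.6, 2.0, 1.5, 0.7),
          scale_effect_final(0.8, 0.6, 1.5, 0.7))
```

Setting σX = 1 in the final expression gives −ṡ(0)β²/√(β² + 1)·Qτ[U], which matches the scale effect obtained in Example 4.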