Location-Scale and Compensated Effects in Unconditional Quantile Regressions*
Julian Martinez-Iriarte† Gabriel Montes-Rojas‡ Yixiao Sun§

January 6, 2022

Abstract

This paper proposes an extension of the unconditional quantile regression analysis to (i)
location-scale shifts, and (ii) compensated shifts. The first case is intended to study a counter-
factual policy analysis aimed at increasing not only the mean or location of a covariate but also
its dispersion or scale. The compensated shift refers to a situation where a shift in a covariate
is compensated at a certain rate by another covariate. Not accounting for these possible scale
or compensated effects will result in an incorrect assessment of the potential policy effects
on the quantiles of an outcome variable. More general interventions and compensated shifts
are also considered. The unconditional policy parameters are estimated with simple semi-
parametric estimators, for which asymptotic properties are studied. Monte Carlo simulations
are implemented to study their finite sample performances, and the proposed approach is
applied to a Mincer equation to study the effects of a location-scale shift in education on the
unconditional quantiles of wages.

Keywords: Quantile regression, unconditional policy effect, unconditional regression.

JEL: J01, J31.

* For helpful conversations, we thank Javier Alejo, George Bulman and Augusto Nieto-Barthaburu. All errors
remain our own.
† Department of Economics, UC Santa Cruz. E-mail: [email protected]
‡ CONICET and Universidad de Buenos Aires. E-mail: [email protected]
§ Department of Economics, UC San Diego. E-mail: [email protected]

1 Introduction
In many research areas, it is important to assess the distributional effects of covariates on an
outcome variable. Several methods have been implemented in the literature to study this. A
prolific line of research is a combination of conditional mean and quantile regression models
together with micro simulation exercises, as in Autor, Katz, and Kearney (2005), Machado and
Mata (1995), and Melly (2005) (see Fortin, Lemieux, and Firpo (2011) for a review). A more recent
and popular method is the recentered influence function (RIF) regression of Firpo, Fortin, and
Lemieux (2009), which directly estimates the effect of a covariate change on a functional of the
unconditional distribution of the outcome variable. The functional of interest can be the mean,
quantile, or any other aspect of the unconditional distribution.
Consider, as an example, the unconditional quantile of the outcome variable Y. Let FY be the
unconditional distribution function of Y, then the τ-quantile of FY is defined by

Qτ [Y ] := arg min{q : τ ≤ FY (q)} for τ ∈ (0, 1).

We seek to study how Qτ [Y ] changes when we induce an infinitesimal change in a covariate X ∈ R,
allowing the presence of other observable covariates W and unobservable covariates collected
in U. These covariates and the outcome variable are related via a structural or causal function
h so that Y = h( X, W, U ). We consider a sequence of policy experiments that change X into
Xδ = G( X; δ) for a smooth function G(·; ·). The policy experiments are indexed by δ satisfying
G( X; 0) = X. That is, δ = 0 corresponds to the status quo policy. With this induced change in
X, the outcome variable becomes Yδ = h( Xδ , W, U ) = h(G( X; δ), W, U ) where the distribution
of ( X, W, U ) is held constant. Our policy experiment has a ceteris paribus interpretation at the
population level: we change X into Xδ while holding the stochastic dependence among X, W,
and U constant. Such a policy experiment is implementable if the covariate X is not a causal
factor for either W or U. In this case, when we intervene on X and change it into Xδ , W and U
will not change. The main parameter of interest is the marginal effect of the change on the
unconditional quantile of the outcome variable:

\[ \Pi_\tau := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta] - Q_\tau[Y]}{\delta}. \]

Firpo, Fortin, and Lemieux (2009) consider a pure location shift Xδ = X + δ. This shift affects
the entire unconditional distribution of Y = h( X, W, U ), moving it towards a counterfactual
distribution of Yδ = h( Xδ , W, U ). One of the main results in Firpo, Fortin, and Lemieux (2009,
p.958, eq. (6)) is that Πτ can be represented as an average derivative:

\[ \Pi_\tau = E\left[\dot\psi_x(X, W)\right], \]

where
\[ \dot\psi_x(x, w) = \frac{\partial E\left[\psi(Y, \tau, F_Y) \mid X = x, W = w\right]}{\partial x}, \]
ψ (y, τ, FY ) = [τ − 1 {y ≤ Qτ [Y ]}] / f Y ( Qτ [Y ]) is the influence function of the quantile functional,
and f Y ( Qτ [Y ]) is the unconditional density of Y evaluated at the τ-quantile Qτ [Y ]. The uncon-
ditional quantile effect Πτ can then be estimated by first running an unconditional quantile
regression (henceforth, UQR), which involves regressing the influence function ψ (Yi , τ, FY ) on
the covariates ( Xi , Wi ) and then taking an average of the partial derivatives of the regression
function with respect to X.
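
The two-step procedure just described can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses a linear specification for the regression of the influence function on a single covariate (so the average partial derivative is simply the OLS slope), a Gaussian kernel with a rule-of-thumb bandwidth for the density, and simulated data from Y = βX + U with standard normal X and U. The function names (`sample_quantile`, `kde_at`, `uqr_location_effect`, `simulate`) are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_quantile(values, tau):
    """Smallest order statistic q whose empirical CDF satisfies F(q) >= tau."""
    ordered = sorted(values)
    k = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[k]


def kde_at(values, point):
    """Gaussian kernel density estimate at one point (rule-of-thumb bandwidth)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((point - v) / h) ** 2) for v in values)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def uqr_location_effect(x, y, tau):
    """OLS slope of the estimated influence function psi(Y_i) on X_i.

    Assumption: the regression function is linear in x, so its average
    partial derivative with respect to x equals this slope.
    """
    q = sample_quantile(y, tau)
    f_q = kde_at(y, q)
    psi = [(tau - (1.0 if yi <= q else 0.0)) / f_q for yi in y]
    x_bar = sum(x) / len(x)
    psi_bar = sum(psi) / len(psi)
    cov = sum((xi - x_bar) * (p - psi_bar) for xi, p in zip(x, psi))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


def simulate(n, beta, seed):
    """Draw (X, Y) with Y = beta * X + U and X, U independent N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


x, y = simulate(n=20000, beta=1.0, seed=0)
print(uqr_location_effect(x, y, tau=0.5))  # expected to be close to beta
```

In this simulated design the location effect equals β at every quantile, so the printed estimate should be near 1 up to sampling and smoothing error. With several covariates or a nonlinear specification, the slope would be replaced by an average of estimated partial derivatives.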
The same method is applicable to other functionals of interest — we only need to replace
ψ (Yi , τ, FY ) by the influence function underlying the functional we care about. This leads to
the general RIF regression of Firpo, Fortin, and Lemieux (2009). The potential simplicity and
flexibility that the methodology offers motivate subsequent research to expand the use of RIF re-
gressions. On the empirical side, after its introduction, RIF regressions became a popular method
for analyzing and identifying the distributional effects on outcomes in terms of changes in ob-
served characteristics in areas such as labor economics, income and inequality, health economics,
and public policy. On the theoretical side, the RIF type of regression has been used to study
the effect of a change in a discrete covariate.1 More recent research on UQR and RIF regres-
sions includes the high-dimensional setting of Sasaki, Ura, and Zhang (2020) and the two-sample
problem of Inoue, Li, and Xu (2021).
This paper extends the UQR and RIF regression in several ways. First, we allow simultaneous
location and scale shifts in a continuous covariate. The main goal is to study a case where a
counterfactual policy analysis aiming at increasing the location or the mean of a covariate might
also affect its dispersion. For example, we may consider Xδ = X/(1 + δ) + δ. We find that in this
case, the marginal effect has a closed-form expression. In order to interpret the scale effect, we
introduce the quantile-standard deviation elasticity: the percentage change in the unconditional
quantiles of the outcome associated with a 1% change in the standard deviation of the target
covariate.
Second, we consider the case of compensated location changes in two covariates. This hap-
pens when a location shift in one covariate induces a location shift in another covariate. For
example, Y = h ( X1 , X2 , W, U ) for two scalar target covariates X1 and X2 , and the policy induces
X1δ = X1 + δ and X2δ = X2 − δ. We show that the compensated effect can be obtained as a linear
combination of individual effects obtained by considering one change at a time.
Third, while we focus mainly on location-scale and compensated location shifts, we consider a
general framework that includes these two types of shifts as special cases. In fact, our framework
allows for any smooth and invertible intervention of the target covariates.
Fourth, we allow the target covariates to be endogenous, and we characterize the asymptotic
bias of the unconditional effect estimator when the endogeneity is not appropriately accounted
for. We eliminate the endogeneity bias using a control function approach.
1 In such a case, we may consider a shift in the probability mass function. The discrete case was initially studied by
Firpo, Fortin, and Lemieux (2009). See Rothe (2012), Martinez-Iriarte (2020), and Martinez-Iriarte and Sun (2021b) for
further studies.
Fifth, as a complement to the existing literature that focuses on changing the marginal distribu-
tion of the target covariates, we consider changing the values of the target covariates directly. An
advantage of our approach is that the changes under consideration are directly implementable.
We note that it may not be easy to induce a desired shift in the marginal distribution, and when
possible, such a shift is often achieved via transforming the target covariates, which is what we
consider here.
Finally, we propose consistent and asymptotically normal semiparametric estimators of the
location-scale effect and the compensated effect. The estimators can be easily implemented in
empirical work using either a probit or logit specification of the conditional distribution func-
tion. We conduct an extensive Monte Carlo study evaluating the finite sample performances of
the location-scale effect estimator and the accuracy of the normal approximation. Simulation
results show that the estimator works reasonably well under different specifications and that the
standard normal distribution provides a good approximation to the finite sample distribution of
a studentized test statistic introduced in this paper.
As potential applications of our proposed approach, consider the following empirical exam-
ples to motivate its use.

Example 1. Effect of increasing education on wage inequality. In a Mincer equation, log wages Y are
modeled as a function of certain observable covariates such as education. A study of the effect of a shift in
education on wage inequality could be implemented using our proposed framework. We can accommodate
a counterfactual policy experiment where there may be not only a general increase in education but also a
change in its dispersion.

Example 2. Trade integration and skill distribution. Gu, Malik, Pozzoli, and Rocha (2019) document
the impact of trade integration on both the mean and the standard deviation of the skill distribution across
municipalities in Denmark. Moreover, as argued by Hanushek and Woessmann (2008), skills are related
to the income distribution. Thus, a quantification of the impact of a scale effect in the skills distribution on
the quantiles of the income distribution appears to be relevant.

Example 3. Financial return and risk. Consider the study of two assets X and W in a portfolio in-
vestment framework with stochastic returns Y. We are interested in how changes in the returns of asset
X affect the distribution of Y through its unconditional quantiles. A typical exercise involves analyzing
changes in the returns (location) and risk (scale) of X. Ignoring the structural interpretations if identifi-
cation fails, we can still use the proposed framework to decompose the relative contribution of each effect.
This could be applied to Value-at-Risk models; see, for instance, Engle and Manganelli (2004).

We illustrate the proposed method with an empirical application related to Example 1: the
effect of changing education on wage inequality, decomposing it into location and scale effects.
Empirical results reveal the contrasting nature of the two effects. The location effects are seen to
be positive and relatively similar across quantiles. On the other hand, the scale effects are highly
heterogeneous and monotonically decreasing across quantiles. Hence, the scale effects can more
than offset the location effects. This shows that not accounting for both shifts may result in a
biased assessment of the policy effects on the quantiles of the outcome variable.
The paper is organized as follows. Section 2 defines and studies the location-scale marginal
effects in one covariate. Section 3 proposes and studies a compensated change in two covari-
ates. Section 4 describes the estimators of the location-scale effect and the compensated effect
and studies their asymptotic properties. Section 5 reports the finite sample performance of the
location-scale effect estimator and the associated tests, and Section 6 presents the empirical ap-
plication. Section 7 concludes. The proofs are in the Appendix. Calculation details for two
theoretical examples are given in the Supplementary Appendix.
A word on notation: we use FY |X (y| x ) and f Y |X (y| x ) to denote the cumulative distribution
function and the probability density function of Y, respectively, conditional on X = x. For a
random variable Z, the unconditional τ-quantile is denoted by Qτ [ Z ], i.e., Pr( Z ≤ Qτ [ Z ]) = τ.
For a pair of random variables Z1 and Z2 , the conditional quantile is denoted by Qτ [ Z1 |z2 ], i.e.,
Pr( Z1 ≤ Qτ [ Z1 |z2 ]| Z2 = z2 ) = τ. We adopt the following notational conventions:

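\[ \frac{\partial E(Z \mid X)}{\partial X} = \left.\frac{\partial E(Z \mid X = x)}{\partial x}\right|_{x = X}, \qquad \frac{\partial F_{Z|X}(z \mid X)}{\partial X} = \left.\frac{\partial F_{Z|X}(z \mid X = x)}{\partial x}\right|_{x = X}. \]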

2 Location-scale marginal effects


2.1 Basic setting and main results

We start with a general structural model Y = h( X, W, U ), where the function h is unknown, and
we only observe ( X, W ) and Y. Here X is univariate and is our target variable. The dimension of
W is left unrestricted, and U collects all unobserved causal factors of Y. Consider the following
location-scale shift of X,
\[ X_\delta = \frac{X - \mu}{s(\delta)} + \mu + \ell(\delta). \tag{1} \]
Here, µ is a known parameter, ℓ(δ) is the location shift, and s(δ) > 0 is the scale shift.2 Under (1)
with µ = µX for µX := E(X), we have E[Xδ] = µ + ℓ(δ) and the variance is V[Xδ] = s(δ)^(−2) V[X].
In this case, ℓ(δ) affects only the location, and s(δ) affects only the scale. When µ ≠ µX, then
E[Xδ] = µ + ℓ(δ) + s(δ)^(−1) [E(X) − µ] and V[Xδ] = s(δ)^(−2) V[X]. In this case, s(δ) affects both the
location and the scale. We allow for a general µ that includes, for example, µ = 0 and µ = µX as
special cases.
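
As a quick numerical illustration of these moment formulas, the following sketch applies the shift in equation (1) to a simulated sample. The sample itself, the choice s(δ) = 1 + δ and ℓ(δ) = δ, and the helper name `location_scale_shift` are assumptions made only for this example; µ is set to the sample mean so that it plays the role of µX.

```python
import random
import statistics


def location_scale_shift(x, mu, s, ell):
    """Apply X_delta = (X - mu) / s + mu + ell to every observation."""
    return [(xi - mu) / s + mu + ell for xi in x]


rng = random.Random(0)
x = [rng.gauss(2.0, 3.0) for _ in range(5000)]   # illustrative sample
mu_x = statistics.fmean(x)                       # sample analogue of E(X)

delta = 0.1
s, ell = 1.0 + delta, delta                      # assumed s(d) = 1 + d, l(d) = d

x_delta = location_scale_shift(x, mu_x, s, ell)

# With mu equal to the mean, the location moves by ell only
# and the variance is rescaled by s ** -2.
mean_gap = statistics.fmean(x_delta) - (mu_x + ell)
var_ratio = statistics.pvariance(x_delta) / statistics.pvariance(x)
print(mean_gap, var_ratio, s ** -2)

# With mu = 0 instead, the scale shift also moves the mean:
# E[X_delta] = ell + E(X) / s.
x_delta_zero = location_scale_shift(x, 0.0, s, ell)
mean_zero_gap = statistics.fmean(x_delta_zero) - (ell + mu_x / s)
print(mean_zero_gap)
```

Both gaps are zero up to floating-point error, and the variance ratio matches s(δ)^(−2), in line with the expressions above.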
We view s(δ) and ℓ(δ) as functions of the scalar δ, and assume that they are continuously
differentiable. We further assume that s(0) = 1 and ℓ(0) = 0 so that X0 = X. The case studied
by Firpo, Fortin, and Lemieux (2009) amounts to setting s(δ) ≡ 1 and ℓ(δ) = δ, and thus does
not account for the scale effect and is independent of the choice of µ. To include the scale effect,
we could set s(δ) = 1 + δ and ℓ(δ) = δ. A special case of this model is the case with only a scale
shift (i.e., ℓ(δ) = 0) so that Xδ = (X − µ)/s(δ) + µ.
2 µ is given by the policy maker or calibrated. Note that if Qτ [X] is the τ-quantile of X, then
\[ Q_\tau[X_\delta] = \frac{Q_\tau[X] - \mu}{s(\delta)} + \mu + \ell(\delta). \]
To allow for a more general policy function that includes the location-scale shift in (1) as a
special case, we consider the intervention:

Xδ = G( X; δ)

for some smooth function G(·; ·) that is invertible in its first argument. We will refer to G (·; ·) as
the policy function. We want to compare the quantiles of

Y = h( X, W, U ) (2)

to the quantiles of
Yδ = h( Xδ , W, U ) = h(G( X; δ), W, U ), (3)

where the distribution of (X, W, U) in (3) is held the same as that in (2). To understand the latter
condition, we can consider two parallel worlds: the worlds before and after the intervention.
For each given δ, let $G^{-1}(x; \delta)$ be the inverse function of G(x; δ) such that $G(G^{-1}(x; \delta); \delta) = x$.
After applying the inverse transform to the target covariate in the post-intervention world, the
distribution of $(G^{-1}(X_\delta; \delta), W_\delta, U_\delta)$ in the post-intervention world is assumed to be the same as
that of (X, W, U) in the pre-intervention world. Here, no change is induced on W and U and so
$(W_\delta, U_\delta)$ is actually the same as (W, U) for every individual in the population.
Formally, our parameter of interest, the marginal effect for the τ-quantile, is defined as

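\[ \Pi_\tau := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta] - Q_\tau[Y]}{\delta}, \]

whenever this limit exists. For the location-scale shift that depends on µ, we write Πτ as $\Pi^{\mu}_{\tau}$.
For notational economy, we write $x^\delta = G^{-1}(x; \delta)$. Then Xδ = x if and only if X = $x^\delta$. Define
the Jacobian of the inverse transform $x \mapsto x^\delta := G^{-1}(x; \delta)$ as

\[ J(x^\delta; \delta) := \frac{\partial x^\delta}{\partial x} = \left[ \left.\frac{\partial G(x; \delta)}{\partial x}\right|_{x = x^\delta} \right]^{-1}. \]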

Then, the joint probability density functions of the covariate vector before and after the interven-
tion satisfy
f Xδ ,W ( x, w) = J ( x δ ; δ) · f X,W ( x δ , w).
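
The change-of-variables identity can be checked numerically in a case where the density of Xδ is known in closed form. The sketch below makes several simplifying assumptions that are not in the text: there is no W, X is normal, and the policy is the location-scale shift with s(δ) = 1 + δ and ℓ(δ) = δ, so that Xδ is again normal. All function names are hypothetical.

```python
from statistics import NormalDist


def inverse_policy(x, delta, mu):
    """x^delta = G^{-1}(x; delta) for G(x; d) = (x - mu) / s(d) + mu + l(d)."""
    s, ell = 1.0 + delta, delta      # assumed s(d) = 1 + d, l(d) = d
    return s * (x - mu - ell) + mu


def jacobian(delta):
    """J = d x^delta / d x, which equals s(delta) for this policy."""
    return 1.0 + delta


def shifted_density_direct(x, delta, mu, dist_x):
    """Density of X_delta, using that a shifted and rescaled normal is normal."""
    s, ell = 1.0 + delta, delta
    shifted = NormalDist((dist_x.mean - mu) / s + mu + ell, dist_x.stdev / s)
    return shifted.pdf(x)


def shifted_density_formula(x, delta, mu, dist_x):
    """Right-hand side of the identity: J(x^delta; delta) * f_X(x^delta)."""
    return jacobian(delta) * dist_x.pdf(inverse_policy(x, delta, mu))


dist_x = NormalDist(2.0, 1.5)        # illustrative distribution of X
for point in (-1.0, 0.5, 3.0):
    direct = shifted_density_direct(point, 0.2, 1.0, dist_x)
    formula = shifted_density_formula(point, 0.2, 1.0, dist_x)
    print(point, direct, formula)
```

The two columns agree up to rounding, which is what the identity predicts when the conditioning on W is dropped.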

For ε > 0, define Nε := {δ : |δ| ≤ ε}. We maintain the following assumption.

Assumption 1. (i.a) For some ε > 0, G(x; δ) is continuously differentiable on $\mathcal{X} \otimes N_\varepsilon$, where $\mathcal{X}$ is the
support of X.
(i.b) G ( x; δ) is strictly increasing in x for each δ ∈ Nε .
(i.c) G ( x; 0) = x for all x ∈ X .
(ii) for δ ∈ Nε , the conditional density of U satisfies f U |Xδ ,W (u| x, w) = f U |X,W (u| x δ , w), and the
support U of U given X and W does not depend on ( X, W ) .
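(iii.a) $x \mapsto f_{X,W}(x, w)$ is continuously differentiable for all $w \in \mathcal{W}$ and
\[ \int_{\mathcal{W}} \int_{\mathcal{X}} \sup_{\delta \in N_\varepsilon} \left| \frac{\partial \left[ J(x^\delta; \delta) f_{X,W}(x^\delta, w) \right]}{\partial \delta} \right| dx\, dw < \infty, \]
where $\mathcal{W}$ is the support of W.
(iii.b) $x \mapsto f_{U|X,W}(u \mid x, w)$ is continuously differentiable for all (u, w) and
\[ \int_{\mathcal{W}} \int_{\mathcal{X}} \int_{\mathcal{U}} \sup_{\delta \in N_\varepsilon} \left| \frac{\partial}{\partial \delta} \left[ f_{U|X,W}(u \mid x^\delta, w) f_{X,W}(x^\delta, w) \right] \right| du\, dx\, dw < \infty, \]
\[ \int_{\mathcal{W}} \int_{\mathcal{X}} \int_{\mathcal{U}} \sup_{\delta \in N_\varepsilon} \left| \frac{\partial f_{X,W}(x^\delta, w)}{\partial \delta} \right| f_{U|X,W}(u \mid x, w)\, du\, dx\, dw < \infty. \]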

(iv) f X,W ( x, w) is equal to 0 on the boundary of the support of X given W = w for all w ∈ W .
(v) f Y ( Qτ [Y ]) > 0.

Remark 1. Assumption 1(i) imposes some restrictions on the policy function G ( x; δ) . It is reasonable
that G ( x; δ) is strictly increasing in x, as non-monotonic and non-invertible functions do not seem to
be practically relevant. The strictly increasing property implies that J ( x; δ) > 0 for all x ∈ X and
δ ∈ Nε . The condition that G ( x; 0) = x says that there is no intervention when δ = 0, and it implies
that J ( x; 0) = 1 for all x ∈ X . Assumption 1(ii) assumes that how U depends on the covariate vector is
maintained when we induce a change in the covariate vector. Note that Assumption 1(ii) is different from
$f_{U|X_\delta,W}(u \mid x, w) = f_{U|X,W}(u \mid x, w)$, which in general cannot hold when U depends on X and W. The
counterfactual model in (3) says that we maintain the structure of the causal system. Assumption 1(ii)
says that we also maintain how the unobservable depends on the observables. As discussed above, we also
implicitly assume that $(G^{-1}(X_\delta; \delta), W_\delta)$ has the same distribution as (X, W). The rest of Assumption 1
consists of regularity conditions.

Remark 2. Assumption 1 does not assume that U is independent of ( X, W ) . It does not assume that
U is conditionally independent of X given W either. Assumption 2 below will impose identification
assumptions.

The following theorem characterizes the effects of the policy change on the distribution of Yδ
and its quantiles.

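Theorem 1. Let Assumption 1 hold.

(i) For each $(x, w) \in \mathcal{X} \otimes \mathcal{W}$,

\[ \lim_{\delta \to 0} \frac{f_{X_\delta,W}(x, w) - f_{X,W}(x, w)}{\delta} = \frac{\partial}{\partial x} \left[ \kappa(x) f_{X,W}(x, w) \right], \]

where
\[ \kappa(x) := \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta = 0} = -\left.\frac{\partial G(x; \delta)}{\partial \delta}\right|_{\delta = 0}. \]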
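(ii) As δ → 0, we have

\[ \frac{F_{Y_\delta}(y) - F_Y(y)}{\delta} \to E\left[ \left( -\frac{\partial F_{Y|X,W}(y \mid X, W)}{\partial X} + \frac{\partial \ln f_{U|X,W}(U \mid X, W)}{\partial X}\, 1\{h(X, W, U) \le y\} \right) \kappa(X) \right] \]

uniformly in $y \in \mathcal{Y}$, the support of Y.

(iii) The marginal effect of the intervention Xδ = G(X; δ) on the τ-quantile of the outcome variable Y
can be represented by
\[ \Pi_\tau = A_\tau - B_\tau, \tag{4} \]

where

\[ A_\tau = -E\left[ \frac{\partial E[\psi(Y, \tau, F_Y) \mid X, W]}{\partial X}\, \kappa(X) \right], \]
\[ B_\tau = -E\left[ \psi(Y, \tau, F_Y)\, \frac{\partial \ln f_{U|X,W}(U \mid X, W)}{\partial X}\, \kappa(X) \right], \]

and
\[ \psi(y, \tau, F_Y) = \frac{\tau - 1\{y \le Q_\tau[Y]\}}{f_Y(Q_\tau[Y])}. \]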
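Remark 3. To understand Theorem 1(i), we can write

\[ f_{X_\delta,W}(x, w) - f_{X,W}(x, w) = \left[ f_{X_\delta,W}(x, w) - f_{X,W}(x^\delta, w) \right] + \left[ f_{X,W}(x^\delta, w) - f_{X,W}(x, w) \right]. \]

It is quite intuitive that the second term is approximately $\delta \cdot \kappa(x) \cdot \partial f_{X,W}(x, w)/\partial x$ when δ is small. For the first
term, we note that Xδ = x if and only if X = $x^\delta$, and so this term reflects the effect from the Jacobian of the
transformation. Indeed, $f_{X_\delta,W}(x, w) - f_{X,W}(x^\delta, w) = \left[ J(x^\delta; \delta) - J(x^\delta; 0) \right] f_{X,W}(x^\delta, w)$ as $J(x^\delta; 0) = 1$.
The first term is then approximately equal to $\delta \cdot f_{X,W}(x, w) \cdot \left.\partial J(x, \delta)/\partial \delta\right|_{\delta = 0}$. But

\[ \left.\frac{\partial J(x, \delta)}{\partial \delta}\right|_{\delta = 0} = \left.\frac{\partial}{\partial \delta} \frac{\partial x^\delta}{\partial x}\right|_{\delta = 0} = \frac{\partial}{\partial x} \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta = 0} = \frac{\partial \kappa(x)}{\partial x} \]

and hence the first term is approximately $\delta \cdot f_{X,W}(x, w) \cdot \partial \kappa(x)/\partial x$. Combining these two approximations yields
Theorem 1(i).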

Remark 4. By definition, κ(x) measures the marginal change of the inverse function $G^{-1}(x; \delta)$ as we
increase δ from zero infinitesimally. Since G(x; 0) = x, Theorem 1(i) shows that κ(x) is equal to the
negative of the marginal change of G(x; δ) at δ = 0. Theorems 1(ii) and (iii) show that only κ(x) appears
in the marginal effect and the Jacobian does not. This is not surprising, as what matters for the marginal
effect is the marginal change in the policy function.
Remark 5. Theorem 1(iii) represents the structural parameter Πτ in terms of statistical objects. While the
first term Aτ is identifiable, the second term Bτ , which involves the conditional density of U given X and
W, is not. If we use Âτ, a consistent estimator of Aτ, as an estimator of Πτ, then the second term Bτ is the
asymptotic bias of Âτ. Similar results have been established in Martinez-Iriarte and Sun (2021a) but only
for location changes. Without an identification condition such as the one given in Assumption
2 below, Theorem 1(iii) allows us to bound Bτ and infer the range of the policy
effect, or to conduct a sensitivity analysis similar to that in Martinez-Iriarte (2020).
Remark 6. While the paper focuses on the quantile functional, Theorem 1(iii) is formulated in a general
way. The result holds for any Hadamard differentiable functional and for the mean functional. We only
need to replace ψ (y, τ, FY ) by the influence function of the functional that we are interested in. For
example, for the mean functional, we can replace ψ (y, τ, FY ) by y − E(Y ), and Theorem 1(iii) remains
valid.
To identify Πτ , we make the following independence or conditional independence assump-
tion.
Assumption 2. For δ ∈ Nε , the unobservable U satisfies either f U |X,W (u| x, w) = f U |X,W (u| x δ , w) =
f U (u) or f U |X,W (u| x, w) = f U |X,W (u| x δ , w) = f U |W (u|w).
Under the above assumption, ∂ ln f U |X,W (u| x, w)/∂x = 0 and the second term Bτ in (4) van-
ishes. In this case, Πτ = Aτ and hence is identified.
For the location-scale shift given in (1), we have

\[ \kappa(x) = \dot{s}(0)(x - \mu) - \dot{\ell}(0), \]

where $\dot{s}(\delta) = ds(\delta)/d\delta$ and $\dot{\ell}(\delta) = d\ell(\delta)/d\delta$. The corollary below then follows directly from
Theorem 1(iii).
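
The expression for κ(x) can be verified by differentiating the policy function numerically, since κ(x) equals minus the derivative of G(x; δ) with respect to δ at δ = 0. The sketch below assumes the illustrative choice s(δ) = 1 + δ and ℓ(δ) = δ, so that both derivatives at zero equal one; the function names are hypothetical.

```python
def policy(x, delta, mu):
    """G(x; delta) = (x - mu) / s(delta) + mu + l(delta), with the assumed
    choice s(d) = 1 + d and l(d) = d."""
    return (x - mu) / (1.0 + delta) + mu + delta


def kappa_closed_form(x, mu, s_dot=1.0, ell_dot=1.0):
    """kappa(x) = s'(0) (x - mu) - l'(0)."""
    return s_dot * (x - mu) - ell_dot


def kappa_numeric(x, mu, step=1e-6):
    """Minus the central finite difference of G(x; delta) in delta at 0."""
    return -(policy(x, step, mu) - policy(x, -step, mu)) / (2.0 * step)


for point in (-2.0, 0.0, 3.5):
    print(point, kappa_closed_form(point, mu=1.0), kappa_numeric(point, mu=1.0))
```

For other choices of s(δ) and ℓ(δ), only `policy` and the two derivative arguments of `kappa_closed_form` would need to change.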
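Corollary 1. Let Assumption 1 hold with Assumption 1(ii) strengthened to Assumption 2. Then

\[ \Pi_\tau = -E\left[ \frac{\partial E[\psi(Y, \tau, F_Y) \mid X, W]}{\partial X}\, \kappa(X) \right] = \frac{1}{f_Y(Q_\tau[Y])} E\left[ \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid X, W)}{\partial X}\, \kappa(X) \right]. \]

For the location and scale shift in (1) with ℓ(0) = 0, s(0) = 1, and s(δ) > 0, we have

\[ \Pi^{\mu}_{\tau} = \Pi_{\tau,L} + \Pi^{\mu}_{\tau,S}, \tag{5} \]

where

\[ \Pi_{\tau,L} = -\frac{\dot{\ell}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{W}} \int_{\mathcal{X}} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid x, w)}{\partial x} f_{X,W}(x, w)\, dx\, dw, \]
\[ \Pi^{\mu}_{\tau,S} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{W}} \int_{\mathcal{X}} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid x, w)}{\partial x} (x - \mu) f_{X,W}(x, w)\, dx\, dw. \]
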
Remark 7. Both conditions in Assumption 2 require that f U |X,W (u| x, w) = f U |X,W (u| x δ , w). This is
related to the assumption in Firpo, Fortin, and Lemieux (2009, pp.955-957), framed as “maintaining the
conditional distribution of Y given X unaffected.” In essence, Firpo, Fortin, and Lemieux (2009) requires
f U |X (u| x ) = f U |X (u| x δ ). When this condition fails, we may still have f U |X,W (u| x, w) = f U |X,W (u| x δ , w).
Such a condition has also been used in Hsu, Lai, and Lieli (2020) and Spini (2021) in a context of
extrapolation to populations with different distributions of the covariates.

Remark 8. The first condition in Assumption 2 is satisfied if U is independent of (X, W). The second
condition in Assumption 2 is a conditional independence assumption, which is commonly used to
achieve identification. When W consists of only causal variables entering the causal function h( X, W, U ),
Assumption 2 may not hold. In this case, we can find control variables Wc so that

f U |X,W,Wc (u| x, w, wc ) = f U |X,W,Wc (u| x δ , w, wc ) = f U |W,Wc (u|w, wc ).

After replacing W by W ∗ = (W, Wc ), Corollary 1 continues to hold. To see this, we can write the
structural function as h∗ ( X, W ∗ , U ), but h∗ ( X, W ∗ , U ) = h( X, W, U ). That is, we include the control
variables in the structural function and restrict the structural function to be a constant function of the
control variables. With such a conceptual change, our proof goes through without any change.

Remark 9. The second part of Corollary 1 is specific to the location-scale change. The overall effect $\Pi^{\mu}_{\tau}$
can be decomposed into the sum of $\Pi_{\tau,L}$ and $\Pi^{\mu}_{\tau,S}$. Here $\Pi_{\tau,L}$ is the location effect, and is the estimand
studied by Firpo, Fortin, and Lemieux (2009) when we set $\dot{\ell}(0) = 1$ and s(δ) ≡ 1. $\Pi^{\mu}_{\tau,S}$ is the scale effect,
and it is one of the main objects of interest in this study.

Remark 10. Corollary 1 shows that the scale effect under a general µ is linearly related to the location
effect and the scale effect under the specific µ = µX:

\[ \Pi^{\mu}_{\tau,S} = \tilde{\mu}\, \Pi_{\tau,L} + \Pi^{\mu_X}_{\tau,S}, \tag{6} \]

where
\[ \tilde{\mu} = (\mu - \mu_X) \frac{\dot{s}(0)}{\dot{\ell}(0)}. \]
The slope $\tilde{\mu}$ is proportional to µ − µX and independent of τ. We will refer to $\Pi^{\mu_X}_{\tau,S}$ as the pure scale effect,
as it is not related to the location effect.

In the rest of this section, we focus on the location-scale shift. To better understand the
location and scale effects in Corollary 1, consider the case that X and U are independent and
there is no W. Then
\[ \Pi_{\tau,L} = -\frac{\dot{\ell}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{X}} \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} f_X(x)\, dx, \]
\[ \Pi^{\mu}_{\tau,S} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} \int_{\mathcal{X}} \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} (x - \mu) f_X(x)\, dx. \tag{7} \]

Define
\[ X_{\tau,F}(x) = \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} = \frac{\partial \Pr(Y \le Q_\tau[Y] \mid X = x)}{\partial x}, \]
which measures how Pr(Y ≤ Qτ [Y] | X = x) will change when we induce a small change in x. By
definition,
\[ X_{\tau,F}(x) = \lim_{\Delta \to 0} \frac{\Pr(Y \le Q_\tau[Y] \mid X = x + \Delta) - \Pr(Y \le Q_\tau[Y] \mid X = x)}{\Delta}. \tag{8} \]
Intuitively, when x is changed into x + ∆, the value of Y will cross Qτ [Y ] from above for a subset
of individuals, and the value of Y will cross Qτ [Y ] from below for another subset of individuals.
The difference in the fractions of individuals in these two subsets is the numerator of (8). Xτ,F ( x )
is then the limit value of the difference rescaled by the induced change in x.
Note that $X_{\tau,F}(x)$ is possibly a nonlinear function of x. For notational simplicity, let $X_{\tau,F} = X_{\tau,F}(X)$.
To sign the location effect and the pure scale effect, consider the best linear prediction
of $X_{\tau,F}$ using X − µX as the predictor:

\[ X_{\tau,F} = c^{*}_{0\tau} + (X - \mu_X)\, c^{*}_{1\tau} + e_\tau, \]

where E(eτ) = 0 and cov(X, eτ) = 0. By definition,

\[ c^{*}_{0\tau} = E[X_{\tau,F}] = \int_{\mathcal{X}} \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} f_X(x)\, dx, \]
\[ c^{*}_{1\tau} = \frac{\mathrm{cov}(X_{\tau,F}, X)}{\mathrm{var}(X)} = \frac{1}{\mathrm{var}(X)} \int_{\mathcal{X}} \frac{\partial F_{Y|X}(Q_\tau[Y] \mid x)}{\partial x} (x - \mu_X) f_X(x)\, dx. \]

Therefore,
\[ \Pi_{\tau,L} = -\dot{\ell}(0) \frac{1}{f_Y(Q_\tau[Y])}\, c^{*}_{0\tau} \quad \text{and} \quad \Pi^{\mu_X}_{\tau,S} = \dot{s}(0) \frac{\mathrm{var}(X)}{f_Y(Q_\tau[Y])}\, c^{*}_{1\tau}. \]
The signs of $\Pi_{\tau,L}$ and $\Pi^{\mu_X}_{\tau,S}$ can then be determined from the signs of the best predictive intercept
and slope coefficient.
To sign the location effect $\Pi_{\tau,L}$, we can assess whether Pr(Y ≤ Qτ [Y] | X = x) is increasing in
x or not. If $\dot{\ell}(0) > 0$ and Pr(Y ≤ Qτ [Y] | X = x) is increasing in x on average, more precisely,
$E[X_{\tau,F}] \ge 0$, then $\Pi_{\tau,L} \le 0$. As an example, consider the case that h(x, u) is decreasing in x for
each u. In this case, Pr(Y ≤ Qτ [Y] | X = x) is increasing in x for all $x \in \mathcal{X}$, and so $\Pi_{\tau,L} \le 0$ if
$\dot{\ell}(0) > 0$.
It is a bit more challenging to sign the pure scale effect $\Pi^{\mu_X}_{\tau,S}$. The best linear predictive
coefficient $c^{*}_{1\tau}$ depends not only on the functional form of $X_{\tau,F}(x)$ but also on the distribution of
X. We consider two examples below.

Example 4. Normal Location Model. Consider a typical linear model Y = α + Xβ + U, where X and
U are independent N(0, 1). We have $\Pi_{\tau,L} = \dot{\ell}(0)\beta$ and

\[ \Pi^{\mu_X}_{\tau,S} = -\dot{s}(0) \frac{\beta^2}{\beta^2 + 1} \left( Q_\tau[Y] - \alpha \right) = -\dot{s}(0) \frac{\beta^2}{\sqrt{\beta^2 + 1}}\, Q_\tau[U]. \]

See Section S.1 in the Supplementary Appendix for details. While the location effect is constant across τ,
the pure scale effect varies across quantiles and does not depend on the sign of β. The coefficient on the scale
effect (i.e., β²/(β² + 1)) has a “signal-to-noise-ratio” interpretation. Indeed, $\Pi^{\mu_X}_{\tau,S} = -\dot{s}(0) E[X\beta \mid Y = Q_\tau[Y]]$.
See Theorem 2 below. This can be regarded as an inverse prediction problem. Given Y = Qτ [Y],
we want to predict or extract the signal Xβ. The predictive coefficient, given by var(Xβ)/var(Y), is
precisely β²/(β² + 1).
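
Because Yδ is normal in this example, its quantiles are available exactly, and the two closed-form effects can be compared with finite-difference derivatives of Qτ [Yδ]. The sketch below is an illustration only: it assumes µ = 0, a pure location shift ℓ(δ) = δ for the location effect, and a pure scale shift s(δ) = 1 + δ for the scale effect, so that both derivatives at zero equal one. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quantile_y(tau, alpha, beta, s=1.0, ell=0.0):
    """Exact tau-quantile of alpha + beta * (X / s + ell) + U,
    where X and U are independent N(0, 1)."""
    sd = math.sqrt((beta / s) ** 2 + 1.0)
    return alpha + beta * ell + sd * STD_NORMAL.inv_cdf(tau)


def numeric_effects(tau, alpha, beta, step=1e-6):
    """Forward finite differences of the quantile in delta at delta = 0."""
    base = quantile_y(tau, alpha, beta)
    location = (quantile_y(tau, alpha, beta, ell=step) - base) / step
    scale = (quantile_y(tau, alpha, beta, s=1.0 + step) - base) / step
    return location, scale


def closed_form_effects(tau, beta):
    """Location and pure scale effects with l'(0) = 1 and s'(0) = 1."""
    location = beta
    scale = -beta ** 2 / math.sqrt(beta ** 2 + 1.0) * STD_NORMAL.inv_cdf(tau)
    return location, scale


print(numeric_effects(0.9, alpha=0.5, beta=-2.0))
print(closed_form_effects(0.9, beta=-2.0))
```

Flipping the sign of `beta` changes the location effect but leaves the scale effect unchanged, which matches the discussion above.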

The next example represents the pure scale effect under increasingly restrictive assumptions,
culminating with a generalization of Example 4. Details are given in Section S.2 in the Supple-
mentary Appendix.

Example 5. Normal Covariates. Consider the linear model Y = α + Xβ + U where X and U are
independent. Suppose we only assume that $X \sim N(\mu_X, \sigma_X^2)$. We can use Stein’s lemma (see, for example,
Casella and Berger (2001, pp.124-125) and references therein) to gain some insight on the pure scale effect.
Stein’s lemma states that for a differentiable function m such that $E[|m'(X)|] < \infty$, $E[m(X)(X - \mu_X)] = \sigma_X^2 E[m'(X)]$
whenever $X \sim N(\mu_X, \sigma_X^2)$. Taking $m(x) = X_{\tau,F}(x)$ and using Stein’s lemma, we can
express the pure scale effect as

\[ \Pi^{\mu_X}_{\tau,S} = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])} E\left[ X_{\tau,F}(X)(X - \mu_X) \right] = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])}\, \sigma_X^2\, E\left[ \frac{\partial X_{\tau,F}(X)}{\partial X} \right] = \frac{\dot{s}(0)}{f_Y(Q_\tau[Y])}\, \sigma_X^2\, E\left[ \frac{\partial^2 F_{Y|X}(Q_\tau[Y] \mid X)}{\partial X^2} \right]. \]

Therefore, when X is normal and $\dot{s}(0) > 0$, the pure scale effect is non-negative (non-positive) if $F_{Y|X}(Q_\tau[Y] \mid x)$
is a convex (concave) function of x. It is interesting to see that the location effect depends on the first-order
derivative of $F_{Y|X}(Q_\tau[Y] \mid x)$ while the pure scale effect depends on its second-order derivative.
If $F_{Y|X}(Q_\tau[Y] \mid x) = G_\tau(a_\tau + b_\tau x)$ for some link function $G_\tau$ and parameters $a_\tau$ and $b_\tau$, then

\[ X_{\tau,F}(x) = b_\tau\, \dot{G}_\tau(a_\tau + b_\tau x) \]

and
\[ \frac{\partial X_{\tau,F}(x)}{\partial x} = b_\tau^2\, \ddot{G}_\tau(a_\tau + b_\tau x), \]
where $\dot{G}_\tau$ and $\ddot{G}_\tau$ are the first-order and second-order derivatives of $G_\tau$. Hence

\[ \Pi^{\mu_X}_{\tau,S} = \dot{s}(0) \frac{\sigma_X^2 b_\tau^2}{f_Y(Q_\tau[Y])}\, E\left[ \ddot{G}_\tau(a_\tau + b_\tau X) \right]. \]

For example, when U is also normal so that $G_\tau$ is the standard normal cumulative distribution function
(cdf), we obtain a generalization of Example 4:

\[ \Pi^{\mu_X}_{\tau,S} = -\dot{s}(0) \frac{\sigma_X^2 \beta^2}{\sqrt{\sigma_X^2 \beta^2 + 1}}\, Q_\tau[U]. \]

The above result reduces to that of Example 4 when we set $\sigma_X^2 = 1$. Regardless of whether $\sigma_X^2 = 1$, $\Pi^{\mu_X}_{\tau,S}$
does not depend on µX. Thus, the mean of X does not play a role in the pure scale effect.
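
The closed form for normal X and U can be cross-checked against the integral representation in (7). The sketch below evaluates that integral at µ = µX with a simple trapezoid rule, assuming U is standard normal and s'(0) = 1; the grid size, the truncation at ten standard deviations, and the function names are choices made only for this illustration.

```python
import math
from statistics import NormalDist


def pure_scale_effect_quadrature(tau, alpha, beta, mu_x, sigma_x, grid=4000):
    """Evaluate the integral in (7) at mu = mu_x by the trapezoid rule.

    Assumptions: Y = alpha + beta * X + U, X ~ N(mu_x, sigma_x^2),
    U ~ N(0, 1) independent of X, and s'(0) = 1.
    """
    u = NormalDist()
    x_dist = NormalDist(mu_x, sigma_x)
    y_dist = NormalDist(alpha + mu_x * beta,
                        math.sqrt((sigma_x * beta) ** 2 + 1.0))
    q = y_dist.inv_cdf(tau)
    lo, hi = mu_x - 10.0 * sigma_x, mu_x + 10.0 * sigma_x
    width = (hi - lo) / grid
    total = 0.0
    for i in range(grid + 1):
        x = lo + i * width
        # d F_{Y|X}(q | x) / dx = -beta * f_U(q - alpha - beta * x)
        dF_dx = -beta * u.pdf(q - alpha - beta * x)
        weight = 0.5 if i in (0, grid) else 1.0
        total += weight * dF_dx * (x - mu_x) * x_dist.pdf(x)
    return total * width / y_dist.pdf(q)


def pure_scale_effect_closed_form(tau, beta, sigma_x):
    """-s'(0) * v / sqrt(v + 1) * Q_tau[U] with v = sigma_x^2 beta^2, s'(0) = 1."""
    v = (sigma_x * beta) ** 2
    return -v / math.sqrt(v + 1.0) * NormalDist().inv_cdf(tau)


print(pure_scale_effect_quadrature(0.25, 1.0, 1.5, 3.0, 2.0))
print(pure_scale_effect_quadrature(0.25, 1.0, 1.5, -5.0, 2.0))
print(pure_scale_effect_closed_form(0.25, 1.5, 2.0))
```

The first two calls differ only in the mean of X and return the same value, and both agree with the closed form.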

To understand why, in Example 5, the pure scale effect does not depend on µX and sign(β),
we write

\[ \mathrm{cov}\left( \frac{\partial F_{Y|X}(Q_\tau[Y] \mid X)}{\partial X}, X \right) = -\beta\, \mathrm{cov}\left( f_U(Q_\tau[Y] - \alpha - X\beta), X \right) = \mathrm{cov}\left( f_U(Q_\tau[U^\circ] + X^\circ), X^\circ \right), \]

where $X^\circ := -\beta(X - \mu_X)$ and $U^\circ = U - X^\circ$. Now, if X − µX is symmetrically distributed around
zero, then the above covariance does not depend on sign(β) as the distributions of $X^\circ$ and $U^\circ$
remain the same if we flip the sign of β. Also, since the distributions of $X^\circ$ and $U^\circ$ do not depend
on µX, the above covariance does not depend on µX. On the other hand, for the denominator of
the scale effect, we have

\[ f_Y(Q_\tau[Y]) = f_Y(Q_\tau[\alpha + X\beta + U]) = f_Y(Q_\tau[\alpha + (X - \mu_X)\beta + U + \mu_X\beta]) \]
\[ = f_Y(\alpha + Q_\tau[U^\circ + \mu_X\beta]) = f_Y(\alpha + Q_\tau[U^\circ] + \mu_X\beta) = f_{U^\circ}(Q_\tau[U^\circ]). \]

If X − µX is symmetrically distributed around zero, then the distribution of $U^\circ$ does not depend
on µX or sign(β). Hence, $f_Y(Q_\tau[Y])$ does not depend on µX or sign(β).
Since both the numerator and the denominator of $\Pi^{\mu_X}_{\tau,S}$ are invariant to µX and sign(β), we
obtain the following proposition immediately.

Proposition 1. Consider the linear model Y = α + Xβ + U where X and U are independent. If X − E[X]
is symmetrically distributed around zero, then the pure scale effect does not depend either on E[X] or on
sign(β).

2.2 Interpretation of the scale effects

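Consider a situation where we only care about the scale effect, that is, we set ℓ(δ) ≡ 0. Then, we have $X_\delta = \mu + (X - \mu)/s(\delta)$ and $\sigma_{X_\delta} = \sigma_X / s(\delta)$. To interpret $\Pi_{\tau,S}^{\mu}$, we assume $Q_\tau[Y_\delta] \neq 0$ and consider the following Y-quantile-X-standard-deviation elasticity:

\[ \mathcal{E}_{\tau,\delta} := \frac{dQ_\tau[Y_\delta]}{Q_\tau[Y_\delta]} \left( \frac{d\sigma_{X_\delta}}{\sigma_{X_\delta}} \right)^{-1}. \]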

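By straightforward calculations, we have

\[ \mathcal{E}_{\tau,\delta} = -\frac{1}{Q_\tau[Y_\delta]} \frac{dQ_\tau[Y_\delta]}{d\delta} \left( \frac{1}{s(\delta)} \frac{ds(\delta)}{d\delta} \right)^{-1}. \]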
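The calculation behind this expression can be spelled out in one step. It uses only the relation between the standard deviation of the shifted covariate and s(δ) stated at the start of this subsection, namely that the former equals $\sigma_X / s(\delta)$:

```latex
\[
\frac{d\sigma_{X_\delta}}{\sigma_{X_\delta}}
  = \frac{-\sigma_X\, s(\delta)^{-2}\, \frac{ds(\delta)}{d\delta}\, d\delta}{\sigma_X / s(\delta)}
  = -\frac{1}{s(\delta)} \frac{ds(\delta)}{d\delta}\, d\delta,
\qquad
\frac{dQ_\tau[Y_\delta]}{Q_\tau[Y_\delta]}
  = \frac{1}{Q_\tau[Y_\delta]} \frac{dQ_\tau[Y_\delta]}{d\delta}\, d\delta .
\]
```

Dividing the second expression by the first cancels dδ and produces the leading minus sign of the elasticity.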

When s(0) = 1 and ṡ(0) ≠ 0, the elasticity at δ = 0 is

\[ \mathcal{E}_{\tau,\delta=0} = -\frac{\Pi_{\tau,S}^{\mu}}{\dot{s}(0)\, Q_\tau[Y]}. \tag{9} \]

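Therefore, a 1% decrease in the standard deviation of X results in a $\Pi_{\tau,S}^{\mu} / (\dot{s}(0)\, Q_\tau[Y])$% change in the τ-quantile of Y.

Example 5 (Continued). In this case, the elasticity associated with $\Pi_{\tau,S}^{\mu_X}$ at δ = 0 is:

\[ \mathcal{E}_{\tau,\delta=0} = \frac{\sigma_X^2 \beta^2\, Q_\tau[U]}{\sqrt{\sigma_X^2 \beta^2 + 1}\; Q_\tau[Y]} = \frac{\sigma_X^2 \beta^2\, Q_\tau[U]}{\sqrt{\sigma_X^2 \beta^2 + 1}\, \left( \alpha + \mu_X \beta + \sqrt{\sigma_X^2 \beta^2 + 1}\, Q_\tau[U] \right)}. \]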

So, Eτ,δ=0 is positive if Qτ [U ] and Qτ [Y ] have the same sign. When α = 0 and µ X = 0, Eτ,δ=0 =
σX2 β2 /(σX2 β2 + 1), which is positive for all quantile levels.
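The elasticity in Example 5 is easy to evaluate numerically. The sketch below is a minimal illustration, assuming normal X and standard normal U as in the example; the function name and argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def elasticity_at_zero(tau, alpha, beta, mu_x, sigma_x):
    """Y-quantile-X-standard-deviation elasticity at delta = 0 (Example 5)."""
    v = (sigma_x * beta) ** 2
    q_u = NormalDist().inv_cdf(tau)                    # Q_tau[U], U ~ N(0, 1)
    q_y = alpha + mu_x * beta + sqrt(v + 1.0) * q_u    # Q_tau[Y]
    if q_y == 0.0:
        raise ValueError("the elasticity is undefined when Q_tau[Y] = 0")
    return v * q_u / (sqrt(v + 1.0) * q_y)
```

With `alpha = 0` and `mu_x = 0` the function returns σ_X²β²/(σ_X²β² + 1) for any τ other than 0.5, and it turns negative once Q_τ[U] and Q_τ[Y] have opposite signs (for instance τ = 0.25 with a large positive `mu_x * beta`).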

To find the value of s(δ) corresponding to a ∆% change in the standard deviation of X, we let

\[ \sigma_{X_\delta} = \frac{\sigma_X}{s(\delta)} = \left( 1 + \frac{\Delta}{100} \right) \sigma_X. \]

We then obtain

\[ s(\delta) = \left( 1 + \frac{\Delta}{100} \right)^{-1}. \]

For ∆ = −1, which corresponds to a 1% decrease in σ_X, we have s(δ) = (1 − 1/100)^{−1} ≈ 1.0101.
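The mapping from a percentage change in the standard deviation to the scale factor is a one-liner. The helper below is a small sketch; its name is illustrative.

```python
def scale_factor(pct_change):
    """Value of s(delta) that changes the standard deviation of X by pct_change percent."""
    if pct_change <= -100.0:
        raise ValueError("the standard deviation must remain positive")
    return 1.0 / (1.0 + pct_change / 100.0)
```

For example, `scale_factor(-1)` is approximately 1.0101, while `scale_factor(0)` returns 1 (no change).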


Often, when the outcome is strictly positive as in prices and wages, we are interested in log Y. In such a case we denote the marginal scale effect by $\tilde{\Pi}_{\tau,S}^{\mu}$, and, since we set ℓ(δ) ≡ 0 and there is no location effect, it is given by

\[ \tilde{\Pi}_{\tau,S}^{\mu} := \lim_{\delta \to 0} \frac{Q_\tau[\log Y_\delta] - Q_\tau[\log Y]}{\delta}. \]

Since log(·) is a strictly increasing transformation, we have

\[ \tilde{\Pi}_{\tau,S}^{\mu} = \lim_{\delta \to 0} \frac{\log Q_\tau[Y_\delta] - \log Q_\tau[Y]}{\delta}, \]

and we can relate $\tilde{\Pi}_{\tau,S}^{\mu}$ to $\Pi_{\tau,S}^{\mu}$ by

\[ \tilde{\Pi}_{\tau,S}^{\mu} = \frac{1}{Q_\tau[Y]}\, \Pi_{\tau,S}^{\mu}. \]

Comparing this last expression to (9), we obtain that the elasticity at δ = 0 is

\[ \mathcal{E}_{\tau,\delta=0} = -\frac{\tilde{\Pi}_{\tau,S}^{\mu}}{\dot{s}(0)}. \]

This says that a 1% increase in the standard deviation of X results in a $-\tilde{\Pi}_{\tau,S}^{\mu} / \dot{s}(0)$% change in the τ-quantile of Y. When ṡ(0) = −1, the scale effect $\tilde{\Pi}_{\tau,S}^{\mu}$ (based on log(Y)) can be interpreted directly as the Y-quantile-X-standard-deviation elasticity.

2.3 Relation to Conditional Effects

In order to explore the relationship between conditional quantile regression coefficients and un-
conditional effects, we introduce a “conditional” version of the unconditional effect given in
Corollary 1. For a given ( x, w) ∈ X ⊗ W , this is defined as

\[ \Pi_\tau(x, w) := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta \,|\, x, w] - Q_\tau[Y \,|\, x, w]}{\delta}, \]

whenever this limit exists. In the above, Q_τ[Y | x, w] is the conditional quantile of Y given X = x and W = w, and Q_τ[Y_δ | x, w] is the conditional quantile of Y_δ = h(G(X, δ), W, U) given X = x and W = w. Under essentially the same assumptions as in Theorem 1 and Corollary 1, we can obtain

\[ \Pi_\tau(x, w) = \frac{1}{f_{Y|X,W}(Q_\tau[Y \,|\, x, w] \,|\, x, w)} \left. \frac{\partial F_{Y|X,W}(Q_\tau[Y \,|\, x, w] \,|\, z, w)}{\partial z} \right|_{z = x} \kappa(x). \]

This is a “non-integrated” version of Π_τ. There are two differences between Π_τ(x, w) and Π_τ. First, instead of the unconditional quantile Q_τ[Y], the conditional quantile Q_τ[Y | x, w] is used in Π_τ(x, w). Second, instead of the unconditional density f_Y, the conditional density f_{Y|X,W} is used in Π_τ(x, w). Note that “iterating the expectation” is unlikely to work, i.e., E[Π_τ(X, W)] is, in general, not equal to Π_τ. Due to this fact, we follow Firpo, Fortin, and Lemieux (2009) and use the matching function:

\[ \xi_\tau(x, w) = \left\{ \eta : Q_\eta[Y \,|\, x, w] = Q_\tau[Y] \right\}. \tag{10} \]

This function matches the unconditional quantile at quantile level τ with the conditional quantile (conditioning on X = x, W = w) at the quantile level ξ_τ(x, w).
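When the conditional cdf is continuous and strictly increasing, the matching level in (10) is simply ξ_τ(x, w) = F_{Y|X,W}(Q_τ[Y] | x, w). The sketch below illustrates this in a hypothetical normal linear model Y = α + Xβ + U with X ∼ N(μ_X, σ_X²), U ∼ N(0, 1) and no W; the model choice and all function names are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal law of U (assumed)


def unconditional_quantile(tau, alpha, beta, mu_x, sigma_x):
    """Q_tau[Y] when Y ~ N(alpha + mu_x*beta, sigma_x^2 * beta^2 + 1)."""
    return alpha + mu_x * beta + sqrt((sigma_x * beta) ** 2 + 1.0) * STD.inv_cdf(tau)


def conditional_quantile(eta, x, alpha, beta):
    """Q_eta[Y | X = x] = alpha + x*beta + Q_eta[U]."""
    return alpha + x * beta + STD.inv_cdf(eta)


def matching_level(tau, x, alpha, beta, mu_x, sigma_x):
    """xi_tau(x): the level eta at which Q_eta[Y | x] equals Q_tau[Y]."""
    q = unconditional_quantile(tau, alpha, beta, mu_x, sigma_x)
    return STD.cdf(q - alpha - x * beta)  # F_{Y|X}(Q_tau[Y] | x)
```

The matching level varies with x even though τ is fixed, which is why the averages in Theorem 2 are taken over a random quantile level.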

Theorem 2. If Π_τ(x, w) exists for all τ in the support of ξ_τ(X, W), then the unconditional marginal effect can be represented as

\[ \Pi_\tau = E\left[ \frac{f_{Y|X,W}(Q_\tau[Y] \,|\, X, W)}{f_Y(Q_\tau[Y])}\, \Pi_{\xi_\tau(X,W)}(X, W) \right], \]

and as a (reverse) projection:

\[ \Pi_\tau = E\left[ \Pi_{\xi_\tau(X,W)}(X, W) \,\middle|\, Y = Q_\tau[Y] \right]. \]

The first representation is the counterpart of Proposition 1(ii) of Firpo, Fortin, and Lemieux
(2009). The second representation appears to be new. It does not rely on any shape or dimension
restriction on the structural model Y = h( X, W, U ).

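Because $F_{Y|X,W}(Q_\tau[Y \,|\, x, w] \,|\, x, w) = \tau$, by implicitly differentiating, we have

\[ \frac{\partial Q_\tau[Y \,|\, x, w]}{\partial x} = -\frac{1}{f_{Y|X,W}(Q_\tau[Y \,|\, x, w] \,|\, x, w)} \left. \frac{\partial F_{Y|X,W}(Q_\tau[Y \,|\, x, w] \,|\, z, w)}{\partial z} \right|_{z = x}. \]

We can then write Π_τ(x, w) in terms of the conditional quantile effect:

\[ \Pi_\tau(x, w) = -\frac{\partial Q_\tau[Y \,|\, x, w]}{\partial x}\, \kappa(x) = \frac{\partial Q_\tau[Y \,|\, x, w]}{\partial x} \left. \frac{\partial G(x; \delta)}{\partial \delta} \right|_{\delta = 0}. \]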

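Using Theorem 2, we then have

\[ \Pi_\tau = -E\left\{ \left[ \left. \frac{\partial Q_{\xi_\tau(X,W)}[Y \,|\, z, W]}{\partial z} \right|_{z = X} \right] \kappa(X)\, \frac{f_{Y|X,W}(Q_\tau[Y] \,|\, X, W)}{f_Y(Q_\tau[Y])} \right\} \tag{11} \]

and

\[ \Pi_\tau = -E\left\{ \left[ \left. \frac{\partial Q_{\xi_\tau(X,W)}[Y \,|\, z, W]}{\partial z} \right|_{z = X} \right] \kappa(X) \,\middle|\, Y = Q_\tau[Y] \right\}. \tag{12} \]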

Hence, Πτ is a weighted average of quantile derivatives. This suggests an alternative way to


estimate Πτ . The problem of estimating a weighted average of quantile derivatives has been
recently studied in Lee (2021). However, our problem is different. In both (11) and (12), the
average is taken over a random quantile level ξ τ ( X, W ), instead of a fixed quantile level. The
random quantile level arises from the matching function given in (10). We leave this alternative
way to estimate Πτ for future research.

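Example 6. Consider the location-scale shift with no covariate W. Suppose that the conditional quantiles are linear: Q_τ[Y | X = x] = a_τ + x b_τ. Then

\[ \left. \frac{\partial Q_{\xi_\tau(X)}[Y \,|\, z]}{\partial z} \right|_{z = X} = b_{\xi_\tau(X)} \]

and so

\[ \Pi_\tau^{\mu_X} = \underbrace{\dot{\ell}(0)\, E\left[ b_{\xi_\tau(X)} \,\middle|\, Y = Q_\tau[Y] \right]}_{= \Pi_{\tau,L}} \; \underbrace{-\, \dot{s}(0)\, E\left[ b_{\xi_\tau(X)} (X - \mu_X) \,\middle|\, Y = Q_\tau[Y] \right]}_{= \Pi_{\tau,S}^{\mu_X}}. \]

In Example 4, we have b_τ = β for every τ, and ṡ(0) = 1. The pure scale effect is then

\[ \Pi_{\tau,S} = -\beta E\left[ (X - \mu_X) \,\middle|\, Y = Q_\tau[Y] \right] = -\beta\, \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}\, (Q_\tau[Y] - \alpha) = -\frac{\beta^2}{\beta^2 + 1}\, (Q_\tau[Y] - \alpha), \]

which is what we obtained before.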
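To see the agreement explicitly, suppose (as the formulas used for Example 4 suggest, and stated here as an assumption) that X ∼ N(0, 1) and U ∼ N(0, 1) are independent, so that Y ∼ N(α, β² + 1). Then:

```latex
\[
Q_\tau[Y] - \alpha = \sqrt{\beta^2 + 1}\; Q_\tau[U]
\quad\Longrightarrow\quad
-\frac{\beta^2}{\beta^2 + 1}\,\bigl(Q_\tau[Y] - \alpha\bigr)
  = -\frac{\beta^2}{\sqrt{\beta^2 + 1}}\; Q_\tau[U],
\]
```

which coincides with the closed form for the pure scale effect under normality when σ_X² = 1 and ṡ(0) = 1.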

3 Compensated Marginal Effects
In this section, we consider the case where a location shift in one covariate is compensated by
a location shift in another covariate. In a model Y = h( X1 , X2 , W, U ) where both X1 and X2 are
univariate, we consider the limiting effect of the simultaneous location shift X1δ = X1 + `1 (δ) and
X2δ = X2 + `2 (δ) for some smooth functions `1 (δ) and `2 (δ) satisfying `1 (0) = `2 (0) = 0. In the
simplest case, we have `1 (δ) = δ and `2 (δ) = − pδ for some p ≥ 0. Here, p can be interpreted as
the “relative price” of X1 in terms of X2 . An example is the following: a policy targeted towards
increasing the level of education can, at the same time, reduce the experience of workers. As
with the case of the scale shift, neglecting this possible side effect of the policy might lead to an
inconsistent estimator of its effect.
With the above motivation, we now consider a more general setting that allows for a general change in X1 and X2. We induce a change in X = (X1, X2)′ so that it becomes Xδ = (X1δ, X2δ)′. We do not specify the exact form of the change, but we use the simultaneous location shift as a working example. We assume that

\[ X_\delta = G(X; \delta) = \left( G_1(X; \delta),\, G_2(X; \delta) \right)' \]

for a smooth and invertible bivariate function G = (G1, G2)′. We allow X1δ and X2δ to depend on both X1 and X2. A special case is that G1(X; δ) is a function of X1 only and G2(X; δ) is a function of X2 only.
In this general setting, the original outcome is given by

Y = h ( X1 , X2 , W, U ) = h ( X, W, U ) ,

and the counterfactual outcome is given by

Yδ = h( X1δ , X2δ , W, U ) = h(G1 ( X; δ) , G2 ( X; δ) , W, U ). (13)

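The distribution of (X, W, U) is kept the same in the above two equations. We want to identify the following quantity

\[ \Pi_{\tau,C} := \lim_{\delta \to 0} \frac{Q_\tau[Y_\delta] - Q_\tau[Y]}{\delta}, \tag{14} \]

whenever this limit exists. We refer to Π_{τ,C} as the compensated marginal effect for the τ-quantile.
Let x = (x1, x2)′. As before, we define $x^\delta = (x_1^\delta, x_2^\delta)'$ such that $G(x^\delta; \delta) = x$. By construction, Xδ = x if and only if X = x^δ. Define the Jacobian matrix as

\[ J(x^\delta; \delta) := \frac{\partial x^\delta}{\partial x'} = \begin{pmatrix} \dfrac{\partial x_1^\delta}{\partial x_1} & \dfrac{\partial x_1^\delta}{\partial x_2} \\[2ex] \dfrac{\partial x_2^\delta}{\partial x_1} & \dfrac{\partial x_2^\delta}{\partial x_2} \end{pmatrix} = \left[ \left. \frac{\partial G(x; \delta)}{\partial x'} \right|_{x = x^\delta} \right]^{-1}, \]

where the second equality follows from differentiating $G(x^\delta; \delta) = x$ with respect to x and then solving for $\partial x^\delta / \partial x'$.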

Assumption 3. (i) For some ε > 0, each component function of G ( x; δ) is continuously differentiable on
X ⊗ N ε.
(i.b) G(x; δ) is an invertible function of x for each δ ∈ Nε.
(i.c) G(x; 0) = x for all x ∈ X.
(ii) For δ ∈ Nε, the conditional density of U satisfies $f_{U|X_\delta,W}(u \,|\, x, w) = f_{U|X,W}(u \,|\, x^\delta, w)$ and the support U of U conditional on X and W does not depend on (X, W).
(iii) Assumption 1 (iii.a) holds with $J(x^\delta; \delta)$ replaced by $\det[J(x^\delta; \delta)]$ and Assumption 1 (iii.b) holds.
(iv) f X,W ( x, w) is equal to 0 on the boundary of the support of X1 given W = w and X2 = x2 for all
w ∈ W and x2 ∈ X2 , the support of X2 , and symmetrically, f X,W ( x, w) is equal to 0 on the boundary of
the support of X2 given W = w and X1 = x1 for all w ∈ W and x1 ∈ X1 , the support of X1 .
(v) f Y ( Qτ [Y ]) > 0.

Assumption 3 is a modified version of Assumption 1 adapted to the case with two target covariates. Under Assumption 3(i.c), we have J(x; 0) = I_2, the 2 × 2 identity matrix. Since det[J(x; 0)] = 1, by continuity, $\det[J(x^\delta; \delta)] > 0$ when δ is small enough. Hence, there is no need to take the absolute value of $\det[J(x^\delta; \delta)]$ when converting the pdf of (X, W) into that of (Xδ, W).
Define the local change function as

\[ \kappa(x) = (\kappa_1(x), \kappa_2(x))' := \left. \frac{\partial x^\delta}{\partial \delta} \right|_{\delta = 0}. \]

Under Assumption 3, we have

\[ \kappa(x) = -\left. \frac{\partial G(x; \delta)}{\partial \delta} \right|_{\delta = 0}. \]
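The last identity follows from differentiating the defining relation G(x^δ; δ) = x with respect to δ and evaluating at δ = 0, where x^0 = x and, by Assumption 3(i.c), the derivative of G with respect to x′ is the identity matrix. The simultaneous location shift is shown as a worked case:

```latex
\[
\frac{\partial G(x^\delta;\delta)}{\partial x'}\,\frac{\partial x^\delta}{\partial \delta}
  + \frac{\partial G(x^\delta;\delta)}{\partial \delta} = 0
\;\;\Longrightarrow\;\;
I_2\,\kappa(x) + \left.\frac{\partial G(x;\delta)}{\partial \delta}\right|_{\delta=0} = 0 ,
\]
\[
G(x;\delta) = \bigl(x_1 + \ell_1(\delta),\; x_2 + \ell_2(\delta)\bigr)'
\;\;\Longrightarrow\;\;
\kappa(x) = -\bigl(\dot{\ell}_1(0),\; \dot{\ell}_2(0)\bigr)' .
\]
```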

Theorem 3. Let Assumption 3 hold. Then

\[ \Pi_{\tau,C} = -E\left[ \frac{\partial E[\psi(Y, \tau, F_Y) \,|\, X, W]}{\partial X'}\, \kappa(X) \right] + E\left[ \psi(Y, \tau, F_Y)\, \frac{\partial \ln f_{U|X,W}(U \,|\, X, W)}{\partial X'}\, \kappa(X) \right], \tag{15} \]

where, as before,

\[ \psi(y, \tau, F_Y) = \frac{\tau - 1(y < Q_\tau[Y])}{f_Y(Q_\tau[Y])}. \]

The theorem takes the same form as Theorem 1. Under the assumption that X_{jδ} is a function of X_j only for j = 1 and 2, κ_j(x) depends on x_j only, and the effect from changing X1 into X1δ and that from changing X2 into X2δ are additively separable.

Corollary 2. Let Assumptions 2 and 3 hold. Then

\[ \Pi_{\tau,C} = -E\left[ \frac{\partial E[\psi(Y, \tau, F_Y) \,|\, X, W]}{\partial X'}\, \kappa(X) \right]. \]

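For the case of a simultaneous location shift X1δ = X1 + ℓ1(δ) and X2δ = X2 + ℓ2(δ), we have

\[ \kappa(x) = -\left( \dot{\ell}_1(0),\, \dot{\ell}_2(0) \right)', \]

and so

\[ \Pi_{\tau,C} = -\frac{\dot{\ell}_1(0)}{f_Y(Q_\tau[Y])} \int_{W} \int_{X} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \,|\, x, w)}{\partial x_1}\, f_{X,W}(x, w)\, dx\, dw - \frac{\dot{\ell}_2(0)}{f_Y(Q_\tau[Y])} \int_{W} \int_{X} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \,|\, x, w)}{\partial x_2}\, f_{X,W}(x, w)\, dx\, dw. \tag{16} \]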

Corollary 2 shows that the compensated effect from the simultaneous location shift is a linear
combination of two location effects: one where the target variable is X1 and the other where the
target variable is X2 . Thus, we can write: Πτ,C = Πτ,L,1 + Πτ,L,2 . This additive result follows
because we have two unrelated location changes whose effects are, in essence, captured by the
sum of two partial derivatives. This is convenient since it immediately allows us to obtain the
bias if we omit the possible simultaneous change in a covariate different from the target variable.
Corollary 1 in Firpo, Fortin, and Lemieux (2009) considers the case of a simultaneous location
shift in k covariates, and delivers a k × 1 vector of marginal effects. Theorem 3 and Corollary 2
complement such a result by showing how to interpret a linear combination of the entries of the
vector of marginal effects. Furthermore, Theorem 3 and Corollary 2 allow for the intervention
of a target covariate to depend on another target covariate. Here we consider only two target
covariates for ease of exposition. Our results can be easily extended to the case with more than
two target covariates.
Our framework can accommodate more complicated policy interventions, such as simultane-
ous location-scale shifts in two target variables. In a potential application, a compensated change
may substitute the mean of one target variable with the variance of another target variable. Given
the generality of G ( x; δ), Corollary 2 is general enough to accommodate various compensating
policies.
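The additive structure of the compensated effect can be illustrated with a small simulation. The sketch below assumes a hypothetical linear model Y = β1 X1 + β2 X2 + U with independent standard normal X1, X2 and U, and the simplest compensated shift ℓ1(δ) = δ, ℓ2(δ) = −pδ. In that model every quantile of Y moves by (β1 − pβ2)δ, so the finite-difference quotient should reproduce β1 − pβ2. Function names and parameter values are illustrative only.

```python
import math
import random


def empirical_quantile(values, tau):
    """Order-statistic quantile: the ceil(n * tau)-th smallest value."""
    ordered = sorted(values)
    k = max(1, math.ceil(len(ordered) * tau))
    return ordered[k - 1]


def compensated_effect_fd(tau, beta1, beta2, p, delta=1e-3, n=5000, seed=0):
    """Finite-difference compensated effect in an assumed linear model."""
    rng = random.Random(seed)
    draws = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Original outcome and counterfactual outcome built from the same draws.
    y = [beta1 * x1 + beta2 * x2 + u for x1, x2, u in draws]
    y_delta = [beta1 * (x1 + delta) + beta2 * (x2 - p * delta) + u
               for x1, x2, u in draws]
    return (empirical_quantile(y_delta, tau) - empirical_quantile(y, tau)) / delta
```

Ignoring the compensating change in X2 (that is, setting p = 0) would return β1 instead, so the omitted term −pβ2 is exactly the bias mentioned in the text for this illustrative model.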

4 Estimation of location-scale effects


4.1 Location-scale effects

In this section, we focus on the estimation of $\Pi_\tau^{\mu}$ given in (5). The estimator involves several preliminary steps. Firstly, for a given quantile, we need to estimate Q_τ[Y]. This is given by

\[ \hat{q}_\tau = \arg\min_{q} \sum_{i=1}^{n} \left( \tau - 1\{Y_i \le q\} \right) (Y_i - q). \tag{17} \]
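The minimization in (17) does not require a numerical optimizer, because a minimizer can always be found among the sample points. The sketch below is an illustration with hypothetical function names: it evaluates the check-function objective at every observation and compares the result with the ⌈nτ⌉-th order statistic.

```python
import math


def check_loss(q, y, tau):
    """Objective in (17): sum of (tau - 1{y_i <= q}) * (y_i - q)."""
    return sum((tau - (yi <= q)) * (yi - q) for yi in y)


def quantile_by_check_loss(y, tau):
    """Minimize the check loss over the sample points."""
    return min(sorted(y), key=lambda q: check_loss(q, y, tau))


def quantile_by_order_statistic(y, tau):
    """The ceil(n * tau)-th smallest observation."""
    ordered = sorted(y)
    return ordered[max(1, math.ceil(len(ordered) * tau)) - 1]
```

When nτ is not an integer the minimizer is unique and the two functions agree; when nτ is an integer the objective is flat between two adjacent order statistics and any point in between is a minimizer.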

Next, we need to estimate the density of Y evaluated at Q_τ[Y]. This can be estimated by

\[ \hat{f}_Y(\hat{q}_\tau) = \frac{1}{n} \sum_{i=1}^{n} K_h(Y_i - \hat{q}_\tau), \tag{18} \]

where $K_h(u) = h^{-1} K(h^{-1} u)$ for a given kernel K and a bandwidth h. For the average derivative of the conditional cdf, we propose either a logit model as in Firpo, Fortin, and Lemieux (2009) or a probit model. We model:

\[ F_{Y|X,W}(Q_\tau[Y] \,|\, x, w) = G(x \alpha_\tau + w' \beta_\tau), \tag{19} \]

where G(·) is either the cdf of a logistic random variable (logit) or a standard normal random variable (probit). Let $Z_i = (X_i', W_i')'$ and $\theta_\tau = (\alpha_\tau', \beta_\tau')'$. We estimate θ_τ by the maximum likelihood estimator:

\[ \hat{\theta}_\tau := (\hat{\alpha}_\tau, \hat{\beta}_\tau')' = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} l_i(\theta; \hat{q}_\tau) = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \left\{ 1\{Y_i \le \hat{q}_\tau\} \log\left[ G(Z_i' \theta) \right] + 1\{Y_i > \hat{q}_\tau\} \log\left[ 1 - G(Z_i' \theta) \right] \right\}, \tag{20} \]

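where Θ is a compact parameter space that contains θ_τ as an interior point. The estimator of $\Pi_\tau^{\mu}$ is then

\[ \hat{\Pi}_\tau^{\mu} = \hat{\Pi}_{\tau,L} + \hat{\Pi}_{\tau,S}^{\mu}, \]

where

\[ \hat{\Pi}_{\tau,L} = -\frac{\dot{\ell}(0)}{\hat{f}_Y(\hat{q}_\tau)}\, \frac{1}{n} \sum_{i=1}^{n} g(Z_i' \hat{\theta}_\tau)\, \hat{\alpha}_\tau, \tag{21} \]

\[ \hat{\Pi}_{\tau,S}^{\mu} = \frac{\dot{s}(0)}{\hat{f}_Y(\hat{q}_\tau)}\, \frac{1}{n} \sum_{i=1}^{n} g(Z_i' \hat{\theta}_\tau)\, \hat{\alpha}_\tau\, (X_i - \mu). \tag{22} \]

In the above, g is the derivative of G, that is, the logistic density or the standard normal density.
In order to establish the asymptotic distribution of $\hat{\Pi}_\tau^{\mu}$, we need the following three sets of assumptions, one for each preliminary estimation step.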
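The three preliminary steps and the two plug-in formulas (21) and (22) can be put together in a few lines. The following Python sketch is a simplified illustration, not the authors' implementation (which the paper reports was written in Matlab). It assumes a scalar X, no controls W other than an intercept, a probit link fitted by a hand-written Newton-Raphson routine, a Gaussian kernel, and the bandwidth h = 1.06 σ̂_Y n^{−1/4} used in the Monte Carlo section. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev

STD = NormalDist()  # standard normal cdf/pdf for the probit link and the kernel


def fit_probit(d, x, max_iter=50, tol=1e-10):
    """Newton-Raphson probit MLE for P(d = 1 | x) = Phi(a + b * x)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for di, xi in zip(d, x):
            eta = a + b * xi
            p = min(max(STD.cdf(eta), 1e-12), 1.0 - 1e-12)
            dens = STD.pdf(eta)
            lam = dens / p if di else -dens / (1.0 - p)  # score w.r.t. the index
            w = lam * (lam + eta)                        # minus the second derivative
            g0 += lam
            g1 += lam * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


def location_scale_effects(y, x, tau, mu, l_dot0=1.0, s_dot0=1.0):
    """Plug-in estimates of the location effect (21) and scale effect (22)."""
    n = len(y)
    q_hat = sorted(y)[max(1, math.ceil(n * tau)) - 1]             # step 1: quantile
    h = 1.06 * stdev(y) * n ** (-0.25)                            # bandwidth rule
    f_hat = sum(STD.pdf((yi - q_hat) / h) for yi in y) / (n * h)  # step 2: density
    d = [yi <= q_hat for yi in y]
    a_hat, alpha_hat = fit_probit(d, x)                           # step 3: probit
    g = [STD.pdf(a_hat + alpha_hat * xi) for xi in x]
    pi_l = -l_dot0 / f_hat * sum(gi * alpha_hat for gi in g) / n
    pi_s = s_dot0 / f_hat * sum(gi * alpha_hat * (xi - mu)
                                for gi, xi in zip(g, x)) / n
    return pi_l, pi_s


def simulate_normal_model(n, beta, seed=0):
    """Draw (y, x) from Y = X * beta + U with X, U independent N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi * beta + rng.gauss(0.0, 1.0) for xi in x]
    return y, x
```

A typical call is `location_scale_effects(*simulate_normal_model(4000, 1.0), tau=0.25, mu=0.0)`. In this design the population location effect is β = 1 and the population scale effect is −Q_τ[U]/√2, so the two returned numbers can be compared with those targets.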

Assumption 4. Quantile. The density of Y is positive, continuous, and differentiable at Qτ [Y ].

Assumption 5. Logit/Probit. For G either the cdf of a logistic or a standard normal random variable,
we have

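(i) $F_{Y|Z}(Q_\tau[Y] \,|\, z) = G(z' \theta_\tau)$ for an interior point θ_τ ∈ Θ and $\hat{\theta}_\tau = \theta_\tau + o_p(1)$.

(ii) For

\[ H_i(\theta; q) = \frac{\partial^2 l_i(\theta; q)}{\partial \theta\, \partial \theta'}, \]

the Hessian of observation i, the following holds

\[ \sup_{(\theta, q) \in N} \left\| \frac{1}{n} \sum_{i=1}^{n} H_i(\theta; q) - E[H_i(\theta; q)] \right\| \xrightarrow{p} 0, \]

where N is a neighborhood of $(\theta_\tau', Q_\tau[Y])'$, and $H := E[H_i(\theta_\tau; Q_\tau[Y])]$ is negative definite.

(iii) For the score s_i defined by

\[ s_i(\theta; q) = \frac{\partial l_i(\theta; q)}{\partial \theta}, \]

the following stochastic equicontinuity assumption holds:

\[ \frac{1}{n} \sum_{i=1}^{n} \left\{ s_i(\theta_\tau; \hat{q}_\tau) - E[s_i(\theta_\tau; q)] \big|_{q = \hat{q}_\tau} \right\} = \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) + o_p(n^{-1/2}), \]

and the map $q \mapsto E[s_i(\theta_\tau; q)]$ is continuously differentiable at Q_τ[Y] with

\[ \left. \frac{\partial E[s_i(\theta_\tau; q)]}{\partial q} \right|_{q = Q_\tau[Y]} =: H_Q. \]

(iv) For $\tilde{X}_i = (1, X_i)'$ and $\tilde{Z}_i = (1, Z_i')'$, the following uniform law of large numbers holds:

\[ \sup_{\theta \in N_\theta} \left\| \frac{1}{n} \sum_{i=1}^{n} \dot{g}(Z_i' \theta)\, \tilde{X}_i \tilde{Z}_i' - E\left[ \dot{g}(Z_i' \theta)\, \tilde{X}_i \tilde{Z}_i' \right] \right\| \xrightarrow{p} 0, \]

where N_θ is a neighborhood of θ_τ, ġ is the derivative of g, and

\[ M := \begin{pmatrix} E\left[ \dot{g}(Z_i' \theta_\tau) \alpha_\tau X_i' + g(Z_i' \theta_\tau) \right] & E\left[ \dot{g}(Z_i' \theta_\tau) \alpha_\tau W_i' \right] \\ E\left[ \dot{g}(Z_i' \theta_\tau) \alpha_\tau X_i X_i' + g(Z_i' \theta_\tau) X_i' \right] & E\left[ \dot{g}(Z_i' \theta_\tau) \alpha_\tau X_i W_i' \right] \end{pmatrix} \]

exists.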

Assumption 6. Density.
(i) The kernel function K(·) satisfies (i) $\int_{-\infty}^{\infty} K(u)\, du = 1$, (ii) $\int_{-\infty}^{\infty} u^2 K(u)\, du < \infty$, and (iii) K(u) = K(−u), and it is twice differentiable with Lipschitz continuous second-order derivative K″(u) satisfying (i) $\int_{-\infty}^{\infty} K''(u)\, u\, du < \infty$ and (ii) there exist positive constants C1 and C2 such that $|K''(u_1) - K''(u_2)| \le C_2 |u_1 - u_2|^2$ for $|u_1 - u_2| \ge C_1$.

(ii) As n ↑ ∞, the bandwidth satisfies: h ↓ 0, nh³ ↑ ∞, and nh⁵ = O(1).

Under Assumption 4, $\hat{q}_\tau$ given in (17) is asymptotically linear with

\[ \hat{q}_\tau - Q_\tau[Y] = \frac{1}{n} \sum_{i=1}^{n} \frac{\tau - 1\{Y_i \le Q_\tau[Y]\}}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2}) = \frac{1}{n} \sum_{i=1}^{n} \psi(Y_i, \tau, F_Y) + o_p(n^{-1/2}). \]

See, for example, Serfling (1980). Assumption 5 is mostly necessary to deal with the preliminary estimator $\hat{q}_\tau$ that enters the likelihood in (20). Assumption 6 is taken from Martinez-Iriarte and Sun (2021b).
The following lemma contains the influence function for the maximum likelihood estimator $\hat{\theta}_\tau$.

Lemma 1. Under Assumptions 4 and 5, we have

\[ \hat{\theta}_\tau - \theta_\tau = -H^{-1} \frac{1}{n} \sum_{i=1}^{n} s_i(\theta_\tau; Q_\tau[Y]) - H^{-1} H_Q \frac{1}{n} \sum_{i=1}^{n} \psi(Y_i, \tau, F_Y) + o_p(n^{-1/2}). \]

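Theorem 4. Under Assumptions 4, 5, and 6, the estimators given in (21) and (22) satisfy

\[ \begin{pmatrix} \hat{\Pi}_{\tau,L} \\ \hat{\Pi}_{\tau,S}^{\mu} \end{pmatrix} - \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S}^{\mu} \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \Phi_{i,\tau} + O(h^2) + o_p(n^{-1/2}) + o_p(n^{-1/2} h^{-1/2}), \]

where

\[ \begin{aligned} \Phi_{i,\tau} ={}& \frac{1}{f_Y(Q_\tau[Y])} D_\mu \left\{ g(Z_i' \theta_\tau) \alpha_\tau \tilde{X}_i - E\left[ g(Z_i' \theta_\tau) \alpha_\tau \tilde{X}_i \right] \right\} \\ &- \frac{1}{f_Y(Q_\tau[Y])} D_\mu M H^{-1} s_i(\theta_\tau; Q_\tau[Y]) \\ &- \left[ \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S}^{\mu} \end{pmatrix} \frac{\dot{f}_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])} D_\mu M H^{-1} H_Q \right] \psi(Y_i, \tau, F_Y) \\ &- \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S}^{\mu} \end{pmatrix} \frac{1}{f_Y(Q_\tau[Y])} \left\{ K_h(Y_i - Q_\tau[Y]) - E K_h(Y_i - Q_\tau[Y]) \right\}, \end{aligned} \]

$\dot{f}_Y(\cdot)$ is the derivative of $f_Y(\cdot)$, and

\[ D_\mu = \begin{pmatrix} D_L' \\ D_{\mu,S}' \end{pmatrix} = \begin{pmatrix} -\dot{\ell}(0) & 0 \\ -\mu \dot{s}(0) & \dot{s}(0) \end{pmatrix}. \]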

Theorem 4 establishes the contribution from each estimation step. In particular, the last term in $n^{-1} \sum_{i=1}^{n} \Phi_{i,\tau}$ is the contribution from estimating the density of Y non-parametrically. This term converges at a non-parametric rate, which is slower than other terms. As a result, the asymptotic distribution of the location-scale effect estimator is determined by the last term in $n^{-1} \sum_{i=1}^{n} \Phi_{i,\tau}$. However, we do not recommend dropping all other terms. Instead, we write the asymptotic normality result in the form

\[ \left[ \frac{1}{n^2} \sum_{i=1}^{n} \hat{\Phi}_{i,\tau} \hat{\Phi}_{i,\tau}' \right]^{-1/2} \left[ \begin{pmatrix} \hat{\Pi}_{\tau,L} \\ \hat{\Pi}_{\tau,S}^{\mu} \end{pmatrix} - \begin{pmatrix} \Pi_{\tau,L} \\ \Pi_{\tau,S}^{\mu} \end{pmatrix} \right] \xrightarrow{d} N(0, I_2) \tag{23} \]

as n ↑ ∞, nh³ ↑ ∞, and nh⁵ ↓ 0, where $\hat{\Phi}_{i,\tau}$ is a plug-in estimator of $\Phi_{i,\tau}$. In particular,

\[ \left[ n^{-2} \sum_{i=1}^{n} (l_1' \hat{\Phi}_{i,\tau})^2 \right]^{-1/2} \left( \hat{\Pi}_{\tau,L} - \Pi_{\tau,L} \right) \xrightarrow{d} N(0, 1), \qquad \left[ n^{-2} \sum_{i=1}^{n} (l_2' \hat{\Phi}_{i,\tau})^2 \right]^{-1/2} \left( \hat{\Pi}_{\tau,S}^{\mu} - \Pi_{\tau,S}^{\mu} \right) \xrightarrow{d} N(0, 1), \tag{24} \]

where l1 = (1, 0)′ and l2 = (0, 1)′. The above results hold under some additional but standard regularity conditions such as the nonsingularity of the probability limit of $n^{-2} \sum_{i=1}^{n} \hat{\Phi}_{i,\tau} \hat{\Phi}_{i,\tau}'$. Inferences based on these results account for the estimation errors from all estimation steps and are more reliable in finite samples. This is supported by simulation evidence not reported here.
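In practice, the studentization in (24) amounts to a standard error equal to the square root of the sum of squared plug-in influence values divided by n. The helper below is a sketch that assumes the values l_j′Φ̂_{i,τ} have already been computed; the function name and the default critical value 1.96 (for a 95% interval) are choices made for the illustration.

```python
import math


def studentized_interval(estimate, phi_hat, z=1.96):
    """Confidence interval based on the normal approximation in (24).

    phi_hat holds the plug-in influence values l_j' Phi_hat_{i,tau}, i = 1..n.
    """
    n = len(phi_hat)
    se = math.sqrt(sum(v * v for v in phi_hat)) / n  # [n^-2 * sum of squares]^(1/2)
    return estimate - z * se, estimate + z * se, se
```

Note that the normalization is n⁻², not n⁻¹: the influence values are not rescaled by the bandwidth, so the slower non-parametric rate of the density term is absorbed automatically.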

On the other hand, if we parametrize the density of Y and estimate it at the parametric √n-rate, then the last term in $n^{-1} \sum_{i=1}^{n} \Phi_{i,\tau}$ will take a different form and will be of the same order as the other terms. In this case, the location-scale effect estimator is √n-asymptotically normal, and all the terms in Theorem 4 will contribute to the asymptotic variance. With an obvious modification of the last term in $\Phi_{i,\tau}$, the asymptotic normality can be presented in the same way as in (23).
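Let

\[ \Gamma_{\tau,S}^{\mu} = D_{\mu,S}'\, E\left[ \frac{\partial F_{Y|X,W}(Q_\tau[Y] \,|\, X, W)}{\partial X}\, \tilde{X} \right] = \dot{s}(0)\, E\left[ \frac{\partial F_{Y|X,W}(Q_\tau[Y] \,|\, X, W)}{\partial X}\, (X - \mu) \right] \]

be the numerator of $\Pi_{\tau,S}^{\mu}$. Then the scale effect $\Pi_{\tau,S}^{\mu}$ is zero if and only if $\Gamma_{\tau,S}^{\mu} = 0$. To test the null hypothesis $H_0: \Pi_{\tau,S}^{\mu} = 0$, we can equivalently test the null hypothesis $H_0: \Gamma_{\tau,S}^{\mu} = 0$. Unlike $\Pi_{\tau,S}^{\mu}$, $\Gamma_{\tau,S}^{\mu}$ can be estimated at the parametric rate even if $f_Y(\cdot)$ is not parametrically specified. More specifically, under Assumption 5, we can estimate $\Gamma_{\tau,S}^{\mu}$ by

\[ \hat{\Gamma}_{\tau,S}^{\mu} := D_{\mu,S}'\, \frac{1}{n} \sum_{i=1}^{n} g(Z_i' \hat{\theta}_\tau)\, \hat{\alpha}_\tau\, \tilde{X}_i, \]

where $D_{\mu,S}' = (-\mu, 1)$ upon setting ṡ(0) = 1 without loss of generality.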
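Under the assumptions of Theorem 4, we can show that

\[ \hat{\Gamma}_{\tau,S}^{\mu} - \Gamma_{\tau,S}^{\mu} = D_{\mu,S}'\, \frac{1}{n} \sum_{i=1}^{n} \Phi_{i,\tau}^{\Gamma} + o_p\left( \frac{1}{\sqrt{n}} \right), \]

where

\[ \Phi_{i,\tau}^{\Gamma} = g(Z_i' \theta_\tau) \alpha_\tau \tilde{X}_i - E\left[ g(Z_i' \theta_\tau) \alpha_\tau \tilde{X}_i \right] - M H^{-1} s_i(\theta_\tau; Q_\tau[Y]) - M H^{-1} H_Q\, \psi(Y_i, \tau, F_Y). \]

Define

\[ V_\tau = \lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} E\left( D_{\mu,S}' \Phi_{i,\tau}^{\Gamma} \right)^2. \]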

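Under some regularity conditions such as V_τ > 0, we have $V_\tau^{-1/2} (\hat{\Gamma}_{\tau,S}^{\mu} - \Gamma_{\tau,S}^{\mu}) \xrightarrow{d} N(0, 1)$. To test $H_0: \Gamma_{\tau,S}^{\mu} = 0$, we construct the test statistic

\[ t_{\tau,S}^{\mu} := \frac{\hat{\Gamma}_{\tau,S}^{\mu}}{\sqrt{\hat{V}_\tau}} \quad \text{for} \quad \hat{V}_\tau = \frac{1}{n^2} \sum_{i=1}^{n} \left( D_{\mu,S}' \hat{\Phi}_{i,\tau}^{\Gamma} \right)^2, \]

where

\[ \hat{\Phi}_{i,\tau}^{\Gamma} = g(Z_i' \hat{\theta}_\tau) \hat{\alpha}_\tau \tilde{X}_i - \frac{1}{n} \sum_{i=1}^{n} g(Z_i' \hat{\theta}_\tau) \hat{\alpha}_\tau \tilde{X}_i - \hat{M} \hat{H}^{-1} s_i(\hat{\theta}_\tau; \hat{q}_\tau) - \hat{M} \hat{H}^{-1} \hat{H}_Q\, \hat{\psi}(Y_i, \tau, F_Y). \tag{25} \]

In the above, $\hat{\psi}(Y_i, \tau, F_Y) = [\tau - 1\{Y_i \le \hat{q}_\tau\}] / \hat{f}_Y(\hat{q}_\tau)$ and the score $s_i(\hat{\theta}_\tau; \hat{q}_\tau)$ is obtained by evaluating the expression given in (A.7) at $\theta = \hat{\theta}_\tau$ and $q = \hat{q}_\tau$. $\hat{M}$, $\hat{H}$, and $\hat{H}_Q$ are the sample versions of M, H, and H_Q, respectively. Details are given in the proof of the corollary below.

Corollary 3. Let the assumptions of Theorem 4 hold. Assume that $V_\tau^{-1/2} (\hat{\Gamma}_{\tau,S}^{\mu} - \Gamma_{\tau,S}^{\mu}) \xrightarrow{d} N(0, 1)$ for some V_τ > 0 and $\hat{V}_\tau / V_\tau \xrightarrow{p} 1$. Then, under the null hypothesis $H_0: \Pi_{\tau,S}^{\mu} = 0$,

\[ t_{\tau,S}^{\mu} \xrightarrow{d} N(0, 1). \]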

4.2 Compensated Effects

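In this section, we focus on the estimation of Π_{τ,C} given in (16). We use the same estimators of the quantile, the density of Y, and the parameters in the probit/logit model. We only need to make some minor notational changes. As before $\theta_\tau = (\alpha_\tau', \beta_\tau')'$, $\hat{\theta}_\tau = (\hat{\alpha}_\tau', \hat{\beta}_\tau')'$ and $Z_i = (X_i', W_i')'$ but now $\alpha_\tau = (\alpha_{\tau,1}, \alpha_{\tau,2})'$, $\hat{\alpha}_\tau = (\hat{\alpha}_{\tau,1}, \hat{\alpha}_{\tau,2})'$ and $X_i = (X_{1i}', X_{2i}')'$. As in the case with the location-scale effect, we estimate Π_{τ,C} by

\[ \hat{\Pi}_{\tau,C} = \hat{\Pi}_{\tau,L,1} + \hat{\Pi}_{\tau,L,2}, \]

where

\[ \hat{\Pi}_{\tau,L,1} = -\frac{\dot{\ell}_1(0)}{\hat{f}_Y(\hat{q}_\tau)}\, \frac{1}{n} \sum_{i=1}^{n} g(Z_i' \hat{\theta}_\tau)\, \hat{\alpha}_{\tau,1}, \tag{26} \]

\[ \hat{\Pi}_{\tau,L,2} = -\frac{\dot{\ell}_2(0)}{\hat{f}_Y(\hat{q}_\tau)}\, \frac{1}{n} \sum_{i=1}^{n} g(Z_i' \hat{\theta}_\tau)\, \hat{\alpha}_{\tau,2}. \tag{27} \]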

For the next theorem, we need the following modification of Assumption 5.

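Assumption 7. Logit/Probit II. Assumption 5 holds with (iv) replaced by the following:

\[ \sup_{\theta \in N_\theta} \left\| \frac{1}{n} \sum_{i=1}^{n} \dot{g}(Z_i' \theta) Z_i' - E\left[ \dot{g}(Z_i' \theta) Z_i' \right] \right\| \xrightarrow{p} 0, \qquad \sup_{\theta \in N_\theta} \left| \frac{1}{n} \sum_{i=1}^{n} \dot{g}(Z_i' \theta) - E\left[ \dot{g}(Z_i' \theta) \right] \right| \xrightarrow{p} 0, \]

where N_θ is a neighborhood of θ_τ and

\[ M_L = \left( E\left[ \dot{g}(Z_i' \theta_\tau) \alpha_\tau X_i' + g(Z_i' \theta_\tau) \right],\; E\left[ \dot{g}(Z_i' \theta_\tau) \alpha_\tau W_i' \right] \right) \]

exists.

Theorem 5. Under Assumptions 4, 6, and 7, the estimators given in (26) and (27) satisfy

\[ \begin{pmatrix} \hat{\Pi}_{\tau,L,1} \\ \hat{\Pi}_{\tau,L,2} \end{pmatrix} - \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \Phi_{i,\tau}^{L} + O(h^2) + o_p(n^{-1/2}) + o_p(n^{-1/2} h^{-1/2}), \]

where

\[ \begin{aligned} \Phi_{i,\tau}^{L} ={}& \frac{1}{f_Y(Q_\tau[Y])} D_L \left\{ g(Z_i' \theta_\tau) \alpha_\tau - E\left[ g(Z_i' \theta_\tau) \alpha_\tau \right] \right\} \\ &- \frac{1}{f_Y(Q_\tau[Y])} D_L M_L H^{-1} s_i(\theta_\tau; Q_\tau[Y]) \\ &- \left[ \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} \frac{\dot{f}_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])} D_L M_L H^{-1} H_Q \right] \psi(Y_i, \tau, F_Y) \\ &- \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} \frac{1}{f_Y(Q_\tau[Y])} \left\{ K_h(Y_i - Q_\tau[Y]) - E K_h(Y_i - Q_\tau[Y]) \right\} \end{aligned} \]

and

\[ D_L = \begin{pmatrix} -\dot{\ell}_1(0) & 0 \\ 0 & -\dot{\ell}_2(0) \end{pmatrix}. \]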
For the asymptotic normality, the discussions after Theorem 4 are still applicable.
In the special case that ℓ1(δ) = δ and ℓ2(δ) = −pδ, it suffices to change D_L to diag(1, −p).
It is possible that p, the relative price of X1 in terms of X2, has to be estimated by p̂ based on an independent sample. In that case, the estimator of the compensated effect would be

\[ \hat{\Pi}_{\tau,C} = \hat{\Pi}_{\tau,L,1} - \hat{p}\, \hat{\Pi}_{\tau,L,2}. \]

If the sample size ñ of the independent sample for estimating p is much larger than n (i.e.,
ñ/n → ∞), then the expansion in Theorem 5 still holds.

5 Monte Carlo experiments


In this section, we use Monte Carlo simulations to evaluate the finite sample performances of the
proposed estimators and tests of location and scale effects. We employ the same data generating
process as in Examples 4 and 5 for which we have derived the closed-form expressions for the
location and scale effects. In particular, we let

Y = α + Xβ + U,

where X ∼ N(μ_X, σ_X²) and U ∼ N(0, 1). We set α = 0 and ṡ(0) = ℓ̇(0) = 1. Then, from the results in Examples 4 and 5, the true location effect is Π_{τ,L} = β, and the true scale effect is

\[ \Pi_{\tau,S}^{\mu_X} = -\frac{\sigma_X^2 \beta^2}{\sqrt{\sigma_X^2 \beta^2 + 1}}\, Q_\tau[U]. \]

We consider quantiles τ ∈ {0.10, 0.25, 0.50, 0.75, 0.90} and sample sizes n = 500 and n = 1000. The number of simulations is set to 10,000 for each experiment.
We implement our estimators in Matlab. The unconditional quantile estimator in equation (17) is easily computed as an order statistic. The density function is estimated as a kernel density estimator as in equation (18) using a standard normal kernel. For the bandwidth choice in the kernel density estimation, we use a modified version of Silverman’s rule of thumb. More specifically, since we require nh³ ↑ ∞ and nh⁵ ↓ 0 as n ↑ ∞, we take $h = 1.06\, \hat{\sigma}_Y\, n^{-1/4}$, where $\hat{\sigma}_Y$ is the sample standard deviation of Y.
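The bandwidth rule and the two rate conditions it is meant to satisfy can be checked directly. The sketch below uses hypothetical function names; `rate_diagnostics` treats the standard deviation of Y as a fixed number purely for illustration.

```python
from statistics import stdev


def bandwidth(y):
    """Modified rule of thumb: h = 1.06 * sd(y) * n^(-1/4)."""
    return 1.06 * stdev(y) * len(y) ** (-0.25)


def rate_diagnostics(n, sigma_y=1.0):
    """Return (n * h^3, n * h^5) for the rule above with a fixed sigma_y."""
    h = 1.06 * sigma_y * n ** (-0.25)
    return n * h ** 3, n * h ** 5
```

With h proportional to n^{−1/4}, nh³ grows like n^{1/4} and nh⁵ shrinks like n^{−1/4}, so both requirements hold as n increases.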

5.1 Bias, variance and mean squared error

In this subsection, we consider the biases, variances, and mean-squared errors of the proposed
location and scale effects estimators. For each effect estimator, we consider either a probit or a
logit specification for the conditional cdf FY |X ( Qτ [Y ]| X ). Under our data generating process, the
probit for FY |X ( Qτ [Y ]| X ) is correctly specified while the logit is misspecified.
The bias, variance, and mean-squared error are reported in Table 1 when µ X = 0, β = 1 and
σX2 = 1 so that the true location effect is 1 for any τ and the true scale effect is −0.707Qτ [U ]. To
save space, simulation results for other values of β and σX2 are omitted.
Table 1 shows that the effect estimator based on the probit specification outperforms that
based on the logit one. This is consistent with the correct specification of probit. For each
estimator, the bias decreases as the sample size n increases. The variance also decreases as the
sample size n increases, and as a result, the MSE also becomes smaller when the sample size
grows. For our purposes, the scale-effect estimator performs well. For non-central quantiles,
the difference in the scale-effect estimates under the probit and logit specifications is in general
larger than the difference in the location-effect estimates. For central quantiles, the probit and
logit specifications lead to more or less the same estimates for both the scale effect and the
location effect.

5.2 Accuracy of the normal approximation

Table 1: The biases, variances, and mean-squared errors of the location and scale effects estimators with β = 1 and σ_X² = 1.

n      Measure    Estimator       τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
500    Bias       Π_L (probit)    -0.015     0.013      0.023      0.012      -0.016
500    Bias       Π_L (logit)     -0.016     0.012      0.023      0.012      -0.016
500    Bias       Π_S (probit)    -0.008     0.008      0.000      -0.007     0.008
500    Bias       Π_S (logit)     0.039      0.034      0.000      -0.034     -0.039
500    Variance   Π_L (probit)    0.019      0.010      0.008      0.010      0.019
500    Variance   Π_L (logit)     0.019      0.010      0.008      0.010      0.020
500    Variance   Π_S (probit)    0.032      0.007      0.003      0.008      0.033
500    Variance   Π_S (logit)     0.033      0.007      0.003      0.008      0.034
500    MSE        Π_L (probit)    0.019      0.010      0.009      0.010      0.019
500    MSE        Π_L (logit)     0.020      0.011      0.009      0.010      0.020
500    MSE        Π_S (probit)    0.033      0.007      0.003      0.008      0.033
500    MSE        Π_S (logit)     0.035      0.009      0.003      0.009      0.035
1000   Bias       Π_L (probit)    -0.011     0.009      0.017      0.008      -0.013
1000   Bias       Π_L (logit)     -0.011     0.009      0.017      0.008      -0.013
1000   Bias       Π_S (probit)    -0.007     0.005      -0.000     -0.004     0.010
1000   Bias       Π_S (logit)     0.041      0.032      -0.000     -0.031     -0.038
1000   Variance   Π_L (probit)    0.011      0.006      0.005      0.006      0.011
1000   Variance   Π_L (logit)     0.011      0.006      0.005      0.006      0.011
1000   Variance   Π_S (probit)    0.018      0.004      0.001      0.004      0.017
1000   Variance   Π_S (logit)     0.018      0.004      0.001      0.004      0.018
1000   MSE        Π_L (probit)    0.011      0.006      0.005      0.006      0.011
1000   MSE        Π_L (logit)     0.011      0.006      0.005      0.006      0.011
1000   MSE        Π_S (probit)    0.018      0.004      0.001      0.004      0.017
1000   MSE        Π_S (logit)     0.020      0.005      0.001      0.005      0.019

In this subsection, we investigate the finite sample accuracy of the normal approximation given in (24). Using the same data generating process as in the previous subsection and employing the probit specification, we simulate the distributions of the studentized statistics

\[ \left[ n^{-2} \sum_{i=1}^{n} (l_1' \hat{\Phi}_{i,\tau})^2 \right]^{-1/2} \left( \hat{\Pi}_{\tau,L} - \Pi_{\tau,L} \right) \]

and

\[ \left[ n^{-2} \sum_{i=1}^{n} (l_2' \hat{\Phi}_{i,\tau})^2 \right]^{-1/2} \left( \hat{\Pi}_{\tau,S}^{\mu} - \Pi_{\tau,S}^{\mu} \right). \]

We plot each distribution and compare it with the standard normal distribution. We consider
β ∈ {0.25, 0.50, 0.75, 1} and the same values τ as in the previous subsection. Simulation results
for the two sample sizes n = 500, and n = 1000 are qualitatively similar, and we report only the
case when n = 1000 here. Figures 1–4 report the (simulated) finite sample distributions when
σX2 = 1 and n = 1000 for some selected values of β and τ together with a standard normal
density that is superimposed on each figure. It is clear from these figures that the standard
normal distribution provides an accurate approximation to the distribution of the studentized
test statistic for both the location and scale effects.

Table 2 reports the empirical coverage of 95% confidence intervals for the location and scale effects. The empirical coverage is close to the nominal coverage in all cases. This is consistent with Figures 1–4. We may then conclude that the normal approximation can be reliably used for making inference on the location and scale effects.

[Figure 1: four panels, each overlaying the simulated ("exact") density on the standard normal density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 1: Finite sample exact distribution of the studentized location effect statistic when β = 0.25, σ_X² = 1, and n = 1000.

[Figure 2: four panels, each overlaying the simulated ("exact") density on the standard normal density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 2: Finite sample exact distribution of the studentized location effect statistic when β = 0.75, σ_X² = 1, and n = 1000.

[Figure 3: four panels, each overlaying the simulated ("exact") density on the standard normal density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 3: Finite sample exact distribution of the studentized scale effect statistic when β = 0.25, σ_X² = 1, and n = 1000.

[Figure 4: four panels, each overlaying the simulated ("exact") density on the standard normal density; horizontal axis from −4 to 4, vertical axis from 0 to 0.5.]
Figure 4: Finite sample exact distribution of the studentized scale effect statistic when β = 0.75, σ_X² = 1, and n = 1000.

Table 2: Empirical coverage of 95% confidence intervals for the location and scale effects when σ_X² = 1.

n      Effect     β      τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
500    Location   0.25   0.946      0.950      0.951      0.950      0.947
500    Location   0.5    0.942      0.952      0.950      0.953      0.938
500    Location   0.75   0.940      0.954      0.952      0.956      0.937
500    Location   1      0.937      0.957      0.950      0.957      0.935
500    Scale      0.25   0.900      0.921      0.973      0.916      0.902
500    Scale      0.5    0.930      0.943      0.957      0.939      0.928
500    Scale      0.75   0.937      0.950      0.954      0.946      0.933
500    Scale      1      0.939      0.952      0.951      0.945      0.933
1000   Location   0.25   0.948      0.951      0.951      0.954      0.945
1000   Location   0.5    0.946      0.950      0.952      0.957      0.943
1000   Location   0.75   0.945      0.952      0.953      0.957      0.940
1000   Location   1      0.941      0.952      0.952      0.958      0.942
1000   Scale      0.25   0.922      0.939      0.965      0.940      0.921
1000   Scale      0.5    0.938      0.949      0.955      0.950      0.933
1000   Scale      0.75   0.942      0.951      0.952      0.952      0.938
1000   Scale      1      0.939      0.952      0.950      0.953      0.940

5.3 Power of the t-test of a zero scale effect

To investigate the power of the t-test proposed in Corollary 3, we simulate the following model:

Y = α + Xβ + U,

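where

\[ \begin{pmatrix} X \\ U \end{pmatrix} \sim N\left( \begin{pmatrix} 1 \\ 0 \end{pmatrix},\; \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right). \]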

Here we set α = 0, µ X = 1 and ṡ(0) = 1. When β = 0, X is excluded from the outcome equation
and thus the scale effect is 0. The null hypothesis of a zero scale effect corresponds to the case
that β = 0. The power of the test is obtained by varying β around 0 in a grid from −0.4 to 0.4
with an increment of 0.01.
Figure 5 graphs the size-adjusted power of the t-test for different quantile levels when n = 500
and when n = 1000. The power is calculated using the probit specification of FY |X ( Qτ [Y ]| X ). The
size adjustment is based on the empirical critical value such that the test rejects the null 5% of the
time. Figure 5 shows that the power increases as β deviates more from its null value of zero, and
that for a given nonzero value of β, the power increases with the sample size. Results not reported
here show that the test has a quite accurate size in that the empirical rejection probability under
the null is close to 5%, the nominal level of the test.

[Figure 5: two panels of power curves; horizontal axis ticks from −0.3 to 0.3, vertical axis from 0 to 1.]
Figure 5: Size-adjusted power of the t-test for a zero scale effect.

6 Empirical application
In order to illustrate the proposed approach, we use a household labor survey from Wooldridge (2002) that can be accessed online for replication.³ The idea is to evaluate the effects of education on the quantiles of the unconditional distribution of log wage. In this application, Y = lwage, which is log hourly wage, and X = educ, which is years of education. The controls are: W = [exper tenure nonwhite female], where exper is years of working experience, tenure is years with current employer, nonwhite is a dummy that equals 1 if the individual is non-white, and female is a dummy that equals 1 if the individual is female. We assume that Assumption 2 holds for this choice of W.
While the main goal is to study the scale effect, we also present results for the location effect.
For the mean of years of education µ X , we let µ X = 12.29 based on the Barro-Lee Data on
Educational Attainment.4 We set µ = µ X = 12.29 to study the location and scale effects. In
a similar fashion to the Monte Carlo analysis, we consider τ ∈ {0.10, 0.25, 0.50, 0.75, 0.90}. The
sample size for the household labor survey is n = 526, which is comparable to n = 500 in the
simulation exercises. We compute the standard errors using the approximation in (24).
The most interesting results in Table 3 appear in the unconditional scale effects. As discussed
in Section 2.2, the scale effects can be interpreted as percentage changes of the unconditional
quantiles. Consider the scale effect for τ = 0.10. Both the probit and logit specifications suggest
an effect of about .045. Then, using the quantile-standard deviation elasticity, a 1% decrease in the
standard deviation of education would produce a positive effect of .045% on the unconditional
quantile at the quantile level τ = 0.10. Given that the sample standard deviation of educ is 2.77,
3 See http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.des and http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.dta
for the data in the Stata data file format.
4 The dataset is available from https://databank.worldbank.org/reports.aspx?source=EducationStatistics.
We use the series “Barro-Lee: Average years of total schooling, age 25+, total” for the US between 1970-2010 and find
that the average years of schooling is 12.29.

Table 3: Effects of location-scale shifts in education on the unconditional quantiles of log-wage.

                                 τ = 0.10   τ = 0.25   τ = 0.50   τ = 0.75   τ = 0.90
Location (probit)   Estimate       0.039      0.062      0.101      0.101      0.118
                    (Std. err.)   (0.008)    (0.011)    (0.015)    (0.016)    (0.021)
                    95% CIL        0.025      0.041      0.072      0.069      0.076
                    95% CIU        0.054      0.083      0.129      0.132      0.160
Location (logit)    Estimate       0.038      0.065      0.103      0.100      0.120
                    (Std. err.)   (0.007)    (0.010)    (0.015)    (0.016)    (0.021)
                    95% CIL        0.024      0.044      0.074      0.069      0.080
                    95% CIU        0.053      0.085      0.131      0.132      0.160
Scale (probit)      Estimate       0.045      0.029     -0.025     -0.103     -0.203
                    (Std. err.)   (0.014)    (0.011)    (0.013)    (0.028)    (0.065)
                    95% CIL        0.018      0.007     -0.051     -0.158     -0.330
                    95% CIU        0.071      0.052      0.001     -0.049     -0.077
Scale (logit)       Estimate       0.045      0.034     -0.024     -0.110     -0.227
                    (Std. err.)   (0.014)    (0.012)    (0.014)    (0.029)    (0.066)
                    95% CIL        0.017      0.011     -0.051     -0.167     -0.356
                    95% CIU        0.072      0.058      0.002     -0.053     -0.099
Notes: Standard errors are in parentheses. CIL and CIU are the lower and upper limits of the 95% confidence interval.


Figure 6: Point and interval estimates of location and scale effects of education on the uncondi-
tional quantiles of log-wage based on the probit specification: Πτ,S (solid red) and Πτ,L (dashed
blue).

the 1% decrease is approximately a change in the standard deviation from 2.77 to 2.74. Consider
now the scale effect for τ = 0.50. In this case, both probit and logit specifications provide a
statistically insignificant effect (at the 5% level). Compare this with the results of Examples 4
and 5 where, in the linear model Y = α + Xβ + U, the scale effect is proportional to Qτ [U ].
Thus, Π̂0.50,S ≈ 0 is consistent with a linear model in which U is symmetric around 0. Finally, consider
the scale effect for τ = 0.90, again using both probit and logit specifications. In this case, the
effects are negative, suggesting that a 1% decrease in the standard deviation would reduce the
τ = 0.90 quantile by 0.20% (probit) and 0.23% (logit). Overall, this analysis shows that the scale
effects are monotonically decreasing in τ. This can be seen in Figure 6, which plots, for a finer grid
of τ,5 the probit estimates for both the location (dashed blue) and scale (solid red) effects.
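
The arithmetic behind this reading of the scale effects can be reproduced with a few lines of Python. The sketch below is only an illustration: it copies the probit scale estimates from Table 3 and the sample standard deviation of educ from the text, and the variable names are hypothetical.

```python
# Sample standard deviation of educ reported in the text.
sd_educ = 2.77
sd_after = sd_educ * (1 - 0.01)  # a 1% decrease in the standard deviation

# Probit scale-effect estimates from Table 3, keyed by quantile level tau.
scale_probit = {0.10: 0.045, 0.25: 0.029, 0.50: -0.025, 0.75: -0.103, 0.90: -0.203}

for tau, effect in scale_probit.items():
    # Read as an elasticity: a 1% decrease in the standard deviation of educ
    # changes the tau-quantile of wages by roughly `effect` percent.
    print(f"tau = {tau:.2f}: approx. {effect:+.3f}% change in the quantile")

estimates = [scale_probit[tau] for tau in sorted(scale_probit)]
is_decreasing = all(a > b for a, b in zip(estimates, estimates[1:]))
```

Here `sd_after` rounds to 2.74, matching the change from 2.77 to 2.74 mentioned in the text, and `is_decreasing` confirms that the reported point estimates fall as τ rises.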
How can this be interpreted? The location effects suggest that one more year of education
benefits the upper parts of the unconditional distribution of wages more than the lower parts.
The scale effects suggest the contrary: reducing the overall dispersion of education would
increase the lower quantiles of wages but reduce the upper ones.

7 Conclusion
This paper has provided a general procedure to analyze the distributional impact of changes
in covariates on an outcome variable. The standard unconditional quantile regression analysis
focuses on a particular impact coming from a pure location shift. We study a more general
location-scale model and show how to additively decompose the total effect into a location effect
and a scale effect. They can be separately analyzed and estimated. To complement the existing
results, we focus on how to define and estimate a change in the scale of a covariate. Additionally,
we consider the case of compensated location changes in different covariates. We show how this
can be obtained from the usual vector-valued unconditional quantile regressions. More generally,
we have provided a framework to study the unconditional policy effects generated by a smooth
and invertible intervention of one or more target variables.

References
Autor, D. H., L. S. Katz, and M. S. Kearney (2005): “Rising wage inequality: The role of
composition and prices,” NBER Working Paper 11628.

Casella, G., and R. L. Berger (2001): Statistical Inference, 2nd. edition. Duxbury, Pacific Grove,
CA.

Engle, R., and S. Manganelli (2004): “Conditional Autoregressive Value at Risk by Regression
Quantiles,” Journal of Business & Economic Statistics, 22(4), 367–381.
5 For Figure 6 we use τ = 0.10, 0.11, ..., 0.89, 0.90.

Firpo, S., N. Fortin, and T. Lemieux (2009): “Unconditional quantile regression,” Econometrica,
77(3), 953–973.

Fortin, N., T. Lemieux, and S. Firpo (2011): “Decomposition methods in economics,” in Handbook
of Labor Economics, ed. by O. Ashenfelter, and D. Card, vol. 4, pp. 1–102. Amsterdam: Elsevier.

Gu, G. W., S. Malik, D. Pozzoli, and V. Rocha (2019): “Trade-induced Skill Polarization,”
Economic Inquiry, 58(1), 241–259.

Hanushek, E. A., and L. Woessmann (2008): “The Role of Cognitive Skills in Economic Development,”
Journal of Economic Literature, 46(3), 607–668.

Hsu, Y.-C., T.-C. Lai, and R. P. Lieli (2020): “Counterfactual Treatment Effects: Estimation and
Inference,” Journal of Business and Economic Statistics, Forthcoming.

Inoue, A., T. Li, and Q. Xu (2021): “Two Sample Unconditional Quantile Effect,” arXiv:
https://arxiv.org/pdf/2105.09445.pdf.

Lee, Y.-Y. (2021): “Nonparametric Weighted Average Quantile Derivative,” Econometric Theory,
pp. 1–39.

Machado, J. A. F., and J. Mata (1995): “Counterfactual decomposition of changes in wage
distributions using quantile regression,” Journal of Applied Econometrics, 20, 445–465.

Martinez-Iriarte, J. (2020): “Sensitivity Analysis in Unconditional Quantile Effects,” Working
Paper.

Martinez-Iriarte, J., and Y. Sun (2021a): “Characterizing Asymptotic Biases of Unconditional
Regression Estimators of Policy Effects Under Endogeneity,” Working Paper.

Martinez-Iriarte, J., and Y. Sun (2021b): “Identification and Estimation of Unconditional Policy
Effects of an Endogenous Binary Treatment,” Working Paper.

Melly, B. (2005): “Decomposition of differences in distribution using quantile regressions,”
Labour Economics, 12, 577–590.

Rothe, C. (2012): “Partial Distributional Policy Effects,” Econometrica, 80(5), 2269–2301.

Sasaki, Y., T. Ura, and Y. Zhang (2020): “Unconditional Quantile Regression with High Dimen-
sional Data,” Working Paper.

Serfling, R. J. (1980): Approximation Theorems of Mathematical Statistics. New York: Wiley.

Spini, P. (2021): “Robustness, Heterogeneous Treatment Effects and Covariate Shifts,” Working
Paper.

van der Vaart, A. (1998): Asymptotic Statistics. Cambridge University Press, Cambridge.

Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data. MIT Press, Cam-
bridge, MA.

Appendix
A.1 Proof of Theorem 1

Part (i). To obtain the joint density of ( Xδ , W ), we note that

FXδ ,W ( x, w) = Pr( Xδ ≤ x, W ≤ w) = Pr( X ≤ x δ , W ≤ w) = FX,W ( x δ , w),

and so
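\[
f_{X_\delta,W}(x,w) = \frac{\partial x^\delta}{\partial x} \cdot f_{X,W}(x^\delta,w) = J(x^\delta;\delta)\, f_{X,W}(x^\delta,w).
\]

Evaluated at δ = 0, $J(x^\delta;\delta)$ is 1 and $f_{X_\delta,W}(x,w)$ is $f_{X,W}(x,w)$. Given this, we expand $f_{X_\delta,W}(x,w) - f_{X,W}(x,w)$ around δ = 0, which is possible under Assumptions 1(i) and (iii.a). Observing that

\[
\frac{\partial J(x^\delta;\delta)}{\partial x^\delta} = \frac{\partial}{\partial x}\frac{\partial x^\delta}{\partial x^\delta} = 0,
\]
\[
\left.\frac{\partial J(x^\delta;\delta)}{\partial \delta}\right|_{\delta=0}
= \left.\frac{\partial}{\partial \delta}\frac{\partial x^\delta}{\partial x}\right|_{\delta=0}
= \frac{\partial}{\partial x}\left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta=0}
= \frac{\partial \kappa(x)}{\partial x},
\]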

we have

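\[
\begin{aligned}
& f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \\
&= J(x^\delta;\delta)\, f_{X,W}(x^\delta,w) - f_{X,W}(x,w) \\
&= \delta \left[ \frac{\partial J(x^\delta;\delta)}{\partial x^\delta}\frac{\partial x^\delta}{\partial \delta} + \frac{\partial J(x^\delta;\delta)}{\partial \delta} \right]_{\delta=0} f_{X,W}(x,w)
 + \delta J(x;0)\, \frac{\partial f_{X,W}(x,w)}{\partial x} \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta \left.\frac{\partial J(x^\delta;\delta)}{\partial \delta}\right|_{\delta=0} f_{X,W}(x,w)
 + \delta J(x;0)\, \frac{\partial f_{X,W}(x,w)}{\partial x} \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta \left[ \frac{\partial \kappa(x)}{\partial x} f_{X,W}(x,w) + \kappa(x) \frac{\partial f_{X,W}(x,w)}{\partial x} \right] + \delta R_1(x,w,\delta) \\
&= \delta\, \frac{\partial}{\partial x}\left[ \kappa(x) f_{X,W}(x,w) \right] + \delta R_1(x,w,\delta),
\end{aligned}
\]

where, for $\tilde\delta(x,w)$ between 0 and δ,
\[
R_1(x,w,\delta) = \left\{ \left.\frac{\partial \left[ J(x^\delta;\delta) f_{X,W}(x^\delta,w) \right]}{\partial \delta}\right|_{\delta=\tilde\delta(x,w)}
 - \left.\frac{\partial \left[ J(x^\delta;\delta) f_{X,W}(x^\delta,w) \right]}{\partial \delta}\right|_{\delta=0} \right\}. \tag{A.1}
\]

By the continuity of the derivative of $J(x^\delta;\delta) f_{X,W}(x^\delta,w)$ with respect to δ, we have $R_1(x,w,\delta) = o(1)$ for each $(x,w)$ as δ → 0.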
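It remains to show that $\kappa(x) = -\left.\frac{\partial G(x;\delta)}{\partial \delta}\right|_{\delta=0}$. Differentiating both sides of $G(x^\delta;\delta) = x$ with respect to δ, we obtain that

\[
\frac{\partial x^\delta}{\partial \delta} = -\left( \frac{\partial G(x^\delta;\delta)}{\partial x^\delta} \right)^{-1} \frac{\partial G(x^\delta;\delta)}{\partial \delta}
\]

and so
\[
\kappa(x) := \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta=0}
= -\left( \frac{\partial G(x;0)}{\partial x} \right)^{-1} \left.\frac{\partial G(x;\delta)}{\partial \delta}\right|_{\delta=0}
= -\left.\frac{\partial G(x;\delta)}{\partial \delta}\right|_{\delta=0},
\]

where we have used G(x; 0) = x.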
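
As a quick numerical illustration of this identity, the following Python sketch checks it by finite differences for one specific intervention. The choice G(x; δ) = x/s(δ) + ℓ(δ) with s(δ) = 1 + δ and ℓ(δ) = δ is an assumption made only for this example (it mirrors the location-scale shift of Section S.1), and the function names are hypothetical.

```python
def G(x, delta):
    """Assumed intervention: X_delta = X / s(delta) + l(delta), s = 1 + delta, l = delta."""
    return x / (1.0 + delta) + delta


def x_inverse(x, delta):
    """x^delta, the solution of G(x^delta; delta) = x for the assumed intervention."""
    return (1.0 + delta) * (x - delta)


def d_ddelta_at_zero(func, x, h=1e-6):
    """Central finite-difference derivative with respect to delta at delta = 0."""
    return (func(x, h) - func(x, -h)) / (2.0 * h)


points = [-2.0, 0.0, 0.5, 3.0]
kappa = [d_ddelta_at_zero(x_inverse, x) for x in points]      # d x^delta / d delta at 0
minus_dG = [-d_ddelta_at_zero(G, x) for x in points]          # -dG(x; delta) / d delta at 0
```

For this intervention both derivatives equal x − 1, so `kappa` and `minus_dG` agree up to finite-difference error.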


Part (ii). Consider first the counterfactual distribution FYδ :
\[
F_{Y_\delta}(y) = \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X_\delta,W}(u|x,w)\, f_{X_\delta,W}(x,w)\, du\, dx\, dw,
\]

where for simplicity we have assumed that the support of X conditional on any W = w does not
depend on w and we have denoted the support by X . By Assumption 1(ii), f U |Xδ ,W (u| x, w) =
f U |X,W (u| x δ , w). So we can write

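\[
\begin{aligned}
F_{Y_\delta}(y)
&= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X_\delta,W}(u|x,w)\, f_{X_\delta,W}(x,w)\, du\, dx\, dw \\
&= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X,W}(u|x^\delta,w)\, f_{X_\delta,W}(x,w)\, du\, dx\, dw \\
&= \underbrace{\int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\, du\, dx\, dw}_{= F_Y(y)} \\
&\quad + \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X,W}(u|x,w) \left[ f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \right] du\, dx\, dw \\
&\quad + \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\} \left[ f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w) \right] f_{X_\delta,W}(x,w)\, du\, dx\, dw.
\end{aligned}
\]

Hence, we have
\[
\frac{F_{Y_\delta}(y) - F_Y(y)}{\delta} := G_{1,\delta}(y) + G_{2,\delta}(y),
\]
where
\[
\begin{aligned}
G_{1,\delta}(y) &= \frac{1}{\delta}\int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, f_{U|X,W}(u|x,w) \left[ f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \right] du\, dx\, dw \\
&= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, \frac{1}{\delta}\left[ f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \right] dx\, dw,
\end{aligned} \tag{A.2}
\]
and
\[
G_{2,\delta}(y) = \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}
 \times \frac{1}{\delta}\left[ f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w) \right] f_{X_\delta,W}(x,w)\, du\, dx\, dw. \tag{A.3}
\]
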

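We first consider the term $G_{1,\delta}(y)$. Using Part (i) and Assumption 1(iv), we have
\[
\begin{aligned}
G_{1,\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, \frac{\partial \left[ \kappa(x) f_{X,W}(x,w) \right]}{\partial x}\, dx\, dw
 + \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, R_1(x,w,\delta)\, dx\, dw \\
&= -\int_{\mathcal W}\int_{\mathcal X} \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x}\, \kappa(x)\, f_{X,W}(x,w)\, dx\, dw
 + \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, R_1(x,w,\delta)\, dx\, dw,
\end{aligned}
\]
where the second equality follows from integration by parts. Under Assumption 1(iii.a), we can
use the dominated convergence theorem to obtain
\[
\lim_{\delta \to 0} \sup_{y \in \mathcal Y} \left| \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, R_1(x,w,\delta)\, dx\, dw \right| = 0.
\]

Thus, we have that $G_{1,\delta}(y)$ converges to $G_{1,0}(y)$, given by
\[
G_{1,0}(y) := -\int_{\mathcal W}\int_{\mathcal X} \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x'}\, \kappa(x)\, f_{X,W}(x,w)\, dx\, dw
\]
uniformly in $y \in \mathcal Y$, as δ → 0.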
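Next, we consider $G_{2,\delta}(y)$. Using Assumption 1(iii.b), we have
\[
\begin{aligned}
& \left[ f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w) \right] f_{X,W}(x^\delta,w) \\
&= \left.\frac{\partial f_{U|X,W}(u|x^\delta,w)}{\partial x^{\delta\prime}}\right|_{\delta=0} f_{X,W}(x^\delta,w)\, \frac{\partial x^\delta}{\partial \delta} \cdot \delta + \delta R_2(u,x,w,\delta) \\
&= \frac{\partial f_{U|X,W}(u|x,w)}{\partial x'}\, f_{X,W}(x,w)\, \kappa(x)\, \delta + \delta R_2(u,x,w,\delta),
\end{aligned}
\]
where
\[
\begin{aligned}
R_2(u,x,w,\delta)
&= \left.\frac{\partial \left[ f_{U|X,W}(u|x^\delta,w) f_{X,W}(x^\delta,w) \right]}{\partial \delta}\right|_{\delta=\tilde\delta(u,x,w)}
 - \left.\frac{\partial \left[ f_{U|X,W}(u|x^\delta,w) f_{X,W}(x^\delta,w) \right]}{\partial \delta}\right|_{\delta=0} \\
&\quad - \left[ f_{U|X,W}(u|x,w) \left.\frac{\partial f_{X,W}(x^\delta,w)}{\partial \delta}\right|_{\delta=\tilde\delta(u,x,w)}
 - f_{U|X,W}(u|x,w) \left.\frac{\partial f_{X,W}(x^\delta,w)}{\partial \delta}\right|_{\delta=0} \right].
\end{aligned}
\]

Note that in the above, the transpose on x is not relevant but we keep it so that the same lines of
arguments can be used for proving Theorem 3. Hence
\[
\begin{aligned}
G_{2,\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, \frac{\partial f_{U|X,W}(u|x,w)}{\partial x'}\, f_{X,W}(x,w)\, \kappa(x)\, du\, dx\, dw \\
&\quad + \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, R_2(u,x,w,\delta)\, du\, dx\, dw.
\end{aligned}
\]

Under Assumption 1(iii.b), we can invoke the dominated convergence theorem to get
\[
\lim_{\delta \to 0} \sup_{y \in \mathcal Y} \left| \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, R_2(u,x,w,\delta)\, du\, dx\, dw \right| = 0.
\]

Hence, uniformly in $y \in \mathcal Y$, as δ → 0, $G_{2,\delta}(y)$ converges to
\[
\begin{aligned}
G_{2,0}(y) &= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, \frac{\partial f_{U|X,W}(u|x,w)}{\partial x'}\, \kappa(x)\, f_{X,W}(x,w)\, du\, dx\, dw \\
&= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, \frac{\partial \ln f_{U|X,W}(u|x,w)}{\partial x'}\, \kappa(x)\, f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\, du\, dx\, dw \\
&= E\left[ 1\{h(X,W,U) \le y\}\, \frac{\partial \ln f_{U|X,W}(U|X,W)}{\partial X'}\, \kappa(X) \right].
\end{aligned}
\]

Combining the above results yields
\[
\begin{aligned}
\frac{F_{Y_\delta}(y) - F_Y(y)}{\delta}
&\to G_{1,0}(y) + G_{2,0}(y) \\
&= E\left[ \left( -\frac{\partial F_{Y|X,W}(y|X,W)}{\partial X'} + 1\{h(X,W,U) \le y\}\, \frac{\partial \ln f_{U|X,W}(U|X,W)}{\partial X'} \right) \kappa(X) \right] \\
&:= G(y)
\end{aligned}
\]
uniformly over $y \in \mathcal Y$ as δ → 0.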
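Part (iii). Note that $\psi(y,\tau,F_Y)$ is the influence function of the quantile functional. Using Part
(ii) and Assumption 1(v), we have
\[
\Pi_\tau = \int_{\mathcal Y} \psi(y,\tau,F_Y)\, dG(y) = \int_{\mathcal Y} \psi(y,\tau,F_Y)\, dG_{1,0}(y) + \int_{\mathcal Y} \psi(y,\tau,F_Y)\, dG_{2,0}(y)
\]
by Lemma 21.3 in van der Vaart (1998). Now
\[
\begin{aligned}
\int_{\mathcal Y} \psi(y,\tau,F_Y)\, dG_{1,0}(y)
&= -\int_{\mathcal W}\int_{\mathcal X} \left[ \int_{\mathcal Y} \psi(y,\tau,F_Y)\, \frac{\partial f_{Y|X,W}(y|x,w)}{\partial x'}\, dy \right] \kappa(x)\, f_{X,W}(x,w)\, dx\, dw \\
&= -\int_{\mathcal W}\int_{\mathcal X} \frac{\partial}{\partial x'} \left[ \int_{\mathcal Y} \psi(y,\tau,F_Y)\, f_{Y|X,W}(y|x,w)\, dy \right] \kappa(x)\, f_{X,W}(x,w)\, dx\, dw \\
&= -\int_{\mathcal W}\int_{\mathcal X} \frac{\partial E\left[ \psi(Y,\tau,F_Y) \mid X = x, W = w \right]}{\partial x'}\, \kappa(x)\, f_{X,W}(x,w)\, dx\, dw
\end{aligned}
\]
and
\[
\begin{aligned}
\int_{\mathcal Y} \psi(y,\tau,F_Y)\, dG_{2,0}(y)
&= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} \left[ \int_{\mathcal Y} \psi(y,\tau,F_Y)\, d1\{h(x,w,u) \le y\} \right] \frac{\partial \ln f_{U|X,W}(u|x,w)}{\partial x'}\, \kappa(x) \\
&\qquad \times f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\, du\, dx\, dw \\
&= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} \psi(h(x,w,u),\tau,F_Y)\, \frac{\partial \ln f_{U|X,W}(u|x,w)}{\partial x'}\, \kappa(x) \\
&\qquad \times f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\, du\, dx\, dw.
\end{aligned}
\]

Therefore,
\[
\begin{aligned}
\Pi_\tau &= -\int_{\mathcal W}\int_{\mathcal X} \frac{\partial E\left[ \psi(Y,\tau,F_Y) \mid X = x, W = w \right]}{\partial x'}\, \kappa(x)\, f_{X,W}(x,w)\, dx\, dw \\
&\quad + \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} \psi(h(x,w,u),\tau,F_Y)\, \frac{\partial \ln f_{U|X,W}(u|x,w)}{\partial x'}\, \kappa(x) \\
&\qquad \times f_{U|X,W}(u|x,w)\, f_{X,W}(x,w)\, du\, dx\, dw.
\end{aligned}
\]
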

A.2 Proof of Theorem 2

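The conditional version of $\Pi_\tau$ is
\[
\Pi_\tau(x,w) = \frac{1}{f_{Y|X,W}(Q_\tau[Y|x,w] \mid x,w)} \left.\frac{\partial F_{Y|X,W}(Q_\tau[Y|x,w] \mid z,w)}{\partial z}\right|_{z=x} \kappa(x).
\]

Let $\xi_\tau(x,w)$ be the quantile implied by the matching function in (10). The conditional effect at
this particular quantile is then
\[
\Pi_{\xi_\tau(x,w)}(x,w) = \frac{1}{f_{Y|X,W}(Q_\tau[Y] \mid x,w)}\, \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid x,w)}{\partial x}\, \kappa(x).
\]

But Corollary 1 says that
\[
\Pi_\tau = \frac{1}{f_Y(Q_\tau[Y])} \int_{\mathcal W}\int_{\mathcal X} \frac{\partial F_{Y|X,W}(Q_\tau[Y] \mid x,w)}{\partial x}\, \kappa(x)\, f_{X,W}(x,w)\, dx\, dw.
\]

It follows that we can reweight $\Pi_{\xi_\tau(x,w)}(x,w)$ to obtain $\Pi_\tau$:
\[
\begin{aligned}
\Pi_\tau &= \int_{\mathcal W}\int_{\mathcal X} \Pi_{\xi_\tau(x,w)}(x,w)\, \frac{f_{Y|X,W}(Q_\tau[Y] \mid x,w)}{f_Y(Q_\tau[Y])}\, f_{X,W}(x,w)\, dx\, dw \\
&= E\left[ \Pi_{\xi_\tau(X,W)}(X,W)\, \frac{f_{Y|X,W}(Q_\tau[Y] \mid X,W)}{f_Y(Q_\tau[Y])} \right].
\end{aligned}
\]

To obtain the second representation, we note that
\[
\frac{f_{Y|X,W}(Q_\tau[Y] \mid x,w)}{f_Y(Q_\tau[Y])}\, f_{X,W}(x,w) = f_{X,W|Y}(x,w \mid Q_\tau[Y]),
\]
and so we obtain:
\[
\begin{aligned}
\Pi_\tau &= \int_{\mathcal W}\int_{\mathcal X} \Pi_{\xi_\tau(x,w)}(x,w)\, f_{X,W|Y}(x,w \mid Q_\tau[Y])\, dx\, dw \\
&= E\left[ \Pi_{\xi_\tau(X,W)}(X,W) \mid Y = Q_\tau[Y] \right].
\end{aligned}
\]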

A.3 Proof of Theorem 3

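The proof of this theorem is very similar to the proof of Theorem 1. The following decomposition
still holds:
\[
\frac{F_{Y_\delta}(y) - F_Y(y)}{\delta} := G_{1,\delta}(y) + G_{2,\delta}(y),
\]
where
\[
\begin{aligned}
G_{1,\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, \frac{\left[ f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \right]}{\delta}\, dx\, dw, \\
G_{2,\delta}(y) &= \int_{\mathcal W}\int_{\mathcal X}\int_{\mathcal U} 1\{h(x,w,u) \le y\}\, \frac{\left[ f_{U|X,W}(u|x^\delta,w) - f_{U|X,W}(u|x,w) \right]}{\delta}\, f_{X_\delta,W}(x,w)\, du\, dx\, dw.
\end{aligned}
\]

We first consider the term $G_{1,\delta}(y)$. Under the assumptions given, we have
\[
f_{X_\delta,W}(x,w) = \det\left[ J(x^\delta;\delta) \right] f_{X,W}(x^\delta,w).
\]

Evaluated at δ = 0, $f_{X_\delta,W}(x,w)$ is $f_{X,W}(x,w)$. Given this, we expand $f_{X_\delta,W}(x,w) - f_{X,W}(x,w)$
around δ = 0, which is possible under Assumptions 3(i) and (iii). We have
\[
\begin{aligned}
& f_{X_\delta,W}(x,w) - f_{X,W}(x,w) \\
&= \delta \left.\frac{\partial \left\{ \det\left[ J(x^\delta;\delta) \right] f_{X,W}(x^\delta,w) \right\}}{\partial \delta}\right|_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta \left.\frac{\partial \det\left[ J(x^\delta;\delta) \right]}{\partial \delta}\right|_{\delta=0} f_{X,W}(x,w)
 + \delta \left\{ \det\left[ J(x^\delta;\delta) \right] \left( \frac{\partial x^\delta}{\partial \delta} \right)' \frac{\partial f_{X,W}(x^\delta,w)}{\partial x^\delta} \right\}_{\delta=0} + \delta R_1(x,w,\delta) \\
&= \delta \left.\frac{\partial \det\left[ J(x^\delta;\delta) \right]}{\partial \delta}\right|_{\delta=0} f_{X,W}(x,w)
 + \delta \kappa(x)' \frac{\partial f_{X,W}(x,w)}{\partial x} + \delta R_1(x,w,\delta),
\end{aligned} \tag{A.4}
\]
where, for $\tilde\delta(x,w)$ between 0 and δ,
\[
R_1(x,w,\delta) = \left\{ \left.\frac{\partial \left( \det\left[ J(x^\delta;\delta) \right] f_{X,W}(x^\delta,w) \right)}{\partial \delta}\right|_{\delta=\tilde\delta(x,w)}
 - \left.\frac{\partial \left( \det\left[ J(x^\delta;\delta) \right] f_{X,W}(x^\delta,w) \right)}{\partial \delta}\right|_{\delta=0} \right\}.
\]

Using arguments similar to those in the proof of Theorem 1, we can show that $G_{1,\delta}(y)$
converges to
\[
\begin{aligned}
G_{1,0}(y) &:= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w) \left.\frac{\partial \det\left[ J(x^\delta;\delta) \right]}{\partial \delta}\right|_{\delta=0} f_{X,W}(x,w)\, dx\, dw \\
&\quad + \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w) \left( \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta=0} \right)' \frac{\partial f_{X,W}(x,w)}{\partial x}\, dx\, dw \\
&:= G_{1,0}^{(1)}(y) + G_{1,0}^{(2)}(y)
\end{aligned}
\]
uniformly in $y \in \mathcal Y$, as δ → 0.
Using Assumption 3 and the fact that $\left.\frac{\partial x_i^\delta}{\partial x_j}\right|_{\delta=0} = 1\{i = j\}$, we have
\[
\begin{aligned}
\left.\frac{\partial \det\left[ J(x^\delta;\delta) \right]}{\partial \delta}\right|_{\delta=0}
&= \left.\frac{\partial}{\partial \delta} \left( \frac{\partial x_1^\delta}{\partial x_1}\frac{\partial x_2^\delta}{\partial x_2} - \frac{\partial x_1^\delta}{\partial x_2}\frac{\partial x_2^\delta}{\partial x_1} \right)\right|_{\delta=0} \\
&= \left.\left( \frac{\partial \kappa_1(x)}{\partial x_1}\frac{\partial x_2^\delta}{\partial x_2} + \frac{\partial x_1^\delta}{\partial x_1}\frac{\partial \kappa_2(x)}{\partial x_2} - \frac{\partial \kappa_1(x)}{\partial x_2}\frac{\partial x_2^\delta}{\partial x_1} - \frac{\partial x_1^\delta}{\partial x_2}\frac{\partial \kappa_2(x)}{\partial x_1} \right)\right|_{\delta=0} \\
&= \frac{\partial \kappa_1(x)}{\partial x_1} + \frac{\partial \kappa_2(x)}{\partial x_2}.
\end{aligned}
\]

So
\[
G_{1,0}^{(1)}(y) = \int_{\mathcal W}\int_{\mathcal X} \left( \frac{\partial \kappa_1(x)}{\partial x_1} + \frac{\partial \kappa_2(x)}{\partial x_2} \right) F_{Y|X,W}(y|x,w)\, f_{X,W}(x,w)\, dx\, dw.
\]
Next, note that
\[
\left( \left.\frac{\partial x^\delta}{\partial \delta}\right|_{\delta=0} \right)' \frac{\partial f_{X,W}(x,w)}{\partial x} = \kappa(x)' \frac{\partial f_{X,W}(x,w)}{\partial x}.
\]

Using integration by parts, we can show that for j = 1 and 2,
\[
\int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, \kappa_j(x)\, \frac{\partial f_{X,W}(x,w)}{\partial x_j}\, dx\, dw
= -\int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w)\, \frac{\partial \left[ F_{Y|X,W}(y|x,w)\, \kappa_j(x) \right]}{\partial x_j}\, dx\, dw.
\]

So
\[
\begin{aligned}
G_{1,0}^{(2)}(y) &= \int_{\mathcal W}\int_{\mathcal X} F_{Y|X,W}(y|x,w)\, \kappa(x)' \frac{\partial f_{X,W}(x,w)}{\partial x}\, dx\, dw \\
&= -\int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w) \left( \frac{\partial \left[ F_{Y|X,W}(y|x,w)\, \kappa_1(x) \right]}{\partial x_1} + \frac{\partial \left[ F_{Y|X,W}(y|x,w)\, \kappa_2(x) \right]}{\partial x_2} \right) dx\, dw \\
&= -\int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w) \left( \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_1}\, \kappa_1(x) + \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_2}\, \kappa_2(x) \right) dx\, dw \\
&\quad - \int_{\mathcal W}\int_{\mathcal X} f_{X,W}(x,w)\, F_{Y|X,W}(y|x,w) \left( \frac{\partial \kappa_1(x)}{\partial x_1} + \frac{\partial \kappa_2(x)}{\partial x_2} \right) dx\, dw.
\end{aligned}
\]

Therefore,
\[
\begin{aligned}
G_{1,0}(y)
&= -\int_{\mathcal W}\int_{\mathcal X} \left[ \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_1}\, \kappa_1(x) + \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x_2}\, \kappa_2(x) \right] f_{X,W}(x,w)\, dx\, dw \\
&= -\int_{\mathcal W}\int_{\mathcal X} \left[ \frac{\partial F_{Y|X,W}(y|x,w)}{\partial x'}\, \kappa(x) \right] f_{X,W}(x,w)\, dx\, dw \\
&= -E\left[ \frac{\partial F_{Y|X,W}(y|X,W)}{\partial X'}\, \kappa(X) \right].
\end{aligned}
\]

For $G_{2,\delta}(y)$, the proof of Theorem 1 remains valid, and we have that $G_{2,\delta}(y)$ converges to
\[
G_{2,0}(y) := E\left[ 1\{h(X,W,U) \le y\}\, \frac{\partial \ln f_{U|X,W}(U|X,W)}{\partial X'}\, \kappa(X) \right]
\]
uniformly in $y \in \mathcal Y$, as δ → 0.
Invoking the same argument as that in the proof of Theorem 1, we obtain the desired result.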

A.4 Proof of Lemma 1

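The main complication in this lemma is that the dependent variable is $1\{Y_i \le \hat q_\tau\}$. This means
that the preliminary estimator $\hat q_\tau$ might affect the asymptotic distribution of $\hat\alpha_\tau$ and $\hat\beta_\tau$.
As mentioned in the main text, under Assumption 4,
\[
\hat q_\tau - Q_\tau[Y] = \frac{1}{n}\sum_{i=1}^n \frac{\tau - 1\{Y_i \le Q_\tau[Y]\}}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2})
= \frac{1}{n}\sum_{i=1}^n \psi(Y_i,\tau,F_Y) + o_p(n^{-1/2}).
\]

Recall that
\[
\hat\theta_\tau = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \left\{ 1\{Y_i \le \hat q_\tau\} \log\left[ G(Z_i'\theta) \right] + 1\{Y_i > \hat q_\tau\} \log\left[ 1 - G(Z_i'\theta) \right] \right\}.
\]

Let $s_i(\theta;\hat q_\tau)$ denote the score for observation i. Then, under Assumption 5(i), we have
\[
\frac{1}{n}\sum_{i=1}^n s_i(\hat\theta_\tau;\hat q_\tau) = 0.
\]

Taking a mean-value expansion (element-by-element), we obtain
\[
\underbrace{\frac{1}{n}\sum_{i=1}^n s_i(\hat\theta_\tau;\hat q_\tau)}_{=0}
= \frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;\hat q_\tau) + \frac{1}{n}\sum_{i=1}^n H_i(\tilde\theta_\tau;\hat q_\tau)\left( \hat\theta_\tau - \theta_\tau \right),
\]
where $\tilde\theta_\tau$ is between $\theta_\tau$ and $\hat\theta_\tau$ and can be different for different rows of $H_i$. Under the assumption
of the uniform law of large numbers for the Hessian (i.e., Assumption 5(ii)), we obtain
\[
\frac{1}{n}\sum_{i=1}^n H_i(\tilde\theta_\tau;\hat q_\tau) \overset{p}{\to} E\left[ H_i(\theta_\tau;Q_\tau[Y]) \right] =: H.
\]

We have then
\[
0 = \frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;\hat q_\tau) + H\left( \hat\theta_\tau - \theta_\tau \right) + o_p\left( \left\| \hat\theta_\tau - \theta_\tau \right\| \right). \tag{A.5}
\]

Now, we use the stochastic equicontinuity in Assumption 5(iii):
\[
\frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;\hat q_\tau) - \left. E\left[ s_i(\theta_\tau;q) \right] \right|_{q=\hat q_\tau}
= \frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y]) + o_p(n^{-1/2}).
\]

Here we have used that $E[s_i(\theta_\tau;Q_\tau[Y])] = 0$: the score evaluated at the true quantile has expected
value 0. Plugging this back into (A.5), we obtain
\[
0 = \left. E\left[ s_i(\theta_\tau;q) \right] \right|_{q=\hat q_\tau} + \frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y]) + H\left( \hat\theta_\tau - \theta_\tau \right) + o_p\left( \left\| \hat\theta_\tau - \theta_\tau \right\| \right). \tag{A.6}
\]

Here $E[s_i(\theta_\tau;q)]|_{q=\hat q_\tau}$ is random because we first compute the expectation $E[s_i(\theta_\tau;q)]$ for a fixed
q and then replace q by $\hat q_\tau$, which is random. To show that $E[s_i(\theta_\tau;q)]|_{q=\hat q_\tau}$ is $O_p(n^{-1/2})$, we
observe that (see equation 15.18 in Wooldridge (2002))
\[
s_i(\theta;q) = \frac{g(Z_i'\theta)\, Z_i \left[ 1\{Y_i \le q\} - G(Z_i'\theta) \right]}{G(Z_i'\theta)\left[ 1 - G(Z_i'\theta) \right]}. \tag{A.7}
\]

Therefore, using the law of iterated expectations, we obtain
\[
E\left[ s_i(\theta;q) \right] = E\left[ \frac{g(Z_i'\theta)\, Z_i \left[ F_{Y|Z}(q|Z_i) - G(Z_i'\theta) \right]}{G(Z_i'\theta)\left[ 1 - G(Z_i'\theta) \right]} \right].
\]

So
\[
H_Q = \left.\frac{\partial E\left[ s_i(\theta_\tau;q) \right]}{\partial q}\right|_{q=Q_\tau[Y]}
= E\left[ \frac{g(Z_i'\theta_\tau)\, Z_i\, f_{Y|Z}(Q_\tau[Y]|Z_i)}{G(Z_i'\theta_\tau)\left[ 1 - G(Z_i'\theta_\tau) \right]} \right]. \tag{A.8}
\]

We have
\[
\begin{aligned}
\left. E\left[ s_i(\theta_\tau;q) \right] \right|_{q=\hat q_\tau}
&= \underbrace{E\left[ s_i(\theta_\tau;Q_\tau[Y]) \right]}_{=0}
 + \underbrace{\left.\frac{\partial E\left[ s_i(\theta_\tau;q) \right]}{\partial q}\right|_{q=Q_\tau[Y]}}_{=H_Q} \left( \hat q_\tau - Q_\tau[Y] \right) + o_p(n^{-1/2}) \\
&= H_Q\left( \hat q_\tau - Q_\tau[Y] \right) + o_p(n^{-1/2}),
\end{aligned}
\]
which implies that $E[s_i(\theta_\tau;q)]|_{q=\hat q_\tau} = O_p(n^{-1/2})$. Going back to (A.6), we obtain
\[
\left\| H\left( \hat\theta_\tau - \theta_\tau \right) + o_p\left( \left\| \hat\theta_\tau - \theta_\tau \right\| \right) \right\|
\le \left\| \left. E\left[ s_i(\theta_\tau;q) \right] \right|_{q=\hat q_\tau} \right\| + \left\| \frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y]) \right\|,
\]
which implies that
\[
\hat\theta_\tau - \theta_\tau = O_p(n^{-1/2}).
\]

Furthermore, since H is negative definite (and hence invertible), we have
\[
\begin{aligned}
\hat\theta_\tau - \theta_\tau
&= \underbrace{-H^{-1}\frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y])}_{\text{Usual influence function}}
 \underbrace{{} - H^{-1} \left. E\left[ s_i(\theta_\tau;q) \right] \right|_{q=\hat q_\tau}}_{\text{Contribution of } \hat q_\tau} + o_p(n^{-1/2}) \\
&= -H^{-1}\frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y]) - H^{-1}H_Q\left( \hat q_\tau - Q_\tau[Y] \right) + o_p(n^{-1/2}) \\
&= -H^{-1}\frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y]) - H^{-1}H_Q\frac{1}{n}\sum_{i=1}^n \psi(Y_i,\tau,F_Y) + o_p(n^{-1/2}).
\end{aligned} \tag{A.9}
\]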
=1

A.5 Proof of Theorem 4

To establish the joint asymptotic distribution of the estimators of the location and scale effects, we
need to obtain the asymptotic distribution of $\hat f_Y(\hat q_\tau)$. By Lemma 6 in Martinez-Iriarte and Sun
(2021b), we have that
\[
\hat f_Y(y) - f_Y(y) = \frac{1}{n}\sum_{i=1}^n K_h(Y_i - y) - E\left[ K_h(Y - y) \right] + B_f(y) + o_p(h^2), \tag{A.10}
\]
where the bias is
\[
B_f(y) = \frac{1}{2} h^2 f_Y''(y) \int_{-\infty}^{\infty} u^2 K(u)\, du.
\]

Moreover, we can write
\[
\hat f_Y(\hat q_\tau) - \hat f_Y(Q_\tau[Y]) = \dot f_Y(Q_\tau[Y])\left( \hat q_\tau - Q_\tau[Y] \right) + o_p(n^{-1/2}h^{-1/2}),
\]
where $\dot f_Y$ is the derivative of the density. Thus, we have that
\[
\begin{aligned}
\hat f_Y(\hat q_\tau) - f_Y(Q_\tau[Y])
&= \hat f_Y(\hat q_\tau) - \hat f_Y(Q_\tau[Y]) + \hat f_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y]) \\
&= \dot f_Y(Q_\tau[Y])\left( \hat q_\tau - Q_\tau[Y] \right) + \hat f_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y]) + o_p(n^{-1/2}h^{-1/2}).
\end{aligned} \tag{A.11}
\]

The first term captures the uncertainty associated with estimating the quantile, and the second
term captures the uncertainty associated with estimating the density.
Next, we can write the location and scale effects as
! ! " #
Π̂τ,L Πτ,L n−1 ∑in=1 g( Zi0 θ̂τ )α̂τ X̃i E g( Zi0 θτ )ατ X̃i

− = Dµ − .
Π̂τ,S Πτ,S fˆY (q̂τ )
µ µ
f Y ( Qτ [Y ])

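Now
\[
\begin{aligned}
& \frac{n^{-1}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau \tilde X_i}{\hat f_Y(\hat q_\tau)} - \frac{E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right]}{f_Y(Q_\tau[Y])} \\
&= \frac{n^{-1}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau \tilde X_i - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right]}{\hat f_Y(\hat q_\tau)}
 - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right] \frac{\hat f_Y(\hat q_\tau) - f_Y(Q_\tau[Y])}{\hat f_Y(\hat q_\tau)\, f_Y(Q_\tau[Y])} \\
&= \frac{n^{-1}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau \tilde X_i - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right]}{\hat f_Y(\hat q_\tau)} \\
&\quad - \frac{E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right]}{f_Y(Q_\tau[Y])^2} \left\{ \dot f_Y(Q_\tau[Y])\left( \hat q_\tau - Q_\tau[Y] \right) + \hat f_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y]) \right\} + o_p(n^{-1/2}h^{-1/2}).
\end{aligned}
\]

Taking a mean-value expansion (element-by-element), we have
\[
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau \tilde X_i
&= \frac{1}{n}\sum_{i=1}^n g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \\
&\quad + \left( \frac{1}{n}\sum_{i=1}^n \dot g(Z_i'\tilde\theta_\tau)\tilde\alpha_\tau \tilde X_i Z_i' \right)\left( \hat\theta_\tau - \theta_\tau \right)
 + \left( \frac{1}{n}\sum_{i=1}^n g(Z_i'\tilde\theta_\tau) \tilde X_i \right)\left( \hat\alpha_\tau - \alpha_\tau \right).
\end{aligned}
\]

Using the uniform law of large numbers in Assumption 5(iv), we have
\[
\frac{1}{n}\sum_{i=1}^n \dot g(Z_i'\tilde\theta_\tau)\tilde\alpha_\tau \tilde X_i Z_i' \overset{p}{\to} M_1 := \underbrace{E\left[ \dot g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i Z_i' \right]}_{2 \times \dim(Z)}
\]
and
\[
\frac{1}{n}\sum_{i=1}^n g(Z_i'\tilde\theta_\tau) \tilde X_i \overset{p}{\to} M_2 := \underbrace{E\left[ g(Z_i'\theta_\tau) \tilde X_i \right]}_{2 \times 1}.
\]

Therefore,
\[
\begin{aligned}
& \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau \tilde X_i - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right] \right) \\
&= \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^n g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right] \right)
 + M_1 \sqrt{n}\left( \hat\theta_\tau - \theta_\tau \right) + M_2 \sqrt{n}\left( \hat\alpha_\tau - \alpha_\tau \right) + o_p(1).
\end{aligned}
\]

The first term captures the uncertainty in estimating the expected value. The second and
third terms capture the uncertainty in estimating the logit/probit model and already
incorporate the contribution of the preliminary estimator $\hat q_\tau$ of $Q_\tau[Y]$. To ease notation, define
$M := M_1 + (M_2, O)$, where O is a $2 \times \dim(W)$ matrix of zeros. An explicit expression of M is
given in Assumption 5(iv). Thus, we can write:
\[
\begin{aligned}
& \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau \tilde X_i - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right] \right) \\
&= \sqrt{n}\left( \frac{1}{n}\sum_{i=1}^n g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \tilde X_i \right] \right)
 + M \sqrt{n}\left( \hat\theta_\tau - \theta_\tau \right) + o_p(1).
\end{aligned} \tag{A.12}
\]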

It then follows that


! ! " #
Π̂τ,L Πτ,L 1 1 n
n i∑
0 0
 
− = Dµ g( Zi θτ )ατ X̃i − E g( Zi θτ )ατ X̃i
Π̂τ,S Πτ,S
µ µ
f Y ( Qτ [Y ]) =1
!
1  Πτ,L f˙Y ( Qτ [Y ])
+ Dµ M θ̂τ − θτ − (q̂τ − Qτ [Y ])
Πτ,S f Y ( Qτ [Y ])
µ
f Y ( Qτ [Y ])
!ˆ 
Πτ,L f Y ( Q τ [ Y ]) − f Y ( Q τ [ Y ])
− + o p (n−1/2 ) + o p (n−1/2 h−1/2 ).
Πτ,S
µ
f Y ( Qτ [Y ])

√ 
Plugging the asymptotic representation of n θ̂τ − θτ in (A.9), we obtain:
! ! " #
Π̂τ,L Πτ,L 1 1 n
n i∑
0 0
 
− = Dµ g( Zi θτ )ατ X̃i − E g( Zi θτ )ατ X̃i
Π̂τ,S Πτ,S
µ µ
f Y ( Qτ [Y ]) =1
1 1 n
− Dµ MH −1 ∑ si (θτ ; Qτ [Y ])
f Y ( Qτ [Y ]) n i =1
" ! #
Πτ,L f˙Y ( Qτ [Y ]) 1 1 n
n i∑
−1
− + D µ MH H Q ψ(Yi , τ, FY )
Πτ,S f Y ( Qτ [Y ])
µ
f Y ( Qτ [Y ]) =1
!
Πτ,L fˆY ( Qτ [Y ]) − f Y ( Qτ [Y ])
− + o p (n−1/2 ) + o p (n−1/2 h−1/2 ).
Πτ,S
µ
f Y ( Q τ [ Y ])

Plugging the representation of fˆY ( Qτ [Y ]) − f Y ( Qτ [Y ]) in (A.10) completes the proof.

A.6 Proof of Corollary 3

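The result has been proved in the main text. Here we give the expressions for $\hat M$, $\hat H$, and $\hat H_Q$.
For $\hat M$ and $\hat H$, we have
\[
\hat M = \frac{1}{n}\sum_{i=1}^n
\begin{pmatrix}
\dot g(Z_i'\hat\theta_\tau)\hat\alpha_\tau X_i' + g(Z_i'\hat\theta_\tau), & \dot g(Z_i'\hat\theta_\tau)\hat\alpha_\tau W_i' \\
\dot g(Z_i'\hat\theta_\tau)\hat\alpha_\tau X_i X_i' + g(Z_i'\hat\theta_\tau) X_i, & \dot g(Z_i'\hat\theta_\tau)\hat\alpha_\tau X_i W_i'
\end{pmatrix}
\]
and
\[
\hat H = \frac{1}{n}\sum_{i=1}^n \frac{g(Z_i'\hat\theta_\tau)^2}{G(Z_i'\hat\theta_\tau)\left( 1 - G(Z_i'\hat\theta_\tau) \right)}
\begin{pmatrix}
X_i^2 & X_i W_i' \\
X_i W_i & W_i W_i'
\end{pmatrix}.
\]

For $\hat H_Q$, we note that
\[
H_Q = \left.\frac{\partial E\left[ s_i(\theta_\tau;q) \right]}{\partial q}\right|_{q=Q_\tau[Y]}
= E\left[ \frac{g(Z_i'\theta_\tau)\, Z_i\, f_{Y|Z}(Q_\tau[Y]|Z_i)}{G(Z_i'\theta_\tau)\left( 1 - G(Z_i'\theta_\tau) \right)} \right].
\]

Let
\[
\Lambda(Z_i,\theta_\tau) := \frac{g(Z_i'\theta_\tau)\, Z_i}{G(Z_i'\theta_\tau)\left( 1 - G(Z_i'\theta_\tau) \right)}.
\]

Then
\[
\begin{aligned}
H_Q &= E\left[ \Lambda(Z_i,\theta_\tau)\, f_{Y|Z}(Q_\tau[Y]|Z_i) \right] \\
&= \int_{\mathcal Z} \Lambda(z,\theta_\tau)\, f_{Y|Z}(Q_\tau[Y]|z)\, f_Z(z)\, dz \\
&= f_Y(Q_\tau[Y]) \int_{\mathcal Z} \Lambda(z,\theta_\tau)\, \frac{f_{Y,Z}(Q_\tau[Y],z)}{f_Y(Q_\tau[Y])\, f_Z(z)}\, f_Z(z)\, dz \\
&= f_Y(Q_\tau[Y]) \int_{\mathcal Z} \Lambda(z,\theta_\tau)\, f_{Z|Y}(z|Q_\tau[Y])\, dz \\
&= f_Y(Q_\tau[Y])\, E\left[ \Lambda(Z,\theta_\tau) \mid Y = Q_\tau[Y] \right].
\end{aligned}
\]

To estimate the conditional expectation, we may use a vector version of the Nadaraya-Watson
estimator:
\[
\hat E\left[ \Lambda(Z,\hat\theta_\tau) \mid Y = \hat q_\tau \right] = \frac{\sum_{i=1}^n K_h(Y_i - \hat q_\tau)\, \Lambda(Z_i,\hat\theta_\tau)}{\sum_{i=1}^n K_h(Y_i - \hat q_\tau)},
\]
where $K_h$ is the rescaled kernel $K_h(Y_i - y) = h^{-1}K((Y_i - y)/h)$ for a kernel function $K(\cdot)$. We
can then estimate $H_Q$ by
\[
\begin{aligned}
\hat H_Q &= \hat f_Y(\hat q_\tau)\, \hat E\left[ \Lambda(Z,\hat\theta_\tau) \mid Y = \hat q_\tau \right] \\
&= \left[ \frac{1}{n}\sum_{i=1}^n K_h(Y_i - \hat q_\tau) \right] \cdot \frac{\sum_{i=1}^n K_h(Y_i - \hat q_\tau)\, \Lambda(Z_i,\hat\theta_\tau)}{\sum_{i=1}^n K_h(Y_i - \hat q_\tau)} \\
&= \frac{1}{n}\sum_{i=1}^n K_h(Y_i - \hat q_\tau)\, \Lambda(Z_i,\hat\theta_\tau).
\end{aligned} \tag{A.13}
\]

It is worth pointing out that in the logistic case, where $G(z) = (1 + \exp(-z))^{-1}$, we have the convenient
identity $g(z) = G(z)(1 - G(z))$. Thus, $\Lambda(Z_i,\hat\theta_\tau) = Z_i$ and the estimation of H and $H_Q$ becomes
simpler.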
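
The logistic simplification can be illustrated with a short Python sketch. It is a hedged example rather than the authors' implementation: it checks the identity g(z) = G(z)(1 − G(z)) numerically and computes the estimator in (A.13) with Λ(Z_i, θ̂τ) = Z_i, assuming a Gaussian kernel. The function names, the kernel choice, and the toy inputs are all assumptions made for this illustration.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    e = math.exp(-z)
    return e / (1.0 + e) ** 2


def gaussian_kernel(u):
    # Assumed kernel; the text only requires some kernel function K.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def h_q_hat_logit(y, z_rows, q_hat, bandwidth):
    """Estimator (A.13) with Lambda(Z_i, theta) = Z_i, which holds for the logit link only."""
    n = len(y)
    total = [0.0] * len(z_rows[0])
    for y_i, z_i in zip(y, z_rows):
        k_h = gaussian_kernel((y_i - q_hat) / bandwidth) / bandwidth  # rescaled kernel K_h
        for j, z_ij in enumerate(z_i):
            total[j] += k_h * z_ij
    return [t / n for t in total]


# Toy data: rows of Z contain a constant and one regressor.
y_toy = [0.2, 1.0, 1.7]
z_toy = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]]
h_q = h_q_hat_logit(y_toy, z_toy, q_hat=1.0, bandwidth=0.5)
```

For a probit link the weight g/(G(1 − G)) does not cancel, so Λ would have to be evaluated at θ̂τ for each observation before averaging.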

A.7 Proof of Theorem 5

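The proof of this theorem is similar to that of Theorem 4. We outline the main steps and omit
the details here. We have
\[
\begin{pmatrix} \hat\Pi_{\tau,L,1} \\ \hat\Pi_{\tau,L,2} \end{pmatrix} - \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}
= D_L \left[ \frac{n^{-1}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau}{\hat f_Y(\hat q_\tau)} - \frac{E\left[ g(Z_i'\theta_\tau)\alpha_\tau \right]}{f_Y(Q_\tau[Y])} \right].
\]

But
\[
\begin{aligned}
& \frac{n^{-1}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau}{\hat f_Y(\hat q_\tau)} - \frac{E\left[ g(Z_i'\theta_\tau)\alpha_\tau \right]}{f_Y(Q_\tau[Y])} \\
&= \frac{n^{-1}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \right]}{f_Y(Q_\tau[Y])} \\
&\quad - \frac{E\left[ g(Z_i'\theta_\tau)\alpha_\tau \right]}{f_Y(Q_\tau[Y])^2} \left\{ \dot f_Y(Q_\tau[Y])\left( \hat q_\tau - Q_\tau[Y] \right) + \hat f_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y]) \right\} + o_p(n^{-1/2}h^{-1/2}).
\end{aligned}
\]

Now
\[
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n g(Z_i'\hat\theta_\tau)\hat\alpha_\tau
&= \frac{1}{n}\sum_{i=1}^n g(Z_i'\theta_\tau)\alpha_\tau
 + \left( \frac{1}{n}\sum_{i=1}^n \dot g(Z_i'\tilde\theta_\tau)\tilde\alpha_\tau Z_i' \right)\left( \hat\theta_\tau - \theta_\tau \right)
 + \left( \frac{1}{n}\sum_{i=1}^n g(Z_i'\tilde\theta_\tau) \right)\left( \hat\alpha_\tau - \alpha_\tau \right) \\
&= \frac{1}{n}\sum_{i=1}^n g(Z_i'\theta_\tau)\alpha_\tau + M_L\left( \hat\theta_\tau - \theta_\tau \right) + o_p\left( n^{-1/2} \right).
\end{aligned}
\]

Therefore,
\[
\begin{aligned}
\begin{pmatrix} \hat\Pi_{\tau,L,1} \\ \hat\Pi_{\tau,L,2} \end{pmatrix} - \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix}
&= D_L \frac{1}{f_Y(Q_\tau[Y])} \left[ \frac{1}{n}\sum_{i=1}^n g(Z_i'\theta_\tau)\alpha_\tau - E\left[ g(Z_i'\theta_\tau)\alpha_\tau \right] \right] \\
&\quad - \frac{1}{f_Y(Q_\tau[Y])} D_L M_L H^{-1} \frac{1}{n}\sum_{i=1}^n s_i(\theta_\tau;Q_\tau[Y]) \\
&\quad - \left[ \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} \frac{\dot f_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + \frac{1}{f_Y(Q_\tau[Y])} D_L M_L H^{-1} H_Q \right] \frac{1}{n}\sum_{i=1}^n \psi(Y_i,\tau,F_Y) \\
&\quad - \begin{pmatrix} \Pi_{\tau,L,1} \\ \Pi_{\tau,L,2} \end{pmatrix} \frac{\hat f_Y(Q_\tau[Y]) - f_Y(Q_\tau[Y])}{f_Y(Q_\tau[Y])} + o_p(n^{-1/2}) + o_p(n^{-1/2}h^{-1/2}).
\end{aligned}
\]

Combining this with (A.10) leads to the desired result.
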
Supplementary Appendix
S.1 Details of Example 4

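Let ε ∼ N(0, 1). Before the location-scale shift,
\[
Y = \alpha + X\beta + U \sim N(\alpha, 1 + \beta^2) := \alpha + \sqrt{1 + \beta^2}\,\varepsilon,
\]
and the τ-quantile $Q_\tau[Y]$ of Y is $\alpha + \sqrt{1 + \beta^2}\, e_\tau$, where $e_\tau$ is the τ-quantile of ε. After the location-scale
shift with
\[
X_\delta = X/s(\delta) + \ell(\delta) \sim N\left( \ell(\delta), s^{-2}(\delta) \right),
\]
we have
\[
Y_\delta = \alpha + X_\delta\beta + U \sim N\left( \alpha + \beta\ell(\delta),\, 1 + \beta^2 s^{-2}(\delta) \right) := \alpha + \beta\ell(\delta) + \sqrt{1 + \beta^2 s^{-2}(\delta)}\,\varepsilon,
\]
and the τ-quantile $Q_\tau[Y_\delta]$ of $Y_\delta$ is $\alpha + \beta\ell(\delta) + \sqrt{1 + \beta^2 s^{-2}(\delta)}\, e_\tau$. Hence
\[
\begin{aligned}
\Pi_\tau &= \lim_{\delta \to 0} \frac{\beta\ell(\delta) + \sqrt{1 + \beta^2 s^{-2}(\delta)}\, e_\tau - \sqrt{1 + \beta^2}\, e_\tau}{\delta} \\
&= \dot\ell(0)\beta - \dot s(0)\frac{\beta^2}{\sqrt{\beta^2 + 1}}\, Q_\tau[U] \\
&= \dot\ell(0)\beta - \dot s(0)\frac{\beta^2}{\sqrt{\beta^2 + 1}}\, \frac{Q_\tau[Y] - \alpha}{\sqrt{\beta^2 + 1}} \\
&= \dot\ell(0)\beta - \dot s(0)\frac{\beta^2}{\beta^2 + 1}\left( Q_\tau[Y] - \alpha \right) \\
&:= \Pi_{\tau,L} + \Pi_{\tau,S},
\end{aligned}
\]
where $\Pi_{\tau,L} = \beta\dot\ell(0)$ is the location effect and $\Pi_{\tau,S} = -\dot s(0)\frac{\beta^2}{\beta^2 + 1}\left( Q_\tau[Y] - \alpha \right)$ is the scale effect.

Next, we have
\[
E[X \mid Y = y] = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)}(y - \alpha) = \frac{\beta}{\beta^2 + 1}(y - \alpha).
\]
Taking $y = Q_\tau[Y]$ yields
\[
E[X \mid Y = Q_\tau[Y]] = \frac{\beta}{\beta^2 + 1}\left( Q_\tau[Y] - \alpha \right).
\]
Therefore, we obtain the alternative expression $\Pi_{\tau,S} = -\dot s(0)\, E[X\beta \mid Y = Q_\tau[Y]]$.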
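
The decomposition in this example can be checked numerically. The Python sketch below differentiates the closed-form quantile of Yδ with respect to δ by central differences and compares the result with Πτ,L + Πτ,S. The specific choices s(δ) = 1 + δ and ℓ(δ) = δ (so that ṡ(0) = ℓ̇(0) = 1), the parameter values, and the function names are assumptions made only for this illustration.

```python
import math
from statistics import NormalDist


def quantile_after_shift(delta, tau, alpha, beta):
    """tau-quantile of Y_delta when X ~ N(0, 1) and U ~ N(0, 1) are independent."""
    s = 1.0 + delta      # assumed s(delta): s(0) = 1 and s'(0) = 1
    ell = delta          # assumed l(delta): l(0) = 0 and l'(0) = 1
    e_tau = NormalDist().inv_cdf(tau)
    return alpha + beta * ell + math.sqrt(1.0 + beta ** 2 / s ** 2) * e_tau


def closed_form_effects(tau, alpha, beta):
    """Location and scale effects from the example, with s'(0) = l'(0) = 1."""
    e_tau = NormalDist().inv_cdf(tau)
    q_tau = alpha + math.sqrt(1.0 + beta ** 2) * e_tau
    location = beta
    scale = -(beta ** 2 / (beta ** 2 + 1.0)) * (q_tau - alpha)
    return location, scale


def numerical_effect(tau, alpha, beta, h=1e-5):
    """Central-difference derivative of the quantile with respect to delta at 0."""
    up = quantile_after_shift(h, tau, alpha, beta)
    down = quantile_after_shift(-h, tau, alpha, beta)
    return (up - down) / (2.0 * h)
```

For example, with α = 0.5 and β = 0.8, `numerical_effect(0.9, 0.5, 0.8)` agrees with the sum of the two components returned by `closed_form_effects(0.9, 0.5, 0.8)` up to finite-difference error, and the scale component is negative at τ = 0.9 and positive at τ = 0.1.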

S.2 Details of Example 5

When FY |X ( Qτ [Y ]| x ) = G ( aτ + bτ x ) for a standard normal cdf G, we have

ṡ(0)
Πτ,S σ2 b2 E [ ġ( aτ + bτ X )] ,
µ
X
= (S.1)
f Y ( Qτ [Y ]) X τ

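where
\[
g(y) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) \quad \text{and} \quad \dot g(y) = -g(y)\, y.
\]
Therefore,
\[
E\left[ \dot g(a_\tau + b_\tau X) \right] = -\frac{1}{2\pi\sigma_X} \int_{-\infty}^{\infty} (a_\tau + b_\tau x) \exp\left( -\frac{1}{2}\left[ (a_\tau + b_\tau x)^2 + \left( \frac{x - \mu_X}{\sigma_X} \right)^2 \right] \right) dx.
\]

First, we complete the squares to recover a Gaussian pdf. We have
\[
(a_\tau + b_\tau x)^2 + \left( \frac{x - \mu_X}{\sigma_X} \right)^2
= \left( b_\tau^2 + \frac{1}{\sigma_X^2} \right) x^2 + 2\left( a_\tau b_\tau - \frac{\mu_X}{\sigma_X^2} \right) x + \left( a_\tau^2 + \frac{\mu_X^2}{\sigma_X^2} \right).
\]

Define
\[
\begin{aligned}
K_{1,\tau} &:= \left( b_\tau^2 + \frac{1}{\sigma_X^2} \right)^{-1}, \\
K_{2,\tau} &:= -K_{1,\tau}\left( a_\tau b_\tau - \frac{\mu_X}{\sigma_X^2} \right), \\
K_{3,\tau} &:= K_{1,\tau}\left( a_\tau^2 + \frac{\mu_X^2}{\sigma_X^2} \right).
\end{aligned}
\]

Then, we have
\[
\begin{aligned}
(a_\tau + b_\tau x)^2 + \left( \frac{x - \mu_X}{\sigma_X} \right)^2
&= K_{1,\tau}^{-1}\left[ x^2 - 2K_{2,\tau}x + K_{3,\tau} \right] \\
&= K_{1,\tau}^{-1}\left[ x^2 - 2K_{2,\tau}x + K_{2,\tau}^2 - K_{2,\tau}^2 + K_{3,\tau} \right] \\
&= K_{1,\tau}^{-1}(x - K_{2,\tau})^2 + K_{1,\tau}^{-1}\left( K_{3,\tau} - K_{2,\tau}^2 \right).
\end{aligned}
\]

It then follows that
\[
\begin{aligned}
& \exp\left( -\frac{1}{2}\left[ (a_\tau + b_\tau x)^2 + \left( \frac{x - \mu_X}{\sigma_X} \right)^2 \right] \right) \\
&= \exp\left( -\frac{1}{2}\left[ K_{1,\tau}^{-1}(x - K_{2,\tau})^2 + K_{1,\tau}^{-1}\left( K_{3,\tau} - K_{2,\tau}^2 \right) \right] \right) \\
&= \sqrt{2\pi K_{1,\tau}}\, \exp\left[ -\frac{1}{2} K_{1,\tau}^{-1}\left( K_{3,\tau} - K_{2,\tau}^2 \right) \right] \cdot \frac{1}{\sqrt{2\pi K_{1,\tau}}} \exp\left( -\frac{1}{2} K_{1,\tau}^{-1}(x - K_{2,\tau})^2 \right).
\end{aligned}
\]

Next, we go back to the integral that we are interested in. For $\tilde X \sim N(K_{2,\tau}, K_{1,\tau})$, we have
\[
\begin{aligned}
E\left[ \dot g(a_\tau + b_\tau X) \right]
&= -\frac{1}{\sqrt{2\pi}\,\sigma_X} \sqrt{K_{1,\tau}}\, \exp\left[ -\frac{1}{2} K_{1,\tau}^{-1}\left( K_{3,\tau} - K_{2,\tau}^2 \right) \right] \times E\left( a_\tau + b_\tau \tilde X \right) \\
&= -\frac{\sqrt{K_{1,\tau}}\, \exp\left[ -\frac{1}{2} K_{1,\tau}^{-1}\left( K_{3,\tau} - K_{2,\tau}^2 \right) \right]}{\sqrt{2\pi}\,\sigma_X} \left( a_\tau + b_\tau K_{2,\tau} \right).
\end{aligned}
\]

Now, consider the case where Y = α + Xβ + U, X ⊥ U, and U is standard normal. Note
that
\[
\begin{aligned}
F_{Y|X}(Q_\tau[Y] \mid x) &= \Pr\left( \alpha + X\beta + U < Q_\tau[Y] \mid X = x \right) \\
&= \Pr\left( U < Q_\tau[Y] - \alpha - x\beta \mid X = x \right) = G(Q_\tau[Y] - \alpha - x\beta).
\end{aligned}
\]

So, in this case, $a_\tau = Q_\tau[Y] - \alpha$ and $b_\tau = -\beta$. Therefore,
\[
\begin{aligned}
K_{1,\tau} &:= \left( b_\tau^2 + \frac{1}{\sigma_X^2} \right)^{-1} = \left( \beta^2 + \frac{1}{\sigma_X^2} \right)^{-1}, \\
K_{2,\tau} &:= -K_{1,\tau}\left( a_\tau b_\tau - \frac{\mu_X}{\sigma_X^2} \right) = K_{1,\tau}\left( (Q_\tau[Y] - \alpha)\beta + \frac{\mu_X}{\sigma_X^2} \right), \\
K_{3,\tau} &:= K_{1,\tau}\left( a_\tau^2 + \frac{\mu_X^2}{\sigma_X^2} \right) = K_{1,\tau}\left( (Q_\tau[Y] - \alpha)^2 + \frac{\mu_X^2}{\sigma_X^2} \right).
\end{aligned}
\]

Now, by some simple algebra, we have
\[
\begin{aligned}
K_{1,\tau}^{-1}\left( K_{3,\tau} - K_{2,\tau}^2 \right)
&= (Q_\tau[Y] - \alpha)^2 + \frac{\mu_X^2}{\sigma_X^2} - \frac{\left( (Q_\tau[Y] - \alpha)\beta + \frac{\mu_X}{\sigma_X^2} \right)^2}{\beta^2 + \frac{1}{\sigma_X^2}} \\
&= \frac{\left( Q_\tau[Y] - \alpha - \mu_X\beta \right)^2}{\sigma_X^2\beta^2 + 1}.
\end{aligned}
\]

Thus, we have
\[
\begin{aligned}
E\left[ \dot g(a_\tau + b_\tau X) \right]
&= -(Q_\tau[Y] - \alpha)\, \frac{\sqrt{K_{1,\tau}}\, \exp\left[ -\frac{1}{2}\frac{(Q_\tau[Y] - \alpha - \mu_X\beta)^2}{\sigma_X^2\beta^2 + 1} \right]}{\sqrt{2\pi}\,\sigma_X}
 + \beta\, \frac{\sqrt{K_{1,\tau}}\, \exp\left[ -\frac{1}{2}\frac{(Q_\tau[Y] - \alpha - \mu_X\beta)^2}{\sigma_X^2\beta^2 + 1} \right]}{\sqrt{2\pi}\,\sigma_X}\, K_{2,\tau} \\
&= f_Y(Q_\tau[Y]) \left( -(Q_\tau[Y] - \alpha) + \beta\, \frac{(Q_\tau[Y] - \alpha)\beta + \frac{\mu_X}{\sigma_X^2}}{\beta^2 + \frac{1}{\sigma_X^2}} \right) \\
&= f_Y(Q_\tau[Y])\, \frac{\alpha + \mu_X\beta - Q_\tau[Y]}{\sigma_X^2\beta^2 + 1},
\end{aligned}
\]
where we have used
\[
f_Y(Q_\tau[Y]) = \frac{\exp\left[ -\frac{1}{2}\frac{(Q_\tau[Y] - \alpha - \mu_X\beta)^2}{\sigma_X^2\beta^2 + 1} \right]}{\sqrt{2\pi}\sqrt{\sigma_X^2\beta^2 + 1}}
= \frac{\sqrt{K_{1,\tau}}\, \exp\left[ -\frac{1}{2}\frac{(Q_\tau[Y] - \alpha - \mu_X\beta)^2}{\sigma_X^2\beta^2 + 1} \right]}{\sqrt{2\pi}\,\sigma_X}.
\]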

Going back to (S.1), we obtain

ṡ(0)
Πτ,S σ2 b2 E [ ġ( aτ + bτ X )]
µ
X
=
f Y ( Qτ [Y ]) X τ
α + µ X β − Q τ [Y ]
= ṡ(0)σX2 β2
σX2 β2 + 1
h q i
α + µ X β − α + µ X β + σX2 β2 + 1Qτ (U )
= ṡ(0)σX2 β2
σX2 β2 + 1
σX2 β2
= −ṡ(0) q Q τ [U ] ,
σX2 β2 + 1
where we have used $Q_\tau[Y] = \alpha + \mu_X\beta + \sqrt{\sigma_X^2\beta^2 + 1}\, Q_\tau[U]$.
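
The closed form for E[ġ(aτ + bτ X)] obtained in this section can be verified numerically. The Python sketch below is an illustrative check under assumed parameter values: it integrates ġ(aτ + bτ x) against the N(µ_X, σ_X²) density with a simple trapezoid rule and compares the result with f_Y(Qτ[Y])(α + µ_X β − Qτ[Y])/(σ_X²β² + 1). The function names, grid settings, and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def g(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def g_dot(y):
    return -y * g(y)


def expectation_numeric(a, b, mu_x, sigma_x, n_grid=20001, width=10.0):
    """Trapezoid approximation of E[g_dot(a + b X)] for X ~ N(mu_x, sigma_x^2)."""
    lo = mu_x - width * sigma_x
    hi = mu_x + width * sigma_x
    step = (hi - lo) / (n_grid - 1)
    dist = NormalDist(mu_x, sigma_x)
    total = 0.0
    for k in range(n_grid):
        x = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end-point weights
        total += weight * g_dot(a + b * x) * dist.pdf(x)
    return total * step


def expectation_closed_form(q, alpha, beta, mu_x, sigma_x):
    """f_Y(q) * (alpha + mu_x * beta - q) / (sigma_x^2 * beta^2 + 1) in the linear model."""
    var_y = sigma_x ** 2 * beta ** 2 + 1.0
    f_y = NormalDist(alpha + mu_x * beta, math.sqrt(var_y)).pdf(q)
    return f_y * (alpha + mu_x * beta - q) / var_y


# Assumed parameter values for the check.
alpha, beta, mu_x, sigma_x, tau = 0.3, 0.7, 1.0, 2.0, 0.25
q_tau = alpha + mu_x * beta + math.sqrt(sigma_x ** 2 * beta ** 2 + 1.0) * NormalDist().inv_cdf(tau)
numeric = expectation_numeric(q_tau - alpha, -beta, mu_x, sigma_x)
closed = expectation_closed_form(q_tau, alpha, beta, mu_x, sigma_x)
```

With these values `numeric` and `closed` coincide to many decimal places, and both are positive because the chosen quantile lies below the mean of Y.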
