Quantile Methods Slides 2024
Class Notes
Manuel Arellano
Revised: January 14, 2024
Introduction
Part 1 is an informal introduction to the basic concepts. Part 2 contains a more formal development and some large sample results. Part 3 covers further topics.
Preview of basic concepts
Given some data $\{y_1, \ldots, y_n\}$ consider the ordered values $y_{(1)} \leq y_{(2)} \leq \cdots \leq y_{(n)}$.
More generally, letting $k = \lfloor \tau(n+1) \rfloor$ be the greatest integer less than or equal to $\tau(n+1)$ for some $\tau \in (0,1)$, the $\tau$th sample quantile is
$$\hat{q}_\tau = y_{(k)}.$$
Thus, the median is the $\tau = 0.5$ quantile $\hat{q}_{0.5}$. In the illustrative data we have $\hat{q}_{0.5} = 175$, $\hat{q}_{0.25} = 160$ and $\hat{q}_{0.75} = 185$.
Unlike the mean, the median is unaffected by changes in the extreme values.
Median and quantiles as optimal predictors
The median is an optimal predictor that minimizes mean absolute error. Other quantiles are also optimal predictors that minimize weighted mean absolute error in which positive errors are weighted by $\tau$ and negative errors by $1 - \tau$:
$$\hat{q}_\tau = \arg\min_a \sum_{i=1}^n \left[ \tau (y_i - a)^+ + (1-\tau)(y_i - a)^- \right]$$
where $(y_i - a)^+$ equals $|y_i - a|$ if $y_i - a \geq 0$ and zero otherwise. Similarly, $(y_i - a)^-$ equals $|y_i - a|$ if $y_i - a < 0$ and zero otherwise.
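To make the definition concrete, here is a minimal Python sketch (with simulated illustrative data) that computes the $\tau$th sample quantile as the order statistic $y_{(k)}$ and checks numerically that it minimizes the asymmetric absolute loss over a grid of candidate values:

```python
import numpy as np

def check_loss(a, y, tau):
    """Asymmetric absolute loss: positive errors weighted by tau,
    negative errors by 1 - tau."""
    e = y - a
    return np.sum(tau * np.maximum(e, 0.0) + (1.0 - tau) * np.maximum(-e, 0.0))

rng = np.random.default_rng(0)
y = rng.normal(loc=175.0, scale=10.0, size=99)   # illustrative data

tau = 0.25
k = int(np.floor(tau * (len(y) + 1)))            # k = floor(tau * (n + 1))
q_hat = np.sort(y)[k - 1]                        # order statistic y_(k)

# The order statistic minimizes the check loss over candidate values a.
grid = np.linspace(y.min(), y.max(), 2001)
losses = [check_loss(a, y, tau) for a in grid]
print(q_hat, grid[np.argmin(losses)])            # approximately equal
```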
Median and quantile regression
If we have data on two variables $\{y_i, x_i\}_{i=1}^n$, a linear median regression solves:
$$\min_{a,b} \sum_{i=1}^n |y_i - a - b x_i|.$$
Part 1
Conditional quantile function
Suppose that $X$ takes on $K$ different values $x_1, x_2, \ldots, x_K$ and that for each of those values we have $M_k$ observations of $Y$: $y_{k1}, \ldots, y_{kM_k}$.
If the relationship between $Y$ and $X$ is exact, the values of $Y$ for a given value of $X$ will all coincide, so that we could write $Y = q(X)$.
However, in general units having the same value of $X$ will have different values of $Y$.
Suppose that $y_{k1} \leq y_{k2} \leq \cdots \leq y_{kM_k}$, so the fraction of observations that are less than or equal to $y_{km}$ is $u_{km} = m/M_k$.
It can then be said that a value of $Y$ depends not only on the value of $X$ but also on the rank $u_{km}$ of the observation in the distribution of $Y$ given $X = x_k$.
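A small simulated sketch of this idea: within each cell of $X$, an observation is pinned down by its rank, so $Y$ is a function of $X$ and the rank alone (the cell values and sizes below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cells: K = 3 values of X, each with M_k = 200 draws of Y.
cells = {xk: np.sort(rng.normal(loc=50.0 + 2.0 * xk, scale=5.0, size=200))
         for xk in (1, 2, 3)}

# Within a cell, the m-th smallest observation has rank u_km = m / M_k,
# so a value of Y is recovered from (X, rank) alone: Y = q(X, U).
for xk, y_sorted in cells.items():
    M = len(y_sorted)
    m = M // 2                         # rank u = 0.5, the conditional median
    print(xk, y_sorted[m - 1])         # q(x_k, 0.5) increases with x_k
```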
Conditional quantile function (continued)
The distribution of the ranks $U$ is always the same regardless of the value of $X$, so that $X$ and $U$ are statistically independent.
An example is a growth chart where $Y$ is body weight and $X$ is age (Figure 1).
The conditional quantile function contains the same information as the conditional cdf (it is its inverse), but it is in the form of a statistical equation for outcomes that may be related to economic models.
Quantile function of normal linear regression
Suppose that $Y = \alpha + \beta X + V$ with $V \mid X \sim N(0, \sigma^2)$, so that the conditional quantile function is
$$q(x, u) = \alpha + \beta x + \sigma \Phi^{-1}(u).$$
In this case all quantiles are linear and parallel, a situation that is at odds with the
growth chart example.
Linear quantile regression (QR)
The linear QR model postulates linear dependence on $X$ but allows for a different slope and intercept at each quantile $u \in (0,1)$:
$$q(x, u) = \alpha(u) + \beta(u)\, x \qquad (1)$$
In the normal linear regression $\beta(u) = \beta$ and $\alpha(u) = \alpha + \sigma\Phi^{-1}(u)$.
In linear regression one estimates $\alpha$ and $\beta$ by minimizing the sum of squares of the residuals $Y_i - a - bX_i$ $(i = 1, \ldots, n)$.
In QR one estimates $\alpha(u)$ and $\beta(u)$ for fixed $u$ by minimizing a sum of absolute residuals where positive residuals are weighted by $u$ and negative residuals by $1 - u$.
Its rationale is that a quantile minimizes expected asymmetric absolute value loss.
For the median, $u = 0.5$, so the estimates of $\alpha(0.5)$, $\beta(0.5)$ are least absolute deviations estimates.
All observations are involved in determining the estimates of $\alpha(u)$, $\beta(u)$ for each $u$.
Under random sampling and standard regularity conditions, sample QR coefficients are $\sqrt{n}$-consistent and asymptotically normal.
Standard errors can be easily obtained via analytic or bootstrap calculations.
The popularity of linear QR is due to its computational simplicity: computing a QR is
a linear programming problem (Koenker 2005).
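As an illustration of how such estimates are computed in practice, the following sketch uses the QuantReg class from the Python statsmodels package (assumed available) on simulated data from the normal linear model above; the fitted slopes should be roughly constant across $u$ while the intercepts shift by $\sigma\Phi^{-1}(u)$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)      # normal linear model, sigma = 1

X = sm.add_constant(x)                      # intercept plus slope
for u in (0.25, 0.50, 0.75):
    res = sm.QuantReg(y, X).fit(q=u)
    print(u, res.params)                    # estimates of alpha(u), beta(u)
```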
Linear quantile regression (QR) (continued)
However, a linear QR can also be seen as a semiparametric random coefficient model with a single unobserved factor:
$$Y_i = \alpha(U_i) + \beta(U_i) X_i$$
where $U_i$ is the rank of $Y_i$ in the distribution of $Y$ given $X_i$, and hence is distributed independently of $X_i$.
For example, this model determines log earnings $Y_i$ as a function of years of schooling $X_i$ and ability $U_i$, where $\beta(U_i)$ represents an ability-specific return to schooling.
This is a model that can capture interactions between observables and unobservables.
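A short simulation may help fix ideas: we generate data from a random coefficient model with hypothetical functions $\alpha(u)$ and $\beta(u)$ chosen so that $q(x,u)$ is increasing in $u$, and check that QR at quantile $u$ recovers $\beta(u)$ (again using statsmodels):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0.0, 16.0, size=n)      # e.g. years of schooling
u = rng.uniform(size=n)                 # ranks U_i, independent of X_i

# Hypothetical alpha(u) = u and beta(u) = 0.05 + 0.10u, both increasing,
# so that q(x, u) is increasing in u for every x >= 0.
y = u + (0.05 + 0.10 * u) * x           # Y_i = alpha(U_i) + beta(U_i) X_i

X = sm.add_constant(x)
for q in (0.25, 0.50, 0.75):
    slope = sm.QuantReg(y, X).fit(q=q).params[1]
    print(q, 0.05 + 0.10 * q, slope)    # true beta(q) vs. QR estimate
```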
Part 2
I. Unconditional quantiles
Q τ [g (Y )] = g [Q τ (Y )] .
Pr [Y Q τ (Y )] = τ ) Pr (g (Y ) g [Q τ (Y )]) = τ.
Asymmetric absolute loss
Let us define the "check" function (or asymmetric absolute loss function). For $\tau \in (0,1)$,
$$\rho_\tau(u) = \tau u^+ + (1-\tau)\, u^- = \left[ \tau\, \mathbf{1}(u \geq 0) + (1-\tau)\, \mathbf{1}(u < 0) \right] |u|.$$
Sample quantiles
Given a random sample $\{Y_1, \ldots, Y_N\}$ we obtain sample quantiles replacing $F$ by the empirical cdf:
$$F_N(r) = \frac{1}{N} \sum_{i=1}^N \mathbf{1}(Y_i \leq r).$$
That is, we choose $\hat{q}_\tau = F_N^{-1}(\tau) \equiv \inf\{r : F_N(r) \geq \tau\}$, which minimizes
$$s_N(r) = \int \rho_\tau(y - r)\, dF_N(y) = \frac{1}{N} \sum_{i=1}^N \rho_\tau(Y_i - r).$$
Linear program representation
An alternative presentation of the minimization leading to $\hat{q}_\tau$ is
$$\min_{r,\, u_i^+,\, u_i^-}\; \sum_{i=1}^N \left[ \tau u_i^+ + (1-\tau)\, u_i^- \right]$$
subject to
$$Y_i - r = u_i^+ - u_i^-, \qquad u_i^+ \geq 0, \qquad u_i^- \geq 0 \qquad (i = 1, \ldots, N)$$
where $\{u_i^+, u_i^-\}_{i=1}^N$ denote $2N$ artificial additional arguments, which allow us to represent the original problem in the form of a linear program.
We are using the notation $\rho_\tau(u) = \tau u^+ + (1-\tau) u^-$ with $u^+ = \mathbf{1}(u \geq 0)\,|u|$ and $u^- = \mathbf{1}(u < 0)\,|u|$.
Note that
$$u^+ - u^- = \mathbf{1}(u \geq 0)\,|u| - \mathbf{1}(u < 0)\,|u| = \mathbf{1}(u \geq 0)\,u + \mathbf{1}(u < 0)\,u = u.$$
A linear program takes the form:
$$\min_x\; c'x \quad \text{subject to} \quad Ax = b, \;\; x \geq 0.$$
The simplex algorithm for numerical solution of this problem was created by George
Dantzig in 1947.
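The quantile LP is small enough to hand to a generic solver. Here is a minimal sketch using scipy.optimize.linprog (assumed available), with the variables ordered as $(r, u^+, u^-)$; the solution should agree with the usual sample quantile up to interpolation conventions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
y = rng.normal(size=200)
N, tau = len(y), 0.75

# Variables ordered as x = (r, u_1^+, ..., u_N^+, u_1^-, ..., u_N^-).
c = np.concatenate(([0.0], np.full(N, tau), np.full(N, 1.0 - tau)))

# Equality constraints r + u_i^+ - u_i^- = y_i, i.e. y_i - r = u_i^+ - u_i^-.
A_eq = np.hstack([np.ones((N, 1)), np.eye(N), -np.eye(N)])
b_eq = y

# r is free; the 2N artificial arguments are nonnegative.
bounds = [(None, None)] + [(0.0, None)] * (2 * N)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(res.x[0], np.quantile(y, tau))   # nearly identical answers
```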
Computing quantile regression
Suppose that we want to compute a simple median regression from the following data:
$$\begin{bmatrix} y_1 & x_1 \\ y_2 & x_2 \\ \vdots & \vdots \\ y_n & x_n \end{bmatrix}$$
The sample objective function $s_N(r)$ is continuous but not everywhere differentiable.
Moreover, the gradient or moment condition
$$b_N(r) = \frac{1}{N} \sum_{i=1}^N \left[ \mathbf{1}(Y_i \leq r) - \tau \right]$$
is not continuous in $r$.
Note that if each $Y_i$ is distinct, so that we can reorder the observations to satisfy $Y_1 < Y_2 < \cdots < Y_N$, for all $\tau$ we have
$$\left| b_N(\hat{q}_\tau) \right| = \left| F_N(\hat{q}_\tau) - \tau \right| \leq \frac{1}{N}.$$
Despite the lack of smoothness in $s_N(r)$ or $b_N(r)$, smoothness of the distribution of the data can smooth their population counterparts.
Suppose that $F$ is differentiable at $q_\tau$ with positive derivative $f(q_\tau)$; then the limiting objective function $s_0(r) = E[\rho_\tau(Y - r)]$ is twice differentiable with derivatives:
$$\frac{d}{dr}\, E[\rho_\tau(Y - r)] = -\tau \left[ 1 - F(r) \right] + (1-\tau) F(r) = F(r) - \tau = E[\mathbf{1}(Y \leq r) - \tau]$$
$$\frac{d^2}{dr^2}\, E[\rho_\tau(Y - r)] = f(r).$$
Consistency
Since a sample quantile does not have a closed-form expression, we need a method for establishing the consistency of an estimator that maximizes (or minimizes) an objective function.
A theorem taken from Newey and McFadden (1994) provides such a method.
Asymptotic normality
The asymptotic normality of sample quantiles cannot be established in the standard way because of the nondifferentiability of the objective function.
However, it has long been known that under suitable conditions sample quantiles are
asymptotically normal and there are direct approaches to establish the result.
Here we just re-state the asymptotic normality result for unconditional quantiles, following results on nonsmooth GMM based on Newey and McFadden's theorems.
The idea is that, as long as the limiting objective function is differentiable, the approach for differentiable problems works if a stochastic equicontinuity assumption holds.
Fix $0 < \tau < 1$. If $F$ is differentiable at $q_\tau$ with positive derivative $f(q_\tau)$, then
$$\sqrt{N}\,(\hat{q}_\tau - q_\tau) = -\frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\mathbf{1}(Y_i \leq q_\tau) - \tau}{f(q_\tau)} + o_p(1).$$
Consequently,
$$\sqrt{N}\,(\hat{q}_\tau - q_\tau) \xrightarrow{d} N\!\left( 0,\; \frac{\tau(1-\tau)}{[f(q_\tau)]^2} \right).$$
The term $\tau(1-\tau)$ in the numerator of the asymptotic variance tends to make $\hat{q}_\tau$ more precise in the tails, whereas the density term in the denominator tends to make $\hat{q}_\tau$ less precise in regions of low density.
Typically the latter effect will dominate, so that quantiles closer to the extremes will be estimated with less precision.
Computing standard errors
The asymptotic normality result justifies the large-$N$ approximation
$$\frac{\hat{f}(\hat{q}_\tau)}{\sqrt{\tau(1-\tau)}}\, \sqrt{N}\,(\hat{q}_\tau - q_\tau) \;\approx\; N(0, 1)$$
where $\hat{f}(\hat{q}_\tau)$ is a consistent estimator of $f(q_\tau)$.
Since
$$f(r) = \lim_{h \to 0} \frac{F(r+h) - F(r-h)}{2h} = \lim_{h \to 0} \frac{1}{2h}\, E\left[ \mathbf{1}(|Y - r| \leq h) \right],$$
an obvious possibility is to use the histogram estimator
$$\hat{f}(r) = \frac{F_N(r + h_N) - F_N(r - h_N)}{2h_N} = \frac{1}{2Nh_N} \sum_{i=1}^N \left[ \mathbf{1}(Y_i \leq r + h_N) - \mathbf{1}(Y_i \leq r - h_N) \right] = \frac{1}{2Nh_N} \sum_{i=1}^N \mathbf{1}(|Y_i - r| \leq h_N).$$
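The pieces above translate directly into code. The sketch below estimates $f(\hat{q}_\tau)$ with the histogram estimator and returns the implied standard error $\sqrt{\tau(1-\tau)/N}\,/\,\hat{f}(\hat{q}_\tau)$; the bandwidth rule is a common rule of thumb, not one prescribed by these notes:

```python
import numpy as np

def quantile_se(y, tau):
    """Standard error of the tau-th sample quantile based on the histogram
    estimator of f(q_tau)."""
    y = np.asarray(y)
    N = len(y)
    q_hat = np.quantile(y, tau)
    h = 1.06 * y.std() * N ** (-0.2)                 # rule-of-thumb bandwidth
    f_hat = np.mean(np.abs(y - q_hat) <= h) / (2.0 * h)
    return np.sqrt(tau * (1.0 - tau) / N) / f_hat

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
# For standard normal data and tau = 0.5 the asymptotic value is
# sqrt(0.25 / 1000) / phi(0), roughly 0.0396.
print(quantile_se(y, 0.5))
```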
II. Conditional quantiles
Conditional quantiles in a location-scale model
In a location-scale model the outcome is $Y = \mu(X) + \sigma(X) V$ with $V$ independent of $X$ and $\sigma(X) > 0$, so that
$$Q_\tau(Y \mid X) = \mu(X) + \sigma(X)\, Q_\tau(V).$$
Conditional quantiles in a location-scale model (continued)
Under homoskedasticity, $\partial Q_\tau(Y \mid X)/\partial X_j$ is the same at all quantiles, since the quantiles only differ by a constant term.
More generally, in a location-scale model the relative change between two quantiles, $\partial \ln\left[ Q_{\tau_1}(Y \mid X) - Q_{\tau_2}(Y \mid X) \right] / \partial X_j$, is the same for any pair $(\tau_1, \tau_2)$.
Structural representation
The conditional quantile function provides a structural representation of the outcome:
$$Y = q(X, U), \qquad U \mid X \sim \mathcal{U}(0, 1),$$
where $U$ is the rank of $Y$ in the conditional distribution of $Y$ given $X$.
III. Quantile regression
A linear regression is an optimal linear predictor that minimizes average quadratic loss. Given data $\{Y_i, X_i\}_{i=1}^N$, OLS sample coefficients are given by
$$\hat{\beta}_{OLS} = \arg\min_b \sum_{i=1}^N \left( Y_i - X_i'b \right)^2.$$
If $E(Y \mid X)$ is linear it coincides with the least squares population predictor, so that $\hat{\beta}_{OLS}$ consistently estimates $\partial E(Y \mid X)/\partial X$.
For robustness in the regression context one may be interested in median regression.
That is, an optimal predictor that minimizes average absolute loss:
$$\hat{\beta}_{LAD} = \arg\min_b \sum_{i=1}^N \left| Y_i - X_i'b \right|.$$
If $\operatorname{med}(Y \mid X)$ is linear it coincides with the least absolute deviation (LAD) population predictor, so that $\hat{\beta}_{LAD}$ consistently estimates $\partial \operatorname{med}(Y \mid X)/\partial X$.
The idea can be generalized to quantiles other than $\tau = 0.5$ by considering optimal predictors that minimize average asymmetric absolute loss:
$$\hat{\beta}(\tau) = \arg\min_b \sum_{i=1}^N \rho_\tau\!\left( Y_i - X_i'b \right).$$
As before, if $Q_\tau(Y \mid X)$ is linear, $\hat{\beta}(\tau)$ consistently estimates $\partial Q_\tau(Y \mid X)/\partial X$.
Asymptotic inference for quantile regression
The first and second derivatives of the limiting objective function are:
$$\frac{\partial}{\partial b}\, E\left[ \rho_\tau(Y - X'b) \right] = E\left\{ X \left[ \mathbf{1}(Y \leq X'b) - \tau \right] \right\}$$
$$\frac{\partial^2}{\partial b\, \partial b'}\, E\left[ \rho_\tau(Y - X'b) \right] = E\left[ f(X'b \mid X)\, XX' \right] = H(b)$$
where $f(\cdot \mid X)$ denotes the conditional density of $Y$ given $X$.
Moreover, under some regularity conditions we can use Newey and McFadden's asymptotic normality theorem, leading to
$$\sqrt{N}\left[ \hat{\beta}(\tau) - \beta(\tau) \right] = -H_0^{-1}\, \frac{1}{\sqrt{N}} \sum_{i=1}^N X_i \left[ \mathbf{1}\!\left( Y_i \leq X_i'\beta(\tau) \right) - \tau \right] + o_p(1),$$
where $H_0 = H(\beta(\tau))$ is the Hessian of the limit objective function at the truth, and
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N X_i \left[ \mathbf{1}\!\left( Y_i \leq X_i'\beta(\tau) \right) - \tau \right] \xrightarrow{d} N(0, V_0)$$
where
$$V_0 = E\left\{ \left[ \mathbf{1}\!\left( Y_i \leq X_i'\beta(\tau) \right) - \tau \right]^2 X_i X_i' \right\} = \tau(1-\tau)\, E\!\left( X_i X_i' \right).$$
The last equality follows under the assumption of linearity of conditional quantiles.
Thus,
$$\sqrt{N}\left[ \hat{\beta}(\tau) - \beta(\tau) \right] \xrightarrow{d} N(0, W_0) \quad \text{with} \quad W_0 = H_0^{-1} V_0 H_0^{-1}.$$
Getting consistent standard errors
The outer matrix $V_0$ can be estimated by $\hat{V} = \tau(1-\tau)\, N^{-1} \sum_{i=1}^N X_i X_i'$. If, in addition, the error $U_\tau = Y - X'\beta(\tau)$ is taken to be independent of $X$, the Hessian simplifies to $H_0 = f_{U_\tau}(0)\, E(X_i X_i')$, which can be estimated by
$$\hat{H} = \hat{f}_{U_\tau}(0)\, \frac{1}{N} \sum_{i=1}^N X_i X_i'$$
Getting consistent standard errors (continued)
where
$$\hat{f}_{U_\tau}(0) = \frac{1}{2Nh_N} \sum_{i=1}^N \mathbf{1}\!\left( \left| Y_i - X_i'\hat{\beta}(\tau) \right| \leq h_N \right).$$
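Putting the pieces together, here is a minimal sketch of sandwich standard errors for linear QR, using the histogram estimator of $f_{U_\tau}(0)$ above and assuming $U_\tau$ independent of $X$; the bandwidth rule is again an arbitrary rule of thumb:

```python
import numpy as np
import statsmodels.api as sm

def qr_sandwich_se(y, X, tau, beta_hat):
    """Sandwich standard errors sqrt(diag(W)/N) with W = H^{-1} V H^{-1},
    assuming U_tau independent of X so that H = f_hat(0) * (X'X/N)."""
    N = X.shape[0]
    u_hat = y - X @ beta_hat                      # QR residuals
    h = 1.06 * u_hat.std() * N ** (-0.2)          # rule-of-thumb bandwidth
    f0 = np.mean(np.abs(u_hat) <= h) / (2.0 * h)  # histogram estimate at zero
    Sxx = X.T @ X / N
    V = tau * (1.0 - tau) * Sxx
    H = f0 * Sxx
    W = np.linalg.inv(H) @ V @ np.linalg.inv(H)
    return np.sqrt(np.diag(W) / N)

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
X = sm.add_constant(x)
beta_hat = np.asarray(sm.QuantReg(y, X).fit(q=0.5).params)
print(qr_sandwich_se(y, X, 0.5, beta_hat))
```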
Part 3
Further topics
Flexible QR
A more flexible specification allows the quantile function to depend on known transformations $g_1(x), \ldots, g_P(x)$ of the regressor:
$$q(x, u) = \theta_0(u) + \theta_1(u)\, g_1(x) + \cdots + \theta_P(u)\, g_P(x).$$
This type of specification may be seen as an approximating model that becomes more accurate as $P$ increases, or simply as a parametric flexible model of the quantile function.
From the point of view of computation the model is still a linear QR, but the regressors are now functions of $X$ instead of the $X$s themselves.
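For instance, a cubic-polynomial QR is just a linear QR on powers of $x$ (a sketch with simulated data, using statsmodels):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 4.0, size=n)
y = np.sin(x) + (0.5 + 0.25 * x) * rng.normal(size=n)  # nonlinear, heteroskedastic

# The regressors are functions of X: here powers of x up to P = 3.
G = np.column_stack([np.ones(n), x, x ** 2, x ** 3])
for q in (0.1, 0.5, 0.9):
    theta = sm.QuantReg(y, G).fit(q=q).params
    print(q, np.round(theta, 3))                       # theta_0(u), ..., theta_3(u)
```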
Decompositions
For example, an average outcome gap between two groups $M$ and $F$ can be decomposed as
$$\bar{y}_M - \bar{y}_F = \bar{x}_M'\beta_M - \bar{x}_F'\beta_F = \bar{x}_F'\left( \beta_M - \beta_F \right) + \left( \bar{x}_M - \bar{x}_F \right)'\beta_M,$$
the sum of a component due to differences in coefficients and a component due to differences in characteristics.
Machado-Mata method
To do so we can use the following simulation method (Machado and Mata 2005):
1) Draw $u_1, \ldots, u_m$ independently from a uniform distribution on $(0,1)$.
2) Get $\hat{\beta}(u_1), \ldots, \hat{\beta}(u_m)$ by running QRs from the actual data $\{y_i, x_i\}_{i=1}^n$.
3) Draw $x_1, \ldots, x_m$ at random with replacement from the observed covariate values.
4) Compute $y_j = x_j'\hat{\beta}(u_j)$ for $j = 1, 2, \ldots, m$.
5) The sample quantiles of $y_1, \ldots, y_m$ are consistent estimates of the marginal quantiles of $y_i$ for large $n$ and $m$.
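A compact sketch of these steps on simulated data (with a deliberately small number of draws $m$ to keep the loop of QR fits cheap):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# "Actual" data, simulated here; conditional quantiles are linear in x.
n, m = 2000, 500
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.normal(size=n)
X = sm.add_constant(x)

# 1) Draw u_1, ..., u_m from U(0,1); bounding them away from 0 and 1 is a
#    practical tweak for numerical stability, not part of the method.
u = rng.uniform(0.01, 0.99, size=m)

# 2) beta_hat(u_j) by running a QR at each drawn quantile.
beta = np.array([sm.QuantReg(y, X).fit(q=uj).params for uj in u])

# 3) Draw x_1, ..., x_m with replacement from the observed covariate values.
xs = rng.choice(x, size=m, replace=True)
Xs = np.column_stack([np.ones(m), xs])

# 4) y_j = x_j' beta_hat(u_j).
ys = np.sum(Xs * beta, axis=1)

# 5) Sample quantiles of the simulated y approximate the marginal quantiles of y.
print(np.quantile(ys, [0.25, 0.5, 0.75]))
print(np.quantile(y, [0.25, 0.5, 0.75]))
```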
Other topics not covered
Functional inference.