
Quantile methods

Class Notes

Manuel Arellano
Revised: January 14, 2024
Introduction

In this presentation I provide a review of quantile regression (QR).

Quantile regression is a useful tool for studying conditional distributions.

Part 1 provides an informal introduction to conditional quantiles and QR.

Part 2 contains a more formal development and some large sample results.

Part 3 considers flexible QR and decompositions.
Preview of basic concepts

Given some data $\{y_1, \dots, y_n\}$, consider the ordered values $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$.

The median is $y_{(n/2)}$ if $n$ is even and $y_{((n+1)/2)}$ if $n$ is odd.

More generally, letting $k = \lfloor \tau(n+1) \rfloor$ be the greatest integer less than or equal to $\tau(n+1)$ for some $\tau \in (0,1)$, the $\tau$th sample quantile is
$$\hat{q}_\tau = y_{(k)}.$$

Thus, the median is the $\tau = 0.5$ quantile $\hat{q}_{0.5}$.

For example, with $n = 8$ and
$$\{155, 160, 168, 175, 183, 185, 191, 195\},$$
we have $\hat{q}_{0.5} = 175$, $\hat{q}_{0.25} = 160$ and $\hat{q}_{0.75} = 185$.

Contrary to the mean, the median is unaffected by changes in the extreme values.
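A minimal sketch in Python of the convention above (library defaults such as `np.quantile` use different interpolation rules, so the order-statistic rule is coded directly):

```python
import numpy as np

y = np.array([155, 160, 168, 175, 183, 185, 191, 195])

def sample_quantile(y, tau):
    """tau-th sample quantile using the k = floor(tau*(n+1)) convention above."""
    y_sorted = np.sort(y)
    k = int(np.floor(tau * (len(y) + 1)))  # 1-based order statistic index
    return y_sorted[k - 1]

print(sample_quantile(y, 0.5), sample_quantile(y, 0.25), sample_quantile(y, 0.75))
# 175 160 185
```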

Median and quantiles as optimal predictors

The median is an optimal predictor that minimizes mean absolute error:
$$\hat{q}_{0.5} = \arg\min_a \sum_{i=1}^{n} |y_i - a|.$$

Other quantiles are also optimal predictors that minimize weighted mean absolute error in which positive errors are weighted by $\tau$ and negative errors by $1 - \tau$:
$$\hat{q}_\tau = \arg\min_a \sum_{i=1}^{n} \left[ \tau (y_i - a)^+ + (1 - \tau)(y_i - a)^- \right]$$
where $(y_i - a)^+$ equals $|y_i - a|$ if $y_i - a \ge 0$ and zero otherwise. Similarly, $(y_i - a)^-$ equals $|y_i - a|$ if $y_i - a \le 0$ and zero otherwise.
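To illustrate the optimization view with the data above, this sketch minimizes the check loss by direct search (a minimizer always lies at one of the data points):

```python
import numpy as np

def check_loss(a, y, tau):
    """Sum of asymmetric absolute errors at candidate predictor a."""
    e = y - a
    return np.sum(np.where(e >= 0, tau * e, (tau - 1) * e))

y = np.array([155, 160, 168, 175, 183, 185, 191, 195])
# A minimizer always sits at one of the data points, so a search over y suffices.
a_star = min(y, key=lambda a: check_loss(a, y, 0.25))
print(a_star)  # 160, matching the 0.25 sample quantile above
```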

Median and quantile regression

If we have data on two variables $\{y_i, x_i\}_{i=1}^n$, a linear median regression solves:
$$(\hat{a}_{0.5}, \hat{b}_{0.5}) = \arg\min_{a,b} \sum_{i=1}^{n} |y_i - a - b x_i|.$$

Similarly, the $\tau$th linear quantile regression solves:
$$(\hat{a}_\tau, \hat{b}_\tau) = \arg\min_{a,b} \sum_{i=1}^{n} \left[ \tau (y_i - a - b x_i)^+ + (1 - \tau)(y_i - a - b x_i)^- \right].$$
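As an illustration, these regressions can be computed with the `QuantReg` class in statsmodels; the heteroskedastic data-generating process below is a made-up example, not from the notes:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.standard_normal(500)  # heteroskedastic noise

X = sm.add_constant(x)
for tau in (0.25, 0.5, 0.75):
    fit = sm.QuantReg(y, X).fit(q=tau)
    print(tau, fit.params)  # intercept and slope estimates at each quantile
```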

Part 1

Conditional quantiles and QR: informal introduction

Conditional quantile function

Consider an empirical relationship between two variables Y and X .

Suppose that $X$ takes on $K$ different values $x_1, x_2, \dots, x_K$ and that for each of those values we have $M_k$ observations of $Y$: $y_{k1}, \dots, y_{kM_k}$.

If the relationship between Y and X is exact, the values of Y for a given value of X will all coincide, so that we could write
$$Y = q(X).$$

However, in general units having the same value of X will have different values of Y.

Suppose that $y_{k1} \le y_{k2} \le \dots \le y_{kM_k}$, so the fraction of observations that are less than or equal to $y_{km}$ is $u_{km} = m/M_k$.

A value of Y then depends not only on the value of X but also on the rank $u_{km}$ of the observation in the distribution of Y given $X = x_k$.

Generalizing the argument:
$$Y = q(X, U)$$

Conditional quantile function (continued)

The distribution of the ranks U is always the same regardless of the value of X, so that X and U are statistically independent.

Also note that $q(x, u)$ is an increasing function in u for every value of x.

An example is a growth chart where Y is body weight and X is age (Figure 1).

In this example U is a normalized unobservable scalar variable that captures the determinants of body weight other than age, such as diet or genes.

The function $q(x, u)$ is called a conditional quantile function.

It contains the same information as the conditional cdf (it is its inverse), but is in the form of a statistical equation for outcomes that may be related to economic models.

$Y = q(X, U)$ is in itself just a statistical statement: e.g. for $X = 15$ and $U = 0.5$, Y is the weight of the median girl aged 15.

Quantile function of normal linear regression

If the distribution of Y conditioned on X is the normal linear regression model:
$$Y = \alpha + \beta X + V \quad \text{with} \quad V \mid X \sim N(0, \sigma^2),$$
the variable U is the rank of V and it is easily seen that
$$q(x, u) = \alpha + \beta x + \sigma \Phi^{-1}(u)$$
where $\Phi(\cdot)$ is the standard normal cdf.

In this case all quantiles are linear and parallel, a situation that is at odds with the
growth chart example.
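A quick simulation check of this formula (the parameter values below are arbitrary):

```python
import numpy as np
from scipy.stats import norm

alpha, beta, sigma, x, u = 1.0, 2.0, 0.5, 3.0, 0.9
rng = np.random.default_rng(1)
y = alpha + beta * x + sigma * rng.standard_normal(1_000_000)

print(np.quantile(y, u))                       # simulated conditional quantile
print(alpha + beta * x + sigma * norm.ppf(u))  # closed form: both ≈ 7.64
```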

Linear quantile regression (QR)

The linear QR model postulates linear dependence on X but allows for a different slope and intercept at each quantile $u \in (0,1)$:
$$q(x, u) = \alpha(u) + \beta(u)\, x \qquad (1)$$

In the normal linear regression $\beta(u) = \beta$ and $\alpha(u) = \alpha + \sigma \Phi^{-1}(u)$.

In linear regression one estimates $\alpha$ and $\beta$ by minimizing the sum of squares of the residuals $Y_i - a - b X_i$ $(i = 1, \dots, n)$.

In QR one estimates $\alpha(u)$ and $\beta(u)$ for fixed u by minimizing a sum of absolute residuals where positive residuals are weighted by u and negative residuals by $1 - u$.

Its rationale is that a quantile minimizes expected asymmetric absolute value loss.

For the median $u = 0.5$, so estimates of $\alpha(0.5)$, $\beta(0.5)$ are least absolute deviations estimates.

All observations are involved in determining the estimates of $\alpha(u)$, $\beta(u)$ for each u.

Under random sampling and standard regularity conditions, sample QR coefficients are $\sqrt{n}$-consistent and asymptotically normal.

Standard errors can be easily obtained via analytic or bootstrap calculations.

The popularity of linear QR is due to its computational simplicity: computing a QR is a linear programming problem (Koenker 2005).

Linear quantile regression (QR) (continued)

One use of QR is as a technique for describing a conditional distribution. For example, QR is a popular tool in wage decomposition studies.

However, a linear QR can also be seen as a semiparametric random coefficient model with a single unobserved factor:
$$Y_i = \alpha(U_i) + \beta(U_i)\, X_i$$
where $U_i \sim U(0,1)$ independent of $X_i$.

For example, this model determines log earnings $Y_i$ as a function of years of schooling $X_i$ and ability $U_i$, where $\beta(U_i)$ represents an ability-specific return to schooling.

This is a model that can capture interactions between observables and unobservables.

A special case of a model with an interaction between $X_i$ and $U_i$ is the heteroskedastic regression $Y \mid X \sim N\left(\alpha + \beta X, (\sigma + \gamma X)^2\right)$. In this case $\alpha(u) = \alpha + \sigma \Phi^{-1}(u)$ and $\beta(u) = \beta + \gamma \Phi^{-1}(u)$.

Part 2

Quantile methods: formal development

I. Unconditional quantiles

Let $F(r) = \Pr(Y \le r)$. For $\tau \in (0,1)$, the $\tau$th population quantile of Y is defined to be
$$Q_\tau(Y) \equiv q_\tau \equiv F^{-1}(\tau) = \inf\{r : F(r) \ge \tau\}.$$

$F^{-1}(\tau)$ is a generalized inverse function. It is a left-continuous function with range equal to the support of F and hence often unbounded.

Equivariance of quantiles under monotone transformations

This is an interesting property of quantiles not shared by expectations. Let $g(\cdot)$ be a nondecreasing function. Then, for any random variable Y,
$$Q_\tau[g(Y)] = g[Q_\tau(Y)].$$

Thus, the quantiles of $g(Y)$ coincide with the transformed quantiles of Y.

To see this in the case of a monotonic transformation, note that
$$\Pr[Y \le Q_\tau(Y)] = \tau \;\Rightarrow\; \Pr(g(Y) \le g[Q_\tau(Y)]) = \tau.$$
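A numerical illustration of equivariance with $g = \log$ (a property the mean lacks, since in general $E[\log Y] \ne \log E[Y]$):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(scale=2.0, size=200_000)

q = np.quantile(y, 0.9)
print(np.quantile(np.log(y), 0.9), np.log(q))  # both ≈ the same number
```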

Asymmetric absolute loss

Let us define the "check" function (or asymmetric absolute loss function). For $\tau \in (0,1)$:
$$\rho_\tau(u) = [\tau \mathbf{1}(u \ge 0) + (1-\tau)\mathbf{1}(u < 0)]\,|u| = [\tau - \mathbf{1}(u < 0)]\,u.$$

Note that $\rho_\tau(u)$ is a continuous piecewise linear function, but nondifferentiable at $u = 0$. We should think of u as an individual error $u = y - r$ and $\rho_\tau(u)$ as the loss associated with u.

Using $\rho_\tau(u)$ as a specification of loss, it turns out that $q_\tau$ minimizes expected loss:
$$s_0(r) \equiv E[\rho_\tau(Y - r)] = \tau \int_r^{\infty} (y - r)\, dF(y) - (1-\tau) \int_{-\infty}^{r} (y - r)\, dF(y).$$

Any element of $\{r : F(r) = \tau\}$ minimizes expected loss. If the solution is unique, it coincides with $q_\tau$ as defined above. If not, we have an interval of $\tau$th quantiles and the smallest element is chosen so that the quantile function is left-continuous.

Sample quantiles

Given a random sample $\{Y_1, \dots, Y_N\}$ we obtain sample quantiles replacing F by the empirical cdf:
$$F_N(r) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}(Y_i \le r).$$

That is, we choose $\hat{q}_\tau = F_N^{-1}(\tau) \equiv \inf\{r : F_N(r) \ge \tau\}$, which minimizes
$$s_N(r) = \int \rho_\tau(y - r)\, dF_N(y) = \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(Y_i - r).$$

An important advantage of expressing the calculation of sample quantiles as an optimization problem, as opposed to a problem of ordering the observations, is computational (especially in the regression context).

The optimization perspective is also useful for studying statistical properties.

Linear program representation

An alternative presentation of the minimization leading to $\hat{q}_\tau$ is
$$\min_{r,\, u_i^+,\, u_i^-} \; \sum_{i=1}^{N} \left[ \tau u_i^+ + (1-\tau) u_i^- \right]$$
subject to
$$Y_i - r = u_i^+ - u_i^-, \quad u_i^+ \ge 0, \quad u_i^- \ge 0 \quad (i = 1, \dots, N)$$
where $\{u_i^+, u_i^-\}_{i=1}^{N}$ denote 2N artificial additional arguments, which allow us to represent the original problem in the form of a linear program.

We are using the notation $\rho_\tau(u) = \tau u^+ + (1-\tau) u^-$ with $u^+ = \mathbf{1}(u \ge 0)|u|$ and $u^- = \mathbf{1}(u < 0)|u|$. Note that
$$u^+ - u^- = \mathbf{1}(u \ge 0)|u| - \mathbf{1}(u < 0)|u| = \mathbf{1}(u \ge 0)\,u + \mathbf{1}(u < 0)\,u = u.$$

A linear program takes the form:
$$\min_x \; c'x \quad \text{subject to} \quad Ax \ge b, \; x \ge 0.$$

The simplex algorithm for numerical solution of this problem was created by George Dantzig in 1947.
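As a sketch, the linear program above can be passed directly to `scipy.optimize.linprog` (the simulated data are illustrative); for this sample size and $\tau$ the LP solution coincides with the usual sample quantile:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
y = rng.normal(size=101)
tau, N = 0.25, len(y)

# Variables: [r, u_1^+, ..., u_N^+, u_1^-, ..., u_N^-]
c = np.concatenate(([0.0], tau * np.ones(N), (1 - tau) * np.ones(N)))
A_eq = np.hstack([np.ones((N, 1)), np.eye(N), -np.eye(N)])  # r + u_i^+ - u_i^- = Y_i
bounds = [(None, None)] + [(0, None)] * (2 * N)              # r free, u's nonnegative
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)

print(res.x[0], np.quantile(y, tau))  # LP solution ≈ sample quantile
```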

Computing quantile regression

Suppose that we want to compute a simple median regression from the following data matrix:
$$\begin{bmatrix} y_1 & x_1 \\ y_2 & x_2 \\ \vdots & \vdots \\ y_n & x_n \end{bmatrix}$$

That is, we want to find $\hat\beta_0$, $\hat\beta_1$ that minimize the sum of absolute residuals:
$$S(\beta_0, \beta_1) = |y_1 - \beta_0 - \beta_1 x_1| + |y_2 - \beta_0 - \beta_1 x_2| + \dots + |y_n - \beta_0 - \beta_1 x_n|$$

A key aspect of this optimization problem is that for a pair of observations $(j, \ell)$ we can restrict attention to "basic solutions" of the form:
$$y_j = \hat\beta_0 + \hat\beta_1 x_j, \qquad y_\ell = \hat\beta_0 + \hat\beta_1 x_\ell.$$

For this particular candidate solution the objective function value is
$$\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
where $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$.

The procedure can be repeated for all other valid pairs of data points, and the pair that produces the least sum of absolute deviations determines the median regression parameter estimates.
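A brute-force sketch of this pairwise search on simulated data (practical implementations use the simplex-type methods discussed above, since enumeration costs $O(n^2)$ pairs):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 30)
y = 2.0 + 1.5 * x + rng.standard_normal(30)

best = None
for j, l in combinations(range(len(x)), 2):
    if x[j] == x[l]:
        continue  # the pair must define a unique line
    b1 = (y[j] - y[l]) / (x[j] - x[l])
    b0 = y[j] - b1 * x[j]
    sad = np.sum(np.abs(y - b0 - b1 * x))  # sum of absolute deviations
    if best is None or sad < best[0]:
        best = (sad, b0, b1)

print(best[1], best[2])  # LAD intercept and slope
```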
Nonsmoothness in sample but smoothness in population

The sample objective function $s_N(r)$ is continuous but not differentiable for all r. Moreover, the gradient or moment condition
$$b_N(r) = \frac{1}{N} \sum_{i=1}^{N} [\mathbf{1}(Y_i \le r) - \tau]$$
is not continuous in r.

Note that if each $Y_i$ is distinct, so that we can reorder the observations to satisfy $Y_1 < Y_2 < \dots < Y_N$, for all $\tau$ we have
$$|b_N(\hat{q}_\tau)| \equiv |F_N(\hat{q}_\tau) - \tau| \le \frac{1}{N}.$$

Despite lack of smoothness in $s_N(r)$ or $b_N(r)$, smoothness of the distribution of the data can smooth their population counterparts. Suppose that F is differentiable at $q_\tau$ with positive derivative $f(q_\tau)$; then $s_0(r)$ is twice continuously differentiable with derivatives:
$$\frac{d}{dr} E[\rho_\tau(Y - r)] = -\tau[1 - F(r)] + (1-\tau)F(r) = F(r) - \tau \equiv E[\mathbf{1}(Y \le r) - \tau]$$
$$\frac{d^2}{dr^2} E[\rho_\tau(Y - r)] = f(r).$$

Consistency

Since a sample quantile does not have a closed form expression we need a method for
establishing the consistency of an estimator that maximizes an objective function.

A theorem taken from Newey and McFadden (1994) provides such a method.

The requirements are boundedness of the parameter space, uniform convergence of the objective function to some nonstochastic continuous limit, and that the limiting objective function is uniquely maximized at the truth (identification).

The quantile sample objective function sN (r ) is continuous and convex in r .

Suppose that F is such that $s_0(r)$ is uniquely minimized at $q_\tau$ (so that $-s_0$ is uniquely maximized there, as the theorem requires). By the law of large numbers $s_N(r)$ converges pointwise to $s_0(r)$. Then use the fact that pointwise convergence of convex functions implies uniform convergence on compact sets.

Asymptotic normality
The asymptotic normality of sample quantiles cannot be established in the standard way because of the nondifferentiability of the objective function.

However, it has long been known that under suitable conditions sample quantiles are asymptotically normal, and there are direct approaches to establish the result.

Here we just re-state the asymptotic normality result for unconditional quantiles following results on nonsmooth GMM around Newey and McFadden's theorems.

The idea is that as long as the limiting objective function is differentiable, the approach for differentiable problems works if a stochastic equicontinuity assumption holds.

Fix $0 < \tau < 1$. If F is differentiable at $q_\tau$ with positive derivative $f(q_\tau)$, then
$$\sqrt{N}(\hat{q}_\tau - q_\tau) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{\tau - \mathbf{1}(Y_i \le q_\tau)}{f(q_\tau)} + o_p(1).$$
Consequently,
$$\sqrt{N}(\hat{q}_\tau - q_\tau) \xrightarrow{d} N\left(0, \frac{\tau(1-\tau)}{[f(q_\tau)]^2}\right).$$

The term $\tau(1-\tau)$ in the numerator of the asymptotic variance tends to make $\hat{q}_\tau$ more precise in the tails, whereas the density term in the denominator tends to make $\hat{q}_\tau$ less precise in regions of low density.

Typically the latter effect will dominate, so that quantiles closer to the extremes will be estimated with less precision.
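A Monte Carlo check of the asymptotic variance formula for a standard normal population (all values here are illustrative):

```python
import numpy as np
from scipy.stats import norm

tau, N, reps = 0.9, 2000, 2000
rng = np.random.default_rng(5)
q_tau = norm.ppf(tau)  # true N(0,1) quantile

est = np.array([np.quantile(rng.standard_normal(N), tau) for _ in range(reps)])
print(N * est.var())                           # Monte Carlo variance of sqrt(N)(q̂ - q)
print(tau * (1 - tau) / norm.pdf(q_tau) ** 2)  # asymptotic formula: both ≈ 2.9
```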

Computing standard errors
The asymptotic normality result justifies the large-N approximation
$$\frac{\hat{f}(\hat{q}_\tau)}{\sqrt{\tau(1-\tau)}}\, \sqrt{N}(\hat{q}_\tau - q_\tau) \;\approx\; N(0, 1)$$
where $\hat{f}(\hat{q}_\tau)$ is a consistent estimator of $f(q_\tau)$.

Since
$$f(r) = \lim_{h \to 0} \frac{F(r+h) - F(r-h)}{2h} = \lim_{h \to 0} \frac{1}{2h} E[\mathbf{1}(|Y - r| \le h)],$$
an obvious possibility is to use the histogram estimator
$$\hat{f}(r) = \frac{F_N(r + h_N) - F_N(r - h_N)}{2h_N} = \frac{1}{2Nh_N} \sum_{i=1}^{N} [\mathbf{1}(Y_i \le r + h_N) - \mathbf{1}(Y_i \le r - h_N)] = \frac{1}{2Nh_N} \sum_{i=1}^{N} \mathbf{1}(|Y_i - r| \le h_N)$$
for some sequence $h_N > 0$ such that $h_N \to 0$ as $N \to \infty$. Thus,
$$\hat{f}(\hat{q}_\tau) = \frac{1}{2Nh_N} \sum_{i=1}^{N} \mathbf{1}(|Y_i - \hat{q}_\tau| \le h_N).$$
A sufficient condition for consistency is $\sqrt{N} h_N \to \infty$.

Other alternatives are kernel estimators of $f(q_\tau)$, the bootstrap, or obtaining an approximate confidence interval directly from the normal approximation to the binomial.
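A sketch of the histogram estimator and the resulting standard error; the bandwidth rule below is an ad hoc choice, not one prescribed in the notes:

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.standard_normal(5000)
tau, N = 0.9, len(y)

q_hat = np.quantile(y, tau)
h = 1.06 * y.std() * N ** (-1 / 5)                 # ad hoc bandwidth choice
f_hat = np.mean(np.abs(y - q_hat) <= h) / (2 * h)  # histogram density estimate

se = np.sqrt(tau * (1 - tau)) / (f_hat * np.sqrt(N))
print(q_hat, se)  # point estimate and asymptotic standard error
```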
II. Conditional quantiles

Consider the conditional distribution of Y given X:
$$\Pr(Y \le r \mid X) = F(r; X)$$
and denote the $\tau$th quantile of Y given X as
$$Q_\tau(Y \mid X) \equiv q_\tau(X) \equiv F^{-1}(\tau; X).$$

Now quantiles minimize expected asymmetric absolute loss in a conditional sense:
$$q_\tau(X) = \arg\min_c E[\rho_\tau(Y - c) \mid X].$$

Suppose that $q_\tau(X)$ satisfies a parametric model $q_\tau(X) = g(X, \beta_\tau)$; then
$$\beta_\tau = \arg\min_b E[\rho_\tau(Y - g(X, b))].$$

Also, since in general
$$\Pr(Y \le q_\tau(X) \mid X) = \tau \quad \text{or} \quad E[\mathbf{1}(Y \le q_\tau(X)) - \tau \mid X] = 0,$$
it turns out that $\beta_\tau$ solves moment conditions of the form
$$E\{h(X)[\mathbf{1}(Y \le g(X, \beta_\tau)) - \tau]\} = 0.$$

Conditional quantiles in a location-scale model

The standardized variable in a location-scale model of $Y \mid X$ has a distribution that is independent of X.

Namely, letting $E(Y \mid X) = \mu(X)$ and $\operatorname{Var}(Y \mid X) = \sigma^2(X)$, the variable
$$V = \frac{Y - \mu(X)}{\sigma(X)}$$
is distributed independently of X according to some cdf G.

Thus, in a location-scale model all dependence of Y on X occurs through mean translations and variance re-scaling.

In the location-scale model:
$$\Pr(Y \le r \mid X) = \Pr\left(\frac{Y - \mu(X)}{\sigma(X)} \le \frac{r - \mu(X)}{\sigma(X)} \,\Big|\, X\right) = G\left(\frac{r - \mu(X)}{\sigma(X)}\right)$$
and
$$G\left(\frac{Q_\tau(Y \mid X) - \mu(X)}{\sigma(X)}\right) = \tau$$
or
$$Q_\tau(Y \mid X) = \mu(X) + \sigma(X)\, G^{-1}(\tau)$$
so that
$$\frac{\partial Q_\tau(Y \mid X)}{\partial X_j} = \frac{\partial \mu(X)}{\partial X_j} + \frac{\partial \sigma(X)}{\partial X_j}\, G^{-1}(\tau).$$

Conditional quantiles in a location-scale model (continued)

Under homoskedasticity, $\partial Q_\tau(Y \mid X)/\partial X_j$ is the same at all quantiles since they only differ by a constant term.

More generally, in a location-scale model the relative change between two quantiles, $\partial \ln[Q_{\tau_1}(Y \mid X) - Q_{\tau_2}(Y \mid X)]/\partial X_j$, is the same for any pair $(\tau_1, \tau_2)$.

Structural representation

Define U such that
$$F(Y; X) = U.$$

It turns out that U is uniformly distributed independently of X between 0 and 1. Note that if $\Pr(Y \le r \mid X) = F(r; X)$ then $\Pr(F(Y; X) \le F(r; X) \mid X) = F(r; X)$, or $\Pr(U \le s \mid X) = s$.

Also,
$$Y = F^{-1}(U; X) \quad \text{with} \quad U \mid X \sim U(0, 1).$$
This is sometimes called the Skorohod representation.

For example, the Skorohod representation of the Gaussian linear regression model is $Y = X'\beta + \sigma V$ with $V = \Phi^{-1}(U)$, so that $V \mid X \sim N(0, 1)$.
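A simulation sketch of the probability integral transform behind this representation: applying the conditional cdf to Y yields a uniform variable (the data-generating process is a made-up example):

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(7)
x = rng.uniform(1, 2, 100_000)
y = x + 0.5 * rng.standard_normal(100_000)  # Y | X ~ N(x, 0.25)

u = norm.cdf(y, loc=x, scale=0.5)  # U = F(Y; X)
print(kstest(u, "uniform"))        # fails to reject uniformity of U
```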

III. Quantile regression
A linear regression is an optimal linear predictor that minimizes average quadratic loss. Given data $\{Y_i, X_i\}_{i=1}^N$, OLS sample coefficients are given by
$$\hat\beta_{OLS} = \arg\min_b \sum_{i=1}^{N} (Y_i - X_i'b)^2.$$
If $E(Y \mid X)$ is linear it coincides with the least squares population predictor, so that $\hat\beta_{OLS}$ consistently estimates $\partial E(Y \mid X)/\partial X$.

For robustness in the regression context one may be interested in median regression. That is, an optimal predictor that minimizes average absolute loss:
$$\hat\beta_{LAD} = \arg\min_b \sum_{i=1}^{N} |Y_i - X_i'b|.$$
If $\operatorname{med}(Y \mid X)$ is linear it coincides with the least absolute deviations (LAD) population predictor, so that $\hat\beta_{LAD}$ consistently estimates $\partial \operatorname{med}(Y \mid X)/\partial X$.

The idea can be generalized to quantiles other than $\tau = 0.5$ by considering optimal predictors that minimize average asymmetric absolute loss:
$$\hat\beta(\tau) = \arg\min_b \sum_{i=1}^{N} \rho_\tau(Y_i - X_i'b).$$
As before, if $Q_\tau(Y \mid X)$ is linear, $\hat\beta(\tau)$ consistently estimates $\partial Q_\tau(Y \mid X)/\partial X$.
Asymptotic inference for quantile regression
The first and second derivatives of the limiting objective function are:
$$\frac{\partial}{\partial b} E[\rho_\tau(Y - X'b)] = E\{X[\mathbf{1}(Y \le X'b) - \tau]\}$$
$$\frac{\partial^2}{\partial b\, \partial b'} E[\rho_\tau(Y - X'b)] = E[f(X'b \mid X)\, XX'] \equiv H(b).$$

Moreover, under some regularity conditions we can use Newey and McFadden's asymptotic normality theorem, leading to
$$\sqrt{N}\left[\hat\beta(\tau) - \beta(\tau)\right] = -H_0^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i \left[\mathbf{1}(Y_i \le X_i'\beta(\tau)) - \tau\right] + o_p(1)$$
where $H_0 = H(\beta(\tau))$ is the Hessian of the limit objective function at the truth, and
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i \left[\mathbf{1}(Y_i \le X_i'\beta(\tau)) - \tau\right] \xrightarrow{d} N(0, V_0)$$
where
$$V_0 = E\left\{\left[\mathbf{1}(Y_i \le X_i'\beta(\tau)) - \tau\right]^2 X_i X_i'\right\} = \tau(1-\tau)\, E(X_i X_i').$$

The last equality follows under the assumption of linearity of conditional quantiles. Thus,
$$\sqrt{N}\left[\hat\beta(\tau) - \beta(\tau)\right] \xrightarrow{d} N(0, W_0) \quad \text{with} \quad W_0 = H_0^{-1} V_0 H_0^{-1}.$$

Getting consistent standard errors

To get a consistent estimate of $W_0$ we need consistent estimates of $H_0$ and $V_0$.

A simple estimator of $H_0$ suggested in Powell (1984, 1986), which mimics the histogram estimator discussed above, is as follows:
$$\hat{H} = \frac{1}{2Nh_N} \sum_{i=1}^{N} \mathbf{1}\left(|Y_i - X_i'\hat\beta(\tau)| \le h_N\right) X_i X_i'.$$

This estimator is motivated by the following iterated expectations argument:
$$H_0 = E[f(X'\beta(\tau) \mid X)\, XX'] = \lim_{h \to 0} \frac{1}{2h} E\left\{E\left[\mathbf{1}(|Y - X'\beta(\tau)| \le h) \mid X\right] XX'\right\} = \lim_{h \to 0} \frac{1}{2h} E\left[\mathbf{1}(|Y - X'\beta(\tau)| \le h)\, XX'\right].$$

If the quantile function is correctly specified a consistent estimate of $V_0$ is
$$\hat{V} = \tau(1-\tau) \frac{1}{N} \sum_{i=1}^{N} X_i X_i'.$$
Otherwise, a fully robust estimator can be obtained using
$$\tilde{V} = \frac{1}{N} \sum_{i=1}^{N} \left[\mathbf{1}(Y_i \le X_i'\hat\beta(\tau)) - \tau\right]^2 X_i X_i'.$$
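A sketch of these estimators in Python, combining Powell's $\hat{H}$ with $\hat{V}$ to form the sandwich $\hat{W}_R = \hat{H}^{-1}\hat{V}\hat{H}^{-1}$; the bandwidth rule is again an ad hoc choice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, tau = 2000, 0.75
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.standard_normal(n)
X = sm.add_constant(x)

beta = sm.QuantReg(y, X).fit(q=tau).params
e = y - X @ beta
h = 1.06 * e.std() * n ** (-1 / 5)  # ad hoc bandwidth

Xh = X[np.abs(e) <= h]                    # observations inside the window
H = (Xh.T @ Xh) / (2 * n * h)             # Powell estimator of H0
V = tau * (1 - tau) * (X.T @ X) / n       # V0 under correct specification
W = np.linalg.inv(H) @ V @ np.linalg.inv(H)
print(np.sqrt(np.diag(W) / n))            # sandwich standard errors
```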

Getting consistent standard errors (continued)

Finally, if $U_\tau = Y - X'\beta(\tau)$ is independent of X (as in the location model) it turns out that
$$H_0 = f_{U_\tau}(0)\, E(X_i X_i')$$
so that
$$W_0 = \frac{\tau(1-\tau)}{[f_{U_\tau}(0)]^2} \left[E(X_i X_i')\right]^{-1},$$
which can be consistently estimated as
$$\hat{W}_{NR} = \frac{\tau(1-\tau)}{[\hat{f}_{U_\tau}(0)]^2} \left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\right)^{-1}$$
where
$$\hat{f}_{U_\tau}(0) = \frac{1}{2Nh_N} \sum_{i=1}^{N} \mathbf{1}\left(|Y_i - X_i'\hat\beta(\tau)| \le h_N\right).$$

In summary, we have considered three different alternative estimators for standard errors:

a non-robust variance matrix estimator under independence: $\hat{W}_{NR}$,

a robust estimator under correct specification: $\hat{W}_R = \hat{H}^{-1}\hat{V}\hat{H}^{-1}$,

and a fully robust estimator under misspecification: $\hat{W}_{FR} = \hat{H}^{-1}\tilde{V}\hat{H}^{-1}$.

Part 3

Further topics

Flexible QR

Linearity is restrictive. It may also be at odds with the monotonicity requirement of $q(x, u)$ in u for every value of x.

Linear QR may be interpreted as an approximation to the true quantile function.

An approach to nonparametric QR is to use series methods:
$$q(x, u) = \theta_0(u) + \theta_1(u)\, g_1(x) + \dots + \theta_P(u)\, g_P(x).$$

The g's are anonymous functions without an economic interpretation. Objects of interest are derivative effects and summary measures of them.

In practice one may use orthogonal polynomials, wavelets or splines.

This type of specification may be seen as an approximating model that becomes more accurate as P increases, or simply as a parametric flexible model of the quantile function.

From the point of view of computation the model is still a linear QR, but the regressors are now functions of X instead of the Xs themselves.
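A sketch using an ordinary polynomial basis (orthogonal polynomials or splines would be preferable in practice); computationally it is just a linear QR in the constructed regressors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(0, 1, 1000)
y = np.sin(3 * x) + 0.3 * (1 + x) * rng.standard_normal(1000)  # nonlinear, heteroskedastic

# Series basis g_1(x), ..., g_P(x): the model remains a linear QR in these terms
G = np.column_stack([np.ones_like(x), x, x**2, x**3])
fit = sm.QuantReg(y, G).fit(q=0.5)
print(fit.params)  # theta_0(0.5), ..., theta_3(0.5)
```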

Decompositions

Basic idea of decomposition:
$$F_M(y) - F_F(y) = \int F_M(y \mid x)\, f_M(x)\, dx - \int F_F(y \mid x)\, f_F(x)\, dx$$
$$= \int \left[F_M(y \mid x) - F_F(y \mid x)\right] f_M(x)\, dx + \int F_F(y \mid x) \left[f_M(x) - f_F(x)\right] dx$$

The indices (M, F) could refer to male/female gender gaps, a pair of countries, or two different periods.

The decomposition can be done for cdfs (as shown) or for other distributional characteristics such as quantiles or moments.

When done for differences in means using linear regression they are called Oaxaca decompositions (after the work of Ronald Oaxaca):
$$\bar{y}_M - \bar{y}_F = \bar{x}_M'\beta_M - \bar{x}_F'\beta_F = \bar{x}_F'(\beta_M - \beta_F) + (\bar{x}_M - \bar{x}_F)'\beta_M$$

To do decompositions for quantiles or for distributions based on QR models, we need to be able to calculate the marginal distribution of the outcome implied by the QR.

Machado-Mata method

To do so we can use the following simulation method (Machado and Mata 2005):

1) Generate $u_1, \dots, u_m$ iid $U(0, 1)$.

2) Get $\hat\beta(u_1), \dots, \hat\beta(u_m)$ by running QRs on the actual data $\{y_i, x_i\}_{i=1}^n$.

3) Get a random sample of size m of the covariates: $x_1, \dots, x_m$.

4) Compute $y_j = x_j'\hat\beta(u_j)$ for $j = 1, 2, \dots, m$.

5) The sample quantiles of $y_1, \dots, y_m$ are consistent estimates of the marginal quantiles of $y_i$ for large n and m.
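A direct sketch of the five steps on simulated data; for speed this toy version uses a small m and trims extreme $u_j$ draws to keep each QR fit stable, which a real application would handle more carefully:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n, m = 1000, 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.standard_normal(n)
X = sm.add_constant(x)

u = rng.uniform(0.05, 0.95, size=m)                       # 1) draws (trimmed here)
betas = [sm.QuantReg(y, X).fit(q=uj).params for uj in u]  # 2) QR at each u_j
Xs = X[rng.integers(n, size=m)]                           # 3) resample covariates
y_sim = np.array([Xs[j] @ betas[j] for j in range(m)])    # 4) simulated outcomes

print(np.quantile(y_sim, [0.1, 0.5, 0.9]))  # 5) implied marginal quantiles
print(np.quantile(y, [0.1, 0.5, 0.9]))      # compare with actual marginals
```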

Other topics not covered

Censored regression quantiles.

Crossings and rearrangements.

Quantile regression under misspecification.

Functional inference.
