
Bandwidth Selection for Kernel Quantile Estimation

Ming-Yen Cheng^1 and Shan Sun^2

Abstract

In this article, we summarize some quantile estimators and related bandwidth selection methods, and propose two new bandwidth selection methods. Using four distributions (standard normal, exponential, double exponential and lognormal), we simulate the methods and compare their efficiencies to that of the empirical quantile estimator. It turns out that the kernel smoothed quantile estimators, regardless of which bandwidth selection method is used, are more efficient than the empirical quantile estimator in most situations, and the gain is especially large when the sample size is relatively small. However, no single method dominates the others across all distributions.

Keywords. Bandwidth, kernel, quantile, nonparametric smoothing.

Short Title. Quantile Estimation.

JEL subject classification: C14, C13.


^1 Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. Email: [email protected]
^2 Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas 79409-1042, USA. Email: [email protected]

1 Introduction
The estimation of population quantiles is of great interest when a parametric form
for the underlying distribution is not available. In addition, quantiles often arise
as the natural thing to estimate when the underlying distribution is skewed. Let
X1 , X2 , · · · , Xn be an independent and identically distributed random sample drawn
from an absolutely continuous distribution function F with density f . Let X(1) ≤
X(2) ≤ · · · ≤ X(n) denote the corresponding order statistics. The quantile function Q
of the population is defined as Q(p) = inf{x : F (x) ≥ p}, 0 < p < 1. Note that Q is
the left-continuous inverse of F . Denote, for each 0 < p < 1, the pth quantile of F
by ξp , that is, ξp = Q(p).
A traditional nonparametric estimator of the distribution function is the empir-
ical function Fn (x), which is defined as
n
1X
Fn (x) = I(−∞,x] (Xi )
n i=1

where IA (x) = 1 if x ∈ A and 0 otherwise. Accordingly, a nonparametric estimator


of ξp is the empirical quantile

Qn (p) = inf{x : Fn (x) ≥ p} = X([np]+1) ,

where [np] denotes the integer part of np. Let pr = r/(n + 1) and qr = 1 − pr . If we
use X(r) to estimate the pr th quantile, then the asymptotic bias and variance are

\[ \mathrm{ABias}\{X_{(r)}\} = \frac{p_r q_r Q''(p_r)}{2(n+2)} + \frac{p_r q_r}{(n+2)^2}\Big\{ \frac{1}{3}(q_r - p_r)Q'''_r + \frac{1}{8}Q''''_r \Big\}, \]
\[ \mathrm{AVar}\{X_{(r)}\} = \frac{p_r q_r}{n+2}\, Q'^{\,2}_r + \frac{p_r q_r}{(n+2)^2}\Big\{ 2(q_r - p_r)Q'_r Q''_r + p_r q_r \Big( Q'_r Q'''_r + \frac{1}{2} Q''^{\,2}_r \Big) \Big\}, \]
where $Q'_r$, $Q''_r$, etc. denote the derivatives of $Q$ evaluated at $p_r$.
The asymptotic mean squared error of $X_{(r)}$ is then $\mathrm{AMSE}\{X_{(r)}\} = \mathrm{ABias}\{X_{(r)}\}^2 + \mathrm{AVar}\{X_{(r)}\}$.
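For concreteness, the following minimal Python sketch (not part of the original paper; numpy assumed) implements the empirical quantile estimator above. Note that np.quantile interpolates by default, so we index the order statistics directly to match the definition $Q_n(p) = X_{([np]+1)}$.

```python
# A minimal sketch of the empirical quantile estimator Q_n(p) = X_([np]+1).
import numpy as np

def empirical_quantile(x, p):
    """Return the ([np]+1)-th order statistic (0-based index [np])."""
    xs = np.sort(x)
    n = len(xs)
    return xs[int(np.floor(n * p))]

rng = np.random.default_rng(0)
sample = rng.standard_normal(100)
print(empirical_quantile(sample, 0.5))  # close to the true median 0
```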
When F is continuous, it is more natural to use a smooth random function as an
estimator of F since there is a substantial lack of efficiency, caused by the variability
of individual order statistics. Indeed, the choice of Fn does not always lead to the
best estimator of F (cf. Read (1972), who has shown that Fn is inadmissible with
respect to the integrated square loss). Intuitively appealing and easily understood
competitors to Qn are the popular kernel quantile estimators, see Section 2.
Section 2 reviews two kernel smoothed quantile estimators and gives their asymptotic mean squared errors and asymptotically optimal bandwidths. These optimal bandwidths depend on unknown quantities such as density derivatives and quantile derivatives; kernel estimators and optimal bandwidths for these unknowns are addressed in Section 3. In Section 4, we give four data-based methods to select the bandwidths for the two kernel quantile estimators. In Section 5 we implement these methods on four specific distributions and report the simulation results. The Appendix gives some proofs.

2 Kernel smoothed quantile estimation

2.1 Inverse of kernel distribution function estimator


A popular kernel quantile estimator is based on the Nadaraya (1964) type estimator for F, defined as
\[ \hat F_n(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i), \qquad\text{where}\qquad K_h(x) = \int_{-\infty}^{x} \frac{1}{h}\, k\Big(\frac{t}{h}\Big)\, dt \]
and k is a kernel function satisfying $k \ge 0$ and $\int_{-\infty}^{\infty} k(x)\,dx = 1$. Here $h = h_n > 0$ is called the smoothing parameter or bandwidth since it controls the amount of smoothness in the estimator for a given sample of size n. We make the assumption that $h \to 0$ as $n \to \infty$. The corresponding estimator of the quantile function $Q = F^{-1}$ is then defined by
\[ \hat Q_n(p) = \inf\{x : \hat F_n(x) \ge p\}, \quad 0 < p < 1. \tag{1} \]
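As an illustration, here is a minimal sketch of (1) in Python, assuming a standard normal kernel (so that $K_h$ is a normal CDF) and scipy's brentq root finder for the inversion; it is a sketch under these assumptions, not the authors' implementation.

```python
# A minimal sketch of the kernel distribution estimator and its inverse (1).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def F_hat(x, data, h):
    """Kernel CDF estimate: average of Phi((x - X_i)/h) over the sample."""
    return norm.cdf((x - data[:, None]) / h).mean(axis=0)

def Q_hat(p, data, h):
    """Smoothed quantile (1): root of F_hat(x) = p, bracketed by the data range."""
    lo, hi = data.min() - 5 * h, data.max() + 5 * h
    return brentq(lambda x: F_hat(np.array([x]), data, h)[0] - p, lo, hi)

rng = np.random.default_rng(1)
data = rng.standard_normal(100)
print(Q_hat(0.5, data, h=0.3))  # smoothed median estimate
```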

Nadaraya (1964) showed that, under some assumptions on k, f and h, Q̂n(p) (appropriately normalized) has an asymptotic standard normal distribution. Another notable property of Q̂n(p), namely almost sure consistency, was obtained by Yamato (1973). Ralescu and Sun (1993) obtained necessary and sufficient conditions for the asymptotic normality of Q̂n(p). Azzalini (1981) and an unpublished report used heuristic arguments based on second order approximations and performed some numerical comparisons of Q̂n(p) with the classical sample quantile for estimating the 0.95 quantile of the Gamma(1) distribution. These studies provided considerable empirical evidence for the superiority of Q̂n(p) for a variety of smooth distribution functions.
Azzalini (1981) considered second order properties of F̂n under the following assumptions: (i) $h \to 0$ as $n \to \infty$; (ii) the kernel has finite support, that is, $k(t) = 0$ if $|t| > t_0$ for some positive $t_0$; (iii) the density f is continuous in the interval $(x - t_0 h, x + t_0 h)$; and (iv) $f'(x)$ exists. He pointed out that the asymptotically optimal bandwidth for F̂n is of the form
\[ h_{opt} = \Big( \frac{u}{4vn} \Big)^{1/3} \tag{2} \]
where
\[ u = f(x)\Big\{ t_0 - \int_{-t_0}^{t_0} K(t)^2\, dt \Big\}, \qquad v = \Big\{ \frac{1}{2} f'(x) \int_{-t_0}^{t_0} t^2 k(t)\, dt \Big\}^2. \]
Also, Azzalini (1981) suggested, without offering a proof, that (2) is again the asymptotically optimal choice of h for Q̂n(p). We state the result in the following theorem; a proof can be found in Shankar (1998).
We make the following assumptions:

Assumption A

(1) f is differentiable with a bounded derivative f';

(2) f' is continuous in a neighborhood of $\xi_p$ and $f'(\xi_p) \ne 0$;

(3) $\int_{-\infty}^{\infty} x k(x)\,dx = 0$ and $\int_{-\infty}^{\infty} x^2 k(x)\,dx < \infty$.

Theorem 1. Under assumptions (1)–(3), the asymptotic mean squared error of Q̂n(p) is
\[ \mathrm{AMSE}\{\hat Q_n(p)\} = \frac{p(1-p)}{n f(\xi_p)^2} + \frac{h^4}{4}\, \frac{f'(\xi_p)^2}{f(\xi_p)^2}\, \mu_2(k)^2 - \frac{h}{n}\, \frac{\psi(k)}{f(\xi_p)}, \]
and the asymptotically optimal choice of bandwidth for the smoothed empirical quantile function Q̂n(p) is
\[ h_{opt,1} = \bigg[ \frac{f(\xi_p)\, \psi(k)}{n \{f'(\xi_p)\}^2 \mu_2(k)^2} \bigg]^{1/3} \tag{3} \]
where $\mu_2(k) = \int_{-\infty}^{\infty} t^2 k(t)\,dt$ and $\psi(k) = 2 \int y\, k(y) K(y)\,dy$. If we take k to be the standard normal density, then $\psi(k) = \int_{-\infty}^{\infty} t\, dK^2(t) = 1/\sqrt{\pi}$, $\mu_2(k) = 1$ and
\[ h_{opt,1} = \bigg[ \frac{f(\xi_p)}{\sqrt{\pi}\, n \{f'(\xi_p)\}^2} \bigg]^{1/3}. \]

2.2 Kernel smoothing the order statistics


Another type of smooth quantile estimator, provided by Yang (1985) and also traced
to Parzen (1979), is
\[ \tilde Q_n(p) = \sum_{i=1}^{n} X_{(i)} \int_{(i-1)/n}^{i/n} \frac{1}{h}\, k\Big( \frac{p - x}{h} \Big)\, dx. \tag{4} \]

It is clear that when i/n is close to p, Q̃n (p) puts more weight on the order statistics
X(i) . The asymptotic normality and mean squared consistency of Q̃n (p) were pro-
vided by Yang (1985), while Falk (1984) showed that the asymptotic performance of
Q̃n (p) is better than that of the empirical sample quantile Qn (p) in the sense of rel-
ative deficiency for appropriately chosen kernels and sufficiently smooth distribution
functions.
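For concreteness, a minimal Python sketch of (4) with a standard normal kernel follows (scipy assumed; a sketch, not the authors' code). The weight on $X_{(i)}$ is the integral of $k((p-x)/h)/h$ over $((i-1)/n,\, i/n]$, which reduces to a difference of normal CDF values.

```python
# A minimal sketch of the kernel smoothed quantile estimator (4).
import numpy as np
from scipy.stats import norm

def Q_tilde(p, data, h):
    xs = np.sort(data)
    n = len(xs)
    grid = np.arange(n + 1) / n                      # 0, 1/n, ..., 1
    w = norm.cdf((p - grid[:-1]) / h) - norm.cdf((p - grid[1:]) / h)
    return np.sum(w * xs)

rng = np.random.default_rng(2)
data = rng.standard_normal(100)
print(Q_tilde(0.5, data, h=0.1))  # smoothed median estimate
```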
Building on Falk (1984), Sheather and Marron (1990) gave the asymptotic mean squared error (AMSE) of Q̃n(p) as follows, when f is not symmetric, or f is symmetric but p ≠ 0.5:
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \frac{p(1-p)}{n}\, q(p)^2 + \frac{1}{4} h^4\, q'(p)^2\, \mu_2(k)^2 - \frac{h}{n}\, q(p)^2\, \psi(k) \tag{5} \]
where $q = Q'$ and $q' = Q''$. If $q = Q' > 0$ then
\[ h_{opt,2} = \bigg[ \frac{Q'(p)^2\, \psi(k)}{n\, Q''(p)^2\, \mu_2(k)^2} \bigg]^{1/3}. \tag{6} \]

Remark 2.1. When F is symmetric and p = 0.5,
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = n^{-1} [q(0.5)]^2 \big\{ 0.25 - 0.5\, h\, \psi(k) + n^{-1} h^{-1} R(k) \big\}, \]
where $R(k) = \int k^2(x)\,dx$. In this case the AMSE is decreasing in h, so there is no single optimal bandwidth minimizing it.

Remark 2.2. If $q(p) = Q'(p) = 0$, higher order terms are needed. The AMSE of Q̃n(p) can then be shown to be
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \Big( \frac{1}{4} - \frac{1}{n} \Big) h^4\, Q''(p)^2\, \mu_2(k)^2 + 2 n^{-1} h^2\, Q''(p)^2 \int (p - ht)\, t\, k(t)\, j(t)\,dt, \]
where $j(t) = \int_{-\infty}^{t} x k(x)\,dx$. The proof is provided in the Appendix.

3 Density and quantile derivative estimation


The asymptotically optimal bandwidths $h_{opt,1}$ and $h_{opt,2}$ for Q̂n(p) and Q̃n(p) depend on $f(\xi_p)$, $f'(\xi_p)$, $Q'(p)$ and $Q''(p)$. This section provides nonparametric estimators of these quantities and the corresponding asymptotically optimal bandwidths.

3.1 Density derivative estimation


From (3) we know that we need to estimate f'. A natural estimator of the rth derivative ($r \ge 1$) of f can be obtained by differentiating the estimator
\[ \hat f_{g_n}(x) = \frac{d}{dx} \hat F_{g_n}(x) = \frac{d}{dx} \Big\{ \frac{1}{n} \sum_{i=1}^{n} K_{g_n}(x - X_i) \Big\} = \frac{1}{n} \sum_{i=1}^{n} k_{g_n}(x - X_i) \tag{7} \]
of the density f(x), giving
\[ \hat f^{(r)}_{g_n}(x) = \frac{d^r}{dx^r}\, \frac{1}{n g_n} \sum_{i=1}^{n} k\Big( \frac{x - X_i}{g_n} \Big) = \frac{1}{n g_n^{r+1}} \sum_{i=1}^{n} k^{(r)}\Big( \frac{x - X_i}{g_n} \Big) \tag{8} \]
where $g_n$ is the smoothing parameter (Wand and Jones, 1995). The asymptotic mean squared error of $\hat f^{(r)}_{g_n}(x)$ can be derived straightforwardly (Wand and Jones, 1995):
\[ \mathrm{AMSE}\{\hat f^{(r)}_{g_n}(x)\} = \frac{1}{n g_n^{2r+1}}\, R(k^{(r)})\, f(x) + \frac{1}{4}\, g_n^4\, \{\mu_2(k)\}^2\, f^{(r+2)}(x)^2 \tag{9} \]
where $R(\eta) = \int \eta^2(x)\,dx$ for any square-integrable function $\eta$. It follows that the AMSE-optimal bandwidth for estimating $f^{(r)}(x)$ is of order $n^{-1/(2r+5)}$. The asymptotically optimal bandwidth for $\hat f_{g_n}(x)$ is
\[ g_n^* = \bigg[ \frac{R(k)\, f(x)}{n\, \mu_2(k)^2\, f''(x)^2} \bigg]^{1/5}, \tag{10} \]
and the asymptotically optimal bandwidth for $\hat f'_{g_n}(x)$ is
\[ g_n^{**} = \bigg[ \frac{3 R(k')\, f(x)}{n\, \mu_2(k)^2\, f'''(x)^2} \bigg]^{1/7}. \tag{11} \]
When k is the standard normal density,
\[ g_n^* = \bigg[ \frac{f(x)}{\sqrt{\pi}\, n\, f''(x)^2} \bigg]^{1/5}, \qquad g_n^{**} = \bigg[ \frac{3 f(x)}{4 \sqrt{\pi}\, n\, f'''(x)^2} \bigg]^{1/7}. \]
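A minimal Python sketch of the kernel estimates (7) and (8) with a standard normal kernel follows (scipy assumed; for the normal kernel $k'(u) = -u\,k(u)$).

```python
# A minimal sketch of the kernel density and density-derivative estimates.
import numpy as np
from scipy.stats import norm

def f_hat(x, data, g):
    """(7): kernel density estimate at x with bandwidth g."""
    u = (x - data) / g
    return norm.pdf(u).mean() / g

def f1_hat(x, data, g):
    """(8) with r = 1: kernel estimate of f'(x), since k'(u) = -u k(u)."""
    u = (x - data) / g
    return (-u * norm.pdf(u)).mean() / g**2

rng = np.random.default_rng(3)
data = rng.standard_normal(500)
print(f_hat(0.0, data, 0.3), f1_hat(1.0, data, 0.3))
```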

3.2 Quantile derivative estimation


Next, we estimate $Q' = q$ and $Q'' = q'$ in the following way. From (4), an estimator of $Q' = q$ can be constructed as
\[ \tilde q(p) = \tilde Q'_n(p) = \sum_{i=1}^{n} X_{(i)} \Big[ k_a\Big(p - \tfrac{i-1}{n}\Big) - k_a\Big(p - \tfrac{i}{n}\Big) \Big] = \sum_{i=2}^{n} (X_{(i)} - X_{(i-1)})\, k_a\Big(p - \tfrac{i-1}{n}\Big) - X_{(n)} k_a(p - 1) + X_{(1)} k_a(p), \tag{12} \]
where $k_a(x) = \frac{1}{a} k\big(\frac{x}{a}\big)$ and $a = a_n$ is the bandwidth for $\tilde q$. Jones (1992) derived the asymptotic MSE of $\tilde q(p)$:
\[ \mathrm{AMSE}\{\tilde q(p)\} = \frac{a^4}{4}\, q''(p)^2\, \mu_2(k)^2 + \frac{1}{na}\, q(p)^2 \int k^2(y)\,dy. \tag{13} \]
Minimizing (13) with respect to a, we obtain the asymptotically optimal bandwidth for $\tilde q(p)$:
\[ a^*_{opt} = \bigg[ \frac{Q'(p)^2 \int k^2(y)\,dy}{n\, Q'''(p)^2\, \mu_2(k)^2} \bigg]^{1/5}. \tag{14} \]
To estimate $Q'' = q'$ in (6), note that
\[ \tilde Q''_n(p) = \frac{d}{dp} \tilde Q'_n(p) = \frac{1}{a^2} \sum_{i=1}^{n} X_{(i)} \Big\{ k'\Big( \frac{p - \frac{i-1}{n}}{a} \Big) - k'\Big( \frac{p - \frac{i}{n}}{a} \Big) \Big\}. \tag{15} \]
Similarly, we obtain the asymptotically optimal bandwidth for $\tilde Q''_n(p)$:
\[ a^{**}_{opt} = \bigg[ \frac{3 \int k'(x)^2\,dx\; Q'(p)^2}{n\, \mu_2(k)^2\, Q^{(4)}(p)^2} \bigg]^{1/7}. \tag{16} \]
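For illustration, a minimal Python sketch of the quantile-derivative estimators (12) and (15) with a standard normal kernel is given below (scipy assumed; a sketch under these assumptions).

```python
# A minimal sketch of the quantile-derivative estimators (12) and (15).
import numpy as np
from scipy.stats import norm

def q_tilde(p, data, a):
    """(12): estimate of Q'(p) as a kernel-weighted sum of order statistics."""
    xs = np.sort(data)
    n = len(xs)
    i = np.arange(1, n + 1)
    w = norm.pdf((p - (i - 1) / n) / a) / a - norm.pdf((p - i / n) / a) / a
    return np.sum(w * xs)

def q2_tilde(p, data, a):
    """(15): estimate of Q''(p); for the normal kernel k'(u) = -u k(u)."""
    xs = np.sort(data)
    n = len(xs)
    i = np.arange(1, n + 1)
    dphi = lambda u: -u * norm.pdf(u)
    w = (dphi((p - (i - 1) / n) / a) - dphi((p - i / n) / a)) / a**2
    return np.sum(w * xs)

rng = np.random.default_rng(4)
data = rng.standard_normal(500)
print(q_tilde(0.5, data, 0.1))  # true Q'(0.5) = sqrt(2*pi) ~ 2.507
```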

4 Bandwidth selection
In this section, we consider several data-based methods to find the asymptotically optimal bandwidths for the estimators Q̂n(p) and Q̃n(p). The bandwidth plays a critical role in practical estimation: it determines the trade-off between the amount of smoothness obtained and the closeness of the estimate to the true distribution (see Wand and Jones, 1995).

4.1 Method 1. Approximate h_opt,1 for Q̂n(p) using density derivative estimators

The asymptotically optimal bandwidth $h_{opt,1}$ for Q̂n(p), given in (3), involves $f(\xi_p)$ and $f'(\xi_p)$, which can be estimated by $\hat f_{g_n^*}(\hat\xi_p)$ and $\hat f'_{g_n^{**}}(\hat\xi_p)$ respectively. Here, $\hat\xi_p$ is the empirical pth quantile $Q_n(p)$. Using $g_n^*$ in (10) with $f(\hat\xi_p)$ and $f''(\hat\xi_p)$ replaced by their Normal$(\mu, \sigma^2)$ reference values, we obtain $\hat f_{g_n^*}(x)$. Using $g_n^{**}$ in (11) with $f(\hat\xi_p)$ and $f'''(\hat\xi_p)$ replaced by their Normal$(\mu, \sigma^2)$ reference values, we obtain $\hat f'_{g_n^{**}}(x)$. Plugging these into (3), we have the data-based bandwidth
\[ \hat h_{opt,1} = \Bigg[ \frac{\hat f_{g_n^*}(\hat\xi_p)\, \psi(k)}{n \big\{ \hat f'_{g_n^{**}}(\hat\xi_p) \big\}^2 \mu_2(k)^2} \Bigg]^{1/3} \tag{17} \]
for Q̂n(p). If k is the standard normal density then
\[ \hat h_{opt,1} = \Bigg[ \frac{\hat f_{g_n^*}(\hat\xi_p)}{\sqrt{\pi}\, n \big\{ \hat f'_{g_n^{**}}(\hat\xi_p) \big\}^2} \Bigg]^{1/3}. \tag{18} \]
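A minimal self-contained Python sketch of Method 1 follows, assuming a standard normal kernel and the Normal(mu, sigma^2) reference rule for the pilot bandwidths; the function name h_opt1 and the inlined kernel estimates are illustrative, not from the paper.

```python
# A minimal sketch of Method 1 (18), assuming a standard normal kernel.
import numpy as np
from scipy.stats import norm

def h_opt1(data, p):
    n = len(data)
    xi_hat = np.sort(data)[int(np.floor(n * p))]      # empirical quantile
    mu, s = data.mean(), data.std(ddof=1)             # Normal reference fit
    z = (xi_hat - mu) / s
    phi = norm.pdf(z)
    f_ref = phi / s                                   # f(xi) under N(mu, s^2)
    f2_ref = (z**2 - 1) * phi / s**3                  # f''(xi)
    f3_ref = (3 * z - z**3) * phi / s**4              # f'''(xi)
    g1 = (f_ref / (np.sqrt(np.pi) * n * f2_ref**2)) ** 0.2            # g*
    g2 = (3 * f_ref / (4 * np.sqrt(np.pi) * n * f3_ref**2)) ** (1/7)  # g**
    f_est = norm.pdf((xi_hat - data) / g1).mean() / g1                # (7)
    u = (xi_hat - data) / g2
    f1_est = (-u * norm.pdf(u)).mean() / g2**2                        # (8), r=1
    return (f_est / (np.sqrt(np.pi) * n * f1_est**2)) ** (1/3)        # (18)
```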

Remark 4.1. The expression for $h_{opt,1}$ has the derivative of f in the denominator. If f' has zeros, then its estimates near those zeros are also very small, so the estimator $\hat h_{opt,1}$ of $h_{opt,1}$ is very unstable there. For example, if f is standard normal, then $f' = -x f$ has a zero at $x = 0$, which corresponds to $p = 0.5$; hence, when $p = 0.5$, the estimator $\hat h_{opt,1}$ is very unstable. Similarly, the first derivative of the double exponential density has a zero at $x = 0$ and the first derivative of the lognormal density has a zero at $x = e^{-1}$.

4.2 Method 2. Approximate h_opt,2 for Q̃n(p) using quantile derivative estimators

The asymptotically optimal bandwidth $h_{opt,2}$ for Q̃n(p), given in (6), involves the unknown quantities $Q'(p)$ and $Q''(p)$, which can be estimated by $\tilde Q'_n(p)$ and $\tilde Q''_n(p)$ in (12) and (15), respectively. The asymptotically optimal bandwidths $a^*_{opt}$ and $a^{**}_{opt}$, given in (14) and (16), for $\tilde Q'_n(p)$ and $\tilde Q''_n(p)$ depend on $Q'(p)$, $Q'''(p)$ and $Q^{(4)}(p)$. We replace these unknowns by their Normal$(\mu, \sigma^2)$ reference values. Then, using $\tilde Q'_n(p)$ with $a = a^*_{opt}$ and $\tilde Q''_n(p)$ with $a = a^{**}_{opt}$, we have the data-based bandwidth
\[ \hat h_{opt,2} = \Bigg[ \frac{\tilde Q'_n(p)^2\, \psi(k)}{n\, \tilde Q''_n(p)^2\, \mu_2(k)^2} \Bigg]^{1/3} \tag{19} \]
for Q̃n(p).
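A minimal Python sketch of Method 2 follows, assuming a standard normal kernel, the Normal(mu, sigma^2) reference values of Q', Q''' and Q^(4), and the q_tilde and q2_tilde estimators sketched in Section 3.2; the name h_opt2 is illustrative.

```python
# A minimal sketch of Method 2 (19), assuming a standard normal kernel.
import numpy as np
from scipy.stats import norm

def h_opt2(data, p, q_tilde, q2_tilde):
    n = len(data)
    mu, s = data.mean(), data.std(ddof=1)
    z = norm.ppf(p)
    phi = norm.pdf(z)
    # Normal(mu, s^2) reference values of Q', Q''' and Q^(4) at p
    # (unstable at p = 0.5, where the Q^(4) reference vanishes; cf. Section 5.2).
    q1 = s / phi
    q3 = s * (2 * z**2 + 1) / phi**3
    q4 = s * (6 * z**3 + 7 * z) / phi**4
    Rk, Rk1 = 1 / (2 * np.sqrt(np.pi)), 1 / (4 * np.sqrt(np.pi))
    a1 = (q1**2 * Rk / (n * q3**2)) ** 0.2            # (14)
    a2 = (3 * Rk1 * q1**2 / (n * q4**2)) ** (1/7)     # (16)
    dq = q_tilde(p, data, a1)                         # (12)
    d2q = q2_tilde(p, data, a2)                       # (15)
    return (dq**2 / (np.sqrt(np.pi) * n * d2q**2)) ** (1/3)  # (19)
```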

4.3 Method 3. Approximate h_opt,1 for Q̂n(p) using quantile derivative estimators

We introduce an alternative way of estimating $f(\xi_p)$ and $f'(\xi_p)$ in $h_{opt,1}$, see (3), which uses estimators of the quantile derivatives. Note that
\[ Q'(p) = \frac{1}{f(F^{-1}(p))} = \frac{1}{f(Q(p))} = \frac{1}{f(\xi_p)}, \tag{20} \]
\[ Q''(p) = \frac{-f'(Q(p))}{f^3(Q(p))} = \frac{-f'(\xi_p)}{f^3(\xi_p)}. \tag{21} \]
Hence, (3) becomes
\[ h_{opt,1} = \bigg[ \frac{Q'(p)^5\, \psi(k)}{n\, Q''(p)^2\, \mu_2(k)^2} \bigg]^{1/3}. \]
As in Method 2, we first replace the unknowns in $a^*_{opt}$ and $a^{**}_{opt}$ by their Normal reference values, and then use $\tilde Q'_n(p)$ with $a = a^*_{opt}$ and $\tilde Q''_n(p)$ with $a = a^{**}_{opt}$ to get
\[ \bar h_{opt,1} = \Bigg[ \frac{\tilde Q'_n(p)^5\, \psi(k)}{n\, \tilde Q''_n(p)^2\, \mu_2(k)^2} \Bigg]^{1/3}. \tag{22} \]
If we take k to be the standard normal density, then
\[ \bar h_{opt,1} = \Bigg[ \frac{\tilde Q'_n(p)^5}{\sqrt{\pi}\, n\, \tilde Q''_n(p)^2} \Bigg]^{1/3}. \]

4.4 Method 4. Approximate h_opt,2 for Q̃n(p) using density derivative estimators

From (20) and (21), we have
\[ h_{opt,2} = \bigg[ \frac{f(\xi_p)^4\, \psi(k)}{n\, f'(\xi_p)^2\, \mu_2(k)^2} \bigg]^{1/3}. \tag{23} \]
Then, plugging in the estimators of $f(\xi_p)$ and $f'(\xi_p)$ from Method 1, see (17), we obtain
\[ \bar h_{opt,2} = \Bigg[ \frac{\hat f_{g_n^*}(\hat\xi_p)^4\, \psi(k)}{n\, \hat f'_{g_n^{**}}(\hat\xi_p)^2\, \mu_2(k)^2} \Bigg]^{1/3}. \tag{24} \]
When k is the standard normal density, $\bar h_{opt,2}$ becomes
\[ \bar h_{opt,2} = \Bigg[ \frac{\hat f_{g_n^*}(\hat\xi_p)^4}{\sqrt{\pi}\, n\, \hat f'_{g_n^{**}}(\hat\xi_p)^2} \Bigg]^{1/3}. \]
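Methods 3 and 4 reuse the ingredients of Methods 2 and 1, changing only the exponents, as the following minimal sketch shows (standard normal kernel assumed; dq, d2q denote the values of (12) and (15) at the reference bandwidths, and f_est, f1_est the density and derivative estimates from Method 1; the names are illustrative).

```python
# A minimal sketch of Methods 3 and 4 with a standard normal kernel.
import numpy as np

def h_bar_opt1(dq, d2q, n):
    """(22): Method 3 bandwidth from quantile-derivative estimates."""
    return (dq**5 / (np.sqrt(np.pi) * n * d2q**2)) ** (1 / 3)

def h_bar_opt2(f_est, f1_est, n):
    """(24): Method 4 bandwidth from density-derivative estimates."""
    return (f_est**4 / (np.sqrt(np.pi) * n * f1_est**2)) ** (1 / 3)
```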

5 Numerical Performance
We implement the methods of Section 4. Four distributions are selected: exponential, double exponential, lognormal and standard normal. We use the standard normal density as the kernel k, i.e. $k(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$. Then $k'(x) = -x k(x)$, and we find
\[ \mu_2(k) = \int x^2 k(x)\,dx = 1, \]
\[ \psi(k) = 2 \int x\, k(x) \Big[ \int_{-\infty}^{x} k(t)\,dt \Big]\, dx = \frac{1}{\sqrt{\pi}}, \]
\[ R(k) = \int k^2(x)\,dx = \frac{1}{2\sqrt{\pi}}, \]
\[ R(k') = \int \{k'(x)\}^2\,dx = \int x^2 k^2(x)\,dx = \frac{1}{4\sqrt{\pi}}. \]
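These constants can be checked numerically, e.g. with scipy's adaptive quadrature (a quick verification sketch, not from the paper):

```python
# Numerical check of the kernel constants for the standard normal kernel.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu2 = quad(lambda x: x**2 * norm.pdf(x), -np.inf, np.inf)[0]    # 1
psi = quad(lambda x: 2 * x * norm.pdf(x) * norm.cdf(x), -np.inf, np.inf)[0]
Rk = quad(lambda x: norm.pdf(x)**2, -np.inf, np.inf)[0]         # 1/(2 sqrt(pi))
Rk1 = quad(lambda x: (x * norm.pdf(x))**2, -np.inf, np.inf)[0]  # 1/(4 sqrt(pi))
print(mu2, psi - 1 / np.sqrt(np.pi), Rk - 1 / (2 * np.sqrt(np.pi)), Rk1)
```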

5.1 True values


In the following we compute the asymptotically optimal bandwidths and the AMSEs for the four distributions. First, the derivatives of Q are related to f and its derivatives at $\xi_p$ as follows:
\[ Q'(p) = \frac{1}{f(\xi_p)}, \qquad Q''(p) = -\frac{f'(\xi_p)}{f(\xi_p)^3}, \qquad Q'''(p) = \frac{3 f'(\xi_p)^2 - f(\xi_p) f''(\xi_p)}{f(\xi_p)^5}, \]
\[ Q^{(4)}(p) = \frac{10 f(\xi_p) f'(\xi_p) f''(\xi_p) - f(\xi_p)^2 f'''(\xi_p) - 15 f'(\xi_p)^3}{f(\xi_p)^7}. \]
Using the above results, the asymptotic MSE of Q̃n(p) is
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \frac{p(1-p)}{n f(\xi_p)^2} + \frac{h^4 f'(\xi_p)^2}{4 f(\xi_p)^6} - \frac{h}{\sqrt{\pi}\, n f(\xi_p)^2}. \]
Also we have
\[ a^* = \bigg[ \frac{f(\xi_p)^8}{2\sqrt{\pi}\, n\, \big(3 f'(\xi_p)^2 - f(\xi_p) f''(\xi_p)\big)^2} \bigg]^{1/5}, \]
\[ a^{**} = \bigg[ \frac{3 f(\xi_p)^{12}}{4\sqrt{\pi}\, n\, \big(10 f(\xi_p) f'(\xi_p) f''(\xi_p) - f(\xi_p)^2 f'''(\xi_p) - 15 f'(\xi_p)^3\big)^2} \bigg]^{1/7}. \]
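As a sanity check on these relations, the closed-form $Q'''(p)$ for the standard normal can be compared with a finite-difference derivative of $Q = $ norm.ppf (a verification sketch assuming scipy; the step size eps is illustrative):

```python
# Check Q'''(p) = (3 f'^2 - f f'') / f^5 against a finite difference of norm.ppf.
import numpy as np
from scipy.stats import norm

p, eps = 0.3, 1e-3
xi = norm.ppf(p)
f, f1, f2 = norm.pdf(xi), -xi * norm.pdf(xi), (xi**2 - 1) * norm.pdf(xi)
q3_formula = (3 * f1**2 - f * f2) / f**5
# third-order central finite difference of Q(p)
q3_fd = (norm.ppf(p + 2*eps) - 2*norm.ppf(p + eps) + 2*norm.ppf(p - eps)
         - norm.ppf(p - 2*eps)) / (2 * eps**3)
print(q3_formula, q3_fd)  # should agree to several digits
```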
Case 1. f is the standard normal density. We have
\[ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad f'(x) = -x f(x), \quad f''(x) = (x^2 - 1) f(x), \quad f'''(x) = (3x - x^3) f(x). \]
Hence, with $x = \xi_p$,
\[ g_n^* = \bigg[ \frac{\sqrt{2}\, e^{x^2/2}}{n (x^2 - 1)^2} \bigg]^{1/5}, \qquad g_n^{**} = \bigg[ \frac{3\sqrt{2}\, e^{x^2/2}}{4 n (3x - x^3)^2} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\hat Q_n(p)\} = \frac{2\pi p(1-p)}{n}\, e^{x^2} - \frac{\sqrt{2}\, h}{n}\, e^{x^2/2} + \frac{h^4}{4}\, x^2, \]
\[ a^* = \frac{1}{\sqrt{2\pi}} \bigg[ \frac{e^{-2x^2}}{\sqrt{2}\, n (2x^2 + 1)^2} \bigg]^{1/5}, \qquad a^{**} = \frac{1}{\sqrt{2\pi}} \bigg[ \frac{3 e^{-3x^2}}{2\sqrt{2}\, n (6x^3 + 7x)^2} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \frac{2\pi p(1-p)}{n}\, e^{x^2} + \pi^2 h^4 x^2 e^{2x^2} - \frac{2\sqrt{\pi}\, h}{n}\, e^{x^2}. \]
Case 2. f is the density of the Exponential(1) distribution. We have
\[ f(x) = e^{-x} = -f'(x) = f''(x) = -f'''(x). \]
Hence
\[ g_n^* = \bigg[ \frac{e^x}{\sqrt{\pi}\, n} \bigg]^{1/5}, \qquad g_n^{**} = \bigg[ \frac{3 e^x}{4 \sqrt{\pi}\, n} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\hat Q_n(p)\} = \frac{p(1-p)}{n}\, e^{2x} - \frac{h}{\sqrt{\pi}\, n}\, e^{x} + \frac{h^4}{4}, \]
\[ a^* = \bigg[ \frac{e^{-4x}}{8 \sqrt{\pi}\, n} \bigg]^{1/5}, \qquad a^{**} = \bigg[ \frac{3 e^{-6x}}{144 \sqrt{\pi}\, n} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \frac{p(1-p)}{n}\, e^{2x} + \frac{h^4}{4}\, e^{4x} - \frac{h}{\sqrt{\pi}\, n}\, e^{2x}. \]
Case 3. f is the lognormal density. We have
\[ f(x) = \frac{1}{\sqrt{2\pi}\, x}\, e^{-\log^2 x / 2}, \]
\[ f'(x) = -\frac{f(x)}{x}\, (1 + \log x), \qquad f''(x) = \frac{f(x)}{x^2}\, (1 + 3\log x + \log^2 x), \]
\[ f'''(x) = -\frac{f(x)}{x^3}\, (8 \log x + 6 \log^2 x + \log^3 x). \]
Hence
\[ g_n^* = \bigg[ \frac{x^4}{\sqrt{\pi}\, n (1 + 3\log x + \log^2 x)^2 f(x)} \bigg]^{1/5}, \qquad g_n^{**} = \bigg[ \frac{3 x^6}{4 \sqrt{\pi}\, n (8\log x + 6\log^2 x + \log^3 x)^2 f(x)} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\hat Q_n(p)\} = \frac{2\pi p(1-p)}{n}\, x^2 e^{\log^2 x} - \frac{\sqrt{2}\, h}{n}\, x\, e^{\log^2 x / 2} + \frac{h^4}{4}\, \frac{(1 + \log x)^2}{x^2}, \]
\[ a^* = \frac{1}{\sqrt{2\pi}} \bigg[ \frac{e^{-2\log^2 x}}{\sqrt{2}\, n (2 + 3\log x + 2\log^2 x)^2} \bigg]^{1/5}, \]
\[ a^{**} = \frac{1}{\sqrt{2\pi}} \bigg[ \frac{3 e^{-3\log^2 x}}{2\sqrt{2}\, n (5 + 13\log x + 11\log^2 x + 6\log^3 x)^2} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \pi^2 h^4 x^2 (1 + \log x)^2 e^{2\log^2 x} + \frac{2\pi x^2 e^{\log^2 x}}{n} \Big\{ p(1-p) - \frac{h}{\sqrt{\pi}} \Big\}. \]
Case 4. f is the double exponential density. We have $f(x) = \frac{1}{2} e^{-|x|} = f''(x)$ except at $x = 0$, and
\[ f'(x) = \begin{cases} -\frac{1}{2} e^{-x}, & x > 0 \\ \frac{1}{2} e^{x}, & x < 0 \end{cases} \;=\; -\frac{1}{2}\, \mathrm{sign}(x)\, e^{-|x|} \;=\; f'''(x). \]
Hence
\[ g_n^* = \bigg[ \frac{2 e^{|x|}}{\sqrt{\pi}\, n} \bigg]^{1/5}, \qquad g_n^{**} = \bigg[ \frac{3 e^{|x|}}{2 \sqrt{\pi}\, n} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\hat Q_n(p)\} = \frac{4 p(1-p)}{n}\, e^{2|x|} - \frac{2h}{\sqrt{\pi}\, n}\, e^{|x|} + \frac{h^4}{4}, \]
\[ a^* = \bigg[ \frac{e^{-4|x|}}{2^7 \sqrt{\pi}\, n} \bigg]^{1/5}, \qquad a^{**} = \bigg[ \frac{e^{-6|x|}}{2^{10} \cdot 3 \sqrt{\pi}\, n} \bigg]^{1/7}, \]
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = 4 h^4 e^{4|x|} + \frac{4 p(1-p)}{n}\, e^{2|x|} - \frac{4h}{\sqrt{\pi}\, n}\, e^{2|x|}. \]

5.2 Simulation results


We drew samples of sizes 50, 100, 500 and 1000 from each of the four distributions, and computed the bandwidths and AMSEs at values of p from 0.05 to 0.95 with step size 0.05. However, by Remarks 2.1 and 4.1, we omitted p = 0.5 for the normal and double exponential distributions and p = 0.35 for the lognormal. We repeated the computation 100 times. In the first several simulation runs, we obtained some extremely large or small bandwidths, which resulted in extremely large asymptotic MSEs. Hence we adopted the strategy of Sheather and Marron (1990) to adjust overly small or large bandwidths. For example, in Method 1, we forced $\hat f'(\hat\xi_p)^{-2}$ to lie in the interval [0.05, 1.5]: if it falls outside the interval, we replace it by the closest endpoint. Simulation results are displayed in the figures, where the relative efficiency, i.e. the ratio of the AMSE of each method to the AMSE of the empirical quantile, is plotted against p. Figures 1–4 summarize the performance of the different methods for each of the four distributions, with one panel per method (a sketch of the simulation loop is given below); Figures 5–8 show the performance across sample sizes, with one panel per sample size.
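For concreteness, a minimal sketch of the simulation loop is given below, reusing the illustrative functions from earlier sections (Q_tilde, h_opt2, q_tilde, q2_tilde) and clipping the bandwidth itself to an illustrative interval rather than applying the exact adjustment rule described above.

```python
# A minimal sketch of one cell of the simulation (standard normal, Method 2).
import numpy as np
from scipy.stats import norm

def efficiency(p, n, n_rep=100, seed=0):
    rng = np.random.default_rng(seed)
    true_q = norm.ppf(p)
    err_kern, err_emp = [], []
    for _ in range(n_rep):
        data = rng.standard_normal(n)
        h = np.clip(h_opt2(data, p, q_tilde, q2_tilde), 1e-3, 1.0)  # clipped
        err_kern.append((Q_tilde(p, data, h) - true_q) ** 2)
        err_emp.append((np.sort(data)[int(np.floor(n * p))] - true_q) ** 2)
    return np.mean(err_kern) / np.mean(err_emp)  # < 1 favors the kernel method
```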
From Figures 1–4 we can see that the solid line, which corresponds to sample size n = 50, is almost the lowest in each plot. This is because when the sample size is small, the empirical quantile has a relatively larger MSE; hence the kernel estimators are relatively more efficient.
Generally speaking, the four methods did a better job than the empirical quantile. For example, in Figure 6 we can see that when n = 50 only Method 2 gave an efficiency above 1, for p values between 0.75 and 0.95; the efficiencies of all the other methods are below 1 for all p values. But, unfortunately, no method works better than all the other methods for all distributions and all sample sizes. In Figure 8, for example, Method 2 sometimes works better than the others, but sometimes worse. From that figure it seems that Method 1 is always more efficient than Method 3, yet in Figure 6, Method 3 is more efficient than Method 1 for many p values at each sample size. We can also see from Figures 5–8 that the plots for Method 1 (respectively Method 2) are similar to those for Method 3 (respectively Method 4). This is not coincidental, because we use the same formula to compute their asymptotic MSEs. From Figures 1–4, we also observe a common behavior of Methods 2 and 4: they perform badly near the boundaries, i.e. when p is close to 0 or 1.
In summary, the kernel quantile estimators, regardless of which bandwidth selection method is used, are more efficient than the empirical quantile estimator in most situations. When the sample size n is relatively small, say n = 50, they are significantly more efficient than the empirical quantile estimator. But no single method is the most efficient in all situations.

References

[1] Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68, 326-328.

[2] Falk, M. (1984). Relative deficiency of kernel type estimators of quantiles. Ann. Statist., 12, 261-268.

[3] Jones, M. C. (1992). Estimating densities, quantiles, quantile densities and density quantiles. Ann. Inst. Statist. Math., 44, 721-727.

[4] Nadaraya, E. A. (1964). Some new estimates for distribution functions. Theory Probab. Appl., 9, 497-500.

[5] Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Statist. Assoc., 74, 105-131.

[6] Read, R. R. (1972). The asymptotic inadmissibility of the sample distribution function. Ann. Math. Statist., 43, 89-95.

[7] Ralescu, S. S. and Sun, S. (1993). Necessary and sufficient conditions for the asymptotic normality of perturbed sample quantiles. J. Statist. Plann. Inference, 35, 55-64.

[8] Shankar, B. (1998). An optimal choice of bandwidth for perturbed sample quantiles. Master's thesis.

[9] Sheather, S. J. and Marron, J. S. (1990). Kernel quantile estimators. J. Amer. Statist. Assoc., 85, 410-416.

[10] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.

[11] Yamato, H. (1973). Uniform convergence of an estimator of a distribution function. Bull. Math. Statist., 15, 69-78.

[12] Yang, S. S. (1985). A smooth nonparametric estimation of a quantile function. J. Amer. Statist. Assoc., 80, 1004-1011.

Appendix

We now provide the proof of the AMSE formula in Remark 2.2, following the notation of Falk (1984), with p denoting the probability level and $\alpha_n$ the bandwidth. Since $(F^{-1})'(p) = Q'(p) = 0$, we have
\begin{align*}
\mathrm{Var}\{\tilde Q_n(p)\}
&= n^{-1} \int_0^1 \Big\{ \int k(x)\big(p - \alpha_n x - 1_{(0,\, p - \alpha_n x)}(y)\big)\, (F^{-1})'(p - \alpha_n x)\,dx \Big\}^2 dy \\
&= n^{-1} \int_0^1 \Big\{ \int k(x)\big(p - \alpha_n x - 1_{(0,\, p - \alpha_n x)}(y)\big) \big[(F^{-1})'(p) - \alpha_n x (F^{-1})''(p) + O(\alpha_n^2)\big]\,dx \Big\}^2 dy \\
&= n^{-1} \int_0^1 \Big\{ \int k(x)\big(p - \alpha_n x - 1_{(0,\, p - \alpha_n x)}(y)\big) \big(-\alpha_n x (F^{-1})''(p)\big)\,dx \Big\}^2 dy + O(n^{-1}\alpha_n^2) \\
&= b \int_0^1 \Big\{ \int k(x)\big(p - \alpha_n x - 1_{(0,\, p - \alpha_n x)}(y)\big)\, x\,dx \Big\}^2 dy + O(n^{-1}\alpha_n^2) \\
&= b \int_0^1 \Big\{ p \int x k(x)\,dx - \alpha_n \int x^2 k(x)\,dx - \int x k(x)\, 1_{(0,\, p - \alpha_n x)}(y)\,dx \Big\}^2 dy + O(n^{-1}\alpha_n^2) \\
&= b \int_0^1 \Big[ \alpha_n \mu_2(k) + \int x k(x)\, 1_{(0,\, p - \alpha_n x)}(y)\,dx \Big]^2 dy + O(n^{-1}\alpha_n^2) \\
&= b \int_0^1 \Big\{ \alpha_n^2 \mu_2(k)^2 + 2 \alpha_n \mu_2(k) \int x k(x)\, 1_{(0,\, p - \alpha_n x)}(y)\,dx + \Big[ \int x k(x)\, 1_{(0,\, p - \alpha_n x)}(y)\,dx \Big]^2 \Big\} dy + O(n^{-1}\alpha_n^2) \\
&\triangleq b\, \alpha_n^2 \mu_2(k)^2 + 2 b\, \alpha_n \mu_2(k)\, S_1 + b\, S_2 + O(n^{-1}\alpha_n^2),
\end{align*}
where $b = n^{-1} \alpha_n^2 (F^{-1})''(p)^2$. But
\begin{align*}
S_1 &= \int_0^1\!\!\int x k(x)\, 1_{(0,\, p - \alpha_n x)}(y)\,dx\,dy = \int x k(x) \int_0^1 1_{(0,\, p - \alpha_n x)}(y)\,dy\,dx \\
&= \int x k(x)\, (p - \alpha_n x)\,dx = p \int x k(x)\,dx - \alpha_n \int x^2 k(x)\,dx = -\alpha_n \mu_2(k),
\end{align*}
and, integrating by parts and then substituting $t = (p - y)/\alpha_n$,
\begin{align*}
S_2 &= \int_0^1 \Big[ \int_{(p-1)/\alpha_n}^{(p-y)/\alpha_n} x k(x)\,dx \Big]^2 dy \\
&= \Big\{ y \Big[ \int_{(p-1)/\alpha_n}^{(p-y)/\alpha_n} x k(x)\,dx \Big]^2 \Big\}\Big|_0^1 - \int_0^1 y\, d\Big\{ \Big[ \int_{(p-1)/\alpha_n}^{(p-y)/\alpha_n} x k(x)\,dx \Big]^2 \Big\} \\
&= -2 \int_0^1 y \Big[ \int_{(p-1)/\alpha_n}^{(p-y)/\alpha_n} x k(x)\,dx \Big]\, \frac{p-y}{\alpha_n}\, k\Big( \frac{p-y}{\alpha_n} \Big) \Big( -\frac{1}{\alpha_n} \Big)\,dy \\
&= \frac{2}{\alpha_n} \int_0^1 y\, \frac{p-y}{\alpha_n}\, k\Big( \frac{p-y}{\alpha_n} \Big) \Big[ \int_{(p-1)/\alpha_n}^{(p-y)/\alpha_n} x k(x)\,dx \Big]\,dy \\
&= 2 \int_{(p-1)/\alpha_n}^{p/\alpha_n} (p - \alpha_n t)\, t\, k(t) \Big[ \int_{(p-1)/\alpha_n}^{t} x k(x)\,dx \Big]\,dt \\
&= 2 \int_{(p-1)/\alpha_n}^{p/\alpha_n} (p - \alpha_n t)\, t\, k(t)\, j(t)\,dt,
\end{align*}
where $j(t) = \int_{-c}^{t} x k(x)\,dx$ and c is such that k is finitely supported in $[-c, c]$. Then
\[ \mathrm{Var}\{\tilde Q_n(p)\} = -n^{-1} \alpha_n^4\, (F^{-1})''(p)^2\, \mu_2(k)^2 + 2 n^{-1} \alpha_n^2\, (F^{-1})''(p)^2 \int (p - \alpha_n t)\, t\, k(t)\, j(t)\,dt + O(n^{-1}\alpha_n^2). \]
If we replace $\alpha_n$ by h and $(F^{-1})''(p)$ by $Q''(p)$, then
\[ \mathrm{Var}\{\tilde Q_n(p)\} = -n^{-1} h^4\, Q''(p)^2\, \mu_2(k)^2 + 2 n^{-1} h^2\, Q''(p)^2 \int (p - ht)\, t\, k(t)\, j(t)\,dt + O(n^{-1} h^2). \]
But the bias of $\tilde Q_n(p)$ is
\[ \mathrm{bias} = \frac{1}{2} h^2\, \mu_2(k)\, Q''(p) + o(h^2) + O(n^{-1}). \]
Hence the MSE of $\tilde Q_n(p)$ is
\[ \mathrm{MSE}\{\tilde Q_n(p)\} = \frac{h^4}{4}\, \mu_2(k)^2\, Q''(p)^2 - n^{-1} h^4\, Q''(p)^2\, \mu_2(k)^2 + 2 n^{-1} h^2\, Q''(p)^2 \int (p - ht)\, t\, k(t)\, j(t)\,dt + o(h^4) + O(n^{-1} h^2). \]
That is,
\[ \mathrm{AMSE}\{\tilde Q_n(p)\} = \Big( \frac{1}{4} - \frac{1}{n} \Big) h^4\, Q''(p)^2\, \mu_2(k)^2 + 2 n^{-1} h^2\, Q''(p)^2 \int (p - ht)\, t\, k(t)\, j(t)\,dt. \]

Figure 1: Efficiency under double exponential. Different panels correspond to different methods.

Figure 2: Efficiency under exponential. Different panels correspond to different methods.

Figure 3: Efficiency under lognormal. Different panels correspond to different methods.

Figure 4: Efficiency under standard normal. Different panels correspond to different methods.

Figure 5: Efficiency under double exponential. Different panels correspond to different sample sizes.

Figure 6: Efficiency under exponential. Different panels correspond to different sample sizes.

Figure 7: Efficiency under lognormal. Different panels correspond to different sample sizes.

Figure 8: Efficiency under standard normal. Different panels correspond to different sample sizes.
