1 Introduction
The estimation of population quantiles is of great interest when a parametric form
for the underlying distribution is not available. In addition, quantiles often arise
as the natural thing to estimate when the underlying distribution is skewed. Let
X1 , X2 , · · · , Xn be an independent and identically distributed random sample drawn
from an absolutely continuous distribution function F with density f . Let X(1) ≤
X(2) ≤ · · · ≤ X(n) denote the corresponding order statistics. The quantile function Q
of the population is defined as Q(p) = inf{x : F (x) ≥ p}, 0 < p < 1. Note that Q is
the left-continuous inverse of F . Denote, for each 0 < p < 1, the pth quantile of F
by ξp , that is, ξp = Q(p).
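As a small numerical illustration of this definition, the left-continuous inverse $Q(p) = \inf\{x : F(x) \ge p\}$ can be evaluated for a given continuous $F$ by bisection. The sketch below is ours (the function name and bracketing interval are illustrative choices, not from the text):

```python
import math

def quantile_from_cdf(F, p, lo, hi, tol=1e-10):
    """Evaluate Q(p) = inf{x : F(x) >= p} by bisection.

    Assumes F is continuous and nondecreasing on [lo, hi],
    with F(lo) < p <= F(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= p:
            hi = mid   # mid still satisfies F(mid) >= p; shrink from the right
        else:
            lo = mid
    return hi

# Example: for the exponential distribution F(x) = 1 - exp(-x),
# the quantile function is Q(p) = -log(1 - p).
q = quantile_from_cdf(lambda x: 1.0 - math.exp(-x), 0.5, 0.0, 50.0)
```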
A traditional nonparametric estimator of the distribution function is the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,x]}(X_i),$$
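The empirical distribution function is straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def ecdf(sample, x):
    """F_n(x) = (1/n) * #{i : X_i <= x}, the empirical distribution function."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean(sample <= x))

val = ecdf([1.0, 2.0, 3.0, 4.0], 2.5)
```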
The empirical $p$-th quantile is $Q_n(p) = X_{([np]+1)}$, where $[np]$ denotes the integer part of $np$. Let $p_r = r/(n+1)$ and $q_r = 1 - p_r$. If we use $X_{(r)}$ to estimate the $p_r$-th quantile, then the asymptotic bias and variance can be obtained from classical order-statistic theory.
Nadaraya (1964) showed that, under some assumptions on $k$, $f$ and $h$, $\hat Q_n(p)$ (appropriately normalized) has an asymptotic standard normal distribution. Another notable property of $\hat Q_n(p)$, namely almost sure consistency, was obtained by Yamato
(1973). Ralescu and Sun (1993) obtained necessary and sufficient conditions for the asymptotic normality of $\hat Q_n(p)$. Azzalini (1981) and an unpublished report used heuristic arguments based on second-order approximations and performed numerical comparisons of $\hat Q_n(p)$ with the classical sample quantile for estimating the 0.95 quantile of the Gamma(1) distribution. These studies provided considerable empirical evidence for the superiority of $\hat Q_n(p)$ for a variety of smooth distribution functions.
Azzalini (1981) considered the second-order properties of $\hat F_n$ under the following assumptions: (i) $h \to 0$ as $n \to \infty$; (ii) the kernel has finite support, that is, $k(t) = 0$ if $|t| > t_0$ for some positive $t_0$; (iii) the density $f$ is continuous in the interval $(x - t_0 h, x + t_0 h)$; and (iv) $f'(x)$ exists. He pointed out that the asymptotically optimal bandwidth for $\hat F$ is of the form
$$h_{\mathrm{opt}} = \left(\frac{u}{4vn}\right)^{1/3} \qquad (2)$$
where
$$u = f(x)\Big\{t_0 - \int_{-t_0}^{t_0} K^2(t)\,dt\Big\}, \qquad v = \Big\{\tfrac{1}{2}\, f'(x) \int_{-t_0}^{t_0} t^2 k(t)\,dt\Big\}^2.$$
Also, Azzalini (1981) suggested, without offering a proof, that (2) is again the asymptotically optimal choice of $h$ for $\hat Q_n(p)$. We state the result in the following theorem; its proof can be found in Shankar (1998).
We make the following assumptions:
Assumption A
Theorem 1. Under assumptions (1)-(3), the asymptotic mean squared error of $\hat Q(p)$ is
$$\mathrm{AMSE}\,\hat Q(p) = \frac{p(1-p)}{n f(\xi_p)^2} + \frac{h^4}{4}\,\frac{f'(\xi_p)^2}{f(\xi_p)^2}\,\mu_2(k)^2 - \frac{h}{n}\,\frac{\psi(k)}{f(\xi_p)},$$
and the asymptotically optimal choice of bandwidth for the smoothed empirical quantile function $\hat Q_n(p)$ is
$$h_{\mathrm{opt},1} = \left\{\frac{f(\xi_p)\,\psi(k)}{n\,\{f'(\xi_p)\}^2\,\mu_2(k)^2}\right\}^{1/3}. \qquad (3)$$
where $\mu_2(k) = \int_{-\infty}^{\infty} t^2 k(t)\,dt$ and $\psi(k) = 2\int y\,k(y)K(y)\,dy$. If we take $k$ as the standard normal density, then $\int_{-\infty}^{\infty} t\,dK^2(t) = 1/\sqrt{\pi}$, $\mu_2(k) = 1$ and
$$h_{\mathrm{opt},1} = \left[\frac{f(\xi_p)}{\sqrt{\pi}\,n\,\{f'(\xi_p)\}^2}\right]^{1/3}.$$
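For the normal-kernel case, $h_{\mathrm{opt},1}$ is a one-line computation once plug-in values for $f(\xi_p)$ and $f'(\xi_p)$ are available. The sketch below (function name ours) simply evaluates the displayed formula:

```python
import math

def h_opt1(f_val, fprime_val, n):
    """h_opt,1 = [ f(xi_p) / (sqrt(pi) * n * f'(xi_p)^2) ]^(1/3), normal kernel."""
    return (f_val / (math.sqrt(math.pi) * n * fprime_val ** 2)) ** (1.0 / 3.0)

# Example with the standard normal f at xi_p = 1, where f'(x) = -x f(x):
f1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)
h = h_opt1(f1, -1.0 * f1, n=100)
```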
It is clear that when i/n is close to p, Q̃n (p) puts more weight on the order statistics
X(i) . The asymptotic normality and mean squared consistency of Q̃n (p) were pro-
vided by Yang (1985), while Falk (1984) showed that the asymptotic performance of
Q̃n (p) is better than that of the empirical sample quantile Qn (p) in the sense of rel-
ative deficiency for appropriately chosen kernels and sufficiently smooth distribution
functions.
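The definition of $\tilde Q_n(p)$ appears earlier in the paper and is not reproduced in this excerpt. The sketch below assumes the standard kernel quantile form $\tilde Q_n(p) = \sum_i X_{(i)} \int_{(i-1)/n}^{i/n} h^{-1} k((t-p)/h)\,dt$ studied by Falk (1984) and Sheather and Marron (1990); with a Gaussian kernel the weights reduce to differences of normal CDF values:

```python
import math

def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_quantile(sample, p, h):
    """Kernel quantile estimate: order statistics weighted by the integral
    of the rescaled Gaussian kernel over ((i-1)/n, i/n]."""
    x = sorted(sample)
    n = len(x)
    est = 0.0
    for i in range(1, n + 1):
        w = _Phi((i / n - p) / h) - _Phi(((i - 1) / n - p) / h)
        est += w * x[i - 1]
    return est

# On an evenly spread sample, the estimate at p = 0.5 should sit near the median.
sample = [i / 100.0 for i in range(101)]   # values 0.00, 0.01, ..., 1.00
est = kernel_quantile(sample, p=0.5, h=0.05)
```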
Building on Falk (1984), Sheather and Marron (1990) gave the asymptotic mean squared error (AMSE) of $\tilde Q_n(p)$ as follows if $f$ is not symmetric, or $f$ is symmetric but $p \neq 0.5$:
$$\mathrm{AMSE}\,\tilde Q_n(p) = \frac{p(1-p)}{n}\,q^2(p) + \frac{h^4}{4}\,q'(p)^2\,\mu_2(k)^2 - \frac{h}{n}\,q^2(p)\,\psi(k) \qquad (5)$$
where $q = Q'$ and $q' = Q''$. If $Q''(p) \neq 0$, then
$$h_{\mathrm{opt},2} = \left\{\frac{Q'(p)^2\,\psi(k)}{n\,Q''(p)^2\,\mu_2(k)^2}\right\}^{1/3}. \qquad (6)$$
If $Q''(p) = 0$, there is no single optimal bandwidth minimizing the AMSE.
Remark 2.2. If $q = 0$, we need higher-order terms. The AMSE of $\tilde Q_n(p)$ can be shown to be
$$\mathrm{AMSE}\,\tilde Q_n(p) = \Big(\frac{1}{4} - \frac{1}{n}\Big) h^4\, Q''(q)^2\, \mu_2^2(k) + 2 n^{-1} h^2\, Q''(q)^2 \int (q - ht)\, t\, k(t)\, j(t)\, dt$$
where $j(t) = \int_{-\infty}^{t} x k(x)\,dx$. The proof is provided in the Appendix.
where $g_n$ is the smoothing parameter (Wand and Jones, 1995). Therefore, the asymptotic mean squared error properties of $\hat f^{(r)}_{g_n}(x)$ can be derived straightforwardly to obtain (Wand and Jones, 1995)
$$\mathrm{AMSE}\{\hat f^{(r)}_{g_n}(x)\} = \frac{1}{n g_n^{2r+1}}\, R(k^{(r)})\, f(x) + \frac{g_n^4}{4}\, \{\mu_2(k)\}^2\, f^{(r+2)}(x)^2 \qquad (9)$$
where $R(\eta) = \int \eta^2(x)\,dx$ for any square-integrable function $\eta$. It follows that the AMSE-optimal bandwidth for estimating $f^{(r)}(x)$ is of order $n^{-1/(2r+5)}$. The asymptotically optimal bandwidth for $\hat f_{g_n}(x)$ is given by
$$g_n^{*} = \left\{\frac{R(k)\, f(x)}{n\, (\mu_2(k))^2\, f''(x)^2}\right\}^{1/5}, \qquad (10)$$
and the asymptotically optimal bandwidth for $\hat f'_{g_n}(x)$ is
$$g_n^{**} = \left\{\frac{3 R(k')\, f(x)}{n\, (\mu_2(k))^2\, f'''(x)^2}\right\}^{1/7}. \qquad (11)$$
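Equations (10) and (11) are direct to evaluate for plug-in values. A sketch (function names ours), using the normal-kernel constants $R(k) = 1/(2\sqrt{\pi})$, $R(k') = 1/(4\sqrt{\pi})$ and $\mu_2(k) = 1$ computed in Section 5:

```python
import math

SQRT_PI = math.sqrt(math.pi)

def g_star(f_val, f2_val, n, Rk=1.0 / (2.0 * SQRT_PI), mu2=1.0):
    """Equation (10): AMSE-optimal bandwidth for estimating f(x)."""
    return (Rk * f_val / (n * mu2 ** 2 * f2_val ** 2)) ** 0.2

def g_2star(f_val, f3_val, n, Rkp=1.0 / (4.0 * SQRT_PI), mu2=1.0):
    """Equation (11): AMSE-optimal bandwidth for estimating f'(x)."""
    return (3.0 * Rkp * f_val / (n * mu2 ** 2 * f3_val ** 2)) ** (1.0 / 7.0)
```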
$$\mathrm{AMSE}\{\tilde q(p)\} = \frac{a^4}{4}\, q''(p)^2\, \mu_2(k)^2 + \frac{1}{na}\, q^2(p) \int k^2(y)\,dy. \qquad (13)$$
Minimizing (13) with respect to $a$, we obtain the asymptotically optimal bandwidth for $\tilde q(p)$ as
$$a^{*}_{\mathrm{opt}} = \left\{\frac{Q'(p)^2 \int k^2(y)\,dy}{n\, Q'''(p)^2\, \mu_2(k)^2}\right\}^{1/5}. \qquad (14)$$
To estimate $Q'' = q'$ in (6), note that
$$\tilde Q''_n(p) = \frac{d}{dp}\tilde Q'_n(p) = \frac{1}{na^2} \sum_{i=1}^{n} X_{(i)} \Big\{ k'\Big(\frac{p - \frac{i-1}{n}}{a}\Big) - k'\Big(\frac{p - \frac{i}{n}}{a}\Big) \Big\}. \qquad (15)$$
4 Bandwidth selection
In this section, we consider several data-based methods to find the asymptotically
optimal bandwidths for the estimators $\hat Q_n(p)$ and $\tilde Q_n(p)$. Bandwidth plays a critical role in practical implementation: it determines the trade-off between the amount of smoothing and the closeness of the estimate to the true distribution (see Wand and Jones, 1995).
is the empirical $p$-th quantile $Q_n(p)$. Using $g_n^{*}$ in (10) with $f(\hat\xi_p)$ and $f''(\hat\xi_p)$ replaced by their Normal$(\mu, \sigma^2)$ reference values, we obtain $\hat f_{g_n^{*}}(x)$. Using $g_n^{**}$ in (11) with $f(\hat\xi_p)$ and $f'''(\hat\xi_p)$ replaced by their Normal$(\mu, \sigma^2)$ reference values, we obtain $\hat f'_{g_n^{**}}(x)$.
Remark 4.1. In the expression for $h_{\mathrm{opt},1}$, the derivative of $f$ appears in the denominator. If $f'$ has zeros, then estimates of $f'$ near those zeros are also very small, so the estimator $\hat h_{\mathrm{opt},1}$ of $h_{\mathrm{opt},1}$ is very unstable there. For example, if $f$ is standard normal, then $f'(x) = -x f(x)$ has a zero at $x = 0$, which corresponds to $p = 0.5$; hence, when $p = 0.5$, the estimator $\hat h_{\mathrm{opt},1}$ is very unstable. Similarly, the first derivative of the double exponential density has a zero at $x = 0$, and the first derivative of the lognormal density has a zero at $x = e^{-1}$.
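The lognormal claim can be checked numerically: since $f'(x) = -f(x)(1 + \log x)/x$ for the lognormal density (derived in Section 5, Case 3), the derivative vanishes exactly where $\log x = -1$. A finite-difference check (step size is our choice):

```python
import math

def f_lognormal(x):
    """Lognormal(0, 1) density."""
    return math.exp(-0.5 * math.log(x) ** 2) / (math.sqrt(2.0 * math.pi) * x)

def fprime_fd(x, eps=1e-6):
    """Central finite-difference approximation to f'(x)."""
    return (f_lognormal(x + eps) - f_lognormal(x - eps)) / (2.0 * eps)

slope_at_zero = fprime_fd(math.exp(-1.0))   # should be ~ 0
slope_elsewhere = fprime_fd(2.0)            # should be clearly negative
```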
4.2 Method 2. Approximate hopt,2 for Q̃n (p) using quantile
derivative estimators
The asymptotically optimal bandwidth $h_{\mathrm{opt},2}$, given in (6), for $\tilde Q_n(p)$ involves the unknown quantities $Q'(p)$ and $Q''(p)$, which can be estimated by $\tilde Q'_n(p)$ and $\tilde Q''_n(p)$ in (12) and (15), respectively. The asymptotically optimal bandwidths $a^{*}_{\mathrm{opt}}$ and $a^{**}_{\mathrm{opt}}$, given in (14) and (16), for $\tilde Q'_n(p)$ and $\tilde Q''_n(p)$ depend on $Q'(p)$, $Q'''(p)$ and $Q^{(4)}(p)$. We replace these unknowns by their Normal$(\mu, \sigma^2)$ reference values. Then, using $\tilde Q'_n(p)$ with $a = a^{*}_{\mathrm{opt}}$ and $\tilde Q''_n(p)$ with $a = a^{**}_{\mathrm{opt}}$, we have the following data-based bandwidth
$$\hat h_{\mathrm{opt},2} = \left\{\frac{\tilde Q'_n(p)^2\, \psi(k)}{n\, \tilde Q''_n(p)^2\, \mu_2(k)^2}\right\}^{1/3} \qquad (19)$$
for Q̃n (p).
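Computationally, (19) is a one-liner once the two derivative estimates are in hand; with the normal kernel ($\psi(k) = 1/\sqrt{\pi}$, $\mu_2(k) = 1$, from Section 5) it reads (function name ours):

```python
import math

def h_opt2_hat(q1, q2, n):
    """Equation (19) with a Gaussian kernel:
    h = { q1^2 * (1/sqrt(pi)) / (n * q2^2) }^(1/3),
    where q1 and q2 estimate Q'(p) and Q''(p)."""
    psi = 1.0 / math.sqrt(math.pi)
    return (q1 ** 2 * psi / (n * q2 ** 2)) ** (1.0 / 3.0)

h = h_opt2_hat(q1=2.0, q2=4.0, n=200)
```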
reference values, and then use $\tilde Q'_n(p)$ with $a = a^{*}_{\mathrm{opt}}$ and $\tilde Q''_n(p)$ with $a = a^{**}_{\mathrm{opt}}$ to get
$$\bar h_{\mathrm{opt},2} = \left\{\frac{\tilde Q'_n(p)^5\, \psi(k)}{n\, \tilde Q''_n(p)^2\, \mu_2(k)^2}\right\}^{1/3}. \qquad (22)$$
If we take $k$ as the standard normal density, then
$$\bar h_{\mathrm{opt},2} = \left\{\frac{\tilde Q'_n(p)^5}{\sqrt{\pi}\, n\, \tilde Q''_n(p)^2}\right\}^{1/3}.$$
4.4 Method 4. Approximate hopt,2 for Q̃n (p) using density
derivative estimators
From (20) and (21), we have
$$h_{\mathrm{opt},2} = \left\{\frac{f(\xi_p)^4\, \psi(k)}{n\, f'(\xi_p)^2\, \mu_2(k)^2}\right\}^{1/3}. \qquad (23)$$
Then, plugging in the estimators of $f(\xi_p)$ and $f'(\xi_p)$ from Method 1 (see (17)), we obtain
$$\bar h_{\mathrm{opt},2} = \left\{\frac{\hat f_{g_n^{*}}(\hat\xi_p)^4\, \psi(k)}{n\, \hat f'_{g_n^{**}}(\hat\xi_p)^2\, \mu_2(k)^2}\right\}^{1/3}. \qquad (24)$$
5 Numerical Performance
We implement the methods in Section 4. Four distributions are selected: Exponential, Double Exponential, Lognormal, and standard Normal. We use the standard normal density as the kernel $k$, i.e. $k(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$. Then $k'(x) = -x\,k(x)$, and we can find
$$\mu_2(k) = \int x^2 k(x)\,dx = 1,$$
$$\psi(k) = 2\int x\, k(x)\, K(x)\,dx = \frac{1}{\sqrt{\pi}},$$
$$R(k) = \int k^2(x)\,dx = \frac{1}{2\sqrt{\pi}},$$
$$R(k') = \int \{k'(x)\}^2\,dx = \int x^2 k^2(x)\,dx = \frac{1}{4\sqrt{\pi}}.$$
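These four constants are easy to confirm by numerical quadrature (the grid and tolerances below are our choices):

```python
import math

# Trapezoidal quadrature on a wide grid; k is the standard normal density.
N, A = 100000, 10.0
dx = 2.0 * A / N
xs = [-A + dx * i for i in range(N + 1)]
k = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in xs]
K = [0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) for x in xs]  # normal CDF

def trapz(vals):
    return dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

mu2 = trapz([x * x * kx for x, kx in zip(xs, k)])                  # expect 1
psi = trapz([2.0 * x * kx * Kx for x, kx, Kx in zip(xs, k, K)])    # expect 1/sqrt(pi)
Rk  = trapz([kx * kx for kx in k])                                 # expect 1/(2 sqrt(pi))
Rkp = trapz([x * x * kx * kx for x, kx in zip(xs, k)])             # expect 1/(4 sqrt(pi))
```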
Hence, for the exponential density $f(x) = e^{-x}$, $x > 0$,
$$g_n^{*} = \left\{\frac{e^{x}}{n\sqrt{\pi}}\right\}^{1/5}, \qquad g_n^{**} = \left\{\frac{3 e^{x}}{4 n\sqrt{\pi}}\right\}^{1/7},$$
$$\mathrm{AMSE}\,\hat Q_n(p) = \frac{p(1-p)}{n}\, e^{2x} - \frac{h}{n\sqrt{\pi}}\, e^{x} + \frac{h^4}{4},$$
$$a^{*} = \left\{\frac{e^{-4x}}{8\sqrt{\pi}\, n}\right\}^{1/5}, \qquad a^{**} = \left\{\frac{3 e^{-6x}}{144\sqrt{\pi}\, n}\right\}^{1/7},$$
$$\mathrm{AMSE}\,\tilde Q_n(p) = \frac{p(1-p)}{n}\, e^{2x} + \frac{h^4}{4}\, e^{4x} - \frac{h}{n\sqrt{\pi}}\, e^{2x}.$$
Case 3. $f$ is the lognormal density. We have
$$f(x) = \frac{1}{\sqrt{2\pi}\,x}\, e^{-\frac{\log^2 x}{2}},$$
$$f'(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\log^2 x}{2}} \Big(-\frac{1}{x^2} - \frac{\log x}{x^2}\Big) = -\frac{f(x)}{x}\,(1 + \log x),$$
$$f''(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\log^2 x}{2}}\, \frac{1}{x^3}\,(1 + 3\log x + \log^2 x) = \frac{f(x)}{x^2}\,(1 + 3\log x + \log^2 x),$$
$$f'''(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\log^2 x}{2}}\, \frac{1}{x^4}\,(-8\log x - 6\log^2 x - \log^3 x) = -\frac{f(x)}{x^3}\,(8\log x + 6\log^2 x + \log^3 x).$$
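The closed-form second derivative above can be sanity-checked against a finite difference (step size is our choice):

```python
import math

def f(x):
    """Lognormal(0, 1) density."""
    return math.exp(-0.5 * math.log(x) ** 2) / (math.sqrt(2.0 * math.pi) * x)

def f2_closed(x):
    """Closed form: f''(x) = f(x) (1 + 3 log x + log^2 x) / x^2."""
    L = math.log(x)
    return f(x) * (1.0 + 3.0 * L + L * L) / (x * x)

def f2_fd(x, eps=1e-3):
    """Second central difference approximation to f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / (eps * eps)
```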
Hence
$$g_n^{*} = \left\{\frac{x^4}{n\sqrt{\pi}\,(1 + 3\log x + \log^2 x)^2\, f(x)}\right\}^{1/5}, \qquad g_n^{**} = \left\{\frac{3 x^6}{4 n\sqrt{\pi}\,(8\log x + 6\log^2 x + \log^3 x)^2\, f(x)}\right\}^{1/7},$$
$$\mathrm{AMSE}\,\hat Q_n(p) = \frac{2\pi p(1-p)}{n}\, x^2 e^{\log^2 x} - \frac{\sqrt{2}\, h}{n}\, x\, e^{\frac{\log^2 x}{2}} + \frac{h^4}{4}\, \frac{(1 + \log x)^2}{x^2},$$
$$a^{*} = \frac{1}{\sqrt{2\pi}} \left\{\frac{e^{-2\log^2 x}}{n\sqrt{2}\,(2 + 3\log x + 2\log^2 x)^2}\right\}^{1/5},$$
$$a^{**} = \frac{1}{\sqrt{2\pi}} \left\{\frac{3 e^{-3\log^2 x}}{2n\sqrt{2}\,(5 + 13\log x + 11\log^2 x + 6\log^3 x)^2}\right\}^{1/7},$$
$$\mathrm{AMSE}\,\tilde Q_n(p) = \pi^2 h^4 x^2 (1 + \log x)^2 e^{2\log^2 x} + \frac{2\pi x^2 e^{\log^2 x}}{n}\Big\{p(1-p) - \frac{h}{\sqrt{\pi}}\Big\}.$$
Case 4. $f$ is the double exponential density. We have $f(x) = \frac{1}{2} e^{-|x|} = f''(x)$ except at $x = 0$, and
$$f'(x) = \begin{cases} -\frac{1}{2} e^{-x}, & x > 0 \\ \frac{1}{2} e^{x}, & x < 0 \end{cases} \;=\; -\frac{1}{2}\,\mathrm{sign}(x)\, e^{-|x|} = f'''(x).$$
Hence
$$g_n^{*} = \left\{\frac{2 e^{|x|}}{n\sqrt{\pi}}\right\}^{1/5}, \qquad g_n^{**} = \left\{\frac{3 e^{|x|}}{2 n\sqrt{\pi}}\right\}^{1/7},$$
$$\mathrm{AMSE}\,\hat Q_n(p) = \frac{4 p(1-p)}{n}\, e^{2|x|} - \frac{2h}{n\sqrt{\pi}}\, e^{|x|} + \frac{h^4}{4},$$
$$a^{*} = \left\{\frac{e^{-4|x|}}{2^{7}\sqrt{\pi}\, n}\right\}^{1/5}, \qquad a^{**} = \left\{\frac{e^{-6|x|}}{3 \cdot 2^{10}\sqrt{\pi}\, n}\right\}^{1/7},$$
$$\mathrm{AMSE}\,\tilde Q_n(p) = 4 h^4 e^{4|x|} + \frac{4 p(1-p)}{n}\, e^{2|x|} - \frac{4h}{n\sqrt{\pi}}\, e^{2|x|}.$$
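The identity $f''(x) = f(x)$ for the double exponential density (away from $x = 0$), which underlies these formulas, can likewise be checked by a finite difference:

```python
import math

def f_de(x):
    """Double exponential (Laplace) density."""
    return 0.5 * math.exp(-abs(x))

def f2_fd(x, eps=1e-3):
    """Second central difference approximation to f''(x); valid away from x = 0."""
    return (f_de(x + eps) - 2.0 * f_de(x) + f_de(x - eps)) / (eps * eps)

diff = f2_fd(1.5) - f_de(1.5)   # should be ~ 0 since f'' = f for x != 0
```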
methods are under 1 for all $p$ values. Unfortunately, no method works better than all the other methods for all distributions and all sample sizes. In Figure 8, for example, Method 2 sometimes works better than the others, but sometimes worse. From that figure it seems that Method 1 is always more efficient than Method 3; but in Figure 6, Method 3 is more efficient than Method 1 for many $p$ values at each sample size. We can also see from Figures 5-8 that the plots for Method 1 (respectively, Method 2) are similar to those for Method 3 (respectively, Method 4). This is not coincidental, because we use the same formula to compute their asymptotic MSEs. From Figures 1-4, we observe that another common behavior of Method 2 and Method 4 is that they perform badly near the boundaries, i.e. when $p$ is close to 0 or 1.
In summary, the kernel quantile estimators, regardless of which bandwidth selection method is used, are more efficient than the empirical quantile estimator in most situations. When the sample size $n$ is relatively small, say $n = 50$, they are significantly more efficient than the empirical quantile estimator. But no single method is the most efficient in all situations.
References
[1] Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68, 326-328.
[4] Nadaraya, E. A. (1964). Some new estimates for distribution functions. Theory Probab. Appl., 9, 497-500.
[6] Read, R. R. (1972). The asymptotic inadmissibility of the sample distribution function. Ann. Math. Statist., 43, 89-95.
[7] Ralescu, S. S. and Sun, S. (1993). Necessary and sufficient conditions for the asymptotic normality of perturbed sample quantiles. J. Statist. Plann. Inference, 35, 55-64.
[10] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.
Appendix
We now provide the proof of the AMSE in Remark 2.2. Here we follow the notation of Falk (1984). Since $F^{-1\prime}(q) = Q'(q) = 0$, we have
$$\begin{aligned}
\mathrm{Var}\,\tilde Q_n(p)
&= n^{-1}\int_0^1\Big\{\int k(x)\big(q-\alpha_n x-1_{(0,q-\alpha_n x)}(y)\big)\,F^{-1\prime}(q-\alpha_n x)\,dx\Big\}^2 dy\\
&= n^{-1}\int_0^1\Big\{\int k(x)\big(q-\alpha_n x-1_{(0,q-\alpha_n x)}(y)\big)\big[F^{-1\prime}(q)-\alpha_n x F^{-1\prime\prime}(q)+O(\alpha_n^2)\big]dx\Big\}^2 dy\\
&= n^{-1}\int_0^1\Big\{\int k(x)\big(q-\alpha_n x-1_{(0,q-\alpha_n x)}(y)\big)\big(-\alpha_n x F^{-1\prime\prime}(q)\big)dx\Big\}^2 dy+O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big\{\int k(x)\big(q-\alpha_n x-1_{(0,q-\alpha_n x)}(y)\big)\,x\,dx\Big\}^2 dy+O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big\{q\int xk(x)\,dx-\alpha_n\int x^2k(x)\,dx-\int xk(x)1_{(0,q-\alpha_n x)}(y)\,dx\Big\}^2 dy+O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big[\alpha_n\mu_2(k)+\int xk(x)1_{(0,q-\alpha_n x)}(y)\,dx\Big]^2 dy+O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big\{\alpha_n^2\mu_2^2(k)+2\alpha_n\mu_2(k)\int xk(x)1_{(0,q-\alpha_n x)}(y)\,dx+\Big[\int xk(x)1_{(0,q-\alpha_n x)}(y)\,dx\Big]^2\Big\}dy+O(n^{-1}\alpha_n^2)\\
&= b\,\alpha_n^2\mu_2^2(k)+2b\,\alpha_n\mu_2(k)\,S_1+b\,S_2+O(n^{-1}\alpha_n^2)
\end{aligned}$$
where $b = n^{-1}\alpha_n^2 F^{-1\prime\prime}(q)^2$. But
$$\begin{aligned}
S_1&=\int_0^1\int xk(x)1_{(0,q-\alpha_n x)}(y)\,dx\,dy=\int xk(x)\int_0^1 1_{(0,q-\alpha_n x)}(y)\,dy\,dx\\
&=\int xk(x)(q-\alpha_n x)\,dx=q\int xk(x)\,dx-\alpha_n\int x^2k(x)\,dx=-\alpha_n\mu_2(k)
\end{aligned}$$
and
$$\begin{aligned}
S_2&=\int_0^1\Big[\int xk(x)1_{(0,q-\alpha_n x)}(y)\,dx\Big]^2dy=\int_0^1\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]^2dy\\
&=\Big\{y\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]^2\Big\}\Big|_0^1-\int_0^1 y\,d\Big\{\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]^2\Big\}\\
&=-2\int_0^1 y\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]\,\frac{q-y}{\alpha_n}\,k\Big(\frac{q-y}{\alpha_n}\Big)\Big(-\frac{1}{\alpha_n}\Big)dy\\
&=\frac{2}{\alpha_n}\int_0^1 y\,\frac{q-y}{\alpha_n}\,k\Big(\frac{q-y}{\alpha_n}\Big)\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]dy\\
&=\frac{2}{\alpha_n}\int_{q/\alpha_n}^{(q-1)/\alpha_n}(q-\alpha_n t)\,t\,k(t)\Big[\int_{(q-1)/\alpha_n}^{t}xk(x)\,dx\Big](-\alpha_n)\,dt\\
&=2\int_{(q-1)/\alpha_n}^{q/\alpha_n}(q-\alpha_n t)\,t\,k(t)\Big[\int_{(q-1)/\alpha_n}^{t}xk(x)\,dx\Big]dt\\
&=2\int_{(q-1)/\alpha_n}^{q/\alpha_n}(q-\alpha_n t)\,t\,k(t)\,j(t)\,dt
\end{aligned}$$
where $j(t)=\int_{-c}^{t}xk(x)\,dx$ and $c$ is such that $k$ is finitely supported in $[-c,c]$. Then
$$\mathrm{Var}\,\tilde Q_n(p)=-n^{-1}\alpha_n^4 F^{-1\prime\prime}(q)^2\mu_2^2(k)+2n^{-1}\alpha_n^2 F^{-1\prime\prime}(q)^2\int(q-\alpha_n t)\,t\,k(t)\,j(t)\,dt+O(n^{-1}\alpha_n^2).$$
If we replace $\alpha_n$ by $h$ and $F^{-1\prime\prime}(q)$ by $Q''(q)$, then
$$\mathrm{Var}\,\tilde Q_n(p)=-n^{-1}h^4 Q''(q)^2\mu_2^2(k)+2n^{-1}h^2 Q''(q)^2\int(q-ht)\,t\,k(t)\,j(t)\,dt+O(n^{-1}h^2),$$
$$\mathrm{MSE}\,\tilde Q_n(p)=\frac{h^4}{4}\mu_2^2(k)Q''(q)^2-n^{-1}h^4 Q''(q)^2\mu_2^2(k)+2n^{-1}h^2 Q''(q)^2\int(q-ht)\,t\,k(t)\,j(t)\,dt+O(h^4)+O(n^{-1}h^2).$$
That is,
$$\mathrm{AMSE}\,\tilde Q_n(p)=\Big(\frac{1}{4}-\frac{1}{n}\Big)h^4 Q''(q)^2\mu_2^2(k)+2n^{-1}h^2 Q''(q)^2\int(q-ht)\,t\,k(t)\,j(t)\,dt.$$
Figure 1: Efficiency under double exponential. Different panels correspond to different methods.
Figure 2: Efficiency under exponential. Different panels correspond to different methods.
Figure 3: Efficiency under Log Normal. Different panels correspond to different methods.
Figure 4: Efficiency under standard Normal. Different panels correspond to different methods.
Figure 5: Efficiency under double exponential. Different panels correspond to different sample sizes.
Figure 6: Efficiency under exponential. Different panels correspond to different sample sizes.
Figure 7: Efficiency under Log Normal. Different panels correspond to different sample sizes.
Figure 8: Efficiency under standard Normal. Different panels correspond to different sample sizes.