Nonparametric Regression Expectiles
Biao Zhang
Journal of Nonparametric Statistics, 3:3-4 (1994), 255-275. DOI: 10.1080/10485259408832586
It is well known that a standard nonparametric regression analysis models the average behavior of the dependent variable Y given the explanatory variable x. Such an approach may not always be appropriate, however, when one is interested in the extreme behavior of Y conditional on x. This paper considers the problem of estimating the expectile function of the conditional distribution of Y given x based on observational data generated according to a nonparametric regression model. We propose a kernel-type nonparametric regression estimator, called the nonparametric regression expectile, based on an asymmetric squared loss function. This estimator models not only the average behavior but also the extreme behavior of Y given x in the nonparametric regression setting. An iterative algorithm is presented to calculate the estimator. It is shown that the nonparametric regression expectile is consistent and asymptotically normally distributed. We also derive a lower bound for the asymptotic variance, as well as the asymptotic expression for the mean square error and the optimal bandwidth. A simulation study is given to demonstrate the utility of the nonparametric regression expectile for understanding nonparametric regression data.
1. INTRODUCTION
Suppose that the n observable data {(x_1, Y_1), ..., (x_n, Y_n)} are generated according to the following nonparametric regression model:

$$Y_i = g(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

where g(·) is an unknown regression function on [0, 1].
*This research was supported by NSF Grant DMS 89-02667. Computations were performed using
computer facilities supported in part by the National Science Foundation Grants DMS 86-01732, DMS
87-03942 and DMS 89-05292 awarded to the Department of Statistics at The University of Chicago,
and by The University of Chicago Block Fund.
The regression function g(x) is traditionally estimated by the kernel estimator

$$\hat g_n(x) = \sum_{i=1}^{n} a_i(x)\, Y_i \tag{1.2}$$

with weights

$$a_i(x) = \frac{1}{h_n} \int_{s_{i-1}}^{s_i} K\!\left(\frac{x - u}{h_n}\right) du, \tag{1.3}$$

where K(·) is a kernel function having finite support on [−1, 1] with a maximum at zero, {h_n} is a sequence of positive bandwidths tending to zero as the sample size n tends to infinity, and {s_i}_{i=0}^{n} is a sequence of interpolating points such that x_i ≤ s_i ≤ x_{i+1}, i = 1, ..., n − 1, and s_0 = 0, s_n = 1 (Gasser and Müller 1979).
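To make the construction concrete, here is a short Python sketch of the weights (1.3) and the estimator (1.2). The quadratic kernel and the numerical evaluation of the integral are illustrative assumptions; the paper itself prescribes only the general conditions above.

```python
import numpy as np

def epanechnikov(u):
    """A quadratic kernel supported on [-1, 1] with a maximum at zero."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def gasser_muller_weights(x, xs, h, kernel=epanechnikov):
    """Weights a_i(x) = (1/h) * integral of K((x - u)/h) over [s_{i-1}, s_i],
    with interpolating points s_i taken here as midpoints of consecutive
    design points, s_0 = 0, s_n = 1; the integral is approximated on a grid."""
    n = len(xs)
    s = np.concatenate(([0.0], (xs[:-1] + xs[1:]) / 2.0, [1.0]))
    a = np.empty(n)
    for i in range(n):
        grid = np.linspace(s[i], s[i + 1], 64)   # fine grid on [s_{i-1}, s_i]
        a[i] = kernel((x - grid) / h).mean() * (s[i + 1] - s[i]) / h
    return a

def kernel_estimate(x, xs, ys, h):
    """The kernel estimator (1.2): a weighted average of the responses."""
    return float(np.dot(gasser_muller_weights(x, xs, h), ys))
```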
If the kernel function K(·) is smooth, then the kernel estimator (1.2) is asymptotically unbiased. Thus, given data (x_i, Y_i), i = 1, ..., n, which form a cloud of points in the Euclidean space R², the kernel estimator ĝ_n(x) of the regression function g(x), for large n, describes the middle of the point cloud in the y direction, as a function of x. But such an estimator may not always be appropriate for modeling the relationship between Y and x when one is interested in the extreme behavior of Y conditional on x. For example, in order to know whether an action concerning Y will have an effect on extreme values of Y, one may want to capture the local behavior of the data either in the center or in the tails, and thus one is interested in the higher or lower parts of the point cloud as well as its middle. The objective of this paper is to present an approach that models not only the average behavior but also the extreme behavior of Y given x in the nonparametric regression setting.
For the linear regression model, Koenker and Bassett (1978, 1982) defined (linear) regression quantiles using asymmetric absolute loss functions. Breckling and Chambers (1988) considered asymmetric M-estimators. Newey and Powell (1987) and Efron (1991) used asymmetric squared loss functions to define what they call expectiles and regression percentiles. In this paper, under the nonparametric regression setting, we introduce a new class of kernel-type nonparametric regression estimators, called nonparametric regression expectiles, using asymmetric squared loss functions. These estimators are used to estimate the expectile function (see Subsection 2.1) of the conditional distribution of Y given x. The proposed approach also generalizes the bounded and symmetric loss function in the M-type kernel estimator to an unbounded and asymmetric loss function, and is useful in understanding nonparametric regression data.
In Section 2, we define nonparametric regression expectiles using asymmetric
squared loss functions. We also propose an iterative algorithm to calculate the
nonparametric regression expectiles. Section 3 establishes large sample properties
of nonparametric regression expectiles. In Section 4, we derive the asymptotic
expression for the mean square error of the nonparametric regression expectile,
including the asymptotically optimal bandwidth. Last, in Section 5, we present a
simulation study to demonstrate the utility of nonparametric regression expectiles
for understanding nonparametric regression data. Proofs of lemmas and theorems
appear in the Appendix.
2. NONPARAMETRIC REGRESSION EXPECTILES

2.1. The Expectile Function

Consider the nonparametric regression model (1.1), where the error ε has mean 0, variance σ², and continuous density function f(·). Let f(y | x) be the conditional density function of Y given x; then f(y | x) = f(y − g(x)). For 0 < p < 1, the pth quantile function q_p(x) of this conditional distribution satisfies

$$\int_{-\infty}^{q_p(x)} f(y \mid x)\, dy = p.$$

Depending on f(· | x), this quantile function describes the lower or upper tail of the conditional distribution. Alternatively, define the asymmetric squared loss function

$$\rho_p(r) = |p - I(r < 0)|\, r^2,$$

where 0 < p < 1; then ρ_p(·) is an asymmetric squared loss function, and ρ_p(r) reduces to the symmetric squared loss function when p = 0.5. Furthermore, it can be shown that the expectile function g_p(x) is the unique global minimizer of

$$E[\rho_p(Y - \theta) \mid x] = \min! \tag{2.7}$$

with respect to θ, where 0 ≤ x ≤ 1. This is an equivalent way to define the expectile function g_p(x).
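As a quick illustration of (2.7), the following Python snippet computes the pth expectile of a sample by solving the first-order condition of the asymmetric squared loss; the loss convention (weight p on nonnegative residuals, 1 − p on negative ones) is the Newey-Powell form assumed above.

```python
import numpy as np

def expectile(y, p, tol=1e-10, max_iter=200):
    """Minimize sum of rho_p(y_i - theta) over theta.  The first-order
    condition sum w_i * (y_i - theta) = 0, with w_i = p for y_i >= theta
    and w_i = 1 - p otherwise, is solved by a weighted-mean fixed point."""
    theta = float(np.mean(y))            # the p = 0.5 expectile is the mean
    for _ in range(max_iter):
        w = np.where(y < theta, 1.0 - p, p)
        new = float(np.sum(w * y) / np.sum(w))
        if abs(new - theta) < tol:
            break
        theta = new
    return theta

y = np.random.default_rng(0).normal(size=1000)
print(expectile(y, 0.5))    # close to the sample mean
print(expectile(y, 0.75))   # larger: reflects the upper tail
```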
To motivate the estimator, note that the kernel estimator (1.2) is any minimizer of

$$\sum_{i=1}^{n} a_i(x)\, \rho_{1/2}(Y_i - \theta) = \min! \tag{2.9}$$

with respect to θ, provided that Σ_{i=1}^n a_i(x) = 1, which is true for large n such that [(x − 1)/h_n, x/h_n] ⊇ [−1, 1]. For 0 < p < 1, (2.2), (2.7) and (2.9) motivate us to estimate g_p(x) by what is called the nonparametric regression expectile, denoted by ĝ_np(x) and defined as any minimizer of

$$\sum_{i=1}^{n} a_i(x)\, \rho_p(Y_i - \theta) = \min! \tag{2.10}$$

with respect to θ, where 0 ≤ x ≤ 1. Notice that ĝ_np(x) is the kernel estimator (1.2) when p = 1/2. Let

$$S_n(\theta \mid x, p) = \sum_{i=1}^{n} a_i(x)\, \rho_p(Y_i - \theta),$$

so that ĝ_np(x) is any minimizer of S_n(θ | x, p).
However, the kernel estimator (1.2) is not robust in the distribution sense. Therefore, Härdle and Gasser (1984), among others, proposed the M-type kernel estimator ĝ_n^M(x) as the estimator of the regression function g(x), which includes some robust alternatives to the estimator (1.2) and is defined as any minimizer of

$$\sum_{i=1}^{n} a_i(x)\, \rho(Y_i - \theta) = \min!$$

with respect to θ, for a suitable loss function ρ.
Iterative methods are needed to solve the fixed point equation (2.15). Starting from an initial approximation ĝ_np^(0)(x) to ĝ_np(x), we successively calculate

$$\hat g_{np}^{(k+1)}(x) = T_p\bigl(\hat g_{np}^{(k)}\bigr)(x) \tag{2.16}$$

for k = 0, 1, .... In the following, we will show that, under the condition that we restrict ourselves to solving the fixed point equation (2.15) within the class of continuous functions, i.e., θ = θ(x) ∈ C[0, 1], the ĝ_np^(k)(x)'s calculated according to (2.16) will always converge to the desired nonparametric regression expectile ĝ_np(x). In the meantime, we will also give the rate of convergence of the iteration (2.16); a computational sketch of the scheme follows below.
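A minimal sketch of the iteration, assuming the update implied by the first-order condition of (2.10) (our reading of (2.16), whose display is not reproduced above): each step is a weighted average of the responses with weights a_i(x)|p − I(Y_i < θ)|. It reuses gasser_muller_weights from the sketch in Section 1.

```python
import numpy as np

def regression_expectile(x, xs, ys, h, p, tol=1e-8, max_iter=200):
    """Nonparametric regression expectile at x: minimize
    sum_i a_i(x) * rho_p(y_i - theta) over theta by fixed-point iteration."""
    a = gasser_muller_weights(x, xs, h)   # the weights (1.3)
    theta = float(np.dot(a, ys))          # start from the kernel estimate (1.2)
    for _ in range(max_iter):
        w = a * np.where(ys < theta, 1.0 - p, p)
        new = float(np.sum(w * ys) / np.sum(w))
        if abs(new - theta) < tol:
            break
        theta = new
    return theta
```

By Theorem 2.1 below, the iterates contract at a geometric rate, so a tolerance-based stopping rule of this kind terminates after O(log(1/η)) steps.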
For θ ∈ C[0, 1], let the norm ‖·‖ be defined by ‖θ‖ = max_{x∈[0,1]} |θ(x)|; then we have

THEOREM 2.1. Suppose that (A3) (see Section 3) holds and that ĝ_np, ĝ_np^(0) ∈ C[0, 1]; then

$$\bigl\| \hat g_{np}^{(k+1)} - \hat g_{np} \bigr\| < \bigl\| \hat g_{np}^{(k)} - \hat g_{np} \bigr\|. \tag{2.17}$$

Furthermore, there exists a constant α_p (0 < α_p < 1) such that

$$\bigl\| \hat g_{np}^{(k+1)} - \hat g_{np} \bigr\| \le \alpha_p \bigl\| \hat g_{np}^{(k)} - \hat g_{np} \bigr\|. \tag{2.18}$$
Theorem 2.1 shows that the distance from the true nonparametric regression expectile is strictly decreasing along the iteration. Furthermore, for any starting value, the iteration (2.16) converges linearly to ĝ_np(x); that is, the deviation in terms of the norm ‖·‖ decreases at a geometric rate.
Next we present an algorithm showing how to use the iteration (2.16) to find the true nonparametric regression expectile ĝ_np(x). Let ĝ_np^(0)(x) be any starting value for each x, say ĝ_np^(0)(x) = 0 for all x ∈ [0, 1], and let ĝ_np^(k)(x) be given according to (2.16) for k ≥ 1; then it can be shown by induction that the bound (2.19) holds for k ≥ 1. Thus, for m ≥ 1, it follows from (2.19) and the triangle inequality that successive iterates differ by at most a geometrically decaying multiple of M_n = max_{x∈[0,1]} Σ_{i=1}^n a_i(x)|Y_i|. For given p, n and positive η, if we choose the smallest integer k_0 for which this geometric bound falls below η, then ĝ_np^(k_0)(x) is within η of ĝ_np(x) uniformly in x.
THEOREM 2.2. Let g_p, g_p^(0) ∈ C[0, 1], and define g_p^(k+1) = U_p(g_p^(k)) for k ≥ 1; then

$$\bigl\| g_p^{(k+1)} - g_p \bigr\| < \bigl\| g_p^{(k)} - g_p \bigr\| \tag{2.26}$$

and there exists a constant β_p (0 < β_p < 1) such that

$$\bigl\| g_p^{(k+1)} - g_p \bigr\| \le \beta_p \bigl\| g_p^{(k)} - g_p \bigr\|.$$
3. LARGE SAMPLE PROPERTIES

Throughout this section, the kernel K is assumed to satisfy

$$\int_{-1}^{1} K(u)\, du = 1 \qquad \text{and} \qquad \int_{-1}^{1} u\, K(u)\, du = 0.$$
The idea behind the establishment of the weak consistency of ĝ_np(x) for g_p(x) is first to show the weak consistency of S_n(θ | x, p) to h_p(θ | x); we would then anticipate that the minimizer ĝ_np(x) of S_n(θ | x, p) converges in probability to the minimizer g_p(x) of h_p(θ | x). In doing so, let [θ_a, θ_b] be any bounded closed set; then, since both S_n(θ | x, p) and h_p(θ | x) are continuous functions of θ, it follows by the following Lemma 3.1 that for each x ∈ (0, 1) and p ∈ (0, 1),

$$\lim_{n \to \infty} \sup_{\theta \in [\theta_a, \theta_b]} \left| \sum_{i=1}^{n} a_i(x)\, h_p(\theta \mid x_i) - h_p(\theta \mid x) \right| = 0. \tag{3.3}$$
Proof. The proof follows from Lemma 3.2 and the following lemma due to Newey and Powell (1987, Lemma A).

THEOREM 3.2. Suppose that (A1) to (A3) hold; then for each p ∈ (0, 1) and x ∈ (0, 1),

$$\hat g_{np}(x) \xrightarrow{\;P\;} g_p(x). \tag{3.8}$$
THEOREM 3.3. Suppose that (A1) to (A3) hold, E|ε|^{2+δ} < ∞ for some δ > 0, and g(x) is continuously differentiable; then for each p ∈ (0, 1) and x ∈ (0, 1), we have

$$\sqrt{n h_n}\, \bigl( \hat g_{np}(x) - g_p(x) \bigr) \xrightarrow{\;d\;} N\bigl(0, \sigma_p^2(x)\bigr),$$

where

$$\sigma_p^2(x) = \frac{V_p(x)}{E_p^2(x)} \int_{-1}^{1} K^2(u)\, du$$

for quantities V_p(x) and E_p(x) not depending on n. Writing S_K = ∫_{-1}^{1} K²(u) du, note that S_K does not depend on p, x or the error density function f(·). According to Epanechnikov (1969), the quadratic kernel

$$K(u) = \tfrac{3}{4}\,(1 - u^2), \qquad |u| \le 1, \tag{3.14}$$

is an optimal choice, and with it one obtains the lower bound (3.16) for the asymptotic variance. Furthermore, it can be shown that the lower bound in (3.16) is achieved if f′(z)/f(z) is constant or f′(z)/f(z) is of the form az for some constant a. In summary, we have shown the inequality (3.17) for all p and x in (0, 1) under the conditions described above.
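As a side calculation (not spelled out in the text), the two kernel constants that enter the asymptotic expressions are elementary to evaluate for the quadratic kernel (3.14):

$$\int_{-1}^{1} K^2(u)\,du = \frac{9}{16} \int_{-1}^{1} (1 - u^2)^2\,du = \frac{9}{16} \cdot \frac{16}{15} = \frac{3}{5}, \qquad \int_{-1}^{1} u^2 K(u)\,du = \frac{3}{4}\left(\frac{2}{3} - \frac{2}{5}\right) = \frac{1}{5}.$$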
4. MEAN SQUARE ERROR AND OPTIMAL BANDWIDTH

In this section, we establish the asymptotic expression for the mean square error of ĝ_np(x), from which we derive the asymptotically optimal bandwidth. Note from (A.24) in the Appendix that the bias and variance of ĝ_np(x) can be written as

$$B_n(x, p) = E\,\hat g_{np}(x) - g_p(x), \qquad \sigma_n^2(x, p) = \operatorname{Var}\bigl(\hat g_{np}(x)\bigr);$$

then we have the following asymptotic expressions for B_n(x, p) and σ_n²(x, p).
LEMMA 4.1. Suppose that the conditions (A1) to (A3) hold and that g(x) is twice continuously differentiable; then, as n → ∞, B_n(x, p) = O(h_n²) and σ_n²(x, p) = O((n h_n)^{-1}).
Now we are ready to give the asymptotic expression for the mean square error of ĝ_np(x), which is defined by

$$\mathrm{MSE}_n(x, p) = E\bigl(\hat g_{np}(x) - g_p(x)\bigr)^2.$$

THEOREM 4.1. Suppose that (A1) to (A3) hold and g(x) is twice continuously differentiable; then, if A_p(g_p(x) | x) ≠ 0, the mean square error admits the asymptotic expansion (4.7), is minimized by the asymptotically optimal bandwidth h_n* of (4.8), and attains the minimal value (4.9).

Theorem 4.1 implies that, with the choice of the optimal bandwidth h_n*, the nonparametric regression expectile ĝ_np(x) has order of consistency n^{4/5}, i.e.,

$$n^{4/5}\, E\bigl(\hat g_{np}(x) - g_p(x)\bigr)^2 \to C < \infty \quad \text{as } n \to \infty.$$
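The source of the n^{4/5} rate can be sketched as follows. Writing the leading terms of Lemma 4.1 generically as B_n(x, p) ≈ B h_n² and σ_n²(x, p) ≈ V/(n h_n), where B and V are shorthand for the constants of the lemma (whose exact form is not reproduced above), the mean square error behaves like

$$\mathrm{MSE}_n \approx B^2 h_n^4 + \frac{V}{n h_n}, \qquad \frac{d\,\mathrm{MSE}_n}{d h_n} = 4 B^2 h_n^3 - \frac{V}{n h_n^2} = 0 \;\Longrightarrow\; h_n^* = \left( \frac{V}{4 B^2} \right)^{1/5} n^{-1/5},$$

so that MSE_n(h_n*) = O(n^{-4/5}), matching the display above.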
5. A SIMULATION STUDY
A small simulation study was performed to provide insight into the behavior of the nonparametric regression expectile ĝ_np(x). For model (1.1), we adopted as the regression function (5.1) a rescaled version of a regression function of Wahba and Wold (1975).

Figure 1. The nonparametric regression expectile ĝ_np(x) based on 100 independent observations simulated from the model (1.1) and (5.1) with s_i = x_i = (i − 1)/99, i = 1, ..., 100, and the ε_i being N(0, 1) errors. The bottom, middle and top curves correspond to ĝ_np(x) with p = 0.25, 0.5 and 0.75, respectively.
The estimates were computed at the points x = 0.01, 0.02, ..., 0.99 with the choice of the quadratic kernel (3.14) in evaluating the weights a_i(x) given by (1.3). All computations were done in double precision FORTRAN. The bandwidth h_n was selected according to the optimal bandwidth h_n*, which can be calculated from (4.8) as h_n* = 0.0505, 0.0616 and 0.0737 for p = 0.25, 0.5 and 0.75, respectively. The resulting three nonparametric regression expectile curves, denoted by ĝ_{100(0.25)}(x), ĝ_{100(0.5)}(x) and ĝ_{100(0.75)}(x), were superimposed on the observations in Figure 1.
The following three points can be made from Figure 1. First, all three nonparametric regression expectile curves have similar patterns, reflecting the homogeneity of the observations. The second point to note is that there are 32 observations below the curve ĝ_{100(0.25)}(x), 48 observations below ĝ_{100(0.5)}(x) and 67 observations below ĝ_{100(0.75)}(x), suggesting that the lower (p = 0.25) and upper (p = 0.75) nonparametric regression expectile curves are inclined to be closer to the mean (p = 0.5) nonparametric regression expectile curve. This is a consequence of the decreasing outlier sensitivity of the nonparametric regression expectile ĝ_np(x) for smaller or larger values of p. Finally, the nonparametric regression expectile ĝ_np(x) conveys more information than the kernel estimator (1.2) by itself. For example, when x = 0.5, the central 50% of the response Y lies roughly between ĝ_{100(0.25)}(0.5) and ĝ_{100(0.75)}(0.5), which the kernel estimator alone cannot reveal. A sketch of this simulation appears below.
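A rough Python reconstruction of this experiment, reusing regression_expectile from the sketch in Section 2. The regression function (5.1) is not reproduced above, so a hypothetical stand-in g is used; the design points, error law, kernel and bandwidths follow the text.

```python
import numpy as np

def g(x):
    """Hypothetical stand-in for the rescaled Wahba-Wold function (5.1)."""
    return np.sin(2.0 * np.pi * x)

rng = np.random.default_rng(1)
xs = np.arange(100) / 99.0                  # x_i = (i - 1)/99, i = 1, ..., 100
ys = g(xs) + rng.normal(size=100)           # N(0, 1) errors, as in the study

# Optimal bandwidths reported in the text for p = 0.25, 0.5, 0.75.
for p, h in [(0.25, 0.0505), (0.5, 0.0616), (0.75, 0.0737)]:
    fitted = np.array([regression_expectile(x, xs, ys, h, p) for x in xs])
    print(f"p = {p}: {int((ys < fitted).sum())} observations below the curve")
```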
APPENDIX: Proofs
Proof of Theorem 2.1. If ĝ_np^(0) ∈ C[0, 1], then it is easily seen that ĝ_np^(k+1) = T_p(ĝ_np^(k)) ∈ C[0, 1] for k ≥ 0.
Applying the mean value theorem for integrals then gives

$$A_n(x) = \left| \sum_{i=1}^{n} h(x_i) \int_{s_{i-1}}^{s_i} K^m\!\left(\frac{x - u}{h_n}\right) du - \sum_{i=1}^{n} \int_{s_{i-1}}^{s_i} K^m\!\left(\frac{x - u}{h_n}\right) h(u)\, du \right|.$$
Since K(·) and h(·) are both Lipschitz-continuous, there are constants L_1 and L_2 such that

$$|K^m(u) - K^m(v)| \le L_1 |u - v| \quad \text{and} \quad |h(u) - h(v)| \le L_2 |u - v|,$$

and thus, if I_n = {i : |x − x_i| ≤ h_n}, we have

$$A_n(x) = O(n^{-m}) + O\bigl(n^{-(m+\delta-2)} h_n\bigr).$$
By (1.3) and assumptions (A2), (A3), it is seen that n h_n a_k(x) is uniformly bounded, i.e., there is a constant C_1 such that n h_n a_k(x) ≤ C_1. The error term in (A.4) is at most a_k(x)|t| m_k(x) (Billingsley 1986, p. 353).
It can be shown by (A.3) and the smoothness of g that there exists a constant C_2 bounding the remaining term. Therefore, (A.1) follows from (A.2), (A.8) and the triangle inequality.
Proof of Theorem 3.2. Note first that A_p(θ | x) is continuously differentiable in θ, satisfying A_p(g_p(x) | x) = 0 and ∂A_p(θ | x)/∂θ < 0; therefore, there are positive constants a and b such that (A.10) holds. It follows from (A.10) and Theorem 3.1 that, with probability tending to 1, (A.11) and (A.12) hold. If (A.13) is true, then the term on the left-hand side of (A.12) tends to 0 in probability. Let η > 0 be given and K_η = 2L_n/η, where L_n = n h_n Σ_{i=1}^n a_i²(x) E ψ_p²(Y_i, g_p(x)); then, for sufficiently large n, say n ≥ N, we have
(A.16) and (A.17); hence, by (A.17), we obtain (A.19), and therefore

$$\sqrt{n h_n}\, \bigl( \hat g_{np}(x) - g_p(x) \bigr) = G_n(x, p) + o_P(1). \tag{A.24}$$
Thus, it follows from (A.29), (A.30), (A.31) and the Slutsky theorem that G_n(x, p) is asymptotically normal. Therefore, Theorem 3.3 follows from (A.24), (A.32) and the Slutsky theorem.
Proof of Lemma 4.1. It follows from (2.4), (A.24) and Lemma 3.1 that, for large n, the expansion (A.35) holds with remainder O(n^{-1}). It is easy to check that the right-hand side of (A.35) is minimized by h_n* given by (4.8); substituting h_n* into (A.35) gives (4.9).
Acknowledgement
The author wishes to thank Professors Michael Stein and Wing Wong for their
helpful comments.
References
Billingsley, P. (1986), Probability and Measure, 2nd ed., New York: John Wiley & Sons.
Breckling, J. and Chambers, R. (1988), "M-quantiles," Biometrika, 75, 761-771.
Efron, B. (1991), "Regression percentiles using asymmetric squared error loss," Statistica Sinica, 1, 94-125.
Epanechnikov, V. A. (1969), "Nonparametric estimates of a multivariate probability density," Theor. Probability Appl., 14, 153-158.
Gasser, T. and Müller, H. G. (1979), "Kernel estimation of regression functions," in Smoothing Techniques for Curve Estimation (eds. T. Gasser and M. Rosenblatt), Springer Lecture Notes 757.
Härdle, W. and Gasser, T. (1984), "Robust non-parametric function fitting," J. R. Statist. Soc. B, 46, 42-51.
Koenker, R. and Bassett, G. (1978), "Regression quantiles," Econometrica, 46, 33-50.
Koenker, R. and Bassett, G. (1982), "Robust tests for heteroscedasticity based on regression quantiles," Econometrica, 50, 43-61.
Newey, W. K. and Powell, J. L. (1987), "Asymmetric least squares estimation and testing," Econometrica, 55, 819-847.
Wahba, G. and Wold, S. (1975), "A completely automatic French curve: Fitting spline functions by cross validation," Communications in Statistics, 4, 1-17.