Functional Local Linear Relative Regression
Abdelkader Chahad, Ali Laksaci, Ait-Hennani Larbi
Abstract—In this contribution we present a new estimator of the regression operator of a scalar response variable given a functional explanatory variable. The estimator is built by minimizing the mean squared relative error of the local linear regression operator. As asymptotic results, we prove the pointwise and the uniform almost complete convergence, with rate, of this estimator.

Index Terms—Functional data analysis; Nonparametric regression; Local linear estimate; Kernel estimate; Relative error.

I. INTRODUCTION

Consider n pairs of random variables (X_i, Y_i), i = 1, . . . , n, drawn from the pair (X, Y) which takes its values in F × R*₊, where F is a semi-metric space equipped with a semi-metric d. Furthermore, we suppose that X and Y are linked by the relation

Y = R(X) + ε,   (I.1)

where R is an operator from F to R and ε is a random error variable independent of X.

The nonparametric estimation of the operator R is one of the most significant tools for predicting the relationship between Y and X. This subject occupies an important place in Nonparametric Functional Data Analysis (NFDA), and various nonparametric approaches can be found in the NFDA literature. We cite for instance Ferraty and Vieu (2006) for the functional Nadaraya-Watson estimator, Attouch et al. (2010) for nonparametric robust estimation, Burba et al. (2009) for the kNN kernel method, Barrientos et al. (2010) for the local linear approach, and Demongeot et al. (2016) for functional relative-error techniques. The main goal of this work is to build a new estimator of the regression operator, obtained by combining the ideas of relative-error regression with the local linear approach. Recall that the local linear approach has several good features compared with the kernel method; in particular, it has a smaller bias.

Baìllo and Grané (2009) introduced local linear modelling in NFDA. They studied the L²-convergence of the local linear estimate of the regression function when the explanatory variable takes values in a Hilbert space. We refer to Barrientos et al. (2010) for the almost complete convergence (with rate) of an alternative version of the local linear estimate of the nonparametric functional regression. This version has been improved by Laksaci et al. (2013) for other models. We also refer to Berlinet et al. (2011) for another version, constructed by inverting the local covariance operator of the functional explanatory variable, for which they obtained the convergence rate of the mean quadratic error. In parallel, relative-error regression has recently been introduced in NFDA by Demongeot et al. (2016), who showed that this regression model has important advantages over classical regression. It should be noted that both local linear estimation and relative-error regression have been extensively studied in the multivariate case; see, for example, Stone (1977), Fan et al. (1996), Masry (1996), Hallin et al. (2009), Narula and Wellington (1977), Jones et al. (2008), Yang and Ye (2013), Laksaci and Mechab (2016) and Attouch et al. (2016), among others. However, little attention has been paid to the local linear estimation of the relative-error regression: to our knowledge, only the paper by Jones et al. (2008) gives an estimator of the relative-error regression based on the multivariate local linear procedure. In this contribution we deal with the general case where the regressors are of functional nature.

II. THE MODEL AND ITS ESTIMATE

Compared with the multivariate case, several versions of the functional local linear estimate can be found in the literature, but all of them rest on two common ingredients. The first is that the functional operator is assumed to be smooth enough to be locally well approximated by a polynomial. The second is the use of the least squares error

E[(Y − R(X))² | X]

as a loss function to define the estimate. This criterion, however, may be unsuitable in certain situations: it gives all observations the same weight, which produces misleading results when the data contain outliers. In this contribution we circumvent this limitation by estimating the operator R with respect to the following mean squared relative error: for Y > 0,

E[((Y − R(X))/Y)² | X].   (II.1)

Obviously, this criterion is a more meaningful measure of prediction performance than the least squares error, in particular when the range of predicted values is large.
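The effect of this weighting can be seen on a toy example. The following sketch is ours, not the paper's: it estimates a positive "center" c from data containing one large outlier, on the one hand with the least squares criterion (whose minimizer is the sample mean), on the other hand with the relative-error criterion Σ_i ((y_i − c)/y_i)², whose minimizer is c = Σ_i y_i^{−1} / Σ_i y_i^{−2}, in direct analogy with the formula for R(x) below.

```python
# A minimal numerical sketch (ours, not from the paper): the relative-error
# criterion down-weights large observations, so one outlier barely moves it.
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([rng.uniform(0.8, 1.2, size=50), [100.0]])  # one outlier

c_ls = y.mean()                            # least squares minimizer
c_rel = (1 / y).sum() / (1 / y**2).sum()   # relative-error minimizer

print(f"least squares : {c_ls:.3f}")   # pulled toward the outlier (about 2.9)
print(f"relative error: {c_rel:.3f}")  # stays close to 1
```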
Furthermore, the solution of (II.1) is explicitly given by

R(x) = E[Y^{−1} | X = x] / E[Y^{−2} | X = x].
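This expression follows from a short computation, which we record for completeness: for a scalar r,

E[((Y − r)/Y)² | X = x] = 1 − 2r E[Y^{−1} | X = x] + r² E[Y^{−2} | X = x],

a quadratic function of r whose minimum is attained at r = E[Y^{−1} | X = x] / E[Y^{−2} | X = x]; with the notation g_γ(u) = E[Y^{−γ} | X = u] introduced below, this is g_1(x)/g_2(x).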
In this work, we take the wide version proposed by Barrientos et al. (2010) and we use the loss function (II.1) to estimate the components of the linear approximation. Specifically, for a fixed point x in F, we assume that, for all x′ in a neighborhood of x,

R(x′) = a + b β(x, x′) + o(β(x, x′)),

and we use the loss function (II.1) to estimate a and b as follows:

(â, b̂) = arg min_{(a,b)} Σ_{i=1}^{n} [(Y_i − a − b β(X_i, x))² / Y_i²] K(h^{−1} δ(x, X_i)),   (II.2)

where β(·,·) is a known function from F × F into R such that β(ξ, ξ) = 0 for all ξ ∈ F, K is a kernel, h = h_{K,n} is a sequence of positive real numbers, and δ(·,·) is a function defined on F × F such that d(·,·) = |δ(·,·)|.

Clearly, by simple algebra, we prove that the solutions (â, b̂) of (II.2) are the zeros of

(Q′_B Δ Q_B)(a, b)′ − Q′_B Δ Y = 0,

so that

(â, b̂)′ = (Q′_B Δ Q_B)^{−1} Q′_B Δ Y,

where

Q′_B = ( 1 . . . 1 ; β(X_1, x) . . . β(X_n, x) ),

Δ = diag( Y_1^{−2} K(h^{−1} δ(x, X_1)), . . . , Y_n^{−2} K(h^{−1} δ(x, X_n)) ) and Y′ = (Y_1, . . . , Y_n). Thus, we get explicitly

â = (1, 0)(Q′_B Δ Q_B)^{−1} Q′_B Δ Y

and

b̂ = (0, 1)(Q′_B Δ Q_B)^{−1} Q′_B Δ Y.
Moreover, since β(x, x) = 0, we can take

R̂(x) = â = [ Σ_{i,j=1}^{n} V_{ij}(x) Y_j ] / [ Σ_{i,j=1}^{n} V_{ij}(x) ],   (II.3)

where

V_{ij}(x) = β(X_i, x) (β(X_i, x) − β(X_j, x)) K(h^{−1} δ(x, X_i)) K(h^{−1} δ(x, X_j)) Y_i^{−2} Y_j^{−2},

with the convention 0/0 = 0.
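For concreteness, the following Python sketch (ours, not the authors' code) computes R̂(x) from (II.3) for curves discretized on a grid. The choices of δ (an L² distance, so that d = |δ|), of β (a projection onto a fixed direction e), of the kernel and of the bandwidth are illustrative assumptions only; note that both double sums in (II.3) factor into products of single sums.

```python
# A minimal sketch (ours, not the authors' code) of the estimator (II.3).
# Assumed for illustration: delta(u, x) = ||u - x||_2 and
# beta(u, x) = <u - x, e> for a fixed direction e.
import numpy as np

def rel_loclin(X, Y, x, h, e=None):
    """Local linear relative-error estimate R_hat(x), following (II.3).

    X: (n, p) discretized curves; Y: (n,) positive responses;
    x: (p,) evaluation curve; h: bandwidth.
    """
    _, p = X.shape
    if e is None:
        e = np.ones(p) / p                       # assumed direction for beta
    delta = np.linalg.norm(X - x, axis=1)        # |delta(X_i, x)| >= 0
    beta = (X - x) @ e                           # beta(X_i, x)
    K = np.maximum(1.0 - (delta / h) ** 2, 0.0)  # quadratic kernel, support [-1, 1]
    w = K / Y**2                                 # K_i * Y_i^{-2}

    # V_ij = beta_i (beta_i - beta_j) K_i K_j Y_i^{-2} Y_j^{-2}:
    # both double sums in (II.3) factor into single sums, computed here.
    s0, s1, s2 = w.sum(), (w * beta).sum(), (w * beta**2).sum()
    num = s2 * (K / Y).sum() - s1 * (K * beta / Y).sum()
    den = s2 * s0 - s1**2
    return num / den if den != 0 else 0.0        # convention 0/0 = 0

# Toy usage on synthetic positive data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50)).cumsum(axis=1) / 50.0
Y = np.exp(X.mean(axis=1)) + rng.uniform(0.01, 0.05, size=200)
x0 = X[0]
h = np.quantile(np.linalg.norm(X - x0, axis=1), 0.2)
print(rel_loclin(X, Y, x0, h))
```

By construction, this value coincides with the first component â of the weighted least squares solution given above.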
Remark:
1) If b = 0, then we obtain from (II.2) the same estimator as that in Demongeot et al. (2016).
2) If F = R and β(x, x′) = x − x′, then we obtain the same local linear estimate as in Jones et al. (2008).

III. POINTWISE ALMOST COMPLETE CONVERGENCE

In what follows, when no confusion is possible, we will denote by C and C′ some strictly positive generic constants. Moreover, x denotes a fixed point in F, N_x denotes a fixed neighborhood of x, φ_x(r_1, r_2) = P(r_1 ≤ δ(X, x) ≤ r_2), and we put g_γ(u) = E[Y^{−γ} | X = u], γ = 1, 2.

Notice that our nonparametric model will be quite general, in the sense that we will just need the following assumptions:

(A1) For any r > 0, φ_x(r) := φ_x(−r, r) > 0.

(A2) For all (x_1, x_2) ∈ N_x², we have |g_γ(x_1) − g_γ(x_2)| ≤ C d^{k_γ}(x_1, x_2), for some k_γ > 0.

(A3) The function β(·,·) is such that ∀x′ ∈ F, C |δ(x, x′)| ≤ |β(x, x′)| ≤ C′ |δ(x, x′)|.

(A4) K is a positive, differentiable function with support [−1, 1].

(A5) The functions β and φ_x are such that there exists an integer n_0 such that

∀n > n_0,  −(1/φ_x(h)) ∫_{−1}^{1} φ_x(zh, h) (d/dz)(z² K(z)) dz > C > 0

and

h ∫_{B(x,h)} β(u, x) dP(u) = o( ∫_{B(x,h)} β²(u, x) dP(u) ),

where B(x, r) = {x′ ∈ F : |δ(x′, x)| ≤ r} and dP denotes the distribution of X.

(A6) The bandwidth h satisfies

lim_{n→∞} h = 0 and lim_{n→∞} (log n)/(n φ_x(h)) = 0.

(A7) The function g_2 satisfies g_2(x) > C > 0, and the inverse moments of the response variable are uniformly bounded: ∀m ≥ 2, E[Y^{−m} | X = x] < C < ∞.

Clearly, all these assumptions are standard and are usually imposed in this context. Indeed, assumptions (A1) and (A4)-(A6) are the same as those used in Barrientos et al. (2010), while assumptions (A2) and (A7) are the same as in Demongeot et al. (2016). We notice that (A2) is a regularity condition which describes the functional space of our model and is required to evaluate the bias term in the asymptotic results of this paper, while (A1) is closely linked to the topological structure of the functional space of the data F. The following theorem gives the almost-complete convergence (a.co.) of R̂(x).

Theorem 3.1: Under conditions (A1)-(A7), we have

|R̂(x) − R(x)| = O(h^{b_1}) + O( √( (log n) / (n φ_x(h)) ) ), a.co.,

where b_1 = min(k_1, k_2).
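A brief comment on this rate (our remark, not developed in the paper): the first term is a bias term, increasing in h, while the second is a dispersion term, decreasing in h. A rate-optimal bandwidth therefore balances

h^{b_1} ≍ √( (log n) / (n φ_x(h)) ),

and in the finite-dimensional case, where φ_x(h) ≍ h^d, this gives back the classical choice h ≍ ((log n)/n)^{1/(2 b_1 + d)}.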
IV. UNIFORM ALMOST COMPLETE CONVERGENCE

(C1) ∀x ∈ S_F, 0 < C φ(h) ≤ φ_x(h) ≤ C′ φ(h) < ∞, and ∃η_0 > 0, ∀η < η_0, φ′(η) < C, where φ′ denotes the first derivative of φ.

(C2) There exists η > 0 such that ∀x, x′ ∈ S_F^η, |g_γ(x) − g_γ(x′)| ≤ C d^{k_γ}(x, x′).

Lemma 4.3: Under conditions (C1)-(C6), we have

sup_{x∈S_F} |E[f̂(x)] − g_2²(x)| = O(h^{k_2})

and

sup_{x∈S_F} |E[ĝ(x)] − g_2(x) g_1(x)| = O(h^{k_1}).
V. APPENDIX

In what follows we put, for any x ∈ F and for all i = 1, . . . , n,

K_i(x) = K(h^{−1} δ(x, X_i)), δ_i(x) = δ(X_i, x) and β_i(x) = β(X_i, x).
Proof of Lemma 3.2. It is clear that

f̂(x) = A_1 (T_1 T_2 − T_3²)

and

ĝ(x) = A_1 (T_4 T_2 − T_5 T_3),

where

A_1 = n² h² φ_x²(h) / ( n(n−1) E[W_{12}] ),

T_1 = (1/n) Σ_{j=1}^{n} K_j(x) Y_j^{−2} / φ_x(h),   T_2 = (1/n) Σ_{i=1}^{n} K_i(x) β_i²(x) Y_i^{−2} / (h² φ_x(h)),

T_3 = (1/n) Σ_{i=1}^{n} K_i(x) β_i(x) Y_i^{−2} / (h φ_x(h)),   T_4 = (1/n) Σ_{j=1}^{n} K_j(x) Y_j^{−1} / φ_x(h),

T_5 = (1/n) Σ_{j=1}^{n} K_j(x) β_j(x) Y_j^{−1} / (h φ_x(h)).
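With these notations, factorizing the double sums in (II.3) gives

f̂(x) = (1/(n(n−1) E[W_{12}])) Σ_{i,j=1}^{n} V_{ij}(x)  and  ĝ(x) = (1/(n(n−1) E[W_{12}])) Σ_{i,j=1}^{n} V_{ij}(x) Y_j,

so that R̂(x) = ĝ(x)/f̂(x); we record this identity here, as it is what makes the control of f̂ and ĝ sufficient for the control of R̂.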
Moreover, observe that, for all i, j = 2, . . . , 5,

T_i T_j − E[T_i T_j] = (T_i − E[T_i])(T_j − E[T_j]) + (T_j − E[T_j]) E[T_i] + (T_i − E[T_i]) E[T_j] + E[T_i] E[T_j] − E[T_i T_j].

So, the claimed result is a consequence of the following assertions:

Σ_n P( |T_i − E[T_i]| > η √((log n)/(n φ_x(h))) ) < ∞, for i = 1, . . . , 5,   (V.1)

A_1 = O(1), E[T_i] = O(1),

and

Cov(T_i, T_j) = o( √((log n)/(n φ_x(h))) ), for i, j = 1, . . . , 5.   (V.2)

Concerning (V.1), we write, for the summands of the T_i's and for all m ≥ 2,

E| K_i(x) Y_i^{−k} β_i^l(x) − E[K_i(x) Y_i^{−k} β_i^l(x)] |^m
= E| Σ_{d=0}^{m} C_m^d ( K_i(x) Y_i^{−k} β_i^l(x) )^d (−1)^{m−d} E^{m−d}[ K_i(x) Y_i^{−k} β_i^l(x) ] |
≤ Σ_{d=0}^{m} C_m^d E[ K_i^d(x) Y_i^{−kd} β_i^{ld}(x) ] E^{m−d}[ K_i(x) Y_i^{−k} β_i^l(x) ]
≤ Σ_{d=0}^{m} C_m^d E[ K_1^d(x) β_1^{dl}(x) E[Y_1^{−dk} | X_1] ] E^{m−d}[ K_1(x) β_1^l(x) E[Y_1^{−k} | X_1] ],

where C_m^d = m!/(d!(m−d)!). Under condition (A7) we have

E[Y_1^{−kd} | X] = O(1), for all d ≤ m.

Next, using (A3), we write

h^{−l} E[K_i(x) β_i^l(x)] ≤ h^{−l} E[K_i(x) δ_i^l(x)] ≤ C φ_x(h).   (V.3)

We deduce that

h^{−lm} φ_x^{−m}(h) Σ_{d=0}^{m} C_m^d E[K_1^d(x) β_1^{dl}(x)] E^{m−d}[K_1(x) β_1^l(x)] ≤ C φ_x(h)^{−m+1}.

Therefore, for l = 0, 1, 2 and k = 1, 2, the centered and normalized summands Z_i^{l,k} of the sums T_1, . . . , T_5 satisfy

E|Z_i^{l,k}|^m = O( (φ_x(h))^{−m+1} ).

Thus, to accomplish this proof, it suffices to use the classical Bernstein inequality (see Corollary A.8 in Ferraty and Vieu (2006), page 234) with a_n = (φ_x(h))^{−1/2}; in substance, this inequality asserts that i.i.d. centered variables whose m-th absolute moments are O(a_n^{2(m−1)}) satisfy an exponential bound at the scale a_n √((log n)/n). This allows us to write

∀i = 1, . . . , 5,  P( |T_i − E[T_i]| > η √((log n)/(n φ_x(h))) ) ≤ C′ n^{−C η²}.

Therefore, an appropriate choice of η permits us to deduce that

Σ_n P( |T_i − E[T_i]| > η √((log n)/(n φ_x(h))) ) < ∞, for i = 1, 2, 3, 4, 5.

Now we prove (V.2). Recall that the term A_1 is the same as in Barrientos et al. (2010), so it suffices to treat the other terms. To do that, we evaluate

E[K_i(x) Y_i^{−k} β_i^l(x)], for l = 0, 1, 2 and k = 1, 2.

As previously, we condition on X_1 to show that, for all l = 0, 1, 2 and k = 1, 2, we have

E[K_i(x) Y_i^{−k} β_i^l(x)] = O( E[K_i(x) β_i^l(x)] ).

It follows that

Cov(T_i, T_j) = O( 1/(n φ_x(h)) ) = o( √((log n)/(n φ_x(h))) ).

The latter yields the proof of the Lemma.

Proof of Lemma 3.3. Since the observations (X_i, Y_i), i = 1, . . . , n, are i.i.d., we have

E[f̂(x)] − g_2²(x) = (1/E[W_{12}]) { E[β_1²(x) K_1(x) Y_1^{−2}] E[K_1(x) Y_1^{−2}] − E²[β_1(x) K_1(x) Y_1^{−2}] − g_2²(x) E[W_{12}] }
≤ (1/E[W_{12}]) E[K_1(x) Y_1^{−2}] | E[β_1²(x) K_1(x) Y_1^{−2}] − g_2(x) E[β_1²(x) K_1(x)] |
+ (1/E[W_{12}]) g_2(x) E[β_1²(x) K_1(x)] | E[K_1(x) Y_1^{−2}] − g_2(x) E[K_1(x)] |
+ (1/E[W_{12}]) E[β_1(x) K_1(x) Y_1^{−2}] | E[β_1(x) K_1(x) Y_1^{−2}] − g_2(x) E[β_1(x) K_1(x)] |
+ (1/E[W_{12}]) g_2(x) E[β_1(x) K_1(x)] | E[β_1(x) K_1(x) Y_1^{−2}] − g_2(x) E[β_1(x) K_1(x)] |.

From conditions (A2), (A3) and (A4) we have

| E[β_1²(x) K_1(x) Y_1^{−2}] − g_2(x) E[β_1²(x) K_1(x)] | ≤ C E[β_1²(x) K_1(x)] h^{k_2}

and

| E[β_1(x) K_1(x) Y_1^{−2}] − g_2(x) E[β_1(x) K_1(x)] | ≤ C′ E[β_1(x) K_1(x)] h^{k_2}.

Moreover, we use the fact that, for k, l = 0, 1, 2,

E[β_1^l(x) K_1(x) Y_1^{−k}] = O( h^l φ_x(h) ),

to write

| E[f̂(x)] − g_2²(x) | ≤ C ( h² φ_x²(h) / E[W_{12}] ) h^{k_2} ≤ C′ h^{k_2}.

By using the same arguments we show that

| E[ĝ(x)] − g_2(x) g_1(x) | ≤ C h^{k_1}.

Consequently,

| E[f̂(x)] − g_2²(x) | = O(h^{k_2}) and | E[ĝ(x)] − g_2(x) g_1(x) | = O(h^{k_1}).

Proof of Corollary 3.4. We remark that

|f̂(x)| ≤ g_2²(x)/2 implies |g_2²(x) − f̂(x)| ≥ g_2²(x)/2.

Proof of Lemma 4.2. The proof of this lemma is based on the same decomposition as in the proof of Lemma 3.2, and it remains only to show the uniform versions of (V.1) and (V.2). Obviously, the latter is a direct consequence of assumption (C1) and of the evaluation obtained in Lemma 3.2, while the uniform version of (V.1) is based on the following decomposition:

sup_{x∈S_F} |T_k(x) − E[T_k(x)]| ≤ sup_{x∈S_F} |T_k(x) − T_k(x_{j(x)})|  (=: F_1)
+ sup_{x∈S_F} |T_k(x_{j(x)}) − E[T_k(x_{j(x)})]|  (=: F_2)
+ sup_{x∈S_F} |E[T_k(x_{j(x)})] − E[T_k(x)]|,  (=: F_3),  k = 1, 2, . . . , 5.

We have, then, to evaluate each term F_j for j = 1, 2, 3. Firstly, we treat the terms F_1 and F_3. Since K is supported within [−1, 1], we can write

F_1 ≤ (1/(n h^l φ_x(h))) sup_{x∈S_F} Σ_{i=1}^{n} | K_i(x) Y_i^{−k} β_i^l(x) 1_{B(x,h)}(X_i) − K_i(x_{j(x)}) Y_i^{−k} β_i^l(x_{j(x)}) 1_{B(x_{j(x)},h)}(X_i) |
≤ (C/(n h^l φ_x(h))) sup_{x∈S_F} Σ_{i=1}^{n} K_i(x) Y_i^{−k} 1_{B(x,h)}(X_i) | β_i^l(x) − β_i^l(x_{j(x)}) 1_{B(x_{j(x)},h)}(X_i) |
+ (1/(n h^l φ_x(h))) sup_{x∈S_F} Σ_{i=1}^{n} Y_i^{−k} β_i^l(x_{j(x)}) 1_{B(x_{j(x)},h)}(X_i) | K_i(x) 1_{B(x,h)}(X_i) − K_i(x_{j(x)}) |.

The Lipschitz condition on K allows us to write

1_{B(x_{j(x)},h)}(X_i) | K_i(x) 1_{B(x,h)}(X_i) − K_i(x_{j(x)}) | ≤ C 1_{B(x,h)∩B(x_{j(x)},h)}(X_i) + C 1_{B̄(x,h)∩B(x_{j(x)},h)}(X_i),

and the Lipschitz condition on β allows us in turn to write

1_{B(x,h)}(X_i) | β_i(x) − β_i(x_{j(x)}) 1_{B(x_{j(x)},h)}(X_i) | ≤ C 1_{B(x,h)∩B(x_{j(x)},h)}(X_i) + h 1_{B(x,h)∩B̄(x_{j(x)},h)}(X_i),

1_{B(x,h)}(X_i) | β_i²(x) − β_i²(x_{j(x)}) 1_{B(x_{j(x)},h)}(X_i) | ≤ C h 1_{B(x,h)∩B(x_{j(x)},h)}(X_i) + h² 1_{B(x,h)∩B̄(x_{j(x)},h)}(X_i),

where B̄(·, h) denotes the complement of B(·, h).
2 2
R EFERENCES
[1] Baı̀llo, A. and Gran, A. (2009). Local linear regression for functional
predictor and scalar response, Journal of Multivariate Analysis, 100,
Pages 102–111.
[6] Cardot, H., Ferraty, F., Sarda, P (1999). Linear Functional Model.
Statistic and Probability,Letters, 45,11–22 .
[12] Demongeot, J., Laksaci, A., Madani, F. and Rachdi, M. (2010) Local
linear estimation of the conditional density for functional data. C. R.,
Math., Acad. Sci. Paris, 348, Pages 931-934.