Adaptive Density Estimation for Stationary Processes

Matthieu Lerasle
Institut de Mathématiques (UMR 5219), INSA de Toulouse, Université de Toulouse, France
1 Introduction
We consider the problem of estimating the unknown density s of P , the law of a random
variable X, based on the observation of n (possibly) dependent data X1 , ..., Xn with com-
mon law P . We assume that X is real valued, that s belongs to L2 (µ) where µ denotes the
Lebesgue measure on R and that s is compactly supported, say in [0, 1]. Throughout the
chapter, we consider least-squares estimators ŝm of s on a collection (Sm )m∈Mn of linear
subspaces of L2 (µ). Our final estimator is chosen through a model selection algorithm.
Model selection has received much interest in the last decades. When its final goal is pre-
diction, it can be seen more generally as the question of choosing between the outcomes
of several prediction algorithms. With such a general formulation, a very natural answer
is the following. First, estimate the prediction error for each model, that is, $\|s - \hat s_m\|_2^2$.
Then, select the model which minimizes this estimate.
It is natural to think of the empirical risk as an estimator of the prediction error. This can
fail dramatically, because it uses the same data for building predictors and for comparing
them, making these estimates strongly biased for models involving a number of parameters
growing with the sample size.
In order to correct this drawback, penalization methods state that a good choice can be
made by minimizing the sum of the empirical risk (measuring how well the algorithm fits
the data) and some complexity measure of the algorithm (called the penalty). This method
was first developed in the work of Akaike [1, 2] and Mallows [19].
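The following minimal sketch illustrates this rule, assuming the empirical risks have already been computed; the names `select_model`, `emp_risks` and the penalty shape `K * D_m / n` are illustrative only, not the precise procedure studied below.

```python
# A minimal sketch of penalized model selection: minimize the sum of the
# empirical risk and a penalty proportional to the model dimension.

def select_model(emp_risks, dims, n, K=1.0):
    """Return the model index m minimizing emp_risks[m] + K * dims[m] / n.

    emp_risks: dict model -> empirical risk gamma_n(s_hat_m)
    dims:      dict model -> dimension D_m of the linear space S_m
    n:         sample size; K: penalty constant (unknown in practice).
    """
    crit = {m: emp_risks[m] + K * dims[m] / n for m in emp_risks}
    return min(crit, key=crit.get)

# Example: three nested models fitted on n = 500 observations.
print(select_model({1: -0.80, 2: -0.90, 3: -0.92}, {1: 5, 2: 25, 3: 100}, n=500))
```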
In the context of density estimation, with independent data, Birgé & Massart [8] used
penalties of order Ln Dm /n, where Dm denotes the dimension of Sm and Ln is a constant
depending on the complexity of the collection Mn . They used Talagrand’s inequality (see
for example Talagrand [24] for an overview) to prove that this penalization procedure is
efficient, i.e., the integrated quadratic risk of the selected estimator is asymptotically
equivalent to the risk of the oracle (see Section 2 for a precise definition). They also proved that
the selected estimator achieves adaptive rates of convergence over a large class of Besov
spaces. Moreover, they showed that some methods of adaptive density estimation like the
unbiased cross validation (Rudemo [23]) or the hard thresholded estimator of Donoho et
al. [16] can be viewed as special instances of penalized projection estimators.
More recently, Arlot [5] introduced new measures of the quality of penalized least-squares
estimators (PLSE). He proved pathwise oracle inequalities, that is deviation bounds for
the PLSE that are harder to prove but more informative from a practical point of view
(see also Section 2 for details).
When the process $(X_i)_{i=1,\ldots,n}$ is β-mixing (Rozanov & Volkonskii [26] and Section 2),
Talagrand's inequality cannot be used directly. Baraud et al. [6] used Berbee's coupling
lemma (see Berbee [7]) and Viennet's covariance inequality (Viennet [25]) to overcome
this problem and build a model selection procedure for the regression problem. Comte
& Merlevède [13] then used this algorithm to investigate the problem of density estimation for
a β-mixing process. They proved that, under reasonable assumptions on the collection $\mathcal{M}_n$
and on the coefficients β, one can recover the results of Birgé & Massart [8] obtained in the i.i.d.
framework.
The main drawback of those results is that many processes, even simple Markov chains,
are not β-mixing. For instance, if $(\epsilon_i)_{i\ge 1}$ is i.i.d. with marginal $\mathcal{B}(1/2)$, then the stationary
solution $(X_i)_{i\ge 0}$ of the equation
$$X_n = \frac{1}{2}(X_{n-1} + \epsilon_n), \qquad X_0 \text{ independent of } (\epsilon_i)_{i\ge 1}, \qquad (1)$$
is not β-mixing (Andrews [3]). More recently, Dedecker & Prieur [15] introduced new
mixing-coefficients, in particular the coefficients τ , φ̃ and β̃ and proved that many processes
like (1) happen to be τ , φ̃ and β̃-mixing. They proved a coupling lemma for the coefficient
τ and covariance inequalities for φ̃ and β̃. Gannaz & Wintenberger [18] used the covariance
inequality to extend the result of Donoho et al. [16] for the wavelet thresholded estimator
to the case of φ̃-mixing processes. They recovered (up to a log(n) factor) the adaptive
rates of convergence over Besov spaces.
In this article, we first investigate the case of β-mixing processes. We prove a pathwise
oracle inequality for the PLSE. We extend the result of Comte & Merlevède [13] under
weaker assumptions on the mixing coefficients. Then, we consider τ -mixing processes. The
problem is that the coupling result is weaker for the coefficient τ than for β. Moreover,
in order to control the empirical process we use a covariance inequality that is harder to
handle. Hence, the generalization of the procedure of Baraud et al. [6] to the framework
of τ-mixing processes is not straightforward. We recover the optimal adaptive rates of
convergence over Besov spaces (that is, the same as in the independent framework) for
τ-mixing processes, which, as far as we know, is new.
The chapter is organized as follows. In Section 2, we give the basic material that we will
use throughout the chapter. We recall the definition of some mixing coefficients and we
state their properties. We define the penalized least-squares estimator (PLSE). Sections 3
and 4 are devoted to the statement of the main results, respectively in the β-mixing case
and in the τ -mixing case. In Section 5, we derive the adaptive properties of the PLSE.
Finally, Section 6 is devoted to the proofs. Some additional material is deferred to the
Appendix, Section 7.
2 Preliminaries
2.1 Notation.
Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space. Let $\mu$ be the Lebesgue measure on $\mathbb{R}$ and let $\|\cdot\|_p$ be the
usual norm on $L^p(\mu)$ for $1 \le p \le \infty$. For all $y \in \mathbb{R}^l$, let $|y|_l = \sum_{i=1}^l |y_i|$. Denote by $\lambda_\kappa$ the
set of κ-Lipschitz functions, i.e., the functions $t$ from $(\mathbb{R}^l, |\cdot|_l)$ to $\mathbb{R}$ such that $\mathrm{Lip}(t) \le \kappa$,
where
$$\mathrm{Lip}(t) = \sup\left\{ \frac{|t(x) - t(y)|}{|x - y|_l},\ x, y \in \mathbb{R}^l,\ x \ne y \right\}.$$
Let $BV$ and $BV_1$ be the sets of functions $t$ supported on $\mathbb{R}$ satisfying respectively $\|t\|_{BV} < \infty$
and $\|t\|_{BV} \le 1$, where $\|t\|_{BV}$ denotes the total variation of $t$.
The coefficient β(M, σ(Y )) is the mixing coefficient introduced by Rozanov & Volkonskii
[26]. The coefficients β̃(M, Y1 ) and τ (M, Y ) have been introduced by Dedecker & Prieur
[15].
Let (Xk )k∈Z be a stationary sequence of real valued random variables defined on (Ω, A, P).
For all k ∈ N∗ , the coefficients βk , β̃k and τk are defined by
Moreover, we set $\beta_0 = 1$. In the sequel, the processes of interest are either β-mixing or
τ-mixing, meaning that, for $\gamma = \beta$ or $\tau$, the γ-mixing coefficients satisfy $\gamma_k \to 0$ as $k \to +\infty$. For
$p \in \{1, 2\}$, we define $\kappa_p$ as
$$\kappa_p = p \sum_{l=0}^{\infty} l^{p-1} \beta_l, \qquad (2)$$
where $0^0 = 1$, when the series converge. Besides, we consider two kinds of rates of
convergence to 0 of the mixing coefficients, that is, for $\gamma = \beta$ or $\tau$:
[AR] arithmetical γ-mixing with rate θ if there exists some θ > 0 such that $\gamma_k \le (1+k)^{-(1+\theta)}$ for all $k$ in $\mathbb{N}$;
[GEO] geometrical γ-mixing with rate θ if there exists some θ > 0 such that $\gamma_k \le e^{-\theta k}$ for all $k$ in $\mathbb{N}$.
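As a quick numerical illustration (not part of the original argument), the sketch below bounds the series $\kappa_p$ of (2) under the arithmetic [AR] envelope $\beta_l \le (1+l)^{-(1+\theta)}$; the function name and truncation level are arbitrary.

```python
# Truncated upper bound for kappa_p = p * sum_{l>=0} l^(p-1) * beta_l under
# the [AR] envelope beta_l <= (1 + l)^{-(1+theta)}. Note that Python's
# 0 ** 0 == 1 matches the convention 0^0 = 1 used in (2).

def kappa_p_bound(p, theta, n_terms=10**5):
    return p * sum(l ** (p - 1) * (1 + l) ** (-(1 + theta))
                   for l in range(n_terms))

print(kappa_p_bound(1, theta=2.5))  # kappa_1: finite whenever theta > 0
print(kappa_p_bound(2, theta=2.5))  # kappa_2: finite whenever theta > 1
```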
2.2.2 Properties
Coupling
Let $X$ be an $\mathbb{R}^l$-valued random variable defined on $(\Omega, \mathcal{A}, \mathbb{P})$ and let $\mathcal{M}$ be a σ-algebra.
Assume that there exists a random variable $U$ uniformly distributed on $[0,1]$ and independent
of $\mathcal{M} \vee \sigma(X)$. There exist two $\mathcal{M} \vee \sigma(X) \vee \sigma(U)$-measurable random variables $X_1^*$
and $X_2^*$ distributed as $X$ and independent of $\mathcal{M}$ such that
Covariance inequalities
$$|\mathrm{Cov}(f(X), h(Y))| \le 2\,\mathbb{E}^{1/p}\left(|f(X)|^p\, b_1(X)\right)\, \mathbb{E}^{1/q}\left(|h(Y)|^q\, b_2(Y)\right).$$
There exists a random variable $b(\sigma(X), Y)$ such that $\mathbb{E}(b(\sigma(X), Y)) = \tilde\beta(\sigma(X), Y)$ and such
that, for all Lipschitz functions $f$ and all $h$ in $BV$ (Dedecker & Prieur [15], Proposition 1),
$$|\mathrm{Cov}(f(X), h(Y))| \le \|h\|_{BV}\, \mathbb{E}\left(|f(X)|\, b(\sigma(X), Y)\right) \le \|h\|_{BV}\, \|f\|_\infty\, \tilde\beta(\sigma(X), Y). \qquad (5)$$
Comparison results
Let $(X_k)_{k\in\mathbb{Z}}$ be a sequence of identically distributed real random variables. If the marginal
distribution satisfies a concentration condition $|F_X(x) - F_X(y)| \le K|x-y|^a$ with $a \le 1$ and
$K > 0$, then (Dedecker et al. [14], Remark 5.1, p. 104)
$$\tilde\beta_k \le 2K^{1/(1+a)}\, \tau_{k,1}^{a/(a+1)} \le 2K^{1/(1+a)}\, \tau_k^{a/(a+1)}.$$
In particular, if $P_X$ has a density $s$ with respect to the Lebesgue measure $\mu$ and if $s \in L^2(\mu)$,
we have, from the Cauchy-Schwarz inequality,
$$|F_X(x) - F_X(y)| = \left|\int \mathbf{1}_{[x,y]}\, s\, d\mu\right| \le \|s\|_2 \left(\int \mathbf{1}_{[x,y]}\, d\mu\right)^{1/2} = \|s\|_2\, |x-y|^{1/2},$$
thus
$$\tilde\beta_k \le 2\,\|s\|_2^{2/3}\, \tau_k^{1/3}.$$
In particular, for any arithmetically [AR] τ-mixing process with rate θ > 2, we have
$$\tilde\beta_k \le 2\,\|s\|_2^{2/3}\, (1+k)^{-(1+\theta)/3}. \qquad (6)$$
2.2.3 Examples
Examples of β-mixing and τ -mixing sequences are well known, we refer to the books of
Doukhan [17] and Bradley [11] for examples of β-mixing processes and to the book of
Dedecker et. al [14] or the articles of Dedecker & Prieur [15], Prieur [21], and Comte
et. al [12] for examples of τ -mixing sequences. One of the most important example is
the following: a stationary, irreducible, aperiodic and positively recurent Markov chain
(Xi )i≥1 is β-mixing. However, many simple Markov chains are not β-mixing but are τ -
mixing. For instance, it is known for a long time that if (ǫi )i≥1 are i.i.d Bernoulli B(1/2),
then a stationary solution (Xi )i≥0 of the equation
1
Xn = (Xn−1 + ǫn ), X0 independent of (ǫi )i≥1
2
is not β-mixing since βk = 1 for any k ≥ 1 whereas τk ≤ 2−k (see Dedecker & Prieur [15]
Section 4.1). Another advantage of the coefficient τ is that it is easy to compute in many
situations (see Dedecker & Prieur [15] Section 4).
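For illustration, here is a minimal simulation of this chain; starting $X_0$ uniform on $[0,1]$ makes the trajectory exactly stationary, since the uniform law is invariant for this recursion. The sampler below is a hypothetical sketch, not code from the chapter.

```python
# Simulate X_n = (X_{n-1} + eps_n) / 2 with Bernoulli(1/2) innovations:
# the uniform law on [0, 1] is stationary (X_n encodes a binary expansion),
# yet the chain is tau-mixing and not beta-mixing.
import numpy as np

rng = np.random.default_rng(0)

def sample_chain(n):
    """Draw a stationary trajectory of length n, starting from X_0 ~ U[0, 1]."""
    x = rng.uniform()
    path = np.empty(n)
    for i in range(n):
        x = 0.5 * (x + rng.integers(0, 2))  # eps_n in {0, 1}
        path[i] = x
    return path

X = sample_chain(1000)  # tau-mixing data with uniform marginal density s = 1
```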
$$\Big\|\sum_{(j,k)\in m\cup m'} \psi_{j,k}^2\Big\|_\infty = \sup_{t\in S_m+S_{m'},\ t\ne 0} \frac{\|t\|_\infty^2}{\|t\|_2^2}, \qquad (7)$$
see Birgé & Massart [8] p 58. Three examples are usually developed as fulfilling this set
of assumptions:
[T] trigonometric spaces: $\psi_{0,0}(x) = 1$ and, for all $j \in \mathbb{N}^*$, $\psi_{j,1}(x) = \cos(2\pi jx)$, $\psi_{j,2}(x) = \sin(2\pi jx)$; $m = \{(0,0), (j,1), (j',2),\ 1 \le j, j' \le J_m\}$ and $D_m = 2J_m + 1$;
[P] regular piecewise polynomial spaces: $S_m$ is generated by $r$ polynomials $\psi_{j,k}$ of degree $k = 0, \ldots, r-1$ on each subinterval $[(j-1)/J_m, j/J_m]$ for $j = 1, \ldots, J_m$; $D_m = rJ_m$, $\mathcal{M}_n = \{m = \{(j,k),\ j = 1,\ldots,J_m,\ k = 0,\ldots,r-1\},\ 1 \le J_m \le [n/r]\}$;
[W] spaces generated by dyadic wavelet with regularity r as described in Section 4.
For a precise description of those spaces and their properties, we refer to Birgé & Massart
[8].
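As an illustration of the collection [T], the sketch below evaluates the trigonometric basis; the $\sqrt 2$ normalization, which makes the family orthonormal in $L^2([0,1],\mu)$, is an assumption left implicit in the list above.

```python
# Evaluate the trigonometric basis of the model S_m of dimension 2J+1 on [0, 1].
import numpy as np

def trig_basis(J, x):
    """Return an array of shape (2J+1, len(x)): psi_{0,0}, then
    psi_{j,1} = sqrt(2) cos(2 pi j x) and psi_{j,2} = sqrt(2) sin(2 pi j x)."""
    x = np.asarray(x, dtype=float)
    rows = [np.ones_like(x)]
    for j in range(1, J + 1):
        rows.append(np.sqrt(2) * np.cos(2 * np.pi * j * x))
        rows.append(np.sqrt(2) * np.sin(2 * np.pi * j * x))
    return np.stack(rows)
```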
Let (Sm )m∈Mn be a collection of models satisfying assumptions [M1 ]-[M3 ]. We define
Sn = ∪m∈Mn Sm , sm and sn the orthogonal projections of s onto Sm and Sn respectively,
let P be the joint distribution of the observations (Xn )n∈Z and let E be the corresponding
expectation. We define the operators $P_n$, $P$ and $\nu_n$ on $L^2(\mu)$ by
$$P_n t = \frac{1}{n}\sum_{i=1}^n t(X_i), \qquad Pt = \int t(x)\, s(x)\, d\mu(x), \qquad \nu_n(t) = (P_n - P)t.$$
All the real numbers that we shall introduce and which are not indexed by m or n are fixed
constants. In order to define the penalized least-squares estimator, let us consider on $\mathbb{R}\times S_n$
the contrast function $\gamma(x,t) = -2t(x) + \|t\|_2^2$ and its empirical version $\gamma_n(t) = P_n\gamma(\cdot, t)$.
Minimizing $\gamma_n(t)$ over $S_m$ leads to the classical projection estimator $\hat s_m$ on $S_m$. Let $\hat s_n$ be
the projection estimator on $S_n$. Since $\{\psi_{j,k}\}_{(j,k)\in m}$ is an orthonormal basis of $S_m$, one gets
$$\hat s_m = \sum_{(j,k)\in m} (P_n\psi_{j,k})\,\psi_{j,k} \qquad\text{and}\qquad \gamma_n(\hat s_m) = -\sum_{(j,k)\in m} (P_n\psi_{j,k})^2.$$
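A minimal sketch of the projection estimator follows, reusing the hypothetical `trig_basis` from the sketch above: the coefficients are the empirical means $P_n\psi_{j,k}$ and the attained contrast is minus the sum of their squares, as in the display above.

```python
# Projection estimator on the trigonometric model of dimension 2J+1.
import numpy as np

def projection_estimator(X, J, grid=None):
    """Return (empirical coefficients, fitted density on the grid, gamma_n)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 512)
    coeffs = trig_basis(J, X).mean(axis=1)   # P_n(psi_{j,k}) for each basis function
    s_hat = coeffs @ trig_basis(J, grid)     # sum_{(j,k)} P_n(psi_{j,k}) psi_{j,k}
    return coeffs, s_hat, -np.sum(coeffs ** 2)
```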
An oracle depends on the unknown $s$ and on the data, so it is unknown in practice.
In order to validate our procedure, we try to prove:
- non-asymptotic oracle inequalities for the PLSE:
$$\mathbb{E}\|s - \tilde s\|_2^2 \le L \inf_{m\in\mathcal{M}_n}\left\{ \mathbb{E}\|s - \hat s_m\|_2^2 + R(m,n)\right\}, \qquad (10)$$
for some constant $L \ge 1$ (as close to 1 as possible) and a remainder term $R(m,n) \ge 0$,
possibly random, and small compared to $\mathbb{E}\|s-\tilde s\|_2^2$ if possible. This inequality compares
the risk of the PLSE with the best deterministic choice of $m$. Since $\hat m$ is random, we prefer
to prove a stronger form of oracle inequality:
$$\mathbb{E}\|s - \tilde s\|_2^2 \le L\, \mathbb{E}\left(\inf_{m\in\mathcal{M}_n}\left\{\|s - \hat s_m\|_2^2 + R(m,n)\right\}\right); \qquad (11)$$
- pathwise oracle inequalities:
$$\mathbb{P}\left(\|s - \tilde s\|_2^2 > L \inf_{m\in\mathcal{M}_n}\left\{\|s - \hat s_m\|_2^2 + R(m,n)\right\}\right) \le c_n, \qquad (12)$$
where typically $c_n \le C/n^{1+\gamma}$ for some $\gamma > 0$. Inequality (12) proves that, asymptotically,
the risk $\|s - \tilde s\|_2^2$ is almost surely the one of the oracle. Let
$$\Omega = \left\{ \|s - \tilde s\|_2^2 > L \inf_{m\in\mathcal{M}_n}\left\{ \|s - \hat s_m\|_2^2 + R(m,n)\right\}\right\}.$$
We have
$$\mathbb{E}\|s - \tilde s\|_2^2 = \mathbb{E}\left(\|s - \tilde s\|_2^2\, \mathbf{1}_\Omega\right) + \mathbb{E}\left(\|s - \tilde s\|_2^2\, \mathbf{1}_{\Omega^c}\right).$$
It is clear that $\mathbb{E}(\|s-\tilde s\|_2^2\,\mathbf{1}_{\Omega^c}) \le L\,\mathbb{E}\left(\inf_{m\in\mathcal{M}_n}\left\{\|s-\hat s_m\|_2^2 + R(m,n)\right\}\right)$. Moreover, we
have $\|s-\tilde s\|_2^2 = \|s - s_{\hat m}\|_2^2 + \|s_{\hat m} - \tilde s\|_2^2 \le \|s\|_2^2 + \Phi^2 D_{\hat m} \le \|s\|_2^2 + \Phi^2 n$; thus, when (12)
holds, we have
$$\mathbb{E}\left(\|s - \tilde s\|_2^2\,\mathbf{1}_{\Omega}\right) \le (\|s\|_2^2 + \Phi^2 n)\, c_n \le \frac{C}{n^\gamma}.$$
Therefore, inequality (12) implies
$$\mathbb{E}\|s - \tilde s\|_2^2 \le L\,\mathbb{E}\left(\inf_{m\in\mathcal{M}_n}\left\{\|s - \hat s_m\|_2^2 + R(m,n)\right\}\right) + \frac{C}{n^\gamma}.$$
We can derive from these inequalities adaptive rates of convergence of the PLSE on Besov
spaces (see Birgé & Massart [8] for example). In order to achieve this goal, we only have
to prove a weaker form of oracle inequality where the remainder term R(m, n) ≤ LDm /n
for some constant L, for all the models m with sufficiently large dimension. This will be
detailed in Section 5.
Theorem 3.1 Consider a collection of models satisfying [M1], [M2] and [M3]. Assume
that the process $(X_n)_{n\in\mathbb{Z}}$ is strictly stationary and arithmetically [AR] β-mixing with mixing
rate θ > 2 and that its marginal distribution admits a density $s$ with respect to the
Lebesgue measure $\mu$, with $s \in L^2(\mu)$.
Let $\kappa_1$ be the constant defined in (2) and let $\tilde s$ be the PLSE defined by (9) with
$$\mathrm{pen}(m) = \frac{K\Phi^2\kappa_1 D_m}{n}, \qquad\text{where } K > 4.$$
Then, for all κ > 2, there exist $c_0 > 0$, $L_s > 0$, $\gamma_1 > 0$ and a sequence $\epsilon_n \to 0$ such that
$$\mathbb{P}\left( \|\tilde s - s\|_2^2 > (1+\epsilon_n) \inf_{m\in\mathcal{M}_n,\ D_m\ge c_0(\log n)^{\gamma_1}}\left\{ \|s - s_m\|_2^2 + \mathrm{pen}(m)\right\}\right) \le L_s\, \frac{(\log n)^{(\theta+2)\kappa}}{n^{\theta/2}}. \qquad (13)$$
Remark: The term $K\Phi^2\kappa_1$ is the same as in Theorem 3.1 of Comte & Merlevède [13], but
with a constant $K > 4$ instead of 320. The main drawback of this result is that the penalty
term involves the constant $\kappa_1$, which is unknown in practice. However, Theorem 3.1 ensures
that penalties proportional to the linear dimension of $S_m$ lead to efficient model selection
procedures. Thus we can use this information to apply the slope heuristic algorithm introduced
by Birgé & Massart [9] in a Gaussian regression context and generalized by Arlot
& Massart [4] to more general M-estimation frameworks. This algorithm calibrates the
constant in front of the penalty term when the shape of an ideal penalty is available. The
result of Arlot & Massart is proven for independent sequences, in a regression framework,
but it can be generalized to the density estimation framework, for independent as well as
for β- or τ-dependent data. This result is beyond the scope of this chapter and will be
proved in Chapter 4.
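The following sketch illustrates one common implementation of the slope heuristic, under the assumption (not made precise here) that the empirical contrast decreases roughly linearly in $D_m/n$ for the largest models; the function and its parameters are illustrative only.

```python
# Slope heuristic: estimate the minimal-penalty slope by regressing the
# empirical risks of the largest models on D_m / n, then select with twice
# that slope as the penalty constant.
import numpy as np

def slope_heuristic_select(emp_risks, dims, n, frac_large=0.5):
    """emp_risks, dims: 1-d arrays indexed by model. Returns the argmin."""
    order = np.argsort(dims)
    large = order[int(len(order) * frac_large):]       # largest models only
    slope = np.polyfit(dims[large] / n, emp_risks[large], 1)[0]
    pen = 2.0 * abs(slope) * dims / n                  # doubled minimal penalty
    return int(np.argmin(emp_risks + pen))
```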
We have to consider the infimum in equation (13) over the models with sufficiently large
dimensions. However, as noted by Arlot [5] (Remark 9, p. 43), we can take the infimum
over all the models in (13) if we add an extra term. More precisely, we can prove
that, with probability larger than $1 - L_s(\log n)^{(\theta+2)\kappa}/n^{\theta/2}$,
$$\|\tilde s - s\|_2^2 \le (1+\epsilon_n)\inf_{m\in\mathcal{M}_n}\left\{\|s - \hat s_m\|_2^2 + \mathrm{pen}(m)\right\} + L\,\frac{(\log n)^{\gamma_2}}{n}, \qquad (14)$$
where $L > 0$ and $\gamma_2 > 0$.
Remark: The main improvement of Theorem 3.1 is that it gives an oracle inequality
in probability, with a deviation bound of order $o(1/n)$ as soon as θ > 2, instead of θ > 3
in Comte & Merlevède [13]. Moreover, we do not require $s$ to be bounded to prove our
result.
Remark: When the data are independent, the proof of Theorem 3.1 can be used to
show that the estimator $\tilde s$ chosen with a penalty term of order $K\Phi^2 D_m/n$ satisfies an
oracle inequality such as (13). The main difference would be that $\kappa_1 = 1$; thus it can be
used without a slope heuristic (even if this algorithm can also be used in this context to
optimize the constant $K$), and the control of the probability would be $L_s e^{-(\ln n)^2/C_s}$ for
some constants $L_s, C_s$, instead of $L_s(\log n)^{(\theta+2)\kappa} n^{-\theta/2}$ in our theorem.
We assume that our collection $(S_m)_{m\in\mathcal{M}_n}$ satisfies the following assumption:
[W] dyadic wavelet generated spaces: let $J_n = [\log(n/2(A+1))/\log 2]$ and, for all $J_m = 1, \ldots, J_n$, let
$$m = \{(0,k),\ -A_2 < k < 2 - A_1\} \cup \{(j,k),\ 1 \le j \le J_m,\ -A_2 < k < -A_1 + 2^j\}$$
and $S_m$ the linear span of $\{\psi_{j,k}\}_{(j,k)\in m}$. In particular, we have $D_m = (A-1)(J_m+1) + 2^{J_m+1}$
and thus $2^{J_m+1} \le D_m \le (A-1)(J_m+1) + 2^{J_m+1} \le A\,2^{J_m+1}$.
Theorem 4.1 Consider the collection of models [W]. Assume that $(X_n)_{n\in\mathbb{Z}}$ is strictly
stationary and arithmetically [AR] τ-mixing with mixing rate θ > 5 and that its marginal
distribution admits a density $s$ with respect to the Lebesgue measure $\mu$. Let $\tilde s$ be the PLSE
defined by (9) with
$$\mathrm{pen}(m) = K A K_\infty K_{BV}\left(\sum_{l=0}^\infty \tilde\beta_l\right)\frac{D_m}{n}, \qquad\text{where } K \ge 8.$$
Then there exist constants $c_0 > 0$, $\gamma_1 > 0$ and a sequence $\epsilon_n \to 0$ such that
$$\mathbb{E}\|\tilde s - s\|_2^2 \le (1+\epsilon_n)\inf_{m\in\mathcal{M}_n,\ D_m\ge c_0(\log n)^{\gamma_1}}\left\{\|s - s_m\|_2^2 + \mathrm{pen}(m)\right\}. \qquad (18)$$
Remark: As in Theorem 3.1, the penalty term involves an unknown constant and we
have a condition on the dimension of the models in (18). However, the slope heuristic can
also be used in this context to calibrate the constant, and a careful look at the proof shows
that we can take the infimum over all models $m \in \mathcal{M}_n$ provided that we increase the
constant $K$ in front of the penalty term. Our result allows us to derive rates of convergence
in Besov spaces for the PLSE that correspond to the rates in the i.i.d. framework (see
Proposition 5.2).
Remark: Theorem 4.1 gives an oracle inequality for the PLSE built on τ-mixing sequences.
This inequality is not pathwise, and the constants involved in the penalty term
are not optimal. This is due to technical reasons, mainly because we use the coupling
result (4) instead of (3). However, we recover the same kind of oracle inequality as in
the i.i.d. framework (Birgé and Massart [8]) under weak assumptions on the mixing coefficients,
since we only require arithmetical [AR] τ-mixing assumptions on the process
$(X_n)_{n\in\mathbb{Z}}$. To the best of our knowledge, this is the first result for these processes.
Let us mention here Theorem 4.1 in Comte & Merlevède [13]. They consider α-mixing
processes (for a definition of the coefficient α and its properties, we refer to Rio [22]). They
make geometrical [GEO] α-mixing assumptions on the processes and consider penalties
of order $L\log(n)D_m/n$ to get an oracle inequality. This leads to a logarithmic loss in
the rates of convergence. They get the optimal rate under an extra assumption (namely
Assumption [Lip] in Section 3.2). There exist random processes that are τ-mixing and
not α-mixing (see Dedecker & Prieur [15]); however, the comparison of these coefficients
is difficult in general and our method cannot be applied in this context.
The constants $c_0$, $\gamma_1$, $n_o$ are given at the end of the proof.
Remark: Inequality (6) can be improved under stronger assumptions on $s$. For example,
when $s$ is bounded, we have $\tilde\beta_k \le C\sqrt{\tau_k}$. Under this assumption and θ > 3, we can
prove that the estimator $\tilde s$ satisfies the inequality
$$\mathbb{E}\|\tilde s - s\|_2^2 \le (1+\epsilon_n)\inf_{m\in\mathcal{M}_n,\ D_m\ge c_0(\log n)^{\gamma_1}}\left\{\|s - s_m\|_2^2 + \mathrm{pen}(m)\right\} + \frac{(\log n)^{\kappa(\theta+1)}}{n^{(\theta-3)/2}}.$$
When θ < 5, the extra term $(\log n)^{\kappa(\theta+1)}/n^{(\theta-3)/2}$ may be larger than the main term
$\inf_{m\in\mathcal{M}_n,\ D_m\ge c_0(\log n)^{\gamma_1}}\left\{\|s-s_m\|_2^2 + \mathrm{pen}(m)\right\}$. In this case, we do not know whether our control
remains optimal. On the other hand, Proposition 5.2 ensures that $\tilde s$ is adaptive over the
class of Besov balls when θ ≥ 5.
5 Minimax results
5.1 Approximation results on Besov spaces
Besov balls.
Throughout this section, $\Lambda = \{(j,k),\ j\in\mathbb{N},\ k\in\mathbb{Z}\}$ and $\{\psi_{j,k},\ (j,k)\in\Lambda\}$ denotes an
$r$-regular wavelet basis as introduced in Section 4.1. Let α, p be two positive numbers such
that $\alpha + 1/2 - 1/p > 0$. For all functions $t \in L^2(\mu)$, $t = \sum_{(j,k)\in\Lambda} t_{j,k}\psi_{j,k}$, we say that $t$
belongs to the Besov ball $B_{\alpha,p,\infty}(M_1)$ on the real line if $\|t\|_{\alpha,p,\infty} \le M_1$, where
$$\|t\|_{\alpha,p,\infty} = \sup_{j\in\mathbb{N}} 2^{j(\alpha+1/2-1/p)}\left(\sum_{k\in\mathbb{Z}} |t_{j,k}|^p\right)^{1/p}.$$
It is easy to check that, if $p \ge 2$, then $B_{\alpha,p,\infty}(M_1) \subset B_{\alpha,2,\infty}(M_1)$, so that upper bounds on
$B_{\alpha,2,\infty}(M_1)$ yield upper bounds on $B_{\alpha,p,\infty}(M_1)$.
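For illustration, the hypothetical sketch below evaluates this norm from a list of per-level wavelet coefficient arrays; the data layout `coeffs[j]` holding the level-$j$ coefficients $(t_{j,k})_k$ is an assumption.

```python
# Compute ||t||_{alpha,p,infty} = sup_j 2^{j(alpha + 1/2 - 1/p)} * ||(t_{j,k})_k||_p
import numpy as np

def besov_norm(coeffs, alpha, p):
    """coeffs: list of 1-d arrays, coeffs[j] = level-j wavelet coefficients."""
    return max(2.0 ** (j * (alpha + 0.5 - 1.0 / p))
               * np.linalg.norm(np.asarray(c, dtype=float), ord=p)
               for j, c in enumerate(coeffs))
```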
Approximation results on Besov spaces.
We have the following result (Birgé & Massart [8], Section 4.7.1). Suppose that the support
of $s$ equals $[0,1]$ and that $s$ belongs to the Besov ball $B_{\alpha,2,\infty}(1)$; then, whenever $r > \alpha - 1$,
$$\|s - s_m\|_2^2 \le C(\alpha)\, D_m^{-2\alpha}. \qquad (19)$$
Proposition 5.1 Assume that the process $(X_n)_{n\in\mathbb{Z}}$ is strictly stationary and arithmetically
[AR] β-mixing with mixing rate θ > 2, that its marginal distribution admits a density
$s$ with respect to the Lebesgue measure $\mu$, that $s$ is supported in $[0,1]$ and that $s \in L^2(\mu)$.
For all $\alpha, M_1 > 0$, the PLSE $\tilde s$ defined in Theorem 3.1 for the collection of models [W]
satisfies
$$\forall \kappa > 2, \qquad \sup_{s\in B_{\alpha,2,\infty}(M_1)} \mathbb{P}\left(\|\tilde s - s\|_2^2 > L_{M_1,\alpha,\theta}\, n^{-2\alpha/(2\alpha+1)}\right) \le \frac{L_{M_1}(\log n)^{(\theta+2)\kappa}}{n^{\theta/2}}.$$
Proposition 5.2 Assume that the process $(X_n)_{n\in\mathbb{Z}}$ is strictly stationary and arithmetically
[AR] τ-mixing with mixing rate θ > 5, that its marginal distribution admits a density
$s$ with respect to the Lebesgue measure $\mu$, that $s$ is supported in $[0,1]$ and that $s \in L^2(\mu)$.
For all $\alpha, M_1 > 0$, the PLSE $\tilde s$ defined in Theorem 4.1 satisfies
$$\sup_{s\in B_{\alpha,2,\infty}(M_1)} \mathbb{E}\|\tilde s - s\|_2^2 \le L_{M_1,\alpha,\theta}\, n^{-2\alpha/(2\alpha+1)}.$$
Remark: Proposition 5.2 can be compared to Theorem 3.1 in Gannaz & Wintenberger
[18]. They prove near-minimax results for the thresholded wavelet estimator introduced
by Donoho et al. [16] in a φ̃-dependent setting (for a definition of the coefficient φ̃, we
refer to Dedecker & Prieur [15]). Basically, in our notation, their result can be stated
as follows: if $(X_n)_{n\in\mathbb{Z}}$ is φ̃-mixing with $\tilde\phi_1(r) \le Ce^{-ar^b}$ for some constants $C, a, b$, then
the thresholded wavelet estimator $\hat s$ of $s$ satisfies
$$\forall \alpha > 0,\ \forall p > 1, \qquad \sup_{s\in B_{\alpha,p,\infty}(M_1)\cap L^\infty(M)} \mathbb{E}\|\hat s - s\|_2^2 \le L_{M,M_1,\alpha,p}\left(\frac{\log n}{n}\right)^{2\alpha/(2\alpha+1)}.$$
The main advantage of their result is that they can deal with Besov balls with regularity
$1 < p < 2$. However, in the regular case, when $p \ge 2$, we have been able to remove the extra
$\log n$ factor. Moreover, our result only requires arithmetical [AR] rates of convergence for
the mixing coefficients, and we do not have to suppose that $s$ is bounded.
6 Proofs.
6.1 Proofs of the minimax results.
Proof of Proposition 5.1:
Let $\alpha > 0$ and $M_1 > 0$ and assume that $s \in B_{\alpha,2,\infty}(M_1)$. Let $\tilde{\mathcal{M}}_n = \{m\in\mathcal{M}_n,\ D_m > c_0(\log n)^{\gamma_1}\}$. By Theorem 3.1, there exists a constant $L_\theta > 0$ such that
$$\mathbb{P}\left(\|\tilde s - s\|_2^2 > L_\theta \inf_{m\in\tilde{\mathcal{M}}_n}\left\{\|s - s_m\|_2^2 + \frac{D_m}{n}\right\}\right) \le \frac{L_s(\log n)^{(\theta+2)\kappa}}{n^{\theta/2}}. \qquad (20)$$
It appears from the proof of Theorem 3.1 that the constant $L_s$ depends only on $\|s\|_2$ and
that it is a nondecreasing function of $\|s\|_2$, so that $L_s$ can be uniformly bounded over
$B_{\alpha,2,\infty}(M_1)$ by a constant $L_{M_1}$. By (20), we then have, for all $m \in \tilde{\mathcal{M}}_n$,
$$\mathbb{P}\left(\|\tilde s - s\|_2^2 > L_\theta\left\{\|s - s_m\|_2^2 + \frac{D_m}{n}\right\}\right) \le \frac{L_{M_1}(\log n)^{(\theta+2)\kappa}}{n^{\theta/2}}.$$
Since $s$ belongs to $B_{\alpha,2,\infty}(M_1)$, we can use Inequality (19) to get
$$\|s - s_m\|_2^2 \le L_{\alpha,M_1}\, D_m^{-2\alpha}.$$
Choosing $m$ such that $D_m$ is of order $n^{1/(2\alpha+1)}$ balances the two terms, and we obtain
$$\mathbb{P}\left(\|\tilde s - s\|_2^2 > L_{M_1,\alpha,\theta}\, n^{-2\alpha/(2\alpha+1)}\right) \le \frac{L_{M_1}(\log n)^{(\theta+2)\kappa}}{n^{\theta/2}}.$$
Proof of Proposition 5.2:
Let $\alpha > 0$ and $M_1 > 0$ and assume that $s \in B_{\alpha,2,\infty}(M_1)$. By Theorem 4.1, we have
$$\mathbb{E}\|\tilde s - s\|_2^2 \le L_\theta \inf_{m\in\tilde{\mathcal{M}}_n}\left\{\|s - s_m\|_2^2 + \frac{D_m}{n}\right\}.$$
This decomposition is different from the one used in Birgé & Massart [8] and in Comte &
Merlevède [13]. It allows us to improve the constant in the oracle inequality in the β-mixing
case. Moreover, we choose to prove an oracle inequality of the form (12) for β-mixing
sequences, which allows us to assume only θ > 2 instead of θ > 3. Let us now give a sketch
of the proof:
1. We build an event $\Omega_C$ with $\mathbb{P}(\Omega_C^c) \le p\beta_q$ such that, on $\Omega_C$, $\nu_n = \nu_n^*$, where $\nu_n^*$
is built with independent data. A suitable choice of the integers $p$ and $q$ leads to
$p\beta_q \le C(\log n)^r n^{-\theta/2}$.
2. We use the concentration inequality (7.4) of Birgé & Massart [8] for χ²-type statistics,
derived from Talagrand's inequality. This allows us to find $p_1(m)$ such that, on
an event $\Omega_1$ with $\mathbb{P}(\Omega_1^c \cap \Omega_C) \le L_{1,s}\, c_n$,
$$\sup_{m\in\mathcal{M}_n}\{V(m) - p_1(m)\} \le 0.$$
4. We have $\|s_{\hat m} - s_{m_o}\|_2^2 \le \|s_{\hat m} - s\|_2^2 + \|s - s_{m_o}\|_2^2$, because $s_{\hat m} - s_{m_o}$ is either the
projection of $s_{\hat m} - s$ onto $S_{m_o}$ or the projection of $s - s_{m_o}$ onto $S_{\hat m}$. Taking $\mathrm{pen}(m) \ge p_1(m) + \eta p_2(m,m)$, we have, on $\Omega_1\cap\Omega_2\cap\Omega_C$,
$$\begin{aligned}
\|s - \tilde s\|_2^2 &\le \|s - \hat s_{m_o}\|_2^2 - \frac{V_{m_o}}{2} + \mathrm{pen}(m_o) - \frac{V_{m_o}}{2}\\
&\qquad - (\mathrm{pen}(\hat m) - p_1(\hat m)) - (p_1(\hat m) - V(\hat m)) - 2\nu_n(s_{m_o} - s_{\hat m}) &(22)\\
&\le \|s - s_{m_o}\|_2^2 + \mathrm{pen}(m_o) - \frac{V(m_o)}{2} - \eta p_2(\hat m,\hat m) + \eta p_2(\hat m, m_o) + \frac{\|s_{m_o} - s_{\hat m}\|_2^2}{\eta} &(23)
\end{aligned}$$
$$\left(1 - \frac{1}{\eta}\right)\|s - \tilde s\|_2^2 \le \left(1 + \frac{1}{\eta}\right)\|s - s_{m_o}\|_2^2 + \mathrm{pen}(m_o) + \eta p_2(m_o, m_o). \qquad (24)$$
In (23), we used that $V(m_o) = 2\|s_{m_o} - \hat s_{m_o}\|_2^2 \ge 0$. In (24), we used that $V_{m_o} \ge 0$.
Pythagoras' theorem gives
$$\|s - \hat s_{m_o}\|_2^2 - \frac{V(m_o)}{2} = \|s - s_{m_o}\|_2^2 \qquad\text{and}\qquad \|s - s_{\hat m}\|_2^2 \le \|s - \tilde s\|_2^2.$$
Finally, we prove that we can choose $\eta = (\log n)^\gamma$, with $\gamma > 0$, such that $\eta p_2(m_o, m_o) = o(\mathrm{pen}(m_o))$, and we conclude the proof of Theorem 3.1 from the previous inequalities.
We decompose the proof into several claims corresponding to the previous steps.
Claim 1: For all $l = 0,\ldots,p-1$, let us define $A_l = (X_{2lq+1},\ldots,X_{(2l+1)q})$ and $B_l = (X_{(2l+1)q+1},\ldots,X_{(2l+2)q})$. There exist random vectors $A_l^* = (X^*_{2lq+1},\ldots,X^*_{(2l+1)q})$ and $B_l^* = (X^*_{(2l+1)q+1},\ldots,X^*_{(2l+2)q})$ such that, for all $l = 0,\ldots,p-1$:
3. $\mathbb{P}(A_l \ne A_l^*) \le \beta_q$
Proof of Claim 1:
The proof is derived from Berbee's lemma; we refer to Proposition 5.1 in Viennet [25]
for further details about this construction.
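The blocking device itself is easy to implement; the sketch below (illustrative, not the authors' construction) cuts a trajectory into the alternating blocks $A_l$ and $B_l$, which Berbee's lemma then couples with independent copies up to probability $\beta_q$ per block.

```python
# Split the first 2*p*q observations into alternating blocks of length q:
# A-blocks at even positions, B-blocks at odd positions.
import numpy as np

def make_blocks(X, q):
    """Return (A_blocks, B_blocks), each an array of shape (p, q)."""
    p = len(X) // (2 * q)
    blocks = np.asarray(X)[: 2 * p * q].reshape(2 * p, q)
    return blocks[0::2], blocks[1::2]

A, B = make_blocks(np.arange(20), q=3)  # p = 3 blocks of each kind
```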
Hereafter, we assume that, for some κ > 2, $\sqrt n(\log n)^\kappa/2 \le p \le \sqrt n(\log n)^\kappa$ and, for the
sake of simplicity, that $pq = n/2$, the modifications needed to handle the extra term when
$q = [n/(2p)]$ being straightforward. Let $\Omega_C = \{\forall l = 0,\ldots,p-1:\ A_l = A_l^*,\ B_l = B_l^*\}$. We
have
$$\mathbb{P}(\Omega_C^c) \le 2p\beta_q \le 2^{2+\theta}\,\frac{(\log n)^{(\theta+2)\kappa}}{n^{\theta/2}}.$$
Let us first deal with the quadratic term $V(m)$.
Claim 2: Under the assumptions of Theorem 3.1, let ε > 0 and $1 < \gamma < \kappa/2$. We define
$L_1^2 = 2\Phi^2\kappa_1$, $L_2^2 = 8\Phi^{3/2}\sqrt{\kappa_2}$, $L_3 = 2\Phi\kappa(\epsilon)$ and
$$L_{1,m} = 4\left((1+\epsilon)L_1 + L_2\sqrt{\frac{(\log n)^\gamma}{D_m^{1/4}}} + \frac{L_3}{(\log n)^{\kappa-\gamma}}\right)^2. \qquad (25)$$
Then, we have
$$\mathbb{P}\left(\left\{\sup_{m\in\mathcal{M}_n}\left\{V(m) - \frac{L_{1,m}D_m}{n}\right\} \ge 0\right\}\cap\Omega_C\right) \le L_{s,\gamma}\exp\left(-\frac{(\log n)^\gamma}{\|s\|_2^{1/2}}\right),$$
where $L_{s,\gamma} = 2\sum_{D=1}^\infty \exp(-(\log D)^\gamma/\|s\|_2^{1/2})$. In particular, for all $r > 0$, there exists a
constant $L'_{s,r}$, depending on $\|s\|_2$, such that
$$\mathbb{P}\left(\left\{\sup_{m\in\mathcal{M}_n}\left\{V(m) - \frac{L_{1,m}D_m}{n}\right\} \ge 0\right\}\cap\Omega_C\right) \le \frac{L'_{s,r}}{n^r}.$$
Proof of Claim 2:
Let $P_n^*(t) = \frac{1}{n}\sum_{i=1}^n t(X_i^*)$ and $\nu_n^*(t) = (P_n^* - P)t$. We have
$$V(m)\,\mathbf{1}_{\Omega_C} = 2\sum_{(j,k)\in m} (\nu_n^*)^2(\psi_{j,k})\,\mathbf{1}_{\Omega_C}.$$
Let $B_1(S_m) = \{t\in S_m;\ \|t\|_2 \le 1\}$. For all $t\in B_1(S_m)$, let $\bar t(x_1,\ldots,x_q) = \frac{1}{2q}\sum_{i=1}^q t(x_i)$ and,
for all functions $g: \mathbb{R}^q \to \mathbb{R}$, let
$$P_{A,p}^* g = \frac{1}{p}\sum_{j=0}^{p-1} g(A_j^*), \qquad P_{B,p}^* g = \frac{1}{p}\sum_{j=0}^{p-1} g(B_j^*), \qquad \bar P g = \int g\, dP_A,$$
and $\bar\nu_{A,p}\, g = (P_{A,p}^* - \bar P)g$, $\bar\nu_{B,p}\, g = (P_{B,p}^* - \bar P)g$.
Now we have
$$\sum_{(j,k)\in m} (\nu_n^*)^2(\psi_{j,k}) \le 2\sum_{(j,k)\in m} \bar\nu_{A,p}^2\,\bar\psi_{j,k} + 2\sum_{(j,k)\in m} \bar\nu_{B,p}^2\,\bar\psi_{j,k}.$$
In order to handle these terms, we use Proposition 7.4, which is stated in Section 7. Taking
$$B_m^2 = \sum_{(j,k)\in m} \mathrm{Var}(\bar\psi_{j,k}(A_1)), \qquad V_m^2 = \sup_{t\in B_1(S_m)} \mathrm{Var}(\bar t(A_1)), \qquad H_m^2 = \Big\|\sum_{(j,k)\in m} (\bar\psi_{j,k})^2\Big\|_\infty,$$
we have
$$\forall x > 0, \qquad \mathbb{P}\left(\sqrt{\sum_{(j,k)\in m}\bar\nu_{A,p}^2\,\bar\psi_{j,k}} \ge \frac{(1+\epsilon)}{\sqrt p}B_m + V_m\sqrt{\frac{2x}{p}} + \kappa(\epsilon)\frac{H_m x}{p}\right) \le e^{-x}. \qquad (26)$$
Thus
$$B_m^2 = \sum_{(j,k)\in m}\mathrm{Var}(\bar\psi_{j,k}(A_1)) \le \frac{1}{q}\sum_{(j,k)\in m} P(b\,\psi_{j,k}^2) \le \frac{\kappa_1}{q}\Big\|\sum_{(j,k)\in m}\psi_{j,k}^2\Big\|_\infty.$$
From Assumption [M2], $\Big\|\sum_{(j,k)\in m}\psi_{j,k}^2\Big\|_\infty \le \Phi^2 D_m$; thus
$$B_m^2 \le \frac{\Phi^2\kappa_1 D_m}{q}. \qquad (27)$$
From Viennet's and the Cauchy-Schwarz inequalities,
$$H_m^2 = \Big\|\sum_{(j,k)\in m}\bar\psi_{j,k}^2\Big\|_\infty \le \frac{1}{4}\Big\|\sum_{(j,k)\in m}\psi_{j,k}^2\Big\|_\infty \le \frac{\Phi^2 D_m}{4}. \qquad (29)$$
We apply Inequality (26) with $x = ((\log D_m)^\gamma + y_n)/\|s\|_2^{1/2}$ and the evaluations (27), (28)
and (29). Recalling that $1/p \le 2/(\sqrt n(\log n)^\kappa)$, this leads to
$$\mathbb{P}\left(\sum_{(j,k)\in m}\bar\nu_{A,p}^2\,\bar\psi_{j,k} \ge \frac{L_{1,m}D_m}{n}\right) \le \exp\left(-\frac{(\log D_m)^\gamma}{\|s\|_2^{1/2}}\right)\exp\left(-\frac{y_n}{\|s\|_2^{1/2}}\right).$$
Claim 3: We keep the notations $\kappa/2 > \gamma > 1$ and $L_2$ of the proof of Claim 2. For all
$m, m' \in \mathcal{M}_n$ we take
$$L_{m,m'} = 4\left(L_2\sqrt{\frac{(\log n)^\gamma}{(D_m\vee D_{m'})^{1/4}}} + \frac{4\Phi}{3(\log n)^{\kappa-\gamma}}\right)^2; \qquad (30)$$
we have, for all η > 0,
$$\mathbb{P}\left(\sup_{m,m'\in\mathcal{M}_n}\left\{\nu_n^*(s_m - s_{m'}) - \frac{\|s_m - s_{m'}\|_2^2}{2\eta} - \frac{\eta}{2}\,\frac{L_{m,m'}(D_m\vee D_{m'})}{n}\right\} > 0\right) \le L_{s,\gamma}\, e^{-\frac{(\log n)^\gamma}{\|s\|_2^{1/2}}},$$
with $L_{s,\gamma} = 2\sum_{m,m'\in\mathcal{M}_n} e^{-\frac{(\log(D_m\vee D_{m'}))^\gamma}{\|s\|_2^{1/2}}}$.
Remark: The constant $L_{s,\gamma}$ is finite since, for all $x, y > 0$, $(\log(x\vee y))^\gamma \ge ((\log x)^\gamma + (\log y)^\gamma)/2$.
As in Claim 2, when $(L_2/L_1)^8(\log n)^{4(2\kappa-\gamma)} \le D_m \le n$, we have
$$L_{m,m'} \le \left(1 + \frac{2^{3/2}}{3\sqrt{\kappa_1}}(\log n)^{-2(\kappa-2\gamma)}\right)^2 4L_1^2.$$
Proof of Claim 3:
We keep the notations of the proof of Claim 2 and, for $m, m' \in \mathcal{M}_n$, let $t_{m,m'} = (s_m - s_{m'})/\|s_m - s_{m'}\|_2$. We use the inequality $2ab \le a^2\eta^{-1} + b^2\eta$, which holds for all
$a, b \in \mathbb{R}$, $\eta > 0$. This leads to
we have
$$\mathbb{P}\left(\bar\nu_{A,p}(\bar t_{m,m'}) > \sqrt{\frac{L'_{m,m'}(D_m\vee D_{m'})}{4n}}\right) \le \exp\left(-\frac{(\log(D_m\vee D_{m'}))^\gamma}{\|s\|_2^{1/2}}\right)e^{-y_n/\|s\|_2^{1/2}}.$$
The result follows by taking $y_n = (\log n)^\gamma$ and using $2 \le D_m \le n$.
Conclusion of the proof:
Let η > 0 and $\mathrm{pen}'(m) \ge (L_{1,m} + \eta L_{m,m})D_m/n$, where $L_{1,m}$ and $L_{m,m}$ are defined respectively
by (25) and (30). From Claims 1, 2 and 3 and (24), we obtain that, for all $m_o$ and
with probability larger than $1 - L_{s,\theta}(\log n)^{(\theta+2)\kappa}n^{-\theta/2}$,
$$\left(1-\frac{1}{\eta}\right)\|s-\tilde s\|_2^2 \le \left(1+\frac{1}{\eta}\right)\|s-s_{m_o}\|_2^2 + \mathrm{pen}'(m_o) + \eta L_{m_o,m_o}\frac{D_{m_o}}{n}. \qquad (32)$$
Assume that $D_m \ge (L_2/L_1)^8(\log n)^{4(2\kappa-\gamma)}$; then we have, from the preceding remarks,
$$L_{1,m} \le \left(1+\epsilon+\left(1+\frac{2\kappa(\epsilon)}{\sqrt{\kappa_1}}\right)(\log n)^{-(\kappa-2\gamma)}\right)^2 4L_1^2 \quad\text{and}\quad L_{m,m} \le \left(1+\frac{2^{3/2}}{3\sqrt{\kappa_1}}(\log n)^{-2(\kappa-\gamma)}\right)^2 4L_1^2.$$
Take $\eta = (\log n)^{\kappa-\gamma}$; we have $(L_{1,m_o} + \eta L_{m_o,m_o})D_{m_o}/n \le C\,\mathrm{pen}(m_o)$. Fix ε > 0 such that
$(1+\epsilon)^2 < K/4$. Since $\kappa > \gamma$, for $n \ge n_o$ we have $L_{1,m} + \eta L_{m,m} \le KL_1^2$; thus inequality
(13) follows from (32) as soon as $n > n_o$. We remove the condition $n > n_o$ by
enlarging the constant $L_s$ in (13) if necessary.
We keep the notations $\nu_n^*$, $\bar\nu_{A,p}$, $\bar\nu_{B,p}$, $\bar t$ and $B_1(S_m)$ that we introduced in the proof of Theorem 3.1. As in the proof of Theorem 3.1, we assume that, for some κ > 2, $\sqrt n(\log n)^\kappa/2 \le p \le \sqrt n(\log n)^\kappa$ and, for the sake of simplicity, that $pq = n/2$, the modifications needed to
handle the extra term when $q = [n/(2p)]$ being straightforward. We have
$$V(\hat m) = \sum_{(j,k)\in\hat m}\nu_n^2(\psi_{j,k}) \le 2\sum_{(j,k)\in\hat m}(P_n - P_n^*)^2(\psi_{j,k}) + 2\sum_{(j,k)\in\hat m}(\nu_n^*)^2(\psi_{j,k}). \qquad (33)$$
Proof of Claim 2:
$$\begin{aligned}
\mathbb{E}\left(\sum_{(j,k)\in\hat m}(P_n - P_n^*)^2(\psi_{j,k})\right) &\le \mathbb{E}\left(\sup_{m\in\mathcal{M}_n}\sum_{(j,k)\in m}(P_n - P_n^*)^2(\psi_{j,k})\right)\\
&\le \sum_{m\in\mathcal{M}_n}\mathbb{E}\left(\sum_{(j,k)\in m}(P_n - P_n^*)^2(\psi_{j,k})\right)\\
&\le \frac{2}{p^2}\sum_{m\in\mathcal{M}_n}\sum_{l,l'=1}^p\left(g_{A,m}(j,k,l,l') + g_{B,m}(j,k,l,l')\right)
\end{aligned}$$
with
$$g_{A,m}(j,k,l,l') = \sum_{(j,k)\in m}\mathbb{E}\left[\left(\bar\psi_{j,k}(A_l) - \bar\psi_{j,k}(A_l^*)\right)\left(\bar\psi_{j,k}(A_{l'}) - \bar\psi_{j,k}(A_{l'}^*)\right)\right].$$
Since
$$\left|\bar\psi_{j,k}(x) - \bar\psi_{j,k}(y)\right| \le \frac{K_L\, 2^{3j/2}\,|x-y|_q}{2q},$$
we get
$$\begin{aligned}
g_{A,m}(j,k,l,l') &\le \sum_{(j,k)\in m}\mathbb{E}\left[\left|\bar\psi_{j,k}(A_l) - \bar\psi_{j,k}(A_l^*)\right|\,K_L\, 2^{3j/2}\,\frac{|A_{l'} - A_{l'}^*|_q}{2q}\right]\\
&\le \frac{K_L\tau_q}{2}\sum_{(j,k)\in m}2^{3j/2}\sup_{x,y\in\mathbb{R}^q}\left|\bar\psi_{j,k}(x) - \bar\psi_{j,k}(y)\right|\\
&\le \frac{K_L\tau_q}{4}\sum_{j=0}^{J_m}2^{3j/2}\sup_{x,y\in\mathbb{R}}\left\{\sum_{k\in\mathbb{Z}}|\psi_{j,k}(x) - \psi_{j,k}(y)|\right\}\\
&\le \frac{2}{3}AK_LK_\infty\, 2^{2J_m}\tau_q, \qquad\text{since } \Big\|\sum_{k\in\mathbb{Z}}|\psi_{j,k}|\Big\|_\infty \le AK_\infty 2^{j/2}.
\end{aligned}$$
We can do the same computations for the term $g_{B,m}(j,k,l,l')$ and we obtain
$$\mathbb{E}\left(\sum_{(j,k)\in\hat m}\left((P_n - P_n^*)(\psi_{j,k})\right)^2\right) \le L\tau_q\sum_{m\in\mathcal{M}_n}2^{2J_m} \le L\tau_q\, 2^{2J_n} \le L\,\frac{(\log n)^{\kappa(\theta+1)}}{n^{(\theta-3)/2}}.$$
The last inequality comes from $q \ge \sqrt n/(2(\log n)^\kappa)$ and Assumption [AR]; the one before
comes from Assumption [W].
Claim 3: Let us keep the notations of Theorem 4.1, let $u = 6/(7+\theta) < 1/2$ and recall
that κ > 2. Let γ be a real number in $(1, \kappa/2)$. Let
$$L_1^2 = AK_\infty K_{BV}\sum_{l=0}^\infty\tilde\beta_l, \qquad L_2^2 = 2\Phi K_{BV}^u\sum_{k=0}^\infty\tilde\beta_k^u, \qquad L_3 = \kappa(\epsilon)\Phi$$
and
$$L_{1,m} = 4(1+\epsilon)\left((1+\epsilon)L_1 + L_2\sqrt{\frac{(\log D_m)^\gamma}{D_m^{1/2-u}}} + L_3\frac{(\log D_m)^\gamma}{(\log n)^\kappa}\right)^2. \qquad (35)$$
There exists a constant $L_s$ such that
$$\mathbb{E}\left(\sup_{m\in\mathcal{M}_n}\left\{\sum_{(j,k)\in m}(\nu_n^*)^2(\psi_{j,k}) - \frac{L_{1,m}D_m}{n}\right\}\right) \le \frac{L_s}{n}.$$
Proof of Claim 3:
As in the previous section, we use the following decomposition:
$$\sum_{(j,k)\in m}(\nu_n^*)^2(\psi_{j,k}) = \sum_{(j,k)\in m}\left(\bar\nu_{A,p}(\bar\psi_{j,k}) + \bar\nu_{B,p}(\bar\psi_{j,k})\right)^2 \le 2\sum_{(j,k)\in m}\bar\nu_{A,p}^2(\bar\psi_{j,k}) + 2\sum_{(j,k)\in m}\bar\nu_{B,p}^2(\bar\psi_{j,k}).$$
We treat both terms with Proposition 7.4, applied to the random variables $(A_l^*)_{l=0,\ldots,p-1}$
and $(B_l^*)_{l=0,\ldots,p-1}$ and to the class of functions $(\bar\psi_{j,k})_{(j,k)\in m}$. Let
$$B_m^2 = \sum_{(j,k)\in m}\mathrm{Var}\left(\bar\psi_{j,k}(A_1)\right), \qquad V_m^2 = \sup_{t\in B_1(S_m)}\mathrm{Var}(\bar t(A_1)), \qquad H_m^2 = \Big\|\sum_{(j,k)\in m}\bar\psi_{j,k}^2\Big\|_\infty.$$
Let us now evaluate $B_m$, $V_m$ and $H_m$. We have
$$B_m^2 = \frac{1}{(2q)^2}\sum_{(j,k)\in m}\mathrm{Var}\left(\sum_{i=1}^q\psi_{j,k}(X_i)\right).$$
From (17) and (15), we have, for all $j, k$, $\|\psi_{j,k}\|_{BV} \le K_{BV}2^{j/2}$ and, for all $j$, $\|\sum_{k\in\mathbb{Z}}|\psi_{j,k}|\|_\infty \le AK_\infty 2^{j/2}$.
Thus, from Inequality (5),
$$\begin{aligned}
\sum_{(j,k)\in m}\mathrm{Var}\left(\sum_{i=1}^q\psi_{j,k}(X_i)\right) &\le 2\sum_{(j,k)\in m}\sum_{l=1}^q(q+1-l)\left|\mathrm{Cov}(\psi_{j,k}(X_1), \psi_{j,k}(X_l))\right|\\
&\le 2q\sum_{j=0}^{J_m}\sum_{k\in\mathbb{Z}}\sum_{l=1}^q\|\psi_{j,k}\|_{BV}\,\mathbb{E}\left(|\psi_{j,k}(X_1)|\,b(\sigma(X_1), X_l)\right)\\
&\le 2K_{BV}\,q\sum_{j=0}^{J_m}2^{j/2}\Big\|\sum_{k\in\mathbb{Z}}|\psi_{j,k}(X_0)|\Big\|_\infty\sum_{l=1}^q\tilde\beta_{l-1}\\
&\le 2q\left(AK_\infty K_{BV}\sum_{l=0}^\infty\tilde\beta_l\right)D_m.
\end{aligned}$$
Thus
$$B_m^2 \le \frac{L_1^2 D_m}{2q}. \qquad (37)$$
Since $t$ belongs to $B_1(S_m)$, we have $t = \sum_{(j,k)\in m}a_{j,k}\psi_{j,k}$ with $\sum_{(j,k)\in m}a_{j,k}^2 \le 1$. Thus,
by the Cauchy-Schwarz inequality,
$$\begin{aligned}
\sum_{i=1}^l|t(x_{i+1}) - t(x_i)| &\le \sum_{(j,k)\in m}|a_{j,k}|\sum_{i=1}^l|\psi_{j,k}(x_{i+1}) - \psi_{j,k}(x_i)|\\
&\le \left(\sum_{(j,k)\in m}a_{j,k}^2\right)^{1/2}\left(\sum_{(j,k)\in m}\Big(\sum_i|\psi_{j,k}(x_{i+1}) - \psi_{j,k}(x_i)|\Big)^2\right)^{1/2}\\
&\le \left(\sum_{(j,k)\in m}\|\psi_{j,k}\|_{BV}^2\right)^{1/2} \le K_{BV}D_m.
\end{aligned}$$
Thus $\|t\|_{BV} \le K_{BV}D_m$. From Assumption [M2], we have $\|t\|_\infty \le \Phi\sqrt{D_m}$. Thus
$$|\mathrm{Cov}(t(X_1), t(X_k))| \le \Phi K_{BV}\tilde\beta_{k-1}D_m^{3/2}. \qquad (39)$$
Moreover, we have, by the Cauchy-Schwarz inequality and [M2],
$$|\mathrm{Cov}(t(X_1), t(X_k))| \le \|t\|_\infty\|t\|_2\|s\|_2 \le \Phi\|s\|_2\sqrt{D_m}. \qquad (40)$$
Let
$$a = \Phi K_{BV}\tilde\beta_{k-1}D_m^{3/2}, \qquad b = \Phi\|s\|_2\sqrt{D_m}, \qquad u = \frac{6}{7+\theta} < \frac{1}{2}.$$
From (39) and (40), bounding the covariance by $a^u b^{1-u}$, we derive that
$$|\mathrm{Cov}(t(X_1), t(X_k))| \le L'_k D_m^{1/2+u}, \qquad (41)$$
where $L'_k = \Phi K_{BV}^u\tilde\beta_{k-1}^u\|s\|_2^{1-u}$.
Finally,
$$H_m^2 \le \frac{1}{4}\Big\|\sum_{(j,k)\in m}\psi_{j,k}^2\Big\|_\infty \le \frac{\Phi^2 D_m}{4}. \qquad (42)$$
Let $y > 0$ and let us apply Inequality (36) with $x = \frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}} + \frac{y}{D_m^{1/2+u}}$.
We have, from (37), (41) and (42),
$$\mathbb{P}\Bigg(\sum_{(j,k)\in m}(\bar\nu_{A,p})^2(\bar\psi_{j,k}) > (1+\epsilon)\Bigg(L_1\sqrt{\frac{2D_m}{2pq}} + \sqrt{\frac{L_2^2\|s\|_2^{1-u}D_m^{1/2+u}}{2pq}\left(\frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}} + \frac{y}{D_m^{1/2+u}}\right)}\\ + \frac{L_3\sqrt{D_m}}{2p}\left(\frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}} + \frac{y}{D_m^{1/2+u}}\right)\Bigg)^2\Bigg) \le e^{-\frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}}}\,e^{-D_m^{-(1/2+u)}y}.$$
Then, we use the inequality $\sqrt{\alpha+\beta} \le \sqrt\alpha + \sqrt\beta$ with
$$\alpha = \frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}} \qquad\text{and}\qquad \beta = \frac{y}{D_m^{1/2+u}},$$
and the inequality $(a+b)^2 \le (1+\epsilon)a^2 + (1+\epsilon^{-1})b^2$ with
$$a = \left((1+\epsilon)L_1 + L_2\sqrt{\frac{(\log D_m)^\gamma}{D_m^{1/2-u}}} + L_3\frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}(\log n)^\kappa}\right)\sqrt{\frac{D_m}{n}}$$
and
$$b = \frac{1}{\sqrt n}\left(L_2\sqrt{\|s\|_2^{1-u}\,y} + \frac{L_3\, y}{(\log n)^\kappa D_m^u}\right).$$
Thus, for all $y > 0$,
$$\mathbb{P}\left(\sup_{m\in\mathcal{M}_n}\left\{\sum_{(j,k)\in m}(\bar\nu_{A,p})^2(\bar\psi_{j,k}) - \frac{L_{1,m}D_m}{n}\right\} > \frac{L_s}{n}(y+y^2)\right) \le \sum_{m\in\mathcal{M}_n}e^{-\frac{(\log D_m)^\gamma}{\|s\|_2^{1-u}}}\,e^{-D_m^{-(1/2+u)}y},$$
where $L_s = 2(1+\epsilon^{-1})\left(\left(L_2\sqrt{\|s\|_2^{1-u}}\right)\vee\left(L_3/((\log 2)^\kappa 2^u)\right)\right)^2$. We can integrate this last
inequality to prove Claim 3.
Claim 4: There exists a constant $L_{s,\theta}$, depending on $\|s\|_2$ and θ, such that, for all η > 0,
$$\mathbb{E}\left(\sup_{m,m'\in\mathcal{M}_n}\left\{\nu_n(s_m - s_{m'}) - \frac{\|s_m - s_{m'}\|_2^2}{2\eta} - \eta\frac{L_2(m,m')(D_m\vee D_{m'})}{n}\right\}\right) \le \frac{\eta L_{s,\theta}}{n}.$$
Proof of Claim 4:
$$\begin{aligned}
&\mathbb{E}\left(\sup_{m,m'\in\mathcal{M}_n}\left\{\nu_n(s_m - s_{m'}) - \frac{\|s_m - s_{m'}\|_2^2}{2\eta} - \eta\frac{L_2(m,m')(D_m\vee D_{m'})}{n}\right\}\right)\\
&\qquad \le \mathbb{E}\left(\sup_{m,m'}(P_n - P_n^*)(s_m - s_{m'})\right)\\
&\qquad\quad + \mathbb{E}\left(\sup_{m,m'}\left\{\nu_n^*(s_m - s_{m'}) - \frac{\|s_m - s_{m'}\|_2^2}{2\eta} - \eta\frac{L_2(m,m')(D_m\vee D_{m'})}{n}\right\}\right). \qquad (44)
\end{aligned}$$
Let us fix $j \in [J_m + 1, J_{m'}]$. From Assumption [W], there are fewer than $A$ indices $k \in \mathbb{Z}$
such that $\psi_{j,k}(x) \ne 0$; thus there are fewer than $2A$ indices such that $|\psi_{j,k}(x) - \psi_{j,k}(y)| \ne 0$.
Hence
$$\sum_{k\in\mathbb{Z}}|P\psi_{j,k}|\,\frac{|\psi_{j,k}(x) - \psi_{j,k}(y)|}{|x-y|} \le 2A\sup_{k\in\mathbb{Z}}\left\{|P\psi_{j,k}|\,\mathrm{Lip}(\psi_{j,k})\right\} \le 2A\|s\|_2 K_L\, 2^{3j/2}.$$
Thus $\mathrm{Lip}(s_m - s_{m'}) \le A\|s\|_2 K_L\sqrt 8\, 2^{3J_{m'}/2}/(\sqrt 8 - 1)$ and, by Assumptions [W], [AR] and
the value of $q$,
$$\mathbb{E}\left(\sup_{m,m'}(P_n - P_n^*)(s_m - s_{m'})\right) \le L_s\, n^{3/2}(\log n)\tau_q \le L_s\,\frac{(\log n)^{\kappa(\theta+1)+1}}{n^{(\theta-2)/2}}. \qquad (45)$$
where, as in the proof of Theorem 3.1, $t_{m,m'} = (s_m - s_{m'})/\|s_m - s_{m'}\|_2$. We apply
Bernstein's inequality to the function $\bar t_{m,m'}$ and the variables $A_l^*$; we have
$$\forall x > 0, \qquad \mathbb{P}\left(\bar\nu_{A,p}(\bar t_{m,m'}) > \sqrt{\frac{2\mathrm{Var}(\bar t_{m,m'}(A_0))\,x}{p}} + \frac{\|\bar t_{m,m'}\|_\infty x}{3p}\right) \le e^{-x}. \qquad (47)$$
Since $\|t_{m,m'}\|_\infty \le \Phi\sqrt{D_m\vee D_{m'}}$, we have
$$\mathrm{Cov}(t_{m,m'}(X_1), t_{m,m'}(X_{k+1})) \le \Phi K_{BV}\tilde\beta_k(D_m\vee D_{m'})^{3/2}. \qquad (48)$$
Moreover, we have
$$\mathrm{Cov}(t_{m,m'}(X_1), t_{m,m'}(X_{k+1})) \le \|t_{m,m'}\|_\infty\|t_{m,m'}\|_2\|s\|_2 \le \Phi\|s\|_2\sqrt{D_m\vee D_{m'}}. \qquad (49)$$
Thus
$$\mathrm{Var}(\bar t_{m,m'}(A_0)) \le \left(\Phi K_{BV}^u\sum_{k=0}^\infty\tilde\beta_k^u\right)\|s\|_2^{1-u}\,\frac{(D_m\vee D_{m'})^{1/2+u}}{2q}. \qquad (50)$$
Moreover,
$$\|\bar t_{m,m'}\|_\infty \le \frac{1}{2}\|t_{m,m'}\|_\infty \le \frac{1}{2}\Phi\sqrt{D_m\vee D_{m'}}. \qquad (51)$$
Now, we use (47) with $x = (\log(D_m\vee D_{m'}))^\gamma/\|s\|_2^{1-u} + y/(D_m\vee D_{m'})^{1/2+u}$. From (50)
and (51), we have, for all $y > 0$,
$$\mathbb{P}\Bigg(\bar\nu_{A,p}(\bar t_{m,m'}) > L_2\sqrt{\frac{(D_m\vee D_{m'})^{1/2+u}\|s\|_2^{1-u}}{2pq}\left((\log(D_m\vee D_{m'}))^\gamma + \frac{y}{(D_m\vee D_{m'})^{1/2+u}}\right)}\\ + \frac{\Phi\sqrt{D_m\vee D_{m'}}}{6p}\left(\frac{(\log(D_m\vee D_{m'}))^\gamma}{\|s\|_2^{1-u}} + \frac{y}{(D_m\vee D_{m'})^{1/2+u}}\right)\Bigg) \le e^{-\frac{(\log(D_m\vee D_{m'}))^\gamma}{\|s\|_2^{1-u}}}\,e^{-\frac{y}{(D_m\vee D_{m'})^{1/2+u}}}.$$
Now we use the inequality $\sqrt{a+b} \le \sqrt a + \sqrt b$ with
$$a = (\log(D_m\vee D_{m'}))^\gamma \qquad\text{and}\qquad b = \frac{\|s\|_2^{1-u}\,y}{(D_m\vee D_{m'})^{1/2+u}},$$
with
$$L_2(m,m') = \left(L_2\sqrt{\frac{(\log(D_m\vee D_{m'}))^\gamma}{(D_m\vee D_{m'})^{1/2-u}}} + \frac{\Phi(\log(D_m\vee D_{m'}))^\gamma}{3(\log n)^\kappa}\right)^2$$
and
$$L_s = \left(L_2\sqrt{\|s\|_2^{1-u}}\right)\vee\frac{\Phi}{3(\log 2)^\kappa 2^u}.$$
Thus, we obtain
$$\mathbb{P}\left((\bar\nu_{A,p}\bar t_{m,m'})^2 > 2\,\frac{L_2(m,m')(D_m\vee D_{m'})}{n} + 4\,\frac{L_s^2}{n}(y+y^2)\right) \le e^{-\frac{(\log(D_m\vee D_{m'}))^\gamma}{\|s\|_2^{1-u}}}\,e^{-\frac{y}{(D_m\vee D_{m'})^{1/2+u}}}.$$
The same result holds for $\bar\nu_{B,p}\bar t_{m,m'}$. Thus we obtain from (46)
We deduce that
$$\mathbb{P}\Bigg(\exists m, m'\in\mathcal{M}_n:\ \nu_n^*(s_m - s_{m'}) - \frac{\|s_m - s_{m'}\|_2^2}{2\eta} - 4\eta\,\frac{L_2(m,m')(D_m\vee D_{m'})}{n} \ge 8\eta\,\frac{L_s^2}{n}(y+y^2)\Bigg) \le 2\sum_{m,m'\in\mathcal{M}_n}e^{-\frac{(\log(D_m\vee D_{m'}))^\gamma}{\|s\|_2^{1-u}}}\,e^{-\frac{y}{(D_m\vee D_{m'})^{1/2+u}}}.$$
We use the inequality $(a+b)^2 \le (1+\epsilon)a^2 + (1+\epsilon^{-1})b^2$ to obtain (53). Moreover, we have
$$L_2(m,m) \le 4L_1^2\left(1 + \frac{\Phi}{6L_1}(\log n)^{-(\kappa-\gamma)}\right)^2.$$
As in the proof of Theorem 3.1, we take $\eta = (\log n)^{\kappa-\gamma}$ and we fix ε sufficiently small. For
$n \ge n_o$, we have $2L_{1,m} + \eta L_2(m,m) < KL_1^2$. Thus inequality (18) follows from (52).
7 Appendix
This section is devoted to technical lemmas that are needed in the proofs.
7.2 Concentration inequalities
We sum up in this section the concentration inequalities we used in the proofs. We begin
with Bernstein’s inequality
Now we give the most important tool of our proofs: a concentration inequality for
the supremum of the empirical process over a class of functions. We give here the version
of Bousquet [10].
We can deduce from this theorem a concentration inequality for χ²-type statistics.
This is Proposition 7.3 of Massart [20].
Proposition 7.4 Let $X_1, \ldots, X_n$ be independent and identically distributed random variables
valued in some measurable space $(\mathcal{X}, \mathcal{X})$. Let $P$ denote their common distribution.
Let $\{\phi_\lambda\}_{\lambda\in\Lambda}$ be a finite family of measurable and bounded functions on $(\mathcal{X}, \mathcal{X})$. Let
$$H_\Lambda^2 = \Big\|\sum_{\lambda\in\Lambda}\phi_\lambda^2\Big\|_\infty, \qquad B_\Lambda^2 = \sum_{\lambda\in\Lambda}\mathrm{Var}(\phi_\lambda(X_1)), \qquad V_\Lambda^2 = \sup_{a\in S_\Lambda}\mathrm{Var}\left(\sum_{\lambda\in\Lambda}a_\lambda\phi_\lambda(X_1)\right).$$
Proof:
Following Massart [20], Proposition 7.3, we remark that, by the Cauchy-Schwarz inequality,
$$\left(\sum_{\lambda\in\Lambda}\nu_n^2\phi_\lambda\right)^{1/2} = \sup_{a\in S_\Lambda}\sum_{\lambda\in\Lambda}a_\lambda\nu_n\phi_\lambda = \sup_{a\in S_\Lambda}\nu_n\left(\sum_{\lambda\in\Lambda}a_\lambda\phi_\lambda\right).$$
Thus the result follows by applying Talagrand's theorem to the class of functions
$$\mathcal{F} = \left\{t = \sum_{\lambda\in\Lambda}a_\lambda\phi_\lambda;\ a\in S_\Lambda\right\}.$$
References
[1] H. Akaike. Information theory and an extension of the maximum likelihood principle.
In Second International Symposium on Information Theory (Tsahkadsor, 1971), pages
267–281. Akadémiai Kiadó, Budapest, 1973.
[2] Hirotugu Akaike. Statistical predictor identification. Ann. Inst. Statist. Math.,
22:203–217, 1970.
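[3] Donald W. K. Andrews. Non-strong mixing autoregressive processes. J. Appl. Probab., 21(4):930–934, 1984.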
[4] S. Arlot and P. Massart. Data-driven calibration of penalties for least squares regression. Submitted to Journal of Machine Learning Research, 2008.
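[6] Yannick Baraud, Fabienne Comte, and Gabrielle Viennet. Adaptive estimation in autoregression or β-mixing regression via model selection. Ann. Statist., 29(3):839–875, 2001.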
[7] Henry C. P. Berbee. Random walks with stationary increments and renewal theory,
volume 112 of Mathematical Centre Tracts. Mathematisch Centrum, Amsterdam,
1979.
[8] Lucien Birgé and Pascal Massart. From model selection to adaptive estimation. In
Festschrift for Lucien Le Cam, pages 55–87. Springer, New York, 1997.
[9] Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection.
Probab. Theory Related Fields, 138(1-2):33–73, 2007.
[10] Olivier Bousquet. A Bennett concentration inequality and its application to suprema
of empirical processes. C. R. Math. Acad. Sci. Paris, 334(6):495–500, 2002.
[11] Richard C. Bradley. Introduction to strong mixing conditions. Vol. 1. Kendrick Press,
Heber City, UT, 2007.
[13] Fabienne Comte and Florence Merlevède. Adaptive estimation of the stationary den-
sity of discrete and continuous time mixing processes. ESAIM Probab. Statist., 6:211–
238 (electronic), 2002. New directions in time series analysis (Luminy, 2001).
[14] Jérôme Dedecker, Paul Doukhan, Gabriel Lang, José Rafael León R., Sana Louhichi,
and Clémentine Prieur. Weak dependence: with examples and applications, volume
190 of Lecture Notes in Statistics. Springer, New York, 2007.
[15] Jérôme Dedecker and Clémentine Prieur. New dependence coefficients. Examples and
applications to statistics. Probab. Theory Related Fields, 132(2):203–236, 2005.
[16] David L. Donoho, Iain M. Johnstone, Gérard Kerkyacharian, and Dominique Picard.
Density estimation by wavelet thresholding. Ann. Statist., 24(2):508–539, 1996.
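[17] Paul Doukhan. Mixing: Properties and examples, volume 85 of Lecture Notes in Statistics. Springer-Verlag, New York, 1994.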
[18] Irène Gannaz and Olivier Wintenberger. Adaptive density estimation under depen-
dence. forthcoming in ESAIM, Probab. and Statist., 2008.
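[19] C. L. Mallows. Some comments on Cp. Technometrics, 15:661–675, 1973.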
[20] Pascal Massart. Concentration inequalities and model selection, volume 1896 of Lec-
ture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer
School on Probability Theory held in Saint-Flour, July 6–23, 2003, With a foreword
by Jean Picard.
[21] C. Prieur. Change point estimation by local linear smoothing under a weak depen-
dence condition. Math. Methods Statist., 16(1):25–41, 2007.
[22] Emmanuel Rio. Théorie asymptotique des processus aléatoires faiblement dépendants,
volume 31 of Mathématiques & Applications (Berlin) [Mathematics & Applications].
Springer-Verlag, Berlin, 2000.
[23] Mats Rudemo. Empirical choice of histograms and kernel density estimators. Scand.
J. Statist., 9(2):65–78, 1982.
[24] Michel Talagrand. New concentration inequalities in product spaces. Invent. Math.,
126(3):505–563, 1996.
[25] Gabrielle Viennet. Inequalities for absolutely regular sequences: application to density
estimation. Probab. Theory Related Fields, 107(4):467–492, 1997.
[26] V. A. Volkonskiĭ and Yu. A. Rozanov. Some limit theorems for random functions. I. Teor. Veroyatnost. i Primenen., 4:186–207, 1959.