
Non-Asymptotic Theory of Random Matrices

Lecture 8: DUDLEY’S INTEGRAL INEQUALITY

Lecturer: Roman Vershynin Scribe: Igor Rumanov

Tuesday, January 30, 2007

Let A be an m × n matrix with i.i.d. entries, m > n.

We want to estimate

E sup_{x ∈ S^{n−1}} | ‖Ax‖_2 − √m | ≤ ?

(we want C√n in place of "?" here; this would be a better estimate than the bound C(√m + √n) from asymptotic theory). Under the absolute value sign stands a random variable; in fact, a whole family of random variables indexed by the points x ∈ S^{n−1} of the sphere, i.e. a random process.
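As a sanity check (not part of the lecture), the quantity above can be simulated. A minimal sketch assuming Gaussian entries; the sizes m = 400, n = 100 and the trial count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 400, 100          # arbitrary sizes with m > n
trials = 20

devs = []
for _ in range(trials):
    A = rng.standard_normal((m, n))  # i.i.d. N(0, 1) entries
    s = np.linalg.svd(A, compute_uv=False)
    # sup over x in S^{n-1} of | ||Ax||_2 - sqrt(m) |: since ||Ax||_2 ranges
    # over [s_min, s_max], the sup is attained at an extreme singular vector.
    devs.append(max(s[0] - np.sqrt(m), np.sqrt(m) - s[-1]))

print(np.mean(devs) / np.sqrt(n))  # stays O(1), consistent with a C*sqrt(n) bound
```

The printed ratio hovering near 1 is what a bound of the form C√n predicts.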

Random process:
(X_t)_{t∈T} is a collection of random variables indexed by t ∈ T.

• Classical: T = [a, b] is a time interval. Examples of such processes include Lévy processes (e.g. Brownian motion).
• General: T is arbitrary, such as T = S^{n−1}.
• "Size of the random process":

E sup_{t ∈ T} X_t    (the index set has to be compact).

How far can a particle get in time T? (Ex.: the highest level of water in a river in 10 years.)
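For intuition, the "size" E sup_t X_t can be estimated by Monte Carlo. A minimal sketch for Brownian motion on T = [0, 1] (grid resolution and path count are arbitrary; the exact value √(2/π) ≈ 0.798 is classical, via the reflection principle):

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_paths = 2000, 4000
dt = 1.0 / n_steps

# Simulate Brownian paths on [0, 1] and average the running maximum.
increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
paths = np.cumsum(increments, axis=1)
est = np.maximum(paths.max(axis=1), 0.0).mean()  # sup includes X_0 = 0

print(est)  # close to sqrt(2/pi) ≈ 0.798, slightly low due to discretization
```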

Previous approach: discretization of T.

Consider an ε-net N of T (i.e. cover T by ε-balls), compute E sup_{t ∈ N} X_t, and approximate.

Definition 1 (Covering numbers). Let (T, d) be a compact metric space and ε > 0. The covering number N(T, ε) is the minimal cardinality of an ε-net of T, i.e. the minimum possible number of ε-balls needed to cover T.

This is a measure of the compactness of T: log N(T, ε) is called the metric entropy of T.
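A greedy procedure produces an ε-net of any finite metric space, and hence an upper bound on N(T, ε). A small illustrative sketch (the function name and the grid approximation of T = [0, 1] are our choices):

```python
import numpy as np

def greedy_net(points, eps):
    """Greedily pick centers until every point lies within eps of some center.
    The result is an eps-net, so its size upper-bounds N(T, eps)."""
    centers = []
    uncovered = list(points)
    while uncovered:
        c = uncovered[0]
        centers.append(c)
        uncovered = [p for p in uncovered if abs(p - c) > eps]
    return centers

T = np.linspace(0, 1, 1001)      # fine grid approximating T = [0, 1]
for eps in (0.5, 0.25, 0.125):
    # greedy uses on the order of 1/eps balls; the true N([0,1], eps) is ceil(1/(2*eps))
    print(eps, len(greedy_net(T, eps)))
```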

Sharper approach: multiscale discretization.

Cover T progressively with balls of radius ε_k, k = 1, 2, 3, ... The result will be

Dudley’s Integral Inequality

Assumptions:
1) EX_t = 0 for all t.
2) The increments |X_t − X_s| are proportional to the distance d(t, s): the ratio |X_t − X_s| / d(t, s) is subgaussian for all t, s, i.e.

P( |X_t − X_s| > u · d(t, s) ) ≤ C e^{−cu²}   for u > 0

("subgaussian increments"; here C and c are absolute constants).
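For a concrete instance, a standard normal variable satisfies this subgaussian tail condition with C = 2, c = 1/2, which can be checked numerically via the complementary error function:

```python
import math

# For a standard normal Z, P(|Z| > u) = erfc(u / sqrt(2)), and the classical
# bound P(|Z| > u) <= 2 * exp(-u**2 / 2) gives the subgaussian tail
# (C = 2, c = 1/2 in the lecture's notation).
for u in (0.5, 1.0, 2.0, 4.0):
    tail = math.erfc(u / math.sqrt(2))
    bound = 2 * math.exp(-u**2 / 2)
    print(u, tail, bound, tail <= bound)
```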

Theorem 2 (Dudley [1, 2]). For a process with subgaussian increments,

E sup_{t ∈ T} X_t ≤ C ∫_0^∞ √(log N(T, ε)) dε.

The left-hand side is probabilistic; the right-hand side is geometric (in T).

(One can replace the upper limit ∞ of the integral with diam(T); the singularity of the integrand is at ε = 0. For the sphere, N(T, ε) ≈ (1/ε)^n. Recall that √(log x) is the inverse of the function e^{x²}.)
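To see the theorem in action for the sphere: with the standard volumetric bound N(S^{n−1}, ε) ≤ (3/ε)^n for ε ≤ 1, Dudley's integral evaluates to C√n. A numeric sketch of the √n scaling (a midpoint Riemann sum avoids the singularity at ε = 0; the constant 3 is the usual volumetric choice):

```python
import math

def dudley_integral(n, m=100000):
    # Midpoint Riemann sum of  ∫_0^1 sqrt(log N(S^{n-1}, eps)) d(eps),
    # using the volumetric bound N <= (3/eps)^n, i.e. sqrt(n * log(3/eps)).
    h = 1.0 / m
    return sum(math.sqrt(n * math.log(3 / ((j + 0.5) * h))) * h for j in range(m))

I4, I16 = dudley_integral(4), dudley_integral(16)
print(I4, I16, I16 / I4)  # quadrupling n doubles the integral: ratio ≈ 2
```

The ratio is 2 because the integrand factors as √n times a function of ε alone, so the whole bound scales like √n, matching the desired estimate for random matrices.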

Proof: Let diam(T) = 1. (Exercise: the general case.)

1) Let t_0 ∈ T be an arbitrary reference point. Then

E sup_{t ∈ T} X_t = E sup_{t ∈ T} (X_t − EX_{t_0}) ≤ E sup_{t ∈ T} (X_t − X_{t_0}),

by Jensen's inequality, because sup is a convex function.

2) Multiscale discretization of T: CHAINING.

(Figure: t_0, π_1(t), π_2(t) approximating a point t in T.)

Step 1: Let N_1 be a (1/2)-net of T of size N_1 = N(T, 1/2), and find π_1(t) ∈ N_1 nearest to t. Then

X_t − X_{t_0} = (X_t − X_{π_1(t)}) + (X_{π_1(t)} − X_{t_0}).

The first increment is smaller than before (the distance is at most 1/2); the second involves at most N_1 such random variables (not too many).

Step 2: Let N_2 be a (1/4)-net of T of size N_2 = N(T, 1/4), and find π_2(t) ∈ N_2 nearest to t. Then

X_t − X_{t_0} = (X_t − X_{π_2(t)}) + (X_{π_2(t)} − X_{π_1(t)}) + (X_{π_1(t)} − X_{t_0}).

The first increment is even smaller (distance at most 1/4); there are at most N_1 N_2 ≤ N_2² increments of the middle type.
.......................................................................................

Step k: Let N_k be a 2^{−k}-net of T of size N_k = N(T, 2^{−k}), and find π_k(t) ∈ N_k nearest to t.

.......................................................................................

Passing to the limit gives the chaining identity (with π_0(t) = t_0):

X_t − X_{t_0} = Σ_{k=1}^∞ ( X_{π_k(t)} − X_{π_{k−1}(t)} ),

which holds because X_t − X_{π_k(t)} → 0 a.s. (Exercise: use π_k(t) → t.)

Nice properties of the multiscale discretization:

1) The increments are small: by the triangle inequality (picture π_k(t) and π_{k−1}(t) in balls of radii 2^{−k} and 2^{−(k−1)} around t),

d(π_k(t), π_{k−1}(t)) ≤ d(π_k(t), t) + d(π_{k−1}(t), t) ≤ 2^{−k} + 2^{−(k−1)} = 3 · 2^{−k}.

2) There are at most N_k N_{k−1} ≤ N_k² pairs (π_k(t), π_{k−1}(t)), whatever t is.

Increments: for any a_k > 0,

P( |X_{π_k(t)} − X_{π_{k−1}(t)}| > u · a_k ) ≤ C exp( −c u² a_k² / d(π_k(t), π_{k−1}(t))² ) ≤ C exp( −c′ · 2^{2k} u² a_k² ).

Thus we can bound every increment in the chaining identity: the failure (to bound) probability is

p = P( ∃k, ∃t ∈ T : |X_{π_k(t)} − X_{π_{k−1}(t)}| > u · a_k ) ≤ Σ_{k=1}^∞ N_k² · C exp( −c · 2^{2k} u² a_k² ).

In case of success, i.e. if for all k and all t ∈ T

|X_{π_k(t)} − X_{π_{k−1}(t)}| ≤ u a_k,

then |X_t − X_{t_0}| ≤ u Σ_k a_k. Hence

P( sup_t |X_t − X_{t_0}| > u Σ_k a_k ) ≤ p.   (*)

5
P
It remains to choose weights ak . We have tradeoff here: we want ak to
be small, but for decreasing failure probability ak have to be large.
How large ? Say, for u ≥ 1 we want the summands in p be ∼ 2−k . Therefore
q
ak = c′ · 2−k log 2k Nk2 (f or u ≥ 1).

Then

X ∞
X
2 2
p ≤ CNk2 · (2k Nk2 )−u ≤ C 2−ku .
k=1 k=1

2
So subgaussian failure probability obeys the bound p ≤ C · 2−u .
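The last sum is a geometric series; for u ≥ 1 it is at most 2 · 2^{−u²}, since the ratio 2^{−u²} is at most 1/2. A quick numeric check of the closed form against this bound:

```python
import math

# For u >= 1:  sum_{k>=1} 2^(-k u^2) = r / (1 - r)  with  r = 2^(-u^2),
# and r <= 1/2 gives  r / (1 - r) <= 2r, i.e. the bound C * 2^(-u^2).
for u in (1.0, 1.5, 2.0, 3.0):
    r = 2.0 ** (-u * u)
    series = r / (1 - r)                                  # closed form
    partial = sum(2.0 ** (-k * u * u) for k in range(1, 200))
    print(u, partial, series, series <= 2 * r)
```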

This choice also gives an estimate for the sum of weights that appears in (*):

Σ_k a_k = c′ Σ_k 2^{−k} √( log(2^k N_k²) )
    ≤ c″ ( Σ_k 2^{−k} √(log 2^k) + Σ_k 2^{−k} √(log N_k) )   (using √(a + b) ≤ √a + √b)
    ≤ C′′′ Σ_k 2^{−k} √(log N_k),

where the last step holds because the first sum is bounded above by an absolute constant, while the second is bounded below by a positive constant (diam T = 1 forces N_1 ≥ 2), so the first sum can be absorbed into the second. Hence

Σ_k a_k ≤ C′′′ Σ_k 2^{−k} √( log N(T, 2^{−k}) ) ≤ C^{IV} ∫_0^1 √( log N(T, ε) ) dε =: S

(compare the series with the integral in the last inequality).
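The series-integral comparison works because √(log N(T, ε)) is non-increasing in ε: the interval [2^{−k−1}, 2^{−k}] has length 2^{−k−1}, and the integrand there is at least √(log N(T, 2^{−k})), so the series is at most twice the integral. A numeric sketch with a sphere-type model for the entropy (n = 5 is an arbitrary choice):

```python
import math

# Model: sqrt(log N(T, eps)) = sqrt(n * log(1/eps)) on (0, 1), which is
# non-increasing in eps.  For such f:  sum_k 2^(-k) f(2^(-k)) <= 2 * integral,
# since each dyadic interval [2^(-k-1), 2^(-k)] contributes >= 2^(-k-1) * f(2^(-k)).
n = 5
f = lambda eps: math.sqrt(n * math.log(1 / eps))

series = sum(2.0 ** (-k) * f(2.0 ** (-k)) for k in range(1, 60))
m = 200000
integral = sum(f((j + 0.5) / m) / m for j in range(m))  # midpoint rule on (0, 1)
print(series, integral, series <= 2 * integral)
```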


We have shown

P( sup_t |X_t − X_{t_0}| > uS ) ≤ C e^{−u²}   for u ≥ 1.

Thus the random variable (1/S) sup_t |X_t − X_{t_0}| is subgaussian, and Dudley's inequality follows immediately.

Problem: find a sharper estimate (a functional better than S). This will be done next time (see Lecture 9).
References
[1] Michel Ledoux and Michel Talagrand. Probability in Banach spaces, vol-
ume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results
in Mathematics and Related Areas (3)]. Springer-Verlag, Berlin, 1991.

[2] Michel Talagrand. The generic chaining. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2005.
