Non-Asymptotic Theory of Random Matrices
Lecture 8: DUDLEY’S INTEGRAL INEQUALITY
Lecturer: Roman Vershynin Scribe: Igor Rumanov
Tuesday, January 30, 2007
Let $A$ be an $m \times n$ matrix with i.i.d. entries, $m > n$. We want to estimate
\[
\mathbb{E} \sup_{x \in S^{n-1}} \bigl|\, \|Ax\|_2 - \sqrt{m} \,\bigr| \;\le\; ?
\]
(We want $C\sqrt{n}$ in place of the question mark; this would be a better estimate than the bound $C(\sqrt{m} + \sqrt{n})$ from asymptotic theory.) Under the absolute value sign stands a random variable, in fact a family of random variables indexed by the points $x \in S^{n-1}$ of the sphere, i.e. a random process.
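To make the connection explicit (a reformulation; we additionally assume here, as is standard in this context though not stated above, that the entries have mean zero and unit variance): for every fixed $x \in S^{n-1}$,
\[
\mathbb{E}\, \|Ax\|_2^2 = \sum_{i=1}^{m} \mathbb{E} \Bigl( \sum_{j=1}^{n} A_{ij} x_j \Bigr)^{\!2} = \sum_{i=1}^{m} \sum_{j=1}^{n} x_j^2 = m,
\]
so $\sqrt{m}$ is the natural centering, and the quantity above is exactly the size $\mathbb{E} \sup_{x \in S^{n-1}} |X_x|$ of the random process $X_x = \|Ax\|_2 - \sqrt{m}$.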
Random process: $(X_t)_{t \in T}$ is a collection of random variables indexed by $t \in T$.
• Classical: $T = [a, b]$ is a time interval. Examples of such processes are Lévy processes (e.g., Brownian motion).
• General: $T$ is arbitrary, such as $T = S^{n-1}$.
• "Size of the random process":
\[
\mathbb{E} \sup_{t \in T} X_t \qquad \text{(the index set has to be compact)}.
\]
How far can a particle get in time $T$? (E.g., the highest level of water in a river in 10 years.)
Previous approach: discretization of $T$. Consider an $\varepsilon$-net $\mathcal{N}$ of $T$, i.e. cover $T$ by $\varepsilon$-balls, compute $\mathbb{E} \sup_{t \in \mathcal{N}} X_t$, and approximate $\mathbb{E} \sup_{t \in T} X_t$ by it.
Definition 1 (Covering numbers). Let $(T, d)$ be a compact metric space and $\varepsilon > 0$. The covering number $N(T, \varepsilon)$ is the minimal cardinality of an $\varepsilon$-net of $T$, equivalently the minimum possible number of $\varepsilon$-balls needed to cover $T$.
It is a measure of the compactness of $T$: $\log N(T, \varepsilon)$ is called the metric entropy of $T$.
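For orientation (a standard volumetric bound, recorded here for later use; it is not proved in the lecture): for the unit Euclidean sphere,
\[
N(S^{n-1}, \varepsilon) \;\le\; \Bigl( 1 + \frac{2}{\varepsilon} \Bigr)^{\!n} \;\le\; \Bigl( \frac{3}{\varepsilon} \Bigr)^{\!n} \quad \text{for } 0 < \varepsilon \le 1,
\]
by comparing the total volume of the disjoint $(\varepsilon/2)$-balls centered at a maximal $\varepsilon$-separated subset with the volume of the $(1 + \varepsilon/2)$-ball containing them. Hence the metric entropy of the sphere is $\log N(S^{n-1}, \varepsilon) \le n \log(3/\varepsilon)$.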
Sharper approach: multiscale discretization. Cover $T$ progressively by balls of radius $\varepsilon_k$, $k = 1, 2, 3, \ldots$ The result will be
Dudley’s Integral Inequality
Assumptions:
1) $\mathbb{E} X_t = 0$ for all $t$;
2) increments $|X_t - X_s|$ are proportional to the distance $d(t, s)$: the ratio $\frac{|X_t - X_s|}{d(t, s)}$ is subgaussian for all $t, s$, i.e.
\[
\mathbb{P}\bigl( |X_t - X_s| > u \cdot d(t, s) \bigr) \le C e^{-c u^2} \quad \text{for } u > 0
\]
("subgaussian increments"; here $C$ and $c$ are some constants).
Theorem 2 (Dudley [1, 2]). For a process with subgaussian increments,
\[
\mathbb{E} \sup_{t \in T} X_t \;\le\; C \int_0^{\infty} \sqrt{\log N(T, \varepsilon)}\, d\varepsilon.
\]
The left-hand side is probabilistic; the right-hand side is geometric (in $T$). One can replace the upper limit $\infty$ in the integral by $\mathrm{diam}(T)$; the singularity of the integrand is at $0$. (For the sphere, $N(T, \varepsilon) \approx (\frac{1}{\varepsilon})^n$. Recall that $\sqrt{\log x}$ is the inverse function of $e^{x^2}$.)
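As a sanity check (a computation suggested by the sphere remark, using the volumetric bound recorded after Definition 1; it is not carried out explicitly in the lecture): for $T = S^{n-1}$, with $\mathrm{diam}(S^{n-1}) = 2$,
\[
\int_0^{2} \sqrt{\log N(S^{n-1}, \varepsilon)}\, d\varepsilon \;\le\; \sqrt{n} \int_0^{2} \sqrt{\log\Bigl( 1 + \frac{2}{\varepsilon} \Bigr)}\, d\varepsilon \;=\; C' \sqrt{n},
\]
since the last integral is a finite absolute constant (the singularity at $0$ is integrable). So Dudley's inequality gives a bound of order $\sqrt{n}$, exactly the estimate we want for the matrix problem at the beginning of the lecture, once the process $X_x = \|Ax\|_2 - \sqrt{m}$ is shown to have subgaussian increments.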
Proof. Let $\mathrm{diam}(T) = 1$. (Exercise: the general case.)
1) Let $t_0 \in T$ be arbitrary (a reference point). Then
\[
\mathbb{E} \sup_{t \in T} X_t = \mathbb{E} \sup_{t \in T} (X_t - \mathbb{E} X_{t_0}) \le \mathbb{E} \sup_{t \in T} (X_t - X_{t_0})
\]
by Jensen's inequality, because $\sup$ is a convex function.
2) Multiscale discretization of $T$: CHAINING.
Step 1. Let $\mathcal{N}_1$ be a $1/2$-net of $T$ of size $N_1 = N(T, 1/2)$, and find $\pi_1(t) \in \mathcal{N}_1$ nearest to $t$. [Figure: the chain $t_0 \to \pi_1(t)$ inside $T$.] Then
\[
X_t - X_{t_0} = (X_t - X_{\pi_1(t)}) + (X_{\pi_1(t)} - X_{t_0}),
\]
where the first increment is smaller than before (scale $1/2$), and there are at most $N_1$ random variables of the second kind (not too many).
Step 2. Let $\mathcal{N}_2$ be a $1/4$-net of $T$ of size $N_2 = N(T, 1/4)$, and find $\pi_2(t) \in \mathcal{N}_2$ nearest to $t$. [Figure: the chain $t_0 \to \pi_1(t) \to \pi_2(t) \to t$.] Then
\[
X_t - X_{t_0} = (X_t - X_{\pi_2(t)}) + (X_{\pi_2(t)} - X_{\pi_1(t)}) + (X_{\pi_1(t)} - X_{t_0}),
\]
where the first increment is even smaller (scale $1/4$), and there are at most $N_1 N_2 \le N_2^2$ random variables of the middle kind.
Step $k$. Let $\mathcal{N}_k$ be a $2^{-k}$-net of $T$ of size $N_k = N(T, 2^{-k})$, and find $\pi_k(t) \in \mathcal{N}_k$ nearest to $t$; and so on.
In the limit we obtain the chaining identity (with $\pi_0(t) = t_0$):
\[
X_t - X_{t_0} = \sum_{k=1}^{\infty} \bigl( X_{\pi_k(t)} - X_{\pi_{k-1}(t)} \bigr),
\]
because $X_t - X_{\pi_k(t)} \to 0$ a.s. (Exercise: use $\pi_k(t) \to t$.)
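Indeed (spelling out the telescoping; the limit step is the exercise above): the partial sums telescope,
\[
\sum_{k=1}^{K} \bigl( X_{\pi_k(t)} - X_{\pi_{k-1}(t)} \bigr) = X_{\pi_K(t)} - X_{t_0},
\]
and $X_{\pi_K(t)} \to X_t$ a.s. as $K \to \infty$.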
Nice properties of the multiscale discretization:
1) Increments are small: [Figure: $t$ lies within distance $2^{-k}$ of $\pi_k(t)$ and within distance $2^{-(k-1)}$ of $\pi_{k-1}(t)$.]
\[
d(\pi_k(t), \pi_{k-1}(t)) \le d(\pi_k(t), t) + d(\pi_{k-1}(t), t) \le 2^{-k} + 2^{-(k-1)} = 3 \cdot 2^{-k}.
\]
2) There are at most $N_k N_{k-1} \le N_k^2$ pairs $(\pi_k(t), \pi_{k-1}(t))$, whatever $t$ is.
Increments: by the subgaussian assumption and property 1),
\[
\mathbb{P}\bigl( |X_{\pi_k(t)} - X_{\pi_{k-1}(t)}| > u \cdot a_k \bigr) \le C \exp\Bigl( -\frac{c\, u^2 a_k^2}{d(\pi_k(t), \pi_{k-1}(t))^2} \Bigr) \le C \exp\bigl( -c' \cdot 2^{2k} u^2 a_k^2 \bigr)
\]
(this holds for all $a_k > 0$; here $c' = c/9$).
Thus we can bound every increment in the chaining identity. By a union bound over the at most $N_k^2$ pairs at each scale (property 2)), the failure probability is
\[
p = \mathbb{P}\bigl( \exists k,\ \exists t \in T : |X_{\pi_k(t)} - X_{\pi_{k-1}(t)}| > u \cdot a_k \bigr) \le \sum_{k=1}^{\infty} N_k^2 \cdot C \exp\bigl( -c' \cdot 2^{2k} u^2 a_k^2 \bigr).
\]
In case of success, i.e. if for all $k$ and all $t \in T$
\[
|X_{\pi_k(t)} - X_{\pi_{k-1}(t)}| \le u a_k,
\]
the chaining identity gives $|X_t - X_{t_0}| \le u \sum_k a_k$. Hence
\[
\mathbb{P}\Bigl( \sup_t |X_t - X_{t_0}| > u \sum_k a_k \Bigr) \le p. \qquad (*)
\]
It remains to choose the weights $a_k$. We have a tradeoff here: we want $\sum a_k$ to be small, but to make the failure probability small the $a_k$ have to be large. How large? Say, for $u \ge 1$ we want the summands in $p$ to be $\sim 2^{-k}$. Therefore set
\[
a_k = c'' \cdot 2^{-k} \sqrt{\log(2^k N_k^2)}
\]
(this choice is tailored to $u \ge 1$; the constant $c''$ depends only on $c'$).
Then
\[
p \;\le\; \sum_{k=1}^{\infty} C N_k^2 \cdot (2^k N_k^2)^{-u^2} \;\le\; C \sum_{k=1}^{\infty} 2^{-k u^2}.
\]
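To verify this step (spelling out the substitution, with $c''$ chosen so that $c' (c'')^2 = 1$): each summand becomes
\[
C N_k^2 \exp\bigl( -c' \cdot 2^{2k} u^2 a_k^2 \bigr) = C N_k^2 \exp\bigl( -u^2 \log(2^k N_k^2) \bigr) = C N_k^2 \, (2^k N_k^2)^{-u^2} = C \, 2^{-k u^2} N_k^{\,2 - 2u^2} \le C \, 2^{-k u^2}
\]
for $u \ge 1$, since $N_k \ge 1$.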
So the failure probability obeys a subgaussian bound: $p \le C \cdot 2^{-u^2}$ for $u \ge 1$.
This way we get an estimate for the sum of weights which appears in $(*)$:
\[
\sum_k a_k = c'' \sum_k 2^{-k} \sqrt{\log(2^k N_k^2)} \le c''' \Bigl( \underbrace{\sum_k 2^{-k} \sqrt{\log 2^k}}_{\le\, \text{const}} + \underbrace{\sum_k 2^{-k} \sqrt{\log N_k}}_{\substack{\ge\, \text{const, because} \\ \mathrm{diam}\, T = 1,\ N_1 \ge 2}} \Bigr)
\]
(using $\sqrt{a + b} \le \sqrt{2}(\sqrt{a} + \sqrt{b})$), so the first sum can be absorbed into the second:
\[
\sum_k a_k \le C''' \sum_k 2^{-k} \sqrt{\log N_k} = C''' \sum_k 2^{-k} \sqrt{\log N(T, 2^{-k})} \le C^{\mathrm{IV}} \int_0^1 \sqrt{\log N(T, \varepsilon)}\, d\varepsilon =: S
\]
(compare the series with the integral in the last inequality).
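The last inequality is the usual comparison of a series with an integral: since $\varepsilon \mapsto N(T, \varepsilon)$ is nonincreasing, for each $k \ge 1$
\[
2^{-k} \sqrt{\log N(T, 2^{-k})} = 2 \int_{2^{-(k+1)}}^{2^{-k}} \sqrt{\log N(T, 2^{-k})}\, d\varepsilon \le 2 \int_{2^{-(k+1)}}^{2^{-k}} \sqrt{\log N(T, \varepsilon)}\, d\varepsilon,
\]
and summing over $k \ge 1$ (the intervals $[2^{-(k+1)}, 2^{-k}]$ tile $(0, 1/2]$) gives $\sum_k 2^{-k} \sqrt{\log N(T, 2^{-k})} \le 2 \int_0^1 \sqrt{\log N(T, \varepsilon)}\, d\varepsilon$.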
We have
\[
\mathbb{P}\Bigl( \sup_t |X_t - X_{t_0}| > u S \Bigr) \le C e^{-c u^2} \quad \text{for } u \ge 1.
\]
Thus the random variable $\frac{1}{S} \sup_t |X_t - X_{t_0}|$ is subgaussian, and Dudley's inequality follows immediately.
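To spell out the last step (the standard integration of the tail bound): with $Z = \frac{1}{S} \sup_t |X_t - X_{t_0}|$,
\[
\mathbb{E} \sup_{t \in T} X_t \le \mathbb{E} \sup_t (X_t - X_{t_0}) \le S\, \mathbb{E} Z \le S \Bigl( 1 + \int_1^{\infty} \mathbb{P}(Z > u)\, du \Bigr) \le S \Bigl( 1 + C \int_1^{\infty} e^{-c u^2}\, du \Bigr) \le C' S,
\]
which is Dudley's bound, since $S \le \int_0^{\infty} \sqrt{\log N(T, \varepsilon)}\, d\varepsilon$.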
Problem: find a sharper estimate (a quantity better than $S$). This will be done next time; see Lecture 9.
References
[1] Michel Ledoux and Michel Talagrand. Probability in Banach spaces, volume 23 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)]. Springer-Verlag, Berlin, 1991.
[2] Michel Talagrand. The generic chaining. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2005.