
High-Dimensional Probability

Solution Manual

Pingbang Hu

July 26, 2024


Abstract

These are the solutions I wrote while organizing the reading group on Roman Vershynin’s High-Dimensional Probability [Ver24]. While we aim to solve all the exercises, we occasionally omit some due to (1) simplicity, (2) difficulty, or (3) the section being skipped. The manual may contain factual and/or typographic errors.

The reading group started in Spring 2024, and the date on the cover page is the last updated time.
Contents

Appetizer: using probability to cover a geometric set

1 Preliminaries on random variables
  1.1 Basic quantities associated with random variables
  1.2 Some classical inequalities
  1.3 Limit theorems

2 Concentration of sums of independent random variables
  2.1 Why concentration inequalities?
  2.2 Hoeffding’s inequality
  2.3 Chernoff’s inequality
  2.4 Application: degrees of random graphs
  2.5 Sub-gaussian distributions
  2.6 General Hoeffding’s and Khintchine’s inequalities
  2.7 Sub-exponential distributions
  2.8 Bernstein’s inequality

3 Random vectors in high dimensions
  3.1 Concentration of the norm
  3.2 Covariance matrices and principal component analysis
  3.3 Examples of high-dimensional distributions
  3.4 Sub-gaussian distributions in higher dimensions
  3.5 Application: Grothendieck’s inequality and semidefinite programming
  3.6 Application: Maximum cut for graphs
  3.7 Kernel trick, and tightening of Grothendieck’s inequality

4 Random matrices
  4.1 Preliminaries on matrices
  4.2 Nets, covering numbers and packing numbers
  4.3 Application: error correcting codes
  4.4 Upper bounds on random sub-gaussian matrices
  4.5 Application: community detection in networks
  4.6 Two-sided bounds on sub-gaussian matrices
  4.7 Application: covariance estimation and clustering

5 Concentration without independence
  5.1 Concentration of Lipschitz functions on the sphere
  5.2 Concentration on other metric measure spaces
  5.3 Application: Johnson-Lindenstrauss Lemma
  5.4 Matrix Bernstein’s inequality
  5.5 Application: community detection in sparse networks
  5.6 Application: covariance estimation for general distributions

6 Quadratic forms, symmetrization and contraction
  6.1 Decoupling
  6.2 Hanson-Wright Inequality
  6.3 Concentration of anisotropic random vectors
  6.4 Symmetrization
  6.5 Random matrices with non-i.i.d. entries
  6.6 Application: matrix completion
  6.7 Contraction Principle

Appetizer: using probability to cover a geometric set

Week 1: Appetizer and Basic Inequalities


19 Jan. 2024

Problem (Exercise 0.0.3). Check the following variance identities that we used in the proof of Theorem 0.0.2.
(a) Let Z_1, ..., Z_k be independent mean zero random vectors in R^n. Show that
\[ E\Big[\Big\|\sum_{j=1}^{k} Z_j\Big\|_2^2\Big] = \sum_{j=1}^{k} E\big[\|Z_j\|_2^2\big]. \]

(b) Let Z be a random vector in R^n. Show that
\[ E\big[\|Z - E[Z]\|_2^2\big] = E\big[\|Z\|_2^2\big] - \|E[Z]\|_2^2. \]

Answer. (a) If Z_1, ..., Z_k are independent mean zero random vectors in R^n, then
\[ E\Big[\Big\|\sum_{j=1}^{k} Z_j\Big\|_2^2\Big] = E\Big[\sum_{i=1}^{n}\Big(\sum_{j=1}^{k} (Z_j)_i\Big)^2\Big] = \sum_{i=1}^{n} E\Big[\Big(\sum_{j=1}^{k} (Z_j)_i\Big)^2\Big]. \]
From independence and the mean zero assumption, E[(Z_j)_i (Z_{j'})_i] = E[(Z_j)_i]\,E[(Z_{j'})_i] = 0 for j ≠ j', hence
\[ \sum_{i=1}^{n} E\Big[\Big(\sum_{j=1}^{k} (Z_j)_i\Big)^2\Big] = \sum_{i=1}^{n}\sum_{j=1}^{k} E\big[(Z_j)_i^2\big] = \sum_{j=1}^{k} E\Big[\sum_{i=1}^{n} (Z_j)_i^2\Big] = \sum_{j=1}^{k} E\big[\|Z_j\|_2^2\big], \]
proving the result.


(b) If Z is a random vector in R^n, then
\[
\begin{aligned}
E\big[\|Z - E[Z]\|_2^2\big] &= E\Big[\sum_{i=1}^{n}(Z_i - E[Z_i])^2\Big]
= \sum_{i=1}^{n} E\big[Z_i^2 - 2Z_iE[Z_i] + (E[Z_i])^2\big] \\
&= \sum_{i=1}^{n} E[Z_i^2] - 2\sum_{i=1}^{n} E[Z_i]\,E[Z_i] + \sum_{i=1}^{n}(E[Z_i])^2
= E\big[\|Z\|_2^2\big] - \|E[Z]\|_2^2.
\end{aligned}
\]
 


Problem (Exercise 0.0.5). Prove the inequalities
\[ \Big(\frac{n}{m}\Big)^m \le \binom{n}{m} \le \sum_{k=0}^{m}\binom{n}{k} \le \Big(\frac{en}{m}\Big)^m \]
for all integers m ∈ [1, n].

Answer. Fix some m ∈ [1, n]. We first show (n/m)^m ≤ \binom{n}{m}. This is because
\[ \frac{(n/m)^m}{\binom{n}{m}} = \prod_{j=0}^{m-1}\frac{n}{m}\cdot\frac{m-j}{n-j} \le 1 \]
since (m − j)/(n − j) ≤ m/n for all 0 ≤ j ≤ m − 1. The second inequality \binom{n}{m} \le \sum_{k=0}^{m}\binom{n}{k} is trivial since every term \binom{n}{k} is non-negative. The last inequality is due to
\[ \Big(\frac{m}{n}\Big)^m\sum_{k=0}^{m}\binom{n}{k} \le \sum_{k=0}^{m}\binom{n}{k}\Big(\frac{m}{n}\Big)^k \le \Big(1 + \frac{m}{n}\Big)^n \le e^m, \]
i.e., \sum_{k=0}^{m}\binom{n}{k} \le (en/m)^m. ⊛

Problem (Exercise 0.0.6). Check that in Corollary 0.0.4,
\[ (C + C\epsilon^2 N)^{\lceil 1/\epsilon^2\rceil} \]
suffice. Here C is a suitable absolute constant.


Answer. Omit. ⊛

Chapter 1

Preliminaries on random variables

1.1 Basic quantities associated with random variables


No Exercise!

1.2 Some classical inequalities


Problem (Exercise 1.2.2). Prove the following extension of Lemma 1.2.1, which is valid for any
random variable X (not necessarily non-negative):
\[ E[X] = \int_0^{\infty} P(X > t)\,dt - \int_{-\infty}^{0} P(X < t)\,dt. \]

Answer. Separating X into the plus and minus parts would do the job. Specifically, let X = X+ −X−
where X+ = max(X, 0) and X− = max(−X, 0), both are non-negative. Then, we see that by
applying Lemma 1.2.1,

\[
\begin{aligned}
E[X] = E[X_+] - E[X_-] &= \int_0^{\infty}\Pr(t < X_+)\,dt - \int_0^{\infty}\Pr(t < X_-)\,dt \\
&= \int_0^{\infty}\Pr(X > t)\,dt - \int_0^{\infty}\Pr(X < -t)\,dt \\
&= \int_0^{\infty}\Pr(X > t)\,dt - \int_{-\infty}^{0}\Pr(X < t)\,dt.
\end{aligned}
\]

Problem (Exercise 1.2.3). Let X be a random variable and p ∈ (0, ∞). Show that
\[ E[|X|^p] = \int_0^{\infty} p t^{p-1} P(|X| > t)\,dt \]

whenever the right-hand side is finite.

Answer. Since |X| is non-negative, from Lemma 1.2.1, we have


\[ E[|X|^p] = \int_0^{\infty}\Pr(t < |X|^p)\,dt = \int_0^{\infty} p t^{p-1}\Pr(|X| > t)\,dt, \]
where we let t ← t^p, hence dt ← p t^{p-1}\,dt. ⊛


Week 2: Basic Inequalities and Limit Theorems


24 Jan. 2024

Problem (Exercise 1.2.6). Deduce Chebyshev’s inequality by squaring both sides of the bound |X − µ| ≥ t and applying Markov’s inequality.

Answer. From Markov’s inequality, for any t > 0,


 
\[ \Pr(|X - \mu| \ge t) = \Pr(|X - \mu|^2 \ge t^2) \le \frac{E\big[|X - \mu|^2\big]}{t^2} = \frac{\sigma^2}{t^2}. \]

1.3 Limit theorems


Problem (Exercise 1.3.3). Let X1 , X2 , . . . be a sequence of i.i.d. random variables with mean µ and
finite variance. Show that
" N
#  
1 X 1
E Xi − µ = O √ as N → ∞.
N i=1 N

Answer. We see that
\[ E\Bigg[\Bigg|\frac{1}{N}\sum_{i=1}^{N} X_i - \mu\Bigg|\Bigg] \le \sqrt{E\Bigg[\Bigg(\frac{1}{N}\sum_{i=1}^{N} X_i - \mu\Bigg)^2\Bigg]} = \sqrt{\operatorname{Var}\Bigg[\frac{1}{N}\sum_{i=1}^{N} X_i\Bigg]} = \frac{\sigma}{\sqrt{N}}. \]
As σ < ∞ is a constant, the rate is exactly O(1/\sqrt{N}). ⊛



Chapter 2

Concentration of sums of independent random variables

Week 3: More Powerful Concentration Inequalities


2 Feb. 2024
2.1 Why concentration inequalities?
Problem (Exercise 2.1.4). Let g ∼ N(0, 1). Show that for all t ≥ 1, we have
\[ E\big[g^2\mathbb{1}_{g>t}\big] = t\cdot\frac{1}{\sqrt{2\pi}}e^{-t^2/2} + P(g > t) \le \Big(t + \frac{1}{t}\Big)\frac{1}{\sqrt{2\pi}}e^{-t^2/2}. \]

Answer. Denote the standard normal density as
\[ \varphi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. \]
Since \varphi'(x) = -x\varphi(x), by integration by parts,
\[ E\big[g^2\mathbb{1}_{g>t}\big] = \int_t^{\infty} x^2\varphi(x)\,dx = -\int_t^{\infty} x\varphi'(x)\,dx = -\Big(x\varphi(x)\Big|_t^{\infty} - \int_t^{\infty}\varphi(x)\,dx\Big) = t\cdot\frac{1}{\sqrt{2\pi}}e^{-t^2/2} + P(g > t), \]
which gives the first equality. Furthermore, as t ≥ 1, we trivially have
\[ \int_t^{\infty}\varphi(x)\,dx \le \int_t^{\infty}\frac{x}{t}\varphi(x)\,dx = \frac{1}{t}\int_t^{\infty}-\varphi'(x)\,dx = \frac{\varphi(t)}{t}, \]
implying that
\[ E\big[g^2\mathbb{1}_{g>t}\big] = t\cdot\frac{1}{\sqrt{2\pi}}e^{-t^2/2} + \int_t^{\infty}\varphi(x)\,dx \le \Big(t + \frac{1}{t}\Big)\frac{1}{\sqrt{2\pi}}e^{-t^2/2}, \]
which gives the second inequality. ⊛

2.2 Hoeffding’s inequality


Problem (Exercise 2.2.3). Show that
\[ \cosh(x) \le \exp(x^2/2) \quad \text{for all } x \in \mathbb{R}. \]




Answer. Omit. ⊛
The next exercise is to prove Theorem 2.2.5 (Hoeffding’s inequality for general bounded random variables), which we restate for convenience.

Theorem 2.2.1 (Hoeffding’s inequality for general bounded random variables). Let X_1, ..., X_N be independent random variables. Assume that X_i ∈ [m_i, M_i] for every i. Then, for any t > 0, we have
\[ P\Bigg(\sum_{i=1}^{N}(X_i - E[X_i]) \ge t\Bigg) \le \exp\Bigg(-\frac{2t^2}{\sum_{i=1}^{N}(M_i - m_i)^2}\Bigg). \]

Problem (Exercise 2.2.7). Prove Hoeffding’s inequality for general bounded random variables, possibly with some absolute constant instead of 2 in the tail.

Answer. Raising both sides to the p-th power does not work here since we are dealing with a sum of random variables, so we instead use the MGF trick (also known as the Cramér-Chernoff method):

Lemma 2.2.1 (Cramér-Chernoff method). Given a random variable X,
\[ P(X - \mu \ge t) = P\big(e^{\lambda(X-\mu)} \ge e^{\lambda t}\big) \le \inf_{\lambda>0}\frac{E\big[e^{\lambda(X-\mu)}\big]}{e^{\lambda t}}. \]
Proof. This directly follows from Markov’s inequality. ■
Hence, we see that
\[ P\Bigg(\sum_{i=1}^{N}(X_i - E[X_i]) \ge t\Bigg) \le \inf_{\lambda>0} e^{-\lambda t}\,E\Bigg[\exp\Bigg(\lambda\sum_{i=1}^{N}(X_i - E[X_i])\Bigg)\Bigg] = \inf_{\lambda>0} e^{-\lambda t}\prod_{i=1}^{N}E\big[\exp(\lambda(X_i - E[X_i]))\big]. \]
So all that is left is to bound E[\exp(\lambda(X_i - E[X_i]))]. Before we proceed, we need one lemma.

Lemma 2.2.2. For any bounded random variable Z ∈ [a, b],
\[ \operatorname{Var}[Z] \le \frac{(b-a)^2}{4}. \]
Proof. This follows since
\[ \operatorname{Var}[Z] = \operatorname{Var}\Big[Z - \frac{a+b}{2}\Big] \le E\Big[\Big(Z - \frac{a+b}{2}\Big)^2\Big] \le \frac{(b-a)^2}{4}. \qquad \blacksquare \]

Claim. Given X ∈ [a, b] such that E[X] = 0, for all λ ∈ R,
\[ E\big[e^{\lambda X}\big] \le \exp\Big(\lambda^2\frac{(b-a)^2}{8}\Big). \]


Proof. We first define ψ(λ) = \ln E[e^{\lambda X}], and compute
\[ \psi'(\lambda) = \frac{E[Xe^{\lambda X}]}{E[e^{\lambda X}]}, \qquad \psi''(\lambda) = \frac{E[X^2e^{\lambda X}]}{E[e^{\lambda X}]} - \Bigg(\frac{E[Xe^{\lambda X}]}{E[e^{\lambda X}]}\Bigg)^2. \]
Now, observe that ψ''(λ) is the variance under the law of X re-weighted by e^{\lambda X}/E[e^{\lambda X}], i.e., by a change of measure, consider a new distribution P_λ (w.r.t. the original distribution P of X) given by
\[ dP_\lambda(x) := \frac{e^{\lambda x}}{E_P[e^{\lambda X}]}\,dP(x); \]
then
\[ \psi'(\lambda) = \frac{E_P[Xe^{\lambda X}]}{E_P[e^{\lambda X}]} = \int\frac{xe^{\lambda x}}{E_P[e^{\lambda X}]}\,dP(x) = E_{P_\lambda}[X] \]
and
\[ \psi''(\lambda) = \frac{E_P[X^2e^{\lambda X}]}{E_P[e^{\lambda X}]} - \Bigg(\frac{E_P[Xe^{\lambda X}]}{E_P[e^{\lambda X}]}\Bigg)^2 = E_{P_\lambda}[X^2] - \big(E_{P_\lambda}[X]\big)^2 = \operatorname{Var}_{P_\lambda}[X]. \]
From Lemma 2.2.2, since X under the new distribution P_λ is still bounded between a and b,
\[ \psi''(\lambda) = \operatorname{Var}_{P_\lambda}[X] \le \frac{(b-a)^2}{4}. \]
Then by Taylor’s theorem, there exists some \tilde{\lambda} ∈ [0, λ] such that
\[ \psi(\lambda) = \psi(0) + \psi'(0)\lambda + \frac{1}{2}\psi''(\tilde{\lambda})\lambda^2 = \frac{1}{2}\psi''(\tilde{\lambda})\lambda^2 \]
since ψ(0) = ψ'(0) = 0. By bounding \psi''(\tilde{\lambda})\lambda^2/2, we finally have
\[ \ln E\big[e^{\lambda X}\big] = \psi(\lambda) \le \frac{1}{2}\cdot\frac{(b-a)^2}{4}\lambda^2 = \frac{(b-a)^2}{8}\lambda^2; \]
exponentiating both sides shows the desired result. ⊛
Now, given X_i ∈ [m_i, M_i] for every i, the variable X_i − E[X_i] ∈ [m_i − E[X_i], M_i − E[X_i]] has mean 0 and range of length M_i − m_i for every i. Then, from the above claim, for all λ ∈ R,
\[ E\Big[e^{\lambda(X_i - E[X_i])}\Big] \le \exp\Big(\lambda^2\frac{(M_i - m_i)^2}{8}\Big). \]
Then we simply recall that
\[
\begin{aligned}
P\Bigg(\sum_{i=1}^{N}(X_i - E[X_i]) \ge t\Bigg)
&\le \inf_{\lambda>0} e^{-\lambda t}\prod_{i=1}^{N}E\big[\exp(\lambda(X_i - E[X_i]))\big] \\
&\le \inf_{\lambda>0}\exp\Bigg(-\lambda t + \lambda^2\sum_{i=1}^{N}\frac{(M_i - m_i)^2}{8}\Bigg) \\
&= \exp\Bigg(-\frac{4t^2}{\sum_{i=1}^{N}(M_i - m_i)^2} + \frac{2t^2}{\sum_{i=1}^{N}(M_i - m_i)^2}\Bigg)
= \exp\Bigg(-\frac{2t^2}{\sum_{i=1}^{N}(M_i - m_i)^2}\Bigg)
\end{aligned}
\]
since the infimum is achieved at λ = 4t/\big(\sum_{i=1}^{N}(M_i - m_i)^2\big). ⊛
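To see the bound in action, here is a small Monte Carlo sketch (an illustration only, not part of the original solution): it compares the empirical tail of a sum of independent bounded variables against the Hoeffding bound above. The distributions, ranges, and sample sizes are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials, t = 50, 100_000, 10.0
    m = rng.uniform(-2.0, 0.0, size=N)            # lower endpoints m_i
    M = m + rng.uniform(1.0, 3.0, size=N)         # upper endpoints M_i
    X = rng.uniform(m, M, size=(trials, N))       # X_i ~ Uniform[m_i, M_i], so E[X_i] = (m_i + M_i)/2
    S = (X - (m + M) / 2).sum(axis=1)             # centered sums
    print((S >= t).mean())                         # empirical tail P(sum >= t)
    print(np.exp(-2 * t**2 / ((M - m)**2).sum()))  # Hoeffding bound, which should dominate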


Problem (Exercise 2.2.8). Imagine we have an algorithm for solving some decision problem (e.g., is a given number p a prime?). Suppose the algorithm makes a decision at random and returns the correct answer with probability 1/2 + δ for some δ > 0, which is just a bit better than a random guess. To improve the performance, we run the algorithm N times and take the majority vote. Show that, for any ϵ ∈ (0, 1), the answer is correct with probability at least 1 − ϵ, as long as
\[ N \ge \frac{1}{2\delta^2}\ln\Big(\frac{1}{\epsilon}\Big). \]

Answer. Consider X_1, ..., X_N i.i.d. ∼ Ber(1/2 + δ), a series of indicators indicating whether each random decision is correct or not. Note that E[X_i] = 1/2 + δ.
By taking the majority vote over N runs, the algorithm makes a mistake only if \sum_{i=1}^{N} X_i \le N/2 (let’s not consider ties). This happens with probability
\[ P\Bigg(\sum_{i=1}^{N} X_i \le \frac{N}{2}\Bigg) = P\Bigg(\sum_{i=1}^{N}(X_i - E[X_i]) \le -N\delta\Bigg) \le \exp\Big(-\frac{2(N\delta)^2}{N}\Big) = e^{-2N\delta^2} \]
from Hoeffding’s inequality.^a Requiring e^{-2N\delta^2} \le \epsilon is equivalent to requiring N \ge \frac{1}{2\delta^2}\ln(1/\epsilon). ⊛
^a Note that the sign is flipped. However, Hoeffding’s inequality still holds (why?).
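A minimal simulation of this amplification argument (a sketch; the values of δ and ϵ are arbitrary):

    import numpy as np

    def majority_correct_rate(delta, N, trials=20_000, rng=np.random.default_rng(1)):
        """Empirical probability that the majority vote over N noisy calls is correct."""
        votes = rng.random((trials, N)) < 0.5 + delta   # each call correct w.p. 1/2 + delta
        return (votes.sum(axis=1) > N / 2).mean()

    delta, eps = 0.1, 0.05
    N = int(np.ceil(np.log(1 / eps) / (2 * delta**2)))  # N >= ln(1/eps) / (2 delta^2)
    print(N, majority_correct_rate(delta, N))            # success rate should exceed 1 - eps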

Problem (Exercise 2.2.9). Suppose we want to estimate the mean µ of a random variable X from a sample X_1, ..., X_N drawn independently from the distribution of X. We want an ϵ-accurate estimate, i.e., one that falls in the interval (µ − ϵ, µ + ϵ).

(a) Show that a sample of size N = O(σ^2/ϵ^2) is sufficient to compute an ϵ-accurate estimate with probability at least 3/4, where σ^2 = Var[X].

(b) Show that a sample of size N = O(\log(\delta^{-1})\,\sigma^2/\epsilon^2) is sufficient to compute an ϵ-accurate estimate with probability at least 1 − δ.


Answer. (a) Consider using the sample mean \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N}X_i as an estimator of µ. From Chebyshev’s inequality,
\[ P(|\hat{\mu} - \mu| > \epsilon) \le \frac{\sigma^2/N}{\epsilon^2}. \]
Requiring \sigma^2/(N\epsilon^2) \le 1/4, i.e., N \ge 4\sigma^2/\epsilon^2 = O(\sigma^2/\epsilon^2), suffices.

(b) Consider gathering k estimators from the above procedure, i.e., we now have \hat{\mu}_1, ..., \hat{\mu}_k, each of which is an ϵ-accurate mean estimator with probability at least 3/4. This requires k \cdot 4\sigma^2/\epsilon^2 = O(k\sigma^2/\epsilon^2) samples. We claim that the median \hat{\mu} := \operatorname{median}(\hat{\mu}_1, ..., \hat{\mu}_k) is an ϵ-accurate mean estimator with probability at least 1 − δ for a suitable k (depending on δ). Consider the indicators X_i = \mathbb{1}_{|\hat{\mu}_i - \mu| > \epsilon}, indicating whether \hat{\mu}_i fails to be ϵ-accurate, so that E[X_i] ≤ 1/4. The median estimator \hat{\mu} can fail only if at least half of the \hat{\mu}_i fail, i.e.,
\[ P(|\hat{\mu} - \mu| > \epsilon) \le P\Bigg(\sum_{i=1}^{k}X_i \ge \frac{k}{2}\Bigg) \le P\Bigg(\sum_{i=1}^{k}(X_i - E[X_i]) \ge \frac{k}{4}\Bigg) \]
as E[X_i] ≤ 1/4. From Hoeffding’s inequality, the above probability is bounded above by \exp(-2(k/4)^2/k); setting this to be less than δ, we need
\[ \exp\Big(-\frac{2(k/4)^2}{k}\Big) \le \delta \iff k \ge 8\ln\Big(\frac{1}{\delta}\Big), \]
so k = O(\ln\delta^{-1}) suffices, i.e., the total number of samples required is O(k\sigma^2/\epsilon^2) = O(\ln(\delta^{-1})\,\sigma^2/\epsilon^2). ⊛
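Part (b) is the classical median-of-means estimator. A minimal sketch (the heavy-tailed example distribution and the explicit constants below are illustrative choices following the analysis above):

    import numpy as np

    def median_of_means(samples, k):
        """Split samples into k groups, average each group, return the median of the averages."""
        groups = np.array_split(samples, k)
        return np.median([g.mean() for g in groups])

    rng = np.random.default_rng(2)
    eps, delta, sigma2 = 0.5, 0.01, 4.0              # target accuracy, failure prob., Var[X]
    k = int(np.ceil(8 * np.log(1 / delta)))           # k = O(log(1/delta)) groups
    n_per_group = int(np.ceil(4 * sigma2 / eps**2))   # O(sigma^2 / eps^2) samples per group
    samples = rng.standard_t(df=3, size=k * n_per_group) * np.sqrt(sigma2 / 3)  # mean 0, variance sigma2
    print(abs(median_of_means(samples, k)))           # within eps of the true mean 0 w.h.p.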



Problem (Exercise 2.2.10). Let X_1, ..., X_N be non-negative independent random variables with continuous distributions. Assume that the densities of X_i are uniformly bounded by 1.

(a) Show that the MGF of X_i satisfies
\[ E[\exp(-tX_i)] \le \frac{1}{t} \quad \text{for all } t > 0. \]

(b) Deduce that, for any ϵ > 0, we have
\[ P\Bigg(\sum_{i=1}^{N}X_i \le \epsilon N\Bigg) \le (e\epsilon)^N. \]

Answer. (a) Since the X_i are non-negative and the densities satisfy f_{X_i} ≤ 1 uniformly, for every t > 0,
\[ E[\exp(-tX_i)] = \int_0^{\infty}e^{-tx}f_{X_i}(x)\,dx \le \int_0^{\infty}e^{-tx}\,dx = -\frac{1}{t}e^{-tx}\Big|_0^{\infty} = \frac{1}{t}. \]

(b) From Chernoff’s inequality, for any ϵ > 0,
\[
\begin{aligned}
P\Bigg(\sum_{i=1}^{N}X_i \le \epsilon N\Bigg) = P\Bigg(-\sum_{i=1}^{N}\frac{X_i}{\epsilon} \ge -N\Bigg)
&\le \inf_{\lambda>0}e^{\lambda N}\,E\Bigg[\exp\Bigg(-\lambda\sum_{i=1}^{N}\frac{X_i}{\epsilon}\Bigg)\Bigg] \\
&= \inf_{\lambda>0}e^{\lambda N}\prod_{i=1}^{N}E\Big[\exp\Big(-\frac{\lambda}{\epsilon}X_i\Big)\Big] \\
&\le \inf_{\lambda>0}e^{\lambda N}\prod_{i=1}^{N}\frac{\epsilon}{\lambda} \qquad \text{(part (a) with } t = \lambda/\epsilon\text{)} \\
&= \inf_{\lambda>0}\Big(e^{\lambda}\frac{\epsilon}{\lambda}\Big)^N = (e\epsilon)^N
\end{aligned}
\]
since the infimum is achieved when λ = 1.
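A quick sanity check of part (b), using Uniform(0, 1) variables (an illustration; their density is bounded by 1, and the parameters are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    N, eps, trials = 20, 0.3, 200_000
    X = rng.random((trials, N))                       # Uniform(0,1): density bounded by 1
    empirical = (X.sum(axis=1) <= eps * N).mean()
    print(empirical, (np.e * eps) ** N)               # empirical small-sum probability vs. (e*eps)^N bound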

2.3 Chernoff’s inequality


Problem (Exercise 2.3.2). Modify the proof of Theorem 2.3.1 to obtain the following bound on the lower tail. For any t < µ, we have
\[ P(S_N \le t) \le e^{-\mu}\Big(\frac{e\mu}{t}\Big)^t. \]
Answer. A direct modification: for any λ > 0,
\[ P(S_N \le t) = P(-S_N \ge -t) = P\big(e^{-\lambda S_N} \ge e^{-\lambda t}\big) \le e^{\lambda t}\prod_{i=1}^{N}E[\exp(-\lambda X_i)]. \]
A direct computation gives
\[ E[\exp(-\lambda X_i)] = e^{-\lambda}p_i + (1 - p_i) = 1 + (e^{-\lambda} - 1)p_i \le \exp\big((e^{-\lambda} - 1)p_i\big), \]
hence
\[ P(S_N \le t) \le e^{\lambda t}\prod_{i=1}^{N}\exp\big((e^{-\lambda} - 1)p_i\big) = e^{\lambda t}\exp\big((e^{-\lambda} - 1)\mu\big) = \exp\big(\lambda t + (e^{-\lambda} - 1)\mu\big). \]
Minimizing the right-hand side over λ, we see that
\[ t + (-\mu e^{-\lambda}) = 0 \iff t = \mu e^{-\lambda} \iff \lambda = \ln\frac{\mu}{t} \]
achieves the infimum. Since t < µ, we have λ > 0 as required, which gives
\[ P(S_N \le t) \le \exp\Big(t\ln\frac{\mu}{t} + \Big(\frac{t}{\mu} - 1\Big)\mu\Big) = \exp\Big(t\ln\frac{\mu}{t} + t - \mu\Big) = e^{-\mu}\Big(\frac{e\mu}{t}\Big)^t. \qquad \circledast \]

Problem (Exercise 2.3.3). Let X ∼ Pois(λ). Show that for any t > λ, we have
 t
−λ eλ
P(X ≥ t) ≤ e .
t

Answer. From Chernoff’s inequality, for any θ > 0, we have

P(X ≥ t) ≤ e−θt E [exp(θX)] .

Then the Poisson moment can be calculated as


∞ ∞
X λk X (eθ λ)k
eθk · e−λ = e−λ = e−λ exp eθ λ = exp (eθ − 1)λ ,
 
E [exp(θX)] =
k! k!
k=0 k=0

hence  t  t
−θt θ
 λ −λ eλ
P(X ≥ t) ≤ e exp (e − 1)λ = exp(t − λ) = e
t t
where we take the minimizing θ = ln(t/λ) > 0 as t > λ. ⊛
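A quick numerical comparison of the exact Poisson tail with this bound (an illustration; the values of λ and t are arbitrary):

    from math import exp, factorial

    lam, t = 5.0, 12
    exact = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(t))  # P(X >= t)
    bound = exp(-lam) * (exp(1) * lam / t) ** t
    print(exact, bound)   # the Chernoff bound should dominate the exact tail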
Alternatively, we can also solve Exercise 2.3.3 directly from Theorem 2.3.1 as follows.

Answer. Consider a triangular array of independent Bernoulli random variables X_{N,i}, i ≤ N, such that the Poisson limit theorem applies to approximate X ∼ Pois(λ): as N → ∞, max_{i≤N} p_{N,i} → 0 and λ_N := E[S_N] → λ < ∞, so that S_N → Pois(λ) in distribution. From Chernoff’s inequality (Theorem 2.3.1), for any t > λ_N,
\[ P(S_N \ge t) \le e^{-\lambda_N}\Big(\frac{e\lambda_N}{t}\Big)^t. \]
We then see that
\[ P(X \ge t) = \lim_{N\to\infty}P(S_N \ge t) \le \lim_{N\to\infty}e^{-\lambda_N}\Big(\frac{e\lambda_N}{t}\Big)^t = e^{-\lambda}\Big(\frac{e\lambda}{t}\Big)^t \]
since λ_N → λ as N → ∞. ⊛

Week 4: Chernoff’s Inequality and Degree Concentration


7 Feb. 2024

Problem (Exercise 2.3.5). Show that, in the setting of Theorem 2.3.1, for δ ∈ (0, 1] we have
\[ P(|S_N - \mu| \ge \delta\mu) \le 2e^{-c\mu\delta^2} \]
where c > 0 is an absolute constant.

Answer. From Chernoff’s inequality (right-tail), for t = (1 + δ)µ, we have

ln P(SN ≥ (1 + δ)µ) ≤ −µ + (1 + δ)µ (1 + ln µ − ln(1 + δ) − ln µ)


= δµ − (1 + δ)µ(ln(1 + δ))
= µ(δ − (1 + δ) ln(1 + δ)).

A classic bound for ln(1 + δ) is the following.

Claim. For all x > 0,


2x
≤ ln(1 + x).
2+x

Proof. As (1 + x/2)^2 = 1 + x + x^2/4 \ge 1 + x,
\[ [\ln(1+x)]' = \frac{1}{1+x} \ge \frac{1}{(1+x/2)^2} = \Big[\frac{x}{1+x/2}\Big]'. \]
Note that ln(1 + x) and x/(1 + x/2) both equal 0 at x = 0, so for all x > 0,
\[ \ln(1+x) \ge \frac{x}{1+x/2} = \frac{2x}{2+x}. \qquad \circledast \]

Hence, as our δ ∈ (0, 1], we have
\[ \ln P(S_N \ge (1+\delta)\mu) \le \mu\big(\delta - (1+\delta)\ln(1+\delta)\big) \le \mu\delta - \mu(1+\delta)\frac{2\delta}{2+\delta} = -\frac{\mu\delta^2}{2+\delta} \le -\frac{\mu\delta^2}{3}. \]
Similarly, from Chernoff’s inequality (left-tail), for t = (1 − δ)µ, we have

ln P(SN ≤ (1 − δ)µ) ≤ −µ + (1 − δ)µ(1 + ln µ − ln(1 − δ) − ln µ)


= −δµ − (1 − δ)µ ln(1 − δ)
= µ(−δ − (1 − δ) ln(1 − δ)).

Another classic bound, this time for (1 − δ) ln(1 − δ), is the following.

Claim. For all x ∈ [0, 1),
\[ (1-x)\ln(1-x) \ge -x + \frac{x^2}{2}. \]

Proof. Expanding the logarithm, for x ∈ [0, 1),
\[ (1-x)\ln(1-x) = -(1-x)\sum_{k=1}^{\infty}\frac{x^k}{k} = -x + \sum_{k=2}^{\infty}\Big(\frac{1}{k-1} - \frac{1}{k}\Big)x^k = -x + \sum_{k=2}^{\infty}\frac{x^k}{k(k-1)} \ge -x + \frac{x^2}{2}. \qquad \circledast \]

Hence, if δ ∈ (0, 1),^a we have
\[ \ln P(S_N \le (1-\delta)\mu) \le \mu\big(-\delta - (1-\delta)\ln(1-\delta)\big) \le \mu\Big(-\delta + \delta - \frac{\delta^2}{2}\Big) = -\frac{\mu\delta^2}{2}. \]

Combining the two tails, we then see that
\[
\begin{aligned}
P(|S_N - \mu| \ge \delta\mu) &\le P(S_N \ge (1+\delta)\mu) + P(S_N \le (1-\delta)\mu) \\
&\le \exp\Big(-\frac{\mu\delta^2}{3}\Big) + \exp\Big(-\frac{\mu\delta^2}{2}\Big)
\le 2\exp\Big(-\frac{\mu\delta^2}{3}\Big),
\end{aligned}
\]
which completes the proof with c = 1/3. ⊛


^a When δ = 1, \ln P(S_N \le (1-\delta)\mu) \le -\mu\delta^2/2 holds trivially since P(S_N = 0) ≤ exp(−µ/2).


Problem (Exercise 2.3.6). Let X ∼ Pois(λ). Show that for t ∈ (0, λ], we have
\[ P(|X - \lambda| \ge t) \le 2\exp\Big(-\frac{ct^2}{\lambda}\Big). \]

Answer. Fix some t =: δλ ∈ (0, λ] for some δ ∈ (0, 1]. Consider a series of independent Bernoulli random variables X_{N,i} such that the Poisson limit theorem applies to approximate X ∼ Pois(λ), i.e., as N → ∞, max_{i≤N} p_{N,i} → 0 and λ_N := E[S_N] → λ < ∞, so S_N → Pois(λ). From the multiplicative form of Chernoff’s inequality (Exercise 2.3.5), for t_N := δλ_N,
\[ P(|S_N - \lambda_N| \ge t_N) \le 2\exp\Big(-\frac{ct_N^2}{\lambda_N}\Big). \]
It then follows from the Poisson limit theorem that
\[ P(|X - \lambda| \ge t) = \lim_{N\to\infty}P(|S_N - \lambda_N| \ge t_N) \le \lim_{N\to\infty}2\exp\Big(-\frac{ct_N^2}{\lambda_N}\Big) = 2\exp\Big(-\frac{ct^2}{\lambda}\Big) \]
since t_N = δλ_N → δλ = t. ⊛

Problem (Exercise 2.3.8). Let X ∼ Pois(λ). Show that, as λ → ∞, we have
\[ \frac{X - \lambda}{\sqrt{\lambda}} \xrightarrow{D} N(0, 1). \]

Answer. Since X := \sum_{i=1}^{\lambda}X_i ∼ Pois(λ) when the X_i are i.i.d. ∼ Pois(1) (taking λ ∈ N along the limit), the Lindeberg-Lévy central limit theorem gives
\[ \frac{X - E[X]}{\sqrt{\operatorname{Var}[X]}} = \frac{X - \lambda}{\sqrt{\lambda}} \xrightarrow{D} N(0, 1) \]
as E[X_i] = Var[X_i] = 1. ⊛

2.4 Application: degrees of random graphs


Problem (Exercise 2.4.2). Consider a random graph G ∼ G(n, p) with expected degrees d = O(log n).
Show that with high probability (say, 0.9), all vertices of G have degrees O(log n).

Answer. Since d = O(log n), there exists an absolute constant M > 0 such that d = (n − 1)p ≤ M log n for all large enough n. Now, consider some C > 0 such that eM/C =: α < 1. From Chernoff’s inequality,
\[ P(d_i \ge C\log n) \le e^{-d}\Big(\frac{ed}{C\log n}\Big)^{C\log n} \le e^{-d}\Big(\frac{eM}{C}\Big)^{C\log n} \le \alpha^{C\log n}. \]
Hence, from a union bound, we have
\[ P(\forall i: d_i \le C\log n) \ge 1 - n\alpha^{C\log n}, \]
which can be made arbitrarily close to 1 by choosing C sufficiently large. ⊛
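A small simulation of the maximum degree in G(n, p) with d = log n (an illustration using a Bernoulli adjacency matrix; the value of n is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 2000
    p = np.log(n) / (n - 1)                  # expected degree d = (n - 1) p = log n
    A = np.triu(rng.random((n, n)) < p, 1)   # strictly upper-triangular edge indicators
    degrees = (A + A.T).sum(axis=1)
    print(degrees.max(), np.log(n))          # max degree stays within a constant factor of log n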

Problem (Exercise 2.4.3). Consider a random graph G ∼ G(n, p) with expected degrees d = O(1). Show that with high probability (say, 0.9), all vertices of G have degrees
\[ O\Big(\frac{\log n}{\log\log n}\Big). \]


Answer. Since now d = (n − 1)p ≤ M for some absolute constant M > 0 for all large n (we may assume M > 1 without loss of generality by enlarging M), from Chernoff’s inequality,
\[ P\Big(d_i \ge C\tfrac{\log n}{\log\log n}\Big) \le e^{-d}\Bigg(\frac{ed}{C\tfrac{\log n}{\log\log n}}\Bigg)^{C\frac{\log n}{\log\log n}} \le e^{-d}\Big(\frac{eM\log\log n}{C\log n}\Big)^{C\frac{\log n}{\log\log n}} \]
for some C > 0. This implies that
\[ P\Big(\forall i: d_i \le C\tfrac{\log n}{\log\log n}\Big) \ge 1 - ne^{-d}\Big(\frac{eM\log\log n}{C\log n}\Big)^{C\frac{\log n}{\log\log n}}. \]
Now, taking C = M, we have
\[ ne^{-d}\Big(\frac{eM\log\log n}{C\log n}\Big)^{C\frac{\log n}{\log\log n}} = ne^{-d}\Big(\frac{e\log\log n}{\log n}\Big)^{M\frac{\log n}{\log\log n}}. \]
Taking the logarithm, we observe that
\[
\begin{aligned}
\log n - d + M\frac{\log n}{\log\log n}\big(1 + \log\log\log n - \log\log n\big)
&= (1 - M)\log n - d + M\frac{\log n}{\log\log n}\big(1 + \log\log\log n\big) \\
&= \Big(1 - M\Big(1 - \frac{1 + \log\log\log n}{\log\log n}\Big)\Big)\log n - d \to -\infty
\end{aligned}
\]
as n → ∞ (since M > 1), i.e.,
\[ ne^{-d}\Big(\frac{eM\log\log n}{C\log n}\Big)^{C\frac{\log n}{\log\log n}} \to 0, \]
which is what we want to prove. ⊛

Problem (Exercise 2.4.4). Consider a random graph G ∼ G(n, p) with expected degrees d = o(log n).
Show that with high probability, (say, 0.9), G has a vertex with degree 10d.

Answer. Omit. ⊛

Problem (Exercise 2.4.5). Consider a random graph G ∼ G(n, p) with expected degrees d = O(1). Show that with high probability (say, 0.9), G has a vertex with degree
\[ \Omega\Big(\frac{\log n}{\log\log n}\Big). \]

Answer. Firstly, note that the question is ill-posed in the sense that if d = (n − 1)p = O(1), we could have d = 0 (with p = 0), in which case the claim is impossible. Hence, consider the non-degenerate case, i.e., d = Θ(1).
We want to prove that there exists some absolute constant C > 0 such that with high probability G has a vertex with degree at least C log n/ log log n. First, split the vertices randomly into two parts A, B, each of size n/2. By dropping every edge inside A and inside B, the graph becomes bipartite, so A and B form independent sets. Working on this new graph (with degrees denoted by d'), we have
\[ P(d'_i = k) = \binom{n/2}{k}\Big(\frac{d}{n-1}\Big)^k\Big(1 - \frac{d}{n-1}\Big)^{n/2-k} \ge \Big(\frac{n}{2k}\Big)^k\cdot\frac{d^k}{n^k}\cdot e^{-d} = \Big(\frac{d}{2k}\Big)^k e^{-d}. \]
Let k = C log n/ log log n, so that d/(2k) > 1/ log n for large enough n;^a then
\[ P\Big(d'_i = \tfrac{C\log n}{\log\log n}\Big) \ge e^{-d}\Big(\frac{d}{2k}\Big)^k \ge e^{-d}(\log n)^{-k} = \exp(-d - k\log\log n) = \exp(-d - C\log n) = e^{-d}n^{-C}. \]
Let this probability be q, and focus on A. Define X_i = \mathbb{1}_{d'_i = k} for i ∈ A, and note that the X_i are independent since A is an independent set. Then the number of vertices in A with degree exactly k, denoted by X = \sum_{i\in A}X_i, follows Bin(n/2, q), with mean nq/2 and variance nq(1 − q)/2. From Chebyshev’s inequality,
\[ P(X = 0) \le P(|X - \mu| \ge \mu) \le \frac{\sigma^2}{\mu^2} = \frac{nq(1-q)/2}{(nq/2)^2} = \frac{2(1-q)}{nq} \le \frac{2}{nq} \le \frac{2}{ne^{-d}n^{-C}} = \frac{2e^d}{n^{1-C}}. \]
Now, setting C < 1, say C = 1/2, we get
\[ P(X = 0) \le 2e^d n^{-1/2} \to 0 \]
as n → ∞, which means P(X ≥ 1) → 1, i.e., with probability tending to 1 there is at least one vertex whose degree in the bipartite graph is exactly k = log n/(2 log log n). Since we only deleted edges at the beginning, we conclude that the original graph has a vertex with degree
\[ \Omega\Big(\frac{\log n}{\log\log n}\Big) \]
with overwhelming probability. ⊛
^a This is equivalent to k < (d/2) log n. As k has a log log n → ∞ factor in the denominator, the claim holds for large n.

Week 5: Sub-Gaussian Random Variables


16 Feb. 2024
2.5 Sub-gaussian distributions
Problem (Exercise 2.5.1). Show that for each p ≥ 1, the random variable X ∼ N (0, 1) satisfies
\[ \|X\|_{L^p} = \big(E[|X|^p]\big)^{1/p} = \Big(2^{p/2}\frac{\Gamma((1+p)/2)}{\Gamma(1/2)}\Big)^{1/p}. \]
Deduce that
\[ \|X\|_{L^p} = O(\sqrt{p}) \quad \text{as } p \to \infty. \]

Answer. We see that for p ≥ 1,
\[ \big(E[|X|^p]\big)^{1/p} = \Big(\int_{-\infty}^{\infty}|x|^p\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx\Big)^{1/p} = \Big(2\int_0^{\infty}x^p\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx\Big)^{1/p} \]
from the symmetry around 0. Next, with the change of variable x^2 =: u,
\[ = \Big(\frac{2}{\sqrt{2\pi}}\int_0^{\infty}u^{p/2}e^{-u/2}\frac{1}{2\sqrt{u}}\,du\Big)^{1/p} = \Big(\frac{1}{\sqrt{2\pi}}\int_0^{\infty}u^{(p-1)/2}e^{-u/2}\,du\Big)^{1/p}, \]
and with another change of variable u/2 =: t,
\[ = \Big(\frac{1}{\sqrt{2\pi}}\int_0^{\infty}(2t)^{(p-1)/2}e^{-t}\,2\,dt\Big)^{1/p} = \Big(\frac{1}{\sqrt{2\pi}}2^{(p+1)/2}\Gamma\Big(\frac{p+1}{2}\Big)\Big)^{1/p}. \]
As Γ(1/2) = \sqrt{\pi}, we finally have
\[ \|X\|_{L^p} = \Big(2^{p/2}\frac{\Gamma((p+1)/2)}{\Gamma(1/2)}\Big)^{1/p}, \]
where we recall that
\[ \Gamma(z) = \int_0^{\infty}t^{z-1}e^{-t}\,dt. \]
To show that \|X\|_{L^p} = O(\sqrt{p}) as p → ∞, we first note the following.

Lemma 2.5.1. For p ≥ 1 an integer,
\[ \Gamma\Big(\frac{1+p}{2}\Big) = \begin{cases}2^{-p/2}\sqrt{\pi}\,(p-1)!!, & \text{if } p \text{ is even};\\ 2^{-(p-1)/2}(p-1)!!, & \text{if } p \text{ is odd}.\end{cases} \]

Proof. Consider the Legendre duplication formula,
\[ \Gamma(z)\Gamma(z + 1/2) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z). \]
For p even, (1 + p)/2 = p/2 + 1/2, so by letting z := p/2 ∈ N,
\[ \Gamma((1+p)/2) = \frac{2^{1-p}\sqrt{\pi}\,\Gamma(p)}{\Gamma(p/2)} = 2^{1-p}\sqrt{\pi}\frac{(p-1)!}{(p/2-1)!} = 2^{1-p}\sqrt{\pi}\frac{(p-1)!}{(1/2)^{p/2-1}(p-2)!!} = 2^{-p/2}\sqrt{\pi}\,(p-1)!!. \]
For odd p, recall the identity Γ(z + 1) = zΓ(z). We then have
\[ \Gamma((1+p)/2) = \frac{p-1}{2}\Gamma((p-1)/2) = \frac{(p-1)(p-3)}{2^2}\Gamma((p-3)/2) = \cdots = \frac{(p-1)(p-3)\cdots 2}{2^{(p-1)/2}}\Gamma(1) = 2^{-(p-1)/2}(p-1)!!. \]

We then see that as p → ∞,
\[ \|X\|_{L^p} = \Big(2^{p/2}\frac{\Gamma((1+p)/2)}{\Gamma(1/2)}\Big)^{1/p} \lesssim \big((p-1)!!\big)^{1/p} \le \big(p!\big)^{1/(2p)} = O(\sqrt{p}). \qquad \circledast \]
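A quick numerical check of the closed form against direct numerical integration (an illustration using scipy; the tested values of p are arbitrary):

    import numpy as np
    from scipy.special import gamma
    from scipy.integrate import quad

    for p in (1, 2, 3.5, 6):
        closed = (2 ** (p / 2) * gamma((p + 1) / 2) / gamma(0.5)) ** (1 / p)
        moment, _ = quad(lambda x: abs(x) ** p * np.exp(-x * x / 2) / np.sqrt(2 * np.pi),
                         -np.inf, np.inf)
        print(p, closed, moment ** (1 / p), np.sqrt(p))   # closed form matches numerics; both O(sqrt(p))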

Problem (Exercise 2.5.4). Show that the condition E[X] = 0 is necessary for property v to hold.

Answer. If E[\exp(\lambda X)] \le \exp(K_5^2\lambda^2) for all λ ∈ R, then from Jensen’s inequality,
\[ \exp(E[\lambda X]) \le E[\exp(\lambda X)] \le \exp(K_5^2\lambda^2), \]
i.e.,
\[ \lambda E[X] \le K_5^2\lambda^2. \]
Since this holds for every λ ∈ R: if λ > 0, then E[X] ≤ K_5^2\lambda; on the other hand, if λ < 0, then E[X] ≥ K_5^2\lambda. In either case, letting λ → 0 (from the respective side), we get 0 ≤ E[X] ≤ 0, hence E[X] = 0. ⊛

Problem (Exercise 2.5.5). (a) Show that if X ∼ N(0, 1), the function λ ↦ E[\exp(\lambda^2X^2)] is only finite in some bounded neighborhood of zero.

(b) Suppose that some random variable X satisfies E[\exp(\lambda^2X^2)] \le \exp(K\lambda^2) for all λ ∈ R and some constant K. Show that X is a bounded random variable, i.e., ∥X∥_∞ < ∞.

Answer. (a) If X ∼ N(0, 1), we see that
\[ E[\exp(\lambda^2X^2)] = \int_{-\infty}^{\infty}\exp(\lambda^2x^2)\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp\big((\lambda^2 - 1/2)x^2\big)\,dx. \]
If λ^2 − 1/2 ≥ 0, the integral diverges simply because e^{\epsilon x^2} for any ϵ ≥ 0 is bounded below by 1. On the other hand, if λ^2 − 1/2 < 0, then this is just a (scaled) Gaussian integral, which converges. Hence, this function is only finite for λ ∈ (−1/√2, 1/√2).

(b) For any t and any λ,
\[ P(|X| > t) \le \frac{E[\exp(\lambda^2X^2)]}{\exp(\lambda^2t^2)} \le \frac{\exp(K\lambda^2)}{\exp(\lambda^2t^2)} = \exp\big(\lambda^2(K - t^2)\big). \]
Now, pick any t > \sqrt{K} (as K is a constant, t can be any constant greater than \sqrt{K}), so λ^2(K − t^2) < 0. Letting λ → ∞, we see that P(|X| > t) = 0, i.e., P(|X| ≤ t) = 1. Hence ∥X∥_∞ ≤ t < ∞, and we’re done.

Problem (Exercise 2.5.7). Check that ∥·∥ψ2 is indeed a norm on the space of sub-gaussian random
variables.

Answer. It’s clear that ∥X∥ψ2 = 0 if and only if X = 0. Also, for any λ > 0, ∥λX∥ψ2 = λ∥X∥ψ2
is obvious. Hence, we only need to verify triangle inequality, i.e., for any sub-gaussian random
variables X and Y ,
∥X + Y ∥ψ2 ≤ ∥X∥ψ2 + ∥Y ∥ψ2 .
Firstly, we observe that since exp(x) and x^2 are both convex (hence so is their composition exp(x^2)),
\[ \exp\Bigg(\Big(\frac{X+Y}{\|X\|_{\psi_2}+\|Y\|_{\psi_2}}\Big)^2\Bigg) \le \frac{\|X\|_{\psi_2}}{\|X\|_{\psi_2}+\|Y\|_{\psi_2}}\exp\big((X/\|X\|_{\psi_2})^2\big) + \frac{\|Y\|_{\psi_2}}{\|X\|_{\psi_2}+\|Y\|_{\psi_2}}\exp\big((Y/\|Y\|_{\psi_2})^2\big). \]
Then, by taking expectations on both sides,
\[ E\Bigg[\exp\Bigg(\Big(\frac{X+Y}{\|X\|_{\psi_2}+\|Y\|_{\psi_2}}\Big)^2\Bigg)\Bigg] \le \frac{\|X\|_{\psi_2}}{\|X\|_{\psi_2}+\|Y\|_{\psi_2}}\cdot 2 + \frac{\|Y\|_{\psi_2}}{\|X\|_{\psi_2}+\|Y\|_{\psi_2}}\cdot 2 = 2. \]

Now, we see that from the definition of ∥X + Y ∥ψ2 and t := ∥X∥ψ2 + ∥Y ∥ψ2 , the above implies

∥X + Y ∥ψ2 ≤ ∥X∥ψ2 + ∥Y ∥ψ2 ,

hence the triangle inequality is verified. ⊛

Problem (Exercise 2.5.9). Check that Poisson, exponential, Pareto and Cauchy distributions are not
sub-gaussian.

Answer. Omit. ⊛

Problem (Exercise 2.5.10). Let X_1, X_2, ..., be a sequence of sub-gaussian random variables, which are not necessarily independent. Show that
\[ E\Big[\max_i\frac{|X_i|}{\sqrt{1+\log i}}\Big] \le CK, \]
where K = max_i ∥X_i∥_{ψ_2}. Deduce that for every N ≥ 2 we have
\[ E\Big[\max_{i\le N}|X_i|\Big] \le CK\sqrt{\log N}. \]


Answer. Let Y_i := |X_i|/(K\sqrt{1+\log i}) (which is always non-negative) for all i ≥ 1, where K := max_i ∥X_i∥_{ψ_2}. Then for all t ≥ 0,
\[ P(Y_i \ge t) = P\Big(\frac{|X_i|}{K\sqrt{1+\log i}} \ge t\Big) = P\big(|X_i| \ge tK\sqrt{1+\log i}\big) \le 2\exp\Big(-\frac{ct^2K^2(1+\log i)}{\|X_i\|_{\psi_2}^2}\Big) \le 2\exp\big(-ct^2(1+\log i)\big) = 2(ei)^{-ct^2}. \]
Our goal now is to show that E[\max_i Y_i] \le C for some absolute constant C. Consider t_0 := \sqrt{2/c}, so that for t ≥ t_0 we have i^{-ct^2} \le i^{-2}; then
\[
\begin{aligned}
E\Big[\max_i Y_i\Big] &= \int_0^{\infty}P\Big(\max_i Y_i \ge t\Big)\,dt \\
&\le \int_0^{t_0}P\Big(\max_i Y_i \ge t\Big)\,dt + \int_{t_0}^{\infty}\sum_{i=1}^{\infty}P(Y_i \ge t)\,dt \qquad \text{(union bound)} \\
&\le t_0 + \int_{t_0}^{\infty}\sum_{i=1}^{\infty}2(ei)^{-ct^2}\,dt
\le \sqrt{2/c} + 2\sum_{i=1}^{\infty}i^{-2}\int_{t_0}^{\infty}e^{-ct^2}\,dt \\
&\le \sqrt{2/c} + \frac{\pi^2}{3}\int_0^{\infty}e^{-ct^2}\,dt = \sqrt{2/c} + \frac{\pi^2}{3}\cdot\frac{1}{2}\sqrt{\frac{\pi}{c}} = \frac{\sqrt{2} + \pi^{5/2}/6}{\sqrt{c}} =: C.
\end{aligned}
\]
Finally, for every N ≥ 2,
\[ E\Big[\max_{i\le N}\frac{|X_i|}{\sqrt{1+\log N}}\Big] \le E\Big[\max_{i\le N}\frac{|X_i|}{\sqrt{1+\log i}}\Big] \le E\Big[\max_i\frac{|X_i|}{\sqrt{1+\log i}}\Big] \le CK, \]
i.e., E[\max_{i\le N}|X_i|] \le CK\sqrt{1+\log N} for all N ≥ 2. Since 1 + \log N \le (1 + 1/\log 2)\log N for N ≥ 2, letting C' := C\sqrt{1 + 1/\log 2} gives
\[ E\Big[\max_{i\le N}|X_i|\Big] \le C'K\sqrt{\log N}, \]
which is exactly what we want. ⊛
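A quick empirical look at the √(log N) growth for i.i.d. standard normals (an illustration; the sample sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)
    for N in (10, 100, 1000, 10000):
        samples = rng.standard_normal((2000, N))
        emp = np.abs(samples).max(axis=1).mean()       # estimate of E max_{i<=N} |X_i|
        print(N, emp, emp / np.sqrt(np.log(N)))         # the ratio stays bounded (roughly sqrt(2))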

Problem (Exercise 2.5.11). Show that the bound in Exercise 2.5.10 is sharp. Let X_1, X_2, ..., X_N be independent N(0, 1) random variables. Prove that
\[ E\Big[\max_{i\le N}X_i\Big] \ge c\sqrt{\log N}. \]

Answer. Again, let’s first write, using Exercise 1.2.2,
\[ E\Big[\max_{i\le N}X_i\Big] = \int_0^{\infty}P\Big(\max_{i\le N}X_i \ge t\Big)\,dt - \int_{-\infty}^{0}P\Big(\max_{i\le N}X_i < t\Big)\,dt \ge \int_0^{\infty}P\Big(\max_{i\le N}X_i \ge t\Big)\,dt - C_0, \]
where C_0 := \int_{-\infty}^{0}P(X_1 < t)\,dt = E[X_1^-] = 1/\sqrt{2\pi} is an absolute constant. Observe that for any t ≥ 0,
\[ P(X_i \ge t) = \frac{1}{\sqrt{2\pi}}\int_t^{\infty}\exp\Big(-\frac{x^2}{2}\Big)\,dx = \frac{1}{\sqrt{2\pi}}\int_0^{\infty}\exp\Big(-\frac{(x+t)^2}{2}\Big)\,dx \ge \frac{1}{\sqrt{2\pi}}\int_0^{1}\exp\Big(-\frac{(x+t)^2}{2}\Big)\,dx \ge Ce^{-t^2} \]
for some absolute constant C > 0. Since the X_i are i.i.d.,
\[ P\Big(\max_{i\le N}X_i \ge t\Big) = 1 - P(X_1 < t)^N = 1 - \big(1 - P(X_1 \ge t)\big)^N, \]
so
\[ \int_0^{\infty}P\Big(\max_{i\le N}X_i \ge t\Big)\,dt \ge \int_0^{\infty}\Big(1 - \big(1 - Ce^{-t^2}\big)^N\Big)\,dt = \sqrt{\log N}\int_0^{\infty}\Big(1 - \big(1 - CN^{-u^2}\big)^N\Big)\,du \qquad (t =: \sqrt{\log N}\,u). \]
Finally, the last integral is bounded below by an absolute constant depending only on C (e.g., on u ∈ [0, 1/\sqrt{2}] the integrand is at least 1 − (1 − CN^{-1/2})^N → 1), so together with the constant C_0 we obtain E[\max_{i\le N}X_i] \ge c\sqrt{\log N} for an absolute constant c > 0 (adjusting c to absorb C_0 for small N). ⊛

Week 6: Hoeffding’s and Khintchine’s Inequalities


21 Feb. 2024
2.6 General Hoeffding’s and Khintchine’s inequalities
Problem (Exercise 2.6.4). Deduce Hoeffding’s inequality for bounded random variables (Theorem
2.2.6) from Theorem 2.6.3, possibly with some absolute constant instead of 2 in the exponent.

Answer. Omit. ⊛

Problem (Exercise 2.6.5). Let X_1, ..., X_N be independent sub-gaussian random variables with zero means and unit variances, and let a = (a_1, ..., a_N) ∈ R^N. Prove that for every p ∈ [2, ∞) we have
\[ \Bigg(\sum_{i=1}^{N}a_i^2\Bigg)^{1/2} \le \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p} \le CK\sqrt{p}\Bigg(\sum_{i=1}^{N}a_i^2\Bigg)^{1/2}, \]
where K = max_i ∥X_i∥_{ψ_2} and C is an absolute constant.

Answer. From Jensen’s inequality,
\[ \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p} \ge \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^2} = \Bigg(E\Bigg[\Bigg(\sum_{i=1}^{N}a_iX_i\Bigg)^2\Bigg]\Bigg)^{1/2}. \]
Then, observe that since E[X_i] = 0,
\[ \operatorname{Var}\Bigg[\sum_{i=1}^{N}a_iX_i\Bigg] = E\Bigg[\Bigg(\sum_{i=1}^{N}a_iX_i\Bigg)^2\Bigg] - \Bigg(E\Bigg[\sum_{i=1}^{N}a_iX_i\Bigg]\Bigg)^2 = E\Bigg[\Bigg(\sum_{i=1}^{N}a_iX_i\Bigg)^2\Bigg], \]
and at the same time, as Var[X_i] = 1, \operatorname{Var}\big[\sum_{i=1}^{N}a_iX_i\big] = \sum_{i=1}^{N}a_i^2\operatorname{Var}[X_i] = \sum_{i=1}^{N}a_i^2 = \|a\|_2^2, hence we have
\[ \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p} \ge \big(\|a\|_2^2\big)^{1/2} = \|a\|_2, \]
which is the desired lower bound. For the upper bound, we see that
\[ \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p}^2 \le C^2p\Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{\psi_2}^2 \le C'p\sum_{i=1}^{N}\|a_iX_i\|_{\psi_2}^2 = C'p\sum_{i=1}^{N}a_i^2\|X_i\|_{\psi_2}^2 \le C'K^2p\|a\|_2^2, \]
where C, C' are absolute constants (possibly depending on each other). Taking the square root on both sides, we obtain the desired result. ⊛

Problem (Exercise 2.6.6). Show that in the setting of Exercise 2.6.5, we have
\[ c(K)\Bigg(\sum_{i=1}^{N}a_i^2\Bigg)^{1/2} \le \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^1} \le \Bigg(\sum_{i=1}^{N}a_i^2\Bigg)^{1/2}. \]
Here K = max_i ∥X_i∥_{ψ_2} and c(K) > 0 is a quantity which may depend only on K.

Answer. Skip, as this is a special case of Exercise 2.6.7. ⊛

Problem (Exercise 2.6.7). State and prove a version of Khintchine’s inequality for p ∈ (0, 2).

Answer. Khintchine’s inequality for p ∈ (0, 2) can be stated as
\[ c(K, p)\Bigg(\sum_{i=1}^{N}a_i^2\Bigg)^{1/2} \le \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p} \le \Bigg(\sum_{i=1}^{N}a_i^2\Bigg)^{1/2}. \]

Here K = maxi ∥Xi ∥ψ2 and c(K, p) > 0 is a quantity which depends on K and p. We first recall the
generalized Hölder inequality.

Theorem 2.6.1 (Generalized Hölder inequality). For 1/p + 1/q = 1/r where p, q ∈ (0, ∞],
\[ \|fg\|_{L^r} \le \|f\|_{L^p}\|g\|_{L^q}. \]

Proof. The classical case is r = 1. In general, consider |f|^r ∈ L^{p/r} and |g|^r ∈ L^{q/r}, so that r/p + r/q = 1. Then the standard Hölder inequality implies
\[ \|fg\|_{L^r}^r = \int|fg|^r = \big\||fg|^r\big\|_{L^1} \le \big\||f|^r\big\|_{L^{p/r}}\big\||g|^r\big\|_{L^{q/r}} = \Big(\int(|f|^r)^{p/r}\Big)^{r/p}\Big(\int(|g|^r)^{q/r}\Big)^{r/q} = \|f\|_{L^p}^r\|g\|_{L^q}^r, \]
implying the result. ■


Now, take r = 2 and exponents p = q = 4 in Theorem 2.6.1; we get
\[ \|XY\|_{L^2} \le \|X\|_{L^4}\|Y\|_{L^4} = \big(E[|X|^4]\big)^{1/4}\big(E[|Y|^4]\big)^{1/4}. \]
Let X = |Z|^{p/4} and Y = |Z|^{(4-p)/4} (with p ∈ (0, 2) now the target exponent); we see that
\[ \|Z\|_{L^2} \le \big(E[|Z|^p]\big)^{1/4}\big(E[|Z|^{4-p}]\big)^{1/4} = \|Z\|_{L^p}^{p/4}\|Z\|_{L^{4-p}}^{(4-p)/4}, \]
implying
\[ \|Z\|_{L^p} \ge \Bigg(\frac{\|Z\|_{L^2}}{\|Z\|_{L^{4-p}}^{(4-p)/4}}\Bigg)^{4/p} = \frac{\|Z\|_{L^2}^{4/p}}{\|Z\|_{L^{4-p}}^{(4-p)/p}}. \]
Finally, by letting Z = \sum_{i=1}^{N}a_iX_i,
\[ \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p} \ge \frac{\big\|\sum_{i=1}^{N}a_iX_i\big\|_{L^2}^{4/p}}{\big\|\sum_{i=1}^{N}a_iX_i\big\|_{L^{4-p}}^{(4-p)/p}}. \]
Observe that from Exercise 2.6.5:
• \|\sum_{i=1}^{N}a_iX_i\|_{L^2} = \|a\|_2;
• \|\sum_{i=1}^{N}a_iX_i\|_{L^{4-p}} \le CK\sqrt{4-p}\,\|a\|_2 (as 4 − p > 2 from p ∈ (0, 2)),
hence
\[ \Bigg\|\sum_{i=1}^{N}a_iX_i\Bigg\|_{L^p} \ge \frac{\|a\|_2^{4/p}}{\big(CK\sqrt{4-p}\,\|a\|_2\big)^{(4-p)/p}} = \big(CK\sqrt{4-p}\big)^{-(4-p)/p}\|a\|_2. \]
Hence, by letting c(K, p) := (CK\sqrt{4-p})^{-(4-p)/p}, the lower bound is established. The upper bound is essentially the same as in Exercise 2.6.5 (there, Jensen gave the lower bound since p ≥ 2), where this time we use \|\cdot\|_{L^p} \le \|\cdot\|_{L^2} since p ≤ 2.^a Hence, we’re done. ⊛
^a Note that although \|\cdot\|_{L^p} for p ∈ (0, 1) is not a norm, this inequality still holds.


Remark. Exercise 2.6.6 is just a special case with c(K, 1) = (CK\sqrt{3})^{-3}.

Problem (Exercise 2.6.9). Show that unlike (2.19), the centering inequality in Lemma 2.6.8 does not
hold with C = 1.

Answer. Consider the random variable X := \sqrt{\log 2}\cdot\epsilon where ϵ is a (biased) Rademacher random variable with parameter p, i.e.,
\[ X = \begin{cases}\sqrt{\log 2}, & \text{w.p. } p;\\ -\sqrt{\log 2}, & \text{w.p. } 1-p.\end{cases} \]
Since E[\exp(X^2)] = 2, we know that \|X\|_{\psi_2} is exactly 1. We now want to show that \|X - E[X]\|_{\psi_2} > \|X\|_{\psi_2} = 1 for some p. It amounts to showing that E[\exp(|X - E[X]|^2)] > 2. Now, we know that E[X] = \sqrt{\log 2}\,(2p - 1), and hence
\[ X - E[X] = \begin{cases}2(1-p)\sqrt{\log 2}, & \text{w.p. } p;\\ -2p\sqrt{\log 2}, & \text{w.p. } 1-p.\end{cases} \]
Hence, we have that
\[ E\big[\exp(|X - E[X]|^2)\big] = p\cdot 2^{4(1-p)^2} + (1-p)\cdot 2^{4p^2}. \]
A quick numerical optimization gives the desired result with p ≈ 0.236, for which the right-hand side is about 2.08 > 2. ⊛
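The numerical claim can be reproduced with a tiny grid search (a sketch, not part of the original solution):

    import numpy as np

    p = np.linspace(0.01, 0.5, 4901)
    val = p * 2 ** (4 * (1 - p) ** 2) + (1 - p) * 2 ** (4 * p ** 2)   # E[exp(|X - E X|^2)]
    print(p[np.argmax(val)], val.max())   # maximum ~2.08, attained near the p ~ 0.236 reported above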

Week 7: Sub-Exponential Random Variables


1 Mar. 2024
2.7 Sub-exponential distributions


Problem (Exercise 2.7.2). Prove the equivalence of properties a-d in Proposition 2.7.1 by modifying
the proof of Proposition 2.5.2.

Answer. This is a special case of Exercise 2.7.3 with α = 1. ⊛

Problem (Exercise 2.7.3). More generally, consider the class of distributions whose tail decay is of
the type exp(−ctα ) or faster. Here α = 2 corresponds to sub-gaussian distributions, and α = 1, to
sub-exponential. State and prove a version of Proposition 2.7.1 for such distributions.

Answer. The generalized version of Proposition 2.7.1 is known to be the so-called Sub-Weibull
distributions [Vla+20]: Let X be a random variable. Then the following properties are equivalent;
the parameters Ki > 0 appearing in these properties differ from each other by at most an absolute
constant factor.

(a) The tails of X satisfy
\[ P(|X| \ge t) \le 2\exp(-t^{\alpha}/K_1) \quad \text{for all } t \ge 0. \]

(b) The moments of X satisfy
\[ \|X\|_{L^p} = \big(E[|X|^p]\big)^{1/p} \le K_2p^{1/\alpha} \quad \text{for all } p \ge 1. \]

(c) The MGF of |X| satisfies
\[ E[\exp(\lambda^{\alpha}|X|^{\alpha})] \le \exp(\lambda^{\alpha}K_3^{\alpha}) \quad \text{for all } \lambda \text{ such that } 0 \le \lambda \le \frac{1}{K_3}. \]

(d) The MGF of |X| is bounded at some point, namely
\[ E[\exp(|X|^{\alpha}/K_4^{\alpha})] \le 2. \]

Claim. (a) ⇒ (b)

Proof. Without loss of generality, let K1 = 1. Then, we have


Z ∞
∥X∥pLp = P(|X|p ≥ t) dt
Z0 ∞
= pup−1 P(|X| ≥ u) du u := t1/p
0
Z ∞
α
≤ 2p up−1 e−u du from our assumption
Z0 ∞
2p
= tp/α−1 e−t dt t := uα
α 0
p
= 2 Γ(p/α) = 2Γ(p/α + 1) ≲ (p/α + 1)p/α+1
α
for some constant C from Stirling’s approximation. Hence,
p  α1 + p1 p  α1  p  p1
∥X∥Lp ≲ +1 = +1 +1 ≲ p1/α
α α α
as we desired. ⊛

Claim. (b) ⇒ (c)


Proof. Firstly, from Taylor’s expansion, we have


∞ ∞
α α
X λαk E[|X|αk ] X λαk E[|X|αk ]
E[exp(λ |X| )] = 1 + ≤1+ .
k! k!
k=1 k=1

From (b), when αk ≥ 1, we have E[|X|αk ] ≤ (K2 (αk)1/α )αk = K2αk (αk)k . On the other hand,
for any given α > 0, there are only finitely many k ≥ 1 such that αk < 1. Hence, there exists
some Ke 2 such that
E[|X|αk ] ≤ K
e 2αk (αk)k

for all k ≥ 1. With k! ≥ (k/e)k from Stirling’s approximation, we further have


∞ ∞ ∞ ∞
X λαk E[|X|αk ] X λαk K
e αk (αk)k
2
X X
1+ ≤1+ =1+ λαk K
e 2αk (αe)k = 1 + e 2α λα αe)k .
(K
k! (k/e)k
k=1 k=1 k=1 k=1

Observe that if 0 < K


e α λα αe < 1, we then have
2


α α
X
e 2α λα αe)k = 1
E[exp(λ |X| )] ≤ 1 + (K .
k=1 1 − K2α λα αe
e

As (1 − x)e2x ≥ 1 for all x ∈ [0, 1/2], the above is further less than
  h iα 
exp 2(K e 2 λ)α αe = exp (2αe)1/α K e 2 λα .

By letting K3 := (2αe)1/α K
e 2 , we have the desired result whenever K
e α λα αe < 1, or equiva-
2
lently,
1 1
0 < λα < ⇔0<λ< .
α
K αe
e K2 (αe)1/α
e
2

Hence, if 0 < λ ≤ 1
e 2 (2αe)1/α
K
= K3 ,
1
the above is satisfied. ⊛

Claim. (c) ⇒ (d)

Proof. Assuming (c) holds, then (d) is obtained by taking λ := 1/K4 where K4 := K3 (ln 2)−1/α .
In this case, λ = 1/K3 · (ln 2)1/α , hence

E[exp(λα |X|α )] = E[exp(|X|α /K4α )] ≤ exp(λα K3α )

for all 0 ≤ λ = 1/K4 ≤ 1/K3 from (d) gives


 
1
E[exp(|X|α /K4α )] ≤ exp ln 2 · α · K3α = 2.
K3

Claim. (d) ⇒ (a)

Proof. Let K4 = 1 without loss of generality. Then, we have

E[exp(|X|α )]
P(|X| ≥ t) = P(exp(|X|α ) ≥ exp(tα )) ≤ ≤ 2 exp(−tα ),
exp(tα )

hence K1 := 1 proves the result. ⊛


Problem (Exercise 2.7.4). Argue that the bound in property c can not be extended for all λ such
that |λ| ≤ 1/K3 .

Answer. It’s easy to see that in the proof of Exercise 2.7.3, when we prove (b) ⇒ (c), the condition
for λ essentially comes from:
P∞ e α α P∞ e
• whether 1 + k=1 (K k
2 λ αe) = 1 + k=1 (K2 λe) as α = 1 converges; and
k

• the numerical inequality (1 − x)e2x ≥ 1 for x ∈ [0, 1/2] such that x := K


e 2 λe.

For the first condition, we only need |K


e 2 λe| < 1, hence we don’t need positivity for λ at first;
however, the second condition indeed requires λ ≥ 0, and it’s impossible to remove as this is tight.

Problem (Exercise 2.7.10). Prove an analog of the Centering Lemma 2.6.8 for sub-exponential ran-
dom variables X:
∥X − E[X]∥ψ1 ≤ C∥X∥ψ1 .

Answer. Since \|\cdot\|_{\psi_1} is a norm, we have \|X - E[X]\|_{\psi_1} \le \|X\|_{\psi_1} + \|E[X]\|_{\psi_1}, where
\[ \|E[X]\|_{\psi_1} \lesssim |E[X]| \le E[|X|] = \|X\|_{L^1} \lesssim \|X\|_{\psi_1}. \]
Here the first step uses that for a constant a, \|a\|_{\psi_1} = \inf_{t>0}\{E[e^{|a|/t}] \le 2\} = |a|/\ln 2 \lesssim |a|; the second is Jensen’s inequality; and the last follows from Proposition 2.7.1 (b) with p = 1, i.e., \|X\|_{L^1} \le K_2, since the parameters K_i agree with \|X\|_{\psi_1} = K_4 up to absolute constants. ⊛


Week 8: Bernstein’s Inequality


6 Mar. 2024

Problem (Exercise 2.7.11). Show that ∥X∥_ψ is indeed a norm on the space L_ψ.

Answer. Clearly, ∥X∥_ψ ≥ 0. To check that ∥X∥_ψ = 0 if and only if X = 0 a.s., we first see that ∥0∥_ψ = 0 as ψ(0) = 0. On the other hand, if ∥X∥_ψ = 0, then E[ψ(|X|/t)] ≤ 1 for every t > 0, and by the monotone convergence theorem (ψ(|X|/t) is nondecreasing as t ↓ 0),
\[ 1 \ge \lim_{t\to 0}E[\psi(|X|/t)] = E\Big[\lim_{t\to 0}\psi(|X|/t)\Big] = \int_0^{\infty}P\Big(\lim_{t\to 0}\psi(|X|/t) > u\Big)\,du = P(|X| > 0)\int_0^{\infty}du = \infty\cdot P(|X| > 0), \]
since if |X| = 0 then ψ(|X|/t) = ψ(0) = 0 for all t > 0, while on \{|X| > 0\} we have ψ(|X|/t) → ∞ as t → 0 (because ψ(x) → ∞ as x → ∞). Overall, this forces P(|X| > 0) = 0, i.e., X = 0 almost surely, hence we conclude that ∥X∥_ψ = 0 if and only if X = 0 a.s. The other two properties follow from the same proof as Exercise 2.5.7. ⊛

2.8 Bernstein’s inequality


Problem (Exercise 2.8.5). Let X be a mean-zero random variable such that |X| ≤ K. Prove the following bound on the MGF of X:
\[ E[\exp(\lambda X)] \le \exp\big(g(\lambda)E[X^2]\big) \quad \text{where } g(\lambda) = \frac{\lambda^2/2}{1 - |\lambda|K/3}, \]
provided that |λ| < 3/K.

Answer. Following the hint, we first check the following.

Claim. For all |x| < 3,
\[ e^x \le 1 + x + \frac{x^2/2}{1 - |x|/3}. \]

Proof. From the Taylor expansion,
\[ e^x = 1 + x + \frac{x^2}{2}\sum_{k=0}^{\infty}\frac{2x^k}{(2+k)!} \le 1 + x + \frac{x^2}{2}\sum_{k=0}^{\infty}\frac{|x|^k}{3^k} = 1 + x + \frac{x^2/2}{1 - |x|/3}, \]
where we used (2 + k)!/2 ≥ 3^k and the last equality holds for all |x| < 3. ⊛

Now, for a random variable X such that |X| ≤ K and |λ| < 3/K, we have
\[ E[\exp(\lambda X)] \le E\Big[1 + \lambda X + \frac{\lambda^2X^2/2}{1 - |\lambda X|/3}\Big] \le 1 + \frac{\lambda^2E[X^2]/2}{1 - |\lambda|K/3} \le \exp\Big(\frac{\lambda^2E[X^2]/2}{1 - |\lambda|K/3}\Big), \]
where we let x := λX and apply the claim (using E[X] = 0 and |λX| ≤ |λ|K). Finally, note that the right-hand side is exactly \exp\big(g(\lambda)E[X^2]\big), so we’re done. ⊛


Problem (Exercise 2.8.6). Deduce Theorem 2.8.4 from the bound in Exercise 2.8.5.

Answer. From Markov’s inequality, for every t ≥ 0,
\[ P\Bigg(\sum_{i=1}^{N}X_i \ge t\Bigg) \le \inf_{\lambda>0}\frac{E\big[\exp\big(\lambda\sum_{i=1}^{N}X_i\big)\big]}{\exp(\lambda t)} = \inf_{\lambda>0}e^{-\lambda t}\prod_{i=1}^{N}E[\exp(\lambda X_i)] \le \inf_{\lambda>0}e^{-\lambda t}\exp\Bigg(g(\lambda)\sum_{i=1}^{N}E[X_i^2]\Bigg) \]
from Exercise 2.8.5, provided |λ| < 3/K. Denoting σ^2 = \sum_{i=1}^{N}E[X_i^2], we further have
\[ P\Bigg(\sum_{i=1}^{N}X_i \ge t\Bigg) \le \inf_{\lambda>0}\exp\big(-\lambda t + g(\lambda)\sigma^2\big). \]
Choosing 0 ≤ λ = \frac{t}{\sigma^2 + tK/3} < 3/K, we see that
\[ P\Bigg(\sum_{i=1}^{N}X_i \ge t\Bigg) \le \exp\Big(-\frac{t^2}{\sigma^2 + tK/3} + \frac{\sigma^2\lambda^2/2}{1 - |\lambda|K/3}\Big) = \exp\Big(-\frac{t^2/2}{\sigma^2 + tK/3}\Big). \]
Applying the same argument to −X_i, we get
\[ P\Bigg(\Bigg|\sum_{i=1}^{N}X_i\Bigg| \ge t\Bigg) \le 2\exp\Big(-\frac{t^2/2}{\sigma^2 + Kt/3}\Big). \]
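As a sanity check, a small simulation of Bernstein's inequality for bounded mean-zero variables (an illustration; the uniform distribution and parameters are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(6)
    N, K, t, trials = 100, 1.0, 8.0, 200_000
    X = rng.uniform(-K, K, size=(trials, N))      # |X_i| <= K, mean zero, Var[X_i] = K^2/3
    sigma2 = N * K**2 / 3
    empirical = (np.abs(X.sum(axis=1)) >= t).mean()
    bernstein = 2 * np.exp(-(t**2 / 2) / (sigma2 + K * t / 3))
    print(empirical, bernstein)                    # the Bernstein bound should dominate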



Chapter 3

Random vectors in high dimensions

Week 9: Concentration Inequalities of Random Vectors


15 Mar. 2024
3.1 Concentration of the norm
Problem (Exercise 3.1.4). (a) Deduce from Theorem 3.1.1 that
\[ \sqrt{n} - CK^2 \le E[\|X\|_2] \le \sqrt{n} + CK^2. \]

(b) Can CK^2 be replaced by o(1), a quantity that vanishes as n → ∞?

Answer. (a) From Jensen’s inequality, we have


√ √ √
|E[∥X∥2 − n]| ≤ E[|∥X∥2 − n|] ≤ ∥∥X∥2 − n∥ψ2 ≤ CK 2

from Theorem 3.1.1 and

∥Z∥ψ2 = inf{t > 0 : E[exp Z 2 /t2 ] ≤ 2} ≥ ∥Z∥L1




as E[exp Z 2 /(E[|Z|]2 ) ] ≥ 1 + E[Z 2 ]/(E[|Z|]2 ) ≥ 2, again from Jensen’s inequality.





(b) We first observe that E[∥X∥2 ] ≤ E[∥X∥22 ] = n, hence we only need to deal with lower-
p

bound. Consider the following non-negative function


√ 1
f (x) = x − (1 + x − (x − 1)2 ) ≥ 0
2
for x ≥ 0. Then, for x = ∥X∥22 /n ≥ 0, we have
r 2 !
∥X∥22 ∥X∥22 ∥X∥22

1
≥ 1+ − −1
n 2 n n
√ 2 !
∥X∥22 ∥X∥22

n
⇒∥X∥2 ≥ 1+ − −1
2 n n
√  √ " 2 #
n n n ∥X∥22 − E[∥X∥22 ]
⇒E[∥X∥2 ] ≥ 1+ − E
2 n 2 n
√ 1
⇒E[∥X∥2 ] ≥ n− Var[∥X∥22 ].
2n3/2
Expanding the variance, we see that
n
X n
 2 X
Var[∥X∥22 ] E[Xi4 ] − E[Xi2 ]2 ≤ n · max E[Xi4 ] = n · max ∥Xi ∥4L4 ,

= Var Xi =
1≤i≤n 1≤i≤n
i=1 i=1


and from the sub-gaussian property, this is ≲ n · max1≤i≤n ∥Xi ∥4ψ2 = nK 4 . Overall,

√ 1 4
√ K4 √
E[∥X∥2 ] ≳ n− nK = n − √ = n + o(1),
2n3/2 n

if K ≥ 1. Otherwise, when K < 1, we replace K 4 by 1, the result holds still.

Problem (Exercise 3.1.5). Deduce from Theorem 3.1.1 that
\[ \operatorname{Var}[\|X\|_2] \le CK^4. \]

Answer. From the definition and the fact that the mean minimizes the mean squared error,
\[ \operatorname{Var}[\|X\|_2] = E\big[(\|X\|_2 - E[\|X\|_2])^2\big] \le E\big[(\|X\|_2 - \sqrt{n})^2\big]; \]
then, since \big\|\|X\|_2 - \sqrt{n}\big\|_{\psi_2} \le cK^2 by Theorem 3.1.1 and \|Z\|_{L^2} \lesssim \|Z\|_{\psi_2} for any random variable Z,
\[ \operatorname{Var}[\|X\|_2] \le E\big[(\|X\|_2 - \sqrt{n})^2\big] \lesssim c^2K^4, \]
and by renaming the constant to C, we’re done.

Problem (Exercise 3.1.6). Let X = (X_1, ..., X_n) ∈ R^n be a random vector with independent coordinates X_i that satisfy E[X_i^2] = 1 and E[X_i^4] ≤ K^4. Show that
\[ \operatorname{Var}[\|X\|_2] \le CK^4. \]

Answer. Firstly, observe that with our new assumption, the argument of Exercise 3.1.4 (b) again gives E[\|X\|_2] \gtrsim \sqrt{n} - K^4/\sqrt{n}. Then, for the same reason as stated in Exercise 3.1.5,
\[ \operatorname{Var}[\|X\|_2] \le E\big[(\|X\|_2 - \sqrt{n})^2\big] = 2n - 2\sqrt{n}\,E[\|X\|_2] \lesssim 2n - 2\sqrt{n}\Big(\sqrt{n} - \frac{K^4}{\sqrt{n}}\Big) = 2K^4, \]
proving the result. ⊛

Problem (Exercise 3.1.7). Let X = (X_1, ..., X_n) ∈ R^n be a random vector with independent coordinates X_i with continuous distributions. Assume that the densities of X_i are uniformly bounded by 1. Show that, for any ϵ > 0, we have
\[ P(\|X\|_2 \le \epsilon\sqrt{n}) \le (C\epsilon)^n. \]

Answer. We want to bound
\[ P(\|X\|_2 \le \epsilon\sqrt{n}) = P(\|X\|_2^2 \le \epsilon^2n) = P\Bigg(\sum_{i=1}^{n}X_i^2 \le \epsilon^2n\Bigg). \]
Following the same argument as Exercise 2.2.10,^a we first bound E[\exp(-tX_i^2)] for all t > 0. We have
\[ E[\exp(-tX_i^2)] = \int_0^{\infty}e^{-tx^2}f_{X_i}(x)\,dx \le \int_0^{\infty}e^{-tx^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{t}} \]
from the Gaussian integral. Then, from the MGF trick, we have
\[ P(\|X\|_2 \le \epsilon\sqrt{n}) = P(-\|X\|_2^2 \ge -\epsilon^2n) \le \inf_{t>0}\frac{E[\exp(-t\|X\|_2^2)]}{\exp(-t\epsilon^2n)} \le \inf_{t>0}e^{t\epsilon^2n}\Big(\frac{1}{2}\sqrt{\frac{\pi}{t}}\Big)^n. \]
Letting t = ϵ^{-2}, we have
\[ P(\|X\|_2 \le \epsilon\sqrt{n}) \le \Big(\frac{\sqrt{\pi}}{2}\epsilon\cdot e\Big)^n =: (C\epsilon)^n \]
by letting C := \sqrt{\pi}e/2. ⊛
^a The result does not directly follow from Exercise 2.2.10 because ϵ is replaced by ϵ^2, and a bound on the density of X_i doesn’t give a bound on the density of X_i^2.

3.2 Covariance matrices and principal component analysis


Problem (Exercise 3.2.2). (a) Let Z be a mean zero, isotropic random vector in Rn . Let µ ∈ Rn
be a fixed vector and Σ be a fixed n × n symmetric positive semidefinite matrix. Check that
the random vector
X := µ + Σ1/2 Z
has mean µ and covariance matrix Cov[X] = Σ.
(b) Let X be a random vector with mean µ and invertible covariance matrix Σ = Cov[X]. Check
that the random vector
Z := Σ−1/2 (X − µ)
is an isotropic, mean zero random vector.

Answer. (a) Firstly,
\[ E[X] = E[\mu] + E[\Sigma^{1/2}Z] = \mu + \Sigma^{1/2}E[Z] = \mu. \]
Moreover, since X − µ = \Sigma^{1/2}Z,
\[ \operatorname{Cov}[X] = E\big[(X-\mu)(X-\mu)^{\top}\big] = E\big[\Sigma^{1/2}ZZ^{\top}(\Sigma^{1/2})^{\top}\big] = \Sigma^{1/2}E[ZZ^{\top}](\Sigma^{1/2})^{\top} = \Sigma^{1/2}I_n(\Sigma^{1/2})^{\top} = \Sigma, \]
as Σ^{1/2} is symmetric (Σ being positive semidefinite).

(b) Similarly,
\[ E[Z] = \Sigma^{-1/2}E[X - \mu] = \Sigma^{-1/2}(\mu - \mu) = 0, \]
and moreover,
\[ \operatorname{Cov}[Z] = E\big[(\Sigma^{-1/2}(X-\mu))(\Sigma^{-1/2}(X-\mu))^{\top}\big] = \Sigma^{-1/2}E\big[(X-\mu)(X-\mu)^{\top}\big](\Sigma^{-1/2})^{\top} = \Sigma^{-1/2}\Sigma(\Sigma^{-1/2})^{\top} = I_n, \]
hence Z is also isotropic with mean zero. ⊛
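A numerical sketch of part (b), the whitening transform Z = Σ^{-1/2}(X − µ) (an illustration; the mean vector and covariance matrix below are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(7)
    mu = np.array([1.0, -2.0, 0.5])
    Sigma = np.array([[2.0, 0.6, 0.2], [0.6, 1.0, 0.3], [0.2, 0.3, 1.5]])

    # inverse square root of Sigma via its eigendecomposition
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

    X = rng.multivariate_normal(mu, Sigma, size=100_000)   # samples with Cov = Sigma
    Z = (X - mu) @ Sigma_inv_sqrt.T                         # whitened samples
    print(np.round(np.cov(Z, rowvar=False), 2))             # approximately the identity matrix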

Problem (Exercise 3.2.6). Let X and Y be independent, mean zero, isotropic random vectors in Rn .


Check that
E[∥X − Y ∥22 ] = 2n.

Answer. This directly follows from

E[∥X − Y ∥22 ] = E[⟨X − Y, X − Y ⟩] = E[⟨X, X⟩] − 2E[⟨X, Y ⟩] + E[⟨Y, Y ⟩] = n − 0 + n = 2n.

Week 10: Common High-Dimensional Distributions


20 Mar. 2024
3.3 Examples of high-dimensional distributions
Problem (Exercise 3.3.1). Show that the spherically distributed random vector X is isotropic. Argue
that the coordinates of X are not independent.

D
Answer. Firstly, from the spherical symmetry of X, for any x ∈ Rn , ⟨X, x⟩ = ⟨X, ∥x∥2 e⟩ for all
e ∈ S n−1 . Hence, to show X is isotropic, from Lemma 3.2.3, it suffices to show that for any x ∈ Rn ,
n
" n # " n #
2 1X 2 1 X
2 2 1X 2
E[⟨X, x⟩ ] = E[⟨X, ∥x∥2 ei ⟩ ] = E (∥x∥2 Xi ) = ∥x∥2 E Xi = ∥x∥22 ,
n i=1 n i=1
n i=1

where ei denotes the ith standard unit vector. The last equality holds from the fact that
" n #
1X 2 1 1
E X = E[∥X∥22 ] = n = 1
n i=1 i n n

as X ∼ U( nS n−1 ). On the other hand, clearly Xi ’s can’t be independent since the first n − 1
coordinates determines the last coordinate. ⊛

Problem (Exercise 3.3.3). Deduce the following properties from the rotation invariance of the normal
distribution.

(a) Consider a random vector g ∼ N (0, In ) and a fixed vector u ∈ Rn . Then

⟨g, u⟩ ∼ N (0, ∥u∥22 ).

(b) Consider independent random variables Xi ∼ N (0, σi2 ). Then


n
X n
X
Xi ∼ N (0, σ 2 ) where σ 2 = σi2 .
i=1 i=1

(c) Let G be an m × n Gaussian random matrix, i.e., the entries of G are independent N (0, 1)
random variables. Let u ∈ Rn be a fixed unit vector. Then

Gu ∼ N (0, Im ).

Answer. (a) Without loss of generality, we may assume ∥u∥2 = 1 and prove

⟨g, u⟩ ∼ N (0, 1)

for any fixed unit vector u ∈ Rn . But this is clear as there must exist u1 , . . . , un−1 such
that {u, u1 , . . . , un−1 } forms an orthonormal basis of Rn , and U := (u, u1 , . . . , un−1 )⊤ is

CHAPTER 3. RANDOM VECTORS IN HIGH DIMENSIONS 31


Week 10: Common High-Dimensional Distributions

orthonormal. From Proposition 3.3.2, we have

U g ∼ N (0, In ),

which implies (U g)1 ∼ N (0, 1). With (U g)1 = u⊤ g = ⟨g, u⟩, we’re done.

(b) For independent Xi ∼ N (0, σi2 ), we have Xi /σi ∼ N (0, 1). We want to show
n
X
Xi ∼ N (0, σ 2 )
i=1
Pn
where σ 2 = i=1 σi2 . Firstly, we have g := (X1 /σ1 , . . . , Xn /σn ) ∼ N (0, In ), then by consid-
ering u := (σ1 , . . . , σn ) ∈ Rn , we have
n n
!
X X
2
⟨g, u⟩ = Xi ∼ N (0, ∥u∥2 ) = N 0, σi = N (0, σ 2 )
2

i=1 i=1

from (a).
Pn
(c) For any fixed unit vector u, (Gu)i = j=1 gij uj = ⟨gi , u⟩ where gi = (gi1 , gi2 , . . . , gin ) for all
i ∈ [m]. It’s clear that gi ∼ N (0, In ), and from (a), ⟨gi , u⟩ ∼ N (0, 1). This implies

Gu = (⟨g1 , u⟩, . . . , ⟨gm , u⟩) ∼ N (0, Im )

as desired.

Problem (Exercise 3.3.4). Let X be a random vector in Rn . Show that X has a multivariate normal
distribution if and only if every one-dimensional marginal ⟨X, θ⟩, θ ∈ Rn , has a (univariate) normal
distribution.

Answer. This is an application of Cramér-Wold device and Exercise 3.3.3 (a). Omit the details. ⊛

Problem (Exercise 3.3.5). Let X ∼ N (0, In ).


(a) Show that, for any fixed vectors u, v ∈ Rn , we have

E[⟨X, u⟩⟨X, v⟩] = ⟨u, v⟩.

(b) Given a vector u ∈ Rn , consider the random variable Xu := ⟨X, u⟩. From Exercise 3.3.3 we
know that Xu ∼ N (0, ∥u∥22 ). Check that

∥Xu − Xv ∥L2 = ∥u − v∥2

for any fixed vectors u, v ∈ Rn .

Answer. (a) It’s because

E[⟨X, u⟩⟨X, v⟩] = E[(u⊤ X)(X ⊤ v)] = u⊤ E[XX ⊤ ]v = u⊤ In v = ⟨u, v⟩

from the fact that X is isotropic.


(b) Since Xu − Xv = ⟨X, u⟩ − ⟨X, v⟩ = ⟨X, u − v⟩ = Xu−v from linearity of inner product. Hence,
p q p
∥Xu − Xv ∥L2 = ⟨Xu−v , Xu−v ⟩ = E[Xu−v 2 ] = E[⟨X, u − v⟩2 ].


From (a), E[⟨X, u − v⟩2 ] = ⟨u − v, u − v⟩ = ∥u − v∥22 , hence


q
∥Xu − Xv ∥L2 = ∥u − v∥22 = ∥u − v∥2 .

Problem (Exercise 3.3.6). h Let G be an m × n Gaussian random matrix, i.e., the entries of G are
independent N (0, 1) random variables. Let u, v ∈ Rn be unit orthogonal vectors. Prove that Gu
and Gv are independent N (0, Im ) random vectors.

Answer. It’s clear that Gu and Gv are both N (0, Im ) random vectors from Exercise 3.3.3 (c). It
remains to show that Gu and Gv are independent, i.e., (Gu)i and (Gv)j are independent random
variables.
For i ̸= j, this is clear since (Gu)i = e⊤i (Gu) and (Gv)j = ej (Gv), and ei G gives the i
⊤ ⊤ th
row of
G, while ej G gives the j row of G. The fact that G has independent rows proves the result for
⊤ th

the case of i ̸= j.
For i = j, let e⊤i G =: g where g ∼ N (0, In ), and we want to show independence of (Gu)i = g u
⊤ ⊤

and (Gv)j = g v. This is still easy since


 ⊤ 
g u
= (u, v)⊤ g ∼ N (0, (u, v)⊤ In (u, v)) = N (0, I2 )
g⊤ v

as u, v are unit orthogonal vectors. ⊛

Problem (Exercise 3.3.7). Let us represent g ∼ N (0, In ) in polar form as

g = rθ

where r = ∥g∥2 is the length and θ = g/∥g∥2 is the direction of g. Prove the following:
(a) The length r and direction θ are independent random variables.

(b) The direction θ is uniformly distributed on the unit sphere S n−1 .

Answer. For any measurable M ⊆ R^n, given the normal density f_G(g) of g, some elementary calculus gives the polar coordinate transformation dg = r^{n-1}\,dr\,d\sigma(\theta), hence
\[ P(g \in M) = \int_M f_G(g)\,dg = \int_A\int_B f_G(r\theta)\,r^{n-1}\,d\sigma(\theta)\,dr = \frac{1}{(2\pi)^{n/2}}\int_A r^{n-1}e^{-r^2/2}\,dr\,\int_B d\sigma(\theta) = P(r \in A,\ \theta \in B) \qquad (3.1) \]
for A ⊆ [0, ∞) and B ⊆ S^{n-1} generating M, where σ is the surface area element on S^{n-1} with \int_{S^{n-1}}d\sigma = \omega_{n-1}, i.e., ω_{n-1} is the surface area of the unit sphere S^{n-1}.
(a) From Equation 3.1, it’s possible to write

P(g ∈ M ) = P(r ∈ A, θ ∈ B) =: f (A)g(B)

such that g(S n−1 ) = 1 with appropriate constant manipulation. Hence, with B = S n−1 ,

P(r ∈ A, θ ∈ S n−1 ) = P(r ∈ A) = f (A),

implying f ([0, ∞)) = 1 as well. This further shows that by considering A = [0, ∞),

P(r ∈ [0, ∞), θ ∈ B) = P(θ ∈ B) = g(B).

Such a separation of probability proves the independence.

CHAPTER 3. RANDOM VECTORS IN HIGH DIMENSIONS 33


Week 11: High-Dimensional Sub-Gaussian Distributions

(b) From Equation 3.1, we see that for any B ⊆ S n−1 , the density is uniform among dσ(θ), hence
θ is uniformly distributed on S n−1 .

Problem (Exercise 3.3.9). Show that {ui }N


i=1 is a tight frame in R with bound A if and only if
n

N
X
ui u⊤
i = AIn .
i=1

Answer. Recall that for two symmetric matrices A, B ∈ Rn×n , A = B if and only if x⊤ Ax = x⊤ Bx
for all x ∈ Rn . Hence,
N N
!
X X
⊤ ⊤
ui ui = AIn ⇔ x ui ui x = x⊤ (AIn )x

i=1 i=1

for all x ∈ Rn . We see that

• The left-hand side:


N
! N N
X X X

x ui u⊤
i x= (x⊤ ui )(u⊤
i x) = ⟨ui , x⟩2 ,
i=1 i=1 i=1

• The right-hand side:


x⊤ AIn x = Ax⊤ x = A∥x∥22 .
PN PN
Hence, i=1 i = AIn if and only if
ui u⊤ i=1 ⟨ui , x⟩
2
= A∥x∥22 , i.e., {ui }N
i=1 being a tight frame. ⊛

Week 11: High-Dimensional Sub-Gaussian Distributions


29 Mar. 2024
3.4 Sub-gaussian distributions in higher dimensions
Problem (Exercise 3.4.3). This exercise clarifies the role of independence of coordinates in Lemma
3.4.2.
1. Let X = (X1 , . . . , Xn ) ∈ Rn be a random vector with sub-gaussian coordinates Xi . Show that
X is a sub-gaussian random vector.

2. Nevertheless, find an example of a random vector X with

∥X∥ψ2 ≫ max∥Xi ∥ψ2 .


i≤n

Answer. 1. We see that

∥X∥ψ2 = sup_{x∈S^(n−1)} ∥⟨X, x⟩∥ψ2 ≤ sup_{x∈S^(n−1)} Σ_{i=1}^n |xi| ∥Xi∥ψ2 ≤ √n max_{i≤n} ∥Xi∥ψ2 < ∞.

2. Just take Xi = Z for every i, where Z ∼ N(0, 1). Then, we see that

max_i ∥Xi∥ψ2 = ∥Z∥ψ2 = √(8/3)

as E[exp(Z²/t²)] = 1/√(1 − 2/t²). On the other hand,

∥X∥ψ2 ≥ ∥⟨X, 1n/√n⟩∥ψ2 = ∥√n Z∥ψ2 = √(8n/3). ⊛

Problem (Exercise 3.4.4). Let X ∼ U({√n ei : i = 1, . . . , n}) be the coordinate random vector. Show that

∥X∥ψ2 ≍ √(n/log n).

Answer. Since we want not only an upper bound but the tight, non-asymptotic behavior, we need to calculate ∥X∥ψ2 as precisely as possible. We note that

∥X∥ψ2 = sup_{x∈S^(n−1)} ∥⟨X, x⟩∥ψ2 = sup_{x∈S^(n−1)} inf{t > 0 : E[exp(⟨X, x⟩²/t²)] ≤ 2},

and clearly the supremum is attained when x = ei for some i. In this case,

∥X∥ψ2 = inf{t > 0 : E[exp(Xi²/t²)] ≤ 2}.

Note that since X ∼ U({√n ei}i), if we focus on a particular coordinate i, then Xi = 0 with probability (n − 1)/n and Xi = √n with probability 1/n. Hence, for any t > 0,

E[exp(Xi²/t²)] = (n − 1)/n + (1/n) exp(n/t²).

Equating the above to exactly 2 and solving for t, we have

(n − 1 + e^(n/t²))/n = 2 ⇔ n − 1 + e^(n/t²) = 2n ⇔ ln(n + 1) = n/t² ⇔ t = √(n/ln(n + 1)),

meaning that

∥X∥ψ2 = inf{t > 0 : E[exp(Xi²/t²)] ≤ 2} = √(n/ln(n + 1)) ≍ √(n/log n). ⊛
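
Remark (Numerical check). A small Python/numpy verification, not part of the original argument, that t = √(n/ln(n + 1)) indeed makes E[exp(Xi²/t²)] equal to 2 for the coordinate distribution above; the values of n are arbitrary.

    import numpy as np

    for n in [10, 100, 10_000]:
        t = np.sqrt(n / np.log(n + 1))
        mgf = (n - 1) / n + np.exp(n / t**2) / n   # E[exp(X_i^2 / t^2)]
        print(n, t, mgf)                           # mgf is exactly 2 for every n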

Problem (Exercise 3.4.5). Let X be an isotropic random vector supported in a finite set T ⊆ Rn .
Show that in order for X to be sub-gaussian with ∥X∥ψ2 = O(1), the cardinality of the set must be
exponentially large in n:
|T | ≥ ecn .

Answer. This is a hard one. See here for details. ⊛

Problem (Exercise 3.4.7). Extend Theorem 3.4.6 for the uniform distribution on the Euclidean ball B(0, √n) in Rn centered at the origin and with radius √n. Namely, show that a random vector

X ∼ U(B(0, √n))

is sub-gaussian, and
∥X∥ψ2 ≤ C.


Answer. For X ∼ U(B(0, √n)), consider R := ∥X∥₂/√n and Y := X/R = √n X/∥X∥₂ ∼ U(√n S^(n−1)). From Theorem 3.4.6, ∥Y∥ψ2 ≤ C. It's clear that R ≤ 1 almost surely, hence for any x ∈ S^(n−1),

E[exp(⟨X, x⟩²/t²)] = E[exp(R²⟨Y, x⟩²/t²)] ≤ E[exp(⟨Y, x⟩²/t²)],
which implies ∥⟨X, x⟩∥ψ2 ≤ ∥⟨Y, x⟩∥ψ2 . Hence, ∥X∥ψ2 ≤ ∥Y ∥ψ2 ≤ C. ⊛

Problem (Exercise 3.4.9). Consider a ball of the ℓ1 norm in Rn :

K := {x ∈ Rn : ∥x∥1 ≤ r}.

(a) Show that the uniform distribution on K is isotropic for some r ≍ n.


(b) Show that the subgaussian norm of this distribution is not bounded by an absolute constant
as the dimension n grows.

Answer. (a) Observe that for i ≠ j, (Xi, Xj) and (Xi, −Xj) have the same distribution, hence E[Xi] = 0 and E[Xi Xj] = 0 for i ≠ j. Hence, for X to be isotropic, we need E[Xi²] = 1. Now, we note that P(|Xi| > x) = (r − x)^n/r^n = (1 − x/r)^n for x ∈ [0, r], hence

E[Xi²] = ∫₀^∞ 2x P(|Xi| > x) dx = 2r² ∫₀^r (x/r)(1 − x/r)^n dx/r = 2r² ∫₀^1 t(1 − t)^n dt,

which with some calculation is 2r²/(n² + 3n + 2). Equating this with 1 gives r ≍ n.


(b) It suffices to show that ∥Xi∥Lp ≳ p, which in turn rules out the sub-gaussian moment growth ∥Xi∥Lp ≲ √p ∥Xi∥ψ2. We see that

∥Xi∥Lp^p = ∫₀^∞ p x^(p−1) P(|Xi| > x) dx = p r^p ∫₀^r (x/r)^(p−1)(1 − x/r)^n dx/r = p r^p ∫₀^1 t^(p−1)(1 − t)^n dt = p r^p · B(p, n + 1),

where B is the Beta function. From the Beta function,

∥Xi∥Lp^p = p r^p · Γ(p)Γ(n + 1)/Γ(p + n + 1),

hence ∥Xi∥Lp ≳ p (for, say, p ≤ n) follows from Stirling's formula together with r ≍ n. Since a bounded sub-gaussian norm would force ∥Xi∥Lp ≤ C√p, the sub-gaussian norm of this distribution cannot be bounded by an absolute constant as n grows. ⊛

Problem (Exercise 3.4.10). Show that the concentration inequality in Theorem 3.1.1 may not hold
for a general isotropic sub-gaussian random vector X. Thus, independence of the coordinates of X
is an essential requirement in that result.

Answer. We want to show that ∥∥X∥₂ − √n∥ψ2 ≤ C max_i ∥Xi∥ψ2² does not hold for a general isotropic sub-Gaussian random vector X with E[Xi²] = 1. Let 0 < a < 1 < b be constants such that a² + b² = 2, and define X := aZ if ϵ = 1 and X := bZ if ϵ = 0, where ϵ ∼ Bern(1/2) is independent of Z ∼ N(0, In). In human language, X has the mixture distribution

FX := (1/2) F_{aZ} + (1/2) F_{bZ}.


With this construction, X is isotropic since

E[XX⊤] = (1/2) E[(aZ)(aZ)⊤] + (1/2) E[(bZ)(bZ)⊤] = (1/2) a² E[ZZ⊤] + (1/2) b² E[ZZ⊤] = ((a² + b²)/2) In = In,

and E[Xi²] = 1 with a similar calculation. Moreover, for any vector x ∈ S^(n−1),

E[exp(⟨X, x⟩²/t²)] = 1/(2√(1 − 2a²/t²)) + 1/(2√(1 − 2b²/t²)) < 2

when t is large enough (compared to a, b). This shows ∥⟨X, x⟩∥ψ2 ≤ t, and since a, b are taken to be constants, X is indeed a sub-Gaussian random vector.
Now, we show that the norm of X actually deviates away from √n at a non-vanishing rate of √n. In particular, consider t = (b − 1)√n/2; then

2 E[exp((∥X∥₂ − √n)²/t²)] > E[exp((∥bZ∥₂ − √n)²/t²)] > E[exp((∥bZ∥₂ − √n)²/t²) 1_{∥Z∥₂² > n}] > exp((b√n − √n)²/t²) P(∥Z∥₂² > n) = e⁴ P(∥Z∥₂² > n) → e⁴/2 > 4,

where the third inequality uses b > 1. Here P(∥Z∥₂² > n) = P(Σ_{i=1}^n Zi² > n), and with E[Zi²] = Var[Zi] = 1 and Var[Zi²] = E[Zi⁴] − E[Zi²]² = 3 − 1 = 2 < ∞,

((1/n) Σ_{i=1}^n Zi² − 1)/(√2/√n) = (1/√(2n)) (Σ_{i=1}^n Zi² − n) → N(0, 1) in distribution

by the central limit theorem; hence, the asymptotic distribution of Σ_{i=1}^n Zi² − n is symmetric around 0, meaning that P(Σ_{i=1}^n Zi² > n) = P(Σ_{i=1}^n Zi² − n > 0) → 1/2. This implies that for all large enough n,

∥∥X∥₂ − √n∥ψ2 ≥ t = (b − 1)√n/2 → ∞. ⊛
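
Remark (Numerical check). A Monte Carlo sketch in Python/numpy, not part of the proof; a = 0.5, n = 10,000, and the sample size are arbitrary choices. It shows that for this mixture, ∥X∥₂ splits into two clusters near a√n and b√n instead of concentrating around √n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, N = 10_000, 50_000
    a = 0.5
    b = np.sqrt(2 - a**2)                        # a^2 + b^2 = 2, so X is isotropic
    scale = np.where(rng.random(N) < 0.5, a, b)  # X = aZ or bZ with probability 1/2
    norms = scale * np.sqrt(rng.chisquare(n, size=N))  # ||Z||_2 = sqrt(chi^2_n)
    print(a*np.sqrt(n), np.sqrt(n), b*np.sqrt(n))      # 50, 100, ~132.3
    print(np.quantile(norms, [0.1, 0.9]))              # the two clusters, far from 100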

Week 12: High-Dimensional Sub-Gaussian Distributions


3 Apr. 2024
3.5 Application: Grothendieck’s inequality and semidefinite pro-
gramming
Problem (Exercise 3.5.2). 1. Check that the assumption of Grothendieck’s inequality can be
equivalently stated as follows:

|Σ_{i,j} aij xi yj| ≤ max_i |xi| · max_j |yj|

for any real numbers xi and yj .


2. Show that the conclusion of Grothendieck’s inequality can be equivalently stated as follows:

|Σ_{i,j} aij ⟨ui, vj⟩| ≤ K max_i ∥ui∥ · max_j ∥vj∥


for any Hilbert space H and any vectors ui , vj ∈ H.

Answer. Omit. ⊛

Problem (Exercise 3.5.3). Deduce the following version of Grothendieck’s inequality for symmetric
n × n matrices A = (aij) with real entries. Suppose that A is either positive semidefinite or has zero diagonal. Assume that, for any numbers xi ∈ {−1, 1}, we have

Σ_{i,j} aij xi xj ≤ 1.

Then, for any Hilbert space H and any vectors ui, vj ∈ H satisfying ∥ui∥ = ∥vj∥ = 1, we have

|Σ_{i,j} aij ⟨ui, vj⟩| ≤ 2K,

where K is the absolute constant from Grothendieck’s inequality.

Answer. Omit. ⊛

Problem (Exercise 3.5.5). Show that the optimization (3.21) is equivalent to the following semidef-
inite program:
max⟨A, X⟩ : X ⪰ 0, Xii = 1 for i = 1, . . . , n.

Answer. Omit. ⊛

Problem (Exercise 3.5.7). Let A be an m × n matrix. Consider the optimization problem


X
max Aij ⟨Xi , Yj ⟩ : ∥Xi ∥2 = ∥Yj ∥2 = 1 for all i, j
i,j

over Xi , Yj ∈ Rk and k ∈ N. Formulate this problem as a semidefinite program.

Answer. Omit. ⊛

3.6 Application: Maximum cut for graphs


Problem (Exercise 3.6.4). For any ϵ > 0, give a (0.5 − ϵ)-approximation algorithm for maximum cut, which is always guaranteed to give a suitable cut, but may have a random running time. Give a bound on the expected running time.

Answer. Omit. ⊛

Problem (Exercise 3.6.7). Prove Grothendieck’s identity.

Answer. Omit. ⊛

3.7 Kernel trick, and tightening of Grothendieck’s inequality


Problem (Exercise 3.7.4). Show that for any vectors u, v ∈ Rn and k ∈ N, we have

⟨u⊗k, v⊗k⟩ = ⟨u, v⟩^k.


Answer. This is immediate from the definition, i.e.,

⟨u⊗k, v⊗k⟩ = Σ_{i1,...,ik} u_{i1}···u_{ik} v_{i1}···v_{ik} = (Σ_{i=1}^n ui vi)^k = ⟨u, v⟩^k

by expanding both sides and matching terms. ⊛

Problem (Exercise 3.7.5). (a) Show that there exist a Hilbert space H and a transformation
Φ : Rn → H such that

⟨Φ(u), Φ(v)⟩ = 2⟨u, v⟩2 + 5⟨u, v⟩3 for all u, v ∈ Rn .

(b) More generally, consider a polynomial f : R → R with non-negative coefficients, and construct
H and Φ such that
⟨Φ(u), Φ(v)⟩ = f (⟨u, v⟩) for all u, v ∈ Rn .

(c) Show the same for any real analytic function f : R → R with non-negative coefficients, i.e.,
for any function that can be represented as a convergent series

f(x) = Σ_{k=0}^∞ ak x^k,   x ∈ R,   (3.2)

and such that ak ≥ 0 for all k.


Answer. (a) Consider H = R^(n×n) ⊕ R^(n×n×n). Then, consider Φ(x) := (√2 x⊗², √5 x⊗³), and we have

⟨Φ(u), Φ(v)⟩ = ⟨(√2 u⊗², √5 u⊗³), (√2 v⊗², √5 v⊗³)⟩ = 2⟨u⊗², v⊗²⟩ + 5⟨u⊗³, v⊗³⟩ = 2⟨u, v⟩² + 5⟨u, v⟩³,

where the last equality follows from Exercise 3.7.4.

(b) Consider a degree-m polynomial of ⟨u, v⟩, which we write f(⟨u, v⟩) =: Σ_{k=0}^m ak ⟨u, v⟩^k. Then, by noting that ak ≥ 0, we may define

H := ⊕_{k=0}^m R^(n^k), and Φ(x) := ⊕_{k=0}^m √(ak) x⊗k = (√a0, √a1 x, √a2 x⊗², . . . , √am x⊗m).

Then by a similar calculation as in (a), we have ⟨Φ(u), Φ(v)⟩ = f(⟨u, v⟩) for all u, v ∈ Rn.

(c) In this case, we just let m = ∞ in (b), i.e., consider

H := ⊕_{k=0}^∞ R^(n^k), and Φ(x) := ⊕_{k=0}^∞ √(ak) x⊗k,

where the limit is allowed as f converges everywhere. Note that ak ≥ 0, hence √(ak) is also well-defined.
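
Remark (Numerical check). A short Python/numpy sketch, not part of the original solution; the dimension 4 and the random vectors are arbitrary. It implements the feature map of part (a) with explicit tensor powers and confirms ⟨Φ(u), Φ(v)⟩ = 2⟨u, v⟩² + 5⟨u, v⟩³.

    import numpy as np

    def phi(x):
        # Phi(x) = (sqrt(2) x tensor x, sqrt(5) x tensor x tensor x), flattened.
        x2 = np.multiply.outer(x, x).ravel()
        x3 = np.multiply.outer(np.multiply.outer(x, x), x).ravel()
        return np.concatenate([np.sqrt(2) * x2, np.sqrt(5) * x3])

    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(4), rng.standard_normal(4)
    print(phi(u) @ phi(v), 2*(u @ v)**2 + 5*(u @ v)**3)   # agree up to rounding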

Problem (Exercise 3.7.6). Let f : R → R be any real analytic function (with possibly negative coeffi-
cients in Equation 3.2). Show that there exist a Hilbert space H and transformation Φ, Ψ : Rn → H
such that
⟨Φ(u), Ψ(v)⟩ = f (⟨u, v⟩) for all u, v ∈ Rn .


Moreover, check that



∥Φ(u)∥² = ∥Ψ(u)∥² = Σ_{k=0}^∞ |ak| ∥u∥₂^(2k).

Answer. Again, similar to Exercise 3.7.5 (c), we construct

H := ⊕_{k=0}^∞ R^(n^k), Φ(x) := ⊕_{k=0}^∞ √|ak| x⊗k, and Ψ(x) := ⊕_{k=0}^∞ sgn(ak) √|ak| x⊗k.

Then, ⟨Φ(u), Ψ(v)⟩ = Σ_{k=0}^∞ sgn(ak)|ak| ⟨u, v⟩^k = f(⟨u, v⟩) since the sign of ak is now taken care of by Ψ. The norm can be calculated as

∥Φ(u)∥² = ⟨Φ(u), Φ(u)⟩ = Σ_{k=0}^∞ ⟨√|ak| u⊗k, √|ak| u⊗k⟩ = Σ_{k=0}^∞ |ak| ⟨u⊗k, u⊗k⟩ = Σ_{k=0}^∞ |ak| ⟨u, u⟩^k = Σ_{k=0}^∞ |ak| ∥u∥₂^(2k),

where the third equality follows from Exercise 3.7.4. A similar calculation can be carried out for ∥Ψ(u)∥². ⊛



Chapter 4

Random matrices

4.1 Preliminaries on matrices


Problem (Exercise 4.1.1). Suppose A is an invertible matrix with singular value decomposition
n
X
A= si ui vi⊤ .
i=1

Check that
n
X 1
A−1 = v i u⊤
i .
i=1
si

Answer. Let A = U ΣV ∗ , and it suffices to check that


n
!
X 1 ⊤
A vi ui = In .
s
i=1 i

Indeed, by plugging A, we have


n
! n ! n n
X X 1 X si X
⊤ ⊤
si ui vi vi ui = ui vi⊤ vi u⊤
i = ui u⊤
i = UU

= In ,
i=1 i=1
si i=1
si i=1
Pn
where all the cross-terms vanish since vi⊤ vj = 0 as V is orthonormal, and i=1 ui u⊤
i = UU

= In
since U is again orthonormal. ⊛
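
Remark (Numerical check). A minimal Python/numpy sketch, not part of the original argument; the size n = 5 and the random matrix are arbitrary. It confirms that Σ_i si^(−1) vi ui⊤ reproduces A^(−1).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))              # invertible with probability 1
    U, s, Vt = np.linalg.svd(A)                  # A = sum_i s_i u_i v_i^T
    A_inv = sum(np.outer(Vt[i], U[:, i]) / s[i] for i in range(n))
    print(np.allclose(A_inv, np.linalg.inv(A)))  # True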

Problem (Exercise 4.1.2). Prove the following bound on the singular values si of any matrix A:
si ≤ (1/√i) ∥A∥F.

Answer. We have seen that ∥A∥F = ∥s∥₂ = √(Σ_k sk²), hence

∥A∥F² = Σ_{k=1}^r sk² ≥ Σ_{k≤i} sk² ≥ i si²

since we arrange the sk's in decreasing order. This proves the result. ⊛
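
Remark (Numerical check). A quick Python/numpy sketch, not part of the proof; the 30 × 20 Gaussian matrix is an arbitrary test case. It checks si ≤ ∥A∥F/√i for every i.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 20))
    s = np.linalg.svd(A, compute_uv=False)       # singular values, decreasing
    bound = np.linalg.norm(A, 'fro') / np.sqrt(np.arange(1, s.size + 1))
    print(np.all(s <= bound + 1e-12))            # True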

Problem (Exercise 4.1.3). Let Ak be the best rank k approximation of a matrix A. Express ∥A−Ak ∥2
and ∥A − Ak ∥2F in terms of the singular values si of A.


Answer. From Eckart-Young-Mirsky theorem, we have


k
X
Ak = si ui vi⊤ ,
i=1

hence
n
X
A − Ak = si ui vi⊤ .
i=k+1

This implies, the singular values of the matrix A − Ak are just sk+1 , . . . , sn ,a implying

∥A − Ak ∥2 = s2k+1 ,

and
n
X
∥A − Ak ∥2F = s2i .
i=k+1


a This can be seen from the fact that the same U and V still work, but now si = 0 for all 1 ≤ i ≤ k.

Problem (Exercise 4.1.4). Let A be an m×n matrix with m ≥ n. Prove that the following statements
are equivalent.
(a) A⊤ A = In .

(b) P := AA⊤ is an orthogonal projection a in Rm onto a subspace of dimension n.


(c) A is an isometry, or isometric embedding of Rn into Rm , which means that

∥Ax∥2 = ∥x∥2 for all x ∈ Rn .

(d) All singular values of A equal 1; equivalently

sn (A) = s1 (A) = 1.
a Recall that P is a projection if P 2 = P , and P is called orthogonal if the image and kernel of P are orthogonal

subspaces.

Answer. It’s easy to see that (a), (c), and (d) are all equivalent. Indeed, for (a) and (c), we
want ∥Ax∥22 = (Ax)⊤ (Ax) = xA⊤ Ax = x⊤ x = ∥x∥22 , and the equivalency lies in the equality
xA⊤ Ax = x⊤ x. If ∥Ax∥2 = ∥x∥2 holds for all x, since A⊤ A is a symmetric matrix, we know that
this means A⊤ A = In . On the other hand, if A⊤ A = In , then we clearly have the equality. For (c)
and (d), noting the Equation 4.5 suffices. Now, we focus on proving the equivalence between (a)
and (b).

• (a)⇒(b): Suppose A⊤ A = In . Then P = AA⊤ is a projection since P 2 = AA⊤ AA⊤ =


AIn A⊤ = AA⊤ = P . Moreover, observe that P ⊤ = P , hence P is also an orthogonal
projection.a
Finally, we need to show that rank(P ) = rank(AA⊤ ) = n. But since A⊤ A = In ,

n = rank(In ) = rank(A⊤ A) ≤ rank(A) ≤ n

as matrix multiplication can only reduce the rank, hence rank(A) = n. This also implies
rank(A⊤) = n, hence we're left to check that Im A⊤ ∩ ker A = {0}. If this is true, then rank(AA⊤) = n as well, and we're done. But it's well-known that Im A⊤ = (ker A)^⊥, which completes the proof.

• (b)⇒(a): We want to show that if P = AA⊤ is an orthogonal projection on a subspace of


dimension n, then A⊤ A = In . Observe that since P 2 = P ,

(AA⊤ )(AA⊤ ) = AA⊤ ⇔ A(A⊤ A − In )A⊤ = 0.

Now, we use the fact that rank(P ) = rank(AA⊤ ) = n. From the previous argument, we know
that rank(A) = rank(A⊤ ) = n, and hence

A(A⊤ A − In )A⊤ = 0 ⇒ A(A⊤ A − In ) = 0

as A⊤ spans all Rn . Taking the transpose, we again have

(A⊤ A − In )⊤ A⊤ = 0 ⇒ (A⊤ A − In )⊤ = 0

since again, A⊤ spans all Rn . We hence have A⊤ A = In as desired.


a Note that such a characterization is standard. See here for example.

Problem (Exercise 4.1.6). Prove the following converse to Lemma 4.1.5: if (4.7) holds, then

∥A⊤ A − In ∥ ≤ 3 max(δ, δ 2 ).

Answer. Firstly, by the quadratic maximizing characterization, we have

∥A⊤ A − In ∥ = max ⟨(A⊤ A − In )x, y⟩


x∈S n−1 ,y∈S n−1

≤ max
n−1
|x⊤ (A⊤ A − In )x| = max
n−1
|∥Ax∥22 − 1|.
x∈S x∈S

Since we assume that ∥Ax∥2 ∈ [1 − δ, 1 + δ] (with x ∈ S n−1 now),

∥A⊤ A − In ∥ ≤ max|(1 ± δ)2 − 1| = max|δ 2 ± 2δ| ≤ 3 max(δ, δ 2 ).

Problem (Exercise 4.1.8). Canonical examples of isometries and projections can be constructed from
a fixed unitary matrix U . Check that any sub-matrix of U obtained by selecting a subset of columns
is an isometry, and any sub-matrix obtained by selecting a subset of rows is a projection.

Answer. Consider a tall sub-matrix An×k of Un×n for some k < n. We know that A is an isometry
if and only if A⊤ is a projection. From Remark 4.1.7, it suffices to check A⊤ A = Ik . But this
is trivial since U is unitary, and we’re basically computing pair-wise inner products between some
columns (selected in A) of U .
On the other hand, consider a fat sub-matrix Bk×n of Un×n for some k < n. We want to show
that B ⊤ B is an orthogonal projection (of dimension k). From Exercise 4.1.4, it’s equivalent to
showing B ⊤ is an isometry, and from the above, it reduces to show that U ⊤ is also unitary since
B ⊤ can be viewed as a tall sub-matrix of U ⊤ . But this is true by definition. ⊛
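
Remark (Numerical check). A small Python/numpy sketch, not part of the original solution; the sizes n = 8, k = 3 and the random orthogonal matrix are arbitrary. It illustrates both claims: a column sub-matrix A of an orthogonal U is an isometry, and for a row sub-matrix B, the matrix B⊤B is an orthogonal projection of rank k.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 8, 3
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
    A = U[:, :k]                                       # subset of columns
    B = U[:k, :]                                       # subset of rows
    P = B.T @ B
    print(np.allclose(A.T @ A, np.eye(k)))             # True: isometry
    print(np.allclose(P @ P, P), np.allclose(P, P.T))  # True, True: orthogonal projection
    print(np.linalg.matrix_rank(P))                    # k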

Week 13: Covering and Packing Numbers


12 Apr. 2024
4.2 Nets, covering numbers and packing numbers
Problem (Exercise 4.2.5). (a) Suppose T is a normed space. Prove that P(K, d, ϵ) is the largest
number of closed disjoint balls with centers in K and radii ϵ/2.

(b) Show by example that the previous statement may be false for a general metric space T .


Answer. (a) Consider any ϵ-separated subset of K. Then, B(xi , ϵ/2)’s are disjoint since if not,
then there exists y ∈ B(xi , ϵ/2) ∩ B(xj , ϵ/2) such that
ϵ ϵ
ϵ < d(xi , xj ) ≤ d(xi , y) + d(xj , y) ≤ + = ϵ,
2 2
a contradiction. On the other hand, if d(xi , xj ) ≤ ϵ then
xi + xj
∈ B(xi , ϵ/2) ∩ B(xj , ϵ/2),
2
hence, there is a one-to-one correspondence between ϵ-separated subset of K and families of
closed disjoint balls with centers in K and radii ϵ/2, proving the result.
(b) Let T = Z and d(x, y) = 1x̸=y . For K = {0, 1} and ϵ = 1, we have P(K, d, 1) = 1. On the
other hand, B(0, 1/2) = {0} and B(1, 1/2) = {1} are disjoint. If the result of (a) holds, then
at least P(K, d, 1) = 2 as there are exactly two such disjoint closed balls.

Problem (Exercise 4.2.9). In our definition of the covering numbers of K, we required that the
centers xi of the balls B(xi , ϵ) that form a covering lie in K. Relaxing this condition, define the
exterior covering number N ext (K, d, ϵ) similarly but without requiring that xi ∈ K. Prove that

N ext (K, d, ϵ) ≤ N (K, d, ϵ) ≤ N ext (K, d, ϵ/2).

Answer. The lower bound is trivial. We focus on the upper bound. Consider an exterior cover
{B(xi , ϵ/2)} of K where xi might not lie in K. Now, for every i, choose exactly one yi from
B(xi , ϵ/2) ∩ K is it’s non-empty. Then, {B(yi , ϵ)} covers K since

B(xi , ϵ/2) ∩ K ⊆ B(yi , ϵ)

from d(x, yi ) ≤ d(x, xi ) + d(xi , yi ) ≤ ϵ/2 + ϵ/2 = ϵ for any x ∈ B(xi , ϵ/2). Hence, by taking the
union over i, {B(yi , ϵ)} indeed cover K, so the upper bound is proved. ⊛

Problem (Exercise 4.2.10). Give a counterexample to the following monotonicity property:

L ⊆ K implies N (L, d, ϵ) ≤ N (K, d, ϵ).

Prove an approximate version of monotonicity:

L ⊆ K implies N (L, d, ϵ) ≤ N (K, d, ϵ/2).

Answer. The problem lies in the fact that we’re not allowing exterior covering. Consider K = [−1, 1]
and L = {−1, 1}. Then, N (L, d, 1) = 2 > 1 = N (K, d, 1) for d(x, y) = |x − y|.
The approximate version of monotonicity can be proved with a similar argument as Exercise
4.2.9: specifically, consider an ϵ/2-covering {xi } of K with size exactly N (K, d, ϵ/2). Now, for every
i, choose one yi ∈ B(xi , ϵ/2) ∩ L if the latter is non-empty. It turns out that {B(yi , ϵ)} covers L.
Indeed, B(xi , ϵ/2) ∩ L ⊆ B(yi , ϵ) since
ϵ ϵ
d(x, yi ) ≤ d(x, xi ) + d(xi , yi ) ≤ + =ϵ
2 2
for all x ∈ B(xi , ϵ/2). ⊛

Intuition. The fundamental idea is just every such B(yi , ϵ) can cover B(xi , ϵ/2).

Problem (Exercise 4.2.15). Check that dH is indeed a metric.


Answer. We check the following.


• dH (x, x) = 0 for all x and dH (x, y) > 0 for all x ̸= y: Trivial.
• dH (x, y) = dH (y, x) for all x, y: Trivial.
• dH (x, y) ≤ dH (x, z) + dH (y, z) for all x, y, z: Suppose x and y initially disagrees at dH (x, y)
places, and denote the set of those disagreeing indices as I. Then for any z, as long as z and
x (hence y) disagrees at an index outside I, dH (x, z) + dH (y, z) increases by 2. There’s no
way to exist a z such that dH (x, z) + dH (y, z) can decrease, at best z and x (or y) disagrees
at an index in I, then it’ll coincide with y (or x), contributing the same amount to dH (x, y).

Problem (Exercise 4.2.16). Let K = {0, 1}n . Prove that for every integer m ∈ [0, n], we have

2n 2n
Pm n
 ≤ N (K, dH , m) ≤ P(K, dH , m) ≤ P⌊m/2⌋ .
n
k=0 k k=0 k

Answer. The middle inequality follows from Lemma 4.2.8. Now, for K = {0, 1}n , we first note that
we have |K| = 2n . Furthermore, observe the following.

Claim. For any x ∈ K, we have


m m  
X X n
|{y ∈ K : dH (x, y) ≤ m}| = |{y ∈ K : dH (x, y) = k}| = .
k
k=0 k=0

We then see the following.


• Lower bound: observe that |K| ≤ N (K, dH , m)|{y ∈ K : dH (xi , y) ≤ m}| where {xi } is an
m-net of K of size N (K, dH , m).

• Upper bound: observe that |K| ≥ P(K, dH , m)|{y ∈ K : dH (xi , y) ≤ ⌊m/2⌋}| where {xi } is
m-packing of size P(K, dH , m).
Plugging the above calculation complete the proof of both bounds. ⊛

Remark. Unlike Proposition 4.2.12, we don’t have the issue of “going outside K” since we’re working
with a hamming cube, i.e., the entire universe is exactly the collection of n-bits string. Moreover,
for the upper bound, we use ⌊m/2⌋ since m ∈ N, and taking the floor makes sure that {y ∈
K : dH (x, y) ≤ ⌊m/2⌋}’s are disjoint for {xi } being m-separated. Hence, the total cardinality is
upper bounded by |K|.

Week 14: Random Sub-Gaussian Matrices


17 Apr. 2024
4.3 Application: error correcting codes
Problem (Exercise 4.3.7). (a) Prove the converse to the statement of Lemma 4.3.4.

(b) Deduce a converse to Theorem 4.3.5. Conclude that for any error correcting code that encodes
k-bit strings into n-bit strings and can correct r errors, the rate must be

R ≤ 1 − f (δ)

where f (t) = t log2 (1/t) as before.


Answer. Omit. ⊛

4.4 Upper bounds on random sub-gaussian matrices


Problem (Exercise 4.4.2). Let x ∈ Rn and N be an ϵ-net of the sphere S n−1 . Show that
1
sup ⟨x, y⟩ ≤ ∥x∥2 ≤ sup ⟨x, y⟩.
y∈N 1 − ϵ y∈N

Answer. The lower bound is again trivial. On the other hand, for any x ∈ Rn , consider an x0 ∈ N
such that ∥x0 −x/∥x∥2 ∥2 ≤ ϵ (normalization is necessary since N is an ϵ-net of S n−1 , while x ∈ Rn ).
Now, observe that from the Cauchy-Schwarz inequality, we have
 
x x
∥x∥2 − ⟨x, x0 ⟩ = x, − x0 ≤ ∥x∥2 − x0 ≤ ϵ∥x∥2 ,
∥x∥2 ∥x∥2

which implies ⟨x, x0 ⟩ ≥ (1 − ϵ)∥x∥2 . This proves the upper bound. ⊛
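
Remark (Numerical check). A concrete Python/numpy illustration in dimension n = 2, not part of the proof; the value ϵ = 0.25 and the random x are arbitrary. An angular grid with chord spacing at most ϵ is an ϵ-net of S¹, and the two-sided bound holds.

    import numpy as np

    eps = 0.25
    delta = 2 * np.arcsin(eps / 2)              # chord of angle delta is exactly eps
    angles = np.arange(0, 2 * np.pi, delta)
    net = np.stack([np.cos(angles), np.sin(angles)], axis=1)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(2)
    sup_net = np.max(net @ x)                   # sup over the net of <x, y>
    print(sup_net, np.linalg.norm(x), sup_net / (1 - eps))
    # sup_net <= ||x||_2 <= sup_net / (1 - eps)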

Problem (Exercise 4.4.3). Let A be an m × n matrix and ϵ ∈ [0, 1/2).


(a) Show that for any ϵ-net N of the sphere S n−1 and any ϵ-net M of the sphere S m−1 we have
1
sup ⟨Ax, y⟩ ≤ ∥A∥ ≤ · sup ⟨Ax, y⟩.
x∈N ,y∈M 1 − 2ϵ x∈N ,y∈M

(b) Moreover, if m = n and A is symmetric, show that


1
sup |⟨Ax, x⟩| ≤ ∥A∥ ≤ · sup |⟨Ax, x⟩|.
x∈N 1 − 2ϵ x∈N

Answer. (a) The lower bound is again trivial. On the other hand, denote x∗ ∈ S n−1 and y ∗ ∈
S m−1 such that ∥A∥ = ⟨Ax∗ , y ∗ ⟩. Pick x0 ∈ N and y0 ∈ M such that ∥x∗ −x0 ∥2 , ∥y ∗ −y0 ∥2 ≤
ϵ. We then have
⟨Ax∗ , y ∗ ⟩ − ⟨Ax0 , y0 ⟩ = ⟨A(x∗ − x0 ), y ∗ ⟩ + ⟨Ax0 , y ∗ − y0 ⟩
≤ ∥A∥(∥x∗ − x0 ∥2 ∥y ∗ ∥2 + ∥x0 ∥2 ∥y ∗ − y0 ∥2 ) ≤ 2ϵ∥A∥

as ∥y ∗ ∥ = ∥x0 ∥2 = 1. Rewrite the above, we have


1 1
∥A∥ − ⟨Ax0 , y0 ⟩ ≤ 2ϵ∥A∥ ⇒ ∥A∥ ≤ ⟨Ax0 , y0 ⟩ ≤ sup ⟨Ax, y⟩.
1 − 2ϵ 1 − 2ϵ x∈N ,y∈N

(b) Following the same argument as (a), with y ∗ := x∗ and y0 := x0 . To be explicit to handle the
absolute value, we see that

|⟨Ax∗ , x∗ ⟩| − |⟨Ax0 , x0 ⟩| ≤ |⟨Ax∗ , x∗ ⟩ − ⟨Ax0 , x0 ⟩| ≤ 2ϵ∥A∥,

from the same argument. The result follows immediately.

Problem (Exercise 4.4.4). Let A be an m × n matrix, µ ∈ R and ϵ ∈ [0, 1/2). Show that for any
ϵ-net N of the sphere S n−1 , we have
C
sup |∥Ax∥2 − µ| ≤ · sup |∥Ax∥2 − µ|.
x∈S n−1 1 − 2ϵ x∈N


Answer. Let µ = 1. Firstly, for x ∈ S n−1 , observe that we can write

∥Ax∥22 − 1 = ⟨Rx, x⟩

for a symmetric R = A⊤ A − In . Secondly, there exists x∗ such that ∥R∥ = ⟨Rx∗ , x∗ ⟩, consider
x0 ∈ N such that ∥x0 − x∗ ∥ ≤ ϵ. Now, from a numerical inequality |z − 1| ≤ |z 2 − 1| for z > 0, we
have
sup |∥Ax∥2 − 1| ≤ sup ∥Ax∥22 − 1 = ∥R∥
x∈S n−1 x∈S n−1
1 1
≤ sup |⟨Rx, x⟩| = sup ∥Ax∥22 − 1 ,
1 − 2ϵ x∈N 1 − 2ϵ x∈N

where the last inequality follows from Exercise 4.4.3. Further, factoring |∥Ax∥22 − 1| get
1
sup |∥Ax∥2 − 1| ≤ sup |∥Ax∥2 − 1| (∥Ax∥2 + 1) .
x∈S n−1 1 − 2ϵ x∈N

If ∥A∥ ≤ 2, then ∥Ax∥2 + 1 ≤ 3, and C = 3 suffices.


On the other hand, if ∥A∥ > 2, consider directly computing the left-hand side

sup |∥Ax∥2 − 1| = ∥A∥ − 1


x∈S n−1

where the maximum is attained at some x′ ∈ S n−1 . With the existence of x′′ ∈ N ∩ {x : ∥x − x′ ∥2 ≤
ϵ}, the supremum over N can be lower bounded as

sup |∥Ax∥2 − 1| ≥ ∥Ax′′ ∥2 − 1 ≥ ∥Ax′ ∥2 − ∥A(x′′ − x′ )∥2 − 1 ≥ ∥A∥(1 − ϵ) − 1 > 1 − 2ϵ.


x∈N

The above implies the following.


• ∥A∥ ≤ 1
1−ϵ (supx∈N |∥Ax∥2 − 1| + 1).
• supx∈N |∥Ax∥2 − 1| > 1 − 2ϵ.
This allows us to conclude that
 
1
sup |∥Ax∥2 − 1| = ∥A∥ − 1 ≤ sup |∥Ax∥2 − 1| + 1 − 1
x∈S n−1 1−ϵ x∈N
 
1 3
= sup |∥Ax∥2 − 1| + ϵ ≤ sup |∥Ax∥2 − 1| ,
1 − ϵ x∈N 1 − 2ϵ x∈N

provided that

1 − 2ϵ  ϵ  1 − 2ϵ supx∈N |∥Ax∥2 − 1| + ϵ
C := 3 ≥ sup 1+ ≥ ,
d>1−2ϵ 1 − ϵ d 1 − ϵ supx∈N |∥Ax∥2 − 1|

which is true since the middle supremum is just 1. The case that µ ̸= 1 can be easily generalized
by considering R = A⊤ A − µIn . ⊛

Problem (Exercise 4.4.6). Deduce from Theorem 4.4.5 that


√ √
E[∥A∥] ≤ CK( m + n).

Answer. From Theorem 4.4.5, for any t > 0, we have


√ √
P(∥A∥ − CK( m + n) > CKt) ≤ 2 exp −t2 .



Then we immediately have


√ √ √ √
E[∥A∥ − CK( m + n)] = E[∥A∥] − CK( m + n)
Z ∞
√ √
= P(∥A∥ − CK( m + n) > CKt)CK dt
0
Z ∞

exp −t2 dt = CK π,

≤ 2CK
0
√ √ √ √
hence E[∥A∥] ≤ CK( m + n + π), and choosing a large enough C subsumes π. ⊛

Problem (Exercise 4.4.7). Suppose that in Theorem 4.4.5 the entries Aij have unit variances. Prove
that for sufficiently large n and m one has
1 √ √
E[∥A∥] ≥ ( m + n).
4

Answer. Clearly, by choosing x = e1 ∈ S n−1 ,

∥A∥ = sup ∥Ax∥2 ≥ ∥(Ai1 )1≤i≤m ∥2 .


x∈S n−1

On the other hand, by picking x = (A11 /∥(A1j )1≤j≤n ∥2 , . . . , A1n /∥(A1j )1≤j≤n ∥2 ) ∈ S n−1 and
y = e1 ∈ S m−1 , we have
n
X A1j
∥A∥ = sup ⟨Ax, y⟩ ≥ A1j = ∥(A1j )1≤j≤n ∥2 .
x∈S n−1 ,y∈S m−1 j=1
∥(A1j )1≤j≤n ∥2

Hence, ∥A∥ is lower bounded by the norm of the first row and column, i.e.,

∥A∥ ≥ max(∥(Ai1 )1≤i≤m ∥2 , ∥(A1j )1≤j≤n ∥2 ).


By Exercise 3.1.4 (b), the expectation of ∥A∥ is then greater than or equal to max(√m − o(1), √n − o(1)). Thus, E[∥A∥] ≥ (√m + √n − o(1))/2. ⊛

Remark. An easier way to deduce the second (i.e., lower bounded by the norm of the first row) is
to note that ∥A⊤ ∥ = ∥A∥ by some elementary (functional) analysis.

Week 15: Stochastic Block Model and Community Detection


8 Jun. 2024
4.5 Application: community detection in networks
Problem (Exercise 4.5.2). Check that the matrix D has rank 2, and the non-zero eigenvalues λi and
the corresponding eigenvectors ui are

1 1n/2×1
       
p+q p−q
λ1 = n, u1 = n/2×1 , λ2 = n, u2 = .
2 1n/2×1 2 −1n/2×1

Answer. Let n be an even number. Firstly, for any D ∈ Rn×n , columns 1 to n/2 are identical, same
for columns n/2 + 1 to n. Furthermore, since p > q, columns 1 and n/2 + 1 are linearly independent, so rank(D) = 2.
Instead of solving the characteristic equation and find the eigenvalues, and find the corresponding
eigenvectors later, since we know that rank(D) = 2, it’s immediate that there are only 2 non-zero


eigenvalues. Hence, we directly verify that

λ1 = ((p + q)/2) n, u1 = 1_{n×1}, and λ2 = ((p − q)/2) n, u2 = (1_{1×n/2}, −1_{1×n/2})⊤.

For λ1, indeed, every row of D sums to (n/2)p + (n/2)q, so every entry of Du1 equals ((p + q)/2) n, i.e., Du1 = λ1 u1. On the other hand, for λ2, the first n/2 entries of Du2 equal (n/2)p − (n/2)q = ((p − q)/2) n and the last n/2 entries equal (n/2)q − (n/2)p = −((p − q)/2) n, so Du2 = λ2 u2, which again holds. ⊛
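
Remark (Numerical check). A direct Python/numpy verification, not part of the solution; n = 10, p = 0.8, q = 0.2 are arbitrary. It confirms that the block matrix D has rank 2 with the stated eigenpairs.

    import numpy as np

    n, p, q = 10, 0.8, 0.2
    J = np.ones((n // 2, n // 2))
    D = np.block([[p * J, q * J], [q * J, p * J]])
    u1 = np.ones(n)
    u2 = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
    print(np.linalg.matrix_rank(D))                    # 2
    print(np.allclose(D @ u1, (p + q) / 2 * n * u1))   # True
    print(np.allclose(D @ u2, (p - q) / 2 * n * u2))   # True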

Problem (Exercise 4.5.4). Deduce Weyl’s inequality from the Courant-Fisher’s min-max character-
ization of eigenvalues.

Answer. We have that from the Courant-Fisher’s min-max characterization,

λi (A) = max min ⟨Ax, x⟩.


dim E=i x∈S(E)

Now, as λi (A) = −λn−i+1 (−A), we see that

λi (A) = −λn−i+1 (−A) = − max min ⟨−Ax, x⟩ = min max ⟨Ax, x⟩.
dim E=n−i+1 x∈S(E) dim E=n−i+1 x∈S(E)

We now show the Weyl’s inequality.

Theorem 4.5.1 (Weyl’s inequality). λi+j−1 (A + B) ≤ λi (A) + λj (B) ≤ λi+j−n (A + B).


Proof. We first show the lower-bound. From the Courant-Fisher’s min-max characterization,
it suffices to show that for any E with dim E = i + j − 1, there exists some x ∈ S(E) such that
⟨(A + B)x, x⟩ ≤ λi (A) + λj (B).
We first analyze λi (A). We know that from the max-min characterization,

λi (A) = min max ⟨Ax, x⟩,


dim E=n−i+1 x∈S(E)

i.e., there exists some EA with dim EA = n − i + 1 such that λi (A) = maxx∈S(EA ) ⟨Ax, x⟩.
Similarly, there exists some EB with dim EB = n − j + 1 satisfying the same property. Hence,
it suffices to find some unit vector x in EA ∩ EB ∩ E. We see that

dim(EA ∩ EB ) ≥ dim EA + dim EB − n = n − i − j + 2,

which implies that EA ∩ EB will have a non-trivial intersection with E since dim E = i + j − 1,
hence we’re done. For the upper-bound, taking the negative gives the result. ■


To obtain the spectral stability, we see that from Weyl’s inequality, we have
(
λi (A + B) ≤ λi (A) + λ1 (B);
⇒ λn (B) ≤ λi (A + B) − λi (A) ≤ λ1 (B).
λi (A + B) ≥ λi (A) + λn (B);

Given any symmetric S, T , by setting A := T and B := S − T , the upper-bound yields

λi (S) − λi (T ) ≤ λ1 (S − T ) = ∥S − T ∥.

On the other hand, by setting A := S and B := T − S, the upper-bound again yields

λi (T ) − λi (S) ≤ λ1 (T − S) = ∥T − S∥ = ∥S − T ∥.

As this holds for every i, we have

max|λi (S) − λi (T )| ≤ ∥S − T ∥
i

as we desired. ⊛

Week 16: Tighter Bounds on Sub-Gaussian Matrices


13 Jun. 2024
4.6 Two-sided bounds on sub-gaussian matrices
Problem (Exercise 4.6.2). Deduce from (4.22) that
  r 
1 ⊤ 2 n n
E A A − In ≤ CK + .
m m m

Answer. We have that for any t ≥ 0, with probability at least 1 − 2 exp −t2 ,


r 
1 ⊤ n t
A A − In ≤ K 2 max(δ, δ 2 ), where δ = C +√ ,
m m m

and we want to prove   r 


1 ⊤ 2 n n
E A A − In ≤ CK + .
m m m

2C 2 n C2 2
Firstly, we know that with u := K 2 (( √Cm + m )t + m t ), we get exactly
  r  
1 ⊤ n 2 n 2
P A A − In > K C 2
+C + u ≤ 2e−t .
m m m

Then, from the integral identity with the substitution v := u + K 2 (C m


pn n
+ C2 m ),
 
1 ⊤
E A A − In
m
Z K 2 (C √ m
n
+C 2 m
n
) Z ∞
!  
1 ⊤
= + √n 2n P A A − I n > v dv
0 K 2 (C m +C m ) m
Z K 2 (C √ m
n
+C 2 m
n
) Z ∞  
1 ⊤
≤ 1 dv + √n 2n P A A − In > v dv
0 K 2 (C m +C m ) m
 r  Z ∞   r  
2 n 2 n 1 ⊤ 2 n 2 n
=K C +C + P A A − In > K C +C + u du
m m 0 m m m


plugging back v = u + K 2 (C m
pn n
+ C2 m ),
 r  Z ∞
n n 2
≤ K2 C + C2 + 2e−t du
m m 0
 Z ∞ √
2C 2 n 2C 2
 r  
2 n 2 n −t2 2 C
=K C +C + 2e K √ + + t dt
m m 0 m m m
√ 
√ 2C 2 n 2C 2
 r    
n n C
= K2 C + C2 + K2 π √ + + ,
m m m m m

which is asymptotically ≍ K 2 ( m
pn n
+m ). ⊛

Problem (Exercise 4.6.3). Deduce from Theorem 4.6.1 the following bounds on the expectation:
√ √ √ √
m − CK 2 n ≤ E[sn (A)] ≤ E[s1 (A)] ≤ m + CK 2 n.

Answer. From Theorem 4.6.1, for any t ≥ 0,


√ √ √ √
m − CK 2 ( n + t) ≤ sn (A) ≤ s1 (A) ≤ m + CK 2 ( n + t)

with probability at least 1 − 2 exp −t2 . We want to show that




√ √ √ √
m − CK 2 n ≤ E[sn (A)] ≤ E[s1 (A)] ≤ m + CK 2 n.

Consider √ √ √ √ 
max 0, m − CK 2 n − sn (A), s1 (A) − m − CK 2 n
ξ := ≥ 0,
CK 2
then from the integral identity,
Z ∞ Z ∞
2 √
E[ξ] = P(ξ > t) dt ≤ 2e−t dt = π,
0 0

which proves the result. ⊛

Problem (Exercise 4.6.4). Give a simpler proof of Theorem 4.6.1, using Theorem 3.1.1 to obtain a
concentration bound for ∥Ax∥2 and Exercise 4.4.4 to reduce to a union bound over a net.

Answer. From the proof of Theorem 4.6.1, we know that S n−1 admits a 1/4-net N such that
|N | ≤ 9n . Furthermore, for any x ∈ N , we have
• E[⟨Ai , x⟩] = ⟨E[Ai ], x⟩ = ⟨0, x⟩ = 0;

• E[⟨Ai , x⟩2 ] = x⊤ E[A⊤


i Ai ]x = x In x = 1 (x ∈ S
⊤ n−1
too);
• ∥⟨Ai , x⟩∥ψ2 ≤ ∥Ai ∥ψ2 ≤ K for all i,

by Theorem 3.1.1, we have ∥∥Ax∥2 − m∥ψ2 ≤ CK 2 . From Proposition 2.5.2 (i), for any t > 0,
 √ p 
P |∥Ax∥2 − m| > CK( n log 9 + t)
 p  2
≤ 2 exp −( n log 9 + t)2 ≤ 2 exp −(n log 9 + t2 ) = 2 · 9−n · e−t .


Finally, from Exercise 4.4.4, with a union bound over N , we have


 n√ p √ p o
P ¬ m − 2CK 2 ( n log 9 + t) ≤ sn (A) ≤ s1 (A) ≤ m + 2CK 2 ( n log 9 + t)


by the definition of sn (A) and s1 (A), we have



 
p
≤ P max ∥Ax∥2 − m > 2CK 2 ( n log 9 + t)
x∈S n−1

 
p
≤ P 2 max ∥Ax∥2 − m > 2CK 2 ( n log 9 + t)
x∈N
X  √ p 
≤ P ∥Ax∥2 − m > CK 2 ( n log 9 + t)
x∈N
2 2
≤ 9n · 2 · 9−n · e−t = 2e−t .

Scaling C accommodates the additional log 9 factor finishes the proof. ⊛

4.7 Application: covariance estimation and clustering


Problem (Exercise 4.7.3). Our argument also implies the following high-probability guarantee. Check
that for any u ≥ 0, we have
r !
2 n+u n+u
∥Σm − Σ∥ ≤ CK + ∥Σ∥
m m

with probability at least 1 − 2e−u .

Answer. Omit ⊛

Problem (Exercise 4.7.6). Prove Theorem 4.7.5 for the spectral clustering algorithm applied for the
Gaussian mixture model. Proceed as follows.
(a) Compute the covariance matrix Σ of X; note that the eigenvector corresponding to the largest
eigenvalue is parallel to µ.
(b) Use results about covariance estimation to show that the sample covariance matrix Σm is close
to Σ, if the sample size m is relatively large.
(c) Use the Davis-Kahan Theorem 4.5.5 to deduce that the first eigenvector v = v1 (Σm ) is close
to the direction of µ.
(d) Conclude that the signs of ⟨µ, Xi ⟩ predict well which community Xi belongs to.

(e) Since v ≈ µ, conclude the same for v.

Answer. Omit ⊛



Chapter 5

Concentration without independence

Week 17: Concentration of Lipschitz Functions on Spheres


22 Jun. 2024
5.1 Concentration of Lipschitz functions on the sphere
Problem (Exercise 5.1.2). Prove the following statements.
(a) Every Lipschitz function is uniformly continuous.
(b) Every differentiable function f : Rn → R is Lipschitz, and

∥f ∥Lip ≤ sup ∥∇f (x)∥2 .


x∈Rn

(c) Give an example of a non-Lipschitz but uniformly continuous function f : [−1, 1] → R.


(d) Give an example of a non-differentiable but Lipschitz function f : [−1, 1] → R.

Answer. Omit. ⊛

Problem (Exercise 5.1.3). Prove the following statements.


(a) For a fixed θ ∈ Rn , the linear functional

f (x) = ⟨x, θ⟩

is a Lipschitz function on Rn , and ∥f ∥Lip = ∥θ∥2 .


(b) More generally, an m × n matrix A acting as a linear operator

A : (Rn , ∥·∥2 ) → (Rm , ∥·∥2 )

is Lipschitz, and ∥A∥Lip = ∥A∥.


(c) Any norm f (x) = ∥x∥ on (Rn , ∥·∥2 ) is a Lipschitz function. The Lipschitz norm of f is the
smallest L that satisfies
∥x∥ ≤ L∥x∥2 for all x ∈ Rn .

Answer. Omit. ⊛

√ √
Problem (Exercise 5.1.8). Prove inclusion (5.2), i.e., Ht ⊇ {x ∈ nS n−1 : x1 ≤ t/ 2}.

Answer. Omit. ⊛



Problem (Exercise 5.1.9). Let A be the subset of the sphere nS n−1 such that

σ(A) > 2 exp −cs2 for some s > 0.




(a) Prove that σ(As ) > 1/2.


(b) Deduce from this that for any t ≥ s,

σ(A2t ) ≥ 1 − 2 exp −ct2 .




Here c > 0 is the absolute constant from Lemma 5.1.7.


Answer. Omit. ⊛

Problem (Exercise 5.1.11). We proved Theorem 5.1.4 for functions f that are Lipschitz with respect
to the Euclidean metric ∥x − y∥2 on the sphere. Argue that the same result holds for the geodesic
metric, which is the length of the shortest arc connecting x and y.

Answer. Omit. ⊛


Problem (Exercise 5.1.12). We stated Theorem 5.1.4 for the scaled sphere nS n−1 . Deduce that a
Lipschitz function f on the unit sphere S n−1 satisfies

C∥f ∥Lip
∥f (X) − E[f (X)]∥ψ2 ≤ √ ,
n

where X ∼ U(S n−1 ). Equivalently, for every t ≥ 0, we have


!
cnt2
P (|f (X) − E[f (X)]| ≥ t) ≤ 2 exp − .
∥f ∥2Lip

Answer. Omit. ⊛

Problem (Exercise 5.1.13). Consider a random variable Z with median M . Show that

c∥Z − E[Z]∥ψ2 ≤ ∥Z − M ∥ψ2 ≤ C∥Z − E[Z]∥ψ2 ,

where c, C > 0 are some absolute constants.

Answer. Omit. ⊛

Problem (Exercise 5.1.14). Consider a random vector X taking values in some metric space (T, d).
Assume that there exists K > 0 such that

∥f (X) − E[f (X)]∥ψ2 ≤ K∥f ∥Lip

for every Lipschitz function f : T → R. For a subset A ⊆ T , define σ(A) := P(X ∈ A). (Then σ is
a probability measure on T .) Show that if σ(A) ≥ 1/2 then, for every t ≥ 0,

σ(At ) ≥ 1 − 2 exp −ct2 /K 2




where c > 0 is an absolute constant.


Answer. Omit. ⊛

Problem (Exercise 5.1.15). From linear algebra, we know that any set of orthonormal vectors in Rn


must contain at most n vectors. However, if we allow the vectors to be almost orthogonal, there
can be exponentially many of them! Prove this counterintuitive fact as follows. Fix ϵ ∈ (0, 1). Show
that there exists a set {x1 , . . . , xN } of unit vectors in Rn which are mutually almost orthogonal:

|⟨xi , xj ⟩| ≤ ϵ for all i ̸= j,

and the set is exponentially large in n:

N ≥ exp(c(ϵ)n).

Answer. Omit. ⊛

5.2 Concentration on other metric measure spaces


Problem (Exercise 5.2.3). Deduce Gaussian concentration inequality (Theorem 5.2.2) from Gaussian
isoperimetric inequality (Theorem 5.2.1).

Answer. Omit. ⊛

Problem (Exercise 5.2.4). Prove that in the concentration results for sphere and Gauss space (The-
orem 5.1.4 and 5.2.2), the expectation E[f (X)] can be replaced by the Lp norm (E[f (X)p ])1/p for
any p ≥ 1 and for any non-negative function f . The constants may depend on p.

Answer. Omit. ⊛

Problem (Exercise 5.2.11). Let Φ(x) denote the cumulative distribution function of the standard
normal distribution N (0, 1). Consider a random vector Z = (Z1 , . . . , Zn ) ∼ N (0, In ). Check that

ϕ(Z) := (Φ(Z1 ), . . . , Φ(Zn )) ∼ U([0, 1]n ).

Answer. Omit. ⊛

Problem (Exercise 5.2.12). Expressing X = ϕ(Z) by the previous exercise, use Gaussian concentra-
tion to control the deviation of f (ϕ(Z)) in terms of ∥f ◦ ϕ∥Lip ≤ ∥f ∥Lip ∥ϕ∥Lip . Show that ∥ϕ∥Lip is
bounded by an absolute constant and complete the proof of Theorem 5.2.10.

Answer. Omit. ⊛

√ n 5.2.14). Use a similar method to prove Theorem


Problem (Exercise 5.2.13. Define a function
ϕ :
√ n R n
→ nB 2 that pushes forward the Gaussian measure on R n
into the uniform measure on
nB2 , and check that ϕ has bounded Lipschitz norm.

Answer. Omit. ⊛

5.3 Application: Johnson-Lindenstrauss Lemma


Problem (Exercise 5.3.3). Let A be an m×n random matrix whose rows are independent, mean zero,
sub-gaussian isotropic random
√ vectors in Rn . Show that the conclusion of Johnson-Lindenstrauss
lemma holds for Q = (1/ m)A.

Answer. Omit. ⊛

Problem (Exercise 5.3.4). Give an example of a set X of N points for which no scaled projection


onto a subspace of dimension m ≪ log N is an approximate isometry.

Answer. Omit. ⊛

Week 18: Tighter Bounds on Sub-Gaussian Matrices


29 Jun. 2024
5.4 Matrix Bernstein’s inequality
Problem (Exercise 5.4.3). (a) Consider a polynomial

f (x) = a0 + a1 x + · · · + ap xp .

Check that for a matrix X, we have

f (X) = a0 I + a1 X + · · · + ap X p .

In the right side, we use the standard rules for matrix addition and multiplication, so in
particular, X p = X · · · X (p times) there.
(b) Consider a convergent power series expansion of f about x0 :

X
f (x) = ak (x − x0 )k .
k=1

Check that the series of matrix terms converges, and



X
f (X) = ak (X − x0 I)k .
k=1

Answer. Let X =: U ΛU ⊤ be the symmetric eigendecomposition of X.

(a) Since X k = U ΛU ⊤ · · · U ΛU ⊤ = U ΛI · · · IΛU ⊤ = U Λk U ⊤ for all k ≥ 0, then


p p p
!
X X X

f (X) = U f (Λ)U = U ak Λ U ⊤ =
k
ak U Λk U ⊤ = ak X k .
k=0 k=0 k=0

(b) Since X − x0 I = U (Λ − x0 I)U ⊤ , then by (a),


∞ ∞ ∞
!
X X X
f (X) = U ak (Λ − x0 I) U ⊤ =
k
ak U (Λ − x0 I)k U ⊤ = ak (X − x0 I)k .
k=1 k=1 k=0

Problem (Exercise 5.4.5). Prove the following properties.


(a) ∥X∥ ≤ t if and only if −tI ⪯ X ⪯ tI.
(b) Let f, g : R → R be two functions. If f (x) ≤ g(x) for all x ∈ R satisfying |x| ≤ K, then
f (X) ⪯ g(X) for all X satisfying ∥X∥ ≤ K.

(c) Let f : R → R be an increasing function and X, Y are commuting matrices. Then X ⪯ Y


implies f (X) ⪯ f (Y ).
(d) Give an example showing that property (c) may fail for non-commuting matrices.
(e) In the following parts of the exercise, we develop weaker versions of property (c) that hold


for arbitrary, not necessarily commuting, matrices. First, show that X ⪯ Y always implies
tr f (X) ≤ tr f (Y ) for any increasing function f : R → R.
(f) Show that 0 ⪯ X ⪯ Y implies X −1 ⪯ Y −1 if X is invertible.
(g) Show that 0 ⪯ X ⪯ Y implies log X ⪯ log Y .

Answer. Let X =: U ΛU ⊤ and Y =: V M V ⊤ denote the symmetric eigendecompositions of X and


Y , respectively. Additionally, let λ := diag(Λ) and µ := diag(M ) in Rn .
(a) By the Courant-Fisher min-max theorem w.r.t. λ1 and λn ,

∥X∥ ≤ t ⇔ −t1 ≤ λ ≤ t1 ⇔ t1 ± λ ≥ 0 ⇔ tI ± X ⪰ 0 ⇔ −tI ⪯ X ⪯ tI.

(b) Since |λ| ≤ K 1, then g(λ) − f (λ) ≥ 0. This implies that g(X) − f (X) = U (g(Λ) − f (Λ))U ⊤
has non-negative eigenvalues. Therefore, g(X) ⪰ f (X).

(c) Since X and Y are symmetric and commute, then Y admits an eigendecomposition with V =
U . This implies λ ≤ µ. It follows that f (µ) − f (λ) ≥ 0, so f (Y ) − f (X) = U (f (M ) − f (Λ))U ⊤
has non-negative eigenvalues. Therefore, f (X) ⪯ f (Y ).
(d) We see that, with A = [4 2; 2 4] and B = [3 0; 0 0] (so that B ⪯ A),

λ(A − B) = {5, 0}, while λ(A³ − B³) = {(√43993 + 197)/2, −(√43993 − 197)/2},

so A³ − B³ has a negative eigenvalue even though x ↦ x³ is increasing.

(e) Since X − Y ⪯ 0, then by the Courant-Fisher min-max theorem, for any i = 1, . . . , n,

λi − µi = max min v ⊤ Xv − max min v ⊤ Y v


dim E=i v∈S(E) dim E=i v∈S(E)
 
⊤ ⊤
≤ max min v Xv − min v Y v
dim E=i v∈S(E) v∈S(E)

≤ max max v Xv − v ⊤ Y v = max max v ⊤ (X − Y )v ≤ 0




dim E=i v∈S(E) dim E=i v∈S(E)

Since f is increasing, then f (λi ) ≤ f (µi ) for all i. It follows that


n
X n
X
tr f (X) = f (λi ) ≤ f (µi ) = tr f (Y ).
i=1 i=1

(f) Since X ⪯ Y , then I = X −1/2 XX −1/2 ⪯ X −1/2 Y X −1/2 . This implies λ(X −1/2 Y X −1/2 ) ≥ 1.
Thus, λ(X 1/2 Y −1 X 1/2 ) = λ−1 (X −1/2 Y X −1/2 ) ≤ 1, so X 1/2 Y −1 X 1/2 ⪯ I. It follows that

Y −1 = X −1/2 (X 1/2 Y −1 X 1/2 )X −1/2 ⪯ X −1/2 IX −1/2 = X −1 .

t=∞ R∞
(g) By (f), (X + tI)−1 ⪰ (Y + tI)−1 for t ≥ 0. Since log z = log 1+t
z+t = 0
1
1+t
1
− z+t dt, then
t=0
Z ∞ Z ∞
log X = ((1 + t)−1 I − (X + tI)−1 ) dt ⪯ ((1 + t)−1 I − (Y + tI)−1 ) dt = log Y.
0 0

Problem (Exercise 5.4.6). Let X and Y be n × n symmetric matrices.


(a) Show that if the matrices commute, i.e., XY = Y X, then

eX+Y = eX eY .

(b) Find and example of matrices X and Y such that

eX+Y ̸= eX eY .

Answer. (a) Since X and Y commute, by the binomial theorem and the substitution i := k − j,

e^(X+Y) = Σ_{k=0}^∞ (X + Y)^k/k! = Σ_{k=0}^∞ (1/k!) Σ_{j=0}^k (k!/((k − j)! j!)) X^(k−j) Y^j = (Σ_{i=0}^∞ X^i/i!)(Σ_{j=0}^∞ Y^j/j!) = e^X e^Y.

(b) For X := [1 0; 0 −1] and Y := [0 1; 1 0],

e^(X+Y) = [cosh√2 + sinh√2/√2, sinh√2/√2; sinh√2/√2, cosh√2 − sinh√2/√2], while e^X e^Y = (1/2)[e² + 1, e² − 1; 1 − e^(−2), 1 + e^(−2)].
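
Remark (Numerical check). A short Python sketch, assuming numpy and scipy are available; the matrices are the ones above plus one arbitrary commuting pair. It confirms both parts with scipy.linalg.expm.

    import numpy as np
    from scipy.linalg import expm

    X = np.array([[1.0, 0.0], [0.0, -1.0]])
    Y = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(np.allclose(expm(X + Y), expm(X) @ expm(Y)))          # False: no commuting

    A = np.array([[1.0, 2.0], [2.0, 1.0]])                      # A and A^2 commute
    print(np.allclose(expm(A + A @ A), expm(A) @ expm(A @ A)))  # True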

Problem (Exercise 5.4.11). Let X1 , . . . , XN be independent, mean zero, n × n symmetric random


matrices, such that ∥Xi ∥ ≤ K almost surely for all i. Deduce from Bernstein’s inequality that
" N
# N 1/2
X X p
E Xi ≲ E[Xi2 ] 1 + log n + K(1 + log n).
i=1 i=1

PN
Answer. Let σ 2 := ∥ i=1 √ E[Xi ]∥. By−1
2
the matrix Berstein’s inequality, for every u > 0, with the
substitution t := c −1/2
σ u + log n + c K(u + log n),
N
!  2 
t
−c min σt 2 , K
X
P Xi ≥ t ≤ 2ne ≤ 2ne−(u+log n) = 2e−u .
i=1

Then by Lemma 1.2.1,


" N #
X
E Xi
i=1

c−1/2 σ 1+log n+c−1 K(1+log n)
! N
!
Z Z ∞ X
= + √
P Xi ≥ t dt
0 c−1/2 σ 1+log n+c−1 K(1+log n) i=1

Z c−1/2 σ 1+log n+c−1 K(1+log n) Z ∞
≤ 1 dt + √
2e−u dt
0 c−1/2 σ 1+log n+c−1 K(1+log n)

2−1 c−1/2 σ
 Z 
−1/2
p −1 −1 −u
=c σ 1 + log n + c K(1 + log n) + 2e √ + c K du
1 u + log n
Z ∞  −1 −1/2 
p 2 c σ
≤ c−1/2 σ 1 + log n + c−1 K(1 + log n) + 2e−u √ + c−1 K du
1 1 + log n
 −1 −1/2 
p 2 c σ
= c−1/2 σ 1 + log n + c−1 K(1 + log n) + 2e−1 √ + c−1 K
1 + log n
p
≲ σ 1 + log n + K(1 + log n),

which is exactly what we want to show. ⊛


Problem (Exercise 5.4.12). Let ε1 , . . . , εn be independent symmetric Bernoulli random variables and
let A1 , . . . , AN be symmetric n × n matrices (deterministic). Prove that, for any t ≥ 0, we have
N
!
X
εi Ai ≥ t ≤ 2n exp −t2 /2σ 2 ,

P
i=1

PN
where σ 2 = ∥ i=1 A2i ∥.
PN
Answer. Let σ 2 := ∥ i=1 A2i ∥ and λ := t/σ 2 ≥ 0. By Exercise 2.2.3,
λ2 λ2 λ2 σ 2
PN
log E[eλεi Ai ]
PN PN
A2i λmax ( N 2
P
log cosh(λAi ) i=1 Ai )
tr e i=1 = tr e i=1 ≤ tr e i=1 2 ≤ ne 2 = ne 2

Then by the Chernoff bound and Lieb’s inequality,


N
! !
X PN
P λmax εi Ai ≥ t ≤ e−λt E[eλ·λmax ( i=1 εi Ai ) ]
i=1
PN λ2 σ 2 t2
log E[eλεi Ai ]
= e−λt tr e i=1 ≤ e−λt ne 2 = ne− 2σ2 .
PN t2
Similarly, P(λmin ( i=1 εi Ai ) ≤ −t) ≤ ne− 2σ2 . ⊛

Problem (Exercise 5.4.13). Let ε1 , . . . , εN be independent symmetric Bernoulli random variables


and let A1 , . . . , AN be symmetric n × n matrices (deterministic).
1. Prove that " N
# N 1/2
X p X
E ε i Ai ≤C 1 + log n A2i .
i=1 i=1

2. More generally, prove that for every p ∈ [1, ∞), we have


" N p #!1/p N 1/2
X p X
E ε i Ai ≤ C p + log n A2i .
i=1 i=1

Answer. Since (a) follows from (b) with p = 1, we will only prove (b) here. As the inequality
trivially holds for n = 1 with C = 1, let’s assume n ≥ 2 from now on.
q
z z 12z
1
Note that if 1 ≤ p ≤ 2, then by Stirling’s approximation Γ(z) ≤ 2π e ,

z e

1/p Z 1/p 1 p−1


Z ∞ ∞  p 1/p
−s p p π 2p p 2p
e (log(2n) + s) 2 −1 ds ≤ e −s
(0 + s) 2 −1 ds =Γ ≤ 1 1 1− 1 ,
0 0 2 2 2 − p e 2 6p2
and that if p > 2, then by Minkowski’s inequality (and the same Stirling’s approximation),
Z ∞ 1/p Z ∞ 1/p
p s 1 1 p
e−s (log(2n) + s) 2 −1 ds = e− p (log(2n) + s) 2 − p ds
0 0
1 1 1 1 1 1
since p > 2, and as x, y > 0, we have (x + y) 2 − p ≤ x 2 − p + y 2 − p ,
Z ∞ 1/p
s 1 1 1 1 p
≤ e− p ((log(2n)) 2 − p + s 2 − p ) ds
0
then by Minkowski’s inequality (i.e., ∥f + g∥Lp ≤ ∥f ∥Lp + ∥g∥Lp ),
Z ∞ 1/p Z ∞ 1/p
s 1 1 s 1 1
≤ (e− p (log(2n)) 2 − p )p ds + (e− p s 2 − p )p ds
0 0


then by some direct calculations,


Z ∞ 1/p Z ∞ 1/p
1 1 p
= (log(2n)) 2−p e −s
ds + e −s
s 2 −1 ds
0 0
1 1
 p 1/p
= (log(2n)) 2 − p + Γ
2
1 p−1
1 1
2−p
π 2p p 2p
≤ (log(2n)) + 1 1 1
− 6p12
.
22−p e2
PN
Let σ 2 := ∥ i=1 A2i ∥. By Exercise 5.4.12, for any t ≥ 0,

N p ! N
!
X X t2/p
P ε i Ai ≥t =P εi Ai ≥ t1/p ≤ 2ne− 2σ2 .
i=1 i=1
p
Then with the substitution t =: σ 2(log(2n) + s) , by Lemma 1.2.1 and Minkowski’s inequality,
p

" p #!1/p
 √ p  p ! 1/p
N
X Z σ 2 log(2n) Z ∞ N
X
E εi Ai =  + √ p  P ε i Ai ≥t dt
i=1 0 σ 2 log(2n) i=1
 √ p 1/p
Z σ 2 log(2n) Z ∞ 2/p
− t2σ2
≤ 1 dt + √ p 2ne dt
0 σ 2 log(2n)



!1/p
2σ)p p
Z
−s (
p p p
−1
= σ 2 log(2n) + e (log(2n) + s) 2 ds
0 2
1/p
√ p ∞ −s
 Z
p p p
= 2σ log(2n) + e (log(2n) + s) 2 −1 ds
2 0
Z ∞ 1/p !
√ p  p 1/p −s p
−1
≤ 2σ log(2n) + e (log(2n) + s) 2 ds
2 0

plugging in the bound we have established in the beginning,


p−1
1
!!
√  p 1/p π 2p p 2p
1p>2 + 1 − 1 1 − 1
1 1
−p
p
≤ 2σ log(2n) + (log(2n)) 2
2 2 2 p e 2 6p2
p+1
1
!

  p 1/p 
π 2p p 2p
(log(2n)) 1p>2
1
−p
p
= 2σ 1+ log(2n) + √ 1 − 1
2 2e 2 6p2


 
1 p π √
≤ 2σ (1 + e e log(16) ) log(2n) + √ 1 p
2e 3
p
≍ σ p + log n,

which is exactly what we want to show. ⊛

Problem (Exercise 5.4.14). Let X be an n × n random matrix that takes values ek e⊤


k , k = 1, . . . , n,
with probability 1/n each. (Here (ek ) denotes the standard basis in Rn .) Let X1 , . . . , XN be
PN
independent copies of X. Consider the sum S = i=1 Xi , which is a diagonal matrix.
(a) Show that the entry Sii has the same distribution as the number of balls in i-th bin when N
balls are thrown into n bins independently.
(b) Relating this to the classical coupon collector’s problem, show that if N ≍ n, then

log n
E∥S∥ ≍ .
log log n


Deduce that the bound in Exercise 5.4.11 would fail if the logarithmic factors were removed
from it.

Answer. (a) We see that X = ek e⊤ k with k being chosen uniformly randomly among [n], where
ek e⊤
k is a matrix with all 0’s except the k th diagonal element being 1. Hence, by interpreting
each Xi as “throwing a ball into n bins,” Skk records the number of balls in the k th bin when
N balls are thrown into n bins independently.
(b) We first observe that since S is diagonal, ∥S∥ = λ1 (S) = maxk Skk as all the diagonal elements
are eigenvalues of S. We first answer the question of how this related to the coupon collector’s
problem. Firstly, let’s introduce the problem formally:

Problem 5.4.1 (Coupon collector’s problem). Say we have n different types of coupons
to collect, and we buy N boxes, where each box contains a (uniformly) random type of
coupon. The classical coupon collector’s problem asks for the expected number of boxes
(i.e., N ) we need in order to collect all coupons.

Intuition. From (a), we can view Skk as the number of coupons we have collected for the
k th type of the coupon, where N is the number of boxes we have bought.

Hence, the coupon collector’s problem asks for the expected N we need for λn (S) = mink Skk >
0, while (b) is asking for the expected number of the most frequent coupons (i.e., maxk Skk )
we will see when buying only N ≍ n boxes.
Next, let’s prove the upper bound and the lower bound separately. Let 0 < c < C to be some
constants satisfying N ≤ Cn and n ≤ cN .

Claim (Upper bound). E[∥S∥] ≲ log n/ log log n.


Proof. We first note that Skk ∼ Binomial(N, 1/n) for all k, so by Exercise 2.4.3, for any Fix
m > N/n, we have
n
X N m log log n
P(∥S∥ ≥ m) = P(∃k : Skk ≥ m) ≤ P(Skk ≥ m) ≤ 3 n +1− log n .
k=1
j k
(C+1) log n
Let L := log log n + 1 > C + 1 > N/n, then


L−1
!
X X
E[∥S∥] = + P(∥S∥ ≥ m)
m=1 m=L
L−1 ∞
X X N m log log n
≤ 1+ 3 n +1− log n

m=1 m=L
N L log log n
3 n +1− log n (C + 1) log n 3C+1−(C+1) (C + 25 ) log n
=L−1+ log log n ≤ + 2 log log n = ,
1 − 3− log n log log n 3 · log n
log log n

establishing the desired upper bound. ⊛

The hard part lies in the lower bound. We will need the following fact.

i.i.d.
Lemma 5.4.1 (Maximum of Poisson [Kim83; BSP09]). Given Y1 , . . . , Yn ∼ Pois(1),
 
log n
E max Yk ≍ .
1≤k≤n log log n

Such a concentration is very tight.

Claim (Lower bound). E[∥S∥] ≳ log n/ log log n


Pn i.i.d.
Proof. Let MTP := E[maxk {Yk } | k=1 Yk = T ] with Y1 , . . . , Yn ∼ Pois(1). As
n
(Y1 , . . . , Yn ) | k=1 Yk = T ∼ Multinomial(T
Pn ; 1/n, . . . , 1/n), we know that MT is non-
decreasing w.r.t. T . Moreover, as k=1 Yk ∼ Pois(n), by the law of total expectation
and maximum of Poisson lemma,
 1

⌊ne2+ 2e ⌋ ∞
 e−n nT
 
log n X X
≍ E max Yk =  + MT

log log n T!

1≤k≤n
T =0 1
T =⌊ne2+ 2e ⌋+1
1
⌊ne2+ 2e ⌋ ∞
X e−n nT X e−n nT
≤ M 2+ 2e
1 + T
T! ⌊ne ⌋ T!
T =0 1
T =⌊ne2+ 2e ⌋+1

X e−n nT
≤M 1 ·1+
⌊ne2+ 2e ⌋ Γ(T )
1
T =⌊ne2+ 2e ⌋+1

From Stirling’s approximation, Γ(z) ≥ 2πz z−1/2 e−z for z > 0,

X e−n nT
≤ M 2+ 2e
1 + √
⌊ne ⌋
2+ 1
2πT T −1/2 e−T
T =⌊ne 2e ⌋+1

∞ 1
!T
e−n X neT 2T
= M 2+ 2e
1 +√
⌊ne ⌋ 2π T
1
T =⌊ne2+ 2e ⌋+1

since for all x > 0, x1/2x ≤ e1/2e , for x = T , we have


∞ 1
!T
e−n X ne1+ 2e
≤ M 2+ 2e
1 +√
⌊ne ⌋ 2π T
1
T =⌊ne2+ 2e ⌋+1

e−n X
≤M 1 +√ e−T
⌊ne2+ 2e ⌋ 2π 1
T =⌊ne2+ 2e ⌋+1
2+ 1
e−n−⌊ne 2e ⌋−1
= M 2+ 2e
1 + √ ,
⌊ne ⌋ 2π(1 − e−1 )

leading to
log n
M 1 ≳
⌊ne2+ 2e ⌋ log log n
as the trailing term is decreasing exponentially fast. Finally, we have
& 1
' & 1
'
⌊ne2+ 2e ⌋ ne2+ 2e 1
M 2+ 2e 1 ≤M &
2+ 1
' ≤ MN ≤ MN ≤ ⌈ce2+ 2e ⌉MN ,
⌊ne ⌋ ⌊ne 2e ⌋ N N
N N

where the second inequality follows from the triangle inequality of max. This leads to
1 log n
E[∥S∥] = MN ≥ M 1 ≳
1
⌈ce2+ 2e ⌉ ⌊ne2+ 2e ⌋ log log n

as desired. ⊛

Finally, that the bound in Exercise 5.4.11 would fail if the logarithmic factors were removed becomes obvious after a direct substitution. Indeed, since ∥Xi∥ = 1 =: K, Exercise 5.4.11 would then state that

E∥Σ_{i=1}^N Xi∥ ≲ ∥Σ_{i=1}^N E[Xi²]∥^(1/2),


where the logarithmic factors were removed along with K = 1. Now, using the bound for S := Σ_{i=1}^N Xi, with the observation that Xi² = Xi and E[Xi²] = E[Xi] = diag(1/n, . . . , 1/n), the right-hand side becomes √(N/n) = Θ(1), while the left-hand side grows as log n/log log n → ∞, which is clearly not valid.

Remark (Alternative examples). We give another example to demonstrate the sharpness of the matrix
Bernstein’s inequality. Consider the following random n × n matrix (slightly different from S)
N X
n
(N )
X
T := bik ek e⊤
k,
i=1 k=1

(N ) i.i.d. Pn (N )
where bik ∼ Ber(1/N ). Here, we view Xi := ⊤
k=1 bik ek ek

Intuition. In expectation, T and S should behave the same. However, this is easier to work
with from independence.

Claim. As N → ∞, with Yk ∼ Pois(1), we have


     
log n
E[λ1 (T )] = E max Tkk → E max Yk = Θ .
1≤k≤n 1≤k≤n log log n

Noticeably, the above claim doesn’t require n to vary with N .

Proof. For every k ∈ [n], we apply the Poisson limit theorem since as N → ∞, pN,ik = 1/N → 0
k D
PN (N )
and E[SNk
] = E[ i=1 bik ] = 1 =: λ as N → ∞. So as N → ∞, SN → Pois(1).
PN (N )
With a similar interpretation as in (a), we can interpret SN k
= i=1 bik as the value
D
of the k th diagonal element of T , i.e., Tkk . Hence, as N → ∞, for all k, Tkk → Yk where
i.i.d. D
Yk ∼ Pois(1). Since Tkk ’s are independent, we have T → diag(Z1 , . . . , Zn ), therefore
     
log n
E[λ1 (T )] = E max Tkk → E max Yk ≍ Θ
1≤k≤n 1≤k≤n log log n

from the maximum of Poisson lemma. ⊛


A simple calculation of ∥Σ_{i=1}^N E[Xi²]∥^(1/2) reveals that the logarithmic factors can't be removed.
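
Remark (Numerical check). A balls-into-bins simulation in Python/numpy, not part of the argument; the values of n and the number of trials are arbitrary. It shows that the average maximal bin count with N = n balls indeed tracks log n/log log n.

    import numpy as np

    rng = np.random.default_rng(0)
    for n in [100, 1_000, 10_000]:
        maxima = [np.bincount(rng.integers(0, n, size=n), minlength=n).max()
                  for _ in range(200)]
        print(n, np.mean(maxima), np.log(n) / np.log(np.log(n)))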

Problem (Exercise 5.4.15). Let X1 , . . . , XN be independent, mean zero, m × n random matrices,


such that ∥Xi ∥ ≤ K almost surely for all i. Prove that for t ≥ q0, we have
N
!
t2 /2
X  
P Xi ≥ qt ≤ 2(m + n) exp − 2 ,
i=1
σ + Kt/3

where !
N
X N
X
σ 2 = max E[Xi⊤ Xi ] , E[Xi Xi⊤ ] .
i=1 i=1

Answer. Consider the following N independent (m + n) × (m + n) symmetric, mean 0 matrices

Xi⊤
 
′ 0n×n
Xi := .
Xi 0m×m

To apply the matrix Bernstein’s inequality (Theorem 5.4.1), we need to show that ∥Xi′ ∥ ≤ K ′ for


some K ′ , where we know that ∥Xi ∥ ≤ K. However, it’s easy to see that since ∥Xi ∥ = ∥Xi⊤ ∥, we
have ∥Xi′ ∥ ≤ K as well since the characteristic equation for Xi′ is

−λI Xi⊤
 

= det λ2 I − Xi⊤ Xi = 0,

det(Xi − λI) = det
Xi −λI

so ∥Xi′ ∥ =
p
∥Xi⊤ Xi ∥ ≤ K.

Claim. Actually, we have ∥Xi′ ∥ = ∥Xi ∥, hence ∥Xi′ ∥ ≤ K.

Proof. Observe that for any matrix A ∈ Rm×n , as ∥A∥ = λ1 (AA⊤ ) = λ1 (A⊤ A), we have
p p

s  v

 u  !
⊤ 2
0 A⊤
  
A A 0 u 0 A
∥A∥ = λ1 = λ1
t = .
0 AA⊤ A 0 A 0

Plugging in Xi =: A, we’re done. ⊛


Hence, from matrix Bernstein’s inequality, for every t ≥ q0,
N
!
t2 /2
X  
P Xi ≥ qt ≤ 2(m + n) exp − 2 ,
i=1
σ + Kt/3
PN
where σ 2 = ∥ i=1 E[(Xi′ )2 ]∥. A quick calculation reveals that
  ⊤
0 Xi⊤ 0 Xi⊤
  
′ 2 Xi Xi 0
(Xi ) = = ,
Xi 0 Xi 0 0 Xi Xi⊤

hence we have !
N
X N
X
2
σ = max E[Xi⊤ Xi ] , E[Xi Xi⊤ ] ,
i=1 i=1

which completes the proof. ⊛

5.5 Application: community detection in sparse networks


5.6 Application: covariance estimation for general distributions



Chapter 6

Quadratic forms, symmetrization and


contraction

Week 19: Decoupling and Hanson-Wright Inequality


6 Jul. 2024
6.1 Decoupling
Problem (Exercise 6.1.4). Prove the following generalization of Theorem 6.1.1. Let A = (aij ) be an
n × n matrix. Let X1 , . . . , Xn be independent, mean zero random vectors in some Hilbert space.
Show that for every convex function F : R → R, one has
     
X X
E F  aij ⟨Xi , Xj ⟩ ≤ E F 4 aij ⟨Xi , Xj′ ⟩ ,
i,j : i̸=j i,j

where (Xi′ ) is an independent copy of (Xi ).

Answer. Omit. ⊛

Problem (Exercise 6.1.5). Prove the following alternative generalization of Theorem 6.1.1. Let
(uij )ni,j=1 be fixed vectors in some normed space. Let X1 , . . . , Xn be independent, mean zero
random variables. Show that, for every convex and increasing function F , one has
     
X X
E F  Xi Xj uij  ≤ E F 4 Xi Xj′ uij  ,
i,j : i̸=j i,j

where (Xi′ ) is an independent copy of (Xi ).

Answer. Omit. ⊛

6.2 Hanson-Wright Inequality


Problem (Exercise 6.2.4). Complete the proof of Lemma 6.2.3. Replace X ′ by g ′ ; write all details
carefully.

Answer. Omit. ⊛

Problem (Exercise 6.2.5). Give an alternative proof of Hanson-Write inequality for normal distribu-
tions, without separating the diagonal part or decoupling.


Answer. Omit. ⊛

Problem (Exercise 6.2.6). Consider a mean zero, sub-gaussian random vector X in Rn with ∥X∥ψ2 ≤
K. Let B be an m × n matrix. Show that
c
E exp λ2 ∥BX∥22 ≤ exp CK 2 λ2 ∥B∥2F provided |λ| ≤
  
.
K∥B∥

To prove this bound, replace X with a Gaussian random vector g ∼ N (0, Im ) along the following
lines:
(a) Prove the comparison inequality

E[exp λ2 ∥BX∥22 ] ≤ E[exp CK 2 λ2 ∥B ⊤ g∥22 ]


 

for every λ ∈ R.
(b) Check that
E[exp λ2 ∥B ⊤ g∥22 ] ≤ exp Cλ2 ∥B∥2F
 

provided that |λ| ≤ c/∥B∥.

Answer. Omit. ⊛

Problem (Exercise 6.2.7). Let X1, . . . , Xn be independent, mean zero, sub-gaussian random vectors in Rd. Let A = (aij) be an n × n matrix. Prove that for every t ≥ 0, we have
\[
P\Big( \sum_{i,j : i \ne j} a_{ij} \langle X_i, X_j \rangle \ge t \Big) \le 2 \exp\Big( -c \min\Big( \frac{t^2}{K^4 d \|A\|_F^2}, \frac{t}{K^2 \|A\|} \Big) \Big),
\]

where K = maxi ∥Xi ∥ψ2 .

Answer. Omit. ⊛

6.3 Concentration of anisotropic random vectors


Problem (Exercise 6.3.1). Let B be an m × n matrix and X be an isotropic random vector in Rn .
Check that
E[∥BX∥22 ] = ∥B∥2F .

Answer. Omit. ⊛
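Although the proof is omitted, the identity is easy to verify numerically; here is a quick Monte Carlo sketch (not part of the original solution; it assumes numpy and takes X to be standard Gaussian, which is isotropic):

import numpy as np

rng = np.random.default_rng(0)
m, n, samples = 4, 6, 200_000
B = rng.standard_normal((m, n))

X = rng.standard_normal((samples, n))          # rows are isotropic random vectors
mc = np.mean(np.sum((X @ B.T) ** 2, axis=1))   # Monte Carlo estimate of E||BX||_2^2
print(mc, np.linalg.norm(B, 'fro') ** 2)       # the two numbers should be close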

Problem (Exercise 6.3.3). Let D be a k × m matrix and B be an m × n matrix. Prove that

∥DB∥F ≤ ∥D∥∥B∥F .

Answer. Let B = (b1, . . . , bn), writing B in terms of its columns. Then
\[
\|DB\|_F^2 = \sum_{i=1}^n \|D b_i\|_2^2 \le \|D\|^2 \sum_{i=1}^n \|b_i\|_2^2 = \|D\|^2 \|B\|_F^2,
\]
where we use that ∥Dbᵢ∥₂ ≤ ∥D∥∥bᵢ∥₂ by the definition of the operator norm, and that ∥A∥_F² = Σᵢ,ⱼ aᵢⱼ² equals the sum of the squared column norms for any matrix A. ⊛
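A quick numerical check of this inequality (not part of the original solution; it assumes numpy and uses arbitrary Gaussian test matrices):

import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    D = rng.standard_normal((7, 5))
    B = rng.standard_normal((5, 6))
    lhs = np.linalg.norm(D @ B, 'fro')                     # ||DB||_F
    rhs = np.linalg.norm(D, 2) * np.linalg.norm(B, 'fro')  # ||D|| ||B||_F
    print(f"{lhs:.4f} <= {rhs:.4f}", lhs <= rhs)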

Problem (Exercise 6.3.4). Let E be a subspace of Rn of dimension d. Consider a random vector


X = (X1 , . . . , Xn ) ∈ Rn with independent, mean zero, unit variance, sub-gaussian coordinates.


(a) Check that
\[
\big( E[\operatorname{dist}(X, E)^2] \big)^{1/2} = \sqrt{n - d}.
\]
(b) Prove that for any t ≥ 0, the distance nicely concentrates:
\[
P\Big( \big| \operatorname{dist}(X, E) - \sqrt{n - d} \big| > t \Big) \le 2 \exp\big( -c t^2 / K^4 \big)
\]


where K = maxi ∥Xi ∥ψ2 .

Answer. Omit. ⊛

Problem (Exercise 6.3.5). Let B be an m×n matrix, and let X be a mean zero, sub-gaussian random
vector in Rn with ∥X∥ψ2 ≤ K. Prove that for any t ≥ 0, we have

\[
P\big( \|BX\|_2 \ge C K \|B\|_F + t \big) \le \exp\Big( -\frac{c t^2}{K^2 \|B\|^2} \Big).
\]

Answer. Omit. ⊛

Problem (Exercise 6.3.6). Show that there exists a mean zero, isotropic, and sub-gaussian random
vector X in Rn such that
\[
P(\|X\|_2 = 0) = P(\|X\|_2 \ge 1.4 \sqrt{n}) = \frac{1}{2}.
\]

In other words, ∥X∥2 does not concentrate near √n.

Answer. Omit. ⊛

Week 20: The Symmetrization Trick


13 Jul. 2024
6.4 Symmetrization
Problem (Exercise 6.4.1). Let X be a random variable and ξ be an independent symmetric Bernoulli
random variable.
(a) Check that ξX and ξ|X| are symmetric random variables, and they have the same distribution.

(b) If X is symmetric, show that the distribution of ξX and ξ|X| is the same as that of X.
(c) Let X ′ be an independent copy of X. Check that X − X ′ is symmetric.

Answer. (a) For any random variable X and a symmetric Bernoulli random variable ξ, we first prove that ξX and −ξX have the same distribution, i.e., P(ξX ≥ t) = P(−ξX ≥ t) for any t ∈ R. Indeed,
\[
P(\xi X \ge t) = \frac{P(\xi X \ge t \mid \xi = 1) + P(\xi X \ge t \mid \xi = -1)}{2} = \frac{P(X \ge t) + P(-X \ge t)}{2},
\]
while
\[
P(-\xi X \ge t) = \frac{P(-\xi X \ge t \mid \xi = 1) + P(-\xi X \ge t \mid \xi = -1)}{2} = \frac{P(-X \ge t) + P(X \ge t)}{2}.
\]
This proves that both ξX and ξ|X| are symmetric (by substituting X as |X|). Secondly, we


show that ξX and ξ|X| have the same distribution, i.e., P(ξX ≥ t) = P(ξ|X| ≥ t) for any t ∈ R. Again, we have

\[
\begin{aligned}
P(\xi|X| \ge t) &= \frac{P(\xi|X| \ge t \mid \xi = 1) + P(\xi|X| \ge t \mid \xi = -1)}{2} \\
&= \frac{P(|X| \ge t) + P(-|X| \ge t)}{2} \\
&= \frac{P(|X| \ge t,\, X \ge 0) + P(|X| \ge t,\, X < 0) + P(-|X| \ge t,\, X \ge 0) + P(-|X| \ge t,\, X < 0)}{2} \\
&= \frac{P(X \ge t,\, X \ge 0) + P(-X \ge t,\, X < 0) + P(-X \ge t,\, X \ge 0) + P(X \ge t,\, X < 0)}{2} \\
&= \frac{P(X \ge t) + P(-X \ge t)}{2},
\end{aligned}
\]
which is just P(ξX ≥ t), as we desired.
(b) Moreover, if X is symmetric, we want to show that ξX, ξ|X|, and X all have the same distribution. The first equality is from (a); as for the second, we see that for any t ≥ 0,
\[
P(X \ge t) = P(-X \ge t) = \frac{P(X \ge t) + P(-X \ge t)}{2} = P(\xi X \ge t)
\]
from the proof of (a).
(c) It suffices to show that X − X′ and X′ − X have the same distribution, but this is trivial since (X, X′) and (X′, X) have the same distribution. ⊛
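To see part (a) in action, here is a small empirical check (not part of the original solution; it assumes numpy and uses a shifted exponential X, an arbitrary non-symmetric choice): the empirical tails of ξX and ξ|X| should agree up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
X = rng.exponential(1.0, N) - 0.3      # an arbitrary, non-symmetric random variable
xi = rng.choice([-1.0, 1.0], N)        # symmetric Bernoulli signs, independent of X

for t in [-1.0, 0.0, 0.5, 1.5]:
    p1 = np.mean(xi * X >= t)          # P(xi X >= t)
    p2 = np.mean(xi * np.abs(X) >= t)  # P(xi |X| >= t)
    print(f"t={t:+.1f}: {p1:.4f} vs {p2:.4f}")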

Problem (Exercise 6.4.3). Where in this argument did we use the independence of the random
variables Xi? Is the mean zero assumption needed for both upper and lower bounds?

Answer. If the Xi's are not independent, then {εᵢ(Xᵢ − Xᵢ′)}_{i=1}^N might not have the same joint distribution as {Xᵢ − Xᵢ′}_{i=1}^N. For the mean zero assumption, see Exercise 6.4.4. ⊛

Problem (Exercise 6.4.4). (a) Prove the following generalization of Symmetrization Lemma 6.4.2
for random vectors Xi that do not necessarily have zero means:
" N N
# " N #
X X X
E Xi − E[Xi ] ≤ 2E εi Xi .
i=1 i=1 i=1

(b) Argue that there can not be any non-trivial reverse inequality.

Answer. (a) We see that, using Lemma 6.1.2 again, we have
\[
E\Big\| \sum_{i=1}^N X_i - \sum_{i=1}^N E[X_i] \Big\| = E\Big\| \sum_{i=1}^N (X_i - E[X_i]) \Big\| \le E\Big\| \sum_{i=1}^N \big( (X_i - E[X_i]) - (X_i' - E[X_i']) \big) \Big\|
\]


as E[Xi ] = E[Xi′ ], and using Exercise 6.4.1, we have


" N #
X

=E (Xi − Xi )
i=1
" N
#
X
=E εi (Xi − Xi′ )
i=1
" N
# " N
# " N
#
X X X
≤E εi Xi +E εi Xi′ = 2E εi Xi .
i=1 i=1 i=1

(b) Let N = 1 and take X₁ := λ𝟏 to be deterministic for some λ > 0, where 𝟏 is the all-ones vector. Then
\[
E[\| X_1 - E[X_1] \|_2] = 0,
\]
while
\[
E[\| \varepsilon_1 X_1 \|_2] = \lambda \| \mathbf{1} \|_2
\]
can be arbitrarily large as λ → ∞, so no reverse inequality with a universal constant can hold. ⊛
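A quick Monte Carlo check of the inequality in (a) (not part of the original solution; it assumes numpy, uses the Euclidean norm, and takes the Xᵢ to be Gaussian vectors with a common known mean, an arbitrary illustrative choice):

import numpy as np

rng = np.random.default_rng(0)
N, d, trials = 10, 3, 20_000
mu = np.array([1.0, -2.0, 0.5])            # E[X_i] for every i

lhs_samples, rhs_samples = [], []
for _ in range(trials):
    X = mu + rng.standard_normal((N, d))   # X_i = mu + Gaussian noise
    eps = rng.choice([-1.0, 1.0], size=(N, 1))
    lhs_samples.append(np.linalg.norm(X.sum(0) - N * mu))
    rhs_samples.append(np.linalg.norm((eps * X).sum(0)))

lhs = np.mean(lhs_samples)                 # E|| sum X_i - sum E[X_i] ||
rhs = 2 * np.mean(rhs_samples)             # 2 E|| sum eps_i X_i ||
print(f"{lhs:.3f} <= {rhs:.3f}", lhs <= rhs)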

Problem (Exercise 6.4.5). Prove the following generalization of Symmetrization Lemma 6.4.2. Let
F : R+ → R be an increasing, convex function. Show that the same inequalities in Lemma 6.4.2
hold if the norm ∥·∥ is replaced with F (∥·∥), namely
" N
!# " N
!# " N
!#
1 X X X
E F εi Xi ≤E F Xi ≤E F 2 εi Xi .
2 i=1 i=1 i=1

Answer. We see that for the lower bound, we have


" n
!# " " n # !#
1 X 1 X
E F εi Xi =E F EX ′ ′
εi (Xi − Xi ) (EXi′ [εi Xi′ ] = 0)
2 i=1 2 i=1
" " n
#!#
1 X
≤ E F EX ′ εi (Xi − Xi′ )
2 i=1
(Jensen’s inequality, F increasing)
" n
!#
1 X
≤E F εi (Xi − Xi′ ) (Jensen’s inequality)
2 i=1
" n
!#
1 X
=E F (Xi − Xi′ ) (Exercise 6.4.1 (b) and (c))
2 i=1
" n n
!#
1 X 1 X ′
≤E F Xi + X (F increasing)
2
i=1
2 i=1 i
" n
! n
!#
1 X 1 X
≤E F Xi + F ′
Xi (F convex)
2 i=1
2 i=1
" n
!#
X
=E F Xi .
i=1


On the other hand, for the upper bound, we also have


" n
!# " " n # !#
X X
E F Xi =E F EX ′ (Xi − Xi′ ) (EXi′ [Xi′ ] = 0)
i=1 i=1
" " n
#!#
X
≤E F EX ′ (Xi − Xi′ ) (Jensen’s inequality, F increasing)
i=1
" n
!#
X
≤E F (Xi − Xi′ ) (Jensen’s inequality)
i=1
" n
!#
X
=E F εi (Xi − Xi′ ) (Exercise 6.4.1 (b) and (c))
i=1
" n n
!#
X X
≤E F εi Xi + εi Xi′ (F increasing)
i=1 i=1
" n
! n
!#
1 X 1 X
≤E F 2 εi Xi + F 2 εi Xi′ (F convex)
2 i=1
2 i=1
" n
!#
X
=E F 2 εi Xi .
i=1
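The same kind of Monte Carlo check as for Exercise 6.4.4 works for this generalization; the sketch below (not part of the original solution; it assumes numpy and picks F(x) = x⁴, which is convex and increasing on R₊, with mean zero uniform coordinates as an arbitrary choice) estimates all three quantities:

import numpy as np

rng = np.random.default_rng(0)
N, d, trials = 8, 3, 50_000
F = lambda x: x ** 4                        # convex, increasing on [0, infinity)

lo, mid, hi = [], [], []
for _ in range(trials):
    X = rng.uniform(-1, 1, size=(N, d))     # independent, mean zero random vectors
    eps = rng.choice([-1.0, 1.0], size=(N, 1))
    s_eps = np.linalg.norm((eps * X).sum(0))
    s = np.linalg.norm(X.sum(0))
    lo.append(F(0.5 * s_eps))
    mid.append(F(s))
    hi.append(F(2 * s_eps))

print(np.mean(lo), "<=", np.mean(mid), "<=", np.mean(hi))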

Problem (Exercise 6.4.6). Let X1, . . . , XN be independent, mean zero random variables. Show that their sum Σᵢ Xᵢ is sub-gaussian if and only if Σᵢ εᵢXᵢ is sub-gaussian, and
\[
c \Big\| \sum_{i=1}^N \varepsilon_i X_i \Big\|_{\psi_2} \le \Big\| \sum_{i=1}^N X_i \Big\|_{\psi_2} \le C \Big\| \sum_{i=1}^N \varepsilon_i X_i \Big\|_{\psi_2}.
\]

Answer. Consider F_K(x) := exp(x²/K²) − 1 for K > 0, which is convex and increasing on R₊, and note that ∥Y∥ψ2 ≤ K exactly when E[F_K(|Y|)] ≤ 1. Hence, by Exercise 6.4.5, if ∥Σᵢ₌₁ⁿ εᵢXᵢ∥ψ2 ≤ K, then
\[
E\Big[ F_{2K}\Big( \Big| \sum_{i=1}^n X_i \Big| \Big) \Big] \le E\Big[ F_{2K}\Big( 2 \Big| \sum_{i=1}^n \varepsilon_i X_i \Big| \Big) \Big] = E\Big[ F_K\Big( \Big| \sum_{i=1}^n \varepsilon_i X_i \Big| \Big) \Big] \le 1,
\]
implying ∥Σᵢ₌₁ⁿ Xᵢ∥ψ2 ≤ 2K. Conversely, if ∥Σᵢ₌₁ⁿ Xᵢ∥ψ2 ≤ K, then
\[
E\Big[ F_{2K}\Big( \Big| \sum_{i=1}^n \varepsilon_i X_i \Big| \Big) \Big] = E\Big[ F_K\Big( \frac{1}{2} \Big| \sum_{i=1}^n \varepsilon_i X_i \Big| \Big) \Big] \le E\Big[ F_K\Big( \Big| \sum_{i=1}^n X_i \Big| \Big) \Big] \le 1,
\]
thus ∥Σᵢ₌₁ⁿ εᵢXᵢ∥ψ2 ≤ 2K. ⊛

Week 21: Random Matrices with Non-I.I.D. Entries


20 Jul. 2024
6.5 Random matrices with non-i.i.d. entries
6.6 Application: matrix completion
Week 22: Contraction Trick
25 Jul. 2024
6.7 Contraction Principle


Problem (Exercise 6.7.2). Check that the function f defined in (6/16) is convex. For reference,
f : R^N → R is defined as
\[
f(a) := E\Big\| \sum_{i=1}^N a_i \varepsilon_i x_i \Big\|.
\]

Answer. To prove that f : R^N → R defined by
\[
f(a) = E\Big\| \sum_{i=1}^N a_i \varepsilon_i x_i \Big\|
\]
is convex, let a, b ∈ R^N and λ ∈ (0, 1). Then, by the triangle inequality and homogeneity of the norm,
\[
f(\lambda a + (1 - \lambda) b) = E\Big\| \sum_{i=1}^N \big[ \lambda a_i + (1 - \lambda) b_i \big] \varepsilon_i x_i \Big\| \le E\Big[ \lambda \Big\| \sum_{i=1}^N a_i \varepsilon_i x_i \Big\| + (1 - \lambda) \Big\| \sum_{i=1}^N b_i \varepsilon_i x_i \Big\| \Big] = \lambda f(a) + (1 - \lambda) f(b),
\]
implying that f is convex. ⊛

Problem (Exercise 6.7.3). Prove the following generalization of Theorem 6.7.1. Let X1, . . . , XN be independent, mean zero random vectors in a normed space, and let a = (a1, . . . , aN) ∈ R^N. Then
\[
E\Big\| \sum_{i=1}^N a_i X_i \Big\| \le 4 \|a\|_\infty \cdot E\Big\| \sum_{i=1}^N X_i \Big\|.
\]

Answer. Let the εᵢ be independent symmetric Bernoulli random variables. Then, from the symmetrization and Theorem 6.7.1 applied conditionally on the Xᵢ, we have
\[
E\Big\| \sum_{i=1}^N a_i X_i \Big\| \le 2\, E\Big\| \sum_{i=1}^N a_i \varepsilon_i X_i \Big\| \le 2 \|a\|_\infty \cdot E\Big\| \sum_{i=1}^N \varepsilon_i X_i \Big\| \le 4 \|a\|_\infty \cdot E\Big\| \sum_{i=1}^N X_i \Big\|,
\]

where the last inequality follows again from the symmetrization. ⊛
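A quick Monte Carlo check (not part of the original solution; it assumes numpy, uses the Euclidean norm, and takes the Xᵢ to be standard Gaussian vectors with an arbitrary fixed coefficient vector a):

import numpy as np

rng = np.random.default_rng(0)
N, d, trials = 10, 3, 50_000
a = rng.uniform(-2, 2, N)                   # fixed coefficients

lhs_samples, rhs_samples = [], []
for _ in range(trials):
    X = rng.standard_normal((N, d))         # independent, mean zero X_i
    lhs_samples.append(np.linalg.norm((a[:, None] * X).sum(0)))
    rhs_samples.append(np.linalg.norm(X.sum(0)))

lhs = np.mean(lhs_samples)                          # E|| sum a_i X_i ||
rhs = 4 * np.max(np.abs(a)) * np.mean(rhs_samples)  # 4 ||a||_inf E|| sum X_i ||
print(f"{lhs:.3f} <= {rhs:.3f}", lhs <= rhs)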


Problem (Exercise 6.7.5). Show that the factor √(log N) in Lemma 6.7.4 is needed in general, and
is optimal. Thus, symmetrization with Gaussian random variables is generally weaker than sym-
metrization with symmetric Bernoullis.

Answer. Let eᵢ denote the i-th standard basis vector in R^N, and consider Xᵢ := εᵢeᵢ for i = 1, . . . , N. We have
\[
E\Big\| \sum_{i=1}^N X_i \Big\|_\infty = E\Big\| \sum_{i=1}^N \varepsilon_i e_i \Big\|_\infty = E[\|(\varepsilon_1, \ldots, \varepsilon_N)\|_\infty] = 1,
\]
while for independent gᵢ ∼ N(0, 1), we have
\[
E\Big\| \sum_{i=1}^N g_i X_i \Big\|_\infty = E\Big\| \sum_{i=1}^N g_i \varepsilon_i e_i \Big\|_\infty = E[\|(g_1, \ldots, g_N)\|_\infty] \asymp \sqrt{\log N}
\]

due to symmetry of gi ’s and Exercise 2.5.10 and 2.5.11. ⊛
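The √(log N) growth is easy to see numerically; the sketch below (not part of the original solution; it assumes numpy) estimates E∥(ε₁, . . . , ε_N)∥∞ (always 1) and E∥(g₁, . . . , g_N)∥∞ for increasing N:

import numpy as np

rng = np.random.default_rng(0)
trials = 2000
for N in [10, 100, 1000]:
    eps = rng.choice([-1.0, 1.0], (trials, N))
    g = rng.standard_normal((trials, N))
    eps_max = np.mean(np.max(np.abs(eps), axis=1))  # always exactly 1
    g_max = np.mean(np.max(np.abs(g), axis=1))      # grows like sqrt(log N)
    print(N, eps_max, round(g_max, 3), round(g_max / np.sqrt(np.log(N)), 3))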

Problem (Exercise 6.7.6). Let F : R+ → R be a convex increasing function. Generalize the sym-
metrization and contraction results of this and previous section by replacing the norm ∥·∥ with
F (∥·∥) throughout.

CHAPTER 6. QUADRATIC FORMS, SYMMETRIZATION AND CONTRACTION 71


Week 22: Contraction Trick

Answer. Omit. ⊛

Problem (Exercise 6.7.7). Consider a bounded subset T ⊆ Rn , and let ε1 , . . . , εn be independent


symmetric Bernoulli random variables. Let ϕi : R → R be contractions, i.e., Lipschitz functions
with ∥ϕᵢ∥Lip ≤ 1. Then
\[
E\Big[ \sup_{t \in T} \sum_{i=1}^n \varepsilon_i \phi_i(t_i) \Big] \le E\Big[ \sup_{t \in T} \sum_{i=1}^n \varepsilon_i t_i \Big].
\]

To prove this result, do the following steps:


(a) First let n = 2. Consider a subset T ⊆ R² and a contraction ϕ : R → R, and check that
\[
\sup_{t \in T}(t_1 + \phi(t_2)) + \sup_{t \in T}(t_1 - \phi(t_2)) \le \sup_{t \in T}(t_1 + t_2) + \sup_{t \in T}(t_1 - t_2).
\]
(b) Use induction on n to complete the proof.

Answer. (a) Renaming t as t′ in the second supremum on each side, we get
\[
\begin{aligned}
\sup_{t \in T}(t_1 + \phi(t_2)) + \sup_{t' \in T}(t_1' - \phi(t_2'))
&= \sup_{t, t' \in T} \big( t_1 + \phi(t_2) + t_1' - \phi(t_2') \big) \\
&\le \sup_{t, t' \in T} \big( t_1 + t_1' + |t_2 - t_2'| \big) \\
&= \sup_{t, t' \in T} \big( t_1 + t_1' + t_2 - t_2' \big) = \sup_{t \in T}(t_1 + t_2) + \sup_{t' \in T}(t_1' - t_2'),
\end{aligned}
\]
where the inequality uses that ϕ is a contraction (so ϕ(t₂) − ϕ(t₂′) ≤ |t₂ − t₂′|), and the next equality uses that the expression is symmetric under swapping t and t′, so the absolute value can be dropped without changing the supremum.


(b) Firstly, we observe that averaging over ε_n while conditioning on ε₁, . . . , ε_{n−1} gives
\[
\begin{aligned}
E_{\varepsilon_n}\Big[ \sup_{t \in T} \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) + \varepsilon_n \phi_n(t_n) \Big]
&= \frac{1}{2} \Big( \sup_{t \in T}\Big( \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) + \phi_n(t_n) \Big) + \sup_{t \in T}\Big( \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) - \phi_n(t_n) \Big) \Big) \\
&\le \frac{1}{2} \Big( \sup_{t \in T}\Big( \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) + t_n \Big) + \sup_{t \in T}\Big( \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) - t_n \Big) \Big) = E_{\varepsilon_n}\Big[ \sup_{t \in T} \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) + \varepsilon_n t_n \Big],
\end{aligned}
\]
where the inequality comes from (a) by considering the supremum over
\[
T^{(n)} := \Big\{ (x, y) \in \mathbb{R}^2 : x = \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i),\ y = t_n,\ (t_1, \ldots, t_{n-1}, t_n) \in T \Big\}.
\]
Explicitly, we get
\[
E\Big[ E\Big[ \sup_{t \in T} \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) + \varepsilon_n \phi_n(t_n) \,\Big|\, \varepsilon_{1:n-1} \Big] \Big] \le E\Big[ E\Big[ \sup_{t \in T} \sum_{i=1}^{n-1} \varepsilon_i \phi_i(t_i) + \varepsilon_n t_n \,\Big|\, \varepsilon_{1:n-1} \Big] \Big].
\]
By iterating this argument, at step k conditioning on (εᵢ)_{i≠k} and applying (a) to
\[
T^{(k)} := \Big\{ (x, y) \in \mathbb{R}^2 : x = \sum_{i=1}^{k-1} \varepsilon_i \phi_i(t_i) + \sum_{i=k+1}^{n} \varepsilon_i t_i,\ y = t_k,\ (t_1, \ldots, t_{n-1}, t_n) \in T \Big\},
\]
we get the desired result.
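A quick Monte Carlo check of the contraction principle just proved (not part of the original solution; it assumes numpy, takes T to be a random finite subset of Rⁿ, and uses the 1-Lipschitz contraction ϕᵢ = tanh for every i):

import numpy as np

rng = np.random.default_rng(0)
n, npts, trials = 5, 30, 100_000

T = rng.uniform(-3, 3, size=(npts, n))           # a finite subset of R^n (rows are points)
eps = rng.choice([-1.0, 1.0], size=(trials, n))  # symmetric Bernoulli signs

lhs = np.mean(np.max(eps @ np.tanh(T).T, axis=1))  # E sup_t sum_i eps_i tanh(t_i)
rhs = np.mean(np.max(eps @ T.T, axis=1))           # E sup_t sum_i eps_i t_i
print(f"{lhs:.3f} <= {rhs:.3f}", lhs <= rhs)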


Problem (Exercise 6.7.8). Generalize Talagrand’s contraction principle for arbitrary Lipschitz func-
tions ϕi : R → R without restriction on their Lipschitz norms.

Answer. Looking into the proof of Exercise 6.7.7 (or applying it after rescaling each coordinate by ∥ϕᵢ∥Lip), we see that for general Lipschitz functions ϕᵢ,
\[
E\Big[ \sup_{t \in T} \sum_{i=1}^n \varepsilon_i \phi_i(t_i) \Big] \le E\Big[ \sup_{t \in T} \sum_{i=1}^n \varepsilon_i \|\phi_i\|_{\mathrm{Lip}}\, t_i \Big] \le \max_{1 \le i \le n} \|\phi_i\|_{\mathrm{Lip}} \cdot E\Big[ \sup_{t \in T} \sum_{i=1}^n \varepsilon_i t_i \Big],
\]
where the last inequality follows as in Theorem 6.7.1, since its proof only uses the convexity of the map a ↦ E[sup_{t∈T} Σᵢ aᵢεᵢtᵢ] and the symmetry of the εᵢ. ⊛



