(MIT 18.656) Lecture 10 Notes
Isabella Zhu
11 March 2025
Theorem 1.1
Assume INC(k) with k equal to the sparsity of θ∗ (i.e. k = |θ∗|₀). Fix

2τ = 8σ√(log(2d)/n) + 8σ√(log(1/δ)/n).

Then

MSE(Xθ̂ᴸ) ≤ 32kτ² ≲ (σ²|θ∗|₀/n) log(d/δ)
Moreover,
|θ̂ − θ∗|₂² ≤ 2 MSE(Xθ̂ᴸ)
all happening with probability at least 1 − δ.
Proof. For the five hundred millionth time, we start with the good ole basic inequality
|Xθ̂ − Xθ∗|₂² ≤ 2⟨ϵ, Xθ̂ − Xθ∗⟩ + 2nτ|θ∗|₁ − 2nτ|θ̂|₁
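For completeness, here is where the basic inequality comes from, assuming θ̂ = θ̂ᴸ minimizes the lasso objective |y − Xθ|₂² + 2nτ|θ|₁ (my reading of the constants above) with y = Xθ∗ + ϵ. Optimality of θ̂ gives

|y − Xθ̂|₂² + 2nτ|θ̂|₁ ≤ |y − Xθ∗|₂² + 2nτ|θ∗|₁,

and expanding |y − Xθ̂|₂² = |Xθ̂ − Xθ∗|₂² − 2⟨ϵ, Xθ̂ − Xθ∗⟩ + |ϵ|₂² while |y − Xθ∗|₂² = |ϵ|₂², then cancelling |ϵ|₂² and rearranging, gives the display above.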
We bound
2⟨ϵ, Xθ̂ − Xθ∗⟩ ≤ 2|Xᵀϵ|∞ · |θ̂ − θ∗|₁
We bound the highest column norm of X. We have
|Xⱼ|₂² = (XᵀX)ⱼⱼ ≤ n + n/(32k) ≤ 2n
by the incoherence property. Therefore, we get
2⟨ϵ, Xθ̂ − Xθ∗⟩ ≤ 2|Xᵀϵ|∞ · |θ̂ − θ∗|₁ ≤ 2 · 2n · (τ/4) · |θ̂ − θ∗|₁ = nτ|θ̂ − θ∗|₁

where the middle inequality uses that |Xᵀϵ|∞ ≤ 2n · (τ/4) = nτ/2 with probability at least 1 − δ; this is exactly how τ was chosen, via a sub-Gaussian maximal bound over the 2d variables ±Xⱼᵀϵ together with |Xⱼ|₂² ≤ 2n.
To summarize, adding nτ|θ̂ − θ∗|₁ to both sides, we've proved so far that

|Xθ̂ − Xθ∗|₂² + nτ|θ̂ − θ∗|₁ ≤ 2nτ|θ̂ − θ∗|₁ + 2nτ|θ∗|₁ − 2nτ|θ̂|₁
Putting it together, with S = supp(θ∗),

|Xθ̂ − Xθ∗|₂² + nτ|θ̂ − θ∗|₁ ≤ 2nτ [ |θ̂_S − θ∗|₁ + |θ∗|₁ − |θ̂_S|₁ ] ≤ 4nτ|θ̂_S − θ∗|₁

where the first step splits |θ̂ − θ∗|₁ = |θ̂_S − θ∗|₁ + |θ̂_Sᶜ|₁ and |θ̂|₁ = |θ̂_S|₁ + |θ̂_Sᶜ|₁ (so the |θ̂_Sᶜ|₁ terms cancel), and the second uses the triangle inequality |θ∗|₁ − |θ̂_S|₁ ≤ |θ̂_S − θ∗|₁.
We have that

nτ|θ̂ − θ∗|₁ ≤ 4nτ|θ̂_S − θ∗|₁, i.e. |θ̂_Sᶜ|₁ ≤ 3|θ̂_S − θ∗|₁,

which is exactly the cone condition! (Everything below this is kinda suspicious because I was playing Squardle instead of paying attention.) So for our lower bound, we get

2|X(θ̂ − θ∗)|₂²/n ≥ |θ̂ − θ∗|₂²
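The lecture didn't spell this step out, so here is a sketch of why it should hold, assuming INC(k) means |XᵀX/n − I_d|∞ ≤ 1/(32k) (consistent with the diagonal bound (XᵀX)ⱼⱼ ≤ n + n/(32k) used earlier); treat the constants as my reconstruction. Writing v = θ̂ − θ∗,

|Xv|₂²/n = vᵀ(XᵀX/n)v ≥ |v|₂² − |XᵀX/n − I_d|∞ · |v|₁² ≥ |v|₂² − |v|₁²/(32k),

and on the cone, |v|₁ ≤ 4|v_S|₁ ≤ 4√k|v|₂ (Cauchy–Schwarz, since |S| = k), so |v|₁² ≤ 16k|v|₂² and

|Xv|₂²/n ≥ |v|₂² − (16k/(32k))|v|₂² = |v|₂²/2.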
By Cauchy–Schwarz,

|θ̂_S − θ∗|₁ ≤ √k |θ̂_S − θ∗|₂ ≤ √k |θ̂ − θ∗|₂ ≤ √(2k/n) |Xθ̂ − Xθ∗|₂

where the last inequality is the lower bound we just derived.
Therefore, we get

|Xθ̂ − Xθ∗|₂² ≤ 4nτ √(2k/n) |Xθ̂ − Xθ∗|₂
from which we divide and square to get the desired result.
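To spell out that last step (recall MSE(Xθ̂) = (1/n)|Xθ̂ − Xθ∗|₂², consistent with the "Moreover" bound in the theorem): dividing both sides by |Xθ̂ − Xθ∗|₂ gives

|Xθ̂ − Xθ∗|₂ ≤ 4nτ √(2k/n) = 4τ√(2kn),

and squaring and dividing by n gives

MSE(Xθ̂ᴸ) = (1/n)|Xθ̂ − Xθ∗|₂² ≤ 32kτ².

Plugging in the choice of τ (using (a + b)² ≤ 2a² + 2b²) gives 32kτ² ≲ (σ²|θ∗|₀/n) log(d/δ), and the "Moreover" claim follows by combining with the lower bound 2|X(θ̂ − θ∗)|₂²/n ≥ |θ̂ − θ∗|₂².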
§2 Matrix Estimation
We will go over some linear algebra "basics" which need to be known for later lectures. Apparently this lecture will be "boring to death" (not my words).
If θ∗ is sparse, then we can just use θ̂ᴴᴬᴿᴰ, so we aren't utilizing matrix properties.
Clearly, the matrix is very sparse. In fact, only 1% was filled. The goal was to fill in the rest of the matrix. The model:

M_ij = u_i · v_j + noise,

or in matrix form,

M = uvᵀ + noise.
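As a concrete illustration, here is a minimal numpy sketch of this observation model (mine, not from lecture; the sizes and noise level are made up):

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 150                                 # matrix dimensions (made up)
u = rng.normal(size=(n, 1))
v = rng.normal(size=(d, 1))
M = u @ v.T + 0.1 * rng.normal(size=(n, d))     # rank-1 signal plus noise

# Observe only ~1% of the entries, as in the example above.
mask = rng.random((n, d)) < 0.01
M_obs = np.where(mask, M, np.nan)               # unobserved entries are missing
print(f"observed fraction: {mask.mean():.3f}")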
§3 Matrix Redux
§3.1 Eigenvalues and Eigenvectors
Square matrix A ∈ Rⁿˣⁿ. An eigenvalue λ and eigenvector u ≠ 0 are defined by Au = λu.
Fact 3.1. If A is symmetric, then all eigenvalues are real.
In this class, we will assume that all eigenvectors have norm 1.
Fact 3.2. If u₁, . . . , uₙ are eigenvectors of a symmetric A, they can be chosen to form an orthogonal basis for the column span of A. We will call this the eigenbasis.
Consider the special case when A is positive semidefinite. The eigenvalues are nonnegative and are equal to the singular values. U and V become the same matrix. In this case, the eigendecomposition and the SVD coincide: A = UΛUᵀ.
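Here's a quick numpy sanity check of these facts (mine, not from lecture):

import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
A = B @ B.T                                    # symmetric PSD by construction

lam, U = np.linalg.eigh(A)                     # real eigenvalues, orthonormal eigenvectors
sing = np.linalg.svd(A, compute_uv=False)      # singular values, decreasing

print(np.all(lam >= -1e-10))                   # eigenvalues are nonnegative
print(np.allclose(np.sort(lam)[::-1], sing))   # eigenvalues = singular values
print(np.allclose(U @ np.diag(lam) @ U.T, A))  # A = U Λ Uᵀ
print(np.allclose(U.T @ U, np.eye(5)))         # eigenbasis is orthonormal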
Remark 3.4. Note that |A|∞ = max_ij |A_ij| and |A|₀ is the number of nonzero entries. We also have |A|₂ = √(Tr(AᵀA)) = √(Tr(AAᵀ)) = ‖A‖_F.
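The trace identity is easy to check numerically (again mine):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 7))                          # deliberately non-square

fro = np.linalg.norm(A, 'fro')
print(np.isclose(np.sqrt(np.trace(A.T @ A)), fro))   # Tr(AᵀA) is 7x7
print(np.isclose(np.sqrt(np.trace(A @ A.T)), fro))   # Tr(AAᵀ) is 4x4, same value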
Theorem 3.5
Weyl. For symmetric A, B with eigenvalues sorted in decreasing order, we have

max_j |λ_j(A) − λ_j(B)| ≤ ‖A − B‖_op
Theorem 3.6
Hoffman–Wielandt. With the same conventions, we have

∑_j |λ_j(A) − λ_j(B)|² ≤ ‖A − B‖_F²
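Both inequalities are easy to sanity-check numerically (my sketch, not from lecture):

import numpy as np

rng = np.random.default_rng(2)
S = rng.normal(size=(6, 6))
A = (S + S.T) / 2                              # symmetric
E = rng.normal(size=(6, 6))
B = A + 0.1 * (E + E.T) / 2                    # symmetric perturbation of A

lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues, decreasing order
lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]

op = np.linalg.norm(A - B, 2)                  # operator norm (top singular value)
fro = np.linalg.norm(A - B, 'fro')             # Frobenius norm

print(np.max(np.abs(lam_A - lam_B)) <= op)     # Weyl
print(np.sum((lam_A - lam_B) ** 2) <= fro**2)  # Hoffman–Wielandt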
Theorem 3.7
Hölder. We have, for 1/p + 1/q = 1,

⟨A, B⟩ = Tr(AᵀB) ≤ |A|_p · |B|_q
§3.6 Eckart–Young
Also known as best rank-k approximation.
Lemma 3.8
Let matrix A be of rank r. Look at the SVD A = ∑_{j=1}^r λ_j u_j v_jᵀ and assume the singular values are in decreasing order. For any k ≤ r, define the truncated SVD

A_k = ∑_{j=1}^k λ_j u_j v_jᵀ.

Then A_k is the best rank-k approximation to A: for any matrix B of rank at most k, ‖A − A_k‖ ≤ ‖A − B‖, in both the operator and Frobenius norms.
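Finally, a small numpy sketch of the truncated SVD and the Eckart–Young error formulas (mine, not from lecture):

import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is already decreasing
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # truncated SVD

print(np.linalg.matrix_rank(A_k) == k)
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))        # op-norm error = next singular value
print(np.isclose(np.linalg.norm(A - A_k, 'fro'),
                 np.sqrt(np.sum(s[k:] ** 2))))             # Frobenius error = tail sum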