
Lecture 10: Matrices Review

Isabella Zhu
11 March 2025

§1 Last Lecture Wrapup


We will wrap up the proof from lecture 9.

Theorem 1.1
Assume INC(k) with k equal to the sparsity of θ∗ (i.e. k = |θ∗|₀). Fix
$$2\tau = 8\sigma\sqrt{\frac{\log(2d)}{n}} + 8\sigma\sqrt{\frac{\log(1/\delta)}{n}}.$$

Then, the MSE of the lasso estimator is at most
$$\mathrm{MSE}(X\hat{\theta}^L) \le 32k\tau^2 \lesssim \frac{\sigma^2|\theta^*|_0}{n}\log(d/\delta).$$
Moreover,
$$|\hat{\theta} - \theta^*|_2^2 \le 2\,\mathrm{MSE}(X\hat{\theta}^L),$$
all happening with probability at least 1 − δ.

Proof. For the five hundred millionth time, we start with the good ole basic inequality

$$|X\hat{\theta} - X\theta^*|_2^2 \le 2\langle\epsilon, X\hat{\theta} - X\theta^*\rangle + 2n\tau|\theta^*|_1 - 2n\tau|\hat{\theta}|_1.$$

We bound
$$2\langle\epsilon, X\hat{\theta} - X\theta^*\rangle \le 2|X^T\epsilon|_\infty \cdot |\hat{\theta} - \theta^*|_1.$$
Next, we bound the largest column norm of X. We have
$$|X_j|_2^2 = (X^TX)_{jj} \le n + \frac{n}{32k} \le 2n$$
by the incoherence property. Therefore, with probability at least 1 − δ, we get
$$2\langle\epsilon, X\hat{\theta} - X\theta^*\rangle \le 2|X^T\epsilon|_\infty \cdot |\hat{\theta} - \theta^*|_1 \le 2 \cdot 2n \cdot \frac{\tau}{4} \cdot |\hat{\theta} - \theta^*|_1 = n\tau|\hat{\theta} - \theta^*|_1.$$
To summarize, we've proved so far that
$$|X\hat{\theta} - X\theta^*|_2^2 \le n\tau|\hat{\theta} - \theta^*|_1 + 2n\tau|\theta^*|_1 - 2n\tau|\hat{\theta}|_1.$$


We add nτ|θ̂ − θ∗|₁ to both sides:
$$|X\hat{\theta} - X\theta^*|_2^2 + n\tau|\hat{\theta} - \theta^*|_1 \le 2n\tau|\hat{\theta} - \theta^*|_1 + 2n\tau|\theta^*|_1 - 2n\tau|\hat{\theta}|_1.$$

Now we take the support S of θ∗ into account. Since θ∗ vanishes off S, we have
$$|\hat{\theta}|_1 = |\hat{\theta}_S|_1 + |\hat{\theta}_{S^c}|_1 \implies |\hat{\theta} - \theta^*|_1 - |\hat{\theta}|_1 = |\hat{\theta}_S - \theta^*|_1 - |\hat{\theta}_S|_1.$$

Putting it together, and using the triangle inequality |θ∗|₁ − |θ̂_S|₁ ≤ |θ̂_S − θ∗|₁,
$$|X\hat{\theta} - X\theta^*|_2^2 + n\tau|\hat{\theta} - \theta^*|_1 \le 2n\tau\left[|\hat{\theta}_S - \theta^*|_1 + |\theta^*|_1 - |\hat{\theta}_S|_1\right] \le 4n\tau|\hat{\theta}_S - \theta^*|_1.$$

Dropping the nonnegative term |Xθ̂ − Xθ∗|₂² on the left, we have
$$|\hat{\theta} - \theta^*|_1 \le 4|\hat{\theta}_S - \theta^*|_1 \iff |\hat{\theta}_{S^c} - \theta^*_{S^c}|_1 \le 3|\hat{\theta}_S - \theta^*_S|_1,$$
which is exactly the cone condition! Everything below this is kinda suspicious because
I was playing Squardle instead of paying attention. Since θ̂ − θ∗ satisfies the cone
condition, incoherence gives the lower bound
$$\frac{2|X(\hat{\theta} - \theta^*)|_2^2}{n} \ge |\hat{\theta} - \theta^*|_2^2.$$
By Cauchy–Schwarz,
$$|\hat{\theta}_S - \theta^*|_1 \le \sqrt{k}\,|\hat{\theta}_S - \theta^*|_2 \le \sqrt{k}\,|\hat{\theta} - \theta^*|_2 \le \sqrt{\frac{2k}{n}}\,|X\hat{\theta} - X\theta^*|_2.$$
Therefore, we get
$$|X\hat{\theta} - X\theta^*|_2^2 \le 4n\tau\sqrt{\frac{2k}{n}}\,|X\hat{\theta} - X\theta^*|_2,$$
from which we divide by |Xθ̂ − Xθ∗|₂ and square to get |Xθ̂ − Xθ∗|₂² ≤ 32nkτ², i.e.
MSE(Xθ̂^L) ≤ 32kτ², the desired result.
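As a numerical sanity check (my addition, not from the lecture), the sketch below simulates the bound using scikit-learn's Lasso. Its objective is (1/(2n))|y − Xθ|₂² + α|θ|₁, which matches the course's (1/n)|y − Xθ|₂² + 2τ|θ|₁ when α = τ; that correspondence, and the ±1 design standing in for an incoherent X, are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k, sigma, delta = 500, 1000, 5, 1.0, 0.1

# Random ±1 design: |X_j|_2^2 = n exactly, a stand-in for incoherence.
X = rng.choice([-1.0, 1.0], size=(n, d))
theta_star = np.zeros(d)
theta_star[:k] = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)

# τ from the theorem: 2τ = 8σ√(log(2d)/n) + 8σ√(log(1/δ)/n).
tau = 4 * sigma * (np.sqrt(np.log(2 * d) / n) + np.sqrt(np.log(1 / delta) / n))

# alpha = tau assumes the objective correspondence described above.
theta_hat = Lasso(alpha=tau, fit_intercept=False, max_iter=50_000).fit(X, y).coef_

mse = np.sum((X @ (theta_hat - theta_star)) ** 2) / n
print(f"MSE = {mse:.4f}  vs  bound 32kτ² = {32 * k * tau**2:.4f}")
```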

§2 Matrix Estimation
We will go over some linear algebra "basics" which we will need for later lectures.
Apparently this lecture will be "boring to death" (not my words).

§2.1 SubGaussian Sequence Model


Our subGaussian sequence model is of the form Y = θ∗ + ϵ ∈ ℝᵈ. We can make this a
matrix problem by just reshaping each vector into a matrix.

If θ∗ is sparse, then we can just use θ̂^HARD, so we aren't utilizing matrix properties.
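A minimal sketch of this (my own, not from the lecture): hard thresholding acts entrywise, so reshaping into a matrix changes nothing. The threshold level of order σ√(2 log d) is the usual one from earlier lectures; the exact constant here is an assumption.

```python
import numpy as np

def hard_threshold(y: np.ndarray, t: float) -> np.ndarray:
    """Keep entries with |y_ij| > t, zero out the rest."""
    return np.where(np.abs(y) > t, y, 0.0)

rng = np.random.default_rng(1)
d, sigma = 400, 0.5
theta_star = np.zeros(d)
theta_star[:10] = 3.0

# Sequence model Y = θ* + ε, reshaped into a 20×20 matrix.
Y = (theta_star + sigma * rng.standard_normal(d)).reshape(20, 20)

# Threshold of order σ√(2 log d); the constant 2 is an assumed choice.
theta_hat = hard_threshold(Y, 2 * sigma * np.sqrt(2 * np.log(d)))
```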

§2.2 An Aside: Netflix Prize 2006


Aka how Netflix got half the academic community to work for them for free. The problem
is the following: consider a matrix M with n users and m movies, such that M_ij is how
the ith person rated the jth movie.


Clearly, the matrix is very sparsely observed. In fact, only 1% of the entries were
filled. The goal was to fill in the rest of the matrix.

§2.2.1 A Simple Model


Consider the case where M_ij only has two effects: a user effect and a movie effect. So,
$$M_{ij} = u_i \cdot v_j + \text{noise}.$$
For this simple model, we reduce the number of parameters from nm to n + m. In matrix form,
$$M = uv^T + \text{noise}.$$

The rank of uvᵀ is 1. More generally, if the rank of M is r, we can write it as
$$M = \sum_{j=1}^{r} u^{(j)} v^{(j)T}.$$
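A tiny illustration (mine, not from the lecture) of the rank-1 factor model and its parameter count:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 1000, 200  # users, movies

# Rank-1 factor model: user effect u, movie effect v, plus noise.
u = rng.standard_normal(n)
v = rng.standard_normal(m)
M = np.outer(u, v) + 0.1 * rng.standard_normal((n, m))

# The full matrix has n*m entries, but the signal uses only n + m parameters.
print(f"entries: {n * m}, signal parameters: {n + m}")
print("signal rank:", np.linalg.matrix_rank(np.outer(u, v)))  # 1
```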

§3 Matrix Redux
§3.1 Eigenvalues and Eigenvectors
Let A ∈ ℝⁿˣⁿ be a square matrix. An eigenvalue λ and eigenvector u are defined by Au = λu.
Fact 3.1. If A is symmetric, then all eigenvalues are real.
In this class, we will assume that all eigenvectors have norm 1.
Fact 3.2. If u₁, . . . , uₙ are eigenvectors of a symmetric A, they can be chosen to form an
orthogonal basis for the column span of A. We will call this the eigenbasis.
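A quick check of these facts in numpy (my sketch, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2  # symmetrize, so the eigenvalues are real (Fact 3.1)

# eigh is for symmetric matrices: real eigenvalues, orthonormal eigenvectors.
lam, U = np.linalg.eigh(A)
assert np.allclose(U.T @ U, np.eye(5))         # unit-norm, orthogonal eigenbasis
assert np.allclose(A, U @ np.diag(lam) @ U.T)  # spectral decomposition
```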

§3.2 Singular Value Decomposition


Let A ∈ ℝᵐˣⁿ. The SVD of A is A written as
$$A = UDV^T, \qquad U \in \mathbb{R}^{m\times r},\ V \in \mathbb{R}^{n\times r},\ D \in \mathbb{R}^{r\times r},$$
where r is the rank of A, UᵀU = I_r, VᵀV = I_r, and D is diagonal with positive entries.

This implies that u₁, . . . , u_r span colspan(A) and v₁ᵀ, . . . , v_rᵀ span rowspan(A).

The vector form of this is
$$A = \sum_{j=1}^{r} \lambda_j u_j v_j^T.$$

Remark 3.3. We have AAᵀu_j = λ_j²u_j and AᵀAv_j = λ_j²v_j.

Consider the special case when A is positive semidefinite. The eigenvalues are nonnegative
and equal to the singular values, and U and V become the same matrix. In this case,
$$\|A\|_{op} = \max_{x \in B_2} |Ax|_2 = \lambda_{\max}(A).$$
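A numpy sketch of the SVD and Remark 3.3 (my addition, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 9))  # rank ≤ 4

# Reduced SVD: A = U @ diag(s) @ Vt, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Remark 3.3: AAᵀu_j = λ_j²u_j (and similarly AᵀAv_j = λ_j²v_j).
assert np.allclose(A @ A.T @ U, U * s**2)

# The operator norm is the largest singular value.
assert np.isclose(np.linalg.norm(A, ord=2), s[0])
```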

3
Isabella Zhu — 11 March 2025 Lecture 10: Matrices Review

§3.3 Vector Norms and Inner Products


Let A and B be matrices. The (entrywise) q-norm is defined as
$$|A|_q = \Bigg(\sum_{ij} |A_{ij}|^q\Bigg)^{1/q}.$$

Remark 3.4. Note that |A|_∞ = max_{ij} |A_ij| and |A|₀ is the number of nonzero entries. We
also have |A|₂ = √(Tr(AᵀA)) = √(Tr(AAᵀ)) = ‖A‖_F.

Then we can define the inner product
$$\langle A, B\rangle = \mathrm{Tr}(A^TB) = \mathrm{Tr}(AB^T).$$
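These identities are easy to check numerically (my sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4))

# Entrywise q-norm |A|_q: treat the matrix as one long vector.
def entrywise_norm(A: np.ndarray, q: float) -> float:
    return float((np.abs(A) ** q).sum() ** (1 / q))

assert np.isclose(entrywise_norm(A, 2), np.linalg.norm(A, "fro"))  # |A|_2 = ‖A‖_F

# Trace inner product ⟨A, B⟩ = Tr(AᵀB) = Tr(ABᵀ) = Σ A_ij B_ij.
assert np.isclose(np.trace(A.T @ B), (A * B).sum())
assert np.isclose(np.trace(A @ B.T), (A * B).sum())
```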

§3.4 Spectral Norms


Let A have singular values λ₁, . . . , λ_r, and consider the vector λ = (λ₁, . . . , λ_r). The
Schatten q-norm is defined as
$$\|A\|_q = |\lambda|_q.$$
When q = 2, we have
$$\|A\|_2^2 = |\lambda|_2^2 = \|A\|_F^2 = |A|_2^2,$$
which can be derived trivially by plugging the SVD into Tr(AᵀA).

When q = 1, we call this the nuclear/trace norm:
$$\|A\|_1 = |\lambda|_1 = \sum_j \lambda_j = \|A\|_*.$$
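A Schatten-norm sketch in numpy (mine, not from the lecture); numpy happens to expose the nuclear norm directly via ord="nuc":

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 7))

# Schatten q-norm: the vector q-norm of the singular values.
def schatten_norm(A: np.ndarray, q: float) -> float:
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** q).sum() ** (1 / q))

assert np.isclose(schatten_norm(A, 2), np.linalg.norm(A, "fro"))  # Frobenius
assert np.isclose(schatten_norm(A, 1), np.linalg.norm(A, "nuc"))  # nuclear/trace
```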

§3.5 Matrix Inequalities


Let A and B be positive semidefinite. Order their eigenvalues in decreasing order.

Theorem 3.5
Weyl. We have
$$\max_j |\lambda_j(A) - \lambda_j(B)| \le \|A - B\|_{op}.$$

Theorem 3.6
Hoffman–Wielandt. We have
$$\sum_j |\lambda_j(A) - \lambda_j(B)|^2 \le \|A - B\|_F^2.$$

Theorem 3.7
Hölder. We have, for 1/p + 1/q = 1,
$$\langle A, B\rangle \le \|A\|_p \|B\|_q.$$
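A quick numerical check of Weyl and Hoffman–Wielandt (my sketch; small tolerances added for floating point):

```python
import numpy as np

rng = np.random.default_rng(7)

def random_psd(n: int) -> np.ndarray:
    G = rng.standard_normal((n, n))
    return G @ G.T  # Gram matrix: symmetric positive semidefinite

A, B = random_psd(6), random_psd(6)
lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]  # decreasing order
lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]

op = np.linalg.norm(A - B, ord=2)    # operator norm
fro = np.linalg.norm(A - B, "fro")   # Frobenius norm

assert np.abs(lam_A - lam_B).max() <= op + 1e-10       # Weyl
assert np.sum((lam_A - lam_B) ** 2) <= fro**2 + 1e-10  # Hoffman–Wielandt
```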


§3.6 Eckart–Young
Also known as best rank-k approximation.

Lemma 3.8
Let matrix A be of rank r. Look at the SVD $A = \sum_{j=1}^r \lambda_j u_j v_j^T$ and assume the
singular values are in decreasing order. For any k ≤ r, define the truncated SVD
$$A_k = \sum_{j=1}^{k} \lambda_j u_j v_j^T.$$
This matrix has rank k. Then, we have
$$\|A - A_k\|_F^2 = \inf_{\mathrm{rank}(B)\le k} \|A - B\|_F^2 = \sum_{j=k+1}^{r} \lambda_j^2.$$
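A truncated-SVD sketch verifying the lemma numerically (my addition, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 5)) @ rng.standard_normal((5, 10))  # rank 5
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# Eckart–Young: squared Frobenius error = tail sum of squared singular values.
err = np.linalg.norm(A - A_k, "fro") ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```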
