
Mathematical Toolkit Autumn 2021

Lecture 8: October 21, 2021


Lecturer: Madhur Tulsiani

1 Applications of SVD: least squares approximation

We discuss another application of singular value decomposition (SVD) of matrices. Let
a_1, . . . , a_n ∈ R^d be points which we want to fit to a low-dimensional subspace. The goal
is to find a subspace S of R^d of dimension at most k to minimize ∑_{i=1}^n (dist(a_i, S))^2, where
dist(a_i, S) denotes the distance of a_i from the closest point in S. We first prove the following.

Claim 1.1 Let u_1, . . . , u_k be an orthonormal basis for S. Then

    (dist(a_i, S))^2 = ∥a_i∥_2^2 − ∑_{j=1}^k ⟨a_i, u_j⟩^2 .

Proof: Complete u_1, . . . , u_k to an orthonormal basis u_1, . . . , u_d of R^d by adding vectors
u_{k+1}, . . . , u_d. For any point v ∈ R^d, there exist c_1, . . . , c_d ∈ R such that v = ∑_{j=1}^d c_j · u_j.
To find the distance dist(v, S) = min_{u∈S} ∥v − u∥, we need to find the point u ∈ S which is closest to v.
Let u = ∑_{j=1}^k b_j · u_j be an arbitrary point in S (any u ∈ S can be written in this form, since
u_1, . . . , u_k form a basis for S). We have that
    ∥v − u∥^2 = ∥ ∑_{j=1}^k (c_j − b_j) · u_j + ∑_{j=k+1}^d c_j · u_j ∥^2 = ∑_{j=1}^k (c_j − b_j)^2 + ∑_{j=k+1}^d c_j^2 ,

which is minimized when b_j = c_j for all j ∈ [k]. Thus, the closest point u ∈ S to v =
∑_{j=1}^d c_j · u_j is given by u = ∑_{j=1}^k c_j · u_j, with v − u = ∑_{j=k+1}^d c_j · u_j. Moreover, since u_1, . . . , u_d
form an orthonormal basis, we have c_j = ⟨u_j, v⟩ for all j ∈ [d], which gives

    ∥v − u∥_2^2 = ∑_{j=k+1}^d c_j^2 = ∑_{j=1}^d c_j^2 − ∑_{j=1}^k c_j^2 = ∥v∥_2^2 − ∑_{j=1}^k ⟨u_j, v⟩^2 .

Using the above for each a_i (as the point v) completes the proof.
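As a quick numerical sanity check of Claim 1.1 (an illustrative sketch added here, not part of the original notes; all names and parameter choices below are arbitrary), one can compare the distance computed via orthogonal projection with the right-hand side of the claim, e.g. using numpy:

    # Sanity check of Claim 1.1: dist(a, S)^2 = ||a||^2 - sum_j <a, u_j>^2
    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 10, 3

    a = rng.standard_normal(d)
    # Orthonormal basis u_1, ..., u_k for a random k-dimensional subspace S of R^d.
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # U has orthonormal columns, shape (d, k)

    proj = U @ (U.T @ a)                               # closest point to a in S
    dist_sq_direct = np.sum((a - proj) ** 2)           # dist(a, S)^2 computed directly
    dist_sq_claim = np.sum(a ** 2) - np.sum((U.T @ a) ** 2)

    print(dist_sq_direct, dist_sq_claim)               # the two values agree up to rounding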

Thus, the goal is to find a set of k orthonormal vectors u_1, . . . , u_k to maximize the quantity
∑_{i=1}^n ∑_{j=1}^k ⟨a_i, u_j⟩^2. Let A ∈ R^{n×d} be the matrix with ith row equal to a_i^T. Then, we need
to find orthonormal vectors u_1, . . . , u_k to maximize ∥Au_1∥_2^2 + · · · + ∥Au_k∥_2^2. We will prove
the following.

Proposition 1.2 Let v_1, . . . , v_r be the right singular vectors of A corresponding to singular values
σ_1 ≥ · · · ≥ σ_r > 0. Then, for all k ≤ r and all orthonormal sets of vectors u_1, . . . , u_k,

    ∥Au_1∥_2^2 + · · · + ∥Au_k∥_2^2 ≤ ∥Av_1∥_2^2 + · · · + ∥Av_k∥_2^2 .

Thus, the optimal solution is to take S = Span(v_1, . . . , v_k). We prove the above by induction
on k. For k = 1, we note that

    ∥Au_1∥_2^2 = ⟨A^T A u_1, u_1⟩ ≤ max_{v ∈ R^d \ {0}} R_{A^T A}(v) = σ_1^2 = ∥Av_1∥_2^2 ,

where R_{A^T A} denotes the Rayleigh quotient of A^T A.

To prove the induction step for a given k ≤ r, define

    V_{k−1}^⊥ = { v ∈ R^d | ⟨v, v_i⟩ = 0 ∀i ∈ [k − 1] } .

We first prove the following claim.

Claim 1.3 Given an orthonormal set u_1, . . . , u_k, there exist orthonormal vectors u_1', . . . , u_k' such
that

- u_k' ∈ V_{k−1}^⊥ .

- Span(u_1, . . . , u_k) = Span(u_1', . . . , u_k') .

- ∥Au_1∥_2^2 + · · · + ∥Au_k∥_2^2 = ∥Au_1'∥_2^2 + · · · + ∥Au_k'∥_2^2 .

Proof: We only provide a sketch of the proof here. Let S = Span({u_1, . . . , u_k}). Note that
dim(V_{k−1}^⊥) = d − k + 1 (why?) and dim(S) = k. Thus,

    dim(V_{k−1}^⊥ ∩ S) ≥ k + (d − k + 1) − d = 1 .

Hence, there exists u_k' ∈ V_{k−1}^⊥ ∩ S with ∥u_k'∥ = 1. Completing this to an orthonormal basis
of S gives orthonormal u_1', . . . , u_k' with the first and second properties. We claim that this
already implies the third property (why?).

Thus, we can assume without loss of generality that the given vectors u_1, . . . , u_k are such
that u_k ∈ V_{k−1}^⊥. Hence,

    ∥Au_k∥_2^2 ≤ max_{v ∈ V_{k−1}^⊥, ∥v∥=1} ∥Av∥_2^2 = σ_k^2 = ∥Av_k∥_2^2 .

Also, by the inductive hypothesis, we have that

    ∥Au_1∥_2^2 + · · · + ∥Au_{k−1}∥_2^2 ≤ ∥Av_1∥_2^2 + · · · + ∥Av_{k−1}∥_2^2 ,

which completes the proof. The above proof can also be used to prove that the SVD gives the
best rank-k approximation to the matrix A in Frobenius norm. We will see this in the next
homework.
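The following is a minimal illustrative sketch of Proposition 1.2 (added here, not part of the original notes; the setup and parameters are arbitrary): using numpy's SVD, the top-k right singular vectors achieve at least as large a value of ∥Au_1∥_2^2 + · · · + ∥Au_k∥_2^2 as a random orthonormal set.

    # Proposition 1.2: the top-k right singular vectors maximize sum_j ||A u_j||^2.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 50, 8, 2

    A = rng.standard_normal((n, d))
    _, _, Vt = np.linalg.svd(A, full_matrices=False)   # rows of Vt are v_1, ..., v_r
    V_k = Vt[:k].T                                     # top-k right singular vectors as columns

    def objective(U):
        """sum_j ||A u_j||^2 for the orthonormal columns u_j of U."""
        return np.sum((A @ U) ** 2)

    # Compare against a random orthonormal set u_1, ..., u_k.
    U_rand, _ = np.linalg.qr(rng.standard_normal((d, k)))
    print(objective(V_k) >= objective(U_rand) - 1e-9)  # True: the singular vectors do at least as well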

2 Bounding the eigenvalues: Gershgorin Disc Theorem

We will now see a simple but extremely useful bound on the eigenvalues of a matrix, given
by the Gershgorin disc theorem. Many useful variants of this bound can also be derived
from the observation that for any invertible matrix S, the matrices S^{-1}MS and M have the
same eigenvalues (prove it!).
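As a quick numerical illustration of this observation (a sketch added here, not part of the notes), one can check with numpy that S^{-1}MS and M have the same spectrum for a random (hence almost surely invertible) S:

    # Similar matrices S^{-1} M S and M have the same eigenvalues.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 4
    M = rng.standard_normal((n, n))
    S = rng.standard_normal((n, n))             # a random S is invertible with probability 1

    eig_M = np.sort_complex(np.linalg.eigvals(M))
    eig_sim = np.sort_complex(np.linalg.eigvals(np.linalg.inv(S) @ M @ S))
    print(np.allclose(eig_M, eig_sim))          # True up to floating-point error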
Theorem 2.1 (Gershgorin Disc Theorem) Let M ∈ C^{n×n}. Let R_i = ∑_{j≠i} |M_{ij}|. Define the
set

    Disc(M_{ii}, R_i) := { z ∈ C | |z − M_{ii}| ≤ R_i } .

If λ is an eigenvalue of M, then

    λ ∈ ⋃_{i=1}^n Disc(M_{ii}, R_i) .

Proof: Let x ∈ C^n be an eigenvector corresponding to the eigenvalue λ. Let i_0 =
argmax_{i∈[n]} {|x_i|}, and note that x_{i_0} ≠ 0 since x ≠ 0. Since x is an eigenvector, we have

    Mx = λx  ⇒  ∀i ∈ [n]  ∑_{j=1}^n M_{ij} x_j = λ x_i .

In particular, we have that for i = i_0,

    ∑_{j=1}^n M_{i_0 j} x_j = λ x_{i_0}  ⇒  ∑_{j=1}^n M_{i_0 j} (x_j / x_{i_0}) = λ  ⇒  ∑_{j≠i_0} M_{i_0 j} (x_j / x_{i_0}) = λ − M_{i_0 i_0} .

Thus, since |x_j| ≤ |x_{i_0}| for all j by the choice of i_0, we have

    |λ − M_{i_0 i_0}| ≤ ∑_{j≠i_0} |M_{i_0 j}| · |x_j / x_{i_0}| ≤ ∑_{j≠i_0} |M_{i_0 j}| = R_{i_0} .
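The following is an illustrative numerical sketch of the theorem (added here, not part of the notes): for a random complex matrix, every eigenvalue lies in the union of the Gershgorin discs.

    # Gershgorin: every eigenvalue lies in some disc Disc(M_ii, R_i) with R_i = sum_{j != i} |M_ij|.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 5
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)   # R_i = sum_{j != i} |M_ij|

    for lam in np.linalg.eigvals(M):
        in_some_disc = np.any(np.abs(lam - centers) <= radii + 1e-9)
        print(lam, in_some_disc)                           # every eigenvalue reports True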

2.1 An application to compressed sensing

The Gershgorin disc theorem is quite useful in compressed sensing, to ensure what is
known as the “Restricted Isometry Property” for the measurement matrices.

Definition 2.2 A matrix A ∈ R^{k×n} is said to have the restricted isometry property with parameters (s, δ_s) if

    (1 − δ_s) · ∥x∥_2^2 ≤ ∥Ax∥_2^2 ≤ (1 + δ_s) · ∥x∥_2^2

for all x ∈ R^n which satisfy |{i | x_i ≠ 0}| ≤ s.

Thus, we want the transformation A to be approximately norm preserving for all sparse
vectors x. This can of course be ensured for all x by taking A = id, but we require k ≪ n
for the applications in compressed sensing. In general, the restricted isometry property
is NP-hard to verify and can thus also be hard to reason about for a given matrix. The
Gershgorin Disc Theorem lets us derive a much easier condition which is sufficient to
ensure the restricted isometry property.

Definition 2.3 Let A ∈ R^{k×n} be such that ∥A^{(i)}∥_2 = 1 for each column A^{(i)} of A. Define the
coherence of A as

    µ(A) = max_{i≠j} |⟨A^{(i)}, A^{(j)}⟩| .

We will prove the following.

Proposition 2.4 Let A ∈ R^{k×n} be such that ∥A^{(i)}∥_2 = 1 for each column A^{(i)} of A. Then, for any
s, the matrix A has the restricted isometry property with parameters (s, (s − 1)µ(A)).

Note that the bound becomes meaningless if s ≥ 1 + 1/µ(A). However, the above proposition
shows that it may be sufficient to bound µ(A) (which is also easier to check in practice).

Proof: Consider any x such that |{i | x_i ≠ 0}| ≤ s. Let S denote the support of x, i.e.,
S = {i | x_i ≠ 0}. Let A_S denote the k × |S| submatrix where we only keep the columns
corresponding to indices in S. Let x_S denote x restricted to the non-zero entries. Then

    ∥Ax∥_2^2 = ∥A_S x_S∥_2^2 = ⟨A_S^T A_S x_S, x_S⟩ .

Thus, it suffices to bound the eigenvalues of the matrix A_S^T A_S. Note that (A_S^T A_S)_{ij} = ⟨A^{(i)}, A^{(j)}⟩
for i, j ∈ S. Thus the diagonal entries are 1 and the off-diagonal entries are bounded by µ(A) in
absolute value. By the Gershgorin Disc Theorem, for any eigenvalue λ of A_S^T A_S, we have

    |λ − 1| ≤ (|S| − 1) · µ(A) ≤ (s − 1) · µ(A) .

Thus, since ∥x_S∥_2 = ∥x∥_2, we have

    (1 − (s − 1) · µ(A)) · ∥x∥_2^2 ≤ ∥Ax∥_2^2 ≤ (1 + (s − 1) · µ(A)) · ∥x∥_2^2 ,

as desired.
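As an illustrative sketch (added here, not part of the notes; all parameter choices are arbitrary), the coherence µ(A) and the resulting bound of Proposition 2.4 can be checked numerically for a random matrix with normalized columns and a random s-sparse x:

    # Coherence-based RIP bound: (1 - (s-1)mu)||x||^2 <= ||Ax||^2 <= (1 + (s-1)mu)||x||^2.
    import numpy as np

    rng = np.random.default_rng(3)
    k, n, s = 200, 400, 3

    A = rng.standard_normal((k, n))
    A /= np.linalg.norm(A, axis=0)                       # normalize columns: ||A^(i)||_2 = 1

    G = A.T @ A
    mu = np.max(np.abs(G - np.eye(n)))                   # mu(A) = max_{i != j} |<A^(i), A^(j)>|

    # Random s-sparse vector x.
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)

    lhs = (1 - (s - 1) * mu) * np.sum(x ** 2)
    rhs = (1 + (s - 1) * mu) * np.sum(x ** 2)
    # Prints True; the lower bound is only meaningful when (s-1)*mu < 1.
    print(lhs <= np.sum((A @ x) ** 2) <= rhs)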

The theorem is also very useful for bounding how much the eigenvalues of a matrix change
due to a perturbation. We will see an example of this in the homework.
