
Mathematical Toolkit Autumn 2021

Lecture 7: October 19, 2021


Lecturer: Madhur Tulsiani

1 Singular Value Decomposition (recap)

We proved the following result in the previous lecture.

Proposition 1.1 Let V, W be finite-dimensional inner product spaces and let φ : V → W be a
linear transformation. Let σ1² ≥ σ2² ≥ · · · ≥ σr² > 0 be the non-zero eigenvalues of φ∗φ, and let
v1, . . . , vr be corresponding orthonormal eigenvectors (since φ∗φ is self-adjoint, these are a subset
of some orthonormal eigenbasis). For w1, . . . , wr defined as wi = φ(vi)/σi, we have that

1. {w1, . . . , wr} form an orthonormal set. They are also eigenvectors for φφ∗ : W → W with
eigenvalues σ1², . . . , σr².

2. For all i ∈ [r],

   φ(vi) = σi · wi   and   φ∗(wi) = σi · vi .

The values σ1 , . . . , σr are known as the (non-zero) singular values of φ. For each i ∈ [r ], the
vector vi is known as the right singular vector and wi is known as the left singular vector
corresponding to the singular value σi .
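
As a sanity check, the construction in Proposition 1.1 can be carried out numerically. The following is a minimal sketch in Python/numpy (the random complex matrix A standing in for φ, and the tolerance 1e-10, are illustrative choices, not part of the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 4
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))  # stand-in for φ

    # Eigendecomposition of φ*φ (Hermitian, so eigh applies); sort eigenvalues descending.
    evals, V = np.linalg.eigh(A.conj().T @ A)
    order = np.argsort(evals)[::-1]
    evals, V = evals[order], V[:, order]

    r = np.sum(evals > 1e-10)            # number of non-zero eigenvalues
    sigma = np.sqrt(evals[:r])           # singular values σi
    W = A @ V[:, :r] / sigma             # wi = φ(vi)/σi, as columns of W

    # 1. {w1, ..., wr} is an orthonormal set, and each wi is an eigenvector
    #    of φφ* with eigenvalue σi².
    assert np.allclose(W.conj().T @ W, np.eye(r))
    assert np.allclose(A @ A.conj().T @ W, W * sigma**2)

    # 2. φ(vi) = σi·wi holds by construction; check φ*(wi) = σi·vi as well.
    assert np.allclose(A.conj().T @ W, V[:, :r] * sigma)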

Proposition 1.2 Let r be the number of non-zero eigenvalues of φ∗ φ. Then,

rank( φ) = dim(im( φ)) = r .

Using the above, we can write φ in a particularly convenient form. We first need the
following definition.

Definition 1.3 Let V, W be inner product spaces and let v ∈ V, w ∈ W be any two vectors. The
outer product of w with v, denoted as |w⟩ ⟨v|, is a linear transformation from V to W such that

|w⟩ ⟨v| (u) := ⟨v, u⟩ · w .

Note that if ∥v∥ = 1, then |w⟩ ⟨v| (v) = w and |w⟩ ⟨v| (u) = 0 for all u ⊥ v.

Exercise 1.4 Show that for any v ∈ V and w ∈ W, we have
rank (|w⟩ ⟨v|) = dim (im (|w⟩ ⟨v|)) = 1 .
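
A quick numerical illustration of the outer product, and of Exercise 1.4, is sketched below (using numpy; note that np.vdot conjugates its first argument, matching the convention that the inner product is conjugate-linear in the first argument):

    import numpy as np

    rng = np.random.default_rng(1)
    v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    w = rng.standard_normal(5) + 1j * rng.standard_normal(5)

    # |w⟩⟨v| as a matrix: w v*, where v* is the conjugate transpose (a row vector).
    outer = np.outer(w, v.conj())

    # Action on u: |w⟩⟨v|(u) = ⟨v, u⟩ · w.
    u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    assert np.allclose(outer @ u, np.vdot(v, u) * w)

    # Exercise 1.4: the outer product of non-zero vectors has rank 1.
    assert np.linalg.matrix_rank(outer) == 1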

We can then write φ : V → W in terms of outer products of its singular vectors.

Proposition 1.5 Let V, W be finite dimensional inner product spaces and let φ : V → W be a
linear transformation with non-zero singular values σ1 , . . . , σr , right singular vectors v1 , . . . , vr
and left singular vectors w1 , . . . , wr . Then,
φ = ∑_{i=1}^r σi · |wi⟩ ⟨vi| .

Exercise 1.6 If φ : V → V is a self-adjoint operator with dim(V ) = n, then the real spectral
theorem proves the existence of an orthonormal basis of eigenvectors, say {v1 , . . . , vn } with corre-
sponding eigenvalues λ1 , . . . , λn . Check that in this case, we can write φ as
φ = ∑_{i=1}^n λi · |vi⟩ ⟨vi| .

Note that while the above decomposition has possibly negative coefficients (the λi's), the singular
value decomposition only has positive coefficients (the σi's). Why is this the case?
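
One concrete way to see what is going on: for a self-adjoint operator, the singular values are the absolute values |λi|, with any negative sign absorbed into the left singular vector wi = −vi. A small numerical check (a sketch in numpy; the random symmetric matrix is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((4, 4))
    M = (B + B.T) / 2                    # a real symmetric (self-adjoint) matrix

    lam, V = np.linalg.eigh(M)           # eigenvalues λi may be negative

    # Spectral decomposition: M = Σ λi |vi⟩⟨vi|.
    assert np.allclose(M, sum(l * np.outer(v, v) for l, v in zip(lam, V.T)))

    # The singular values are |λi|: any negative sign is absorbed into wi = -vi.
    sv = np.linalg.svd(M, compute_uv=False)
    assert np.allclose(np.sort(sv), np.sort(np.abs(lam)))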

2 Singular Value Decomposition for matrices

Using the previous discussion, we can write matrices in a convenient form. Let A ∈ Cm×n,
which can be thought of as an operator from Cn to Cm. Let σ1, . . . , σr be the non-zero singu-
lar values and let v1, . . . , vr and w1, . . . , wr be the right and left singular vectors respectively.
With V = Cn and W = Cm, for any v ∈ V, w ∈ W we can write the operator |w⟩ ⟨v| as the
matrix wv∗, where v∗ denotes the conjugate transpose of v. This is because for any u ∈ V,
wv∗u = w(v∗u) = ⟨v, u⟩ · w. Thus, we can write

A = ∑_{i=1}^r σi · wi vi∗ .

Let W ∈ Cm×r be the matrix with w1, . . . , wr as columns, such that the ith column equals wi.
Similarly, let V ∈ Cn×r be the matrix with v1, . . . , vr as the columns. Let Σ ∈ Cr×r be a
diagonal matrix with Σii = σi. Then, check that the above expression for A can also be
written as

A = WΣV∗ ,

where V∗ denotes the conjugate transpose of V, as before.
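
In numpy this factorization can be checked directly; the following sketch uses np.linalg.svd with full_matrices=False (the "thin" SVD) and truncates to the non-zero singular values (the threshold 1e-10 is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

    # full_matrices=False gives the thin SVD; columns of W are left singular
    # vectors, and the rows of Vh are already the conjugated right singular
    # vectors, i.e., Vh = V*.
    W, s, Vh = np.linalg.svd(A, full_matrices=False)

    r = np.sum(s > 1e-10)                # keep only non-zero singular values
    W, s, Vh = W[:, :r], s[:r], Vh[:r, :]

    assert np.allclose(A, W @ np.diag(s) @ Vh)        # A = WΣV*
    assert np.allclose(A, sum(si * np.outer(wi, vi)   # A = Σ σi · wi vi*
                              for si, wi, vi in zip(s, W.T, Vh)))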
We can also complete the bases {v1, . . . , vr} and {w1, . . . , wr} to orthonormal bases for Cn and
Cm respectively and write the above in terms of unitary matrices.

Definition 2.1 A matrix U ∈ Cn×n is known as a unitary matrix if the columns of U form an
orthonormal basis for Cn .

Proposition 2.2 Let U ∈ Cn×n be a unitary matrix. Then UU ∗ = U ∗ U = id, where id denotes
the identity matrix.
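
A quick numerical check of Proposition 2.2 (a sketch; the QR factorization of a random complex matrix is just a convenient way to produce a unitary matrix):

    import numpy as np

    rng = np.random.default_rng(4)
    # QR factorization of a random complex matrix yields a unitary Q.
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))

    assert np.allclose(Q @ Q.conj().T, np.eye(4))   # QQ* = id
    assert np.allclose(Q.conj().T @ Q, np.eye(4))   # Q*Q = id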

Let {v1, . . . , vn} be a completion of {v1, . . . , vr} to an orthonormal basis of Cn, and let Vn ∈
Cn×n be a unitary matrix with {v1, . . . , vn} as columns. Similarly, let Wm ∈ Cm×m be a
unitary matrix with a completion of {w1, . . . , wr} as columns. Let Σ′ ∈ Cm×n be a matrix
with Σ′ii = σi if i ≤ r, and all other entries equal to zero. Then, we can also write

A = Wm Σ′ Vn∗ .
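
Numerically, this form corresponds to np.linalg.svd with full_matrices=True, which returns the completed unitary matrices; a minimal sketch (the dimensions 5 × 3 are an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

    # full_matrices=True completes the singular vectors to unitary Wm (m×m), Vn (n×n).
    Wm, s, Vnh = np.linalg.svd(A, full_matrices=True)

    Sigma = np.zeros((5, 3), dtype=complex)          # Σ' ∈ C^{m×n}, zero off the diagonal
    Sigma[:len(s), :len(s)] = np.diag(s)

    assert np.allclose(A, Wm @ Sigma @ Vnh)          # A = Wm Σ' Vn*
    assert np.allclose(Wm @ Wm.conj().T, np.eye(5))  # Wm is unitary
    assert np.allclose(Vnh.conj().T @ Vnh, np.eye(3))  # Vn is unitary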

3 Low-rank approximation for matrices

Given a matrix A ∈ Cm×n, we want to find a matrix B of rank at most k which “approxi-
mates” A. For now we will consider the notion of approximation in spectral norm, i.e., we
want to minimize ∥A − B∥2, where

∥A − B∥2 = max_{v ≠ 0} ∥(A − B)v∥2 / ∥v∥2 .

Here, ∥v∥2 = √⟨v, v⟩ denotes the norm defined by the standard inner product on Cn.
The 2 in the notation ∥·∥2 comes from the expression we get by expressing v in the
orthonormal basis of the coordinate vectors. If v = (c1, . . . , cn)ᵀ, then
∥v∥2 = (∑_{i=1}^n |ci|²)^{1/2}, which is simply the Euclidean norm we are familiar with.¹ Note
that while the norm here seems to be defined in terms of the coefficients, which indeed
depend on the choice of the orthonormal basis, the value of the norm is in fact √⟨v, v⟩,
which is just a function of the vector itself and not of the basis we are working with. The basis and
the coefficients merely provide a convenient way of computing the norm.
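
A small numerical illustration of this basis-independence (a sketch; the random unitary Q provides an arbitrary orthonormal basis):

    import numpy as np

    rng = np.random.default_rng(6)
    v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    # Coefficients of v in a random orthonormal basis (the columns of a unitary Q).
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
    c = Q.conj().T @ v                   # ci = ⟨qi, v⟩

    # The coefficients depend on the basis, but (Σ|ci|²)^{1/2} = √⟨v,v⟩ does not.
    assert np.allclose(np.sqrt(np.sum(np.abs(c)**2)), np.sqrt(np.vdot(v, v).real))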
SVD also gives the optimal solution for another notion of approximation: minimizing the
Frobenius norm ∥A − B∥F, which equals (∑_{i,j} |Aij − Bij|²)^{1/2}. We will see this later. Let
A = ∑_{i=1}^r σi · wi vi∗ be the singular value decomposition of A and let σ1 ≥ · · · ≥ σr > 0. If k ≥ r,
we can simply use B = A since rank(A) = r. If k < r, we claim that Ak = ∑_{i=1}^k σi · wi vi∗ is
the optimal solution. It is easy to check the following.
¹ In general, one can consider the norm ∥v∥p := (∑_{i=1}^n |ci|^p)^{1/p} for any p ≥ 1. While these are indeed valid
notions of distance satisfying a triangle inequality for any p ≥ 1, they do not arise as a square root of an inner
product when p ≠ 2.

Proposition 3.1 ∥ A − Ak ∥2 = σk+1 .

Proof: Complete v1, . . . , vk to an orthonormal basis v1, . . . , vn for Cn. Given any v ∈ Cn,
we can uniquely express it as ∑_{i=1}^n ci · vi for appropriate coefficients c1, . . . , cn. Thus, we
have

(A − Ak)v = (∑_{j=k+1}^r σj · wj vj∗) (∑_{i=1}^n ci · vi) = ∑_{j=k+1}^r ∑_{i=1}^n ci σj · ⟨vj, vi⟩ · wj = ∑_{j=k+1}^r cj σj · wj ,

where the last equality uses the orthonormality of {v1, . . . , vn}. We can also complete
w1, . . . , wr to an orthonormal basis w1, . . . , wm for Cm. Since (A − Ak)v is already expressed
in this basis above, we get that

∥(A − Ak)v∥2² = ⟨∑_{j=k+1}^r cj σj · wj, ∑_{j=k+1}^r cj σj · wj⟩ = ∑_{j=k+1}^r |cj|² · σj² .

Finally, as in the computation with Rayleigh quotients, we have that for any v ≠ 0 ex-
pressed as v = ∑_{i=1}^n ci · vi,

∥(A − Ak)v∥2² / ∥v∥2² = (∑_{j=k+1}^r |cj|² · σj²) / (∑_{i=1}^n |ci|²) ≤ (∑_{j=k+1}^r |cj|² · σk+1²) / (∑_{i=1}^n |ci|²) ≤ σk+1² .

This gives that ∥A − Ak∥2 ≤ σk+1. Check that it is in fact equal to σk+1 (why?).
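
The proposition is also easy to check numerically; a minimal sketch (k = 2 and the dimensions are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((6, 5))
    W, s, Vh = np.linalg.svd(A, full_matrices=False)

    k = 2
    Ak = W[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]   # truncated SVD: Ak = Σ_{i≤k} σi·wi vi*

    # Proposition 3.1: the spectral norm of the residual is exactly σ_{k+1}.
    assert np.allclose(np.linalg.norm(A - Ak, 2), s[k])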

In fact, the proof above shows the following:

Exercise 3.2 Let M ∈ Cm×n be any matrix with singular values σ1 ≥ · · · ≥ σr > 0. Then ∥M∥2 =
σ1, i.e., the spectral norm of a matrix is equal to its largest singular value.
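
A quick check of Exercise 3.2 (a sketch; np.linalg.norm(M, 2) computes the spectral norm, and the random test vectors illustrate the max-ratio definition, though sampling of course only probes the maximum from below):

    import numpy as np

    rng = np.random.default_rng(8)
    M = rng.standard_normal((6, 4))

    # The spectral norm equals the largest singular value σ1.
    assert np.allclose(np.linalg.norm(M, 2), np.linalg.svd(M, compute_uv=False)[0])

    # The max-ratio definition: sampled ratios never exceed the spectral norm.
    vs = rng.standard_normal((4, 1000))
    ratios = np.linalg.norm(M @ vs, axis=0) / np.linalg.norm(vs, axis=0)
    assert ratios.max() <= np.linalg.norm(M, 2) + 1e-12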

Thus, we know that the error of the best approximation B is at most σk+1 . To show the
lower bound, we need the following fact.

Exercise 3.3 Let V be a finite-dimensional vector space and let S1 , S2 be subspaces of V. Then,
S1 ∩ S2 is also a subspace and satisfies

dim(S1 ∩ S2 ) ≥ dim(S1 ) + dim(S2 ) − dim(V ) .
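
Numerically, dim(S1 ∩ S2) can be computed via the standard dimension formula dim(S1 ∩ S2) = dim(S1) + dim(S2) − dim(S1 + S2), where dim(S1 + S2) is the rank of the stacked bases and is at most dim(V); a sketch (the dimensions are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(9)
    n = 6
    B1 = rng.standard_normal((n, 4))     # basis (columns) for S1, dim 4
    B2 = rng.standard_normal((n, 3))     # basis (columns) for S2, dim 3

    d1, d2 = np.linalg.matrix_rank(B1), np.linalg.matrix_rank(B2)
    # dim(S1 + S2) = rank of the stacked bases, at most n; then
    # dim(S1 ∩ S2) = dim(S1) + dim(S2) − dim(S1 + S2) ≥ dim(S1) + dim(S2) − n.
    d_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))
    d_int = d1 + d2 - d_sum
    assert d_int >= d1 + d2 - n          # here 4 + 3 − 6 = 1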

We can now show the following.

Proposition 3.4 Let B ∈ Cm×n have rank( B) ≤ k and let k < r. Then ∥ A − B∥2 ≥ σk+1 .

Proof: By the rank-nullity theorem, dim(ker(B)) ≥ n − k. Thus, by the fact above,

dim(ker(B) ∩ Span(v1, . . . , vk+1)) ≥ (n − k) + (k + 1) − n ≥ 1 .

Thus, there exists a z ∈ ker(B) ∩ Span(v1, . . . , vk+1) \ {0}. Then,

∥(A − B)z∥2² = ∥Az∥2² = ⟨z, A∗Az⟩ = R_{A∗A}(z) · ∥z∥2²
             ≥ (min_{y ∈ Span(v1,...,vk+1)\{0}} R_{A∗A}(y)) · ∥z∥2²
             ≥ σk+1² · ∥z∥2² .

Thus, there exists a z ≠ 0 such that ∥(A − B)z∥2 ≥ σk+1 · ∥z∥2, which implies ∥A − B∥2 ≥
σk+1.
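
The bound can also be observed numerically; the following sketch compares A against random rank-k matrices (the choice of test matrices is illustrative, and of course only samples the space of rank-k B):

    import numpy as np

    rng = np.random.default_rng(10)
    A = rng.standard_normal((6, 5))
    s = np.linalg.svd(A, compute_uv=False)
    k = 2

    # Proposition 3.4: any rank-k B has spectral error at least σ_{k+1}.
    for _ in range(20):
        B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))  # a random rank-k matrix
        assert np.linalg.norm(A - B, 2) >= s[k] - 1e-12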
