8 - The Singular Value Decomposition: CMDA 3606, Mark Embree
Theorem 8.1 (Spectral Theorem) Suppose H ∈ C^{n×n} is symmetric. Then there exist n (not necessarily distinct) eigenvalues λ1, . . . , λn and corresponding unit-length eigenvectors v1, . . . , vn such that

Hvj = λj vj.

The eigenvectors form an orthonormal basis for C^n:

C^n = span{v1, . . . , vn}.

For example, when

H = [ 3 −1; −1 3 ],

we have λ1 = 4 and λ2 = 2, with

v1 = [ √2/2; −√2/2 ],  v2 = [ √2/2; √2/2 ].

Note that these eigenvectors are unit vectors, and they are orthogonal.
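A quick numerical check of this example (a minimal sketch, assuming MATLAB or Octave is available; eig does not guarantee an ordering, so the eigenvalues are sorted here):

H = [3 -1; -1 3];
[V, D] = eig(H);                           % columns of V are unit eigenvectors
[lambda, idx] = sort(diag(D), 'descend');  % order the eigenvalues as 4, 2
V = V(:, idx);
disp(lambda')                              % 4  2
disp(V'*V)                                 % the identity: the v_j are orthonormal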
Definition 8.1 A symmetric matrix H ∈ C^{n×n} is positive definite provided x∗Hx > 0 for all nonzero x ∈ C^n.

Since rank(A) = n, notice that

dim(N(A)) = n − rank(A) = 0.

(Keep this in mind: If rank(A) < n, then dim(N(A)) > 0, so there exist x ≠ 0 for which x∗A∗Ax = ‖Ax‖² = 0. Hence A∗A will only be positive semidefinite in this case.)

Since the null space of A is trivial, Ax ≠ 0 whenever x ≠ 0, so

x∗A∗Ax = ‖Ax‖² > 0

for all x ≠ 0: the symmetric matrix A∗A is positive definite, and so its eigenvalues are all positive.

Step 1. Compute the eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn > 0 and corresponding orthonormal eigenvectors v1, . . . , vn of A∗A ∈ C^{n×n}.
Step 2. Define σj = ‖Avj‖ = √λj , j = 1, . . . , n.

Note that σj² = ‖Avj‖₂² = vj∗A∗Avj = λj. Since the eigenvalues
λ1 , . . . , λn are decreasing in size, so too are the σj values:
σ1 ≥ σ2 ≥ · · · ≥ σn > 0.
Step 3. Define uj = Avj/σj for j = 1, . . . , n. Each uj is a unit vector by the definition of σj; these vectors are also orthogonal. To see this, compute

uj∗uk = (Avj)∗(Avk)/(σjσk) = vj∗A∗Avk/(σjσk) = vj∗(λk vk)/(σjσk) = (λk/(σjσk)) vj∗vk,

and since the eigenvectors are orthogonal (vj∗vk = 0 for j ≠ k), we conclude

uj∗uk = 0, j ≠ k.
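The construction above translates directly into a few lines of MATLAB. This sketch (assuming MATLAB or Octave; the 3-by-2 matrix is the example worked out later in this chapter) carries out Steps 1-3 and checks that the resulting vectors are orthonormal:

A = [1 1; 0 0; sqrt(2) -sqrt(2)];          % a full-column-rank example (m = 3, n = 2)
[V, D] = eig(A'*A);                        % Step 1: eigenpairs of A'*A
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);
sigma = sqrt(lambda);                      % Step 2: sigma_j = sqrt(lambda_j)
U_hat = A*V*diag(1./sigma);                % Step 3: u_j = A*v_j / sigma_j
disp(U_hat'*U_hat)                         % the identity: the u_j are orthonormal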
Collecting the eigenvectors into V = [ v1 · · · vn ] ∈ C^{n×n}, the left singular vectors into Û = [ u1 · · · un ] ∈ C^{m×n}, and the singular values into Σ̂ = diag(σ1, . . . , σn), the relations Avj = σj uj can be written compactly as

AV = Û Σ̂. (8.2)
We now have all the ingredients for various forms of the sin-
gular value decomposition. Since the eigenvectors vj of the symmet-
ric matrix A∗ A are orthonormal, the square matrix V has orthonor-
mal columns. This means that
V∗ V = I,
since the (j, k) entry of V∗V is simply vj∗vk. Since V is square, the equation V∗V = I implies that V∗ = V−1. (The inverse of a square matrix is unique: since V∗ does what the inverse of V is supposed to do, i.e., V∗V = I, it must be the unique matrix V−1.) Thus, in addition to V∗V = I, we also have

VV∗ = VV−1 = I.
Thus multiplying both sides of equation (8.2) on the right by V∗ gives

A = Û Σ̂ V∗. (8.3)
What can be said of the matrix Û ∈ C^{m×n}? Recall that its columns, the vectors u1, . . . , un, are orthonormal. However, in contrast to V, we cannot conclude that ÛÛ∗ = I when m > n. Why not? Because when m > n, Û is not square, and so cannot be invertible; indeed, there must exist some nonzero z ∈ C^m such that z ⊥ u1, . . . , un, which implies Û∗z = 0. Hence ÛÛ∗z = 0, so we cannot have ÛÛ∗ = I. (However, ÛÛ∗ ∈ C^{m×m} is a projector onto the n-dimensional subspace span{u1, . . . , un} of C^m.)
We wish to augment the matrix U b with m − n additional column
vectors, to give a full set of m orthonormal vectors in C m . Here is the
recipe to find these extra vectors: For j = n + 1, . . . , m, pick
uj ⊥ span{u1, . . . , uj−1},  ‖uj‖ = 1.

Collecting all m vectors into U = [ u1 · · · um ] ∈ C^{m×m}, we then have

U∗U = I.
Finally, we arrive at the main result, the full singular value decomposition, for the case where rank(A) = n: such a matrix A ∈ C^{m×n} can be written as

A = UΣV∗,

where

U∗U = I ∈ C^{m×m},  V∗V = I ∈ C^{n×n},

and Σ ∈ C^{m×n} is zero everywhere except for entries on the main diagonal, where the (j, j) entry is σj, for j = 1, . . . , n and
σ1 ≥ σ2 ≥ · · · ≥ σn > 0.
In MATLAB, this factorization is computed with the command

[U,S,V] = svd(A).
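For instance (a minimal sketch, assuming MATLAB or Octave), svd(A) returns the full factorization, while svd(A,'econ') returns the economy-size version, which for m ≥ n coincides with the reduced factorization Û Σ̂ V∗ described above:

A = [1 1; 0 0; sqrt(2) -sqrt(2)];   % the 3-by-2 example used in this chapter
[U, S, V]    = svd(A);              % full SVD: U is 3-by-3, S is 3-by-2
[Uh, Sh, Vh] = svd(A, 'econ');      % economy SVD: Uh is 3-by-2, Sh is 2-by-2
disp(diag(Sh)')                     % singular values 2 and sqrt(2)
disp(norm(A - U*S*V'))              % ~0
disp(norm(A - Uh*Sh*Vh'))           % ~0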
Given the reduced singular value decomposition A = Û Σ̂ V∗, notice that you can write A = (Û Σ̂)V∗ as

[ σ1 u1  σ2 u2  · · ·  σn un ] [ v1∗ ; v2∗ ; · · · ; vn∗ ] = ∑_{j=1}^{n} σj uj vj∗,
This expression is called the dyadic form of the SVD. Because we have
ordered σ1 ≥ σ2 ≥ · · · ≥ σn , the leading terms in this sum dominate
the others. This fact plays a crucial role in applications where we
want to approximate a matrix with its leading low-rank part.
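A short numerical check of the dyadic form (a sketch assuming MATLAB or Octave; the random test matrix is an arbitrary choice):

A = randn(5, 3);                    % an arbitrary test matrix
[U, S, V] = svd(A, 'econ');
B = zeros(size(A));
for j = 1:size(A, 2)
    B = B + S(j,j)*U(:,j)*V(:,j)';  % accumulate the rank-1 terms sigma_j u_j v_j'
end
disp(norm(A - B))                   % ~0: the dyadic sum reproduces A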
For example, consider

A = [ 1 1; 0 0; √2 −√2 ] ∈ C^{3×2}.

Then A∗A = [ 3 −1; −1 3 ], the matrix from the spectral theorem example above, with λ1 = 4 and λ2 = 2, so σ1 = 2 and σ2 = √2. The left singular vectors are

u1 = Av1/σ1 = [ 0; 0; 1 ],  u2 = Av2/σ2 = [ 1; 0; 0 ],

and to complete an orthonormal basis for C^3 we may take

u3 = [ 0; 1; 0 ].

The full singular value decomposition is then

[ 1 1; 0 0; √2 −√2 ] = [ 0 1 0; 0 0 1; 1 0 0 ] [ 2 0; 0 √2; 0 0 ] [ √2/2 −√2/2; √2/2 √2/2 ],

and in dyadic form,

[ 1 1; 0 0; √2 −√2 ] = 2 [ 0; 0; 1 ][ √2/2 −√2/2 ] + √2 [ 1; 0; 0 ][ √2/2 √2/2 ]
                     = [ 0 0; 0 0; √2 −√2 ] + [ 1 1; 0 0; 0 0 ].
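The hand computation above can be verified numerically; this sketch (MATLAB or Octave assumed) types in the factors and checks both the factorization and the dyadic form:

A  = [1 1; 0 0; sqrt(2) -sqrt(2)];
U  = [0 1 0; 0 0 1; 1 0 0];
S  = [2 0; 0 sqrt(2); 0 0];
Vt = [sqrt(2)/2 -sqrt(2)/2; sqrt(2)/2 sqrt(2)/2];
disp(norm(A - U*S*Vt))                                        % ~0
disp(svd(A)')                                                 % 2 and sqrt(2)
disp(norm(A - (2*U(:,1)*Vt(1,:) + sqrt(2)*U(:,2)*Vt(2,:))))   % dyadic form, ~0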
Now suppose instead that rank(A) = r < n. Then

dim(N(A)) = n − rank(A) = n − r,

and the eigenvalues of A∗A satisfy

λ1 ≥ λ2 ≥ · · · ≥ λr > 0,  λr+1 = · · · = λn = 0,

so the singular values obey
σ1 ≥ σ2 ≥ · · · ≥ σr > 0, σr+1 = · · · = σn = 0.
The third step of the SVD construction needs alteration, since we can
only define the left singular vectors via u j = Avj /σj when σj > 0, that
is, for j = 1, . . . , r. Any choice for the remaining vectors, ur+1 , . . . , un ,
will trivially satisfy the equation Avj = σj u j , since Avj = 0 and
σj = 0 for j = r + 1, . . . , n. Since we are building Û ∈ C^{m×n} (and eventually U ∈ C^{m×m}) to have orthonormal columns, we will simply build out ur+1, . . . , un so that all the vectors u1, . . . , un are orthonormal.
Σ̂ = diag(σ1, . . . , σr, σr+1, . . . , σn) = diag(σ1, . . . , σr, 0, . . . , 0) ∈ C^{n×n},
and
V = [ v1 · · · vr  vr+1 · · · vn ] ∈ C^{n×n}.
Notice that V is still a square matrix with orthonormal columns, so
V∗ V = I and V−1 = V∗ . Since Avj = σj u j holds for j = 1, . . . , n, we
again have the reduced singular value decomposition
A = Û Σ̂ V∗.
As before, Û ∈ C^{m×n} can be enlarged to give U ∈ C^{m×m} by supplying extra orthogonal unit vectors that complete a basis for C^m:

uj ⊥ span{u1, . . . , uj−1},  ‖uj‖ = 1,  j = n + 1, . . . , m.
Constructing U ∈ C m×m as in (8.4) and Σ ∈ C m×n as in (8.5), we
have the full singular value decomposition
A = UΣV∗ .
The dyadic form again expresses A as a sum of n rank-1 terms, but we get more insight if we crop the trivial terms from this sum. Since σr+1 = · · · = σn = 0, we can truncate the decomposition to its first r terms:

A = ∑_{j=1}^{r} σj uj vj∗.
AA∗ u j = λ j u j , j = 1, . . . , n,
AA∗ u j = 0u j , j = n + 1, . . . , m.
Thus the columns u1, . . . , um of U are all eigenvectors of AA∗. Notice then that AA∗ and A∗A have the same eigenvalues, except that AA∗ has m − n extra zero eigenvalues.

This suggests a different way to compute the U matrix: form AA∗ and compute all its eigenvectors, giving u1, . . . , um all at once. Thus we avoid the need for a special procedure to construct unit vectors orthogonal to u1, . . . , ur.
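A sketch of this suggestion (MATLAB or Octave assumed); note that eig may return the eigenvectors in a different order and with different signs than the SVD, so only the eigenvalues are compared here:

A = [1 1; 0 0; sqrt(2) -sqrt(2)];
[U, D] = eig(A*A');                        % A*A' is m-by-m and symmetric
[lambda, idx] = sort(diag(D), 'descend');
U = U(:, idx);                             % u_1, ..., u_m all at once (up to signs)
disp(lambda')                              % 4, 2, 0: the sigma_j^2 padded with zeros
disp(svd(A)'.^2)                           % 4, 2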
8.7 Modification for the case of m < n
Step 1. Compute the eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λm and corresponding orthonormal eigenvectors u1, u2, . . . , um of the symmetric matrix AA∗ ∈ C^{m×m}.

Step 2. Define σj = ‖A∗uj‖ = √λj , j = 1, . . . , m.
Step 3a. Define vj = A∗ u j /σj for j = 1, . . . , r.
Step 3b. Construct orthonormal vectors vr+1 , . . . , vm .
Notice that these vectors only arise in the rank-deficient case, when r < m. (Steps 3a and 3b construct a matrix V̂ ∈ C^{n×m} with orthonormal columns.)
Setting

Σ̂ = diag(σ1, . . . , σm) ∈ C^{m×m},

we obtain the reduced singular value decomposition

A = U Σ̂ V̂∗.
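A sketch of this m < n construction for a small wide matrix (MATLAB or Octave assumed; the example below has full row rank, so r = m and Step 3b is not needed):

A = [1 0 2; 0 3 0];                        % m = 2, n = 3, rank 2
[U, D] = eig(A*A');                        % Step 1: eigenpairs of A*A'
[lambda, idx] = sort(diag(D), 'descend');
U = U(:, idx);
sigma = sqrt(lambda);                      % Step 2
V_hat = (A'*U)*diag(1./sigma);             % Step 3a: v_j = A'*u_j / sigma_j
disp(norm(A - U*diag(sigma)*V_hat'))       % ~0: A = U * Sigma_hat * V_hat'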
In every case, then, we arrive at the full singular value decomposition

A = UΣV∗,

where

U∗U = I ∈ C^{m×m},  V∗V = I ∈ C^{n×n},

and Σ ∈ C^{m×n} is zero everywhere except for entries on the main diagonal, where the (j, j) entry is σj, for j = 1, . . . , min{m, n} and

σ1 ≥ σ2 ≥ · · · ≥ σmin{m,n} ≥ 0.

As before, we can write

A = ∑_{j=1}^{r} σj uj vj∗. (8.8)
R(A) = {Ax : x ∈ C n } ⊆ C m ,
N(A) = {x ∈ C n : Ax = 0} ⊆ C n ,
N(A∗ ) = {y ∈ C m : A∗ y = 0} ⊆ C m .
Given any x ∈ C^n, the dyadic form (8.8) gives

Ax = ∑_{j=1}^{r} σj uj vj∗ x = ∑_{j=1}^{r} σj (vj∗x) uj, (8.9)

where in the last step we have switched the order of the scalar vj∗x and the vector uj. We see that Ax is a weighted sum of the vectors u1, . . . , ur. Since this must hold for all x ∈ C^n, we conclude that

R(A) ⊆ span{u1, . . . , ur}.
R(A∗ ) = span{v1 , . . . , vr }.
Equation (8.9) for Ax is also the key that unlocks the null space
N(A). For what x ∈ C n does Ax = 0? Let us consider
‖Ax‖² = (Ax)∗(Ax) = ( ∑_{j=1}^{r} σj (vj∗x) uj )∗ ( ∑_{k=1}^{r} σk (vk∗x) uk ).

Since the left singular vectors are orthogonal, uj∗uk = 0 for j ≠ k, this double sum collapses: only the terms with j = k make a nontrivial contribution:

‖Ax‖² = ∑_{j=1}^{r} σj (x∗vj) σj (vj∗x) uj∗uj = ∑_{j=1}^{r} σj² |vj∗x|², (8.11)

since uj∗uj = 1 and (x∗vj)(vj∗x) = |vj∗x|². (If z is complex, then z∗z = z̄z = |z|².)
Since σj > 0 for j = 1, . . . , r, equation (8.11) shows that Ax = 0 exactly when

vj∗x = 0, j = 1, . . . , r,

that is, when x is orthogonal to span{v1, . . . , vr}, i.e., x ∈ span{vr+1, . . . , vn}. Hence N(A) = span{vr+1, . . . , vn}. Applying the same argument to A∗ = ∑_{j=1}^{r} σj vj uj∗ gives

N(A∗) = span{ur+1, . . . , um}.
In summary,

R(A) = span{u1, . . . , ur} and N(A∗) = span{ur+1, . . . , um},
R(A∗) = span{v1, . . . , vr} and N(A) = span{vr+1, . . . , vn},

which implies

R(A) ⊥ N(A∗),  R(A∗) ⊥ N(A).
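These four characterizations can be read directly off the columns of U and V. A sketch (MATLAB or Octave assumed; the rank-1 test matrix and the tolerance used to estimate r are arbitrary choices):

A = [1 2; 2 4; 3 6];                 % a rank-1 example with m = 3, n = 2
[U, S, V] = svd(A);
tol = max(size(A))*eps(S(1,1));
r = sum(diag(S) > tol);              % numerical rank (here r = 1)
basis_RA  = U(:, 1:r);               % R(A)   = span{u_1, ..., u_r}
basis_NAs = U(:, r+1:end);           % N(A')  = span{u_{r+1}, ..., u_m}
basis_RAs = V(:, 1:r);               % R(A')  = span{v_1, ..., v_r}
basis_NA  = V(:, r+1:end);           % N(A)   = span{v_{r+1}, ..., v_n}
disp(norm(A*basis_NA))               % ~0: A annihilates N(A)
disp(norm(basis_RA'*basis_NAs))      % ~0: R(A) is orthogonal to N(A')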
the entries, then take the square root. This idea is useful, but we prefer a more subtle alternative that is of more universal utility throughout mathematics: we shall gauge the size of A ∈ C^{m×n} by the maximum amount it can stretch a vector x ∈ C^n. That is, we will measure ‖A‖ by the largest that ‖Ax‖ can be. Of course, we can inflate ‖Ax‖ as much as we like simply by making ‖x‖ larger, which we avoid by imposing a normalization: ‖x‖ = 1. We arrive at the definition

‖A‖ = max_{‖x‖=1} ‖Ax‖.
subject to 1 = ‖y‖² ≥ |y1|² + · · · + |yr|². Since σ1 ≥ · · · ≥ σr,

‖Σy‖² = σ1²|y1|² + · · · + σr²|yr|² ≤ σ1²(|y1|² + · · · + |yr|²) ≤ σ1²‖y‖² = σ1².

(Alternatively, you could compute ‖Σ‖ by maximizing f(y) = ‖Σy‖ subject to ‖y‖ = 1 using the Lagrange multiplier technique from vector calculus.)

Will any unit vector y attain this upper bound? That is, can we find such a vector so that ‖Σy‖ = σ1? Sure: just take y = [1, 0, · · · , 0]∗ to be the first column of the identity matrix. For this special vector, ‖Σy‖ = σ1. Since ‖Σy‖ can be no larger than σ1 for any y, and since ‖Σy‖ = σ1 for at least one choice of y, we conclude

‖Σ‖ = max_{‖y‖=1} ‖Σy‖ = σ1,

and since multiplying by the unitary matrices U and V∗ does not change 2-norms, ‖A‖ = ‖UΣV∗‖ = ‖Σ‖ = σ1: the norm of a matrix equals its largest singular value.
Ax = σ1(v1∗x)u1 + σ2(v2∗x)u2 = √2 x2 u1 + (√2/2) x1 u2.

[Figure with axes labeled x1 and (Ax)1.]
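Numerically, MATLAB's norm(A) computes this matrix 2-norm, and it matches the leading singular value; a small sketch (the random matrix is an arbitrary choice):

A = randn(4, 3);               % an arbitrary test matrix
[U, S, V] = svd(A);
disp(norm(A))                  % the matrix 2-norm
disp(S(1,1))                   % sigma_1: the same value
disp(norm(A*V(:,1)))           % the first right singular vector attains the maximum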
It is often useful to approximate A by keeping only the k largest of the rank-1 terms σj uj vj∗, that is, by the partial sum

∑_{j=1}^{k} σj uj vj∗

for some k < r. In many applications A contains measured data that is ideally of low rank, but noise in the measurements causes A to have much larger rank. If the noise is small relative to the "true" data in A, we expect A to have a number of very small singular values that we might wish to neglect as we work with A. We will see examples of this kind of behavior in the next chapter.
For square diagonalizable matrices, the eigenvalue decompositions
we wrote down in Chapter 6 also express A as the sum of rank-1
matrices,
A = ∑_{j=1}^{n} λj wj ŵj∗,

(Here we write wj and ŵj for the right and left eigenvectors, to distinguish them from the singular vectors.)
but there are three key distinctions that make the singular value
decomposition a better tool for developing low-rank approximations
to A.
1. The SVD holds for all matrices, while the eigenvalue decomposi-
tion only holds for square matrices.
2. The singular values are nonnegative and come ordered by size, σ1 ≥ σ2 ≥ · · · ≥ σr > 0, so it is easy to identify the dominant terms in the sum; the eigenvalues λj carry no such natural ordering.
3. The eigenvectors are not generally orthogonal, and this can skew the rank-1 matrices λj wj ŵj∗ away from giving good approximations to A. In particular, we can find that ‖wj ŵj∗‖ ≫ 1, whereas the matrices uj vj∗ from the SVD always satisfy ‖uj vj∗‖ = 1.

Define the rank-k truncation Ak = ∑_{j=1}^{k} σj uj vj∗. Then Ak is the best rank-k approximation to A, in the sense that

‖A − Ak‖ = min_{rank(X)≤k} ‖A − X‖ = σk+1.
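A sketch of this best-approximation property (MATLAB or Octave assumed; the random matrix and the choice k = 2 are arbitrary):

A = randn(8, 6);                            % an arbitrary test matrix
[U, S, V] = svd(A);
k  = 2;
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';         % the rank-k truncation A_k
s  = diag(S);
disp(norm(A - Ak))                          % approximation error
disp(s(k+1))                                % sigma_{k+1}: the same value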
It is easy to see that this Ak gives the approximation error σk+1, since

A − Ak = ∑_{j=1}^{r} σj uj vj∗ − ∑_{j=1}^{k} σj uj vj∗ = ∑_{j=k+1}^{r} σj uj vj∗,

and this last expression is an SVD for the error in the approximation A − Ak. As described in Section 8.10, the norm of a matrix equals its largest singular value, so

‖A − Ak‖ = ‖ ∑_{j=k+1}^{r} σj uj vj∗ ‖ = σk+1.

To complete the proof, one needs to show that no other rank-k matrix can come closer to A than Ak. This pretty argument is a bit too intricate for this course, but we include it in the margin for those that are interested.

Here is that argument. Let X ∈ C^{m×n} be any rank-k matrix. The Fundamental Theorem of Linear Algebra gives C^n = R(X∗) ⊕ N(X). Since rank(X∗) = rank(X) = k, notice that dim(N(X)) = n − k. From the singular value decomposition of A extract v1, . . . , vk+1, a basis for some k+1 dimensional subspace of C^n. Since N(X) ⊆ C^n has dimension n − k, it must be that the intersection

N(X) ∩ span{v1, . . . , vk+1}

has dimension at least one. (Otherwise, N(X) ⊕ span{v1, . . . , vk+1} would be an n+1 dimensional subspace of C^n: impossible!) Let z be some unit vector in that intersection: ‖z‖ = 1 and

z ∈ N(X) ∩ span{v1, . . . , vk+1}.

Expand z = γ1 v1 + · · · + γk+1 vk+1, so that ‖z‖ = 1 implies

1 = z∗z = ( ∑_{j=1}^{k+1} γj vj )∗ ( ∑_{j=1}^{k+1} γj vj ) = ∑_{j=1}^{k+1} |γj|².

Since z ∈ N(X), we have

‖A − X‖ ≥ ‖(A − X)z‖ = ‖Az‖,

and then

‖Az‖ = ‖ ∑_{j=1}^{k+1} σj uj vj∗ z ‖ = ‖ ∑_{j=1}^{k+1} σj γj uj ‖.

Since σk+1 ≤ σk ≤ · · · ≤ σ1 and the uj vectors are orthogonal,

‖ ∑_{j=1}^{k+1} σj γj uj ‖₂ ≥ σk+1 ‖ ∑_{j=1}^{k+1} γj uj ‖₂,

so

‖A − X‖₂ ≥ σk+1 ‖ ∑_{j=1}^{k+1} γj uj ‖₂ = σk+1.

(This proof is adapted from §3.2.3 of Demmel's text: James W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.)

8.11.1 Compressing images with low-rank approximations

Image compression provides the most visually appealing application of the low-rank matrix factorization ideas we have just described. An image can be represented as a matrix. For example, typical grayscale images consist of a rectangular array of pixels, m in the vertical direction, n in the horizontal direction. The color of each of those pixels is denoted by a single number, an integer between 0 (black) and 255 (white). (This gives 2⁸ = 256 different shades of gray for each pixel. Color images are represented by three such matrices: one for red, one for green, and one for blue. Thus each pixel in a typical color image takes (2⁸)³ = 2²⁴ = 16,777,216 shades.)

MATLAB has many built-in routines for processing images. The imread command reads in image files. For example, if you want to load the file snapshot.jpg into MATLAB, you would use the command

A = double(imread('snapshot.jpg'));

The double command converts the entries of the image into floating point numbers. (To conserve memory, MATLAB's default is to save the entries of an image as integers, but MATLAB's linear algebra routines like svd will only work with floating point matrices.) Finally, to visualize an image in MATLAB, use

imagesc(A)
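Putting these commands together gives a compression experiment; the sketch below assumes a grayscale image file named snapshot.jpg (the file name used above) is on the MATLAB path, and the target rank k = 50 is an arbitrary choice:

A = double(imread('snapshot.jpg'));       % read the image, convert to floating point
[U, S, V] = svd(A);
k  = 50;                                  % target rank for the compressed image
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';       % best rank-k approximation
subplot(1,2,1), imagesc(A),  colormap(gray), axis image, title('original')
subplot(1,2,2), imagesc(Ak), colormap(gray), axis image, title('rank 50')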
[Figure 8.1: A sample image (the original uncompressed image, rank = 480): the founders of numerical linear algebra at an early Gatlinburg Symposium. From left to right: Jim Wilkinson, Wallace Givens, George Forsythe, Alston Householder, Peter Henrici, and Friedrich Bauer.]
Student experiments
8.1. The two images in Figure 8.5 show characters gen-
erated in the MinionPro italic font, defined in the image files
minionamp.jpg and minion11.jpg, each of which leads to a 200 ×
200 matrix. Which image do you think will better lend itself to
low-rank approximation? Compute the singular values and trun-
cated SVD approximations for a variety of ranks k. Do your results
agree with your intuition?
8.13 Afterword