Example 11.7: The eigenvalues of MM^T for our running example must include 58 and 2, because those are the eigenvalues of M^T M, as we observed in Section 11.2.1. Since MM^T is a 4 × 4 matrix, it has two other eigenvalues, which must both be 0. The matrix of eigenvectors corresponding to 58, 2, 0, and 0 is shown in Fig. 11.4. ✷
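As a quick numerical check of Example 11.7 (a minimal sketch, assuming the running-example matrix M of Section 11.2 is the 4 × 2 matrix below), NumPy confirms that MM^T has eigenvalues 58, 2, 0, and 0:

    import numpy as np

    # Assumed running-example matrix from Section 11.2 (4 x 2, rank 2).
    M = np.array([[1, 2],
                  [2, 1],
                  [3, 4],
                  [4, 3]], dtype=float)

    # M M^T is symmetric, so eigvalsh applies; it returns eigenvalues
    # in ascending order.
    evals = np.linalg.eigvalsh(M @ M.T)
    print(np.round(evals[::-1], 6))   # [58.  2.  0.  0.] up to rounding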
! (d) Find the eigenvectors of MM^T, using your eigenvalues from part (c).
Let M be an m × n matrix, and let the rank of M be r. Recall that the rank of a matrix is the largest number of rows (or equivalently columns) we can choose for which no nonzero linear combination of the rows is the all-zero vector 0 (we say a set of such rows or columns is independent). Then we can find matrices U, Σ, and V as shown in Fig. 11.5 with the following properties:

1. U is an m × r column-orthonormal matrix; that is, each of its columns is a unit vector and the dot product of any two different columns is 0.

2. V is an n × r column-orthonormal matrix. Note that we always use V in its transposed form, so it is the rows of V^T that are orthonormal.

3. Σ is an r × r diagonal matrix; that is, all elements not on the main diagonal are 0. The elements of Σ are called the singular values of M.
Figure 11.5: The form of a singular-value decomposition: the m × n matrix M is expressed as the product of an m × r matrix U, an r × r diagonal matrix Σ, and the transpose of an n × r matrix V
         Matrix  Alien  Star Wars  Casablanca  Titanic
Joe        1       1       1           0          0
Jim        3       3       3           0          0
John       4       4       4           0          0
Jack       5       5       5           0          0
Jill       0       0       0           4          4
Jenny      0       0       0           5          5
Jane       0       0       0           2          2

Figure 11.6: Ratings of movies by users
$$
\begin{pmatrix}
1 & 1 & 1 & 0 & 0\\
3 & 3 & 3 & 0 & 0\\
4 & 4 & 4 & 0 & 0\\
5 & 5 & 5 & 0 & 0\\
0 & 0 & 0 & 4 & 4\\
0 & 0 & 0 & 5 & 5\\
0 & 0 & 0 & 2 & 2
\end{pmatrix}
=
\begin{pmatrix}
.14 & 0\\
.42 & 0\\
.56 & 0\\
.70 & 0\\
0 & .60\\
0 & .75\\
0 & .30
\end{pmatrix}
\begin{pmatrix}
12.4 & 0\\
0 & 9.5
\end{pmatrix}
\begin{pmatrix}
.58 & .58 & .58 & 0 & 0\\
0 & 0 & 0 & .71 & .71
\end{pmatrix}
$$

Figure 11.7: The SVD of the matrix M of Fig. 11.6; the factors on the right are, from left to right, U, Σ, and V^T
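The decomposition in Fig. 11.7 can be reproduced numerically. The following is a minimal sketch assuming NumPy, whose svd routine returns V^T directly, orders singular values from largest to smallest, and may negate matching columns of U and V:

    import numpy as np

    M = np.array([[1, 1, 1, 0, 0],
                  [3, 3, 3, 0, 0],
                  [4, 4, 4, 0, 0],
                  [5, 5, 5, 0, 0],
                  [0, 0, 0, 4, 4],
                  [0, 0, 0, 5, 5],
                  [0, 0, 0, 2, 2]], dtype=float)

    # Reduced SVD: U is 7 x 5, S holds the 5 singular values, Vt is 5 x 5.
    U, S, Vt = np.linalg.svd(M, full_matrices=False)

    # Only two singular values are nonzero because M has rank 2, so the
    # first two columns of U, entries of S, and rows of Vt suffice.
    print(np.round(S, 2))         # [12.37  9.49  0.    0.    0.  ]
    print(np.round(U[:, :2], 2))  # matches U in Fig. 11.7 up to sign
    print(np.round(Vt[:2], 2))    # matches V^T in Fig. 11.7 up to sign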
         Matrix  Alien  Star Wars  Casablanca  Titanic
Joe        1       1       1           0          0
Jim        3       3       3           0          0
John       4       4       4           0          0
Jack       5       5       5           0          0
Jill       0       2       0           4          4
Jenny      0       0       0           5          5
Jane       0       1       0           2          2

Figure 11.8: The new matrix M′, with ratings for Alien by two additional raters
Example 11.9: Figure 11.8 is almost the same as Fig. 11.6, but Jill and Jane rated Alien, although neither liked it very much. The rank of the matrix in Fig. 11.8 is 3; for example, the first, sixth, and seventh rows are independent, but you can check that no four rows are independent. Figure 11.9 shows the decomposition of the matrix from Fig. 11.8. We have used three columns for U, Σ, and V because they decompose a matrix of rank three. The columns of U and V still correspond to concepts.
The first is still “science fiction” and the second is “romance.” It is harder to explain the third column’s concept, but it doesn’t matter all that much, because its weight, as given by the third nonzero entry in Σ, is very low compared with the weights of the first two concepts. ✷

$$
\begin{pmatrix}
1 & 1 & 1 & 0 & 0\\
3 & 3 & 3 & 0 & 0\\
4 & 4 & 4 & 0 & 0\\
5 & 5 & 5 & 0 & 0\\
0 & 2 & 0 & 4 & 4\\
0 & 0 & 0 & 5 & 5\\
0 & 1 & 0 & 2 & 2
\end{pmatrix}
=
\begin{pmatrix}
.13 & .02 & -.01\\
.41 & .07 & -.03\\
.55 & .09 & -.04\\
.68 & .11 & -.05\\
.15 & -.59 & .65\\
.07 & -.73 & -.67\\
.07 & -.29 & .32
\end{pmatrix}
\begin{pmatrix}
12.4 & 0 & 0\\
0 & 9.5 & 0\\
0 & 0 & 1.3
\end{pmatrix}
\begin{pmatrix}
.56 & .59 & .56 & .09 & .09\\
.12 & -.02 & .12 & -.69 & -.69\\
.40 & -.80 & .40 & .09 & .09
\end{pmatrix}
$$

Figure 11.9: The SVD of the matrix M′ of Fig. 11.8; the factors on the right are, from left to right, U, Σ, and V^T
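As a numerical sanity check on the rank claim in Example 11.9 (a minimal sketch, assuming NumPy):

    import numpy as np

    # M' from Fig. 11.8.
    M_prime = np.array([[1, 1, 1, 0, 0],
                        [3, 3, 3, 0, 0],
                        [4, 4, 4, 0, 0],
                        [5, 5, 5, 0, 0],
                        [0, 2, 0, 4, 4],
                        [0, 0, 0, 5, 5],
                        [0, 1, 0, 2, 2]], dtype=float)

    print(np.linalg.matrix_rank(M_prime))             # 3

    # The first, sixth, and seventh rows are independent: by themselves
    # they already have rank 3.
    print(np.linalg.matrix_rank(M_prime[[0, 5, 6]]))  # 3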
Example 11.10: The decomposition of Example 11.9 has three singular values. Suppose we want to reduce the number of dimensions to two. Then we set the smallest of the singular values, which is 1.3, to zero. The effect on the expression in Fig. 11.9 is that the third column of U and the third row of V^T are multiplied only by 0’s when we perform the multiplication, so this row and this column may as well not be there. That is, the approximation to M′ obtained by using only the two largest singular values is that shown in Fig. 11.10.
$$
\begin{pmatrix}
.13 & .02\\
.41 & .07\\
.55 & .09\\
.68 & .11\\
.15 & -.59\\
.07 & -.73\\
.07 & -.29
\end{pmatrix}
\begin{pmatrix}
12.4 & 0\\
0 & 9.5
\end{pmatrix}
\begin{pmatrix}
.56 & .59 & .56 & .09 & .09\\
.12 & -.02 & .12 & -.69 & -.69
\end{pmatrix}
=
\begin{pmatrix}
0.93 & 0.95 & 0.93 & .014 & .014\\
2.93 & 2.99 & 2.93 & .000 & .000\\
3.92 & 4.01 & 3.92 & .026 & .026\\
4.84 & 4.96 & 4.84 & .040 & .040\\
0.37 & 1.21 & 0.37 & 4.04 & 4.04\\
0.35 & 0.65 & 0.35 & 4.87 & 4.87\\
0.16 & 0.57 & 0.16 & 1.98 & 1.98
\end{pmatrix}
$$

Figure 11.10: Dropping the lowest singular value from the decomposition of Fig. 11.9
The resulting matrix is quite close to the matrix M′ of Fig. 11.8. Ideally, the entire difference would be the result of setting the last singular value to 0. However, in this simple example, much of the difference is due to rounding error caused by the fact that the decomposition of M′ was only correct to two significant digits. ✷
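Expressed in code, the truncation of Example 11.10 keeps the two largest singular values and multiplies the shrunken factors back together. A minimal sketch, assuming NumPy (which works at full precision, so the result is free of the two-digit rounding error discussed above):

    import numpy as np

    M_prime = np.array([[1, 1, 1, 0, 0],
                        [3, 3, 3, 0, 0],
                        [4, 4, 4, 0, 0],
                        [5, 5, 5, 0, 0],
                        [0, 2, 0, 4, 4],
                        [0, 0, 0, 5, 5],
                        [0, 1, 0, 2, 2]], dtype=float)

    U, S, Vt = np.linalg.svd(M_prime, full_matrices=False)

    k = 2  # number of singular values (dimensions) to keep
    approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]

    print(np.round(approx, 2))
    # The Frobenius norm of the error equals the dropped singular value.
    print(np.round(np.linalg.norm(M_prime - approx), 2))  # about 1.3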
Suppose M is the product of three matrices, say M = PQR, and let m_ij, p_ij, q_ij, and r_ij be the elements in row i and column j of M, P, Q, and R, respectively. Then the rule for matrix multiplication tells us

$$m_{ij} = \sum_k \sum_\ell p_{ik}\, q_{k\ell}\, r_{\ell j}$$

Then

$$\|M\|^2 = \sum_i \sum_j (m_{ij})^2 = \sum_i \sum_j \Bigl(\sum_k \sum_\ell p_{ik}\, q_{k\ell}\, r_{\ell j}\Bigr)^2 \qquad (11.1)$$

When we square the sum on the right of Equation 11.1, we in effect multiply two copies of that sum, using different indices of summation in each copy, and sum every product of a term from the first copy and a term from the second. That is,

$$\|M\|^2 = \sum_i \sum_j \sum_k \sum_\ell \sum_n \sum_m p_{ik}\, q_{k\ell}\, r_{\ell j}\, p_{in}\, q_{nm}\, r_{mj} \qquad (11.2)$$
Now, let us examine the case where P, Q, and R are really the SVD of M. That is, P is a column-orthonormal matrix, Q is a diagonal matrix, and R is the transpose of a column-orthonormal matrix, so R is row-orthonormal: its rows are unit vectors and the dot product of any two different rows is 0. To begin, since Q is a diagonal matrix, q_kℓ and q_nm will be zero unless k = ℓ and n = m. We can thus drop the summations for ℓ and m in Equation 11.2 and set k = ℓ and n = m. That is, Equation 11.2 becomes

$$\|M\|^2 = \sum_i \sum_j \sum_k \sum_n p_{ik}\, q_{kk}\, r_{kj}\, p_{in}\, q_{nn}\, r_{nj} \qquad (11.3)$$
Next, reorder the summation so that i is the innermost sum. Equation 11.3 has only two factors, p_ik and p_in, that involve i; all other factors are constants as far as summation over i is concerned. Since P is column-orthonormal, we know that Σ_i p_ik p_in is 1 if k = n and 0 otherwise. That is, in Equation 11.3 we can set k = n, drop the factors p_ik and p_in, and eliminate the sums over i and n, yielding

$$\|M\|^2 = \sum_j \sum_k q_{kk}\, r_{kj}\, q_{kk}\, r_{kj} \qquad (11.4)$$
Since R is row-orthonormal, Σ_j r_kj r_kj is 1. Thus, we can eliminate the terms r_kj and the sum over j, leaving a very simple formula for the Frobenius norm:

$$\|M\|^2 = \sum_k (q_{kk})^2 \qquad (11.5)$$
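Equation 11.5 is easy to verify numerically; a minimal sketch, assuming NumPy and the matrix M′ of Fig. 11.8:

    import numpy as np

    M_prime = np.array([[1, 1, 1, 0, 0],
                        [3, 3, 3, 0, 0],
                        [4, 4, 4, 0, 0],
                        [5, 5, 5, 0, 0],
                        [0, 2, 0, 4, 4],
                        [0, 0, 0, 5, 5],
                        [0, 1, 0, 2, 2]], dtype=float)

    # Singular values only; U and V are not needed here.
    S = np.linalg.svd(M_prime, compute_uv=False)

    # Both print 248.0 (up to floating-point error), as Equation 11.5 promises.
    print(np.sum(M_prime ** 2))  # square of the Frobenius norm of M'
    print(np.sum(S ** 2))        # sum of the squared singular values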
Suppose, for example, that a new user Quincy has seen only one movie, The Matrix, and rated it 4. In “movie space” Quincy is represented by the vector q = [4, 0, 0, 0, 0], and multiplying by V maps him into concept space: qV = [2.32, 0]. To learn what this says about Quincy’s movie preferences, map the concept vector back into movie space by multiplying [2.32, 0] by V^T. This product is [1.35, 1.35, 1.35, 0, 0]. It suggests that Quincy would like Alien and Star Wars, but not Casablanca or Titanic.

Another sort of query we can perform in concept space is to find users similar to Quincy. We can use V to map all users into concept space. For example, Joe maps to [1.74, 0], and Jill maps to [0, 5.68]. Notice that in this simple example, all users are either 100% science-fiction fans or 100% romance fans, so each vector has a zero in one component; that is an artifact of the very regular nature of our example matrix M and is not the case in general. In reality, people are more complex, and they will have different, but nonzero, levels of interest in various concepts. In general, we can measure the similarity of users by their cosine distance in concept space.
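A minimal sketch of these mappings, assuming NumPy and the two-digit matrix V of Fig. 11.7 (Quincy’s movie-space vector [4, 0, 0, 0, 0] is taken from the example above):

    import numpy as np

    # V from Fig. 11.7, rounded to two digits; Fig. 11.7 shows its transpose.
    V = np.array([[.58, 0],
                  [.58, 0],
                  [.58, 0],
                  [0, .71],
                  [0, .71]])

    quincy = np.array([4, 0, 0, 0, 0], dtype=float)
    joe    = np.array([1, 1, 1, 0, 0], dtype=float)
    jill   = np.array([0, 0, 0, 4, 4], dtype=float)

    # Movie space -> concept space.
    print(quincy @ V)  # [2.32 0.  ]
    print(joe @ V)     # [1.74 0.  ]
    print(jill @ V)    # [0.   5.68]

    # Concept space -> movie space, to predict Quincy's ratings.
    print(quincy @ V @ V.T)  # roughly [1.35 1.35 1.35 0. 0.]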
Example 11.11: For the case introduced above, note that the concept vectors
for Quincy and Joe, which are [2.32, 0] and [1.74, 0], respectively, are not the
same, but they have exactly the same direction. That is, their cosine distance
is 0. On the other hand, the vectors for Quincy and Jill, which are [2.32, 0] and
[0, 5.68], respectively, have a dot product of 0, and therefore their angle is 90
degrees. That is, their cosine distance is 1, the maximum possible. ✷
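A minimal sketch of the cosine-distance computation in Example 11.11, assuming NumPy (cosine distance here is 1 minus the cosine of the angle between the vectors):

    import numpy as np

    def cosine_distance(x, y):
        # 1 minus the cosine of the angle between x and y.
        return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    quincy = np.array([2.32, 0.0])
    joe    = np.array([1.74, 0.0])
    jill   = np.array([0.0, 5.68])

    print(cosine_distance(quincy, joe))   # 0.0: same direction
    print(cosine_distance(quincy, jill))  # 1.0: orthogonal vectors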
To see how to compute the SVD of a matrix M, start with M = UΣV^T and take the transpose of both sides:

$$M^T = (U\Sigma V^T)^T = (V^T)^T \Sigma^T U^T = V\Sigma^T U^T$$

Since Σ is a diagonal matrix, transposing it has no effect, so M^T = VΣU^T. Then

$$M^T M = V\Sigma U^T U\Sigma V^T = V\Sigma^2 V^T$$

where the last step uses the fact that U is column-orthonormal, so U^T U = I. Multiplying both sides on the right by V, and using V^T V = I as well, we get

$$M^T M V = V\Sigma^2 V^T V$$
$$M^T M V = V\Sigma^2 \qquad (11.6)$$

Since Σ is a diagonal matrix, Σ² is also a diagonal matrix whose entry in the ith row and column is the square of the entry in the same position of Σ. Now, Equation 11.6 should be familiar. It says that V is the matrix of eigenvectors of M^T M and Σ² is the diagonal matrix whose entries are the corresponding eigenvalues.
Thus, the same algorithm that computes the eigenpairs for M^T M gives us the matrix V for the SVD of M itself. It also gives us the singular values for this SVD: just take the square roots of the eigenvalues of M^T M.

Only U remains to be computed, but it can be found in the same way we found V. Start with

$$M M^T = U\Sigma V^T (U\Sigma V^T)^T = U\Sigma V^T V\Sigma U^T = U\Sigma^2 U^T$$

Then, by an argument like the one above,

$$M M^T U = U\Sigma^2$$

so U is the matrix of eigenvectors of MM^T.
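The observations above amount to a recipe for computing an SVD from eigenpairs. The following is a minimal sketch assuming NumPy and the rank-2 matrix M of Fig. 11.6; rather than solving a second eigenproblem for MM^T, it recovers U directly from M = UΣV^T as U = MVΣ^{-1}, which is equivalent:

    import numpy as np

    M = np.array([[1, 1, 1, 0, 0],
                  [3, 3, 3, 0, 0],
                  [4, 4, 4, 0, 0],
                  [5, 5, 5, 0, 0],
                  [0, 0, 0, 4, 4],
                  [0, 0, 0, 5, 5],
                  [0, 0, 0, 2, 2]], dtype=float)

    # Eigenpairs of the symmetric matrix M^T M give V and Sigma^2.
    evals, evecs = np.linalg.eigh(M.T @ M)

    # Sort by descending eigenvalue and keep the r = 2 nonzero eigenpairs.
    order = np.argsort(evals)[::-1]
    r = 2
    sigma = np.sqrt(evals[order][:r])  # singular values = sqrt(eigenvalues)
    V = evecs[:, order][:, :r]

    # Recover U so that M = U Sigma V^T holds exactly.
    U = M @ V / sigma

    print(np.round(sigma, 2))                        # [12.37  9.49]
    print(np.allclose(M, U @ np.diag(sigma) @ V.T))  # True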
(d) Find the SVD for the original matrix M from parts (b) and (c). Note
that there are only two nonzero eigenvalues, so your matrix Σ should have
only two singular values, while U and V have only two columns.
(e) Set your smaller singular value to 0 and compute the one-dimensional
approximation to the matrix M from Fig. 11.11.
(f) How much of the energy of the original singular values is retained by the
one-dimensional approximation?
Exercise 11.3.2: Use the SVD from Fig. 11.7. Suppose Leslie assigns rating 3
to Alien and rating 4 to Titanic, giving us a representation of Leslie in “movie
space” of [0, 3, 0, 0, 4]. Find the representation of Leslie in concept space. What
does that representation predict about how well Leslie would like the other
movies appearing in our example data?
! Exercise 11.3.3: Demonstrate that the rank of the matrix in Fig. 11.8 is 3.
! Exercise 11.3.4: Section 11.3.5 showed how to guess the movies a person
would most like. How would you use a similar technique to guess the people
that would most like a given movie, if all you had were the ratings of that movie
by a few people?