Σ = UΛUT (7.37)
where the covariance matrix has been factorized into the orthogonal matrix U con-
taining its eigenvectors, and a diagonal matrix Λ containing its eigenvalues (sorted
in decreasing order). SVD generalizes the above factorization to any matrix. In
particular, for an n × d data matrix D with n points and d columns, SVD factorizes
D as follows
D = L∆RT (7.38)
where L is an orthogonal n × n matrix whose columns are the left singular vectors of D,
R is an orthogonal d × d matrix whose columns are the right singular vectors of D, and
∆ is an n × d diagonal matrix whose diagonal entries are the singular values of D, sorted
in decreasing order. If D has rank r, then only the first r singular values are positive,
that is,
δ1 ≥ δ2 ≥ · · · ≥ δr > 0
while the remaining singular values are zero.
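As a quick numerical illustration (not part of the text's examples), the factorization in (7.38) can be computed with NumPy's np.linalg.svd; the small random matrix below is an arbitrary stand-in for D, and the variable names are ours:

```python
import numpy as np

# A small stand-in data matrix with n = 5 points and d = 3 columns.
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 3))

# Full SVD: L is n x n, R is d x d, and the singular values are returned
# already sorted in decreasing order, as in delta_1 >= delta_2 >= ... > 0.
L, deltas, Rt = np.linalg.svd(D, full_matrices=True)   # Rt holds R^T
print(L.shape, deltas.shape, Rt.shape)                  # (5, 5) (3,) (3, 3)

# Rebuild the n x d "diagonal" matrix Delta and check that D = L Delta R^T.
Delta = np.zeros_like(D)
np.fill_diagonal(Delta, deltas)
assert np.allclose(D, L @ Delta @ Rt)
```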
One can discard those left and right singular vectors that correspond to zero singular
values, to obtain the reduced SVD as
D = Lr ∆r RTr (7.39)
where Lr is the n × r matrix of the left singular vectors, Rr is the d × r matrix of the
right singular vectors, and ∆r is the r × r diagonal matrix containing the positive
singular values. The reduced SVD leads directly to the spectral decomposition of
D, given as
D = Lr ∆r RTr
  = δ1 l1 rT1 + δ2 l2 rT2 + · · · + δr lr rTr
  = ∑_{i=1}^{r} δi li rTi
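The reduced SVD and the rank-one expansion above can be checked numerically; the following sketch, on an arbitrary small matrix, verifies both forms of the decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(5, 3))

# Thin SVD already drops the singular vectors beyond min(n, d).
L, deltas, Rt = np.linalg.svd(D, full_matrices=False)
r = int(np.sum(deltas > 1e-12))          # numerical rank of D

# Reduced SVD: keep only the r positive singular values and their vectors.
Lr, Delta_r, Rr = L[:, :r], np.diag(deltas[:r]), Rt[:r, :].T
assert np.allclose(D, Lr @ Delta_r @ Rr.T)

# Spectral decomposition: D as a sum of rank-one terms delta_i l_i r_i^T.
D_sum = sum(deltas[i] * np.outer(L[:, i], Rt[i, :]) for i in range(r))
assert np.allclose(D, D_sum)
```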
The spectral decomposition represents D as a sum of rank one matrices of the form
δi li rTi . By selecting the q largest singular values δ1 , δ2 , · · · , δq and the corresponding
left and right singular vectors, we obtain the best rank q approximation to the
original matrix D. That is, if Dq is the matrix defined as
Dq = ∑_{i=1}^{q} δi li rTi
then it can be shown that Dq is the rank q matrix that minimizes the expression
‖D − Dq ‖F
where ‖·‖F denotes the Frobenius norm.
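As an illustration of this best rank-q approximation (often called the Eckart–Young result), the following NumPy sketch builds Dq from the q leading singular values and vectors and evaluates the Frobenius error; the matrix and the choice q = 3 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(100, 10))
L, deltas, Rt = np.linalg.svd(D, full_matrices=False)

q = 3   # number of leading singular values/vectors to keep
Dq = (L[:, :q] * deltas[:q]) @ Rt[:q, :]          # rank-q approximation of D

# Frobenius error of the rank-q approximation; it equals the square root
# of the sum of the discarded squared singular values.
err = np.linalg.norm(D - Dq, ord='fro')
print(err, np.sqrt(np.sum(deltas[q:] ** 2)))      # the two values agree
```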
Viewed as a linear transformation, D maps any vector r ∈ Rd to a vector l ∈ Rn via
Dr = l
The set of all vectors l ∈ Rn such that Dr = l over all possible r ∈ Rd is called the
column space of D, and the set of all vectors r ∈ Rd such that DT l = r over all
l ∈ Rn is called the row space of D, which is equivalent to the column space of DT .
In other words, the column space of D is the set of all vectors that can be obtained
as linear combinations of the columns of D, and the row space of D is the set of all
vectors that can be obtained as linear combinations of the rows of D (or columns
of DT ). Also note that the set of all vectors r ∈ Rd , such that Dr = 0 is called the
null space of D, and finally, the set of all vectors l ∈ Rn , such that DT l = 0 is called
the left null space of D.
One of the main properties of SVD is that it gives a basis for each of the four
fundamental spaces associated with the matrix D. If D has rank r, it means that
it has only r independent columns, and also only r independent rows. Thus, the r
left singular vectors l1 , l2 , · · · , lr corresponding to the r non-zero singular values of
D in (7.38) represent a basis for the column space of D. The remaining n − r left
singular vectors lr+1 , · · · , ln represent a basis for the left null space of D. For the
row space, the r right singular vectors r1 , r2 , · · · , rr corresponding to the r non-zero
singular values represent a basis for the row space of D, and the remaining d − r
right singular vectors rr+1 , · · · , rd represent a basis for the null space of D.
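These bases can be read off directly from the full SVD; the sketch below (on an arbitrary rank-deficient matrix of our choosing) extracts a basis for each of the four fundamental subspaces and checks the two null-space conditions:

```python
import numpy as np

# A 5 x 4 matrix of rank 2, so that all four subspaces are non-trivial.
rng = np.random.default_rng(2)
D = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))

L, deltas, Rt = np.linalg.svd(D, full_matrices=True)
r = int(np.sum(deltas > 1e-10))          # rank of D

col_space  = L[:, :r]      # l_1, ..., l_r     : basis of the column space
left_null  = L[:, r:]      # l_{r+1}, ..., l_n : basis of the left null space
row_space  = Rt[:r, :].T   # r_1, ..., r_r     : basis of the row space
null_space = Rt[r:, :].T   # r_{r+1}, ..., r_d : basis of the null space

# D sends the null space to 0, and D^T sends the left null space to 0.
assert np.allclose(D @ null_space, 0)
assert np.allclose(D.T @ left_null, 0)
```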
Consider the reduced SVD expression in (7.39). Right multiplying both sides
of the equation by Rr and noting that RTr Rr = Ir , where Ir is the r × r identity
matrix, we have
DRr = Lr ∆r RTr Rr
DRr = Lr ∆r
D (r1 r2 · · · rr ) = (δ1 l1 δ2 l2 · · · δr lr )
that is, Dri = δi li for each i = 1, · · · , r.
In other words, SVD is a special factorization of the matrix D, such that any basis
vector ri for the row space is mapped to the corresponding basis vector li in the
column space, scaled by the singular value δi . As such, we can think of the SVD as
a mapping from an orthonormal basis (r1 , r2 , · · · , rr ) in Rd (the row space) to an
orthonormal basis (l1 , l2 , · · · , lr ) in Rn (the column space), with the corresponding
axes scaled according to the singular values δ1 , δ2 , · · · , δr .
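A small numerical check of this mapping, on an arbitrary matrix, confirms that each right singular vector ri is sent to δi li:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(6, 3))
L, deltas, Rt = np.linalg.svd(D, full_matrices=False)

# Each basis vector r_i of the row space is mapped by D to delta_i * l_i.
for i in range(len(deltas)):
    assert np.allclose(D @ Rt[i, :], deltas[i] * L[:, i])
```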
Consider next the matrix DT D. Using the SVD of D, we have
DT D = (L∆RT )T (L∆RT )
     = R∆T LT L∆RT
     = R(∆T ∆)RT
     = R∆2d RT (7.40)
where ∆2d is the d × d diagonal matrix defined as ∆2d (i, i) = δi2 , for i = 1, · · · , d.
Only r ≤ min(d, n) of these eigenvalues are positive, whereas the rest are all zeros.
Since the covariance matrix of the centered D is given as Σ = (1/n) DT D, and since it
can be decomposed as Σ = UΛUT via PCA (7.37), we have
DT D = nΣ
= nUΛUT
= U(nΛ)UT (7.41)
Equating (7.40) and (7.41), we conclude that the right singular vectors R are the
same as the eigenvectors of Σ. Furthermore, the corresponding singular values of D
are related to the eigenvalues of Σ by the expression
nλi = δi2 , or equivalently λi = δi2 /n, for i = 1, · · · , d (7.42)
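The relationship in (7.42) is easy to verify numerically; the sketch below, on an arbitrary centered matrix, compares the eigenvalues of Σ with the squared singular values of D divided by n:

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.normal(size=(50, 4))
D = D - D.mean(axis=0)                  # center D, so Sigma = (1/n) D^T D
n = D.shape[0]

Sigma = (D.T @ D) / n
lambdas, U = np.linalg.eigh(Sigma)      # eigh returns ascending eigenvalues
lambdas = lambdas[::-1]                 # reorder to decreasing, as in PCA

_, deltas, _ = np.linalg.svd(D, full_matrices=False)
assert np.allclose(lambdas, deltas ** 2 / n)    # lambda_i = delta_i^2 / n
```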
Let us now consider the matrix DDT . We have
DDT = (L∆RT )(L∆RT )T = L∆RT R∆T LT = L(∆∆T )LT = L∆2n LT
where ∆2n is the n × n diagonal matrix given as ∆2n (i, i) = δi2 , for i = 1, · · · , n.
Only r of these eigenvalues are positive, whereas the rest are all zeros. Thus, the
left singular vectors L are the eigenvectors of the n × n matrix DDT , and the
corresponding eigenvalues are given as δi2 .
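Similarly, one can confirm numerically that the leading eigenvalues of DDT are the squared singular values; the following sketch uses an arbitrary small matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
D = rng.normal(size=(6, 3))
L, deltas, _ = np.linalg.svd(D, full_matrices=False)

# Eigen-decompose the n x n matrix D D^T: its r largest eigenvalues are the
# squared singular values, and the matching eigenvectors coincide with the
# left singular vectors (up to sign).
evals, evecs = np.linalg.eigh(D @ D.T)
evals, evecs = evals[::-1], evecs[:, ::-1]       # sort in decreasing order
assert np.allclose(evals[:len(deltas)], deltas ** 2)
```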
Example 7.9: Let us consider the n × d centered Iris data matrix D from Exam-
ple 7.1, with n = 150 and d = 3. In Example 7.5 we computed the eigenvectors and
eigenvalues of the covariance matrix Σ.
Computing the SVD of D yields the following non-zero singular values and the
corresponding right singular vectors
We do not show the left singular vectors l1 , l2 , l3 since they lie in R150 . Using (7.42)
one can verify that λi = δi2 /n.
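The omitted numbers can be reproduced along the following lines; this is a sketch that assumes the data of Example 7.1 are the first three attributes of the standard Iris dataset as loaded by scikit-learn, which may differ slightly from the copy used in the book:

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: Example 7.1 uses the first three Iris attributes (n = 150, d = 3).
X = load_iris().data[:, :3]
D = X - X.mean(axis=0)                  # centered data matrix
n = D.shape[0]

_, deltas, Rt = np.linalg.svd(D, full_matrices=False)
print(deltas)            # the three non-zero singular values
print(Rt.T)              # right singular vectors r_1, r_2, r_3 as columns
print(deltas**2 / n)     # should match the PCA eigenvalues lambda_i of Sigma
```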
7.6 Exercises
Q1. Consider the data matrix D given below:
X1 X2
8 -20
0 -1
10 -19
10 -20
2 0
Q3. Compute the singular values and the left and right singular vectors of the fol-
lowing matrix
A = [ 1 1 0 ]
    [ 0 0 1 ]
Q4. Consider the data in Table 7.1. Define the kernel function as follows: K(xi , xj ) =
‖xi − xj ‖2 . Answer the following questions.
i xi
x1 (4,2.9)
x4 (2.5,1)
x7 (3.5,4)
x9 (2,2.1)
Q5. Given the two points x1 = (1, 2), and x2 = (2, 1), use the kernel function
K(xi , xj ) = (xTi xj )2