CHAPTER 7. DIMENSIONALITY REDUCTION

7.4 Singular Value Decomposition


Principal components analysis is a special case of a more general matrix decomposition method called Singular Value Decomposition (SVD). We saw above in (7.28) that PCA yields the following decomposition of the covariance matrix

Σ = UΛUT (7.37)

where the covariance matrix has been factorized into the orthogonal matrix U containing its eigenvectors, and a diagonal matrix Λ containing its eigenvalues (sorted in decreasing order). SVD generalizes the above factorization to any matrix. In particular, for an n × d data matrix D with n points and d columns, SVD factorizes D as follows

D = L∆RT (7.38)

where L is an orthogonal n × n matrix, R is an orthogonal d × d matrix, and ∆ is an n × d “diagonal” matrix. The columns of L are called the left singular vectors, and the columns of R (or rows of RT) are called the right singular vectors. The matrix ∆ is defined as
∆(i, j) = δi if i = j, and ∆(i, j) = 0 if i ≠ j

where i = 1, · · · , n and j = 1, · · · , d. The entries ∆(i, i) = δi along the main diagonal of ∆ are called the singular values of D, and they are all non-negative. If
the rank of D is r ≤ min(n, d), then there will be only r non-zero singular values,
which we assume are ordered as follows

δ1 ≥ δ2 ≥ · · · ≥ δr > 0
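As a concrete illustration of the factorization in (7.38), here is a minimal Python sketch using numpy; the matrix D and its size are arbitrary choices for the example, not data from the text.

import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 3))      # n = 6 points, d = 3 columns

# Full SVD: L is n x n and R is d x d orthogonal; numpy returns the min(n, d)
# singular values, sorted in decreasing order, and R^T rather than R.
L, delta, Rt = np.linalg.svd(D, full_matrices=True)

# Rebuild the n x d "diagonal" matrix Delta and check that D = L Delta R^T.
Delta = np.zeros(D.shape)
Delta[:len(delta), :len(delta)] = np.diag(delta)
print(np.allclose(D, L @ Delta @ Rt))                                   # True

# L and R are orthogonal, and the singular values are non-negative and sorted.
print(np.allclose(L.T @ L, np.eye(6)), np.allclose(Rt @ Rt.T, np.eye(3)))
print(np.all(delta >= 0), np.all(np.diff(delta) <= 0))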

One can discard those left and right singular vectors that correspond to zero singular
values, to obtain the reduced SVD as

D = Lr ∆r RTr (7.39)

where Lr is the n × r matrix of the left singular vectors, Rr is the d × r matrix of the right singular vectors, and ∆r is the r × r diagonal matrix containing the positive singular values. The reduced SVD leads directly to the spectral decomposition of D, given as

D = Lr ∆r RTr
  = [l1 l2 · · · lr] diag(δ1, δ2, · · · , δr) [r1 r2 · · · rr]T
  = δ1 l1 rT1 + δ2 l2 rT2 + · · · + δr lr rTr
  = ∑_{i=1}^{r} δi li rTi

The spectral decomposition represents D as a sum of rank-one matrices of the form δi li rTi. By selecting the q largest singular values δ1, δ2, · · · , δq and the corresponding left and right singular vectors, we obtain the best rank-q approximation to the original matrix D. That is, if Dq is the matrix defined as
Dq = ∑_{i=1}^{q} δi li rTi

then it can be shown that Dq is the rank-q matrix that minimizes the expression

‖D − Dq‖F

where ‖A‖F is called the Frobenius norm of the n × d matrix A, defined as

‖A‖F = √( ∑_{i=1}^{n} ∑_{j=1}^{d} A(i, j)² )
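The following Python sketch (numpy assumed; the matrix D, its size, and the choice q = 2 are illustrative assumptions, not values from the text) builds the rank-q approximation Dq from the top q singular triplets and checks its Frobenius error, which equals the square root of the sum of the discarded squared singular values.

import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((8, 5))                 # illustrative n x d data matrix

# Reduced SVD, singular values in decreasing order.
L, delta, Rt = np.linalg.svd(D, full_matrices=False)

# Best rank-q approximation: D_q = sum of the top q rank-one terms delta_i l_i r_i^T.
q = 2
Dq = (L[:, :q] * delta[:q]) @ Rt[:q, :]

# Frobenius error of the approximation, and the error predicted by the discarded terms.
err = np.linalg.norm(D - Dq, 'fro')
print(err, np.sqrt(np.sum(delta[q:] ** 2)))     # the two numbers agree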

7.4.1 Geometry of SVD


In general, any n × d matrix D represents a linear transformation, D : Rd → Rn ,
from the space of d-dimensional vectors to the space of n-dimensional vectors, since
for any r ∈ Rd there exists l ∈ Rn such that

Dr = l

The set of all vectors l ∈ Rn such that Dr = l over all possible r ∈ Rd , is called the
column space of D, and the set of all vectors r ∈ Rd , such that DT l = r over all
l ∈ Rn , is called the row space of D, which is equivalent to the column space of DT .
In other words, the column space of D is the set of all vectors that can be obtained as linear combinations of the columns of D, and the row space of D is the set of all vectors that can be obtained as linear combinations of the rows of D (or columns of DT). Also note that the set of all vectors r ∈ Rd such that Dr = 0 is called the null space of D, and finally, the set of all vectors l ∈ Rn such that DT l = 0 is called the left null space of D.
One of the main properties of SVD is that it gives a basis for each of the four fundamental spaces associated with the matrix D. If D has rank r, it means that it has only r independent columns, and also only r independent rows. Thus, the r left singular vectors l1, l2, · · · , lr corresponding to the r non-zero singular values of D in (7.38) represent a basis for the column space of D. The remaining n − r left singular vectors lr+1, · · · , ln represent a basis for the left null space of D. For the row space, the r right singular vectors r1, r2, · · · , rr corresponding to the r non-zero singular values represent a basis for the row space of D, and the remaining d − r right singular vectors rr+1, · · · , rd represent a basis for the null space of D.
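As an illustration of these four bases, here is a small Python sketch (numpy assumed; the rank-deficient example matrix and the tolerance 1e-10 are arbitrary choices for the example).

import numpy as np

# A 4 x 3 matrix of rank 2: the third column is the sum of the first two.
D = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])

# Full SVD: L is n x n, R is d x d.
L, delta, Rt = np.linalg.svd(D, full_matrices=True)
r = int(np.sum(delta > 1e-10))       # numerical rank, here r = 2

col_space  = L[:, :r]                # basis for the column space of D
left_null  = L[:, r:]                # basis for the left null space of D
row_space  = Rt[:r, :].T             # basis for the row space of D
null_space = Rt[r:, :].T             # basis for the null space of D

print(np.allclose(D @ null_space, 0))      # True: D r = 0 for null-space vectors
print(np.allclose(D.T @ left_null, 0))     # True: D^T l = 0 for left-null-space vectors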
Consider the reduced SVD expression in (7.39). Right multiplying both sides
of the equation by Rr and noting that RTr Rr = Ir , where Ir is the r × r identity
matrix, we have

DRr = Lr ∆r RTr Rr
DRr = Lr ∆r
DRr = Lr diag(δ1, δ2, · · · , δr)
D [r1 r2 · · · rr] = [δ1 l1 δ2 l2 · · · δr lr]

From the above, we conclude that

Dri = δi li for all i = 1, · · · , r

In other words, SVD is a special factorization of the matrix D, such that any basis
vector ri for the row space is mapped to the corresponding basis vector li in the
column space, scaled by the singular value δi . As such, we can think of the SVD as
a mapping from an orthonormal basis (r1 , r2 , · · · , rr ) in Rd (the row space) to an
orthonormal basis (l1 , l2 , · · · , lr ) in Rn (the column space), with the corresponding
axes scaled according to the singular values δ1 , δ2 , · · · , δr .
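A quick numerical check of this mapping, again as a hedged Python sketch with an arbitrary example matrix:

import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((6, 4))

# Reduced SVD; the i-th row of Rt is the right singular vector r_i.
L, delta, Rt = np.linalg.svd(D, full_matrices=False)

# SVD maps each basis vector r_i of the row space to delta_i * l_i in the column space.
for i in range(len(delta)):
    assert np.allclose(D @ Rt[i, :], delta[i] * L[:, i])
print("D r_i = delta_i l_i holds for i = 1, ..., r")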

7.4.2 Connection between SVD and PCA


Assume that the matrix D has been centered, and assume that it has been factorized
via SVD (7.38) as D = L∆RT . Consider the scatter matrix for D, given as DT D.
We have

DT D = (L∆RT)T (L∆RT)
     = R∆T LT L∆RT
     = R(∆T ∆)RT
     = R∆d² RT (7.40)

where ∆d² is the d × d diagonal matrix defined as ∆d²(i, i) = δi², for i = 1, · · · , d. Only r ≤ min(d, n) of these eigenvalues are positive, whereas the rest are all zeros.
Since the covariance matrix of centered D is given as Σ = (1/n) DT D, and since it can be decomposed as Σ = UΛUT via PCA (7.37), we have

DT D = nΣ
= nUΛUT
= U(nΛ)UT (7.41)

Equating (7.40) and (7.41), we conclude that the right singular vectors R are the
same as the eigenvectors of Σ. Furthermore, the corresponding singular values of D
are related to the eigenvalues of Σ by the expression

nλi = δi²

or, λi = δi²/n, for i = 1, · · · , d (7.42)
Let us now consider the matrix DDT . We have

DDT = (L∆RT)(L∆RT)T
    = L∆RT R∆T LT
    = L(∆∆T)LT
    = L∆n² LT

where ∆n² is the n × n diagonal matrix given as ∆n²(i, i) = δi², for i = 1, · · · , n. Only r of these values are positive, whereas the rest are all zeros. Thus, the left singular vectors L are the eigenvectors of the n × n matrix DDT, and the corresponding eigenvalues are given as δi².
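The two eigen-relations above are easy to verify numerically; the Python sketch below (numpy assumed, with random centered data standing in for D) checks that the eigenvalues of DT D and the non-zero eigenvalues of DDT are both the squared singular values δi², and that dividing by n recovers the PCA eigenvalues of (7.42).

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 4))
D = X - X.mean(axis=0)               # centered data matrix, n = 10, d = 4
n = D.shape[0]

L, delta, Rt = np.linalg.svd(D, full_matrices=False)

# Eigenvalues of the d x d matrix D^T D are the squared singular values delta_i^2.
w_dtd = np.linalg.eigvalsh(D.T @ D)[::-1]              # decreasing order
print(np.allclose(w_dtd, delta ** 2))

# The non-zero eigenvalues of the n x n matrix D D^T are the same delta_i^2.
w_ddt = np.linalg.eigvalsh(D @ D.T)[::-1]
print(np.allclose(w_ddt[:len(delta)], delta ** 2))

# PCA connection (7.42): eigenvalues of Sigma = (1/n) D^T D are lambda_i = delta_i^2 / n.
lam = np.linalg.eigvalsh((D.T @ D) / n)[::-1]
print(np.allclose(lam, delta ** 2 / n))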

Example 7.9: Let us consider the n × d centered Iris data matrix D from Example 7.1, with n = 150 and d = 3. In Example 7.5 we computed the eigenvectors and eigenvalues of the covariance matrix Σ as follows

λ1 = 3.662    λ2 = 0.239    λ3 = 0.059

u1 = (−0.390, 0.089, −0.916)T    u2 = (−0.639, −0.742, 0.200)T    u3 = (−0.663, 0.664, 0.346)T

Computing the SVD of D yields the following non-zero singular values and the
corresponding right singular vectors

δ1 = 23.437    δ2 = 5.992    δ3 = 2.974

r1 = (−0.390, 0.089, −0.916)T    r2 = (0.639, 0.742, −0.200)T    r3 = (−0.663, 0.664, 0.346)T

We do not show the left singular vectors l1, l2, l3 since they lie in R150. Using (7.42) one can verify that λi = δi²/n. For example

λ1 = δ1²/n = 23.437²/150 = 549.29/150 = 3.662
Notice also that the right singular vectors are equivalent to the principal components or eigenvectors of Σ, up to sign. That is, they may be reversed in direction. For the Iris dataset, we have r1 = u1, r2 = −u2, and r3 = u3. Here the second right singular vector is reversed in sign when compared to the second principal component.
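A rough reproduction of this example in Python is sketched below. It assumes scikit-learn's bundled Iris data and, as a further assumption, uses its first three attributes, since the text does not restate which three columns Example 7.1 keeps; the printed numbers therefore need not match the values above exactly.

import numpy as np
from sklearn.datasets import load_iris

# Assumption: the first three Iris attributes; Example 7.1 may use a different subset.
X = load_iris().data[:, :3]
n = X.shape[0]                        # 150
D = X - X.mean(axis=0)                # centered n x 3 data matrix

L, delta, Rt = np.linalg.svd(D, full_matrices=False)

Sigma = (D.T @ D) / n                 # covariance matrix of the centered data
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]        # decreasing order

print(delta)                          # non-zero singular values of D
print(delta ** 2 / n)                 # equals the eigenvalues lam, as in (7.42)
print(np.allclose(np.abs(Rt.T), np.abs(U)))   # right singular vectors = eigenvectors up to sign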

7.5 Further Reading


Principal component analysis was pioneered in (Pearson, 1901). For a comprehensive description of PCA see (Jolliffe, 2002). Kernel PCA was first introduced in (Schölkopf, Smola, and Müller, 1998). For further exploration of non-linear dimensionality reduction methods see (Lee and Verleysen, 2007). The requisite linear algebra background can be found in (Strang, 2006).

Jolliffe, I. (2002), Principal Component Analysis, 2nd Edition, Springer Series in Statistics, New York, USA: Springer-Verlag, Inc.
Lee, J. A. and Verleysen, M. (2007), Nonlinear Dimensionality Reduction, Springer.

Pearson, K. (1901), “On lines and planes of closest fit to systems of points in space”, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2 (11), pp. 559–572.
Schölkopf, B., Smola, A. J., and Müller, K.-R. (1998), “Nonlinear Component Analysis as a Kernel Eigenvalue Problem”, Neural Computation, 10 (5), pp. 1299–1319.
Strang, G. (2006), Linear Algebra and Its Applications, 4th Edition, Thomson Brooks/Cole, Cengage Learning.

7.6 Exercises
Q1. Consider the data matrix D given below:

X1 X2
8 -20
0 -1
10 -19
10 -20
2 0

(a) Compute the mean µ and covariance matrix Σ for D.


(b) Compute the eigenvalues of Σ.
(c) What is the “intrinsic” dimensionality of this dataset (discounting some
small amount of variance)?
(d) Compute the first principal component.
(e) If the µ and Σ from above characterize the normal distribution from
which the points were generated, sketch the orientation/extent of the two-
dimensional normal density function.
 
Q2. Given the covariance matrix

Σ = [ 5  4 ]
    [ 4  5 ]

answer the following

(a) Compute the eigenvalues of Σ by solving the equation det(Σ − λI) = 0.


(b) Find the corresponding eigenvectors by solving the equation Σui = λi ui .

Q3. Compute the singular values and the left and right singular vectors of the following matrix

A = [ 1  1  0 ]
    [ 0  0  1 ]

Q4. Consider the data in Table 7.1. Define the kernel function as follows: K(xi, xj) = ‖xi − xj‖². Answer the following questions.

i xi
x1 (4,2.9)
x4 (2.5,1)
x7 (3.5,4)
x9 (2,2.1)

Table 7.1: Dataset for Q4

(a) Compute the kernel matrix K.


(b) Find the first kernel principal component.

Q5. Given the two points x1 = (1, 2), and x2 = (2, 1), use the kernel function

K(xi, xj) = (xTi xj)²

to find the kernel principal component, by solving the equation Kc = η1 c.
