
SINGULAR VALUE DECOMPOSITION (SVD)/

PRINCIPAL COMPONENTS ANALYSIS (PCA)

!1
SVD - EXAMPLE

U, S, VT = numpy.linalg.svd(img)

!2
SVD - EXAMPLE
[Figure: an image reconstructed from its rank-k SVD approximation, U[:, :k] S[:k] VT[:k, :], for k = 600 (full rank), 300, 100, 50, 20, and 10.]
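A minimal sketch of the rank-k reconstruction, assuming img is the 2-D grayscale image array from the previous slide (a random stand-in is used here so the snippet runs on its own); full_matrices=False keeps the truncated factors compatible for multiplication:

import numpy as np

img = np.random.rand(600, 800)   # stand-in for a 2-D grayscale image array

U, S, VT = np.linalg.svd(img, full_matrices=False)

k = 50                                            # number of singular values/vectors to keep
approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]    # rank-k approximation of img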

!3
PCA - INTRODUCTION

$X = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 1 & 5 \\ 3 & 4 & 10 \\ 4 & 3 & 11 \end{bmatrix}$

!4
PCA - INTRODUCTION

!5
PCA - INTRODUCTION

!6
PCA - INTRODUCTION

!7
PRINCIPAL COMPONENT ANALYSIS

• A technique to find the directions along which the points (set of tuples) in high-dimensional data line up best.

• Treat a set of tuples as a matrix M and find the eigenvectors for MM^T or M^TM.

• The matrix of these eigenvectors can be thought of as a rigid rotation in a high-dimensional space.

• When this transformation is applied to the original data, the axis corresponding to the principal eigenvector is the one along which the points are most “spread out”.
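A small check of the "rigid rotation" point (a sketch; the matrix M below simply reuses the small example that appears later in the deck): the eigenvectors of the symmetric matrix M^T M are orthonormal, so stacking them as columns gives an orthogonal matrix.

import numpy as np

M = np.array([[1., 2.], [2., 1.], [3., 4.], [4., 3.]])  # small example data matrix

# Columns of V are the eigenvectors of the symmetric matrix M^T M
eigvals, V = np.linalg.eigh(M.T @ M)

# V is orthogonal (V^T V = I), so applying it is a rigid rotation (up to reflection)
print(np.allclose(V.T @ V, np.eye(2)))  # True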

!8
PRINCIPAL COMPONENT ANALYSIS

• When this transformation is applied to the original data, the axis corresponding to the principal eigenvector is the one along which the points are most “spread out”.

• This axis is the one along which the variance of the data is maximized.

• Points can best be viewed as lying along this axis with small deviations from this axis.

• Likewise, the axis corresponding to the second eigenvector is the axis along which the variance of distances from the first axis is greatest, and so on.

!9
PRINCIPAL COMPONENT ANALYSIS
• Principal Component Analysis (PCA) is a dimensionality reduction method.

• The goal is to embed data from a high-dimensional space onto a small number of dimensions.

• Its most frequent use is in exploratory data analysis and visualization.

• It can also be helpful in regression (linear or logistic), where we can transform input variables into a smaller number of predictors for modeling.

!10
PRINCIPAL COMPONENT ANALYSIS
• Mathematically,

Given: data set $\{x_1, x_2, \ldots, x_n\}$, where $x_i$ is the vector of $p$ variable values for the $i$-th observation.

Return: matrix $[\phi_1, \phi_2, \ldots, \phi_p]$ of linear transformations that retain maximal variance.

• You can think of the first vector $\phi_1$ as a linear transformation that embeds observations into 1 dimension:

$Z_1 = \phi_{11}X_1 + \phi_{21}X_2 + \cdots + \phi_{p1}X_p$

!11
PRINCIPAL COMPONENT ANALYSIS
• You can think of the first vector $\phi_1$ as a linear transformation that embeds observations into 1 dimension:

$Z_1 = \phi_{11}X_1 + \phi_{21}X_2 + \cdots + \phi_{p1}X_p$

where $\phi_1$ is selected so that the resulting dataset $\{z_1, \ldots, z_n\}$ has maximum variance.

• In order for this to make sense, mathematically, the data has to be centered:

• Each variable $X_j$ has zero mean.

• The transformation vector $\phi_1$ has to be normalized, i.e., $\sum_{j=1}^{p} \phi_{j1}^2 = 1$.
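A small numeric sketch of both conditions, reusing the 4×2 example matrix from the later slides: the variables are centered to zero mean, ϕ1 is checked to be unit length, and Z1 is the resulting one-dimensional embedding.

import numpy as np

X = np.array([[1., 2.], [2., 1.], [3., 4.], [4., 3.]])  # example data: n = 4 observations, p = 2

Xc = X - X.mean(axis=0)                   # centering: each variable now has zero mean
phi1 = np.array([1., 1.]) / np.sqrt(2)    # candidate first transformation vector

print(np.isclose(np.sum(phi1 ** 2), 1.0))  # normalization constraint holds
z1 = Xc @ phi1                             # Z1 = phi_11 X_1 + phi_21 X_2
print(z1.var())                            # the variance that PCA maximizes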

!12
PRINCIPAL COMPONENT ANALYSIS
• In order for this to make sense, mathematically, the data has to be centered:

• Each variable $X_j$ has zero mean.

• The transformation vector $\phi_1$ has to be normalized, i.e., $\sum_{j=1}^{p} \phi_{j1}^2 = 1$.

• We can find $\phi_1$ by solving an optimization problem:

$\max_{\phi_{11},\phi_{21},\ldots,\phi_{p1}} \; \frac{1}{n}\sum_{i=1}^{n}\left(\sum_{j=1}^{p}\phi_{j1}x_{ij}\right)^{2} \quad \text{s.t.} \quad \sum_{j=1}^{p}\phi_{j1}^{2} = 1$

Maximize variance, subject to the normalization constraint.



!13
PRINCIPAL COMPONENT ANALYSIS

• We can find $\phi_1$ by solving an optimization problem:

$\max_{\phi_{11},\phi_{21},\ldots,\phi_{p1}} \; \frac{1}{n}\sum_{i=1}^{n}\left(\sum_{j=1}^{p}\phi_{j1}x_{ij}\right)^{2} \quad \text{s.t.} \quad \sum_{j=1}^{p}\phi_{j1}^{2} = 1$

Maximize variance, subject to the normalization constraint.

• The second transformation, $\phi_2$, is obtained similarly, with the added constraint that $\phi_2$ is orthogonal to $\phi_1$.

• Taken together, $[\phi_1, \phi_2]$ define a pair of linear transformations of the data into 2-dimensional space:

$Z_{n\times 2} = X_{n\times p}[\phi_1, \phi_2]_{p\times 2}$
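In practice this constrained maximization is not handed to a generic optimizer: the maximizers ϕ1, ϕ2, ... are the eigenvectors of the covariance matrix of the centered data, ordered by decreasing eigenvalue. A minimal sketch, assuming X is an n × p numpy array:

import numpy as np

def pca_components(X, k=2):
    """Return the top-k transformation vectors [phi_1, ..., phi_k] and the projection Z."""
    Xc = X - X.mean(axis=0)                # center each variable
    C = (Xc.T @ Xc) / Xc.shape[0]          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # sort directions by variance explained
    phi = eigvecs[:, order[:k]]            # p x k matrix [phi_1, ..., phi_k]
    Z = Xc @ phi                           # n x k, i.e., Z_{n x k} = X_{n x p} phi_{p x k}
    return phi, Z

Using np.linalg.svd on the centered matrix Xc would give the same directions and is usually preferred numerically.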

!14
PRINCIPAL COMPONENT ANALYSIS

• Taken together, $[\phi_1, \phi_2]$ define a pair of linear transformations of the data into 2-dimensional space:

$Z_{n\times 2} = X_{n\times p}[\phi_1, \phi_2]_{p\times 2}$

• Each of the columns of the $Z$ matrix is called a principal component.

• The units of the PCs are meaningless.

• In practice we may also scale $X_j$ to have unit variance.

• In general, if the variables $X_j$ are measured in different units (e.g., miles vs. liters vs. dollars), they should be scaled to have unit variance.

!15
SPECTRAL THEOREM
Using the spectral theorem:

$(X^T X)\phi = \lambda\phi \implies XX^T(X\phi) = \lambda(X\phi)$

Conclusion:

The matrices $XX^T$ and $X^T X$ share the same nonzero eigenvalues.

To get an eigenvector of $XX^T$ from one of $X^T X$, multiply $\phi$ on the left by $X$.

This is very powerful, particularly when the number of observations, m, and the number of predictors, n, are drastically different in size.

For PCA: $\mathrm{Cov}(X, X) = XX^T$
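A quick numerical check of this fact (a sketch with an arbitrary small random matrix): the nonzero eigenvalues of X^T X and X X^T coincide, and multiplying an eigenvector ϕ of X^T X on the left by X gives an eigenvector of X X^T with the same eigenvalue.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                # 5 observations, 3 predictors (arbitrary example)

lam_small, Phi = np.linalg.eigh(X.T @ X)   # eigenpairs of the 3 x 3 matrix
lam_big, _ = np.linalg.eigh(X @ X.T)       # eigenpairs of the 5 x 5 matrix

# The nonzero eigenvalues match; the 5 x 5 matrix has two extra (near-)zero eigenvalues
print(np.allclose(np.sort(lam_small), np.sort(lam_big)[-3:]))

# X @ phi is an eigenvector of X X^T with the same eigenvalue
phi, lam = Phi[:, -1], lam_small[-1]
print(np.allclose((X @ X.T) @ (X @ phi), lam * (X @ phi)))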

!16
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}$

Eigenvalues and eigenvectors?

!17
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}$

From the spectral theorem:

$(X^T X)\phi = \lambda\phi$

$X^T X = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 30 & 28 \\ 28 & 30 \end{bmatrix}$

!18
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}$

From the spectral theorem:

$(X^T X)\phi = \lambda\phi \implies (X^T X)\phi - \lambda I\phi = 0$

$\left((X^T X) - \lambda I\right)\phi = 0$

$\det\begin{bmatrix} 30 - \lambda & 28 \\ 28 & 30 - \lambda \end{bmatrix} = 0 \implies \lambda = 58 \text{ and } \lambda = 2$

!19
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}$

From the spectral theorem:

$(X^T X)\phi = \lambda\phi$

$\begin{bmatrix} 30 & 28 \\ 28 & 30 \end{bmatrix}\begin{bmatrix} \phi_{11} \\ \phi_{12} \end{bmatrix} = 58\begin{bmatrix} \phi_{11} \\ \phi_{12} \end{bmatrix} \implies \phi_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$

!20
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}$

From the spectral theorem: $(X^T X)\phi = \lambda\phi$

$\begin{bmatrix} 30 & 28 \\ 28 & 30 \end{bmatrix}\begin{bmatrix} \phi_{11} \\ \phi_{12} \end{bmatrix} = 58\begin{bmatrix} \phi_{11} \\ \phi_{12} \end{bmatrix} \implies \phi_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$

$\begin{bmatrix} 30 & 28 \\ 28 & 30 \end{bmatrix}\begin{bmatrix} \phi_{21} \\ \phi_{22} \end{bmatrix} = 2\begin{bmatrix} \phi_{21} \\ \phi_{22} \end{bmatrix} \implies \phi_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$

!21
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}$

From the spectral theorem: $(X^T X)\phi = \lambda\phi$

$\phi_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix},\ \lambda_1 = 58 \qquad \phi_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix},\ \lambda_2 = 2$

$\phi = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$

!22
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix} \qquad \phi_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix},\ \lambda_1 = 58 \qquad \phi_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix},\ \lambda_2 = 2$

$Z = X\phi = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 3/\sqrt{2} & 1/\sqrt{2} \\ 3/\sqrt{2} & -1/\sqrt{2} \\ 7/\sqrt{2} & 1/\sqrt{2} \\ 7/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$
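The worked example can be checked with numpy (a sketch; note that numpy may return the eigenvectors in a different order or with flipped signs):

import numpy as np

X = np.array([[1., 2.], [2., 1.], [3., 4.], [4., 3.]])

lam, phi = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order: [2, 58]
print(lam)
print(phi)                           # columns are (up to sign) [-1/sqrt(2), 1/sqrt(2)] and [1/sqrt(2), 1/sqrt(2)]

Z = X @ phi[:, ::-1]                 # reorder so the lambda = 58 direction comes first
print(Z)                             # rows: [3/sqrt(2), ±1/sqrt(2)] and [7/sqrt(2), ±1/sqrt(2)]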

!23
EXAMPLE - PCA
$X = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \\ 4 & 3 \end{bmatrix} \qquad Z = \begin{bmatrix} 3/\sqrt{2} & 1/\sqrt{2} \\ 3/\sqrt{2} & -1/\sqrt{2} \\ 7/\sqrt{2} & 1/\sqrt{2} \\ 7/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$

[Figure: scatter plot of the points (1,2), (2,1), (3,4), (4,3) with the first principal axis; (1,2) and (2,1) project onto the axis at (1.5, 1.5), and (3,4) and (4,3) at (3.5, 3.5), giving the rotated coordinates (3/√2, ±1/√2) and (7/√2, ±1/√2).]

!24
PCA STEPS -
STEP 1 MEAN SUBTRACTION

!25
PCA STEPS -
STEP 2 COVARIANCE MATRIX

[Figure: top 5 rows of the mean-centered data, and the resulting covariance matrix.]

!26
PCA STEPS -
STEP 3 EIGENVALUES & EIGENVECTORS OF COVARIANCE MATRIX

!27
PCA STEPS -
STEP 4 - PRINCIPAL COMPONENTS

Multiply each eigenvector by its corresponding eigenvalue (usually its square root).

Plot them on top of the data.

!28
PCA STEPS -
STEP 5 - PROJECT DATA ALONG
DOMINANT PC

newData = PC1 × oldData
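Steps 1-5 can be written as a short numpy sketch (the data array below is a made-up example; rows are observations, so the projection is written as centered @ PC1 rather than the slide's PC1 × oldData matrix layout):

import numpy as np

# Made-up (n observations) x (p variables) data array
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: mean subtraction
centered = data - data.mean(axis=0)

# Step 2: covariance matrix of the mean-centered data
cov = np.cov(centered, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: principal components; the dominant PC has the largest eigenvalue
pc1 = eigvecs[:, np.argmax(eigvals)]

# Step 5: project the data along the dominant PC (one coordinate per observation)
new_data = centered @ pc1
print(new_data)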

!29
HOW MANY PRINCIPAL COMPONENTS ?

• How many PCs should we consider in post-hoc analysis?

• One result of PCA is a measure of the variance attributed to each PC relative to the total variance of the dataset.

• We can calculate the percentage of variance explained for the m-th PC:

$\mathrm{PVE}_m = \dfrac{\sum_{i=1}^{n} z_{im}^2}{\sum_{j=1}^{p}\sum_{i=1}^{n} x_{ij}^2}$
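A sketch of this calculation, assuming X has already been mean-centered (and scaled, if desired) and Z = Xϕ holds the principal components as columns:

import numpy as np

def pve(X, Z):
    """Proportion of variance explained by each principal component (column of Z)."""
    total_var = np.sum(X ** 2)                 # denominator: sum over j and i of x_ij^2
    return np.sum(Z ** 2, axis=0) / total_var  # numerator per component: sum over i of z_im^2

When all p components are kept, the returned values sum to 1.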

!30
HOW MANY PRINCIPAL COMPONENTS ?

• We can calculate the percentage of variance explained for the m-th PC:

$\mathrm{PVE}_m = \dfrac{\sum_{i=1}^{n} z_{im}^2}{\sum_{j=1}^{p}\sum_{i=1}^{n} x_{ij}^2}$

!31
