
Lectures on Machine Learning (Fall 2017)

Hyeong In Choi, Seoul National University

Lecture 7: Principal Component Analysis (PCA)
(Draft: version 0.9.1)

Topics to be covered:

• Basic setup of PCA

• SVD formulation

• Projection in terms of SVD

• Mathematics of singular value decomposition (SVD)

1 PCA: Principal Component Analysis


Principal component analysis is one of the most important tools in machine learning. At its core, it is a tool in unsupervised learning, but it is also used as a feature extraction tool in supervised learning.
1.1 Basics of PCA
Let $\{x^{(i)}\}_{i=1}^{N}$ be a dataset with no labels whose data matrix $X$ is denoted by
$$
X = \begin{pmatrix} (x^{(1)})^T \\ \vdots \\ (x^{(i)})^T \\ \vdots \\ (x^{(N)})^T \end{pmatrix}
  = \begin{pmatrix}
      x^{(1)}_1 & \cdots & x^{(1)}_j & \cdots & x^{(1)}_d \\
      \vdots    &        & \vdots    &        & \vdots    \\
      x^{(i)}_1 & \cdots & x^{(i)}_j & \cdots & x^{(i)}_d \\
      \vdots    &        & \vdots    &        & \vdots    \\
      x^{(N)}_1 & \cdots & x^{(N)}_j & \cdots & x^{(N)}_d
    \end{pmatrix}. \tag{1}
$$
Lowering the superscript, let $x_{ij} = x^{(i)}_j$. Then the data matrix $X$ is written as
$$
X = \begin{pmatrix}
      x_{11} & \cdots & x_{1j} & \cdots & x_{1d} \\
      \vdots &        & \vdots &        & \vdots \\
      x_{i1} & \cdots & x_{ij} & \cdots & x_{id} \\
      \vdots &        & \vdots &        & \vdots \\
      x_{N1} & \cdots & x_{Nj} & \cdots & x_{Nd}
    \end{pmatrix}. \tag{2}
$$
Note that the rows of $X$ represent data points and the columns values of features. In particular, we may view the $j$-th column as a random variable $Z_j$ whose IID samples are the entries of the $j$-th column. This way, we identify
$$Z_j = [x_{1j}, \cdots, x_{Nj}]^T.$$
Let $\mu_j = E[Z_j]$, i.e.
$$\mu_j = \frac{1}{N} \sum_{i=1}^{N} x_{ij}$$
for $j = 1, \cdots, d$, and let
$$\mu = [\mu_1, \cdots, \mu_d]^T.$$
If we think of each data point $x^{(i)}$ as a point in $\mathbb{R}^d$, this $\mu$ represents the centroid of all data points in $\mathbb{R}^d$. Define the normalized centered data matrix $\tilde{X}$ by
$$
\tilde{X} = \frac{1}{\sqrt{N}}
  \begin{pmatrix} (x^{(1)} - \mu)^T \\ \vdots \\ (x^{(i)} - \mu)^T \\ \vdots \\ (x^{(N)} - \mu)^T \end{pmatrix}
  = \frac{1}{\sqrt{N}}
  \begin{pmatrix}
    x_{11} - \mu_1 & \cdots & x_{1j} - \mu_j & \cdots & x_{1d} - \mu_d \\
    \vdots         &        & \vdots         &        & \vdots         \\
    x_{i1} - \mu_1 & \cdots & x_{ij} - \mu_j & \cdots & x_{id} - \mu_d \\
    \vdots         &        & \vdots         &        & \vdots         \\
    x_{N1} - \mu_1 & \cdots & x_{Nj} - \mu_j & \cdots & x_{Nd} - \mu_d
  \end{pmatrix}. \tag{3}
$$
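As a concrete illustration, here is a minimal NumPy sketch (the toy data matrix `X` below is hypothetical, with rows as data points) that forms the centroid $\mu$, the normalized centered data matrix $\tilde{X}$ of (3), and the matrix $\tilde{X}^T \tilde{X}$ that will reappear below as the covariance matrix $C$:

```python
import numpy as np

# Hypothetical toy data: N = 5 points in d = 3 dimensions (rows are data points).
X = np.array([[2.0, 0.1, 1.0],
              [1.5, 0.3, 0.8],
              [3.1, 0.2, 1.9],
              [2.7, 0.4, 1.4],
              [1.9, 0.5, 1.1]])
N, d = X.shape

mu = X.mean(axis=0)                 # centroid mu = [mu_1, ..., mu_d]^T
X_tilde = (X - mu) / np.sqrt(N)     # normalized centered data matrix, eq. (3)
C = X_tilde.T @ X_tilde             # will reappear as the covariance matrix C

# C agrees with the 1/N-normalized empirical covariance of the columns of X.
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))   # True
```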
The gist of the idea of PCA is to successively find the directions along which the data spread out the most. In particular, our objective is to find a unit vector $v \in \mathbb{R}^d$ that maximizes
$$
F(v) = \frac{1}{N} \sum_{\ell=1}^{N} |\mathrm{Proj}_v(x^{(\ell)} - \mu)|^2
     = \frac{1}{N} \sum_{\ell=1}^{N} |v^T (x^{(\ell)} - \mu)|^2. \tag{4}
$$
Now $v^T (x^{(\ell)} - \mu)$ is a scalar, which is equal to $(x^{(\ell)} - \mu)^T v$. Thus we can write
$$
\frac{1}{N} \sum_{\ell=1}^{N} |v^T (x^{(\ell)} - \mu)|^2
  = v^T \Big\{ \frac{1}{N} \sum_{\ell=1}^{N} (x^{(\ell)} - \mu)(x^{(\ell)} - \mu)^T \Big\} v.
$$
Define a $d \times d$ symmetric matrix $C$ by
$$
C = \frac{1}{N} \sum_{\ell=1}^{N} (x^{(\ell)} - \mu)(x^{(\ell)} - \mu)^T.
$$
Then (4) can be written as
$$
F(v) = v^T C v. \tag{5}
$$
Note that the $(i, j)$-th entry $C_{ij}$ of $C$ is
$$
C_{ij} = \frac{1}{N} \sum_{\ell=1}^{N} (x_{\ell i} - \mu_i)(x_{\ell j} - \mu_j).
$$

Therefore the following is easy to check.


Lemma 1. $C$ is a $d \times d$ (empirical) covariance matrix satisfying

(i) $C_{ij} = \mathrm{Cov}(Z_i, Z_j)$

(ii) $C = \tilde{X}^T \tilde{X}$.

The way to maximize (4) subject to $|v| = 1$ is to use the Lagrange multiplier
$$
\begin{aligned}
L(v, \lambda) &= \frac{1}{N} \sum_{\ell=1}^{N} |v^T (x^{(\ell)} - \mu)|^2 - \lambda (|v|^2 - 1) \\
              &= v^T C v - \lambda (v^T v - 1).
\end{aligned}
$$
Thus by computing the derivative and setting it equal to zero, we have
$$
\frac{\partial L}{\partial v} = 2Cv - 2\lambda v = 0,
$$
i.e.,
$$
Cv = \lambda v.
$$
This equation says $v$ must be an eigenvector of $C$. Since $C$ is a symmetric matrix, it is diagonalizable, and the diagonalization theorem says $C$ can be written as
$$
C = \lambda_1 v_1 v_1^T + \cdots + \lambda_d v_d v_d^T,
$$
where $\{v_1, \cdots, v_d\}$ is an orthonormal basis of $\mathbb{R}^d$ and $\lambda_1, \cdots, \lambda_d$ are the eigenvalues of $C$ ordered in descending order: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$. Check that
$$
F(v_i) = v_i^T C v_i = \lambda_i.
$$
Thus $F(v)$ is the biggest when $v = v_1$.
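A short, self-contained NumPy check of this fact on hypothetical random data: among the eigenvectors of $C$, the one with the largest eigenvalue gives the largest value of $F(v) = v^T C v$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])   # hypothetical data
N = X.shape[0]
X_tilde = (X - X.mean(axis=0)) / np.sqrt(N)
C = X_tilde.T @ X_tilde

# eigh returns eigenvalues in ascending order; flip to the descending order used here.
lam, vecs = np.linalg.eigh(C)
lam, vecs = lam[::-1], vecs[:, ::-1]

F = lambda v: float(v @ C @ v)                        # F(v) = v^T C v, eq. (5)
print([round(F(vecs[:, i]), 4) for i in range(3)])    # equals lambda_1, lambda_2, lambda_3
print(np.round(lam, 4))
```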
Let us now look at the multi-dimensional version. So let $W$ be a linear subspace of $\mathbb{R}^d$ and let $\mathrm{Proj}_W$ be the orthogonal projection of $\mathbb{R}^d$ onto $W$. We want to find the subspace in which the data spreads out the most.
Proposition 1. Let
$$
\mathrm{Proj}_W(x^{(i)} - \mu)
$$
be the orthogonal projection of the centered $i$-th data point onto the subspace $W$. Define
$$
F(W) = \frac{1}{N} \sum_{\ell=1}^{N} |\mathrm{Proj}_W(x^{(\ell)} - \mu)|^2.
$$
Then $F(W)$ is maximal among all $k$-dimensional linear subspaces of $\mathbb{R}^d$ if $W$ is the subspace spanned by the $k$ highest eigenvectors, i.e., the eigenvectors corresponding to the $k$ highest eigenvalues.

Proof. We give a proof for $k = 2$, as it is trivial to generalize it to any $k \ge 2$. Let $v$ and $w$ be orthonormal vectors and let $W$ be the two-dimensional linear subspace spanned by $v$ and $w$. Then
$$
\frac{1}{N} \sum_{\ell=1}^{N} |\mathrm{Proj}_W(x^{(\ell)} - \mu)|^2
  = \frac{1}{N} \sum_{\ell=1}^{N} \Big\{ |v^T (x^{(\ell)} - \mu)|^2 + |w^T (x^{(\ell)} - \mu)|^2 \Big\}
  = v^T C v + w^T C w.
$$
Use the Lagrange multiplier
$$
L(v, w, \alpha, \beta) = v^T C v + w^T C w - \alpha\{|v|^2 - 1\} - \beta\{|w|^2 - 1\}.
$$
Upon setting
$$
\frac{\partial L}{\partial v} = 2Cv - 2\alpha v = 0, \qquad
\frac{\partial L}{\partial w} = 2Cw - 2\beta w = 0,
$$
we get
$$
Cv = \alpha v, \qquad Cw = \beta w.
$$
Thus $v$ and $w$ are orthonormal eigenvectors of $C$. As before, write
$$
C = \lambda_1 v_1 v_1^T + \cdots + \lambda_d v_d v_d^T,
$$
where $v_1, \cdots, v_d$ are orthonormal eigenvectors with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d \ge 0$. Since $v$ and $w$ must be two of $v_1, \cdots, v_d$, let $v = v_i$ and $w = v_j$ for $i \ne j$. Then
$$
F(v, w) = F(v_i, v_j) = \lambda_i + \lambda_j.
$$
This is maximal when $v = v_1$ and $w = v_2$, in which case
$$
F(v, w) = \lambda_1 + \lambda_2.
$$
For general $k$, it is trivial to see that $W$ must be the $k$-dimensional subspace spanned by $v_1, \cdots, v_k$. $\square$

1.2 SVD formulation


Proposition 1 shows how to find the $k$-dimensional linear subspace in which the data spreads out the most. The remaining task is how to do it most efficiently, for which the singular value decomposition comes in handy. By SVD, the $N \times d$ matrix $\tilde{X}$ can be written as
$$
\tilde{X} = U D V^T, \tag{6}
$$
where $U$ is an $N \times N$ orthogonal matrix, $V$ a $d \times d$ orthogonal matrix, and $D$ an $N \times d$ "diagonal" matrix. Therefore
$$
C = \tilde{X}^T \tilde{X} = V D^T U^T U D V^T = V D^T D V^T = V \Lambda V^T, \tag{7}
$$
where $\Lambda = D^T D$ is a $d \times d$ diagonal matrix. Note that (7) can be rewritten as
$$
C V = V \Lambda,
$$
which can be interpreted as saying
$$
C v_j = \lambda_j v_j,
$$
for $j = 1, \cdots, d$, where $v_1, \cdots, v_d$ are the column vectors of $V$. Namely, the column vectors of $V$ are precisely the eigenvectors of $C$. Assuming the eigenvalues of $C$ are ordered in descending order, the basis of the $k$-dimensional subspace $W$ found in Proposition 1 can be readily read off as the set of the first $k$ column vectors of the matrix $V$.

These $k$ vectors $v_1, \cdots, v_k$ become the new features to be used in lieu of the old ones. This process of finding new features is called feature extraction.
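A minimal NumPy sketch of this correspondence on hypothetical random data: the right singular vectors of $\tilde{X}$ (the columns of $V$) are eigenvectors of $C$, and the squared singular values are the eigenvalues $\lambda_j$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ np.diag([4.0, 2.0, 1.0, 0.5])   # hypothetical data
N, d = X.shape
X_tilde = (X - X.mean(axis=0)) / np.sqrt(N)
C = X_tilde.T @ X_tilde

# Thin SVD as in (6): X_tilde = U D V^T; singular values come out in descending order.
U, s, Vt = np.linalg.svd(X_tilde, full_matrices=False)
V = Vt.T
lam = s**2                                      # Lambda = D^T D, eq. (7)

# Columns of V satisfy C v_j = lambda_j v_j.
print(np.allclose(C @ V, V * lam))              # True (column j scaled by lam[j])
```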
The next question is how the data point $x^{(i)}$ is written in terms of these newly extracted features. Let $R$ be the $N \times d$ matrix defined by
$$
R = U D = \begin{pmatrix} r_{11} & \cdots & r_{1d} \\ \vdots & & \vdots \\ r_{N1} & \cdots & r_{Nd} \end{pmatrix}.
$$
Then, from (6), we have $\tilde{X}^T = V R^T$. Let $e_i$ be the $N$-dimensional standard basis vector all of whose components are zero except the $i$-th, which is 1. Then it is easy to see that
$$
R^T e_i = \sum_{j=1}^{d} (R^T)_{ji} e_j = \sum_{j=1}^{d} r_{ij} e_j,
$$
where on the right-hand side $e_j$ denotes the $j$-th standard basis vector of $\mathbb{R}^d$. Thus
$$
V R^T e_i = \sum_{j=1}^{d} r_{ij} V e_j = \sum_{j=1}^{d} r_{ij} v_j.
$$
Therefore, by (3), we have
$$
\frac{1}{\sqrt{N}} (x^{(i)} - \mu) = \tilde{X}^T e_i = V R^T e_i = \sum_{j=1}^{d} r_{ij} v_j.
$$
This gives a convenient way of expressing the projection onto the subspace $W = \mathrm{span}\{v_1, \cdots, v_k\}$ of $\mathbb{R}^d$ as
$$
\mathrm{Proj}_W (x^{(i)} - \mu) = \sqrt{N} \sum_{j=1}^{k} r_{ij} v_j. \tag{8}
$$
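A hedged NumPy sketch of (8) on hypothetical data: the $i$-th row of $R = UD$, scaled by $\sqrt{N}$, gives the coordinates of the centered $x^{(i)}$ in the basis $v_1, \cdots, v_d$, and keeping the first $k$ of them yields the projection onto $W$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5)) @ np.diag([5.0, 3.0, 1.0, 0.4, 0.1])   # hypothetical data
N, d = X.shape
mu = X.mean(axis=0)
X_tilde = (X - mu) / np.sqrt(N)

U, s, Vt = np.linalg.svd(X_tilde, full_matrices=False)
R = U * s                                   # R = U D; row i holds r_{i1}, ..., r_{id}

k, i = 2, 0                                 # project the first data point onto span{v_1, ..., v_k}
proj = np.sqrt(N) * R[i, :k] @ Vt[:k, :]    # eq. (8): sqrt(N) * sum_{j<=k} r_{ij} v_j

# Sanity check against a direct orthogonal projection of x^(i) - mu onto W.
P = Vt[:k, :].T @ Vt[:k, :]                 # projection matrix onto W
print(np.allclose(proj, P @ (X[i] - mu)))   # True
```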

It now remains to determine $k$. In general, as we can see in Figure 1, the data does not spread out evenly in all directions. In some directions it spreads out quite a lot, while it changes little in others. The $W$ depicted there is the linear subspace $W$ drawn at (or translated to) the center point $\mu$ of the data. One can readily see that the data spreads out more in the direction of $W$, while it is relatively unchanged in the perpendicular direction $W^{\perp}$. As we saw above, the degree of spreading in each direction is measured in terms of the eigenvalues. So the strategy is to choose the first $k$ directions (eigenvectors) that capture most of the spreading. To do that, note that $\mathrm{Tr}(C) = \lambda_1 + \cdots + \lambda_d$. Choose $k$ so that $\lambda_1 + \cdots + \lambda_k$ is close to $\mathrm{Tr}(C)$ within a given degree of closeness. Say, if we want to capture 90% of $\mathrm{Tr}(C)$, choose the smallest $k$ such that $\lambda_1 + \cdots + \lambda_k \ge 0.9\,\mathrm{Tr}(C)$.
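A small sketch of this rule, using the squared singular values of $\tilde{X}$ as the $\lambda_j$ (the function name and the 90% threshold are just the running example, not part of the lecture):

```python
import numpy as np

def choose_k(X, threshold=0.9):
    """Smallest k with lambda_1 + ... + lambda_k >= threshold * Tr(C)."""
    N = X.shape[0]
    X_tilde = (X - X.mean(axis=0)) / np.sqrt(N)
    s = np.linalg.svd(X_tilde, compute_uv=False)
    lam = s**2                                    # eigenvalues of C, descending
    ratio = np.cumsum(lam) / lam.sum()            # cumulative share of Tr(C)
    return int(np.searchsorted(ratio, threshold) + 1)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6)) @ np.diag([6.0, 4.0, 2.0, 0.5, 0.2, 0.1])
print(choose_k(X))   # typically 2 for this hypothetical data
```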

1.3 Summary: Procedure of PCA


In view of what we have done so far, the PCA procedure can be summarized as follows (a code sketch follows the list):

(i) (Preparation) Compute $\tilde{X}$ as in (3).

(ii) (SVD) Compute the SVD of $\tilde{X}$ as in (6).

(iii) (New features) The new features are the first $k$ column vectors $v_1, \cdots, v_k$ of $V$, and $W = \mathrm{span}\{v_1, \cdots, v_k\}$.

(iv) (Data) The $i$-th data point projected onto $W$ is written as in (8).
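Putting steps (i)–(iv) together, a minimal end-to-end sketch (the function and variable names are mine, not from the lecture):

```python
import numpy as np

def pca(X, k):
    """Steps (i)-(iv): normalize/center, SVD, new features, projected coordinates."""
    N = X.shape[0]
    mu = X.mean(axis=0)
    X_tilde = (X - mu) / np.sqrt(N)                            # (i)  eq. (3)
    U, s, Vt = np.linalg.svd(X_tilde, full_matrices=False)     # (ii) eq. (6)
    V_k = Vt[:k].T                                             # (iii) new features v_1, ..., v_k
    coords = np.sqrt(N) * (U * s)[:, :k]                       # (iv) coefficients in eq. (8)
    return mu, V_k, coords

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5)) @ np.diag([5.0, 2.0, 1.0, 0.3, 0.1])
mu, V_k, coords = pca(X, k=2)
proj = coords @ V_k.T       # row i is Proj_W(x^(i) - mu), as in (8)
print(proj.shape)           # (100, 5)
```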

[Figure 1: Subspace $W$]

Remark. One must exercise judicious caution when applying PCA. The most salient issue is the problem of scale. For example, look at the regression problem of estimating a person's weight in terms of bodily measurements. Suppose one uses the millimeter as the unit of measurement for the waist girth instead of the customary centimeter. Then the quantity representing the girth will be exaggerated tenfold in the PCA analysis. Or, if one uses the kilometer as the unit of measurement, the girth data may look like it changes so little from one person to another that it could be tossed out as an irrelevant feature.

In order to prevent such discrepancies, people frequently normalize the data before embarking on PCA. For instance, the random variable representing each feature is normalized to have mean zero and variance one. Namely, $Z_j$ is replaced with $\dfrac{Z_j - \mu_j}{\sigma_j}$. It really amounts to using the correlation matrix rather than the covariance matrix in our PCA.

This cure, however, is not a panacea. For example, if $\sigma_j$ is very small, the random variable $Z_j$ may receive a disproportionate boost through this procedure. So one must exercise judicious caution and be aware of the relative importance of each feature in the first place, so that PCA can truly suppress unimportant features while boosting the more important ones.
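A minimal sketch of this normalization (my own illustration; the small `eps` guards against the tiny-$\sigma_j$ issue just mentioned):

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Replace each feature Z_j by (Z_j - mu_j) / sigma_j before running PCA."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / np.maximum(sigma, eps)

rng = np.random.default_rng(5)
waist_mm = rng.normal(800.0, 60.0, size=100)    # hypothetical waist girth in millimeters
height_m = rng.normal(1.7, 0.1, size=100)       # hypothetical height in meters
X = np.column_stack([waist_mm, height_m])

Z = standardize(X)
print(np.round(Z.std(axis=0), 3))    # both features now have (nearly) unit variance
```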

2 Mathematical supplements
2.1 Singular value decomposition (SVD)
The singular value decomposition (SVD) of a matrix is perhaps one of the most useful techniques in numerical linear algebra. Its impact is far-reaching, and one cannot do without it in many applications. Here, we introduce what it is and give an intuitive, geometric proof of it.
Let $A$ be an $m \times n$ matrix. $A$ can be viewed as a linear map $A : \mathbb{R}^n \to \mathbb{R}^m$. Since diagonal matrices are the easiest to deal with, it would have been great if $A$ were a diagonal matrix. Of course, this cannot be true in general. So the next best thing to hope for is to find some orthonormal bases $\{v_1, \cdots, v_n\}$ of $\mathbb{R}^n$ and $\{u_1, \cdots, u_m\}$ of $\mathbb{R}^m$ with regard to which $A$ is diagonal, i.e.
$$
A v_i = \sigma_i u_i,
$$
for some $\sigma_i$, for $i = 1, \cdots, n$. If so, the first thing that must hold is that there is an orthonormal basis $\{v_1, \cdots, v_n\}$ such that $A v_1, \cdots, A v_n$ are mutually orthogonal, i.e.,
$$
(A v_j)^T (A v_i) = 0,
$$
for all $j \ne i$. Thus for any fixed $i$ and for all $j \ne i$,
$$
v_j^T (A^T A v_i) = 0,
$$
which must necessarily mean that $A^T A v_i$ has no component along any $v_j$ with $j \ne i$, i.e., it is a scalar multiple of $v_i$: there exists some $\alpha_i \in \mathbb{R}$ such that $A^T A v_i = \alpha_i v_i$. Therefore $v_1, \cdots, v_n$ must be eigenvectors of the $n \times n$ symmetric matrix $A^T A$, which are well known to exist and are relatively easy to find numerically. So the idea is to define
$$
u_i = \frac{A v_i}{|A v_i|}
$$
and declare victory, because $A v_i = |A v_i|\, u_i$ then holds. The gist of the proof is making this line of thought mathematically correct.

Let $\alpha_1, \cdots, \alpha_n$ be the eigenvalues of $A^T A$ and let $r = \mathrm{rank}(A) \le n$. Since $\mathrm{rank}(A^T A) = \mathrm{rank}(A)$ by Lemma 2, there are exactly $r$ non-zero eigenvalues $\alpha_1, \cdots, \alpha_r$ of $A^T A$. We may assume that $\alpha_1, \cdots, \alpha_n$ are ordered in descending order as
$$
\alpha_1 \ge \alpha_2 \ge \cdots \ge \alpha_r > 0 = \alpha_{r+1} = \cdots = \alpha_n.
$$
Define, for $i = 1, \cdots, r$,
$$
u_i = \frac{A v_i}{|A v_i|} = \frac{A v_i}{\sqrt{\alpha_i}},
$$
where the last equality uses the fact that $|A v_i| = \sqrt{\alpha_i}$ (since $|A v_i|^2 = v_i^T A^T A v_i = \alpha_i |v_i|^2 = \alpha_i$) for $i = 1, \cdots, n$. Thus we have
$$
A v_i = \sqrt{\alpha_i}\, u_i,
$$
for $i = 1, \cdots, r$. Now extend $\{u_1, \cdots, u_r\}$ to an orthonormal basis $\{u_1, \cdots, u_m\}$ of $\mathbb{R}^m$. For $i \ge r + 1$, $\alpha_i = 0$ and $A v_i = 0$. Then we can write
$$
A v_j = \sqrt{\alpha_j}\, u_j
$$
for $j = 1, \cdots, n$. In other words,
$$
u_i^T A v_j = \sqrt{\alpha_j}\, \delta_{ij}. \tag{9}
$$
Let us write these relations in matrix form. Define an $n \times n$ orthogonal matrix $V$ by
$$
V = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{pmatrix},
$$
where $v_j$ is written as the $j$-th column vector. Define also an $m \times m$ orthogonal matrix $U$ by
$$
U = \begin{pmatrix} | & & | \\ u_1 & \cdots & u_m \\ | & & | \end{pmatrix}.
$$
Therefore the relation (9) is written in matrix form as
$$
U^T A V = D, \tag{10}
$$
where $D$ is the $m \times n$ "diagonal" matrix
$$
D = \begin{pmatrix}
\sqrt{\alpha_1} &        &                 &   \\
                & \ddots &                 &   \\
                &        & \sqrt{\alpha_r} &   \\
                &        &                 & O
\end{pmatrix},
$$
i.e., the matrix whose first $r$ diagonal entries are $\sqrt{\alpha_1}, \cdots, \sqrt{\alpha_r}$ and whose remaining entries are all zero, the zero block $O$ being shaped so that $D$ has $m$ rows and $n$ columns (whether $m \ge n$ or $m \le n$). From (10) we get
$$
A = U D V^T,
$$
which is called the singular value decomposition of $A$.
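The construction above can be traced numerically. Here is a sketch of my own (building the factors from `numpy.linalg.eigh` applied to $A^T A$, rather than calling a library SVD) that follows the proof and checks $A = U D V^T$:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.normal(size=(m, n))                        # hypothetical matrix (rank n here)

# Step 1: eigenvectors of the symmetric matrix A^T A, sorted by descending eigenvalue.
alphas, V = np.linalg.eigh(A.T @ A)
order = np.argsort(alphas)[::-1]
alphas, V = alphas[order], V[:, order]
r = int(np.sum(alphas > 1e-10))                    # r = rank(A)

# Step 2: u_i = A v_i / |A v_i| = A v_i / sqrt(alpha_i), for i = 1, ..., r.
U = np.zeros((m, m))
U[:, :r] = (A @ V[:, :r]) / np.sqrt(alphas[:r])

# Step 3: extend u_1, ..., u_r to an orthonormal basis of R^m (orthogonal complement).
P = np.eye(m) - U[:, :r] @ U[:, :r].T
U[:, r:] = np.linalg.svd(P)[0][:, :m - r]

# Step 4: the m x n "diagonal" matrix D with sqrt(alpha_i) on the diagonal.
D = np.zeros((m, n))
D[:r, :r] = np.diag(np.sqrt(alphas[:r]))

print(np.allclose(A, U @ D @ V.T))                 # True: A = U D V^T
print(np.allclose(U.T @ U, np.eye(m)), np.allclose(V.T @ V, np.eye(n)))
```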

Lemma 2. For any $m \times n$ matrix $A$,
$$
\mathrm{rank}(A^T A) = \mathrm{rank}(A).
$$

Proof. Since $x^T A^T A x = |Ax|^2$, it is easy to see that $Ax = 0$ if and only if $A^T A x = 0$. Thus $\ker(A) = \ker(A^T A)$, i.e. $\mathrm{nullity}(A) = \mathrm{nullity}(A^T A)$. Since $\mathrm{rank} + \mathrm{nullity} = n$ for both matrices, the claim follows. $\square$
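A quick numerical illustration of Lemma 2, on a deliberately rank-deficient hypothetical matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))   # a 6 x 4 matrix of rank 2
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))   # 2 2
```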
