
Advanced Section #4:

Methods of Dimensionality Reduction:


Principal Component Analysis (PCA)

Cedric Flamant

CS109A Introduction to Data Science


Pavlos Protopapas, Kevin Rader, and Chris Tanner

Outline

1. Introduction:
a. Why Dimensionality Reduction?
b. Linear Algebra (Recap).
c. Statistics (Recap).

2. Principal Component Analysis:


a. Foundation.
b. Assumptions & Limitations.
c. Kernel PCA for nonlinear dimensionality reduction.

Dimensionality Reduction, why?

A process of reducing the number of predictor variables under consideration.

To find a more meaningful basis in which to express our data, filtering out the noise and revealing the hidden structure.

C. Bishop, Pattern Recognition and Machine Learning, Springer (2008).
A simple example taken from Physics
Consider an ideal spring-mass system oscillating along x.
We seek the pressure Y that the spring exerts on the wall.

LASSO regression model: $\hat{\beta} = \arg\min_{\beta} \, \lVert Y - X\beta \rVert_2^2 + \lambda \sum_{j} \lvert \beta_j \rvert$

LASSO variable selection: the $\ell_1$ penalty shrinks coefficients exactly to zero, so among several redundant measurements of the same displacement LASSO tends to retain only one.
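As an illustration of this selection behavior (not from the original slides), here is a minimal synthetic sketch: several nearly identical measurements of the same displacement are offered to LASSO, which typically keeps only one of them. The data and the `alpha` value are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
t = rng.normal(size=200)                         # true 1-D displacement of the mass
# three noisy, redundant measurements of the same displacement
X = np.column_stack([t + 0.01 * rng.normal(size=200) for _ in range(3)])
y = 2.0 * t + 0.1 * rng.normal(size=200)         # response driven by the displacement only

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)   # typically only one of the three coefficients is clearly nonzero
```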

J. Shlens, A Tutorial on Principal Component Analysis (2003).
Principal Component Analysis versus LASSO

LASSO:
✗ LASSO simply selects one of the (arbitrary) redundant directions, which is scientifically unsatisfactory.
✗ We want to use all the measurements to situate the position of the mass.
✗ We want to find a lower-dimensional manifold of predictors on which the data lie.

✓ Principal Component Analysis (PCA):
A powerful statistical tool for analyzing data sets, formulated in the context of Linear Algebra.
Linear Algebra (Recap)

Symmetric matrices
Consider a design (or data) matrix $X \in \mathbb{R}^{n \times p}$ consisting of $n$ observations and $p$ predictors.

Then $X^\top X$ is a $p \times p$ symmetric matrix.

Symmetric: $(X^\top X)^\top = X^\top (X^\top)^\top = X^\top X$,
using that $(AB)^\top = B^\top A^\top$.

Similarly for $X X^\top$.

Eigenvalues and Eigenvectors
For a real and symmetric matrix $A = A^\top \in \mathbb{R}^{p \times p}$:
there exists a set of real eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_p$
and associated eigenvectors $v_1, v_2, \dots, v_p$

such that:
$A v_i = \lambda_i v_i$
$v_i^\top v_j = 0$ for $i \neq j$ (orthogonal)
$v_i^\top v_i = 1$ (normalized)
➢ Hence, they form an orthonormal basis.

Spectrum and Eigen-decomposition

Spectrum: $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_p)$

Orthogonal matrix: $V = [\, v_1 \; v_2 \; \cdots \; v_p \,]$, with $V^\top V = V V^\top = I$

Eigen-decomposition: $A = V \Lambda V^\top$
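These identities are easy to verify numerically. A minimal sketch, assuming NumPy and an arbitrary random matrix as the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
A = X.T @ X                              # real, symmetric matrix

eigvals, V = np.linalg.eigh(A)           # eigh is specialized for symmetric matrices
Lam = np.diag(eigvals)                   # the spectrum as a diagonal matrix

print(np.allclose(V.T @ V, np.eye(4)))   # columns of V form an orthonormal basis
print(np.allclose(A, V @ Lam @ V.T))     # eigen-decomposition A = V Lambda V^T
```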

Real & Positive Eigenvalues: Gram Matrix
● The eigenvalues of $X^\top X$ are non-negative real numbers:
for a normalized eigenvector $v_i$, $\lambda_i = v_i^\top X^\top X v_i = \lVert X v_i \rVert^2 \geq 0$

Similarly for $X X^\top$.

● Hence, $X^\top X$ and $X X^\top$ are positive semi-definite.

Same eigenvalues

● $X^\top X$ and $X X^\top$ share the same nonzero eigenvalues:
if $X^\top X \, v = \lambda v$, then $X X^\top (X v) = \lambda \, (X v)$.

Same eigenvalues $\lambda$; transformed eigenvectors $u = X v$.

The sum of the eigenvalues of $X^\top X$ is equal to its trace

● Cyclic property of the trace: $\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$ for matrices of compatible dimensions.

Using the eigen-decomposition $X^\top X = V \Lambda V^\top$:
$\mathrm{tr}(X^\top X) = \mathrm{tr}(V \Lambda V^\top) = \mathrm{tr}(\Lambda V^\top V) = \mathrm{tr}(\Lambda) = \sum_{i=1}^{p} \lambda_i$

● The trace of a Gram matrix is the sum of its eigenvalues.
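A small numerical check of the last three facts (non-negative eigenvalues, shared eigenvalues, trace equal to their sum), assuming NumPy and arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))              # n = 6 observations, p = 3 predictors

G_p = X.T @ X                            # p x p Gram matrix
G_n = X @ X.T                            # n x n Gram matrix

lam_p = np.linalg.eigvalsh(G_p)          # eigenvalues in ascending order
lam_n = np.linalg.eigvalsh(G_n)

print(np.all(lam_p >= -1e-10))                    # non-negative (positive semi-definite)
print(np.allclose(lam_n[-3:], lam_p))             # the p nonzero eigenvalues are shared
print(np.isclose(np.trace(G_p), lam_p.sum()))     # trace equals the sum of the eigenvalues
```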

Statistics (Recap)

Centered Model Matrix

Consider the model (data) matrix $X \in \mathbb{R}^{n \times p}$.

We center the predictors (so that each column has zero sample mean) by subtracting the column means $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}$.

Centered model matrix: $\tilde{X}_{ij} = X_{ij} - \bar{x}_j$

Sample Covariance Matrix
Consider the sample covariance matrix: $S = \frac{1}{n-1} \tilde{X}^\top \tilde{X}$

Inspecting the terms:

➢ The diagonal terms are the sample variances: $S_{jj} = \frac{1}{n-1} \sum_{i=1}^{n} (X_{ij} - \bar{x}_j)^2$

➢ The off-diagonal terms are the sample covariances: $S_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (X_{ij} - \bar{x}_j)(X_{ik} - \bar{x}_k)$
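A minimal sketch of centering and forming the sample covariance matrix with NumPy; the synthetic data are only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3)) * np.array([1.0, 2.0, 0.5])   # n = 50, p = 3, unequal scales

X_mean = X.mean(axis=0)
Xc = X - X_mean                          # centered model matrix: each column has zero mean
S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix

print(np.allclose(S, np.cov(X, rowvar=False)))   # agrees with NumPy's estimator
print(np.diag(S))                                # diagonal entries: sample variances
```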

Principal Components Analysis (PCA)

PCA

PCA tries to fit an ellipsoid to the data.

PCA is a linear transformation that maps the data to a new coordinate system.

The direction of greatest variance becomes the first axis (the first principal component), the direction of next-greatest variance the second, and so on.

PCA reduces the dimension by discarding the low-variance principal components.
J. Jauregui (2012).
PCA foundation

Note that the sample covariance matrix $S$ is symmetric, so it admits an orthonormal eigenbasis:
$S v_i = \lambda_i v_i$, with $v_i^\top v_j = \delta_{ij}$.

The eigenvalues can be sorted as $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0$.

The eigenvector $v_i$ is called the $i$th principal component of $\tilde{X}$.

Measure the importance of the principal components

The total sample variance of the predictors: $\mathrm{tr}(S) = \sum_{i=1}^{p} \lambda_i$

The fraction of the total sample variance that corresponds to $\lambda_i$: $f_i = \lambda_i \big/ \sum_{j=1}^{p} \lambda_j$,

so $f_i$ indicates the “importance” of the $i$th principal component.
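Putting the last two slides together, a hedged sketch of PCA "from scratch" via the eigen-decomposition of the sample covariance matrix (synthetic data, NumPy only):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # correlated predictors

Xc = X - X.mean(axis=0)                  # center the predictors
S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix

eigvals, V = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # sort so that lambda_1 >= lambda_2 >= ...
eigvals, V = eigvals[order], V[:, order]

fractions = eigvals / eigvals.sum()      # importance f_i of each principal component
scores = Xc @ V                          # the data expressed in the principal-component basis
print(fractions)
```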

Back to spring-mass example

PCA finds a single dominant eigenvalue, $\lambda_1 \gg \lambda_2, \dots, \lambda_p$, revealing the one degree of freedom (the oscillation along x).

Hence, PCA indicates that there may be fewer variables that are essentially responsible for the variability of the response.

PCA Dimensionality Reduction
The spectrum (the sorted eigenvalues $\lambda_1 \geq \dots \geq \lambda_p$) shows how much variance each principal component explains, and therefore how far PCA can reduce the dimension.

PCA Dimensionality Reduction
There is no fixed rule for how many eigenvalues to keep; the cutoff is usually clear from the spectrum and is left to the analyst's discretion.

C. Bishop, Pattern Recognition and Machine Learning, Springer (2008).
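In practice a common (but still discretionary) heuristic is to keep enough components to reach some target fraction of the total variance. A minimal sketch using scikit-learn's `PCA`; the 95% threshold is an arbitrary choice, not a rule from the slides:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))   # synthetic correlated data

pca = PCA().fit(X)                                   # fit all 10 components
cum_var = np.cumsum(pca.explained_variance_ratio_)   # cumulative fraction of variance
k = int(np.searchsorted(cum_var, 0.95)) + 1          # smallest k reaching 95% of the variance

print(cum_var)
print(f"keep k = {k} components")
```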

PCA Dimensionality Reduction
An example on leaves (thanks to Chris Rycroft, AM205)

PCA Dimensionality Reduction

The average leaf

(Why do we need this? Recall that PCA is applied to centered data, so the average leaf is subtracted from every image before the components are computed.)

PCA Dimensionality Reduction
First three principal components (the figure shows the positive and negative regions of each component).

PCA Dimensionality Reduction – Keeping up to k Components
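The slide's figure (reconstructions of the leaves with increasing numbers of components) is not reproduced here; the following hedged sketch shows the underlying operation of projecting onto the first k principal components and reconstructing, using scikit-learn on synthetic data as a stand-in for the leaf images:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20)) @ rng.normal(size=(20, 20))   # stand-in for flattened leaf images

k = 3
pca = PCA(n_components=k).fit(X)
Z = pca.transform(X)                     # coordinates in the first k principal components
X_hat = pca.inverse_transform(Z)         # reconstruction from only k components (plus the mean)

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error with k = {k}: {rel_err:.3f}")
```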

Assumptions of PCA

Although PCA is a powerful tool for dimension reduction, it is based on some strong assumptions.

The assumptions are reasonable, but they must be checked in practice before drawing conclusions from PCA.

When the PCA assumptions fail, we need to use other linear or nonlinear dimension reduction methods.

Mean/Variance are sufficient
In applying PCA, we assume that the means and the covariance matrix are sufficient for describing the distributions of the predictors.
This is exactly true only if the predictors are drawn from a multivariate Normal distribution, but it works approximately in many situations.

When a predictor deviates heavily from being Normally distributed, an appropriate nonlinear transformation may solve this problem.
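For example (a minimal sketch, not from the slides), a log transform is a common choice for a heavily right-skewed, strictly positive predictor:

```python
import numpy as np

rng = np.random.default_rng(6)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # strictly positive, right-skewed predictor

transformed = np.log(skewed)     # exactly Normal here; approximately so for many real predictors
print(skewed.mean(), skewed.std())
print(transformed.mean(), transformed.std())
```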

Figure: multivariate normal distribution (Wikipedia).
High Variance indicates importance

Assumption: the eigenvalue $\lambda_i$ measures the “importance” of the $i$th principal component.

It is intuitively reasonable that lower-variability components describe the data less well, but this is not always true.

Principal Components are orthogonal

PCA assumes that the intrinsic dimensions are orthogonal.

When this assumption fails, we need to allow non-orthogonal components, which are not compatible with PCA.

Figure: Balaji Pitchai Kannu (on Quora).


Linear Change of Basis

PCA assumes that data lie on a lower dimensional linear manifold.

Figures (linear vs. nonlinear manifold): projectrhea.org; Alexsei Tiulpin.

When the data lie on a nonlinear manifold in the predictor space, linear methods are likely to be ineffective.
Kernel PCA for Nonlinear Dimensionality Reduction

Applying a nonlinear map $\Phi$ (called the feature map) to the data yields the PCA kernel:
$K_{ij} = \Phi(x_i)^\top \Phi(x_j) = k(x_i, x_j)$

Centered nonlinear representation:
$\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$, where $\mathbf{1}_n$ is the $n \times n$ matrix with every entry equal to $1/n$.

Apply PCA to the modified (centered) kernel $\tilde{K}$.
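A hedged sketch of these steps with an RBF kernel (NumPy plus scikit-learn's `rbf_kernel` helper; the data and `gamma` are arbitrary choices). In practice `sklearn.decomposition.KernelPCA` performs the same computation:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 2))

K = rbf_kernel(X, gamma=1.0)                  # K_ij = k(x_i, x_j) = Phi(x_i) . Phi(x_j)

n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)              # the n x n matrix with entries 1/n
K_tilde = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # centering in feature space

eigvals, A = np.linalg.eigh(K_tilde)          # eigenvectors of the centered kernel
order = np.argsort(eigvals)[::-1]
eigvals, A = eigvals[order], A[:, order]

# projections of the training points onto the first two nonlinear principal components
Z = A[:, :2] * np.sqrt(np.maximum(eigvals[:2], 0.0))
print(Z.shape)
```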

Figure: Alexsei Tiulpin.
Summary
• Dimensionality Reduction Methods
1. A process of reducing the number of predictor variables under consideration.
2. To find a more meaningful basis in which to express our data, filtering out the noise and revealing the hidden structure.

• Principal Component Analysis


1. A powerful statistical tool for analyzing data sets, formulated in the context of Linear Algebra.
2. Spectral decomposition: we reduce the dimension of the predictors by keeping only the principal components with the largest eigenvalues.
3. PCA is based on strong assumptions that we need to check.
4. Kernel PCA for nonlinear dimensionality reduction.

Advanced Section 4: Dimensionality Reduction, PCA

Thank you

