Lecture 3

The document discusses Principal Component Analysis (PCA) as a method for dimension reduction in high-dimensional data analysis, highlighting its mathematical principles and applications. It covers the fundamentals of linear algebra necessary for understanding PCA, including eigenvalues, eigenvectors, and matrix transformations. Additionally, it addresses the limitations of PCA, particularly in handling nonlinear relationships in complex datasets.


Principal component analysis

Outline: Motivation, Dimension reduction, Refresher on Linear Algebra, Principal component analysis, PCA with Python, Limits of PCA

High dimensional data in data analysis?

Word embeddings in NLP
Brain activity
Challenges?
Visualize
Group in relevant clusters
Difficult with high dimensional data!
A classical dimension reduction approach: Principal Component Analysis

Dimension reduction

Dimension reduction without loss of information?



Dimension reduction

Scientific questions
How can we reduce dimension to separate observations?
Possible answer: Principal Component Analysis

Dimension reduction

Main features of Principal Component Analysis (PCA)


Preserves the global structure of the data
Maps all the clusters as a whole
Potential applications: noise filtering, feature extraction, stock market prediction, and gene data analysis

Refresher on Linear Algebra

Vectors of $\mathbb{R}^p$
$\mathbb{R}^p$ is the set of vectors with $p$ components.
For example, $X = \begin{pmatrix} 1 \\ -3 \\ 4 \end{pmatrix}$ is a 3-component vector.
We can also say that $X$ belongs to $\mathbb{R}^3$.

Concept of basis
The family $(X_1, \cdots, X_p)$ is a basis of $\mathbb{R}^p$ if each vector of $\mathbb{R}^p$ can be expressed in a unique way as a linear combination of $X_1, \cdots, X_p$.

Refresher on Linear Algebra

Example 1
$\left( \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right)$ is a basis of $\mathbb{R}^2$.
Indeed, every $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ can be expressed in a unique way as
$$X = x_1 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + x_2 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
Example with $X = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$:
$$X = 2 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 3 \cdot \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Refresher on Linear Algebra

Example 2
$\left( \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right)$ is a basis of $\mathbb{R}^2$.
Indeed, every $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ can be expressed in a unique way as
$$X = \frac{x_1 + x_2}{2} \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \frac{x_1 - x_2}{2} \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
Example with $X = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$:
$$X = 2.5 \cdot \begin{pmatrix} 1 \\ 1 \end{pmatrix} + 0.5 \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
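As a quick numerical check of Example 2, a minimal NumPy sketch: finding the coordinates in the new basis amounts to solving a small linear system.

```python
import numpy as np

# Basis vectors of Example 2, stacked as the columns of B.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]])
X = np.array([3.0, 2.0])

# The coordinates c of X in this basis solve B @ c = X.
c = np.linalg.solve(B, X)
print(c)  # [2.5 0.5], matching X = 2.5*(1,1) + 0.5*(1,-1)
```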

Refresher on Linear Algebra

Matrices
A square matrix with $p$ rows and $p$ columns is an array of real numbers.
Such a matrix maps vectors of $\mathbb{R}^p$ to vectors of $\mathbb{R}^p$.
It can be interpreted as a linear transformation of the plane in the case $p = 2$.

Refresher on Linear Algebra

Matrices
If we are given a matrix $M$, the transformation may not be so simple to identify!
What is the transformation associated to $M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$?
$Y = M \cdot X$ with $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ means
$$\begin{cases} y_1 = 2x_1 + x_2 \\ y_2 = x_1 + 2x_2 \end{cases}$$
The transformation is explicit but the geometric interpretation is not so clear.
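To make the transformation concrete, a minimal NumPy sketch applying $M$ to a few test vectors (the vectors chosen are illustrative):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Apply the transformation Y = M X to a few test vectors.
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(x, "->", M @ np.array(x))
# (1,1) is mapped to (3,3): that direction is simply stretched by 3,
# a first hint at the eigenvectors introduced below.
```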

Refresher on Linear Algebra

Simpler with a smart change of coordinates?

Refresher on Linear Algebra

Eigenvalues and eigenvectors

Let $A \in M_d(\mathbb{R})$.
The vector $X \in \mathbb{R}^d \setminus \{0\}$ is said to be an eigenvector of the matrix $A$ associated to the eigenvalue $\lambda$ if $AX = \lambda X$.

Refresher on Linear Algebra

Example 3
Let $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $X = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
Since $AX = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3X$, $X$ is an eigenvector of $A$ with associated eigenvalue 3.
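We can check Example 3 numerically; a minimal sketch with NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and the (normalized)
# eigenvectors as the columns of the second output.
vals, vecs = np.linalg.eig(A)
print(vals)  # 3 and 1 (order not guaranteed)
print(vecs)  # columns proportional to (1, 1) and (1, -1)
```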

Refresher on Linear Algebra

Diagonalizable matrices
The square matrix $A$ with $p$ columns and $p$ rows is said to be diagonalizable if there exists a family $(X_1, \cdots, X_p)$ such that
Condition 1: $(X_1, \cdots, X_p)$ is a basis of $\mathbb{R}^p$
Condition 2: for each $i$, $X_i$ is an eigenvector of $A$
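For the matrix $A$ of Example 3 the eigenvectors do form a basis of $\mathbb{R}^2$, so $A$ is diagonalizable; a minimal sketch checking that $P^{-1} A P$ is diagonal:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, P = np.linalg.eig(A)     # eigenvectors as the columns of P
D = np.linalg.inv(P) @ A @ P   # change of basis to the eigenvector basis
print(np.round(D, 10))         # diagonal matrix with eigenvalues 3 and 1
```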

Principal Component Analysis


Principle

How can we perform dimension reduction in a linear way?

Mathematical tool: linear projection onto a low-dimensional space

Principal Component Analysis


Principle

How can we find the low-dimensional space $H$?

Principal Component Analysis


Principle

Principal component analysis: how does it work?

The $k$-dimensional space $H$ that we are looking for is spanned by the $k$ eigenvectors $u_\alpha$ associated to the $k$ largest eigenvalues $\lambda_\alpha$ of the matrix $X^T X$.
We have several possible choices for the matrix $X$:
General PCA: $X$ is the raw data matrix $R$
Centered PCA: $X$ is the centered data matrix; $X^T X$ is then the matrix of empirical covariances
Normed PCA: $X$ is the normed and centered data matrix; $X^T X$ is then the matrix of empirical correlations
Projection of the observation $o_i$ on the axis $\alpha$: its coordinate is the scalar product $\langle o_i, u_\alpha \rangle$
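A minimal NumPy sketch of centered PCA following this recipe; the data matrix here is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # n = 100 observations, d = 5 variables

Xc = X - X.mean(axis=0)              # centered PCA: subtract the column means
C = Xc.T @ Xc                        # d x d matrix (proportional to covariances)

vals, vecs = np.linalg.eigh(C)       # eigh: symmetric matrices, ascending order
order = np.argsort(vals)[::-1]       # reorder, largest eigenvalues first
vals, vecs = vals[order], vecs[:, order]

k = 2
U = vecs[:, :k]                      # the k leading eigenvectors u_alpha span H
scores = Xc @ U                      # coordinate of each o_i on each axis alpha
print(scores.shape)                  # (100, 2)
```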

Principal Component Analysis


Principle

Principal component analysis: how does it work?

In general $n \gg d$ (number of observations $\gg$ number of initial variables).
This is the reason why we deal with the matrix $X^T X$, of dimension $d \times d$, rather than $XX^T$, of dimension $n \times n$.
The two analyses are linked: $X^T X$ and $XX^T$ share the same nonzero eigenvalues.
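A minimal sketch illustrating the link on a toy matrix: the nonzero eigenvalues of $X^T X$ and $XX^T$ coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))                            # toy case: n = 8, d = 3

print(np.round(np.linalg.eigvalsh(X.T @ X), 6))        # the d = 3 eigenvalues
print(np.round(np.linalg.eigvalsh(X @ X.T)[-3:], 6))   # same values; the other
# n - d eigenvalues of X X^T are (numerically) zero
```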

PCA with Python


An example

To illustrate PCA we consider a dataset containing gene expression profiles for 105 breast tumour samples measured using Swegene Human 27K RAP UniGene188 arrays.
Within the population of cells, one can focus on the expression of GATA3 and XBP1, whose expression was known to correlate with estrogen receptor status¹.

¹ Breast cancer cells may be estrogen receptor positive, ER+, or negative, ER−, indicating the capacity to respond to estrogen signalling, which can therefore influence treatment.

PCA with Python


An example

We plot the expression levels of GATA3 and XBP1 against one another to visualise the data in the two-dimensional space.
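A minimal plotting sketch; the file name and column names are hypothetical stand-ins for the actual expression data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file: one row per tumour sample, one column per gene.
expr = pd.read_csv("breast_tumour_expression.csv")

plt.scatter(expr["GATA3"], expr["XBP1"])
plt.xlabel("GATA3 expression")
plt.ylabel("XBP1 expression")
plt.show()
```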

PCA with Python


An example

We perform PCA and visualise the result by plotting the original data side-by-side with the transformed data.

[Figure: original data versus transformed data]
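Continuing the previous sketch, one possible way to do this with scikit-learn (`expr` is the hypothetical table loaded above):

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

X = expr[["GATA3", "XBP1"]].to_numpy()   # (n_samples, 2)

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                 # centred and rotated coordinates

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(9, 4))
ax0.scatter(X[:, 0], X[:, 1])
ax0.set_title("Original data")
ax1.scatter(Z[:, 0], Z[:, 1])
ax1.set_title("PCA-transformed data")
plt.show()
```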

PCA with Python


An example

We have simply rotated the original data, so that the greatest variance aligns along the x-axis and so forth.
We can find out how much of the variance each of the principal components explains.
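With scikit-learn this is one attribute away (continuing the sketch above; the printed numbers are illustrative):

```python
# Fraction of the total variance explained by each principal component.
print(pca.explained_variance_ratio_)
# e.g. [0.95, 0.05]: PC1 would carry ~95% of the variance
```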

PCA with Python


An example

PC1 explains the vast majority of the variance in the observations.
The dimensionality reduction step of PCA occurs when we choose to discard the later PCs.
We visualise the data using only PC1.
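A sketch of the reduction step, keeping only PC1 (again on the hypothetical `X` above):

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

pca1 = PCA(n_components=1)
z = pca1.fit_transform(X).ravel()    # one coordinate per sample

# One-dimensional strip plot of the samples along PC1.
plt.scatter(z, np.zeros_like(z))
plt.xlabel("PC1")
plt.yticks([])
plt.show()
```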

Limits of PCA
An example

Principal component analysis is not always appropriate for complex datasets, particularly when dealing with nonlinearities.
To illustrate this, let's consider a simulated expression set containing 8 genes, with 10 timepoints/conditions.

Limits of PCA
An example

The data can be separated out by a single direction.
The data from time/condition 1 through to time/condition 10 can be ordered.
Intuitively, the data can be represented by a single dimension.
We run PCA as we would normally and visualise the result, plotting the first two PCs (see the sketch below).
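The simulated expression set is not reproduced here, so the sketch below builds a hypothetical stand-in with the same flavour: 8 genes whose activation peaks move smoothly across 10 timepoints, a classic setting for the horseshoe:

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

t = np.linspace(0.0, 1.0, 10)             # 10 timepoints/conditions
centres = np.linspace(0.0, 1.0, 8)        # one activation peak per gene

# Rows = conditions, columns = genes; each gene peaks at its own time.
X = np.exp(-(t[:, None] - centres[None, :]) ** 2 / 0.02)

Z = PCA(n_components=2).fit_transform(X)
plt.plot(Z[:, 0], Z[:, 1], "o-")
for i, (a, b) in enumerate(Z, start=1):   # label conditions 1..10
    plt.annotate(str(i), (a, b))
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```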

[Figure: the first two PCs of the simulated expression data]

Limits of PCA
An example

We see that the PCA plot has placed the datapoints in a horseshoe shape, with condition/time point 1 very close to condition/time point 10.
From the earlier plots of gene expression profiles we can see that the relationships between the various genes are not entirely straightforward.
For example, gene 1 is initially correlated with gene 2, then negatively correlated, and finally uncorrelated, whilst no correlation exists between gene 1 and genes 5-8.
These nonlinearities make it difficult for PCA which, in general, attempts to preserve large pairwise distances, leading to the well-known horseshoe effect.

Limits of PCA
Pros and cons of PCA

Main advantages of PCA


Simple to implement, no tuning
Highly interpretable: we can decide how much variance to preserve using the eigenvalues

Main drawbacks of PCA


It is a global transform which may not preserve local structure (clusters)
It is sensitive to outliers
