Presentation A I STD 2
u1 is a new axis.
The data has maximum dispersal along this axis.
Using a new axis
Data (n samples as rows, d variables as columns):

$X = \begin{bmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{bmatrix}_{n \times d}$
Projection of Data

2-dimensional data:
$X = \begin{bmatrix} x_{11} & x_{12} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}$

$u$ is a unit vector: $\|u\| = 1$

Data point 1: $x_1 = [\, x_{11} \;\; x_{12} \,]$

Task: project $x_1$ on $u$.

$p = \dfrac{x_1 \cdot u}{\|u\|} = \dfrac{x_1 \cdot u}{1} = x_1 \cdot u$

$p = x_1 \cdot u = [\, x_{11} \;\; x_{12} \,] \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = x_{11} u_1 + x_{12} u_2$
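A minimal NumPy sketch of this projection; the point x1 and the direction u below are made-up illustrative values, not from the slides:

```python
import numpy as np

# Project one 2-D data point onto a unit vector: p = x1 . u
x1 = np.array([3.0, 4.0])        # data point x1 = [x11, x12] (illustrative)
u = np.array([1.0, 1.0])
u = u / np.linalg.norm(u)        # normalize so that ||u|| = 1

p = x1 @ u                       # p = x11*u1 + x12*u2
print(p)                         # scalar projection of x1 along u, about 4.95
```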
Mathematics of PCA
Data (n samples as rows, d variables as columns): $X \in \mathbb{R}^{n \times d}$

Unit vector:
$u = \begin{bmatrix} u_1 \\ \vdots \\ u_d \end{bmatrix} \in \mathbb{R}^{d \times 1}$

Projection of the data X on u:
$p = Xu \in \mathbb{R}^{n \times 1}$
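The same operation on a full data matrix, sketched with arbitrary random data just to show the shapes:

```python
import numpy as np

# Project an n x d data matrix X onto a single unit vector u (purely illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))    # n = 100 samples, d = 3 variables
u = np.array([1.0, 2.0, -1.0])
u = u / np.linalg.norm(u)        # unit vector, d x 1

p = X @ u                        # p = Xu, one projection per sample
print(p.shape)                   # (100,)
```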
Covariance Matrix
• The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other, or in other words, to see whether there is any relationship between them.
• Sometimes variables are highly correlated in such a way that they contain redundant information. In order to identify these correlations, we compute the covariance matrix.
What do the covariances that we have as entries of
the matrix tell us about the correlations between the
variables?
2-dimensional data:
$X = \begin{bmatrix} x_{11} & x_{12} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}$

Centred data:
$X_c = \begin{bmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 \\ \vdots & \vdots \\ x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 \end{bmatrix}$

$\mathrm{Cov}(X) = \dfrac{1}{n-1} X_c^{T} X_c = \begin{bmatrix} \mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) \\ \mathrm{cov}(x_1, x_2) & \mathrm{var}(x_2) \end{bmatrix}$
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$ (n samples, d variables), centred so each column has zero mean.

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

$S = \begin{bmatrix} \mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_d) \\ \mathrm{cov}(x_2, x_1) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_d, x_1) & \mathrm{cov}(x_d, x_2) & \cdots & \mathrm{var}(x_d) \end{bmatrix}$
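A small sketch of building S from centred data; the comparison with np.cov is only a sanity check, and the data are arbitrary:

```python
import numpy as np

# Covariance matrix S = Xc^T Xc / (n - 1), with samples in rows and variables in columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))               # n = 50, d = 4 (illustrative data)

Xc = X - X.mean(axis=0)                    # center each variable (column)
S = Xc.T @ Xc / (X.shape[0] - 1)           # d x d covariance matrix

# np.cov treats rows as variables by default, hence rowvar=False.
print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```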
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

S is a square matrix ($d \times d$).
S is a symmetric matrix: $S = S^{T}$.
All eigenvalues of S are non-negative.
All eigenvectors of S are orthogonal.
It is the eigenvectors and eigenvalues of this matrix that are behind all the magic of principal components.
The eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (the most information); these directions are what we call Principal Components.
The eigenvalues are simply the coefficients attached to the eigenvectors, and give the amount of variance carried by each Principal Component.
By ranking the eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance.
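A sketch of extracting and ranking the eigenvectors of S with NumPy; the data are random and only for illustration:

```python
import numpy as np

# The eigenvectors of the covariance matrix, ranked by eigenvalue, are the
# principal components in order of significance.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)       # eigh: for symmetric matrices, ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # rank highest -> lowest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                             # variance carried by each principal component
print(eigvecs[:, 0])                       # first principal component (direction of most variance)
```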
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

If $n > d$ and all columns of X are linearly independent:
$\lambda_i > 0$ for $i = 1, 2, \ldots, d$
S has d eigenvectors.
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

If $n < d$, at least one eigenvalue $\lambda_i$ is 0.
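A quick numerical check of this claim on made-up data (any n < d works):

```python
import numpy as np

# With fewer samples than variables (n < d), centred data has rank at most n - 1,
# so the covariance matrix has at least d - (n - 1) zero eigenvalues.
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 6))                # n = 4 samples, d = 6 variables
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)
print(np.sum(eigvals < 1e-10))             # at least 3 eigenvalues are (numerically) zero
```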
Mathematics of PCA
Data (n samples as rows, d variables as columns): $X \in \mathbb{R}^{n \times d}$

Unit vector: $u \in \mathbb{R}^{d \times 1}$

Projection of X on u: $p = Xu$

Objective function: $\underset{u}{\arg\max}\,[\operatorname{var}(Xu)] = \underset{u}{\arg\max}\,[u^{T} S u]$
Constraint: $u^{T} u = 1$
PCA: An Optimization Problem

Objective function: $\underset{u}{\arg\max}\,[\operatorname{var}(Xu)] = \underset{u}{\arg\max}\,[u^{T} S u]$
Constraint: $u^{T} u = 1$

Solution: $Su = \lambda u$
$\lambda$: eigenvalue of S
$u$: eigenvector of S

The variance of the projected data is maximum when the unit vector u is an eigenvector of the covariance matrix S of the data.
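A quick sanity check of this result, comparing $u^{T} S u$ for the top eigenvector against random unit directions (arbitrary toy data, purely illustrative):

```python
import numpy as np

# var(Xu) = u^T S u is largest when u is the eigenvector of S with the largest eigenvalue.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) * np.array([3.0, 1.0, 0.5])   # anisotropic toy data
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
u_best = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

for _ in range(1000):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)                 # random unit direction
    assert u @ S @ u <= u_best @ S @ u_best + 1e-12   # never beats the eigenvector
print("top eigenvector maximizes the projected variance")
```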
The Principal Components

$Su = \lambda u$
S is a $d \times d$ matrix, with eigenvalues $\lambda_i$ and eigenvectors $u_i$, $i = 1, 2, \ldots, d$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$.

The eigenvector corresponding to $\lambda_1$ is $u_1$. Data projected on $u_1$ has the highest variance: $u_1$ is Principal Component 1.

The eigenvector corresponding to $\lambda_2$ is $u_2$. Data projected on $u_2$ has the second-highest variance: $u_2$ is Principal Component 2.
What Are Principal Components?
Principal components are new variables (axes) that are constructed as linear combinations or mixtures of the initial variables.
These combinations are done in such a way that the new variables (i.e., the principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components.
So the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
• Geometrically, principal components represent the directions of the data that explain a maximal amount of variance, that is to say, the lines that capture most of the information in the data.
• The relationship between variance and information here is that the larger the variance carried by a line, the larger the dispersion of the data points along it, and the larger the dispersion along a line, the more information it has.
• To put all this simply, just think of principal components as new axes that provide the best angle to see and evaluate the data, so that the differences between the observations are better visible.
Selection of Principal Components
$\lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_d$

$\dfrac{\lambda_1}{\sum_{i=1}^{d} \lambda_i} > \dfrac{\lambda_2}{\sum_{i=1}^{d} \lambda_i} > \cdots > \dfrac{\lambda_d}{\sum_{i=1}^{d} \lambda_i}$

$\dfrac{\lambda_k}{\sum_{i=1}^{d} \lambda_i}$ is the fraction of the total variation in the data captured by the k-th principal component.
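A sketch of computing these fractions with NumPy (illustrative data only):

```python
import numpy as np

# Fraction of total variation captured by each principal component:
# lambda_k / sum_i lambda_i, with eigenvalues sorted highest to lowest.
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 4)) * np.array([5.0, 2.0, 1.0, 0.2])
S = np.cov(X, rowvar=False)

eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # lambda_1 >= lambda_2 >= ... >= lambda_d
explained = eigvals / eigvals.sum()              # per-component fraction of total variance
print(explained)
print(np.cumsum(explained))                      # cumulative fraction, used to choose how many PCs to keep
```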
Selection of Principal Components
Projected data on principal components:

$T = X V$

$\underbrace{X}_{n \times d} \; \underbrace{V}_{d \times m} = \underbrace{T}_{n \times m}$

The columns of V are the first m principal components (eigenvectors) $u_1, u_2, \ldots, u_m$. Row i of T gives the coordinates of sample i along PC1, PC2, …, PCm.
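A minimal sketch of the projection T = XV onto the first m principal components; the data are random and only the shapes matter:

```python
import numpy as np

# Project the (centred) data onto the first m principal components: T = X V.
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]

m = 2
V = eigvecs[:, order[:m]]                  # d x m: top-m eigenvectors as columns
T = Xc @ V                                 # n x m: row i = coordinates of sample i on PC1, PC2
print(T.shape)                             # (100, 2)
```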
Key points
1. PCA projects higher-dimensional data to lower dimensions while preserving the trends and patterns in the data.
2. Data projected on those new dimensions/axes captures most of the variation in the data.
3. These axes are orthogonal to each other.
4. These axes are called Principal Components.
5. The eigenvectors of the covariance matrix of the data are the Principal Components.
6. Order the eigenvectors based on their eigenvalues.
7. Select the first few eigenvectors with the highest eigenvalues.
8. Project the data on those selected eigenvectors or principal components (see the sketch after this list).
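Putting the key points together, a minimal end-to-end sketch; the function name `pca` and the random data are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def pca(X, m):
    """Minimal PCA following the key points above: center the data, form the
    covariance matrix, take its top-m eigenvectors, and project. Assumes
    samples in rows and variables in columns."""
    Xc = X - X.mean(axis=0)                    # center each variable
    S = Xc.T @ Xc / (X.shape[0] - 1)           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigen-decomposition of the symmetric S
    order = np.argsort(eigvals)[::-1]          # order eigenvectors by eigenvalue, high -> low
    V = eigvecs[:, order[:m]]                  # keep the first m principal components
    return Xc @ V, eigvals[order]              # projected data (n x m), sorted eigenvalues

# Illustrative use on random 10-dimensional data:
T, eigvals = pca(np.random.default_rng(7).normal(size=(100, 10)), m=2)
print(T.shape, eigvals[:2])
```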
Visualization of Principal Components
Note:
Standardization: if there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate a variable that ranges between 0 and 1), which will lead to biased results. Transforming the data to comparable scales prevents this problem.
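A small sketch of this standardization step (z-scoring each variable); the two made-up variables mimic the 0–100 and 0–1 ranges mentioned above:

```python
import numpy as np

# Standardize each variable to zero mean and unit variance before PCA,
# so a variable ranging 0-100 cannot dominate one ranging 0-1.
rng = np.random.default_rng(8)
X = np.column_stack([rng.uniform(0, 100, size=50),   # wide-range variable
                     rng.uniform(0, 1, size=50)])    # narrow-range variable

X_std = (X - X.mean(axis=0)) / X.std(axis=0)         # z-score each column
print(X_std.mean(axis=0).round(6))                   # ~[0, 0]
print(X_std.std(axis=0))                             # [1, 1]
```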
Find the covariance matrix for the following sample data.

Sample variance: $\mathrm{var}(x) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample covariance: $\mathrm{cov}(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Sample    X     Y      Z
1         15    12.5   50
2         35    15.8   55
3         20     9.3   70
4         14    20.1   65
5         28     5.2   80

n = 5
$\bar{x} = 22.4$,  var(X) = 321.2 / (5 − 1) = 80.3
$\bar{y} = 12.58$, var(Y) = 132.148 / 4 = 33.037
$\bar{z} = 64$,    var(Z) = 570 / 4 = 142.5

cov(X, Y) = $\sum (x_i - 22.4)(y_i - 12.58) / (5 - 1)$ = −13.865
cov(X, Z) = $\sum (x_i - 22.4)(z_i - 64) / (5 - 1)$ = 14.25
cov(Y, Z) = $\sum (y_i - 12.58)(z_i - 64) / (5 - 1)$ = −39.525

The covariance matrix:
$S = \begin{bmatrix} 80.3 & -13.865 & 14.25 \\ -13.865 & 33.037 & -39.525 \\ 14.25 & -39.525 & 142.5 \end{bmatrix}$
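The same worked example can be checked with NumPy, using the five samples from the table above:

```python
import numpy as np

# The five samples from the table; np.cov divides by n - 1, matching the formulas above.
data = np.array([[15, 12.5, 50],
                 [35, 15.8, 55],
                 [20,  9.3, 70],
                 [14, 20.1, 65],
                 [28,  5.2, 80]])

S = np.cov(data, rowvar=False)
print(S.round(3))
# Expected values (up to formatting):
# [[  80.3    -13.865   14.25 ]
#  [ -13.865   33.037  -39.525]
#  [  14.25   -39.525  142.5  ]]
```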