
Principal Component Analysis

Projection to reduce the dimension



Projection from 3D to 2D reduces the dimensionality but causes some loss of information.


Where should we project?
Using a new axis

U1 is a new axis.
The data has maximum dispersion along this axis.
Using a new axis

U2 is another new axis.
U2 captures the rest of the variation in the data.
U2 is orthogonal to U1.
Using a new axis

Project the data on the new axes U1 and U2.


Principal component analysis (PCA)

• Definition: Principal component analysis (PCA) is a dimensionality reduction machine learning technique used to simplify a large data set into a smaller set while still maintaining the significant patterns and trends.
Now…
• Reducing the number of variables of a data set naturally comes at the
expense of accuracy.
• The trick in dimensionality reduction is to trade a little accuracy for
simplicity.
• Smaller data sets are easier to explore and visualize, and thus make
analyzing data points much easier and faster for machine learning
algorithms without extraneous variables to process.

• Reduce the number of variables of a data set while preserving as much information as possible: PCA.
The essence of PCA

 In PCA we project our higher-dimensional data to a new coordinate system.

 Choose only a few of those new dimensions/axes, which explain most of the variation in the data.

 These axes are orthogonal to each other.


How Do You Do a Principal Component Analysis?

1. Standardize the range of the continuous initial variables so that each one of them contributes equally to the analysis. (PCA is quite sensitive to the variances of the initial variables.)
2. Compute the covariance matrix to identify correlations.
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
4. Create a feature vector to decide which principal components to keep.
5. Recast the data along the principal component axes. (A minimal code sketch of these five steps is shown below.)
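A minimal sketch of these five steps using NumPy. This is an illustration rather than the exact procedure from the slides; the function name pca_fit and the choice of n_components are placeholders.

import numpy as np

def pca_fit(X, n_components):
    # 1. Standardize each variable (zero mean, unit variance)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data
    S = np.cov(X_std, rowvar=False)
    # 3. Eigenvalues and eigenvectors of S (eigh is for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # rank components by eigenvalue, highest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the first n_components eigenvectors (the "feature vector" / loading matrix)
    V = eigvecs[:, :n_components]
    # 5. Recast (project) the data along the principal component axes
    T = X_std @ V
    return T, V, eigvals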
Mathematics of PCA

Data: X = [ x_11  …  x_1d
            ⋮    ⋱   ⋮
            x_n1  …  x_nd ]    (n × d; rows = samples, columns = variables)

Project the data X on a vector u such that:

(a) the variance of the projected data is maximum;
(b) u is a unit vector.
Projection of Data

2-dimensional data:
X = [ x_11  x_12
      ⋮     ⋮
      x_n1  x_n2 ]

u is a unit vector: ‖u‖ = 1

Data point 1: x_1 = [ x_11  x_12 ]

Task: project x_1 on u

p = (x_1 · u) / ‖u‖ = x_1 · u            (since ‖u‖ = 1)

p = x_1 · u = [ x_11  x_12 ] [ u_1 ]  = x_11 u_1 + x_12 u_2
                             [ u_2 ]
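A tiny numerical check of this projection; the data point and unit vector below are made-up values for illustration.

import numpy as np

x1 = np.array([3.0, 4.0])                # an illustrative 2-D data point
u = np.array([1.0, 1.0]) / np.sqrt(2.0)  # a unit vector, so ||u|| = 1

p = x1 @ u                               # x_11*u_1 + x_12*u_2
print(p)                                 # 4.949... = (3 + 4) / sqrt(2)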
Mathematics of PCA
Data: X = [ x_11  …  x_1d
            ⋮    ⋱   ⋮
            x_n1  …  x_nd ]    (n × d; rows = samples, columns = variables)

Unit vector: u = [ u_1, …, u_d ]ᵀ    (d × 1)

Projection of the data X on u:

t = Xu = [ t_1, …, t_n ]ᵀ    (n × 1)
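Projecting all n samples at once is a single matrix-vector product; a small illustrative example (X and u here are made up):

import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # n = 3 samples, d = 2 variables
u = np.array([0.6, 0.8])       # a unit vector (0.6**2 + 0.8**2 = 1)

t = X @ u                      # projected data, one value per sample
print(t)                       # [2.2  5.   7.8]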
Covariance matrix

• The aim of this step is to understand how the variables of the input
data set are varying from the mean with respect to each other, or in
other words, to see if there is any relationship between them.
• Sometimes variables are highly correlated in such a way that they contain redundant information. In order to identify these correlations, we compute the covariance matrix.
What do the covariances that we have as entries of the matrix tell us about the correlations between the variables?

• It’s actually the sign of the covariance that matters:


• If positive then: the two variables increase or decrease together
(correlated)
• If negative then: one increases when the other decreases (Inversely
correlated)
• The covariance matrix is no more than a table that summarizes the correlations between all the possible pairs of variables.
 The covariance matrix depicts the variance of each dataset and the covariance of each pair of datasets in matrix format.
 The diagonal elements represent the variances of the datasets and the off-diagonal terms give the covariances between pairs of datasets.
 The variance-covariance matrix is always square, symmetric, and positive semi-definite.
P. Variance (population): var(x) = (1/n) Σ_{i=1}^{n} (x_i − μ)²

S. Variance (sample): var(x) = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)²

For two variables X and Y:

P. Covariance (population): cov(x, y) = (1/n) Σ_{i=1}^{n} (x_i − μ_x)(y_i − μ_y)

S. Covariance (sample): cov(x, y) = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)

x̄ = sample mean, μ = mean of the population data

Covariance Matrix

2-dimensional data:
X = [ x_1  y_1
      ⋮    ⋮
      x_n  y_n ]

Centred data:
B = [ x_1 − x̄   y_1 − ȳ
      ⋮          ⋮
      x_n − x̄   y_n − ȳ ]

Cov(X) = (1/(n−1)) BᵀB = [ var(x)     cov(x, y)
                           cov(x, y)  var(y)   ]
Mathematics of PCA

Data: X = [ x_11  …  x_1d
            ⋮    ⋱   ⋮
            x_n1  …  x_nd ]    (n × d, centred)

Covariance matrix S = cov(X) = (1/(n−1)) XᵀX

S = [ var(x_1)       cov(x_1, x_2)  …  cov(x_1, x_d)
      cov(x_2, x_1)  var(x_2)       …  cov(x_2, x_d)
      ⋮              ⋮                  ⋮
      cov(x_d, x_1)  cov(x_d, x_2)  …  var(x_d)     ]
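A quick check, for an arbitrary small data set, that the centred-data formula agrees with NumPy's built-in sample covariance:

import numpy as np

X = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [4.0, 3.0],
              [2.0, 2.0]])               # n = 4 samples, d = 2 variables

B = X - X.mean(axis=0)                   # centred data
S_manual = B.T @ B / (len(X) - 1)        # (1/(n-1)) B^T B
S_numpy = np.cov(X, rowvar=False)        # NumPy's sample covariance matrix

print(np.allclose(S_manual, S_numpy))    # True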
Mathematics of PCA

Data: X (n × d), covariance matrix S = cov(X) = (1/(n−1)) XᵀX

S is a square matrix (d × d).
S is a symmetric matrix: S = Sᵀ.
All eigenvalues of S are non-negative.
All eigenvectors of S are orthogonal.
 It is the eigenvectors and eigenvalues that are behind all the magic of principal components.
 The eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (most information), and these are what we call Principal Components.
 Eigenvalues are simply the coefficients attached to the eigenvectors, which give the amount of variance carried in each Principal Component.
 By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance (a NumPy sketch follows).
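A minimal NumPy sketch of this eigendecomposition and ranking; the covariance matrix S below is illustrative.

import numpy as np

S = np.array([[80.3, -13.865],
              [-13.865, 33.037]])     # an illustrative 2 x 2 covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)  # eigh handles symmetric matrices
order = np.argsort(eigvals)[::-1]     # highest eigenvalue first
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]           # column i is principal component i

print(eigvals)                        # variance carried by each principal component
print(eigvecs)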
Mathematics of PCA

Data: X (n × d), covariance matrix S = cov(X) = (1/(n−1)) XᵀX

If n > d and all columns of X are linearly independent:
λ_i > 0,  i = 1, 2, …, d
S has d eigenvectors.
Mathematics of PCA

Data: X (n × d), covariance matrix S = cov(X) = (1/(n−1)) XᵀX

If n < d, at least one λ_i is 0.
Mathematics of PCA

Recall the goal: project the data X (n × d) on a vector u such that
(a) the variance of the projected data is maximum, and
(b) u is a unit vector.
PCA: An Optimization Problem

Data: X (n × d)    Unit vector: u (d × 1)

Projection of X on u:  t = Xu

Objective function:  argmax_u [ var(Xu) ]

Constraint:  uᵀu = 1
PCA: An Optimization Problem

Projection of X on u:  t = Xu

Objective function:  argmax_u [ var(Xu) ] = argmax_u [ uᵀSu ]

Constraint:  uᵀu = 1
PCA: An Optimization Problem

Objective function:  argmax_u [ var(Xu) ] = argmax_u [ uᵀSu ]
Constraint:  uᵀu = 1

Solution:  Su = λu

λ: eigenvalue of S
u: eigenvector of S

The variance of the projected data is maximum when the unit vector u is an eigenvector of the covariance matrix S of the data.
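Why an eigenvector solves this problem: maximize uᵀSu subject to uᵀu = 1 with a Lagrange multiplier (a standard one-step derivation, written in LaTeX for clarity):

\[
\mathcal{L}(u,\lambda) = u^{\mathsf T} S u - \lambda \left( u^{\mathsf T} u - 1 \right),
\qquad
\frac{\partial \mathcal{L}}{\partial u} = 2 S u - 2 \lambda u = 0
\;\Rightarrow\; S u = \lambda u,
\qquad
\operatorname{var}(Xu) = u^{\mathsf T} S u = \lambda .
\]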
The Principal Components

Su = λu
S is a d × d matrix
λ_i,  i = 1, 2, …, d
u_i,  i = 1, 2, …, d

Variance of the projected data:

var(Xu_i) = u_iᵀSu_i = u_iᵀλ_i u_i = λ_i u_iᵀu_i = λ_i

λ_1 > λ_2 > λ_3 > … > λ_d

Eigenvector corresponding to λ_1: u_1
Data projected on u_1 will have the highest variance.
u_1 is Principal Component 1.
The Principal Components

λ_1 > λ_2 > λ_3 > … > λ_d

Eigenvector corresponding to λ_2: u_2
Data projected on u_2 will have the second highest variance.
u_2 is Principal Component 2.
What Are Principal Components?
Principal components are new variables (axes) that are constructed as linear combinations or mixtures of the initial variables.
These combinations are done in such a way that the new variables (i.e.,
principal components) are uncorrelated and most of the information
within the initial variables is squeezed or compressed into the first
components.
So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
• Geometrically, principal components represent the directions of the
data that explain a maximal amount of variance, that is to say, the
lines that capture most information of the data.
• The relationship between variance and information here is that the larger the variance carried by a line, the larger the dispersion of the data points along it; and the larger the dispersion along a line, the more information it carries.
• To put all this simply, just think of principal components as new axes that provide the best angle from which to see and evaluate the data, so that the differences between the observations are more clearly visible.
Selection of Principal Components

λ_1 > λ_2 > λ_3 > … > λ_d

λ_1 / (Σ_{i=1}^{d} λ_i)  >  λ_2 / (Σ_{i=1}^{d} λ_i)  >  …  >  λ_d / (Σ_{i=1}^{d} λ_i)

Each ratio λ_k / (Σ_{i=1}^{d} λ_i) is the fraction of the total variation in the data carried by principal component k.
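A small sketch of how these fractions are typically used to decide how many components to keep; the eigenvalues and the 95% threshold below are illustrative choices, not values from the slides.

import numpy as np

eigvals = np.array([4.2, 2.1, 0.5, 0.2])        # illustrative eigenvalues, already sorted

explained = eigvals / eigvals.sum()             # fraction of total variation per component
cumulative = np.cumsum(explained)               # cumulative fraction

m = int(np.searchsorted(cumulative, 0.95)) + 1  # smallest m that reaches 95% of the variance
print(explained)                                # [0.6  0.3  0.0714...  0.0285...]
print(m)                                        # 3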
Selection of Principal Components

Project the data on the selected principal components:

T = XV

X: centred data (n × d)
V: the m selected principal components as columns, the loading matrix (d × m)
T: the projected data, the score matrix (n × m)
Project data on principal components

Projected data (score matrix T):

T = [ t_11  t_12  …  t_1m
      ⋮     ⋮        ⋮
      t_i1  t_i2  …  t_im      ← sample i
      ⋮     ⋮        ⋮
      t_n1  t_n2  …  t_nm ]
      PC1   PC2      PCm

Row i contains the coordinates of sample i along PC1, PC2, …, PCm.
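In practice the projection T = XV is usually obtained from a library. A sketch with scikit-learn, assuming it is installed; the data here is random and only for illustration.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 5))  # illustrative data: n = 100, d = 5

pca = PCA(n_components=2)                # keep m = 2 principal components
T = pca.fit_transform(X)                 # score matrix, shape (100, 2)

print(T.shape)                           # (100, 2)
print(pca.components_.shape)             # (2, 5): the selected principal components
print(pca.explained_variance_ratio_)     # fraction of variance carried by each component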
Key points
1. PCA projects higher dimensional data to lower dimensions while
preserving the trends and patterns in the data.
2. Data projected on those new dimensions/axes captures most of the variation in the data.
3. These axes are orthogonal to each other.
4. These axes are called Principal Components.
5. The eigenvectors of the covariance matrix of the data are the
Principal components.
6. Order the eigenvectors based on eigenvalues.
7. Select first few eigenvectors with high eigenvalues.
8. Project the data on those selected eigenvectors or principal
components.
Visualization of Principal Components
Note:
Standardization: If there are large differences between the ranges of the initial variables, those variables with larger ranges will dominate over those with small ranges (for example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1), which will lead to biased results. Transforming the data to comparable scales prevents this problem.

PCA is also used for the identification and elimination of multicollinearity in the data.
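A minimal sketch of this standardization (z-scoring each variable), applied here to the sample data used in the exercise below:

import numpy as np

X = np.array([[15.0, 12.5, 50.0],
              [35.0, 15.8, 55.0],
              [20.0,  9.3, 70.0],
              [14.0, 20.1, 65.0],
              [28.0,  5.2, 80.0]])       # the sample data from the exercise below

X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # zero mean, unit variance per column

print(X_std.mean(axis=0).round(10))      # approximately [0 0 0]
print(X_std.std(axis=0, ddof=1))         # [1 1 1]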
Find the covariance matrix for the following sample data:

Sample   X    Y     Z
1        15   12.5  50
2        35   15.8  55
3        20   9.3   70
4        14   20.1  65
5        28   5.2   80

Sample variance:    var(x) = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)²
Sample covariance:  cov(x, y) = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)
Find the covariance matrix for the following sample data (solution):

n = 5

x̄ = 22.4,    var(X) = 321.2 / (5 − 1) = 80.3
ȳ = 12.58,   var(Y) = 132.148 / 4 = 33.037
z̄ = 64,      var(Z) = 570 / 4 = 142.5

cov(X, Y) = Σ (x_i − 22.4)(y_i − 12.58) / (5 − 1) = −13.865
cov(X, Z) = Σ (x_i − 22.4)(z_i − 64) / (5 − 1) = 14.25
cov(Y, Z) = Σ (y_i − 12.58)(z_i − 64) / (5 − 1) = −39.525

• The covariance matrix:

S = [  80.3     −13.865    14.25
      −13.865    33.037   −39.525
       14.25    −39.525   142.5  ]
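A quick NumPy check of this worked example:

import numpy as np

data = np.array([[15, 12.5, 50],
                 [35, 15.8, 55],
                 [20,  9.3, 70],
                 [14, 20.1, 65],
                 [28,  5.2, 80]])

S = np.cov(data, rowvar=False)   # sample covariance matrix (divides by n - 1)
print(S.round(3))
# [[  80.3    -13.865   14.25 ]
#  [ -13.865   33.037  -39.525]
#  [  14.25   -39.525  142.5  ]]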
