Presentation A I STD 2
u1 is a new axis.
The data has maximum dispersal along this axis.
Using a new axis
Data (n samples as rows, d variables as columns):

$X = \begin{bmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{bmatrix}_{n \times d}$
Projection of Data

2-dimensional data:
$X = \begin{bmatrix} x_{11} & x_{12} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}$

$u$ is a unit vector: $\|u\| = 1$

Data point 1: $x_1 = [\, x_{11} \;\; x_{12} \,]$

Task: project $x_1$ on $u$.

$p = \dfrac{x_1 \cdot u}{\|u\|} = \dfrac{x_1 \cdot u}{1} = x_1 \cdot u$

$p = x_1 \cdot u = [\, x_{11} \;\; x_{12} \,] \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = x_{11} u_1 + x_{12} u_2$
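A minimal NumPy sketch of this projection; the point x1 and the direction u below are made-up illustrative values, not from the slides:

```python
import numpy as np

# Project one 2-D data point onto a unit vector: p = x1 . u
x1 = np.array([3.0, 4.0])        # data point x1 = [x11, x12] (illustrative)
u = np.array([1.0, 1.0])
u = u / np.linalg.norm(u)        # normalize so that ||u|| = 1

p = x1 @ u                       # p = x11*u1 + x12*u2
print(p)                         # scalar projection of x1 along u, about 4.95
```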
Mathematics of PCA
Data (n samples as rows, d variables as columns): $X \in \mathbb{R}^{n \times d}$

Unit vector:
$u = \begin{bmatrix} u_1 \\ \vdots \\ u_d \end{bmatrix} \in \mathbb{R}^{d \times 1}$

Projection of the data X on u:
$p = Xu \in \mathbb{R}^{n \times 1}$
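The same operation on a full data matrix, sketched with arbitrary random data just to show the shapes:

```python
import numpy as np

# Project an n x d data matrix X onto a single unit vector u (purely illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))    # n = 100 samples, d = 3 variables
u = np.array([1.0, 2.0, -1.0])
u = u / np.linalg.norm(u)        # unit vector, d x 1

p = X @ u                        # p = Xu, one projection per sample
print(p.shape)                   # (100,)
```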
Covariance Matrix
• The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other, or in other words, to see whether there is any relationship between them.
• Sometimes variables are highly correlated in such a way that they contain redundant information. In order to identify these correlations, we compute the covariance matrix.
What do the covariances that we have as entries of
the matrix tell us about the correlations between the
variables?
2-dimensional data:
$X = \begin{bmatrix} x_{11} & x_{12} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}$

Centred data:
$X_c = \begin{bmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 \\ \vdots & \vdots \\ x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 \end{bmatrix}$

$\mathrm{Cov}(X) = \dfrac{1}{n-1} X_c^{T} X_c = \begin{bmatrix} \mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) \\ \mathrm{cov}(x_1, x_2) & \mathrm{var}(x_2) \end{bmatrix}$
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$ (n samples, d variables), centred so each column has zero mean.

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

$S = \begin{bmatrix} \mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_d) \\ \mathrm{cov}(x_2, x_1) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_d, x_1) & \mathrm{cov}(x_d, x_2) & \cdots & \mathrm{var}(x_d) \end{bmatrix}$
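A small sketch of building S from centred data; the comparison with np.cov is only a sanity check, and the data are arbitrary:

```python
import numpy as np

# Covariance matrix S = Xc^T Xc / (n - 1), with samples in rows and variables in columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))               # n = 50, d = 4 (illustrative data)

Xc = X - X.mean(axis=0)                    # center each variable (column)
S = Xc.T @ Xc / (X.shape[0] - 1)           # d x d covariance matrix

# np.cov treats rows as variables by default, hence rowvar=False.
print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```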
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

S is a square matrix ($d \times d$).
S is a symmetric matrix: $S = S^{T}$.
All eigenvalues of S are non-negative.
All eigenvectors of S are orthogonal.
It is the eigenvectors and eigenvalues of this matrix that are behind all the magic of principal components.
The eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (the most information); these directions are what we call Principal Components.
The eigenvalues are simply the coefficients attached to the eigenvectors, and give the amount of variance carried by each Principal Component.
By ranking the eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance.
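A sketch of extracting and ranking the eigenvectors of S with NumPy; the data are random and only for illustration:

```python
import numpy as np

# The eigenvectors of the covariance matrix, ranked by eigenvalue, are the
# principal components in order of significance.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)       # eigh: for symmetric matrices, ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # rank highest -> lowest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                             # variance carried by each principal component
print(eigvecs[:, 0])                       # first principal component (direction of most variance)
```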
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

If $n > d$ and all columns of X are linearly independent:
$\lambda_i > 0$ for $i = 1, 2, \ldots, d$
S has d eigenvectors.
Mathematics of PCA
Data: $X \in \mathbb{R}^{n \times d}$

Covariance matrix: $S = \mathrm{cov}(X) = \dfrac{1}{n-1} X^{T} X$

If $n < d$, at least one eigenvalue $\lambda_i$ is 0.
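A quick numerical check of this claim on made-up data (any n < d works):

```python
import numpy as np

# With fewer samples than variables (n < d), centred data has rank at most n - 1,
# so the covariance matrix has at least d - (n - 1) zero eigenvalues.
rng = np.random.default_rng(3)
X = rng.normal(size=(4, 6))                # n = 4 samples, d = 6 variables
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)
print(np.sum(eigvals < 1e-10))             # at least 3 eigenvalues are (numerically) zero
```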
Mathematics of PCA
Data (n samples as rows, d variables as columns): $X \in \mathbb{R}^{n \times d}$

Unit vector: $u \in \mathbb{R}^{d \times 1}$

Projection of X on u: $p = Xu$

Objective function: $\underset{u}{\arg\max}\,[\operatorname{var}(Xu)] = \underset{u}{\arg\max}\,[u^{T} S u]$
Constraint: $u^{T} u = 1$
PCA: An Optimization Problem

Objective function: $\underset{u}{\arg\max}\,[\operatorname{var}(Xu)] = \underset{u}{\arg\max}\,[u^{T} S u]$
Constraint: $u^{T} u = 1$

Solution: $Su = \lambda u$
$\lambda$: eigenvalue of S
$u$: eigenvector of S

The variance of the projected data is maximum when the unit vector u is an eigenvector of the covariance matrix S of the data.
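A quick sanity check of this result, comparing $u^{T} S u$ for the top eigenvector against random unit directions (arbitrary toy data, purely illustrative):

```python
import numpy as np

# var(Xu) = u^T S u is largest when u is the eigenvector of S with the largest eigenvalue.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) * np.array([3.0, 1.0, 0.5])   # anisotropic toy data
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
u_best = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

for _ in range(1000):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)                 # random unit direction
    assert u @ S @ u <= u_best @ S @ u_best + 1e-12   # never beats the eigenvector
print("top eigenvector maximizes the projected variance")
```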
The Principal Components

$Su = \lambda u$
S is a $d \times d$ matrix, with eigenvalues $\lambda_i$ and eigenvectors $u_i$, $i = 1, 2, \ldots, d$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$.

The eigenvector corresponding to $\lambda_1$ is $u_1$. Data projected on $u_1$ has the highest variance: $u_1$ is Principal Component 1.

The eigenvector corresponding to $\lambda_2$ is $u_2$. Data projected on $u_2$ has the second-highest variance: $u_2$ is Principal Component 2.
What Are Principal Components?
Principal components are new variables (axes) that are constructed as linear combinations or mixtures of the initial variables.
These combinations are done in such a way that the new variables (i.e., the principal components) are uncorrelated and most of the information within the initial variables is squeezed or compressed into the first components.
So the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
• Geometrically, principal components represent the directions of the data that explain a maximal amount of variance, that is to say, the lines that capture most of the information in the data.
• The relationship between variance and information here is that the larger the variance carried by a line, the larger the dispersion of the data points along it, and the larger the dispersion along a line, the more information it has.
• To put all this simply, just think of principal components as new axes that provide the best angle to see and evaluate the data, so that the differences between the observations are better visible.
Selection of Principal Components
$\lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_d$

$\dfrac{\lambda_1}{\sum_{i=1}^{d} \lambda_i} > \dfrac{\lambda_2}{\sum_{i=1}^{d} \lambda_i} > \cdots > \dfrac{\lambda_d}{\sum_{i=1}^{d} \lambda_i}$

$\dfrac{\lambda_k}{\sum_{i=1}^{d} \lambda_i}$ is the fraction of the total variation in the data captured by the k-th principal component.
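A sketch of computing these fractions with NumPy (illustrative data only):

```python
import numpy as np

# Fraction of total variation captured by each principal component:
# lambda_k / sum_i lambda_i, with eigenvalues sorted highest to lowest.
rng = np.random.default_rng(5)
X = rng.normal(size=(150, 4)) * np.array([5.0, 2.0, 1.0, 0.2])
S = np.cov(X, rowvar=False)

eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # lambda_1 >= lambda_2 >= ... >= lambda_d
explained = eigvals / eigvals.sum()              # per-component fraction of total variance
print(explained)
print(np.cumsum(explained))                      # cumulative fraction, used to choose how many PCs to keep
```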
Selection of Principal Components
Projected data on principal components:

$T = X V$

$\underbrace{X}_{n \times d} \; \underbrace{V}_{d \times m} = \underbrace{T}_{n \times m}$

The columns of V are the first m principal components (eigenvectors) $u_1, u_2, \ldots, u_m$. Row i of T gives the coordinates of sample i along PC1, PC2, …, PCm.
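A minimal sketch of the projection T = XV onto the first m principal components; the data are random and only the shapes matter:

```python
import numpy as np

# Project the (centred) data onto the first m principal components: T = X V.
rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]

m = 2
V = eigvecs[:, order[:m]]                  # d x m: top-m eigenvectors as columns
T = Xc @ V                                 # n x m: row i = coordinates of sample i on PC1, PC2
print(T.shape)                             # (100, 2)
```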
Key points
1. PCA projects higher-dimensional data to lower dimensions while preserving the trends and patterns in the data.
2. Data projected on those new dimensions/axes captures most of the variation in the data.
3. These axes are orthogonal to each other.
4. These axes are called Principal Components.
5. The eigenvectors of the covariance matrix of the data are the Principal Components.
6. Order the eigenvectors based on their eigenvalues.
7. Select the first few eigenvectors with the highest eigenvalues.
8. Project the data on those selected eigenvectors or principal components (see the sketch after this list).
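Putting the key points together, a minimal end-to-end sketch; the function name `pca` and the random data are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def pca(X, m):
    """Minimal PCA following the key points above: center the data, form the
    covariance matrix, take its top-m eigenvectors, and project. Assumes
    samples in rows and variables in columns."""
    Xc = X - X.mean(axis=0)                    # center each variable
    S = Xc.T @ Xc / (X.shape[0] - 1)           # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigen-decomposition of the symmetric S
    order = np.argsort(eigvals)[::-1]          # order eigenvectors by eigenvalue, high -> low
    V = eigvecs[:, order[:m]]                  # keep the first m principal components
    return Xc @ V, eigvals[order]              # projected data (n x m), sorted eigenvalues

# Illustrative use on random 10-dimensional data:
T, eigvals = pca(np.random.default_rng(7).normal(size=(100, 10)), m=2)
print(T.shape, eigvals[:2])
```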
Visualization of Principal Components
Note:
Standardization: if there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate a variable that ranges between 0 and 1), which will lead to biased results. Transforming the data to comparable scales prevents this problem.
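A small sketch of this standardization step (z-scoring each variable); the two made-up variables mimic the 0–100 and 0–1 ranges mentioned above:

```python
import numpy as np

# Standardize each variable to zero mean and unit variance before PCA,
# so a variable ranging 0-100 cannot dominate one ranging 0-1.
rng = np.random.default_rng(8)
X = np.column_stack([rng.uniform(0, 100, size=50),   # wide-range variable
                     rng.uniform(0, 1, size=50)])    # narrow-range variable

X_std = (X - X.mean(axis=0)) / X.std(axis=0)         # z-score each column
print(X_std.mean(axis=0).round(6))                   # ~[0, 0]
print(X_std.std(axis=0))                             # [1, 1]
```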
Find the covariance matrix for the following sample data.

Sample variance: $\mathrm{var}(x) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample covariance: $\mathrm{cov}(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Sample    X     Y      Z
1         15    12.5   50
2         35    15.8   55
3         20     9.3   70
4         14    20.1   65
5         28     5.2   80

n = 5
$\bar{x} = 22.4$,  var(X) = 321.2 / (5 − 1) = 80.3
$\bar{y} = 12.58$, var(Y) = 132.148 / 4 = 33.037
$\bar{z} = 64$,    var(Z) = 570 / 4 = 142.5

cov(X, Y) = $\sum (x_i - 22.4)(y_i - 12.58) / (5 - 1)$ = −13.865
cov(X, Z) = $\sum (x_i - 22.4)(z_i - 64) / (5 - 1)$ = 14.25
cov(Y, Z) = $\sum (y_i - 12.58)(z_i - 64) / (5 - 1)$ = −39.525

The covariance matrix:
$S = \begin{bmatrix} 80.3 & -13.865 & 14.25 \\ -13.865 & 33.037 & -39.525 \\ 14.25 & -39.525 & 142.5 \end{bmatrix}$
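The same worked example can be checked with NumPy, using the five samples from the table above:

```python
import numpy as np

# The five samples from the table; np.cov divides by n - 1, matching the formulas above.
data = np.array([[15, 12.5, 50],
                 [35, 15.8, 55],
                 [20,  9.3, 70],
                 [14, 20.1, 65],
                 [28,  5.2, 80]])

S = np.cov(data, rowvar=False)
print(S.round(3))
# Expected values (up to formatting):
# [[  80.3    -13.865   14.25 ]
#  [ -13.865   33.037  -39.525]
#  [  14.25   -39.525  142.5  ]]
```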