Lec 13-14 PCA
COCSC403/CACSC403
Feature extraction
PCA
[Figure: a dataset with feature columns x1, x2, x3, ..., xn-1, xn and a
target column X. In the wrapper method, models are built on growing
feature subsets: A → M1, AB → M2, ABC → M3, ABCD → M4, ...]
Dr Poonam Rani - Machine learning
Dimensionality reduction
❑Dimensionality reduction can be divided into:
Feature selection :
➢Find a subset of the original set of variables, or features, to get
a smaller subset which can be used to model the problem.
Feature extraction/scaling:
➢This reduces the data in a high-dimensional space to a lower-
dimensional space, i.e. a space with fewer dimensions.
Feature selection strategies:
➢ Wrapper method
➢ Embedded method
➢ Hybrid method
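As a small sketch of the wrapper idea (my own illustration, not from the slides), scikit-learn's RFE repeatedly fits a model and drops the weakest feature until only the requested subset remains; the synthetic dataset here is an assumption for the demo.

```python
# Wrapper-style feature selection sketch using scikit-learn's RFE:
# fit a model, eliminate the least important feature, repeat.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 8 features, only 3 of which are informative.
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)
print(selector.support_)  # boolean mask of the 3 selected features
```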
Correlation
Entropy
Mutual Information
    ρ(X, Y) = Cov(X, Y) / (σ(X) · σ(Y))

where,
cov(X, Y) - covariance of X and Y
σ(X) - standard deviation of X
σ(Y) - standard deviation of Y
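As a quick check of this formula, the snippet below (my own sketch) computes the Pearson correlation for the ten (x, y) points used in the PCA worked example later in these notes, reconstructed from its deviation table (means 1.81 and 1.91):

```python
import numpy as np

# Pearson correlation: rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y)
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

cov_xy = np.cov(x, y, ddof=1)[0, 1]                   # sample covariance
rho = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(round(rho, 4))                                  # strong positive correlation
```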
    H(X) = E[I(X)] = E[−ln P(X)]

where,
X - discrete random variable
P(X) - probability mass function of X
E - expected value operator
I(X) - information content of X (itself a random variable)
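A minimal numeric example of this definition (the biased-coin distribution is a hypothetical choice for illustration):

```python
import math

# Shannon entropy H(X) = E[-ln P(X)] for a discrete distribution,
# here a hypothetical biased coin.
p = {"heads": 0.9, "tails": 0.1}

H = -sum(px * math.log(px) for px in p.values())  # entropy in nats
print(round(H, 4))
```

A fair coin (0.5/0.5) would give the maximum entropy ln 2 ≈ 0.6931; the biased coin is more predictable, so its entropy is lower.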
o It is calculated as:

    I(X; Y) = Σ_{y∈Y} Σ_{x∈X} P(x, y) · log( P(x, y) / (P(x) P(y)) )

where P(x, y) is the joint distribution of X and Y, and P(x), P(y) are
the marginal distributions.
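The double sum can be evaluated directly for a small discrete case; the joint distribution below is a hypothetical example of two dependent binary variables:

```python
import math

# Mutual information I(X; Y) = sum over (x, y) of
#   P(x, y) * log( P(x, y) / (P(x) * P(y)) )
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

px = {0: 0.5, 1: 0.5}  # marginal of X (rows summed)
py = {0: 0.5, 1: 0.5}  # marginal of Y (columns summed)

I = sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items())
print(round(I, 4))  # 0 would mean X and Y are independent
```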
Curse of dimensionality
[Figure: a ball inside a sphere, size = 5 cm]
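A small numeric illustration of the curse of dimensionality (my own sketch, not from the slides): as the number of dimensions grows, the nearest and farthest random points become almost equally far away, so distance-based methods lose their discriminating power.

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 100, 10_000):
    pts = rng.random((200, d))                        # 200 random points in [0,1]^d
    dists = np.linalg.norm(pts - pts[0], axis=1)[1:]  # distances to the first point
    print(d, round(dists.min() / dists.max(), 3))     # ratio approaches 1 as d grows
```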
Dimensionality reduction
Steps in PCA:
1. Standardization of the data
2. Subtract the mean
3. Find the covariance matrix
4. Calculate the eigenvalues and eigenvectors
5. Select the top eigenvectors and project the data
Step 1: Standardize the data: compute the means
x' = 1.81
y' = 1.91

Step 2: Subtract the mean to make the data pass through the
origin.

(x'-x)   (y'-y)
-0.69    -0.49
 1.31     1.21
-0.39    -0.99
-0.09    -0.29
-1.29    -1.09
-0.49    -0.79
-0.19     0.31
 0.81     0.81
 0.31     0.31
 0.71     1.01

The normalized data will have mean = 0.
PCA Example
Step 3: Find the covariance matrix: it shows how two variables vary
together.

    Cov(x, y) = Σᵢ (xᵢ − x')(yᵢ − y') / (n − 1),  for i = 1..n

X=(x'-x)  Y=(y'-y)   X^2      Y^2      X*Y
-0.69     -0.49      0.4761   0.2401   0.3381
 1.31      1.21      1.7161   1.4641   1.5851
-0.39     -0.99      0.1521   0.9801   0.3861
-0.09     -0.29      0.0081   0.0841   0.0261
-1.29     -1.09      1.6641   1.1881   1.4061
-0.49     -0.79      0.2401   0.6241   0.3871
-0.19      0.31      0.0361   0.0961  -0.0589
 0.81      0.81      0.6561   0.6561   0.6561
 0.31      0.31      0.0961   0.0961   0.0961
 0.71      1.01      0.5041   1.0201   0.7171
Sum:   0      0      5.549    6.449    5.539
Sum/9:               0.61656  0.71656  0.61544
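The hand computation above can be reproduced with NumPy (a sketch of my own; the original (x, y) points are reconstructed from the deviation table, using the means 1.81 and 1.91):

```python
import numpy as np

# Recompute the Step 3 covariance matrix.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

C = np.cov(x, y, ddof=1)   # divides by n - 1 = 9, as in the slide
print(np.round(C, 3))      # ≈ [[0.617, 0.615], [0.615, 0.717]]
```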
The covariance matrix is:

    C = | 0.616  0.615 |
        | 0.615  0.716 |

Since the non-diagonal elements of the covariance matrix are positive,
the x and y variables increase together in the same direction.

Step 4: Calculate the eigenvalues by solving |C − λI| = 0:

    | 0.616 − λ   0.615     |
    | 0.615       0.716 − λ | = 0

    (0.616 − λ)(0.716 − λ) − (0.615)² = 0

    λ₁ ≈ 0.049,  λ₂ ≈ 1.284   (eigenvalues)
Calculate the eigenvectors by solving (C − λI)v = 0 for each eigenvalue:

    | 0.616 − 0.049   0.615         | | x₁ |
    | 0.615           0.716 − 0.049 | | y₁ | = 0

and

    | 0.616 − 1.284   0.615         | | x₂ |
    | 0.615           0.716 − 1.284 | | y₂ | = 0

    eigenvalues = 0.049, 1.284    eigenvectors = | −0.735  −0.678 |
                                                 |  0.678  −0.735 |

The most important (principal) eigenvector points in the direction in
which the variables most strongly correlate.
o Step 5: The eigenvectors with the highest eigenvalues are selected
for PCA.
➢The final data is obtained by projecting the mean-centred data onto
the chosen eigenvectors, giving a dataset with data items in columns
and dimensions along rows.
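Steps 4 and 5 can be verified numerically (my own sketch; the data points are reconstructed from the deviation table earlier in the example):

```python
import numpy as np

# Eigendecomposition of the covariance matrix and projection onto
# the principal eigenvector (Steps 4-5 of the worked example).
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
data = np.column_stack([x - x.mean(), y - y.mean()])  # mean-centred

C = np.cov(data, rowvar=False, ddof=1)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
print(np.round(eigvals, 4))            # ~ [0.0491, 1.284]

pc1 = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
projected = data @ pc1                 # 1-D representation of the data
print(projected.shape)
```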
PCA fails in cases where mean and covariance are not enough to
define datasets.
Independent component analysis (ICA)
Decomposing the mixed signal of each microphone’s recording into
independent source’s speech signal can be done by using the
machine learning technique, independent component analysis.
[ X1, X2, ….., Xn ] => [ Y1, Y2, ….., Yn ]
where, X1, X2, …, Xn are the original signals present in the mixed
signal and Y1, Y2, …, Yn are the new features and are independent
components which are independent of each other.
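The cocktail-party setup above can be sketched with scikit-learn's FastICA; the two source signals and the mixing matrix below are assumptions chosen for the demo, not from the slides:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sources (a sine wave and a square wave) mixed by
# two "microphones", then unmixed with ICA.
t = np.linspace(0, 8, 2000)
sources = np.column_stack([np.sin(2 * t),            # source signal 1
                           np.sign(np.sin(3 * t))])  # source signal 2
mixing = np.array([[1.0, 0.5], [0.5, 1.0]])          # microphone mixing matrix
mixed = sources @ mixing.T                           # observed recordings

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(mixed)   # estimated independent components
print(recovered.shape)
```

Note that ICA recovers the components only up to permutation, sign, and scale.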
Restrictions on ICA –
➢The source signals must be statistically independent of each other.
➢At most one of the source signals can be Gaussian-distributed.
Linear Discriminant Analysis (LDA)
The goal of LDA is to project a feature space (a dataset of n-
dimensional samples) onto a smaller subspace k (where k ≤ n − 1)
while maintaining the class-discriminatory information.
LDA is "supervised", since it makes use of the class labels.
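A minimal supervised-reduction sketch with scikit-learn's LDA on the Iris dataset (my choice of dataset for illustration): 4 features and 3 classes, so at most k = 3 − 1 = 2 discriminant axes:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)  # the class labels y are required
print(X_reduced.shape)
```

Unlike PCA, which ignores the labels, LDA chooses directions that best separate the classes.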
Assignment
• Chi-Square Test
• SURVEY PAPER STUDY (ppt and complete notes and video
lecture)
1. ALL DIMENSIONALITY REDUCTION ALGORITHMS
• PCA, LDA, ICA and TSNE
2. MISSING VALUE HANDLING
3. IMPLEMENTATION OF PCA/ICA/LDA/T-SNE
1. RESEARCH PAPER IMPLEMENTATION ON ANY
TOPIC
2. COMPARISON PPTS
Thanks