
Machine Learning

COCSC403/CACSC403

Dr. Poonam Rani


Associate Professor, NSUT (CSE)
[email protected]

Lecture 13-14: Dimensionality Reduction - PCA


Contents
➢ Feature selection techniques
1. Filter methods
2. Wrapper methods
3. Embedded methods

➢ Feature extraction

➢ PCA

➢ Managing Missing features


1. Filter methods

(Figure: candidate features x1, x2, x3, ..., xn-1, xn and the target X; each feature is scored against the target independently of any learning model, and low-scoring features are filtered out.)


2. Wrapper methods

(Figure: features x1, x2, x3, ..., xn-1, xn and the target X. Candidate feature subsets are evaluated by training a model on each one, e.g. A → M1, AB → M2, ABC → M3, ABCD → M4, ..., and the subset whose model performs best is kept.)
Dimensionality reduction
❑ Dimensionality reduction can be divided into:

➢ Feature selection:
   Find a subset of the original set of variables, or features, that is small enough yet still suitable for modelling the problem.

➢ Feature extraction/scaling:
   Transform the data from a high-dimensional space to a lower-dimensional space, i.e. a space with fewer dimensions.


Feature Selection
➢ Filter Method

➢ Wrapper Method

➢ Embedded Method

➢ Hybrid Method



Parameters for Feature Selection
❑ Similarity of the information contributed by the features:

➢ Correlation

❑ Quantum (amount) of information contributed by the features:

➢ Entropy

➢ Mutual Information


Parameters for Feature Selection
❑ Correlation

❖ Pearson's correlation coefficient (ρ):

ρ(X, Y) = Cov(X, Y) / (σ(X) σ(Y))

where
Cov(X, Y) - covariance of X and Y
σ(X) - standard deviation of X
σ(Y) - standard deviation of Y
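A minimal sketch of how this coefficient can drive a filter-style selection, assuming a hypothetical NumPy feature matrix X and target y (the names and the threshold are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical data: 100 samples, 4 candidate features, one target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

def pearson(a, b):
    """rho(a, b) = Cov(a, b) / (sigma(a) * sigma(b))."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return (a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c))

# Filter step: score each feature against the target, keep the strong ones
scores = np.array([abs(pearson(X[:, j], y)) for j in range(X.shape[1])])
selected = np.where(scores > 0.3)[0]   # 0.3 is an arbitrary illustrative threshold
print(scores.round(3), selected)
```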



Parameters for Feature Selection
❑ Entropy
o Entropy (H) can be formulated as:

H(X) = E[I(X)] = E[−ln P(X)]

where
X - a discrete random variable
P(X) - its probability mass function
E - the expected value operator
I(X) - the information content of X (itself a random variable)

o To assess feature fi: exclude fi and compute the entropy over the remaining features.

o If that entropy is low, the information contributed by feature fi is high.

o Entropy is mostly used for unsupervised learning; a short sketch of the basic computation follows below.
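A small sketch of the entropy computation for one discrete feature, assuming the feature is given as an array of category labels (hypothetical values):

```python
import numpy as np

def entropy(feature):
    """H(X) = E[-ln P(X)] = -sum over observed values of p(x) * ln p(x)."""
    _, counts = np.unique(feature, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

# Hypothetical categorical feature
f1 = np.array(["a", "a", "b", "b", "b", "c"])
print(entropy(f1))   # entropy (in nats) of the empirical distribution of f1
```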


Parameters for Feature Selection
❑ Mutual Information

o The reduction in uncertainty about X obtained from knowledge of Y.

o It is calculated as:

I(X; Y) = Σ_{y∈Y} Σ_{x∈X} P(x, y) log( P(x, y) / (P(x) P(y)) )

where
P(x, y) - joint probability function of X and Y
P(x) - marginal probability distribution of X
P(y) - marginal probability distribution of Y

o It is computed to measure how much information a feature shares about the class, as sketched below.
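One practical way to obtain this score is scikit-learn's mutual_info_classif; a short sketch, assuming a hypothetical feature matrix X and class vector y:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hypothetical data: X has shape (n_samples, n_features), y holds class labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

# Estimated I(feature; class) for every column of X
mi = mutual_info_classif(X, y, random_state=0)
print(mi.round(3))   # higher score = more information shared with the class
```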



Feature Scaling

➢ Feature scaling is a technique that standardizes the independent features in the data to a fixed range.

➢ It is performed during data pre-processing.

➢ Feature scaling algorithms bring features such as Age, Salary, or BHK into a fixed range, say [-1, 1] or [0, 1].

➢ After scaling, no single feature can dominate the others.

Example of two features on very different scales (see the sketch below):
X1: 500  300  200
X2:   6    8    2
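A short sketch of two common scalings applied to the X1/X2 example above (NumPy only; arranging the values into a matrix is the only assumption):

```python
import numpy as np

# Three samples, two features: column 0 is X1 (large range), column 1 is X2 (small range)
X = np.array([[500.0, 6.0],
              [300.0, 8.0],
              [200.0, 2.0]])

# Min-max scaling to [0, 1]: (x - min) / (max - min), column-wise
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_minmax = (X - x_min) / (x_max - x_min)

# Standardization (z-score): (x - mean) / std, column-wise
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_std)   # after either transform, neither feature dominates a distance measure
```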



Without Feature Scaling

(Figure: example with classes Yes / No.)

Without Feature Scaling: Distance Measure

(Figure: a distance measure computed on the unscaled features.)


Techniques for Feature Scaling

(Figure; two common techniques, min-max scaling and standardization, appear in the sketch above.)


Methods of Dimensionality Reduction

➢ Principal Component Analysis (PCA)

➢ Linear Discriminant Analysis (LDA)

➢ Generalized Discriminant Analysis (GDA)

❑ Most Common Linear Method:

➢ Principal Component Analysis (PCA)



Curse of dimensionality


Example: describing a ball

(Figure: features of a Ball vs an Orange - sphere, eatable or not, used for play, colour (white / red), size = 5 cm.)


Real life example

(Figure slides: illustrative figures for the real-life example.)

Dimensionality reduction

(Figure slides: illustrative figures for dimensionality reduction.)


Principal Component Analysis (PCA)


Principal Component Analysis
➢ Principal Component Analysis (PCA) is a dimension-reduction tool that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set.

➢ It is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.

➢ The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.


PCA Steps


1. Standardization of the data

2. Calculate the covariance matrix

3. Calculate the eigenvalues and eigenvectors

4. Compute the principal components

5. Reduce the dimensions
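A library call wraps these five steps (scikit-learn's PCA uses an equivalent SVD internally); a minimal sketch, assuming a hypothetical feature matrix X built from synthetic data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples with 5 correlated features
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(100, 3))])

X_scaled = StandardScaler().fit_transform(X)   # step 1: standardization
pca = PCA(n_components=2)                      # steps 2-4: covariance, eigen-decomposition, components
X_reduced = pca.fit_transform(X_scaled)        # step 5: project onto the top components

print(pca.explained_variance_ratio_)           # share of variance kept by each component
print(X_reduced.shape)                         # (100, 2)
```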



PCA Example
➢ Step 1: Find the mean values of the data.

Data: two variables x and y (the values appear mean-centred in the table on the next slide).

Mean values:
x' = 1.81
y' = 1.91


PCA Example

➢ Step 2: Subtract the mean so that the centred data passes through the origin. The centred data has mean 0.

(x' - x)   (y' - y)
 -0.69      -0.49
  1.31       1.21
 -0.39      -0.99
 -0.09      -0.29
 -1.29      -1.09
 -0.49      -0.79
 -0.19       0.31
  0.81       0.81
  0.31       0.31
  0.71       1.01

Covariance = Σ_{i=1}^{n} (x − x')(y − y') / (n − 1)

➢ Step 3: Find the covariance matrix: it shows how two variables vary together.
PCA Example

➢ Step 3: Find the covariance matrix: it shows how two variables vary together.

Covariance = Σ_{i=1}^{n} (x − x')(y − y') / (n − 1)     (here n = 10, so the denominator is 9)

X = (x' - x)   Y = (y' - y)   X^2      Y^2      X*Y
   -0.69          -0.49       0.4761   0.2401   0.3381
    1.31           1.21       1.7161   1.4641   1.5851
   -0.39          -0.99       0.1521   0.9801   0.3861
   -0.09          -0.29       0.0081   0.0841   0.0261
   -1.29          -1.09       1.6641   1.1881   1.4061
   -0.49          -0.79       0.2401   0.6241   0.3871
   -0.19           0.31       0.0361   0.0961  -0.0589
    0.81           0.81       0.6561   0.6561   0.6561
    0.31           0.31       0.0961   0.0961   0.0961
    0.71           1.01       0.5041   1.0201   0.7171
Sum:  0               0       5.549    6.449    5.539
Sum/9:                        0.61656  0.71656  0.61544
PCA Example

The covariance matrix for the example is:

C = [ 0.616  0.615 ]
    [ 0.615  0.716 ]

Since the off-diagonal elements of the covariance matrix are positive, x and y increase together (they vary in the same direction).

o Step 4: Calculate the eigenvalues and eigenvectors of the covariance matrix:

eigenvalues = [ 0.0490 ]      eigenvectors = [ -0.735  -0.678 ]
              [ 1.2840 ]                     [  0.678  -0.735 ]

(Each column of the eigenvector matrix corresponds to the eigenvalue in the same row position.)

The most important (principal) eigenvector points in the direction along which the variables most strongly vary.
PCA Example
o Step 4: Calculate the eigenvalues and eigenvectors of the covariance matrix.

Solve |C − λI| = 0:

| [ 0.616  0.615 ] − λ [ 1  0 ] | = 0
| [ 0.615  0.716 ]     [ 0  1 ] |

| 0.616 − λ     0.615     |
| 0.615         0.716 − λ | = 0     →     eigenvalues λ = 0.0490 and λ = 1.2840

Calculate the eigenvectors:

[ 0.616 − 0.049   0.615         ] [ x1 ] = 0
[ 0.615           0.716 − 0.049 ] [ y1 ]

and

[ 0.616 − 1.284   0.615         ] [ x2 ] = 0
[ 0.615           0.716 − 1.284 ] [ y2 ]

eigenvalues = [ 0.0490 ]      eigenvectors = [ -0.735  -0.678 ]
              [ 1.2840 ]                     [  0.678  -0.735 ]

The most important (principal) eigenvector is the one with the largest eigenvalue; it points in the direction along which the variables most strongly vary.
PCA Example
o Step 5: The eigenvectors with the highest eigenvalues are selected for PCA.

➢ The remaining dimensions can then be ignored.

➢ For n-dimensional data → n eigenvectors → select p eigenvectors.

➢ For dimensionality reduction, p < n.

Final data = Feature Vector × (Scaled Data)^T
Final data = [selected eigenvectors]^T × (Scaled Data)^T

➢ Final data is the final dataset, with data items in columns and dimensions along rows.

➢ The example data has 2 dimensions, so it was expressed in terms of x and y; after projection it is expressed in terms of the eigenvectors. A NumPy sketch of the whole example follows below.
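A NumPy sketch that reproduces the worked example end to end, starting from the mean-centred values given on the earlier slides (step 1 is therefore already applied):

```python
import numpy as np

# Mean-centred data from the example (columns: x' - x, y' - y)
D = np.array([[-0.69, -0.49], [ 1.31,  1.21], [-0.39, -0.99], [-0.09, -0.29],
              [-1.29, -1.09], [-0.49, -0.79], [-0.19,  0.31], [ 0.81,  0.81],
              [ 0.31,  0.31], [ 0.71,  1.01]])

C = np.cov(D, rowvar=False)              # covariance matrix, (n - 1) denominator
eig_vals, eig_vecs = np.linalg.eigh(C)   # eigh: decomposition of a symmetric matrix
print(C.round(3))                        # ~[[0.617, 0.615], [0.615, 0.717]]
print(eig_vals.round(4))                 # ~[0.0491, 1.284]

# Keep the eigenvector with the largest eigenvalue (p = 1 < n = 2);
# the sign of an eigenvector is arbitrary, so it may differ from the slide.
order = np.argsort(eig_vals)[::-1]
W = eig_vecs[:, order[:1]]               # feature vector, shape (2, 1)

# Final data = [selected eigenvectors]^T x (scaled data)^T
final = W.T @ D.T                        # shape (1, 10): data expressed along the first PC
print(final.round(3))
```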
Pros and Cons of Dimensionality Reduction

Advantages of Dimensionality Reduction

➢ It helps in data compression, and hence reduces storage space.

➢ It reduces computation time.

➢ It also helps remove redundant features, if any.

Disadvantages of Dimensionality Reduction

➢ It may lead to some amount of data loss.

➢ PCA only captures linear correlations between variables, which is sometimes undesirable.

➢ PCA fails in cases where mean and covariance are not enough to define the dataset.

➢ We may not know how many principal components to keep; in practice, some rules of thumb are applied.
Independent Component Analysis (ICA)

➢ Unlike principal component analysis, which focuses on maximizing the variance of the data points, independent component analysis focuses on independence, i.e. on finding independent components.

➢ Problem: extract the independent sources' signals from a mixed signal composed of the signals from those sources.
   Given: a mixed signal from five different independent sources.
   Aim: decompose the mixed signal into its independent sources.


(Figure: decomposing a mixed signal into its independent source signals.)


Independent Component Analysis (ICA)

➢ Decomposing each microphone's recorded mixture into the independent sources' speech signals can be done with the machine learning technique independent component analysis:

[ X1, X2, ..., Xn ]  =>  [ Y1, Y2, ..., Yn ]

where X1, X2, ..., Xn are the observed mixed signals and Y1, Y2, ..., Yn are the new features: independent components that are independent of each other. A short sketch follows below.

Restrictions on ICA:

➢ The independent components generated by ICA are assumed to be statistically independent of each other.

➢ The independent components generated by ICA must have a non-Gaussian distribution.

➢ The number of independent components generated by ICA is equal to the number of observed mixtures.
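A minimal sketch of such a decomposition with scikit-learn's FastICA, under the assumption that the mixtures are arranged as a matrix X with one column per observed signal (the two synthetic sources below are illustrative):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical non-Gaussian sources mixed by an unknown matrix A
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # sine wave + square wave
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])                         # mixing matrix
X = S @ A.T                                        # observed mixtures [X1, X2]

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)   # estimated independent components [Y1, Y2]
print(Y.shape)             # (2000, 2); sources are recovered up to order and scale
```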
Principal Component Analysis vs Independent Component Analysis

Principal Component Analysis:
▪ It reduces the dimensions to avoid the problem of overfitting.
▪ It deals with the principal components.
▪ It focuses on maximizing the variance.
▪ It focuses on the mutual orthogonality of the principal components.
▪ It doesn't focus on the mutual independence of the components.

Independent Component Analysis:
▪ It decomposes the mixed signal into its independent sources' signals.
▪ It deals with the independent components.
▪ It doesn't focus on the issue of variance among the data points.
▪ It doesn't focus on the mutual orthogonality of the components.
▪ It focuses on the mutual independence of the components.
Linear Discriminant Analysis (LDA)

➢ Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications.

➢ The goal is to project a dataset onto a lower-dimensional space with good class separability, in order to avoid overfitting ("curse of dimensionality") and to reduce computational costs.

➢ In addition to finding the component axes that maximize the variance of the data (PCA), we are also interested in the axes that maximize the separation between multiple classes (LDA).


Linear Discriminant Analysis (LDA)

➢ The goal of LDA is to project a feature space (a dataset of n-dimensional samples) onto a smaller subspace k (where k ≤ n − 1) while maintaining the class-discriminatory information.

➢ In general, dimensionality reduction not only helps reduce computational costs for a given classification task, but can also help avoid overfitting by minimizing the error in parameter estimation ("curse of dimensionality").


PCA vs LDA
➢ Both Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are linear transformation techniques that are commonly used for dimensionality reduction.

➢ PCA is an "unsupervised" algorithm: it ignores class labels.

➢ LDA is "supervised": it uses class labels, as sketched below.
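The supervised/unsupervised difference is visible directly in the fit calls; a short sketch using scikit-learn and its built-in Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)       # unsupervised: class labels not used
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # supervised: labels guide the axes
print(X_pca.shape, X_lda.shape)                    # (150, 2) (150, 2)
```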



LDA Steps
➢ Compute the d-dimensional mean vectors for the different classes from the dataset.
➢ Compute the scatter matrices (between-class and within-class scatter matrices).
➢ Compute the eigenvectors (e1, e2, ..., ed) and corresponding eigenvalues (λ1, λ2, ..., λd) for the scatter matrices.
➢ Sort the eigenvectors by decreasing eigenvalue and choose the k eigenvectors with the largest eigenvalues to form a d×k matrix W (where every column represents an eigenvector).
➢ Use this d×k eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication Y = X × W (where X is the n×d matrix holding the n samples and Y is the transformed n×k sample matrix in the new subspace). A NumPy sketch of these steps is given below.
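A NumPy sketch of these steps under the stated setup (X is an n×d sample matrix, y holds the class labels; the function name is illustrative):

```python
import numpy as np

def lda_transform(X, y, k):
    """Project X onto the top-k discriminant directions via the scatter matrices."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]

    S_W = np.zeros((d, d))   # within-class scatter
    S_B = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                       # step 1: class mean vectors
        S_W += (Xc - mean_c).T @ (Xc - mean_c)         # step 2: scatter matrices
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += len(Xc) * (diff @ diff.T)

    # Step 3: eigenvectors/eigenvalues of S_W^{-1} S_B
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)

    # Step 4: sort by decreasing eigenvalue and keep the top k columns as W (d x k)
    order = np.argsort(eig_vals.real)[::-1]
    W = eig_vecs[:, order[:k]].real

    # Step 5: Y = X x W, the n x k samples in the new subspace
    return X @ W

# Usage (hypothetical): Y = lda_transform(X, y, k=2)
```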
Managing Missing Features

1. Remove the instance (drop the sample containing the missing value).

2. Create a sub-model (predict the missing feature from the other features).

3. Automatic strategy (impute the missing value automatically, e.g. with the column mean); see the sketch below.
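The automatic strategy usually means imputation; a minimal sketch with scikit-learn's SimpleImputer, using a small hypothetical matrix with NaN gaps:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with missing entries marked as NaN
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Strategy 1: remove the instances that contain a missing value
X_dropped = X[~np.isnan(X).any(axis=1)]

# Strategy 3: automatic imputation - fill each gap with the column mean
imputer = SimpleImputer(strategy="mean")   # "median" or "most_frequent" also possible
X_filled = imputer.fit_transform(X)

print(X_dropped)
print(X_filled)
```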



Assignment
• Chi-Square Test
• SURVEY PAPER STUDY (PPT, complete notes and video lecture)
  1. ALL DIMENSIONALITY REDUCTION ALGORITHMS
     • PCA, LDA, ICA and t-SNE
  2. MISSING VALUE HANDLING
  3. IMPLEMENTATION OF PCA / ICA / LDA / t-SNE
     1. RESEARCH PAPER IMPLEMENTATION ON ANY TOPIC
     2. COMPARISON PPTS



Thanks

Dr Poonam Rani - Machine Learning