
Principal Component Analysis
An introduction to dimensionality reduction.

- Sahil Imani
Some prerequisites before getting into PCA
 Origins of PCA
 The importance of variance in data and information entropy
 What do we mean by dimensions?
 Why do we need to reduce dimensions?
 The logic behind PCA and a visual explanation
PCA: Origins
 Comes from statistics, as part of factor analysis and dimensionality reduction (feature extraction).
 Is NOT a machine learning technique by itself.
 The goal of data analysis is generally to make "sense" of the data.
 This is done in three iterative steps (clean, reduce, transform), repeated until we reach an acceptable level.
(Video taken from the Computerphile YouTube channel.)
Importance of variance in data and information entropy
 Information entropy measures the rate at which information is generated by a stochastic process.
 It gives us a relation between information gain and uncertainty: the greater the uncertainty, the more information is transferred/gained when the outcome is observed.
 In this sense, attributes (or directions) with greater variance are less predictable and therefore carry more information.
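For reference, the Shannon entropy of a discrete random variable $X$ makes this precise:

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x)$$

A nearly deterministic source (low uncertainty) has entropy close to zero, while a uniform source (maximum uncertainty) has the highest entropy, so each observation from it yields the most information.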
Dimensionality
 In data analysis, the number of attributes or features that determine the final output of a data-driven decision is known as its dimensionality.
 The more attributes we use to define something, the more "dimensions" it has.
 Beyond three dimensions, however, the data becomes impossible to visualize directly, which is why we need to reduce/project it to a lower dimension while trying to retain most of the information.
The need to reduce dimensionality
 Helps with data visualization.
 Makes calculations faster, and the subsequent machine learning stage needs less data to work with for the same amount of information.
 Reduces the data set so we can start drawing conclusions.
 Optimizes the data for use in actual machine learning or statistical modelling.
PCA: The logic behind it and a visual explanation
 Dimensionality reduction has common examples in everyday life.
 Some dimensions/factors contain much more information than others.
 If we can find the principal, or "important", dimensions, we can discard the ones that don't contribute much, as well as some highly correlated dimensions.
 This is the logical basis for PCA.
 Visually (for two dimensions), we can see it as fitting a line along the direction of maximum variance.
 That line will be a linear combination of both dimensions, as formalized below.
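Concretely, for two centered attributes $x_1$ and $x_2$, the fitted line corresponds to the first principal component: the unit-length linear combination of the two dimensions with maximal variance,

$$z = w_1 x_1 + w_2 x_2, \qquad w_1^2 + w_2^2 = 1, \qquad \mathbf{w} = \arg\max_{\|\mathbf{w}\|=1} \operatorname{Var}(z).$$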
[Figure: a 2D visualization of a data set with two attributes.]
The math powering it
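For reference, the standard formulation behind the implementation steps that follow: given a centered data matrix $X \in \mathbb{R}^{n \times d}$ (one row per sample), PCA eigen-decomposes the covariance matrix of the attributes,

$$C = \frac{1}{n-1} X^{\top} X, \qquad C\,v_i = \lambda_i v_i,$$

where the eigenvectors $v_i$ are the principal components, sorted by their eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$. The fraction of information (variance) retained by the first $k$ components is $\sum_{i \le k} \lambda_i \big/ \sum_{i=1}^{d} \lambda_i$.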
Programming Implementation
 The basic flow is:
 Find the eigenvalues and eigenvectors of the covariance matrix of the attributes.
 Sort the eigenvectors according to their eigenvalues (from max to min).
 Discard trailing principal components, keeping only as many as needed to stay within the amount of information we want to retain.
 Reproject the data using the reduced dimensions (a sketch of this flow follows below).
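A minimal NumPy sketch of this flow, assuming rows are samples and columns are attributes (the function name pca and the retain threshold are illustrative choices, not from the slides):

import numpy as np

def pca(X, retain=0.95):
    # Center the data (rows are samples, columns are attributes).
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the attributes.
    C = np.cov(Xc, rowvar=False)
    # Eigen-decomposition; eigh is suited to the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Sort the eigenvectors by eigenvalue, from max to min.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep just enough components to retain the requested share of variance.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, retain)) + 1
    # Reproject the data onto the reduced dimensions.
    return Xc @ eigvecs[:, :k]

For example, pca(X, retain=0.95) keeps the smallest number of leading components whose eigenvalues account for at least 95% of the total variance.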
Thank You
