
PRINCIPAL COMPONENT ANALYSIS

WHAT IS PRINCIPAL COMPONENT ANALYSIS?

1. Principal Component Analysis, or PCA, is a dimensionality-reduction method often used on large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the original set.
2. Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity.
3. Smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze the data much faster without extraneous variables to process.
4. The idea of PCA is therefore simple: reduce the number of variables of a data set while preserving as much information as possible.
STEP BY STEP EXPLANATION OF PCA

Step 1: Standardization
Step 2: Covariance Matrix Computation
Step 3: Compute The Eigenvectors And Eigenvalues of The Covariance Matrix To
Identify The Principal Components
Step 4: Feature Vector
Step 5: Recast The Data Along The Principal Components Axes
STEP 1: STANDARDIZATION

1. The aim of this step is to standardize the range of the continuous initial variables so that each one of them contributes equally to the analysis.
2. Mathematically, this is done by subtracting the mean and dividing by the standard deviation for each value of each variable: z = (value − mean) / standard deviation.
3. Once the standardization is done, all the variables will be on the same scale.
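A minimal NumPy sketch of the standardization step (the toy data here is invented for illustration, not taken from the case study):

```python
import numpy as np

# Toy data: 5 samples of 3 variables on very different scales.
X = np.array([
    [1.0, 200.0, 0.01],
    [2.0, 180.0, 0.04],
    [3.0, 240.0, 0.02],
    [4.0, 210.0, 0.05],
    [5.0, 190.0, 0.03],
])

# Standardize each column: subtract its mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # each column's mean is now ~0
print(X_std.std(axis=0))   # each column's standard deviation is now 1
```

After this step every variable has mean 0 and standard deviation 1, so no variable dominates the analysis simply because of its units.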
STEP 2: COVARIANCE MATRIX COMPUTATION

1. The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other.
2. In other words, to see if there is any relationship between them. Sometimes variables are highly correlated in such a way that they contain redundant information, so in order to identify these correlations, we compute the covariance matrix.
3. The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions) that has as entries the covariances associated with all possible pairs of the initial variables.
4. For example, for a 3-dimensional data set with 3 variables x, y, and z, the covariance matrix is a 3×3 matrix of this form:

   Cov(x,x)  Cov(x,y)  Cov(x,z)
   Cov(y,x)  Cov(y,y)  Cov(y,z)
   Cov(z,x)  Cov(z,y)  Cov(z,z)

 Since the covariance of a variable with itself is its variance (Cov(a,a) = Var(a)), the main diagonal (top left to bottom right) actually contains the variances of each initial variable.
 Since the covariance is commutative (Cov(a,b) = Cov(b,a)), the entries of the covariance matrix are symmetric with respect to the main diagonal, which means that the upper and lower triangular portions are equal.
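The covariance matrix can be computed directly with NumPy; a small sketch with randomly generated stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 samples of 3 variables (stand-in data), already standardized.
X = rng.standard_normal((100, 3))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# rowvar=False: each column is a variable, each row an observation.
cov = np.cov(X_std, rowvar=False)

print(cov.shape)                # (3, 3) — a p x p matrix
print(np.allclose(cov, cov.T))  # True: Cov(a,b) == Cov(b,a), so the matrix is symmetric
```

The diagonal holds each variable's variance and the off-diagonal entries hold the pairwise covariances, exactly as described above.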
STEP 3: COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE
COVARIANCE MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS

1. Eigenvectors and eigenvalues are the mathematical constructs that must be computed from the covariance matrix in order to determine the principal components of the data set.
2. Principal components are the new set of variables that are obtained from the initial set of variables.
3. The principal components are computed in such a manner that the newly obtained variables are highly significant and uncorrelated with each other.
4. The principal components compress and possess most of the useful information that was scattered among the initial variables.
5. If your data set has 5 dimensions, then 5 principal components are computed, such that the first principal component stores the maximum possible information, the second stores the maximum of the remaining information, and so on.
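A small sketch of the eigendecomposition step on a hand-picked 2×2 covariance matrix (the numbers are invented for illustration):

```python
import numpy as np

# A symmetric 2x2 covariance matrix for two strongly correlated variables.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# eigh is the right routine for symmetric matrices: it returns real
# eigenvalues in ascending order and orthonormal eigenvectors (as columns).
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Each eigenvector v satisfies cov @ v = lambda * v by definition.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(cov @ v, lam * v)

print(eigenvalues)  # approximately [0.2, 1.8]
```

The eigenvector paired with the largest eigenvalue (1.8 here) points along the direction of maximum variance and becomes the first principal component.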
HOW PCA CONSTRUCTS THE PRINCIPAL COMPONENTS

1. Once the eigenvectors and eigenvalues are computed, we arrange them in descending order of eigenvalue; the eigenvector with the highest eigenvalue is the most significant and thus forms the first principal component.
2. The principal components of lesser significance can then be removed in order to reduce the dimensions of the data.
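The ordering step, sketched in NumPy: each eigenvalue's share of the total shows how much information its component carries, which is what justifies dropping the lesser ones (same illustrative 2×2 matrix as before):

```python
import numpy as np

cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort eigenpairs in descending order of eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Each eigenvalue's share of the total is the fraction of the data's
# variance (information) carried by that component.
explained = eigenvalues / eigenvalues.sum()
print(explained)  # approximately [0.9, 0.1]
```

Here the first component alone carries about 90% of the variance, so the second could be removed with little loss of information.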
STEP 4: FEATURE VECTOR

1. The final step in computing the principal components is to form a matrix, known as the feature vector, whose columns are the eigenvectors we decide to keep.
2. It thus contains the significant components that possess maximum information about the data.
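Forming the feature vector is just stacking the top-k eigenvectors as columns; a sketch with an invented 3×3 covariance matrix:

```python
import numpy as np

# Illustrative symmetric covariance matrix for 3 variables.
cov = np.array([[2.0, 0.6, 0.0],
                [0.6, 1.0, 0.3],
                [0.0, 0.3, 0.5]])
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Order components by descending eigenvalue, then keep only the top k.
order = np.argsort(eigenvalues)[::-1]
k = 2
feature_vector = eigenvectors[:, order[:k]]

print(feature_vector.shape)  # (3, 2): 3 original variables, 2 kept components
```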
LAST STEP: RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES

1. The last step in performing PCA is to recast the original data along the axes of the final principal components, which represent the most significant information of the data set.
2. In order to replace the original data axes with the newly formed principal components, multiply the transpose of the feature vector by the transpose of the standardized data set (equivalently, multiply the standardized data set by the feature vector).
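The full projection, sketched end to end in NumPy on randomly generated stand-in data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))             # 50 samples, 3 variables
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:2]]  # keep the top 2 components

# Recast the data: FinalData^T = FeatureVector^T @ StandardizedData^T,
# which is the same computation as StandardizedData @ FeatureVector.
projected = X_std @ feature_vector

print(projected.shape)  # (50, 2): 3 variables recast as 2 principal components
```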
CASE STUDY

Problem Statement: To perform step-by-step Principal Component Analysis in order to reduce the dimension of the data set.
Data Set Description: Movie ratings data set that contains ratings from 700+ users for approximately 9000 movies (features).
Logic: Perform PCA by finding the most significant features in the data, following these steps:
Step 1: Import Required Packages, Import the Data Set, and Format the Data
Step 2: Standardization
Step 3: Compute the Covariance Matrix
Step 4: Calculate Eigenvectors and Eigenvalues
Step 5: Compute the Feature Vector
Step 6: Use the PCA() Function to Reduce the Dimensionality of the Dataset
Step 7: Project the Variance w.r.t. the Principal Components
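A hedged sketch of the case-study pipeline using scikit-learn's PCA. The movie-ratings data set itself is not reproduced here, so a small random matrix stands in for it (rows = users, columns = movie features), and the component count is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for the ratings data: 100 "users" rating 20 "movies" from 1 to 5.
rng = np.random.default_rng(42)
ratings = rng.integers(1, 6, size=(100, 20)).astype(float)

# Step 2: standardization.
ratings_std = StandardScaler().fit_transform(ratings)

# Steps 3-6: PCA() handles the covariance matrix, the eigendecomposition,
# and the projection internally; n_components sets the reduced dimension.
pca = PCA(n_components=5)
reduced = pca.fit_transform(ratings_std)

# Step 7: variance explained by each principal component, in descending order.
print(reduced.shape)                  # (100, 5)
print(pca.explained_variance_ratio_)  # one ratio per kept component
```

On the real data set the same calls apply unchanged; only the input matrix and the choice of n_components differ.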
