
PCA

PCA stands for Principal Component Analysis.


It is a way of finding the most important features in a dataset.

Principal component analysis (PCA) is a dimensionality reduction method that is often used to reduce the dimensionality of large datasets by transforming a large set of variables into a smaller one that still contains most of the information.
What Is Principal Component Analysis?

Principal Component Analysis is an unsupervised learning algorithm used for dimensionality reduction in machine learning.

It is a statistical procedure that converts observations of correlated features into a set of linearly uncorrelated features.

These new transformed features are called the principal components. PCA is one of the popular tools used for exploratory data analysis and predictive modeling.

It is a technique for drawing strong patterns from a given dataset by reducing its dimensionality while retaining as much of the variance as possible.
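As a quick illustration of PCA used as an off-the-shelf tool, the sketch below runs scikit-learn's PCA on a small made-up dataset; the data and variable names are only for illustration and are not part of the worked example later in these notes.

    # Minimal sketch: PCA as a dimensionality-reduction tool (scikit-learn).
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # 100 samples, 5 features (illustrative)
    X[:, 3] = X[:, 0] + 0.1 * X[:, 3]      # make one feature depend on another

    pca = PCA(n_components=2)              # keep the first two principal components
    Z = pca.fit_transform(X)               # reduced data, shape (100, 2)

    print(Z.shape)
    print(pca.explained_variance_ratio_)   # share of variance captured by each PC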
What Are Principal Components?

Principal components are new variables that are constructed as linear combinations, or mixtures, of the initial variables.

These combinations are made in such a way that the new variables (i.e., the principal components) are uncorrelated and most of the information within the initial variables is compressed into the first components.

So the idea is: 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
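These two properties can be checked directly in code. The sketch below (random data, purely illustrative) projects centered data onto all of its principal components and shows that the resulting variables are uncorrelated and that the variance concentrates in the first components.

    # Sketch: principal-component scores are uncorrelated, and variance
    # concentrates in the first components. Random data for illustration only.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))
    X[:, 1] += 2.0 * X[:, 0]                     # introduce correlation

    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # 4x4 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                        # project onto all components
    print(np.round(np.cov(scores, rowvar=False), 3))  # ~diagonal: uncorrelated
    print(np.round(eigvals / eigvals.sum(), 3))       # most variance in first PCs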
Scree plot

Organizing information in principal components this way allows you to reduce dimensionality without losing much information, by discarding the components with low information and treating the remaining components as your new variables.

[Figure: scree plot showing the percentage of variance (information) explained by each principal component.]
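A scree plot like the one described above can be drawn from the per-component variance ratios. A minimal matplotlib sketch (the random data is only a stand-in; any fitted PCA's explained_variance_ratio_ would do):

    # Sketch: scree plot of the percentage of variance explained by each PC.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    X = rng.normal(size=(150, 6))                 # illustrative random data
    pca = PCA().fit(X)                            # keep all components

    ratios = pca.explained_variance_ratio_ * 100  # percentage per component
    plt.bar(range(1, len(ratios) + 1), ratios)
    plt.xlabel("Principal component")
    plt.ylabel("Percentage of variance explained")
    plt.title("Scree plot")
    plt.show()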


How PCA Constructs the Principal Components

As there are as many principal components as there are variables in the data, principal components are constructed in such a manner that the first principal component accounts for the largest possible variance in the dataset.

The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance.

This continues until a total of p principal components have been calculated, equal to the original number of variables.
Steps for performing PCA on a dataset:

1. Standardize the data.

2. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.

3. Arrange the eigenvalues in descending order.

4. Create a feature vector to decide which principal components to keep.

5. Project the data onto the principal components.
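These five steps can be sketched end to end in NumPy as follows. This is a minimal sketch, not the book's code; the function and variable names are our own, and "standardize" is done here as mean-centering, which is what the worked example below uses.

    # Sketch of the five PCA steps in NumPy (mean-centering used as the
    # "standardize" step, as in the worked example that follows).
    import numpy as np

    def pca(X, k):
        # Step 1: center the data (subtract the mean of each variable).
        mean = X.mean(axis=0)
        Xc = X - mean
        # Step 2: covariance matrix and its eigenvectors/eigenvalues.
        cov = np.cov(Xc, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Step 3: arrange eigenvalues (and their eigenvectors) in descending order.
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        # Step 4: feature vector = matrix whose columns are the kept eigenvectors.
        V = eigvecs[:, :k]
        # Step 5: project the centered data onto the principal components.
        Z = Xc @ V
        return Z, V, eigvals, mean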


Consider the following dataset:

x1: 2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1
x2: 2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9

Step 1: Standardize the Dataset

If there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate a variable that ranges between 0 and 1), which will lead to biased results.

So, transforming the data to comparable scales can prevent this problem. Once the standardization is done, all the variables will be on the same scale.
Step 1: Standardize the Dataset (continued)

Mean of x1 = 1.81 and mean of x2 = 1.91. Here the two variables are already on comparable scales, so standardization reduces to subtracting the means. The centered data are:

x1: 0.69 -1.31 0.39 0.09 1.29 0.49 0.19 -0.81 -0.31 -0.71
x2: 0.49 -1.21 0.99 0.29 1.09 0.79 -0.31 -0.81 -0.31 -1.01
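In code, Step 1 for this dataset is just subtracting each variable's mean. A minimal sketch that reproduces the table above:

    # Sketch: Step 1 (mean-centering) for the example dataset.
    import numpy as np

    x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
    x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
    X = np.column_stack([x1, x2])

    mean = X.mean(axis=0)        # [1.81, 1.91]
    Xc = X - mean                # centered data, matching the table above
    print(mean)
    print(np.round(Xc, 2))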


Step 2: Find the Eigenvalues and eigenvectors

Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the covariance matrix in order to determine the principal components of the data.

What you first need to know about eigenvectors and eigenvalues is that they always come in pairs, so that every eigenvector has an eigenvalue. Also, their number is equal to the number of dimensions of the data. For example, a 3-dimensional dataset has 3 variables, therefore there are 3 eigenvectors with 3 corresponding eigenvalues.

By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance.
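Continuing the example, the covariance matrix of the centered data and its eigenpairs can be computed directly. A minimal sketch (Xc is the centered data from Step 1):

    # Sketch: Step 2 for the example data: covariance matrix and its
    # eigenvalues/eigenvectors.
    import numpy as np

    Xc = np.column_stack([
        [0.69, -1.31, 0.39, 0.09, 1.29, 0.49, 0.19, -0.81, -0.31, -0.71],
        [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
    ])
    cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    print(np.round(cov, 4))                 # approx. [[0.6166 0.6154], [0.6154 0.7166]]
    print(np.round(eigvals, 4))             # approx. [0.0491, 1.2840]
    print(np.round(eigvecs, 4))             # columns are the eigenvectors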
Step 3: Arrange Eigenvalues
The eigenvector with the highest eigenvalue corresponds to the first principal component of the dataset. So in this case, the eigenvector of λ1 is the first principal component.

Principal Component Analysis Example:

Let's suppose that our dataset is 2-dimensional with 2 variables x, y and that the eigenvalues and eigenvectors of the covariance matrix are as follows: λ1 = 1.28403 with eigenvector v1, and λ2 = 0.0490834 with eigenvector v2 (these values appear again in Step 4 below).

If we rank the eigenvalues in descending order, we get λ1 > λ2, which means that the eigenvector that corresponds to the first principal component (PC1) is v1 and the one that corresponds to the second principal component (PC2) is v2.

After having the principal components, to compute the percentage of variance (information) accounted for by each component, we divide the eigenvalue of each component by the sum of all eigenvalues.

If we apply this to the example above, we find that PC1 and PC2 carry respectively 96% and 4% of the variance of the data.
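As a quick check, this percentage calculation is a one-liner (a minimal sketch using the two eigenvalues quoted in this example):

    # Sketch: percentage of variance carried by each principal component.
    import numpy as np

    eigvals = np.array([1.28403, 0.0490834])           # eigenvalues from the example
    print(np.round(100 * eigvals / eigvals.sum(), 1))   # approx. [96.3, 3.7]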


STEP 4: CREATE A FEATURE VECTOR

As we saw in the previous step, computing the eigenvectors and ordering them by their eigenvalues in descending order allows us to find the principal components in order of significance.

In this step, what we do is choose whether to keep all these components or to discard those of lesser significance (those with low eigenvalues), and form with the remaining ones a matrix of vectors that we call the feature vector.

So, the feature vector is simply a matrix that has as columns the eigenvectors of the components that we decide to keep.

This makes it the first step towards dimensionality reduction, because if we choose to keep only p eigenvectors (components) out of n, the final dataset will have only p dimensions.
Principal Component Analysis Example:
Continuing with the example from the previous step, we can either form a feature vector with both eigenvectors v1 and v2, where the first column is the eigenvector v1 of λ1 = 1.28403 and the second column is the eigenvector v2 of λ2 = 0.0490834,

or discard the eigenvector v2, which is the one of lesser significance, and form a feature vector with v1 only.

Discarding the eigenvector v2 will reduce dimensionality by 1 and will consequently cause a loss of information in the final dataset. But given that v2 was carrying only 4 percent of the information, the loss is not important: we still keep the 96 percent of the information that is carried by v1.

So, as we saw in the example, it's up to you to choose whether to keep all the components or discard the ones of lesser significance, depending on what you are looking for.
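In code, the feature vector is just the matrix of kept eigenvector columns. A minimal sketch continuing from the eigvals and eigvecs of the Step 2 sketch above:

    # Sketch: Step 4, build the feature vector from the eigenvectors we keep
    # (eigvals and eigvecs come from the Step 2 sketch, ascending from eigh).
    import numpy as np

    order = np.argsort(eigvals)[::-1]   # eigenvalue indices, descending
    V_both = eigvecs[:, order]          # keep both eigenvectors (no reduction)
    V_pc1 = eigvecs[:, order[:1]]       # keep only the eigenvector of lambda1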
STEP 5: RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES

In the previous steps, apart from standardization, you do not make any changes to the data; you just select the principal components and form the feature vector, but the input dataset always remains in terms of the original axes (i.e., in terms of the initial variables).

In this step, the aim is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the axes represented by the principal components (hence the name Principal Component Analysis).

This can be done by multiplying the transpose of the feature vector by the transpose of the standardized dataset; equivalently, the standardized data matrix is multiplied by the feature vector, Z = X V.
Step 5 Example: Transform the Original Dataset
Use the equation Z = X V, where X is the standardized (centered) data and V is the feature vector.
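With the centered data and the feature vector from the earlier sketches, the projection is a single matrix product:

    # Sketch: Step 5, project the centered data onto the kept components.
    # Xc is the centered data (Step 1 sketch) and V_pc1 the feature vector
    # with only the first eigenvector (Step 4 sketch).
    Z = Xc @ V_pc1               # shape (10, 1): the data expressed along PC1
    print(Z.round(3))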
Step 6: Reconstructing Data
So, in order to reconstruct the original data, we first map the principal component scores back to the original axes and then add the mean back:
Row Original DataSet = Row Zero Mean Data + Original Mean
where Row Zero Mean Data is recovered from the scores as Z V^T (the scores multiplied by the transpose of the feature vector).
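A minimal sketch of this reconstruction, continuing from the earlier sketches (Z, V_pc1, and mean). If some components were discarded, the result is an approximation of the original data rather than an exact copy.

    # Sketch: Step 6, approximate reconstruction of the original data:
    # back-project the scores, then add the mean back.
    X_approx = Z @ V_pc1.T + mean
    print(X_approx.round(3))     # close to, but not exactly, the original data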
Page 302, ML in Action by Peter Harrington

13.3 Example: using PCA to reduce the dimensionality of semiconductor manufacturing data

Now that we have PCA working on a simple dataset, let's move to a real-world example.
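The book's semiconductor example uses the UCI SECOM dataset; its code is not reproduced here. The sketch below outlines the same idea under the assumption of a whitespace-delimited file named secom.data with NaN entries for missing values (the file name, format, and the choice of k are assumptions, not the book's listing).

    # Sketch (not the book's code): PCA on semiconductor manufacturing data,
    # assuming a whitespace-delimited "secom.data" with NaNs for missing values.
    import numpy as np

    data = np.genfromtxt("secom.data")        # in the book: 1567 samples x 590 features
    col_means = np.nanmean(data, axis=0)      # per-feature mean, ignoring NaNs
    data = np.where(np.isnan(data), col_means, data)  # fill missing values

    Z, V, eigvals, mean = pca(data, k=6)      # pca() from the sketch after the step list
    print(np.round(100 * eigvals[:6] / eigvals.sum(), 2))  # variance per top PC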
13.4 Summary

Dimensionality reduction techniques allow us to make data easier to use and often remove noise to make other machine learning tasks more accurate. Dimensionality reduction is often a preprocessing step that can be done to clean up data before applying it to some other algorithm.

A number of techniques can be used to reduce the dimensionality of our data. Among these, independent component analysis, factor analysis, and principal component analysis are popular methods. The most widely used method is principal component analysis.

Principal component analysis lets the data itself identify the important features. It does this by rotating the axes to align with the largest variance in the data. Other axes are chosen orthogonal to the first axis, in the direction of the largest remaining variance. Eigenvalue analysis of the covariance matrix gives us a set of such orthogonal axes.
