
PCA

PCA stands for Principal Component Analysis.


It is a way of finding the most important features in a dataset.

Principal component analysis (PCA) is a dimensionality reduction method that is often used to reduce the dimensionality of large datasets by transforming a large set of variables into a smaller one that still contains most of the information.
What Is Principal Component Analysis?

Principal Component Analysis is an unsupervised learning algorithm used for dimensionality reduction in machine learning.

It is a statistical procedure that converts observations of correlated features into a set of linearly uncorrelated features.

These new transformed features are called the principal components. PCA is one of the popular tools used for exploratory data analysis and predictive modeling.

It is a technique for drawing strong patterns from a given dataset by reducing its dimensionality while retaining as much of the variance as possible.
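As a quick illustration of PCA used as an off-the-shelf tool, the sketch below runs scikit-learn's PCA on a small made-up dataset; the data and variable names are only for illustration and are not part of the worked example later in these notes.

    # Minimal sketch: PCA as a dimensionality-reduction tool (scikit-learn).
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # 100 samples, 5 features (illustrative)
    X[:, 3] = X[:, 0] + 0.1 * X[:, 3]      # make one feature depend on another

    pca = PCA(n_components=2)              # keep the first two principal components
    Z = pca.fit_transform(X)               # reduced data, shape (100, 2)

    print(Z.shape)
    print(pca.explained_variance_ratio_)   # share of variance captured by each PC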
What Are Principal Components?

Principal components are new variables that are constructed as linear combinations, or mixtures, of the initial variables.

These combinations are made in such a way that the new variables (i.e., the principal components) are uncorrelated and most of the information within the initial variables is compressed into the first components.

So the idea is: 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
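These two properties can be checked directly in code. The sketch below (random data, purely illustrative) projects centered data onto all of its principal components and shows that the resulting variables are uncorrelated and that the variance concentrates in the first components.

    # Sketch: principal-component scores are uncorrelated, and variance
    # concentrates in the first components. Random data for illustration only.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))
    X[:, 1] += 2.0 * X[:, 0]                     # introduce correlation

    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # 4x4 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = Xc @ eigvecs                        # project onto all components
    print(np.round(np.cov(scores, rowvar=False), 3))  # ~diagonal: uncorrelated
    print(np.round(eigvals / eigvals.sum(), 3))       # most variance in first PCs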
Scree plot

Organizing information in principal components this way allows you to reduce dimensionality without losing much information, by discarding the components with low information and treating the remaining components as your new variables.

[Figure: scree plot showing the percentage of variance (information) explained by each principal component.]
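A scree plot like the one described above can be drawn from the per-component variance ratios. A minimal matplotlib sketch (the random data is only a stand-in; any fitted PCA's explained_variance_ratio_ would do):

    # Sketch: scree plot of the percentage of variance explained by each PC.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    X = rng.normal(size=(150, 6))                 # illustrative random data
    pca = PCA().fit(X)                            # keep all components

    ratios = pca.explained_variance_ratio_ * 100  # percentage per component
    plt.bar(range(1, len(ratios) + 1), ratios)
    plt.xlabel("Principal component")
    plt.ylabel("Percentage of variance explained")
    plt.title("Scree plot")
    plt.show()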


How PCA Constructs the Principal Components

As there are as many principal components as there are variables in the data, principal components are constructed in such a manner that the first principal component accounts for the largest possible variance in the dataset.

The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance.

This continues until a total of p principal components have been calculated, equal to the original number of variables.
Steps for performing PCA on a dataset:

1. Standardize the data.

2. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.

3. Arrange the eigenvalues in descending order.

4. Create a feature vector to decide which principal components to keep.

5. Project the data onto the principal components.
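These five steps can be sketched end to end in NumPy as follows. This is a minimal sketch, not the book's code; the function and variable names are our own, and "standardize" is done here as mean-centering, which is what the worked example below uses.

    # Sketch of the five PCA steps in NumPy (mean-centering used as the
    # "standardize" step, as in the worked example that follows).
    import numpy as np

    def pca(X, k):
        # Step 1: center the data (subtract the mean of each variable).
        mean = X.mean(axis=0)
        Xc = X - mean
        # Step 2: covariance matrix and its eigenvectors/eigenvalues.
        cov = np.cov(Xc, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Step 3: arrange eigenvalues (and their eigenvectors) in descending order.
        order = np.argsort(eigvals)[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        # Step 4: feature vector = matrix whose columns are the kept eigenvectors.
        V = eigvecs[:, :k]
        # Step 5: project the centered data onto the principal components.
        Z = Xc @ V
        return Z, V, eigvals, mean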


Consider the following dataset:

x1: 2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1
x2: 2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9

Step 1: Standardize the Dataset

If there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate a variable that ranges between 0 and 1), which will lead to biased results.

So, transforming the data to comparable scales can prevent this problem. Once the standardization is done, all the variables will be on the same scale.
Step 1: Standardize the Dataset (continued)

Mean of x1 = 1.81 and mean of x2 = 1.91. Here the two variables are already on comparable scales, so standardization reduces to subtracting the means. The centered data are:

x1: 0.69 -1.31 0.39 0.09 1.29 0.49 0.19 -0.81 -0.31 -0.71
x2: 0.49 -1.21 0.99 0.29 1.09 0.79 -0.31 -0.81 -0.31 -1.01
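In code, Step 1 for this dataset is just subtracting each variable's mean. A minimal sketch that reproduces the table above:

    # Sketch: Step 1 (mean-centering) for the example dataset.
    import numpy as np

    x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
    x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
    X = np.column_stack([x1, x2])

    mean = X.mean(axis=0)        # [1.81, 1.91]
    Xc = X - mean                # centered data, matching the table above
    print(mean)
    print(np.round(Xc, 2))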


Step 2: Find the Eigenvalues and eigenvectors

Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the covariance matrix in order to determine the principal components of the data.

What you first need to know about eigenvectors and eigenvalues is that they always come in pairs, so that every eigenvector has an eigenvalue. Also, their number is equal to the number of dimensions of the data. For example, a 3-dimensional dataset has 3 variables, therefore there are 3 eigenvectors with 3 corresponding eigenvalues.

By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance.
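Continuing the example, the covariance matrix of the centered data and its eigenpairs can be computed directly. A minimal sketch (Xc is the centered data from Step 1):

    # Sketch: Step 2 for the example data: covariance matrix and its
    # eigenvalues/eigenvectors.
    import numpy as np

    Xc = np.column_stack([
        [0.69, -1.31, 0.39, 0.09, 1.29, 0.49, 0.19, -0.81, -0.31, -0.71],
        [0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
    ])
    cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    print(np.round(cov, 4))                 # approx. [[0.6166 0.6154], [0.6154 0.7166]]
    print(np.round(eigvals, 4))             # approx. [0.0491, 1.2840]
    print(np.round(eigvecs, 4))             # columns are the eigenvectors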
Step 3: Arrange Eigenvalues
The eigenvector with the highest eigenvalue corresponds to the first principal component of the dataset. So in this case, the eigenvector of λ1 is the first principal component.

Principal Component Analysis Example:

Let's suppose that our dataset is 2-dimensional with 2 variables x, y and that the eigenvalues and eigenvectors of the covariance matrix are as follows: λ1 = 1.28403 with eigenvector v1, and λ2 = 0.0490834 with eigenvector v2 (these values appear again in Step 4 below).

If we rank the eigenvalues in descending order, we get λ1 > λ2, which means that the eigenvector that corresponds to the first principal component (PC1) is v1 and the one that corresponds to the second principal component (PC2) is v2.

After having the principal components, to compute the percentage of variance (information) accounted for by each component, we divide the eigenvalue of each component by the sum of all eigenvalues.

If we apply this to the example above, we find that PC1 and PC2 carry respectively 96% and 4% of the variance of the data.
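As a quick check, this percentage calculation is a one-liner (a minimal sketch using the two eigenvalues quoted in this example):

    # Sketch: percentage of variance carried by each principal component.
    import numpy as np

    eigvals = np.array([1.28403, 0.0490834])           # eigenvalues from the example
    print(np.round(100 * eigvals / eigvals.sum(), 1))   # approx. [96.3, 3.7]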


STEP 4: CREATE A FEATURE VECTOR

As we saw in the previous step, computing the eigenvectors and ordering them by their eigenvalues in descending order allows us to find the principal components in order of significance.

In this step, what we do is choose whether to keep all these components or to discard those of lesser significance (those with low eigenvalues), and form with the remaining ones a matrix of vectors that we call the feature vector.

So, the feature vector is simply a matrix that has as columns the eigenvectors of the components that we decide to keep.

This makes it the first step towards dimensionality reduction, because if we choose to keep only p eigenvectors (components) out of n, the final dataset will have only p dimensions.
Principal Component Analysis Example:
Continuing with the example from the previous step, we can either form a feature vector with both eigenvectors v1 and v2, where the first column is the eigenvector v1 of λ1 = 1.28403 and the second column is the eigenvector v2 of λ2 = 0.0490834,

or discard the eigenvector v2, which is the one of lesser significance, and form a feature vector with v1 only.

Discarding the eigenvector v2 will reduce dimensionality by 1 and will consequently cause a loss of information in the final dataset. But given that v2 was carrying only 4 percent of the information, the loss is not important: we still keep the 96 percent of the information that is carried by v1.

So, as we saw in the example, it's up to you to choose whether to keep all the components or discard the ones of lesser significance, depending on what you are looking for.
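In code, the feature vector is just the matrix of kept eigenvector columns. A minimal sketch continuing from the eigvals and eigvecs of the Step 2 sketch above:

    # Sketch: Step 4, build the feature vector from the eigenvectors we keep
    # (eigvals and eigvecs come from the Step 2 sketch, ascending from eigh).
    import numpy as np

    order = np.argsort(eigvals)[::-1]   # eigenvalue indices, descending
    V_both = eigvecs[:, order]          # keep both eigenvectors (no reduction)
    V_pc1 = eigvecs[:, order[:1]]       # keep only the eigenvector of lambda1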
STEP 5: RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES

In the previous steps, apart from standardization, you do not make any changes to the data; you just select the principal components and form the feature vector, but the input dataset always remains in terms of the original axes (i.e., in terms of the initial variables).

In this step, the aim is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the axes represented by the principal components (hence the name Principal Component Analysis).

This can be done by multiplying the transpose of the feature vector by the transpose of the standardized dataset; equivalently, the standardized data matrix is multiplied by the feature vector, Z = X V.
Step 5 Example: Transform the Original Dataset
Use the equation Z = X V, where X is the standardized (centered) data and V is the feature vector.
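With the centered data and the feature vector from the earlier sketches, the projection is a single matrix product:

    # Sketch: Step 5, project the centered data onto the kept components.
    # Xc is the centered data (Step 1 sketch) and V_pc1 the feature vector
    # with only the first eigenvector (Step 4 sketch).
    Z = Xc @ V_pc1               # shape (10, 1): the data expressed along PC1
    print(Z.round(3))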
Step 6: Reconstructing Data
So, in order to reconstruct the original data, we first map the principal component scores back to the original axes and then add the mean back:
Row Original DataSet = Row Zero Mean Data + Original Mean
where Row Zero Mean Data is recovered from the scores as Z V^T (the scores multiplied by the transpose of the feature vector).
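A minimal sketch of this reconstruction, continuing from the earlier sketches (Z, V_pc1, and mean). If some components were discarded, the result is an approximation of the original data rather than an exact copy.

    # Sketch: Step 6, approximate reconstruction of the original data:
    # back-project the scores, then add the mean back.
    X_approx = Z @ V_pc1.T + mean
    print(X_approx.round(3))     # close to, but not exactly, the original data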
Page 302, ML in Action by Peter Harrington

13.3 Example: using PCA to reduce the dimensionality of semiconductor manufacturing data

Now that we have PCA working on a simple dataset, let's move to a real-world example.
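The book's semiconductor example uses the UCI SECOM dataset; its code is not reproduced here. The sketch below outlines the same idea under the assumption of a whitespace-delimited file named secom.data with NaN entries for missing values (the file name, format, and the choice of k are assumptions, not the book's listing).

    # Sketch (not the book's code): PCA on semiconductor manufacturing data,
    # assuming a whitespace-delimited "secom.data" with NaNs for missing values.
    import numpy as np

    data = np.genfromtxt("secom.data")        # in the book: 1567 samples x 590 features
    col_means = np.nanmean(data, axis=0)      # per-feature mean, ignoring NaNs
    data = np.where(np.isnan(data), col_means, data)  # fill missing values

    Z, V, eigvals, mean = pca(data, k=6)      # pca() from the sketch after the step list
    print(np.round(100 * eigvals[:6] / eigvals.sum(), 2))  # variance per top PC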
13.4 Summary

Dimensionality reduction techniques allow us to make data easier to use and often remove noise to make other machine learning tasks more accurate. Dimensionality reduction is often a preprocessing step that can be done to clean up data before applying it to some other algorithm.

A number of techniques can be used to reduce the dimensionality of our data. Among these, independent component analysis, factor analysis, and principal component analysis are popular methods. The most widely used method is principal component analysis.

Principal component analysis lets the data itself identify the important features. It does this by rotating the axes to align with the largest variance in the data. Other axes are chosen orthogonal to the first axis, in the direction of the largest remaining variance. Eigenvalue analysis of the covariance matrix gives us a set of such orthogonal axes.
