
Steps For PCA

The document provides a detailed step-by-step explanation of Principal Component Analysis (PCA), starting with the standardization of data to ensure equal contribution of variables. It then covers the computation of the covariance matrix to identify relationships between variables, followed by the calculation of eigenvectors and eigenvalues to determine principal components. Finally, it discusses the creation of a feature vector for dimensionality reduction and the recasting of data along the principal components axes.

Uploaded by

GOBINDA PRADHAN 076


Step-by-Step Explanation of PCA

Step 1: Standardization

The aim of this step is to standardize the range of the continuous initial variables so that each one of
them contributes equally to the analysis.

Standardization matters because PCA is sensitive to the variances of the initial variables. If the ranges of the initial variables differ greatly, the variables with larger ranges will dominate those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate a variable that ranges between 0 and 1), which leads to biased results. Transforming the data to comparable scales prevents this problem.

Mathematically, this is done by subtracting the variable's mean from each value of that variable and dividing by the variable's standard deviation:
z = (value − mean) / standard deviation

Once standardization is done, every variable has mean 0 and standard deviation 1, so all the variables are on the same scale.
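A minimal NumPy sketch of this step, assuming the data is a 2-D array with one row per observation and one column per variable (the `standardize` helper and the sample values are hypothetical). It divides by the sample standard deviation (`ddof=1`); dividing by the population standard deviation is an equally common choice. A constant column has zero standard deviation and would need separate handling.

```python
import numpy as np

def standardize(X):
    """Return z-scores: each column (variable) rescaled to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)          # per-variable mean
    std = X.std(axis=0, ddof=1)    # per-variable sample standard deviation
    return (X - mean) / std

# Hypothetical data: 4 observations of 2 variables on very different scales.
X = np.array([[10.0, 0.2],
              [50.0, 0.9],
              [90.0, 0.4],
              [30.0, 0.7]])
Z = standardize(X)
print(Z.mean(axis=0))         # approximately [0, 0]
print(Z.std(axis=0, ddof=1))  # approximately [1, 1]
```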

Step 2: Covariance Matrix Computation

The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other, in other words, to see whether there is any relationship between them. Variables are sometimes so highly correlated that they contain redundant information, and computing the covariance matrix is how we identify these correlations.

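The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions) whose entries are the covariances of all possible pairs of the initial variables. For example, for a 3-dimensional data set with 3 variables x, y, and z, the covariance matrix is a 3 × 3 matrix of this form:

$$
\begin{bmatrix}
\mathrm{Cov}(x,x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z) \\
\mathrm{Cov}(y,x) & \mathrm{Cov}(y,y) & \mathrm{Cov}(y,z) \\
\mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Cov}(z,z)
\end{bmatrix}
$$

Covariance Matrix for 3-Dimensional Data.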

Since the covariance of a variable with itself is its variance (Cov(a,a)=Var(a)), the main diagonal (top left to bottom right) holds the variances of the initial variables. And since covariance is commutative (Cov(a,b)=Cov(b,a)), the entries of the covariance matrix are symmetric with respect to the main diagonal, which means that the upper and lower triangular portions are equal.
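The following sketch, on hypothetical data, computes the p × p sample covariance matrix directly from its definition and checks the two facts just described: the diagonal holds the variances, and the matrix is symmetric. It also compares the result with NumPy's `np.cov` (with `rowvar=False`, so columns are treated as variables).

```python
import numpy as np

# Hypothetical data: 5 observations of 3 variables x, y, z (one column per variable).
X = np.array([[2.0, 1.0, 0.5],
              [3.0, 2.5, 0.1],
              [4.0, 2.9, 0.4],
              [5.0, 4.2, 0.2],
              [6.0, 4.8, 0.3]])

# Centre each variable, then C = Xc^T Xc / (n - 1) is the p x p sample covariance matrix.
n = X.shape[0]
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (n - 1)

print(np.allclose(C, C.T))                             # True: Cov(a, b) = Cov(b, a)
print(np.allclose(np.diag(C), X.var(axis=0, ddof=1)))  # True: the diagonal holds the variances
print(np.allclose(C, np.cov(X, rowvar=False)))         # True: matches NumPy's estimator
print(C[0, 1] > 0)  # True: x and y rise together, so their covariance is positive
```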

What do the covariances that we have as entries of the matrix tell us about the correlations between the variables?

What matters is the sign of the covariance:

• If positive: the two variables increase or decrease together (positively correlated).
• If negative: one increases when the other decreases (negatively, or inversely, correlated).

Now that we know that the covariance matrix is nothing more than a table that summarizes the covariances (and hence the correlations) between all possible pairs of variables, let's move to the next step.

Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components

Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the
covariance matrix in order to determine the principal components of the data.

The first thing to know about eigenvectors and eigenvalues is that they come in pairs: every eigenvector has a corresponding eigenvalue. Their number is also equal to the number of dimensions of the data. For example, a 3-dimensional data set has 3 variables, so its covariance matrix has 3 eigenvectors with 3 corresponding eigenvalues.

Eigenvectors and eigenvalues are what make principal components work: the eigenvectors of the covariance matrix are the directions of the axes along which the data has the most variance (the most information), and these directions are what we call principal components. The eigenvalues are simply the coefficients attached to the eigenvectors; each one gives the amount of variance carried by its principal component.

By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the principal
components in order of significance.
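As an illustration, assuming NumPy: `np.linalg.eigh` (intended for symmetric matrices such as a covariance matrix) returns eigenvalues in ascending order, so a sketch of the ranking step reverses that order. The helper name and the 2 × 2 matrix are hypothetical. Each eigenvector is only defined up to sign, so a library may return v or −v.

```python
import numpy as np

def sorted_eigenpairs(C):
    """Eigen-decompose a symmetric covariance matrix and sort by eigenvalue, largest first."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigenvalues)[::-1]           # indices for descending order
    # Column i of the returned matrix is the eigenvector paired with eigenvalue i.
    return eigenvalues[order], eigenvectors[:, order]

# Hypothetical 2 x 2 covariance matrix.
C = np.array([[2.0, 0.8],
              [0.8, 0.6]])
values, vectors = sorted_eigenpairs(C)
print(values)         # largest eigenvalue first: the variance carried by PC1
print(vectors[:, 0])  # direction of PC1
```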

Principal Component Analysis Example:

Let's suppose that our data set is 2-dimensional, with 2 variables x and y, and that the eigenvectors and eigenvalues of the covariance matrix are as follows:

If we rank the eigenvalues in descending order, we get λ1>λ2, which means that the eigenvector that
corresponds to the first principal component (PC1) is v1 and the one that corresponds to the second
principal component (PC2) is v2.

Once we have the principal components, we compute the percentage of variance (information) accounted for by each component by dividing its eigenvalue by the sum of all the eigenvalues. Applied to the example above, this shows that PC1 and PC2 carry 96 percent and 4 percent of the variance of the data, respectively.
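The percentage computation is a one-liner. The sketch below uses hypothetical eigenvalues chosen to give the same 96/4 split; they are not the values from the example above.

```python
import numpy as np

def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component: lambda_i / sum_j lambda_j."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return eigenvalues / eigenvalues.sum()

# Hypothetical eigenvalues, already sorted in descending order.
ratios = explained_variance_ratio([4.8, 0.2])
print(ratios)  # roughly 0.96 and 0.04: PC1 carries 96 %, PC2 carries 4 %
```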
Step 4: Create a Feature Vector

As we saw in the previous step, computing the eigenvectors and ordering them by their eigenvalues in descending order allows us to find the principal components in order of significance. In this step, we choose whether to keep all of these components or discard those of lesser significance (those with low eigenvalues), and we form a matrix of vectors from the remaining ones that we call the feature vector.

So, the feature vector is simply a matrix whose columns are the eigenvectors of the components we decide to keep. This makes it the first step towards dimensionality reduction: if we choose to keep only k eigenvectors (components) out of p, the final data set will have only k dimensions.
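A small sketch of building the feature vector, assuming NumPy and a hypothetical covariance matrix. `feature_vector` is an illustrative helper that keeps the k eigenvectors with the largest eigenvalues as the columns of a p × k matrix.

```python
import numpy as np

def feature_vector(eigenvalues, eigenvectors, k):
    """Keep the k eigenvectors with the largest eigenvalues as the columns of a p x k matrix."""
    order = np.argsort(eigenvalues)[::-1][:k]   # indices of the k largest eigenvalues
    return eigenvectors[:, order]

C = np.array([[2.0, 0.8],
              [0.8, 0.6]])                      # hypothetical covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(C)

W_full = feature_vector(eigenvalues, eigenvectors, k=2)  # keep both components: p x p
W_pc1 = feature_vector(eigenvalues, eigenvectors, k=1)   # keep only PC1: p x 1
print(W_full.shape, W_pc1.shape)  # (2, 2) (2, 1)
```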

Principal Component Analysis Example:

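Continuing with the example from the previous step, we can either form a feature vector with both eigenvectors v1 and v2:

$$\text{Feature vector} = \begin{bmatrix} v_1 & v_2 \end{bmatrix}$$

or discard the eigenvector v2, which is the one of lesser significance, and form a feature vector with v1 only:

$$\text{Feature vector} = \begin{bmatrix} v_1 \end{bmatrix}$$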

Discarding the eigenvector v2 reduces dimensionality by 1 and consequently causes a loss of information in the final data set. But given that v2 carried only 4 percent of the information, the loss is not significant: we still keep the 96 percent of the information carried by v1.

So, as the example shows, it's up to you to choose whether to keep all the components or discard the less significant ones, depending on what you are looking for. If you just want to describe your data in terms of new, uncorrelated variables (principal components) without reducing dimensionality, there is no need to leave out the less significant components.

Step 5: Recast the Data Along the Principal Components Axes

In the previous steps, apart from standardization, you do not make any changes to the data; you just select the principal components and form the feature vector, while the input data set remains expressed in terms of the original axes (i.e., in terms of the initial variables).

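In this last step, the aim is to use the feature vector, formed from the eigenvectors of the covariance matrix, to reorient the data from the original axes to the axes represented by the principal components (hence the name Principal Component Analysis). This is done by multiplying the transpose of the feature vector by the transpose of the standardized original data set:

$$\text{FinalDataSet} = \text{FeatureVector}^{T} \cdot \text{StandardizedOriginalDataSet}^{T}$$

If the feature vector is p × k and the data set has n rows, the result is a k × n matrix whose columns are the observations expressed along the principal component axes.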
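Putting the five steps together, here is a minimal end-to-end sketch under the same assumptions (NumPy, rows are observations, columns are variables; `pca_project` and the synthetic data are hypothetical). It computes FeatureVector^T · StandardizedOriginalDataSet^T and transposes the result back to one row per observation. It is a teaching sketch, not a replacement for a library implementation.

```python
import numpy as np

def pca_project(X, k):
    """Minimal PCA sketch: standardize, covariance, eigendecomposition, keep k PCs, project."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Step 1: standardize
    C = np.cov(Z, rowvar=False)                        # Step 2: p x p covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(C)      # Step 3: eigenpairs (ascending order)
    order = np.argsort(eigenvalues)[::-1]              # rank components, largest variance first
    W = eigenvectors[:, order[:k]]                     # Step 4: feature vector, p x k
    final_data = W.T @ Z.T                             # Step 5: k x n, one column per observation
    return final_data.T, eigenvalues[order]            # n x k scores, eigenvalues in descending order

# Synthetic data: the second variable is strongly correlated with the first.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.3, size=200), rng.normal(size=200)])
scores, eigenvalues = pca_project(X, k=2)
print(scores.shape)  # (200, 2)
```

A useful sanity check: the sample variance of each projected column equals the corresponding eigenvalue, and the projected columns are uncorrelated.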
Covariance Matrix

Example 1: The marks scored by 3 students in Physics and Biology are given below:
Student   Physics (X)   Biology (Y)
A         92            80
B         60            30
C         100           70

Calculate Covariance Matrix from the above data.

Solution:
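Each entry of the sample covariance matrix is computed as

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}, \qquad \mathrm{var}(x) = \mathrm{cov}(x, x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Here, $\bar{x} = 84$ and $n = 3$:

$$\mathrm{var}(x) = \frac{(92 - 84)^2 + (60 - 84)^2 + (100 - 84)^2}{3 - 1} = 448$$

Also, $\bar{y} = 60$ and $n = 3$:

$$\mathrm{var}(y) = \frac{(80 - 60)^2 + (30 - 60)^2 + (70 - 60)^2}{3 - 1} = 700$$

Now,

$$\mathrm{cov}(x, y) = \mathrm{cov}(y, x) = \frac{(92 - 84)(80 - 60) + (60 - 84)(30 - 60) + (100 - 84)(70 - 60)}{3 - 1} = 520$$

The sample covariance matrix is therefore:

$$\begin{bmatrix} 448 & 520 \\ 520 & 700 \end{bmatrix}$$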
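The hand calculation can be checked with NumPy's `np.cov`, which treats each row as a variable by default and divides by n − 1 (the sample estimate) unless `bias=True` is passed, in which case it divides by n (the population estimate).

```python
import numpy as np

# Marks from Example 1: one row per variable, one column per student (A, B, C).
marks = np.array([[92, 60, 100],   # Physics (X)
                  [80, 30, 70]])   # Biology (Y)

S = np.cov(marks)                   # sample covariance matrix, divides by n - 1
P = np.cov(marks, bias=True)        # population covariance matrix, divides by n
print(S)  # expected: [[448, 520], [520, 700]]
```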

Properties of Covariance Matrix

The properties of a covariance matrix are listed below:
• A covariance matrix is always square: the number of rows always equals the number of columns.
• A covariance matrix is always symmetric: its transpose is always equal to the original matrix.
• A covariance matrix is always positive semi-definite.
• The eigenvalues of a covariance matrix are always real and non-negative.
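These properties can be checked numerically on any data set. The sketch below uses hypothetical random data and a small tolerance, because floating-point round-off can make a zero eigenvalue come out as a tiny negative number.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(50, 4))           # hypothetical data: 50 observations, 4 variables
C = np.cov(data, rowvar=False)            # 4 x 4 sample covariance matrix

is_square = C.shape[0] == C.shape[1]
is_symmetric = np.allclose(C, C.T)
eigenvalues = np.linalg.eigvalsh(C)       # eigvalsh returns real eigenvalues of a symmetric matrix
is_psd = bool(np.all(eigenvalues >= -1e-12))  # non-negative up to floating-point round-off

# Positive semi-definiteness also means v^T C v >= 0 for any vector v.
v = rng.normal(size=4)
quadratic_form = v @ C @ v
print(is_square, is_symmetric, is_psd, quadratic_form >= 0)
```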
Eigenvalues and Eigenvectors

Eigenvalues Definition
Eigenvalues are the scalar values associated with the eigenvectors of a linear transformation. The word "eigen" is of German origin and means "own" or "characteristic".

Eigenvalues are scalar values associated with a square matrix that measure how the matrix transforms a vector. If a matrix A multiplies a vector v and the result is a scalar multiple of v, then that scalar is the eigenvalue corresponding to the eigenvector v. Eigenvalues are widely used in fields like physics, engineering, and data science.
These characteristic values give the factor by which eigenvectors are stretched (or shrunk) along their own direction. Multiplying by the matrix does not change the direction of an eigenvector, except that a negative eigenvalue reverses it.
The equation for an eigenvalue is given by
Av = λv
where:
• A is the matrix,
• v is the associated eigenvector, and
• λ is the scalar eigenvalue.
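A quick numerical check of Av = λv, assuming NumPy and two hypothetical matrices: a symmetric matrix whose eigenvectors are only stretched, and a reflection whose eigenvalue −1 reverses an eigenvector's direction. `check_eigenpairs` is an illustrative helper.

```python
import numpy as np

def check_eigenpairs(M):
    """Return the eigenvalues of M and, for each pair, whether M @ v equals lam * v."""
    eigenvalues, eigenvectors = np.linalg.eig(M)  # column i of eigenvectors pairs with eigenvalues[i]
    checks = [np.allclose(M @ eigenvectors[:, i], eigenvalues[i] * eigenvectors[:, i])
              for i in range(len(eigenvalues))]
    return eigenvalues, checks

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric: eigenvalues 3 and 1, eigenvectors only stretched
B = np.array([[1.0, 0.0],
              [0.0, -1.0]])  # reflection: eigenvalue -1 reverses the direction of (0, 1)

for M in (A, B):
    eigenvalues, checks = check_eigenpairs(M)
    print(eigenvalues, checks)
```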

What are Eigenvectors?

Eigenvectors of a square matrix are defined as non-zero vectors which, when multiplied by the matrix, give a scalar multiple of the vector; i.e., we define "v" to be an eigenvector of matrix A if it satisfies the condition Av = λv.
The scalar multiple λ in this case is called the eigenvalue of the square matrix. When computing by hand, the eigenvalues are usually found first, by solving the characteristic equation det(A − λI) = 0, and the eigenvectors are then found for each eigenvalue.
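To illustrate the eigenvalues-first procedure, the sketch below works a 2 × 2 case in plain Python. For A = [[a, b], [c, d]], det(A − λI) = 0 becomes λ² − (a + d)λ + (ad − bc) = 0, and for each root λ the vector (b, λ − a) solves (A − λI)v = 0. The helper `eig_2x2` is hypothetical and assumes b ≠ 0 and real eigenvalues.

```python
import math

def eig_2x2(a, b, c, d):
    """Eigenvalues, then one eigenvector each, for [[a, b], [c, d]]; assumes b != 0 and real roots."""
    if b == 0:
        raise ValueError("this sketch assumes b != 0")
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det        # discriminant of lambda^2 - trace*lambda + det = 0
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled here")
    roots = [(trace + math.sqrt(disc)) / 2, (trace - math.sqrt(disc)) / 2]
    # For each eigenvalue, (A - lambda I) v = 0 is solved by v = (b, lambda - a).
    return [(lam, (b, lam - a)) for lam in roots]

pairs = eig_2x2(2.0, 1.0, 1.0, 2.0)
print(pairs)  # [(3.0, (1.0, 1.0)), (1.0, (1.0, -1.0))]
```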
For any square matrix A of order n × n, the eigenvector is a column matrix of order n × 1. If we find the eigenvector of A from Av = λv, then "v" is called the right eigenvector of A, because it is multiplied on the right-hand side; the side matters because matrix multiplication is not commutative. In general, "eigenvector" without qualification means the right eigenvector.
We can also find the left eigenvector of the square matrix A by using the relation vA = λv.
Here, v is the left eigenvector and is always multiplied on the left-hand side. If matrix A is of order n × n, then v is a row matrix of order 1 × n.
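Since vA = λv is equivalent to Aᵀvᵀ = λvᵀ, left eigenvectors of A can be computed as ordinary (right) eigenvectors of Aᵀ. A sketch with NumPy and a hypothetical non-symmetric matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])  # hypothetical non-symmetric matrix, eigenvalues 1 and 3

# A row vector u with u A = lambda u is a right eigenvector of A^T, since A^T u^T = lambda u^T.
lam_left, U = np.linalg.eig(A.T)
for lam, u in zip(lam_left, U.T):
    print(lam, np.allclose(u @ A, lam * u))  # u is used as a 1 x n row vector here

# The right eigenvectors share the same eigenvalues but are different vectors here.
lam_right, V = np.linalg.eig(A)
```

For a symmetric matrix such as a covariance matrix, Aᵀ = A, so left and right eigenvectors coincide.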
Eigenvector Equation
The eigenvector equation is the equation used to find the eigenvectors of any square matrix:
Av = λv
where:
• A is the given square matrix,
• v is an eigenvector of matrix A, and
• λ is the corresponding eigenvalue (a scalar).
