
Principal Component Analysis

Introduction

• Principal Component Analysis (PCA) is a technique that is widely used for applications such as:
• dimensionality reduction
• lossy data compression
• feature extraction and
• data visualisation.
• PCA can be defined as the orthogonal projection of the data onto a
lower-dimensional linear space, known as the principal subspace, such
that the variance of the projected data is maximized.
• Equivalently, it can be defined as the linear projection that minimizes
the average projection cost, defined as the mean squared distance
between the data points and their projections.
Maximum variance formulation

• Consider a dataset of observations {xn}, where n = 1, 2, 3, …, N and xn is a Euclidean variable with dimensionality D.
• Our goal is to project the data onto a space having dimensionality
M<D, while maximizing the variance of the projected data.
• To start with, consider the projection onto a one-dimensional space
(M = 1).
• We can define the direction of this space using a D-dimensional vector
u1, which, for convenience, we shall choose to be a unit vector so that
u1ᵀu1 = 1.
• Note: A set of vectors in Rⁿ is called a basis if they are linearly
independent and every vector in Rⁿ can be expressed as a linear
combination of these vectors.
• A set of vectors {x1, x2, x3, …, xn} is said to be linearly independent
if the linear vector equation w1x1 + w2x2 + … + wnxn = 0 has only the
trivial solution w1 = w2 = … = wn = 0. The set {x1, x2, x3, …, xn} is
linearly dependent otherwise.
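
As a minimal numerical sketch of this formulation (assuming NumPy and randomly generated illustrative data, not data from these slides), the direction u1 that maximizes the variance of the projected data is the unit eigenvector of the data covariance matrix with the largest eigenvalue:

```python
import numpy as np

# Illustrative data only (not from these slides): N observations, D = 2.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.2], [1.2, 1.0]],
                            size=500)

Xc = X - X.mean(axis=0)                 # subtract the mean
S = np.cov(Xc, rowvar=False)            # D x D covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)    # eigh: symmetric matrix, eigenvalues in ascending order
u1 = eigvecs[:, -1]                     # unit eigenvector with the largest eigenvalue

# The variance of the data projected onto u1 equals u1ᵀ S u1, which is the largest eigenvalue.
projected_variance = u1 @ S @ u1
print(projected_variance, eigvals[-1])  # the two values agree (up to rounding)
```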
Principal Component Analysis

• PCA is a way of identifying patterns in data, and of expressing the data in such a way as to highlight their similarities and differences.
• Since patterns can be hard to find in data of high dimension, PCA is a
powerful tool for analysing such data.
• One main advantage of PCA is that once we have found these patterns,
we can compress the data by reducing the number of dimensions
without much loss of information.
Method

• Step 1: Get the data


• Step 2: Subtract the mean
For PCA to work properly, we have to subtract the mean from each of
the data dimensions. So, for example, all the x values have x̄ (the mean of
the x values) subtracted, and all the y values have ȳ subtracted. This
produces a dataset whose mean is zero.
• Step 3: Calculate the covariance matrix.
Since the data is 2-dimensional, the covariance matrix is a 2×2 matrix. In
this example, the covariance matrix is

Cov = | 0.616555556   0.615444444 |
      | 0.615444444   0.716555556 |

Since the off-diagonal elements in this covariance matrix are positive, the
x and y variables have a positive correlation. That is, both x and y
increase together.
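
A short sketch of Steps 2 and 3, assuming NumPy; the data values below are hypothetical placeholders (the slides do not list the original data points), so the resulting covariance matrix will differ from the one shown above:

```python
import numpy as np

# Hypothetical placeholder 2-D dataset (not the data behind the slides' numbers).
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

data_adjust = data - data.mean(axis=0)     # Step 2: subtract the mean of each dimension
cov = np.cov(data_adjust, rowvar=False)    # Step 3: 2x2 covariance matrix
print(cov)
```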
• Step 4: Calculate the eigenvectors and eigenvalues of the covariance
matrix.
From the covariance matrix, it is possible to calculate the eigenvectors and
eigenvalues. These are very important because they represent useful
information about our data.
The eigenvectors and eigenvalues of our covariance matrix are as follows:

eigenvalues = (0.0490833989, 1.28402771)

eigenvectors = | −0.735178656   −0.677873399 |
               |  0.677873399   −0.735178656 |
• These eigenvectors are both unit eigenvectors.
• We can plot these eigenvectors on top of the data we have.
• They appear as diagonal lines on the plot.
• They are perpendicular to each other.
• They provide information about patterns in data.
• One of the eigenvectors goes through the middle of the points, like drawing a line of
best fit.
• That eigenvector shows the relationship between x and y through that line (an
approximation of the data points).
• The second eigenvector is less important, but it gives us the other pattern in the data:
all the points follow the main line, but are off to the side of it by some amount.
• So, by this process of taking the eigenvectors of the covariance matrix, we
have been able to extract lines that characterize the data.
• It is possible to transform the given data in such a way that it is expressed in
terms of these lines.
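
The eigendecomposition in Step 4 can be checked with a minimal NumPy sketch; np.linalg.eigh returns unit eigenvectors of a symmetric matrix (possibly with signs flipped relative to the slides):

```python
import numpy as np

# Step 4: eigenvectors and eigenvalues of the example covariance matrix.
cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh is suited to symmetric matrices
print(eigvals)                           # approx. [0.04908340, 1.28402771]
print(eigvecs)                           # unit eigenvectors in the columns
                                         # (signs may differ from the slides; the directions are the same)
```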
• Step 5: Choosing components and forming a feature vector.
Here the idea of data compression and dimensionality reduction comes into
the picture.
The eigenvector with the highest eigenvalue is the principal component of the
dataset.
Once the eigenvectors are found from the covariance matrix, the next step is
to rank them by eigenvalue, from highest to lowest.
This gives the components in order of significance.
Based on this order, the components of less significance can be ignored.
• If we leave out some components, the final dataset will have fewer dimensions
than the original.
• To be precise, if we originally have n dimensions in our data, and we
calculate n eigenvectors and eigenvalues and then choose only the first p
eigenvectors, then the final dataset has only p dimensions.
So, for feature selection, we form a reduced matrix by taking the
eigenvectors we want to keep from the list of eigenvectors and placing
these selected eigenvectors as its columns.
So, FeatureVector = (eig1, eig2, eig3, …, eign)
In our example data, since we have two eigenvectors, we have two choices.
We can form a feature vector with both of the eigenvectors, or we can
choose to leave out the smaller, less significant component and have only a
single column.
In this example, by considering both these eigenvectors in the order of
eigenvalues,

FeatureVector = | −0.677873399   −0.735178656 |
                | −0.735178656    0.677873399 |

If we leave out the less significant eigenvector from the list, the reduced
feature vector is

FeatureVector = | −0.677873399 |
                | −0.735178656 |
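
A minimal sketch of Step 5, assuming NumPy: rank the eigenvectors by eigenvalue and keep the first p of them as columns of the feature vector.

```python
import numpy as np

# Step 5: rank eigenvectors by eigenvalue and keep the top p of them.
cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])
eigvals, eigvecs = np.linalg.eigh(cov)

order = np.argsort(eigvals)[::-1]            # indices from highest to lowest eigenvalue
feature_vector = eigvecs[:, order]           # all components, most significant column first

p = 1                                        # keep only the principal component
reduced_feature_vector = eigvecs[:, order[:p]]
print(reduced_feature_vector)                # one column, matching (−0.677…, −0.735…) up to sign
```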
• Step 6: Deriving the new dataset.
This is the final step in PCA. Once we have chosen the components (eigenvectors)
that we wish to keep in our data and formed a feature vector, we simply take the
transpose of this vector and multiply it on the left of the mean-adjusted original
dataset, transposed.
FinalData = RowFeatureVector × RowDataAdjust
where RowFeatureVector is the matrix with the eigenvectors in the columns
transposed so that the eigenvectors are now in rows with most significant
vectors at the top. RowDataAdjust is the mean adjusted data transposed. That is,
data items are now in each column with each row holding a separate dimension.
FinalData is the final dataset with data items in columns and dimensions
along rows.
The final data is only in terms of the vectors that we decided to keep.
To bring the data back to the same table-like format, take the transpose of
the result.
When we consider a transformation that keeps only the eigenvector with the
largest eigenvalue, the new dataset has only a single dimension. This dataset
is nothing but the data contained in the first row of the full transformed
dataset. If we plot this data, it is one-dimensional and is actually the
projection of the data points onto that single axis. We have effectively
thrown away the other axis.
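
A minimal sketch of Step 6, assuming NumPy and the same hypothetical placeholder data used earlier (not the data from the slides):

```python
import numpy as np

# Step 6: derive the new dataset. RowFeatureVector has the kept eigenvectors as rows
# (most significant first); RowDataAdjust is the mean-adjusted data, transposed so that
# each column is one data item.
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])               # hypothetical placeholder data
mean = data.mean(axis=0)

row_data_adjust = (data - mean).T           # dimensions along rows, items in columns

cov = np.cov(row_data_adjust)               # covariance of this placeholder data
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]

p = 1                                       # keep only the principal component
row_feature_vector = eigvecs[:, order[:p]].T    # kept eigenvectors as rows

final_data = row_feature_vector @ row_data_adjust   # FinalData = RowFeatureVector × RowDataAdjust
print(final_data)                           # 1 x N: the data expressed along the principal axis
```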
Getting the old data back

• If we take all the eigenvectors in our transformation, we get exactly the original data back.
• If we have reduced the number of eigenvectors in the final
transformation, then the retrieved data has lost some information (i.e.,
the least significant features).
• So, the final transformation is
  FinalData = RowFeatureVector × RowDataAdjust
• To get the original data back,
  RowDataAdjust = (RowFeatureVector)⁻¹ × FinalData
• where (RowFeatureVector)⁻¹ is the inverse of RowFeatureVector.
• When we take all the eigenvectors, the inverse of our feature vector is
actually equal to the transpose of our feature vector.
• This is true because the elements of the matrix are all unit eigenvectors
of our dataset. Therefore the equation becomes
  RowDataAdjust = (RowFeatureVector)ᵀ × FinalData
• To get the actual original data back, we need to add the mean of the
original data to RowDataAdjust:
  RowOriginalData = (RowFeatureVector)ᵀ × FinalData + OriginalMean
• When we leave out some eigenvectors, the above equation (with the
transpose in place of the inverse) still makes the correct transform.
• When we use the complete set of eigenvectors, the result is exactly the data
we started with.
• When we do it with a reduced feature vector, keeping only the variation
along the principal eigenvector, the variation along the other
component has been lost (i.e., the projection onto the x-axis or y-axis, as the case
may be).
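
A minimal sketch of the reconstruction, continuing the Step 6 sketch above; with p = 1 the recovered data keeps only the variation along the principal eigenvector:

```python
import numpy as np

# Getting the old data back: reconstruction uses the transpose of the (orthonormal)
# row feature vector, then adds the original mean back.
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])               # hypothetical placeholder data, as before
mean = data.mean(axis=0)
row_data_adjust = (data - mean).T

eigvals, eigvecs = np.linalg.eigh(np.cov(row_data_adjust))
order = np.argsort(eigvals)[::-1]

p = 1                                       # keep only the principal component
row_feature_vector = eigvecs[:, order[:p]].T
final_data = row_feature_vector @ row_data_adjust

# RowDataAdjust ≈ (RowFeatureVector)ᵀ × FinalData, then add the original mean.
row_original_data = row_feature_vector.T @ final_data + mean.reshape(-1, 1)
print(row_original_data.T)                  # approximately the original data; exact only if p = D
```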
