Projecting Data To A Lower Dimension With PCA
Ex: 50 dimensions, each with 20 levels. This gives a total of 20⁵⁰ cells, but the number of data samples will be far
smaller. There will not be enough data samples to learn!
So, we need to reduce data dimensionality!
Dimensionality reduction methods
Principal Component Analysis (PCA) – for unsupervised learning. (This is the one we will focus on.)
Fisher Linear Discriminant (FLD) – for supervised learning.
Multi-dimensional Scaling.
Independent Component Analysis.
Before getting to a description of PCA, I will first introduce the mathematical concepts that will be used in PCA:
standard deviation, covariance, eigenvectors and eigenvalues.
We can say that one may do a PCA or a Factor Analysis (FA) simply to reduce a set of p variables to m components or factors prior to
further analyses on those m factors.
Mathematical background
In this section I will attempt to refresh, using only examples, the elementary mathematical background that will be
required to understand the process of Principal Components Analysis (PCA). It is divided in two parts:
Statistics:
 Standard Deviation
 Variance
 Covariance
 Covariance Matrix
Matrix Algebra:
 Eigenvectors
 Eigenvalues
Standard Deviation 𝜎
Assume we will take a sample of a population X = [1 2 4 6 12 15 25 45 56 67 65 98]
Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Unfortunately, the mean doesn’t tell us a lot about the data. For example, these two data sets have exactly the same
mean (10), but are obviously quite different: [0 8 12 20] and [8 9 11 12]
So what is different about these two sets?
It is the spread of the data that is different. The Standard Deviation (SD) of a data set is a measure of how spread out
the data is.
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}}$   “The average distance from the mean of the data set to a point”
o So they are indeed different: sample A has 𝜎𝐴 = 8.3266 and sample B has 𝜎𝐵 = 1.8257.
- And so, as expected, the first set has a much larger standard deviation due to the fact that the data is much
more spread out from the mean.
- Another example, the data set: [10 10 10 10] also has a mean of 10, but its standard deviation is 0, because
all the numbers are the same. None of them deviate from the mean.
o Note also the difference between samples and populations: dividing by (n − 1) rather than n gives the standard deviation of a sample drawn from a larger population.
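As a quick check of these numbers, here is a minimal numpy sketch (the variable names are mine, purely for illustration):

```python
import numpy as np

# The two example data sets with the same mean (10) but different spread
sample_a = np.array([0, 8, 12, 20])
sample_b = np.array([8, 9, 11, 12])

# Mean: sum of the values divided by n
print(sample_a.mean(), sample_b.mean())   # 10.0 10.0

# Sample standard deviation: ddof=1 gives the (n - 1) denominator used above
print(sample_a.std(ddof=1))               # ~8.3266
print(sample_b.std(ddof=1))               # ~1.8257
```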
Variance
Variance is another measure of the spread of data in a data set. In fact it is almost identical to the standard deviation.
The formula is this:
$\sigma^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}$
Covariance
- The last two measures we have looked at are purely 1-dimensional: data sets like “heights of all the people
in the room”, “marks for the last COMP exam”, etc.
- However many data sets have more than one dimension, and the aim of the statistical analysis of these data
sets is usually to see if there is any relationship between the dimensions.
- For example, we might have as our data set both the height of all the students in a class, and the mark they
received for that paper. We could then perform statistical analysis to see if the height of a student has any
effect on their mark.
- Standard deviation and variance only operate on 1 dimension, so that you could only calculate the standard
deviation for each dimension of the data set independently of the other dimensions.
- Covariance is always measured between 2 dimensions. If you calculate the covariance between one dimension
and itself, you get the variance. So, if you had a 3-dimensional data set (x, y, z), then you could measure the
covariance between the x and y dimensions, the x and z dimensions, and the y and z dimensions.
- Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z
dimensions respectively.
$\mathrm{var}(X) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{X})(x_i - \bar{X})}{n-1}$

$\mathrm{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n-1}$
- If the value of the covariance is positive, that indicates that both dimensions increase together. If the value is
negative, then as one dimension increases, the other decreases.
EX: Imagine we have gone into the world and collected some 2-dimensional data, say, we have
asked a bunch of students how many hours in total that they spent studying image processing, and the mark
that they received. So we have two dimensions, the first is the H dimension, the hours studied, and the second is
the M dimension, the mark received.
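A short sketch of how this covariance would be computed; the hours (H) and marks (M) values below are invented for illustration only:

```python
import numpy as np

# Hypothetical sample: hours studied (H) and mark received (M) for six students
hours = np.array([9, 15, 25, 14, 10, 18], dtype=float)
marks = np.array([39, 56, 93, 61, 50, 75], dtype=float)

# cov(H, M) = sum((h_i - mean(H)) * (m_i - mean(M))) / (n - 1)
cov_hm = ((hours - hours.mean()) * (marks - marks.mean())).sum() / (len(hours) - 1)
print(cov_hm)   # positive: more hours studied goes with a higher mark

# np.cov returns the full 2x2 covariance matrix; entry [0, 1] is cov(H, M)
print(np.cov(hours, marks)[0, 1])
```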
Covariance Matrix
Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions,
there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set
(dimensions x, y, z) you could calculate cov(x, y), cov(x, z), and cov(y, z).
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and
put them in a matrix.
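For a 3-dimensional data set (x, y, z), for example, the matrix has one row and one column per dimension; it is symmetric, and its diagonal holds the variances:

$C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix}$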
Eigenvectors
As you know, you can multiply two matrices together, provided they are compatible sizes. Eigenvectors are a special
case of this. Consider the two multiplications between a matrix and a vector
$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix}$

And

$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \times \begin{pmatrix} 3 \\ 2 \end{pmatrix}$
In the first example, the resulting vector is not an integer multiple of the original vector, whereas in the second example,
the resulting vector is exactly 4 times the vector we began with.
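A quick numpy check of the two multiplications above:

```python
import numpy as np

A = np.array([[2, 3],
              [2, 1]])

# Not an eigenvector: the result is not a multiple of [1, 3]
print(A @ np.array([1, 3]))    # [11  5]

# Eigenvector: the result is exactly 4 times [3, 2]
print(A @ np.array([3, 2]))    # [12  8] = 4 * [3, 2]

# Scaling the eigenvector first still gives the same multiple (4)
print(A @ np.array([6, 4]))    # [24 16] = 4 * [6, 4]

# numpy's eigendecomposition recovers the eigenvalues 4 and -1 (order may vary)
values, vectors = np.linalg.eig(A)
print(values)
```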
What properties do these eigenvectors have?
o Can only be found for square matrices.
o Not every square matrix has eigenvectors.
o Given an n×n matrix that does have eigenvectors, there are n of them; for example, for a 3×3 matrix there are 3
eigenvectors.
o If we scale the vector by some amount before we multiply it, we still get the same multiple of it as a result.
Eigenvalues
Eigenvalues are closely related to eigenvectors. In our example, the multiple was 4, and 4 is the eigenvalue associated
with that eigenvector.
Since the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables
increase together.
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
Given our example set of data, and the fact that we have 2 eigenvectors, we have two choices. We can either form a
feature vector with both of the eigenvectors:
or, we can choose to leave out the smaller, less significant component and only have a single column:
The transformed data is then FinalData = RowFeatureVector × RowDataAdjust, where RowFeatureVector is the matrix
with the eigenvectors in the columns transposed so that the eigenvectors are now in the rows, with the most significant
eigenvector at the top, and RowDataAdjust is the mean-adjusted data transposed, i.e. the data items are in each column,
with each row holding a separate dimension.
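Putting the steps together, here is a minimal sketch of the whole projection. The variable names follow the RowFeatureVector / RowDataAdjust naming above, but the 2-D data set itself is invented for illustration:

```python
import numpy as np

# Hypothetical 2-D data set: one row per sample, one column per dimension
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

# Subtract the mean of each dimension
data_adjust = data - data.mean(axis=0)

# Covariance matrix of the dimensions (rowvar=False: columns are dimensions)
cov = np.cov(data_adjust, rowvar=False)

# Eigenvectors and eigenvalues of the covariance matrix (eigh: ascending eigenvalues)
eig_values, eig_vectors = np.linalg.eigh(cov)

# Order eigenvectors by decreasing eigenvalue and build the feature vector;
# keeping only the first column would drop the less significant component
order = np.argsort(eig_values)[::-1]
feature_vector = eig_vectors[:, order]           # eigenvectors as columns

# FinalData = RowFeatureVector x RowDataAdjust
row_feature_vector = feature_vector.T            # eigenvectors in rows, most significant first
row_data_adjust = data_adjust.T                  # one data item per column
final_data = row_feature_vector @ row_data_adjust
print(final_data.shape)                          # (2, 8): the data expressed along the new axes
```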
Benefits
Use PCA to find patterns
Say we have 20 images. Each image is N pixels high by N pixels wide. For each image we can create an
image vector as described in the representation section. We can then put all the images together in one big
image-matrix, with one image vector per row.
This gives us a starting point for our PCA analysis. Once we have performed PCA, we have our original
data in terms of the eigenvectors we found from the covariance matrix. Why is this useful? Say we want to
do facial recognition, and so our original images were of people’s faces. Then, the problem is, given a new
image, whose face from the original set is it? (Note that the new image is not one of the 20 we started
with.) The way this is done in computer vision is to measure the difference between the new image and the
original images, but not along the original axes; rather, along the new axes derived from the PCA analysis.
It turns out that these axes work much better for recognizing faces, because the PCA analysis has given us
the original images in terms of the differences and similarities between them. The PCA analysis has
identified the statistical patterns in the data.
Since all the vectors are N²-dimensional, we will get N² eigenvectors. In practice, we are able to leave out
some of the less significant eigenvectors, and the recognition still performs well.
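A rough sketch of how this could look in code, assuming the images are already flattened into N²-dimensional row vectors; the function names and the choice of k are illustrative, not part of the notes (and for large images one would normally use an SVD rather than forming the full N²×N² covariance matrix):

```python
import numpy as np

def pca_basis(image_matrix, k):
    """Return the mean image and the k most significant principal axes of the image set.

    image_matrix: array of shape (num_images, N*N), one flattened image per row.
    """
    mean_image = image_matrix.mean(axis=0)
    centered = image_matrix - mean_image
    cov = np.cov(centered, rowvar=False)            # covariance of the pixel dimensions
    eig_values, eig_vectors = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = eig_vectors[:, np.argsort(eig_values)[::-1][:k]]
    return mean_image, top

def nearest_face(new_image, image_matrix, mean_image, axes):
    """Project everything onto the PCA axes and return the index of the closest stored face."""
    stored = (image_matrix - mean_image) @ axes          # (num_images, k)
    query = (new_image - mean_image) @ axes              # (k,)
    distances = np.linalg.norm(stored - query, axis=1)   # compare along the new axes
    return int(np.argmin(distances))
```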