
13-09-2011

Projecting data to a lower dimension with PCA


Principal Components Analysis
In this paper I will discuss how to understand Principal Components Analysis (PCA). PCA is a useful statistical technique that has found application in fields such as face recognition and image compression, and it is a common technique for finding patterns in data of high dimension.
First, why do we need this statistical technique at all?
If the data x lies in a high-dimensional space, then an enormous amount of data is required to learn distributions or decision rules.
What are the problems with high-dimensional data? [The curse of dimensionality]
• Running time: many methods have at least O(nd²) complexity, where n is the number of samples and d is the dimensionality.
• Overfitting

Ex: 50 dimensions, each with 20 levels. This gives a total of 20^50 cells, but the number of data samples will be far smaller. There will not be enough data samples to learn!
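As a quick sanity check of the size of that grid, a couple of lines of Python (the sample count of one billion is just an illustrative figure):

```python
# Number of cells in a 50-dimensional grid with 20 levels per dimension.
cells = 20 ** 50
print(f"{cells:.2e}")                          # about 1.13e+65 cells

# Even a (hypothetical) data set of a billion samples covers almost none of them.
samples = 10 ** 9
print(f"fraction of cells covered: {samples / cells:.2e}")
```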
So, we need to reduce data dimensionality!
Dimensionality reduction methods
• Principal Component Analysis (PCA) – for unsupervised learning (the method we will focus on).
• Fisher Linear Discriminant (FLD) – for supervised learning.
• Multi-dimensional Scaling.
• Independent Component Analysis.

Before getting to a description of PCA, I will first introduce the mathematical concepts that will be used in PCA: standard deviation, covariance, eigenvectors and eigenvalues.
Note that one may also do a PCA or factor analysis (FA) simply to reduce a set of p variables to m components or factors prior to further analyses on those m factors.

Mathematical background
In this section I will attempt to refresh - using only examples - some elementary mathematical background that will be required to understand the process of Principal Components Analysis (PCA), divided into two parts:

Statistics:
• Standard Deviation
• Variance
• Covariance
• Covariance Matrix

Matrix Algebra:
• Eigenvectors
• Eigenvalues

Standard Deviation 𝜎
Assume we will take a sample of a population X = [1 2 4 6 12 15 25 45 56 67 65 98]

Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Unfortunately, the mean doesn't tell us a lot about the data. For example, these two data sets have exactly the same mean (10), but are obviously quite different: [0 8 12 20] and [8 9 11 12].
So what is different about these two sets?
It is the spread of the data that is different; the Standard Deviation (SD) of a data set is a measure of how spread out the data is:

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}}$ ("the average distance from the mean of the data set to a point")
o So the two samples are clearly different now: sample A has 𝜎 = 8.3266 and sample B has 𝜎 = 1.8257 (checked in the NumPy sketch after this list).
- And so, as expected, the first set has a much larger standard deviation due to the fact that the data is much
more spread out from the mean.
- Another example, the data set: [10 10 10 10] also has a mean of 10, but its standard deviation is 0, because
all the numbers are the same. None of them deviate from the mean.
o Note that the divisor (n - 1) appears because we are working with a sample of a population rather than with the entire population itself.
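As a minimal check of those two numbers, a NumPy sketch (ddof=1 selects the sample formula with the n - 1 divisor used above):

```python
import numpy as np

# The two samples from the text; both have mean 10 but very different spread.
a = np.array([0, 8, 12, 20])
b = np.array([8, 9, 11, 12])

# ddof=1 gives the sample standard deviation (divisor n - 1).
print(a.mean(), a.std(ddof=1))   # 10.0  8.3266...
print(b.mean(), b.std(ddof=1))   # 10.0  1.8257...
```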

Variance
Variance is another measure of the spread of data in a data set. In fact it is almost identical to the standard deviation.
The formula is this:

$\sigma^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}$
Covariance
- The last two measures we have looked at are purely 1-dimensional. Data sets like this “heights of all the people
in the room”,” marks for the last COMP exam “etc.
- However many data sets have more than one dimension, and the aim of the statistical analysis of these data
sets is usually to see if there is any relationship between the dimensions.
- For example, we might have as our data set both the height of all the students in a class, and the mark they
received for that paper. We could then perform statistical analysis to see if the height of a student has any
effect on their mark.
- Standard deviation and variance only operate on 1 dimension, so that you could only calculate the standard
deviation for each dimension of the data set independently of the other dimensions.
- Covariance is always measured between 2 dimensions. If you calculate the covariance between one dimension and itself, you get the variance. So, if you had a 3-dimensional data set (x, y, z), then you could measure the covariance between the x and y dimensions, the x and z dimensions, and the y and z dimensions.
- Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z dimensions respectively.

$var(X) = \dfrac{\sum_{i=1}^{n}(x_i - \bar{X})(x_i - \bar{X})}{n-1}$

$cov(X, Y) = \dfrac{\sum_{i=1}^{n}(x_i - \bar{X})(y_i - \bar{Y})}{n-1}$
- If the value of the covariance is positive, that indicates that both dimensions increase together; if the value is negative, then as one dimension increases, the other decreases.
EX: Imagine we have gone into the world and collected some 2-dimensional data: say we have asked a bunch of students how many hours in total they spent studying image processing, and the mark that they received. So we have two dimensions, the first is the H dimension (the hours studied), and the second is the M dimension (the mark received).
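As a sketch of that calculation, here is the covariance of hours (H) against mark (M); the numbers below are made-up illustrative values, not a table from the original text:

```python
import numpy as np

# Hypothetical hours-studied (H) and mark-received (M) values for 12 students.
H = np.array([9, 15, 25, 14, 10, 18, 0, 16, 5, 19, 16, 20])
M = np.array([39, 56, 93, 61, 50, 75, 32, 85, 42, 70, 66, 80])

# Sample covariance: sum((h_i - mean(H)) * (m_i - mean(M))) / (n - 1)
cov_hm = ((H - H.mean()) * (M - M.mean())).sum() / (len(H) - 1)
print(cov_hm)               # positive, so hours and marks tend to increase together
print(np.cov(H, M)[0, 1])   # the same value taken from NumPy's covariance matrix
```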

Covariance Matrix
Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated; for example, a 3-dimensional data set (dimensions x, y, z) has cov(x, y), cov(x, z) and cov(y, z).
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix:

$C^{n \times n} = \big(c_{i,j}\big), \quad c_{i,j} = cov(Dim_i, Dim_j)$

$C = \begin{pmatrix} cov(x,x) & cov(x,y) & cov(x,z) \\ cov(y,x) & cov(y,y) & cov(y,z) \\ cov(z,x) & cov(z,y) & cov(z,z) \end{pmatrix}$
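A short NumPy sketch of building such a matrix from some made-up 3-dimensional data (the random values are only there to give the code something to run on):

```python
import numpy as np

# Illustrative 3-dimensional data: one row per sample, one column per dimension (x, y, z).
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))

# rowvar=False tells NumPy the dimensions are in the columns; the result is the
# 3x3 matrix of cov(Dim_i, Dim_j), with the variances on the diagonal.
C = np.cov(data, rowvar=False)
print(C.shape)                 # (3, 3)
print(np.allclose(C, C.T))     # True: the covariance matrix is symmetric
```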

Eigenvectors
As you know, you can multiply two matrices together, provided they are compatible sizes. Eigenvectors are a special case of this. Consider the following two multiplications between a matrix and a vector:

$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix}$

And

$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \times \begin{pmatrix} 3 \\ 2 \end{pmatrix}$

In the first example, the resulting vector is not an integer multiple of the original vector, whereas in the second example, the result is exactly 4 times the vector we began with.
What properties do these eigenvectors have?
o Can only be found for square matrices.
o Not every square matrix has eigenvectors.
o Given an n×n matrix that does have eigenvectors, there are n of them; for example, a 3×3 matrix has 3 eigenvectors.
o If we scale the vector by some amount before we multiply it, we still get the same multiple of it as a result.

Eigenvalues
Eigenvalues are closely related to eigenvectors: each eigenvector has an associated eigenvalue. In our example the multiple was 4, and 4 is the eigenvalue associated with that eigenvector.
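The worked example above can be checked with NumPy's eigen-decomposition; note that the returned eigenvectors are normalized to unit length, so the one paired with the eigenvalue 4 is a scaled version of (3, 2):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# np.linalg.eig returns the eigenvalues and the corresponding column eigenvectors.
values, vectors = np.linalg.eig(A)
print(values)                        # contains 4 (and also -1)
v = vectors[:, np.argmax(values)]    # eigenvector associated with the eigenvalue 4
print(A @ v)                         # equals 4 * v
print(4 * v)
```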

Principal Components Analysis

• Finally we come to Principal Components Analysis (PCA). What is it? It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences. Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analyzing data.
• PCA can compress the data by reducing the number of dimensions.
• With PCA we can also get the original data back: exactly if we keep all the components, approximately if we discard some of them.

PCA Algorithm steps:

• Step 1: Get some data
• Step 2: Subtract the mean
• Step 3: Calculate the covariance matrix
• Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
• Step 5: Choosing components and forming a feature vector
• Step 6: Deriving the new data set
Step 1: Get some data
We will use 2-dimensional data in our example.
Step 2: Subtract the mean
For PCA to work properly, you have to subtract the mean from each of the data dimensions. The mean subtracted is the
average across each dimension. So, all the x values have 𝑥 (the mean of the x values of all the data points) subtracted,
and all the y values have 𝑦 subtracted from them. This produces a data set whose mean is zero.
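A minimal sketch of this step, using a small made-up 2-dimensional data set (these are not values from the text, just a stand-in):

```python
import numpy as np

# Illustrative 2-dimensional data: one row per point, columns are x and y.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# Subtract the per-dimension mean so that each column of data_adjust has mean zero.
data_adjust = data - data.mean(axis=0)
print(data_adjust.mean(axis=0))      # approximately [0, 0]
```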

Step 3: Calculate the covariance matrix

Since the non-diagonal elements in this covariance matrix are positive, we should expect that the x and y variables increase together.
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix

Step 5: Choosing components and forming a feature vector


What needs to be done now is to form a feature vector, which is just a fancy name for a matrix of vectors. This is constructed by taking the eigenvectors that you want to keep from the list of eigenvectors, and forming a matrix with these eigenvectors in the columns.

Given our example set of data, and the fact that we have 2 eigenvectors, we have two choices: we can either form a feature vector with both of the eigenvectors, or we can leave out the smaller, less significant component and keep only a single column.

Step 6: Deriving the new data set

FinalData = RowFeatureVector × RowDataAdjust

where RowFeatureVector is the matrix with the eigenvectors in the columns transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top, and RowDataAdjust is the mean-adjusted data transposed, i.e. the data items are in each column, with each row holding a separate dimension.
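Pulling steps 2 to 6 together, here is a minimal NumPy sketch. The variable names RowFeatureVector and RowDataAdjust follow the text; the data is the same made-up 2-dimensional set used in step 2, not an example from the original document:

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                 [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# Step 2: subtract the mean of each dimension.
mean = data.mean(axis=0)
data_adjust = data - mean

# Step 3: covariance matrix (dimensions are in the columns).
C = np.cov(data_adjust, rowvar=False)

# Step 4: eigenvectors and eigenvalues of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)            # eigh is fine here: C is symmetric

# Step 5: order the components by eigenvalue and keep the top k as the feature vector.
order = np.argsort(eigvals)[::-1]
k = 1                                           # keep only the most significant component
feature_vector = eigvecs[:, order[:k]]          # eigenvectors in the columns

# Step 6: FinalData = RowFeatureVector x RowDataAdjust
row_feature_vector = feature_vector.T           # eigenvectors in the rows
row_data_adjust = data_adjust.T                 # one data item per column
final_data = row_feature_vector @ row_data_adjust

# Approximate reconstruction of the original data from the reduced representation.
restored = (row_feature_vector.T @ final_data).T + mean
print(final_data.shape)                         # (1, 6): six points, one dimension each
print(np.round(restored, 2))
```

With k = 2 the reconstruction is exact; with k = 1 it is only an approximation, which is the trade-off behind the compression uses described below.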
Benefits
• Use PCA to find patterns
Say we have 20 images, each N pixels high by N pixels wide. For each image we can create an image vector by stacking its pixel values into one long vector. We can then put all the images together in one big image matrix, with one image vector per row.

This gives us a starting point for our PCA analysis. Once we have performed PCA, we have our original data expressed in terms of the eigenvectors we found from the covariance matrix. Why is this useful? Say we want to do facial recognition, and so our original images were of people's faces. Then the problem is: given a new image, whose face from the original set is it? (Note that the new image is not one of the 20 we started with.) The way this is done in computer vision is to measure the difference between the new image and the original images, not along the original axes but along the new axes derived from the PCA analysis.
It turns out that these axes work much better for recognizing faces, because the PCA analysis has given us the original images in terms of the differences and similarities between them. The PCA analysis has identified the statistical patterns in the data.
Since all the vectors are N² dimensional, we will get N² eigenvectors. In practice, we are able to leave out some of the less significant eigenvectors, and the recognition still performs well.
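A hedged sketch of that matching idea follows; the image matrix, the number of components kept (10) and the new image are all made up for illustration, and a real eigenfaces implementation would normally use a more efficient decomposition:

```python
import numpy as np

# Hypothetical stack of 20 face images, each flattened to an N*N-element row vector.
N = 16
rng = np.random.default_rng(1)
images = rng.normal(size=(20, N * N))

mean_face = images.mean(axis=0)
centred = images - mean_face

# Eigenvectors of the covariance matrix, keeping only the 10 most significant.
C = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
top = eigvecs[:, np.argsort(eigvals)[::-1][:10]]

# Project the known faces and a new face into the reduced space, then match the
# new face to the known face that is closest along the new axes.
known = centred @ top
new_image = rng.normal(size=N * N)               # stand-in for a new photograph
new_proj = (new_image - mean_face) @ top
best_match = np.argmin(np.linalg.norm(known - new_proj, axis=1))
print(best_match)                                # index of the closest original image
```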

• Use PCA for image compression

Using PCA for image compression is also known as the Hotelling transform, or the Karhunen-Loève (KL) transform. If we have 20 images, each with N² pixels, we can form N² vectors, each with 20 dimensions. Each vector consists of all the intensity values from the same pixel across the pictures. This is different from the previous example: before, we had a vector for each image, and each item in that vector was a different pixel, whereas now we have a vector for each pixel, and each item in the vector comes from a different image.
Now we perform the PCA on this set of data. We will get 20 eigenvectors because each vector is 20-
dimensional. To compress the data, we can then choose to transform the data only using, say 15 of the
eigenvectors. This gives us a final data set with only 15 dimensions, which has saved us 1/4 of the space.
However, when the original data is reproduced, the images have lost some of the information. This compression technique is said to be lossy because the decompressed image is not exactly the same as the original; it is generally of lower quality.
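A minimal sketch of that compression scheme under the same assumptions (20 hypothetical images, one 20-dimensional vector per pixel, keeping 15 of the 20 eigenvectors):

```python
import numpy as np

# Hypothetical data: 20 images of N*N pixels each.
N = 32
rng = np.random.default_rng(3)
images = rng.normal(size=(20, N * N))
pixel_vectors = images.T                     # shape (N*N, 20): one 20-dim vector per pixel

mean = pixel_vectors.mean(axis=0)
adjusted = pixel_vectors - mean

# PCA on the 20-dimensional pixel vectors; keep only 15 of the 20 eigenvectors.
C = np.cov(adjusted, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
basis = eigvecs[:, np.argsort(eigvals)[::-1][:15]]

compressed = adjusted @ basis                # (N*N, 15): a quarter of the storage saved
restored = compressed @ basis.T + mean       # lossy reconstruction of the pixel vectors
print(np.abs(restored - pixel_vectors).max())    # nonzero: some information has been lost
```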
