Projecting Data To A Lower Dimension With PCA
Ex: 50 dimensions, each with 20 levels. This gives a total of 20⁵⁰ cells, but the number of data samples will be far
smaller. There will not be enough data samples to learn!
So, we need to reduce data dimensionality!
Dimensionality reduction methods
Principal Component Analysis (PCA) – for unsupervised learning. (This is the one we will focus on.)
Fisher Linear Discriminant (FLD) – for supervised learning.
Multi-dimensional Scaling.
Independent Component Analysis.
Before getting to a description of PCA, I will first introduce the mathematical concepts that will be used in PCA:
standard deviation, covariance, eigenvectors and eigenvalues.
We can say that one may do a PCA or a Factor Analysis (FA) simply to reduce a set of p variables to m components or factors prior to
further analyses on those m factors.
Mathematical background
In this section I will attempt to refresh, using only examples, the elementary mathematical background that will be
required to understand the process of Principal Components Analysis (PCA). It is divided in two parts:
Statistics:
 Standard Deviation
 Variance
 Covariance
 Covariance Matrix
Matrix Algebra:
 Eigenvectors
 Eigenvalues
Standard Deviation 𝜎
Assume we will take a sample of a population X = [1 2 4 6 12 15 25 45 56 67 65 98]
Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Unfortunately, the mean doesn’t tell us a lot about the data. For example, these two data sets have exactly the same
mean (10), but are obviously quite different: [0 8 12 20] and [8 9 11 12]
So what is different about these two sets?
It is the spread of the data that is different. The Standard Deviation (SD) of a data set is a measure of how spread out
the data is.
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}}$   “The average distance from the mean of the data set to a point”
o So they are indeed different: sample A has 𝜎𝐴 = 8.3266 and sample B has 𝜎𝐵 = 1.8257.
- And so, as expected, the first set has a much larger standard deviation due to the fact that the data is much
more spread out from the mean.
- Another example, the data set: [10 10 10 10] also has a mean of 10, but its standard deviation is 0, because
all the numbers are the same. None of them deviate from the mean.
o Note also the difference between samples and populations: dividing by (n − 1) rather than n gives the standard deviation of a sample drawn from a larger population.
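As a quick check of these numbers, here is a minimal numpy sketch (the variable names are mine, purely for illustration):

```python
import numpy as np

# The two example data sets with the same mean (10) but different spread
sample_a = np.array([0, 8, 12, 20])
sample_b = np.array([8, 9, 11, 12])

# Mean: sum of the values divided by n
print(sample_a.mean(), sample_b.mean())   # 10.0 10.0

# Sample standard deviation: ddof=1 gives the (n - 1) denominator used above
print(sample_a.std(ddof=1))               # ~8.3266
print(sample_b.std(ddof=1))               # ~1.8257
```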
Variance
Variance is another measure of the spread of data in a data set. In fact it is almost identical to the standard deviation.
The formula is this:
$\sigma^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}$
Covariance
- The last two measures we have looked at are purely 1-dimensional: data sets like “heights of all the people
in the room”, “marks for the last COMP exam”, etc.
- However many data sets have more than one dimension, and the aim of the statistical analysis of these data
sets is usually to see if there is any relationship between the dimensions.
- For example, we might have as our data set both the height of all the students in a class, and the mark they
received for that paper. We could then perform statistical analysis to see if the height of a student has any
effect on their mark.
- Standard deviation and variance only operate on 1 dimension, so that you could only calculate the standard
deviation for each dimension of the data set independently of the other dimensions.
- Covariance is always measured between 2 dimensions. If you calculate the covariance between one dimension
and itself, you get the variance. So, if you had a 3-dimensional data set (x, y, z), then you could measure the
covariance between the x and y dimensions, the x and z dimensions, and the y and z dimensions.
- Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z
dimensions respectively.
$\mathrm{var}(X) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{X})(x_i - \bar{X})}{n-1}$

$\mathrm{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n-1}$
- If the value of the covariance is positive, that indicates that both dimensions increase together. If the value is
negative, then as one dimension increases, the other decreases.
EX: Imagine we have gone into the world and collected some 2-dimensional data, say, we have
asked a bunch of students how many hours in total that they spent studying image processing, and the mark
that they received. So we have two dimensions, the first is the H dimension, the hours studied, and the second is
the M dimension, the mark received.
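A short sketch of how this covariance would be computed; the hours (H) and marks (M) values below are invented for illustration only:

```python
import numpy as np

# Hypothetical sample: hours studied (H) and mark received (M) for six students
hours = np.array([9, 15, 25, 14, 10, 18], dtype=float)
marks = np.array([39, 56, 93, 61, 50, 75], dtype=float)

# cov(H, M) = sum((h_i - mean(H)) * (m_i - mean(M))) / (n - 1)
cov_hm = ((hours - hours.mean()) * (marks - marks.mean())).sum() / (len(hours) - 1)
print(cov_hm)   # positive: more hours studied goes with a higher mark

# np.cov returns the full 2x2 covariance matrix; entry [0, 1] is cov(H, M)
print(np.cov(hours, marks)[0, 1])
```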
Covariance Matrix
Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions,
there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set
(dimensions x, y, z) you could calculate cov(x, y), cov(x, z), and cov(y, z).
A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and
put them in a matrix.
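For a 3-dimensional data set (x, y, z), for example, the matrix has one row and one column per dimension; it is symmetric, and its diagonal holds the variances:

$C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix}$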
Eigenvectors
As you know, you can multiply two matrices together, provided they are compatible sizes. Eigenvectors are a special
case of this. Consider the two multiplications between a matrix and a vector
$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix}$

And

$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \times \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \times \begin{pmatrix} 3 \\ 2 \end{pmatrix}$
In the first example, the resulting vector is not an integer multiple of the original vector, whereas in the second example,
the resulting vector is exactly 4 times the vector we began with.
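A quick numpy check of the two multiplications above:

```python
import numpy as np

A = np.array([[2, 3],
              [2, 1]])

# Not an eigenvector: the result is not a multiple of [1, 3]
print(A @ np.array([1, 3]))    # [11  5]

# Eigenvector: the result is exactly 4 times [3, 2]
print(A @ np.array([3, 2]))    # [12  8] = 4 * [3, 2]

# Scaling the eigenvector first still gives the same multiple (4)
print(A @ np.array([6, 4]))    # [24 16] = 4 * [6, 4]

# numpy's eigendecomposition recovers the eigenvalues 4 and -1 (order may vary)
values, vectors = np.linalg.eig(A)
print(values)
```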
What properties do these eigenvectors have?
o Can only be found for square matrices.
o Not every square matrix has eigenvectors.
o Given an n×n matrix that does have eigenvectors, there are n of them; for example, for a 3×3 matrix there are 3
eigenvectors.
o If we scale the vector by some amount before we multiply it, we still get the same multiple of it as a result.
Eigenvalues
Eigenvalues are closely related to eigenvectors. In our example, the multiple was 4, and 4 is the eigenvalue associated
with that eigenvector.
Since the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables
increase together.
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
Given our example set of data, and the fact that we have 2 eigenvectors, we have two choices. We can either form a
feature vector with both of the eigenvectors:
or, we can choose to leave out the smaller, less significant component and only have a single column:
The transformed data is then FinalData = RowFeatureVector × RowDataAdjust, where RowFeatureVector is the matrix
with the eigenvectors in the columns transposed so that the eigenvectors are now in the rows, with the most significant
eigenvector at the top, and RowDataAdjust is the mean-adjusted data transposed, i.e. the data items are in each column,
with each row holding a separate dimension.
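Putting the steps together, here is a minimal sketch of the whole projection. The variable names follow the RowFeatureVector / RowDataAdjust naming above, but the 2-D data set itself is invented for illustration:

```python
import numpy as np

# Hypothetical 2-D data set: one row per sample, one column per dimension
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

# Subtract the mean of each dimension
data_adjust = data - data.mean(axis=0)

# Covariance matrix of the dimensions (rowvar=False: columns are dimensions)
cov = np.cov(data_adjust, rowvar=False)

# Eigenvectors and eigenvalues of the covariance matrix (eigh: ascending eigenvalues)
eig_values, eig_vectors = np.linalg.eigh(cov)

# Order eigenvectors by decreasing eigenvalue and build the feature vector;
# keeping only the first column would drop the less significant component
order = np.argsort(eig_values)[::-1]
feature_vector = eig_vectors[:, order]           # eigenvectors as columns

# FinalData = RowFeatureVector x RowDataAdjust
row_feature_vector = feature_vector.T            # eigenvectors in rows, most significant first
row_data_adjust = data_adjust.T                  # one data item per column
final_data = row_feature_vector @ row_data_adjust
print(final_data.shape)                          # (2, 8): the data expressed along the new axes
```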
Benefits
Use PCA to find patterns
Say we have 20 images. Each image is N pixels high by N pixels wide. For each image we can create an
image vector as described in the representation section. We can then put all the images together in one big
image-matrix, with one image vector per row.
This gives us a starting point for our PCA analysis. Once we have performed PCA, we have our original
data in terms of the eigenvectors we found from the covariance matrix. Why is this useful? Say we want to
do facial recognition, and so our original images were of people’s faces. Then, the problem is, given a new
image, whose face from the original set is it? (Note that the new image is not one of the 20 we started
with.) The way this is done in computer vision is to measure the difference between the new image and the
original images, but not along the original axes; rather, along the new axes derived from the PCA analysis.
It turns out that these axes work much better for recognizing faces, because the PCA analysis has given us
the original images in terms of the differences and similarities between them. The PCA analysis has
identified the statistical patterns in the data.
Since all the vectors are N²-dimensional, we will get N² eigenvectors. In practice, we are able to leave out
some of the less significant eigenvectors, and the recognition still performs well.
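A rough sketch of how this could look in code, assuming the images are already flattened into N²-dimensional row vectors; the function names and the choice of k are illustrative, not part of the notes (and for large images one would normally use an SVD rather than forming the full N²×N² covariance matrix):

```python
import numpy as np

def pca_basis(image_matrix, k):
    """Return the mean image and the k most significant principal axes of the image set.

    image_matrix: array of shape (num_images, N*N), one flattened image per row.
    """
    mean_image = image_matrix.mean(axis=0)
    centered = image_matrix - mean_image
    cov = np.cov(centered, rowvar=False)            # covariance of the pixel dimensions
    eig_values, eig_vectors = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = eig_vectors[:, np.argsort(eig_values)[::-1][:k]]
    return mean_image, top

def nearest_face(new_image, image_matrix, mean_image, axes):
    """Project everything onto the PCA axes and return the index of the closest stored face."""
    stored = (image_matrix - mean_image) @ axes          # (num_images, k)
    query = (new_image - mean_image) @ axes              # (k,)
    distances = np.linalg.norm(stored - query, axis=1)   # compare along the new axes
    return int(np.argmin(distances))
```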