
Principal Component Analysis

Introduction

• Principal Component Analysis (PCA) is a technique that is widely used for applications such as:
• dimensionality reduction
• lossy data compression
• feature extraction and
• data visualisation.
• PCA can be defined as the orthogonal projection of the data onto a
lower-dimensional linear space, known as the principal subspace, such
that the variance of the projected data is maximized.
• Equivalently, it can be defined as the linear projection that minimizes
the average projection cost, defined as the mean squared distance
between the data points and their projections.
Maximum variance formulation

• Consider a dataset of observations {xn}, where n = 1, 2, 3, …, N and xn is a Euclidean variable with dimensionality D.
• Our goal is to project the data onto a space having dimensionality
M<D, while maximizing the variance of the projected data.
• To start with, consider the projection onto a one-dimensional space
(M = 1).
• We can define the direction of this space using a D-dimensional vector
u1, which, for convenience, we shall choose to be a unit vector so that
u1ᵀu1 = 1.
• Note: A set of vectors in Rⁿ is called a basis if they are linearly
independent and every vector in Rⁿ can be expressed as a linear
combination of these vectors.
• A set of vectors {x1, x2, x3, …, xn} is said to be linearly independent
if the linear vector equation w1x1 + w2x2 + … + wnxn = 0 has only the
trivial solution w1 = w2 = … = wn = 0. The set {x1, x2, x3, …, xn} is
linearly dependent otherwise.
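
As a minimal numerical sketch of this formulation (assuming NumPy and randomly generated illustrative data, not data from these slides), the direction u1 that maximizes the variance of the projected data is the unit eigenvector of the data covariance matrix with the largest eigenvalue:

```python
import numpy as np

# Illustrative data only (not from these slides): N observations, D = 2.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.2], [1.2, 1.0]],
                            size=500)

Xc = X - X.mean(axis=0)                 # subtract the mean
S = np.cov(Xc, rowvar=False)            # D x D covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)    # eigh: symmetric matrix, eigenvalues in ascending order
u1 = eigvecs[:, -1]                     # unit eigenvector with the largest eigenvalue

# The variance of the data projected onto u1 equals u1ᵀ S u1, which is the largest eigenvalue.
projected_variance = u1 @ S @ u1
print(projected_variance, eigvals[-1])  # the two values agree (up to rounding)
```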
Principal Component Analysis

• PCA is a way of identifying patterns in data, and of expressing the data in such a way as to highlight their similarities and differences.
• Since patterns can be hard to find in data of high dimension, PCA is a
powerful tool for analysing such data.
• One main advantage of PCA is that once we have found these patterns,
we can compress the data by reducing the number of dimensions
without much loss of information.
Method

• Step 1: Get the data


• Step 2: Subtract the mean
For PCA to work properly, we have to subtract the mean from each of
the data dimensions. So, for example, all the x values have x̄ (the mean of
the x values) subtracted, and all the y values have ȳ subtracted. This
produces a dataset whose mean is zero.
• Step 3: Calculate the covariance matrix.
Since the data is 2-dimensional, the covariance matrix is a 2×2 matrix. In
this example, the covariance matrix is

Cov = | 0.616555556   0.615444444 |
      | 0.615444444   0.716555556 |

Since the off-diagonal elements in this covariance matrix are positive, the
x and y variables have a positive correlation. That is, both x and y
increase together.
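
A short sketch of Steps 2 and 3, assuming NumPy; the data values below are hypothetical placeholders (the slides do not list the original data points), so the resulting covariance matrix will differ from the one shown above:

```python
import numpy as np

# Hypothetical placeholder 2-D dataset (not the data behind the slides' numbers).
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

data_adjust = data - data.mean(axis=0)     # Step 2: subtract the mean of each dimension
cov = np.cov(data_adjust, rowvar=False)    # Step 3: 2x2 covariance matrix
print(cov)
```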
• Step 4: Calculate the eigenvectors and eigenvalues of the covariance
matrix.
From the covariance matrix, it is possible to calculate the eigenvectors and
eigenvalues. These are very important because they represent useful
information about our data.
The eigenvectors and eigenvalues of our covariance matrix are as follows:

eigenvalues = (0.0490833989, 1.28402771)

eigenvectors = | −0.735178656   −0.677873399 |
               |  0.677873399   −0.735178656 |
• These eigenvectors are both unit eigenvectors.
• We can plot these eigenvectors on top of the data we have.
• They appear as diagonal lines on the plot.
• They are perpendicular to each other.
• They provide information about patterns in data.
• One of the eigenvectors goes through the middle of the points, like drawing a line of
best fit.
• That eigenvector shows the relationship between x and y through that line (an
approximation of the data points).
• The second eigenvector is less important, but it gives us the other pattern in the data:
all the points follow the main line, but are off to the side of it by some amount.
• So, by this process of taking the eigenvectors of the covariance matrix, we
have been able to extract lines that characterize the data.
• It is possible to transform the given data in such a way that it is expressed in
terms of these lines.
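
The eigendecomposition in Step 4 can be checked with a minimal NumPy sketch; np.linalg.eigh returns unit eigenvectors of a symmetric matrix (possibly with signs flipped relative to the slides):

```python
import numpy as np

# Step 4: eigenvectors and eigenvalues of the example covariance matrix.
cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])

eigvals, eigvecs = np.linalg.eigh(cov)   # eigh is suited to symmetric matrices
print(eigvals)                           # approx. [0.04908340, 1.28402771]
print(eigvecs)                           # unit eigenvectors in the columns
                                         # (signs may differ from the slides; the directions are the same)
```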
• Step 5: Choosing components and forming a feature vector.
Here the idea of data compression and dimensionality reduction comes into
the picture.
The eigenvector with the highest eigenvalue is the principal component of the
dataset.
Once the eigenvectors are found from the covariance matrix, the next step is
to rank them by eigenvalue, from highest to lowest.
This gives the components in order of significance.
Based on this order, the components of less significance can be ignored.
• If we leave out some components, the final dataset will have fewer dimensions
than the original.
• To be precise, if we originally have n dimensions in our data, and we
calculate n eigenvectors and eigenvalues and then choose only the first p
eigenvectors, then the final dataset has only p dimensions.
So, for feature selection, we form a reduced matrix by taking the
eigenvectors we want to keep from the list of eigenvectors and placing
these selected eigenvectors as its columns.
So, FeatureVector = (eig1, eig2, eig3, …, eign)
In our example data, since we have two eigenvectors, we have two choices.
We can form a feature vector with both of the eigenvectors, or we can
choose to leave out the smaller, less significant component and have only a
single column.
In this example, by considering both these eigenvectors in the order of
eigenvalues,

FeatureVector = | −0.677873399   −0.735178656 |
                | −0.735178656    0.677873399 |

If we leave out the less significant eigenvector from the list, the reduced
feature vector is

FeatureVector = | −0.677873399 |
                | −0.735178656 |
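
A minimal sketch of Step 5, assuming NumPy: rank the eigenvectors by eigenvalue and keep the first p of them as columns of the feature vector.

```python
import numpy as np

# Step 5: rank eigenvectors by eigenvalue and keep the top p of them.
cov = np.array([[0.616555556, 0.615444444],
                [0.615444444, 0.716555556]])
eigvals, eigvecs = np.linalg.eigh(cov)

order = np.argsort(eigvals)[::-1]            # indices from highest to lowest eigenvalue
feature_vector = eigvecs[:, order]           # all components, most significant column first

p = 1                                        # keep only the principal component
reduced_feature_vector = eigvecs[:, order[:p]]
print(reduced_feature_vector)                # one column, matching (−0.677…, −0.735…) up to sign
```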
• Step 6: Deriving the new dataset.
This is the final step in PCA. Once we have chosen the components (eigenvectors)
that we wish to keep in our data and formed a feature vector, we simply take the
transpose of this vector and multiply it on the left of the mean-adjusted original
dataset, transposed.
FinalData = RowFeatureVector × RowDataAdjust
where RowFeatureVector is the matrix with the eigenvectors in the columns
transposed so that the eigenvectors are now in rows with most significant
vectors at the top. RowDataAdjust is the mean adjusted data transposed. That is,
data items are now in each column with each row holding a separate dimension.
FinalData is the final dataset with data items in columns and dimensions
along rows.
The final data is only in terms of the vectors that we decided to keep.
To bring the data back to the same table-like format, take the transpose of
the result.
When we consider a transformation that keeps only the eigenvector with the
largest eigenvalue, the new dataset has only a single dimension. This dataset
is nothing but the data contained in the first row of the full transformed
dataset. If we plot this data, it is one-dimensional and is actually the
projection of the data points onto that single axis. We have effectively
thrown away the other axis.
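
A minimal sketch of Step 6, assuming NumPy and the same hypothetical placeholder data used earlier (not the data from the slides):

```python
import numpy as np

# Step 6: derive the new dataset. RowFeatureVector has the kept eigenvectors as rows
# (most significant first); RowDataAdjust is the mean-adjusted data, transposed so that
# each column is one data item.
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])               # hypothetical placeholder data
mean = data.mean(axis=0)

row_data_adjust = (data - mean).T           # dimensions along rows, items in columns

cov = np.cov(row_data_adjust)               # covariance of this placeholder data
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]

p = 1                                       # keep only the principal component
row_feature_vector = eigvecs[:, order[:p]].T    # kept eigenvectors as rows

final_data = row_feature_vector @ row_data_adjust   # FinalData = RowFeatureVector × RowDataAdjust
print(final_data)                           # 1 x N: the data expressed along the principal axis
```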
Getting the old data back

• If we take all the eigenvectors in our transformation, we get exactly the original data back.
• If we have reduced the number of eigenvectors in the final
transformation, then the retrieved data has lost some information (i.e.,
the least significant features).
• So, the final transformation is
  FinalData = RowFeatureVector × RowDataAdjust
• To get the original data back,
  RowDataAdjust = (RowFeatureVector)⁻¹ × FinalData
• where (RowFeatureVector)⁻¹ is the inverse of RowFeatureVector.
• When we take all the eigenvectors, the inverse of our feature vector is
actually equal to the transpose of our feature vector.
• This is true because the elements of the matrix are all unit eigenvectors
of our dataset. Therefore the equation becomes
  RowDataAdjust = (RowFeatureVector)ᵀ × FinalData
• To get the actual original data back, we need to add the mean of the
original data to RowDataAdjust:
  RowOriginalData = (RowFeatureVector)ᵀ × FinalData + OriginalMean
• When we leave out some eigenvectors, the above equation (with the
transpose in place of the inverse) still makes the correct transform.
• When we use the complete set of eigenvectors, the result is exactly the data
we started with.
• When we do it with a reduced feature vector, keeping only the variation
along the principal eigenvector, the variation along the other
component has been lost (i.e., the projection onto the x-axis or y-axis, as the case
may be).
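
A minimal sketch of the reconstruction, continuing the Step 6 sketch above; with p = 1 the recovered data keeps only the variation along the principal eigenvector:

```python
import numpy as np

# Getting the old data back: reconstruction uses the transpose of the (orthonormal)
# row feature vector, then adds the original mean back.
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])               # hypothetical placeholder data, as before
mean = data.mean(axis=0)
row_data_adjust = (data - mean).T

eigvals, eigvecs = np.linalg.eigh(np.cov(row_data_adjust))
order = np.argsort(eigvals)[::-1]

p = 1                                       # keep only the principal component
row_feature_vector = eigvecs[:, order[:p]].T
final_data = row_feature_vector @ row_data_adjust

# RowDataAdjust ≈ (RowFeatureVector)ᵀ × FinalData, then add the original mean.
row_original_data = row_feature_vector.T @ final_data + mean.reshape(-1, 1)
print(row_original_data.T)                  # approximately the original data; exact only if p = D
```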
