
Principal Component Analysis (PCA) - Beginner's Note

Principal Component Analysis (PCA) is a technique used to simplify large sets of data. Imagine you have a lot of information, like many different measurements from a bunch of objects, and it's hard to make sense of it all. PCA helps you by taking this complex data and turning it into a simpler form, keeping the most important information.

Key Ideas:

1. Dimensionality Reduction:

o Think of PCA like organizing a messy room. You have too many things (data points),
and it’s hard to find what’s important.

o PCA helps by "cleaning up" and showing you the few key items (important patterns)
that matter the most.

2. Principal Components:

o These are the new, simpler pieces of information PCA gives you.

o The first principal component shows the most important pattern or trend in your
data.

o The second one shows the next most important pattern, and so on.

3. Why Use PCA?

o Simplifying Data: If you have too many features or measurements, PCA helps by
summarizing them into just a few that still capture the essence of your data.

o Making Patterns Clearer: It helps reveal patterns in the data that might not be
obvious at first glance.

4. How It Works:

o Standardize Your Data: First, you make sure all your measurements are on the same
scale.

o Find Patterns: PCA finds the main directions (patterns) where your data varies the
most.

o Create New Variables: These main directions become new variables, called principal
components, that you can use instead of your original data.
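The three steps above can be sketched end-to-end in plain numpy (the measurements below are hypothetical, invented purely for illustration):

```python
import numpy as np

# Hypothetical data: 5 objects, 3 measurements each
X = np.array([[2.0, 40.0, 0.5],
              [3.0, 55.0, 0.7],
              [1.5, 35.0, 0.4],
              [4.0, 70.0, 0.9],
              [2.5, 50.0, 0.6]])

# Standardize: put every measurement on the same scale
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Find patterns: the main directions of variance are the
# eigenvectors of the covariance matrix
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
components = eigvecs[:, order]

# Create new variables: project the data onto the top 2 components
X_new = X_std @ components[:, :2]
print(X_new.shape)   # (5, 2): two new variables replace three
```

The two columns of `X_new` are the principal-component scores you would use instead of the three original measurements.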

5. When to Use PCA?

o Too Much Data: When you have too many features and want to focus on the most
important ones.

o Visualization: When you want to see your data in a 2D or 3D plot to better understand it.

Simple Example:
Imagine you’re looking at a lot of different types of fruit. Each fruit has different
measurements: size, weight, color, etc. PCA would help you figure out which measurements
are the most important to identify the type of fruit, like "big and heavy" might be a key
pattern. It then lets you focus on just those key patterns, making it easier to categorize or
visualize your fruit.
PCA is like taking a big, complicated puzzle and finding the main pieces that give you a clear
picture of what’s going on.

How PCA Constructs the Principal Components


As there are as many principal components as there are variables in the data, principal components are constructed so that the first principal component accounts for the largest possible variance in the data set. For example, picture the scatter plot of a 2-dimensional data set: can we guess the first principal component? It is approximately the line through the origin along which the projections of the points are the most spread out. Or, mathematically speaking, it is the line that maximizes the variance (the average of the squared distances from the projected points to the origin).
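This maximum-variance property can be checked numerically on synthetic data (generated here purely for illustration): no direction gives a larger projection variance than the first principal component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data, centered at the origin, stretched along one direction
pts = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
pts -= pts.mean(axis=0)

# First principal component: top eigenvector of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(pts, rowvar=False))
pc1 = eigvecs[:, -1]   # eigh lists eigenvalues in ascending order

# Variance of projections onto PC1 vs. onto a random unit direction
var_pc1 = np.var(pts @ pc1)
rand_dir = rng.normal(size=2)
rand_dir /= np.linalg.norm(rand_dir)
var_rand = np.var(pts @ rand_dir)
print(var_pc1 >= var_rand)   # True: no direction beats PC1
```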

Step-by-Step Explanation of PCA


Step 1: Standardization
The aim of this step is to standardize the range of the continuous initial variables so that each one of
them contributes equally to the analysis.

More specifically, it is critical to perform standardization prior to PCA because PCA is quite sensitive to the variances of the initial variables. If there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate over those with smaller ranges (for example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1), which would lead to biased results. Transforming the data to comparable scales prevents this problem.

Mathematically, this can be done by subtracting the mean and dividing by the standard deviation for each value of each variable.
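A minimal numpy sketch of this standardization (the values are hypothetical, with one deliberately large-range feature next to a small-range one):

```python
import numpy as np

# Hypothetical features: one ranges 10-90, the other 0.1-0.9
X = np.array([[10.0, 0.1],
              [50.0, 0.5],
              [90.0, 0.9]])

# z-score: subtract the mean and divide by the standard deviation, per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]: each variable now has mean 0
print(X_std.std(axis=0))   # [1, 1]: and standard deviation 1
```

After this step, both variables contribute on an equal footing, regardless of their original ranges.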

Step 2: Covariance Matrix Computation

Covariance measures the joint variability of two variables, indicating how much they change in relation to each other. For two variables x1 and x2 with n observations each, the sample covariance is:

cov(x1, x2) = Σ (x1i − mean(x1)) (x2i − mean(x2)) / (n − 1)

The value of the covariance can be positive, negative, or zero.

 Positive: as x1 increases, x2 also increases.

 Negative: as x1 increases, x2 decreases.

 Zero: no direct linear relation.
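The covariance formula can be checked in numpy (x1 and x2 below are hypothetical values that happen to rise together, so their covariance comes out positive):

```python
import numpy as np

x1 = np.array([2.1, 2.5, 3.6, 4.0])
x2 = np.array([8.0, 10.0, 12.0, 14.0])

n = len(x1)
# Sample covariance: average product of deviations from the means
cov_manual = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / (n - 1)

# Same value from numpy's covariance matrix (off-diagonal entry)
cov_matrix = np.cov(x1, x2)
print(cov_manual, cov_matrix[0, 1])  # equal; positive => x1 and x2 rise together
```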

Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal components

Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the covariance matrix in order to determine the principal components of the data.

What you first need to know about eigenvectors and eigenvalues is that they always come in pairs: every eigenvector has a corresponding eigenvalue. Their number is equal to the number of dimensions of the data. For example, a 3-dimensional data set has 3 variables, and therefore 3 eigenvectors with 3 corresponding eigenvalues.

Eigenvectors and eigenvalues are behind all the magic of principal components: the eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (the most information), and these directions are what we call principal components. The eigenvalues are simply the coefficients attached to the eigenvectors, giving the amount of variance carried by each principal component.

By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the principal components in order of significance.
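A short numpy sketch of this step, on hypothetical standardized data, showing one eigenvector/eigenvalue pair per dimension and the ranking by eigenvalue:

```python
import numpy as np

# Hypothetical standardized data: 6 samples, 3 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)

cov = np.cov(X, rowvar=False)

# 3 dimensions => 3 eigenvectors with 3 corresponding eigenvalues
eigvals, eigvecs = np.linalg.eigh(cov)

# Rank highest to lowest: principal components in order of significance
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

print(eigvals)  # descending: PC1 carries the most variance
```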
Principal Component Analysis Example:

Let's suppose that our data set is 2-dimensional, with two variables x and y, and that the covariance matrix has eigenvalues λ1 and λ2 with corresponding eigenvectors v1 and v2.

If we rank the eigenvalues in descending order, we get λ1 > λ2, which means that the eigenvector corresponding to the first principal component (PC1) is v1 and the one corresponding to the second principal component (PC2) is v2.

After obtaining the principal components, to compute the percentage of variance (information) accounted for by each component, we divide the eigenvalue of each component by the sum of all eigenvalues. Applying this to the example, we find that PC1 and PC2 carry 96 percent and 4 percent of the variance of the data, respectively.
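The percentage computation can be sketched in numpy; the eigenvalues below are hypothetical, chosen only to reproduce the 96/4 split of the example:

```python
import numpy as np

# Hypothetical eigenvalues for a 2-D example
eigvals = np.array([1.92, 0.08])

# Fraction of variance per component: eigenvalue / sum of all eigenvalues
ratios = eigvals / eigvals.sum()
print(ratios * 100)  # [96. 4.]: PC1 carries 96%, PC2 carries 4%
```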

Importance of Principal Component Analysis (PCA) in Computer Vision


Principal Component Analysis (PCA) plays a crucial role in computer vision, a
field that involves processing and interpreting visual data from the world.
Here’s why PCA is so important in this domain:
1. Dimensionality Reduction
 High-Dimensional Data: Images are high-dimensional data, with each
pixel representing a feature. For example, a 100x100 grayscale image has
10,000 features (pixels).
 PCA Reduces Dimensions: PCA reduces the number of features by
finding the most important patterns in the data. This makes it easier to
process and analyze images without losing significant information.
 Faster Computation: By reducing the dimensionality, PCA speeds up the
processing time for machine learning models and algorithms, making
real-time applications like facial recognition and object detection more
feasible.
2. Noise Reduction
 Images Contain Noise: Real-world images often contain noise due to
factors like poor lighting or sensor imperfections.
 PCA Removes Noise: By keeping only the most important components
and discarding the less significant ones (which often correspond to
noise), PCA can help clean up images, improving the performance of
computer vision algorithms.
3. Feature Extraction
 Extracts Key Features: PCA helps in extracting the most important
features from an image. These features can then be used for tasks like
image classification, object detection, or face recognition.
 Eigenspace Representation: In face recognition, for example, PCA is used
to represent faces in a lower-dimensional eigenspace, known as
"eigenfaces." This simplifies the problem of matching faces by reducing
the data to the most essential features.
4. Data Compression
 Storage and Transmission: Storing and transmitting high-resolution
images can be resource-intensive. PCA allows for compressing the image
data by reducing the number of features (pixels) while retaining most of
the visual information.
 Reconstruction: The original image can be approximately reconstructed
from the principal components, which is valuable in applications where
data storage or bandwidth is limited.
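A minimal numpy sketch of this compression-and-reconstruction idea, using synthetic flattened "images" generated purely for illustration (real pipelines would use actual image data):

```python
import numpy as np

# Synthetic dataset: 50 flattened 8x8 "images" (64 pixels each), built from
# a few underlying patterns plus a little noise
rng = np.random.default_rng(2)
patterns = rng.normal(size=(3, 64))
images = rng.normal(size=(50, 3)) @ patterns + 0.05 * rng.normal(size=(50, 64))

# PCA via SVD of the centered data; keep only the top k components
mean = images.mean(axis=0)
centered = images - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
k = 3
codes = centered @ Vt[:k].T          # compressed: 3 numbers per image, not 64

# Approximate reconstruction from the principal components
reconstructed = codes @ Vt[:k] + mean
error = np.abs(images - reconstructed).mean()
print(error)   # small: most of the visual information survives compression
```

Each image is stored as 3 component scores instead of 64 pixel values, and the reconstruction recovers it up to the discarded (mostly noisy) components.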
5. Visualization
 Visualizing High-Dimensional Data: Images and other visual data are
often too complex to interpret directly. PCA can reduce this complexity,
making it possible to visualize high-dimensional data in 2D or 3D plots.
 Cluster Analysis: In tasks like object recognition or image segmentation,
PCA helps in visualizing and understanding clusters of similar images,
aiding in the design of better algorithms.
6. Preprocessing for Machine Learning
 Improves Model Performance: Before feeding images into machine
learning models, PCA can be used as a preprocessing step to reduce
noise and complexity, which often leads to better model performance.
 Prevents Overfitting: By reducing the number of features, PCA helps in
preventing overfitting, where the model performs well on training data
but poorly on unseen data.
7. Pattern Recognition
 Identifying Patterns: PCA excels in identifying and emphasizing the
underlying patterns in visual data, which is critical in pattern recognition
tasks like handwriting analysis or medical imaging.
 Character Recognition: For example, in optical character recognition
(OCR), PCA can help in recognizing letters or numbers by focusing on the
most distinctive features.
Summary
PCA is a powerful tool in computer vision because it simplifies and improves
the processing of high-dimensional visual data. By reducing dimensions,
removing noise, extracting features, and aiding in data compression and
visualization, PCA enhances the efficiency and effectiveness of computer vision
applications, making it an indispensable technique in the field.
