Principal Component Analysis With Python (A Deep Dive)
In this iterative process, the model gets better and better, and sometimes it even
gets really good.
Sometimes, your input sample will contain many columns, also known as features. It
is common knowledge (especially with traditional models) that using every column
in your Machine Learning model will mean trouble, the so-called curse of
dimensionality. In this case, you'll have to handle the features you are working with selectively. In this article, we'll cover Principal Component Analysis (PCA), one technique for doing so. The article provides a gentle but extensive introduction to feature extraction for your Machine Learning model with PCA.
The article is structured as follows. First of all, we’ll take a look at what PCA is. We
do this through the lens of the Curse of Dimensionality, which explains why we need
to reduce dimensionality especially with traditional Machine Learning algorithms.
This also involves the explanation of the differences between Feature Selection and
Feature Extraction techniques, which have different goals. PCA, which is part of the
Feature Extraction branch of techniques, is then introduced.
Once we clearly understand how PCA happens by means of the Python example,
we’ll show you how you don’t have to reinvent the wheel if you’re using PCA. If you
understand what’s going on, it’s often better to use a well-established library for
computing the PCA. Using Scikit-learn’s sklearn.decomposition.PCA API, we will
finally show you how to compute principal components and apply them to perform
dimensionality reduction for your dataset.
Then, we'll discuss what can be done against the curse of dimensionality, namely dimensionality reduction, and explain the difference between Feature Selection and Feature Extraction. Finally, we'll get to PCA itself and provide a high-level introduction.
Since Supervised Learning means that you have a dataset at your disposal, the first
step in training a model is feeding the samples to the model. For every sample, a
prediction is generated. Note that at the first iteration, the model has just been
initialized. The predictions therefore likely make no sense at all.
This becomes especially evident from what happens in the second step, where
predictions and ground truth (= actual targets) are compared. This comparison produces an error or loss value, which illustrates how badly the model currently performs.
The third step is then really simple: you improve the model. Depending on the
Machine Learning algorithm, optimization happens in different ways. In the case of
Neural networks, gradients are computed with backpropagation, and subsequently
optimizers (topic for a future blog) are used for changing the model internals.
https://medium.com/@francescofranco_39234/principal-component-analysis-with-python-a-deep-dive-0c5195bff087#id_token=eyJhbGciOiJSUzI… 3/36
10/28/24, 9:39 AM Principal Component Analysis with Python (A Deep Dive) | by Francesco Franco | Oct, 2024 | Medium
Weights can also be changed by directly minimizing a single cost function; it simply depends on the algorithm.
You then start again. Likely, because you have optimized the model, the predictions
are a little bit better now. You simply keep iterating until you are satisfied with the
results, and then you stop the training process.
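As a minimal sketch of this predict, compare, improve loop, consider fitting a one-parameter linear model with plain gradient descent (a toy example of my own, not code from the article):

import numpy as np

# Toy data: y = 3x plus some noise
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 200)
y_true = 3 * x + rng.normal(0, 0.1, 200)

w = 0.0             # freshly initialized model: predictions make no sense yet
learning_rate = 0.1

for iteration in range(100):
    y_pred = w * x                            # 1. feed samples, generate predictions
    loss = np.mean((y_pred - y_true) ** 2)    # 2. compare with ground truth: the loss
    gradient = np.mean(2 * (y_pred - y_true) * x)
    w -= learning_rate * gradient             # 3. improve the model, then iterate

print(w, loss)      # w ends up close to 3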
When you are performing this iterative process, you are effectively moving from a
model that is underfit to a model that demonstrates a good fit. Let’s briefly take a look
at them here.
In the first stages of the training process, your model is likely not able to capture the
patterns in your dataset. This is visible in the top figure below. The solution is
simple: just keep training until you achieve the right fit for the dataset (that’s the
bottom figure). Now, you can’t keep training forever. If you do, the model will learn to
focus too much on patterns hidden within your training dataset — patterns that may
not be present in other real-world data at all; patterns truly specific to the sample
with which you are training.
The result: an overfit model, tailored to your specific dataset, visible in the middle figure.
In other words, training a Machine Learning model involves finding a good balance
between a model that is underfit and a model that is overfit. Fortunately, many
techniques are available to help you with this, but it’s one of the most common
problems in Supervised ML today.
On the top, a model that is underfit with respect to the data. In the middle: a model that is overfit with respect
to the data. On the bottom: the fit that we were looking for.
I think the odds are that I can read your mind at this point.
Overfitting, underfitting, and training a Machine Learning model — how are they
related to Principal Component Analysis?
Suppose that one of your input samples is the following feature vector, with eleven features and hence eleven dimensions:
f(x) = [1.23, -3.00, 45.2, 9.3, 0.1, 12.3, 8.999, 1.02, -2.45, -0.26, 1.24]
Will a Machine Learning model be able to generalize across all eleven dimensions?
In other words, do we have sufficient samples to cover large parts of the domains
for all features in the vector (i.e., all the axes in the 11-dimensional space)? Or does
it look like Swiss cheese with (massive) holes?
In machine learning problems that involve learning a "state-of-nature" from a finite number of data samples in a high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values.
Wikipedia (n.d.)
The point with “ensuring that there are several samples with each combination of
values” is that when this is performed well, you will likely be able to train a model
that (1) performs well and (2) generalizes well across many settings. With 200
samples, however, it’s 100% certain that you don’t meet this requirement. The effect
is simple: your model will overfit to the data at hand, and it will become worthless if
it is used with data from the real world.
Since increasing dimensionality equals an increasingly growing need for more data,
the only way out of this curse is to reduce the number of dimensions in our dataset.
This is called Dimensionality Reduction, and we’ll now take a look at two
approaches — Feature Selection and Feature Extraction.
We saw that if we want to decrease the odds of overfitting, we must reduce the
dimensionality of our data. While this can easily be done in theory (we can simply
cut off a few dimensions, who cares?), this gets slightly difficult in practice (which
dimension to cut… because, how do I know which one contributes most?).
And what if each dimension contributes an equal amount to the predictive power of the
model? What then?
In the field of Dimensionality Reduction, there are two main approaches that you
can use: Feature Selection and Feature Extraction.
Feature Selection can be a good idea if you already think that most variance
within your dataset can be explained by a few variables. If the others are truly
unimportant, then you can easily discard them without losing too much
information.
Feature Extraction, on the other hand, “starts from an initial set of measured
data and builds derived values (features) intended to be informative and non-
redundant” (Wikipedia, 2003). In other words, a derived dataset will be built that
can be used for training your Machine Learning model. It is lower-dimensional
compared to the original dataset. It will be as informative as possible (i.e., as
much information from the original dataset is pushed into the new variables)
while non-redundant (i.e., we want to avoid that information present in one
variable in the new dataset is also present in another variable in the new
dataset). In other words, we get a lower-dimensional dataset that explains most
variance in the dataset, while keeping things relatively simple.
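To make the distinction concrete, here is a minimal sketch with Scikit-learn, where SelectKBest stands in for Feature Selection and PCA for Feature Extraction; the helpers and the Iris data are my choices for illustration, not something prescribed by the article:

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = datasets.load_iris(return_X_y=True)

# Feature Selection: keep the 2 original columns that score best against the target
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature Extraction: build 2 new, derived columns from all 4 original ones
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_extracted.shape)  # (150, 2) (150, 2)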
Principal component analysis (PCA) is the process of computing the principal components
and using them to perform a change of basis on the data, sometimes using only the first
few principal components and ignoring the rest.
Wikipedia (2002)
Well, that’s quite a technical description, isn’t it. And what are “principal
components”?
The principal components of a collection of points in a real coordinate space are a sequence of p unit vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i − 1 vectors.
Wikipedia (2002)
I can perfectly understand it if you still have no idea what PCA is after reading those
two quotes. I felt the same. For this reason, let’s break down stuff step-by-step.
The goal of PCA: finding a set of vectors (principal components) that best describe
the spread and direction of your data across its many dimensions, allowing you to
subsequently pick the top-n best-describing ones for reducing the dimensionality of
your feature space.
1. If you have a dataset, its spread can be expressed in orthonormal vectors: the principal directions of the dataset. Orthonormal, here, means that the vectors are orthogonal to each other (i.e. they have an angle of 90°) and are of size 1.
2. By sorting these vectors in order of importance (expressed by their corresponding eigenvalues, as we will see below), we can find the directions that contribute most to the spread of the dataset.
3. We can then reduce the number of dimensions to the most important ones only.
4. And finally, we can project our dataset onto these new dimensions, called the principal components, performing dimensionality reduction without losing much of the information present in the dataset.
The how:
Although we will explain how later in this article, we’ll now visually walk through
performing PCA at a high level. This allows you to understand what happens first,
before we dive into how it happens.
Another important note is that for step (1), decomposing your dataset into vectors
can be done in two different ways — by means of (a) eigenvector decomposition of
the covariance matrix, or (b) Singular Value Decomposition. Later in this article,
we’ll walk through both approaches step-by-step.
# Imports needed to make this snippet runnable
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Configuration options
num_samples_total = 1000
cluster_centers = [(1,1), (1.25,1.5)]
num_classes = len(cluster_centers)

# Generate data (cluster_std and random_state are assumed values; the original
# trailing arguments of this call were truncated)
X, y = make_blobs(n_samples = num_samples_total, centers = cluster_centers, n_features = num_classes, cluster_std = 0.15, random_state = 42)

# Make plot
plt.scatter(X[:, 0], X[:, 1])
axes = plt.gca()
axes.set_xlim([0, 2])
axes.set_ylim([0, 2])
plt.show()
If you look closely at the dataset, you can see that it primarily spreads into two
directions. These directions are from the upper right corner to the lower left corner
and from the lower right middle to the upper left middle. Those directions are
different from the axis directions, which are orthogonal to each other: the x and y
axes have an angle of 90 degrees.
No other set of directions explains as much of the variance as the pair we mentioned above.
After standardization (the topic of a future article), we can visualize these directions as a pair of vectors. These vectors are called the principal directions of the data (StackExchange,
n.d.). There are as many principal directions as the number of dimensions; in our
case, there are two.
[The eigenvectors and related] eigenvalues explain the variance of the data along the new
feature axes.
Raschka (2015)
In other words, they allow us to capture both the (1) direction and (2) the magnitude
of the spread in your dataset.
Notice that the vectors are orthogonal to each other. Also recall that our axes are
orthogonal to each other. You can perhaps now imagine that it becomes possible to
perform a transformation to your dataset, so that the directions of the axes are equal
to the directions of the eigenvectors. In other words, we change the “viewpoint” of
our data, so that the axes and vectors have equal directions.
This is the core of PCA: projecting the data to our principal directions, which are
then called principal components.
The benefit here is that while the eigenvectors tell us something about the directions
of our projection, the corresponding eigenvalues tell us something about the
importance of that particular principal direction in explaining the variance of the
dataset. It allows us to easily discard the directions that don’t contribute sufficiently.
That’s why before projecting the dataset onto the principal components, we must
first sort the vectors and reduce the number of dimensions.
Once we know the eigenvectors and eigenvalues that explain the spread of our
dataset, we must sort them in order of descending importance. This allows us to
perform dimensionality reduction, as we can keep the principal directions which
contribute most significantly to the spread in our dataset.
Sorting is simple: we sort the list with eigenvalues in descending order and ensure
that our list with eigenvectors is sorted in the same way. In other words, the pairs of
eigenvectors and eigenvalues are jointly sorted in a descending order based on the
eigenvalue. As the largest eigenvalues indicate the biggest explanation for spread in
your dataset, they must be on top of the list.
For the example above, we can see that the eigenvalue for the downward-oriented
eigenvector exceeds the one for the upward-oriented vector. If we draw a line through the dataset that overlaps with that vector, we can also see that the variance along that line (where variance is defined in terms of the squared distance of each point to the mean of the line) is the largest: we simply cannot draw a line with larger variance.
In fact, the total (relative) contribution of the eigenvectors to the spread for our
example is as follows:
[0.76318124 0.23681876]
So, for our example above, we now have a sorted list with eigenpairs.
As we saw above, the first eigenpair explains 76.3% of the spread in our dataset,
whereas the second one explains only 23.7%. Jointly, they explain 100% of the
spread, which makes sense.
Using PCA for dimensionality reduction now allows us to take the biggest-
contributing vectors (if your original feature space was say 10-dimensional, it is
likely that you can find a smaller set of vectors which explains most of the variance)
and only move forward with them.
If our goal was to reduce dimensionality to one, we would now move forward and
take the 0.763 contributing eigenvector for data projection. Note that this implies
that we will lose 0.237 worth of information about our spread, but in return get a
lower number of dimensions.
Clearly, this example with only two dimensions makes no rational sense as two
dimensions can easily be handled by Machine Learning algorithms, but this is
incredibly useful if you have many dimensions to work with.
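When there are many dimensions, a common heuristic (not specific to this article) is to keep the smallest number of components whose cumulative contribution exceeds some threshold. A minimal sketch, using the two contributions printed above:

import numpy as np

# Relative contributions of the eigenvalues, as printed above
contributions = np.array([0.76318124, 0.23681876])

# Keep the smallest number of components whose cumulative contribution
# reaches the chosen threshold (0.75 here, which selects one component)
cumulative = np.cumsum(contributions)
n_components = int(np.searchsorted(cumulative, 0.75)) + 1
print(cumulative)     # [0.76318124 1.        ]
print(n_components)   # 1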
Once we have chosen the number of eigenvectors that we will use for
dimensionality reduction (i.e. our target number of dimensions), we can project the
data onto the principal components — or component, in our case.
This means that we will be changing the axes so that they are now equal to the
eigenvectors.
In the example below, we can project our data to one eigenvector. We can see that
only the x axis has values after projecting, and that hence our feature space has
been reduced to one dimension.
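To make this concrete, here is a minimal sketch of that projection for the two-dimensional blob dataset generated earlier; the standardization and decomposition steps are compressed into a few lines, and the variable X is assumed to still hold the blob data:

import numpy as np

# Standardize the blob data generated earlier
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the 2 x 2 covariance matrix
values, vectors = np.linalg.eig(np.cov(X_std.T))

# Keep only the principal direction with the largest eigenvalue
top_component = vectors[:, np.argmax(values)]

# Project the 2D data onto that single direction: the result is one-dimensional
X_projected = X_std @ top_component
print(X_projected.shape)  # (1000,)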
To recap, performing PCA at a high level thus boils down to:
1. Expressing the spread of the dataset in eigenpairs (eigenvectors and eigenvalues).
2. Sorting the eigenpairs in descending order of importance.
3. Selecting the eigenpairs, and hence dimensions, that together explain most of the variance.
4. Projecting the data to the n selected eigenpairs so that their directions equal the ones of our axes.
In step (1), we simply mentioned that we can express the spread of our data by
means of eigenpairs. On purpose, we didn’t explain how this can be done, for the
sake of simplicity.
In fact, there are two methods that are being used for this purpose today:
Eigenvector Decomposition (often called “EIG”) and Singular Value Decomposition
(“SVD”). Using different approaches, they can be used to obtain the same end result:
expressing the spread of your dataset in eigenpairs, the principal directions of your
data, which can subsequently be used to reduce the number of dimensions by
projecting your dataset to the most important ones, the principal components.
While mathematically and hence formally you can obtain the same result with both,
in practice PCA-SVD is numerically more stable (StackExchange, n.d.). For this
reason, you will find that most libraries and frameworks favor a PCA-SVD
implementation over a PCA-EIG one. Nevertheless, you can still achieve the same
result with both approaches!
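As a quick illustration of that equivalence, the sketch below compares the two decompositions with NumPy directly, using the Iris dataset that the manual example below also uses; this is my own illustration, not code from the article:

import numpy as np
from sklearn import datasets

X_iris, _ = datasets.load_iris(return_X_y=True)
X_std = (X_iris - X_iris.mean(axis=0)) / X_iris.std(axis=0)

# PCA-EIG: eigendecomposition of the covariance matrix
eig_values, eig_vectors = np.linalg.eig(np.cov(X_std.T))

# PCA-SVD: singular value decomposition of the standardized data matrix itself
_, singular_values, Vt = np.linalg.svd(X_std)

# The columns of eig_vectors and the rows of Vt span the same principal
# directions (possibly in a different order and with flipped signs)
print(np.round(eig_vectors, 2))
print(np.round(Vt.T, 2))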
In the next section, we will take a look at a clear and step-by-step example of PCA
with EIG and, in a future post, we’ll look at PCA with SVD, allowing you to
understand the differences intuitively. We will then look at
sklearn.decomposition.PCA , Scikit-learn's implementation of Principal Component
Analysis based on PCA-SVD. There is no need to perform PCA manually if there are
great tools out there, after all!
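As a taste of that, here is a minimal sketch that applies it to the two-dimensional blob dataset from earlier; the preprocessing choices are assumptions, not code from the article:

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Regenerate the blob data and standardize it
X_blobs, _ = make_blobs(n_samples=1000, centers=[(1, 1), (1.25, 1.5)], n_features=2, cluster_std=0.15, random_state=42)
X_std = StandardScaler().fit_transform(X_blobs)

# Reduce the two-dimensional data to a single principal component
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                 # (1000, 1)
print(pca.explained_variance_ratio_)   # the first component explains most of the variance

Still, walking through the manual PCA-EIG computation step by step is instructive. It boils down to six steps: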
1. Standardizing the dataset: EIG based PCA only works well if the dataset is
centered and has a mean of zero (i.e. μ = 0.0). We will use standardization for
this purpose, which also scales the data to a standard deviation of one (σ= 1.0).
2. Computing the covariance matrix for the standardized dataset, which describes the variance of each variable and the covariance between each pair of variables.
3. Decomposing the covariance matrix into eigenvectors and eigenvalues (the eigenpairs).
4. Sorting the eigenpairs in descending order of eigenvalue, i.e. of importance.
5. Selecting the n eigenpairs that together explain most of the variance, where n is the target number of dimensions.
6. Building the projection matrix for projecting our original dataset onto the
principal components.
We can see that steps (1), (4), (5) and (6) are general — we also saw them above. Steps
(2) and (3) are specific to PCA-EIG and represent the core of what makes eigenvector
decomposition based PCA unique. We will now cover each step in more detail,
including step-by-step examples with Python. Note that the example in this section
makes use of native / vanilla Python deliberately, and that Scikit-learn based
implementations of, e.g., standardization and PCA will be used in another section.
If we want to show how PCA works, we must use a dataset where the number of
dimensions > 2. Fortunately, Scikit-learn provides the Iris dataset, which can be
used to classify three groups of Iris flowers based on four characteristics (and hence
features or dimensions): petal length, petal width, sepal length and sepal width.
This code can be used for visualizing two dimensions every time:
# Imports and dataset (the loading step is assumed; the shapes printed below
# match the Iris dataset as provided by Scikit-learn)
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np

X, y = datasets.load_iris(return_X_y=True)

# Configuration options
dimension_one = 1
dimension_two = 3

# Shape
print(X.shape)
print(y.shape)

# Dimension definitions
dimensions = {
  0: 'Sepal Length',
  1: 'Sepal Width',
  2: 'Petal Length',
  3: 'Petal Width'
}

# Color definitions
colors = {
  0: '#b40426',
  1: '#3b4cc0',
  2: '#f2da0a',
}

# Legend definition
legend = ['Iris Setosa', 'Iris Versicolour', 'Iris Virginica']

# Make plot (a separate name is used for the per-sample colors so that the
# colors dictionary is not overwritten and can be reused in later snippets)
sample_colors = list(map(lambda c: colors[c], y))
plt.scatter(X[:, dimension_one], X[:, dimension_two], c=sample_colors)
plt.title(f'Visualizing dimensions {dimension_one} and {dimension_two}')
plt.xlabel(dimensions[dimension_one])
plt.ylabel(dimensions[dimension_two])
plt.show()
The images illustrate that two of the Iris classes cannot be linearly separated from each other, but that this pair can be separated from the third class. Printing the shapes yields the following:
(150, 4)
(150,)
…indicating that we have only 150 samples, but that our feature space is four-
dimensional. Clearly a case where feature extraction could be beneficial for training
our Machine Learning model.
Performing standardization
We first add Python code for standardization, which brings each dimension of our data to μ = 0.0 and σ = 1.0 by computing x = (x − μ) / σ for every value.
# Perform standardization
for dim in range(0, X.shape[1]):
  print(f'Old mean/std for dim={dim}: {np.average(X[:, dim])}/{np.std(X[:, dim])}')
  X[:, dim] = (X[:, dim] - np.average(X[:, dim])) / np.std(X[:, dim])
  print(f'New mean/std for dim={dim}: {np.abs(np.round(np.average(X[:, dim])))}/{np.round(np.std(X[:, dim]))}')

# Make plot (again mapping class labels to colors under a separate name)
sample_colors = list(map(lambda c: colors[c], y))
plt.scatter(X[:, dimension_one], X[:, dimension_two], c=sample_colors)
plt.title(f'Visualizing dimensions {dimension_one} and {dimension_two}')
plt.xlabel(dimensions[dimension_one])
plt.ylabel(dimensions[dimension_two])
plt.show()
And indeed:
The next step is computing the covariance matrix for our dataset.
In probability theory and statistics, a covariance matrix (…) is a square matrix giving the
covariance between each pair of elements of a given random vector.
Wikipedia (2003)
If you’re not into mathematics, I can understand that you don’t know what this is yet.
Let’s therefore briefly take a look at a few aspects related to a covariance matrix
before we move on, based on Lambers (n.d.).
Variable mean: the average value for the variable. Computed as the sum of all
available values divided by the number of values summed together. As petal width
represents dim=3 in the visualization above, with a mean of ≈ 1.1993, we can see
how the numbers above fit.
Variance: describing the "spread" of the data around the variable's mean. Computed as the average of the squared differences between each value and the mean, i.e. the mean of (x − μ)² over all values.
Covariance: describing the joint variability (or joint spread) of two variables. Across all pairs of corresponding values from both variables, it is computed as cov(X, Y) = Σ (xᵢ − μ_X)(yᵢ − μ_Y) / n.
Covariance matrix for n variables: a matrix representing covariances for each pair
of variables from some set of variables (dimensions) V = [X, Y, Z, ….].
Fortunately, there are some properties which make covariance matrices interesting
for PCA (Lambers, n.d.):
Cov(X, X) = Var(X), so the diagonal of the covariance matrix contains the variances of the individual variables.
Cov(X, Y) = Cov(Y, X), so the covariance matrix is square and symmetric.
We can now compute the covariance matrix for our standardized Iris dataset and compare it with NumPy's built-in implementation:
# Compute the covariance matrix by hand (a direct matrix product is used here;
# since X is standardized, its means are zero and X.T @ X / n gives the covariances)
cov_manual = (X.T @ X) / X.shape[0]
print('Self-computed:')
print(np.round(cov_manual, 2))

# Compare with NumPy's implementation (np.cov divides by n - 1, hence the
# slightly larger values in the output below)
print('NumPy-computed:')
print(np.round(np.cov(X.T), 2))

> Self-computed:
> [[ 1. -0.12 0.87 0.82]
> [-0.12 1. -0.43 -0.37]
> [ 0.87 -0.43 1. 0.96]
> [ 0.82 -0.37 0.96 1. ]]
> NumPy-computed:
> [[ 1.01 -0.12 0.88 0.82]
> [-0.12 1.01 -0.43 -0.37]
> [ 0.88 -0.43 1.01 0.97]
> [ 0.82 -0.37 0.97 1.01]]
The great thing of EIG-PCA is that we can decompose the covariance matrix into
eigenvectors and eigenvalues.
We can use NumPy’s numpy.linalg.eig to compute the eigenvectors for this square
array:
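A minimal sketch of this step, assuming the cov_manual matrix from the snippet above:

# Decompose the covariance matrix into eigenvalues and eigenvectors
eig_vals, eig_vect = np.linalg.eig(cov_manual)

# Relative contribution of each principal direction to the total variance;
# the first two directions dominate
print(eig_vals / np.sum(eig_vals))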
In other words, the first principal dimension contributes 73% of the variance and the second one 23%. If we therefore reduce the dimensionality to two, we keep approximately 73 + 23 = 96% of the explained variance.
# Sort eigenpairs in descending order of the (absolute) eigenvalue
eigenpairs = [(np.abs(eig_vals[x]), eig_vect[:,x]) for x in range(0, len(eig_vals))]
eigenpairs.sort(key=lambda pair: pair[0], reverse=True)
eig_vals = [eigenpairs[x][0] for x in range(0, len(eigenpairs))]
eig_vect = [eigenpairs[x][1] for x in range(0, len(eigenpairs))]
print(eig_vals)
Above, we saw that 96% of the variance can be explained by only two of the
dimensions. We can therefore reduce the dimensionality of our feature space from
n = 4 to n = 2 without losing much of the information.
The final thing we must do is generate the projection matrix and project our
original data onto the (two) principal components (Raschka, 2015):
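A minimal sketch of these two steps, assuming the sorted eig_vect list from the previous snippet and the standardized data matrix X:

# Build the 4 x 2 projection matrix from the two most important eigenvectors
proj_matrix = np.hstack((eig_vect[0].reshape(4, 1), eig_vect[1].reshape(4, 1)))
print(proj_matrix.shape)  # (4, 2)

# Project the standardized data onto the two principal components
X_pca = X.dot(proj_matrix)
print(X_pca.shape)  # (150, 2)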
That’s it! You just performed Principal Component Analysis using Eigenvector
Decomposition and have reduced dimensionality to two without losing much of the
information in the dataset.
Summary
In this article, we read about performing Principal Component Analysis on the
dimensions of your dataset for the purpose of dimensionality reduction. Some
datasets have many features and few samples, meaning that many Machine
Learning algorithms will be struck by the curse of dimensionality. Feature
extraction approaches like PCA, which attempt to construct a lower-dimensional
feature space based on the original dataset, can help reduce this curse. Using PCA,
we can attempt to recreate our feature space with fewer dimensions and with
minimum information loss.
After defining the context for applying PCA, we looked at it from a high-level
perspective. We saw that we can compute eigenvectors and eigenvalues and sort
those to find the principal directions in your dataset. After generating a projection
matrix for these directions, we can map our dataset onto these directions, which are
then called the principal components. But how these eigenvectors can be derived
was explained later, because there are two methods for doing so: using Eigenvector
Decomposition (EIG) and the more generalized Singular Value Decomposition
(SVD).
In two step-by-step examples, we saw how we can apply both PCA-EIG (this post)
and PCA-SVD (next post) for performing a Principal Component Analysis. In the
first case, we saw that we can compute a covariance matrix for the standardized
dataset which illustrates the variances and covariances of its variables. This matrix
can then be decomposed into eigenvectors and eigenvalues, which illustrate the
direction and magnitude of the spread expressed by the covariance matrix. Sorting
the eigenpairs, we can select the principal directions that contribute most to
variance, generate the projection matrix and project our data.
While PCA-EIG works well with symmetric and square matrices (and hence with our
covariance matrix), it can be numerically unstable. That’s why PCA-SVD is very
common in today's Machine Learning libraries. In another step-by-step example, we will look at how the SVD can be used directly on the standardized data matrix to derive the same eigenvectors we found with PCA-EIG. These can then be used to generate a projection matrix, arriving at the same end result as with PCA-EIG.
It’s been a thorough read, that’s for sure. Still I hope that you have learned
something. Any comments, questions or suggestions are welcome. Thank you for
reading!!
References
Wikipedia. (n.d.). Curse of dimensionality. Wikipedia, the free encyclopedia. Retrieved December 3, 2020, from https://en.wikipedia.org/wiki/Curse_of_dimensionality
Wikipedia. (2004, November 17). Feature selection. Wikipedia, the free encyclopedia. Retrieved December 3, 2020, from https://en.wikipedia.org/wiki/Feature_selection
Wikipedia. (2003, June 8). Feature extraction. Wikipedia, the free encyclopedia. Retrieved December 3, 2020, from https://en.wikipedia.org/wiki/Feature_extraction
Wikipedia. (2002, August 26). Principal component analysis. Wikipedia, the free encyclopedia. Retrieved December 7, 2020, from https://en.wikipedia.org/wiki/Principal_component_analysis
StackExchange. (n.d.). Relationship between SVD and PCA. How to use SVD to perform PCA? Cross Validated. https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
Raschka, S. (2015, January 27). Principal component analysis. Dr. Sebastian Raschka. https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html