A Step by Step Explanation of Principal Component Analysis
PCA is actually a widely covered method on the web, and there are some
great articles about it, but only a few of them go straight to the point and
explain how it works without diving too much into the technicalities and
the ‘why’ of things. That’s the reason why I decided to make my own post to
present it in a simplified way.
Smaller data sets are also easier to explore and visualize, and they make
analyzing data much easier and faster for machine learning algorithms
without extraneous variables to process.
So to sum up, the idea of PCA is simple — reduce the number of variables of
a data set, while preserving as much information as possible.
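To make that idea concrete before walking through the steps, here is a minimal sketch (not part of the original walkthrough, using scikit-learn and the classic Iris data set purely for illustration): four variables are compressed into two principal components while keeping most of the information.

```python
# Minimal sketch of the idea: reduce 4 variables to 2 principal components
# while preserving most of the variance (illustrative, not the article's own code).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                         # 150 samples x 4 variables
X_std = StandardScaler().fit_transform(X)    # standardize first

pca = PCA(n_components=2)                    # keep only 2 principal components
X_reduced = pca.fit_transform(X_std)         # 150 samples x 2 variables

print(X_reduced.shape)                       # (150, 2)
print(pca.explained_variance_ratio_.sum())   # ~0.96: most of the variance is kept
```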
. . .
Now that we know that the covariance matrix is no more than a table that
summarizes the correlations between all the possible pairs of variables, let’s
move to the next step.
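If it helps to see that “table” in code, here is a small sketch (the data are made up for illustration) showing that the covariance matrix of three variables is just a 3 × 3 table of pairwise covariances:

```python
# Hypothetical illustration: the covariance matrix of 3 variables is a 3 x 3
# table whose diagonal holds variances and whose off-diagonal holds covariances.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))       # 200 observations of 3 variables

cov = np.cov(data, rowvar=False)       # rows = observations, columns = variables
print(cov.shape)                       # (3, 3)
print(cov)                             # diagonal: variances; off-diagonal: covariances
```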
An important thing to realize here is that the principal components are less
interpretable and don’t have any real meaning, since they are constructed as
linear combinations of the initial variables.
Geometrically speaking, principal components represent the directions of
the data that explain a maximal amount of variance, that is to say, the lines
that capture most of the information of the data set. For example, let’s
assume that the scatter plot of our data set is as shown below; can we guess
the first principal component? Yes, it’s approximately the line that matches
the purple marks, because it goes through the origin and it’s the line along
which the projection of the points (red dots) is the most spread out. Or,
mathematically speaking, it’s the line that maximizes the variance (the
average of the squared distances from the projected points (red dots) to the
origin).
The second principal component is calculated in the same way, with the
condition that it is uncorrelated with (i.e., perpendicular to) the first
principal component and that it accounts for the next highest variance.
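As a rough numerical check of this geometric picture (with made-up correlated data, purely for illustration), we can scan directions through the origin and see that the projected variance peaks along one direction, the first principal component, while the perpendicular direction carries the remaining variance:

```python
# Brute-force check: among all directions through the origin, the variance of
# the projected points is largest along the first principal component.
import numpy as np

rng = np.random.default_rng(1)
data = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=500)
data -= data.mean(axis=0)              # center the data so lines pass through the origin

def projected_variance(points, angle):
    """Variance of the points projected onto a unit vector at the given angle."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    return np.var(points @ direction)

angles = np.linspace(0, np.pi, 180)
variances = [projected_variance(data, a) for a in angles]
best = angles[np.argmax(variances)]

print(f"direction of max variance ~ {np.degrees(best):.1f} degrees")
print(f"variance along it         ~ {max(variances):.2f}")
print(f"variance perpendicular    ~ {projected_variance(data, best + np.pi / 2):.2f}")
```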
Without further ado, it is eigenvectors and eigenvalues that are behind all
the magic explained above, because the eigenvectors of the covariance
matrix are actually the directions of the axes where there is the most variance
(the most information), and these are what we call Principal Components. And
eigenvalues are simply the coefficients attached to the eigenvectors, which give
the amount of variance carried by each Principal Component.
Example:
Let’s suppose that our data set is 2-dimensional with 2 variables x, y, and that
the eigenvectors and eigenvalues of the covariance matrix are as follows:
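The original figure with the numeric eigenvectors and eigenvalues isn’t reproduced here, but as a sketch of how such values could be computed on a synthetic 2-dimensional data set:

```python
# Sketch: eigenvectors of the covariance matrix are the principal component
# directions; eigenvalues are the variance each one carries (synthetic data).
import numpy as np

rng = np.random.default_rng(2)
data = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=500)

cov_matrix = np.cov(data, rowvar=False)                  # 2 x 2 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)   # eigh: for symmetric matrices

# sort from largest to smallest eigenvalue, i.e. PC1 first
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("eigenvalues :", eigenvalues)     # variance carried by PC1 and PC2
print("eigenvectors:\n", eigenvectors)  # columns are the principal component directions
```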
So, the feature vector is simply a matrix that has as columns the
eigenvectors of the components that we decide to keep. This makes it the
first step towards dimensionality reduction, because if we choose to keep
only p eigenvectors (components) out of n, the final data set will have only
p dimensions.
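A minimal sketch of this step, continuing with synthetic 2-dimensional data (so p = 1 out of n = 2; the data are illustrative, not the values from the example):

```python
# The "feature vector" is just the matrix whose columns are the eigenvectors
# we decide to keep (here p = 1 of n = 2).
import numpy as np

rng = np.random.default_rng(2)
data = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=500)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
eigenvectors = eigenvectors[:, np.argsort(eigenvalues)[::-1]]   # PC1 first

p = 1                                        # number of components to keep
feature_vector = eigenvectors[:, :p]         # 2 x 1 matrix: just v1

print(feature_vector.shape)                  # (2, 1)
```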
Example:
Continuing with the example from the previous step, we can either form a
feature vector with both of the eigenvectors v1 and v2:
Or discard the eigenvector v2, which is the one of lesser significance, and
form a feature vector with v1 only:
. . .
So, as we saw in the example, it’s up to you to choose whether to keep all
the components or discard the ones of lesser significance, depending on
what you are looking for. If you just want to describe your data in
terms of new variables (principal components) that are uncorrelated,
without seeking to reduce dimensionality, leaving out the less significant
components is not needed.
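If you do want to reduce dimensionality, one common rule of thumb (not prescribed by this article, just a hypothetical illustration) is to keep enough components to explain a chosen share of the total variance, say 95%:

```python
# Illustrative only: each eigenvalue's share of the total is the fraction of
# variance its component explains; keep components until ~95% is reached.
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(300, 5))                        # 300 observations, 5 variables
data[:, 3] = data[:, 0] + 0.1 * rng.normal(size=300)   # make some variables redundant
data[:, 4] = data[:, 1] - data[:, 2]

eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]   # descending
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

print(np.round(explained_ratio, 3))                     # variance share per component
print("components needed for 95%:", np.searchsorted(cumulative, 0.95) + 1)
```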
Last step: Recast the data along the principal component axes
In the previous steps, apart from standardization, you do not make any
changes to the data; you just select the principal components and form the
feature vector, but the input data set always remains in terms of the original
axes (i.e., in terms of the initial variables).
In this step, which is the last one, the aim is to use the feature vector formed
from the eigenvectors of the covariance matrix to reorient the data from
the original axes to the ones represented by the principal components
(hence the name Principal Components Analysis). This can be done by
multiplying the transpose of the feature vector by the transpose of the
standardized original data set:
FinalDataSet = FeatureVectorᵀ × StandardizedOriginalDataSetᵀ
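A sketch of that multiplication on synthetic data (the values are illustrative, not the ones from the example above):

```python
# FinalDataSet = FeatureVector^T x StandardizedOriginalDataSet^T (sketch).
import numpy as np

rng = np.random.default_rng(2)
data = rng.multivariate_normal(mean=[0, 0], cov=[[3, 2], [2, 2]], size=500)
data_std = (data - data.mean(axis=0)) / data.std(axis=0)         # standardize

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data_std, rowvar=False))
feature_vector = eigenvectors[:, np.argsort(eigenvalues)[::-1]][:, :1]  # keep PC1 only

final_data = (feature_vector.T @ data_std.T).T    # data re-expressed along PC1
print(final_data.shape)                           # (500, 1)
```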
. . .