Principal Component Analysis
In addition to data visualization, PCA is also sometimes used for data compression
(though this is becoming less common as data storage gets cheaper). It has also been
used to speed up the training of a supervised learning model (typically a support
vector machine), though this too has become less effective over time.
What if, instead of combining the features together, a third axis could be created
based on a line fitted through the two features plotted against each other? This new
axis should be placed so that the data projected onto it keeps as much variance as
possible. If the data points are spread out along the axis, the axis fits well.
Conversely, if they are all clumped together, the axis fits poorly. Variance is
important because it measures how much of the data's information the new axis still
captures.
After a new z-axis has been defined, in the form of a unit vector with two components
(z1, z2), coordinates are projected onto the axis by taking the dot product of
the coordinate vector (x1, x2) and the axis vector.
z = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \cdot \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
This dot product results in a scalar value, which is a distance along the axis. Since
the z-axis is a vector from the origin, the distance value is translated onto the
vector to give the final coordinate points.
\text{coordinates} = z \cdot \begin{bmatrix} z_1 & z_2 \end{bmatrix}
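As a minimal NumPy sketch of this projection step, the axis vector and data points below are made-up values purely for illustration (any unit-length axis would work the same way):

```python
import numpy as np

# Hypothetical unit-length axis (z1, z2) and a few 2-D data points,
# chosen only to illustrate the projection.
axis = np.array([0.8, 0.6])            # unit length: 0.8**2 + 0.6**2 == 1
points = np.array([[2.0, 1.0],
                   [1.0, 3.0],
                   [-1.5, 0.5]])

# Dot product of each point with the axis gives z, the scalar distance
# along the axis from the origin.
z = points @ axis                      # shape (3,)

# Translating that distance back onto the axis vector gives the final
# projected coordinates in the original 2-D space.
coordinates = z[:, None] * axis        # shape (3, 2)

print(z)
print(coordinates)
```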
Additionally, it is important to note that PCA is not linear regression, even though
they may look similar. Both algorithms minimize a distance between the data points
and a line, but the key difference is the direction in which that distance is
measured. Linear regression minimizes the vertical distance (along the y-axis), while
PCA always minimizes the perpendicular distance to the line.
Also, linear regression in this example involves only two variables, while PCA can
take many features, using multiple axes to retain the information (variance). They
are very different algorithms used for different purposes, and this becomes more
apparent as PCA is applied to more features.
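To make the distinction concrete, here is a small NumPy sketch (the data is synthetic and purely illustrative) that fits both an ordinary regression line and the first principal axis to the same point cloud; the two slopes generally differ because each method minimizes a different kind of distance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with correlated features (illustrative values only).
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.4, size=200)
X = np.column_stack([x, y])

# Linear regression slope: minimizes vertical (y-direction) distances.
slope, intercept = np.polyfit(x, y, deg=1)

# PCA first axis: the direction of maximum variance, which minimizes
# perpendicular distances. Found as the leading eigenvector of the
# covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, np.argmax(eigvals)]
pca_slope = pc1[1] / pc1[0]

print(f"regression slope: {slope:.3f}")
print(f"PCA axis slope:   {pca_slope:.3f}")   # generally not the same line
```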
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = z \ast \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
Note that this is a scalar multiplication being performed, not a dot product.
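As a sketch of this step, reusing the hypothetical unit axis from the earlier example, scaling the axis by the scalar z recovers an approximation of the original point (information lost in the projection is not recovered):

```python
import numpy as np

axis = np.array([0.8, 0.6])   # same hypothetical unit axis as before
z = 2.2                        # a projected scalar value (illustrative)

# Scalar multiplication of z with the axis vector, not a dot product.
x_approx = z * axis
print(x_approx)                # -> [1.76 1.32]
```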