Principal Component Analysis (PCA)
Let there be n data points denoted by x₁, x₂, ..., xₙ, where each xᵢ is a d-dimensional vector. The goal is to reduce these points and find a mapping y₁, y₂, ..., yₙ, where each yᵢ is a p-dimensional vector (where p < d, and often p ≪ d). That is, the data points x₁, x₂, ..., xₙ ∈ ℝᵈ are mapped to y₁, y₂, ..., yₙ ∈ ℝᵖ.
Let X be a d × n matrix that contains all the data points in the original space. X has to be mapped to a p × n matrix Y that retains maximum variability of the data points while reducing the number of features used to represent each point.
Note: In PCA or any variant of PCA, a standardized input matrix is used. So, X
represents the standardized input data matrix, unless otherwise specified.
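As an illustration of this preprocessing step, the following sketch standardizes a raw data matrix stored with one point per column; the use of NumPy and the synthetic data are assumptions made purely for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(5, 100))    # d = 5 features, n = 100 points, one point per column

# Standardize each feature (row): zero mean and unit variance across the n points.
mean = X_raw.mean(axis=1, keepdims=True)
std = X_raw.std(axis=1, keepdims=True)
X = (X_raw - mean) / std             # this standardized matrix plays the role of X below
```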
Now let us discuss how PCA solves the problem of dimensionality reduction.
PCA is a method based on linear projection. For any linear projection-based technique, given a d-dimensional vector xᵢ, we obtain a low-dimensional representation yᵢ (a p-dimensional vector) such that

yᵢ = Uᵀ xᵢ (2.1)
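To make the mapping in (2.1) concrete, here is a small sketch; the projection matrix U is random here only for illustration, since how PCA actually chooses U is derived in the rest of this section.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 10, 3
x_i = rng.normal(size=(d,))   # one d-dimensional data point
U = rng.normal(size=(d, p))   # d x p matrix whose columns are projection directions

y_i = U.T @ x_i               # equation (2.1): a p-dimensional representation of x_i
print(y_i.shape)              # (3,)
```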
How is the direction of the principal components chosen? The basic idea is to pick
a direction along which data is maximally spread, that is, the direction along which
there is maximum variance [1, 3, 4].
Let us consider the first principal component (the direction along which the data has maximum variance) to be U₁. We now project the n points from matrix X on U₁, given by:

U₁ᵀ X (2.2)
So, by definition, we would now like to find the maximum variance of U₁ᵀ X. To find that, we solve the optimization problem below:

max_{U₁} var(U₁ᵀ X) (2.3)

We know that

var(U₁ᵀ X) = U₁ᵀ Σ U₁

Here, Σ is the sample covariance matrix of the data matrix X. Thus, the optimization problem now becomes

max_{U₁} U₁ᵀ Σ U₁ (2.4)
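The identity var(U₁ᵀ X) = U₁ᵀ Σ U₁ underlying (2.4) can be checked numerically; the sketch below uses NumPy and an arbitrary unit vector u1, both assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 500
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)    # center the data

Sigma = np.cov(X, bias=True)             # d x d sample covariance matrix of X
u1 = rng.normal(size=(d,))
u1 = u1 / np.linalg.norm(u1)             # an arbitrary direction of unit length

proj_var = np.var(u1 @ X)                # variance of the projected points U1^T X
quad_form = u1 @ Sigma @ u1              # U1^T Sigma U1
print(np.isclose(proj_var, quad_form))   # True (up to floating-point error)
```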
However, U₁ᵀ Σ U₁ is a quadratic function with no fixed upper bound. So, (2.4) turns out to be an ill-defined problem. This is because (2.4) depends on the direction as well as the magnitude of U₁. To convert this into a well-defined problem, we need to add a constraint to (2.4).
There are two possible approaches to resolve this problem: either we add a constraint on the direction of U₁ or on the magnitude of U₁. Adding a constraint on the direction of U₁ and trying to calculate max(U₁ᵀ Σ U₁) still leaves an ill-defined problem, because there is no upper bound even after adding the constraint. But if we add a constraint on the magnitude of U₁, that is, if we restrict the magnitude of U₁ by requiring, say, U₁ᵀ U₁ = 1, the length of the principal component is fixed. Hence, there is only one direction in which U₁ᵀ Σ U₁ would be maximum. Thus, the problem has an upper bound, and to attain that upper bound, we are interested in finding the direction of U₁.
Using the second approach, among all fixed-length vectors, we search for the direction of maximum variance of the data. Now we have a well-defined problem as follows:

max_{U₁} U₁ᵀ Σ U₁ (2.5)
subject to U₁ᵀ U₁ = 1
The method of Lagrange multipliers says that there exists λ₁ ∈ ℝ such that the solution U₁ to the above problem can be found by optimizing the Lagrangian:

L(U₁, λ₁) = U₁ᵀ Σ U₁ − λ₁ (U₁ᵀ U₁ − 1) (2.6)

To optimize (2.6), we simply take the derivative with respect to U₁ and equate it to 0. This gives:

2 Σ U₁ − 2 λ₁ U₁ = 0 (2.7)

Thus,

Σ U₁ = λ₁ U₁ (2.8)

Here, Σ is the covariance matrix of X, λ₁ is the dual variable, and the vector that we are looking for is U₁.
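As a quick numerical sanity check of (2.8), here is a sketch that again assumes NumPy and synthetic data.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 500))
X = X - X.mean(axis=1, keepdims=True)
Sigma = np.cov(X, bias=True)               # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: Sigma is symmetric; eigenvalues in ascending order
lam1 = eigvals[-1]                         # largest eigenvalue
U1 = eigvecs[:, -1]                        # corresponding eigenvector

print(np.allclose(Sigma @ U1, lam1 * U1))  # equation (2.8) holds: Sigma U1 = lambda1 U1
```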
The direction U₁ obtained by maximizing the variance is the direction of some unit vector that satisfies (2.8). However, this is exactly the definition of an eigenvector of the matrix Σ: multiplying a matrix by one of its eigenvectors results in a vector that is just a scaled version of that eigenvector. So, U₁ is an eigenvector of Σ, and λ₁ is the corresponding eigenvalue.
Now, the question is which eigenvector to choose. Σ is a d × d matrix, so there are at most d eigenvectors and eigenvalues.
Now, our objective is to maximize U₁ᵀ Σ U₁, and from (2.8) we know that Σ U₁ = λ₁ U₁:

U₁ᵀ Σ U₁ = U₁ᵀ (λ₁ U₁) (2.9)
= λ₁ U₁ᵀ U₁ (2.10)
= λ₁ (2.11)

where the last step uses the constraint U₁ᵀ U₁ = 1. And since we want to maximize the above quantity, that is, we want to maximize λ₁, it is evident that we have to pick U₁ to be the eigenvector corresponding to the largest eigenvalue of Σ. For U₂, ..., Uₚ we proceed similarly and pick the eigenvectors corresponding to the second largest up to the pth largest eigenvalues. Thus, the eigenvector of the sample covariance matrix Σ corresponding to the maximum eigenvalue is the first principal component. Similarly, the second principal component is the eigenvector with the second largest eigenvalue, and so on. In the same way, all the principal components can be found just by the eigendecomposition of the covariance matrix [5].
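Putting the procedure together, the following is a minimal sketch of PCA via eigendecomposition of the covariance matrix; the function name pca_eig, the synthetic standardized data, and the use of NumPy are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def pca_eig(X, p):
    """Project the d x n standardized data matrix X onto its top p principal components."""
    Sigma = np.cov(X, bias=True)              # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]         # sort indices by descending eigenvalue
    U = eigvecs[:, order[:p]]                 # d x p matrix of the top-p principal components
    Y = U.T @ X                               # p x n matrix of reduced representations, as in (2.1)
    return Y, U, eigvals[order[:p]]

rng = np.random.default_rng(4)
X_raw = rng.normal(size=(10, 200))
X = (X_raw - X_raw.mean(axis=1, keepdims=True)) / X_raw.std(axis=1, keepdims=True)

Y, U, top_eigvals = pca_eig(X, p=3)
print(Y.shape)                                    # (3, 200)
print(np.isclose(np.var(Y[0]), top_eigvals[0]))   # variance along PC1 equals the largest eigenvalue, cf. (2.11)
```

The final check mirrors (2.11): the variance of the data projected onto the first principal component equals the largest eigenvalue of Σ.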
Singular Value Decomposition (SVD) can solve the eigendecomposition problem quickly and in a computationally efficient manner. SVD routines are much more numerically stable than eigenvector routines for finding the eigenvectors of a matrix. We can use SVD to calculate the eigenvectors in PCA very efficiently [5, 6]. SVD is a matrix factorization technique which expresses a matrix (in this case, X) as a linear combination of matrices of rank 1. SVD is a stable method because it does not