Mathematical Approach To PCA
A feature vector x can be written as a linear combination of basis vectors,
x = a_1 u_1 + a_2 u_2 + ... + a_n u_n
where a_1, ..., a_n represent 'n' scalars and u_1, ..., u_n represent the basis vectors. Basis vectors are
orthogonal to each other. Orthogonality of vectors can be thought of as an extension of
the vectors being perpendicular in a 2-D vector space. So our feature vector (data
set) can be transformed into a set of principal components, which behave just like such
basis vectors.
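As a quick illustration of this decomposition, here is a small NumPy sketch; the basis vectors and the sample vector are made up for the example and are not from the text:

import numpy as np

# Two orthonormal basis vectors in 2-D (illustrative values, not from the text)
u1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
u2 = np.array([1.0, -1.0]) / np.sqrt(2.0)

x = np.array([3.0, 1.0])                   # an arbitrary feature vector

# Because u1 and u2 are orthonormal, the scalars a1, a2 are simple dot products
a1, a2 = x @ u1, x @ u2

# Reconstruct x as a1*u1 + a2*u2 and confirm it matches the original vector
print(np.allclose(a1 * u1 + a2 * u2, x))   # True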
Objectives of PCA:
1. The new features are distinct, i.e., the covariance between the new features
(in the case of PCA, the principal components) is 0.
2. The principal components are generated in order of the variability in the
data that they capture. Hence, the first principal component should capture
the maximum variability, the second one should capture the next highest
variability, and so on.
3. The sum of the variances of the new features / the principal components
should be equal to the sum of the variances of the original features. (All three
properties are checked numerically in the sketch after this list.)
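The following sketch (an illustration with randomly generated data, not part of the original text) projects data onto the eigenvectors of its covariance matrix and verifies that the resulting features are uncorrelated, ordered by variance, and preserve the total variance:

import numpy as np

rng = np.random.default_rng(0)
# Correlated toy data (made up for the illustration): 200 samples, 3 features
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])

Xc = X - X.mean(axis=0)                        # mean-centre each feature
C = np.cov(Xc, rowvar=False)                   # covariance matrix of the original features
vals, vecs = np.linalg.eigh(C)                 # eigen-decomposition
order = np.argsort(vals)[::-1]                 # sort by decreasing eigenvalue
vals, vecs = vals[order], vecs[:, order]

Z = Xc @ vecs                                  # the new features (principal components)
Cz = np.cov(Z, rowvar=False)                   # covariance matrix of the new features

print(np.allclose(Cz - np.diag(np.diag(Cz)), 0.0))   # 1. off-diagonal covariances are ~0
print(np.all(np.diff(np.diag(Cz)) <= 1e-12))          # 2. variances appear in decreasing order
print(np.isclose(np.trace(Cz), np.trace(C)))          # 3. total variance is preserved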
Working of PCA:
PCA works through a process called Eigenvalue Decomposition of the covariance matrix
of a data set. The steps are as follows (a minimal code sketch follows the list):
1. First, calculate the covariance matrix of the data set.
2. Then, calculate the eigenvectors and eigenvalues of the covariance matrix.
3. The eigenvector having the highest eigenvalue represents the direction in
which there is the highest variance, so it identifies the first principal component.
4. The eigenvector having the next highest eigenvalue represents the direction
with the highest remaining variance that is also orthogonal to the first direction,
so it identifies the second principal component.
5. Continuing like this, identify the top 'k' eigenvectors (those having the top 'k'
eigenvalues) to get the 'k' principal components.
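A minimal sketch of these steps in NumPy (the function name and variables are my own, chosen for illustration; it assumes the data matrix has samples in rows and features in columns):

import numpy as np

def top_k_components(X, k):
    """Return the top-k eigenvalues and principal directions of a data matrix X."""
    C = np.cov(X, rowvar=False)              # step 1: covariance matrix of the data set
    eigvals, eigvecs = np.linalg.eigh(C)     # step 2: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    return eigvals[order][:k], eigvecs[:, order][:, :k]

# Example usage on made-up data: keep the top 2 principal components
X = np.random.default_rng(1).normal(size=(100, 5))
vals, vecs = top_k_components(X, k=2)
scores = (X - X.mean(axis=0)) @ vecs         # project mean-centred data onto those directions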
Numerical for PCA:
Consider the following dataset

x1: 2.5  0.5  2.2  1.9  3.1  2.3  2.0  1.0  1.5  1.1
x2: 2.4  0.7  2.9  2.2  3.0  2.7  1.6  1.1  1.6  0.9
Subtracting the mean of each feature (mean of x1 = 1.81, mean of x2 = 1.91) gives the mean-adjusted data

x1 - mean(x1): 0.69  -1.31  0.39  0.09  1.29  0.49   0.19  -0.81  -0.31  -0.71
x2 - mean(x2): 0.49  -1.21  0.99  0.29  1.09  0.79  -0.31  -0.81  -0.31  -1.01
Covariance Matrix c = (X^T X) / (N - 1)
where X is the mean-adjusted Dataset Matrix (in this numerical, it is a 10 x 2 matrix),
X^T is the transpose of X (in this numerical, it is a 2 x 10 matrix), and N is the
number of data points = 10.
So,
c = [0.61656  0.61544]
    [0.61544  0.71656]
{So in order to calculate the covariance matrix, we multiply the transpose of the
mean-adjusted Dataset Matrix by the matrix itself and divide by N - 1}
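The same calculation can be reproduced in NumPy with the dataset above (a verification sketch; the variable names are chosen for illustration):

import numpy as np

x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

X = np.column_stack([x1, x2])        # 10 x 2 dataset matrix
Xc = X - X.mean(axis=0)              # subtract the mean of each feature

N = X.shape[0]                       # N = 10
c = (Xc.T @ Xc) / (N - 1)            # 2 x 2 covariance matrix
print(c)                             # approximately [[0.6166, 0.6154], [0.6154, 0.7166]]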
We get two values for \lambda; they are \lambda_1 = 1.28403 and \lambda_2 = 0.0490834. Now we
have to find the eigenvectors for the eigenvalues \lambda_1 and \lambda_2.
To find the eigenvectors from the eigenvalues, we will use the following
approach:
First, we will find the eigenvector for the eigenvalue \lambda_1 = 1.28403 by using the
equation (c - \lambda_1 I) v = 0.
A. Solving this equation gives an un-normalized eigenvector; its length works out to 1.3602.
B. Now divide the elements of that vector by 1.3602 (the length just found) to normalize it.
So now we have found the unit eigenvector for the eigenvalue \lambda_1; its components are 0.67787 and
0.73518.
Secondly, we will find the eigenvector for the eigenvalue \lambda_2 = 0.0490834 by using
the equation (c - \lambda_2 I) v = 0. {Same approach as in the previous step}
A. Solving this equation gives an un-normalized eigenvector; its length again works out to about 1.3602.
B. Now divide the elements of that vector by 1.3602 to normalize it.
So now we have found the unit eigenvector for the eigenvalue \lambda_2; its components are
-0.735176 and 0.677873 (the opposite signs make this eigenvector orthogonal to the first one).
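Continuing the verification sketch, the eigenvalues and the (already normalized) unit eigenvectors can be read off directly with np.linalg.eigh, which returns the eigenvalues in ascending order; the sign of a returned eigenvector may be flipped, which does not change the direction it represents:

import numpy as np

# Covariance matrix carried over from the snippet above (rounded values)
c = np.array([[0.61656, 0.61544],
              [0.61544, 0.71656]])

eigvals, eigvecs = np.linalg.eigh(c)    # eigenvalues in ascending order
print(eigvals)                          # approximately [0.0491, 1.2840]
print(eigvecs[:, 1])                    # approximately +/-[0.6779, 0.7352], the direction for lambda_1
print(eigvals.sum())                    # approximately 1.333, the total variance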
Sum of eigenvalues \lambda_1 and \lambda_2 = 1.28403 + 0.0490834 ≈ 1.333 = Total Variance
{The majority of the variance (about 96%) comes from \lambda_1}
Step 3: Arrange Eigenvalues
The eigenvector with the highest eigenvalue is the Principal Component of the
dataset. So in this case, the eigenvector of \lambda_1, (0.67787, 0.73518), is the principal component.
{Basically, in order to complete the numerical we only have to solve up to this step, but
if we have to show why we have chosen that particular eigenvector, we have to
follow Steps 4 to 6.}
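As a quick numerical check of this choice (a continuation of the verification sketch, not part of the original numerical; the full justification follows in Steps 4 to 6), projecting the mean-adjusted data onto the eigenvector of \lambda_1 gives a new feature whose variance equals \lambda_1:

import numpy as np

x1 = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
x2 = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
Xc = np.column_stack([x1, x2])
Xc = Xc - Xc.mean(axis=0)              # mean-adjusted data

v1 = np.array([0.67787, 0.73518])      # eigenvector of lambda_1 (first principal direction)
scores = Xc @ v1                       # the data expressed along the first principal component

print(scores.var(ddof=1))              # approximately 1.284, i.e. lambda_1: this single direction
                                       # keeps about 96% of the total variance (1.284 / 1.333)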
Step 4: Form Feature Vector