PCA Notes
An important thing to realize is that the principal components are less interpretable than the original variables: because each component is constructed as a linear combination of the initial variables, it does not carry a single real-world meaning.
Geometrically speaking, principal components represent the directions of the data that explain a maximal amount of variance, that is to say, the lines that capture the most information in the data. The relationship between variance and information is this: the larger the variance carried by a line, the larger the dispersion of the data points along it, and the larger the dispersion along a line, the more information it carries.
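This claim can be checked numerically. The sketch below (hypothetical data, NumPy assumed) builds a small correlated 2-D data set, takes the leading eigenvector of the covariance matrix from Step 2 below as the first principal direction, and confirms that the variance of the data projected onto that line is at least as large as the variance along either original axis:

```python
import numpy as np

# Hypothetical correlated 2-D data (illustrative values only)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=500)])
data -= data.mean(axis=0)  # center the data

cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                    # direction of maximal variance

var_pc1 = np.var(data @ pc1)   # dispersion along the first principal line
var_x = np.var(data[:, 0])     # dispersion along the original x axis
print(var_pc1 >= var_x)
```

The first principal line never carries less variance than any single original axis, which is exactly the "maximal dispersion" property described above.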
Once the standardization is done, all the variables will be transformed to the same scale.
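As a minimal sketch of that standardization step (hypothetical sample values, NumPy assumed), each variable is shifted to mean 0 and scaled to standard deviation 1:

```python
import numpy as np

# Hypothetical data set: each row is an observation, each column a variable
data = np.array([[170.0, 65.0],
                 [160.0, 55.0],
                 [180.0, 75.0]])

# z-score standardization: subtract each column's mean, divide by its std
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Every column now has mean 0 and standard deviation 1, i.e. the same scale
print(standardized.mean(axis=0))
print(standardized.std(axis=0))
```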
Step 2: Covariance Matrix Computation
The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other, or in other words, to see whether there is any relationship between them. Variables are sometimes so highly correlated that they contain redundant information. To identify these correlations, we compute the covariance matrix.
The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions)
that has as entries the covariances associated with all possible pairs of the initial variables.
For example, for a 3-dimensional data set with 3 variables x, y, and z, the covariance matrix is a 3×3 matrix of this form:

    Cov(x,x)  Cov(x,y)  Cov(x,z)
    Cov(y,x)  Cov(y,y)  Cov(y,z)
    Cov(z,x)  Cov(z,y)  Cov(z,z)

Note that the diagonal entries are just the variances of each variable (Cov(x,x) = Var(x)), and since Cov(x,y) = Cov(y,x), the matrix is symmetric.
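A minimal sketch of this computation (hypothetical sample values, NumPy assumed), with columns standing for x, y, and z:

```python
import numpy as np

# Hypothetical data set: rows are observations, columns are x, y, z
X = np.array([[2.0, 4.0, 1.0],
              [3.0, 6.0, 2.0],
              [5.0, 7.0, 2.5],
              [6.0, 9.0, 4.0]])

# p x p covariance matrix (p = 3 here); rowvar=False treats columns as variables
C = np.cov(X, rowvar=False)

print(C.shape)              # a 3x3 matrix
print(np.allclose(C, C.T))  # symmetric, as expected
```

The diagonal of `C` holds the variance of each variable, and the off-diagonal entries hold the pairwise covariances.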
If we consider two-dimensional data with variables X1 and X2, their sample covariance can be calculated as:

    Cov(X1, X2) = Σᵢ (X1ᵢ − mean(X1)) (X2ᵢ − mean(X2)) / (n − 1)

where n is the number of observations.
Ex:
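A worked sketch of that formula (hypothetical values, NumPy assumed), checked against NumPy's built-in covariance:

```python
import numpy as np

# Hypothetical two-dimensional data: variables X1 and X2
X1 = np.array([2.0, 4.0, 6.0, 8.0])
X2 = np.array([1.0, 3.0, 5.0, 7.0])
n = len(X1)

# Sample covariance computed directly from the formula above
cov_manual = np.sum((X1 - X1.mean()) * (X2 - X2.mean())) / (n - 1)

print(cov_manual)            # 20 / 3 = 6.666...
print(np.cov(X1, X2)[0, 1])  # NumPy agrees
```

Here the deviations from the mean are (−3, −1, 1, 3) for both variables, so the sum of products is 9 + 1 + 1 + 9 = 20, and dividing by n − 1 = 3 gives 20/3. The positive sign tells us X1 and X2 increase together.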