Chapter 2: Principal Components Analysis (PCA)
• PCA developed mainly during the second half of the twentieth century, driven by the evolution of computer science and the availability of computers.
Among all the possible linear combinations, the one with maximum variance is chosen at each step, because a principal component (p.c.) must explain a large part of the variation associated with the initial variables.
2.1: General concepts
• The differentiation between the elements of a population is measured by the variance.
Consider the system
$$A\mathbf{x} = \lambda\mathbf{x},$$
where $A$ is a known matrix, $\mathbf{x}$ is a vector and $\lambda$ is a scalar.
Eigenvalues of a matrix $A$
The system $A\mathbf{x} = \lambda\mathbf{x}$ can be rewritten as $(A - \lambda I)\mathbf{x} = \mathbf{0}$; the eigenvalues $\lambda$ are the solutions of $\det(A - \lambda I) = 0$.
Symmetric matrices
• The eigenvalues will always be real.
• The sum of all the eigenvalues will be equal to the trace of the matrix (the sum of the elements of the main diagonal).
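As a quick check of these two facts, here is a minimal NumPy sketch (the symmetric matrix is made up for illustration):

```python
import numpy as np

A = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric: A == A.T

eigvals = np.linalg.eigvalsh(A)   # eigvalsh assumes symmetry and returns real values
print(eigvals)                                  # all real, in ascending order
print(np.isclose(eigvals.sum(), np.trace(A)))   # True: sum of eigenvalues == trace
```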
2.2: Construction of the principal components
Principal Components
The principal components are linear combinations of the original variables $X_1, \dots, X_p$:
$$Y_j = v_{1j}X_1 + v_{2j}X_2 + \cdots + v_{pj}X_p = \mathbf{v}_j'\mathbf{X},$$
where the $v_{ij}$ are constants.
• The first principal component is the linear combination with maximum variance; each subsequent one has maximum variance among those uncorrelated with the previous ones, so that $\mathrm{Var}(Y_1) \ge \mathrm{Var}(Y_2) \ge \cdots \ge \mathrm{Var}(Y_p)$.
• Any two principal components are uncorrelated: $\mathrm{Cov}(Y_j, Y_k) = 0$ for $j \ne k$.
• In any principal component the sum of the squares of the coefficients is 1: $\sum_{i=1}^{p} v_{ij}^2 = \mathbf{v}_j'\mathbf{v}_j = 1$.
where $\mathbf{v}_1, \dots, \mathbf{v}_p$ are, respectively, the $p$ normed eigenvectors associated with the $p$ largest eigenvalues of the covariance matrix $\Sigma$ ($\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$), and $\mathrm{Var}(Y_j) = \lambda_j$.
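A minimal NumPy sketch of this construction, using a made-up 2×2 covariance matrix: the normed eigenvectors give the coefficients, the eigenvalues give the variances of the p.c.'s, and the p.c.'s are uncorrelated:

```python
import numpy as np

sigma = np.array([[5.0, 2.0],
                  [2.0, 3.0]])            # covariance matrix (symmetric, PSD)

lam, V = np.linalg.eigh(sigma)            # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]             # reorder: largest eigenvalue first
lam, V = lam[order], V[:, order]

print(lam)                                         # Var(Y_1) >= Var(Y_2)
print(np.allclose((V**2).sum(axis=0), 1))          # True: squared coefficients sum to 1
print(np.allclose(V.T @ sigma @ V, np.diag(lam)))  # True: the Y_j are uncorrelated
```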
Geometric interpretation
• It can be proved that the data points lie on a hyper-ellipsoid (which in the $n \times 2$ case is an ellipse) with center at the origin and axes whose directions are given respectively by the eigenvectors of the covariance matrix (which is symmetric and positive semi-definite) and whose lengths are proportional to the corresponding eigenvalues.
In the two-dimensional case ($p = 2$):
[Figure: scatter of the points $(x_{k1}, x_{k2})$ in the $(X_1, X_2)$ plane, with the ellipse centered at the origin; the 1st and 2nd principal axes have the directions of the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$, with half-axis lengths $c \times l_1$ and $c \times l_2$.]
• Then: $\mathrm{Var}(Y_j) = \mathbf{v}_j'\Sigma\mathbf{v}_j = \lambda_j\mathbf{v}_j'\mathbf{v}_j = \lambda_j$, since $\mathbf{v}_j$ is the normed eigenvector of $\Sigma$ for the eigenvalue $\lambda_j$, and therefore $\Sigma\mathbf{v}_j = \lambda_j\mathbf{v}_j$ and $\mathbf{v}_j'\mathbf{v}_j = 1$.
2.3: Estimation of the principal components
• In practice the population covariance matrix $\Sigma$ is generally not known, so we will have to use its sample estimate $S$.
• The eigenvalues of $S$ are all non-negative and equal to the estimates of the variances of the corresponding principal components.
However, in practice, what is usually done is to consider that our sample actually constitutes a population and, therefore, it is considered that:
• the principal components obtained from $S$ are effectively "the" principal components, and not the estimates of the principal components that would be obtained from $\Sigma$.
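A minimal sketch of this estimation step, assuming a hypothetical data matrix X with individuals in rows and variables in columns:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # made-up sample: n = 100, p = 3

S = np.cov(X, rowvar=False)               # sample covariance matrix S
lam_hat, V_hat = np.linalg.eigh(S)
order = np.argsort(lam_hat)[::-1]         # largest eigenvalue first
lam_hat, V_hat = lam_hat[order], V_hat[:, order]

print(lam_hat)    # estimated variances of the p.c.'s (all non-negative)
```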
• In many situations, the variables under study are not all measured in the same unit or on the same scale, are of a different nature, or have very different variances. In such cases the variables are standardized, $Z_i = (X_i - \bar{X}_i)/s_i$. This procedure leads to variables with null mean value and unit variance:
• that is, the variables under study all end up with the same variance;
• the influence of variables with small variance tends to be inflated, while that of variables with high variance tends to be reduced.
Correlation matrix
$$R = [r_{ij}], \qquad r_{ij} = \frac{s_{ij}}{s_i s_j}, \qquad r_{ii} = 1.$$
Note that $R = D^{-1/2} S D^{-1/2}$, where $D = \mathrm{diag}(s_1^2, \dots, s_p^2)$. In the bivariate case the eigenvalues of $R$ are $\lambda_1 = 1 + r$ and $\lambda_2 = 1 - r$, with eigenvectors $\frac{1}{\sqrt{2}}(1, 1)'$ and $\frac{1}{\sqrt{2}}(1, -1)'$, which do not depend on $r$ (which only happens in the bivariate case).
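A minimal NumPy sketch verifying the relation between S and R on made-up data with very different scales:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales

S = np.cov(X, rowvar=False)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv_sqrt @ S @ D_inv_sqrt                    # R = D^{-1/2} S D^{-1/2}

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
print(np.allclose(np.cov(Z, rowvar=False), R))     # True: cov of Z is R
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
```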
Example 2.3.1
If we use the correlation matrix instead of the covariance matrix (that is, if the data are standardized), we will obtain a different set of principal components, now computed from the eigenvalues and eigenvectors of the correlation matrix.
2.4: Dimensionality reduction
• Given that the p.c.'s can be ordered in descending order of their variance, and that the larger this variance is, the more representative of the original data the corresponding principal component will be, we must retain the first $r$ p.c.'s ($r < p$), i.e. those with the largest variances.
$$\sum_{i=1}^{p} \mathrm{Var}(X_i) = \mathrm{tr}(\Sigma) = \sum_{j=1}^{p} \lambda_j = \sum_{j=1}^{p} \mathrm{Var}(Y_j).$$
That is, the sum of the variances of the original variables is equal to the sum of the variances of the p.c.'s (if we consider all the p.c.'s we explain all the variability).
The proportion of the total variance that is explained by the $j$th principal component $Y_j$ is given by
$$\frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k},$$
a measure of the importance of this p.c.
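A minimal sketch of this computation, with hypothetical eigenvalues:

```python
import numpy as np

lam = np.array([4.2, 1.3, 0.4, 0.1])   # made-up eigenvalues, in descending order
prop = lam / lam.sum()                 # proportion explained by each p.c.
print(prop)                            # here the first p.c. explains 70%
print(prop.cumsum())                   # cumulative proportion explained
```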
If the data are standardized, that is, if we are working with the correlation matrix:
• the total variance will be equal to the number of variables ($p$), since the diagonal of $P$ is formed entirely by 1's: $\sum_{j=1}^{p} \lambda_j = \mathrm{tr}(P) = p$;
• the proportion of the total variance that is explained by the $j$th principal component $Y_j$ is then given by $\lambda_j / p$, a measure of the importance of this p.c.
Some of the rules that can be used are (see the sketch after this list):
• retain as many p.c.'s as necessary so that the percentage of variance explained by them is greater than a given value fixed a priori; that is, retain the first $r$ p.c.'s such that
$$\frac{\sum_{j=1}^{r} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \times 100\% > \text{the fixed value};$$
• retain the p.c.'s whose variance is greater than the mean of the variances, $\bar{\lambda} = \frac{1}{p}\sum_{j=1}^{p}\lambda_j$ (with the correlation matrix, this means retaining the p.c.'s with eigenvalue greater than 1);
• use the scree plot (eigenvalue against component number), where the contributions of the various p.c.'s are distinguished; the $r$ p.c.'s that contribute the most, before the curve flattens, should be retained.
[Figure: scree plot — Eigenvalue vs. Component Number.]
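A minimal matplotlib sketch of the scree plot just described, with made-up eigenvalues; the dashed line marks the mean criterion for comparison:

```python
import numpy as np
import matplotlib.pyplot as plt

lam = np.array([2.1, 1.4, 0.9, 0.4, 0.2])       # hypothetical eigenvalues
components = np.arange(1, len(lam) + 1)

plt.plot(components, lam, "o-")
plt.axhline(lam.mean(), linestyle="--", label="mean of the eigenvalues")
plt.xlabel("Component Number")
plt.ylabel("Eigenvalue")
plt.legend()
plt.show()   # retain the p.c.'s before the curve flattens (the "elbow")
```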
Among these criteria, the mean and the scree plot are the most commonly used.
• Practice has shown that both criteria lead to credible solutions if at least one of the following conditions is met: number of variables less than 30, or number of cases (individuals) greater than 250.
2.5: Interpretation of the Principal Components
• The meaning of a p.c. will be interpreted from the variables that most correlate with it.
• The coefficients of the linear combinations ($v_{ij}$) and the correlations between the initial variables and the p.c.'s (the loadings) will be used.
Loading of variable $X_i$ for p.c. $Y_j$: $r_{X_i,Y_j}$.
• The covariance between the $i$th variable ($X_i$) and the $j$th p.c. ($Y_j$) is $\mathrm{Cov}(X_i, Y_j) = \lambda_j v_{ij}$, since $\Sigma\mathbf{v}_j = \lambda_j\mathbf{v}_j$.
• The correlation coefficient between the $i$th variable ($X_i$) and the $j$th p.c. ($Y_j$) is
$$r_{X_i,Y_j} = \frac{\mathrm{Cov}(X_i, Y_j)}{\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(Y_j)}} = \frac{\lambda_j v_{ij}}{\sigma_i \sqrt{\lambda_j}} = \frac{v_{ij}\sqrt{\lambda_j}}{\sigma_i}, \qquad \text{since } \mathrm{Var}(Y_j) = \lambda_j.$$
• If the data are standardized, or if the correlation matrix has been used, we have $\sigma_i = 1$, and therefore $r_{X_i,Y_j} = v_{ij}\sqrt{\lambda_j}$.
• Thus, if the absolute value of a coefficient of a p.c. for a given variable is high, it can be concluded that the correlation between this p.c. and the variable is high.
The variables that must be used in the interpretation of the $j$th p.c. will be those that present coefficients verifying one of the following rules:
1. the square of the correlation is equal to or greater than the mean of the squares of the $p$ correlations: $r_{X_i,Y_j}^2 \ge \frac{1}{p}\sum_{k=1}^{p} r_{X_k,Y_j}^2$; or
2. the absolute value of the correlation is equal to or greater than 0.5: $|r_{X_i,Y_j}| \ge 0.5$.
Correlation matrix: since $r_{X_i,Y_j} = v_{ij}\sqrt{\lambda_j}$ and $\sum_{k=1}^{p} v_{kj}^2 = 1$, the rules become $v_{ij}^2 \ge \frac{1}{p}$ or $|v_{ij}|\sqrt{\lambda_j} \ge 0.5$.
Covariance matrix: with $r_{X_i,Y_j} = v_{ij}\sqrt{\lambda_j}/s_i$, the same rules apply:
1. $r_{X_i,Y_j}^2 \ge \frac{1}{p}\sum_{k=1}^{p} r_{X_k,Y_j}^2$; or
2. $|r_{X_i,Y_j}| \ge 0.5$.
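A minimal sketch applying the two rules, assuming the correlation matrix was used (so $r_{X_i,Y_j} = v_{ij}\sqrt{\lambda_j}$); the data are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
R = np.corrcoef(X, rowvar=False)

lam, V = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

r = V * np.sqrt(lam)                 # r[i, j] = correlation of X_i with Y_j
rule1 = r**2 >= (r**2).mean(axis=0)  # square >= mean of the p squared correlations
rule2 = np.abs(r) >= 0.5             # absolute correlation >= 0.5
print(rule1 | rule2)  # variables (rows) to use when interpreting each p.c. (columns)
```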
The relative importance of a variable $X_i$ for the explanation of a p.c. $Y_j$ is given by $v_{ij}^2$; since the vector $\mathbf{v}_j$ is normed, $\sum_{i=1}^{p} v_{ij}^2 = 1$, so these values can be read as proportions.
(However, this value tells us nothing about the importance of the principal component itself.)
The meaning to give to a p.c. (useful for interpretation) will be closely associated with the variables corresponding to high $v_{ij}^2$.
The variance of a variable $X_i$ that is explained by a p.c. $Y_j$ is given by $r_{X_i,Y_j}^2\,\mathrm{Var}(X_i) = \lambda_j v_{ij}^2$.
• The square of the correlation coefficient between $X_i$ and $Y_j$, $r_{X_i,Y_j}^2$, can be interpreted as representing the proportion of the variance of the variable $X_i$ that is explained by the p.c. $Y_j$, because $\mathrm{Var}(X_i) = \sum_{j=1}^{p} \lambda_j v_{ij}^2$, with $\sum_{j=1}^{p} r_{X_i,Y_j}^2 = 1$.
In the previous case we have, for the 1st p.c. (using the data related to the sample and, therefore, the resulting sample measures: $r$ = estimate of $\rho$):
• The 1st p.c. is highly correlated with both variables (using criterion 1, $X_1$ is more important to explain $Y_1$ than $X_2$). Also, when $X_1$ or $X_2$ increases, so does $Y_1$.
For the 2nd p.c. we have:
• The 2nd p.c. is poorly correlated with both variables ($X_2$ is more important to explain $Y_2$ than $X_1$). When $X_1$ increases $Y_2$ decreases, and when $X_2$ increases $Y_2$ also increases.
In summary:
• the variables are both important for the explanation of the 1st principal component, with $X_1$ being more important than $X_2$;
• neither is very important for the explanation of the 2nd principal component, $X_2$ being more important than $X_1$.
Relative importance of the variable $X_i$ for the explanation of p.c. $Y_j$:
        Y1      Y2
X1    0.707   0.291
X2    0.291   0.707
2.6: Scores
We can now think of applying the same transformation to the data, that is, to the observation vectors (the columns of the data matrix $X$) of the variables $X_1, \dots, X_p$, respectively; the resulting values are the scores of each individual on each principal component.
In matrix form, we would have $Y = XV$, where $V = [\mathbf{v}_1 \mid \cdots \mid \mathbf{v}_p]$ is the matrix whose columns are the normed eigenvectors (the data being previously centered, or standardized if the correlation matrix was used).
In general, the first two p.c.'s are preferably chosen, as they are the ones
that most contribute to the explanation of data variability.
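A minimal sketch of the matrix form above on made-up data; the check at the end confirms that the scores are uncorrelated, with variances equal to the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))

Xc = X - X.mean(axis=0)                   # centered data matrix
S = np.cov(Xc, rowvar=False)
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

Y = Xc @ V                 # scores: Y[k, j] = score of individual k on p.c. j
print(np.allclose(np.cov(Y, rowvar=False), np.diag(lam)))  # True
```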
2.7: Graphical Representations
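As one possible such representation, a minimal matplotlib sketch plotting the scores of made-up individuals in the plane of the first two p.c.'s:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)
lam, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
V = V[:, np.argsort(lam)[::-1]]           # eigenvectors, largest eigenvalue first
Y = Xc @ V                                # scores

plt.scatter(Y[:, 0], Y[:, 1])
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()
```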
2.8: Use of Principal Components
• if the principal components are not normally distributed, then the original variables will not be (multivariate) normally distributed either;
• if the first two p.c.’s explain a good part of the total variability, we can
represent the scores of the individuals in the plane defined by these two
components and try to visualize clusters of the obtained points.
• if there is a need to use more than two p.c.'s, the scores of the individuals for the most important p.c.'s are used instead of the initial values of the variables (which were greater in number), and the groups are built from them using one of the classification (cluster) analysis methods, as in the sketch below.
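A minimal sketch of this last use, with scikit-learn's KMeans as one possible cluster analysis method applied to the scores; the data, the choice of 2 retained p.c.'s and the choice of 3 clusters are all made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 5))
Xc = X - X.mean(axis=0)
lam, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
V = V[:, np.argsort(lam)[::-1]]
Y = Xc @ V[:, :2]                         # scores on the 2 most important p.c.'s

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Y)
print(labels[:10])                        # cluster assigned to each individual
```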