GIS320 Lecture6 Principal Components Analysis
Lecture 6
2023/08/14
what is PCA?
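• Principal Component Analysis (or PCA) is a method used to reduce the
dimensionality of large data sets. How? By transforming a large set of
variables into a smaller set that still contains most of the information
in the original.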
concepts in PCA
• Conceptually, using two datasets, the transformation of the data is
accomplished as follows:
• An ellipse is fitted around the scatter of data points, and its major axis
is determined.
• The major axis becomes the new x-axis, the first principal component (PC1).
• PC1 depicts the greatest variation because it is the longest transect that can
be drawn through the ellipse.
• I.e., the direction of greatest variation is the line that captures the most
information in the data.
• A line orthogonal (perpendicular) to PC1 is calculated.
• This line is the second principal component (PC2) and becomes the new axis
replacing the original y-axis.
• The new axis describes the greatest variance not described by PC1.
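The geometric picture above can be reproduced numerically. The following is a minimal sketch, assuming a small made-up set of 2-D points (not taken from the lecture) and using the closed-form angle of the major axis of a 2×2 covariance matrix. The function name `principal_axes_2d` is hypothetical.

```python
import math

def principal_axes_2d(points):
    """Rotate 2-D points onto their principal axes; returns (angle, scores)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centre the data so the new axes pass through the centre of the ellipse.
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Angle of the major axis (PC1) measured from the original x-axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # First coordinate: position along PC1; second: position along PC2.
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centred]
    return theta, scores

# Illustrative points only (assumed, roughly following y = x).
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
theta, scores = principal_axes_2d(points)

m = len(scores) - 1
var_pc1 = sum(p * p for p, _ in scores) / m
var_pc2 = sum(q * q for _, q in scores) / m
cov_pc1_pc2 = sum(p * q for p, q in scores) / m
```

In this sketch `var_pc1` exceeds `var_pc2`, and the covariance between the two new axes is zero up to rounding, which is what "uncorrelated components" means geometrically.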
steps in a PCA?
STEP 1: STANDARDISATION
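• The aim of this step is to standardise the range of the continuous initial
variables so that each one of them contributes equally to the analysis.
• Why? PCA is sensitive to the variances of the initial variables: those with
larger ranges would otherwise dominate those with smaller ranges.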
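As an illustration of this step, here is a minimal sketch of z-score standardisation using only the Python standard library. The two variables and their values are invented for the example (a slope in degrees and a hillshade value on a 0 to 255 scale), and the function name `standardise` is hypothetical.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return z-scores: (value - mean) / standard deviation."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    return [(v - m) / sd for v in values]

# Assumed example values on very different scales.
slope = [2.0, 10.0, 25.0, 40.0, 8.0]             # degrees
hillshade = [120.0, 200.0, 35.0, 250.0, 180.0]   # 0-255

slope_z = standardise(slope)
hillshade_z = standardise(hillshade)
```

After standardisation both variables have mean 0 and standard deviation 1, so neither dominates simply because of its units.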
STEP 2: CORRELATION MATRIX COMPUTATION
• The aim of this step is to see if there is any relationship between the
variables.
• What is correlation?
– “Correlation” measures both the strength and direction of the linear
relationship between two variables, on a scale from -1 to +1.
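A correlation matrix can be sketched in a few lines. The example below assumes three short made-up variables and computes Pearson's correlation coefficient by hand; the names `pearson_r` and `correlation_matrix` are hypothetical helpers, not part of any tool mentioned in the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: strength and direction of a linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(variables):
    """Square matrix of pairwise correlations between all variables."""
    return [[pearson_r(a, b) for b in variables] for a in variables]

# Assumed example data.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]  # rises with x1: strong positive correlation
x3 = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as x1 rises: strong negative correlation

R = correlation_matrix([x1, x2, x3])
```

The diagonal of `R` is 1 (each variable with itself), and the matrix is symmetric, which is why correlation tables are often shown as a lower triangle only.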
STEP 3: COMPUTE THE PRINCIPAL COMPONENTS
• Principal components are new variables constructed as linear combinations
of the initial variables.
• These combinations are done in such a way that the new variables
(i.e., principal components) are uncorrelated and most of the
information within the initial variables is squeezed or compressed
into the first components.
• Principal components are less interpretable than the original variables and
have no direct real-world meaning, since each is constructed as a linear
combination of the initial variables.
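One common way to carry out this step is an eigen-decomposition of the correlation matrix. The sketch below assumes NumPy is available and uses synthetic data generated for the example; the function name `pca_from_correlation` is hypothetical, and this is not necessarily how any particular GIS tool implements it.

```python
import numpy as np

def pca_from_correlation(X):
    """PCA on an (n_samples, n_variables) array via the correlation matrix."""
    # Step 1: standardise each variable (column).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix of the variables.
    R = np.corrcoef(Z, rowvar=False)
    # Step 3: eigenvectors give the weights of each linear combination,
    # eigenvalues give the variance captured by each component.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs  # each column is one principal component
    return eigvals, eigvecs, scores

# Synthetic data (assumed): two strongly related variables plus one unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    2.0 * base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

eigvals, eigvecs, scores = pca_from_correlation(X)
score_corr = np.corrcoef(scores, rowvar=False)
```

Each column of `eigvecs` holds the weights applied to the standardised variables, so every component score is literally a weighted sum of the inputs. `score_corr` is the identity matrix up to rounding, confirming that the components are uncorrelated.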
• Example input rasters (one per band):
– ASPECT: Band 1
– HILLSHADE: Band 2
– SLOPE: Band 3
• This table shows that the first component accounts for 67.1% of the
variance (the ‘information’ in the 3 rasters collectively).
• Adding the second component brings the cumulative total to 98.1% of the
‘information’. The third component contributes little extra (1.9%) and is
largely redundant with principal components 1 and 2.
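The arithmetic behind these percentages can be sketched as follows. The 31.0% share for the second component is derived here from the quoted figures (98.1% minus 67.1%), and the helper `percent_of_variance` is a hypothetical name showing how such percentages are usually obtained from eigenvalues.

```python
from itertools import accumulate

def percent_of_variance(eigenvalues):
    """Share of total variance captured by each component, in percent."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

# Per-component percentages implied by the figures quoted above:
# PC1 = 67.1, PC2 = 98.1 - 67.1 = 31.0, PC3 = 100 - 98.1 = 1.9
percent = [67.1, 31.0, 1.9]
cumulative = [round(c, 1) for c in accumulate(percent)]

# Assumed eigenvalues, for illustration of the helper only.
example_shares = percent_of_variance([2.0, 1.0, 1.0])
```

The running total in `cumulative` reproduces the 67.1% and 98.1% figures, with all three components together accounting for 100%.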
• The first principal component will have the greatest variance, the second will
show the greatest share of the variance not described by the first, and so forth.
• Typically, the first three or four rasters of the multiband raster produced by
the principal components tool describe more than 95 percent of the variance. The
remaining individual raster bands can be dropped.
• Since the new multiband raster contains fewer bands, and more than 95 percent
of the variance of the original multiband raster is intact, computations will be
faster while accuracy is largely maintained.
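Deciding how many bands to keep can be expressed as a simple cumulative-threshold rule. This is a minimal sketch, assuming the per-component percentages are already known; the function name `components_needed` and the second list of percentages are hypothetical.

```python
def components_needed(percent_variance, threshold=95.0):
    """Smallest number of leading components whose cumulative
    percent of variance reaches the threshold."""
    running = 0.0
    for k, p in enumerate(percent_variance, start=1):
        running += p
        if running >= threshold:
            return k
    return len(percent_variance)

# Percentages implied by the three-raster example (67.1, 31.0, 1.9).
k_example = components_needed([67.1, 31.0, 1.9])

# Assumed percentages for a five-band raster, for illustration only.
k_five_band = components_needed([50.0, 30.0, 10.0, 6.0, 4.0])
```

For the three-raster example the first two components already pass 95 percent, so the third band could be dropped.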
Correlation matrix (lower triangle), rows x5 and x7 to x9:
Variable                                              x1    x2    x3    x4    x5    x6    x7    x8    x9
x5: % Resided for less than five years               .07  -.00  -.07   .89     1
x7: Index of Concentration of the Extremes (ICE)    -.35  -.65   .01   .02   .09  -.32     1
x8: Diversity Index (DI)                             .67   .58  -.16   .78   .25   .79  -.15     1
x9: % Foreign born                                   .11  -.13  -.15   .39   .72   .24   .41   .43     1
Factor loadings
Factor labelling
Write a descriptive label for PC1, PC2, PC3, PC4, PC5, PC6
Principal components are less interpretable and don’t have any real meaning
since they are constructed as linear combinations of the initial variables.
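A practical aid to labelling is to list, for each component, the variables with the largest absolute loadings. The sketch below is only an illustration: the loadings are invented (they are not the values from the lecture's factor loading table), the 0.5 cutoff is an assumed rule of thumb, and `dominant_variables` is a hypothetical helper.

```python
def dominant_variables(loadings, cutoff=0.5):
    """For each component, list (variable, loading) pairs whose absolute
    loading meets the cutoff, strongest first."""
    result = {}
    for component, by_variable in loadings.items():
        strong = [(name, value) for name, value in by_variable.items()
                  if abs(value) >= cutoff]
        strong.sort(key=lambda item: abs(item[1]), reverse=True)
        result[component] = strong
    return result

# Hypothetical loadings, invented for illustration only.
loadings = {
    "PC1": {"% Foreign born": 0.81, "Diversity Index (DI)": 0.74, "ICE": -0.12},
    "PC2": {"% Foreign born": 0.10, "Diversity Index (DI)": -0.20, "ICE": 0.88},
}

dominant = dominant_variables(loadings)
```

With these made-up numbers, PC1 would be labelled after the foreign-born and diversity variables and PC2 after ICE; the label itself remains a judgement call by the analyst.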
uses of PCA?
Principal Component Analysis (or PCA) is applied in:
• Biomedical industry
– Drug discovery programmes
• Healthcare industry
• Retail industry
– Customer profiling
• Image compression