2023/08/14

Lecture 6

Principal components analysis (PCA)

Associate Professor Gregory Breetzke

Room 1-19, Geography Building


what is PCA?
• Principal Component Analysis (PCA) is a method used to reduce the dimensionality of large data sets. How?

• E.g., summarise 20 variables with 4 new variables (also called factors or components)

• Reducing the number of variables in a data set comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity

• In short, the idea of PCA is simple: reduce the number of variables in a data set while preserving as much information as possible

concepts in PCA
• Conceptually, with two variables (datasets), the data are transformed as follows:

– The data is plotted in a scatterplot

– An ellipse is calculated to bound the points in the scatterplot


concepts in PCA
• The major axis of the ellipse is determined

• The major axis becomes the new x-axis, the first principal component (PC1). PC1 depicts the greatest variation because it is the longest transect that can be drawn through the ellipse

• In other words, the direction of greatest variation is the line that captures the most information in the data

• The direction of PC1 is given by an eigenvector, and the corresponding eigenvalue is the variance of the data along that direction (its magnitude)

• The angle between the original x-axis and PC1 is the angle of rotation used in the transformation
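The two-variable geometry above can be reproduced numerically. The sketch below is a minimal illustration, assuming NumPy and a small made-up data set (the function name `pc1_geometry` and the sample values are hypothetical): it computes the covariance matrix, takes its eigenvectors and eigenvalues, and reports the rotation angle between the x-axis and PC1. The second eigenvector (PC2) comes out perpendicular to the first.

```python
import numpy as np


def pc1_geometry(x, y):
    """Eigenvalues, eigenvectors and rotation angle for two variables.

    Hypothetical helper for illustration. Eigenvalues are returned largest
    first; column 0 of `vecs` is the PC1 direction, column 1 is PC2.
    """
    data = np.column_stack([x, y])
    cov = np.cov(data, rowvar=False)       # 2 x 2 covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]         # reorder so PC1 comes first
    vals, vecs = vals[order], vecs[:, order]
    pc1 = vecs[:, 0]
    # Angle between the original x-axis and PC1. An eigenvector's sign is
    # arbitrary, so fold the angle into the range (-90, 90] degrees.
    angle = np.degrees(np.arctan2(pc1[1], pc1[0]))
    if angle <= -90:
        angle += 180
    elif angle > 90:
        angle -= 180
    return vals, vecs, angle


# Made-up points scattered around the line y = x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
vals, vecs, angle = pc1_geometry(x, y)
print(vals)           # variance along PC1, then along PC2
print(round(angle))   # rotation angle in degrees
```

Because these points lie close to the line y = x, the angle comes out close to 45 degrees and almost all of the variance sits on PC1.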

concepts in PCA
• A line perpendicular (orthogonal) to PC1 is calculated.

• This line is the second principal component (PC2) and replaces the original y-axis.

• The new axis describes the greatest variance not described by PC1.

• What happens if there are more than two datasets/variables?


Steps in a PCA
STEP 1: STANDARDISATION

• The aim of this step is to standardise the range of the continuous initial variables so that each one of them contributes equally to the analysis.

• Why?

• Calculate the z-score of each value: z = (value - mean) / standard deviation. Why?

• Once standardisation is done, all the variables are on the same scale (mean 0, standard deviation 1)
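Step 1 amounts to one line of array arithmetic. A minimal sketch, assuming NumPy and that observations are rows and variables are columns (the function name `standardise` and the example values are illustrative only):

```python
import numpy as np


def standardise(X):
    """Convert each column (variable) of X to z-scores.

    z = (value - column mean) / column standard deviation. The sample
    standard deviation (ddof=1) is used here; ddof=0 is also common and
    only changes the scale slightly.
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Two variables on very different scales (e.g. a percentage and an income)
X = [[10, 1000],
     [20, 3000],
     [30, 2000]]
Z = standardise(X)
print(Z)
print(Z.mean(axis=0), Z.std(axis=0, ddof=1))  # each column: mean 0, std 1
```

Without this step, the second column (values in the thousands) would dominate the total variance and therefore PC1.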

Steps in a PCA
STEP 2: CORRELATION MATRIX COMPUTATION

• The aim of this step is to see whether there are any relationships between the variables.

• Variables are sometimes so highly correlated that they contain redundant information. To identify these correlations, we compute a correlation matrix.

• What is correlation?
– Correlation measures both the strength and the direction of the linear relationship between two variables.
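Once the variables are standardised, the correlation matrix is simply the covariance matrix of the z-scores. The sketch below, assuming NumPy (the helper name and data are made up), computes it that way and checks it against NumPy's built-in `np.corrcoef`:

```python
import numpy as np


def correlation_via_zscores(X):
    """Correlation matrix computed as the covariance of the z-scores."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return (Z.T @ Z) / (len(Z) - 1)


# Columns 1 and 2 are perfectly correlated (column 2 = 2 x column 1),
# so they carry redundant information; column 3 is something else.
X = [[1, 2, 9],
     [2, 4, 7],
     [3, 6, 8],
     [4, 8, 1]]
R = correlation_via_zscores(X)
print(np.round(R, 2))
print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True: same matrix
```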


Steps in a PCA
STEP 3: COMPUTE THE PRINCIPAL COMPONENTS

• Principal components are new variables that are constructed as linear combinations (mixtures) of the initial variables.

• These combinations are constructed so that the new variables (the principal components) are uncorrelated and most of the information in the initial variables is compressed into the first components.

Steps in a PCA
STEP 3: COMPUTE THE PRINCIPAL COMPONENTS

• Principal components are less interpretable than the original variables and have no direct real-world meaning, since they are constructed as linear combinations of the initial variables.

• Principal components are constructed so that the first principal component accounts for the largest possible variance in the data set


Steps in a PCA
STEP 3: COMPUTE THE PRINCIPAL COMPONENTS

• The second principal component is calculated in the same way, with the conditions that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance

• This continues until p principal components have been calculated, where p is the original number of variables.
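Putting the three steps together, a minimal PCA can be written as an eigen-decomposition of the correlation matrix. This is a sketch assuming NumPy; the function name `pca` and the simulated data are hypothetical, and real analyses would normally rely on a tested library routine. It shows that p variables give p components, that the eigenvalues come out largest first, and that the component scores are uncorrelated:

```python
import numpy as np


def pca(X):
    """PCA of standardised data via the correlation matrix.

    Returns eigenvalues (largest first), eigenvectors (one column per
    principal component) and the component scores for every observation.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Step 1: standardise
    R = np.corrcoef(Z, rowvar=False)                   # Step 2: correlation matrix
    vals, vecs = np.linalg.eigh(R)                     # Step 3: eigen-decomposition
    order = np.argsort(vals)[::-1]                     # sort components by variance
    vals, vecs = vals[order], vecs[:, order]
    scores = Z @ vecs  # each score is a linear combination of the variables
    return vals, vecs, scores


# Simulated data: 4 variables, the first two almost identical (redundant)
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])
vals, vecs, scores = pca(X)
print(np.round(vals, 3))                               # p = 4 eigenvalues, largest first
print(np.round(np.corrcoef(scores, rowvar=False), 3))  # close to the identity matrix
```

Because the first two variables are nearly redundant, one of the four eigenvalues is close to zero: that component adds almost no information.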

PCA in raster GIS

• Principal component analysis detects redundancy between data sets.

• What about aspect, slope, and hillshade data? Is there redundancy in these three data sets? If so, how much?


PCA in raster GIS

Step 1. Run the “Composite Bands” tool in ArcPro

• The Composite Bands tool combines the aspect, hillshade, and slope rasters into a single 3-band raster. Use the following rasters as inputs:

– ASPECT: Band 1
– HILLSHADE: Band 2
– SLOPE: Band 3

• Name the output raster Composite

PCA in raster GIS

Step 2. Execute the “Principal Components” tool

• Using the Spatial Analyst extension in ArcPro, run the “Principal Components” tool with the following parameters:

– INPUT RASTER: Composite
– OUTPUT RASTER: PCA
– NUMBER OF PRINCIPAL COMPONENTS: 3
– OUTPUT DATA FILE: PrincipalComponents.txt

• The result will be a 3-band PCA raster and a data file showing the amount of redundancy.
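To see what happens to the pixels, the two ArcPro steps can be mimicked outside ArcGIS. The sketch below is not Esri's implementation: it assumes NumPy, a multiband stack held in memory as a (bands, rows, cols) array, no NoData cells, and PCA on the covariance matrix of the unstandardised bands. The function name `raster_pca` and the synthetic bands are hypothetical. Each pixel is treated as an observation and each band as a variable:

```python
import numpy as np


def raster_pca(stack, n_components=None):
    """PCA of a multiband raster held as a (bands, rows, cols) array.

    Pixels are observations and bands are variables. Returns the component
    raster, shaped (n_components, rows, cols), and the percent of variance
    captured by each eigenvalue (largest first).
    """
    stack = np.asarray(stack, dtype=float)
    bands, rows, cols = stack.shape
    if n_components is None:
        n_components = bands              # default: as many components as bands
    if not 0 < n_components <= bands:
        raise ValueError("n_components must be between 1 and the number of bands")
    pixels = stack.reshape(bands, -1).T   # (rows * cols, bands)
    centred = pixels - pixels.mean(axis=0)
    cov = np.cov(centred, rowvar=False)   # band-by-band covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    scores = centred @ vecs[:, :n_components]
    components = scores.T.reshape(n_components, rows, cols)
    percent = 100 * vals / vals.sum()
    return components, percent


# Synthetic 3-band stack: band 2 is almost a copy of band 1 (redundant)
rng = np.random.default_rng(42)
band1 = rng.random((50, 60))
stack = np.stack([
    band1,
    2 * band1 + 0.01 * rng.random((50, 60)),
    rng.random((50, 60)),
])
components, percent = raster_pca(stack, n_components=3)
print(components.shape)      # one band per component
print(np.round(percent, 1))  # percent of eigenvalues, largest first
```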


PCA in raster GIS

• The “percent of eigenvalues” (the magnitude of variance) shows how much of the total variance each principal component accounts for.

• This table shows that the first component accounts for 67.1% of the total variance (the ‘information’ in the 3 rasters collectively)

• Together, the first two components account for 98.1% of the ‘information’. The third component adds little extra information (1.9%), which indicates that the three input rasters are largely redundant and PC3 can be dropped with little loss.
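The “percent of eigenvalues” is each eigenvalue divided by the sum of all eigenvalues, times 100; the cumulative percent is a running total. A minimal plain-Python sketch (the function name is hypothetical); the eigenvalues below are invented so that they reproduce the 67.1 / 31.0 / 1.9 percent split described above, and are not taken from the lecture's output file:

```python
from itertools import accumulate


def percent_of_eigenvalues(eigenvalues):
    """Percent of total variance per component, plus the cumulative percent."""
    total = sum(eigenvalues)
    percent = [100 * ev / total for ev in eigenvalues]
    cumulative = list(accumulate(percent))
    return percent, cumulative


# Invented eigenvalues for three components (largest first)
percent, cumulative = percent_of_eigenvalues([2.013, 0.930, 0.057])
print([round(p, 1) for p in percent])     # [67.1, 31.0, 1.9]
print([round(c, 1) for c in cumulative])  # [67.1, 98.1, 100.0]
```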

PCA in remote sensing

• Running a principal component analysis on three bands was useful here because it showed that the third component did not add much information.

• What about a 10-band multispectral image? Or even 100 or 200 bands (hyperspectral imagery)?

• This is where PCA is really useful: multispectral and hyperspectral analysis.

• For example, if most of the variance (eigenvalues) is found in principal components one, two, and three, it is only necessary to use these three principal components. For land cover classification, it is much easier to work with three bands than with all 10 bands.

• In summary, PCA identifies duplicate data over multiple channels, reduces redundancy, and speeds up processing. This is why PCA is so widely used in image processing.


PCA in raster (ArcPro)

• The input raster bands.
• They can be integer or floating point
type.

PCA in raster (ArcPro)

• The output multiband raster dataset.

• If all of the input bands are integer type,
the output raster bands will be integer. If
any of the input bands are floating point,
the output will be floating point.


PCA in raster (ArcPro)

• Number of principal components.

• The number must be greater than zero
and less than or equal to the total
number of input raster bands.
• The default is the total number of
rasters in the input.

PCA in raster (ArcPro)

• Output ASCII data file storing the principal component parameters.
• The output data file records the correlation and covariance matrices, the eigenvalues and eigenvectors, the percent variance each eigenvalue captures, and the cumulative variance described by the eigenvalues.
• The extension for the output file can be .txt or .asc.


PCA in raster (ArcPro)

• The result of the tool is a multiband raster with the same number of bands as the specified number of components (one band per axis, or component, in the new multivariate space).

• The first principal component will have the greatest variance, the second will show the second most variance not described by the first, and so forth.

• Typically, the first three or four bands of the resulting multiband raster describe more than 95 percent of the variance. The remaining bands can be dropped.

• Since the new multiband raster contains fewer bands and retains more than 95 percent of the variance of the original, computations will be faster while accuracy is largely maintained.
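Deciding how many bands to keep can be automated by walking down the cumulative percentages until a threshold is reached. A plain-Python sketch, assuming the eigenvalues are already sorted largest first; the function name and the 10-band eigenvalues are hypothetical, and the 95 percent default simply mirrors the figure quoted above:

```python
from itertools import accumulate


def components_needed(eigenvalues, threshold=95.0):
    """Smallest number of leading components whose cumulative percent of
    variance reaches `threshold`. Eigenvalues must be sorted largest first."""
    total = sum(eigenvalues)
    for k, running in enumerate(accumulate(eigenvalues), start=1):
        if 100 * running / total >= threshold:
            return k
    return len(eigenvalues)  # guard against floating-point shortfall at 100%


# Hypothetical eigenvalues for a 10-band image
eigenvalues = [5.0, 2.5, 1.8, 0.3, 0.15, 0.1, 0.07, 0.05, 0.02, 0.01]
print(components_needed(eigenvalues))        # 4 (96% of the variance)
print(components_needed(eigenvalues, 90.0))  # 3 (93%)
```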


PCA in vector (GeoDa)

• REMEMBER THESE STEPS
1. Come up with a list of possible x (independent) variables that may be helpful in
estimating y (dependent variable)
2. Collect data on the y variable and your x variables from step 1
3. Check the relationships between each x (independent) variable and y (using
scatterplots and correlations), and use the results to eliminate those variables
that aren’t strongly related to y
4. Look at the possible relationships between the x (independent) variables to
make sure you aren’t being redundant (avoid multicollinearity)
5. Use those x variables (from step 4) in a multiple OLS regression analysis to
find the best-fitting model for your data
6. Use the best-fitting model (from step 5) to predict y for given x-values by plugging those x-values into the model

• REMEMBER THESE ASSUMPTIONS

1. Linear relationship between dependent and independent variable(s)
2. Outliers
3. Non-stationarity
4. Multicollinearity
5. Spatially autocorrelated residuals
6. Normal distribution bias

PCA in vector (GeoDa)

• Sometimes variables are so highly correlated that one largely duplicates the information found in another.

• Principal component analysis identifies duplicate data over several datasets. PCA then aggregates the essential information into groups called “principal components”.


PCA in vector (GeoDa)

• Checking a regression assumption: multicollinearity

Table: Correlations for the independent variables

      x1    x2    x3    x4    x5    x6    x7    x8    x9
x1     1
x2   .81     1
x3  -.15  -.12     1
x4   .80   .25   .03     1
x5   .07  -.00  -.07   .89     1
x6   .53   .60   .02   .76   .51     1
x7  -.35  -.65   .01   .02   .09  -.32     1
x8   .67   .58  -.16   .78   .25   .79  -.15     1
x9   .11  -.13  -.15   .39   .72   .24   .41   .43     1

Key: x1 = % Unemployed; x2 = NZDep; x3 = % Males; x4 = % Aged 15-29; x5 = % Resided for less than five years; x6 = % Renting; x7 = Index of Concentration of the Extremes (ICE); x8 = Diversity Index (DI); x9 = % Foreign born

• Rule of thumb: an absolute correlation of about 0.70 or higher between two independent variables signals multicollinearity
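Applying the 0.70 rule of thumb by eye is error-prone with nine variables. The plain-Python sketch below scans the lower triangle of the correlation table above and lists every pair at or above the threshold in absolute value; the values are copied from the table, while the data layout and the function name are illustrative choices:

```python
# Lower triangle of the correlation table: each row lists r with x1, x2, ...
LOWER_TRIANGLE = {
    "x2": [.81],
    "x3": [-.15, -.12],
    "x4": [.80, .25, .03],
    "x5": [.07, -.00, -.07, .89],
    "x6": [.53, .60, .02, .76, .51],
    "x7": [-.35, -.65, .01, .02, .09, -.32],
    "x8": [.67, .58, -.16, .78, .25, .79, -.15],
    "x9": [.11, -.13, -.15, .39, .72, .24, .41, .43],
}


def flag_correlated_pairs(lower, threshold=0.70):
    """Return (variable_a, variable_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for row_var, values in lower.items():
        for j, r in enumerate(values, start=1):
            if abs(r) >= threshold:
                flagged.append((f"x{j}", row_var, r))
    return flagged


for a, b, r in flag_correlated_pairs(LOWER_TRIANGLE):
    print(f"{a} and {b}: r = {r:.2f}")
```

With these values the sketch flags seven pairs, and x4 (% Aged 15-29) appears in four of them: the kind of overlap that PCA can summarise in a single component.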


PCA in vector (GeoDa)

Factor loadings


PCA in vector (GeoDa)

Factor labelling

Which variables are ‘loaded’ onto PC1?

Which variables are ‘loaded’ onto PC2? And so on.

Write a descriptive label for each of PC1, PC2, PC3, PC4, PC5 and PC6

PC1 = Unemployed mover; PC2 = Female foreigner
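Labelling comes down to reading, for each variable, which component it loads on most strongly. The sketch below assumes NumPy and one common convention, loading = eigenvector × √eigenvalue (for standardised data this equals the correlation between a variable and a component); GeoDa's own output layout may differ. The function names and the loadings matrix are made up for illustration and are not the lecture's results:

```python
import numpy as np


def loadings_from_eigen(eigenvalues, eigenvectors):
    """Loadings under one common convention: each eigenvector column is
    scaled by the square root of its eigenvalue."""
    vecs = np.asarray(eigenvectors, dtype=float)
    return vecs * np.sqrt(np.asarray(eigenvalues, dtype=float))


def group_by_component(names, loadings):
    """Assign each variable to the component with its largest |loading|."""
    best = np.argmax(np.abs(np.asarray(loadings)), axis=1)
    groups = {}
    for name, pc in zip(names, best):
        groups.setdefault(f"PC{pc + 1}", []).append(name)
    return groups


# Made-up loadings: rows = variables, columns = PC1, PC2
names = ["% Unemployed", "% Resided < 5 years", "% Males", "% Foreign born"]
loadings = [[0.85, 0.10],
            [0.78, 0.30],
            [-0.20, -0.80],
            [0.25, 0.75]]
print(group_by_component(names, loadings))
```

The sign matters when writing the label: in this made-up case % Males loads negatively on PC2, so high PC2 scores mean fewer males, which is how a label such as “Female foreigner” can arise.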

PCA in vector (GeoDa)

Principal components are less interpretable than the original variables and have no direct real-world meaning, since they are constructed as linear combinations of the initial variables.

Used in regression when several independent variables are correlated

Used to create indices when an analyst wants a composite indicator of a concept
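Using components in regression (often called principal component regression) replaces correlated predictors with uncorrelated PC scores. A minimal sketch, assuming NumPy and simulated data; the function name `pc_regression` and the choice of k = 2 components are illustrative:

```python
import numpy as np


def pc_regression(X, y, k):
    """OLS of y on an intercept plus the first k PC scores of standardised X.

    Returns the fitted coefficients (intercept first) and the scores used.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    vecs = vecs[:, np.argsort(vals)[::-1]]  # PC1 first
    scores = Z @ vecs[:, :k]                 # uncorrelated predictors
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, scores


# Simulated predictors: x1 and x2 are nearly collinear, x3 is independent
rng = np.random.default_rng(1)
n = 300
base = rng.normal(size=n)
X = np.column_stack([base, base + 0.1 * rng.normal(size=n), rng.normal(size=n)])
y = 2 * base + X[:, 2] + 0.1 * rng.normal(size=n)

coef, scores = pc_regression(X, y, k=2)
fitted = np.column_stack([np.ones(n), scores]) @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.round(coef, 3), round(r_squared, 3))
```

For the index use, the first column of `scores` (PC1) could serve as the composite indicator, bearing in mind that the sign of a component is arbitrary and may need flipping so that high values mean "more" of the concept.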


uses of PCA?
• Principal Component Analysis (PCA) is applied in:

• Biomedical industry
– Drug discovery programmes

• Healthcare industry

• Retail industry
– Customer profiling

• Image compression
