116 Principal Components Analysis

Principal Components Analysis (PCA) is a data reduction tool used to identify patterns in high-dimensional data and reduce dimensionality while preserving information. It works by defining principal components through the covariance/correlations of the data, and the choice of the number of components is crucial for effective analysis. Implementation in R can be done using functions like 'prcomp' and 'princomp', and visualizations such as scree plots and biplots help in interpreting the results.

Uploaded by

f.zhydok

Principal Components Analysis

Maths and Statistics Help Centre

Usage
- Data reduction tool
- Vital first step in the multivariate analysis of continuous data
- Used to find patterns in high dimensional data
- Expresses data such that similarities and differences are highlighted
- Once the patterns in the data are found, PCA is used to reduce the dimensionality of the data without much
loss of information (choosing the right number of principal components is very important)
- If you are using PCA for modelling purposes (either subsequent gradient analyses or regression) then
normality is ideal. If it is for data reduction or exploratory purposes, then normality is not a strict
requirement

How does PCA work?

- Uses the covariance/correlations of the raw data to define principal components that combine many
correlated variables
- The principal components are uncorrelated
- Similar in concept to regression: in regression you use one line to describe a relationship between two
variables; here the principal component plays the role of that line. Given a point on the line, i.e. a value of the
principal component, you can recover approximate values of the original variables.
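
The two properties above can be checked with a minimal sketch, assuming the built-in state.x77 data used later in this handout and standardised variables. The names pca, k, approx and z are illustrative only.

```r
# Sketch: principal component scores are mutually uncorrelated.
pca <- prcomp(state.x77, scale. = TRUE)

# Correlation matrix of the scores: off-diagonal entries are ~0.
score_cor <- cor(pca$x)
print(round(score_cor, 3))

# Rebuild the standardised data from the first k components only.
k <- 3
approx <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])

# Compare with the standardised data; the error shrinks as k grows
# and vanishes (up to rounding) when all components are used.
z <- scale(state.x77)
print(max(abs(z - approx)))
```

Using all 8 components in place of k reproduces the standardised data exactly, which is why dropping the later components amounts to a controlled approximation.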

Choosing the right number of principal components

- This is one of the most important parts of PCA. Choose the smallest number of components that retains most
of the information (variance) in the data.
- Scree plots show the eigenvalues, i.e. the variance of each principal component. These are used to tell us how
important the principal components are.
- When the scree plot plateaus then no more principal components are needed.
- The loadings are a measure of how much each original variable contributes to each of the principal
components.
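
As a hedged illustration of the quantities named above, the sketch below pulls the eigenvalues and loadings out of a prcomp fit. It assumes the state.x77 data and scaled variables; the names eigenvalues, prop_var and cum_var are not from the handout.

```r
pca <- prcomp(state.x77, scale. = TRUE)

# Eigenvalues are the variances of the components (squared standard deviations).
eigenvalues <- pca$sdev^2

# Share of total variance carried by each component, and the running total.
prop_var <- eigenvalues / sum(eigenvalues)
cum_var  <- cumsum(prop_var)
print(round(rbind(eigenvalues, prop_var, cum_var), 3))

# Loadings: one column per component, one row per original variable.
print(round(pca$rotation[, 1:2], 2))
```

A scree plot is simply the eigenvalues vector plotted against the component number.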

Implementation in R
- princomp(.)
- prcomp(.)
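
The two functions differ mainly in how they compute the decomposition: prcomp uses a singular value decomposition of the data, while princomp uses an eigen-decomposition of the covariance or correlation matrix with divisor n rather than n - 1. The sketch below compares them on the state.x77 data; the object names are illustrative.

```r
# prcomp: singular value decomposition of the centred (here also scaled) data.
pc_svd <- prcomp(state.x77, scale. = TRUE)

# princomp: eigen-decomposition of the correlation matrix when cor = TRUE.
pc_eig <- princomp(state.x77, cor = TRUE)

# On the correlation scale the two give the same standard deviations.
print(all.equal(unname(pc_eig$sdev), pc_svd$sdev))

# On the covariance scale they differ by the divisor: n versus n - 1.
z <- scale(state.x77)
n <- nrow(z)
ratio <- unname(princomp(z)$sdev) / prcomp(z)$sdev
print(all.equal(ratio, rep(sqrt((n - 1) / n), ncol(z))))
```

The rest of this handout uses prcomp.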

Example
This example uses the built-in R data set state, in particular state.x77, which contains 8 indicators for the 50 US
states.
> data(state)
> ls()
[1] "state.abb" "state.area" "state.center" "state.division"
[5] "state.name" "state.region" "state.x77"

> state.x77
              Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama             3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska               365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona             2212   4530        1.8    70.55    7.8    58.1    15 113417
....
Washington          3559   4864        0.6    71.72    4.3    63.5    32  66570
West Virginia       1799   3617        1.4    69.48    6.7    41.6   100  24070
Wisconsin           4589   4468        0.7    72.48    3.0    54.5   149  54464
Wyoming              376   4566        0.6    70.29    6.9    62.9   173  97203

> plot(state.center,type="n")     # set up an empty plot of the state centres
> text(state.center,state.abb)    # label each centre with its state abbreviation

Performing PCA on the states

First perform PCA of the state data using all the information in state.x77:
> state.pca1 <- prcomp(state.x77)

Output the PCA summary

> print(state.pca1,digits=3)

Standard deviations:
[1] 8.53e+04 4.47e+03 5.59e+02 4.64e+01 6.04e+00 2.46e+00 6.58e-01 2.90e-01

Rotation:
                 PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8
Population  1.18e-03 -1.00e+00  0.027849 -4.67e-03  3.35e-04  1.39e-04 -5.18e-05 -2.19e-05
Income      2.62e-03 -2.80e-02 -0.999177  2.82e-02 -7.79e-03 -1.12e-04  3.85e-05 -6.29e-05
Illiteracy  5.52e-07 -1.42e-05  0.000584  7.10e-03 -4.05e-02 -3.09e-02  2.55e-02 -9.98e-01
Life Exp   -1.69e-06  1.93e-05 -0.001037 -3.88e-03  1.19e-01  2.86e-01  9.51e-01  1.06e-02
Murder      9.88e-06 -2.79e-04  0.002776  2.82e-02 -2.39e-01 -9.20e-01  3.06e-01  4.62e-02
HS Grad     3.16e-05  1.88e-04 -0.008266 -2.78e-02  9.62e-01 -2.66e-01 -4.08e-02 -3.21e-02
Frost       3.61e-05  3.87e-03 -0.028042 -9.99e-01 -3.45e-02 -1.99e-02  6.25e-03 -4.94e-03
Area        1.00e+00  1.26e-03  0.002583 -3.17e-05 -6.56e-06  1.88e-05 -4.09e-07  1.49e-06

We plot the results from our principal component analysis as a scree plot to enable us to decide how many principal
components are necessary to best explain the data.
> plot(state.pca1,type="l")

> state.pca1$sdev[1]^2/(sum((state.pca1$sdev)^2))
[1] 0.9972262

The scree plot suggests that the first component explains the majority of the variance (the above calculation shows it
to be approximately 99.7%).
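
The same figure can be read off for every component at once. The sketch below assumes the unscaled fit described above and uses the importance table that summary() stores for a prcomp object; imp and pc1_share are illustrative names.

```r
state.pca1 <- prcomp(state.x77)

# summary() on a prcomp object stores the variance table in $importance.
imp <- summary(state.pca1)$importance
print(round(imp[, 1:3], 4))

# Proportion of variance for PC1, computed by hand from the standard deviations.
pc1_share <- state.pca1$sdev[1]^2 / sum(state.pca1$sdev^2)
print(pc1_share)
```

The "Proportion of Variance" row of imp holds the same ratio for each component, rounded for display.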

Looking at a projection (or biplot) shows us how the components and variables relate, with the length of the
arrows representing the magnitude of the effect.
> biplot(state.pca1,cex=c(0.75,1))

The first principal component mainly represents Area, with a small contribution from Population and Income; these are
the 3 variables with the highest variance.

Looking at the standard deviations of the 8 variables:

> apply(state.x77,2,sd)
Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost       Area
 4.464e+03  6.145e+02  6.095e-01  1.342e+00  3.691e+00  8.077e+00  5.198e+01  8.533e+04

This principal component analysis is not very informative because the variances are so disparate: Area alone
dominates the first component. PCA on the covariance matrix is very sensitive to the scaling of the original
variables, so when their variances differ this much we either use the correlation matrix or scale the data first.
Scaling gives every variable unit variance and is usually advisable. We call the scaled analysis state.pca2.
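
That scaling and using the correlation matrix amount to the same thing can be checked directly. This is a minimal sketch under the assumption that the eigen-decomposition is computed with base R's eigen(); pca_scaled and eig are illustrative names.

```r
# Option 1: let prcomp standardise each variable to unit variance.
pca_scaled <- prcomp(state.x77, scale. = TRUE)

# Option 2: eigen-decomposition of the correlation matrix.
eig <- eigen(cor(state.x77))

# Same component variances either way.
print(all.equal(pca_scaled$sdev^2, eig$values))

# Same loadings too, up to the (arbitrary) sign of each column.
print(all.equal(abs(unname(pca_scaled$rotation)), abs(eig$vectors)))
```

The sign of a component carries no meaning, so two routines may legitimately return loadings that differ by a factor of -1.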
> state.pca2=prcomp(state.x77,scale.=TRUE)
> state.pca2
Standard deviations:
[1] 1.8970755 1.2774659 1.0544862 0.8411327 0.6201949 0.5544923 0.3800642 0.3364338

Rotation:
                   PC1         PC2         PC3         PC4          PC5         PC6          PC7         PC8
Population  0.12642809  0.41087417 -0.65632546 -0.40938555  0.405946365 -0.01065617 -0.062158658 -0.21924645
Income     -0.29882991  0.51897884 -0.10035919 -0.08844658 -0.637586953  0.46177023  0.009104712  0.06029200
Illiteracy  0.46766917  0.05296872  0.07089849  0.35282802  0.003525994  0.38741578 -0.619800310 -0.33868838
Life Exp   -0.41161037 -0.08165611 -0.35993297  0.44256334  0.326599685  0.21908161 -0.256213054  0.52743331
Murder      0.44425672  0.30694934  0.10846751 -0.16560017 -0.128068739 -0.32519611 -0.295043151  0.67825134
HS Grad    -0.42468442  0.29876662  0.04970850  0.23157412 -0.099264551 -0.64464647 -0.393019181 -0.30724183
Frost      -0.35741244 -0.15358409  0.38711447 -0.61865119  0.217363791  0.21268413 -0.472013140  0.02834442
Area       -0.03338461  0.58762446  0.51038499  0.20112550  0.498506338  0.14836054  0.286260213  0.01320320

The 1st principal component relates mainly to a combination of illiteracy, life expectancy, murder and HS grad; the
2nd component reflects income, area and population.
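
One way to back up this reading of the loadings is to sort the variables by the absolute size of their loading on each component. The helper rank_loadings below is hypothetical, written for this illustration only.

```r
state.pca2 <- prcomp(state.x77, scale. = TRUE)

# Order the variables by the size of their loading on a given component.
rank_loadings <- function(pca, component) {
  loadings <- pca$rotation[, component]
  loadings[order(abs(loadings), decreasing = TRUE)]
}

print(round(rank_loadings(state.pca2, "PC1"), 2))
print(round(rank_loadings(state.pca2, "PC2"), 2))
```

The variables at the top of each list are the ones named in the interpretation above.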
> summary(state.pca2)
Importance of components:
                        PC1   PC2   PC3    PC4    PC5    PC6    PC7    PC8
Standard deviation     1.90 1.277 1.054 0.8411 0.6202 0.5545 0.3801 0.3364
Proportion of Variance 0.45 0.204 0.139 0.0884 0.0481 0.0384 0.0181 0.0141
Cumulative Proportion  0.45 0.654 0.793 0.8813 0.9294 0.9678 0.9859 1.0000

A scree plot helps to decide how many components are necessary to explain the data:
> plot(state.pca2,type="l")

In this case there is no clear plateau on the plot, so more than one or two components are needed; e.g. the first 3
principal components explain close to 80% of the variance. Another way to decide how many principal
components to use is to keep only the PCs that have an eigenvalue (squared standard deviation) greater than 1.
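
Both rules of thumb can be applied in a few lines. This is a sketch, assuming the scaled fit above; the 80% target is an illustrative threshold, not one prescribed by this handout, and n_kaiser and n_target are hypothetical names.

```r
state.pca2 <- prcomp(state.x77, scale. = TRUE)
eigenvalues <- state.pca2$sdev^2

# Rule 1: keep components whose eigenvalue exceeds 1.
n_kaiser <- sum(eigenvalues > 1)

# Rule 2: keep the fewest components reaching a target share of variance.
target <- 0.80   # illustrative threshold (assumption)
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
n_target <- which(cum_var >= target)[1]

print(c(kaiser = n_kaiser, target_80 = n_target))
```

With the standard deviations printed above, the eigenvalue rule keeps three components, while a strict 80% target needs a fourth, since three components reach only about 79%.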

We now plot a biplot which shows a projection of states plus how the variables relate to the components
> biplot(state.pca2,cex=c(0.5,0.75))

Plotting PC1 geographically suggests that it is arguably capturing "Southern-ness". This allows us to understand more
about how the principal components are defined.

Plotting the second principal component in the same way helps to visualise what it represents (higher area, population and income).
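
The handout's original commands for these maps are not shown here, so the following is only one possible way to draw them, reusing the state.center and state.abb objects from earlier. The function plot_component and its colour and size choices are assumptions made for illustration.

```r
state.pca2 <- prcomp(state.x77, scale. = TRUE)

# Draw state abbreviations at the state centres, coloured by the sign of a
# component score and sized by its magnitude.
plot_component <- function(pca, component) {
  score <- pca$x[, component]
  size  <- 0.6 + abs(score) / max(abs(score))   # text size between 0.6 and 1.6
  shade <- ifelse(score > 0, "firebrick", "steelblue")
  plot(state.center, type = "n", xlab = "Longitude", ylab = "Latitude",
       main = paste(component, "scores by state"))
  text(state.center, labels = state.abb, cex = size, col = shade)
  invisible(score)
}

pc1 <- plot_component(state.pca2, "PC1")
pc2 <- plot_component(state.pca2, "PC2")
```

Because the sign of a component is arbitrary, which colour ends up on the Southern states may differ between implementations; the geographic grouping is what matters.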
