AS Practice Exercise 3 Model Solution PDF

1. Several functions can perform principal component analysis (PCA) in R, including prcomp() and princomp() from the stats package and PCA() from the FactoMineR package.
2. The factoextra package is used to visualize PCA results.
3. An example analysis uses the decathlon2 data from the factoextra package: prcomp() is run on the active individuals and variables, and its outputs (standard deviations, rotation, center, and scale) and the variance retained by each component are examined.

Uploaded by

Rohan Kanungo
Copyright
© All Rights Reserved

Solution: Principal Component Analysis

Several functions from different packages perform PCA:
• prcomp() and princomp() from the built-in R stats package
• PCA() from the FactoMineR package. Read more here: PCA with FactoMineR
• dudi.pca() from the ade4 package. Read more here: PCA with ade4

prcomp(x, scale = FALSE)

princomp(x, cor = FALSE, scores = TRUE)
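
The two stats functions differ mainly in how they compute the decomposition: prcomp() uses a singular value decomposition of the centered data with variance divisor n - 1, while princomp() uses an eigendecomposition of the covariance matrix with divisor n. Note also that the formal argument of prcomp() is scale. (with a trailing dot); writing scale works through partial argument matching. The sketch below illustrates the divisor difference on the built-in USArrests data, which is an assumed stand-in and not the data set used in this solution:

```r
# USArrests ships with base R; used here only as a stand-in data set.
x <- USArrests
n <- nrow(x)

p1 <- prcomp(x, scale. = FALSE)                # SVD, variance divisor n - 1
p2 <- princomp(x, cor = FALSE, scores = TRUE)  # eigen, variance divisor n

# The standard deviations differ only by the factor sqrt((n - 1) / n).
sdev_match <- isTRUE(all.equal(unname(p2$sdev),
                               p1$sdev * sqrt((n - 1) / n)))

# The loadings agree up to the sign of each column.
loadings_match <- isTRUE(all.equal(abs(unclass(p2$loadings)[, 1:4]),
                                   abs(p1$rotation),
                                   check.attributes = FALSE))

print(c(sdev = sdev_match, loadings = loadings_match))
```

For data sets with many rows the two functions give almost identical standard deviations; the sign of each component is arbitrary in both.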

Install factoextra for visualization


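The factoextra package is used to visualize the results of a principal component
analysis.
factoextra can be installed and loaded as follows:
# Option 1: released version from CRAN
# install.packages("factoextra")
# Option 2: development version from GitHub (requires devtools)
# install.packages("devtools")
devtools::install_github("kassambara/factoextra")
# load
library("factoextra")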

Prepare the data


We’ll use the data set decathlon2 from the package factoextra:
library("factoextra")
data(decathlon2)

Extract only active individuals and variables for principal component analysis:
decathlon2.active <- decathlon2[1:23, 1:10]
head(decathlon2.active[, 1:6])
          X100m Long.jump Shot.put High.jump X400m X110m.hurdle
SEBRLE    11.04      7.58    14.83      2.07 49.81        14.69
CLAY      10.76      7.40    14.26      1.86 49.37        14.05
BERNARD   11.02      7.23    14.25      1.92 48.93        14.99
YURKOV    11.34      7.09    15.19      2.10 50.42        15.31
ZSIVOCZKY 11.13      7.30    13.48      2.01 48.62        14.17
McMULLEN  10.83      7.31    13.76      2.13 49.91        14.38

Use the R function prcomp() for PCA


res.pca <- prcomp(decathlon2.active, scale = TRUE)
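
Setting scale = TRUE standardizes each variable to unit variance before the decomposition, which matters when variables are recorded in different units, as the times and distances are here. It is equivalent to running PCA on the correlation matrix. A short check of that equivalence, sketched on the built-in USArrests data as an assumed stand-in for the decathlon data:

```r
x <- USArrests  # stand-in data set from base R, not the decathlon data

res <- prcomp(x, scale. = TRUE)

# Eigenvalues of the correlation matrix equal the squared sdev values.
eig_cor <- eigen(cor(x))$values
same_eigenvalues <- isTRUE(all.equal(res$sdev^2, eig_cor))

# With standardized variables the eigenvalues sum to the number of variables.
total_variance <- sum(res$sdev^2)

print(same_eigenvalues)
print(total_variance)
```

Because the total variance equals the number of variables after standardization, an eigenvalue above 1 indicates a component that carries more variance than a single original variable.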

The function prcomp() returns the following values:


names(res.pca)
[1] "sdev" "rotation" "center" "scale" "x"
1. sdev : the standard deviations of the principal components (the square roots of the eigenvalues)

head(res.pca$sdev)
[1] 2.0308159 1.3559244 1.1131668 0.9052294 0.8375875 0.6502944
2. rotation : the matrix of variable loadings (columns are eigenvectors)

head(unclass(res.pca$rotation)[, 1:4])
                    PC1         PC2        PC3         PC4
X100m        -0.4188591  0.13230683 -0.2708996  0.03708806
Long.jump     0.3910648 -0.20713320  0.1711752 -0.12746997
Shot.put      0.3613881 -0.06298590 -0.4649778  0.14191803
High.jump     0.3004132  0.34309742 -0.2965280  0.15968342
X400m        -0.3454786 -0.21400770 -0.2547084  0.47592968
X110m.hurdle -0.3762651  0.01824645 -0.4032525 -0.01866477
3. center, scale : the centering and scaling used, or FALSE
4. x : the coordinates of the individuals (scores) on the principal components
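
The relationship between these components can be verified directly: center and scale hold the column means and standard deviations, the columns of rotation are orthonormal, and x (the coordinates of the individuals) is the standardized data multiplied by rotation. A minimal sketch, using the built-in USArrests data as an assumed stand-in for the decathlon data:

```r
dat <- USArrests  # stand-in data set from base R
res <- prcomp(dat, scale. = TRUE)

# center and scale store the column means and standard deviations.
center_ok <- isTRUE(all.equal(res$center, colMeans(dat)))
scale_ok  <- isTRUE(all.equal(res$scale, sapply(dat, sd)))

# The eigenvectors in rotation are orthonormal: t(R) %*% R is the identity.
orthonormal <- isTRUE(all.equal(crossprod(res$rotation), diag(ncol(dat)),
                                check.attributes = FALSE))

# x holds the scores: standardized data projected onto the eigenvectors.
z <- scale(dat, center = res$center, scale = res$scale)
scores_ok <- isTRUE(all.equal(z %*% res$rotation, res$x,
                              check.attributes = FALSE))

print(c(center = center_ok, scale = scale_ok,
        orthonormal = orthonormal, scores = scores_ok))
```

The same projection, applied with the stored center and scale, is how new individuals can be placed on existing components; predict() on a prcomp object performs that step.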

Variances of the principal components


The variance retained by each principal component can be obtained as follows:
# Eigenvalues
eig <- (res.pca$sdev)^2
# Variances in percentage
variance <- eig*100/sum(eig)
# Cumulative variances
cumvar <- cumsum(variance)
eig.decathlon2.active <- data.frame(eig = eig, variance = variance,
                                    cumvariance = cumvar)
head(eig.decathlon2.active)
        eig  variance cumvariance
1 4.1242133 41.242133    41.24213
2 1.8385309 18.385309    59.62744
3 1.2391403 12.391403    72.01885
4 0.8194402  8.194402    80.21325
5 0.7015528  7.015528    87.22878
6 0.4228828  4.228828    91.45760
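
These quantities can be cross-checked against summary(), which reports the proportion of variance per component. The sketch below wraps the calculation above in a small helper (eig_table is a hypothetical name, not part of any package) and runs it on the built-in USArrests data as an assumed stand-in:

```r
# Hypothetical helper: eigenvalue table for a prcomp result.
eig_table <- function(pca) {
  eig <- pca$sdev^2
  variance <- eig * 100 / sum(eig)
  data.frame(eig = eig, variance = variance, cumvariance = cumsum(variance))
}

res <- prcomp(USArrests, scale. = TRUE)
tab <- eig_table(res)

# summary() reports rounded proportions, so compare with a tolerance.
imp <- summary(res)$importance
prop_diff <- abs(tab$variance / 100 - imp["Proportion of Variance", ])
prop_match <- max(prop_diff) < 1e-4

print(tab)
print(prop_match)
```

Calling eig_table(res.pca) on the decathlon result should reproduce the table shown above, assuming res.pca was created as in this solution.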
