Cheatsheet PCA


PCA with FactoMineR and factoextra

FactoMineR (for multivariate data analysis) and factoextra (for visualisation of PCA results)

Basics
PCA (Principal Component Analysis) is a dimension-reduction method. It finds principal factors - orthogonal linear combinations of original variables that explain maximum amount of variance.

$W_{n \times q} = X_{n \times p} R_{p \times q}$

The p dimensional input data X is projected into a q dimensional subspace by a linear transformation defined by R. New q dimensional data W has orthogonal variables. The transformation may be done through SVD decomposition or eigenvalue decomposition.

Scree plot
Use the factoextra::get_eig() function to extract information about eigenvalues. The factoextra::fviz_screeplot() function will plot the percentage of variance explained by each principal factor.

> get_eig(model)
        eigenvalue variance.percent cum.variance.percent
Dim.1 4.474039e+00       8.9480e+01                89.48
Dim.2 3.546706e-01       7.0934e+00                96.57
Dim.3 1.313722e-01       2.6273e+00                99.20
Dim.4 3.991824e-02       7.9836e-01               100.00
Dim.5 5.256294e-32       1.0512e-30               100.00
> fviz_screeplot(model)

PCA variables’ plot
Use the factoextra::fviz_pca_var() function to plot contribution of original variables into selected (the axes argument) principal components. Show variables through text labels or arrows (the geom argument). Result of this function is the ggplot2 plot.

> fviz_pca_var(model)

PCA individuals’ plot
Use the factoextra::fviz_pca_ind() function to plot observations with selected (the axes argument) principal coordinates. With the habillage argument one can select a grouping variable which will be color-coded in the plot. Use addEllipses to plot ellipses for each group.

> fviz_pca_ind(model)
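
The projection W = XR described under Basics can be sketched in base R. This is a minimal illustration on made-up toy data, not FactoMineR's own implementation: the matrix X, the choice q = 2, and all object names below are assumptions introduced for the example. It computes R once from the SVD of the standardised data and once from the eigendecomposition of the correlation matrix, then derives the same three quantities that get_eig() reports.

```r
# Toy data (made up for illustration): 6 observations, 3 variables,
# with the second variable strongly correlated with the first
set.seed(1)
X <- matrix(rnorm(18), nrow = 6, ncol = 3)
X[, 2] <- X[, 1] + 0.1 * X[, 2]

# Centre and scale the columns (standardised PCA)
Z <- scale(X)

# Route 1: SVD of the standardised data, Z = U D V'
sv    <- svd(Z)
R_svd <- sv$v[, 1:2]        # p x q transformation matrix (q = 2)
W_svd <- Z %*% R_svd        # n x q projected data

# Route 2: eigendecomposition of the correlation matrix
eg    <- eigen(cor(X))
R_eig <- eg$vectors[, 1:2]
W_eig <- Z %*% R_eig        # same as W_svd up to the sign of each column

# Eigenvalues and the percentages of explained variance
eigenvalue           <- eg$values
variance.percent     <- 100 * eigenvalue / sum(eigenvalue)
cum.variance.percent <- cumsum(variance.percent)

print(cbind(eigenvalue, variance.percent, cum.variance.percent))
```

Both routes give the same projected data up to a sign flip per column, and the columns of W are mutually orthogonal. The squared singular values divided by n - 1 equal the eigenvalues of the correlation matrix, which is why either decomposition can be used.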

The Example
This example uses data about Hollywood action
movies from 2015: six quantitative variables with
movie ratings scraped from the Rotten Tomatoes and
Metacritic websites.

> head(movies2015)
                     Rotten              Rotten Metacritic
                   Tomatoes Metacritic Audience   Audience
Spectre                  64         60       65         67
Furious 7                81         67       84         68
Terminator Genisys       25         38       59         63
San Andreas              50         43       56         55
Point Break               9         38       37         22

Use the FactoMineR::PCA() function for PCA with supplementary quantitative and categorical variables. Missing values will be replaced by colMeans.

> library("FactoMineR")
> model <- PCA(movies2015)
> summary(model)
Eigenvalues
                       Dim.1   Dim.2   Dim.3   Dim.4   Dim.5
Variance               4.474   0.355   0.131   0.040    0.00
% of var.             89.481   7.093   2.627   0.798    0.00
Cumulative % of var.  89.481  96.574  99.202 100.000  100.00
Individuals
                      Dist    Dim.1    ctr   cos2
Spectre            | 1.077 |  0.989  2.184  0.842 |
Furious 7          | 2.408 |  2.321 12.045  0.930 |
Terminator Genisys | 1.694 | -1.394  4.341  0.677 |
San Andreas        | 0.811 | -0.704  1.108  0.754 |
Point Break        | 3.643 | -3.461 26.767  0.902 |
Run All Night      | 1.192 |  0.842  1.584  0.499 |
No Escape          | 1.076 | -0.508  0.577  0.223 |
...
Variables
                            Dim.1    ctr   cos2    Dim.2
Rotten.Tomatoes          |  0.988 21.836  0.977 | -0.059
Metacritic               |  0.931 19.389  0.867 | -0.330
Average.critics          |  0.986 21.721  0.972 | -0.156
Rotten.Tomatoes.Audience |  0.943 19.885  0.890 |  0.135
Metacritic.Audience      |  0.876 17.169  0.768 |  0.447
...

PCA - Biplot
Use the factoextra::fviz_pca_biplot() function to combine results for individuals and variables into a single bi-plot. With the habillage argument one can select a grouping variable which will be color-coded in the plot. Use addEllipses to plot ellipses for each group.

In the presented example, the first principal coordinate is highly correlated with the average rating from all sources (audience and critics), while the second principal coordinate discriminates between audience and critics. Thus one can easily identify movies that are preferred by critics and those preferred by the audience.

> fviz_pca_biplot(model, habillage = filmy2015$script.type) +
    theme(legend.position = "top")
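
The ctr and cos2 columns printed by summary(model) can be checked by hand. The sketch below is a base R illustration only: it assumes the usual definitions for a standardised PCA (for individuals, cos2 is the squared coordinate divided by the squared distance to the origin; for variables, cos2 is the squared coordinate and ctr is 100 times cos2 divided by the eigenvalue of that dimension). The object names are made up for the example, and the inputs are the rounded values copied from the printed output, so the results agree with it only up to rounding.

```r
# Rounded values copied from the summary(model) output
lambda1 <- 4.474                       # variance (eigenvalue) of Dim.1

# Individuals: cos2 = squared coordinate / squared distance to the origin
ind <- data.frame(
  row.names = c("Spectre", "Furious 7"),
  Dist  = c(1.077, 2.408),
  Dim.1 = c(0.989, 2.321)
)
ind$cos2 <- ind$Dim.1^2 / ind$Dist^2

# Variables: cos2 = squared coordinate,
# ctr = 100 * cos2 / eigenvalue of the dimension (assumed definition)
vars <- data.frame(
  row.names = c("Rotten.Tomatoes", "Metacritic"),
  Dim.1 = c(0.988, 0.931)
)
vars$cos2 <- vars$Dim.1^2
vars$ctr  <- 100 * vars$cos2 / lambda1

print(round(ind, 3))
print(round(vars, 3))
```

A cos2 close to 1 means the individual or variable is well represented on that dimension, while ctr shows how much of the dimension's variance it accounts for.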

This cheatsheet presents functions from FactoMineR package (Francois Husson, Julie Josse, Sebastien Le, Jeremy Mazet, http://factominer.free.fr/) in version 1.35 and factoextra package (Alboukadel Kassambara, Fabian Mundt, http://www.sthda.com/english/rpkgs/factoextra/) in version 1.0.4.
CC BY Przemysław Biecek http://github.com/pbiecek https://creativecommons.org/licenses/by/4.0/
