PCA in R

The document describes how to use the PCAtools package in R to perform principal component analysis (PCA) on gene expression data from a GEO dataset (GSE2990). It loads and preprocesses the data, runs PCA, and visualizes the results with PCAtools functions such as screeplot(), biplot(), pairsplot(), plotloadings() and eigencorplot(). It also covers ways of choosing how many principal components to retain, including Horn's parallel analysis and the elbow point of the scree plot.

Uploaded by Hemanta Saikia

PCAtools: everything Principal Component Analysis https://bioconductor.org/packages/release/bioc/vignettes/PCAtools/inst/d...

1 of 35 21-01-2021, 02:19 pm

# install the release version from Bioconductor
if (!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')

BiocManager::install('PCAtools')

# alternatively, install the development version from GitHub
if (!requireNamespace('devtools', quietly = TRUE))
  install.packages('devtools')

devtools::install_github('kevinblighe/PCAtools')

library(PCAtools)

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO('GSE2990', GSEMatrix = TRUE, getGPL = FALSE)
mat <- exprs(gset[[1]])

# remove Affymetrix control probes
# (!grepl() is safe even when nothing matches; -grep() would then drop every row)
mat <- mat[!grepl('^AFFX', rownames(mat)),]

# extract information of interest from the phenotype data (pdata)
idx <- which(colnames(pData(gset[[1]])) %in%
c('relation', 'age:ch1', 'distant rfs:ch1', 'er:ch1',
'ggi:ch1', 'grade:ch1', 'size:ch1',
'time rfs:ch1'))
metadata <- data.frame(pData(gset[[1]])[,idx],
row.names = rownames(pData(gset[[1]])))

# tidy column names
colnames(metadata) <- c('Study', 'Age', 'Distant.RFS', 'ER', 'GGI', 'Grade',
'Size', 'Time.RFS')

# prepare certain phenotypes of interest
metadata$Study <- gsub('Reanalyzed by: ', '', as.character(metadata$Study))
metadata$Age <- as.numeric(gsub('^KJ', NA, as.character(metadata$Age)))
metadata$Distant.RFS <- factor(metadata$Distant.RFS,
levels = c(0,1))
metadata$ER <- factor(gsub('\\?', NA, as.character(metadata$ER)),
levels = c(0,1))
metadata$ER <- factor(ifelse(metadata$ER == 1, 'ER+', 'ER-'),
levels = c('ER-', 'ER+'))
metadata$GGI <- as.numeric(as.character(metadata$GGI))
metadata$Grade <- factor(gsub('\\?', NA, as.character(metadata$Grade)),
levels = c(1,2,3))
metadata$Grade <- gsub(1, 'Grade 1', gsub(2, 'Grade 2', gsub(3, 'Grade 3', metadata$Grade)))
metadata$Grade <- factor(metadata$Grade, levels = c('Grade 1', 'Grade 2', 'Grade 3'))
metadata$Size <- as.numeric(as.character(metadata$Size))
metadata$Time.RFS <- as.numeric(gsub('^KJX|^KJ', NA, metadata$Time.RFS))

# remove samples from the pdata that have any NA value
discard <- apply(metadata, 1, function(x) any(is.na(x)))
metadata <- metadata[!discard,]

# filter the expression data to match the samples in our pdata
mat <- mat[,which(colnames(mat) %in% rownames(metadata))]

# check that sample names match exactly between pdata and expression data
all(colnames(mat) == rownames(metadata))

## [1] TRUE
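The filtering step above keeps the columns of `mat` in their original GEO order. The final `all()` check therefore only confirms an order that already happened to match. The base-R sketch below uses a small hypothetical matrix and data frame (`toyExpr`, `toyMeta`) to show that indexing columns by the metadata row names enforces the metadata order. It uses `complete.cases()` as a compact equivalent of the `apply(..., any(is.na(x)))` test. It illustrates the general point and is not part of the vignette.

```r
# hypothetical toy data: 3 variables x 4 samples, columns deliberately out of order
toyExpr <- matrix(1:12, nrow = 3,
  dimnames = list(paste0('g', 1:3), c('S4', 'S1', 'S2', 'S3')))
toyMeta <- data.frame(
  Age = c(45, NA, 60, 52),
  ER = factor(c('ER+', 'ER-', NA, 'ER+')),
  row.names = paste0('S', 1:4))

# drop samples with any missing phenotype value (keeps S1 and S4)
toyMeta <- toyMeta[complete.cases(toyMeta), ]

# %in% keeps the expression-matrix column order (S4, S1) ...
byMembership <- toyExpr[, which(colnames(toyExpr) %in% rownames(toyMeta))]
# ... whereas indexing by name follows the metadata order (S1, S4)
byName <- toyExpr[, rownames(toyMeta), drop = FALSE]

all(colnames(byMembership) == rownames(toyMeta))  # FALSE
all(colnames(byName) == rownames(toyMeta))        # TRUE
```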

p <- pca(mat, metadata = metadata, removeVar = 0.1)

## -- removing the lower 10% of variables based on variance
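The sketch below is a rough guide to what `pca(mat, removeVar = 0.1)` does. It filters a simulated matrix by row variance and then runs base R's `prcomp()` on the transposed result. It assumes, as the message above suggests, that PCAtools discards the lowest-variance 10% of variables. It also assumes the data are centred but not scaled. This is not PCAtools's own code, and `toy`, `keepIdx` and `pcToy` are hypothetical names.

```r
set.seed(1)
# hypothetical data: 200 variables (rows) x 12 samples (columns), laid out like `mat`
toy <- matrix(rnorm(200 * 12), nrow = 200,
  dimnames = list(paste0('var', 1:200), paste0('S', 1:12)))

# rank variables by variance and keep the top 90% (i.e. drop the lower 10%)
removeVar <- 0.1
vars <- apply(toy, 1, var)
keepIdx <- order(vars, decreasing = TRUE)[seq_len(round(nrow(toy) * (1 - removeVar)))]
toyFilt <- toy[keepIdx, ]

# prcomp() expects samples in rows, so transpose; centre but do not scale
pcToy <- prcomp(t(toyFilt), center = TRUE, scale. = FALSE)

# percentage of variance explained per PC, the quantity a scree plot shows
pctVar <- pcToy$sdev^2 / sum(pcToy$sdev^2) * 100
round(head(pctVar), 1)
```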

screeplot(p, axisLabSize = 18, titleLabSize = 22)

biplot(p)

biplot(p, showLoadings = TRUE, lab = NULL)

pairsplot(p)

plotloadings(p, labSize = 3)

## -- variables retained:

## 215281_x_at, 214464_at, 211122_s_at, 210163_at, 204533_at, 205225_at, 209351_at,
## 205044_at, 202037_s_at, 204540_at, 215176_x_at, 214768_x_at, 212671_s_at, 219415_at,
## 37892_at, 208650_s_at, 206754_s_at, 205358_at, 205380_at, 205825_at

eigencorplot(p,
metavars = c('Study','Age','Distant.RFS','ER',
'GGI','Grade','Size','Time.RFS'))

p$rotated[1:5,1:5]

## PC1 PC2 PC3 PC4 PC5
## GSM65752 -30.24272 43.826310 3.781677 -39.536149 18.612835
## GSM65753 -37.73436 -15.464421 -4.913100 -5.877623 9.060108
## GSM65755 -29.95155 7.788280 -22.980076 -15.222649 23.123766
## GSM65757 -33.73509 1.261410 -22.834375 2.494554 13.629207
## GSM65758 -40.95958 -8.588458 4.995440 14.340150 0.417101

p$loadings[1:5,1:5]

## PC1 PC2 PC3 PC4 PC5
## 206378_at -0.0024336244 -0.05312797 -0.004809456 0.04045087 0.0096616577
## 205916_at -0.0051057533 0.00122765 -0.010593760 0.04023264 0.0285972617
## 206799_at 0.0005723191 -0.05048096 -0.009992964 0.02568142 0.0024626261
## 205242_at 0.0129147329 0.02867789 0.007220832 0.04424070 -0.0006138609
## 206509_at 0.0019058729 -0.05447596 -0.004979062 0.01510060 -0.0026213610
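`p$rotated` holds the sample scores and `p$loadings` the variable weights. These correspond to the `x` and `rotation` slots of base R's `prcomp`; the prediction example later builds a `prcomp` object from exactly these fields. The small simulated check below shows how the two relate: the scores equal the centred data multiplied by the loadings, and the loading vectors are orthonormal. The object names are hypothetical, and centring without scaling is assumed.

```r
set.seed(3)
# hypothetical data: 50 variables x 8 samples
toy2 <- matrix(rnorm(50 * 8), nrow = 50,
  dimnames = list(paste0('var', 1:50), paste0('S', 1:8)))
pc2 <- prcomp(t(toy2), center = TRUE, scale. = FALSE)

scores2   <- pc2$x         # samples x PCs, analogous to p$rotated
loadings2 <- pc2$rotation  # variables x PCs, analogous to p$loadings

# scores = centred data %*% loadings
centred <- scale(t(toy2), center = TRUE, scale = FALSE)
all.equal(unname(centred %*% loadings2), unname(scores2))

# loading vectors have unit length and are mutually orthogonal
all.equal(crossprod(loadings2), diag(ncol(loadings2)), check.attributes = FALSE)
```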

suppressMessages(require(hgu133a.db))
newnames <- mapIds(hgu133a.db,
keys = rownames(p$loadings),
column = c('SYMBOL'),
keytype = 'PROBEID')

## 'select()' returned 1:many mapping between keys and columns

# tidy up for NA mappings and duplicated gene symbols
newnames <- ifelse(is.na(newnames) | duplicated(newnames),
names(newnames), newnames)
rownames(p$loadings) <- newnames
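The `ifelse()` call falls back to the probe ID whenever a probe has no symbol (`NA`) or repeats a symbol already used, which keeps the row names unique. The toy named vector below, with hypothetical probe and gene names, illustrates the rule.

```r
# hypothetical probe-to-symbol mapping, shaped like the output of mapIds()
toyNames <- c(probe1 = 'GENE1', probe2 = 'GENE2', probe3 = NA,
              probe4 = 'GENE3', probe5 = 'GENE1')

# keep the symbol unless it is missing or already used; otherwise use the probe ID
tidyNames <- ifelse(is.na(toyNames) | duplicated(toyNames),
  names(toyNames), toyNames)

unname(tidyNames)
# -> GENE1, GENE2, probe3, GENE3, probe5
```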

horn <- parallelPCA(mat)
horn$n

## [1] 11
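Horn's parallel analysis retains the leading components whose variance exceeds what would be expected from data with no correlation structure. The sketch below is a simplified permutation variant written for illustration:

- It shuffles each variable independently.
- It recomputes the percentage of variance per PC.
- It counts the leading PCs whose observed value beats the 95th percentile of the permuted values.

`hornSketch()`, `niters` and the 95% cut-off are assumptions for this example. `parallelPCA()` has its own defaults and stopping rule, so the two need not agree.

```r
# simplified permutation version of Horn's parallel analysis (illustration only)
hornSketch <- function(x, niters = 20, prob = 0.95) {
  # x: variables in rows, samples in columns (the layout used for `mat`)
  pctVar <- function(m) {
    v <- prcomp(t(m), center = TRUE, scale. = FALSE)$sdev^2
    v / sum(v) * 100
  }
  observed <- pctVar(x)
  # shuffle each variable independently to destroy between-variable correlation
  permuted <- replicate(niters, pctVar(t(apply(x, 1, sample))))
  threshold <- apply(permuted, 1, quantile, probs = prob)
  # count leading PCs whose observed variance beats the permutation threshold
  above <- observed > threshold
  if (all(above)) length(above) else which(!above)[1] - 1
}

set.seed(1)
# hypothetical data: 100 variables x 20 samples, with a two-group signal in 30 variables
grp <- rep(c(0, 1), each = 10)
toy3 <- matrix(rnorm(100 * 20), nrow = 100)
toy3[1:30, ] <- toy3[1:30, ] + 3 * matrix(grp, nrow = 30, ncol = 20, byrow = TRUE)
nHorn <- hornSketch(toy3)
nHorn
```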

elbow <- findElbowPoint(p$variance)
elbow

## PC8
## 8
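A common way to locate the elbow is to take the PC lying farthest from the straight line that joins the first and last points of the scree curve. The sketch below implements that geometric rule. It assumes the rule approximates what `findElbowPoint()` does; the function name `elbowSketch()` and the toy variances are hypothetical.

```r
elbowSketch <- function(variance) {
  x <- seq_along(variance)
  y <- as.numeric(variance)
  n <- length(y)
  # line through the first point (x[1], y[1]) and the last point (x[n], y[n])
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  # perpendicular distance from every point on the curve to that line
  dist <- abs(dy * x - dx * y + x[n] * y[1] - y[n] * x[1]) / sqrt(dx^2 + dy^2)
  which.max(dist)
}

# hypothetical percentage of variance explained by PC1..PC10
toyScree <- c(50, 20, 10, 5, 4, 3, 3, 2, 2, 1)
elbowSketch(toyScree)
# [1] 3
```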

library(ggplot2)

screeplot(p,
components = getComponents(p, 1:20),
vline = c(horn$n, elbow)) +
# annotate() draws one label each; constants inside aes() would repeat the label per data row
annotate('label', x = horn$n + 1, y = 50,
label = 'Horn\'s', vjust = -1, size = 8) +
annotate('label', x = elbow + 1, y = 50,
label = 'Elbow method', vjust = -1, size = 8)

which(cumsum(p$variance) > 80)[1]

## PC27
## 27
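The `which(cumsum(...) > 80)[1]` idiom returns the first PC at which the running total of explained variance passes 80%. A short, hypothetical variance vector makes the arithmetic easy to follow. The cumulative sums are 40, 65, 75, 83, 88, 92, so the threshold is first crossed at the fourth component. If the threshold is never reached, the expression returns `NA`.

```r
# hypothetical percentage of variance explained by PC1..PC6
toyVar <- c(PC1 = 40, PC2 = 25, PC3 = 10, PC4 = 8, PC5 = 5, PC6 = 4)
cumsum(toyVar)
# PC1 PC2 PC3 PC4 PC5 PC6
#  40  65  75  83  88  92
which(cumsum(toyVar) > 80)[1]
# PC4
#   4
which(cumsum(toyVar) > 99)[1]  # threshold never reached: NA
```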

biplot(p,
lab = paste0(p$metadata$Age, ' años'),
colby = 'ER',
hline = 0, vline = 0,
legendPosition = 'right')

biplot(p,
colby = 'ER', colkey = c('ER+' = 'forestgreen', 'ER-' = 'purple'),
colLegendTitle = 'ER-\nstatus',
# encircle config
encircle = TRUE,
encircleFill = TRUE,
hline = 0, vline = c(-25, 0, 25),
legendPosition = 'top', legendLabSize = 16, legendIconSize = 8.0)

biplot(p,
colby = 'ER', colkey = c('ER+' = 'forestgreen', 'ER-' = 'purple'),
colLegendTitle = 'ER-\nstatus',
# encircle config
encircle = TRUE, encircleFill = FALSE,
encircleAlpha = 1, encircleLineSize = 5,
hline = 0, vline = c(-25, 0, 25),
legendPosition = 'top', legendLabSize = 16, legendIconSize = 8.0)

biplot(p,
colby = 'ER', colkey = c('ER+' = 'forestgreen', 'ER-' = 'purple'),
# ellipse config
ellipse = TRUE,
ellipseConf = 0.95,
ellipseFill = TRUE,
ellipseAlpha = 1/4,
ellipseLineSize = 1.0,
xlim = c(-125,125), ylim = c(-50, 80),
hline = 0, vline = c(-25, 0, 25),
legendPosition = 'top', legendLabSize = 16, legendIconSize = 8.0)

biplot(p,
colby = 'ER', colkey = c('ER+' = 'forestgreen', 'ER-' = 'purple'),
# ellipse config
ellipse = TRUE,
ellipseConf = 0.95,
ellipseFill = TRUE,
ellipseAlpha = 1/4,
ellipseLineSize = 0,
ellipseFillKey = c('ER+' = 'yellow', 'ER-' = 'pink'),
xlim = c(-125,125), ylim = c(-50, 80),
hline = 0, vline = c(-25, 0, 25),
legendPosition = 'top', legendLabSize = 16, legendIconSize = 8.0)

biplot(p,
colby = 'ER', colkey = c('ER+' = 'forestgreen', 'ER-' = 'purple'),
hline = c(-25, 0, 25), vline = c(-25, 0, 25),
legendPosition = 'top', legendLabSize = 13, legendIconSize = 8.0,
shape = 'Grade', shapekey = c('Grade 1' = 15, 'Grade 2' = 17, 'Grade 3' = 8),
drawConnectors = FALSE,
title = 'PCA bi-plot',
subtitle = 'PC1 versus PC2',
caption = '27 PCs ≈ 80%')

biplot(p,
lab = NULL,
colby = 'ER', colkey = c('ER+'='royalblue', 'ER-'='red3'),
hline = c(-25, 0, 25), vline = c(-25, 0, 25),
vlineType = c('dotdash', 'solid', 'dashed'),
gridlines.major = FALSE, gridlines.minor = FALSE,
pointSize = 5,
legendPosition = 'left', legendLabSize = 14, legendIconSize = 8.0,
shape = 'Grade', shapekey = c('Grade 1'=15, 'Grade 2'=17, 'Grade 3'=8),
drawConnectors = FALSE,
title = 'PCA bi-plot',
subtitle = 'PC1 versus PC2',
caption = '27 PCs ≈ 80%')

biplot(p,
# loadings parameters
showLoadings = TRUE,
lengthLoadingsArrowsFactor = 1.5,
sizeLoadingsNames = 4,
colLoadingsNames = 'red4',
# other parameters
lab = NULL,
colby = 'ER', colkey = c('ER+'='royalblue', 'ER-'='red3'),
hline = 0, vline = c(-25, 0, 25),
vlineType = c('dotdash', 'solid', 'dashed'),
gridlines.major = FALSE, gridlines.minor = FALSE,
pointSize = 5,
legendPosition = 'left', legendLabSize = 14, legendIconSize = 8.0,
shape = 'Grade', shapekey = c('Grade 1'=15, 'Grade 2'=17, 'Grade 3'=8),
drawConnectors = FALSE,
title = 'PCA bi-plot',
subtitle = 'PC1 versus PC2',
caption = '27 PCs ≈ 80%')

# add ESR1 gene expression to the metadata
p$metadata$ESR1 <- mat['205225_at',]

biplot(p,
x = 'PC2', y = 'PC3',
lab = NULL,
colby = 'ESR1',
shape = 'ER',
hline = 0, vline = 0,
legendPosition = 'right') +
scale_colour_gradient(low = 'gold', high = 'red2')

biplot(p, x = 'PC10', y = 'PC50',
lab = NULL,
colby = 'Age',
hline = 0, vline = 0,
hlineWidth = 1.0, vlineWidth = 1.0,
gridlines.major = FALSE, gridlines.minor = TRUE,
pointSize = 5,
legendPosition = 'left', legendLabSize = 16, legendIconSize = 8.0,
shape = 'Grade', shapekey = c('Grade 1'=15, 'Grade 2'=17, 'Grade 3'=8),
drawConnectors = FALSE,
title = 'PCA bi-plot',
subtitle = 'PC10 versus PC50',
caption = '27 PCs ≈ 80%')

pairsplot(p,
components = getComponents(p, c(1:10)),
triangle = TRUE, trianglelabSize = 12,
hline = 0, vline = 0,
pointSize = 0.4,
gridlines.major = FALSE, gridlines.minor = FALSE,
colby = 'Grade',
title = 'Pairs plot', plotaxes = FALSE,
margingaps = unit(c(-0.01, -0.01, -0.01, -0.01), 'cm'))

pairsplot(p,
components = getComponents(p, c(4,33,11,1)),
triangle = FALSE,
hline = 0, vline = 0,
pointSize = 0.8,
gridlines.major = FALSE, gridlines.minor = FALSE,
colby = 'ER',
title = 'Pairs plot', titleLabSize = 22,
axisLabSize = 14, plotaxes = TRUE,
margingaps = unit(c(0.1, 0.1, 0.1, 0.1), 'cm'))

plotloadings(p,
rangeRetain = 0.01,
labSize = 4.0,
title = 'Loadings plot',
subtitle = 'PC1, PC2, PC3, PC4, PC5',
caption = 'Top 1% variables',
shape = 24,
col = c('limegreen', 'black', 'red3'),
drawConnectors = TRUE)

## -- variables retained:

## POGZ, CDC42BPA, CXCL11, ESR1, SFRP1, EEF1A2, IGKC, GABRP, CD24, PDZK1
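`rangeRetain` controls which variables are labelled. As we read the `plotloadings()` documentation, for each component the function keeps the variables whose loadings fall within the top or bottom fraction `rangeRetain` of that component's loadings range. The base-R sketch below implements that reading. `retainByRange()` and the toy loadings are hypothetical, and the sketch is not PCAtools's own code.

```r
# keep variables lying in the top or bottom `rangeRetain` fraction of any PC's loadings range
retainByRange <- function(loadings, rangeRetain = 0.05) {
  keep <- apply(loadings, 2, function(x) {
    r <- diff(range(x))
    x >= max(x) - r * rangeRetain | x <= min(x) + r * rangeRetain
  })
  rownames(loadings)[apply(keep, 1, any)]
}

# hypothetical loadings for 5 variables on 2 PCs (filled column by column)
toyLoad <- matrix(c(0.9, 0.1, -0.8, 0.0, 0.2,
                    0.0, 0.5, 0.1, -0.6, 0.05), ncol = 2,
  dimnames = list(paste0('v', 1:5), c('PC1', 'PC2')))

retainByRange(toyLoad, rangeRetain = 0.05)
# -> v1 and v3 (extremes of PC1), v2 and v4 (extremes of PC2)
```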

plotloadings(p,
components = getComponents(p, c(4,33,11,1)),
rangeRetain = 0.1,
labSize = 4.0,
absolute = FALSE,
title = 'Loadings plot',
subtitle = 'Misc PCs',
caption = 'Top 10% variables',
shape = 23, shapeSizeRange = c(1, 16),
col = c('white', 'pink'),
drawConnectors = FALSE)

## -- variables retained:

## CXCL11, IGKC, CXCL9, 210163_at, 214768_x_at, 211645_x_at, 211644_x_at, IGHA1,
## 216491_x_at, 214777_at, 216576_x_at, 212671_s_at, IL23A, PLAAT4, 212588_at, 212998_x_at,
## KRT14, GABRP, SOX10, PTX3, TTYH1, CPB1, KRT15, MYBPC1, DST, CXADR, GALNT3, CDH3, TCIM,
## DHRS2, MMP1, CRABP1, CST1, MAGEA3, ACOX2, PRKAR2B, PLCB1, HDGFL3, CYP2B6, ORM1,
## 205040_at, HSPB8, SCGB2A2, JCHAIN, POGZ, 213872_at, DYNC2LI1, CDC42BPA

eigencorplot(p,
components = getComponents(p, 1:27),
metavars = c('Study','Age','Distant.RFS','ER',
'GGI','Grade','Size','Time.RFS'),
col = c('darkblue', 'blue2', 'black', 'red2', 'darkred'),
cexCorval = 0.7,
colCorval = 'white',
fontCorval = 2,
posLab = 'bottomleft',
rotLabX = 45,
posColKey = 'top',
cexLabColKey = 1.5,
scale = TRUE,
main = 'PC1-27 clinical correlations',
colFrame = 'white',
plotRsquared = FALSE)

eigencorplot(p,
components = getComponents(p, 1:horn$n),
metavars = c('Study','Age','Distant.RFS','ER','GGI',
'Grade','Size','Time.RFS'),
col = c('white', 'cornsilk1', 'gold', 'forestgreen', 'darkgreen'),
cexCorval = 1.2,
fontCorval = 2,
posLab = 'all',
rotLabX = 45,
scale = TRUE,
main = bquote(Principal ~ component ~ Pearson ~ r^2 ~ clinical ~ correlates),
plotRsquared = TRUE,
corFUN = 'pearson',
corUSE = 'pairwise.complete.obs',
corMultipleTestCorrection = 'BH',
signifSymbols = c('****', '***', '**', '*', ''),
signifCutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 1))
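For each PC and clinical variable, the `eigencorplot()` heatmap reports a correlation coefficient, or its square when `plotRsquared = TRUE`. It also shows significance stars derived from p-values, which can be adjusted via `corMultipleTestCorrection`. The sketch below reproduces that kind of calculation in base R on simulated scores. It assumes that non-numeric variables are converted to integer codes before correlating, and that the adjustment is applied across all PC/variable pairs. `toyScores` and `toyMeta2` are hypothetical, and the star thresholds mirror the `signifCutpoints` used above.

```r
set.seed(42)
# hypothetical PC scores for 20 samples and two clinical variables
toyScores <- matrix(rnorm(20 * 3), ncol = 3,
  dimnames = list(NULL, paste0('PC', 1:3)))
toyMeta2 <- data.frame(Age = rnorm(20, mean = 50, sd = 10),
  ER = factor(rep(c('ER-', 'ER+'), 10)))

# factors become integer codes (ER- = 1, ER+ = 2)
metaNum <- sapply(toyMeta2, as.numeric)

# Pearson r between every PC and every clinical variable
r <- cor(toyScores, metaNum, method = 'pearson', use = 'pairwise.complete.obs')
r2 <- r^2  # the quantity displayed when r-squared is plotted

# raw p-values from cor.test(), then Benjamini-Hochberg adjustment
pRaw <- outer(seq_len(ncol(toyScores)), seq_len(ncol(metaNum)),
  Vectorize(function(i, j) cor.test(toyScores[, i], metaNum[, j])$p.value))
pAdj <- matrix(p.adjust(pRaw, method = 'BH'), nrow = nrow(pRaw),
  dimnames = dimnames(r))

# map adjusted p-values to significance symbols
stars <- cut(pAdj, breaks = c(0, 0.0001, 0.001, 0.01, 0.05, 1),
  labels = c('****', '***', '**', '*', ''), include.lowest = TRUE)
```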

pscree <- screeplot(p, components = getComponents(p, 1:30),
hline = 80, vline = 27, axisLabSize = 14, titleLabSize = 20,
returnPlot = FALSE) +
annotate('label', x = 20, y = 80, label = '80% explained variation', vjust = -1, size = 8)

ppairs <- pairsplot(p, components = getComponents(p, c(1:3)),
triangle = TRUE, trianglelabSize = 12,
hline = 0, vline = 0,
pointSize = 0.8, gridlines.major = FALSE, gridlines.minor = FALSE,
colby = 'Grade',
title = '', plotaxes = FALSE,
margingaps = unit(c(0.01, 0.01, 0.01, 0.01), 'cm'),
returnPlot = FALSE)

pbiplot <- biplot(p,
# loadings parameters
showLoadings = TRUE,
lengthLoadingsArrowsFactor = 1.5,
sizeLoadingsNames = 4,
colLoadingsNames = 'red4',
# other parameters
lab = NULL,
colby = 'ER', colkey = c('ER+'='royalblue', 'ER-'='red3'),
hline = 0, vline = c(-25, 0, 25),
vlineType = c('dotdash', 'solid', 'dashed'),
gridlines.major = FALSE, gridlines.minor = FALSE,
pointSize = 5,
legendPosition = 'none', legendLabSize = 16, legendIconSize = 8.0,
shape = 'Grade', shapekey = c('Grade 1'=15, 'Grade 2'=17, 'Grade 3'=8),
drawConnectors = FALSE,
title = 'PCA bi-plot',
subtitle = 'PC1 versus PC2',
caption = '27 PCs ≈ 80%',
returnPlot = FALSE)

ploadings <- plotloadings(p, rangeRetain = 0.01, labSize = 4,
title = 'Loadings plot', axisLabSize = 12,
subtitle = 'PC1, PC2, PC3, PC4, PC5',
caption = 'Top 1% variables',
shape = 24, shapeSizeRange = c(4, 8),
col = c('limegreen', 'black', 'red3'),
legendPosition = 'none',
drawConnectors = FALSE,
returnPlot = FALSE)

peigencor <- eigencorplot(p,
components = getComponents(p, 1:10),
metavars = c('Study','Age','Distant.RFS','ER',
'GGI','Grade','Size','Time.RFS'),
cexCorval = 1.0,

fontCorval = 2,
posLab = 'all',
rotLabX = 45,
scale = TRUE,
main = "PC clinical correlates",
cexMain = 1.5,
plotRsquared = FALSE,
corFUN = 'pearson',
corUSE = 'pairwise.complete.obs',
signifSymbols = c('****', '***', '**', '*', ''),
signifCutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 1),
returnPlot = FALSE)

library(cowplot)
library(ggplotify)

top_row <- plot_grid(pscree, ppairs, pbiplot,
ncol = 3,
labels = c('A', 'B Pairs plot', 'C'),
label_fontfamily = 'serif',
label_fontface = 'bold',
label_size = 22,
align = 'h',
rel_widths = c(1.10, 0.80, 1.10))

bottom_row <- plot_grid(ploadings,
as.grob(peigencor),
ncol = 2,
labels = c('D', 'E'),
label_fontfamily = 'serif',
label_fontface = 'bold',
label_size = 22,
align = 'h',
rel_widths = c(0.8, 1.2))

plot_grid(top_row, bottom_row, ncol = 1,
rel_heights = c(1.1, 0.9))

p <- pca(mat, metadata = metadata, removeVar = 0.1)

## -- removing the lower 10% of variables based on variance

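# wrap the PCAtools results in a prcomp-shaped list so that stats::predict() can use them
p.prcomp <- list(sdev = p$sdev,
rotation = data.matrix(p$loadings),
x = data.matrix(p$rotated),
# note: passed as TRUE flags, center and scale make predict() centre and scale
# newdata by its own column means and standard deviations
center = TRUE, scale = TRUE)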

class(p.prcomp) <- 'prcomp'

# for this simple example, just use a chunk of
# the original data for the prediction
newdata <- t(mat[,seq(1,20)])
predict(p.prcomp, newdata = newdata)[,1:5]

## PC1 PC2 PC3 PC4 PC5
## GSM65752 11.683293 71.0152986 10.677205 -75.97644152 29.7537169
## GSM65753 -10.542633 -31.9953531 -2.753783 -19.59178967 14.9924713
## GSM65755 6.585509 13.4975310 -40.370389 -29.38990525 47.7142845
## GSM65757 1.498398 -0.1294115 -37.336278 0.08078156 22.3448232
## GSM65758 -18.049833 -14.9445805 14.890320 16.57567005 3.4010033
## GSM65760 8.073473 47.5491189 -18.016340 -9.73629569 -51.7330414
## GSM65761 -3.689814 7.7199606 -35.476666 -35.31465087 -40.1455143
## GSM65762 3.949911 -24.9428080 4.710631 2.71721065 43.2182093
## GSM65763 -20.757238 -33.3085383 22.639443 7.41053224 -9.9339918
## GSM65764 -12.287305 -12.7566718 13.813429 33.75583684 17.7938583
## GSM65767 -4.209505 -13.9349129 -17.814569 -14.87200276 -82.4754172
## GSM65768 3.547044 39.6095431 -28.424912 40.26444836 45.6591355
## GSM65769 3.754370 30.0201461 12.415498 45.74502641 37.9905308
## GSM65770 2.538593 -36.6517740 54.887990 5.94021104 -0.9545218
## GSM65771 -7.382089 -8.5963702 27.749060 -21.50981794 -71.4524526
## GSM65772 3.735223 43.2576570 26.995375 21.01817312 -68.8193200
## GSM65773 15.775812 -19.4523339 4.419158 -6.47899302 -25.2479186
## GSM65774 17.589719 -28.5666333 -52.875007 -16.82207768 37.8455365
## GSM65775 -3.375783 -5.2950960 27.071957 49.10111537 55.0410908
## GSM65776 1.562855 -22.0947718 12.797877 7.08296875 -4.9924828
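In the object above, `center = TRUE` and `scale = TRUE` are flags rather than vectors. As a result, `predict.prcomp()` standardises `newdata` with the new samples' own column means and standard deviations. That is presumably why these scores differ from the corresponding rows of `p$rotated` shown earlier. If projections on the training scale are wanted instead, one option is to store the training column means in `center` and set `scale = FALSE` when the PCA was not scaled. The base-R sketch below uses simulated data and hypothetical names. It checks that `predict()` then matches a manual projection and reproduces the training scores.

```r
set.seed(7)
# hypothetical training data: 30 samples (rows) x 50 variables (columns)
train <- matrix(rnorm(30 * 50), nrow = 30,
  dimnames = list(paste0('S', 1:30), paste0('v', 1:50)))
fit <- prcomp(train, center = TRUE, scale. = FALSE)

# fit$center holds the training column means, so predict() reuses them
newSamples <- train[1:5, ]
scoresPred <- predict(fit, newdata = newSamples)

# the same projection done by hand: subtract training means, multiply by loadings
scoresManual <- sweep(newSamples, 2, fit$center) %*% fit$rotation

all.equal(scoresPred, scoresManual)  # TRUE
all.equal(scoresPred, fit$x[1:5, ])  # TRUE: training rows map to their own scores
```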

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggplotify_0.0.5 cowplot_1.1.0 hgu133a.db_3.2.3
## [4] org.Hs.eg.db_3.12.0 AnnotationDbi_1.52.0 IRanges_2.24.0
## [7] S4Vectors_0.28.0 GEOquery_2.58.0 Biobase_2.50.0
## [10] BiocGenerics_0.36.0 PCAtools_2.2.0 ggrepel_0.8.2
## [13] ggplot2_3.3.2
##
## loaded via a namespace (and not attached):
## [1] matrixStats_0.57.0 bit64_4.0.5
## [3] ash_1.0-15 RColorBrewer_1.1-2
## [5] tools_4.0.3 R6_2.4.1
## [7] irlba_2.3.3 KernSmooth_2.23-17
## [9] DBI_1.1.0 colorspace_1.4-1
## [11] withr_2.3.0 tidyselect_1.1.0
## [13] ggalt_0.4.0 bit_4.0.4
## [15] curl_4.3 compiler_4.0.3
## [17] extrafontdb_1.0 cli_2.1.0
## [19] xml2_1.3.2 DelayedArray_0.16.0
## [21] labeling_0.4.2 scales_1.1.1
## [23] proj4_1.0-10 readr_1.4.0
## [25] stringr_1.4.0 digest_0.6.27
## [27] rmarkdown_2.5 pkgconfig_2.0.3
## [29] htmltools_0.5.0 extrafont_0.17
## [31] sparseMatrixStats_1.2.0 MatrixGenerics_1.2.0
## [33] limma_3.46.0 highr_0.8
## [35] maps_3.3.0 rlang_0.4.8
## [37] rstudioapi_0.11 RSQLite_2.2.1
## [39] DelayedMatrixStats_1.12.0 gridGraphics_0.5-0
## [41] farver_2.0.3 generics_0.0.2
## [43] BiocParallel_1.24.0 dplyr_1.0.2
## [45] magrittr_1.5 BiocSingular_1.6.0
## [47] Matrix_1.2-18 Rcpp_1.0.5
## [49] munsell_0.5.0 fansi_0.4.1
## [51] lifecycle_0.2.0 stringi_1.5.3
## [53] yaml_2.2.1 MASS_7.3-53
## [55] plyr_1.8.6 grid_4.0.3
## [57] blob_1.2.1 dqrng_0.2.1
## [59] crayon_1.3.4 lattice_0.20-41
## [61] beachmat_2.6.0 hms_0.5.3
## [63] knitr_1.30 ps_1.4.0
## [65] pillar_1.4.6 reshape2_1.4.4
## [67] glue_1.4.2 evaluate_0.14
## [69] BiocManager_1.30.10 vctrs_0.3.4
## [71] Rttf2pt1_1.3.8 gtable_0.3.0
## [73] purrr_0.3.4 tidyr_1.1.2
## [75] assertthat_0.2.1 xfun_0.18
## [77] rsvd_1.0.3 tibble_3.0.4
## [79] rvcheck_0.1.8 memoise_1.1.0
## [81] ellipsis_0.3.1
