RTCGA
BugReports https://github.com/RTCGA/RTCGA/issues
URL https://rtcga.github.io/RTCGA
License GPL-2
LazyLoad yes
LazyData yes
Depends R (>= 3.3.0)
Imports XML, assertthat, stringi, rvest, data.table, xml2, dplyr,
purrr, survival, survminer, ggplot2, ggthemes, viridis, knitr,
scales
Suggests devtools, testthat, pander, Biobase, GenomicRanges, IRanges,
S4Vectors, RTCGA.rnaseq, RTCGA.clinical, RTCGA.mutations,
RTCGA.RPPA, RTCGA.mRNA, RTCGA.miRNASeq, RTCGA.methylation,
RTCGA.CNV, RTCGA.PANCAN12, magrittr, tidyr
Repository Bioconductor
biocViews ImmunoOncology, Software, DataImport, DataRepresentation,
Preprocessing, RNASeq
VignetteBuilder knitr
NeedsCompilation no
RoxygenNote 5.0.1
git_url https://git.bioconductor.org/packages/RTCGA
git_branch RELEASE_3_13
git_last_commit 98a46dc
git_last_commit_date 2021-05-19
Date/Publication 2021-09-26
R topics documented:
RTCGA-package
boxplotTCGA
checkTCGA
convertTCGA
datasetsTCGA
downloadTCGA
expressionsTCGA
heatmapTCGA
infoTCGA
installTCGA
kmTCGA
mutationsTCGA
pcaTCGA
readTCGA
survivalTCGA
theme_RTCGA
Description
The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search,
download, and analyze data sets generated by TCGA. It contains clinical information, genomic
characterization data, and high-level sequence analysis of the tumor genomes. The key is to understand
genomics to improve cancer care. The RTCGA package supports downloading and integrating the variety
and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This
may have a beneficial influence on the development of science and the improvement of patients'
treatment. Furthermore, the RTCGA package transforms TCGA data into a form that is convenient to
use in the R statistical environment. Those data transformations can be part of a statistical analysis
pipeline, which can be more reproducible with RTCGA.
Details
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
See Also
Examples
## Not run:
browseVignettes('RTCGA')
## End(Not run)
Description
Usage
Arguments
data A data.frame from TCGA study containing variables to be plotted.
x A character name of variable containing groups.
y A character name of the continuous variable to be plotted.
fill A character name of the fill variable. By default, the same as x.
coord.flip Whether to flip coordinates.
facet.names A character vector of length at most 2 containing names of variables to produce
facets. See examples.
ylab The name of y label. Remember about coord.flip.
xlab The name of x label. Remember about coord.flip.
legend.title A character with the legend's title.
legend A character specifying legend position. Allowed values are one of c("top", "bottom",
"left", "right", "none"). Default is "top". To remove the legend use
legend = "none".
... Further arguments passed to geom_boxplot.
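The x and y arguments are character strings, and the Examples pass expressions such as "reorder(cohort,log1p(MET), median)" through them. As a rough sketch of what such an expression computes, the base R snippet below orders groups by the median of a transformed value. The toy data frame and its values are invented for illustration and are not TCGA data.

```r
# Toy data: two hypothetical cohorts with invented expression values.
toy <- data.frame(
  cohort = c("A", "A", "A", "B", "B", "B"),
  MET    = c(100, 120, 90, 5, 8, 3),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by a summary
# statistic (here the median) of the second argument within each group.
ordered_cohort <- reorder(factor(toy$cohort), log1p(toy$MET), median)

# The cohort with the lower median comes first in the level order.
cohort_levels <- levels(ordered_cohort)
print(cohort_levels)
```

Ordering the x axis this way places boxes in increasing order of their medians, which tends to make cohort comparisons easier to read.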
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA, expressionsTCGA,
heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
library(RTCGA.rnaseq)
# perform plot
library(dplyr)
expressionsTCGA(ACC.rnaseq, BLCA.rnaseq, BRCA.rnaseq, OV.rnaseq,
extract.cols = "MET|4233") %>%
rename(cohort = dataset,
MET = `MET|4233`) %>%
#cancer samples
filter(substr(bcr_patient_barcode, 14, 15) == "01") -> ACC_BLCA_BRCA_OV.rnaseq
## facet example
library(RTCGA.mutations)
library(dplyr)
mutationsTCGA(BRCA.mutations, OV.mutations, ACC.mutations, BLCA.mutations) %>%
filter(Hugo_Symbol == 'TP53') %>%
filter(substr(bcr_patient_barcode, 14, 15) == "01") %>% # cancer tissue
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 12)) -> ACC_BLCA_BRCA_OV.mutations
# assumed definition: the object below is referenced later but not defined in this example;
# it holds all mutation records, without filtering on gene or sample type
mutationsTCGA(BRCA.mutations, OV.mutations, ACC.mutations, BLCA.mutations) -> ACC_BLCA_BRCA_OV.mutations_all
ACC_BLCA_BRCA_OV.rnaseq %>%
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 15)) %>%
filter(bcr_patient_barcode %in%
substr(ACC_BLCA_BRCA_OV.mutations_all$bcr_patient_barcode, 1, 15)) %>%
# took patients for which we had any mutation information
# so avoided patients without any information about mutations
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 12)) %>%
# stri_length(ACC_BLCA_BRCA_OV.mutations$bcr_patient_barcode) == 12
left_join(ACC_BLCA_BRCA_OV.mutations,
by = "bcr_patient_barcode") %>% #joined only with tumor patients
mutate(TP53 = ifelse(!is.na(Variant_Classification), "Mut", "WILD")) %>%
select(cohort, MET, TP53) -> ACC_BLCA_BRCA_OV.rnaseq_TP53mutations
boxplotTCGA(ACC_BLCA_BRCA_OV.rnaseq_TP53mutations,
"reorder(cohort,log1p(MET), median)", "log1p(MET)",
xlab = "Cohort Type", ylab = "Logarithm of MET",
legend.title = "Cohorts", legend = "bottom",
facet.names = c("TP53"))
boxplotTCGA(ACC_BLCA_BRCA_OV.rnaseq_TP53mutations,
"reorder(cohort,log1p(MET), median)", "log1p(MET)",
xlab = "Cohort Type", ylab = "Logarithm of MET",
legend.title = "Cohorts", legend = "bottom",
fill = c("TP53"))
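The examples above repeatedly slice bcr_patient_barcode with substr: characters 14-15 are compared with "01" to keep cancer samples, and characters 1-12 are used as the patient key for joins. A minimal base R sketch of those two steps follows. The barcodes are hypothetical strings written in the same layout, not records from any RTCGA data package.

```r
# Hypothetical barcodes in the layout used by the examples.
barcodes <- c(
  "TCGA-OR-A5J1-01A-11R-A29S-07",  # characters 14-15 are "01"
  "TCGA-OR-A5J2-11A-11R-A29S-07",  # characters 14-15 are "11"
  "TCGA-A1-A0SB-01A-11R-A144-07"   # characters 14-15 are "01"
)

# Characters 14-15 hold the sample type code; the examples keep "01".
sample_type <- substr(barcodes, 14, 15)
is_tumor <- sample_type == "01"

# Characters 1-12 identify the patient and serve as the join key.
patient_id <- substr(barcodes[is_tumor], 1, 12)
print(patient_id)
```

Truncating both tables to the same 12-character key before left_join is what allows expression rows and mutation rows for the same patient to match.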
Description
The checkTCGA function lets you check:
• DataSets: TCGA datasets’ names for current release date and cohort.
• Dates: TCGA datasets’ dates of release.
Usage
checkTCGA(what, cancerType, date = NULL)
Arguments
what One of DataSets or Dates.
cancerType A character of length 1 containing the abbreviation (Cohort code, see
http://gdac.broadinstitute.org/) of the cancer type to check for.
date A NULL or character specifying the date for which information should be checked.
By default (date = NULL) the newest available date is used. All available dates
can be checked on http://gdac.broadinstitute.org/runs/ or by calling checkTCGA('Dates').
Required format 'YYYY-MM-DD'.
Details
• If what='DataSets', checks TCGA datasets’ names for the current release date and
cohort.
• If what='Dates', checks the dates of TCGA datasets’ releases.
Value
• If what='DataSets' a data.frame of available datasets’ names (to pass to the downloadTCGA
function) and sizes.
• If what='Dates' a vector of available dates to pass to the downloadTCGA function.
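The date argument must follow the 'YYYY-MM-DD' format. The sketch below shows one way to check a string against that format before passing it on. The helper is_release_date is hypothetical and not part of RTCGA, and it only checks the format and calendar validity, not whether a release exists for that date.

```r
# Hypothetical helper (not part of RTCGA): TRUE when `date` is a single
# string that parses as a calendar date written as 'YYYY-MM-DD'.
is_release_date <- function(date) {
  if (!is.character(date) || length(date) != 1) return(FALSE)
  # Shape check first: four digits, two digits, two digits.
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", date)) return(FALSE)
  # as.Date() yields NA for impossible dates such as month 13.
  !is.na(as.Date(date, format = "%Y-%m-%d"))
}

print(is_release_date("2015-11-01"))
print(is_release_date("01-11-2015"))
```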
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Download.html.
Other RTCGA: RTCGA-package, boxplotTCGA, convertTCGA, datasetsTCGA, downloadTCGA, expressionsTCGA,
heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
#############################
# dates of TCGA datasets' releases.
checkTCGA('Dates')
#############################
## Not run:
# TCGA datasets' names availability for
# current release date and cancer type.
## End(Not run)
Description
Functions use the Biobase (http://bioconductor.org/packages/release/bioc/html/Biobase.html) package
to transform data from packages in the RTCGA data family to Bioconductor classes (RTCGA.rnaseq,
RTCGA.RPPA, RTCGA.PANCAN12, RTCGA.mRNA, RTCGA.methylation to ExpressionSet and RTCGA.CNV
to GRanges). For RTCGA.PANCAN12 it makes sense to convert expression.cb1, expression.cb2, cnv.cb.
Usage
convertTCGA(dataSet, dataType = "expression")
convertPANCAN12(dataSet)
Arguments
dataSet A data.frame to be converted to ExpressionSet or GRanges.
dataType One of expression or CNV (for RTCGA.CNV datasets).
Details
This functionality was motivated by requests to offer the data in Bioconductor-friendly
classes, because many users already have their data in one of the core infrastructure classes. Data of
the same type in compatible containers promotes interoperability and makes it easy to combine and
organize.
Bioconductor classes were designed to capitalize on the biological structure of the data. If data
have a range-based component it’s natural, for Bioconductor users, to store and access these as a
GRanges where they can extract position, strand etc. in the same way. Similarly for ExpressionSet.
This class holds expression data along with experiment metadata and comes with built in accessors
to extract and manipulate data. The idea is to offer a common API to the data; extracting the start
position in a GRanges is always start(). With a data.frame it is different each time (unless select()
is implemented) as the column names and organization of data can be different.
AnnotationHub and the soon to come ExperimentHub will host many different types of data. A
primary goal moving forward is to offer similar data in a consistent format. For example, CNV data
in AnnotationHub is offered as a GRanges and as more CNV are added we will ask that they too
are packaged as GRanges. The aim is that streamlined data on the back-end will make for a more
intuitive experience on the front-end.
Value
Functions return an ExpressionSet or a GRanges for RTCGA.CNV
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Download.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, datasetsTCGA, downloadTCGA, expressionsTCGA,
heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
########
########
# Expression data
########
########
library(RTCGA.rnaseq)
library(Biobase)
convertTCGA(BRCA.rnaseq) -> BRCA.rnaseq_ExpressionSet
## Not run:
library(RTCGA.PANCAN12)
convertPANCAN12(expression.cb1) -> PANCAN12_ExpressionSet
library(RTCGA.RPPA)
convertTCGA(BRCA.RPPA) -> BRCA.RPPA_ExpressionSet
library(RTCGA.methylation)
convertTCGA(BRCA.methylation) -> BRCA.methylation_ExpressionSet
library(RTCGA.mRNA)
convertTCGA(BRCA.mRNA) -> BRCA.mRNA_ExpressionSet
########
########
# CNV
########
########
library(RTCGA.CNV)
library(GenomicRanges) # the package that provides the GRanges class
convertTCGA(BRCA.CNV, "CNV") -> BRCA.CNV_GRanges
## End(Not run)
datasetsTCGA RTCGA.data - The Family of R Packages with Data from The Cancer
Genome Atlas Study
Description
Snapshots of the clinical, mutations, CNVs, rnaseq, RPPA, mRNA, miRNASeq and methylation
datasets from the 2015-11-01 release date (check all dates of release with checkTCGA('Dates'))
are included in the RTCGA.data family (factory) that contains 9 packages:
• RTCGA.rnaseq rnaseq
• RTCGA.clinical clinical
• RTCGA.mutations mutations
• RTCGA.CNV CNV
• RTCGA.RPPA RPPA
• RTCGA.mRNA mRNA
• RTCGA.miRNASeq miRNASeq
• RTCGA.methylation methylation
• RTCGA.PANCAN12 (not from TCGA)
Details
For more detailed information visit RTCGA.data website https://rtcga.github.io/RTCGA. One can
install all data packages with installTCGA.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski [aut, cre]
Przemyslaw Biecek [aut]
Witold Chodor [aut]
See Also
RTCGA website http://rtcga.github.io/RTCGA.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, downloadTCGA, expressionsTCGA,
heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
## Not run:
## Bioconductor releases
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
# package names must be passed as character strings
BiocManager::install("RTCGA.clinical")
BiocManager::install("RTCGA.mutations")
BiocManager::install("RTCGA.rnaseq")
BiocManager::install("RTCGA.CNV")
BiocManager::install("RTCGA.RPPA")
BiocManager::install("RTCGA.mRNA")
BiocManager::install("RTCGA.miRNASeq")
BiocManager::install("RTCGA.methylation")
## End(Not run)
Description
Downloads TCGA data from specified release dates for selected cohorts of cancer
types. Pass the name of the required dataset to the dataSet parameter. By default the Merged Clinical
dataSet is downloaded (value dataSet = 'Merge_Clinical.Level_1') from the newest available
release date.
Usage
downloadTCGA(cancerTypes, dataSet = "Merge_Clinical.Level_1", destDir,
date = NULL, untarFile = TRUE, removeTar = TRUE, allDataSets = FALSE)
Arguments
cancerTypes A character vector containing abbreviations (Cohort codes) of the cancer types
to download from http://gdac.broadinstitute.org/. For easy access from R see
Details below.
dataSet A part of the name of the dataSet to be downloaded from http://gdac.broadinstitute.org/runs/.
By default the Merged Clinical dataSet is downloaded (value dataSet = 'Merge_Clinical.Level_1').
Available datasets’ names can be checked using the checkTCGA function.
destDir A character specifying a directory into which dataSets will be downloaded.
date A NULL or character specifying the date from which dataSets should be downloaded.
By default (date = NULL) the newest available date is used. All available
dates can be checked on http://gdac.broadinstitute.org/runs/ or by using the checkTCGA
function. Required format 'YYYY-MM-DD'.
untarFile Logical - should the downloaded file be untarred. Default is TRUE.
removeTar Logical - should the downloaded .tar file be removed after untarring. Default
is TRUE.
allDataSets Logical - whether to download all datasets matching the dataSet parameter or only the
first one (without the FFPE phrase if possible). Default is FALSE.
Details
All cohort names can be checked using: sub( x = names( infoTCGA() ),'-counts','' ).
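The one-liner above strips a '-counts' suffix from a set of names. A small sketch of that step on its own follows. The vector info_names is an invented stand-in for the names that infoTCGA() would supply, since the real call needs network access and its exact output is not reproduced here.

```r
# Invented stand-in for the names returned via infoTCGA().
info_names <- c("ACC-counts", "BRCA-counts", "OV-counts")

# With x named, the remaining arguments bind to pattern and replacement.
cohorts <- sub(x = info_names, "-counts", "")
print(cohorts)
```

The resulting codes have the shape expected by the cancerTypes argument.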
Value
No values. It only downloads files.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Download.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, expressionsTCGA,
heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
dir.create( 'hre')
## Not run:
downloadTCGA( cancerTypes = c('BRCA', 'OV'), destDir = 'hre',
date = tail( checkTCGA('Dates'), 2 )[1] )
## End(Not run)
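The date argument in the example above is built with tail( checkTCGA('Dates'), 2 )[1]. As a sketch of what that expression selects, the snippet below applies it to an invented vector of dates. It assumes the dates are ordered from oldest to newest, which the source does not state explicitly.

```r
# Invented release dates standing in for the result of checkTCGA('Dates');
# assumed to be sorted from oldest to newest.
dates <- c("2015-06-01", "2015-08-21", "2015-11-01")

# tail(dates, 2) keeps the last two entries; [1] takes the earlier of them,
# that is, the second-newest date.
second_newest <- tail(dates, 2)[1]
print(second_newest)
```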
Description
Function gathers expressions over multiple TCGA datasets and extracts expressions for desired
genes. See rnaseq, mRNA, RPPA, miRNASeq, methylation.
Usage
expressionsTCGA(..., extract.cols = NULL, extract.names = TRUE)
Arguments
... A data.frame or data.frames from TCGA study containing expression information.
extract.cols A character specifying the names of columns to be extracted with bcr_patient_barcode.
If NULL (by default) all columns are returned.
extract.names Logical, whether to extract names of passed data.frames in ....
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Note
Input data.frames should contain column bcr_patient_barcode if extract.cols is specified.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
## for all examples
library(dplyr)
library(tidyr)
library(ggplot2)
## RNASeq expressions
library(RTCGA.rnaseq)
expressionsTCGA(BRCA.rnaseq, OV.rnaseq, HNSC.rnaseq,
extract.cols = "VENTX|27287") %>%
rename(cohort = dataset,
VENTX = `VENTX|27287`) %>%
filter(substr(bcr_patient_barcode, 14, 15) == "01") %>% #cancer samples
ggplot(aes(y = log1p(VENTX),
x = reorder(cohort, log1p(VENTX), median),
fill = cohort)) +
geom_boxplot() +
theme_RTCGA() +
scale_fill_brewer(palette = "Dark2")
## mRNA expressions
library(tidyr)
library(RTCGA.mRNA)
expressionsTCGA(BRCA.mRNA, COAD.mRNA, LUSC.mRNA, UCEC.mRNA,
extract.cols = c("ARHGAP24", "TRAV20")) %>%
rename(cohort = dataset) %>%
select(-bcr_patient_barcode) %>%
gather(key = "mRNA", value = "value", -cohort) %>%
ggplot(aes(y = value,
x = reorder(cohort, value, mean),
fill = cohort)) +
geom_boxplot() +
theme_RTCGA() +
scale_fill_brewer(palette = "Set3") +
facet_grid(mRNA~.) +
theme(legend.position = "top")
## RPPA expressions
library(RTCGA.RPPA)
expressionsTCGA(ACC.RPPA, BLCA.RPPA, BRCA.RPPA,
extract.cols = c("4E-BP1_pS65", "4E-BP1")) %>%
rename(cohort = dataset) %>%
select(-bcr_patient_barcode) %>%
gather(key = "RPPA", value = "value", -cohort) %>%
ggplot(aes(fill = cohort,
y = value,
x = RPPA)) +
geom_boxplot() +
theme_dark(base_size = 15) +
scale_fill_manual(values = c("#eb6420", "#207de5", "#fbca04")) +
coord_flip() +
theme(legend.position = "top") +
geom_jitter(alpha = 0.5, col = "white", size = 0.6, width = 0.7)
## miRNASeq expressions
library(RTCGA.miRNASeq)
# miRNASeq has bcr_patient_barcode in rownames...
mutate(ACC.miRNASeq,
bcr_patient_barcode = substr(rownames(ACC.miRNASeq), 1, 25)) -> ACC.miRNASeq.bcr
mutate(CESC.miRNASeq,
bcr_patient_barcode = substr(rownames(CESC.miRNASeq), 1, 25)) -> CESC.miRNASeq.bcr
mutate(CHOL.miRNASeq,
bcr_patient_barcode = substr(rownames(CHOL.miRNASeq), 1, 25)) -> CHOL.miRNASeq.bcr
mutate(LAML.miRNASeq,
bcr_patient_barcode = substr(rownames(LAML.miRNASeq), 1, 25)) -> LAML.miRNASeq.bcr
mutate(PAAD.miRNASeq,
bcr_patient_barcode = substr(rownames(PAAD.miRNASeq), 1, 25)) -> PAAD.miRNASeq.bcr
mutate(THYM.miRNASeq,
bcr_patient_barcode = substr(rownames(THYM.miRNASeq), 1, 25)) -> THYM.miRNASeq.bcr
mutate(LGG.miRNASeq,
bcr_patient_barcode = substr(rownames(LGG.miRNASeq), 1, 25)) -> LGG.miRNASeq.bcr
mutate(STAD.miRNASeq,
bcr_patient_barcode = substr(rownames(STAD.miRNASeq), 1, 25)) -> STAD.miRNASeq.bcr
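The miRNASeq example above repeats the same mutate call for each cohort. One possible way to factor out that step is sketched below in base R. The helper add_barcode is hypothetical and not part of RTCGA, and the toy data frame with its row names is invented for illustration.

```r
# Hypothetical helper (not part of RTCGA): copy the first 25 characters of
# the row names into a bcr_patient_barcode column.
add_barcode <- function(df) {
  df$bcr_patient_barcode <- substr(rownames(df), 1, 25)
  df
}

# Invented stand-in for a miRNASeq data.frame keyed by row names.
toy <- data.frame(
  read_count = c(10, 20),
  row.names = c("TCGA-OR-A5J1-01A-11R-A29W-13",
                "TCGA-OR-A5J2-01A-11R-A29W-13")
)

toy_bcr <- add_barcode(toy)
print(toy_bcr$bcr_patient_barcode)
```

With such a helper, several cohorts could be processed in one pass, for example with lapply over a list of data frames.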
Description
Function creates heatmaps (geom_tile) for TCGA Datasets.
Usage
heatmapTCGA(data, x, y, fill, legend.title = "Expression", legend = "right",
title = "Heatmap of expression", facet.names = NULL, tile.size = 0.1,
tile.color = "white", ...)
Arguments
data A data.frame from TCGA study containing variables to be plotted.
x, y Character names of the variables containing groups.
fill A character name of the fill variable.
legend.title A character with the legend's title.
legend A character specifying legend position. Allowed values are one of c("top", "bottom",
"left", "right", "none"). Default is "right" (see Usage). To remove the
legend use legend = "none".
title A character with plot title.
facet.names A character of length maximum 2 containing names of variables to produce
facets. See examples.
tile.size, tile.color
A size and color passed to geom_tile.
... Further arguments passed to geom_tile.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Note
heatmapTCGA uses scale_fill_viridis from the viridis package, which is a port of the new matplotlib
color maps (viridis, the default, plus magma, plasma and inferno) to R. matplotlib (http://matplotlib.org/)
is a popular plotting library for Python. These color maps are designed to be perceptually uniform,
both in regular form and when converted to black-and-white. They are also designed to be
perceived by readers with the most common form of color blindness.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
library(RTCGA.rnaseq)
# perform plot
library(dplyr)
ACC_BLCA_BRCA_OV.rnaseq %>%
select(-bcr_patient_barcode) %>%
group_by(cohort, MET) %>%
summarise_each(funs(median)) %>%
mutate(ZNF500 = round(`ZNF500|26048`),
ZNF501 = round(`ZNF501|115560`)) -> ACC_BLCA_BRCA_OV.rnaseq.medians
heatmapTCGA(ACC_BLCA_BRCA_OV.rnaseq.medians,
"cohort", "MET", "ZNF500", title = "Heatmap of ZNF500 expression")
## facet example
library(RTCGA.mutations)
library(dplyr)
mutationsTCGA(BRCA.mutations, OV.mutations, ACC.mutations, BLCA.mutations) %>%
filter(Hugo_Symbol == 'TP53') %>%
filter(substr(bcr_patient_barcode, 14, 15) == "01") %>% # cancer tissue
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 12)) -> ACC_BLCA_BRCA_OV.mutations
# assumed definition: the object below is referenced later but not defined in this example;
# it holds all mutation records, without filtering on gene or sample type
mutationsTCGA(BRCA.mutations, OV.mutations, ACC.mutations, BLCA.mutations) -> ACC_BLCA_BRCA_OV.mutations_all
ACC_BLCA_BRCA_OV.rnaseq %>%
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 15)) %>%
filter(bcr_patient_barcode %in%
substr(ACC_BLCA_BRCA_OV.mutations_all$bcr_patient_barcode, 1, 15)) %>%
# took patients for which we had any mutation information
# so avoided patients without any information about mutations
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 12)) %>%
# stri_length(ACC_BLCA_BRCA_OV.mutations$bcr_patient_barcode) == 12
left_join(ACC_BLCA_BRCA_OV.mutations,
by = "bcr_patient_barcode") %>% #joined only with tumor patients
mutate(TP53 = ifelse(!is.na(Variant_Classification), "Mut", "WILD")) %>%
select(-bcr_patient_barcode, -Variant_Classification, -dataset, -Hugo_Symbol) %>%
group_by(cohort, MET, TP53) %>%
summarise_each(funs(median)) %>%
mutate(ZNF501 = round(`ZNF501|115560`)) -> ACC_BLCA_BRCA_OV.rnaseq_TP53mutations_ZNF501medians
Description
Function retrieves codes and counts for each cohort from the TCGA project.
Usage
infoTCGA()
Value
A list with tabular information from http://gdac.broadinstitute.org/.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Download.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA,
survivalTCGA, theme_RTCGA
Examples
infoTCGA()
library(magrittr)
(cohorts <- infoTCGA() %>%
rownames() %>%
sub('-counts', '', x=.))
Description
Function installs data packages from https://github.com/RTCGA/. Packages are listed in
datasetsTCGA.
Usage
installTCGA(packages = c("RTCGA.clinical", "RTCGA.mutations", "RTCGA.rnaseq",
"RTCGA.RPPA", "RTCGA.mRNA", "RTCGA.CNV", "RTCGA.miRNASeq", "RTCGA.PANCAN12",
"RTCGA.methylation"), build_vignettes = TRUE, ...)
Arguments
packages A character specifying the names of the data packages to be installed. By default
installs all packages.
build_vignettes
Should vignettes be built.
... Further arguments passed to install_github.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, kmTCGA, mutationsTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
## Not run:
installTCGA()
installTCGA('RTCGA.clinical')
## End(Not run)
Description
Plots Kaplan-Meier estimates of survival curves for survival data.
Usage
kmTCGA(x, times = "times", status = "patient.vital_status",
explanatory.names = "1", main = "Survival Curves", risk.table = TRUE,
risk.table.y.text = FALSE, conf.int = TRUE, return.survfit = FALSE,
pval = FALSE, ...)
Arguments
x A data.frame containing survival information. See survivalTCGA.
times The name of time variable.
status The name of status variable.
explanatory.names
Names of explanatory variables to use in survival curves plot.
main Title of the plot.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, installTCGA, mutationsTCGA, pcaTCGA, readTCGA,
survivalTCGA, theme_RTCGA
Examples
## Extracting Survival Data
library(RTCGA.clinical)
survivalTCGA(BRCA.clinical, OV.clinical, extract.cols = "admin.disease_code") -> BRCAOV.survInfo
Description
Function gathers mutations over multiple TCGA datasets and extracts mutations and further
information about them for desired genes. See mutations.
Usage
mutationsTCGA(..., extract.cols = c("Hugo_Symbol", "Variant_Classification",
"bcr_patient_barcode"), extract.names = TRUE, unique = TRUE)
Arguments
... A data.frame or data.frames from TCGA study containing mutations information
(RTCGA.mutations).
extract.cols A character specifying the names of columns to be extracted with bcr_patient_barcode.
If NULL all columns are returned.
extract.names Logical, whether to extract names of passed data.frames in ....
unique Should the output data be unique. By default it’s TRUE.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Note
Input data.frames should contain column bcr_patient_barcode if extract.cols is specified.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, installTCGA, kmTCGA, pcaTCGA, readTCGA, survivalTCGA,
theme_RTCGA
Examples
library(RTCGA.mutations)
library(dplyr)
mutationsTCGA(BRCA.mutations, OV.mutations) %>%
filter(Hugo_Symbol == 'TP53') %>%
filter(substr(bcr_patient_barcode, 14, 15) == "01") %>% # cancer tissue
mutate(bcr_patient_barcode = substr(bcr_patient_barcode, 1, 12)) -> BRCA_OV.mutations
library(RTCGA.clinical)
survivalTCGA(BRCA.clinical, OV.clinical, extract.cols = "admin.disease_code") %>%
rename(disease = admin.disease_code)-> BRCA_OV.clinical
BRCA_OV.clinical %>%
left_join(BRCA_OV.mutations,
by = "bcr_patient_barcode") %>%
mutate(TP53 = ifelse(!is.na(Variant_Classification), "Mut",
"WILDorNOINFO")) -> BRCA_OV.clinical_mutations
BRCA_OV.clinical_mutations %>%
select(times, patient.vital_status, disease, TP53) -> BRCA_OV.2plot
kmTCGA(BRCA_OV.2plot, explanatory.names = c("TP53", "disease"),
break.time.by = 400, xlim = c(0,2000))
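The example above labels each patient as "Mut" or "WILDorNOINFO" by left-joining clinical rows with mutation rows and testing whether Variant_Classification is missing. A minimal base R sketch of that labelling step follows. It uses merge in place of dplyr's left_join, and both toy tables contain invented values rather than TCGA records.

```r
# Invented clinical table: one row per patient.
clinical <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"),
  times = c(100, 250, 400),
  stringsAsFactors = FALSE
)

# Invented mutation table: only one patient has a recorded mutation.
mutations <- data.frame(
  bcr_patient_barcode = "TCGA-AA-0002",
  Variant_Classification = "Missense_Mutation",
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every clinical row, as a left join would.
joined <- merge(clinical, mutations, by = "bcr_patient_barcode", all.x = TRUE)

# Patients without a matching mutation row get NA, hence "WILDorNOINFO".
joined$TP53 <- ifelse(!is.na(joined$Variant_Classification),
                      "Mut", "WILDorNOINFO")
print(joined[, c("bcr_patient_barcode", "TP53")])
```

The label name reflects that a missing row cannot distinguish a wild-type gene from a patient with no mutation data at all.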
Description
Plots Two Main Components of Principal Component Analysis
Usage
pcaTCGA(x, group.names, title = "", return.pca = FALSE, scale = TRUE,
center = TRUE, var.scale = 1, obs.scale = 1, ellipse = TRUE,
circle = TRUE, var.axes = FALSE, alpha = 0.8, add.lines = TRUE, ...)
Arguments
x A data.frame containing, e.g., expression information. See expressionsTCGA.
group.names Names of group variable to use in labels of the plot.
title The title of a plot.
return.pca Should the pca object be returned in addition to the pca plot?
scale As in prcomp.
center As in prcomp.
var.scale As in ggbiplot.
obs.scale As in ggbiplot.
ellipse As in ggbiplot.
circle As in ggbiplot.
var.axes As in ggbiplot.
alpha As in ggbiplot.
add.lines Should axis lines be added to plot.
... Further arguments passed to prcomp.
Value
If return.pca = TRUE then a list containing a PCA plot (of class ggplot) and a pca model, the
result of prcomp function. If not, then only PCA plot is returned.
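The scale and center arguments are passed on as in prcomp, and the returned pca model is the result of prcomp. The sketch below shows only that underlying computation in base R, using the built-in iris data as a stand-in for an expression table. It does not reproduce pcaTCGA's plotting, and the list at the end only loosely mirrors the return.pca = TRUE shape, with scores in place of the ggplot object.

```r
# Built-in iris data stands in for an expression table; it is not TCGA data.
x <- iris[, 1:4]
groups <- iris$Species

# scale. and center mirror the scale and center arguments described above.
pca <- prcomp(x, scale. = TRUE, center = TRUE)

# Scores on the first two principal components, one row per observation.
scores <- data.frame(pca$x[, 1:2], group = groups)

# Share of total variance captured by each of the two components.
explained <- (pca$sdev^2 / sum(pca$sdev^2))[1:2]

# Loosely shaped like the return.pca = TRUE result, with plot data
# standing in for the plot itself.
result <- list(plot_data = scores, pca = pca)
print(round(explained, 3))
```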
ggbiplot
This function is based on https://github.com/vqv/ggbiplot, which had to be copied to RTCGA
because Bioconductor does not support remote dependencies from GitHub.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, readTCGA,
survivalTCGA, theme_RTCGA
Examples
## Not run:
library(dplyr)
## RNASeq expressions
library(RTCGA.rnaseq)
expressionsTCGA(BRCA.rnaseq, OV.rnaseq, HNSC.rnaseq) %>%
rename(cohort = dataset) %>%
filter(substr(bcr_patient_barcode, 14, 15) == "01") -> BRCA.OV.HNSC.rnaseq.cancer
pcaTCGA(BRCA.OV.HNSC.rnaseq.cancer, "cohort")
pcaTCGA(BRCA.OV.HNSC.rnaseq.cancer, "cohort", add.lines = FALSE)
pcaTCGA(BRCA.OV.HNSC.rnaseq.cancer, "cohort", return.pca = TRUE) -> pca.rnaseq
pca.rnaseq$plot
pca.rnaseq$pca
## End(Not run)
Description
The readTCGA function reads unzipped files:
• clinical data - Merge_Clinical.Level_1
• rnaseq data (genes’ expressions) - rnaseqv2__illuminahiseq_rnaseqv2
• genes’ mutations data - Mutation_Packager_Calls.Level
• Reverse phase protein array data (RPPA) - protein_normalization__data.Level_3
• Merge transcriptome agilent data (mRNA) - Merge_transcriptome__agilentg4502a_07_3__unc_edu__Level_3__u
• miRNASeq data - Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data
or "Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3
• methylation data - Merge_methylation__humanmethylation27
• isoforms data - Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_isoforms_normalized_
from the TCGA project. Those files can be easily downloaded with the downloadTCGA function. See
examples.
Usage
readTCGA(path, dataType, ...)
Arguments
path See details and examples.
dataType One of 'clinical','rnaseq','mutations','RPPA','mRNA','miRNASeq','methylation','isofor
depending on which type of data the user is reading into the tidy format.
... Further arguments passed to as.data.frame.
Details
All cohort names can be checked using: sub( x = names( infoTCGA() ),'-counts','').
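As a small illustration of what that sub() call does, the sketch below applies it to a hand-written vector instead of names( infoTCGA() ). The entries in `dataset_names` are hypothetical and only mimic the "<cohort>-counts" form that the pattern implies.

```r
# Hypothetical names in the "<cohort>-counts" form implied by the pattern;
# in practice they would come from names(infoTCGA()).
dataset_names <- c("BRCA-counts", "OV-counts", "HNSC-counts")

# x is passed by name, so the two unnamed arguments are matched to
# 'pattern' and 'replacement' in that order.
cohorts <- sub(x = dataset_names, "-counts", "")
print(cohorts)
```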
Parameter path specification:
• If dataType = 'clinical', a path to a cancerType.clin.merged.txt file.
• If dataType = 'mutations', a path to the unzipped folder Mutation_Packager_Calls.Level
containing .maf files.
• If dataType = 'rnaseq', a path to the unzipped file rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM
• If dataType = 'RPPA', a path to the unzipped file in folder protein_normalization__data.Level_3.
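For the clinical case, the path can be assembled from the unzipped folder name and the cohort name. The sketch below is one way to do that in base R. The helper `clinical_path` is hypothetical (it is not part of RTCGA), and the folder name used in the demonstration is made up to resemble an unzipped Clinical folder.

```r
# Hypothetical helper (not part of RTCGA): builds the path expected by
# readTCGA(path, 'clinical') from a data directory and a cohort name.
clinical_path <- function(data_dir, cohort) {
  # Find the unzipped folder whose name contains "_<cohort>." followed
  # later by "Clinical".
  pattern <- paste0("_", cohort, "\\..*Clinical")
  folder <- grep(pattern, list.files(data_dir), value = TRUE)
  file.path(data_dir, folder, paste0(cohort, ".clin.merged.txt"))
}

# Demonstration in a temporary directory with a made-up folder name.
tmp <- tempfile("tcga_")
dir.create(file.path(tmp, "example_BRCA.Merge_Clinical.Level_1"),
           recursive = TRUE)
path <- clinical_path(tmp, "BRCA")
print(basename(path))

# With RTCGA installed and a real file in place, the call would be:
# brca_clinical <- RTCGA::readTCGA(path, dataType = "clinical")
```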
Value
An output:
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski, <[email protected]>
Witold Chodor, <[email protected]>
See Also
RTCGA website http://rtcga.github.io/RTCGA/Download.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA,
survivalTCGA, theme_RTCGA
Examples
## Not run:
##############
##### clinical
##############
dir.create('data')
# reading datasets
sapply( c('BRCA', 'OV'), function( element ){
  folder <- grep( paste0( '(_',element,'\\.', '|','_',element,'-FFPE)', '.*Clinical'),
                  list.files('data/'),value = TRUE)
  path <- paste0( 'data/', folder, '/', element, '.clin.merged.txt')
  assign( value = readTCGA( path, 'clinical' ),
          x = paste0(element, '.clin.data'), envir = .GlobalEnv)
})
############
##### rnaseq
############
dir.create('data2')
# reading data
library(magrittr) # provides the %>% pipe used below
list.files( 'data2/') %>%
  file.path( 'data2', .) -> folder
folder %>%
  list.files %>%
  file.path( folder, .) %>%
  grep( pattern = 'illuminahiseq', x = ., value = TRUE) -> pathRNA
readTCGA( path = pathRNA, dataType = 'rnaseq' ) -> my_data
###############
##### mutations
###############
# reading data
list.files( 'data3/') %>%
  file.path( 'data3', .) -> folder
#################
##### methylation
#################
##########
##### RPPA
##########
##########
##### mRNA
##########
##############
##### miRNASeq
##############
downloadTCGA(cancerTypes = cancerType,
             dataSet = "Merge_mirnaseq__illuminaga_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3",
             destDir = "data7")
downloadTCGA(cancerTypes = cancerType,
             dataSet = "Merge_mirnaseq__illuminahiseq_mirnaseq__bcgsc_ca__Level_3__miR_gene_expression__data.Level_3",
             destDir = "data7")
##############
##### isoforms
##############
## End(Not run)
Description
Extracts survival information from clinical datasets from the TCGA project.
Usage
survivalTCGA(..., extract.cols = NULL, extract.names = FALSE,
barcode.name = "patient.bcr_patient_barcode",
event.name = "patient.vital_status",
days.to.followup.name = "patient.days_to_last_followup",
days.to.death.name = "patient.days_to_death")
Arguments
... A data.frame or data.frames from the TCGA study containing clinical information.
See clinical.
extract.cols A character specifying the names of extra columns to be extracted with survival
information.
extract.names Logical, whether to extract names of passed data.frames in ....
barcode.name A character with the name of the bcr_patient_barcode column, which differs between
TCGA releases. Defaults to the name used in the newest release date, tail(checkTCGA('Dates'),1).
event.name A character with the name of the patient.vital_status column, which differs between
TCGA releases. Defaults to the name used in the newest release date, tail(checkTCGA('Dates'),1).
days.to.followup.name
A character with the name of the patient.days_to_last_followup column, which differs
between TCGA releases. Defaults to the name used in the newest release date,
tail(checkTCGA('Dates'),1).
days.to.death.name
A character with the name of the patient.days_to_death column, which differs between
TCGA releases. Defaults to the name used in the newest release date, tail(checkTCGA('Dates'),1).
Value
A data.frame containing information about times and censoring for specific bcr_patient_barcode.
The name passed in barcode.name is changed to bcr_patient_barcode.
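To make "times and censoring" concrete, the sketch below shows one common way such columns can be derived from the four inputs named in Usage. It is an illustration under stated assumptions, not the package's implementation: the data values, the "dead"/"alive" coding, the output column names `times` and `event`, and the rule "use days to death for deceased patients, otherwise days to last follow-up" are all hypothetical.

```r
# Toy clinical table; column names follow the defaults listed in Usage,
# values are invented for illustration.
clin <- data.frame(
  patient.bcr_patient_barcode   = c("p1", "p2", "p3"),
  patient.vital_status          = c("dead", "alive", "dead"),
  patient.days_to_last_followup = c(NA, 1200, 300),
  patient.days_to_death         = c(850, NA, 410),
  stringsAsFactors = FALSE
)

# Assumed rule: event = 1 for deceased patients; time is days to death
# when the event occurred, otherwise days to last follow-up (censored).
is_dead <- clin$patient.vital_status == "dead"
surv_info <- data.frame(
  times = ifelse(is_dead,
                 clin$patient.days_to_death,
                 clin$patient.days_to_last_followup),
  bcr_patient_barcode = clin$patient.bcr_patient_barcode,
  event = as.integer(is_dead),
  stringsAsFactors = FALSE
)
print(surv_info)
```

A table of this shape, with one time and one event indicator per barcode, is what a Kaplan-Meier routine such as kmTCGA would typically consume.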
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Note
Input data.frames should contain the columns patient.bcr_patient_barcode, patient.vital_status,
patient.days_to_last_followup and patient.days_to_death, or their previous equivalents. It is
recommended to use datasets from clinical.
Author(s)
Marcin Kosinski, <[email protected]>
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA,
readTCGA, theme_RTCGA
Examples
## Extracting Survival Data
library(RTCGA.clinical)
survivalTCGA(BRCA.clinical, OV.clinical, extract.cols = "admin.disease_code") -> BRCAOV.survInfo
Description
Additional RTCGA theme for ggtheme, based on theme_pander.
Usage
theme_RTCGA(base_size = 11, base_family = "", ...)
Arguments
base_size base font size
base_family base font family
... Further arguments passed to theme_pander.
Issues
If you have any problems, issues or think that something is missing or is not clear please post an
issue on https://github.com/RTCGA/RTCGA/issues.
Author(s)
Marcin Kosinski, <[email protected]>
See Also
RTCGA website http://rtcga.github.io/RTCGA/Visualizations.html.
Other RTCGA: RTCGA-package, boxplotTCGA, checkTCGA, convertTCGA, datasetsTCGA, downloadTCGA,
expressionsTCGA, heatmapTCGA, infoTCGA, installTCGA, kmTCGA, mutationsTCGA, pcaTCGA,
readTCGA, survivalTCGA
Examples
library(RTCGA.clinical)
survivalTCGA(BRCA.clinical, OV.clinical, extract.cols = "admin.disease_code") -> BRCAOV.survInfo
kmTCGA(BRCAOV.survInfo, explanatory.names = "admin.disease_code",
xlim = c(0,4000))
Index
as.data.frame
boxplotTCGA
checkTCGA
clinical
CNV
convertPANCAN12 (convertTCGA)
convertTCGA
datasetsTCGA
downloadTCGA
ExpressionSet
expressionsTCGA
geom_boxplot
geom_tile
ggsurvplot
ggtheme
GRanges
methylation
miRNASeq
mRNA
mutations
mutationsTCGA
pcaTCGA
prcomp
readTCGA
rnaseq
RPPA
RTCGA (RTCGA-package)
RTCGA-package
scale_fill_viridis
survivalTCGA
theme_pander
theme_RTCGA
unique