Riborex
Wenzheng Li, Weili Wang, Philip J. Uren, Luiz OF Penalva, Andrew D. Smith
18 September, 2017
Introduction
Riborex is an R package for identifying differentially translated genes from Ribo-seq data. Riborex integrates
RNA-seq and Ribo-seq read count data into a single generalized linear model (GLM) and generates a
modified design matrix reflecting the integration. At its core, Riborex applies existing RNA-seq analysis tools
such as edgeR, DESeq2, and Voom to this modified design matrix to identify differential translation across
conditions.
Detailed example
First, load the Riborex library.
library(riborex)
The input to Riborex consists of two read count tables, summarized from RNA-seq and Ribo-seq data respectively.
Each read count table should be organized as a data frame in which rows correspond to genes and columns
correspond to samples, as shown below.
data(riborexdata)
RNACntTable <- rna
RiboCntTable <- ribo
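The riborex() call also takes two condition vectors, rnaCond and riboCond, with one label per column of the corresponding count table. The following is a minimal sketch of that layout on toy tables. The gene names, sample names, counts, and the "control"/"treated" labels are all invented for illustration and are not taken from riborexdata.

```r
# Toy stand-ins for the RNA-seq and Ribo-seq count tables (all values invented).
toyRNA <- data.frame(
  ctrl1 = c(120L, 0L, 35L),
  ctrl2 = c(110L, 2L, 40L),
  trt1  = c(300L, 1L, 38L),
  trt2  = c(280L, 0L, 33L),
  row.names = c("geneA", "geneB", "geneC")
)
toyRibo <- data.frame(
  ctrl1 = c(60L, 1L, 20L),
  ctrl2 = c(55L, 0L, 22L),
  trt1  = c(90L, 0L, 60L),
  trt2  = c(95L, 1L, 58L),
  row.names = c("geneA", "geneB", "geneC")
)

# One condition label per sample (column); the labels are hypothetical.
toyRnaCond  <- c("control", "control", "treated", "treated")
toyRiboCond <- c("control", "control", "treated", "treated")

# Cheap consistency checks: one label per column, and (as a conservative
# assumption) the same genes in the same order in both tables.
stopifnot(
  length(toyRnaCond) == ncol(toyRNA),
  length(toyRiboCond) == ncol(toyRibo),
  identical(rownames(toyRNA), rownames(toyRibo))
)
```

Running checks like these before the analysis tends to surface mislabelled or missing samples early.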
Once the two read count tables and the two condition vectors are ready, we can call riborex() and choose
which engine to use. If the engine option is not specified, DESeq2 is used by default. Run
help(riborex) in R for more details about this function.
res.deseq2 <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond)
The result has the same format as when DESeq2 is used for RNA-seq analysis.
res.deseq2
[Figure: histogram "DESeq2 unadjusted p-values"; x-axis: Unadjusted p-values; y-axis: Frequency]
For this dataset, the p-value distribution is what the DESeq2 manual describes as expected: roughly uniform,
with differentially expressed genes enriched at small p-values. We will show another dataset
later for which the p-value distribution is skewed to the right, and how this can be fixed with fdrtool.
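A histogram of this kind can be sketched as follows. To keep the example runnable without the package, it uses simulated p-values in place of the pvalue column of a DESeq2 result; the mixture proportions and the beta parameters are arbitrary assumptions chosen only to mimic the expected shape.

```r
set.seed(1)

# Simulated p-values standing in for res.deseq2$pvalue:
# 9000 null genes (uniform) plus 1000 genes with signal (mass near zero).
pvals <- c(runif(9000), rbeta(1000, shape1 = 0.5, shape2 = 20))

# Bin the p-values in steps of 0.05 without drawing anything yet.
h <- hist(pvals, breaks = seq(0, 1, by = 0.05), plot = FALSE)

# Under the expected shape the first bin stands above an otherwise flat profile.
print(h$counts)

# Draw the histogram only in an interactive session.
if (interactive()) {
  plot(h, main = "Simulated unadjusted p-values", xlab = "Unadjusted p-values")
}
```

With real results, the same hist() call would be applied to the unadjusted p-value column of the result object.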
You can also call summary() on the results.
summary(res.deseq2)
##
## out of 13916 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 1107, 8%
## LFC < 0 (down) : 1217, 8.7%
## outliers [1] : 0, 0%
## low counts [2] : 540, 3.9%
## (mean count < 6)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
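The up and down counts in a summary like this can be thought of as a tally over the adjusted p-values and the sign of the log fold change. The sketch below reproduces that tally on a made-up table, assuming the DESeq2 column names log2FoldChange and padj and the 0.1 threshold shown in the output; the rows themselves are invented.

```r
# Invented results table using DESeq2-style column names.
toyRes <- data.frame(
  log2FoldChange = c(1.8, -2.1, 0.2, -0.1, 1.1, NA),
  padj           = c(0.01, 0.03, 0.60, 0.95, 0.08, NA),
  row.names      = paste0("gene", 1:6)
)

alpha <- 0.1

# which() silently drops rows whose padj is NA (e.g. filtered genes).
sig <- toyRes[which(toyRes$padj < alpha), ]

nUp   <- sum(sig$log2FoldChange > 0)
nDown <- sum(sig$log2FoldChange < 0)
cat("up:", nUp, "down:", nDown, "\n")
```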
The results can be saved with:
write.table(res.deseq2, "riborex_res_deseq2.txt", quote=FALSE)
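To load a table written this way back into R, read.table() can be used. The sketch below shows the round trip on a small invented data frame written to a temporary file; the column names and values are placeholders rather than actual Riborex output.

```r
# Invented table standing in for a results object coerced to a data frame.
toySaved <- data.frame(
  log2FoldChange = c(1.5, -0.25, 0),
  padj           = c(0.01, 0.5, NA),
  row.names      = c("geneA", "geneB", "geneC")
)

outFile <- tempfile(fileext = ".txt")
write.table(toySaved, outFile, quote = FALSE)

# The header line has one fewer field than the data rows, so read.table()
# takes the first column (the gene IDs) as row names.
back <- read.table(outFile, header = TRUE)
print(back)

unlink(outFile)  # remove the temporary file
```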
To use edgeR as the engine, call riborex() as:
res.edgeR <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, "edgeR")
The result has the same format as when edgeR is used for RNA-seq analysis.
head(res.edgeR$table)
## ENSRNOG00000000024 -0.30127172 6.212404 2.13670654 0.1438103 0.4037250
## ENSRNOG00000000033 0.07178854 1.313235 0.02631769 0.8711269 0.9636097
## ENSRNOG00000000034 -0.13430329 4.029136 0.12541035 0.7232390 0.9111612
## ENSRNOG00000000036 -0.82540899 1.132478 2.15554356 0.1420562 0.4008219
## ENSRNOG00000000040 -0.19057283 -1.555003 0.07537378 0.7836675 0.9338658
With the edgeR engine, you can also estimate the dispersion of the RNA-seq and Ribo-seq data separately by
specifying the engine as “edgeRD”.
res.edgeRD <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, "edgeRD")
To use Voom as the engine, call riborex() as:
res.voom <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, "Voom")
The result has the same format as when Voom is used for RNA-seq analysis.
head(res.voom)
## ENSG00000136938.8 2996
head(RiboCntTable.corrected)
[Figure: histogram "DESeq2 unadjusted p-values"; y-axis: Frequency]
[Figure: histogram "DESeq2 unadjusted p-values after correction"; y-axis: Frequency]
Multi-factor experiment
Since we could not find any publicly available ribosome profiling data generated in a multi-factor experiment, we
generate a pseudo dataset here to demonstrate the usage of riborex in a multi-factor experiment. The pseudo
dataset has 8 samples each for RNA-seq and Ribo-seq, and includes two factors.
rna <- RNACntTable[,c(1,2,3,4,1,2,3,4)]
ribo <- RiboCntTable[,c(1,2,3,4,1,2,3,4)]
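Indexing a data frame with repeated column positions, as in the two lines above, copies those columns, so samples 5 to 8 of the pseudo dataset are exact duplicates of samples 1 to 4. A small sketch on an invented four-sample table (names and counts are placeholders) shows the effect, including how R renames the duplicated columns:

```r
# Invented four-sample count table.
toyCounts <- data.frame(
  s1 = c(5L, 9L), s2 = c(6L, 8L), s3 = c(7L, 7L), s4 = c(8L, 6L),
  row.names = c("geneA", "geneB")
)

# Repeating column positions duplicates the columns.
toyPseudo <- toyCounts[, c(1, 2, 3, 4, 1, 2, 3, 4)]

# Duplicated names are made unique with a numeric suffix.
print(colnames(toyPseudo))

# Column 5 holds the same counts as column 1.
stopifnot(ncol(toyPseudo) == 8L, identical(toyPseudo[[5]], toyCounts[[1]]))
```

Because the duplicated samples carry no new information, a dataset built this way only demonstrates the interface and is not a basis for biological conclusions.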
For a multi-factor experiment, we prepare two data frames that indicate the treatment under each factor. Here,
for the 8 samples in both the RNA-seq and Ribo-seq experiments, the 3rd and 4th samples are treated with drug1
and the 7th and 8th samples are treated with drug2.
rnaCond <- data.frame(
  factor1 = c("control1", "control1", "treated1", "treated1",
              "control1", "control1", "control1", "control1"),
  factor2 = c("control2", "control2", "control2", "control2",
              "control2", "control2", "treated2", "treated2"))
We also need to prepare a contrast that specifies the comparison we want to perform. For example, to
assess the effect of drug2, the contrast can be constructed as:
contrast = c("factor2", "control2", "treated2")
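Before running the analysis, it can help to confirm that the contrast names a column of the condition data frame and two labels that actually occur in it. The sketch below does this for the rnaCond layout described above; the helper checkContrast is hypothetical and is not part of Riborex.

```r
rnaCond <- data.frame(
  factor1 = c("control1", "control1", "treated1", "treated1",
              "control1", "control1", "control1", "control1"),
  factor2 = c("control2", "control2", "control2", "control2",
              "control2", "control2", "treated2", "treated2")
)
contrast <- c("factor2", "control2", "treated2")

# Hypothetical helper (not part of Riborex): TRUE if `contrast` is a
# (factor name, label, label) triple that matches the columns of `cond`.
checkContrast <- function(cond, contrast) {
  length(contrast) == 3L &&
    contrast[1] %in% colnames(cond) &&
    all(contrast[2:3] %in% as.character(cond[[contrast[1]]]))
}

stopifnot(checkContrast(rnaCond, contrast))
```

A mistyped factor name or label then fails immediately instead of surfacing later inside the analysis engine.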
##
## out of 13916 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 1887, 14%
## LFC < 0 (down) : 1987, 14%
## outliers [1] : 0, 0%
## low counts [2] : 270, 1.9%
## (mean count < 3)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
edgeR and edgeRD can be used in a similar way.
res.edgeR <- riborex(rna, ribo, rnaCond, riboCond, "edgeR", contrast = contrast)
Currently, Voom cannot be used as the engine in a multi-factor experiment.
Setup
This analysis was conducted in the following R session:
sessionInfo()
## [9] GenomeInfoDb_1.10.3 IRanges_2.8.2
## [11] S4Vectors_0.12.2 BiocGenerics_0.22.0
##
## loaded via a namespace (and not attached):
## [1] genefilter_1.58.1 locfit_1.5-9.1 splines_3.3.2
## [4] lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6
## [7] yaml_2.1.14 base64enc_0.1-3 blob_1.1.0
## [10] survival_2.41-3 XML_3.98-1.6 rlang_0.1.2
## [13] foreign_0.8-69 DBI_0.7 BiocParallel_1.8.2
## [16] bit64_0.9-7 RColorBrewer_1.1-2 plyr_1.8.4
## [19] stringr_1.2.0 zlibbioc_1.20.0 munsell_0.4.3
## [22] gtable_0.2.0 htmlwidgets_0.9 memoise_1.1.0
## [25] evaluate_0.10.1 latticeExtra_0.6-28 knitr_1.17
## [28] geneplotter_1.52.0 AnnotationDbi_1.38.0 htmlTable_1.9
## [31] Rcpp_0.12.12 acepack_1.4.1 xtable_1.8-2
## [34] scales_0.5.0 backports_1.1.0 checkmate_1.8.3
## [37] Hmisc_4.0-3 annotate_1.52.1 XVector_0.14.1
## [40] bit_1.1-12 gridExtra_2.3 ggplot2_2.2.1
## [43] digest_0.6.12 stringi_1.1.5 grid_3.3.2
## [46] rprojroot_1.2 tools_3.3.2 bitops_1.0-6
## [49] magrittr_1.5 lazyeval_0.2.0 RCurl_1.95-4.8
## [52] tibble_1.3.4 RSQLite_2.0 Formula_1.2-2
## [55] cluster_2.0.6 Matrix_1.2-11 data.table_1.10.4
## [58] rmarkdown_1.6 rpart_4.1-11 nnet_7.3-12