Genomics & Informatics (G&I)
eISSN 2234-0742
Genomics Inform 2015;13(4):119-125 (Vol. 13, No. 4, 2015)
http://dx.doi.org/10.5808/GI.2015.13.4.119

REVIEW ARTICLE

Analysis of Whole Transcriptome Sequencing Data: Workflow and Software
In Seok Yang, Sangwoo Kim*

Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul 03722, Korea

RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and
expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq)
approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome, and it has accelerated the
development of bioinformatics software. In this review, we introduce the routine RNA-seq workflow together with related
software, focusing particularly on transcriptome reconstruction and expression quantification.

Keywords: bioinformatics tools, gene expression, high-throughput RNA sequencing, transcript

Introduction

The transcriptome is the entire set of RNA transcripts in a given cell for a specific developmental stage or physiological condition [1]. Understanding the transcriptome is necessary for interpreting the functional elements of the genome as well as for understanding the underlying mechanisms of development and disease. Microarray technologies have been used for high-throughput, large-scale RNA-level studies, such as identifying differentially expressed genes between developmental stages or between healthy and diseased groups [2]. However, their hybridization-based nature limits the ability to catalog and quantify RNA molecules expressed under various conditions. Advances in massively parallel DNA sequencing technologies have enabled transcriptome sequencing (RNA-seq) by sequencing of cDNA. RNA-seq has rapidly replaced microarray technology because of its better resolution and higher reproducibility; this method can be used to extend our knowledge of alternative splicing events [3], novel genes and transcripts [4], and fusion transcripts [5].

One concern in applying RNA-seq is the estimation of gene-level and transcript-level abundance and of differential expression under distinct conditions. A routine RNA-seq workflow may consist of the following five steps, as shown in Fig. 1: (1) preprocessing of raw data, (2) read alignment, (3) transcriptome reconstruction, (4) expression quantification, and (5) differential expression analysis. As an initial step, RNA-seq data may be subjected to quality control (QC) of the raw data before data analysis. Similar to whole genome or exome sequencing, read alignment can be performed to map the reads to the reference genome or transcriptome. Clinical samples, including formalin-fixed paraffin-embedded specimens and cancer tissue biopsies, are often degraded or exist in limited amounts [6]. Thus, an additional QC procedure can be performed after read alignment to evaluate the performance of the RNA-seq experiment itself. Next, transcriptome reconstruction is carried out to identify all transcripts expressed in a specimen based on read mapping data. If there is no available reference sequence, this procedure can be conducted directly using a de novo assembly approach. Once all transcripts are identified, their abundances can be estimated using read mapping data. Finally, differential expression analysis is conducted using currently available programs. In this review, we discuss the RNA-seq workflow and its related bioinformatics tools in each step (Table 1), focusing on transcriptome reconstruction and abundance quantification.
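The order of the five steps, and the branch between reference-based analysis and de novo assembly, can be sketched as a small planning function. This is only an illustrative sketch: the function name `plan_workflow` and the step labels are hypothetical and do not belong to any program discussed in this review.

```python
def plan_workflow(has_reference):
    """Return the ordered analysis steps for one RNA-seq data set.

    Hypothetical helper: it only encodes the order of steps described
    in the text and does not run any program.
    """
    steps = ["preprocessing of raw data (QC, optional trimming)"]
    if has_reference:
        # Reads are mapped first; RNA-seq specific QC uses the alignments.
        steps.append("read alignment to reference genome or transcriptome")
        steps.append("RNA-seq specific QC")
        steps.append("transcriptome reconstruction (reference-guided)")
    else:
        # Without a reference, transcripts are assembled directly from reads.
        steps.append("transcriptome reconstruction (de novo assembly)")
    steps.append("expression quantification")
    steps.append("differential expression analysis")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_workflow(has_reference=True), start=1):
        print(number, step)
```

Calling `plan_workflow(has_reference=False)` returns the shorter de novo route, in which assembly replaces the alignment-based reconstruction steps.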

Received October 13, 2015; Revised December 10, 2015; Accepted December 12, 2015
*Corresponding author: Tel: +82-2-2228-0913, Fax: +82-2-2227-8129, E-mail: [email protected]
Copyright © 2015 by the Korea Genome Organization
It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/).

www.genominfo.org 119
IS Yang and S Kim. RNA-Seq Analysis Workflow and Software

Fig. 1. Typical workflow for RNA sequencing (RNA-seq) data analysis. This workflow shows an example of expression quantification and differential expression analysis at the gene and/or transcript level using RNA-seq, which typically consists of the following five steps: preprocessing, read alignment, transcriptome reconstruction, expression quantification, and differential expression analysis. Currently available programs for each step are listed in Table 1. QC, quality control.

Preprocessing of Raw Data

Similarly to whole genome or exome sequencing, RNA-seq data is formatted in FASTQ (sequence and base quality). Numerous erroneous sequence variants can be introduced during the library preparation, sequencing, and imaging steps [7], which should be identified and filtered out in the data analysis step. Thus, QC of raw data should be performed as the initial step of the routine RNA-seq workflow. Tools such as FastQC [8] and HTQC [9] can be applied in this step to assess the overall and per-base quality for each read (i.e., read 1 and 2 in case of paired-end sequencing) in each sample. Depending on the RNA-seq library construction strategy, some form of read trimming may be advisable prior to aligning the RNA-seq data. Two common trimming strategies are “adapter trimming” and “quality trimming.” Adapter trimming involves removal of the adapter sequence by masking specific sequences used during library construction. Quality trimming generally removes the ends of reads where base quality scores have decreased to a level such that sequence errors and the resulting mismatches prevent reads from aligning. The adapter trimming step is typically not necessary, as most recent sequencers provide raw data in which the adapters are already trimmed. In contrast, quality trimming may be an essential step depending on the analysis strategy used. The FASTX-Toolkit [10] and FLEXBAR [11] are useful for this purpose.

Read Alignment

There are two strategies for the read alignment step, in which either a genome or a transcriptome is used as the reference [12]. The transcriptome comprises all transcripts in a given specimen, in which splicing has already joined the exons and excluded the introns. If a transcriptome is used as a reference, unspliced aligners that do not allow large gaps may be the proper choice for accurate read mapping. Stampy, Mapping and Assembly with Quality (MAQ) [13], Burrows-Wheeler Aligner (BWA) [14], and Bowtie [15] can be used in this case. This alignment is limited to the identification of known exons and junctions because it does not identify splicing events involving novel exons. However, if the genome is used as a reference, spliced aligners that allow a wide range of gaps should be employed because reads aligned at exon-exon junctions will be split into two fragments. This approach may increase the probability of identifying novel transcripts generated by alternative splicing. Various spliced aligners have been developed, including TopHat [16], MapSplice [17], STAR [18], and GSNAP [19].

RNA-Seq Specific QC

Several intrinsic biases and limitations, including nucleotide composition bias, GC bias, and polymerase chain reaction bias, can be introduced into RNA-seq data of clinical samples with low quality or quantity. To evaluate the biases in RNA-seq data, several metrics may be examined: percentage of exonic or rRNA reads, accuracy and biases in gene expression measurements, GC bias, evenness of coverage, 5′-to-3′ coverage bias, and coverage of 5′ and 3′ ends [6]. Several programs, including RNA-SeQC [20], RSeQC [21], and Qualimap 2 [22], are currently available for this purpose and typically take a BAM file as input.

RNA-SeQC [20] provides three types of QC metrics based on read count (total, unique and duplicate reads, rRNA content, strand specificity, etc.), coverage (mean coverage, 5′/3′ coverage, GC bias, etc.), and expression correlation (reads per kilobase per million mapped reads [RPKM]-based estimation of expression levels and correlation matrix by all pairwise comparison). The software also provides multi-sample evaluation regarding library construction protocols,
input materials and other experimental parameters.

RSeQC [21] is a Python-based package that provides several metrics, including sequence quality, GC bias, polymerase chain reaction bias, nucleotide composition bias, sequencing depth, strand specificity, coverage uniformity, and read distribution over the genome structure. Of these metrics, sequencing depth is important because, by checking whether the sequencing depth is saturated, users can determine whether the current RNA-seq data is suitable for applications such as expression profiling, alternative splicing analysis, novel isoform identification, and transcriptome reconstruction.

Qualimap 2 [22] consists of four analysis modes: BAM QC, Counts QC, RNA-seq QC, and Multi-sample BAM QC. Compared to the previous release, this version focuses on multi-sample QC for high-throughput sequencing data. Multi-sample BAM QC mode allows combined QC for multiple alignment files, taking the metrics from the single-sample BAM QC mode as input. RNA-seq QC mode is added to compute metrics specific to RNA-seq data, including per-transcript coverage, junction sequence distribution, genomic localization of reads, 5′-3′ bias, and consistency of the library protocol. Counts QC mode enables estimation of the saturation of sequencing depth, read count densities, correlation of samples, and distribution of counts among classes of selected features, along with gene expression estimation based on NOISeq [23].

Transcriptome Reconstruction

Transcriptome reconstruction is the identification of all transcripts expressed in a specimen. Two strategies are used for transcriptome reconstruction: the reference-guided approach and the reference-independent approach. First, the reference-guided approach consists of two sequential steps: (1) alignment of raw reads to the reference as described in the previous section and (2) assembly of overlapping reads for reconstructing transcripts. This approach, which is employed in Cufflinks [24], Scripture [25], and StringTie [26], is advantageous when reference annotation information is well known, such as in human and mouse. Second, the reference-independent approach uses a de novo assembly algorithm to directly build consensus transcripts from short reads without a reference, which is useful when there is no known reference genome or transcriptome. Trinity [27], Oases [28], and transABySS [29] may be used for this purpose.

Two publications have described RNA-seq protocols: one is de novo transcriptome reconstruction without a reference using the Trinity platform [30], and the other is differential expression analysis of genes and transcripts using a combination of TopHat and Cufflinks [31]. The latter protocol also includes a transcriptome reconstruction procedure (using Cufflinks) from read mapping data to a reference genome (using TopHat). These protocols are good examples of different strategies that can be used for transcriptome reconstruction according to the presence or absence of a reference sequence.

Expression Quantification

Numerous methods have been developed for expression quantification using RNA-seq data. The methods are grouped into two according to the target level: gene-level and isoform-level quantification. Alternative expression analysis by sequencing (ALEXA-seq) [32], enhanced read analysis of gene expression (ERANGE) [33], and normalization by expected uniquely mappable area (NEUMA) [34] support gene-level quantification. Isoform-level quantification methods are divided into three groups according to the reference type and requirement of alignment results. The first group (e.g., RSEM [35]) requires the alignment result of reads using the transcriptome as a reference. The second group (e.g., Cufflinks [24] and StringTie [26]) also requires alignment results of reads, but uses whole genome sequences as a reference rather than the transcriptome. The last group (e.g., Sailfish [36]) uses an alignment-free method. We discuss each isoform-level quantification method in detail in the following sections.

RSEM

RSEM is software that quantifies transcript-level abundance from RNA-seq data. RSEM is operated in two steps: (1) generation and preprocessing of a set of reference transcript sequences and (2) alignment of reads to the reference transcripts followed by estimation of transcript abundances and their credibility intervals. A FASTA-formatted file of transcript sequences is used to generate the reference transcripts, which can be obtained from a reference genome database, a de novo transcriptome assembler, or an Expressed Sequence Tags (EST) database. Alternatively, a gene annotation file in GTF format and the full genome sequence in FASTA format may be supplied. RSEM uses the Bowtie alignment program [15] by default; alternatively, a user-provided aligner can be used for mapping RNA-seq reads to the reference transcripts. RSEM provides gene-level and isoform-level estimates as the primary output by computing maximum likelihood abundance estimates based on the Expectation-Maximization (EM) algorithm after read mapping. Abundance estimates are given in terms of two measures: an estimate of the number of fragments and the estimated fraction of
transcripts comprising a given isoform or gene. The latter estimates can be multiplied by 10^6 to obtain a measure of transcripts per million (TPM). RSEM also supports the visualization of alignment and read depth using a genome browser such as the University of California Santa Cruz (UCSC) Genome Browser.

Table 1. Selected list of RNA-seq analysis programs

Workflow                          Category                      Package        Reference
Preprocessing of raw data         Raw data QC                   FastQC         [8]
                                                                HTQC           [9]
                                  Read trimming                 FASTX-Toolkit  [10]
                                                                FLEXBAR        [11]
Read alignment                    Unspliced aligner             MAQ            [13]
                                                                BWA            [14]
                                                                Bowtie         [15]
                                  Spliced aligner               TopHat         [16]
                                                                MapSplice      [17]
                                                                STAR           [18]
                                                                GSNAP          [19]
RNA-seq specific quality control  -                             RNA-SeQC       [20]
                                                                RSeQC          [21]
                                                                Qualimap 2     [22]
Transcriptome reconstruction      Reference-guided              Cufflinks      [24]
                                                                Scripture      [25]
                                                                StringTie      [26]
                                  Reference-independent         Trinity        [27]
                                                                Oases          [28]
                                                                transABySS     [29]
Expression quantification         Gene-level quantification     ALEXA-seq      [32]
                                                                ERANGE         [33]
                                                                NEUMA          [34]
                                  Isoform-level quantification  Cufflinks      [24]
                                                                StringTie      [26]
                                                                RSEM           [35]
                                                                Sailfish       [36]
Differential expression           Gene-level                    NOISeq         [23]
                                                                edgeR          [39]
                                                                DESeq          [40]
                                                                SAMseq         [41]
                                  Isoform-level                 Cuffdiff       [24]
                                                                EBSeq          [42]
                                                                Ballgown       [45]

RNA-seq, RNA sequencing; QC, quality control; MAQ, Mapping and Assembly with Quality; BWA, Burrows-Wheeler Aligner; ERANGE, enhanced read analysis of gene expression; NEUMA, normalization by expected uniquely mappable area.

Cufflinks

The Tuxedo package is the most widely used software for transcript assembly and quantification using RNA-seq and consists of a number of different programs, including TopHat, Cufflinks, and Cuffdiff [31]. In the initial step, TopHat is employed for mapping raw RNA-seq reads to a reference genome, where some reads can be spliced when they are aligned on the exon-exon junctions of transcripts. These mapped reads are provided as input to Cufflinks for transcript assembly and abundance estimation. Transcript assembly is achieved by building an overlap graph from the mapped reads and then computing a minimal path cover in the overlap graph, generating a minimum number of transcripts that will explain all reads in the graph. Abundance estimation is performed by estimating the maximum likelihood abundance based on transcript coverage and compatibility together with the use of the fragment length distribution. Abundances are reported in fragments per kilobase per million mapped fragments (FPKM) for paired-end reads and RPKM for single-end reads. Cuffdiff, a part of the
Cufflinks package, also uses the mapped reads to report genes and transcripts that are differentially expressed. CummeRbund can produce figures and plots from the Cuffdiff outputs.

StringTie

StringTie is software used for transcriptome reconstruction and abundance estimation. Similarly to other tools, including Cufflinks, spliced aligners such as TopHat2 [37] or GSNAP [19] are used either to align the RNA-seq reads directly or to align contigs pre-assembled from the reads with a de novo assembler such as MaSuRCA [38]. StringTie can perform transcriptome reconstruction and abundance estimation simultaneously by building a flow network for the path of the heaviest coverage and computing the maximum flow to estimate abundance. StringTie reports estimated abundance in FPKM for paired-end reads and RPKM for single-end reads.

Sailfish

Sailfish is unique software adopting an alignment-free approach for isoform quantification. An index is built from a set of reference transcripts and a specific choice of k-mer length; the index consists of data structures that map each k-mer in the reference transcripts to a unique integer identifier, making it possible to count k-mers in a set of reads and to resolve their origin in the set of transcripts. The index does not need to be rebuilt unless the set of reference transcripts or the k-mer length is changed. Sailfish computes an estimate of the relative abundance of each transcript in the reference by employing an EM algorithm similar to that used in RSEM. Because Sailfish avoids read alignment entirely, the running time for quantification is much lower than for other existing methods. Sailfish reports three abundance measures: (1) RPKM, (2) k-mers per kilobase per million mapped k-mers (KPKM), and (3) TPM.

We described four programs, RSEM, Cufflinks, StringTie, and Sailfish, in detail. In addition to the specific algorithm used, a major difference between these programs may be the reference type used. A set of transcript sequences is used as a reference in RSEM and Sailfish, indicating that these programs may be suitable for estimating the abundance of known transcripts. In contrast, a reference genome is employed in Cufflinks and StringTie, making it possible to present the estimated abundance of novel transcripts as well as already known transcripts, as spliced read mapping data can reveal known and novel splice junction information simultaneously.

Differential Expression using RNA-seq

For differential expression analysis, a number of software packages and pipelines have been developed, including edgeR [39], DESeq [40], NOISeq [23], SAMseq [41], Cuffdiff [24], and EBSeq [42]. Unlike edgeR and DESeq, which adopt negative binomial models, and NOISeq and SAMseq, which are non-parametric, Cuffdiff and EBSeq can be used to compare differentially expressed genes by employing transcript-based detection methods. Many of the programs accept read count data as input, which can be produced by using HTSeq [43] or BEDTools [44]. Similarly to Cuffdiff, the Ballgown program [45] is employed for differential expression analysis using read mapping data from StringTie [26] (https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual). The above programs adopt one or more of the several available normalization methods (total count, upper quartile, median, DESeq normalization, trimmed mean of M values, quantile, and RPKM normalization) to correct biases that may appear between samples (sequencing depth [33]) or within a sample (gene length [46] and GC content [45]).

Although many programs have been developed, one research group reported that there may be large differences between these programs and that no single method may be optimal under all experimental conditions [48]. Thus, it may be difficult for most users with little or no statistical background to select a proper method. However, because RNA-seq data sets are rapidly accumulating, we expect that new bioinformatics tools for differential expression will be developed that function robustly under a wide range of conditions.

Conclusion

Numerous bioinformatics programs have been developed for RNA-seq data analysis. Even tools developed for the same purpose are based on distinct approaches using different algorithms and models. The diversity of the methodology makes it possible to customize analysis protocols by choosing a program that provides the best fit to each specific goal. In this review, we described the routine RNA-seq analysis workflow, focusing on transcriptome reconstruction and expression quantification, and also introduced its related bioinformatics programs. We therefore expect that this review will be helpful for preparing a specific pipeline for RNA-seq data analysis and for designing new biological experiments.
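Three of the normalization methods named in the differential expression section (total count, upper quartile, and RPKM) can be illustrated with a minimal Python sketch. The function names, the toy counts, and the choice to take the upper quartile over the non-zero counts of a single sample are assumptions made for illustration; the packages cited in this review implement their own, more careful variants.

```python
from statistics import quantiles


def total_count_normalize(counts):
    """Scale raw read counts of one sample to counts per million reads."""
    library_size = sum(counts)
    return [c * 1e6 / library_size for c in counts]


def upper_quartile_normalize(counts):
    """Divide counts by the 75th percentile of the sample's non-zero counts."""
    nonzero = [c for c in counts if c > 0]
    # quantiles(..., n=4) returns the three quartile cut points.
    upper_quartile = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / upper_quartile for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    library_size = sum(counts)
    return [c * 1e9 / (length * library_size)
            for c, length in zip(counts, lengths_bp)]


if __name__ == "__main__":
    counts = [10, 0, 30, 60]            # toy read counts for four genes
    lengths = [1000, 500, 2000, 3000]   # toy gene lengths in base pairs
    print(total_count_normalize(counts))
    print(upper_quartile_normalize(counts))
    print(rpkm(counts, lengths))
```

In this sketch, total count and upper quartile scaling address differences in sequencing depth between samples, whereas RPKM additionally divides by gene length and so also addresses the within-sample length bias mentioned in the text.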


Acknowledgments

This work was supported by the Bio-Synergy Research Project (NRF-2014M3A9C4066449) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.

References

1. Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 2011;12:87-98.
2. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008;18:1509-1517.
3. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature 2008;456:470-476.
4. Denoeud F, Aury JM, Da Silva C, Noel B, Rogier O, Delledonne M, et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol 2008;9:R175.
5. Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 2009;458:97-101.
6. Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods 2013;10:623-629.
7. Robasky K, Lewis NE, Church GM. The role of replicates for error mitigation in next-generation sequencing. Nat Rev Genet 2014;15:56-62.
8. Babraham Bioinformatics. FastQC. Cambridgeshire: Babraham Institute, 2015. Accessed 2015 Nov 2. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
9. Yang X, Liu D, Liu F, Wu J, Zou J, Xiao X, et al. HTQC: a fast quality control toolkit for Illumina sequencing data. BMC Bioinformatics 2013;14:33.
10. FASTX-Toolkit. Cold Spring Harbor: Cold Spring Harbor Laboratory, 2015. Accessed 2015 Nov 2. Available from: http://hannonlab.cshl.edu/fastx_toolkit/.
11. Dodt M, Roehr JT, Ahmed R, Dieterich C. FLEXBAR-flexible barcode and adapter processing for next-generation sequencing platforms. Biology (Basel) 2012;1:895-905.
12. Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods 2011;8:469-477.
13. Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008;18:1851-1858.
14. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754-1760.
15. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009;10:R25.
16. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009;25:1105-1111.
17. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 2010;38:e178.
18. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013;29:15-21.
19. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 2010;26:873-881.
20. DeLuca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C, et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 2012;28:1530-1532.
21. Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics 2012;28:2184-2185.
22. Okonechnikov K, Conesa A, Garcia-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 2015 Oct 1 [Epub]. http://dx.doi.org/10.1093/bioinformatics/btv566.
23. Tarazona S, Furio-Tari P, Turra D, Pietro AD, Nueda MJ, Ferrer A, et al. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res 2015;43:e140.
24. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010;28:511-515.
25. Guttman M, Garber M, Levin JZ, Donaghey J, Robinson J, Adiconis X, et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 2010;28:503-510.
26. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 2015;33:290-295.
27. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 2011;29:644-652.
28. Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 2012;28:1086-1092.
29. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, et al. De novo assembly and analysis of RNA-seq data. Nat Methods 2010;7:909-912.
30. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 2013;8:1494-1512.
31. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 2012;7:562-578.
32. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, et al. Alternative expression analysis by RNA sequencing. Nat Methods 2010;7:843-847.
33. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008;5:621-628.
34. Lee S, Seo CH, Lim B, Yang JO, Oh J, Kim M, et al. Accurate quantification of transcriptome from RNA-Seq data by effective length normalization. Nucleic Acids Res 2011;39:e9.
35. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011;12:323.
36. Patro R, Mount SM, Kingsford C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat Biotechnol 2014;32:462-464.
37. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013;14:R36.
38. Zimin AV, Marcais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. The MaSuRCA genome assembler. Bioinformatics 2013;29:2669-2677.
39. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010;26:139-140.
40. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol 2010;11:R106.
41. Li J, Tibshirani R. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Stat Methods Med Res 2013;22:519-536.
42. Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 2013;29:1035-1043.
43. Anders S, Pyl PT, Huber W. HTSeq: a Python framework to work with high-throughput sequencing data. Bioinformatics 2015;31:166-169.
44. Quinlan AR. BEDTools: The Swiss-Army tool for genome feature analysis. Curr Protoc Bioinformatics 2014;47:11.12.1-11.12.34.
45. Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat Biotechnol 2015;33:243-246.
46. Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 2009;4:14.
47. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 2010;464:768-772.
48. Seyednasrollah F, Laiho A, Elo LL. Comparison of software packages for detecting differential expression in RNA-seq studies. Brief Bioinform 2015;16:59-70.

www.genominfo.org 125

