
Cell. Mol. Life Sci. (2010) 67:569–579
DOI 10.1007/s00018-009-0180-6
Cellular and Molecular Life Sciences
REVIEW

RNA-seq: from technology to biology


Samuel Marguerat • Jürg Bähler

Received: 23 July 2009 / Revised: 11 September 2009 / Accepted: 8 October 2009 / Published online: 27 October 2009
© The Author(s) 2009. This article is published with open access at Springerlink.com

S. Marguerat • J. Bähler (✉)
Department of Genetics, Evolution and Environment, UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
e-mail: [email protected]

Abstract Next-generation sequencing technologies are now being exploited not only to analyse static genomes, but also dynamic transcriptomes in an approach termed RNA-seq. Although these powerful and rapidly evolving technologies have only been available for a couple of years, they are already making substantial contributions to our understanding of genome expression and regulation. Here, we briefly describe technical issues accompanying RNA-seq data generation and analysis, highlighting differences from array-based approaches. We then review recent biological insight gained from applying RNA-seq and related approaches to deeply sample transcriptomes in different cell types or physiological conditions. These approaches are providing fascinating information about transcriptional and post-transcriptional gene regulation, and they are also giving unique insight into the richness of transcript structures and processing on a global scale and at unprecedented resolution.

Keywords High-throughput sequencing · Transcriptional control · Non-coding RNA · Post-transcriptional control · Gene expression · Splicing · Transcriptome · Genome

Introduction

Regulation of gene expression is fundamental to link genotypes with phenotypes. The synthesis and maturation of RNAs are tightly controlled, and they shape complex gene expression networks that ultimately drive biological processes. These networks need to be robust as well as highly plastic in order to allow rapid adaptation to environmental or genetic perturbations [1]. An in-depth understanding of the principles and mechanisms governing these complex gene expression programmes is important to better understand complex diseases such as cancer. For more than 10 years, microarrays have allowed the simultaneous monitoring of expression levels of all annotated genes in cell populations [2, 3]. The ability to analyse entire gene expression programmes has opened new horizons for our understanding of global processes regulating gene expression. Similarly, with the increasing realisation that RNAs transcribed from non-coding portions of genomes play fundamental roles, genome-wide approaches have provided valuable insights into this aspect of transcriptomes. Later generations of microarrays (referred to as “tiling arrays”), which consist of probes designed to interrogate a genome systematically irrespective of any gene annotation, have been instrumental in discovering unknown transcripts [4]. Applying this technique to several different organisms has demonstrated that the complexity of transcriptomes has indeed been vastly underestimated [5]. It was at this point that next-generation sequencers entered the market. These platforms allow the rapid and cost-effective generation of massive amounts of sequence data. This breakthrough has huge potential to revolutionise the field of transcriptomics. Even though direct sequencing of cDNA libraries had been achieved before with SAGE [6] and MPSS [7] approaches, next-generation sequencing (NGS) technologies are more straightforward and more affordable. RNA-seq was thus born [8–11].

In this review, we will first provide an overview of the strengths and challenges inherent to RNA-seq and will then
highlight major biological insights gained from RNA-seq in a wide range of organisms.

RNA-seq data generation and analysis

The NGS market is currently dominated by three different platforms: the FLX pyrosequencing system from 454 Life Sciences (a Roche company), the Illumina Genome Analyser (developed initially by Solexa), and the AB SOLiD system (now Life Technologies). On all three platforms, DNA fragments are sequenced in parallel, producing large numbers of relatively short sequence “reads” or “tags”. The throughput varies from hundreds of thousands of reads for the FLX system to hundreds of millions of reads for the Illumina Genome Analyser and AB SOLiD systems. Read lengths range from 30–100 bp for Illumina and SOLiD to 200–500 bp for FLX. It is important to note that these technologies are evolving at a tremendous pace, with ever-increasing numbers and lengths of sequence reads. The three major systems differ significantly in the approaches used to produce massive amounts of sequence. An in-depth discussion of the technical and methodological aspects of these next-generation sequencers is beyond the scope of this review and can be found elsewhere [12, 13].

Despite their technological differences, the three major platforms rely on similar workflows for the production and analysis of sequencing libraries (Fig. 1). First, the sample nucleic acids have to be sheared to reach a size compatible with sequencing (typically <500 bp). Second, DNA adapters containing unique sequences are attached at both ends of the sheared DNA molecules. These adapters subsequently allow the DNA fragments to be singled out, either on beads or on a slide (“flowcell”), to then be sequenced in parallel.

The library preparation is a key step of RNA-seq, because it determines how closely the cDNA sequence data reflect the original RNA population. In the classic NGS protocols, which were developed for the analysis of genomic DNA, adapters are ligated onto sheared double-stranded DNA fragments. To allow the analysis of transcriptomes by NGS, these protocols have been adapted to the sequencing of cDNA. The most straightforward approach is simply to synthesise double-stranded cDNA, to which the adapters can then be ligated. This robust protocol has been attractive because it applies the procedures developed by the manufacturer for the analysis of genomic DNA, and it was widely used in the original RNA-seq studies. A substantial drawback of this approach, however, is the loss of information on transcriptional direction, because the adapter is ligated to double-stranded cDNA. An elegant study has managed to maintain strand information simply by pre-treating the RNA samples with sodium bisulphite [14]. This chemical converts cytidine into uridine; the resulting widespread C–T transitions therefore “mark” the coding strand of each transcript. Six additional RNA-seq protocols that maintain strand specificity have been published. They differ in how the adapter sequences are inserted into the cDNA, which is achieved (1) by direct ligation of RNA adapters to the RNA sample before reverse transcription [15, 16], (2) by addition of the adapter sequences by template switching during reverse transcription [17], (3) by double-random priming coupled to solid-phase extraction [18], (4) by direct ligation of the DNA adapters to single-stranded cDNA [19–21], (5) by reverse transcription of in vitro polyadenylated RNA fragments followed by intramolecular ligation [22], or (6) by incorporation of dUTP during second-strand synthesis and digestion with uracil-N-glycosylase [23]. These methods probably differ in the biases they introduce into the data, and careful comparisons between them will be highly informative.

NGS technologies exploit light that is emitted when the correct base (or oligonucleotide, in the case of SOLiD) matches the template being sequenced and is incorporated into the sequencing reaction. Thus, NGS raw outputs are image records of the light emitted by every single parallel sequencing reaction at every sequencing cycle. These raw image files represent terabytes of data and require substantial storage resources. The images are then processed to extract numerical signals for every base at every synthesis event from all the parallel reactions. These signals are used for base calling. Improvements in the quality and reliability of signal extraction and base calling have led to significant increases in the quality and throughput of NGS data [24–26].

After image and signal processing, NGS data consist of a list of short sequences together with their base call qualities. These data are fundamentally different from microarray data. With hybridisation-based techniques, the scanner returns signal intensities for each probe on the array. In the case of RNA-seq data, the signal is the number of reads mapping to any given region of the genome. Besides providing single-base-pair resolution, sequencing gives full control over which reads are included in the final analysis and hence contribute to the expression signals. RNA-seq data are thus countable and digital in nature. The generation of reliable RNA-seq data therefore relies heavily on proper mapping of sequencing reads to the corresponding reference genomes or on their efficient de novo assembly. Mapping NGS reads with high efficiency and reliability currently faces several challenges. First, the computing resources required to map huge numbers of short reads within a reasonable time can be limiting. However, tremendous effort has been invested during the last couple of years to
develop algorithms that allow mapping of millions of short reads using limited computing resources and time [27–33]. The second challenge arises from the relatively high error rate of NGS data, which means that non-perfect matches have to be considered when mapping reads back to a genome. This issue is particularly relevant when single nucleotide polymorphisms (SNPs) are of interest, for example to detect allele-specific expression in RNA-seq data. Distinguishing sequencing errors from SNPs requires higher sequencing depths, so that correct base calls can be made at each position, even in heterozygous samples, because each base is sequenced multiple times. Analysis protocols have been developed for the detection of genetic variation at a reasonable sequencing depth and hence at affordable costs [34]. Library preparation and/or sequencing procedures can also introduce systematic biases and artefacts, such as over-amplification of GC-rich regions and generation of duplicate sequences [35]. A third challenge, which is also one of the most exciting features of RNA-seq data, is to identify reads containing post-transcriptionally modified or rearranged sequences that cannot be mapped directly to the reference genome. This feature will be discussed in more detail below. Finally, when no good-quality reference genome is available, direct de novo assembly of RNA-seq data into contigs may be useful. Several assemblers optimised for short sequence reads have recently been developed [36–45].

Once the sequencing reads have been filtered and mapped (or assembled), it is possible to compute an expression score for every base in the genome and thus obtain transcriptome maps at the best possible resolution. The true resolution of this approach, however, depends on the sequence coverage and therefore on the amount of sequence generated. Sequence coverage can be a limiting factor, especially when large genomes are analysed, because of the costs and machine time required.

Fig. 1 Flowchart of a typical RNA-seq experiment

Applying RNA-seq to probe the breadth and depth of genome transcription

The use of NGS technologies for the analysis of RNA was pioneered by researchers working with small regulatory RNAs, possibly because this field has benefited less from microarrays: small RNAs are usually too short to be captured adequately with the limited resolution provided by microarrays. Sequencing of short regulatory RNAs has resulted in important and exciting papers, which have been extensively reviewed elsewhere [46, 47]. Whole-transcriptome studies using RNA-seq emerged soon after. To date, transcriptomes have been sequenced for over a dozen organisms, including human [14, 16, 18–20, 48–55], mouse [17, 23, 56–58], budding yeast [22, 23, 59–62], fission yeast [63], worm [64], fruit fly [65],
non-model organisms [66, 67], several plants [15, 68–71] and prokaryotes [21, 72, 73]. Unlike the genome, the transcriptome changes dynamically in response to the environment or to intrinsic programmes, and many studies have reported transcriptome sequences for several cell types or physiological conditions.

The countable, almost digital, nature of RNA-seq data makes them particularly attractive for the quantitative analysis of transcript expression levels. Nearly every RNA-seq study published to date has addressed this question, and they agree that RNA-seq data are highly quantitative and give reliable measurements of transcript levels in one or more conditions. The dynamic range of these data is theoretically limited only by the sequencing depth and has been reported to span at least five orders of magnitude [58]. This dynamic range is well beyond the range achieved by microarrays and close to the estimated range of transcript frequencies in the cell. A few studies have also examined the ability of RNA-seq to measure differential gene expression [51, 57, 61]. These studies agree that RNA-seq performs at least as well as microarrays, provided the sequencing depth is adequate. RNA-seq has the added advantage that, besides differential transcript levels, the levels of different splice variants or of transcripts with different UTR lengths can be assessed at the same time (see below). Producing enough reads for accurate quantification of lowly expressed transcripts, however, can still be quite expensive for large transcriptomes. In a variant of RNA-seq, only short tags at the 3′ ends of transcripts are sequenced. This assay permits the measurement of even lowly expressed transcripts with a limited number of sequencing reads [57, 74].

Besides this quantitative aspect, RNA-seq studies are enabling researchers to refine transcript annotation, for instance by providing accurate maps of transcript start and end sites. This feature is particularly helpful for dense prokaryotic genomes, allowing confident discrimination between single-gene transcriptional units and operons encompassing several genes [72]. The analysis of transcript structures is also fundamental for the study of complex diseases such as cancer. Genomic rearrangements or mutations can generate aberrant fusion transcripts which, if stably expressed, can lead to pathologies. Such gene fusions have been shown to be commonly associated with different types of tumours [75]. Direct sequencing of transcriptomes, coupled with analysis pipelines allowing the detection of sequence rearrangements and abnormal transcript structures, is a powerful tool that permits direct detection of such fusion events. Several studies have already provided proof of principle that this approach is suitable for discovering new aberrant transcripts [19, 50]. This technological breakthrough should therefore help advance our understanding of complex diseases.

Another characteristic of RNA-seq data is their high sensitivity, which allows the detection of substantially more expressed transcripts in a given cell type than could be detected by microarrays. In all organisms studied, RNA-seq has also expanded the list of expressed transcripts, most of these newly defined transcripts being non-coding. A high-coverage RNA-seq study of the fission yeast (Schizosaccharomyces pombe) transcriptome during vegetative growth revealed that over 94% of the genome is actively transcribed at some level, including genes required only under specialised physiological conditions [63]. This finding could reflect a small percentage of cells in the population expressing a different transcriptional programme [72], or it could reflect a certain amount of basal background transcription. The latter would be compatible with the suggestion that as much as 90% of all RNA polymerase II (Pol II) initiation events represent transcriptional noise, and it raises the question of the biological relevance of almost ubiquitous noisy transcription [76].

RNA-seq has also been used to dig deep into eukaryotic transcriptomes and reveal an intriguing new feature of eukaryotic transcription at promoters. Cryptic unstable transcripts (CUTs) are small RNA Pol II transcripts found in budding yeast (Saccharomyces cerevisiae) that are targeted for degradation by the exosome complex immediately after synthesis [77]. While the mechanisms regulating their processing have been extensively studied, the prevalence of CUTs in the yeast genome had remained unknown. Two studies have determined the genome-wide distributions and structures of CUTs [78, 79], one using NGS to sequence a SAGE library enriched for CUTs and the other using high-density tiling arrays. Interestingly, CUTs seem to be well-defined transcriptional units arising mostly from nucleosome-free regions (NFRs). NFRs are characteristic of eukaryotic genomes and are found mainly in the promoters and terminators of genes [80]. A fraction of CUTs overlap the 5′ ends of genes, suggesting a potential regulatory function. However, CUTs are most frequently transcribed in divergent orientation from the promoters of genes, suggesting that they could be by-products of Pol II-dependent transcription [78, 79]. These data suggest that bidirectional transcription is a widespread characteristic of eukaryotic promoters. In budding yeast, stable transcripts arising from bidirectional transcription can also be detected, suggesting that this phenomenon is not restricted to cryptic transcripts [79]. Interestingly, these transcripts show extensive overlaps with annotated genes. A possible regulatory role of bidirectional transcription remains to be determined, but some data suggest that divergent transcripts could act as transcriptional “links” between neighbouring genes and potentially regulate their co-expression [79]. Bidirectional transcription seems to be
a conserved characteristic, as it can also be detected in multicellular eukaryotes. Transcripts similar to yeast CUTs have been detected after inactivation of the exosome in human cells. These so-called “promoter upstream transcripts” (PROMPTs) are mostly transcribed from promoters of active genes in both directions [81]. As in yeast, stable transcripts mapping to both strands of promoters can also be detected in metazoans [16, 82–84]. A similar class of short transcripts, 20–90 nucleotides in length, has been found in mouse ES cells, up- and downstream of transcription start sites (TSSs) [82]. Interestingly, these short divergent transcripts are not enriched in terminator or intergenic regions. Analysis of histone marks around these transcripts has revealed that marks associated with transcription elongation are present on the gene sequences but not in the antisense direction, suggesting that productive elongation occurs mostly downstream of the TSS. In this context, it is possible that these short RNAs mark regions of Pol II pausing [82]. A similar picture emerged in human fibroblasts, where nascent RNAs have been sequenced using NGS technology, providing an overview of the distribution of Pol II engaged in transcription at a given time [16]. This study concludes that a large amount of Pol II is paused shortly after initiation. In addition, engaged Pol II has been detected in divergent orientation relative to genes. However, the lack of sequencing reads further upstream indicates that divergent Pol II does not productively elongate transcripts [16]. These findings suggest that regulation of transcript elongation participates in the control of gene expression. In summary, bidirectional transcription at promoters seems to be a widespread phenomenon conserved across evolution. Further investigation will now be required to understand what portion of these divergent transcription events represents useless by-products of transcription initiation and what portion plays regulatory roles.

Applying RNA-seq to interrogate post-transcriptional gene regulation

Post-transcriptional regulation is a fundamental part of gene expression, which may well match transcriptional control in importance and sophistication. It includes the control of alternative splicing and polyadenylation, RNA editing, RNA degradation and translation. With the possible exception of translational control, these processes involve the modification of transcript sequences or structures. The sequences of the processed RNA molecules can therefore differ substantially from the corresponding genome sequences. Our understanding of the sequence motifs governing post-transcriptional control improves steadily but does not yet allow prediction of mRNA processing events based on the genomic sequence alone. Techniques allowing global characterisation of post-transcriptional sequence alterations and rearrangements are therefore required. High-density tiling arrays are only partially suited for the analysis of post-transcriptional structural changes, because their probe design cannot capture sequences that are either not encoded in the genome, as in the case of editing, or not adjacent in the genome, as in the case of splicing. These limitations could in principle be circumvented by designing additional sets of probes for the array, but this requires high-quality annotation. RNA-seq, on the other hand, is particularly well suited for the study of mRNA processing, as it generates transcript sequence data from a library independently of the organism’s genome sequence. In the case of RNA splicing, for instance, where tiling arrays require the design of special sets of probes, sequencing relies only on an appropriate mapping strategy able to retrieve reads containing non-adjacent sequences (Fig. 2a). Several strategies have been developed for this purpose. In one approach, the set of reads that does not map properly to the reference genome can subsequently be mapped against a reference sequence library containing all known or predicted exon–exon junctions. Sequencing reads mapping across exon–exon junctions (often called “trans-reads”) are diagnostic for post-transcriptional rearrangements. While quite straightforward and flexible, this approach is limited when it comes to discovering new, unannotated splice junctions. Alternatively, a reference sequence library of all possible splice junctions, instead of all known splice junctions, could be used for mapping. This approach would permit the discovery of new splicing events. In another approach, sequencing reads are either mapped allowing gaps in the alignment or split in two before mapping both halves back separately to the reference genome. Reads whose two halves do not map next to each other point to a post-transcriptional rearrangement or splicing event. This approach is potentially extremely powerful, as it does not rely on any genome annotation. However, it requires sequencing reads long enough to be mapped confidently even when split in two.

In addition to mapping the sites of post-transcriptional rearrangements, trans-reads provide a quantitative measurement of the levels of different transcript isoforms. Furthermore, the number of trans-reads at a given exon–exon junction relative to the number of reads spanning the corresponding exon–intron junctions provides a measure of the splicing efficiency at this junction. This feature has been exploited to sample splicing efficiencies across all introns and genes under different conditions in fission yeast [63]. A fourth strategy takes advantage of so-called paired-end sequencing. NGS sequencers have been upgraded to allow sequencing of both ends of each DNA fragment in the library. In this case, the data consist of two sequencing reads per DNA fragment. The distance between the two reads is
known, as it equals the fragment size of the library. This development has been critical, for example, in making it much easier to map short reads in low-complexity regions [85]. For the analysis of post-transcriptional rearrangements by RNA-seq, paired reads that map much closer to or farther apart from each other than the insert size of the library can point to rearrangements. While compatible with short reads and not reliant on any prior knowledge about regulatory motifs or genome annotation, this fourth approach does not provide direct base-pair mapping of the junction. An advantage of the first three strategies described above is that the exact coordinates of the splice junction or rearrangement point can be identified.

Fig. 2 Detection of post-transcriptional modifications and rearrangements by RNA-seq. a Reads spanning exon–exon junctions give positive evidence for splicing events (trans-reads in red). Comparing the number of trans-reads for a selected junction to the number of reads spanning its corresponding exon–intron junctions (blue) gives a measure of splicing efficiency. b Reads containing poly(A) tracts that are not encoded in the reference genome are diagnostic of polyadenylation events. c Reads containing sequence differences compared with the reference genome are potential polymorphisms or editing sites

Analysis of alternative splicing by RNA-seq has recently been performed on several human tissues [48, 49, 56] and cell lines [48, 55]. The ability to sample every possible splice isoform globally has uncovered much more alternative splicing in human tissues than previously estimated. Considering different tissues, as many as 95% of human multi-exon genes have been found to undergo alternative splicing, with exon skipping being the most frequent form of regulation [48, 49]. These results considerably increase previous estimates, which suggested that about two-thirds of human genes are differentially spliced [86]. Importantly, for 92% of genes, the second most frequent isoform has a relative frequency above 15%, indicating that in most cases several isoforms of the same transcript reach substantial levels of expression [48]. Isoforms differ mostly between tissues, whereas variation between individuals is two- to threefold less common [48]. This finding indicates that alternative splicing is an almost universal mode of tissue-specific gene regulation. Extreme “switch-like” behaviours, where two isoforms are mutually exclusive in two distinct tissues, have also been detected [48]. In these cases, alternative splicing can produce different proteins in different contexts. Interestingly, “switch-like” exons are characterised by conserved regulatory motifs [48]. Different spliced isoforms can also occur together in the same tissues. An interesting study has applied RNA-seq to analyse the transcriptome of single mouse cells [56]. The authors report 335 genes that display multiple isoforms in a single blastomere, indicating that alternative splicing can also increase the diversity of the transcriptome of a single cell during embryonic development. Similar analyses performed in fission and budding yeasts have provided interesting insights into how simpler unicellular eukaryotes exploit alternative splicing as a mode of post-transcriptional regulation [59, 63]. In fission yeast, intron retention seems to be the main event detected during sexual differentiation. This finding has confirmed and extended observations from smaller-scale studies [87]. In addition, global splicing efficiencies and transcript expression levels seem to be positively correlated during vegetative growth and sexual differentiation, suggesting coordination between transcription and splicing [63]. A recent RNA-seq study in budding yeast has uncovered many alternative isoforms showing differential expression between vegetative growth and the response to heat shock [60]. Interestingly, some of these isoforms possibly code for proteins of
different lengths. Taken together, these data show that regulation of splicing is also used by unicellular eukaryotes to control and diversify gene expression. Finally, bioinformatics tools that extract the respective expression levels of different transcript isoforms from RNA-seq data are becoming available and will help to refine the global picture of alternative splicing in eukaryotes [88, 89].

A related mechanism by which transcript diversity can be increased is the use of alternative polyadenylation sites. RNA-seq is particularly well suited to studying polyadenylation, as it allows direct sequencing of the junctions between poly(A) tails and the rest of the transcript (Fig. 2b). This approach makes it possible to disentangle several isoforms with alternative polyadenylation sites in a single sample. For example, human cells show a strong correlation between alternative splicing and alternative polyadenylation across tissues, suggesting coordination between these two processes [48]. Interestingly, alternative introns and 3′ untranslated regions (UTRs) share common regulatory motifs, suggesting that they also share regulatory factors [48].

Transcriptome diversity can also be increased by editing of mRNA transcripts. This process involves deamination of adenosines into inosines, which are then read as guanosines. Editing is critical for brain function in mammals and linked to several diseases [90]. However, the extent of this phenomenon has remained elusive. Direct sequencing of transcriptomes is the method of choice to determine how prevalent this mode of post-transcriptional regulation is (Fig. 2c). Indeed, a pioneering RNA-seq analysis of human brain and other tissues has revealed hundreds of new editing sites, many of which are located in non-coding RNAs [91].

Information about protein–RNA interactions is fundamental for understanding the regulatory networks governing the different layers of post-transcriptional control. Predicting protein–RNA binding sites is difficult, not least because of the relatively low sequence conservation of RNA-binding motifs. Protein–RNA interactions can be mapped directly, however, using approaches similar to the chromatin immunoprecipitation technique used to identify protein–DNA interactions [92]. This can be achieved in two ways: (1) RNA-binding proteins are immunoprecipitated together with their intact target transcripts (RIP) [93], or (2) RNA-binding proteins are crosslinked to the RNAs they interact with and treated with RNase before immunoprecipitation (CLIP, for crosslinking immunoprecipitation) [94]. This second approach limits the analysis to RNA fragments protected by the binding protein and is reminiscent of a footprint. The immunoprecipitated RNAs eventually need to be identified using either single-gene […]

Several CLIP-seq (also called HITS-CLIP, for high-throughput sequencing CLIP) studies have analysed the binding patterns of human splicing regulators in different cell types and tissues [96–98]. For example, analysis of the binding patterns of the neuron-specific splicing factor Nova has demonstrated that its binding to introns determines the outcome of alternative splicing, while its binding to 3′-UTRs can regulate alternative polyadenylation [97]. RIP-seq and CLIP-seq have also been used to characterise Ago–RNA complexes in mouse, human and fission yeast [99–101]. The Ago protein binds small RNAs to form a core RNA silencing complex. Sequencing the populations of microRNAs (miRNAs) and mRNAs bound to Ago proteins in the mouse brain has allowed direct identification of miRNAs expressed in vivo and of their potential target transcripts [99]. RIP-seq with Ago has led to the discovery of a new class of small RNAs in humans, originating from small nucleolar RNAs (snoRNAs), which can function like miRNAs [100].

Ribosomes are ribonucleoprotein complexes mediating the translation of RNA transcripts into proteins and are probably the most abundant RNA-binding proteins in the cell. Studying the number and position of ribosomes bound to transcripts globally can provide important information about the regulation of translation. To this end, total cellular RNA is fractionated based on the number of associated ribosomes (“polysome profiling”) [102]. This technique has provided information on basic properties of the translation process. NGS technologies, with their ability to detect the exact sequence of short RNA molecules, have now enabled a transition from genome-wide polysome profiling to genome-wide ribosome footprinting [22]. Like the CLIP method outlined above, this approach is based on the isolation of short RNA fragments that are occupied by ribosomes and hence protected from degradation by an endonuclease. It permits the measurement not only of the number of ribosomes associated with different transcripts but also of their exact positions along the RNA molecules. This method, termed “ribosome profiling”, has been applied to budding yeast grown under two different physiological conditions [22]. The ability to detect the distribution of ribosomes on transcripts at maximum resolution has revealed that the density of ribosomes is not uniform across transcripts. All transcripts contain a region of constant length at their 5′ ends showing a high density of ribosomes [22]. This observation could explain the previously published phenomenon that short transcripts tend to be much more densely packed with ribosomes than long transcripts [103, 104]. The ribosome density in introns and 3′-UTRs is less than 1% of that seen in open reading frames (ORFs), indicating that retained
[94] or genome-wide methods [95]. NGS technologies introns are rarely translationally active. Moreover, many
have been successfully applied to these approaches. small ORFs (uORFs) are detected in the 50 -UTRs of genes,
but their functional relevance remains elusive. The ribosome density in these uORFs is significantly higher than in other regions of the 5′-UTRs, indicating that pervasive translation occurs upstream of the main ORF [22]. Surprisingly, a substantial number of these uORFs use non-AUG start codons, thus unexpectedly increasing the range of peptides that can be translated from a given transcript.

Conclusions and outlook

Next-generation sequencing technologies are revolutionising genomics research and beyond by enabling much more rapid and cost-effective generation of massive amounts of sequence than traditional Sanger sequencing. This technological breakthrough provides an opportunity for regular research institutes and departments to engage in ambitious projects that so far have only been conceivable for large genome centers. The impact of NGS technologies on the analysis of gene regulation is particularly high. Within only two years, RNA-seq has reached a point where recent state-of-the-art technologies such as high-density tiling arrays look almost old-fashioned. It seems likely that sequencing-based approaches will largely supersede hybridisation-based approaches within a few years. RNA-seq permits the sequencing and quantification of transcriptomes at maximal resolution and dynamic range, independently of transcript size, and above all free from any preconception (or even knowledge) of the genomes they are derived from. RNA-seq has started to change the way we think about studying the complexity and dynamics of transcriptomes and genome regulation. Early RNA-seq studies have revealed more extensively expressed genomes and more complex transcriptomes than anticipated, thus giving insight into novel regulatory mechanisms. These pioneering studies have also uncovered rich and extensive post-transcriptional regulation of transcript structures and sequences.

RNA-seq will without doubt drive many more exciting discoveries within the next few years. For example, sequencing of RNA from complex samples containing more than one organism, either collected in the wild [105–108] or created in the laboratory, will ultimately provide information about the transcriptome dynamics of living communities and interactions within ecosystems. In addition, sequencing of RNA from closely related species or members of a population will give insight into the processes linking transcriptome plasticity to phenotypic diversity and evolution. Given sufficient sequencing depth, RNA-seq analysis of cell populations adapting to changing environmental conditions could also reveal rare changes in transcript sequences that do not necessarily lead to an increase in fitness, thus helping to understand evolutionary mechanisms and dynamics. The main challenge for researchers is to creatively exploit the opportunities provided by these rapidly evolving technologies. Even more powerful sequencing approaches are already on the horizon. For example, “next-next-generation” sequencers such as the Helicos system, which can sequence millions of single molecules in parallel, are entering the market and seem well suited to analysing RNA [109]. Truly, progress is limited mainly by our imagination, and exciting times are certainly ahead.

Acknowledgments We would like to thank Luis López-Maury, Rachel Imoberdorf, Vera Pancaldi, Martin Převorovský, and Brian Wilhelm for critical reading of the manuscript. Research in our laboratory is funded by Cancer Research UK and by PhenOxiGEn, an EU FP7 research project.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

1. López-Maury L, Marguerat S, Bähler J (2008) Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nat Rev Genet 9:583–593
2. Shalon D, Smith SJ, Brown PO (1996) A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res 6:639–645
3. Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW (1998) Microarrays: biotechnology’s discovery platform for functional genomics. Trends Biotechnol 16:301–306
4. Bertone P, Gerstein M, Snyder M (2005) Applications of DNA tiling arrays to experimental genome annotation and regulatory pathway discovery. Chromosome Res 13:259–274
5. Kapranov P, Willingham AT, Gingeras TR (2007) Genome-wide transcription and the implications for genomic organization. Nat Rev Genet 8:413–423
6. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial analysis of gene expression. Science 270:484–487
7. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol 18:630–634
8. Lister R, Gregory BD, Ecker JR (2009) Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol 12:107–118
9. Marguerat S, Wilhelm BT, Bähler J (2008) Next-generation sequencing: applications beyond genomes. Biochem Soc Trans 36:1091–1096
10. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63
11. Wilhelm BT, Landry J (2009) RNA-Seq—quantitative measurement of expression through massively parallel RNA-sequencing. Methods 48:249–257
12. Mardis ER (2008) Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9:387–402
13. Ansorge WJ (2009) Next-generation DNA sequencing techniques. N Biotechnol 25:195–203
14. He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW (2008) The antisense transcriptomes of human cells. Science 322:1855–1857
15. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133:523–536
16. Core LJ, Waterfall JJ, Lis JT (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322:1845–1848
17. Cloonan N, Forrest ARR, Kolle G, Gardiner BBA, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM (2008) Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 5:613–619
18. Li H, Lovci MT, Kwon Y, Rosenfeld MG, Fu X, Yeo GW (2008) Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc Natl Acad Sci USA 105:20179–20184
19. Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM (2009) Transcriptome sequencing to detect gene fusions in cancer. Nature 458:97–101
20. Sugarbaker DJ, Richards WG, Gordon GJ, Dong L, De Rienzo A, Maulik G, Glickman JN, Chirieac LR, Hartman M, Taillon BE, Du L, Bouffard P, Kingsmore SF, Miller NA, Farmer AD, Jensen RV, Gullans SR, Bueno R (2008) Transcriptome sequencing of malignant pleural mesothelioma tumors. Proc Natl Acad Sci USA 105:3521–3526
21. Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, Yu L, Assefa SA, He M, Croucher NJ, Pickard DJ, Maskell DJ, Parkhill J, Choudhary J, Thomson NR, Dougan G (2009) A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 5:e1000569
22. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324:218–223
23. Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A (2009) Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res
24. Quinlan AR, Stewart DA, Strömberg MP, Marth GT (2008) Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Methods 5:179–181
25. Rougemont J, Amzallag A, Iseli C, Farinelli L, Xenarios I, Naef F (2008) Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics 9:431
26. Whiteford N, Skelly T, Curtis C, Ritchie ME, Löhr A, Zaranek AW, Abnizova I, Brown C (2009) Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics 25:2194–2199
27. Li H, Ruan J, Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18:1851–1858
28. Lin H, Zhang Z, Zhang MQ, Ma B, Li M (2008) ZOOM! Zillions of oligos mapped. Bioinformatics 24:2431–2437
29. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
30. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
31. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–1111
32. Rumble SM, Lacroute P, Dalca AV, Fiume M, Sidow A, Brudno M (2009) SHRiMP: accurate mapping of short color-space reads. PLoS Comput Biol 5:e1000386
33. Li R, Yu C, Li Y, Lam T, Yiu S, Kristiansen K, Wang J (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25:1966–1967
34. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
35. Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ (2009) Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods 6:291–295
36. Warren RL, Sutton GG, Jones SJM, Holt RA (2007) Assembling millions of short DNA sequences using SSAKE. Bioinformatics 23:500–501
37. Jeck WR, Reinhardt JA, Baltrus DA, Hickenbotham MT, Magrini V, Mardis ER, Dangl JL, Jones CD (2007) Extending assembly of short DNA sequences to handle error. Bioinformatics 23:2942–2944
38. Dohm JC, Lottaz C, Borodina T, Himmelbauer H (2007) SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 17:1697–1706
39. Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, Nusbaum C, Jaffe DB (2008) ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res 18:810–820
40. Chaisson MJ, Pevzner PA (2008) Short read fragment assembly of bacterial genomes. Genome Res 18:324–330
41. Hernandez D, François P, Farinelli L, Osterås M, Schrenzel J (2008) De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res 18:802–809
42. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829
43. Bryant DW, Wong W, Mockler TC (2009) QSRA: a quality-value guided de novo short read assembler. BMC Bioinformatics 10:69
44. Birol I, Jackman SD, Nielsen C, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, Horsman DE, Connors JM, Gascoyne RD, Marra MA, Jones SJ (2009) De novo Transcriptome Assembly with ABySS. Bioinformatics (in press)
45. Schmidt B, Sinha R, Beresford-Smith B, Puglisi SJ (2009) A fast hybrid short read fragment assembly algorithm. Bioinformatics 25:2279–2280
46. Carthew RW, Sontheimer EJ (2009) Origins and mechanisms of miRNAs and siRNAs. Cell 136:642–655
47. Naqvi AR, Islam MN, Choudhury NR, Haq QMR (2009) The fascinating world of RNA interference. Int J Biol Sci 5:97–117
48. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456:470–476
49. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40:1413–1415
50. Zhao Q, Caballero OL, Levy S, Stevenson BJ, Iseli C, de Souza SJ, Galante PA, Busam D, Leversha MA, Chadalavada K, Rogers Y, Venter JC, Simpson AJG, Strausberg RL (2009) Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line. Proc Natl Acad Sci USA 106:1886–1891
51. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18:1509–1517
52. Guffanti A, Iacono M, Pelucchi P, Kim N, Soldà G, Croft LJ, Taft RJ, Rizzi E, Askarian-Amiri M, Bonnal RJ, Callari M, Mignone F, Pesole G, Bertalot G, Bernardi LR, Albertini A, Lee C, Mattick JS, Zucchi I, De Bellis G (2009) A transcriptional sketch of a primary human breast cancer by 454 deep sequencing. BMC Genomics 10:163
53. Wu Q, Kim YC, Lu J, Xuan Z, Chen J, Zheng Y, Zhou T, Zhang MQ, Wu C, Wang SM (2008) Poly A- transcripts expressed in HeLa cells. PLoS ONE 3:e2803
54. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M (2008) Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 45:81–94
55. Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D, O’Keeffe S, Haas S, Vingron M, Lehrach H, Yaspo M (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956–960
56. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6:377–382
57. ’t Hoen PAC, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RHAM, de Menezes RX, Boer JM, van Ommen GB, den Dunnen JT (2008) Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res 36:e141
58. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628
59. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320:1344–1349
60. Yassour M, Kaplan T, Fraser HB, Levin JZ, Pfiffner J, Adiconis X, Schroth G, Luo S, Khrebtukova I, Gnirke A, Nusbaum C, Thompson D, Friedman N, Regev A (2009) Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci USA 106:3264–3269
61. Bloom J, Khan Z, Kruglyak L, Singh M, Caudy A (2009) Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics 10:221
62. Lee A, Hansen KD, Bullard J, Dudoit S, Sherlock G (2008) Novel low abundance and transient RNAs in yeast revealed by tiling microarrays and ultra high-throughput sequencing are not conserved across closely related yeast species. PLoS Genet 4:e1000299
63. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bähler J (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453:1239–1243
64. Hillier LW, Reinke V, Green P, Hirst M, Marra MA, Waterston RH (2009) Massively parallel sequencing of the polyadenylated transcriptome of C. elegans. Genome Res 19:657–666
65. Torres TT, Metta M, Ottenwälder B, Schlötterer C (2008) Gene expression profiling by massively parallel sequencing. Genome Res 18:172–177
66. Hahn DA, Ragland GJ, Shoemaker DD, Denlinger DL (2009) Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis. BMC Genomics 10:234
67. Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH (2008) Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol 17:1636–1647
68. Emrich SJ, Barbazuk WB, Li L, Schnable PS (2007) Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res 17:69–73
69. Barbazuk WB, Emrich SJ, Chen HD, Li L, Schnable PS (2007) SNP discovery via 454 transcriptome sequencing. Plant J 51:910–918
70. Denoeud F, Aury J, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, Jaillon O, Artiguenave F (2008) Annotating genomes with massive-scale RNA sequencing. Genome Biol 9:R175
71. Trick M, Long Y, Meng J, Bancroft I (2009) Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing. Plant Biotechnol J 7:334–346
72. Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH (2009) Structure and complexity of a bacterial transcriptome. J Bacteriol 191:3203–3211
73. Mao C, Evans C, Jensen RV, Sobral BW (2008) Identification of new genes in Sinorhizobium meliloti using the Genome Sequencer FLX system. BMC Microbiol 8:72
74. Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, Jones S, Zhao Y, Hirst M, Marra MA (2009) Next-generation tag sequencing for cancer gene expression profiling. Genome Res
75. Mitelman F, Johansson B, Mertens F (2007) The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer 7:233–245
76. Struhl K (2007) Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol 14:103–105
77. Wyers F, Rougemaille M, Badis G, Rousselle J, Dufour M, Boulay J, Régnault B, Devaux F, Namane A, Séraphin B, Libri D, Jacquier A (2005) Cryptic pol II transcripts are degraded by a nuclear quality control pathway involving a new poly(A) polymerase. Cell 121:725–737
78. Neil H, Malabat C, d’Aubenton-Carafa Y, Xu Z, Steinmetz LM, Jacquier A (2009) Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature 457:1038–1042
79. Xu Z, Wei W, Gagneur J, Perocchi F, Clauder-Münster S, Camblong J, Guffanti E, Stutz F, Huber W, Steinmetz LM (2009) Bidirectional promoters generate pervasive transcription in yeast. Nature 457:1033–1037
80. Jiang C, Pugh BF (2009) Nucleosome positioning and gene regulation: advances through genomics. Nat Rev Genet 10:161–172
81. Preker P, Nielsen J, Kammler S, Lykke-Andersen S, Christensen MS, Mapendano CK, Schierup MH, Jensen TH (2008) RNA exosome depletion reveals transcription upstream of active human promoters. Science 322:1851–1854
82. Seila AC, Calabrese JM, Levine SS, Yeo GW, Rahl PB, Flynn RA, Young RA, Sharp PA (2008) Divergent transcription from active promoters. Science 322:1849–1851
83. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermüller J, Hofacker IL, Bell I, Cheung E, Drenkow J, Dumais E, Patel S, Helt G, Ganesh M, Ghosh S, Piccolboni A, Sementchenko V, Tammana H, Gingeras TR (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316:1484–1488
84. Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, Faulkner GJ, Lassmann T, Forrest ARR, Grimmond SM, Schroder K, Irvine K, Arakawa T, Nakamura M, Kubosaki A, Hayashida K, Kawazu C, Murata M, Nishiyori H, Fukuda S, Kawai J, Daub CO, Hume DA, Suzuki H, Orlando V, Carninci P, Hayashizaki Y, Mattick JS (2009) Tiny RNAs associated with transcription start sites in animals. Nat Genet 41:572–578
85. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, Taillon BE, Chen Z, Tanzer A, Saunders ACE, Chi J, Yang F, Carter NP, Hurles ME, Weissman SM, Harkins TT, Gerstein MB, Egholm M, Snyder M (2007) Paired-end mapping reveals extensive structural variation in the human genome. Science 318:420–426
86. Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302:2141–2144
87. Averbeck N, Sunder S, Sample N, Wise JA, Leatherwood J (2005) Negative control contributes to an extensive program of meiotic splicing in fission yeast. Mol Cell 18:491–498
88. Jiang H, Wong WH (2009) Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25:1026–1032
89. Zheng S, Chen L (2009) A hierarchical Bayesian model for comparing transcriptomes at the individual transcript isoform level. Nucleic Acids Res 37:e75
90. Maas S, Kawahara Y, Tamburro KM, Nishikura K (2006) A-to-I RNA editing and human disease. RNA Biol 3:1–9
91. Li JB, Levanon EY, Yoon J, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM (2009) Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324:1210–1213
92. Kuo MH, Allis CD (1999) In vivo cross-linking and immunoprecipitation for studying dynamic protein:DNA associations in a chromatin environment. Methods 19:425–433
93. Gerber AP, Herschlag D, Brown PO (2004) Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLoS Biol 2:E79
94. Ule J, Jensen K, Mele A, Darnell RB (2005) CLIP: a method for identifying protein–RNA interaction sites in living cells. Methods 37:376–386
95. Wang Z, Tollervey J, Briese M, Turner D, Ule J (2009) CLIP: Construction of cDNA libraries for high-throughput sequencing from RNAs cross-linked to proteins in vivo. Methods 48:287–293
96. Yeo GW, Coufal NG, Liang TY, Peng GE, Fu X, Gage FH (2009) An RNA code for the FOX2 splicing regulator revealed by mapping RNA–protein interactions in stem cells. Nat Struct Mol Biol 16:130–137
97. Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X, Darnell JC, Darnell RB (2008) HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456:464–469
98. Sanford JR, Wang X, Mort M, Vanduyn N, Cooper DN, Mooney SD, Edenberg HJ, Liu Y (2009) Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res 19:381–394
99. Chi SW, Zang JB, Mele A, Darnell RB (2009) Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps. Nature 460:479–486
100. Ender C, Krek A, Friedländer MR, Beitzinger M, Weinmann L, Chen W, Pfeffer S, Rajewsky N, Meister G (2008) A human snoRNA with microRNA-like functions. Mol Cell 32:519–528
101. Bühler M, Spies N, Bartel DP, Moazed D (2008) TRAMP-mediated RNA surveillance prevents spurious entry of RNAs into the Schizosaccharomyces pombe siRNA pathway. Nat Struct Mol Biol 15:1015–1023
102. Melamed D, Arava Y (2007) Genome-wide analysis of mRNA polysomal profiles with spotted DNA microarrays. Meth Enzymol 431:177–201
103. Arava Y, Wang Y, Storey JD, Liu CL, Brown PO, Herschlag D (2003) Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 100:3889–3894
104. Lackner DH, Beilharz TH, Marguerat S, Mata J, Watt S, Schubert F, Preiss T, Bähler J (2007) A network of multiple regulatory layers shapes gene expression in fission yeast. Mol Cell 26:145–155
105. Bailly J, Fraissinet-Tachet L, Verner M, Debaud J, Lemaire M, Wésolowski-Louvel M, Marmeisse R (2007) Soil eukaryotic functional diversity, a metatranscriptomic approach. ISME J 1:632–642
106. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One 3:e3042
107. Gilbert JA, Thomas S, Cooley NA, Kulakova A, Field D, Booth T, McGrath JW, Quinn JP, Joint I (2009) Potential for phosphonoacetate utilization by marine bacteria in temperate coastal waters. Environ Microbiol 11:111–125
108. Poretsky RS, Hewson I, Sun S, Allen AE, Zehr JP, Moran MA (2009) Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre. Environ Microbiol 11:1358–1375
109. Lipson D, Raz T, Kieu A, Jones DR, Giladi E, Thayer E, Thompson JF, Letovsky S, Milos P, Causey M (2009) Quantification of the yeast transcriptome by single-molecule sequencing. Nat Biotechnol 27:652–658
