Classif lncRNA


TIGS-1196; No. of Pages 13

Review

The Landscape of long noncoding RNA classification
Georges St. Laurent2,3, Claes Wahlestedt4, and Philipp Kapranov1,2
1 Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, China 361021
2 St. Laurent Institute, 317 New Boston St., Suite 201, Woburn, MA 01801 USA
3 Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, 185 Meeting Street, Providence, RI 02912, USA
4 Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 NW 10th Ave, Miami, FL 33136 USA

Advances in the depth and quality of transcriptome sequencing have revealed many new classes of long noncoding RNAs (lncRNAs). lncRNA classification has mushroomed to accommodate these new findings, even though the real dimensions and complexity of the noncoding transcriptome remain unknown. Although evidence of functionality of specific lncRNAs continues to accumulate, conflicting, confusing, and overlapping terminology has fostered ambiguity and lack of clarity in the field in general. The lack of a fundamental, conceptually unambiguous classification framework results in a number of challenges in the annotation and interpretation of noncoding transcriptome data. It also might undermine integration of the new genomic methods and datasets in an effort to unravel the function of lncRNA. Here, we review existing lncRNA classifications, nomenclature, and terminology. Then, we describe the conceptual guidelines that have emerged for their classification and functional annotation based on expanding and more comprehensive use of large systems biology-based datasets.

Corresponding authors: Wahlestedt, C.; Kapranov, P.
Keywords: long non-coding RNA; annotation of long non-coding RNAs; classification of long non-coding RNAs; function of long non-coding RNAs; transcriptome; systems biology; lncRNA; lincRNA; vlincRNA.
0168-9525/© 2015 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2015.03.007
Trends in Genetics xx (2015) 1–13

Glossary
5′-cap: an altered nucleotide present at 5′ ends of a eukaryotic RNA and vital for its functioning.
ENCODE project: the Encyclopedia of DNA Elements; a public research consortium launched in September 2003 by the National Human Genome Research Institute. The goal of the project is to identify all functional elements in the human genome sequence.
Endogenous retrovirus (ERV): a genomic element that was traced back to a retrovirus integrated into an ancestral genome and since retained. ERV sequences comprise 8% of the human genome.
Expressed sequence tag (EST): a relatively short and typically partial sequence of a longer RNA molecule.
FANTOM consortium: an international research consortium established by scientists at RIKEN, Japan in 2000, initially to assign functional annotations to the full-length cDNAs collected during the Mouse Encyclopedia Project. FANTOM has since developed and expanded over time to encompass different fields of transcriptome analysis.
Genomic bin approach: an approach designed to detect differentially expressed regions of the genome in the regions where no annotation is available.
Long terminal repeat (LTR): identical pieces of DNA found at the ends of retroviruses and critical for viral life cycle. LTRs contain elements required for viral gene expression. LTRs of ERVs often retain these elements and thus can initiate or control expression of host transcripts.
Paraspeckle: a subcellular compartment that could be identified in nuclear interchromatin space.
Polycomb repressive complex 2 (PRC2): a multi-protein complex that reversibly modifies chromatin structure and silences target genes.
Tiling microarray: a microarray design (typically oligonucleotide-based) where probes interrogate an entire genomic region of interest at regular intervals agnostic of genomic annotations. This design differs from other microarrays that target only specific genomic features of interest, like exons of known genes.

The ncRNA universe
The classic view of the transcriptome landscape and its mRNA-centric paradigm for transcript annotation has undergone a fundamental change [1,2]. The ENCODE project (see Glossary) estimates that (mostly noncoding) transcripts cover 62–75% of our genome [3], and contribute greatly to the overall estimate of 80% potentially functional sequence in our DNA [4]. Similarly, RNAseq studies show that transcripts from these noncoding regions dominate the population of nonribosomal, nonmitochondrial RNAs in a human cell [5]. ncRNAs have emerged as a major source of biomarkers [6–10], targets for therapeutics [8,11], and potential explanations for the function of noncoding variants from genome-wide association studies (GWAS) [12]. Constant expansion of the evidence for broad functionality of ncRNAs [13], in processes ranging from heritable epigenetic change [14] to species-specific changes in cognition [15], may finally answer the long-standing question of the role of ncDNA in eukaryotic biology [16,17].

Although there has been an emphasis on the annotation and classification of ncRNAs with properties similar to those of protein-coding mRNAs, the vast majority of genomic space used for RNA production remains underexplored. Moreover, although some ncRNAs share properties with coding mRNAs, such as splicing and polyadenylation [18,19], sequence conservation [18,19], and export to the cytosol [20], many others do not, highlighting the differences in the functionalities of coding versus ncRNAs [5,11,13,21–24].

The state of noncoding annotation is still in its early days, but the field now has sufficient perspective to establish a
logical conceptual framework for the classification of the universe of transcripts that emanate from noncoding genomic regions. New methodologies for integrated classification have accompanied a massive expansion of global transcriptome datasets, particularly from genomics consortia such as FANTOM [25], ENCODE [3], and GTEx [26]. Methods for grouping RNA sequencing (RNA-seq) (Box 1) reads into a single transcribed region have improved rapidly [27–30]. Progress has also been made in machine-learning approaches aimed at the integration and biological interpretation of diverse datasets [29,31,32]. All this has culminated in widely used sets of annotations of lncRNAs, such as the one provided by the GENCODE consortium [33], and others [31,34–37].

Here, we review existing lncRNA classes and then describe the conceptual guidelines that have emerged for their classification and functional annotation based on the expanding and more efficient use of large systems-biology-based datasets. This framework endeavours to guide researchers in the classification of ncRNAs and interpretation of next-generation sequencing (NGS) data, especially in noncoding portions of the genome.

Criteria and features of existing classes and categories of lncRNAs
The classification of the great majority of lncRNAs relies on the empirical attributes originally used to detect them (Table 1, Figure 1). This reflects their short history relative to protein-coding genes, and provides a convenient basis by which to classify these uncharacterized RNA species.

Classification based on transcript length
The length estimate of ncRNAs serves as the most commonly used attribute for their classification. Typically, a threshold of 200 bases separates long from short ncRNAs [38,39] (Table 1). Often, our knowledge is limited to sequence reads mapped to a ‘region of transcription’, and even with improvements in NGS read length [40], this will probably continue for the foreseeable future.
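
The length criterion described above can be illustrated with a minimal Python sketch. It is not taken from any annotation pipeline: the `TranscribedRegion` record, the `length_class` function, and the half-open coordinate convention are assumptions made for illustration. Only the 200-base threshold comes from the text, and treating exactly 200 bases as long is also an assumption, since the boundary case is not specified.

```python
from dataclasses import dataclass

# Threshold quoted in the text [38,39]. Treating exactly 200 bases as "long"
# is an assumption; the boundary case is not specified.
LONG_NCRNA_MIN_BASES = 200


@dataclass(frozen=True)
class TranscribedRegion:
    """Hypothetical record for a 'region of transcription' built from mapped reads."""
    chrom: str
    start: int  # 0-based, inclusive (assumed coordinate convention)
    end: int    # 0-based, exclusive (assumed coordinate convention)

    @property
    def length(self) -> int:
        # Length estimate taken from the span of the mapped region
        return self.end - self.start


def length_class(region: TranscribedRegion) -> str:
    """Label a noncoding region as long or short using the length estimate only."""
    if region.length <= 0:
        raise ValueError("region must have positive length")
    if region.length >= LONG_NCRNA_MIN_BASES:
        return "long ncRNA"
    return "short ncRNA"


if __name__ == "__main__":
    for region in (TranscribedRegion("chr1", 1000, 1150),
                   TranscribedRegion("chr1", 5000, 65000)):
        print(region.chrom, region.length, length_class(region))
```

Because the label depends only on the span of the mapped region, it inherits any uncertainty in where that region starts and ends, which is the limitation the paragraph above points out.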
Building transcribed regions based on RNA-seq profiling of total RNA (rather than the polyA+ fraction, see below) led to the discovery that intergenic space encodes thousands of very long intergenic ncRNAs (vlincRNAs), whose primary transcripts can range in length from 50 kb to 1 Mb [28,29,41]. Spanning at least 10% of the human genome [5,29], vlincRNAs have been implicated in important biological processes such as pluripotency [29], cancer [28,29], apoptosis [29], cell cycle progression [28,42], and cellular senescence [41].

Box 1. Overview of high-throughput technologies used to detect and quantify ncRNAs
RNA sequencing (RNA-seq): currently, one of the most commonly used procedures in transcriptome profiling. Typically, RNA is converted into cDNA using random hexamers followed by massive random sequencing of the resulting cDNAs using NGS technologies. As a result, millions of short sequence tags can be generated per experiment. Subsequent mapping of the tags reveals the genomic position encoding the RNA and its relative mass in the cell. The procedure is suitable for various aspects of transcriptome research: RNA mapping, quantitation, alternative splicing analysis etc.
mRNA sequencing (mRNA-seq): RNA-seq on polyA+ fraction of RNA, often synonymous with RNA-seq.
Direct RNA sequencing (DRS): sequencing of native RNA without library preparation, including the cDNA conversion step [143]. It has been successfully used to sequence native polyA+ RNA and identify alternative polyadenylation sites. DRS is particularly useful in applications where artifacts of reverse transcription are undesirable, such as precise strand of origin determination, and in applications that deal with minute amounts of nucleic acids such as single-cell applications. Theoretically, it can provide multiple tags per molecule; however, so far it has been used in applications that provide a single tag per molecule at the polyadenylation site.
Cap analysis of gene expression (CAGE): a transcriptome profiling procedure that targets RNAs with a 5′-cap [144]. CAGE generates short (typically 27 nucleotides) sequence tags from 5′ ends of such RNAs, with one tag per RNA molecule. It enables accurate mapping of 5′ ends of this subset of RNAs.
Serial analysis of gene expression (SAGE): targets polyadenylated messages and generates a single internal (typically close to the 3′ end) tag per RNA molecule [145].
Paired-end tag (PET): also targets polyadenylated RNAs and generates a tag that combines information on 5′ and 3′ ends of the same RNA molecule [146].
Rapid amplification of cDNA ends (RACE): an ‘outward’ PCR-based method designed to identify sequences connected to a given region, which can be used in conjunction with NGS or microarrays, for deep transcriptome profiling of a specific locus [44].
Targeted RNA sequencing: selection of RNAs from a locus of interest using tiling microarrays followed by RNA-seq to achieve the same goal [45].
GRO-seq: A typical RNA profiling experiment measures steady-state levels of RNA. By contrast, GRO-seq [134] combines nuclear run-on experiments and NGS analysis to provide information on transcription competent RNA polymerase complexes.

Classification based on association with annotated protein-coding genes
This commonly used attribute (Table 1, Figure 1) serves as the foundation of the GENCODE classification of lncRNAs [33]. It underlies the logical challenge of overlapping noncoding and coding transcripts at a given locus – called ‘transcriptional forests’ by the FANTOM consortium [43]. Targeted methods (Box 1) based on rapid amplification of cDNA ends (RACE) experiments [44] and RNA-seq [45] indicate that transcriptional forests constitute a general feature of the human genome. A prominent category of ncRNAs has emerged from these transcriptional forests composed of sense ncRNAs that overlap coding mRNAs on the same strand and share some sequence with the latter, yet do not encode proteins [44,46–48]. This category includes unspliced sense partially intronic RNAs (PINs) [49], and spliced transcripts that combine exons from coding and noncoding regions of a gene [47,48]. GENCODE recognizes the existence of such spliced lncRNAs in their ‘sense overlapping’ biotype [33].

The PIN and sense overlapping categories allow for overlap between lncRNAs and exons of a protein-coding gene. However, a protein-coding gene can produce lncRNAs found exclusively in its introns, known as totally intronic RNAs (TINs) [49] (Table 1, Figure 1). TINs make up the majority (70%) of all non-coding (non-rRNA) nuclear-encoded RNA and 40–50% of all cellular (non-rRNA) RNAs by mass, as established by single-molecule sequencing [50]. Evidence that large numbers of introns encode stand-alone RNAs originally came from microarray expression profiling

Table 1. Different known classes of lncRNAs

Category | Abbreviation | Refs | Specific examples

Classification based on transcript length
Long noncoding RNA | lncRNA | [38,39] |
Long-intergenic noncoding RNA; large intervening noncoding RNA; long-intervening noncoding RNA | lincRNA | [18] | ANRIL [117], H19 [147], HOTAIR [18], HOTTIP [148], lincRNA-p21 [149], XIST [150], Paupar [151]
Very long intergenic noncoding RNA | vlincRNA | [29] | HELLP transcript [42], Vlinc_21, vlinc_185, vlinc_377, vlinc_500 [29]
macroRNA | | [28,152] | Airn, Gtl2lt, KCNQOT1, Lncat, Nespas (reviewed in [152]), STAiR1 [28]
Promoter-associated long RNA | PALR | [38] |

Classification based on association with annotated protein-coding genes
Intronic ncRNA; stable intronic sequence RNA; totally intronic RNA; partially intronic RNA | sisRNA, TIN, PIN | [49,50,54]; additional references in the text |
Circular intronic RNAs | ciRNAs | [55] |
Sense ncRNA | | [44] |
Natural antisense ncRNA | asRNA, NAT | [57] | BACE1-AS [153], aHIF [154], Tsix [155]
Mirror antisense | | [44,67] | Globin antisense [67]
Exonic circular RNAs | ecircRNAs | [62] | cANRIL [118]
Chimeric RNAs, trans-spliced RNAs, exon juxtaposition | | [44,63–65] |
Stand-alone ncRNAs made from 3′ UTRs | uaRNA | [60] |
Chromatin-interlinking RNA | ciRNA | [68] |
Transcription start site-associated RNAs | TSSa-RNAs | [156] |

Classification based on association with other DNA elements of known function
Enhancer-associated RNA | eRNA | [157] |
Promoter-associated long RNA | PALR | [38] |
Upstream antisense RNA | uaRNA | [158] |
PROMoter uPstream Transcript | PROMPT | [89] |
Telomeric repeat-containing RNA | TERRA | [159] |

Classification based on protein-coding RNA resemblance
mRNA-like noncoding RNAs | mlncRNAs | [18] |
Long-intergenic noncoding RNA; large intervening noncoding RNA; long-intervening noncoding RNA | lincRNA | [18] | ANRIL [117], H19 [147], HOTAIR [18], HOTTIP [148], lincRNA-p21 [149], XIST [150]

Classification based on association with repeats
C0T-1 repeat RNA | | [160] |
Long interspersed nuclear element | LINE1/2 | [161] |
Transcribed endogenous retroviruses | | [81] |
Expressed Satellite Repeats | | [162] |
Non-coding RNA driven by promoters within repeats | vlincRNAs, NASTs | [29,76] | Vlinc_21, vlinc_185, vlinc_377, vlinc_500 [29]
Polypurine-repeat-containing RNA | GRC-RNA | [163] |
Transcribed pseudogenes | | [83] | PTENP1 and KRASP1 [86]

Classification based on association with a biochemical pathway or stability
Nrd1-unterminated transcript | NUT | [164] |
miRNA primary transcripts | | [165] | H19 [166]
piRNA primary transcripts | | [167] |
Cryptic unstable transcript | CUT | [88] |
PROMoter uPstream Transcript | PROMPT | [89] |
Xrn1-sensitive unstable transcript | XUT | [91] |
Stable Uncharacterized Transcript, Stable Unannotated Transcript | SUT | [92] |

Classification based on sequence and structure conservation
Transcribed-ultraconserved regions | T-UCR | [95] | UCR106 [95]
Hypoxia-induced noncoding ultraconserved transcript | HINCUT | [100] |
Long-intergenic noncoding RNA; large intervening noncoding RNA; long-intervening noncoding RNA | lincRNA | [18] | HOTAIR [18], HOTTIP [148]
RNA-Z regions | | [97] |
EvoFold regions | | [98] |

Classification based on expression in different biological states
Long stress-induced noncoding transcript | LSINCT | [101] |
Hypoxia-induced noncoding ultraconserved transcript | HINCUT | [100] |
Non-Annotated Stem Transcript | NAST | [76] |

Classification based on association with subcellular structures
Chromatin-associated RNA | CAR | [102] |
Chromatin-interlinking RNA | ciRNA | [68] |
Nuclear bodies associated RNAs | | [168] |
PRC2 associated RNAs | | [19,103] |

Classification based on function
Long noncoding RNAs with enhancer-like function; ncRNA-activating | ncRNA-a | [108] | ncRNA-a7 [108]
miRNA primary transcripts | | [165] | H19 [166]
piRNA primary transcripts | | [167] |
Competing endogenous RNA | ceRNA | [109] | PTENP1 and KRASP1 [86]

[49,51] and in silico analysis of expressed sequence tag (EST) databases [49,52]. Even genomes as compact as those of human viruses can encode functional intronic RNAs [53]. Large numbers of stand-alone intronic RNAs recently found in Xenopus oocytes [54] and mice [50] support the conclusion that introns encode functional ncRNAs on a global scale. Some of these transcripts likely represent circular intronic ncRNAs (ciRNAs) (produced from introns that escape debranching) that can accumulate in cells and regulate expression of their parent genes [55] (Figure 1).

Overlap on the opposite DNA strand from their associated protein-coding gene represents another frequently used attribute of lncRNAs. These natural antisense transcripts (NATs) (Figure 1) occur in 50–70% of all protein-coding genes [56,57].

ncRNAs can also be composed solely of sequences of exons of protein-coding mRNAs (Figure 1, Table 1). For example, transcript cleavage followed by post-transcriptional 5′-cap addition [58,59] can result in production of stand-alone ncRNAs from various parts of mRNAs [58], notably 3′ untranslated regions (UTRs) [60]. In fact, the type 0 variant of the cap structure may associate with the post-transcriptionally capped 5′ ends [61]. Additional cellular processes could produce this type of ncRNA, such as the reverse splicing implicated in the production of circular exonic RNAs [62], trans-splicing leading to production of chimeric RNAs [63,64], exon juxtaposition [65], and presumed RNA copying, leading to production of ‘mirror antisense’ transcripts [44,66,67]. Finally, RNAs whose sequences have features of bona fide coding transcripts may have other roles as revealed by the class of chromatin-interlinking RNAs (ciRNAs). These RNAs participate in maintaining interphase chromatin configuration and mostly include spliced transcripts with long 3′ UTRs [68].

Classification based on association with other DNA elements of known function
Notable classes of such RNAs include enhancer- and promoter-associated long RNAs (Table 1, Figure 1). These long RNAs are involved in linking the dynamics of nuclear architecture, chromatin signalling plasticity, and transcriptional regulation [69]. Interestingly, enhancers that give rise to RNA species have greater likelihood of functionality in reporter assays than those that do not [70], arguing for a functional, rather than spurious, link between RNA and this type of genomic element.

Classification based on mRNA resemblance
As mentioned previously, research has focussed on ncRNAs with a spliced structure, conserved sequence, and a polyA tail [18,19,34,35,71] (Figure 1). In fact, lncRNAs annotated by GENCODE – even those solely confined to intronic sequences – represent primarily spliced transcripts [33]. These features were used to identify thousands of transcripts in mice [18] and humans [19], called long intervening ncRNAs (lincRNAs) [18]. This approach has revealed many important functional lncRNAs, such as HOTAIR, which mediates gene silencing by facilitating localization of the epigenetic repressor Polycomb repressive complex (PRC)2 to its target sequence [72]. Expression analysis of 10 000 human lincRNAs across 1300 tumour samples using microarrays revealed hundreds of noncoding transcripts potentially driving four different cancer types [73]. Numerous other studies have implicated lincRNAs in human development and disease [74]. As an indication of their functionality, expression analysis of 11 tetrapod species found 2508 lincRNAs expressed in at least three species and originating >90 million years ago [71].

Classification based on association with repeats
About half of the human genome consists of repeats of various categories, and many ncRNA-encoding genomic regions overlap these elements (Figure 1). Promoters within repeats drive expression of many ncRNAs [75], especially in pluripotent [29,76,77] and cancerous cells [29]. Promoters within the long terminal repeats (LTRs) of endogenous retroviruses specifically associate with ncRNAs from several classes of nonannotated stem transcripts (NASTs) [76] in pluripotent stem cells, including lincRNAs [77] and vlincRNAs [29] (Table 1). In addition, LTR-driven vlincRNAs identify common regulatory architectures between stem and cancer cells [29], an interesting reminder of ideas from the stem cell theory of cancer [78].

Individual copies of repeats are expressed from their own promoters and contribute to the ncRNA transcriptome. For example, RNA polymerase (Pol) III transcribes

[Figure 1 image not reproduced. Labels in the diagram: T-UCR; repeat-associated ncRNA; vlincRNA; eRNA; lincRNA; PALR, TSSa-RNA; intronic RNA: PIN; uaRNA; CAR, ciRNA; intronic RNA: TIN; upstream antisense RNA; natural antisense RNA; ncRNA-a; PROMPT, TSSa-RNA; ciRNAs; ecircRNAs; sense RNA; trans-spliced ncRNA; transcribed pseudogene, ceRNA; locus 1, locus 2, locus 3.]

Figure 1. Schematic diagram illustrating various classes of ncRNAs. Three hypothetical loci are shown. Protein coding exons are shown as green (locus 1) or yellow boxes (locus 3). Locus 2 signifies a pseudogene of locus 1. Regulatory regions of locus 1 are shown in purple (promoter) and magenta (enhancer). Repeats are denoted by brown boxes. Lines with arrows represent ncRNAs. The role depicted here for CARs and ciRNAs in stabilising a chromatin loop is hypothetical. Abbreviations: CAR, chromatin-associated RNA; ceRNA, competing endogenous RNA; ciRNA, chromatin-interlinking RNA (grey) or circular intronic RNA (green); eRNA, enhancer-associated RNA; ecircRNA, exonic circular RNA; lincRNA, long intervening non-coding RNA; ncRNA, noncoding RNA; ncRNA-a, activating ncRNAs; PALR, promoter-associated long RNA; PIN, partially intronic RNA; TIN, totally intronic RNA; TSSa-RNA, transcription-start-site-associated RNA; T-UCR, transcribed ultraconserved regions; uaRNA, 3′ UTR-derived RNAs; vlincRNA, very long intergenic ncRNA.

noncoding repeated elements, such as Alu, B1, and B2, that can bind to RNA Pol II and affect its activity in response to stress [79]. Long interspersed nuclear elements (LINEs) comprise 20% of the genome and express mostly noncoding transcripts due to 3′ truncation and accumulated mutations [80]. Similarly, expression of noncoding endogenous retroviruses (ERVs) is a well-documented phenomenon [81]. Finally, examination of transcripts containing repeat copies continues to reveal additional regulatory functions for these molecules, as exemplified by Alu-mediated intermolecular interaction of coding and ncRNAs in trans described recently [82].

Transcripts from a specific subset of repeated sequences – noncoding copies of protein-coding genes or mRNAs (pseudogenes) [83] – have gained prominence upon realisation that they can function in various ways [84], including binding and titrating regulatory molecules that normally interact with the functional copies [85,86]. Moreover, pseudogenes can be transcribed from the opposite strand thus producing transcripts capable of intermolecular interaction with the productive copy [87] or its promoter [85].

Classification based on a biochemical pathway or stability
Classification of ncRNA based on their association with substrate pools of different RNA degradation pathways and enzymes has recently gained popularity. Inhibition of components of the exosome (RRP6, RRP40, and RRP44) or nonsense-mediated decay (XRN1) has revealed populations of ncRNAs not easily observed in wild type cells [88–92] (Table 1). This approach also provides information

about the pathways of their metabolism. The latter is another attribute used for classification of ncRNAs, as exemplified by XUTs (Xrn1-sensitive unstable transcripts) [91] (Table 1). Most of the pathways analysed so far in this classification involve RNA degradation, and these lncRNAs overlap with classes of NATs [91] and promoter-associated RNAs [89,90,92].

Classification based on sequence or structure conservation
Sequence conservation, though highly informative in predicting functional protein-coding mRNAs, remains a metric with controversial merit in the noncoding space. Its absence – typical for lncRNAs [93] – does not universally imply lack of functionality [22,24]. Still, many ultra-conserved regions (UCRs) – sequences of DNA 100% conserved in humans, rats, and mice – map to the noncoding space of the genome [94]. A large number of UCRs are transcribed as ncRNAs, and some are associated with malignant states [95]. As secondary RNA structure plays a crucial role in ncRNA function [24,96], a number of bioinformatics approaches such as RNA-Z [97] and EvoFold [98] leverage structure rather than sequence conservation to predict ncRNA-encoding regions (Table 1) [99].

Classification based on biological states
A number of cancer-associated transcribed UCRs (T-UCRs) encoding ncRNAs were induced by hypoxia and thus further subclassified as hypoxia-induced noncoding ultraconserved transcripts (HINCUTs) [100]. They serve as an example of another attribute used for ncRNA classification: induction after treatment with a stimulus or association with a certain biological state. Another example is long stress-induced noncoding transcripts (LSINCTs) [101].

Classification based on subcellular localisation
RNA localisation can provide important clues to its function. ncRNAs tend to be enriched in the nucleus [38,56], which suggests their involvement in the temporal–spatial regulation of nuclear architecture. For example, chromatin-associated RNAs (CARs) – both intronic and intergenic – form an integral component of chromatin, with the potential to regulate the expression of nearby genes [102] (Figure 1). The ENCODE consortium performed extensive profiling of three subnuclear compartments (chromatin, nucleolus, and nucleoplasm) revealing their RNA compositions [3]. Within the nucleus, ncRNA association with, and targeting of, the gene-silencing PRC2 complex led to the identification of thousands of PRC2-associated ncRNAs in mouse embryonic stem cells [103] and human cell lines [19]. ncRNAs form components of other nuclear subcompartments such as paraspeckles, the nucleolus, and the nuclear matrix [104]. Presumably, additional classes of ncRNAs associated with these and other compartments await discovery. Interestingly, some lncRNAs localise to the cytosol [105] and actually associate with ribosomes. Even the small mitochondrial genome encodes lncRNAs [106], underscoring the variety of different processes in which these transcripts could participate (see below).

Classification based on function
lncRNAs can participate in a plethora of different cellular processes: chromatin remodelling, regulation of transcription and translation, RNA stability, scaffolding, and innate immunity. We discuss here only examples of functions used for classification, and direct the reader to other reviews focusing on lncRNA molecular mechanisms [6,7,13,24,39,107].

Activating ncRNAs (ncRNA-a), which have enhancer-like properties, represent an example of classification based on function (Table 1). This class is distinguished from enhancer RNAs (eRNAs) [108] by positively regulating nearby genes (Figure 1). One notable member of this class, designated ncRNA-a7, regulates the Snai1 transcription factor. Depleting ncRNA-a7 leads to major phenotypic changes at both cellular and molecular levels [108]. The category of ncRNA-a will probably continue to grow, as the accumulation of high-quality expression datasets identifies more lncRNAs that positively correlate with nearby genes (St. Laurent et al., unpublished).

Another example is competing endogenous RNAs (ceRNAs) [109]. They share sequence similarity with protein-coding transcripts and function by competing for regulatory molecules [109]. Any ncRNA sharing a sequence with another (coding or noncoding) RNA could potentially be a ceRNA, such as transcribed pseudogenes, which represent important ceRNAs [86] (Figure 1). Conceivably, the ceRNAs could form part of a complex regulatory matrix driven by differential affinity among many contextually associated RNA molecules [110].

Some lncRNAs serve as precursors for shorter functional RNAs, as exemplified by primary transcripts for miRNAs and piwi-interacting RNAs (piRNAs) (Table 1). In fact, the long and short cleavage products could have distinct functions, as demonstrated by the short noncoding tRNA-like molecule produced during maturation of metastasis associated lung adenocarcinoma transcript 1 (MALAT1) lncRNA [111]. The ENCODE consortium estimates that 6% of all annotated coding and noncoding transcripts overlap with short RNAs [3]. A recent report suggests that an 18-nucleotide short RNA produced from a coding mRNA regulates translation [112]. Also, ncRNAs derived from 3′ ends of mRNAs associate with Argonaute proteins, suggesting they represent novel regulatory molecules [90]. Short RNAs derived from protein-coding transcripts can also mediate transgenerational silencing of the parent gene [14]. Conceivably, cleavage could also generate functional lncRNAs from a longer precursor ncRNA, where both the precursor and the product could have different functions.

Finally, we note that not every lncRNA transcript functions solely as a noncoding element. Peptide sequencing data revealed the presence of 250 novel mouse peptides encoded by presumed lncRNAs [113]. The full extent of the novel mammalian proteome encoded by lncRNAs is not yet clear, however. Although many lncRNAs appear to associate with ribosomes [105,114], this frequently does not result in protein synthesis [115,116], but instead could reveal lncRNA regulation of translation [105]. Nonetheless, the protein-coding potential is currently used as one metric in lncRNA definition [37].
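
As a rough illustration of what a protein-coding-potential metric can look like, the Python sketch below measures the longest open reading frame (ORF) on the forward strand of a transcript. This is only one simple proxy and is not the metric used in [37]. The function names and the 100-codon cut-off are assumptions chosen for illustration, and real classifiers combine several lines of evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start codon included, stop codon excluded) of the
    longest ATG-initiated, stop-terminated ORF in the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # position of the first in-frame ATG of the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def flag_possible_coding(seq: str, min_codons: int = 100) -> bool:
    """Flag a presumed lncRNA for closer inspection when its longest ORF
    reaches `min_codons` (an illustrative cut-off, not taken from the text)."""
    return longest_orf_codons(seq) >= min_codons


if __name__ == "__main__":
    toy_transcript = "CCATGGCCGCCTGACC"
    print(longest_orf_codons(toy_transcript), flag_possible_coding(toy_transcript))
```

An ORF-length filter of this kind cannot settle the question on its own: as noted above, ribosome association frequently does not result in protein synthesis, and some presumed lncRNAs do encode peptides.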

Challenges of current lncRNA classification
As described above, the existing classifications of lncRNAs rest on their descriptive and distinctive properties: from their size, to their localization, to their function. For example, the GENCODE system, as one of the few practical and up-to-date classifications available, also classifies lncRNAs into antisense RNA or lincRNA, in addition to intron-associated biotypes [33]. Although logical principles guide these classifications, they have inherited a number of unavoidable shortcomings. First, the existing classes capture a small fraction of lncRNAs present in the cell as illustrated in Figure 2. The various lists of lncRNAs annotated based on resemblance to protein-coding mRNAs account for only 0.05–1.12% of cellular RNA (Figure 2), while functional intronic RNAs could constitute as much as 16% [50]. Second, the overlap between multiple existing annotations of lncRNAs derived by different groups is small [33]. Third, the descriptions of the classes in the current annotation schemes can be vague. For example, a lncRNA could initiate at an enhancer element or initiate a large distance away and merely overlap it, yet currently they would both be classified as eRNAs. Fourth, the existing classes are not mutually exclusive. Thus, a lincRNA could theoretically also classify as an eRNA, and an LSINCT, and a CAR, and a T-UCF. For example, ANRIL is a lncRNA [117], a NAT [117], and a circular RNA [118]. This point is particularly problematic, as few datasets cover all of these characteristics and therefore many ncRNAs are not comprehensively assessed. Fifth, they lack systematisation: following the current schemes, in the future one might expect hundreds of overlapping classes of ncRNAs as new knowledge is incorporated into the classification. There are already at least 50 associations with a multitude of biological or biochemical processes (Table 1). Sixth, an attribute used in the current classifications may decline over time in relevance or utility. Considering the growing role of trans regulation by ncRNAs via intermolecular interactions [42,82,119], the fact that a lncRNA associates with an enhancer, promoter, or intron, or is antisense to a known gene may not reflect the actual function of that ncRNA. Instead, the latter could function by interacting with transcripts derived from elsewhere in a genome.

The consolidated conceptual framework of lncRNA classification
The concepts driving lncRNA classification have begun to benefit from recent developments in the annotation of noncoding transcripts and the dramatically improved techniques for measuring them. Below, we review the conceptual components that provide the basis for these ongoing improvements (Figure 3).

Tier 1: mapping the longest unprocessed transcript
The fact that the existing lists of lncRNAs miss a large fraction of ncRNA mass (Figure 2) argues that the annotation process has to start at a higher level. The first logical step in this effort is mapping the longest non-coding transcript (Figure 3).
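As a rough illustration of what mapping the longest transcript first could look like in practice, the Python sketch below groups shorter annotations under the longest same-strand transcript that fully contains them. It is a minimal sketch under stated assumptions, not the authors' procedure: the `Annotation` record, the `group_under_longest` function, the containment rule, and all names and coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical annotation record (0-based start, exclusive end)."""
    name: str
    chrom: str
    strand: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def group_under_longest(annotations):
    """Group annotations into loci keyed by the longest containing transcript.

    Annotations are visited from longest to shortest. Each one is attached to
    the first existing scaffold on the same chromosome and strand that fully
    contains it; otherwise it becomes a new scaffold. Full containment is an
    assumption made for this sketch; a real pipeline might use partial overlap.
    """
    loci = {}  # scaffold Annotation -> list of member Annotations
    ordered = sorted(annotations, key=lambda a: (-a.length, a.chrom, a.start))
    for ann in ordered:
        for scaffold in loci:
            same_strand = (scaffold.chrom == ann.chrom
                           and scaffold.strand == ann.strand)
            if same_strand and scaffold.start <= ann.start and ann.end <= scaffold.end:
                loci[scaffold].append(ann)
                break
        else:
            # No existing scaffold contains this annotation.
            loci[ann] = []
    return loci


# Invented coordinates, for illustration only.
example = [
    Annotation("vlinc_A", "chr8", "+", 1_000, 900_000),
    Annotation("linc_1", "chr8", "+", 5_000, 9_000),
    Annotation("est_7", "chr8", "+", 400_000, 400_600),
    Annotation("minus_strand_rna", "chr8", "-", 5_000, 9_000),
]
for scaffold, members in group_under_longest(example).items():
    print(scaffold.name, [m.name for m in members])
```

In this toy input the two short plus-strand annotations end up under the long transcript, while the minus-strand annotation forms its own locus.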

[Figure 2: plot not recoverable from the text extraction. Axes: relative mass (%), relative expression (%), and sequence conservation (%). Lists of ncRNAs shown, numbered as in the figure and grouped by the basis of classification:]
Transcript length: 1, intergenic macroRNAs (vlincRNAs) [28]; 2, vlincRNAs [29].
Association with annotated protein-coding genes: 3, introns (estimated from mouse data [50]); 4, intronic antisense [51].
Association with other annotated genomic elements: 5, promoter-associated ncRNAs; 6, transcribed enhancers [70].
Protein-coding mRNA resemblance: 7, intronic sense (GENCODE) [33]; 8, lincRNAs [34]; 9, lincRNAs [71]; 10, lincRNAs (UCSC Browser); 11, lincRNAs [77]; 12, TUCPs [34]; 13, lincRNAs (GENCODE) [33].
Association with repeats: 14, distal LTR-driven vlincRNAs [29].
Sequence or structure conservation: 15, Ultra Conserved Elements [94]; 16, EvoFold [98]; 17, RNA-Z [97].
Expression in different biological states: 18, differentially-expressed intergenic macroRNAs (vlincRNAs) [28].

Figure 2. Properties of different published lists of human transcripts representing various classes of ncRNAs. Sequence conservation was defined by the conserved
elements from the Vertebrate MULTIZ Alignment & Conservation (100 Species) database from the University of California Santa Cruz (UCSC) Genome Browser
[169]. Relative conservation represents the fraction of conserved bases relative to the total lengths for each list of ncRNAs. Relative mass and expression levels represent
averages of several malignant and normal tissues profiled using single-molecule RNA-seq analysis [5,29]. Only non-ribosomal RNA reads uniquely mapping to the nuclear
genome were considered. Relative mass represents proportion of reads mapping to a particular genomic element relative to all reads. The relative expression is the relative
mass divided by the total length of each list and normalized to the relative expression of coding exons (defined by UCSC Genes). Promoter-associated RNAs were defined
by the regions 3 kb upstream of annotated start sites of UCSC Genes. Given the lack of a comprehensive list of standalone human intronic RNAs, we extrapolated the
relative mass of those based on mouse data [50]. The GENCODE annotations [33] are based on v19. Abbreviations: lincRNA, long intervening non-coding RNA; LTR, long
terminal repeat; ncRNA, noncoding RNA; TUCP, transcript of uncertain coding potential; vlincRNA, very long intergenic non-coding RNA.
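
The two quantities defined in this legend can be written out as a short calculation. The Python sketch below follows the wording of the legend (relative mass as a share of all reads; relative expression as relative mass per base, normalized to coding exons). The function names and all read counts and lengths are invented for illustration and are not taken from the figure.

```python
def relative_mass(reads_in_list, total_reads):
    """Fraction of all considered reads that map to one list of ncRNAs."""
    return reads_in_list / total_reads


def relative_expression(reads_in_list, list_length_bp,
                        exon_reads, exon_length_bp, total_reads):
    """Relative mass per base, normalized to the same quantity for coding exons."""
    list_density = relative_mass(reads_in_list, total_reads) / list_length_bp
    exon_density = relative_mass(exon_reads, total_reads) / exon_length_bp
    return list_density / exon_density


# Invented numbers, for illustration only.
total = 1_000_000
mass = relative_mass(10_000, total)
expr = relative_expression(
    reads_in_list=10_000, list_length_bp=2_000_000,
    exon_reads=300_000, exon_length_bp=30_000_000,
    total_reads=total,
)
print(f"relative mass: {100 * mass:.2f}%")
print(f"relative expression: {100 * expr:.1f}% of coding-exon level")
```

Because the total read count appears in both densities, it cancels in the ratio; it is kept in the signature only to mirror the two-step definition in the legend.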


[Figure 3: diagram not recoverable from the text extraction. Panel labels: RNA-seq data; assembly of RNA-seq data in intergenic regions into standalone ncRNA genes; analysis of global patterns of expression to identify standalone ncRNAs that are part of annotated loci; Tier 1: longest ncRNAs; grouping together shorter annotations; Tier 2: processed mRNA-like ncRNAs; Tier 3: level of expression of each transcript (a matrix of ncRNA genes and transcripts by tissues); Tier 4: modifications.]

Figure 3. Outline of the consolidated conceptual framework of ncRNA classification. Highly accurate empirical RNA-seq data drives both annotation and quantification of
the longest ncRNA (Tier 1) and of processed ncRNA species (Tier 2) across the entire genome. The quantitation data serves as the basis for the combined global matrix of
knowledge of expression of each (coding and noncoding) RNA gene and transcript across multiple biological sources (Tier 3). This information provides the input for the
functional annotation of non-coding transcripts using systems biology approaches. Mapping of RNA modifications provides the final layer of knowledge in this scheme.
Abbreviations: ncRNA, noncoding RNA.
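
One way to picture the Tier 3 matrix is as a table of transcripts by biological sources, which can then be queried for co-expression between overlapping transcripts. The Python sketch below is only an illustration under stated assumptions: the transcript names, expression values, the `independently_expressed` helper, and the 0.5 correlation cutoff are all hypothetical, and Pearson correlation is used here simply as a familiar stand-in for whatever co-expression measure a real analysis would choose.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def independently_expressed(matrix, candidate, neighbours, max_abs_r=0.5):
    """True if the candidate correlates weakly with every listed neighbour.

    The 0.5 cutoff is an arbitrary value chosen for this sketch.
    """
    return all(
        abs(pearson(matrix[candidate], matrix[nb])) <= max_abs_r
        for nb in neighbours
    )


# Rows: transcripts; columns: five hypothetical tissues (invented values).
expression = {
    "host_gene_exon": [10, 20, 30, 40, 50],
    "intronic_rna": [5.0, 5.5, 4.8, 5.2, 5.1],
    "processed_fragment": [1, 2, 3, 4, 5],
}
for name in ("intronic_rna", "processed_fragment"):
    flag = independently_expressed(expression, name, ["host_gene_exon"])
    print(name, "independent of host gene:", flag)
```

In this toy matrix the intronic RNA has a flat profile that does not track the host gene, whereas the processed fragment rises and falls with it, so only the former is flagged as a candidate standalone transcript.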

Subdividing the intergenic space into standalone ncRNA loci (genes) has obvious benefits. First, it allows for consolidation of disparate and often incomplete ncRNAs represented by ESTs, lincRNAs, and mRNAs, into a single locus. As an illustration, the clinically important 8q24 region upstream of the MYC gene contains a number of different lncRNA elements [12] (Figure 4). Given the distances that separate them, it may not be obvious that

[Figure 4: genome browser view not recoverable from the text extraction. Region: chr8, approximately 127.7–128.7 Mb (hg19; 500 kb scale bar). Tracks: H1-ESC nuclear polyA- (+) strand; NHEK nuclear polyA- (+) strand; NHEK nuclear polyA- (-) strand; NHEK ENCODE promoters; UCSC Genes (MYC); CCAT1 RNA; CCAT2 RNA; vlincRNAs (St. Laurent et al.); lincRNAs; rs6983267; GWAS catalog (obesity-related traits, prostate cancer, chronic lymphocytic leukemia, breast cancer, colorectal cancer, response to antidepressant treatment, bladder cancer).]

Figure 4. A genomic view of the 8q24 region upstream of the human MYC gene. This clinically important locus containing many GWAS hits associated with several cancers
represents an example of a genomic region that could clearly benefit from the new annotation scheme. The RNA-seq analysis reveals fairly strong signal on both strands
covering most of this >1 Mbp region. Yet, the known lncRNA annotations represent only a small fraction of this locus and judging by the distribution of the RNA-seq signal
and known promoters, are likely part of much larger transcript units (for example vlincRNAs shown on the figure). Transcriptome RNA-seq data are represented by the
polyA- nuclear RNA from normal epidermal keratinocytes (NHEK) and embryonic stem cells (H1) generated by the ENCODE consortium [3]. In addition, vlincRNAs [29],
promoters [32], and disease-associated variants from GWASs [170] are shown. Abbreviations: GWAS, genome-wide association study; lncRNA, long noncoding RNA;
vlincRNA, very long intergenic noncoding RNA. Reproduced, with permission, from [12].

these annotations are part of the same transcript, yet the RNA-seq signal clearly groups them together into one locus associated with a specific regulatory region (Figure 4). Second, such a grouping would allow experiments to focus on the locus rather than its many different genomic elements, allowing for seamless integration of the data from independent experiments. Third, it would clarify the issue of RNA association with different genomic features, for example enhancers, by showing whether the transcripts originate from these DNA elements, or merely overlap them. Overall, the longest transcripts would serve as scaffolds to bring together all the disparate annotations into gene-like structures with their own specific transcription regulatory regions, helping to resolve the problem of overlap. In this case, promoter information and CAGE tag data (Box 1) [25] would help in both assessing the quality of the map and understanding the regulation of such genes.
A lncRNA may not always be produced from its own dedicated promoter, as exemplified by circular intronic RNAs [55]. Such stand-alone functional intronic RNAs would however have certain features, such as low correlation with other exons or introns of the same gene, relatively high expression levels with low variance, and occasionally, differential expression in a biological time course [50]. These properties can now be measured by highly quantitative analysis of RNA levels across multiple diverse samples. Thus, defining standalone transcripts would require an additional dimension: quantitative measurement to allow for analysis of coexpression with multiple neighbouring transcripts.

Tier 2: defining processed transcripts
The transcriptional forest concept implies that multiple RNA species share the same genomic space, either transcribed independently or derived by processing of longer precursors [38,43,120]. Mapping sites of polyadenylation [121,122] provides additional information on completeness of such maps. Application of highly sensitive methods targeted to specific regions using RACE [47] or capture-sequencing [45] would increase the discovery of processed species. Multiple levels of processing exist, such as A to I editing [123,124] and others [125] each under their own regulatory control.

Tier 3: the additional dimension of expression levels
In the past, genomic coordinates alone determined genomic annotation. However, in the case of overlapping transcripts it cannot predict which isoforms likely function in a particular tissue. Thus, progress of our understanding of the complexities of the transcriptome (Box 1) argues for an additional dimension: expression of each RNA produced at a given locus (Figure 3). The pioneering efforts by the FANTOM Consortium [25] make this undertaking possible.

Tier 4: RNA modifications
A map of all (>100) RNA modifications [125] constitutes the final layer of annotation (Figure 3). These patterns could represent an information rich source for distinguishing RNA molecules, thus assisting in their classification. So far, technological limitations prevent us from efficient genome-wide mapping of most RNA modifications, and assaying technically accessible modifications is fraught with pitfalls [124], such as false discovery due to technological noise [126]. Hopefully, existing [127] and emerging technologies [128] will enable progress in cataloguing RNA modifications.

From consolidated conceptual framework to function
The first key of the new framework is consolidation, achieved by grouping disparate lncRNA transcripts into genes or standalone Tier 1 transcripts. The second aspect

moves from phenomenological description of lncRNAs to their genomic coordinates, which parallels the evolution of the concept of the gene [129]. The third aspect uses empirical data to unravel the layers of overlapping transcripts, as exemplified by the study of intronic RNAs in mice [50]. The fourth aspect assigns functional weight to a lncRNA by integrating information from diverse high-throughput methods [130]. Among others, these methods include cross-linking immunoprecipitation (CLIP)-seq [131] for detection and measurement of RNA–protein interactions, selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE)-seq [132] for analysis of RNA secondary structure, chromatin isolation by RNA purification (ChIRP)-seq [133] for measurement of RNA–chromatin interactions, and global run-on (GRO)-seq (Box 1) to measure transcription [134]. This multifaceted approach combines independent sources of evidence for the functionality of an RNA molecule, underscoring the complexities of lncRNA involvement in the flow of biological information [24,135]. Fortunately, new machine-learning methods have emerged to identify and decipher complex patterns in the data, yielding probabilistic evaluation of ncRNA function across large populations of transcripts [71,136].
The evolution of the conceptual framework of ncRNA classification described above provides a roadmap for the analysis of an RNA-seq experiment and its integration into a broader knowledge base of high-throughput multidimensional information. The availability of a common set of genomic coordinates for various stages of ncRNA processing (Figure 3, Tiers 1 and 2) provides the key resource that enables the integration of data from multiple experiments. Tiers 3 and 4 assist in refinement of the classification by separating overlapping transcripts that have different patterns of gene expression and modification.
For effective data integration, systems biology approaches require expression datasets that cover large numbers of biological sources with extraordinary precision [137]. Small yet biologically important effects [138] can be lost in technological noise [139]. Similarly, loss of ncRNAs and transcriptome complexity can occur during library preparation [140] and RNA isolation [141]. Processing of NGS data also presents a number of challenges. For example, algorithms building transcripts (Tier 1) should account for regions of the genome that have poor alignability due to repetitive regions [142]. NGS reads unassigned after these steps can then serve as input into ab initio algorithms such as the genomic binning approach [50,139] to detect differentially expressed regions [27–30]. Without a doubt, cycles of iterations consisting of annotation, expression measurement, and addition of new transcripts and transcribed regions from global RNA measurement experiments will illuminate the puzzle of pervasive transcription.

Concluding remarks
Assigning functions to the mass of lncRNAs produced in the cell requires novel thinking and approaches. Many of the classic reductionist methods that worked well for coding genes have proven less useful to the challenges of deciphering the elaborate populations of transcripts generated by pervasive ncRNA transcription. Instead, global, systems-biology and genomics-driven approaches have emerged, which rely on an integrative framework of annotation and classification. This framework increases emphasis on the quality of genome-wide RNA measurements to allow for the ready integration of data from multiple types of experiments. It facilitates the development of improved tools for the integration of the highly multi-dimensional data from these experiments into the classification framework, thereby revealing associations between both coding and non-coding transcripts. Finally, it supports the rational and structured selection of subsets of these predictions for biological follow-up using reductionist methods.

Online links
FANTOM Consortium: http://fantom.gsc.riken.jp/
ENCODE Consortium: http://www.genome.gov/encode/
St. Laurent Institute: http://www.stlaurentinstitute.com/
Database of RNA modifications: http://mods.rna.albany.edu/
NIH Roadmap Epigenomics project: http://www.roadmapepigenomics.org/

Acknowledgements
We wish to thank Maxim Ri, Denis Antonets and Dmitry Shtokalo for help with the bioinformatics analysis and Mark Mazaitis and Anna Miminoshvili for expert assistance with the figure preparations. Studies on long noncoding RNAs in the Wahlestedt laboratory are currently supported by the US National Institutes of Health awards DA035592, MH084880 and NS071674.

References
1 Kapranov, P. et al. (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919
2 Okazaki, Y. et al. (2002) Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573
3 Djebali, S. et al. (2012) Landscape of transcription in human cells. Nature 489, 101–108
4 Bernstein, B.E. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74
5 Kapranov, P. et al. (2010) The majority of total nuclear-encoded non-ribosomal RNA in a human cell is ‘dark matter’ un-annotated RNA. BMC Biol. 8, 149
6 Clark, B.S. and Blackshaw, S. (2014) Long non-coding RNA-dependent transcriptional regulation in neuronal development and disease. Front. Genet. 5, 164
7 Gibb, E.A. et al. (2011) The functional role of long non-coding RNA in human carcinomas. Mol. Cancer 10, 38
8 Qureshi, I.A. and Mehler, M.F. (2013) Long non-coding RNAs: novel targets for nervous system disease diagnosis and therapy. Neurotherapeutics 10, 632–646
9 Reis, E.M. and Verjovski-Almeida, S. (2012) Perspectives of long non-coding RNAs in cancer diagnostics. Front. Genet. 3, 32
10 Vergara, I.A. et al. (2012) Genomic “dark matter” in prostate cancer: exploring the clinical utility of ncRNA as biomarkers. Front. Genet. 3, 23
11 Wahlestedt, C. (2013) Targeting long non-coding RNA to therapeutically upregulate gene expression. Nat. Rev. Drug Discov. 12, 433–446
12 St Laurent, G. et al. (2014) Dark matter RNA illuminates the puzzle of genome-wide association studies. BMC Med. 12, 97
13 Clark, M.B. et al. (2013) The dark matter rises: the expanding world of regulatory RNAs. Essays Biochem. 54, 1–16
14 Liebers, R. et al. (2014) Epigenetic regulation by heritable RNA. PLoS Genet. 10, e1004296
15 Smalheiser, N.R. (2014) The RNA-centred view of the synapse: non-coding RNAs and synaptic plasticity. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 369

16 Penman, S. (1991) If genes just make proteins and our proteins are the same, then why are we so different? J. Cell. Biochem. 47, 95–98
17 Zuckerkandl, E. (1981) A general function of noncoding polynucleotide sequences. Mass binding of transconformational proteins. Mol. Biol. Rep. 7, 149–158
18 Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227
19 Khalil, A.M. et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U.S.A. 106, 11667–11672
20 Zhang, K. et al. (2014) The ways of action of long non-coding RNAs in cytoplasm and nucleus. Gene 547, 1–9
21 Kapranov, P. and St Laurent, G. (2012) Dark matter RNA: existence, function, and controversy. Front. Genet. 3, 60
22 Mattick, J.S. (2009) The genetic signatures of noncoding RNAs. PLoS Genet. 5, e1000459
23 Morris, K.V. and Mattick, J.S. (2014) The rise of regulatory RNA. Nat. Rev. Genet. 15, 423–437
24 St Laurent, G., III and Wahlestedt, C. (2007) Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci. 30, 612–621
25 FANTOM Consortium and the RIKEN PMI and CLST (DGT) (2014) A promoter-level mammalian expression atlas. Nature 507, 462–470
26 Lonsdale, J. et al. (2013) The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585
27 Guttman, M. et al. (2010) Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510
28 Hackermuller, J. et al. (2014) Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein coding RNAs. Genome Biol. 15, R48
29 St Laurent, G. et al. (2013) VlincRNAs controlled by retroviral elements are a hallmark of pluripotency and cancer. Genome Biol. 14, R73
30 Trapnell, C. et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578
31 Boley, N. et al. (2014) Genome-guided transcript assembly by integrative analysis of RNA sequence data. Nat. Biotechnol. 32, 341–346
32 Ernst, J. et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49
33 Harrow, J. et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774
34 Cabili, M.N. et al. (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927
35 Jia, H. et al. (2010) Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA 16, 1478–1487
36 Amaral, P.P. et al. (2011) lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res. 39, D146–D151
37 Volders, P.J. et al. (2015) An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 43, D174–D180
38 Kapranov, P. et al. (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488
39 Wang, K.C. and Chang, H.Y. (2011) Molecular mechanisms of long noncoding RNAs. Mol. Cell 43, 904–914
40 Sharon, D. et al. (2013) A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31, 1009–1014
41 Lazorthes, S. et al. (2015) A vlincRNA participates in senescence maintenance by relieving H2AZ-mediated repression at the INK4 locus. Nat. Commun. 6, 5971
42 van Dijk, M. et al. (2012) HELLP babies link a novel lincRNA to the trophoblast cell cycle. J. Clin. Invest. 122, 4003–4011
43 Carninci, P. et al. (2005) The transcriptional landscape of the mammalian genome. Science 309, 1559–1563
44 Kapranov, P. et al. (2005) Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res. 15, 987–997
45 Mercer, T.R. et al. (2012) Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat. Biotechnol. 30, 99–104
46 Denoeud, F. et al. (2007) Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res. 17, 746–759
47 Djebali, S. et al. (2008) Efficient targeted transcript discovery via array-based normalization of RACE libraries. Nat. Methods 5, 629–635
48 Makrythanasis, P. et al. (2009) Variation in novel exons (RACEfrags) of the MECP2 gene in Rett syndrome patients and controls. Hum. Mutat. 30, E866–E879
49 Nakaya, H.I. et al. (2007) Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol. 8, R43
50 St Laurent, G., III et al. (2012) Intronic RNAs constitute the major fraction of the non-coding RNA in mammalian cells. BMC Genomics 13, 504
51 Fachel, A.A. et al. (2013) Expression analysis and in silico characterization of intronic long noncoding RNAs in renal cell carcinoma: emerging functional associations. Mol. Cancer 12, 140
52 Engelhardt, J. and Stadler, P.F. (2012) Hidden treasures in unspliced EST data. Theory Biosci. 131, 49–57
53 Moss, W.N. and Steitz, J.A. (2013) Genome-wide analyses of Epstein–Barr virus reveal conserved RNA structures and a novel stable intronic sequence RNA. BMC Genomics 14, 543
54 Gardner, E.J. et al. (2012) Stable intronic sequence RNA (sisRNA), a new class of noncoding RNA from the oocyte nucleus of Xenopus tropicalis. Genes Dev. 26, 2550–2559
55 Zhang, Y. et al. (2013) Circular intronic long noncoding RNAs. Mol. Cell 51, 792–806
56 Cheng, J. et al. (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154
57 Katayama, S. et al. (2005) Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566
58 Affymetrix ENCODE Transcriptome Project; Cold Spring Harbor Laboratory ENCODE Transcriptome Project (2009) Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 457, 1028–1032
59 Otsuka, Y. et al. (2009) Identification of a cytoplasmic complex that adds a cap onto 5′-monophosphate RNA. Mol. Cell. Biol. 29, 2155–2167
60 Mercer, T.R. et al. (2011) Expression of distinct RNAs from 3′ untranslated regions. Nucleic Acids Res. 39, 2393–2403
61 Abdelhamid, R.F. et al. (2014) Multiplicity of 5′ cap structures present on short RNAs. PLoS ONE 9, e102895
62 Jeck, W.R. et al. (2013) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19, 141–157
63 Djebali, S. et al. (2012) Evidence for transcript networks composed of chimeric RNAs in human cells. PLoS ONE 7, e28213
64 Finta, C. and Zaphiropoulos, P.G. (2002) Intergenic mRNA molecules resulting from trans-splicing. J. Biol. Chem. 277, 5882–5890
65 Zaphiropoulos, P.G. (1999) RNA molecules containing exons originating from different members of the cytochrome P450 2C gene subfamily (CYP2C) in human epidermis and liver. Nucleic Acids Res. 27, 2585–2590
66 Kapranov, P. et al. (2010) New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism. Nature 466, 642–646
67 Volloch, V. et al. (1996) Antisense globin RNA in mouse erythroid tissues: structure, origin, and possible function. Proc. Natl. Acad. Sci. U.S.A. 93, 2476–2481
68 Caudron-Herger, M. et al. (2011) Coding RNAs with a non-coding function: maintenance of open chromatin structure. Nucleus 2, 410–424
69 Rinn, J.L. and Chang, H.Y. (2012) Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145–166
70 Andersson, R. et al. (2014) An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461
71 Necsulea, A. et al. (2014) The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640
72 Gupta, R.A. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076

73 Du, Z. et al. (2013) Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat. Struct. Mol. Biol. 20, 908–913
74 Harries, L.W. (2012) Long non-coding RNAs and human disease. Biochem. Soc. Trans. 40, 902–906
75 Faulkner, G.J. et al. (2009) The regulated retrotransposon transcriptome of mammalian cells. Nat. Genet. 41, 563–571
76 Fort, A. et al. (2014) Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat. Genet. 46, 558–566
77 Kelley, D.R. and Rinn, J.L. (2012) Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 13, R107
78 Sell, S. (2010) On the stem cell origin of cancer. Am. J. Pathol. 176, 2584–2594
79 Mariner, P.D. et al. (2008) Human Alu RNA is a modular transacting repressor of mRNA transcription during heat shock. Mol. Cell 29, 499–509
80 Belancio, V.P. et al. (2009) LINE dancing in the human genome: transposable elements and disease. Genome Med. 1, 97
81 Flockerzi, A. et al. (2008) Expression patterns of transcribed human endogenous retrovirus HERV-K(HML-2) loci in human tissues and the need for a HERV Transcriptome Project. BMC Genomics 9, 354
82 Wang, J. et al. (2013) Control of myogenesis by rodent SINE-containing lncRNAs. Genes Dev. 27, 793–804
83 Zheng, D. et al. (2007) Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res. 17, 839–851
84 Pink, R.C. et al. (2011) Pseudogenes: pseudo-functional or key regulators in health and disease? RNA 17, 792–798
85 Johnsson, P. et al. (2013) A pseudogene long-noncoding-RNA network regulates PTEN transcription and translation in human cells. Nat. Struct. Mol. Biol. 20, 440–446
86 Poliseno, L. et al. (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033–1038
87 Kerin, T. et al. (2012) A noncoding RNA antisense to moesin at 5p14.1 in autism. Sci. Transl. Med. 4, 128ra140
88 Arigo, J.T. et al. (2006) Termination of cryptic unstable transcripts is directed by yeast RNA-binding proteins Nrd1 and Nab3. Mol. Cell 23, 841–851
89 Ntini, E. et al. (2013) Polyadenylation site-induced decay of upstream transcripts enforces promoter directionality. Nat. Struct. Mol. Biol. 20, 923–928
90 Valen, E. et al. (2011) Biogenic mechanisms and utilization of small RNAs derived from human protein-coding genes. Nat. Struct. Mol. Biol. 18, 1075–1082
91 van Dijk, E.L. et al. (2011) XUTs are a class of Xrn1-sensitive antisense regulatory non-coding RNA in yeast. Nature 475, 114–117
92 Xu, Z. et al. (2009) Bidirectional promoters generate pervasive transcription in yeast. Nature 457, 1033–1037
93 Derrien, T. et al. (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789
94 Bejerano, G. et al. (2004) Ultraconserved elements in the human genome. Science 304, 1321–1325
95 Hudson, R.S. et al. (2013) Transcription signatures encoded by ultraconserved genomic regions in human prostate cancer. Mol. Cancer 12, 13
96 Mauger, D.M. et al. (2013) The genetic code as expressed through relationships between mRNA structure and protein function. FEBS Lett. 587, 1180–1188
97 Washietl, S. et al. (2005) Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. U.S.A. 102, 2454–2459
98 Pedersen, J.S. et al. (2006) Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol. 2, e33
99 Washietl, S. et al. (2007) Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 17, 852–864
100 Ferdin, J. et al. (2013) HINCUTs in cancer: hypoxia-induced noncoding ultraconserved transcripts. Cell Death Differ. 20, 1675–
102 Mondal, T. et al. (2010) Characterization of the RNA content of chromatin. Genome Res. 20, 899–907
103 Zhao, J. et al. (2010) Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953
104 Bergmann, J.H. and Spector, D.L. (2014) Long non-coding RNAs: modulators of nuclear structure and function. Curr. Opin. Cell Biol. 26, 10–18
105 van Heesch, S. et al. (2014) Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes. Genome Biol. 15, R6
106 Rackham, O. et al. (2011) Long noncoding RNAs are generated from the mitochondrial genome and regulated by nuclear-encoded proteins. RNA 17, 2085–2093
107 Kung, J.T. et al. (2013) Long noncoding RNAs: past, present, and future. Genetics 193, 651–669
108 Orom, U.A. et al. (2010) Long noncoding RNAs with enhancer-like function in human cells. Cell 143, 46–58
109 Tay, Y. et al. (2014) The multilayered complexity of ceRNA crosstalk and competition. Nature 505, 344–352
110 St Laurent, G. et al. (2012) Dark matter RNA: an intelligent scaffold for the dynamic regulation of the nuclear information landscape. Front. Genet. 3, 57
111 Wilusz, J.E. et al. (2008) 3′ end processing of a long nuclear-retained noncoding RNA yields a tRNA-like cytoplasmic RNA. Cell 135, 919–932
112 Pircher, A. et al. (2014) An mRNA-derived noncoding RNA targets and regulates the ribosome. Mol. Cell 54, 147–155
113 Prabakaran, S. et al. (2014) Quantitative profiling of peptides from RNAs classified as noncoding. Nat. Commun. 5, 5429
114 Ingolia, N.T. et al. (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802
115 Banfai, B. et al. (2012) Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 22, 1646–1657
116 Guttman, M. et al. (2013) Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell 154, 240–251
117 Pasmant, E. et al. (2011) ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 25, 444–448
118 Burd, C.E. et al. (2010) Expression of linear and novel circular forms of an INK4/ARF-associated non-coding RNA correlates with atherosclerosis risk. PLoS Genet. 6, e1001233
119 Holdt, L.M. et al. (2013) Alu elements in ANRIL non-coding RNA at chromosome 9p21 modulate atherogenic cell functions through trans-regulation of gene networks. PLoS Genet. 9, e1003588
120 Kapranov, P. et al. (2007) Genome-wide transcription and the implications for genomic organization. Nat. Rev. Genet. 8, 413–423
121 Jan, C.H. et al. (2011) Formation, regulation and evolution of Caenorhabditis elegans 3′ UTRs. Nature 469, 97–101
122 Ozsolak, F. et al. (2010) Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018–1029
123 Peng, Z. et al. (2012) Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat. Biotechnol. 30, 253–260
124 St Laurent, G. et al. (2013) Genome-wide analysis of A-to-I RNA editing by single-molecule sequencing in Drosophila. Nat. Struct. Mol. Biol. 20, 1333–1339
125 Grosjean, H. (2009) Nucleic acids are not boring long polymers of only four types of nucleotides: a guided tour. In DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution (7th edn) (Grosjean, H., ed.), pp. 1–18, Landes Bioscience
126 Kleinman, C.L. and Majewski, J. (2012) Comment on “Widespread RNA and DNA sequence differences in the human transcriptome”. Science 335, 1302 author reply 1302
127 Flusberg, B.A. et al. (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465
128 Barhoumi, A. and Halas, N.J. (2011) Detecting chemically modified
1687 DNA bases using surface-enhanced raman spectroscopy. J. Phys.
101 Silva, J.M. et al. (2010) Identification of long stress-induced non- Chem. Lett. 2, 3118–3123
coding transcripts that have altered expression in cancer. 129 Gerstein, M.B. et al. (2007) What is a gene, post-ENCODE? History
Genomics 95, 355–362 and updated definition. Genome Res. 17, 669–681


130 Mudge, J.M. et al. (2013) Functional transcriptomics in the post-ENCODE era. Genome Res. 23, 1961–1973
131 Zhang, C. and Darnell, R.B. (2011) Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nat. Biotechnol. 29, 607–614
132 Siegfried, N.A. et al. (2014) RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965
133 Chu, C. et al. (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell 44, 667–678
134 Hah, N. et al. (2011) A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 145, 622–634
135 Mattick, J.S. (2007) A new paradigm for developmental biology. J. Exp. Biol. 210, 1526–1547
136 Dozmorov, M.G. et al. (2013) Systematic classification of non-coding RNAs by epigenomic similarity. BMC Bioinform. 14 (Suppl 14), S2
137 Wan, Y.W. et al. (2014) On the reproducibility of TCGA ovarian cancer microRNA profiles. PLoS ONE 9, e87782
138 St Laurent, G. et al. (2013) On the importance of small changes in RNA expression. Methods 63, 18–24
139 Raz, T. et al. (2011) Protocol dependence of sequencing-based gene expression measurements. PLoS ONE 6, e19287
140 Fu, G.K. et al. (2014) Molecular indexing enables quantitative targeted RNA sequencing and reveals poor efficiencies in standard library preparations. Proc. Natl. Acad. Sci. U.S.A. 111, 1891–1896
141 Sultan, M. et al. (2014) Influence of RNA extraction methods and library selection schemes on RNA-seq data. BMC Genomics 15, 675
142 Derrien, T. et al. (2012) Fast computation and applications of genome mappability. PLoS ONE 7, e30377
143 Ozsolak, F. et al. (2009) Direct RNA sequencing. Nature 461, 814–818
144 Kodzius, R. et al. (2006) CAGE: cap analysis of gene expression. Nat. Methods 3, 211–222
145 Philippe, N. et al. (2014) Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome. Nucleic Acids Res. 42, 2820–2832
146 Fullwood, M.J. et al. (2009) Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 19, 521–532
147 Pachnis, V. et al. (1984) Locus unlinked to alpha-fetoprotein under the control of the murine raf and Rif genes. Proc. Natl. Acad. Sci. U.S.A. 81, 5523–5527
148 Wang, K.C. et al. (2011) A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124
149 Huarte, M. et al. (2010) A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142, 409–419
150 Brown, C.J. et al. (1992) The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71, 527–542
151 Vance, K.W. et al. (2014) The long non-coding RNA Paupar regulates the expression of both local and distal genes. EMBO J. 33, 296–311
152 Koerner, M.V. et al. (2009) The function of non-coding RNAs in genomic imprinting. Development 136, 1771–1783
153 Faghihi, M.A. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723–730
154 Thrash-Bingham, C.A. and Tartof, K.D. (1999) aHIF: a natural antisense transcript overexpressed in human renal cancer and during hypoxia. J. Natl. Cancer Inst. 91, 143–151
155 Lee, J.T. et al. (1999) Tsix, a gene antisense to Xist at the X-inactivation centre. Nat. Genet. 21, 400–404
156 Seila, A.C. et al. (2008) Divergent transcription from active promoters. Science 322, 1849–1851
157 Kim, T.K. et al. (2010) Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187
158 Flynn, R.A. et al. (2011) Antisense RNA polymerase II divergent transcripts are P-TEFb dependent and substrates for the RNA exosome. Proc. Natl. Acad. Sci. U.S.A. 108, 10460–10465
159 Luke, B. and Lingner, J. (2009) TERRA: telomeric repeat-containing RNA. EMBO J. 28, 2503–2510
160 Hall, L.L. et al. (2014) Stable C0T-1 repeat RNA is abundant and is associated with euchromatic interphase chromosomes. Cell 156, 907–919
161 Rangwala, S.H. et al. (2009) Many LINE1 elements contribute to the transcriptome of human somatic cells. Genome Biol. 10, R100
162 Ting, D.T. et al. (2011) Aberrant overexpression of satellite repeats in pancreatic and other epithelial cancers. Science 331, 593–596
163 Zheng, R. et al. (2010) Polypurine-repeat-containing RNAs: a novel class of long non-coding RNA in mammalian cells. J. Cell Sci. 123, 3734–3744
164 Schulz, D. et al. (2013) Transcriptome surveillance by selective termination of noncoding RNA synthesis. Cell 155, 1075–1087
165 Saini, H.K. et al. (2008) Annotation of mammalian primary microRNAs. BMC Genomics 9, 564
166 Cai, X. and Cullen, B.R. (2007) The imprinted H19 noncoding RNA is a primary microRNA precursor. RNA 13, 313–316
167 Li, X.Z. et al. (2013) Defining piRNA primary transcripts. Cell Cycle 12, 1657–1658
168 Mao, Y.S. et al. (2011) Biogenesis and function of nuclear bodies. Trends Genet. 27, 295–306
169 Blanchette, M. et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715
170 Hindorff, L.A. et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. U.S.A. 106, 9362–9367
