
LETTER doi:10.1038/nature20149

Local regulation of gene expression by lncRNA promoters, transcription and splicing
Jesse M. Engreitz1,2, Jenna E. Haines1†, Elizabeth M. Perez1, Glen Munson1, Jenny Chen1,2, Michael Kane1, Patrick E. McDonel1†, Mitchell Guttman3 & Eric S. Lander1,4,5

1Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA. 2Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, USA. 3Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA. 4Department of Biology, MIT, Cambridge, Massachusetts 02139, USA. 5Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02114, USA. †Present addresses: Department of Molecular & Cell Biology, University of California Berkeley, Berkeley, California 94720, USA (J.E.H.); University of Massachusetts Medical School, Worcester, Massachusetts 01655, USA (P.E.M.).

452 | NATURE | VOL 539 | 17 NOVEMBER 2016
© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

Mammalian genomes are pervasively transcribed1,2 to produce thousands of long non-coding RNAs (lncRNAs)3,4. A few of these lncRNAs have been shown to recruit regulatory complexes through RNA–protein interactions to influence the expression of nearby genes5–7, and it has been suggested that many other lncRNAs can also act as local regulators8,9. Such local functions could explain the observation that lncRNA expression is often correlated with the expression of nearby genes2,10,11. However, these correlations have been challenging to dissect12 and could alternatively result from processes that are not mediated by the lncRNA transcripts themselves. For example, some gene promoters have been proposed to have dual functions as enhancers13–16, and the process of transcription itself may contribute to gene regulation by recruiting activating factors or remodelling nucleosomes10,17,18. Here we use genetic manipulation in mouse cell lines to dissect 12 genomic loci that produce lncRNAs and find that 5 of these loci influence the expression of a neighbouring gene in cis. Notably, none of these effects requires the specific lncRNA transcripts themselves and instead involves general processes associated with their production, including enhancer-like activity of gene promoters, the process of transcription, and the splicing of the transcript. Furthermore, such effects are not limited to lncRNA loci: we find that four out of six protein-coding loci also influence the expression of a neighbour. These results demonstrate that cross-talk among neighbouring genes is a prevalent phenomenon that can involve multiple mechanisms and cis-regulatory signals, including a role for RNA splice sites. These mechanisms may explain the function and evolution of some genomic loci that produce lncRNAs and broadly contribute to the regulation of both coding and non-coding genes.

We analysed 12 lncRNA loci whose RNA transcripts in mouse embryonic stem cells (mES cells) show preferential localization to the nucleus and span a range of abundance levels (Methods and Extended Data Fig. 1). For each locus, we looked for direct regulatory effects on local gene expression by using a genetic approach based on classical cis–trans tests (Fig. 1a and Supplementary Note 1). Specifically, we generated clonal cell lines carrying heterozygous knockouts of the promoter (~600–1,000-bp deletions) (Fig. 1b) and compared the expression of nearby genes within 1 Mb on the cis and trans alleles (that is, on the modified and unmodified homologous chromosomes in the same cells) (Supplementary Note 2). Changes in neighbouring gene expression that involve only the cis allele very probably result from direct, local functions of the lncRNA locus, while changes that involve both the cis and trans alleles probably result as indirect, downstream consequences of the lncRNA acting elsewhere (Supplementary Note 1). We performed genetic modifications in 129/castaneus F1 hybrid mES cells that contain a polymorphic site every ~140 bp, enabling us to distinguish the two alleles using RNA sequencing (Fig. 1b, Extended Data Fig. 2 and Supplementary Note 3).

At 5 of these 12 lncRNA loci, promoter knockouts significantly affected the expression of a nearby gene in an allele-specific manner (false discovery rate <10%), including both activating and repressive effects (Fig. 1c, d, Supplementary Note 4 and Extended Data Fig. 3). For each locus, the affected gene was located immediately adjacent to, and within 5–71 kb of, the knocked-out promoter (Fig. 1c and Extended Data Fig. 4). This indicates that a substantial fraction of lncRNA loci influence the expression of a neighbouring gene.

To test whether such effects were specific to lncRNA loci, we deleted the promoters of six protein-coding genes (Extended Data Fig. 1). Surprisingly, knockouts at four of these loci also affected the expression of a neighbour in cis (Fig. 1c, d and Extended Data Fig. 5). Thus, both non-coding and coding loci can directly influence local gene expression. These regulatory connections may contribute to the observed correlations in the expression of neighbouring genes, which have been reported both for lncRNAs and for mRNAs10,11,19,20.

Figure 1 | Many lncRNA and mRNA loci influence the expression of neighbouring genes. a, Knocking out a promoter (black) could affect a neighbouring gene (blue) directly (local) or indirectly (downstream). b, Knockout of the linc1536 promoter. Left, genotypes; right, allele-specific RNA expression for 129 and castaneus (Cast) alleles normalized to 81 control clones (+/+). Error bars, 95% confidence interval for the mean (n = 2 for −/−, 3 for +/−, 1 for −/+). c, Gene neighbourhoods oriented so each knocked-out gene (black) is transcribed in the positive direction. Blue neighbouring genes show allele-specific changes in expression. †See Supplementary Note 3. d, Average RNA expression on promoter knockout compared to wild-type alleles (n ≥ 2 alleles, see Supplementary Table 1). *FDR < 10%; ***FDR < 0.1%.

Because in these experiments we deleted gene promoters, the mechanisms underlying such cis effects could in principle involve (i) DNA regulatory elements in gene promoters13–16; (ii) the process of transcription10,17,18; or (iii) the RNA transcripts themselves5–9 (Extended Data Fig. 6a). To begin to distinguish among these possible mechanisms, we inserted early polyadenylation signals (pAS) 0.5–3 kb downstream of each transcription start site (TSS) that eliminated the production of most of the RNA while leaving the promoter sequence intact (Fig. 2 and Extended Data Fig. 6b, c, see Methods). We examined four lncRNA loci and two mRNA loci where promoter deletion affected the expression of a neighbouring gene (see Supplementary Note 5).
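The cis–trans comparison that underlies these experiments can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the function name `classify_effect`, the fixed 25% change threshold and the example values are assumptions chosen for clarity, whereas the study itself calls effects from allele-specific RNA sequencing with an FDR-based test (Supplementary Notes 2 and 4).

```python
def classify_effect(cis_ratio, trans_ratio, threshold=0.25):
    """Classify a neighbouring-gene expression change in a heterozygous knockout.

    cis_ratio:   expression of the neighbour on the modified chromosome,
                 relative to the same allele in control clones (1.0 = unchanged).
    trans_ratio: the same quantity for the unmodified homologous chromosome.
    threshold:   hypothetical cutoff for calling a change (an assumption; the
                 study uses a statistical test rather than a fixed cutoff).
    """
    cis_changed = abs(cis_ratio - 1.0) > threshold
    trans_changed = abs(trans_ratio - 1.0) > threshold
    if cis_changed and not trans_changed:
        # Only the allele carrying the modification responds: direct, local function.
        return "local (cis) effect"
    if cis_changed and trans_changed:
        # Both alleles respond: indirect consequence of the locus acting elsewhere.
        return "downstream (trans) effect"
    if trans_changed:
        return "trans-allele-only change"
    return "no effect"


# Illustrative values only: the neighbour drops to 40% on the modified allele
# while the unmodified allele is unchanged.
print(classify_effect(cis_ratio=0.4, trans_ratio=1.0))
```

The key design point is that both alleles are measured in the same cells, so any indirect effect acting through diffusible factors should change both ratios together.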
As one example, we describe the linc1536 locus, hereafter called Bendr (Bend4-regulating effects not dependent on the RNA, Fig. 2a). Whereas deleting the Bendr promoter reduced the expression of the adjacent Bend4 gene by 57%, inserting a pAS into the first intron of Bendr (~570 bp downstream of the TSS in this ~13 kb locus) had no effect on Bend4 expression despite eliminating the spliced Bendr RNA (Fig. 2b, c). Furthermore, global run-on sequencing (GRO-seq) did not detect any transcriptionally engaged polymerase upstream of the pAS insertion (Fig. 2c and Extended Data Fig. 7a), perhaps because the pAS prevents RNA splicing, which may substantially reduce transcriptional activity in the modified locus21,22. Therefore, cis activation of Bend4 requires neither the mature Bendr RNA transcript nor significant Bendr transcription. Instead, this effect appears to be mediated by DNA regulatory elements in the ~750 bp knocked-out promoter-proximal region.

Figure 2 | Enhancer-like function of the Bendr promoter. a, Transcriptionally engaged RNA polymerase (GRO-seq) and H3K4me3 occupancy (chromatin-immunoprecipitation followed by sequencing, ChIP–seq). b, poly(A)+ RNA expression upon deleting the Bendr promoter or inserting a pAS on modified alleles versus controls. Error bars, 95% confidence interval for the mean (n ≥ 2 alleles, see Supplementary Table 1). c, Allele-specific GRO-seq signal for clones carrying the indicated modifications. Both clones are modified on the 129 allele, and only reads specifically mapping that allele are shown. The y axis shows normalized read count. Bar plot quantifies signal at Bend4, including seven additional wild-type controls not shown on left.

In total, at five of the six loci examined with pAS insertions (including three lncRNAs and two mRNAs), DNA regulatory elements in the promoter-proximal sequences appear to be responsible for activating a neighbouring gene (Extended Data Fig. 7b). Although the promoters in these loci would not be classified as ‘enhancers’ based on H3K4me3/H3K4me1 ratios23, they are bound by mES cell transcription factors (Extended Data Fig. 7c) and are located in close proximity to their neighbouring target genes (Fig. 1c and Extended Data Fig. 7d, e), suggesting that these promoters may affect local gene expression through mechanisms similar or identical to enhancers13,24,25.

We also identified one locus, linc1319 (renamed Blustr: bivalent locus (Sfmbt2) is upregulated by the splicing and transcription of an RNA), where both promoter deletions and pAS insertions substantially reduced the expression of a neighbouring gene, Sfmbt2, located 5 kb upstream (Fig. 3a). To dissect the regulatory mechanism, we tested whether the activation of Sfmbt2 is mediated by a sequence-specific function of the Blustr transcript or the process of transcription (by which we mean one or more sequence-independent functions associated with transcription, such as changes in chromatin state or recruitment of co-factors). To test the first possibility, we knocked out each of the three downstream exons and three introns. None of these deletions impaired Sfmbt2 activation (Fig. 3b, Supplementary Note 6), suggesting that the activation of Sfmbt2 does not require unique sequences or structures in the Blustr transcript itself. To test the second possibility, we engineered pAS insertions at five different locations in the first exon or intron (+40 bp to +15 kb downstream of the TSS) and found that increasing the length of the Blustr transcribed region led to increased activation of Sfmbt2 (Fig. 3b and Extended Data Fig. 8a, b). We note that changing the length of the transcribed region affected the total amount of engaged polymerase in the Blustr locus (Fig. 3c). Thus, Sfmbt2 activation responds to changes in the length/amount of transcriptional activity in the Blustr locus but does not appear to require specific sequence elements in the mature Blustr transcript (Supplementary Note 7).

Because promoter-proximal splice sites and the process of splicing can enhance transcription, in some cases by as much as 100-fold21,22, we tested whether the splicing of Blustr is involved in Sfmbt2 activation. Upon deleting the 5′ splice site of the first intron of Blustr (Extended Data Fig. 8c), we observed a 94% reduction in Blustr transcription (as assayed by GRO-seq), a 92% reduction in the levels of the mature Blustr transcript, and an 85% reduction in Sfmbt2 expression (Fig. 3b, c and Extended Data Fig. 8a, b), demonstrating that the first 5′ splice site of Blustr has a critical role in activating Blustr and Sfmbt2 transcription. By contrast, downstream splice sites were dispensable: upon deleting downstream Blustr exons, splicing skipped over the removed exon to the next available 3′ splice site (Extended Data Fig. 8d) and Sfmbt2 expression was unaffected (Fig. 3b).

Figure 3 | Transcription and splicing of Blustr activates Sfmbt2 expression. a, Poly(A)+ RNA-seq, GRO-seq, and H3K4me3 ChIP–seq in the Blustr locus. Sfmbt2 has two alternative TSSs. b, Poly(A)+ RNA expression on modified alleles compared to controls (arrows). Error bars, 95% confidence interval for the mean (n ≥ 2 alleles, except for pAS +15 kb where n = 1, see Supplementary Table 1). Sfmbt2 pAS comparisons: two-sided t-test. *P < 0.05; **P < 0.01. c, Allele-specific GRO-seq signal for clones carrying indicated modifications. Only reads mapping to the modified allele are shown (castaneus for pAS +2 kb; 129 for others). d, Model for how transcription in the Blustr locus activates Sfmbt2.

Together, these data demonstrate that the 5′ splice site and the process of transcription in the Blustr locus are important for its ability to regulate Sfmbt2. This indicates that the Blustr RNA is in fact required for Sfmbt2 activation (splicing involves direct interactions between the spliceosome and the nascent transcript), although this mechanism does not appear to depend on the precise sequence of the RNA beyond the presence of initial splice signals. One possibility is that the 5′ splice site promotes transcriptional activity in the Blustr locus, which in turn recruits components of the transcriptional machinery that act on the nearby Sfmbt2 promoter (Fig. 3d, Supplementary Note 7). Consistent with this model, altering transcription or splicing in the Blustr locus led to changes in chromatin state at the Sfmbt2 promoter (including reductions in H3K4me3 and spreading of H3K27me3) and reduced occupancy of engaged RNA polymerase in the paused position just downstream of the Sfmbt2 TSS (Extended Data Fig. 8b, e, f). Thus, changes in Blustr transcription and splicing may affect Sfmbt2 expression in part by altering chromatin state and RNA polymerase occupancy at the Sfmbt2 promoter (Fig. 3d and Supplementary Note 7).

In summary, genetic dissection of 12 lncRNA loci and 6 mRNA loci found that 9 loci (50%) regulate the expression of a neighbouring gene (Extended Data Fig. 9). In most of these loci, including Bendr, local effects are mediated by enhancer-like functions of DNA elements in promoters. In one locus, Blustr, the processes of transcription and splicing also contribute to cis-regulatory functions, perhaps by increasing the local concentration of transcription-associated factors. We did not identify any lncRNA loci in which local effects are mediated by sequence-specific functions of the lncRNA transcript. Because there exist thousands of other loci that fit our selection criteria, we expect that similar mechanisms broadly contribute to gene regulation in many loci (Supplementary Note 8).

The frequent cross-talk between neighbouring genes observed in our study indicates that gene loci can encode multiple independent categories of functions. Category I involves functions of the RNA product: mRNAs provide a template for protein synthesis, and some non-coding transcripts (for example, XIST) act as functional lncRNAs. Category II involves the effects of transcription-related processes (including mechanisms mediated by promoters, transcription, and splicing) on the regulation of other nearby genes.
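The dissection logic applied to Bendr and Blustr can be summarized as a small decision procedure. The sketch below is only an interpretive aid under simplifying assumptions: the function name `infer_mechanism` and its boolean inputs are hypothetical, and real loci call for quantitative, allele-specific measurements rather than yes/no outcomes.

```python
def infer_mechanism(promoter_deletion_effect, pas_insertion_effect,
                    exon_intron_deletion_effect=False):
    """Map perturbation outcomes at one locus to the simplest consistent explanation.

    Each argument is True if that perturbation changed the expression of the
    neighbouring gene on the modified allele (a simplification of the
    quantitative comparisons reported in the text).
    """
    if not promoter_deletion_effect:
        return "no local effect detected"
    if not pas_insertion_effect:
        # Removing most of the transcript leaves the neighbour unaffected,
        # so the promoter-proximal DNA itself is the likely regulator.
        return "DNA element in the promoter (enhancer-like)"
    if exon_intron_deletion_effect:
        # Specific downstream transcript sequences matter.
        return "possible sequence-specific role of the transcript"
    # The transcript is needed, but not its downstream sequence.
    return "process of transcription and/or splicing"


# Pattern described for Bendr: promoter deletion has an effect, pAS insertion does not.
print(infer_mechanism(True, False))
# Pattern described for Blustr: both have an effect, exon/intron deletions do not.
print(infer_mechanism(True, True, False))
```

In this framing, the third branch corresponds to a sequence-specific lncRNA function, an outcome the study did not observe at any of the loci it examined.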
The fact that many lncRNA loci have category II functions does not necessarily mean that they do not also have category I functions, and we note that our experiments do not rule out the possibility that the lncRNAs dissected in this study have RNA-mediated functions other than on local gene regulation. However, the prevalence of category II functions suggests a model for the evolutionary origins of some lncRNAs. In loci where a promoter acts as an enhancer, RNA transcripts may arise as non-functional by-products16. In loci where co-transcriptional processes have cis-regulatory functions, the nascent transcripts might contribute through mechanisms such as splicing that require little RNA-sequence specificity. These possibilities are particularly intriguing in light of the patterns of evolutionary conservation of lncRNA loci26–28. For example, although most lncRNA transcripts expressed in mES cells are not conserved (no RNA detected in syntenic loci in other mammals, see Methods), the promoters in some of these loci correspond to conserved DNA sequences that have an enhancer chromatin signature in human ES cells (Fig. 4, Extended Data Fig. 10 and Supplementary Note 9). These sequences may have conserved functional roles as cis-regulatory elements, rather than as lncRNA promoters. Thus, mechanisms associated with cis functions by promoters, transcription, and/or RNA processing may contribute to the functions and evolution of an important subset of non-coding loci in mammalian genomes (Extended Data Fig. 10c).

Figure 4 | Evolutionary conservation of mES cell lncRNAs and their promoters. a, Classification of a subset of lncRNAs expressed in mES cells (see Supplementary Note 9, Methods). b, Eleven lncRNAs have promoters whose syntenic sequence corresponds to putative DNA regulatory elements (REs) marked by DNase I hypersensitivity (HS) in human ES cells. c, Example: linc1494. d, Enhancers and lncRNA promoters are significantly enriched for corresponding to human regulatory elements (pie chart, ***P < 1 × 10^-10, χ² test versus GC-matched random regions) and show elevated sequence conservation compared to GC-matched regions (bar plot, **P < 0.01; ***P < 0.001, Mann–Whitney U-test versus ii + iii).

Beyond the implications for lncRNAs, these cis-regulatory connections between neighbouring genes occur in both protein-coding and non-coding loci and thus appear to represent a fundamental property of mammalian gene regulatory networks. The properties of these cis-regulatory connections, including mechanisms for specificity and the potential for cooperative dynamics of gene activation, represent key areas for future investigation.

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 16 April; accepted 10 October 2016.
Published online 26 October 2016.

1. Okazaki, Y. et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002).
2. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
3. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
4. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
5. Lee, J. T. Lessons from X-chromosome inactivation: long ncRNA as guides and tethers to the epigenome. Genes Dev. 23, 1831–1842 (2009).
6. Nagano, T. et al. The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322, 1717–1720 (2008).
7. Wang, K. C. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124 (2011).
8. Ørom, U. A. et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 143, 46–58 (2010).
9. Guil, S. & Esteller, M. Cis-acting noncoding RNAs: friends and foes. Nat. Struct. Mol. Biol. 19, 1068–1075 (2012).
10. Ebisuya, M., Yamamoto, T., Nakajima, M. & Nishida, E. Ripples from neighbouring transcription. Nat. Cell Biol. 10, 1106–1113 (2008).
11. Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011).
12. Bassett, A. R. et al. Considerations when investigating lncRNA function in vivo. eLife 3, e03058 (2014).
13. Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).
14. Rajagopal, N. et al. High-throughput mapping of regulatory DNA. Nat. Biotechnol. 34, 167–174 (2016).
15. Yin, Y. et al. Opposing roles for the lncRNA haunt and its genomic locus in regulating HOXA gene activation during embryonic stem cell differentiation. Cell Stem Cell 16, 504–516 (2015).
16. Paralkar, V. R. et al. Unlinking an lncRNA from its associated cis element. Mol. Cell 62, 104–110 (2016).
17. Martens, J. A., Laprade, L. & Winston, F. Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature 429, 571–574 (2004).
18. Shearwin, K. E., Callen, B. P. & Egan, J. B. Transcriptional interference – a crash course. Trends Genet. 21, 339–345 (2005).
19. Purmann, A. et al. Genomic organization of transcriptomes in mammals: Coregulation and cofunctionality. Genomics 89, 580–587 (2007).
20. Kosak, S. T. et al. Coordinate gene regulation during hematopoiesis is related to genomic organization. PLoS Biol. 5, e309 (2007).
21. Brinster, R. L., Allen, J. M., Behringer, R. R., Gelinas, R. E. & Palmiter, R. D. Introns increase transcriptional efficiency in transgenic mice. Proc. Natl Acad. Sci. USA 85, 836–840 (1988).
22. Fong, Y. W. & Zhou, Q. Stimulatory effect of splicing factors on transcriptional elongation. Nature 414, 929–933 (2001).
23. Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).
24. Andersson, R., Sandelin, A. & Danko, C. G. A unified architecture of transcriptional regulatory elements. Trends Genet. 31, 426–433 (2015).
25. Kim, T.-K. & Shiekhattar, R. Architectural and functional commonalities between enhancers and promoters. Cell 162, 948–959 (2015).
26. Necsulea, A. et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640 (2014).
27. Hezroni, H. et al. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Reports 11, 1110–1122 (2015).
28. Chen, J. et al. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol. 17, 19 (2016).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank S. Grossman, J. Rinn, M. Yassour, P. Sharp, L. Boyer, M. Ray, C. Fulco, M. Munschauer, T. Wang and N. Friedman for discussions; A. Goren and Broad Technology Labs for ChIP; J. Lis, D. Mahat and A. Shishkin for technical advice and reagents; and J. Flannick for computational tools. J.M.E. is supported by the Fannie and John Hertz Foundation and the National Defense Science and Engineering Graduate Fellowship. M.G. is supported by the NIH Director’s Early Independence Award (DP5OD012190), the Edward Mallinckrodt Foundation, the Sontag Foundation, and the Searle Scholars Program. Work in the Lander Laboratory is supported by the Broad Institute.

Author Contributions J.M.E., M.G. and E.S.L. conceived and designed the study. J.M.E., J.E.H., G.M., M.K. and P.E.M. developed knockout protocols and performed genetic manipulations. E.M.P. and J.M.E. performed all other experiments. J.M.E. developed computational tools and analysed data. J.M.E. and J.C. performed evolutionary analysis. J.M.E. and E.S.L. wrote the manuscript with input from all authors. E.S.L. supervised the work and obtained funding.

Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details are available in the online version of the paper. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to E.S.L. ([email protected]).

METHODS
Cell lines and cell culture. F1 hybrid 129/castaneus female mouse embryonic stem cells (gift from K. Plath) or V6.5 male mouse embryonic stem cells (gift from A. Meissner) were cultured in serum-free N2B27-based medium (250 ml neurobasal media (Gibco), 250 ml DMEM/F12 (Gibco), 5 ml 100× N2 supplement (Gibco), 5 ml 50× B27 supplement (Gibco), 5 ml 200 mM L-glutamine (Gibco), 3.6 μl 2-mercaptoethanol, 50 μg human leukaemia inhibitory factor (5 × 10^5 units, EMD Millipore), 7.4 μg progesterone, 10 mg bovine insulin (Sigma), 350 μl 7.5% BSA fraction V (Gibco), supplemented with MEK inhibitor PD0325901 (50 μl 10 mM, SelleckChem), and GSK3b inhibitor CHIR99021 (150 μl 10 mM, SelleckChem)). Prior to plating cells, tissue culture dishes were pre-treated with PBS + 0.2% gelatin (Sigma) and 1.75 μg ml^-1 laminin (Sigma) for 2–10 h at 37 °C. At each passage, cells were trypsinized for 3–5 min in TVP solution (0.025% trypsin, 1% chicken serum (Sigma), and 1 mM EDTA in PBS pH 7.4) at room temperature. Cells tested negative for mycoplasma contamination and were authenticated by comparing polymorphisms to 129S1 and castaneus genomes.

Similar to the lncRNAs selected, the TSSs of these selected mRNAs are located >5 kb from other genes.

CRISPR sgRNA design. To design single-guide RNAs (sgRNAs), we built custom software to calculate a specificity score (based on potential off-target sites using the algorithm described at http://crispr.mit.edu (see ref. 31)) and an efficacy score (based on a sequence model for sgRNA efficiency as previously described32) for each 20-nt targeting sequence. We removed guides with specificity scores <20 or efficacy scores >0.7. To avoid T-rich sequences that result in premature termination of Pol III-mediated sgRNA transcription, we removed guides with more than one T in the four bases closest to the seed region, guides with more than three consecutive Ts, and guides with more than eight Ts total. We removed guides with homopolymer stretches of five or more bases and guides with GC content <20% or >90%. We removed guides that overlapped a known 129/castaneus SNP33. Within a given region, we typically chose the three remaining guides with the highest specificity scores. The sequences of all sgRNAs used in this study are listed in Supplementary Table 2.
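The sequence-based guide filters described above can be sketched as a single predicate. This is an illustrative reimplementation under stated assumptions, not the authors' custom software: the function name `passes_sequence_filters` is hypothetical, the "four bases closest to the seed region" are assumed here to be the last four bases of the guide, SNP overlap is passed in as a boolean rather than computed from coordinates, and the specificity and efficacy scores (which depend on external models) are left out.

```python
import re


def passes_sequence_filters(guide, overlaps_snp=False):
    """Return True if a 20-nt guide passes the sequence-based filters.

    Assumptions (not from the paper): the T-content window is the last four
    bases of the guide, and SNP overlap is supplied by the caller.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        return False
    if guide[-4:].count("T") > 1:       # more than one T in the assumed 4-base window
        return False
    if "TTTT" in guide:                 # more than three consecutive Ts
        return False
    if guide.count("T") > 8:            # more than eight Ts in total
        return False
    if re.search(r"(.)\1{4}", guide):   # homopolymer stretch of five or more bases
        return False
    gc = (guide.count("G") + guide.count("C")) / len(guide)
    if gc < 0.20 or gc > 0.90:          # GC content outside 20-90%
        return False
    return not overlaps_snp             # drop guides overlapping a known SNP


# Hypothetical guide sequences, for illustration only.
for seq in ["GACGTCAGCATGCAGTCAGC", "GACGTTTTCATGCAGTCAGC", "GACGAAAAACTGCAGTCAGC"]:
    print(seq, passes_sequence_filters(seq))
```

Guides that survive these filters would then be ranked by specificity score, with roughly the top three per region retained, as the text describes.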
Promoter deletion guide placement. To knock out a lncRNA or mRNA promoter, we chose 2–3 sgRNAs located in windows 300–500 bp upstream and downstream of the TSS, leading to deletions of approximately 600–1,000 bp surrounding the TSS. We adjusted the precise deletion boundaries outward if we could not successfully design guides in these regions (for example, because they were located in repetitive sequences). We note that we often found that the wild-type alleles in heterozygous knockouts were affected by scars from repair of sgRNA double-stranded breaks. Accordingly, we adjusted the bounds if necessary to cut outside of the exons of the mRNA or lncRNA and thus avoid damaging the exonic sequences on the wild-type alleles in heterozygous knockouts. We note that the presence of these scars (and their lack of allele-specific effects on the expression of neighbouring genes) indicates that the cis effects observed upon deleting promoters are not merely a result of CRISPR-mediated cutting and subsequent DNA repair.

Genetic deletions with CRISPR/Cas9. To delete specific sequences, we co-transfected 100 ng of Cas9-expressing plasmids (PX330-NoGuide), 300 ng of a pool of sgRNA-expressing plasmids (pZB-Sg3), and 100 ng of a plasmid expressing EGFP and a puromycin selectable marker from a CAG promoter (pS-pp7-GFPiP). To create PX330-NoGuide, we modified PX330 (gift from F. Zhang, Addgene plasmid #44230 (ref. 34)) to remove the sgRNA expression cassette. To generate pZB-Sg3, we cloned a human U6 promoter and optimized sgRNA scaffold sequence35 into a minimal vector with an ampicillin-selectable marker and a ColE1 replication origin. We transfected batches of 250,000 mouse embryonic stem cells using the Neon Transfection System (Invitrogen), using one pulse of 40 ms at 1,200 V and plated two batches of cells (500,000 total) into a 96-well plate in 200 μl media. As an internal control for each set of transfections, we performed a transfection using four guides with no predicted target sites in the mouse genome.

We verified efficient transfection by examining GFP expression after 24 h. To select for transfected cells, we replaced the media 24 h after transfection with 200 μl 2i + 1 μg ml^-1 puromycin. One day later, we split the cells into a 10-cm plate with 8 ml of 0.5 μg ml^-1 puromycin. One day later, we replaced the media with 10 ml of 2i with no puromycin. We allowed cells to grow for 7–8 days, replacing the media every 2–3 days. We hand-picked 88 individual colonies and 8 control colonies for each transfection in 5 μl media, added 20 μl of TVP for ~10–20 min at 37 °C to dissociate the colonies, and then split the colonies into two identical plates. We grew the cells in these plates for 4–5 days. We harvested one of the plates for DNA and RNA extraction by removing most of the media and adding 3.5 × volume

Cellular fractionation. To estimate the relative abundance of lncRNAs in different cellular compartments, we performed cellular fractionation to isolate chromatin-associated, soluble nuclear, and cytoplasmic fractions essentially as described29. In brief, we first lysed 5 million cells in 200 μl cold cell lysis buffer (10 mM Tris-HCl pH 7.5, 0.05% IGEPAL CA-630, 150 mM NaCl), incubating on ice for 5 min. We layered the cell lysate over 2.5 volumes of chilled sucrose cushion (24% sucrose in cell lysis buffer) and centrifuged at 15,000g for 10 min. The supernatant from this spin became the cytoplasmic fraction. After washing the pellet of nuclei with PBS (pH 7.5) + 1 mM EDTA, we resuspended the pellet in 100 μl of cold glycerol buffer (20 mM Tris-HCl pH 7.5, 75 mM NaCl, 0.5 mM EDTA, 0.85 mM DTT, 0.125 mM PMSF, 50% glycerol) by gently flicking the tube. We added 100 μl of cold nuclei lysis buffer (10 mM HEPES pH 7.5, 1 mM DTT, 7.5 mM MgCl2, 0.2 mM EDTA, 0.3 M NaCl, 1 M urea, 1% IGEPAL CA-630), then vortexed for four seconds. After 2 min on ice, we spun the nuclear lysate at 15,000g for 2 min. This supernatant was collected as the soluble nuclear (nucleoplasm) fraction. We rinsed the remaining pellet (chromatin fraction) in PBS + 1 mM EDTA, then resuspended the chromatin in 300 μl chromatin DNase buffer (20 mM Tris-HCl pH 7.5, 50 mM KCl, 4 mM MgCl2, 0.5 mM CaCl2, 2 mM TCEP, 0.5 mM PMSF, 0.4% sodium deoxycholate, 1% IGEPAL CA-630, 0.1% N-lauroylsarcosine) plus 15 μl murine RNase inhibitor (NEB) and 30 μl TURBO DNase (Ambion). The DNase digestion proceeded for 20 min at 37 °C and was halted by adding 10 mM EDTA and 5 mM EGTA. Protein was digested with proteinase K for 1 h at 37 °C. RNA was isolated using Zymo RNA Concentrator-25 columns (two columns for the cytoplasmic fraction). With this method, nuclear-associated endoplasmic reticulum is known to fractionate with the nucleoplasm29, and we observed that nucleolar RNAs fractionated with chromatin (data not shown). From each cellular fraction, we sequenced total RNA and polyadenylated RNA (selected using oligo d(T)25 magnetic beads, NEB) using a strand-specific RNA-sequencing protocol for Illumina instruments described previously30.

Selection criteria for knocked-out lncRNAs. We selected lncRNA loci initially identified and defined by a chromatin signature of H3K4me3 at promoters and H3K36me3 through gene bodies3. We further required that lncRNAs selected for knockout analysis have TSSs, as defined by cap analysis of gene expression (CAGE), located >5 kb from other genes (for epigenomic annotation of each locus, see http://pubs.broadinstitute.org/neighboring-genes/). To prioritize intergenic lncRNA loci that may regulate local gene expression, we focused on lncRNAs that have subcellular localization biased towards the nucleus versus the cytoplasm
(Extended Data Fig. 1). We performed cellular fractionation experiments in V6.5 buffer RLT (Qiagen) and froze the other plate for later recovery in Freezing Media
male mES cells as described above and sequenced RNA from chromatin-­associated, (2i media +​ 10% fetal bovine serum +​  10% DMSO).
soluble nuclear, and cytoplasmic fractions (GEO Accession GSE80262). We Genotyping by PCR and sequencing. To genotype each promoter knockout, we
­calculated a relative nuclear-to-cytoplasmic ratio (chromatin RPKM plus soluble extracted genomic DNA and performed PCR using primers spanning the deleted
nuclear RPKM divided by cytoplasmic RPKM) and focused on lncRNAs with ratios sequence. We genotyped each clone by running the PCR products on agarose
above the median (1.5): these lncRNAs are preferentially localized to the nucleus gels and comparing PCR amplicon sizes to predicted wild-type and ­deletion
­compared to other lncRNAs and mRNAs. We selected nuclear-biased lncRNAs band sizes. We confirmed the sequences of wild-type and deletion bands by
that span a range of abundance levels (Extended Data Fig. 1). We also included Sanger sequencing or high-throughput sequencing through barcoded amplicon
some lncRNAs that are conserved across mammalian evolution (Snhg3, Snhg17, ­sequencing on an Illumina MiSeq (see Supplementary Table 2). Where possible, we
Meg3, and linc2025). used known polymorphic sites from 129S1 and castaneus genomes33 to determine
Selection criteria for knocked out mRNAs. We selected six mRNAs for p ­ romoter the h
­ aplotype-resolved genotype of each clone. Based on the genotyping data, we
knockouts based on the following criteria. We knocked out two mRNAs that are nominated clones for RNA sequencing. We eliminated clones showing evidence
moderately expressed and are not expected to be essential for mES cell growth of polyclonal or subclonal mutations, or complex mutations such as inversion or
(Dicer1 and Crlf3). We knocked out two mRNAs that are located adjacent to duplication of the genomic sequence between the sgRNAs. The sequences of all
knocked-out lncRNAs (Sfmbt2 and Rcc1), in order to look for reciprocal regulatory genotyping primers are listed in Supplementary Table 2.
effects between the lncRNA and the affected mRNA. We knocked out two mRNAs RNA sequencing libraries. We generated RNA sequencing libraries as p ­ reviously
that are located adjacent to a gene that is itself adjacent to a lncRNA (Gpr19 and described30,36, with some modifications for high sample throughput. We iso-
Slc30a9), in order to determine whether affected genes are specifically responsive lated RNA from harvested mES cells using RNeasy 96 columns. We enriched for
to lncRNA promoters or are generally responsive to other promoters in the locus. poly(A)+ RNA using oligo d(T)25 magnetic beads (NEB) and eluted in 18 μ​l H2O.
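As an illustration only, and not the code used in this study, the nuclear-bias filter described under "Selection criteria for knocked-out lncRNAs" (chromatin RPKM plus soluble nuclear RPKM, divided by cytoplasmic RPKM, keeping ratios above the median) might be sketched as follows. The function names, the example lncRNA identifiers and RPKM values, and the treatment of zero cytoplasmic signal are assumptions made for this sketch.

```python
from statistics import median


def nuclear_to_cytoplasmic_ratio(chromatin_rpkm, soluble_nuclear_rpkm, cytoplasmic_rpkm):
    """Return (chromatin RPKM + soluble nuclear RPKM) / cytoplasmic RPKM."""
    if cytoplasmic_rpkm <= 0:
        # Assumption for this sketch: no cytoplasmic signal counts as fully nuclear.
        return float("inf")
    return (chromatin_rpkm + soluble_nuclear_rpkm) / cytoplasmic_rpkm


def select_nuclear_biased(rpkm_by_lncrna, threshold=None):
    """Return names of lncRNAs whose ratio is strictly above the threshold.

    rpkm_by_lncrna maps a name to (chromatin, soluble nuclear, cytoplasmic) RPKM.
    If no threshold is given, the median ratio of the supplied set is used.
    """
    ratios = {
        name: nuclear_to_cytoplasmic_ratio(*rpkms)
        for name, rpkms in rpkm_by_lncrna.items()
    }
    if threshold is None:
        threshold = median(ratios.values())
    return sorted(name for name, ratio in ratios.items() if ratio > threshold)


# Hypothetical RPKM values, for illustration only.
example_rpkm = {
    "lncA": (4.0, 2.0, 1.0),
    "lncB": (1.0, 1.0, 4.0),
    "lncC": (3.0, 0.0, 2.0),
}
print(select_nuclear_biased(example_rpkm))
```

Passing an explicit threshold (for example, the reported median of 1.5) instead of recomputing the median keeps the cut-off fixed when only a subset of loci is examined.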

© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
We fragmented RNA to an average of ~150 nt by adding 2 μl Ambion fragmentation
buffer and incubating at 70 °C for exactly 2.5 min. After transferring quickly
to ice, we added 40 μl of a master mix containing 12 μl 5× FNK buffer (50 mM
Tris-HCl pH 7.5, 5 mM MgCl2, 0.6 mM CaCl2, 50 mM KCl, 10 mM DTT, 0.01%
Triton X-100), 1 μl Murine RNase Inhibitor (NEB), 3 μl FastAP Thermosensitive
Alkaline Phosphatase (Thermo Scientific), 3 μl T4 Polynucleotide Kinase (NEB),
and 1 μl TURBO DNase (Life Technologies). We incubated this reaction at 37 °C
for 30 min, then cleaned the reaction with MyOne SILANE magnetic beads37 and
eluted in 6 μl of H2O.
We proceeded with the library preparation as previously described30, with one
additional modification. To simplify the library preparation for many samples, we
added unique sample barcodes (8 nt) during the first adaptor ligation36. We used
12 pools each with 4 barcodes in order to mitigate differences in the efficiency
of ligation for different adaptor sequences. Following the first adaptor ligation,
we pooled 12 samples together, including up to 9 clones corresponding to a
single target gene as well as 3 control clones, during the first 70% ethanol wash
of the SILANE-bead purification. We performed an extra SILANE purification
using the same beads to remove excess adaptor and then proceeded with reverse
transcription.
Hybrid selection of RNA sequencing libraries. To measure allele-specific expression
for hundreds of genes in a cost-effective manner, we developed a hybrid
selection strategy to enrich for allele-informative reads at target genes (Extended
Data Fig. 2). We designed oligo pools to capture allele-informative sequences
in the ~1,600 RNAs located in the genome within 1 Mb of one of the knockout
targets. These target RNAs were divided into two independent pools: #140820
and #141203. We used RefSeq RNA annotations for mRNAs and our custom
annotations for most lncRNAs. We identified SNPs that would distinguish the
129S1 and castaneus genomes33. We designed 120-bp capture oligos in the vicinity
of each 129/castaneus polymorphic site, tiling every 15 bp across either 600 bp
(pool #140820) or 240 bp (pool #141203) centred on the SNP. We included probes
targeting both alleles to minimize differences in capture efficiency between the
two alleles. We filtered capture probe sequences as previously described37. We
included up to 10 oligos per targeted RNA, duplicating probes where necessary to
include the sequences corresponding to each allele. Empirically, this probe design
strategy in combination with the protocol described below enabled assessing allele-specific
expression for 84% (611 of 731) of the targeted expressed genes in mES
cells (RPKM ≥ 2) at a sequencing depth of <5 million reads per sample. Target
genes and oligo sequences for these pools are listed in Supplementary Table 3.
We synthesized pools of 12,000 capture oligos using CustomArray technology.
Oligos in each pool were flanked by unique primers (left primer
sequence: CTTCCTACGAGCAGTTTGCC; right primer sequence:
AGTTTACGCATTACGGGCAC). After one round of PCR to add a T7 promoter
(GGATTCTAATACGACTCACTATAGGG), we generated biotinylated RNA
probes as described previously38, adding in 20% Biotin-16-UTP (Roche) and 20%
Biotin-14-CTP (Life Technologies) to the in vitro transcription reactions. We
generated RNA probes targeting both strands by incorporating the T7 promoter
into either side of the PCR product and performing two separate in vitro transcription
reactions per oligo pool.
To capture the allele-informative regions, we pooled the final, barcoded RNA
sequencing libraries from all samples in the batch and performed a modified
version of solution hybrid selection39. We first combined 500 ng dsDNA library
pool with 1 nmol of Illumina P5 and P7 primer mix in 21 μl total. We denatured
this mix at 94 °C for 10 min and transferred immediately to ice. We added 7.5 μl
20× SSPE, 0.5 μl Murine RNase Inhibitor (NEB), and 1 μl of 500 ng μl−1 biotinylated
RNA probe, for a total volume of 30 μl. We set up at least two reactions per
10 libraries, including at least one reaction with each strand of probes. We incubated
the hybridization reaction at 65 °C for 24–48 h. For each capture sample, we
washed 30 μl Streptavidin C1 MyOne magnetic beads (Invitrogen) in 5× SSPE
and aliquoted them into PCR tubes. After removing the wash from the beads, we
added the hybridization reaction and mixed to resuspend the beads. We captured
the biotinylated probes by shaking at 65 °C for 20 min. We washed the beads twice
in 150 μl low stringency wash buffer (1× SSPE, 0.1% SDS, 1% NP-40, 4 M urea) at
62 °C for 3–4 min, and twice in 150 μl high stringency wash buffer (0.1× SSPE, 0.1%
SDS, 1% NP-40, 4 M urea). To elute, we removed the final wash and resuspended
beads in 10 μl 100 mM NaOH and heated to 70 °C for 10 min. To complete the
elution, we added 1 μl 1 M acetic acid and 14 μl NLS elution buffer (20 mM Tris-HCl
pH 7.5, 10 mM EDTA, 2% N-lauroylsarcosine, 2.5 mM TCEP) and heated to
94 °C for 4 min. While hot, we placed samples on a magnet, removed the eluate, and then
placed the eluate on ice for at least 30 s. We cleaned the eluates with 20 μl MyOne
SILANE magnetic beads as described37, using 75 μl RLT and 61 μl 100% ethanol for
the initial precipitation. We eluted in 23 μl H2O, and used this as input for a 50 μl
NEBNext High Fidelity PCR reaction using 500 pmol of the P5 and P7 Illumina
primers each (98 °C for 30 s; 13 cycles of 98 °C for 15 s, 68 °C for 30 s, 72 °C for 30 s;
72 °C for 2 min, 4 °C hold). We cleaned the PCR reaction twice with 1× volume
Agencourt Ampure XP magnetic beads and eluted in 20 μl H2O.
Allele-specific gene expression measurements from RNA sequencing. We
sequenced RNA libraries on an Illumina HiSeq 2500 (Read 1: 38 cycles; Read 2:
30 cycles; Index: 8 cycles). The first read includes the 8-nt barcode added during
the first adaptor ligation (see above). Following processing to separate samples
based on the inline barcodes, we filtered out sequencing reads that aligned to highly
abundant RNA transcripts, including ribosomal RNAs, snRNAs, and repetitive
elements, as defined by RefSeq and RepeatMasker. A FASTA file containing these
sequences is available at the Gene Expression Omnibus (GSE55914).
We developed a computational pipeline to estimate allele-specific expression
from RNA-sequencing data. We created two separate reference files for the 129S1
and castaneus haplotypes, starting with the mm9 genome build and layering on
SNPs based on whole-genome sequencing of each of the two mouse strains33.
We aligned RNA-sequencing data separately to each of the two haplotypes using
TopHat (version 2.0.8). We combined the results of the two alignments using
PySuspenders40, which identifies reads that map specifically to one or the other
allele and splits them into separate BAM files. We discarded duplicate reads and
reads with MAPQ <30. After generating separate BAM files containing the reads
mapping to each allele, we counted reads that mapped to each RefSeq transcript
(including both spliced and unspliced isoforms) using Scripture41 and calculated
‘allelic expression ratios’ for each gene (counts from 129 allele divided by total
counts from both 129 and castaneus alleles). The distribution of allelic expression
ratios for all active genes in mES cells was centred on 0.5, indicating that on average
each gene is expressed equally from the 129 and castaneus alleles (Extended Data
Fig. 2b). This indicates that there is no systematic bias in our mapping procedure
towards one allele or the other.
RNA-seq data analysis. We processed RNA-sequencing data sets in batches
corresponding to sets of libraries made on the same day with the same hybrid
selection probe pool. We removed samples with fewer than 100,000 non-repetitive,
unique, allele-informative reads. For within-batch quality control, we performed
hierarchical clustering on all samples by their allelic expression ratios and removed
the 2–5% of outlier samples, which consisted largely of clones that showed
monoallelic expression from the X chromosome.
Assessment of gene knockout by expression analysis. The PCR genotyping
procedure described above provided putative genotypes for the cell clones. We
confirmed the genotype of cells by analysing the allele-specific expression of the
knocked-out gene in each clone. We required that clones show >80% reduction
of expression of the knocked-out gene on the appropriate allele in order to include
the clone in downstream analysis. Incomplete reduction of expression in some
cases appeared to result from use of alternative TSSs that were not included in the
deleted sequence. In other cases, incomplete reduction of expression appeared
to result from subclonal genetic mosaicism within the cell line, which probably
resulted from deletions that occurred after several cell divisions, leading to genetic
differences between individual cells in a colony. For further analysis, we focused on
gene loci where we obtained at least two heterozygous knockout clones.
Barplots for allele-specific expression data. Barplots that depict allele-specific
RNA expression or GRO-seq transcription on modified alleles compared to
controls (for example, Fig. 1d) were calculated as follows. For each modified allele,
allele-specific measurements were normalized to the corresponding alleles in wild-type
clones (for example, values for castaneus knockout alleles were divided by the
mean of unmodified castaneus alleles in wild-type clones). We performed the same
calculation for unmodified alleles in wild-type clones to create a null distribution.
For modified alleles, we further scaled these values by dividing by the mean of the
wild-type alleles in heterozygous knockout clones. The value of each bar represents
a mean of these normalized measurements.
Identifying significant changes in allele-specific expression. In developing a
statistical approach to identify local, cis effects of these genetic manipulations,
we sought to distinguish local effects of the genetic deletion from downstream
effects that result as a consequence of either lncRNA/mRNA functions elsewhere
in the cell, off-target effects, or biological/technical variation between clonal cell
lines (Supplementary Note 1). Our power to detect these effects varies between
different measured genes (owing to their level of expression and availability of
SNPs) and between different knockout targets (owing to differences in the numbers
of knockout clones analysed).
To account for these two variables, we developed a statistical approach to
empirically estimate the false discovery rate of allele-specific changes in the
expression of neighbouring genes using hundreds of genes on other chromosomes
as controls. For each gene in the neighbourhood of one of our promoter deletions,
we calculated three statistics: (i) a t-test statistic comparing the average change
in expression for each of the knockout alleles (including both heterozygous and
homozygous knockout clones), normalized to the expression of the gene on the
wild-type allele of the heterozygous clones; (ii) a z-score statistic comparing the
expression of the knockout allele in heterozygous clones to the expression of
the wild-type allele in the same clone; and (iii) a t-test statistic comparing the
heterozygotes to the wild-type control clones using the allelic expression ratio
after applying a variance-stabilizing transformation (arcsin of the square root
of the allelic expression ratio). For a given gene, only samples with at least 20
allele-informative reads were considered, in order to enable accurate estimates
of allele-specific expression. These three tests differ in whether they incorporate
information from homozygous clones and how they normalize between knockout
and wild-type alleles. We required that a gene perform significantly in each
of the three tests in order to regard the gene as significant, as described below.
We note that each underlying measure was approximately normally distributed,
with some apparent outliers across hundreds of control clones; we conservatively
included these outliers in calculating each test statistic. We examined differences
in variation between knockout and control alleles with Levene’s test. For estimates
of the variance of distributions presented in figures, see Supplementary Table 1.
Because the distributions are only approximately normal, we assessed the
significance of each of these gene-level statistics by permutation, sampling other
cell lines from the same experimental batch and randomly assigning them as
heterozygous or homozygous knockout clones to match the distribution of
genotypes of the real samples. We calculated an empirical false discovery rate
for the sum of these permutation ranks, testing each of the neighbouring genes
and using all of the genes on other chromosomes as the background model.
Neighbouring genes with FDR <10%, a transformed allelic expression ratio
>0.03, and an effect size of >10% in heterozygotes were considered significant.
No statistical methods were used to predetermine sample size, but we generated
as many knockout clones as possible. The experiments were not randomized and
the investigators were not blinded to allocation during experiments and outcome
assessment.
Transcriptional read-through for Meg3 and Snhg3. Promoter knockouts of Meg3
and Snhg3 led to reductions in one or more downstream genes oriented in the
same direction as the knockout target gene. We attributed these changes to transcriptional
read-through based on the following evidence (Supplementary Note 4
and Extended Data Fig. 3). For both Meg3 and Snhg3, we observed evidence for
transcription continuing past the annotated 3′ end of the knockout target, through
intergenic regions, and into the downstream gene (as assayed by RNA sequencing
of chromatin-associated RNA). For the Meg3 locus, we did not observe H3K4me3
or CAGE reads at the 5′ ends of Rian and Mirg (downstream of Meg3), indicating
that they are not expressed from their own promoters. In the Snhg3 locus, the
downstream affected gene (Rcc1) is in fact expressed from its own promoter, but
we found evidence for reads splicing from just downstream of Snhg3 into the first
splice acceptor of Rcc1, indicating that at least some fraction of Rcc1 transcripts
begin at the Snhg3 promoter.
Insertion of polyadenylation signals. To halt transcription, we initially attempted
to use a short 49-bp synthetic polyadenylation signal (spA) sequence42 to
minimize the amount of genomic sequence added (Extended Data Fig. 6b). For
a given gene, we designed a guide 0.5–3 kb downstream of the transcription start
site. We designed 200-nt ssDNA oligos including the spA sequence flanked by
75- and 76-bp homologous arms, centred on the sgRNA cut site (~4 bp upstream
of the PAM sequence), and ordered these as ultramers from Integrated DNA
Technologies (Supplementary Table 2). To knock in polyadenylation signals,
we transfected 100 ng PX330-NoGuide, 100 ng pZB, 100 ng pS-pp7-GFPiP, and
100–200 ng of donor ssDNA oligo and followed the selection procedure described
for the promoter knockouts. To genotype these insertions, we used a combination
of PCR and high-throughput amplicon sequencing as described above. We
identified clones that had heterozygous insertions of the full 49-bp spA sequence
on one allele; we typically observed that the other allele had a short insertion or
deletion, consistent with non-homologous end joining (NHEJ)-mediated repair.
This short pAS sequence (spA) succeeded in halting the transcription of three
RNAs: Blustr (pAS at +40 bp and +0.5 kb in Fig. 3), Gpr19, and Bendr. However,
for other genes, transcription was unaffected despite pAS knock-in, consistent with
the location-dependent efficiency previously observed for this pAS sequence42.
Accordingly, we built a larger construct containing three polyadenylation signals
(p3PA, Extended Data Fig. 6c). The structure of this construct upon insertion into
the genome through homologous recombination is as follows: spA–EFS promoter–
Puromycin resistance gene IRES thymidine kinase–WPRE–SV40 pAS–PGK pAS
(p3PA-Puro-iTk). We co-transfected 300 ng of this construct with 100 ng of pZB
and 100 ng of PX330-NoGuide, waited three days, and then selected for cells with
integrations with 1 μg ml−1 puromycin for one week. We picked individual colonies
and used PCR to genotype clones, using primers spanning the insertion junctions.
We sequenced these PCR products to determine the allele of insertion. Following
genotyping, we expanded clonal cell lines and transfected them with PX330 and a
pool of four sgRNAs to delete the selection cassette, leaving behind three tandem
pASs. Following selection with 2 μg ml−1 ganciclovir, we again picked individual
colonies, used PCR to confirm loss of the cassette, and sequenced RNA from
multiple clones. PCR primer sequences for cloning homology arms and genotyping
p3PA insertions are listed in Supplementary Table 2.
Knockouts of Blustr exons and introns. To delete each exon and intron of Blustr,
we transfected cells with pools of guides as described for the promoter deletions,
using two guides on each side. We assessed the genotype of clonal cell lines as
described above for promoter deletions. To confirm exon knockout from RNA
sequencing data, we examined SNPs in each of the exons. Upon knockout of exon
2, for example, we observed loss of RNA sequencing reads mapping to exon 2, while
reads mapping to other exons were still present. We also identified reads spanning
a new splice junction between exon 1 and exon 3, further confirming that exon 2
was removed from the mature transcript. For bar plots in Fig. 3 measuring Blustr
expression, the values represent the normalized read counts of the remaining exons
that were not deleted in that experiment. To confirm intron knockout, we used
PCR primers spanning the deletion junction and sequenced the resulting PCR
products. We note that the intron knockouts, by design, do not affect the sequence
of the spliced Blustr RNA.
5′ splice site knockout. To knock out the 5′ splice site of Blustr, we co-transfected
mES cells as described above, using a single sgRNA pZB plasmid and 200 ng of
ssDNA oligonucleotide donor for homologous recombination (Extended Data
Fig. 8c). The oligo was ordered as an ultramer from Integrated DNA Technologies
(Supplementary Table 2). We genotyped these insertions through amplicon
sequencing using an Illumina MiSeq (primers in Supplementary Table 2).
Transcriptional activity with GRO-Seq. We used precision run-on sequencing
(PRO-seq)43, a variant of global run-on sequencing44, to map transcriptionally
engaged RNA polymerase for a subset of clones. Clones for PRO-seq (as well as
ChIP–seq and assays for transposase-accessible chromatin with high-throughput
sequencing (ATAC-seq)) were chosen from among the recoverable knockout
cell lines with a preference for clones with homozygous knockouts or knockouts
on the 129 allele only. We performed PRO-seq as previously described45, with
modifications. We harvested 10 million mES cells by scraping, washing in cold
PBS, and spinning at 330g for 3 min. The cell pellet was resuspended in 1 ml cold
douncing buffer (10 mM Tris-HCl pH 7.4, 300 mM sucrose, 3 mM CaCl2, 2 mM
MgCl2, 0.1% (v/v) Triton X-100, and 0.5 mM DTT) per 1 million cells. The cells
were incubated on ice in the cold room for 5 min and dounced 25 times. The nuclei
were pelleted at 500g for 2 min, washed twice in 5 ml douncing buffer, and centrifuged
at 500g for 2 min. The nuclei were then gently resuspended in 100 μl of cold
storage buffer (10 mM Tris-HCl, pH 8.0, 25% (v/v) glycerol, 5 mM MgAc2, 0.1 mM
EDTA, and 0.5 mM DTT), immediately flash frozen, and stored at −80 °C until use.
A 28 μl 2× Nuclear Run-On (NRO) mix was prepared as follows: 1 M Tris-HCl,
pH 8.0, 1 M MgCl2, 2 M KCl, and 0.1 M DTT. 5 μl of 1 mM Biotin-11-CTP (Perkin
Elmer), 1 μl of 0.05 mM CTP, 2.5 μl of 2 mM ATP, 2.5 μl of 2 mM GTP, 2.5 μl of
2 mM UTP (Sigma Aldrich), 6.5 μl of nuclease free water, and 2 μl of SUPERaseIn
(Ambion) were added to the 2× NRO mix and mixed well before the addition of
50 μl of 2% NLS. The NRO reaction mix was mixed well and preheated to 37 °C.
100 μl of NRO mix was added to 100 μl of nuclei in storage buffer. The reaction
was mixed gently by pipetting and incubated at 37 °C for 3 min, mixing halfway
through. To halt the reaction, 500 μl of TRIzol LS (Thermo Fisher) was added,
mixed well, and incubated at room temperature for 5 min. RNA was isolated
through a chloroform extraction and ethanol precipitation, and resuspended in
20 μl of H2O. The RNA was heat denatured at 65 °C for 40 s and fragmented on ice
for 10 min with 5 μl of 1N NaOH. To stop the reaction, 5 μl of 1 M acetic acid and
20 μl of 1 M Tris-HCl, pH 7.4 were added. To remove unincorporated biotinylated
nucleotides, the sample was passed through a P-30 exchange column (BioRad).
1 μl of RNase inhibitor was added to the ~50 μl of RNA and the first biotin enrichment
was then performed.
Each biotin enrichment was performed as follows. To prepare the Streptavidin
M280 Beads (Invitrogen) for biotin enrichment, 100 μl of beads were taken per
sample and washed once in 0.1N NaOH with 50 mM NaCl and twice in 100 mM
NaCl. Beads were resuspended in 160 μl of binding buffer (10 mM Tris-HCl, pH
7.4, 300 mM NaCl, and 0.1% (v/v) Triton X-100). To each sample an equal volume
of Streptavidin M280 beads was added, mixed, and incubated on a rotator for
20 min at room temperature. The beads were magnetically separated and washed
twice in 500 μl of ice cold high salt wash buffer (50 mM Tris-HCl, pH 7.4, 2 M
NaCl, and 0.5% (v/v) Triton X-100), twice in 500 μl of binding buffer, and once
in 500 μl of low salt wash buffer (50 mM Tris-HCl, pH 7.4 and 0.1% (v/v) Triton
X-100). To harvest the RNA, 300 μl of TRIzol (Thermo Fisher) was added to the
beads, vortexed for 20 s, and incubated at room temperature for 3 min. 60 μl of
chloroform was added and the mixture was incubated at room temperature for 3 min.
The samples were centrifuged at 14,000g for 5 min at 4 °C. The aqueous phase
was collected and transferred to a new tube; the remaining organic phase was
removed from the beads. The TRIzol extraction was then repeated as above and
the two aqueous phases were combined. RNA was purified with a chloroform
extraction and ethanol precipitation, and resuspended in nuclease-free water. RNA
sequencing libraries were then prepared as described above, except that SILANE
clean-ups were replaced with streptavidin-biotin capture enrichments until after
reverse transcription (a total of three enrichments).
We sequenced PRO-seq libraries to a depth of ~10 million 30-bp paired-end
reads. To analyse the data, we mapped and processed the RNA sequencing data as
described above, including aligning individually to the 129 and castaneus genomes.
Figures showing ‘Allele-specific GRO-seq’ depict coverage for reads that uniquely
map to the specific allele indicated in the figure. To assess the relative read density
in the promoter-proximal region and gene body of Sfmbt2, we counted reads in
the 2 kb region downstream of the first Sfmbt2 TSS and in the remainder of the
gene body46. We calculated the pause index as the ratio of these two quantities,
normalized to total read count. We noticed that different PRO-seq libraries had
subtle biases in the relative fraction of reads aligning to the TSS versus the gene
body, leading to slightly offset distributions of pause indices across all genes, and
so we corrected for these biases in each library by normalizing TSS and gene body
RPKMs to the median of the ~5,000 genes with coverage across all samples.
Chromatin accessibility with ATAC-seq. Libraries were generated as previously
described47 using 50,000 mES cells. We generated duplicate ATAC-seq libraries for
each clonal cell line examined and sequenced each to a depth of ~40 million 30-bp
paired-end reads. We aligned paired-end DNA sequencing reads using Bowtie2
(ref. 48) to each of the 129 and castaneus genomes with the following parameters:
--met-stderr --maxins 1000, removed duplicate reads using Picard
(http://picard.sourceforge.net), and filtered to uniquely aligning reads using samtools (MAPQ <30,
https://github.com/samtools/samtools). For plotting normalized read coverage
at the Blustr and Sfmbt2 promoters, we combined data from the two biological
replicates (two independent measures of the same cell line) and connected paired-end
reads to generate fragments. Fragment coverage was normalized by the total
number of uniquely mapping reads.
Chromatin immunoprecipitation. ChIP–seq for H3K4me3 and H3K27me3 was
performed using monoclonal antibodies as previously described49. Sequencing data
were analysed as for ATAC-seq described above.
Validation of allele-specific RNA expression with ddPCR. To validate our
RNA-seq based measurements of allele-specific expression, we used a quantitative
allele-specific PCR assay to verify measurements for Blustr and Sfmbt2.
We isolated RNA from harvested mES cells using RNeasy 96 columns and performed
a DNase treatment followed by reverse transcription of 500 ng of RNA
(total reaction volume 20 μl). We performed droplet digital PCR (ddPCR)
using Bio-Rad Custom ddPCR Assays that involve qPCR primers flanking
a polymorphic site and two allele-specific fluorescent probes. For Blustr: left
primer sequence: GACAAATACTCCCTTCAACA; right primer sequence:

associated domains (TADs) were downloaded from the Ren Laboratory
(http://chromosome.sdsc.edu/mouse/hi-c/download.html)52.
LncRNA transcript annotations. For evolutionary conservation analysis,
we used lncRNA annotations and isoforms previously defined based on RNA
sequencing in mouse embryonic stem cells, combining annotations generated with
multiple methods (Scripture41 and slncky28). We filtered the combined list using
slncky28 to eliminate transcripts predicted to encode proteins or micropeptides
by UCSC, transcripts that partially align to protein-coding genes (for example,
pseudogenes or incomplete reconstructions), and species-specific coding gene
duplications. Subsequently we performed several manual curation steps. We
examined each isoform using a combination of long-read RNA-sequencing data,
total chromatin-associated RNA sequencing data, cap analysis of gene expression
(CAGE) data, and poly(A)+ 3′-end sequencing data from mES cells28,30,41,53.
We eliminated transcripts that appeared to result from an extended 3′ UTR of
an upstream protein-coding transcript. Because the 5′ ends of transcripts
are imprecisely assigned based on RNA-sequencing data alone, we re-assigned
5′ ends (TSSs) using a sliding-window approach to find the 10-bp window with the
highest number of same-strand CAGE reads within 300 bp of the initially calculated
TSS. We additionally manually curated the TSS of each lncRNA, some of which
were incorrectly assigned by more than 300 bp, based on CAGE and H3K4me3
ChIP–seq data, and eliminated any where we could not identify the TSS (for
example, due to an unmappable sequence or very low abundance).
Analysis of lncRNA and promoter conservation. To categorize lncRNAs by
their conservation properties and promoter locations, we examined a set of 307
lncRNAs expressed in mES cells as described above. We assessed the conservation
of each lncRNA through a two-step approach. We first used slncky to look in
syntenic locations for evidence of lncRNA transcripts in deep poly(A)+ RNA-seq
of rat, chimp, and human induced pluripotent stem cells (iPSCs)28. LncRNAs
called ‘conserved’ by this first filter have substantial evidence based on RNA-seq
that allows for independent reconstruction of the transcript in one or more of
these other organisms. We categorized the remaining lncRNAs by the location of
their TSS: 71 lncRNAs originate within 500 bp of an mRNA TSS on the opposite
strand (divergent); 59 lncRNAs originate within the long-terminal repeats (LTRs)
of endogenous retroelements; and 79 lncRNAs have their promoters in intergenic
regions that do not overlap with LTRs and do not emerge from a bidirectional
mRNA promoter (henceforth, ‘intergenic’).
Because some conserved lncRNAs might be expressed at too low a level to
assemble a transcript de novo in a given species, we examined more closely the 79
intergenic lncRNAs that were called ‘mouse-specific’ in the initial slncky analysis.
We applied a second, more stringent threshold to remove lncRNAs misclassified
as mouse-specific due to low abundance. For each intergenic lncRNA locus, we
GAACAGTTTGTCCTGCC; probe sequence: TAAGTGAGGTGAACTCCAAG used liftOver54 to map the 10 bp surrounding the mouse TSS (mm9) to the human
(129 allele, FAM) or AGTGAGGCGAACTTCAAG (castaneus, HEX). For Sfmbt2: genome (hg19) (minMatch =​ 0.1, UCSC chain). 37 of these transcripts did not lift
left primer sequence: TGTAAGTTTGCCTGATACTC; right primer sequence: over at this step, and thus were considered mouse-specific. For the 42 that did lift
TCTAATGTACCTCAGCCC; probe sequence: TTTCCTATGAGCAGTTCAAC over, we examined the syntenic region for evidence of poly(A)+ RNA-seq data from
(129 allele, FAM) or TCCTATGAACCGTTCAGC (castaneus, HEX). ddPCR was human iPSCs28 or poly(A)+ nuclear-fraction RNA-seq from hES cells (–100 to
done with 2.2 μ​l of cDNA, 11 μ​l of Supermix (BioRad), 1.1 μ​l of each probe, and +​900 bp relative to the TSS), or for evidence of poly(A)+ nuclear-fraction or whole-
7.7 μ​l of water per reaction followed by droplet generation. PCR was performed as cell CAGE from hES cells (–250 to +​250 bp relative to the TSS), and removed from
follows: 95 °C for 10 min; cycling at 94 °C for 30 s and 55 °C for 1 min for a total of consideration any lncRNAs that showed evidence for RNA-seq or CAGE above
40 cycles; and 98 °C for 10 min. Readout was done using the QX200 Droplet Reader a certain threshold. We chose this threshold based on a set of random intergenic
and Quantasoft Software (BioRad) to determine the total number of droplets regions, which were matched to the set of intergenic mouse-specific lncRNAs based
containing each allele. We calculated allelic expression ratios from these values and on GC content. We eliminated from consideration the ten lncRNAs that showed
compared it to values generated through RNA-sequencing and hybrid selection of RNA-seq or CAGE signals greater than the 90th percentile of random regions,
the same RNA samples (Extended Data Fig. 2d, e). corresponding to approximately two CAGE or RNA-seq reads in the windows
External ChIP–seq, RNA-seq, and DNase HS data. We used the following data described above. These ten lncRNAs were added to the ‘conserved’ section of
from ENCODE50: H3K4me3, H3K4me1, H3K27ac, and CTCF ChIP–seq in the pie chart in Fig. 4a. Several of these ten lncRNAs correspond to substantially
mES cells (ES-Bruce4); DNase hypersensitivity sequencing in mES cells (E14); shortened, ­single-exon poly(A)+ transcripts that show minimal overlap with the
H3K4me3, H3K4me1, and CTCF ChIP–seq and DNase HS data in H1-hES cells; syntenic exons in mouse; although a majority of the exonic sequence of these
and RNA-sequencing data in H1-hES cells (nuclear poly(A)+, nuclear total). To transcripts are not in fact conserved between human and mouse, we excluded these
assess transcription factor binding to mRNA and lncRNA promoters (Extended from consideration as putative mouse-specific lncRNAs.
Data Fig. 7c), we examined mES cell ChIP–seq peaks available from Kagey et al. For the purposes of examining the conservation properties of these intergenic
at the Gene Expression Omnibus (GSE22562)51. mouse-specific lncRNAs, we defined a matched set of ‘enhancer’ elements. We first
DNA purification for examining proximity contacts. To examine the ­proximity generated a list of regulatory elements in mES cells using the DNase hotspots called
contacts of the linc1405 locus, we used the RAP-DNA protocol, which we initially by ENCODE-UW in ES-E14 cells. As an estimate of the activity of each element, we
developed in order to map RNA localization to chromatin, to capture linc1405 calculated the density of H3K27ac reads in the region. From the set of intergenic
DNA37. In brief, we cross-linked live cells to fix endogenous chromatin complexes, elements that did not overlap a promoter, lncRNA promoter, or LTR, we selected
then purified a target DNA region using a pool of oligonucleotides targeting the a random subset matched to the intergenic lncRNA promoters for H3K27ac­
linc1405 locus (Supplementary Table 3). Here, we used probes that are the same density (binned by 10 reads per bp) and distance to the TSS of the closest active
strand as the linc1405 RNA—in this way, we specifically capture the linc1405 gene (binned by 5 kb). We call these elements ‘enhancers’ because they are marked
DNA and do not directly capture the linc1405 RNA itself. We mapped the 3D by DNase hypersensitivity and H3K27ac but do not overlap a known gene
­proximity contacts of the linc1405 locus through high-throughput sequencing of promoter.
co-­purified DNA and calculated the normalized enrichment to an input DNA We compared the sequence conservation and functional conservation of three
library in 1-kb windows (Extended Data Fig. 7e). Annotations for ­topologically classes of elements: intergenic mouse-specific lncRNAs, matched intergenic

© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
RESEARCH LETTER

enhancer elements, and GC-matched random intergenic elements. First, we computed the rate at which each set maps to human sequence. We centred each element and used liftOver (-minMatch = 0.1) to identify the syntenic region in the human genome. Elements that did not lift over at this step correspond to the white segment of the pie charts in Fig. 4 (iii, did not map). For elements that did lift over to human, we next defined the subset that map to putative regulatory elements in human. We examined a 500-bp window centred on the lifted over region and counted reads in hES cell DNase-seq data from ENCODE. We defined regions showing DNase HS scores higher than 95% of the mappable random intergenic regions as putative DNA regulatory elements. We note that these random intergenic regions include some enhancers; they are matched to lncRNA promoters for GC content, and thus frequently correspond to regulatory elements (which are GC-rich) that happen to be active in hES cells. For both intergenic mouse-specific lncRNAs and enhancers, ~33% of elements corresponded to putative DNA regulatory elements in human (Fig. 4d), representing a ~6.6-fold enrichment versus the random intergenic controls. To compare sequence conservation of these classes of elements, we calculated the average SiPhy score55 across each 500-bp region surrounding the mouse TSS or the centre of the enhancer element, using the 29 mammals alignment from the mouse perspective56. We used a two-sided Mann–Whitney U-test to look for changes in the distributions of SiPhy scores relative to the set of mappable random intergenic regions (Fig. 4d: random ii + iii).
Impact of expression level on conservation analysis. Although the set of intergenic mES cell lncRNAs examined above does not show any significant evidence for poly(A)+ RNA in the syntenic locus in human, some of these transcripts may not be detected in human and yet still be truly conserved. These transcripts might be misclassified as mouse-specific lncRNAs for several reasons, including: (i) low expression level in hES cells and iPS cells such that the lncRNA, by chance, is not detected based on the depth of sequencing data available; or (ii) the lncRNA is not expressed in hES cells or iPS cells, but is expressed in a different human cell type and thus may have a conserved function.
To estimate the false positives resulting from these and other scenarios, we examined the properties of a set of 853 conserved mRNAs matched to the intergenic mouse-specific lncRNAs based on expression in mES cells. We counted the frequency at which these mRNAs would be called ‘not conserved’ by the same procedures described above: we applied the nuclear poly(A)+ CAGE and RNA-seq filters to eliminate transcripts that show detectable transcription in the 1-kb region near the TSS. While 87% of the intergenic lncRNAs described above passed these filters (and thus appeared to be mouse-specific), only 22% of the expression-matched mRNAs passed; this indicates that the set of 69 mouse-specific intergenic lncRNAs is approximately 3.9-fold enriched for human elements that are not transcribed in hES cells. Thus, the mouse-specific lncRNAs defined above appear to consist largely of transcripts that are not conserved.
We performed the following additional analyses to ensure the robustness of our conclusions regarding the existence of lncRNAs that evolved from ancestral regulatory elements. First, we examined the conservation of the first 5′ splice sites of this set of lncRNAs. In 7 of these 11 loci, the GT dinucleotide in the first 5′ splice site is not conserved, suggesting that a similar spliced transcript cannot be produced from this locus. Second, we re-performed the entire conservation analysis focusing on the 50% of mES cell intergenic lncRNAs with the highest expression levels; these lncRNAs are less likely to be missed in hES cells due to low abundance. We also adjusted our poly(A)+ RNA and CAGE filters to require a complete absence of reads in the corresponding regions in hES cells and iPS cells. Using these filters, 79% of the intergenic lncRNAs are not detectably expressed in human cells, representing a ~12-fold enrichment over mRNAs matched for expression level. Therefore we are confident that most of these lncRNAs are correctly classified as mouse-specific. Of the 30 intergenic lncRNAs called mouse-specific by this more conservative analysis, 5 do indeed correspond to putative DNA regulatory elements, including linc1494 (Fig. 4c), representing a >8-fold enrichment versus GC-matched random sequences (chi-squared P < 1 × 10⁻¹⁰). Thus, our conclusions that some lncRNAs appear to evolve from ancestral regulatory elements are robust even with stringent thresholds.
Software for data analysis and graphical plots. We used the following software for data analysis and graphical plots: R Bioconductor (version 3.0)57, Gviz (version 1.10.11), gplots (version 2.17.0), GenomicRanges (version 1.18.4)58, rtracklayer (version 1.26.3)59, BEDTools60, Integrative Genomics Viewer (version 2.3.26)61, and vcftools (version 0.1.12)62.
Data availability. Sequencing data for this study is available at the Gene Expression Omnibus (GSE80262 and GSE85798), and additional visualizations of the data are available at http://pubs.broadinstitute.org/neighboring-genes/.
Code availability. Code for the analyses described in this paper is available from the authors upon request.

29. Bhatt, D. M. et al. Transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular RNA fractions. Cell 150, 279–290 (2012).
30. Engreitz, J. M. et al. RNA–RNA interactions enable specific targeting of noncoding RNAs to nascent pre-mRNAs and chromatin sites. Cell 159, 188–199 (2014).
31. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
32. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
33. Keane, T. M. et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477, 289–294 (2011).
34. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
35. Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
36. Shishkin, A. A. et al. Simultaneous generation of many RNA-seq libraries in a single reaction. Nat. Methods 12, 323–325 (2015).
37. Engreitz, J., Lander, E. S. & Guttman, M. RNA antisense purification (RAP) for mapping RNA interactions with chromatin. Methods Mol. Biol. 1262, 183–197 (2015).
38. Engreitz, J. M. et al. The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973 (2013).
39. Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).
40. Huang, S., Holt, J., Kao, C.-Y., McMillan, L. & Wang, W. A novel multi-alignment pipeline for high-throughput sequencing data. Database (Oxford) 2014, bau057 (2014).
41. Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010).
42. Levitt, N., Briggs, D., Gil, A. & Proudfoot, N. J. Definition of an efficient synthetic poly(A) site. Genes Dev. 3, 1019–1025 (1989).
43. Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).
44. Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008).
45. Mahat, D. B. et al. Base-pair-resolution genome-wide mapping of active RNA polymerases using precision nuclear run-on (PRO-seq). Nat. Protocols 11, 1455–1476 (2016).
46. Adelman, K. & Lis, J. T. Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet. 13, 720–731 (2012).
47. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
48. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
49. Busby, M. et al. Systematic comparison of monoclonal versus polyclonal antibodies for mapping histone modifications by ChIP–seq. Preprint at http://dx.doi.org/10.1101/054387 (2016).
50. Mouse ENCODE Consortium et al. An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol. 13, 418 (2012).
51. Kagey, M. H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010).
52. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
53. Fort, A. et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat. Genet. 46, 558–566 (2014).
54. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
55. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009).
56. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
57. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
58. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Comput. Biol. 9, e1003118 (2013).
59. Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009).
60. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
61. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
62. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
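
The thresholding and enrichment arithmetic used in the conservation analysis (a cutoff taken from the distribution of random intergenic regions, followed by a ratio of passing fractions) can be sketched in Python as below. This is only an illustration under stated assumptions, not the authors' code, which is available on request. The function names, the nearest-rank percentile rule and the toy scores are all hypothetical.

```python
import math


def percentile_threshold(background, pct):
    """Nearest-rank percentile of the background scores (pct in 0-100).

    The nearest-rank rule is an assumption; the text does not say which
    percentile definition was used.
    """
    ordered = sorted(background)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_above(scores, threshold):
    """Fraction of elements whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def fold_enrichment(test_fraction, background_fraction):
    """Ratio of the passing fraction in the test set to that in the background."""
    return test_fraction / background_fraction


# Toy background of 100 random regions with scores 1..100 (hypothetical values).
random_scores = list(range(1, 101))
cutoff = percentile_threshold(random_scores, 95)

# By construction, 5% of this toy background lies above its own 95% cutoff.
background_fraction = fraction_above(random_scores, cutoff)

# Fractions quoted in the text: ~33% of elements versus 5% of random regions.
print(round(fold_enrichment(0.33, background_fraction), 1))
```

With the fractions quoted in the text (about 33% of elements against 5% of random regions), the ratio reproduces the stated ~6.6-fold enrichment.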


Extended Data Figure 1 | Expression and subcellular localization of knocked-out lncRNAs and mRNAs. a, Expression of lncRNAs and mRNAs in F1 129/castaneus female mES cells, reported in fragments per kilobase per million (FPKM) in whole-cell poly(A)+ RNA-seq. Cumulative fraction is plotted for all mRNAs expressed in mES cells. Large dots represent transcripts whose promoters we deleted in this study. LncRNAs and mRNAs span a >20-fold range of abundance levels. b, Relative subcellular localization of lncRNAs and mRNAs. We sequenced poly(A)+ RNA from chromatin, soluble nuclear, and cytoplasmic fractions (see Methods) and plotted the relative abundance of mature transcripts in each fraction. We selected lncRNAs that showed localization biased towards the nuclear fractions relative to most mRNAs. For comparison, we plotted 1,000 randomly selected mRNAs (light grey).


Extended Data Figure 2 | Generation of knockout clones and measurement of allele-specific RNA expression. a, Overview of knockout and measurement protocol. b, Distribution of allelic expression ratios (number of informative reads mapping to 129S1 allele divided by the number mapping to either the 129S1 or the castaneus allele) across active genes in mES cells. c, Scatter plot of allelic expression ratios for genes with RPKM ≥ 2 that have more than 100 allele-informative reads across all libraries. Allelic expression ratios are consistent in RNA sequencing data before and after hybrid selection (HS). d, e, Allelic expression ratios as measured by two independent methods for Blustr (d) and Sfmbt2 (e) expression in 15 clonal cell lines containing genetic modifications in the Blustr locus. Each dot represents the mean of two ddPCR technical replicates (x axis) and the value from one RNA-seq technical replicate (y axis). f, Example locus showing hybrid selection strategy and RNA-seq coverage for cell lines with the indicated genotype for deletion of the Bendr promoter. The y axis scales represent normalized read counts and are the same for all hybrid selection tracks. The absolute level of expression for any given gene varies among clonal cell lines; throughout this work, we instead consider the relative level of expression between the two alleles in heterozygous knockout cells. For similar plots of each gene studied, see http://pubs.broadinstitute.org/neighboring-genes/.
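
The allelic expression ratio defined in panel b, and the gene filter stated in panel c, can be written out as a short Python sketch. It is illustrative only and is not the authors' code. The function names and argument names are hypothetical; the same ratio applies whether the inputs are allele-informative read counts or ddPCR droplet counts.

```python
def allelic_ratio(count_129, count_cast):
    """Fraction of allele-informative counts assigned to the 129S1 allele.

    Implements 129 / (129 + castaneus) as described in the caption.
    """
    total = count_129 + count_cast
    if total == 0:
        raise ValueError("no allele-informative counts for this gene")
    return count_129 / total


def passes_filters(rpkm, informative_reads, min_rpkm=2.0, min_reads=100):
    """Gene filter from panel c: RPKM >= 2 and more than 100 informative reads."""
    return rpkm >= min_rpkm and informative_reads > min_reads


# Hypothetical example: 60 reads on the 129S1 allele, 40 on the castaneus allele.
ratio = allelic_ratio(60, 40)
```

A ratio of 0.5 corresponds to balanced expression from the two alleles; values above or below it indicate a bias towards the 129S1 or the castaneus allele, respectively.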


Extended Data Figure 3 | Read-through transcription at Meg3 and Snhg3 loci. a, Snhg3 promoter knockout reduces the levels of Rcc1 mRNA by 23%. However, sequencing of chromatin-associated RNA shows that transcription continues past the annotated 3′ end of Snhg3 into the downstream Rcc1 gene (see Methods). This read-through transcription creates a fusion transcript containing exons of both Snhg3 and Rcc1, as well as intergenic RNA. We note that this fusion transcript is also annotated in the syntenic human locus as an alternative isoform of RCC1. Bars, relative poly(A)+ RNA expression on modified versus unmodified alleles. Error bars, 95% confidence interval for the mean (n ≥ 2 alleles, see Supplementary Table 1). b, Meg3 promoter knockout eliminates the expression not only of Meg3 but also of two additional lncRNAs encoded downstream in a tandem orientation (Rian and Mirg). Although these three lncRNAs are annotated as separate genes, they appear to be derived from a single transcript driven by the Meg3 promoter. This is consistent with the presence of continuous chromatin-associated RNA throughout the locus and a lack of CAGE reads at the 5′ ends of Rian and Mirg3.


Extended Data Figure 4 | Promoter knockouts for five intergenic lncRNAs affect the expression of a neighbouring gene. Significance (z-score) of allele-specific expression ratios at all genes within 1 Mb of each of five lncRNA loci. Each dot represents a different heterozygous promoter knockout clone for a given gene. Dots are shown only for genes that are sufficiently highly expressed to assess allele-specific expression (see Methods). The y axis is capped at –10 to +10 standard deviations from the mean. Black, knocked-out lncRNA; blue, gene with significant allele-specific change in gene expression (FDR < 10%). Independent clones are not expected to yield the same significance value (z-score), in part because read depth differs between samples.


Extended Data Figure 5 | Promoter knockouts for four mRNAs affect the expression of a neighbouring gene. Significance (z-score) of allele-specific expression ratios at all genes within 1 Mb of each of four mRNA loci. Each dot represents a different heterozygous promoter knockout clone for a given gene. Dots are shown only for genes that are sufficiently highly expressed to assess allele-specific expression (see Methods). The y axis is capped at –10 to +10 standard deviations from the mean. Black, knocked-out mRNA; blue, gene with significant allele-specific change in gene expression (FDR < 10%). Independent clones are not expected to yield the same significance value (z-score), in part because read depth differs between samples.


Extended Data Figure 6 | Dissecting mechanisms for how gene loci regulate a neighbour. a, Three categories of possible mechanisms by which a gene locus might regulate the expression of a neighbour. b, We used two strategies to insert pAS downstream of gene promoters. In the first strategy, we inserted a 49-bp synthetic pAS (spA) using a single-stranded DNA oligo with 75-bp homology arms (see Methods). c, In the second pAS insertion strategy, we cloned a donor plasmid containing a selection cassette and three different pAS sequences (see Methods). Homology arms of 300–800 bp were used to integrate the cassette. After isolating clones with successful insertions, we used a second round of transfections to remove the selection cassette, leaving behind three tandem pASs. EFS, elongation factor 1 promoter; Puro, puromycin resistance gene (pac); HSV-tk, herpes simplex virus thymidine kinase.


Extended Data Figure 7 | Promoters of lncRNAs and mRNAs have enhancer-like functions. a, Allele-specific GRO-seq signal for clones with the indicated modifications at the Bendr locus. Only reads specifically mapping to one of the two alleles are shown. The y axis scale represents normalized read count and is the same for all tracks. b, Allele-specific poly(A)+ RNA expression for genetic modifications at the linc1405, Snhg17, Gpr19, and Slc30a9 loci. Bars, average RNA expression on modified compared to unmodified (wild-type) alleles. Error bars, 95% confidence intervals for the mean (n ≥ 2 alleles, see Supplementary Table 1). Grey arrows indicate distance from the targeted locus promoter to the affected neighbouring gene. We note that, based on their location, the Snhg17 and Gpr19 pAS insertions probably allow more substantial splicing and transcription; for these loci, it is clear that the majority of the transcript is dispensable but it is possible that transcription close to the promoter may be involved in the cis-regulatory function. c, Presence (grey) or absence (white) of various chromatin marks and transcription factors in mES cells in a 1.5-kb window centred on the TSS of each targeted gene. d, Distance from each knocked-out gene to its neighbouring target gene (x axis) versus the magnitude of the effect on the expression of the neighbouring gene (per cent compared to wild-type, y axis). Blue genes represent those discussed in main text; grey genes are discussed in Supplementary Note 5. e, Proximity-based contacts between the linc1405 and Eomes loci. The y axis shows enrichment in a sequencing-based proximity assay in which we used antisense oligos to capture linc1405 DNA and any interacting, cross-linked proximal DNA (see Methods). TAD annotations are derived from Hi-C experiments in mES cells (see Methods). Blue arrow, focal contact between the linc1405 and Eomes loci.


Extended Data Figure 8 | Characterization of genetic modifications in the Blustr locus. a, Allele-specific GRO-seq signal for clones with the indicated modifications at the Blustr locus. Only reads specifically mapping to one of the two alleles are shown. The y axis scale represents normalized read count and is the same for all tracks, and is magnified five times at the indicated location to better visualize the reads in the Sfmbt2 locus. b, Quantification of allele-specific GRO-seq signal in the Sfmbt2 locus on alleles modified as indicated. TSS, region including the two alternative TSSs of Sfmbt2 and 2 kb downstream; gene body, region containing the remainder of the Sfmbt2 gene locus; pause index, ratio of TSS to gene body. Dashed grey lines indicate the 95% confidence intervals for the mean of eight wild-type clones. Bars, n = 8 for wild-type and n = 1 for others. c, Schematic of the 5′ end of the Blustr locus and genotypes of two knockout clones. The 5′ splice site is located 78 bp downstream of the Blustr transcription start site (in this panel, Blustr is transcribed from left to right). One of the alleles from the two clones contains insertion of the oligo mediated by homologous recombination; the remaining three alleles contain insertions or deletions resulting from non-homologous end joining repair of sgRNA-mediated double-strand breaks, some of which also disrupt the 5′ splice site. Bar plots show allele-specific RNA expression for knockout clones and control clones (n = 18 for +/+, 1 for others). Error bars, 95% confidence interval for the mean. d, Schematic of the observed splice structures of Blustr RNA transcripts in poly(A)+ RNA sequencing of the exon deletion clones. Each deletion removes a region including ~50–200 bp on either side of the exon, thereby removing both the exon and its splice sites. The Exon 4 deletion removes the endogenous pAS, leading to new isoforms of the lncRNA transcript that splice into two cryptic splice acceptors downstream. e, GRO-seq, H3K4me3 ChIP–seq, and chromatin accessibility (ATAC-seq FPKM) at the Blustr and Sfmbt2 promoters in cell lines with the indicated genotypes. Deletion of the first 5′ splice site leads to a significant reduction in H3K4me3, RNA polymerase occupancy, and chromatin accessibility at the Blustr promoter, as well as H3K4me3 and RNA polymerase occupancy (but not accessibility) at the Sfmbt2 promoter. f, H3K27me3 ChIP–seq at the Blustr and Sfmbt2 loci in cell lines with the indicated genotypes. Deletion of the Blustr promoter or 5′ splice site leads to spreading of the repression-associated H3K27me3 modification across a ~30 kb region.
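
The pause index in panel b is described as the ratio of TSS signal to gene-body signal. The Python sketch below shows one way to compute it. It is an illustration rather than the authors' implementation: it assumes that both regions are expressed as length-normalized read densities (RPKM) before the ratio is taken, and the function names and example numbers are hypothetical.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    kilobases = region_length_bp / 1e3
    millions = total_mapped_reads / 1e6
    return read_count / kilobases / millions


def pause_index(tss_reads, tss_length_bp, body_reads, body_length_bp,
                total_mapped_reads):
    """Ratio of TSS-region read density to gene-body read density.

    Using RPKM for both regions is an assumption made for this sketch.
    """
    tss_density = rpkm(tss_reads, tss_length_bp, total_mapped_reads)
    body_density = rpkm(body_reads, body_length_bp, total_mapped_reads)
    if body_density == 0:
        raise ValueError("gene body has no reads; pause index is undefined")
    return tss_density / body_density


# Hypothetical example: a 2-kb TSS region and a 20-kb gene body, each with
# 100 reads, in a library of one million mapped reads.
example = pause_index(100, 2_000, 100, 20_000, 1_000_000)
```

Because the library size cancels in the ratio, the pause index of a single gene does not depend on sequencing depth; any cross-library normalization therefore has to be applied to the TSS and gene-body densities separately.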


Extended Data Figure 9 | Mechanisms for cross-talk between neighbouring lncRNAs and mRNAs. Proposed mechanisms based on pAS insertion
experiments and other genetic manipulations (see text). For proposed mechanisms of lncRNAs marked with daggers see Supplementary Note 5.


Extended Data Figure 10 | Classification of lncRNAs based on conservation and promoter location. a, Classification of 307 lncRNAs expressed in mES cells. ‘Conserved’ transcripts are those that show significant evidence of cap analysis of gene expression (CAGE) data and/or poly(A)+ RNA in syntenic loci (see Methods). Divergent, initiating within 500 bp of an mRNA TSS, on the opposite strand; ERV, endogenous retroviral repetitive element (see Supplementary Note 9). Box plot shows sequence-level conservation of the promoters of subsets of lncRNAs expressed in mES cells. Random intergenic regions are matched to lncRNA promoters by GC content. Positive SiPhy score indicates evolutionary constraint on functional sequences. Orange category corresponds to mouse-specific lncRNAs that appear to have evolved from ancestral regulatory elements (REs) and correspond to sequences that show evidence for DNase I hypersensitivity in human embryonic stem cells. Significance is calculated compared to random intergenic regions using a Mann–Whitney U-test. ***P < 0.001. Box represents first and third quartiles; centre line represents median; whiskers represent data within 1.5× the interquartile range. b, Chromatin and RNA data for 11 mouse-specific lncRNAs that appear to have evolved from ancestral regulatory elements. In mouse, these elements show evidence for CAGE, H3K4me3, and DNase I hypersensitivity, consistent with their roles as promoters. The syntenic sequences in human do not show evidence for CAGE but nonetheless are DNase I hypersensitive and are frequently marked by H3K4me1 and/or CTCF. c, Model for evolution of lncRNAs from pre-existing enhancers, which often initiate weak bidirectional transcription. Spliced transcripts may neutrally appear through the appearance of splice signals and loss of polyadenylation signals. In some cases, transcription, splicing, or other RNA processing mechanisms may feed back and contribute to the cis-regulatory function of the promoter, producing a lncRNA as a by-product.
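
The promoter-location categories in panel a (divergent, retroelement-derived, intergenic) follow simple positional rules, which the Python sketch below spells out for lncRNAs that were not already called conserved. It is a hypothetical illustration and not the authors' pipeline: the function name, the tuple layout, the precomputed `overlaps_ltr` flag and the order in which the rules are checked are all assumptions.

```python
def classify_promoter(chrom, tss, strand, mrna_tss, overlaps_ltr, window=500):
    """Assign a promoter-location class to one lncRNA.

    chrom, tss, strand: location of the lncRNA TSS.
    mrna_tss: iterable of (chrom, position, strand) tuples for mRNA TSSs.
    overlaps_ltr: True if the lncRNA TSS lies within an annotated LTR.
    window: maximum distance in bp to an opposite-strand mRNA TSS.
    Checking 'divergent' before 'LTR' is an assumption of this sketch.
    """
    for m_chrom, m_pos, m_strand in mrna_tss:
        same_chrom = m_chrom == chrom
        opposite_strand = m_strand != strand
        if same_chrom and opposite_strand and abs(m_pos - tss) <= window:
            return "divergent"
    return "LTR" if overlaps_ltr else "intergenic"


# Hypothetical example: one mRNA TSS on the plus strand of chr1.
mrnas = [("chr1", 1000, "+")]
label = classify_promoter("chr1", 1300, "-", mrnas, overlaps_ltr=False)
```

In practice the overlap with LTR annotations would come from an interval tool such as BEDTools, which the paper lists among its software; here it is passed in as a precomputed boolean to keep the sketch self-contained.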
