Paper2 Fu
We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by
comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5′
untranslated regions (UTRs) of transcripts of intronless genes. This type of
comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We
classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers.
Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed
in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer
elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations
showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined
here. The large number of effective sequences implied by these results suggests that sequences that influence
splicing may be very abundant in pre-mRNA.
[Keywords: splicing; pre-mRNA; motifs; exon; enhancers; silencers]
Supplemental material is available at http://www.genesdev.org.
Received February 17, 2004; revised version accepted April 9, 2004.
GENES & DEVELOPMENT 18:1241–1250 © 2004 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/04; www.genesdev.org
Zhang and Chasin
Corresponding author. E-MAIL [email protected]; FAX (212) 865-8246.
Article published online ahead of print. Article and publication date are at http://www.genesdev.org/cgi/doi/10.1101/gad.1195304.

A fundamental step in the transfer of genetic information from DNA to protein is the splicing of RNA transcripts. In this process, relatively small exons (∼100 nt) are selected from among generally much larger introns (thousands of nucleotides) and are joined to form mature mRNA. Pre-mRNA splicing is accomplished by two sequential transesterification reactions catalyzed by a very large ribonucleoprotein complex known as the spliceosome (Burge et al. 1999). A spliceosome is either recruited or assembled at the correct 5′ splice site (donor) and 3′ splice site (acceptor) in part through recognition of conserved sequences spanning the intron–exon junctions (Burge et al. 1999). However, the sequence conservation at the splice sites is incomplete, such that there are many false sites that match the consensus sequence as well as or better than the true sites (Senapathy et al. 1990). For most long transcripts, it is clear that the exon represents the unit of initial recognition (Robberson et al. 1990; Berget 1995). However, incorporation of a requirement for a pair of splice sites defining an exon of limited length does not alleviate the problem. Pseudo exons, so defined, outnumber real exons by an order of magnitude (Sun and Chasin 2000). The additional information needed to distinguish true from false exons is thought to reside in splicing enhancer and splicing silencer sequence elements (Blencowe 2000; Wagner and Garcia-Blanco 2001; Cartegni et al. 2002; Ladd and Cooper 2002). Exonic splicing enhancers (ESEs) have been extensively studied in the context of alternative splicing (Black 2003). ESEs have also been implicated in at least some constitutive splicing (Mayeda et al. 1999; Schaal and Maniatis 1999a). Exonic or intronic splicing silencers have been less extensively studied (Wagner and Garcia-Blanco 2001; Ladd and Cooper 2002).

Many ESEs have been found by selecting RNA sequences from among large numbers of random oligomers whose insertion stimulates the use of a weak splice site (Liu et al. 1998, 2000; Schaal and Maniatis 1999b) or a poorly included chimeric exon (Tian and Kole 1995) in a cell-free splicing system or in transfected cells (Coulter et al. 1997). In many of these cases the ESEs have been shown to function in concert with specific splicing factor proteins. However, despite their protein specificity, these sequences are highly degenerate and their abundance in introns is ∼80% of their frequency in exons (Liu et al. 1998). As a result, they are not very effective in distinguishing true exons from pseudo exons (our unpublished observations).

Computational approaches have also been used to find ESEs. There is a sharp transition in sequence composition between introns and exons because of the fact that
most exons code for protein, whereas introns do not. Most exons can be readily distinguished by this difference (Zhang 1998; Zhang et al. 2003), and gene-finding programs that exploit it can be highly accurate (Burge and Karlin 1997). It is not clear what proportion, if any, of this salient information is used as an ESE in addition to protein coding. Some researchers have succeeded in getting around this problem by comparing two classes of exons, each of which codes for protein (Fedorov et al. 2001; Fairbrother et al. 2002). Fairbrother et al. (2002), reasoning that exons with weak splice sites were more likely to require ESEs for recognition than were exons with strong sites, identified exonic hexamers that were both more prevalent in the former and more prevalent than those found in the intronic flanks. When tested, these hexamers indeed promoted splicing. Alternatively spliced exons are usually associated with weak splice sites (Black 2003), so it is possible that their selection method was biased toward this class of exons.

Exonic splicing silencers (ESSs) are a second class of sequence elements that are known to regulate alternative splicing. The high proportion of human genomic sequences that can act to inhibit splicing when inserted into an exon suggests that ESSs may also play a role in splice site selection (Fairbrother and Chasin 2000). Sironi et al. (2004) searched for potential ESSs among hexamers that are underrepresented in exons compared with pseudo exons and exon flanks; one of three such hexamers tested had silencing activity.

We have circumvented the noise represented by exonic protein coding sequences by examining exons that do not code for proteins. We compared the frequencies of 8-mers in constitutively spliced noncoding exons with those in pseudo exons and the 5′ untranslated regions (UTRs) of intronless genes. Sequences overrepresented in the noncoding exons were designated as putative ESEs, and underrepresented sequences were designated as putative ESSs. Over 3000 8-mers were identified by these criteria. On testing, almost all of 20 sequences assayed conferred the predicted phenotype. The large number of effective oligomers implied by these results suggests that sequences that positively and negatively influence splicing may be very abundant in pre-mRNA.

Results

Computational strategy

We have used a computational approach to identify ESEs and ESSs associated with constitutively spliced exons. We focused on constitutive splicing as opposed to alternative splicing so as to tackle the more fundamental problem represented by the former and to avoid what are likely more complex mechanisms in the latter. The strategy we used to overcome the confounding presence of protein coding information was to restrict our search to non-protein-coding exons. In more than 40% of all human genes, protein synthesis is initiated in an exon other than the first (Davuluri et al. 2001). On examination of a database of unrelated human genes, 9% were found in which protein synthesis is initiated in exon 3 or higher. We focused on 502 internal non-protein-coding exons of these genes as examples of exons that should have a relatively high content of ESEs and a relatively low content of ESSs, but with no protein-coding information. We compared the sequence composition of these noncoding exons with that of two dissimilar types of sequences. The first was pseudo exons: intronic regions that have the appearance of exons in that they are bounded by sequences similar to acceptor and donor splice sites and are of typical exon size, but are not in fact spliced (Sun and Chasin 2000; Zhang et al. 2003). These sequences are expected to be low in ESEs and perhaps high in ESSs. We expected this comparison alone to also yield sequences that specify other kinds of information inherent in exons: sequences for nuclear transport, mRNA stability, and mRNA localization, for example. To circumvent this problem, we also compared the noncoding exons with a second class of sequences: the 5′ UTRs of intronless genes. These regions could contain the same nonsplicing information as the noncoding exons, but they should lack ESEs.

We compared the frequency of 8-mers (allowing one mismatch) in noncoding exons to these two counterparts. We chose 8-mers so as to include binding sites that could contain more information than the 5- or 6-mers that are commonly used in bioinformatics searches, and to facilitate the formation of stable synthetic heteroduplexes for later experimental testing. However, because the frequency of individual 8-mers was too low to gather sufficient data from our set of 502 noncoding exons, we allowed a single mismatch for each 8-mer considered. Pseudo exons were chosen from the introns adjacent to each noncoding exon in our database. We calculated z-scores as an index of the significance of a deviation between the frequency of each possible 8-mer in the noncoding exons versus the pseudo exons and in the noncoding exons versus the 5′ UTRs of intronless genes. This metric allows a determination of the statistical significance of a frequency difference between two populations without any knowledge of the underlying distribution. A p value can be assigned to any z-score (see Supplemental Material for the exact formula used). Because we allowed a single mismatch, the z-score for each 8-mer actually represents the average of 25 sequences, 24 of which differ from the nominal sequence by a single base substitution. The results are shown in Figure 1, in which the z-scores of noncoding exons versus pseudo exons are plotted on the abscissa, and the z-scores of noncoding exons versus 5′ UTRs of intronless genes are plotted on the ordinate. Although most of the 65,536 8-mers lie near the center of this two-dimensional scatter plot, many are either overrepresented or underrepresented in the noncoding exons. Choosing those 8-mers whose frequency difference corresponds to a p value of <0.002 for each comparison, we collected 2069 putative ESEs (PESEs; upper right quadrant of Fig. 1 beyond the dotted line) and 974 putative ESSs (PESSs; lower left quadrant beyond the dotted line). The threshold of p < 0.002 predicts a false discovery rate (Storey and Tibshirani 2003)
Figure 2. Examples of putative exonic splicing silencer (PESS; left) and putative exonic splicing enhancer (PESE; right) sequence
families. The 974 PESS and 2069 PESE 8-mer sequences were aligned and then clustered using ClustalW. Pictograms for 10 PESSs and
8 PESEs on the basis of the positional sequence scoring matrix underlying each cluster are shown. The number of sequences in the
cluster is shown in parentheses and the name of an exact exemplar used for testing is given, as is the information content in bits.
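
The information content quoted for each cluster can be illustrated with a short Python sketch. It assumes the standard per-position definition for a four-letter alphabet (2 bits minus the Shannon entropy of the column), applied to gap-free aligned sequences of equal length. Whether the figure used a small-sample correction or handled alignment gaps differently is not stated, and the function name `information_content` is hypothetical.

```python
from collections import Counter
from math import log2


def information_content(aligned, alphabet="ACGU"):
    """Per-position information content (bits) of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    max_bits = log2(len(alphabet))  # 2 bits for a four-letter alphabet
    per_position = []
    for i in range(length):
        column = Counter(s[i] for s in aligned)
        total = sum(column.values())
        # Shannon entropy of the observed base frequencies in this column
        entropy = -sum((c / total) * log2(c / total) for c in column.values())
        per_position.append(max_bits - entropy)
    return per_position
```

Summing the returned list gives a single figure for the whole motif; a fully conserved 8-nt motif would reach the maximum of 16 bits.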
mediated decay (NMD) of this species. In fact, although CHUK exon 8 contains no in-frame nonsense codons, the introduction of any 8-mer at the BamHI site used produces a frame shift and the generation of several in-frame stop codons, including one that is 60 nt upstream of the 3′ end of the exon. This position is outside the region of immunity from NMD, which is usually estimated as 50–55 nt from the 3′ end of the penultimate exon (Maquat 2004; Neu-Yilik et al. 2004). Nevertheless, NMD does not appear to be operating in this system, as 12 of the insertions (two PESSs, four mutated versions, and six arbitrary sequences) caused little or no increase in the proportion of skipped species despite the generation of the same nonsense codons. Exceptions to NMD have been previously noted (Enssle et al. 1993; Danckwardt et al. 2002; Maquat 2004; Neu-Yilik et al. 2004, and references therein). Moreover, we previously found that a nonsense mutation that caused NMD when expressed from the endogenous dhfr gene (used as the host minigene here) did not exhibit this phenotype upon transfection (Urlaub et al. 1989).

As a further test of an NMD effect, we repeated the PESS assay using a different target exon in our test minigene, the 108-nt exon 13 of the human Thbs4 (thrombospondin 4) gene. In this case the original central exon contains an in-frame nonsense codon at a position 11 bases from the 3′ end of the exon, from which it is not expected to elicit NMD. Now the frame shift induced by the insertion of an 8-mer removes this nonsense codon. We tested eight of the PESSs that exhibited strong silencer activity in the CHUK exon 8. As can be seen in Figure 4D, each of the eight PESSs again induced silencing, increasing exon skipping fourfold, from 19% to an average of 73%. We conclude that our results reflect splicing differences and not NMD. In addition to addressing the NMD issue, this experiment shows that these eight PESSs work similarly in two completely different exons and at two different relative locations (22 bases downstream of the 5′ end in CHUK exon 8, and 16 bases upstream of 3′ end of the exon in Thbs4 exon 13).

Insertion of 8-mers into the test exon can create PESS and PESE elements in the new joint sequences. In particular, the BamHI site used for the insertion lies at one end of a resident PESE in CHUK exon 8 (AAGGAUCC), and the insertions usually created an overlapping PESE at this location, extended by one base. In three cases, the
pseudo exons did not sharply drop compared with the pseudo exon’s flanks. PESS frequency was, in fact, as much as 20% higher in both the pseudo exon body and flanks compared with deep introns (Fig. 5, right, gray line), suggesting a possible role of PESSs in silencing false splice sites. It should be noted that the pseudo exons analyzed here did not overlap the pseudo exon set used for prediction. It is interesting to compare the frequencies of these putative regulatory sequences with that predicted by chance, using the average probability of all possible 8-mers (1/65,536 = 0.0000153). The latter frequency is shown by the dashed horizontal line in Figure 5. The average PESE 8-mer frequency in introns is close to that value, but the PESS frequency in introns is several times greater, again supporting the idea (Fairbrother and Chasin 2000) that intronic sequences have evolved to create a generally inhospitable environment for splicing. Looking at the total number of PESEs and PESSs, the average real 140-nt internal exon would contain 10.5 PESEs and 1.9 PESSs, whereas the typical pseudo exon counterpart would contain 4.4 PESEs and 6.9 PESSs. These average PESE/PESS ratios differ by a factor of 8.6; splicing decisions may be made on the basis of these ratios.

Numerous mutagenesis studies have shown that ESS sequences are often juxtaposed with ESE sequences in
alternatively spliced exons. Thus, one might expect to see a significantly higher level of PESS elements in alternatively spliced exons compared with constitutive exons. We analyzed a data set of 281 alternatively spliced exons and found that they contained 8.0 PESEs and 2.2 PESSs per 140 nt, for a PESE/PESS ratio of 3.6 compared with 5.5 for constitutive exons (and 0.64 for pseudo exons). This lower PESE/PESS ratio may play a role in allowing alternatively spliced exons to be skipped.

Discussion

It is interesting to compare the 8-mer PESEs found here with the 6-mer PESEs found by Fairbrother et al. (2002), using a different computational strategy. One of their two criteria was to compare exons with exons: those with strong splice sites to those with weak splice sites. In this way they also avoided the isolation of sequences on the basis of protein coding potential. Over 80% of their 237 hexamers can be found in our PESE collection, with 1351 hits. In contrast, 10 random sets of 237 hexamers produced an average of only 308 hits. The overlap between these two PESE sets, isolated by different criteria, supports the validity of each set.

Several laboratories have used iterative selections starting with random oligomers to define sequences that bind splicing factors or that promote splicing in vitro or in vivo (Tacke and Manley 1995; Tian and Kole 1995; Coulter et al. 1997; Liu et al. 1998, 2000; Schaal and Maniatis 1999b). The full PESE 8-mers found here overlap with ∼40% of these ESE sequences; the overlap after scrambling the ESE sequences was only one-third of this value. That the overlap is not 100% may be related to the fact that the CpG content of these ESEs is unusually high in every case cited above; the average is more than 10% of all dinucleotides. In contrast, the CpG content of our PESEs is 4.2%, closer to the value for exons in general (2.8%). We may have selected against such CpG-rich sequences because we demanded that our PESEs stand out compared with 5′ UTRs, which are themselves rich in CpG (Davuluri et al. 2001). In addition, all but one of the studies cited above assayed the splicing of terminal exons, whereas only internal exons were considered in our experiment.

To gauge the physiological relevance of the sequences identified here, we surveyed exonic mutations that result in a splicing deficiency, but that are not located within the consensus splice site sequence. If the PESEs and PESSs found here are important governors of splicing in vivo, then we would expect many of these mutations to have either destroyed a PESE or created a PESS. Of 58 splicing mutations examined (33 in the hprt gene, see Supplementary Table S1), more than half (55%) fulfilled this expectation: 19 represent disruptions in PESEs, and 16 created PESSs. In contrast, only 11% of missense mutations with no splicing phenotype affected PESS or PESE sequences (see Supplementary Table S2). It is interesting to note that the creation of PESSs was nearly as common as the disruption of PESEs, as predicted by Kashima and Manley (2003).

The threshold values we used to define a PESS or PESE were necessarily arbitrary at this stage. We can use the mutational data described above to direct us to a less conservative definition: If the threshold is reduced to include sequences with z-score indices down to 2.0 (p < 0.03), rather than 2.88 (p < 0.002), for each dimension, then we capture 81% of the splicing mutations referred to above as either disrupting a PESE or creating a PESS (compared with 20% for missense mutations; for details see Supplementary Table S3). This relaxation generates 4984 PESEs and 3579 PESSs with predicted false discovery rates of 5% and 7%, respectively. This new total of over 8500 putative regulatory elements represents one in eight of all possible 8-mers, leading to the conclusion that sequences that can regulate splicing are highly abundant. Further support of this idea lies in the effect of the single base mutations in PESSs shown in Figure 4. Although in every case the mutation decreased the silencing phenotype, substantial silencing activity
remained in most cases. Of nine informative cases, eight mutant 8-mers still increased skipping at least 2.5-fold (Fig. 4A); most of these 8-mers had silencing indices between 0 and 2 (compared with the threshold of 2.88 for both indices to be predicted as a PESS). A similar situation was obtained considering PESEs. Here again, four of eight mutant sequences still enhanced splicing by more than fourfold, and the mutant 8-mers exhibited low statistical indices (Fig. 4B). Thus, although our combined statistical index performed well in predicting PESSs and PESEs, it is apparently not a reliable predictor of the lack of PESS or PESE activity.

Previous experiments from other laboratories also point to a plethora of PESEs. From the computational study of Fairbrother et al. (2002), 6% of all hexamers are predicted to have ESE activity, or approximately eight hexamers per exon of average size 140 nt. ESEs selected from random sequences and shown to enhance splicing in response to specific SR proteins are highly degenerate, with the same 5–8-nt consensus-defining region rarely being found twice (Liu et al. 1998, 2000). The combined prevalence of these sequences in exons is at least four per 140 nt, which can be extrapolated to at least eight if the full complement of SR proteins is considered. Both of these frequencies are similar to the average of 10.5 found for the PESEs defined here. Because all three sets do not overlap completely, the combined total number of PESEs per exon must be even larger. Even allowing for clustering and overlap of individual PESEs, the picture that emerges is one in which more than half of an exon is made up of ESEs and in which much of the remainder is composed of ESSs. It follows that general RNP structure (e.g., an H-complex) may reflect more information than has been hitherto acknowledged.

Compared with ESEs, there have been far fewer systematic searches for ESSs. In our own previous work we found that 7 of 19 sequences (∼100-mers) randomly chosen from the human genome inhibited splicing when inserted into an exon (Fairbrother and Chasin 2000). Reexamination of these seven inhibitory sequences showed that they contained at least one PESS, with the average content being 3.80 (over twice that expected by chance), whereas in 12 noninhibitory sequences, the average content was 1.25. That more than 90% of the tested PESSs inhibited splicing indicates that the great majority of the PESS sequences we have identified by computation could be playing a physiological role.

The large difference (8.6-fold) in PESE/PESS ratio between real exons and pseudo exons raises the question of whether this metric can be used to distinguish these two classes. Unfortunately, the distribution of ratio values is quite wide, such that if a threshold ratio value is chosen to capture 80% of real exons, it will also capture 20% of pseudo exons (this being the optimum condition for combined sensitivity and specificity). Thus, the final distinction of these two classes awaits further work. In the meantime, this ratio may be a useful adjunct to other criteria (Zhang et al. 2003).

Our lists of PESE and PESS 8-mers undoubtedly contain some ineffective sequences and omit other effective sequences. Nevertheless, we believe it is the most comprehensive list of splicing regulatory sequences yet collected and should serve as a basis for examining the biochemical mechanisms governing accurate splice site recognition.

Materials and methods

Additional details can be found in Supplemental Material.

Data sets

A nonredundant human transcript dataset was downloaded from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot in September 2003. These transcript sequences were aligned to human genomic sequences obtained from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens using the Spidey program (http://www.ncbi.nlm.nih.gov/spidey/spideysource.html). We required that a valid alignment have more than 98% identity and more than 95% mRNA coverage, and that all identified exons be flanked by canonical splice sites. Under these restrictions, 16,930 transcripts yielded alignments, from which 166,538 exons and 149,608 introns were identified. Based on these sequences, three subsets were created as follows:

1) Noncoding internal exons (NC). By comparing full-length genes with annotated mRNA sequences, 2495 NCs (∼1.5% of all exons) were extracted from 5′ UTRs. We discarded any exon if a single base substitution or a single base addition or deletion could generate an open reading frame, reasoning that these could be misannotated coding exons containing single sequencing errors. The majority of exons were eliminated by this filter. We eliminated exons that are mostly skipped in splicing, as deduced from the human dbEST database (ftp://ftp.ncbi.nih.gov/blast/db/est_human.tar.gz). We discarded exons that have <70% inclusion. After these two filters, 502 noncoding exons remained for analysis. A list of the noncoding exons used can be found at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/noncode.txt.

2) Pseudo exons adjacent to the noncoding exons (PE). We applied the same criteria as in our previous study (Zhang et al. 2003) to extract 2876 PEs: intronic sequences 50–250 nt long that are flanked by sequences resembling splice sites (acceptor consensus values of at least 75 and donor consensus values of at least 78). To further ensure that the pseudo exon set resembled the noncoding exon set in general base composition (e.g., from the same set of isochores), we collected pseudo exons from the introns adjacent to the 502 noncoding exons. After removal of any pseudo exons that were present as ESTs, we were left with 2309 pseudo exons. A list of the pseudo exons used for comparison can be found at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/pseudos5.doc.

3) 5′ UTRs of intronless genes (IL). Among the 16,930 full-length genes, we extracted 1220 intronless genes and parsed their 5′ UTRs according to the annotation. We then applied the same criteria that we did for NCs to eliminate possible coding-exon contamination. The number of 5′ UTRs of intronless genes that made up this dataset was 864. A list of the intronless gene 5′ UTRs used can be found at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/ilgenes.doc.

Calculations of scoring indices EP, EI, SP, and SI

EP represents the extent to which a given 8-mer is found in the noncoding exons as opposed to pseudo exons; this z-score was calculated as in Fairbrother et al. (2002). When this index is <0,
the absolute value is taken as the silencer scoring index, SP. Similarly, EI and SI represent the scoring indices for noncoding exons compared with the 5′ UTRs of intronless genes. An index of 2.88 corresponds with p < 0.002, and an index of 2 corresponds with p < 0.03. The random chance for an 8-mer to pass both criteria at 0.002 is 10⁻⁴. A more detailed description of the calculations is provided in Supplemental Material. A list of all 8-mers and their corresponding z-scores is available as a 1.7 MB text file at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/octamers.txt.

Clustering and sampling putative ESS/ESEs

We clustered the 974 PESSs and 2069 PESEs using a hierarchical clustering algorithm (Fairbrother et al. 2002). Using a dissimilarity cutoff of 3.2 in the dendrogram yielded 69 PESS clusters and 80 PESE clusters (see Supplemental Material).

Statistical analysis of the PESS/PESEs in coding exons

From among more than 120,000 internal coding exons, we chose to look at those 50–250 nt long and flanked by at least 100 nt of intron sequence on both sides. We extracted 100 nt of sequence from each of these 78,000 exons: 25 nt from each end and 50 from the center. If an exon was shorter than 100 nt, we only considered the two ends. For a composite intron, we collected all the corresponding introns that were at least 100 nt long. We also divided these into three parts: a 100-nt 5′ end, a 100-nt region at the center, and a 100-nt 3′ end. If an intron was shorter than 300 nt, we only considered its ends. We calculated the average frequency of all PESSs or PESEs at each position of these uniform exons and introns. Pseudo exons overlapping highly repeated sequences were excluded.

Constructs

A complete hamster dhfr minigene (pDCH1P12) was first constructed that contained exon 1, intron 1 (304 bp), exons 2 and 3 merged, an abbreviated intron 3 (900 bp), and exons 4–6 merged. This minigene was driven by the dhfr promoter and was terminated by the first dhfr polyA site. Exons 2 and 3 were then replaced with a unique NotI site to form pDCH1P12D. In the course of other studies, we have tested the splicing of several foreign exons inserted into this NotI site. When inserted into this site as a polymerase chain reaction product, the exon 8 of the human CHUK gene (Mock et al. 1995) is predominantly included when cloned with its flanking intron sequences (PDCHUK8F, 47 and 67 nt beyond the 3′ and 5′ splice sites, respectively) but is mainly skipped when cloned without these flanking sequences (pDCHUK8), making it a sensitive indicator for enhancement and for silencing. In the same way we constructed a minigene with exon 13 of the human thrombospondin4 gene inserted into the NotI site of pDCH1P12D without its flanks (pDTBSN413). The transcript of this minigene is spliced efficiently without its flanks.

We inserted PESS and PESE candidates into a unique BamHI site 22 nt downstream from the start of CHUK exon 8. We synthesized the two strands of the 8-mer sequence flanked by cohesive ends compatible with a BamHI site on each side. To facilitate future manipulations, the BamHI site was reconstructed on the upstream side of the insert and disrupted on the downstream side. For ligation of the annealed strands, we incubated 3 µL of double-strand insertions (0.6 µg) with 1 µL of BamHI-cut vectors (∼0.1 µg, without CIP treatment) in a 20-µL reaction at 16°C for 1–2 h; a 5-µL portion was used to transform DH5α competent cells. Recombinant plasmids were verified by sequencing. Tandem arrays of PS9 were constructed using synthetic oligomers that provided no space between repeats.

Splicing

Single representatives from eight clusters and two representatives from each of two additional clusters of PESSs were chosen for testing silencing. For testing PESEs, we focused on novel signals, choosing six not found by ESEfinder (Cartegni et al. 2003) or among the RESCUE 6-mers (Fairbrother et al. 2002). Two additional PESEs were chosen because they resemble known enhancers (see Results). Human 293 cells were transfected in 35-mm wells by the plasmids using Lipofectamine 2000 (Invitrogen) according to the manufacturer. After 24 h, total RNA was isolated using RNAwiz (Ambion), treated with DNase I, and subjected to RT–PCR labeling with α-32P-dATP (Chen and Chasin 1993) under the following conditions: template, 3 µL RT product; forward primer, CGCCAAACUUGGGGGAAGCA; reverse primer, CGGAACUGCCUCCAACUAUC; initial denaturation, 93°C for 5 min; denaturation, 93°C, 30 sec; annealing, 61°C, 30 sec; extension, 72°C, 1 min; 28 cycles; final extension, 72°C, 7 min. Results were quantified with a PhosphorImager.

Mutation analysis

Mutations in the hprt gene were collected from O’Neill et al. (1998) and Tu et al. (2000); mutations in other genes were taken from those collected by Cartegni et al. (2002). These mutations are listed in the Supplemental Material. A single point mutation always changes a set of eight overlapping 8-mers to a new set of eight sequences. If there were one or more putative enhancers in the original set but fewer or no enhancers in the new set, then this change was designated as an enhancer-disruption (ED) event. Conversely, if in the mutant set there were one or more putative silencers but none or fewer in the original set, then this change was designated as a silencer-creation (SC) event. Tabulated results can be found in Supplementary Tables S1 and S2.

Acknowledgments

We thank Will Fairbrother for providing an electronic version of the RESCUE-ESE hexamer sequences, Adrian Krainer for providing the list of sequences underlying the ESEfinder program, Harmen Bussemaker for a critical reading of the manuscript and helpful suggestions, Hongfei Zhang for help with the statistical analysis, and three anonymous reviewers for helpful criticisms. X.H-F. Zhang is a Columbia University Predoctoral Faculty Fellow. L.A.C. was supported by funds from Columbia University. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

References

Berget, S.M. 1995. Exon recognition in vertebrate splicing. J Biol Chem. 270: 2411–2414.
Black, D.L. 2003. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 72: 291–336.
Blencowe, B.J. 2000. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci. 25: 106–110.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 268: 78–94.
Burge, C.B., Tuschl, T., and Sharp, P.A. 1999. Splicing of precursors to mRNAs by the spliceosomes. In The RNA world, 2nd ed. (ed. R.F. Gesteland, T.R. Cech, and J.F. Atkins), pp. 525–560. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Cartegni, L., Chew, S.L., and Krainer, A.R. 2002. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 3: 285–298.
Cartegni, L., Wang, J., Zhu, Z., Zhang, M.Q., and Krainer, A.R. 2003. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res. 31: 3568–3571.
Chen, I.T. and Chasin, L.A. 1993. Direct selection for mutations affecting specific splice sites in a hamster dihydrofolate re-
tion motifs and composite pre-mRNA exonic elements. Mol Cell Biol. 19: 1853–1863.
Mock, B.A., Connelly, M.A., McBride, O.W., Kozak, C.A., and Marcu, K.B. 1995. CHUK, a conserved helix-loop-helix ubiquitous kinase, maps to human chromosome 10 and mouse chromosome 19. Genomics. 27: 348–351.
Neu-Yilik, G., Gehring, N.H., Hentze, M.W., and Kulozik, A.E. 2004. Nonsense-mediated mRNA decay: from vacuum cleaner to Swiss army knife. Genome Biol. 5: 218.
O’Neill, J.P., Rogan, P.K., Cariello, N., and Nicklas, J.A. 1998. Mutations that alter RNA splicing of the human HPRT gene: a review of the spectrum. Mutat Res. 411: 179–214.
Robberson, B.L., Cote, G.J., and Berget, S.M. 1990. Exon definition may facilitate splice site selection in RNAs with mul-
ductase minigene. Mol Cell Biol. 13: 289–300. tiple exons. Mol Cell Biol. 10: 84–94.
———. 1994. Large exon size does not limit splicing in vivo. Schaal, T.D. and Maniatis, T. 1999a. Multiple distinct splicing
Mol Cell Biol. 14: 2140–2146. enhancers in the protein-coding sequences of a constitu-
Coulter, L.R., Landree, M.A., and Cooper, T.A. 1997. Identifi- tively spliced pre-mRNA. Mol Cell Biol. 19: 261–273.
cation of a new class of exonic splicing enhancers by in vivo ———. 1999b. Selection and characterization of pre-mRNA
selection. Mol Cell Biol. 17: 2143–2150. splicing enhancers: Identification of novel SR protein-spe-
Danckwardt, S., Neu-Yilik, G., Thermann, R., Frede, U., cific enhancer sequences. Mol Cell Biol. 19: 1705–1719.
Hentze, M.W., and Kulozik, A.E. 2002. Abnormally spliced Senapathy, P., Shapiro, M.B., and Harris, N.L. 1990. Splice junc-
beta-globin mRNAs: a single point mutation generates tran- tions, branch point sites, and exons: sequence statistics,
scripts sensitive and insensitive to nonsense-mediated identification, and applications to genome project. Methods
mRNA decay. Blood. 99: 1811–1816. Enzymol. 183: 252–278.
Davuluri, R.V., Grosse, I., and Zhang, M.Q. 2001. Computa- Sironi, M., Menozzi, G., Riva, L., Cagliani, R., Comi, G.P.,
tional identification of promoters and first exons in the hu- Bresolin, N., Giorda, R., and Pozzoli, U. 2004. Silencer ele-
man genome. Nat Genet. 29: 412–417. ments as possible inhibitors of pseudoexon splicing. Nucleic
Del Gatto-Konczak, F., Olive, M., Gesnel, M.C., and Acids Res. 32: 1783–1791.
Breathnach, R. 1999. hnRNP A1 recruited to an exon in vivo Storey, J.D. and Tibshirani, R. 2003. Statistical significance for
can function as an exon splicing silencer. Mol Cell Biol. genomewide studies. Proc Natl Acad Sci. 100: 9440–9445.
19: 251–260. Sun, H. and Chasin, L.A. 2000. Multiple splicing defects in an
Enssle, J., Kugler, W., Hentze, M.W., and Kulozik, A.E. 1993. intronic false exon. Mol Cell Biol. 20: 6414–6425.
Determination of mRNA fate by different RNA polymerase Tacke, R. and Manley, J.L. 1995. The human splicing factors
II promoters. Proc Natl Acad Sci. 90: 10091–10095. ASF/SF2 and SC35 possess distinct, functionally significant
Fairbrother, W.G. and Chasin, L.A. 2000. Human genomic se- RNA binding specificities. EMBO J. 14: 3540–3551.
quences that inhibit splicing. Mol Cell Biol. 20: 6816–6825. Tian, H. and Kole, R. 1995. Selection of novel exon recognition
Fairbrother, W.G., Yeh, R.F., Sharp, P.A., and Burge, C.B. 2002. elements from a pool of random sequences. Mol Cell Biol.
Predictive identification of exonic splicing enhancers in hu- 15: 6291–6298.
man genes. Science. 297: 1007–1013. Tu, M., Tong, W., Perkins, R., and Valentine, C.R. 2000. Pre-
Fedorov, A., Saxonov, S., Fedorova, L., and Daizadeh, I. 2001. dicted changes in pre-mRNA secondary structure vary in
Comparison of intron-containing and intron-lacking human their association with exon skipping for mutations in exons
genes elucidates putative exonic splicing enhancers. Nucleic 2, 4, and 8 of the Hprt gene and exon 51 of the fibrillin gene.
Acids Res. 29: 1464–1469. Mutat Res. 432: 15–32.
Hui, J., Stangl, K., Lane, W.S., and Bindereif, A. 2003. HnRNP L Urlaub, G., Mitchell, P.J., Ciudad, C.J., and Chasin, L.A. 1989.
stimulates splicing of the eNOS gene by binding to variable- Nonsense mutations in the dihydrofolate reductase gene af-
length CA repeats. Nat Struct Biol. 10: 33–37. fect RNA processing. Mol Cell Biol. 9: 2868–2880.
Kashima, T. and Manley, J.L. 2003. A negative element in Wagner, E.J. and Garcia-Blanco, M.A. 2001. Polypyrimidine
SMN2 exon 7 inhibits splicing in spinal muscular atrophy. tract binding protein antagonizes exon definition. Mol Cell
Nat Genet. 34: 460–463. Biol. 21: 3281–3288.
Ladd, A.N., and Cooper, T.A. 2002. Finding signals that regulate Zhang, M.Q. 1998. Statistical features of human exons and their
alternative splicing in the post-genomic era. Genome Biol. flanking regions. Hum Mol Genet. 7: 919–932.
3: reviews0008. Zhang, X.H., Heller, K.A., Hefter, I., Leslie, C.S., and Chasin,
Liu, H.X., Zhang, M., and Krainer, A.R. 1998. Identification of L.A. 2003. Sequence information for the splicing of human
functional exonic splicing enhancer motifs recognized by in- pre-mRNA identified by support vector machine classifica-
dividual SR proteins. Genes Dev. 12: 1998–2012. tion. Genome Res. 13: 2637–2650.
Liu, H.X., Chew, S.L., Cartegni, L., Zhang, M.Q., and Krainer,
A.R. 2000. Exonic splicing enhancer motif recognized by hu-
man SC35 under splicing conditions. Mol Cell Biol. 20: 1063–
1071.
Maquat, L.E. 2004. Nonsense-mediated mRNA decay: splicing,
translation and mRNP dynamics. Nat Rev Mol Cell Biol.
5: 89–99.
Mayeda, A., Screaton, G.R., Chandler, S.D., Fu, X.D., and
Krainer, A.R. 1999. Substrate specificities of SR proteins in
constitutive splicing are determined by their RNA recogni-