Computational definition of sequence motifs governing constitutive exon splicing
Xiang H-F. Zhang and Lawrence A. Chasin1
Department of Biological Sciences, MC2433, Columbia University, New York, New York 10027, USA

We have searched for sequence motifs that contribute to the recognition of human pre-mRNA splice sites by
comparing the frequency of 8-mers in internal noncoding exons versus unspliced pseudo exons and 5′
untranslated regions (UTRs) of transcripts of intronless genes. This type of
comparison avoids the isolation of sequences that are distinguished by their protein-coding information. We
classified sequence families comprising 2069 putative exonic enhancers and 974 putative exonic silencers.
Representatives of each class functioned as enhancers or silencers when inserted into a test exon and assayed
in transfected mammalian cells. As a class, the enhancer sequences were more prevalent and the silencer
elements less prevalent in all exons compared with introns. A survey of 58 reported exonic splicing mutations
showed good agreement between the splicing phenotype and the effect of the mutation on the motifs defined
here. The large number of effective sequences implied by these results suggests that sequences that influence
splicing may be very abundant in pre-mRNA.
[Keywords: splicing; pre-mRNA; motifs; exon; enhancers; silencers]
Supplemental material is available at http://www.genesdev.org.
Received February 17, 2004; revised version accepted April 9, 2004.

1 Corresponding author.
E-MAIL [email protected]; FAX (212) 865-8246.
Article published online ahead of print. Article and publication date are at http://www.genesdev.org/cgi/doi/10.1101/gad.1195304.

A fundamental step in the transfer of genetic information from DNA to protein is the splicing of RNA transcripts. In this process, relatively small exons (∼100 nt) are selected from among generally much larger introns (thousands of nucleotides) and are joined to form mature mRNA. Pre-mRNA splicing is accomplished by two sequential transesterification reactions catalyzed by a very large ribonucleoprotein complex known as the spliceosome (Burge et al. 1999). A spliceosome is either recruited or assembled at the correct 5′ splice site (donor) and 3′ splice site (acceptor) in part through recognition of conserved sequences spanning the intron–exon junctions (Burge et al. 1999). However, the sequence conservation at the splice sites is incomplete, such that there are many false sites that match the consensus sequence as well as or better than the true sites (Senapathy et al. 1990). For most long transcripts, it is clear that the exon represents the unit of initial recognition (Robberson et al. 1990; Berget 1995). However, incorporation of a requirement for a pair of splice sites defining an exon of limited length does not alleviate the problem. Pseudo exons, so defined, outnumber real exons by an order of magnitude (Sun and Chasin 2000). The additional information needed to distinguish true from false exons is thought to reside in splicing enhancer and splicing silencer sequence elements (Blencowe 2000; Wagner and Garcia-Blanco 2001; Cartegni et al. 2002; Ladd and Cooper 2002). Exonic splicing enhancers (ESEs) have been extensively studied in the context of alternative splicing (Black 2003). ESEs have also been implicated in at least some constitutive splicing (Mayeda et al. 1999; Schaal and Maniatis 1999a). Exonic or intronic splicing silencers have been less extensively studied (Wagner and Garcia-Blanco 2001; Ladd and Cooper 2002).

Many ESEs have been found by selecting RNA sequences from among large numbers of random oligomers whose insertion stimulates the use of a weak splice site (Liu et al. 1998, 2000; Schaal and Maniatis 1999b) or a poorly included chimeric exon (Tian and Kole 1995) in a cell-free splicing system or in transfected cells (Coulter et al. 1997). In many of these cases the ESEs have been shown to function in concert with specific splicing factor proteins. However, despite their protein specificity, these sequences are highly degenerate and their abundance in introns is ∼80% of their frequency in exons (Liu et al. 1998). As a result, they are not very effective in distinguishing true exons from pseudo exons (our unpublished observations).

Computational approaches have also been used to find ESEs. There is a sharp transition in sequence composition between introns and exons because of the fact that

GENES & DEVELOPMENT 18:1241–1250 © 2004 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/04; www.genesdev.org 1241
Zhang and Chasin

most exons code for protein, whereas introns do not. Most exons can be readily distinguished by this difference (Zhang 1998; Zhang et al. 2003), and gene finding programs that exploit it can be highly accurate (Burge and Karlin 1997). It is not clear what proportion, if any, of this salient information is used as an ESE in addition to protein coding. Some researchers have succeeded in getting around this problem by comparing two classes of exons, each of which codes for protein (Fedorov et al. 2001; Fairbrother et al. 2002). Fairbrother et al. (2002), reasoning that exons with weak splice sites were more likely to require ESEs for recognition than were exons with strong sites, identified exonic hexamers that were both more prevalent in the former and more prevalent than those found in the intronic flanks. When tested, these hexamers indeed promoted splicing. Alternatively spliced exons are usually associated with weak splice sites (Black 2003), so it is possible that their selection method was biased toward this class of exons.

Exonic splicing silencers (ESSs) are a second class of sequence elements that are known to regulate alternative splicing. The high proportion of human genomic sequences that can act to inhibit splicing when inserted into an exon suggests that ESSs may also play a role in splice site selection (Fairbrother and Chasin 2000). Sironi et al. (2004) searched for potential ESSs among hexamers that are underrepresented in exons compared with pseudo exons and exon flanks; one of three such hexamers tested had silencing activity.

We have circumvented the noise represented by exonic protein coding sequences by examining exons that do not code for proteins. We compared the frequencies of 8-mers in constitutively spliced noncoding exons with those in pseudo exons and the 5′ untranslated regions (UTRs) of intronless genes. Sequences overrepresented in the noncoding exons were designated as putative ESEs, and underrepresented sequences were designated as putative ESSs. Over 3000 8-mers were identified by these criteria. On testing, almost all of 20 sequences assayed conferred the predicted phenotype. The large number of effective oligomers implied by these results suggests that sequences that positively and negatively influence splicing may be very abundant in pre-mRNA.

Results

Computational strategy

We have used a computational approach to identify ESEs and ESSs associated with constitutively spliced exons. We focused on constitutive splicing as opposed to alternative splicing so as to tackle the more fundamental problem represented by the former and to avoid what are likely more complex mechanisms in the latter. The strategy we used to overcome the confounding presence of protein coding information was to restrict our search to non-protein-coding exons. In more than 40% of all human genes, protein synthesis is initiated in an exon other than the first (Davuluri et al. 2001). On examination of a database of unrelated human genes, 9% were found in which protein synthesis is initiated in exon 3 or higher. We focused on 502 internal non-protein-coding exons of these genes as examples of exons that should have a relatively high content of ESEs and a relatively low content of ESSs, but with no protein-coding information. We compared the sequence composition of these noncoding exons with that of two dissimilar types of sequences. The first was pseudo exons: intronic regions that have the appearance of exons in that they are bounded by sequences similar to acceptor and donor splice sites and are of typical exon size, but are not in fact spliced (Sun and Chasin 2000; Zhang et al. 2003). These sequences are expected to be low in ESEs and perhaps high in ESSs. We expected this comparison alone to also yield sequences that specify other kinds of information inherent in exons: sequences for nuclear transport, mRNA stability, and mRNA localization, for example. To circumvent this problem, we also compared the noncoding exons with a second class of sequences: the 5′ UTRs of intronless genes. These regions could contain the same nonsplicing information as the noncoding exons, but they should lack ESEs.

We compared the frequency of 8-mers (allowing one mismatch) in noncoding exons to these two counterparts. We chose 8-mers so as to include binding sites that could contain more information than the 5- or 6-mers that are commonly used in bioinformatics searches, and to facilitate the formation of stable synthetic heteroduplexes for later experimental testing. However, because the frequency of individual 8-mers was too low to gather sufficient data from our set of 502 noncoding exons, we allowed a single mismatch for each 8-mer considered. Pseudo exons were chosen from the introns adjacent to each noncoding exon in our database. We calculated z-scores as an index of the significance of a deviation between the frequency of each possible 8-mer in the noncoding exons versus the pseudo exons and in the noncoding exons versus the 5′ UTRs of intronless genes. This metric allows a determination of the statistical significance of a frequency difference between two populations without any knowledge of the underlying distribution. A p value can be assigned to any z-score (see Supplemental Material for the exact formula used). Because we allowed a single mismatch, the z-score for each 8-mer actually represents the average of 25 sequences, 24 of which differ from the nominal sequence by a single base substitution. The results are shown in Figure 1, in which the z-scores of noncoding exons versus pseudo exons are plotted on the abscissa, and the z-scores of noncoding exons versus 5′ UTRs of intronless genes are plotted on the ordinate. Although most of the 65,536 8-mers lie near the center of this two-dimensional scatter plot, many are either overrepresented or underrepresented in the noncoding exons. Choosing those 8-mers whose frequency difference corresponds to a p value of <0.002 for each comparison, we collected 2069 putative ESEs (PESEs; upper right quadrant of Fig. 1 beyond the dotted line) and 974 putative ESSs (PESSs, lower left quadrant beyond the dotted line). The threshold of p < 0.002 predicts a false discovery rate (Storey and Tibshirani 2003)



Sequence motifs for constitutive splicing

of 0.3% for PESEs and 0.7% for PESSs, on the basis of a bivariate normal distribution. These sequences were grouped into 69 PESS families and 80 PESE families by hierarchical clustering. The sequence logos for 10 PESS and 8 PESE clusters are shown in Figure 2, along with their information content and the size of the cluster.

Testing putative exonic splicing silencers

To determine whether these sequences could function as splicing regulatory elements, we tested several for their effect on the splicing of the central exon of a chimeric three-exon minigene. Two versions of the test minigene were used. The terminal exons of this construct were from the hamster dihydrofolate reductase (dhfr) gene separated by dhfr intron sequences. Exon 8 of the human conserved helix–loop–helix ubiquitous kinase (CHUK) gene was inserted into this minigene, either with or without ∼55 nt of flank beyond the splice site sequences. With the flanking sequences present, the central exon is included ∼90% of the time; without the flanking sequence, the central exon is skipped 90% of the time (Fig. 3A). This flanking sequence effect will be the subject of another report (X.H-F. Zhang, C. Leslie, L.A. Chasin, in prep.). The 108-nt CHUK exon 8 itself contains three clusters of PESEs and one cluster of PESSs (Fig. 3B). After insertion of various 8-mers at a position 22 nt from the 5′ end of this 108-nt exon, the effect on splicing was evaluated by transient transfection of human 293 cells and by quantifying the spliced transcripts that had either included or skipped the central exon.

We first tested PESS sequences, as there is little information concerning the role of silencing in constitutive splicing. Twelve PESSs were chosen to represent families of diverse sizes and sequences. For 11 of the 12 cases, the possible functional significance of the sequence was not considered. The exception, PS4, was chosen because it constituted a core binding site for hnRNP A1 (AUAGGGU), which can act as an ESS (Del Gatto-Konczak et al. 1999). As can be seen by the black bars in Figure 4A, 10 of 12 putative silencers significantly increased exon skipping when inserted into the exon embedded in its flanks; the average increase in skipping was sixfold over the control, such that the central exon was now skipped 50%–80% of the time instead of 10%. Interestingly, whereas PESS PS9 had almost no effect on splicing, insertion of two tandem 8-mers of this sequence was effective, and three 8-mers virtually abolished central exon splicing (Fig. 4C, columns 1–3). Because two new PESSs were created at the joints of these tandem repeats, it is possible that one of these secondary 8-mers (UUAACAAU, ACAAUUUA) was responsible for the heightened silencing. However, inasmuch as three copies were much more effective than two, the conclusion that two or more PESSs can act synergistically can still be drawn.

Next we tested the specificity of this effect by introducing a single base substitution (SBS) in each putative silencer sequence; the changes were designed to reduce the z-score index to as close to zero as possible. None of the changes created a PESE. The silencing effect was significantly reduced in each case (Fig. 4A, gray bars), and the average decrease was over threefold (range = 1.4–8.4). In one case in which the SBS was ineffective, we introduced a second SBS; the double change completely reversed the splicing inhibition (Fig. 4C, columns 4–6). We also tested six 8-mers with low silencer scores (ACCCUAUC, AUACAUAA, AACAAUAC, CAUUUCUA, CCAUGACC, and CCAUAUAC): None produced significant silencing (data not shown). Thus, the induction of exon skipping was not caused by the mere introduction of a foreign sequence. We have previously shown that the insertion of much larger sequences (>100 nt) does not usually compromise splicing (Chen and Chasin 1994; Fairbrother and Chasin 2000). Interestingly, one of the two PESSs that did not inhibit splicing as a single copy, PS12, constitutes a CACACACA repeat, a sequence that has been characterized as an intronic enhancer (Hui et al. 2003).

We considered the possibility that the decreased proportion of exon-included species affected by an insertion was actually the result of the introduction of an in-frame nonsense codon in the central exon, leading to nonsense-

Figure 1. A scatter plot showing the scores of all possible 65,536 8-mers with respect to their relative abundance in three sequence classes. The axis numbers represent z-scores. Z-scores on the X-axis are from a comparison of the relative abundance of each 8-mer in noncoding internal exons versus pseudo exons; this number is called an EP index when it is >0 (for enhancer compared with pseudo exons) and an SP index when it is <0 (for silencer compared with pseudo exons). The z-scores on the Y-axis are from a comparison of the relative abundance of each 8-mer in noncoding internal exons versus the 5′ UTR of intronless genes; this number is called an EI index when it is >0 (for enhancer compared with intronless genes) and an SI index when it is <0 (for silencer compared with intronless genes). In all further discussion, the silencer indices SP and SI are expressed as their absolute values. The dotted line marks a z-score of 2.88, chosen as a threshold for z-scores considered to be of significance. A z-score >2.88 has a probability of <0.002 of occurring by chance. Points lying beyond this threshold in both comparisons are black and represent the set of putative exonic splicing enhancers or silencers characterized further. If all 8-mers were distributed equally in all data sets, then the probability that a point will lie outside the dashed lines (i.e., in both dimensions) by chance is <10^−4.
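The scoring described in the Computational strategy section and the Figure 1 legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact z-score formula is given in the paper's Supplemental Material and is not reproduced here, so a pooled two-proportion z statistic is used as a stand-in, and one-mismatch counts are summed over the 25-sequence neighborhood. All function names are hypothetical.

```python
import math

BASES = "ACGU"  # RNA alphabet, matching the 8-mers quoted in the text


def neighborhood(kmer):
    """Return the k-mer plus every sequence that differs by one base substitution."""
    variants = [kmer]
    for i, base in enumerate(kmer):
        for alt in BASES:
            if alt != base:
                variants.append(kmer[:i] + alt + kmer[i + 1:])
    return variants


def count_kmers(seqs, k=8):
    """Count every overlapping k-mer; return (counts, total number of windows)."""
    counts, total = {}, 0
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            counts[word] = counts.get(word, 0) + 1
            total += 1
    return counts, total


def neighborhood_hits(counts, kmer):
    """Occurrences of the k-mer or any of its one-mismatch neighbors."""
    return sum(counts.get(v, 0) for v in neighborhood(kmer))


def z_score(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic (ASSUMED form, not the paper's exact formula)."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (hits_a / n_a - hits_b / n_b) / se


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(len(neighborhood("UGUAAUGU")))  # 1 nominal + 8 positions x 3 substitutions = 25
print(one_tailed_p(2.88))             # just under 0.002, consistent with the stated threshold
```

Under this reading, an 8-mer would be called a PESE when both z-scores (versus pseudo exons and versus 5′ UTRs of intronless genes) exceed 2.88, and a PESS when both fall below −2.88.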


Figure 2. Examples of putative exonic splicing silencer (PESS; left) and putative exonic splicing enhancer (PESE; right) sequence
families. The 974 PESS and 2069 PESE 8-mer sequences were aligned and then clustered using ClustalW. Pictograms for 10 PESSs and
8 PESEs on the basis of the positional sequence scoring matrix underlying each cluster are shown. The number of sequences in the
cluster is shown in parentheses and the name of an exact exemplar used for testing is given, as is the information content in bits.
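The information content quoted for each cluster can be illustrated with the standard Shannon measure for an ungapped alignment (2 bits minus the column entropy, summed over positions). The sketch below assumes that definition, applies no small-sample correction, and may differ from the calculation actually used for Figure 2. The three input 8-mers are borrowed from the Figure 4C legend purely as sample input; they are not a cluster from the paper.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Information (bits) at one alignment column: log2(|alphabet|) minus Shannon entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = 0.0
    for base in alphabet:
        p = counts.get(base, 0) / n
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def information_content(aligned):
    """Total information (bits) over all columns of equal-length aligned sequences."""
    return sum(column_information(col) for col in zip(*aligned))


# Sample input only (hypothetical "family"): six invariant columns and two 2:1 columns.
family = ["UGUAAUGU", "UGUAAAGU", "UGGAAAGU"]
print(round(information_content(family), 2))  # 14.16
```

A family of identical 8-mers would score the maximum of 16 bits, and a column with all four bases equally represented contributes 0 bits.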

mediated decay (NMD) of this species. In fact, although CHUK exon 8 contains no in-frame nonsense codons, the introduction of any 8-mer at the BamHI site used produces a frame shift and the generation of several in-frame stop codons, including one that is 60 nt upstream of the 3′ end of the exon. This position is outside the region of immunity from NMD, which is usually estimated as 50–55 nt from the 3′ end of the penultimate exon (Maquat 2004; Neu-Yilik et al. 2004). Nevertheless, NMD does not appear to be operating in this system, as 12 of the insertions (two PESSs, four mutated versions, and six arbitrary sequences) caused little or no increase in the proportion of skipped species despite the generation of the same nonsense codons. Exceptions to NMD have been previously noted (Enssle et al. 1993; Danckwardt et al. 2002; Maquat 2004; Neu-Yilik et al. 2004, and references therein). Moreover, we previously found that a nonsense mutation that caused NMD when expressed from the endogenous dhfr gene (used as the host minigene here) did not exhibit this phenotype upon transfection (Urlaub et al. 1989).

As a further test of an NMD effect, we repeated the PESS assay using a different target exon in our test minigene, the 108-nt exon 13 of the human Thbs4 (thrombospondin 4) gene. In this case the original central exon contains an in-frame nonsense codon at a position 11 bases from the 3′ end of the exon, from which it is not expected to elicit NMD. Now the frame shift induced by the insertion of an 8-mer removes this nonsense codon. We tested eight of the PESSs that exhibited strong silencer activity in the CHUK exon 8. As can be seen in Figure 4D, each of the eight PESSs again induced silencing, increasing exon skipping fourfold, from 19% to an average of 73%. We conclude that our results reflect splicing differences and not NMD. In addition to addressing the NMD issue, this experiment shows that these eight PESSs work similarly in two completely different exons and at two different relative locations (22 bases downstream of the 5′ end in CHUK exon 8, and 16 bases upstream of the 3′ end of the exon in Thbs4 exon 13).

Insertion of 8-mers into the test exon can create PESS and PESE elements in the new joint sequences. In particular, the BamHI site used for the insertion lies at one end of a resident PESE in CHUK exon 8 (AAGGAUCC), and the insertions usually created an overlapping PESE at this location, extended by one base. In three cases, the


insertion of the test PESSs created additional overlapping PESSs; in these instances we cannot be sure which 8-mer is responsible for the silencing. However, because the choice of an 8-mer to represent a regulatory sequence was arbitrary, these overlapping sequences can be just as well viewed as a single, somewhat longer, element.

Figure 3. Minigenes used for testing effects on splicing. (A) Two versions of the exon 8 region of the human conserved helix–loop–helix ubiquitous kinase (CHUK) gene were inserted into a chimeric intron separating exon 1 and the combined exons 4–6 of the hamster dihydrofolate reductase (dhfr) gene. Large boxes depict exons, stubby gray boxes show flanking regions of the CHUK exon 8, and thin horizontal lines represent hamster intron sequence. In the upper figure, the CHUK exon 8 flanks are limited to the splice site consensus region from −14 upstream and to +7 downstream of the exon–intron junction; CHUK exon 8 is spliced poorly, as indicated. In the lower figure, additional CHUK exon 8 flanking sequences have been added from −62 upstream and to +75 downstream. The addition of this flanking sequence greatly improves exon inclusion, as shown. (B) Sequence of the 108-nt CHUK exon 8. PESE sequences are underlined and PESS sequences are double overlined. The BamHI site used for the insertion of tested 8-mers is in bold. (C) Sequence of the 108-nt thrombospondin 4 (Thbs4) exon 13, annotated as in B.

Testing putative splicing enhancers

Enhancers were tested next. Eight predicted PESEs were inserted into the central exon lacking its flanks. No PESSs were created at the joints by the insertion of these PESEs. As can be seen in Figure 4B (black bars), the eight PESEs increased inclusion of the central exon from the baseline of 11% to between 58% and 91% (average 6.4-fold). Once again, SBS mutations that reduced the z-score index to near zero decreased the enhancer effect significantly in seven of eight cases and dramatically (more than threefold) in four cases (Fig. 4B, gray bars). None of the mutations created a PESS. Two of the eight PESEs tested were chosen because they resembled known ESEs: PE1, the SC35 consensus sequence (Liu et al. 2000), and PE2, a purine-rich element (Schaal and Maniatis 1999b). The remaining six PESEs represented novel candidates in that they are not represented in the PESEs predicted by Fairbrother et al. (2002), nor are they found by ESEfinder (Cartegni et al. 2003). About half of our PESEs are novel by these criteria, implying that many such new ESEs remain to be discovered.

Global distribution of PESEs and PESSs

These PESEs and PESSs were predicted on the basis of relative abundance or scarcity, respectively, in noncoding exons compared with nonsplicing sequences. We asked whether these oligomers showed a similar differential distribution in protein-coding exons, the major class of exons. We collected data from 78,000 internal protein-coding exons of lengths >50 nt and calculated the frequencies of PESEs and PESSs occurring at each position, starting 100 nt upstream and ending 100 nt downstream of the exon–intron borders. Because the exons were of different lengths, we combined 25 nt from each end with 50 nt from the center to create a 100-nt version; if the exon was <100 nt, we just used the ends. For comparison, we repeated the same process for 20,580 repeat-free pseudo exons. We calculated the same frequencies for 148,000 regions of 100 nt located at the centers of introns.

The results for the real exons and introns are shown on the left in Figure 5. It can be seen that the frequency of PESSs (gray line) dropped dramatically at the transition between intronic flanks and the (composite) exon. Spikes are seen as expected at the very edge of the exons because of the conservation of the splice site consensus sequence. There is a peak of PESS frequency in the region of the upstream flank harboring the polypyrimidine tract, again as expected because runs of U are common in these PESSs. Less expected is a smaller peak just beyond the donor site consensus in the downstream flank: We speculate that this peak in negative signals may contribute to a locking in on the positive signal represented by the splice sites. The frequencies of PESEs (black line) behaved in exactly the opposite manner, rising precipitously within the composite exon. Within the exon the average frequency of PESEs was approximately constant; that is, the frequencies were not very different within the 25-nt ends and the 50-nt center. Beyond ∼50 nt from the exon, the frequency of both PESSs and PESEs reached a level characteristic of deep intron sequences. Sharp transitions in sequence composition between exons and introns have previously been noted (Burge and Karlin 1997; Zhang 1998). It should be remembered that the PESEs and PESSs analyzed here were chosen on the basis of noncoding exons; that is, the transitions seen here were not produced by selection for protein coding potential.

In contrast to the real exons, the pseudo exons lacked an elevated PESE content (Fig. 5, right, black line). Also in contrast to the real exons, the PESS frequency in


Figure 4. The effect of 8-mer insertions on splicing. (A) Testing PESSs for splicing
inhibition. The indicated 8-mer PESS se-
quences were inserted into a BamHI site at
position +22 in CHUK exon 8 using the
lower construct shown in Figure 3A. Plas-
mids were transfected into human 293
cells by lipofection, and RNA was ex-
tracted after 24 h and assayed for splicing
by RT–PCR using radioactive dATP as a
precursor. Band intensity was quantified
with a PhosphorImager; proportion
skipped indicates skipped band/(skipped
band + included band). The bands corre-
spond to the column below them and
show the results of one transfection ex-
periment; the graph shows the average of
two transfections, and the error bars indi-
cate the range. Black bars represent inser-
tion of the PESS shown at the top (S), gray
bars represent insertion of a single base
substitution mutant sequence also shown
at the top (M); the SP and SI scoring indi-
ces (defined in the legend for Fig. 1) of each
PESS and each mutant sequence are
shown at the bottom. (B) Testing PESEs
for splicing enhancement. The indicated
PESE 8-mers were inserted into CHUK
exon 8 using the upper construct shown in
Figure 3A. Splicing was assayed exactly as in A. (E)
PESE; (M) single base substitution mutant
sequence. Proportion included indicates
included band/(skipped band + included
band). Two transfections were carried out
for each construct; the error bars indicate
the range of the two measurements. (C)
The effect of insert sequence variations on
splicing silencing. (Left) Multiple copies of
a PESS can act synergistically to inhibit
splicing: one, two, and three copies of the
8-mer PS9 (see A) were inserted into
CHUK exon 8 and assayed for splicing as
described in A. (Right) A double base sub-
stitution is more effective than a single
base substitution in destroying silencing activity: The original PESS P5 and mutants harboring one or two single base substitutions
were inserted into CHUK exon 8 and assayed for splicing as in A. The 8-mers were UGUAAUGU, UGUAAAGU, and UGGAAAGU,
respectively; the SP indices were 4.92, 1.95, and 1.74, respectively; and the SIs were 3.14, 0.50, and −4.07, respectively. (D) Testing
PESSs for silencing in a second exon. A minigene analogous to that shown in Figure 3A was constructed using human thrombospondin
4 exon 13 as the central exon. Eight PESS sequences were inserted into a BamHI site at a position 16 nt upstream of the 3′ end of the
exon (Fig. 3C) and tested for silencing as described in A.
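The quantification stated in this legend (proportion skipped, averaged over two transfections with the range as error bar) and the 2.88 cutoff from the Figure 1 legend can be written out directly. In the sketch below the band intensities are hypothetical placeholders, whereas the SP and SI indices are the three pairs quoted in panel C. Function names are illustrative only.

```python
Z_THRESHOLD = 2.88  # significance cutoff given in the Figure 1 legend


def proportion_skipped(skipped_band, included_band):
    """skipped band / (skipped band + included band), as defined in panel A."""
    return skipped_band / (skipped_band + included_band)


def mean_and_range(values):
    """Average of replicate transfections plus the (min, max) range used for error bars."""
    return sum(values) / len(values), (min(values), max(values))


def passes_pess_threshold(sp, si, threshold=Z_THRESHOLD):
    """True when both silencer indices (as absolute values) exceed the cutoff."""
    return sp > threshold and si > threshold


# Hypothetical band intensities for two transfections of one construct.
replicates = [proportion_skipped(60, 40), proportion_skipped(70, 30)]
print(mean_and_range(replicates))

# SP and SI indices quoted in panel C for the original PESS and its two mutants.
panel_c = {
    "UGUAAUGU": (4.92, 3.14),   # original 8-mer
    "UGUAAAGU": (1.95, 0.50),   # single base substitution
    "UGGAAAGU": (1.74, -4.07),  # double base substitution
}
for seq, (sp, si) in panel_c.items():
    print(seq, passes_pess_threshold(sp, si))
```

Only the original 8-mer clears the threshold in both dimensions; both mutants fall below it.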

pseudo exons did not sharply drop compared with the pseudo exon’s flanks. PESS frequency was, in fact, as much as 20% higher in both the pseudo exon body and flanks compared with deep introns (Fig. 5, right, gray line), suggesting a possible role of PESSs in silencing false splice sites. It should be noted that the pseudo exons analyzed here did not overlap the pseudo exon set used for prediction. It is interesting to compare the frequencies of these putative regulatory sequences with that predicted by chance, using the average probability of all possible 8-mers (1/65,536 = 0.0000153). The latter frequency is shown by the dashed horizontal line in Figure 5. The average PESE 8-mer frequency in introns is close to that value, but the PESS frequency in introns is several times greater, again supporting the idea (Fairbrother and Chasin 2000) that intronic sequences have evolved to create a generally inhospitable environment for splicing. Looking at the total number of PESEs and PESSs, the average real 140-nt internal exon would contain 10.5 PESEs and 1.9 PESSs, whereas the typical pseudo exon counterpart would contain 4.4 PESEs and 6.9 PESSs. These average PESE/PESS ratios differ by a factor of 8.6; splicing decisions may be made on the basis of these ratios.

Numerous mutagenesis studies have shown that ESS sequences are often juxtaposed with ESE sequences in


Figure 5. Statistical analysis of PESSs and PESEs in coding exons and introns. The frequencies of each of the 974 PESSs and 2069
PESEs were determined for each position in
78,000 human coding exons (50–250 nt long)
and in 100 nt of their immediate flanks and
in 100 nt regions from the center of 148,000
introns. Numbers on the ordinate indicate
the average frequency of a PESS or PESE per
nucleotide position multiplied by 100,000.
The heavy gray curve represents the PESSs,
and the black curve the PESEs. Indications
below the curve: The box marked Real rep-
resents a composite exon standardized to 100
nt as described in the text and in Supplemental Material; their intronic flanks are indicated by heavy lines. The thin lines refer to intronic sequences of 100 nt extracted from the
center of each intron; this same central intron data is presented three times for easy reference. The box marked Pseudo shows the same
analysis performed on 20,580 pseudo exons drawn from repeat-free regions of introns; this set of pseudo exons did not overlap with
the pseudo exon set used to derive the z-scores in Figure 1. The broken horizontal line depicts the average frequency of any given 8-mer
in a random sequence (1/65,536).
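The composite exon and the per-position frequency plotted in Figure 5 can be sketched as follows. This is one reading of the description, not the authors' code: exons shorter than 100 nt are reduced to their two 25-nt ends, and the plotted value is taken to be the fraction of sequences carrying a motif at a position, divided by the number of motifs in the set and multiplied by 100,000. Function names and the toy input are hypothetical.

```python
def composite_exon(exon, end=25, center=50):
    """Standardize an exon: 25 nt from each end plus 50 nt from the center.

    Exons shorter than 100 nt keep only their two ends (assumed reading of
    "we just used the ends").
    """
    if len(exon) < 2 * end + center:
        return exon[:end] + exon[-end:]
    mid = (len(exon) - center) // 2
    return exon[:end] + exon[mid:mid + center] + exon[-end:]


def positional_frequency(seqs, motifs, k=8, scale=100_000):
    """Average frequency of a motif from the set starting at each position, times `scale`."""
    length = min(len(s) for s in seqs)
    profile = []
    for pos in range(length - k + 1):
        hits = sum(1 for s in seqs if s[pos:pos + k] in motifs)
        profile.append(scale * hits / (len(seqs) * len(motifs)))
    return profile


# Chance expectation for any single 8-mer on the same scale (the broken line in Figure 5).
CHANCE_LEVEL = 100_000 / 4 ** 8

print(len(composite_exon("A" * 140)))  # 100
print(round(CHANCE_LEVEL, 2))          # 1.53
```

Dividing by the size of the motif set is what makes the PESS curve (974 motifs) and the PESE curve (2069 motifs) comparable with the single-8-mer chance level.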

alternatively spliced exons. Thus, one might expect to see a significantly higher level of PESS elements in alternatively spliced exons compared with constitutive exons. We analyzed a data set of 281 alternatively spliced exons and found that they contained 8.0 PESEs and 2.2 PESSs per 140 nt, for a PESE/PESS ratio of 3.6 compared with 5.5 for constitutive exons (and 0.64 for pseudo exons). This lower PESE/PESS ratio may play a role in allowing alternatively spliced exons to be skipped.

Discussion

It is interesting to compare the 8-mer PESEs found here with the 6-mer PESEs found by Fairbrother et al. (2002), using a different computational strategy. One of their two criteria was to compare exons with exons: those with strong splice sites to those with weak splice sites. In this way they also avoided the isolation of sequences on the basis of protein coding potential. Over 80% of their 237 hexamers can be found in our PESE collection, with 1351 hits. In contrast, 10 random sets of 237 hexamers produced an average of only 308 hits. The overlap between these two PESE sets, isolated by different criteria, supports the validity of each set.

Several laboratories have used iterative selections starting with random oligomers to define sequences that bind splicing factors or that promote splicing in vitro or in vivo (Tacke and Manley 1995; Tian and Kole 1995; Coulter et al. 1997; Liu et al. 1998, 2000; Schaal and Maniatis 1999b). The full PESE 8-mers found here overlap with ∼40% of these ESE sequences; the overlap after scrambling the ESE sequences was only one-third of this value. That the overlap is not 100% may be related to the fact that the CpG content of these ESEs is unusually high in every case cited above; the average is more than 10% of all dinucleotides. In contrast, the CpG content of our PESEs is 4.2%, closer to the value for exons in general (2.8%). We may have selected against such CpG-rich sequences because we demanded that our PESEs stand out compared with 5′ UTRs, which are themselves rich in CpG (Davuluri et al. 2001). In addition, all but one of the studies cited above assayed the splicing of terminal exons, whereas only internal exons were considered in our experiment.

To gauge the physiological relevance of the sequences identified here, we surveyed exonic mutations that result in a splicing deficiency, but that are not located within the consensus splice site sequence. If the PESEs and PESSs found here are important governors of splicing in vivo, then we would expect many of these mutations to have either destroyed a PESE or created a PESS. Of 58 splicing mutations examined (33 in the hprt gene, see Supplementary Table S1), more than half (55%) fulfilled this expectation: 19 represent disruptions in PESEs, and 16 created PESSs. In contrast, only 11% of missense mutations with no splicing phenotype affected PESS or PESE sequences (see Supplementary Table S2). It is interesting to note that the creation of PESSs was nearly as common as the disruption of PESEs, as predicted by Kashima and Manley (2003).

The threshold values we used to define a PESS or PESE were necessarily arbitrary at this stage. We can use the mutational data described above to direct us to a less conservative definition: If the threshold is reduced to include sequences with z-score indices down to 2.0 (p < 0.03), rather than 2.88 (p < 0.002), for each dimension, then we capture 81% of the splicing mutations referred to above as either disrupting a PESE or creating a PESS (compared with 20% for missense mutations; for details see Supplementary Table S3). This relaxation generates 4984 PESEs and 3579 PESSs with predicted false discovery rates of 5% and 7%, respectively. This new total of over 8500 putative regulatory elements represents one in eight of all possible 8-mers, leading to the conclusion that sequences that can regulate splicing are highly abundant. Further support of this idea lies in the effect of the single base mutations in PESSs shown in Figure 4. Although in every case the mutation decreased the silencing phenotype, substantial silencing activity


remained in most cases. Of nine informative cases, eight mutant 8-mers still increased skipping at least 2.5-fold (Fig. 4A); most of these 8-mers had silencing indices between 0 and 2 (compared with the threshold of 2.88 for both indices to be predicted as a PESS). A similar situation was found for PESEs. Here again, four of eight mutant sequences still enhanced splicing by more than fourfold, and the mutant 8-mers exhibited low statistical indices (Fig. 4B). Thus, although our combined statistical index performed well in predicting PESSs and PESEs, it is apparently not a reliable predictor of the lack of PESS or PESE activity.

Previous experiments from other laboratories also point to a plethora of PESEs. From the computational study of Fairbrother et al. (2002), 6% of all hexamers are predicted to have ESE activity, or approximately eight hexamers per exon of average size 140 nt. ESEs selected from random sequences and shown to enhance splicing in response to specific SR proteins are highly degenerate, with the same 5–8-nt consensus-defining region rarely being found twice (Liu et al. 1998, 2000). The combined prevalence of these sequences in exons is at least four per 140 nt, which can be extrapolated to at least eight if the full complement of SR proteins is considered. Both of these frequencies are similar to the average of 10.5 found for the PESEs defined here. Because the three sets do not overlap completely, the combined total number of PESEs per exon must be even larger. Even allowing for clustering and overlap of individual PESEs, the picture that emerges is one in which more than half of an exon is made up of ESEs and in which much of the remainder is composed of ESSs. It follows that general RNP structure (e.g., an H-complex) may reflect more information than has been hitherto acknowledged.

Compared with ESEs, there have been far fewer systematic searches for ESSs. In our own previous work we found that 7 of 19 sequences (∼100-mers) randomly chosen from the human genome inhibited splicing when inserted into an exon (Fairbrother and Chasin 2000). Re-examination of these seven inhibitory sequences showed that each contained at least one PESS, with the average content being 3.80 (over twice that expected by chance), whereas in the 12 noninhibitory sequences, the average content was 1.25. That more than 90% of the tested PESSs inhibited splicing indicates that the great majority of the PESS sequences we have identified by computation could be playing a physiological role.

The large difference (8.6-fold) in PESE/PESS ratio between real exons and pseudo exons raises the question of whether this metric can be used to distinguish these two classes. Unfortunately, the distribution of ratio values is quite wide, such that if a threshold ratio value is chosen to capture 80% of real exons, it will also capture 20% of pseudo exons (this being the optimum condition for combined sensitivity and specificity). Thus, the final distinction of these two classes awaits further work. In the meantime, this ratio may be a useful adjunct to other criteria (Zhang et al. 2003).

Our lists of PESE and PESS 8-mers undoubtedly contain some ineffective sequences and omit other effective sequences. Nevertheless, we believe it is the most comprehensive list of splicing regulatory sequences yet collected and should serve as a basis for examining the biochemical mechanisms governing accurate splice site recognition.

Materials and methods

Additional details can be found in Supplemental Material.

Data sets

A nonredundant human transcript dataset was downloaded from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot in September 2003. These transcript sequences were aligned to human genomic sequences obtained from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens using the Spidey program (http://www.ncbi.nlm.nih.gov/spidey/spideysource.html). We required that a valid alignment have more than 98% identity and more than 95% mRNA coverage, and that all identified exons be flanked by canonical splice sites. Under these restrictions, 16,930 transcripts yielded alignments, from which 166,538 exons and 149,608 introns were identified. Based on these sequences, three subsets were created as follows:

1) Noncoding internal exons (NC). By comparing full-length genes with annotated mRNA sequences, 2495 NCs (∼1.5% of all exons) were extracted from 5′ UTRs. We discarded any exon if a single base substitution or a single base addition or deletion could generate an open reading frame, reasoning that these could be misannotated coding exons containing single sequencing errors. The majority of exons were eliminated by this filter. We also eliminated exons that are mostly skipped in splicing, as deduced from the human dbEST database (ftp://ftp.ncbi.nih.gov/blast/db/est_human.tar.gz), discarding exons with <70% inclusion. After these two filters, 502 noncoding exons remained for analysis. A list of the noncoding exons used can be found at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/noncode.txt.

2) Pseudo exons adjacent to the noncoding exons (PE). We applied the same criteria as in our previous study (Zhang et al. 2003) to extract 2876 PEs: intronic sequences 50–250 nt long that are flanked by sequences resembling splice sites (acceptor consensus values of at least 75 and donor consensus values of at least 78). To further ensure that the pseudo exon set resembled the noncoding exon set in general base composition (e.g., from the same set of isochores), we collected pseudo exons from the introns adjacent to the 502 noncoding exons. After removal of any pseudo exons that were present as ESTs, we were left with 2309 pseudo exons. A list of the pseudo exons used for comparison can be found at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/pseudos5.doc.

3) 5′ UTRs of intronless genes (IL). Among the 16,930 full-length genes, we extracted 1220 intronless genes and parsed their 5′ UTRs according to the annotation. We then applied the same criteria that we did for NCs to eliminate possible coding-exon contamination. The number of 5′ UTRs of intronless genes that made up this dataset was 864. A list of the intronless gene 5′ UTRs used can be found at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/ilgenes.doc.

Calculations of scoring indices EP, EI, SP, and SI

EP represents the extent to which a given 8-mer is found in the noncoding exons as opposed to pseudo exons; this z-score was calculated as in Fairbrother et al. (2002). When this index is <0,


the absolute value is taken as the silencer scoring index, SP. Similarly, EI and SI represent the scoring indices for noncoding exons compared with the 5′ UTRs of intronless genes. An index of 2.88 corresponds with p < 0.002, and an index of 2 corresponds with p < 0.03. The random chance for an 8-mer to pass both criteria at 0.002 is 10⁻⁴. A more detailed description of the calculations is provided in Supplemental Material. A list of all 8-mers and their corresponding z-scores is available as a 1.7 MB text file at http://www.columbia.edu/cu/biology/faculty/chasin/xz3/octamers.txt.

Clustering and sampling putative ESS/ESEs

We clustered the 974 PESSs and 2069 PESEs using a hierarchical clustering algorithm (Fairbrother et al. 2002). A dissimilarity cutoff of 3.2 in the dendrogram yielded 69 PESS clusters and 80 PESE clusters (see Supplemental Material).

Statistical analysis of the PESS/PESEs in coding exons

From among more than 120,000 internal coding exons, we chose to look at those 50–250 nt long and flanked by at least 100 nt of intron sequence on both sides. We extracted 100 nt of sequence from each of these 78,000 exons: 25 nt from each end and 50 from the center. If an exon was shorter than 100 nt, we only considered the two ends. For a composite intron, we collected all the corresponding introns that were at least 100 nt long. We also divided these into three parts: a 100-nt 5′ end, a 100-nt region at the center, and a 100-nt 3′ end. If an intron was shorter than 300 nt, we only considered its ends. We calculated the average frequency of all PESSs or PESEs at each position of these uniform exons and introns. Pseudo exons overlapping highly repeated sequences were excluded.

Constructs

A complete hamster dhfr minigene (pDCH1P12) was first constructed that contained exon 1, intron 1 (304 bp), exons 2 and 3 merged, an abbreviated intron 3 (900 bp), and exons 4–6 merged. This minigene was driven by the dhfr promoter and was terminated by the first dhfr polyA site. Exons 2 and 3 were then replaced with a unique NotI site to form pDCH1P12D. In the course of other studies, we have tested the splicing of several foreign exons inserted into this NotI site. When inserted into this site as a polymerase chain reaction product, exon 8 of the human CHUK gene (Mock et al. 1995) is predominantly included when cloned with its flanking intron sequences (pDCHUK8F, 47 and 67 nt beyond the 3′ and 5′ splice sites, respectively) but is mainly skipped when cloned without these flanking sequences (pDCHUK8), making it a sensitive indicator for enhancement and for silencing. In the same way we constructed a minigene with exon 13 of the human thrombospondin4 gene inserted into the NotI site of pDCH1P12D without its flanks (pDTBSN413). The transcript of this minigene is spliced efficiently without its flanks.

We inserted PESS and PESE candidates into a unique BamHI site 22 nt downstream from the start of CHUK exon 8. We synthesized the two strands of the 8-mer sequence flanked by cohesive ends compatible with a BamHI site on each side. To facilitate future manipulations, the BamHI site was reconstructed on the upstream side of the insert and disrupted on the downstream side. For ligation of the annealed strands, we incubated 3 µL of double-strand insertions (0.6 µg) with 1 µL of BamHI-cut vectors (∼0.1 µg, without CIP treatment) in a 20-µL reaction at 16°C for 1–2 h; a 5-µL portion was used to transform DH5α competent cells. Recombinant plasmids were verified by sequencing. Tandem arrays of PS9 were constructed using synthetic oligomers that provided no space between repeats.

Splicing

Single representatives from eight clusters and two representatives from each of two additional clusters of PESSs were chosen for testing silencing. For testing PESEs, we focused on novel signals, choosing six not found by ESEfinder (Cartegni et al. 2003) or among the RESCUE 6-mers (Fairbrother et al. 2002). Two additional PESEs were chosen because they resemble known enhancers (see Results). Human 293 cells were transfected in 35-mm wells with the plasmids using Lipofectamine 2000 (Invitrogen) according to the manufacturer's instructions. After 24 h, total RNA was isolated using RNAwiz (Ambion), treated with DNase I, and subjected to RT–PCR labeling with α-32P-dATP (Chen and Chasin 1993) under the following conditions: template, 3 µL RT product; forward primer, CGCCAAACUUGGGGGAAGCA; reverse primer, CGGAACUGCCUCCAACUAUC; initial denaturation, 93°C for 5 min; denaturation, 93°C, 30 sec; annealing, 61°C, 30 sec; extension, 72°C, 1 min; 28 cycles; final extension, 72°C, 7 min. Results were quantified with a PhosphorImager.

Mutation analysis

Mutations in the hprt gene were collected from O’Neill et al. (1998) and Tu et al. (2000); mutations in other genes were taken from those collected by Cartegni et al. (2002). These mutations are listed in the Supplemental Material. A single point mutation always changes a set of eight overlapping 8-mers to a new set of eight sequences. If there were one or more putative enhancers in the original set but fewer or no enhancers in the new set, then this change was designated as an enhancer-disruption (ED) event. Conversely, if in the mutant set there were one or more putative silencers but none or fewer in the original set, then this change was designated as a silencer-creation (SC) event. Tabulated results can be found in Supplementary Tables S1 and S2.

Acknowledgments

We thank Will Fairbrother for providing an electronic version of the RESCUE-ESE hexamer sequences, Adrian Krainer for providing the list of sequences underlying the ESEfinder program, Harmen Bussemaker for a critical reading of the manuscript and helpful suggestions, Hongfei Zhang for help with the statistical analysis, and three anonymous reviewers for helpful criticisms. X.H-F. Zhang is a Columbia University Predoctoral Faculty Fellow. L.A.C. was supported by funds from Columbia University.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

References

Berget, S.M. 1995. Exon recognition in vertebrate splicing. J Biol Chem. 270: 2411–2414.
Black, D.L. 2003. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 72: 291–336.
Blencowe, B.J. 2000. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci. 25: 106–110.
Burge, C. and Karlin, S. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 268: 78–94.
Burge, C.B., Tuschl, T., and Sharp, P.A. 1999. Splicing of precursors to mRNAs by the spliceosomes. In The RNA world, 2nd ed. (eds. R.F. Gesteland, T.R. Cech, and J.F. Atkins), pp. 525–560. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Cartegni, L., Chew, S.L., and Krainer, A.R. 2002. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 3: 285–298.
Cartegni, L., Wang, J., Zhu, Z., Zhang, M.Q., and Krainer, A.R. 2003. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res. 31: 3568–3571.
Chen, I.T. and Chasin, L.A. 1993. Direct selection for mutations affecting specific splice sites in a hamster dihydrofolate reductase minigene. Mol Cell Biol. 13: 289–300.
Chen, I.T. and Chasin, L.A. 1994. Large exon size does not limit splicing in vivo. Mol Cell Biol. 14: 2140–2146.
Coulter, L.R., Landree, M.A., and Cooper, T.A. 1997. Identification of a new class of exonic splicing enhancers by in vivo selection. Mol Cell Biol. 17: 2143–2150.
Danckwardt, S., Neu-Yilik, G., Thermann, R., Frede, U., Hentze, M.W., and Kulozik, A.E. 2002. Abnormally spliced beta-globin mRNAs: a single point mutation generates transcripts sensitive and insensitive to nonsense-mediated mRNA decay. Blood. 99: 1811–1816.
Davuluri, R.V., Grosse, I., and Zhang, M.Q. 2001. Computational identification of promoters and first exons in the human genome. Nat Genet. 29: 412–417.
Del Gatto-Konczak, F., Olive, M., Gesnel, M.C., and Breathnach, R. 1999. hnRNP A1 recruited to an exon in vivo can function as an exon splicing silencer. Mol Cell Biol. 19: 251–260.
Enssle, J., Kugler, W., Hentze, M.W., and Kulozik, A.E. 1993. Determination of mRNA fate by different RNA polymerase II promoters. Proc Natl Acad Sci. 90: 10091–10095.
Fairbrother, W.G. and Chasin, L.A. 2000. Human genomic sequences that inhibit splicing. Mol Cell Biol. 20: 6816–6825.
Fairbrother, W.G., Yeh, R.F., Sharp, P.A., and Burge, C.B. 2002. Predictive identification of exonic splicing enhancers in human genes. Science. 297: 1007–1013.
Fedorov, A., Saxonov, S., Fedorova, L., and Daizadeh, I. 2001. Comparison of intron-containing and intron-lacking human genes elucidates putative exonic splicing enhancers. Nucleic Acids Res. 29: 1464–1469.
Hui, J., Stangl, K., Lane, W.S., and Bindereif, A. 2003. HnRNP L stimulates splicing of the eNOS gene by binding to variable-length CA repeats. Nat Struct Biol. 10: 33–37.
Kashima, T. and Manley, J.L. 2003. A negative element in SMN2 exon 7 inhibits splicing in spinal muscular atrophy. Nat Genet. 34: 460–463.
Ladd, A.N. and Cooper, T.A. 2002. Finding signals that regulate alternative splicing in the post-genomic era. Genome Biol. 3: reviews0008.
Liu, H.X., Zhang, M., and Krainer, A.R. 1998. Identification of functional exonic splicing enhancer motifs recognized by individual SR proteins. Genes Dev. 12: 1998–2012.
Liu, H.X., Chew, S.L., Cartegni, L., Zhang, M.Q., and Krainer, A.R. 2000. Exonic splicing enhancer motif recognized by human SC35 under splicing conditions. Mol Cell Biol. 20: 1063–1071.
Maquat, L.E. 2004. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol. 5: 89–99.
Mayeda, A., Screaton, G.R., Chandler, S.D., Fu, X.D., and Krainer, A.R. 1999. Substrate specificities of SR proteins in constitutive splicing are determined by their RNA recognition motifs and composite pre-mRNA exonic elements. Mol Cell Biol. 19: 1853–1863.
Mock, B.A., Connelly, M.A., McBride, O.W., Kozak, C.A., and Marcu, K.B. 1995. CHUK, a conserved helix-loop-helix ubiquitous kinase, maps to human chromosome 10 and mouse chromosome 19. Genomics. 27: 348–351.
Neu-Yilik, G., Gehring, N.H., Hentze, M.W., and Kulozik, A.E. 2004. Nonsense-mediated mRNA decay: from vacuum cleaner to Swiss army knife. Genome Biol. 5: 218.
O’Neill, J.P., Rogan, P.K., Cariello, N., and Nicklas, J.A. 1998. Mutations that alter RNA splicing of the human HPRT gene: a review of the spectrum. Mutat Res. 411: 179–214.
Robberson, B.L., Cote, G.J., and Berget, S.M. 1990. Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol Cell Biol. 10: 84–94.
Schaal, T.D. and Maniatis, T. 1999a. Multiple distinct splicing enhancers in the protein-coding sequences of a constitutively spliced pre-mRNA. Mol Cell Biol. 19: 261–273.
Schaal, T.D. and Maniatis, T. 1999b. Selection and characterization of pre-mRNA splicing enhancers: Identification of novel SR protein-specific enhancer sequences. Mol Cell Biol. 19: 1705–1719.
Senapathy, P., Shapiro, M.B., and Harris, N.L. 1990. Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol. 183: 252–278.
Sironi, M., Menozzi, G., Riva, L., Cagliani, R., Comi, G.P., Bresolin, N., Giorda, R., and Pozzoli, U. 2004. Silencer elements as possible inhibitors of pseudoexon splicing. Nucleic Acids Res. 32: 1783–1791.
Storey, J.D. and Tibshirani, R. 2003. Statistical significance for genomewide studies. Proc Natl Acad Sci. 100: 9440–9445.
Sun, H. and Chasin, L.A. 2000. Multiple splicing defects in an intronic false exon. Mol Cell Biol. 20: 6414–6425.
Tacke, R. and Manley, J.L. 1995. The human splicing factors ASF/SF2 and SC35 possess distinct, functionally significant RNA binding specificities. EMBO J. 14: 3540–3551.
Tian, H. and Kole, R. 1995. Selection of novel exon recognition elements from a pool of random sequences. Mol Cell Biol. 15: 6291–6298.
Tu, M., Tong, W., Perkins, R., and Valentine, C.R. 2000. Predicted changes in pre-mRNA secondary structure vary in their association with exon skipping for mutations in exons 2, 4, and 8 of the Hprt gene and exon 51 of the fibrillin gene. Mutat Res. 432: 15–32.
Urlaub, G., Mitchell, P.J., Ciudad, C.J., and Chasin, L.A. 1989. Nonsense mutations in the dihydrofolate reductase gene affect RNA processing. Mol Cell Biol. 9: 2868–2880.
Wagner, E.J. and Garcia-Blanco, M.A. 2001. Polypyrimidine tract binding protein antagonizes exon definition. Mol Cell Biol. 21: 3281–3288.
Zhang, M.Q. 1998. Statistical features of human exons and their flanking regions. Hum Mol Genet. 7: 919–932.
Zhang, X.H., Heller, K.A., Hefter, I., Leslie, C.S., and Chasin, L.A. 2003. Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Res. 13: 2637–2650.
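
Illustrative sketch of the mutation analysis bookkeeping

The enhancer-disruption (ED) and silencer-creation (SC) rules given under Mutation analysis can be sketched in Python as follows. This is a hedged illustration rather than the code used in the study: the function names, the 0-based coordinate convention, and the toy motif sets are all hypothetical, and real use would load the PESE and PESS 8-mer lists instead.

```python
from typing import List, Set

K = 8


def overlapping_kmers(seq: str, pos: int, k: int = K) -> List[str]:
    """Return every k-mer of `seq` that covers the 0-based position `pos`.

    Away from the sequence ends this yields exactly k windows (eight
    overlapping 8-mers); near an end it yields fewer.
    """
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def classify_point_mutation(
    seq: str,
    pos: int,
    new_base: str,
    enhancers: Set[str],
    silencers: Set[str],
    k: int = K,
) -> Set[str]:
    """Label a single-base substitution as ED and/or SC.

    ED: the wild-type windows contain one or more enhancers and the
        mutant windows contain fewer (possibly none).
    SC: the mutant windows contain one or more silencers and the
        wild-type windows contain fewer (possibly none).
    """
    wild_type = seq.upper()
    mutant = wild_type[:pos] + new_base.upper() + wild_type[pos + 1:]
    wt_windows = overlapping_kmers(wild_type, pos, k)
    mut_windows = overlapping_kmers(mutant, pos, k)

    wt_enh = sum(w in enhancers for w in wt_windows)
    mut_enh = sum(w in enhancers for w in mut_windows)
    wt_sil = sum(w in silencers for w in wt_windows)
    mut_sil = sum(w in silencers for w in mut_windows)

    events: Set[str] = set()
    if wt_enh >= 1 and mut_enh < wt_enh:
        events.add("ED")
    if mut_sil >= 1 and wt_sil < mut_sil:
        events.add("SC")
    return events


# Toy example: the motif sets below are invented for illustration only.
exon = "GGGGGGGACGTACGTGGGGGGG"
toy_enhancers = {"ACGTACGT"}
toy_silencers = {"ACGCACGT"}
print(sorted(classify_point_mutation(exon, 10, "C", toy_enhancers, toy_silencers)))
# ['ED', 'SC']
```

Counting motif occurrences in the two window sets, rather than testing only for presence, follows the "fewer or no enhancers" and "none or fewer" wording of the rules; a mutation can therefore be scored as both ED and SC.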
