microrna targets Plos pub
MicroRNAs (miRNAs) interact with target mRNAs at specific sites to induce cleavage of the message or inhibit
translation. The specific function of most mammalian miRNAs is unknown. We have predicted target sites on the 3′
untranslated regions of human gene transcripts for all currently known 218 mammalian miRNAs to facilitate focused
experiments. We report about 2,000 human genes with miRNA target sites conserved in mammals and about 250
human genes conserved as targets between mammals and fish. The prediction algorithm optimizes sequence
complementarity using position-specific rules and relies on strict requirements of interspecies conservation.
Experimental support for the validity of the method comes from known targets and from strong enrichment of
predicted targets in mRNAs associated with the fragile X mental retardation protein in mammals. This is consistent
with the hypothesis that miRNAs act as sequence-specific adaptors in the interaction of ribonuclear particles with
translationally regulated messages. Overrepresented groups of targets include mRNAs coding for transcription factors,
components of the miRNA machinery, and other proteins involved in translational regulation, as well as components of
the ubiquitin machinery, representing novel feedback loops in gene regulation. Detailed information about target
genes, target processes, and open-source software for target prediction (miRanda) is available at www.microrna.org.
Our analysis suggests that miRNA genes, which are about 1% of all human genes, regulate protein production for 10%
or more of all human genes.
Citation: John B, Enright AJ, Aravin A, Tuschl T, Sander C, et al. (2004) Human microRNA targets. PLoS Biol 2(11): e363.
mRNA cleavage between the nucleotide positions 10 and 11 in the siRNA:mRNA target duplex (Tuschl et al. 1999;
Zamore et al. 2000; Elbashir et al. 2001; Hutvágner and Zamore 2002a; Llave et al. 2002b; Martinez et al. 2002;
Bartel 2004; Yekta et al. 2004). It appears that the extent of base pairing between the small RNA and the mRNA
determines the balance between cleavage and degradation (Hutvágner and Zamore 2002a). Recent examples of
cleavage of target messages are, in mouse, mir-196 guiding cleavage of Hox-B8 transcripts (Yekta et al. 2004)
and, in Epstein Barr virus, miR-BART2, a virus-encoded miRNA, guiding the cleavage of transcripts for virus
DNA polymerase (gene BALF5) (Pfeffer et al. 2004). While cleavage of mRNA is a straightforward process, the
details of the mechanism of translational repression are unknown.
The following rules for matches between miRNA and target messages have been deduced from a range of
experiments. (1) Asymmetry: experimentally verified miRNA target sites indicate that the 5′ end of the miRNA
tends to have more bases complementary to the target than its 3′ end. Loopouts in either the mRNA or the
miRNA between positions 9 and 14 of the miRNA have been observed or deduced (Brennecke et al. 2003; Johnston
and Hobert 2003; Lin et al. 2003; Vella et al. 2004). Recent experiments show some correlation between the
level of translational repression and the free energy of binding of the first eight nucleotides in the 5′
region of the miRNA (Doench and Sharp 2004). However, confirmed miRNA:mRNA target pairs can have mismatches
in this region (Moss et al. 1997; Johnston and Hobert 2003). (2) G:U wobbles: wobble base pairs are less
common in the 5′ end of a miRNA:mRNA duplex, and recent work shows a disproportionate penalty of G:U pairing
relative to standard thermodynamic considerations (Doench and Sharp 2004). (3) Cooperativity of binding: many
miRNAs can bind to one gene (Reinhart et al. 2000; Ambros 2003; Vella et al. 2004), and the target sites may
overlap to some degree (Doench and Sharp 2004).
Given the overlap between the siRNA and miRNA pathways, it is reasonable to assume that rules of regulation
in the siRNA pathway will partly apply to miRNA target recognition (Hutvágner and Zamore 2002b; Boutet et al.
2003; Doench et al. 2003). Lately, detailed characteristics associated with siRNA functionality were
identified: low G/C content, a bias towards low internal stability at the 3′ terminus, lack of inverted
repeats, and strand base preferences (positions 3, 10, 13, and 19) (Jackson et al. 2003; Reynolds et al.
2004). These observations may provide clues for better quantitative description of miRNA:mRNA interaction.
Regions adjacent or near to the target site can be important for miRNA specificity. In lin-41, a
27-nucleotide (nt) intervening sequence between two consecutive let-7 sites is necessary for its regulation
(Vella et al. 2004). Because of lack of conservation of this 27-nt intervening sequence in C. briggsae,
incorporation of a corresponding rule is premature.
Maturation of miRNAs and Assembly in RNA-Induced Silencing Complex
miRNAs are transcribed as longer precursors, termed pri-miRNAs (Lee et al. 2002), sometimes in clusters and
frequently in introns (25% of human miRNAs; Table S1). Upon transcription, miRNAs undergo nuclear cleavage by
the RNase III endonuclease Drosha, producing the 60–70-nt stem-loop precursor miRNA (pre-miRNA) with a 5′
phosphate and a 2-nt 3′ overhang (Lee et al. 2003). The pre-miRNA is subsequently transported across the
nuclear membrane, dependent on the protein exportin 5 (Lund et al. 2003; Yi et al. 2003). Dicer cleaves the
pre-miRNA in the cytoplasm about two helical turns away from the ends of the pre-miRNA stem loop, producing
double-stranded RNA. A helicase unwinds the cleaved double-stranded RNA in a strand-specific direction
(Khvorova et al. 2003; Schwarz et al. 2003).
One of the unwound strands is subsequently incorporated into a ribonuclear particle (RNP) complex,
RNA-induced silencing complex (RISC) (Hutvágner and Zamore 2002a; Martinez et al. 2002). Every RISC contains
a member of the Argonaute protein family, which tightly binds the RNA in the complex (Hammond et al. 2001;
Hutvágner and Zamore 2002a; Martinez et al. 2002; Mourelatos et al. 2002). There are at least eight members
of the Argonaute family in mammals (Sasaki et al. 2003), and only a small subset has been functionally
characterized. The Argonautes and Dicer bind single-stranded RNA via their PAZ domains (Lingel et al. 2003;
Sasaki et al. 2003; Song et al. 2003; Yan et al. 2003), and the known structures of the PAZ domains may have
implications for prediction of miRNA targets (Lingel et al. 2003; Song et al. 2003; Yan et al. 2003).
Association of mRNAs and miRNAs with Fragile X Mental Retardation Protein
Among the prime candidates for miRNA control are the genes that are posttranscriptionally regulated. The
mRNA-binding protein fragile X mental retardation protein (FMRP) is involved in the regulation of local
protein synthesis (Antar and Bassell 2003) and binds 4% of mRNAs expressed in the rat brain, as tested in
vitro (Brown et al. 2001). The loss of function of FMRP causes fragile X syndrome, the most prevalent form of
mental retardation (one in every 2,000 children). Over the past three years a number of different groups have
identified in vivo mRNA cargoes of FMRP. The Warren and Darnell laboratories have identified ligands by
co-immunoprecipitation followed by microarray analysis, complemented by extraction of polyribosomal fractions
(Brown et al. 2001). They discovered that FMRP and one of its three RNA-binding domains specifically bind to
G-rich quartet motifs (Brown et al. 2001; Darnell et al. 2001; Denman 2003; Miyashiro et al. 2003). Three
more studies found that mRNAs containing U-rich motifs bind recombinant FMRP in vitro and associate with
FMRP-containing mRNPs in vivo (Chen et al. 2003; Denman 2003). Lastly, antibody-positioned RNA amplification
as a primary screen followed by traditional methods identified over 80 FMRP-regulated mRNAs, with a
combination of G-quartet and U-rich motifs in their mRNA sequences (Miyashiro et al. 2003).
Independently, FMRP has been shown to be associated with RISC components and miRNAs (Jin et al. 2004). The
Drosophila homolog of FMRP (FXR) and the Vasa intronic gene were identified as components of RISC (Caudy et
al. 2002). More recent studies have shown that mammalian FMRP interacts with miRNAs and with the components
of the miRNA pathways including Dicer and the mammalian orthologs of Argonaute (AGO) 1 (Ishizuka et al. 2002;
Jin et al. 2004). Given the association of FMRP with Argonaute-containing complexes, we propose and
investigate the hypothesis that the
Results
Prediction of miRNA Targets
Using currently known mammalian miRNA sequences, we
scanned 3′ untranslated regions (UTRs) from the human
(Homo sapiens), mouse (Mus musculus), and rat (Rattus norvegicus)
genomes for potential target sites. The scanning algorithm
was based on sequence complementarity between the mature
miRNA and the target site, binding energy of the miRNA–
target duplex, and evolutionary conservation of the target
site sequence and target position in aligned UTRs of
homologous genes. We identified as conserved across
mammals a total of 2,273 target genes with more than one
target site at 90% conservation of target site sequence (Tables
S2 and S3) and 660 target genes at 100% conservation. We
also scanned the zebrafish (Danio rerio) and fugu (Fugu rubripes)
fish genomes for potential targets using known and predicted
miRNAs (Figure 1; Tables S4 and S5) and identified 1,578
target genes with two or more conserved miRNA sites
between the two fish species.
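As a rough illustration of the sequence-conservation cutoffs quoted above (90% and 100% of target site residues), the following Python sketch computes the fraction of identical positions between two pre-aligned target sites and compares it with a threshold. It is a minimal sketch under stated assumptions, not the miRanda implementation: the function names, the gap handling, and the use of `>=` at the cutoff are hypothetical, and the orthology and site-position checks that the text also requires are not modeled.

```python
def site_identity(site_a: str, site_b: str) -> float:
    """Fraction of aligned columns carrying the same residue in both sites.

    Both inputs are assumed to be rows of an existing alignment, so they must
    have equal length. Columns where both rows have a gap ('-') are ignored;
    a gap opposite a residue counts as a mismatch. (Illustrative conventions.)
    """
    if len(site_a) != len(site_b):
        raise ValueError("aligned sites must have equal length")
    columns = [
        (a, b)
        for a, b in zip(site_a.upper(), site_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return matches / len(columns)


def is_conserved(site_a: str, site_b: str, threshold: float = 0.90) -> bool:
    """True if the identity fraction reaches the threshold (0.90 or 1.00 in the text)."""
    return site_identity(site_a, site_b) >= threshold


if __name__ == "__main__":
    human = "ACGUACGUACGUACGUACGU"          # made-up 20-nt site
    mouse = human[:-1] + "A"                # one substitution: 19/20 identical
    print(site_identity(human, mouse))
    print(is_conserved(human, mouse, 0.90), is_conserved(human, mouse, 1.00))
```

With the made-up sites in the usage lines, a single substitution in 20 positions passes the 90% cutoff but fails the 100% cutoff, which mirrors why the stricter filter yields fewer target genes.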
In addition to the analysis of 3′ UTRs, we also scanned all protein-coding regions for high-scoring miRNA
target sites. For convenience, these results are reported both as hits in cDNAs (coding plus noncoding; Table
S6) and as hits in coding regions (Table S7), with cross-references in the UTR target tables (number of hits
in the coding region for each UTR in Tables S2 and S3).
Figure 1. Target Prediction Pipeline for miRNA Targets in Vertebrates
The mammalian (human, mouse, and rat) and fish (zebrafish and fugu) 3′ UTRs were first scanned for miRNA
target sites using position-specific rules of sequence complementarity. Next, aligned UTRs of orthologous
genes were used to check for conservation of miRNA–target relationships (“target conservation”) between
mammalian genomes and, separately, between fish genomes. The main results (bottom) are the conserved
mammalian and conserved fish targets, for each miRNA, as well as a smaller set of super-conserved vertebrate
targets.
DOI: 10.1371/journal.pbio.0020363.g001
The algorithm and cutoff parameters were chosen to provide a flexible mechanism for position-specific
constraints and to capture what is currently known about experimentally verified miRNA target sites: (1)
nonuniform distribution of the number of sequence-complementary target sites for different miRNAs; (2) 5′–3′
asymmetry (the complementary pairing of about ten nucleotides at the 5′ end is more important than that of
the ten nucleotides at the 3′ end [Doench and Sharp 2004], and the matches near the 3′ end can to a limited
extent compensate for weaker 5′ binding); and (3) influence of G:U wobbles on binding (Doench and Sharp
2004). In choosing these parameters, we drew on experience from careful analysis of target predictions in
Drosophila (Enright et al. 2003) as well as proposed human targets of virus-encoded miRNAs (Pfeffer et al.
2004).
To facilitate evaluation of predicted targets and design of new experiments, we provide methods and results
in a convenient and transparent form. We make the miRanda software freely available under an open-source
license, so that researchers can adjust the algorithm, numerical parameters, and position-specific rules. We
also provide web resources, including a viewer for browsing potential target sites, conserved with or without
positional constraints, on aligned UTRs, with periodic updates (www.microrna.org), as well as links to these
targets from the miRNA registry site RFAM (www.sanger.ac.uk; Griffiths-Jones 2004). We provide both
high-scoring targets, as strong candidates for validation experiments, and lower-scoring targets, which may
have a role in broader background regulation of protein dose. Expression information (see Table S3) for
miRNAs and mRNAs provides an additional filter for validation experiments, in addition to ranking target
sites by complementarity and evolutionary conservation.
Validation of Target Predictions
Only a small number of target sites of target genes regulated by miRNAs have been experimentally verified, so
we sought direct and indirect evidence to help validate or invalidate the proposed set of mammalian targets.
(1) We compared predicted targets with experimentally verified targets in mammals, C. elegans, and D.
melanogaster, as well as their mammalian homologs. (2) We compared predicted target numbers from real and
shuffled miRNA sequences and estimated the rate of false-positive predictions. (3) We assessed the enrichment
of miRNA targets in mRNAs that
are known cargoes of FMRP, an RNA-binding protein known to be involved in translational regulation.
Agreement with known targets. We previously used known miRNA sites for the let-7 and lin-4 miRNAs in
Drosophila to develop the target prediction method and check for consistency (Enright et al. 2003). More
recent experimental target identification provides independent control data. Recent work in C. elegans (Vella
et al. 2004) has narrowed the originally reported list of six target sites for let-7 in the UTR of lin-41
down to three elements: two target sites and a 27-nt intervening sequence (a possible binding site for
another factor). The surviving two target sites have high alignment scores, S = 115 and S = 110, while the
other four sites are below threshold (Enright et al. 2003), fully consistent with the experimental results.
As one of the confirmed sites has a single-residue bulge, target prediction methods that require a perfect
run of base pairs near the 5′ end of the miRNA would not detect it, while our method does. lsy-6, a recently
experimentally identified miRNA in C. elegans, controls left–right neuronal asymmetry via cog-1, an Nkx-type
homeobox gene; the cog-1 gene has a target site in its 3′ UTR, which also has a high score (S = 125) and
passes the conservation filter.
Experiments in D. melanogaster have identified six new miRNA–target gene pairs: miR-7 targets the notch
signaling genes HLHm3, HLHm4, and hairy, and miR-2b targets the genes reaper, grim, and sickle (Stark et al.
2003). Consistent with these experiments, our target predictions in D. melanogaster (Enright et al. 2003)
ranked HLHm3, hairy, and HLHm4 at positions 1, 3, and 7, respectively, in the list of 143 target genes for
miR-7. Similarly, our predictions ranked reaper, grim, and sickle at positions 3, 11, and 19, respectively,
among the other 120 predicted target genes for miR-2c. We also predicted miR-6 to target this group of
pro-apoptotic genes, with sites that have lower scores than the miR-2 family but are conserved in D.
pseudoobscura. Unfortunately, one cannot in general use these predicted and then validated target sites
(Stark et al. 2003) for the derivation of new prediction rules, as the set of targets tested is limited to
the type predicted and is not exhaustive.
Indirect validation comes from the prediction that mammalian orthologs of some of the known miRNA targets in
C. elegans and D. melanogaster are miRNA targets. An example is the proposed conservation of the miRNA–target
relationship lin-4:lin-28 (we use the notation miRNA:mRNA for a miRNA–target pair), first discovered in worm
(Moss and Tang 2003): we detect target sites in human lin-28 for the lin-4 miRNA homolog miR-125. We also
confirm the human analog of a let-7:lin-28 relation predicted in C. elegans (Reinhart et al. 2000). In
summary, the predicted target sites on human lin-28 are miR-125 (1 site), let-7b (2 sites; Moss and Tang
2003), miR-98 (2 sites), and miR-351 (1 site). Another known lin-4 and let-7 target in C. elegans is lin-41.
The human homolog of lin-41 (sequence provided by F. J. Slack, personal communication) and another closely
related gene (encoding Tripartite motif protein 2) are predicted as high-ranking targets of let-7 and miR-125
(the human homolog of lin-4) (see Tables S2 and S3).
Another known instance of miRNA target regulation in worms is the regulation of cog-1 by the lsy-6 miRNA
(Johnston and Hobert 2003). Although there is no obvious homolog of lsy-6 in mammals, the vertebrate homolog
of the target gene cog-1, nkx-6.1, is a conserved target for five different miRNAs in our predictions (see
Table S2).
The comparison of our results with known targets shows that our method can detect most (but not all) known
target sites and target genes at reasonably high rank. However, given the small number of experimentally
verified miRNA–target pairs, additional validation tests are desirable, such as statistical tests using
randomization of miRNA sequences to estimate false positives.
Estimate of false positives. As a computational control of the validity of the prediction method, one can
perform a statistical test that attempts to estimate the probability that a predicted site is incorrect.
Here, a “false positive” is a predicted target site of a real miRNA on a real mRNA that has passed all
relevant thresholds but is incorrect in that it is not biologically meaningful. The statement “not
biologically meaningful” is rarely clearly defined, but can reasonably be taken to mean that no functionally
effective miRNA:mRNA interaction occurs under conditions of co-expression at physiological concentration,
where “functionally effective” is defined in terms of detectable changes of phenotypic attributes.
Technically, an estimate of the false-positive rate can be obtained by computing (directly or via
randomization) the background distribution of scores for biologically non-meaningful miRNA target sites and
then deriving the probability that a non-meaningful target site passes all score thresholds, i.e., for a
single aggregate score, that the incorrect site has a score T > Tc, where Tc is a fixed threshold that may
be, in general, different for each miRNA. We chose to estimate the background distribution using shuffled
miRNAs obtained by swapping randomly selected pairs of bases of each given miRNA 1,000 times, keeping the
nucleotide composition constant. The shuffled miRNA sequences were scanned against human, mouse, and rat 3′
UTR sequences exactly as for the prediction procedure for real miRNA sequences. In the procedure, a
miRNA:mRNA match site is predicted to be a target site if it passes three thresholds, S > Sc for match score,
|ΔG| > |ΔGc| for free energy of duplex formation, and C > Cc for conservation, where C reflects a binary
evaluation of orthology of mRNAs, similarity of position of the site on the mRNA, and a threshold percentage
of conserved residues in the two mRNA target sites. Finally, the predicted target sites for a set of shuffled
miRNAs are counted and then averaged over a total of ten randomized runs. The percentage of false positives
for target transcripts with more than two, three, and four sites is 39%, 30%, and 24%, respectively, using a
non-permissive conservation threshold of 100% for target site sequences (Figure 2). In addition, the
false-positive rate for single sites with a score of more than 110 is approximately 35%.
To provide a realistic estimate of false positives using randomization, the distribution of scores from
random trials (“random-false”) should be similar to the distribution of incorrect (non-meaningful) hits from
real trials (“real-false”). The difference between these two distributions is difficult to compute in
principle, as very few validated correct predictions are known at present. For human sequences, without any
conservation filter, we obtained a total of 2,538,431 predicted target sites for real miRNAs, and, for
shuffled miRNAs, on average, 2,033,701 (± 82,172) target sites, a difference of 20%. This difference may be
indicative of a
biological signal in the raw score (S) and energy (ΔG) calculated by the miRanda algorithm or may be due to
different polynucleotide compositions of shuffled miRNAs compared to real miRNAs. Even if this difference
represents a real effect, by far the most predictive criterion for accurate target detection is conservation
of target sites across species, and not alignment scores or energies (20% compared to a factor of three, see
Figure 2; Table S8). As a consequence, the current set of predicted targets rests heavily on the criterion of
conservation of miRNA:mRNA match between different species. We believe this to be essentially true for all
currently published target prediction methods.
Indirect experimental support: FMRP-associated mRNAs. An excellent opportunity to test our target predictions
comes from experiments showing the association of mRNAs and miRNAs with proteins involved in translational
control, even if these experiments do not provide information on specific miRNA:mRNA pairings. In particular,
FMRP, which may regulate translation in neurons, not only associates with hundreds of mRNAs (Brown et al.
2001; Chen et al. 2003; Denman 2003; Miyashiro et al. 2003; Waggoner and Liebhaber 2003) and with miRNAs (Jin
et al. 2004), but also associates with components of the miRNA processing machinery, Dicer, and the mammalian
homologs of AGO1 and AGO2 (Jin et al. 2004). If all FMRP-bound mRNAs are regulated by miRNAs, one should see
a large enrichment of predicted targets among such mRNAs. We tested this hypothesis with 397 FMRP-associated
mRNAs taken from a number of recent experiments (Brown et al. 2001; Chen et al. 2003; Denman 2003; Miyashiro
et al. 2003; Waggoner and Liebhaber 2003).
Are FMRP-bound messages enriched in predicted targets? Using five different datasets (Table S9), we predicted
that 74% of FMRP-associated messages are miRNA target genes (294 of 397 mRNAs). This corresponds to an
enrichment factor of about five compared to the 59 targets one would expect from our analysis in a randomly
chosen set of 397 mRNAs, where 59/397 equals 4,462/29,785 (4,462 predicted mammalian target mRNAs pass the
90% conservation filter for one or more sites per transcript out of a total of 29,785 transcripts). This
suggests that, by chance alone, 59 of the 397 FMRP target genes should pass the filters. The enrichment
factor does not vary much with the cutoff parameters used in target prediction (data not shown), but is
subject to some uncertainty because of potential false-positive predictions.
The enrichment of miRNA:FMRP interaction is consistent with the hypothesis that translational control
involving FMRP protein is executed in a complex that involves one or more miRNAs interacting with transcripts
at specific sites. Note that this analysis supports the validity of target gene prediction, not the identity
of the controlling miRNA or the accuracy of specific sites.
An additional validation test involved FMRP cargoes that had been identified in more than one study, using
independent experimental methods. For example, the mRNAs of 14 genes (Brown et al. 2001) were overrepresented
in both the polyribosome fraction of mouse fragile X cells and in co-immunoprecipitation with mouse brain
mRNPs that contain FMRP. Almost all of the 14 genes are predicted targets with more than one conserved site
(11 of 12 annotated UTRs; Table S9). In some cases, expression data provide additional support: postsynaptic
density protein 95 (PSD95)–associated (SAPAP4), a neuron-specific protein, is regulated by many miRNAs highly
expressed in rat brain primary cortical neurons (Kim et al. 2004).
In summary, the three validation approaches (retrospective, statistical, and indirect experimental) suggest
that the current version of the miRanda algorithm, in spite of clear limitations, can predict true miRNA
targets at reasonable accuracy, provided that (1) the targets are detected as conserved and (2) the gene
contains more than one miRNA target site or a single high-scoring site (S > 110, approximately, including
sites with almost perfect complementarity suggestive of mRNA cleavage).
Overview of Mammalian miRNA Target Genes
More than 2,000 mammalian targets. We predicted 2,273 genes as targets with two or more miRNA target sites in
their 3′ UTRs conserved in mammals at 90% target site conservation (see Tables S2 and S3). This means we
predicted approximately 9% of protein-coding genes to be under miRNA regulation. In addition, we predicted
another 2,128 genes with only one target site, but the false-positive rate for these is significantly higher
(Figure 2). Of these, the top-scoring 480 genes (S > 110) have an estimated false-positive rate comparable to
that of genes with multiple sites and thus also are good candidates for experimental verification. Some of
the genes with single sites may contain additional sites that we cannot detect for a number of reasons,
including truncated UTRs. A significant subset of the total number of single-site target genes (7%) has near
complementary single sites. These near complementary sites may indicate cleavage, for which additional sites
may not be necessary. The targets listed in Table 1 were selected for variety of function, variation in
number of sites, and varied extent of conservation (some are also conserved in fish). Somewhat surprisingly,
the number of predicted targets per miRNA varies greatly, from zero (for seven miRNAs) to 268 (for let-7b),
but the distribution is nonuniform (mean = 7.1, standard deviation = 4.7; Figure 3). This indicates a range
of specificity for most miRNAs and suggests that regulation of one message by one miRNA is rare.
Table 1. Column headings: Target Gene; Identifier; Description; miRNA ID.
Add “ENSG00000” to the beginning of the identifiers to derive Ensembl identifiers. All miRNA–target
relationships shown here are conserved in mammals, i.e., homologous miRNAs target transcripts of homologous
genes at similar UTR positions with similar local sequence. Genes that are predicted to be targets in both
mammals and fish are in bold. Where the miRNA–target relationship is also conserved in non-mammalian
vertebrates, the miRNA is in bold.
a Contains conserved CPE motif.
N.A., not available.
DOI: 10.1371/journal.pbio.0020363.t001
Functional analysis. We analyzed the distribution of functional annotation for all targets of all miRNAs
using Gene Ontology (GO) terms (see Materials and Methods; Table S10) and domain annotations from InterPro
(Mulder et al. 2003). The target genes reflected a broad range of biological functions (Figure S1). The most
enriched GO term was “ubiquitin-protein ligase activity,” with 3.3-fold enrichment (Table S10). Since
ubiquitination is a process controlling the quantity of specific proteins in a cell at specific times, miRNA
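The enrichment estimate for FMRP-associated mRNAs quoted in the validation section (294 of 397 observed, 59 expected) can be reproduced with a few lines of arithmetic. This Python sketch uses only numbers stated in the text; the function names are illustrative, and the calculation is a simple ratio of observed to expected counts, not a significance test.

```python
def expected_by_chance(sample_size: int, targets_total: int, transcripts_total: int) -> float:
    """Expected number of predicted targets in a random sample of transcripts."""
    return sample_size * targets_total / transcripts_total


def enrichment_factor(observed: int, expected: float) -> float:
    """Ratio of observed to expected predicted targets."""
    return observed / expected


# Numbers taken from the text.
FMRP_MRNAS = 397            # FMRP-associated mRNAs tested
FMRP_TARGETS = 294          # of these, predicted miRNA targets
ALL_TARGETS = 4462          # transcripts passing the 90% conservation filter
ALL_TRANSCRIPTS = 29785     # total transcripts considered

expected = expected_by_chance(FMRP_MRNAS, ALL_TARGETS, ALL_TRANSCRIPTS)
observed_fraction = FMRP_TARGETS / FMRP_MRNAS
factor = enrichment_factor(FMRP_TARGETS, expected)

print(f"expected by chance: {expected:.1f}")
print(f"observed fraction:  {observed_fraction:.2%}")
print(f"enrichment factor:  {factor:.2f}")
```

Rounded, the three values agree with the figures in the text: about 59 expected targets, 74% observed, and an enrichment factor of about five.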
Table 2.
Target Gene | Reference [a] | ID | Description | miRNA ID
APP [b,c] | Denman 2003 | 142192 | Amyloid beta A4 protein precursor | let-7d, miR-130, miR-214
BASP1 [c] | Brown et al. 2001 | 176788 | Neuronal axonal membrane protein NAP-22 | miR-207, miR-18, miR-22
CACNA1D | Chen et al. 2003 | 157388 | Voltage-dependent L-type calcium channel alpha-1D subunit | miR-291–5p
CIC [b,c] | Brown et al. 2001 | 079432 | Capicua (Drosophila) homolog | miR-202, miR-210, miR-292-as
CLTC | Chen et al. 2003 | 141367 | Clathrin heavy chain 1 | miR-122a, miR-330
DDX5 | Chen et al. 2003 | 108654 | Probable RNA-dependent helicase P68 | miR-1d, miR-147, miR-154, miR-33
DLG3 [c,e] | Chen et al. 2003 | 082458 | Presynaptic protein SAP102 | miR-15b, miR-196, miR-326
DLG4 [c] | Brown et al. 2001 | 132535 | PSD95, presynaptic density protein | miR-125ad, miR-135, miR-324–3p
FACL3 [b,c] | Brown et al. 2001 | 123983 | Long-chain Acyl-CoA synthetase 3 | let-7d, miR-141, miR-98
FMR1 [c] | Brown et al. 2001 | 102081 | FMRP1 | miR-194d, miR-297d, miR-326
FMR2 | Chen et al. 2003 | 155966 | FMRP2 | miR-152d
FXR1 [b] | Denman 2003 | 114416 | FMRP1 | let-7d, miR-199, miR-336
HNRPA2B [c] | Chen et al. 2003 | 122566 | Heterogeneous nuclear ribonucleoproteins A2/B1 | miR-103d, miR-143, miR-151
HTR1B [b] | Denman 2003 | 135312 | 5-hydroxytryptamine 1B receptor | miR-292-as, miR-25, miR-202, miR-183
HTR2C [c] | Denman 2003 | 147246 | 5-hydroxytryptamine 2C receptor | let-7e, miR-352, miR-199-as, miR-9
MAP1B | Brown et al. 2001 | 131711 | Microtubule-associated protein 1B | miR-325, miR-136
MAP4K4 | Brown et al. 2001 | 071054 | Mitogen-activated protein kinase 4 | miR-29a
Mint homolog [b] | Brown et al. 2001 | 065526 | Smart/HDAC1-associated repressor protein | miR-203
SEMA3F | Miyashiro et al. 2003 | 001617 | Semaphorin 3F | miR-182, miR-325
Transcripts for genes (Gene and ID) are described as FMRP cargoes in several studies (DR) and predicted here as targets of specific miRNAs (miRNA). Selected from a total of
294 such targets.
a Reference from which data was extracted.
b Homologous miRNA–mRNA pair conserved in fish.
c Additional miRNAs are predicted to target the gene (number in parentheses): APP (9), BASP1 (4), Capicua (2), DLG3 (7), and DLG4 (5).
d The miRNA has multiple target sites on the gene.
e The 3′ UTR of the gene contains a CPE motif (Table S11).
DOI: 10.1371/journal.pbio.0020363.t002
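The ID column of Table 2 appears to use the same shortened identifiers as Table 1, whose note says to add “ENSG00000” to the beginning to derive Ensembl identifiers. Assuming Table 2 follows that convention (its own notes do not say so), the conversion can be sketched in Python as below. The function name and the six-digit check are illustrative choices; identifiers are handled as strings because several begin with zero (for example 079432) and would lose that digit if parsed as integers.

```python
ENSEMBL_GENE_PREFIX = "ENSG00000"  # prefix given in the note to Table 1


def to_ensembl_gene_id(short_id: str) -> str:
    """Expand a six-digit table identifier to a full Ensembl gene identifier.

    Raises ValueError for anything that is not exactly six digits, such as
    the "N.A." placeholder used in Table 1.
    """
    short_id = short_id.strip()
    if len(short_id) != 6 or not short_id.isdigit():
        raise ValueError(f"not a six-digit identifier: {short_id!r}")
    return ENSEMBL_GENE_PREFIX + short_id


if __name__ == "__main__":
    for gene, short_id in [("APP", "142192"), ("CIC", "079432"), ("SEMA3F", "001617")]:
        print(gene, to_ensembl_gene_id(short_id))
```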
FMRP cargo mRNAs regulated by miRNAs. FMRP is composed of several RNA-binding domains (two KH and one RGG)
that bind messages. The specific binding motifs for FMRP on messages are incompletely known, but are thought
to include G-quartet patterns and/or U-rich sequences (Dolzhanskaya et al. 2003; Ramos et al. 2003). We
predicted 294 mRNAs known to be FMRP cargoes as miRNA targets (see Table S9). The most reliable of these
(Table 2) reflect high confidence in experimental identification of FMRP association or conservation of
target site between mammals and fish.
Alzheimer’s disease amyloid protein. Amyloid precursor protein (APP) is an FMRP-bound protein that is
translationally regulated. The APP transcript contains a 29-nt motif at position 200 in the 3′ UTR that is
known to aid destabilization of the APP mRNA in certain nutrient conditions and that binds nucleolin, a
protein associated with RNPs containing FMRP (Rajagopalan and Malter 2000). In addition, there is an 81-nt
sequence at position 630 in the APP 3′ UTR that is required for the TGFbeta-induced stabilization of the APP
mRNA (Amara et al. 1999). We predicted APP as a target, with a total score of S = 708 and a minimum of eight
miRNA sites, including two let-7 top-ranking sites that are conserved in human, mouse, and rat. One of the
predicted miRNA target sites in the APP UTR lies in the 81-nt region (Figure 4), and another is within 30 nt
of the motif at position 200. Other APP-interacting proteins, APP-binding family B member 1 (mir-9, miR-340,
and miR-135b), APP-binding family member 2 (let-7 and miR-218), and APP-binding family 2 (miR-188 and
miR-206) were also predicted targets, some of which had near exact target site matches. In summary, the APP
gene appears to be subject to translational regulation by the combinatorial control of a number of different
miRNAs.
PSD95 and synaptic processes. PSD95 and similar scaffolding molecules link the NMDA receptor with
intracellular enzymes that mediate signaling; this process is involved in the development and maintenance of
synaptic function and synaptic plasticity, and interference in this process is implicated in schizophrenia
and bipolar disorder (Beneyto and Meador-Woodruff 2003). FMRP binds PSD95 and is required for
mGluR-dependent translation of PSD95 (Todd et al. 2003). PSD95 is a high-ranking target of miR-125, miR-135,
miR-320, and miR-327, all of which are either exclusively expressed in brain or enriched in brain tissue
(Lagos-Quintana et al. 2002; Krichevsky et al. 2003; Sempere et al. 2004). In particular, large transcript
numbers of miR-125b are found copurified with polyribosomes in rat neurons (Kim et al. 2004). PSD95 has one
reported G-quartet in its 3′ UTR at position 648 (Todd et al. 2003), further suggesting it as an in vivo FMRP
target. We predicted an additional G-quartet site at position 205–235 in the 3′ UTR of PSD95. One of the
miRNA (miR-125) target sites overlaps with the G-quartets, raising the possibility that miRNAs directly
compete with FMRP to bind the message in this location. Likewise, NAP-22, which has three miRNA target
sites (see Table S9), has a miR-207 target site that overlaps with a G-quartet (Darnell et al. 2001).

Other PSD95 family members are also involved in synaptic processes, in particular, in the integration
of NMDA signaling in the synaptic membrane. All PSD95 family members in mammals (also known as discs
large 1–5), SAP90, and CamKII are predicted miRNA targets (see Table S9), as well as mGluR, the protein
product of which is an agonist that induces the rapid translation of PSD95 (Todd et al. 2003) and three
NMDA receptor subunits (see Table S9). These results suggest that miRNAs may be involved in NMDA and
glutamate receptor signaling to coordinate and integrate information, with specificity achieved through
the combinatorial action of different miRNAs.

Components of RNPs Regulated by miRNAs

FMRP-associated proteins. FMRP binds its own mRNA, implying negative feedback if the binding inhibits
FMRP production (Ceman et al. 1999). The fact that miRNAs target transcripts for FMRP and FMRP-binding
proteins suggests another negative feedback loop in which high levels of these proteins inhibit their
own production (depending, of course, on the concentration of miRNAs and mRNAs) (Figure 4). The genes
for six FMRP-associated (not associated at the same time) proteins, hnRNP A1, Pur-alpha, Pur-beta,
Staufen, AGO-2, and PABP, are predicted miRNA targets. This indicates that FMRP-containing RNPs are
under miRNA regulation. FXR2, a gene similar to FMR1, is also a miRNA target in human, mouse, rat, and
fish. Details of the implied feedback regulation and differential control of RNP action remain to be
determined.

RISC. Our data suggest that the RNAi–miRNA machinery itself is under miRNA regulation; for example,
Dicer appears
to be controlled by let-7 and miR-15b; Ago-1 by let-7 and miR-29b/c; Ago-2 by miR-138; Ago-3 by miR-138,
miR-25, and miR-103; and Ago-4 by miR-27a/b. Dicer and two of the Argonautes also are predicted to be
targets in both zebrafish and fugu. The let-7 sites on the 3′ UTR of Dicer and Ago-1 (Figure 4) will
accommodate most of the let-7 variants with similar scores. The variants of let-7 are expressed in a
wide range of tissues and developmental stages, suggesting broad regulation of Dicer and Ago-1 by
miRNAs. In contrast, the only miRNA that targets Ago-2 is miR-138, which has so far been cloned only
once in the cerebellum (Lagos-Quintana et al. 2002). The target site for miR-138 has only one mismatch
at position 8; this may induce a siRNA-like cleavage of the message (Hutvágner and Zamore 2002a; Doench
et al. 2003). Ago-3 is also a top target for miR-138, with only two mismatches in its site. We suggest
that some miRNAs targeting this machinery (e.g., let-7, miR-27, miR-29, and miR-103) are expressed
fairly widely, while others (e.g., miR-138 and miR-25) have lower and more restricted expression.

Other RNPs. The highly conserved RNA-binding proteins, ELAV-like proteins (HuR, HuB, HuC, and HuD),
contain three RNA-recognition motifs, which bind AU-rich elements in 3′ UTRs of a subset of target
mRNAs (Good 1995). These AU-rich elements increase the proteins’ cytoplasmic stability and increase
translatability (Perrone-Bizzozero and Bolognani 2002). Experiments have identified 18 mRNAs bound to
HuB in retinoic-acid-induced cells; of the 14 we were able to map unambiguously, 12 are predicted miRNA
target genes: Elavl1 (known to regulate its own mRNA), Gap-43, c-fos, PN-1, Krox-24, CD51, CF2R, CTCF,
NF-M, GLUT-1, c-myc, and N-cadherin (Tenenbaum et al. 2000). Three of the ELAV-like genes themselves are
also targets of a large number of miRNAs (see Tables S2 and S3; Figure 4). This is yet another example
of miRNAs predicted to target the bound messages of RNA-binding proteins and of the regulation of
RNA-binding genes by miRNAs.

Cytoplasmic Polyadenylation Binding Proteins Regulated by miRNAs

We predicted all four human cytoplasmic polyadenylation binding proteins (CPEBs) known in mammals as
miRNA targets, ranked within the top 170 target genes with 6–20 sites in their UTRs (Figure 4; Table
S11). Indeed, CPEB2 is the highest-ranking gene of all transcripts. The orthologs to CPEB1 in fish and
fly (known as orb in D. melanogaster) are also predicted as targets. CPEB is an RNA-binding protein
first shown to activate translationally dormant mRNAs by regulating cytoplasmic polyadenylation in
Xenopus oocytes (Hake and Richter 1994). It also regulates dendritic synaptic plasticity (Mendez and
Richter 2001; Richter 2001) and dendritic mRNA transport (Mendez and Richter 2001; Huang et al. 2003)
and facilitates transport of mRNAs in dendrites together with kinesin and dynein in RNPs (Huang et al.
2003). CPEB binds to its target message through the CPE motif (UUUUAU), which must be within a certain
distance of the hexanucleotide AAUAAA. CPEB keeps messages in their dormant state until phosphorylated,
after which it activates polyadenylation (Mendez et al. 2000), thereby activating translation or
degradation (Mendez et al. 2002). In addition, CPEB co-fractionates with the postsynaptic density
fraction in mouse synaptosomes, consistent with translation of stored mRNAs in dendrites being part of
the mechanism of synaptic plasticity. We have three more lines of evidence supporting the notion that
translational regulation by CPEB is linked to miRNA regulation. First, our target list and the list of
genes regulated by CPEB significantly overlap. There are nine genes known to be CPEB-regulated, seven of
which are predicted targets: alpha-CAMIIK, Map 2, Inositol 1,4,5-Triphosphate Receptor type 1, Ephrin A
receptor class A type 2, SCP-1, and CPEB3 (Mendez and Richter 2001). Second, CPEB is known to
self-regulate in D. melanogaster (Tan et al. 2001). The CPEB1 homolog in fly, orb, and CPEBs in
vertebrates are predicted miRNA targets. Third, the gene most correlated in expression to the CPEB
homolog in D. melanogaster is a Piwi protein (Sting), a member of the Argonaute family (Pal-Bhadra et
al. 2002; Stuart et al. 2003) that is involved in translational regulation and in the RISC.

Among the predicted miRNA targets, 115 genes also contained CPE motifs, which were conserved in at
least two mammals in the same positions in the UTRs and are therefore candidates for CPEB regulation
(Table S11; see Materials and Methods). Our predictions include HuB, HuR, Eif-4 gamma, DAZ associated
protein 2, VAMP-2 (known to be posttranscriptionally regulated), Presynaptic protein SAP102, and
brain-derived neurotrophic factor precursor. Taken together, these data suggest that the CPEB genes, the
known CPEB-regulated genes, and the predicted CPEB-regulated genes are strong miRNA target candidates
and provide rich ground for experimentation.

Targets of Cancer-Related miRNAs

Deregulated expression of certain miRNAs has been linked to human proliferative diseases such as B cell
chronic lymphocytic leukemia (Calin et al. 2002; Lagos-Quintana et al. 2003) and colorectal neoplasia
(Michael et al. 2003). Recent analysis of the genomic location of known miRNA genes suggested that 50%
of miRNA genes are in cancer-associated genomic regions or in fragile sites (Calin et al. 2004). The
miRNAs miR-15 and miR-16 are located within a 30-kb region at Chromosome 13q14, a region deleted in 50%
of B cell chronic lymphocytic leukemias, 50% of mantle cell lymphomas, 16%–40% of multiple myelomas, and
60% of prostate cancers (Calin et al. 2002). Furthermore, miR-15 and miR-16 are down-regulated, or their
loci lost, in 68% of B cell chronic lymphocytic leukemias (Calin et al. 2002). Similarly, miR-143 and
miR-145 are down-regulated at the adenomatous and cancer stages of colorectal neoplasia (Michael et al.
2003), and miR-155 is up-regulated in children with Burkitt lymphoma (Metzler et al. 2004).

Our method predicted cancer-specific (by annotation) gene targets of miR-15a, miR-15b, miR-16, miR-143,
miR-145, and miR-155. The target genes and their miRNA regulators are as follows: (1) CNOT7, a gene
expressed in colorectal cell lines and primary tumors (Flanagan et al. 2003) (miR-15a); (2) LASS2, a
tumor metastasis suppressor (Pan et al. 2001) (miR-15b); (3) ING4, a homolog of the tumor suppressor p33
ING1b, which stimulates cell cycle arrest, repair, and apoptosis (Shiseki et al. 2003) (miR-143); (4)
Gab1, encoding multivalent Grb2-associated docking protein, which is involved in cell proliferation and
survival (Yart et al. 2003) (miR-155); and (5) COL3A1, a gene up-regulated in advanced carcinoma (Tapper
et al. 2001) (miR-145).

miR-16 has a tantalizing number of high-ranking targets that are cancer associated and specifically
involved in the
Sumo pathway. There is increasing evidence that Sumo controls pathways important for the surveillance of
genome integrity (Muller et al. 2004). The first- and fifth-highest-ranked targets of miR-16 are Sumo-1
activating and conjugating enzymes, respectively. The top two single-site targets for miR-16 are an
Activin type II receptor gene (TGFbeta signaling) and Hox-A5, both known to be dysregulated at the level
of protein expression in colon cancers (Wang et al. 2001). Both of these sites show near perfect
complementary matching between miR-16 and the target genes (indicating possible cleavage). Both of these
target genes are also targets for another cancer-related miRNA, miR-15.

Targets Conserved between Mammals and Fish

Roughly 55 miRNAs have identical mature sequences in fugu and mammals, and 80 have very similar
sequences in the two species; additional fish miRNA sequences can be predicted with confidence based on
sequence similarity. It is therefore reasonable to expect that the targets of these probably
functionally homologous miRNAs are orthologous genes in the different species. To follow up on this
hypothesis, we assessed conservation of mammalian miRNA–target pairs between the 2,273 mammalian and
1,578 fish miRNA targets (with more than one target site per UTR). The analysis yielded 240 target genes
conserved between mammals and fish. The number 240 is probably an underestimate because of several
factors, including: (1) unfinished annotation of genomes, particularly rat and fugu; (2) ambiguity in
assigning orthologs; and (3) lack of UTR information.

The full set of conserved target genes between fish and mammals indicates a wide functional range of
conserved targets (Table S12). Many Hox genes are conserved as targets, including the miR-196 targets,
Hox-A4:miR-34a, Hox-C9:let-7b (near perfect complementary match), and Hox-B5:miR-27b. Examples from the
notch signaling pathway include miR-30:hairy enhancer of split 1 (Hes1) and miR-152:noggin.

Targets Conserved between Vertebrates and Flies

Twenty-eight of the 78 identified miRNAs in flies have apparent mammalian homologs. Based on this
remarkable conservation across hundreds of millions of years, it is reasonable to expect that there is
some conservation of target sites, target genes, and target pathways between flies and humans. Most
strikingly, we can identify hox genes and axon guidance genes as common targets between vertebrates and
flies, e.g., capicua and sex combs reduced (one of the vertebrate homologs of Hox-A5). The hox gene
cluster in Drosophila contains high-ranking predicted targets (Enright et al. 2003) of miR-10 and
miR-iab-4, and the hox gene cluster in mammals contains high-ranking targets of miR-196. These miRNAs are
themselves located in the hox gene region. We predicted miR-iab-4–3p to target abd-B in Drosophila, a
gene related to the ancestral hox-7 cluster, the ancestral parent of many of the predicted targets of
miR-196. Axon guidance receptors and ligands conserved as targets include Lar, ephrins, and slits. Human
slit1 is a top target of miR-218, which itself is transcribed from the intron of slit2, suggesting
down-regulation of slit1 by transcription of slit2. We expect that there are many more conserved targets
but we are hindered by the difficulty of mapping orthologous genes between human and fish. Future work
will elucidate to what extent there are common pathways regulated by common miRNAs between vertebrates
and invertebrates.

Target Sites in Protein-Coding Sequences

Experiments suggest that miRNA target sites in metazoans are preferentially in UTRs, not in coding
regions. If this is true, a correct target site prediction method should predict a larger number of
targets in UTRs than in coding regions. Alternatively, target sites in coding regions may so far have
escaped experimental verification, especially in plants, in which targets of miRNAs in coding regions
are the rule, not the exception.

To investigate this issue, we computed the average density of target sites for high-scoring targets
(S > 130) before application of conservation filters. The statistical assessment of the influence of
conservation filters in coding regions would have raised complicated issues, as nearly two-thirds of
nucleotides in coding regions are conserved between mammalian genomes to preserve amino acid sequences.
Interestingly, we found, on average, 11 pre-conservation target sites per 1 million nucleotides in
coding regions, versus 15 such target sites per 1 million nucleotides in UTRs. This is consistent with a
stronger “raw” prediction signal in UTRs and may indicate a lower number of biologically relevant target
sites in coding regions in mammals, consistent with early experimental findings.

As a guide to experimentation, we report all sites in coding regions with an alignment score above 110
for miRNAs of length up to 20 nt and an alignment score above 130 for miRNAs longer than 20 nt (scores
depend on the length). These cutoff scores approximately correspond to a 75% complementary match
between miRNA and target, leaving open the question of how many match pairs are needed to lead to
translational inhibition in coding regions, by any mechanism. We identified 942 genes that contained
such sites in their coding regions. Strikingly, there was only one site with a perfect match, and this
was for the imprinted miR-127, known to be antisense to the reciprocally imprinted retrotransposon-like
gene on the opposite strand (Seitz et al. 2003). Of the 942 genes, 25% have been otherwise identified as
targets based on conserved target sites in their UTRs. However, only five genes have target sites in
their UTRs complementary to the same miRNA that targets the coding region (see Table S3, columns H and
I). For example, miR-211 has a near perfect complementary site in the coding region of a gene of unknown
function (Ensembl ID ENSG00000134030, containing an Eif-4 gamma domain) and also has two conserved
“normal” sites in the UTR. Similarly, miR-198 has a site in the coding region, as well as conserved
sites in the UTR region, of a sodium and chloride GABA transporter (Ensembl ID ENSG00000157103).
However, we see no trend for miRNAs that have conserved sites in UTRs to have additional sites in the
coding region; rather, stronger target sites for a given miRNA tend to be confined either to the UTR or
the coding region and are rarely in both.

Target Sites with Near Perfect Matches in cDNAs

We scanned all cDNAs for high-scoring matches without using conservation to check for high-scoring
targets, which we may have missed through strict conservation rules (see Table S6). Over 40 genes
contain sites that have near perfect complementarity to a miRNA (S > 120), and these target
genes may be cleaved rather than translationally repressed as in the case of miR-196 and Hox-B8. For
example, miR-298, an embryonic-stem-cell-specific miRNA (Houbaviy et al. 2003), has a near match with
MCL-1, and miR-328 (neuronally expressed) has a near match with LIMK-1, which is known to be involved in
synapse formation and function. miR-129, expressed in mouse cerebellum, has a near perfect complementary
match with Musashi-1, which is an RNA-binding gene essential for neural development, regulated in the
cerebellum, and up-regulated in medulloblastoma (Yokota et al. 2004).

Table 3. Number of Genes and 3′ UTR Sequences Used for Target Prediction
Columns: Organism; Ensembl Build; Total Genes; Total Transcripts; Ensembl 3′ UTRs; Predicted 3′ UTRs
DOI: 10.1371/journal.pbio.0020363.t003

Comparison of miRNA Target Prediction Methods

Recently, several computational methods for the prediction of miRNA targets have been developed (Enright
et al. 2003; Lewis et al. 2003; Rajewsky and Socci 2003; Stark et al. 2003; Kiriakidou et al. 2004;
Rehmsmeier et al. 2004). Two of these have been applied to mammalian miRNAs, as described in Lewis et
al. (2003) and Kiriakidou et al. (2004). We now compare and contrast these two methods with each other
and with the current version of our method, as further developed from miRanda 1.0 and as applied to
mammalian and vertebrate genomes (Enright et al. 2003). We compare algorithms and target lists, as an
aid to the design of experiments.

The three prediction methods share the goal of identifying mRNAs targeted by miRNAs. All three use
sequence complementarity, free energy calculations of duplex formation, and evolutionary arguments in
developing a scoring scheme for evaluation of potential targets. Results are reported as lists of target
sites and lists of target genes containing such sites. The three methods differ, however, in important
technical details, such as the datasets of miRNA and UTR sequences and the algorithm and scoring scheme,
as well as the report format. We now summarize these technical differences and compare the lists of
resulting target genes for a common subset of miRNAs. The interpretation of such comparisons is hampered
by the fact that selection criteria and the use of numerical cutoffs differ conceptually, and genomic
coverage is nonuniform.

In the first method, Lewis et al. used 79 miRNAs in human, mouse, and rat, seeking targets in a UTR
dataset extracted from the June 2003 version of the Ensembl database. The UTR dataset had 14,300
ortholog triplets conserved between human, mouse, and rat and 17,000 ortholog pairs between human and
mouse. All annotated UTRs were extended by 2 kb of 3′ flanking sequence. The algorithm required exact
complementarity of a 7-nt miRNA “seed” sequence, defined as positions 2–8 from the 5′ end of the miRNA,
to a potential target site on the mRNA, followed by optimization of mRNA–miRNA duplex free energies
between an extended window of 35 additional bases of the mRNA and the rest of the miRNA. Target genes
were ranked using a composite scoring function, which took into account all sites for a particular miRNA
on a given mRNA. Conserved miRNA:mRNA pairs were required to involve orthologs of miRNA and mRNA in
human, mouse, and rat, but there was no requirement for conservation of target site sequence (beyond the
seed match) or position on the mRNA. Using shuffled miRNA sequences, with the constraint that shuffled
controls match real miRNAs in relevant sequence properties, the false-positive rate of predictions was
estimated to be 50% for target genes conserved between mouse and human, 31% for target genes conserved
in human, mouse, and rat, and 22% for target genes identified in fugu as well as mammals. As a final
result, Lewis et al. reported 400 conserved target genes for the 79 miRNAs. Among these targets, 107
genes were reported as conserved in the fish fugu.

In the second method, Kiriakidou et al. used 94 miRNAs in human and mouse, seeking targets in a dataset
of 13,000 UTRs conserved in mouse and human (from Ensembl, date not given). The algorithm used a 38-nt
sliding mRNA window and calculation of miRNA–mRNA duplex free energies, keeping duplexes with energies
below −20 kcal/mol. The duplexes were further filtered using a set of requirements regarding matches and
loop lengths in certain positions, as derived and extrapolated from experimental tests involving a
predicted target site for let-7b miRNA on the UTR of the human homolog of worm lin-28. The target site
sequence was engineered into a Luciferase reporter, followed by sequence variation of the target site
and test of an initial set of 15 predictions in the same reporter assay. Using shuffled miRNA sequences,
and applying the same rules and parameters, the false-positive rate of predictions was estimated to be
50% for targets conserved between human and mouse. As a final result, Kiriakidou et al. reported 5,031
human targets, with 222 reported as conserved in the mouse.

In the third method (this work), we used 218 mammalian miRNAs and 29,785 transcripts derived from
Ensembl (Table 3) and, as a final result, report 4,467 target genes. What are the main differences
between these three prediction methods? Comparison of the total number of predicted target genes is not
very informative, as different datasets and cutoffs were used. We attempted to remove one of the
technical
differences, by explicitly comparing reported targets for the same set of 79 miRNAs used by Lewis et al.
(although significant differences remained in the sets of UTR sequences used): the overlap of target
genes between Kiriakidou et al. (out of 189) and Lewis et al. (out of 400) was 10.6%; the overlap
between Lewis et al. (out of 400) and this work (out of 2,673) was 46%; and the overlap between
Kiriakidou et al. (out of 189) and this work (out of 2,673) was 49%. In each case the totals (“out of”)
are the number of target genes for the common set of 79 miRNAs, and the percentage is relative to the
smaller of the two sets compared. The obvious reason for the larger overlap with our results, 46% and
49% respectively, is the larger number of targets in our predictions, which in turn is primarily the
result of choice of cutoff.

Direct comparison of the three prediction methods is complicated by the fact that the noticeable
differences between the target lists of the three methods are due to the aggregate effects of datasets,
algorithm, including selection rules, use of conservation, and cutoffs. The following characteristics of
the three methods underlie these differences and should be taken into consideration when choosing
targets for experimentation. (1) As to UTR datasets, Lewis et al., with the earliest published report,
used a smaller set of UTRs, with some likelihood of false positives as a result of UTR extension. The
UTR sets used in this work, the third in terms of publication date, are the most comprehensive and
plausibly the most reliable (as of February 2004). (2) As to miRNA datasets, there was an increase from
79 for Lewis et al. to 94 for Kiriakidou et al. to 218 miRNAs used in this work. (3) As to the
cooperativity of binding, the scoring system of Lewis et al. evaluated cooperativity of multiple target
sites by the same miRNA on a target gene, but disregarded multiple target sites from different miRNAs on
one gene; that of Kiriakidou et al. focused on single sites; and that of this work gave high scores to
multiple hits on a target gene, no matter whether these hits involved the same miRNA or different
miRNAs. These tendencies are not exclusive where scores involve functions of several real numbers, with
cutoffs applied to the aggregate score; e.g., our method also allows strong single target sites. (4) As
to assessment of false positives using statistical methods based on shuffling, the comparison of
percentages is inconclusive, as the statistics of the background distribution of true negatives are not
well known. It appears certain, however, from both Lewis et al. and this work, that statistical
confidence increases with the extent of conservation among increasingly distant species. (5) As to
validation experiments, each of the methods used a different type and set, with mixed overall
conclusions. On the reassuring side, there was direct validation of some of the predicted target sites
of Lewis et al. and of Kiriakidou et al. using reporter constructs in cell lines. We found some
agreement between the sites validated in this way and our predicted targets (details in Table S13), but
in some cases we predicted different details of target sites for a given experimentally tested
miRNA:mRNA pair. Also, Kiriakidou et al. used a series of such experiments to extrapolate from a set of
specific sequence variants to general rules for identification of target sites. However, serious doubts
about the validity of any set of rules persist, as there is very little in vivo validation in which
native levels of specific miRNAs are shown to interact with identified native mRNA targets with
observable phenotypic consequences under normal physiological conditions. (6) As to differences in
algorithm, one can state opinions about the strengths or weaknesses of each particular algorithm, but
the relationship between each prediction method and the actual in vivo process by which miRNAs have
functional interactions with their target mRNAs remains unclear or, at best, unproven. In summary, in
our view, each of the three methods, including the one in this work, falls substantially short of
capturing the full detail of physical, temporal, and spatial requirements of biologically significant
miRNA–mRNA interaction. As such, the target lists remain largely unproven but useful hypotheses.

The predicted targets are useful in practice for the design of experiments, as they increase the
efficiency of validation experiments by focusing on target lists significantly enriched in likely
targets compared to random. It is plausible that targets near the top of lists are the most likely to
lead to successful experiments. Task-specific filtering of target lists for particular planned
experiments is recommended, especially with respect to cooperativity of binding (more than one site for
one or more miRNAs on one gene transcript) and coincidence of expression, as new data on expression
patterns of miRNAs and mRNAs in different tissues become available. For example, a recommended
conservative approach to the design of experiments would use all available expression information and
restrict the predicted target genes to those with two or more target sites at normal threshold (S > 90)
or one target site with a higher threshold (S > 110), counting only sites with up to one G:U pair in
residues 2–8 counting from the 5′ end of the miRNA.

To take into account the rapid development of this field and the likely close interaction of theory and
experiment, we plan to periodically update our prediction method and parameters and make revised target
lists available on www.microrna.org. Next, we discuss some conceptual consequences of the composition of
our target list.

Discussion

How Widespread Is the Regulation of Translation by miRNA?

With plausible parameters, we have predicted that close to 9% (2,273 out of 23,531) of all mammalian
genes have more than one miRNA target site in their 3′ UTRs, with 1,314 being stronger candidates with
more than two target sites. This could well be an underestimate of the total number of genes subject to
miRNA regulation, as we have used a conservative conservation filter. On the other hand, a predicted
miRNA–mRNA pair can have a biological consequence only if both miRNA and mRNA are expressed at the same
time in the same cell and at sufficient concentration. The human genome has about 250 miRNA genes,
compared to about 35,000 protein genes. Thus, the determination that about 1% of genes (miRNAs) control
the expression of more than 10% of genes is a reasonable first-order estimate. It is currently not known
if any miRNAs control the expression of miRNA genes, i.e., the progression from miRNA transcript to
mature miRNA.

How Conserved in Evolution Are miRNA Targets?

As many miRNA sequences are detectably conserved across large evolutionary distances, they must be
subject to strong functional constraints. These constraints are unlikely to come
from single-site interactions with the target, as experimen- signal integration on target genes are key features of the
tally validated animal miRNAs rarely have perfectly matched control of translation by miRNAs. Neither multiplicity nor
target sites. Plausibly, the evolution of miRNAs is constrained cooperativity is a novel feature in the regulation of gene
by functional interactions with multiple targets. As a expression. Indeed, regulation by transcription factors
consequence, any compensatory mutation in the miRNA in appears to be characterized, at least in eukaryotes, by
response to mutations in a target site would be disruptive to analogous one-to-many and many-to-one relations between
the miRNA’s interaction with other target sites. Co-evolution regulating factor and regulated genes (Kadonaga 2004). We
of the miRNA sequence and all of its target sequences is are, of course, aware that the control cycles and feedback
therefore a rare event. With these assumptions, the con- loops involving miRNAs cannot be adequately described
straints on the local mRNA sequence of individual target sites without more detailed knowledge of the control of tran-
are weaker than those on the miRNA sequence. We were scription of miRNA genes, about which little is known at
therefore surprised to observe a substantial number of cases present.
(28.6% of the 2,273 targets) with 100% conservation of target
site sequence and with the target sites being within ten Mechanisms of miRNA Action
nucleotides of each other on the globally aligned UTRs of The role of a few animal miRNAs as posttranscriptional
orthologous genes between mammals. regulators of gene expression and, in particular, as inhibitors
Lacking more detailed knowledge of miRNA evolution, we of translation is well established. However, the molecular
draw two operational conclusions. (1) Conservation of target mechanism of action is not well understood. Posttranscrip-
site sequence and position is a practical information filter for tional control of protein levels can be achieved, for example,
(2) by cleaving the mRNA, by preventing RNP transport to ribosomes, by stalling or otherwise inhibiting translation on ribosomes, or by facilitating the formation of protein complexes near ribosomes that degrade nascent polypeptide chains. What do our results imply regarding the mechanism of action?
In analogy to plant miRNAs that have near perfect sequence complementarity and facilitate mRNA degradation, our predicted targets with near perfect complementarity between miRNA and mRNA plausibly are involved in mRNA cleavage (e.g., miR-196 and miR-138; see Results). Most of these would involve single target sites. In the case of Hox-B8, cleavage has been experimentally shown in mammalian cells (Yekta et al. 2004). We estimate that fewer than 5% of miRNA targets are cleaved as a result of miRNA binding.
Multiple target sites of lesser complementarity are consistent with RNP formation leading to translational inhibition, not mRNA degradation. Although we did predict single miRNA target sites for some genes, most target genes have multiple sites, indicating that cooperative binding (Doench and Sharp 2004) may be essential for formation of inhibitory RNP complexes.
An interesting and somewhat paradoxical feature is seen with mRNAs bound by FMRP, some of which are increased and some of which are decreased in polysome fractions in FMRP knock-out mice (Brown et al. 2001). We see no bias in which of these two sets is most enhanced as predicted miRNA targets. This ambiguity not only raises questions about details of FMRP regulation but also raises the possibility that miRNA targets may not always be translationally repressed and may instead be translationally enhanced.
predicted target sites, reducing the rate of false positives.
It is very likely that new miRNAs have continuously appeared in evolution (Lai 2003) at some non-negligible rate and that the set of targets for any given miRNA has lost or gained members, even between species as close as human and mouse. It is therefore important to develop prediction tools that do not rely on conservation filters or at least allow us to make them weaker. Work on this is in progress.
Multiplicity and Cooperativity
Regulation by miRNAs is obviously not as simple as one miRNA–one target gene, as perhaps the early examples (lin-4 and let-7) seemed to indicate. The distribution of predicted targets reflects more complicated combinatorics, both in terms of target multiplicity (more than one target per miRNA) and signal integration (more than one miRNA per target gene).
The distribution of the number of target genes (and target sites) per miRNA is highly nonuniform, ranging from zero for seven miRNAs to 268 for let-7b, with an average of 7.1 targets per miRNA. It is difficult to describe in detail, beyond the examples discussed in this text and beyond the annotation of target genes in Figure 2 and Table S3, which specific processes appear to be regulated by each miRNA or each set of co-expressed miRNAs. Groups of targets may reflect a reaction, a pathway, or a functional class (see Results). Although all miRNA–target pairs are subject to the condition of synchrony of expression, it is likely that typically one miRNA regulates the translation of a number of target messages and that, in some cases, the target genes as a group are involved in a particular cellular process. This was already known for the case of lin-4 (Ambros 2003).
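The two quantities discussed under Multiplicity and Cooperativity, the number of target genes per miRNA and the number of miRNAs per target gene, can be tabulated from a list of predicted pairs. The following is a minimal sketch under stated assumptions: the function name, the input format, and the toy pairs are hypothetical and are not taken from the miRanda software or from Table S3.

```python
from collections import defaultdict


def multiplicity(pairs):
    """Count distinct target genes per miRNA and distinct miRNAs per gene.

    `pairs` is an iterable of (miRNA, gene) tuples; duplicates are ignored.
    """
    targets = defaultdict(set)     # miRNA -> set of target genes
    regulators = defaultdict(set)  # gene -> set of miRNAs
    for mirna, gene in pairs:
        targets[mirna].add(gene)
        regulators[gene].add(mirna)
    targets_per_mirna = {m: len(g) for m, g in targets.items()}
    mirnas_per_gene = {g: len(m) for g, m in regulators.items()}
    return targets_per_mirna, mirnas_per_gene


# Hypothetical toy predictions, for illustration only.
pairs = [
    ("miR-A", "gene1"),
    ("miR-A", "gene2"),
    ("miR-B", "gene2"),
    ("miR-A", "gene1"),  # duplicate pair, counted once
]
per_mirna, per_gene = multiplicity(pairs)
```

Here `per_mirna` captures target multiplicity and `per_gene` captures signal integration; miRNAs with no predicted target do not appear in the input and would have to be added with a count of zero before averaging.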
The number of miRNA target sites per gene is also nonuniform, with a mean of 2.4. Although we do list target genes with single miRNA sites, there is increasing evidence that, in general, two or more sites are needed in the context of repression of translation. Although the details of these distributions (see Figure 2 and Table S3) depend on technical details, such as uniform cutoff for all miRNAs and evaluation in terms of a particular, imperfect scoring system, the general features of the distributions (see Figure 3) may be generally valid.
We conclude that multiplicity of targets and cooperative
Improvement of Prediction Rules
Current methods for predicting miRNA targets rely on conservation filters to reduce noise. Although the miRNA–mRNA pairings of experimentally validated targets were carefully used to define prediction rules (Enright et al. 2003; Lewis et al. 2003; Stark et al. 2003), the information content in sequence match scores and free energy estimates of RNA duplex formation appears to be low. What is missing? Perhaps the fine details of experimentally proven target site matches are incorrect, although in some experiments mismatches and insertions have been tested. More plausibly, the rules do not
yet capture additional functionally relevant interactions of miRNAs, such as in maturation and transport. Such additional interactions remain to be described in molecular detail, such as interactions with the small RNA processing machinery (Drosha and Dicer) and with the components of RNPs (AGO and FMRP). A first step in this direction is the very recent analysis of the crystal structure of a PAZ domain of a human Argonaute protein, eIF2c1, complexed with a 9-mer RNA oligonucleotide in dimer configuration, which may represent three-dimensional interactions for the 3′ end of a miRNA (and siRNA) complexed, e.g., with Dicer or AGO (Ma et al. 2004). In this structure, each PAZ domain makes close binding contact with nine nucleotides of a single-stranded RNA. The two 3′ terminal nucleotides bind in a pocket through RNA backbone and other contacts. The remaining seven nucleotides bind PAZ through a series of backbone contacts such that nucleotides 3 to 9 are in an RNA helical conformation with bases exposed for base pairing to the second single-stranded RNA. If a 20–21-nt single-stranded RNA is bound to a PAZ domain in the same fashion, the 5′ end would be free for other interactions, such as binding to another protein domain in the RISC or base-pairing to mRNA. The conformational entropy that results when the 3′ end binds to PAZ, because the RNA helix is pre-formed, is consistent with weaker base pairing between miRNA and mRNA at the 3′ end of the miRNA, and stronger base pairing at the 5′ end. The dimeric structure of the PAZ domain (Ma et al. 2004) also raises the tantalizing possibility of cooperative binding of a dimer of two miRNA–PAZ combinations to two target sites on one or more mRNAs. In such an arrangement, seven residues at the 3′ ends of the two miRNAs (residues 3–9, but not the terminal two nucleotides) are paired in antiparallel fashion, with near perfect complementary pairing.
As more details of molecular contacts become available, prediction rules will evolve and improve in accuracy. The following elements are worth considering in the next generation of target prediction rules: (1) details of strand bias as deduced from siRNA experiments (Khvorova et al. 2003), (2) contribution of sequences outside of the mRNA target sites, (3) refinement of position-dependent rules, including different gap penalties for the mRNA and the miRNA, (4) energetics of miRNA–protein binding, starting with PAZ domain interaction, and (5) translation of systematic mutational profiling experiments into scoring rules (Doench and Sharp 2004).
Principles of Regulation by miRNAs
Although the predicted targets are subject to error (see estimate of false positives) and the prediction rules are in need of improvement, several general principles of gene regulation by miRNAs are emerging. (1) Except in cases where a highly complementary match causes cleavage of the target message, miRNAs appear to act cooperatively, requiring two or more target sites per message, for either one or several different miRNAs. (2) Most miRNAs are involved in the translational regulation of several target genes, which in some cases are grouped into functional categories. (3) miRNAs carried in the context of RNPs appear to be sequence-specific adaptors guiding RNPs to particular target sequences. miRNA regulation of cellular messages may therefore range from a switch-like behavior (e.g., cleavage of mRNA message) to a subtle modulation of protein dosage in a cell through low-level translational repression (Bartel and Chen 2004).
These aspects of miRNA regulation complicate the design of experiments aiming at testing target predictions, or, more generally, at discovering biologically meaningful targets. Straightforward experiments that test one target site for one miRNA on one UTR will not be able to disentangle the effects of multiplicity or cooperativity. Tests for multiple sites on one UTR for one miRNA capture aspects of cooperativity (Doench and Sharp 2004), but still do not capture signal integration by diverse miRNAs. The most complicated situation is one in which multiple miRNAs affect multiple genes in combinatorial fashion, with fine-tuning depending on the state of the cell. We look forward to the results of ingenious experiments designed to deal with the complexity of miRNA regulation.
The results of this genome-wide prediction for mammals and fish are meant to be a guide to experiments that will in time elucidate the genetic control network of regulators of transcription, translation/maturation, and degradation of gene products, including miRNAs.
Materials and Methods
miRNA sequences. Mature human and mouse miRNA sequences were obtained from the RFAM miRNA registry (Griffiths-Jones 2004). To cover cases of incomplete data, any mouse miRNA sequence not (yet) described in humans was assumed to be present in human, with the same sequence, and vice versa. Similarly, all mouse miRNAs were assumed to be identical and present in the rat genome. These assumptions are reasonable as sequence identity for known orthologous pairs in human and mouse is, on average, 98% (with 110 out of 146 orthologous sequences being identical). In total, 218 mammalian miRNAs were used. For human target searches, 162 native miRNA sequences were available plus 17 mouse and 39 rat miRNA sequences; for mouse, 191 native, 14 human, and 13 rat sequences; and for rat, 45 native, 159 mouse, and 14 human miRNA sequences.
Mature miRNA sequences for zebrafish and fugu were predicted starting from known human and mouse miRNA precursor sequences (Ambros et al. 2003a). Each precursor sequence was used, in a scan against the zebrafish supercontigs (release 18.2.1) using NCBI BLASTN (version 2.2.6; E-value cutoff, 2.0) (Altschul et al. 1990), to identify a sequence segment containing the potential zebrafish miRNA. The mammalian and fish segments were then realigned using a global alignment protocol (ALIGN in the FASTA package, version 2u65; Pearson and Lipman 1988). After testing the potential fish miRNA precursors for foldback structures (Zuker 2003), the final set of 225 predicted zebrafish miRNAs was selected. The same set of sequences was used for fugu.
3′ UTR sequences. The Ensembl database (Birney et al. 2004) served as the source of genomic data. The Ensembl BioPerl application programming interface was used to generate 3′ UTR sequences for all transcripts of all genes from each genome. Some transcripts are alternatively spliced from the same gene, so the total number of genes is smaller than the number of transcripts (Table 3). When no Ensembl annotated 3′ UTR sequences were available, we predicted 3′ UTRs by taking 4,000 bp of genomic sequence downstream of the end of the last exon of a transcript (Table 3). If this predicted region overlapped coding sequence on either strand, we halted 3′ UTR extension at that point.
UTR orthology and alignment. Orthology mappings between genes from different genomes were obtained using “orthologue tables” from the EnsMart (Kasprzyk et al. 2004) feature of the Ensembl database. Pairs of orthologous UTRs were aligned with each other using the AVID (Bray et al. 2003) alignment algorithm to facilitate analysis of conservation of position and sequence of target sites. In total, 26,205 human transcripts, representing 15,869 genes, were mapped to both mouse and rat transcripts. For zebrafish, 11,442 transcripts, representing 10,909 genes, were mapped to fugu transcripts and 11,306 transcripts mapped to human transcripts (10,063 genes).
miRNA target prediction. The miRanda algorithm (version 1.0; Enright et al. 2003) was used to scan all available miRNA sequences
for a given genome against 3′ UTR sequences of that genome derived from the Ensembl database and, tabulated separately, against all cDNA sequences and coding regions. The algorithm uses dynamic programming to search for maximal local complementarity alignments, corresponding to a double-stranded antiparallel duplex. A score of +5 was assigned for G:C and A:T pairs, +2 for G:U wobble pairs, and −3 for mismatch pairs, and the gap-open and gap-elongation parameters were set to −8.0 and −2.0, respectively. To significantly increase the speed of miRanda runs, in calculating the optimal alignment score at positions i, j in the alignment scoring matrix, the gap-elongation parameter was used only if the extension to i, j of a given stretch of gaps ending at positions i–1, j or j–1, i (but not of stretches of gaps ending at i–k, j or j, i–k for k > 1) resulted in a higher score than the addition of a nucleotide–nucleotide match at positions i, j. Removal of this restriction with the availability of more computing power would result in a moderate increase in average loop length, but the advantages of this would probably be superseded by overall refinement of target prediction rules. Importantly, complementarity scores at the first eleven positions, counting from the miRNA 5′ end, were multiplied by a scaling factor of 2.0, so as to approximately reflect the experimentally observed 5′–3′ asymmetry; for example, G:C and A:T base pairs contributed +10 to the match score in these positions. The value of the scaling factor at each position is an adjustable parameter subject to optimization as more experimental information becomes available. Because of the ongoing discussion about the rules for target prediction, target genes (a total of 490) that contained target sites with more than one G:U wobble in the 5′ end are flagged in Table S2. The thresholds for candidate target sites were S > 90 and ΔG < −17 kcal/mol, where S is the sum of single-residue-pair match scores over the alignment trace and ΔG is the free energy of duplex formation from a completely dissociated state, calculated using the Vienna package as in Enright et al. (2003).
analysis, including analysis of conservation of target site sequence and position in orthologous 3′ UTRs. A total of ten randomized experiments were performed. Counts were averaged across all experiments, and the standard deviation and other statistical measures were calculated.
Analysis of FMRP-associated mRNAs. We compiled a list of 464 gene identifiers of FMRP-associated mRNAs from five different publications (Brown et al. 2001; Chen et al. 2003; Denman 2003; Miyashiro et al. 2003; Waggoner and Liebhaber 2003). Among the 464 gene identifiers, 397 identifiers were mapped to the corresponding genes in our 3′ UTR dataset. The remaining 67 genes were not mapped because their published identifiers were obsolete, primarily because of their Affymetrix probeset identification numbers. To identify miRNA regulation of the 397 FMRP-associated mRNAs, these genes were then compared with the set of predicted miRNA targets.
CPE motif prediction. We predicted CPE motifs in human, mouse, and rat UTRs. We used a search pattern using four criteria: (1) presence of the CPE motif UUUUAU, (2) presence of the hexanucleotide AAUAAA, (3) the CPE and the hexanucleotide motif being within 100 nucleotides of each other, and (4) the conservation of these motifs and the positions of the motifs in the mouse ortholog (Mendez and Richter 2001).
Supporting Information
Figure S1. Overrepresentation of the GO and Interpro Domains
Found at DOI: 10.1371/journal.pbio.0020363.sg001 (347 KB PDF).
Table S1. Human miRNAs in Introns
Found at DOI: 10.1371/journal.pbio.0020363.st001 (25 KB XLS).
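The match scores, the 5′ scaling factor, and the thresholds given in the miRNA target prediction paragraph can be illustrated with a simplified scorer. This is a minimal sketch and not the miRanda implementation: it scores an ungapped duplex only, skipping the dynamic programming and the gap penalties; all function and constant names are hypothetical; it assumes that mismatch penalties are scaled in the first eleven positions in the same way as match scores; and the free energy is taken as an input rather than computed with the Vienna package.

```python
# Per-pair scores as stated in the text (T is treated as U here).
PAIR_SCORES = {
    ("G", "C"): 5, ("C", "G"): 5,
    ("A", "U"): 5, ("U", "A"): 5,
    ("G", "U"): 2, ("U", "G"): 2,  # G:U wobble
}
MISMATCH = -3
SCALE_5P = 2.0          # scaling factor for the miRNA 5' region
SCALED_POSITIONS = 11   # first eleven positions from the miRNA 5' end
SCORE_THRESHOLD = 90    # S > 90
ENERGY_THRESHOLD = -17  # delta G < -17 kcal/mol


def duplex_score(mirna_5to3, target_3to5):
    """Score an ungapped duplex.

    The miRNA is read 5'->3' and the target site 3'->5', so that
    index i of one string pairs with index i of the other.
    """
    mirna = mirna_5to3.upper().replace("T", "U")
    target = target_3to5.upper().replace("T", "U")
    total = 0.0
    for i, (m, t) in enumerate(zip(mirna, target)):
        score = PAIR_SCORES.get((m, t), MISMATCH)
        if i < SCALED_POSITIONS:
            score *= SCALE_5P  # assumption: mismatches are scaled too
        total += score
    return total


def passes_thresholds(score, delta_g):
    """Apply the candidate-site thresholds S > 90 and delta G < -17."""
    return score > SCORE_THRESHOLD and delta_g < ENERGY_THRESHOLD


# A 22-nt example sequence paired with its perfect complement.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
example_target = "".join(COMPLEMENT[b] for b in example_mirna)
example_score = duplex_score(example_mirna, example_target)
```

For a perfectly complementary 22-nt duplex the first eleven pairs contribute 10 each and the remaining eleven contribute 5 each, which is why even a perfect match over only the 5′ half (110) already exceeds the score threshold.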
Table S2. Predicted Mammalian miRNA Targets by Gene
Found at DOI: 10.1371/journal.pbio.0020363.st002 (8.0 MB XLS).
Table S3. Predicted Mammalian miRNA Targets by miRNA
Found at DOI: 10.1371/journal.pbio.0020363.st003 (17.0 MB XLS).
Table S4. Predicted Fish Targets by Gene
Found at DOI: 10.1371/journal.pbio.0020363.st004 (5.6 MB XLS).
Table S5. Predicted Fish Targets by miRNA
Found at DOI: 10.1371/journal.pbio.0020363.st005 (9.8 MB XLS).
Table S6. High-Scoring miRNA Matches in Human cDNAs
Found at DOI: 10.1371/journal.pbio.0020363.st006 (601 KB XLS).
Table S7. High-Scoring miRNA Matches in Human Coding Regions
Found at DOI: 10.1371/journal.pbio.0020363.st007 (512 KB XLS).
After finding optimal local matches above these thresholds between a particular miRNA and the set of 3′ UTRs in each genome, we asked whether target site position and sequence for this miRNA were conserved in the 3′ UTRs of orthologous genes, i.e., between human and mouse or rat, or between fugu and zebrafish. The alignments of target sites were generated transitively (UTR→miRNA→UTR) via a shared (or homologous) miRNA. We required that the positions of pairs of target sites in two species fall within ±10 residues in the aligned 3′ UTRs. Conserved target sites with sequence identity of 90% or more (human versus mouse or rat) and 70% or more (zebrafish versus fugu) were selected as candidate miRNA target sites and stored in a MySQL database. Using human as the reference species, we predicted 10,572 conserved target sites (conserved in either mouse or rat) in 4,463 human transcripts, of which 2,307 transcripts of 2,273 genes contained more than one target site. Similarly, using zebrafish as a reference species, we predicted 7,057 conserved target sites (conserved in fugu) in 4,820 zebrafish transcripts.
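The conservation requirement on target sites (positions within ±10 residues in the aligned 3′ UTRs, and sequence identity of at least 90% for mammals or 70% for fish) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, positions are assumed to be coordinates in the UTR alignment, and the two site sequences are assumed to be already aligned to equal length.

```python
def percent_identity(site_a, site_b):
    """Percentage of identical positions between two equal-length sites."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(site_a, site_b))
    return 100.0 * matches / len(site_a)


def is_conserved(pos_ref, pos_orth, site_ref, site_orth,
                 max_shift=10, min_identity=90.0):
    """Check positional and sequence conservation of a pair of target sites.

    max_shift=10 and min_identity=90.0 follow the mammalian criteria in the
    text; use min_identity=70.0 for the zebrafish versus fugu comparison.
    """
    close_enough = abs(pos_ref - pos_orth) <= max_shift
    similar_enough = percent_identity(site_ref, site_orth) >= min_identity
    return close_enough and similar_enough


# Hypothetical 20-nt sites, for illustration only.
site_human = "ACGUACGUACGUACGUACGU"
site_mouse = site_human[:-1] + "A"                       # 1 difference, 95%
site_fish = "U" + site_human[1:17] + "G" + site_human[18] + "A"  # 3 differences, 85%
```

With these toy sites, `is_conserved(100, 104, site_human, site_mouse)` passes the mammalian filter, whereas `site_fish` passes only under the 70% fish threshold.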
Table S8. Estimate of False Positives
Found at DOI: 10.1371/journal.pbio.0020363.st008 (23 KB XLS).
Table S9. Predicted Targets That Are Associated with FMRP
Found at DOI: 10.1371/journal.pbio.0020363.st009 (678 KB XLS).
To focus on the strongest predictions, conserved target sites for each miRNA were sorted according to alignment score, with free energy as the secondary sort criterion. In cases where multiple miRNAs targeted the same site on a transcript (or within 25 nt of a site), only the highest scoring, lowest energy miRNA was reported for that site.
Table S10. Function of Targets by Interpro and GO Mapping
Found at DOI: 10.1371/journal.pbio.0020363.st010 (357 KB XLS).
Table S11. Target Genes That Contain Predicted CPE Motifs
Found at DOI: 10.1371/journal.pbio.0020363.st011 (529 KB XLS).
Table S12. Conserved Vertebrate Target Genes
Found at DOI: 10.1371/journal.pbio.0020363.st012 (621 KB XLS).
Table S13. Overlap of the Predicted Targets with Validated Gene Targets from Lewis et al. (2003) and Kiriakidou et al. (2004).
Found at DOI: 10.1371/journal.pbio.0020363.st013 (68 KB XLS).
Functional analysis of targets. To facilitate surveys of target function and analysis of functional enrichment, InterPro domain assignments (Mulder et al. 2003) and GO (molecular function hierarchy) mappings (Ashburner and Lewis 2002) for all human genes were obtained using EnsMart. For each functional class derived from either source, we calculated its degree of under- or over-representation, F_class, using the log-odds ratio of the fraction of annotated target genes with the same class (F1) and the fraction of all annotated Ensembl human genes with that class (F2):
$$F_{\mathrm{class}} = \log_2 \frac{F_1}{F_2}, \quad \text{where } F_1 = \frac{N^{\mathrm{class}}_{\mathrm{tar}}}{\sum_{i=1}^{C_{\mathrm{tar}}} N^{i}_{\mathrm{tar}}} \text{ and } F_2 = \frac{N^{\mathrm{class}}_{\mathrm{all}}}{\sum_{i=1}^{C_{\mathrm{all}}} N^{i}_{\mathrm{all}}} \qquad (1)$$
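The log-odds measure of Equation 1 can be computed directly from per-class gene counts. The following is a minimal sketch, assuming that the counts are available as dictionaries mapping a functional class to the number of annotated genes in it; the function name and the toy counts are hypothetical and do not come from the paper's data.

```python
import math


def class_enrichment(target_counts, all_counts, cls):
    """Log2 ratio of the class fraction among targets to that among all genes.

    target_counts and all_counts map class name -> number of annotated genes.
    A positive value indicates over-representation among targets.
    """
    f1 = target_counts[cls] / sum(target_counts.values())
    f2 = all_counts[cls] / sum(all_counts.values())
    return math.log2(f1 / f2)


# Hypothetical counts, for illustration only.
target_counts = {"transcription factor": 20, "kinase": 20, "other": 60}
all_counts = {"transcription factor": 100, "kinase": 200, "other": 700}

tf_score = class_enrichment(target_counts, all_counts, "transcription factor")
kinase_score = class_enrichment(target_counts, all_counts, "kinase")
```

In this toy example the transcription factor class makes up 20% of targets but 10% of all genes, giving a score of 1 (a twofold enrichment), while the kinase class scores 0. A class with zero target genes would need separate handling, because the logarithm of zero is undefined.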
Author contributions. CS and DSM conceived and directed the project. BJ, AJE, CS, and DSM worked on the algorithm. BJ, AJE, and DSM prepared the input data. BJ and AJE wrote the computer programs. BJ, AA, TT, CS, and DSM analyzed the output data. TT contributed microRNA expression data. BJ, CS, and DSM wrote the paper with assistance from AJE.
Lee Y, Ahn C, Han J, Choi H, Kim J, et al. (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415–419.
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115: 787–798.
Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP (2003a) Vertebrate microRNA genes. Science 299: 1540.
Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, et al. (2003b) The microRNAs of Caenorhabditis elegans. Genes Dev 17: 991–1008.
Lin SY, Johnson SM, Abraham M, Vella MC, Pasquinelli A, et al. (2003) The C. elegans hunchback homolog, hbl-1, controls temporal patterning and is a probable microRNA target. Dev Cell 4: 639–650.
Lingel A, Simon B, Izaurralde E, Sattler M (2003) Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426: 465–469.
Llave C, Kasschau KD, Rector MA, Carrington JC (2002a) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14: 1605–1619.
Llave C, Xie Z, Kasschau KD, Carrington JC (2002b) Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297: 2053–2056.
Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U (2003) Nuclear export of microRNA precursors. Science 303: 95–98.
Ma JB, Ye K, Patel DJ (2004) Structural basis for overhang specific small interfering RNA recognition by the PAZ domain. Nature 429: 318–322.
Martinez J, Patkaniowska A, Urlaub H, Luhrmann R, Tuschl T (2002) Single-stranded antisense siRNAs guide target RNA cleavage in RNAi. Cell 110: 563–574.
Mendez R, Richter JD (2001) Translational control by CPEB: A means to the end. Nat Rev Mol Cell Biol 2: 521–529.
Mendez R, Hake LE, Andresson T, Littlepage LE, Ruderman JV, et al. (2000) Phosphorylation of CPE binding factor by Eg2 regulates translation of c-mos mRNA. Nature 404: 302–307.
Mendez R, Barnard D, Richter JD (2002) Differential mRNA translation and meiotic progression require Cdc2-mediated CPEB destruction. EMBO J 21: 1833–1844.
Mette MF, van der Winden J, Matzke M, Matzke AJ (2002) Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130: 6–9.
Metzler M, Wilda M, Busch K, Viehmann S, Borkhardt A (2004) High expression of precursor microRNA-155/BIC RNA in children with Burkitt lymphoma. Genes Chromosomes Cancer 39: 167–169.
Michael MZ, O’Connor SM, van Holst Pellekaan NG, Young GP, James RJ (2003) Reduced accumulation of specific microRNAs in colorectal neoplasia. Mol Cancer Res 1: 882–891.
Miyashiro KY, Beckel-Mitchener A, Purk TP, Becker KG, Barret T, et al. (2003) RNA cargoes associating with FMRP reveal deficits in cellular functioning in Fmr1 null mice. Neuron 37: 417–431.
Moss EG, Tang L (2003) Conservation of the heterochronic regulator Lin-28, its developmental expression and microRNA complementary sites. Dev Biol 258: 432–442.
Moss EG, Lee RC, Ambros V (1997) The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88: 637–646.
Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, et al. (2002) miRNPs: A novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16: 720–728.
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31: 315–318.
Muller S, Ledl A, Schmidt D (2004) SUMO: A regulator of gene expression and genome integrity. Oncogene 23: 1998–2008.
Pal-Bhadra M, Bhadra U, Birchler JA (2002) RNAi related mechanisms affect both transcriptional and posttranscriptional transgene silencing in Drosophila. Mol Cell 9: 315–327.
Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, et al. (2003) Control of leaf morphogenesis by microRNAs. Nature 425: 257–263.
Pan H, Qin WX, Huo KK, Wan DF, Yu Y, et al. (2001) Cloning, mapping, and characterization of a human homologue of the yeast longevity assurance gene LAG1. Genomics 77: 58–64.
Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12: 1484–1495.
Pasterkamp RJ, Verhaagen J (2001) Emerging roles for semaphorins in neural regeneration. Brain Res Brain Res Rev 35: 36–54.
Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 85: 2444–2448.
Perrone-Bizzozero N, Bolognani F (2002) Role of HuD and other RNA-binding proteins in neural development and plasticity. J Neurosci Res 68: 121–126.
Pfeffer S, Zavolan M, Grasser FA, Chien M, Russo JJ, et al. (2004) Identification of virus-encoded microRNAs. Science 304: 734–736.
Rajagopalan LE, Malter JS (2000) Growth factor-mediated stabilization of amyloid precursor protein mRNA is mediated by a conserved 29-nucleotide sequence in the 3′-untranslated region. J Neurochem 74: 52–59.
Rajewsky N, Socci ND (2003) Computational identification of microRNA targets. Dev Biol 267: 529–535.
Ramos A, Hollingworth D, Pastore A (2003) G-quartet-dependent recognition between the FMRP RGG box and RNA. RNA 9: 1198–1207.
Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R (2004) Fast and effective prediction of microRNA/target duplexes. RNA. In press.
Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, et al. (2000) The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403: 901–906.
Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16: 1616–1626.
Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, et al. (2004) Rational siRNA design for RNA interference. Nat Biotechnol 22: 326–330.
Richter JD (2001) Think globally, translate locally: What mitotic spindles and neuronal synapses have in common. Proc Natl Acad Sci U S A 98: 7069–7071.
Sasaki T, Shiohama A, Minoshima S, Shimizu N (2003) Identification of eight members of the Argonaute family in the human genome. Genomics 82: 323–330.
Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, et al. (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115: 199–208.
Seitz H, Youngson N, Lin SP, Dalbert S, Paulsen M, et al. (2003) Imprinted microRNA genes transcribed antisense to a reciprocally imprinted retrotransposon-like gene. Nat Genet 34: 261–262.
Sempere LF, Freemantle S, Pitha-Rowe I, Moss E, Dmitrovsky E, et al. (2004) Expression profiling of mammalian microRNAs uncovers a subset of brain-expressed microRNAs with possible roles in murine and human neuronal differentiation. Genome Biol 5: R13.
Shiseki M, Nagashima M, Pedeux RM, Kitahama-Shiseki M, Miura K, et al. (2003) p29ING4 and p28ING5 bind to p53 and p300, and enhance p53 activity. Cancer Res 63: 2373–2378.
Song JJ, Liu J, Tolia NH, Schneiderman J, Smith SK, et al. (2003) The crystal structure of the Argonaute2 PAZ domain reveals an RNA binding motif in RNAi effector complexes. Nat Struct Biol 10: 1026–1032.
Stark A, Brennecke J, Russell RB, Cohen SM (2003) Identification of Drosophila microRNA targets. PLoS Biol 1: e60.
Steward O, Schuman EM (2003) Compartmentalized synthesis and degradation of proteins in neurons. Neuron 40: 347–359.
Stuart JM, Segal E, Koller D, Kim SK (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302: 249–255.
Tan L, Chang JS, Costa A, Schedl P (2001) An autoregulatory feedback loop directs the localized expression of the Drosophila CPEB protein Orb in the developing oocyte. Development 128: 1159–1169.
Tapper J, Kettunen E, El-Rifai W, Seppala M, Andersson LC, et al. (2001) Changes in gene expression during progression of ovarian carcinoma. Cancer Genet Cytogenet 128: 1–6.
Tenenbaum SA, Carson CC, Lager PJ, Keene JD (2000) Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proc Natl Acad Sci U S A 97: 14085–14090.
Todd PK, Mack KJ, Malter JS (2003) The fragile X mental retardation protein is required for type-I metabotropic glutamate receptor-dependent translation of PSD-95. Proc Natl Acad Sci U S A 100: 14374–14378.
Tuschl T, Zamore PD, Lehmann R, Bartel DP, Sharp PA (1999) Targeted mRNA degradation by double-stranded RNA in vitro. Genes Dev 13: 3191–3197.
Vella MC, Choi EY, Lin SY, Reinert K, Slack FJ (2004) The C. elegans microRNA let-7 binds to imperfect let-7 complementary sites from the lin-41 3′UTR. Genes Dev 18: 132–137.
Waggoner SA, Liebhaber SA (2003) Identification of mRNAs associated with alphaCP2-containing RNP complexes. Mol Cell Biol 23: 7055–7067.
Wang Y, Hung C, Koh D, Cheong D, Hooi SC (2001) Differential expression of Hox A5 in human colon cancer cell differentiation: A quantitative study using real-time RT-PCR. Int J Oncol 18: 617–622.
Woodside KJ, Shen H, Muntzel C, Daller JA, Sommers CL, et al. (2004) Expression of Dlx and Lhx family homeobox genes in fetal thymus and thymocytes. Gene Expr Patterns 4: 315–320.
Xu P, Vernooy SY, Guo M, Hay BA (2003) The Drosophila microRNA mir-14 suppresses cell death and is required for normal fat metabolism. Curr Biol 13: 790–795.
Yan KS, Yan S, Farooq A, Han A, Zeng L, et al. (2003) Structure and conserved RNA binding of the PAZ domain. Nature 426: 468–474.
Yart A, Mayeux P, Raynal P (2003) Gab1, SHP-2 and other novel regulators of Ras: Targets for anticancer drug discovery? Curr Cancer Drug Targets 3: 177–192.
Yekta S, Shih IH, Bartel DP (2004) MicroRNA-directed cleavage of HOXB8 mRNA. Science 304: 594–596.
Yi R, Qin Y, Macara IG, Cullen BR (2003) Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17: 3011–3016.
Yokota N, Mainprize TG, Taylor MD, Kohata T, Loreto M, et al. (2004) Identification of differentially expressed and developmentally regulated genes in medulloblastoma using suppression subtraction hybridization. Oncogene 23: 3444–3453.
Zamore PD, Tuschl T, Sharp PA, Bartel DP (2000) RNAi: Double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101: 25–33.
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31: 3406–3415.