Downloaded from genome.cshlp.org on July 10, 2009 - Published by Cold Spring Harbor Laboratory Press

Positional conservation and amino acids shape the correct diagnosis and population frequencies of benign and damaging personal amino acid mutations
Sudhir Kumar, Michael P. Suleski, Glenn J. Markov, et al.

Genome Res. published online June 22, 2009

Access the most recent version at doi:10.1101/gr.091991.109

Published online June 22, 2009 in advance of the print journal.

Advance online articles have been peer reviewed and accepted for publication but have not yet
appeared in the paper journal (edited, typeset versions may be posted when available prior to final
publication). Advance online articles are citable and establish publication priority; they are indexed
by PubMed from initial publication. Citations to Advance online articles must include the digital
object identifier (DOIs) and date of initial publication.

To subscribe to Genome Research go to: http://genome.cshlp.org/subscriptions

Copyright © 2009 by Cold Spring Harbor Laboratory Press

Letter

Positional conservation and amino acids shape the correct diagnosis and population frequencies of benign and damaging personal amino acid mutations
Sudhir Kumar,1,2,3 Michael P. Suleski,1 Glenn J. Markov,1 Simon Lawrence,1 Antonio Marco,1 and Alan J. Filipski1
1 Center for Evolutionary Functional Genomics, Biodesign Institute, Arizona State University, Tempe, Arizona 85287-5301, USA;
2 School of Life Sciences, Arizona State University, Tempe, Arizona 85287-4501, USA

As the cost of DNA sequencing drops, we are moving beyond one genome per species to one genome per individual to improve prevention, diagnosis, and treatment of disease by using personal genotypes. Computational methods are frequently applied to predict impairment of gene function by nonsynonymous mutations in individual genomes and single nucleotide polymorphisms (nSNPs) in populations. These computational tools are, however, known to fail 15%–40% of the time. We find that accurate discrimination between benign and deleterious mutations is strongly influenced by the long-term (among species) history of positions that harbor those mutations. Successful prediction of known disease-associated mutations (DAMs) is much higher for evolutionarily conserved positions and for original–mutant amino acid pairs that are rarely seen among species. Prediction accuracies for nSNPs show opposite patterns, forecasting impediments to building diagnostic tools aiming to simultaneously reduce both false-positive and false-negative errors. The relative allele frequencies of mutations diagnosed as benign and damaging are predicted by positional evolutionary rates. These allele frequencies are modulated by the relative preponderance of the mutant allele in the set of amino acids found at homologous sites in other species (evolutionarily permissible alleles [EPAs]). The nSNPs found in EPAs are biochemically less severe than those missing from EPAs across all allele frequency categories. Therefore, it is important to consider position evolutionary rates and EPAs when interpreting the consequences and population frequencies of human mutations. The impending sequencing of thousands of human and many more vertebrate genomes will lead to more accurate classifiers needed in real-world applications.
[Supplemental material is available online at http://www.genome.org.]
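
The notion of evolutionarily permissible alleles (EPAs) defined in the abstract can be made concrete with a small sketch. It assumes that one column of a multiple sequence alignment is available as a mapping from species name to the residue at the homologous position. The function names, the gap symbols, and the toy column are illustrative assumptions, not the authors' pipeline.

```python
from typing import Mapping, Set

# Assumed markers for alignment gaps or missing data (hypothetical convention).
GAP_SYMBOLS = {"-", "X", "?"}


def permissible_alleles(column: Mapping[str, str], reference: str = "human") -> Set[str]:
    """Return the residues seen at this position in species other than the reference."""
    alleles = set()
    for species, residue in column.items():
        residue = residue.upper()
        if species != reference and residue not in GAP_SYMBOLS:
            alleles.add(residue)
    return alleles


def is_epa(mutant: str, column: Mapping[str, str], reference: str = "human") -> bool:
    """True when the mutant residue is among the residues observed in other species."""
    return mutant.upper() in permissible_alleles(column, reference)


# Hypothetical alignment column (not data from the paper).
toy_column = {"human": "R", "chimp": "R", "mouse": "K", "chicken": "K", "frog": "-"}

print(sorted(permissible_alleles(toy_column)))  # residues found in non-human species
print(is_epa("K", toy_column))  # K is observed in mouse and chicken
print(is_epa("W", toy_column))  # W is not observed in any other species
```

Under this reading, a human mutation to K at the toy position would count as an EPA, while a mutation to W would not.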

3 Corresponding author. E-mail [email protected]; fax (480) 727-6947.
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.091991.109.
Genome Research 19:000–000 © 2009 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/09; www.genome.org

Unshrouding the mysteries of human genome variation is the essential precursor to the development of personalized medicine where the aim is to relate the genotype with the phenotype in better understanding an individual’s susceptibility to disease and response to treatment. Already, complete genomes from many individual humans have been sequenced, and projects are underway to expand that number to over a thousand genomes in the near future (Levy et al. 2007; Bentley et al. 2008; Wang et al. 2008; Wheeler et al. 2008). These projects have revealed that every individual carries thousands of amino acid–altering (nonsynonymous) nucleotide mutations and that a large number of these mutations are novel in terms of their location and the type of amino acid change induced. Experimental and other functional information is rarely available for the association of phenotypic effect with these mutations, so computational methods are used instead (e.g., Miller and Kumar 2001; Ramensky et al. 2002; Ng and Henikoff 2003; Shastry 2007; Tian et al. 2007; Lohmueller et al. 2008). These in silico predictions are of great interest in detecting variants for Mendelian and complex diseases, in prioritizing polymorphisms for experimental research in humans and other species, and in analyzing data from genome-wide association studies (e.g., Rudd et al. 2005; Bhatti et al. 2006; Kryukov et al. 2007; Doniger et al. 2008). Using various prediction tools, up to one-fourth of nonsynonymous mutations have been diagnosed to be not strictly-neutral and are thus thought to harbor signatures of negative or positive selection (Yampolsky et al. 2005; Eyre-Walker et al. 2006; Levy et al. 2007; Shastry 2007; Bentley et al. 2008; Boyko et al. 2008; Wang et al. 2008; Wheeler et al. 2008).

De novo methods for predicting the functional effects of novel mutations often do not directly incorporate many biological attributes (e.g., interactions among multiple sites or genes, environmental influences on phenotypes, and allele state in the paired chromosome) because of the lack of information and the difficulty in modeling them mathematically. Still, these methods offer up to 80% accuracy for mutations in genes implicated in Mendelian diseases (for reviews, see Bhatti et al. 2006; Ng and Henikoff 2006; Bromberg and Rost 2007; Shastry 2007; Tian et al. 2007). PolyPhen is the most widely used method for estimating potential deleterious effects of amino acid mutations; it is available as a web-based service, and it relies on information from sequence conservation, physicochemical differences, proximity of mutations to predicted functional domains, and structural features (Sunyaev et al. 1999; Ramensky et al. 2002). PolyPhen and SIFT (Ng and Henikoff 2003) have been used in hundreds of studies, including the evaluation of nonsynonymous single nucleotide polymorphisms (nSNPs) found in complete genomes. Many other approaches have been proposed over the last decade, but these are not yet widely used (e.g., Bromberg and Rost 2007; Tian et al. 2007; Cheng et al. 2008).

In recent years, scientists have employed many strategies in efforts to build super-classifiers, using sophisticated computational approaches to improve the accuracy of computational prediction
tools in diagnosing known disease-associated mutations (DAMs) to be function-altering (damaging; true-positives) and nSNPs to be neutral (benign; true-negatives). These strategies have resulted in some gains compared with classical methods such as PolyPhen and SIFT (e.g., Bromberg and Rost 2007; Tian et al. 2007). However, the anatomies of the misdiagnoses of different types of DAMs and nSNPs remain poorly understood, as the primary correlates of the observed failures are yet to be explored. With a focus on PolyPhen and the comparison of the observed patterns with those from SIFT, we have taken an evolutionary approach in examining the patterns of successes and failures of mutational diagnosis. Given that PolyPhen (and other methods) already considers a host of evolutionary and primary sequence attributes in making decisions, it is reasonable to work with the null hypothesis that the accuracy of correct prediction is similar for mutations occurring at positions evolving with different evolutionary rates and that the accuracy of the correct prediction is similar for different original and mutated amino acids. The choice of evolutionary conservation of positions, original amino acids, and mutant alleles reflects practical considerations, because these three attributes are readily available or calculable for all mutations. Other factors, such as secondary, tertiary, and higher structures undoubtedly play an important role, but that information is not yet available for an overwhelming proportion of known DAMs and nSNPs for human populations.

Results

We begin with a report on the accuracy of correctly diagnosing known DAMs implicated in Mendelian diseases (>9000 DAMs from >500 genes) (Supplemental Fig. S1). These DAMs were subjected to the most recent version of the PolyPhen web service, which classifies them into three categories (benign, possibly-damaging, and probably-damaging) based on the logarithmic ratios of the likelihood of occurrence of a given DAM at the specific position and the likelihood of that amino acid occurring at any position (Ramensky et al. 2002). A probably-damaging designation indicates that the mutation’s chance of affecting protein function is the highest, whereas a benign designation suggests little or no putative impact on the protein function.

PolyPhen designated 60% of DAMs to be probably-damaging, which is the correct inference in this case (Fig. 1A). However, 21% of DAMs were identified to be benign, which provides a lower limit on the false-negative rate of inference. Similar accuracies are reported in other studies, as well (Ng and Henikoff 2006; Chan et al. 2007; Tian et al. 2007; Cheng et al. 2008; Lohmueller et al. 2008). Pooling of the benign and possibly-damaging (ambiguous) diagnoses increases the false-negative rate for DAMs to 41%, while the pooling of the possibly-damaging and probably-damaging categories increases the DAM-prediction accuracy of PolyPhen to 79%. However, it appears to be more prudent to use only the probably-damaging category to represent the correct inference for DAMs, because PolyPhen classified a very similar fraction of DAMs and nSNPs into the possibly-damaging category (20% and 18%, respectively) (cf. Figs. 1A and 2A). We compared PolyPhen results with those obtained from SIFT, which classifies mutations into only two categories: tolerant or not-tolerant (Ng and Henikoff 2006). SIFT designated 21% of DAMs to be tolerant, which is similar to the DAM misclassification rates in PolyPhen (Supplemental Fig. S2).

For identifying the correlates of successes and failures in diagnosing DAMs, we first examined the accuracy of correct diagnoses for genes having different functions (as reflected in the gene ontology). We classified each gene into one or more of 13 major categories (plus a group of unannotated genes). The accuracy of predicting DAMs in these categories varied in a relatively narrow range, even though DAMs in some of the gene function categories were significantly easier to predict than others (e.g., translation) (see Supplemental Fig. S3).

In contrast to functional categories, the long-term evolutionary rates at DAM positions correlate strongly with the success in diagnosing DAMs in both PolyPhen (Fig. 1C) and SIFT (data not shown). DAMs in completely conserved positions (lowest evolutionary rate) were 1.5 times more likely to be correctly classified than those at positions harboring any interspecies variation (67% vs. 44%; P < 0.01), while more than 70% of DAMs in the fastest-evolving positions were misdiagnosed (benign). The accuracy of DAM prediction also varies tremendously and unexpectedly among the 20 amino acids, with accuracy ranging from 22% to 96% in the proteome-wide analysis (Fig. 1D).

We examined the question of whether the observed relationship between the evolutionary rate and the accuracy of prediction is reproduced for DAMs of specific proteins, because PolyPhen scores the relative likelihood of a mutation to affect function in its protein context. Analysis of the cystic fibrosis transmembrane conductance regulator (CFTR) protein, which contributed the largest number of DAMs in our data set (444), produced patterns similar to those seen in the proteome-wide analysis for rates (Pearson’s r = 0.95) as well as for original amino acids (r = 0.97) (Fig. 1E). Similar results were observed for other DAM-rich proteins (data not shown). Therefore, the dependence of correct inference of DAMs on evolutionary rate and amino acids is a fundamental attribute of positions, rather than an artifact of proteome-wide summarization of mutations in proteins evolving with vastly different conservation profiles and amino acid contents. Major differences among evolutionary rates and amino acids were also observed in the SIFT analysis, and these differences were correlated with those observed for PolyPhen (r = 0.95 and 0.59, respectively; P << 0.01). Because different amino acids are known to evolve at intrinsically different rates, we also examined the relationship of evolutionary conservation on the DAM prediction accuracy for specific amino acids. In the easiest to diagnose amino acids (e.g., arginine), DAMs occurring at completely conserved positions were significantly easier to predict than those at positions with any site variability (P << 0.01) (Fig. 1F). A similar pattern is seen for amino acids whose mutations are difficult to diagnose (e.g., alanine) (P << 0.01) (Fig. 1F).

Next, we analyzed >12,000 nSNPs in order to examine the dependence of rate of evolution and the original amino acid for mutations not associated with any disease. The fraction of nSNPs identified as benign also depends strongly on evolutionary rates (Fig. 2B) and the original amino acids (Fig. 2C). However, DAMs and nSNPs show opposite patterns in terms of accuracy, assuming that a vast majority of nSNPs represent nondisease variations. For instance, alanine nSNPs are diagnosed as benign most often, and nSNPs at fast-evolving positions are also easily diagnosed to be benign.

In order to investigate why DAMs and nSNPs show complementary patterns, we further analyzed results from PolyPhen, which uses a single score metric (position-specific independent counts [PSIC] score) in its decision making. Distributions of PSIC scores overlap extensively for DAMs and nSNPs proteome-wide (Fig. 3A) and for individual amino acids (Supplemental Fig. S4). Generally, DAMs exhibit a wider range of values and carry larger PSIC scores as compared with nSNPs. Underlying the wide PSIC distributions for DAMs and nSNPs are the relationships of PSIC scores with the
evolutionary rates and amino acids involved. The lowest average PSIC scores are seen for the evolutionary rates and amino acids for which PolyPhen exhibited the worst performance for DAMs (Fig. 3B). In fact, nSNPs with these rates and incident amino acids show the lowest average PSIC scores, as well (P << 0.01) (Fig. 3C). This comparison of the PSIC scores for DAMs and nSNPs explains the inverse relationship between the accuracy in diagnosing nSNPs (to be benign) and DAMs (to be probably-damaging), because PolyPhen designates all mutants with PSIC ≤ 1.5 to be benign and with PSIC > 2.0 to be probably-damaging; PSIC scores between 1.5 and 2.0 yield the possibly-damaging diagnosis.

Figure 1. Accuracy of PolyPhen diagnosis of 9460 DAMs. (A) Fraction of DAMs classified into benign, possibly-damaging, and probably-damaging categories. (B) Evolutionary timetree of 44 species used for estimating evolutionary rates. Relationship of evolutionary rates (C) and incident amino acids (D) with the correct diagnosis of DAMs (probably-damaging). Error bars, 95% confidence interval (two times the SE). (E) Correlation between the accuracy of DAM prediction from proteome-wide analysis, and one DAM-rich protein (cystic fibrosis transmembrane conductance regulator [CFTR]). Solid and open circles show data points for incident amino acids (r = 0.97; P << 0.01) and evolutionary rates (r = 0.95; P << 0.01), respectively. (F) Two examples showing the dependence of the accuracy of DAM diagnosis for constant and variable sites for arginine (an easy-to-diagnose amino acid; red bars) and alanine (a difficult-to-diagnose amino acid; gray bars). Error bars, 95% confidence interval based on the binomial variance of the fraction of sites in the plotted categories.

The estimation of PSIC scores also involves the use of an amino acid interchangeability matrix (evolutionary substitution matrix for each pair of amino acids), which is frequently inferred from multiple sequence alignments for a large number of proteins (e.g., BLOSUM log-odds substitution matrix). Amino acid interchangeability varies extensively, and we expect to see concordant differences in prediction accuracies. Indeed, the extensive heterogeneity in the accuracy of prediction for different original–mutant pairs is seen for DAMs and nSNPs, and it correlates with the BLOSUM62 amino acid interchangeability (Fig. 3D). The original–mutant pairs that occur with the highest frequency in nature are the hardest to diagnose when the mutant is disease-associated (Fig. 3D). In contrast, these pairs are the easiest to diagnose for nSNPs. Thus, nSNPs and DAMs show opposite relationships that are explained by the evolutionary properties of positions as well as the assumptions on amino acid interchangeability derived from long-term evolutionary patterns.

In addition to evolutionary rates, comparative genomics yields a set of amino acids observed among species in every position. Under the simplifying assumption that the function of a position has not changed significantly, these amino acids are evolutionarily permissible alleles (EPAs) at that position. Since EPAs are neutral alternatives at a position, they are not expected to be disease-associated. We inferred EPAs for each DAM and nSNP position by using multiple sequence alignments of 44 diverse vertebrate species (see Methods). A small fraction of DAMs are EPAs (~9%), a finding that is similar to those reported elsewhere (e.g., Kondrashov 2003; Subramanian and Kumar 2006a). These DAMs occur preferentially at faster-evolving positions.

As expected, EPAs comprise a majority of nSNPs (59%). Still, many thousands of nSNPs are not EPAs. This result may not be attributed to a disproportionate number of alignment gaps and missing data at positions where nSNPs are not EPAs, because the fraction of species with alignment
gaps and missing data were almost identical for EPA and non-EPA nSNP sites (32% and 33%, respectively). We expect the fraction of non-EPA nSNPs to increase in the future as more individual genomes are sequenced and rarer alleles are discovered. This increase will be counteracted by discovery of more nSNPs in EPA because the use of more species in the multiple sequence alignments would expand the list of EPAs at each position. Overall, the number of non-EPA nSNPs is likely to decline slowly, if at all. This conclusion is based on the observation that more than 80% of EPA nSNPs could be identified using only 33 nonhuman mammals, and a 30% increase in the number of species (nine additional species) led to the discovery of only a small fraction of nSNPs in expanded EPA lists for each site.

The frequency of nSNP occurrence in EPA shows a marked relationship with the evolutionary rate. The nonsynonymous polymorphisms in the fastest-evolving positions are EPAs significantly more frequently than those in the slowest-evolving positions (81% vs. 53%; P << 0.01) (Fig. 4A). Likely, this is because the strong purifying selection in the highly conserved sites would allow only recently emerged mutations to be found at those positions, and these nSNPs would occur with low frequencies. Furthermore, positions that evolve more slowly will have a smaller number of EPAs, which would result in a greater proportion of non-EPA nSNPs. This phenomenon is evident in the observation that non-EPA nSNPs occur with one-third the allele frequency of EPA nSNPs overall consistently across positions evolving with different rates (Fig. 4B).

A stratification of PolyPhen results based on the EPA status of the analyzed mutations shows the importance of EPAs (Table 1). At variable positions, DAMs are diagnosed to be probably-damaging twice as often as benign when they do not overlap EPAs. This accuracy declines to 31% when DAMs are EPAs, and DAMs are diagnosed to be probably-damaging much less frequently than benign. Therefore, DAM prediction accuracy depends strongly on the overlap with EPAs. There is also a great influence of EPAs on the prediction of functional classification of nSNPs. nSNPs are much easier to categorize as benign if they appear as EPAs. In fact, nSNPs are designated to be probably-damaging only 5% of the time if they are EPAs; from this rarity and from the above-mentioned results, we can infer that the observed EPAs at a position are important indicators of the accuracy with which functional impact of novel mutations can be predicted.

Figure 2. PolyPhen classification of 12,421 nSNPs into benign, possibly-damaging, and probably-damaging categories. (A) Fraction of nSNPs classified into the three categories. The fraction of nSNPs designated to be benign at positions with different evolutionary rates (B) and original amino acids (C). Panel B also contains the accuracy of DAM inference from Figure 1C (filled squares). Error bars, 95% confidence interval based on the binomial variance of the fraction of sites.

Discussion

In proteome-wide analyses, we have shown that evolutionary rate and positional amino acid composition correlate extensively with the computational assessment of a mutation’s functional effects. A large number of DAMs are found in positions that vary among species, and a majority of DAMs in these positions are misdiagnosed. Similarly, a large number of nSNPs occur in positions that are highly conserved, which, in many cases, are predicted by computational tools to carry functional consequences. Correlation between classification accuracies and PSIC scores suggests that we could improve prediction by tailoring PSIC classification thresholds to individual classes of variants (e.g., by amino acid type and by rate class). However, such efforts would likely suffer the handicap of the classical trade-off between the false-negative and false-positive prediction rates. That is, while changes in PSIC diagnostic thresholds for individual amino acids and/or rate classes might reduce false-negatives for DAMs, they might simultaneously increase false-positives for nSNPs. In such cases, it is prudent to associate a reliability indicator with inferences produced using computational methods (e.g., Bromberg and Rost 2007).

We suggest that a reliability of inference (RoI) measure be included with functional predictions to reflect their uncertainty. The RoI measure is the average of probability of true-positives (PTP) and the probability of true-negatives (PTN). The former is calculated by applying the given computational method on all available DAMs, while the latter is calculated by using all available strictly “neutral” nSNP data. By design, the RoI does not depend on the inference made. Rather, it captures how difficult it will be to make a correct prediction for a given type of change in its evolutionary context. The RoI may only be improved by improving true-positive and true-negative rates (such efforts are already underway for PolyPhen) (S. Sunyaev, pers. comm.). Of course, PTP and PTN may be weighted unequally in calculating the RoI when analyzing nonsynonymous mutations from the genomes of “healthy” individuals, because they are expected to carry a large number of

Figure 3. (A) Frequency distributions of DAM (red) and nSNP (blue) PSIC scores. Vertical lines show the PolyPhen PSIC cut-offs for classification of variants in the absence of structural or other information; nSNPs and DAMs are from Subramanian and Kumar (2006a). (B) Mean PSIC values for DAMs in different evolutionary rate categories. The correlation (r) between mean PSIC values and mean evolutionary rate is 0.96 (P << 0.01). The 95% confidence intervals derived from the SEMs are shown. (C) Relation between mean PSIC scores for DAMs and mean PSIC scores for nSNPs, by amino acid types (solid circles) and evolutionary rates (open circles). (D) Inverse relationship of the accuracy of DAMs (probably-damaging) and nSNPs (benign) with the evolutionary interchangeability of amino acid pairs (original/variant pairs) as captured in the BLOSUM62 matrix. Each data point represents the average of all pairs for a given BLOSUM score, with the error bars displaying the 95% confidence intervals derived from binomial variance of the proportions. BLOSUM scores are log-odds substitution occurrences. Negative BLOSUM scores show amino acid pairs that are found to have a low probability of substitution, whereas a positive score indicates frequently observed amino acid pairs. Complete 20 × 20 matrices of DAM and nSNP accuracies (and their SEs) are given in the Supplemental material.
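
The PSIC cut-offs drawn in panel A and the reliability-of-inference (RoI) measure proposed in the Discussion are simple enough to write down directly. The sketch below assumes the cut-offs quoted in the Results (benign at or below 1.5, probably-damaging above 2.0, possibly-damaging in between) and the weighted form RoI = (PTN + v·PTP)/(1 + v). The function names and the numeric inputs are hypothetical, and the real PolyPhen service also draws on structural and annotation information that is not modeled here.

```python
BENIGN_MAX = 1.5    # PSIC at or below this value -> benign (cut-off quoted in the text)
PROBABLY_MIN = 2.0  # PSIC above this value -> probably-damaging (cut-off quoted in the text)


def classify_psic(psic: float) -> str:
    """Map a PSIC score to one of the three PolyPhen categories using only the cut-offs."""
    if psic <= BENIGN_MAX:
        return "benign"
    if psic > PROBABLY_MIN:
        return "probably-damaging"
    return "possibly-damaging"


def reliability_of_inference(p_tp: float, p_tn: float, v: float = 1.0) -> float:
    """RoI = (PTN + v * PTP) / (1 + v); v = 1 gives the plain average of PTP and PTN."""
    if not (0.0 <= p_tp <= 1.0 and 0.0 <= p_tn <= 1.0):
        raise ValueError("PTP and PTN must be probabilities in [0, 1]")
    if v < 0.0:
        raise ValueError("v must be non-negative")
    return (p_tn + v * p_tp) / (1.0 + v)


# Illustrative values only; they are not estimates from the paper.
for score in (0.8, 1.7, 2.4):
    print(score, classify_psic(score))

print(round(reliability_of_inference(0.40, 0.70), 3))         # equal weights
print(round(reliability_of_inference(0.40, 0.70, v=0.1), 3))  # PTN weighted 10 times PTP
```

Lowering v shifts the RoI toward PTN, which is the weighting the text considers for genomes of healthy individuals.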

neutral mutations. In this case, RoI = (PTN + v·PTP)/(1 + v), where v is the expected ratio of DAMs to nSNPs and will generally be less than one. Furthermore, single and multidimensional RoI matrices may be constructed, with amino acid pair and rate classes as additional dimensions, because the accuracy of diagnosis differs among classes for the same amino acid. We anticipate that sufficient data will become available in the future from the profiling of an expanded number of diseases, individuals, and populations to build such matrices.

For now, we used the estimates of PTP and PTN based on the DAM and nSNP data analyzed (see 20 × 20 matrices in the Supplemental Figs. S5, S6), respectively, to estimate the RoI for 682 mutations found in the disease-associated genes of one individual (Levy et al. 2007). The average RoI for these mutations is 57.5% when PTP and PTN are equally weighted. It rises to 71% when PTN is given a weight 10 times that of PTP (i.e., v = 0.1). This ad hoc ratio may be justifiable, because ~10% of nonsynonymous mutations are found to be fixed among species in comparative genomic analysis involving humans and chimpanzees (e.g., Subramanian and Kumar 2006b). While a 71% success rate may appear reasonably good for some academic research, it is presently too low to be useful in real-world applications (especially in making health decisions).

In addition to helping us understand the factors that modulate the accuracy of computational methods, evolutionary rates and frequencies of EPAs at positions involved in DAMs and nSNPs supply null expectations for interpreting the observed population frequencies of alleles. For example, computational methods have been used to predict the functional effects of nSNPs (benign, possibly-damaging, and probably-damaging) found in genome-scale population surveys and the distributions of frequencies of alleles in the three functional categories compared (Lohmueller et al. 2008). Lohmueller et al. (2008) noted that the mean derived allele frequency (MAF) for the benign alleles is significantly higher than that for the damaging alleles. The direction and magnitude of this difference is predictable based on the average evolutionary rates of positions in the three functional categories, because the long-term evolutionary rates at any given position will modulate allele frequencies within populations under the principles of the neutral theory (Kimura 1983; Subramanian and Kumar 2006a). Indeed, rates of evolution and the MAF are highly correlated over all nSNPs and when considering EPA and non-EPA nSNPs separately (r = 0.88; P < 0.05). The evolutionary rate ratio for probably-damaging and benign positions is quite similar to that reported for the MAF (0.49 and 0.40, respectively), but a second-degree polynomial fits the relationship
between rate and the MAF better than a linear regression (r² = 0.86 and 0.77, respectively). Furthermore, the neutral theory predicts

Figure 4. Analysis of EPAs. (A) The relation of the evolutionary rate with the proportion of nSNPs present in the set of EPAs in the variable sites (r = 0.91, P < 0.02). (B) The average allele frequencies of nSNPs present in EPAs (closed circles) and absent from the set of EPAs (open squares) in variable positions evolving with different rates. Mean allele frequencies are significantly different between two EPA categories for each rate class (P << 0.01). (C) Relationship between nSNPs frequency and the percentage of evolutionary time span (%ETS) of the corresponding EPA (r = 0.90, P << 0.01). All non-EPA nSNPs have an ETS of 0. (D) The biochemical severities of nSNPs present in EPAs (closed circles) and absent from EPAs (open squares). Error bars, 95% confidence intervals derived from the SEMs.

In addition to frequencies, the biochemical severity of mutations is an important factor to consider, particularly as it relates to the DAMs found in EPAs and nSNPs absent from EPAs. At variable sites, DAMs that overlap EPAs are biochemically much less severe than other DAMs (average Grantham values of 73 vs. 93; P << 0.01) (Table 1) but are more severe than nSNPs and inter-specific differences (average Grantham value of 68). Compensatory evolution is thought to be one of the mechanisms to explain this observation for DAMs and is discussed elsewhere (e.g., Kondrashov et al. 2002; Gao and Zhang 2003; Subramanian and Kumar 2006a). On the other hand, non-EPA nSNPs show a higher biochemical severity than other SNPs (77 vs. 62; P << 0.01). Furthermore, many non-EPA nSNPs are observed with relatively high frequencies in the Lohmueller et al. (2008) data, and these higher frequency alleles show a greater biochemical severity than EPA nSNPs with similar frequencies (Fig. 4D). These observations suggest that some non-EPA nSNPs may be involved in adaptive evolution or persist due to compensatory changes. We are currently investigating biological properties of a large number of nSNPs that occur with high frequency in human populations but have significantly smaller than expected ETS. These mutations are excellent candidates for potential (ancient or modern) lineage-specific adaptations (and compensations), and they will be discussed elsewhere (S Kumar and A Filipski, in prep.). In the meantime, it is clear that the interpretations of rare and common alleles with different functional predictions need to account for evolutionary rates, the amino acids involved, and the EPA status of mutations and their resident positions when determining their genome-wide associations with the phenotypes.

In conclusion, with decreasing costs for sequencing personal genomes and variants, it is quickly becoming feasible to use individual genetic novelties for learning about predisposition to diseases and to better carry out optimally informed treatments based on personal genomic profiles. In such efforts, computational methods that predict the propensity of novel mutations to cause disease will play a critical role, because it is not possible to investigate the effects of individual rare (or even common) mutations in the laboratory and because each individual carries many unique mutations. Our findings show that some amino acid mutations will be easier to diagnose with high accuracy because of the amino acids involved and because of the evolutionary properties of the positions they afflict when we apply genome-wide observations to individual positions. The availability of thousands of human genomes will reveal nonsynonymous mutations even at positions where DAMs are known to occur, which will make it possible to develop position-specific estimators that diagnose

Table 1. Allele frequencies and biochemical severities of nSNPs in different functional categories in the context of their overlap with EPAs at variable positions
Computational diagnosis   Absent from EPA   Present in EPA
that EPAs found in a larger number of species would occur with
higher frequencies in the human population. Significant correlation DAMs
Benign 25% 55%
is found between the nSNP frequency in the human population and Probably damaging 49% 31%
the evolutionary time span (%ETS) when all nSNPs are divided into Grantham value 93.0 72.7
25 allele frequency categories (r = 0.90; P << 0.01) (Fig. 4C) and for nSNP
raw data (r = 0.41, P << 0.01). With the upcoming sequencing Benign 59% 84%
Probably damaging 14% 5%
of a large number of individuals, it will be possible to estimate allele
Grantham value 76.9 62.1
frequencies in different populations more reliably and to examine
the predictive power of ETS in generating expected allele frequencies All differences in mean Grantham values and percent accuracies are sta-
of nSNPs for use in functional genomics. tistically significant (P << 0.01).

6 Genome Research
www.genome.org
Downloaded from genome.cshlp.org on July 10, 2009 - Published by Cold Spring Harbor Laboratory Press

Evolutionary anatomies of nonsynonymous mutations

novel mutations with a high RoI. With the knowledge of information on the genotypes of nonsynonymous mutations and SNPs, the copy number variation of the protein (including paralogs), and the availability of more protein structures, it will become possible to build more accurate mutation classifiers to diagnose disease propensities of novel mutations, select and prioritize variants for experimental research, and develop baseline patterns of novel allele frequencies within populations.

Methods

We analyzed two large-scale data sets of DAMs and nSNPs (Subramanian and Kumar 2006a; Lohmueller et al. 2008). The Subramanian and Kumar (2006a) data set consisted of 10,685 DAMs and 5308 human nSNPs. This data set was constructed by downloading the human proteome from GenBank (build 34.1) with associated RefSeq identifiers for each gene. Of all available DAMs in 1307 human genes from HGMD (http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html) and all putatively-benign nSNP sites in 11,753 human genes from various genome projects (see Subramanian and Kumar 2006a), genes containing no DAMs or nSNPs were discarded. Complete proteomes of four diverse species (Homo sapiens, Mus musculus, Gallus gallus, and Takifugu rubripes) were obtained for the remaining genes from the Ensembl web server (http://www.ensembl.org/), along with orthologs identified via a reciprocal BLASTP search with each RefSeq gene (Altschul et al. 1990; Waterston et al. 2002). Additionally, the BLOSUM substitution matrix was employed using appropriate threshold scores (Subramanian and Kumar 2004). If any of the three vertebrate orthologs could not be determined for any human gene, then that gene and all DAMs and nSNPs contained within it were excluded from the data set. Each ortholog was aligned to the homologous human sequence with CLUSTALW using default settings (Thompson et al. 1994), and all sites (and thus associated DAMs and nSNPs) containing indels or missing data at homologous sites in any of the three vertebrate species were excluded in order to represent at least four species.

From the Lohmueller et al. (2008) data, we extracted all nSNPs by removing all synonymous, noncoding and redundant SNPs. Then, we used dbSNP rsIDs for each nSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) to generate a RefSeq identifier (Pruitt et al. 2007). This information was used to map each nSNP onto the 44-species protein alignments available in the UCSC Genome Browser (Kuhn et al. 2009). During this process, a substantial number of nSNPs was eliminated because either dbSNP records did not contain a map from rsIDs to RefSeq identifiers, not all human RefSeq identifiers were present in the UCSC data set, or the wild-type amino acid in the Lohmueller data set was not the human representative in the UCSC data set. The outcome was a set of 12,712 nSNPs with allele frequencies as reported by Lohmueller et al. (2008), and the 44-species alignment for each nSNP position. The 44-species alignments were also generated by using the RefSeq identifiers in UCSC for all the DAMs. We discarded all positions where the amino acid state of any of the species in the original four sequence alignment disagreed between the Subramanian and Kumar (2006a) data set and the UCSC alignment. This produced a total of 8696 DAMs with 44-species alignments.

We estimated the evolutionary rate for each amino acid site separately using the amino acids found in the 44-species alignment. The number of substitutions at each site was obtained by using the known phylogeny of the species (Fig. 1B) and applying the Fitch (1971) algorithm. The total of the substitutions was divided by the total time elapsed on the tree to obtain the evolutionary rate in the units of the number of substitutions per site per billion years. Species divergence times were obtained from an advanced version of the TimeTree resource (www.timetree.org, version 2.0 prerelease) (Hedges et al. 2006). For each position, all species containing alignment gaps or missing data were pruned from the tree before calculating the number of substitutions and the total evolutionary time. We repeated this procedure to calculate the evolutionary rate using only 33 mammalian species. Vertebrate and mammal rates were highly correlated for all sites used (r = 0.92; P << 0.01), and we employed the latter rates, as mammalian genomes are more appropriate models for the human genome as compared to more distantly related species. Furthermore, we have previously shown that maximum likelihood estimates of relative evolutionary rates are very highly correlated with rates obtained using the Fitch algorithm (Miller and Kumar 2001), as each site contains data from many closely and distantly related species. This was confirmed in our analysis of DAM positions for which rates from four species ML analysis from Subramanian and Kumar (2006a) and the 44-species analysis in this study showed significant correlation (r = 0.70; P << 0.01). Because the calculation of rates by our current method only requires the amino acids in all other species at a given site, it is more suitable for application in personalized diagnostics. We quantized evolutionary rates into six discrete categories such that sites showing no variation across all species comprise the slowest-evolving group (category 0), and the cut-off rates for the other five categories (1–5) were such that they each contained a similar number of sites when applied to the Lohmueller et al. (2008) nSNPs. The five categories of evolutionary rate of variable positions had average evolutionary rates of 0.6, 1.5, 2.5, 3.5, and 5.3 with standard deviations of 0.2, 0.3, 0.3, 0.3, and 1.5, respectively.

These UCSC Genome Browser alignments were also used to generate EPAs at each position, because they cover 44 diverse vertebrate species, including agnathans, fishes, amphibians, birds, and mammals (http://genome.ucsc.edu/). Under the principles of the neutral theory of molecular evolution, a vast majority of EPAs are expected to represent neutral variants at a site. For each DAM/nSNP, we estimated the percentage of evolutionary time span (%ETS) in the 44-species tree, which is the total branch length (times) in the tree obtained after pruning all nonhuman species lacking the variant allele divided by the total branch length of the tree after pruning all species containing an alignment gap or missing data. For each variant at a site, the ETS varies from 0%–100%, with constant sites containing a single EPA with an ETS of 100% and non-EPA mutations producing an ETS of 0%. A smaller ETS is frequently associated with variation that has occurred recently in species closely related to humans.

The PolyPhen web resource was used to classify mutations into benign, possibly-damaging, and probably-damaging categories for DAMs and nSNPs (Ramensky et al. 2002). After removing duplicate entries and sites for which PolyPhen returned “unknown” or was unable to return any result, the final data set contained 9460 DAMs and 4020 nSNPs for the Subramanian and Kumar (2006a) data sets. We noticed that while PolyPhen attempts to incorporate information from the protein structure (when available from databases such as Protein Data Bank) and available functional data from site annotations, the final diagnosis for this data set was rooted solely in primary sequences for >97% of the mutations we tested. Inclusion and exclusion of these mutations produce the same results, so we did not consider nonsequence attributes in any of our analyses. Because of the slowness of the web resource (http://sift.cchmc.org/), SIFT analyses are based on a subset of these DAMs and nSNPs (approximately one-third each, 2375 and 1439, respectively). Supplementary information available from Lohmueller et al. (2008) provided the PolyPhen diagnosis for all the nSNPs.
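The per-site rate calculation described in the Methods (Fitch substitution count divided by the total time on the tree, after pruning species with gaps or missing data) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Node` class, the function names, and the four-species toy tree with its branch times are hypothetical, and the tree is assumed to be binary. Only the bottom-up pass of the Fitch (1971) procedure is used, since only the minimum number of changes is needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    name: str = ""       # species name for leaves, empty for internal nodes
    time: float = 0.0    # branch time above this node (billions of years)
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Copy the tree, keeping only leaves in `keep` and merging single-child nodes."""
    if not node.children:
        return Node(node.name, node.time) if node.name in keep else None
    kids = [k for k in (prune(c, keep) for c in node.children) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        # Splice out a node left with one child; the branch times add up.
        only = kids[0]
        return Node(only.name, only.time + node.time, only.children)
    return Node(node.name, node.time, kids)


def fitch_changes(node: Node, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass on a binary tree: (candidate states, minimum changes)."""
    if not node.children:
        return {column[node.name]}, 0
    state_sets, changes = [], 0
    for child in node.children:
        states, n = fitch_changes(child, column)
        state_sets.append(states)
        changes += n
    common = set.intersection(*state_sets)
    if common:
        return common, changes
    return set.union(*state_sets), changes + 1


def tree_time(node: Node, is_root: bool = True) -> float:
    """Sum of branch times, ignoring any stem above the root."""
    own = 0.0 if is_root else node.time
    return own + sum(tree_time(c, False) for c in node.children)


def site_rate(tree: Node, column: Dict[str, str], gap: str = "-") -> float:
    """Substitutions per site per billion years for one alignment column."""
    keep = {sp for sp, aa in column.items() if aa != gap}
    pruned = prune(tree, keep)
    if pruned is None or not pruned.children:
        raise ValueError("need at least two species with data at this site")
    _, changes = fitch_changes(pruned, column)
    return changes / tree_time(pruned)


# Hypothetical four-species tree; the branch times are illustrative only.
TOY_TREE = Node(children=[
    Node(time=0.13, children=[
        Node(time=0.23, children=[Node("human", 0.09), Node("mouse", 0.09)]),
        Node("chicken", 0.32),
    ]),
    Node("fugu", 0.45),
])

if __name__ == "__main__":
    column = {"human": "A", "mouse": "A", "chicken": "S", "fugu": "S"}
    print(site_rate(TOY_TREE, column))
```

In this sketch, a species with a gap is removed before both the substitution count and the total time are computed, so the numerator and denominator always refer to the same pruned tree.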

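The %ETS definition given in the Methods (branch time of the tree restricted to human plus the species carrying the variant, divided by the branch time of the tree restricted to species with data at the site) can be sketched as follows. This is an illustrative reading of that definition, not the authors' implementation: the nested-tuple tree format, the function names, and the toy branch times are all assumptions.

```python
from typing import Dict, Set, Tuple, Union

# A node is (content, branch_time): content is a species name for a leaf,
# or a list of child nodes for an internal node.
Tree = Tuple[Union[str, list], float]


def kept_leaves(node: Tree, keep: Set[str]) -> int:
    """Number of leaves under `node` that belong to `keep`."""
    content, _ = node
    if isinstance(content, str):
        return int(content in keep)
    return sum(kept_leaves(child, keep) for child in content)


def pruned_tree_time(node: Tree, keep: Set[str], n_keep: int = -1) -> float:
    """Total branch time left after pruning every leaf that is not in `keep`."""
    if n_keep < 0:
        n_keep = kept_leaves(node, keep)
    content, _ = node
    if isinstance(content, str):
        return 0.0
    total = 0.0
    for child in content:
        below = kept_leaves(child, keep)
        if 0 < below < n_keep:
            # Kept species lie on both sides of this branch, so it survives pruning.
            total += child[1]
        if below:
            total += pruned_tree_time(child, keep, n_keep)
    return total


def percent_ets(tree: Tree, column: Dict[str, str], variant: str,
                human: str = "human", gap: str = "-") -> float:
    """Percentage of evolutionary time span for `variant` at one alignment column."""
    with_data = {sp for sp, aa in column.items() if aa != gap}
    # Human is always retained; other species are kept only if they carry the variant.
    carriers = {sp for sp in with_data if sp == human or column[sp] == variant}
    denominator = pruned_tree_time(tree, with_data)
    if denominator == 0:
        raise ValueError("need at least two species with data at this site")
    return 100.0 * pruned_tree_time(tree, carriers) / denominator


# Hypothetical four-species tree; the branch times are illustrative only.
TOY_TREE: Tree = ([
    ([([("human", 0.09), ("mouse", 0.09)], 0.23), ("chicken", 0.32)], 0.13),
    ("fugu", 0.45),
], 0.0)

if __name__ == "__main__":
    column = {"human": "A", "mouse": "S", "chicken": "S", "fugu": "A"}
    print(percent_ets(TOY_TREE, column, "S"))  # variant seen in mouse and chicken
    print(percent_ets(TOY_TREE, column, "W"))  # variant seen in no other species
```

Under this reading, a variant observed in no nonhuman species leaves only the human leaf in the numerator tree, which has no branch time, so its ETS is 0%, in line with the statement that non-EPA mutations produce an ETS of 0%.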

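The six rate categories described in the Methods (category 0 for invariant sites, and five further categories holding similar numbers of variable sites) can be approximated with quantile cut-offs. The sketch below is an assumption about how such cut-offs might be derived, not the procedure used in the paper; the function names and the synthetic rates are hypothetical.

```python
import bisect
import statistics
from collections import Counter
from typing import List, Sequence


def rate_cutoffs(rates: Sequence[float], n_variable_bins: int = 5) -> List[float]:
    """Cut points that split the nonzero rates into bins of similar size."""
    variable = [r for r in rates if r > 0]
    # statistics.quantiles returns n - 1 cut points for n bins.
    return statistics.quantiles(variable, n=n_variable_bins)


def rate_category(rate: float, cutoffs: Sequence[float]) -> int:
    """Category 0 for invariant sites, otherwise 1..len(cutoffs) + 1."""
    if rate == 0:
        return 0
    return 1 + bisect.bisect_left(cutoffs, rate)


if __name__ == "__main__":
    # Synthetic rates: 10 invariant sites plus 100 variable sites.
    rates = [0.0] * 10 + [0.1 * i for i in range(1, 101)]
    cutoffs = rate_cutoffs(rates)
    counts = Counter(rate_category(r, cutoffs) for r in rates)
    for category in sorted(counts):
        print(category, counts[category])
```

The cut-offs are computed once from a reference set of sites and then reused, which mirrors the description of cut-off rates chosen on the Lohmueller et al. (2008) nSNPs and applied to all positions.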

Acknowledgments

We thank Revak Raj Tyagi for his help with UCSC Genome browser data extraction, Antoine Ah-Foune and Veronica Shi for some early analyses, and Kristi Garboushian for providing editorial support. We thank David Cooper (HGMD) for permitting us to use the disease-associated mutation data of Subramanian and Kumar (2006a). This research was supported by a research grant from NIH HG2096 (S.K.).

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215: 403–410.
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al. 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53–59.
Bhatti P, Church DM, Rutter JL, Struewing JP, Sigurdson AJ. 2006. Candidate single nucleotide polymorphism selection using publicly available tools: A guide for epidemiologists. Am J Epidemiol 164: 794–804.
Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, et al. 2008. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4: e1000083. doi: 10.1371/journal.pgen.1000083.
Bromberg Y, Rost B. 2007. SNAP: Predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res 35: 3823–3835.
Chan PA, Duraisamy S, Miller PJ, Newell JA, McBride C, Bond JP, Raevaara T, Ollila S, Nystrom M, Grimm AJ, et al. 2007. Interpreting missense variants: Comparing computational methods in human disease genes CDKN2A, MLH1, MSH2, MECP2, and tyrosinase (TYR). Hum Mutat 28: 683–693.
Cheng TM, Lu YE, Vendruscolo M, Lio P, Blundell TL. 2008. Prediction by graph theoretic measures of structural effects in proteins arising from non-synonymous single nucleotide polymorphisms. PLoS Comput Biol 4: e1000135. doi: 10.1371/journal.pcbi.1000135.
Doniger SW, Kim HS, Swain D, Corcuera D, Williams M, Yang SP, Fay JC. 2008. A catalog of neutral and deleterious polymorphism in yeast. PLoS Genet 4: e1000183. doi: 10.1371/journal.pgen.1000183.
Eyre-Walker A, Woolfit M, Phelps T. 2006. The distribution of fitness effects of new deleterious amino acid mutations in humans. Genetics 173: 891–900.
Fitch WM. 1971. Toward defining the course of evolution: Minimum change for a specific tree topology. Syst Zool 20: 406–416.
Gao L, Zhang J. 2003. Why are some human disease-associated mutations fixed in mice? Trends Genet 19: 678–681.
Hedges SB, Dudley J, Kumar S. 2006. TimeTree: A public knowledge-base of divergence times among organisms. Bioinformatics 22: 2971–2972.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, UK.
Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat 21: 12–27.
Kondrashov AS, Sunyaev S, Kondrashov FA. 2002. Dobzhansky-Muller incompatibilities in protein evolution. Proc Natl Acad Sci 99: 14878–14883.
Kryukov GV, Pennacchio LA, Sunyaev SR. 2007. Most rare missense alleles are deleterious in humans: Implications for complex disease and association studies. Am J Hum Genet 80: 727–739.
Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, et al. 2009. The UCSC Genome Browser Database: Update 2009. Nucleic Acids Res 37: D755–D761.
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, et al. 2007. The diploid genome sequence of an individual human. PLoS Biol 5: e254. doi: 10.1371/journal.pbio.0050254.
Lohmueller KE, Indap AR, Schmidt S, Boyko AR, Hernandez RD, Hubisz MJ, Sninsky JJ, White TJ, Sunyaev SR, Nielsen R, et al. 2008. Proportionally more deleterious genetic variation in European than in African populations. Nature 451: 994–997.
Miller MP, Kumar S. 2001. Understanding human disease mutations through the use of interspecific genetic variation. Hum Mol Genet 10: 2319–2328.
Ng PC, Henikoff S. 2003. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31: 3812–3814.
Ng PC, Henikoff S. 2006. Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 7: 61–80.
Pruitt KD, Tatusova T, Maglott DR. 2007. NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61–D65.
Ramensky V, Bork P, Sunyaev S. 2002. Human non-synonymous SNPs: Server and survey. Nucleic Acids Res 30: 3894–3900.
Rudd MF, Williams RD, Webb EL, Schmidt S, Sellick GS, Houlston RS. 2005. The predicted impact of coding single nucleotide polymorphisms database. Cancer Epidemiol Biomarkers Prev 14: 2598–2604.
Shastry BS. 2007. SNPs in disease gene mapping, medicinal drug development and evolution. J Hum Genet 52: 871–880.
Subramanian S, Kumar S. 2004. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168: 373–381.
Subramanian S, Kumar S. 2006a. Evolutionary anatomies of positions and types of disease-associated and neutral amino acid mutations in the human genome. BMC Genomics 7: 306. doi: 10.1186/1471-2164-7-306.
Subramanian S, Kumar S. 2006b. Higher intensity of purifying selection on >90% of the human genes revealed by the intrinsic replacement mutation rates. Mol Biol Evol 23: 2283–2287.
Sunyaev SR, Eisenhaber F, Rodchenkov IV, Eisenhaber B, Tumanyan VG, Kuznetsov EN. 1999. PSIC: Profile extraction from sequence alignments with position-specific counts of independent observations. Protein Eng 12: 387–394.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680.
Tian J, Wu N, Guo X, Guo J, Zhang J, Fan Y. 2007. Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines. BMC Bioinformatics 8: 450. doi: 10.1186/1471-2105-8-450.
Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, et al. 2008. The diploid genome sequence of an Asian individual. Nature 456: 60–65.
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520–562.
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452: 872–876.
Yampolsky LY, Kondrashov FA, Kondrashov AS. 2005. Distribution of the strength of selection against amino acid replacements in human proteins. Hum Mol Genet 14: 3191–3201.

Received February 1, 2009; accepted in revised form June 8, 2009.
