The Penultimate Rotamer Library
ABSTRACT All published rotamer libraries contain some rotamers that exhibit impossible internal atomic overlaps if built in ideal geometry with all hydrogen atoms. Removal of uncertain residues (mainly those with B-factors >40 or van der Waals overlaps >0.4 Å) greatly improves the clustering of rotamer populations. Asn, Gln, or His side chains additionally benefit from flipping of their planar terminal groups when required by atomic overlaps or H-bonding. Sensitivity to skew and to the boundaries of angle bins is avoided by using modes rather than traditional mean values. Rotamer definitions are listed both as the modal values and in a preferred version that maximizes common atoms between related rotamers. The resulting library shows significant differences from previous ones, differences validated by considering the likelihood of systematic misfitting of models to electron density maps and by plotting changes in rotamer frequency with B-factor. Few rotamers now show atomic overlaps in ideal geometry; those overlaps are relatively small and can be understood in terms of bond angle distortions compensated by favorable interactions. The new library covers 94.5% of examples in the highest quality protein data with 153 rotamers and can make a significant contribution to improving the accuracy of new structures. Proteins 2000;40:389–408. © 2000 Wiley-Liss, Inc.

Key words: side-chain rotamer library; all-atom contact analysis; structure validation; reversed leucines; explicit hydrogens; van der Waals analysis

Grant sponsor: National Institutes of Health; Grant number: GM-15000.
The Supplementary material referred to in this article can be found at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/index.html/.
*Correspondence to: David C. Richardson, 211 Nanaline Duke Building, Duke University, Durham, NC 27710-3711. E-mail: [email protected]
Received 20 December 1999; Accepted 14 March 2000

INTRODUCTION

Side-chain χ-angle distributions were studied as soon as multiple protein structures were available.1–5 The observation that side-chain torsions fall into n-dimensional clusters and that, therefore, a library of rotamers can usefully be defined was introduced in 1987 by Ponder and Richards.6 As the database has grown, several groups have since compiled updated rotamer libraries.7–11 The concept of rotamers and the availability of rotamer libraries has changed the handling of side chains in homology modeling,12 Monte Carlo and combinatorial calculations,13 and protein design.14,15 Side-chain rotamer libraries are incorporated into crystallographic model-to-map fitting programs such as O16 and XtalView,17 while angle expectations are part of verification tools,18,19 including those used for all structures deposited at the PDB (Protein Data Bank20,21). The use of rotamers significantly improves both the speed and accuracy of building crystallographic models. However, any incorrect conformations included in a rotamer library will show increased occurrence in the less certain parts of new experimental structures as well as biasing theoretical models. We feel, therefore, that the accuracy of rotamer libraries is an important issue, with the increased use of repacking and homology modeling, and especially on the eve of a major structural genomics effort.

The growth of the PDB as a whole has been important in improving the accuracy of rotamer libraries, but even more important is the recent growth in the number of very high-resolution protein structures. At such resolution and in the good areas of the electron density map, side-chain conformations are very clearly seen, resulting in little bias from previously defined rotamers, refinement methods, or fitting errors. Unfortunately, no previous study has limited itself solely to these residues; usually resolution and homology criteria are applied to choose good structures, but then all residues in each structure contribute equally to the library.

The development of our all-atom contact analysis technique using the Probe program22 and the optimization of H-atom positions in Reduce23 allows us to analyze all-atom steric and H-bonding interactions. All published rotamer libraries are found by this new methodology to contain rotamers with serious van der Waals overlaps (clashes) when built with all atoms and standard geometry. Significant clashes in defined rotamers are unexpected, since the most commonly occurring conformations should have the lowest energy. Atomic overlaps (up to about 0.4 Å) may indicate the inappropriateness of using standard geometry (for example, if a conformation has systematically strained bond angles), but larger clashes almost certainly indicate an erroneous rotamer definition.

Previous rotamer studies have each used different approaches, leading to libraries with many similarities but some differences. The original library of Ponder and Richards6 drew bins around the observed clusters and determined the mean and standard deviation of the peak. This library has been very influential and is still the most
widely used in some fields. Despite the small size of the database then available (e.g., only 16 examples of Met), it has surprisingly few artifacts, and in some aspects surpasses later libraries compiled from larger data sets. However, it could seldom reach beyond χ2 and it includes some incorrect amide orientations. That work used hydrogens to check for long-range atomic clashes, but neither these nor subsequent authors have checked for internal clashes, presumably believing that high-resolution data prevents them.

Schrauber et al.7 discussed whether or not rotamers were useful and whether side chains were rotameric (which they defined as mean ±20°). As part of that analysis they compiled a new rotamer library, but out only to χ2 and excluding Asp and Asn. Despite its conservative nature, this library contains some duplication of rotamers, some rotamers with steric clashes, and some systematically misfit conformations. Also, despite their overall negative conclusions about the rotamer concept, their library has been used by others.

Tuffery et al. used cluster analysis to produce a library of 113 rotamers,8 later expanded to 212,9 which was used by them and others for combinatorial repacking calculations. Their rotamers are unusual in that they have relatively few clashes but contain duplicated rotamers for symmetrical side chains and often have nearly eclipsed angles. These features are probably due to their methodology of performing energy minimization on the structures before compiling the library. Such a procedure is a reasonable step in their repacking calculations, but we feel it is inappropriate in compiling a rotamer library. Energy minimization untethered to X-ray data rarely improves an experimental structure: if moving from the final model to a more correct structure was as simple as minimizing, the crystallographic refinement would already have done it. Indeed, such untethered energy minimization has been used by crystallographers to produce degraded models as controls for verification programs.24

The library of De Maeyer et al.10 is a combination of those of Schrauber et al.7 and Ponder and Richards,6 with some extensions in order to include angles past χ2 and to sample regions of torsion space not included in the former studies. The library has some undesirable features, including Arg χ4 rotamers at ±60° causing substantial van der Waals clashes, some Asn and Gln rotamers with amide groups in incorrect flip states, and rotamers with fully-eclipsed angles. An advantage is their use of common angles leading to common atom positions, an approach also adopted here.

The rotamer library built into the O crystallographic fitting program16 is also an extension from Ponder and Richards. Most of the relatively low number of rotamers are sound, but many genuine rotamers are missing and a few demonstrably-incorrect ones are included.

The most comprehensive recent analysis was done by Dunbrack and Cohen11 (for updates see their website at www.fccc.edu/research/labs/dunbrack/sidechain.html) using over 500 structures. They divided torsion space into bins such that all regions were included and used Bayesian statistics to obtain an estimate of the population of otherwise sparse regions. This approach has significant advantages: the pure statistical accuracy is very high, and the probability for every division of rotamer space is explicitly stated (including every division of φ and ψ in the backbone-dependent version). However, this methodology, plus the inclusion of high-B data, lowers the overall contrast and leads to a defined rotamer in every possible bin, the less probable of which often show extremely large internal clashes and are unlikely ever to be genuinely observed. Also, especially for side chains with planar functional groups, the a priori bins split single distributions, leading to misplaced means and extra rotamers in the tails of the distribution which are valid as arbitrary sampling points but not as locally favored conformations. Overall, the library of Dunbrack and Cohen is the most complete one previously published, but users must give thoughtful attention to setting the lower threshold for acceptable rotamer probabilities.

We have recently published sets of rotamers for Met22 and for Asn and Gln.25 The Met rotamers were defined with a B-factor cutoff of 30, which tightened the χ3 distribution remarkably, allowing 94% of the observed residues to be included in 13 rotamers. For Asn and Gln, we used our program Reduce to optimize H-bond networks, as well as to add all explicit hydrogens. About 20% of the side-chain amides were flipped by 180° because the flip resulted in substantially better hydrogen bonding or substantially less atomic overlap, while inconclusive cases were omitted. The result showed Asn and Gln terminal angle distributions with clear clustering for the first time, allowing definition of rotamers that correspond to low-energy conformations.

In the current work we extend our analysis to include all side-chains, using a database of 240 structures at 1.7 Å resolution or better and all applicable filters. Additionally, our all-atom contact analysis can easily distinguish between pairs of high- and low-energy conformations occupying approximately the same spatial position which might be mistaken for each other in lower-resolution electron density maps. If critical analysis (examination of van der Waals overlaps, electron density, and occurrence as a function of B-factor and resolution) indicates that an observed conformation is a systematic fitting error, then it is not included in our library.

The resulting rotamer library, because of the removal of side chains with uncertain conformations and systematic errors, is less prone to perpetuation of inaccuracies than those published previously. It is also complete as far as possible from the current high-quality database, and it shows a very high percentage of side chains to be rotameric.

METHODS

Nomenclature

Many different nomenclatures have been used to describe side-chain torsion angles. One of the most widely used is g+, g−, and t for gauche positive, gauche negative, and trans, respectively (as illustrated in Fig. 1 for the case
TABLE I. (Continued.)
Name  #  %  Alpha  Beta  Other  χ1 mode  χ1 comm.  χ2 mode  χ2 comm.  χ3 mode  χ3 comm.  χ3 range(e)  1/2 width at 1/2 height (χ1, χ2, χ3)
Methionine
ptp 12 2% 1% 3% 3% 68 62 −167 180 88 75 11 17 12
ptm 17 3% 1% 6% 4% 67 62 174 180 −78 −75 9 10 9
tpp 30 5% 8% 2% 5% −177 −177 66 65 75 75 10 15 15
tpt 9 2% 1% 4% 1% 179 −177 67 65 −179 180 9 8 9
ttp 28 5% 7% 7% 2% 176 −177 178 180 73 75 10 11 11
ttt 17 3% 5% 2% 2% 180 −177 171 180 174 180 9 9 19
ttm 36 7% 3% 10% 8% −177 −177 176 180 −78 −75 10 10 13
mtp 92 17% 22% 10% 17% −68 −67 177 180 72 75 10 12 14
mtt 43 8% 9% 8% 7% −67 −67 177 180 −178 180 10 13 15
mtm 58 11% 12% 11% 9% −67 −67 −177 180 −76 −75 12 11 16
mmp 15 3% 3% 1% 4% −64 −65 −63 −65 103 103 9 10 10
mmt 10 2% 0% 2% 3% −63 −65 −64 −65 180 180 12 14 19
mmm 105 19% 21% 16% 19% −66 −65 −60 −65 −67 −70 11 13 16
86% 91% 84% 83%
472/550 175 112 185
Glutamate
pt-20° 80 5% 1% 9% 7% 63 62 −175 180 −18 −20 −90 to 90 14 13 23
pm0° 32 2% 0% 0% 4% 71 70 −79 −80 5 0 −50 to 50 14 13 17
tp10° 91 6% 10% 2% 6% −177 −177 65 65 13 10 −10 to 90 14 13 17
tt 0° 350 24% 25% 42% 18% −177 −177 178 180 2 0 −90 to 90 14 14 30
tm-20° 17 1% 1% 1% 1% −177 −80 −25 −50 to 10 13 13 15
mp0° 88 6% <1% 2% 10% −65 −65 85 85 −3 0 −60 to 60 14 13 25
mt-10° 484 33% 36% 29% 32% −67 −67 177 180 −10 −10 −90 to 90 13 16 25
mm-40° 197 13% 19% 7% 12% −65 −65 −58 −65 −40 −40 −90 to 30 14 14 25
91% 92% 92% 90%
1339/1470 394 225 720
Glutamine
pt 20° 37 4% 1% 5% 6% 64 62 180 180 20 20 −90 to 90 13 14 16
pm0° 15 2% 0% 1% 3% 70 −75 0 −60 to 60
tp-100° 14 2% 4% 2% <1% −177 65 −100 −150 to 0
tp60° 78 9% 13% 9% 7% −175 −177 64 65 60 60 0 to 90 14 15 24
tt 0° 140 16% 16% 29% 12% −174 −177 173 180 −5 0 −90 to 90 14 13 40
mp0° 24 3% <1% 1% 5% −65 85 0 −60 to 60
mt-30° 304 35% 40% 26% 36% −67 −67 177 180 −25 −25 −90 to 90 16 15 37
mm-40° 127 15% 12% 13% 17% −66 −65 −60 −65 −40 −40 −95 to 0 16 18 26
mm100° 22 3% 4% 1% 2% −65 −65 100 0 to 150
88% 89% 86% 88%
761/863 229 137 395
Aspartate (no χ3 columns; range column is the χ2 range; 1/2 widths are for χ1, χ2)
p-10° 203 10% 1% 2% 13% 61 62 −4 −10 −90 to 0 9 19
p30° 194 9% 1% 5% 12% 65 62 9 30 0 to 90 8 14
t0° 438 21% 8% 44% 20% −176 −177 1 0 −50 to 50 12 30
t70° 118 6% 11% 7% 4% −179 −177 65 65 50 to 90 12 18
m-20° 1088 51% 77% 38% 47% −71 −70 −15 −15 −90 to 20 10 16
96% 97% 95% 96%
2041/2124 365 232 1444
Asparagine (no χ3 columns; range column is the χ2 range; 1/2 widths are for χ1, χ2)
p-10° 103 7% 0% 1% 10% 63 62 −13 −10 −90 to 0 8 9
p30° 132 9% <1% 7% 12% 64 62 34 30 0 to 90 6 7
t-20° 177 12% 5% 21% 12% −174 −174 −20 −20 −120 to 0 5 21
t30° 228 15% 13% 18% 15% −168 −177 31 30 0 to 80 14 22
m-20° 580 39% 65% 28% 33% −71 −65 −23 −20 −60 to 10 10 20
m-80° 118 8% 8% 9% 8% −71 −65 −76 −75 −100 to −60 9 9
m120° 58 4% 3% 3% 4% −64 −65 132 120 60 to 160 9 18
94% 95% 88% 95%
1396/1490 293 179 924
394 S.C. LOVELL ET AL.
TABLE I. (Continued.)
Name  #  %  Alpha  Beta  Other  χ1 mode  χ1 comm.  χ2 mode  χ2 comm.  χ2 range  1/2 width at 1/2 height (χ1, χ2)
Isoleucine
pp 10 1% <1% 1% <1% 62 100
pt 216 13% 4% 13% 22% 61 62 171 170 10 10
tp 36 2% 2% 1% 4% −169 −177 66 66 13 11
tt 127 8% 1% 8% 14% −174 −177 167 165 13 11
mp 19 1% 0% 2% 1% −65 100
mt 993 60% 81% 58% 41% −66 −65 169 170 10 10
mm 242 15% 10% 16% 17% −57 −57 −59 −60 10 10
99% 99% 98% 99%
1643/1667 496 629 518
Leucine
pp 21 1% <1% 2% 1% 62 80
tp 750 29% 30% 36% 23% 177 −177 63 65 10 10
tt 49 2% 1% 3% 1% −172 −172 147 145 120 to 180 9 9
mp 63 2% 1% 5% 2% −85 −85 66 65 45 to 105 11 14
mt 1548 59% 62% 46% 66% −65 −65 174 175 11 11
93% 95% 93% 93%
2431/2602 836 644 951
Histidine
p-80° 51 9% 0% 6% 13% 60 62 −75 −75 −120 to −50 10 12
p80° 26 4% 0% 4% 6% 61 62 78 80 50 to 120 13 10
t-160° 31 5% 5% 14% 1% −178 −177 −163 −165 150 to −120 12 20
t-80° 64 11% 17% 9% 9% −173 −177 −81 −80 −120 to −50 10 22
t60° 94 16% 24% 17% 12% −178 −177 62 60 50 to 120 13 19
m-70° 174 29% 26% 30% 30% −60 −65 −69 −70 −120 to −30 11 23
m170° 44 7% 9% 3% 9% −63 −65 165 165 120 to −160 10 16
m80° 78 13% 14% 10% 14% −66 −65 83 80 50 to 120 11 18
94% 94% 92% 95%
562/598 124 143 295
Tryptophan
p-90° 67 11% 2% 13% 14% 58 62 −87 −90 −130 to −60 12 10
p90° 34 6% 1% 9% 6% 60 62 92 90 60 to 130 12 8
t-105° 100 16% 27% 10% 14% 178 −177 −105 −105 −130 to −60 16 14
t90° 109 18% 28% 14% 15% −178 −177 88 90 0 to 100 10 11
m-90° 31 5% 0% 7% 7% −70 −65 −87 −90 −130 to −60 9 12
m0° 48 8% 15% 2% 8% −66 −65 −4 −5 −40 to 20 9 20
m95° 195 32% 22% 43% 29% −69 −65 95 95 60 to 130 11 19
94% 95% 98% 92%
584/618 140 175 269
Tyrosine
p90° 182 13% 1% 21% 12% 63 62 89 90 60 to 90, −90 to −60 13 13
t80° 486 34% 55% 25% 30% 176 −177 77 80 20 to 90, −90 to −75 11 14
m-85° 618 43% 26% 50% 45% −65 −65 −87 −85 50 to 90, −90 to −50 11 21
m-30° 124 9% 15% 4% 9% −64 −65 −42 −30 −50 to 0, 0 to 50 11 18
98% 97% 99% 97%
1410/1443 290 468 652 (for Tyr, Phe 90° = −90°)
Phenylalanine
p90° 202 13% 1% 24% 11% 59 62 88 90 60 to 90, −90 to −60 11 11
t80° 522 33% 57% 18% 29% 177 −177 80 80 20 to 90, −90 to −75 13 17
m-85° 697 44% 29% 51% 47% −64 −65 −83 −85 50 to 90, −90 to −50 12 17
m-30° 149 9% 12% 5% 11% −64 −65 −19 −30 −50 to 0, 0 to 50 9 20
98% 97% 99% 98%
1570/1599 389 514 667
Proline
Cγ endo 379 44% 23% 54% 43% 30 30 15 to 60 7
Cγ exo 372 43% 68% 28% 44% −29 −30 −60 to −15 6
cis, Cγ endo 56 6% 0% 1% 7% 31 30 15 to 60 5
93% 91% 84% 94%
807/928 20 57 730
TABLE I. (Continued.)
Name  #  %  Alpha  Beta  Other  χ1 act.  χ1 com.(a)  1/2 width at 1/2 height
Threonine
p 1200 49% 25% 31% 65% 59 62 10
t 169 7% 0% 13% 6% −171 −175 6
m 1062 43% 74% 55% 29% −61 −65 7
99% 100% 99% 99%
2431/2447 395 672 1364
Valine
p 169 6% 2% 8% 8% 63 63 = “177”(f) 8
t 1931 73% 90% 72% 63% 175 175 = “−65” 8
m 526 20% 7% 20% 28% −64 −60 = “60” 7
99% 100% 99% 99%
2626/2649 622 1080 924
Serine
p 1201 48% 33% 36% 55% 64 62 10
t 541 22% 22% 34% 18% 178 −177 11
m 714 29% 44% 29% 25% −65 −65 9
98% 98% 100% 98%
2456/2498 350 485 1621
Cysteine
p 64 23% 5% 23% 34% 55 62 14
t 74 26% 20% 45% 21% −177 −177 10
m 142 50% 75% 32% 43% −65 −65 11
99% 100% 100% 98%
280/285 85 65 130
(a) “mode” indicates the peak of the smoothed distribution, “comm.” indicates the common-atom value (given in bold face).
(b) Mode and 1/2 width at 1/2 height values are not given for minor rotamers.
(c) <1% indicates a value between 0.5% and 0%. 0% indicates no observations.
(d) Total number of rotameric side chains/Total number that pass all data filters.
(e) Ranges used in determining frequencies are normally common-atom values ±30°. Exceptions (always in the terminal value) are listed here.
(f) Standard conventions28,29 result in angles being named differently for Val than for Thr and Ile. These figures indicate the equivalent angles.
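The default binning rule in footnote e (common-atom value ±30° on every χ angle) can be illustrated with a short sketch. This is a minimal Python illustration and not the authors' software: the function names are hypothetical, only four Met rotamers are included (common-atom values copied from Table I), and the explicitly listed range exceptions and the rounding of bin edges to the nearest 5° are not modeled.

```python
# Common-atom chi values (degrees) for four Met rotamers, copied from Table I.
# The dictionary name and the functions below are hypothetical, for illustration only.
MET_COMMON = {
    "mtp": (-67, 180, 75),
    "mtt": (-67, 180, 180),
    "mtm": (-67, 180, -75),
    "mmm": (-65, -65, -70),
}


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def assign_rotamer(chis, library=MET_COMMON, half_bin=30.0):
    """Return the first rotamer whose bin contains every chi angle, else None.

    Assumption: every bin is the common-atom value +/- half_bin; the wider or
    narrower exception ranges listed in Table I are ignored in this sketch.
    """
    for name, centre in library.items():
        if all(abs(angle_diff(c, c0)) <= half_bin for c, c0 in zip(chis, centre)):
            return name
    return None
```

A side chain that falls in no bin returns `None`, which corresponds to the non-rotameric remainder that keeps the summed frequencies below 100%.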
Å. Clashscore was not considered, and the cutoff on R-factor was relaxed to 24% in order to encompass typical structures in each resolution range. Controls were allowed to be related to proteins in the primary database, but in order to exclude information from those or any higher-resolution structure, a control had to be the highest resolution structure within its protein family at the time it was solved. The resulting annotated list of 240 database files at 1.7 Å or better and 78 controls at 1.8–2.5 Å is available in electronic form as supplementary material (http://www.interscience.wiley.com/jpages/0887-3585/suppmat/index.html) or from our website at http://kinemage.biochem.duke.edu.

Removal of Uncertain Residues

High B-factors can arise for a number of reasons, but all indicate uncertainty in the position of the deposited coordinates. Therefore, we applied a B-factor cutoff, as discussed in the Results section and previously.22 B-factors are given in almost all PDB files; the only complication is checking for three types of cases where B-factors are ≤1. If no B-factors were assigned (which is very unusual at high resolution), that field is set either to zero or to 1.0; we omitted such files. When explicit H atoms are included sometimes their B-factors are set to zero; for such cases, we assigned the B-factor of the bonded nonhydrogen atom. If atomic displacement values ($U^2$) were listed (e.g., 1ETN, 2ER7), which produce numbers <1.0, these were converted to the more common temperature factor (B) using the relationship $B = 8\pi^2 U^2$. The whole side-chain was omitted from our database if it had a single atom with a B-factor ≥40.

Water molecules with occupancies <0.67 were not considered, and side chains were omitted if any atom had an occupancy of <1.0 or an alternate conformation flag. It is our experience22 that these residues show steric clashes significantly more frequently than those modeled with a single conformation. The B conformation of an A/B alternate pair is particularly prone to clash or even to have highly deviant covalent geometry: indeed, these residues were not checked by the quality control programs used by the PDB. For finding clashes with other side chains we used the A conformation only.

Residues were also rejected if any atom had a non-H-bonded atomic overlap of 0.4 Å or more, since either it or its neighbor must be incorrect. Van der Waals overlaps were determined by all-atom contact analysis as implemented in the Probe program.22 Detailed analysis of van der Waals interactions is not meaningful unless all atoms, including hydrogens, are used. Therefore, prior to running Probe, H
atoms were added geometrically to protein, nucleic acid, and heterogen molecules; their positions were optimized, including combinatorial analysis of local H-bond networks, using the program Reduce.23 Hydrogens were not added to water molecules, however, which our algorithms allow to provide either H-bond donor or acceptor properties as needed. Note that individual database side chains were tested for clashes within the protein structure, including any bond angle distortions, while later checks of proposed rotamer conformations were done with ideal-geometry side chains in isolation.

The optimal orientation choice for each amide or imidazole was determined by Reduce, considering both H-bond criteria and all-atom van der Waals overlaps, including polar hydrogens.23 Reduce flags each Asn, Gln, or His as either K (keep in original orientation), F (flip by 180°), X (unknown, i.e., similar score in either orientation), or C (clashing in both orientations). Only those in the “keep” or “flip” categories were used in the Asn, Gln, or His rotamer distributions.

Determination of Distribution Modes

Dihedral angles were calculated with Dang.31 Rotamer positions were defined as the mode, or highest peak, of the smoothed distribution in χ space (see Results section for rationale). Smoothing was done by placing a Gaussian mask over each data point and summing the mask values at grid points spaced every 1°. The mask had a half-width at half-height of 1° for one-dimensional data, 2° for two-dimensional, 4° for three-dimensional, and 6° for four-dimensional data. Regardless of the dimensionality, each mask had an integral of 1. The rotamer was then defined as the local maximum of the sum of masks.

Once the set of rotamers was defined as modal values, each residue type was re-examined to see which rotamers could satisfactorily be defined as having some common atoms (produced by common χ values). The criteria were

1) whether the data unequivocally demonstrated that the angles in question differed or whether a common value could fit all acceptably;
2) the extent to which the atomic contacts (for ideal geometry) occurred at similar angles;
3) whether the conformations had an inherent symmetry (e.g., to set absolute values equal for mm and pp if they made no nonequivalent backbone interactions, or to use 180° as the default t angle if the preceding angle was also t). χ1 angles were also similarly considered across classes of residues.

Half-widths (analogous to standard deviations as used with means) were defined as the angular distance plus or minus from the modal value at which the summed mask function is half of the maximum for that peak. The artifactual broadening caused by the mask width was corrected according to the following scheme:

$$\text{corrected width} = \sqrt{(\text{distribution width})^2 - (\text{mask width})^2}$$

Half-width at half-height can be converted to standard deviation, if the distribution is normal, by dividing by 1.1774 (for a normal distribution the height at 1σ is 0.606, so that the half-width is larger than σ). We have found, however, that almost all of our rotamer distributions are in fact platykurtic (i.e., flatter-topped and steeper-sided than a normal distribution), so that a standard deviation calculated from the set of points would be even smaller than the above estimate. The average half-width at half-height is given in Table I, but, for the analysis of skew, separate half-widths above and below the mode are listed in the supplementary material.

For all amino acids scatter-plot kinemages of the raw angle distributions, the modes, and the contours31 for the summed mask functions were displayed in Mage.32–34 For Arg and Lys, a 3-D kinemage was made for each χ1 (p, m, and t). Multiple peaks and asymmetries were evaluated; since each point carries its identity (file and residue) in the kinemage, a sample of outliers was identified and examined in the context of their 3-D structures. Bin boundaries for counting frequencies were defined as the common-atom angle ±30° rounded to the nearest 5° unless listed explicitly in Table I; those exceptions are wider bins for angles with broad distributions and a few narrower bins to avoid rotamer overlap. Since the bins do not include all of torsion space, the rotamer probabilities sum to a number less than 100%, which is considered the “rotamericity”7 of that residue type.

Once the boundaries were determined, the probability (% occurrence) for each rotamer overall and in each secondary structural class (helix, sheet, and other) was found. Secondary-structure assignments were from a modification of DSSP as implemented in ProCheck;18 a residue was counted as helical if it is given the strict H assignment by ProCheck and is not in the first three H’s of a run (the less restrictive first turn), as beta if given the E assignment, and as falling into the “other” category in remaining cases. Left-handed (L) residues are those with 0° < φ < 175°. Significant changes in the rotamer frequencies are discussed in the Results section. On testing for differences in modal positions as a function of secondary structure, however, none shifted significantly except for Asn and Asp, which were, therefore, given a set of backbone-dependent rotamers (Table II).

Within especially broad distributions, a few additional sample points (Table III) were chosen by visual inspection of the 2-D or 3-D distributions. These sample points were defined such that they lay within the highly populated regions of the distribution tail, 30–60° away from the position of the actual rotamer and in a nonclashing conformation. Common-atom angles are used whenever this gives a reasonable agreement with the data.

Each amino acid was built using Engh and Huber35 ideal geometry in a coordinate system with the Cα at the origin, N along the X-axis, and C in the XZ plane. Hydrogens were added with Reduce23 and a kinemage made in Prekin32 with rotatable χ angles. Each defined rotamer was examined in Mage, at both the modal and common-atom values, with all-atom contact dots calcu-
TABLE II.
Name  #  %  χ1 mode  χ1 comm.(a)  χ2 mode  χ2 comm.  χ1 range  χ2 range  1/2 width at 1/2 height (χ1, χ2)
Aspartate(b)
αm-10° 283 75% −72 −70 −14 −10 −100 to −40 −60 to 10 9 11
αt60° 72 19% −176 −177 63 60 155 to −145 −20 to 90 12 14
355 95%
m-20° 92 38% −66 −65 −21 −20 −95 to −35 −90 to 20 10 20
p10° 14 6% 65 65 13 10 35 to 95 −20 to 40 9 11
t-10° 130 53% −176 −177 −10 −10 155 to −145 −90 to 90 9 20
236 97%
Lm-30° 54 61% −64 −65 −29 −30 −95 to −35 −90 to 0 9 16
Lt30° 26 29% −162 −165 43 30 170 to −130 0 to 60 9 18
80 90%
Asparagine(b)
αm-20° 204 66% −72 −70 −17 −20 −100 to −40 −60 to 10 9 14
αm-80° 26 8% −72 −70 −81 −80 −100 to −40 −100 to −60 13 17
αm120°(c) 9 3% −70 120 −100 to −40 60 to 160
αt60° 38 12% −175 −177 64 60 155 to −145 30 to 80 11 21
αt-60° 14 5% −172 −60 155 to −145 −120 to 0
291 94%
p60° 17 8% 65 60 35 to 95 −20 to 100
m-50° 74 36% −66 −65 −49 −50 −95 to −35 −90 to 0 10 26
m120° 7 3% −65 120 −100 to −40 60 to 160
t10° 77 38% −179 −177 11 10 150 to −150 −90 to 90 6 10
175 86%
Lm-30° 91 55% −65 −65 −30 −30 −95 to −35 −70 to 10 9 18
Lt30° 58 35% −166 −165 32 30 165 to −135 0 to 60 10 12
149 90%
(a) “mode” indicates the peak of the smoothed distribution, “comm.” indicates the common-atom value (bold face).
(b) For “other” secondary structural class, use the backbone-independent rotamers.
(c) Mode and half-width at half-height are not given for minor rotamers.
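The "mode" and half-width columns above come from the Gaussian-mask smoothing described in Methods. The one-dimensional case can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: all function names are hypothetical, only one dimension is handled, angles are treated as periodic, only the highest peak is reported, and the half-height crossing is found by linear interpolation on the 1° grid (an implementation choice not specified in the text).

```python
import math

# For a Gaussian, half-width at half-height = sigma * sqrt(2 ln 2), about 1.1774 sigma.
HWHH_PER_SIGMA = math.sqrt(2.0 * math.log(2.0))


def circ_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped into (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def smoothed_density(angles, mask_hwhh=1.0):
    """Sum a unit-integral Gaussian mask centred on each angle, on a 1-degree grid."""
    sigma = mask_hwhh / HWHH_PER_SIGMA
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    grid = list(range(-180, 180))
    density = [
        sum(norm * math.exp(-0.5 * (circ_diff(g, a) / sigma) ** 2) for a in angles)
        for g in grid
    ]
    return grid, density


def mode_and_half_width(angles, mask_hwhh=1.0):
    """Return (mode, observed half-width, corrected half-width) for the highest peak."""
    grid, density = smoothed_density(angles, mask_hwhh)
    n = len(grid)
    i_max = max(range(n), key=density.__getitem__)
    half = density[i_max] / 2.0

    def half_distance(direction):
        # Walk away from the mode until the density first drops below half height,
        # then interpolate linearly between the two straddling grid points.
        for k in range(1, n // 2):
            prev = density[(i_max + direction * (k - 1)) % n]
            cur = density[(i_max + direction * k) % n]
            if cur < half:
                return (k - 1) + (prev - half) / (prev - cur)
        return float(n // 2)

    # Average of the half-widths above and below the mode.
    observed = 0.5 * (half_distance(+1) + half_distance(-1))
    # Remove the broadening introduced by the mask itself.
    corrected = math.sqrt(max(observed ** 2 - mask_hwhh ** 2, 0.0))
    return grid[i_max], observed, corrected
```

If a distribution were normal, dividing the corrected half-width by `HWHH_PER_SIGMA` would give an estimate of its standard deviation; for the platykurtic distributions reported here that figure is only an upper bound.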
lated interactively by Probe.22,31 Rotamers that exhibited any significant van der Waals overlaps within the side chain or with the fixed N, C, or Hα atoms were analyzed in detail and are discussed individually in the Results section.

RESULTS

The complete side-chain rotamer library is given in Table I, including frequencies, secondary structural preferences, angle values, and half-width at half-height (refer to Methods for rotamer nomenclature). Amino-acid types are listed in order of their number of χ angles, with backbone-dependent rotamers for Asp and Asn in Table II. For each amino acid, the library includes only those rotamers which occur consistently at high resolution and low B-factor and which show suitable clustering around plausible local energy minima. For Arg, Lys, and Met, this results in rotamers for 1/3 to 1/2 of the total staggered-angle combinations (34/81, 27/81, and 13/27, respectively). Table III additionally lists a small set of conformations which may be used to sample the occupied regions of torsion space more uniformly, chosen to be well inside the tails of the few especially wide distributions. For use in a method with a radius of convergence significantly smaller than 20–30°, a more closely spaced grid of sample points could be defined throughout the populated regions of χ space. It should be noted that extra sample points do not correspond to cluster peaks or energy minima and are not, therefore, rotamers.

Rotamer angles are listed both as the modal (peak) values found for the individual distribution and also in a version which optimizes common values (and therefore common atom positions) among rotamers with related geometries, such as ±85°, ±105°, ±175°, or 180° for the various classes of Arg χ4 values. Use of common-atom values improves efficiency in combinatorial calculations such as Monte Carlo or Dead-End Elimination repacking methods (for example13). Common-atom values have even more important beneficial effects, however, both for calculations and for fitting side chains in structure determinations, by preventing a choice between rotamers based on differences that are not statistically significant. Rotamer positions have usually been given to the nearest degree simply because those are the units in which they are measured, even though the best cases are not known more accurately than 2–3° and the rarer ones only to perhaps 10°. Omitting cases with additional confounding problems (Asn, Gln, Asp, Glu), that level of accuracy can be substantiated by comparing the most up-to-date compilations (the present work and that of Dunbrack and Cohen11), or by comparing what should be symmetrically equivalent cases within either of these studies. Table I quotes modal values to the nearest degree for each rotamer in order to document the data (except when there were <10 observations).
TABLE III. Additional Sample Points in Torsion Space

Name            χ1      χ2      χ3
Glutamate
  Spt-60°       62     180     −60
  Spt60°        62     180      60
  Stt-60°     −177     180     −60
  Stt60°      −177     180      60
  Smt-60°      −67     180     −60
  Smt60°       −67     180      60
  Smm0°        −65     −75       0
Glutamine
  Spt-60°       62     180     −60
  Spt60°        62     180      60
  Stt-60°     −177     180     −60
  Stt60°      −177     180      60
  Smt-60°      −67     180     −60
  Smt60°       −67     180      60
Aspartate
  Sp-50°        62     −50
  St-30°      −170     −30
  Sm-60°       −65     −60
Asparagine
  Sp-50°        62     −50
  St-80°      −174     −80
Phenylalanine
  Sm30°        −85      30
Tyrosine
  Sm30°        −85      30

However, we feel that the common-atom values (boldface) are preferable for almost all uses (perhaps augmented with the suggested sample points given in Table III) and that they are likely to prove more nearly correct when judged by more accurate future data sets.

Additional information is available in electronic form. As supplementary material to this article, there is a more complete version of Table I which includes explicit bin boundaries and the (sometimes asymmetric) half widths for all angles; a version of the table with common atoms and with sample points folded in, which we recommend for applications such as dead-end elimination; and the list of files that make up the high and medium resolution databases. On our website (http://kinemage.biochem.duke.edu) there are PDB-format coordinate files and kinemages32–34 with rotatable angles for all amino acids, in standard geometry in a common coordinate system; PDB-format coordinate files and kinemages with standard geometry side-chains in rotameric conformations; and the actual multidimensional data distributions in kinemage format, with bins and assigned rotamers marked and each data point identifiable. Files suitable for use with the crystallographic fitting programs O16 and XtalView17,30 are available both as supplementary material and from our website.

Effect of Filters

It is common practice for compilers of rotamer libraries to use only high-resolution structures, most often with a cutoff at 2 Å rather than the 1.7 Å used here. It is also common practice, once a structure has been chosen, to use all of its residues unless missing atoms mean the angles are undefined. However, within a given structure the quality of the electron density map often varies greatly, and the less reliable regions can easily be identified by high B-factors, alternate conformations, or low occupancies. Significant atomic overlaps also indicate local problems. We find that the use of these local quality indicators is crucial. For instance, we have shown22 that a side-chain with a B-factor above 50 is 10 times more likely to have a bad steric clash than one with a B-factor in the range of 10–20.

The B-factor cutoff is both the single most powerful and the simplest filter applied in this study. The effect on cleaning up the data, as shown for Lys in Figure 2, is dramatic. Surprisingly, previous to our work, a B-factor cutoff had only once been applied in published analyses of side-chain conformations,27 and that study was not aimed at producing a library of rotamers. Absolute values of B-factors are not directly comparable between structures due to variations in data reduction, solvent treatment, estimates of intensity falloff, and application of either global or local B-restraints. It is possible to compensate partially for such differences by normalizing B-factors by the mean and standard deviation in each structure.36,37 We feel it is preferable, however, to use the simpler absolute values, for two reasons: a high B-factor will smear out calculated electron density, unless artificially resharpened, no matter what its origin, while differences in the actual level of molecular disorder are often larger than the methodological effects. Many atomic-resolution structures have no disordered loops and thus, for good reason, no high B-factors. Within our data set, the five structures with the lowest average B had a mean clashscore (number of clashes ≥0.4 Å per 1000 atoms) of only 4.8, while the five with highest B had a mean clashscore of 23.3, confirming that absolute B-factors are a meaningful indicator of accuracy. For a comparison test we normalized the B-factors for our data set, finding that a cutoff of 1.91 standard deviations removes the same number of residues as a simple B-factor cutoff of 40. Overall, as judged by their efficiency at excluding problematic residues, the methods are nearly equivalent, since about 50% of their omitted residues have a serious clash. However, if we compare the subset that differs between the two methods, those uniquely discarded by the absolute B-cutoff have clashes in 34% of cases, whereas those uniquely discarded by the normalized cutoff clash in only 27% of cases. Methodological effects are certainly larger at low resolution, but for our purposes and our data set, an absolute cutoff has the advantage in performance as well as in simplicity.

B-factors for an individual side chain may be high for various reasons, including thermal motion, static disorder, or phase problems. One of the most important reasons, however, is the possibility of a side-chain misfitting. Refinement of a misfit side chain can either move the atom back into density or increase the B-factors, depending on the details of the local environment and the weights of the B-factor and other restraints. Whatever the cause, the conformation of a high B-factor side chain is less reliable
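The difference between an absolute B-factor cutoff and a per-structure normalized cutoff, as compared in the text above, can be illustrated with a minimal sketch. This is not the code used in this study: the function names, the choice of the population standard deviation, and the toy B-factor lists are assumptions made for illustration. Only the two cutoff values (40 and 1.91 standard deviations) are taken from the text.

```python
from statistics import mean, pstdev

ABSOLUTE_CUTOFF = 40.0      # quoted in the text
NORMALIZED_CUTOFF = 1.91    # standard deviations, quoted in the text


def discarded_absolute(b_factors, cutoff=ABSOLUTE_CUTOFF):
    """Indices of side chains rejected because B is not below the cutoff."""
    return {i for i, b in enumerate(b_factors) if b >= cutoff}


def discarded_normalized(b_factors, cutoff=NORMALIZED_CUTOFF):
    """Indices rejected after normalizing B by this structure's mean and SD.

    Assumption: population standard deviation; the source does not say
    which estimator was used.
    """
    mu = mean(b_factors)
    sd = pstdev(b_factors)
    if sd == 0.0:
        return set()  # all B-factors identical: nothing stands out
    return {i for i, b in enumerate(b_factors) if (b - mu) / sd >= cutoff}


# Hypothetical per-residue side-chain B-factors for two structures.
ordinary_structure = [10, 12, 15, 20, 45, 11, 14, 13]
low_b_structure = [5, 6, 5, 7, 6, 5, 20, 6]

for name, bs in [("ordinary", ordinary_structure), ("low-B", low_b_structure)]:
    absolute = discarded_absolute(bs)
    normalized = discarded_normalized(bs)
    print(name, "absolute only:", absolute - normalized,
          "normalized only:", normalized - absolute)
```

In the second toy structure the normalized filter rejects a side chain with B = 20 that the absolute filter keeps, which is the kind of "uniquely discarded" residue whose clash rate is compared in the text.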
±165°.36 With such discrepant definitions, means sometimes merely represent the centers of bins, giving little information about preferred conformations.

The third advantage of modes arises when two or more peaks are close together, such as for the leucine χ1/χ2 case discussed below. Drawing bins at 0°, 120°, and −120° puts two separate peaks in the tt and in the mp regions. The mean for each of these two regions lies in between the clusters, which has resulted in clashing Leu tt and mp rotamers for every previous library. In contrast, determining the modes shows two distinct peaks 60–70° apart which can be analyzed separately, as done below. Genuinely distinct peaks occur in close proximity to each other even more often if individual angles are analyzed in one dimension, producing misleading mean values. In those cases, however, modes are helpful but the best solution is use of the appropriate multidimensional treatment.

Lastly, if the observed distribution is converted to an energy equivalent, it is the mode rather than the mean that corresponds to the lowest-energy conformation.

A disadvantage of the modal-value approach is that it requires smoothing to determine the mode reliably (see Methods). Then, to determine a correct width for the smoothed peak, the effect of the smoothing function must be subtracted, which in this implementation means subtracting the mask width as the root difference of squares. In addition, for clusters with low total population, both mean and mode are susceptible to statistical fluctuations, but the mode is somewhat more so. The common-atom angle definitions, although adopted for other reasons, also avoid most of the small-population problems.

It has recently been found37 that rotamer mean values change systematically with resolution, at least in part because of averaging between unresolved alternate conformations, which produces skewed distributions at lower resolution. In one case (Leu), part of the shift is caused by a misfitting more common at low resolution (see below), but the general point remains valid and important. Although anomalous behavior of the means uncovered this interesting relationship, the modal values as seen in the data described in that study do not shift from the rotameric positions, again suggesting that modes are preferable for most purposes.

Backbone Dependence – Asp and Asn

All side chains were examined for backbone dependence of their rotamers. For most amino acids, the relative frequencies of some rotamers changed significantly between secondary-structural classes, but the position of the peaks did not. Therefore, Table I lists the probabilities in α, β, and “other” classes, along with the position of the rotamer they share. The largest and most consistent frequency changes are the often-noted lack of χ1 p conformations in helix for all amino acids other than Ser and Thr.5 Ser, Thr, Asp, and Asn have quite high probabilities of χ1 p in the “other” secondary structural class, primarily because of the H-bonding in pseudoturns and helix N-caps.25,39,40 For Phe and Tyr, χ1 p is more common in β sheet because of favorable interactions with the neighboring strand.41,42 The aromatic rotamers with χ2 near zero are significantly more common in helix,7 while Ile tt is quite common in “other” but essentially forbidden in helix because of a clash with the backbone.

There are, however, two amino acids (Asp and Asn) for which modal positions as well as probabilities are highly dependent on backbone conformation. Both are small, polar, and interact strongly and specifically with the local backbone. Their backbone-dependent rotamers are given in Table II for the α, β, and left-handed classes (“other” rotamers are essentially the same as the backbone-independent ones), while the distributions and clustering for Asn have been published previously.25 Asn is in a left-handed backbone conformation (0° < φ < 175°) in 11% of examples, the highest occurrence for any non-Gly residue, and Asp is the next highest with 4%; in both cases only two tightly clustered conformations are found. Asp rotamers are essentially the same as Asn except truncated to ±90° in χ2 by the symmetry of the carboxyl group. Local H-bonds are influential, and are similar in both cases, except for the Asn Nδ i−4 H-bond in α-helix which is, of course, not possible for Asp and results in the absence of the αm-80° rotamer for Asp.

It is likely that further division according to backbone dependence would make additional trends apparent: for example, dividing residues on β-strands according to whether the neighboring strands are parallel or antiparallel. However, for the current data that would involve further division into classes with too few members for statistical validity.

Lysine – The Statistically Simple Side Chain

Lys and Arg, with four angles each, have 81 possible staggered rotamers; Met, with three angles, has 27. It is worth exploring whether a compact description would suffice, with just a few rules that applied to multiple cases. The attempt failed for Arg and Met, where analogous sets of rotamers show relative frequencies differing by factors of three or more, presumably responding to circumstances such as nonuniform patterns of possible H-bond partners. However, Lys rotamers show reproducible patterns of relative frequencies that can be accurately predicted using only a few physically reasonable parameters, as shown in Table IV. Two parameters are the relative preferences for χ1 t (0.65) and p (0.13) as a fraction of m. Two additional parameters are penalties for the “syn-pentane”43 conflicts that occur when adjacent gauche angles change signs (mp or pm); one of those penalty factors (0.1) applies for χ2/χ3 or χ3/χ4 on the unbranched side chain, and a more severe one (estimated as 0.05) applies for χ1/χ2 which has backbone atoms on one end. Such syn-pentane cases also result in shifted values to avoid the clash, such as χ1 = −90° for Lys mptt, χ3 = 103° for Met mmp, χ2 = −80° for Glu pm0° or tm-20°, or χ2 = 100° for Ile pp or mp.

The most interesting parameter is the penalty for having a gauche angle in χ2, χ3, or χ4. It can be estimated separately for one-gauche, two-gauche, and three-gauche rotamers relative to the cases where χ2, χ3, and χ4 are trans, avoiding any comparisons that contain mp or pm
combinations. Those factors are found to be 0.21, (0.22) squared, and (0.20) cubed, suggesting that the parameters are independent and simply multiplicative. Also, the experimentally measured energy difference of 0.89 kcal/mol between gauche and trans butane44 can be converted using the Boltzmann relationship

$$E = -RT \ln P = -0.592 \ln P = -1.364 \log_{10} P$$

to give a gauche factor of 0.22.

An overall least-squares fit of the parameter values to all 81 Lys rotamer frequencies was done by minimizing the sum of squares using Mathematica.45 Since the Lys NH3+ terminal group is smaller than a methyl, six parameters were used, and the gauche penalty was fit with one value for χ2 and χ3 and a separate value for χ4, which came out as 0.20 and 0.23, respectively. The predictions in Table IV are obtained by multiplying all applicable factors for each rotamer; the correlation coefficient between predicted and observed (Pearson’s r) is 0.993. An estimate of the relative rotamer pseudo-energies can be made by adding up the energies for each applicable penalty factor, so that, for instance, mpmp acts as though it is 6 kcal/mol less favored than mttt and does not occur in our data set.

Note that the strong preference of Lys for trans angles (about 1:5:1 m:t:p) is real and is not a result of fitting disordered side chains as trans. Not only are the ratios consistent across all rotamers, but also the contrast is lower, not higher, at high B and is significantly lower for χ4, consistent with its small physical size but not with an effect from increasing uncertainty. In contrast, Met χ3 prefers gauche by more than 2:1, since the gauche form not only does not clash, but actually has favorable H-atom van der Waals contacts.22

Presumably the reason Lys behaves in such a statistically simple fashion is that although the end makes charged H-bonds, the geometry of those interactions is relatively unconstrained, with the side-chain having so many degrees of freedom that it can usually get to its appropriate position without strain. Given that the high-resolution, low-B lysines very seldom have any angles as much as 30° from staggered and populate the less-favored rotamers only as often as dictated by their pseudo-energy differences, it seems completely unjustified ever to fit partially disordered lysines with eclipsed angles or poor rotamers.

Systematic Fitting Errors – Effects on Leu, Val, Asn, Gln, and Met

It has long been known that there are enhanced probabilities of making particular types of errors when fitting side-chain conformations. For example, Fourier transform termination ripples even at 2 Å resolution can make the electron density at Cβ rather weak, giving the density for Val or Thr a flat, barlike shape which can be fit equally
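The multiplicative Lys model described in the preceding subsection can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of Table IV or of the Mathematica fit: the function names are hypothetical, and the six factors are the values quoted in the text (χ1 preferences 0.65 and 0.13, syn-pentane penalties 0.1 and 0.05, gauche penalties 0.20 and 0.23), which may differ slightly from the final fitted values.

```python
import math
from itertools import product

RT = 0.592  # kcal/mol, as quoted in the text

# Factors quoted in the text (assumed here; Table IV itself is not reproduced).
CHI1_PREFERENCE = {"m": 1.0, "t": 0.65, "p": 0.13}
GAUCHE_CHI2_CHI3 = 0.20
GAUCHE_CHI4 = 0.23
SYN_PENTANE_CHI1_CHI2 = 0.05
SYN_PENTANE_OTHER = 0.10


def boltzmann_factor(energy_kcal):
    """Relative population for an energy difference, P = exp(-E / RT)."""
    return math.exp(-energy_kcal / RT)


def lys_weight(rotamer):
    """Unnormalized weight of a four-letter Lys rotamer name such as 'mttt'."""
    weight = CHI1_PREFERENCE[rotamer[0]]
    for i in (1, 2):                      # chi2 and chi3
        if rotamer[i] != "t":
            weight *= GAUCHE_CHI2_CHI3
    if rotamer[3] != "t":                 # chi4
        weight *= GAUCHE_CHI4
    for i in range(3):                    # adjacent pairs with opposite gauche signs
        if rotamer[i] + rotamer[i + 1] in ("mp", "pm"):
            weight *= SYN_PENTANE_CHI1_CHI2 if i == 0 else SYN_PENTANE_OTHER
    return weight


def lys_fractions():
    """Predicted fraction for each of the 81 staggered Lys rotamers."""
    names = ["".join(letters) for letters in product("mtp", repeat=4)]
    weights = {name: lys_weight(name) for name in names}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pseudo_energy(rotamer, reference="mttt"):
    """Pseudo-energy (kcal/mol) of a rotamer relative to the reference."""
    return -RT * math.log(lys_weight(rotamer) / lys_weight(reference))


fractions = lys_fractions()
print(round(boltzmann_factor(0.89), 2))   # gauche factor from the butane energy
print(max(fractions, key=fractions.get))  # most populated predicted rotamer
```

Because every factor is at most 1 and mttt incurs none of the penalties, mttt is the most favored rotamer in this sketch, while a rotamer such as mpmp accumulates three gauche and three syn-pentane penalties and is predicted to be several kcal/mol less favorable.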
Fig. 5. a: Correlation of rotamer frequency with B-factor for both genuine and
misfit Leu rotamers. B-factor bins were constructed to contain the same number of
points in each bin for the whole distribution. The % frequency of the rotamer in each
B-factor bin is plotted in the Y-direction, at the X position of the mean B-factor for that
bin. The lower panel is an enlargement of the bottom section of the main plot to show
more clearly the slope of the lines for the rarer rotamers. Systematically misfit
rotamers (tt* and mp*) are indicated by open symbols and red lines. b: A comparison
of the structures and their contacts for genuine rotamers (left) versus their misfit
partners (right). Blue and green dots indicate positive van der Waals interactions,
yellow lines indicate modest (still favorable) van der Waals overlaps, and orange or
red lines indicate van der Waals clashes, as calculated with Probe and displayed in
Mage.
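The equal-population binning described in the caption of Fig. 5a can be sketched as follows. This is a minimal illustration with hypothetical names and toy data, not the script used to prepare the figure: records are sorted by B-factor, split into bins holding (as nearly as possible) the same number of points, and each bin reports its mean B-factor together with the percentage frequency of every rotamer in it.

```python
from collections import Counter


def rotamer_frequency_by_b_bin(records, n_bins):
    """Return [(mean_B, {rotamer: percent}), ...] for equal-count B-factor bins.

    records: iterable of (b_factor, rotamer_name) pairs (hypothetical layout).
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    n = len(ordered)
    result = []
    for k in range(n_bins):
        chunk = ordered[k * n // n_bins:(k + 1) * n // n_bins]
        if not chunk:
            continue  # more bins than data points
        mean_b = sum(b for b, _ in chunk) / len(chunk)
        counts = Counter(name for _, name in chunk)
        percent = {name: 100.0 * c / len(chunk) for name, c in counts.items()}
        result.append((mean_b, percent))
    return result


# Toy Leu data: the misfit 'tt*' appears only among high-B residues.
toy = [(10, "mt"), (12, "mt"), (14, "mt"), (16, "tp"),
       (30, "mt"), (32, "tt*"), (34, "tt*"), (36, "tp")]

for mean_b, percent in rotamer_frequency_by_b_bin(toy, n_bins=2):
    print(mean_b, percent)
```

Plotting each rotamer's percentage against the bin's mean B-factor then gives one line per rotamer; a frequency that rises with B, as for tt* in the toy data, is the signature attributed to misfit conformations.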
and 40% for mp*. A mixture of conformations could also produce inverted B-factors, but not just for the suspect rotamers. Therefore, the observed patterns are most consistent with incorrect Cγ positions for tt* and mp*.

The strongest piece of evidence for rotamer correctness is a positive correlation with map quality (i.e., either resolution or B-factor), whereas a misfit conformation should correlate negatively. Figure 5a shows the variation of Leu rotamer occurrence with B-factor. Those rotamers we define as genuine become more common as B-factor decreases, whereas the flipped conformations tt* and mp* become less common. There are two explanations for this: either misfitting results in a higher B-factor, or flexibility in the side-chain results in poor electron density which is easy to fit incorrectly. Either explanation suggests an error. Since our data covers a limited range of resolution, we selected additional structures from 1.8 to 2.5 Å resolution (see Methods); plotting Leu rotamer frequencies vs resolution shows the same pattern as Figure 5a but with somewhat lower slopes.

For all of the above reasons, we conclude that the mp* and tt* conformations are very unlikely to be correct. We simply omit them from our data rather than transforming them to the two major peaks, because these misfittings usually cause movement of backbone and Cβ, so that their transformed coordinates would be unreliable. After the backward leucines are omitted, there remains a valid rotamer cluster in each of the tt and mp areas (Figure 4) which is clash-free and shows the correct B-factor dependence (Figure 5a). Because tt* and mp* are more numerous at lower resolution and higher B, every previous rotamer defined for Leu tt or mp has either been between the two clusters or in the incorrect one. Not every individual Leu in tt* or mp* is necessarily a mistake, since occasionally the environment might force the side chain into that particular strained conformation. Those conformations, however, are not rotameric.

In an analogous manner to Leu, Met has several sets of conformations that are, in part, spatially isosteric for the sulfur and sometimes the Cε.51 If χ2 is p, then rotating χ1 by +60° and χ2 by −120° will return the sulfur atom to its original position. If χ2 is m, then rotation of χ1 by −60° and χ2 by +120° will have the same effect. Alternatively, if χ3 is m, rotation of χ2 by −40° and χ3 by +120° (+40° and −120° if χ3 is p) puts the sulfur and Cε in partially isosteric positions. Because the sulfur has no hydrogens, these transformations do not result in steric clashes. They do, however, involve changing angles by about 60°, resulting in near-eclipsed dihedrals. If the electron density is at all ambiguous, two equally good conformations may seem possible, but in reality the side chain should never be fit in an eclipsed conformation unless other possibilities have been ruled out. In our experience a combination of using lower map contours, examining the all-atom long-range van der Waals interactions, and trying the valid rotamers can almost always suggest a Met conformation which is both in the density and rotameric.

In structure determinations, appropriate criteria should be met before fitting a side chain as nonrotameric. There should be good evidence that the nonrotamer is a better fit to the data than any rotamer, there should be a structural reason for adoption of that conformation, and any steric clash should be avoidable with only modest bond angle distortion.

Proline and Disulfides – Special Cases

Proline ring-pucker states can be treated as equivalent to rotamers, since they alter the backbone conformation only very slightly. Most rotamer libraries, if they include Pro, treat it as having three conformations: Cγ-endo (or “up”), Cγ-exo (or “down”), and planar.6,9,16,17 Some force fields and refinement methods allow puckers also at other ring atoms (especially Cβ), and such cases occur in our database. However, it has been argued convincingly that Pro has only two preferred puckers, rather than three or more;38 the planar and Cβ pucker states are absent at high resolution in small-molecule structures. We also found in a previous study that long-range clashes are substantially decreased by substituting either the Cγ-endo or Cγ-exo states for planar or Cβ puckers.22 We, therefore, treat Pro as having only two acceptable puckers, which occur in the present data in equal numbers, clustered at values consistent with those found previously.38 Electron density that appears planar is often observed for Pro rings in protein structure determination; this is probably caused by averaging between the Cγ-endo and -exo pucker states and is better modeled as two alternate conformations, as often seen directly at higher resolution. Prolines preceded by cis peptides are always observed to have the Cγ-endo pucker.

Disulfides can also be surprisingly difficult to fit correctly, since Fourier ripples from the sulfur atoms can result in weak or shifted density for one or both Cβs. Additionally, incorrectly fit disulfides are hard to fix because of multiple constraints. A strict resolution limit and avoidance of high-B or alternate-conformation examples are, thus, very important for analyzing disulfides, but they are rare enough that our present database is too small to deal with all five angles. A complete five-angle library will be presented in a subsequent paper, using a database chosen to be suitable for that purpose.

Clashing Rotamers and Bond Angle Distortions

Three types of clustered, well-populated, correctly B-dependent rotamers in our library are found to have moderate but significant atomic overlaps when built in standard geometry. These are m-30° of Phe or Tyr, p30° of Asn or Asp, and those with χ4 ±105° for Arg. In each case, the bond angles of observed examples are opened out slightly to ease those clashes, and there are also favorable H-bond or packing interactions that can help to compensate for the strained conformation.

For Phe and Tyr, the χ1/χ2 distribution is populated throughout χ2 when χ1 is m, as shown by Schrauber et al.7 and in our data. With this χ1 the aromatic ring lies between the two smallest backbone atoms (Hα and N), but in ideal geometry, for a large range of χ2 (from −40° to +50°), there is steric overlap between the edge of the ring and the backbone (Hδ to N). This overlap is 0.3 Å at the
modal position (χ1 = −64°, χ2 = −19°) for Phe, which is below our clash cutoff of 0.4 Å but not negligible. There are good reasons, however, to believe that this conformation is correct: an aromatic ring is difficult to fit incorrectly at 1.7 Å resolution or higher, and aromatics tend to be found in the interior of the protein where the electron density is best. Most of the 273 residues in this rotamer show local bond-angle distortions: the mean Cα-Cβ-Cγ angle for m-30° in Tyr and Phe is 115.6 ± 2.0°, compared with 113.6 ± 2.4° for the overall distribution and with Engh and Huber standards of 113.9 ± 1.8° for Tyr and 113.8 ± 1.0° for Phe. With this 2° bond-angle enlargement (and often with an increase in the N-Cα-C angle as well), these residues do not, in fact, routinely show ring-to-backbone van der Waals overlaps. Such side chains are usually well packed, which presumably both prevents other rotamers and provides favorable interactions to compensate for the modest bond-angle strain.

The overlap for the Asp or Asn p30° rotamer is 0.36 Å and is present for any standard geometry conformation with χ1 p. Bond angle increases are seen but are within the standard deviation of the distribution (Cα-Cβ-Cγ angle for all Asn is 112.5 ± 2.1°; for p30° rotamer 113.6 ± 2.1°). Almost all of the 132 p30° examples are H-bonded to i+2 or i+3 NHs in a pseudoturn or a helix N-cap arrangement, which could offset the energy penalty of a small bond-angle distortion and/or a small remaining overlap.

The van der Waals overlap of the arginine mtm105° (or ttm105°, mtp105°, ttp105°) rotamer in ideal geometry is slightly larger (0.46 Å H to Hγ), and we see 41 total examples. The size of this clash may indicate that the radius we use for hydrogens on charged groups (1.0 Å) is still slightly too large. However, the χ4 value of ±105° is in the local optimum given an oppositely signed χ3, while the offset from the usual Arg χ4 value of ±85° confirms that these rotamers are, indeed, disfavored. Guanidinium H-bonds provide both conformational restraints and compensating favorable interactions.

Positive-Feedback Cycles for Bad Rotamers

The real damage from including poor rotamers in a library is that they can become self-fulfilling prophecies. The cycle arises because almost any conformation will occasionally be the best fit to some poorly connected piece of electron density, so the bad rotamer will begin to show up in new structures. If later rotamer libraries include low-resolution or high B-factor residues, then that same bad conformation will seem confirmed as a valid rotamer. There is, indeed, evidence of this taking place. We have previously discussed this effect for Asn and Gln rotamers with incorrectly flipped amides and seriously clashing NH2 groups,25 such as Gln tpt6 or Asn p180°.9 In the current data, there is an especially clear example for Met in the tpm conformation. This rotamer appears in the library used in the crystallographic fitting program O, which is based on the library of Ponder and Richards but has been extended to fill out angles which were undefined in that study. Met tpm, as shown in Figure 6a, is clearly impossible, having a 0.69 Å atomic overlap between the Hα and Hεs, even with methyl rotation optimized. There are no examples of this conformation in our database, but it occurs three times in the control set of 78 structures at 1.8–2.5 Å resolution and also for the altered side chains in some high-resolution mutant structures. It seems likely that the inclusion of this side-chain conformation in the library of such a popular refitting program has led to its appearance in structures where the density may be ambiguous. The three structures which exhibit this rotamer at B <40 were solved at 1.8, 2.0, and 2.2 Å, suggesting that the increase in bias of structures towards rotamer libraries happens even at relatively high resolutions, within the range usually used for compiling other libraries.

Other examples of previously defined rotamers with prohibitive clashes and unsupported by our data have resulted from eclipsed angles such as Val 120°7 (Figure 6b), from a peak at the average between two clusters such as Leu tt,11 or perhaps most insidiously from data with systematic fitting errors such as the Leu tt* or mp* cases discussed above. Other clashing rotamers, such as Arg tmtp10 (Figure 6c) or Lys mmpt9 (Figure 6d), may be included out of a desire for complete sampling of conformational space or from poor behavior on energy minimization.

In the present study, we have included data only from structures of 1.7 Å resolution or better and side chains only with B <40, to maximize the level of direct evidence for each individual conformation. Each defined rotamer was then required to pass both criteria of good occurrence and of clustering in the distribution from the high-quality data and also of constituting a convincing local optimum for all-atom van der Waals analysis in ideal geometry; borderline cases were decided by analyzing their behavior as a function of B-factor. We believe, therefore, that it is unlikely that the present library contains any artificial rotamers, thus breaking the feedback cycle.

DISCUSSION

Overall, these results show even more strongly than before that protein side-chain conformations do indeed occur as well-defined rotamers. A library of rotamers is the preferred form of analysis if two conditions for the behavior of side-chains are met:

1) conformations occur as relatively tight clusters in multidimensional space, and
2) the permissible cluster locations and probabilities cannot simply be determined by multiplying together the individual angle distributions.

In testing the validity of the second criterion, we find that only Lys follows rules strikingly simpler than rotamer enumeration; all 81 Lys rotamer frequencies can be modeled to very high accuracy using only six physically realistic parameters (Table IV). Even for Lys, the individual frequencies are strongly dependent on the neighboring angles (e.g., χ3 on χ2 and χ4), and the dependencies are even more complex for other amino acids. In addition, there are minor rotamer combinations with atomic clashes at the staggered angles which have their peak occurrences
at significantly shifted angles (e.g., Arg mtm105° or Leu A third, more complex, factor covers differences in choice
tt); this effect adds real but misleading shoulders to the of definitions and methodologies. Some disagreements
peaks in one-dimensional distributions, further confirm- arise from blurring the distinction between a true rotamer
ing the need for multidimensional analysis. (i.e., a locally favored conformation with clustered ex-
The truth of the first condition (tight clusters) was amples) and an arbitrary sample point in conformation
challenged by Schrauber et al.7 They showed that despite space. Many computational uses of rotamers require addi-
more and better data than in the original treatments, tional sampling within the allowed regions, but such
many residues were ⬎20° from a rotamer mean, while sample points are not real rotamers because their spacing
summing angle ranges even as wide as ⫾20° for long side and position depends on their intended use, not on the
chains would locate the functional group very imprecisely. properties of the side-chain conformations. Therefore, we
However, with further increase in database size and have provided a minimal set of sample points separately
accuracy and with the application of stronger quality (Table III), rather than including them in the rotamer
criteria, we have shown here and in previous work25 that library. An additional problem is that extra sample points
almost all of the clusters tighten very satisfactorily. Gln are helpful only if they correspond to populated regions of
3, when 2 is trans, provides the only really refractory the distributions and are physically reasonable conforma-
case, while many rotamers show half widths less than tions, which has not always been the case.
⫾10° (see Table I). Because the multidimensional clusters
Most earlier work used the mean (average) value as the
in space are round rather than rectangular, the com-
rotamer position, whereas we use the mode (peak occur-
bined effect for long side chains is the root sum of squares,
rence). Determining the mode requires smoothing the
rather than the direct sum, of the individual angle spreads.
distribution, but modes have important advantages of
To illustrate the overall level of rotamer clustering in
corresponding to the local energy minima and of being
Cartesian space for real side chains, Figure 7 shows the
sensitive to closely spaced peaks while independent of
superposition of all examples in our database of three
skewed peak shape or of arbitrarily defined bins. As was
neighboring Lys rotamers: mtpt, ttmt, and ttpt. Even at
the terminal atom the clusters are tight, despite the done by De Maeyer et al.,10 we also list common-atom
distribution at each angle having a significant spread. The distributions of Nζ positions for mtpt (blue) and ttmt (yellow) are completely overlapping with means only 0.27 Å apart, whereas the Nζ distribution for the near-neighbor rotamer ttpt (green) is well-separated from the others in its own distinct location 2.1 Å away. The standard deviation of Lys Nζ atom positions in a given rotamer is about 0.8 Å, which certainly seems narrow enough to confirm the practical utility of rotamers; even with four angles, the rotamer clusters are crisply distinct.

Comparison With Other Libraries

For the simpler amino acids and the most common rotamers, all libraries, of course, agree quite well, at least in existence and position if not always in probability. For the rarer rotamers and the more difficult residue types (including Lys, Arg, Met, Leu, Gln, Glu, Asn, Asp, and Pro), there are at least three factors governing disagreements between this and previous work. Growth in the database is crucial to such efforts, but here it is not the most decisive issue; our raw data are essentially indistinguishable from those of Dunbrack and Cohen.¹¹

The second factor is the development of our new methods for optimizing explicit H positions²³ and representing all-atom contacts clearly and dramatically.²² If graphics such as Figure 5b and Figure 6 had been available to earlier authors, their rotamer lists would almost certainly have been affected. The all-atom contact analysis, in both visual and quantitative forms, was essential to discarding from the present library a significant number of previous rotamers now shown to represent flipped amides or systematic fitting errors. On the other hand, this process helped in validation of a relatively large set of well-behaved rotamers down to the level of 1–2% occurrence probability.

rotamer positions with common angles for cases that have similar data and equivalent subsets of geometry and contacts. This streamlines some applications, and it avoids the danger of choosing between rotamers based on a difference that is not statistically significant.

Differing treatment as well as size of the database used is an important methodological issue. The 240-protein database used here is much larger than early ones⁶,⁷ but is either similar in size to or smaller than those used in recent studies.⁹,¹¹ It is, however, restricted to higher-resolution structures (1.7 Å here vs 2.0 Å⁶,⁷,¹¹ or 2.5 Å⁹) and to structures satisfying a number of other quality criteria (see Methods). Most importantly, the number of side chains analyzed is further reduced by eliminating those with uncertain conformations. In general, when a side chain has been shown to be either wrong or uncertain we simply omit it from the compiled data, because any correction process not using the experimental data would be highly suspect. The only exceptions are the 180° flips of side-chain amides or imidazoles which we do correct in unambiguous cases, and the orientation of movable hydrogen positions, neither of which affects agreement with the X-ray data significantly. A larger database is clearly desirable when trying to distinguish signal (correct rotamers) from random statistical noise, because the signal-to-noise ratio increases as the square root of the number of observations. However, that relationship holds only if the data is of uniform quality and if the errors are random, neither of which is the case for side-chain conformations. In fact, since low-resolution, high B-factor data is most susceptible to systematic errors, adding such observations will degrade rather than improve the results. In effect, we filter out the noise rather than attempting to amplify the signal.
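The statistical point made above (that more observations help only when their errors are random) can be illustrated with a small simulation. The Python sketch below is not part of the original analysis: the true angle, the misfit angle, the spreads, the B-factor ranges, and the function names are all invented for illustration, and only the B-factor cutoff of 40 is taken from the abstract. A plain mean is used for brevity, although the library itself is built on modal values.

```python
import random
import statistics

TRUE_CHI = -65.0   # hypothetical true rotamer angle (degrees)
B_CUTOFF = 40.0    # B-factor cutoff quoted in the abstract


def simulate(n_good, n_bad, seed=0):
    """Return (chi_angle, b_factor) pairs for a toy data set.

    Well-ordered residues scatter randomly about TRUE_CHI with low B-factors;
    poorly ordered ones are systematically misfit near -120 degrees with high
    B-factors. All of these numbers are assumptions, not measured values.
    """
    rng = random.Random(seed)
    good = [(rng.gauss(TRUE_CHI, 10.0), rng.uniform(5.0, 35.0))
            for _ in range(n_good)]
    bad = [(rng.gauss(-120.0, 25.0), rng.uniform(45.0, 80.0))
           for _ in range(n_bad)]
    return good + bad


def mean_chi(observations):
    # Plain arithmetic mean; angles are not wrapped in this toy model.
    return statistics.fmean([chi for chi, _ in observations])


clean = simulate(200, 0)       # small but reliable data set
mixed = simulate(200, 400)     # three times larger, mostly unreliable
filtered = [obs for obs in mixed if obs[1] <= B_CUTOFF]

for label, data in (("clean", clean), ("mixed", mixed), ("filtered", filtered)):
    err = abs(mean_chi(data) - TRUE_CHI)
    print(f"{label:9s} n={len(data):4d}  |error| = {err:5.1f} deg")
```

In this toy setting the larger, unfiltered set gives the poorer estimate because its extra observations carry a systematic offset, while the B-factor filter recovers exactly the reliable subset. Real data are of course not this cleanly separable, so the sketch only illustrates the direction of the effect.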
We feel the value of our approach has been confirmed by the production of clean, well-clustered distributions and the settling of some previously-unanswered questions. In particular, we recommend that any study of conformational details should either omit examples with high B-factors, because of the combination of easy application and impressive effectiveness, or else should specifically examine behavior as a function of B as was done here to test for possible artifacts.

Nonrotameric Side Chains

A side-chain rotamer is normally taken to mean a combination of angles producing a locally low-energy conformation, found empirically as a cluster of observations in torsion space. By defining rotamer boundaries in torsion space, it is possible to study how often long-range interactions shift side chains away from the preferred rotameric conformations. In this study we draw boundaries at the common-atom values ±30° (exceptions are listed in Table I, primarily for angles with shallower energy wells) and count as nonrotameric any residue which falls outside these bounds, including both those with near-eclipsed angles and those with staggered angles but highly unfavorable angle combinations.

For a dihedral angle between two tetrahedral carbons, a fully eclipsed conformation has an energy 4–10 kcal/mol higher than that of a staggered conformation.⁵³ This is equivalent to two to four hydrogen bonds, and we have, indeed, observed a few low B-factor Gln residues with eclipsed χ1 angles and three or four hydrogen bonds. This does not mean that conformations with eclipsed angles should be defined as rotamers. It does mean, however, that occasionally it is appropriate to use nonrotameric conformations in either experimentally determined or theoretical protein models if there is good reason. “Good reason” may mean clear density in a non–phase-biased map, tightly constrained local packing, or the ability to make several hydrogen bonds to offset the energy lost in forcing a nonrotameric conformation. Whenever a nonrotamer is used, it should be because no rotameric conformation fits the available data nearly as well.

Clashing Rotamers

Nonrotameric conformations, and a few rarely populated genuine rotamers, may have internal van der Waals overlaps when built in standard geometry. These overlaps should be small in size, and it should be possible to largely offset them with small local geometry distortions. In every case where such conformations are significantly populated in our data, we have closely examined not only the distributions but also the structures to make sure they are reasonable. For several examples in each case, we have also examined electron density maps. The three types of slightly overlapping rotamers in the present library all have many low-B examples, with clear electron density; their overlaps can be relieved by modest bond angle changes, and they typically show favorable compensating interactions. These cases, we conclude, are indeed genuine examples of somewhat strained rotamers.

In contrast, because of the steepness of the Lennard-Jones potential, more serious van der Waals overlaps involve a prohibitively large energy penalty. Whereas a protein may be able to offset an eclipsed angle, it is probably never able to offset the many tens of kcal/mol needed to stabilize a van der Waals overlap of about 0.6–1.0 Å as some published rotamers display (Figure 6). Such configurations are much more likely to be errors than correct-but-rarely-populated conformations. As discussed in the Results section, these cases can be understood as caused by defining a rotamer at the average between two clusters, by choosing the wrong flip state of a group which appears symmetric without explicit hydrogens, or by the inclusion of systematically misfit conformations. We conclude that none of those cases should properly be called side-chain rotamers.

CONCLUSIONS

The present rotamer library has been constructed using more of the available information than previous studies, including various measures of the reliability of individual side-chain conformations and tests of the conformational validity of potential rotamers. We took advantage of two new criteria (all-atom contact analysis and B-factor dependence), which are independent of each other and of earlier work, in order to settle the borderline cases. All of the rotamers listed here correspond to local energy minima and peaks in the observed distributions. Once poorly determined side chains are discounted and flips corrected, an extremely high proportion (>90% for most residues) are in good rotameric conformations as defined by this library.

For the low-B regions of high-resolution protein structures, individual side chains in conformations far from a rotameric position fall into three classes: a) a few types with looser constraints than most (e.g., Gln or Glu with χ2 t); b) those which we suggest are fitting errors, such as flipped Asn or Leu; and c) interesting cases (relatively common near active sites but especially unlikely for disordered surface residues) for which the higher energy is offset by other positive interactions. These observations suggest that proteins exhibit significantly strained side-chain conformations surprisingly rarely and only for good reasons.

The result of this work is, we believe, a clear improvement on all previous libraries, one that neither omits any important rotamers nor includes any which are significantly in error. It is, however, called penultimate, because applying suitably strict filters to the currently available structures yields too few residues to determine accurately the reliability and position of the rare minor rotamers. Therefore, in a few years’ time after many more atomic-resolution structures have been solved, it should be possible to produce a definitive rotamer library that can stand permanently to support accurate modeling of both experimental and predicted protein structures.

ACKNOWLEDGMENTS

We thank Hope Taylor and Brent Presley for constructing ideal-geometry side chains, Brent Presley for making
angle plots, and Lizbeth Videau for critical reading of the manuscript. This work was supported in part by academic leave for J.M.W. from Glaxo-Wellcome.

REFERENCES

1. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol 1963;7:95–99.
2. Janin J, Wodak S, Levitt M, Maigret B. Conformation of amino acid side-chains in proteins. J Mol Biol 1978;125:357–386.
3. Bhat TN, Sasisekharan V, Vijayan M. An analysis of side-chain conformation in proteins. Int J Pept Protein Res 1979;13:170–184.
4. James MNG, Sielecki AR. Structure and refinement of penicillopepsin at 1.8 Å resolution. J Mol Biol 1983;163:299–361.
5. McGregor MJ, Islam SA, Sternberg MJE. Analysis of the relationship between side-chain conformation and secondary structure in globular proteins. J Mol Biol 1987;198:295–310.
6. Ponder JW, Richards FM. Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 1987;193:775–791.
7. Schrauber H, Eisenhaber F, Argos P. Rotamers: to be or not to be? An analysis of amino acid side-chain conformations in globular proteins. J Mol Biol 1993;230:592–612.
8. Tuffery P, Etchebest C, Hazout S, Lavery R. A new approach to the rapid determination of protein side-chain conformations. J Biomol Struct Dyn 1991;8:1267–1289.
9. Tuffery P, Etchebest C, Hazout S. Prediction of protein side-chain conformations: a study of the influence of backbone accuracy on conformation stability in the rotamer space. Protein Eng 1997;10:361–372.
10. De Maeyer M, Desmet J, Lasters I. All in one: a highly detailed rotamer library improves both accuracy and speed in the modelling of sidechains by dead-end elimination. Fold Des 1997;2:53–66.
11. Dunbrack RL, Cohen FE. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci 1997;6:1661–1681.
12. Bower MJ, Cohen FE, Dunbrack RL. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. J Mol Biol 1997;267:1268–1282.
13. Lasters I, De Maeyer M, Desmet J. Enhanced dead-end elimination in the search for the global minimum energy conformation of a collection of protein side-chains. Protein Eng 1995;8:815–822.
14. Desjarlais JR, Handel TM. De novo design of the hydrophobic cores of proteins. Protein Sci 1995;4:2006–2018.
15. Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science 1997;278:82–87.
16. Jones TA, Zou J-Y, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A 1991;47:110–119.
17. McRee DE. Practical protein crystallography. San Diego: Academic Press; 1993.
18. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. ProCheck–A program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993;26:283–291.
19. Hooft RWW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature 1996;381:272.
20. Bernstein FC, Koetzle TF, Williams GJ, et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535–542.
21. Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res 2000;28:235–242.
22. Word JM, Lovell SC, LaBean TH, Taylor HC, et al. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogens. J Mol Biol 1999;285:1711–1733.
23. Word JM, Lovell SC, Richardson JS, Richardson DC. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol 1999;285:1735–1747.
24. van den Akker F, Hol WGJ. Difference density quality (DDQ): a method to assess the global and local correctness of macromolecular crystal structures. Acta Crystallogr D 1999;55:206–218.
25. Lovell SC, Word JM, Richardson JS, Richardson DC. Asparagine and glutamine rotamers: B-factor cutoff and correction of amide flips yield distinct clustering. Proc Natl Acad Sci USA 1999;96:400–405.
26. Benedetti E, Morelli G, Némethy G, Scheraga HA. Statistical and energetic analysis of side-chain conformations in oligopeptides. Int J Pept Protein Res 1983;22:1–15.
27. Kuszewski J, Gronenborn AM, Clore GM. Improvements and extensions in the conformational database potential for the refinement of NMR and X-ray structures of proteins and nucleic acids. J Magn Reson 1997;125:171–177.
28. IUPAC-IUB. Commission on biochemical nomenclature: abbreviations and symbols for the description of the conformation of polypeptide chains. J Mol Biol 1970;52:1–17.
29. Markley JL, Bax A, Anata J, et al. Recommendations for the presentation of NMR structures of proteins and nucleic acids. J Mol Biol 1998;280:933–952.
30. McRee DE. XtalView/Xfit–a versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999;125:156–165.
31. Word JM. All-atom small-probe contact surface analysis: an information-rich description of molecular goodness-of-fit. Dissertation: Duke University; 2000.
32. Richardson DC, Richardson JS. The kinemage: a tool for scientific illustration. Protein Sci 1992;1:3–9.
33. Richardson DC, Richardson JS. Kinemages–simple macromolecular graphics for interactive teaching and publication. Trends Biochem Sci 1994;19:135–138.
34. Richardson JS, Richardson DC. “MAGE, PROBE, and Kinemages”. International Tables for Crystallography vol. 4. Dordrecht: Kluwer Academic Publishers; 2000 (in press). Chapter 25.2.8
35. Engh RA, Huber R. Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr A 1991;47:392–400.
36. Carugo O, Argos P. Correlation between side-chain mobility and conformation in protein structures. Protein Eng 1997;10:777–787.
37. MacArthur MW, Thornton JM. Protein side-chain conformation: a systematic variation of χ1 mean values with resolution–a consequence of multiple rotameric states? Acta Crystallogr D 1999;55:994–1004.
38. Némethy G, Gibson KD, Palmer KA, et al. Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline-containing peptides. J Phys Chem 1992;96:6472–6484.
39. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of α-helices. Science 1988;240:1648–1652.
40. Wan W-Y, Milner-White EJ. A recurring two-hydrogen-bond motif incorporating a serine or threonine residue is found both at α-helical N termini and in other situations. J Mol Biol 1999;286:1651–1662.
41. Summers NL, Carlson WD, Karplus M. Analysis of side-chain orientations in homologous proteins. J Mol Biol 1987;196:175–198.
42. Richardson JS, Richardson DC, Tweedy NB, et al. Looking at proteins: representations, folding, packing, and design. Biophys J 1992;63:1186–1209.
43. Dunbrack RL, Karplus M. Conformational analysis of the backbone-dependent rotamer preferences of protein sidechains. Nat Struct Biol 1994;1:334–340.
44. Wiberg KB, Murcko MA. Rotational barriers: 2. Energies of alkane rotamers. An examination of gauche interactions. J Am Chem Soc 1988;110:8029–8038.
45. Wolfram Research I. Mathematica Version 3.0. Champaign, IL: Wolfram Research, Inc.; 1996.
46. Richardson JS. The anatomy and taxonomy of protein structure. In: Anfinsen CB, Edsall JT, Richards FM, editors. Advances in protein chemistry. New York: Academic Press; 1981. p 167–339.
47. Richardson JS, Richardson DC. Interpretation of electron density maps. Methods Enzymol 1985;115:189–206.
48. Creamer TP, Rose GD. Side-chain entropy opposes α-helix formation but rationalizes experimentally determined helix-forming propensities. Proc Natl Acad Sci USA 1992;89:5937–5941.
49. Lee C, Subbiah S. Prediction of protein side-chain conformation by packing optimization. J Mol Biol 1991;217:373–388.
50. Dunbrack RL, Karplus M. Backbone-dependent rotamer library for proteins: application to side-chain prediction. J Mol Biol 1993;230:543–574.
51. Petrella RJ, Lazaridis T, Karplus M. Protein sidechain conformer prediction: a test of the energy function. Fold Des 1998;3:353–377.
52. Kuszewski J, Gronenborn AM, Clore GM. Improving the quality of NMR and crystallographic protein structures by means of a conformational database potential derived from structure databases. Protein Sci 1996;5:1067–1080.
53. Karplus M, Parr RG. An approach to the internal rotation problem. J Chem Phys 1963;38:1547–1552.