Automated Docking Using a Lamarckian Genetic Algorithm and an Empirical Binding Free Energy Function

GARRETT M. MORRIS,1 DAVID S. GOODSELL,1 ROBERT S. HALLIDAY,2 RUTH HUEY,1 WILLIAM E. HART,3 RICHARD K. BELEW,4 ARTHUR J. OLSON1

1 Department of Molecular Biology, MB-5, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037-1000
2 Hewlett-Packard, San Diego, California
3 Applied Mathematics Department, Sandia National Laboratories, Albuquerque, NM
4 Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA

Received February 1998; accepted 24 June 1998

ABSTRACT: A novel and robust automated docking method that predicts the bound conformations of flexible ligands to macromolecular targets has been developed and tested, in combination with a new scoring function that estimates the free energy change upon binding. Interestingly, this method applies a Lamarckian model of genetics, in which environmental adaptations of an individual's phenotype are reverse transcribed into its genotype and become heritable traits (sic). We consider three search methods, Monte Carlo simulated annealing, a traditional genetic algorithm, and the Lamarckian genetic algorithm, and compare their performance in dockings of seven protein–ligand test systems having known three-dimensional structure. We show that both the traditional and Lamarckian genetic algorithms can handle ligands with more degrees of freedom than the simulated annealing method used in earlier versions of AUTODOCK, and that the Lamarckian genetic algorithm is the most efficient, reliable, and successful of the three. The empirical free energy function was calibrated using a set of 30 structurally known protein–ligand complexes with experimentally determined binding constants. Linear regression analysis of the observed binding constants in terms of a wide variety of structure-derived molecular properties was performed. The final model had a residual standard error of 9.11 kJ mol⁻¹ (2.177 kcal mol⁻¹) and was chosen as the new energy function. The new search methods and empirical free energy function are available in AUTODOCK, version 3.0. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1639–1662, 1998

Keywords: automated docking; binding affinity; drug design; genetic algorithm; flexible small molecule protein interaction

Correspondence to: A. J. Olson; e-mail: [email protected]

Contract/grant sponsor: National Institutes of Health; contract/grant numbers: GM48870, RR08065

Journal of Computational Chemistry, Vol. 19, No. 14, 1639–1662 (1998)
© 1998 John Wiley & Sons, Inc. CCC 0192-8651/98/141639-24

Introduction

A fast atom-based computational docking tool is essential to most techniques for structure-based drug design.1,2 Reported techniques for automated docking fall into two broad categories: matching methods and docking simulation methods.3 Matching methods create a model of the active site, typically including sites of hydrogen bonding and sites that are sterically accessible, and then attempt to dock a given inhibitor structure into the model as a rigid body by matching its geometry to that of the active site. The most successful example of this approach is DOCK,4,5 which is efficient enough to screen entire chemical databases rapidly for lead compounds. The second class of docking techniques models the docking of a ligand to a target in greater detail: the ligand begins randomly outside the protein, and explores translations, orientations, and conformations until an ideal site is found. These techniques are typically slower than the matching techniques, but they allow flexibility within the ligand to be modeled and can utilize more detailed molecular mechanics to calculate the energy of the ligand in the context of the putative active site. They allow computational chemists to investigate modifications of lead molecules suggested by the chemical intuition and expertise of organic synthetic chemists.

AUTODOCK6,7 is an example of the latter, more physically detailed, flexible docking technique. Previous releases of AUTODOCK combine a rapid grid-based method for energy evaluation,8,9 precalculating ligand–protein pairwise interaction energies so that they may be used as a look-up table during simulation, with a Monte Carlo simulated annealing search10,11 for optimal conformations of ligands. AUTODOCK has been applied with great success in the prediction of bound conformations of enzyme–inhibitor complexes,12,13 peptide–antibody complexes,14 and even protein–protein interactions15; these and other applications have been reviewed elsewhere.16

We initiated the current work to remedy two limitations of AUTODOCK. (i) We have found that the simulated annealing search method performs well with ligands that have roughly eight rotatable bonds or fewer: problems with more degrees of freedom rapidly become intractable. This demanded a more efficient search method. (ii) AUTODOCK is often used to obtain unbiased dockings of flexible inhibitors in enzyme active sites: in computer-assisted drug design, novel modifications of such lead molecules can be investigated computationally. Like many other computational approaches, AUTODOCK performs well in predicting relative quantities and rankings for series of similar molecules; however, it has not been possible to estimate in AUTODOCK whether a ligand will bind with a millimolar, micromolar, or nanomolar binding constant. Earlier versions of AUTODOCK used a set of traditional molecular mechanics force-field parameters that were not directly correlated with observed binding free energies; hence, we needed to develop a force field that could be used to predict such quantities.

Molecular docking is a difficult optimization problem, requiring efficient sampling across the entire range of positional, orientational, and conformational possibilities. Genetic algorithms (GA) fulfill the role of global search particularly well, and are increasingly being applied to problems that suffer from combinatorial explosions due to their many degrees of freedom. Both canonical genetic algorithms17–21 and evolutionary programming methods22 have been shown to be successful in both drug design and docking.

In this report, we describe two major advances that are included in the new release of AUTODOCK, version 3.0. The first is the addition of three new search methods: a genetic algorithm; a local search method; and a novel, adaptive global–local search method based on Lamarckian genetics, the Lamarckian genetic algorithm (LGA). The second advance is an empirical binding free energy force field that allows the prediction of binding free energies, and hence binding constants, for docked ligands.


Methods

GENETIC ALGORITHMS

Genetic algorithms23 use ideas based on the language of natural genetics and biological evolution.24 In the case of molecular docking, the particular arrangement of a ligand and a protein can be defined by a set of values describing the translation, orientation, and conformation of the ligand with respect to the protein: these are the ligand's state variables and, in the GA, each state variable corresponds to a gene. The ligand's state corresponds to the genotype, whereas its atomic coordinates correspond to the phenotype. In molecular docking, the fitness is the total interaction energy of the ligand with the protein, and is evaluated using the energy function. Random pairs of individuals are mated using a process of crossover, in which new individuals inherit genes from either parent. In addition, some offspring undergo random mutation, in which one gene changes by a random amount. Selection of the offspring of the current generation occurs based on the individual's fitness: thus, solutions better suited to their environment reproduce, whereas more poorly suited ones die.

A variety of approaches have been adopted to improve the efficiency of the genetic algorithm. Classical genetic algorithms represent the genome as a fixed-length bit string, and employ binary crossover and binary mutation to generate new individuals in the population. Unfortunately, in many problems, such binary operators can generate values that are often outside the domain of interest, leading to gross inefficiencies in the search. The use of real encodings helps to limit the genetic algorithm to reasonable domains. Alternative genetic algorithms have been reported25 that employ more complicated representations and more sophisticated operators besides crossover and mutation. Some of these retain the binary representation, but must employ decoders and repair algorithms to avoid building illegal individuals from the chromosome, and these are frequently computationally intensive. However, the search performance of the genetic algorithm can be improved by introducing a local search method.26,27

HYBRID SEARCH METHODS IN AUTODOCK

Earlier versions of AUTODOCK used optimized variants of simulated annealing.6,7 Simulated annealing may be viewed as having both global and local search aspects, performing a more global search early in the run, when higher temperatures allow transitions over energy barriers separating energetic valleys, and later on performing a more local search when lower temperatures place more focus on local optimization in the current valley. AUTODOCK 3.0 retains the functionality of earlier versions, but adds the options of using a genetic algorithm (GA) for global searching, a local search (LS) method to perform energy minimization, or a combination of both, and builds on the work of Belew and Hart.27,28 The local search method is based on that of Solis and Wets,29 which has the advantage that it does not require gradient information about the local energy landscape, thus facilitating torsional space search. In addition, the local search method is adaptive, in that it adjusts the step size depending upon the recent history of energies: a user-defined number of consecutive failures, or increases in energy, cause the step size to be doubled; conversely, a user-defined number of consecutive successes, or decreases in energy, cause the step size to be halved. The hybrid of the GA method with the adaptive LS method forms the so-called Lamarckian genetic algorithm (LGA), which has enhanced performance relative to simulated annealing and GA alone,21,26 and is described in detail later. Thus, the addition of these new GA-based docking methods enhances AUTODOCK, and allows problems with more degrees of freedom to be tackled. Furthermore, it is now possible to use the same force field as is used in docking to perform energy minimization of ligands.

IMPLEMENTATION

In our implementation of the genetic algorithm, the chromosome is composed of a string of real-valued genes: three Cartesian coordinates for the ligand translation; four variables defining a quaternion specifying the ligand orientation; and one real value for each ligand torsion, in that order. Quaternions are used to define the orientation30 of the ligand, to avoid the gimbal lock problem experienced with Euler angles.31 The order of the genes that encode the torsion angles is defined by the torsion tree created by AUTOTORS, a preparatory program used to select rotatable bonds in the ligand. Thus, there is a one-to-one mapping from the ligand's state variables to the genes of the individual's chromosome.
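The gene ordering described above (translation, then orientation, then torsions) can be illustrated with a minimal sketch. The class and function names are hypothetical, and storing the four orientation variables as a unit vector followed by a rotation angle is an assumption made here for illustration, not a statement about AUTODOCK's internal data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LigandState:
    """State variables of a ligand (illustrative names only)."""
    translation: Tuple[float, float, float]        # x, y, z
    quaternion: Tuple[float, float, float, float]  # four orientation variables
    torsions: List[float]                          # one angle per rotatable bond


def to_chromosome(state: LigandState) -> List[float]:
    """Flatten the state into genes: translation, orientation, torsions."""
    return [*state.translation, *state.quaternion, *state.torsions]


def from_chromosome(genes: List[float]) -> LigandState:
    """Inverse of to_chromosome; every gene past the seventh is a torsion."""
    if len(genes) < 7:
        raise ValueError("a chromosome needs at least 7 genes")
    return LigandState(tuple(genes[0:3]), tuple(genes[3:7]), list(genes[7:]))


# Assumed orientation layout: unit vector (0, 0, 1) and a 90 degree rotation.
state = LigandState((1.0, 2.0, 3.0), (0.0, 0.0, 1.0, 90.0), [30.0, -60.0])
genes = to_chromosome(state)
```

Because the mapping is one-to-one, a ligand with two rotatable bonds yields nine genes, and `from_chromosome(genes)` recovers the original state.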


The genetic algorithm begins by creating a random population of individuals, where the user defines the number of individuals in the population. For each random individual in the initial population, each of the three translation genes for x, y, and z is given a uniformly distributed random value between the minimum and maximum x, y, and z extents of the grid maps, respectively; the four genes defining the orientation are given a random quaternion, consisting of a random unit vector and a random rotation angle between −180° and +180°; and the torsion angle genes, if any, are given random values between −180° and +180°. Furthermore, a new random number generator has been introduced that is hardware-independent.32 It is used in the LS, GA, and LGA search engines, and allows results to be reproduced on any hardware platform given the same seed values. The creation of the random initial population is followed by a loop over generations, repeating until the maximum number of generations or the maximum number of energy evaluations is reached, whichever comes first. A generation consists of five stages: mapping and fitness evaluation, selection, crossover, mutation, and elitist selection, in that order. In the Lamarckian GA, each generation is followed by local search, being performed on a user-defined proportion of the population. Each of these stages is discussed in more detail in what follows.

Mapping translates from each individual's genotype to its corresponding phenotype, and occurs over the entire population. This allows each individual's fitness to be evaluated. This is the sum of the intermolecular interaction energy between the ligand and the protein, and the intramolecular interaction energy of the ligand. The physicochemical nature of the energy evaluation function is described in detail later. Every time an individual's energy is calculated, either during global or local search, a count of the total number of energy evaluations is incremented.

This is followed, in our implementation, by proportional selection to decide which individuals will reproduce. Thus, individuals that have better-than-average fitness receive proportionally more offspring, in accordance with:

$$n_o = \frac{f_w - f_i}{f_w - \langle f \rangle}, \qquad f_w \neq \langle f \rangle$$

where $n_o$ is the integer number of offspring to be allocated to the individual; $f_i$ is the fitness of the individual (i.e., the energy of the ligand); $f_w$ is the fitness of the worst individual, or highest energy, in the last N generations (i.e., N is a user-definable parameter, typically 10); and $\langle f \rangle$ is the mean fitness of the population. Because the worst fitness, $f_w$, will always be larger than either $f_i$ or $\langle f \rangle$, except when $f_i = f_w$, then for individuals that have a fitness lower than the mean, $f_i < \langle f \rangle$, the numerator in this equation, $f_w - f_i$, will always be greater than the denominator $f_w - \langle f \rangle$, and thus such individuals will be allocated at least one offspring, and thus will be able to reproduce. AUTODOCK checks for $f_w = \langle f \rangle$ beforehand, and if true, the population is assumed to have converged, and the docking is terminated.

Crossover and mutation are performed on random members of the population according to user-defined rates of crossover and mutation. First, crossover is performed. Two-point crossover is used, with breaks occurring only between genes, never within a gene; this prevents erratic changes in the real values of the genes. Thus, both parents' chromosomes would be broken into three pieces at the same gene positions, each piece containing one or more genes; for instance, ABC and abc. The chromosomes of the resulting offspring after two-point crossover would be AbC and aBc. These offspring replace the parents in the population, keeping the population size constant. Crossover is followed by mutation; because the translational, orientational, and torsional genes are represented by real variables, the classical bit-flip mutation would be inappropriate. Instead, mutation is performed by adding a random real number that has a Cauchy distribution to the variable, the distribution being given by:

$$C(\alpha, \beta, x) = \frac{\beta}{\pi\left(\beta^2 + (x - \alpha)^2\right)}, \qquad \alpha \geq 0,\ \beta > 0,\ -\infty < x < \infty$$

where $\alpha$ and $\beta$ are parameters that affect the mean and spread of the distribution. The Cauchy distribution has a bias toward small deviates, but, unlike the Gaussian distribution, it has thick tails that enable it to generate large changes occasionally.26

An optional user-defined integer parameter elitism determines how many of the top individuals automatically survive into the next generation. If the elitism parameter is non-zero, the new population that has resulted from the proportional selection, crossover, and mutation is sorted according to its fitness; the fitness of new individuals having resulted from crossover and/or mutation is calculated as necessary. Because populations are implemented as heaps, selection of the best n individuals is efficient.
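The proportional selection rule can be made concrete with a short sketch. The function name is hypothetical; taking the worst fitness from the current population when none is supplied, and truncating the ratio to obtain an integer, are both assumptions of this illustration, since the text specifies only that the offspring count is an integer and that the worst fitness is taken over the last N generations.

```python
from typing import List, Optional


def offspring_counts(energies: List[float],
                     worst: Optional[float] = None) -> Optional[List[int]]:
    """Return n_o for every individual, or None if the population has converged.

    energies: fitness values f_i (docking energies; lower is better).
    worst:    f_w; defaults to the highest energy in `energies`.
    """
    f_w = max(energies) if worst is None else worst
    mean = sum(energies) / len(energies)
    if f_w == mean:
        # The convergence check is made before any offspring are allocated.
        return None
    # n_o = (f_w - f_i) / (f_w - <f>), truncated to an integer (assumption).
    return [int((f_w - f_i) / (f_w - mean)) for f_i in energies]


counts = offspring_counts([-10.0, -6.0, -2.0, 2.0])
```

For these four energies the mean is −4 and the worst value is 2, so the ratios are 2, 4/3, 2/3, and 0: the two individuals with better-than-average energy each receive at least one offspring, as the argument in the text requires.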

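The two-point crossover described above, with cuts placed only between genes, might be sketched as follows. The function name and the use of Python's `random.Random` are illustrative choices, not part of AUTODOCK.

```python
import random
from typing import List, Optional, Tuple


def two_point_crossover(
    mother: List[float],
    father: List[float],
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Exchange the middle segment of two equal-length chromosomes.

    Cut points fall between genes, so no real-valued gene is ever split.
    At least three genes are needed so that all three pieces are non-empty.
    """
    if len(mother) != len(father) or len(mother) < 3:
        raise ValueError("parents must have equal length of at least 3 genes")
    rng = rng or random.Random()
    # Two distinct cut positions in 1..n-1, ordered so that i < j.
    i, j = sorted(rng.sample(range(1, len(mother)), 2))
    child1 = mother[:i] + father[i:j] + mother[j:]   # pattern AbC
    child2 = father[:i] + mother[i:j] + father[j:]   # pattern aBc
    return child1, child2


mother = [float(k) for k in range(1, 10)]
father = [-g for g in mother]
child1, child2 = two_point_crossover(mother, father, random.Random(0))
```

Each child keeps the outer pieces of one parent and the middle piece of the other, so the two children together carry exactly the genes of the two parents and the population size is unchanged.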


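A Cauchy-distributed mutation of real-valued genes can be sketched as below. Drawing the deviate by inverting the cumulative distribution, and applying the mutation rate independently to each gene, are assumptions of this illustration; the text specifies only the distribution and that the rate is user-defined.

```python
import math
import random
from typing import List, Optional


def cauchy_pdf(x: float, alpha: float, beta: float) -> float:
    """Density C(alpha, beta, x) = beta / (pi * (beta**2 + (x - alpha)**2))."""
    return beta / (math.pi * (beta ** 2 + (x - alpha) ** 2))


def cauchy_deviate(alpha: float, beta: float, rng: random.Random) -> float:
    """Draw one deviate by inverting the Cauchy cumulative distribution."""
    u = rng.random()  # uniform on [0, 1)
    return alpha + beta * math.tan(math.pi * (u - 0.5))


def mutate(genes: List[float], rate: float, alpha: float, beta: float,
           rng: Optional[random.Random] = None) -> List[float]:
    """Add a Cauchy deviate to each gene with probability `rate`."""
    rng = rng or random.Random()
    return [g + cauchy_deviate(alpha, beta, rng) if rng.random() < rate else g
            for g in genes]


mutated = mutate([0.0, 0.0, 0.0], rate=1.0, alpha=0.0, beta=1.0,
                 rng=random.Random(42))
```

With alpha = 0 and beta = 1, half of all deviates fall within ±1, while the heavy tails occasionally produce much larger jumps, which is the behaviour the text contrasts with a Gaussian.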

The genetic algorithm iterates over generations until one of the termination criteria is met. At the end of each docking, AUTODOCK reports the fitness (the docked energy), the state variables, and the coordinates of the docked conformation, and also the estimated free energy of binding. AUTODOCK performs the user-specified number of GA dockings, and then carries out conformational cluster analysis on the docked conformations to determine which are similar, reporting the clusters ranked by increasing energy.

LAMARCKIAN GENETIC ALGORITHM

The vast majority of genetic algorithms mimic the major characteristics of Darwinian evolution and apply Mendelian genetics. This is illustrated on the right-hand side of Figure 1 (note the one-way transfer of information from the genotype to the phenotype). However, in those cases where an inverse mapping function exists (i.e., one which yields a genotype from a given phenotype), it is possible to finish a local search by replacing the individual with the result of the local search; see the left-hand side of Figure 1. This is called the Lamarckian genetic algorithm (LGA), and is an allusion to Jean-Baptiste de Lamarck's (discredited) assertion that phenotypic characteristics acquired during an individual's lifetime can become heritable traits.33

FIGURE 1. This figure illustrates genotypic and phenotypic search, and contrasts Darwinian and Lamarckian search.27 The space of the genotypes is represented by the lower horizontal line, and the space of the phenotypes is represented by the upper horizontal line. Genotypes are mapped to phenotypes by a developmental mapping function. The fitness function is f(x). The result of applying the genotypic mutation operator to the parent's genotype is shown on the right-hand side of the diagram, and has the corresponding phenotype shown. Local search is shown on the left-hand side. It is normally performed in phenotypic space and employs information about the fitness landscape. Sufficient iterations of the local search arrive at a local minimum, and an inverse mapping function is used to convert from its phenotype to its corresponding genotype. In the case of molecular docking, however, local search is performed by continuously converting from the genotype to the phenotype, so inverse mapping is not required. The genotype of the parent is replaced by the resulting genotype, however, in accordance with Lamarckian principles.

The most important issues arising in hybrids of local search (LS) techniques with the GA revolve around the developmental mapping, which transforms genotypic representations into phenotypic ones.26 The genotypic space is defined in terms of the genetic operators (mutation and crossover in our experiments) by which parents of one generation are perturbed to form their children. The phenotypic space is defined directly by the problem, namely, the energy function being optimized. The local search operator is a useful extension of GA global optimization when there are local "smoothness" characteristics (continuity, correlation, etc.) of the fitness function that local search can exploit. In hybrid GA + LS optimizations, the result of the LS is always used to update the fitness associated with an individual in the GA selection algorithm. If, and only if, the developmental mapping function is invertible, will the Lamarckian option (converting the phenotypic result of LS back into its corresponding genotype) become possible.

In our case, the fitness or energy is calculated from the ligand's coordinates, which together form its phenotype. The genotypic representation of the ligand, and its mutation and crossover operators, have already been described. The developmental mapping simply transforms a molecule's genotypic state variables into the corresponding set of atomic coordinates.
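The difference between updating only an individual's fitness and also writing the local search result back into its genes can be sketched as follows. The local search shown is a deliberately simplified greedy random perturbation with a fixed step, standing in for the adaptive Solis and Wets method; all names and default values are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Individual:
    genes: List[float]
    energy: float


def local_search(genes: List[float], energy_fn: Callable[[List[float]], float],
                 step: float, iterations: int,
                 rng: random.Random) -> Tuple[List[float], float]:
    """Greedy random-perturbation search carried out directly on the genes.

    Simplified stand-in for Solis and Wets: fixed step size, no bias vector.
    """
    best, best_e = list(genes), energy_fn(genes)
    for _ in range(iterations):
        trial = [g + rng.uniform(-step, step) for g in best]
        e = energy_fn(trial)      # genotype -> phenotype -> energy
        if e < best_e:            # keep only improving moves
            best, best_e = trial, e
    return best, best_e


def apply_local_search(ind: Individual,
                       energy_fn: Callable[[List[float]], float],
                       rng: random.Random, lamarckian: bool = True,
                       step: float = 0.2, iterations: int = 50) -> Individual:
    new_genes, new_e = local_search(ind.genes, energy_fn, step, iterations, rng)
    ind.energy = new_e            # the fitness is always updated
    if lamarckian:
        ind.genes = new_genes     # the acquired improvement becomes heritable
    return ind


def toy_energy(genes: List[float]) -> float:
    """Hypothetical energy surface with its minimum at the origin."""
    return sum(g * g for g in genes)


refined = apply_local_search(Individual([1.0, -2.0], 5.0), toy_energy,
                             random.Random(0), lamarckian=True)
```

Because the search perturbs the genes themselves and evaluates the energy through the forward mapping, no inverse mapping from coordinates back to state variables is needed.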


A novel feature of this application of hybrid global–local optimization is that the Solis and Wets LS operator searches through the genotypic space rather than the more typical phenotypic space. This means that the developmental mapping does not need to be inverted. Nonetheless, this molecular variation of the genetic algorithm still qualifies as Lamarckian, because any "environmental adaptations" of the ligand acquired during the local search will be inherited by its offspring.

At each generation, it is possible to let a user-defined fraction of the population undergo such a local search. We have found improved efficiency of docking with local search frequencies of just 0.06, although a frequency of 1.00 is not significantly more efficient.26 Both the canonical and a slightly modified version of the Solis and Wets method have been implemented. In canonical Solis and Wets, the same step size would be used for every gene, but we have improved the local search efficiency by allowing the step size to be different for each type of gene: a change of 1 Å (1 Å = 10⁻¹⁰ m) in a translation gene could be much more significant than a change of 1° in a rotational or torsional gene. In the docking experiments presented here, the translational step size was 0.2 Å, and the orientational and torsional step sizes were 5°.

In the Lamarckian genetic algorithm, genotypic mutation plays a somewhat different role than it does in traditional genetic algorithms. Traditionally, mutation plays the role of a local search operator, allowing small, refining moves that are not efficiently made by crossover and selection alone. With the explicit local search operator, however, this role becomes unnecessary, and is needed only for its role in replacing alleles that might have disappeared through selection. In LGA, mutation can take on a more exploratory role. The Cauchy deviates are a compromise between radical jumps to arbitrary sections of the conformation space and detailed exploration of the local topography.

DERIVATION OF THE EMPIRICAL BINDING FREE ENERGY FUNCTION

The study of molecular structure underpins much of computational molecular biology. There are several established methods for performing molecular mechanics and molecular dynamics, notably AMBER,34,35 CHARMM,36 DISCOVER,37 ECEPP,38 and GROMOS.39 Many of these traditional force fields model the interaction energy of a molecular system with terms for dispersion–repulsion,40 hydrogen bonding,41 electrostatics,42–45 and deviation from ideal bond lengths and bond angles. These methods are excellent for studying molecular processes over time, for optimizing bound conformations, and for performing free energy perturbation calculations between molecules with a single atom change,46 but they often require considerable investments of computer time and, unfortunately, these approaches tend to perform less well in ranking the binding free energies of compounds that differ by more than a few atoms. What is needed is an empirical relationship between molecular structure and binding free energy.

The first thoroughly established linear free energy relationship was observed by Hammett as early as 1933, and reported in 1937.47 It was used to relate structure and reactivity of small organic molecules on a quantitative basis. Hammett was able to derive substituent constants and reaction constants that could then be used to calculate rate constants and equilibrium constants for a specific reaction of a specific compound. It could be said that Hammett's work was the forerunner of modern-day quantitative structure–activity relationships (QSAR), pioneered by Hansch and coworkers in the 1960s. Here it is assumed that the sum of the steric, electronic, and hydrophobic effects of substituents in a compound determines its biological activity; see, for example, Fujita,48 Hansch,49 and more recently Selassie et al.50

Current structure-based scoring functions seek to remedy some of the deficiencies of traditional force fields by developing empirical free energy functions that reproduce observed binding constants. Most of these approaches use an expanded "master equation" to model the free energy of binding, adding entropic terms to the molecular mechanics equations51:

$$\Delta G = \Delta G_{\mathrm{vdW}} + \Delta G_{\mathrm{hbond}} + \Delta G_{\mathrm{elec}} + \Delta G_{\mathrm{conform}} + \Delta G_{\mathrm{tor}} + \Delta G_{\mathrm{sol}}$$

where the first four terms are the typical molecular mechanics terms for dispersion–repulsion, hydrogen bonding, electrostatics, and deviations from covalent geometry, respectively; $\Delta G_{\mathrm{tor}}$ models the restriction of internal rotors and global rotation and translation; and $\Delta G_{\mathrm{sol}}$ models desolvation upon binding and the hydrophobic effect (solvent entropy changes at solute–solvent interfaces). This latter term is the most challenging. Most workers use variants of the method of Wesson and Eisenberg,52 calculating a desolvation energy based on the surface area buried upon complex formation, with the area of each buried atom being weighted by an atomic solvation parameter. Böhm built on earlier work with the de novo inhibitor design program LUDI,53 and used linear regression to calibrate a similar function against a set of 45 diverse protein–ligand complexes with published binding constants.54 The final function predicted binding constants for a set of test complexes with a standard deviation equivalent to about a factor of 25 in binding constant: more than sufficient to rank inhibitors with millimolar, micromolar, and nanomolar binding constants. Jain devised a continuous, differentiable scoring function,55 which is, in essence, very similar to that of Böhm, but based on non-physical pairwise potentials using Gaussians and sigmoidal terms.

We have implemented a similar approach using the thermodynamic cycle of Wesson and Eisenberg.52 The function includes five terms:

$$
\begin{aligned}
\Delta G ={}& \Delta G_{\mathrm{vdW}} \sum_{i,j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \\
&+ \Delta G_{\mathrm{hbond}} \sum_{i,j} E(t) \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right) \\
&+ \Delta G_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}} \\
&+ \Delta G_{\mathrm{tor}} N_{\mathrm{tor}} \\
&+ \Delta G_{\mathrm{sol}} \sum_{i,j} \left( S_i V_j + S_j V_i \right) e^{-r_{ij}^{2} / 2\sigma^{2}}
\end{aligned}
\tag{1}
$$

where the five $\Delta G$ terms on the right-hand side are coefficients empirically determined using linear regression analysis from a set of protein–ligand complexes with known binding constants, shown in Table I. The summations are performed over all pairs of ligand atoms, i, and protein atoms, j, in addition to all pairs of atoms in the ligand that are separated by three or more bonds.

The in vacuo contributions include three interaction energy terms, used in previous versions of AUTODOCK: a Lennard–Jones 12-6 dispersion–repulsion term; a directional 12-10 hydrogen bonding term, where $E(t)$ is a directional weight based on the angle, t, between the probe and the target atom9; and a screened Coulombic electrostatic potential.56 Each of these terms, including its parameterization, has already been described.7

A measure of the unfavorable entropy of ligand binding due to the restriction of conformational degrees of freedom is added to the in vacuo function. This term is proportional to the number of sp³ bonds in the ligand, $N_{\mathrm{tor}}$.54 We investigated variants that included and excluded methyl, hydroxyl, and amine rotors.

In the development of an empirical free energy function for AUTODOCK, the desolvation term was most challenging, because AUTODOCK uses a grid-based method for energy evaluation, and most published solvation methods are based on surface area calculations. We investigated two different methods of calculating the desolvation energy term. The first of these methods was based on estimating atom-by-atom contributions to the interfacial molecular surface area between the ligand and the protein using the difference in the surface areas of the complex and the unbound protein and unbound ligand. Both the solvent-accessible and solvent-excluded surface areas were considered, being calculated with MSMS,57 a fast and reliable program that computes analytical molecular surfaces. Unfortunately, there can be significant errors in the value of the interfacial solvent-accessible surface areas, due to the "collar" of accessible surface that surrounds the ligand–protein interface in the complex. We also tested seven variants of the pairwise, volume-based method of Stouten et al.58: this method has the advantage that it is consistent with the pre-calculated affinity grid formulation used by AUTODOCK. For each atom in the ligand, fragmental volumes of surrounding protein atoms are weighted by an exponential function and then summed, evaluating the percentage of volume around the ligand atom that is occupied by protein atoms. This percentage is then weighted by the atomic solvation parameter of the ligand atom to give the desolvation energy. The full method may be broken into four separate components: burial of apolar atoms in the ligand, burial of apolar protein atoms, burial of polar and charged atoms in the ligand, and burial of polar and charged protein atoms. Great success has also been reported in using simply the amount of hydrophobic surface area buried upon complexation as a measure of the "hydrophobic effect,"54 so we tested several formulations that included only the volume lost around ligand carbon atoms. The burial of polar atoms caused particular problems, as discussed in what follows. Apart from the volume-based method, we tested a simpler formulation for the solvent transfer of polar atoms; that is, a constant term corresponding to the favorable free energy of interaction of a polar atom with solvent is estimated, and this is subtracted from the binding free energy.
JOURNAL OF COMPUTATIONAL CHEMISTRY 1645



TABLE I.
Protein–Ligand Complexes Used to Calibrate Empirical Free Energy Function, Along with Brookhaven Protein Data Bank (PDB) Accession Codes and Binding Constants.

| Protein–ligand complex | PDB code | Log(K_i)^a |
|---|---|---|
| Concanavalin A / α-methyl-D-mannopyranoside | 4cna | 2.00 |
| Carboxypeptidase A / glycyl-L-tyrosine | 3cpa | 3.88 |
| Carboxypeptidase A / phosphonate ZAA^P(O)F | 6cpa | 11.52 |
| Cytochrome P-450cam / camphor | 2cpp | 6.07 |
| Dihydrofolate reductase / methotrexate | 4dfr | 9.70 |
| α-Thrombin / benzamidine | 1dwb | 2.92 |
| Endothiapepsin / H-256 | 2er6 | 7.22 |
| α-Thrombin / MQPA | 1etr | 7.40 |
| α-Thrombin / NAPAP | 1ets | 8.52 |
| α-Thrombin / 4-TAPAP | 1ett | 6.19 |
| FK506-binding protein (FKBP) / immunosuppressant FK506 | 1fkf | 9.70 |
| D-Galactose / D-glucose binding protein / galactose | 2gbp | 7.60 |
| Hemagglutinin / sialic acid | 4hmg | 2.55 |
| HIV-1 Protease / A78791 | 1hvj | 10.46 |
| HIV-1 Protease / MVT101 | 4hvp | 6.15 |
| HIV-1 Protease / acylpepstatine | 5hvp | 5.96 |
| HIV-1 Protease / XK263 | 1hvr | 9.51 |
| Fatty-acid-binding protein / C15COOH | 2ifb | 5.43 |
| Myoglobin (ferric) / imidazole | 1mbi | 1.88 |
| McPC603 / phosphocholine | 2mcp | 5.23 |
| β-Trypsin / benzamidine | 3ptb | 4.74 |
| Retinol-binding protein / retinol | 1rbp | 6.72 |
| Thermolysin / Leu-hydroxylamine | 4tln | 3.72 |
| Thermolysin / phosphoramidon | 1tlp | 7.55 |
| Thermolysin / n-(1-carboxy-3-phenylpropyl)-Leu-Trp | 1tmn | 7.30 |
| Thermolysin / Cbz-Phe-p-Leu-Ala (ZFpLA) | 4tmn | 10.19 |
| Thermolysin / Cbz-Gly-p-Leu-Leu (ZGpLL) | 5tmn | 8.04 |
| Purine nucleoside phosphorylase (PNP) / guanine | 1ulb | 5.30 |
| Xylose isomerase / CB3717 | 2xis | 5.82 |
| Triose phosphate isomerase (TIM) / 2-phosphoglycolic acid (PGA) | 2ypi | 4.82 |

^a Adapted from Böhm.54
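
As a rough illustration of how the tabulated binding constants relate to free energies, the sketch below applies the relation ΔG_obs = RT ln K_i used in this work, with R = 1.987 cal K⁻¹ mol⁻¹ and T = 298.15 K. It assumes that the Log(K_i) column holds −log10 K_i; this is an assumption, although it reproduces the ΔG_obs values quoted in the text for 3ptb and 2cpp to within rounding. The function name `delta_g_obs` is hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal K^-1 mol^-1 (1.987 cal K^-1 mol^-1)
T_ROOM = 298.15     # absolute temperature in K


def delta_g_obs(neg_log10_ki, temperature=T_ROOM):
    """Return the observed binding free energy in kcal/mol.

    Assumes the tabulated value is -log10(K_i), so that
    ln K_i = -ln(10) * value and dG_obs = R * T * ln K_i.
    """
    ln_ki = -math.log(10.0) * neg_log10_ki
    return R_KCAL * temperature * ln_ki


# A few entries taken from Table I.
for code, value in [("3ptb", 4.74), ("2cpp", 6.07), ("4dfr", 9.70)]:
    print(f"{code}: {delta_g_obs(value):7.2f} kcal/mol")
```

Under this reading, each unit of Log(K_i) corresponds to roughly 1.36 kcal mol⁻¹ of binding free energy at room temperature.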

is estimated, and this is subtracted from the binding free energy.

Trilinear interpolation is used to evaluate rapidly the intermolecular dispersion–repulsion energy, the hydrogen bonding energy, the electrostatic potential, and the solvation energy of each atom in the ligand, using grid maps that have been pre-calculated over the protein for each atom type in the ligand. In AUTODOCK 3.0, we have implemented a faster method of trilinear interpolation59 than was available in earlier versions of AUTODOCK. Both methods are mathematically equivalent. The original implementation used 24 multiplications to perform each three-dimensional trilinear interpolation, but, by cascading seven one-dimensional interpolations, the number of multiplications has been reduced to 7.

Thirty protein–ligand complexes with published binding constants were used in the calibration of AUTODOCK’s free energy function (Table I), and were chosen from the set of 45 used by Böhm,54 omitting all complexes that he modeled (i.e., using only complexes for which crystallographic structures were available). One of the limitations of these binding constant data is that the conditions under which they were determined vary, which intrinsically limits the accuracy of our best model. We converted between the inhibition constant, K_i, and the observed free energy change of binding, ΔG_obs, using the equation:

$$\Delta G_{\mathrm{obs}} = RT \ln K_i$$

where R is the gas constant, 1.987 cal K⁻¹ mol⁻¹, and T is the absolute temperature, assumed to be


room temperature, 298.15 K.60 Note that this equation lacks a minus sign because the inhibition constant is defined for the dissociation reaction, EI ⇌ E + I,61 whereas ΔG_obs refers to the opposite process of binding, E + I ⇌ EI; where E is the enzyme and I is the inhibitor.

To remove any steric clashes in the crystallographic complexes, each ligand was optimized using AUTODOCK’s new Solis and Wets local minimization technique described earlier, but with the previously reported force field.7 The separate contributions from the hydrogen bonding, dispersion–repulsion, electrostatic, and solvation energies were evaluated. Empirical free energy coefficients for each of these terms were derived using linear regression in the S-PLUS software package,62 and cross-validation studies were performed. In total, 900 different binding free energy models were tested: each linear model consisted of a van der Waals term, a hydrogen bonding term (one of 6 variants), an electrostatic term, a torsional entropy term (one of 5 variants), and a desolvation term (one of 15 variants). We also investigated whether the inclusion of a constant term improved the model. Six of the seven test systems used to test the docking procedure, which were originally used to test AUTODOCK, version 2.4,7 were also in the training set of 30 protein–ligand complexes; therefore, to validate the chosen coefficients, linear regression was repeated for the set of 24 protein–ligand complexes, excluding the 6 overlapping test systems.

TESTING DOCKING METHODS

Seven protein–ligand complexes, with a range of complexity and chemical properties, were chosen from the Brookhaven Protein Data Bank63, 64 to compare the performance of the docking techniques (see Fig. 2). To facilitate comparison with the previous force field, we chose the same set of six test systems investigated earlier,7 but added a harder docking problem to challenge all the search methods (see Table II). The simplest test cases were the β-trypsin–benzamidine and cytochrome P-450cam–camphor complexes, which had small, rigid ligands. Interactions in the former are dominated by electrostatic interactions and hydrogen bonds to the substrate amidine, whereas the latter is dominated by hydrophobic interactions. McPC-603–phosphocholine and streptavidin–biotin were moderately flexible, and represented test systems having an intermediate level of difficulty. HIV-1 protease–XK263, hemagglutinin–sialic acid, and dihydrofolate reductase–methotrexate provided more difficult tests, with many rotatable bonds and diverse chemical characteristics.

We compared the performance of Monte Carlo simulated annealing (SA), the genetic algorithm (GA), and the Lamarckian genetic algorithm (LGA). The new empirical free energy function presented here was used for energy evaluation in all cases. Dockings were performed using approximately the same number of energy evaluations (~1.5 million), so each method could be judged given similar computational investments. The CPU time taken for a single docking varied from 4.5 to 41.3 minutes, on a 200-MHz Silicon Graphics MIPS 4400 with 128 MB of RAM, depending on the number of rotatable bonds and the number of atoms in the ligand.

At the end of a set of dockings, the docked conformations were exhaustively compared to one another to determine similarities, and were clustered accordingly. The user-defined root-mean-square positional deviation (rmsd) tolerance was used to determine if two docked conformations were similar enough to be included in the same cluster, and symmetrically related atoms in the ligand were considered. These clusters were ranked in order of increasing energy, by the lowest energy in each cluster. Ordinarily, the structure of the protein–ligand complex would not be known, so the criteria by which the dockings would be evaluated are the energies of the docked structures, and, in cases where there are several plausible, low-energy structures, the number of conformations in a conformationally similar cluster. Because one of our goals was to test the ability of the methods to reproduce known structures, we also compared the rmsd between the lowest energy docked structure and the crystallographic structure.

DOCKING- AND SEARCH-METHOD-SPECIFIC PARAMETERS

The proteins and ligands in the seven docking tests were treated using the united-atom approximation, and prepared using the molecular modeling program, SYBYL.65 Only polar hydrogens were added to the protein, and Kollman united-atom partial charges were assigned. Unless stated otherwise, all waters were removed. Atomic solvation parameters and fragmental volumes were assigned to the protein atoms using a new AUTODOCK utility, ADDSOL. The grid maps were calculated using AUTOGRID, version 3.0. In all seven protein–ligand


FIGURE 2. The seven ligands chosen for docking, showing the rotatable bonds as curly arrows: (a) benzamidine; (b)
camphor; (c) phosphocholine; (d) biotin; (e) HIV-1 protease inhibitor XK-263; (f) isopropylated sialic acid; and (g)
methotrexate. Note that two ligands, (e) and (f), contain hydroxyl rotors, which are not counted in the total number of
torsional degrees of freedom; note also that cyclic rotatable bonds are excluded.

cases, we used grid maps with 61 × 61 × 61 points, a grid-point spacing of 0.375 Å, and, because the location of the ligand in the complex was known, the maps were centered on the ligand’s binding site. The ligands were treated in SYBYL initially as all atom entities, that is, all hydrogens were added, then partial atomic charges were calculated using the Gasteiger–Marsili method.66, 67 AUTOTORS, an AUTODOCK utility, was used to define the rotatable bonds in the ligand, if any, and also to unite the nonpolar hydrogens added by SYBYL for the partial atomic charge calculation. The partial charges on the nonpolar hydrogens were added to that of the hydrogen-bearing carbon, also in AUTOTORS.

In all three search methods, 10 dockings were performed; in the analysis of the docked conformations, the clustering tolerance for the root-mean-square positional deviation was 0.5 Å, and the crystallographic coordinates of the ligand were used as the reference structure. For all three search


TABLE II.
X-Ray Crystal Structure Coordinates Used in Docking Experiments, Their Brookhaven Protein Data Bank Accession Codes and Resolution, Number of Rotatable Bonds in the Ligand, Number of Torsional Degrees of Freedom, Total Number of Degrees of Freedom, and Energy of Crystal Structure Using the Empirical Force Field Presented Here.

| Protein–ligand complex | PDB code | Resolution (Å) | Reference | Number of rotatable bonds | N_tor^a | Total number of degrees of freedom | Energy of crystal structure (kcal mol⁻¹) |
|---|---|---|---|---|---|---|---|
| β-Trypsin / benzamidine | 3ptb | 1.7 | 69 | 0 | 0 | 7 | −7.86 |
| Cytochrome P-450cam / camphor | 2cpp | 1.63 | 70 | 0 | 0 | 7 | −4.71 |
| McPC-603 / phosphocholine | 2mcp | 3.1 | 71 | 4 | 4 | 11 | +5.48^b |
| Streptavidin / biotin | 1stp | 2.6 | 73 | 5 | 5 | 12 | −8.86 |
| HIV-1 protease / XK263 | 1hvr | 1.8 | 75 | 10 | 8 | 17 | −18.62 |
| Influenza hemagglutinin / sialic acid | 4hmg | 3.0 | 76 | 11 | 7 | 18 | −4.71 |
| Dihydrofolate reductase / methotrexate | 4dfr | 1.7 | 79 | 7 | 7 | 14 | −13.64 |

^a N_tor is the number of torsional degrees of freedom used in the calculation of the predicted free energy change of binding, ΔG_pred. Note that this excludes rotatable bonds that only move hydrogens, such as hydroxyl, amino, and methyl groups.
^b This energy is dominated by a large, positive contribution from C2 and O1 to the internal nonbonded energy, of +6.13 kcal mol⁻¹; these atoms are 2.26 Å apart.
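
The "total number of degrees of freedom" column can be reproduced from the rotatable-bond counts under one assumption about the state representation: three variables for translation, four for an orientation quaternion, and one per rotatable bond. The short sketch below checks that assumption against the table; the names are hypothetical and the layout is inferred, not taken from the AUTODOCK source.

```python
# Assumed rigid-body variables: 3 for translation plus 4 for the orientation quaternion.
RIGID_BODY_VARIABLES = 3 + 4


def total_degrees_of_freedom(n_rotatable_bonds):
    """Number of state variables searched for a ligand (assumed layout)."""
    return RIGID_BODY_VARIABLES + n_rotatable_bonds


# (PDB code, number of rotatable bonds, total degrees of freedom listed in Table II)
TABLE_II = [
    ("3ptb", 0, 7), ("2cpp", 0, 7), ("2mcp", 4, 11), ("1stp", 5, 12),
    ("1hvr", 10, 17), ("4hmg", 11, 18), ("4dfr", 7, 14),
]

for code, n_rotatable, listed_total in TABLE_II:
    computed = total_degrees_of_freedom(n_rotatable)
    print(code, computed, "matches table" if computed == listed_total else "differs")
```

Note that the total follows the number of rotatable bonds, not N_tor, so hydroxyl rotors are searched during docking even though they are excluded from the torsional free energy term.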

methods, the step sizes were 0.2 Å for translations and 5° for orientations and torsions. These step sizes determined the amount by which a state variable could change when a move is made in simulated annealing and the relative size of mutation in the local search, whereas the α and β parameters determined the size of the mutation in the genetic algorithms, GA and LGA. The Cauchy distribution parameters were α = 0 and β = 1. Note that in simulated annealing, random changes were generated by a uniformly distributed random number generator; in the Solis and Wets local search, by a normal distribution; and, in the genetic algorithm, by a Cauchy distribution. In the simulated annealing tests, the initial state of the ligand was chosen randomly by AUTODOCK. We used the optimal set of simulated annealing parameters that were determined from the schedule experiments described earlier.7 These included an initial annealing temperature of 616 cal mol⁻¹, a linear temperature reduction schedule, 10 runs, 50 cycles, and a cycle-termination criterion of a maximum of 25,000 accepted steps or 25,000 rejected steps, whichever came first. The minimum energy state was used to begin the next cycle; the only exception was for 1hvr, where the initial annealing temperature was increased to 61,600 cal mol⁻¹. The maximum initial energy allowed was 0.0 kcal mol⁻¹, and the maximum number of retries was 1000, used to generate a low energy random initial state to begin each simulated annealing docking.

In the GA and LGA dockings, we used an initial population of random individuals with a population size of 50 individuals; a maximum number of 1.5 × 10⁶ energy evaluations; a maximum number of generations of 27,000; an elitism value of 1, which was the number of top individuals that automatically survived into the next generation; a mutation rate of 0.02, which was the probability that a gene would undergo a random change; and a crossover rate of 0.80, which was the probability that two individuals would undergo crossover. Proportional selection was used, where the average of the worst energy was calculated over a window of the previous 10 generations. In the LGA dockings, the pseudo-Solis and Wets local search method was used, having a maximum of 300 iterations per local search; the probability of performing local search on an individual in the population was 0.06; the maximum number of consecutive successes or failures before doubling or halving the local search step size, ρ, was 4, in both cases; and the lower bound on ρ, the termination criterion for the local search, was 0.01.

Results and Discussion

CALIBRATION OF EMPIRICAL FREE ENERGY FUNCTION

Several linear regression models were tested for their ability to reproduce the observed binding


TABLE III.
Calibration of Empirical Free Energy Function.

| Model^a | Residual standard error | Multiple R² | ΔG_vdW^b | ΔG_estat | ΔG_hbond | ΔG_tor | ΔG_solv |
|---|---|---|---|---|---|---|---|
| A | 2.324 | 0.9498 | 0.1795 (0.0263) | 0.1133 (0.0324) | 0.0166 (0.0625) | 0.3100 (0.0873) | 0.0101 (0.0585) |
| B | 2.232 | 0.9537 | 0.1518 (0.0269) | 0.1186 (0.0246) | 0.0126 (0.0382) | 0.3548 (0.0890) | 0.1539 (0.1050) |
| C | 2.177 | 0.9559 | 0.1485 (0.0237) | 0.1146 (0.0238) | 0.0656 (0.0558) | 0.3113 (0.0910) | 0.1711 (0.1035) |

^a Models differ in the formulation of the solvation term and the hydrogen bonding term. Model A: full volume-based solvation term and standard 10–12 hydrogen bonding, as in Eq. (1). Model B: apolar ligand atoms only in the solvation term, and standard 10–12 hydrogen bonding. Model C: apolar ligand atoms only in the solvation term, and the standard 10–12 hydrogen bonding less the estimated average, as in Eq. (2).
^b Values for the model coefficients, with standard deviations in parentheses.
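
To make the role of the Model C coefficients concrete, the following sketch combines already-summed, unscaled energy terms into a predicted binding free energy as a weighted sum. It is only an illustration: the class and function names are hypothetical, the inputs are assumed to be pre-computed sums in kcal mol⁻¹ (for example, obtained from grid maps), the hydrogen bonding input is assumed to already include the constant offset of Model C, and the example input values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coefficients:
    vdw: float
    estat: float
    hbond: float
    tor: float
    solv: float


# Model C coefficients as listed in Table III.
MODEL_C = Coefficients(vdw=0.1485, estat=0.1146, hbond=0.0656, tor=0.3113, solv=0.1711)


def predicted_free_energy(e_vdw, e_estat, e_hbond, n_tor, e_solv, coeff=MODEL_C):
    """Weighted sum of unscaled energy terms plus the torsional penalty (kcal/mol)."""
    return (coeff.vdw * e_vdw
            + coeff.estat * e_estat
            + coeff.hbond * e_hbond
            + coeff.tor * n_tor
            + coeff.solv * e_solv)


# Made-up inputs, purely to exercise the function.
print(round(predicted_free_energy(-40.0, -10.0, -5.0, 4, -3.0), 4))

# The torsional term alone: each torsional degree of freedom costs 0.3113 kcal/mol.
print(round(predicted_free_energy(0.0, 0.0, 0.0, 8, 0.0), 4))
```

Because the model is linear in its coefficients, refitting it to a different training set only changes the five numbers in `MODEL_C`, not the functional form.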

constants of structurally characterized complexes. Table III shows the results for the three major candidates, and Figure 3 shows the correlation between the observed and the predicted binding free energies for the 30 protein–ligand complexes in the calibration set, using the chosen model (model C). Model A adds the full volume-based solvation method and the torsional restriction term to the original molecular mechanics force field. Model B simplifies the solvation method by evalu-

FIGURE 3. Predicted versus observed binding free energies for the calibration set and the docking tests. The solid line shows a perfect fit, and the dotted lines show one standard deviation above and below this. Hollow diamonds show the 30 protein–ligand complexes used in fitting the terms of the binding free energy function. Solid triangles show the results of the simulated annealing (SA) dockings, solid diamonds show the genetic algorithm (GA) dockings, and the solid squares show the Lamarckian genetic algorithm (LGA) dockings. Note the outlying biotin–streptavidin complex (1stp), where it is believed there are significant contributions to the binding free energy due to protein rearrangements.


ating the volume buried for only the carbon atoms in the ligand. Model C also uses only ligand carbon atoms in the desolvation calculation, and also adds a constant term to the hydrogen bonding function, modeling desolvation of polar atoms. Model C was chosen for incorporation into AUTODOCK 3.0, based on its better overall statistics, and on criteria discussed in what follows. The form of this free energy function is:

$$
\begin{aligned}
\Delta G ={} & \Delta G_{\mathrm{vdW}} \sum_{i,j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \\
& + \Delta G_{\mathrm{hbond}} \sum_{i,j} E(t) \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} + E_{\mathrm{hbond}} \right) \\
& + \Delta G_{\mathrm{elec}} \sum_{i,j} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}} \\
& + \Delta G_{\mathrm{tor}} N_{\mathrm{tor}} \\
& + \Delta G_{\mathrm{sol}} \sum_{i_C,\, j} S_i V_j\, e^{\left(-r_{ij}^{2}/2\sigma^{2}\right)}
\end{aligned}
\tag{2}
$$

where E_hbond is the estimated average energy of hydrogen bonding of water with a polar atom, and the summation in the solvation term is performed over all pairs consisting of only carbon atoms in the ligand, i, and atoms of all types, j, in the protein. Note that the internal or intramolecular interaction energy of the ligand is not included in the calculation of binding free energy; during docking, however, internal energy is included in the total docked energy, because changes in ligand conformation can affect the outcome of the docking, so this must be taken into consideration. We looked at linear regression models that did include the internal energy, and found that adding this term did not improve the model. The assumption made is that the internal energy of the ligand in solution and in the complex are the same. The energies used and reported by AUTODOCK should be distinguished: there are docked energies, which include the intermolecular and intramolecular interaction energies, and which are used during dockings; and predicted free energies, which include the intermolecular energy and the torsional free energy, and are only reported at the end of a docking. Because the intermolecular energy grid maps include the desolvation term, dockings using the new, empirical force field in AUTODOCK version 3.0 may be qualitatively different from results found using earlier versions.

Three coefficients, for dispersion–repulsion, electrostatics, and loss of torsional freedom were very stable in the linear regression analysis, with consistent coefficient values in different formulations and reasonable standard deviations. In our best model, dispersion–repulsion energies, with parameters taken from AMBER,34 were weighted by a factor of 0.1485, yielding an energy of about −0.2 kcal mol⁻¹ for the most favorable atom–atom contacts. Electrostatics, modeled with a screened Coulomb potential,56 were weighted by a factor of 0.1146, yielding an energy of about −1.0 kcal mol⁻¹ for an ideal salt bridge. In the torsional restriction term, each torsional degree of freedom requires 0.3113 kcal mol⁻¹.

The major differences between models occurred with the interaction of the hydrogen bonding term and the desolvation term. Hydrogen bonding is modeled with a directional 12–10 potential.68 We encountered a major problem when calibrating this hydrogen bonding function. Because the test set included only natural enzyme–ligand complexes, optimized by millions of years of evolution, hydrogen bonding groups in the ligands are nearly always paired with the appropriate hydrogen bonding group in the protein. Thus, the number of hydrogen bonds that the ligand forms in the complex and the number it forms with solvent when free in solution are approximately the same; that is, there is little change in the free energy of hydrogen bonding, and ΔG_hbond was evaluated to be approximately zero. Unfortunately, this provides no information on the cost of burying a hydrogen bonding group without forming a bond with the protein, and our data set did not include cases to evaluate this. Of course, the volume-based solvation method should account for this: the unfavorable polar contribution to the solvation energy should compensate for the favorable 12–10 hydrogen bonding energy. The linear regression, however, consistently returned coefficients that set the hydrogen bonding energy and desolvation energy to nearly zero, and increased the dispersion–repulsion term to compensate (see Model A in Table III). We chose an alternative formulation to resolve this problem.

We obtained the best results by separating the desolvation of polar atoms from the volume-based calculation. We assumed that the extent of hydrogen bonding in the complexes was roughly the same as the extent of hydrogen bonding in solution. The calculated hydrogen bonding energy, using the directional 12–10 hydrogen bonding function, was divided by the maximal number of possible hydrogen bonds, counting two for each oxygen atom and one for each polar hydrogen. For


the 30 complexes used in calibration, it was found that 36% of the maximum possible hydrogen bonding sites were actually utilized. Values of E_hbond ranging from 36% to 100% of the maximal well depth of 5 kcal mol⁻¹ had little effect on the success of the formulation (data not shown), and a value of 36% was chosen. Optimized weights yielded an ideal hydrogen bonding energy in the complex of −0.328 kcal mol⁻¹, and the estimated average energy of each hydrogen bond in solution of −0.118. Because hydrogen bonding was modeled by this difference, the typical hydrogen bonding free energy of a complex was approximately zero, but there was a penalty of about 0.2 kcal mol⁻¹ for oxygen and nitrogen atoms that did not form hydrogen bonds, driving the simulation toward docked conformations with maximal hydrogen bonding. We are currently exploring an appropriate data set for evaluating this formulation more rigorously.

The desolvation for carbon atoms in the ligand was evaluated using two different classes of atom type, aliphatic and aromatic, as in the original study.58 The desolvation term was weighted by a factor of 0.1711 in our final empirical free energy force field, so a typical aliphatic carbon atom yields an energy of about −0.2 kcal mol⁻¹ upon binding.

We cross-validated the free energy model in two ways. First, we investigated the influence of each member of the training set on the final coefficients of the model, by removing each one from the training set and calculating the coefficients from the remaining 29 complexes. We found that none of them had a strong effect on the final values of the coefficients.

We also performed a second kind of cross-validation of the free energy model, by performing Solis and Wets local search using AUTODOCK and the new free energy function, starting from the x-ray crystallographic conformations of each inhibitor in 20 HIV-1 protease–inhibitor complexes, to compare the resulting optimized conformations’ predicted free energy change of binding, ΔG_binding, with the experimentally determined values. These protease inhibitors were quite different, having from 7 to 28 torsions, and widely different side chains (charged, polar, and hydrophobic), and constituted a diverse test set. As can be seen from the results in Table IX, the correlation was very good, with an overall rmsd between the experimental and calculated values of ΔG_binding of 1.92 kcal mol⁻¹.

The final form of the free energy function may seem overparameterized, with additional weighting parameters added to a previously optimized parameterization. We retained the molecular mechanics formulation, however, specifically for its ability to model the distance dependence of each energetic term. This distance dependence (and angular dependence in hydrogen bonding) is essential for finding valid docked conformations, but the amount and resolution of the available protein–ligand data do not support a full re-parameterization of the functions.

DOCKING EXPERIMENTS

Because we are comparing different search methods, it is important to ensure that the methods are treated equally. It is therefore important that each search method be allowed approximately the same number of energy evaluations in a docking. The number of energy evaluations in a docking depends on the termination criteria, and because it is not possible to predict how many accepted or rejected steps the stochastic SA method will make at a given temperature, the number of evaluations varies in SA. The range was from 1.19 × 10⁶ to 2.33 × 10⁶, depending on the protein–ligand test system, even though the same parameters were used for the number of cycles, accepted steps and rejected steps. In the case of the GA dockings, the population was 50 and the number of generations was 27,000, which gave a total of 1.35 × 10⁶ energy evaluations in a docking; thus, the GA dockings were terminated by reaching the maximum number of generations. In the case of the LGA dockings, 6% of the population underwent Lamarckian local search, each search consisting of 300 iterations and each iteration using an extra energy evaluation. Thus, the LGA dockings, even with the same population size and number of generations as the GA, were terminated by reaching the maximum number of energy evaluations, 1.50 × 10⁶.

The results of the simulated annealing, genetic algorithm, and the Lamarckian genetic algorithm docking experiments are summarized in Tables IV, V and VI, respectively. The lowest energy docked structure found by each method is compared with the crystal structure of the ligand in Figure 4. The predicted change in free energy upon binding, ΔG_pred, for the lowest energy found by LGA is shown in Table VII, along with the experimentally observed change in free energy upon binding, ΔG_obs. In addition, the breakdown of the energy of the lowest energy docked conformation is shown, in terms of the intermolecular interaction energy,


TABLE IV.
Results of Simulated Annealing Dockings.^a
Energies are in kcal mol⁻¹ and rmsd values in Å.

| PDB code | Number of clusters | Number in rank 1 | Lowest energy | rmsd of lowest energy | Mean energy | Mean rmsd | Number of energy evaluations |
|---|---|---|---|---|---|---|---|
| 3ptb | 5 | 6 | −8.03 | 0.21 | −7.84 (0.08) | 0.50 (0.17) | 2.01 × 10⁶ |
| 2cpp | 4 | 6 | −7.29 | 0.81 | −7.22 (0.03) | 0.91 (0.30) | 2.33 × 10⁶ |
| 2mcp | 10 | 1 | −4.09 | 0.88 | 70.89 (2.10 × 10²) | 5.40 (4.80) | 1.85 × 10⁶ |
| 1stp | 10 | 1 | −8.48 | 1.27 | −7.71 (0.66) | 1.24 (0.35) | 2.00 × 10⁶ |
| 1hvr | 10 | 1 | −11.77 | 1.15 | 1.12 × 10⁵ (3.36 × 10⁵) | 6.13 (2.61) | 1.19 × 10⁶ |
| 4hmg | 10 | 1 | −2.59 | 3.77 | 6.99 × 10⁴ (1.52 × 10⁵) | 6.20 (2.94) | 1.55 × 10⁶ |
| 4dfr | 10 | 1 | −8.73 | 4.83 | 6.13 × 10² (1.96 × 10³) | 5.04 (1.74) | 1.30 × 10⁶ |

^a The parameters used were 10 runs, 50 cycles, and a cycle-termination criterion of 25,000 accepted steps or 25,000 rejected steps, whichever came first. The rmsd conformational clustering tolerance was 0.5 Å, calculated from the ligand’s crystallographic coordinates. Standard deviations are given in parentheses.
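
The cluster counts reported in these tables come from rmsd-based conformational clustering. A minimal sketch of one such scheme is given below. It is an assumption about the general approach, not AUTODOCK's actual code: it ignores symmetry-related atoms, compares each conformation only with the lowest-energy member of each existing cluster, and uses hypothetical names throughout. Conformations are visited in order of increasing energy, so clusters come out ranked by their lowest energy.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square positional deviation between two equal-length atom lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same number of atoms")
    total = sum((a - b) ** 2
                for atom_a, atom_b in zip(coords_a, coords_b)
                for a, b in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))


def cluster_conformations(dockings, tolerance=0.5):
    """Group (energy, coords) pairs into clusters ranked by lowest energy.

    Each cluster is a list whose first element is its lowest-energy member.
    """
    clusters = []
    for energy, coords in sorted(dockings, key=lambda d: d[0]):
        for cluster in clusters:
            seed_coords = cluster[0][1]
            if rmsd(coords, seed_coords) <= tolerance:
                cluster.append((energy, coords))
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([(energy, coords)])
    return clusters


# Three toy two-atom "conformations": the first two are nearly identical.
toy = [
    (-7.8, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-8.0, [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]),
    (-5.0, [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]),
]
clusters = cluster_conformations(toy)
print([len(c) for c in clusters])
```

In this picture, the "number in rank 1" column corresponds to `len(clusters[0])`, and the crystallographic rmsd is obtained by calling `rmsd` with the crystal coordinates as the reference.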

ΔG_inter, the intramolecular energy, ΔG_intra, and the torsional free energy, ΔG_tor. These results are discussed case-by-case in what follows; “crystallographic rmsd” refers to the root-mean-square positional deviation of a given conformation from the crystallographic coordinates.

β-Trypsin / Benzamidine (3ptb)

The recognition of benzamidine by β-trypsin, which binds tightly in the specificity pocket of trypsin, is chiefly due to the polar amidine moiety and the hydrophobic benzyl ring.69 The amidine moiety was treated as being protonated. It was assumed that delocalization of the π-electrons of the benzene ring extended to the π-system of the amidine, and thus the ligand was treated as a rigid body. All three search methods succeeded in finding lowest energy conformations that were also the ones with the lowest crystallographic rmsd. In this case, the method that found the docked structure with the lowest energy was GA (Table V), but that found by the LGA method (Table VI) was practically the same. The mean of the final docked

TABLE V.
Results of Genetic Algorithm Dockings.^a
Energies are in kcal mol⁻¹ and rmsd values in Å.

| PDB code | Number of clusters | Number in rank 1 | Lowest energy | rmsd of lowest energy | Mean energy | Mean rmsd | Number of energy evaluations |
|---|---|---|---|---|---|---|---|
| 3ptb | 2 | 9 | −8.17 | 0.32 | −7.72 (1.35) | 1.50 (3.39) | 1.35 × 10⁶ |
| 2cpp | 4 | 7 | −7.36 | 0.93 | −6.65 (2.11) | 2.18 (3.42) | 1.35 × 10⁶ |
| 2mcp | 10 | 1 | −5.17 | 0.85 | −3.61 (0.95) | 5.26 (2.98) | 1.35 × 10⁶ |
| 1stp | 7 | 4 | −10.09 | 0.75 | −8.42 (1.82) | 2.96 (3.04) | 1.35 × 10⁶ |
| 1hvr | 7 | 4 | −21.41 | 0.82 | −11.09 (9.79) | 2.79 (1.97) | 1.35 × 10⁶ |
| 4hmg | 9 | 2 | −7.60 | 1.11 | −5.72 (1.77) | 2.32 (1.43) | 1.35 × 10⁶ |
| 4dfr | 10 | 1 | −16.10 | 0.95 | −10.24 (3.95) | 4.39 (2.37) | 1.35 × 10⁶ |

^a The parameters used were 10 runs, a population size of 50, and a run-termination criterion of a maximum of 27,000 generations or a maximum of 1.5 × 10⁶ energy evaluations, whichever came first. Note that, in this case, all runs terminated after the maximum number of generations was reached, so the number of energy evaluations equals the product of the population size and the number of generations. The rmsd conformational clustering tolerance was 0.5 Å, calculated from the ligand’s crystallographic coordinates. Standard deviations are given in parentheses.
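
The footnote's remark about termination can be checked with simple arithmetic. The sketch below assumes one energy evaluation per individual per generation, plus an optional number of extra evaluations per generation for local search; the function name and the local search estimate are hypothetical. For the LGA, the estimate takes 6% of 50 individuals (3 individuals) at 300 iterations each, which is a rough upper bound that assumes every local search runs to its iteration limit with one evaluation per iteration.

```python
def run_termination(population_size, max_generations, max_evaluations,
                    local_search_evals_per_generation=0):
    """Return (energy evaluations used, criterion that ended the run)."""
    per_generation = population_size + local_search_evals_per_generation
    full_run = per_generation * max_generations
    if full_run <= max_evaluations:
        return full_run, "max generations"
    return max_evaluations, "max energy evaluations"


# Plain GA: 50 individuals, 27,000 generations, budget of 1.5 million evaluations.
print(run_termination(50, 27_000, 1_500_000))

# LGA estimate: 3 individuals per generation, 300 local search evaluations each.
print(run_termination(50, 27_000, 1_500_000,
                      local_search_evals_per_generation=3 * 300))
```

With these numbers the plain GA exhausts its generations after 1.35 × 10⁶ evaluations, whereas the LGA estimate hits the evaluation budget long before 27,000 generations, in line with the termination behavior described in the text.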


TABLE VI.
Results of Lamarckian Genetic Algorithm Dockings.^a
Energies are in kcal mol⁻¹ and rmsd values in Å.

| PDB code | Number of clusters | Number in rank 1 | Lowest energy | rmsd of lowest energy | Mean energy | Mean rmsd | Number of energy evaluations |
|---|---|---|---|---|---|---|---|
| 3ptb | 1 | 10 | −8.15 | 0.45 | −8.15 (0.00) | 0.46 (0.01) | 1.50 × 10⁶ |
| 2cpp | 1 | 10 | −7.36 | 0.93 | −7.36 (0.00) | 0.93 (0.00) | 1.50 × 10⁶ |
| 2mcp | 6 | 2 | −5.54 | 1.05 | −4.15 (0.15) | 1.10 (0.07) | 1.50 × 10⁶ |
| 1stp | 1 | 10 | −10.14 | 0.69 | −10.06 (0.05) | 0.66 (0.06) | 1.50 × 10⁶ |
| 1hvr | 2 | 9 | −21.38 | 0.76 | −19.11 (6.92) | 0.85 (0.35) | 1.50 × 10⁶ |
| 4hmg | 3 | 7 | −7.72 | 1.14 | −7.54 (0.19) | 1.18 (0.12) | 1.50 × 10⁶ |
| 4dfr | 2 | 7 | −16.98 | 1.03 | −16.90 (0.07) | 0.98 (0.07) | 1.56 × 10⁶ |

^a The parameters used were 10 runs, a population size of 50, and a run-termination criterion of a maximum of 27,000 generations or a maximum of 1.5 × 10⁶ energy evaluations, whichever came first. Because local search also uses energy evaluations, the total number of energy evaluations for the LGA method was greater than that for the GA method, using the same population size and maximum number of generations; in the LGA dockings, the runs terminated because the maximum number of energy evaluations was exceeded. The rmsd conformational clustering tolerance was 0.5 Å, calculated from the ligand’s crystallographic coordinates. Standard deviations are given in parentheses.
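
The local search mentioned in the footnote is a Solis and Wets style adaptive random search. The sketch below shows the generic textbook scheme with the parameters quoted in the text: at most 300 iterations, the step size ρ doubled after 4 consecutive successes and halved after 4 consecutive failures, and termination once ρ falls below 0.01. It is an assumption about the general technique and not AUTODOCK's pseudo-Solis and Wets implementation; the bias update constants (0.2, 0.4, 0.5) follow the common formulation, this version may spend two energy evaluations in an iteration, and all names are hypothetical.

```python
import random


def solis_wets(energy, x0, rho=1.0, rho_lower_bound=0.01, max_iterations=300,
               max_successes=4, max_failures=4, rng=None):
    """Minimize `energy` from x0; return (best_x, best_energy, evaluations)."""
    rng = rng or random.Random(0)
    x = list(x0)
    best = energy(x)
    evaluations = 1
    bias = [0.0] * len(x)
    successes = failures = 0
    for _ in range(max_iterations):
        if rho < rho_lower_bound:
            break  # step size too small: local search has converged
        # Normally distributed step, shifted by the accumulated bias.
        deviate = [rng.gauss(0.0, rho) + b for b in bias]
        trial = [xi + di for xi, di in zip(x, deviate)]
        trial_energy = energy(trial)
        evaluations += 1
        if trial_energy < best:
            x, best = trial, trial_energy
            bias = [0.2 * b + 0.4 * d for b, d in zip(bias, deviate)]
            successes, failures = successes + 1, 0
        else:
            # Try the opposite direction before counting a failure.
            trial = [xi - di for xi, di in zip(x, deviate)]
            trial_energy = energy(trial)
            evaluations += 1
            if trial_energy < best:
                x, best = trial, trial_energy
                bias = [b - 0.4 * d for b, d in zip(bias, deviate)]
                successes, failures = successes + 1, 0
            else:
                bias = [0.5 * b for b in bias]
                successes, failures = 0, failures + 1
        if successes >= max_successes:
            rho, successes = 2.0 * rho, 0
        elif failures >= max_failures:
            rho, failures = 0.5 * rho, 0
    return x, best, evaluations


def quadratic(x):
    """Toy energy function with its minimum at the origin."""
    return sum(v * v for v in x)


start = [2.0, -1.5]
best_x, best_energy, n_evals = solis_wets(quadratic, start)
print(best_energy <= quadratic(start), n_evals)
```

Because only improving moves are accepted, the returned energy can never exceed the starting energy. In a Lamarckian scheme, the improved coordinates `best_x` would then be written back into the individual's genotype.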

energy across the ten dockings was lowest for LGA, followed by SA, and finally GA. This is reflected in a comparison of the mean rmsd of the docked conformation from the crystallographic structure for each of the methods: GA had the highest mean rmsd, followed by SA, and on average, the LGA produced conformations with the lowest crystallographic rmsd. Thus, considering their average performance, the best search method at finding the lowest energy and the lowest rmsd was the LGA. The predicted binding free energy, ΔG_pred, of the lowest docked energy structure obtained using the LGA method was −8.15 kcal mol⁻¹ (Table VII), whereas the observed value, ΔG_obs, was −6.46 kcal mol⁻¹: this is within the estimated error of the model.

Cytochrome P-450cam / Camphor (2cpp)

Camphor binds to the monooxygenase cytochrome P-450cam such that the 5-exo C–H bond is hydroxylated stereospecifically. The active site is deeply sequestered within the enzyme, and the crystal structure of the complex does not possess an obvious substrate access channel.70 This buried active site presents a more challenging docking problem than 3ptb. Once bound, however, the substrate is “tethered” by a hydrogen bond that is donated from the Tyr-96 hydroxyl to the carbonyl oxygen of camphor, while the subtle complementarity of the pocket and the hydrophobic skeleton of camphor help to position the rest of the substrate. The lowest energy found was −7.36 kcal mol⁻¹, found by both the GA (Table V) and LGA methods (Table VI); SA’s lowest energy was −7.29 kcal mol⁻¹ (Table IV), which is practically the same. All methods found the crystallographic structure, SA succeeding in 9 of 10 dockings, GA in 7 out of 10 dockings, and LGA in all of the dockings (with success, once again, being measured as having a crystallographic rmsd of less than 1 Å). In all three search method cases, the lowest energy cluster was the most populated, with 6, 9, and 10 members using SA, GA, and LGA, respectively. The predicted binding free energy, ΔG_pred, of the lowest docked energy structure, was −7.36 kcal mol⁻¹ using the LGA method (see Table VII), whereas the observed value, ΔG_obs, was −8.27 kcal mol⁻¹; once again, this was within the estimated error of the model.

McPC-603 / Phosphocholine (2mcp)

Antibody molecules bind their target antigens with exquisite specificity, having close complementarity between antigen and antibody surfaces, hydrogen bonding, van der Waals, and electrostatic interactions. Phosphocholine binds to Fab McPC-603,71 and is an example of recognition that is predominantly electrostatic in character, primarily due to the influence of Arg H52.72 There is little conformational change in the side chains of Fab McPC-603 upon binding, as indicated from the unbound crystal structure. We allowed all four

1654 VOL. 19, NO. 14


AUTOMATED DOCKING

FIGURE 4. A comparison of the lowest energy structure found by each search method and the crystal structure. The
latter is shown in black. The simulated annealing results are rendered with a striped texture, the genetic algorithm
results are shaded gray, and the Lamarckian genetic algorithm results are white. Oxygen atoms are shown as spheres;
other heteroatoms are not shown. Note that simulated annealing failed in the last two test cases, 4hmg and 4dfr, but
both the genetic algorithm and the Lamarckian genetic algorithm succeeded.

bonds to rotate during docking. The energy of the crystal structure was positive, due for the most part to a large, positive internal energy dominated by C2 and O1 being too close (2.26 Å); this could have been improved had local minimization been performed on the crystal structure before docking. The lowest energies found by the three search methods were −4.09, −5.17, and −5.54 kcal mol⁻¹, using SA, GA, and LGA, respectively. Unlike 3ptb and 2cpp, these differences in energy were more significant. Both SA and GA found 10 different clusters, whereas LGA found 6 clusters. The mean energy of the 10 dockings was +70.89, −3.61, and −4.15 kcal mol⁻¹ for SA, GA, and LGA, respectively. Thus, on average, the LGA performed best in finding the lowest energy docked structure. Furthermore, the mean rmsd from the crystallographic coordinates was 5.40, 5.26, and 1.10 Å for SA, GA, and LGA, respectively, indicating that LGA also reproduced the crystal structure

JOURNAL OF COMPUTATIONAL CHEMISTRY 1655


MORRIS ET AL.

TABLE VII.
Comparison of Predicted Free Energy of Binding, ΔGpred, of Lowest Energy Docked Structure Obtained Using Lamarckian Genetic Algorithm, and Observed Free Energy of Binding, ΔGobs.ᵃ

Energy (kcal mol⁻¹) and rmsd (Å)

PDB     Lowest    rmsd of
code    energy    lowest energy   ΔGinter   ΔGintra   ΔGtor    ΔGpred    ΔGobs     (ΔGpred − ΔGobs)

3ptb     −8.15    0.45             −8.15     0.00     0.00     −8.15     −6.46     −1.69
2cpp     −7.36    0.93             −7.36     0.00     0.00     −7.36     −8.27     +0.91
2mcp     −5.54    1.05             −6.57    +1.03    +1.25     −5.32     −7.13     +1.81
1stp    −10.14    0.69             −9.90    −0.24    +1.56     −8.34    −18.27     +9.93ᵇ
1hvr    −21.38    0.76            −19.34    −2.04    +2.49    −16.85    −12.96     −3.89
4hmg     −7.72    1.14             −8.93    +1.21    +2.18     −6.75     −3.48     −3.27
4dfr    −16.97    1.03            −16.57    −0.40    +2.18    −14.39    −13.22     −1.17

ᵃ ΔGinter is the intermolecular interaction energy between the ligand and the receptor, ΔGintra is the intramolecular interaction energy of the ligand, and ΔGtor is the torsional free energy change of the ligand upon binding.
ᵇ This large discrepancy may be due to the conformational rearrangements of streptavidin upon binding biotin, which are neglected in the docking simulation and binding free energy calculation.
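
As a reading aid for Table VII, the tabulated terms appear to combine additively: the lowest docked energy matches ΔGinter + ΔGintra, and ΔGpred matches ΔGinter + ΔGtor. The short Python sketch below checks this against the table rows. The additive relationships are inferred here from the tabulated numbers and the footnote definitions, and the names `ROWS`, `docked_energy`, `predicted_free_energy`, and `check_rows` are illustrative only; they are not part of AUTODOCK.

```python
# Rows copied from Table VII, all in kcal/mol:
# code: (lowest_energy, dG_inter, dG_intra, dG_tor, dG_pred, dG_obs)
ROWS = {
    "3ptb": (-8.15, -8.15, 0.00, 0.00, -8.15, -6.46),
    "2cpp": (-7.36, -7.36, 0.00, 0.00, -7.36, -8.27),
    "2mcp": (-5.54, -6.57, 1.03, 1.25, -5.32, -7.13),
    "1stp": (-10.14, -9.90, -0.24, 1.56, -8.34, -18.27),
    "1hvr": (-21.38, -19.34, -2.04, 2.49, -16.85, -12.96),
    "4hmg": (-7.72, -8.93, 1.21, 2.18, -6.75, -3.48),
    "4dfr": (-16.97, -16.57, -0.40, 2.18, -14.39, -13.22),
}


def docked_energy(dg_inter, dg_intra):
    """Assumed composition of the 'lowest energy' column."""
    return dg_inter + dg_intra


def predicted_free_energy(dg_inter, dg_tor):
    """Assumed composition of the predicted binding free energy column."""
    return dg_inter + dg_tor


def check_rows(rows, tol=0.005):
    """Return the PDB codes whose tabulated sums disagree by more than tol."""
    mismatches = []
    for code, (lowest, inter, intra, tor, pred, _obs) in rows.items():
        ok_docked = abs(docked_energy(inter, intra) - lowest) <= tol
        ok_pred = abs(predicted_free_energy(inter, tor) - pred) <= tol
        if not (ok_docked and ok_pred):
            mismatches.append(code)
    return mismatches


if __name__ == "__main__":
    print("rows that do not fit the additive reading:", check_rows(ROWS))
    for code, row in ROWS.items():
        # Signed error of the prediction, as in the last column of the table.
        print(code, round(row[4] - row[5], 2))
```

Under this reading, the intramolecular term influences which conformation is ranked lowest in energy, while the reported ΔGpred replaces it with the torsional term.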

most often. The predicted binding free energy, ΔGpred, of the lowest docked energy structure was −5.32 kcal mol⁻¹ using the LGA method (see Table VII), whereas the observed value, ΔGobs, was −7.13 kcal mol⁻¹; this was also within the estimated error of the model.

Streptavidin / Biotin (1stp)

One of the most tightly binding noncovalent complexes is that of streptavidin–biotin, with an experimentally observed dissociation constant, Kd, of 10⁻¹⁵ M. Comparison of the apo form and the complex [73] shows that the high affinity results from several factors, including formation of multiple hydrogen bonds and van der Waals interactions between the biotin and the protein, in addition to the ordering of surface polypeptide loops of streptavidin upon binding biotin. The method that found the lowest energy was LGA, at −10.14 kcal mol⁻¹, although GA was not significantly different, followed by SA with −8.48 kcal mol⁻¹. The method with the lowest mean energy was LGA at −10.06 kcal mol⁻¹, then GA with −8.42 kcal mol⁻¹, and finally SA with −7.76 kcal mol⁻¹. The method that found the crystallographic complex coordinates most often was LGA, having a mean rmsd of 0.66 Å, then SA at 1.24 Å, and finally GA with 2.96 Å. At the rmsd tolerance chosen for these experiments, 0.5 Å, SA found 10 different conformational clusters, GA found 7 clusters (the most populated was rank 1, with 4 members), and LGA found 1 cluster.

It was not possible to include the entropic effects of the flexible surface loops of streptavidin in the docking of biotin, although they make significant contributions to the binding free energy, as revealed by a recent set of experiments involving an atomic force microscope [74]. It was found that the unbinding forces of discrete complexes of streptavidin with biotin analogs were proportional to the enthalpy change of the complex formation but independent of changes in the free energy, which indicates that the unbinding process is adiabatic and that entropic changes occur after unbinding. This may help to explain why the predicted binding free energy of the streptavidin–biotin complex (ΔGpred), −10.14 kcal mol⁻¹, underestimated the magnitude of the observed value (ΔGobs), −18.27 kcal mol⁻¹ (Table VII).

HIV-1 Protease / XK263 (1hvr)

HIV-1 protease inhibitors prevent the maturation of HIV virions, and the protease is a major target for computer-assisted drug design in the development of AIDS therapies. Substrates and inhibitors of HIV-1 protease are typically extended peptides or peptidomimetics, with a dozen or more freely rotatable bonds and, as such, they present a challenging target for automated docking techniques. In addition, considerable protein motion is expected in the flaps upon binding, to allow the continuous polypeptide to reach the active site. However, most docking methods use a rigid protein target, and explicit modeling of the opening


and closing of the flaps is not performed: thus, the ligand must "thread" its way into the active site. The cyclic urea HIV-protease inhibitor, XK-263, has 10 rotatable bonds, excluding the cyclic urea's flexibility. All three search methods found solutions near to the crystal structure [75]: interestingly, the lowest docking energy found by SA was −11.77 kcal mol⁻¹ and had an rmsd of 1.15 Å from the crystal structure, whereas GA and LGA found much lower energies but were still near to the active site, having crystallographic rmsd values of 0.82 Å and 0.76 Å, respectively. The lowest docking energy found overall was −21.41 kcal mol⁻¹, and was found using GA, although that found by LGA was practically the same. The predicted binding free energy, ΔGpred, of the lowest energy structure was −16.85 kcal mol⁻¹ using LGA, whereas the observed value, ΔGobs, was −12.96 kcal mol⁻¹. The larger discrepancy between the predicted and observed values may be due to the entropic contributions of protein side chain and flap conformational rearrangements, or may be due to other low-energy conformational states of the cyclic urea moiety of XK-263, which are neglected in our calculations.

Influenza Hemagglutinin / Sialic Acid (4hmg)

The recognition of sialic acid by influenza hemagglutinin is chiefly mediated through hydrogen bonding: sialic acid has five hydroxyls, three in the glycerol group, one carboxylate, a cyclic ether oxygen, and an acetamido group, with a total of 10 rotatable acyclic bonds. We used the crystal structure of Weis et al. [76], although the low resolution meant that the overall coordinate error was approximately 0.35–0.40 Å, which, in itself, presents a potential challenge in the docking tests. We modeled an isopropylated derivative of sialic acid to mimic part of an adjacent six-membered ring that would normally be present in this complex, but was not seen due to disorder: this introduced an extra rotatable bond, giving a total of 11 torsions. Furthermore, in these tests, we used the crystal conformation of the six-membered ring, although normally we would use several of the lowest energy conformations of the ring system and dock these separately.

This was one of two cases where simulated annealing failed to find a docking that was near the crystal structure: the lowest energy structure found had an rmsd of 3.77 Å from the crystallographic structure, and the docking with the lowest crystallographic rmsd was 2.36 Å. The mean energy of all 10 SA dockings was very high (6.99 × 10⁴ kcal mol⁻¹); only 4 of the 10 SA dockings found negative energies. Both GA and LGA, however, succeeded in finding conformations within roughly 1.5 Å rmsd of the crystal conformation. The lowest energy found was by LGA, and was −7.72 kcal mol⁻¹. This structure had a crystallographic rmsd of 1.14 Å, and had a predicted binding free energy, ΔGpred, of −6.75 kcal mol⁻¹; the observed binding free energy, ΔGobs, for the sialic acid–hemagglutinin complex was −3.48 kcal mol⁻¹. The difference in predicted and observed binding free energies may be due to the structural differences between the isopropylated derivative that was docked and sialic acid itself.

Dihydrofolate Reductase / Methotrexate (4dfr)

Methotrexate is an antimetabolite that attacks proliferating tissue selectively and induces remissions in certain acute leukemias [77]; however, dangerous side effects of methotrexate in normal cells continue to make DHFR an important target in computer-assisted anticancer drug design [78]. We used the crystal structure of E. coli dihydrofolate reductase complexed with methotrexate [79] to investigate a more challenging docking problem. We assumed that waters 603, 604, and 639, which mediate hydrogen bonding between the inhibitor and the protein, were conserved biowaters, and included them in the protein structure in our grid calculations. Ideally, these should be predicted, and recently a method called Consolv, based on a k-nearest-neighbors classifier and a genetic algorithm, was reported to do just this [80].

This is one of the two test cases where simulated annealing failed: the lowest energy structure that it found had an rmsd of 4.83 Å from the crystal structure. This could be because the final docked conformation in simulated annealing is arrived at after a series of continuous steps, and if the route to the active site is blocked, the docking will tend to fail before the ligand reaches the active site. Note that, in the case of the camphor–cytochrome P-450cam docking, the random initialization loop was able to find initial states that were inside the binding pocket, but in this case the dockings failed to start near the active site.

The lowest energy found was −16.98 kcal mol⁻¹, and was found using LGA: this structure had an rmsd from the crystal structure of 1.03 Å.


The predicted binding free energy, ΔGpred, of the lowest docked energy structure was −14.39 kcal mol⁻¹ using LGA, whereas the observed value, ΔGobs, was −13.22 kcal mol⁻¹. This finding was within the estimated error of the model.

JUDGING SEARCH METHODS

To evaluate the new search methods, and to compare them with the earlier search method of simulated annealing, we addressed the following questions: Which search method is most efficient? That is, which finds the lowest energy in a given number of energy evaluations? Which search method is most reliable? That is, which method finds the most conformations similar to that of the lowest energy? Finally, which search method is most successful? That is, which finds the crystallographic conformation most often after a given number of dockings? Furthermore, because these comparisons were carried out using the new, empirical free energy force field, these tests also represent an evaluation of the force field itself, and, if the global minimum of the force field is unable to reproduce observed crystallographic structures, its usefulness will be limited. Because it is very difficult to determine the global minimum of such a complex function, we cannot answer this question definitively; however, we can report the lowest energy found by any of the methods and its structural similarity to that of the crystal structure.

If we calculate statistics across all seven protein–ligand test systems for each search method, we obtain a quantitative estimate of relative performance of each search method (see Table VIII). If, in each of the seven test systems, we assume that the lowest docked energy found by any method is the effective global minimum energy, and then calculate the difference between this energy and all of the docked energies found by each search method, we can then calculate the mean and standard deviation of this difference energy for each search method. Ideally, the mean and standard deviation of this value would be zero. The mean of this difference energy was lowest for LGA (0.40 kcal mol⁻¹), followed by GA (3.41 kcal mol⁻¹), and finally SA (2.62 × 10⁵ kcal mol⁻¹): the very high mean difference energy for SA is indicative of the cases in which this method failed to escape a local minimum, where the ligand was partially or wholly trapped within the protein. Hence, in answer to the first question, the Lamarckian genetic algorithm, LGA, is the most efficient search method.

In terms of how often the structure with lowest energy was found, LGA performed best: the mean of the number of docked structures in rank 1 was 78% for LGA, 40% for GA, and 24% for SA. The mean of the number of clusters found was lowest for LGA (2.29), followed by GA (7.00), and finally SA (8.43). Hence, the most reliable search method was LGA.

In comparing the relative success of each search method in reproducing the crystallographic structure, considering the crystallographic rmsd across all 10 dockings in each of the 7 test systems, the mean rmsd was lowest for LGA (0.88 Å, standard

TABLE VIII.
Statistical Comparison of Three Search Methods in AUTODOCK 3.0 Across all Seven Test Systems.ᵃ

Energy (kcal mol⁻¹) and rmsd (Å)

                                            Difference from
Search             Number of   Number in    effective global   rmsd of lowest   Mean          Mean    Number of energy
method  Statistic  clusters    rank 1       minimum energy     energy           energy        rmsd    evaluations

SA      Mean       8.43        2.43         2.62 × 10⁵         1.85             2.61 × 10⁵    3.63    1.75 × 10⁶
        SD         2.70        2.44         1.40 × 10⁵         1.74             4.60 × 10⁴    2.61    4.15 × 10⁵
GA      Mean       7.00        4.00         3.41               0.82             −7.64         3.06    1.35 × 10⁶
        SD         3.06        3.06         5.31               0.25             2.59          1.32    0.00
LGA     Mean       2.29        7.86         0.40               0.86             −10.47        0.88    1.50 × 10⁶
        SD         1.80        2.91         2.62               0.24             5.47          0.25    0.00

ᵃ The search methods are simulated annealing (SA), genetic algorithm (GA), and Lamarckian genetic algorithm (LGA). The mean and standard deviation (SD) for each criterion are shown. The effective global minimum energy for each of the seven test systems is the lowest docked energy found by any method for that test system. For each of the 10 dockings, the difference between the final docked energy and this effective global minimum energy was calculated; the mean and standard deviation were calculated across all 7 test systems, and this was repeated for each search method.
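
The "difference from effective global minimum energy" statistic described in the footnote to Table VIII can be sketched in Python as follows. This is only one possible reading of the procedure: it assumes that the differences from all dockings of all test systems are pooled for each method before the mean is taken, and that a sample standard deviation is used; neither detail is stated in the text. The function name and the energies in the example are hypothetical and are not taken from the paper.

```python
import statistics


def difference_energy_stats(docked_energies):
    """Mean and SD of (docked energy - effective global minimum) per method.

    docked_energies maps test system -> search method -> list of final
    docked energies (kcal/mol), one per docking. The effective global
    minimum of a system is the lowest energy found by any method.
    """
    pooled = {}
    for by_method in docked_energies.values():
        effective_min = min(e for energies in by_method.values() for e in energies)
        for method, energies in by_method.items():
            pooled.setdefault(method, []).extend(e - effective_min for e in energies)
    # Assumption: pooled differences and sample SD (statistics.stdev).
    return {m: (statistics.mean(d), statistics.stdev(d)) for m, d in pooled.items()}


if __name__ == "__main__":
    # Hypothetical energies for two systems and two dockings per method.
    example = {
        "sysA": {"SA": [-1.0, 1.0], "LGA": [-3.0, -3.0]},
        "sysB": {"SA": [-2.0, -2.0], "LGA": [-2.0, -4.0]},
    }
    for method, (mean, sd) in difference_energy_stats(example).items():
        print(method, mean, sd)
```

A method that always reaches the lowest energy found by any method would score a mean and standard deviation of zero, which is the ideal case described in the text.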


deviation 0.25 Å), followed by GA (3.06 Å, standard deviation 1.32 Å), and finally SA (3.63 Å, standard deviation 2.61 Å). These average results indicate that, of the three search methods, LGA will find the crystallographic structure most often. Thus, the answer to the last question, "Which method is most successful?," is LGA.

TABLE IX.
Results of Cross-Validation of Free Energy Function Using Local Search on 20 HIV-1 Protease-Inhibitor Complexes.

PDB     Experimental ΔGbinding   Calculated ΔGbinding
code    (kcal/mol)               (kcal/mol)

1hvs    −14.04                   −10.95
1hvk    −13.79                   −11.60
1hvi    −13.74                   −12.39
7hvp    −13.11                   −12.19
1hps    −12.57                   −11.80
1hpv    −12.57                    −8.24
4phv    −12.51                   −14.36
1hef    −12.27                    −9.52
1hiv    −12.27                   −13.02
1hvl    −12.27                   −10.35
8hvp    −12.27                    −9.36
1aaq    −11.62                    −9.68
1htg    −11.58                   −13.13
9hvp    −11.38                   −10.54
1hih    −10.97                   −11.43
1heg    −10.56                    −8.60
1sbg    −10.56                   −10.35
1htf     −9.31                    −8.21
1hbv     −8.68                    −9.75
1hte     −7.69                    −7.28

In two different cases, 4hmg and 4dfr, the simulated annealing method failed to reproduce the corresponding crystal structure, although it succeeded with 1hvr (see Fig. 4). This is important because methotrexate has 7 rotatable bonds, and would be expected to be solvable using our rule-of-thumb that SA succeeds in problems with 8 torsions or fewer; however, the HIV-1 protease inhibitor XK-263 has 10 rotatable bonds, and was successfully docked using SA. Thus, the degree of difficulty of a docking problem is not as simple as how many rotatable bonds there are; other factors, such as the nature of the energy landscape, clearly play an important role.

It could be said that the crystallographic rmsd of the lowest energy structure found by any of the search methods is an estimate of the quality of the force field, although this is complicated by the fact that the search method itself must determine a docking near to the global minimum, an unknown state. We can calculate the energy of the ligand in the crystal structure using the new force field (see Table II), which we assume to be near the global minimum, but, unfortunately, the crystal structure may contain frustrations and bad contacts. This appears to be the case in 2mcp, where a close contact between C2 and O1 causes a positive total energy to be calculated for the crystal structure. In all cases, the lowest energy found, considering all the search methods, was lower than that of the corresponding crystal structure.

The crystallographic rmsd values of the lowest energy structures (found by any search method) for the protein–ligand test systems were all within 1.14 Å of the crystal structure. This suggests that the force field's global minimum in each of the protein–ligand cases was near to the crystal structure, if we accept the assumption that the crystal structure was near to or at the global minimum, and that the lowest energy found was near to the global minimum. In some cases, dockings were found that had lower crystallographic rmsd values but slightly higher energies than the lowest energy found. All of the lowest crystallographic rmsd values were 0.89 Å or less, indicating that low-energy structures found by the force field were very similar to the corresponding crystal structure.

Conclusion

AUTODOCK is a software package of general applicability for automated docking of small molecules, such as peptides, enzyme inhibitors, and drugs, to macromolecules, such as proteins, enzymes, antibodies, DNA, and RNA. New search methods have been introduced and tested here, using a new, empirical binding free energy function for calculating ligand–receptor binding affinities.

We have shown that, of the three search methods tested in AUTODOCK (simulated annealing, genetic algorithm, and Lamarckian genetic algorithm), the most efficient, reliable, and successful is the Lamarckian genetic algorithm (LGA). We defined efficiency of search in terms of lowest energy found in a given number of energy evaluations; reliability in terms of reproducibility of finding the


lowest energy structure in independent dockings, as measured by the number of conformations in the top ranked cluster; and success in terms of reproducing the known crystal structure. Simulated annealing failed to reproduce the crystal structures for the influenza hemagglutinin–sialic acid complex (4hmg) and the dihydrofolate reductase–methotrexate complex (4dfr). However, both the genetic algorithm and the Lamarckian genetic algorithm methods succeeded. Thus, the introduction of the LGA search method extends the power and applicability of AUTODOCK to docking problems with more degrees of freedom than could be handled by earlier versions.

The predicted binding affinities of the lowest energy docked conformations, using the LGA method and the new empirical free energy function, were within the standard residual error of the force field in four of the seven cases (3ptb, 2cpp, 2mcp, and 4dfr), and reasonably close in two other cases (1hvr and 4hmg). The large discrepancy between the predicted and the observed binding affinity of biotin for streptavidin (1stp), even though the crystal structure was successfully reproduced, may be due to the large free energy change that accompanies conformational changes in the protein upon binding, in particular the surface loops. This remains a limitation of the method, because protein motion is not modeled and successfully predicting such large-scale protein conformational changes is difficult. The AUTODOCK method works well when there is little change between the apo and ligand-bound forms of the protein, even if the protein undergoes significant conformational changes during binding.

AUTODOCK predicts the binding affinity using one conformation of the ligand–protein complex. A new class of models for predicting receptor–ligand binding affinities has been reported recently that considers not just the lowest energy state of the complex, but the predominant states of the binding molecules [81]. These approaches are grounded in statistical thermodynamics, and combine a modest set of degrees of freedom with aggressive conformational sampling to identify the low-energy conformations of the complex and the free molecules. AUTODOCK version 3.0 currently performs extensive conformational sampling, information that could be incorporated into the calculation of the binding affinity. We are studying how the search methods can be modified such that statistical thermodynamics calculations can be performed while the docking proceeds, to improve the calculation of the binding affinity.

Availability

More information about AUTODOCK and how to obtain it can be found on the World Wide Web at: http://www.scripps.edu/pub/olson-web/doc/autodock.

Acknowledgments

The authors thank Dr. Bruce S. Duncan and Dr. Christopher Rosin for their helpful comments and suggestions. This work is publication 10887-MB from The Scripps Research Institute.

References

1. J. M. Blaney and J. S. Dixon, Perspect. Drug Discov. Design, 1, 301 (1993).
2. I. D. Kuntz, E. C. Meng, and B. K. Shoichet, Acc. Chem. Res., 27, 117 (1994).
3. R. Rosenfeld, S. Vajda, and C. DeLisi, Annu. Rev. Biophys. Biomol. Struct., 24, 677 (1995).
4. I. D. Kuntz, J. M. Blaney, S. J. Oatley, R. Langridge, and T. E. Ferrin, J. Mol. Biol., 161, 269 (1982).
5. B. K. Shoichet and I. D. Kuntz, Prot. Eng., 6, 723 (1993).
6. D. S. Goodsell and A. J. Olson, Prot. Struct. Func. Genet., 8, 195 (1990).
7. G. M. Morris, D. S. Goodsell, R. Huey, and A. J. Olson, J. Comput.-Aided Mol. Design, 10, 293 (1996).
8. N. Pattabiraman, M. Levitt, T. E. Ferrin, and R. Langridge, J. Comput. Chem., 6, 432 (1985).
9. P. J. Goodford, J. Med. Chem., 28, 849 (1985).
10. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys., 21, 1087 (1953).
11. S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, Science, 220, 671 (1983).
12. E. A. Lunney, S. E. Hagen, J. M. Domagala, C. Humblet, J. Kosinski, B. D. Tait, J. S. Warmus, M. Wilson, D. Ferguson, D. Hupe, P. J. Tummino, E. T. Baldwin, T. N. Bhat, B. Liu, and J. W. Erickson, J. Med. Chem., 37, 2664 (1994).
13. J. V. N. Vara Prasad, K. S. Para, D. F. Ortwine, J. B. Dunbar Jr., D. Ferguson, P. J. Tummino, D. Hupe, B. D. Tait, J. M. Domagala, C. Humblet, T. N. Bhat, B. Liu, D. M. A. Guerin, E. T. Baldwin, J. W. Erickson, and T. K. Sawyer, J. Am. Chem. Soc., 116, 6989 (1994).
14. A. R. Friedman, V. A. Roberts, and J. A. Tainer, Prot. Struct. Func. Genet., 20, 15 (1994).
15. B. L. Stoddard and D. E. Koshland, Nature, 358, 774 (1992).
16. D. S. Goodsell, G. M. Morris, and A. J. Olson, J. Mol. Recog., 9, 1 (1996).


17. C. M. Oshiro, I. D. Kuntz, and J. S. Dixon, J. Comput.-Aided Mol. Design, 9, 113 (1995).
18. D. R. Westhead, D. E. Clark, D. Frenkel, J. Li, C. W. Murray, B. Robson, and B. Waszkowycz, J. Comput.-Aided Mol. Design, 9, 139 (1995).
19. P. Willet, TIBTECH, 13, 516 (1995).
20. D. E. Clark and D. R. Westhead, J. Comput.-Aided Mol. Design, 10, 337 (1996).
21. C. D. Rosin, R. S. Halliday, W. E. Hart, and R. K. Belew, In Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97), T. Baeck, Ed., Morgan Kauffman, San Francisco, CA, 1997.
22. D. K. Gehlhaar, G. M. Verkhivker, P. A. Rejto, C. J. Sherman, D. B. Fogel, L. J. Fogel, and S. T. Freer, Chem. Biol., 2, 317 (1995).
23. J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.
24. S. S. Četverikov, J. Exper. Biol., 2, 3 (1926).
25. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, New York, 1996.
26. W. E. Hart, Adaptive Global Optimization with Local Search, Ph.D. Thesis, Computer Science and Engineering Department, University of California, San Diego, 1994. See also: "ftp://ftp.cs.sandia.gov/pub/papers/wehart/thesis.ps.gz."
27. W. E. Hart, T. E. Kammeyer, and R. K. Belew, In Foundations of Genetic Algorithms III, D. Whitley and M. Vose, Eds., Morgan Kauffman, San Francisco, CA, 1994.
28. R. K. Belew and M. Mitchell, Adaptive Individuals in Evolving Populations: Models and Algorithms. Santa Fe Institute Studies in the Science of Complexity, XXVI, Addison-Wesley, Reading, MA, 1996.
29. F. J. Solis and R. J.-B. Wets, Math. Oper. Res., 6, 19 (1981).
30. P.-G. Maillot, In Graphics Gems, A. S. Glassner, Ed., Academic Press, London, 1990, p. 498.
31. A. Watt and M. Watt, In Advanced Animation and Rendering Techniques: Theory and Practice, ACM Press, New York.
32. P. L'Ecuyer and S. Cote, ACM Trans. Math. Software, 17, 98 (1991).
33. J. B. Lamarck, Zoological Philosophy, Macmillan, London, 1914.
34. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta Jr., and P. Weiner, J. Am. Chem. Soc., 106, 765 (1984).
35. W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc., 117, 5179 (1995).
36. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187 (1983).
37. A. T. Hagler, E. Huler, and S. Lifson, J. Am. Chem. Soc., 96, 5319 (1977).
38. G. Némethy, M. S. Pottle, and H. A. Scheraga, J. Phys. Chem., 87, 1883 (1983).
39. H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. diNola, and J. R. Haak, J. Chem. Phys., 81, 3684 (1984).
40. J. H. van der Waals, Lehrbuch der Thermodynamik, Part 1, Mass and Van Suchtelen, Leipzig, 1908.
41. I. K. McDonald and J. M. Thornton, J. Mol. Biol., 238, 777 (1994).
42. M. K. Gilson and B. Honig, Nature, 330, 84 (1987).
43. D. Bashford and M. Karplus, Biochemistry, 29, 10219 (1990).
44. D. Bashford and K. Gerwert, J. Mol. Biol., 224, 473 (1992).
45. B. Honig and A. Nicholls, Science, 268, 1144 (1995).
46. P. A. Bash, U. C. Singh, F. K. Brown, R. Langridge, and P. A. Kollman, Science, 235, 574 (1987).
47. L. P. Hammett, J. Am. Chem. Soc., 59, 96 (1937).
48. T. Fujita, J. Iwasa, and C. Hansch, J. Am. Chem. Soc., 86, 5175 (1964).
49. C. Hansch, A. R. Steward, J. Iwasa, and E. W. Deutsch, Mol. Pharmacol., 1, 205 (1965).
50. C. D. Selassie, Z. X. Fang, R. L. Li, C. Hansch, G. Debnath, T. E. Klein, R. Langridge, and B. T. Kaufman, J. Med. Chem., 32, 1895 (1989).
51. D. H. Williams, J. P. L. Cox, A. J. Doig, M. Gardner, U. Gerhard, P. T. Kaye, A. R. Lal, I. A. Nicholls, C. J. Salter, and R. C. Mitchell, J. Am. Chem. Soc., 113, 7020 (1991).
52. L. Wesson and D. Eisenberg, Prot. Sci., 1, 227 (1992).
53. H.-J. Böhm, J. Comput.-Aided Mol. Design, 6, 593 (1992).
54. H.-J. Böhm, J. Comput.-Aided Mol. Design, 8, 243 (1994).
55. A. N. Jain, J. Comput.-Aided Mol. Design, 10, 427 (1996).
56. E. L. Mehler and T. Solmajer, Prot. Eng., 4, 903 (1991).
57. M. F. Sanner, A. J. Olson, and J.-C. Spehner, Biopolymers, 38, 305 (1996).
58. P. F. W. Stouten, C. Frömmel, H. Nakamura, and C. Sander, Mol. Simul., 10, 97 (1993).
59. S. Hill, In Graphics Gems IV, P. S. Heckbert, Ed., Academic Press, London, 1994, p. 521.
60. P. W. Atkins, Physical Chemistry, Oxford University Press, Oxford, 1982, p. 263.
61. H. R. Horton, L. A. Moran, R. S. Ochs, J. D. Rawn, and K. G. Scrimgeour, Principles of Biochemistry, Prentice-Hall, London, 1993.
62. S-PLUS, Statistical Sciences, Inc., Seattle, WA.
63. F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol., 112, 535 (1977).
64. E. E. Abola, F. C. Bernstein, S. H. Byant, T. F. Koetzle, and J. Weng, In Crystallographic Databases-Information Content, Software Systems, Scientific Applications, F. H. Allen, G. Bergerhoff, and R. Sievers, Eds., Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester, 1987, p. 107.
65. SYBYL, Tripos Associates, Inc., St. Louis, MO.
66. M. Marsili and J. Gasteiger, Chim. Acta, 52, 601 (1980).
67. J. Gasteiger and M. Marsili, Tetrahedron, 36, 3210 (1980).
68. D. N. A. Boobbyer, P. J. Goodford, P. M. McWhinnie, and R. C. Wade, J. Med. Chem., 32, 1083 (1989).
69. M. Marquart, J. Walter, J. Deisenhofer, W. Bode, and R. Huber, Acta Crystallogr. (Sect. B), 39, 480 (1983).
70. T. L. Poulos, B. C. Finzel, and A. J. Howard, J. Mol. Biol., 195, 687 (1987).
71. E. A. Padlan, G. H. Cohen, and D. R. Davies, Ann. Immunol. (Paris) (Sect. C), 136, 271 (1985).


72. J. Novotny, R. E. Bruccoleri, and F. A. Saul, Biochemistry, 28, 4735 (1989).
73. P. C. Weber, D. H. Ohlendorf, J. J. Wendolski, and F. R. Salemme, Science, 243, 85 (1989).
74. V. T. Moy, E.-L. Florin, and H. E. Gaub, Science, 266, 257 (1994).
75. P. Y. S. Lam, P. K. Jadhav, C. J. Eyerman, C. N. Hodge, Y. Ru, L. T. Bacheler, J. L. Meek, M. J. Otto, M. M. Rayner, Y. Wong, C.-H. Chang, P. C. Weber, D. A. Jackson, T. R. Sharpe, and S. Erickson-Viitanen, Science, 263, 380 (1994).
76. W. I. Weis, A. T. Brünger, J. J. Skehel, and D. C. Wiley, J. Mol. Biol., 212, 737 (1990).
77. C. K. Matthews and K. E. van Holde, Biochemistry, Benjamin/Cummings, Redwood City, CA, 1990.
78. C. A. Reynolds, W. G. Richards, and P. J. Goodford, Anti-Cancer Drug Des., 1, 291 (1987).
79. J. T. Bolin, D. J. Filman, D. A. Matthews, R. C. Hamlin, and J. Kraut, J. Biol. Chem., 257, 13650 (1982).
80. M. L. Raymer, P. C. Sanschagrin, W. F. Punch, S. Venkataraman, E. D. Goodman, and L. A. Kuhn, J. Mol. Biol., 265, 445 (1997).
81. M. K. Gilson, J. A. Given, and M. S. Head, Chem. Biol., 4, 87 (1997).
