
Chapter 1

Molecular Dynamics Simulations


Erik Lindahl

Abstract
Molecular dynamics has evolved from a niche method mainly applicable to model systems into a
cornerstone in molecular biology. It provides us with a powerful toolbox that enables us to follow and
understand structure and dynamics with extreme detail, literally on scales where individual atoms can
be tracked. However, with great power comes great responsibility: simulations will not magically
provide valid results; that requires a skilled researcher. This chapter introduces you to the method and
makes you aware of some potential pitfalls. We focus on the two basic and most used methods:
optimizing a structure with energy minimization and simulating motion with molecular dynamics. The
statistical mechanics theory is covered briefly, as are limitations such as the lack of quantum
effects and the short timescales. As a practical example, we show each step of a simulation of a small
protein, including examples of hardware and software, how to obtain a starting structure, immersing it
in water, and choosing good simulation parameters. You will learn how to analyze simulations in terms
of structure, fluctuations, and geometrical features, and how to create ray-traced movies for
presentations. With modern GPU acceleration, a desktop can perform μs-scale simulations of small
proteins in a day; only 15 years ago this took months on the largest supercomputer in the world. As a
final exercise, we show you how to set up, perform, and interpret such a folding simulation.

Key words Molecular dynamics, Simulation, Force field, Protein, Solvent, Energy minimization,
Position restraints, Equilibration, Trajectory analysis, Secondary structure

1  Introduction

Biomolecular dynamics occur over a wide range of scales in both time and space, and the choice of
approach to study them depends on the question asked. In many cases the best alternative is an
experimental technique, for instance spectroscopy to study bond vibrations or electrophysiology to
study ion channels opening and closing. However, theoretical methods have made huge advances over
the last few decades, and there are now large domains where modeling and simulation either provide
more detail or are more efficient than setting up a new experiment.
Molecular dynamics simulation is far from the only theoretical method; when the aim is to predict,
for example, the structure and/or function of proteins (rather than studying the dynamics of a
protein) the best tool is normally bioinformatics methods that detect related proteins from amino acid
sequence similarity. Similarly, for computational drug design it is often much more productive to
use less accurate but exceptionally fast statistical methods like QSAR (Quantitative
Structure–Activity Relationship) instead of spending billions of CPU hours to simulate binding of
thousands of compounds.

Andreas Kukol (ed.), Molecular Modeling of Proteins, Methods in Molecular Biology, vol. 1215,
DOI 10.1007/978-1-4939-1465-4_1, © Springer Science+Business Media New York 2015

Traditionally, the role of simulations has been to test if simple theoretical models can predict
experimental observations. For example, simulations of ion channels have been useful to explain
why some ions pass while others are blocked, although the conductivity itself was already known
from experiments. Similarly, simulations can provide detail not accessible through experiments, for
instance pressure distributions inside membranes. However, this is changing rapidly: simulations
have moved far beyond confirming experiments, and today they frequently make predictions about
properties such as binding or folding dynamics that are later confirmed in the lab. With
ever-increasing computational power this development will not only continue, but it is likely to
accelerate significantly over the next few years.
From an ideal physics point of view, the time-dependent Schrödinger equation should be able to
predict all properties of any molecule with arbitrary precision ab initio. However, as soon as more
than a handful of particles are involved it is necessary to introduce approximations. In quantum
chemistry, common approximations are to assume that atomic nuclei do not move and to use an
implicit representation of solvent. This is obviously not realistic for large biomolecules if we
are interested in understanding their motion and sampling lots of different states, so for most
biomolecular systems we instead choose to work with empirical parameterizations of models, for
instance classical Coulomb interactions between pointlike atomic charges. The conceptual
difference is that quantum chemistry is excellent at describing the electronic structure and
enthalpy (potential) of the system, while classical molecular dynamics instead excels at sampling
the billions of states a macromolecule will adopt; in particular, this means classical simulations
properly include the entropy part of free energy. These models are not only orders of magnitude
faster, but since they have been parameterized from experiments they also perform better when it
comes to reproducing observations on the microsecond scale (Fig. 1), rather than extrapolating
quantum models 10 orders of magnitude. The first molecular dynamics simulation was performed as
late as 1957 [1], although it was not until the 1970s that it was possible to simulate water [2]
and biomolecules [3].
[Figure 1: logarithmic time axis from 10^-15 s to 10^3 s (ticks at 10^-15, 10^-12, 10^-9, 10^-6,
10^-3, 1, and 10^3 s), labeled with bond length vibration, rotation around bonds, water relaxation,
lipid diffusion, lipid rotation, transport in ion channels, rapid protein folding, normal protein
folding, ribosome synthesis, membrane protein folding, and "biology". A bar marks the range
accessible to atomic-detail simulation today (2013).]

Fig. 1 Range of time scales for dynamics in biomolecular systems. While the individual time steps of molecular
dynamics are 1–2 fs, parallel computers make it possible to simulate on the microsecond scale, and distributed
computing techniques can sample even slower processes, almost reaching milliseconds

2  Theory

Macroscopic properties measured in an experiment are not direct observations, but averages over
billions of molecules representing a statistical mechanics ensemble. This has deep theoretical
implications that are covered in great detail in the literature [4, 5], but even from a practical
point of view there are important consequences: (1) It is not sufficient to work with individual
structures; systems have to be expanded to generate a representative ensemble of structures (see
Note 1) at the given experimental conditions, e.g., temperature and pressure. This is one thing
that sets classical molecular dynamics apart from quantum chemistry. (2) Thermodynamic equilibrium
properties related to free energy, such as binding constants, solubilities, and relative stability,
cannot be calculated directly from individual simulations, but require more elaborate techniques
covered in later chapters; these all rely on entropy. (3) For equilibrium properties (in contrast
to kinetic ones) the aim is to examine the ensemble of structures, and not necessarily to reproduce
individual atomic trajectories!
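As a hedged illustration of what averaging over an ensemble means in practice, the standard
statistical mechanics expressions can be written out. The symbols A (any observable), E_i (energy of
state i), k_B, T, and M (number of saved simulation frames) are notation introduced here for
illustration and are not taken from the chapter. The first expression assumes a canonical (constant
temperature) ensemble; the approximation assumes the simulation is long enough to visit the relevant
states.

```latex
\langle A \rangle
  = \frac{\sum_i A_i \, e^{-E_i / k_B T}}{\sum_i e^{-E_i / k_B T}}
  \;\approx\; \frac{1}{M} \sum_{k=1}^{M} A\big(\mathbf{r}(t_k)\big)
```

The right-hand side is why consequence (3) holds: only the set of sampled structures enters the
average, not the order in which they were visited.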
The two most common ways to generate statistically faithful equilibrium ensembles are Monte Carlo
and Molecular Dynamics simulations, where the latter also has the advantage of accurately
reproducing kinetics of non-equilibrium properties such as diffusion or folding times. However,
these methods cannot handle the case where a structure is very far from equilibrium, for instance
if two atoms are almost overlapping after building a new side chain. To remove this type of clash
prior to simulation, we typically start with an Energy Minimization. This type of minimization is
also commonly used to refine low-resolution experimental structures.
All classical simulation methods rely on more or less empirical sets of parameters called Force
fields [6–9] to calculate interactions and evaluate the potential energy of the system as a
function of pointlike atomic coordinates. A force field consists of both the set of equations used
to calculate the potential energy and forces from particle coordinates, and a collection of
parameters used in the equations. For most purposes these approximations work great, but they
cannot reproduce quantum effects such as bond formation or breaking (see Note 2).

Fig. 2 Examples of interaction functions in modern force fields. Bonded interactions include
covalent bond-stretching, angle-bending, torsion rotation around bonds, and out-of-plane or
“improper” torsions (not shown). Nonbonded interactions are based on neighborlists and consist of
Lennard–Jones attraction and repulsion, as well as Coulomb electrostatics. Even a small amino acid
residue contains a large number of interactions, and for a protein there are thousands

All common force fields subdivide potential functions into two classes. Bonded interactions cover
stretching of covalent bonds, angle-bending, torsion potentials when rotating around bonds, and
out-of-plane “improper torsion” potentials, all of which are normally fixed throughout a
simulation (Fig. 2). The remaining nonbonded interactions between atoms that are merely close in
space consist of Lennard–Jones repulsion and dispersion as well as Coulomb electrostatics. These
are typically computed from neighborlists that are updated periodically.
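To make the interaction classes above concrete, here is a minimal Python sketch of one bonded and
two nonbonded terms. It is an illustration under stated assumptions, not the functional form of any
particular force field: the harmonic bond and the 12-6 Lennard–Jones expression are common choices
assumed here, the function names are hypothetical, and units follow the GROMACS convention (nm,
kJ/mol, elementary charges).

```python
# Illustrative energy terms; functional forms and names are assumptions for this sketch.

# Coulomb prefactor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (GROMACS unit system).
COULOMB_CONSTANT = 138.935458


def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones: r^-12 repulsion plus r^-6 dispersion attraction."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj):
    """Coulomb interaction between two point charges given in units of e."""
    return COULOMB_CONSTANT * qi * qj / r
```

The Lennard–Jones term is zero at r = sigma and reaches its minimum of -epsilon at
r = 2^(1/6) sigma, which is why it decays fast enough for a plain cutoff to be workable.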
Given the force on all atoms, the coordinates are updated for the next step. For energy
minimization, the steepest descent algorithm simply moves each atom a short distance in the
direction of decreasing energy (force is the negative gradient of energy), while molecular
dynamics is performed by integrating Newton’s equations of motion [10]:
$$F_i = -\frac{\partial V(r_1, \ldots, r_N)}{\partial r_i}$$
$$m_i \frac{\partial^2 r_i}{\partial t^2} = F_i$$
The updated coordinates are then used to evaluate the potential
energy again, as shown in the flowchart of Fig. 3.
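The update loop can be sketched in a few lines of Python. This is a hedged illustration only: it
uses the velocity Verlet scheme on a one-dimensional harmonic oscillator, which is an assumption made
for this example (the default GROMACS `md` integrator is a leap-frog scheme), and all names are
hypothetical.

```python
import math


def velocity_verlet(x, v, force, mass, dt, nsteps):
    """Integrate Newton's equation m * d2x/dt2 = F(x) for nsteps steps."""
    f = force(x)
    for _ in range(nsteps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


SPRING = 1.0  # hypothetical force constant; the mass is 1 in the same units


def harmonic_force(x):
    """F = -dV/dx for V = 0.5 * SPRING * x^2."""
    return -SPRING * x


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * SPRING * x * x


# Start at x = 1 with zero velocity and integrate to t = 10.
x_end, v_end = velocity_verlet(1.0, 0.0, harmonic_force, 1.0, 0.01, 1000)
```

With a time step that is small compared to the oscillation period, the total energy stays close to
its starting value of 0.5 and the position tracks the analytical solution cos(t).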
[Figure 3: flowchart. Initial input data: interaction function V(r) (the "force field"),
coordinates r, velocities v. Then, repeated for millions of steps: (1) compute the potential V(r)
and the forces F_i = -∇_i V(r) on atoms; (2) update coordinates and velocities according to the
equations of motion; (3) collect statistics and write energy/coordinates to trajectory files;
(4) more steps? If yes, return to (1); if no, done.]

Fig. 3 Simplified flowchart of a typical molecular dynamics simulation. The basic idea is to
generate structures from a natural ensemble by calculating potential functions and integrating
Newton’s equations of motion, structures which are then used to evaluate equilibrium properties of
the system. A typical time step is on the order of 1 or 2 femtoseconds, unless special techniques
are used

Even the smallest chemical sample we can imagine is far too large to include completely in a
simulation. Instead, biomolecular simulations normally use periodic boundary conditions to avoid
surface artifacts, so that a water molecule that exits to the right reappears on the left in the
system; if the box is sufficiently large the molecules will not interact significantly with their
periodic copies. This is intimately related to the nonbonded interactions, which ideally should be
summed over all neighbors in the resulting infinite periodic system. Simple cutoffs can work for
Lennard–Jones interactions that decay very rapidly, but for Coulomb interactions a sudden cutoff
can lead to large errors. In the early days of simulation it was common to “switch off” the
electrostatic interaction before the cutoff as shown in Fig. 4, but this too has severe artifacts.
The current method of choice is to use Particle-Mesh-Ewald summation (PME) to calculate the
infinite electrostatic interactions by splitting the summation into short- and long-range parts [11].
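The periodic wrapping described above can be sketched for a single coordinate in a rectangular box.
This is a minimal illustration with hypothetical function names; it assumes an orthogonal box and
treats each dimension independently, which does not cover the non-rectangular boxes that simulation
packages also support.

```python
def wrap(x, box):
    """Put a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest displacement between two particles given their raw difference dx."""
    return dx - box * round(dx / box)
```

For a 5 nm box, a particle that has moved to x = 5.2 nm reappears at 0.2 nm, and two particles whose
raw separation is 4.8 nm are in fact only 0.2 nm apart through the periodic boundary.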
[Figure 4: two panels plotting relative potential (vertical axis, -1 to 4) against r (nm)
(horizontal axis, 0 to 1.1).]

Fig. 4 Alternatives to a sharp cutoff for nonbonded Coulomb interactions. Top: By switching off the interaction
(dashed) before the cutoff the force will be the exact derivative of potential, but the derivative (and thus force)
will unnaturally increase just before the cutoff. Bottom: Particle-Mesh-Ewald is an amazing algorithm where
the Coulomb interaction (solid) is divided into a short-range term that is evaluated within a cutoff (dashed) and
a long-range term which can be solved exactly in reciprocal space with Fourier transforms (dot-dash)

For PME, the cutoff is not really a cutoff; it only determines the
balance between the two parts, and the long-range part is treated
by assigning charges to a grid that is solved in reciprocal space
through Fourier transforms.
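The split into short- and long-range parts can be checked numerically. The sketch below shows only
the real-space identity behind Ewald-type methods, 1/r = erfc(βr)/r + erf(βr)/r; it does not
implement the grid or Fourier part of PME. The function names and the value of the splitting
parameter `beta` are assumptions chosen for illustration.

```python
import math


def coulomb_short(r, beta):
    """Short-range part: decays quickly, so it can be truncated at a cutoff."""
    return math.erfc(beta * r) / r


def coulomb_long(r, beta):
    """Long-range part: smooth everywhere, suited to reciprocal-space treatment."""
    return math.erf(beta * r) / r


BETA = 3.0  # hypothetical splitting parameter in nm^-1
```

With this `beta`, the short-range term at r = 1.0 nm is already below 1e-4 of the bare 1/r
interaction, which illustrates why the real-space "cutoff" only sets the balance between the two
parts rather than discarding interactions.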
Cutoffs and rounding errors can lead to drifts in energy, which will cause the system to heat up
during the simulation. Even with a theoretically perfect simulation we would run into problems
since we typically start from an imperfect structure. As the potential energy of this structure
decreases during the simulation, the kinetic energy (i.e., temperature) would increase if the
total system energy were constant. To control this, the system is normally coupled to a thermostat
that scales velocities during the integration to maintain the target temperature, for example room
temperature. Similarly, the total pressure in the system can be adjusted through scaling the
simulation box size, either isotropically or separately in x/y/z dimensions.
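As a rough sketch of what "scaling velocities" means, the following Python fragment computes the
scaling factor of a simple weak-coupling (Berendsen-style) thermostat. This is a swapped-in,
simplified scheme chosen for illustration; it is not the Bussi thermostat used later in this
chapter, which adds a stochastic term. Function and parameter names are hypothetical.

```python
import math


def scaling_factor(t_current, t_ref, dt, tau):
    """Weak-coupling factor that nudges the temperature toward t_ref.

    dt is the time step and tau the coupling time constant (same units).
    """
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_current - 1.0))


def rescale(velocities, factor):
    """Apply the same scaling factor to every velocity component."""
    return [factor * v for v in velocities]
```

A system that is too hot (say 330 K with a 300 K reference) gets a factor slightly below one, and a
system that is too cold gets a factor slightly above one; at the reference temperature the
velocities are left unchanged.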
The single most demanding part of simulations is the computation of nonbonded interactions, since
millions of pairs have to be evaluated for each time step. Extending the time step is thus an
important way to improve simulation performance, but unfortunately errors are introduced in bond
vibrations already at 1 fs. However, in most simulations these bond vibrations are not of interest
per se, and can be removed entirely by introducing bond constraint algorithms such as SHAKE [12]
or LINCS [13]. Constraints make it possible to extend time steps to 2 fs, and fixed-length bonds
are likely better approximations of the quantum mechanical oscillators than harmonic springs (see
Note 3). In the final section we will show you how to go even further.

3  Methods

With the basic theory covered, this section will describe how to (1) choose and obtain a starting
structure, (2) prepare it for a simulation, (3) create a simulation box, (4) add solvent water,
(5) perform energy minimization, (6) equilibrate the structure with simulation, (7) perform the
production simulation, and (8) analyze the trajectory data. To reproduce it, you will need access
to a Unix/Linux machine (see Note 4) with a molecular dynamics package installed. While the
options and files below refer to the GROMACS program [14], the description should be reasonably
straightforward to follow with other programs like AMBER [15], CHARMM [16], or NAMD [17]. It will
also be useful to have the molecular viewer PyMOL [18] and the Unix graph program Grace installed
(see Note 5).

3.1  Obtaining a Starting Structure

The Bovine Pancreatic Trypsin Inhibitor (BPTI) is a small 58-residue water-soluble protein that
inhibits several serine proteases [19]. It was one of the first proteins to be simulated [3], and
has often been referred to as a “hydrogen atom” of protein simulation. There are several
high-resolution X-ray structures of BPTI [20] in the Protein Data Bank (http://www.pdb.org), and
also NMR structures. It was actually the early simulations of BPTI [3] that led experimentalists
to realize that X-ray temperature factors can be used to study the local dynamics of a protein
[21]. Choose the entry 6PTI with 1.7 Å resolution [20], and download it as 6PTI.pdb (see Note 6).
Figure 5 shows a cartoon representation of this structure; the small crosses are crystal water
oxygen atoms visible in the X-ray experiment (see Note 7).

3.2  Preparation of Input Data

In addition to the coordinates/velocities that change each step, simulations also need a static
description of all atoms and interactions in the system, called the topology. In GROMACS, this is
created from the PDB structure by the program pdb2gmx, which also adds all the hydrogen atoms that
are not present in most X-ray structures. For this example we will work with the Amber99SB-ILDN
force field, the TIP3P [22] water model (see Note 8), and accept the default choices for all
residue protonation states, termini, disulfide bridges, etc. If you just try the command below
right away you will get an error due to the issues with the structure mentioned in Note 6. This is
common, so it is something you need to learn how to fix. Open the PDB file in an editor and scroll
down to residue 57 at the end of the chain. Remove the single nitrogen atom line just before the
line starting with “TER”; we simply skip the missing residues. Just after the “TER” line there are
also some lines for a phosphate ion (residue “PO4”). To avoid problems with finding parameters for
this, remove these five lines too. Now you should be good to go. The command to use is then
pdb2gmx -f 6PTI.pdb -water tip3p
You will be prompted for the force field (select “6” for Amber99SB-ILDN), and the command will
produce three files: conf.gro contains coordinates with hydrogens, topol.top is the topology, and
posre.itp contains a list of position restraints that will be used in Subheading 3.7. For all
these programs, you can use the -h flag for help and a detailed list of options (see Note 9).

Fig. 5 Cartoon representation of the BPTI structure 6PTI from Protein Data Bank,
with side chains shown as sticks. Including hydrogens, the protein contains
roughly 800 atoms. Ray-traced image generated with PyMOL
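The phosphate clean-up described in this section could also be scripted rather than done in an
editor. The following Python sketch is one hypothetical way to do it: `strip_residue` is a helper
name invented here, and it relies only on the fixed-column PDB layout in which the residue name
occupies columns 18–20. It does not handle the incomplete terminal residue, which still needs the
manual edit described above.

```python
def strip_residue(pdb_lines, resname):
    """Return the PDB lines without ATOM/HETATM records of the given residue name."""
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # Residue name lives in columns 18-20 (0-based slice 17:20).
        if is_atom_record and line[17:20].strip() == resname:
            continue
        kept.append(line)
    return kept
```

A possible use, assuming the edited file should get a new (hypothetical) name such as
6PTI_clean.pdb, is to read all lines of 6PTI.pdb, pass them through `strip_residue(lines, "PO4")`,
and write the result back out before running pdb2gmx.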

3.3  Creating a Simulation Box

The default box is taken from the PDB crystal cell, but a simulation in water requires something
larger. The box size is a trade-off, though: volume is proportional to the box side cubed, and more
water means the simulation is slower. The easiest option is to place the solute in the center of a
cube, with for example 0.75 nm to the box sides. We will show some more advanced alternatives
later, but for now this will suffice:
editconf -f conf.gro -d 0.75 -o box.gro
where the distance (-d) flag automatically centers the protein in the box, and the new
conformation is written to the file box.gro (see Note 10).
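The cubic scaling of the cost can be made tangible with a back-of-the-envelope estimate. This Python
sketch assumes bulk liquid water at roughly 1 g/cm^3 (about 33.4 molecules per nm^3) and ignores
the detailed shape of the solute; the function names and the example extents are hypothetical and
are not measurements of BPTI.

```python
# Approximate number density of liquid water at ~1 g/cm^3, in molecules per nm^3.
WATER_NUMBER_DENSITY = 33.4


def cubic_box_side(solute_extent_nm, margin_nm):
    """Box side when the solute keeps margin_nm to the box faces on both sides."""
    return solute_extent_nm + 2.0 * margin_nm


def approx_water_count(side_nm, solute_volume_nm3=0.0):
    """Rough number of water molecules that fit in a cubic box of the given side."""
    return int(round(WATER_NUMBER_DENSITY * (side_nm ** 3 - solute_volume_nm3)))
```

Doubling the box side from 2 nm to 4 nm raises the estimate from about 270 to about 2,100
molecules, a factor of eight, which is the trade-off the paragraph above refers to.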

3.4  Adding Solvent Water

The last step before the simulation is to add water in the box to solvate the protein. This is
done by using a small pre-equilibrated system of water coordinates that is repeated over the box,
with overlapping water molecules removed. The BPTI system will require roughly 3,400 water
molecules, which increases the number of atoms significantly. GROMACS does not use a special
pre-equilibrated system for TIP3P water since water coordinates can be used with any model; the
actual parameters are stored in the topology and force field. In GROMACS, a suitable command to
solvate the new box would be
genbox -cp box.gro -cs spc216.gro \
-p topol.top -o solvated.gro
The backslash is a line continuation: the entire command can equally be written on a single line.
Solvent coordinates (-cs) are taken from an SPC water system [23], and the -p flag adds the new
water to the topology file. The resulting system is illustrated in Fig. 6.

Fig. 6 BPTI solvated in water in a cubic box. Note that there is quite a lot of water,
in particular in the box corners

3.5  Adding Ions

In principle you could use the system as is, but the net charge on the protein is unphysical in an
infinite system, and many proteins interact with counterions. There is a GROMACS program to help
us with this, but we first need an input file. GROMACS uses a separate preprocessing program
grompp to collect parameters, topology, and coordinates into a single run input file (for example
em.tpr) from which the simulation is then started (this makes it easier to move it to another
computer). Here we are not really going to run anything, so just create an empty file called
ions.mdp and prepare an input file as:
grompp -f ions.mdp -p topol.top -c solvated.gro \
-o ions.tpr
To neutralize the system and add 100 mM NaCl to the output file ions.gro, use the command
genion -s ions.tpr -neutral -conc 0.1 \
-p topol.top -o ions.gro

3.6  Energy Minimization

The added hydrogens and broken hydrogen bond network in water would lead to quite large forces and
structure distortion if molecular dynamics were started immediately. To remove these forces it is
necessary to first run a short energy minimization. The aim is not to reach any local energy
minimum, so 500 steps of steepest descent (as mentioned in the theory section) works very well as
a stable rather than maximally efficient minimization. Nonbonded interactions and other settings
are specified in a parameter file (em.mdp); it is only necessary to specify parameters where we
deviate from the default value (this is why we could use an empty file above), for example:
------em.mdp------
integrator = steep
nsteps = 500
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
nstenergy = 10
------------------
Note 11 contains a more detailed description of these settings. Then prepare the input file and
run the energy minimization:
grompp -f em.mdp -p topol.top -c ions.gro \
-o em.tpr
<lots of output>
mdrun -v -deffnm em
The -deffnm option is a smart shortcut that uses “em” as the base filename for all options, but
with different extensions. The minimization will complete in a few seconds (see Note 12).
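For readers who want to see the idea behind steepest descent outside of GROMACS, here is a minimal
Python sketch on a toy two-dimensional energy surface. It is an illustration only: it uses a fixed
step size, whereas production minimizers typically adapt the step, and the function names and the
quadratic test surface are assumptions made for this example.

```python
def steepest_descent(x, gradient, step, nsteps):
    """Repeatedly move the coordinates a short distance against the gradient."""
    for _ in range(nsteps):
        g = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def toy_gradient(x):
    """Gradient of V(x, y) = (x - 1)^2 + (y + 2)^2, which has its minimum at (1, -2)."""
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


minimum = steepest_descent([0.0, 0.0], toy_gradient, 0.1, 500)
```

Because the force is the negative gradient, each update moves the coordinates along the force, and
for this smooth surface the result converges to the minimum at (1, -2).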

3.7  Position Restrained Equilibration

To avoid unnecessary distortion of the protein when the molecular dynamics simulation is started,
we first perform a 100 ps equilibration run where all heavy protein atoms are restrained to their
starting positions (using the file posre.itp generated earlier) while the water is relaxing around
the structure. As covered in the theory section, bonds will be constrained to enable 2 fs time
steps. Other settings are identical to energy minimization, but for molecular dynamics we also
control the temperature with the Bussi thermostat [24] (see Note 13) and, as the pcoupl lines
below show, the pressure with Parrinello-Rahman coupling. The settings used are (see Note 14):
------pr.mdp------
integrator = md
nsteps = 50000
dt = 0.002
nstenergy = 1000
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
tcoupl = v-rescale
tc-grps = protein water_and_ions
tau-t = 0.5 0.5
ref-t = 300 300
pcoupl = parrinello-rahman
pcoupltype = isotropic
tau-p = 2.0
compressibility = 4.5e-5
ref-p = 1.0
cutoff-scheme = Verlet
define = -DPOSRES
refcoord_scaling = com
constraints = all-bonds
------------------
For a small protein like BPTI, 100 ps (50,000 steps) should be more than enough for the water to
equilibrate around it, but in a large membrane system the slow lipid motions can require several
nanoseconds of relaxation. The only way to know for certain is to watch the potential energy, and
extend the equilibration until it has converged. To run this equilibration in GROMACS, execute
grompp -f pr.mdp -p topol.top -c em.gro \
-o pr.tpr
mdrun -v -deffnm pr
This simulation will finish in a few minutes on a GPU-equipped
workstation.

3.8  Production Runs

The difference between equilibration and production run is minimal: the position restraints and
pressure coupling are turned off (see Note 15), we decide how often to write output coordinates to
analyze (say, every 5,000 steps), and start a significantly longer simulation. How long depends on
what you are studying, and that should be decided before starting any simulations. For decent
sampling the simulation should be at least ten times longer than the phenomena you are studying,
which unfortunately sometimes conflicts with reality and available computer resources. We will
perform a 10 ns simulation (5 million steps), which takes about an hour on a GPU workstation. If
you are not that patient (or have a slow machine) you can choose a shorter simulation just to get
an idea of the concepts, and the analysis programs in the next section can read the simulation
output trajectory as it is being produced.
------run.mdp------
integrator = md
nsteps = 5000000
dt = 0.002
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
tcoupl = v-rescale
tc-grps = protein water_and_ions
tau-t = 0.5 0.5
ref-t = 300 300
cutoff-scheme = Verlet
nstxtcout = 5000
nstenergy = 5000
------------------
Prepare and perform the production run as (the extra option to mdrun avoids spending too much time
on writing out the current step too frequently):
grompp -f run.mdp -p topol.top -c pr.gro -o run.tpr
<output>
mdrun -v -deffnm run -stepout 10000
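The relation between simulation length, time step, and output frequency in the parameter files is
simple arithmetic, sketched here in Python. The function names are hypothetical; the calculation
works in integer femtoseconds to avoid floating-point rounding.

```python
def number_of_steps(length_ps, dt_fs):
    """Integration steps needed for a run of length_ps picoseconds."""
    return (length_ps * 1000) // dt_fs


def frame_spacing_ps(nstxtcout, dt_fs):
    """Time between saved trajectory frames, in picoseconds."""
    return nstxtcout * dt_fs / 1000
```

With the values used in this chapter, `number_of_steps(10000, 2)` reproduces the 5,000,000 steps of
the 10 ns production run, `number_of_steps(100, 2)` the 50,000 steps of the equilibration, and
`frame_spacing_ps(5000, 2)` shows that run.xtc contains one frame every 10 ps.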

3.9  Trajectory Analysis

3.9.1  Deviation from X-Ray Structure

One of the most important fundamental properties to analyze is whether the protein is stable and
close to the experimental structure. The standard way to measure this is the root-mean-square
deviation (RMSD) of all heavy atoms with respect to the X-ray structure. GROMACS has a program to
do this, as
g_rms -s em.tpr -f run.xtc
Note that the reference structure here is taken from the input before energy minimization. The
program will prompt both for a fit group, and the group to calculate RMSD for; choose “Protein-H”
(protein except hydrogens) for both. The output will be written to rmsd.xvg, and if you installed
the Grace program you will directly get a finished graph with
xmgrace rmsd.xvg
The RMSD is also illustrated in Fig. 7. It increases pretty rapidly in the first part of the
simulation, but stabilizes around 0.14 nm, roughly the resolution of the X-ray structure. The
difference is partly caused by limitations in the force field, but also because atoms in the
simulation are moving and vibrating around an equilibrium structure. A better measure can be
obtained by first creating a running average structure (see Note 16) from the simulation and
comparing the running average to the X-ray structure, which gives a more realistic RMSD around
0.12 nm (see Note 17).

[Figure 7: RMSD (nm), 0 to 0.2, plotted against time (ns), 0 to 10.]

Fig. 7 Instantaneous root-mean-square deviation (RMSD) of all heavy atoms in BPTI during the
simulation (solid), relative to the crystal structure. To a large extent atoms are vibrating around an equilibrium,
so the RMSD of a 1-ns running average structure (dashed gray) is a better measure
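The quantity being plotted can be written down in a few lines. This Python sketch is a simplified
illustration with a hypothetical function name: it assumes the two structures have already been
superimposed and weights all atoms equally, whereas g_rms performs a least-squares fit to the
reference before measuring the deviation.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(coords, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(coords))
```

If every atom is displaced by 0.1 nm relative to the reference, the function returns 0.1 nm, which
gives a feel for what the plateau value in the text means on a per-atom basis.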

3.9.2  Comparing Fluctuations with Temperature Factors

Vibrations around the equilibrium are not random, but depend on local structure flexibility. The
root-mean-square fluctuation (RMSF) of each residue is straightforward to calculate over the
trajectory, but more importantly the fluctuations can be converted to temperature factors that are
also present for each atom in a PDB file. Once again there is a program that will do the entire
job:
g_rmsf -s run.tpr -f run.xtc -o rmsf.xvg \
-oq bfac.pdb
You can use the group “C-alpha” to get one value per residue. Figure 8 displays both the residue
RMSF from the simulation (xmgrace rmsf.xvg), as well as the calculated and experimental
temperature factors. The overall agreement is reasonable for a protein this small and a short
simulation. Longer simulations of larger proteins can fit almost perfectly.
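The conversion from fluctuations to temperature factors follows the standard isotropic relation
B = (8π²/3)⟨Δr²⟩. The Python sketch below applies it to an RMSF given in nm and returns B in Å²,
the unit used in PDB files; the function name is hypothetical, and the unit handling is an
assumption about how the values are supplied.

```python
import math


def rmsf_to_bfactor(rmsf_nm):
    """Convert a root-mean-square fluctuation in nm to an isotropic B-factor in A^2."""
    rmsf_angstrom = 10.0 * rmsf_nm
    return (8.0 * math.pi ** 2 / 3.0) * rmsf_angstrom ** 2
```

An RMSF of 0.05 nm, for example, corresponds to a B-factor of roughly 6.6 Å².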

3.9.3  Secondary Structure

Another measure of stability is the protein secondary structure. This can be calculated for each
frame with a program such as DSSP [25]. If the DSSP program is installed and the environment
variable DSSP points to the binary (see Note 18), the GROMACS program do_dssp can create
time-resolved secondary structure plots. Since the program writes output in a special xpm (X
pixmap) format you probably also need the GROMACS program xpm2ps to convert it to PostScript:
do_dssp -s run.tpr -f run.xtc -dt 50
xpm2ps -f ss.xpm -o ss.eps
Use the group “protein” for the calculation. Figure 9 shows the resulting output in grayscale,
with some unused formatting removed. The DSSP secondary structure definition is pretty tight, so
it is quite normal for residues to fluctuate around the well-defined state, in particular at the
ends of helices or sheets. For a (long) protein folding simulation, a DSSP plot would show how the
secondary structures form during the simulation.

[Figure 8: two stacked panels plotted against residue number (0 to 60): RMSF (nm), 0.03 to 0.08
(top), and B-factor, 0 to 30 (bottom).]

Fig. 8 Top: Root-mean-square fluctuations of residue coordinates in the simulation. Bottom: The fluctuations
can be converted to X-ray temperature factors (solid), which agree quite well with the experimental B-factors
from the PDB file (dashed)

[Figure 9: secondary structure per residue (vertical axis, residues 10 to 50) against time (ps),
0 to 10,000; legend: Coil, B-Sheet, B-Bridge, Bend, Turn, A-Helix, 3-Helix.]

Fig. 9 Local secondary structure in BPTI as a function of time during the simulation, according to the DSSP
definition. Note how some elements periodically lose a bit of structure, but it rapidly reforms and the overall
structure is quite stable over 10 ns

3.9.4  Radius of Gyration and Hydrogen Bonds

There are two more very basic properties that are useful to analyze: the size of the protein
defined by the “radius of gyration” and the number of hydrogen bonds. To calculate the radius of
gyration, use the command:
g_gyrate -s run.tpr -f run.xtc
The result will be written to the file gyrate.xvg, which includes both the overall radius and the
radii around the three axes. Similarly, you get the hydrogen bonds with
g_hbond -s run.tpr -f run.xtc -num hbnum.xvg
Select the group “protein” twice to get all hydrogen bonds within the protein. Figure 10 shows the
results for both these analyses (see Note 19).

[Figure 10: two stacked panels plotted against time (ps), 0 to 10,000: Rgyr (nm), 1.05 to 1.2
(top), and number of H-bonds, 20 to 35 (bottom).]

Fig. 10 Top: Radius of gyration of BPTI during the 10 ns simulation. This is a good measure of how compact a
structure is. Bottom: Number of hydrogen bonds inside the protein
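The overall radius of gyration has a compact definition: the mass-weighted root-mean-square distance
of the atoms from their center of mass. The Python sketch below implements that definition for a
single frame; the function name is hypothetical, and it computes only the overall radius, not the
per-axis radii that g_gyrate also reports.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of atoms (x, y, z tuples) from their center of mass."""
    total_mass = sum(masses)
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)
```

For two equal masses placed 2 nm apart the result is 1 nm, since each sits 1 nm from the center of
mass; a compact protein keeps this value nearly constant over the trajectory, while unfolding would
make it grow.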

3.9.5  Making a Movie

A normal movie uses roughly 30 frames/second, so a 10-s movie requires 300 simulation trajectory
frames. To make a smooth movie the frames should not be more than 1–2 ps apart, or it will just
appear to shake nervously (see Note 20). In many cases it makes sense to rerun a shorter
trajectory just for the movie, but
here we just export a short trajectory from the first 2,500 ps in PDB
format (readable by PyMOL) as
trjconv -s run.tpr -f run.xtc \
-e 2500.0 -o movie.pdb
Choose the protein group for output rather than the entire system (see Note 21). If you open this
trajectory in PyMOL as “pymol movie.pdb” you can immediately play it using the VCR-style controls
on the bottom right, adjust visual settings in the menus, and even use photorealistic ray-tracing
for all images in the movie. With MacPyMOL you can directly save the movie as a QuickTime file,
and on Linux you can save it as a sequence of PNG images for assembly in another program.
Rendering a movie only takes a few minutes, and the final product bpti.mov is included with the
reference files.

4  Speeding Things up to Solve a Real Problem

Once you have gotten your feet wet you will likely want to approach more realistic problems. One
important such problem is folding of small proteins, for instance the Villin headpiece where some
mutants fold within a microsecond [30]. This mutant contains a special residue (norleucine), so
you will need to copy the entire Amber force field directory from the installation directory of
GROMACS to your current working directory, place the file norleucine.rtp in the amber
subdirectory, and also put the file residuetypes.dat in your current directory (so GROMACS
recognizes the new residue as a protein residue).
The first challenge you are likely to hit is that you need your
simulations to run faster to be able to reach relevant time scales.
Here we will briefly go through a couple of recommendations that
will help you achieve this.
You have probably seen that the problem is the number of steps
we must take. One way to improve performance is to make each
time step longer, but this is limited by the vibrations in the angles
involving hydrogens (try a longer time step in your files above and
see what happens). GROMACS has a feature to remove these vibrations
by replacing individual hydrogens with virtual sites. This
retains the rotation of, for example, CH3 groups, but removes the
fast vibrations. To enable this, use the -vsite option when you
run pdb2gmx (or skip it if you want to stick to 2 fs steps):
pdb2gmx -f protein.pdb -water tip3p \
-vsite hydrogens
This immediately enables time steps of up to 5 fs, which
improves performance by 150 % relative to 2 fs steps. Second, if you look at the
box you used for BPTI you will likely see that a more spherical box would
match the shape of the protein better. Unfortunately
spheres cannot tile space periodically, but we can ask GROMACS to use a rhombic
dodecahedron box instead, which is at least more spherical than a
cube and has only 71 % of the volume of a cube with the same distance
between periodic images. That reduces the number
of water molecules required to solvate the protein. This is difficult to
visualize in three dimensions, but Fig. 11 illustrates in two dimensions
how a hexagonal cell is more efficient than a square (very useful
for membrane simulations). The hexagonal box achieves the same
distance between periodic copies as a rectangular box at 86 % of the
volume (see Note 22). Since we really want to push performance, we
also accept a very small margin between the protein and the box side (-d option):
editconf -f conf.gro -bt dodecahedron \
-d 0.3 -o box.gro
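The volume savings quoted above follow from standard geometry, and can be reproduced with the short Python sketch below. The solute diameter is a hypothetical number chosen for illustration, and taking the image distance as the diameter plus twice the margin is a simplifying assumption, not a description of what editconf does internally.

```python
import math

# Volume of a periodic cell relative to d**3, where d is the distance
# between nearest periodic images (standard geometric results).
VOLUME_FACTOR = {
    "cubic": 1.0,
    "rhombic_dodecahedron": math.sqrt(2.0) / 2.0,        # about 0.71
    "truncated_octahedron": 4.0 * math.sqrt(3.0) / 9.0,  # about 0.77
}

# Two-dimensional analogue: hexagon area relative to a square of side d.
HEXAGON_AREA_FACTOR = math.sqrt(3.0) / 2.0               # about 0.87

def box_volume(image_distance_nm, shape):
    """Cell volume in nm^3 for a given nearest-image distance."""
    return VOLUME_FACTOR[shape] * image_distance_nm ** 3

# Hypothetical solute: 3.0 nm across, with a 0.3 nm margin on each side.
solute_diameter_nm = 3.0
margin_nm = 0.3
d = solute_diameter_nm + 2.0 * margin_nm

for shape in VOLUME_FACTOR:
    v = box_volume(d, shape)
    ratio = v / box_volume(d, "cubic")
    print(f"{shape:22s} {v:7.2f} nm^3  ({ratio:.0%} of cube)")
```

Since the number of water molecules scales roughly with the cell volume, the ratio printed for each shape is also an estimate of the relative solvent cost.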
Finally, although PME does provide state-of-the-art electrostatics,
the Villin headpiece is quite well-behaved and there are no
large mobile charges in this system. To get a bit of extra performance,
this is a case where we can decide to forego PME and use
reaction-field electrostatics instead. This change is easy: just
write “reaction-field” instead of “PME” for electrostatics in your
mdp files. We also use very short cutoffs (8 Å, i.e., 0.8 nm). The exact files used,
including an input structure from the paper [30], are included in a
separate directory.

Fig. 11 Two-dimensional example of how a hexagonal box leads to lower volume
than a square one, with the same separation distance. In three dimensions, the
shape most similar to a sphere is the rhombic dodecahedron

With these settings you should be able to run energy minimization
and equilibration of Villin exactly the same way as we did for
BPTI, but it will be even faster: on a single Core i7 desktop with
a GPU we get almost a microsecond a day. Study what happens
with the protein by using the analysis tools you learned above.
You should see that it starts to compact and form more hydrogen
bonds, but that it takes quite a while for secondary structure to
form. To see how far away you are from the native state you can
prepare a second TPR file using the native state as reference.
After a bit of fluctuation, and possibly 2–3 days of simulation,
you should be able to reach the native state of Villin.
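To make the effect of the longer time step concrete, the following Python sketch counts the integration steps needed to reach a microsecond with 2 fs and 5 fs steps. The function names are hypothetical, and the estimate assumes the cost per step is unchanged, which is a simplification.

```python
FS_PER_NS = 1.0e6  # femtoseconds per nanosecond

def steps_needed(sim_time_ns, dt_fs):
    """Number of integration steps to cover sim_time_ns with time step dt_fs."""
    return round(sim_time_ns * FS_PER_NS / dt_fs)

def improvement_percent(dt_old_fs, dt_new_fs):
    """Gain in simulated time per step when moving to a longer time step."""
    return (dt_new_fs / dt_old_fs - 1.0) * 100.0

target_ns = 1000.0  # one microsecond
for dt_fs in (2.0, 5.0):
    print(f"dt = {dt_fs} fs: {steps_needed(target_ns, dt_fs):,} steps per microsecond")

print(f"improvement: {improvement_percent(2.0, 5.0):.0f} %")
```

Going from 2 fs to 5 fs covers 2.5 times as much simulated time per step, which is the 150 % improvement quoted earlier in this section.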

5  Conclusions

This chapter should hopefully provide a basic introduction to general
simulations. An important lesson is that high-quality simulations
require a lot of care from the user; just as with experimental
techniques, the entire result can be ruined by a single sloppy step.
Further, recent techniques based on distributed computing and
Markov state models have been able to probe dynamics in the
millisecond range without extending individual simulations to
those scales [31]; this will be covered in much more detail in subsequent
chapters presenting metadynamics (Chapter 8) and
accelerated MD (Chapter 12). While simulations are advancing
rapidly due to the continuous development of faster computers,
the field has also been plagued by (published) simulations that
have not advanced our knowledge either of simulation methods or
biomolecules. Instead of just starting a simulation and hoping for
something to happen, you should decide beforehand what you
want to study, estimate the timescales necessary or see if it can be
accomplished with more advanced methods (e.g., free energy calculations),
and not start simulations until you are fairly confident
about the sampling, the analysis required, and the force field accuracy.
Used with caution, molecular dynamics is an amazingly powerful
tool, and a great complement to experiments.

6  Notes

1. Most simulations rely on systems being ergodic, that is, the
time average of the properties of a single molecule over a long
simulation should be the same as the instantaneous ensemble
average over all molecules in an experimental measurement.
This is often (but not always) true, although it assumes our
single simulation is sufficiently long, which can be very inefficient
to achieve.
2. The standard harmonic bond potentials in molecular simula-
tions will never allow atoms to separate. However, the alterna-
tive Morse potential is supported in many programs (including
GROMACS) and will allow atoms to separate. Still, this is not
used very frequently—if your problem involves breaking and
forming bonds it is likely a better solution to use a QM/MM
simulation.
3. The classical representations can be corrected in a number of
ways to make sure that they are faithful representations of the
real system. This is discussed in great detail in the first chapter
of the GROMACS manual, to which we refer the interested
reader. However, the really important thing in modeling is to
understand your system and decide in each case what approximations
are reasonable. It is easy to add more detail (e.g., by
using quantum chemistry), but that automatically means you
lose at the other end by not getting as much sampling. The
challenge is to strike the right balance for each problem!
4. In general, most computational chemistry programs behave best
with the Linux operating system, although it is possible to run
GROMACS on Windows. When starting out, you want a standard
AMD or Intel desktop. Currently (2013), you will get the
best price–performance ratio by investing in a single-socket
machine with the fastest consumer processor you can buy, for
instance an Intel Core i7 4770. You can get this for well under
$1000. GROMACS and some other codes support GPU acceleration
for NVIDIA cards, so to improve performance significantly
it is a good idea to add a high-end graphics card such as a
GTX 780 or GTX TITAN. Beware that GPU development is even
faster than that of CPUs, so consult the internet to find up-to-date
hardware. Commercial Linux distributions are not required; we
typically use the free Ubuntu (http://www.ubuntu.com). If you
are hesitant about installing Linux, get a Mac instead.
5. GROMACS is freely available from http://www.gromacs.org.
It should be quite easy to install using the step-by-step instructions,
and for most common platforms there are finished binary
packages (installation might require root access, though).
PyMOL is distributed from http://www.PyMOL.org, with
binaries for Windows, Linux, and Mac OS X. The MacPyMOL
version requires a license after a trial period, but is very much
recommended for the better movie export capabilities.
Unfortunately, the Grace package is not quite as trivial to
install. The distribution site http://plasma-gate.weizmann.ac.il/Grace/
only provides source code, so you might want to
perform an internet search for a binary for your platform.
Linux RPMs can often be found at http://www.rpmfind.net.
Grace uses the Motif X11 library, but it compiles fine with the
open source clone LessTif, http://www.lesstif.org.
6. For this tutorial pretty much any structure would have been
fine, but some PDB files contain organic molecules
that are difficult to model automatically, both in GROMACS
and other programs. The key issue is to obtain and validate a
topology for your organic molecule before proceeding with
the simulation. It is often a good idea both to have a look at
the structure in a viewer and to read the text information at the
top of the PDB file to see if there are any special issues. For
6PTI, the header mentions that the last residue was not visible
at all, and only the nitrogen atom in the second-to-last. If large
parts of the protein are inaccurate it might be better to choose
a different structure.
7. Sometimes people remove the crystal water to replace it with
their own solvent later, but this is usually a bad idea. The reason
these waters are visible is that they are tightly bound
to the structure and often form bridging hydrogen bonds, so if they are
discarded the structure might distort before new solvent has a
chance to equilibrate in these positions. Keep the crystal water!
8. Water is a very special liquid, and actually quite difficult to
model accurately. However, biomolecular simulations usually
focus on the protein/DNA/etc., and thus normally
prefer cheap and simple approximate solvent models to the
most accurate one. The most common such models are SPC
[23] (used with the GROMOS96 force field) and TIP3P [22]
(OPLS and Amber force fields), which both represent the
water as an entirely rigid molecule with three sites (oxygen and
two hydrogens). There are a couple of modified models such
as SPC/E that improve bulk properties, but the standard
models are often preferred for interface systems like membranes.
TIP4P [26] is a smart model with a fourth interaction
site offset from the oxygen, and still reasonably cheap computationally
(recommended), while TIP5P [27] with five interaction
sites is too expensive for most simulations.
9. pdb2gmx can be somewhat picky with the input structures,
but that is usually a good thing; it will for instance not accept
proteins with missing heavy atoms. If that happens, the best
option is to find a better structure, and if that is not possible
you can try to build the missing parts with a program like
Modeller (http://salilab.org/modeller/). However, if you
have to build more than a handful of residues it is doubtful whether
the resulting structure is accurate enough to simulate. For
6PTI, pdb2gmx will also issue a warning about net charge, but
that is fine. In general, all GROMACS programs try to do both
double- and triple-checking of your input, so if you do not get
any warning you can be pretty confident about the correctness
of your input.
10. All GROMACS programs that write coordinates support a
number of different output formats. The default one is .gro,
simply because it has support for velocities too, but if you want
a PDB file to view, for example in PyMOL, you simply change
the output file extension to .pdb when running a GROMACS
program.
11. We choose a standard cutoff of 1.0 nm, both for the neighbor list
generation and the Coulomb and Lennard–Jones interactions.
nstlist = 10 means the neighbor list is updated at least every 10 steps,
but for energy minimization it will usually be every step.
Energies and other statistical data are stored every 10 steps
(nstenergy), and we have chosen the more expensive
Particle-Mesh-Ewald summation for electrostatic interactions.
The treatment of nonbonded interactions frequently borders
on religion. One camp maintains that standard cutoffs are fine,
another swears by switched-off interactions, while the third
would not even consider anything but PME. One argument in
this context is that “true” interactions should conserve energy,
which is violated by sharp cutoffs since the force is no longer
the exact derivative of the potential. On the other hand, just
because an interaction conserves energy does not mean it
describes nature accurately. In practice, the difference is most
pronounced for systems that are very small or have large charges,
but the key lesson is really that it is a trade-off. PME is great,
but also clearly slower than cutoffs. Longer cutoffs are always
better than short ones (but slower), and while switched interactions
improve energy conservation they introduce artificially
large forces. Using PME is the safe option, but if that is not fast
enough it is worth investigating reaction-field electrostatics;
you should never use a plain cutoff for electrostatics. It is
also a good idea to check and follow the recommended settings
for the force field used.
12. The mdrun program will write several output files: em.edr is an "energy
file" with statistical data (energies, temperature, pressure, etc.).
em.trr is a trajectory with full coordinates/velocities of the
system during the run, and em.log a log file. Depending on
the parameters (disabled here), it might also write a compressed
trajectory with low-precision coordinates only, em.xtc.
13. The Bussi thermostat is a great advance for simulations. It is
both efficient and avoids excessive fluctuations, and maintains
a correct statistical mechanics ensemble. We strongly prefer it
over the Nosé–Hoover thermostats [28]. For pressure coupling
we use the similar Parrinello–Rahman barostat [29].
When your only goal is to get the system to a specific tempera-
ture or pressure as quickly as possible without fluctuations, you
can also consider the Berendsen weak coupling thermostat/
barostat, but these do not provide correct ensembles. For the
Bussi thermostat we can use relatively slow coupling times
(0.5 ps), and the pressure coupling should be clearly slower
than this (2–5 ps).
14. For molecular dynamics simulations the integrator is "md."
Temperature coupling has been enabled for protein and water
separately (to avoid heating the water more than the protein or
vice versa), with a 300 K reference temperature. The com-
pressibility is really a symmetric tensor, and by setting the last
three elements (off-diagonal) to 0 we disable any box shear
deformation. The last line causes grompp to include the posi-
tion restraint file posre.itp generated by pdb2gmx, which
turns on position restraints. Since we are scaling the box with
pressure coupling, we also need to adjust the center-of-mass of
the reference coordinates for the position restraints with the
refcoord_scaling option. Finally, the Verlet cutoff-scheme is a
more accurate setting that also enables us to use GPU accelera-
tors in GROMACS.
15. The easiest way to create a running average in GROMACS is
to use the g_filter program. The command “g_filter -nf
50 -all -s run.tpr -f run.xtc -ol lowpass.xtc”
will create a low-pass version of the trajectory (cosine
averaging over 50 frames), which can then be used as modified
input file to the g_rms program.
16. In this particular case we just used pressure coupling to get the
right density, while the production simulation is performed in
a so-called NVT ensemble (constant number of particles,
volume, and temperature). For some systems, in particular
membranes and membrane proteins, it is common to enable
pressure coupling during the entire simulation in a so-called
NPT ensemble.
17. If the RMSD is significantly higher than this, or continuously
increasing, there is likely something very wrong. Start over
with the PDB file, read the headers carefully and make sure the
starting structure is accurate. In the next step, check the different
energy terms and the RMSD change during both minimization
and the position-restrained run. You can also use the -posrefc
flag with pdb2gmx to increase the strength of the position
restraints, and extend the equilibration run.
18. The DSSP program can be obtained from http://swift.cmbi.ru.nl/gv/dssp/.
The latest version is now freely available for
everybody, but it also has a new output format. This new output
format is supported by GROMACS version 4.6 and later.
Download a precompiled binary if you find a suitable one, or
compile the program and install it, e.g., in /usr/local/bin. Set
the environment variable with a command like “export
DSSP=/usr/local/bin/dssp” (bash shell).
19. Modern force fields no longer use special hydrogen bond
interactions, partly because it is not necessary and partly
because it is difficult to track formation/breaking of hydrogen
bonds separately. “Hydrogen bonds” are therefore
defined from geometric criteria, typically that the distance
between the donor and acceptor atoms should be smaller
than 0.35 nm, and the hydrogen–donor–acceptor angle
should be below 30 degrees.
20. To visualize slower phenomena such as protein folding, you
can use g_filter to smooth out motions in longer trajecto-
ries. In some cases this can lead to strange artifacts, e.g., when
averaging torsion rotation around a bond, but it is usually bet-
ter than just taking raw trajectory frames with too large
spacing.
21. PyMOL loads all frames of the trajectory into memory, so if
the water molecules are included it will likely run out of mem-
ory when creating graphical representations for over 20,000
atoms repeated in 250 frames. Trajectories restricted to the
protein part can thus be much longer.
22. The volume of a rhombic dodecahedron is roughly 71 % of a
cube with the same spacing, for a truncated octahedron it is
77 %, and a hexagonal box is 86 % of a rectangular one.
These differences can appear small, but 30 % is quite significant
when simulations use weeks of supercomputer time,
and it is a free lunch after all! However, not all programs
support all box shapes.
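
The geometric hydrogen-bond definition in Note 19 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the g_hbond implementation: the function name and coordinates are hypothetical, positions are in nm, and the angle is measured at the donor, between the donor–hydrogen and donor–acceptor vectors.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_dist_nm=0.35, max_angle_deg=30.0):
    """Geometric test: short donor-acceptor distance and a small
    hydrogen-donor-acceptor angle (zero means a perfectly linear bond)."""
    da = _sub(acceptor, donor)
    dh = _sub(hydrogen, donor)
    dist = _norm(da)
    if dist > max_dist_nm:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist * _norm(dh))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg

# Made-up coordinates (nm): donor at the origin, hydrogen 0.1 nm along x.
donor, hydrogen = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (0.29, 0.02, 0.0)))  # close, nearly linear
print(is_hydrogen_bond(donor, hydrogen, (0.40, 0.00, 0.0)))  # too far away
print(is_hydrogen_bond(donor, hydrogen, (0.20, 0.20, 0.0)))  # angle too large
```

Counting the donor, hydrogen, acceptor triplets that pass this test in each frame gives the kind of hydrogen-bond time series shown in Fig. 10.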
References
1. Alder BJ, Wainwright TE (1957) Phase transition for a hard sphere system. J Chem Phys 27:1208–1209
2. Rahman A, Stillinger FH (1971) Molecular dynamics study of liquid water. J Chem Phys 55:3336–3359
3. McCammon JA, Gelin BR, Karplus M (1977) Dynamics of folded proteins. Nature 267:585–590
4. Allen MP, Tildesley DJ (1989) Computer simulation of liquids. Clarendon, New York, NY
5. Frenkel D, Smit B (2001) Understanding molecular simulation. Academic, New York, NY
6. Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL (2001) Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B 105:6474–6487
7. MacKerell AD Jr et al (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102:3586–3616
8. Oostenbrink C, Villa A, Mark AE, van Gunsteren WF (2004) A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. J Comput Chem 25:1656–1676
9. Wang J, Cieplak P, Kollman PA (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21:1049–1074
10. Chandler D (1987) Introduction to modern statistical mechanics. Oxford University Press, New York, NY
11. Essmann U, Perera L, Berkowitz M, Darden T, Lee H, Pedersen LG (1995) A smooth particle mesh Ewald method. J Chem Phys 103:8577–8593
12. Ryckaert JP, Ciccotti G, Berendsen HJC (1977) Numerical integration of the cartesian equations of motion of a system with constraints; molecular dynamics of n-alkanes. J Comp Phys 23:327–341
13. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM (1998) LINCS: a linear constraint solver for molecular simulation. J Comput Chem 18:1463–1472
14. Lindahl E, Hess B, van der Spoel D (2001) GROMACS 3.0: a package for molecular simulation and trajectory analysis. J Mol Model 7:306–317
15. Case DA et al (2005) The Amber biomolecular simulation programs. J Comput Chem 26:1668–1688
16. Brooks BR et al (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217
17. Phillips JC et al (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26:1781–1802
18. DeLano WL (2002) The PyMOL molecular graphics system. DeLano Scientific, San Carlos, CA, http://www.PyMOL.org
19. Ascenzi P et al (2003) The bovine basic pancreatic trypsin inhibitor (Kunitz inhibitor): a milestone protein. Curr Protein Pept Sci 4:231–251
20. Wlodawer A et al (1987) Structure of form III crystals of bovine pancreatic trypsin inhibitor. J Mol Biol 198:469–480
21. Frauenfelder H, Petsko GA, Tsernoglou D (1979) Temperature-dependent X-ray diffraction as a probe of protein structural dynamics. Nature 280:558–563
22. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935
23. Berendsen HJC, Postma JPM, van Gunsteren WF (1981) Interaction models for water in relation to protein hydration. In: Pullman B (ed) Intermolecular forces. D. Reidel Publishing Company, Dordrecht, The Netherlands, pp 331–342
24. Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity-rescaling. J Chem Phys 126:014101
25. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637
26. Jorgensen WL, Madura JD (1985) Temperature and size dependence for Monte Carlo simulations of TIP4P water. Mol Phys 56:1381–1392
27. Mahoney MW, Jorgensen WL (2000) A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. J Chem Phys 112:8910–8922
28. Nosé S (1984) A molecular dynamics method for simulations in the canonical ensemble. Mol Phys 52:255–268
29. Parrinello M, Rahman A (1981) Polymorphic transitions in single crystals: a new molecular dynamics method. J Appl Phys 52:7182–7190
30. Ensign DL, Kasson P, Pande V (2007) Heterogeneity even at the speed limit of folding: large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J Mol Biol 374:806–816
31. Voelz VA, Bowman GR, Beauchamp K, Pande VS (2010) Molecular simulation of ab initio protein folding for a millisecond folder NTL9 (1-39). J Am Chem Soc 132:1526–1528
