Molecular Dynamics Simulations: Erik Lindahl
Abstract
Molecular dynamics has evolved from a niche method mainly applicable to model systems into a
cornerstone in molecular biology. It provides us with a powerful toolbox that enables us to follow and
understand structure and dynamics with extreme detail—literally on scales where individual atoms can
be tracked. However, with great power comes great responsibility: simulations will not magically
provide valid results; that requires a skilled researcher. This chapter introduces you to the method and
makes you aware of some potential pitfalls. We focus on the two basic and most used methods: optimizing
a structure with energy minimization and simulating motion with molecular dynamics. The
statistical mechanics theory is covered briefly as well as limitations, for instance the lack of quantum
effects and short timescales. As a practical example, we show each step of a simulation of a small pro-
tein, including examples of hardware and software, how to obtain a starting structure, immersing it in
water, and choosing good simulation parameters. You will learn how to analyze simulations in terms
of structure, fluctuations, geometrical features, and how to create ray-traced movies for presentations.
With modern GPU acceleration, a desktop can perform μs-scale simulations of small proteins in a
day—only 15 years ago this took months on the largest supercomputer in the world. As a final exer-
cise, we show you how to set up, perform, and interpret such a folding simulation.
Key words Molecular dynamics, Simulation, Force field, Protein, Solvent, Energy minimization,
Position restraints, Equilibration, Trajectory analysis, Secondary structure
1 Introduction
Andreas Kukol (ed.), Molecular Modeling of Proteins, Methods in Molecular Biology, vol. 1215,
DOI 10.1007/978-1-4939-1465-4_1, © Springer Science+Business Media New York 2015
Fig. 1 Range of time scales for dynamics in biomolecular systems. While the individual time steps of molecular
dynamics are 1–2 fs, parallel computers make it possible to simulate on microsecond scale, and distributed
computing techniques can sample even slower processes, almost reaching milliseconds
2 Theory
[Flowchart residue: loop on "More steps?" (Yes: repeat, No: Done!)]
[Fig. 4 plot: two panels, relative potential versus r (nm), 0 to 1.1 nm]
Fig. 4 Alternatives to a sharp cutoff for nonbonded coulomb interactions. Top: By switching off the interaction
(dashed ) before the cutoff the force will be the exact derivative of potential, but the derivative (and thus force)
will unnaturally increase just before the cutoff. Bottom: Particle-Mesh-Ewald is an amazing algorithm where
the coulomb interaction (solid ) is divided into a short-range term that is evaluated within a cutoff (dashed ) and
a long-range term which can be solved exactly in reciprocal space with Fourier transforms (dot-dash)
For PME, the cutoff is not really a cutoff; it only determines the
balance between the two parts, and the long-range part is treated
by assigning charges to a grid that is solved in reciprocal space
through Fourier transforms.
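The split itself is easy to check numerically. The sketch below is only an illustration and not GROMACS code: it assumes the standard Ewald decomposition 1/r = erfc(βr)/r + erf(βr)/r, and the splitting parameter beta = 3.12 nm^-1 is an illustrative value (not taken from this chapter) for which the short-range term has decayed to roughly 1e-5 at a 1.0 nm cutoff.

```python
import math

def ewald_split(r, beta):
    """Split the Coulomb kernel 1/r into a short- and a long-range part.

    r is a distance in nm, beta the splitting parameter in 1/nm.
    """
    short_range = math.erfc(beta * r) / r  # decays quickly, evaluated within the cutoff
    long_range = math.erf(beta * r) / r    # smooth, suited to reciprocal space
    return short_range, long_range

beta = 3.12  # assumed illustrative value, not from the chapter
for r in (0.2, 0.5, 1.0):
    sr, lr = ewald_split(r, beta)
    print(f"r={r:.1f} nm  short={sr:.3e}  long={lr:.3e}  "
          f"sum={sr + lr:.6f}  1/r={1.0 / r:.6f}")
```

A larger beta pushes more of the interaction into the long-range term, which is why the cutoff only sets the balance between the two parts.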
Cutoffs and rounding errors can lead to drifts in energy, which
will cause the system to heat up during the simulation. Even with
a theoretically perfect simulation we would run into problems
since we typically start from an imperfect structure. As the poten-
tial energy of this structure decreases during the simulation, the
kinetic energy (i.e., temperature) would increase if the total system
energy was constant. To control this, the system is normally cou-
pled to a thermostat that scales velocities during the integration to
maintain room temperature. Similarly, the total pressure in the sys-
tem can be adjusted through scaling the simulation box size, either
isotropically or separately in x/y/z dimensions.
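As a rough illustration of velocity scaling, the sketch below uses Berendsen-style weak coupling, which is one simple scheme among several and not necessarily the thermostat used later in this chapter. The masses, velocities, time step, and coupling time are made-up values; units follow the GROMACS convention (amu, nm/ps, kJ/mol).

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def temperature(masses, velocities):
    """Instantaneous temperature (K) from masses (amu) and velocities (nm/ps)."""
    kinetic = 0.5 * sum(m * (vx * vx + vy * vy + vz * vz)
                        for m, (vx, vy, vz) in zip(masses, velocities))
    n_dof = 3 * len(masses)  # simplification: no constraints, no COM motion removal
    return 2.0 * kinetic / (n_dof * KB)

def berendsen_scale(t_now, t_ref, dt, tau):
    """Velocity scaling factor for weak coupling with time constant tau (ps)."""
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_now - 1.0))

# Hypothetical three-particle system, for illustration only
masses = [16.0, 1.0, 1.0]
velocities = [(0.5, 0.0, 0.1), (1.5, -2.0, 0.3), (-1.0, 2.5, 0.4)]

t_now = temperature(masses, velocities)
lam = berendsen_scale(t_now, t_ref=300.0, dt=0.002, tau=0.1)
velocities = [(lam * vx, lam * vy, lam * vz) for vx, vy, vz in velocities]
print(f"T before: {t_now:.1f} K, scale factor: {lam:.5f}, "
      f"T after: {temperature(masses, velocities):.1f} K")
```

With dt much smaller than tau, each step nudges the temperature only slightly towards the reference value, so the correction is spread over many steps.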
The single most demanding part of simulations is the compu-
tation of nonbonded interactions, since millions of pairs have to be
evaluated for each time step. Extending the time step is thus an
important way to improve simulation performance, but unfortu-
nately errors are introduced in bond vibrations already at 1 fs.
However, in most simulations these bond vibrations are not of
interest per se, and can be removed entirely by introducing bond
constraint algorithms such as SHAKE [12] or LINCS [13].
Constraints make it possible to extend time steps to 2 fs, and fixed-
length bonds are likely better approximations of the quantum
mechanical oscillators than harmonic springs (see Note 3)—and in
the final section we will show you how to go even further.
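The effect of the time step on cost is simple arithmetic, sketched below for a 10 ns run (the length of the example trajectory analyzed later in this chapter). The function name is only illustrative.

```python
def steps_needed(total_time_fs, time_step_fs):
    """Number of integration steps needed to cover total_time_fs."""
    return total_time_fs // time_step_fs

TEN_NS_IN_FS = 10 * 1_000_000  # 10 ns expressed in fs

for dt in (1, 2):
    print(f"{dt} fs time step: {steps_needed(TEN_NS_IN_FS, dt):,} steps for 10 ns")
```

Since the nonbonded work per step is unchanged, halving the number of steps roughly halves the run time.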
3 Methods
With the basic theory covered, this section will describe how to (1)
choose and obtain a starting structure, (2) prepare it for a simulation,
(3) create a simulation box, (4) add solvent water, (5) perform
energy minimization, (6) equilibrate the structure with simulation,
(7) perform the production simulation, and (8) analyze the trajec-
tory data. To reproduce it, you will need access to a Unix/Linux
machine (see Note 4) with a molecular dynamics package installed.
While the options and files below refer to the GROMACS pro-
gram [14], the description should be reasonably straightforward to
follow with other programs like AMBER [15], CHARMM [16],
or NAMD [17]. It will also be useful to have the molecular viewer
PyMOL [18] and Unix graph program Grace installed (see Note 5).
3.1 Obtaining a Starting Structure
The Bovine Pancreatic Trypsin Inhibitor (BPTI) is a small 58-residue
water-soluble protein that inhibits several serine proteases [19].
It was one of the first proteins to be simulated [3], and has often
been referred to as a “hydrogen atom” of protein simulation.
There are several high-resolution X-ray structures of BPTI [20] in
the Protein Data Bank (http://www.pdb.org), and also NMR
structures. It was actually the early simulations of BPTI [3] that
led experimentalists to realize that X-ray temperature factors can
be used to study the local dynamics of a protein [21]. Choose the
entry 6PTI with 1.7 Å resolution [20], and download it as 6PTI.pdb
(see Note 6). Figure 5 shows a cartoon representation of this
structure; the small crosses are crystal water oxygen atoms visible
in the X-ray experiment (see Note 7).
Fig. 5 Cartoon representation of the BPTI structure 6PTI from Protein Data Bank,
with side chains shown as sticks. Including hydrogens, the protein contains
roughly 800 atoms. Ray-traced image generated with PyMOL
3.3 Creating a Simulation Box
The default box is taken from the PDB crystal cell, but a simulation
in water requires something larger. The box size is a trade-off,
though: volume is proportional to the box side cubed, and more
water means the simulation is slower. The easiest option is to place
the solute in the center of a cube, with for example a 0.75 nm margin
to the box sides. We will show some more advanced alternatives later,
but for now this will suffice:
editconf -f conf.gro -d 0.75 -o box.gro
where the distance (-d) flag automatically centers the protein in
the box, and the new conformation is written to the file box.gro
(see Note 10).
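To get a feel for the trade-off, the arithmetic can be sketched as follows. The 3.0 nm solute extent is a hypothetical placeholder rather than a measured BPTI dimension, and the water estimate uses the bulk number density of about 33.4 molecules per nm^3 without subtracting the volume occupied by the protein.

```python
WATER_PER_NM3 = 33.4  # approximate number density of bulk water

def cubic_box(solute_extent_nm, margin_nm):
    """Side (nm), volume (nm^3), and rough water count for a cubic box."""
    side = solute_extent_nm + 2.0 * margin_nm  # margin on both sides of the solute
    volume = side ** 3
    return side, volume, volume * WATER_PER_NM3

for margin in (0.5, 0.75, 1.0):
    side, volume, waters = cubic_box(3.0, margin)  # 3.0 nm is an assumed extent
    print(f"margin {margin:.2f} nm: side {side:.2f} nm, "
          f"volume {volume:.1f} nm^3, roughly {waters:.0f} waters")
```

Because of the cubic scaling, even a quarter of a nanometer of extra margin adds a noticeable number of water molecules.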
3.4 Adding Solvent Water
The last step before the simulation is to add water in the box to
solvate the protein. This is done by using a small pre-equilibrated
system of water coordinates that is repeated over the box, and
Fig. 6 BPTI solvated in water in a cubic box. Note that there is quite a lot of water,
in particular in the box corners
3.5 Adding Ions
In principle you could use the system as is, but the net charge on
the protein is unphysical in an infinite system, and many proteins
interact with counterions. There is a GROMACS program to help
us with this, but we first need an input file. GROMACS uses a
separate preprocessing program grompp to collect parameters,
topology, and coordinates into a single run input file (em.tpr)
from which the simulation is then started (this makes it easier to
move it to another computer). Here we are not really going to
run anything, so just create an empty file called ions.mdp and
prepare an input file as:
3.6 Energy Minimization
The added hydrogens and broken hydrogen bond network in water
would lead to quite large forces and structure distortion if molecular
dynamics was started immediately. To remove these forces it is nec-
essary to first run a short energy minimization. The aim is not to
reach any local energy minimum, so 500 steps of steepest descent (as
mentioned in the theory section) works very well as a stable rather
than maximally efficient minimization. Nonbonded interactions and
other settings are specified in a parameter file (em.mdp); it is only
necessary to specify parameters where we deviate from the default
value (this is why we could use an empty file above), for example:
------em.mdp------
integrator = steep
nsteps = 500
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
nstenergy = 10
------------------
Note 11 contains a more detailed description of these settings.
Then prepare the input file and run the energy minimization:
grompp -f em.mdp -p topol.top -c ions.gro \
-o em.tpr
<lots of output>
mdrun -v -deffnm em
The -deffnm option is a smart shortcut that uses “em” as the base
filename for all options, but with different extensions. The
minimization will complete in a few seconds (see Note 12).
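The behavior of steepest descent is easy to see on a toy problem. The sketch below minimizes the separation of a single Lennard-Jones pair in reduced units. The adaptive step rule (grow the step by 20 % after an accepted move, halve it after a rejected one) is a common choice assumed here for illustration; it is not a transcription of what mdrun does internally.

```python
def lj_energy(r):
    """Lennard-Jones energy in reduced units (epsilon = sigma = 1)."""
    return 4.0 * (r ** -12 - r ** -6)

def lj_force(r):
    """Force along r, i.e. the negative derivative of lj_energy."""
    return 24.0 * (2.0 * r ** -13 - r ** -7)

def steepest_descent(r, nsteps=500, step=0.01):
    """Move along the force direction, accepting only energy-lowering steps."""
    energy = lj_energy(r)
    for _ in range(nsteps):
        force = lj_force(r)
        if force == 0.0:
            break
        trial = r + step * (1.0 if force > 0.0 else -1.0)
        trial_energy = lj_energy(trial)
        if trial_energy < energy:
            r, energy = trial, trial_energy
            step *= 1.2   # accepted: try a slightly longer step next time
        else:
            step *= 0.5   # rejected: the step overshot, so shorten it
    return r, energy

r_min, e_min = steepest_descent(0.9)  # start from a compressed, high-energy pair
print(f"r = {r_min:.4f} (analytic minimum {2 ** (1 / 6):.4f}), E = {e_min:.6f}")
```

The method never takes a step that raises the energy, which is what makes it robust for relieving bad contacts even though it converges slowly close to the minimum.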
3.7 Position Restrained Equilibration
To avoid unnecessary distortion of the protein when the molecular
dynamics simulation is started, we first perform a 100 ps equilibration
run where all heavy protein atoms are restrained to their starting
positions (using the file posre.itp generated earlier) while
the water is relaxing around the structure. As covered in the theory
section, bonds will be constrained to enable 2 fs time steps. Other
settings are identical to energy minimization, but for molecular
3.8 Production Runs
The difference between equilibration and production run is minimal:
the position restraints and pressure coupling are turned off
(see Note 15), we decide how often to write output coordinates to
analyze (say, every 5,000 steps), and start a significantly longer
simulation. How long depends on what you are studying, and that
should be decided before starting any simulations. For decent sam-
pling the simulation should be at least ten times longer than the
phenomena you are studying, which unfortunately sometimes
[Fig. 7 plot: RMSD (nm), 0 to 0.2, versus Time (ns), 0 to 10]
Fig. 7 Instantaneous Root-mean-square displacement (RMSD) of all heavy atoms in BPTI during the
simulation (solid), relative to the crystal structure. To a large extent atoms are vibrating around an equilibrium,
so the RMSD of a 1-ns running average structure (dashed gray) is a better measure
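For reference, the quantity plotted in Fig. 7 can be sketched in a few lines. This is a minimal version under a stated simplification: it skips the least-squares superposition onto the reference that analysis tools normally apply first, so it is only meaningful for frames that are already aligned. The coordinates are made-up values in nm.

```python
import math

def rmsd(coords, reference):
    """Root-mean-square displacement between two equally long coordinate lists."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    total = sum((x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
                for (x, y, z), (xr, yr, zr) in zip(coords, reference))
    return math.sqrt(total / len(coords))

# Hypothetical two-atom example, coordinates in nm
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.1), (1.0, 0.1, 0.0)]
print(f"RMSD = {rmsd(frame, reference):.3f} nm")
```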
3.9.2 Comparing Fluctuations with Temperature Factors
Vibrations around the equilibrium are not random, but depend on
local structure flexibility. The root-mean-square fluctuation
(RMSF) of each residue is straightforward to calculate over the
trajectory, but more importantly these values can be converted to
temperature factors that are also present for each atom in a PDB file.
Once again there is a program that will do the entire job:
g_rmsf -s run.tpr -f run.xtc -o rmsf.xvg \
-oq bfac.pdb
You can use the group “C-alpha” to get one value per residue.
Figure 8 displays both the residue RMSF from the simulation
(xmgrace rmsf.xvg), as well as the calculated and experimental
temperature factors. The overall agreement is reasonable for a pro-
tein this small and a short simulation. Longer simulations of larger
proteins can fit almost perfectly.
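The conversion between the two panels of Fig. 8 can be sketched with the usual isotropic relation B = (8π²/3)·RMSF². This is the textbook formula and is assumed here; the chapter does not spell out what g_rmsf does internally. GROMACS reports RMSF in nm while PDB temperature factors are in Å², hence the factor of 10.

```python
import math

def rmsf_to_bfactor(rmsf_nm):
    """Convert an RMSF in nm to an isotropic temperature factor in Angstrom^2."""
    rmsf_angstrom = 10.0 * rmsf_nm
    return (8.0 * math.pi ** 2 / 3.0) * rmsf_angstrom ** 2

for rmsf in (0.03, 0.05, 0.08):
    print(f"RMSF {rmsf:.2f} nm -> B-factor {rmsf_to_bfactor(rmsf):.1f} A^2")
```

Because the relation is quadratic, a modest increase in fluctuation produces a much larger change in the temperature factor.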
3.9.3 Secondary Structure
Another measure of stability is the protein secondary structure. This
can be calculated for each frame with a program such as DSSP [25]. If
the DSSP program is installed and the environment variable DSSP
points to the binary (see Note 18), the GROMACS program do_dssp
can create time-resolved secondary structure plots. Since the program
writes output in a special xpm (X pixmap) format you probably also
need the GROMACS program xpm2ps to convert it to postscript:
do_dssp -s run.tpr -f run.xtc -dt 50
xpm2ps -f ss.xpm -o ss.eps
Use the group “protein” for the calculation. Figure 9 shows
the resulting output in grayscale, with some unused formatting
[Fig. 8 plot: top panel RMSF (nm), bottom panel B-factor, both versus Residue, 0 to 60]
Fig. 8 Top: Root-mean-square fluctuations of residue coordinates in the simulation. Bottom: The fluctuations
can be converted to X-ray temperature factors (solid), which agree quite well with the experimental B-factors
from the PDB file (dashed)
[Fig. 9 plot: secondary structure per Residue (vertical axis)]
Fig. 9 Local secondary structure in BPTI as a function of time during the simulation, according to the DSSP
definition. Note how some elements periodically lose a bit of structure, but it rapidly reforms and the overall
structure is quite stable over 10 ns
3.9.4 Radius of Gyration and Hydrogen Bonds
There are two more very basic properties that are useful to analyze:
the size of the protein defined by the “radius of gyration” and the
number of hydrogen bonds. To calculate the radius of gyration,
use the command:
g_gyrate -s run.tpr -f run.xtc
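What this computes for each frame is the mass-weighted root-mean-square distance of the atoms from their center of mass. A minimal sketch, with made-up masses and coordinates and with function and variable names that are illustrative only:

```python
import math

def radius_of_gyration(masses, coords):
    """Mass-weighted radius of gyration, in the same length unit as coords."""
    total_mass = sum(masses)
    # Center of mass, one component at a time
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
           for i in range(3)]
    weighted = sum(m * sum((c[i] - com[i]) ** 2 for i in range(3))
                   for m, c in zip(masses, coords))
    return math.sqrt(weighted / total_mass)

# Hypothetical example: two equal masses 2 nm apart give Rg = 1 nm
masses = [12.0, 12.0]
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(f"Rg = {radius_of_gyration(masses, coords):.3f} nm")
```

A protein that unfolds or expands shows up as a steady increase of this value over the trajectory.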
[Fig. 10 plot: top panel with axis ticks from 1.05 to 1.2, bottom panel # H-bonds (20 to 35), both versus Time (ps), 0 to 10000]
Fig. 10 Top: Radius of gyration of BPTI during 10 ns simulation. This is a good measure of how compact a
structure is. Bottom: Number of hydrogen bonds inside the protein
3.9.5 Making a Movie
A normal movie uses roughly 30 frames/second, so a 10-s movie
requires 300 simulation trajectory frames. To make a smooth
movie the frames should not be more than 1–2 ps apart, or it will
just appear to shake nervously (see Note 20). In many cases it
makes sense to rerun a shorter trajectory just for the movie, but
here we just export a short trajectory from the first 2.5 ns (the -e
flag takes the end time in ps) in PDB format (readable by PyMOL) as
trjconv -s run.tpr -f run.xtc \
-e 2500.0 -o movie.pdb
Choose the protein group for output rather than the entire system
(see Note 21). If you open this trajectory in PyMOL as “pymol
movie.pdb” you can immediately play it using the VCR-style controls
on the bottom right, adjust visual settings in the menus, and
even use photorealistic ray-tracing for all images in the movie. With
MacPyMOL you can directly save the movie as a QuickTime file, and
on Linux you can save it as a sequence of PNG images for assembly
in another program. Rendering a movie only takes a few minutes,
and the final product bpti.mov is included with the reference files.
Once you have gotten your feet wet you will likely want to approach
more realistic problems. One important such problem is folding of
small proteins, for instance the Villin headpiece where some
mutants fold within a microsecond [30]. This mutant contains a
special residue (norleucine), so you will need to copy the entire
Amber force field directory from the installation directory of
GROMACS to your current working directory, place the file
norleucine.rtp in the amber subdirectory, and also put the file
residuetypes.dat in your current directory (so GROMACS
recognizes the new residue as a protein residue).
The first challenge you are likely to hit is that you need your
simulations to run faster to be able to reach relevant time scales.
Here we will briefly go through a couple of recommendations that
will help you achieve this.
You have probably seen that the problem is the number of steps
we must take. One way to improve performance is to make each
time step longer, but this is limited by the vibrations in the angles
involving hydrogens (try a longer timestep in your files above and
see what happens). GROMACS has a feature to remove these vibrations
by replacing individual hydrogens with virtual sites. This
retains the rotation of, for example, CH3 groups, but removes the
fast vibrations. To enable this, use the -vsite option when you
run pdb2gmx (or skip it if you want to stick to 2 fs steps):
pdb2gmx -f protein.pdb -water tip3p \
-vsite hydrogen
This will instantly enable us to take time steps up to 5 fs, which
will improve your performance by 150 %. Second, if you look at the
box you used for BPTI you will likely see that a more spherical box
would better match the shape of the protein. Unfortunately
spheres are not periodic, but we can ask GROMACS to use a rhombic
dodecahedron box instead, which is at least more spherical than a
cube and has only 71 % of the cube volume. That reduces the number
of water molecules required to solvate the protein. This is difficult to
visualize in three dimensions, but Fig. 11 illustrates in two dimen-
sions how a hexagonal cell is more efficient than a square (very useful
for membrane simulations). The hexagonal box achieves the same
distance between periodic copies as a rectangular box at 86 % of the
volume (see Note 22). Since we really want to push performance, we
also accept a very small margin to the box side (-d option):
editconf -f conf.gro -bt dodecahedron \
-d 0.3 -o box.gro
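The 71 % and 86 % figures quoted above follow from simple geometry, sketched below. Here d is the distance between periodic images; the formulas are the standard ones for a cube, a rhombic dodecahedron, and a regular hexagon with the same image distance, and the 4.0 nm value is an arbitrary example.

```python
import math

def cube_volume(d):
    """Volume of a cubic box with periodic image distance d."""
    return d ** 3

def rhombic_dodecahedron_volume(d):
    """Volume of a rhombic dodecahedron with the same image distance d."""
    return math.sqrt(2.0) / 2.0 * d ** 3

def square_area(d):
    """Area of a square cell with image distance d (two-dimensional analogue)."""
    return d ** 2

def hexagon_area(d):
    """Area of a regular hexagonal cell with the same image distance d."""
    return math.sqrt(3.0) / 2.0 * d ** 2

d = 4.0  # nm, arbitrary illustrative image distance
print(f"dodecahedron / cube volume: "
      f"{rhombic_dodecahedron_volume(d) / cube_volume(d):.3f}")
print(f"hexagon / square area:      {hexagon_area(d) / square_area(d):.3f}")
```

Both ratios are independent of d, so the saving in water molecules applies to systems of any size.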
Finally, although PME does provide state-of-the-art electro-
statics, the Villin headpiece is quite well-behaved and there are no
large mobile charges in this system. To get a bit of extra perfor-
mance, this is a case where we can decide to forego PME and use
5 Conclusions
6 Notes
References
1. Alder BJ, Wainwright TE (1957) Phase transition for a hard sphere system. J Chem Phys 27:1208–1209
2. Rahman A, Stillinger FH (1971) Molecular dynamics study of liquid water. J Chem Phys 55:3336–3359
3. McCammon JA, Gelin BR, Karplus M (1977) Dynamics of folded proteins. Nature 267:585–590
4. Allen MP, Tildesley DJ (1989) Computer simulation of liquids. Clarendon, New York, NY
5. Frenkel D, Smit B (2001) Understanding molecular simulation. Academic, New York, NY
6. Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL (2001) Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B 105:6474–6487
7. MacKerell AD Jr et al (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B 102:3586–3616
8. Oostenbrink C, Villa A, Mark AE, van Gunsteren WF (2004) A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. J Comput Chem 25:1656–1676
9. Wang J, Cieplak P, Kollman PA (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21:1049–1074
10. Chandler D (1987) Introduction to modern statistical mechanics. Oxford University Press, New York, NY
11. Essmann U, Perera L, Berkowitz M, Darden T, Lee H, Pedersen LG (1995) A smooth particle mesh Ewald method. J Chem Phys 103:8577–8593
12. Ryckaert JP, Ciccotti G, Berendsen HJC (1977) Numerical integration of the cartesian equations of motion of a system with constraints; molecular dynamics of n-alkanes. J Comp Phys 23:327–341
13. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM (1997) LINCS: a linear constraint solver for molecular simulation. J Comput Chem 18:1463–1472
14. Lindahl E, Hess B, van der Spoel D (2001) GROMACS 3.0: A package for molecular simulation and trajectory analysis. J Mol Model 7:306–317
15. Case DA et al (2005) The Amber biomolecular simulation programs. J Comput Chem 26:1668–1688
16. Brooks BR et al (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217
17. Phillips JC et al (2005) Scalable molecular dynamics with NAMD. J Comput Chem 26:1781–1802
18. DeLano WL (2002) The PyMOL molecular graphics system. DeLano Scientific, San Carlos, CA, http://www.PyMOL.org
19. Ascenzi P et al (2003) The bovine basic pancreatic trypsin inhibitor (kunitz inhibitor): a milestone protein. Curr Protein Pept Sci 4:231–251
20. Wlodawer A et al (1987) Structure of form III crystals of bovine pancreatic trypsin inhibitor. J Mol Biol 198:469–480
21. Frauenfelder H, Petsko GA, Tsernoglou D (1979) Temperature-dependent X-ray diffraction as a probe of protein structural dynamics. Nature 280:558–563
22. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison of simple potential functions for simulating liquid water. J Chem Phys 79:926–935
23. Berendsen HJC, Postma JPM, van Gunsteren WF (1981) Interaction models for water in relation to protein hydration. In: Pullman B (ed) Intermolecular forces. D. Reidel Publishing Company, Dordrecht, The Netherlands, pp 331–342
24. Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity-rescaling. J Chem Phys 126:014101
25. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577–2637
26. Jorgensen WL, Madura JD (1985) Temperature and size dependence for monte carlo simulations of TIP4P water. Mol Phys 56:1381–1392
27. Mahoney MW, Jorgensen WL (2000) A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. J Chem Phys 112:8910–8922
28. Nosé S (1984) A molecular dynamics method for simulations in the canonical ensemble. Mol Phys 52:255–268
29. Parrinello M, Rahman A (1981) Polymorphic transitions in single crystals: a new molecular dynamics method. J Appl Phys 52:7182–7190
30. Ensign DL, Kasson P, Pande V (2007) Heterogeneity even at the speed limit of folding: large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J Mol Biol 374:806–816
31. Voelz VA, Bowman GR, Beauchamp K, Pande VS (2010) Molecular simulation of ab initio protein folding for a millisecond folder NTL9 (1-39). J Am Chem Soc 132:1526–1528