
Simulation of Biomolecules

G.Botti

II semester 2018/19
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO
Box 1866, Mountain View, CA 94042, USA.

The images are exempt from the CC license, since they were taken from the teaching material provided by
Professor Pieraccini.

Thanks to Federica Marelli for providing a good part of the material from which this text
was drawn.

Contents

1 Molecular Mechanics 4
1.1 Applying classical mechanics to a molecular system . . . . . . . . . . . . . . . . . 4
1.2 Force Fields and atom types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Force Field of a noble gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Diatomic molecule: stretching term . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Triatomic molecule: bending term . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Polyatomic molecule: torsion term . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Polyatomic molecule: intermolecular non-bonding term . . . . . . . . . . . . . . . 12
1.8 Lennard-Jones potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.9 Electrostatic potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.10 Extra (optional) terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.11 The parameters problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.12 Some relevant questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.13 FF application in coordination chemistry . . . . . . . . . . . . . . . . . . . . . . 18
1.14 FF computational weight and classification . . . . . . . . . . . . . . . . . . . . . 18

2 Halogen Bond 20
2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 σ-hole nature and parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Halogen bond in bioactive molecules . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Distorted halogen bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 XB applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.6 XB modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.7 Extra point charge (EPC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 Docking procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9 Scalable anisotropic model (SAM) . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Introduction to sampling methods 24


3.1 A not so brief skirmish with statistical mechanics . . . . . . . . . . . . . . . . . . 24
3.2 Ensemble overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Partition functions and thermodynamic potentials . . . . . . . . . . . . . . . . . 27
3.4 Some relevant quantities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4 Molecular Dynamics 31
4.1 Finite differences method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Time step in molecular dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 About the stability of MD trajectories . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Size and boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.5 (En)sampling different ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.6 Verlet neighbour list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7 Periodic Mesh Ewald approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.8 Equilibration phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.9 Restraints and constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.10 Just restraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.11 Constraints only . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.12 Analysis of MD Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5 Monte Carlo methods 46


5.1 Monte Carlo hit or miss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Monte Carlo sample mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 Monte Carlo importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 The actual Metropolis algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.5 Isothermal-isobaric Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.6 Random numbers generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6 Advanced sampling methods 55


6.1 The barrier problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2 Parallel tempering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

7 Free energy calculation 58


7.1 Free energy is not energy for free . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7.2 Actually calculating free energy, literally . . . . . . . . . . . . . . . . . . . . . . . 59
7.3 Properties of a system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.4 Variation of free energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.5 Cumulant expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.6 Relation between phase spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.7 Multistep free energy perturbation . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.8 Order parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.9 Improvements upon multistep perturbation . . . . . . . . . . . . . . . . . . . . . 66
7.10 Thermodynamic cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.11 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

8 Using free energy 70


8.1 Free energy and macroscopic variables . . . . . . . . . . . . . . . . . . . . . . . . 70
8.2 Implicit solvent approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
8.3 Binding energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.4 Born and Generalized Born models . . . . . . . . . . . . . . . . . . . . . . . . . . 72
8.5 Sampling of the phase space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
8.6 Alanine scanning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

9 Analysis of protein-protein interactions 75


9.1 Categorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9.2 Geometrical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9.3 Residue characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.4 Surface characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.5 Targeting the PP interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

10 A handful of examples of MM applications 78
10.1 p53-hdm2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
10.2 Vinblastine in microtubules targeting . . . . . . . . . . . . . . . . . . . . . . . . . 78
10.3 Rapamycin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
10.4 An example of peptide strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

11 Jarzynski equation 81
11.1 Non-equilibrium simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
11.2 Proof for Jarzynski equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
11.3 A time of trials and tribulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
11.4 Cumulant expansion of J. equation . . . . . . . . . . . . . . . . . . . . . . . . . . 84
11.5 Stiff spring approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

12 Umbrella sampling 87
12.1 A qualitative introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
12.2 An analytical elucidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
12.3 Adaptive umbrella sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
12.4 WHAM (!!) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

13 Umbrella sampling applications 90


13.1 Binding free energy of a polypeptide . . . . . . . . . . . . . . . . . . . . . . . 90
13.2 Peptide stability in mixed solvent . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

14 Metadynamics 92
14.1 Something new, something old . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
14.2 Gaussian sand does not drift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
14.3 Collective variables socialism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

15 Some examples 96
15.1 Test on the number of gaussians . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
15.2 Parrinello’s benchmark work on β-hairpin . . . . . . . . . . . . . . . . . . . . . . 97
15.3 Osmoprotectants 2: electric boogaloo . . . . . . . . . . . . . . . . . . . . . . . . . 97

16 Coarse grain force fields 99


16.1 martini force field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
16.2 Pros and Cons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
16.3 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

17 Accelerated molecular dynamics 101


17.1 The accelerated approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
17.2 Free energy profile recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
17.3 The shape of the δU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
17.4 Reconstruction and reweighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
17.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

18 Conformational Analysis 104


18.1 Conformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
18.2 Generating starting points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
18.3 Advanced optimisation approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 106
18.4 Cluster analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

19 Protein folding and stability 109
19.1 Characteristics of protein folding . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
19.2 Protein as a frustrated system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
19.3 Hydrophobic interaction in protein folding . . . . . . . . . . . . . . . . . . . . . . 110
19.4 Ideal chain polymer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
19.5 Globule and coil model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
19.6 Random energy model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
19.7 An overview of folding kinetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
19.8 Chemical reaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
19.9 Phase transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

20 Anti-freeze proteins 119


20.1 First order phase transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
20.2 Antifreeze protein action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

List of Figures

1.1 Schematic representation of hard sphere potential . . . . . . . . . . . . . . . . . . 6


1.2 Schematic representation of square well potential . . . . . . . . . . . . . . . . 7
1.3 Schematic representation of Lennard-Jones potential . . . . . . . . . . . . . . . . 7
1.4 Schematic representation of Morse potential . . . . . . . . . . . . . . . . . . . . . 9
1.5 Morse potential approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6 Torsional potential profile of butane . . . . . . . . . . . . . . . . . . . . . . . . . 11

4.1 Various forms of motion integration; (a) Verlet algorithm (b) Leap frog algorithm
(c) Velocity Verlet algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Pictorial representation of the periodic boundary condition setup . . . . . . . . . 35

6.1 From right to left, top to bottom: the potential energy function; the Metropolis
MC population result; the Parallel Tempering population result; the space-time
representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

14.1 A beholder. It has beauty in his eye . . . . . . . . . . . . . . . . . . . . . . . . . 93

15.1 Filling of a potential profile by an increasing number of gaussian functions . . . . 96

20.1 Crystallization free energy tendencies . . . . . . . . . . . . . . . . . . . . . . . . . 120


20.2 That is not just ice ahead, it is the pack . . . . . . . . . . . . . . . . . . . . . . . 123

Chapter 1

Molecular Mechanics

In this chapter, we will discuss the usage of force fields as a tool for running
simulations of large systems. We will analyse and discuss each term of the
potential, ending with force field applications and classification

1.1 Applying classical mechanics to a molecular system


This course will mainly deal with the in silico simulation of biomolecules, which comprises
all those techniques applicable to molecules such as proteins, DNA fragments and membranes,
especially when involved in biological processes.
Therefore, the investigated systems will principally correspond to
• organic molecules with possible inorganic insertions
• large objects, such as hundreds of amino acids in an aqueous environment, i.e. between 10000
and 100000 atoms
From previous classes, we know that quantum mechanics can be applied exactly only up to the
hydrogen atom. For larger systems, approximations are needed; for example, the ion H2+ will
require the Born-Oppenheimer approximation, whereas even a simple molecule like H2 will need
further approximations like Hartree-Fock and subsequent corrections.
In quantum mechanics, a molecule is described as a set of nuclei surrounded by an electron
cloud, the energy of which can be worked out with the Schrödinger equation:
ĤΨ = EΨ
from which we can also obtain topological information about the molecule itself, id est about the
atomic configuration and bonding. Nonetheless, applying quantum mechanics on a large system
is computationally unaffordable.
Fortunately, quantum effects become less and less important as the size of the system
grows; in this case, we can employ the classical approximation, which is not only our
last resort, but also very useful in treating large amounts of small molecules, such as solvents.
The main assumption of the classical approximation is that atoms can be described by rigid
spheres, linked by springs (the bonds); each atom comes with a specific radius and the topology
of the system must be known a priori. This is quite a radical simplification, because it erases
completely the electronic degrees of freedom.
In the next sections we will discuss how classical mechanics can be applied to compute
energies, minimum configurations and time evolutions of biomolecular samples.

1.2 Force Fields and atom types
In molecular mechanics we describe the way atoms interact by setting up a force field (FF),
that is an expression of the system potential energy as a function of coordinates. Each one of
these fields has a different functional form, given by different terms that sum up to the total
potential energy.
In order to write these functions expediently, we have to introduce the so-called atom types.
An atom type is a tag that accounts not only for the atomic number, but for hybridization and
chemical environment too. In this way, an sp3 carbon atom is a different atom type than an sp2
one, and an sp2 carbon is equally distinct from an sp carbon; a carbonyl carbon is not the same
atom type as an alkene or carboxyl carbon, just as a carbon in cyclohexane is not the same as a
carbon atom in an alkane chain. For every application of molecular mechanics, then, we
have to define carefully both the functional form of the force field and the employed atom types.
From chemical experiments, we know that similar groups of atoms in different molecules
behave in the same way, id est certain properties can be transferred between certain elementary
bricks. For example, the distance between a carbon atom and a hydrogen one is the same in all
aliphatic chains, and the same goes for vibrational frequencies. These elementary bricks can be
transferred, but first we must define them; we already know that proteins are made of amino
acids, DNA of nitrogenous bases and biomembranes of phospholipids: these fundamental units make
force fields an effective way to describe biomolecules.
We will now proceed by delineating the general aspect of a force field, in the case of

• a noble gas
• a bound yet small system (small molecule)
• a large system

1.3 Force Field of a noble gas


A noble gas of N atoms is described by a set of N position vectors qi (3N coordinates) and N
momentum vectors pi (3N momenta),

Q = (q1 , . . . , qα , . . . , qN ) P = (p1 , . . . , pα , . . . , pN )

The total energy can be written as the hamiltonian function H(Q, P),

H(Q, P) = K(P) + U (Q)

in which K is the rather simple kinetic term,


K = \sum_{i=1}^{N} \sum_{\alpha=1}^{3} \frac{p_{i\alpha}^2}{2m_i}

while U is the potential contribution. Here is where the problem starts, for the potential depends
(for a real gas) on the nature and position of each particle.
In the simplest of cases, it can be written as a sum of terms, each depending on the
particle coordinates ri :

U = \sum_i U_1(\mathbf{r}_i) + \sum_i \sum_{j>i} U_2(\mathbf{r}_i, \mathbf{r}_j) + \sum_i \sum_{j>i} \sum_{k>j} U_3(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k) + \ldots
To simplify, we first need to describe and understand each term.
The total one-body interaction U_1^{tot} is the potential energy of the particles in a given external field
(e.g. gravitational, electric), and can therefore be neglected in most cases. On the other hand,
the total three-body contribution U_3^{tot} is a hefty chunk of energy, but it can be implicitly folded
into the two-body potential U_2^{tot} by using experimental parameters. Therefore, to a first
approximation we can just consider two-body interactions.
Real gases can be described as small hard spheres moving in space, but we need a proper
pair interaction potential to use as U_2 ; some of the most common are described below.

Hard sphere potential It describes spheres that do not interact and cannot penetrate each
other. Totally vanilla sphere, then; the kind of sphere your grandma likes. Analytically, it takes
the form

U_{HS}(r) = \begin{cases} \infty & \text{for } r < \sigma \\ 0 & \text{for } r \geq \sigma \end{cases}

Figure 1.1: Schematic representation of hard sphere potential

Square well potential It includes an interaction below a certain distance. Analytically, it is
something like this

U_{SW}(r) = \begin{cases} \infty & \text{for } r < \sigma_1 \\ -\epsilon & \text{for } \sigma_1 \leq r < \sigma_2 \\ 0 & \text{for } r \geq \sigma_2 \end{cases}

This means that the spheres are still impenetrable (r < σ1 ), but can interact (σ1 < r < σ2 ).

Soft sphere potential It allows the spheres to bounce against each other. It is usually described
as

U_{SS}(r) = \epsilon \left( \frac{\sigma}{r} \right)^n = a r^{-n}

where a = \epsilon\sigma^n , n ∈ N. This means that for a low n the repulsion decays gently with r, while for a
high n the potential behaves more and more like that of a hard sphere.
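As a quick illustration, here is a minimal Python sketch of the three pair potentials above (function names and the example numbers are arbitrary choices made for this text, not taken from any force field):

import math

def hard_sphere(r, sigma):
    # Impenetrable, otherwise non-interacting spheres
    return math.inf if r < sigma else 0.0

def square_well(r, sigma1, sigma2, eps):
    # Impenetrable core below sigma1, constant attraction -eps up to sigma2
    if r < sigma1:
        return math.inf
    if r < sigma2:
        return -eps
    return 0.0

def soft_sphere(r, sigma, eps, n):
    # Purely repulsive; a larger n makes the wall steeper, more hard-sphere-like
    return eps * (sigma / r) ** n

# the soft-sphere wall steepens as n grows
for n in (1, 6, 12):
    print(n, soft_sphere(0.9, 1.0, 1.0, n))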

Figure 1.2: Schematic representation of square well potential

Lennard-Jones potential It takes different forms. The first one is

U^{LJ}(r) = 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]

As we can see, it takes two parameters to make a Lennard-Jones potential, the well depth ε and
the zero-energy distance σ.
This potential can be divided in two contributions:
• a positive one, corresponding to the repulsive interaction


• a negative one, corresponding to the attraction interaction

Figure 1.3: Schematic representation of Lennard-Jones potential

The second form is employed to underline the minimum-energy interatomic distance, rm :

U^{LJ}(r) = \epsilon \left[ \left( \frac{r_m}{r} \right)^{12} - 2 \left( \frac{r_m}{r} \right)^{6} \right]

where, by comparison with form 1, we can get

r_m = \sigma \, 2^{1/6}

The third form clamps everything together into

U^{LJ}(r) = \frac{B}{r^{12}} - \frac{A}{r^{6}}

where

B = 4\epsilon\sigma^{12} \qquad A = 4\epsilon\sigma^{6}
A careful student will now ask himself why he needs to remember all these formulas. This
is because different software packages implement different forms of this potential, each one allowing to
extract different parameters.
The Lennard-Jones is the form of the potential that can describe Van der Waals interactions,
given ε and σ, ε and rm , or A and B. These parameters have a lousy physical meaning: ε corresponds
to the strength of the interaction, while σ and rm are linked to atomic dimensions. Hence, to describe
a box full of Ne, the Ne radius and the strength of the Ne-Ne interaction are required. Instead, to describe
a box containing two gases, we can apply the Lorentz-Berthelot mixing rules; known σii ,
σjj , εii and εjj , these rules allow us to calculate

\sigma_{ij} = \frac{1}{2}\left[\sigma_{ii} + \sigma_{jj}\right] \qquad \epsilon_{ij} = \sqrt{\epsilon_{ii}\,\epsilon_{jj}}
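A minimal Python sketch of the three equivalent Lennard-Jones forms and of the Lorentz-Berthelot combination; the ε and σ numbers below are invented for the example, not parameters of any real gas:

def lj_eps_sigma(r, eps, sigma):
    # Form 1: well depth eps, zero-energy distance sigma
    return 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

def lj_eps_rm(r, eps, rm):
    # Form 2: well depth eps, minimum-energy distance rm = sigma * 2**(1/6)
    return eps * ((rm / r) ** 12 - 2.0 * (rm / r) ** 6)

def lj_AB(r, A, B):
    # Form 3: B = 4*eps*sigma**12, A = 4*eps*sigma**6
    return B / r ** 12 - A / r ** 6

def lorentz_berthelot(sigma_ii, sigma_jj, eps_ii, eps_jj):
    # Mixing rules for the cross interaction i-j
    return 0.5 * (sigma_ii + sigma_jj), (eps_ii * eps_jj) ** 0.5

# the three forms agree at any r (here for two hypothetical species mixed together)
sigma_ij, eps_ij = lorentz_berthelot(3.40, 2.75, 0.24, 0.07)
rm = sigma_ij * 2 ** (1 / 6)
A, B = 4 * eps_ij * sigma_ij ** 6, 4 * eps_ij * sigma_ij ** 12
r = 3.8
print(lj_eps_sigma(r, eps_ij, sigma_ij), lj_eps_rm(r, eps_ij, rm), lj_AB(r, A, B))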

1.4 Diatomic molecule: stretching term


Any diatomic molecule can be described as two spheres connected by a spring; experimentally,
we know that compressing the bond increases the energy, while stretching it increases the
energy only until the bond breaks. This behavior is described by the Morse potential,
U(r) = D_e \left\{ 1 - \exp\left[ -a(r - r_0) \right] \right\}^2

where De is the dissociation energy, r0 is the natural or equilibrium bond length (therefore
U (r0 ) = 0); on the other hand,

a = \omega \sqrt{\frac{\mu}{2D_e}} = \sqrt{\frac{k}{2D_e}}

since

\omega = \sqrt{\frac{k}{\mu}}
with µ as the reduced mass and k as the spring stiffness. This potential is shown in Figure 1.4.
This potential needs three parameters and the evaluation of an exponential form, so it is
not so easy to use. But by analysing the graph, we can see that there is a huge penalty for
non-equilibrium bond lengths, in which we are not interested. We can indeed limit our focus to a
10 kcal mol−1 window around r0 , simplifying the stretching potential by making a Taylor series
expansion around r0 :

U(r - r_0) = U(r_0) + \frac{dU}{dr}(r - r_0) + \frac{1}{2}\frac{d^2U}{dr^2}(r - r_0)^2 + \frac{1}{6}\frac{d^3U}{dr^3}(r - r_0)^3 + \ldots

where U (r0 ) = 0 because it is the zero of the scale and dU/dr = 0 because we are in a minimum.
Let’s truncate at second order:

U(r - r_0) \simeq \frac{1}{2}\frac{d^2U}{dr^2}(r - r_0)^2 = \frac{1}{2}K(r - r_0)^2 = \frac{1}{2}K_{ab}\,\Delta r_{ab}^2

This way, the Morse potential resembles the potential energy of a spring, cutting down the number
of parameters to two: the natural distance r0 and the spring stiffness K.

Figure 1.4: Schematic representation of Morse potential

a large K means high stiffness, typical of a strong bond, that is intolerant to deformation
a small K means low stiffness, typical of a weak bond, that is tolerant to deformation

therefore
Ksingle < Kdouble < Ktriple
While we are at it, we should remember not to take the physical interpretation of FF
parameters too seriously, but to limit ourselves to understanding which role they play in the energy description.
Back on track, we can easily see that this model is highly inaccurate at high r, because
it cannot predict the dissociation; therefore, for small molecules, we can push the expansion to
higher orders:

U(r - r_0) = K_{ab}\,\Delta r_{ab}^2 + K'_{ab}\,\Delta r_{ab}^3 + K''_{ab}\,\Delta r_{ab}^4 + \ldots
However, we must be cautious: higher orders not only mean more parameters, but have different
trends, as shown in Figure 1.5.
As we can see, the third order expansion presents an infinite well that creates problems during
optimization, requiring a reasonable starting geometry; on the other hand, fourth order diverges,
even if with a different behavior, but requires another parameter. The most expensive stretching
FF is of sixth order.
Let’s now discuss the nature of r0 ; it is proper to call it the natural bond length, because
“equilibrium length” can be misleading; as a matter of fact, it does not correspond to any
equilibrium bond length in any real molecule, but it only describes the equilibrium length of a
diatomic molecule in vacuum. In a big molecule, the equilibrium bond length is due to many
factors.
Since every FF is different from any other, it is customary to give the measurement units of K:

K = \begin{cases} \text{kcal mol}^{-1}\,\text{Å}^{-2} & \text{in Amber} \\ \text{kJ mol}^{-1}\,\text{nm}^{-2} & \text{in Gromacs} \end{cases}
Obviously, when treating a system made of more than one diatomic molecule, we need to take
into account the non-bonding interaction between the molecules too.
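A small Python sketch comparing the Morse potential with its harmonic (second-order) approximation near r0 ; De, a and r0 are made-up illustrative numbers, not parameters of any real force field:

import math

def morse(r, De, a, r0):
    # U(r) = De * (1 - exp(-a (r - r0)))^2
    return De * (1.0 - math.exp(-a * (r - r0))) ** 2

def harmonic(r, k, r0):
    # second-order Taylor expansion of the Morse potential: U = 1/2 k (r - r0)^2
    return 0.5 * k * (r - r0) ** 2

De, a, r0 = 100.0, 2.0, 1.5        # hypothetical values
k = 2.0 * De * a ** 2              # from a = sqrt(k / (2 De))
for dr in (-0.2, -0.1, 0.0, 0.1, 0.2, 1.0):
    r = r0 + dr
    print(f"dr={dr:+.2f}  Morse={morse(r, De, a, r0):8.3f}  harmonic={harmonic(r, k, r0):8.3f}")

Close to r0 the two curves agree; at large stretching the harmonic term keeps growing while the Morse curve levels off towards De.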

[Plot: second-order (II ordine) and third-order (III ordine) expansions of the Morse potential, energy vs. r − r0 ]
Figure 1.5: Morse potential approximations

1.5 Triatomic molecule: bending term


With the addition of another atom to the molecule, we have an extra term to consider: the angle
bending term; usually, this term is harmonic too, like having a spring between the two bonds:

U bnd = K(θ − θ0 )2

where the squared term corresponds to the deviation of the angle θ with respect to the natural
angle θ0 . Like before, we can obtain a better description by adding higher order terms, at the
cost of more parameters. Since we are interested in what happens in close proximity to the
equilibrium, the harmonic approximation is good enough for us. The units of measurement, as
always, change with the software employed:
K = \begin{cases} \text{kcal mol}^{-1}\,\text{rad}^{-2} \\ \text{kJ mol}^{-1}\,\text{deg}^{-2} \end{cases}

Stretching and bending are so-called hard degrees of freedom, i.e. great variations of energy
are produced upon small changes in them (remember: big penalties).

1.6 Polyatomic molecule: torsion term


As another atom is added to our molecule, another term arises in our energy contributions.
In fact, we can have a rotation around a bond, that is a variation around a torsional angle.
Therefore, we can consider the torsional contribution as a function of ω. This energetic term has
some peculiarities:

• the energy is a periodic function of the torsional angle


• torsional degrees of freedom are soft ones: rotation is permitted, but it usually involves
small energy variations

For these reasons, the Taylor expansion is no longer suitable; instead, we use a Fourier series
expansion, as a sum of cosine terms:

U(\omega) = \sum_{n=1} V_n \cos(n\omega)

where Vn goes under the name of barrier or barrier height. This term is again a bit misleading:
the real barrier is associated to many factors, like steric interaction (of Van der Waals nature);
the true rotational profile is indeed given by the sum of rotational terms and by the non-bonding
terms. The natural number n is related to the periodicity of the energy, as shown in Table 1.1.

n angle of periodicity
1 360◦
2 180◦
3 120◦

Table 1.1: Relation between n and periodicity

For an organic molecule with a small number of bonds, the first three terms are enough. Let’s
now discuss some very simple examples.

Ethane has three equivalent minima (staggered conformations) and three equivalent maxima
(eclipsed conformations), for an energetic profile that just requires the n = 3 term to be described:
the others have V_{n \neq 3} = 0. Rigorously, an infinite series expansion would require all the n = 3k
terms, with k ∈ N, but it’s cool to truncate.

Butane, instead, has an absolute minimum (anti conformation), an absolute maximum (syn
conformation), two relative maxima and two relative minima (gauche conformations), which have
a little more energy due to the steric clash of the methyl groups. The symmetry of the system is
therefore different from that of ethane, so that not only the n = 3 term is required, but also the n = 1
term is necessary to tune the energy profile. Everything is shown in Figure 1.6.

Figure 1.6: Torsional potential profile of butane

Ethylene has two minima, at 0◦ and at 180◦ , therefore it requires the term n = 2 (or n = 2k)
in order to get this profile.

2-butene has two possible isomers, cis (with higher energy) and trans (with lower energy),
therefore it needs the n = 2 term to get the periodicity and the n = 1 term to get the fine
structure.
In general, we can also find the Fourier expansion written as

U(\omega) = \frac{1}{2} V_1 \left[ 1 + \cos(\omega) \right] + \frac{1}{2} V_2 \left[ 1 - \cos(2\omega) \right] + \frac{1}{2} V_3 \left[ 1 + \cos(3\omega) \right]
where the 1s and the + or − signs are simply there in order to get maxima and minima according to the
experimental data.
In general, the torsion term is employed to finely tune the rotational barrier.
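A small Python sketch of the cosine expansion, with made-up barrier heights roughly mimicking the ethane-like (only n = 3) and butane-like (n = 1 plus n = 3) cases discussed above; the exact signs and prefactors differ between force fields:

import math

def torsion_energy(omega_deg, V):
    # U(omega) = sum_n (1/2) V_n [1 + cos(n omega)]  (one common convention)
    omega = math.radians(omega_deg)
    return sum(0.5 * Vn * (1.0 + math.cos(n * omega)) for n, Vn in V.items())

ethane_like = {3: 2.9}           # a single threefold term (hypothetical barrier)
butane_like = {1: 1.5, 3: 2.0}   # a onefold term added to split syn, gauche and anti

for omega in range(0, 361, 60):
    print(omega, round(torsion_energy(omega, ethane_like), 2),
          round(torsion_energy(omega, butane_like), 2))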

1.7 Polyatomic molecule: intermolecular non-bonding term


In a large chain, we have to consider non-bonding interactions even between the same chain
atoms. To do this, we must understand which degree of separation requires them:

1-2 is covered by the stretching term


1-3 is covered by the bending term (though a few FFs also have non-bonding interactions for
this)
1-4 has a non-bonding term, scaled down because it already has the torsional term
1-5+ is treated as interaction between atoms belonging to different molecules (Lennard-Jones)

1.8 Lennard-Jones potential


As we saw, the Lennard-Jones potential can be written as

U^{LJ} = \frac{B}{r^{12}} - \frac{A}{r^{6}}

In this potential, the attractive term (r−6 ) can be deduced from the perturbative study of transient
dipole moment interactions, and it is considered correct. On the other hand, the repulsive
term (r−12 ) is completely empirical and not always correct (sometimes the real trend is softer
than r−12 ). This means that other empirical forms can be used, and some of those can better
describe what’s going on.
One of these forms is the so-called Buckingham-Hill potential

U^{BH}(r) = A e^{-Br} - \frac{C}{r^{6}}
where the exponential repulsive term is better suited to describe close-range interactions.
There are however some contraindications:

• three parameters are required, one more than Lennard-Jones


• we have to compute an exponential function (5 times more expensive than any algebraic
operation)
• for very small values of r, it diverges to −∞, presenting some problems during geometry
optimisation

• the distance r is (in Cartesian coordinates) the square root given by the Pythagorean
theorem and, as we know, square roots are as difficult to compute as exponential functions.
The BH potential requires therefore to evaluate the square root, that LJ avoids thanks to
the even powers it has

As always, we are rarely interested in what happens at low r, so even if LJ is not as impressive
as BH, using it does not bring any problem, allowing instead a faster computation of the total
energy.
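A short sketch (arbitrary parameters) contrasting the two forms; note how the Buckingham-Hill curve turns over and dives towards −∞ at very small r, which is the geometry-optimisation hazard mentioned above:

import math

def lj(r, A, B):
    return B / r**12 - A / r**6

def buckingham_hill(r, A, B, C):
    # exponential repulsion plus r^-6 attraction; collapses to -inf as r -> 0
    return A * math.exp(-B * r) - C / r**6

for r in (0.05, 0.5, 1.0, 1.5, 2.0):
    print(f"r={r:4.2f}  LJ={lj(r, 1.0, 1.0):12.3e}  BH={buckingham_hill(r, 1000.0, 5.0, 1.0):12.3e}")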
Finally, we can take a look at the complete form of U , which can be written as follows:

U^{tot} = U_{str} + U_{bend} + U_{tor} + U_{VdW} + U_{elst} + U_{\times} + U_{oop}

We will continue this dissertation by considering the remaining terms of this sum: the electrostatic term
U_{elst} , the cross term U_{\times} and the out-of-plane term U_{oop} .

1.9 Electrostatic potential


We can describe electrostatic interactions in two different ways:

1. with point charges placed at atomic positions (i.e. nuclear coordinates), with the possi-
bility of a fine tuning with additional charges
2. with dipole moments placed on the bonds

These two descriptions should yield the same results, given that we look at the molecule from
far enough away; though some differences may occur in describing certain rotational barriers.
Nonetheless, the second choice is computationally heavier, therefore point charges are pre-
ferred.
These charges are chosen to best fit the quantum-mechanical electrostatic potential of the
molecule; we can consider it as made of a nuclear and an electronic term, as follows

\varphi_{esp}(\mathbf{r}) = \varphi_n(\mathbf{r}) + \varphi_{el}(\mathbf{r}) = \sum_i^N \frac{Z_i}{|\mathbf{R}_i - \mathbf{r}|} - \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\, d\mathbf{r}'

This means evaluating the electrostatic potential at an external point r as given by the sum of
each nuclear potential and the continuous sum of the potential generated by every charge fraction
ρ dr. By fitting this function to the quantum mechanical potential, we obtain the point charges.
First and foremost, we must decide where we are going to put the point r; usually, molecular
surfaces are employed, because non-bonding interactions are the main focus. The choice of these
surfaces is arbitrary: sometimes they are the envelope of Van der Waals spheres, sometimes the
surfaces containing 90% of the electronic density.
We now evaluate ϕesp at some points of this surface, then we look for the set of N (where
N = number of nuclei) point charges, starting from a guessed value for each of them. At this
point we optimize the error function

\mathrm{ErrF}(Q) = \sum_j^{pts} \left( \varphi_{esp}(\mathbf{r}_j) - \sum_i^N \frac{Q(\mathbf{R}_i)}{|\mathbf{R}_i - \mathbf{r}_j|} \right)^2

where Qi is the value of the i-th charge, and the sum over i is the electrostatic potential given by
all of them. If ErrF = 0, we got the right charges, but this is not as common as one might think.
Therefore, we rewrite this function as

\mathrm{ErrF}(\mathbf{a}) = \sum_j^{m} \left( Y_j - \sum_i^N a_i X_{ij} \right)^2

This way, we can more easily optimize by looking for a minimum:

0 = \frac{\partial}{\partial a_k}\,\mathrm{ErrF}(\mathbf{a}) = \frac{\partial}{\partial a_k} \sum_j \left( Y_j - \sum_i a_i X_{ij} \right)^2 = \sum_j \frac{\partial}{\partial a_k} \left( Y_j - \sum_i a_i X_{ij} \right)^2 = 2 \sum_j \left( Y_j - \sum_i a_i X_{ij} \right) (-X_{kj})

This gives us a set of algebraic equations, one for each k, that we can write as

\sum_j X_{kj} \left( Y_j - \sum_i a_i X_{ij} \right) = 0 \;\Rightarrow\; \sum_i a_i \sum_j X_{kj} X_{ij} = \sum_j X_{kj} Y_j

In matrix form, we can write

\mathbf{X}\mathbf{a} = \mathbf{b}

that we can solve by matrix inversion

\mathbf{a} = \mathbf{X}^{-1}\mathbf{b}
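A minimal numpy sketch of this least-squares step: build X from hypothetical nuclear positions and grid points, then solve the normal equations for the charges. Real codes add the total-charge constraint and the RESP restraint discussed below; this is only the bare, unconstrained fit:

import numpy as np

rng = np.random.default_rng(0)
nuclei = rng.normal(size=(3, 3))          # 3 hypothetical nuclear positions
grid = rng.normal(size=(50, 3)) * 4.0     # 50 hypothetical surface points
true_q = np.array([0.4, -0.7, 0.3])       # charges used to fake the "QM" potential

# X[j, i] = 1 / |R_i - r_j|
X = 1.0 / np.linalg.norm(grid[:, None, :] - nuclei[None, :, :], axis=-1)
phi = X @ true_q                          # stand-in for the QM electrostatic potential

# normal equations (X^T X) q = X^T phi, solved without explicit inversion
q = np.linalg.solve(X.T @ X, X.T @ phi)
print(q)                                  # recovers the charges that generated phi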
This kind of problem is usually badly conditioned: atoms close to the surface have a strong impact
on the value of the potential, while buried atoms are very marginal; this means that even a
big variation of their charges may not have a great effect on the potential. This can give us
non-physically big or small charges on these atoms.
To prevent this, we may apply some restraints, such as

• forcing the equivalence between the sum of the point charges and the total molecular charge
• applying a RESP (Restrained ElectroStatic Potential) fit, which adds a hyperbolic penalty
for the deviation of any charge from 0; in this way, we avoid unrealistic values of the buried
atoms’ charges

This point charge approach works pretty well, but it has some limitations:

• It fits the charges as seen from the outside, therefore it is unreliable for anything intramolecular

• It can be a little crude
• It does not take into account how each charge depends on the conformation: different
minima have different ϕesp , so different charges; this means that the charges are not polarizable
• Atomic charges are not fully transferable, but with a correct parametrization, we can take
the risk
• All QM values are taken in vacuum, while this kind of FF is usually employed in solvent; to
compensate for the obvious change in charges, ϕesp is evaluated at the HF level, which overestimates
the dipole moment; this way, in error compensation we trust
Since this kind of potential is critical for anything biological, we can improve it by considering
• additional constraints, for example same charges on symmetrical atoms
• additional point charges, in certain charge-rich points, like triple bonds or lone pairs: the
more charges we have, the better the fit

1.10 Extra (optional) terms


Hydrogen bond The H-bond is a mainly electrostatic interaction, with an energy (4 kcal mol−1 )
in between that of a bond (100 kcal mol−1 ) and that of a VdW interaction (0.2 kcal mol−1 ). Moreover,
it is directional, while VdW interactions have spherical symmetry.
This kind of interaction is not usually considered explicitly, except in certain specific FFs (like
GLYCAM, an AMBER-based FF for sugars); in this case, it is written as

U^{HB} = \varepsilon \left[ 5 \left( \frac{r_0}{r} \right)^{12} - 6 \left( \frac{r_0}{r} \right)^{10} \right] \cdot f\!\left( 1 - \cos\theta_{xjk} \right)

where the first term is similar to Lennard-Jones, but it is deeper (higher ε) and steeper
(faster-decaying attraction term); on the other hand, the directionality is given by a function
of the angle between the three atoms x, j and k, which every software evaluates in a different
manner. Obviously, HB terms are evaluated just between the two atoms forming the bond.

Out-of-plane When a molecule is planar, the sum of the angles around the central atom is 360◦ . If the molecule gets
pyramidal, this angle sum changes, but a great pyramidalization can be achieved with a small variation of
this sum, therefore a strong harmonic penalty is required to prevent it. The penalty is usually applied to
the angle χ between the bond and the plane, or to the distance d between the central atom and
the plane:

U_{oop} = K\chi^2 \qquad \text{or} \qquad U_{oop} = Kd^2

Otherwise, high-barrier torsional terms can be employed to the same effect.

Cross terms Changing a certain term may influence some other: decreasing an angle, for
example, may elongate the bond; this means that bending and stretching are not completely
separated, and we use a cross term to take all of this into account:

E_{s/b} = K (\theta_{abc} - \theta_{abc}^0) \left[ (R_{ab} - R_{ab}^0) + (R_{bc} - R_{bc}^0) \right]

We can have any combination of two or three terms coupling, therefore cross terms are only used
in FF for small molecules.

1.11 The parameters problem
Once the functional form is decided, we need to assign a value to each and every parameter that appears
in the force field; Table 1.2 shows what was done for a very famous FF.

Kind of term Number of terms N. of parameters Parameters fitted


Van der Waals 71 atom types 142 142
Stretching 450 bonds 900 290
Bending 13500 angles 27000 824
Torsion 405000 dihedral angles 1200000 2466

Table 1.2: Parameter job on a famous force field

This amounts to almost 0.2% of the required parameters; nonetheless, this FF is capable of
describing 20% of the existing organic molecules.
The process through which we find these parameters is another optimization of an error
function. We start from a training set of molecules with a lot of experimental data, then we use
the FF to evaluate the properties of the training set, beginning with guessed parameters; at this
point, we minimize

\mathrm{ErrF} = \sum_i^{data} w_i\, (\text{ref. value}_i - \text{calc. value}_i)^2

where wi is a generic weight.
From many parameters, we get many minima of ErrF; in order to get the best one, we can

• optimize the parameters sequentially, from chemical class to chemical class; this is an
application of transferability and ensures that the task dimension gets smaller the more
we optimize, making it easier to add a new chemical class
• optimize all parameters en bloc, since the FF is unitary; this makes parametrization more
difficult, even if a better minimum can be obtained. Adding a new chemical class will
require a new, complete re-parametrization, though
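As a toy illustration of the weighted error function, the following Python sketch fits a single stretching constant K to fake "reference" energies with scipy; the data, the weights and the one-parameter model are all invented for the example:

import numpy as np
from scipy.optimize import minimize

# fake reference data: bond displacements and noisy "observed" stretching energies
dr = np.linspace(-0.1, 0.1, 11)
ref = 0.5 * 600.0 * dr**2 + np.random.default_rng(1).normal(0.0, 0.05, dr.size)
w = np.ones_like(dr)                      # here all data points weigh the same

def err(params):
    K = params[0]
    calc = 0.5 * K * dr**2                # FF prediction with the current guess
    return np.sum(w * (ref - calc) ** 2)  # weighted squared deviations

result = minimize(err, x0=[100.0])        # start from a guessed parameter
print(result.x)                           # fitted K, close to the "true" 600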

Each molecule can have its own FF, which is as exact as it is useless: the purpose is to get the
experimental data without doing the experiment; we are not at all interested in reproducing the
experimental data of the training set! We are then required to test the FF on a validation set, i.e.
a set of molecules not contained in the training set, for which we have access to the experimental
data. If the FF is able to reproduce the validation set data as well, it can work outside the training
set.
If we use the FF on molecules similar to those in the training set, we can achieve good results:
this makes biomolecules an ideal application for FFs, because they are very similar to one
another. Organic molecules are good too. We must however remember that every FF has a purpose,
and employing it for something else may bring us to ruin: FFs are good at linear peptides, but
not at cyclic peptides; in addition, since they focus on equilibrium states, protein folding is
completely inaccessible through them, so be careful!
The weight wi is used because all experimental data are equal, but some experimental data
are more equal than others: different techniques have different values, precision and equilibrium
conditions. Some data are not even experimental, but are worked out by QM calculations (like
torsional terms, which require non-equilibrium data).
This brings us to the meaning of U tot . As far as its numerical value goes, U tot has little to
no meaning; this happens because the zero of the U scale is unobtainable: it requires all bonds
at their natural distance (stretching term null), all angles at their natural values (bending term null) but
all atoms at infinite distance (Van der Waals term null). This means we cannot have a zero,
so we are not able to compare the values of U for two different molecules; only comparing two
different conformations of the same molecule is possible. FFs can therefore be used to obtain
information about the geometries and their relative energies; although they were developed for
organic molecules, some of them can also be applied to coordination compounds.
In the case we are missing some parameters for our FF calculations, we can take different
paths:

• re-parametrize all; this path is seldom taken


• we can replace the missing parameters with something similar; e.g. if we lack the parameters
for

[structure diagram: the C/O fragment whose parameters are missing]

we can employ those for

[structure diagrams: similar fragments built from C and O atoms, whose parameters are available]

In this case, a lot of chemical intuition and testing is required; the bare minimum is checking
how the properties change when modifying these parameters: if they change a lot, maybe we
should stop and re-parametrize; if they change a little, going on is acceptable

Missing parameters are a critical topic in FF implementation, since software rarely informs
the user when it uses low-quality (that is, totally nonsensical) parameters. Sometimes, rather
rough (ruspantelle) approximations are implemented; for example, some parameters are the same
for entire groups of atoms.
The moral of this story is that just because you can do the calculation, it doesn’t mean
you will get a good result.

1.12 Some relevant questions


Redundancy If we consider an sp2 carbon atom, we can easily see that of the three angles that
surround it, only two are independent: the third must bring the total to 360◦ ; of the three FF terms
defined, then, just two are necessary. This is rather obvious if we take into consideration that for
a 7-atom molecule we get 3N − 6 = 15 degrees of freedom. In the force field, instead, we have
6 stretching terms, 9 bending terms, 6 dihedral terms and an out-of-plane one, for a total of 22
degrees of freedom. It is clear that our FF is significantly redundant, because we are considering
more degrees of freedom than necessary. This can be useful though: redundancy compensates for
missing or badly acquired parameters, and it helps reduce the impact of parameters borrowed
from other molecules. This tells us that it is completely forbidden to mix different FFs together, on
pain of getting nonsensical results; we have to remember that, at the end of the day, FF
parameters are just that: numbers that fit the experimental data. However, some FFs are made
to work together, like the AMBER family of FFs.

Decomposition We can be tempted to decompose the result of a FF calculation to gain insight
into the predominant terms of the potential energy. This procedure is surely possible, but we must
recall that it doesn’t show the physics of the system at all, but just how the FF is built. The
results are globally correct, but it is better to avoid any over-analysis, since a lot of it is flavour.

1.13 FF application in coordination chemistry


Coordination compounds are quite a bit more complex than usual organic molecules, possessing higher
coordination numbers and different geometries for many of those. In addition, coordination bonds
are different from covalent ones, being for example much more flexible, or presenting electronic
phenomena like the trans-effect1 .
It is common, in coordination compounds like iron pentacarbonyl (Fe(CO)5 ), that although
the ligands can be swapped with each other, some FF terms (like bending) are different for
equatorial or axial substituents; for the same angle, we can easily have three parameters. This
issue may be solved by betraying the chemistry of the system and employing different atom
types; this way we lose the interchangeability.
Another problem in coordination chemistry is the bond definition. FFs require the topology
of the molecule, but it can be difficult to identify where the bonds are in π complexes. In this
case, we are usually forced to pick one of the resonance structures, understanding
that some descriptions are better than others once we step down from the quantum mechanical
solution. Another geometrical question is which geometry is assumed by the molecule: it has
to be decided beforehand through atom type selection.
Moreover, due to the nature of the bond, a harmonic or second-order expansion may not bring
satisfactory results, so the Morse potential is employed, or bending terms are neglected by considering
1-3 non-bonding interactions (points-on-a-sphere approach).
Nonetheless, given their large presence throughout biological systems, metals have to be
described by means of a FF. A rather crude approach is representing the metal ion (e.g.
Zn2+ ) as a VdW sphere with a 2+ electric charge. From this, we can improve by considering the
octahedral Zn complex as a big charge surrounded by smaller charges, so that the total charge
is 2+; if the peripheral charges are δ+, the central one will be 2 − 6δ, in this case. In this way,
we get an octahedral coordination, but just because we are enforcing it; this description is good
for metals with few electrons, but it is rather problematic for transition ones. Moreover, if our
system explores different coordinations, we are utterly fucked; under these conditions, the best
choice is employing a hybrid QM/MM method.

1.14 FF computational weight and classification


The computational weight of a force field is mainly in its VdW term, going from 70% for a 10-
carbon hydrocarbon to 96% for a 100-carbon one. The scaling of each term for a system of N
atoms is shown in Table 1.3.
Even though different FFs have different parameter values, we can at least expect internal
coherence.
Finally, FFs are divided in three classes:

Class I is made of the simplest ones; they have all the basic terms, a harmonic stretching
and an LJ potential for VdW; they are also known as harmonic or diagonal FFs and they are
largely employed for large-system description
1 The trans-ligand influences the characteristics of the other ligand bond

Term Scaling
Stretching N −1
Bending 2(N − 2)
Torsional 3(N − 5)
Van der Waals N (N − 1)/2 − 3N + 5

Table 1.3: Scaling of each FF term for an N -atoms system

Class II possesses cubic or quartic stretching, a more complex VdW potential and cross terms;
their applications are limited to small molecules

Class III has hyperconjugation, δ effects and polarization; this class is restricted to a selected
club of small molecules
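A tiny Python sketch that evaluates the counts of Table 1.3 for a chain of N atoms, showing how the number of Van der Waals pairs quickly dwarfs all bonded terms:

def term_counts(N):
    # term counts for an N-atom chain, following Table 1.3
    return {
        "stretching": N - 1,
        "bending": 2 * (N - 2),
        "torsional": 3 * (N - 5),
        "van der Waals": N * (N - 1) // 2 - 3 * N + 5,
    }

for N in (10, 100, 1000):
    counts = term_counts(N)
    vdw_fraction = counts["van der Waals"] / sum(counts.values())
    print(N, counts, f"VdW fraction of terms: {vdw_fraction:.0%}")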

Chapter 2

Halogen Bond

In this chapter, we will discuss the nature of the halogen bond; after
that, we will illustrate some implementation methods

2.1 Definition
The halogen bond is an elaborate hoax.
Seriously, it is an interaction between a halogen atom and a Lewis base; since halogen atoms
are the most electronegative ones, this seems a quite exotic interaction, but experimental data
show us that this bond is close to linearity and its length is less than the sum of the VdW
radii. This occurs because, although isolated halogen atoms are spherical, when bonded they can
present an anisotropy usually referred to as polar flattening; in fact, the halogen electron density
is mostly located on the bond, and it appears compressed. This originates a positive electrostatic
potential in the region sitting opposite to the bond, known as the σ-hole, while a negative potential
belt appears, perpendicular to the bond axis.

2.2 σ-hole nature and parameters


The nature of the σ-hole strongly depends on the halogen atom: we find bigger σ-holes with less
electronegative, softer halogens, like I, while hard and electronegative halogens like F show
almost no σ-hole. In addition, the chemical environment also influences the size of the
σ-hole, since vicinal electron-withdrawing atoms can increase it significantly; we
can therefore conclude that different functional groups have different effects on the hole.
The way in which we describe the σ-hole is based on three parameters. Considering a sym-
metrical one, its point of maximum value on a chosen electrostatic potential surface is called the
magnitude (mσ ), while we indicate as amplitude the angle that describes the σ-hole region
on the surface itself. Finally, the extension of the σ-hole is the distance it takes for the electro-
static potential to go from positive to zero; it can also be identified as the dimension of
the positive region along the bond axis.

2.3 Halogen bond in bioactive molecules


In biomolecules, halogen bond appears mainly between the halogen atom and backbone oxygens,
side chain sulphurs or nitrogens or π-clouds of aromatic compounds. That’s all.

2.4 Distorted halogen bonds
The halogen bond has a rather small tolerance to compression, losing a lot more energy by com-
pression than by extension; this means that this bond is very difficult to compress. On the other
hand, the halogen bond can be easily distorted angularly, as long as the linearity of the C-X-D interaction is
respected. The attack angle can vary, then. Obviously, the donor can modify both the strength and the
susceptibility of the XB.
A statistical analysis of ligand-protein crystal geometries shows there are three classes
of halogen interactions, namely primary, secondary and tertiary halogen bonds, classified by
decreasing quality of the interaction. Strong, linear (primary) bonds tend to force the surrounding
structure to adapt, even at the cost of increasing its energy, dictating the interaction geometry by means of their
higher contribution to the binding energy. On the other hand, weaker bonds (like Cl ones)
generate less perfect structures, falling behind in the binding hierarchy.

2.5 XB applications
Certain ligands can be substituted by a different molecule that can form XB, sometimes even
with the same linking sites. By changing the nature and the topology of the interaction, new
approaches are now available to drug design.
On this topic, the possible synergy between HB and XB should be investigated. The statistical
analysis of the binding geometries tells us that the angle between an HB and an XB on the
same donor is always 90◦ , commonly with the HB in-plane and the XB out-of-plane. A fixed,
perpendicular XB does not influence the energy of a scanned HB, and vice versa; this
means that the two bonds are fairly independent of each other. Curiously, HBs do perturb each
other when in this very same configuration.
Replacing an HB with an XB can sometimes significantly improve the overall binding energy,
even when it does not occur at the main linking position; e.g., the Cathepsin-ligand interaction,
which consists of a covalent bond, is significantly improved by a peripheral XB, which gives the
covalent bond enough time to form.

2.6 XB modelling
Before, the main effects attributed to halogen atoms were hydrophobicity and dipole-moment changes. Now
we know the truth: they are as useless as before, but someone thinks they play some role as a bonding
site.
The halogen-donor interaction is of electrostatic nature, but it originates deep within quantum
mechanics, so high-order QM calculations should be employed. Obviously, this kind of calculation
is restricted to small molecules. For larger sets of atoms (up to a hundred), DFT
can be used, but for even bigger systems QM/MM hybrid methods are our only choice. However,
these calculations are computationally heavy and do not converge, so a good FF description is
required.
Here, the main problem is that X atoms are modelled as negatively charged VdW spheres,
so no XB can be seen on the horizon. We present two possible solutions:

1. extra point charges can be added to the halogen atom to reproduce the σ-hole; obviously,
the total charge is preserved. This approach only describes the predominant electrostatic
interaction

2. the halogen atom is not described as a sphere, but as an anisotropic shape, due to the polar
flattening

The first approach is more straightforward and common, as opposed to the more accurate and complex
approach number two.

2.7 Extra point charge (EPC)


Since ab initio QM calculations do see the σ-hole, we are searching for a viable way to implement this
phenomenon in FFs. A first approach consists of adding a new point charge (pseudoatom) on
the C-X bond axis. This can be done with an increasing degree of complexity:

no fit (nF) approach distance and charge are chosen previously, and the halogen charge is
modified accordingly. No QM calculations or RESP fitting are required, but charges have
to be known a priori
ruspantello fit (rF) approach after choosing distance and charge, all of the molecule charges
are refitted with RESP fitting. This approach requires an electrostatic potential grid
all fit (aF) approach the pseudoatom is treated as an extra point to fit electrostatic potential;
distance is fixed, but the charge has to be determined

The efficiency tests show that a very important parameter is the distance of the pseudoatom
from X. Overall, this is a simple approach, but at least it accounts for the interaction, even
if only qualitatively.
A possible improvement is taking the pseudoatom distance as another variable to fit, but it can
be done only on a limited set of molecules, due to its computational weight. The pseudoatom
is inserted as an extra little mass, kept in place by strong forces. In this way, we can
obtain geometries in good accord with more accurate QM/MM calculations and with PDB crys-
tallographic structures.
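A purely geometric Python sketch of the "no fit" idea: place the pseudoatom at a chosen distance beyond the halogen along the C-X axis and move part of the halogen charge onto it, so that the total charge is preserved. The distances and charges below are arbitrary placeholders, not validated parameters:

import numpy as np

def add_sigma_hole_charge(C_pos, X_pos, q_X, q_ep, d_ep):
    # pseudoatom sits on the C->X axis, beyond X, at distance d_ep from X
    u = (X_pos - C_pos) / np.linalg.norm(X_pos - C_pos)
    ep_pos = X_pos + d_ep * u
    # total charge is preserved: the halogen keeps q_X - q_ep
    return ep_pos, q_X - q_ep, q_ep

C = np.array([0.0, 0.0, 0.0])
X = np.array([1.8, 0.0, 0.0])          # hypothetical C-X bond
ep_pos, q_X_new, q_ep = add_sigma_hole_charge(C, X, q_X=-0.10, q_ep=0.05, d_ep=1.5)
print(ep_pos, q_X_new, q_ep)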

2.8 Docking procedures


We call molecular docking the tests that check whether a certain ligand can fit an enzyme pocket. These
tests are based on FFs, and can benefit from an improved description of XB. Their main aspects are
posing and scoring.

Posing is trying the fit of different poses (conformations) of the ligand. It is usually very efficient,
and it can be

rigid when both ligand and enzyme are not allowed to vibrate: only rototranslational de-
grees of freedom are explored
flexible when it is not rigid

Scoring is instead the assigning of a score to the different molecule geometries; it is the weak
point of docking procedures, since it employs an approximated binding energy function as
score function, so that this procedure could be applied to a large number of molecules.

Ideally, docking procedures should be able to create an accurate ranking of binding energies
and ligands, with a direct link to biological activity. In truth, since the scoring is inaccurate and
binding energy does not always correlate with in vivo biological activity, all of this is impossible.
What docking can do is enrich the molecular database of active compounds by reducing the
number of candidates (statistically) and by analysing the binding modes and processes. In fact, the
set of molecules selected by these procedures is statistically richer in active compounds than a
randomly selected set.
As far as we are concerned right now, adding an extra atom to describe XB is very easy and
doable in these procedures.

2.9 Scalable anisotropic model (SAM)


A possible way to measure the anisotropy of a bonded halogen atom is changing the angle of an He-Br2 interaction. Through QM calculations, we can see that the halogen atom is no longer spherical, because the interaction depends on the interaction angle.
If we compare the HF energy (no correlation, fully repulsive) with the difference between the MP2 (correlated) and HF energies (fully attractive), what we find is that the most anisotropic term is the repulsive one. Therefore, we can modify the LJ potential as

VLJ = 4ε [ ((Rvdw(He) + ⟨Rvdw(Br)⟩ − ∆R cos(να)) / r)^12 − ((Rvdw(He) + ⟨Rvdw(Br)⟩) / r)^6 ]

where the angle-dependence is in the cosine function, α = 180◦ − θ1 (θ1 is the interaction angle)
and hRvdw (Br)i is the effective bromine Van der Waals radius.
Anisotropy is considered also in the electrostatic term, by modulating the halogen charge as

ZBr = A cos(να) + B

so that the charge can depend on the angle in a way that describes the σ-hole. The final FF functional form is able to reproduce QM results with acceptable accuracy, but other descriptions are possible.
By comparison with EPC results, we see that SAM is more accurate and better captures the physics of the system, at the cost of a much more complex implementation. On the other hand, EPC accounts only for the electrostatic term, but it is easier to implement and accurate enough for many purposes.
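A minimal sketch of how an angle-dependent LJ term of the SAM kind could be evaluated is given below; the parameter names (eps, R_He, R_Br_eff, dR, nu) are illustrative placeholders, not values from any published parametrization.

```python
import numpy as np

def sam_lj(r, theta1_deg, eps, R_He, R_Br_eff, dR, nu):
    """Angle-dependent LJ of the SAM form; parameter names are illustrative."""
    alpha = np.radians(180.0 - theta1_deg)                   # alpha = 180 deg - interaction angle
    sigma_rep = R_He + R_Br_eff - dR * np.cos(nu * alpha)    # anisotropic repulsive size
    sigma_att = R_He + R_Br_eff                              # isotropic attractive size
    return 4.0 * eps * ((sigma_rep / r) ** 12 - (sigma_att / r) ** 6)
```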

Chapter 3

Introduction to sampling methods

In this chapter, we will introduce the statistical mechanics background for the modelling applications, by focusing on the different ensembles and their peculiarities

3.1 A not so brief skirmish with statistical mechanics


Now that we can write the potential, we can start thinking on how to use it. As we already know,
FFs can be used to

• confront different conformations


• look for the minimum energy geometry
• sample the behavior of the system

But before we deal with the third point, we have to get accustomed to the in silico use of sta-
tistical mechanics1 . Statistical mechanics relates the microscopic mechanical variables (q, p)
(microstate), with the macroscopic thermodynamical variables (N , V , p, T , µ, . . .) (macrostate).
In doing so, a key concept is that of phase space as the space defined by the 6N (3N generalized
coordinates plus 3N generalized momenta) degrees of freedom that describe the system.
We are going to discuss this space in a rather common example, the one dimensional harmonic
oscillator; the total energy of the system is given by
E = K + V = p²/(2m) + ½Kq²

An isolated system has E = constant, but we are going to fold in a little bit of indetermination, taking instead E ∈ [E; E + δE]. The phase space image of the HO is an ellipse; if we consider E constant, we get a line, otherwise we get a little elliptical crown. Classically speaking, the phase space is continuous, but the indetermination principle

∆x ∆p ≥ ħ/2

does not allow it in quantum mechanics. As we can see from the formula, ħ has the same units of measurement of an action, but also of a volume in the phase space, which is therefore granular.
1 For a rigorous introduction of statistical mechanics, see G. Mandelli - Introduzione alla Fisica Statistica

By solving the equations of motion, we get

x = A cos(ωt + ϕ)
p = −mAω sin(ωt + ϕ)

This means our mass-on-spring oscillates at frequency ω, with amplitude A and phase ϕ. By substitution, and using K = mω², we get

E = (mω²A²/2) sin²(ωt + ϕ) + (K/2) A² cos²(ωt + ϕ)
  = ½KA² [sin²(ωt + ϕ) + cos²(ωt + ϕ)]
  = ½KA²
Each point of the phase space can be labelled as Γ(q, p), and any property of the system will
be a function of Γ:
A = A(Γ)
Let’s suppose we are following the time evolution of the system. The observed value of A will be
the time average of all the values we get:

Aobs = ⟨A(Γ(t))⟩t = lim_{to→∞} (1/to) ∫₀^{to} dt A(Γ(t))

The time evolution is followed by integrating Newton's equations step by step, by taking a time interval δt = to/τo, where τo is the number of steps we are going to take. Therefore, we can calculate

Aobs = (1/τo) Σ_{τ=1}^{τo} A(Γ(τδt))

Alas, the time average is not the usual formulation of statistical mechanics, which was based by Gibbs on the ensemble average. In fact, we can take an infinite set of system replicas and then take the average over these; the collection of replicas is also known as ensemble, and it corresponds to a collection of points in the phase space; each replica is represented as a point in there. This way, we can find a probability distribution that describes how the systems are distributed; we call it probability density ρens(Γ).
As the system evolves, ρens changes with time, but its total time derivative is zero:

dρ/dt = 0

In principle, the partial time derivative is null only at equilibrium:

∂ρ/∂t = 0 at equilibrium, ∂ρ/∂t ≠ 0 elsewhere

This represents a sort of conservation law that goes under the name of Liouville's theorem:

dρ/dt = ∂ρ/∂t + Σ_i (∂ρ/∂qi · ∂qi/∂t + ∂ρ/∂pi · ∂pi/∂t)

where the total derivative means following the system along its cruise through the phase space, while the explicit, partial derivative means we stay in a small volume dq dp and count the systems entering and exiting.
In this formulation of statistical mechanics, we can evaluate the expectation value of an observable as

Aobs = ⟨A⟩ens = Σ_Γ A(Γ) ρ(Γ)

The ensemble average yields the same result as the time average only if a trajectory can sample
any point of the phase space, given infinite time, i.e. if the system is ergodic. At this point,
the good news is that biological systems are all ergodic; the bad news is that the time required
to sample all the phase space is too big for current computers. Therefore, enhanced sampling
techniques are applied. Since in the end it is not easy to exploit ergodicity, we divide these methods into

• molecular dynamics, that employ time averaging

• Monte Carlo sampling, that employs ensemble averaging

In principle, both methods require an infinite sampling, that is an infinite simulation time or
an infinite number of replicas, respectively. Ergodicity ensures us that

⟨A⟩time = ⟨A⟩ensemble

Ergodicity means that from any point of the phase space the system can reach any other point
of the space: a non-ergodic phase space, therefore, is divided in different regions that do not
communicate. However, due to computer limitations, it is not so easy to exploit ergodicity;
high barriers or bottlenecks may take a lot of time to be crossed, making it difficult for normal
molecular dynamics simulations. This means that molecular dynamics and Monte Carlo may differ in practice even for an ergodic system.

3.2 Ensemble overview


The density probability ρens depends on the macroscopic thermodynamic variables, i.e. on the
ensemble. Typically, we consider four different ensembles:

microcanonical where N, V and E are constant; it represents an isolated system

canonical where N, V and T are constant; it represents a system exchanging heat with a heat reservoir

isothermal isobaric where N, P and T are constant; it represents a system in contact with both a heat reservoir and a barostat

grand canonical where µ, V and T are constant; it represents a system in contact with a heat reservoir and able to exchange matter with the universe

Even though the isothermal-isobaric ensemble is the one closest to usual chemical experiments, we will focus more on the canonical one, because it is more straightforward. Nonetheless, we are going to illustrate the peculiarities of each ensemble.

3.3 Partition functions and thermodynamic potentials
Usually, we can write the probability density as the ratio between a weight Wens and a normalization factor Qens, both relative to the ensemble:

ρens(Γ) = Wens(Γ) / Qens

where the normalization factor goes under the name of partition function and corresponds to the sum of the weights:

Qens = Σ_Γ Wens(Γ)

From the partition function, we can obtain any other physical quantity; in order to compute it directly, however, we would need to know everything about the system, which makes the direct calculation a rather cumbersome task: this is why we employ simulations.
Indeed, the average of any observable can be obtained as

⟨A⟩ens = (1/Qens) Σ_Γ Wens(Γ) A(Γ)

Finally, the thermodynamic potential is the quantity that links the microscopic to the macroscopic, and it can be evaluated² as

Ψens/(kB T) = − ln Qens

We will now proceed to illustrate ρ, Q and Ψ for each ensemble we introduced.

Microcanonical The microcanonical probability density is proportional to a delta function that filters all the compatible states, that is, all the states with energy E:

ρNVE ∝ δ(H(Γ) − E) = { 0 for H(Γ) ≠ E ;  1 for H(Γ) = E }

The partition function is therefore the number of accessible states

QNVE = Σ_Γ δ(H(Γ) − E)

It is possible to use a quasi-classical approximation and write it also as

QNVE = (1/(N! h^{3N})) ∫ dΓ δ(H(Γ) − E)
Finally, the thermodynamic potential is

Ψ/(kB T) = − ln QNVE = − S/kB

where S is the entropy of the system, as a measure of the number of occupied states. The microcanonical ensemble corresponds to the conditions in which most of the simulations are run, therefore some techniques will be required to get to the desired ensemble.
2 Properly, the thermodynamic potential has the units of an energy. Prof. Pieraccini avoids the kB T factor in his definition, though. Reader discretion is advised

Canonical The canonical probability density is proportional to an exponential function of the hamiltonian H(Γ):

ρNVT ∝ exp(−H(Γ)/kB T)

The partition function is therefore the sum

QNVT = Σ_Γ exp(−H(Γ)/kB T)

It is possible to use a quasi-classical approximation and write it also as

QNVT = (1/(N! h^{3N})) ∫ dΓ exp(−H(Γ)/kB T)

Nonetheless, since H(Γ) = K(p) + V(q), these sums can be factorized as follows:

QNVT = Σ_Γ exp(−H(Γ)/kB T)
     = Σ_Γ exp(−K(p)/kB T) · Σ_Γ exp(−V(q)/kB T)
     = Qid_NVT · Qex_NVT

where Qid_NVT is the canonical partition function of an ideal gas made of the same particles and Qex_NVT is the excess partition function, which takes into account the non-ideality of the system, i.e. the potential energy. Sometimes, the term

ZNVT = ∫ dq exp(−V(q)/kB T)

is called the configuration integral; this is what we sample with a Monte Carlo sampling.
Finally, the thermodynamic potential is

Ψ/(kB T) = − ln QNVT = A/(kB T)
where A is the Helmholtz free energy.

Isothermal isobaric The isothermal isobaric probability density is proportional to an exponential function of the enthalpy H(Γ) + PV:

ρNPT ∝ exp(−(H(Γ) + PV)/kB T)

The partition function is therefore the sum

QNPT = Σ_Γ Σ_V exp(−(H(Γ) + PV)/kB T)
     = Σ_V exp(−PV/kB T) Σ_Γ exp(−H(Γ)/kB T)
     = Σ_V exp(−PV/kB T) QNVT

Finally, the thermodynamic potential is

Ψ/(kB T) = − ln QNPT = G/(kB T)

where G is the Gibbs free energy.

Grand canonical The grand canonical probability density is proportional to an exponential function of H(Γ) − µN:

ρµVT ∝ exp(−(H(Γ) − µN)/kB T)

The partition function is therefore the sum

QµVT = Σ_Γ Σ_N exp(−(H(Γ) − µN)/kB T)
     = Σ_N exp(µN/kB T) Σ_Γ exp(−H(Γ)/kB T)
     = Σ_N exp(µN/kB T) QNVT

3.4 Some relevant quantities


We can use what we have just learned to work out some interesting quantity.
First of all, the canonical average energy is

⟨E⟩ = (1/QNVT) Σ_j e^{−βHj} Hj

where β = 1/(kB T). But since

Σ_j e^{−βHj} Hj = − Σ_j ∂/∂β e^{−βHj} = − ∂/∂β Σ_j e^{−βHj}

we can write

⟨E⟩ = − (1/QNVT) ∂QNVT/∂β = − ∂ ln QNVT/∂β
Secondly, the probability of a single state is the ratio between the number of systems in that state and the total number of systems:

Pi = ni/N = e^{−βHi}/QNVT

The last term is called Boltzmann distribution and it is easily derived by considering that Pi = ρNVT for the i-th state.
In a two-states system, the ratio of the two populations is

n1/n2 = (e^{−βH1}/QNVT) · (QNVT/e^{−βH2}) = e^{−βH1} e^{βH2} = e^{−β∆H}

By reversing, we obtain

∆H = −kB T ln(n1/n2)
This makes our simulations quite crucial, because they allow us to get each state population,
from which we get the energy gap between the two. Now, a device is required that will allow us
to efficiently sample the phase space.
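As a minimal sketch of this last step, the energy gap can be obtained directly from the two populations counted along a simulation; kB is here expressed in kcal/(mol K) and the populations are made-up numbers for illustration.

```python
import numpy as np

kB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def energy_gap(n1, n2, T=300.0):
    """Gap between two states from their sampled populations: dH = -kB T ln(n1/n2)."""
    return -kB * T * np.log(n1 / n2)

# e.g. 8000 frames in state 1 and 2000 in state 2
print(energy_gap(8000, 2000))   # ~ -0.83 kcal/mol: state 1 lies below state 2
```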

Chapter 4

Molecular Dynamics

In this chapter, we will discuss the main algorithms and techniques employed
to follow the time evolution of a molecular system, by application of force
fields and classical mechanics

4.1 Finite differences method


Molecular dynamics simulates the time evolution of a molecular system, following the classical dynamics from a starting configuration; this way, the microcanonical ensemble is sampled.
For each particle we have to solve Newton's equation

mi d²qi/dt² = Fi

where Fi = −∇i U; U is the potential energy evaluated through the force field. Since position and force are vectors, for an N-particle system we have to solve 3N second order, ordinary, differential equations.
On the other hand, we can start from

dqi/dt = pi/mi

so that

dpi/dt = mi dvi/dt = Fi = −∇i U

This gives us 6N first order ordinary differential equations, corresponding to the Hamilton's equations of the system. To solve the dynamics numerically, we employ a finite difference method, in which from the position and its derivatives at t we estimate them at t + δt, with δt a small finite time increment. We can employ the following different variations of this method.

Verlet algorithm We start from the forward and backward Taylor expansions of the position:

r(t + δt) = r(t) + ṙ δt + ½ r̈ δt² + ⅙ (d³r/dt³) δt³ + o(δt³)
r(t − δt) = r(t) − ṙ δt + ½ r̈ δt² − ⅙ (d³r/dt³) δt³ + o(δt³)

34
If we sum them, we obtain

r(t + δt) + r(t − δt) = 2r(t) + r̈ δt² + o(δt³)

that is, by truncation at second order,

r(t + δt) = 2r(t) − r(t − δt) + r̈ δt²

This formula, containing the acceleration a = r̈, is precise up to the third order, but it has some problems. First, velocity does not appear; this is a problem if we need the kinetic energy or a constant temperature (canonical ensemble); we can solve this problem with the formula

v(t) = [r(t + δt) − r(t − δt)] / (2δt)
This creates a lag between the knowledge of the position and that of the velocity. Finally, r(t+δt)
is the sum of a first order term and a second order term, so a small term is added to a big one: this
is a possible source for numerical errors. Figure 4.1 is a graphical representation of the algorithm;
as we can see, the knowledge of the previous step is required, so the first step is always done with another integration method, as imprecise as we like. Verlet is then applied from the second step.

Leap frog algorithm We start from these two equations:

r(t + δt) = r(t) + v(t + ½δt) δt
v(t + ½δt) = v(t − ½δt) + a(t) δt

This way, there is no mixing between first and second order terms. A graphical representation is presented in Figure 4.1.

Velocity Verlet algorithm This algorithm is composed of two equations:

r(t + δt) = r(t) + v(t) δt + ½ a(t) δt²
v(t + δt) = v(t) + ½ [a(t) + a(t + δt)] δt

The second equation is evaluated in two half-steps, because a(t + δt) becomes available only after the new positions have been computed:

v(t + δt) = v(t) + ½ a(t) δt + ½ a(t + δt) δt
          = v(t + ½δt) + ½ a(t + δt) δt

The graphical representation of this algorithm is in Figure 4.1.
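The following is a minimal, self-contained sketch of the velocity Verlet loop applied to a 1D harmonic oscillator (the force function and parameters are illustrative, not part of any specific MD package).

```python
import numpy as np

def velocity_verlet(r, v, force, m, dt, n_steps):
    """Propagate positions and velocities; force(r) returns the force array."""
    a = force(r) / m
    traj = [r.copy()]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt**2     # position update
        a_new = force(r) / m                 # acceleration at the new positions
        v = v + 0.5 * (a + a_new) * dt       # velocity update with both accelerations
        a = a_new
        traj.append(r.copy())
    return np.array(traj), v

# toy test: harmonic oscillator with K = m = 1
traj, v = velocity_verlet(np.array([1.0]), np.array([0.0]), lambda x: -x, 1.0, 0.01, 1000)
```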

Figure 4.1: Various forms of motion integration; (a) Verlet algorithm (b) Leap frog algorithm (c) Velocity Verlet algorithm

4.2 Time step in molecular dynamics


The size of the δt time step is crucial; the smaller it is, the better the approximation, but the
higher the number of steps required to sample the phase space; this means higher computer
time, that is proportional to the number of steps. Another problem can arise due to the time
step magnitude. Let’s consider a box of noble gas atoms: if δt is small, we will have a small
displacement, therefore a small force variation; on the other hand, if we take a big δt, we may push some atoms into each other, resulting in enormous forces and abrupt crashes.
To avoid this, we must consider the temperature of the dynamics: the higher the temperature, the higher the speed of the molecules, and the smaller we must take the time step. Typically, δt is smaller than the period of the fastest relevant molecular motions, around a few femtoseconds. This means that about 10⁶ steps are required to sample 1 ns. The usual length of a dynamics is around 100 ns for a biological system on a cluster, up to some milliseconds for small systems on dedicated HPC hardware. The problem is that interesting phenomena, such as molecular recognition and protein folding, require more time: plain MD is not enough.

4.3 About the stability of MD trajectories


The stability of a trajectory is related to its dependence on the initial conditions; in molecular
dynamics, a small displacement of the initial conditions can bring the system to completely dif-
ferent points through completely different trajectories. For this, we say that molecular dynamics
trajectories are unstable or chaotic.
If a trajectory is defined as

r(t) = f(r₀, p₀; t)

a "displaced" (or perturbed) trajectory can be of the kind

r′(t) = f(r₀, p₀ + ε; t)

where the small perturbation ε is expected to carry a very significant effect. This effect can be
evaluated as
∆r = r(t) − r′(t)

For short simulations, ∆r is linear in ε, but its absolute value increases exponentially with time:

|∆r| ∝ ε e^{λt}
The λ coefficient is also known as first Lyapunov exponent; indeed, this kind of instability is
called Lyapunov stability, i.e. the solutions are highly dependent on the initial conditions
(like, e.g. weather forecast).
To keep ∆r below a critical threshold ∆max for a time t < tmax , we have to limit the initial
perturbation to
ε ∝ ∆max e−λtmax
This means that for a long simulation we can only afford a small perturbation: our MD simulations are not at all tolerant to perturbations. At this point, we should recall why we resorted to MD in the first place. MD trajectories are in fact run to sample the phase space, so that in a long simulation the trajectory crosses its own path multiple times; however, the sequence of sampled points is not relevant, as long as we are sampling them: the difference in paths is irrelevant and the time averages must be equal. If that is not true, it is due to a lack of convergence, not of stability.

4.4 Size and boundaries


In MD, we are interested in simulating biological systems so that a single-molecule in silico experiment could give us all the bulk properties we need. Alas, simulations of systems such as a beaker of protein solution are not possible, so we have to limit ourselves to smaller systems, like a single protein molecule surrounded by water, i.e. a microscopic drop of the aforementioned solution.
The first problem that arises is keeping the drop together: the cohesive forces of water should be enough, but a containing potential may be required. This is not the main difference though: in a small system, a vast percentage of atoms is on the surface, subjected to different forces than their bulk brothers; this means that bulk properties will be largely influenced by border effects and therefore be quite unrelated to the usual experiments. The fraction of surface atoms decreases with the system dimension, but the computation time increases; to circumvent this problem we apply periodic boundary conditions.
Let’s consider a system contained in a 2D square box. Applying PBC means replicating the
original box around itself, with all its content. If an atom in the original box moves, its movement
is replicated in all the other boxes. This way, if an atom exits the box, it is replaced by one of its replicas coming in from a neighbouring box: the number of atoms in the box is constant. The boundary conditions are not physical, but they allow us to reduce the computational effort by storing just the original box coordinates and at the same time removing every border effect. In 2D, it is like folding the plane into a cylinder. This can create artificial crystallinity, which can be a problem for solid-phase transitions, in which long range oscillations are present; moreover, some care is needed when simulating a double layer. Obviously, any space-filling polyhedron can be used as box: the more spherical the better.
As far as MD goes, two problems arise:

1. long range interactions (decaying as r^{−n} with n ≤ 3) require special tricks to avoid interactions between one atom and its own replica

Figure 4.2: Pictorial representation of the periodic boundary condition setup

2. the potential energy now should take into account all the infinite interactions between the
infinite number of particles
To solve this mess, the minimal image convention is employed: each particle only interacts
with the ones contained in a box of the same size and shape of the original box, centred on the
particle itself. This means that each and every particle interacts with the closest periodic images,
just its closest periodic images and nothing more.
Quoth the raven nevermore
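A minimal sketch of the minimum image convention for a cubic box is given below; box side and coordinates are arbitrary illustrative numbers.

```python
import numpy as np

def minimum_image_distance(ri, rj, L):
    """Distance between two particles in a cubic box of side L, using the minimum image convention."""
    d = ri - rj
    d -= L * np.round(d / L)     # fold each component into [-L/2, L/2)
    return np.linalg.norm(d)

# two particles close to opposite faces of a 10x10x10 box are actually neighbours
print(minimum_image_distance(np.array([0.5, 0.0, 0.0]),
                             np.array([9.5, 0.0, 0.0]), 10.0))   # 1.0
```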
In this fresh new virtual box, the number of particles is the same as in the original box, so for an N-particle system, each of them interacts with the N − 1 others, for a total of

½ N(N − 1) ≃ ½ N²
interactions. This is now a workable number, but it can be further reduced. At the end of the
day, each atom interacts with its nearest neighbours, so we can define a sphere that contains
all of them. The radius of this sphere is known as cut-off radius; outside of it there are no
interactions, but it must be shorter than one half of the box side. Clearly, the larger the box, the
more we save by using it. The cut-off radius can be introduced in a lot of colourful and different
ways, some of which are illustrated hereafter.

Simple cut-off It is the most basic, and it employs a Heaviside function θ(rc − rij):

U′(rij) = U(rij) θ(rc − rij)

where U(rij) is the original potential and the Heaviside function is defined as

θ(x) = 1 for x > 0,  0 for x < 0
The rather obvious discontinuity makes this potential a real cat to skin when it comes to motion
integration.

Switch approach We can smooth the previous approach by switching the potential to zero in a gentler way. To do this, we employ not one, but two cut-off radii. Analytically, this is done by considering

U′(rij) = wij S(rij) U(rij)

where wij is a simple weight and S is the switch function defined as

S(rij) = 1 for r ≤ rc1;  f(r, rc1, rc2) for rc1 < r < rc2;  0 for r ≥ rc2

Shift approach Instead of cutting and stitching functions together, we apply a perturbation to the potential, so that it goes to zero at rc. The analytical form is the same,

U′(rij) = wij S(rij) U(rij)

but now

S(rij) = 1 − (r/rc)²

This is the smoothest approach, but we are altering the potential: numerically better, physically worse.
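A minimal sketch of the two smoothing schemes applied to a Lennard-Jones pair energy follows; the particular smoothstep chosen for f(r, rc1, rc2) is only one possible choice, and the parameters are in reduced units.

```python
import numpy as np

def lj(r, eps=1.0, sigma=1.0):
    return 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

def switched(r, rc1, rc2):
    """Switch approach: one possible smooth ramp between the two cut-off radii."""
    if r <= rc1:
        s = 1.0
    elif r >= rc2:
        s = 0.0
    else:
        x = (rc2 - r) / (rc2 - rc1)
        s = x * x * (3.0 - 2.0 * x)   # goes smoothly from 1 at rc1 to 0 at rc2
    return s * lj(r)

def shifted(r, rc):
    """Shift approach: S = 1 - (r/rc)^2 inside the cut-off, zero outside."""
    return (1.0 - (r / rc) ** 2) * lj(r) if r < rc else 0.0
```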

4.5 (En)sampling different ensembles
Since experimental data are obtained in different ensembles from microcanonical, we need meth-
ods to sample different ensembles. We show some of these methods afterwards.

Canonical sampling In the canonical ensemble the temperature is constant, due to a thermo-
stat. This forces a modification in the motion equations, modification that – as always – comes
in different tastes and colours.
A rather crude, rough-as-hell approach starts from the definitions of kinetic energy, both in classical mechanics and in kinetic gas theory:

K = ½ Σ_{i=1}^{N} mi vi² = (3/2) N kB T

this leads to

T = (1/(3N kB)) Σ_{i=1}^{N} mi vi²

If we want T = T′, we can therefore scale the velocities to

vi′ = λ vi

where the scaling factor λ can be obtained by considering

∆T = T′ − T = (1/(3N kB)) Σ_{i=1}^{N} mi vi² (λ² − 1) = (λ² − 1) T

so that

T′ − T = λ²T − T  ⇒  λ = √(T′/T)
This method is very efficient, but not very physical: a real thermostat does not act instantaneously, and temperature fluctuations should occur. What we sample is then not the real canonical ensemble, but just an ensemble with constant T.
An improvement comes under the name of Berendsen's thermostat, which makes the system converge exponentially to the thermostat temperature. To do so, we write the time derivative of the temperature as

dT/dt = (1/τ)(Tb − T)

where τ is the coupling constant (a time) and Tb is the bath temperature. For a finite time step δt, we get

∆T = (δt/τ)(Tb − T)

and by using this ∆T in the previous equation, we get the velocity scaling factor as

λ = √{1 + (δt/τ)[(Tb/T) − 1]}

With δt = τ we clearly get back to the previous thermostat. The exponential convergence comes from solving the differential equation

dT/dt = (1/τ)(Tb − T)

by separation of variables. Briefly,

dT = (Tb − T) dt/τ
∫ dT/(Tb − T) = ∫ dt/τ
− ln|Tb − T| = ∆t/τ + const
|Tb − T| ∝ e^{−∆t/τ}

A large τ therefore means a weak thermostat-system coupling, and vice versa. As a rule of thumb, a small τ is employed during the equilibration phase, switching to a large τ in the production phase, so that we don't strain the system in equilibrium conditions.
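A minimal sketch of the Berendsen rescaling step is shown below; the bath temperature, time step and coupling time in the comment are illustrative values, and kB is left generic (reduced units).

```python
import numpy as np

def berendsen_lambda(T_current, T_bath, dt, tau):
    """Velocity scaling factor of the Berendsen thermostat."""
    return np.sqrt(1.0 + (dt / tau) * (T_bath / T_current - 1.0))

def instantaneous_T(v, m, kB=1.0):
    """Kinetic temperature from velocities: T = sum(m v^2) / (3 N kB)."""
    N = v.shape[0]
    return np.sum(m * np.sum(v**2, axis=1)) / (3.0 * N * kB)

# e.g. scale velocities toward a 300 K bath with tau = 0.1 ps and dt = 0.002 ps:
# v *= berendsen_lambda(instantaneous_T(v, m), 300.0, 0.002, 0.1)
```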
Another approach is known as Andersen's thermostat, and its main goal is to reproduce the physics of the heat exchange. Microscopically speaking, the atoms bump into each other, resulting in a change of kinetic energy, which is a temperature change in itself. To model this, at regular intervals of the dynamics, we replace the velocities of a random set of molecules with velocities extracted from the Maxwell-Boltzmann distribution at Tb; every time, we perturb the system, jumping from one microcanonical PES to another, until all the particles reproduce the Maxwell-Boltzmann distribution at the target temperature.
Of the three methods, the first one is the least used, while the Berendsen one is not able to fully reproduce the canonical ensemble; only the Andersen thermostat is able to reproduce the canonical distribution after a given equilibration time. Obviously, other methods exist, like directly modifying the equations of motion or surrounding the system with a thermostatic fluid.

Isothermal-isobaric sampling The isothermal-isobaric ensemble requires a barostat, because


macroscopically the system is allowed to change volume. The volume fluctuation is related to the isothermal compressibility K:

K = −(1/V)(∂V/∂p)_T

A large K is proper of an easily compressible system, which presents large fluctuations of volume. The compressibility can also be expressed in terms of those fluctuations as

K = β (⟨V²⟩ − ⟨V⟩²)/⟨V⟩
In gas phase, volume fluctuations can be larger than your typical box, so that bigger boxes are required for the isothermal-isobaric ensemble; fortunately, biological systems are usually simulated in water, a rather incompressible fluid (K ≈ 44.75 × 10⁻⁶ atm⁻¹); this allows us to choose the box based only on our system dimensions and not on K. Otherwise, keeping P constant works like keeping T constant, but positions are scaled instead of velocities. For example, Berendsen's barostat is obtained through

dP/dt = (1/τp)(Pb − P),    ri′ = ∛λ · ri

In conclusion, pressure fluctuates much more than temperature.

4.6 Verlet neighbour list
As we have just seen, periodic boundary conditions are a very useful tool in molecular dynamics.
They can be implemented through various and different boxes, like cubes, truncated octahedrons
or rhombic dodecahedron. Carefully choosing the box allows a reduction in the amount of solvent
to be computed; moreover, whatever box we choose, the specie of interest must not feel its own
replicas.
Previously, we introduced minimum image convention and cut-off radius to furtherly reduce
the amount of interactions we have to compute. We can estimate that for an L-sided box,
implementing a cut-off radius rc will produce a net computational gain of
4 rc3
gain =
3 L3
That is, the bigger the box, the higher the gain.
However, inserting a cut-off radius means estimating which molecules are in and which are
out: this requires calculating distances –i.e. square roots– between all molecules, with a significant
computational effort. But clearly some particles are too far away to be inside the cut-off radius
in the near future, and only the distances of the nearest ones are relevant for the cut-off. By selecting those particles that can get into the cut-off radius within a few integration steps, we are compiling the so-called Verlet neighbour list; in practice, it contains all molecules inside a spherical shell that could cross the cut-off sphere in somewhere between 10 and 20 steps.
computing all the distances, just those of the Verlet list are calculated to update the cut-off list;
after a certain time interval, the Verlet list itself is updated, by computing all distances. It is
obvious at this point that both crust size and refresh time of the Verlet neighbour list depend
on the temperature.
An alternative to this is the cell structure: the box is divided into cells, so that we look for neighbours only in the cells that surround the one containing the particle; this way, we significantly reduce the number of calculations.
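A minimal sketch of how a Verlet neighbour list could be built for a cubic box is shown below; the shell ("skin") width and the rebuild interval are assumptions of the example, not prescribed values.

```python
import numpy as np

def build_verlet_list(pos, L, r_cut, skin):
    """All pairs within r_cut + skin (minimum image, cubic box); rebuilt only every 10-20 steps."""
    r_list = r_cut + skin
    N = len(pos)
    pairs = []
    for i in range(N - 1):
        d = pos[i + 1:] - pos[i]
        d -= L * np.round(d / L)              # minimum image convention
        dist2 = np.sum(d * d, axis=1)
        for j in np.where(dist2 < r_list**2)[0]:
            pairs.append((i, i + 1 + j))
    return pairs

# at every MD step, only the stored pairs are tested against r_cut itself
```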

4.7 Particle Mesh Ewald approach


PBC are just fine with short range interactions, like Van der Waals or dipole-dipole interactions. However, long range interactions, like those between net charges, are too big and require distances too long to be contained in reasonable boxes. Since net charges are very common, a trick is needed: we write the potential energy of the particles as two terms:

U = Σ_{rij < rc} U(rij) + (N/2) ρ ∫_{rc}^{+∞} dr U(r) 4πr²

where the first one is for those interactions in the cut-off radius, while the second one is an
estimate of everything outside, a sort of mean field in polar coordinates, multiplied by the density
ρ. For short range interactions, the second term can be dropped, but this is not possible for long
range ones. To evaluate this, we will proceed similarly to this example.
Let’s consider two positive charges at a certain distance. We can add and subtract a gaussian
of charge to each of them, i.e. we can put a diffuse gaussian of positive charge and one of negative charge on each of them. This way we preserve the total charge, but we split the potential into
U = Um + Ug + Ugc
where

Um is the dipole-dipole interaction that arises between the two dipoles made by the positive
charge and the negative gaussian. It is short range and can go with PBC
Ug is the interaction between the gaussian charges. It is more easily evaluated in the reciprocal
space, so after Fourier transform; by anti-transform we then obtain the potential term
Ugc is the interaction between the charge and its own gaussians. It is constant since it does not
depend on distance, so it is evaluated one time for all

With this approach, the charge interaction is split into three terms, but it is made compatible with PBC (rather obviously, since it was developed for crystallography). All in all, it goes under the name of Particle Mesh Ewald approach.

4.8 Equilibration phase


At the beginning of a dynamics, the initial geometry of the molecule and of the solvent may not be perfectly realistic: the system needs to be equilibrated a bit through short molecular dynamics runs. First, an NVT dynamics is run to equilibrate the temperature (every dynamics starts at 0 K); then an NPT dynamics is run to equilibrate the density. These will run for some picoseconds, then an ensemble is chosen to run the production phase, which will cover up to some thousands of nanoseconds.
In the equilibration phase, the artificial positions may produce rogue forces in the solvent-protein interaction, capable of breaking the protein; to avoid this, some restraints are applied to the protein atom positions.

4.9 Restraints and constraints


They seem similar, but they are not.

Constraints fix the value: the equations of motion are modified to respect this. They are employed mostly to freeze degrees of freedom, making longer time steps a viable option
Restraints add a penalty for any variation of the value, without modifying the equations but
just the force field. They are put to preserve geometry in equilibration phase or to reproduce
experimental data (like NMR in NMR refinement)

We will now discuss further their peculiarities.

4.10 Just restraints


Position restraints are applied to protect the geometries during the equilibration phase. We do it through a harmonic potential:

Vpr = ½ Kpr (ri − Ri)²

where Ri is the target position, from experimental data. A parabolic potential means linear forces:

F = −∇Vpr = −Kpr (ri − Ri)

The force constant Kpr is arbitrary, so we can start with a very strong one, then reduce it in subsequent phases.

Angle restraints are applied to enforce particular geometries, thanks to a goniometric function:

Var(ri, rj, rk, rl) = Kar {1 − cos[n(θ − θ₀)]}

This formula means that this kind of restraint is applied to four atoms. The θ angle is obtained through the inner product

rji · rkl = |rji||rkl| cos θ

by inversion:

θ = arccos(r̂ji · r̂kl)

where the r̂ notation indicates a unit vector. The multiplicity number n allows us to distinguish between parallel and antiparallel reciprocal configurations: n = 2 means that parallel and antiparallel configurations are equivalent, while n = 1 discriminates between them. In the same way, we can restrain the orientation of a molecule, by fixing the angle with respect to one of the axes:

θ = arccos(r̂ji · ẑ)

Distance restraints are used to take into account experimental data, usually NMR proton distances. The most trivial way is imposing a virtual bond between these atoms, through a harmonic potential. However, this method does not consider that experimental data come with experimental uncertainty, so we should restrain the atoms into an interval, rather than to a position. To do so, we apply a piece-wise potential, with no penalty inside the interval:

Vdr = ½ Kdr (r − r₀)²                       for r < r₀
Vdr = 0                                     for r₀ ≤ r ≤ r₁
Vdr = ½ Kdr (r − r₁)²                       for r₁ < r < r₂
Vdr = ½ Kdr (r₂ − r₁)(2r − r₂ − r₁)         for r ≥ r₂
The linear potential is applied so that very wrong starting geometries are not subjected to very
strong forces; the linear potential gently guides the molecule into the restraint well. Obviously,

F = −∇V

In principle, this method works, but it has a limit: very flexible structures tend to fluctuate
during NMR data collection, so what we get is an average of a lot of geometries; this means
that all these restraints cannot be simultaneously satisfied, but the molecule should be allowed
to move.
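A minimal sketch of this flat-bottom restraint potential, written exactly as the piece-wise form above (arguments are the interval bounds and the force constant):

```python
def flat_bottom_restraint(r, r0, r1, r2, K):
    """Piece-wise distance restraint: harmonic below r0 and above r1, flat in between, linear beyond r2."""
    if r < r0:
        return 0.5 * K * (r - r0) ** 2
    elif r <= r1:
        return 0.0
    elif r < r2:
        return 0.5 * K * (r - r1) ** 2
    else:
        # linear branch, continuous with the harmonic one at r = r2
        return 0.5 * K * (r2 - r1) * (2.0 * r - r2 - r1)
```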
This kind of restraint kills flexibility, so a time-averaged one must be implemented. This is possible just by considering the average distance r̄ instead of the instantaneous one (r). Since MD is a time evolution method, r̄ is easily available, but in the first steps its value has little to no meaning: we need some tool to enforce the restraints only when enough history has been collected; this requires a time-dependent constant:

K^a_dr = Kdr [1 − e^{−t/τ}]

where τ is the coupling constant that tunes the switching speed (a big τ is a small coupling). In this way, MD is easily adapted to the necessities of NMR refinement. In fact, NMR yields many more compatible structures than the single X-ray one, since its data represent what really happens in solution: the molecule is free to move, while in the crystalline phase it is frozen in place.

4.11 Constraints only
Constraints are employed to freeze uninteresting coordinates, like bond vibrations; this way, longer steps can be made. A simple way to apply constraints is to switch to normal modes and propagate just those we are interested in, but this approach is only viable for small molecules. What we do, then, is solving Newton's equations without constraints first, and later applying a virtual corrective force to keep the bond length at the given value:

mi r̈i = fi + gi

for each i-th atom, where fi is the natural force and gi is the constraint force. At this point, the constraint equation is solved, to determine the magnitude of gi:

χij = rij²(t) − dij² = 0

basically, at each time step we make sure that the bond length rij is equal to the target bond length dij.
Since the equations of motion are solved through approximations, the constraints carry a small error too; these errors may deviate our trajectory, so it is better to include them at each step of the dynamics, by building them into the algorithm. In formulas, we have

ma r̈a = fa + ga ≃ fa + ga^{(2)}

where ga^{(2)} is the approximated correction force on the a-th atom. This can be inserted in the Verlet algorithm, obtaining

ra(t + δt) = 2ra(t) − ra(t − δt) + (δt²/ma) fa + (δt²/ma) ga^{(2)}(t)
           = r′a(t + δt) + (δt²/ma) ga^{(2)}(t)

in which we indicate the position of the atom with no constraints as r′a.
Obviously, the constraints must be directed along the bond and they must respect the action-reaction principle; we can express each of them through an undetermined multiplier, λij. For a system with i-j-k connectivity, this boils down to

g1^{(2)} = λ12 r12
g2^{(2)} = λ23 r23 − λ12 r12
g3^{(2)} = −λ23 r23
So that for each atom, the Verlet formula is applied:

ri(t + δt) = r′i(t + δt) + (δt²/mi) gi^{(2)}(t)

where gi^{(2)} is one of the aforementioned forces. At this point, we can calculate

rij(t + δt) = ri(t + δt) − rj(t + δt)
            = r′i(t + δt) − r′j(t + δt) + (δt²/mi) gi − (δt²/mj) gj
            = r′ij(t + δt) + (δt²/mi) gi − (δt²/mj) gj

With this, we can work out λ, by solving the constraint equation:

|rij(t + δt)|² = |rij(t)|² = dij²

This formula means that at each time step, the rij we get from Verlet must stay constant and equal to the target distance. We can solve this problem only by iteration, because through the trinomial square

(a + b + c)² = a² + b² + c² + 2ab + 2bc + 2ac

we get a system of second degree equations in λ, which cannot be solved directly; however, since the λ² terms are quartic in δt, they are negligible in first approximation and we can iterate the following solving algorithm:

1. λ² terms are neglected, leaving us with linear λ equations that can be solved directly
2. we put the results into the λ² terms, obtaining linear λ equations with different known coefficients
3. these coefficients are employed in the next step, iterating
4. the algorithm is stopped when point-wise convergence is reached:

λij(n + 1) ≃ λij(n)
This way, we can see that enforcing constraints boils down to solving a system of algebraic equations by iteration; this approach may require the inversion of a matrix, but since only the nearest neighbours are taken into account, the matrix is full of zeros. An alternative to this algorithm is the so-called SHAKE algorithm, in which constraints are applied one by one.
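A minimal sketch of the SHAKE-style iteration for a single bond constraint is given below: the quadratic term in λ is neglected at each pass, exactly as in the scheme above, and the correction is iterated until the bond length matches the target. Masses, tolerance and the iteration cap are illustrative choices.

```python
import numpy as np

def shake_bond(r1, r2, r1_old, r2_old, d, m1, m2, tol=1e-10, max_iter=100):
    """Iteratively correct the unconstrained positions r1, r2 so that |r1 - r2| = d."""
    r12_old = r1_old - r2_old                  # bond vector at the previous step (constraint direction)
    for _ in range(max_iter):
        r12 = r1 - r2
        diff = np.dot(r12, r12) - d * d        # violation of the constraint equation
        if abs(diff) < tol:
            break
        g = diff / (2.0 * (1.0 / m1 + 1.0 / m2) * np.dot(r12, r12_old))
        r1 -= (g / m1) * r12_old               # opposite corrections on the two atoms
        r2 += (g / m2) * r12_old
    return r1, r2
```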
Example 1. Let's consider the S=C=S molecule. The number of vibrational degrees of freedom to constrain is obviously 3 · 3 − 5 = 4, but we can actually just constrain the two S=C bonds and the S–S distance.
Example 2. The molecule of benzene has 6 · 3 − 6 = 12 vibrational constraints to be applied; unfortunately, since the constraints are directed along the bonds, they may not preserve the molecular planarity, so more care is needed.
Example 3 (Velocity Verlet). Velocity Verlet requires us to evaluate both the half-step velocity and the full-step velocity, becoming

ra(t + δt) = r′a(t + δt) + (ga(t)/ma) δt²
va(t + ½δt) = v′a(t + ½δt) + (ga(t)/ma) (δt/2)
va(t + δt) = v′a(t + δt) + (ga(t + ½δt)/ma) (δt/2)

Therefore, this algorithm actually considers that the bond can move at each half step, correcting with a new constraint force.

4.12 Analysis of MD Simulations


Trajectory information During and after an MD simulation, looking at the trajectory allows the expert operator to see if something went –or is going– wrong. For example, keeping the energy and its contributions in check can help us understand when, how and what went wrong.

RMSD Another relevant measure we can extract from MD simulations is the Root Mean Square Deviation or RMSD:

RMSD(t) = √[ (1/M) Σ_i mi |ri(t) − ri^ref|² ]

where M is the total mass and ri^ref represents a reference geometry for the molecule. RMSD gives us information on how much and when the structure changed. Obviously, the ri are internal coordinates, since the molecule can move inside the box.
In RMSD evaluation, usually only the backbone is employed, since substituents and side-chains are very flexible. They can be adjusted with least squares fitting later. If ligands are to be considered, it is better to use least squares fitting on the protein geometry and then to evaluate the RMSD of the ligand.
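A minimal sketch of the mass-weighted RMSD for a single, already superimposed frame is given below; the loop over frames and the least-squares fit are left out on purpose.

```python
import numpy as np

def rmsd(coords, ref, masses):
    """Mass-weighted RMSD between one frame and a reference (both already superimposed)."""
    M = masses.sum()
    dev2 = np.sum((coords - ref) ** 2, axis=1)   # |ri - ri_ref|^2 per atom
    return np.sqrt(np.sum(masses * dev2) / M)

# typical use: evaluate over all trajectory frames after fitting on the backbone atoms
```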

RMSF The Root Mean Square Fluctuation (RMSF) is the time average of the geometry differences for each atom:

RMSF_i = √[ (1/T) Σ_{t=1}^{T} |ri(t) − ri^ref|² ]

This value is related to thermal ellipsoids and it can be used to estimate the molecule flexibility.

Radius of gyration The radius of gyration can be calculated with formulas like

Rg = √( Σ_i mi |ri|² / Σ_i mi )

where the ri are measured from the center of mass. It represents the compactness of the structure. A big Rg means the molecule is elongated, a small one means it is globular; it has clear applications in protein folding analysis.

Distance in the structure Sometimes, it can be useful to know the distance between two
given amino acids, because it can be related to folding or catalytic activity. We can look at all
of them through the symmetric distance matrix, that stores all the distances values.

Hydrogen bond The presence of an H-bond is verified by checking geometric criteria such as:

r ≤ rHB = 0.35 nm
α ≤ αHB = 30°

The most interesting thing, though, is how long these bonds last and how many times they appear in a specific position during the dynamics.

Secondary structure The secondary structure is basic for understanding protein folding,
stability and structure. It can be checked through various geometrical parameters.

Ramachandran plot For each amino acid, the two backbone dihedral angles ϕ and ψ can be plotted, obtaining the so-called Ramachandran plot; the populated regions of the plot correspond to the allowed backbone conformations and to secondary structure motifs such as α-helices and β-sheets.

Time evolution Since MD follows the time evolution of the system, we can obtain not only statistical information, but dynamical information too.

Chapter 5

Monte Carlo methods

In this chapter, we will discuss the techniques and algorithms of the Monte
Carlo family, that allow the ensemble sampling of a molecular system

5.1 Monte Carlo hit or miss


In this chapter, we will discuss the method to generate ensemble average, in opposition to molec-
ular dynamics, that follows the time evolution of the system. The Monte Carlo approach was
developed during the roaring years of WWII, to describe the diffusion of neutrons in nuclear
reactors; nonetheless, it works fine for a lot of other problems. The core concept of MC is turning
a deterministic problem into a stochastic one, i.e. one that involves the generation of random
numbers. We will consider its different flavours by applying it to integral evaluation.
The most crude approach we can think of is called Monte Carlo hit or miss. Let's try to work out the value of π. To do so, we can take the circular sector identified by the unit circle in the first quadrant, and a square of unitary side. Their surfaces are

Acs = r²π/4 = π/4        As = r² = 1

therefore,

π = 4 Acs/As

If we generate couples of random numbers between 0 and 1, some of the resulting points will fall inside the circle and some of them outside; it's rather obvious that

π ≃ 4 τhit/τshot

where τhit is the number of points that fall into the circular sector, while τshot is the total number of points. This method requires 2τshot random numbers, but converges with an error proportional to 1/√τshot; before convergence, the value of π will oscillate above and below the exact number.
This technique is quite expensive.
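A minimal sketch of the hit or miss estimate of π, using Python's standard pseudo-random generator (the number of shots is an arbitrary example):

```python
import random

def pi_hit_or_miss(n_shots):
    """Estimate pi by counting random points that fall inside the unit quarter circle."""
    hits = 0
    for _ in range(n_shots):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_shots

print(pi_hit_or_miss(1_000_000))   # ~3.14, error decreasing as 1/sqrt(n_shots)
```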

5.2 Monte Carlo sample mean
This is a neat improvement over hit or miss. To evaluate the integral

F = ∫_{x1}^{x2} f(x) dx

we multiply and divide the integrand by a probability distribution ρ(x):

F = ∫_{x1}^{x2} [f(x)/ρ(x)] ρ(x) dx = ⟨ f(ζτ)/ρ(ζτ) ⟩_trials

where the ζτ are random numbers drawn from ρ. This means our integral is the average of the f/ρ function.


If we take ρ as a uniform distribution between x1 = 0 and x2 = 1, we get

ρ(x) = 1/(x2 − x1) = 1

that is

F ≃ ⟨(x2 − x1) f(ζτ)⟩ = [(x2 − x1)/τmax] Σ_{τ=1}^{τmax} f(ζτ)

In other words,

F ≃ (1/τmax) Σ_{τ=1}^{τmax} f(ζτ)
Otherwise, if we are interested in an interval like [0; 10], we can take the uniform distribution

ρ = 1/(x2 − x1) = 1/10

and evaluate the integral as

F ≃ (10/τmax) Σ_{τ=1}^{τmax} f(ζτ)

This means we can evaluate π with

F = ∫₀¹ √(1 − x²) dx ≃ (1/τmax) Σ_τ f(ζτ) = π/4
In the end, this approach cannot compete with grid methods like rectangles or Cavalieri-Simpson on 1D integrals. On the other hand, if we think of high-dimensional systems, everything changes. A box of 100 particles is described by 300 coordinates; to get the value of

ZNVT = ∫ dΓ e^{−βU(Γ)}

(where Γ is the configuration vector) we would need at least a 10 × . . . × 10 grid, that is 10³⁰⁰ evaluations of the integrand function; on the other hand, if we take the sample mean integral

ZNVT ≃ (V^N/τmax) Σ_τ e^{−βU(Γτ)}
sampling just 10 random configurations costs only 10 × 300 = 3000 random numbers. All considered, Monte Carlo sample mean gives us easy access to the average of an observable

⟨A⟩NVT = (1/ZNVT) ∫ dΓ A e^{−βU(Γ)} ≃ [ Σ_{τ=1}^{τmax} A(τ) e^{−βU(Γτ)} ] / [ Σ_τ e^{−βU(Γτ)} ]

This is an improvement on hit or miss, but we need more! We want more! We crave for more!

5.3 Monte Carlo importance sampling


Until now we employed only uniform distributions: this is not a problem if the integrand is a well-spread, featureless function; if instead the function is quite sharp (or with compact support), a lot of Monte Carlo points are wasted to evaluate its tails, which do not largely contribute
to the integral value. It would be nice to sample more frequently where the contribution to the
integral is more relevant.
Example 4 (The Depth of the river Nile). It is better to check the depth of the river Nile where
actually the river Nile is, and not in the middle of the god-damned desert.
From statistical mechanics, we know that

⟨A⟩NVT = ∫ dΓ A ρNVT = ⟨ A ρNVT/ρ ⟩^MC_ρ

where ρNVT is the canonical probability density and ρ is another probability density, like the uniform one employed until now. However, we are not obliged to stick to the uniform distribution, so we could use ρNVT itself as ρ, getting to

⟨A⟩NVT = ⟨ A ρNVT/ρNVT ⟩ = ⟨A⟩_trials

Now we can just estimate the integral as an average over the trials, given that we are able to
reproduce ρN V T with random numbers; in fact, we just rephrased the problem: now we need to
generate a non-uniform distribution with random numbers. The solution to this dilemma was
given by Metropolis: it is indeed possible, but we have to generate it through a Markov chain of events. This means two conditions must be verified:

1. only a finite number of outcomes is possible for each trial. This is not a problem, since in silico space is already granular, so we have a big yet finite number of possible configurations
2. every new configuration must depend only and just only on the previous one (we recall that in MD all configurations depend on the starting one)

By following these prescriptions we can generate a set of configurations that, although they are not time related, describe ρNVT perfectly. For each couple of configurations Γm and Γn, we can define πmn as the probability of the transition m → n. Altogether, the πmn probabilities are stored in the transition matrix π.

Example 5 (From the early age of computers). At the dawn of time, computers were not as reliable as they are now; each day, they could work (↑) or not work (↓). The transition matrix of one of them could have been

π = [ π↑↑  π↑↓ ; π↓↑  π↓↓ ]

The probability distribution at the second (fictional!) step will be

ρ⁽²⁾ = ρ⁽¹⁾ π

At the n-th step, we get

ρ⁽ⁿ⁾ = ρ⁽ⁿ⁻¹⁾ π = ρ⁽ⁿ⁻²⁾ π² = . . . = ρ⁽¹⁾ π^{n−1}

For a very long chain of events, we can hope to get to a limiting distribution ρ, defined as

ρ = lim_{τ→+∞} ρ⁽¹⁾ π^τ

This limiting distribution must be an eigenvector of π, with eigenvalue 1,

ρ π = ρ

or alternatively

Σ_m ρm πmn = ρn

On the other hand, the transition matrix is stochastic, so each of its rows sums up to 1:

Σ_n πmn = 1

To have all of this, our MC simulations need an irreducible ergodic transition matrix, that means
that every point in phase space must be reachable from every point, or –as we previously said–
that our system must be ergodic.
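A minimal numerical sketch of this example follows; the transition probabilities are invented for illustration, and the limiting distribution is obtained simply by applying π many times.

```python
import numpy as np

# transition matrix of the (fictional!) computer: rows sum to 1
pi_matrix = np.array([[0.9, 0.1],    # up today   -> up / down tomorrow
                      [0.5, 0.5]])   # down today -> up / down tomorrow

rho = np.array([1.0, 0.0])           # start from a working computer
for _ in range(50):                  # repeated application converges to the limiting distribution
    rho = rho @ pi_matrix

print(rho)                           # ~[0.833, 0.167], the eigenvector of pi with eigenvalue 1
```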

5.4 The actual Metropolis algorithm


At the end of the day, we are interested in ρ. This is easy if we know π, but that is not always possible; therefore we need to define the elements of π so that they do not depend on Z (if they did and we knew Z, there would be no point in simulating the system) and so that the limiting distribution is ρ = ρNVT. As we anticipated, Metropolis came up with an algorithm to obtain π, by enforcing a condition stricter than necessary; this condition is called microscopic reversibility and consists in setting the probability flow of the transition Γm → Γn equal to that of the reverse one Γn → Γm; in formulas,

ρm πmn = ρn πnm

This condition is tighter than necessary, but it is sufficient to get π, and it makes everything easier; by summing over m, we get indeed

Σ_m ρm πmn = Σ_m ρn πnm = ρn Σ_m πnm = ρn · 1

So thanks to this new condition and the pre-existing stochastic nature of π, we get to the Markov condition. To be noted that, although the microscopic reversibility implies the Markovian evolution (it is sufficient), the contrary is not true (it is not necessary).
At this point, Metropolis implemented the asymmetric solution, by defining the transition matrix as

πmn = αmn                  for ρn ≥ ρm (Un ≤ Um), n ≠ m
πmn = αmn (ρn/ρm)          for ρn < ρm (Un > Um), n ≠ m
πmm = 1 − Σ_{n≠m} πmn

This way, everything depends on the underlying symmetric matrix αmn, and it does not depend on the partition function. An uphill move is possible, but it is modulated by the ratio of the two states' probabilities.
To understand how this algorithm is applied, let's consider a gaseous system. We modify the system by moving a particle by a random displacement, and then we verify whether the move is accepted or not. First things first, the random displacement is obtained as a random fraction of a fixed maximum displacement δr; since the displacement is actually granular, only a finite number of outcomes is possible, because the maximum-displacement box is divided into many small boxes, due to machine algebra necessities. The probability of this displacement helps us define the symmetric matrix αmn; for a maximum-displacement region R, made of NR small boxes, we get

αmn = 1/NR if r ∈ R
αmn = 0    if r ∉ R

where r is the coordinate of our particle's center of mass. At this point, we need to accept or refuse the move; if Un ≤ Um (that is ρn ≥ ρm), we accept the move with no issue, setting πmn = αmn. On the other hand, if Un > Um (that is ρn < ρm), we will accept the move with a probability equal to ρn/ρm.
Let's try to understand the value of this ratio. By Boltzmann statistics,

ρn/ρm = (e^{−βUn}/QNVT) · (QNVT/e^{−βUm}) = e^{−β∆U}

where ∆U = Un − Um. This means the acceptance probability of an uphill move is e^{−β∆U}. The acceptance is therefore decided by drawing another random number, ξ:

if ξ < e^{−β∆U} the move is accepted
if ξ > e^{−β∆U} the move is refused

In a more compact way, this whole algorithm can be implemented by defining the transition probability as the minimum between 1 and e^{−β∆U}:

P = min(1, e^{−β∆U})

In this, temperature has an effect through the factor β: if the temperature is increased, the
Boltzmann exponential falls more gently, and for a given ∆U the probability of an uphill step is
increased; the opposite is true too.
Moving just one atom means we just need to re-compute its own interactions, leaving all the others throughout the system unchanged; this is quite convenient, because the ∆U of moving the i-th particle can be estimated as

∆Ui = Σ_j U^{(n)}(rij) − Σ_j U^{(m)}(rij)

A matter of some importance is the magnitude of the maximum displacement δr; a big δr means
we are sampling a lot of phase space, but we are also more likely to push the particles into each
other, resulting in a refused –therefore wasted– move.
This brings us to the crucial issue of the ideal acceptance rate. Even if a 100% acceptance rate seems optimal, nothing could be farther from the truth: in that case, only downhill moves are made, giving us a poor sampling. On the other hand, a 1% ratio is useless, because no information can be extracted. Usually, the acceptance rate is system dependent, but it is never greater than 50%, sitting between 20% and 30%; this way we are sure we are sampling a good canonical distribution. In fact, the number of configurations explored is not what matters; the correct distribution is.
The length of the simulation is a matter of MC steps, although time does not appear in an MC simulation; the step count has no chronological meaning, since there is no time relation between the steps. The energy obtained through MC sampling is just the potential energy, corresponding to the excess partition function (or configurational integral); the ideal part, related to kinetic energy, can be recovered, but it is seldom necessary. The accuracy of an MC average is of the order of 1/√τ, where τ is the number of MC steps:

⟨A⟩NVT = ⟨A⟩_trials + O(1/√τ)

The last point is which moves to take to generate αmn : displacements may not be optimal
for polymers, but torsional angles can create very high energy configurations, etcetera etcetera.
All of this must be considered if we want a proper sampling of our phase space, and it can be a
deciding factor between MC and MD; although theoretically equivalent, their results are actually
very dependent on the sampling quality, and one of the two methods could be better than the
other, depending on the system itself. For example, lattice systems, in which the particles can
only rest on the nodes of a cubic lattice, cannot be simulated with MD, but are perfect for MC
sampling.
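A minimal sketch of one Metropolis displacement move is given below. For simplicity the total potential energy function U_func is re-evaluated for both states, whereas (as noted above) a real code would recompute only the interactions of the displaced particle; the box side, maximum displacement and β are assumptions of the caller.

```python
import numpy as np

def metropolis_step(pos, U_func, beta, dr_max, L):
    """Displace one random particle and accept/reject with probability min(1, exp(-beta*dU))."""
    i = np.random.randint(len(pos))
    trial = pos.copy()
    trial[i] += dr_max * (2.0 * np.random.rand(3) - 1.0)   # random move in [-dr_max, dr_max]
    trial[i] %= L                                           # keep the particle inside the box
    dU = U_func(trial) - U_func(pos)
    if dU <= 0.0 or np.random.rand() < np.exp(-beta * dU):
        return trial, True                                  # accepted
    return pos, False                                       # refused
```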

5.5 Isothermal-isobaric Monte Carlo


As we know, the probability distribution of an isoT-isoP Monte Carlo is

ρNPT ∝ exp[−(H + PV)/kB T]

where H is the hamiltonian function. The partition function is

QNPT = Σ_V exp(−PV/kB T) QNVT

As for the canonical ensemble, a configurational integral can be isolated:

ZNVT = V^N Qex_NVT
This is what we sample through Monte Carlo techniques, since there is no kinetic energy. The way we implement constant pressure is through scaled coordinates,

s = r/L

where s is the scaled coordinate, r is the original one and L is the side of the box.
Now we want a Markov chain whose limiting distribution is ρNPT. We start from

QNPT = Σ_V e^{−βPV} Qid_NVT Qex_NVT

Since the ideal part scales as V^N, the configurational part to be sampled is

Qex_NPT = Σ_V e^{−βPV} V^N Qex_NVT
        = Σ_V e^{−βPV} ZNVT
        = Σ_V e^{−βPV} V^N Σ_U e^{−βU}
        = Σ_{V,U} exp{−β(U + PV) + N ln V}

This means our limiting distribution is

ρ_lim ∝ exp{−β(U + PV) + N ln V}

and the sums over V and over the configurations U(r) are still there. Any observable average will have the following shape:

⟨A⟩NPT = (1/QNPT) ∫₀^{+∞} dV e^{−βPV} V^N ∫ ds A(s) e^{−βU(s)}
QN P T 0
To simulate this ensemble, then, we have to randomly change both positions and box volume; to
generate a new state we will require

1. a particle position change


2. a volume change

This requirements can be expressed as the system


(
sni = sm
i + δs
Vn = Vm + δV

where m is the starting state, n is the final state, i is the particle counter and the two increments
are

δs = δsmax (2ξ − 1) δV = δVmax (2ξ − 1)

This formulas allows forward and backward movement, compression and depression in volume.

Now we have to determine the move acceptance; we have to consider an enthalpy difference between the two states m → n:

δH_nm = δU_nm + P(V_n − V_m) − (N/β) ln(V_n/V_m)

This is not a genuine enthalpy, but it is close enough, so it will do. The acceptance probability, as before, will be

P = min(1, e^{−βδH_nm})

For the canonical ensemble, ∆U was easy to compute, since it involved only a single particle
interactions; by changing the volume too, all particles are moved, so all interactions must be
evaluated. For this, volume changes are computationally heavy, so they are implemented every
n steps.

Noble gas For non-bonded systems, a simple trick allows us to implement this algorithm analytically. The LJ potential for the m state is

U_m = Σ_ij 4ε (σ/(L_m s_ij))¹² − Σ_ij 4ε (σ/(L_m s_ij))⁶ = U_m¹² + U_m⁶

If we change the volume isotropically to state n, we will obviously have

U_n = U_m¹² (L_m/L_n)¹² + U_m⁶ (L_m/L_n)⁶

therefore, the potential energy variation is

∆U_nm = U_m¹² [(L_m/L_n)¹² − 1] + U_m⁶ [(L_m/L_n)⁶ − 1]

This way, the total potential energy variation is the sum of two contributions,

∆U_nm^tot = ∆U_nm^vol + ∆U_nm^displ

where the second one is the particle displacement contribution.
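A minimal sketch of the volume contribution, written directly from the rescaling formula above (the two stored LJ components and the box sides are the inputs):

```python
def lj_energy_after_volume_move(U12_m, U6_m, L_m, L_n):
    """Rescale the two LJ components when the box side changes isotropically from L_m to L_n."""
    return U12_m * (L_m / L_n) ** 12 + U6_m * (L_m / L_n) ** 6
```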

5.6 Random numbers generation


In a standard Monte Carlo run, we may need as many as several million random numbers. These numbers can be obtained, in limited amounts, from natural phenomena, like radioactive decay and atmospheric or electric noise. These are good quality random numbers, but they are too few for our needs.
We need an algorithm to generate them: strictly speaking they are pseudo-random numbers, because algorithms do not generate random stuff: it is an oxymoron y'all! Obviously, random number generating algorithms come in different shapes and tastes, but some are better than others, and ran2 is the best.
A viable and cheap option to generate uniformly distributed random numbers ∈ (0, 1) is to take a long sequence of integer numbers, where each of them depends on the previous one through

x_{i+1} = a x_i − m · int(a x_i / m)

i.e. x_{i+1} = (a x_i) mod m, where int indicates the integer part, and a and m are both large integer numbers. These x random numbers are ∈ (0, m − 1): to move them into the right interval we divide by m:

ξ_i = x_i / m

The sequence will repeat after at most m − 1 draws, so we take m as the largest number in machine memory. Finally, a and m may be chosen together to increase performance. This algorithm should give us a good set of random numbers.
us a good set of random numbers.
To check their quality, we can plot them two-by-two: if their quality is poor and there is a
sort of relation between them, we will see geometrical patterns in the graph.
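As a quick illustration, the following Python sketch (assumed for this text, not part of the original notes) implements this generator with the classic Park-Miller constants and plots consecutive pairs as a quality check.

import numpy as np
import matplotlib.pyplot as plt

# Minimal sketch of the generator x_{i+1} = (a * x_i) mod m, mapped to (0,1).
# a and m below are the classic Park-Miller "minimal standard" constants,
# chosen here only for illustration.
a, m = 16807, 2**31 - 1

def lcg(n, seed=12345):
    x, out = seed, np.empty(n)
    for i in range(n):
        x = (a * x) % m              # same as a*x - m*int(a*x/m)
        out[i] = x / m               # rescale into (0, 1)
    return out

# quality check: plot consecutive pairs; a poor generator shows lattice patterns
xi = lcg(20000)
plt.scatter(xi[:-1], xi[1:], s=1)
plt.xlabel("xi_i")
plt.ylabel("xi_{i+1}")
plt.show()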

Chapter 6

Advanced sampling methods

In this chapter, we will discuss a major problem in molecular mechanics, that
of barriers; moreover, we will introduce a simple solution for it

6.1 The barrier problem


If we simulate long enough, we should get to know everything, but it is not so simple. In fact, some systems may not have enough energy to surpass certain barriers, remaining trapped in minima. Only barriers of about kB T magnitude can be overcome. The system is still ergodic, but barriers of 2kB T or greater magnitude may require too much time to climb, making it impossible to exploit ergodicity in practice. Even in MC, the probability of climbing out of the well is very small, since it is the product of the very small probabilities of each upward step; MC can in principle relocate outside the barrier, but usually the rearrangement is too extended to be accessible by single-particle displacements. All of this forces us to sample just a fraction of the phase space.
A crude, ineffective way to solve this problem is heating the system, by increasing kinetic energy in MD and by increasing the acceptance probability in MC. This works, but by heating we are not only speeding up the system, we are also changing its nature; a different temperature could bring a different phase, and different regions of the PES become relevant. Basically, we are sampling another thing. Even if ineffective, this is a good starting point for what comes next.

6.2 Parallel tempering


At a certain temperature T1 , we are not able to surpass a certain barrier, while at T2 > T1
the system moves freely; the sampling at T1 and the sampling at T2 are obviously independent.
Suppose that, from time to time, we attempt to exchange configurations between the two simulations. In plain words, we are now using the T2 sampling as a tool to generate new configurations, everywhere along the PES. We then take these configurations and use them to sample at T1 : what we obtain is a patchwork sampling of multiple trajectories. Since we are interested in a particular distribution, we need a tool to be certain we will always get it, even with the swap, for all the simulations. What we get following this route is the hamiltonian swap.
Since it is easier to discuss this method in MC, we will describe it in that frame, knowing it can be adapted even more easily to MD. In particular, this approach goes under the name of
parallel tempering or replica exchange. In MC, we imposed the detailed balance (microscopic

reversibility) through the equation
ρm πmn = ρn πnm
where the transition matrix π is the composition of the symmetric matrix α and the acceptance
probability P .
πmn = αnm Pmn
In parallel tempering, we have many independent simulations, composing a system of multiple non-interacting systems, with
\[ Q_{tot} = \prod_i Q_{NVT,i} \qquad \rho_{tot} = \prod_i \rho_i \]
The hamiltonian swap boils down to just another kind of MC move, so we have to enforce detailed balance; since each state has a geometry and a temperature, we will indicate the transition
\[ (m, \beta_m;\; n, \beta_n) \rightarrow (n, \beta_m;\; m, \beta_n) \]
as → and the opposite transition as ←; we then decompose ρtot into its factors, getting to
\[ \rho(m, \beta_m)\,\rho(n, \beta_n)\,\alpha(\rightarrow)\,P(\rightarrow) = \rho(m, \beta_n)\,\rho(n, \beta_m)\,\alpha(\leftarrow)\,P(\leftarrow) \]
This enforces the detailed balance into the swap. The acceptance ratio is then
\begin{align*}
\frac{P(\rightarrow)}{P(\leftarrow)} &= \frac{\exp\{-\beta_m U_n - \beta_n U_m\}}{\exp\{-\beta_m U_m - \beta_n U_n\}} \\
&= \exp\{-(\beta_n - \beta_m)(U_m - U_n)\} \\
&= \exp\{-\Delta_{nm}\}
\end{align*}

so that the acceptance probability is in the end

P = min(1 , e−∆nm )

The acceptance of the hamiltonian swap is therefore associated to the energies of the two simu-
lations at the given temperature.
Exercise. Consider the acceptance probability in all the possible relative values of β and U

From this formula, we clearly get that the acceptance probability rapidly decreases as ∆T increases, so far-away simulations are not practical. We can instead run multiple close simulations, each of them exchanging configurations with its nearest neighbour; given enough time, a high-temperature configuration will swap its way down to low T . If we take a look at the energy distribution function at different temperatures, we will notice that the higher the temperature, the flatter the distribution is, and all of them overlap in their tails. The hamiltonian swap can only happen in the overlap region. Therefore, different independent simulations must be close enough that their energy distributions overlap, but not so close that we keep sampling the same region all over again. The appropriate spacing depends on T , to take into account the flattening of the distributions, being small at low T and large at high T . Usually a geometric progression is employed, imposing
\[ \frac{T_{i+1}}{T_i} = \text{constant} \]
The optimal acceptance rate is between 20% and 30%.
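A minimal Python sketch (the array names are illustrative placeholders for quantities an MC/MD engine would provide, not part of the notes) of a swap attempt between two neighbouring replicas:

import numpy as np

# Minimal sketch of a replica-exchange (parallel tempering) swap attempt
# between neighbouring replicas i and i+1. betas are 1/(kB*T) values and
# energies the current potential energies of each replica.

def attempt_swap(i, betas, energies, configs, rng):
    """Try to exchange the configurations of replicas i and i+1."""
    delta = (betas[i + 1] - betas[i]) * (energies[i] - energies[i + 1])
    if rng.random() < min(1.0, np.exp(-delta)):
        configs[i], configs[i + 1] = configs[i + 1], configs[i]
        energies[i], energies[i + 1] = energies[i + 1], energies[i]
        return True        # swap accepted
    return False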

All of this may seem a little ineffective, but running multiple, short simulations at different
temperature may allow us to sample a lot more phase space than a single, trapped, long simula-
tion. In MC simulations, this boils down to adding a new kind of step after a certain number of
typical steps. To generalize it to MD, we just have to make the MC-style swap every n steps of
the simulation. The main advantage is the better sampling, but multiple sampling at different T
may be useful too. As far as MC is concerned, we don't lose anything with parallel tempering, while in MD we lose the time evolution, because we turn our trajectory into a collection of pieces glued together. Finally, we can say that parallel tempering just speeds up the simulation, yielding the same result as very long and inaccessible vanilla simulations.
Example 6 (Set of increasing barriers). Consider a set of equivalent minima, each separated by
yet a bigger barrier. If typical Metropolis MC is employed, the minima are not equally populated,
since the system spends a lot of its limited simulation time in the minima with the smallest
barriers. Instead, if parallel tempering is employed, we get the physical Boltzmann distribution,
in which each equivalent minimum is equally populated.

Figure 6.1: From right to left, top to bottom: the potential energy function; the Metropolis MC
population result; the Parallel Tempering population result; the space-time representation

Chapter 7

Free energy calculation

In this chapter, we will discuss free energy, and the techniques that
allow us to evaluate it through molecular mechanics

7.1 Free energy is not energy for free


Free energy is intrinsically an ensemble property, so each of them has one. The canonical ensemble
N V T has Helmholtz free energy (A or F ), while the isothermal-isobaric (N P T ) has Gibbs free
energy (G). The formula is actually the same, only with a different partition function1
\[ A(T) = -k_B T \ln Q_{NVT} = -k_B T \ln \int d\Gamma\, \exp\{-\beta H(p, q)\} \]

This way, the free energy does not contain any conformational information; since we are
interested in them, we will write the free energy as a conformational property too, by introducing
the conformation Y (r):
\[ A(Y, T) = -k_B T \ln \int_{Y(r) = Y} d\Gamma\, \exp\{-\beta H(\Gamma)\} \]
This corresponds to a restriction of the integration space to the subspace generated by a certain number of conformations Y . This free energy is associated to the probability of actually having the Y conformation, p(Y, T ):
\[ p(Y, T) = \frac{1}{Q_{NVT}} \int_{Y(r) = Y} d\Gamma\, \exp\{-\beta H(\Gamma)\} \]
where Q_NVT is the integral over the entire phase space
\[ Q_{NVT} = \int d\Gamma\, \exp\{-\beta H(\Gamma)\} \]
By comparing with the definition of A, we get
\[ p(Y, T) = \frac{\exp\{-\beta A(Y, T)\}}{Q_{NVT}} \]
1 This is because all free energies are actually the same thing, i.e. the thermodynamic potential of the given ensemble

The main advantage of probabilities is that they are additive if more configurations are consid-
ered:
p(Y1 ∪ Y2 , T ) = p(Y1 , T ) + p(Y2 , T )
while free energies are not
\[ A(Y_1 \cup Y_2, T) = -k_B T \ln\left[ e^{-\beta A(Y_1)} + e^{-\beta A(Y_2)} \right] \]
On the other hand, free energy is smoother than the probability itself, making it simpler to read and analyse; for example, relative minima are much easier to spot in a free energy plot than in a probability plot. We have to recall that even relative and under-populated minima are useful and relevant in describing the physics of the system; we do not discriminate. Alas, the main problems with free energy are

• free energy is defined up to an additive constant: only variations are significant (third principle of thermodynamics?)
• free energy is a function of all coordinates, but it is customary and useful to project it on a subset of variables. This may carry an approximation, since through projection we may lose some information

Example 7 (Projection of two corresponding minima). Let's consider a bidimensional free energy A(x, y), that presents two minima at the same x value, but with different y. If we project A(x, y) on the x axis, these two minima will merge into a bigger, non-physical minimum.

Free energy can also be decomposed into an internal energy and an entropic contribution:
\[ A(Y, T) = -k_B T \ln\left[ \int_Y e^{-\beta U(Y)}\, d\Gamma \right] = U(Y) - TS \]

7.2 Actually calculating free energy, literally


A first approach is the so-called brute force approach: the free energy is evaluated as
\[ A(Y, T) = -k_B T \left[ \ln p(Y, T) + \ln Q \right] \]

where p(Y, T ) is obtained through an extensive and complete sampling of the phase space; since
Q is constant throughout the phase space, it is usually neglected in computing free energy
variations.
The variation of free energy therefore amounts to
\begin{align*}
\Delta A(Y, T) &= A(Y_1) - A(Y_2) \\
&= -k_B T \ln \frac{p(Y_1, T)}{p(Y_2, T)} \\
&= -k_B T \ln \frac{p(Y_1, T)}{1 - p(Y_1, T)}
\end{align*}
where the last equality holds when Y2 is simply the complement of Y1 (a two-state description).

The probability p is simply the number of configurations Y over the total number of configura-
tions. This approach is simple, but it is effective only if an extensive and recursive simulation is
performed, as we can see in the following example.

Example 8 (Protein folding). Let’s consider a simulation of a protein folding. After a cer-
tain amount of time, the protein changes conformation, from unfolded to folded. After that, the
simulation is brought to an end.
If we don't give it enough time after the folding, it will result that the system spent the vast majority of its time in the unfolded conformation; this will mean a higher probability for unfolded conformations, and therefore an absolute minimum for the unfolded protein. On the other hand, if we let the system go (let it go! Let it go!) for long enough after the folding, the entire result could be reversed: the protein will spend a lot of time in the folded conformation, which will present a higher probability and a lower minimum.
It is then obvious that the brute force result depends strongly on the simulation length, unless the simulation is long enough to cross back and forth between the two states multiple times; this is only possible for simple systems on supercomputers.
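As a minimal sketch of the counting estimate itself (the frame classifier and the kB T value at room temperature are illustrative assumptions, not from the notes):

import numpy as np

# Minimal sketch of the brute-force estimate: count how often a trajectory
# visits conformation Y1 and turn the population ratio into a free energy
# difference (two-state case).
kB_T = 2.479   # kJ/mol at about 298 K

def delta_A(frames, is_Y1):
    p1 = np.mean([1.0 if is_Y1(f) else 0.0 for f in frames])  # fraction in Y1
    return -kB_T * np.log(p1 / (1.0 - p1))                    # A(Y1) - A(Y2)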

7.3 Properties of a system


In statistical thermodynamics, we divide the properties of a system into
mechanical properties that depend on the derivatives of Q with respect to T
thermal properties that depend directly on Q
These two kinds of properties place different requirements on a simulation. Let's see what's going on with some examples.
Example 9 (Internal energy). Internal energy is a mechanical property that we can evaluate as
\begin{align*}
U &= \frac{k_B T^2}{Q} \frac{\partial Q}{\partial T} = -\frac{\partial}{\partial \beta} \ln Q \\
&= \frac{k_B T^2}{Q} \frac{\partial}{\partial T} \int d\Gamma\, e^{-\beta H(\Gamma)} \\
&= \frac{k_B T^2}{Q} \int d\Gamma\, \frac{\partial}{\partial T} e^{-\beta H} \\
&= \frac{k_B T^2}{Q} \int d\Gamma\, e^{-\beta H}\, \frac{H}{k_B T^2} = \int d\Gamma\, H\, \frac{e^{-\beta H}}{Q} \\
&= \langle H \rangle
\end{align*}
As we can see, high-energy regions do not contribute much to the average, since they are weighted by exp{−βH}.
Since MD and MC are very efficient at sampling low-energy regions, mechanical properties
like internal energy are easily accessible and quite reliable.
Example 10 (Free energy). Free energy, on the other hand, is a thermal property that we can
evaluate as
\begin{align*}
A &= -k_B T \ln Q = k_B T \ln \frac{1}{Q} \\
&= k_B T \ln \left[ \frac{1}{Q} \frac{1}{V_\Gamma} \int d\Gamma\, e^{-\beta H} e^{+\beta H} \right] \\
&= k_B T \ln \left[ \int d\Gamma\, \rho_{NVT}\, e^{+\beta H} \right] - k_B T \ln V_\Gamma
\end{align*}

where VΓ is the phase space volume and all we added was an easy-to-demonstrate identity. As we
can see, high-energy contributions are now counterbalanced by exp {βH}, so they are no longer
negligible.
This makes obtaining thermal properties with MD or MC rather difficult, since the sampling
of high-energy regions is now strictly mandatory.

7.4 Variation of free energy


Most of the time we are interested in computing the difference of free energies ∆A between two (or more) states. Throughout this discussion, let's always keep in mind an easy example: a protein-ligand system that can be either bound (state 0) or unbound (state 1). A first attempt to evaluate ∆A can be the brute force approach, in which we simply go for an extensive sampling of both states. We will bump into two problems:

1. if the ligand is a good one, after binding it will remain bound, with a difficult barrier to overcome in order to come back
2. if the ligand is a good one, the bound state will be favoured and highly populated, while the unbound one will be poorly described

These problems make the brute force approach a not-so-efficient choice.


Nonetheless, we can evaluate the difference in free energy by restricting the integrals to the single states:
\[ \Delta A = A_1 - A_0 = -k_B T \ln \frac{\int_1 d\Gamma\, e^{-\beta H}}{\int_0 d\Gamma\, e^{-\beta H}} \]
Obviously, one of the two regions will be poorly sampled, depending on which region we start from. If the states have the same mass, U is usually employed instead of H, evaluating what rigorously is the excess free energy. With this approach, if the sampling is problematic, the result becomes unreliable.
Why don't we try a different approach? Instead of describing a two-state, one-potential system, we can use one potential for each state, as if they were two separate systems. In this case, the hamiltonians will be restricted to the "states", while the integrals will remain over the whole phase space:
\[ \Delta A = -k_B T \ln \frac{\int d\Gamma\, e^{-\beta H_1}}{\int d\Gamma\, e^{-\beta H_0}} \]
This approach has many different implementations, which will be discussed in what follows.

Thermodynamic perturbation We start from the free energy gap, as seen before, and we multiply and divide by the same quantity (equivalently, we add and subtract βH0 in the exponent)
\begin{align*}
\Delta A &= -k_B T \ln \frac{\int d\Gamma\, e^{-\beta H_1}}{\int d\Gamma\, e^{-\beta H_0}} \\
&= -k_B T \ln \frac{\int d\Gamma\, e^{-\beta H_1} e^{\beta H_0} e^{-\beta H_0}}{\int d\Gamma\, e^{-\beta H_0}} \\
&= -k_B T \ln \int d\Gamma\, e^{-\beta H_1 + \beta H_0}\, \rho^0_{NVT} = -k_B T \ln \langle e^{-\beta \Delta H} \rangle_0
\end{align*}

The subscript 0 here means that the average is on the state 0, i.e. it is obtained by sampling just
the state 0.
Obviously,
\[ -\Delta A = A_0 - A_1 = -k_B T \ln \langle e^{\beta \Delta H} \rangle_1 \]
so
\[ \Delta A = -k_B T \ln \langle e^{-\beta \Delta H} \rangle_0 = k_B T \ln \langle e^{\beta \Delta H} \rangle_1 \]
This is called single step thermodynamic perturbation.
Example 11 (Solvation energy difference). We want to estimate the solvation energy difference
between MeOH and MeSH; with the power of the simulation, we can just consider a system in
which, magically, the oxygen atoms turn into sulphur ones: the ∆A of the process will be the
solvation energy difference, if we run the simulation in aqueous environment. To do so, we will

1. simulate MeOH in water. For each configuration we generate, we have the hamiltonian of the MeOH system (call it HO)
2. change O with S by swapping the FF parameters, evaluating HS in each of the previous
configurations
3. work out exp {−β∆H} and compute the average on the MeOH system

Obviously, the opposite process is doable too


In principle, the two gaps should be the same, but in practice this is not the case; in fact, a sampling problem occurs: if the phase spaces of the two states largely overlap, the forward and backward processes are equivalent; otherwise, some differences arise. Indeed, if state 1 lives in a subset of the state 0 phase space, we can decide to sample phase space 0 or phase space 1; in the first option, we may be able to reproduce state 1 only if we thoroughly sample 0: it is an entropic problem. On the other hand, if we sample 1, we will be restrained in its phase space and we will have an energetic problem. In principle, we cannot know in which situation we are, so by comparing backward and forward we can get a lower bound on the error.
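In practice, the single-step estimate is just an exponential average over the sampled ∆H (or ∆U) values; a minimal Python sketch (array names are illustrative, not from the notes) is:

import numpy as np

# Minimal sketch of single-step thermodynamic perturbation:
# dU holds U1 - U0 evaluated on frames sampled in state 0.
def fep_forward(dU, beta):
    dU = np.asarray(dU)
    # Delta A = -kT ln <exp(-beta dU)>_0, written with a log-sum-exp for stability
    return -(1.0 / beta) * (np.logaddexp.reduce(-beta * dU) - np.log(len(dU)))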

7.5 Cumulant expansion


As we know,
\[ \Delta A = -\frac{1}{\beta} \ln \langle e^{-\beta \Delta H} \rangle_0 \]
and we know it is possible to use ∆U instead of ∆H; indeed, since
\[ H = T + U \]
if the states have the same mass, we have
\[ \Delta H = (T + U_1) - (T + U_0) = \Delta U \]
However, if the mass is different, what we will get is the excess free energy, which is actually what experiments measure. For completeness' sake, we add
\[ \Delta A = -\frac{1}{\beta} \ln \langle e^{-\beta \Delta U} \rangle_0 = \frac{1}{\beta} \ln \langle e^{\beta \Delta U} \rangle_1 \]

At this point, we can see ∆A as an average over the probability distribution of ∆U , sampled in the phase space of one of the two states:
\[ \Delta A = -\frac{1}{\beta} \ln \int_{-\infty}^{+\infty} e^{-\beta \Delta U}\, P_0(\Delta U)\, d\Delta U \]
If we consider P0 as a gaussian distribution,
\[ P_0(\Delta U) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(\Delta U - \langle \Delta U \rangle_0)^2}{2\sigma^2} \right] \]
with
\[ \sigma^2 = \langle \Delta U^2 \rangle_0 - \langle \Delta U \rangle_0^2 \]
what we get inside the integral is another gaussian function, no longer normalized and shifted by βσ²:
\[ e^{-\beta \Delta A} = \frac{C}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left\{ -\frac{\left(\Delta U - \langle \Delta U \rangle_0 + \beta\sigma^2\right)^2}{2\sigma^2} \right\} d\Delta U \]
with
\[ C = \exp\left\{ -\beta\left( \langle \Delta U \rangle_0 - \frac{1}{2}\beta\sigma^2 \right) \right\} \]
This means that – in order to correctly sample this gaussian – we need at least a symmetric interval of radius 2σ centered on the shifted distribution center: more sampling to do! Analytically, this integral leads to
\[ \Delta A = \langle \Delta U \rangle_0 - \frac{1}{2}\beta\sigma^2 \]
and, working from the backward average, the analogous result ∆A = ⟨∆U⟩₁ + βσ²/2. These formulas apply only to gaussian distributions, but they are very useful nonetheless; indeed, ⟨∆U⟩ can be positive or negative, while the βσ²/2 term always enters with a definite sign, so we get
\[ \Delta A \leq \langle \Delta U \rangle_0 \qquad \Delta A \geq \langle \Delta U \rangle_1 \]
These inequalities hold for every probability distribution, and they give us a working interval that is very helpful.
We can now introduce an expression called cumulant expansion2 ; with it, we write the free energy difference as
\[ \Delta A = -\frac{1}{\beta} \ln \langle e^{-\beta \Delta U} \rangle_0 = -\frac{1}{\beta} \sum_{n=1}^{+\infty} \frac{(-\beta)^n}{n!} K_n \]
where the Kn are named cumulants and can be obtained through the formulas:
\begin{align*}
K_1 &= \langle \Delta U \rangle_0 \\
K_2 &= \langle \Delta U^2 \rangle_0 - \langle \Delta U \rangle_0^2 \\
K_3 &= \langle \Delta U^3 \rangle_0 - 3 \langle \Delta U^2 \rangle_0 \langle \Delta U \rangle_0 + 2 \langle \Delta U \rangle_0^3 \\
\dots &= \dots
\end{align*}
2 For further information on cumulants, and other statistical novelties, see G. Mandelli - Introduzione alla

Fisica Statistica

The gaussian distribution has Kn = 0 for n > 2, therefore it has a finite number of cumulants and both an easy and exact cumulant expansion; truncating a distribution to the second order means approximating it with a gaussian with the same average and variance. Adding more terms may seem a good way to improve the approximation, but since convergence is rather slow, it is not such a great approach. Therefore, we content ourselves with a second order approximation
\[ \Delta A \simeq \langle \Delta U \rangle_0 - \frac{\beta}{2}\left( \langle \Delta U^2 \rangle_0 - \langle \Delta U \rangle_0^2 \right) = \langle \Delta U \rangle_0 - \frac{\beta}{2}\sigma^2 \]
Example 12 (Charged particle). If we suppose

state 0 as a particle with no charge in a cavity in water

state 1 as a particle with charge q in the same cavity

we can evaluate ∆A, knowing that ∆U = qV :
\begin{align*}
\Delta A &= \langle \Delta U \rangle_0 - \frac{\beta}{2}\left( \langle \Delta U^2 \rangle_0 - \langle \Delta U \rangle_0^2 \right) \\
&= \langle qV \rangle_0 - \frac{\beta}{2}\left( \langle q^2 V^2 \rangle_0 - \langle qV \rangle_0^2 \right) \\
&= q \langle V \rangle_0 - \frac{\beta}{2}\left( q^2 \langle V^2 \rangle_0 - q^2 \langle V \rangle_0^2 \right)
\end{align*}
Since water is a homogeneous and isotropic liquid, all dipoles are randomly oriented, so there is no net average potential (⟨V⟩₀ = 0):
\[ \Delta A = -\frac{\beta}{2}\, q^2 \langle V^2 \rangle_0 \]
This result is no different from what we will get from the generalized Born model.

7.6 Relation between phase spaces


As we just saw, the quality of the free energy evaluation is related to the phase space overlap
between the reference and target state. In the worst case scenario, if the two phase spaces do
not overlap, no good free energy can be extracted. On the other hand, if the target state phase
space is a subset of the reference one, a good estimate can be obtained. Normally, we just have
a not-so-good overlap.
However, all we said is a simplification: what matters are not the phase spaces per se, but the relevant portions of these phase spaces, i.e. the low-energy regions that contribute most to the description of the system. It is therefore possible that high barriers prevent the sampling of the target phase space, or that the target itself covers a high-energy region of the reference phase space, making the sampling more difficult. All of this makes single step free energy perturbation a rather clumsy attempt.

7.7 Multistep free energy perturbation


Since free energy is a state function, depending only on the start and the end, we can change path, breaking down the large perturbation into many small ones; we can add an intermediate state, labelled 1/2, so that
\[ \Delta A = \Delta A_1 + \Delta A_2 \]
where
\[ \Delta A_1 = A_{1/2} - A_0 \qquad \Delta A_2 = A_1 - A_{1/2} \]
We can easily see that
\[ \Delta A = \Delta A_1 + \Delta A_2 = A_{1/2} - A_0 + A_1 - A_{1/2} = A_1 - A_0 \]
so everything is correct.
This way, we can break down a big jump into many small steps, so that the phase space overlap is big enough; the forward free energy will be
\[ \overrightarrow{\Delta A} = \sum_i^{steps} \Delta A_i = -\frac{1}{\beta} \sum_{i=0}^{N-1} \ln \langle \exp[-\beta(U_{i+1} - U_i)] \rangle_i \]
while the backward free energy will result
\[ \overleftarrow{\Delta A} = \sum_i^{steps} \Delta A_i = \frac{1}{\beta} \sum_{i=0}^{N-1} \ln \langle \exp[\beta(U_{i+1} - U_i)] \rangle_{i+1} \]
Each step works as a single step thermodynamic perturbation, so this approach is known as
multistep thermodynamic perturbation; this means we perform a simulation on state 0,
evaluating ∆U with state 1, then we run the simulation on 1 and evaluate ∆U with 2, and so on
until N , on which we do not perform any simulation. The same goes for the backward approach.
This method can yield good estimates of free energy difference, as long as the simulations are
good and long enough, and the states are close enough, with no hole between them. The way we
ensure all of this is by introducing an order parameter.
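A minimal Python sketch (illustrative names; one array of ∆U samples per window, as an assumption about how the data would be stored) of how the forward sum is accumulated:

import numpy as np

# Minimal sketch of multistep thermodynamic perturbation: dU_fwd[i] holds the
# samples of U_{i+1} - U_i collected while simulating window i.
def multistep_fep(dU_fwd, beta):
    total = 0.0
    for dU in dU_fwd:                         # one single-step FEP per window
        dU = np.asarray(dU)
        total += -(1.0 / beta) * np.log(np.mean(np.exp(-beta * dU)))
    return total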

7.8 Order parameter


Initially introduced to describe phase transitions, the order parameter is now a wonderful tool
to measure any system transformation. We call it λ, so that λ0 = 0 and λN = 1; this way, the
order parameter drives the transformation.
For example, if we want the free energy difference between CH3 OH and CH3 SH, in the
intermediate steps we need something in the middle between O and S; these states are not
physical, but since A is a state function anything goes. We then use λ to slowly change any FF
parameter P from that of the reference state P0 to that of the target PN :

\[ P_i = \lambda_i P_N + (1 - \lambda_i) P_0 \]

This is possible only if the reference and target state share the same topology. However, if we want the free energy difference of the keto-enol tautomerism of acetaldehyde, we need to take into account the change of topology:
CH3–CHO ⇌ CH2=CH–OH
There are two approaches: single and double topology.

Single topology approach In the first approach, we can describe the two states with a single topology, using a hybrid structure in which the acetaldehyde skeleton carries an extra site X1 on the methyl carbon and an extra site X2 on the oxygen.
This means that the reference state will have the X2 atom as dummy and X1 as H, while the target will have X2 = H and X1 = dummy. In between, there is a slow change in parameters, so that the atom disappears on one side and appears on the other. Two problems arise:

1. bonds involving dummy or nearly-dummy atoms may behave strangely, due to the small charges or the small bonding parameters
2. the VdW parameter reduction may cause the dummy atom – while it still carries a relevant charge – to crash into another atom, leading to infinite energies. This is called the end point catastrophe

Double topology approach To solve the first problem, the double topology approach may be employed; in this case, both topologies exist, but the atoms that change are inserted in an exclusion list that kills any interaction between them. This way, we can compute
\[ U_i = \lambda_i U_1 + (1 - \lambda_i) U_0 \]
This approach does not solve the end point catastrophe, so other tricks are needed. For example, one can change the simulation window size, that is how much λ changes, to make smaller steps where we need higher precision; unfortunately, one can only hope, since this method seldom works. Another approach is to remove the charge first (obtaining a "charge only" ∆A) and the VdW parameters later ("VdW only" ∆A): this is possible, as always, because A is a state function. This method does not double the work, as one might think, since the new simulations we have to run may reduce the number of steps we need to take.

7.9 Improvements upon multistep perturbation


In principle $\overrightarrow{\Delta A} = \overleftarrow{\Delta A}$, but due to convergence issues, some hysteresis makes $\overrightarrow{\Delta A} \neq \overleftarrow{\Delta A}$. We do not know a priori which one resembles the real ∆A more closely, so there is room for improvement.
A first approach is to use the arithmetic average of the two values
\[ \langle \Delta A \rangle = \frac{\overrightarrow{\Delta A} + \overleftarrow{\Delta A}}{2} \]
This is not the best way, since one of the two ∆As has a greater error, so by averaging we increase the uncertainty of the result.
A better approach goes under the name of double-wide sampling; in it, N − 1 simulations are performed, as for normal multistep perturbation. Then, between each step we take a half perturbation step forward
\[ \lambda_i \rightarrow \lambda_{i+1/2} \]
and a half perturbation step backward
\[ \lambda_{i+1} \rightarrow \lambda_{i+1/2} \]
These half steps are effortless, since at each full step we already have the trajectories, while no trajectories are required for the half ones. In this way,
\begin{align*}
\Delta A_{i,i+1} &= \Delta A_{i,i+1/2} - \Delta A_{i+1,i+1/2} \\
&= -\frac{1}{\beta} \ln \left[ \frac{\langle \exp\{-\beta (U_{i+1/2} - U_i)\} \rangle_i}{\langle \exp\{-\beta (U_{i+1/2} - U_{i+1})\} \rangle_{i+1}} \right]
\end{align*}
This is all post-processing and no new simulation is required: we can try many different flavours
of this.
The number of steps required to obtain an acceptable ∆A is heavily dependent on the system under examination. However, half-stepping gives us a quick method to evaluate convergence. At first, an N step simulation is run, obtaining a certain value of ∆A, namely ∆A1 ; by considering also the half steps, we increase the number of simulation runs, but we do not waste what we already have (that is, the trajectories at each step). By working iteratively, we can evaluate the number of steps required for convergence; as always, convergence is when
\[ \Delta A_{i+1} \simeq \Delta A_i \]

Another way to improve ∆A evaluation is increasing the trajectory sampling quality. A good
way to do this is similar to parallel tempering: indeed, we can run the N simulations, each
with a different λ; at certain steps, we can attempt an exchange between trajectories, so that the
sampling of the phase space is improved; a probability controls the hop between the trajectories.
This method is known as Hamiltonian hopping.

7.10 Thermodynamic cycle


Exploiting thermodynamic cycles is crucial to obtain very important data. We recall that in a cycle between four states,
1 → 2 → 3 → 4 → 1, with legs ∆A1 , ∆A2 , ∆A3 and ∆A4 ,
the sum of the ∆A around the cycle is zero, since A is a state function.
Let's now suppose that we want to evaluate the difference in binding energy between a certain ligand LA and another one LB , with the same protein. For the sake of the example, think of LA as an alcohol and LB as a thiol. Initially, we can think of directly calculating the free energy of complex formation, but we soon discover it is not as easy as it sounds: even with multistep perturbation, what we get are serious technical problems, requiring solutions that are not efficient. On the other hand, multistep perturbation allows us to change one ligand into the other, closing the following thermodynamic cycle:

P + LA → PLA (∆A1 )        P + LB → PLB (∆A2 )
P + LA → P + LB (∆A3 , mutation of the free ligand)        PLA → PLB (∆A4 , mutation of the bound ligand)
In this way, ∆A1 and ∆A2 are now accessible, since ∆A3 and ∆A4 can be computed as we just saw; in fact
\[ \Delta A_2 - \Delta A_1 = \Delta A_4 - \Delta A_3 \]
Since
\[ \Delta A_1 = -k_B T \ln K_A \qquad \Delta A_2 = -k_B T \ln K_B \]
where KA and KB are the equilibrium formation constants, we have
\[ \frac{K_B}{K_A} = e^{-\beta(\Delta A_2 - \Delta A_1)} = e^{-\beta(\Delta A_4 - \Delta A_3)} \]
At the end, what we obtain in this example is a ∆∆A, which is rather useful in drug design. If the molecule is large, and the modification is small, this method has great success, getting very close to the experimental data; however, if the modification provokes a change in the binding mode, then the approach breaks down completely.
With such approach, we can also evaluate the absolute binding free energy, by considering a
cycle as such:
Rgas + Lgas → RLgas (∆Agas )
Rgas + Lgas → Rsol + Lsol and RLgas → RLsol (the ∆Asol legs)
Rsol + Lsol → RLsol (∆Aabs )
where the same reaction is considered first in gas phase and then in solvent. The value of ∆Aabs is difficult to obtain without enhanced sampling methods, but all the other terms are far easier to evaluate. The ∆Agas , for example, is trivial, since in gas phase only the binding energy changes between the two states. As for the solvation terms, some useful tricks make them easy to compute; at first, we consider the sum over the cycle, that is
\[ \Delta A_{gas} + \Delta A_{sol}(LR) - \Delta A_{abs} - \Delta A_{sol}(L + R) = 0 \]
then we recall that
\[ \Delta A_{sol}(L + R) = \Delta A_{sol}(L) + \Delta A_{sol}(R) \]
\[ \Delta A_{sol}(X) = \Delta A_{gas}(X \rightarrow 0) - \Delta A_{sol}(X \rightarrow 0) \]
where the X → 0 process is called annihilation, and consists of making the molecule disappear. In the multistep approach, we can take X as reference and 0 as target, and then we slowly switch off the parameters. This way, we obtain
\begin{align*}
\Delta A_{abs} =\; &\Delta A_{gas} + \Delta A_{gas}(LR \rightarrow 0) - \Delta A_{sol}(LR \rightarrow 0) - \Delta A_{gas}(R \rightarrow 0) \\
&+ \Delta A_{sol}(R \rightarrow 0) - \Delta A_{gas}(L \rightarrow 0) + \Delta A_{sol}(L \rightarrow 0)
\end{align*}
Now we can split the solvation term, by making the ligand disappear first and the receptor later
\[ \Delta A_{sol}(LR \rightarrow 0) = \Delta A_{sol}(LR \rightarrow R) + \Delta A_{sol}(R \rightarrow 0) \]
and note that the gas-phase terms cancel, since binding and then annihilating the complex in gas phase is equivalent to annihilating L and R separately, i.e. ∆Agas + ∆Agas (LR → 0) = ∆Agas (L → 0) + ∆Agas (R → 0); so that
\[ \Delta A_{abs} = \Delta A_{sol}(L \rightarrow 0) - \Delta A_{sol}(LR \rightarrow R) \]
This last equation represents another, much simpler cycle: annihilating the ligand free in solution (R + L → R, i.e. ∆Asol (L → 0)), binding it (∆Abind ), annihilating it from the complex (RL → R, i.e. ∆Asol (LR → R)), and closing the cycle with the trivial step R → R, for which ∆A = 0.
This technique works pretty well, but ∆A = 0 is a critical assumption, since we are assuming that, between the free and the bound state, R does not change conformation, while L may change its degrees of freedom.

7.11 Concluding remarks


Molecular simulation allows us to obtain critical information about the system, and the thermo-
dynamic cycles increase the amount of data we have in output; moreover, the methods and the
cycles just illustrated are the basis for more approximated methods. However, some issues are to
be considered.

• A small free energy difference is not easier to evaluate than a bigger one; in fact,
\[ \Delta A = \langle \Delta U \rangle_0 - \frac{\beta}{2}\sigma^2 \]
so a small ∆A can be the small difference of two big numbers
• A good ∆A only comes from a good usage of the methods
• $\overrightarrow{\Delta A} = \overleftarrow{\Delta A}$ only in principle, because convergence may play some tricks
• While internal energy can be decomposed to great advantage, it is meaningless to decompose free energy: the terms that make up the free energy, in fact, are not necessarily state functions (for example, heat and work are not state functions, but their sum is). This can be seen if we consider ∆U = ∆Ua + ∆Ub ; then
\[ \Delta A = \langle \Delta U_a \rangle_0 + \langle \Delta U_b \rangle_0 - \frac{\beta}{2}\left( \sigma_a^2 + \sigma_b^2 + 2\rho\,\sigma_a \sigma_b \right) \]
The cross term spoils the state-functionhood, through the correlation coefficient ρ, which measures how much the two modes are interlocked. Decomposing the free energy can yield valuable information only if we then follow a relevant physical path.

Chapter 8

Using free energy

In this chapter, we will discuss the use of free energy in our
everyday life

8.1 Free energy and macroscopic variables


As we know from thermodynamics,
\[ \left(\frac{\partial A}{\partial V}\right)_{N,T} = -p \]
so
\[ \Delta A = A(V_1) - A(V_0) = \int_{V_0}^{V_1} \left(\frac{\partial A}{\partial V}\right)_{N,T} dV = -\int_{V_0}^{V_1} p\, dV \]
Moreover, since we can express ρ = N/V , we have
\[ V = \frac{N}{\rho} \quad\Rightarrow\quad dV = -\frac{N}{\rho^2}\, d\rho \]
so
\[ \Delta A = \int_{\rho_0}^{\rho_1} p\, \frac{N}{\rho^2}\, d\rho \]
Alas, we typically do not change V or ρ, but we change the order parameter λ: we want a way to evaluate
\[ \Delta A = \int_0^1 \frac{\partial A}{\partial \lambda}\, d\lambda \]
Since the A dependence on λ is through the partition function Q, and
\[ A = -k_B T \ln Q \]
we get
\[ \Delta A = -k_B T \int_0^1 \frac{1}{Q(\lambda)} \frac{\partial Q(\lambda)}{\partial \lambda}\, d\lambda \]
Now we can recall that the Q dependence on λ is through the hamiltonian function, as a parameter
\[ Q(\lambda) = \int dp\, dq\; e^{-\beta H(p,q;\lambda)} \]

Due to the fact that the integration does not involve the differentiation variable, we can exchange the two operations, obtaining
\begin{align*}
\Delta A &= -k_B T \int_0^1 d\lambda \int dp\, dq\; \frac{e^{-\beta H(\lambda)}}{Q} \left(-\beta \frac{\partial H}{\partial \lambda}\right) \\
&= \int_0^1 d\lambda \int dp\, dq\; \frac{\partial H}{\partial \lambda}\, \frac{e^{-\beta H}}{Q} \\
&= \int_0^1 d\lambda \int dp\, dq\; \frac{\partial H}{\partial \lambda}\, \rho^{\lambda}_{NVT} = \int_0^1 d\lambda\, \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda}
\end{align*}
where ρ^λ_NVT is the canonical probability density function at that value of λ. Since we proceed through discrete steps, we deal with
\[ \Delta A \simeq \sum_{i=1}^{N} \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_i} \Delta\lambda_i \]
Sometimes the hamiltonian derivative can be computed analytically, but most of the time it is done numerically. As a side note, an experienced student may notice the striking resemblance of this result to the quantum mechanical Hellmann-Feynman theorem1 .
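A minimal sketch (illustrative function and input names, not from the notes) of the final discretized thermodynamic integration step, using a trapezoid rule over the λ windows:

import numpy as np

# Minimal sketch of thermodynamic integration: lambdas[i] are the windows and
# dHdl_means[i] the ensemble averages <dH/dlambda> collected at each window.
def thermodynamic_integration(lambdas, dHdl_means):
    return np.trapz(dHdl_means, lambdas)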

8.2 Implicit solvent approach


The methods we saw in the previous chapters may be too demanding for large sets of molecules and for large molecules, so simpler approaches are required, even if less accurate. For example, if we think of a protein-ligand complex in water, we are mostly thinking of water atoms; we can therefore speed up our simulation by simplifying the solvent description.
In the first place, the solvent

1. changes the dielectric constant


2. generates hydrophobic and hydrophilic interactions, that are effective interactions not re-
lated to a force, but to an entropic effect
3. damps the motion of atoms, generating friction
4. exchanges energy with the solute by the means of scattering
5. may give specific interaction with the solute

The first effect may be tackled by turning the set of solvent molecules (explicit description) into a continuous dielectric medium with the proper dielectric constant ε (implicit description). The second effect can instead be accounted for in the FF, usually implicitly, sometimes even explicitly. Finally, effects 3 and 4 require a modification of Newton's equation, since they are related to the system dynamics. In this case, the standard equation
\[ \dot{p}_i = F_i \]
becomes the so-called Langevin equation
\[ \dot{p}_i = F_i - \frac{\gamma}{m_i} p_i + \eta(t) \]
1 see, by the same author, Chimica Quantistica (only available in Italian)

where γ is the friction coefficient, mi is the particle mass and η is a stochastic vector. The first added term describes the friction damping, proportional to p but in the opposite direction (the minus sign), while the second term models the scattering, through the random force η.
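A minimal sketch of one Langevin step (a simple Euler-Maruyama discretization, not a production integrator; names are illustrative, and the noise amplitude is the one implied by the fluctuation-dissipation theorem for the convention of γ used above):

import numpy as np

# Minimal sketch of one Langevin step: friction damps the momenta and a
# Gaussian random kick models the scattering by the implicit solvent. The
# kick variance 2*gamma*kT*dt follows from fluctuation-dissipation for the
# friction term -(gamma/m) p used in the text.
def langevin_step(x, p, force, m, gamma, kT, dt, rng):
    eta = rng.normal(0.0, np.sqrt(2.0 * gamma * kT * dt), size=p.shape)
    p = p + (force(x) - (gamma / m) * p) * dt + eta
    x = x + (p / m) * dt
    return x, p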

8.3 Binding energy


We shall consider the following cycle:
Pgas + Lgas → PLgas (∆Ggas )
Pgas + Lgas → Paq + Laq and PLgas → PLaq (the ∆Gsolv legs)
Paq + Laq → PLaq (∆Gaq )
From it, we get the binding free energy in water (or any other solvent) as
\[ \Delta G_{aq} = \Delta G_{gas} + \Delta G_{solv}(\mathrm{PL}) - \Delta G_{solv}(\mathrm{P}) - \Delta G_{solv}(\mathrm{L}) \]

In the first place, we can consider

∆Ggas = ∆Hgas − T ∆S

Of these two terms, ∆Hgas is rather simple to obtain, while the entropic one, which takes into account the different states available and the change in the number of degrees of freedom, is very difficult to compute, so much so that it is usually neglected.
That considered, everything revolves around ∆Gsolv , which can be split into three terms:
\[ \Delta G_{solv} = \Delta G_{elec} + \Delta G_{VdW} + \Delta G_{cav} \]
where ∆Gelec is the electrostatic work to bring the charges from vacuum into the dielectric, ∆GVdW is the Van der Waals term, which is crucial in the implicit description, and ∆Gcav is the work necessary to make room for the solute in the dielectric medium. In the simplest implicit solvent models, we assume
\[ \Delta G_{VdW} + \Delta G_{cav} = \gamma\, \mathrm{SASA} + b \]
where SASA is the solvent accessible surface area, usually the Van der Waals envelope, that is the surface described by an H2O Van der Waals sphere rolling over the union of the solute Van der Waals spheres. Finally, γ (not related to the friction coefficient) and b are empirical parameters.

8.4 Born and Generalized Born models


We now need to calculate ∆Gelec ; the simplest method is the Born model, which we have already seen. To transfer a charge q from vacuum to a spherical cavity of radius a (atomic radius) in a continuous dielectric medium with dielectric constant ε, the required work is
\[ \Delta G_{elec} = -\frac{q^2}{2a}\left(1 - \frac{1}{\epsilon}\right) \]

If we have many well separated charges, so that their cavities do not merge, we just need to add the variation of the particle-particle interactions due to the different dielectric scaling:
\begin{align*}
\Delta G_{elec} &= -\frac{1}{2}\left(1 - \frac{1}{\epsilon}\right) \sum_i \frac{q_i^2}{a_i} + \frac{1}{2\epsilon} \sum_{i \neq j} \frac{q_i q_j}{r_{ij}} - \frac{1}{2} \sum_{i \neq j} \frac{q_i q_j}{r_{ij}} \\
&= -\frac{1}{2}\left(1 - \frac{1}{\epsilon}\right) \left[ \sum_i \frac{q_i^2}{a_i} + \sum_{i \neq j} \frac{q_i q_j}{r_{ij}} \right]
\end{align*}

This is actually very nice if we are interested in moving ions, but we want to move a protein, that is to say, a bunch of charges close together, with communicating cavities; for this, we switch to the Generalized Born model:
\[ \Delta G^{gb}_{el} = -\frac{1}{2}\left(1 - \frac{1}{\epsilon}\right) \sum_{i,j} \frac{q_i q_j}{F(r_{ij}, a_{ij})} \]
where
\[ F(r_{ij}, a_{ij}) = \sqrt{r_{ij}^2 + a_{ij}^2\, e^{-D}} \]
with
\[ a_{ij} = \sqrt{a_i a_j} \qquad D = \frac{r_{ij}^2}{(2 a_{ij})^2} \]
It is easy to see that this model encompasses the other two; indeed, if we are moving a single atom, i = j, a_{ii} = a_i , r_{ii} = 0, so
\[ F(r_{ii}, a_{ii}) = \sqrt{a_i^2\, e^{-0}} = a_i \]
If instead we move two well separated charges, r_{ij} ≫ a_{ij} , so
\[ F(r_{ij}, a_{ij}) \simeq r_{ij} \]

and we are back to the separated charges model. Obviously, the Generalized Born is also able to
reproduce experimental data, so it is more viable.
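A minimal sketch (illustrative inputs in consistent Gaussian-like units, as in the formulas above; not from the notes) of the Generalized Born sum:

import numpy as np

# Minimal sketch of the Generalized Born electrostatic solvation term.
# q: charges, a: Born radii, r: matrix of pairwise distances (r[i,i] = 0);
# eps is the solvent dielectric constant.
def gb_energy(q, a, r, eps=78.5):
    aij = np.sqrt(np.outer(a, a))                       # a_ij = sqrt(a_i a_j)
    D = r**2 / (2.0 * aij)**2
    F = np.sqrt(r**2 + aij**2 * np.exp(-D))             # effective distance
    return -0.5 * (1.0 - 1.0 / eps) * np.sum(np.outer(q, q) / F)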

8.5 Sampling of the phase space


Now that we are done with the electrostatic term, we can focus more generally on the binding free energy in gas phase. In order to evaluate it, we need to sample the phase spaces of both the products and the complex, extracting the conformations necessary to compute ∆Ggas and ∆Gsolv , and then averaging.
Usually, a single trajectory for each species is run, but this does not yield an exhaustive sampling of the phase spaces; actually, some fragments of the protein may occupy different states in the free form than in the complex, generating a lot of noise when computing the ∆G. Nonetheless, this is the three-trajectories approach.
To cut down this noise, we can run just the trajectory of the entire complex, extracting from there the conformations of both the protein and the ligand; this way we will not have noise, but we are neglecting the conformational contribution to the ∆G, since we are assuming that both P and L present the same geometry in the free and bound state. This assumption may be good for the large protein, but not for the small ligand, and it can be safely applied only in specific cases.
However, due to its affordability, this single-trajectory approach is very common.
When sampling is performed, explicit solvent is used, while in order to evaluate ∆Gsolv ,
implicit solvent models are applied, since explicit solvent algorithms for evaluating ∆Gsolv are
not worth the effort.
All of these approaches and techniques are mixed together into protocols; this means that we can tailor a protocol to a specific case to increase the agreement of the results with experiment, thanks to some eldritch error compensation. However, specific protocols are rarely transferable to other systems, so more often standard protocols are applied, and we content ourselves with what we get.

8.6 Alanine scanning


The thermodynamic cycles allow us to evaluate also protein-protein binding energies, which are too much for both perturbation and simple integration approaches. Moreover, this can be applied to the computation of the single-residue contributions to the binding energy itself, through the so-called Alanine scanning. In this method, the binding surface residues are sequentially changed into alanine to evaluate
\[ \Delta\Delta G_b = \Delta G_b^{m} - \Delta G_b^{w} \]

where m stands for mutated and w for wild type. If the binding energy changes, that specific residue is relevant for the binding, even if, more often than not, changing the residue does not yield any significant result.
it is even possible for the mutation to increase the binding energy. Alanine is preferred because
glycine has a very open Ramachandran plot, i.e. it adds a lot of flexibility to the protein structure,
creating a significant perturbation.
The residue contribution is quite laborious to obtain through experiment, since each mutation requires significant effort. On the other hand, in silico simulations can be easily adapted.
The single-trajectory approach is adopted, so we run the complex dynamics, extracting all the
conformations we need of the complex and the protomers; at this point, we replace the relevant
residues with alanine, one by one, and we compute the ∆Gs. This is time consuming, but it is
actually a post-processing task, so it can be performed.
The underlying approximation is that the mutation does not affect the complex structure:
in the case of single residues, this is often true, but sometimes it is not; if this approximation
does not work (usually for charged residues), we will overestimate the mutation effect. In these
cases, we have to perform two trajectories, one for the wild type and the other for the mutated
complex.
It is easy to understand that this is quite computationally heavy, since scanning a total of
100 residues will require 101 trajectories in total; moreover, this approach possesses the same
problem as the three-trajectories approach, that is noise. For clarity's sake, we have to state that the three-trajectory approach for the evaluation of the binding energy is unaffordable for protein-protein binding: only the single-trajectory approach is applied to evaluate ∆G, even when we are following the mutated complex dynamics; only when we can assume that there is no effective geometrical effect of the mutation can we use the same – wild type – trajectory.
In the end, alanine scanning is a good semi-quantitative analysis of residue importance, and some tailored protocols can reach a very good agreement with experiment.

Chapter 9

Analysis of protein-protein
interactions

In this chapter, we will give an overview of the principal techniques to analyze


protein-protein interactions, and their applications to drug design

9.1 Categorization
We categorize protein-protein interactions as

obligatory when the protomers are always bound together; sometimes, the protomers are not
even stable on their own

transient when protomers are stable on their own and associate and dissociate dynamically

Transient interaction are often found in signal transfer systems, and they are very interesting as
target for drug interference.

9.2 Geometrical analysis


Proteins interact through their surfaces, so a part of them is buried from solvent interactions.
Protein geometry databases allows us to perform a statistical analysis on both the geometries
and the compositions that occur in these interactions.
First, we get to know that the area of interaction is larger than in a protein-ligand case, which raises some difficulties in drug design. On the compositional level, we notice that the buried surface is rich in hydrophobic residues and poor in hydrophilic ones, except for arginine, whose peculiar interactions strengthen the complex. This abundance makes us think that the binding will be mostly driven by the hydrophobic effect, in the same way protein folding is. We are brought to think that large contact surfaces should give high binding energies, but this belief is not mirrored by the empirical data: we must therefore assume that the hydrophobic effect is relevant, but does not account for the entirety of the binding energy. Indeed, protein-protein interactions are too specific to be completely described by a gross average effect like the hydrophobic one: all the extra energy terms – like hydrogen bonds, exempli gratia – make the binding energy selectively sensitive to mutations.

In fact, some mutations do not affect the binding energy, while some others are quite effective; this is explained in the so-called hotspot picture: the binding energy is not spread over the whole surface (like in a velcro), but it is concentrated only in specific spots of the surface. Some of these hotspots are cooperative, that is, their simultaneous mutation yields a larger effect, while some others are simply additive. Nonetheless, the hotspot picture makes drug interference much more viable.

9.3 Residue characteristics


We can divide the amino acidic residues at the protein-protein interface in two categories:

Core r. are fully buried from solvent interaction


Rim r. are partially buried from solvent interaction

The rim residues' task is to protect the core from solvent perturbation, so they present some variability between similar complexes (more often than not, the same complex is different from species to species), while the core residues are always the same across all species.
We can measure this variability with the residue entropy s(i),
\[ s(i) = -\sum_k p(k) \ln p(k) \]
where p(k) is the probability of finding a given amino acid in that position; we have p = 1 for total conservation. This formula for the entropy, derived from information science, is called Shannon's entropy1 . Average residue entropies stand between 0 and 3 kJ K−1 . Low entropy is typical of high
conservation residues, like those in the core which are critically important for the interaction,
whereas high entropy is typical in rim residues, that can be easily substituted and have low
conservation. Mean residue entropy hsi can be given in both the crude and the weighted version:
\[ \langle s \rangle = \frac{\sum_i s(i)}{n} \qquad \langle s \rangle_w = \frac{\sum_i s(i)\, \Delta ASA_i}{\sum_i \Delta ASA_i} \]
where ∆ASA is the variation of accessible surface area.
The ratio between $\langle s_{core} \rangle$ and $\langle s_{rim} \rangle$ is a good indicator of the presence of a relevant interaction: if from the crystallographic data of the complex we get that this ratio is < 1, we can assume an interaction, since there is differential conservation at the interface; vice versa, the complex is probably due to a crystal contact, so its interaction is not specific, but it originates from the crystallization process.
As a side note, antibodies present low conservation of their core residues, due to their specific
task.
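A minimal sketch (the input – the list of residues observed at one alignment position – is an illustrative assumption) of the per-position Shannon entropy:

import numpy as np
from collections import Counter

# Minimal sketch of the per-position Shannon entropy of a multiple sequence
# alignment column; it is 0 for a fully conserved position.
def residue_entropy(column):
    counts = Counter(column)
    p = np.array(list(counts.values()), dtype=float) / len(column)
    return -np.sum(p * np.log(p))

# e.g. residue_entropy("AAAAAAAA") -> 0.0, residue_entropy("ARNDCQEG") -> ln(8)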

9.4 Surface characteristics


The interaction surfaces may be large, but the interaction energy is concentrated in hotspots;
hotspots (residues?) in the same region are cooperative, that is the energy lost in a simultaneous
mutation is greater than the sum of single-mutation energy losses. Instead, hotspots in different
regions are additive, so the energy loss in a simultaneous mutation is exactly the sum of the
single-mutation energy losses.
1 For more information on Shannon’s entropy, see G.Mandelli - Introduzione alla Fisica Statistica

Interaction surfaces are usually flat, but they can present small indentations, noticeable with careful analysis. Sometimes, these surfaces have pockets, that we classify as

Complemented p. that disappear in the complex, because the two proteins link perfectly
Non-Complemented p. that create hollow spaces in the complex, in which water or a ligand
can get, mediating the interaction

As we have already said, core regions and especially hotspots are less compositionally flexible
than the rim regions; sometimes hotspots are even pre-organized, ready to react, making them
an optimal target for drugs.

9.5 Targeting the PP interaction


We can target the interaction between two proteins

• with a ligand that binds on the hotspots, perturbing the interaction; this method is rather
easy to simulate
• with an allosteric interaction: the ligand binds far away from the interaction surface, but it prompts a conformational change in the protein, interfering with the interaction; this approach is much more difficult to simulate

Obviously, we can target the interaction to weaken it, but also to strengthen it: stabilization of
the interaction can be achieved through mediating “glue” ligands.
In order to design the right drug, we need to consider large surfaces with small pockets, taking into account that usually these proteins do not bind small molecules, which could otherwise have helped to identify the drug structure. Indeed, large surfaces do not require large molecules, but the right ones2 ; to find them, we need a careful analysis of hotspots and pockets. Thanks to thousands of researchers, the number of drugs that target protein-protein interactions is steadily increasing, mostly because there are a lot more targets and applications.
Right now, the targeting is done with two approaches:

HTS a. in which the drug is selected among a large library optimized for protein-protein inter-
actions; libraries must be large but more importantly diverse to achieve success
peptide a. in which the interaction is directly targeted with a piece of one of the proteins

The actual druggability of an interaction depends upon

• the presence of 3D structural information

• the presence of cavities, and their dimension (not too big)
• the hydrophobic or hydrophilic nature of those cavities
• the complementarity of the interacting subunit to the cavity

This is where molecular modelling comes to the rescue. If all the structures are available, we can
thoroughly study the energetic aspect of the interaction; if the structure of the complex is not
present, some docking techniques can be applied to obtain it from the apoproteins structures; if
we possess nothing, why do we even try?
2 The usual story of the large brush versus the big brush; at the end of the day, what matters is...

Chapter 10

An handful of examples of MM
applications

10.1 p53-hdm2
The p53 agent binds damaged DNA and kills the cell; hdm2 switches off p53, and it is overexpressed in tumor cells. Inhibiting the p53-hdm2 interaction may therefore become a good therapy for cancer.
The first attempt at targeting this interaction was with a peptide cut from p53; since the peptide assumes a random coil conformation, though, its binding requires a large entropic loss. To prevent it, we can imitate the α-helix chunk of p53 with a β-hairpin that reproduces it. Another way is that of using a hydrocarbon chain to staple the helix, so that it is stabilized. In these ways we reduce the entropic loss.
However, peptides are only good in theory, since they get digested or isolated by the immune system, so something similar is needed, like phenylic chains1 , triazolic scaffolds or other peptidomimetics.

10.2 Vinblastine in microtubules targeting


Microtubules are cylindrical structures made of filaments of tubulin dimers, which enact chromosomal separation by forming the mitotic spindle during mitosis. In tumor cells, replication is much faster than in healthy ones, so mitosis is a good target for a non-specific cancer therapy. Hindering the microtubules' action, though, has an effect on all fast-reproducing cells, like white blood cells and hair cells. A blocked mitosis always ends with cell apoptosis.
Microtubule growth is a complex polymerisation process, mediated by the reaction
\[ \mathrm{GTP} \rightarrow \mathrm{GDP} + \mathrm{P_i} \]
To damage this process, both microtubule destabilizers and microtubule stabilizers have been designed, but we will only focus on the action of Vinblastine, a microtubule destabilizer.
Vinblastine forces a bent conformation on the filaments, so that the microtubule cannot
grow: the elongation is not hindered, but the geometry is changed and finally the microtubule
1 “As soluble as a brick” - S. Pieraccini

collapses. What happens is that Vinblastine places itself between the two dimers, acting like a wedge and modifying the dimer geometry.
In order to simulate this three-body interaction, we need to group Vinblastine with one of the two dimers, therefore considering just a two-body interaction. By grouping Vinblastine with one of the two dimers instead of the other, the results are different; from these differences, we can deduce that:

1. far residues are not affected by Vinblastine (this much we could have guessed ourselves)

2. close residues are instead really affected

This is because residues in contact have different energies depending on whether Vinblastine is linked to one or the other dimer, so these are the important residues. By considering the thermodynamic cycle between straight unbound tubulin, Vinblastine-bound tubulin, both dimers unbound and one of the two bound to Vinblastine, we get that Vinblastine binding is favourable, so Vinblastine compensates for the steric stress.

10.3 Rapamycin
Rapamycin represents an extreme case of mediated protein-protein complexation, since almost half of the buried surface is covered by Rapamycin.
We can identify the relevant residues in the interaction by comparing the ∆∆G of Rapamycin bound first to one protein, then to the other. The result is that the proteins interact mostly just with Rapamycin, due to an entropic process. Indeed, the unmediated interaction between the proteins is completely dominated by a positive entropic contribution; this means we cannot neglect entropy in free energy calculations.
Even if during Alanine scanning we usually neglect entropy, this approximation must be considered carefully when computing absolute binding energies.

10.4 An example of peptide strategy


Let's consider the tubulin dimers' interaction. We would like to directly use part of the binding surface as a drug, to prevent polymerisation. To achieve this goal, we have to

1. analyze the interaction network through Alanine scanning, obtaining the important amino acids
2. validate the results through phylogenetic analysis, i.e. comparing conservation data with Alanine scanning: conservation is a good importance indicator
3. analyse the hotspot distribution in the peptidic sequence: if some of these spots are contiguous, it will be easier to synthesise the peptides
4. test which peptide works better, with Alanine scanning; do not forget that most peptides lose their secondary structure
5. proceed with in vitro tests, by checking tubulin polymerization; tubulin polymers make the solution opaque, so we can follow the process through spectroscopy; the partial result is that the selected plug damages polymerization, which now requires a higher concentration to start

6. test on cultured cells, with a single blind

A few important data are the cells remaining in the culture after application, and the results of the scrambled peptides method, in which peptides with the same amino acid composition but a different sequence are tested, to make sure the effect is actually due to the sequence and not, for example, to the charge.
Sometimes, some synthetic tricks are required, as for FtsZ. This molecule has the same effect as the microtubule destabilizers, but in bacteria; since the actors involved are different, it only targets bacteria, leaving the host cells alone. Due to a massive entropic binding term, the helix peptide must be stapled; fortunately, this does not interfere with the binding. The only downside is that bacterial cell walls do not allow peptides in, so just the degree of polymerization has been tested.
While designing such a drug, one must always keep its dimensions in focus: a large molecule will have a bigger binding energy, because more interactions are available; however, large molecules are more complex, generating synthetic and biological concerns that should be addressed beforehand or at least compensated by the drug efficiency.
In conclusion, molecular mechanics provides us with many efficient tools to analyze PP interactions, which allow us to study targeting, druggability and the design of potential drugs.

Chapter 11

Jarzynski equation

In this chapter, we will discuss how we can get equilibrium information from
non-equilibrium simulations, with the Jarzynski equation

11.1 Non-equilibrium simulations


Let's consider a two-state system (a and b). The generic path γ between the two states is described through an order parameter λ; we are interested in the variation of the free energy F .

Quasi-static transformation If the transformation proceeds infinitely slowly, allowing the


system to reach equilibrium at each instant of time, it will require an amount of work equal to
w∞ = Fb − Fa = ∆F
where the subscript ∞ stands for “infinitely slow”.

Finite rate transformation On the other hand, if the transformation occurs at a finite rate, the system lags behind equilibrium at each time instant; depending on the starting conditions, different amounts of work are required for it to happen. What we get is a probability distribution of work ρ(w, ts ) for each switching time ts (the time required for λ = 0 → 1). From this distribution, the average work is evaluated as
\[ \langle w(t_s) \rangle = \int w\, \rho(w, t_s)\, dw \]
If the switching time ts → ∞, we recover the quasi-static transformation, so
\[ \lim_{t_s \to \infty} \rho(w, t_s) = \delta(w - \Delta F) \]

For any other ts , ρ will take another functional shape.


Due to the dissipation processes happening in the finite rate transformation, the average work
is an upper bound to ∆F , so
hwi ≥ ∆F
where the equality holds only for the quasi-static process. It will be demonstrated, however, that the work exponential average
\[ \langle e^{-\beta w} \rangle = \int \rho(w, t_s)\, e^{-\beta w}\, dw = e^{-\beta \Delta F} \]
for each ts .
This brings us to Jarzynski's equation:
\[ \Delta F = -\frac{1}{\beta} \ln \langle e^{-\beta w} \rangle \]
This means that, in principle, ∆F can be obtained by running multiple replicas of a simulation and calculating the exponential average of the work. This requires a good statistical sampling of all possible paths: the single, infinite, quasi-static simulation can be substituted by many short simulations. Obviously, the convenience depends on the number of simulations required.
We should consider two limiting cases of this equation.

ts → ∞ what we have is a set of quasi-static processes, so

w = hwi = ∆F

and we are performing a thermodynamic integration


ts → 0 what we have is a set of instantaneous switches, each of them with

w = H1 − H0 = ∆H

where H is the Hamiltonian function; the Jarzynski equation becomes


\[ \Delta F = -\frac{1}{\beta} \ln \langle e^{-\beta \Delta H} \rangle \]
so we are performing a thermodynamic perturbation

11.2 Proof for Jarzynski equation


We are going to present a proof of the Jarzynski equation in two different cases: an isolated system and an isothermal system.

Isolated s. We prepare the initial state a by putting the system in contact with a heat
reservoir; after each replica is thermalized, we switch the reservoir off (we decouple system and
reservoir), so that the system is isolated, but canonically distributed. The canonical distribution
along the transformation coordinate z at t_0 is therefore

ρ(z_0, t_0) = e^{−βH_0(z_0)} / Z_0
At each time, we can evaluate the accumulated work w(z, t) i.e. the work done up to time t; if
t = ts , the total work is obtained. We then consider the exponential average
⟨e^{−βw}⟩ = ∫_γ ρ(z, t_s) e^{−βw(z,t_s)} dz

In an isolated system,
w(z, t) = Hλ (z) − H0 (z)

where Hλ is the hamiltonian function corresponding to the order parameter λ. Since the proba-
bility distribution follows Liouville’s theorem, we have

ρ(z, t) = ρ(z_0, t_0) = e^{−βH_0(z_0)} / Z_0

We can recover our integrand function by multiplying by e^{−βw}:

ρ(z, t) e^{−βw(z)} = ρ(z_0, t_0) e^{−β(H_λ − H_0)} = (e^{−βH_0}/Z_0) e^{−β(H_λ − H_0)} = e^{−βH_λ}/Z_0

so

⟨e^{−βw}⟩ = (1/Z_0) ∫ e^{−βH_λ} dz = Z_λ/Z_0
If λ = 1, the transformation is concluded. Finally, we recall that
 
∆F_λ = −(1/β) ln(Z_λ/Z_0)

therefore
∆F = −(1/β) ln⟨e^{−βw}⟩

Non-isolated s. If the system remains in contact with the reservoir, we can think of it as a
big isolated system of total hamiltonian

G_λ = H_λ(z) + H_res(z′) + h_int(z, z′)

where the interaction hamiltonian h_int depends on both the system coordinates z and the reservoir
coordinates z′. If this coupling interaction is small, it is easy to demonstrate that the Jarzynski
equation still holds. Indeed, as for an isolated system, we can reach the conclusion that

⟨e^{−βw}⟩ = Y_λ/Y_0
where Y is the canonical partition function of the global system:
Y_λ = ∫∫ dz dz′ e^{−βG_λ(z,z′)} = ∫∫ dz dz′ e^{−β(H_λ + H_r + h)}

If h is small, it is negligible, so the integrals can be separated:


Y_λ = ∫ dz e^{−βH_λ} ∫ dz′ e^{−βH_r}

This means:

Y_λ/Y_0 = (∫ dz e^{−βH_λ} ∫ dz′ e^{−βH_r}) / (∫ dz e^{−βH_0} ∫ dz′ e^{−βH_r}) = Z_λ/Z_0 = e^{−β∆F}

Even without the small coupling assumption, the Jarzynski equation validity can be demon-
strated, but it is outside the scope of this course.

11.3 A time of trials and tribulations
Even if the Jarzynski approach is exact, a purely statistical problem of convergence limits its ap-
plicability; things like t_s and the number of replicas are unfortunately heavily system dependent,
and the lack of experimental references makes testing the simulations difficult, while only small
systems can be pushed to a quasi-static process. In addition, the exponential average is statistically
not well behaved, so a truncated cumulant expansion is a widely accepted improvement.
The average work for N_s simulations can be expressed, arithmetically, as

w^a = (1/N_s) Σ_{i=1}^{N_s} w_i

For many sets of N_s simulations, that is, averaging over a lot of them, we have

⟨w^a⟩ = ⟨w⟩ = ∫ w ρ(w) dw

These two formulas are not equivalent, unless Ns → ∞, so that the law of large numbers
can be applied. As we have seen, rather than using hwi as an F -estimate, we can use something
related to the exponential average
w^x = −(1/β) ln[(1/N_s) Σ_{i=1}^{N_s} e^{−βw_i}]

For Ns = 1, wx → wa , but for Ns → ∞, wa → hwi, while wx → ∆F . If Ns is just big, we have

∆F ≤ wx ≤ wa

so w^x is closer to ∆F than w^a: a good estimate, and a statistically less biased one at that.


Unfortunately, it is an exponential function. This means that if the works are very similar, we
have a low standard deviation and everything is smooth as silk, producing a good estimate of ∆F.
However, if the data have a large dispersion (> k_B T), the values of work are very different and
the exponential average will be dominated by the smallest ones. Probably, these small work values
will represent the tail of the work distribution, so we are extracting information from geometries
of very limited relevance; this increases the number of simulations required for convergence, to
the point of making this approach unaffordable.
For microscopic systems (even experimental ones), everything works fine, but the convergence
problem appears for macroscopic ones. If the sampling is insufficient, we lose accuracy: having
small statistical errors (precision) is still possible, though. A poor sampling is always a cradle for
systematic errors.

11.4 Cumulant expansion of the Jarzynski equation


We already know that

∆F = Σ_{n=1}^{∞} (−β)^{n−1} w_n / n!

where w_n is the n-th cumulant of the work distribution, so

ln⟨e^{−βw}⟩ = −β⟨w⟩ + (β²/2)(⟨w²⟩ − ⟨w⟩²) + . . .

If the work distribution is gaussian, only the first two cumulants are ≠ 0, so

∆F = ⟨w⟩ − (β/2)σ²

where we can identify a suitable expression for the dissipated work

w_d = ⟨w⟩ − ∆F = (β/2)σ²
The close relation between the dissipation of energy and the fluctuations of the system σ is yet
another manifestation of the fluctuation-dissipation theorem1 . The main advantage of the truncated
cumulant expansion is its faster convergence; so, even as an approximation, the systematic error
it introduces is compensated by better estimates, due to this faster convergence.
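To make the comparison concrete, here is a minimal sketch (Python, NumPy assumed, hypothetical function name) of the second-order cumulant estimator just described, which can be used side by side with the exponential-average estimator shown earlier.

import numpy as np

def delta_f_cumulant(works, kT):
    # gaussian (second-order cumulant) estimate: ∆F ≈ <w> - σ² / (2 kT)
    works = np.asarray(works, dtype=float)
    return works.mean() - works.var(ddof=1) / (2.0 * kT)

For narrow, nearly gaussian work distributions the two estimators agree; for broad distributions the cumulant form converges faster at the price of a small systematic error, as stated above.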

11.5 Stiff spring approximation


We define as potential of mean force (PMF) Φ the free energy as a function of the reaction
coordinate ξ or the order parameter λ. The reaction coordinate ξ is the actual pathway along which
we apply the force, so we can evaluate the PMF with the help of a delta function:

e^{−βΦ(ξ′)} = ∫ dr dp δ[ξ(r) − ξ′] e^{−βH}

The delta function here selects all the configurations compatible with the pathway, instead of using a line integral.
In the end, we are restricting our partition function Z to the trajectory.
To force the system along the pathway, we apply a guiding potential of the shape

h(r, λ) = (k/2) [ξ(r) − λ]²

which can be imagined as exerting our force by means of a spring. To properly transmit the
force, the spring must be so stiff that it behaves almost like a rigid constraint.
The guiding potential represents a perturbation in the hamiltonian function
H̃ = H + (k/2) [ξ(r) − λ]²
Since
e−β(Fλ −F0 ) = he−βw0,λ i
we have

e^{−βF(λ)} = ∫ e^{−βH̃} dr dp = ∫ exp{−βH − (βk/2) [ξ(r) − λ]²} dr dp
Actually, we are interested in the PMF of the unperturbed system, so we introduce the identity
∫ dξ′ δ[ξ(r) − ξ′] = 1

in our integral:
e^{−βF(λ)} = ∫ dr dp ∫ dξ′ δ[ξ(r) − ξ′] e^{−βH̃} = ∫ dξ′ exp{−βΦ(ξ′) − (βk/2)(ξ′ − λ)²}
1 for more information about the fluctuation-dissipation theorem, see G.Mandelli - Introduzione alla Fisica Statistica

If the spring is stiff (large k), most of the integral weight will come from ξ′ ≃ λ due to the gaussian function;
this means our system does not drift, and the simulation value is close to the theoretical one:

Φ(λ) = F (λ)

This equality holds if we have a good coupling between the pulling and the system movement.

Chapter 12

Umbrella sampling

In this chapter, we will discuss the technicalities of umbrella sampling
and its applications

12.1 A qualitative introduction


Our goal is to obtain the potential of mean force (PMF) of a transition between the state
A and the state B along the reaction coordinate ξ. At room temperature, our system may be
trapped by any barrier > kB T ; an optimal situation would be to have a flat PES, so that we
could sample the pure diffusion motion.
By choosing the right ξ, we may modify the FF potential by adding a biasing potential W that
helps the system jump the barriers. The resulting biased potential is therefore

U^b(r) = U^u(r) + W(ξ)

where U^u is the unbiased potential we are actually looking for, which is a function of the system
positions r; on the other hand, the biasing potential is a function of ξ, which in turn is a function of r.
Obviously, the optimal W would be the opposite of the potential of mean force F :

W (ξ) = −F (ξ)

Since knowing F is the actual goal of all of this, we have to find another way of generating W .
The strategy is to divide the reaction coordinate ξ in many windows and to apply in each of them
a biasing potential, in general different from window to window; then we run a simulation in each
window. For example, we can add a harmonic biasing potential, to keep the system from leaving
the window. In this way, we obtain an exhaustive sampling of each window, and we are left
with just the problem of linking together all the results; at that point, knowing both U^b and W,
we can get U^u.

12.2 An analytical elucidation


The unbiased probability density of the reaction coordinate ξ is

P^u(ξ) = (1/Q) ∫ e^{−βU(r)} δ[ξ(r) − ξ] dr

This is the statistical weight in the phase space of the trajectory along ξ, since in the numerator
we have the configurational integral restricted to it. We already know that the free energy difference
between the two states can be obtained as

∆F = −(1/β) ln(P_B/P_A) = −(1/β) ln(Q_B/Q_A)
By adding the biasing potential, we have to consider the biased probability density for each
window i:

P_i^b(ξ) = (1/Q_i^b) ∫ e^{−β[U(r) + W_i(ξ(r))]} δ[ξ(r) − ξ] dr

where Q_i^b is the biased partition function:

Q_i^b = ∫ e^{−β[U(r) + W_i(ξ(r))]} dr

Since W is a function of ξ only, it can be taken out of the integral:

P_i^b(ξ) = (e^{−βW_i(ξ)} / Q_i^b) ∫ e^{−βU(r)} δ[ξ(r) − ξ] dr

By comparison, we have that

P_i^u(ξ) = P_i^b(ξ) e^{βW_i(ξ)} Q_i^b/Q
where, thanks to the definition of average,

Q_i^b/Q = (∫ e^{−βU − βW_i} dr) / (∫ e^{−βU} dr) = ⟨e^{−βW_i}⟩

so that

P_i^u = P_i^b e^{βW_i} ⟨e^{−βW_i}⟩
The free energy in the window is therefore easily accessible as

F_i(ξ) = −(1/β) ln P_i^u(ξ) = −(1/β) ln[P_i^b(ξ) e^{βW_i} ⟨e^{−βW_i}⟩]
       = −(1/β) ln P_i^b(ξ) − W_i(ξ) − (1/β) ln⟨e^{−βW_i}⟩
If we cover ξ with one single window, obtaining F_i is as simple as evaluating the first two terms,
since the difficult one – the last one – can be neglected as an additive constant. Increasing the
number of windows forces us to consider it, since the free energy must be a continuous function across windows.

12.3 Adaptive umbrella sampling


We typically apply a harmonic biasing potential in each window

W_i(ξ) = (k/2) [ξ(r) − ξ⁰]²

where ξ⁰ is now the center of the window. With this we sample each window.
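A minimal sketch of such a window setup follows (Python, NumPy assumed); the window spacing, the stiffness value and the callable unbiased_u are hypothetical placeholders, not part of any specific MD package.

import numpy as np

def harmonic_bias(xi, center, k):
    # W_i(ξ) = (k/2) (ξ - ξ⁰)², the umbrella restraint of window i
    return 0.5 * k * (xi - center) ** 2

# windows spanning ξ in [0, 1] nm; spacing and k must give overlapping histograms
centers = np.linspace(0.0, 1.0, 21)
k = 1000.0   # kJ mol⁻¹ nm⁻², hypothetical stiffness

def biased_potential(xi, unbiased_u, window):
    # U^b(ξ) = U^u(ξ) + W_i(ξ), the potential actually sampled in window i
    return unbiased_u(xi) + harmonic_bias(xi, centers[window], k)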
We must take care that the sampling distributions of neighbouring windows superimpose with
each other, keeping in mind that new windows can always be added. Another possible approach
is that of adaptive umbrella sampling, in which instead of using many windows and a
predefined W_i, we sample our system in a single window without any biasing potential; then we adapt
the bias in the subsequent simulations. Exempli gratia, without any biasing potential our system
will be stuck in the starting minimum, which will be well sampled. In the second simulation, we
hypothesize that the underlying potential is harmonic and centered so that it reproduces what we
sampled; we add its opposite to the total potential and we run the simulation: the system will
explore a much broader portion of phase space, so a different underlying potential is added. This
procedure is repeated until convergence is reached, convergence meaning a diffusive dynamics.
The linking problem is still there, since we still have different potentials to glue together: we
have just turned a many-windows single-simulation problem into a many-simulations single-window
one. The linking task is left to the WHAM.

12.4 WHAM


The weighted histogram analysis method (WHAM) gives us a viable approach to link together
the potential bits obtained through US. It works for both standard and adaptive umbrella
sampling; we are going to describe it for the standard case.
We divide each window S_i in M intervals j, so that a histogram can be created for each of
these windows. The biased probability in the j-th interval of the i-th window is

P_ij^b = f_i C_ij P_j^u

where P_j^u is the unbiased probability, C_ij = exp(−βW_i(ξ_j)) is the biasing factor and f_i is the
normalization factor. The latter can be obtained by enforcing normalization of the P_ij^b:

Σ_{j=1}^{M} P_ij^b = 1   ⇒   f_i = 1 / Σ_j C_ij P_j^u
At the end of the day we get a system of M + S equations

P_j^u = Σ_{i=1}^{S} n_ij / Σ_{i=1}^{S} N_i f_i C_ij

f_i = 1 / Σ_{j=1}^{M} C_ij P_j^u

that cannot be solved directly, but requires a self-consistent algorithm.
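A minimal self-consistent loop for this system could look like the sketch below (Python, NumPy assumed); n_ij are the histogram counts, N_i the total samples per window and C_ij the bias factors defined above, while the function name and defaults are hypothetical.

import numpy as np

def wham(n_ij, N_i, C_ij, tol=1e-8, max_iter=10000):
    # n_ij: (S, M) counts in bin j from window i; C_ij = exp(-beta * W_i(xi_j))
    S, M = n_ij.shape
    f = np.ones(S)                                   # window normalization factors
    for _ in range(max_iter):
        # unbiased bin probabilities from the current f_i
        P_j = n_ij.sum(axis=0) / (N_i[:, None] * f[:, None] * C_ij).sum(axis=0)
        f_new = 1.0 / (C_ij * P_j[None, :]).sum(axis=1)
        converged = np.max(np.abs(f_new - f) / f_new) < tol
        f = f_new
        if converged:
            break
    # the PMF then follows as F_j = -(1/beta) ln P_j, up to a constant
    return P_j, f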

Chapter 13

Umbrella sampling applications

13.1 Binding free energy of a polypeptide


Some peptides stack together, creating a plaque that causes neurodegenerative illnesses. Umbrella
sampling can be applied to the system to generate the conformations obtained by pulling one of
the peptides away. This process is repeated for the wild type and a group of mutants; since
all of this is not an equilibrium simulation, umbrella sampling is absolutely required to obtain
meaningful results.

13.2 Peptide stability in mixed solvent


A lot of important molecules protect extremophiles from osmotic stress. We recall that osmosis
makes the concentration of the salt equal on both sides of a semipermeable membrane, to
enforce equilibrium. These molecules are called osmoprotectants and have interesting effects on
protein stability, since they stabilize proteins against any sort of denaturation.
To simulate all of this with umbrella sampling, we need to identify the start, the end and
everything in between: this means finding a reaction coordinate. Fortunately, an easy peptide
reaction coordinate is the end-to-end distance; it works just fine for both the α-helix and the β-
hairpin. The number of windows is determined on the reference system (water solution): convergence
is reached with 37 windows. With the same approach, the optimal length of each simulation turns
out to be 10 ns.
The trajectory is then tested with different molecules, both osmoprotectants and denaturants
(like urea). The first ones increase the unbinding ∆G, while the others decrease it. By scanning the
reaction coordinate, we can identify the PMF profile, so that mechanistic hypotheses may be
inferred. The PMF presents two jumps:
1. when the hydrogen bonds are breaking. For each solvent, the breaking path is the same,
but the free energy changes
2. when a hydrophobic core is exposed and broken
By looking at the effective force ∂_ξ G, we notice that the first jump turns into a peak (as
expected), but it is not affected by osmoprotectants; denaturants, however, lower the energy
required; urea, for example, is a backbone replacer, so it mimics the interactions lost in the
unbinding. The second peak, instead, is only affected by the osmoprotectants, which shield the
hydrophobic core, hindering the breakdown.

Alas, umbrella sampling can only see what happens during the simulation, so we are not able
to see anything outside of the reaction coordinate: we cannot consider any misfolded β-hairpin,
verbi gratia; to do so, we would need more reaction coordinates, which means more windows,
which means a much greater computational effort.

Chapter 14

Metadynamics

14.1 Something new, something old


Metadynamics is a molecular dynamics approach related to self-adapting umbrella sampling,
and based on a dimensional reduction of the phase space. First things first, we make a constrained
representation of the phase space, describing the system with the so-called collective variables.
These are functions of all the coordinates, selected to describe the system in its entirety. The aim
of metadynamics, then, is to reduce the free energy profile to a function of just the collective
variables and then run a simulation on it, cutting the calculation time.
Let’s try to understand this through a pictorial example. A certain night, a man falls into
an empty swimming pool; in the dark, he moves randomly to find an exit. Since the walls are
all equally tall, he is constantly pushed away, looking for an easier passage. Unfortunately, in
this conditions, the man will spend the rest of the night in the pool. However, he posses a bag
of holding that he got from cruently slaying a beholder (Figure 14.1), and it is full of an infinite
amount of sand. With this, the man can get out of the pool by simply dropping a bit of sand
each n steps, slowly filling the pool.
In metadynamics, the system won't have a bag of holding, but every n steps it will drop a
small gaussian biasing potential where it stands; doing so, it will be pushed away from already
explored regions, until it is able to escape. This strategy is applied in the reduced phase
space. Moreover, once the phase space is filled, we will have an empirical reconstruction of the
free energy as a function of the collective variables.

Figure 14.1: A beholder. It has beauty in his eye

14.2 Gaussian sand does not drift


We have a small number of collective variables s(x), functions of the system spatial coordinates
x. The system is subjected to a potential U(x), on which we will stack the biasing potentials.
The equilibrium behavior follows the probability distribution of the reduced phase space

P(s) = (1/Z_s) e^{−βH(s)}
where Z_s is the partition function in the s-space. The free energy can be evaluated by restricting
the configurational integral to given values of the collective variables, by means of a delta function:

F(s) = −(1/β) ln ∫ e^{−βU(x)} δ[s − s(x)] dx
The total biasing potential depends on time, since we add to it every n steps. It has
the shape

U_g(s(x), t) = w Σ_{n: nτ_g < t} exp{−[s(x) − s(nτ_g)]² / (2σ_s²)}
where w is the height of the gaussian, τ_g is the number of steps between each deposition, and σ_s is the
gaussian width. Large w and σ_s and small τ_g mean big gaussians dropped very frequently, so a fast filling
of the wells, but a bad reconstruction of the free energy. Indeed, accuracy is related to σ_s, while
the error on the free energy profile goes as

Err(F) ∝ w/τ_g
so each time we must balance celerity and accuracy. The assumption is that

lim_{t→∞} U_g(s, t) ≃ −F(s)

but it is important to notice that this equation was only heuristically postulated and then
positively tested on multiple systems; there is no theory behind it.
14.3 Collective variables socialism
More often than not, more than one collective variable is taken, since oversimplifying may lead
to artifacts; this requires a multidimensional gaussian potential of the shape

U_g(s, t) = w Σ_{n: nτ_g < t} Π_{i=1}^{d} exp{−[s_i(x) − s_i(nτ_g)]² / (2σ_{s,i}²)}

where d is the number of collective variables. A high d has some drawbacks, since the number
of gaussians needed increases exponentially, to the point that vanilla molecular dynamics becomes more
viable. The number of gaussians needed to fill a well, n_g, goes as

n_g ∼ σ_s^{−d}

so typically d stays between 1 and 3.


Now onto the kinds of collective variables we can employ.

• distances between atoms or centers of mass; they do not take into account the orientation
of the molecules, so we may need some...
• ... dihedral angle
• coordination number is very useful to describe hydrophobic interactions, H-bonds and com-
plexation. It is defined as

s(r) = Σ_ij [1 − (r_ij/r_0)^n] / [1 − (r_ij/r_0)^m]

where r_ij is the distance between the two objects, r_0 is a reference distance, while n and m
are fitting parameters, used to smooth what would otherwise be a step function in some extreme
cases.

These extra ones are specific to the secondary structure of a protein:

• helical content in peptides may be measured with the helicity:

Φ = Σ_i^{residues} (1/2) [1 + cos(ϕ_i − ϕ_0)]

where ϕ_i are the angles in the Ramachandran plot, ϕ_0 being the one of the perfect helix, that is
45◦. If the peptide is a perfect α-helix, Φ is equal to the number of residues.
• a bit more accurate way is the dihedral correlation, expressed as

Φ_corr = Σ_{i=2}^{res} √{[1 + cos(ϕ_i − ϕ_{i−1})]/2}

If the correlation is high, like in any folded structure, ϕ_i ≃ ϕ_{i−1}, so Φ_corr = n − 1; if there
is no recognizable secondary structure, the dihedral correlation assumes many different
values. It can be compared with circular dichroism.

• the radius of gyration measures the length of the peptide in some way

• Another possible collective variable is the root mean square deviation (RMSD) of the
protein with respect to an ideal structure, considering each group of 6 residues inside the peptide.
For a perfect α-helix, the RMSDs of the groups are all null. The collective variable can be
expressed as

s = Σ_i^{groups} [1 − ((r_i − d_0)/r_0)^n] / [1 − ((r_i − d_0)/r_0)^m]
We can easily understand that even a simple system can spawn a lot of different collective
variables, so the choice is fundamental. More importantly, these collective variables may not be
enough: an α-helix bundle is not distinguishable from a single long α-helix if we only have a helicity
variable, so the radius of gyration may be added.

Chapter 15

Some examples

15.1 Test on the number of gaussians


If we take a simple potential energy surface, we can see what happens when the number of
gaussian functions increases. The first chunk of gaussian functions will fill the starting well, and
the system will inevitably fall into another one; at this point, each gaussian function will help in
leveling the new hole, so that the system can exit and the cycle may restart. This filling-exiting-falling
mechanism proceeds until the free energy profile is completely levelled and the system evolves
in a diffusive regime. Considering more collective variables increases the computational effort, but
makes it possible to identify certain paths and recognize the nature of certain points.

Figure 15.1: Filling of a potential profile by an increasing number of gaussian functions

15.2 Parrinello’s benchmark work on β-hairpin
To apply metadynamics on the β-hairpin, Parrinello took as collective variables the number of
intramolecular hydrogen bonds and the radius of gyration. Three minima appear in the resulting
free energy surface:

1. the deepest, at low radius of gyration and high number of hydrogen bonds, corresponding
to the folded state
2. another at low gyration radius and low number of hydrogen bonds, corresponding to a
misfolded compact structure
3. the last one at high gyration radius and low number of hydrogen bonds, that is the unfolded
state

Then, Parrinello applied parallel tempering to the simulation: this created not only a canon-
ical distribution of initial conditions, but allowed also to sample a set of temperatures. With this
brand new collaboration, metadynamics is a lot faster, because the system does not need to wait for the well
to be filled to jump out, but may find itself in another well thanks to replica exchange. This allows
a more uniform filling of the whole surface, increasing the calculation efficiency.

15.3 Osmoprotectants 2: electric boogaloo


Applying parallel tempering to metadynamics not only speeds up the calculation, but also yields
information at different temperatures. This approach was employed to study the effect of osmo-
protectants at different temperatures, focusing on the change of the melting temperature. In
fact, the 3D structure of a protein is stable at room temperature, but upon heating a phase tran-
sition (commonly called denaturation) happens at a well determined temperature, the melting
temperature T_m.
The effect of temperature is that of changing the shape of the free energy profile F (Q):

1. at T < Tm , the folded structure sits on a deeper minimum than the unfolded one, so it is
stable
2. at T = Tm, the folded minimum is as deep as the unfolded one, so both of them are stable
(phase transition)
3. at T > Tm , inversion occurs and the unfolded structure sits on the deepest minimum, and
the protein is unfolded

At room temperature, the osmoprotectants make the folded minimum larger, while with
denaturants this minimum almost disappears and the unfolded one gains importance; osmopro-
tectants increase T_m, while denaturants reduce it. The change in the free energy profile directly
influences the ∆F of the transition, which can be obtained as always through the ratio of the partition
functions.
This was parallel tempering. As for metadynamics, it allows us to easily get an atomic de-
scription. With it, we can see the mechanism of both classes of molecules:

denaturants interact directly with the protein, exposing the hydrophobic core but mainly steal-
ing H-bonds
osmoprotectants protect the hydrophobic core of the protein, without direct interaction

Honestly, the osmoprotectant effect is still unclear, but two hypotheses have been presented:

water ordering the osmoprotectant forces a more ordered structure on water, reducing the
entropic penalty of the hydrophobic core; it is the opposite of the chaotropic effect of urea

osmophobic effect the osmoprotectants have an unfavourable interaction with the protein
backbone, which is far more exposed in the denaturated structure; this means that the
osmoprotectants destabilize both folded and unfolded states, but the unfolded
one far more

Chapter 16

Coarse grain force fields

In this chapter, we will move away from the atomic frame, to further increase
the calculation speed. These kinds of bead force fields will be discussed through
the martini force field

16.1 martini force field


Until now we have treated atomistic simulations, but the number of atoms may be too big to
treat certain systems, so we have to simplify the description further. In the early days of MD,
something called united atoms was employed; for example, the methyl group was treated
as a modified carbon atom. This approach is no longer used, but its idea still lives in the coarse
grain force fields, which use a larger scale, turning an amino acid into a single bead, for example.
This way we save a lot of trouble, but we lose the entire atomic description.
The example we will use throughout this chapter is the martini FF for phospholipidic double
layers, which is a 1-to-4 FF, meaning one bead every four heavy atoms. This description stands in the
middle between atomistic MD and more extreme coarse grain FFs. The martini FF adopts four degrees
of polarity: charged, polar, non polar and apolar. If the number of heavy atoms does not divide by
four, we have to introduce an arbitrary description; moreover:

Rings are described with a 1-to-2 mapping


Water is described with a single large bead for 4H2 O

Ions are in beads with their first solvation sphere

16.2 Pros and Cons


The main advantage of coarse graining is the reduction of the number of particles; in addition, fewer,
larger particles allow a larger time step, up to 40 fs: here is where the real gain is. Moreover, the
bead description presents a much smoother PES, making the exploration something like four
times faster. Both calculation and dynamics speed benefit from the lack of explicit H-bonds to
evaluate and from the increased diffusion coefficient. In total, we can estimate that a coarse grain
simulation could easily be hundreds of times faster than a normal one.
The problems stem directly from the lack of atomic description. Even if the CGFF is balanced
for free energy, entropic and enthalpic contributions are not reliable, since the coarse grain model possesses
far fewer states than a proper set of atoms; moreover, any atomic chirality is lost. This means that the
temperature dependence of any property is not at all reliable. Finally, coarse grain FFs have the
same weakness as any other FF, i.e. their parametrization. Indeed, each FF has been optimized
with a goal in mind, so they are prone to overstabilization of the system of interest. In all of
this, we have to remember that structural information must be given as an input, and great
modifications are not possible.

16.3 Some examples


Dipeptide aggregation It is a well known fact that certain dipeptides can self-assemble into
tubes or other structures, given a lot of time. Since the time scale involved is on the order of microseconds and a lot of
peptides are necessary, this process can only be simulated through coarse graining. With the martini
force field, a box of water and dipeptides was evolved, measuring the association propensity
index, that is the ratio between the initial surface area and the final one. If the final surface
is less than the initial one, some aggregation has happened, but only an association propensity
index greater than 2 is a clear symptom of aggregation. The simulation was repeated for every
possible dipeptide, and only a small number of them showed self-assembly. In this, the martini FF
managed to reproduce the experimental results, correctly identifying the assembling dipeptides.

Membranes Obviously, the martini FF was successfully employed to describe membrane self-assembly,
vesicle formation and every other thing that happens to membranes. This was possible thanks
to the coarse grain approach.

Nanoparticles Coarse graining has also been applied to simulate the interaction between the
membrane and other objects, like fullerenes and nanoparticles. This is because nanoparticles
are very good drug carriers, so their use and toxicity are important pieces of information.

Holes The longer time frame accessible to the coarse grain force fields allowed the exploration
of membrane hole formation under the effect of particular anti-microbial peptides.

Chapter 17

Accelerated molecular dynamics

In this chapter, we will describe the last approach to enhance the sampling,
accelerated molecular dynamics.

17.1 The accelerated approach


As always, we are looking for a fast method to sample the phase space and reconstruct a free
energy profile. Like the other methods, AMD enforces a better sampling, by helping the system
escape from wells. To do so, a threshold energy (or boost energy) ε is set; if the potential energy
is above it, no modification is applied, otherwise a little increment is added, to facilitate the
escape. In other words, the PES is modified to better explore the phase space:
U* = U          if U ≥ ε
U* = U + δU     if U < ε

The effect of this modification is to exponentially decrease the time required to escape a
potential well, which depends exponentially on the barrier height. This means that the effective
time step δt* is greater than the set time step δt, because the sampling is more efficient. We can
state

δt* = δt e^{βδU}
where the exponential function of δU modulates the effective time step increment. The potential
boost δU is a function of both the coordinates and time. This formula also tells us that if U ∗ = U ,
δt∗ = δt, otherwise δt∗ > δt. The effective simulation time is the sum of all the effective time
steps, so
t* = Σ_i δt_i* = Σ_i δt e^{βδU_i} = N δt (1/N) Σ_{i=1}^{N} e^{βδU_i} = t ⟨e^{βδU}⟩

The exponential average of δU is known as boost factor.

17.2 Free energy profile recovery
What we saw is not enough to get a free energy profile, since we are actually sampling a biased
PES U*, in which we have no interest. We want the original PES, having sampled only the biased one.
We recall that free energy is more or less a probability of occupation, which describes the higher
population of lower energy regions. Population is then the key to solve this problem, yet we still
have a biased population.
If we consider the average of an observable A,

⟨A⟩ = (1/Z) ∫ A(x) ρ(x) dx

but we sample a biased PES, the bias will fall on the distribution ρ, turning it into ρ*:

⟨A⟩* = (1/Z*) ∫ A(x) ρ*(x) dx
The change in distribution reflects the change in potential

U* = U + δU

so by weighting each sampled position by e^{βδU}, we effectively recover the proper distribution, since

e^{−βU} = e^{−βU*} e^{βδU}

The same procedure is repeated on Z*. This way we go back to ⟨A⟩, since we know δU at each
x.

17.3 The shape of the δU


No fish romance in here, just cold mathematics. The first implementation of the boost potential,
U* = U          if U ≥ ε
U* = U + δU     if U < ε

was too abrupt, hindering convergence. A smoother version was then created, introducing a
smoothing parameter α:

δU = (ε − U)² / (α + ε − U)

If α = 0 we go back to the original abrupt version, while increasing α makes the modification smoother.
Originally, the acceleration was only applied to torsional terms, since they were considered critical
in folding processes. Now, every FF term is boosted.
Another important matter is the magnitude of ε. To find it, some unboosted simulations are
run and the potential energy is extrapolated. The system was probably stuck at the bottom of a
well, of which we get the profile; now we can make an educated guess on how much energy would
be required to escape it. As always, balance is of the essence, and it is way more important in
the choice of the smoothing parameter α.
We have to say that the choice of the parameters of accelerated dynamics is the counterpart of
the selection of the reaction or collective coordinates in metadynamics and umbrella sampling.
In fact, accelerated methods do not require any projection of the potential energy surface, but
need proper α and ε. Since the simulation heavily depends on them, exhaustive testing must
be done.

17.4 Reconstruction and reweighting
In practice, recovering the unbiased free energy profile closely resembles the WHAM technique.
Indeed, we look for the probability of A as a function of the j-th bin, in a histogram fashion.
At this point we have ρ_j*, from which we can recover the original ρ by reweighting

ρ_j = ρ_j* e^{βδU_j}

so that for m bins, we have

⟨A⟩ = Σ_{j=1}^{m} A_j ρ_j* e^{βδU_j}

The free energy can now be evaluated as

F = −(1/β) ln ρ
Since throughout all the recovery an exponential average is employed, we can expect slow con-
vergence. As always, a second order truncated cumulant expansion will solve the problem.
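The histogram reweighting just described could be sketched as follows (Python, NumPy assumed, hypothetical function name); xi is any post-processing coordinate and dU the per-frame boost values.

import numpy as np

def reweighted_profile(xi, dU, kT, bins=50):
    # histogram along a chosen coordinate, each frame weighted by exp(δU/kT)
    weights = np.exp(np.asarray(dU) / kT)      # exponential reweighting factor
    hist, edges = np.histogram(xi, bins=bins, weights=weights, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    F = -kT * np.log(hist, where=hist > 0, out=np.full_like(hist, np.nan))
    return centers, F - np.nanmin(F)

In practice the exponential weights can overflow for large δU, which is exactly why the truncated cumulant expansion mentioned above is preferred.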

17.5 Examples
β-hairpin folding Since the β-hairpin is the simplest compact structure peptides assume, it is a
common test system. After the simulation is completed, the free energy F can be written as a
function of any interesting variable: this is just post-processing, and no relevant computational effort
is wasted. A good plotting variable was the deviation from the crystal structure.

Small protein AMD was applied to the folding of a small protein with an α-helix and a turn.
A compact intermediate was discovered.

Three-helix bundle The folding of this system also presents a compact intermediate. In this case,
AMD is able to reproduce the results of MD simulations 18 times longer in computational time.

Chapter 18

Conformational Analysis

In this chapter, we will discuss the peculiarities of conformational analysis,


with focus on the generation of conformations

18.1 Conformations
A conformation is a 3D molecular structure that differs from another by a dihedral angle. Even
in a simple hydrocarbon, we can have a lot of dihedral angles, which can generate many different
conformations. This definition can be relaxed, since in more complex molecules some other pa-
rameters can change due to steric strain; for this reason, we do not consider the small changes in bond
length and angle that may occur. In complex molecules, though, it can be very useful to analyze the
different conformations. A particular drug, for example, may have just one active conformation,
which we would like to know.
Conformational analysis is the right tool to find it, but some general aspects must be consid-
ered first:

• the active conformation may be stable only when bound, and may not be the most favourable
• the conformation depends on the solvent; conformational analysis is usually performed in
vacuum or in implicit solvent
• when conformational analysis is performed, we look for all possible conformations, that
correspond to the potential energy minima

This means that, for all intents and purposes, conformational analysis is an optimization problem,
and we know that no optimization tool is able to find all the minima, let alone the absolute one.
What we can find is the closest minimum from the starting position. Therefore, the core business
of CA is optimizing a set of different starting positions.

18.2 Generating starting points


Optimization is started from a grid of points, from which we will get a lot of “closest minima”,
but very often different starting points will reach the same minimum. The first problem is thus
generating starting conformations, and only then we get to optimization.
As for generating starting conformations, CA has three peculiar methods:

1. systematic method or grid search
2. model building method
3. random search

Systematic method explores the conformational space systematically and reproducibly. E.g.,
ethane has a single dihedral angle; if we change its value from 0 to 360◦ with a 10◦ step, we will
have 36 starting conformations. Since ethane has three minima, on average 12 conformations will
reach the same one. Unfortunately, in most cases we do not know the PES, so this kind
of reasoning cannot be applied. The total number of initial conformations is given by

N = Π_i^{dihedrals} 360◦/θ_i

where θ_i is the step for each dihedral angle; usually it is the same for all angles, but that is just
a habit.
This approach is easy and reproducible, but also prone to combinatorial explosion. With 5
rotatable bonds and a 30◦ step, we have around 250000 starting structures. Generating them is not very time
consuming, while optimization is! With 7 rotatable bonds, a 30◦ step and 1 s per optimization, the
complete conformational analysis amounts to more than 400 days. This means that the systematic
method can be applied to larger molecules only after proper consideration.
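The combinatorial explosion is easy to reproduce with a few lines of Python (standard library only; the numbers refer to the examples quoted in the text, and the generator name is hypothetical).

import itertools

def grid_conformations(n_dihedrals, step_deg):
    # systematic grid: every dihedral takes the values 0, step, 2*step, ... < 360
    values = range(0, 360, step_deg)
    return itertools.product(values, repeat=n_dihedrals)

print(sum(1 for _ in grid_conformations(3, 120)))        # 27 grid points, e.g. three dihedrals at 120°

n, step = 7, 30
count = (360 // step) ** n                               # (360/step)^n starting points
print(count, "starting structures")                      # 35 831 808 for 7 bonds at 30°
print(count / 86_400, "days at 1 s per optimisation")    # ≈ 414 days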
Fortunately, some conformations are really ugly, with self-crossing or very close atoms, and
they would not be able to reach any suitable minimum: to all intents and purposes, they can be discarded
without remorse. Moreover, all similar conformations can be discarded, further cutting the num-
ber. This procedure is called pruning, as we can see from an example. In hexane, only the central
three dihedral angles are relevant, namely ω1, ω2 and ω3. If we use a 120◦ step, we get three
values for each of them. This means that for each ω1, three ω2 are possible, and equally for each
ω2 we will have three ω3. This is where combinatorial explosion comes from. If for a certain ω2
we get a bad conformation, we can cut the entire branch, effectively skipping the evaluation of the
three different ω3.
Another problem arises for cyclic molecules. Obviously, in generating the starting conformation
we may lose cyclicity, so we generate a pseudo-acyclic analog by breaking a bond. Before starting
optimization, we check that the structures generated have the two atoms of the broken bond
close enough; this way, we discard anything too distorted to form a cycle. Some generation time
is wasted, but we do not waste any during optimization.

Model building takes advantage of the fact that organic molecules are made of standard
moieties (building blocks, functional groups), so we can break down the molecule into its functional
groups, perform CA on them, store the results into a database and, at the end, put
everything back together. This is based on three assumptions

1. the blocks conformations are mutually independent


2. the blocks conformations are not perturbed by the rest of the molecule, so they are the
same as in vacuum

3. a good database is accessible

Random search is the opposite of systematic search, and generates structures randomly. We
have two main approaches:

1. the new conformation is obtained by randomly modifying the cartesian coordinates of a
previous one; this approach is easy to code, but generates a lot of distorted conformations
that make optimization heavier

2. we randomly modify a dihedral angle of a molecule; bad conformations are still generated,
so preliminary checks have to be performed

Either way, from a starting guess conformation we generate another one by modifying the structure.
This conformation is optimized, and the resulting one is compared with the database of the
conformations optimized previously. If there is a similar one, it is discarded, otherwise it is
stored. The new guess structure can be created in many ways:

1. we can use the previous accepted one


2. we choose one of the less represented structures of the database, biasing the CA toward
unexplored regions
3. we choose one of the less energetic structures of the database, biasing the CA toward low
energy regions
4. we employ a Metropolis algorithm
5. ...

We have to point out that in CA we are looking for all the minima, while in Monte Carlo and
MD we just want to sample the whole phase space: MD has no optimization phase. However, MD
can be exploited to generate the starting conformations. What we find with CA is the minimum
geometry, while MD passes through a lot of conformations that are close to the minimum and are
not so useful, resulting in a big waste of everyone's time. Moreover, MD is performed at a certain
temperature, and the system will keep moving around, at least until we decrease the temperature to
absolute zero, when the system will freeze into the minimum.
The termination phase of random search needs a bit more attention. Indeed, while sys-
tematic search ends when all the generated structures are optimized, random search could go on
until the end of time; we can stop it by one of the criteria below (a minimal sketch of such a
search loop follows the list):
• setting a maximum number of generation cycles


• forcing an end if for n steps the algorithm does not find any new conformation
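The sketch below (Python, standard library only) illustrates the loop; optimize, perturb and start are user-supplied placeholders (a local minimizer, a move that changes one random dihedral, and the initial conformation as a list of dihedral angles), and the similarity tolerance is hypothetical.

import random

def random_search(optimize, perturb, start, max_cycles=10_000, patience=500, tol=15.0):
    # two conformations are considered "the same" if every dihedral differs by less than tol degrees
    def is_new(conf, db):
        return all(max(abs(a - b) for a, b in zip(conf, known)) > tol for known in db)

    database = [optimize(start)]
    idle = 0
    for _ in range(max_cycles):
        guess = perturb(random.choice(database))     # new guess from a stored conformation
        minimum = optimize(guess)
        if is_new(minimum, database):
            database.append(minimum)
            idle = 0
        else:
            idle += 1
        if idle >= patience:                         # no new minima for `patience` cycles
            break
    return database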

18.3 Advanced optimisation approaches


Simulated annealing The term annealing comes from the semiconductor industry, where it
refers to a technique used to obtain a perfect Si crystal via slow cooling of the molten material.
The same process can be simulated with molecular dynamics: the system is equilibrated at
high temperature and then cooled down to absolute zero. If the cooling is infinitely slow, the
system will fall into the absolute minimum; however, the simulation can only proceed step by
step, waiting for equilibration. This means that at intermediate temperatures the system can
get stuck in another minimum, preventing a proper optimization. We avoid this by performing
a set of simulations: each of them will reach a minimum, and the most frequently reached one will probably
be the absolute one. Simulated annealing is an example of a molecular dynamics application in
conformational analysis, although it is not a plain one. Another possible application is a sort
of quenching procedure, in which alongside the simulation some conformations are extracted
and optimized.

Genetic algorithms Biological evolution1 is a natural optimization process and inspires an
entire set of approaches. In our case, the phenotype is the 3D structure of the molecule, whereas
the genotype is the list of dihedral angles. The optimisation starts with a randomly generated
population of individuals, subjected to a fitness function that is the potential energy: the lower,
the better.
Evolution, like in biological populations, happens by passing from one generation to the next
one, by mating of the individuals; this is implemented by exchanging bits of the genome between
different individuals (genome crossover). Moreover, we must ensure that the fittest
individuals will generate more offspring, like in Darwinian evolution; for this we use a roulette
selection: to each individual we assign a sector of a circle, so that the lower the energy the larger the sector.
Then, we apply a hit-or-miss algorithm to determine which individuals reproduce. This kind
of algorithm can be stopped after a certain number of generations or at a certain energy value, since in most
cases convergence is not guaranteed. A large number of runs is required to get a result of any
relevance.
In addition, we can introduce an operator that simulates mutations, by randomly changing
a gene (dihedral angle) in a random individual. Since all of this is a simulation, we can also
use some cheats. For example, we can save very fit individuals throughout the generations; or
we can perform a classic geometry optimisation (something like intensive training) on a selected
group of individuals, creating a so-called Lamarckian genetic algorithm, because we allow this
acquired modification to be transmitted.
Convergence is still a problem, since we cannot be sure that the individuals that reproduce
are the best ones overall, but just that they are the fittest of that specific generation. We soften this
problem by splitting the population in different environmental niches, like what happened with
the Galapagos birds; this is done by running multiple simulations.

18.4 Cluster analysis


From CA we can easily get thousands of very similar minima, so a simplification can be very
useful. This is done through cluster analysis, which generates similarity subgroups from a large set
of structures, based on a measure of distance or difference. This is usually the RMSD (distance)

RMSD = √[ Σ_i^N (x_{i,a} − x_{i,b})² / N ]

where x can be the cartesian position, the cartesian distance or the dihedral angle. Another
measure can be the modulus of the difference of analogous dihedral angles (city block metric).
1 There were many different hypotheses on how evolution works. The first one was the catastrophe theory, ac-
cording to which living organisms evolved after big environmental catastrophes (like the dinosaurs). Then Lamarckian
evolution stated (not completely correctly) that acquired characters could be inherited, so giraffes got their long
necks because they kept straining themselves to reach high leaves. Finally, Darwinian evolution stated that a tandem
of mutation and selection guarantees the survival (therefore reproduction) of the fittest. This evolution theory has
now evolved (eh eh) to include genetics: the genotype is partially transmitted to the offspring, which will probably
manifest it through its phenotype. Sex is where most of the mutation happens.

Obviously, before evaluating these measures, the structures must be superimposed. Different
measures give different values of similarity.
After a difference measure has been established, the clustering algorithm can be initiated.
These come in different categories, illustrated as follows.

hierarchic that is a sequential algorithm. It can be


agglomerative puts individuals together, reducing the initial number of clusters
divisive splits the individuals, increasing the initial number of clusters

The linkage condition can be set as


simple l. in which the cluster distance is the shortest distance between the individuals of
different clusters
average l. in which the cluster distance is the average distance between the individuals
of different clusters
complete l. in which the cluster distance is the longest distance between the individuals
of different clusters
None of these is intrinsically better than the others. Hierarchic algorithms require computing all the distances,
so they are not suitable for large data sets

non hierarchic that is not sequential. A good example is the K-means algorithm, which divides
the population in K subpopulations. In each of them a centroid is identified, and then the
clusters are re-organised so that each of them collects the nearest neighbours of its centroid.
This process is iterated, taking each time as centroid the individual with the lowest
sum of distances with respect to the others. Convergence is reached when modifications no
longer occur, or the change is below a certain threshold. The centroid is then taken as the
representative conformation (a minimal sketch is given below).
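The sketch below (Python, NumPy assumed, hypothetical function name) implements this centroid-based clustering on a precomputed matrix of RMSD-like distances between conformations, following the recipe just described.

import numpy as np

def kmeans_medoid(dist, k, max_iter=100, seed=0):
    # dist: (N, N) matrix of pairwise distances (e.g. RMSD) between conformations
    rng = np.random.default_rng(seed)
    N = dist.shape[0]
    medoids = rng.choice(N, size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)          # assign each structure to its nearest centroid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                # new centroid = member with the lowest sum of distances to the others
                new_medoids[c] = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids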

Chapter 19

Protein folding and stability

Protein folding uuuh uuuuuh!!

19.1 Characteristics of protein folding


In 1957, Anfinsen designed an experiment to understand the denaturation process of proteins.
In it, a protein of well known folded structure was denaturated under different stresses (tempera-
ture, acids, salts). Once the stress was removed, Anfinsen noticed that the protein turned back to
the same original folded structure. After this experiment was successfully repeated on different
proteins, he concluded that the primary structure is closely associated with a particular 3D struc-
ture. We now know that the opposite is not true, and a mutation in the sequence usually does
not change the folded structure: the sequence determines the structure, but not the other way around.
Some time later, in 1968, Levinthal realised that if we treat the protein as an amino-acidic
polymer, the time it would take to explore all its conformations and reach the folded one would be
greater than the age of the universe. Since life exists thanks to protein folding, and experimentally
this process does not exceed one second, this is known as the Levinthal paradox. The folding
process is not a simple exploration of the phase space, but it is guided by the amino-acidic
sequence.
These two anecdotes underline that protein folding is a twofold (or duofold eh eh) problem,

thermodynamically because each protein possesses one folded structure and many denaturated
ones, so the folded structure has essentially zero entropy and we need to describe a polymer that folds
into a zero-entropy state

kinetically because the single folded state is reached in a finite time

Defining the folded state and the folding pathway is the protein folding problem.

19.2 Protein as a frustrated system


Even if the native structure is an absolute minimum, it does not mean that every one of its
thousands of interactions is the optimal one: the folded protein is a frustrated system, globally
optimized, but locally not always so.

The classic example of a frustrated system is a group of three people that hate each other
and have to walk on one of the two sides of a road. The optimal solution always ends up being
two people on the same side, that is a locally non optimized solution.
While the typical frustrated system presents many different minima, separated by huge bar-
riers, a protein presents a distinct low energy native structure, well separated from the others.

19.3 Hydrophobic interaction in protein folding


The interactions that characterize protein folding are mainly electrostatic, van der Waals, H-
bonds and hydrophobic ones. We will now discuss the last one. We can describe the
structure of liquid water as a network of H-bonds; a hydrophobic molecule in water cannot
replace the lost H-bonds, so the system has to rearrange, in two manners:

• water forms a cage around the hydrophobic molecule, paying the entropic toll

• the hydrophobic molecule changes conformation to minimize its surface area, paying the en-
thalpic toll

These changes can be summarized by the hydrophobic effective force. An effective force is the
derivative of the free energy with respect to an order parameter.

19.4 Ideal chain polymer


Like the ideal gas, the ideal chain is the simplest model we can use to describe a protein. In it,
the particles are constrained on a string and have no dimensions or interactions. The constraint
is due to the chemical bonds, whose lengths are considered constant. We use the end-to-end distance as
order parameter, and as the only variable of the free energy

F = U − T S = −T S

Since there are no interactions, orientations don't matter and the system is equivalent to
a single particle moving by random walk1. The end-to-end distance ρ is the sum of the step
vectors, which are our bonds, so

ρ = Σ_i l_i

Since we are talking of random walk, the average ρ will be zero,

hρi = 0

so we have to extract all the information we need from ⟨ρ²⟩, which is the variance.
We can evaluate it as

⟨ρ²⟩ = Σ_{i,j} ⟨l_i · l_j⟩ = Σ_{i=j} l² + Σ_{i≠j} ⟨l_i · l_j⟩

Since there is no correlation, the sum over i ≠ j is null, and

⟨ρ²⟩ = n l²   ⇒   ρ_0 = √⟨ρ²⟩ = √n l
1 For more about random walk, see G.Mandelli - Introduzione alla Fisica Statistica, only available in italian

where ρ0 is the ideal chain length and n is the number of monomers. Notice that rigour will
require n − 1 instead of n, but for large n this is a good approximation.
We can get the probability distribution of the end-to-end distance thanks to the central limit
theorem: a sum of uncorrelated stochastic variables has a gaussian distribution, so

P(ρ) = e^{−ρ²/(2ρ_0²)}

The number of chains with length ρ, n_ρ, is then given by the total number of chains N weighted
by P(ρ):

n_ρ = N P(ρ)
With this, we can estimate the entropy as

S = kB ln W

where W is the number of states accessible at a given ρ, i.e. the number of chains nρ :

S = kB ln nρ = kB ln N + kB ln P (ρ)

Given the definition of F , we have


F = −T S = −T S_0 + ρ²/(2βρ_0²)
This means that the free energy is a parabolic function of the end-to-end distance ρ, with a
minimum in ρ = 0: the ideal chain collapses in a point. This is obviously false for a protein, but
a normal polymer can actually act as an entropic spring, pulling together two nanoparticles
linked to its extremities.
This very simple model is therefore capable of describing something, and can be improved by
adding angular correlation between nearby bond vectors; in this case,

⟨ρ²⟩ = n λ²

where λ is now the correlation or persistence length, that is the distance after which monomers no longer
feel each other. Now the ideal chain is divided in small self-correlated blocks, and monomer-
monomer interactions will push our model further.
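The ⟨ρ²⟩ = n l² result is easy to check numerically; below is a minimal sketch (Python, NumPy assumed) of a freely jointed chain with uncorrelated bond orientations, with illustrative values of n and sample size.

import numpy as np

rng = np.random.default_rng(2)

def end_to_end(n, l=1.0):
    # n bond vectors of length l with random, uncorrelated orientations
    v = rng.normal(size=(n, 3))
    v *= l / np.linalg.norm(v, axis=1, keepdims=True)
    return v.sum(axis=0)                  # end-to-end vector ρ

n, samples = 100, 20_000
rho2 = np.array([np.dot(r, r) for r in (end_to_end(n) for _ in range(samples))])
print(rho2.mean(), "vs ideal", n * 1.0**2)    # <ρ²> ≈ n l²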

19.5 Globule and coil model


Proteins are polypeptides that fold uniquely, because their sequence was selected by years of
biological evolution: we need a model that incorporates this. The ideal chain was able to portray
the entropic spring, but that's all; some improvement is still required.
We can start from the partition function of a collection of particles (a gas, for example)

Q = ∫∫ dp dq exp{−β Σ_i p_i²/(2m) − β Σ_{i,j} U(q_i, q_j)}

that in the simplest case can be divided in an integral over positions Q_q and an integral over
momenta Q_p; we already know from theory that

Q_p = ∫ dp e^{−βT} = (2πm/β)^{3N/2}

while Q_q depends on the potential. To dodge any problem, we can expand it with the virial
expansion

Q_q = V^N [1 − (N²/V) B(β) − (N³/V²) C(β)] = V^N [1 − Vρ² B(β) − Vρ³ C(β)]

where B and C are the virial coefficients and

ρ = N/V
is the particle density. With this, we can make Q explicit in the free energy F:

F = −(1/β) ln Q = −(1/β) ln Q_p − (1/β) ln Q_q
  = −(1/β) (3N/2) ln(2πm/β) − (1/β) ln Q_q
We can now approximate ln Q_q with something reasonable, since

ln(1 + x) ≃ x

for x ≪ 1; this means that

ln Q_q = N ln V − Vρ² B(β) − Vρ³ C(β)

For the virial expansion to hold, ρ must be small, so the logarithm approximation is applicable.
We can further approximate by carefully considering ρ. An ideal gas is isotropic, so

ρ = ⟨ρ⟩,    ⟨ρ²⟩ − ⟨ρ⟩² = 0

This is what is commonly known as the mean field approximation; it is good for gases, but for
polymers it obviously is not: we can use it if we take as box a sphere centered in the polymer centre
of mass, with radius equal to the polymer characteristic chain length R_0. The low density and
this sphere make it possible to apply both the virial expansion and the mean field.

Mutually repulsive monomers Let's focus now on the case B(β) > 0; this means that a
repulsive force is present between the particles, so that the volume V > V_0, with V_0 the ideal volume.
Likewise, the characteristic length R > R_0, the characteristic length of the ideal system. To use the
mean field approximation, we take

V_0 = (4/3) π R_0³

so the ideal density will be

ρ_0 = N/V_0 = 3N/(4πR_0³) = 3N/(4π l³ N √N) = 3/(4π l³ √N)
where R_0 was substituted with the characteristic length of the ideal chain with step l. It is clear
from this formula that

ρ_0 ∼ 1/√N

This relation holds even in the non-ideal case, with ρ < ρ_0.

At this point, we want to know the number of scattering events between monomers; the
two-body collisions are given by

P_2b = ρN ∝ (1/√N) N = √N

while the three-body ones are

P_3b = ρ²N ∝ 1

This tells us that in a chain of length N these events are so rare that we can truncate the virial
expansion to the second order: ρ is really small, so small that three-body scattering
is negligible. We now introduce the constant α = R/R_0, with R the characteristic length (end-to-
end distance) of the system, while R_0 is that of the ideal case. The free energy as a function of
R is made of the ideal-chain one plus a virial correction at the first order:

F(R) = 3R²/(2βR_0²) + Vρ²B(β)/β

If we switch to α, we get

F(α) = 3α²/(2β) + √N B(β)/(β l³ α³) + c
We can find the α that minimizes F by setting the derivative with respect to α equal to zero:

∂F/∂α = 0   ⇒   α = (√N B(β)/l³)^{1/5}

Since

R = αR_0 = (B(β)/l³)^{1/5} l N^{3/5}

and we recall that for the ideal chain R ∼ N^{1/2}, the repulsive chain is more inflated.

Mutually attractive monomers The opposite case, B(β) < 0, is not as simple. Due
to the attractive interactions, we expect a collapsed chain, so the low-ρ approximation is
no longer viable; the three-body interactions are no longer negligible and the virial expansion
cannot be truncated at the second order any longer.
It may not seem so, but this is good news: since the virial coefficients depend on T, B(β) changes
sign with the change of T, representing two different tendencies:

coil is an inflated chain dominated by repulsion (B(β) > 0)


globule is a compact chain dominated by attraction (B(β) < 0)

Between the two, when B(β) = 0, there is the globule-coil transition, which could represent
a crude approximation of protein folding. The main difference is that the globule does
not single out a native conformation, since the compact structure still has a large number of
accessible states.
Nevertheless, a good correction is available; if we have a chain of length N in a sphere of radius
d, roughly the ratio N/d does not change if the system gets bigger. Magically, since entropy is
extensive, we can heuristically take

S = l²N/d² = R_0²/d²
as the entropy value. Then we slap it into the free energy, which now includes the third-order virial
term:

F(α) = √N B(β)/(β l³ α³) + C(β)/(β l⁶ α⁶) + 1/(β α²)

where the last term is the new scaled entropy. By minimizing F along α, we obtain

R ∝ N^{1/3}

So, given N, the globule is the most compact structure (R ∼ N^(1/3)), followed by the ideal chain (R ∼ N^(1/2)) and finally by the swollen coil (R ∼ N^(3/5)). The transition from globule to coil increases R and is driven by the temperature dependence of B(β). Nonetheless, this is still not a good description of a protein, since we are still missing the native structure; indeed, the globule has a lot of accessible structures, so its entropy is different from zero.

19.6 Random energy model


We need to change framework and take a different approach. The random energy model is defined in the microcanonical ensemble, with N, V, E constant, and evaluates the total energy of the system as the sum of a constant number of independent random contact energies:
E = Σ_{i=1}^{Nc} εi

where the independent random contact energies εi are uncorrelated (the central limit theorem will play a role later) and take the shape of something similar to a step function, since there either is contact or there is not.
In this model, a generic sequence of amino acids arranged in a particular structure is just a collection of contact energies: structure and sequence are no longer important. We assume that

1. there are only two-body interactions

2. there are only short-range interactions
3. the number of contacts Nc is constant
4. we are dealing with a generic random polymer

The first two points mean that the contact energy is either full or null. Point 3 seems rather odd at first sight: in theory, different structures should have different numbers of contacts; however, a protein in solution is always in a compact structure, even when not folded, and compact structures have more or less the same number of contacts, so we can assume it constant. Finally, point 4 means that every amino acid can occupy any position; this is obviously not a protein, but we will address this discrepancy with a later correction.
As we anticipated, the central limit theorem tells us that the energy probability distribution is Gaussian:

P(E) = K exp(−(E − E0)²/(2σE²))

with

E0 = Nc ε0        σE = √Nc σε

The number of structures at a certain energy will be proportional to this probability,

n(E) = K′ γ^N exp(−(E − E0)²/(2σE²))

where K′ is a normalization constant like K, γ is the average coordination number and N is the length of the chain. In this way, γ^N is an estimate of the number of compact conformations.
In general N ≠ Nc, but for lattice models we have N ≈ Nc. A lattice model describes the polymer with the monomers on the vertices of a simple cubic lattice; in this case it is quite easy to define contacts, limiting the interaction to the nearest neighbours. For the lattice model, then,

n(E) = K′ γ^N exp(−(E − N ε0)²/(2N σε²))
by using the identity

γ^N = e^(N ln γ)

we contract everything to

n(E) ∝ exp(N ln γ − (E − N ε0)²/(2N σε²))

Since n(E) is the number of configurations accessible to the protein as a function of the energy, it should be an integer, while the exponential function is obviously a real number. This means that truncation is needed, and if the exponential is less than 1, no conformations are available. We are then interested in the sign of the exponent, since

x > 0 means e^x > 1        x < 0 means e^x < 1

By simple algebra, we get that the exponent is greater than zero if

|E − N ε0| < N σε √(2 ln γ)

We are focusing on the low energy region, so we can neglect the portion with E > N ε0, getting to

E > N ε0 − N σε √(2 ln γ)

Otherwise, the exponent is negative and there are no states. Instead, when the exponent is zero, only one state is accessible and we are at the critical energy

Ec = N ε0 − N σε √(2 ln γ)

Below Ec, a deep ocean of nothingness stands in wait. From n(E) we can evaluate the entropy S as (taking kB = 1)

S(E) = kB ln n(E) = N ln γ − (E − N ε0)²/(2N σε²)

In the microcanonical ensemble,

1/T = ∂S/∂E

so

1/T = −(E − N ε0)/(N σε²)

This allows us to define the critical temperature as

Tc = −N σε²/(Ec − N ε0) = N σε²/(N σε √(2 ln γ)) = σε/√(2 ln γ)

which is independent of N.
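As a sanity check of these formulas, here is a minimal numerical sketch (all parameter values are invented for illustration, with kB = 1): it evaluates the exponent of n(E) at Ec, which should vanish, and shows that Tc does not change with N.

import numpy as np

gamma, eps0, sigma_eps = 3.0, -1.0, 0.5       # illustrative REM parameters (kB = 1)

def E_c(N):
    # critical energy below which no conformations are left
    return N * eps0 - N * sigma_eps * np.sqrt(2 * np.log(gamma))

T_c = sigma_eps / np.sqrt(2 * np.log(gamma))  # glass temperature, independent of N

for N in (50, 100, 200):
    # exponent of n(E) evaluated at Ec: should be zero (a single state left)
    expo = N * np.log(gamma) - (E_c(N) - N * eps0)**2 / (2 * N * sigma_eps**2)
    print(f"N={N:3d}  E_c={E_c(N):8.2f}  exponent at E_c={expo: .1e}  T_c={T_c:.3f}")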
In the canonical ensemble (constant N, V, T), at each T the energy will fluctuate, but if we lower T the energy will drop towards Ec. Once there, the system is virtually frozen and further reducing T will not have any effect. All of this looks like a phase transition towards nothingness, called the glassy phase transition. The problem now is that the lowest-energy state is unique but very close to a continuum of very different states: we are describing a random polymer, not a protein that has been evolutionarily selected to present a certain structure. An actual protein presents correlations that emerge in the native structure, and in the native structure alone.
For this, we add a native state separated from the others by a large gap δ: the random energy model describes only the unfolded compact structures, while a structurally dissimilar state emerges from evolution. This structure will have free energy

FN = EN

since SN = 0, whereas the unfolded states will have

FU = EU − T SU

with EU > EN by hypothesis. At 0 K, only the native state is occupied, but ∃T such that

FN = FU

and the two states are equally occupied; this is the folding temperature Tf:

Tf = (EU − EN)/SU = δ/SU
We now have a set of parameters that allows us to classify the behavior of a polymer. If

Tf > Tc the system folds before freezing, and behaves like a protein

Tf < Tc the system freezes before folding, and behaves like a polymer that gets stuck in the glass phase

The same condition can be expressed in terms of the ratio Tf/Tc: if it is greater than one, the protein is able to fold.
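Continuing the same toy calculation, we can place a hypothetical native state a gap δ below the unfolded states and check the folding criterion; the gap value is an assumption chosen for illustration, not a number from these notes.

import numpy as np

gamma, eps0, sigma_eps, N = 3.0, -1.0, 0.5, 100   # same illustrative REM parameters (kB = 1)

E_U = N * eps0                                    # typical energy of the unfolded compact states
S_U = N * np.log(gamma)                           # their entropy (kB = 1)
E_c = N * eps0 - N * sigma_eps * np.sqrt(2 * np.log(gamma))
T_c = sigma_eps / np.sqrt(2 * np.log(gamma))

delta = 80.0                                      # assumed gap E_U - E_N, large enough that E_N < E_c
T_f = delta / S_U                                 # folding temperature, from F_N = F_U

print(f"T_f = {T_f:.3f}, T_c = {T_c:.3f}, folds like a protein: {T_f > T_c}")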
Sometimes some states can lie between EN and Ec, but since they have zero entropy and energy greater than EN, they are thermodynamically unimportant. Much more dangerous is the case of two structurally distinct native states. This situation is typical of prionic proteins, which have a very stable misfolded structure in which they get stuck, generating terrible conditions in the subject.

19.7 An overview of folding kinetics


Until now we were concerned with locating the native state; from now on we will focus on how the protein reaches it. We recall that, from the Levinthal paradox, we deduced that the sequence itself encodes the folding pathway. To describe the folding process, two analogies have been exploited:

chemical reaction from the unfolded to the folded state, through an unknown transition state. The main problem lies in the choice of the reaction coordinate

phase transition of the first order, which is a better description, even if the chemical reaction approach can still enlighten us about the mechanism

More information is given in the next sections.

19.8 Chemical reaction


As we said, the reaction coordinate is quite difficult to identify; here are some of the attempts:

chain volume it can obviously distinguish a coil from a globule, but cannot identify the native state among the globules

number of contacts the situation is the same, since an elongated chain has very few contacts while globules have many; still, it cannot identify the native state

contacts ratio is instead the ratio between the native contacts and the total number of contacts,

Q = Nnat/Ntot

and can distinguish globular (Q < 1) from native (Q = 1) structures, but it has some problems with coils. Although it is particularly good for thermodynamic purposes, the assumption that all native contacts are equal limits its descriptive power for kinetics; a minimal sketch of how Q is computed follows this list
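Here is that sketch, following the definition Q = Nnat/Ntot given above; the contact maps are invented sets of residue-index pairs, used only for illustration.

def contact_ratio(current_contacts, native_contacts):
    # Q = N_nat / N_tot: native contacts present over all contacts present
    n_nat = len(current_contacts & native_contacts)
    return n_nat / len(current_contacts)

# hypothetical contact maps: pairs (i, j) of residues within some cutoff distance
native  = {(1, 8), (2, 7), (3, 10), (4, 9), (5, 12)}
globule = {(1, 8), (2, 7), (4, 9), (2, 11), (6, 13)}   # partly native, partly non-native contacts

print(contact_ratio(globule, native))   # 0.6: compact but not native
print(contact_ratio(native, native))    # 1.0: native structure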

Another weak point of the reaction picture is the identification of the transition state, which inevitably fails to account for the entropic nature of the folding barrier.

19.9 Phase transition


A first-order phase transition goes through a metastability phase and proceeds through a nucleation mechanism. Now, we are going to describe the phase transition as a chemical reaction, so that we can learn where the two pictures differ. For water crystallization, we can take the system density ρ as the reaction coordinate; a freezing system will go from ρl to ρs < ρl, passing through a transition state with density ρ∗. This would mean that during the transition the whole system changes its density at once: since nucleation is a proven fact, this model cannot be reliable. In fact, after the freezing point, some water will have density ρl while some other will have ρs. Moreover, since the height of the barrier would scale with the system dimensions, no substance would be able to freeze in large quantities.
The same happens in protein folding: the elongated polypeptide initially collapses due to the hydrophobic effect, reaching the globular structure quite fast. At this point true folding begins, with some consecutive amino acid groups forming local elementary structures (LES). These portions of secondary structure also form quite fast, since they are local. When – during phase space exploration – these blocks meet, they originate the post-critical folding nucleus; this process is the true rate-determining step, since the encounter between the LESs can take some time. Once the nucleus is formed, the molten globule quickly turns into the folded structure. This pathway describes the physics of folding well. The folding nucleus is not a generic set of n native contacts: it is very specific for each protein, being a set of those specific n contacts; this

is the reason why the contact ratio is not a good reaction coordinate. Moreover, the TS is no longer a single structure, but an ensemble of structures containing the folding nucleus. As for a normal TS, the transition state can go forward, but can also go back to the unfolded state. All of this is the solution to Levinthal's paradox, since the sampling of the phase space proceeds only until one of the billions of structures carrying the post-critical folding nucleus is reached.
The phase-transition framework gives a better scaling of the folding time with respect to the chain length N, exp(αN^(2/3)); this is much closer to experiment than the exp(αN) of the reaction picture. At the same time, mutagenesis confirms the presence of a nucleus which, if mutated, makes folding much more difficult.
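The difference between the two scalings is easy to appreciate numerically; the sketch below compares exp(αN^(2/3)) and exp(αN) for a few chain lengths, with an arbitrary illustrative prefactor α (not a fitted value).

import numpy as np

alpha = 0.1                                     # illustrative prefactor, arbitrary units
for N in (50, 100, 200, 400):
    t_nucleation = np.exp(alpha * N**(2 / 3))   # phase-transition (nucleation) picture
    t_reaction   = np.exp(alpha * N)            # chemical-reaction picture
    print(f"N={N:4d}  nucleation ~ {t_nucleation:.3e}   reaction ~ {t_reaction:.3e}")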
As a remark, the formation of the LESs does not mean that secondary structure emerges first and guides the formation of the tertiary structure, since usually secondary structures are not even stable on their own: yet some are, and those are able to form the nucleus. What this slow step hints at is that protein folding itself can be targeted, to prevent folding. A peptidomimetic can partially substitute the LES in the binding, slowing folding down to the point of making the protein useless. In this case, resistance can be achieved only through matching mutations, since a single mutation alone would prevent folding too. Not great, not terrible.

Chapter 20

Anti-freeze proteins

20.1 First order phase transition


If we bring a bulk of water below its freezing temperature, as long as it is pure and undisturbed, nothing happens. The supercooled liquid stays tranquil down to about −40 ◦C, but any disturbance will make it freeze. In principle, Gibbs' free energy is both an implicit and explicit function of the temperature T:

∆G(T) = ∆H(T) − T∆S

The phase transition is therefore described by a critical temperature, the freezing temperature Tf:

T > Tf we have µl < µs , so the liquid is stable


T = Tf we have µl = µs , so there is no driving force, and everything is still

T < Tf we have µl > µs , so the solid is stable

This thermodynamic description is plain and simple, but the kinetic one is not. Close to Tf, we can consider both ∆H and ∆S equal to their values at the transition, ∆Ht and ∆St, where

∆Gt = 0  ⇒  ∆St = ∆Ht/Tf

so that

∆G(T) = ∆Ht − T ∆Ht/Tf = ∆Ht (Tf − T)/Tf = ∆Ht ∆T/Tf

with ∆T = Tf − T. This means the sign of ∆G depends on both ∆Ht and ∆T.


As we saw previously, the kinetic process is a form of nucleation, which can be

homogeneous if the substance is pure and nucleation happens by random fluctuations

heterogeneous if other substances are present that can catalyse the nucleation, as sand makes oysters create pearls

To improve our comprehension, we will consider the free energy in terms of the molar volume Vm:

∆Gv = ∆Ht ∆T/(Vm Tf)

The creation of a new phase corresponds to a volumetric term (the new phase) and a surface term (the interface):

∆Gtot = ∆Gvol + ∆Gsurf = (4/3)πr³ ∆Gv + 4πr² γ

In freezing, ∆Gv < 0 if T < Tf, so ∆Gvol is always negative, while the surface term is always positive since γ > 0. As the sum of a positive square term and a negative cube term, ∆G assumes the shape shown in Figure 20.1. This curve presents a maximum at (r∗, ∆G∗), which we can easily identify:
d∆G/dr = 0 = 4πr² ∆Gv + 8πrγ = r(4πr ∆Gv + 8πγ)  ⇒  r∗ = −2γ/∆Gv
and
∆G∗ = ∆G(r∗) = (4/3)π(r∗)³ ∆Gv + 4π(r∗)² γ = −32πγ³/(3∆Gv²) + 16πγ³/∆Gv² = 16πγ³/(3∆Gv²)

Figure 20.1: Crystallization free energy tendencies: the positive surface term (∝ r²), the negative volume term (∝ r³) and their sum ∆G as functions of r.

To analyse these expressions, we substitute ∆Gv into them:

r∗ = −2γ Tf Vm/(∆Ht ∆T)          ∆G∗ = (16/3)πγ³ Tf² Vm²/(∆Ht² ∆T²)

These formulas clearly show that ∆G∗ ∝ ∆T⁻² and r∗ ∝ ∆T⁻¹: the farther we go below Tf, the less energy and the smaller the radius required, so it becomes easier for random fluctuations to generate crystals big enough to become crystallization nuclei.
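As a rough numerical illustration of these relations, the sketch below evaluates r∗ and ∆G∗ for water at a few undercoolings; the values of γ, ∆Ht and Vm are order-of-magnitude assumptions, not data from the course, and only magnitudes are used.

import numpy as np

gamma = 0.033            # ice/water interfacial free energy, J/m^2 (assumed order of magnitude)
dH_m  = 6.01e3           # molar melting enthalpy of ice, J/mol (magnitude)
V_m   = 18e-6            # molar volume, m^3/mol
T_f   = 273.15           # freezing temperature, K

for dT in (2.0, 10.0, 30.0):                           # undercooling T_f - T, in K
    dG_v    = dH_m * dT / (V_m * T_f)                  # |volumetric driving force|, J/m^3
    r_star  = 2 * gamma / dG_v                         # critical radius, m
    dG_star = 16 * np.pi * gamma**3 / (3 * dG_v**2)    # nucleation barrier, J
    print(f"dT = {dT:5.1f} K   r* = {r_star*1e9:6.2f} nm   dG* = {dG_star:.2e} J")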

20.2 Antifreeze protein action


Ice-binding proteins are a family of proteins that bind specifically to ice. They are further classified as

AFP (antifreeze proteins) present in fish and insects that live in cold environments
INP (ice nucleating proteins) present on the outside of some bacteria, promoting the formation of ice as a means of defence
IAP (ice adhering proteins) present in some polar algae, binding them to the pack ice (Figure 20.2)

The AFP are part of the cold defences of those organisms; in vitro, their effects include

• depression of the freezing point (freezing hysteresis)

• increase of the melting point (melting hysteresis)
• all in all, the sum of the two, thermal hysteresis, in a non-colligative manner
• ice recrystallization inhibition, by interrupting Ostwald ripening1

The AFP applications, on the other hand, include

• cryoprotectants for tissues and organs

• food processing
• crop protection in agriculture
• surface de-icing (airplane wings)

A well-known effect in nucleation is the Kelvin effect, which empirically describes the dependence of the melting temperature Tm on the crystal radius Rp:

Tm(Rp) = Tm^bulk − αEK/Rp

where Tm^bulk is the melting temperature of bulk ice and αEK is an empirical constant; this means that the smaller the crystal, the lower its melting temperature. The AFP action is based on this effect: they adhere to the surface of the ice, so that it cannot grow there but only in the gaps between the proteins. This surface modification decreases the melting temperature by decreasing the effective radius. In other words, the AFP act as secondary nucleation inhibitors, inhibiting Ostwald ripening at the same time. A simple experimental model shows that the melting temperature drop is

∆T ∝ cos θ/d
1 Ostwald ripening is the name of the crystal growth mechanism: the critical crystals feed on the smaller ones to fuel their growth

where θ is the surface contact angle and d is the distance between proteins. This teaches us that the AFP effect depends on the concentration of adsorbed proteins. This interaction is obviously irreversible.
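The order of magnitude of the Kelvin-effect depression can be sketched as follows; the value used for αEK is an assumed, hypothetical constant chosen only to illustrate the trend, not a number given in these notes.

T_m_bulk = 273.15        # K, bulk melting point of ice
alpha_EK = 5.0e-8        # K*m, assumed (hypothetical) Kelvin constant

for R_p in (10e-9, 50e-9, 500e-9):          # effective crystal radius, m
    T_m = T_m_bulk - alpha_EK / R_p         # melting point of a crystal of radius R_p
    print(f"R_p = {R_p*1e9:6.1f} nm  ->  T_m = {T_m:.2f} K  (depression {alpha_EK / R_p:.2f} K)")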
There are some issues with this model, though:

• different AFP have different thermal hysteresis at the same concentration


• the activity of some AFP is influenced by their concentration

This is why AFP action is still a matter of discussion.


However, we can reach further conclusions by considering the nature of the ligand, ice. Ice is a crystalline solid, and different AFP have affinities for different crystal faces:

active AFP present in fish, bind to the prismatic plane

hyperactive AFP present in insects, bind to both the prismatic and the basal plane

When the ice crystal is bound on the prismatic plane, it keeps growing on the basal one, until it forms an elongated bipyramid; this is why concentration is important for active AFP, since new AFP have to cover the growing surface. Hyperactive AFP, on the other hand, generate short crystals and have a greater thermal hysteresis. These differences find their reason in the organisms' habitats: fish live in water and are full of liquid in equilibrium with the cold water, so they need AFP with a faster action. Insects, instead, live on the colder land, so they prefer slower, more efficient AFP.
From this we understand that the main AFP action is not to avoid freezing altogether, but to hinder Ostwald ripening so that the ice crystals do not damage the cells.

Hyperactive AFP structure The insect AFP are solenoid-like, with internal disulphide bridges keeping the structure rigid. On the exterior, amino acids with both hydrophilic and hydrophobic residues bind to the ice. This interaction is usually mediated by clathrates, which form between the ice and the protein while the latter explores the surface. No preorganization is required.

AFP size effect The bigger the AFP, the bigger the obstruction, the bigger the effect. Solenoids can increase their size by increasing their number of turns.

Peptide-coated surfaces Antifreeze peptides, obtained by cutting the ice-binding domains out of AFP, have very low activity, but guess what? If we coat a surface with them, their activity is much higher, due to a probable cooperative effect.

The end

You can find more notes of other classes on https://inquinamentochimico.wordpress.com

Figure 20.2: That is not just ice ahead, it is the pack

