Simulation Pack Edition-1
G.Botti
II semester 2018/19
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to Creative Commons, PO
Box 1866, Mountain View, CA 94042, USA.
The images are exempt from the CC license, as they were taken from the teaching material provided by Professor Pieraccini.
Thanks to Federica Marelli for providing a good part of the material from which this text was drawn.
Contents

1 Molecular Mechanics
  1.1 Applying classical mechanics to a molecular system
  1.2 Force Fields and atom types
  1.3 Force Field of a noble gas
  1.4 Diatomic molecule: stretching term
  1.5 Triatomic molecule: bending term
  1.6 Polyatomic molecule: torsion term
  1.7 Polyatomic molecule: intermolecular non-bonding term
  1.8 Lennard-Jones potential
  1.9 Electrostatic potential
  1.10 Extra (optional) terms
  1.11 The parameters problem
  1.12 Some relevant questions
  1.13 FF application in coordination chemistry
  1.14 FF computational weight and classification

2 Halogen Bond
  2.1 Definition
  2.2 σ-hole nature and parameters
  2.3 Halogen bond in bioactive molecules
  2.4 Distorted halogen bonds
  2.5 XB applications
  2.6 XB modelling
  2.7 Extra point charge (EPC)
  2.8 Docking procedures
  2.9 Scalable anisotropic model (SAM)

4 Molecular Dynamics
  4.1 Finite differences method
  4.2 Time step in molecular dynamics
  4.3 About the stability of MD trajectories
  4.4 Size and boundaries
  4.5 (En)sampling different ensembles
  4.6 Verlet neighbour list
  4.7 Particle Mesh Ewald approach
  4.8 Equilibration phase
  4.9 Restraints and constraints
  4.10 Just restraints
  4.11 Constraints only
  4.12 Analysis of MD Simulations

10 A handful of examples of MM applications
  10.1 p53-hdm2
  10.2 Vinblastine in microtubules targeting
  10.3 Rapamycin
  10.4 An example of peptide strategy

11 Jarzynski equation
  11.1 Non-equilibrium simulations
  11.2 Proof for Jarzynski equation
  11.3 A time of trials and tribulations
  11.4 Cumulant expansion of J. equation
  11.5 Stiff spring approximation

12 Umbrella sampling
  12.1 A qualitative introduction
  12.2 An analytical elucidation
  12.3 Adaptive umbrella sampling
  12.4 WHAM (!!)

14 Metadynamics
  14.1 Something new, something old
  14.2 Gaussian sand does not drift
  14.3 Collective variables socialism

15 Some examples
  15.1 Test on the number of gaussians
  15.2 Parrinello's benchmark work on β-hairpin
  15.3 Osmoprotectants 2: electric boogaloo

19 Protein folding and stability
  19.1 Characteristics of protein folding
  19.2 Protein as a frustrated system
  19.3 Hydrophobic interaction in protein folding
  19.4 Ideal chain polymer
  19.5 Globule and coil model
  19.6 Random energy model
  19.7 An overview of folding kinetics
  19.8 Chemical reaction
  19.9 Phase transition

List of Figures

4.1 Various forms of motion integration: (a) Verlet algorithm, (b) Leap-frog algorithm, (c) Velocity Verlet algorithm
4.2 Pictorial representation of the periodic boundary condition setup
6.1 From right to left, top to bottom: the potential energy function; the Metropolis MC population result; the Parallel Tempering population result; the space-time representation
Chapter 1
Molecular Mechanics
In this chapter, we will discuss the usage of force fields as a tool for running simulations of large systems. We will analyse and discuss each term of the potential, ending with force field applications and classification.
1.2 Force Fields and atom types
In molecular mechanics we describe the way atoms interact by setting up a force field (FF), that is, an expression of the system's potential energy as a function of the coordinates. Each one of these fields has a different functional form, given by different terms that sum up to the total potential energy.
In order to write these functions expediently, we have to introduce things called atom types. An atom type is an atomic tag that accounts not only for the atomic number, but for hybridization and chemical environment too. In this way, an sp3 carbon atom is a different atom type from an sp2 one, and an sp2 carbon is equally distinct from an sp carbon; a carbonyl carbon does not have the same atom type as an alkene or carboxyl one, just as a carbon in cyclohexane is not the same as a carbon atom in an alkane chain. For every application of molecular mechanics, then, we have to define carefully both the functional form of the force field and the employed atom types.
From chemical experiments, we know that similar groups of atoms in different molecules behave in the same way, id est certain properties can be transferred between certain elementary bricks. For example, the distance between a carbon atom and a hydrogen one is the same in all aliphatic chains, and the same goes for vibrational frequencies. These elementary bricks can be transferred, but first we must define them; we already know that proteins are made of amino acids, DNA of nitrogenous bases and biomembranes of phospholipids: these fundamental units make force fields an effective way to describe biomolecules.
We will now proceed by delineating the general aspect of a force field, in the case of
• a noble gas
• a bound yet small system (small molecule)
• a large system
$$\mathbf{Q} = (q_1, \dots, q_\alpha, \dots, q_N) \qquad \mathbf{P} = (p_1, \dots, p_\alpha, \dots, p_N)$$
The total energy can be written as the hamiltonian function $H(\mathbf{Q}, \mathbf{P})$, where $U$ is the potential contribution. Here is where the problem starts, for the potential depends (for a real gas) on the nature and position of each particle.
In the simplest of cases, it can be written as a sum of terms, each depending on the relative positions $\mathbf{r}_i$:
$$U = \sum_i U_1(\mathbf{r}_i) + \sum_i \sum_{j>i} U_2(\mathbf{r}_i, \mathbf{r}_j) + \sum_i \sum_{j>i} \sum_{k>j} U_3(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k) + \dots$$
To simplify, we first need to describe and understand each term.
The total one-body interaction $U_1^{tot}$ is the potential energy of the particles in a given external field (e.g. gravitational, electric), and can therefore be neglected in most cases. On the other hand, the total three-body contribution $U_3^{tot}$ is a hefty chunk of energy, but it can be implicitly folded into the two-body potential $U_2^{tot}$ by using experimental parameters. Therefore, to a first approximation we can consider two-body interactions only.
Real gases can be described as small hard spheres moving in space, but we need a proper pair interaction potential to use as $U_2$; some of the most common are described below.
Hard sphere potential It describes spheres that do not interact and cannot penetrate each other. Totally vanilla spheres, then; the kind of sphere your grandma likes. Analytically, it takes the form
$$U_{HS}(r) = \begin{cases} \infty & \text{for } r < \sigma \\ 0 & \text{for } r \ge \sigma \end{cases}$$
Square well potential A refinement of the hard sphere adds an attractive well between two radii $\sigma_1$ and $\sigma_2$ (the square-well potential sketched in Figure 1.2). This means that the spheres are still impenetrable ($r < \sigma_1$), but can interact ($\sigma_1 < r < \sigma_2$).
Soft sphere potential It allows the spheres to bounce against each other. It is usually described as
$$U_{SS}(r) = \left(\frac{\sigma}{r}\right)^n = a r^{-n}$$
where $a = \sigma^n$, $n \in \mathbb{N}$. This means that for a low $n$ the potential decays gently with $r$, while for a high $n$ it behaves more and more like that of a hard sphere.
Figure 1.2: Schematic representation of the square-well potential
As we can see, it takes two parameters to make a Lennard-Jones potential: the well depth $\varepsilon$ and the zero-potential distance $\sigma$. This potential can be divided in two contributions, a repulsive $r^{-12}$ term and an attractive $r^{-6}$ term; in its first form it reads
$$U^{LJ}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
The second form is employed to highlight the minimum-energy interatomic distance $r_m$:
$$U^{LJ}(r) = \varepsilon\left[\left(\frac{r_m}{r}\right)^{12} - 2\left(\frac{r_m}{r}\right)^{6}\right]$$
The third form clamps everything together into
$$U^{LJ}(r) = \frac{B}{r^{12}} - \frac{A}{r^6}$$
where
$$B = 4\varepsilon\sigma^{12} \qquad A = 4\varepsilon\sigma^6$$
A careful student will now ask himself why he needs to remember all these formulas. This is because different software packages implement different forms of this potential, each one allowing different parameters to be extracted.
The Lennard-Jones is the form of the potential that can describe Van der Waals interactions, given $\varepsilon$ and $\sigma$, $\varepsilon$ and $r_m$, or $A$ and $B$. These parameters have a loose physical meaning: $\varepsilon$ corresponds to the strength of the interaction, while $\sigma$ and $r_m$ are linked to atomic dimensions. For this reason, to describe a box full of Ne, the Ne radius and the strength of the Ne–Ne interaction are required. To describe a box full of two gases instead, we can apply the Lorentz-Berthelot mixing rules: known $\sigma_{ii}$, $\sigma_{jj}$, $\varepsilon_{ii}$ and $\varepsilon_{jj}$, these rules allow us to calculate
$$\sigma_{ij} = \frac{1}{2}\left[\sigma_{ii} + \sigma_{jj}\right] \qquad \varepsilon_{ij} = \sqrt{\varepsilon_{ii}\,\varepsilon_{jj}}$$
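As a minimal sketch of how these rules are used in practice (the Ne and Ar parameters below are rough illustrative values, not authoritative ones):

import math

def lorentz_berthelot(sigma_ii, eps_ii, sigma_jj, eps_jj):
    """Combine like-pair LJ parameters into cross-pair ones."""
    sigma_ij = 0.5 * (sigma_ii + sigma_jj)   # arithmetic mean of sizes
    eps_ij = math.sqrt(eps_ii * eps_jj)      # geometric mean of well depths
    return sigma_ij, eps_ij

def u_lj(r, sigma, eps):
    """Lennard-Jones potential in the third (A, B) form: B/r^12 - A/r^6."""
    B = 4.0 * eps * sigma**12
    A = 4.0 * eps * sigma**6
    return B / r**12 - A / r**6

# Illustrative parameters (Å, kcal/mol): Ne ~ (2.75, 0.069), Ar ~ (3.40, 0.238)
sigma_ij, eps_ij = lorentz_berthelot(2.75, 0.069, 3.40, 0.238)
print(u_lj(3.5, sigma_ij, eps_ij))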
Figure 1.4: Schematic representation of the Morse potential
• a large $K$ means high stiffness, typical of a strong bond, which is intolerant to deformation
• a small $K$ means low stiffness, typical of a weak bond, which is tolerant to deformation
therefore
$$K_{\text{single}} < K_{\text{double}} < K_{\text{triple}}$$
While we are at it, we should remember not to take the physical interpretation of FF parameters too seriously, but to limit ourselves to understanding what role they play in the energy description.
Back on track, we can easily see that this model is highly inaccurate at large $r$, because it cannot predict dissociation; therefore, for small molecules, we can push the expansion to higher orders:
$$U(r - r_0) = K_{ab}\,\Delta r_{ab}^2 + K'_{ab}\,\Delta r_{ab}^3 + K''_{ab}\,\Delta r_{ab}^4 + \dots$$
However, we must be cautious: higher orders not only mean more parameters, but have different trends, as shown in Figure 1.5.
As we can see, the third-order expansion presents an infinite well that creates problems during optimization, requiring a reasonable starting geometry; the fourth order diverges too, even if with a different behavior, and requires another parameter. The most expensive stretching FFs go up to sixth order.
Let’s now discuss about the nature of r0 ; it is proper to call it natural bond length, because
the equilibrium length can be misleading; as a matter of fact, it does not correspond to any
equilibrium bond length in any real molecule, but it only describe the equilibrium length of a
diatomic molecule in vacuum. In a big molecule, the equilibrium bond length is due to many
factors.
Since every FF is different for any other, it is customary to give the measurement units of K:
−2
(
kcal mol−1 Å in Amber
K= −1
kJ mol nm −2
in Gromacs
Obviously, when treating a system made of more than one diatomic molecule, we need to take
into account the non-bonding interaction between the molecules too.
12
Figure 1.5: Comparison of the second-order (II ordine) and third-order (III ordine) expansions of the stretching potential
$$U^{bnd} = K(\theta - \theta_0)^2$$
where the squared term corresponds to the deviation of the angle $\theta$ with respect to the natural angle $\theta_0$. Like before, we can obtain a better description by adding higher-order terms, at the cost of more parameters. Since we are interested in what happens in close proximity to the equilibrium, the harmonic approximation is good enough for us. The units of measurement, as always, change with the software employed:
$$K = \begin{cases} \text{kcal mol}^{-1}\,\text{rad}^{-2} \\ \text{kJ mol}^{-1}\,\text{deg}^{-2} \end{cases}$$
Stretching and bending are so-called hard degrees of freedom, i.e. great variations of energy are produced upon small changes in them (remember: big penalties).
For these reasons, the Taylor expansion is no longer suitable; instead, we use a Fourier series expansion, as a sum of cosine terms:
$$U(\omega) = \sum_{n=1} V_n \cos(n\omega)$$
where $V_n$ goes under the name of barrier or barrier height. This term is again a bit misleading: the real barrier is associated with many factors, like steric interactions (of Van der Waals nature); the true rotational profile is indeed given by the sum of the rotational terms and the non-bonding terms. The natural number $n$ is related to the periodicity of the energy, as shown in Table 1.1.

n    angle of periodicity
1    360°
2    180°
3    120°
For an organic molecule with a small number of bonds, the first three terms are enough. Let's now discuss some very simple examples.
Ethane has three equivalent minima (staggered conformations) and three equivalent maxima (eclipsed conformations), for an energy profile that requires just the $n = 3$ term to be described: the others have $V_{n \neq 3} = 0$. Rigorously, an infinite series expansion would require all the $n = 3k$ terms, with $k \in \mathbb{N}$, but it's cool to truncate.
Butane, instead, has an absolute minimum (anti conformation), an absolute maximum (syn conformation), two relative maxima and two relative minima (gauche conformations), which have a little more energy due to the steric clash of the methyl groups. The symmetry of the system is therefore different from that of ethane, so that not only the $n = 3$ term is required, but also the $n = 1$ term is necessary to tune the energy profile. Everything is shown in Figure 1.6.
Ethylene has two minima, at 0° and at 180°, therefore it requires the $n = 2$ term (or $n = 2k$) in order to get this profile.
2-butene has two possible isomers, cis (with higher energy) and trans (with lower energy), therefore it needs the $n = 2$ term to get the periodicity and the $n = 1$ term to get the fine structure.
In general, we can also find the Fourier expansion written as
$$U(\omega) = \frac{1}{2}V_1\left[1 + \cos\omega\right] + \frac{1}{2}V_2\left[1 - \cos 2\omega\right] + \frac{1}{2}V_3\left[1 + \cos 3\omega\right]$$
where the 1s and the $+$ or $-$ signs are simply there in order to get maxima and minima matching the experimental data.
Also generally, the torsion term is employed to finely tune the rotational barrier.
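As a small sketch of how these terms combine, here is the three-term expansion evaluated over a full rotation, with made-up barrier heights chosen to mimic a butane-like profile:

import math

def torsion_energy(omega_deg, v1, v2, v3):
    """Three-term Fourier torsional potential; omega in degrees."""
    w = math.radians(omega_deg)
    return (0.5 * v1 * (1 + math.cos(w))
            + 0.5 * v2 * (1 - math.cos(2 * w))
            + 0.5 * v3 * (1 + math.cos(3 * w)))

# n = 3 gives the three-fold pattern, n = 1 lifts syn above gauche
for omega in range(0, 361, 60):
    print(omega, round(torsion_energy(omega, v1=1.5, v2=0.0, v3=2.9), 3))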
• the distance $r$ is (in Cartesian coordinates) the square root given by the Pythagorean theorem and, as we know, square roots are as expensive to compute as exponential functions. The BH potential therefore requires evaluating that square root, which LJ avoids thanks to its even powers
As always, we are rarely interested in what happens at low $r$, so even if LJ is not as impressive as BH, using it brings no problems, and instead allows a faster computation of the total energy.
Finally, we can take a look at the complete form of $U$, which can be written as the sum of all the terms seen so far:
$$U = U_{str} + U_{bnd} + U_{tors} + U_{vdW} + U_{elst} + U_{\times} + U_{oop}$$
We will continue this dissertation by considering the last terms of this sum: the electrostatic term $U_{elst}$, the cross term $U_{\times}$ and the out-of-plane term $U_{oop}$.
The molecular charge distribution can be described in two ways:
1. with point charges placed at atomic positions (i.e. nuclear coordinates), with the possibility of fine tuning through additional charges
2. with dipole moments placed on the bonds
These two descriptions should yield the same results, provided we look at the molecule from far enough away, though some differences may occur in describing certain rotational barriers. In any case, the second choice is computationally heavier, therefore point charges are preferred.
These charges are chosen to best fit the quantum-mechanical electrostatic potential of the molecule; we can consider it as made of a nuclear and an electronic term, as follows:
$$\varphi_{esp}(\mathbf{r}) = \varphi_n(\mathbf{r}) + \varphi_{el}(\mathbf{r}) = \sum_i^N \frac{Z_i}{|\mathbf{R}_i - \mathbf{r}|} - \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\,d\mathbf{r}'$$
This means evaluating the electrostatic potential at an external point $\mathbf{r}$ as given by the sum of each nuclear potential and the continuous sum of the potential generated by every charge fraction $\rho\,d\mathbf{r}'$. By fitting this function to the quantum mechanical potential, we obtain the point charges.
First and foremost, we must decide where we are going to put the point $\mathbf{r}$; usually, molecular surfaces are employed, because non-bonding interactions are the main focus. The choice of these surfaces is arbitrary: sometimes they are the envelope of the Van der Waals spheres, sometimes the surface containing 90% of the electronic density.
We then evaluate $\varphi_{esp}$ at some points of this surface, and look for the set of $N$ (where $N$ = number of nuclei) point charges, starting from a guessed value for each of them. At this point we optimize the error function
$$\mathrm{ErrF}(Q) = \sum_j^{pts}\left(\varphi_{esp}(\mathbf{r}_j) - \sum_i^N \frac{Q(\mathbf{R}_i)}{|\mathbf{R}_i - \mathbf{r}_j|}\right)^2$$
where $Q(\mathbf{R}_i)$ is the value of the $i$-th charge, and the sum over $i$ is the electrostatic potential generated by all of them. If $\mathrm{ErrF} = 0$, we got the right function, but this is not as common as one might think. Therefore, we simplify this function into the generic linear least-squares form
$$\mathrm{ErrF}(\mathbf{a}) = \sum_j^{m}\left(Y_j - \sum_i^N a_i X_{ij}\right)^2$$
with $Y_j = \varphi_{esp}(\mathbf{r}_j)$, $X_{ij} = 1/|\mathbf{R}_i - \mathbf{r}_j|$ and $a_i = Q(\mathbf{R}_i)$. Setting the derivative with respect to each $a_k$ to zero gives us a set of algebraic equations, one for each $k$, that we can write as
$$\sum_j X_{kj}\left(Y_j - \sum_i a_i X_{ij}\right) = 0 \quad\Rightarrow\quad \sum_i a_i \sum_j X_{kj} X_{ij} = \sum_j X_{kj} Y_j$$
The fit is usually improved by adding conditions, for example:
• forcing the sum of the point charges to equal the total molecular charge
• applying a RESP (Restrained ElectroStatic Potential) scheme, which adds a hyperbolic penalty for the deviation of any charge from 0; in this way, we avoid unrealistic values for the charges of buried atoms
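A minimal numerical sketch of the unconstrained least-squares step above (synthetic nuclei, surface points and "QM" potential; numpy's solver stands in for a hand-rolled one):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: 4 nuclei and 50 surface points (all coordinates made up)
nuclei = rng.uniform(-1.0, 1.0, size=(4, 3))
points = rng.uniform(-3.0, 3.0, size=(50, 3))
true_q = np.array([0.4, -0.3, 0.2, -0.3])

# Design matrix X[i, j] = 1 / |R_i - r_j|
X = 1.0 / np.linalg.norm(nuclei[:, None, :] - points[None, :, :], axis=2)

# Reference potential at the surface points (here generated from true_q)
phi_esp = true_q @ X

# Least-squares solution of phi_esp ~ X^T a, i.e. the normal equations above
q_fit, *_ = np.linalg.lstsq(X.T, phi_esp, rcond=None)
print(q_fit)  # recovers true_q up to numerical noise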
This point charge approach works pretty well, but it has some limitations:
• It fits the charges as seen from the outside, therefore it is unreliable for anything intramolecular
• It can be a little crude
• It does not take into account how each charge depends on the conformation: different minima have different $\varphi_{esp}$, so different charges; this means that the charges are not polarizable
• Atomic charges are not fully transferable, but with a correct parametrization we can take the risk
• All QM values are computed in vacuum, while this kind of FF is usually employed in solvent; to compensate for the obvious change in charges, $\varphi_{esp}$ is evaluated at the HF level, which overestimates the dipole moment; this way, in error compensation we trust
Since this kind of potential is critical for anything biological, we can improve it by considering
• additional constraints, for example the same charge on symmetry-equivalent atoms
• additional point charges at certain charge-rich points, like triple bonds or lone pairs: the more charges we have, the better the fit
Out-of-plane When a molecule is planar, the total internal angle around the central atom is 360°. If the molecule gets pyramidal, this angle changes; but a large pyramidalization is achieved through a small variation of this angle, therefore a strong harmonic penalty is required to prevent it. The penalty is usually applied to the angle $\chi$ between the bond and the plane, or to the distance $d$ between the central atom and the plane:
$$U_{oop} = K\chi^2 \qquad\text{or}\qquad U_{oop} = K d^2$$
Otherwise, high-barrier torsional terms can be employed to the same effect.
Cross terms Changing a certain term may influence some other: decreasing an angle, for example, may elongate a bond; this means that bending and stretching are not completely separate, and we use a cross term to take all of this into account:
$$E_{s/b} = K\left(\theta_{abc} - \theta_0^{abc}\right)\left(R_{ab} - R_0^{ab}\right)\left(R_{bc} - R_0^{bc}\right)$$
We can have any combination of two or three coupled terms, therefore cross terms are only used in FFs for small molecules.
1.11 The parameters problem
Once the functional form is decided, we need to assign a value to each and every parameter that appears in the force field; Table 1.2 shows what was done for a very famous FF.
This amounts to almost 0.2% of the required parameters; nonetheless, this FF is capable of describing 20% of the existing organic molecules.
The process through which we find these parameters is another optimization of an error function. We start from a training set of molecules with a lot of experimental data, then we use the FF to evaluate the properties of the training set, beginning with guessed parameters; at this point, we minimize
$$\mathrm{ErrF}(\text{prop}) = \sum_i^{data} w_i\left(\text{ref. value} - \text{calc. value}\right)^2$$
where $w_i$ is a generic weight.
From many parameters, we get many minima of ErrF; in order to get the best one, we can
• optimize the parameters sequentially, from chemical class to chemical class; this is an application of transferability and ensures that the task dimension gets smaller the more we optimize, making it easier to add a new chemical class
• optimize all parameters en bloc, since the FF is a unitary whole; this makes parametrization more difficult, even if a better minimum can be obtained. Adding a new chemical class will require a new, complete re-parametrization, though
Each molecule could have its own FF, as exact as it is useless: the purpose is to get the experimental data without doing the experiment; we are not at all interested in reproducing the experimental data of the training set! We are then required to test the FF on a validation set, i.e. a set of molecules not contained in the training set, for which we have access to the experimental data. If the FF is able to reproduce the validation set data too, it can work outside the training set.
If we use the FF on molecules similar to those in the training set, we can achieve good results: this makes biomolecules an ideal application for FFs, because they are very similar to each other. Organic molecules are good too. We must remember, though, that every FF has a purpose, and employing it for something else may bring us to ruin: FFs are good at linear peptides, but not at cyclic peptides; in addition, since they focus on equilibrium states, protein folding is completely inaccessible through them, so be careful!
The weight $w_i$ is used because all experimental data are equal, but some experimental data are more equal than others: different techniques have different values, precision and equilibrium conditions. Some data are not even experimental, but are worked out by QM calculations (like torsional terms, which require non-equilibrium data).
This brings us to the meaning of $U^{tot}$. As far as its numerical value goes, $U^{tot}$ has little to no meaning; this happens because the zero of the $U$ scale is unattainable: it would require all bonds at their natural length (null stretching term), all angles at their natural value (null bending term) but also all atoms at infinite distance (null Van der Waals term). This means we cannot have a zero, so we are not able to compare the values of $U$ for two different molecules; only comparing two different conformations of the same molecule is possible. FFs can therefore be used to obtain information about geometries and their relative energies; although they were developed for organic molecules, some of them can also be applied to coordination compounds.
In case we are missing some parameters for our FF calculations, we can take different paths; one is borrowing them from chemically similar fragments, as in the scheme below.
(Scheme: analogous carbonyl-containing fragments used to borrow the parameters of a missing atom type O∗.)
In this case, a lot of chemical intuition and testing is required; the bare minimum is checking how the properties change when these parameters are modified: if they change a lot, maybe we should stop and re-parametrize; if they change a little, going on is acceptable.
Missing parameters are a critical topic in FF implementation, since software rarely informs the user when it falls back on low-quality (that is, totally nonsensical) parameters. Sometimes, rather rough (ruspantelle) approximations are implemented; for example, some parameters are the same for entire groups of atoms.
The moral of this story is that just because you can do the calculation, it doesn't mean you will get a good result.
Decomposition We can be tempted to decompose the result of a FF calculation to gain insight into the predominant terms of the potential energy. This procedure is surely possible, but we must recall that it doesn't show the physics of the system at all, but just how the FF is built. The results are globally correct, but it is better to avoid any over-analysis, since a lot of it is flavour.
Class I is made of the simplest FFs; they have all the basic terms, a harmonic stretching and an LJ potential for VdW; they are also known as harmonic or diagonal FFs and they are largely employed for large-system description

¹The trans ligand influences the characteristics of the other ligand's bond
Term            Scaling
Stretching      N − 1
Bending         2(N − 2)
Torsional       3(N − 5)
Van der Waals   N(N − 1)/2 − 3N + 5
Class II possesses cubic or quartic stretching, a more complex VdW potential and cross terms; their applications are limited to small molecules
Class III has hyperconjugation, δ effects and polarization; this class is restricted to a selected club of small molecules
Chapter 2
Halogen Bond
In this chapter, we will discuss the nature of the halogen bond; after that, we will illustrate some implementation methods.
2.1 Definition
Halogen bond is an elaborate hoax.
Seriously, it is an interaction between a halogen atom and a Lewis base; since halogen atoms are the most electronegative ones, this seems a quite exotic interaction, but experimental data show us that this bond is close to linearity and its length is less than the sum of the VdW radii. This occurs because, although isolated halogen atoms are spherical, when bonded they can present an anisotropy usually referred to as polar flattening; in fact, the halogen electron density is mostly located on the bond, and the atom seems compressed. This originates a positive electrostatic potential in the region sitting opposite to the bond, known as the σ-hole, while a negative potential belt appears, perpendicular to the bond axis.
2.4 Distorted halogen bonds
The halogen bond has a rather small tolerance to compression, losing a lot more energy by compression than by extension; this means that this bond is very hard to compress. On the other hand, the halogen bond can easily be distorted angularly, as long as the linearity of the C–X···D interaction is respected. The attack angle can vary, then. Obviously, the donor can modify both the strength and the susceptibility of the XB.
A statistical analysis of ligand-protein crystal geometries shows there are three classes of halogen interactions, namely primary, secondary and tertiary halogen bonds, classified through the decreasing quality of the interaction. Strong linear (primary) bonds tend to force the surrounding structure to increase its energy, dictating the interaction geometry by means of their higher contribution to the binding energy. On the other hand, weaker bonds (like Cl ones) generate less perfect structures, falling behind in the binding hierarchy.
2.5 XB applications
Certain ligands can be substituted by a different molecule that can form XBs, sometimes even with the same linking sites. By changing the nature and the topology of the interaction, new approaches become available to drug design.
On this topic, the possible synergy between HB and XB should be investigated. The statistical analysis of the binding geometries tells us that the angle between an HB and an XB on the same donor is always 90°, commonly with the HB in-plane and the XB out-of-plane. A fixed, perpendicular XB does not influence the energy of a scanned HB, and vice versa; this means that the two bonds are fairly independent of each other. Curiously, HBs do perturb each other when in this very same configuration.
Replacing an HB with an XB can sometimes significantly improve the overall binding energy, even when it does not occur at the main linking position; e.g., the Cathepsin-ligand interaction, which consists of a covalent bond, is significantly improved by a peripheral XB, which gives the covalent bond enough time to form.
2.6 XB modelling
Before, the main effects attributed to halogen atoms were hydrophobicity and dipole-moment changes. Now we know the truth: they are as useless as before, but someone thinks they play some role as bonding sites. This forces us to find a proper way to describe them in silico.
The halogen-donor interaction is of electrostatic nature, but it originates deep into quantum mechanics, so high-level QM calculations should be employed. Obviously, this kind of calculation is restricted to small molecules. For larger sets of atoms (up to a hundred), DFT can be used, but for even bigger systems QM/MM hybrid methods are our only choice. However, these calculations are computationally heavy and do not converge easily, so a good FF description is required.
Here, the main problem is that X atoms are modelled as negatively charged VdW spheres, so no XB can be seen on the horizon. We present two possible solutions:
1. extra point charges can be added to the halogen atom to reproduce the σ-hole; obviously, the total charge is preserved. This approach only describes the predominant electrostatic interaction
2. the halogen atom is not described as a sphere, but as an anisotropic shape, due to the polar flattening
The first approach is more straightforward and common, as opposed to the more accurate and complex approach number two.
2.7 Extra point charge (EPC)
The σ-hole pseudoatom can be parametrized in different ways:
no fit (nF) approach distance and charge are chosen beforehand, and the halogen charge is modified accordingly. No QM calculations or RESP fitting are required, but the charges have to be known a priori
ruspantello fit (rF) approach after choosing distance and charge, all of the molecule's charges are refitted with a RESP fit. This approach requires an electrostatic potential grid
all fit (aF) approach the pseudoatom is treated as an extra point in the electrostatic potential fit; the distance is fixed, but the charge has to be determined
Efficiency tests show that a very important parameter is the distance of the pseudoatom from X. On the whole, this is a simple approach, but at least it considers the interaction, albeit qualitatively.
A possible improvement is taking the pseudoatom distance as another fitting variable, but this can be done only on a limited set of molecules, due to its computational weight. The pseudoatom is inserted as an extra little mass, kept in place by strong forces. In this way, we can obtain geometries in good agreement with more accurate QM/MM calculations and with PDB crystallographic structures.
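A minimal sketch of the geometric part of the EPC idea, placing a positive pseudoatom on the elongation of the C–X bond (coordinates, distance and charges below are hypothetical illustrations, not fitted values):

import numpy as np

def add_sigma_hole_epc(r_c, r_x, q_x, d=1.5, q_ep=0.05):
    """Return the EPC position past the halogen and the corrected X charge.

    d is the X-EPC distance, the parameter the efficiency tests single out;
    q_x + q_ep is preserved so the total molecular charge does not change.
    """
    axis = (r_x - r_c) / np.linalg.norm(r_x - r_c)  # unit vector C -> X
    r_ep = r_x + d * axis            # sigma-hole sits opposite to the bond
    return r_ep, q_x - q_ep          # compensate on the halogen charge

r_ep, q_br = add_sigma_hole_epc(np.array([0.0, 0.0, 0.0]),   # C position
                                np.array([1.9, 0.0, 0.0]),   # Br position
                                q_x=-0.10)
print(r_ep, q_br)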
2.8 Docking procedures
Posing is trying out the fit of different poses (conformations) of the ligand. It is usually very efficient, and it can be
rigid when both ligand and enzyme are not allowed to vibrate: only rototranslational degrees of freedom are explored
flexible when it is not rigid
Scoring is instead the assignment of a score to the different molecular geometries; it is the weak point of docking procedures, since it employs an approximated binding energy function as the score function, so that the procedure can be applied to a large number of molecules.
Ideally, a docking procedure should be able to create an accurate ranking of binding energies and ligands, with a direct link to biological activity. In truth, since the scoring is inaccurate and binding energy does not always correlate with in vivo biological activity, all of this is impossible. What docking can do is enrich the molecular database of active compounds by (statistically) reducing the number of candidates and analysing the binding modes and processes. In fact, the set of molecules selected by these procedures is statistically richer in active compounds than a randomly selected set.
As far as we are concerned right now, adding an extra atom to describe XB is very easy and doable in these procedures.
2.9 Scalable anisotropic model (SAM)
The SAM approach modulates the effective Van der Waals radius of the halogen with the interaction geometry; the angle-dependence is in a cosine function, with $\alpha = 180° - \theta_1$ ($\theta_1$ is the interaction angle) and $\langle R_{vdw}(\text{Br})\rangle$ the effective bromine Van der Waals radius.
Anisotropy is considered in the electrostatic term too, by modulating the halogen charge as
$$Z_{Br} = A\cos(\nu\alpha) + B$$
so that the charge can depend on the angle in a way that describes the σ-hole. The final FF functional form is able to reproduce QM results with acceptable accuracy, but other descriptions are possible.
By comparison with the EPC results, we understand that SAM is more accurate and capable of describing the physics of the system, at the cost of a much more difficult implementation. On the other hand, EPC takes into account only the electrostatic term, but it is easier to implement and accurate enough for many purposes.
Chapter 3
Statistical Mechanics
But before we deal with the third point, we have to get accustomed to the in silico use of statistical mechanics¹. Statistical mechanics relates the microscopic mechanical variables $(q, p)$ (microstate) with the macroscopic thermodynamical variables ($N$, $V$, $p$, $T$, $\mu$, ...) (macrostate). In doing so, a key concept is that of phase space, defined as the space spanned by the $6N$ degrees of freedom ($3N$ generalized coordinates plus $3N$ generalized momenta) that describe the system.
We are going to discuss this space through a rather common example, the one-dimensional harmonic oscillator; the total energy of the system is given by
$$E = K + V = \frac{p^2}{2m} + \frac{1}{2}Kq^2$$
An isolated system has $E = \text{constant}$, but we are going to allow a little bit of indeterminacy, taking instead $E \in [E;\, E + \delta E]$. The phase-space image of the HO is an ellipse; if we consider $E$ constant, we get a line, otherwise we get a thin elliptical annulus. Classically speaking, the phase space is continuous, but the indetermination principle
$$\Delta x\,\Delta p \ge \frac{\hbar}{2}$$
does not allow this in quantum mechanics. As we can see from the formula, $\hbar$ has the same units of measurement as an action, but also as a volume in the phase space, which is therefore granular.

¹For a rigorous introduction to statistical mechanics, see G. Mandelli - Introduzione alla Fisica Statistica
By solving the equations of motion, we get
$$\begin{cases} x = A\cos(\omega t + \varphi) \\ p = -mA\omega\sin(\omega t + \varphi) \end{cases}$$
This means our mass-on-a-spring oscillates at frequency $\omega$, with amplitude $A$ and phase $\varphi$. By substitution, and using $m\omega^2 = K$, we get
$$E = \frac{m\omega^2 A^2}{2}\sin^2(\omega t + \varphi) + \frac{K}{2}A^2\cos^2(\omega t + \varphi) = \frac{1}{2}KA^2\left[\sin^2(\omega t + \varphi) + \cos^2(\omega t + \varphi)\right] = \frac{1}{2}KA^2$$
Each point of the phase space can be labelled as $\Gamma(q, p)$, and any property of the system will be a function of $\Gamma$:
$$A = A(\Gamma)$$
Let's suppose we are following the time evolution of the system. The observed value of $A$ will be the time average of all the values we get:
$$A_{obs} = \langle A(\Gamma(t))\rangle_t = \lim_{t_o \to \infty} \frac{1}{t_o}\int_0^{t_o} dt\, A(\Gamma(t))$$
The time evolution is followed by integrating Newton's equations step by step, by taking a time interval $\delta t = t_o/\tau_o$, where $\tau_o$ is the number of steps we are going to take. Therefore, we can calculate
$$A_{obs} = \frac{1}{\tau_o}\sum_{\tau=1}^{t_o/\delta t} A(\Gamma(\tau\,\delta t))$$
Alas, the time average is not the usual formulation of statistical mechanics, which Gibbs based on the ensemble average. In fact, we can take an infinite set of replicas of the system and then take the average over these; the collection of replicas is also known as an ensemble, and it corresponds to a collection of points in the phase space; each replica is represented as a point in there. This way, we can find a probability distribution that describes how the systems are distributed; we call it the probability density $\rho_{ens}(\Gamma)$.
As the system evolves, $\rho_{ens}$ changes with time, but its total time derivative is zero:
$$\frac{d\rho}{dt} = 0$$
In contrast, the partial time derivative is null only at equilibrium:
$$\frac{\partial\rho}{\partial t} \begin{cases} = 0 & \text{at equilibrium} \\ \neq 0 & \text{elsewhere} \end{cases}$$
This represents a sort of conservation law that goes under the name of Liouville's theorem:
$$\frac{d\rho}{dt} = \frac{\partial\rho}{\partial t} + \sum_i \left(\frac{\partial\rho}{\partial q_i}\frac{\partial q_i}{\partial t} + \frac{\partial\rho}{\partial p_i}\frac{\partial p_i}{\partial t}\right)$$
where the total derivative consists in following the system in a cruise along the phase space, while the explicit, partial derivative means we are staying in a small volume $dq\,dp$ and looking at the number of systems entering and exiting.
In this formulation of statistical mechanics, we can evaluate the expectation value of an observable as
$$A_{obs} = \langle A\rangle_{ens} = \sum_\Gamma A(\Gamma)\rho(\Gamma)$$
The ensemble average yields the same result as the time average only if a trajectory can sample any point of the phase space, given infinite time, i.e. if the system is ergodic. At this point, the good news is that biological systems are all ergodic; the bad news is that the time required to sample all the phase space is too big for current computers. Therefore, enhanced sampling techniques are applied. Since in the end it is not easy to exploit ergodicity, we divide these methods into
• those based on the time average (molecular dynamics)
• those based on the ensemble average (Monte Carlo)
In principle, both methods require an infinite sampling, that is an infinite simulation time or an infinite number of replicas, respectively. Ergodicity ensures us that
$$\langle A\rangle_{time} = \langle A\rangle_{ensemble}$$
Ergodicity means that from any point of the phase space the system can reach any other point of the space: a non-ergodic phase space, therefore, is divided into different regions that do not communicate. However, due to computer limitations, it is not so easy to exploit ergodicity; high barriers or bottlenecks may take a lot of time to be crossed, making things difficult for plain molecular dynamics simulations. This means that molecular dynamics and Monte Carlo may differ even for an ergodic system.
isothermal-isobaric where $N$, $P$ and $T$ are constant; it represents a system in contact with both a heat reservoir and a barostat
grand canonical where $\mu$, $V$ and $T$ are constant; it represents a system in contact with a heat reservoir and able to exchange matter with the universe
Even if the isothermal-isobaric ensemble is the one most relatable to usual chemical experiments, we will focus more on the canonical one, because it is more straightforward. Nonetheless, we are going to illustrate the peculiarities of each ensemble.
3.3 Partition functions and thermodynamic potentials
Usually, we can write the probability density as the ratio between a weight $W_{ens}$ and a normalization factor $Q_{ens}$, both relative to the ensemble:
$$\rho_{ens}(\Gamma) = \frac{W_{ens}(\Gamma)}{Q_{ens}}$$
where the normalization factor goes under the name of partition function and corresponds to the sum of the weights:
$$Q_{ens} = \sum_\Gamma W_{ens}(\Gamma)$$
From the partition function, we can obtain any other physical quantity; in order to compute it, however, we need to know everything about the system, which makes the direct calculation a rather cumbersome task: this is why we employ simulations.
Indeed, the average of any observable can be obtained as
$$\langle A\rangle_{ens} = \frac{1}{Q_{ens}}\sum_\Gamma W_{ens}(\Gamma)\,A(\Gamma)$$
Finally, the thermodynamic potential is the quantity that links the microscopic to the macroscopic, and it can be evaluated as
$$\frac{\Psi_{ens}}{k_B T} = -\ln Q_{ens}$$
Canonical The canonical probability density is proportional to an exponential function of the hamiltonian $H(\Gamma)$:
$$\rho_{NVT} \propto \exp\left(-\frac{H(\Gamma)}{k_B T}\right)$$
The partition function is therefore the sum
$$Q_{NVT} = \sum_\Gamma \exp\left(-\frac{H(\Gamma)}{k_B T}\right)$$
It can be factorized as $Q_{NVT} = Q^{id}_{NVT}\,Q^{ex}_{NVT}$, where $Q^{id}_{NVT}$ is the canonical partition function of an ideal gas made of the same particles and $Q^{ex}_{NVT}$ is the excess partition function, which takes into account the non-ideality of the system, i.e. the potential energy. Sometimes, the term
$$Z_{NVT} = \int dq\, \exp\left(-\frac{V(q)}{k_B T}\right)$$
is called the configuration integral; this is what we sample with a Monte Carlo sampling.
Finally, the thermodynamic potential is
$$\frac{\Psi}{k_B T} = -\ln Q_{NVT} = \frac{A}{k_B T}$$
where $A$ is the Helmholtz free energy.
Finally, the thermodynamic potential is
$$\frac{\Psi}{k_B T} = -\ln Q_{NPT} = \frac{G}{k_B T}$$
where $G$ is the Gibbs free energy.
Defining $\beta = 1/k_B T$, we can write the average energy as
$$\langle E\rangle = -\frac{1}{Q_{NVT}}\frac{\partial Q_{NVT}}{\partial\beta} = -\frac{\partial}{\partial\beta}\ln Q_{NVT}$$
Secondly, the probability of a single state is the ratio between the number of systems in that state and the total number of systems:
$$P_i = \frac{n_i}{N} = \frac{e^{-\beta H_i}}{Q_{NVT}}$$
The last term is called the Boltzmann distribution and it is easily derived by considering that $P_i = \rho_{NVT}$ for the $i$-th state.
In a two-state system, the ratio of the populations is
$$\frac{n_1}{n_2} = \frac{e^{-\beta H_1}/Q_{NVT}}{e^{-\beta H_2}/Q_{NVT}} = e^{-\beta H_1}\,e^{\beta H_2} = e^{-\beta\Delta H}$$
By inverting, we obtain
$$\Delta H = -k_B T \ln\frac{n_1}{n_2}$$
This makes our simulations quite crucial, because they allow us to get each state's population, from which we get the energy gap between the two. Now, a device is required that will allow us to efficiently sample the phase space.
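A quick numerical check of the last two formulas, with an illustrative 1 kcal/mol gap at 300 K:

import math

K_B = 0.0019872  # kcal mol^-1 K^-1

def two_state_populations(dH, T):
    """Boltzmann populations of a two-state system with energy gap dH."""
    beta = 1.0 / (K_B * T)
    w1, w2 = 1.0, math.exp(-beta * dH)  # weights relative to the lower state
    return w1 / (w1 + w2), w2 / (w1 + w2)

p1, p2 = two_state_populations(1.0, 300.0)

# Inverting the population ratio recovers the gap, as in the formula above
dH_back = -K_B * 300.0 * math.log(p2 / p1)
print(p1, p2, dH_back)  # dH_back ~ 1.0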
Chapter 4
Molecular Dynamics
In this chapter, we will discuss the main algorithms and techniques employed to follow the time evolution of a molecular system, by applying force fields and classical mechanics.
4.1 Finite differences method
Newton's equations of motion read
$$m_i\frac{d^2\mathbf{q}_i}{dt^2} = \mathbf{F}_i$$
where $\mathbf{F}_i = -\nabla_i U$ and $U$ is the potential energy evaluated through the force field. Since position and force are vectors, for an $N$-particle system we have to solve $3N$ second-order ordinary differential equations.
On the other hand, we can start from
$$\frac{d\mathbf{q}_i}{dt} = \frac{\mathbf{p}_i}{m_i}$$
so that
$$\frac{d\mathbf{p}_i}{dt} = m_i\frac{d\mathbf{v}_i}{dt} = \mathbf{F}_i = -\nabla_i U$$
This gives us $6N$ first-order ordinary differential equations, corresponding to the Hamilton's equations of the system. To solve the dynamics numerically, we employ a finite difference method, in which from the position and its derivatives at $t$ we estimate them at $t + \delta t$, with $\delta t$ a small finite time increment. We can employ the following different variations of this method.
Verlet algorithm We start from the forward and backward Taylor expansions of the position:
$$\mathbf{r}(t + \delta t) = \mathbf{r}(t) + \dot{\mathbf{r}}\,\delta t + \frac{1}{2}\ddot{\mathbf{r}}\,\delta t^2 + \frac{1}{6}\dddot{\mathbf{r}}\,\delta t^3 + o(\delta t^3)$$
$$\mathbf{r}(t - \delta t) = \mathbf{r}(t) - \dot{\mathbf{r}}\,\delta t + \frac{1}{2}\ddot{\mathbf{r}}\,\delta t^2 - \frac{1}{6}\dddot{\mathbf{r}}\,\delta t^3 + o(\delta t^3)$$
If we sum them, we obtain
$$\mathbf{r}(t + \delta t) = 2\mathbf{r}(t) - \mathbf{r}(t - \delta t) + \ddot{\mathbf{r}}\,\delta t^2 + o(\delta t^4)$$
This formula, containing the acceleration $\mathbf{a} = \ddot{\mathbf{r}}$, is precise up to the third order, but it has some problems. First, velocity does not appear; this is a problem if we need the kinetic energy or constant temperature (canonical ensemble); we can solve it with the formula
$$\mathbf{v}(t) = \frac{1}{2\delta t}\left[\mathbf{r}(t + \delta t) - \mathbf{r}(t - \delta t)\right]$$
This creates a lag between the knowledge of the position and that of the velocity. Finally, $\mathbf{r}(t + \delta t)$ is the sum of a first-order term and a second-order term, so a small term is added to a big one: this is a possible source of numerical errors. Figure 4.1 is a graphical representation of the algorithm; as we can see, knowledge of the previous step is required, so the first step is always done with another integration method, as imprecise as we want. Verlet is then applied from the second step.
The leap-frog and velocity Verlet algorithms, instead, propagate velocities explicitly alongside positions; this way, there is no mixing between first and second orders. A graphical representation is presented in Figure 4.1.
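A minimal sketch of the position-Verlet loop for a 1D harmonic oscillator (arbitrary units; as noted above, the first step is bootstrapped with a cruder Taylor estimate):

def verlet_trajectory(r0, v0, force, mass, dt, n_steps):
    """Position Verlet: r(t+dt) = 2 r(t) - r(t-dt) + (dt^2/m) F(r(t))."""
    r_prev = r0 - v0 * dt + 0.5 * (force(r0) / mass) * dt**2  # bootstrap step
    r = r0
    traj = [r0]
    for _ in range(n_steps):
        r_next = 2.0 * r - r_prev + (dt**2 / mass) * force(r)
        r_prev, r = r, r_next
        traj.append(r)
    return traj

# Harmonic oscillator with K = m = 1; the exact solution is cos(t)
traj = verlet_trajectory(r0=1.0, v0=0.0, force=lambda r: -r,
                         mass=1.0, dt=0.01, n_steps=1000)
print(traj[-1])  # ~ cos(10) = -0.839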
Figure 4.1: Various forms of motion integration: (a) Verlet algorithm, (b) Leap-frog algorithm, (c) Velocity Verlet algorithm
4.3 About the stability of MD trajectories
Consider a second trajectory, started from a slightly perturbed initial condition $\mathbf{r}'(0) = \mathbf{r}(0) + \varepsilon$, where the small perturbation $\varepsilon$ is expected to carry a very significant effect. This effect can be evaluated as
$$\Delta\mathbf{r} = \mathbf{r}(t) - \mathbf{r}'(t)$$
For short simulations, $\Delta\mathbf{r}$ is linear in $\varepsilon$, but its absolute value increases exponentially with time:
$$|\Delta\mathbf{r}| \propto \varepsilon\, e^{\lambda t}$$
The $\lambda$ coefficient is also known as the first Lyapunov exponent; indeed, this kind of instability is called Lyapunov instability, i.e. the solutions are highly dependent on the initial conditions (like, e.g., weather forecasts).
To keep $\Delta\mathbf{r}$ below a critical threshold $\Delta_{max}$ for a time $t < t_{max}$, we have to limit the initial perturbation to
$$\varepsilon \propto \Delta_{max}\, e^{-\lambda t_{max}}$$
This means that for a long simulation we can only afford a small perturbation: our MD simulations are not at all tolerant to perturbations. In this case, we should review why we resorted to MD in the first place. MD trajectories are in fact run to sample the phase space, so that in a long simulation the trajectory returns on its path multiple times; however, the exact sequence of sampled points is not relevant, as long as we are sampling them: the difference in paths is irrelevant and the time averages must be equal. If that is not true, it is due to a lack of convergence, not of stability.
4.4 Size and boundaries
Periodic boundary conditions (Figure 4.2) introduce two main problems:
1. long-range interactions ($r^{-n}$ with $n \le 3$) require special tricks to avoid interactions between one atom and its replicas
Figure 4.2: Pictorial representation of the periodic boundary condition setup
2. the potential energy should now take into account all the infinite interactions between the infinite number of particles
To solve this mess, the minimal image convention is employed: each particle only interacts with the ones contained in a box of the same size and shape as the original box, centred on the particle itself. This means that each and every particle interacts with the closest periodic images, just its closest periodic images and nothing more.
Quoth the raven nevermore
In this fresh new virtual box, the number of particles is the same as in the original box, so for an $N$-particle system, each of them interacts with the $N - 1$ others, for a total of
$$\frac{1}{2}N(N - 1) \simeq \frac{N^2}{2}$$
interactions. This is now a workable number, but it can be further reduced. At the end of the day, each atom interacts with its nearest neighbours, so we can define a sphere that contains all of them. The radius of this sphere is known as the cut-off radius; outside of it there are no interactions, but it must be shorter than one half of the box side. Clearly, the larger the box, the more we save by using it. The cut-off radius can be introduced in a lot of colourful and different ways, some of which are illustrated hereafter.
Simple cut-off It is the most basic, and it employs a Heaviside function $\theta(r_c - r_{ij})$:
$$U'(r_{ij}) = U(r_{ij})\,\theta(r_c - r_{ij})$$
where $U(r_{ij})$ is the original potential and the Heaviside function is defined as
$$\theta(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x < 0 \end{cases}$$
The rather obvious discontinuity makes this potential a real cat to skin when it comes to motion integration.
Switch approach We can smooth the previous approach by switching the potential to zero in a gentler way. To do this, we employ not one, but two cut-off radii. Analytically, this is done by considering
$$U'(r_{ij}) = w_{ij}\,S(r_{ij})\,U(r_{ij})$$
where $w_{ij}$ is a simple weight and $S$ is the switch function defined as
$$S(r_{ij}) = \begin{cases} 1 & \text{for } r \le r_{c1} \\ f(r, r_{c1}, r_{c2}) & \text{for } r_{c1} < r < r_{c2} \\ 0 & \text{for } r \ge r_{c2} \end{cases}$$
Shift approach Instead of cutting and stitching some functions together, we apply a perturbation to the potential, so that it goes to zero at $r_c$. The analytical form is the same,
$$U'(r_{ij}) = w_{ij}\,S(r_{ij})\,U(r_{ij})$$
but now
$$S(r_{ij}) = \left(1 - \frac{r}{r_c}\right)^2$$
This is the smoothest approach, but we are altering the potential: numerically better, physically worse.
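A minimal sketch comparing the three truncation schemes on a Lennard-Jones pair (the cut-off radii and the smooth ramp used for f are illustrative choices, not prescribed ones):

def u_lj(r, sigma=3.4, eps=0.24):
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def u_simple(r, rc=10.0):
    return u_lj(r) if r < rc else 0.0       # Heaviside: discontinuous at rc

def u_switch(r, rc1=8.0, rc2=10.0):
    if r <= rc1:
        return u_lj(r)
    if r >= rc2:
        return 0.0
    x = (rc2 - r) / (rc2 - rc1)               # goes 1 -> 0 across the window
    return x * x * (3.0 - 2.0 * x) * u_lj(r)  # smoothstep as an example f

def u_shift(r, rc=10.0):
    return u_lj(r) * (1.0 - r / rc) ** 2 if r < rc else 0.0  # perturbed U

for r in (3.8, 8.5, 9.99):
    print(r, u_simple(r), u_switch(r), u_shift(r))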
4.5 (En)sampling different ensembles
Since experimental data are obtained in ensembles different from the microcanonical one, we need methods to sample different ensembles. We show some of these methods afterwards.
Canonical sampling In the canonical ensemble the temperature is constant, thanks to a thermostat. This forces a modification of the equations of motion, a modification that – as always – comes in different tastes and colours.
A rather crude, rough-as-hell approach starts from the definitions of kinetic energy, both in classical mechanics and in kinetic gas theory:
$$K = \frac{1}{2}\sum_{i=1}^N m_i v_i^2 = \frac{3}{2}N k_B T$$
This leads to
$$T = \frac{1}{3N k_B}\sum_{i=1}^N m_i v_i^2$$
so the instantaneous temperature can be forced to the bath value $T_b$ by brute-force rescaling of all velocities,
$$v_i' = \lambda v_i \qquad \lambda = \sqrt{\frac{T_b}{T}}$$
A gentler variant, the Berendsen thermostat, applies only a fraction of this correction at each step,
$$\lambda^2 = 1 + \frac{\delta t}{\tau}\left(\frac{T_b}{T} - 1\right)$$
so that the system temperature converges exponentially to $T_b$, with coupling time $\tau$.
40
With δt = τ we clearly get back to the previous thermostat. The exponential convergence comes
from solving the differential equation
dT 1
= (Tb − T )
dt τ
by variable segregation. Breafly,
dt
dT = (Tb − T )
Z Zτ
dT dt
=
Tb − T τ
∆t
− ln |Tb − T | =
τ
|Tb − T | = e−∆t/τ
A large τ therefore means a weak thermostat-system coupling, and vice versa. As a rule of thumb,
during equilibration phase a small τ is employed, to switch to a large τ in the production phase,
so that we don’t strain the system in equilibrium conditions.
Another approach is known as Anderson’s thermostat, and its main goal is to reproduce
the physics of the heat exchange. Microscopically speaking, the atoms will bump into each other,
resulting in a change of kinetic energy, that is a temperature change in itself. To model this, at
regular interval of the dynamic, we change the speed of a random set of molecule with the speed
extracted by the Maxwell-Boltzmann distribution at Tb ; every time, we perturb the system, jump-
ing from a microcanonical PES to another, until all the particles reproduce Maxwell-Boltzmann
distribution at the target temperature.
Of the three methods, the first one is the less used, while the Beredsen one is not able
to fully reproduce the canonical ensemble; only the Andersen thermostat is able to reproduce
the canonical distribution after a given equilibration time. Obviously, other methods exist, like
changing directly motion equations or surrounding the system with a thermostatic fluid.
4.6 Verlet neighbour list
As we have just seen, periodic boundary conditions are a very useful tool in molecular dynamics. They can be implemented through various and different boxes, like cubes, truncated octahedra or rhombic dodecahedra. Carefully choosing the box allows a reduction in the amount of solvent to be computed; moreover, whatever box we choose, the species of interest must not feel its own replicas.
Previously, we introduced the minimum image convention and the cut-off radius to further reduce the amount of interactions we have to compute. We can estimate that for an $L$-sided box, implementing a cut-off radius $r_c$ will produce a net computational gain of
$$\text{gain} = \frac{4}{3}\frac{r_c^3}{L^3}$$
That is, the bigger the box, the higher the gain.
However, applying a cut-off radius means establishing which molecules are in and which are out: this requires calculating distances – i.e. square roots – between all molecules, with a significant computational effort. But clearly some particles are too far away to get inside the cut-off radius in the near future, and only the nearest ones' distances are relevant for the cut-off. By selecting those particles that can get into the cut-off radius within a few integration steps, we compile the so-called Verlet neighbour list; in practice, it contains all the molecules inside a spherical crust that could cross the cut-off sphere in somewhere between 10 and 20 steps. This way, instead of computing all the distances, just those of the Verlet list are calculated to update the cut-off list; after a certain time interval, the Verlet list itself is updated, by computing all distances. It is obvious at this point that both the crust size and the refresh time of the Verlet neighbour list depend on the temperature.
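A minimal sketch of building and reusing such a list (1D positions for brevity; the skin thickness plays the role of the spherical crust, and all values are illustrative):

def build_verlet_list(pos, r_cut, skin):
    """All pairs within r_cut + skin; refreshed only every 10-20 steps."""
    r_list = r_cut + skin
    return [(i, j)
            for i in range(len(pos)) for j in range(i + 1, len(pos))
            if abs(pos[i] - pos[j]) < r_list]

def pairs_within_cutoff(pos, verlet_list, r_cut):
    """Per-step filter: only the listed distances are recomputed."""
    return [(i, j) for i, j in verlet_list if abs(pos[i] - pos[j]) < r_cut]

pos = [0.0, 1.1, 2.5, 7.9, 8.3]
vlist = build_verlet_list(pos, r_cut=2.0, skin=0.5)
print(pairs_within_cutoff(pos, vlist, r_cut=2.0))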
An alternative to this is the cell structure: the box is divided into cells, so that we look for neighbours only in those cells that surround the particle's own cell; this way, we significantly reduce the number of calculations.
4.7 Particle Mesh Ewald approach
With a cut-off radius in place, the total potential energy can be split as
$$U = \sum_{i<j}^{r_{ij}<r_c} U(r_{ij}) + \frac{N\rho}{2}\int_{r_c}^{\infty} 4\pi r^2\,U(r)\,dr$$
where the first term is for those interactions inside the cut-off radius, while the second one is an estimate of everything outside: a sort of mean field in polar coordinates, multiplied by the density $\rho$. For short-range interactions, the second term can be dropped, but this is not possible for long-range ones. To evaluate those, we will proceed similarly to this example.
Let's consider two positive charges at a certain distance. We can add and subtract a gaussian of charge to each of them, i.e. we can put a diffuse gaussian of positive charge and one of negative charge on each of them. This way we preserve the total charge, but we split the potential into
$$U = U_m + U_g + U_{gc}$$
where
$U_m$ is the dipole-dipole interaction that arises between the two dipoles made by each point charge and its negative gaussian. It is short range and can go with PBC
$U_g$ is the interaction between the gaussian charges. It is more easily evaluated in reciprocal space, i.e. after a Fourier transform; by anti-transforming we then obtain the potential term
$U_{gc}$ is the interaction between each charge and its own gaussians. It is constant, since it does not depend on distance, so it is evaluated once and for all
With this approach, the charge interaction is split into three terms, but it is made compatible with PBC (rather obviously, since Ewald summation was developed for crystallography). All in all, this goes under the name of the Particle Mesh Ewald approach.
4.9 Restraints and constraints
Constraints fix the value of a coordinate: the equations of motion are modified to respect this. They are employed mostly to freeze degrees of freedom, making longer time steps a viable option
Restraints add a penalty for any deviation from the value, without modifying the equations but just the force field. They are used to preserve the geometry in the equilibration phase or to reproduce experimental data (like in NMR refinement)
4.10 Just restraints
Angle restraints are applied to enforce particular geometries, thanks to a goniometric function:
$$V_{ar}(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k, \mathbf{r}_l) = K_{ar}\left\{1 - \cos\left[n(\theta - \theta_0)\right]\right\}$$
This formula means that this kind of restraint is applied to four atoms. The angle $\theta$ is obtained through the inner product
$$\mathbf{r}_{ji} \cdot \mathbf{r}_{kl} = |\mathbf{r}_{ji}||\mathbf{r}_{kl}|\cos\theta$$
by inversion:
$$\theta = \arccos\left(\hat{\mathbf{r}}_{ji} \cdot \hat{\mathbf{r}}_{kl}\right)$$
where the $\hat{\mathbf{r}}$ notation indicates a unit vector (versor). The multiplicity number $n$ allows to distinguish between parallel and antiparallel reciprocal configurations: $n = 2$ means that parallel and antiparallel configurations are equivalent, while $n = 1$ discriminates between them. In the same way, we can restrain the orientation of a molecule, by fixing the angle between one of its axes and a reference direction.
Distance restraints are used to take experimental data into account, usually NMR proton distances. The most trivial way is imposing a virtual bond between these atoms, through a harmonic potential. However, this method does not consider that experimental data come with experimental uncertainty, so we should restrain the atoms into an interval, rather than to a position. To do so, we apply a piece-wise potential, with no penalty inside the interval:
$$V_{dr} = \begin{cases} \frac{1}{2}K_{dr}(r - r_0)^2 & \text{for } r < r_0 \\ 0 & \text{for } r_0 \le r \le r_1 \\ \frac{1}{2}K_{dr}(r - r_1)^2 & \text{for } r_1 < r < r_2 \\ \frac{1}{2}K_{dr}(r_2 - r_1)(2r - r_2 - r_1) & \text{for } r \ge r_2 \end{cases}$$
The linear branch is applied so that very wrong starting geometries are not subjected to very strong forces; the linear potential gently guides the molecule into the restraint well. Obviously,
$$\mathbf{F} = -\nabla V$$
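A minimal sketch of this flat-bottom potential (thresholds and force constant are illustrative):

def v_distance_restraint(r, r0=2.0, r1=3.0, r2=4.0, k=10.0):
    """Flat-bottom restraint: harmonic walls, linear tail beyond r2."""
    if r < r0:
        return 0.5 * k * (r - r0) ** 2
    if r <= r1:
        return 0.0                       # no penalty inside [r0, r1]
    if r < r2:
        return 0.5 * k * (r - r1) ** 2
    # Linear branch: matches the harmonic value and slope at r = r2
    return 0.5 * k * (r2 - r1) * (2.0 * r - r2 - r1)

for r in (1.5, 2.5, 3.5, 5.0):
    print(r, v_distance_restraint(r))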
In principle, this method works, but it has a limit: very flexible structures tend to fluctuate during NMR data collection, so what we get is an average over a lot of geometries; this means that all these restraints cannot be simultaneously satisfied, and the molecule should be allowed to move.
This kind of restraint kills flexibility, so a time-averaged one must be implemented. This is possible just by restraining the average distances $\bar{r}$ instead of the instantaneous ones ($r$). Since MD is a time evolution method, $\bar{r}$ is easily available, but in the first steps its value has little to no meaning: we need some tool to enforce the restraints only when enough history has been collected; this requires a time-dependent constant:

$$K_{dr}^{a} = K_{dr}\left[1 - e^{-t/\tau}\right]$$

where τ is the coupling constant that tunes the switching speed (a big τ means a weak coupling). In this way, MD is easily adapted to the necessities of NMR refinement. In fact, NMR yields a lot more compatible structures than the single X-ray one, since its data represent what really happens in solution: the molecule is free to move, while in the crystalline phase it is frozen in place.
4.11 Constraints only
Constraints are employed to freeze uninteresting coordinates, like fast vibrations; this way, longer time steps can be made. A simple way to apply constraints would be to switch to normal modes and propagate just those we are interested in, but this approach is only viable for small molecules. What we do, then, is to solve Newton's equations without constraints first, and later apply a virtual corrective force to keep the bond length at the given value:

$$m_i \ddot{\mathbf{r}}_i = \mathbf{f}_i + \mathbf{g}_i$$

for each i-th atom, where $\mathbf{f}_i$ is the natural force and $\mathbf{g}_i$ is the constraint force. At this point, the constraint equation is solved, to determine the magnitude of $\mathbf{g}_i$:

$$\chi_{ij} = r_{ij}^2(t) - d_{ij}^2 = 0$$

basically, at each time step we make sure that the bond length $r_{ij}$ is equal to the target bond length $d_{ij}$.
Since the equations of motion are solved through approximations, the constraint forces carry a small error too; these errors may deviate our trajectory, so it is better to enforce the constraints at each step of the dynamics, by building them into the algorithm. In formulas, we have
$$m_a \ddot{\mathbf{r}}_a = \mathbf{f}_a + \mathbf{g}_a \simeq \mathbf{f}_a + \mathbf{g}_a^{(2)}$$

where $\mathbf{g}_a^{(2)}$ is the approximated correction force on the a-th atom. This can be inserted into the Verlet algorithm, obtaining

$$\mathbf{r}_a(t + \delta t) = 2\mathbf{r}_a(t) - \mathbf{r}_a(t - \delta t) + \frac{\delta t^2}{m_a}\mathbf{f}_a + \frac{\delta t^2}{m_a}\mathbf{g}_a^{(2)} = \mathbf{r}'_a(t + \delta t) + \frac{\delta t^2}{m_a}\mathbf{g}_a^{(2)}(t)$$

in which we indicate the unconstrained position of the atom as $\mathbf{r}'_a$.
Obviously, the constraint forces must be directed along the bonds and they must obey Newton's third law; we can express each of them through an undetermined multiplier $\lambda_{ij}$. For a system with i-j-k connectivity, this boils down to

$$\mathbf{g}_1^{(2)} = \lambda_{12}\mathbf{r}_{12} \qquad \mathbf{g}_2^{(2)} = \lambda_{23}\mathbf{r}_{23} - \lambda_{12}\mathbf{r}_{12} \qquad \mathbf{g}_3^{(2)} = -\lambda_{23}\mathbf{r}_{23}$$
So that for each atom the Verlet formula is applied:

$$\mathbf{r}_i(t + \delta t) = \mathbf{r}'_i(t + \delta t) + \frac{\delta t^2}{m_i}\mathbf{g}_i^{(2)}(t)$$

where $\mathbf{g}_i^{(2)}$ is one of the aforementioned forces. At this point, we can calculate

$$\mathbf{r}_{ij}(t + \delta t) = \mathbf{r}_i(t + \delta t) - \mathbf{r}_j(t + \delta t) = \mathbf{r}'_i(t + \delta t) - \mathbf{r}'_j(t + \delta t) + \frac{\delta t^2}{m_i}\mathbf{g}_i - \frac{\delta t^2}{m_j}\mathbf{g}_j = \mathbf{r}'_{ij}(t + \delta t) + \frac{\delta t^2}{m_i}\mathbf{g}_i - \frac{\delta t^2}{m_j}\mathbf{g}_j$$
With this, we can work out λ, by solving the constraint equation:

$$|\mathbf{r}_{ij}(t + \delta t)|^2 = |\mathbf{r}_{ij}(t)|^2 = d_{ij}^2$$

This formula means that at each time step the $r_{ij}$ we get from Verlet must be constant and equal to the target distance. We can solve this problem only by iteration because, through the trinomial square

$$(a + b + c)^2 = a^2 + b^2 + c^2 + 2ab + 2bc + 2ac$$

we get a system of second-degree equations in λ, which cannot be solved directly; however, since the $\lambda^2$ terms are quartic in δt, they are negligible in first approximation and we can iterate this solving algorithm:
1. the $\lambda^2$ terms are neglected, leaving us with linear equations in λ that can be solved directly
2. the results are put back into the $\lambda^2$ terms, obtaining linear λ equations with different known coefficients
3. these coefficients are employed in the next step, iterating
4. the algorithm is stopped when point-wise convergence is reached:
$$\lambda_{ij}(n + 1) \simeq \lambda_{ij}(n)$$
This way, we can see that enforcing constraints boils down to solving a system of algebraic equations by iteration; this approach may require the inversion of a matrix, but since only the nearest neighbours are taken into account, the matrix is sparse. An alternative to this algorithm is the so-called SHAKE algorithm, in which the constraints are applied one by one (a minimal sketch follows).
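As an illustration, a minimal sketch of the linearized iteration for a single bond constraint (variable names are ours; a real implementation loops over all constraints until global convergence):

import numpy as np

def shake_pair(ri, rj, r_old_ij, d, mi, mj, tol=1e-10, max_iter=100):
    # ri, rj: unconstrained positions at t + dt; r_old_ij: bond vector
    # at time t, along which the correction is directed
    for _ in range(max_iter):
        r_ij = ri - rj
        diff = np.dot(r_ij, r_ij) - d * d   # the constraint chi_ij
        if abs(diff) < tol:
            break
        # linearized multiplier: the lambda^2 (quartic in dt) term is neglected
        g = diff / (2.0 * (1.0 / mi + 1.0 / mj) * np.dot(r_ij, r_old_ij))
        ri = ri - (g / mi) * r_old_ij
        rj = rj + (g / mj) * r_old_ij
    return ri, rj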
Example 1. Let's consider the S=C=S molecule. The number of vibrational degrees of freedom to be constrained is obviously 3 · 3 − 5 = 4, but we can actually just constrain the two S=C bonds and the S–S distance.
Example 2. Considering only the six carbons, benzene has 6 · 3 − 6 = 12 vibrational constraints to be applied; unfortunately, since the constraints are directed along the bonds, they may not respect the molecular planarity, so more care is needed.
Example 3 (Velocity Verlet). Velocity Verlet requires us to evaluate both the half-time and the full-time velocity, becoming

$$\mathbf{r}_a(t + \delta t) = \mathbf{r}'_a(t + \delta t) + \frac{\mathbf{g}_a}{m_a}\,\delta t^2$$
$$\mathbf{v}_a(t + \delta t/2) = \mathbf{v}'_a(t + \delta t/2) + \frac{\mathbf{g}_a(t)}{m_a}\,\frac{\delta t}{2}$$
$$\mathbf{v}_a(t + \delta t) = \mathbf{v}'_a(t + \delta t) + \frac{\mathbf{g}_a(t + \tfrac{1}{2}\delta t)}{m_a}\,\frac{\delta t}{2}$$

Therefore, this algorithm actually considers that the bond can move at each half step, correcting with a new force.
RMSD Another relevant measure we can extract from MD simulations is the Root Mean Square Deviation, or RMSD:

$$\mathrm{RMSD}(t) = \sqrt{\frac{1}{M}\sum_{i=1}^{N} m_i\,|\mathbf{r}_i(t) - \mathbf{r}_i^{ref}|^2}$$

where M is the total mass and $\mathbf{r}_i^{ref}$ represents a reference geometry for the molecule. RMSD gives us information on how much and when the structure changed. Obviously, the $\mathbf{r}_i$ are internal coordinates, since the molecule can move inside the box.
In RMSD evaluation, usually only the backbone is employed, since substituents and side-chains are very flexible. They can be adjusted with a least-squares fit later. If ligands are to be considered, it is better to use least-squares fitting on the protein geometry and then to evaluate the RMSD of the ligand (a small sketch follows).
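A minimal NumPy sketch of the formula above, assuming the frame and the reference have already been superimposed by a least-squares fit:

import numpy as np

def rmsd(coords, ref, masses):
    # mass-weighted RMSD between one frame and a reference geometry;
    # coords, ref: (N, 3) arrays, masses: (N,) array
    M = masses.sum()
    sq = ((coords - ref) ** 2).sum(axis=1)   # |r_i - r_i_ref|^2
    return np.sqrt((masses * sq).sum() / M)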
RMSF The Root Mean Square Fluctuation (RMSF) is the time average of the geometry differences:

$$\mathrm{RMSF}_i = \sqrt{\frac{1}{T}\sum_{t=1}^{T} |\mathbf{r}_i(t) - \mathbf{r}_i^{ref}|^2}$$

This value is related to thermal ellipsoids and it can be used to estimate the molecule's flexibility.
Radius of gyration The radius of gyration can be calculated with formulas like

$$R_g = \left(\frac{\sum_i |\mathbf{r}_i|^2 m_i}{\sum_i m_i}\right)^{1/2}$$

with the $\mathbf{r}_i$ measured from the center of mass. It represents the compactness of the structure, measured as distance from the center of mass. A big $R_g$ means the molecule is elongated, a small one means it is globular; it has clear applications in protein folding analysis.
Distance in the structure Sometimes it can be useful to know the distance between two given amino acids, because it can be related to folding or catalytic activity. We can look at all of them through the symmetric distance matrix, which stores all the pairwise distances.
Hydrogen bond The presence of an H-bond is verified by checking the following geometrical criteria:

$$r \le r_{HB} = 0.35\ \mathrm{nm} \qquad \alpha \le \alpha_{HB} = 30°$$

The most interesting thing, though, is how long these bonds last and how many times they appear in a specific position during the dynamics.
Secondary structure The secondary structure is basic for understanding protein folding, stability and structure. It can be checked through various geometrical parameters.
Ramachandran plot For each amino acid, the two backbone dihedral angles φ and ψ can be plotted, obtaining the so-called Ramachandran plot. I'm sure it conveys a lot of information, but I don't know how.
Time evolution Since MD follows the time evolution of the system, we can obtain not only statistical information, but dynamical information too.
Chapter 5
In this chapter, we will discuss the techniques and algorithms of the Monte Carlo family, which allow the ensemble sampling of a molecular system
Consider a quarter circle of radius r inscribed in a square of side r = 1; the two areas are

$$A_{cs} = \frac{\pi r^2}{4} \qquad A_s = r^2 = 1$$

therefore

$$\pi = 4\,\frac{A_{cs}}{A_s}$$
If we generate a pair of random numbers between 0 and 1, some of the resulting points will fall inside the circle and some outside; it's rather obvious that

$$\pi \simeq 4\,\frac{\tau_{hit}}{\tau_{shot}}$$

where $\tau_{hit}$ is the number of points that fall inside the circular sector, while $\tau_{shot}$ is the total number of points. This method requires $2\tau_{shot}$ random numbers, but converges with an error proportional to $1/\sqrt{\tau_{shot}}$; before convergence, the value of π will oscillate above and below the exact number. This technique is quite expensive.
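A minimal sketch of the hit-or-miss estimate (names are ours):

import random

def pi_hit_or_miss(tau_shot):
    tau_hit = 0
    for _ in range(tau_shot):
        x, y = random.random(), random.random()   # 2*tau_shot randoms
        if x * x + y * y <= 1.0:                  # inside the quarter circle
            tau_hit += 1
    return 4.0 * tau_hit / tau_shot

# the estimate oscillates around pi and converges as 1/sqrt(tau_shot):
for n in (100, 10000, 1000000):
    print(n, pi_hit_or_miss(n))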
5.2 Monte Carlo sample mean
This is a neat improvement over the hit or miss. To evaluate the integral
$$F = \int_{x_1}^{x_2} f(x)\,dx$$
to evaluate the function 10 times we need to evaluate it 3000 times. All considered, Monte Carlo sample mean gives us easy access to the average of an observable

$$\langle A\rangle_{NVT} = \frac{1}{Z_{NVT}}\int d\Gamma\, A\, e^{-\beta U(\Gamma)}$$

This is an improvement on hit or miss, but we need more! We want more! We crave more! The sample-mean estimate of the average can be written as

$$\langle A\rangle_{NVT} \simeq \frac{1}{\tau_{max}}\sum_{\tau=1}^{\tau_{max}} \frac{A(\Gamma_\tau)\,\rho_{NVT}(\Gamma_\tau)}{\rho(\Gamma_\tau)}$$

where $\rho_{NVT}$ is the canonical probability density and ρ is another probability density, like the uniform one employed until now. However, we are not obliged to stick to the uniform distribution, so we could use $\rho_{NVT}$ as ρ, getting to

$$\langle A\rangle_{NVT} = \left\langle A\,\frac{\rho_{NVT}}{\rho_{NVT}}\right\rangle_{trials} = \langle A\rangle_{trials}$$
Now we can just estimate the integral as an average over the trials, given that we are able to reproduce $\rho_{NVT}$ with random numbers; in fact, we have just rephrased the problem: now we need to generate a non-uniform distribution with random numbers. The solution to this dilemma was given by Metropolis: it is indeed possible, but we have to generate it through a Markov chain of events. This means two conditions must be verified:
1. only a finite number of outcomes is possible for each trial. This is not a problem, since in silico space is already granular, so we have a big yet finite number of possible configurations
2. every new configuration must depend only and exclusively on the previous one (recall that in MD all configurations depend on the starting one)
By following these prescriptions we can generate a set of configurations that, although not time related, describe $\rho_{NVT}$ perfectly. For each pair of configurations $\Gamma_m$ and $\Gamma_n$, we can define $\pi_{mn}$ as the probability of the transition m → n. Altogether, the $\pi_{mn}$ probabilities are stored in the transition matrix π.
Example 5 (From the early age of computers). At the dawn of time, computers were not as reliable as they are now; each day, they could work (↑) or not work (↓). The transition matrix of one of them could have been

$$\pi = \begin{pmatrix} \pi_{\uparrow\uparrow} & \pi_{\uparrow\downarrow} \\ \pi_{\downarrow\uparrow} & \pi_{\downarrow\downarrow} \end{pmatrix}$$

The probability of the second (fictional!) step will be

$$\rho^{(2)} = \rho^{(1)}\pi$$

At the n-th step, we get

$$\rho^{(n)} = \rho^{(n-1)}\pi = \rho^{(n-2)}\pi^2 = \ldots = \rho^{(1)}\pi^{n-1}$$

For a very long chain of events, we can hope to get to a limiting distribution ρ, defined as

$$\rho = \lim_{\tau\to+\infty} \rho^{(1)}\pi^{\tau}$$

To have all of this, our MC simulations need an irreducible ergodic transition matrix, meaning that every point in phase space must be reachable from every other point, or – as we previously said – our system must be ergodic.
So thanks to this new condition and the pre-existing stochastic nature of π, we get to the Markov condition. Note that, although microscopic reversibility implies Markovian evolution (it is sufficient), the contrary is not true (it is not necessary).
At this point, Metropolis implemented the asymmetric solution, by defining the transition matrix as

$$\pi_{mn} = \alpha_{mn} \quad \text{for } \rho_n \ge \rho_m\ (U_n \le U_m)$$
$$\pi_{mn} = \alpha_{mn}\,\frac{\rho_n}{\rho_m} \quad \text{for } \rho_n < \rho_m\ (U_n > U_m)$$
$$\pi_{mm} = 1 - \sum_{n\ne m} \pi_{mn}$$

This way, everything depends on the underlying symmetric matrix $\alpha_{mn}$, and it does not depend on the partition function. An uphill move is possible, but it is modulated by the ratio of the two states' probabilities.
To understand how this algorithm is applied, let's consider a gaseous system. We modify the system by moving a particle by a random displacement, and then we verify whether the move is accepted or not. First things first, the random displacement is obtained as a random fraction of a fixed maximum displacement δr; since the displacement is actually granular, only a finite number of outcomes is possible: the maximum-displacement box is divided into many small boxes, due to machine algebra necessities. The probability of this displacement helps us define the symmetric matrix $\alpha_{mn}$; for a maximum-displacement region R, made of $N_R$ small boxes, we get

$$\alpha_{mn} = \begin{cases} \dfrac{1}{N_R} & \text{if } \mathbf{r} \in R \\[4pt] 0 & \text{if } \mathbf{r} \notin R \end{cases}$$

where r is the coordinate of our particle's center of mass. At this point, we need to accept or refuse the move: if $U_n \le U_m$ (that is $\rho_n \ge \rho_m$), we accept the move with no issue, setting $\pi_{mn} = \alpha_{mn}$. On the other hand, if $U_n > U_m$ (that is $\rho_n < \rho_m$), we will accept the move with a probability equal to $\rho_n/\rho_m$.
Let's try to understand the value of this ratio. By Boltzmann theory,

$$\frac{\rho_n}{\rho_m} = \frac{e^{-\beta U_n}}{Q_{NVT}}\,\frac{Q_{NVT}}{e^{-\beta U_m}} = e^{-\beta\Delta U}$$

where $\Delta U = U_n - U_m$. This means the acceptance probability of an uphill move is $e^{-\beta\Delta U}$. The acceptance is therefore decided by drawing another random number ξ, uniform in (0, 1): if $\xi \le e^{-\beta\Delta U}$, the move is accepted, otherwise it is refused.
In a more compact way, this whole algorithm can be implemented by defining the transition probability as the minimum between 1 and $e^{-\beta\Delta U}$:

$$P = \min\left(1,\, e^{-\beta\Delta U}\right)$$

In this, temperature has an effect through the factor β: if the temperature is increased, the Boltzmann exponential falls more gently, and for a given ∆U the probability of an uphill step is increased; the opposite is true too.
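The acceptance test fits in a few lines (a minimal sketch; names are ours):

import math, random

def metropolis_accept(delta_U, beta):
    # downhill moves are always accepted; uphill moves are accepted
    # with probability exp(-beta*delta_U), tested against xi
    if delta_U <= 0.0:
        return True
    return random.random() < math.exp(-beta * delta_U)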
Moving just one atom means we just need to re-compute its own interactions, leaving all the others throughout the system unchanged; this is quite convenient, because the ∆U of moving the i-th particle can be estimated as

$$\Delta U_i = \sum_j U^{(n)}(r_{ij}) - \sum_j U^{(m)}(r_{ij})$$
A matter of some importance is the magnitude of the maximum displacement δr: a big δr means we are sampling a lot of phase space, but we are also more likely to push particles into each other, resulting in a refused – therefore wasted – move.
This brings us to the crucial issue of the ideal acceptance rate. Even if a 100% acceptance rate seems optimal, this could not be farther from the truth: in this eventuality, only downhill moves are made, giving us poor sampling. On the other hand, a 1% rate is useless, because no information can be extracted. The acceptance rate is system dependent, but it is never greater than 50%, usually sitting between 20% and 30%; this way we are sure we are sampling a good canonical distribution. In fact, it is not the number of configurations explored that matters, but the correctness of the distribution.
The length of the simulation is measured in MC steps, although time does not appear in MC simulations; this number has no chronological meaning, since there is no time relation between the steps. The energy obtained through MC sampling is just the potential energy, corresponding to the excess partition function (or configurational integral); the ideal part, related to the kinetic energy, can be recovered, but it is seldom necessary. The accuracy of an MC average is of the order of $1/\sqrt{\tau}$, where τ is the number of MC steps:

$$\langle A\rangle_{NVT} = \langle A\rangle_{trials} + O\!\left(\frac{1}{\sqrt{\tau}}\right)$$
The last point is which moves to take to generate $\alpha_{mn}$: displacements may not be optimal for polymers, torsional angle moves can create very high energy configurations, et cetera. All of this must be considered if we want a proper sampling of our phase space, and it can be a deciding factor between MC and MD; although theoretically equivalent, their results are actually very dependent on the sampling quality, and one of the two methods could be better than the other, depending on the system itself. For example, lattice systems, in which the particles can only rest on the nodes of a cubic lattice, cannot be simulated with MD, but are perfect for MC sampling.
$$Z_{NVT} = V^N Q^{ex}_{NVT}$$
This is what we consider through Monte Carlo techniques, since there is no kinetic energy. The way we implement constant pressure is through scaled coordinates,

$$\mathbf{s} = \frac{\mathbf{r}}{L}$$

where s is the scaled coordinate, r the original one and L the side of the box.
Now we want a Markov chain whose limiting distribution is $\rho_{NPT}$. We start from

$$Q_{NPT} = \sum_V e^{-\beta PV}\, Q^{id}_{NVT}\, Q^{ex}_{NVT}$$

and we want

$$Q^{ex}_{NPT} = \sum_V e^{-\beta PV} Q^{ex}_{NVT} = \sum_V e^{-\beta PV} V^{-N} Z_{NVT} = \sum_{V,U} V^{-N} e^{-\beta(U+PV)} = \sum_{V,U} \exp\{-\beta(U+PV) - N\ln V\}$$

and the sums over V and U(r) are still there. Any observable average will have the following shape:

$$\langle A\rangle_{NPT} = \frac{1}{Q_{NPT}}\int_0^{+\infty} dV\, e^{-\beta PV}\, V^{-N} \int d\mathbf{s}\, A(\mathbf{s})\, e^{-\beta U(\mathbf{s})}$$
To simulate this ensemble, then, we have to randomly change both positions and box volume; to generate a new state we will require a move like

$$\mathbf{s}_i^{(n)} = \mathbf{s}_i^{(m)} + \delta\mathbf{s} \qquad V_n = V_m + \delta V$$

where m is the starting state, n is the final state, i is the particle counter and the two increments δs and δV are random fractions (positive or negative) of fixed maximum values. These formulas allow forward and backward movement, and both compression and expansion of the volume.
Now to determine the move acceptance: we have to consider an enthalpy difference between the two states m → n,

$$\delta H_{nm} = \delta U_{nm} + P(V_n - V_m) - \frac{N}{\beta}\ln\frac{V_n}{V_m}$$

This is not a genuine enthalpy, but it is close enough, so it will do. The acceptance probability, as before, will be

$$P = \min\left(1,\, e^{-\beta\,\delta H_{nm}}\right)$$

For the canonical ensemble, ∆U was easy to compute, since it involved only a single particle's interactions; by changing the volume too, all particles are moved, so all interactions must be evaluated. For this reason, volume changes are computationally heavy, so they are attempted only every n steps.
Noble gas For non-bonded systems, a simple trick allows us to implement this algorithm analytically. The LJ potential for the m state is

$$U_m = \sum_{ij} 4\epsilon\left(\frac{\sigma}{L_m s_{ij}^{(m)}}\right)^{12} - \sum_{ij} 4\epsilon\left(\frac{\sigma}{L_m s_{ij}^{(m)}}\right)^{6} = U_m^{12} + U_m^{6}$$

Since a volume move rescales L but leaves the scaled coordinates untouched, the two contributions simply scale as $L^{-12}$ and $L^{-6}$, so the volume part of ∆U can be obtained without recomputing any pair interaction. This way, the total potential energy variation is the sum of two contributions:

$$\Delta U_{nm}^{tot} = \Delta U_{nm}^{vol} + \Delta U_{nm}^{displ}$$
The random numbers themselves can be produced by a multiplicative congruential recipe,

$$x_{i+1} = a\,x_i \bmod m = a\,x_i - m\cdot \mathrm{int}\!\left(\frac{a\,x_i}{m}\right)$$

where int indicates the integer part, and a and m are both large integer numbers. These random numbers $x_i$ lie in (0, m − 1): to move them into the right interval we divide by m:

$$\xi_i = \frac{x_i}{m}$$

This sequence will repeat after at most m − 1 draws, so we take m as the largest integer in machine memory. Finally, a and m may be chosen together to increase performance. This algorithm should give us a good set of random numbers.
To check their quality, we can plot them two-by-two: if their quality is poor and there is some relation between them, we will see geometrical patterns in the graph (see the sketch below).
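A minimal sketch of such a generator; the particular a and m used here are the classic Park-Miller values, chosen only as an illustration:

def lcg(seed, a=16807, m=2**31 - 1):
    # multiplicative congruential generator x_{i+1} = a*x_i mod m,
    # normalized to (0, 1)
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

gen = lcg(12345)
pairs = [(next(gen), next(gen)) for _ in range(1000)]
# plotting these (x_i, x_{i+1}) pairs would reveal geometrical
# lattice patterns if (a, m) were poorly chosen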
Chapter 6
reversibility) through the equation

$$\rho_m \pi_{mn} = \rho_n \pi_{nm}$$

where the transition matrix π is the composition of the symmetric matrix α and the acceptance probability P:

$$\pi_{mn} = \alpha_{mn} P_{mn}$$
In parallel tempering, we have many independent simulations, composing a system of multiple non-interacting systems, with

$$Q_{tot} = \prod_i Q_{NVT_i} \qquad \rho_{tot} = \prod_i \rho_i$$

The hamiltonian swap boils down to just another kind of MC move, so we have to enforce detailed balance; since each state has a geometry and a temperature, we will indicate the transition

$$(m, \beta_m;\ n, \beta_n) \to (n, \beta_m;\ m, \beta_n)$$

as → and the opposite transition as ←; we then decompose $\rho_{tot}$ into its factors, getting to

$$\rho_m(\beta_m)\,\rho_n(\beta_n)\,P(\to) = \rho_n(\beta_m)\,\rho_m(\beta_n)\,P(\leftarrow)$$
This enforces the detailed balance into the swap. The acceptance rate is defined as
$$\frac{P(\to)}{P(\leftarrow)} = \frac{\exp\{-\beta_m U_n - \beta_n U_m\}}{\exp\{-\beta_m U_m - \beta_n U_n\}} = \exp\{-(\beta_n - \beta_m)(U_m - U_n)\} = \exp\{-\Delta_{nm}\}$$

$$P = \min\left(1,\, e^{-\Delta_{nm}}\right)$$
The acceptance of the hamiltonian swap is therefore associated to the energies of the two simulations at the given temperatures.
Exercise. Consider the acceptance probability for all the possible relative values of β and U.
From this formula, we clearly see that the acceptance probability rapidly decreases as ∆T increases, so far-apart simulations are not practical. We can instead run multiple close simulations, each of them exchanging configurations with its nearest neighbour; given enough time, a high-temperature configuration will swap its way down to low T. If we take a look at the energy distribution function at different temperatures, we notice that the higher the temperature, the flatter the distribution, and that all of them overlap in their tails. A hamiltonian swap can only happen in the overlap region. Therefore, different independent simulations must be close enough that their energy distributions overlap, but not so close that we are sampling the same shit all over again. The appropriate spacing depends on T, to take into account the flattening of the distributions, being small at low T and large at high T. Usually a geometric progression is employed, imposing

$$\frac{T_i}{T_j} = \text{constant}$$

for neighbouring temperatures. The optimal acceptance rate is between 20% and 30% (a minimal sketch follows).
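A small sketch of the ladder and of the swap test (function names are ours):

import math, random

def geometric_ladder(T_min, T_max, n):
    # temperatures in geometric progression: T_{i+1}/T_i is constant
    ratio = (T_max / T_min) ** (1.0 / (n - 1))
    return [T_min * ratio ** i for i in range(n)]

def swap_accepted(beta_m, beta_n, U_m, U_n):
    # P = min(1, exp(-Delta_nm)), Delta_nm = (beta_n - beta_m)*(U_m - U_n)
    delta = (beta_n - beta_m) * (U_m - U_n)
    return delta <= 0.0 or random.random() < math.exp(-delta)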
All of this may seem a little ineffective, but running multiple short simulations at different temperatures may allow us to sample a lot more phase space than a single, trapped, long simulation. In MC simulations, this boils down to adding a new kind of step after a certain number of typical steps. To generalize it to MD, we just have to make the MC-style swap every n steps of the simulation. The main advantage is the better sampling, but having samples at different temperatures may be useful too. As far as MC is concerned, we don't lose anything with parallel tempering, while in MD we lose the time evolution, because we turn our trajectory into a bunch of pieces glued together. Finally, we can say that parallel tempering just speeds up the simulation, yielding the same result as very long and inaccessible vanilla simulations.
Example 6 (Set of increasing barriers). Consider a set of equivalent minima, each separated by an ever bigger barrier. If typical Metropolis MC is employed, the minima are not equally populated, since the system spends a lot of its limited simulation time in the minima with the smallest barriers. Instead, if parallel tempering is employed, we get the physical Boltzmann distribution, in which each equivalent minimum is equally populated.
Figure 6.1: From right to left, top to bottom: the potential energy function; the Metropolis MC
population result; the Parallel Tempering population result; the space-time representation
Chapter 7
In this chapter, we will discuss free energy, and the techniques that allow us to evaluate it through molecular mechanics
This way, the free energy does not contain any conformational information; since we are interested in that, we will write the free energy as a conformational property too, by introducing the conformation Y(r):

$$A(Y, T) = -k_B T \ln \int_{Y(\mathbf{r})=Y} d\Gamma\, \exp\{-\beta H(\Gamma)\}$$

This corresponds to a restriction of the integration space to the subspace generated by a certain number of conformations Y. This free energy is associated to the probability of actually having the Y conformation, p(Y, T):

$$p(Y, T) = \frac{1}{Q_{NVT}}\int_{Y(\mathbf{r})=Y} d\Gamma\, \exp\{-\beta H(\Gamma)\}$$

where $Q_{NVT}$ is the integral over the entire phase space:

$$Q_{NVT} = \int_{\text{ensemble}} d\Gamma\, \exp\{-\beta H(\Gamma)\}$$
The main advantage of probabilities is that they are additive if more conformations are considered:

$$p(Y_1 \cup Y_2, T) = p(Y_1, T) + p(Y_2, T)$$

while free energies are not:

$$A(Y_1 \cup Y_2, T) = -k_B T \ln\left[e^{-\beta A(Y_1)} + e^{-\beta A(Y_2)}\right]$$

On the other hand, free energy is smoother than the probability itself, making it simpler to read and analyse; for example, relative minima are much easier to spot in a free energy plot than in a probability plot. We have to recall that even relative and under-populated minima are useful and relevant in describing the physics of the system; we do not discriminate. Alas, the main problems with free energy are:
• free energy is defined only up to an additive constant: only variations are significant (third principle of thermodynamics???)
• free energy is a function of all coordinates, but it is customary and useful to project it onto a subset of variables. This may carry an approximation, since through projection we may lose some information
Example 7 (Projection of two corresponding minima). Let's consider a two-dimensional free energy A(x, y), presenting two minima at the same x value, but with different y. If we project A(x, y) onto the x axis, these two minima will merge into a bigger, non-physical minimum.
Free energy can also be decomposed into internal energy and an entropic contribution:

$$A(Y, T) = -k_B T \ln \int_Y e^{-\beta U(Y)}\, d\Gamma = U(Y) - TS$$

In terms of the conformational probability, we can write

$$A(Y, T) = -k_B T \ln p(Y, T) - k_B T \ln Q_{NVT}$$

where p(Y, T) is obtained through an extensive and complete sampling of the phase space; since Q is a constant throughout the phase space, it is usually neglected in computing free energy variations.
The variation of free energy therefore amounts to

$$\Delta A = A(Y_1, T) - A(Y_2, T) = -k_B T \ln\frac{p(Y_1, T)}{p(Y_2, T)}$$

The probability p is simply the number of configurations in Y over the total number of configurations. This approach is simple, but it is effective only if an extensive simulation, revisiting each conformation many times, is performed, as we can see in the following example.
Example 8 (Protein folding). Let's consider a simulation of a protein folding. After a certain amount of time, the protein changes conformation, from unfolded to folded. After that, the simulation is brought to an end.
If we don't allow enough time after the folding, it will result that the system spent the vast majority of its time in the unfolded conformation; this will mean a higher probability for unfolded conformations, therefore an absolute minimum for the unfolded protein. On the other hand, if we let the system go (let it go! Let it go!) for long enough after the folding, the entire result could be reversed: the protein will spend a lot of time in the folded conformation, which will present a higher probability and a lower minimum.
It is then obvious that brute force results strongly depend on the simulation length, unless the simulation is long enough to cross back and forth multiple times; this is only possible for simple systems on supercomputers.
where $V_\Gamma$ is the phase space volume and all we added was an easy-to-demonstrate identity. As we can see, high-energy contributions are now counterbalanced by exp{βH}, so they are no longer negligible.
This makes obtaining thermal properties with MD or MC rather difficult, since the sampling of high-energy regions is now strictly mandatory.
1. if the ligand is a good one, after binding it will remain bound, with a difficult barrier to overcome in order to come back
2. if the ligand is a good one, the bound state will be favoured and highly populated, while the unbound one will be poorly described
Thermodynamic perturbation We start from the free energy gap, as seen before, and we add and subtract the same quantity:

$$\Delta A = -k_B T \ln\frac{\int d\Gamma\, e^{-\beta H_1}}{\int d\Gamma\, e^{-\beta H_0}} = -k_B T \ln\frac{\int d\Gamma\, e^{-\beta H_1}\, e^{\beta H_0}\, e^{-\beta H_0}}{\int d\Gamma\, e^{-\beta H_0}} = -k_B T \ln\int d\Gamma\, e^{-\beta(H_1 - H_0)}\,\rho^0_{NVT} = -k_B T \ln\langle e^{-\beta\Delta H}\rangle_0$$
The subscript 0 here means that the average is taken on state 0, i.e. it is obtained by sampling just the state 0.
Obviously,

$$-\Delta A = A_0 - A_1 = -k_B T \ln\langle e^{\beta\Delta H}\rangle_1$$

so

$$\Delta A = -k_B T \ln\langle e^{-\beta\Delta H}\rangle_0 = k_B T \ln\langle e^{\beta\Delta H}\rangle_1$$

This is called single step thermodynamic perturbation.
Example 11 (Solvation energy difference). We want to estimate the solvation energy difference between MeOH and MeSH; with the power of simulation, we can just consider a system in which, magically, the oxygen atoms turn into sulphur ones: the ∆A of the process will be the solvation energy difference, if we run the simulation in an aqueous environment. To do so, we will
1. simulate MeOH in water, obtaining for each generated configuration the hamiltonian $H_O$
2. change O with S by swapping the FF parameters, evaluating $H_S$ on each of the previous configurations
3. work out exp{−β∆H} and compute the average on the MeOH system
Since the kinetic term is unchanged by the parameter swap,

$$H = T + U \qquad \Delta H = (T + U_1) - (T + U_0) = \Delta U$$

However, if the mass is different, what we get is the excess free energy, which is actually what experiments measure. For completeness' sake, we add

$$\Delta A = -\frac{1}{\beta}\ln\langle e^{-\beta\Delta U}\rangle_0 = \frac{1}{\beta}\ln\langle e^{\beta\Delta U}\rangle_1$$
At this point, we can see ∆A as an average over the probability distribution of ∆U, sampled in the phase space of one of the two states:

$$\Delta A = -\frac{1}{\beta}\ln\int_{-\infty}^{+\infty} e^{-\beta\Delta U}\, P_0(\Delta U)\, d\Delta U$$

If we assume $P_0$ to be gaussian,

$$P_0(\Delta U) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(\Delta U - \langle\Delta U\rangle_0)^2}{2\sigma^2}\right\}$$

with

$$\sigma^2 = \langle\Delta U^2\rangle_0 - \langle\Delta U\rangle_0^2$$

what we get inside the integral is another gaussian function, no longer normalized and shifted by $\beta\sigma^2$:

$$e^{-\beta\Delta A} = \frac{C}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty}\exp\left\{-\frac{\left(\Delta U - \langle\Delta U\rangle_0 + \beta\sigma^2\right)^2}{2\sigma^2}\right\}\, d\Delta U$$

with

$$C = \exp\left\{-\beta\left(\langle\Delta U\rangle_0 - \frac{1}{2}\beta\sigma^2\right)\right\}$$

This means that – in order to correctly sample this gaussian – we need at least a symmetric interval of radius 2σ centered on the shifted distribution center: more sampling to do! Analytically, this integral brings to

$$\Delta A = \langle\Delta U\rangle_{0/1} \mp \frac{1}{2}\beta\sigma^2$$

where 0/1 means we can use either of the two averages (with the minus sign for state 0 and the plus sign for state 1). This formula applies only to gaussian distributions, but it is very useful nonetheless; indeed, $\langle\Delta U\rangle_{0/1}$ can be positive or negative, while the variance term has a fixed sign, so we get to

$$\Delta A \le \langle\Delta U\rangle_0 \qquad \Delta A \ge \langle\Delta U\rangle_1$$

These inequalities hold for every probability distribution, and they give us a working interval that is very helpful.
We can now introduce an expression called cumulant expansion²; with it, we write the free energy difference as

$$\Delta A = -\frac{1}{\beta}\ln\langle e^{-\beta\Delta U}\rangle_0 = -\frac{1}{\beta}\sum_{n=1}^{+\infty}\frac{(-\beta)^n}{n!}\,K_n$$

where the $K_n$ are named cumulants and can be obtained through the formulas:

$$K_1 = \langle\Delta U\rangle_0$$
$$K_2 = \langle\Delta U^2\rangle_0 - \langle\Delta U\rangle_0^2$$
$$K_3 = \langle\Delta U^3\rangle_0 - 3\langle\Delta U^2\rangle_0\langle\Delta U\rangle_0 + 2\langle\Delta U\rangle_0^3$$
$$\ldots$$

² For further information on cumulants, and other statistical novelties, see G. Mandelli - Introduzione alla Fisica Statistica
The gaussian distribution has $K_n = 0$ for n > 2, therefore it has a finite number of cumulants and both an easy and exact cumulant expansion; truncating a distribution at the second order means approximating it with a gaussian with the same average and variance. Adding more terms may seem a good way to improve the approximation, but since the convergence of the series is rather slow, it is not such a great approach. Therefore, we content ourselves with a second order approximation:

$$\Delta A \simeq \langle\Delta U\rangle_0 - \frac{\beta}{2}\left[\langle\Delta U^2\rangle_0 - \langle\Delta U\rangle_0^2\right] = \langle\Delta U\rangle_0 - \frac{\beta}{2}\sigma^2$$
Example 12 (Charged particle). If we suppose
where
so everything is correct.
This way, we can break down a big jump into many small steps, so that the phase space overlap is big enough; the forward free energy will be

$$\overrightarrow{\Delta A} = \sum_i \Delta A_i = -\frac{1}{\beta}\sum_{i=0}^{N-1}\ln\left\langle\exp\left[-\beta(U_{i+1} - U_i)\right]\right\rangle_i$$

Each step works as a single step thermodynamic perturbation, so this approach is known as multistep thermodynamic perturbation; this means we perform a simulation on state 0, evaluating ∆U with state 1, then we run the simulation on 1 and evaluate ∆U with 2, and so on until N, on which we do not perform any simulation. The same goes for the backward approach.
This method can yield good estimates of the free energy difference, as long as the simulations are good and long enough, and the states are close enough, with no gap between them. The way we ensure all of this is by introducing an order parameter.
$$P_i = \lambda_i P_N + (1 - \lambda_i) P_0$$
This is possible only if the reference and target state share the same topology. However, if we want the free energy difference of the keto-enol tautomerism of acetaldehyde, we need to take into account the change of topology.
[scheme: acetaldehyde (C=O) ⇌ vinyl alcohol (C–OH)]
There are two approaches: single and double topology.
Single topology approach In the first approach, we describe the two states with a single topology, as illustrated:
[scheme: a shared topology built on the C–C skeleton, with two interchangeable sites X1 and X2]
This means that the reference state will have the X2 atom as a dummy and X1 as H, while the target will have X2 = H and X1 = dummy. In between, there is a slow change in parameters, so that the atom disappears on one side and appears on the other. Two problems arise:
1. the bonds of dummy or close-to-dummy atoms may show strange behaviors, due to the small charges or the small bonding parameters
2. the VdW parameter reduction may cause the dummy atom – still carrying a relevant charge – to crash into another atom, leading to infinite energies. This is called end point catastrophe
Double topology approach To solve the first problem, the double topology approach may be employed; in this case, both topologies exist, but the atoms that change are inserted in an exclusion list that kills any interaction between them. This way, we can compute

$$U_i = \lambda_i U_1 + (1 - \lambda_i) U_0$$

This approach does not solve the end point catastrophe, so other tricks are needed. For example, one can change the simulation window size, that is how much λ changes, to make smaller steps where we need higher precision; unfortunately, one can only hope, since this method seldom works. Another approach is to remove the charge first (obtaining a "charge only" ∆A) and the VdW parameters later (a "VdW only" ∆A): this is possible, as always, because A is a state function. This method does not double the work, as one might think, since the new simulations we have to run may reduce the steps we need to take.
and a half perturbation step backward:

$$\lambda_{i+1} \to \lambda_{i+1/2}$$

These half steps are effortless, since at each full step we already have the trajectories, while no new trajectories are required for the half ones. In this way, everything is post-processing and no new simulation is required: we can try many different flavours of this.
The number of steps required to obtain an acceptable ∆A is heavily dependent on the system under exam. However, half-stepping gives us a quick method to evaluate convergence. At first, an N-step simulation is run, obtaining a certain value of ∆A, namely $\Delta A_1$; by considering also the half steps, we increase the number of evaluations, but we do not waste what we already have (that is, the trajectories at each step). By working iteratively, we can evaluate the number of steps required for convergence; as always, convergence is reached when successive estimates agree, $\Delta A_{k+1} \simeq \Delta A_k$.
Another way to improve the ∆A evaluation is increasing the trajectory sampling quality. A good way to do this is similar to parallel tempering: indeed, we can run the N simulations, each with a different λ; at certain steps, we can attempt an exchange between trajectories, so that the sampling of the phase space is improved; a probability controls the hop between the trajectories. This method is known as Hamiltonian hopping.
[diagram: a four-state thermodynamic cycle, with the ∆A of each leg labelled]
The sum of the ∆A around the cycle is zero, since A is a state function.
Let's now suppose that we want to evaluate the difference in binding energy between a certain ligand $L_A$ and another one $L_B$, with the same protein. For the example's sake, think of $L_A$ as an alcohol and $L_B$ as a thiol. Initially, we could think of directly calculating the free energy of complex formation, but we soon discover it is not as easy as it sounds: even with multistep perturbation, what we get are serious technical problems, requiring solutions that are not efficient. On the other hand, multistep perturbation allows us to change one ligand into the other, closing the following thermodynamic cycle:
P + L_A —(∆A1)→ PL_A
∆A3 ↓                ↓ ∆A4
P + L_B —(∆A2)→ PL_B

In this way, $\Delta A_1$ and $\Delta A_2$ are now accessible, since $\Delta A_3$ and $\Delta A_4$ can be computed as we just saw; in fact

$$\Delta A_2 - \Delta A_1 = \Delta A_4 - \Delta A_3$$
A similar cycle can be closed for the absolute binding free energy,

[thermodynamic cycle: $R_{sol} + L_{sol} \to RL_{sol}$ ($\Delta A_{abs}$), closed through the gas-phase reaction and the solvation legs]

where the same reaction is considered first in gas phase and then in solvent. The value of $\Delta A_{abs}$ is difficult to obtain without enhancing methods, but all the other terms are far easier to evaluate. The $\Delta A_{gas}$, for example, is trivial, since in gas phase only the binding energy changes between the two states. As for the solvation terms, some useful tricks make them easy to compute; first, we consider the sum over the cycle, where the X → 0 process is called annihilation, and consists in making the molecule disappear. In the multistep approach, we can take X as the reference and 0 as the target, and then slowly switch off the parameters. This way, we obtain
Now we can split the solvation term, by making the ligand disappear first and the receptor later, so that

$$\Delta A_{abs} = \Delta A_{sol}(L \to 0) - \Delta A_{sol}(LR \to R)$$

This last equation represents another, much simpler cycle:

R + L —(∆A_bind)→ RL
[the receptor leg closes with ∆A = 0, R → R]

This technique works pretty well, but ∆A = 0 is a critical assumption, since we are assuming that R does not change conformation between the free and the bound state (while L may change its degrees of freedom).
• A small free energy difference is not easier to evaluate than a bigger one; in fact,

$$\Delta A = \langle\Delta U\rangle_0 - \frac{\beta}{2}\sigma^2$$

so a small ∆A can be the small difference of two big numbers
• A good ∆A only comes from a good usage of the methods
• $\overrightarrow{\Delta A} = \overleftarrow{\Delta A}$ only in principle, because convergence may play some tricks
• While internal energy can be decomposed to great advantage, it is meaningless to decompose free energy: the terms that make up the free energy, in fact, are not necessarily state functions (for example, heat and work are not state functions, but their sum is). This can be seen if we consider $U = U_a + U_b$; then

$$\Delta A = \langle U_a\rangle_0 + \langle U_b\rangle_0 - \frac{\beta}{2}\left(\sigma_a^2 + \sigma_b^2 + 2\rho\sigma_a\sigma_b\right)$$

The double product spoils the state-functionhood, through the correlation coefficient ρ, which measures how much the two modes are interlocked. Decomposing the free energy can yield valuable information only if we then follow a relevant physical path.
Chapter 8
In this chapter, we will discuss the employment of free energy in our everyday life
so

$$\Delta A = A(V_1) - A(V_0) = \int_{V_0}^{V_1}\left(\frac{\partial A}{\partial V}\right)_{N,T} dV = -\int_{V_0}^{V_1} p\, dV$$
Since the integral does not meddle with the derivative variable, we can switch them, obtaining

$$\Delta A = -\int_0^1 d\lambda\,\frac{k_B T}{Q}\int dp\,dq\; e^{-\beta H(\lambda)}\left(-\beta\frac{\partial H}{\partial\lambda}\right) = \int_0^1 d\lambda \int dp\,dq\;\frac{\partial H}{\partial\lambda}\,\frac{e^{-\beta H}}{Q} = \int_0^1 d\lambda \int dp\,dq\;\frac{\partial H}{\partial\lambda}\,\rho^{\lambda}_{NVT} = \int_0^1 d\lambda\left\langle\frac{\partial H}{\partial\lambda}\right\rangle_{\lambda}$$

where $\rho^{\lambda}_{NVT}$ is the canonical probability density function. Since we proceed through discrete steps, we deal with

$$\Delta A \simeq \sum_{i=1}^{N}\left\langle\frac{\partial H}{\partial\lambda}\right\rangle_{\lambda_i}\Delta\lambda_i$$

Sometimes the hamiltonian derivative can be computed analytically, but most of the time it is done numerically (a quadrature sketch follows). As a side note, an experienced student may notice the striking resemblance of this result to the quantum mechanical Hellmann-Feynman theorem.
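A minimal sketch of the final quadrature step, assuming the per-window averages of ∂H/∂λ have already been collected:

import numpy as np

def ti_free_energy(lambdas, dHdl_averages):
    # Delta_A ~ integral over lambda of <dH/dlambda>, here evaluated
    # with the trapezoidal rule on the sampled lambda points
    return np.trapz(dHdl_averages, lambdas)

# e.g. with 11 equally spaced lambda windows:
lams = np.linspace(0.0, 1.0, 11)
# dHdl_averages = ... one <dH/dlambda> value per window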
The first effect may be tackled by turning the set of solvent molecules (explicit description) into a continuous dielectric medium with the proper dielectric constant (implicit description). The second effect can instead be accounted for in the FF, usually implicitly, sometimes even explicitly. Finally, effects 3 and 4 require a modification of Newton's equations, since they are related to the system dynamics. In this case, the standard equation

$$\dot{\mathbf{p}}_i = \mathbf{F}_i$$

becomes a Langevin-type equation,

$$\dot{\mathbf{p}}_i = \mathbf{F}_i - \gamma\,\frac{\mathbf{p}_i}{m_i} + \boldsymbol{\eta}_i$$

where γ is the friction coefficient, $m_i$ is the particle mass and η is a stochastic vector. The first added term describes the friction damping, proportional to p but in the opposite direction (the minus sign), while the second term models the scattering, through the random force η.
[thermodynamic cycle: gas-phase binding $P_{gas} + L_{gas} \to PL_{gas}$ ($\Delta G_{gas}$), connected by the solvation legs $\Delta G_{solv}$ to the aqueous binding $P_{aq} + L_{aq} \to PL_{aq}$ ($\Delta G_{aq}$)]

From it, we get the binding free energy in water (or any other solvent) as

$$\Delta G_{aq} = \Delta G_{gas} + \Delta G_{solv}(PL) - \Delta G_{solv}(P) - \Delta G_{solv}(L)$$

with

$$\Delta G_{gas} = \Delta H_{gas} - T\Delta S$$

Of these two terms, $\Delta H_{gas}$ is rather simple to obtain, while the entropic one, which takes into account the different available states and the change in the number of degrees of freedom, is very difficult to compute, so much that it is usually neglected.
That considered, everything revolves around $\Delta G_{solv}$, which can be split into three terms:

$$\Delta G_{solv} = \Delta G_{elec} + \Delta G_{VdW} + \Delta G_{cav}$$

where $\Delta G_{elec}$ is the electrostatic work needed to bring the charges from vacuum into the dielectric, $\Delta G_{VdW}$ is the van der Waals term, which is crucial in the implicit description, and $\Delta G_{cav}$ is the work necessary to make room for the solute in the dielectric medium. In the simplest implicit solvent model, we assume

$$\Delta G_{VdW} + \Delta G_{cav} = \gamma\,\mathrm{SASA} + b$$

where SASA is the solvent accessible surface area, usually the van der Waals envelope, that is the surface described by an H₂O van der Waals sphere rolling over the union of the solute van der Waals spheres. Finally, γ (not related to the friction coefficient) and b are empirical parameters.
For a single charge q in a spherical cavity of radius a, the electrostatic term is given by the Born model:

$$\Delta G_{elec} = -\frac{q^2}{2a}\left(1 - \frac{1}{\varepsilon}\right)$$

where ε is the dielectric constant of the medium.
If we have many well separated charges, so that their cavities do not merge, we just need to add the variation in the particle–particle interaction due to the dielectric scaling:

$$\Delta G_{elec} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\sum_i \frac{q_i^2}{a_i} + \frac{1}{2\varepsilon}\sum_{i\neq j}\frac{q_i q_j}{r_{ij}} - \frac{1}{2}\sum_{i\neq j}\frac{q_i q_j}{r_{ij}} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\left[\sum_i \frac{q_i^2}{a_i} + \sum_{i\neq j}\frac{q_i q_j}{r_{ij}}\right]$$
This is actually very nice if we are interested in moving ions, but we want to move a protein, that is to say, a bunch of charges close together, with communicating cavities; for this, we switch to the Generalized Born model:

$$\Delta G^{gb}_{el} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon}\right)\sum_{i,j}\frac{q_i q_j}{F(r_{ij}, a_{ij})}$$

where

$$F(r_{ij}, a_{ij}) = \sqrt{r_{ij}^2 + a_{ij}^2\, e^{-D}}$$

with

$$a_{ij} = \sqrt{a_i a_j} \qquad D = \frac{r_{ij}^2}{(2a_{ij})^2}$$

It is easy to see that this model encompasses the other two; indeed, for a single atom, i = j, $a_{ii} = a_i$, $r_{ii} = 0$, so

$$F(r_{ii}, a_{ii}) = \sqrt{a_i^2\, e^{0}} = a_i$$

and we are back to the separated charges model. Obviously, the Generalized Born model is also able to reproduce experimental data, so it is more viable.
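A minimal numerical sketch of the Generalized Born sum, directly transcribing the formulas above (names and units conventions are our own assumptions):

import numpy as np

def gb_solvation(q, a, r, eps):
    # q: charges, a: Born radii, r: distance matrix (r[i][i] = 0),
    # eps: solvent dielectric constant
    n = len(q)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a_ij = np.sqrt(a[i] * a[j])
            D = r[i][j] ** 2 / (2.0 * a_ij) ** 2
            F = np.sqrt(r[i][j] ** 2 + a_ij ** 2 * np.exp(-D))
            total += q[i] * q[j] / F
    return -0.5 * (1.0 - 1.0 / eps) * total

For i = j the inner term reduces to the Born self-energy, as shown above.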
the large protein, but not for the small ligand, and it can be safely applied only in specific cases. However, due to its affordability, this single-trajectory approach is very common.
When sampling is performed, explicit solvent is used, while in order to evaluate $\Delta G_{solv}$, implicit solvent models are applied, since explicit solvent algorithms for evaluating $\Delta G_{solv}$ are not worth the effort.
All of these approaches and techniques are mixed together into protocols; this means that we can tailor a protocol to a specific case to increase the agreement of the results with experiment, thanks to some eldritch error compensation. However, specific protocols are rarely transferable to other systems, so more often standard protocols are applied, and we content ourselves with what we get.
The effect of mutating a residue into alanine is quantified as

$$\Delta\Delta G = \Delta G^{m} - \Delta G^{w}$$

where m stands for mutated and w for wild type. If the binding energy changes, that specific residue is relevant for the binding, even if, more often than not, changing the residue does not yield any significant result. Since some protein-protein interactions are designed to be reversible, it is even possible for the mutation to increase the binding energy. Alanine is preferred because glycine has a very open Ramachandran plot, i.e. it adds a lot of flexibility to the protein structure, creating a significant perturbation.
The residue contribution is quite laborious to obtain through experiment, since each mutation requires significant effort. On the other hand, in silico simulations can be easily adapted. The single-trajectory approach is adopted, so we run the complex dynamics, extracting all the conformations we need of the complex and the protomers; at this point, we replace the relevant residues with alanine, one by one, and we compute the ∆Gs. This is time consuming, but it is actually a post-processing task, so it can be performed.
The underlying approximation is that the mutation does not affect the complex structure: in the case of single residues this is often true, but sometimes it is not; if this approximation does not hold (usually for charged residues), we will overestimate the mutation effect. In these cases, we have to compute two trajectories, one for the wild type and the other for the mutated complex.
It is easy to understand that this is quite computationally heavy, since scanning a total of 100 residues will require 101 trajectories; moreover, this approach suffers from the same problem as the three-trajectory approach, that is, noise. For clarity's sake, we have to state that the three-trajectory approach for the evaluation of the binding energy is unaffordable for protein-protein binding: only the single-trajectory approach is applied to evaluate ∆G, even when we are following the mutated complex dynamics; only when we can assume that the mutation has no effective geometrical effect can we use the same – wild type – trajectory.
In the end, alanine scanning is a good semi-quantitative analysis of residue importance, and some tailored protocols can reach a very good agreement with experiment.
Chapter 9
Analysis of protein-protein
interactions
9.1 Categorization
We categorize protein-protein interactions as
obligatory when the protomers are always bound together; sometimes, the protomers are not even stable on their own
transient when the protomers are stable on their own and associate and dissociate dynamically
Transient interactions are often found in signal transduction systems, and they are very interesting as targets for drug interference.
In fact, some mutations do not affect the binding energy, while some others are quite effective; this is explained by the so-called hotspot picture: the binding energy is not spread over the whole surface (like in a velcro), but it is concentrated only in specific spots of the surfaces. Some of these hotspots are cooperative, that is, their simultaneous mutation yields more effect, while some others are simply additive. Nonetheless, the hotspot picture makes drug interference much more viable.
The rim residues' task is to protect the core from solvent perturbation, so they present some variability between similar complexes (more often than not, the same complex differs from species to species), while the core residues are the same across all species.
We can measure this variability with the residue entropy s(i),

$$s(i) = -\sum_k p(k)\ln p(k)$$

where p(k) is the probability of finding amino acid k in that position; we have p = 1 (and s = 0) for total conservation. This formula of entropy, derived from information science, is called Shannon's entropy. Average residue entropy stands between 0 and 3 kJ K⁻¹. Low entropy is typical of highly conserved residues, like those in the core, which are critically important for the interaction, whereas high entropy is typical of rim residues, which can be easily substituted and have low conservation. The mean residue entropy ⟨s⟩ can be given in both the crude and the weighted version:

$$\langle s\rangle = \frac{\sum_i s(i)}{n} \qquad \langle s\rangle_w = \frac{\sum_i s(i)\,\Delta ASA_i}{\sum_i \Delta ASA_i}$$

where ∆ASA is the variation of accessible surface area (a small sketch of s(i) follows).
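A minimal sketch of the per-position entropy, taking one alignment column as a string (names are ours):

import math
from collections import Counter

def residue_entropy(column):
    # s = -sum_k p(k) ln p(k), with p(k) the frequency of amino acid k
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# a fully conserved column gives 0, a variable one gives s > 0:
print(residue_entropy("AAAAAAAA"), residue_entropy("AVLIAVLI"))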
The ratio between $\langle s_{core}\rangle$ and $\langle s_{rim}\rangle$ is a good indicator of the presence of a relevant interaction: if from the crystallographic data of the complex we get that this ratio is < 1, we can assume an interaction, since there is differential conservation at the interface; vice versa, the complex is probably due to a crystal contact, so its interaction is not specific, but originates from the crystallization process.
As a side note, antibodies present low conservation of their core residues, due to their specific task.
Interaction surfaces are usually flat, but can present small indentations, noticeable with careful analysis. Sometimes, these surfaces have pockets, which we classify as
Complemented p. that disappear in the complex, because the two proteins link perfectly
Non-Complemented p. that create hollow spaces in the complex, into which water or a ligand can enter, mediating the interaction
As we have already said, core regions and especially hotspots are less compositionally flexible than the rim regions; sometimes hotspots are even pre-organized, ready to react, making them an optimal target for drugs.
• with a ligand that binds on the hotspots, perturbing the interaction; this method is rather easy to simulate
• with an allosteric interaction: the ligand binds far, far away from the interaction site, but it prompts a conformational change in the protein, interfering with the interaction; this approach is much more difficult to simulate
Obviously, we can target the interaction to weaken it, but also to strengthen it: stabilization of the interaction can be achieved through mediating "glue" ligands.
In order to design the right drug, we need to consider large surfaces with small pockets, taking into account that usually these proteins do not bind small molecules, which could have helped to identify the drug structure. Indeed, large surfaces do not require large molecules, but the right ones²; to find them, we need a careful analysis of hotspots and pockets. Thanks to thousands of researchers, the number of drugs that target protein-protein interactions is steadily increasing, mostly because there are a lot more targets and applications.
Right now, the targeting is done with two approaches:
Right now, the targeting is done with two approaches:
HTS approach, in which the drug is selected from a large library optimized for protein-protein interactions; libraries must be large but, more importantly, diverse to achieve success
peptide approach, in which the interaction is directly targeted with a piece of one of the proteins
This is where molecular modelling comes to help. If all the structures are available, we can thoroughly study the energetic aspect of the interaction; if the structure of the complex is not available, some docking techniques can be applied to obtain it from the apoprotein structures; if we possess nothing, why do we even try?
² The usual story of the big brush versus the large brush; at the end of the day, what matters is...
Chapter 10
A handful of examples of MM applications
10.1 p53-hdm2
The p53 agent binds damaged DNA and kills the cell; hdm2 switches off p53, and it is overexpressed in tumor cells. Inhibiting the p53-hdm2 interaction may therefore become a good therapy for cancer.
The first attempt at targeting this interaction was with a peptide cut from p53; since the peptide assumes a random coil conformation, though, its binding requires a large entropic loss. To prevent it, we can imitate the α-helix chunk of p53 with a β-hairpin that reproduces it. Another way is that of using a hydrocarbon chain to staple the helix, so that it is stabilized. Both ways reduce the entropic loss.
However, peptides are only good in theory, since they get digested or isolated by the immune system, so something similar is needed, like phenylic chains¹, triazolic scaffolds or other peptidomimetics.
¹ "As soluble as a brick" - S. Pieraccini

10.2 Tubulin and Vinblastine
Microtubule assembly is coupled to the hydrolysis

$$\mathrm{GTP} \to \mathrm{GDP} + \mathrm{P}_i$$

To damage this process, both microtubule destabilizers and microtubule stabilizers have been designed, but we will only focus on the action of Vinblastine, a microtubule destabilizer.
Vinblastine forces a bent conformation on the filaments, so that the microtubule cannot grow: the elongation is not hindered, but the geometry is changed and finally the microtubule
collapses. What happens is that Vinblastine places itself between the two dimers, acting like a wedge and modifying the dimer geometry.
In order to simulate this three-body interaction, we need to group Vinblastine with one of the two dimers, therefore considering just a two-body interaction. By grouping Vinblastine with one dimer instead of the other, the results differ; from these differences, we can deduce that:
1. far residues are not affected by Vinblastine (I could have said that myself)
This is because residues in contact have different energies depending on whether Vinblastine is linked to one or the other dimer, so these are the important residues. By considering the thermodynamic cycle between straight unbound tubulin, Vinblastine-bound tubulin, both unbound dimers and one of the two bound with Vinblastine, we get that Vinblastine binding is favourable, so Vinblastine compensates for the steric stress.
10.3 Rapamycin
Rapamycin represents an extreme case of mediated protein-protein complexation, since almost half of the buried surface is covered by Rapamycin.
We can identify the relevant residues in the interaction by comparing the ∆∆G of Rapamycin bound first to one protein, then to the other. The result is that the proteins interact mostly just with Rapamycin, through an entropic process. Indeed, the unmediated interaction between the proteins is completely dominated by a positive entropic contribution; this means we cannot neglect entropy in free energy calculations.
Even if during Alanine scanning we usually neglect entropy, this approximation must be considered carefully when computing absolute binding energies.
1. analyze the interaction network through Alanine scanning, obtaining the important amino acids
2. validate the results through phylogenetic analysis, i.e. comparing conservation data with Alanine scanning: conservation is a good importance indicator
3. analyse the hotspot distribution in the peptidic sequence: if some of these spots are contiguous, it will be easier to synthesise the peptides
4. test which peptide works better, with Alanine scanning; do not forget that most peptides lose their secondary structure
5. proceed with in vitro tests, by checking tubulin polymerization; tubulin polymers make the solution opaque, so we can follow the process through spectroscopy; the partial result is that the selected plug damages polymerization, which now requires a higher concentration to start
6. test on cultured cells, with a single-blind protocol
A few important data are the number of cells remaining in the culture after application, and the results of the scrambled peptides method, in which peptides with the same amino acid composition but different sequence are tested, to make sure the effect is actually due to the sequence and not, for example, to the charge.
Sometimes, synthetic tricks are required, as in the case of FtsZ. Destabilizing this molecule has the same effect as microtubule destabilizers, but in bacteria; since the actors involved are different, this strategy only targets bacteria, leaving the host cells alone. Due to a massive entropic binding term, the helix peptide must be stapled; fortunately, this does not interfere with the binding. The only downside is that bacterial cell walls do not allow peptides in, so just the degree of polymerization has been tested.
While designing such a drug, one must always keep its dimensions in focus: a large molecule will have a bigger binding energy, because more interactions are available; however, large molecules are more complex, generating synthetic and biological concerns that should be addressed beforehand, or at least compensated by the drug's efficiency.
In conclusion, molecular mechanics provides us with many efficient tools to analyze PP interactions, allowing us to study targeting, druggability and the design of potential drugs.
Chapter 11
Jarzynski equation
In this chapter, we will discuss how we can get equilibrium information from non-equilibrium simulations, with the Jarzynski equation
Finite rate transformation On the other hand, if the transformation occurs at a finite rate, the system lags behind equilibrium at each time instant; depending on the starting conditions, different amounts of work are required for it to happen. What we get is a probability distribution of the work, ρ(w, t_s), for each switching time $t_s$ (the time required for λ = 0 → 1). From this distribution, the average work is evaluated as

$$\langle w(t_s)\rangle = \int w\,\rho(w, t_s)\, dw$$
for each $t_s$.
This brings us to the Jarzynski equation:

$$\Delta F = -\frac{1}{\beta}\ln\langle e^{-\beta w}\rangle$$

This means that, in principle, ∆F can be obtained by running multiple replicas of a simulation and calculating the exponential average of the work. This requires a good statistical sampling of all possible paths: the single, infinite, quasi-static simulation can be substituted by many short simulations. Obviously, the convenience depends on the number of simulations required.
We should consider two limiting cases of this equation. In the quasi-static limit, the process is reversible and every replica requires the same work:

$$w = \langle w\rangle = \Delta F$$

In the instantaneous-switching limit, the system has no time to move and the work is just the energy gap,

$$w = H_1 - H_0 = \Delta H$$

so the Jarzynski equation reduces to the thermodynamic perturbation formula seen before.
Isolated s. We prepare the initial states by putting the system in contact with a heat reservoir; after each replica is thermalized, we switch the reservoir off (we decouple system and reservoir), so that the system is isolated, but canonically distributed. The canonical distribution along the transformation coordinate z at $t_0$ is therefore

$$\rho(z_0, t_0) = \frac{e^{-\beta H_0(z_0)}}{Z_0}$$

At each time, we can evaluate the accumulated work w(z, t), i.e. the work done up to time t; if $t = t_s$, the total work is obtained. We then consider the exponential average

$$\langle e^{-\beta w}\rangle = \int_\gamma \rho(z, t_s)\, e^{-\beta w(z, t_s)}\, dz$$

In an isolated system,

$$w(z, t) = H_\lambda(z) - H_0(z)$$
where $H_\lambda$ is the hamiltonian function corresponding to the order parameter λ. Since the probability distribution follows Liouville's theorem, we have

$$\rho(z, t) = \rho(z_0, t_0) = \frac{e^{-\beta H_0(z_0)}}{Z_0}$$

Substituting into the exponential average, the $H_0$ terms cancel and we are left with

$$\langle e^{-\beta w}\rangle = \frac{1}{Z_0}\int e^{-\beta H_\lambda(z)}\, dz = \frac{Z_\lambda}{Z_0} = e^{-\beta\Delta F}$$

therefore

$$\Delta F = -\frac{1}{\beta}\ln\langle e^{-\beta w}\rangle$$
Non-isolated s. If the system remains in contact with the reservoir, we can think of it as a big isolated system of total hamiltonian

$$G_\lambda(z, z') = H_\lambda(z) + H_r(z') + h_{int}(z, z')$$

where the interaction hamiltonian $h_{int}$ depends on both the system coordinates z and the reservoir coordinates z′. If this coupling interaction is small, it is easy to demonstrate that Jarzynski still holds. Indeed, as for an isolated system, we can reach the conclusion that

$$\langle e^{-\beta w}\rangle = \frac{Y_\lambda}{Y_0}$$

where Y is the canonical partition function of the global system:

$$Y_\lambda = \int dz\,dz'\, e^{-\beta G_\lambda(z, z')} = \int dz\,dz'\, e^{-\beta(H_\lambda + H_r + h_{int})}$$

Neglecting the small coupling term, this means:

$$\frac{Y_\lambda}{Y_0} = \frac{\displaystyle\int dz\, e^{-\beta H_\lambda}\int dz'\, e^{-\beta H_r}}{\displaystyle\int dz\, e^{-\beta H_0}\int dz'\, e^{-\beta H_r}} = \frac{Z_\lambda}{Z_0} = e^{-\beta\Delta F}$$

Even without the small coupling assumption, the validity of the Jarzynski equation can be demonstrated, but that is outside the scope of this course.
11.3 A time of trials and tribulations
Even if the Jarzynski approach is exact, a purely statistical convergence problem limits its applicability; things like $t_s$ and the number of replicas are unfortunately heavily system dependent, and the lack of experimental references makes testing the simulations difficult, while only small systems can be pushed to a quasi-static process. In addition, the exponential average is not statistically well behaved, so a truncated cumulant expansion is a widely accepted improvement.
The average work for $N_s$ simulations can be expressed, arithmetically, as

$$w_a = \frac{1}{N_s}\sum_{i=1}^{N_s} w_i$$

This arithmetic average and the true ensemble average ⟨w⟩ are not equivalent, unless $N_s \to \infty$, so that the law of large numbers can be applied. As we have seen, rather than using ⟨w⟩ as an F-estimate, we can use something related to the exponential average:

$$w_x = -\frac{1}{\beta}\ln\frac{1}{N_s}\sum_{i=1}^{N_s} e^{-\beta w_i}$$

These estimators obey

$$\Delta F \le w_x \le w_a$$

(a small sketch of both follows).
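A minimal sketch of the two estimates from a set of work samples (names are ours):

import numpy as np

def jarzynski_estimates(works, beta):
    # arithmetic mean w_a and exponential-average estimate w_x;
    # they satisfy Delta_F <= w_x <= w_a
    w = np.asarray(works)
    w_a = w.mean()
    w_x = -np.log(np.mean(np.exp(-beta * w))) / beta
    return w_a, w_x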
If the work distribution is gaussian, only the first two cumulants are ≠ 0, so

$$\Delta F = \langle w\rangle - \frac{\beta}{2}\sigma^2$$

from which we can read off an expression for the dissipated work:

$$w_d = \langle w\rangle - \Delta F = \frac{\beta}{2}\sigma^2$$

The close relation between the dissipation of energy and the fluctuation σ of the system is yet another instance of the fluctuation-dissipation theorem¹. The main advantage of the truncated cumulant expansion is its faster convergence; so, even as an approximation, the systematic error it introduces is compensated by better estimates, due to this faster convergence.
The delta function here selects all the coordinates on the pathway, instead of using a line integral. In the end, we are limiting our partition function Z solely to the trajectory.
To force the system onto the pathway, we apply a guiding potential in the shape of

$$h(\mathbf{r}, \lambda) = \frac{k}{2}\left[\xi(\mathbf{r}) - \lambda\right]^2$$

which can be imagined as exerting our force by means of a spring. To properly transmit the force, the spring must be stiff, so that ξ(r) closely follows λ.
The guiding potential represents a perturbation of the hamiltonian function:

$$\tilde{H} = H + \frac{k}{2}\left[\xi(\mathbf{r}) - \lambda\right]^2$$

Since

$$e^{-\beta(F_\lambda - F_0)} = \langle e^{-\beta w_{0,\lambda}}\rangle$$

we have

$$e^{-\beta F(\lambda)} = \int e^{-\beta\tilde{H}}\, d\mathbf{r}\,d\mathbf{p} = \int \exp\left\{-\beta H - \frac{\beta k}{2}\left[\xi(\mathbf{r}) - \lambda\right]^2\right\}\, d\mathbf{r}\,d\mathbf{p}$$

Actually, we are interested in the PMF of the unperturbed system, so we introduce the identity

$$\int d\xi'\, \delta\left[\xi(\mathbf{r}) - \xi'\right] = 1$$

in our integral:

$$e^{-\beta F(\lambda)} = \int d\mathbf{r}\,d\mathbf{p}\,d\xi'\, \delta\left[\xi(\mathbf{r}) - \xi'\right] e^{-\beta\tilde{H}} = \int d\xi' \exp\left\{-\beta\Phi(\xi') - \frac{\beta k}{2}(\xi' - \lambda)^2\right\}$$
¹ For more information about the fluctuation-dissipation theorem, see G. Mandelli, Introduzione alla Fisica Statistica.
If the spring is stiff (large $k$), most of the integral weight lies at $\xi' \simeq \lambda$ due to the gaussian function; this means our system does not drift, and the simulated value is close to the theoretical one:
$$\Phi(\lambda) = F(\lambda)$$
This equality holds if we have a good coupling between the pulling and the system movement.
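We can check this numerically on a toy PMF: evaluating $F(\lambda)$ by direct integration over $\xi'$ for increasing $k$, the free-energy differences approach those of $\Phi$. The double-well shape and every number below are arbitrary assumptions made for the sketch.

```python
import numpy as np

beta = 1.0
xi = np.linspace(-3.0, 3.0, 6001)
dxi = xi[1] - xi[0]
Phi = (xi**2 - 1.0)**2               # toy PMF along the reaction coordinate

def F(lam, k):
    # F(lambda) = -1/beta * ln int dxi' exp(-beta*Phi - beta*k/2*(xi'-lam)^2)
    integrand = np.exp(-beta * Phi - 0.5 * beta * k * (xi - lam)**2)
    return -np.log(integrand.sum() * dxi) / beta

for k in (10.0, 100.0, 1000.0):
    # compare F(1) - F(0) with Phi(1) - Phi(0) = -1 as the spring stiffens
    dF = F(1.0, k) - F(0.0, k)
    print(f"k = {k:6.0f}   F(1) - F(0) = {dF:+.3f}   Phi(1) - Phi(0) = -1.000")
```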
Chapter 12
Umbrella sampling
$$U^b(r) = U^u(r) + W(\xi(r))$$
where $U^u$ is the unbiased potential we are actually looking for, a function of the system positions $r$; the biasing potential $W$, on the other hand, is a function of $\xi$, which is itself a function of $r$. Obviously, the optimal $W$ would be the opposite of the potential of mean force $F$:
$$W(\xi) = -F(\xi)$$
Since knowing $F$ is the actual goal of all of this, we have to find another way of generating $W$.
The strategy is dividing the reaction coordinate $\xi$ in many windows, and in each of them we apply a biasing potential, in general different from window to window; then we run a simulation. For example, we can add a harmonic biasing potential, to prevent the system from leaving the window. In this way, we obtain an exhaustive sampling of each window, and we are left with just the problem of linking together all the results; at that point, knowing both $U^b$ and $W$, we can get $U^u$.
This is the statistical weight in the phase space of the trajectory along $\xi$, since in the numerator we have the configurational integral limited to it. We already know that the free energy difference between the two states can be obtained as
$$\Delta F = -\frac{1}{\beta} \ln \frac{P_B}{P_A} = -\frac{1}{\beta} \ln \frac{Q_B}{Q_A}$$
By adding the biasing potential, we have to consider the biased probability density for each window $i$:
$$P_i^b(\xi) = \frac{1}{Q_i^b} \int e^{-\beta [U(r) + W_i(\xi(r))]}\, \delta[\xi(r) - \xi]\, dr$$
where $Q_i^b$ is the biased partition function:
$$Q_i^b = \int e^{-\beta [U(r) + W_i(\xi(r))]}\, dr = Q\, \langle e^{-\beta W_i} \rangle$$
so that
$$P_i^u = P_i^b\, e^{\beta W_i}\, \langle e^{-\beta W_i} \rangle$$
The free energy in the window is therefore easily accessible as
$$F_i(\xi) = -\frac{1}{\beta} \ln P_i^u(\xi) = -\frac{1}{\beta} \ln \left[ P_i^b(\xi)\, e^{\beta W_i}\, \langle e^{-\beta W_i} \rangle \right] = -\frac{1}{\beta} \ln P_i^b(\xi) - W_i(\xi) - \frac{1}{\beta} \ln \langle e^{-\beta W_i} \rangle$$
If we cover $\xi$ with one single window, evaluating $F_i$ is as simple as evaluating the first two terms, since the difficult one – the last one – can be neglected as an additive constant. Increasing the number of windows forces us to consider it, since the free energy is a continuous function.
$$W_i(\xi) = \frac{k}{2} (\xi - \xi_i^0)^2$$
where $\xi_i^0$ is now the center of the window. With this we sample each window.
We must take care that the sampling distributions of neighbouring windows superimpose with each other, keeping in mind that new windows can always be added. Another possible approach is that of adaptive umbrella sampling, in which instead of using many windows and a starting $W_i$, we sample our system in a single window without biasing potential; then we adapt the PES in the subsequent simulations. Exempli gratia, without any biasing potential our system will be stuck in the starting minimum, which will be well sampled. In the second simulation, we hypothesize that the underlying potential is harmonic and centered so that it reproduces what we sampled; we add its opposite to the total potential and run the simulation: the system will explore a much broader portion of phase space, so a different underlying potential is added. This procedure is protracted until convergence is reached, convergence corresponding to a diffusive dynamics along $\xi$. The linking problem is still there, since we still have different potentials to glue together. We have just turned a many-windows single-simulation problem into a many-simulations single-window one. The linking task is left to the weighted histogram analysis method, WHAM (!!).
where $P_j^u$ is the unbiased probability, $C_{ij} = \exp(-\beta W_i(\xi_j))$ is the biasing factor and $f_i$ is the normalization factor. The latter can be obtained by enforcing normalization on the $P_{ij}^b$:
$$\sum_j^M P_{ij}^b = 1 \quad \Rightarrow \quad f_i = \frac{1}{\sum_j C_{ij} P_j^u}$$
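A compact sketch of the resulting self-consistent WHAM iteration follows; the window layout, spring constant and the synthetic histograms are assumptions standing in for the output of real biased simulations.

```python
import numpy as np

beta, k = 1.0, 50.0
centers = np.linspace(-1.5, 1.5, 7)          # window centers (assumed setup)
xi = np.linspace(-2.0, 2.0, 81)

# Fake per-window histograms H[i, j]: in practice these are the binned xi
# values sampled in each biased window; here a toy double well generates them.
rng = np.random.default_rng(1)
U = (xi**2 - 1.0)**2
H = np.empty((len(centers), len(xi)))
for i, c in enumerate(centers):
    p = np.exp(-beta * (U + 0.5 * k * (xi - c)**2))
    H[i] = rng.multinomial(5000, p / p.sum())
n = H.sum(axis=1)                            # samples per window

C = np.exp(-0.5 * beta * k * (xi[None, :] - centers[:, None])**2)  # C_ij

P = np.full_like(xi, 1.0 / len(xi))          # initial guess for P_j^u
for _ in range(2000):                        # self-consistent iteration
    f = 1.0 / (C * P).sum(axis=1)            # f_i from the normalization above
    P = H.sum(axis=0) / (n[:, None] * f[:, None] * C).sum(axis=0)
    P /= P.sum()

F = -np.log(np.clip(P, 1e-300, None)) / beta  # glued free-energy profile
```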
Chapter 13
Alas, umbrella sampling can only see what happens during the simulation, so we are not able to see anything outside of the reaction coordinate: we cannot consider any misfolded β-hairpin, verbi gratia; to do so, we would require more reaction coordinates, which means more windows, which means a much greater computational effort.
Chapter 14
Metadynamics
Figure 14.1: A beholder. It has beauty in its eye
but it is important to notice that this equation was only heuristically postulated and then positively tested on multiple systems; there is no rigorous theory behind it.
14.3 Collective variables socialism
More often than not, more than one collective variable is taken, since oversimplifying may lead to artifacts; this requires a multidimensional gaussian potential of the shape
$$U_g(s, t) = w \sum_{n=1}^{t/\tau_g} \prod_{i=1}^{d} \exp\left\{ -\frac{[s_i(x) - s_i(n\tau_g)]^2}{2\sigma_s^2} \right\}$$
where $d$ is the number of collective variables. A high $d$ has some drawbacks, since the number of gaussians increases exponentially, to the point that vanilla molecular dynamics becomes more viable. The number of gaussians needed to fill a well, $n_g$, goes as
$$n_g \sim \sigma_s^{-d}$$
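The sketch below runs overdamped Langevin dynamics on a one-dimensional toy double well while depositing gaussians every $\tau_g$ steps; the hill height, width, stride and the PES itself are assumptions chosen only to make the mechanism visible.

```python
import numpy as np

rng = np.random.default_rng(2)
w, sigma, tau_g = 0.2, 0.15, 100          # hill height, width, deposition stride
dt, beta = 1e-3, 4.0

def dU(s):                                # gradient of a toy double-well PES
    return 4.0 * s * (s**2 - 1.0)

hills = []                                # deposited centers s(n * tau_g)

def dbias(s):                             # gradient of the history-dependent bias
    return sum(-w * (s - c) / sigma**2 * np.exp(-(s - c)**2 / (2 * sigma**2))
               for c in hills)

s = -1.0                                  # start in the left well
for step in range(50_000):
    noise = np.sqrt(2.0 * dt / beta) * rng.standard_normal()
    s += -(dU(s) + dbias(s)) * dt + noise
    if step % tau_g == 0:
        hills.append(s)                   # grow the repulsive history

# minus the summed hills estimates the free energy, up to a constant
grid = np.linspace(-2, 2, 200)
F_est = -sum(w * np.exp(-(grid - c)**2 / (2 * sigma**2)) for c in hills)
```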
Typical collective variables include:
• distances between atoms or centers of mass; they do not take into account the orientation of the molecules, so we may need some...
• ... dihedral angles
• the coordination number, very useful to describe hydrophobic interactions, H-bonds and complexation (a small sketch of this switching function follows the list). It is defined as
$$s(r) = \sum_{ij} \frac{1 - (r_{ij}/r_0)^n}{1 - (r_{ij}/r_0)^m}$$
where $r_{ij}$ is the distance between the two objects, $r_0$ is a reference distance, while $n$ and $m$ are fitting parameters, to smooth what would otherwise be a step function in some extreme cases.
where $\varphi_i$ are the angles in the Ramachandran plot, $\varphi_0$ being the perfect-helix one, that is 45°. If the peptide is a perfect α-helix, $\Phi$ is equal to the number of residues.
• a slightly more accurate way is the dihedral correlation, expressed as
$$\Phi_{corr} = \sum_{i=2}^{res-1} \sqrt{1 + \cos(\varphi_i - \varphi_{i-1})}$$
If the correlation is high, like in any folded structure, $\varphi_i \simeq \varphi_{i-1}$, so $\Phi_{corr} = n - 1$; if there is no recognizable secondary structure, the dihedral correlation assumes many different values. It can be compared with circular dichroism.
• the radius of gyration measures, in some way, the overall extension of the peptide
• Another possible collective variable is the root mean square deviation (RMSD) of the protein with respect to the ideal structure, considering each group of 6 residues inside the peptide. For a perfect α-helix, the RMSDs of the groups are all null. The collective variable can be expressed as
$$s = \sum_i^{groups} \frac{1 - \left( \dfrac{r_i - d_0}{r_0} \right)^n}{1 - \left( \dfrac{r_i - d_0}{r_0} \right)^m}$$
We can easily understand that even a simple thing can spawn a lot of different collective variables, so the choice is fundamental. More importantly, these collective variables may not be enough: an α-helix bundle is not different from a long α-helix if we only have a helicity variable, so the radius of gyration may be added.
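As an example of how cheap these variables are to evaluate, here is a sketch of the smooth coordination number for two groups of atoms; the coordinates, $r_0$ and the exponents are placeholder values, and the $x = 1$ singularity (whose analytic limit is $n/m$) is left to the comment, as real codes handle it explicitly.

```python
import numpy as np

def coordination(pos_a, pos_b, r0=0.3, n=6, m=12):
    """Smooth coordination number between two atom groups (positions in nm).
    (1 - x^n) / (1 - x^m) tends to 1 for x << 1 and to 0 for x >> 1; at
    exactly x = 1 the analytic limit is n/m, which real codes special-case."""
    d = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=-1)
    x = d / r0
    return float(np.sum((1.0 - x**n) / (1.0 - x**m)))

# toy usage: two random groups of atoms in a 1 nm box (made-up coordinates)
rng = np.random.default_rng(3)
print(coordination(rng.random((10, 3)), rng.random((15, 3))))
```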
Chapter 15
Some examples
15.2 Parrinello’s benchmark work on β-hairpin
To apply metadynamics on the β-hairpin, Parrinello took as collective variables the number of
intramolecular hydrogen bonds and the radius of gyration. Three minima appear in the resulting
free energy surface:
1. the deepest, at low radius of gyration and high number of hydrogen bonds, corresponding
to the folded state
2. another at low gyration radius and low number of hydrogen bonds, corresponding to a
misfolded compact structure
3. the last one at high gyration radius and low number of hydrogen bonds, that is the unfolded
state
Then, Parrinello applied parallel tempering to the simulation: this created not only a canonical distribution of initial conditions, but also allowed sampling a set of temperatures. With this brand new collaboration, metadynamics is a lot faster, because the system does not wait for the well to be filled to jump out, but finds itself in another well thanks to replica exchange. This allows a more uniform filling of the whole surface, increasing the calculation efficiency.
1. at T < Tm, the folded structure sits in a deeper minimum than the unfolded one, so it is stable
2. at T = Tm, the folded minimum is as deep as the unfolded one, so both of them are stable (phase transition)
3. at T > Tm, inversion occurs: the unfolded structure sits in the deepest minimum, and the protein is unfolded
At room temperature, the osmoprotectants make the folded minimum larger, while with denaturants this minimum almost disappears, and the unfolded one gains importance; osmoprotectants increase Tm, while denaturants reduce it. The change in the free energy profile directly influences the ΔF of transition, which can be obtained as always through the ratio of the partition functions.
This was parallel tempering. As for metadynamics, it allows us to easily get an atomic de-
scription. With it, we can see the mechanism of both classes of molecules:
denaturants interact directly with the protein, exposing the hydrophobic core but mainly steal-
ing H-bonds
osmoprotectants protect the hydrophobic core of the protein, without direct interaction
Honestly, the osmoprotectant effect is still unclear, but two hypotheses were presented:
water ordering the osmoprotectant forces a more ordered structure on water, reducing the entropic penalty on the hydrophobic core; it is the opposite of the chaotropic effect of urea
osmophobic effect the osmoprotectants have an unfavourable interaction with the protein backbone, which is far more exposed in the denatured structure; this means that the osmoprotectants destabilize both folded and unfolded states, but the unfolded one far more
Chapter 16
In this chapter, we will move away from the atomic frame, to further increase
the calculation speed. These kinds of bead force fields will be discussed through
the MARTINI force field
far fewer states than a proper set of atoms; moreover, any atomic chirality is lost. This means that the temperature dependence of any property is not at all reliable. Finally, coarse grain FFs have the same weakness as any other FF, i.e. their parametrization. Indeed, each FF has been optimized with a goal in mind, so they are prone to overstabilization of the system of interest. In all of this, we have to remember that structural information must be given as an input, and great modifications are not possible.
Nanoparticles Coarse grain has also been applied to simulate the interaction between the membrane and some other objects, like fullerenes and nanoparticles. This is because nanoparticles are very good drug carriers, so their use and toxicity are important information.
Holes The longer time frame accessible to the coarse grain force fields allowed the exploration
of membrane hole formation under the effect of particular anti-microbial peptides.
Chapter 17
In this chapter, we will describe the last approach to enhance the sampling,
accelerated molecular dynamics.
The effect of this modification is to exponentially decrease the time required to escape a potential well, which depends exponentially on the barrier energy. This means that the effective time step $\delta t^*$ is greater than the set time step $\delta t$, because the sampling is more efficient. We can state
$$\delta t^* = \delta t\, e^{\beta \delta U}$$
where the exponential function of $\delta U$ modulates the effective time step increment. The potential boost $\delta U$ is a function of both the coordinates and time. This formula also tells us that if $U^* = U$, $\delta t^* = \delta t$, otherwise $\delta t^* > \delta t$. The effective simulation time is the sum of all the effective time steps, so
$$t^* = \sum_i \delta t_i^* = \sum_i \delta t\, e^{\beta \delta U_i} = \delta t\, N\, \frac{1}{N} \sum_i^N e^{\beta \delta U_i} = t\, \langle e^{\beta \delta U} \rangle$$
17.2 Free energy profile recovery
What we saw is not enough to get a free energy profile, since we are actually sampling a biased PES $U^*$, in which we have no interest. We want to recover the original PES from the biased one. We recall that free energy is more or less a probability of occupation, describing the higher population of lower energy regions. Population is then the key to solve this problem, yet we still have a biased population.
If we consider the average of an observable $A$,
$$\langle A \rangle = \frac{1}{Z} \int A(x)\, \rho(x)\, dx$$
but we sample a biased PES, the bias will fall on the distribution $\rho$, turning it into $\rho^*$:
$$\langle A \rangle^* = \frac{1}{Z^*} \int A(x)\, \rho^*(x)\, dx$$
The change in distribution will reflect the change in potential
$$U^* = U + \delta U$$
so by weighting each position by $e^{\beta \delta U}$, we effectively recover the proper distribution, since
$$e^{-\beta U} = e^{-\beta U^*} e^{\beta \delta U}$$
The same procedure is repeated on $Z^*$. This way we go back to $\langle A \rangle$, since we know $\delta U$ at each $x$.
was too abrupt, hindering convergence. A smoother version was then created, introducing a smoothing parameter $\alpha$:
$$\delta U = \frac{(\varepsilon - U)^2}{\alpha + (\varepsilon - U)}$$
If $\alpha = 0$ we go back to the original boost, while if $\alpha > 0$ the modification of the potential is smoothed by a measure $\alpha$. Originally, the acceleration was only applied to torsional terms, since they were considered critical in folding processes. Now, every FF term is boosted.
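A minimal sketch of the smoothed boost and the subsequent $e^{\beta \delta U}$ reweighting follows; the potential energies are made-up numbers standing in for a real boosted trajectory, and $\varepsilon$ and $\alpha$ are arbitrary.

```python
import numpy as np

beta = 1.0 / (0.0083145 * 300)           # 1/(kJ/mol) at 300 K (assumed)

def boost(U, eps, alpha):
    """Smoothed aMD boost dU = (eps - U)^2 / (alpha + eps - U), zero above eps."""
    return np.where(U < eps, (eps - U)**2 / (alpha + (eps - U)), 0.0)

# Hypothetical potential energies (kJ/mol) visited along a boosted run.
rng = np.random.default_rng(4)
U = rng.normal(-500.0, 5.0, size=10_000)

dU = boost(U, eps=-490.0, alpha=20.0)
weights = np.exp(beta * dU)              # per-frame reweighting factors
print("t*/t =", weights.mean())          # effective time amplification

hist, edges = np.histogram(U, bins=50, weights=weights, density=True)
F = -np.log(np.clip(hist, 1e-300, None)) / beta   # reweighted free-energy profile
```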
Another important matter is the magnitude of $\varepsilon$. To find it, some unboosted simulations are run and the potential energy is extracted. The system was probably stuck at the bottom of a well, of which we get the profile; now we can make an educated guess on how much energy would be required to escape it. As always, balance is of the essence, and it is even more important in the choice of the smoothing parameter $\alpha$.
We have to say that the choice of parameters in accelerated dynamics is counterbalanced by the selection of the reaction or collective coordinates in metadynamics and umbrella sampling. In fact, accelerated methods do not require any projection of the potential energy surface, but need a proper $\alpha$ and $\varepsilon$. Since the simulation heavily depends on them, exhaustive testing must be done.
17.4 Reconstruction and reweighting
In practice, recovering the unbiased free energy profile closely resembles the WHAM (!!) technique. Indeed, we look for the probability of $A$ as a function of the $j$-th bin, in a histogram fashion. At this point we have $\rho_j^*$, from which we can recover the original $\rho$ by weighting
$$\rho_j = \rho_j^*\, e^{\beta \delta U_j}$$
17.5 Examples
β-hairpin folding Since the β-hairpin is the simplest compact structure peptides assume, it is a common test system. After the simulation is completed, the free energy $F$ can be written as a function of any interesting variable: this is just post-processing, and no relevant computer effort is wasted. A good plotting variable was the difference from the crystal structure.
Small protein AMD was applied to the folding of a small protein with an α-helix and a turn. A compact intermediate was discovered.
Three-helix bundle This system's folding also presents a compact intermediate. In this case, AMD is able to reproduce the results of MD simulations 18 times longer in computational time.
Chapter 18
Conformational Analysis
18.1 Conformations
A conformation is a 3D molecular structure that differs from another by a dihedral angle. Even in a simple hydrocarbon, we can have a lot of dihedral angles, which can generate many different conformations. This definition can be relaxed, since in more complex molecules some other parameters can change, due to steric strain; for this reason, we do not consider the small changes in bond length and angle that may occur. In complex molecules, though, analyzing the different conformations can be very useful. A particular drug, for example, may have just one active conformation that we would like to know.
Conformational analysis is the right tool to find it, but some general aspects must be consid-
ered first:
• the active conformation may be stable only when bound, and may not be the most favourable
• the conformation depends on the solvent; conformational analysis is usually performed in
vacuum or in implicit solvent
• when conformational analysis is performed, we look for all possible conformations, that
correspond to the potential energy minima
This means that, for all intents and purposes, conformational analysis is an optimization problem, and we know that no optimization tool is able to find all the minima, let alone the absolute one. What we can find is the closest minimum to the starting position. Therefore, the core business of CA is optimizing a set of different starting positions.
1. systematic method or grid search
2. model building method
3. random search
Systematic method explores the conformational space systematically and reproducibly. E.g., ethane has a single dihedral angle; if we change its value from 0 to 360° with a 10° step, we will have 36 starting conformations. Since ethane has three minima, on average 12 conformations will reach the same one. Unfortunately, in most cases we do not know the PES, so this kind of logic cannot be applied. The total number of initial conformations is given by
$$N = \prod_i^{dihedral} \frac{360°}{\theta_i}$$
where $\theta_i$ is the step for each dihedral angle; usually it is the same for all angles, but that is just a habit.
This approach is easy and reproducible, but also prone to combinatorial explosion. With 5 rotatable bonds and a 30° step, we have around 250000 starting structures. Generating them is not very time consuming, while optimization is! With 7 rotatable bonds, a 30° step and 1 s of optimization each, the complete conformational analysis amounts to more than 400 days. This means that the systematic method can be applied to larger molecules only after proper consideration.
Fortunately, some conformations are really ugly, with self-crossing or very close atoms, and they would not be able to reach any suitable minimum: to any extent, they can be discarded without remorse. Moreover, all similar conformations can be discarded, further cutting the number. This procedure is called pruning, as we can see from an example. In hexane, only the central three dihedral angles are relevant, namely ω1, ω2 and ω3. If we use a 120° step, we get three values for each of them. This means that for each ω1, three ω2 are possible, and equally for each ω2 we will have three ω3. This is where combinatorial explosion comes from. If for a certain ω2 we get a bad conformation, we can cut the entire branch, effectively skipping the evaluation of the three different ω3.
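The branch-cutting logic can be written as a short depth-first enumeration; the clash test below is a placeholder for a real distance-based steric check, and three dihedrals with a 120° step mimic the hexane example.

```python
def clashes(partial):
    """Placeholder steric test on a partial conformation: here we reject two
    consecutive dihedrals both at 240 deg, standing in for a real check on
    interatomic distances in the partially built structure."""
    return any(a == b == 240 for a, b in zip(partial, partial[1:]))

VALUES = (0, 120, 240)          # 120-degree step -> three values per dihedral

def grid_search(n_dihedrals):
    """Depth-first enumeration with pruning: a partial conformation that
    already clashes is never extended, so its whole subtree is skipped."""
    kept = []
    def extend(partial):
        if clashes(partial):
            return              # cut the branch: no omega_3 values evaluated
        if len(partial) == n_dihedrals:
            kept.append(tuple(partial))
            return
        for v in VALUES:
            extend(partial + [v])
    extend([])
    return kept

starts = grid_search(3)         # e.g. the three central dihedrals of hexane
print(len(starts), "starting structures out of", len(VALUES)**3)
```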
Another problem arises for cyclic molecules. Obviously, in generating the starting conformations we may lose cyclicity, so we generate a pseudo-acyclic analog by breaking a bond. Before starting optimization, we check that the generated structures have the two atoms of the broken bond close enough; this way, we discard anything too distorted to form a cycle. Some generation time is wasted, but none is wasted during optimization.
Model building takes advantage of the fact that organic molecules are made of standard moieties (building blocks, functional groups), so we can break down the molecule into its functional groups, and perform CA on them, storing the results into a database; at the end, we put everything together. This is based on three assumptions
Random search is the opposite of systematic search, and generates structures randomly. We have two main approaches:
Nonetheless, from a starting guess conformation, we generate another by modifying the angles. This conformation is optimized, and the resulting one is compared against the database of the previously optimized conformations. If a similar one exists, it is discarded, otherwise it is stored. The new guess structure can be created in many ways:
We have to point out that in CA we are looking for all the minima, while in Monte Carlo and MD we just want to sample the whole phase space: MD has no optimization phase. However, MD can be exploited to generate the starting conformations. What we find with CA are the minimum geometries, while MD passes through a lot of conformations that are close to a minimum and are not so useful, resulting in a big waste of everyone's time. Moreover, MD is performed at a certain temperature, and the system will move around, at least until we decrease the temperature to absolute zero, when the system will freeze into the minimum.
The termination phase for random search needs a little more attention. Indeed, while systematic search ends when all the generated structures are optimized, random search can go on until the end of time; we can stop it by
be the absolute one. Simulated annealing is an example of a molecular dynamics application in conformational analysis, although not a plain one. Another possible application is a sort of quenching procedure, in which alongside the simulation some conformations are extracted and optimized.
according to which living organisms evolved after big environmental catastrophes (like the dinosaurs). Then Lamarckian evolution stated (not completely correctly) that acquired characters could be inherited, so giraffes got their long necks because they kept straining themselves to reach high leaves. Finally, Darwinian evolution stated that a tandem of mutation and selection guarantees the survival (therefore reproduction) of the fittest. This evolution theory has now evolved (eh eh) to include genetics: the genotype is partially transmitted to the offspring, which will probably manifest it through its phenotype. Sex is where most of the mutation happens.
Obviously, before evaluating these measures, the structures must be superimposed. Different measures give different values of similarity.
After a difference measure has been established, the clustering algorithm can be initiated. These come in different categories, illustrated as follows.
non hierarchic that is, not sequential. A good example is the K-means algorithm, which divides the population into K subpopulations. In each of them a centroid is identified, and then the clusters are re-organised so that each of them collects the nearest neighbours of its centroid. This process is iterated, taking each time as centroid the individual with the lowest sum of distances with respect to the others. Convergence is reached when modifications no longer occur, or the distance is below a certain threshold. The centroid is then taken as the representative conformation.
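A bare-bones K-means sketch over conformations encoded as feature vectors (e.g. their dihedral angles after superposition) follows; note that, unlike the medoid-style centroid described above, this variant uses the cluster mean, which is the more common textbook choice, and all the toy data are made up.

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Plain K-means on an (n_conformations x n_features) array X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)          # assign each point to nearest centroid
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):
            break                          # convergence: assignments stable
        centroids = new
    return labels, centroids

# toy usage on made-up 2D "conformational descriptors"
rng = np.random.default_rng(5)
X = np.concatenate([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centroids = kmeans(X, K=2)
```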
Chapter 19
thermodynamically because each protein possesses one folded structure and many denatured ones, so the folded structure has zero entropy and we need to describe a polymer that folds with zero entropy
kinetically because the single folded state is reached in finite time
Defining the folded state and the folding pathway is the protein folding problem.
The classic example of a frustrated system is a group of three people who hate each other and have to walk on one of the two sides of a road. The optimal solution always ends up with two people on the same side, that is, a locally non-optimized solution.
While the typical frustrated system presents many different minima, separated by huge barriers, a protein presents a distinct low energy native structure, well separated from the others.
• water forms a cage around the hydrophobic molecule, paying the entropic toll
• the hydrophobic molecule changes conformation to minimize its surface area, paying the enthalpic toll
These changes can be synthesised by the hydrophobic effective force. An effective force is the derivative of the free energy with respect to an order parameter.
$$F = U - TS = -TS$$
Since there are no interactions, orientations don't matter and the system is equivalent to a single particle moving by a random walk¹. The end-to-end distance $\rho$ is the sum of the step vectors, which are our bonds, so
$$\rho = \sum_i l_i \qquad \langle \rho \rangle = 0$$
so we have to extract all the information we need from $\langle \rho^2 \rangle$, which is the variance. We can evaluate it as
$$\langle \rho^2 \rangle = \sum_{i,j}^{n-1} l_i \cdot l_j = \sum_{i=j}^{n-1} l^2 + \sum_{i \neq j}^{n-1} l_i \cdot l_j$$
where $\rho_0$ is the ideal chain length and $n$ is the number of monomers. Notice that rigour would require $n - 1$ instead of $n$, but for large $n$ this is a good approximation.
We can get the probability distribution of the end-to-end distance thanks to the central limit theorem: a sum of uncorrelated stochastic variables has a gaussian distribution, so
$$P(\rho) = e^{-\rho^2 / 2\rho_0^2}$$
The number of chains with length $\rho$, $n_\rho$, is then given by the total number of chains $N$ weighted by $P(\rho)$:
$$n_\rho = N P(\rho)$$
With this, we can estimate the entropy as
$$S = k_B \ln W$$
where $W$ is the number of states accessible at a given $\rho$, i.e. the number of chains $n_\rho$:
$$S = k_B \ln n_\rho = k_B \ln N + k_B \ln P(\rho)$$
$$\langle \rho^2 \rangle = n \lambda^2$$
where $\lambda$ is now the correlation or persistence length, that is the distance after which monomers cannot feel each other. Now the ideal chain is divided into small self-correlated blocks, and monomer-monomer interactions will push our model further.
that in the simplest case can be divided into an integral over positions $Q_q$ and an integral over momenta $Q_p$; we already know from theory that
$$Q_p = \int dp\, e^{-\beta T} = \left( \frac{2\pi m}{\beta} \right)^{3N/2}$$
while $Q_q$ depends on the potential. To dodge any problem, we can expand it with the virial expansion
$$Q_q = V^N \left[ 1 - V \frac{N^2}{V^2} B(\beta) - V \frac{N^3}{V^3} C(\beta) \right]$$
where $B$ and $C$ are the virial coefficients and
$$\rho = \frac{N}{V}$$
is the particle density. With this, we can make $Q$ explicit in the free energy $F$:
$$F = -\frac{1}{\beta} \ln Q = -\frac{1}{\beta} \ln Q_p - \frac{1}{\beta} \ln Q_q = -\frac{1}{\beta} \frac{3N}{2} \ln \frac{2\pi m}{\beta} - \frac{1}{\beta} \ln Q_q$$
We can now approximate $\ln Q_q$ with something reasonable, since $\ln(1 + x) \sim x$:
$$\ln Q_q = N \ln V - V \rho^2 B(\beta) - V \rho^3 C(\beta)$$
For the virial expansion, $\rho$ must be small, so the logarithm approximation is applicable.
We can further approximate by carefully considering $\rho$. An ideal gas is isotropic, so the local density equals the average density everywhere.
This is what is commonly known as the mean field approximation; it is good for gases, but for polymers it obviously is not: we can use it if we take as box a sphere centered on the polymer centre of mass, with radius equal to the polymer characteristic chain length $R_0$. The low density and this sphere make it possible to apply both the virial expansion and the mean field.
Mutually repulsive monomers Let's focus now on the case $B(\beta) > 0$; this means that a repulsive force is present between the particles, so that the volume $V > V_0$, with $V_0$ the ideal volume. Likewise, the characteristic length $R > R_0$, the characteristic length of the ideal system. To use the mean field approximation, we need
$$V_0 = \frac{4}{3} \pi R_0^3$$
so the ideal density will be
$$\rho_0 = \frac{N}{V_0} = \frac{3N}{4\pi R_0^3} = \frac{3N}{4\pi l^3 N \sqrt{N}} = \frac{3}{4\pi l^3 \sqrt{N}}$$
where $R_0 = l\sqrt{N}$ was substituted as the characteristic length of the ideal chain with step $l$. It is clear from this formula that
$$\rho_0 \sim \frac{1}{\sqrt{N}}$$
This relation holds even in the non-ideal case, with $\rho < \rho_0$.
At this point, we want to know the number of scattering events between monomers; the two-body collisions are given by
$$P_{2b} = \rho N \propto \frac{1}{\sqrt{N}} N = \sqrt{N}$$
while the three-body ones are
$$P_{3b} = \rho^2 N \propto 1$$
This tells us that three-body events are so rare in a chain of length $N$ that we can truncate the virial expansion to the second order: $\rho$ is so small that three-body scattering is negligible. We now introduce the constant $\alpha = R/R_0$, with $R$ the characteristic length (end-to-end distance) of the system, while $R_0$ is that of the ideal case. The free energy as a function of $R$ is made by the ideal chain one and a virial correction at the first order:
$$F(R) = \frac{3R^2}{2\beta R_0^2} + \frac{V \rho^2 B(\beta)}{\beta}$$
If we switch to $\alpha$, we get
$$F(\alpha) = \frac{3\alpha^2}{2\beta} + \frac{\sqrt{N} B(\beta)}{\beta l^3 \alpha^3} + c$$
We can find the $\alpha$ that minimizes $F$ by putting the $\alpha$ derivative equal to zero:
$$\frac{\partial F}{\partial \alpha} = 0 \quad \Rightarrow \quad \alpha = \left( \frac{\sqrt{N}\, B(\beta)}{l^3} \right)^{1/5}$$
so that
$$R = \alpha R_0 = \left( \frac{B(\beta)}{l^3} \right)^{1/5} l\, N^{3/5}$$
We recall that for the ideal chain $R \sim \sqrt{N}$, so the repulsive chain is more expanded.
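A quick numerical check of this fifth-root minimum, in reduced units and with an arbitrary positive $B$:

```python
import numpy as np

beta, l, B, N = 1.0, 1.0, 0.5, 10_000    # reduced units, assumed values

alpha = np.linspace(0.5, 5.0, 100_000)
F = 3 * alpha**2 / (2 * beta) + np.sqrt(N) * B / (beta * l**3 * alpha**3)

alpha_num = alpha[F.argmin()]            # numerical minimum of F(alpha)
alpha_theory = (np.sqrt(N) * B / l**3) ** 0.2
print(alpha_num, alpha_theory)           # agree: R = alpha*R0 ~ N^(3/5)
```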
Mutually attractive monomers The opposite case, $B(\beta) < 0$, is not as simple. Due to the attractive interactions, we expect a collapsed chain, so that the low-$\rho$ approximation is no longer viable; the three-body interactions are no longer negligible and the virial expansion cannot be truncated to the second order any longer.
This may seem a setback, but it is actually good news: since the virial coefficients depend on $T$, $B(\beta)$ changes sign with the change of $T$, representing two different tendencies:
Between the two, when $B(\beta) = 0$, there is the globule-coil transition, which could represent a crude approximation of protein folding. The main difference is that the globular object does not have a zero-entropy native conformation, since the compact structure still has a number of accessible states.
Nevertheless, a good correction is available; if we have a chain of length $N$ in a sphere of radius $d$, roughly the ratio $N/d$ does not change if the system gets bigger. Magically, since entropy is extensive, we can heuristically take
$$S = \frac{l^2 N}{d^2} = \frac{R_0^2}{d^2}$$
as the entropy value. Then we slap it into the free energy, which now has the third order virial term:
$$F(\alpha) = \frac{\sqrt{N}\, B(\beta)}{\beta l^3 \alpha^3} + \frac{C(\beta)}{\beta l^6 \alpha^6} + \frac{1}{\beta \alpha^2}$$
where the last term is the new scaled entropy. By minimizing $F$ along $\alpha$, we obtain
$$R = \sqrt[3]{N}$$
So that, given $N$, the globule is the most compact structure, followed by the ideal chain and finally by the random coil. The transition from globule to coil increases $R$ and is related to the temperature dependence of $B(\beta)$. Nonetheless, this is still not a good description, since we are still missing the native structure; indeed, the globule has a lot of accessible structures, so it has nonzero entropy.
where the independent random contact energies $\varepsilon_i$ are uncorrelated (the central limit theorem will play a role later) and take the shape of something similar to a step function, since a contact either exists or not.
In this model, a generic sequence of amino acids arranged in a particular structure is just a composition of contact energies: structure and sequence are no longer important. We assume that
The first two points mean that the contact energy is either full or null. Point number 3 seems rather odd at first sight: in theory, different structures should have a different number of contacts; however, a protein in solution is always in a compact structure, even when not folded. Compact structures have more or less the same amount of contacts, so we can assume it constant. Finally, 4 means that every amino acid can occupy each position; this is obviously not a protein, but we will address this discrepancy with a subsequent correction.
As we predicted, the central limit theorem tells us that the energy probability distribution is gaussian:
$$P(E) = K \exp\left[ -\frac{(E - E_0)^2}{2\sigma_E^2} \right]$$
with
$$E_0 = N_c \varepsilon_0 \qquad \sigma_E = \sqrt{N_c}\, \sigma_\varepsilon$$
The number of structures at a certain energy will be proportional to this probability,
$$n(E) = K' \gamma^N \exp\left[ -\frac{(E - E_0)^2}{2\sigma_E^2} \right]$$
where $K'$ is a normalization constant like $K$, $\gamma$ is the average coordination number and $N$ is the length of the chain. This way, $\gamma^N$ is the estimate of the number of compact conformations.
In general, $N \neq N_c$, but for lattice models we have $N \simeq N_c$. A lattice model describes the polymer with the monomers on the vertices of a simple cubic lattice; in this case, it is quite easy to define contacts, limiting the interaction to the nearest neighbours. For the lattice model, then,
$$n(E) = K' \gamma^N \exp\left[ -\frac{(E - N\varepsilon_0)^2}{2N\sigma_\varepsilon^2} \right]$$
By using the identity $\gamma^N = e^{N \ln \gamma}$ we contract everything to
$$n(E) \propto \exp\left[ N \ln \gamma - \frac{(E - N\varepsilon_0)^2}{2N\sigma_\varepsilon^2} \right]$$
Since $n(E)$ is the number of configurations accessible to the protein as a function of the energy, it should be an integer, while the exponential function is obviously a real number. This means that truncation is needed, and if the exponential is less than 1, no conformations are available. We are then interested in the sign of the exponent, since $e^x > 1$ for $x > 0$, while $e^x < 1$ for $x < 0$. We are focusing on the low energy regions, so we can neglect the portion with $E > N\varepsilon_0$, getting to
$$E > N\varepsilon_0 - N\sigma_\varepsilon \sqrt{2 \ln \gamma}$$
Otherwise, the exponent is negative and there are no states. Instead, when the exponent is zero, only one state is accessible and we are at the critical energy
$$E_c = N\varepsilon_0 - N\sigma_\varepsilon \sqrt{2 \ln \gamma}$$
Below $E_c$, a deep ocean of nothingness stands in wait. From $n(E)$ we can evaluate the entropy $S$ (in units where $k_B = 1$) as
$$S(E) = \ln n(E) = N \ln \gamma - \frac{(E - N\varepsilon_0)^2}{2N\sigma_\varepsilon^2}$$
In the microcanonical ensemble,
$$\frac{1}{T} = \frac{\partial S}{\partial E}$$
so
$$\frac{1}{T} = -\frac{E - N\varepsilon_0}{N\sigma_\varepsilon^2}$$
This allows us to define the critical temperature as
$$T_c = \frac{-N\sigma_\varepsilon^2}{N\varepsilon_0 - N\sigma_\varepsilon \sqrt{2 \ln \gamma} - N\varepsilon_0} = \frac{\sigma_\varepsilon}{\sqrt{2 \ln \gamma}}$$
which is independent of $N$.
In the canonical ensemble (constant $N, V, T$), at each $T$ the energy will fluctuate, but if we lower $T$ the energy will drop towards $E_c$. Once there, the system is virtually frozen and further reducing $T$ will not have any effect. All of this looks like a phase transition towards nothingness, which is called the glassy phase transition. The problem now is that the lowest energy state is unique, but very close to a continuum of very different states: we are describing a random polymer, not a protein that has been evolutionarily selected to present a certain structure. Actually, a protein presents correlations that emerge in the native structure, and in the native structure alone.
For this, we add a native state separated from the others by a large gap $\delta$: the random energy model just describes the unfolded compact structures, while a structurally dissimilar one emerges from evolution. This structure will have free energy
$$F_N = E_N \qquad F_U = E_U - T S_U$$
with $E_U > E_N$ by hypothesis. At 0 K, only the native state is occupied, but $\exists T$ such that
$$F_N = F_U$$
and the two states are equally occupied; this is the folding temperature $T_f$:
$$T_f = \frac{E_U - E_N}{S_U} = \frac{\delta}{S_U}$$
We now have a set of parameters that allows us to classify the behavior of a polymer. If
$T_f > T_c$ the system folds before freezing, and behaves like a protein
$T_f < T_c$ the system freezes before folding, and behaves like a polymer that gets stuck in the glass phase
The same conditions can be expressed in terms of the ratio $T_f / T_c$; if greater than one, the protein is able to fold.
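The whole classification boils down to a few lines once the model parameters are fixed; every number below is an arbitrary assumption chosen only to exercise the formulas (with $k_B = 1$, as above).

```python
import numpy as np

N, gamma, eps0, sigma = 100, 3.0, -2.0, 1.0   # chain length, coordination,
                                              # mean and spread of contact energy
Ec = N * eps0 - N * sigma * np.sqrt(2 * np.log(gamma))  # critical (glass) energy
Tc = sigma / np.sqrt(2 * np.log(gamma))                 # glass temperature

delta = 180.0                              # assumed native gap E_U - E_N
S_U = N * np.log(gamma)                    # entropy of the unfolded compact states
Tf = delta / S_U                           # folding temperature

print(f"Ec = {Ec:.1f}  Tc = {Tc:.3f}  Tf = {Tf:.3f}")
print("folds like a protein" if Tf > Tc else "freezes into a glass first")
```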
Sometimes, some states can lie between $E_N$ and $E_c$, but since they have zero entropy and energy $> E_N$, they are thermodynamically unimportant. Much more dangerous is the case of two structurally distinct native states. This situation is typical of prion proteins, which have a very stable misfolded structure in which they get stuck, generating terrible conditions in the subject.
chemical reaction from the unfolded to the folded state, through an unknown transition state. The main problem sits in the choice of the reaction coordinate
phase transition of the first order, which is a better description, even if the chemical reaction approach can still enlighten us about the mechanism
chain volume it can obviously distinguish coil from globule, but cannot identify the native state among the globules
number of contacts the same: an elongated chain has very few contacts, while globules have a lot; still it cannot identify the native state
contact ratio is instead the ratio between the native contacts and the total number of contacts,
$$Q = \frac{N_{nat}}{N_{tot}}$$
and can distinguish globular ($Q < 1$) and native ($Q = 1$) structures, but it has some problems with coils. Although it is particularly good for thermodynamical purposes, the assumption that all native contacts are equal limits its descriptive power for kinetics
Another weak point of the reaction picture is the identification of the transition state, which inevitably fails at accounting for the entropic nature of the folding barrier.
is the reason why the contact ratio is not a good reaction coordinate. Moreover, the TS is no longer a single structure, but an ensemble of structures containing the folding nucleus. As for a normal TS, the transition state can go forward, but can also go backward to the unfolded state. All of this is the solution to Levinthal's paradox, since the sampling of the phase space proceeds just until one of the billions of structures with the post-critical folding nucleus is reached.
The phase transition frame gives a better scaling of the folding time with respect to the chain length $N$, $\exp(\alpha N^{2/3})$. This is much closer to experiment than the reaction picture's $\exp(\alpha N)$. At the same time, mutagenesis confirms the presence of a nucleus which, if mutated, will make the folding much more difficult.
As a remark, the formation of the LESs does not mean that secondary structure emerges first and guides the formation of the tertiary structure, since usually secondary structures are not even stable on their own: yet some are, and are able to form the nucleus. What this slow step hints at is that protein folding itself can be targeted, to prevent folding. A peptidomimetic can partially substitute the LES in the binding, slowing the folding down to the point of making the protein useless. In this case, resistance can be achieved only through matching mutations, since a single mutation alone will prevent folding too. Not great, not terrible.
Chapter 20
Anti-freeze proteins
This thermodynamic description is plain and simple, but the kinetic one is not. Close to $T_f$, we can consider both $H$ and $S$ equal to their equilibrium values, $\Delta H_t$ and $\Delta S_t$, where
$$\Delta G_t = 0 \quad \Rightarrow \quad \Delta S_t = \frac{\Delta H_t}{T_f}$$
so that
$$\Delta G(T) = \Delta H_t - T \frac{\Delta H_t}{T_f} = \Delta H_t \frac{T_f - T}{T_f} = \frac{\Delta H_t\, \Delta T}{T_f}$$
heterogeneous if other substances are present that can catalyze the nucleation, like sand makes oysters create pearls
To improve our comprehension, we will consider the free energy in terms of the molar volume $V_m$:
$$\Delta G_v = \frac{1}{V_m} \frac{\Delta H_t\, \Delta T}{T_f}$$
The creation of a new phase corresponds to a volumetric term (the new phase) and a surface term (the interphase):
$$\Delta G_{tot} = \Delta G_{vol} + \Delta G_{surf} = \frac{4}{3} \pi r^3 \Delta G_v + 4\pi r^2 \gamma$$
In freezing, $\Delta G_v < 0$ if $T < T_f$, so $\Delta G_{vol}$ is always negative, while the surface term is positive since $\gamma > 0$. As a sum of a positive square and a negative cube, $\Delta G$ will assume the shape shown in Figure 20.1. This curve presents a maximum at $(r^*, \Delta G^*)$, which we can easily identify:
$$\frac{d}{dr} \Delta G = 0 = 4\pi r^2 \Delta G_v + 8\pi r \gamma = r (4\pi r \Delta G_v + 8\pi \gamma) \quad \Rightarrow \quad r^* = -\frac{2\gamma}{\Delta G_v}$$
and
$$\Delta G^* = \Delta G(r^*) = \frac{4}{3} \pi (r^*)^3 \Delta G_v + 4\pi (r^*)^2 \gamma = -\frac{32\pi \gamma^3}{3\Delta G_v^2} + \frac{16\pi \gamma^3}{\Delta G_v^2} = \frac{16\pi \gamma^3}{3\Delta G_v^2}$$
Figure 20.1: $\Delta G(r)$ as the sum of a positive $r^2$ surface term and a negative $r^3$ volume term, with a maximum at $(r^*, \Delta G^*)$.
These formulas clearly show that $\Delta G^* \propto \Delta T^{-2}$ and $r^* \propto \Delta T^{-1}$, so that less energy and a smaller radius are required the farther we go below $T_f$; it becomes easier for random fluctuations to generate crystals big enough to become crystallization nuclei.
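Plugging in rough order-of-magnitude numbers for ice (all of them assumed values, not measurements) makes the $\Delta T$ scaling tangible:

```python
import numpy as np

gamma = 0.03        # J/m^2, ice-water interfacial tension (rough value)
dH = 6.01e3         # J/mol, melting enthalpy of ice
Vm = 1.8e-5         # m^3/mol, molar volume
Tf = 273.15         # K

for dT in (1.0, 5.0, 20.0):                       # undercooling below Tf
    dGv = -dH * dT / (Tf * Vm)                    # J/m^3, negative below Tf
    r_star = -2 * gamma / dGv                     # critical radius ~ 1/dT
    dG_star = 16 * np.pi * gamma**3 / (3 * dGv**2)   # barrier ~ 1/dT^2
    print(f"dT = {dT:5.1f} K   r* = {r_star*1e9:6.2f} nm   dG* = {dG_star:.2e} J")
```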
AFP (antifreeze proteins) present in fish and insects that live in cold environments
INP (ice nucleating proteins) present outside some bacteria, promote the formation of ice as a means of defence
IAP (ice adhering proteins) present in some polar algae, to bind them to the pack (Figure 20.2)
The AFP are part of the cold defences of those organisms; in vitro, their effects include
A well known effect in nucleation is the Kelvin effect, which empirically describes the dependence of the melting temperature $T_m$ on the crystal radius $R_p$:
$$T_m(R_p) = T_m^{bulk} - \frac{\alpha_E K}{R_p}$$
where $T_m^{bulk}$ is the melting temperature of the ice bulk; this means that the smaller the crystal, the lower the melting temperature. The AFP action is based on this effect; they adhere to the surface of the ice, so that it cannot grow there, but just in the gaps between the proteins. This surface modification decreases the melting temperature by decreasing the effective radius. In other words, the AFP act as secondary nucleation inhibitors, inhibiting at the same time the Ostwald ripening¹. A simple experimental model shows us that the melting temperature drop is
$$\Delta T \propto \frac{\cos \theta}{d}$$
¹ Ostwald ripening is the name of the crystal growth mechanism: the critical crystals feed on the smaller ones.
where $\theta$ is the surface contact angle and $d$ is the distance between proteins. This teaches us that the AFP effect depends on the concentration of adsorbed proteins. This interaction is obviously irreversible.
There are some issues with this model, though:
When the ice crystal is bound on the prismatic plane, it keeps growing on the basal one, until it forms an elongated bipyramid; this is the reason why concentration is important for active AFP, since new AFP have to cover the growing surface. Hyperactive AFP, on the other hand, generate short crystals and have a greater thermal hysteresis. These differences find their reasons in the organisms' habitats: fish live in water, and are full of liquid in equilibrium with the cold water, so they need AFP with a faster action. Insects, instead, live on the colder earth, so they prefer slower, more efficient AFP.
From this, we understand that the AFP main action is not to avoid freezing, but to hinder Ostwald ripening so that the ice crystals do not damage the cells.
Hyperactive AFP structure The insect AFP are solenoid-like, with internal disulfide bridges keeping the structure rigid. On the exterior, amino acids with both hydrophilic and hydrophobic residues link with the ice. This interaction is usually mediated by clathrates, which form between the ice and the protein while the latter explores the surface. No preorganization is required.
AFP size effect The bigger the AFP, the bigger the obstruction, the bigger the effect. Solenoids can increase their size by increasing their number of turns.
Peptide coated surfaces Antifreeze peptides, obtained by cutting the ice-binding domains from AFP, have very low activity, but guess what? If we coat a surface with them, their activity is much higher, due to a probable cooperative effect.
The end
Figure 20.2: That is not just ice ahead, it is the pack