Mathematical Techniques in Structural Biology - J. R. Quine
J. R. QUINE
Contents
0. Introduction
1. Molecular Genetics: DNA
1.1. Genetic code
1.2. The geometry of DNA
1.3. The double helix
1.4. Larger organization of DNA
1.5. DNA and proteins
1.6. Problems
2. Molecular Genetics: Proteins
2.1. Amino Acids
2.2. The genetic code
2.3. Amino acid template
2.4. Tetrahedral geometry
2.5. Amino acid structure
2.6. The peptide bond
2.7. Protein structure
2.8. Secondary structure
3. Frames and moving frames
3.1. Basic definitions
3.2. Frames and gram matrices
3.3. Frames and rotations
3.4. Frames fixed at a point
3.5. The Frenet Frame
3.6. The coiled-coil
3.7. The Frenet formula
3.8. Problems
4. Orthogonal transformations and Rotations
4.1. The rotation group
4.2. Complex form of a rotation
4.3. Eigenvalues of a rotation
4.4. Properties of rotations
4.5. Problems
5. Torsion angles and pdb files
5.1. Torsion Angles
5.2. The arg function
5.3. The torsion angle formula
5.4. Protein torsion angles
5.5. Protein Data Bank files
10.11. Problems
0. Introduction
These are notes about using mathematics to study the structure of molecules,
especially long organic molecules like DNA and proteins. Mathematics plays an
important role in both crystallography and NMR structure determination.
Distance geometry and discrete differential geometry are useful in structure de-
termination using NMR data. Molecules can be studied using matrices of distances
between pairs of atoms. The study of such matrices is called distance geometry.
Proteins and DNA are also long strings of atoms that can be modeled using differ-
ential geometry of curves. These techniques also apply to other questions in shape
theory and robotics.
Crystallography uses a different method. The molecule is modeled using level
surfaces of a positive function giving the electron density at a point in space. This
function is expanded as a Fourier series and the experiment gives information on
the coefficients of the series.
Section 1 discusses the structure of DNA. The discovery in 1953 of the double
helical structure of the DNA molecule by Watson and Crick initiated a revolution
in biology. First, it dramatically showed that the structure of a molecule can give
a clear picture of how it functions. In this case the double helical structure of DNA
shows how it replicates itself and transmits genetic information in the form of a
code. This helped initiate the subject of bioinformatics.
Secondly, the discovery by Watson and Crick of the structure of DNA showed
that molecular structures can be discovered by considering chemical properties,
experimental data, and geometry. Ball and stick models and paper cutouts were
used to understand the geometry of the molecule. In the computer era, all the
relevant data is incorporated into databases, and computers are able to manipulate
the structure. This is computational structural biology and molecular dynamics.
Section 9 gives a brief introduction to this.
There was a key piece of evidence that helped in the discovery of the structure
of DNA, an X-ray diffraction pattern found by Rosalind Franklin. The relationship
between the structure and the diffraction pattern was well known to Crick, and
he knew from the diffraction pattern that DNA was some kind of helix. Section 6
discusses how the electron density function of crystals is computed from diffraction
intensities using Fourier series.
Section 2 discusses proteins. The discovery using X-ray crystallography of the
structures of the proteins myoglobin by John Kendrew in 1958 and hemoglobin
by Max Perutz in 1959 showed how the arrangement of the atoms in these struc-
tures reveals how these proteins function. There are a large number of atoms in
myoglobin, approximately 1200 not including hydrogens, and visualizing the ar-
rangement of them in 3D space is difficult. A graphic artist, Irving Geis, created
pictures to make the arrangement clear. The advent of computer graphics enabled
those studying proteins to see graphic renditions using ribbon diagrams and rotate
them with a mouse. The ribbon diagram is a graphic created by Jane S. Richard-
son to indicate a sequence of atoms in a helical formation. These helices are called
secondary structures. Secondary structures such as alpha helices, beta sheets, and
coiled coils are discussed in section 5 on torsion angles, in section 3 on the Frenet
formula, and in section 8 on the discrete Frenet frame.
1.2. The geometry of DNA. DNA functions the way it does because of the
geometrical structure of the bases. Figure 1 shows close-up views of the
way nucleotides pair in a DNA structure.
The structure of DNA was discovered by Watson and Crick by studying the
geometry of the DNA basepairs. The shapes of the molecules and the distribution
of charges on them force them to bond together only in pairs A-T and C-G. The
weak bonds between the bases, indicated by the dashed lines, are called hydrogen
bonds. The pair A-T is held together by 2 hydrogen bonds and the pair C-G by
3 hydrogen bonds. The bonds are caused by the electrical attraction between the
positive and negative charges on the atoms as indicated.
1.3. The double helix. The basepairs are strung together by a sugar-phosphate
backbone, and form a double helix. Figures 2 and 3 show in detail how the base
pairs fit into the helix.
A big clue for Watson and Crick was an x-ray diffraction photo (figure 4) of pu-
rified DNA fibers. This photo was obtained by Rosalind Franklin in the laboratory
of Maurice Wilkins. The inner cross pattern of spots indicated to Watson and Crick
that DNA was some sort of helix.
The relationship between the diffraction pattern and the structure of a molecule
will be discussed later in the book.
1.4. Larger organization of DNA. In the human organism, DNA is organized
into 23 pairs of chromosomes. Chromosomes are tightly wound strands of DNA.
The strands are very long and without the presence of certain enzymes, DNA
would become tangled and knotted. The way that DNA is ordered in chromosomes
and the way it is unknotted by the enzymes is a current topic of research which
uses ideas from geometry and topology.
1.5. DNA and proteins. The key to understanding the effect of genetics on health
and disease is to understand that genes code for proteins. It is these proteins which
determine the characteristics of organisms. Proteins and the genetic code will be
discussed in the next lecture.
1.6. Problems.
(1) From the Watson and Crick paper, what would be the approximate length
of a straight DNA helix of $10^6$ basepairs?
Alanine Ala A
Cysteine Cys C
Aspartic AciD Asp D
Glutamic Acid Glu E
Phenylalanine Phe F
Glycine Gly G
Histidine His H
Isoleucine Ile I
Lysine Lys K
Leucine Leu L
Methionine Met M
AsparagiNe Asn N
Proline Pro P
Glutamine Gln Q
ARginine Arg R
Serine Ser S
Threonine Thr T
Valine Val V
Tryptophan Trp W
TYrosine Tyr Y
Table 1. Amino acids and their abbreviations.
2.2. The genetic code. Based on the discovery of the structure of DNA as a long
word in a four letter alphabet, the key to genetics was found to be a code. A
sequence of three letters is a code for one of the 20 amino acids. A string of 3n
letters codes for a protein with n amino acids and gives the sequence in which the
amino acids are strung together.
Attempts were made to discover the code by logical reasoning, but the code was
found by experiments expressing proteins from manufactured sequences of DNA.
It was found, for example, that TGC codes for the amino acid Cysteine. Also TAA
codes for STOP, which means that the string of amino acids stops, and the protein
is complete.
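The reading of a DNA word three letters at a time can be sketched in a few lines of Python. This is only an illustration: the table `CODON_TABLE` and the function `translate` are hypothetical names, and the table lists just a handful of entries of the standard genetic code (including the TGC and TAA examples above), not all 64 codons.

```python
# Hypothetical, partial codon table: a few entries of the standard genetic code.
CODON_TABLE = {
    "ATG": "Met",
    "TGC": "Cys",
    "GCT": "Ala",
    "AAA": "Lys",
    "TAA": "STOP",
}

def translate(dna):
    """Read dna three letters at a time; return the amino acids found before STOP."""
    protein = []
    usable_length = len(dna) - len(dna) % 3   # ignore an incomplete final codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGTGCGCTAAATAA"))  # ['Met', 'Cys', 'Ala', 'Lys']
```

A string of 3n letters before the STOP codon gives n amino acids, as described above.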
2.4. Tetrahedral geometry. The geometry of the amino acids is partially deter-
mined by the tetrahedral geometry of the carbon bond. The bond directions for
carbon are approximately the same as from the centroid of a regular tetrahedron
to the vertices.
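As a small numerical check of this geometry, the angle between two centroid-to-vertex directions can be computed directly. The sketch below assumes one convenient choice of regular tetrahedron with centroid at the origin; the names `vertices` and `angle_deg` are illustrative.

```python
import math

# Assumed coordinates: a regular tetrahedron whose centroid is the origin.
vertices = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_deg(u, v):
    """Angle in degrees between vectors u and v, from the dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

# The cosine is -1/3, so the angle is about 109.47 degrees.
tetrahedral_angle = angle_deg(vertices[0], vertices[1])
print(round(tetrahedral_angle, 2))
```

By symmetry every pair of directions gives the same angle, which is the approximate angle between any two bonds at a carbon atom.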
2.5. Amino acid structure. Figure 8 shows a typical structure of the amino acid
leucine. Configurations of side chains are sometimes called rotamers because the
tetrahedral geometry at the carbon bonds stays the same and the only degree of
freedom is rotation about the carbon bonds.
2.6. The peptide bond. To form a protein, amino acids are bonded together in
sequence making a long chain. The bond between adjacent amino acids is called
the peptide bond. The carboxyl group of one amino acid and the amino group of
the subsequent amino acid lose an oxygen and two hydrogens, i.e., water (figure
9).
The bond is approximately planar; the six atoms involved in the bond lie in a
plane, called the peptide plane. The electrons associated with these atoms form a
cloud called the π orbital. There is a special geometry associated with the peptide
plane shown in figure 10.
2.7. Protein structure. As amino acids are bonded together they form into a
specific shape called the fold. The structure of a protein is hard to see because of
the number of atoms involved.
Before the era of computer graphics, only an artist could render an understand-
able picture of a protein. One such artist was Irving Geis. Here (figure 11) is his
painting of sperm whale myoglobin, the first protein structure to be discovered.
There is a website devoted to artistic renditions of molecules by Irving Geis and
others. These renditions have been replaced by computer graphics which rely on
ribbon diagrams.
viewers. These can be found online at the Protein Data Bank website. See figures
14 and 15. The spiraling ribbons are alpha helices and the straight ribbons are
beta sheets. The original structure of sperm whale myoglobin by John Kendrew is in
the Protein Data Bank under the code 1mbn.
Figures 12 and 13 are a detailed view of the alpha helix and the beta sheet. The
structures are distinguished by the hydrogen bonding patterns. In an alpha helix
the hydrogen bonds join atoms nearby in the chain; in a beta sheet the hydrogen
bonds join atoms between two different parts of the chain.
form a frame (e1 , e2 , e3 ) which is the identity matrix I. This is called the standard
basis or lab frame.
The determinant of the matrix (v1 , v2 , v3 ) is equal to the scalar triple product,
v1 · (v2 × v3 ) = det(v1 , v2 , v3 ).
$$F^t F = I. \tag{1}$$
3.3. Frames and rotations. Vectors or points in space are usually given by
column vectors whose entries are coordinates in the lab frame,
$$v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} = a\,e_1 + b\,e_2 + c\,e_3.$$
For simplicity a vector will often be written as a row vector $(a, b, c)$ instead of the
column vector $(a, b, c)^t$.
Any other frame F gives another set of coordinates to each point in $\mathbb{R}^3$. If
$F = (v_1, v_2, v_3)$ then any w can be written as a linear combination of the vectors
in F as
$$w = x\,v_1 + y\,v_2 + z\,v_3 \tag{2}$$
and $(x, y, z)$ are said to be the coordinates in the frame F.
Equation (2) can also be written conveniently in matrix form as
$$w = F \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{3}$$
An orthonormal frame F can also be thought of as a way to specify an orthogonal
transformation. There is a unique transformation of space leaving the origin fixed
and sending the lab frame $I = (e_1, e_2, e_3)$ to an orthonormal frame $F = (u_1, u_2, u_3)$,
and this is given by
$$(x, y, z)^t \to F\,(x, y, z)^t,$$
so the transformation is given in lab frame coordinates by multiplication on the
left by the matrix F. If $(x, y, z)^t$ are the coordinates in the lab frame before the
rotation, then $F\,(x, y, z)^t$ are the coordinates in the lab frame after the rotation.
We see that
$$F e_1 = u_1, \qquad F e_2 = u_2, \qquad F e_3 = u_3$$
are the columns of the orthogonal matrix F.
3.4. Frames fixed at a point. We have been thinking of frames as vectors with
initial points at the origin. If the initial points are fixed at a point p, the frame F is
denoted {F, p}. The Euclidean motion that sends {I, 0} to {F, p} is an orthogonal
transformation F followed by translation by p.
v → {Fv, p}.
3.5. The Frenet Frame. A moving frame is a frame F(t) which is a function of
t. The frame is thought of as changing with time and is often thought of as moving
along a curve. A curve in space is a vector function x(t) whose coordinates give
the parametric equations for the curve. Think of x(t) as the position of an object
at time t. If the initial points of the vectors of F(t) are at the point x(t) we write
{F(t), x(t)} and F is said to move along the curve x.
A moving frame frequently used with a curve is the Frenet frame. The first
vector in the frame is the unit tangent vector t pointing along the curve in the
direction of motion. The other vectors are the normal vector n and the binormal
vector b. The Frenet frame is defined as
F(t) = (t(t), n(t), b(t))
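where
$$\mathbf{t}(t) = \frac{d\mathbf{x}/dt}{|d\mathbf{x}/dt|}, \qquad
\mathbf{n}(t) = \frac{d\mathbf{t}/dt}{|d\mathbf{t}/dt|}, \qquad
\mathbf{b}(t) = \mathbf{t}(t) \times \mathbf{n}(t). \tag{4}$$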
For the frame to be defined at a point we must have $d\mathbf{x}/dt \neq 0$ and $d\mathbf{t}/dt \neq 0$. The
speed is
$$ds/dt = |d\mathbf{x}/dt|,$$
so the unit tangent is the derivative of the position vector with respect to distance,
$$\mathbf{t}(t) = \frac{d\mathbf{x}/dt}{ds/dt} = \frac{d\mathbf{x}}{ds}.$$
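By (4), $(\mathbf{t}, \mathbf{n}, \mathbf{b})$ is a right handed frame, and the lengths of $\mathbf{t}$ and $\mathbf{n}$ are 1. To see
that it is an orthonormal frame, it is enough to show that $\mathbf{t}$ and $\mathbf{n}$ are perpendicular.
This is shown by differentiating
$$\mathbf{t} \cdot \mathbf{t} = 1$$
using the product rule, to get
$$2\,\mathbf{t} \cdot (d\mathbf{t}/dt) = 0.$$
Since $\mathbf{n}$ is a positive multiple of $d\mathbf{t}/dt$, it follows that $\mathbf{t} \cdot \mathbf{n} = 0$.
This is also geometrically evident, since $\mathbf{t}(t)$ can be thought of as a curve on a sphere
of radius 1. Any tangent vector to the curve at a point must be perpendicular to the
radius of the sphere through the point, and therefore perpendicular to $\mathbf{t}$.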
3.5.1. Example of a Frenet Frame. Consider a helix of radius r and pitch 2πp where
the pitch of the helix is defined as the change in the z coordinate per turn of the
helix. The equation of the helix is
(5) h(r, p)(t) = (r cos t, r sin t, p t).
One complete turn of the helix corresponds to t going from 0 to 2π. During this
time the z coordinate changes by 2πp which is the pitch.
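The Frenet frame of this helix can be computed directly from definition (4). The following Python sketch is one way to do so; the function names are illustrative, and it assumes r > 0 so that the normal vector is defined. It uses the fact that the speed of the helix is constant, so dt/dt points in the direction of the second derivative of the position.

```python
import math

def cross(u, v):
    """Vector cross product u x v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(v):
    """Scale v to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def helix_frenet_frame(r, p, t):
    """Return (t, n, b) of definition (4) for the helix (r cos t, r sin t, p t)."""
    velocity = (-r * math.sin(t), r * math.cos(t), p)          # dx/dt
    tangent = unit(velocity)
    # |dx/dt| is constant, so dt/dt is a positive multiple of d2x/dt2.
    acceleration = (-r * math.cos(t), -r * math.sin(t), 0.0)   # d2x/dt2
    normal = unit(acceleration)
    binormal = cross(tangent, normal)
    return tangent, normal, binormal

tangent, normal, binormal = helix_frenet_frame(r=2.0, p=1.0, t=0.5)
print(tangent, normal, binormal)
```

With c = sqrt(r^2 + p^2), the result agrees with the closed forms t = (−r sin t, r cos t, p)/c, n = (−cos t, −sin t, 0), and b = (p sin t, −p cos t, r)/c, which follow from (4) by direct differentiation.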
3.6. The coiled-coil. Francis Crick used a curve called the coiled-coil to describe
the backbone structures of proteins like collagen, which is found in fibrous tissue
in the body. He made his atomic structure by placing the alpha carbon atoms of a
protein at equally spaced parameter values along the curve. The equation for this
curve uses the Frenet frame for a helix.
The coiled coil is drawn by drawing a circle at a radians per second in the n, b
plane of the Frenet frame of helix (5) while traversing the helix. If this circle has
radius r2 then the vector equation of the coiled-coil is
C(t) = h(t) + r2 (cos at n + sin at b)
where h = h(r1 , p) is the equation of the helix. The equation can also be written
as
$$C(t) = h(t) + r_2\,F(t)\,(0, \cos at, \sin at)^t \tag{7}$$
where F is the Frenet frame for h. The curve coils around a tube of radius r2 about
the helix h. In the moving frame the coil appears as a circle.
The helix h is called the major helix, and r1 is the radius of the major helix.
The coiled-coil C is called the minor helix, and r2 is the radius of the minor helix.
The major helix can be thought of as a curved axis for the minor helix. One turn
of the major helix corresponds to a variation of t by 2π; one turn of the minor helix
corresponds to a variation of t by 2π/a. Thus there are a turns of the minor helix
per turn of the major helix. (See figure 17 for a coiled coil with a = 4.)
Maple demo of coiled coil
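For readers without Maple, points on the coiled-coil can also be generated with a short Python sketch. It assumes the closed forms n = (−cos t, −sin t, 0) and b = (p sin t, −p cos t, r1)/c with c = sqrt(r1^2 + p^2) for the Frenet frame of the helix, obtained from definition (4); the function name `coiled_coil` and the sample parameter values are illustrative only.

```python
import math

def coiled_coil(r1, p, r2, a, t):
    """Point C(t) = h(t) + r2 (cos(at) n + sin(at) b) on the coiled-coil."""
    c = math.hypot(r1, p)
    h = (r1 * math.cos(t), r1 * math.sin(t), p * t)            # major helix
    n = (-math.cos(t), -math.sin(t), 0.0)                      # normal of h
    b = (p * math.sin(t) / c, -p * math.cos(t) / c, r1 / c)    # binormal of h
    return tuple(h[i] + r2 * (math.cos(a * t) * n[i] + math.sin(a * t) * b[i])
                 for i in range(3))

# Equally spaced parameter values, as in the placement of alpha carbons.
points = [coiled_coil(r1=5.0, p=1.0, r2=1.0, a=4, t=0.1 * k) for k in range(100)]
print(points[0])  # (4.0, 0.0, 0.0)
```

Each point lies at distance r2 from the corresponding point h(t) of the major helix, since n and b are orthonormal.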
3.7. The Frenet formula. Curvature and torsion are defined by the derivative of
the Frenet frame. The derivative of a matrix is similar to the derivative of a vector,
it is computed by replacing each entry in the matrix by its derivative.
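A square matrix A is said to be skew symmetric if
$$A^t = -A.$$
Note that the entries on the diagonal of a skew-symmetric matrix are 0. An
important fact is

If $A(t)$ is orthogonal then $A^t \dfrac{dA}{dt}$ is skew-symmetric.

This can be seen by taking the derivative of
$$A^t A = I,$$
which gives $\left(\dfrac{dA}{dt}\right)^t A + A^t \dfrac{dA}{dt} = 0$, that is, $\left(A^t \dfrac{dA}{dt}\right)^t = -A^t \dfrac{dA}{dt}$.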
Figure 17. A coiled coil with 4 turns of the minor helix per
turn of the major helix.
The curvature κ measures how the curve deviates from being a straight line. If
the curve is a line, then κ = 0. The torsion τ measures how it deviates from being in a plane.
If the curve is in a plane, then t and n are in the plane and b is the normal to
the plane, so τ = 0. Using the Frenet formula an equation for the curve can be
determined from the functions κ and τ.
In the example of the helix (5) the curvature is a constant $r/(r^2 + p^2)$ and the
torsion is constant $p/(r^2 + p^2)$. In general the curvature and torsion are functions
of the parameter t.
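The constant values for the helix can be checked numerically. The sketch below approximates the matrix F^t dF/ds by a central difference, reading the curvature from the entry in row 2, column 1 and the torsion from the entry in row 3, column 2. It assumes the closed-form Frenet frame of the helix obtained from definition (4); the function names and the step size `dt` are illustrative choices.

```python
import math

def frame(r, p, t):
    """Frenet frame vectors [t, n, b] of the helix (r cos t, r sin t, p t)."""
    c = math.hypot(r, p)
    tangent = (-r * math.sin(t) / c, r * math.cos(t) / c, p / c)
    normal = (-math.cos(t), -math.sin(t), 0.0)
    binormal = (p * math.sin(t) / c, -p * math.cos(t) / c, r / c)
    return [tangent, normal, binormal]

def frenet_matrix(r, p, t, dt=1e-5):
    """Approximate F^t dF/ds, using ds/dt = sqrt(r^2 + p^2)."""
    speed = math.hypot(r, p)
    f, f_plus, f_minus = frame(r, p, t), frame(r, p, t + dt), frame(r, p, t - dt)
    m = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # derivative of the j-th frame vector with respect to arc length
            deriv = [(f_plus[j][k] - f_minus[j][k]) / (2 * dt * speed) for k in range(3)]
            m[i][j] = sum(f[i][k] * deriv[k] for k in range(3))
    return m

m = frenet_matrix(r=2.0, p=1.0, t=0.7)
kappa, tau = m[1][0], m[2][1]
print(round(kappa, 6), round(tau, 6))  # 0.4 0.2
```

For r = 2 and p = 1 the formulas give κ = 2/5 and τ = 1/5, and the computed matrix is skew-symmetric up to rounding error.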
3.8. Problems.
(1) Let
$$x(t) = (r \cos t, r \sin t, p\,t)$$
be a helix. Show that the curvature is a constant $r/(r^2 + p^2)$ and the torsion
is constant $p/(r^2 + p^2)$.
(2) Use Maple and the Frenet Formula
$$F^t \frac{dF}{ds} = \begin{pmatrix} 0 & -\kappa & 0 \\ \kappa & 0 & -\tau \\ 0 & \tau & 0 \end{pmatrix}$$
to find the curvature and torsion of the curve $(3t - t^3, 3t^2, 3t + t^3)$. You can
use the procedure presented in the lecture for computing the Frenet Frame
F.
(3) Show that
v1 · (v2 × v3 ) = det(v1 , v2 , v3 ).
4.1.1. Rotations and cross products. Rotations are orthogonal transformations which
preserve orientation. This is equivalent to the fact that they preserve the vector
cross product:
$$Av \times Aw = A(v \times w)$$
for all rotations A and vectors v and w. Recall the right hand rule in the definition
of the cross product (fig 18). The cross product is defined in terms of lengths and
angles and right-handedness, all of which remain unchanged after rotation.
The fact that A preserves the cross product can also be proved using only the
fact that $A'A = I$ and $\det A = 1$ (see problem 1).
4.1.3. Three dimensions. In three dimensions, matrices for rotation about coordinate
axes have a form related to the 2 dimensional rotation matrices:
• Rotation about the x axis
$$R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}$$
• Rotation about the y axis
$$R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}$$
• Rotation about the z axis
$$R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
All rotations are counterclockwise about the axis indicated in the subscript. The
rotations $R_z(\theta)$ are exactly the ones that leave the vector $e_3 = (0, 0, 1)'$ fixed and
can be identified with rotation in the xy plane. Similarly for $R_x$ and $R_y$.
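The rotations $R_x(\theta)$ commute,
$$R_x(\theta) R_x(\phi) = R_x(\theta + \phi) = R_x(\phi) R_x(\theta).$$
Similarly rotations $R_y(\theta)$ commute and rotations $R_z(\theta)$ commute. In general,
however, rotations in three dimensions do not commute. For example,
$$R_x(\pi) R_z(\pi/2) = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$
but
$$R_z(\pi/2) R_x(\pi) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$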
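The two products above can be reproduced with a few lines of Python. The helper names `matmul`, `rot_x`, `rot_z`, and `rounded` are illustrative, and the rounding step only removes floating point error of the order of 1e-16.

```python
import math

def matmul(a, b):
    """Product of two 3 x 3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rounded(m):
    """Round every entry to the nearest integer."""
    return [[round(x) for x in row] for row in m]

xz = rounded(matmul(rot_x(math.pi), rot_z(math.pi / 2)))
zx = rounded(matmul(rot_z(math.pi / 2), rot_x(math.pi)))
print(xz)  # [[0, -1, 0], [-1, 0, 0], [0, 0, -1]]
print(zx)  # [[0, 1, 0], [1, 0, 0], [0, 0, -1]]
```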
Since rotations form a group we can get other rotations by multiplying together
rotations of the form Rx , Ry , and Rz . It can be shown that any rotation A can be
written as a product of three rotations about the y and z axes,
(14) A = Rz (α)Ry (γ)Rz (δ).
The angles α, γ, δ are called Euler angles for the rotation A. See problem 6.
The proof that any rotation A can be written as above in terms of Euler angles
relies on a simple fact. Any unit vector u can be written in terms of spherical
coordinates as
$$u = \begin{pmatrix} \sin\phi \cos\theta \\ \sin\phi \sin\theta \\ \cos\phi \end{pmatrix}, \tag{15}$$
and u can be obtained from $e_3$ by two rotations
$$u = R_z(\theta) R_y(\phi) e_3. \tag{16}$$
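Equation (16) can be checked numerically for any choice of angles. In the Python sketch below the helper names and the sample angles are illustrative; the rotation matrices are the ones for the y and z axes given earlier.

```python
import math

def rot_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(m, v):
    """Multiply the 3 x 3 matrix m by the column vector v."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

def spherical_unit_vector(phi, theta):
    """The unit vector of equation (15)."""
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

phi, theta = 0.7, 2.1                       # sample angles in radians
e3 = (0.0, 0.0, 1.0)
u_direct = spherical_unit_vector(phi, theta)
u_rotated = apply(rot_z(theta), apply(rot_y(phi), e3))
print(u_direct)
print(u_rotated)
```

The rotation about the y axis tilts e3 to (sin φ, 0, cos φ), and the rotation about the z axis then turns it to the longitude θ.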
(22) Av · Aw = v · w
(23) Av × Aw = A(v × w)
(24) u · (R (u, θ) v − v) = 0
We can write (26) in a different form. If $u = (a, b, c)'$ write
$$S_u = \begin{pmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{pmatrix}.$$
Then
$$R(u, \theta) = \cos\theta\, I + (1 - \cos\theta)\, u u' + \sin\theta\, S_u. \tag{27}$$
The identity (22) was shown above. For (23), see problem 1. The proofs of
(24) and (25) are left as an exercise (see exercise 4).
Here is a proof of (26). The idea is to change variables so that u becomes $e_3$ and
then the formula becomes obvious. Let A be a rotation such that $Au = e_3$ and let
$w = Av$. Using (25) and applying A to both sides of the equation, it becomes
$$R(e_3, \theta)\, w = \cos\theta\,(w - (e_3 \cdot w) e_3) + \sin\theta\,(e_3 \times w) + (e_3 \cdot w) e_3.$$
Writing $w = (x, y, z)'$ the equation becomes
$$R(e_3, \theta) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \cos\theta \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} + \sin\theta \begin{pmatrix} -y \\ x \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ z \end{pmatrix},$$
which is clear from the definition of $R(e_3, \theta)$.
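Formula (27) translates directly into code. The following Python sketch builds the matrix entry by entry; the function name `rotation_matrix` is illustrative, and u is assumed to be a unit vector.

```python
import math

def rotation_matrix(u, theta):
    """R(u, theta) = cos(theta) I + (1 - cos(theta)) u u' + sin(theta) S_u, with |u| = 1."""
    a, b, c = u
    s_u = [[0, -c, b], [c, 0, -a], [-b, a, 0]]
    ct, st = math.cos(theta), math.sin(theta)
    return [[ct * (i == j) + (1 - ct) * u[i] * u[j] + st * s_u[i][j]
             for j in range(3)]
            for i in range(3)]

# With u = e3 the formula reduces to the matrix of rotation about the z axis.
theta = 0.3
r_z = rotation_matrix((0.0, 0.0, 1.0), theta)
print(r_z)
```

As a consistency check, R(u, θ) leaves u fixed, because the second term returns (1 − cos θ)u when applied to u and the third term vanishes.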
4.5. Problems.
(1) (a) Show that if $\det A = 1$, then
$$v_1 \cdot (v_2 \times v_3) = Av_1 \cdot (Av_2 \times Av_3)$$
for all vectors $v_1, v_2, v_3$.
(b) If in addition $A'A = I$, so that A preserves the dot product, show
that
$$A(v_2 \times v_3) = Av_2 \times Av_3$$
for all vectors $v_2, v_3$.
(2) Suppose $A'A = I$, $\det A = 1$, and $Ae_3 = e_3$. Show that A is of the form
$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Use section 10 problem 13.
(3) Suppose $B_1$ and $B_2$ are rotations with $B_1 e_3 = B_2 e_3$. Show that
$$B_1 R_z(\theta) B_1^{-1} = B_2 R_z(\theta) B_2^{-1}.$$
(4) Prove that if A is a rotation, then
$$A R(u, \theta) A^{-1} = R(Au, \theta).$$
(5) Show that any rotation A can be written as a product of three rotations
about the y and z axes,
$$A = R_z(\alpha) R_y(\gamma) R_z(\delta).$$
The angles α, γ, δ are called Euler angles for the rotation A. (Hint: Write
$Ae_3$ in spherical coordinates.)
(6) Every rotation can be written as a product of three rotations in the form
$R_z(\alpha) R_y(\gamma) R_z(\delta)$, and the angles α, γ and δ are called Euler angles. Write
$R_x(\theta)$ in that form and find Euler angles for $R_x$. (The formula $A R(u, \theta) =
R(Au, \theta) A$ might be useful.)
(7) Let $u = (1, -1, 1)/\sqrt{3}$. Use (27) to find the matrix for $R(u, 2\pi/3)$.
(8) If $u = (a, b, c)'$ write
$$S_u = \begin{pmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{pmatrix}.$$
Suppose that u is a unit vector (with real entries) so that $|u|^2 = u'u = 1$.
(a) Show that $S_u u = 0$.
(b) Show that
$$S_u^2 = uu' - I.$$
(9) With $S_u$ as in the previous problem with u a unit vector,
(a) Show that
$$S_u^3 = -S_u.$$
(b) Show that
$$S_u^{2n} = (-1)^n (I - uu') \quad \text{for } n \ge 1$$
and
$$S_u^{2n+1} = (-1)^n S_u \quad \text{for } n \ge 0.$$
The dihedral angle can be thought of as the angle between two planes (see
figure 20). It is the angle counterclockwise from the normal vector a × b of the
plane containing a and b to the normal vector b × c of the plane containing b and
c. Both a × b and b × c are in the plane perpendicular to b.
[Figure 20. Points p1, p2, p3 and vectors a, b, c, with the torsion angle φ. Angles α and β are bond angles.]
5.2. The arg function. The torsion angle can be defined in terms of the argument
of a vector or complex number. Define θ to be the argument of a vector $(x, y) \neq (0, 0)$,
written $\theta = \arg(x, y)$, if $-180^\circ < \theta \le 180^\circ$ and
$$\cos\theta = x/\sqrt{x^2 + y^2}, \qquad \sin\theta = y/\sqrt{x^2 + y^2}.$$
We can also write the argument in terms of complex numbers. The angle $\theta =
\arg(x, y)$ if $x + iy$ is written in polar form
$$x + iy = r e^{i\theta}.$$
(30) φ = τ (−i, k, p) .
5.3. The torsion angle formula. We give a formula for computing the dihedral
angle, hence the torsion angle, in terms of the argument.
The Dihedral Angle Formula. For vectors a, b, and c for which the torsion angle
is defined,
$$\tau(a, b, c) = \arg\left( -|b|^2\, a \cdot c + (a \cdot b)(b \cdot c),\; |b|\, a \cdot (b \times c) \right). \tag{31}$$
Proof. Recall that the left hand side is the angle from a × b to b × c measured
counterclockwise around b.
Notice that both sides of (31) are unchanged if a, b, c are replaced by Aa, Ab,
and Ac for a rotation A. So we can assume b is in the direction of $e_3$. Likewise the
equation is unchanged if b is replaced by λb for λ > 0 (dilation), so we can assume
$b = e_3$.
The equation is unchanged if a is replaced by its projection a − (a · b)b
perpendicular to b. So we can assume a is perpendicular to b. As above we can rotate
and dilate so that $a = e_2$. Let $c = (x, y, z)'$. Then (31) is equivalent to
$$\tau(e_2, e_3, (x, y, z)) = \arg(-y, x),$$
which is true because the left hand side is the angle from $e_2 \times e_3 = e_1$ to
$$e_3 \times (x, y, z) = (-y, x, 0).$$
Here is a Maple worksheet to compute torsion angles.
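A comparable computation can be sketched in Python using formula (31) and the two-argument arctangent. The function names are illustrative. The convention that the torsion angle of four points is τ applied to the three successive difference vectors is an assumption made here, since the definition of Tor is not restated in this section.

```python
import math

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def arg_deg(x, y):
    """Argument of (x, y) in degrees; atan2 returns values in [-180, 180]."""
    return math.degrees(math.atan2(y, x))

def tau(a, b, c):
    """Dihedral angle formula (31), in degrees."""
    x = -dot(b, b) * dot(a, c) + dot(a, b) * dot(b, c)
    y = math.sqrt(dot(b, b)) * dot(a, cross(b, c))
    return arg_deg(x, y)

def tor(p1, p2, p3, p4):
    """Torsion angle of four points (assumed convention: successive differences)."""
    return tau(sub(p2, p1), sub(p3, p2), sub(p4, p3))

# The case used in the proof: tau(e2, e3, (x, y, z)) = arg(-y, x).
print(tau((0, 1, 0), (0, 0, 1), (1, 0, 0)))            # 90.0
# A planar zigzag of four points has torsion angle 180 degrees.
print(tor((1, 1, 0), (1, 0, 0), (0, 0, 0), (0, -1, 0)))  # 180.0
```

Using atan2 avoids dividing by the length of the vector, and it agrees with the convention −180° < θ ≤ 180° except for the sign of a floating point zero.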
5.4. Protein torsion angles.
5.4.1. Protein backbone torsion angles. The atoms along a protein backbone are
Cα -C-N-Cα -C-N-Cα . . . in a sequence repeating every third atom. If each atom has
a set of coordinates, the torsion angles along the backbone of a protein are named
as follows
• the angle Tor (C, N, Cα , C) is the φ torsion angle
• the angle Tor (N, Cα , C, N) is the ψ torsion angle
• the angle Tor (Cα , C, N, Cα ) is the ω torsion angle
Moving along the backbone we get a sequence of φ, ψ and ω torsion angles that
can be used to describe the structure of the backbone.
5.4.2. Protein sidechain torsion angles. We can also get torsion angles by moving
along a side chain. The Greek letter subscripts for the atoms along the side chain
are indicated in figure 21. For example, the sequence of atoms Cα , Cβ , Cγ , Cδ of
Lysine determine the χ2 torsion angle. The atoms Cα , Cβ , Cγ , S of Methionine
determine the χ2 torsion angle.
For the χ1 angle, the first atom used for the torsion angle is the N on the
backbone. For example,
• for Leucine, the angle Tor (N, Cα , Cβ , Cγ ) is the χ1 torsion angle
• for Threonine, the angle Tor (N, Cα , Cβ , O) is the χ12 torsion angle (when
there are two χ1 angles, another subscript is added).
5.5. Protein Data Bank files. Experimentally determined protein structures are stored online
at the Protein Data Bank. The files there are called pdb files. The structural
information contained in a file is a list of three coordinates, (x, y, z), for the
center of every atom in the molecule (although hydrogen atoms are left out because
they are small and their positions can be determined from the positions of the other
atoms). For example, the file identified as 1E0P contains the coordinates for a protein
called bacteriorhodopsin. Here is part of the file, which can be downloaded from
the RCSB Protein Data Bank:
ATOM 1557 CB ILE A 205 -14.646 17.302 50.448 1.00 21.52 C
ATOM 1558 CG1 ILE A 205 -13.253 16.800 50.104 1.00 19.39 C
ATOM 1559 CG2 ILE A 205 -15.422 17.496 49.149 1.00 22.80 C
ATOM 1560 CD1 ILE A 205 -13.299 15.453 49.472 1.00 18.37 C
ATOM 1561 N PHE A 206 -12.336 19.006 52.127 1.00 22.92 N
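Lines like these can be read into coordinates with a short Python sketch. It splits each line on whitespace, which works for the records as printed here; actual pdb files use fixed column positions, so fields can run together and a column-based parser is safer for real data. The function name `parse_atom_line` and the dictionary keys are illustrative.

```python
def parse_atom_line(line):
    """Split one ATOM record, as printed above, into named fields."""
    fields = line.split()
    return {
        "serial": int(fields[1]),            # atom number
        "name": fields[2],                   # atom name, e.g. CB
        "residue": fields[3],                # amino acid, e.g. ILE
        "chain": fields[4],
        "residue_number": int(fields[5]),
        "xyz": tuple(float(value) for value in fields[6:9]),
    }

line = "ATOM 1557 CB ILE A 205 -14.646 17.302 50.448 1.00 21.52 C"
atom = parse_atom_line(line)
print(atom["name"], atom["residue"], atom["xyz"])
```

The coordinate triples obtained this way are the inputs needed for the torsion angle formula.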
indicate the number of residues per turn, the number of amino acids for each 360
degree turn about the axis of the helix, for the corresponding regular structure. All
of the structures can be thought of as helices with various numbers of residues per
turn.
5.7. Torsion angles on the diamond packing. The diamond packing is a set of
points in space where the centers of carbons of a diamond crystal lie. The diamond
packing is obtained from the face centered cubic lattice (the set of points with
integer coordinates adding up to an even number) by adding to it points of the face
centered cubic lattice moved over by the vector (1/2, 1/2, 1/2). So
By moving on a path through the diamond packing you can get torsion angles of
180, 60, and −60 only (or undefined if two consecutive vectors are parallel). Since
the -60 degree torsion angles are close to the ones for alpha helices, attempts have
been made to model proteins by putting atoms in a protein on points in a diamond
packing. Maple demo
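The claim about the possible torsion angles can be checked by enumeration. The Python sketch below assumes that a path moves along nearest-neighbor bonds of the diamond packing: from a point of the face centered cubic lattice the four bond vectors are (1, 1, 1)/2, (1, −1, −1)/2, (−1, 1, −1)/2, (−1, −1, 1)/2, and from a point of the shifted copy they are the negatives. The names `BONDS` and `torsion_deg` are illustrative, and the vectors are doubled to keep integer entries, which does not change any angle.

```python
import math
from itertools import product

# Twice the four bond vectors leaving a point of the face centered cubic lattice.
BONDS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def torsion_deg(a, b, c):
    """Dihedral angle formula (31), in degrees."""
    x = -dot(b, b) * dot(a, c) + dot(a, b) * dot(b, c)
    y = math.sqrt(dot(b, b)) * dot(a, cross(b, c))
    return math.degrees(math.atan2(y, x))

angles = set()
for i, j, k in product(range(4), repeat=3):
    if i == j or j == k:
        continue                              # the path would retrace a bond
    a = BONDS[i]                              # step from the lattice to the shifted copy
    b = tuple(-x for x in BONDS[j])           # step back to the lattice
    c = BONDS[k]                              # step to the shifted copy again
    angles.add(round(torsion_deg(a, b, c)))

print(sorted(angles))  # [-60, 60, 180]
```

Paths starting on the shifted copy use the negated steps, which changes the sign of each torsion angle and so gives the same set of values.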
5.8. Appendix, properties of cross product. Here are some useful formulas
involving the cross product and the dot product
(a × b) · (c × d) = (a · c)(b · d) − (a · d)(b · c)
a × (b × c) = (a · c)b − (a · b)c
5.9. Problems.
(1) Suppose a pdb file contains the following lines
ATOM 813 N ARG O 113 -25.027 14.899 -17.838 1.00 8.68 N
ATOM 814 CA ARG O 113 -24.794 13.442 -17.751 1.00 8.96 C
ATOM 815 C ARG O 113 -24.526 13.069 -16.312 1.00 9.33 C
ATOM 816 O ARG O 113 -24.431 13.934 -15.454 1.00 8.24 O
ATOM 817 CB ARG O 113 -23.838 12.955 -18.888 1.00 7.62 C
ATOM 818 CG ARG O 113 -24.493 12.701 -20.214 1.00 7.02 C
ATOM 819 CD ARG O 113 -23.954 12.682 -21.559 1.00 4.59 C
ATOM 820 NE ARG O 113 -24.914 12.545 -22.637 1.00 4.39 N
ATOM 821 CZ ARG O 113 -24.964 12.763 -23.899 1.00 5.44 C
ATOM 822 NH1 ARG O 113 -24.081 13.156 -24.830 1.00 5.82 N
ATOM 823 NH2 ARG O 113 -26.153 12.663 -24.592 1.00 8.39 N
ATOM 824 N THR O 114 -24.536 11.769 -16.002 1.00 11.55 N
ATOM 825 CA THR O 114 -24.313 11.347 -14.633 1.00 11.92 C
ATOM 826 C THR O 114 -22.940 10.686 -14.493 1.00 11.89 C
ATOM 827 O THR O 114 -22.473 9.949 -15.295 1.00 11.88 O
ATOM 828 CB THR O 114 -25.467 10.527 -13.954 1.00 11.40 C
ATOM 829 OG1 THR O 114 -26.730 11.163 -14.288 1.00 10.92 O
ATOM 830 CG2 THR O 114 -25.425 10.484 -12.405 1.00 11.83 C
(a) Write in sequence the numbers of the atoms used to compute the χ2
torsion angle for ARG. Use Maple and the torsion angle formula to
compute the χ2 torsion angle.
(b) Using the coordinates above, use Maple to find (in degrees) the angle
Cα -C-O in THR.
(2) Using the formula in the notes for torsion angle, show that
Tor(p1 , p2 , p3 , p4 ) = Tor(p4 , p3 , p2 , p1 ).
Most of the information that we have on protein structure comes from x-ray
crystallography. The basic steps in finding a protein structure using this method
are:
What follows is a brief discussion of some of the mathematics involved in finding the
electron density from the diffraction intensities and phases. This requires studying
Fourier series for functions periodic on lattices.
The origin of the coordinate system can be put at any point of the crystallized protein.
If the origin is placed at an atom in the protein, then every lattice point will be on
exactly the same atom in another copy of the protein in the lattice.
6.1.1. Examples of lattices. First consider two dimensions. The vectors a = (1, 0)
and b = (0, 1) generate the square lattice. The vectors a = (1, 0) and
$b = (1/2, \sqrt{3}/2)$ generate the hexagonal lattice.
In three dimensions the vectors a = (1, 0, 0), b = (0, 1, 0), and c = (0, 0, 1)
generate the cubic lattice. The vectors a = (1, 1, 0), b = (1, 0, 1), c = (0, 1, 1)
generate the face centered cubic lattice. The face centered cubic lattice can also be
described as the set of points (x, y, z) with integer coordinates such that x + y + z
is even.
The lattices described above are examples of lattices with symmetries. A
symmetry of a lattice is a rotation such that the rotation and its inverse map every
point in the lattice onto another point of the lattice. We say the rotation leaves the
lattice unchanged. The square lattice, for example, is left unchanged by 90 degree
rotation, an order 4 symmetry since the product of 4 rotations of 90 degrees is the
identity. The hexagonal lattice is left unchanged by 60 degree rotation, an order 6
symmetry. The cubic lattice is left fixed by 90 degree rotation about any coordinate
axis. These are order 4 symmetries. The face centered cubic lattice is left fixed
by 120 degree rotation about the axis in the direction (1, 1, 1). This is an order 3
symmetry.
The trace of a matrix A (tr A) is the sum of its diagonal entries. A basic fact in
linear algebra is that
$$\operatorname{tr} A = \operatorname{tr}(B^{-1} A B)$$
for all invertible matrices B. Applying this to (34),
$$\operatorname{tr} R = 2\cos\theta = \operatorname{tr} M = h_1 + k_2. \tag{35}$$
Now from (35) it follows that $2\cos\theta$ is an integer, and so must be equal to −2,
−1, 0, 1, or 2. Thus the only possibilities for θ are 0°, ±60°, ±90°, ±120°, and
180° and this proves the crystallographic restriction in dimension 2.
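The list of angles can be recovered by running through the five possible integer values of the trace. This short Python sketch is only a check of the arithmetic; the dictionary name `allowed` is illustrative.

```python
import math

# tr R = 2 cos(theta) must be an integer between -2 and 2.
allowed = {}
for trace in range(-2, 3):
    theta = math.degrees(math.acos(trace / 2))   # acos returns an angle in [0, 180]
    allowed[trace] = round(theta)

print(allowed)  # {-2: 180, -1: 120, 0: 90, 1: 60, 2: 0}
```

The negative angles in the list above arise because cos θ = cos(−θ).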
The simplest and oldest rule of diffraction is Bragg’s law derived by the English
physicists Sir W.H. Bragg and his son Sir W.L. Bragg in 1913,
(36) 2 d sin θ = n λ
where θ is the angle of incidence of the x-ray, λ is the wavelength of the x-ray, and
n is some integer. The formula holds when diffraction yields a bright spot at angle
θ.
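As a numerical illustration of (36), the angles at which bright spots can occur follow from sin θ = nλ/(2d). In the Python sketch below, d is taken to be the spacing between the reflecting planes, which is an assumption since d is not defined in this passage; the function name `bragg_angles` and the sample values are illustrative.

```python
import math

def bragg_angles(d, wavelength):
    """Return (n, theta in degrees) for every n with n * wavelength <= 2 * d."""
    angles = []
    n = 1
    while n * wavelength <= 2 * d:
        theta = math.degrees(math.asin(n * wavelength / (2 * d)))
        angles.append((n, theta))
        n += 1
    return angles

# Sample values in the same length unit: spacing 1.0 and wavelength 1.0.
for n, theta in bragg_angles(d=1.0, wavelength=1.0):
    print(n, round(theta, 2))
```

Only finitely many values of n are possible, because sin θ cannot exceed 1.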
6.3.1. Mathematical statement of Bragg’s law. Bragg’s law is an equation for the
phase difference between two sine waves. We say the functions $\sin \frac{2\pi}{\lambda} x$ and
$\sin \frac{2\pi}{\lambda}(x + p)$ have phase difference $p/\lambda$ cycles. The wave length is λ for both.
42 J. R. QUINE
A sine wave with amplitude 1 and wavelength λ moving in the direction of the
unit vector $u_1$ can be written as
(37)  $\sin\left(\frac{2\pi}{\lambda}\, x\cdot u_1\right)$
where x is a vector representing a point in space. This represents, at a fixed time,
an x-ray beam moving in the direction $u_1$. For the reflection off a point $x_0$ along
a line through $x_0$ in the direction $u_2$, the wave has the equation
(38)  $\sin\left(\frac{2\pi}{\lambda}\left(x\cdot u_2 + x_0\cdot(u_1-u_2)\right)\right).$
This can be seen by checking that (37) and (38) agree at $x = x_0$.
Similarly, for the reflection off the point $x_1$ along a line through $x_1$ in the same
direction $u_2$, the wave has the equation
(39)  $\sin\left(\frac{2\pi}{\lambda}\left(x\cdot u_2 + x_1\cdot(u_1-u_2)\right)\right).$
See figure 24. The difference in phase between (38) and (39) is
(40)  $\frac{1}{\lambda}(x_1-x_0)\cdot(u_1-u_2)$
cycles, and when the difference is an integer n,
(41)  $(x_1-x_0)\cdot(u_1-u_2) = n\lambda,$
the outgoing waves are in phase at the point at infinity in the direction $u_2$.
Equation (42) is the mathematical expression of Bragg’s law (36). To see this,
note that (x1 − x0 ) · (u1 − u2 ) = d|u1 − u2 | and |u1 − u2 | = 2 sin θ.
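The two identities used here can be checked numerically. The sketch below is an illustration in Python (standard library only), under assumed geometry: the reflecting planes are taken to be z = constant, the incoming and outgoing unit vectors each make angle θ with the planes, and x1 − x0 is taken along the plane normal with length d. The function names are hypothetical.

```python
import math

def bragg_angle(d, lam, n=1):
    """Angle theta (radians) solving Bragg's law 2 d sin(theta) = n lam."""
    return math.asin(n * lam / (2 * d))

def path_difference(theta, d):
    """Return ((x1 - x0) . (u1 - u2), |u1 - u2|) for planes z = const with spacing d."""
    u1 = (math.cos(theta), 0.0, -math.sin(theta))   # incoming direction
    u2 = (math.cos(theta), 0.0, math.sin(theta))    # outgoing direction
    diff = tuple(a - b for a, b in zip(u1, u2))     # normal to the planes
    x1_minus_x0 = (0.0, 0.0, -d)                    # assumed: along the normal, length d
    path = sum(a * b for a, b in zip(x1_minus_x0, diff))
    norm = math.sqrt(sum(c * c for c in diff))
    return path, norm

d, lam, n = 2.0, 1.0, 2
theta = bragg_angle(d, lam, n)
path, norm = path_difference(theta, d)
print(math.degrees(theta), path, n * lam, norm, 2 * math.sin(theta))
```

With these values the path difference equals nλ and |u1 − u2| equals 2 sin θ, up to rounding.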
It says, for example, that on the integer lattice, since v always has integer co-
ordinates, we get reflections in the direction u2 from the planes normal to u1 − u2
only if (u1 − u2 )/λ = (h, k, l) also has integer coordinates. The reflection spot in
the direction u2 is labelled (h, k, l) for integers h, k, and l. They come from planes
on the integer lattice parallel to the plane hx + ky + lz = 0.
The level surface shows more structural detail at higher (1 Angstrom) resolution.
The electron density map is the level surface of a function computed from a
Fourier series. Here is how Maple deals with level curves of functions of two variables
and level surfaces of functions of three variables.
Periodic functions are studied using Fourier series. We begin by looking at the
theory for one variable. This will generalize to the three variable case which we are
interested in.
6.5. Fourier series. The electron density function is constructed using Fourier
Analysis. Fourier analysis is the approximation of periodic functions by sines and
cosines. The basic idea of Fourier analysis is that any real valued function f (x) of
period 1 can be approximated by sums of the type
(43)  $a_0 + \sum_{j=1}^{n}\left[a_j\cos(2\pi jx) + b_j\sin(2\pi jx)\right]$
for suitable choice of the real coefficients aj and bj and for n large enough. A sum
of type (43) is called a Fourier series.
The surprising part about Fourier’s discovery is that even a discontinuous func-
tion such as a square wave can be approximated by a Fourier series. For example,
consider the period 1 square wave defined on the interval [0, 1] by the function
(44)  $f(x) = \begin{cases} 0, & \text{if } \tfrac14 < x < \tfrac34 \\ 2, & \text{otherwise.} \end{cases}$
This is a function of period 1, and $\int_0^1 f(x)\,dx = 1$. The Fourier cosine series
(45)  $1 + \sum_{j=0}^{n} a_j\cos 2\pi(2j+1)x$
with coefficients
(46)  $a_j = \frac{4(-1)^j}{\pi(2j+1)}$
approximates f closely for large n. Here is a Maple demo
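As a stand-in for a plot, the partial sums (45) can be evaluated directly. This is a minimal sketch in Python (standard library only); the function name is illustrative.

```python
import math

def square_wave_partial_sum(x, n):
    """Partial sum (45) of the cosine series, using the coefficients (46)."""
    total = 1.0
    for j in range(n + 1):
        a_j = 4 * (-1) ** j / (math.pi * (2 * j + 1))
        total += a_j * math.cos(2 * math.pi * (2 * j + 1) * x)
    return total

for x in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(x, round(square_wave_partial_sum(x, 200), 3))
```

For large n the values are close to 2 inside the interval where f = 2 and close to 0 where f = 0. At the jump x = 1/4 every cosine term vanishes, so the partial sum equals 1, the average of the two one-sided values.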
6.5.1. Complex form of Fourier series. We will discuss how the coefficients of a
Fourier series are computed. The simplest formula uses the complex form of the
Fourier series. Complex numbers give a convenient way of writing Fourier series,
even for real functions.
where
$c_j = \frac12\,(a_j - i b_j), \quad j = 1,\dots,n$
$c_0 = a_0$
and
$c_{-j} = \bar{c}_j, \quad j = 1,\dots,n.$
So now we have a sum (47) of complex functions equal to the real sum (43).
The sum (47) adds up to a real function since every term $c_j e^{2\pi i jx}$ is added to its
conjugate $\bar{c}_j e^{-2\pi i jx}$.
If
(48)  $f(x) = \sum_{j=-n}^{n} c_j e^{2\pi i jx}$
then the Fourier coefficients can be recovered from the function f by the formula
(49)  $c_k = \int_0^1 f(x)\,e^{-2\pi i kx}\,dx.$
This can be seen by multiplying both sides of (48) by $e^{-2\pi i kx}$ and integrating,
noting that
$\int_0^1 e^{2\pi i(j-k)x}\,dx = \begin{cases} 0, & \text{if } j \neq k \\ 1, & \text{if } j = k. \end{cases}$
The Fourier coefficients, like all complex numbers, have an absolute value and
an argument, that is, we can write them in polar form as
(50)  $c_j = |c_j|\,e^{i\delta_j}.$
The number δj is called the phase and |cj | the norm. These are the phases referred
to in the crystallography phase problem.
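Formula (49) and the polar form (50) can be illustrated numerically. The sketch below (Python, standard library only) approximates the integral in (49) by a Riemann sum, which is exact up to rounding for trigonometric polynomials of low degree. The test function f is a hypothetical example chosen here, not one from the notes.

```python
import cmath
import math

def fourier_coefficient(f, k, samples=1024):
    """Approximate c_k = integral_0^1 f(x) exp(-2 pi i k x) dx by a Riemann sum."""
    total = 0j
    for m in range(samples):
        x = m / samples
        total += f(x) * cmath.exp(-2j * math.pi * k * x)
    return total / samples

# Hypothetical example: f(x) = 1 + 3 cos(2 pi (x - 0.1)),
# for which c_0 = 1 and c_1 = 1.5 exp(-2 pi i * 0.1).
def f(x):
    return 1 + 3 * math.cos(2 * math.pi * (x - 0.1))

c1 = fourier_coefficient(f, 1)
norm, phase = abs(c1), cmath.phase(c1)   # |c_1| and delta_1 in (50)
print(norm, phase, -2 * math.pi * 0.1)
```

The norm records the size of the cosine term and the phase records where it is centered.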
6.5.2. Delta function. The square wave function (44) is an example of an
approximation to a period 1 Dirac delta function. The periodic delta function
$\delta_{\mathbb{Z}}$, or the delta function of the one dimensional integer lattice Z, is
thought of as a “function” of period 1 such that
(51)  $\int_{-1/2}^{1/2} \delta_{\mathbb{Z}}(x)\,f(x)\,dx = f(0).$
There is no function that has these properties, but the delta function can be thought
of as the limit as M → ∞ of square waves such as
(52)  $f(x) = \begin{cases} \frac{M}{2}, & \text{if } n - \frac1M < x < n + \frac1M,\ n \text{ an integer;} \\ 0, & \text{otherwise,} \end{cases}$
of which (44) is the special case M = 4. Similarly, by replacing 0 by a in (51), we get
the definition of the periodic delta function $\delta_{\mathbb{Z}+a}$.
A way to think about the electron density function of a crystal and the phase problem
is to use a delta function. Think of the periodic delta function $\delta_{\mathbb{Z}+a}$ in one
dimension as describing the electron density of a crystal formed by a single atom of zero
radius and mass 1 at each point a + n where n is an integer. Better, think of it
as approximated by a function which is non-negative but zero except near a and
which integrates near a to 1. By (49), the Fourier coefficients of $\delta_{\mathbb{Z}+a}$ are given by
(53)  $c_j = e^{-2\pi i ja}.$
So $\delta_{\mathbb{Z}+a}$ can be approximated by
(54)  $\delta_{\mathbb{Z}+a}(x) \approx \sum_{k=-N}^{N} e^{2\pi i k(x-a)}$
for large N. Look at the Maple worksheet to see what this function looks like. In
fact, simplifying the right hand side of (54) gives
(55)  $\delta_{\mathbb{Z}+a}(x) \approx \frac{\sin\left[2\pi\left(N+\frac12\right)(x-a)\right]}{\sin[\pi(x-a)]}.$
The phase of cj is −2πja and the norm is 1. If we did not know the phase but
put 0 for the phase instead, then the coefficients cj would all be 1 and we would
get an atom at x = 0 instead of x = a. This illustrates the importance of knowing
the phases in computing the electron density function.
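The effect of discarding the phases can be seen in a small experiment. This is a minimal sketch in Python (standard library only), with illustrative values N = 20 and a = 0.3 chosen here. It builds the coefficients (53), locates the peak of the partial sum (54) on a grid, and repeats the computation with every phase set to 0.

```python
import cmath
import math

def partial_sum(x, coeffs):
    """Real part of sum_k c_k exp(2 pi i k x) for a dict {k: c_k}."""
    return sum(c * cmath.exp(2j * math.pi * k * x) for k, c in coeffs.items()).real

def peak_location(coeffs, grid=1000):
    """Grid point in [0, 1) at which the partial sum is largest."""
    return max((m / grid for m in range(grid)), key=lambda x: partial_sum(x, coeffs))

N, a = 20, 0.3
true_coeffs = {k: cmath.exp(-2j * math.pi * k * a) for k in range(-N, N + 1)}  # (53)
no_phase = {k: abs(c) for k, c in true_coeffs.items()}   # same norms, phases set to 0

peak_true = peak_location(true_coeffs)    # atom located at x = a
peak_wrong = peak_location(no_phase)      # atom appears at x = 0
print(peak_true, peak_wrong)
```

Both coefficient sets have the same norms, so the norms alone cannot distinguish an atom at a from an atom at 0.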
6.6. Three variable Fourier series. Fourier series can be used to analyze peri-
odic functions in any number of variables, for example the electron density function
ρ for a 3D crystal.
since the function is real, and where |h|, |k|, and |l| are ≤ n where n is a large
enough integer. Writing the complex Fourier coefficient in polar form as
$c_{hkl} = |c_{hkl}|\,e^{i\alpha_{hkl}},$
the number $\alpha_{hkl}$ is the phase.
In x-ray crystallography, the values |chkl | are measured from the intensities of the
spots in the diffraction pattern. The phases αhkl must be supplied by other means
to find the electron density function. This is called the phase problem. There
are mathematical techniques for finding the phases but also some guess work is
involved.
6.6.1. Delta function. Similar to the delta function for the one dimensional integer
lattice, the delta function $\delta_{\mathbb{Z}^3}$ for the three dimensional integer lattice, or cubic
lattice, can be approximated by a finite sum of the form (56) with $c_{hkl} = 1$ for all
integer triples hkl. Also $\delta_{\mathbb{Z}^3+a}$ is approximated by setting the Fourier coefficients
$c_\ell = e^{-2\pi i a\cdot\ell}$ where $\ell = (h, k, l)$.
6.7. Diffraction pattern and Fourier series. Here we show how the diffraction
intensities give the absolute values of the Fourier coefficients of the electron density
function. We show this when the molecule is a single point and the electron density
is a delta function. The general case follows from this by adding delta functions for
all the atoms in the molecule.
The Fourier coefficients of the electron density function are $c_\ell = m e^{-2\pi i x_0\cdot\ell}$ for
$\ell \in L$. Note that ρ(x) is an approximation of the delta function $\delta_{x_0+\mathbb{Z}^3}$.
these atoms. The equation for the beam reflected from the atom at x0 is given as
(38). The computation simplifies if we replace the sine function by the complex
exponential. Also assume that the intensity of the reflected beam is proportional
to the electron density m at the point. Then the outgoing beam has equation
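(59)  $m\exp\left(\frac{2\pi i}{\lambda}\left(x\cdot u_2 + x_0\cdot(u_1-u_2)\right)\right)$
giving the amplitude and phase for the beam at a point x in space. The beam from
the entire lattice of points is the sum of the intensities of each point in the lattice,
(60)
$$
\begin{aligned}
w(x) &= \sum_{\ell\in L} m\exp\left(\frac{2\pi i}{\lambda}\left(x\cdot u_2 + (x_0+\ell)\cdot(u_1-u_2)\right)\right)\\
&= m\exp\left(\frac{2\pi i}{\lambda}\,x\cdot u_2\right)\exp\left(\frac{2\pi i}{\lambda}\,x_0\cdot(u_1-u_2)\right)\sum_{\ell\in L}\exp\left(\frac{2\pi i}{\lambda}\,\ell\cdot(u_1-u_2)\right).
\end{aligned}
$$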
the electron density function (58). Since only the amplitude of the beam can be
measured from the intensity of the diffraction spot, the data gives only $|c_{\ell^*}| = m$,
the absolute value of the Fourier coefficient, and not the phase.
Using the above method of argument, it can be shown that the same result
holds when we add electron densities of atoms to get the electron density of a
molecule. Only the absolute value of the Fourier coefficient can be determined
from the diffraction pattern.
6.8. Problems.
(1) The face centered cubic (fcc) lattice is generated by the basis vectors
(0, 1, 1), (1, 1, 0), (1, 0, 1),
which means it is the set of all vectors of the form
a(0, 1, 1) + b(1, 1, 0) + c(1, 0, 1)
where a, b, and c are integers. Show that this lattice is also the set of vectors
(p, q, r) where p, q, and r are integers and p + q + r is even.
(2) Show that the rotations
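$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$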
are symmetries of the fcc lattice.
(3) For 0 < a < 1/2 let fa (x) be the square wave defined on the interval
−1/2 ≤ x ≤ 1/2 by
$f_a(x) = \begin{cases} \frac{1}{2a} & \text{if } -a < x < a \\ 0 & \text{otherwise} \end{cases}$
(a) Find the Fourier coefficients $c_k$ of the Fourier series
$\sum_{k=-\infty}^{\infty} c_k e^{2\pi i kx}$
for fa .
(b) Show that for every k, ck → 1 as a → 0. (Remark: the coefficients
ck = 1 are the coefficients of the delta function δZ )
(4) The Heaviside function can be used to construct square waves. Using Maple
plot the functions Heaviside(x) and
f (x) = Heaviside(1/4 − x) + Heaviside(1/4 + x) − 1
for −1 < x < 1.
(5) Use plot3d in Maple to plot the function
g(x, y) = f (x)f (y)
for −1 < x < 1, −1 < y < 1 and where f is the function in problem 4.
(6) For a function f (x, y) of period 1 in each variable the Fourier series is
defined by
$\sum_{h=-N}^{N}\sum_{k=-N}^{N} c_{h,k}\,e^{2\pi i(hx+ky)}$
where
$c_{h,k} = \int_0^1\!\!\int_0^1 f(x,y)\,e^{-2\pi i(hx+ky)}\,dx\,dy$
are the Fourier coefficients. Let N be 2. Find the Fourier coefficients for
the function g in the previous problem. Use Maple to plot the Fourier series
and compare it with the graph of g. Try N = 3 and N = 4.
(7) Suppose
$f(x) = \sum_{j=-n}^{n} c_j e^{2\pi i jx}$
Show that
$g(x) = \sum_{j=-n}^{n} |c_j|^2 e^{2\pi i jx}.$
Note: In crystallography the three dimensional analog of g is called the
Patterson function for f . It is determined by the norms of the Fourier
coefficients of f and not the phases.
We briefly discuss the physics of NMR, and then describe distance geometry, a
mathematical theory that can be used to find the protein structure from some types
of NMR data.
7.1. Larmor frequency. Spins placed in a magnetic field precess; they wobble
like a spinning top. Only certain isotopes of elements found in organic compounds
have spins that react to the magnetic field; the most common ones used in proteins
are ¹H, ¹³C, and ¹⁵N. The isotopes ¹³C and ¹⁵N are not in common abundance, so
specially prepared protein samples must be used.
7.2. Splitting and chemical shift. Equation (63) assumes that the detected fre-
quency depends only on the atom and the intensity of the magnetic field. There is
another factor, however. The magnetic field intensity B0 is not the same everywhere
in the molecule; it is affected by neighboring atoms and electrons. Neighboring spins
have their own magnetic field and this perturbs the field of the magnet and changes
the frequency of precession.
This is illustrated by the NMR spectrum of the hydrogen atoms in the molecule
Toluene (figure 28). The spectrum is a Fourier transform of the electrical signal
showing the intensities (vertical axis) of certain frequencies (horizontal axis). The
spectrum can be thought of as the absolute values of Fourier coefficients for the
function giving the signal as a function of time. If the signal is the real part of
$s(t) = \sum_{j=1}^{n} a_j e^{2\pi i\omega_j t}$ then the spectrum gives absolute values
$|a_j|$, j = 1, ..., n, at frequencies $\omega_j$ cycles per second.
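The relation between a signal and its spectrum can be illustrated with a discrete Fourier transform. The sketch below (Python, standard library only) is a simplified model, not an NMR processing pipeline: it uses the complex signal s(t) itself rather than its real part, and the amplitudes, frequencies, and sampling (64 samples over one second) are hypothetical values chosen so that each frequency falls exactly on a grid point.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Absolute values of the discrete Fourier coefficients of a list of samples."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * m / n)
                    for m, s in enumerate(samples)) / n)
            for k in range(n)]

# Hypothetical signal: amplitudes 2.0 and 0.5 at 5 and 12 cycles per second.
amps = {5: 2.0, 12: 0.5}
n = 64
signal = [sum(a * cmath.exp(2j * math.pi * w * m / n) for w, a in amps.items())
          for m in range(n)]

spectrum = dft_magnitudes(signal)
peaks = {k: round(v, 6) for k, v in enumerate(spectrum) if v > 1e-6}
print(peaks)
```

The nonzero entries of the spectrum sit at the frequencies ω_j and have heights |a_j|. Using only the real part of s(t) would halve each peak and add a mirror peak at n − ω_j.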
The spectrum of Toluene shows that the peak frequencies cluster around two
values. Also a reference signal is shown for hydrogens which are not part of any
molecule. The frequencies of the hydrogens in the molecule are different (shifted)
from the reference signal. The observed difference is divided by the frequency of
the reference signal and multiplied by $10^6$, so the change in frequency is reported in
parts per million or ppm. The change is called a chemical shift. The spectrum shows that
the 3 methyl hydrogens (to the right) are shifted less than the other 5 hydrogens.
From the symmetry of the molecule it is easy to see that the peak on the right
comes from the methyl hydrogens, the hydrogens on the CH3 group at the end of
the molecule. This is because they all have the same relation to the rest of the
molecule which causes the shift.
[Figure: The 300 MHz ¹H NMR spectrum of Toluene. Labeled features: the methyl group,
the signal from the methyl hydrogens, the signals from the non-methyl hydrogens, and
the reference hydrogen signal. Horizontal axis: δ (ppm).]
The spectrum of Toluene shows that we can infer facts about the shape of a
molecule by looking at the spectrum. This suggests that we can find chemical
structures of larger molecules by NMR. However the spectrum of a protein is much
more difficult to interpret. Below is the NMR spectrum of all the hydrogens in
the protein thioredoxin indicating which part of the molecule the hydrogen signals
are coming from. Although some aspects of the structure can be deduced from the
spectrum, it would be difficult to find coordinates of the atoms from this spectrum.
There is another type of spectrum (figure 30) called a 2D NOESY spectrum. This
experiment observes two frequencies from an atom, so the intensity is a function
of two variables. The figure shows level curves for high intensity, which looks like
a set of points. These can be used to estimate distances between atoms. Such
estimates are called distance constraints. Distance constraints can be used to find
atomic coordinates using techniques of distance geometry. Similar spectra can be
used to find orientational constraints which measure angles rather than distances.
Orientational constraints measure the angle between covalent bond vectors and the
vector giving the direction of the magnetic field.
There is no unique list of coordinates since a rotated and translated set of coor-
dinates gives the same distance matrix, however, coordinates can be found which
7.3.1. Distance matrix example. Label the rows and columns below by the letters
a through h. The corresponding entry gives the square of the distance between the
two corresponding points on the cube with sides of length 1 in figure 31.
0 1 2 1 1 2 3 2
1 0 1 2 2 1 2 3
2 1 0 1 3 2 1 2
1 2 1 0 2 3 2 1
1 2 3 2 0 1 2 1
2 1 2 3 1 0 1 2
3 2 1 2 2 1 0 1
2 3 2 1 1 2 1 0
For example, if $v_a$ and $v_g$ are vectors giving the coordinates of points a and g
respectively, then the ag and ga entries in the matrix are $|v_a - v_g|^2$. The distances
can be found from figure 31 or by writing out coordinates for the vertices, $v_a =
(0, 0, 0)$, $v_b = (0, 1, 0)$, etc.
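The matrix can be reproduced from coordinates. In the sketch below (Python, standard library only) only the coordinates of a and b come from the text; the coordinates of c through h are an assumption, chosen here so that the squared distances agree with the matrix above.

```python
# Assumed vertex coordinates of the unit cube. Only v_a and v_b are stated in
# the text; the others are chosen to be consistent with the distance matrix.
vertices = {
    "a": (0, 0, 0), "b": (0, 1, 0), "c": (1, 1, 0), "d": (1, 0, 0),
    "e": (0, 0, 1), "f": (0, 1, 1), "g": (1, 1, 1), "h": (1, 0, 1),
}

def squared_distance(p, q):
    """Squared Euclidean distance |p - q|^2."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

labels = sorted(vertices)
D = [[squared_distance(vertices[r], vertices[c]) for c in labels] for r in labels]
for label, row in zip(labels, D):
    print(label, row)
```

The entries 1, 2, and 3 correspond to pairs of vertices joined by an edge, a face diagonal, and a space diagonal respectively.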
[Figure 31. Cube with sides of length 1 and vertices labeled a through h.]
M = (v1 , . . . , vn ).
Finding the vectors $v_j$ from the distance matrix follows in two steps:
(1) Find the gram matrix $G = M'M$ from the distance matrix. The matrix G
can be thought of as the square of M.
(2) Find M from the gram matrix G. This can be thought of as taking the square
root of the gram matrix.
7.4.1. Gram matrix from distance matrix. The entries in the distance matrix D are
$|v_j - v_k|^2, \quad j, k = 0, 1, \dots, n,$
The result is −2G bordered by zeros in the first row and column.
7.5. Coordinates from the gram matrix. Given G obtained from a distance
matrix of vectors in 3D space, the problem is to find a square root of G, a 3 × n
matrix M such that
(67)  $G = M'M.$
This can be done by finding eigenvectors and eigenvalues. Since G can be written
in the form (67), it has real, non-negative eigenvalues. Since M is 3 × n there are
at most 3 non-zero eigenvalues. See exercise 1.
Write
(72) V = (V1 , V2 )
Now construct a diagonal matrix whose entries on the diagonal are the square
roots of the entries of $E_1$. Since the entries on the diagonal are non-negative, the
square roots are also non-negative real numbers. Call this matrix $\sqrt{E_1}$. From (73)
it follows that
(74)  $G = V_1\sqrt{E_1}\,\sqrt{E_1}\,V_1'.$
Letting
$M = \sqrt{E_1}\,V_1',$
$G = M'M$ and M is a 3 × n matrix.
This procedure for finding coordinates is best illustrated using 4 points (n = 4).
Find coordinates of points in 3D space giving the distance matrix
0 2 1 1
2 0 3 1
1 3 0 2
1 1 2 0
Maple demo
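The two steps can be sketched for this example in Python with NumPy. This is an illustration under stated assumptions rather than the worksheet itself: the first point is placed at the origin, so that G is the gram matrix of the remaining three vectors and its entries are $v_j\cdot v_k = (|v_j|^2 + |v_k|^2 - |v_j - v_k|^2)/2$. The clipping of eigenvalues at 0 is an added guard against round-off.

```python
import numpy as np

# Squared distances from the example above.
D = np.array([[0, 2, 1, 1],
              [2, 0, 3, 1],
              [1, 3, 0, 2],
              [1, 1, 2, 0]], dtype=float)

# Step 1: gram matrix of v_1, ..., v_n with v_0 assumed to be at the origin.
d0 = D[0, 1:]                                   # |v_j|^2 = |v_j - v_0|^2
G = (d0[:, None] + d0[None, :] - D[1:, 1:]) / 2

# Step 2: eigen-decomposition of G; keep the 3 largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(G)            # eigenvalues in ascending order
E1 = np.clip(eigvals[-3:], 0.0, None)           # guard against tiny negative round-off
V1 = eigvecs[:, -3:]
M = np.sqrt(E1)[:, None] * V1.T                 # M = sqrt(E1) V1', a 3 x n matrix

# Columns of M are the points 1..n; prepend the origin as point 0 and verify.
points = np.hstack([np.zeros((3, 1)), M])
diff = points[:, :, None] - points[:, None, :]
D_check = (diff ** 2).sum(axis=0)
print(np.round(G, 6))
print(np.allclose(D_check, D))
```

The coordinates found this way are determined only up to a rotation or reflection, since any orthogonal change of the eigenvector basis gives another M with the same gram matrix.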
Note that finding a sequence of vectors with the given distance matrix depends
on the gram matrix being rank 3 and having non-negative eigenvalues. If there are
small errors in the data, this will not be the case, but we can still solve the linear
algebra problem with vectors in a space of dimension n. These vectors may have
complex valued coordinates.
The above techniques are the basis for studying the Cayley-Menger theorem and
Cayley-Menger determinants.
7.6. Problems.
We saw in the previous section how distance geometry is useful in finding protein
structures from NMR distance data. When the sample can be held rigid in relation
to the magnetic field, another method is used based on orientational constraints.
This method uses the NMR spectrum to find the coordinates of the unit magnetic
field direction in frames rigidly attached to the protein. These frames are discrete
versions of the Frenet frames discussed in section 3. We call such a frame a Discrete
Frenet Frame (DFF). The DFF is also useful in the study of robotics and kinematics
by transforming a rigid motion into a sequence of simpler ones.
Organic chemistry includes the study of long molecules such as proteins and
DNA. The backbone of a protein can be thought of as a sequence of points at the
center of atoms rather than as a continuous function of t. Using differences rather
than derivatives a Frenet frame can be defined which is useful in analyzing the
shape of the protein, and in finding protein structures using NMR orientational
constraints.
8.3. Torsion angles and bond angles. Geometric properties of the curve can be
expressed in terms of the DFF. We have already defined the torsion angle for each
bond. The torsion angle at the bond from $v_k$ to $v_{k+1}$ is
$\phi_k = \tau(t_{k-1}, t_k, t_{k+1}), \quad k = 1, \dots, n-1.$
The bond angle or curvature angle $\theta_k$ at $v_k$ is defined as the angle between the
bond vector from $v_{k-1}$ to $v_k$ and the bond vector from $v_k$ to $v_{k+1}$,
$\theta_k = \arccos(t_{k-1}\cdot t_k), \quad k = 1, \dots, n$
as in figure 32.
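These definitions translate into a short computation. The sketch below (Python, standard library only) computes unit bond vectors, bond angles, and a torsion angle for a small discrete curve. The bond angle follows the arccos formula above. For the torsion angle the sign convention of τ is not restated in this section, so the common atan2 form of the dihedral angle (0 for cis, ±180° for trans) is assumed. The example points are hypothetical.

```python
import math

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def unit(p):
    n = math.sqrt(dot(p, p))
    return tuple(a / n for a in p)

def tangents(points):
    """Unit bond vectors t_k pointing from v_k to v_{k+1}."""
    return [unit(sub(q, p)) for p, q in zip(points, points[1:])]

def bond_angle(t_prev, t_next):
    """theta = arccos(t_{k-1} . t_k), in degrees."""
    c = max(-1.0, min(1.0, dot(t_prev, t_next)))
    return math.degrees(math.acos(c))

def torsion_angle(t0, t1, t2):
    """Signed dihedral angle of three consecutive bond vectors, in degrees.
    Assumed convention: atan2 form, 0 = cis, +/-180 = trans."""
    n1, n2 = cross(t0, t1), cross(t1, t2)
    return math.degrees(math.atan2(dot(cross(n1, n2), unit(t1)), dot(n1, n2)))

zigzag = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)]   # planar, trans
t = tangents(zigzag)
theta = [bond_angle(t[k - 1], t[k]) for k in range(1, len(t))]
phi = torsion_angle(t[0], t[1], t[2])
print(theta, phi)
```

For the planar zigzag both bond angles are 90° and the torsion angle has absolute value 180°.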
The bond angles and torsion angles can also be defined using rotation matrices,
(75)  $t_k = R(b_k, \theta_k)\,t_{k-1}$
      $b_{k+1} = R(t_k, \phi_k)\,b_k.$
Both of these equations follow directly from the definition of the bond angle and
the torsion angle.
[Figure 32. The bond angle θ1 at v1 for the points v0, v1, v2.]
Now write the rotation from Fk to Fk+1 in terms of the bond angle and the
torsion angle. By (75) and the fact that R (u, θ) leaves u fixed,
(76)  $b_{k+1} = R(b_{k+1}, \theta_{k+1})\,R(t_k, \phi_k)\,b_k$
      $t_{k+1} = R(b_{k+1}, \theta_{k+1})\,R(t_k, \phi_k)\,t_k.$
Since rotations preserve cross products and $n_{k+1} = t_{k+1}\times b_{k+1}$, it follows from
(76) that
$n_{k+1} = R(b_{k+1}, \theta_{k+1})\,R(t_k, \phi_k)\,n_k.$
The previous three equations can be combined as
(77)  $F_{k+1} = R(b_{k+1}, \theta_{k+1})\,R(t_k, \phi_k)\,F_k.$
8.4. The Euclidean Group. The orthogonal group is the group of all rigid
transformations of 3D space which leave the origin fixed. Every rigid transformation T
of space is given by an orthogonal transformation and a translation,
(82)  $Tw = Fw + v$
where F is an orthogonal transformation and v is a vector. The set of all such
transformations is a group called the Euclidean group, denoted E(3), and an element
of this group is called a Euclidean transformation.
The matrix (85) can also be thought of as the fixed frame F whose vectors have
initial points at the point v.
8.5. DFF and the Euclidean Group. Equation (81) gives the frame Fk as a
function of the torsion angles and bond angles. The discrete curve cannot be
determined from the frames alone. Distances between points are needed to find
the points vk of the curve. Let sk = |vk+1 − vk | be the distance between two
consecutive points. Since tk = Fk e1 and
(87) vk+1 = vk + sk tk ,
the distances sk give the extra information needed to find the points, which can be
computed recursively from (87).
The bond angles, torsion angles and distances can be thought of as directions
for driving (or flying) along the curve. The distances sk between the points are like
the differences in odometer readings between point k and k + 1. The bond angles
θk are like right and left turns if θk = π/2, except you may turn at any angle. Also
you do not need to make turns in the same plane, the torsion angle φk tells how to
move to a different plane.
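The driving directions can be followed step by step. The sketch below (Python, standard library only) builds the points of a discrete curve from bond angles, torsion angles, and distances, using the rotation relations (75) and the recursion (87). It assumes that R(u, θ) is a right-handed rotation about the unit vector u, implemented by Rodrigues' formula, and it starts from an arbitrary tangent and binormal. The function names and the starting frame are assumptions made here.

```python
import math

def rotate(axis, angle, v):
    """Rotate v about the unit vector `axis` by `angle` radians (Rodrigues' formula).
    A right-handed sense of rotation is assumed for R(u, angle)."""
    c, s = math.cos(angle), math.sin(angle)
    d = sum(a * b for a, b in zip(axis, v))
    cr = (axis[1] * v[2] - axis[2] * v[1],
          axis[2] * v[0] - axis[0] * v[2],
          axis[0] * v[1] - axis[1] * v[0])
    return tuple(c * vi + s * ci + (1 - c) * d * ai
                 for vi, ci, ai in zip(v, cr, axis))

def build_curve(thetas, phis, dists, t0=(1.0, 0.0, 0.0), b1=(0.0, 0.0, 1.0)):
    """Points v_0, v_1, ... from bond angles, torsion angles and distances.

    thetas[k-1] is theta_k and phis[k-1] is phi_k (radians); dists[k] is s_k.
    The starting tangent t0 and binormal b1 are arbitrary perpendicular unit vectors."""
    points = [(0.0, 0.0, 0.0)]
    t, b = t0, b1
    points.append(tuple(p + dists[0] * ti for p, ti in zip(points[-1], t)))
    for k in range(1, len(dists)):
        t = rotate(b, thetas[k - 1], t)          # t_k = R(b_k, theta_k) t_{k-1}
        points.append(tuple(p + dists[k] * ti    # v_{k+1} = v_k + s_k t_k
                            for p, ti in zip(points[-1], t)))
        b = rotate(t, phis[k - 1], b)            # b_{k+1} = R(t_k, phi_k) b_k
    return points

# Four unit steps with 90 degree turns and zero torsion trace out a square.
square = build_curve([math.radians(90)] * 3, [0.0] * 3, [1.0] * 4)
print([tuple(round(c, 6) for c in p) for p in square])
```

With zero torsion the binormal never changes, so the curve stays in one plane. A nonzero torsion angle tilts the binormal about the current tangent and moves the next turn into a different plane.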
8.6. Application to Proteins. Use of the DFF to study protein structure
requires knowledge of the geometry of the bonds in the peptide plane. The atoms along
the protein backbone appear in sequence Cα, C, N, Cα, ... and can be thought
of as points of a discrete curve. The bond angles at these atoms have been
determined by crystallography studies and the values are 58° at nitrogen atoms, 64°
at carbonyl carbon atoms, and 69° at the alpha carbons (see figure 33). (If the
geometry at the alpha carbon is taken to be exactly tetrahedral, then the bond
angle $\arccos\frac13 = 70.5°$ is used.) Distances between atoms in the peptide bond have
also been determined. In Angstroms, the distance from Cα to C is approximately 1.51,
from C to N approximately 1.32, and from N to Cα approximately 1.45.
k 1 2 3 4 5 6
v_k O C N Cα C′ O′
θ_k 56.5° 58° 70° 64° 0
φ_k 0 0 φ ψ − 180° 0
s_k 1.24 1.325 1.455 1.51 1.24
Table 2. Bond angles, torsion angles, and distance parameters (in Angstroms)
for the discrete curve from O to O′ in a dipeptide.
determines the distance between all atoms. The protein segment from O to O′ is
a discrete curve consisting of atoms O, C, N, Cα, C′, O′. The bond and torsion
angles and distances along the curve are shown in table 2. Formula (80) can be
used to find the coordinates of O and O′ and graph the distance d(O,O′) as a function
of φ, ψ (figure 34). The level curves of the function d(O,O′) show regions of the
φ, ψ plane where the oxygens O and O′ are too close. Since the radius of an oxygen
atom is about 1.5 Angstroms, they certainly cannot get closer than a distance of
3. These excluded regions are shown on many Ramachandran diagrams (see figure
35).
Maple worksheet
8.7. Period 3 structures. The DFF can be used to study the helix parameters
for period three secondary structures, the beta sheet and the alpha helix. Suppose
Figure 34. Contour plot of distances d(O,O′) between oxygens in
adjacent peptide planes as a function of φ and ψ. Points where the
distance is less than 3 are excluded since the oxygens are too close.
Letting Âk = {Fk ; vk }, (95) holds for F replaced by Â. So multiplication can
be done in the Euclidean group. The vector v3 is the vector from alpha carbon 0
to alpha carbon 3 and is referred to as the virtual bond vector. It will be denoted
simply as v.
There are two possible choices of u . The choice is made so that u · v > 0. The
helix is right or left handed depending on whether 0 < θ < 180◦ or −180◦ < θ < 0.
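Computing the products in (94), the helix parameters can be computed from
the torsion angles φ and ψ. It is convenient to let
(98)  $s = \frac{\phi+\psi}{2}, \qquad t = \frac{\phi-\psi}{2}.$
Then
(99)
$$
\begin{aligned}
v &= \begin{pmatrix} 3.54 \\ 1.37\cos\psi \\ 1.37\sin\psi \end{pmatrix}\\
|v| &= 3.80\\
\cos\frac{\theta}{2} &= -.82\sin s + .03\sin t\\
\sin\frac{\theta}{2}\,u &= \begin{pmatrix} .82\cos s + .03\cos t \\ -.57\cos s + .04\cos t \\ -.57\sin s - .04\sin t \end{pmatrix}\\
d\,\sin\frac{\theta}{2} &= -.68\cos t + 2.9\cos s.
\end{aligned}
$$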
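The formulas (98) and (99) are easy to evaluate numerically. The sketch below (Python, standard library only) computes the rotation per residue θ and the quantity d from φ and ψ. The input (φ, ψ) = (−57°, −47°) is a commonly quoted pair of alpha helix torsion angles and is an assumption made here, not a value from this section. The half angle is taken in [0°, 180°], and the choice of u with u · v > 0 is not enforced, so the sign of d may differ from the convention in the text for other inputs.

```python
import math

def helix_parameters(phi_deg, psi_deg):
    """Return (theta in degrees, d in Angstroms) from the approximate formulas (98)-(99)."""
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    s, t = (phi + psi) / 2, (phi - psi) / 2                      # (98)
    cos_half = -0.82 * math.sin(s) + 0.03 * math.sin(t)          # cos(theta/2)
    half = math.acos(cos_half)                                   # theta/2 in [0, pi]
    d = (-0.68 * math.cos(t) + 2.9 * math.cos(s)) / math.sin(half)
    return math.degrees(2 * half), d

# Assumed input: typical alpha helix torsion angles.
theta, d = helix_parameters(-57, -47)
print(round(theta, 1), round(d, 2), round(360 / theta, 2))
```

For this input θ is close to 100°, which corresponds to about 3.6 residues per turn, and d is close to 1.45 Angstroms.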
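(101)  $e^{\frac{\alpha}{2}K}\,e^{\frac{\pi}{2}I}\,e^{\frac{\beta}{2}K} = e^{\frac{\alpha}{2}K}\,I\,e^{\frac{\beta}{2}K} = I\,e^{-\frac{\alpha}{2}K}\,e^{\frac{\beta}{2}K} = I\,e^{\frac{\beta-\alpha}{2}K} = \cos\frac{\alpha-\beta}{2}\,I + \sin\frac{\alpha-\beta}{2}\,J.$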
8.9. Problems.
(1) Consider the discrete curve along the FCC lattice given by the 5 points
p1 = (0, 0, 0), p2 = (1, 1, 0), p3 = (1, 2, 1), p4 = (2, 2, 2), p5 = (3, 3, 2).
Compute the bond angles θ2 , θ3 , θ4 and the torsion angles φ2 , φ3 . (You can
write the torsion angle in terms of the arccosine function or the argument.)
(2) Consider the closed discrete curve given by the points
$v_j = \left(\cos\frac{j\pi}{3},\ \sin\frac{j\pi}{3},\ (-1)^j\,\frac{\sqrt{5}}{4}\right)'$
for j = 0, . . . , 6. This models the discrete curve formed by the carbon
atoms in the chair configuration of the molecule cyclohexane. Note that
the molecule has symmetry. The set of points is left fixed by the rotation
Rz (2π/3).
(a) Find the discrete Frenet frame, F1 .
(b) Find the unit tangent vectors t0 , t1 , . . . , t6 .
(c) Show $\cos\theta_j = -\frac13$, for j = 1, . . . , 6 so the bond angles are compatible
with a tetrahedral carbon bond.
(d) Find the torsion angles φ1 , . . . , φ6 .
9. Protein Folding
Linus Pauling predicted the alpha helix secondary structure in 1948. His conjec-
tured structure was based on his knowledge of covalent bonds and hydrogen bonds,
and geometric reasoning using a piece of paper that he folded. Here is a copy of his
sketch on the unfolded sheet of paper. He asked himself how the long chain of the
protein backbone could be arranged so that the oxygens and the hydrogens on the
backbone form a hydrogen bond. His folded paper (figure 37) brought the points A
and B close together and created a hydrogen bond between the negatively charged
oxygen and the positively charged hydrogen. The hydrogen bond completed a ring
consisting of 13 atoms and showed a helix with 3.6 residues per turn. Structures of
proteins were later obtained by crystallography, confirming Pauling’s conjecture.
All Pauling needed to make his discovery was the sequence of atoms and a
knowledge of hydrogen bonds created by the electric force between the positively
charged H bonded to N and the negatively charged O bonded to C. This raises the
question of whether protein structures can be determined from just the sequence of
amino acids and all the forces between the atoms. Possibly with powerful computers
and a knowledge of all the forces between atoms in a protein, the structure can be
determined without experiments in the same way that Pauling did.
TGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITT
LVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTT
PLLLLDLALLVDADQGTILALVGADGIMIGTGLVGALTKVYSY
RFVWWAISTAAMLYILYVLFFGFTSKAESMRPEVASTFKVLRN
VTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGF
GLILLRSRAIFGE
determines the structure shown in figure 38 that we find in the pdb file 1E0P.
The structure of the functional protein is called the native state. The process by
which a protein folds into this structure from an extended chain is called folding
(figure 39).
9.2. Configuration space. The basic question in protein folding is whether from
the amino acid sequence and basic laws of physics the structure of a protein can be
predicted using the power of a computer. This leads to the following questions:
Levinthal’s paradox: It would take a protein the present age of the universe to
explore all possible configurations and find the minimum energy configuration. Yet
proteins fold in microseconds.
the coordinates of every atom. Some atom positions are fixed by others. For ex-
ample, if the coordinates of a carbon and two adjacent bonded atoms are known,
the coordinates of any hydrogen bonded to the carbon can be found.
The size of the configuration space can also be reduced by using torsion angles
for parameters instead of coordinates of atoms. The table of side chains is helpful
for counting. Below in table 3 is listed the number of torsion angles per side chain,
including main chain. On average there are about 4 (φ, ψ, χ) torsion angles per
residue.
Table 3. Number of torsion angles needed to determine the structure
of a side chain.
number of torsion angles    residues
2                           Gly, Ala, Pro
3                           Ser, Cys, Thr, Val
4                           Ile, Leu, Asp, Asn, His, Phe, Tyr, Trp
5                           Met, Glu, Gln
6                           Lys, Arg
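The average quoted above can be recomputed from the table. The sketch below (Python) weights each of the 20 residue types equally, which is an assumption made here; in a real protein the average depends on the amino acid composition.

```python
# Number of torsion angles per residue type, transcribed from table 3.
torsions = {
    2: ["Gly", "Ala", "Pro"],
    3: ["Ser", "Cys", "Thr", "Val"],
    4: ["Ile", "Leu", "Asp", "Asn", "His", "Phe", "Tyr", "Trp"],
    5: ["Met", "Glu", "Gln"],
    6: ["Lys", "Arg"],
}

n_residues = sum(len(names) for names in torsions.values())
average = sum(count * len(names) for count, names in torsions.items()) / n_residues
print(n_residues, average)
```

With equal weights the average is 3.85, consistent with the figure of about 4 torsion angles per residue.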
9.4. Energy functions. Pauling discovered the structure of the alpha helix by
trying to bring a hydrogen and an oxygen in close proximity while keeping the
geometry of the peptide bond and keeping some uniformity in the torsion angles
at the alpha carbons. If d is the distance from the O to the H in the hydrogen
bond, Pauling was trying to minimize d or equivalently to minimize −1/d. If q1
is the charge on the O, a negative charge, and q2 is the charge on H, a positive
charge, then by Coulomb’s law the electrostatic energy of the OH pair is q1 q2 /d. So
Pauling was minimizing energy. The numbers q1 and q2 are called partial charges
since they are less than 1 in absolute value, while the charge on one proton is +1
and the charge on one electron is −1.
Another term in the energy function is the electrostatic energy, which is large if
two oppositely charged atoms are far apart. Minimizing the electrostatic energy
includes making the distance between atoms in a hydrogen bond small.
Many other types of energy functions can be added to get the total energy.
9.4.1. The energy of a spring. We review the equation for the energy of a spring.
If on a number line x = 0 represents the coordinate of the end of a spring at
equilibrium, the force on the spring with the end moved to x is given by F = −kx
where k > 0 is the spring constant (figure 40). The energy to move the end of the
spring to x is
(103)  $E = \int_0^x ks\,ds = k\,\frac{x^2}{2}.$
Force represents the rate of change of energy, F = −dE/dx. We can think of the
force as a tendency to move to minimum energy (figure 41). The graph of the energy
function is called the energy landscape. To find the minimum energy configuration,
we look for valleys in the energy landscape. For this spring, the minimum energy
is at x = 0, at the lowest point of the parabola.
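The relation F = −dE/dx can be checked numerically for the spring. This is a minimal sketch in Python; the spring constant k = 3 and the step size are illustrative values chosen here.

```python
def spring_energy(x, k=1.0):
    """Energy (103) of a spring with constant k stretched to position x."""
    return 0.5 * k * x * x

def numerical_force(energy, x, h=1e-6):
    """Force F = -dE/dx, approximated by a central difference."""
    return -(energy(x + h) - energy(x - h)) / (2 * h)

k = 3.0
for x in (-1.0, 0.0, 2.0):
    print(x, numerical_force(lambda s: spring_energy(s, k), x), -k * x)
```

The force points toward x = 0 on both sides and vanishes at the minimum of the energy.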
9.4.2. CHARMM Energy. The main part of the CHARMM energy is a sum of six
types of energy terms, the first three of which are quadratic functions analogous to
the energy function of a spring.
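(104)
$$
\begin{aligned}
V(x) ={}& \sum_{b} c_\ell\,(b-b_0)^2 && b \text{ a bond length}\\
&+ \sum_{\theta} c_a\,(\theta-\theta_0)^2 && \theta \text{ a bond angle}\\
&+ \sum_{\tau} c_i\,(\tau-\tau_0)^2 && \tau \text{ a torsion angle}\\
&+ \sum_{\omega} \operatorname{trig}(\omega) && \omega \text{ a dihedral angle}\\
&+ \sum_{i,j} \frac{Q_i Q_j}{D\,r_{ij}} && r_{ij} \text{ dist. between charged pair}\\
&+ \sum_{i,j} c_w\,\varphi\!\left(\frac{R_i+R_j}{r_{ij}}\right) && r_{ij} \text{ dist. between pair}
\end{aligned}
$$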
Some further terms, accounting specifically for disulfide bonds and hydrogen
bonds, are also present but will not be discussed here.
9.4.3. Explanation of energy terms. The Qi are partial charges assigned to the
atoms in order to approximate the electrostatic potential of the electron cloud. The
constants labelled c are analogous to spring constants. These are estimated from
principles of physical chemistry and values for these are included in the CHARMM
package. D is the dielectric constant which is determined by the medium, usually
water, in which the protein folds. It measures how strongly within the medium an
electric charge is felt at a distance from the charge.
The quantities indexed by the subscript 0 are reference bond lengths, bond an-
gles, and improper torsion angles near their equilibrium values; different constants
apply depending on the chemistry and on their location in a functional group. The
coefficients of the trigonometric terms trig(ω) (linear combinations of cosines of
multiples of ω) are also determined by the chemistry of the atoms.
The van der Waals interactions (defined by the final sum in the potential) depend
on the interatomic pair potential φ which, in the simplest case, is taken as the
Lennard-Jones potential
$\varphi(x) = x^{12} - 2x^6.$
In (104) it is evaluated at $x = R_0/r_{ij}$ with $R_0 = R_i + R_j$.
Under this potential, two atoms are attracted if they are farther than $R_0$ apart and
repel each other if they are less than $R_0$ apart.
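The attraction and repulsion described here can be verified numerically. The sketch below (Python) evaluates the pair potential at x = R0/r and approximates the radial force −dE/dr by a central difference. The value R0 = 1 and the sample distances are illustrative choices made here.

```python
def pair_potential(x):
    """phi(x) = x**12 - 2*x**6, evaluated at x = R0 / r."""
    return x ** 12 - 2 * x ** 6

def vdw_energy(r, R0=1.0):
    """Pair energy as a function of the distance r between two atoms."""
    return pair_potential(R0 / r)

def radial_force(r, R0=1.0, h=1e-6):
    """-dE/dr by a central difference; positive means repulsion, negative attraction."""
    return -(vdw_energy(r + h, R0) - vdw_energy(r - h, R0)) / (2 * h)

for r in (0.9, 1.0, 1.2):
    print(r, round(vdw_energy(r), 4), round(radial_force(r), 4))
```

The energy has its minimum value −1 at r = R0, where the force vanishes. The force is repulsive for r less than R0 and attractive for r greater than R0.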
That all of these constants are independent of the molecule is a basic assumption
of molecular mechanics called transferability.
Slides from Michael Levitt’s structural biology course at Stanford are helpful in
understanding energy functions.
Assuming that the O and the H have three Cα atoms in between, and assuming
that the φ, ψ torsion angles are the same at each Cα , the configuration space is 2
dimensional, consisting of the parameters φ and ψ. Looking at the discrete curve
from the O to the H atom, and using the discrete Frenet frame and the bond, torsion,
and distances given from protein geometry (see table 4), the distance d(O,H) can
be computed as a function of φ and ψ. Figure 42 shows the level curves of this
function. See the Maple worksheet for this computation.
In all cases it is useful to allow the entries in the matrix to be complex numbers.
If you have studied matrices only with real number entries, it is very easy to adapt
to complex numbers. Almost all the rules of computations are the same.
 k    atom   θ_k     φ_k     s_k
 1    O                      1.24
 2    C      56.5°   0       1.32
 3    N      58°     φ       1.45
 4    Cα     69°     ψ       1.51
 5    C      54°     180°    1.32
 6    N      58°     φ       1.45
 7    Cα     69°     ψ       1.51
 8    C      54°     180°    1.32
 9    N      58°     φ       1.45
10    Cα     69°     ψ       1.51
11    C      54°     180°    1.32
12    N      60°     0       1.02
13    H
Matlab is that Maple can work symbolically, that is, you can use letters as well as
numbers for entries. When using numbers, Matlab is often faster.
Below we give a review of a few basic ideas that will be used in the course.
A matrix with the same number of rows and columns is called a square matrix.
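3 × 3 square matrix:
\[
B = \begin{pmatrix} 3 & 1 & 7 \\ -1 & 2 & 0 \\ 0 & 1 & 5 \end{pmatrix}
\]
3 × 2 matrix:
\[
C = \begin{pmatrix} 2 & 0 \\ -9 & 10 \\ 1 & 14 \end{pmatrix}
\]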
Matrices with 1 row are called row vectors and matrices with 1 column are called
column vectors.
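\[
A = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}
\]
are column vectors.
\[
C = \begin{pmatrix} 2 & 1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 3 & 2 & 1 \end{pmatrix}
\]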
are row vectors. Usually we will assume vectors are column vectors. A row vector
can be converted into a column vector (or vice versa) by the transpose operation,
which changes rows to columns.
Example:
\[
\begin{pmatrix} 1 & -1 & 0 \end{pmatrix}^t = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}
\]
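• addition
• scalar multiplication
• multiplication
• conjugation
• transpose
• adjoint
• inversion
Example (addition):
\[
\begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}
+ \begin{pmatrix} i & 0 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} 2+i & 1 \\ 0 & 5 \end{pmatrix}
\]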
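Example: Let
\[
Z = \begin{pmatrix} 1+i & 2 \\ 3-i & \tfrac{1}{2}+2i \end{pmatrix}
= \begin{pmatrix} 1 & 2 \\ 3 & \tfrac{1}{2} \end{pmatrix}
+ \begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix} i
= A + Bi
\]
where
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & \tfrac{1}{2} \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix}.
\]
The matrix A is the real part of Z and the matrix B is the imaginary part of Z.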
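The conjugate of Z is
\[
\overline{Z} = \begin{pmatrix} 1-i & 2 \\ 3+i & \tfrac{1}{2}-2i \end{pmatrix}
\]
or using the real and imaginary parts of the matrix,
\[
\overline{Z} = A - Bi.
\]
Some properties of the matrix conjugate are:
\[
\overline{AB} = \overline{A}\,\overline{B}, \qquad
\overline{A + B} = \overline{A} + \overline{B}.
\]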
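\[
A = \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1 & 2 \\ -1 & 0 & 1 \end{pmatrix}, \qquad
A^t = \begin{pmatrix} 2 & 0 & -1 \\ 1 & 1 & 0 \\ -1 & 2 & 1 \end{pmatrix}
\]
\[
(AB)^t = B^t A^t, \qquad (A + B)^t = A^t + B^t
\]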
Example:
\[
\begin{pmatrix} 1, & 2-i, & 3 \end{pmatrix}^{*} = \begin{pmatrix} 1 \\ 2+i \\ 3 \end{pmatrix}
\]
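The conjugate, transpose, and adjoint operations can be checked numerically. The following is a minimal sketch in Python with NumPy (an assumption; the notes themselves refer to Maple and Matlab). It uses the matrix Z from the example above and a second matrix W chosen arbitrarily for illustration.

```python
import numpy as np

# Z = A + Bi from the example; W is an arbitrary illustrative matrix.
Z = np.array([[1 + 1j, 2], [3 - 1j, 0.5 + 2j]])
W = np.array([[0, 1j], [2, 1 - 1j]])

A, B = Z.real, Z.imag        # real and imaginary parts of Z
conj_Z = Z.conj()            # entrywise complex conjugate
adjoint_Z = Z.conj().T       # adjoint = conjugate transpose

# The adjoint of the row vector (1, 2 - i, 3) is a column vector.
row = np.array([[1, 2 - 1j, 3]])
row_adjoint = row.conj().T

print(np.allclose(conj_Z, A - 1j * B))                    # conjugate is A - Bi
print(np.allclose((Z @ W).conj(), Z.conj() @ W.conj()))   # conjugate of a product
print(np.allclose((Z @ W).T, W.T @ Z.T))                  # transpose reverses the order
print(row_adjoint.ravel())
```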
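10.3. Matrix inverse. The matrix I denotes a square matrix whose entries are $a_{ij}$
where
\[
a_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}
\]
The matrix I is called the unit or identity matrix. Identity matrices come in different
sizes:
\[
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Let A be a square matrix. The inverse is a matrix $A^{-1}$ such that
$A^{-1}A = AA^{-1} = I$.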
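The defining property of the inverse can be verified numerically. The following is a minimal sketch in Python with NumPy (an assumption; the notes use Maple and Matlab); the matrix A below is an arbitrary invertible example chosen for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # illustrative invertible matrix
A_inv = np.linalg.inv(A)                 # numerical inverse
I2 = np.eye(2)                           # 2 x 2 identity matrix

# Both products should reproduce the identity, up to rounding error.
print(np.allclose(A_inv @ A, I2))
print(np.allclose(A @ A_inv, I2))
```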
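10.4. Dot product. Let
\[
v = \begin{pmatrix} a \\ b \\ c \end{pmatrix}
\quad\text{and}\quad
w = \begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]
be two vectors, then the dot product is given by
\[
v \cdot w = v^t w = \begin{pmatrix} a & b & c \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = ax + by + cz.
\]
Let
\[
v = \begin{pmatrix} 1+i \\ 1 \\ i \end{pmatrix}
\quad\text{and}\quad
w = \begin{pmatrix} 2 \\ 1-i \\ 3 \end{pmatrix},
\]
then the Hermitian dot product is given by
\[
\begin{aligned}
\langle v, w \rangle = v^{*} w = \overline{v} \cdot w
 &= (1-i)2 + 1(1-i) + (-i)3 \\
 &= 3 - 6i.
\end{aligned}
\]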
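The Hermitian dot product above can be reproduced numerically. This is a minimal sketch in Python with NumPy (an assumption; the notes use Maple and Matlab). NumPy's `vdot` conjugates its first argument, which matches the convention ⟨v, w⟩ = v* w used here.

```python
import numpy as np

v = np.array([1 + 1j, 1, 1j])
w = np.array([2, 1 - 1j, 3])

hermitian = np.vdot(v, w)     # conjugates v, then sums the products
by_hand = v.conj() @ w        # the same computation written as v* w

print(hermitian)              # equals 3 - 6i, as in the text
print(np.isclose(hermitian, by_hand))
```

Note that the ordinary dot product `np.dot(v, w)` does not conjugate either argument, so it gives a different value for complex vectors.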
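\[
A^t = \begin{pmatrix} v_1^t \\ \vdots \\ v_n^t \end{pmatrix}.
\]
If
\[
B = \begin{pmatrix} w_1^t \\ \vdots \\ w_k^t \end{pmatrix}
\]
is a k × m matrix given as a list of k row vectors, $w_1^t, \dots, w_k^t$, each 1 × m, then
\[
BA = \begin{pmatrix}
w_1 \cdot v_1 & \cdots & w_1 \cdot v_n \\
\vdots & & \vdots \\
w_k \cdot v_1 & \cdots & w_k \cdot v_n
\end{pmatrix}
\]
is a matrix of dot products.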
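The statement that BA is a matrix of dot products can be illustrated with a small numerical case. The following is a minimal sketch in Python with NumPy (an assumption; the notes use Maple and Matlab), with the entries of A and B chosen arbitrarily: the columns of A play the role of the vectors v_j and the rows of B play the role of the w_i^t.

```python
import numpy as np

A = np.array([[1, 0], [2, 1], [0, 3]])   # 3 x 2: columns are v_1, v_2
B = np.array([[1, 1, 0], [0, 2, 1]])     # 2 x 3: rows are w_1^t, w_2^t

product = B @ A                          # ordinary matrix product

# Entry (i, j) computed directly as the dot product w_i . v_j
by_dots = np.array([[np.dot(B[i, :], A[:, j]) for j in range(A.shape[1])]
                    for i in range(B.shape[0])])

print(product)
print(np.array_equal(product, by_dots))
```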
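Example:
\[
\begin{pmatrix} 3 & 1+i & 2 \\ 1+i & 0 & -5 \\ 2 & -5 & 2 \end{pmatrix}
\]
is symmetric.
Example:
\[
\begin{pmatrix}
2 & 1+i & \tfrac{1}{2}+2i \\
1-i & 3 & 5 \\
\tfrac{1}{2}-2i & 5 & 4
\end{pmatrix}
\]
is self adjoint.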
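For a 3 × 3 matrix,
\[
\det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & k \end{pmatrix}
= a \det \begin{pmatrix} e & f \\ h & k \end{pmatrix}
- b \det \begin{pmatrix} d & f \\ g & k \end{pmatrix}
+ c \det \begin{pmatrix} d & e \\ g & h \end{pmatrix}.
\]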
This is called expansion by minors. Likewise the determinant is defined for any
square matrix.
Properties of determinant:
detAB = detA detB
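Expansion by minors and the product rule for determinants can both be checked on a concrete matrix. The following is a minimal sketch in Python with NumPy (an assumption; the notes use Maple and Matlab). It uses the 3 × 3 matrix B given earlier in this review; the helper names and the second matrix M are illustrative.

```python
import numpy as np


def det2(m):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]


def det3_by_minors(m):
    """Determinant of a 3 x 3 matrix, expanded along the first row."""
    a, b, c = m[0]
    return (a * det2(m[1:, [1, 2]])
            - b * det2(m[1:, [0, 2]])
            + c * det2(m[1:, [0, 1]]))


B = np.array([[3, 1, 7], [-1, 2, 0], [0, 1, 5]])
M = np.array([[1, 0, 2], [0, 1, 1], [1, 1, 0]])   # arbitrary second matrix

print(det3_by_minors(B), np.linalg.det(B))
# det(BM) should equal det(B) det(M), up to rounding error.
print(np.isclose(np.linalg.det(B @ M), np.linalg.det(B) * np.linalg.det(M)))
```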
The formula for cross product is often remembered by pretending that i, j and
k are numbers and writing
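\[
\begin{pmatrix} a \\ b \\ c \end{pmatrix} \times \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \det \begin{pmatrix} i & j & k \\ a & b & c \\ x & y & z \end{pmatrix}
\]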
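Expanding the formal determinant along its first row gives the three components of the cross product as the coefficients of i, j, and k. The following is a minimal sketch in Python with NumPy (an assumption; the notes use Maple and Matlab) comparing that expansion with NumPy's built-in `cross`. The function name and the test vectors are illustrative.

```python
import numpy as np


def cross_by_cofactors(v, w):
    """Cross product from the first-row expansion of det [[i, j, k], v, w]."""
    a, b, c = v
    x, y, z = w
    # Coefficients of i, j, k: +(bz - cy), -(az - cx), +(ay - bx)
    return np.array([b * z - c * y, c * x - a * z, a * y - b * x])


v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

result = cross_by_cofactors(v, w)
print(result, np.cross(v, w))
# The cross product is perpendicular to both factors.
print(np.dot(result, v), np.dot(result, w))
```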
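Example 1:
\[
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]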
The eigenvalue is 3, and an eigenvector is [1, 1]t . Note that [2, 2]t is also an
eigenvector.
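Example 2:
\[
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ -i \end{pmatrix}
= \begin{pmatrix} i \\ 1 \end{pmatrix}
= i \begin{pmatrix} 1 \\ -i \end{pmatrix}
\]
Example 3:
\[
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ i \end{pmatrix}
= \begin{pmatrix} -i \\ 1 \end{pmatrix}
= -i \begin{pmatrix} 1 \\ i \end{pmatrix}
\]
Note that the equation in example 3 is the conjugate of the one in example 2.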
Also note that a matrix with real entries can have complex eigenvalues and
eigenvectors.
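The eigenvalues in these examples can be recovered numerically. The following is a minimal sketch in Python with NumPy (an assumption; the notes use Maple and Matlab). The variable names are illustrative, and the eigenvectors NumPy returns are normalized, so they are scalar multiples of the ones written above.

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])    # real symmetric matrix
J = np.array([[0.0, -1.0], [1.0, 0.0]])   # real matrix with complex eigenvalues

s_values = np.linalg.eigvalsh(S)          # eigvalsh: for symmetric/self adjoint input
j_values, j_vectors = np.linalg.eig(J)    # general (possibly complex) eigenproblem

print(s_values)    # real eigenvalues of S, in ascending order
print(j_values)    # a conjugate pair on the imaginary axis

# Each column of j_vectors is an eigenvector: J x = lambda x.
for lam, x in zip(j_values, j_vectors.T):
    print(np.allclose(J @ x, lam * x))
```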
The eigenvalues of self adjoint matrices are real. This fact is essential in many
areas of mathematics and is also a key fact in the mathematical formulation of
quantum mechanics. Here is a proof:
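One standard argument is sketched below; it may differ in detail from the proof originally given at this point. It assumes that A is self adjoint (A* = A), that v is a nonzero eigenvector with eigenvalue λ, and it uses the Hermitian dot product ⟨v, w⟩ = v* w from section 10.4 together with the rule (Av)* = v* A*.

```latex
\begin{aligned}
v^{*} A v &= v^{*} (\lambda v) = \lambda \, v^{*} v, \\
v^{*} A v &= v^{*} A^{*} v = (A v)^{*} v = (\lambda v)^{*} v = \overline{\lambda} \, v^{*} v .
\end{aligned}
```

Since v is nonzero, v* v is a positive real number, so comparing the two lines gives λ equal to its own conjugate, that is, λ is real.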
\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
Matrices can be found for rotation of any angle about any axis, where an axis is
given by a non-zero vector.
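One common way to build such a matrix is Rodrigues' rotation formula, R = I + sin θ K + (1 − cos θ) K², where K is the cross-product matrix of the unit vector along the axis. The formula is not stated in this section, so the following Python/NumPy sketch is offered only as an illustration under that assumption; the function name `rotation_matrix` is hypothetical.

```python
import numpy as np


def rotation_matrix(axis, theta):
    """Rotation by angle theta about a non-zero axis (Rodrigues' formula)."""
    u = np.asarray(axis, dtype=float)
    u = u / np.linalg.norm(u)              # unit vector along the axis
    K = np.array([[0.0, -u[2], u[1]],
                  [u[2], 0.0, -u[0]],
                  [-u[1], u[0], 0.0]])     # K @ x equals the cross product u x x
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)


theta = 0.7
Rz = rotation_matrix([0, 0, 2], theta)     # axis need not be a unit vector
print(np.round(Rz, 4))                     # upper-left block is the 2 x 2 rotation

R = rotation_matrix([1, 1, 1], theta)
print(np.allclose(R.T @ R, np.eye(3)))     # R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))   # with determinant 1
```

For the z axis the upper-left 2 × 2 block reduces to the planar rotation matrix shown above, and the axis itself is left fixed by R.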
10.11. Problems.
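(7) Let
\[
\Sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\Sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\Sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
Use Maple to verify the following identities:
\[
\begin{gathered}
\Sigma_1^2 = 1, \quad \Sigma_2^2 = 1, \quad \Sigma_3^2 = 1, \\
\Sigma_1 \Sigma_2 = i\Sigma_3, \quad \Sigma_2 \Sigma_1 = -i\Sigma_3, \\
\Sigma_2 \Sigma_3 = i\Sigma_1, \quad \Sigma_3 \Sigma_2 = -i\Sigma_1, \\
\Sigma_3 \Sigma_1 = i\Sigma_2, \quad \Sigma_1 \Sigma_3 = -i\Sigma_2.
\end{gathered}
\]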
(8) For each of the matrices $\Sigma_k$ in the previous problem, find vectors X
and Y such that
\[
\Sigma_k X = X \quad\text{and}\quad \Sigma_k Y = -Y.
\]