10 Structural Bioinformatics: Life Through The 3D Glasses
Ankita Punetha, Payel Sarkar, Siddharth Nimkar, Himanshu Sharma,
Yoganand KNR, and Siranjeevi Nagaraj
10.1 Introduction
The realization that understanding structural principles is essential for understanding
function led to remarkable growth in the structural study of macromolecules such as
deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. These
macromolecules can adopt various shapes, each responsible for a particular function.
Therefore, to understand the function of these macromolecules, their structure
needs to be understood.
10.2.1 DNA
Deoxyribonucleic acid is the genetic material that carries the biological information
on how an organism grows, develops, maintains itself, and reproduces. It is a biopolymer
composed of repeating nucleotide units and usually comprises two strands that coil
around each other to form a double helix. Each nucleotide is made up of three parts: a
pentose sugar called deoxyribose (which lacks a hydroxyl group at the 2′ carbon of the
pentose sugar); a nitrogenous base, either a purine (adenine, A, or guanine, G) or a
pyrimidine (thymine, T, or cytosine, C); and a phosphate group. The backbone of this
polynucleotide chain consists of alternating sugar and phosphate groups, in which the
sugar of one nucleotide is covalently linked to the phosphate of the next. Hydrogen
bonding between the nitrogenous bases of the two polynucleotide strands (A with T and
G with C) results in the formation of double-stranded DNA.
DNA was first isolated in 1869 by Friedrich Miescher (Dahm
2008), but its double-helical nature was revealed only in 1953 by Watson and Crick from
the X-ray diffraction data of Rosalind Franklin (Watson and Crick 1953; Wilkins
et al. 1953). The focus of nucleic acid research soon shifted to fiber diffraction
(Arnott 1970; Arnott et al. 1974a, 1976; Rodley et al. 1976), which provided insight
into the variety of structures adopted by nucleic acids, such as single-stranded helices
(Arnott et al. 1976), parallel helices (Rich et al. 1961), and triple and quadruple
helices (Arnott et al. 1974b). Being flexible, DNA can exist in various forms
depending on the environmental conditions. Its conformation is governed by the
sequence, the extent and direction of supercoiling, base modifications, the level of
hydration, ionic strength, and the presence and concentration of metal ions or polyamines
in the solution (Basu et al. 1988; Cheng and Pettitt 1992; Ghosh and Bansal 2003;
Choi and Majima 2011; Zhou et al. 2015; Porrini et al. 2017; Dickerhoff et al. 2017;
Sathyamoorthy et al. 2017; Kriegel et al. 2017a). The first single-crystal structure of a
complete turn of B-DNA, published in 1980, confirmed it to be a right-handed double
helix (Wing et al. 1980). The structure of the unusual Z-DNA, the left-handed form of
DNA, had been elucidated a year earlier (Wang et al. 1979). Tremendous growth in the
study of the fine structure of DNA followed, which increased our understanding manifold
(Kocman and Plavec 2017; Kriegel et al. 2017b; Porrini et al. 2017; Yella and Bansal
2017; Gajarsky et al. 2017; Artusi et al. 2016; Arcella et al. 2012; Adrian et al. 2012;
Choi and Majima 2011; Chou et al. 2003; Wahl and Sundaralingam 1997).
The nitrogenous bases have planar, aromatic, heterocyclic structures and are
categorized into purines and pyrimidines. Adenine and guanine are purines
(a nitrogen-containing double ring made up of a six- and a five-membered
ring) and form a glycosidic bond between their N9 atom and the C1′ atom of the
deoxyribose. Cytosine and thymine are pyrimidines (a nitrogen-containing single
six-membered ring) and form a glycosidic bond between their N1 atom and the
C1′ atom of the deoxyribose. The phosphate group is linked to the deoxyribose
sugar through an ester bond between one of its oxygen atoms and the 5′-OH of the
sugar. The nucleotides in a polynucleotide chain are linked by phosphodiester bonds
between the 5′ and 3′ carbon atoms of adjacent sugars. The negatively charged phosphate
groups in the backbone make the chain polar. The order of nucleotides within a DNA
molecule constitutes its sequence, which is written as a string of letters denoting its bases.
grooves, length of helix turn, and the number of bases per turn. The DNA double
helix commonly adopts three forms – B-DNA, A-DNA, and Z-DNA.
B-DNA is a right-handed helix and the most common form of DNA under
physiological conditions, at neutral pH and low salt concentration. It has a long,
narrow structure with a narrow minor groove and a wide major groove, and its base
pairs lie nearly perpendicular to the helix axis. The deoxyribose sugar ring of B-DNA
adopts the C2′-endo conformation, i.e., the C2′ atom lies above the plane of
C4′-O4′-C1′. The base separation equals the helical rise of 3.4 Å. The right-handed
double helix has ten base pairs per complete turn, with the two polynucleotide chains
antiparallel to each other and linked by Watson-Crick base pairs (A-T, G-C). Because
the two glycosidic bonds of a Watson-Crick pair are not diametrically opposite, the two
deoxyribose sugars attached to a base pair lie on the same side of it, which gives rise to
the unequal major and minor grooves. The sugar-phosphodiester chains wind around the
helix axis, with the base pairs almost centered over the axis. The wide major groove has
a depth (distance of the base pairs from the helix axis) similar to that of the much
narrower minor groove. The major groove is richer in base substituents (O6 and N6 of
purines and N4 and O4 of pyrimidines) than the minor groove, and its width makes it
accessible to proteins. B-DNA occurs at high water concentration, as hydration of the
minor groove appears to favor the B-form. X-ray diffraction analysis of oligonucleotide
crystals reveals that even the same sequence can adopt distinct structures, which may
differ in the propeller twist between the bases of a pair (to optimize base stacking) or in
the twist, roll, or slide of two successive base pairs relative to each other.
In dehydrating environments (low water concentration), the right-handed DNA duplex
adopts the A-DNA form, which is shorter and wider than the B-form. In A-DNA, the
C3′ atom lies above the C4′-O4′-C1′ plane, i.e., the sugar has the C3′-endo
conformation, in contrast to the C2′-endo conformation of B-DNA. The C3′-endo
conformation brings consecutive phosphate groups on the nucleotide chain closer
together, reducing the distance between adjacent nucleotides by about 1 Å relative to
the B-form. In A-DNA, the base pairs are twisted, tilted, and displaced nearly 5 Å from
the helix axis, which changes the groove characteristics. The major groove is deep and
narrow and not easily accessible to proteins, while the minor groove is wide and
shallow; it can be accessed by proteins but carries less information than the major
groove. Because the base pairs are displaced from the axis, A-DNA has a hollow
cylindrical core. The helical rise is reduced to 2.56 Å, and the helix is wider, with
11 base pairs per turn.
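The rise and base-pairs-per-turn values quoted above for B- and A-DNA are enough to estimate the axial length, number of turns, and twist of an idealized duplex. The short Python sketch below does this under two stated assumptions: the helix is perfectly regular (real duplexes vary with sequence), and the length is approximated as the number of base pairs times the rise, ignoring end effects. The function name `helix_geometry` is hypothetical.

```python
# Idealized helical parameters for the two right-handed forms described in
# the text; real structures deviate from these averages.
HELIX_PARAMS = {
    "B-DNA": {"rise_A": 3.4, "bp_per_turn": 10},
    "A-DNA": {"rise_A": 2.56, "bp_per_turn": 11},
}


def helix_geometry(form: str, n_bp: int) -> tuple[float, float, float]:
    """Return (approximate axial length in Å, number of turns,
    twist per base-pair step in degrees) for an idealized duplex."""
    params = HELIX_PARAMS[form]
    length = n_bp * params["rise_A"]        # one helical rise per base pair
    turns = n_bp / params["bp_per_turn"]    # complete 360-degree turns
    twist = 360.0 / params["bp_per_turn"]   # rotation between stacked pairs
    return length, turns, twist


if __name__ == "__main__":
    for form in HELIX_PARAMS:
        length, turns, twist = helix_geometry(form, 22)
        print(f"{form}: {length:.1f} Å long, {turns:.1f} turns, {twist:.1f}° twist")
```

For the same 22-bp duplex, the A-form comes out noticeably shorter than the B-form, in line with the description of A-DNA as shorter and wider.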
Z-DNA is a relatively rare left-handed double helix with a pronounced zigzag
pattern in the phosphodiester backbone (Wang and Vasquez 2007). Its helix is narrower
and more elongated than those of A- and B-DNA, with a convex outer surface in place of
the major groove and a deep central minor groove. Z-DNA can form when the DNA has
an alternating purine-pyrimidine sequence, with purines and pyrimidines in different
conformations, leading to the zigzag pattern. Usually, the sequence alternates between
cytosine and guanine, with cytosine at the first position. Z-DNA formation is favored by
high salt concentration (Bae et al. 2011). In each base pair, Z-DNA has one nucleotide
with its sugar in the C3′-endo conformation (like A-DNA and in contrast to B-DNA) and
its base in the syn conformation, which places the base over the sugar ring (in contrast
to the anti conformation in A- and B-DNA). The advantage of the anti conformation is
that it places the base in a position where it can readily form hydrogen bonds with the
complementary base on the opposite strand. The Z-DNA duplex has to accommodate the
distortion of the nucleotide in the syn conformation, while the other nucleotide of the
pair is in the normal C2′-endo, anti conformation.
The comparison between the three forms of DNA is shown in Table 10.1.
The guanine base can use two of its edges (the Watson-Crick and Hoogsteen edges) at
once to form hydrogen-bonded arrays, resulting in multistranded structures in
guanine-rich DNA sequences. The guanine (G) quartet is one such arrangement of four
guanines. The four guanine bases first form a flat plate, and such plates then stack on
one another to form a quadruplex structure. Each four-base unit is stabilized by
hydrogen bonding between the base edges and by a metal ion chelated in the center
(Burge et al. 2006; Parkinson et al. 2002). Numerous conformations can be formed from
a set of four bases, either from different parallel strands, each contributing a base to
the central structure, or from a single strand that folds back on itself. Diverse
quadruplexes can be formed depending on the length and number of strands involved
and on the intervening non-guanine loop sequences. The diverse topologies adopted by
G-quadruplexes include interlocked G-quadruplexes, double-chain-reversal and
V-shaped loops, triads, mixed tetrads, adenine-mediated pentads, hexads, and snap-back
G-tetrad alignments (Dolinnaya et al. 2016; Huppert 2010; Campbell and Parkinson
2007; Perrone et al. 2017; Kocman and Plavec 2017).
The presence of a tetrameric DNA structure was first shown in 1947 (Arnott et al.
1974b), but its biological relevance was recognized only in 1995 (Rhodes and Giraldo
1995). Tetrameric DNA arrangements exist in G-rich eukaryotic telomeres (at the ends
of linear chromosomes), in non-telomeric genomic DNA, e.g., nuclease-hypersensitive
promoter regions (Burge et al. 2006), and in viral genomes, e.g., the human herpes
simplex virus 1 (HSV-1) genome (Artusi et al. 2016). DNA G-quadruplex structures
have been reported to be involved in gene expression and telomere maintenance
(Takahama et al. 2013; Murat and Balasubramanian 2014; Rhodes and Lipps 2015;
Fukuhara et al. 2017).
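Because G-quadruplexes require several closely spaced guanine runs, candidate sites in G-rich regions such as telomeres and promoters are often screened with a simple sequence pattern: four runs of three or more guanines separated by loops of one to seven nucleotides. The Python sketch below applies that widely used heuristic; it is an assumption brought in for illustration, not a method described in this chapter, and a match only flags a potential quadruplex without predicting its topology or stability. The function name `find_g4_motifs` is hypothetical.

```python
import re

# Screening heuristic: four runs of >= 3 guanines separated by loops of
# 1-7 nucleotides (the "G3+ N1-7" pattern). A match flags a *potential*
# quadruplex only; folding also depends on cations, loop sequence, etc.
G4_MOTIF = re.compile(
    r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}"
)


def find_g4_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping
    putative G-quadruplex motif on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(seq)]


if __name__ == "__main__":
    telomeric = "TTAGGG" * 4                 # four human telomeric repeats
    print(find_g4_motifs(telomeric))
    print(find_g4_motifs("ACGTACGTACGT"))    # no G-runs, so no motif
```

Only the strand given is searched; the complementary strand could be screened the same way with runs of C, or by applying the function to the reverse complement.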
Cells have specialized chromosome-end regions called telomeres, which permit
replication of chromosome ends by the enzyme telomerase (Greider and Blackburn 1985)
and also prevent the cell's DNA repair systems from treating the DNA ends as damage
to be corrected (Nugent and Lundblad 1998). Human telomeres consist of several
thousand tandem repeats of TTAGGG and end in a single-stranded overhang, which can
fold back to form a DNA quadruplex with a conformation very different from that of the
usual DNA helix (Wright et al. 1997). The large loop structures in telomeres, called
T-loops, are extensive loops of telomeric DNA stabilized by telomere-binding proteins.
Slight variations of human telomeric sequences can form different types of
G-quadruplex structures (Griffith et al. 1999; Li et al. 2014). Toward the end of the
T-loop, the single-stranded telomeric DNA invades the double-stranded region and base
pairs with one of its strands, displacing the other and forming a triple-stranded
arrangement termed the displacement loop or D-loop (Parkinson et al. 2002).
G-quadruplex formation at telomeric ends seems to negatively regulate the activity of
telomerase, the enzyme that maintains telomere length (Patel et al. 2007; Kuryavyi
et al. 2010).
AGCGA-quadruplexes are another addition to the tetrahelical family. They comprise
four 5′-AGCGA-3′ tracts stabilized by G-A and G-C base pairs, forming GAGA- and
GCGC-quartets, respectively. Residues in the core of the structure are connected by
edge-type loops. Sequences of alternating 5′-AGCGA-3′ and 5′-GGG-3′ repeats form
AGCGA-quadruplexes instead of G-quadruplexes. These structurally unique
AGCGA-quadruplexes are less sensitive to cation and pH variation. This stability
suggests that they may be biologically significant in the regulatory regions of genes
involved in basic cellular processes and linked to neurological disorders, cancer, and
abnormalities in bone and cartilage development (Kocman and Plavec 2017).
10.2.3 RNA
pattern of sugars, bases, and phosphates to form nucleotides, which are then linked to
form the nucleic acid in a similar fashion.
As in DNA, RNA nitrogenous bases are divided into two types: purines and
pyrimidines. Adenine and guanine are purines (a nitrogen-containing double ring made
up of a six- and a five-membered ring), which form a glycosidic bond between their N9
atom and the C1′ atom of the ribose. Cytosine and uracil are pyrimidines (a
nitrogen-containing single six-membered ring) and form a glycosidic bond between
their N1 atom and the C1′ atom of the ribose. The phosphate group forms an ester bond
between one of its oxygen atoms and the 5′-OH of the ribose sugar. The nucleotides in
a polynucleotide chain are linked by phosphodiester bonds between the 5′ and 3′ carbon
atoms of adjacent sugars. The negatively charged phosphate groups in the backbone make
the chain polar. The RNA sequence is the order of nucleotides in the polynucleotide
chain and is represented by a series of letters for its nitrogenous bases, A, U, G, and C,
denoting adenine, uracil, guanine, and cytosine, respectively. RNA chains are usually
much shorter than DNA chains.
Double Helix
Antiparallel complementary strands, or complementary segments of a single strand,
pair to form double helices. RNA double helices have structures similar to the A-form
of DNA.
Stem-Loop Structures
The stem-loop, or hairpin loop, is the most common RNA secondary structure. It forms
when the nucleotide chain folds back onto itself to create a double-helical portion
called the stem; the loop is the single-stranded region formed by the unpaired
nucleotides. Stem-loops serve as building blocks for larger structural motifs such as
the cloverleaf structure of tRNA, which is a four-helix junction.
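Stem-loops are commonly written in dot-bracket notation, in which matching parentheses mark paired bases and dots mark unpaired ones; this is a convention of RNA secondary-structure software rather than something defined in this chapter. The hypothetical helpers below parse a nested dot-bracket string into base pairs with a stack and then report hairpin loops, i.e., unpaired stretches closed by a single base pair with nothing paired inside it.

```python
def parse_dot_bracket(structure: str) -> dict[int, int]:
    """Map each paired position to its partner (0-based) for nested
    dot-bracket notation: '(' opens a pair, ')' closes it, '.' is unpaired."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()              # most recent unclosed '(' pairs with i
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return (first, last) 0-based positions of each hairpin loop: the
    unpaired bases enclosed by a pair (i, j) with no pairs between i and j."""
    pairs = parse_dot_bracket(structure)
    loops = []
    for i, j in pairs.items():
        if i < j and all(k not in pairs for k in range(i + 1, j)):
            loops.append((i + 1, j - 1))
    return sorted(loops)


if __name__ == "__main__":
    print(hairpin_loops("((((....))))"))      # one stem, one loop
    print(hairpin_loops("((..))..((...))"))   # two consecutive hairpins
```

A single stack is enough here because nested pairs always close in the reverse order in which they opened; overlapping pairs, as in pseudoknots, need a different representation.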
Pseudoknots
Another form of RNA secondary structure is the pseudoknot, a helical segment resulting
from the pairing of nucleotides in a hairpin loop with a single-stranded region outside
that hairpin. Pseudoknots fold into knot-shaped 3D conformations but are not true
topological knots; their base pairs overlap one another in sequence position rather than
being nested. Pseudoknots are found in most classes of RNA and have diverse functions.
A pseudoknot was first identified in turnip yellow mosaic virus (Rietveld et al. 1982).
Among pseudoknots, the H-type fold is the best characterized. It has two stems and two
loops; the second stem is formed by pairing of nucleotides in the hairpin loop with
bases outside the hairpin stem (Staple and Butcher 2005). Pseudoknots are involved in
several important biological processes; for example, the pseudoknot in the RNA
component of human telomerase is critical for its activity (Chen and Greider 2005).
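The defining feature of a pseudoknot, base pairs that overlap instead of nesting, can be tested directly: two pairs (i, j) and (k, l) cross when i < k < j < l. The Python sketch below assumes an extended dot-bracket notation in which a second bracket type such as [ ] marks the pairs that would otherwise cross; this convention is used by several RNA tools but is an assumption here, and the helper names are hypothetical.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {close: open_ for open_, close in BRACKETS.items()}


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Parse extended dot-bracket notation into sorted (i, j) pairs, i < j.
    Each bracket type has its own stack, so '[]' pairs may cross '()' pairs."""
    stacks: dict[str, list[int]] = {o: [] for o in BRACKETS}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(pos)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


def crossing_pairs(pairs: list[tuple[int, int]]):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l.
    Any crossing means the structure contains a pseudoknot."""
    crossings = []
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:      # pairs are sorted by start, so i < k
            if k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


if __name__ == "__main__":
    h_type = "((..[[..))..]]"           # two stems whose pairs interleave
    print(crossing_pairs(parse_pairs(h_type)))
    print(crossing_pairs(parse_pairs("((..))..[[..]]")))   # nested: none
```

The toy string mimics the H-type arrangement: bases in the loop of the first stem pair with bases downstream of it, so every pair of the second stem crosses every pair of the first.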
10.2.4 Protein
form a tertiary structure. In some proteins, two or more folded polypeptide chains
(subunits) associate with one another to form a quaternary structure.
Motifs
Motifs act as structural subunits of proteins and comprise secondary structural
elements arranged in regular patterns. Based on these arrangements, these
super-secondary structures are classified into the types enumerated below.
1. Helix-loop-helix
2. Helix-turn-helix
In this motif, two helices are joined by a loop that makes a turn, folding the
polypeptide chain back on itself. This motif is commonly found in the DNA-binding
domains of proteins.
The β-α-β motif commonly occurs in proteins with parallel β-sheets. The
C-terminus of the first β-strand is connected to the N-terminus of the second by a loop,
an α-helix, and another loop. Generally, the parallel β-sheet lies in a plane, and the
connecting helices are packed above the plane of the sheet. Loops of varying lengths
are observed in different motifs. In some proteins, catalytic sites are found in the loop
regions of the β-α-β motif.
The Greek key motif consists of four adjacent antiparallel strands arranged in the
pattern of an ornamental Greek key. Three of the antiparallel strands are connected by
two hairpin loops, while the fourth lies adjacent to the first and is linked to the third
by a longer loop.
This is one of the common folds present in various proteins. In most cases, four
antiparallel helices are bundled so that a hydrophobic core is packed at the helix
interface and the hydrophilic residues are exposed to the aqueous solvent.
This fold is present in the globin family of proteins (e.g., hemoglobin and myoglobin).
It contains eight helices that form an active site pocket for binding the heme group. In
this domain, two helices at the C-terminus form a helix-turn-helix motif, arranging
themselves antiparallel. The remaining helices pack against each other at an angle of
around 50°.
β-domains are made up of β-sheets alone. Examples include up-and-down
β-barrels, jelly roll barrels, and β-sandwiches.
A large antiparallel β-sheet wraps around in a circular fashion so that the strands that
would otherwise form the edges of the sheet are spatially adjacent and hydrogen bonded,
forming a barrel structure with a large void in the center. The amino acid side chains
alternately point above and below the sheet. In various membrane proteins, this central
space acts as a transport channel.
The jelly roll barrel also consists of a single sheet wrapped around itself, but here
longer loops traverse the core of the barrel, leaving no void. The core region consists
of hydrophobic residues. It usually consists of eight β-strands arranged in two
four-stranded antiparallel β-sheets.
(c) β-sandwich
In this fold, two antiparallel β-sheets are arranged in parallel planes, stacked on each
other like the slices of a sandwich. In contrast to the up-and-down β-barrel, they enclose
a hydrophobic core with no void spaces. The number of strands in such domains varies
from one protein to another. This type of fold is found in immunoglobulins.
3. α+β-domains
The SH2 domain is a structurally conserved protein domain found in the Src
oncoprotein and in many other intracellular signal-transducing proteins. It is
approximately 100 amino acids long and contains two α-helices and seven β-strands. It
binds phosphorylated tyrosine residues with high affinity and recognizes a short
stretch of three to six amino acids within the target peptide motif.
4. α-/β-domains
(a) α-/β-barrel
A protein performs its biological function by adopting a stable 3D structure under
physiological conditions. For example, enzymes use a cavity on the surface of their 3D
structure, called the active site, which is accessible to reactants, to catalyze reactions.
Active sites contain the key catalytic machinery of the protein, consisting of one or more
residues actively involved in catalyzing the reaction and stabilizing the transition state.
Because of the shape and physicochemical properties of the active site, only a particular
class of molecules can bind and be catalytically transformed. All this depends on the
active site attaining the proper 3D conformation, which in turn depends on the folding of
the polypeptide chain. In general, all proteins rely on a specific 3D structure to perform
their biological function. Not all proteins are enzymes; many have other functions, such
as molecular recognition: transport proteins must recognize and carry specific molecules,
antibodies must recognize foreign proteins, and the components of signaling pathways or
complexes must recognize one another. The recognition of other macromolecules is very
important in the regulation of gene expression by DNA-binding proteins and in the
formation of nucleoprotein complexes such as the ribosome. The recognition of
molecular signals by receptor proteins is important in sensing (e.g., receptors in the cell
nucleus sense steroids).
Molecular recognition requires the molecules to bind in an energetically favorable
conformation, which depends on the complementarity of their shapes and physicochemical
properties; i.e., they must fit snugly together, and their surface atoms in contact must
have complementary properties. Thus, a hydrophobic area of one interacting partner must
contact a hydrophobic area of the other, and a negatively charged area of one must contact
a positively charged area of the other. All this depends on the formation of a specific 3D
protein structure. Therefore, protein function depends on the protein attaining a stable,
specific 3D structure.
Various approaches have been developed to predict function from structural
information. The basic approach to predicting the function of a protein from structural
data relies on finding globally similar structural features (Sleator and Walsh 2010). If
no significant global match is found, similarities between functional sites are assessed
instead. Typically, this involves global fold comparison, the use of local 3D templates,
or comparison of local structural features. Proteins having similar structural features
along their entire sequence are more likely to have similar functions (Whisstock and
Lesk 2003; Tosatto and Toppo 2006). Some of the popular web services for quantifying
this relationship are DALI (Holm and Laakso 2016), CATHEDRAL (Redfern et al.
2007), SALSAs (Wang et al. 2013), and FLORA (Redfern et al. 2009). The significance
of a similarity is assessed from the number of amino acid residues included in the
alignment and the quality of the superposition. Detecting common motifs distributed
across diverse folds hints at key functional similarities. Analysis of the CATH database
(Dawson et al. 2017) reveals that protein domains with the same fold tend to have a
specific function, although a small number of highly populated superfolds are associated
with many different functions. Recent similarity-based scoring methods compare the
internal residue contacts of proteins, identifying residues located within 8–10 Å of one
another in the structure, and then detect additional similarities using conventional global
alignment methods.
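Contact-based comparison starts from a contact map: a matrix marking which residue pairs lie within a distance cutoff. The NumPy sketch below builds such a map from one representative atom per residue (e.g., C-alpha, an assumption), using an 8 Å cutoff from the lower end of the range quoted above, and scores two structures by the fraction of shared contacts. It assumes residue i in one structure already corresponds to residue i in the other; the overlap score and helper names are illustrative, not the scoring function of any tool named in the text.

```python
import numpy as np


def contact_map(coords, cutoff: float = 8.0, min_seq_sep: int = 3) -> np.ndarray:
    """Boolean N x N matrix: True where residues i and j (one representative
    atom each, e.g. C-alpha) are within `cutoff` Å and at least
    `min_seq_sep` positions apart in sequence."""
    xyz = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    idx = np.arange(len(xyz))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_seq_sep
    return (dist <= cutoff) & far_in_sequence


def contact_overlap(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Fraction of contacts in map_a that are also present in map_b
    (assumes a one-to-one residue correspondence between the structures)."""
    n_a = int(map_a.sum())
    return 0.0 if n_a == 0 else float((map_a & map_b).sum()) / n_a


if __name__ == "__main__":
    # Toy two-strand "hairpin": strands 5 Å apart, residues 3.8 Å apart.
    hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
               (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
    opened = [(x, 4 * y, z) for x, y, z in hairpin]   # strands pulled apart
    cm = contact_map(hairpin)
    print(int(cm.sum()) // 2, "contacts")
    print(contact_overlap(cm, contact_map(opened)))
```

The minimum sequence separation excludes trivially close neighbours along the chain, so that the map reflects contacts created by the fold rather than by covalent connectivity.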
Although whole-fold comparison is the most common method used to assign protein
functions, it has limitations. It does not explicitly consider the conservation of the local
environment, which is very important because small changes in active site residues can
completely alter protein function. For example, the function of enzymes and DNA-binding
proteins depends critically on the conservation of their active site residues. Methods
have therefore been developed that compare smaller structural motifs to assign specific
functions to proteins. The Catalytic Site Atlas (Furnham et al. 2014) is a protein structure
database that stores manually annotated catalytic site residues of different proteins. It
provides structural templates that can be compared with protein structures of unknown
function using a fast search algorithm, in order to transfer and assign the closest Enzyme
Commission (EC) numbers. Hydrophobic residues are often eliminated when
constructing a structural template because they tend to be buried in the core of the
There are several methods to determine protein structure, such as X-ray
crystallography, nuclear magnetic resonance (NMR) spectroscopy, and electron
microscopy (EM). The choice among these methods depends on the biological question
to be addressed. For a small protein of fewer than 50 amino acids, NMR is the obvious
choice. Not all proteins behave the same: some crystallize easily, and others are not
amenable to crystallization. For proteins that crystallize easily, X-ray crystallography is
preferred. Cryo-EM helps to unravel the overall topology of protein interactions by
directly imaging macromolecular assemblies. Other techniques such as electrospray
ionization mass spectrometry (ESI-MS) have also been developed to study
macromolecular structures that are not accessible by either NMR or X-ray
crystallography. All these techniques have pitfalls. The major issue with X-ray
crystallography is that the amino acid composition and properties of a protein determine
whether it will crystallize at all. Additionally, expressing proteins in the large quantities
required for structural studies is often difficult.
itself with the external magnetic field. This signal, measured as an amplitude that
declines with time, is called the free induction decay (FID); it records the frequencies and
decay of the signal as a function of time. To obtain a spectrum with a particular peak for
each particular nucleus, the data are subjected to a Fourier transform. Each peak in an
NMR spectrum corresponds to a magnetically distinct nucleus (or set of equivalent
nuclei). The chemical environment of a nucleus, including neighboring atoms connected
through covalent bonds and van der Waals or ionic interactions, affects the position of
its peak; this effect is termed the chemical shift. The chemical shift of an NMR signal is
normally expressed in parts per million (ppm): its frequency offset in hertz (Hz) from a
reference signal, such as that of tetramethylsilane (TMS), divided by the spectrometer
frequency in megahertz.
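The path from free induction decay to spectrum to chemical shift can be illustrated with a toy simulation. The NumPy sketch below assumes a hypothetical 500 MHz spectrometer and two signals placed at 1.0 and 7.0 ppm. It builds a complex FID as a sum of exponentially decaying oscillations, Fourier transforms it with `numpy.fft`, picks the peaks, and converts their frequency offsets (in Hz, relative to the reference) to ppm by dividing by the spectrometer frequency in MHz. Real processing adds phasing, apodization, and referencing steps that are omitted here, and all helper names are hypothetical.

```python
import numpy as np


def simulate_fid(offsets_hz, sw_hz: float, n_points: int, t2_s: float) -> np.ndarray:
    """Toy complex FID: one exponentially decaying oscillation per signal.
    offsets_hz are frequencies relative to the reference (e.g. TMS)."""
    t = np.arange(n_points) / sw_hz          # dwell time = 1 / spectral width
    decay = np.exp(-t / t2_s)                 # transverse relaxation envelope
    return sum(np.exp(2j * np.pi * f * t) for f in offsets_hz) * decay


def fid_to_spectrum(fid: np.ndarray, sw_hz: float):
    """Fourier transform the FID; return (frequency axis in Hz, magnitude)."""
    spec = np.fft.fftshift(np.fft.fft(fid))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(fid), d=1.0 / sw_hz))
    return freqs, np.abs(spec)


def pick_peaks(freqs: np.ndarray, mag: np.ndarray, rel_threshold: float = 0.5):
    """Very simple peak picking: local maxima above a fraction of the maximum."""
    mid = mag[1:-1]
    is_peak = (mid > mag[:-2]) & (mid >= mag[2:]) & (mid > rel_threshold * mag.max())
    return freqs[1:-1][is_peak]


def hz_to_ppm(offset_hz: float, spectrometer_mhz: float) -> float:
    """delta (ppm) = offset from reference (Hz) / spectrometer frequency (MHz)."""
    return offset_hz / spectrometer_mhz


if __name__ == "__main__":
    spec_mhz, sw_hz = 500.0, 8000.0
    fid = simulate_fid([1.0 * spec_mhz, 7.0 * spec_mhz], sw_hz, 8192, t2_s=0.5)
    freqs, mag = fid_to_spectrum(fid, sw_hz)
    print([round(hz_to_ppm(f, spec_mhz), 2) for f in pick_peaks(freqs, mag)])
```

Expressing shifts in ppm rather than Hz makes them independent of field strength: the same proton gives the same δ on a 400 MHz or a 900 MHz instrument, even though its offset in Hz differs.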
If any of the above criteria are not met, it becomes very difficult to obtain crystals,
and one has to modify expression and purification conditions such as pH and salt
concentration. The solubility of a protein depends on its interactions with other
compounds present in the solution. Under physiological conditions, proteins are soluble,
but as the concentration of salt (or another precipitant) rises, the protein tends to
precipitate, a process called salting out. The basic idea behind crystallization is to salt
out the protein slowly so that it forms crystals. The precipitant concentration is increased
gradually, which allows the protein to enter a metastable state leading to crystal
formation (Fig. 10.1).
Many factors influence crystal formation, such as the following:
1. Protein purity – If the protein is not pure, the lattice will not form properly, and the
crystals will disintegrate.
2. pH of the solution – Proteins tend to precipitate and crystallize near their pI, where
the net charge of the protein approaches zero and precipitation occurs easily.
3. Concentration of protein – If the protein concentration is too low, the protein tends
to remain in solution, whereas the molecular crowding at high protein concentration
promotes crystal formation.
4. Temperature – It affects the rate of precipitation and hence crystal formation.
Typically, 4 °C and 18 °C are used.
5. Precipitant – Different proteins precipitate with different precipitants, so the choice
of precipitant depends on the protein.
6. Additives – Additives can increase the intermolecular attraction between protein
molecules or decrease the interaction between solvent and protein, thereby increasing
the propensity of the protein to crystallize.
(Fig. 10.2). Initially, the precipitant concentration in the drop is lower, and the water
concentration higher, than in the reservoir buffer. As the system moves toward
equilibrium, water vapor diffuses from the droplet to the reservoir, thereby increasing the
precipitant concentration in the droplet. This slow increase in precipitant concentration
leads to crystal formation.
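The equilibration just described can be put into numbers. The Python sketch below assumes an idealized drop made by mixing protein solution with reservoir solution: only water leaves the drop, the reservoir is large enough that its concentration stays constant, and equilibrium is reached when the drop's precipitant concentration matches the reservoir's. Under those assumptions, conservation of precipitant and protein in the drop fixes the final volume and protein concentration. The function name and example values are hypothetical.

```python
def vapor_diffusion_equilibrium(protein_ul: float, reservoir_ul: float,
                                protein_mg_ml: float, precipitant_m: float) -> dict:
    """Idealized vapor-diffusion drop: protein_ul of protein stock mixed with
    reservoir_ul of reservoir solution, equilibrated against a large reservoir.

    Assumptions: only water moves (via the vapor phase), the reservoir
    concentration does not change, and equilibrium is reached when the drop's
    precipitant concentration equals the reservoir's.
    """
    v0 = protein_ul + reservoir_ul
    precip_start = precipitant_m * reservoir_ul / v0      # diluted by mixing
    protein_start = protein_mg_ml * protein_ul / v0
    # Precipitant and protein stay in the drop: concentration * volume is conserved.
    v_final = precip_start * v0 / precipitant_m
    protein_final = protein_start * v0 / v_final
    return {
        "precipitant_start_M": precip_start,
        "protein_start_mg_ml": protein_start,
        "volume_final_ul": v_final,
        "protein_final_mg_ml": protein_final,
    }


if __name__ == "__main__":
    # 1 µL of 10 mg/mL protein mixed with 1 µL of a 2.0 M precipitant solution.
    print(vapor_diffusion_equilibrium(1.0, 1.0, 10.0, 2.0))
```

In the 1:1 example the drop starts at half the reservoir's precipitant concentration and half the stock protein concentration, then loses about half its volume, so both concentrations roughly double by equilibrium.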
Microbatch Crystallization
In the microbatch method, the protein drop and reagent are combined and sealed in a
plate, tube, or container, or sealed under a layer of oil (Fig. 10.3). It can be categorized
into two types:
In the microbatch-under-oil method, the protein drop is placed at the bottom of the well
and then covered with a layer of oil, either paraffin oil or Al's oil (a 1:1 mixture of
silicone oil and paraffin oil). Paraffin oil acts as a barrier over the protein drop, allowing
little to no diffusion of water through the oil, whereas Al's oil permits water to diffuse
out of the drop through the oil, thereby concentrating the sample and the reagents in the
drop.
sealed container, with or without the possibility of evaporation, and usually involve
temperature control. No oil is used to cover the protein and reagent. Microbatch
without oil can also be performed on a micro- or nanoliter scale in a sealed plate,
which is termed as drop drop crystallization.
Micro-dialysis
This method employs a semipermeable membrane across which the precipitant can
pass but larger molecules, such as proteins, cannot. A salt gradient is established across
the membrane, which allows slow diffusion of the precipitant into the protein drop.
Data Collection
The second step in X-ray crystallography is exposing the protein crystal to
high-intensity X-rays. Four types of X-ray sources are available for protein
crystallography:
The X-ray beam is then directed at the protein crystal. Most of the X-rays pass through
the crystal without being diffracted, but some are scattered by electrons; this
phenomenon is called X-ray diffraction. Although the scattered waves cancel one another
out in most directions through destructive interference, they add constructively in a few
specific directions, determined by Bragg's law:
$$2d\sin\theta = n\lambda$$
Here d is the spacing between diffracting planes, θ is the incident angle, n is any
integer, and λ is the wavelength of the beam. These specific directions appear as
spots on the diffraction pattern called reflections.
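Bragg's law can be solved either for the plane spacing d or for the angle θ. The Python sketch below does both and uses the Cu Kα wavelength of about 1.5418 Å, a common home-source wavelength chosen here only as an example value. The helper names are hypothetical; θ is the incident (Bragg) angle as defined above, i.e., half of the scattering angle 2θ that diffractometers often report.

```python
import math


def bragg_d_spacing(wavelength: float, theta_deg: float, n: int = 1) -> float:
    """Solve 2 d sin(theta) = n lambda for d (same units as wavelength)."""
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def bragg_angle_deg(wavelength: float, d: float, n: int = 1) -> float:
    """Solve Bragg's law for theta in degrees; raise if no reflection exists."""
    s = n * wavelength / (2.0 * d)
    if not 0.0 < s <= 1.0:
        raise ValueError("n*lambda/(2d) must lie in (0, 1]; no such reflection")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    wavelength = 1.5418                 # Å, Cu K-alpha (example value)
    for d in (10.0, 3.0, 2.0):          # plane spacings in Å
        theta = bragg_angle_deg(wavelength, d)
        print(f"d = {d:4.1f} Å  ->  theta = {theta:5.2f}°")
```

Smaller spacings, which carry finer structural detail, diffract at larger angles. This is why the highest-resolution reflections lie farthest from the centre of the diffraction pattern.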
Phase Problem
In order to solve the structure, phase information is required. The detector records only
the intensities of the diffracted waves, from which their amplitudes can be obtained, but
their phases are lost; this is known as the phase problem. There are a few ways to solve
the phase problem. If coordinates of a similar protein structure are already available,
molecular replacement can be used. Molecular replacement takes the coordinates of the
existing model and repositions them until they fit the experimental data well. If it is
successful, the resulting phases can be used to create an electron density map. In other
methods, heavy atoms are allowed to diffuse into the crystal without disturbing the
crystal lattice. Since heavy atoms are large, it is assumed that each unit cell contains only
one heavy atom. Heavy atoms are electron dense and hence change the diffraction
intensities measurably. Diffraction data are also collected without the heavy atoms. The
difference between the two data sets allows the phases to be calculated using vector
$$r = \frac{1.22\,\lambda}{2n\sin\theta}$$
Here r is the resolving power (the smallest distance between two points that can still be
distinguished), λ is the wavelength of the beam, n is the refractive index of the medium
between the objective lens and the object, and θ is the semi-angle of collection of the
magnifying lens.
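To see why electrons resolve far finer detail than light, one can evaluate the resolving-power formula for both. The Python sketch below assumes green light (550 nm) with an oil-immersion objective (n = 1.515, semi-angle 67.5°), and 300 kV electrons with a small 10 mrad semi-angle in vacuum. The electron wavelength is computed from the standard relativistic de Broglie relation using SI values of the physical constants. This gives only the diffraction limit in the formula above; the practical resolution of an electron microscope is set mainly by lens aberrations, specimen properties, and electron dose. The helper names are hypothetical.

```python
import math

# SI values of the physical constants (CODATA)
H = 6.62607015e-34        # Planck constant, J s
M_E = 9.1093837015e-31    # electron rest mass, kg
E = 1.602176634e-19       # elementary charge, C
C = 299792458.0           # speed of light, m/s


def resolving_power(wavelength: float, n: float, semi_angle_rad: float) -> float:
    """r = 1.22 * lambda / (2 n sin(theta)); r has the units of wavelength."""
    return 1.22 * wavelength / (2.0 * n * math.sin(semi_angle_rad))


def electron_wavelength_angstrom(voltage: float) -> float:
    """Relativistic de Broglie wavelength (Å) of an electron accelerated
    through `voltage` volts."""
    energy = E * voltage                                         # kinetic energy, J
    momentum = math.sqrt(2 * M_E * energy * (1 + energy / (2 * M_E * C**2)))
    return H / momentum * 1e10                                   # m -> Å


if __name__ == "__main__":
    r_light_nm = resolving_power(550.0, 1.515, math.radians(67.5))
    lam = electron_wavelength_angstrom(300e3)
    r_em_a = resolving_power(lam, 1.0, 0.010)
    print(f"light: r ≈ {r_light_nm:.0f} nm")
    print(f"300 kV electrons: λ ≈ {lam:.4f} Å, diffraction limit r ≈ {r_em_a:.1f} Å")
```

Even with a collection semi-angle thousands of times smaller than that of a light objective, the roughly 0.02 Å electron wavelength pushes the diffraction limit from hundreds of nanometres down to the ångström scale.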
It uses electromagnetic and electrostatic lenses to control the electron beam and direct
it toward the specimen. Two main types of electron microscopes are used: the
transmission electron microscope (TEM), in which electrons are transmitted through an
ultrathin section of the specimen, and the scanning electron microscope (SEM), in which
the specimen is scanned with a beam of electrons. TEM typically offers about an order of
magnitude higher resolution than SEM.
Cryo-electron tomography (cryo-ET) is a powerful technique for imaging the native cellular environment (Asano et al. 2016; Bharat et al. 2015). It relies on the intrinsic contrast of frozen cellular material for the direct identification of macromolecules. In cryo-ET, multiple 2D projections of a biological sample are computationally combined to reconstruct a 3D image. A tilt series is recorded, in which each image is taken with the specimen tilted by a further angular increment relative to the previous image, and all images are then merged into a complete 3D reconstruction. This allows densities to be resolved in 3D that would otherwise overlap in 2D projection images. Cryo-ET can thus produce 3D pictures (tomograms) of complex objects such as asymmetric viruses, cellular organelles, or whole cells. To increase the signal-to-noise ratio and resolution, structures present in multiple copies within tomograms are extracted, aligned, and averaged, an approach termed subtomogram averaging (Bharat and Scheres 2016; Wan and Briggs 2016). Subtomogram averaging, or single particle tomography (SPT), is rapidly gaining momentum and becoming widely used, owing to its potential for in situ structural biology at subnanometer resolution. With recent advances in sample preparation, detector technology, and phase plate imaging, it can be applied to unambiguously determine the structures of macromolecular complexes that exhibit compositional and conformational heterogeneity, both in situ and in vitro (Galaz-Montoya and Ludtke 2017).
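The benefit of subtomogram averaging can be illustrated with a toy calculation: when N copies of the same structure, already extracted and aligned, carry independent noise, averaging them reduces the noise roughly by a factor of sqrt(N). The Python sketch below assumes perfectly aligned copies, Gaussian noise, and a made-up 1D "density"; real subtomogram averaging must also estimate the alignments and compensate for the missing wedge of the tilt series, which is not shown.

```python
import numpy as np


def average_aligned_copies(copies: np.ndarray) -> np.ndarray:
    """Average already-aligned copies; axis 0 indexes the extracted subvolumes."""
    return copies.mean(axis=0)


rng = np.random.default_rng(0)
n_copies, n_voxels = 100, 10_000
# Hypothetical "true" density, flattened to 1D to stand in for a subvolume
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, n_voxels))
# Each extracted copy = true density + independent Gaussian noise (sigma = 1)
copies = signal + rng.normal(0.0, 1.0, size=(n_copies, n_voxels))

average = average_aligned_copies(copies)
noise_single = np.std(copies[0] - signal)
noise_average = np.std(average - signal)
print(f"noise: single copy {noise_single:.3f}, average of {n_copies} {noise_average:.3f}")
print(f"expected improvement ~ sqrt(N) = {np.sqrt(n_copies):.0f}x")
```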
One limitation of cryo-ET is that samples must be thin enough to be frozen properly and for TEM images to be recorded; thicker samples must therefore be sliced into thin sections to obtain good images. Another limitation is radiation damage, which limits the 3D resolution of the sample; samples must be kept at cryogenic temperatures to reduce it.
paper. Tools were also developed so that mmCIF data files could be easily accessed and validated using computer programs. The structure of the dictionary was further improved to deal with the complexity of macromolecular data, using the Dictionary Definition Language (DDL; Westbrook and Hall 1995). It soon became clear, however, that this language was not sufficient: its data typing was weak, and it could not express the links among data items. This led to the development of an enhanced DDL (DDL2). The dictionary was placed on the World Wide Web, and an mmCIF list server was used to receive comments from the community, which resulted in continuous correction and updating of the dictionary. Version 1.0 of the mmCIF dictionary, containing 1700 definitions, was released in 1997 after review by the IUCr committee that supervises dictionary development. Dictionary extensions were managed on the model of a scientific journal: proposed extensions were sent to the specialized editors of the mmCIF dictionary for scientific review and then to technical editors. New definitions added in succeeding years were incorporated into version 2 of the mmCIF dictionary. To parse and access CIF and mmCIF files, software libraries were produced for many languages, including C, C++, Java, Fortran, Perl, and Python.
The syntax of mmCIF data files and dictionaries is similar to that of core CIF (used to describe small-molecule crystallography) and is derived from the Self-defining Text Archive and Retrieval (STAR; Hall 1991) grammar. The simplest mmCIF data file is a collection of paired data item names and values.
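The name-value pairing of the simplest mmCIF data file can be illustrated with a deliberately minimal reader. The Python sketch below assumes a file that contains only a `data_` block header, comment lines, and single-line name-value pairs; it does not handle `loop_` tables, multi-line (semicolon-delimited) values, or the full CIF quoting rules, for which an established library (for example Biopython's `MMCIF2Dict` or gemmi) should be used. The entry ID and values are hypothetical.

```python
import shlex
from typing import Optional

EXAMPLE_MMCIF = """\
data_1ABC
# hypothetical entry used only for illustration
_entry.id            1ABC
_cell.length_a       61.200
_cell.angle_alpha    90.00
_struct.title        'Example protein structure'
"""


def parse_simple_mmcif(text: str) -> tuple[Optional[str], dict[str, str]]:
    """Return (data block name, {item name: value}) for single-valued items only."""
    block_name = None
    items: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("data_"):
            block_name = line[len("data_"):]
            continue
        if line.startswith("_"):
            # shlex removes the quotes around values that contain spaces
            parts = shlex.split(line)
            if len(parts) != 2:
                raise ValueError(f"unsupported construct: {line!r}")
            name, value = parts
            items[name] = value
        else:
            raise ValueError(f"unsupported construct (loop_ or multi-line value?): {line!r}")
    return block_name, items


block, items = parse_simple_mmcif(EXAMPLE_MMCIF)
print(block, items["_struct.title"], float(items["_cell.length_a"]))
```

Values are kept as strings, as in the file itself; converting them to numbers is left to the caller, who knows each item's type from the dictionary.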
These dictionaries cover content that is not included in the mmCIF dictionary, but they are developed with the same methodology used for mmCIF and are consistent with its data representation. For example, the imgCIF dictionary describes crystallographic data from image detectors in ASCII and binary formats, the symmetry extension adds details of crystallographic symmetry, the cryo-electron microscopy extension adds structure and volume data for 3D EM experiments, the BioSync dictionary describes the features and facilities available at synchrotron beamlines, the MDB dictionary provides homology models, and the PDB exchange dictionary provides data used internally by the PDB and data required to describe high-throughput structure determination. Thus, no single file format can serve all users and applications.
Application programming interfaces (APIs) are used to access data while avoiding file format issues. Data are accessed through a collection of functions, procedures, or methods, depending on the language used; for macromolecular structures, such an interface was standardized by the Object Management Group (OMG) using the Common Object Request Broker Architecture (CORBA). CORBA supports language- and platform-independent programming interfaces, which are defined in an interface definition language (IDL). Thus, CORBA supports cross-platform access and is often called middleware. The CORBA IDL representation of mmCIF macromolecular structure data provides efficient programmatic access to all the data in PDB entries.
Each representation of macromolecular structural data has its own strengths and weaknesses. The PDB format is accessible with simple tools, while
with only minor errors in side-chain packing and rotameric state. If the sequence identity is in the range of 30–50%, errors can be more severe and are often located in loops. Sequence identities above 30% fall in the safe zone (Fig. 10.4) for homology modeling, while those below it fall in the twilight zone (Rost 1999). In this low-identity region, fold recognition methods are preferred over homology modeling, because serious errors, such as prediction of the wrong basic fold, can occur (Blake and Cohen 2001; Baker and Sali 2001). At high sequence identities (where homology modeling is used), the primary source of error is the selection of the wrong template or templates for model building, whereas at lower identities errors in the sequence alignment hinder the generation of a high-quality model (Venclovas and Margelevicius 2005).
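The identity zones described above can be turned into a rough decision helper. The Python sketch below computes percent identity over the aligned (non-gap) columns of a pairwise alignment, which is only one of several definitions in use, and applies the 50% and 30% thresholds from the text. The thresholds are approximate guides, and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity over columns where neither sequence has a gap (one common definition)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def modeling_advice(identity: float) -> str:
    """Map identity onto the zones described in the text (approximate thresholds)."""
    if identity >= 50.0:
        return "homology modeling: mainly minor side-chain errors expected"
    if identity >= 30.0:
        return "homology modeling: check loops and the alignment carefully"
    return "twilight zone: prefer fold recognition / threading"


target = "MKV-LAEGHT"
template = "MKIQLADG-T"
pid = percent_identity(target, template)
print(f"{pid:.1f}% -> {modeling_advice(pid)}")
```

Because identity can be normalized by aligned columns, by the shorter sequence, or by the full alignment length, the same alignment can fall on different sides of a threshold depending on the definition, so the zones should be read as soft boundaries.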
There are various resources available for structure validation. Some of them are
enumerated below:
Many comparative modeling programs are available (Table 10.2). Some are stand-alone programs, while others are automated web servers.
Fold recognition searches a library of known folds (known protein structures) for the fold that the target protein is most likely to adopt, using both sequence and structural information. It aligns the target sequence with one or more distantly related sequences of known structure and can be considered an extension of comparative modeling to the discovery of distant relationships. A fold can be detected even when there is no significant sequence similarity to any protein of known structure. Thus, distant structural and evolutionary relationships are detected and separated from chance sequence similarities associated with the shared fold.
Fold recognition methods are effective because the number of distinct protein folds in nature is limited, mostly as a result of evolution but also because of constraints imposed by the chemistry of the polypeptide chain. Hence, it is likely that a protein with a fold similar to that of the target has already been studied experimentally and can be found in the PDB.
Fold recognition methods can be broadly classified into profile-based methods and threading. The profile-based approach (Bowie et al. 1991) evaluates how well the physicochemical properties of the amino acids of the target protein fit the environments in which they are placed in the modeled structure. In the profile representation, each residue position in the structure is labeled by whether it is buried (protein core) or exposed (surface), by its local secondary structure (α-helix or β-sheet), and/or by its conservation (evolutionary information). The 3D representation, in contrast, describes a structure as a set of interatomic distances; although it is much richer and more flexible, it is harder to use in alignment calculations. Sequence similarity detected by amino acid substitution matrices is combined with this structural information. For example, the three-dimensional position-specific scoring matrix method (3D-PSSM; Kelley et al. 1999) describes the fold library structures both by ordinary 1D sequence profiles, generated with the position-specific iterated basic local alignment search tool (PSI-BLAST; Altschul et al. 1997; Jones and Swindells 2002), and by 3D profiles holding secondary structure and solvation potential information. The secondary structure component describes the similarity between the predicted secondary structure of the target and that of a fold library member, while the solvation potential accounts for the tendency of hydrophobic amino acids to be buried in the hydrophobic core. Thus, this method requires a sequence-structure alignment. It can be done by using PSI-BLAST, which constructs a multiple sequence alignment and then creates a profile, or PSSM, customized to the
1. Selection of template protein structures from protein structure databases such as PDB (Rose et al. 2017), FSSP (Holm and Sander 1996), SCOP (Lo Conte et al. 2000), or CATH (Knudsen and Wiuf 2010), after removing structures with high sequence similarity to one another.
2. Design of a good scoring function that measures the fitness between the target sequence and the templates, based on knowledge of known sequence-structure relationships. It should contain pairwise potentials, mutation potentials, secondary structure compatibility, gap penalties, and environment fitness potentials. The quality of the energy function directly affects alignment accuracy.
3. Threading alignment: the target sequence is aligned with the structure templates by optimizing the designed scoring function. This step is crucial, and computationally demanding, for threading programs that take pairwise contact potentials into account; alternatively, a dynamic programming algorithm is used.
4. Threading prediction: the statistically most probable threading alignment is selected to construct the target structure, and the backbone atoms of the target sequence are placed at the positions aligned with the backbone of the structural template.
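Steps 2 to 4 can be illustrated with a toy threading alignment. The Python sketch below assumes a hypothetical two-class environment string for the template (B for buried, E for exposed), a made-up score table that rewards hydrophobic residues in buried positions, and a linear gap penalty. Because every term depends on only one residue and one template position, global dynamic programming finds the optimal alignment exactly; pairwise contact potentials break this decomposition, which is why threading programs that use them need other search strategies.

```python
HYDROPHOBIC = set("AVILMFWC")


def env_score(residue: str, environment: str) -> int:
    """Hypothetical 3D-1D score for placing a residue in a template environment.

    'B' = buried (core) position, 'E' = exposed (surface) position.
    """
    hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 2 if hydrophobic else -2
    return -1 if hydrophobic else 1


def thread(sequence: str, environments: str, gap: int = -2) -> tuple[int, dict[int, int]]:
    """Global dynamic-programming alignment of a sequence onto template environments.

    Returns the best score and a map {sequence index: template position}.
    """
    n, m = len(sequence), len(environments)
    # score[i][j]: best score for sequence[:i] threaded onto environments[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + env_score(sequence[i - 1], environments[j - 1]),
                score[i - 1][j] + gap,  # residue left unaligned
                score[i][j - 1] + gap,  # template position skipped
            )
    # Traceback: recover which template position each residue occupies
    mapping: dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:
        diagonal = score[i - 1][j - 1] + env_score(sequence[i - 1], environments[j - 1])
        if score[i][j] == diagonal:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], mapping


best, placement = thread("MKVLAEK", "EEBBBEE")  # hypothetical target and template
print(best, placement)
```

Choosing the highest-scoring alignment corresponds to step 4; a real program would then copy the template backbone coordinates for every aligned residue.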
with available homologous protein structure, homology modeling is used, but when
only fold-level homology exists, threading is used for model generation. In other
words, homology modeling handles easier targets, while protein threading handles
harder targets.
Homology modeling relies on the template sequence and sequence homology for prediction, while protein threading uses the structural template and extracts both sequence and structure information from the alignment. In the absence of significant homology, protein threading makes predictions based on the structural information.
When the sequence identity in a sequence alignment is low (<25%), homology modeling may not produce a reliable prediction. In such cases, protein threading can still generate a good prediction if a distant homolog of the target is found.
Ab initio modeling (Klepeis et al. 2005; Liwo et al. 2005), also called de novo modeling (Bradley et al. 2005b), physics-based modeling (Oldziej et al. 2005), or free modeling (Bradley et al. 2005a; Jauch et al. 2007), is a fundamental test of our knowledge of protein folding, that is, of how and why a protein adopts one specific structure out of many possibilities. Ab initio structure prediction applies our understanding of the physicochemical principles of protein folding directly to predict the native conformation of a protein from its amino acid sequence alone, without using previously known structures as a framework, i.e., it predicts structure from scratch. It draws on physical theories such as quantum mechanics and statistical thermodynamics.
Usually, the easiest way to predict the structure of a protein is to find a high-resolution structure of its homolog (or, in some cases, an analog) and use it as a framework to build the model, as in template-based modeling. Often this is not possible, because no corresponding structure is available: the number of known protein structures lags far behind the number of protein sequences. This is plausibly because experimental structure determination is technically difficult, labor-intensive, and time-consuming, whereas the exponential increase in protein sequences can be attributed to the tremendous success of genome sequencing projects. In such cases, efficient computational algorithms that predict 3D structures directly from sequence can help bridge the large gap between the number of protein sequences and the availability of their corresponding structures. Considerable advances are still needed before ab initio methods can handle the enormous system formed by proteins in their natural solvation environment, which requires accurate calculations for thousands of atoms in 3D space.
Ab initio modeling is based on the premise that all the information necessary for a protein to fold into its native conformation resides in its amino acid sequence. In the absence of large kinetic barriers in the free energy landscape, the native conformation of a protein is the lowest free energy conformation for its sequence (Anfinsen 1973), with a few exceptions (Baker and Agard 1994). Protein folding is governed by the physical forces acting on the atoms of the protein, so the most accurate approach to structure prediction would be to consider an all-atom model subjected to these physical forces. However, a representation that contains all atoms of the protein and the surrounding solvent molecules increases the complexity and makes the calculation so expensive that it is beyond current computational capacity. Moreover, representing a huge number of atoms and the interactions between them might not be necessary during the initial phase of the search, when conformations are still far from the native one. Reduced representations of the polypeptide chain are therefore used to reduce calculations and limit the conformational space to a manageable size. This can be done in various ways:
1. An accurate free energy function that is sufficiently close to the true potential that the native structure of a protein corresponds to the thermodynamically most stable state, i.e., the lowest free energy minimum among all possible conformations.
2. An efficient search method that rapidly explores conformational space to identify low-energy states.
3. Efficient criteria for selecting native-like models from all the generated protein conformations.
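The three requirements above (an energy function, a search method, and a selection criterion) can be illustrated with a deliberately tiny example. The Python sketch below uses the two-dimensional HP lattice model, a classic reduced representation that is not discussed in this chapter and is used here only as an assumed illustration: each residue is reduced to H (hydrophobic) or P (polar) and placed on a square lattice, and each non-bonded H-H contact scores -1. For a short chain, exhaustive enumeration of self-avoiding walks serves as the search, and the lowest-energy conformation is selected. The example sequence is hypothetical.

```python
from itertools import product

# Unit steps on a 2D square lattice
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def build_walk(moves):
    """Turn a sequence of moves into lattice coordinates; None if the walk self-intersects."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for m in moves:
        dx, dy = STEPS[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(seq, coords):
    """Energy function: -1 for each H-H pair that touches on the lattice but is not chain-bonded."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips bonded neighbours
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def best_conformation(seq):
    """Search + selection: enumerate all self-avoiding walks, keep the lowest energy."""
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    best_energy, best_coords = None, None
    # Fixing the first step to "R" removes rotated copies of the same fold
    for rest in product("UDLR", repeat=len(seq) - 2):
        coords = build_walk(("R",) + rest)
        if coords is None:
            continue
        e = hp_energy(seq, coords)
        if best_energy is None or e < best_energy:
            best_energy, best_coords = e, coords
    return best_energy, best_coords


energy, coords = best_conformation("HPHPPHHPHH")  # hypothetical 10-residue sequence
print(energy, coords)
```

The number of self-avoiding walks grows exponentially with chain length, so exhaustive enumeration only works for toy chains; this is why practical ab initio methods rely on heuristic search.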
There are two kinds of energy functions: physics-based energy functions and knowledge-based energy functions. In physics-based ab initio methods, atoms are represented only by their atom types, and only the number of electrons is significant. The interactions between atoms are described by quantum mechanics, with the electron charge and the Planck constant as the only fundamental parameters (Hagler et al. 1974; Hagler and Lifson 1974; Weiner et al. 1984). However, a complete quantum-mechanical treatment requires extensive computational resources, even for the structure prediction of small proteins. In practice, therefore, ab initio protein modeling uses compromise force fields with a large number of selected atom types (Weiner et al. 1984; Hagler and Lifson 1974). Physics-based force fields that take all atoms into consideration include AMBER (Weiner et al. 1984; Cornell et al. 1995;
Duan and Kollman 1998; Kaus et al. 2013), CHARMM (Brooks et al. 1983; Neria et al. 1996; MacKerell et al. 1998; Hynninen and Crowley 2014), and OPLS (Jorgensen and Tirado-Rives 1988; Jorgensen and Tirado-Rives 1998; Jorgensen et al. 1996; Kaminski et al. 2001); they differ mainly in the choice of atom types and interaction parameters. These potentials contain terms for bond lengths, bond angles, torsion angles, van der Waals interactions, and electrostatic interactions. Knowledge-based energy functions, in contrast, use empirical energy terms derived from statistics of the existing 3D protein structures in the PDB and can be divided into two categories (Skolnick 2006). One category contains generic, sequence-independent terms such as hydrogen bonding and the local backbone rigidity of a polypeptide chain (Zhang et al. 2003), while the other contains amino acid or protein sequence-dependent terms such as pairwise residue contact potentials (Skolnick et al. 1997), distance-dependent atomic contact potentials (Samudrala and Moult 1998; Shen and Sali 2006; Lu and Skolnick 2001; Zhou and Zhou 2002), and secondary structure propensities (Zhang et al. 2003, 2006; Zhang and Skolnick 2005). The most successful ab initio methods using knowledge-based energy functions are ROSETTA (Simons et al. 1997; Bender et al. 2016) and TASSER (Zhang and Skolnick 2004; Yang and Zhang 2015).
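To make the contrast between the two kinds of energy function concrete, the Python sketch below shows (a) two nonbonded terms of the kind found in physics-based force fields such as AMBER or CHARMM, a 12-6 Lennard-Jones van der Waals term and a Coulomb electrostatic term, and (b) the inverse-Boltzmann relation commonly used to turn observed structural statistics into a knowledge-based potential, E = -kT ln(f_obs / f_ref). All parameter values, the counts, the reference state, and the pseudocount are hypothetical; real force fields use their own parameter sets and functional details, and bonded terms (bonds, angles, torsions) are omitted here.

```python
import math

# Approximate Coulomb conversion factor: kcal/mol with charges in e and distances in Angstrom
COULOMB_K = 332.06


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 van der Waals term; its minimum is -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(q_i: float, q_j: float, r: float, dielectric: float = 1.0) -> float:
    """Electrostatic energy between two point charges."""
    return COULOMB_K * q_i * q_j / (dielectric * r)


def inverse_boltzmann(obs_counts, ref_counts, kT=0.593, pseudo=1.0):
    """Knowledge-based energy per bin: -kT * ln(f_obs / f_ref).

    obs_counts: e.g. residue-pair contacts per distance bin observed in known structures
    ref_counts: counts expected in an assumed reference state
    kT = 0.593 kcal/mol corresponds to roughly 298 K.
    """
    total_obs = sum(obs_counts) + pseudo * len(obs_counts)
    total_ref = sum(ref_counts) + pseudo * len(ref_counts)
    energies = []
    for o, r in zip(obs_counts, ref_counts):
        f_obs = (o + pseudo) / total_obs
        f_ref = (r + pseudo) / total_ref
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


print(lennard_jones(3.8, 0.2, 3.4), coulomb(0.5, -0.5, 4.0))
# Hypothetical counts over three distance bins: the first bin is enriched
print(inverse_boltzmann([120, 60, 20], [60, 60, 60]))
```

Bins that are observed more often than the reference state predicts receive negative (favorable) energies, and depleted bins receive positive ones; the choice of reference state largely determines the quality of such potentials.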
1. Monte Carlo simulations – Simulated annealing (SA) is the most commonly used
method (Kirkpatrick et al. 1983; Lee 1993).
2. Molecular dynamics simulations.
3. Genetic algorithm – Conformational space annealing (CSA) is one of the most
widely used genetic algorithms (Lee et al. 1998).
4. Mathematical optimization (Klepeis et al. 2005; Klepeis and Floudas 2003).
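Of the search strategies listed above, Monte Carlo simulated annealing is the easiest to sketch. The Python example below is a generic, minimal version under assumed settings: a one-dimensional coordinate, uniform random trial moves, a geometric cooling schedule, and the Metropolis acceptance rule. The function names, parameters, and the toy energy landscape are hypothetical; in a real ab initio program the state would be a full conformation and the moves would perturb it, for example by changing torsion angles.

```python
import math
import random
from typing import Callable


def simulated_annealing(energy: Callable[[float], float], x0: float, step: float = 0.5,
                        t_start: float = 1.0, t_end: float = 1e-3,
                        n_steps: int = 5000, seed: int = 0) -> tuple[float, float]:
    """Return the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(n_steps):
        # Geometric cooling: temperature falls from t_start to t_end
        t = t_start * (t_end / t_start) ** (k / max(n_steps - 1, 1))
        x_new = x + rng.uniform(-step, step)  # random trial move
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-(e_new - e) / t)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def rugged(x: float) -> float:
    """Toy rugged energy landscape: a broad well with many local minima."""
    return 0.1 * (x - 3.0) ** 2 + math.sin(5.0 * x)


print(simulated_annealing(rugged, x0=-8.0))
```

The high initial temperature lets the search climb out of local minima, and slow cooling gradually restricts it to low-energy regions; the schedule and step size usually need tuning for each problem.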
Recent advances in health care and disease prevention have come from a concerted effort to understand disease biology and to develop efficacious drug molecules that correct the underlying abnormality. The field of drug discovery dates back to the late 1800s, when chemists at Bayer synthesized aspirin, one of the first synthetic drugs (Desborough and Keeling 2017; Sneader 2000). Since then, the drug discovery pipeline has moved from relying heavily on identifying inhibitors of target molecules inferred from crystallographic structures (Beddell et al. 1976; Newman and Cragg 2012) to a high-throughput paradigm that uses both computational and wet-lab resources (Doman et al. 2002). This trend has arisen alongside the demand for new medicinal compounds for emerging diseases, as well as the rising cost and financial risk of bringing a drug to market. The estimated cost of bringing a new drug to market has surged from $400 million to $2.6 billion (DiMasi et al. 2003; Basak 2012) and has risen further since. The problem is compounded by the frequent failure of drugs in clinical trials because they do not meet absorption, distribution, metabolism, excretion, and toxicity (ADMET) criteria, and even by the withdrawal of marketed drugs because of unforeseen consequences of their use. The current scenario calls for increasing the productivity of the pharmaceutical sector by screening for new drug targets or effector molecules that elicit the desired effects while meeting the strict criteria laid down by regulatory agencies.
[Figure: drug discovery pipeline (target identification and validation; lead identification; lead optimization; drug candidates; clinical trials; drug molecule)]
Efforts to integrate structural biology and drug discovery pipelines through
computer-aided drug design (CADD) are underway. There are various steps
involved in rational drug design (Fig. 10.5):
Approaches in computational drug discovery can be divided into two categories:
The choice of method for finding an effector molecule depends on the information available (structural knowledge of the target protein or its homologs, and the existence of previously known drugs or compound libraries) and on the computational resources required. In both approaches, each step goes through numerous iterative cycles to arrive at the best possible prediction of the target or ligand molecule and their interaction.
Bioinformatics aids in the analysis of sequences and structures; in the development of algorithms and software for modeling drug-target interactions, building compound libraries, and easy retrieval systems; and in the development of high-throughput screening (HTS) systems (Matter et al. 2001; Scapin 2006; Edwards 2009; Cheng et al. 2013; Lagorce et al. 2015; Villoutreix 2016; Daina et al. 2017; Miteva and Villoutreix 2017; Lagorce et al. 2017).
The first step of the SBDD methodology involves gathering all information on the target of interest: a thorough understanding of the mechanism of disease progression and of the involvement of the target protein in particular stages. The implicated proteins are identified, cloned, and purified, and their structures are solved by X-ray crystallography (after crystallization) or NMR or, if experimental structure determination fails, by a relevant structure prediction method. The structure of the target molecule (usually a protein) is used to analyze its druggability. Not all proteins can act as valid drug targets. To be an effective drug target, a protein must possess an active site that can be inhibited; in other words, it should accommodate ligands (either analogues of the natural ligand or other small molecules) in the active site through electrostatic and other noncovalent interactions. The likelihood of finding a suitable drug target can be assessed from surface and active site properties such as volume, charge, and shape, which can be calculated with tools such as CAST (Liang et al. 1998), CASTp (Dundas et al. 2006), GRASP (Nicholls et al. 1991), VICE (Tripathi and Kellogg 2010), POCKET (Levitt and Banaszak 1992), and the TRAPP web server (Stank et al. 2017). Identifying targets also entails checking that the drug target has no functional overlap with other host proteins, which is inferred from the phylogenetic relationships between the target and host proteins. Structure-activity relationship homology (SARAH; Frye 1999)-based searches analyze and group proteins by sequence similarity and by their ability to bind a ligand, in a high-throughput manner. Proposed drug targets must pass a validation step to qualify for the next rounds of the drug discovery process. Possible means of validating drug targets include gene disruption by deletion, suppression of expression by RNA interference (RNAi) (Smith 2003; Ghosh et al. 2017), or site-directed mutagenesis (Zeng et al. 2010). Reverse (Eyers et al. 1998) and forward (Choi et al. 2014) chemical genetic screens focus on creating or isolating mutants of target proteins that are sensitive to known inhibitors.
Identifying a lead molecule involves searching for a substance with desirable biological activity that may serve as a drug (Di et al. 2009). A ligand molecule that binds only to the target molecule, with medium or high potency, is needed to ensure that only the safest and most bioactive compounds pass through the trial cycles. This further reduces the risk of failure at later stages of the discovery process. A drug molecule should have the basic properties listed in Table 10.5.
Appropriate assay systems for monitoring target-ligand binding should reveal the binding preferences of the particular target molecule, take into account the physiological outcome expected in a living system, satisfy criteria of cost and reproducibility, and be able to assess the effects of the drug. Counter-screening approaches based on bioinformatics analysis rely on finding all possible targets (Davies et al. 2000). A vast pool of biochemical knowledge exists on protein-ligand and protein-analog interactions, and existing knowledge of a target binding to a drug can be applied to a related target protein. Thus, if the structure of the target is known, focused libraries are required. These define a particular set of ligands, i.e., ones focused on one region of chemical space. Various chemical leads have been derived using structural similarity, which includes the development of
Once the target molecule has been identified and biochemically characterized to detect its active site, its ligand-binding pocket is screened against a library of existing compounds to find a suitable ligand. Docking tools such as AutoDock (Osterberg et al. 2002), DOCK (Kuntz et al. 1982), FlexX (Rarey et al. 1996), Glide (Friesner et al. 2004), LigandFit (Venkatachalam et al. 2003), MOE-Dock (Corbeil et al. 2012), and UCSF Dock (Allen et al. 2015) are used for this purpose (Pagadala et al. 2017; de Ruyck et al. 2016; Lohning et al. 2017). Boltzmann-weighted potentials of mean force are derived from structural data on protein-ligand complexes, and a scoring function is used to score and identify candidates. Approaches that analyze empirical changes in free energy and in other thermodynamic parameters upon binding of different ligands to the target are also taken into consideration. Finally, a Gaussian method is used to estimate volume exclusion, and solvent forces are estimated by applying the Poisson-Boltzmann equation to small and large molecules. Thus, if structural data for the target are available, these algorithms can be applied to identify interacting ligands that can serve as candidate drugs based on goodness of fit. Relenza and captopril are well-known drugs developed in this manner.
An important requirement of drug discovery is the availability of compound libraries of small drug-like molecules. To become a potential drug candidate, a ligand should follow Lipinski's rule of five, a set of physical parameters designed to predict the bioavailability of a molecule and other important pharmaceutical characteristics. To ensure maximal bioavailability, a compound should fulfill the following criteria of Lipinski's rule of five (Lipinski 2004; Oprea et al. 2001; Lipinski et al. 2001):
4. The number of hydrogen bond acceptors (counted as all nitrogen and oxygen atoms) is no more than 10.
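The rule of five can be checked mechanically once the four descriptors have been computed (for example with a cheminformatics toolkit; that step is not shown). The Python sketch below assumes precomputed values and uses the commonly quoted Lipinski thresholds (molecular weight 500 Da, logP 5, 5 hydrogen bond donors, 10 hydrogen bond acceptors). The option to tolerate one violation reflects a widely used convention and is an assumption here; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors (computing them from a structure is not shown)."""
    mol_weight: float  # daltons
    log_p: float       # octanol-water partition coefficient
    h_donors: int      # OH + NH groups
    h_acceptors: int   # N + O atoms


def lipinski_violations(d: Descriptors) -> list[str]:
    """List the rule-of-five criteria that the compound fails."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most `max_violations` failed criteria."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate values for aspirin, and a hypothetical large lipophilic compound
aspirin = Descriptors(mol_weight=180.16, log_p=1.2, h_donors=1, h_acceptors=4)
bulky = Descriptors(mol_weight=650.0, log_p=6.2, h_donors=2, h_acceptors=8)
for name, d in [("aspirin", aspirin), ("bulky", bulky)]:
    print(name, lipinski_violations(d), passes_rule_of_five(d))
```

Returning the list of failed criteria, rather than a single yes/no, makes it easy to see which property a medicinal chemist would need to change.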
Many variations of this rule have been introduced to refine the assessment of druglikeness (Ghose et al. 1999; Xu and Stevenson 2000; Avdeef 2001; Tice 2001, 2002; Veber et al. 2002; Congreve et al. 2003; Lovering et al. 2009; Meanwell 2011; Leeson 2012; Vallianatou et al. 2015; Meanwell 2016; Shekhawat and Pokharkar 2017). Nonetheless, the abovementioned measures form the basis of the well-established set of ADMET properties.
Recent developments in pharmacokinetics have focused on alternative ways of defining parameters that benchmark the properties a compound should possess to enter the lead discovery process. To quantify druglikeness, the concept of desirability has been used to build a quantitative metric called the quantitative estimate of druglikeness (QED; Harrington 1965; Derringer and Suich 1980; Bickerton et al. 2012). The QED approach assigns desirability values to a molecule, for its assessment as a drug, by transforming each of several molecular properties with a desirability function. These values are then combined into a single numerical QED value ranging from 0 to 1, signifying an unfavorable to a highly favorable candidate. Desirability is a simple but powerful approach to multi-criteria optimization. It can be used in numerous drug discovery applications, such as compound selection, library design, molecular target prioritization, prediction of central nervous system permeation, and estimating the reliability of screening data. It takes several numeric parameters measured on different scales, transforms each with an individual desirability function, and combines the results into a single dimensionless score. A series of desirability functions (d_i) is derived for a particular compound, each corresponding to a different molecular descriptor, and these are combined into the QED by taking their geometric mean, as shown in the following QED equation:
$$\mathrm{QED} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln d_i\right)$$
The eight widely used molecular properties from which the desirabilities are derived are molecular weight, octanol-water partition coefficient, number of hydrogen bond donors, number of hydrogen bond acceptors, number of aromatic rings, number of rotatable bonds, molecular polar surface area, and number of structural alerts (Bickerton et al. 2012). They were selected for their relevance in determining druglikeness.
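The geometric mean in the QED equation is straightforward to compute once the individual desirability values d_i are known. The Python sketch below assumes the d_i have already been obtained from the property-specific desirability functions (whose fitted parameters are given by Bickerton et al. 2012 and are not reproduced here); the eight example values and the function name are hypothetical. For real molecules, RDKit's `rdkit.Chem.QED` module provides a full implementation.

```python
import math


def qed_from_desirabilities(desirabilities: list[float]) -> float:
    """Unweighted QED = exp((1/n) * sum(ln d_i)), i.e. the geometric mean of the d_i."""
    if not desirabilities:
        raise ValueError("need at least one desirability value")
    if any(not 0.0 <= d <= 1.0 for d in desirabilities):
        raise ValueError("desirability values must lie in [0, 1]")
    if any(d == 0.0 for d in desirabilities):
        return 0.0  # ln(0) is undefined; the geometric mean is 0
    return math.exp(sum(math.log(d) for d in desirabilities) / len(desirabilities))


# Hypothetical desirabilities for the eight properties, in the order listed above
d_values = [0.90, 0.85, 0.95, 0.80, 0.70, 0.90, 0.75, 1.00]
print(f"QED = {qed_from_desirabilities(d_values):.3f}")
```

Using the geometric rather than the arithmetic mean means that a single very poor property pulls the score down sharply, and a desirability of zero for any property gives a QED of zero.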
various software to query the chemical libraries. Once the lead is identified, it needs to be optimized to increase its efficacy and its specificity for the target.
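Querying a compound library by structural similarity, as mentioned above for deriving chemical leads, is often done by comparing molecular fingerprints with the Tanimoto coefficient. The Python sketch below assumes each molecule has already been encoded as a set of "on" fingerprint bits; the bit sets, compound names, and function names are hypothetical. Library members are ranked by their Tanimoto similarity, |A ∩ B| / |A ∪ B|, to a query lead.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| for sets of 'on' bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Return (compound, similarity) pairs, most similar first."""
    scores = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one substructure bit
lead = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9, 12},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 8},
}
for name, score in rank_by_similarity(lead, library):
    print(f"{name}: {score:.2f}")
```

Compounds that rank highly are candidate analogues of the lead; the similarity threshold used to pick them is a project-specific choice.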
Chemical leads that pass the initial screening process may still require further optimization to improve their potency. Solving complex crystal structures of target-ligand complexes with variable side chains, and determining kinetic parameters for the binding of ligand derivatives to the target, are inherently difficult, which makes it challenging to perform this task in a high-throughput manner. Computational approaches to lead optimization rely on designing derivatives of lead compounds by adding various side chains, followed by predicting 3D models of the target-ligand complexes and their virtual ADMET profiles (Cheng et al. 2013; Honorio et al. 2013; Meanwell 2011).
The final optimized candidate drugs (CDs) then pass through a preclinical phase (animal model studies) and a series of clinical trials: phase I (studies on healthy human volunteers), phase II (selection of the dose regimen and evaluation of safety and efficacy in patients), phase III (testing the potential drug against a placebo in a large population of patients, after which regulatory authorities can approve its commercial launch), and phase IV (monitoring of long-term effects and any adverse reactions reported by doctors). Drug discovery is thus a lengthy, time-consuming process that requires the aid of computational methods to cut time and cost at various steps. This in turn requires efficient prediction algorithms, methods to model target-ligand interactions efficiently, efficient software, databases, and retrieval tools.
Thus, structural bioinformatics is not only an integral part of structural biology
but is also indispensable in drug discovery and health care.
References
Adrian M, Heddi B, Phan AT (2012) NMR spectroscopy of G-quadruplexes. Methods 57(1):11–24. https://doi.org/10.1016/j.ymeth.2012.05.003
Agarwal T, Jayaraj G, Pandey SP, Agarwala P, Maiti S (2012) RNA G-quadruplexes: G-quadruplexes with “U” turns. Curr Pharm Des 18(14):2102–2111
Ahmed YL, Ficner R (2014) RNA synthesis and purification for structural studies. RNA Biol 11(5):427–432. https://doi.org/10.4161/rna.28076
Allen WJ, Balius TE, Mukherjee S, Brozell SR, Moustakas DT, Lang PT, Case DA, Kuntz ID, Rizzo RC (2015) DOCK 6: Impact of new features and current docking performance. J Comput Chem 36(15):1132–1156. https://doi.org/10.1002/jcc.23905
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402
Amunts A, Brown A, Toots J, Scheres SH, Ramakrishnan V (2015) Ribosome. The structure of the human mitochondrial ribosome. Science 348(6230):95–98. https://doi.org/10.1126/science.aaa1193
Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J (2000a) The
protein data bank and the challenge of structural genomics. Nat Struct Biol 7(Suppl):957–959.
https://doi.org/10.1038/80734
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE
(2000b) The protein data bank. Nucleic Acids Res 28(1):235–242
Bernstein FC, Koetzle TF, Williams GJ, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O,
Shimanouchi T, Tasumi M (1977) The protein data bank: a computer-based archival file for
macromolecular structures. J Mol Biol 112(3):535–542
Bharat TA, Scheres SH (2016) Resolving macromolecular structures from electron cryo-
tomography data using subtomogram averaging in RELION. Nat Protoc 11(11):2054–2065.
https://doi.org/10.1038/nprot.2016.124
Bharat Tanmay A, Russo Christopher J, Löwe J, Passmore Lori A, Scheres Sjors H (2015)
Advances in single-particle electron cryomicroscopy structure determination applied to
sub-tomogram averaging. Structure (London, England:1993) 23(9):1743–1753. https://doi.
org/10.1016/j.str.2015.06.026
Bhattacharya D, Cao R, Cheng J (2016) UniCon3D: de novo protein structure prediction using
united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics
(Oxford, England) 32(18):2791–2799. https://doi.org/10.1093/bioinformatics/btw316
Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG,
Bertoni M, Bordoli L, Schwede T (2014) SWISS-MODEL: modelling protein tertiary and
quaternary structure using evolutionary information. Nucleic Acids Res 42(Web Server
issue):W252–W258. https://doi.org/10.1093/nar/gku340
Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL (2012) Quantifying the chemical
beauty of drugs. Nat Chem 4(2):90–98. https://doi.org/10.1038/nchem.1243
Binkowski TA, Freeman P, Liang J (2004) pvSOAR: detecting similar surface patterns of pocket
and void surfaces of amino acid residues on proteins. Nucleic Acids Res 32(Web Server issue):
W555–W558. https://doi.org/10.1093/nar/gkh390
Blake JD, Cohen FE (2001) Pairwise sequence alignment below the twilight zone. J Mol Biol 307
(2):721–735
Blaszczyk M, Jamroz M, Kmiecik S, Kolinski A (2013) CABS-fold: Server for the de novo and
consensus-based prediction of protein structure. Nucleic Acids Res 41(Web Server issue):
W406–W411. https://doi.org/10.1093/nar/gkt462
Bowie JU, Luthy R, Eisenberg D (1991) A method to identify protein sequences that fold into a
known three-dimensional structure. Science 253(5016):164–170
Bradley P, Malmstrom L, Qian B, Schonbrun J, Chivian D, Kim DE, Meiler J, Misura KM, Baker D
(2005a) Free modeling with Rosetta in CASP6. Proteins 61(Suppl 7):128–134
Bradley P, Misura KM, Baker D (2005b) Toward high-resolution de novo structure prediction for
small proteins. Science 309(5742):1868–1871
Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM:
a program for macromolecular energy, minimization, and dynamics calculations. J Comput
Chem 4(2):187–217
Burge S, Parkinson GN, Hazel P, Todd AK, Neidle S (2006) Quadruplex DNA: sequence, topology
and structure. Nucleic Acids Res 34(19):5402–5415. https://doi.org/10.1093/nar/gkl655
Campbell NH, Parkinson GN (2007) Crystallographic studies of quadruplex nucleic acids. Methods
(San Diego, Calif) 43(4):252–263. https://doi.org/10.1016/j.ymeth.2007.08.005
Chen JL, Greider CW (2005) Functional analysis of the pseudoknot structure in human telomerase
RNA. Proc Natl Acad Sci U S A 102(23):8080–8085; discussion 8077–8089. https://doi.org/10.1073/pnas.0502259102
Chen X, Ramakrishnan B, Sundaralingam M (1995) Crystal structures of B-form DNA-RNA
chimers complexed with distamycin. Nat Struct Biol 2(9):733–735
Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW,
Richardson JS, Richardson DC (2010) MolProbity: all-atom structure validation for
Di L, Kerns EH, Carter GT (2009) Drug-like property concepts in pharmaceutical design. Curr
Pharm Des 15(19):2184–2194
Dickerhoff J, Haase L, Langel W, Weisz K (2017) Tracing effects of fluorine substitutions on
G-Quadruplex conformational changes. ACS Chem Biol 12(5):1308–1315. https://doi.org/10.1021/acschembio.6b01096
DiMasi JA, Hansen RW, Grabowski HG (2003) The price of innovation: new estimates of drug
development costs. J Health Econ 22(2):151–185. https://doi.org/10.1016/S0167-6296(02)00126-1
Doherty EA, Doudna JA (2000) Ribozyme structures and mechanisms. Annu Rev Biochem
69:597–615. https://doi.org/10.1146/annurev.biochem.69.1.597
Dolinnaya NG, Ogloblina AM, Yakubovskaya MG (2016) Structure, properties, and biological
relevance of the DNA and RNA G-Quadruplexes: overview 50 years after their discovery.
Biochem Biokhimiia 81(13):1602–1649. https://doi.org/10.1134/s0006297916130034
Doman TN, McGovern SL, Witherbee BJ, Kasten TP, Kurumbail R, Stallings WC, Connolly DT,
Shoichet BK (2002) Molecular docking and high-throughput screening for novel inhibitors of
protein tyrosine phosphatase-1B. J Med Chem 45(11):2213–2221
Dorn M, E Silva MB, Buriol LS, Lamb LC (2014) Three-dimensional protein structure prediction:
methods and computational strategies. Comput Biol Chem 53:251–276. https://doi.org/10.1016/j.compbiolchem.2014.10.001
Duan Y, Kollman PA (1998) Pathways to a protein folding intermediate observed in a
1-microsecond simulation in aqueous solution. Science 282(5389):740–744. https://doi.org/10.1126/science.282.5389.740
Dudek CA, Dannheim H, Schomburg D (2017) BrEPS 2.0: optimization of sequence pattern
prediction for enzyme annotation. PLoS One 12(7):e0182216. https://doi.org/10.1371/journal.pone.0182216
Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, Liang J (2006) CASTp: computed atlas of
surface topography of proteins with structural and topographical mapping of functionally
annotated residues. Nucleic Acids Res 34(Web Server issue):W116–W118. https://doi.org/10.1093/nar/gkl282
Edwards PJ (2009) Current parallel chemistry principles and practice: application to the discovery
of biologically active molecules. Curr Opin Drug Discov Dev 12(6):899–914
Eisenberg D (2003) The discovery of the α-helix and β-sheet, the principal structural features of
proteins. Proc Natl Acad Sci U S A 100(20):11207–11210. https://doi.org/10.1073/pnas.2034522100
Eisenberg D, Luthy R, Bowie JU (1997) VERIFY3D: assessment of protein models with three-
dimensional profiles. Methods Enzymol 277:396–404
Eyers PA, Craxton M, Morrice N, Cohen P, Goedert M (1998) Conversion of SB 203580-
insensitive MAP kinase family members to drug-sensitive forms by a single amino-acid
substitution. Chem Biol 5(6):321–328
Fay MM, Lyons SM, Ivanov P (2017) RNA G-quadruplexes in biology: principles and molecular
mechanisms. J Mol Biol 429(14):2127–2147. https://doi.org/10.1016/j.jmb.2017.05.017
Ferre-D’Amare AR, Doudna JA (1999) RNA folds: insights from recent crystal structures. Annu
Rev Biophys Biomol Struct 28:57–73. https://doi.org/10.1146/annurev.biophys.28.1.57
Floudas CA (2007) Computational methods in protein structure prediction. Biotechnol Bioeng 97
(2):207–213. https://doi.org/10.1002/bit.21411
Frank J (2017) Advances in the field of single-particle cryo-electron microscopy over the last
decade. Nat Protoc 12(2):209–212. https://doi.org/10.1038/nprot.2017.004
Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH,
Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) Glide: a new approach for rapid,
accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47
(7):1739–1749. https://doi.org/10.1021/jm0306430
Nugent CI, Lundblad V (1998) The telomerase reverse transcriptase: components and regulation.
Genes Dev 12(8):1073–1085
Oefner C, D’Arcy A, Hennig M, Winkler FK, Dale GE (2000) Structure of human neutral
endopeptidase (Neprilysin) complexed with phosphoramidon. J Mol Biol 296(2):341–349.
https://doi.org/10.1006/jmbi.1999.3492
Oldziej S, Czaplewski C, Liwo A, Chinchio M, Nanias M, Vila JA, Khalili M, Arnautova YA,
Jagielska A, Makowski M, Schafroth HD, Kazmierkiewicz R, Ripoll DR, Pillardy J, Saunders
JA, Kang YK, Gibson KD, Scheraga HA (2005) Physics-based protein-structure prediction
using a hierarchical protocol based on the UNRES force field: assessment in two blind tests.
Proc Natl Acad Sci U S A 102(21):7547–7552
Oprea TI, Davis AM, Teague SJ, Leeson PD (2001) Is there a difference between leads and drugs?
A historical perspective. J Chem Inf Comput Sci 41(5):1308–1315
Orlov I, Myasnikov AG, Andronov L, Natchiar SK, Khatter H, Beinsteiner B, Menetret JF,
Hazemann I, Mohideen K, Tazibt K, Tabaroni R, Kratzat H, Djabeur N, Bruxelles T,
Raivoniaina F, Pompeo LD, Torchy M, Billas I, Urzhumtsev A, Klaholz BP (2017) The
integrative role of cryo electron microscopy in molecular and cellular structural biology. Biol
Cell 109(2):81–93. https://doi.org/10.1111/boc.201600042
Osterberg F, Morris GM, Sanner MF, Olson AJ, Goodsell DS (2002) Automated docking to
multiple target structures: incorporation of protein mobility and structural water heterogeneity
in AutoDock. Proteins 46(1):34–40
Pagadala NS, Syed K, Tuszynski J (2017) Software for molecular docking: a review. Biophys Rev 9
(2):91–102. https://doi.org/10.1007/s12551-016-0247-1
Pandey RB, Jacobs DJ, Farmer BL (2017) Preferential binding effects on protein structure and
dynamics revealed by coarse-grained Monte Carlo simulation. J Chem Phys 146(19):195101.
https://doi.org/10.1063/1.4983222
Paquet E, Viktor HL (2015) Molecular dynamics, Monte Carlo simulations, and Langevin dynamics:
a computational review. BioMed Res Int 2015:183918. https://doi.org/10.1155/2015/183918
Parkinson GN, Lee MP, Neidle S (2002) Crystal structure of parallel quadruplexes from human
telomeric DNA. Nature 417(6891):876–880. https://doi.org/10.1038/nature755
Patel DJ, Phan AT, Kuryavyi V (2007) Human telomere, oncogenic promoter and 5’-UTR
G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics. Nucleic
Acids Res 35(22):7429–7455. https://doi.org/10.1093/nar/gkm711
Patel TR, Chojnowski G, Astha, Koul A, McKenna SA, Bujnicki JM (2017) Structural studies of
RNA-protein complexes: a hybrid approach involving hydrodynamics, scattering, and computational
methods. Methods (San Diego, Calif) 118:146–162. https://doi.org/10.1016/j.ymeth.2016.12.002
Pauling L, Corey RB (1951) Configuration of polypeptide chains. Nature 168(4274):550–551
Pauling L, Corey RB, Branson HR (1951) The structure of proteins; two hydrogen-bonded helical
configurations of the polypeptide chain. Proc Natl Acad Sci U S A 37(4):205–211
Perrone R, Lavezzo E, Palu G, Richter SN (2017) Conserved presence of G-quadruplex forming
sequences in the Long Terminal Repeat Promoter of Lentiviruses. Sci Rep 7(1):2018.
https://doi.org/10.1038/s41598-017-02291-1
Piccirilli JA, Koldobskaya Y (2011) Crystal structure of an RNA polymerase ribozyme in complex
with an antibody fragment. Philos Trans R Soc Lond Ser B Biol Sci 366(1580):2918–2928.
https://doi.org/10.1098/rstb.2011.0144
Pillardy J, Czaplewski C, Liwo A, Lee J, Ripoll DR, Kazmierkiewicz R, Oldziej S, Wedemeyer WJ,
Gibson KD, Arnautova YA, Saunders J, Ye YJ, Scheraga HA (2001) Recent improvements in
prediction of protein structure by global optimization of a potential energy function. Proc Natl
Acad Sci U S A 98(5):2329–2333. https://doi.org/10.1073/pnas.041609598
Porrini M, Rosu F, Rabin C, Darre L, Gomez H, Orozco M, Gabelica V (2017) Compaction of
duplex nucleic acids upon native electrospray mass spectrometry. ACS Cent Sci 3(5):454–461.
https://doi.org/10.1021/acscentsci.7b00084
Samudrala R, Xia Y, Huang E, Levitt M (1999) Ab initio protein structure prediction using a
combined hierarchical approach. Proteins Suppl 3:194–198
Sander C, Schneider R (1991) Database of homology-derived protein structures and the structural
meaning of sequence alignment. Proteins 9(1):56–68
Sathyamoorthy B, Shi H, Zhou H, Xue Y, Rangadurai A, Merriman DK, Al-Hashimi HM (2017)
Insights into Watson-Crick/Hoogsteen breathing dynamics and damage repair from the solution
structure and dynamic ensemble of DNA duplexes containing m1A. Nucleic Acids Res 45
(9):5586–5601. https://doi.org/10.1093/nar/gkx186
Scapin G (2006) Structural biology and drug discovery. Curr Pharm Des 12(17):2087–2097
Schlick T, Pyle AM (2017) Opportunities and challenges in RNA structural modeling and design.
Biophys J 113(2):225–234. https://doi.org/10.1016/j.bpj.2016.12.037
Sedova A, Banavali NK (2015) RNA approaches the B-form in stacked single strand dinucleotide
contexts. Biopolymers. https://doi.org/10.1002/bip.22750
Shekhawat PB, Pokharkar VB (2017) Understanding peroral absorption: regulatory aspects and
contemporary approaches to tackling solubility and permeability hurdles. Acta Pharm Sin B 7
(3):260–280. https://doi.org/10.1016/j.apsb.2016.09.005
Shen MY, Sali A (2006) Statistical potential for assessment and prediction of protein structures.
Protein Sci: Publ Protein Soc 15(11):2507–2524. https://doi.org/10.1110/ps.062416606
Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from
fragments with similar local sequences using simulated annealing and Bayesian scoring
functions. J Mol Biol 268(1):209–225
Skiniotis G, Southworth DR (2016) Single-particle cryo-electron microscopy of macromolecular
complexes. Microscopy (Oxford, England) 65(1):9–22. https://doi.org/10.1093/jmicro/dfv366
Skolnick J (2006) In quest of an empirical potential for protein structure prediction. Curr Opin
Struct Biol 16(2):166–171. https://doi.org/10.1016/j.sbi.2006.02.004
Skolnick J, Jaroszewski L, Kolinski A, Godzik A (1997) Derivation and testing of pair potentials for
protein folding. When is the quasichemical approximation correct? Protein Sci: Publ Protein Soc
6(3):676–688. https://doi.org/10.1002/pro.5560060317
Sleator RD, Walsh P (2010) An overview of in silico protein function prediction. Arch Microbiol
192(3):151–155. https://doi.org/10.1007/s00203-010-0549-9
Smith C (2003) Drug target validation: hitting the target. Nature 422(6929):341, 343, 345 passim.
https://doi.org/10.1038/422341a
Sneader W (2000) The discovery of aspirin: a reappraisal. BMJ: Br Med J 321(7276):1591–1594
Söding J, Biegert A, Lupas AN (2005) The HHpred interactive server for protein homology
detection and structure prediction. Nucleic Acids Res 33(Web Server issue):W244–W248.
https://doi.org/10.1093/nar/gki408
Stahl K, Schneider M, Brock O (2017) EPSILON-CP: using deep learning to combine information
from multiple sources for protein contact prediction. BMC Bioinf 18:303. https://doi.org/10.1186/s12859-017-1713-x
Stank A, Kokh DB, Horn M, Sizikova E, Neil R, Panecka J, Richter S, Wade RC (2017) TRAPP
webserver: predicting protein binding site flexibility and detecting transient binding pockets.
Nucleic Acids Res. https://doi.org/10.1093/nar/gkx277
Staple DW, Butcher SE (2005) Pseudoknots: RNA structures with diverse functions. PLoS Biol 3
(6):e213. https://doi.org/10.1371/journal.pbio.0030213
Subramaniam S, Earl LA, Falconieri V, Milne JLS, Egelman EH (2016) Resolution advances in
cryo-EM enable application to drug discovery. Curr Opin Struct Biol 41:194–202. https://doi.org/10.1016/j.sbi.2016.07.009
Sugiki T, Kobayashi N, Fujiwara T (2017) Modern technologies of solution nuclear magnetic
resonance spectroscopy for three-dimensional structure determination of proteins open avenues
for life scientists. Comput Struct Biotechnol J 15:328–339. https://doi.org/10.1016/j.csbj.2017.04.001
Sun LZ, Zhang D, Chen SJ (2017) Theory and modeling of RNA structure and interactions with
metal ions and small molecules. Annu Rev Biophys 46:227–246. https://doi.org/10.1146/annurev-biophys-070816-033920
Takahama K, Takada A, Tada S, Shimizu M, Sayama K, Kurokawa R, Oyoshi T (2013) Regulation
of telomere length by G-quadruplex telomere DNA- and TERRA-binding protein TLS/FUS.
Chem Biol 20(3):341–350. https://doi.org/10.1016/j.chembiol.2013.02.013
Tice CM (2001) Selecting the right compounds for screening: does Lipinski’s Rule of 5 for
pharmaceuticals apply to agrochemicals? Pest Manag Sci 57(1):3–16. https://doi.org/10.1002/1526-4998(200101)57:1<3::aid-ps269>3.0.co;2-6
Tice CM (2002) Selecting the right compounds for screening: use of surface-area parameters. Pest
Manag Sci 58(3):219–233. https://doi.org/10.1002/ps.441
Tilton RF, Dewan JC, Petsko GA (1992) Effects of temperature on protein structure and dynamics:
x-ray crystallographic studies of the protein ribonuclease-A at nine different temperatures from
98 to 320K. Biochemistry 31(9):2469–2481. https://doi.org/10.1021/bi00124a006
Tinoco I Jr, Bustamante C (1999) How RNA folds. J Mol Biol 293(2):271–281. https://doi.org/10.1006/jmbi.1999.3001
Tosatto SC, Toppo S (2006) Large-scale prediction of protein structure and function from sequence.
Curr Pharm Des 12(17):2067–2086
Tripathi A, Kellogg GE (2010) A novel and efficient tool for locating and characterizing protein
cavities and binding sites. Proteins 78(4):825–842. https://doi.org/10.1002/prot.22608
Vaguine AA, Richelle J, Wodak SJ (1999) SFCHECK: a unified set of procedures for evaluating the
quality of macromolecular structure-factor data and their agreement with the atomic model. Acta
Crystallogr D Biol Crystallogr 55(Pt 1):191–205. https://doi.org/10.1107/s0907444998006684
Vallianatou T, Giaginis C, Tsantili-Kakoulidou A (2015) The impact of physicochemical and
molecular properties in drug design: navigation in the “drug-like” chemical space. Adv Exp
Med Biol 822:187–194. https://doi.org/10.1007/978-3-319-08927-0_21
Veber DF, Johnson SR, Cheng HY, Smith BR, Ward KW, Kopple KD (2002) Molecular properties
that influence the oral bioavailability of drug candidates. J Med Chem 45(12):2615–2623
Venclovas C, Margelevicius M (2005) Comparative modeling in CASP6 using consensus approach
to template selection, sequence-structure alignment, and structure assessment. Proteins 61
(Suppl 7):99–105
Venkatachalam CM, Jiang X, Oldfield T, Waldman M (2003) LigandFit: a novel method for the
shape-directed rapid docking of ligands to protein active sites. J Mol Graph Model 21
(4):289–307
Venko K, Roy Choudhury A, Novic M (2017) Computational approaches for revealing the structure
of membrane transporters: case study on bilitranslocase. Comput Struct Biotechnol J
15:232–242. https://doi.org/10.1016/j.csbj.2017.01.008
Villoutreix BO (2016) Combining bioinformatics, chemoinformatics and experimental approaches
to design chemical probes: applications in the field of blood coagulation. Ann Pharm Fr 74
(4):253–266. https://doi.org/10.1016/j.pharma.2016.03.006
Wahl MC, Sundaralingam M (1997) Crystal structures of A-DNA duplexes. Biopolymers 44
(1):45–63. https://doi.org/10.1002/(sici)1097-0282(1997)44:1<45::aid-bip4>3.0.co;2-#
Wan W, Briggs JA (2016) Cryo-electron tomography and subtomogram averaging. Methods
Enzymol 579:329–367. https://doi.org/10.1016/bs.mie.2016.04.014
Wang G, Vasquez KM (2007) Z-DNA, an active element in the genome. Front Biosci
12:4424–4438
Wang AH, Quigley GJ, Kolpak FJ, Crawford JL, van Boom JH, van der Marel G, Rich A (1979)
Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature
282(5740):680–686
Wang Z, Yin P, Lee JS, Parasuram R, Somarowthu S, Ondrechen MJ (2013) Protein function
annotation with Structurally Aligned Local Sites of Activity (SALSAs). BMC Bioinf 14(Suppl
3):S13–S13. https://doi.org/10.1186/1471-2105-14-S3-S13
Wang C, Zhang H, Zheng W-M, Xu D, Zhu J, Wang B, Ning K, Sun S, Li SC, Bu D (2016)
FALCON@home: a high-throughput protein structure prediction server based on remote
homologue recognition. Bioinformatics (Oxford, England) 32(3):462–464. https://doi.org/10.1093/bioinformatics/btv581
Wang S, Sun S, Li Z, Zhang R, Xu J (2017) Accurate de novo prediction of protein contact map by
ultra-deep learning model. PLoS Comput Biol 13(1):e1005324. https://doi.org/10.1371/journal.pcbi.1005324
Watson JD, Crick FH (1953) Molecular structure of nucleic acids; a structure for deoxyribose
nucleic acid. Nature 171(4356):737–738
Webb B, Sali A (2016) Comparative protein structure modeling using MODELLER. Curr Protoc
Bioinformatics 54:5.6.1–5.6.37. https://doi.org/10.1002/cpbi.3
Weichenberger CX, Sippl MJ (2007) NQ-Flipper: recognition and correction of erroneous asparagine
and glutamine side-chain rotamers in protein structures. Nucleic Acids Res 35(Web Server
issue):W403–W406. https://doi.org/10.1093/nar/gkm263
Weiner SJ, Kollman PA, Case DA, Singh UC, Ghio C, Alagona G, Profeta S, Weiner P (1984) A
new force-field for molecular mechanical simulation of nucleic-acids and proteins. J Am Chem
Soc 106(3):765–784. https://doi.org/10.1021/Ja00315a051
Weisser M, Schafer T, Leibundgut M, Bohringer D, Aylett CHS, Ban N (2017) Structural and
functional insights into human re-initiation complexes. Mol Cell 67(3):447–456.e447.
https://doi.org/10.1016/j.molcel.2017.06.032
Weldon C, Eperon IC, Dominguez C (2016) Do we know whether potential G-quadruplexes
actually form in long functional RNA molecules? Biochem Soc Trans 44(6):1761–1768.
https://doi.org/10.1042/bst20160109
Weldon C, Behm-Ansmant I, Hurley LH, Burley GA, Branlant C, Eperon IC, Dominguez C (2017)
Identification of G-quadruplexes in long functional RNAs using 7-deazaguanine RNA. Nat
Chem Biol 13(1):18–20. https://doi.org/10.1038/nchembio.2228
Westbrook JD, Hall RS (1995) DDL. A dictionary description language for structure macromolecular,
V. 2.1.1. Rutgers University NDB-110, New Brunswick
Whisstock JC, Lesk AM (2003) Prediction of protein function from protein sequence and structure.
Q Rev Biophys 36(3):307–340
Wiederstein M, Sippl MJ (2007) ProSA-web: interactive web service for the recognition of errors in
three-dimensional structures of proteins. Nucleic Acids Res 35(Web Server issue):W407–W410.
https://doi.org/10.1093/nar/gkm290
Wilkins MH, Stokes AR, Wilson HR (1953) Molecular structure of deoxypentose nucleic acids.
Nature 171(4356):738–740
Wing R, Drew H, Takano T, Broka C, Tanaka S, Itakura K, Dickerson RE (1980) Crystal structure
analysis of a complete turn of B-DNA. Nature 287(5784):755–758
Wright WE, Tesmer VM, Huffman KE, Levene SD, Shay JW (1997) Normal human chromosomes
have long G-rich telomeric overhangs at one end. Genes Dev 11(21):2801–2809
Wu S, Zhang Y (2008) MUSTER: Improving protein sequence profile–profile alignments by using
multiple sources of structure information. Proteins 72(2):547–556. https://doi.org/10.1002/prot.21945
Xu J, Stevenson J (2000) Drug-like index: a new approach to measure drug-like compounds and
their diversity. J Chem Inf Comput Sci 40(5):1177–1187
Xu D, Zhang Y (2012) Ab initio protein structure assembly using continuous structure fragments
and optimized knowledge-based force field. Proteins 80(7):1715–1735. https://doi.org/10.1002/prot.24065
Yang J, Zhang Y (2015) I-TASSER server: new development for protein structure and function
predictions. Nucleic Acids Res 43(W1):W174–W181. https://doi.org/10.1093/nar/gkv342
Yang H, Guranovic V, Dutta S, Feng Z, Berman HM, Westbrook JD (2004) Automated and
accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta