Ligand-Based+structure-Based Screening

The document discusses ligand-based and structure-based virtual screening techniques used to select potential drug molecules from large databases. It describes how virtual screening can be used to choose compounds for biological testing from in-house databases, external suppliers, or targeted synthesis. The key techniques discussed are ligand-based methods like similarity searching, pharmacophore mapping, and machine learning, as well as structure-based docking when the target protein structure is known. It also covers how molecular fingerprints represent molecules as binary vectors and are used to measure similarity as part of virtual screening workflows.

Uploaded by

Manohar
Ligand-Based and Structure-Based Virtual Screening

The Drug Discovery Challenge
• Right molecule, right target
• High-throughput automation: high-throughput screening, combinatorial chemistry
• Still need to consider carefully what to screen/make
Choosing the right molecule
• Goal: to find a lead compound that can be optimised to give a
drug candidate
– Optimisation: using chemical synthesis to modify the lead molecule
in order to improve its chances of being a successful drug
• The challenge: chemical space is vast
– Estimates vary
• Reymond et al. suggest there are ~1 billion compounds with up to 13
heavy atoms
• There are ~30 million known compounds
• A typical pharmaceutical compound collection contains ~1
million compounds
• High-throughput screening (HTS) allows large numbers (up to 1 million) of
compounds to be tested
– But this is a very small proportion of “available” compounds
– Large-scale screening is expensive
– Not all targets are suitable for HTS

Blum, L.C. & Reymond, J.-L. J. Am. Chem. Soc. 131, 8732-


Virtual Screening
• Virtual screening refers to a range of in-silico techniques used
to search large compound databases to select a smaller
number for biological testing
• Virtual screening can be used to
– Select compounds for screening from in-house databases
– Choose compounds to purchase from external suppliers
– Decide which compounds to synthesise next
• The technique applied depends on the amount of
information available about the particular disease target
Virtual Screening
3D structure of target:
• Unknown: ligand-based methods
– Actives known: similarity searching, pharmacophore mapping
– Actives and inactives known: machine learning methods
• Known: structure-based methods
– Protein-ligand docking
Rationale for similarity searching
• The similar property principle states that
structurally similar molecules tend to have similar
properties (cf. neighbourhood principle)
[Figure: structures of morphine, codeine and heroin]
• Basis of medicinal chemistry efforts and of all
ligand-based virtual screening methods
– Despite the existence of “activity cliffs”
Similarity-based virtual screening
• Given an active reference structure, rank order a
database of compounds on similarity to the
reference
• Select the top-ranking compounds for biological
testing
• Requires a way of measuring the similarity of a pair
of compounds
• But similarity is inherently subjective, so we need to
provide a quantitative basis, a similarity measure, for
ranking structures
• There is no single measure of similarity
Which among these are most similar?
[Figure: banana, orange, basketball, ash gourd, apple, pears, pumpkin, football]
Three components of a similarity measure
• Molecular descriptors
– Numerical values assigned to structures
• Physicochemical properties, e.g., MW, logP, MR, PSA, ...
• 2D properties: fingerprints, topological indices, maximum
common substructures
• 3D properties: fingerprints, molecular fields
• Similarity coefficient
– A quantitative measure of similarity between two sets
of molecular descriptors
• Can also use a weighting function to ensure equal (or
non-equal) contributions from all parts of the
measure
Todeschini & Consonni, Handbook of Molecular Descriptors,
Wiley-VCH, 2009
2D fingerprints: molecules represented as binary vectors
• Each bit in the bit string (binary vector) represents one molecular
fragment. Typical length is ~1000 bits
• The bit string for a molecule records the presence (“1”) or
absence (“0”) of each fragment in the molecule
• Originally developed for speeding up substructure search
– for a query substructure to be present in a database molecule each bit set to
“1” in the query must also be set to “1” in the database structure
• Similarity is based on determining the number of bits that
are common to two structures
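The two bit-string operations described above can be sketched as follows. This is a minimal illustration, assuming fingerprints are stored as Python integers; the 8-bit values are made up for readability and do not come from any real fragment dictionary.

```python
def passes_screen(query_fp: int, db_fp: int) -> bool:
    """True if every bit set in the query is also set in the database molecule."""
    return (query_fp & db_fp) == query_fp


def common_bits(fp_a: int, fp_b: int) -> int:
    """Number of bits set to 1 in both fingerprints."""
    return bin(fp_a & fp_b).count("1")


# Hypothetical 8-bit fingerprints (real ones are ~1000 bits long).
query = 0b00010110
molecule_1 = 0b10110110   # has every query bit set
molecule_2 = 0b10010010   # lacks one of the query bits

print(passes_screen(query, molecule_1))     # True
print(passes_screen(query, molecule_2))     # False
print(common_bits(molecule_1, molecule_2))  # 3
```

Passing the screen only means the query substructure may be present; a full substructure match is still needed to confirm it.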
Example fragments
[Figure: example fragment types: (a) augmented atom, (b) atom sequence, (c) bond sequence, (d) ring composition, (e) ring fusion, (f) atom pair]
Dictionary-based fingerprints: pre-defined fragments, each of
which maps to a single bit. Examples include MACCS keys and BCI
fingerprints
Hashed Fingerprints
• Fragments are generated algorithmically without the need for a
dictionary, e.g., all paths of up to seven non-hydrogen atoms
• Each fragment is processed using several different
hashing functions, each of which sets a single bit in the
fingerprint
• There is a one-to-many mapping between a fragment and bits in
the bit string and a given bit may be set by different fragments
• Examples: Daylight, UNITY fingerprints
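As a rough sketch of the hashing step described above, the following sets one bit per hash function for each fragment. The fragment strings, the 64-bit length and the use of salted SHA-256 are illustrative assumptions, not the scheme used by Daylight or UNITY.

```python
import hashlib

N_BITS = 64      # illustrative; real fingerprints are typically ~1000 bits
N_HASHES = 2     # number of hash functions applied to each fragment


def bit_positions(fragment: str, n_bits: int = N_BITS, n_hashes: int = N_HASHES) -> list[int]:
    """Bit positions set by one fragment: one per (salted) hash function."""
    positions = []
    for seed in range(n_hashes):
        digest = hashlib.sha256(f"{seed}:{fragment}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % n_bits)
    return positions


def hashed_fingerprint(fragments, n_bits: int = N_BITS, n_hashes: int = N_HASHES) -> set[int]:
    """Set of 'on' bit positions for a collection of fragment strings."""
    on_bits = set()
    for fragment in fragments:
        on_bits.update(bit_positions(fragment, n_bits, n_hashes))
    return on_bits


# Hypothetical path fragments written as atom strings.
fragments = ["C", "CC", "CO", "CCO", "C=O", "CC=O"]
fp = hashed_fingerprint(fragments)
print(sorted(fp))
```

Because several fragments can land on the same position, the number of bits set can be smaller than the number of fragments times the number of hash functions.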
MACCS keys are 166-bit 2D structure fingerprints that are commonly
used to measure molecular similarity. Because each bit is
either on (i.e., 1) or off (i.e., 0), MACCS 166 keys can represent
$2^{166}$, i.e., more than $9.3 \times 10^{49}$, distinct fingerprint vectors.

The Morgan fingerprint is basically a reimplementation of the
extended connectivity fingerprint (ECFP).
Molecular Similarity

Molecular similarity is one of the most heavily exploited concepts in
cheminformatics and related areas (such as medicinal chemistry and
drug discovery). It is applied to multiple tasks, including similarity
searching, property prediction, synthesis design, virtual screening,
cluster analysis, and molecular diversity analysis.

However, because molecular similarity is a concept, not a physical
observable, “measuring” molecular similarity is inherently subjective
and context-dependent.

There is no correct or authoritative measure of molecular similarity.

As a result, various similarity measures have been proposed to
quantify the degree of structural similarity between molecules.

Molecular Descriptors. (2019, October 25).
https://chem.libretexts.org/@go/page/192626
In general, these measures involve two principal components:

• Molecular descriptors that represent the structures of the molecules
being compared.
• Similarity coefficient (metric) used to compute a quantitative score for
the degree of similarity based on the weighted values of structural
descriptors.

The molecular descriptors may need to be pre-processed before the
similarity calculation, using a weighting scheme that assigns differing
degrees of importance to various components of molecular descriptors.
For this reason, some papers list the weighting scheme as a third
component of similarity measures.

While some studies have focused on the effects of the weighting
schemes upon similarity calculations, much more attention has been
given to molecular descriptors and similarity coefficients. Therefore,
this chapter also focuses on these two components.
Molecular descriptors

There are many molecular descriptors that capture different aspects of
molecules, but they are broadly classified according to their “dimensionality”.

One-dimensional (1-D) descriptors include bulk properties and
physicochemical parameters (e.g., log P, molecular weight, polar surface
area).

Two-dimensional (2-D) descriptors include structural fragments or connectivity
indices derived from the 2-D representation of the molecule.

Three-dimensional (3-D) descriptors, such as molecular shape, are derived
from 3-D molecular structures (i.e., 3-D coordinates of the atoms in the
molecule).

Here the focus is on 2-D molecular fingerprints, which encode the 2-D
structure of molecules. While many molecular fingerprints have been
developed, we discuss two types, structural keys and
hashed fingerprints, because they are more widely used than others.
Fig. 1. (above) Two molecules are shown along with their respective bit substructures
highlighted for comparison. The number of bits and the designations used in this figure are
for illustrative purposes only. A true fingerprint would be much longer.

In structural keys, the structure of a molecule is encoded into a binary bit string (that is,
a sequence of 0’s and 1’s), each bit of which corresponds to a “pre-defined” structural
feature (e.g., substructure or fragment). If the molecule has a pre-defined feature, the
bit position corresponding to this feature is set to 1 (ON). Otherwise, it is set to 0
(OFF). It is important to understand that structural keys cannot encode structural
features that are not pre-defined in the fragment library.

Examples are the MACCS keys and PubChem Fingerprints.


MACCS keys

The MACCS (Molecular ACCess System) keys are one of the most commonly used
structural keys. They are sometimes referred to as the MDL keys, named after the
company that developed them, MDL Information Systems (now BIOVIA).

While there are two sets of MACCS keys (one with 960 keys and the other containing
a subset of 166 keys), only the shorter set of fragment definitions is available to the public.
These 166 public keys are implemented in popular open-source cheminformatics
software packages, including RDKit, OpenBabel and CDK.

The fragment definitions for the MACCS 166 keys can be found in this document:
https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/MACCSkeys.py
PubChem fingerprints

The PubChem fingerprint is an 881-bit-long structural key, which is used by
PubChem for similarity searching (interactively through the PubChem Homepage
or programmatically through PUG-REST). It is also used for structure neighboring,
which “pre-computes” a list of similar chemical structures for each compound.
This pre-computed list is accessible through the Compound Summary page (the
Related Compounds and Related Compounds with Annotation sections).

The fragment dictionary of the PubChem fingerprint is organized in seven
sections, as described in the following document:

ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf
Hashed Fingerprints

An alternative to structural keys is hashed fingerprints. In contrast to structural keys,
hashed fingerprints do not require a pre-defined fragment library. Instead, they are
generated by enumerating all fragments of the molecule up to
a certain size and then converting these fragments into numeric values
using a “hash” function (https://en.wikipedia.org/wiki/Hash_function). These
numeric values are used to indicate bit positions in the hashed fingerprint.

Hash functions are used to map data of arbitrary size to “fixed-size” values.
Enumerating all possible fragments within a molecule may result in a very large
number of fragments. Hashing them into values within a fixed range inevitably
results in “bit collisions”, in which different fragments are converted into the same
numeric value (and the same bit position). Because of this, there is no one-to-one
correspondence between fragments and fingerprint bits (in contrast to structural
keys).

Hashed fingerprints may be further classified into topological (path-based)
fingerprints and circular fingerprints, according to the way in which the fragments
are enumerated.
Path-based fingerprints

Fig. Shown above is a topological fingerprint with multiple collisions between fragments.
A bit collision is represented by two or more arrows from the
molecular fragments pointing to the same bit. Starting with the chlorine atom, all
of the possible fragments are shown. In a true fingerprint, however, each atom could be
the starting point, which would allow for many more fragments than this example
shows. The more bits allowed, the less likely bit collisions become; here, two
collisions occur because only 10 bits are used.

In this type of fingerprint, fragments of the molecule are generated by following a
(usually linear) path up to a certain number of bonds within the molecule. The most
well-known example of path-based fingerprints is the Daylight fingerprint.
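The path enumeration and the effect of fingerprint length on collisions can be sketched as below. This is an assumption-laden toy: the molecule is a hand-written graph of 2-chloroethanol, bond orders are ignored, and the SHA-256-based `bit_for` helper is hypothetical rather than the Daylight hashing scheme.

```python
import hashlib

# Toy molecule: 2-chloroethanol, Cl-C-C-O, hydrogens omitted.
atoms = {0: "Cl", 1: "C", 2: "C", 3: "O"}
bonds = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def linear_paths(atoms, bonds, max_bonds):
    """All linear atom paths with at most max_bonds bonds, as tuples of atom labels."""
    found = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        found.add(min(labels, labels[::-1]))  # a path and its reverse are one fragment
        if len(path) - 1 < max_bonds:
            for nxt in bonds[path[-1]]:
                if nxt not in path:           # never visit an atom twice
                    extend(path + [nxt])

    for start in atoms:                       # every atom is a starting point
        extend([start])
    return found


def bit_for(fragment, n_bits):
    """Hypothetical hash of a fragment to a single bit position."""
    digest = hashlib.sha256("-".join(fragment).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_bits


paths = linear_paths(atoms, bonds, max_bonds=3)
for n_bits in (10, 1024):
    on_bits = {bit_for(p, n_bits) for p in paths}
    print(n_bits, "bits:", len(paths), "fragments ->", len(on_bits), "bits set")
```

Whenever fewer bits are set than there are fragments, at least one collision has occurred; this becomes less likely as the fingerprint gets longer.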
Circular fingerprints

Circular fingerprints are generated by considering the “circular” environment of each
atom up to a given “radius” or “diameter”. Examples of circular fingerprints are
extended-connectivity fingerprints (ECFPs). ECFPs are generated using a variant of
the Morgan algorithm, which is a method for solving the molecular isomorphism
problem (i.e., how to identify identical molecules that have different atom
numberings). Different flavors of ECFPs may be generated by selecting a different
maximum diameter of the circular atom neighborhood; they are referred to as
ECFP2, ECFP4, ECFP6, etc., where the digit at the end indicates the maximum
diameter used to generate the fingerprint. The most commonly used
ones are ECFP4 and ECFP6.

Another example of circular fingerprints is functional-class fingerprints (FCFPs),
which are a variation of ECFPs. FCFPs are more abstract in that they encode
the roles of atoms rather than the atoms themselves. At the initial stage of FCFP
generation, each atom in the
molecule is assigned a special code that represents one of the atom roles (e.g.,
hydrogen-bond acceptor and donor, negatively or positively ionizable, aromatic, and
halogen), and these codes (not the atoms) are used to generate FCFPs, through the
same process as ECFPs.
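The iterative neighbourhood update behind circular fingerprints can be sketched roughly as follows. This is a simplified illustration, not the published ECFP algorithm: the initial atom invariant is just (element, degree), bond orders are ignored, and duplicate environments are not removed. The toy graph and the `stable_hash` helper are assumptions.

```python
import hashlib

# Toy molecule: ethanol, C-C-O, hydrogens omitted.
atoms = {0: "C", 1: "C", 2: "O"}
bonds = {0: [1], 1: [0, 2], 2: [1]}


def stable_hash(value) -> int:
    """Deterministic 32-bit hash of any value with a stable repr()."""
    return int.from_bytes(hashlib.sha256(repr(value).encode()).digest()[:4], "big")


def circular_identifiers(atoms, bonds, radius):
    """Identifiers of all atom environments up to the given radius (in bonds)."""
    # Iteration 0: each identifier depends only on the atom itself.
    current = {i: stable_hash((label, len(bonds[i]))) for i, label in atoms.items()}
    features = set(current.values())
    for _ in range(radius):
        # Each atom absorbs the sorted identifiers of its direct neighbours,
        # so its environment grows by one bond per iteration.
        current = {
            i: stable_hash((current[i], tuple(sorted(current[j] for j in bonds[i]))))
            for i in atoms
        }
        features.update(current.values())
    return features


ecfp2_like = circular_identifiers(atoms, bonds, radius=1)  # diameter 2
ecfp4_like = circular_identifiers(atoms, bonds, radius=2)  # diameter 4
print(len(ecfp2_like), len(ecfp4_like))
```

An FCFP-like variant would follow the same loop but start from role codes (donor, acceptor, aromatic, and so on) instead of element labels.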
 
Other descriptors: Circular substructures
• Each atom is represented by a string
of integers obtained by an adaptation
of the Morgan algorithm
• Pipeline Pilot (Accelrys) descriptors,
e.g., ECFP2, ECFP4, ECFP6, FCFP2, ...
• ECFP fragments encode atomic
type, charge and mass
• FCFP fragments encode six generalised
atom-types
• 2, 4 or 6 denotes the diameter (in
bonds) of the circular
substructure
• RDKit variant: Morgan,
FeatMorgan
Similarity coefficients
• Tanimoto coefficient for binary bit
strings

$$SIM_{RD} = \frac{C}{R + D - C}$$

– C bits set in common in the reference and database
structure
– R bits set in reference structure
– D bits set in database structure
• More complex form for use with non-binary data,
e.g., physicochemical property vectors
• Many other types of similarity coefficient exist that
can be applied, e.g., cosine coefficient, Euclidean
distance, Tversky index
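A minimal sketch of the binary Tanimoto coefficient and its use for ranking a database against a reference. Fingerprints are represented as sets of "on" bit positions, and the molecule names and bit values are invented for illustration.

```python
def tanimoto(ref_bits: set[int], db_bits: set[int]) -> float:
    """Tanimoto coefficient C / (R + D - C) for two sets of 'on' bit positions."""
    c = len(ref_bits & db_bits)          # bits set in common
    r, d = len(ref_bits), len(db_bits)   # bits set in reference / database structure
    return c / (r + d - c) if (r + d - c) else 0.0


reference = {1, 4, 7, 9}
database = {
    "mol_A": {1, 4, 7, 9, 12},
    "mol_B": {1, 4, 20},
    "mol_C": {30, 31},
}

# Rank the database in order of decreasing similarity to the reference.
ranked = sorted(database, key=lambda name: tanimoto(reference, database[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(reference, database[name]), 2))
```

Here mol_A shares 4 bits with the reference out of 5 bits set in either, giving 0.8; mol_B gives 2/5 = 0.4 and mol_C gives 0.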
Limitations of traditional 2D descriptors
Daylight fingerprints; Tanimoto similarities to morphine:
• Codeine: 0.99 similar
• Heroin: 0.95 similar
• Methadone: 0.20 similar
[Figure: structures of morphine, codeine, heroin and methadone]
Scaffold Hopping
• 2D fingerprints are very good at identifying
close analogues
• Scaffold hopping: “Identification of structurally novel
compounds by modifying the central core structure
of the molecule”
Bohm, Flohr & Stahl, Drug Discovery Today: Technologies, 2004, 1,
217-224
– Patent reasons: move away from competitor compounds
– Provide alternate lead series if problems arise due to
difficult chemistry or poor ADME properties
• Descriptors for scaffold hopping
– Reduced graphs
– Topological pharmacophore keys
– 3D descriptors
Langdon, Ertl & Brown, Molecular Informatics, 2010, 29,
Scaffold Hops
Cyclooxygenase inhibitors
• The main brands of COX-2 inhibitor drugs currently on the market
are Celebrex and Bextra (since the Vioxx recall).
• COX-2 inhibitors are a newer type of NSAID that
block the COX-2 enzyme at the site of inflammation.
[Figure: cyclooxygenase inhibitors, including an NSAID used to treat mild to moderate pain,
inflammation, osteoarthritis, and rheumatoid arthritis]

Bohm, Flohr & Stahl, Scaffold hopping. Drug Discovery Today: Technologies, 2004, 1, 217-224
Pharmacophore Vectors: Similog
• Similog keys
• Atom typing scheme based on four
properties: hydrogen-bond donor,
hydrogen-bond acceptor, bulkiness
and electropositivity
• Atom triplets of strings encoding
absence and presence of
properties, plus distance encoding,
form a DABE key
• Vector contains a count for each
of the 8031 possible DABE keys
[Figure: atom triplet with property strings 0100, 0010 and 1100 and inter-atom distances 6, 6 and 4, giving the key 0010-4-1100-6-0100-6-]

Schuffenhauer et al. Similarity metrics for ligands reflecting the similarity of target
proteins. Journal of Chemical Information and Computer Sciences, 2003, 43, 391-405
Reduced Graphs

Gillet, Willett & Bradshaw, Similarity searching using reduced graphs.
Journal of Chemical Information and Computer Sciences, 2003, 43, 338-345
3D similarity searching
• Systems for 3D substructure searching are
widely available – see pharmacophore searching
• Extension to 3D similarity searching is a natural
one
• What the receptor sees?

• Alignment independent
– Fingerprint approaches
• Alignment-based
– Field-based and surface-based methods
• No consensus as to the most effective method
3D fingerprints
• Presence or absence of geometric features
– Pairs of atoms at given distance range
– Triplets of atoms and associated distance
– Pharmacophore pairs and triplets (donors,
acceptors, aromatic centres,....)
– Valence angles
– Torsion angles
Alignment-based 3D similarity
• Shape-based
– ROCS (Rapid Overlay of Chemical
Structures)
– Molecules are aligned in 3D
– Similarity score is based on the common volume $V_C$ of molecules A and B,
which have volumes $V_A$ and $V_B$

$$SIM_{AB} = \frac{V_C}{V_A + V_B - V_C}$$

Nicholls et al., Molecular Shape and Medicinal Chemistry: A
Perspective.
Journal of Medicinal Chemistry, 2010, 53, 3862-3886
Conformational flexibility
• Conformations are different three-dimensional structures
of molecules that arise from
– Rotation about single bonds (torsion angles)
– Different ring conformations
• Having several rotatable bonds results in a “combinatorial
explosion”
• For a molecule with N rotatable bonds, if each torsion angle is
rotated in increments of θ degrees, the number of conformations
is $(360^\circ/\theta)^N$
– If the torsion angles are incremented in steps of 30°, this
means that a molecule with 5 rotatable bonds will have 12^5
≈ 250K conformations
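The arithmetic behind the combinatorial explosion can be checked with a few lines of code. The function name is hypothetical and the sketch assumes a plain systematic scan with the same step for every torsion.

```python
def n_conformations(n_rotatable_bonds: int, step_degrees: float) -> int:
    """Number of conformations from a systematic torsion scan: (360 / step) ** N."""
    states_per_bond = round(360 / step_degrees)
    return states_per_bond ** n_rotatable_bonds


for n in (1, 3, 5, 8):
    print(n, "rotatable bonds:", n_conformations(n, 30), "conformations")
```

With 30° steps there are 12 states per bond, so 5 rotatable bonds give 12^5 = 248,832 conformations and 8 rotatable bonds already give about 430 million.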
Two approaches to handling conformational flexibility
Conformer selection
• When a new molecule is to be registered in a database, a
conformational analysis is used to select diverse conformers
spanning the low-energy conformational space
• Each such conformer is loaded into the database and then
searched as if it was a single, rigid structure
• Trade-off between effectiveness of coverage (selection of many
conformers) and efficiency of searching (selection of few
conformers)
Exploration of conformational space
• Use of triangle smoothing to identify min-max distances
between each atom-pair
• Creation of a distance-range (rather than a distance) graph for
each database structure
• Screen and graph search of the min-max distance data using
appropriately modified algorithms
• Final conformational analysis (by varying torsional angles) of the hits
resulting from the screen/graph searches
3D similarity
• Computationally more expensive than 2D methods
• Requires consideration of conformational flexibility
– Rigid search - based on a single conformer
– Flexible search
• Conformation explored at search time
• Ensemble of conformers generated prior to search time with each
conformer of each molecule considered in turn
• How many conformers are required?
• Methods that require aligning molecules are more
costly than vector-based calculations
Evaluation of similarity methods
• Retrospective search
• For a reference compound of known activity, search against a
database that contains other actives and decoy compounds
– Determine where the active compounds appear in the ranked list
– A good similarity measure will cluster the known actives at the top of
the ranking
– Performance measures: enrichment factors, AUC, BEDROC, ...
• Comparative studies suggest that 2D fingerprints are most effective
– Good at identifying "me-too" compounds but less good at
scaffold hopping
• R.P. Sheridan and S.K. Kearsley (2002) Drug Discovery Today, 7,
903-911
– “We have come to regard looking for ‘the best’ way of searching chemical
databases as a futile exercise. In both retrospective and prospective studies,
different methods select different subsets of actives for the same biological
activity and the same method might work better on some activities than
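One of the performance measures named above, the enrichment factor, can be sketched as follows. The definition used here (hit rate in the top fraction of the ranked list divided by the hit rate in the whole database) is a common one, and the ranked list of labels is invented for illustration.

```python
def enrichment_factor(ranked_labels, fraction):
    """EF = (hit rate in the top fraction of the ranking) / (hit rate in the whole database)."""
    n_total = len(ranked_labels)
    n_top = max(1, round(n_total * fraction))
    actives_top = sum(ranked_labels[:n_top])
    actives_total = sum(ranked_labels)
    return (actives_top / n_top) / (actives_total / n_total)


# Hypothetical retrospective search result: 1 = known active, 0 = decoy,
# listed in order of decreasing similarity to the reference compound.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(enrichment_factor(ranking, 0.10))
print(enrichment_factor(ranking, 0.25))
```

An enrichment factor of 1 corresponds to random selection; values above 1 mean the actives are concentrated near the top of the ranking.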
Data fusion
• Fusion of ranked lists generated for same active
compound (similarity fusion)
– Do a similarity search for a reference structure and rank the
database in order of decreasing similarity
– Repeat with different representations, coefficients, etc.
– Sum the rank positions for a given structure to give an overall fused
rank position
– The fused rankings form the output from the search
• Consistency of search performance across a range of
reference structures, types of fingerprint, biological activities
etc.
• Analogous approaches (called consensus scoring) used in
docking studies
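The rank-sum form of similarity fusion described above can be sketched as below. The similarity scores and molecule names are hypothetical, and ties in the summed rank are broken alphabetically, which is an arbitrary choice made for this illustration.

```python
def rank_positions(scores):
    """Map each compound to its rank (1 = most similar) for one similarity search."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def fuse_by_rank_sum(searches):
    """Sum rank positions over all searches; a smaller sum gives a better fused rank."""
    totals = {}
    for scores in searches:
        for name, pos in rank_positions(scores).items():
            totals[name] = totals.get(name, 0) + pos
    return sorted(totals, key=lambda name: (totals[name], name))


# Same reference structure searched with two hypothetical representations.
search_fp_a = {"mol_A": 0.90, "mol_B": 0.40, "mol_C": 0.70, "mol_D": 0.10}
search_fp_b = {"mol_A": 0.60, "mol_B": 0.80, "mol_C": 0.30, "mol_D": 0.20}

print(fuse_by_rank_sum([search_fp_a, search_fp_b]))
```

mol_A is ranked 1st and 2nd (sum 3), mol_B 3rd and 1st (sum 4), mol_C 2nd and 3rd (sum 5) and mol_D 4th twice (sum 8), so the fused order is mol_A, mol_B, mol_C, mol_D.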
Multiple active structures
• Fuse the results of searches carried out using
different reference compounds
– Same descriptors, same coefficient, different active
compounds
• Results are generally improved relative to using a
single reference structure
• Best performance is achieved for diverse actives
Multiple actives known: pharmacophore searching
(with thanks to Stefan Senger, GSK)
Pharmacophore Definition
• A pharmacophore is the ensemble of steric and
electronic features that is necessary to ensure the
optimal supramolecular interactions with a specific
biological target structure and to trigger (or to block)
its biological response

Glossary of terms used in Medicinal Chemistry (IUPAC Recommendations 1998), Pure &
Appl. Chem. 1998, 70(5), 1129-1143 (http://dx.doi.org/10.1351/pac199870051129).
Example: Rimonabant
Cannabinoid Receptor 1 (CB1) antagonist pharmacophore
Feature types used:
• hydrophobic feature
• hydrogen bond acceptor (HBA) feature + projected point
• aromatic ring feature + projected point
Other common feature types (not used here):
• hydrogen bond donor
• positive/negative features (charged/ionizable)
• customized features
• inclusion/exclusion volume spheres (shape)
H. Wang et al. J. Med. Chem. 2008, 51, 2439-2446
Generating pharmacophore models: Ligand-based
[Figure: Rimonabant and an (alternative) CB1 antagonist pharmacophore]

Trying to predict how the ligands will bind to the receptor
without knowing the structure of the receptor

Foloppe et al. Bioorg. Med. Chem. Lett. 2009, 19, 4183-4190
Pharmacophore generation methods
• Pharmacophoric features in each ligand identified
– Donors, acceptors, hydrophobic groups, ...
– Often SMARTS-based to allow user definitions
• Ligands aligned such that corresponding features are overlaid
• Conformational space explored
– On-the-fly, e.g., using a genetic algorithm
– Generating an ensemble of conformations with each conformer
considered in turn
• Given the underdetermined nature of the problem, it is unlikely that
a single correct solution will be found
• Pharmacophore hypotheses are scored
– e.g., number of features, goodness of fit to features,
conformational energy, volume of the overlay, rarity of the
pharmacophore, ...
Ligand-based pharmacophores: practical aspects
• Select a ‘representative’ set of actives
– Most methods assume similar binding modes
– One or more rigid molecules are preferred
– The ligands should be diverse (otherwise too many common
features that are not involved in binding)
• Prepare molecules (e.g., tautomeric form, protonation
state), generate 3D structure and conformations (if
required)
• Use pharmacophore software/tool to generate
pharmacophores (biased or unbiased?)
• Select preferred pharmacophore model(s) and validate
them
– Visual inspection
Structure-based pharmacophores
• PDB entry 1osh, farnesoid X receptor (FXR, a ligand-dependent
transcription factor)
• Pharmacophore contains five hydrophobic features, one
hydrogen bond acceptor feature, and 27 exclusion spheres

D. Schuster et al. Bioorg. Med. Chem. 2011, 19, 7168-7180
(http://dx.doi.org/10.1016/j.bmc.2011.09.056)
U. Grienke et al. Bioorg. Med. Chem. 2011, 19, 6779-6791
Pharmacophore searching
[Figure: three-point pharmacophore query with inter-feature distances a, b and c, shown with database structures]
a = 8.62 ± 0.58 Å
b = 7.08 ± 0.56 Å
c = 3.35 ± 0.65 Å
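A three-point distance query of this kind can be checked against a candidate conformer roughly as sketched below. The distance values are those given for a, b and c; the assignment of a, b and c to particular feature pairs and the example coordinates are assumptions made purely for illustration.

```python
from math import dist

# Query distances in Angstroms as (target, tolerance).
QUERY = {"a": (8.62, 0.58), "b": (7.08, 0.56), "c": (3.35, 0.65)}


def feature_distances(f1, f2, f3):
    """Distances between three feature points (assumed mapping: a=f1-f2, b=f1-f3, c=f2-f3)."""
    return {"a": dist(f1, f2), "b": dist(f1, f3), "c": dist(f2, f3)}


def matches_query(distances, query=QUERY):
    """True if every observed distance lies within its tolerance window."""
    return all(abs(distances[key] - target) <= tol for key, (target, tol) in query.items())


# Hypothetical feature coordinates taken from two conformers.
hit = feature_distances((0.0, 0.0, 0.0), (8.5, 0.0, 0.0), (6.45, 2.71, 0.0))
miss = feature_distances((0.0, 0.0, 0.0), (8.5, 0.0, 0.0), (4.0, 0.0, 0.0))

print(matches_query(hit))   # True
print(matches_query(miss))  # False
```

Widening the tolerances returns more hits at the cost of more false positives, which is the recall/precision trade-off that arises when choosing a tolerance for database searching.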
Database searching
• Conformational search
– On-the-fly
– Ensemble of conformers
• Database search should be “compatible” with parameters used
to generate the pharmacophore
– The same pharmacophore feature definitions should be used
to describe the database structures as were used to generate
the pharmacophore
– The database should be generated using the same protocol as used
to generate the pharmacophore
– What tolerance should be used to allow a match?
• If two pharmacophore features are separated by 5Å what
distance range is acceptable: 4.5-5.5Å; 4-6Å?
• Should all tolerances be the same?
• What effect does this have on recall and precision?
– Can exclusion/inclusion volumes be used?
Pharmacophore-based VS: workflow
• Select actives → Generate conformers → Generate (modify) pharmacophore
models → Validation 1: map actives back on pharmacophore
• Select inactives/decoys + actives for validation → Generate ‘compatible’
validation database → Validation 2: search validation database (enrichment,
specificity, sensitivity?) → Prioritise/select pharmacophore model(s)
• Select compounds for screening → Generate/select ‘compatible’ compound
database → Virtual screening: perform search/mapping(s) → Filter (availability,
properties, novelty, visually inspect mappings, …)
Example - Cannabinoid CB1 receptor antagonists
• No CB1 crystal structure, only very limited
success with homology models
• Aim was to assay 420 compounds selected using
a pharmacophore model
– 8 CB1 selective antagonists/inverse agonists were
selected from the literature including
rimonabant
– A maximum of 250 unique conformations were
generated for each molecule (with Macromodel
using the MMFF94s force field)
– Pharmacophores were generated with Catalyst.
– The model that yielded the most reasonable mapping
for Rimonabant was selected for the database
search
– The database contained about 500k compounds (max.
of 150 conf. per molecule, generated with Catalyst)
[Figure: Rimonabant and the Cannabinoid Receptor 1 (CB1) antagonist pharmacophore]

H. Wang et al. J. Med. Chem. 2008, 51, 2439-2446
(http://dx.doi.org/10.1021/jm701519h)
Example (continued)
• The pharmacophore search resulted in 22794 hits (approx. 5% of
the database)
• Stepwise filtering
– 300 < MW < 550 (18693 compounds remaining)
– availability as solid > 2 mg (10581 compounds remaining)
– modified Lipinski’s rule of five (7247 compounds remaining)
• A Bayesian model built from compounds in the MDDR database was
used to rank the remaining compounds (using the FCFP6
fingerprints in Pipeline Pilot)
• The top ranking 2100 were selected
• Clustering using the maximum dissimilarity clustering algorithm. 420
clusters were generated and from each cluster the compound with
the highest Bayesian score was selected.
H. Wang et al. J. Med. Chem. 2008, 51, 2439-2446
(http://dx.doi.org/10.1021/jm701519h)
Example (continued)
• 420 compounds were screened at a single concentration. Five
compounds showed more than 50% inhibition. All five
compounds were confirmed in the full curve assay.
– Approx. 1% screening hit rate
• One compound has a Ki of less than 100 nM.
[Figure: Rimonabant and the Cannabinoid Receptor 1 (CB1) antagonist pharmacophore]
H. Wang et al. J. Med. Chem. 2008, 51, 2439-2446
(http://dx.doi.org/10.1021/jm701519h)
(Commercial) software
Examples (by no means comprehensive), listed as software (source) with recent published use cases:
• Catalyst / Discovery Studio (Accelrys)
– http://dx.doi.org/10.1007/s00894-011-1105-5
– http://dx.doi.org/10.1016/j.bmcl.2010.12.131
• GASP (Tripos)
– http://dx.doi.org/10.1016/j.jmgm.2010.02.004
• GALAHAD (Tripos)
– http://dx.doi.org/10.1016/j.bmc.2011.09.016
– http://dx.doi.org/10.1016/j.ejmech.2010.09.012
• Ligandscout (Inte:ligand)
– http://dx.doi.org/10.1016/j.eplepsyres.2011.08.016
• MOE (Chemical Computing Group)
– http://dx.doi.org/10.1007/s10822-011-9442-0
– http://dx.doi.org/10.1016/j.ejmech.2010.07.020
• Phase (Schrödinger)
– http://10.1111/j.1747-0285.2011.01130.x
– http://cs-test.ias.ac.in/cs/Volumes/100/12/1847.pdf
Some references for pharmacophores
• A. R. Leach, V. J. Gillet, R. A. Lewis, R. Taylor, Three-Dimensional
Pharmacophore Methods in Drug Discovery, J. Med. Chem. 2010, 53, 539-558
(http://dx.doi.org/10.1021/jm900817u)
• T. Seidel, G. Ibis, F. Bendix, G. Wolber, Strategies for 3D pharmacophore-based
virtual screening, Drug Disc. Today: Technologies 2010, 7, e221-e228
(http://dx.doi.org/10.1016/j.ddtec.2010.11.004)
• G. Hessler, K.-H. Baringhaus, The scaffold hopping potential of pharmacophores,
Drug Disc. Today: Technologies 2010, 7, e263-e269
(http://dx.doi.org/10.1016/j.ddtec.2010.09.001)
• M. Hein, D. Zilian, C. A. Sotriffer, Docking compared to 3D-pharmacophores: the scoring
function challenge, Drug Disc. Today: Technologies 2010, 7, e229-e236
(http://dx.doi.org/10.1016/j.ddtec.2010.12.003)
• F. Caporuscio, A. Tafi, Pharmacophore Modelling: A Forty Year Old Approach and its
Modern Synergies, Curr. Med. Chem. 2011, 18, 2543-2553
• I. Wallach, Pharmacophore Inference and its Application to Computational Drug
Discovery, Drug Dev. Res. 2011, 72, 17-25 (http://dx.doi.org/10.1002/ddr.20398)
Structure-Activity Relationship Modelling
• Use knowledge of known active and known inactive compounds to build a predictive model
• Quantitative Structure-Activity Relationships (QSARs)
– Long established (Hansch analysis, Free-Wilson analysis)
– Generally restricted to small, homogeneous datasets, e.g. lead optimisation
• Structure-Activity Relationships (SARs)
– “Activity” data is usually treated qualitatively
– Can be used with data consisting of diverse structural classes and multiple binding modes
– Some resistance to noisy data (HTS data)
– Resulting models used to prioritise compounds for lead finding
Generalised machine learning method
• Training set: known active compounds and known inactive compounds
• Analyse actives and inactives to build a model of activity, using for example
– Substructural analysis
– Recursive partitioning
– Support vector machines
– K nearest neighbours
– Neural networks
• Compute scores for the untested compounds (C1, C2, C3, C4, C5, ...)
• Rank the untested compounds by likelihood of being active (e.g. C3, C1, C4, C2, C5, ...)
• Top ranked compounds are picked for testing
Substructural analysis
• The first (1973) machine learning method to be applied to
large activity datasets (before HTS methods became
available)
• Based on the idea that each fragment substructure makes a
constant contribution to a particular type of activity, irrespective
of its environment
– Normally used with fragment-based fingerprints
• A weight is assigned to each fragment to reflect its differential
occurrence in the training-set actives and inactives
– Many different types of weighting scheme
• An unknown molecule is scored by summing the weights for all
the fragments it contains
• The scores are used to rank the test-set molecules in
decreasing probability of activity
Calculation of weights
• The weight for a fragment substructure comprises some or all of the following
– ACT and INACT, the numbers of active and inactive molecules in the training set
– ACT(I) and INACT(I), the numbers of active and inactive molecules in the training set that contain the I-th fragment
• Many weights have been suggested; a typical example is of the form:
$$\frac{ACT(I)}{ACT(I) + INACT(I)}$$
• Closely related to the now widely used naïve Bayesian classifier
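The weighting and scoring steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the weight is the fraction ACT(I) / (ACT(I) + INACT(I)) and that each molecule is already reduced to a set of fragment labels; the fragment names and the toy training set are hypothetical, not data from the source.

```python
from collections import Counter

def fragment_weights(actives, inactives):
    """Weight each fragment I by ACT(I) / (ACT(I) + INACT(I))."""
    act = Counter(f for mol in actives for f in set(mol))
    inact = Counter(f for mol in inactives for f in set(mol))
    return {f: act[f] / (act[f] + inact[f]) for f in set(act) | set(inact)}

def score(molecule, weights):
    """Sum the weights of the fragments present; unseen fragments add 0."""
    return sum(weights.get(f, 0.0) for f in set(molecule))

# Hypothetical toy training set: each molecule is a set of fragment labels.
actives = [{"amide", "phenyl"}, {"amide", "pyridine"}]
inactives = [{"phenyl", "ester"}, {"ester"}]
weights = fragment_weights(actives, inactives)

# Rank untested molecules in decreasing score (most likely active first).
untested = {"mol_A": {"amide", "pyridine"}, "mol_B": {"ester", "phenyl"}}
ranking = sorted(untested, key=lambda name: score(untested[name], weights), reverse=True)
print(ranking)
```

Other weighting schemes would only change the body of `fragment_weights`; the summation and ranking steps stay the same.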
Recursive Partitioning
• Classification approach that constructs a decision tree from qualitative data
– active/inactive, soluble/insoluble, toxic/non-toxic
• Identification of a rule that gives the best statistical split into classes, with the lowest rate of misclassification
– Example drug|non-drug: MW < 500|MW > 500
• Repeat on each set coming from the previous split until no more reasonable splits can be found
• Can generate models that fit the training data well but have poor predictive power if used without care
– Use leave-many-out strategies to validate
– Easy to interpret/drive what-next decisions
Hamman F, Gutmann H, Voigt N, Helma C, Drewe J. Prediction of adverse drug reactions using decision tree modeling. Clin Pharmacol Ther, 2010, 88, 52-59.
Example
Test compounds are dropped through the tree. Prediction depends on whether they fall into “active” or “inactive” nodes
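A single splitting step of recursive partitioning can be sketched as follows. This is an illustrative simplification, assuming one numeric descriptor and a plain misclassification count as the split criterion; real packages use several descriptors and other statistics. The molecular weights and labels are hypothetical.

```python
def best_split(values, labels):
    """Return (threshold, errors) for the split 'value < threshold' that
    misclassifies the fewest training items. Each side of the split
    predicts its own majority class."""
    best = None
    ordered = sorted(set(values))
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbours
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        # Errors on each side are the members of the minority class.
        errors = sum(min(side.count(True), side.count(False)) for side in (left, right))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

# Hypothetical data: molecular weight and a drug / non-drug label.
mw = [180, 250, 320, 410, 480, 560, 640, 720]
is_drug = [True, True, True, True, True, False, False, True]
threshold, errors = best_split(mw, is_drug)
print(threshold, errors)
```

A full tree would call `best_split` again on each of the two subsets until no split reduces the error, which is where validation on held-out data becomes important.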
What Is Docking?
Docking attempts to find the “best” match between two molecules
Docking is a method which predicts the preferred orientation of one molecule relative to a second when they are bound to form a stable complex with overall minimum energy.
Why is docking important?

It is of great relevance in cellular biology

It is the key to rational drug design

Docking addresses two problems, both of which feed into the rational design of drugs:
• Identification of the ligand’s correct binding geometry (pose) in the binding site (binding mode)
• Prediction of the binding affinity (scoring function)

Docking can be between
• Protein – Ligand
• Protein – Protein
• Protein – Nucleotide
A Typical Docking Workflow
Key Stages In Docking

Receptor selection Ligand selection

Types of docking

1. Rigid docking (lock and key)
• In rigid docking, the internal geometry of both the receptor and the ligand is treated as rigid.

2. Flexible docking (induced fit)
• The rotations of one of the molecules (usually the smaller one) are enumerated. The energy is calculated for every rotation, and the optimum pose is then selected.

3. Manual docking
Manual docking
Dock or fit a molecule into the binding site

The binding groups on the ligand and in the binding site are known and are defined by the operator.

Each binding group on the ligand is paired with its complementary group in the binding site.

The ideal bonding distance for each potential interaction is defined.

The docking procedure is started.

The program tries to find the best fit, as defined by the operator.

The paired groups are not directly overlaid; they are fitted within the preferred bonding distance.

Both ligand and protein keep the same conformation throughout the process.

This is therefore a rigid fit. Once a molecule has been docked successfully, fit optimization is carried out in the same way as energy minimization.

Different conformations of the molecule can be docked in the same way.

Identify the best fit.
Tools and Software
• SANJEEVINI – IIT Delhi (www.scfbio-iitd.res.in/sanjeevini/sanjeevini.jsp)
• GOLD – Cambridge Crystallographic Data Centre (CCDC), Cambridge, UK (www.ccdc.cam.ac.uk/Solutions/GoldSuite/Pages/GOLD.aspx)
• AUTODOCK – Scripps Research Institute, USA (autodock.scripps.edu/)
• GemDock (Generic Evolutionary Method for Molecular Docking) – developed by Jinn-Moon Yang, a professor at the Institute of Bioinformatics, National Chiao Tung University, Taiwan (gemdock.life.nctu.edu.tw/dock/)
• Hex Protein Docking – University of Aberdeen, UK (hex.loria.fr/)
• GRAMM (Global Range Molecular Matching) protein docking – Center for Bioinformatics, University of Kansas, USA (www.bioinformatics.ku.edu/files/vakser/gramm/)
Applications

• Virtual screening (hit identification): docking with a scoring function can be used to quickly screen large databases of potential drugs in silico to identify molecules that are likely to bind to the protein target of interest.
• Drug discovery (lead optimization): docking can be used to predict where and in which relative orientation a ligand binds to a protein (binding mode or pose). This information may in turn be used to design more potent and selective analogs.
• Bioremediation: protein-ligand docking can also be used to predict pollutants that can be degraded by enzymes.
Docking based screening
1) Virtual based screening
2) Molecular based screening

Molecular based screening

• Docking is the process by which molecular modeling software fits a molecule into target binding sites.
• It is used for finding the binding modes of proteins with ligands/inhibitors.
• Molecular docking attempts to predict the structure of the intermolecular complex formed between two or more constituent molecules.
• Molecular docking has become an increasingly important tool for drug discovery.
Steps Involved in Molecular Docking
In silico generation of ligands: the structure of the ligand/molecule is drawn in ChemSketch.

Conversion of file format: Open Babel is used to convert the file from .mol to .pdb.

Protein optimization: the protein of interest is taken from the RCSB Protein Data Bank and prepared for docking.

Energy minimization: carried out in Swiss-PdbViewer (SPDBV), using its commands.

Molecular docking: creation of the .gpf (grid parameter file) and the .dpf (dock parameter file). AutoDock Vina, AutoDock 4.0 and AutoDock 4.2 are the software used to set up the grid and docking parameters along with grid mapping.

Running the docking algorithm (Cygwin, step 1): the commands are run from Cygwin. A successful AutoGrid command produces the GLG file and a successful AutoDock command produces the DLG file.

Cygwin, step 2: after this step the minimum binding energy is known.

Hydrogen bond analysis: UCSF Chimera is used for visualisation and analysis of the results.

ADMET: note the molecules that show hydrogen bonds with an active-site residue or any other residue of the binding pocket, then run these molecules on an online ADMET server.
Applications

1. Structure based drug design.


2. Lead Optimization.
3. Virtual Screening.
4. Protein-Protein Docking.
5. Chemical mechanism studies.

Protein-Ligand Docking

• How does a ligand (small molecule) bind into the active site of a protein?
• Docking algorithms are based on two key components
– search algorithm
• to generate “poses” (conformation, position and orientation) of
the ligand within the active site
– scoring function
• to identify the most likely pose for an individual ligand
• to assign a priority order to a set of diverse ligands docked to
the same protein – estimate binding affinity
The search space
• The difficulty with protein–ligand docking is in part due to the fact that it involves many degrees of freedom
– The translation and rotation of one molecule relative to another involves six degrees of freedom
– These are in addition to the conformational degrees of freedom of both the ligand and the protein
– The solvent may also play a significant role in determining the protein–ligand geometry (often ignored though)
• The search algorithm generates poses, orientations of particular conformations of the molecule in the binding site
– Tries to cover the search space, if not exhaustively, then as extensively as possible
– There is a tradeoff between time and search space
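A rough count shows why exhaustive coverage is rarely possible. The sketch below assumes a rigid protein, a cubic search box, and a regular grid over the six rigid-body degrees of freedom plus one degree of freedom per rotatable bond; the box size and step sizes are hypothetical, and sampling three rotation angles on a uniform grid over-counts orientations, so treat the result as an order-of-magnitude estimate only.

```python
def pose_count(box_edge, grid_step, angle_step_deg, n_torsions):
    """Rough number of poses on a regular grid over translations,
    rigid-body rotations and ligand torsions (protein held rigid)."""
    positions = round(box_edge / grid_step) ** 3   # 3 translational degrees of freedom
    per_angle = round(360 / angle_step_deg)        # samples per rotation angle
    orientations = per_angle ** 3                  # 3 rotational degrees of freedom
    conformers = per_angle ** n_torsions           # 1 degree of freedom per rotatable bond
    return positions * orientations * conformers

# Hypothetical settings: 10 A box, 1 A grid, 30 degree steps, 5 rotatable bonds.
n = pose_count(box_edge=10.0, grid_step=1.0, angle_step_deg=30, n_torsions=5)
print(f"{n:.2e} poses")
```

Each extra rotatable bond multiplies the count by the number of angle samples, which is the time versus search space tradeoff in numerical form.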
Examples of Docking Search Algorithms

• DOCK: first docking program by Kuntz et al. 1982
– Based on shape complementarity and rigid ligands
• Current algorithms
– Fragment-based methods: FlexX, DOCK (since version 4.0)
– Monte Carlo/Simulated annealing: QXP(Flo), Autodock, Affinity & LigandFit (Accelrys)
– Genetic algorithms: GOLD, AutoDock (since version 3.0)
– Systematic search: FRED (OpenEye), Glide (Schrödinger)

R. D. Taylor et al. “A review of protein-small molecule docking methods”, J. Comput. Aid. Mol. Des. 2002, 16, 151-166.
DOCK (Kuntz et al. 1982)
• Rigid docking based on shape
• A negative image of the cavity is constructed by filling it with spheres
• Spheres are of varying size
• Each touches the surface at two points
• The centres of the spheres become potential locations for ligand atoms
DOCK
• Ligand atoms are matched to sphere centres so that distances between atoms equal distances between sphere centres
• The matches are used to position the ligand within the active site
• If there are no steric clashes the ligand is scored
DOCK
• Many different mappings (poses) are
possible
• Each pose is scored based on goodness of fit
• Highest scoring pose is presented to the user
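The distance-matching idea can be illustrated with a brute-force sketch. This is not how DOCK itself searches for matches; it simply tries every assignment of ligand atoms to sphere centres and keeps those where all pairwise distances agree within a tolerance. The coordinates and the 0.2 Å tolerance are hypothetical.

```python
from itertools import permutations
from math import dist

def match_atoms_to_spheres(atoms, spheres, tol=0.2):
    """Yield assignments (one sphere index per atom) in which every
    atom-atom distance agrees with the corresponding sphere-sphere
    distance to within tol."""
    n = len(atoms)
    for combo in permutations(range(len(spheres)), n):
        consistent = all(
            abs(dist(atoms[i], atoms[j]) - dist(spheres[combo[i]], spheres[combo[j]])) <= tol
            for i in range(n)
            for j in range(i + 1, n)
        )
        if consistent:
            yield combo

# Hypothetical coordinates: three ligand atoms and four sphere centres.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
spheres = [(5.0, 5.0, 5.0), (5.0, 6.5, 5.0), (7.0, 6.5, 5.0), (9.0, 9.0, 9.0)]
matches = list(match_atoms_to_spheres(atoms, spheres))
print(matches)
```

Each surviving assignment defines a candidate placement of the ligand, which would then be checked for steric clashes and scored.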
Exploring conformational space of ligands
• Ensemble of conformations
– A series of conformations is generated before docking
– Each conformer is docked in turn as a rigid body
– FLOG (variant on DOCK)
– Glide, FRED: often use filters and approximations to identify conformations of interest

• Conformational space explored at run time
– The accessible conformations of the ligands are explored at the same time as the docking
– GOLD: Genetic Algorithm
– AutoDock: Monte Carlo/Simulated annealing
– FlexX: Incremental construction
Example of Flexible Docking Program: GOLD
• Full ligand flexibility and partial receptor flexibility (side chains
can rotate)
• Genetic algorithm
– A population of potential solutions is maintained
– Each solution represents one conformation of the ligand together
with one mapping between the ligand and the binding site
– The mapping is used to generate a “pose” – orientation and position of
a ligand conformation within the binding site
– The “pose” is then scored using a function that includes vdw
interactions; internal energy of ligand and h-bonding of complex
– The GA iterates (modifying the population members) until an
optimum value of the scoring function is obtained
GOLD uses a Genetic Algorithm
1. Generate initial population
2. Select an operator and its parent (one-parent operator) or parents (two-parent operator)
3. Replace the least fit member or members of the population
4. Repeat from step 2
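The loop in this flowchart can be sketched as a small steady-state genetic algorithm. This is an illustrative toy, not GOLD's implementation: the chromosome holds only torsion angles, parents are chosen uniformly at random, the operator probabilities are arbitrary, and `toy_fitness` is a hypothetical stand-in for a docking score.

```python
import random

def steady_state_ga(fitness, n_genes, pop_size=30, iterations=2000, seed=0):
    """Steady-state GA: each iteration applies one operator and the child
    replaces the least fit member if it is fitter. Higher fitness is better."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-180, 180) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(iterations):
        if rng.random() < 0.5:
            # Mutation: one parent, one torsion angle re-drawn at random.
            child = list(rng.choice(pop))
            child[rng.randrange(n_genes)] = rng.uniform(-180, 180)
        else:
            # Crossover: two parents, single cut point.
            a, b = rng.sample(pop, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)

def toy_fitness(torsions):
    """Hypothetical score: best (zero) when every torsion is 60 degrees."""
    return -sum((t - 60.0) ** 2 for t in torsions)

best = steady_state_ga(toy_fitness, n_genes=4)
print([round(t, 1) for t in best], round(toy_fitness(best), 2))
```

Because only the least fit member is ever replaced, the best solution found so far is never lost, which is the practical appeal of the steady-state scheme.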
GOLD: chromosome composition

• Ligand torsions
• Protein OH and NH3 torsions, if not fixed by H-bonding
• Mapping of H-bonding points on ligand with complementary points on protein
• Mapping of hydrophobic points on protein to ligand C(H) atoms
GOLD: Bond Mappings
[Figure: numbered hydrogens and acceptors on the ligand mapped to complementary acceptors and hydrogens on the protein]
Flexible Docking: FlexX
• Incremental construction: flexible ligand; rigid protein
– The conformation of the ligand is constructed step-wise within the active site
– The ligand is broken down into fragments
– Base fragments of ligand are docked first
– A systematic conformational search of the ligand is carried out as each new fragment is added in all possible ways
– The protein binding site is used to prune the search tree
Fragment-based docking: FlexX

Interaction model: the interaction centre of the first group lies approximately on the interaction surface of the second group.

FlexX matches triangles of interaction sites onto complementary ligand atoms.

B. Kramer et al. “Ligand Docking and Screening with FlexX”, Med. Chem. Res. 1999, 9, 463-478
http://www.biosolveit.de
Energetics of protein-ligand binding
• Ligand-receptor binding is driven by
– electrostatics (including hydrogen bonding interactions)
– dispersion or van der Waals forces
– hydrophobic interactions
– desolvation: surfaces buried between the protein and the ligand have to be desolvated
• Conformational changes to protein and ligand
– ligand must be properly orientated and translated to interact and form a complex
– loss of entropy of the ligand due to being fixed in one conformation
• Free energy of binding
$$\Delta G_{bind} = \Delta G_{solvent} + \Delta G_{conf} + \Delta G_{int} + \Delta G_{rot} + \Delta G_{t/r} + \Delta G_{vib}$$
Scoring Functions: I

• Molecular mechanics/force field
– Attempt to calculate the interaction terms directly
• e.g. Lennard-Jones potential for van der Waals interactions
– Only account for some of the contributions

• GOLD Score
– Protein-ligand hydrogen bond energy S(hb_ext)
– Protein-ligand van der Waals (vdw) energy S(vdw_ext)
– Ligand internal energy S(int)
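As an illustration of a force-field term, the standard 12-6 Lennard-Jones potential is E(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates it for a single atom pair; the ε and σ values are hypothetical and are not taken from any particular force field or from GOLD.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for two atoms separated by distance r.
    epsilon is the well depth and sigma the distance at which E = 0."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Hypothetical parameters: well depth in kcal/mol, sigma in angstroms.
epsilon, sigma = 0.15, 3.4
r_min = 2 ** (1 / 6) * sigma  # separation at the energy minimum

for r in (3.0, sigma, r_min, 6.0):
    print(f"r = {r:.2f} A  E = {lennard_jones(r, epsilon, sigma):+.4f} kcal/mol")
```

The steep repulsive wall at short range is why small errors in atom positions can dominate a force-field score, and why some docking programs soften this term.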
Scoring Functions: II
• Empirical
– Böhm J. Comput. Aided Mol. Design 8 (1994) 243-256
$$\Delta G_{bind} = \Delta G_0 + \Delta G_{hb} \sum_{hbonds} f(\Delta R, \Delta\alpha) + \Delta G_{ionic} \sum_{ionic\ interactions} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} A_{lipo} + \Delta G_{rot}\, NROT$$
– equation proposed based on linear combination of simple properties: hydrogen bonding, ionic interactions, lipophilic interactions, loss of internal conformational freedom of ligand
– multiple linear regression used to calculate values for coefficients by attempting to fit the equation to experimental binding data (e.g. 45 protein-ligand complexes)
ΔG_hb = −1.2 kcal/mol, ΔG_ionic = −2.0 kcal/mol, ΔG_lipo = −0.04 kcal/mol Å², ΔG_rot = +0.3 kcal/mol, ΔG_0 = +1.3 kcal/mol

– Examples include ChemScore, PLP, Glide SP/XP
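A worked evaluation of this linear form, using the coefficient values quoted on the slide, might look like the sketch below. The geometric penalty values f(ΔR, Δα) are assumed to be precomputed numbers between 0 and 1 (1 for ideal geometry), and the example ligand is hypothetical.

```python
# Coefficients as quoted above (kcal/mol; lipophilic term per square angstrom).
COEFF = {"G0": 1.3, "hb": -1.2, "ionic": -2.0, "lipo": -0.04, "rot": 0.3}

def empirical_score(hbond_f, ionic_f, lipo_area, n_rot):
    """Linear empirical binding score.
    hbond_f, ionic_f: lists of f(dR, d_alpha) penalties, one per interaction.
    lipo_area: lipophilic contact area; n_rot: number of rotatable bonds."""
    return (
        COEFF["G0"]
        + COEFF["hb"] * sum(hbond_f)
        + COEFF["ionic"] * sum(ionic_f)
        + COEFF["lipo"] * lipo_area
        + COEFF["rot"] * n_rot
    )

# Hypothetical ligand: 2 ideal H-bonds, 1 ideal ionic contact,
# 100 square angstroms of lipophilic contact, 3 rotatable bonds.
dg = empirical_score(hbond_f=[1.0, 1.0], ionic_f=[1.0], lipo_area=100.0, n_rot=3)
print(f"Estimated binding free energy: {dg:.1f} kcal/mol")
```

Term by term this is 1.3 − 2.4 − 2.0 − 4.0 + 0.9 = −6.2 kcal/mol, which shows how the lipophilic contact area can outweigh the individual hydrogen bonds.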


Scoring Functions: III
• Knowledge based methods
– Based on statistics of observed inter-atomic contact frequencies and/or distances
– Assume that statistical preferences reflect favourable/unfavourable interactions between functional groups
– e.g. PMF (Potential of Mean Force); DrugScore; ASP

• Main effort is now in developing more effective scoring functions
– No single scoring function is uniformly superior
– Consensus/Data fusion approaches combine results from several scoring schemes
– Rescoring uses one scoring function during the docking and another to evaluate the final poses
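One simple form of consensus scoring is rank fusion: convert each scoring function's output to ranks and order compounds by mean rank. The sketch below assumes every function uses the convention that a lower (more negative) score is better, which is not true of all scoring functions in practice; the ligand names and scores are hypothetical.

```python
def rank_positions(scores):
    """Map compound name -> rank (1 = best), assuming lower score is better."""
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}

def consensus_ranking(score_tables):
    """Order compounds by their mean rank across several scoring functions."""
    ranks = [rank_positions(table) for table in score_tables]
    names = list(score_tables[0])
    mean_rank = {n: sum(r[n] for r in ranks) / len(ranks) for n in names}
    return sorted(names, key=mean_rank.get)

# Hypothetical scores from three scoring functions (more negative = better).
func_a = {"L1": -9.1, "L2": -7.4, "L3": -8.0}
func_b = {"L1": -35.0, "L2": -41.0, "L3": -38.0}
func_c = {"L1": -6.2, "L2": -5.0, "L3": -6.0}
order = consensus_ranking([func_a, func_b, func_c])
print(order)
```

Working with ranks rather than raw scores avoids having to put functions with different units and scales on a common footing.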
Evaluating a Docking Program

• Take a known protein-ligand complex from the PDB
• Extract the ligand
• Minimise the conformation of the ligand
• Dock back into the protein
• Compare the docked pose with the experimental data
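Evaluating a Docking Program
The docked result (red) is superimposed on the X-ray (experimental) crystal structure

Root Mean Square Deviation
$$RMSD = \sqrt{\frac{\sum_{N}\left[(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2\right]}{N}}$$
where a and b label the two structures being compared and the sum runs over the N atoms.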
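The RMSD between a docked pose and the crystal pose can be computed directly from matched coordinates. The sketch below assumes both coordinate lists are in the same (protein) frame and list the atoms in the same order; it makes no correction for symmetry-equivalent atoms. The coordinates are hypothetical.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of
    (x, y, z) coordinates. No superposition or symmetry handling is done."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))

# Hypothetical three-atom ligand: the docked pose is shifted 1 A along z.
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
value = rmsd(docked, crystal)
print(f"RMSD = {value:.2f} A")
```

No superposition step is applied because the docked ligand and the crystal ligand are already expressed relative to the same protein.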
Evaluating a docking program
The GOLD result (dark) superimposed on the X-ray structure (light)

• 4PHV: Good (HIV protease; 15 rotatable bonds)
• 1GLQ: Close (peptidic ligand)
• 1CIN: Wrong (fatty acid binding protein)
GOLD: Validation
• GOLD validation
– 305 complexes found in PDB (CCDC/Astex dataset)
– ligand extracted from complex
– ligand minimised
– docked back to protein
– GOLD prediction compared with original crystal structure

• ~72% success rate using stringent criteria

• G. Jones, P. Willett, R. C. Glen, A. R. Leach & R. Taylor, J. Mol. Biol. 1997, 267, 727-748
• J. W. M. Nissink et al. “A New Test Set for Validating Predictions of Protein-Ligand Interaction”, Proteins 2002, 49, 457-471.
Issues related to the protein
• Need to ensure all residues are in the
correct protonation and tautomeric states
• Protein conformation
– Can be several examples of the same protein but with
different ligands bound
– The conformation of the binding site can vary from one
complex to another
– Which should be used in the virtual screening experiment?
• Ensemble docking to different protein conformations
may be required where there are large changes in
the binding site
Where there’s no chicken wire,
there are no electrons..atoms

An X-ray crystal structure is one crystallographer’s subjective interpretation of an observed electron-density map expressed in terms of an atomic model

A. Davis, S. Teague, G. Kleywegt, Angew. Chem. 2003, 24, 2693

Homology models can be even more subjective
Issues related to the ligands
• The protonation state and tautomeric form of a
ligand can influence its hydrogen bonding ability
– Need to ensure all ligands are in the correct protonation and tautomeric states, or enumerate and dock all possibilities (e.g. enol and ketone forms)
• Conformations
– Need to ensure sufficient sampling of conformational space has
been carried out
– Can we be sure the bioactive conformation has been
generated?
– May want to apply filtering techniques to prune
unlikely candidates prior to carrying out the docking
Current Status of Docking: 1
• Most docking programs take account of conformational flexibility of the ligand, but very flexible ligands are still difficult
• Some protein-ligand interactions occur via water molecules
– Can switch waters on and off in the binding site, but usually based on positions seen in the X-ray structure
• Some docking programs allow protein side chain flexibility
– Full protein flexibility cannot yet be handled except by molecular dynamics, which is extremely computationally demanding
• Scoring functions
– Reasonably good at finding the correct pose for a given protein-ligand complex
– Less good at ranking different ligands against the same protein (virtual screening)
• Variety of different post-processing procedures are available to help reorder the output
Current Status of Docking: 2
• Despite its limitations docking is very widely used and there are many success stories
– See Kolb et al., Curr. Opin. Biotech., 2009, 20, 429, and Waszkowycz et al., WIREs Comp. Mol. Sci., 2011, 1, 229
• Performance varies from target to target, and scoring function to scoring function
– See for example, Plewczynski et al., “Can we trust docking results? Evaluation of seven commonly used programs on PDBbind database”, J. Comp. Chem., 2011, 32, 742.
• Care needs to be taken when preparing both the protein and the ligands
• The more information you have (and use!), the better your chances
– Targeted library, docking constraints, filtering poses, seeding with known actives, comparing with known crystal poses
Conclusions
• Wide range of virtual screening techniques have
been developed
• The performance of different methods varies
on different datasets
• Increased complexity in descriptors and method
does not necessarily lead to greater success
• Combining different approaches can lead to
improved results
• Computational filters should be applied to remove
undesirable compounds from further
consideration
Some more references
• Ripphausen et al. (2010) Quo vadis, virtual screening? A
comprehensive review of prospective applications. Journal of
Medicinal Chemistry, 53, 8461-8467.
• Scior et al. (2012) Recognizing pitfalls in virtual screening: a critical
review. Journal of Chemical Information and Modeling, 52, 867-881
• Sotriffer (Ed) Virtual Screening. Principles, Challenges and Practical
Guidelines. Wiley-VCH, 2011.
• Varnek A, Baskin I. Machine Learning Methods for Property Prediction
in Chemoinformatics: Quo Vadis? Journal of Chemical Information and
Modeling 2012, 52, 1413−1437
• Hartenfeller, M.; Schneider, G. Enabling future drug discovery by de novo
design. Wiley Interdisciplinary Reviews-Computational Molecular
Science 2011, 1, 742-759.
