Ligand-Based + Structure-Based Screening
Virtual Screening
The Drug Discovery Challenge
3D Structure of Target
Unknown: Ligand-Based Methods
Known: Structure-Based Methods
Actives known; actives and inactives known
• Similarity coefficient
– A quantitative measure of similarity between two sets
of molecular descriptors
• Each bit in the bit string (binary vector) represents one molecular
fragment. Typical length is ~1000 bits
• The bit string for a molecule records the presence (“1”) or
absence (“0”) of each fragment in the molecule
• Originally developed for speeding up substructure search
– for a query substructure to be present in a database molecule each bit set to
“1” in the query must also be set to “1” in the database structure
• Similarity is based on determining the number of bits that
are common to two structures
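The two uses of bit strings described above can be sketched in a few lines. This is a minimal illustration, assuming fingerprints are stored as Python integers (one bit per fragment) and using the Tanimoto coefficient, which appears later in these slides, as the similarity coefficient. The example bit patterns are invented for illustration only.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient c / (a + b - c) for two bit-string fingerprints."""
    a = bin(fp_a).count("1")         # bits set in molecule A
    b = bin(fp_b).count("1")         # bits set in molecule B
    c = bin(fp_a & fp_b).count("1")  # bits set in both
    if a + b - c == 0:
        return 0.0  # assumption: two empty fingerprints are scored as 0
    return c / (a + b - c)


def passes_screen(query_fp: int, molecule_fp: int) -> bool:
    """Substructure pre-screen: every bit set in the query must be set in the molecule."""
    return (query_fp & molecule_fp) == query_fp


# Invented 8-bit fingerprints; real ones are typically ~1000 bits long.
query = 0b00010110
mol_1 = 0b10010111
mol_2 = 0b01000100

sim_1 = tanimoto(query, mol_1)        # 3 common bits / (3 + 5 - 3)
sim_2 = tanimoto(query, mol_2)        # 1 common bit  / (3 + 2 - 1)
keep_1 = passes_screen(query, mol_1)  # all query bits present in mol_1
keep_2 = passes_screen(query, mol_2)  # some query bits missing in mol_2
```

A molecule that passes the screen is only a candidate: the bits say the fragments are present, not that they are connected as in the query, so a full substructure match is still needed afterwards.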
Example fragments: a. Augmented Atom; b. Atom Sequence; c. Bond Sequence;
d. Ring Composition; e. Ring Fusion; f. Atom Pair
Dictionary-based fingerprints: pre-defined fragments, each of
which maps to a single bit. Examples include MACCS keys and BCI
fingerprints
Here the focus is on 2-D molecular fingerprints, which encode the 2-D
structure of molecules. While many molecular fingerprints have been
developed, we discuss two types, structural keys and hashed fingerprints,
because they are more widely used than others.
Fig. 1. (above) Two molecules are shown along with the respective bit substructures
highlighted for comparison. The number of bits and the designations used in this figure
are simply for display and illustrative purposes. A true fingerprint would be much longer.
In structural keys, the structure of a molecule is encoded into a binary bit string (that is,
a sequence of 0’s and 1’s), each bit of which corresponds to a “pre-defined” structural
feature (e.g., substructure or fragment). If the molecule has a pre-defined feature, the
bit position corresponding to this feature is set to 1 (ON). Otherwise, it is set to 0
(OFF). It is important to understand that structural keys cannot encode structural
features that are not pre-defined in the fragment library.
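The idea of a structural key can be sketched as follows. This is a toy illustration, not the MACCS implementation: the fragment dictionary `FRAGMENT_KEYS` and the example molecule are hypothetical, and the fragments present in the molecule are assumed to have been identified already (a real toolkit would find them by substructure matching).

```python
# Hypothetical fragment dictionary: each pre-defined fragment owns one bit position.
FRAGMENT_KEYS = ["carboxylic acid", "hydroxyl", "amine", "benzene ring", "halogen"]


def structural_key(fragments_in_molecule):
    """Return one bit per dictionary fragment: 1 if present in the molecule, else 0."""
    return [1 if frag in fragments_in_molecule else 0 for frag in FRAGMENT_KEYS]


# Fragments assumed to have been perceived in an example molecule.
example_molecule = {"carboxylic acid", "benzene ring", "ester"}
bits = structural_key(example_molecule)

# "ester" is not in the dictionary, so it cannot be encoded in the key.
unencoded = example_molecule - set(FRAGMENT_KEYS)
```

The `unencoded` set makes the limitation stated above concrete: any feature missing from the fragment library leaves no trace in the bit string.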
The MACCS (Molecular ACCess System) keys are one of the most commonly used
structural keys. They are sometimes referred to as the MDL keys, named after the
company that developed them (MDL Information Systems, now BIOVIA).
While there are two sets of MACCS keys (one with 960 keys and the other containing
a subset of 166 keys), only the shorter set of fragment definitions is available to the public.
These 166 public keys are implemented in popular open-source cheminformatics
software packages, including RDKit, Open Babel and CDK.
The fragment definitions for the MACCS 166 keys can be found in this document:
https://github.com/rdkit/rdkit/blob/master/rdkit/Chem/MACCSkeys.py
PubChem fingerprints
ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf
Hashed Fingerprints
Hash functions are used to map data of arbitrary size to “fixed-size” values.
Enumerating all possible fragments within a molecule may result in a very large
number of fragments. Hashing them into values within a fixed range inevitably
results in “bit collisions”, in which different fragments are converted into the same
numeric value (and the same bit position). Because of this, there is no one-to-one
correspondence between fragments and fingerprint bits (contrary to structural
keys).
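A minimal sketch of hashing fragments into a fixed-length bit string is given below. It assumes fragments are already available as strings (the list here is invented) and uses MD5 from the standard library purely as a convenient deterministic hash; real fingerprint software uses its own fragment enumeration and hash functions.

```python
import hashlib


def fragment_bit(fragment: str, n_bits: int) -> int:
    """Map a fragment string to a bit position in the range 0..n_bits-1."""
    digest = hashlib.md5(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_bits


def hashed_fingerprint(fragments, n_bits):
    """Set the bit that each fragment hashes to."""
    bits = [0] * n_bits
    for frag in fragments:
        bits[fragment_bit(frag, n_bits)] = 1
    return bits


def count_collisions(fragments, n_bits):
    """Number of distinct fragments that land on an already occupied bit."""
    distinct = set(fragments)
    occupied = {fragment_bit(frag, n_bits) for frag in distinct}
    return len(distinct) - len(occupied)


# Invented linear-path fragments starting from a chlorine atom.
fragments = ["Cl", "Cl-C", "Cl-C-C", "Cl-C=C", "Cl-C-C-C", "Cl-C-C=O",
             "Cl-C-C-N", "Cl-C-C-O", "Cl-C=C-C", "Cl-C-C-C-C", "Cl-C-C-C-N",
             "Cl-C-C-C=O"]

fp_short = hashed_fingerprint(fragments, n_bits=10)
collisions_short = count_collisions(fragments, n_bits=10)
```

With 12 distinct fragments and only 10 bits, at least two collisions are unavoidable, whatever the hash function. A longer fingerprint usually collides less, but the mapping from bits back to fragments is lost either way.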
Fig. Shown above is a topological fingerprint with multiple collisions between fragments.
A bit collision is represented by two or more arrows from the
molecular fragments pointing to the same bit. Starting with the chlorine atom, all
of the possible fragments are shown. In a true fingerprint, however, each atom could be
the starting point, which would allow for many more fragments than this example
shows. The more bits are allowed, the less likely bit collisions become; the two
collisions shown here arise because only 10 bits are used.
Figure: Tanimoto similarities to morphine (Daylight fingerprints)
Cyclooxygenase inhibitors
COX-2 inhibitors are a newer type of NSAID that block the COX-2 enzyme
at the site of inflammation.
Böhm, Flohr & Stahl, Scaffold hopping. Drug Discovery Today: Technologies, 2004, 1, 217-224
Pharmacophore Vectors: Similog
• Similog keys
• Atom typing scheme based on four properties: hydrogen-bond donor,
hydrogen-bond acceptor, bulkiness and electropositivity
• Atom triplets of strings encoding absence and presence of
properties, plus distance encoding, form a DABE key
• Vector contains a count for each of the 8031 possible DABE keys
Example from the figure: three atoms typed 0010, 1100 and 0100, separated by
distances 4, 6 and 6, give the key 0010-4-1100-6-0100-6-
Schuffenhauer et al. Similarity metrics for ligands reflecting the similarity of target
proteins. Journal of Chemical Information and Computer Sciences, 2003, 43, 391-405
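The counting of triplet keys can be sketched as below. This is an illustration under stated assumptions, not the published Similog algorithm: atom types are given as ready-made 4-character strings, distances come from a pre-computed matrix, and each triplet is written in the order that gives the lexicographically smallest string (a hypothetical canonicalisation chosen here so that each triplet yields exactly one key).

```python
from collections import Counter
from itertools import combinations, permutations


def dabe_key(types, dist, i, j, k):
    """Build 'type-dist-type-dist-type-dist-' for a triplet; keep the smallest ordering."""
    candidates = []
    for a, b, c in permutations((i, j, k)):
        candidates.append(
            f"{types[a]}-{dist[a][b]}-{types[b]}-{dist[b][c]}-{types[c]}-{dist[c][a]}-"
        )
    return min(candidates)  # assumed canonical form, for illustration only


def similog_counts(types, dist):
    """Count how often each key occurs over all atom triplets."""
    n = len(types)
    return Counter(dabe_key(types, dist, i, j, k)
                   for i, j, k in combinations(range(n), 3))


# The three typed atoms and distances from the slide example.
types = ["0010", "1100", "0100"]
dist = [[0, 4, 6],
        [4, 0, 6],
        [6, 6, 0]]
counts = similog_counts(types, dist)
```

For this single triplet the counter holds one key with count 1. For a whole molecule the counter would be the sparse form of the count vector over all possible keys.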
Reduced Graphs
• Alignment independent
– Fingerprint approaches
• Alignment-based
– Field-based and surface-based methods
• No consensus as to the most effective method
3D fingerprints
Glossary of terms used in Medicinal Chemistry (IUPAC Recommendations 1998), Pure &
Appl. Chem. 1998, 70(5), 1129-1143 (http://dx.doi.org/10.1351/pac199870051129).
Example: Rimonabant
Figure: Rimonabant annotated with pharmacophore features: two hydrophobic features,
a hydrogen bond acceptor (HBA) feature + projected point, and an aromatic ring
feature + projected point
Database searching
• Conformational search
– On-the-fly
– Ensemble of conformers
• Database search should be “compatible” with parameters used
to generate the pharmacophore
– The same pharmacophore feature definitions should be used
to describe the database structures as were used to generate
the pharmacophore
– The database should be generated using the same protocol as used
to generate the pharmacophore
– What tolerance should be used to allow a match?
• If two pharmacophore features are separated by 5Å what
distance range is acceptable: 4.5-5.5Å; 4-6Å?
• Should all tolerances be the same?
• What effect does this have on recall and precision?
– Can exclusion/inclusion volumes be used?
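The effect of the distance tolerance can be illustrated with a small matching routine. This is a sketch under simple assumptions: a pharmacophore and a conformer are both lists of (feature type, 3D point) pairs, one tolerance is used for all feature pairs, and all assignments are tried by brute force. The feature labels and coordinates are invented.

```python
from itertools import permutations
from math import dist


def conformer_matches(query, conformer, tolerance):
    """True if conformer features can be assigned to query features of the same
    type so that every inter-feature distance agrees within the tolerance (in Å)."""
    n = len(query)
    for mapping in permutations(range(len(conformer)), n):
        # Feature types must agree for every assigned pair.
        if any(query[q][0] != conformer[c][0] for q, c in enumerate(mapping)):
            continue
        # Every pairwise distance must agree within the tolerance.
        if all(
            abs(dist(query[q1][1], query[q2][1])
                - dist(conformer[mapping[q1]][1], conformer[mapping[q2]][1]))
            <= tolerance
            for q1 in range(n) for q2 in range(q1 + 1, n)
        ):
            return True
    return False


# Query: acceptor and hydrophobe separated by 5 Å.
query = [("HBA", (0.0, 0.0, 0.0)), ("HYD", (5.0, 0.0, 0.0))]
# Conformer: the same two feature types 5.8 Å apart, plus an aromatic ring.
conformer = [("HYD", (5.8, 0.0, 0.0)), ("HBA", (0.0, 0.0, 0.0)), ("AR", (2.0, 2.0, 0.0))]

tight = conformer_matches(query, conformer, tolerance=0.5)  # accepts 4.5-5.5 Å
loose = conformer_matches(query, conformer, tolerance=1.0)  # accepts 4-6 Å
```

The same conformer is rejected with the tight tolerance and accepted with the loose one, which is the trade-off behind the recall and precision question above: wider tolerances retrieve more actives but also more false hits.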
Pharmacophore-based VS: workflow
Model generation:
• Select actives
• Generate conformers
• Generate (modify) pharmacophore models
• Validation 1: map actives back on pharmacophore
Model validation:
• Select actives + inactives/decoys for validation
• Generate ‘compatible’ validation database
• Validation 2: search validation database (enrichment, specificity, sensitivity?)
• Prioritise/select pharmacophore model(s)
Virtual screening:
• Select compounds for screening
• Generate/select ‘compatible’ compound database
• Perform search/mapping(s)
• Filter (availability, properties, novelty, visually inspect mappings,…)
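The validation measures named in the workflow (enrichment, specificity, sensitivity) can be computed from a search of the validation database as sketched below. The function name, the compound identifiers and the numbers are illustrative assumptions; the formulas are the standard definitions, and the sketch assumes the hit list and the active set are non-empty.

```python
def validation_metrics(hits, actives, database_size):
    """hits: ids retrieved by the pharmacophore search; actives: ids of known actives."""
    tp = len(hits & actives)                # actives retrieved
    fp = len(hits) - tp                     # inactives/decoys retrieved
    fn = len(actives) - tp                  # actives missed
    tn = database_size - tp - fp - fn       # inactives/decoys correctly rejected
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Fraction of actives in the hit list relative to the whole database.
        "enrichment": (tp / len(hits)) / (len(actives) / database_size),
    }


# Invented validation set: 1000 compounds, 20 of them active (ids 0-19).
actives = set(range(20))
# The search returns 50 compounds: 10 actives and 40 decoys.
hits = set(range(10)) | set(range(100, 140))
metrics = validation_metrics(hits, actives, database_size=1000)
```

In this invented example half of the actives are recovered and the hit list is ten times richer in actives than the database as a whole.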
Example - Cannabinoid CB1 receptor antagonists
• No CB1 crystal structure, only very limited success with homology models
• Aim was to assay 420 compounds selected using a pharmacophore model
– 8 CB1 selective antagonists/inverse agonists were selected from the
literature including rimonabant
– A maximum of 250 unique conformations were generated for each molecule
(with Macromodel using the MMFF94s force field)
– Pharmacophores were generated with Catalyst.
– The model that yielded the most reasonable mapping for Rimonabant was
selected for the database search
– The database contained about 500k compounds (max.
of 150 conf. per molecule, generated with Catalyst)
Figure: Rimonabant and the Cannabinoid Receptor 1 (CB1) antagonist pharmacophore
H. Wang et al. J. Med. Chem. 2008, 51, 2439-2446
(http://dx.doi.org/10.1021/jm701519h)
(Commercial) software
Examples (by no means comprehensive):
Software                     Source                    Recent published use cases
Catalyst (Discovery Studio)  Accelrys                  http://dx.doi.org/10.1007/s00894-011-1105-5
                                                       http://dx.doi.org/10.1016/j.bmcl.2010.12.131
GASP                         Tripos                    http://dx.doi.org/10.1016/j.jmgm.2010.02.004
GALAHAD                      Tripos                    http://dx.doi.org/10.1016/j.bmc.2011.09.016
                                                       http://dx.doi.org/10.1016/j.ejmech.2010.09.012
LigandScout                  Inte:Ligand               http://dx.doi.org/10.1016/j.eplepsyres.2011.08.016
MOE                          Chemical Computing Group  http://dx.doi.org/10.1007/s10822-011-9442-0
                                                       http://dx.doi.org/10.1016/j.ejmech.2010.07.020
Phase                        Schrödinger               http://dx.doi.org/10.1111/j.1747-0285.2011.01130.x
                                                       http://cs-test.ias.ac.in/cs/Volumes/100/12/1847.pdf
Some references for pharmacophores
• A. R. Leach, V. J. Gillet, R. A. Lewis, R. Taylor, Three-Dimensional
Pharmacophore Methods in Drug Discovery, J. Med. Chem. 2010, 53, 539-558
(http://dx.doi.org/10.1021/jm900817u)
• F. Caporuscio, A. Tafi, Pharmacophore Modelling: A Forty Year Old Approach and its
Modern Synergies, Curr. Med. Chem. 2011, 18, 2543-2553
Docking
• Identification of the ligand’s correct binding geometry (pose) in the
binding site (Binding Mode)
• Prediction of the binding affinity (Scoring Function)
Rational Design Of
Drugs
Docking can be between
•Protein - Ligand
•Protein – Protein
•Protein – Nucleotide
A Typical Docking Workflow
Key Stages In Docking
Types of docking
3. Manual docking
Manual docking
Dock or fit a molecule in the binding site.
The binding groups on the ligand and in the binding site are known and are defined by
the operator.
Each binding group in the ligand is paired with its complementary group in the
binding site.
Both ligand and protein keep the same conformation throughout the process, so this is
a rigid fit. Once a molecule has been successfully docked, fit optimization is carried out.
Docking based screening
1) Virtual based screening
2) Molecular based screening
Protein optimization (RCSB Protein Data Bank): obtain the structure of your protein
of interest here and prepare it for docking.
Energy minimization (here with Swiss-PdbViewer, SPDBV): this can be done with the
program's commands.
Hydrogen bond analysis (UCSF Chimera): we use this software for visualisation
and analysis of the results.
ADMET: note down the molecules that show hydrogen bonds with an active site residue or
any other residue of the binding pocket, then run these molecules on an online
ADMET server.
Applications
Protein-Ligand Docking
DOCK
• Many different mappings (poses) are
possible
• Each pose is scored based on goodness of fit
• Highest scoring pose is presented to the user
Exploring conformational space of ligands
• Ensemble of conformations
– A series of conformations is generated before docking
– Each conformer is docked in turn as a rigid body
– FLOG (variant on DOCK)
– Glide, FRED: often use filters and approximations to
identify conformations of interest
• Ligand torsions
Figure: ligand and protein, with hydrogens and acceptors numbered
Flexible Docking: FlexX
• Incremental construction: flexible ligand; rigid protein
– The conformation of the ligand is constructed step-wise
within the active site
– The ligand is broken down into fragments
– Base fragments of ligand are docked first
– A systematic conformational search of the ligand is carried
out as each new fragment is added in all possible ways
– The protein binding site is used to prune the search tree
Fragment-based docking: FlexX
Interaction model: the interaction centre of the first group lies
approximately on the interaction surface of the second group.
FlexX matches triangles of interaction sites onto complementary ligand atoms.
B. Kramer et al. “Ligand Docking and Screening with FlexX”, Med. Chem. Res.
1999, 9, 463-478
http://www.biosolveit.de
Energetics of protein-ligand binding
• Ligand-receptor binding is driven by
• electrostatics (including hydrogen bonding interactions)
• dispersion or van der Waals forces
• hydrophobic interactions
• desolvation: surfaces buried between the protein and the
ligand have to be desolvated
• Conformational changes to protein and ligand
• ligand must be properly orientated and translated to interact
and form a complex
• GOLD Score
– Protein-ligand hydrogen bond energy S(hb_ext)
– Protein-ligand van der Waals (vdw) energy S(vdw_ext)
– Ligand internal energy S(int)
Scoring Functions: II
• Empirical
– Böhm J. Comput. Aided Mol. Design 8 (1994) 243-256
$$\Delta G_{bind} = \Delta G_0 + \Delta G_{hb} \sum_{hbonds} f(\Delta R, \Delta\alpha) + \Delta G_{ionic} \sum_{ionic\ interactions} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \, |A_{lipo}| + \Delta G_{rot} \, NROT$$
– equation proposed based on linear combination of simple
properties – hydrogen bonding, ionic interactions, lipophilic
interactions, loss of internal conformational freedom of
ligand
– multiple linear regression used to calculate values for
coefficients by attempting to fit the equation to
experimental binding data (e.g. 45 protein-ligand complexes)
ΔG_hb = -1.2 kcal/mol, ΔG_ionic = -2.0 kcal/mol, ΔG_lipo = -0.04 kcal/mol Å²,
ΔG_rot = +0.3 kcal/mol, ΔG_0 = +1.3 kcal/mol
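As a worked illustration, the empirical equation can be evaluated with the coefficients quoted on this slide. The sketch assumes the geometric penalty values f(ΔR, Δα) have already been computed for each hydrogen bond and ionic interaction (1.0 for ideal geometry, lower for distorted geometry); the function name and the example ligand are invented.

```python
# Coefficients as quoted on the slide (kcal/mol; lipophilic term per Å^2).
DG_0 = 1.3
DG_HB = -1.2
DG_IONIC = -2.0
DG_LIPO = -0.04
DG_ROT = 0.3


def empirical_binding_energy(hbond_f, ionic_f, lipo_area, n_rot):
    """Estimate the binding free energy in kcal/mol.

    hbond_f, ionic_f: lists of pre-computed f(dR, dalpha) values, one per interaction.
    lipo_area: lipophilic contact surface in Å^2.
    n_rot: number of rotatable bonds (NROT).
    """
    return (DG_0
            + DG_HB * sum(hbond_f)
            + DG_IONIC * sum(ionic_f)
            + DG_LIPO * abs(lipo_area)
            + DG_ROT * n_rot)


# Invented ligand: 3 ideal hydrogen bonds, 1 ideal ionic interaction,
# 100 Å^2 lipophilic contact surface, 4 rotatable bonds.
dg = empirical_binding_energy([1.0, 1.0, 1.0], [1.0], lipo_area=100.0, n_rot=4)
# 1.3 - 3.6 - 2.0 - 4.0 + 1.2 = -7.1 kcal/mol
```

The signs show the balance the regression captures: each interaction and each Å² of lipophilic contact lowers the energy, while the constant term and every rotatable bond raise it.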
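Root Mean Square Deviation (RMSD)
$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ (x_i^a - x_i^b)^2 + (y_i^a - y_i^b)^2 + (z_i^a - z_i^b)^2 \right]}$$
where a and b are the two structures being compared and N is the number of atoms.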
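A minimal sketch of the RMSD calculation is shown below. It assumes both structures list the same atoms in the same order and share one coordinate frame (as for a docked pose and the crystal pose in the same protein), so no superposition is performed; the coordinates are invented.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(total / len(coords_a))


# Invented two-atom example: the docked pose is shifted by 1 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(crystal, docked)
```

Comparing a docked ligand with the ligand from the crystal structure in this way is how the quality of a pose prediction is usually quantified.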
Evaluating a docking program
The GOLD result (dark) superimposed on the X-ray structure (light)
1CIN: Wrong
Fatty acid binding
protein
GOLD: Validation
• GOLD validation
– 305 complexes found in PDB (CCDC/Astex dataset)
– ligand extracted from complex
– ligand minimised
– docked back to protein
– GOLD prediction compared with original crystal structure
A. Davis, S. Teague, G. Kleywegt, Angew. Chem. 2003, 24, 2693
Figure: enol and ketone forms
• Conformations
– Need to ensure sufficient sampling of conformational space has
been carried out
– Can we be sure the bioactive conformation has been
generated?
– May want to apply filtering techniques to prune
unlikely candidates prior to carrying out the docking
Current Status of Docking: 1
• Most docking programs take account of conformational flexibility
of the ligand but very flexible ligands are still difficult
• Some protein-ligand interactions occur via water molecules
– Can switch waters on and off in the binding site but usually based
on positions seen in the X-ray structure
• Some docking programs allow protein side chain flexibility
– Full protein flexibility cannot yet be handled except by molecular
dynamics, which is extremely computationally demanding
• Scoring functions
– Reasonably good at finding the correct pose for a given protein-
ligand complex
– Less good at ranking different ligands against the same protein
(virtual screening)
• Variety of different post-processing procedures are available to
help reorder the output
Current Status of Docking: 2
• Despite its limitations docking is very widely used and there
are many success stories
– See Kolb et al., Curr. Opin. Biotech., 2009, 20, 429,
and Waszkowycz et al., WIREs Comp. Mol. Sci., 2011, 1, 229
• Performance varies from target to target, and scoring
function to scoring function
– See for example, Plewczynski et al., “Can we trust
docking results? Evaluation of seven commonly used
programs on PDBbind database”, J. Comp. Chem.,
2011, 32, 742.
• Care needs to be taken when preparing both the protein and
the ligands
• The more information you have (and use!), the better your
chances
– Targeted library, docking constraints, filtering poses,
seeding with known actives, comparing with known crystal poses
Conclusions
• Wide range of virtual screening techniques have
been developed
• The performance of different methods varies
on different datasets
• Increased complexity in descriptors and method
does not necessarily lead to greater success
• Combining different approaches can lead to
improved results
• Computational filters should be applied to remove
undesirable compounds from further
consideration
Some more references
• Ripphausen et al. (2010) Quo vadis, virtual screening? A
comprehensive review of prospective applications. Journal of
Medicinal Chemistry, 53, 8461-8467.
• Scior et al. (2012) Recognizing pitfalls in virtual screening: a critical
review. Journal of Chemical Information and Modeling, 52, 867-881
• Sotriffer (Ed) Virtual Screening. Principles, Challenges and Practical
Guidelines. Wiley-VCH, 2011.
• Varnek A, Baskin I. Machine Learning Methods for Property Prediction
in Chemoinformatics: Quo Vadis? Journal of Chemical Information and
Modeling 2012, 52, 1413−1437
• Hartenfeller, M.; Schneider, G. Enabling future drug discovery by de novo
design. Wiley Interdisciplinary Reviews-Computational Molecular
Science 2011, 1, 742-759.