
Protein-Ligand Docking

Protein-ligand docking is a structure-based drug design method that computationally mimics the binding of a ligand to a protein. It predicts both the pose of the ligand molecule in the protein's binding site and the binding affinity score. The main applications of protein-ligand docking are for virtual screening to identify potential lead compounds and for pose prediction. Popular docking software uses search algorithms to generate poses and scoring functions to calculate binding affinity scores for the poses. Key aspects that docking software must address include ligand and protein preparation, flexible or rigid docking approaches, and limitations of the rigid receptor approximation.

Protein-Ligand Docking

Outline

Introduction
Applications
Algorithms
Programs available for Docking
Practical aspects

Computer-aided drug design (CADD)

Known protein structure, known ligand(s):
Structure-based drug design (SBDD)
Protein-ligand docking

Known protein structure, no known ligand:
De novo design

Unknown protein structure, known ligand(s):
Ligand-based drug design (LBDD)
1 or more ligands: Similarity searching
Several ligands: Pharmacophore searching
Many ligands (20+): Quantitative Structure-Activity Relationships (QSAR)

Unknown protein structure, no known ligand:
CADD of no use
Need experimental data of some sort

Protein-ligand docking

A Structure-Based Drug Design (SBDD) method
"Structure" means "using protein structure"

Computational method that mimics the binding of a ligand to a protein
Given...

Predicts...
The pose of the molecule in the binding site
The binding affinity or a score representing the strength of binding

Pose vs. binding site


Binding site (or active site)
the part of the protein where the
ligand binds
generally a cavity on the protein
surface
can be identified by looking at the
crystal structure of the protein bound
with a known inhibitor

Pose (or binding mode)


The geometry of the ligand in the
binding site
Geometry = location, orientation
and conformation

Protein-ligand docking is not about identifying the binding site

Uses of docking
The main uses of protein-ligand docking are for
Virtual screening, to identify potential lead compounds from
a large dataset (see next slide)
Pose prediction

Pose prediction
If we know exactly where and how a
known ligand binds...

We can see which parts are important for binding
We can suggest changes to improve affinity
Avoid changes that will clash with the protein

Components of docking software


Typically, protein-ligand docking software consists of two main components which work together:
1. Search algorithm
Generates a large number of poses of a molecule in the binding site

2. Scoring function
Calculates a score or binding affinity for a particular pose

To give:
The pose of the molecule in the binding site
The binding affinity or a score representing the strength of binding
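The interplay of the two components can be sketched in a few lines of Python. This is only a toy illustration, not how any real docking program works: a pose is reduced to a position plus one rotation angle, the search is plain random sampling, and `toy_score` is a made-up function (higher is better, with its maximum at the origin). All names here are hypothetical.

```python
import math
import random

def toy_score(pose):
    """Hypothetical scoring function: higher is better, best value 0 at the origin."""
    x, y, z, angle = pose
    return -(x * x + y * y + z * z + (1.0 - math.cos(angle)))

def random_pose(rng, box=5.0):
    """Search step: draw a random position in a cubic box plus one rotation angle."""
    return (rng.uniform(-box, box), rng.uniform(-box, box),
            rng.uniform(-box, box), rng.uniform(-math.pi, math.pi))

def dock(n_poses=1000, seed=0):
    """Generate n_poses candidate poses, score each, return (best_score, best_pose)."""
    rng = random.Random(seed)
    poses = [random_pose(rng) for _ in range(n_poses)]
    return max((toy_score(p), p) for p in poses)

best_score, best_pose = dock()
```

The search algorithm proposes poses and the scoring function decides which one is reported, which is why a weakness in either component limits the quality of the result.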

Final points
Large number of docking programs available

AutoDock, DOCK, e-Hits, FlexX, FRED, Glide, GOLD, LigandFit, QXP, Surflex-Dock, among others
Different scoring functions, different search algorithms, different approaches

See Section 12.5 in DC Young, Computational Drug Design (Wiley 2009) for a good overview of different packages

Note: protein-ligand docking is not to be confused with the field of protein-protein docking ("protein docking")

Outline

Introduction to protein-ligand docking


Practical aspects
Searching for poses
Scoring functions
Assessing performance

Pre Docking

Steps in AutoDock

Programs that need to be downloaded
Installation procedure
Important files for docking
Protocol for Docking

Post Docking

Algorithm to be discussed
Scoring Function

Protein Preparation

Preparing the protein structure


PDB structures often contain water molecules
In general, all water molecules are removed except where
it is known that they play an important role in
coordinating to the ligand
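A minimal sketch of the water-removal step, assuming the standard fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26). The function name and the `keep` argument are hypothetical; dedicated preparation tools do considerably more than this.

```python
def strip_waters(pdb_text, keep=()):
    """Drop water records from PDB text, except residues listed in `keep`.

    `keep` holds (chain_id, residue_number) pairs for waters that are known
    to coordinate the ligand and should be retained.
    """
    kept_lines = []
    for line in pdb_text.splitlines():
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        if is_atom_record and line[17:20] in ("HOH", "WAT"):
            chain_id = line[21]
            residue_number = int(line[22:26])
            if (chain_id, residue_number) not in keep:
                continue  # ordinary water: remove it
        kept_lines.append(line)
    return "\n".join(kept_lines)
```

Calling `strip_waters(text, keep={("A", 301)})` would remove every water except residue 301 of chain A.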

PDB structures from X-ray crystallography usually lack hydrogen atoms
Many docking programs require the protein to have explicit hydrogens. In general these can be added unambiguously, except in the case of acidic/basic side chains
An incorrect assignment of protonation states in the active site will give poor results
Glutamate, Aspartate have COO- or COOH
OH is hydrogen bond donor, O- is not

Histidine is a base and its neutral form has two tautomers

Preparing the protein structure


For particular protein side chains, the PDB structure can be incorrect
Crystallography gives electron density, not molecular structure
In poorly resolved crystal structures of proteins, isoelectronic groups can make it difficult to deduce the correct structure

Affects asparagine, glutamine, histidine
Important? Affects hydrogen bonding pattern
May need to flip amide or imidazole
How to decide? Look at hydrogen bonding pattern in crystal structures containing ligands

Ligand Preparation
A reasonable 3D structure is required as starting
point
During docking, the bond lengths and angles in ligands
are held fixed; only the torsion angles are changed

The protonation state and tautomeric form of a particular ligand could influence its hydrogen bonding ability
Either protonate as expected for physiological pH and use
a single tautomer
Or generate and dock all possible protonation states and
tautomers, and retain the one with the highest score
Enol

Ketone
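The second option (dock every protonation state and tautomer, keep the best) amounts to a maximum over the docked forms. A small sketch, in which `dock` stands for any hypothetical callable that returns a score for one form, with higher meaning better:

```python
def dock_all_forms(forms, dock):
    """Dock every protonation state/tautomer; return (best_form, best_score)."""
    scored = [(dock(form), form) for form in forms]
    best_score, best_form = max(scored, key=lambda pair: pair[0])
    return best_form, best_score
```

The cost grows with the number of forms, so in a large virtual screen the single-form option is often the practical choice.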

Search Algorithms
We can classify the various search algorithms
according to the degrees of freedom that they
consider
Rigid docking or flexible docking
With respect to the ligand structure

Rigid docking
The ligand is treated as a rigid structure during the
docking
Only the translational and rotational degrees of freedom are
considered
To deal with the problem of ligand conformations, a large
number of conformations of each ligand are generated in
advance and each is docked separately
Examples: FRED (Fast Rigid Exhaustive Docking) from
OpenEye, and one of the earliest docking programs, DOCK
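In rigid docking, a pose is just a rigid-body transformation applied to one pre-generated conformer. The sketch below is a simplification that rotates about the z axis only (a full implementation needs all three rotational degrees of freedom); it shows that such a move changes location and orientation but leaves every interatomic distance untouched. The function name and coordinates are illustrative.

```python
import math

def rigid_transform(coords, angle_z, shift):
    """Rotate atoms about the z axis by angle_z (radians), then translate by shift."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in coords]

# A made-up three-atom conformer (coordinates in angstroms).
conformer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
moved = rigid_transform(conformer, math.pi / 2, (1.0, 2.0, 3.0))

# Internal geometry is unchanged by the move.
before = math.dist(conformer[0], conformer[2])
after = math.dist(moved[0], moved[2])
```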

The DOCK algorithm: rigid docking

The DOCK algorithm developed by Kuntz and co-workers is generally considered one of the major advances in protein-ligand docking [Kuntz et al., JMB, 1982, 161, 269]

The earliest version of the DOCK algorithm only considered rigid body docking and was designed to identify molecules with a high degree of shape complementarity to the protein binding site.

The first stage of the DOCK method involves the construction of a "negative image" of the binding site consisting of a series of overlapping spheres of varying radii, derived from the molecular surface of the protein

AR Leach, VJ Gillet, An Introduction to Cheminformatics

The DOCK algorithm: rigid docking

Ligand atoms are then matched to the sphere centres so that the distances between the atoms equal the distances between the corresponding sphere centres, within some tolerance.

The ligand conformation is then oriented into the binding site. After checking to ensure that there are no unacceptable steric interactions, it is then scored.

New orientations are produced by generating new sets of matching ligand atoms and sphere centres. The procedure continues until all possible matches have been considered.

AR Leach, VJ Gillet, An Introduction to Cheminformatics
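The distance-matching idea can be illustrated with a brute-force sketch. This is an assumption-laden simplification, not the DOCK implementation: it enumerates every pairing of a few ligand atoms with sphere centres and keeps those where all atom-atom distances agree with the corresponding centre-centre distances within a tolerance. Function names and the default tolerance are hypothetical.

```python
import itertools
import math

def compatible(atoms, spheres, pairing, tol):
    """True if each atom-atom distance matches the paired sphere-sphere distance within tol."""
    for (a1, s1), (a2, s2) in itertools.combinations(pairing, 2):
        atom_d = math.dist(atoms[a1], atoms[a2])
        sphere_d = math.dist(spheres[s1], spheres[s2])
        if abs(atom_d - sphere_d) > tol:
            return False
    return True

def find_matches(atoms, spheres, size=3, tol=0.5):
    """Enumerate all pairings of `size` ligand atoms with `size` sphere centres."""
    matches = []
    for atom_idx in itertools.combinations(range(len(atoms)), size):
        for sphere_idx in itertools.permutations(range(len(spheres)), size):
            pairing = list(zip(atom_idx, sphere_idx))
            if compatible(atoms, spheres, pairing, tol):
                matches.append(pairing)
    return matches
```

Each match defines an orientation of the ligand in the site, which would then be checked for steric clashes and scored. Exhaustive enumeration like this scales badly, so it is only suitable for showing the principle on a handful of points.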

Flexible docking
Flexible docking is the most common form of docking today

Conformations of each molecule are generated on-the-fly by the search algorithm during the docking process
The algorithm can avoid considering conformations that do not fit

Exhaustive (systematic) searching is computationally too expensive as the search space is very large
One common approach is to use stochastic search methods

These don't guarantee the optimum solution, but give a good solution within a reasonable length of time
Stochastic means that they incorporate a degree of randomness
Such algorithms include genetic algorithms (GOLD), simulated annealing (AutoDock)
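As an illustration of a stochastic search, here is a minimal simulated annealing sketch using the Metropolis acceptance rule. It is a generic textbook version under stated assumptions, not the algorithm as implemented in AutoDock: the "pose" is a tuple of numbers, the score is a made-up function where higher is better, and the cooling schedule and step size are arbitrary choices.

```python
import math
import random

def toy_score(pose):
    """Hypothetical score: higher is better, maximum of 0 at (1, 1)."""
    return -sum((v - 1.0) ** 2 for v in pose)

def anneal(score, start, steps=5000, t_start=2.0, t_end=0.01, step_size=0.5, seed=0):
    """Simulated annealing: random moves, with worse moves accepted less often as T falls."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = tuple(v + rng.uniform(-step_size, step_size) for v in current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept worse poses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

best, best_score = anneal(toy_score, start=(4.0, -3.0))
```

Accepting some worse poses at high temperature is what lets the search escape local optima; the randomness is also why repeated runs with different seeds can return different poses.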

An alternative is to use incremental construction methods
These construct conformations of the ligand within the
binding site in a series of stages
First one or more base fragments are identified which are

Handling protein conformations

Most docking software treats the protein as rigid
Rigid Receptor Approximation

This approximation may be invalid for a particular protein-ligand complex as...
the protein may deform slightly to accommodate different ligands (ligand-induced fit)
protein side chains in the active site may adopt different conformations

Some docking programs allow protein side-chain flexibility
For example, selected side chains are allowed to undergo torsional rotation around acyclic bonds
Increases the search space

Larger protein movements can only be handled by separate dockings to different protein conformations
Ensemble docking (e.g. GOLD 5.0)

Components of docking software


Typically, protein-ligand docking software consists of two main components which work together:
1. Search algorithm
Generates a large number of poses of a molecule in the binding site

2. Scoring function
Calculates a score or binding affinity for a particular pose

To give:
The pose of the molecule in the binding site
The binding affinity or a score representing the strength of binding

The perfect scoring function will

Accurately calculate the binding affinity
Will allow actives to be identified in a virtual screen
Be able to rank actives in terms of affinity

Score the poses of an active higher than poses of an inactive
Will rank actives higher than inactives in a virtual screen

Score the correct pose of the active higher than an incorrect pose of the active
Will allow the correct pose of the active to be identified

"actives" = molecules with biological activity

Classes of scoring function


Broadly speaking, scoring functions can
be divided into the following classes:
Forcefield-based
Based on terms from molecular mechanics
forcefields
GoldScore, DOCK, AutoDock

Empirical
Parameterised against experimental binding
affinities
ChemScore, PLP, Glide SP/XP

Knowledge-based potentials
Based on statistical analysis of observed pairwise
distributions
PMF, DrugScore, ASP

Empirical scoring functions

Böhm's empirical scoring function

In general, scoring functions assume that the free energy of binding can be written as a linear sum of terms to reflect the various contributions to binding

Böhm's scoring function included contributions from hydrogen bonding, ionic interactions, lipophilic interactions and the loss of internal conformational freedom of the ligand.

The ΔG values on the right of the equation are all constants (see next slide)
ΔG0 is a contribution to the binding energy that does not directly depend on any specific interactions with the protein
The hydrogen bonding and ionic terms are both dependent on the geometry of the interaction, with large deviations from ideal geometries (ideal distance R, ideal angle α) being penalised.
The lipophilic term is proportional to the contact surface area (Alipo) between protein and ligand involving non-polar atoms.
The conformational entropy term is the penalty associated with freezing internal rotations of the ligand. It is largely entropic in nature. Here the value is directly proportional to the number of rotatable bonds in the ligand (NROT).
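Written out from the term descriptions above, the function has the following general form. This is a reconstruction: the subscript names for the coefficients and the penalty function f are assumed notation, with ΔR and Δα standing for the deviations from the ideal distance and ideal angle.

```latex
\Delta G_{\mathrm{bind}} = \Delta G_{0}
  + \Delta G_{\mathrm{hb}} \sum_{\text{h-bonds}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{ionic}} \sum_{\text{ionic}} f(\Delta R, \Delta\alpha)
  + \Delta G_{\mathrm{lipo}} \, \lvert A_{\mathrm{lipo}} \rvert
  + \Delta G_{\mathrm{rot}} \, \mathrm{NROT}
```

Here f equals 1 for an ideal geometry and falls towards 0 as the deviations grow, so poorly formed hydrogen bonds and ionic contacts contribute less.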

Böhm's empirical scoring function

This scoring function is an empirical scoring function
Empirical = incorporates some experimental data

The coefficients (ΔG) in the equation were determined using multiple linear regression on experimental binding data for 45 protein-ligand complexes
Although the terms in the equation may differ, this general approach has been applied to the development of many different empirical scoring functions
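Once coefficients have been fitted, evaluating an empirical score of this kind is a weighted sum. The sketch below shows the idea only: the coefficient values are illustrative placeholders (treat them as assumptions, nominally in kJ/mol), and the linear-ramp penalty with its thresholds is a simple stand-in for the geometry-dependent factor.

```python
def ramp(deviation, ideal_until, zero_from):
    """1.0 up to `ideal_until`, falling linearly to 0.0 at `zero_from`."""
    if deviation <= ideal_until:
        return 1.0
    if deviation >= zero_from:
        return 0.0
    return 1.0 - (deviation - ideal_until) / (zero_from - ideal_until)

def geometry_factor(delta_r, delta_alpha):
    """Penalise deviation from ideal distance (angstroms) and ideal angle (degrees)."""
    return ramp(delta_r, 0.2, 0.6) * ramp(delta_alpha, 30.0, 80.0)

# Placeholder coefficients for illustration only (not fitted here).
COEFFS = {"dG0": 5.4, "hb": -4.7, "ionic": -8.3, "lipo": -0.17, "rot": 1.4}

def empirical_score(hbonds, ionic, a_lipo, n_rot, c=COEFFS):
    """Linear sum of terms; hbonds and ionic are lists of (delta_r, delta_alpha)."""
    total = c["dG0"]
    total += c["hb"] * sum(geometry_factor(dr, da) for dr, da in hbonds)
    total += c["ionic"] * sum(geometry_factor(dr, da) for dr, da in ionic)
    total += c["lipo"] * abs(a_lipo)
    total += c["rot"] * n_rot
    return total
```

With these signs, a more negative total means stronger predicted binding: favourable contacts lower the value, while each rotatable bond raises it.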

Knowledge-based potentials

Statistical potentials

Based on a comparison between the observed number of contacts between certain atom types (e.g. sp2-hybridised oxygens in the ligand and aromatic carbons in the protein) and the number of contacts one would expect if there were no interaction between the atoms (the reference state)

Derived from an analysis of pairs of non-bonded interactions between proteins and ligands in the PDB
Observed distributions of geometries of ligands in crystal structures are used to deduce the potential that gave rise to the distribution
Hence "knowledge-based" potential

Knowledge-based potentials
For example, creating the distributions of ligand carbonyl oxygens to protein hydroxyl groups:

(imagine the minimum at 3.0 Å)
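One common way to turn such a distribution into a potential is the inverse-Boltzmann relation, w(r) = -kT ln(p_obs(r) / p_ref(r)). The sketch below assumes that form and uses invented histogram counts; it is not the procedure of any particular published potential, and the choice of reference distribution is itself an assumption.

```python
import math

def knowledge_based_potential(observed, reference, kT=1.0):
    """Per-bin potential w = -kT * ln(p_obs / p_ref); bins without data give None."""
    total_obs, total_ref = sum(observed), sum(reference)
    potential = []
    for n_obs, n_ref in zip(observed, reference):
        if n_obs == 0 or n_ref == 0:
            potential.append(None)  # too little data to estimate this bin
            continue
        p_obs = n_obs / total_obs
        p_ref = n_ref / total_ref
        potential.append(-kT * math.log(p_obs / p_ref))
    return potential

# Invented counts for four distance bins centred at 2.5, 3.0, 3.5 and 4.0 angstroms.
bin_centres = [2.5, 3.0, 3.5, 4.0]
observed = [5, 60, 25, 10]
reference = [25, 25, 25, 25]
w = knowledge_based_potential(observed, reference)
```

Distances seen more often than the reference state predicts get a negative (favourable) value, so the most populated bin becomes the minimum of the potential.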

Knowledge-based potentials
Some pairwise interactions may occur seldom in the PDB
Resulting distribution may be inaccurate

Doesn't take into account directionality of interactions, e.g. hydrogen bonds
Just based on pairwise distances

Resulting score contains contributions from a large number of pairwise interactions
Difficult to identify problems and to improve

Sensitive to definition of reference state
DrugScore has a different reference state than ASP (Astex Statistical Potential)

Final thoughts
Protein-ligand docking is an essential tool for
computational drug design

Widely used in pharmaceutical companies
Many success stories (see Kolb et al. Curr. Opin. Biotech., 2009, 20, 429)

But it's not a golden bullet

The perfect scoring function has yet to be found
The performance varies from target to target, and scoring function to scoring function

See for example, Plewczynski et al, "Can we trust docking results? Evaluation of seven commonly used programs on PDBbind database", J. Comp. Chem., Online 1 Sep 2010.

Care needs to be taken when preparing both the protein and the ligands
The more information you have (and use!), the better your chances

Targeted library, docking constraints, filtering poses, seeding with known actives, comparing with known crystal poses

Rational Approach to Drug Discovery


Identify and validate target
Clone gene encoding target
Express target
Crystal structures/MM of target and target/inhibitor complexes
Identify lead compounds
Synthesize modified lead compounds
Toxicity & pharmacokinetic studies
Preclinical trials

Molecular Docking

Docking is the computational determination of binding affinity between molecules (protein structure and ligand).

Given a protein and a ligand, find the binding free energy of the complex formed by docking them.

Molecular Docking: classification


Docking or computer-aided drug design can be broadly classified into:
Receptor-based methods: make use of the structure of the target protein.
Ligand-based methods: based on the known inhibitors.

Receptor based methods


Uses the 3D structure of the target receptor to
search for the potential candidate compounds
that can modulate the target function.
These involve molecular docking of each
compound in the chemical database into the
binding site of the target and predicting the
electrostatic fit between them.
The compounds are ranked using an appropriate
scoring function such that the scores correlate
with the binding affinity.
Receptor-based methods have been successfully applied to many targets.

Ligand based strategy


In the absence of structural information on the target, ligand-based methods make use of the information provided by known inhibitors of the target receptor.
Structures similar to the known inhibitors are identified from chemical databases by a variety of methods.
Some of the widely used methods are similarity and substructure searching, pharmacophore matching and 3D shape matching.
Numerous successful applications of ligand-based methods have been reported.
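Similarity searching can be sketched with the Tanimoto coefficient, a common choice of similarity measure (assumed here for illustration). Each molecule is represented by a set of integer feature identifiers standing in for a structural fingerprint; the compound names and fingerprints are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared features divided by total distinct features."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_search(query, database, threshold=0.5):
    """Return (similarity, name) pairs at or above threshold, most similar first."""
    hits = [(tanimoto(query, fingerprint), name) for name, fingerprint in database.items()]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)

# Hypothetical fingerprints: sets of feature identifiers.
known_active = {1, 2, 3, 4}
database = {
    "compound_a": {1, 2, 3, 4},
    "compound_b": {1, 2, 3, 5},
    "compound_c": {7, 8},
}
hits = similarity_search(known_active, database)
```

The threshold trades recall against the number of compounds that have to be followed up; 0.5 is an arbitrary value chosen for this example.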

Ligand based strategy


Search for similar compounds

database

known actives

structures found

Components of molecular docking


A) Search algorithm
To find the best conformation of the ligand
and the protein system.
Rigid and flexible docking
B) Scoring function
Rank the ligands according to the interaction energy.
Based on the energy force-field function.

Success with vHTS

Dihydrofolate reductase inhibitor (1992)
HIV-protease (1992)
Phospholipase A2 (1994)
Thrombin (1996)
Carbonic anhydrase inhibitors (2002)

Ligand-based drug discovery

No a priori knowledge of the receptor
What information can we get from a few active compounds?

Drug-likeness: what makes a drug a drug?

Lipinski's rule of 5: most drugs have fewer than two violations of the following rules:
Lipinski et al., Advanced Drug Delivery Reviews, Volume 46, Issues 1-3, 1 March 2001, Pages 3-26

* ≤ 10 hydrogen bond acceptors
* ≤ 5 hydrogen bond donors
* Molecular weight ≤ 500
* A partition coefficient log P ≤ 5

(Not in the original paper) 2-8 rotatable bonds (entropy)

http://www.daylight.com/dayhtml/doc/clogp/#PCMsc1.2

benzene = 1.39
methanol = -.32
phenol = 1.4

~30% exceptions, e.g. taxol
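The four rules translate directly into a violation count. A minimal sketch, assuming the property values have already been computed elsewhere (function and argument names are hypothetical, and the example values are made up rather than taken from real compounds):

```python
def lipinski_violations(h_acceptors, h_donors, mol_weight, log_p):
    """Count how many of the four rule-of-5 limits a molecule exceeds."""
    return sum([
        h_acceptors > 10,
        h_donors > 5,
        mol_weight > 500,
        log_p > 5,
    ])

def passes_rule_of_five(h_acceptors, h_donors, mol_weight, log_p):
    """Drug-like in the rule-of-5 sense: fewer than two violations."""
    return lipinski_violations(h_acceptors, h_donors, mol_weight, log_p) < 2

small_molecule_ok = passes_rule_of_five(h_acceptors=4, h_donors=1, mol_weight=180.2, log_p=1.2)
large_molecule_ok = passes_rule_of_five(h_acceptors=12, h_donors=6, mol_weight=600.0, log_p=3.0)
```

Because of the exceptions noted above, a filter like this is better used to flag compounds than to reject them outright.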

Overview of the lecture


Introduction to molecular docking:

Definition
Types
Some techniques
Programs

Algorithm for protein-protein docking based on the paper:
Protein-Protein Docking with Simultaneous Optimization of Rigid-body Displacement and Side-chain Conformations
Jeffrey J. Gray, Stewart Moughon, Chu Wang, Ora Schueler-Furman, Brian Kuhlman, Carol A. Rohl and David Baker
J. Mol. Biol. (2003) 331, 281-299

What is Docking?

Docking attempts to find the best matching between two molecules

A more serious definition:
Given two biological molecules determine:
- Whether the two molecules interact
- If so, what is the orientation that maximizes the interaction while minimizing the total energy of the complex

Goal: To be able to search a database of molecular structures and retrieve all molecules that can interact with the query structure

Why is docking important?


It is of extreme relevance in cellular biology,
where function is accomplished by proteins
interacting with themselves and with other
molecular components
It is the key to rational drug design: The results
of docking can be used to find inhibitors for
specific target proteins and thus to design new
drugs. It is gaining importance as the number of
proteins whose structure is known increases

Types of Docking studies


Protein-Protein Docking
Both molecules usually considered rigid
6 degrees of freedom
First apply steric constraints to limit search space and then examine energetics of possible binding conformations

Protein-Ligand Docking
Flexible ligand, rigid-receptor
Search space much larger
Either reduce the flexible ligand to rigid fragments connected by one or several hinges, or search the conformational space using Monte Carlo methods or molecular dynamics

Docking Programs
More information in:
http://www.bmm.icnet.uk/~smithgr/soft.html

The programs are:
DOCK (I. D. Kuntz, UCSF)
AutoDock (Arthur Olson, The Scripps Research Institute)
RosettaDock (Baker, University of Washington; Gray, Johns Hopkins Univ.)

Important Points in Drug Design based on Bioinformatics Tools

Chemical Modification of Known Drugs
Drug improvement by chemical modification
Penicillin G -> Methicillin; morphine -> nalorphine

Receptor-based drug design
Receptor is the target (usually a protein)
Drug molecule binds to cause biological effects
It is also called the lock and key system
Structure determination of receptor is important

Ligand-based drug design
Search for a lead compound or active ligand
Structure of the ligand guides the drug design process

Important Points in Drug Design based on Bioinformatics Tools

Identify Target Disease


Identify and study the lead compounds
Marginally useful and may have severe side effects

Refinement of the chemical structures

Detect the Molecular Bases for Disease


Detection of drug binding site
Tailor drug to bind at that site
Protein modeling techniques
Traditional Method (brute force testing)

Drug Discovery & Development

Identify disease
Isolate protein involved in disease (2-5 years)
Find a drug effective against disease protein (2-5 years)
Preclinical testing (1-3 years)
Formulation
File IND
Human clinical trials (2-10 years)
Scale-up
File NDA
FDA approval (2-3 years)
Technology is impacting this process

Stages: Identify disease, Isolate protein, Find drug, Preclinical testing

GENOMICS, PROTEOMICS & BIOPHARM.
Potentially producing many more targets and "personalized" targets

HIGH THROUGHPUT SCREENING
Screening up to 100,000 compounds a day for activity against a target protein

VIRTUAL SCREENING
Using a computer to predict activity

COMBINATORIAL CHEMISTRY
Rapidly producing vast numbers of compounds

MOLECULAR MODELING
Computer graphics & models help improve activity

IN VITRO & IN SILICO ADME MODELS
Tissue and computer models begin to replace animal testing

Molecular docking-definition
It is a process by which two molecules are put together in three dimensions
Best ways to put two molecules together
Using molecular modeling and computational chemistry tools

Molecular docking
Docking used for finding binding
modes of protein with
ligands/inhibitors
In molecular docking, we attempt to
predict the structure of the
intermolecular complex formed
between two or more molecules
Docking algorithms are able to
generate a large number of possible
structures
We use force field based strategy to

Steps of molecular docking


Three steps
(1) Definition of the structure of the target molecule
(2) Location of the binding site
(3) Determination of the binding mode

Some examples
Ritonavir (trade name Norvir) is one
of a class of anti-HIV drugs called
protease inhibitors
Saquinavir
Indinavir is another example of a very potent peptidomimetic compound discovered using the elements of 3D structure and Structure-Activity Relationship (SAR)
