Sequence Alignment
DRUG DESIGN
Dr. M Indira
Associate Professor
Department of Biotechnology
Vignan University
SYLLABUS
Protein structure prediction;
Introduction to comparative modeling;
Sequence alignment;
Constructing and evaluating a comparative model;
Predicting protein structures by 'threading';
Molecular docking - AUTODOCK/EASYMODELLER and HEX;
Structure based de novo ligand design;
Drug discovery;
Chemoinformatics; QSAR.
OUTLINE
• Sequence Alignment
• Types of a sequence alignment
• Methods of sequence alignment
• Dot Matrix method
• Dynamic programming method
• Word method or k-tuple method
Definition of sequence alignment
Global alignment
Applications:
- Comparing two genes with the same function (e.g., in human vs. mouse).
Local alignment
Note: for local matching, overhangs at the ends are not treated as gaps.
Applications:
- Searching for local similarities in large sequences (e.g., newly sequenced genomes).
- Looking for conserved domains or motifs in two proteins.
Types of Sequence Alignment
• Global alignment
L G P S S K Q T G K G S - S R I W D N
L N - I T K S A G K G A I M R L G D A
• Local alignment
- - - - - - - T G K G - - - - - - - -
- - - - - - - A G K G - - - - - - - -
Method of sequence alignment
• Dynamic programming is the process of solving problems where one needs to find
the best decisions one after another.
• It was introduced by Richard Bellman in the 1950s.
• The word "programming" here denotes finding an acceptable plan of action,
not computer programming.
• It is useful in aligning the nucleotide sequences of DNA and the amino acid
sequences of the proteins coded by that DNA.
• Dynamic programming is a three-step process that involves:
1) Breaking the problem into small subproblems.
2) Solving the subproblems using recursive methods.
3) Constructing an optimal solution to the original problem from the optimal
solutions of the subproblems.
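The three steps can be sketched in a few lines of Python. This is only an illustration under assumed parameters: the function name best_global_score and the scoring values (match +1, mismatch -1, gap -1) are hypothetical and do not come from the slides. The subproblem is "the best score for aligning the first i characters of one sequence with the first j characters of the other"; the recursion solves it, and a cache makes sure each subproblem is solved only once.

```python
from functools import lru_cache


def best_global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (assumed toy scoring)."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Step 1: the subproblem is aligning the prefixes a[:i] and b[:j].
        if i == 0:
            return j * gap  # only gaps can pair with the rest of b
        if j == 0:
            return i * gap  # only gaps can pair with the rest of a
        pair = match if a[i - 1] == b[j - 1] else mismatch
        # Step 2: solve the smaller subproblems recursively.
        # Step 3: combine their optimal scores into the optimum for (i, j).
        return max(
            solve(i - 1, j - 1) + pair,  # align a[i-1] with b[j-1]
            solve(i - 1, j) + gap,       # align a[i-1] with a gap
            solve(i, j - 1) + gap,       # align b[j-1] with a gap
        )

    return solve(len(a), len(b))


print(best_global_score("ACGT", "AGT"))  # 3 matches and 1 gap -> 2
```

The recursive form is convenient for short sequences; for long sequences the same recurrence is usually filled in as a table to avoid deep recursion.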
Dynamic programming algorithm for sequence alignment
•The method compares every pair of characters in the two sequences and
generates an alignment that is the best, or optimal, under the chosen scoring system.
•This is a highly computationally demanding method. However, recent
algorithmic improvements and ever-increasing computer capacity make it possible
to align a query sequence against a large database in a few minutes.
•Each alignment has its own score, and it is essential to recognise that several
different alignments may have identical or nearly identical scores, so
dynamic programming methods may produce more than one optimal
alignment. Careful choice of the scoring parameters is therefore important
and may help to discriminate between alignments with similar scores.
•Global alignment programs are based on the Needleman-Wunsch algorithm and local
alignment programs on the Smith-Waterman algorithm. Both algorithms are derived from the basic
dynamic programming algorithm.
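The relationship between the two algorithms can be illustrated with one table-filling routine. The sketch below is a simplification under assumed parameters: the function name alignment_score, the scoring values (match +1, mismatch -1, linear gap penalty -2) and the test sequences are hypothetical, and real programs use a substitution matrix instead. In global (Needleman-Wunsch style) mode the first row and column carry gap penalties and the answer is the bottom-right cell; in local (Smith-Waterman style) mode scores are never allowed to fall below zero and the answer is the highest cell anywhere in the table.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score of a and b under an assumed toy scoring."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    if not local:
        # Global mode: leading gaps are penalised.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                table[i - 1][j - 1] + pair,  # diagonal: align two characters
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
            if local:
                score = max(score, 0)        # local mode: restart when negative
            table[i][j] = score
            best = max(best, score)

    return best if local else table[-1][-1]


seq1 = "AAATGKGCCC"  # hypothetical sequences sharing only the motif TGKG
seq2 = "DDDTGKGEEE"
print(alignment_score(seq1, seq2))              # global score: -2
print(alignment_score(seq1, seq2, local=True))  # local score: 4
```

The global score is dragged down by the mismatching flanks, while the local score reflects only the shared motif, which is why local alignment suits the search for conserved domains.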
Description of the dynamic programming algorithm
•The alignment procedure depends on a scoring system, which can be based on the
probability that 1) a particular amino acid pair is found in alignments of related
proteins (pxy); 2) the same amino acid pair is aligned by chance (pxpy); 3) the
introduction of a gap would be a better choice because it increases the score.
•The ratio of the first two probabilities is usually provided, as a log-odds score, in an amino acid
substitution matrix. There are many such matrices; two of them, PAM and
BLOSUM, are considered later.
•The scores for introducing and extending a gap are chosen to suit the
matrix and represent prior knowledge and some assumptions. One of them is
quite simple: if the gap penalty is too high, a reasonable alignment
between slightly different sequences will never be achieved, but if it is too low,
a meaningful optimal alignment is hardly possible. Other assumptions are based on
sophisticated statistical procedures.
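How the ratio pxy / (px py) becomes a matrix entry can be shown with a short calculation. This is a sketch with made-up numbers: the function name log_odds_score and the probabilities are hypothetical, and the scale factor of 2 (half-bit units with a base-2 logarithm) is an assumption chosen for illustration, since different matrices use different scalings.

```python
import math


def log_odds_score(p_xy, p_x, p_y, scale=2.0):
    """Rounded, scaled log-odds score: scale * log2(p_xy / (p_x * p_y))."""
    return round(scale * math.log2(p_xy / (p_x * p_y)))


# Hypothetical background frequencies for amino acids x and y.
p_x, p_y = 0.05, 0.05

print(log_odds_score(0.01, p_x, p_y))      # pair seen 4x more often than chance -> 4
print(log_odds_score(0.0025, p_x, p_y))    # pair seen exactly as often as chance -> 0
print(log_odds_score(0.000625, p_x, p_y))  # pair seen 4x less often than chance -> -4
```

Because the entries are logarithms, the scores of successive aligned pairs can simply be added, which corresponds to multiplying the underlying probability ratios.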
Derivation of the dynamic programming algorithm
1. Score of new alignment = Score of previous alignment (A) + Score of new aligned pair
   V D S - C Y       V D S - C       Y
   V E S L C Y   =   V E S L C   +   Y
        15       =       8       +   7
2. Score of alignment (A) = Score of previous alignment (B) + Score of new aligned pair
   V D S - C       V D S -       C
   V E S L C   =   V E S L   +   C
        8      =     -1      +   9
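The two steps above can be reproduced by adding one column score at a time. In the sketch below the substitution scores are the BLOSUM62 entries for the pairs that occur in this example (V/V 4, D/E 2, S/S 4, C/C 9, Y/Y 7); the gap penalty of -11 is an assumption, chosen because it reproduces the value -1 shown for alignment (B). The names PAIR_SCORES, GAP_PENALTY, column_score and running_scores are hypothetical.

```python
# BLOSUM62 entries for the pairs that occur in this example only.
PAIR_SCORES = {
    ("V", "V"): 4,
    ("D", "E"): 2,
    ("S", "S"): 4,
    ("C", "C"): 9,
    ("Y", "Y"): 7,
}
GAP_PENALTY = -11  # assumed value, consistent with the score -1 for (B)


def column_score(a, b):
    """Score of one aligned column: a gap penalty or a substitution score."""
    if a == "-" or b == "-":
        return GAP_PENALTY
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]  # the matrix is symmetric


def running_scores(top, bottom):
    """Cumulative score after each column of a gapped alignment."""
    total = 0
    scores = []
    for a, b in zip(top, bottom):
        total += column_score(a, b)  # previous score + new aligned pair
        scores.append(total)
    return scores


print(running_scores("VDS-CY", "VESLCY"))  # [4, 6, 10, -1, 8, 15]
```

The last three values are the scores of alignment (B), alignment (A) and the new alignment, so each score is the previous one plus the score of the newly added column.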
Amino acids are grouped according to the chemistry of the side group: (C) sulfhydryl; (STPAG)
small hydrophilic; (NDEQ) acid, acid amide and hydrophilic; (HRK) basic; (MILV) small hydrophobic;
and (FYW) aromatic. Log odds values: a positive value such as +10 means that the pair is more likely to come
from a common ancestor than from chance; 0 means that the two probabilities are equal; a negative value such
as -4 means that the pair is more likely a chance alignment. Thus the score of the alignment YY/YY is
10 + 10 = 20, whereas that of YY/TP is -3 - 5 = -8, a rare and unexpected alignment between homologous sequences.
Scoring matrices: BLOSUM62
(BLOcks amino acid SUbstitution Matrices)
The idea behind BLOSUM is similar, but it is calculated from a very different and much larger set
of proteins, which are much more similar to one another and form blocks of proteins with a similar pattern.
Formal description of dynamic programming algorithm
$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) \\
\max_{x \ge 1} \left( S_{i-x,j} - w_x \right) \\
\max_{y \ge 1} \left( S_{i,j-y} - w_y \right)
\end{cases}
$$
•This recurrence gives the moves that are possible to reach a certain position (i,j): from the
previous row and column at position (i-1,j-1), or from any earlier position in the same column or row.
•The diagonal move carries no gap penalty; a move from any other position in column j or row i carries a
gap penalty (w_x or w_y) that depends on the size of the gap.
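The moves described above translate directly into code. The sketch below is a straightforward, unoptimised rendering for global alignment: the function names align_score, simple_pair_score and affine_gap are hypothetical, and both the pair scores (match +1, mismatch -1) and the gap penalty w(k) = 2 + (k - 1) are assumptions chosen for illustration. The first row and column are initialised with the penalty of a single leading gap.

```python
def align_score(a, b, s, w):
    """Global alignment score with a gap penalty w(k) for a gap of size k."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -w(i)  # a[:i] aligned against one gap of size i
    for j in range(1, m + 1):
        S[0][j] = -w(j)  # b[:j] aligned against one gap of size j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Diagonal move from (i-1, j-1): no gap penalty.
            diagonal = S[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            # Move within column j from row i-x: gap of size x.
            vertical = max(S[i - x][j] - w(x) for x in range(1, i + 1))
            # Move within row i from column j-y: gap of size y.
            horizontal = max(S[i][j - y] - w(y) for y in range(1, j + 1))
            S[i][j] = max(diagonal, vertical, horizontal)
    return S[n][m]


def simple_pair_score(x, y):
    """Assumed toy scoring: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def affine_gap(k):
    """Assumed penalty: 2 to open a gap plus 1 for each extra position."""
    return 2 + (k - 1)


print(align_score("ACGT", "AGT", simple_pair_score, affine_gap))  # 3 matches - w(1) = 1
```

Scanning every earlier position in the row and column makes this version cubic in the sequence length; it is written this way only to mirror the recurrence term by term.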
Word Method or K-tuple method