Sequence Alignment

UNIT II – STRUCTURE PREDICTION AND DRUG DESIGN

Dr. M Indira
Associate Professor
Department of Biotechnology
Vignan University
SYLLABUS
• Protein structure prediction;
• Introduction to comparative modeling;
• Sequence alignment;
• Constructing and evaluating a comparative model;
• Predicting protein structures by 'threading';
• Molecular docking - AUTODOCK/EASYMODELLER and HEX;
• Structure based de novo ligand design;
• Drug discovery;
• Chemoinformatics; QSAR.
OUTLINE

• Sequence alignment
• Types of sequence alignment
• Methods of sequence alignment
• Dot matrix method
• Dynamic programming method
• Word method or k-tuple method
Definition of sequence alignment

• Sequence alignment is a way of arranging sequences of DNA, RNA or protein to identify regions of similarity. The similarity may indicate the functional, structural and evolutionary significance of the sequences.
• The alignment is made between a known sequence and an unknown sequence, or between two unknown sequences.
• The known sequence is called the reference sequence. The unknown sequence is called the query sequence.

Interpretation of sequence alignment

• Sequence alignment is useful for discovering structural, functional and evolutionary information.
• Sequences that are very much alike may have similar secondary and 3D structure, similar function and likely a common ancestral sequence. It is extremely unlikely that such sequences acquired their similarity by chance. For DNA molecules with $n$ nucleotides the probability of a chance match is very low, $P = 4^{-n}$. For proteins the probability is even lower, $P = 20^{-n}$, where $n$ is the number of amino acid residues.
• Large-scale genome studies have revealed horizontal transfer of genes and other sequences between species, which may cause similarity between some sequences in very distant species.
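
To give a feel for how small the chance-match probabilities in the second bullet are, the short Python sketch below evaluates them for a few sequence lengths. It assumes all residues are equally likely and independent, which is a simplification; the function name `chance_identity_probability` is ours and does not come from the slides.

```python
def chance_identity_probability(n: int, alphabet_size: int) -> float:
    """Probability that two random sequences of length n are identical,
    assuming equally likely, independent residues (a simplification)."""
    return float(alphabet_size) ** (-n)


# Compare DNA (4 letters) with protein (20 letters) for a few lengths.
for n in (5, 10, 20):
    p_dna = chance_identity_probability(n, 4)
    p_protein = chance_identity_probability(n, 20)
    print(f"n={n:2d}  DNA: {p_dna:.3e}  protein: {p_protein:.3e}")
```

Even for a stretch of only 10 residues, the protein value is several orders of magnitude below the DNA value, which is why long identical stretches are taken as evidence of relatedness rather than chance.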
Types of Sequence Alignment
• Sequence alignment is of two types, namely:
  • Global alignment
  • Local alignment
• Global alignment: matching the residues of two sequences across their entire length.
• Global alignment is best suited to sequences that are identical or closely similar along their whole length.
• Local alignment: matching the regions of two sequences that are most similar to each other.
Types of Sequence Alignment

• Global alignment
  • Input: treat the two sequences as potentially equivalent
  • Goal: identify conserved regions and differences
  • Applications:
    - Comparing two genes with the same function (in human vs. mouse).
    - Comparing two proteins with similar function.

Types of Sequence Alignment
• Local alignment
  • Input: the two sequences may or may not be related
  • Goal: see whether a substring in one sequence aligns well with a substring in the other
  • Note: for local matching, overhangs at the ends are not treated as gaps
  • Applications:
    - Searching for local similarities in large sequences (e.g., newly sequenced genomes).
    - Looking for conserved domains or motifs in two proteins.
Types of Sequence Alignment
• Global alignment
  L G P S S K Q T G K G S - S R I W D N
  L N - I T K S A G K G A I M R L G D A

• Local alignment
  - - - - - - - T G K G - - - - - - - -
  - - - - - - - A G K G - - - - - - - -
Methods of sequence alignment

• Dot matrix method
• The dynamic programming (DP) algorithm
• Word or k-tuple methods
Dot matrix analysis
• A dot matrix is a grid in which the matching nucleotides of two DNA sequences are represented as dots.
• It is also called a dot plot.
• It is a pairwise sequence alignment method carried out on the computer.
• Before comparison, all dots appear colourless on the computer screen.
• In a dot matrix, the nucleotides of one sequence are written from left to right along the top row, and those of the other sequence from top to bottom down the left side (column) of the matrix. At every point where the two nucleotides are the same, the dot at the intersection of row and column becomes a dark dot. Taken together, the darkened dots give a graph called a dot plot, which is a kind of recurrence plot; runs of dots along a diagonal mark regions of similarity. Each dot in the plot represents a matching nucleotide or amino acid.
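
The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular dot plot tool: the function names `dot_matrix` and `render` and the example sequences are hypothetical, and no window or threshold filtering is applied.

```python
def dot_matrix(seq_top: str, seq_left: str) -> list[list[bool]]:
    """Return a grid with True wherever the two residues are the same.
    Rows follow seq_left (top to bottom), columns follow seq_top (left to right)."""
    return [[a == b for b in seq_top] for a in seq_left]


def render(seq_top: str, seq_left: str) -> str:
    """Draw the grid as text: '*' for a dark dot, '.' for a blank cell."""
    grid = dot_matrix(seq_top, seq_left)
    lines = ["  " + " ".join(seq_top)]
    for residue, row in zip(seq_left, grid):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Two hypothetical sequences that differ at one position.
print(render("GATTACA", "GATCACA"))
```

Because every residue of one sequence is compared with every residue of the other, the work grows with the product of the two lengths, which is why the method becomes slow for large sequences.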
Dot matrix analysis
• The dot matrix method is a qualitative and simple way to analyze sequences. However, it takes much time to analyze large sequences.
• The dot matrix method is useful for the following studies:
  • Sequence similarity between two nucleotide sequences or two amino acid sequences.
  • Insertion of short stretches in a DNA or amino acid sequence.
Dot matrix analysis: two identical sequences
[Figure: nucleic acid dot plot]
Dot matrix analysis: two very different sequences
[Figure: nucleic acid dot plot of genes]
Dot matrix analysis: two similar sequences
[Figure: nucleic acid dot plot of genes]
Dynamic Programming Method

• Dynamic programming is a method for solving problems in which the best decisions must be found one after another.
• It was introduced by Richard Bellman in the 1950s.
• The word "programming" here denotes finding an acceptable plan of action, not computer programming.
• It is useful in aligning nucleotide sequences of DNA and amino acid sequences of the proteins coded by that DNA.
• Dynamic programming is a three-step process that involves:
1) Breaking the problem into small subproblems.
2) Solving the subproblems using recursive methods.
3) Constructing the optimal solution to the original problem from the optimal solutions of the subproblems.
Dynamic programming algorithm for sequence alignment

• The method compares every pair of characters in the two sequences and generates an alignment that is the best, or optimal, under the chosen scoring system.
• This is a highly computationally demanding method. However, the latest algorithmic improvements and ever-increasing computer capacity make it possible to align a query sequence against a large database in a few minutes.
• Each alignment has its own score, and it is essential to recognise that several different alignments may have nearly identical scores, which is an indication that dynamic programming methods may produce more than one optimal alignment. Careful choice of some parameters is important and may discriminate between alignments with similar scores.
• The global alignment program is based on the Needleman-Wunsch algorithm and the local alignment program on the Smith-Waterman algorithm. Both algorithms are derivatives of the basic dynamic programming algorithm.
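
As a rough illustration of how the two algorithms differ, the Python sketch below computes only the optimal score (no traceback) for each. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for the example, not values from the slides; a real program would use a substitution matrix such as PAM or BLOSUM.

```python
def score_pair(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Toy substitution score (assumed values, not from a real matrix)."""
    return match if a == b else mismatch


def needleman_wunsch_score(s: str, t: str, gap: int = -2) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    rows, cols = len(s) + 1, len(t) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * gap  # gaps at the ends are penalised
    for j in range(1, cols):
        S[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            S[i][j] = max(
                S[i - 1][j - 1] + score_pair(s[i - 1], t[j - 1]),  # align pair
                S[i - 1][j] + gap,                                 # gap in t
                S[i][j - 1] + gap,                                 # gap in s
            )
    return S[-1][-1]  # score of aligning the full sequences


def smith_waterman_score(s: str, t: str, gap: int = -2) -> int:
    """Optimal local alignment score: same recurrence, floored at zero."""
    rows, cols = len(s) + 1, len(t) + 1
    S = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            S[i][j] = max(
                0,  # start a fresh local alignment here
                S[i - 1][j - 1] + score_pair(s[i - 1], t[j - 1]),
                S[i - 1][j] + gap,
                S[i][j - 1] + gap,
            )
            best = max(best, S[i][j])
    return best  # best-scoring pair of substrings


print(needleman_wunsch_score("ACGT", "AGT"))
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

The only differences between the two functions are the initialisation of the first row and column, the floor at zero, and where the final score is read from, which mirrors the statement that both derive from the same basic algorithm.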
Description of the dynamic programming algorithm

• The alignment procedure depends upon a scoring system, which can be based on the probability that 1) a particular amino acid pair is found in alignments of related proteins ($p_{xy}$); 2) the same amino acid pair is aligned by chance ($p_x p_y$); 3) introduction of a gap would be a better choice as it increases the score.
• The ratio of the first two probabilities is usually provided in an amino acid substitution matrix. There are many such matrices; two of them, PAM and BLOSUM, are considered later.
• The scores for gap introduction and extension are also calculated from the matrices and represent prior knowledge and some assumptions. One of them is quite simple: if the negative cost of a gap is too high, a reasonable alignment between slightly different sequences will never be achieved, but if it is too low, an optimal alignment is hardly possible. Other assumptions are based on sophisticated statistical procedures.
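
The ratio of the two probabilities is normally stored as a rounded logarithm (a log-odds score). The Python sketch below shows the arithmetic under stated assumptions: the base-2 logarithm and the scaling factor of 2 (half-bit units) are one common convention, and the probabilities used are made-up numbers for illustration, not values from a real matrix.

```python
import math


def log_odds_score(p_xy: float, p_x: float, p_y: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(p_xy / (p_x * p_y)).
    The base and scale are assumed conventions, not taken from the slides."""
    return round(scale * math.log2(p_xy / (p_x * p_y)))


# Hypothetical background frequencies of 0.05 for both residues.
print(log_odds_score(0.02, 0.05, 0.05))      # pair seen more often than chance
print(log_odds_score(0.0025, 0.05, 0.05))    # pair seen exactly as often as chance
print(log_odds_score(0.000625, 0.05, 0.05))  # pair seen less often than chance
```

A positive score means the pair occurs in related proteins more often than expected by chance, zero means the two probabilities are equal, and a negative score means the pair is rarer than chance would predict.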
Derivation of the dynamic programming algorithm

1. Score of new alignment = Score of previous alignment (A) + Score of new aligned pair
   V D S - C Y     V D S - C     Y
   V E S L C Y     V E S L C     Y
   15            = 8           + 7

2. Score of alignment (A) = Score of previous alignment (B) + Score of new aligned pair
   V D S - C       V D S -       C
   V E S L C       V E S L       C
   8             = -1          + 9

3. Repeat removing aligned pairs until the end of the alignments is reached.
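
The additive property used in this derivation can be checked with a small Python sketch that accumulates the score column by column. The pair scores below are the BLOSUM62 entries for the residue pairs in the example; the gap penalty of -11 is an assumption, chosen because it reproduces the slide's totals of -1, 8 and 15. The names `PAIR_SCORES`, `GAP_PENALTY` and `running_scores` are ours.

```python
# BLOSUM62 entries for the residue pairs that occur in the example.
PAIR_SCORES = {
    ("V", "V"): 4,
    ("D", "E"): 2,
    ("S", "S"): 4,
    ("C", "C"): 9,
    ("Y", "Y"): 7,
}
GAP_PENALTY = -11  # assumption: value that reproduces the slide's totals


def running_scores(top: str, bottom: str) -> list[int]:
    """Cumulative alignment score after each aligned column."""
    totals, total = [], 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += GAP_PENALTY          # column containing a gap
        else:
            total += PAIR_SCORES[(a, b)]  # column with an aligned pair
        totals.append(total)
    return totals


print(running_scores("VDS-CY", "VESLCY"))  # [4, 6, 10, -1, 8, 15]
```

Reading the list from right to left reproduces the derivation: 15 = 8 + 7 for the Y/Y pair, and 8 = -1 + 9 for the C/C pair.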

Scoring matrices: PAM (Percent Accepted Mutation)

Amino acids are grouped according to the chemistry of the side group: (C) sulfhydryl; (STPAG) small hydrophilic; (NDEQ) acid, acid amide and hydrophilic; (HRK) basic; (MILV) small hydrophobic; and (FYW) aromatic. Log-odds values: +10 means that the ancestor probability is greater, 0 means that the probabilities are equal, and -4 means that the change is random. Thus the score of the alignment YY/YY is 10+10=20, whereas that of YY/TP is -3-5=-8, a rare and unexpected alignment between homologous sequences.
Scoring matrices: BLOSUM62
(BLOcks amino acid SUbstitution Matrices)

The idea behind BLOSUM is similar, but it is calculated from a very different and much larger set of proteins, which are much more similar and form blocks of proteins with a similar pattern.
Formal description of the dynamic programming algorithm

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) \\
\max_{x \ge 1} \left( S_{i-x,j} - w_x \right) \\
\max_{y \ge 1} \left( S_{i,j-y} - w_y \right)
\end{cases}
$$

• This recurrence (drawn as a diagram on the original slide) indicates the moves that are possible to reach a certain position (i,j): starting from the previous row and column at position (i-1,j-1), or from any position in the same row or column.
• The diagonal move carries no gap penalty; a move from any other position in column j or row i carries a gap penalty that depends on the size of the gap.
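
The moves just described (a diagonal step scored by s(a_i, b_j), or a jump of any length along row i or column j charged a size-dependent penalty w) can be written out directly. The Python sketch below is one straightforward, unoptimised reading of that recurrence for global alignment; the function name, the callable parameters `s` and `w`, and the example scoring values are assumptions made for illustration.

```python
from typing import Callable


def general_gap_score(
    a: str,
    b: str,
    s: Callable[[str, str], int],
    w: Callable[[int], int],
) -> int:
    """Global alignment score where a gap of length k costs w(k) (w >= 0)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -w(i)  # a single leading gap of length i
    for j in range(1, m + 1):
        S[0][j] = -w(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = S[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            # Arrive from any earlier row in column j (gap of length x).
            vertical = max(S[i - x][j] - w(x) for x in range(1, i + 1))
            # Arrive from any earlier column in row i (gap of length y).
            horizontal = max(S[i][j - y] - w(y) for y in range(1, j + 1))
            S[i][j] = max(diagonal, vertical, horizontal)
    return S[n][m]


# Example with an assumed affine penalty: 2 to open a gap plus 1 per residue.
def match_score(x: str, y: str) -> int:
    return 2 if x == y else -1


print(general_gap_score("ACGT", "AT", match_score, lambda k: 2 + k))
```

Scanning every earlier cell in the row and column makes this version cubic in the sequence length; with a simple linear penalty the inner maxima reduce to the single-step moves of the usual quadratic algorithm.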
Word Method or K-tuple method

• It is a heuristic method that is not guaranteed to find an optimal alignment, but it is much more efficient than dynamic programming.
• This method is useful in large-scale database searches to find whether there is a significant match with the query sequence.
• The word method is used in the database search tools FASTA and the BLAST family.
• They identify a series of short, non-overlapping subsequences (words) of the query sequence.
• These words are then matched against candidate database sequences to obtain the result.
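
The word lookup step can be sketched as follows. This is a simplified illustration of the idea only, not the actual FASTA or BLAST implementation: it finds exact word matches and nothing more. The function names are hypothetical, and the `step` parameter is our addition; it defaults to the word length so that the query words do not overlap, as the slide describes.

```python
from collections import defaultdict
from typing import Optional


def query_words(query: str, k: int, step: Optional[int] = None) -> dict[str, list[int]]:
    """Map each length-k word of the query to its start positions.
    step defaults to k, which gives non-overlapping words."""
    step = k if step is None else step
    index: dict[str, list[int]] = defaultdict(list)
    for start in range(0, len(query) - k + 1, step):
        index[query[start:start + k]].append(start)
    return dict(index)


def word_hits(query: str, subject: str, k: int,
              step: Optional[int] = None) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every exact word match."""
    index = query_words(query, k, step)
    hits = []
    for j in range(len(subject) - k + 1):
        # Look up the subject word starting at j in the query index.
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


print(word_hits("ACGTAC", "TTACGT", k=2))  # [(0, 2), (4, 2), (2, 4)]
```

Hits that share the same difference between subject and query position lie on one diagonal of the dot matrix, and a database search tool would examine such clusters further rather than aligning every residue pair.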
Word Method or K-tuple method
• In the FASTA method, the user defines a value k to use as the word length with which to search the database. The method is slower but more sensitive at lower values of k, which are also preferred for searches involving a very short query sequence.
• The BLAST family provides a number of algorithms optimized for particular types of queries, such as searching for distantly related sequence matches.
• BLAST is a good alternative to FASTA. However, as a heuristic it does not guarantee the most accurate results.
• Like FASTA, BLAST uses a word search of length k, but evaluates only the most significant word matches rather than every word match.
