Introduction-To-Computational Biology
Bioinformatics
Introduction to Bioinformatics.
LECTURE 3: SEQUENCE ALIGNMENT
Homeoboxes and Master regulatory genes
HOMEO BOX
Drosophila melanogaster: HOX homeoboxes
Drosophila melanogaster: PAX homeoboxes
Homeoboxes and Master regulatory genes
* prediction of function
* database searching
* gene finding
* sequence divergence
* sequence assembly
HOMOLOGOUS and PARALOGOUS
HOMOLOGOUS and PARALOGOUS versus ANALOGOUS
LECTURE 3: SEQUENCE ALIGNMENT: sequence similarity
The biological problem of
sequence alignment
Alignment:
DNA-sequence-1  tcctctgcctctgccatcat---caaccccaaagt
                |||| ||| ||||| |||||   ||||||||||||
DNA-sequence-2  tcctgtgcatctgcaatcatgggcaaccccaaagt
Sequence alignment - definition
The sequences are padded with gaps (dashes) so that wherever possible,
columns contain identical characters from the sequences involved
tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
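As a minimal sketch of this definition, the Python snippet below rebuilds the row of match bars for two gapped sequences of equal length and counts the identical columns. The function name `match_line` is an illustrative assumption and is not part of the lecture.

```python
def match_line(a: str, b: str, gap: str = "-") -> str:
    """Return '|' under columns with identical characters and ' ' elsewhere."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if x == y and x != gap else " " for x, y in zip(a, b))


seq1 = "tcctctgcctctgccatcat---caaccccaaagt"
seq2 = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
bars = match_line(seq1, seq2)

print(seq1)
print(bars)
print(seq2)
print("identical columns:", bars.count("|"))  # 29 of 35 columns
```

The three gap columns and the three mismatching columns are left blank, so the bars line up under the sequence characters.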
Algorithms
Needleman-Wunsch
Pairwise global alignment only.
Smith-Waterman
Pairwise, local (or global) alignment.
BLAST
Pairwise heuristic local alignment
Pairwise alignment
Pairwise sequence alignment methods are concerned with finding the best-
matching piecewise local or global alignments of protein (amino acid) or DNA
(nucleic acid) sequences.
Global alignment
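A global alignment spans the full length of both sequences, so it is mostly
useful for closely related sequences. Because such sequences are also easily
identified by local alignment methods, global alignment is now somewhat
deprecated as a technique.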
Global Alignment
         V I V A L A S V E G A S
A(s,t) = | | | |   |   |       |
         V I V A D A - V - - I S
(the dashes mark indels)
The Needleman-Wunsch algorithm
Alignment scoring function
Alignment cost
A simple scoring function
σ(-,a) = σ(a,-) = -1
σ(a,b) = -1 if a ≠ b
σ(a,b) = 1 if a = b
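A short Python sketch may make the scoring function concrete. It sums σ over the columns of the global alignment example shown earlier (VIVALASVEGAS against VIVADA-V--IS). The function names `sigma` and `alignment_score` are assumptions chosen for illustration.

```python
def sigma(a: str, b: str) -> int:
    """Simple scoring function: +1 for a match, -1 for a mismatch or a gap."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def alignment_score(s_aligned: str, t_aligned: str) -> int:
    """Sum sigma over all columns of a gapped alignment."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    return sum(sigma(a, b) for a, b in zip(s_aligned, t_aligned))


# 7 matches, 2 mismatches (L/D, A/I) and 3 gaps: 7 - 2 - 3 = 2
print(alignment_score("VIVALASVEGAS", "VIVADA-V--IS"))  # 2
```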
The substitution matrix
      A    G    C    T
A    10   -1   -3   -4
G    -1    7   -5   -3
C    -3   -5    9    0
T    -4   -3    0    8
Example:
         V I V A L A S V E G A S
A(s,t) = | | | |   |   |       |
         V I V A D A - V - - I S
The optimal alignment A* of s and t is the alignment with the highest score M(A):
A* = argmax_A M(A)
Local alignment methods find related regions within sequences - they can
consist of a subset of the characters within each sequence.
This is a more flexible technique than global alignment and has the advantage
that related regions which appear in a different order in the two proteins (which is
known as domain shuffling) can be identified as being related.
The Smith-Waterman algorithm
The Smith-Waterman algorithm (1981) finds similar regions between two
nucleotide or protein sequences.
Sequence alignment - meaning
original sequence s
3.6 BLAST:
fast approximate alignment
• Verb: to blast
Multiple alignment
Dynamic Programming Approach to
Sequence Alignment
The dynamic programming approach to sequence alignment builds the alignment
step by step: each table entry extends the best-scoring of the previously
computed partial results.
If both penalties are set to 0, it finds an alignment with the maximum number
of matches.
Maximum match = the largest number of characters of one sequence that can be
matched with those of the other sequence while allowing for all possible deletions.
Scores for aligned characters are specified by the substitution matrix σ(i,j):
the similarity of characters i and j.
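The zero-penalty case can be sketched in a few lines of Python. The sketch assumes that "both penalties" means the mismatch and gap penalties and that a match scores 1; under those assumptions the table entry in the bottom right corner is the maximum match (the length of the longest common subsequence). The name `max_match` is hypothetical.

```python
def max_match(s: str, t: str) -> int:
    """Maximum number of matching characters when mismatches and gaps cost 0."""
    m, n = len(s), len(t)
    # M[i][j] = maximum match between the prefixes s[:i] and t[:j]
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 1 if s[i - 1] == t[j - 1] else 0
            M[i][j] = max(
                M[i - 1][j - 1] + match,  # align s[i-1] with t[j-1]
                M[i - 1][j],              # delete s[i-1] (gap in t), cost 0
                M[i][j - 1],              # delete t[j-1] (gap in s), cost 0
            )
    return M[m][n]


print(max_match("ABCBDAB", "BDCABA"))  # 4
```

Each entry depends only on its three already computed neighbours, which is the sense in which the method follows the best prior result.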
The Needleman-Wunsch algorithm
For example, if the substitution matrix was
      A    G    C    T
A    10   -1   -3   -4
G    -1    7   -5   -3
C    -3   -5    9    0
T    -4   -3    0    8
then the alignment:
AGACTAGTTAC
CGA---GACGT
is scored column by column from the matrix entries and the gap penalty.
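The following Python sketch scores this example alignment with the matrix. The gap penalty is not stated in the text, so the value of -5 per gap used here is purely an assumption; the names `SUBSTITUTION`, `GAP_PENALTY` and `matrix_score` are likewise illustrative.

```python
# Substitution matrix from the example, stored as a nested dictionary.
SUBSTITUTION = {
    "A": {"A": 10, "G": -1, "C": -3, "T": -4},
    "G": {"A": -1, "G": 7, "C": -5, "T": -3},
    "C": {"A": -3, "G": -5, "C": 9, "T": 0},
    "T": {"A": -4, "G": -3, "C": 0, "T": 8},
}
GAP_PENALTY = -5  # assumption: the gap penalty is not given in the text


def matrix_score(s_aligned: str, t_aligned: str, gap: int = GAP_PENALTY) -> int:
    """Sum substitution scores over aligned pairs, plus `gap` per gap column."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(s_aligned, t_aligned):
        if a == "-" or b == "-":
            total += gap
        else:
            total += SUBSTITUTION[a][b]
    return total


# -3 + 7 + 10 + 3*(-5) + 7 - 4 + 0 - 1 + 0 = 1 (with the assumed gap penalty)
print(matrix_score("AGACTAGTTAC", "CGA---GACGT"))  # 1
```

A different gap penalty changes the total by three times the difference, since the alignment has three gap columns.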
1. Create a table of size (m+1)x(n+1) for sequences s and t of lengths m and n.
3. Starting from the top left, compute each entry using the recursive relation:
M_{i,j} = \max \begin{cases}
  M_{i-1,j-1} + \sigma(s_i, t_j) \\
  M_{i-1,j} + \sigma(s_i, -) \\
  M_{i,j-1} + \sigma(-, t_j)
\end{cases}
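The table-filling steps can be sketched in Python as follows. The initialisation of the first row and column (cumulative gap scores) is the usual choice for global alignment and is an assumption here, since that step is not shown in the list. The names `nw_table` and `simple_sigma` are illustrative.

```python
def simple_sigma(a: str, b: str) -> int:
    """Simple scoring function: +1 match, -1 mismatch, -1 gap."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def nw_table(s: str, t: str, sigma=simple_sigma):
    """Fill the (m+1) x (n+1) global alignment table M for s and t."""
    m, n = len(s), len(t)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    # Assumed initialisation: a prefix aligned against nothing is all gaps.
    for i in range(1, m + 1):
        M[i][0] = M[i - 1][0] + sigma(s[i - 1], "-")
    for j in range(1, n + 1):
        M[0][j] = M[0][j - 1] + sigma("-", t[j - 1])
    # Recursive relation, evaluated from the top left.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),
                M[i - 1][j] + sigma(s[i - 1], "-"),
                M[i][j - 1] + sigma("-", t[j - 1]),
            )
    return M


table = nw_table("ACGT", "AGT")
print(table[-1][-1])  # 2: three matches and one gap
```

Python strings are 0-indexed, so `s[i - 1]` plays the role of s_i in the relation.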
Once the matrix M is computed, the entry in the bottom right-hand corner is the
maximum score over all alignments. To find an alignment that actually gives this
score, start from the bottom right cell and compare its value with the three
possible sources in the recurrence (Choice1: the diagonal cell, Choice2: the cell
above, Choice3: the cell to the left) to see which it came from. If it was Choice1,
then s_i and t_j are aligned; if it was Choice2, then s_i is aligned with a gap;
and if it was Choice3, then t_j is aligned with a gap. Move to that cell and repeat
until the top left cell is reached.
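The trace-back can be sketched in Python as below. This is one possible implementation under stated assumptions: it uses the simple scoring function (match 1, mismatch -1, gap -1) as defaults, initialises the borders with cumulative gap scores, and prefers the diagonal when several sources tie. The function name `needleman_wunsch` and its parameters are illustrative.

```python
def needleman_wunsch(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_s, aligned_t) for one optimal global alignment."""
    def sigma(a: str, b: str) -> int:
        return match if a == b else mismatch

    m, n = len(s), len(t)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0] = M[i - 1][0] + gap
    for j in range(1, n + 1):
        M[0][j] = M[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),
                M[i - 1][j] + gap,
                M[i][j - 1] + gap,
            )

    # Trace back from the bottom right cell to the top left cell.
    out_s, out_t = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1]); out_t.append(t[j - 1])  # diagonal source
            i, j = i - 1, j - 1
        elif i > 0 and M[i][j] == M[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-")       # source is the cell above
            i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1])       # source is the cell to the left
            j -= 1
    return M[m][n], "".join(reversed(out_s)), "".join(reversed(out_t))


score, a, b = needleman_wunsch("ACGT", "AGT")
print(score)  # 2
print(a)      # ACGT
print(b)      # A-GT
```

When several sources give the same value, more than one optimal alignment exists; the tie-breaking order only decides which one is reported.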
The Smith-Waterman algorithm
1. Create a table of size (m+1)x(n+1) for sequences s and t of lengths m and n.
3. Starting from the top left, compute each entry using the recursive relation:
M_{i,j} = \max \begin{cases}
  M_{i-1,j-1} + \sigma(s_i, t_j) \\
  M_{i-1,j} + \sigma(s_i, -) \\
  M_{i,j-1} + \sigma(-, t_j) \\
  0
\end{cases}
4. Perform the trace-back procedure from the maximum element in the table to
the first zero element on the trace-back path.
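A compact Python sketch of these steps follows. It assumes the simple scoring function (match 1, mismatch -1, gap -1) and a first row and column of zeros, which is the usual initialisation for local alignment but is not shown in the list. The name `smith_waterman` and its parameters are illustrative.

```python
def smith_waterman(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, local_s, local_t) for one best local alignment."""
    m, n = len(s), len(t)
    M = [[0] * (n + 1) for _ in range(m + 1)]  # borders stay 0 (assumed initialisation)
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(
                0,                      # start a new local alignment here
                M[i - 1][j - 1] + sub,
                M[i - 1][j] + gap,
                M[i][j - 1] + gap,
            )
            if M[i][j] > best:
                best, bi, bj = M[i][j], i, j

    # Trace back from the maximum element until a zero entry is reached.
    out_s, out_t = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and M[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if M[i][j] == M[i - 1][j - 1] + sub:
            out_s.append(s[i - 1]); out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif M[i][j] == M[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-")
            i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))  # (4, 'ACGT', 'ACGT')
```

The only differences from the global version are the extra 0 in the maximum, the zero borders, and the start and end points of the trace-back.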
Compare the gene eyeless of Drosophila melanogaster with the human gene
aniridia. They are master regulatory genes producing proteins that control a large
cascade of other genes. Certain segments of the eyeless gene of Drosophila
melanogaster and of human aniridia are almost identical. The most important of
these segments encodes the PAX (paired-box) domain, a sequence of 128 amino
acids whose function is to bind specific sequences of DNA. Another common
segment is the HOX (homeobox) domain, which is thought to be part of more than
0.2% of the total number of vertebrate genes.
LECTURE 3: GLOBAL ALIGNMENT
END of LECTURE 3