Algorithm

Uploaded by

Amit kumar raut
Sequence alignment methods

Dr. P. Borah
Professor & Head, Dept. of Animal Biotechnology
College of Veterinary Science
AAU, Khanapara, Guwahati-781022
Two types of alignment:

• Global alignment
• Local alignment
Global alignment
• Compares sequences and gives the best overall alignment.
• May fail to find the best local region of similarity (such as a shared motif) among distantly related sequences.
• Algorithm: Needleman-Wunsch
Local alignment
• Finds regions of the sequences with a high degree of similarity (gaps are allowed within the region).
• Better at finding motifs, especially for sequences that are different overall.
• Will return only the best matching segment for a given pair of sequences.
• Algorithm: Smith-Waterman
Dotplots

Dot plots are two-dimensional graphs showing a comparison of two sequences.
The two axes of the graph represent the two sequences being compared.
Every region of one sequence is compared to every region of the other sequence.

(* = identical residues, . = no match)
  A T G C A T G C
A * . . . * . . .
T . * . . . * . .
G . . * . . . * .
C . . . * . . . *
A * . . . * . . .
T . * . . . * . .
G . . * . . . * .
C . . . * . . . *
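The grid above can be reproduced with a few lines of code. This is a minimal sketch, not part of the original slides: the function name `dot_plot` and the choice of `.` for empty cells are assumptions made for illustration.

```python
def dot_plot(seq_top: str, seq_side: str) -> list[str]:
    """Return one text row per residue of seq_side.

    A '*' marks a cell where the two residues are identical,
    a '.' marks every other cell.
    """
    rows = []
    for side_residue in seq_side:
        cells = ["*" if side_residue == top_residue else "."
                 for top_residue in seq_top]
        rows.append(side_residue + " " + " ".join(cells))
    return rows


if __name__ == "__main__":
    seq = "ATGCATGC"
    print("  " + " ".join(seq))          # column labels
    print("\n".join(dot_plot(seq, seq)))  # sequence compared to itself
```

Comparing the sequence to itself, as here, shows the repeated ATGC unit as diagonals parallel to the main diagonal.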
Dotplots
• Dotplotting is the best way to see all of the structures in common between two sequences.
• Dotplotting can also be used to view repeated structures or inverted repeats in a single sequence. This is accomplished by comparing a sequence to itself.
• Dotplotting helps recognize large regions of similarity. In most cases it is not sensitive enough to see small structures, like promoters.
Needleman-Wunsch algorithm for Global Alignment
Seq. 1: AATGCAGCT
Seq. 2: ATATGCAGC
Match = 1, Mismatch = 0, Gap = -1

Seq. 1 is written across the top of the matrix and Seq. 2 down the side. The top-left cell is set to 0.
Needleman-Wunsch algorithm: first row and first column
The first row and the first column are filled with cumulative gap penalties (Gap = -1 per position).

        A   A   T   G   C   A   G   C   T
    0  -1  -2  -3  -4  -5  -6  -7  -8  -9
A  -1
T  -2
A  -3
T  -4
G  -5
C  -6
A  -7
G  -8
C  -9
Needleman-Wunsch algorithm: filling a cell
Each remaining cell takes the largest of three values:
• Z + Match or Mismatch (M/MM), where Z is the diagonal (upper-left) neighbour
• X + Gap, where X is one of the two adjacent neighbours (above or left)
• Y + Gap, where Y is the other adjacent neighbour
Needleman-Wunsch algorithm: first cells
First cell (A against A, a match): Z + Match = 0 + 1 = 1, X + Gap = -1 + (-1) = -2, Y + Gap = -1 + (-1) = -2. The largest value, 1, is entered.
Continuing along row A gives 1, 0, -1 for the first three cells:

        A   A   T   G   C   A   G   C   T
    0  -1  -2  -3  -4  -5  -6  -7  -8  -9
A  -1   1   0  -1
Needleman-Wunsch algorithm: completed matrix
Match = 1, Mismatch = 0, Gap = -1

        A   A   T   G   C   A   G   C   T
    0  -1  -2  -3  -4  -5  -6  -7  -8  -9
A  -1   1   0  -1  -2  -3  -4  -5  -6  -7
T  -2   0   1   1   0  -1  -2  -3  -4  -5
A  -3  -1   1   1   1   0   0  -1  -2  -3
T  -4  -2   0   2   1   1   0   0  -1  -1
G  -5  -3  -1   1   3   2   1   1   0  -1
C  -6  -4  -2   0   2   4   3   2   2   1
A  -7  -5  -3  -1   1   3   5   4   3   2
G  -8  -6  -4  -2   0   2   4   6   5   4
C  -9  -7  -5  -3  -1   1   3   5   7   6
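The matrix-filling rule (largest of Z + M/MM, X + Gap, Y + Gap) can be sketched in code as follows. This is an illustrative sketch under the slide's scoring scheme; the function name `nw_matrix` and its parameter names are assumptions, not taken from the slides.

```python
def nw_matrix(seq_top, seq_side, match=1, mismatch=0, gap=-1):
    """Fill a Needleman-Wunsch score matrix.

    seq_top labels the columns, seq_side labels the rows.
    Row 0 and column 0 hold cumulative gap penalties.
    """
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        F[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # diagonal: match or mismatch
                          F[i - 1][j] + gap,    # from above: gap
                          F[i][j - 1] + gap)    # from the left: gap
    return F


if __name__ == "__main__":
    F = nw_matrix("AATGCAGCT", "ATATGCAGC")
    for row in F:
        print(" ".join(f"{v:3d}" for v in row))
    print("Score in bottom-right cell:", F[-1][-1])  # 6
```

The bottom-right cell holds the score of the best global alignment, which is where back tracking starts.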
Needleman-Wunsch algorithm: back tracking
Back tracking runs from the bottom-right cell (score 6) to the top-left cell. A diagonal step aligns two residues (M/MM); a horizontal or vertical step inserts a gap.

Resulting alignment:
Seq. 1: A-ATGCAGCT
Seq. 2: ATATGCAGC-
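Back tracking can be sketched in code as below. This is a hedged illustration: the name `nw_align` is hypothetical, and the tie-breaking order (diagonal first, then vertical, then horizontal) is an assumption, since the slides do not say which path to prefer when several are equally good. Other orders can give a different alignment with the same score.

```python
def nw_align(seq_top, seq_side, match=1, mismatch=0, gap=-1):
    """Global alignment: fill the matrix, then trace back from the last cell."""
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        F[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from bottom-right to top-left (assumed order: diagonal, up, left).
    i, j = rows - 1, cols - 1
    top, side = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                top.append(seq_top[j - 1])
                side.append(seq_side[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append("-")                 # gap in the top sequence
            side.append(seq_side[i - 1])
            i -= 1
        else:
            top.append(seq_top[j - 1])
            side.append("-")                # gap in the side sequence
            j -= 1
    return "".join(reversed(top)), "".join(reversed(side)), F[-1][-1]


if __name__ == "__main__":
    a, b, score = nw_align("AATGCAGCT", "ATATGCAGC")
    print(a)
    print(b)
    print("score =", score)
```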
Needleman-Wunsch algorithm: alignment score
Seq. 1: A-ATGCAGCT
Seq. 2: ATATGCAGC-
Match = 1, Mismatch = 0, Gap = -1

Matches = 8
Mismatches = 0
Indels = 2

Total score = 8 x 1 + 0 + 2 x (-1) = 8 - 2 = 6
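The score arithmetic can be checked by counting matches, mismatches and indels column by column. The sketch below assumes the two aligned rows have equal length and use `-` for a gap; the function name `score_alignment` is hypothetical.

```python
def score_alignment(row1, row2, match=1, mismatch=0, gap=-1):
    """Count matches, mismatches and gaps in an alignment and return the total score."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


if __name__ == "__main__":
    print(score_alignment("A-ATGCAGCT", "ATATGCAGC-"))  # (8, 0, 2, 6)
```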
Smith-Waterman Algorithm for Local Alignment
It is a modification of the Needleman-Wunsch algorithm.

Modifications:
1. No negative values are allowed: any negative value is replaced by zero.
2. Back tracking is initiated from the largest value in the matrix (in the example that follows, it lies in the last row) and stops at a cell with value zero.
3. Find the path for that value.
4. Then align the sequences based on the directions of the arrows, as done in the Needleman-Wunsch algorithm.

X_{i,j} = the largest value of

(i) X_{i-1,j} + Gap, or

(ii) X_{i,j-1} + Gap, or

(iii) X_{i-1,j-1} + (Match or Mismatch), or

(iv) 0, if all of the above are negative

i and j are column and row numbers.
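The rule above, with negative values replaced by zero, can be sketched as follows. The function name `sw_matrix` is an assumption for illustration; the default scores are the ones used in the example that follows (Match = +2, Mismatch = -1, Gap = -2).

```python
def sw_matrix(seq_top, seq_side, match=2, mismatch=-1, gap=-2):
    """Fill a Smith-Waterman score matrix; no cell is allowed to go below zero."""
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay at 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            H[i][j] = max(0,                     # negative values become zero
                          H[i - 1][j - 1] + s,   # diagonal: match or mismatch
                          H[i - 1][j] + gap,     # from above: gap
                          H[i][j - 1] + gap)     # from the left: gap
    return H


if __name__ == "__main__":
    H = sw_matrix("AATCGATCGG", "TCAAGTC")
    for row in H:
        print(" ".join(f"{v:2d}" for v in row))
    print("Largest value:", max(max(row) for row in H))  # 7
```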


Seq 1: AATCGATCGG
Seq 2: TCAAGTC
Match = +2, Mismatch = -1, Gap = -2

        A   A   T   C   G   A   T   C   G   G
    0   0   0   0   0   0   0   0   0   0   0
T   0   0   0   2   0   0   0   2   0   0   0
C   0   0   0   0   4   2   0   0   4   2   0
A   0   2   2   0   2   3   4   2   2   3   1
A   0   2   4   2   0   1   5   3   1   1   2
G   0   0   2   3   1   2   3   4   2   3   3
T   0   0   0   4   2   0   1   5   3   1   2
C   0   0   0   2   6   4   2   3   7   5   3
The largest value in the matrix is 7, in the last row (C) under the second C of Seq 1. Back tracking starts from this cell and follows the path 7, 5, 3, 5, 3, 4, 2 back to a zero.
Seq 1: AATCGATCGG
Seq 2: TCAAGTC
Match = +2, Mismatch = -1, Gap = -2

Local alignment:
Seq 1: AATCGA-TCGG
Seq 2:   TCAAGTC

Match = 5 x 2 = 10
Mismatch = 1 x (-1) = -1
Gap = 1 x (-2) = -2
Total score = +7
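The local alignment and its score can be reproduced with the sketch below. It is illustrative only: the name `sw_align` is hypothetical, back tracking starts from the largest value anywhere in the matrix, and ties are broken in the assumed order diagonal, up, left.

```python
def sw_align(seq_top, seq_side, match=2, mismatch=-1, gap=-2):
    """Local alignment: fill the matrix, then trace back from the largest cell to a zero."""
    rows, cols = len(seq_side) + 1, len(seq_top) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    i, j = best_pos
    top, side = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq_side[i - 1] == seq_top[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal: aligned pair
            top.append(seq_top[j - 1])
            side.append(seq_side[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # up: gap in the top sequence
            top.append("-")
            side.append(seq_side[i - 1])
            i -= 1
        else:                                   # left: gap in the side sequence
            top.append(seq_top[j - 1])
            side.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(side)), best


if __name__ == "__main__":
    a, b, score = sw_align("AATCGATCGG", "TCAAGTC")
    print(a)
    print(b)
    print("score =", score)
```

Only the aligned region is returned; the unaligned ends of Seq 1 (AA at the start, GG at the end) are left out.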
DOT PLOT METHOD

It is a graphical method for comparing two biological sequences and identifying regions of close similarity, from which an alignment can be read.
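Seq 1: TAATCGATCGG
Seq 2: TTCGAGTCAG

(x = identical residues, . = no match)
  T A A T C G A T C G G
T x . . x . . . x . . .
T x . . x . . . x . . .
C . . . . x . . . x . .
G . . . . . x . . . x x
A . x x . . . x . . . .
G . . . . . x . . . x x
T x . . x . . . x . . .
C . . . . x . . . x . .
A . x x . . . x . . . .
G . . . . . x . . . x x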
Seq 1: TAATCGATCGG
Seq 2: TTCGAGTCAG

Reading the main diagonal of the dot plot:
• Gap in Seq 1 (indel): after row A (fifth residue of Seq 2) the diagonal shifts down by one row, so the following G of Seq 2 has no partner in Seq 1.
• Break (substitution): the diagonal is interrupted at the second A of Seq 2 (ninth residue), which lies opposite a G in Seq 1.
Seq 1: TAATCGATCGG
Seq 2: TTCGAGTCAG

Final alignment:
Seq 1: TAATCGA-TCGG
Seq 2:   TTCGAGTCAG
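One way to read diagonals out of a dot plot programmatically is to group the dots by their diagonal offset (column minus row) and look for runs of consecutive rows. The sketch below is an illustration only, not a method given in the slides; the function name `diagonal_runs` and the `min_len` parameter are assumptions.

```python
from collections import defaultdict


def diagonal_runs(seq_top, seq_side, min_len=2):
    """Return (offset, first_row, length) for each run of consecutive dots on a diagonal.

    offset is column index minus row index (both 0-based).
    """
    rows_by_offset = defaultdict(list)
    for i, side_residue in enumerate(seq_side):
        for j, top_residue in enumerate(seq_top):
            if side_residue == top_residue:
                rows_by_offset[j - i].append(i)  # rows arrive in increasing order

    runs = []
    for offset, rows in rows_by_offset.items():
        start = prev = rows[0]
        for r in rows[1:] + [None]:              # None closes the final run
            if r is not None and r == prev + 1:
                prev = r
                continue
            if prev - start + 1 >= min_len:
                runs.append((offset, start, prev - start + 1))
            start = prev = r
    return sorted(runs, key=lambda run: run[1])


if __name__ == "__main__":
    for run in diagonal_runs("TAATCGATCGG", "TTCGAGTCAG"):
        print(run)
```

For the two sequences above, the longest run lies at offset 2 (TCGA) and a later run at offset 1 (TC). The change of offset by one between them corresponds to the single-residue indel. The remaining short runs come from the repeated TCG motif in Seq 1.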
Analysis of dot plot matrix
Thank you
