Chapter 5 Pairwise Alignment

Uploaded by Abderrahmane BeLakhdar

5.1 Introduction
Pairwise sequence alignment (PSA) is a technique for aligning two sequences so as to find the best-matching arrangement of their residues. Database similarity search tools use it to find the best pairwise alignments between a few query sequences and the sequences in a database.
The method is widely used to study the functional, evolutionary, and structural properties of sequences. When two aligned sequences show a high degree of similarity, they can be considered members of the same family. Pairwise alignments compare only two sequences at a time. They are easy to calculate and are usually used for tasks that do not require a high level of accuracy.
To align anything other than an exact letter-for-letter match, the algorithm must know what it is looking for and how to judge the significance of what it finds. For this purpose, “comparison matrices” have been developed. They assign a value to every possible pairing of residues and so, in effect, score how well the alignment is performing. The algorithm searches for the alignment with the best possible score. The total score is meaningful only for the alignment that produced it and cannot be used for anything else.
The aim of pairwise sequence alignment is to find the best pairing of two sequences. The three most widely used approaches are the dot-matrix method, dynamic programming, and word (k-tuple) methods such as those used by FASTA and BLAST. Each approach has pros and cons. All three have difficulty aligning highly repetitive sequences that carry little information, especially when the two sequences contain different numbers of repeats.
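The following minimal Python sketch makes the idea of a scoring scheme concrete. It totals the score of an alignment that has already been written out with gaps. It assumes a simple identity-based scheme (match = 2, mismatch = -1, gap = -2, the same values used in the Needleman-Wunsch example later in this chapter) instead of a full comparison matrix. The function name `score_alignment` is illustrative only.

```python
def score_alignment(top: str, bottom: str, match: int = 2,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of a gapped pairwise alignment ('-' marks a gap).

    Assumes a simple identity-based scheme with a linear gap penalty,
    not a full substitution (comparison) matrix.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += gap          # residue opposite a gap
        elif a == b:
            total += match        # identical residues
        else:
            total += mismatch     # different residues
    return total


print(score_alignment("GAATTC", "GA-TAC"))  # 2 + 2 - 2 + 2 - 1 + 2 = 5
```

Changing the scheme, for example by penalizing gaps more heavily, changes which alignment scores best. This is why a total score only makes sense together with the scheme that produced it.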

5.2 Dot matrix method

The dot matrix method, also called the dot plot method, is the most basic technique for sequence alignment. It is a graphical tool that compares two sequences in a two-dimensional matrix.
The dot matrix method plots the two sequences against each other and aims to identify the regions of greatest similarity between them. An advantage of the procedure is that it recognizes repeated sequences in the matrix: they appear as additional diagonals of similar length, offset vertically or horizontally from one another.
The dot matrix process shows all possible matches between the sequences. It is left to the user to build a full alignment by joining adjacent diagonals with insertions and deletions.
The first sequence, labelled X, is listed along the top of the matrix, and the second sequence, labelled Y, along the left side. Start with the first character of Y and move across the first row, placing a dot in each column where the character of X is identical to it. Repeat this for each subsequent row until all possible comparisons between X and Y have been made.
Matching regions of the sequences then appear as diagonal lines, so any region of similarity can be identified by a diagonal run of dots. Isolated dots that do not lie on a diagonal represent random matches.
Here is an example of a dot-matrix plot:
Sequence 1: G A T T C T A T C T A A C T
Sequence 2: G T T C T A T T C T A A C
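The construction procedure above can be sketched in a few lines of Python. This is a minimal illustration and is not one of the programs named later in this section. It compares single characters with no window or threshold, so every identical pair gets a dot. The function names `dot_matrix` and `render` are hypothetical. Applied to the two example sequences, the sketch prints a text version of the plot.

```python
def dot_matrix(x: str, y: str) -> list[list[bool]]:
    """Return a grid with one row per character of y (left side) and one
    column per character of x (top); True marks an identical pair."""
    return [[cx == cy for cx in x] for cy in y]


def render(x: str, y: str, grid: list[list[bool]]) -> str:
    """Draw the grid as text: '*' for a dot, '.' for no match."""
    lines = ["  " + " ".join(x)]
    for cy, row in zip(y, grid):
        lines.append(cy + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


seq1 = "GATTCTATCTAACT"  # Sequence 1 (X, across the top)
seq2 = "GTTCTATTCTAAC"   # Sequence 2 (Y, down the left side)
grid = dot_matrix(seq1, seq2)
print(render(seq1, seq2, grid))
```

In the printed grid, diagonal runs of `*` mark regions of similarity, and isolated `*` characters are the random matches described above.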

A dot-matrix plot can be used to visually identify sequence features such as insertions, deletions, repeats, and inverted repeats, provided the plot is not obscured by noise.
Dot matrices are interpreted as follows:
• regions of similarity appear as diagonal runs of dots;
• inversions appear as reverse diagonals perpendicular to the main diagonal;
• palindromes appear as reverse diagonals that cross the main diagonal;
• breaks in a diagonal line indicate insertions or deletions;
• repeated regions of the sequences appear as parallel diagonal lines within the matrix.
When the dot matrix approach is used to compare long sequences, a high level of noise becomes a problem. In most dot plots, dots are scattered all over the graph, which makes it difficult to identify the true alignment.
Dot plots offer several benefits. They are simple to implement, and their presentation is easy to understand. They show every possible combination of aligned residue pairs. They can be used together with other approaches. They also reveal direct and inverted repeats, insertions, and deletions more easily than more automated approaches do.

Dot-matrix analysis is also used in genomics. It can reveal repeats within chromosomes and compare gene conservation between two closely related genomes. Applied within a single sequence, it can detect self-complementarity and help identify nucleic acid secondary structures.
As a way of displaying information, dot plots have several drawbacks. They are noisy, unclear, and unintuitive, and it is difficult to extract match positions and summary statistics for the two sequences from them. Dot plots also waste a lot of space. They can display only two sequences, and the match data are repeated symmetrically across the diagonal (as in a self-comparison). As a result, noise or empty space takes up a large portion of the plot.

Another disadvantage of this visual approach is that the assessment of alignment precision lacks analytical rigour. In addition, the procedure is limited to pairwise alignment and does not scale easily to multiple alignments.
The noise problem can be reduced by comparing stretches of characters using a window size and a threshold value, both of which can be set according to the requirement. A dot is drawn only if a certain number of the positions within the window match.
In other words, a filtering method replaces single-residue comparison. A “window” of fixed length, spanning a stretch of residue pairs, is slid along both sequences so that all possible stretches are compared. A dot is placed only when the residues covered by the window match to the required degree. This technique has proved effective in reducing the level of noise. The window, also known as a word or tuple, can be adjusted in size to bring out a consistent sequence pattern. If the window is too large, however, the alignment loses sensitivity.
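The window-and-threshold filter can be sketched in Python as follows. This is a minimal illustration with several assumptions. Identity is the only comparison, with no substitution matrix. A dot is recorded at the position where the window starts, although real programs may place it elsewhere, for example at the window centre. `window` and `threshold` are hypothetical parameter names.

```python
def filtered_dot_matrix(x: str, y: str, window: int = 3,
                        threshold: int = 2) -> list[list[bool]]:
    """Windowed dot plot: compare every stretch of `window` residues of y
    (rows) with every stretch of x (columns) and keep a dot only when at
    least `threshold` aligned positions in the window are identical.
    The dot is stored at the window's starting position (an assumption)."""
    rows = max(len(y) - window + 1, 0)
    cols = max(len(x) - window + 1, 0)
    grid = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # count identical residue pairs along this diagonal window
            hits = sum(1 for k in range(window) if y[i + k] == x[j + k])
            grid[i][j] = hits >= threshold
    return grid


# With a window of 2 and a threshold of 2, only exact 2-letter matches survive.
for row in filtered_dot_matrix("ACGT", "ACGT", window=2, threshold=2):
    print("".join("*" if hit else "." for hit in row))
```

Raising `threshold` removes more of the scattered chance matches. Enlarging `window` smooths the plot further, at the cost of sensitivity, as noted above.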

Some examples of web servers using dot plots to compare sequences in pairs are given
below:
• Dotmatcher and Dottup are two programs from the EMBOSS package that are also available online. Dotmatcher aligns and displays dot plots of two input sequences in FASTA format, which can be either DNA or protein. It uses a scoring scheme and a window of a specified length. If the similarity between two windows exceeds a given threshold, a diagonal line is drawn over them. Dottup instead uses a word-match approach and can handle genome-length sequences. It draws diagonal lines only where words of a specified length match in both sequences.
• Dothelix is a dot matrix program for analysing sequences of macromolecules such as DNA or protein. It implements scoring matrices for protein sequences and offers a range of threshold options (similar to window size). It draws diagonal lines over regions whose similarity scores exceed a threshold, and it also displays the actual pairwise alignment.
• MatrixPlot is a more advanced matrix-plotting program for protein and nucleic acid sequences. Users may add information such as sequence logo profiles or distance matrices from known 3D structures of proteins or nucleic acids. Instead of dots and lines, the program uses colored grids to display an alignment or other user-defined information.

5.3 Dynamic programming

Dynamic programming breaks a large problem, such as aligning two long sequences, into smaller, manageable sub-problems. At each pair of characters in the alignment, it takes all possible changes (match, mismatch, insertion, or deletion) into account.
An algorithm for dynamic programming consists of four components:
• a recursive formulation of the optimum score;
• a DP matrix for storing the optimal scores of sub-problems;
• a method of filling the matrix from the bottom up by tackling the most
elementary sub-problems first;
• a method of tracing back through the matrix to identify the details of the best
solution that produced the highest score.
Dynamic programming can also help align nucleotide sequences with protein sequences, a task complicated by the need to account for frameshift mutations. The method is particularly helpful for sequences with many indels. It can accommodate frameshifts of any number of nucleotides, which faster heuristic techniques find difficult to align. The BLAST and EMBOSS packages provide basic tools for producing translated alignments, and the open-source program GeneWise offers more general methods. For a given scoring function, the dynamic programming approach is designed to produce the best alignment. However, choosing a suitable scoring function is often an empirical rather than a theoretical procedure. Dynamic programming can be extended to more than one pair of sequences, but it is slow when there are many sequences or when the sequences are very long.
Two classical dynamic programming algorithms are:
• Needleman and Wunsch (1970): global alignment; the result includes all residues of both sequences.
• Smith and Waterman (1981): local alignment; the result includes only the most similar regions of the sequences.
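For contrast with the global method described next, here is a minimal Python sketch of the Smith-Waterman idea. It assumes the same simple scores used later in this chapter (match 2, mismatch -1, linear gap -2) rather than a substitution matrix with affine gaps. It differs from the global method in two ways. No cell may fall below zero, and traceback starts at the highest-scoring cell and stops at the first zero. The function name is illustrative.

```python
def smith_waterman(seq1: str, seq2: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> tuple[int, str, str]:
    """Return (best local score, aligned part of seq1, aligned part of seq2)."""
    n, m = len(seq1), len(seq2)

    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    # H[i][j]: best score of a local alignment ending at seq1[i-1], seq2[j-1].
    # Row 0 and column 0 stay 0, so a local alignment may start anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                                    # start afresh
                          H[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]),
                          H[i - 1][j] + gap,                    # seq1 residue vs gap
                          H[i][j - 1] + gap)                    # seq2 residue vs gap
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]):
            a1.append(seq1[i - 1])
            a2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


print(smith_waterman("TTTGATTACATTT", "CCCGATTACACCC"))
```

The unrelated flanks (TTT and CCC) are left out of the result. Only the shared core GATTACA is reported, which is the sense in which local alignment "contains only certain parts" of the sequences.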
5.3.1 Needleman-Wunsch
This was one of the first applications of “dynamic programming” to the comparison of biological sequences. The method was developed by Saul B. Needleman and Christian D. Wunsch and first presented in 1970. The approach divides a large problem, such as aligning the whole sequences, into a series of smaller sub-problems. It then uses the solutions of the sub-problems to find an optimal solution to the full problem. The Needleman-Wunsch method is still widely used today, particularly when the quality of the global alignment must be maintained. The technique effectively assigns a score to every possible alignment, and its goal is to find the alignment or alignments with the highest score.
Conceptually, the Needleman and Wunsch algorithm considers all possible alignments between a pair of sequences. In each alignment, every pair of residues may be identical, different, or separated by insertions and deletions.
The scores of each alignment are then summed, and the alignment with the highest total score is selected as the best. Dynamic programming reaches this optimum without listing every alignment explicitly.
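The brute-force idea described above can be written out directly for short sequences. This Python sketch (the function names are hypothetical) lists every global alignment of GAATTC and GATAC. It scores each one with match = 2, mismatch = -1, gap = -2 and keeps the highest score. It only illustrates why dynamic programming is needed: the number of alignments grows explosively with sequence length. The matrix method in the steps below reaches the same optimum without enumerating them.

```python
from typing import Iterator


def all_alignments(a: str, b: str) -> Iterator[tuple[str, str]]:
    """Yield every global alignment of a and b ('-' marks a gap)."""
    if not a and not b:
        yield "", ""
        return
    if a and b:  # first column pairs a[0] with b[0]
        for x, y in all_alignments(a[1:], b[1:]):
            yield a[0] + x, b[0] + y
    if a:        # first column pairs a[0] with a gap
        for x, y in all_alignments(a[1:], b):
            yield a[0] + x, "-" + y
    if b:        # first column pairs a gap with b[0]
        for x, y in all_alignments(a, b[1:]):
            yield "-" + x, b[0] + y


def column_score(p: str, q: str) -> int:
    if p == "-" or q == "-":
        return -2                      # gap
    return 2 if p == q else -1         # match / mismatch


alignments = list(all_alignments("GAATTC", "GATAC"))
best = max(sum(column_score(p, q) for p, q in zip(x, y)) for x, y in alignments)
print(len(alignments), best)
```

Even these two short sequences have thousands of possible alignments.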
The process requires a two-dimensional (2D) matrix built from the scores for a match, a mismatch, and a gap. The matrix is solved in three stages: initialization, matrix filling, and traceback.
Step 1: Initialization table “T”
Building the scoring matrix begins with placing the sequences along the x and y axes of the matrix.
Seq1 “GAATTC” is placed along the x axis and Seq2 “GATAC” along the y axis. The matrix starts at cell (0,0), which is given the value 0. Each subsequent cell of the first row and first column is filled by adding the gap score to the value of the adjacent cell before it.

Scores: starting score=0, match=2, mismatch=-1 and gap=-2

Construct a matrix F(i,j) and initialize it with F(0,0) = 0.
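The initialization step can be sketched in Python as below. This sketch stores the matrix as a list of rows `F[i][j]`, with `i` running over Seq1 ("GAATTC") and `j` over Seq2 ("GATAC"), to match the notation S(x_i, y_j) used in the filling rule. That layout and the function name `init_matrix` are assumptions for illustration.

```python
def init_matrix(seq1: str, seq2: str, gap: int = -2) -> list[list[int]]:
    """Create the (len(seq1)+1) x (len(seq2)+1) matrix with F(0,0) = 0 and
    the first row and column filled with cumulative gap scores."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # F[0][0] = 0
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap             # seq1 prefix against gaps only
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap             # seq2 prefix against gaps only
    return F


F = init_matrix("GAATTC", "GATAC")
print([row[0] for row in F])  # [0, -2, -4, -6, -8, -10, -12]
print(F[0])                   # [0, -2, -4, -6, -8, -10]
```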

Step 2: Filling the matrix
To obtain the score of each cell, the scores of three neighbouring cells must be known: the diagonal cell and the horizontally and vertically adjacent cells. The match or mismatch score is added to the diagonal value, and the gap score is added to each of the two adjacent values. Because the alignment with the maximum score is sought, only the highest of these three values is placed in cell (i, j). Fill each cell, starting from the upper left-hand corner, according to the following rule and the given scores:
$$F(i,j)=\max\begin{cases}F(i-1,j-1)+S(x_i,y_j)\\ F(i,j-1)+d\\ F(i-1,j)+d\end{cases}$$

where F(i,j) is the score stored in the cell at position (i, j) of the matrix, S(x_i, y_j) is the match or mismatch score of the residues being aligned, and d is the gap penalty. For example, the first cell is filled as:

$$F(1,1)=\max\big(F(0,0)+S(G,G),\ F(0,1)+d,\ F(1,0)+d\big)=\max\big(0+2,\ -2+(-2),\ -2+(-2)\big)=\max(2,-4,-4)=2$$
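The filling rule can be sketched in Python as below. As an assumption of this sketch, `F[i][j]` has `i` indexing Seq1 and `j` indexing Seq2, and the function name `nw_fill` is hypothetical. The code repeats the initialization so that it runs on its own, then applies the three-way maximum cell by cell. It reproduces F(1,1) = 2 from the worked example.

```python
def nw_fill(seq1: str, seq2: str, match: int = 2, mismatch: int = -1,
            gap: int = -2) -> list[list[int]]:
    """Fill the Needleman-Wunsch matrix with the three-way max rule."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                        # initialization (Step 1)
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):                    # fill from the upper left
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch  # S(x_i, y_j)
            F[i][j] = max(F[i - 1][j - 1] + s,   # diagonal
                          F[i][j - 1] + gap,     # gap, from F(i, j-1)
                          F[i - 1][j] + gap)     # gap, from F(i-1, j)
    return F


F = nw_fill("GAATTC", "GATAC")
print(F[1][1])    # 2, as in the worked example
print(F[-1][-1])  # 5, the optimal global alignment score
```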

Step 3: Traceback
The value in the bottom-right cell is the score of the best possible alignment of the whole sequences, but the alignment itself has yet to be determined. It is found by a recursive “traceback” through the matrix. Starting at the bottom-right cell, the algorithm determines which of the three neighbouring cells (diagonal, horizontal, or vertical) supplied the maximum value used to fill the current cell. It records that direction with a back arrow and moves to that cell.
The procedure is repeated until cell (0,0) is reached, which completes the optimal alignment.
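Putting the three stages together gives the minimal Python sketch below. The function name is hypothetical, and the scores are those of the example. Note one assumption: when two neighbours give the same value, the sketch always prefers the diagonal, then the move that aligns a Seq1 residue with a gap. It therefore returns just one of possibly several optimal alignments.

```python
def needleman_wunsch(seq1: str, seq2: str, match: int = 2, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment: fill the matrix, then trace back from F(n, m)."""
    n, m = len(seq1), len(seq2)

    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]),
                          F[i][j - 1] + gap,
                          F[i - 1][j] + gap)

    # Traceback: at each cell, find which neighbour produced its value.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]):
            top.append(seq1[i - 1])          # residue of seq1 with residue of seq2
            bottom.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(seq1[i - 1])          # residue of seq1 with a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")                  # residue of seq2 with a gap
            bottom.append(seq2[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("GAATTC", "GATAC")
print(score)
print(top)
print(bottom)
```

With this tie-breaking order, the sketch returns GAATTC aligned with G-ATAC, with score 5.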

Rules to align the sequences

On the traceback path:

• Diagonal move: the residue in Seq 1 (i) is aligned with the residue in Seq 2 (j).
• Horizontal move: the residue in Seq 1 (i) is aligned with a gap (-) in Seq 2.
• Vertical move: the residue in Seq 2 (j) is aligned with a gap (-) in Seq 1.

With the scores used in this example, the traceback yields two alignments of GAATTC with GATAC: GAATTC/G-ATAC and GAATTC/GA-TAC. Both have the same score as the last (bottom-right) cell of the matrix, 5, and hence both are optimal global alignments.
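Because the traceback can branch wherever two neighbours give the same value, more than one optimal alignment may exist. The following Python sketch follows every branch to list all optimal global alignments. The function name is hypothetical, and the scores are the same as in the example.

```python
def all_optimal_alignments(seq1: str, seq2: str, match: int = 2,
                           mismatch: int = -1, gap: int = -2):
    """Return (optimal score, list of all optimal (top, bottom) alignments)."""
    n, m = len(seq1), len(seq2)

    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]),
                          F[i][j - 1] + gap,
                          F[i - 1][j] + gap)

    results: list[tuple[str, str]] = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        # top/bottom hold the columns collected so far, built right to left
        if i == 0 and j == 0:
            results.append((top, bottom))
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]):
            walk(i - 1, j - 1, seq1[i - 1] + top, seq2[j - 1] + bottom)  # diagonal
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            walk(i - 1, j, seq1[i - 1] + top, "-" + bottom)              # Seq1 residue vs gap
        if j > 0 and F[i][j] == F[i][j - 1] + gap:
            walk(i, j - 1, "-" + top, seq2[j - 1] + bottom)              # Seq2 residue vs gap

    walk(n, m, "", "")
    return F[n][m], results


score, alignments = all_optimal_alignments("GAATTC", "GATAC")
print(score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For long or highly repetitive sequences, the number of co-optimal paths can grow very large. Practical tools therefore usually report one alignment or cap the number returned.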
