Tabby

The document discusses the Needleman-Wunsch and Smith-Waterman algorithms, both dynamic programming techniques used for sequence alignment in bioinformatics. The Needleman-Wunsch algorithm is designed for global alignment, while the Smith-Waterman algorithm focuses on local alignment, each serving different purposes and applications in evolutionary biology and genomics. Despite their computational complexities and limitations, both algorithms are essential tools for analyzing genetic sequences and understanding evolutionary relationships.

Uploaded by

peter ronoh

Name:

Reg No:
Unit:
Title:
Needleman-Wunsch and Smith-Waterman Algorithms
Introduction
Comparative analysis is the backbone of evolutionary biology. The use of phenotypic variation
allowed Darwin to compose his theory of natural selection. The theory rests on the fact that the
transfer of the genetic code from parent to progeny does not occur without change. It is these
changes in the genetic sequence that allow for the divergence of species and thus provide a
backdrop for natural selection. Just as comparative analysis was key for evolutionary biology,
sequence alignment is the cornerstone of modern bioinformatics. Rapid and automated sequence
analysis facilitates everything from functional classification and structural determination of
proteins to studies of genetic expression and evolution (Sudha, 2014). The Needleman-Wunsch
and Smith-Waterman algorithms are both dynamic programming techniques used in
bioinformatics for sequence alignment. Despite their similarities, they serve different purposes
and are tailored to different types of alignment tasks.
1. Needleman-Wunsch
The Needleman-Wunsch algorithm, first published in 1970, is a pivotal method in bioinformatics
for finding the optimal global alignment of two sequences. The algorithm maximizes the number
of nucleotide or amino acid matches while minimizing the number of gaps necessary to align
the sequences. It is designed for global sequence alignment,
where it aligns the entire length of two sequences from start to end, ensuring that the alignment is
optimal according to a specified scoring system (Sudha, 2014). This makes it suitable for
comparing sequences that are expected to share a significant degree of similarity across their
entire lengths, such as homologous genes or proteins. The Needleman-Wunsch algorithm
consists of three steps:
1. Initialization of the score matrix
2. Calculation of scores and filling the trace-back matrix
3. Deducing the alignment from the trace-back matrix
Initialization of the Score Matrix:
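The algorithm begins by initializing a matrix F of size (m+1) × (n+1), where m and n are
the lengths of sequences A and B, respectively. With a linear gap penalty, the first row and
first column are filled with cumulative gap penalties (F[i][0] = i × gap penalty and
F[0][j] = j × gap penalty), so that leading gaps are charged before any match or mismatch
is scored.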
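A minimal sketch of the initialization step, assuming a linear gap penalty. The helper name init_score_matrix and the default penalty of -2 are illustrative choices, not part of the original description:

```python
def init_score_matrix(m, n, gap=-2):
    """Return an (m+1) x (n+1) matrix whose first row and column hold
    cumulative gap penalties (linear gap model assumed)."""
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap   # first i elements of A aligned against gaps only
    for j in range(1, n + 1):
        F[0][j] = j * gap   # first j elements of B aligned against gaps only
    return F


F = init_score_matrix(3, 4)
print(F[0])                    # first row:    [0, -2, -4, -6, -8]
print([row[0] for row in F])   # first column: [0, -2, -4, -6]
```

The interior cells are left at zero here; they are overwritten during the fill step.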
Calculation of Scores and Filling the Matrix:
Each cell F[i][j] in the matrix represents the optimal alignment score up to the i-th element of
sequence A and the j-th element of sequence B.
Scores are calculated from three candidates:
• F[i−1][j−1] + score(A[i], B[j]): Matching or mismatching A[i] and B[j].
• F[i−1][j] + gap penalty: Inserting a gap in sequence B.
• F[i][j−1] + gap penalty: Inserting a gap in sequence A.
The score for each cell F[i][j] is determined by choosing the maximum score among these
possibilities.
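The recurrence can be sketched in a few lines of Python. This is an illustrative implementation under a simple match/mismatch/linear-gap scoring scheme; the function name nw_fill and the default scores are assumptions made for the example:

```python
def nw_fill(a, b, match=2, mismatch=-3, gap=-2):
    """Fill the global-alignment score matrix F for sequences a and b.

    F[i][j] is the best score for aligning a[:i] with b[:j].
    """
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,   # align a[i] with b[j]
                F[i - 1][j] + gap,     # a[i] against a gap in b
                F[i][j - 1] + gap,     # b[j] against a gap in a
            )
    return F


F = nw_fill("GAT", "GT")
print(F[-1][-1])  # optimal global score, read from the bottom-right cell
```

Python strings are 0-indexed, so A[i] in the text corresponds to a[i - 1] in the code.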
Trace-back to Deduce the Alignment:
Once the matrix F is filled, the optimal alignment is deduced by tracing back from the bottom-
right corner of the matrix to the top-left corner. Starting from F[m][n], the trace-back follows the
path that gives rise to the maximum alignment score, determining whether to align two matching
elements or to introduce gaps to maintain alignment.
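The trace-back can be sketched as follows. The function takes an already filled matrix F and re-checks, at each cell, which of the three moves produced its score. The name nw_traceback, the default scores, and the tie-breaking order (diagonal first, then a gap in b, then a gap in a) are assumptions of this sketch; when several optimal alignments exist, a different order may return a different one.

```python
def nw_traceback(F, a, b, match=2, mismatch=-3, gap=-2):
    """Recover one optimal global alignment from a filled matrix F.

    F[i][j] is assumed to hold the best score for a[:i] versus b[:j].
    """
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:      # diagonal move
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:  # gap in b
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:                                       # gap in a
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


# Tiny hand-filled matrix for a = "AC", b = "A" (match 2, mismatch -3, gap -2).
F = [[0, -2],
     [-2, 2],
     [-4, 0]]
print(nw_traceback(F, "AC", "A"))  # ('AC', 'A-')
```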
Example:
Find the best alignment of these two sequences:
ACTGATTCA
ACGCATCA
Using -2 as a gap penalty, -3 as a mismatch penalty, and 2 as the score for a
match.
Solution:
Step 1: Draw the matrix
For two sequences of lengths m and n, the scoring grid must have (m+1) × (n+1)
cells. Here ACTGATTCA (9 characters) runs along the top and ACGCATCA (8 characters)
down the side, giving a grid of 9 rows and 10 columns.
Each cell is then scored from its upper-left, upper, and left neighbors, and
the optimal path is traced beginning from the lower right-hand corner.
          A    C    T    G    A    T    T    C    A
     0   -2   -4   -6   -8  -10  -12  -14  -16  -18
A   -2    2    0   -2   -4   -6   -8  -10  -12  -14
C   -4    0    4    2    0   -2   -4   -6   -8  -10
G   -6   -2    2    1    4    2    0   -2   -4   -6
C   -8   -4    0   -1    2    1   -1   -3    0   -2
A  -10   -6   -2   -3    0    4    2    0   -2    2
T  -12   -8   -4    0   -2    2    6    4    2    0
C  -14  -10   -6   -2   -3    0    4    3    6    4
A  -16  -12   -8   -4   -5   -1    2    1    4    8

Result:
This analysis yielded the following alignment:
ACTG-ATTCA
|| | || ||
AC-GCAT-CA
The alignment score is equal to the value in the lower right-hand corner of the matrix (8).
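As a quick check, the score can also be recomputed directly from the aligned strings: 7 matches and 3 gaps give 7 × 2 + 3 × (−2) = 8. The helper below is an illustrative sketch (the name score_alignment is not from the original) that scores any gapped alignment column by column:

```python
def score_alignment(top, bottom, match=2, mismatch=-3, gap=-2):
    """Score a gapped pairwise alignment column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # gap column
        elif x == y:
            total += match      # matching pair
        else:
            total += mismatch   # mismatching pair
    return total


print(score_alignment("ACTG-ATTCA", "AC-GCAT-CA"))  # 8
```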
Limitations
The Needleman-Wunsch algorithm, while powerful for global alignment, has limitations:
Computational Complexity: It requires O(m×n) time and O(m×n) space, where m and n are
the lengths of the sequences. This can be prohibitive for very large sequences.
Global Alignment Only: It aligns entire sequences and cannot identify local regions of
similarity.
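The quadratic space cost arises because the full matrix is kept for the trace-back. When only the score is needed, a common space-saving variant keeps just two rows at a time. This variant is not described in the text above and is shown only as a hedged sketch; the name nw_score_two_rows and the default scores are illustrative. Running time remains O(m×n).

```python
def nw_score_two_rows(a, b, match=2, mismatch=-3, gap=-2):
    """Global alignment score using O(len(b)) memory (no trace-back)."""
    prev = [j * gap for j in range(len(b) + 1)]     # row 0 of the matrix
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)             # first column of row i
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,          # diagonal
                          prev[j] + gap,            # from above
                          curr[j - 1] + gap)        # from the left
        prev = curr
    return prev[-1]


print(nw_score_two_rows("ACGCATCA", "ACTGATTCA"))  # 8
```

The alignment itself cannot be recovered from two rows alone, so this shortcut suits tasks that only rank sequence pairs by score.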
Applications
Comparative Genomics: Analyzing evolutionary relationships by comparing entire genomes or
long DNA sequences.
Protein Structure Prediction: Aligning protein sequences to understand structural similarities
and functional domains.
Gene Annotation: Identifying conserved regions and regulatory elements across species.
The Needleman-Wunsch algorithm provides an efficient way to find the optimal global
alignment of two sequences by balancing matches and gaps. Its systematic approach ensures that
the alignment reflects the best possible similarity between the sequences according to a defined
scoring system. Despite its limitations in computational demands and inability to find local
alignments, it remains indispensable in bioinformatics for a wide range of sequence comparison
tasks.
2. Smith-Waterman Algorithm (local alignment algorithm)
The Smith-Waterman algorithm is a dynamic programming method for determining the
similarity between nucleotide or protein sequences. The algorithm was first proposed in 1981 by
Smith and Waterman and identifies homologous regions between sequences by searching for
optimal local alignments. To find the optimal local alignment, a scoring system including a set of
specified gap penalties is used (Muhamad et al., 2015). Homology identified by sequence
database searches often implies shared functionality between sequences, and further research and
development might depend on the accuracy of the search results. The Smith-Waterman algorithm
is built on the idea of comparing segments of all possible lengths between two sequences to
identify the best local alignment. This means that the Smith-Waterman search is very sensitive
and ensures an optimal alignment of the sequences. Unfortunately, it also makes the method
expensive in both running time and memory.
Needleman and Wunsch were the first to introduce a dynamic programming alignment algorithm for
calculating homology between sequences (Patil, 2016). Later, a number of variations were
suggested, among others, to get closer to fulfilling the requests of biology by measuring the
metric distance between sequences (Sudha, 2014). Further development of this led to the Smith-
Waterman algorithm based on the calculation of local alignments instead of global alignments of
the sequences and allowing consideration of deletions and insertions of arbitrary length. The
Smith-Waterman algorithm is among the most sensitive methods for searching databases
for sequence homology, but it is also among the most time-consuming. Thus, many
optimizations and faster heuristic approximations have been proposed. One example is
the well-known Basic Local Alignment Search Tool, BLAST. Only three slight modifications
to the Needleman-Wunsch algorithm are needed to make it a local alignment algorithm:
1. In the initialization step, the first row and first column cells are filled with zero.
2. During dynamic programming (in matrix fill), if the score becomes negative,
the value is set to zero. As a result, a local alignment scoring matrix contains
no values below zero, and a poorly scoring prefix cannot drag down a good
region that follows it.
3. The trace-back starts from the cell with the highest value in the matrix and
stops when a cell with a score of zero is reached.
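The second modification amounts to adding a zero to the set of candidates in the cell update. The two helpers below are an illustrative side-by-side sketch (the names nw_cell and sw_cell are hypothetical); the other two modifications concern initialization and where the trace-back starts, so they do not appear in the per-cell update.

```python
def nw_cell(diag, up, left, s, gap):
    """Global (Needleman-Wunsch) update: scores may go negative."""
    return max(diag + s, up + gap, left + gap)


def sw_cell(diag, up, left, s, gap):
    """Local (Smith-Waterman) update: negative scores are reset to zero."""
    return max(0, diag + s, up + gap, left + gap)


# Same neighbours (all zero), mismatch score -3, gap penalty -2:
print(nw_cell(0, 0, 0, -3, -2))  # -2
print(sw_cell(0, 0, 0, -3, -2))  # 0
```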
Example:
Find the best local alignment between these two sequences:
ATGCATCCCATGAC
TCTATATCCGT
Using -2 as a gap penalty, -3 as a mismatch penalty, and 2 as the score for a match.
Solution:
Traceback begins at the highest value (which is also the alignment score).
       A  T  G  C  A  T  C  C  C  A  T  G  A  C
    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
T   0  0  2  0  0  0  2  0  0  0  0  2  0  0  0
C   0  0  0  0  2  0  0  4  2  2  0  0  0  0  2
T   0  0  2  0  0  0  2  2  1  0  0  2  0  0  0
A   0  2  0  0  0  2  0  0  0  0  2  0  0  2  0
T   0  0  4  2  0  0  4  2  0  0  0  4  2  0  0
A   0  2  2  1  0  2  2  1  0  0  2  2  1  4  2
T   0  0  4  2  0  0  4  2  0  0  0  4  2  2  1
C   0  0  2  1  4  2  2  6  4  2  0  2  1  0  4
C   0  0  0  0  3  1  0  4  8  6  4  2  0  0  2
G   0  0  0  2  1  0  0  2  6  5  3  1  4  2  0
T   0  0  2  0  0  0  2  0  4  3  2  5  3  1  0

Which yields the alignment:
ATCC
||||
ATCC
with an alignment score of 8.
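The worked example can be reproduced with a compact sketch of the whole procedure: zero-initialized borders, a fill step clamped at zero, and a trace-back from the highest cell until a zero is reached. The function name smith_waterman, the default scores, and the tie-breaking order in the trace-back are assumptions of this example rather than a reference implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-3, gap=-2):
    """Return (aligned_a, aligned_b, score) for one best local alignment."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]   # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal move
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # gap in b
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:                                   # gap in a
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(smith_waterman("ATGCATCCCATGAC", "TCTATATCCGT"))
# ('ATCC', 'ATCC', 8)
```

If several cells share the maximum score, this sketch keeps the first one encountered in row-major order; other implementations may report a different, equally scoring local alignment.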

Applications
1. Database Searches for Similar Sequences:
The Smith-Waterman algorithm is widely used to identify regions of similarity (homologous
regions) between sequences in databases. This is crucial for inferring evolutionary relationships
and functional similarities between genes or proteins. By identifying conserved regions, the
algorithm helps predict biological functions based on sequence similarity. This is particularly
useful in annotating newly sequenced genes or proteins.
2. Genome Annotation:
It aids in pinpointing conserved domains, regulatory motifs, or functional elements within
genomic sequences. This is essential for understanding gene regulation and protein function
across species.
3. Protein Structure and Function Prediction:
The algorithm can align protein sequences to predict their 3D structures or to infer structural
similarities, which are indicative of similar functions. It helps in identifying conserved domains
within proteins, which are critical for their biochemical functions.
4. Evolutionary Studies:
By aligning sequences from different species, the algorithm assists in reconstructing evolutionary
histories and understanding the divergence of species over time. It facilitates the comparison of
entire genomes or large genomic regions to study evolutionary relationships and genomic
rearrangements.
5. Clinical and Diagnostic Applications:
In clinical genomics, the algorithm is used to compare patient sequences against reference
sequences to identify disease-causing mutations or genetic variations.
Limitations
1. Computational Intensity:
The Smith-Waterman algorithm is computationally demanding, especially for large datasets. It
requires extensive computational resources to perform pairwise alignments due to its sensitivity
and exhaustive search for local alignments.
2. Scalability:
As sequence lengths increase, the time and memory requirements of the algorithm grow
with the product of the two lengths. This limits its applicability to very large datasets or genomes.
3. Sensitivity to Parameters:
The performance of the algorithm can be sensitive to the choice of scoring system, including gap
penalties and substitution matrices. Optimization of these parameters is crucial for obtaining
biologically meaningful results.
4. Alignment Ambiguity:
In regions where sequences exhibit complex patterns of mutations, insertions, or deletions, the
algorithm may struggle to accurately resolve the optimal alignment, leading to ambiguous
results.
5. Limitation to Local Alignments:
While powerful for local alignments, the algorithm is not suitable for identifying global
similarities across entire sequences, which is where algorithms like Needleman-Wunsch excel.
Discussion
The Smith-Waterman algorithm excels in identifying optimal local alignments within
sequences, making it a powerful tool for pinpointing regions of high similarity. This method is
computationally intensive due to its exhaustive search for the best local alignment, comparing
sequences character by character. Despite its precision in identifying homologous regions, its
time and memory requirements can be prohibitive for large-scale analyses. As a result, faster
heuristic alternatives like BLAST or FASTA are often preferred for handling massive datasets in
contemporary research (Saloom & Khafaji, 2023). However, the algorithm's sensitivity to local
similarities makes it indispensable in studies where identifying conserved motifs or regions of
suspected similarity within large sequences is crucial. By focusing on local alignments, the
Smith-Waterman algorithm avoids aligning less conserved regions, which is particularly
beneficial as protein sequences often exhibit higher mutation rates towards their ends than in
central domains.
In contrast, the Needleman-Wunsch algorithm aims for optimal global alignment across
entire sequences. It systematically evaluates matches, mismatches, and gaps to determine the
best alignment from start to finish. This makes it suitable for comparing sequences of similar
length and overall similarity, such as homologous genes or proteins. Its approach ensures a
comprehensive view of sequence similarity but comes at a cost: the algorithm's time and space
requirements scale with the product of the sequence lengths, making it less efficient for lengthy
sequences (Shehab et al., 2012). Despite these computational demands, the Needleman-Wunsch
algorithm remains a cornerstone for tasks where precise global alignment is paramount, such as
comparative genomics and protein structure prediction.

Conclusion
Both the Smith-Waterman and Needleman-Wunsch algorithms offer optimal alignment
solutions tailored to different biological questions. The choice between them hinges on the
specific requirements of the analysis:
• Smith-Waterman is ideal for local alignments in sequences with suspected conserved
regions or motifs.
• Needleman-Wunsch is preferred for comparing sequences of similar length and overall
similarity across their entire length.
As bioinformatics continues to evolve with larger datasets and more complex analyses,
understanding the strengths and limitations of these algorithms is crucial for choosing the most
appropriate tool for the task at hand.
References
Shehab, S. A., Keshk, A., & Mahgoub, H. (2012). Fast Dynamic Algorithm for Sequence
Alignment Based On Bioinformatics. International Journal of Computer Applications,
37(7), 54–61. https://doi.org/10.5120/4624-6636
Muhamad, F. N., Ahmad, R. B., Asi, S. M., & Murad, M. N. (2015). Reducing the search space
and time complexity of Needleman-Wunsch algorithm (global alignment) and Smith-
Waterman algorithm (local alignment) for DNA sequence alignment. Jurnal Teknologi,
77(20), 137–146. https://doi.org/10.11113/jt.v77.6564
Patil, J. G. (2016). Sequence Comparison Techniques for Biological Sequence Comparison. 3(5),
3–6.
Saloom, R. H., & Khafaji, H. K. (2023). Developing New Pairwise Sequence Alignment Method
Based on Needleman-Wunsch Algorithm. International Journal of Intelligent Engineering
and Systems, 16(2), 580–590. https://doi.org/10.22266/ijies2023.0430.48
Sudha, M. P. (2014). Sequence Alignment in DNA Using Smith Waterman and Needleman
Algorithms. 5(4), 5957–5960.
