Sequence Alignment Methods
Module 2 – Unit - 01
Prof. T. C. Venkateswarulu
Department of Biotechnology
School of Biotechnology and Pharmaceutical Sciences
Sequence alignment
Sequence alignment is a computational technique used to compare two or more biological
sequences (DNA, RNA, or protein) and to analyze their similarities and differences. It
arranges the sequences so that matches are maximized and mismatches and indels
(insertions and deletions) are minimized.
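As a small illustration of what "matches, mismatches, and indels" mean in practice, the sketch below counts them for two already-aligned rows. The function name `score_alignment` and the use of `-` as the gap symbol are assumptions made for this example, not part of the lecture material.

```python
def score_alignment(row1, row2, gap_char="-"):
    """Count matches, mismatches, and indels in two aligned rows.

    Both rows must have the same length; gap_char marks an indel.
    (Hypothetical helper for illustration only.)
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    counts = {"matches": 0, "mismatches": 0, "indels": 0}
    for x, y in zip(row1, row2):
        if x == gap_char or y == gap_char:
            counts["indels"] += 1       # a gap in either row is an indel
        elif x == y:
            counts["matches"] += 1      # identical residues
        else:
            counts["mismatches"] += 1   # different residues
    return counts


if __name__ == "__main__":
    print(score_alignment("TCGCA", "TC-CA"))
```

A good alignment is one where the match count is high relative to the mismatch and indel counts.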
Why compare sequences?
o Find important molecular regions that are conserved across species.
o Determine the evolutionary constraints at work.
o Find mutations in a population or family of genes.
o Find similar sequences in a database.
o Predict the secondary/tertiary structure of a sequence of interest by molecular modeling
with a template (homology modeling).
(Ref: PowerPoint Presentation (cmu.edu); Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Applications of sequence alignment
Sequence alignment has a wide range of applications in various fields, including:
Genomics
o Sequence alignment is essential for identifying genes, regulatory regions, and functional
elements within genomes. Comparative genomics uses sequence alignment to study
evolutionary relationships between species and to identify conserved regions.
Proteomics
o Sequence alignment is used to compare protein sequences and identify similarities,
functional domains, and motifs. It aids in predicting protein structure and function, as
well as understanding the relationship between protein sequences and their biological
activities.
(Ref: PowerPoint Presentation (cmu.edu); Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Evolutionary Biology
o Sequence alignment is used to study the evolutionary history of species by
comparing DNA or protein sequences. It helps in inferring phylogenetic trees
and determining the relatedness between different organisms.
Drug Discovery
o Sequence alignment assists in identifying potential drug targets by comparing
protein sequences of disease-related genes. It helps in understanding the
functional implications of genetic variations and mutations associated with
diseases. It is also used in virtual screening, where it aids in identifying drug
candidates by aligning them with target protein sequences.
Forensic Analysis
o Sequence alignment is used to compare DNA profiles obtained from crime scenes
with those of potential suspects.
Molecular Biology and Biotechnology
o In designing primers for ...
Biodiversity and Conservation
o To study genetic diversity within and ...
(Ref: Okada et al. BMC Bioinformatics (2015) 16:321 ; Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Pairwise sequence alignment - Global and local alignment algorithms
o The two commonly used algorithms for pairwise alignment are the
Needleman-Wunsch algorithm, used for global alignment, and the
Smith-Waterman algorithm, used for local alignment.
o Both algorithms are based on dynamic programming; Needleman-Wunsch
is the prominent example for global pairwise alignment.
o Global alignment compares two sequences over their complete length,
whereas local alignment detects particular regions of similarity within
the sequences.
Dynamic programming
o Dynamic programming solves a complex problem by breaking it into simpler subproblems.
o Needleman and Wunsch were the first to apply this method to sequence alignment.
o The goal is to maximize a similarity score, which gives the best match between the sequences.
Steps in dynamic programming
Initialization
o The first step in the global alignment dynamic programming approach is to create a matrix with M+1
columns and N+1 rows, where M and N are the lengths of the two sequences to be aligned.
Matrix filling
o Fill each cell of the matrix with the highest possible score.
o A diagonal move aligns the next residue of one sequence with the next residue of the other.
o A horizontal or vertical move inserts a gap in one of the sequences.
Trace back and aligning
o Start from the last (bottom-right) corner and follow the arrows back to the first cell.
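The initialization step can be sketched as follows. This is a minimal sketch that assumes a linear gap penalty (set to -2 here purely for illustration) and places the first sequence along the columns and the second along the rows; the function name `init_matrix` is hypothetical.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Create the (N+1) x (M+1) scoring matrix for global alignment.

    M = len(seq1) gives the columns, N = len(seq2) gives the rows.
    The border cells hold multiples of the gap penalty; all other
    cells start at 0 and are filled in the matrix-filling step.
    """
    m, n = len(seq1), len(seq2)
    matrix = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        matrix[0][j] = j * gap   # first row: j gaps
    for i in range(1, n + 1):
        matrix[i][0] = i * gap   # first column: i gaps
    return matrix


if __name__ == "__main__":
    for row in init_matrix("TCGCA", "TCCA"):
        print(row)
```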
Global alignment via Dynamic programming
The first row and first column of the matrix are not labelled with any residue (they are empty).
Fill the first cell with zero.
Then fill the first row and first column with multiples of the gap penalty.
While filling the matrix, each cell has three candidate values:
Horizontal = score of the cell to the left + gap penalty
Vertical = score of the cell above + gap penalty
Diagonal = score of the upper-left cell + (match or mismatch score)
Write the maximum of these three values in the cell. Let match = 1, mismatch = -1, gap
penalty = -2.
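The filling rule above can be sketched in a few lines. This is an illustrative sketch rather than a reference implementation: the function name `nw_fill` and the choice to put the first sequence along the columns are assumptions, while the scores (match = 1, mismatch = -1, gap = -2) are the ones given above.

```python
def nw_fill(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill the global alignment (Needleman-Wunsch) score matrix.

    seq1 runs along the columns, seq2 along the rows.
    """
    m, n = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            diagonal = F[i - 1][j - 1] + s    # align two residues
            vertical = F[i - 1][j] + gap      # gap in seq1
            horizontal = F[i][j - 1] + gap    # gap in seq2
            F[i][j] = max(diagonal, vertical, horizontal)
    return F


if __name__ == "__main__":
    for row in nw_fill("TCGCA", "TCCA"):
        print(row)
```

The value in the bottom-right cell is the score of the best global alignment.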
Ref: Mount, D. Bioinformatics: Sequence and Genome Analysis, Indian Edition, Cold Spring Harbor Lab, 2001.
Global alignment
Let Seq #1 = TCGCA and Seq #2 = TCCA.
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Backward Tracking: Start from the last cell (lower-right corner) and follow the arrows back,
at each step moving to the cell from which the current cell's value was derived.
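Backward tracking can be sketched as below for the example sequences TCGCA and TCCA. Instead of storing arrows, this sketch recomputes which neighbouring cell produced each value; that design choice, the function name `nw_align`, and the preference for the diagonal move when several moves tie are assumptions made for illustration.

```python
def nw_align(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment: fill the matrix, then trace back from the last cell."""
    m, n = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the lower-right corner to the upper-left corner.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq2[i - 1] == seq1[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:      # diagonal arrow
                out1.append(seq1[j - 1])
                out2.append(seq2[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:  # vertical arrow: gap in seq1
            out1.append("-")
            out2.append(seq2[i - 1])
            i -= 1
        else:                                       # horizontal arrow: gap in seq2
            out1.append(seq1[j - 1])
            out2.append("-")
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), F[n][m]


if __name__ == "__main__":
    top, bottom, score = nw_align("TCGCA", "TCCA")
    print(top)
    print(bottom)
    print("score:", score)
```

Because the alignment is built from the end towards the start, the collected characters are reversed before they are returned.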
Smith & Waterman, 1981
Similarity scoring: the expected value of the score should be
negative for random alignments,
positive for highly similar sequences.
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
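The requirement that random alignments score negatively can be checked numerically. The sketch below computes the expected score of one aligned pair under the assumption that the two residues are drawn independently; the uniform base frequencies (0.25 each) and the function name `expected_score` are assumptions chosen for illustration.

```python
from itertools import product


def expected_score(freqs, match=1, mismatch=-1):
    """Expected score of one randomly aligned residue pair.

    freqs maps each residue to its background frequency; the two
    residues are assumed to be drawn independently.
    """
    return sum(
        freqs[x] * freqs[y] * (match if x == y else mismatch)
        for x, y in product(freqs, repeat=2)
    )


if __name__ == "__main__":
    uniform = {base: 0.25 for base in "ACGT"}
    print(expected_score(uniform))                        # match=1, mismatch=-1
    print(expected_score(uniform, match=5, mismatch=-1))  # a scheme with a positive mean
```

With match = 1 and mismatch = -1 a random pair matches a quarter of the time, so the expected score is negative and random regions do not accumulate score. A scheme whose expected value is positive would let local alignments grow over unrelated sequence.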
The Smith-Waterman Algorithm
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Characteristics of local alignments
o The mean value of the scoring matrix (e.g. PAM, BLOSUM) should be negative, but
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Let Seq #1 = GAATTCAGTTA and Seq #2 = GGATCGA.
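For these two sequences, the local alignment matrix can be sketched as follows. The scoring values for this example are not given here, so the sketch assumes the same values as the global example (match = 1, mismatch = -1, gap = -2); the function name `sw_fill` is also hypothetical. The only change from the global case is that every cell has a floor of zero and the first row and column stay at zero.

```python
def sw_fill(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill the local alignment (Smith-Waterman) matrix.

    seq1 runs along the rows, seq2 along the columns. Returns the
    matrix, the best score, and the (row, column) of the first cell
    that reaches it.
    """
    rows, cols = len(seq1), len(seq2)
    H = [[0] * (cols + 1) for _ in range(rows + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(
                0,                     # start a new local alignment here
                H[i - 1][j - 1] + s,   # diagonal
                H[i - 1][j] + gap,     # vertical
                H[i][j - 1] + gap,     # horizontal
            )
            if H[i][j] > best:         # keep the first cell with the best score
                best, best_pos = H[i][j], (i, j)
    return H, best, best_pos


if __name__ == "__main__":
    H, best, pos = sw_fill("GAATTCAGTTA", "GGATCGA")
    for row in H:
        print(row)
    print("best score:", best, "at", pos)
```

Under these assumed scores several cells tie for the best score in this example, and the sketch simply keeps the first one it finds. A different scoring scheme would give a different matrix.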
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Backward Tracking
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
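Backward tracking for local alignment differs from the global case in two ways: it starts at the highest-scoring cell rather than the last corner, and it stops as soon as a cell with score zero is reached. The sketch below shows this under assumed scores (match = 1, mismatch = -1, gap = -2, borrowed from the global example) with a hypothetical function name `smith_waterman`.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment: fill the matrix, then trace back from the best cell."""
    rows, cols = len(seq1), len(seq2)
    H = [[0] * (cols + 1) for _ in range(rows + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until a zero is reached.
    out1, out2 = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # vertical: gap in seq2
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:                                   # horizontal: gap in seq1
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(out1)), "".join(reversed(out2)), best


if __name__ == "__main__":
    top, bottom, score = smith_waterman("GAATTCAGTTA", "GGATCGA")
    print(top)
    print(bottom)
    print("score:", score)
```

Only the aligned region is returned; residues outside it are left out, which is what distinguishes a local from a global alignment.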
Global Alignment – Tools
Progressive Methods:
o Progressive methods are commonly used for multiple sequence alignment (MSA).
o The approach was first formulated by Hogeweg and Hesper. It is a heuristic in which the
complex MSA problem is separated into subproblems.
o These algorithms build the alignment progressively by first aligning pairs of
sequences and then incorporating additional sequences one by one.
o Popular progressive methods include ClustalW, Clustal Omega, and T-Coffee.
Ref: https://doi.org/10.1016/j.ygeno.2017.06.007
Progressive Methods:
o A progressive method first performs global pairwise alignment of the
sequences and builds a distance matrix. It then constructs a guide tree
from the matrix values.
o Finally, it generates a consensus alignment by gradually adding
sequences in the order given by the guide tree: the closest sequence pairs
(smallest branch length in the guide tree) are aligned first, and the
remaining sequences are added step by step.
o The guide tree determines the merging order of the sequences, based on the
pairwise distances calculated for all possible sequence pairs to be
aligned in the MSA.
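The first two stages (distance matrix and guide tree order) can be sketched as below. This is a simplified illustration, not how ClustalW or any other named tool is implemented: the conversion from alignment score to distance, the average-linkage clustering, and the function names are all assumptions made for this example. Sequences are assumed to be non-empty.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment score, keeping only two matrix rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def distance(a, b):
    # Hypothetical score-to-distance conversion: identical sequences give 0.
    return 1.0 - nw_score(a, b) / min(len(a), len(b))


def guide_tree_order(seqs):
    """Return the merge order of sequence indices (closest clusters first)."""
    dist = {
        frozenset((i, j)): distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def average_distance(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)   # the merged cluster replaces its two parts
        merges.append((c1, c2))
    return merges


if __name__ == "__main__":
    for step in guide_tree_order(["ACGT", "ACGA", "TTTT"]):
        print(step)
```

The returned merge order plays the role of the guide tree: a progressive aligner would align the first merged pair, then add the remaining sequences or sub-alignments in the listed order.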
Ref: https://doi.org/10.1016/j.ygeno.2017.06.007
... sequence alignment.
o First described by David J. Lipman and ...
o FASTA provides a heuristic search with ...
... among sequences.
o This method offers a rapid and highly responsive approach to comparing sequences.
Ref: https://www.ebi.ac.uk/Tools/sss/fasta/genomes.html