Sequence Alignment Methods

Sequence alignment is a technique used to compare biological sequences and identify similarities and differences. It involves arranging sequences to maximize matches and minimize mismatches. It has various applications in genomics, proteomics, evolutionary biology, drug discovery, forensics, and more. Common algorithms for sequence alignment include Needleman-Wunsch for global alignment and Smith-Waterman for local alignment.

Uploaded by

dharahasitha03

Sequence alignment methods

Module 2 – Unit - 01

Prof.T.C.Venkateswarulu
Department of Biotechnology
School of Biotechnology and Pharmaceutical Sciences
Sequence alignment
It is a computational technique used to compare and analyze the similarities and differences
between two or more sequences of biological data, such as DNA, RNA, or protein sequences. It
involves arranging the sequences in a way that maximizes matches or minimizes mismatches
and indels (insertions and deletions).
Why compare sequences?
o Find important molecular regions – conserved across species.
o Determine the evolutionary constraints at work
o Find mutations in a population or family of genes
o Find similar sequences in a database
o Predict the secondary/tertiary structure of a sequence of interest by molecular modeling
with a template (homology modeling)
(Ref: PowerPoint Presentation (cmu.edu); Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Applications of sequence alignment
Sequence alignment has a wide range of applications in various fields, including:
Genomics
o Sequence alignment is essential for identifying genes, regulatory regions, and functional
elements within genomes. Comparative genomics uses sequence alignment to study
evolutionary relationships between species and identify conserved regions.

Proteomics
o Sequence alignment is used to compare protein sequences and identify similarities,
functional domains, and motifs. It aids in predicting protein structure and function, as
well as understanding the relationship between protein sequences and their biological
activities.
(Ref: PowerPoint Presentation (cmu.edu); Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Evolutionary Biology
o Used to study the evolutionary history of species by comparing DNA or protein
sequences. It helps in inferring phylogenetic trees and determining the
relatedness between different organisms.
Drug Discovery
o Assists in identifying potential drug targets by comparing protein sequences of
disease-related genes. It helps in understanding the functional implications of
genetic variations and mutations associated with diseases. It is also used in virtual
screening, where it aids in identifying drug candidates by aligning them with
target protein sequences.
Forensic Analysis
o Used in comparing DNA profiles obtained from crime scenes with those of potential
suspects.
Molecular Biology and Biotechnology
o In designing primers for polymerase chain reaction (PCR) experiments, where
specific regions of DNA are amplified. It is also used in recombinant DNA technology to
align DNA sequences for the purpose of gene cloning, genetic engineering, and gene
synthesis.
Biodiversity and Conservation
o To study genetic diversity within and between species, contributing to
biodiversity assessments and conservation efforts.
Personalized Medicine
o To analyze an individual’s genetic variation and identify disease-causing mutations or
genetic predispositions. It aids in tailoring treatment strategies and predicting drug
responses based on an individual’s genetic profile.
(Ref: PowerPoint Presentation (cmu.edu); Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Types of Sequence Alignment
o Pairwise alignment and multiple sequence alignment (MSA)

Pairwise sequence alignment
o Identifies similar regions between two biological sequences (nucleotide or protein
sequences) and is useful for analyzing functional, structural, and evolutionary
relationships between the two.
(Ref: Okada et al. BMC Bioinformatics (2015) 16:321 ; Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Pairwise sequence alignment - Global and local alignment algorithms

o Global alignment produces an end-to-end alignment of the sequences, whereas local
alignment produces alignments that describe the most similar regions within the sequences.
o In particular, local alignment is useful for constructing a phylogenetic tree
because it can identify regions in which mutations such as insertions or
deletions of nucleotides occurred in the evolutionary process.

(Ref: Okada et al. BMC Bioinformatics (2015) 16:321 ; Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com))
Pairwise sequence alignment - Global and local alignment algorithms
o The two commonly used algorithms for pairwise alignment are the
Needleman-Wunsch algorithm, used for global alignment, and the
Smith-Waterman algorithm, used for local alignment. Both are based on
dynamic programming.
o Dynamic programming is a popular approach for global pairwise
sequence alignment, with the Needleman-Wunsch algorithm being a
prominent example.
o The technique of global alignment involves the comparison of the
complete length of two sequences, whereas local alignment is centred on
the detection of particular regions of similarity present within the
sequences.
Dynamic programming
o Dynamic programming solves a complex problem by breaking it into simpler subproblems.
o Needleman and Wunsch were the first to apply this method to sequence alignment.
o The aim is to maximize a similarity score to give the maximum match.
Steps in dynamic programming
Initialization
o The first step in the global alignment dynamic programming approach is to create a matrix with M+1
columns and N+1 rows, where M and N correspond to the lengths of the sequences to be aligned.
Matrix filling
o Fill each cell of the matrix with the highest possible score.
o A diagonal move aligns the next residue of one sequence with the next residue of the other.
o A horizontal or vertical move (off the diagonal) requires inserting a gap in the corresponding sequence.
Trace back and aligning
o Start from the last (bottom-right) cell and follow the arrows back to the first cell.
Global alignment via Dynamic programming
o The 1st row and 1st column correspond to a gap (no residue from either sequence).
o Fill the 1st cell with zero.
o Then fill the rest of the 1st row and 1st column with multiples of the gap penalty.
o While filling the rest of the matrix, each cell has three candidate values:
  o Horizontal = score of the cell to the left + gap penalty
  o Vertical = score of the cell above + gap penalty
  o Diagonal = score of the upper-left cell + match or mismatch score
o Write the maximum of these three values in the cell. Let match = 1, mismatch = -1, gap
penalty = -2.
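
The filling rules above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited textbook: the function name `nw_score_matrix` is made up for this example, and the layout (first sequence along the columns, second along the rows) is an assumption chosen to match the "M+1 columns and N+1 rows" description. The scores default to match = 1, mismatch = -1, gap penalty = -2.

```python
def nw_score_matrix(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Fill a global-alignment score matrix (columns: seq1, rows: seq2).

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    cols, rows = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initialization: first cell is zero, first row and first column
    # hold multiples of the gap penalty.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    # Matrix filling: each cell takes the best of the three candidates.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair
            vertical = score[i - 1][j] + gap
            horizontal = score[i][j - 1] + gap
            score[i][j] = max(diagonal, vertical, horizontal)
    return score


if __name__ == "__main__":
    # Print the filled matrix for the pair TCGCA / TCCA.
    for row in nw_score_matrix("TCGCA", "TCCA"):
        print(" ".join(f"{value:3d}" for value in row))
```

The value in the bottom-right cell is the score of the best end-to-end alignment under these assumed scores.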

Ref: Mount, D. - Bioinformatics: Sequence and Genome Analysis, Indian Edition, Cold Spring Harbor Lab, 2001.
Global alignment
Let Seq #1 = TCGCA and Seq #2 = TCCA

Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf

Backward Tracking: Start from the last cell (lower-right corner) and, at each step, follow the arrow to
the cell from which the current cell's value was derived, continuing until the first cell is reached.
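
As a rough sketch of backward tracking, the Python below stores one arrow per cell while filling the matrix and then walks the arrows from the last cell to the first. The function name `needleman_wunsch`, the arrow labels "D", "U" and "L", and the tie-breaking order (diagonal first) are assumptions made for this example; the scores are match = 1, mismatch = -1, gap penalty = -2.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment with traceback (illustrative sketch, not a reference implementation)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    arrow = [[None] * cols for _ in range(rows)]  # "D"iagonal, "U"p or "L"eft

    for j in range(1, cols):
        score[0][j], arrow[0][j] = j * gap, "L"
    for i in range(1, rows):
        score[i][0], arrow[i][0] = i * gap, "U"

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + pair, "D"),
                (score[i - 1][j] + gap, "U"),
                (score[i][j - 1] + gap, "L"),
            ]
            # max() keeps the first best candidate, so ties prefer the diagonal.
            score[i][j], arrow[i][j] = max(candidates, key=lambda c: c[0])

    # Backward tracking: follow the arrows from the last cell to the first.
    top, bottom = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if arrow[i][j] == "D":
            top.append(seq1[j - 1])
            bottom.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif arrow[i][j] == "U":
            top.append("-")          # gap in seq1
            bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1])
            bottom.append("-")       # gap in seq2
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row1, row2 = needleman_wunsch("TCGCA", "TCCA")
    print(total)
    print(row1)
    print(row2)
```

For TCGCA and TCCA with these scores, the sketch reports a score of 2 with TCGCA aligned over TC-CA: four matches and one gap.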
Smith & Waterman, 1981
Similarity scoring
o Expected value: negative for random alignments, positive for highly similar sequences.
Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
The Smith-Waterman Algorithm

Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Characteristics of local alignments

o The alignment can start and end at any point in the matrix.
o No negative scores in the alignment: any cell whose score would be negative is set to zero.
o The mean value of the scoring matrix (e.g. PAM, BLOSUM) should be negative, but
there should be positive scores in the scoring matrix.

Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Let Seq #1 = GAATTCAGTTA and Seq #2 = GGATCGA
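
The characteristics of local alignment can be illustrated with a short Python sketch of the Smith-Waterman recurrence. The scoring values for this example pair are not stated here, so the sketch assumes match = 1, mismatch = -1 and gap penalty = -2; with other scores the reported alignment may differ. The function name `smith_waterman` is hypothetical.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Local alignment sketch; scoring values are assumptions, not from the source."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]  # first row and column stay zero
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
            score[i][j] = max(
                0,                              # no negative scores
                score[i - 1][j - 1] + pair,     # diagonal
                score[i - 1][j] + gap,          # vertical
                score[i][j - 1] + gap,          # horizontal
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback starts at the highest-scoring cell and stops at a zero,
    # so the alignment can start and end anywhere in the matrix.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if seq2[i - 1] == seq1[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top.append(seq1[j - 1])
            bottom.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append("-")
            bottom.append(seq2[i - 1])
            i -= 1
        else:
            top.append(seq1[j - 1])
            bottom.append("-")
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    best, row1, row2 = smith_waterman("GAATTCAGTTA", "GGATCGA")
    print(best)
    print(row1)
    print(row2)
```

The only differences from the global version are the extra candidate 0 in each cell, the zero-filled first row and column, and a traceback that begins at the maximum cell rather than the last cell.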

Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Backward Tracking

Ref: https://www.cs.cmu.edu/~02710/Lectures/SeqAlign2015.pdf
Global Alignment – Tools
Global alignment tools create an end-to-end alignment of the sequences to be aligned.
EMBOSS Needle
o EMBOSS Needle creates an optimal global alignment of two sequences using the
Needleman-Wunsch algorithm.
EMBOSS Stretcher
o EMBOSS Stretcher uses a modification of the Needleman-Wunsch algorithm that
allows larger sequences to be globally aligned.

Ref: EMBL-EBI

Local Alignment - Tools
Local alignment tools find one, or more, alignments describing the most similar region(s)
within the sequences to be aligned. They can align protein and nucleotide sequences.
EMBOSS Water
o EMBOSS Water uses the Smith-Waterman algorithm (modified for speed
enhancements) to calculate the local alignment of two sequences.
EMBOSS Matcher
o EMBOSS Matcher identifies local similarities between two sequences
using a rigorous algorithm based on the LALIGN application.
LALIGN
o LALIGN finds internal duplications by calculating non-intersecting local
alignments of protein or DNA sequences.
Dot Plot
o The dot plot is a graphical technique used to represent pairwise sequence
comparisons. One sequence is placed on the horizontal axis and the other on the
vertical axis.
o Every point on the graph corresponds to a pair of residues, one from each
sequence, and a dot is placed wherever the two residues are similar.
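
A minimal text-mode sketch of this idea in Python follows. It marks a dot only where two residues are identical; practical dot plot tools usually add a sliding window and a similarity threshold to reduce noise, which this example leaves out. The helper names `dot_matrix` and `render` are hypothetical.

```python
def dot_matrix(seq_x, seq_y):
    """Return a grid where cell [i][j] is True when seq_y[i] equals seq_x[j]."""
    return [[y == x for x in seq_x] for y in seq_y]


def render(seq_x, seq_y):
    """Draw the grid as text: seq_x along the top, seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for residue, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Comparing a sequence with itself puts an unbroken run of dots
    # on the main diagonal.
    print(render("TCGCA", "TCGCA"))
```

Runs of dots along a diagonal correspond to stretches where the two sequences match without gaps; a shift from one diagonal to another suggests an insertion or deletion.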
Why use dot plots?
o A major benefit when using dot plots for alignment
is the ability to observe changes that occur across
sections of the sequence.
o Repeats within a sequence will not be
highlighted in a standard sequence alignment,
but because dot plots align a section of the
sequence to the entire query, all areas of alignment
are noted. Regions that contain repeats appear as
stacked diagonal lines (see figure).
Figure: Alignment of a sequence with itself, containing internal repeats.
Ref: https://en.vectorbuilder.com/tool/gc-content-calculator.html
Dot Plot
o Dot plot methods are well suited to studying the structure of the sequences involved.
They can show repetitions, insertions and deletions clearly.
Figure: Example of comparing two sequences using dot plots (Xiong, J., 2006).
Multiple sequence alignment (MSA)
o Multiple Sequence Alignment (MSA) is generally the alignment of three or more biological
sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred
and the evolutionary relationships between the sequences studied.

Progressive Methods:
o Progressive methods are commonly used for MSA.
o The approach was first formulated by Hogeweg and Hesper. It is a heuristic in which
the complex MSA problem is separated into subproblems.
o These algorithms build the alignment progressively by initially aligning pairs of
sequences and then incorporating additional sequences one by one.
o Popular progressive methods include: ClustalW, Clustal Omega, and T-Coffee.

Ref: https://doi.org/10.1016/j.ygeno.2017.06.007
Progressive Methods:
o It first performs the global pairwise alignment of the sequences and
develops a distance matrix. It then builds a guide tree based on the
matrix values.
o Finally, it generates a consensus alignment by gradually adding
sequences following the guide tree where the closest sequence pairs
(smallest branch length in guide tree) are aligned first and thus, it
gradually adds the next sequences.
o The guide tree determines the merging order of sequences based on the
pairwise distances calculated for all the possible sequence pairs to be
aligned in the MSA.
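
The distance matrix and merging order can be sketched as follows. This is a simplified stand-in, not how ClustalW or any specific tool works: it assumes edit distance (unit-cost global alignment) as the pairwise distance and plain average-linkage clustering for the guide tree, and the names `edit_distance`, `cluster_distance` and `guide_tree` are hypothetical. It returns only the tree and the order of merges; building the actual alignment at each node is left out.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost global alignment distance (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # gap in b
                            curr[j - 1] + 1,             # gap in a
                            prev[j - 1] + (ca != cb)))   # match or mismatch
        prev = curr
    return prev[-1]


def cluster_distance(members1, members2, dist):
    """Average pairwise distance between two clusters (average linkage)."""
    pairs = [(x, y) for x in members1 for y in members2]
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def guide_tree(seqs):
    """seqs maps name -> sequence. Returns (nested-tuple tree, list of merges)."""
    # Distance matrix over all possible sequence pairs.
    dist = {frozenset((x, y)): edit_distance(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}
    clusters = [(name, (name,)) for name in seqs]  # (subtree, member names)
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters first.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]][1],
                                                   clusters[ij[1]][1], dist))
        (tree_i, mem_i), (tree_j, mem_j) = clusters[i], clusters[j]
        merges.append((mem_i, mem_j))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), mem_i + mem_j))
    return clusters[0][0], merges


if __name__ == "__main__":
    # Made-up sequences: s1 and s2 differ at one position, s3 is more distant.
    example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTACCC"}
    tree, order = guide_tree(example)
    print(tree)
    print(order)
```

With the made-up sequences in the usage example, s1 and s2 are merged first because they are the closest pair, and s3 joins them in the second step.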

Ref: https://doi.org/10.1016/j.ygeno.2017.06.007
Figure: An example of how a progressive alignment performs MSA. The alignment
structure and the guide tree are constructed by ClustalW. Each node in the guide tree is
associated with an alignment. Based on the pairwise distance measures, s1 and s2 are
aligned first because they form the closest pair (smallest branch length), and s3 is
added to them later. Finally, the root node associates all the sequences considered in
the MSA.
Ref: https://doi.org/10.1016/j.ygeno.2017.06.007
Iterative Methods:
o Iterative methods, also known as iterative refinement methods, improve the alignment
by repeatedly refining an initial alignment.
o These algorithms typically involve three steps:
  (a) generating an initial alignment using a pairwise alignment algorithm,
  (b) estimating a new alignment based on the initial alignment, and
  (c) repeating the process until convergence.
o Common iterative methods include MUSCLE (Multiple Sequence Comparison by
Log-Expectation), MAFFT (Multiple Alignment using Fast Fourier Transform),
and ProbCons (Probabilistic Consistency-based alignment).
Multiple sequence alignment (MSA) – Online Tools
MUSCLE
o Accurate MSA tool, especially good with proteins. Suitable for medium alignments.
MView
o Transform a Sequence Similarity Search result into a Multiple Sequence Alignment
or reformat a Multiple Sequence Alignment using the MView program.
T-Coffee
o Consistency-based MSA tool that attempts to mitigate the pitfalls of progressive
alignment methods. Suitable for small alignments.
WebPRANK
o The EBI has a new phylogeny-aware multiple sequence alignment program
which makes use of evolutionary information to help place insertions and
deletions.
Clustal Omega
o New MSA tool that uses seeded guide trees and HMM profile-profile techniques
to generate alignments. Suitable for medium-large alignments.
Cons (EMBOSS)
o EMBOSS Cons creates a consensus sequence from a protein or nucleotide
multiple alignment.
Kalign
o Very fast MSA tool that concentrates on local regions. Suitable for large alignments.
MAFFT
o MSA tool that uses Fast Fourier Transforms. Suitable for medium-large alignments.
Ref: https://www.ebi.ac.uk/Tools/msa/
Heuristic methods for sequence alignment: BLAST and FASTA
Basic Local Alignment Search Tool (BLAST)
o The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity
between sequences.
o The program compares nucleotide or protein sequences to sequence
databases and calculates the statistical significance of matches.
o BLAST can be used to infer functional and evolutionary relationships between
sequences as well as help identify members of gene families.
Ref: https://blast.ncbi.nlm.nih.gov/Blast.cgi
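
Heuristic search tools of this kind are commonly described as starting from short exact word matches ("seeds") between the query and a database sequence, rather than filling a full dynamic programming matrix. The Python below sketches only that seeding idea under stated assumptions: the word length k = 3 and the function names `index_words` and `find_seeds` are made up for this example, and the extension and statistical scoring steps of a real tool are not shown.

```python
from collections import defaultdict


def index_words(sequence, k):
    """Map every length-k word in the sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = index_words(subject, k)
    seeds = []
    for q_pos in range(len(query) - k + 1):
        word = query[q_pos:q_pos + k]
        for s_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


if __name__ == "__main__":
    # Seeds that share the same offset (subject_pos - query_pos) lie on one
    # diagonal and are candidates for extension into a local alignment.
    print(find_seeds("ACGT", "TTACGTT"))
```

Looking words up in a prebuilt index is what makes this approach fast: only regions that share at least one word with the query need any further comparison.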
FASTA (FAST-All)
o The FASTA algorithm, short for FAST-All, is a commonly employed method for
conducting pairwise sequence alignment.
o First described by David J. Lipman and William R. Pearson, 1988.
o It uses a heuristic algorithm to locate local similarities among sequences.
o This method offers a fast and sensitive approach to comparing sequences.
o FASTA provides a heuristic search with a protein query. FASTX and FASTY
translate a DNA query.
o FASTA provides a heuristic search with a nucleotide query. TFASTX and
TFASTY translate the DNA database for searching with a protein query.

Ref: https://www.ebi.ac.uk/Tools/sss/fasta/genomes.html
Ref: Sequence Alignment - Definition, Types, Tools, Applications (microbiologynote.com)

