Multiple Alignment

The document discusses multiple sequence alignment (MSA), its importance for organizing sequence data, inferring phylogenetic relationships, and identifying conserved and variable regions. It outlines the challenges of aligning multiple sequences compared to pairwise alignments and describes methods such as Clustal W, which uses a progressive approach based on evolutionary relationships. The document also highlights scoring systems and the significance of consensus sequences in representing common features across aligned sequences.

Uploaded by

rebernate92


Multiple Sequence Alignment

Ranojit Sarker
ITME

Reasons for aligning sets of sequences
• Organise data to reflect sequence homology
• Infer phylogenetic trees from homologous sites
• Highlight conserved sites/regions
• Highlight variable sites/regions
• Uncover changes in gene structure
• Summarise information
Multiple sequence alignments (MA)
• We may want to find the optimal alignment of multiple sequences instead of pairs of sequences.
• For instance, we have proteins with the same function for multiple organisms: we want to find out which parts of the sequences match and which parts contain most gaps and mismatches.
Multiple Alignments
• In theory, making an optimal alignment between two sequences is computationally straightforward with dynamic programming (Needleman-Wunsch for global alignment, Smith-Waterman for local alignment), but aligning a large number of sequences using the same method is almost impossible.
• The cost increases exponentially with the number of sequences involved: the dynamic-programming table has one dimension per sequence, so its size is the product of the sequence lengths.
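To make the growth concrete, here is a minimal sketch that counts the cells of a full dynamic-programming table. The function name `dp_cells` is made up for this illustration, and the count assumes one table axis per sequence with one extra position for the empty prefix.

```python
from math import prod

def dp_cells(lengths):
    """Number of cells in a full dynamic-programming table.

    Assumes one axis per sequence, each with (length + 1) positions,
    the extra position standing for the empty prefix.
    """
    return prod(n + 1 for n in lengths)

# Table size for 2, 3, 4 and 5 sequences of length 100.
for s in (2, 3, 4, 5):
    print(s, dp_cells([100] * s))
```

Each added sequence of length 100 multiplies the table size by 101, which is why the exact method stops being practical after a handful of sequences.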
Multiple Alignment: an extension of pair-wise alignment
Sequence a A C A - - - A T G
Sequence b T C A A C T A T C
Sequence c A C A C - - A G C
Sequence e A G A - - - A T C
Sequence d A C C G - - A T C

Assumptions:
1) Sequences are related by a common ancestor+
2) Sequences are independent*+
3) Positions (columns) of the alignment are independent*

* Actually, neither of these is true … but it makes the computation easier.

BUT, treating all the sequences as independent can cause significant biases.

+ The sequences are related by a common ancestor: multiple, closely related sequences can skew the alignment.

Ideally we would have a species tree and the sequences from each species …
Multiple Sequence Alignment
Again, we need a scoring system and a search method.

Scoring:
a) Same substitution matrices and gap penalties as pair-wise
b) Score of an alignment is the ‘Sum of Pairs’ (SP)*: the pairwise scores summed over every pair of sequences in the alignment
c) Other scoring systems also exist (e.g. entropy-based scores)

* Here is where having some very closely related species can skew things.

Alignment methods
1. Multi-dimensional dynamic programming
2. Profile Hidden Markov Model (HMM)
3. Progressive pair-wise alignment
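The Sum of Pairs score listed under Scoring can be sketched as follows. The match, mismatch and gap values are illustrative assumptions standing in for a real substitution matrix and gap penalties, and scoring a gap against a gap as zero is one common convention rather than something the slides specify.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols from the same column (toy values, not a real matrix)."""
    if x == "-" and y == "-":
        return 0          # assumed convention: gap against gap costs nothing
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sum_of_pairs(alignment, **params):
    """Sum the pairwise scores over every column and every pair of rows."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            total += pair_score(x, y, **params)
    return total

print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```

Because every pair of rows contributes, a cluster of near-identical sequences adds many high-scoring pairs, which is the skew the footnote warns about.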
Optimal Alignment
• For a given group of sequences, there is no single "correct" alignment, only an alignment that is "optimal" according to some set of calculations.
• Determining what alignment is best for a given set of sequences is really up to the judgement of the investigator.
Pairwise vs Multiple Sequences
• Pairs of sequences are typically aligned using exact algorithms (dynamic programming)
• Multiple sequences are typically aligned using heuristic methods
• Sequence alignment is easy with sufficiently closely related sequences
• Below a certain level of identity, sequence alignment may become meaningless
– twilight zone for amino acid (aa) sequences: ~30% identity
• In the twilight zone it is good to make use of additional information if possible (e.g. structure)
Evolution
• Biological sequences are not randomly sampled.
• If two sequences are similar, most probably they have an evolutionary relationship.
• When we have many similar sequences, we can try to guess their relationship.
• This guess can drive the search for an optimal multiple alignment.
Clustal W
• A heuristic program for multiple sequence alignment: not exact, but quite fast.
• The main paper on it [Thompson, Higgins, Gibson 1994] has over 10k citations, making it one of the most cited papers in bioinformatics.
How Clustal works
• Exploit the fact that similar sequences are
evolutionarily related.
• Build up a multiple alignment progressively by a
series of pairwise alignments, following the
branches of a guide tree.
• In brief: we first guess the evolutionary picture,
then we generate the alignment according to it.
• Naturally, the alignment will suggest an
evolutionary picture which might be different
from the one we guessed first.

How Clustal works
1) all pairs of sequences are aligned separately in
order to calculate a distance matrix giving the
divergence of each pair of sequences;
2) a guide tree is calculated from the distance matrix
(how? we’ll see);
3) the sequences are progressively aligned according
to the branching order (i.e. starting from the
closest pairs) in the guide tree.

Step 1: pairwise alignments
• Global pairwise alignment between every pair of sequences.
• If we have S sequences of length n, this costs S(S-1)/2 alignments, each taking O(n^2) time with dynamic programming.
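As a rough sketch of this step, the code below scores every pair of sequences with a plain Needleman-Wunsch global alignment. The linear gap penalty and the match/mismatch values are simplifying assumptions chosen for illustration; they are not Clustal W's actual scoring.

```python
from itertools import combinations

def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty (toy values)."""
    # prev[j] holds the best score for aligning the current prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def all_pair_scores(seqs):
    """Score each of the S(S-1)/2 unordered pairs of sequences."""
    return {(i, j): global_score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

scores = all_pair_scores(["ACGT", "ACGA", "AGT", "TTTT"])
print(len(scores))  # 4 sequences give 4 * 3 / 2 = 6 pairs
```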
Step 1b: distance matrix
• The score of the alignment between any two sequences is converted into a distance in [0, 1] (0 for identical sequences, 1 for completely dissimilar ones).
• The distances are stored in an S×S (symmetric) distance matrix.
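The slides do not say how the score is converted. One common choice, assumed here, is one minus the fraction of identical residues over the columns where neither sequence has a gap. The function name `identity_distance` is made up for this sketch.

```python
def identity_distance(aligned_a, aligned_b):
    """Distance in [0, 1] from a pairwise alignment.

    Assumption: distance = 1 - (identical residues / compared columns),
    where columns containing a gap in either sequence are skipped.
    """
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 1.0  # nothing to compare: treat as maximally distant
    return 1.0 - identical / compared

print(identity_distance("AC-T", "AGGT"))
```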
Distance matrix

        seq A  seq B  seq C  seq D
seq A     -     0.2    0.1    0.6
seq B            -     0.4    0.1
seq C                   -     0.6
seq D                          -
Step 2: the guide tree
• branch lengths proportional to estimated divergence along each branch
• estimated divergence is based on the distances contained in the distance matrix
• in ClustalW the tree is built based on a method called Neighbour Joining (NJ).
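The first decision Neighbour Joining makes can be sketched with its standard selection criterion, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other sequences. The example below applies it to the distance matrix shown earlier. It covers only the choice of the first pair to join, not the full tree construction, and the helper names are made up for this illustration.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]
# Upper triangle of the distance matrix from the slides.
D = {("A", "B"): 0.2, ("A", "C"): 0.1, ("A", "D"): 0.6,
     ("B", "C"): 0.4, ("B", "D"): 0.1, ("C", "D"): 0.6}

def dist(d, i, j):
    """Symmetric lookup into an upper-triangle distance table."""
    if i == j:
        return 0.0
    return d[(i, j)] if (i, j) in d else d[(j, i)]

def nj_q_matrix(names, d):
    """Neighbour Joining criterion Q for every unordered pair of sequences."""
    n = len(names)
    r = {i: sum(dist(d, i, k) for k in names) for i in names}
    return {(i, j): (n - 2) * dist(d, i, j) - r[i] - r[j]
            for i, j in combinations(names, 2)}

Q = nj_q_matrix(names, D)
best = min(Q, key=Q.get)  # the pair with the lowest Q is joined first
print(best)
```

With only four sequences, complementary pairs such as (A, C) and (B, D) always receive the same Q value, so which of the two is reported first depends on tie-breaking and floating-point rounding.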
Step 2: the guide tree
[Figure: guide tree relating sequences A to D]
Progressive Pairwise Methods
• Most of the available multiple alignment programs use some sort of incremental or progressive method that makes pairwise alignments, then adds new sequences one at a time to these aligned groups.
• This is an approximate method!
Step 3: progressive alignment
• Gradually align sequences, starting from the closest ones on the tree. Each time sequences are aligned, we make a further hypothesis as to how evolution has worked.
• Every time an alignment is performed, the original sequences are replaced by their alignment.
• Along the way we align alignments instead of sequences. This is not a problem (we can align profiles against sequences, or profiles against profiles).
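Aligning profiles rather than single sequences can be pictured with the minimal sketch below. It turns an aligned group into per-column residue frequencies and scores two profile columns as an expected match/mismatch score. Treating the gap as just another symbol and using a two-value scoring scheme are simplifications assumed for this illustration, not how Clustal W scores profiles.

```python
from collections import Counter

def build_profile(alignment):
    """Per-column symbol frequencies for an aligned group (gaps counted as a symbol)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({sym: c / len(column) for sym, c in counts.items()})
    return profile

def column_score(col_a, col_b, match=1, mismatch=-1):
    """Expected score between two profile columns under a toy match/mismatch scheme."""
    return sum(fa * fb * (match if x == y else mismatch)
               for x, fa in col_a.items()
               for y, fb in col_b.items())

profile = build_profile(["AC", "AG"])
single = build_profile(["AC"])  # a lone sequence is a profile with frequencies of 1
print(column_score(profile[0], single[0]), column_score(profile[1], single[1]))
```

Because a single sequence is just a profile whose frequencies are all 1, the same column score serves for sequence against profile and profile against profile.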
Progressive alignment
[Figure sequence: following the guide tree over A to D, the slides merge A+C, then (A+C) + B, then ((A+C) + B) + D]
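The merge order in these slides can be reproduced by walking a guide tree from the leaves upward. The sketch below assumes the tree is written as nested pairs, with the shape ((A, C), B), D read off the slide sequence. It only lists the order of the merges and performs no alignment.

```python
def wrap(label):
    """Parenthesise a label that is already a merged group."""
    return f"({label})" if "+" in label else label

def merge_order(tree):
    """Return (label, steps) for a guide tree given as nested 2-tuples of leaf names.

    steps lists each merge after the merges of its two subtrees,
    i.e. in post-order.
    """
    if isinstance(tree, str):
        return tree, []
    left, right = tree
    left_label, left_steps = merge_order(left)
    right_label, right_steps = merge_order(right)
    label = f"{wrap(left_label)}+{wrap(right_label)}"
    return label, left_steps + right_steps + [label]

guide_tree = ((("A", "C"), "B"), "D")
_, steps = merge_order(guide_tree)
print(steps)  # ['A+C', '(A+C)+B', '((A+C)+B)+D']
```

For a more balanced tree, post-order guarantees only that both subtrees are aligned before they are merged; it does not by itself sort the merges by distance.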
ClustalW: a common heuristic multiple alignment program

Advantages of ClustalW:

1. Weights sequences according to evolutionary distance:
Sequences that are recently related through evolution are more likely to be similar in sequence because they haven’t had time to diverge
2. Uses different substitution matrices depending on sequence similarity
3. Uses affine gap penalties that are influenced by existing gaps in the multiple alignment
4. Guided by guide tree to choose the order of the sequences to align
5. Takes advantage of profile alignment rather than only doing pairwise alignments
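The affine gap penalty in point 3 can be pictured with the small sketch below. The parameter values are illustrative, not ClustalW defaults, and the convention used here (one opening charge plus an extension charge for every gapped position) is one of several in use. ClustalW's position-specific adjustment of these penalties is not modelled.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=0.5):
    """Cost of a single gap spanning `length` positions.

    Assumed convention: gap_open is charged once, gap_extend once per
    gapped position. Parameter values are illustrative only.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length

# One gap of length 3 versus three separate gaps of length 1.
print(affine_gap_cost(3), 3 * affine_gap_cost(1))
```

Under this scheme one long gap costs less than several short ones of the same total length, so the alignment favours a few contiguous indels over many scattered ones.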
Consensus Sequences
• Simplest Form:
A single sequence which represents the most common amino acid/base in each position

Y D D G A V - E A L
Y D G G - - - E A L
F E G G I L V E A L
F D - G I L V Q A V
Y E G G A V V Q A L

Consensus:
Y D G G A/I V/L V E A L
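The simplest form of consensus can be computed column by column, as in this minimal sketch. It assumes that gaps are ignored when counting and that ties are reported with a slash in order of first appearance, which matches the A/I and V/L notation in the example above.

```python
from collections import Counter

def consensus(alignment):
    """Most common residue per column; gaps ignored, ties joined with '/'."""
    result = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            result.append("-")  # column made up entirely of gaps
            continue
        top = max(counts.values())
        # Counter keeps insertion order, so ties appear in column order.
        tied = [res for res in counts if counts[res] == top]
        result.append("/".join(tied))
    return result

rows = ["YDDGAV-EAL",
        "YDGG---EAL",
        "FEGGILVEAL",
        "FD-GILVQAV",
        "YEGGAVVQAL"]
print(" ".join(consensus(rows)))  # Y D G G A/I V/L V E A L
```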
