Bioinformatics 1 - Lecture 8: Multiple Sequence Alignment

1. MUSCLE performs multiple sequence alignment iteratively: it builds an initial alignment, calculates distances, builds a new tree, and refines the alignment repeatedly until convergence.
2. It splits branches of the phylogenetic tree and realigns the resulting profiles, keeping only changes that improve the alignment score.
3. This lets it incorporate more information from related sequences than progressive methods do, improving alignment accuracy.

Uploaded by

Mohsan Ullah


Bioinformatics 1 -- Lecture 8

Multiple sequence alignment

Pop-quiz explanation
• At the end of lecture 7, I presented a pop quiz: how many random scores have e-values less than or equal to 10? (Answer: 10.) Why?
• Consider this random score distribution. Each marked zone has the same area, right?
• If you pick scores at random from this distribution (think of throwing darts), what does the distribution of zones (e-values) look like? (It's flat!)
• The e-value m*P(S≥x) is the expected number of random scores of x or more among m random scores, so about 10 random scores are expected at e-value ≤ 10.
[Figure: a random score distribution divided into equal-area zones labeled m*P(S≥x) = 1, 2, 3, 4, ..., 10, and a histogram of freq. versus m*P(S≥x) over 1 to 10.]
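The flat distribution can be checked with a small simulation. This is a minimal sketch, assuming the random scores follow an exponential distribution so that P(S≥x) = exp(-x) exactly; the function names and the choice of distribution are illustrative and not from the lecture.

```python
import math
import random


def count_low_evalues(m, threshold, rng):
    """Draw m random scores and count those with e-value <= threshold.

    Assumption: scores are exponential, so P(S >= x) = exp(-x).
    """
    count = 0
    for _ in range(m):
        x = rng.expovariate(1.0)        # one random score
        e_value = m * math.exp(-x)      # e-value = m * P(S >= x)
        if e_value <= threshold:
            count += 1
    return count


def mean_low_evalues(m=1000, threshold=10, trials=500, seed=0):
    """Average the count over many repeated random searches."""
    rng = random.Random(seed)
    total = sum(count_low_evalues(m, threshold, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Expected to be close to the threshold itself, i.e. about 10.
    print(mean_low_evalues())
```

Because P(S≥x) of a randomly drawn score is uniform between 0 and 1, the e-values are uniform between 0 and m, which is why the average count tracks the threshold.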
In-class competition exercise:
Editing a multiple sequence alignment in UGENE
• Download and open “bad alignment” from the course web page.
• Export all sequences as alignment.
• Edit the alignment.
• Try to improve the %identity, and consolidate gaps.

How many indel events are implied by your alignment?
Methods for multiple sequence alignment
• Dynamic programming
• Star
• Progressive
– ClustalW, uses variable gap penalty
– Muscle, stochastic. Uses profiles.
– Kalign. Very fast. Uses exact match.

MSA algorithms must be computationally efficient AND biologically relevant.
Applications of MSAs
• Phylogenetic analysis
• Function prediction
• Structure prediction

• Is optimality possible? DP for three or more sequences.

A 3D alignment matrix... DP in 3D
A(i,j,k) = MAX {
A(i-1,j-1,k-1)+S(i,j,k),
A(i-1,j,k)-gap,
A(i,j-1,k)-gap,
A(i,j,k-1)-gap,
A(i-1,j-1,k)-gap,
A(i-1,j,k-1)-gap,
A(i,j-1,k-1)-gap }
where A is the alignment matrix and S(i,j,k) is the score for aligning the residues at positions i, j and k.
How many more arrows when we add a 4th seq?
How does DP scale?
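A minimal sketch of DP over three sequences. It swaps the single "-gap" term of the recurrence above for a sum-of-pairs column score, and the match, mismatch and gap values are illustrative assumptions, not values from the lecture. Each cell has 2^3 - 1 = 7 incoming arrows; a 4th sequence gives 2^4 - 1 = 15, and the matrix has on the order of L^N cells for N sequences of length L.

```python
from itertools import product


def n_arrows(n_seqs):
    """Number of predecessor cells (arrows) per DP cell for n_seqs sequences."""
    return 2 ** n_seqs - 1


def sp_column_score(a, b, c, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one column; '-' is a gap, gap-gap pairs score 0."""
    total = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def align3_score(s1, s2, s3):
    """Optimal sum-of-pairs score for globally aligning three sequences."""
    # The 7 moves: every non-empty choice of which sequences advance.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    A = {(0, 0, 0): 0}
    for i in range(len(s1) + 1):
        for j in range(len(s2) + 1):
            for k in range(len(s3) + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = float("-inf")
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    # A sequence that does not advance contributes a gap.
                    col = (
                        s1[i - 1] if di else "-",
                        s2[j - 1] if dj else "-",
                        s3[k - 1] if dk else "-",
                    )
                    best = max(best, A[(pi, pj, pk)] + sp_column_score(*col))
                A[(i, j, k)] = best
    return A[(len(s1), len(s2), len(s3))]
```

The triple loop makes the cost visible: time and memory grow exponentially with the number of sequences, which is why exact DP is rarely used beyond a handful of short sequences.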
Multiple sequence alignment -- Star method
1. Align all sequences to one sequence.
2. Stack them up.

[Figure: star-shaped tree connecting sequences A, B, C, D, E, F and G.]

Potential problems with star alignment:
• Unaligned gaps.
• Ambiguous associations.

A G H . I . W W . P F W P
A G H . I I F W . P Y . .
A G H I I . . W F P F W P
A G H . I P W W . P . . .

Each pairwise alignment by itself looks fine, but when you stack them up, you see disagreements.
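The two steps of the star method can be sketched as follows. This is an illustrative implementation under stated assumptions: a simple match/mismatch/linear-gap score, and a center chosen as the sequence with the highest summed pairwise score. Neither the scoring values nor the function names come from the lecture.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub, F[i - 1][j] + gap, F[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:  # traceback from the bottom-right corner
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def star_align(seqs):
    """Step 1: align every sequence to a center. Step 2: stack them up."""
    k = len(seqs)
    totals = [sum(needleman_wunsch(seqs[i], seqs[j])[0] for j in range(k) if j != i)
              for i in range(k)]
    c = max(range(k), key=lambda i: totals[i])
    center = seqs[c]
    n = len(center)
    per_seq = []  # (residues inserted before each center position, aligned chars)
    for idx, s in enumerate(seqs):
        if idx == c:
            per_seq.append(([""] * (n + 1), list(center)))
            continue
        _, ca, sa = needleman_wunsch(center, s)
        inserts, aligned, p = [""] * (n + 1), [], 0
        for x, y in zip(ca, sa):
            if x == "-":
                inserts[p] += y      # insertion relative to the center
            else:
                aligned.append(y)    # residue or gap under center[p]
                p += 1
        per_seq.append((inserts, aligned))
    # Each insertion slot is as wide as the longest insertion found there.
    width = [max(len(ins[p]) for ins, _ in per_seq) for p in range(n + 1)]
    rows = []
    for inserts, aligned in per_seq:
        parts = []
        for p in range(n):
            parts.append(inserts[p].ljust(width[p], "-"))
            parts.append(aligned[p])
        parts.append(inserts[n].ljust(width[n], "-"))
        rows.append("".join(parts))
    return c, rows
```

Note that residues inserted by different sequences at the same slot are simply left-justified into shared columns; they were never aligned to each other, which is one source of the disagreements seen when the pairwise alignments are stacked.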
BLAST query-anchored alignments are star alignments
Multiple sequence alignment -- Progressive method
1. Align all pairs. Save scores in matrix. [Figure: distance matrix]
2. Pairwise align two most similar.
3. Add the next most similar sequence. Etc. [Figure: guide tree]
4. Continue until all sequences are aligned.

Current alignment:
A G H I . W W P F
A G H I I F W P Y
Sequence to add: A W P Y

[Figure: DP alignment matrix of the sequence to add against the current alignment.]

S(P,[W,F]) = (1/2)(S(P,W) + S(P,F))
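The averaging rule S(P,[W,F]) = (1/2)(S(P,W) + S(P,F)) can be sketched as a small function. The pair-scoring function passed in is a toy stand-in (a real run would use a substitution matrix), and skipping gap characters in the column is an assumption, since the slide only shows a column without gaps.

```python
def profile_score(residue, column, pair_score, gap_chars=".-"):
    """Average pairwise score of one residue against one alignment column.

    Assumption: gap characters in the column are skipped.
    """
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    return sum(pair_score(residue, c) for c in residues) / len(residues)


def toy_score(a, b):
    """Hypothetical scoring: +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


if __name__ == "__main__":
    # Column [W, F] from the current alignment, scored against P and W.
    print(profile_score("P", "WF", toy_score))
    print(profile_score("W", "WF", toy_score))
```

Plugging this function into the match term of ordinary pairwise DP gives the sequence-to-alignment step of the progressive method.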
Distance and similarity are interconvertible metrics.
Maximizing similarity and minimizing distance are equivalent if, for each position in the alignment,
• d(i,j) + s(i,j) = smax,
where smax is the maximum possible similarity, and the minimum distance is d=0.
• Distance based on identity score (p-distance)
d = 100 - %identity
• Distance using empirical J-C correction
dJC = -ln((Sreal-Srand)/(Sident-Srand))
where Sreal = score of the actual alignment, Sident = score of an identity alignment, and Srand = mode score of a false alignment.
• For proteins, Srand ≈ 25%. “Twilight zone” (R. Doolittle, 1986)

[Figure: dJC plotted against sreal.]
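Both distances can be written directly from the formulas above. A minimal sketch: the default values Sident = 100 and Srand = 25 assume scores expressed as percent identity, and returning infinity at or below Srand is a choice made here, not something stated in the lecture.

```python
import math


def p_distance(percent_identity):
    """d = 100 - %identity."""
    return 100.0 - percent_identity


def jc_distance(s_real, s_ident=100.0, s_rand=25.0):
    """dJC = -ln((Sreal - Srand) / (Sident - Srand)).

    Assumption: a score at or below the random level is treated as
    infinitely distant, because the logarithm is undefined there.
    """
    if s_real <= s_rand:
        return math.inf
    return -math.log((s_real - s_rand) / (s_ident - s_rand))


if __name__ == "__main__":
    for s in (100, 97, 62.5, 30):
        print(s, p_distance(s), round(jc_distance(s), 3))
```

The correction leaves identical sequences at distance 0 and stretches distances more and more as the score approaches the twilight zone.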
In class: progressive alignment
Making a guide tree
Neighbor-joining algorithm:

     A    B    C    D    E    F
A         97   81   82   59   32
B              77   80   55   31
C                   90   65   40
D                        61   42
E                             33
F

Draw guide tree here (leaves A, B, C, D, E, F).
Fill in J-C distances.
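As a starting point for the exercise, the first neighbor-joining step can be sketched as below. It assumes the matrix values are percent identities and converts them with the plain p-distance d = 100 - %identity rather than the J-C correction; the variable and function names are illustrative.

```python
from itertools import combinations

NAMES = "ABCDEF"

# Upper triangle of the matrix above, read row by row.
SIMILARITY = {
    "AB": 97, "AC": 81, "AD": 82, "AE": 59, "AF": 32,
    "BC": 77, "BD": 80, "BE": 55, "BF": 31,
    "CD": 90, "CE": 65, "CF": 40,
    "DE": 61, "DF": 42,
    "EF": 33,
}


def p_distances(similarity):
    """Convert percent identities to p-distances keyed by unordered pair."""
    return {frozenset(pair): 100 - s for pair, s in similarity.items()}


def nj_first_join(names, dist):
    """Return the pair that neighbor joining would merge first.

    Uses the standard criterion Q(i,j) = (n-2)*d(i,j) - r(i) - r(j),
    where r(i) is the sum of distances from i to all other sequences.
    """
    n = len(names)
    r = {x: sum(dist[frozenset((x, y))] for y in names if y != x) for x in names}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - r[i] - r[j]

    return min(combinations(names, 2), key=q)


if __name__ == "__main__":
    print(nj_first_join(NAMES, p_distances(SIMILARITY)))
```

Repeating the step on a reduced matrix, with the joined pair replaced by a new internal node, builds the rest of the unrooted tree.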
CLUSTALW
JD Thompson, DG Higgins, TJ Gibson - Nucleic acids research, 1994

• Start with unrooted tree, using Neighbor joining.
• choose root to get guide tree
• progressive alignment
– matches are scored using sequence weights
– gaps are position dependent
• GOP (gap opening penalty) lower for polar residues
• GOP zero where there is already a gap
http://www.ebi.ac.uk/Tools/msa/clustalw2/
http://www.ch.embnet.org/software/ClustalW-XXL.html
There should be no gap penalty for aligning a gap to an already existing gap.
If i is already a gap position in any sequence, set gap(i)=0.

A G H I . W W P F
A G H I I F W P Y

[Figure: DP alignment matrix for adding the sequence A W P Y to the alignment above.]

A(i,j) = A(i-1,j) - gap(i)

NOTE: DP is still optimal when the gap penalty is position-specific.
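The rule gap(i)=0 for existing gap positions can be sketched as a per-column penalty vector. The base penalty of 2.0 and the function name are illustrative assumptions; only the zeroing rule comes from the slide.

```python
def position_gap_penalties(alignment, base_gap=2.0, gap_chars=".-"):
    """Return gap(i) for every column i of an existing alignment.

    gap(i) is 0 where any sequence already has a gap in column i,
    and base_gap (a hypothetical default) everywhere else.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    return [
        0.0 if any(row[i] in gap_chars for row in alignment) else base_gap
        for i in range(n_cols)
    ]


if __name__ == "__main__":
    current = ["AGHI.WWPF", "AGHIIFWPY"]
    print(position_gap_penalties(current))
```

The vector is then looked up inside the DP recurrence, A(i,j) = A(i-1,j) - gap(i), in place of a constant penalty.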
CLUSTALW Position specific gap penalty

MUSCLE
RC Edgar - Nucleic acids research, 2004

• Iterative MSA
– k-mer distance matrix (based on short identical matches)
– UPGMA tree
– progressive alignment --> MSA1
– Kimura distances from MSA1
– UPGMA tree
– progressive alignment --> MSA2
– For all tree branches:
• split tree into two
• calculate profiles
• align profiles
• accept or reject the alignment.
• Repeat
(Z&B p174)
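The first step, a distance based on short identical matches, needs no alignment at all. The sketch below uses one simple definition (one minus the fraction of shared k-mers); it is an assumption for illustration and not necessarily the exact formula used by MUSCLE.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 - (shared k-mers) / (k-mers in the shorter sequence).

    Assumption: this normalization is chosen for illustration only.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("sequences must be at least k residues long")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("AGHIWWPF", "AGHIIFWPY"))
```

Because it only counts exact word matches, this distance is fast to compute for all pairs, which is why it is used to build the first, rough tree before any alignment exists.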
MUSCLE iterative alignment

[Figure: phylogenetic tree drawn beside the current MSA, with a random cut point (X) marked on one branch.]

XP_001615335  YEPTDKEMDDILSAYFFYPSYKDYTRYVVDIFHRNYVSIFIYGNIAMPTEKEDENATS--
XP_002259219  YDPTDKEMDDLLSAYFFYPSYKDYTKYVVDFFHRNYVSIFIYGNIAMTTEKENENATS--
XP_001347897  YTPTNKEMYDILNAYFFYPSYNAYRTYVNEYFLRNYVVIFIYGNIIISDLKGEENITKNN
XP_726635     YIPTNKEIYDILNAYLFYPLYNSYIKYINNFFHKNYINIFIYGNLSIPNEINIKNETN--
XP_671449     ------------------------------------------------------------
XP_001458064  VVQAQYYTAELFLEELNILDLESLQQFHSNYFSNFRVSSFVSGNILRSEVEDLLHSIR--
XP_001347129  VVQAQYYTSQLFQDELATLDLESLQEFHSNYFSNFRVSSFVSGNILRSEVEDLLHTIR--
XP_002283970  DNTWPWMDG---LEVIPHLEADDLAKFVPMLLSRAFLECYIAGNIEPKEAEAMIHHIE--
XP_002367832  RNRFSQLDLRSAVTDASS-QFEDFKVFLEKVLTKNALDVFIMGDIDYEEARKLAEDFRAA

[Figure: the tree is cut at the random cut point, the sequences of the two subtrees form two profiles, and aligning the two profiles gives the new MSA.]

In each iteration: The phylogenetic tree is cut at a random branch, the two subtrees are converted to profiles, and aligned. The new alignment is either accepted or rejected.
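The accept-or-reject loop described above might look roughly like this. It is a skeleton under stated assumptions: the objective is a simple sum-of-pairs score with illustrative values, `branches` is a list of index groups standing in for tree cuts, and `realign` is a hypothetical callable that would perform the profile-profile DP in a real implementation.

```python
import random


def sp_score(msa, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of an MSA given as equal-length strings ('-' = gap)."""
    total = 0
    for col in zip(*msa):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                x, y = col[i], col[j]
                if x == "-" and y == "-":
                    continue
                if x == "-" or y == "-":
                    total += gap
                else:
                    total += match if x == y else mismatch
    return total


def refine(msa, branches, realign, n_iter=10, seed=0):
    """Repeatedly cut, re-align and keep the result only if the score improves.

    branches: list of index groups, each the rows on one side of a tree cut.
    realign:  hypothetical function (msa, group) -> new msa.
    """
    rng = random.Random(seed)
    best, best_score = list(msa), sp_score(msa)
    for _ in range(n_iter):
        group = rng.choice(branches)        # random cut point
        candidate = realign(best, group)    # profile-profile alignment step
        score = sp_score(candidate)
        if score > best_score:              # accept
            best, best_score = list(candidate), score
        # otherwise reject and keep the current alignment
    return best, best_score
```

Since a candidate is kept only when it scores higher, the alignment score can never decrease from one iteration to the next.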
DP profile-profile alignment
Databases of multiple sequence alignments
• BAliBASE -- structural alignment-based
• BLOCKS -- gapless regions
• PFAM -- Hidden Markov models
• CDD -- conserved domain database
• FSSP -- structural alignment-based (families)
UGENE podcast: large alignments
• Watch UGENE podcast #13
• http://ugene.unipro.ru/podcast_archive.html

– Reproduce the steps from the podcast!
Selective re-alignment
• Global affine-gap DP alignment may be used to refine the part of an alignment that lies between two conserved, confidently aligned columns.
– Select. Align with MUSCLE. Selected columns.
– Or, paste into ClustalW web site. Use same penalty for opening gap and end gap.
Review
• Are multiple sequence alignments optimal?
• How is phylogenetic information used in MSA algorithms?
• What are the advantages/disadvantages of a “star” alignment?
• What information is ClustalW encoding in its MSA algorithm?
• What does the outermost loop in the MUSCLE alignment probably look like?
