Multiple Sequence Alignment: Submitted to Dr. Navneet Choudhary

Multiple sequence alignment is used to compare three or more biological sequences like DNA, RNA or proteins. It identifies regions of similarity and divergence between sequences that may have a common evolutionary origin. Most programs use heuristic methods to find suboptimal but computationally efficient alignments rather than optimal alignments. Well-known tools for multiple sequence alignment include ClustalW, T-Coffee, MSA and programs that search conserved blocks databases like BLOCKS.

Uploaded by

Ankita Sharma

Multiple sequence alignment

Submitted to: Dr. Navneet Choudhary
Submitted by: Ramesh Bishnoi, Nikita Jain
What is Multiple Sequence Alignment
• A sequence alignment of three or more biological sequences, generally protein, DNA, or RNA.
• The input set of query sequences is assumed to have an evolutionary relationship: the sequences share a lineage and are descended from a common ancestor.
• Used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acids or nucleotides.
• Most multiple sequence alignment programs use heuristic methods rather than global optimization, because identifying the optimal alignment between more than a few sequences of moderate length is computationally prohibitive.
An example of Multiple Alignment

VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
Goals of Multiple Sequence Alignment
1. To generate a concise, information-rich summary of sequence data.

2. To illustrate the dissimilarity among a group of sequences.

3. Alignments can be treated as models that can be used to test hypotheses.

4. Use in phylogenetics: multiple sequence alignments can be used to create a phylogenetic tree.

5. To identify functionally important sites, such as binding sites, active sites, or sites corresponding to other key functions, by locating conserved domains.
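As a minimal sketch of goal 5, the following Python scores each column of the example alignment shown earlier by the fraction of sequences that share its most common residue. The function name `column_conservation` and the choice to treat gaps as non-residues are assumptions of this sketch, not part of any particular tool.

```python
from collections import Counter


def column_conservation(rows, gap="-"):
    """Return, per column, the fraction of rows sharing the most common residue.

    Gap characters are not counted as residues (an assumption of this sketch).
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != gap]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(rows))
    return scores


# The eight rows of the example alignment from the slide above.
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]

scores = column_conservation(alignment)
# 0-based indices of columns where every sequence has the same residue.
conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(conserved)
```

For this alignment the only fully conserved columns are the cysteine at index 4 and the tryptophan at index 20 (0-based).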

Why do we do multiple alignments?
1. Simple sequence comparison
2. Conserved vs. non-conserved regions
   • proteins: motifs/profiles
   • whole genome: genes, control regions
3. Homology (as opposed to similarity)
   • Evolution: phylogeny
   • Structural homology
4. Sequence differences
   • Single Nucleotide Polymorphisms (SNPs)
5. Help predict the secondary and tertiary structures of new sequences
6. Preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees
Multiple Alignment Method
• The most practical and widely used method in multiple sequence alignment is the hierarchical extension of pairwise alignment methods.

• The principle is that a multiple alignment is achieved by successive application of pairwise methods.
Multiple Alignment Method
The steps are summarized as follows:

1. Compare all sequences pairwise.

2. Perform cluster analysis on the pairwise data to generate a hierarchy for alignment. This may be in the form of a binary tree or a simple ordering.

3. Build the multiple alignment by first aligning the most similar pair of sequences, then the next most similar pair, and so on. Once an alignment of two sequences has been made, it is fixed. Thus, for a set of sequences A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D is obtained by comparing the alignment of A and C with that of B and D, using averaged scores at each aligned position.
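The first two steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: pairwise similarity is a Needleman-Wunsch global score with arbitrary match/mismatch/gap values, clusters are joined by average pairwise score, and the names `nw_score` and `guide_order` are hypothetical. It only derives the order in which groups would be aligned; the profile alignment with averaged scores is not implemented.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def guide_order(seqs):
    """Return the list of cluster merges, most similar clusters first.

    seqs maps a name to a sequence. Clusters are tuples of names, and the
    similarity of two clusters is the average pairwise score of their members.
    """
    # Step 1: compare all sequences pairwise.
    pair = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}

    def average(x, y):
        total = sum(pair[frozenset((i, j))] for i in x for j in y)
        return total / (len(x) * len(y))

    # Step 2: cluster the pairwise data into a hierarchy.
    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        x, y = max(combinations(clusters, 2), key=lambda xy: average(*xy))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(x + y)
        merges.append((x, y))
    return merges


# Toy sequences chosen so that A resembles C and B resembles D.
toy = {"A": "ACGTACGT", "B": "TTTTGGGG", "C": "ACGTACGA", "D": "TTTAGGGC"}
merges = guide_order(toy)
for left, right in merges:
    print(left, "+", right)
```

For the toy input, A and C are joined first, then B and D, and finally the two pairs, which mirrors the A, B, C, D example in step 3.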
Fig: Steps in Multiple Alignment

Multiple Sequence Alignment Tools
• BLOCKS: database of ungapped multiple alignments (blocks) of conserved protein regions
• CDD: Conserved Domain Database
• InterPro: a unified resource combining PROSITE, PRINTS, ProDom and Pfam
• iProClass database: from the Protein Information Resource
• Pfam: profile HMM library
• ClustalW: general-purpose multiple sequence alignment program
• DIALIGN: local MSA
• MultAlin: multiple sequence alignment with hierarchical clustering
• MSA: Multiple Sequence Alignment
• PileUp: general multiple sequence alignment program
• SAGA and COFFEE: Cedric Notredame's work
ClustalW for multiple alignment
• ClustalW is a general-purpose multiple alignment program for DNA or proteins.
• ClustalW was produced by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory, Germany, and Desmond Higgins of the European Bioinformatics Institute, Cambridge, UK.

• The algorithm is described in the paper cited as "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice".
• ClustalW can create multiple alignments, manipulate existing alignments, do profile analysis and create phylogenetic trees.

• Alignment can be done by 2 methods:

– slow/accurate
– fast/approximate
Running ClustalW
[~]% clustalw

**************************************************************
******** CLUSTAL W (1.7) Multiple Sequence Alignments ********
**************************************************************

1. Sequence Input From Disc

2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees

S. Execute a system command

H. HELP
X. EXIT (leave program)

Your choice:
Running ClustalW

The input file for ClustalW is a file containing all sequences in one of the following formats:

• NBRF/PIR
• EMBL/SwissProt
• Pearson (FASTA)
• GDE
• Clustal
• GCG/MSF
• RSF
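As an illustration of preparing such an input file, the sketch below writes a set of unaligned sequences in Pearson (FASTA) format and reads them back. The helper names `write_fasta` and `read_fasta`, the record names `seq1` and `seq2`, and the 60-character line width are assumptions made for this example.

```python
import os
import tempfile


def write_fasta(records, path, width=60):
    """Write a name -> sequence mapping as a FASTA file, wrapping long lines."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


def read_fasta(path):
    """Read a FASTA file into a name -> sequence mapping."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header line as the record name.
                name = line[1:].split()[0]
                records[name] = ""
            elif name is not None:
                records[name] += line
    return records


# Two ungapped example sequences with hypothetical names.
records = {
    "seq1": "VTISCTGSSSNIGAGNHVKWYQQLPG",
    "seq2": "VTISCTGTSSNIGSITVNWYQQLPG",
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.fasta")
    write_fasta(records, path)
    loaded = read_fasta(path)
print(loaded == records)
```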
Output of ClustalW
CLUSTAL W (1.7) multiple sequence alignment

HSTNFR GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
SYNTNFTRP GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
CFTNFA -------------------------------------------TGTCCAG------ACAG
CATTNFAA GGGAAGAG---CTCCCACATGGCCTGCAACTAATCAACCCTCTGCCCCAG------ACAC
RABTNFM AGGAGGAAGAGTCCCCAAACAACCTCCATCTAGTCAACCCTGTGGCCCAGATGGTCACCC
RNTNFAA AGGAGGAGAAGTTCCCAAATGGGCTCCCTCTCATCAGTTCCATGGCCCAGACCCTCACAC
OATNFA1 GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------ACAC
OATNFAR GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------ACAC
BSPTNFA GGGAAGAGCAGTCCCCAGGTGGCCCCTCCATCAACAGCCCTCTGGTTCAA------ACAC
CEU14683 GGGAAGAGCAATCCCCAACTGGCCTCTCCATCAACAGCCCTCTGGTTCAG------ACCC
** *
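Output in this layout can be read back into whole sequences with a few lines of Python. The sketch below is a simplified reader written for this example, not ClustalW's own code: it assumes a header line starting with "CLUSTAL", rows of the form "name sequence", and conservation lines made only of "*", ":", "." and spaces. The sample text is a toy input with hypothetical names.

```python
def parse_clustal(text):
    """Collect the aligned rows of Clustal-format text into name -> sequence."""
    records = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # blank line or header
        if set(line.strip()) <= set("*:. "):
            continue  # conservation line under a block
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]
        # Rows with the same name in later blocks continue the same sequence.
        records[name] = records.get(name, "") + chunk
    return records


sample = "\n".join([
    "CLUSTAL W (1.7) multiple sequence alignment",
    "",
    "seqA      GGGAAGAG---TTCC",
    "seqB      GGGAAGAGCAGTCCC",
    "          ********   * **",
    "",
    "seqA      CCAG",
    "seqB      CCAG",
    "          ****",
])

parsed = parse_clustal(sample)
for name, seq in parsed.items():
    print(name, seq)
```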
Blocks database and tools
• Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.

• The Blocks web server tools are Block Searcher, Get Blocks and Block Maker. These are aids to the detection and verification of protein sequence homology.

• They compare a protein or DNA sequence to a database of protein blocks, retrieve blocks, and create new blocks, respectively.
The BLOCKS web server
At URL: http://blocks.fhcrc.org/

The BLOCKS WWW server can be used to create blocks from a group of sequences, or to compare a protein sequence to a database of blocks.

The Blocks Searcher tool should be used for multiple alignment of distantly related protein sequences.
The Blocks Searcher tool
• For searching a database of blocks, the first position of the
sequence is aligned with the first position of the first block, and
a score for that amino acid is obtained from the profile column
corresponding to that position. Scores are summed over the
width of the alignment, and then the block is aligned with the
next position.
• This procedure is carried out exhaustively for all positions of
the sequence for all blocks in the database, and the best
alignments between a sequence and entries in the BLOCKS
database are noted. If a particular block scores highly, it is
possible that the sequence is related to the group of sequences
the block represents.
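The sliding comparison described above can be sketched as follows. This is a toy illustration, not the Blocks Searcher code: the block profile is represented as one dictionary of residue scores per column, and the function name `best_block_hit`, the score values and the default score for unlisted residues are all assumptions.

```python
def best_block_hit(sequence, profile, default=-1):
    """Slide a block profile along a sequence and return (best_score, best_start).

    profile is a list with one dict per block column, mapping residue -> score.
    Residues missing from a column receive the default score.
    """
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the block")
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the column scores over the width of the block.
        score = sum(col.get(res, default) for col, res in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# A hypothetical three-column block profile.
toy_profile = [
    {"C": 2, "S": 1},
    {"T": 2, "L": 1},
    {"G": 2},
]
hit = best_block_hit("VTISCTGSS", toy_profile)
print(hit)
```

Searching a database then amounts to repeating this for every block and ranking the blocks by their best score.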
The Blocks Searcher tool
• Typically, a group of proteins has more than one region in common, and their relationship is represented as a series of blocks separated by unaligned regions. If a second block for a group also scores highly in the search, the evidence that the sequence is related to the group is strengthened, and it is further strengthened if a third block also scores highly, and so on.
The BLOCKS Database
The blocks for the BLOCKS database are made
automatically by looking for the most highly
conserved regions in groups of proteins represented
in the PROSITE database. These blocks are then
calibrated against the SWISS-PROT database to
obtain a measure of the chance distribution of
matches. It is these calibrated blocks that make up
the BLOCKS database.
The Block Maker Tool
• Block Maker finds conserved blocks in a group of
two or more unaligned protein sequences, which are
assumed to be related, using two different algorithms.
• Input file must contain at least 2 sequences.
• Input sequences must be in FastA format.
• Results are returned by e-mail.
T-Coffee

• It allows the combination of a collection of multiple/pairwise, global or local alignments into a single model:
• Pairwise global alignment
• Pairwise local alignment
• The two above are combined into a library
• Builds the MSA with the highest consistency with the library of alignments (progressive assembly)
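The library step can be pictured with a small sketch. It assumes, as a simplification, that each pairwise alignment contributes its aligned residue pairs with a weight equal to that alignment's percent identity, and that weights for a pair seen in both the global and the local alignment are summed. The function names, the `start_a`/`start_b` offsets and the toy sequences are hypothetical, and the consistency extension and progressive assembly are not shown.

```python
from collections import Counter


def library_from_alignment(name_a, row_a, name_b, row_b,
                           start_a=0, start_b=0, gap="-"):
    """Map each aligned residue pair to the percent identity of its alignment.

    Keys are ((name_a, i), (name_b, j)) with 0-based positions in the ungapped
    sequences. start_a and start_b give where a local alignment begins.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    i, j = start_a, start_b
    pairs, matches = [], 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.append(((name_a, i), (name_b, j)))
            if x == y:
                matches += 1
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    weight = 100.0 * matches / len(pairs) if pairs else 0.0
    return {pair: weight for pair in pairs}


def merge_libraries(*libraries):
    """Combine libraries, summing the weights of residue pairs seen more than once."""
    merged = Counter()
    for library in libraries:
        for pair, weight in library.items():
            merged[pair] += weight
    return dict(merged)


# Toy global and local alignments of "GARFIELD" and "GARFELD".
global_lib = library_from_alignment("A", "GARFIELD", "B", "GARF-ELD")
local_lib = library_from_alignment("A", "GARF", "B", "GARF")
library = merge_libraries(global_lib, local_lib)
print(len(library), library[(("A", 0), ("B", 0))], library[(("A", 5), ("B", 4))])
```

Pairs supported by both alignments end up with a higher weight than pairs supported by only one, which is the kind of signal a consistency-based method can exploit when it assembles the final alignment.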
Fig: T-Coffee
DiAlign
• It constructs pairwise and multiple alignments by comparing whole segments of the sequences.
• Alignment of whole segments and not individual amino acids (bases)
• Pairwise comparison yields segment pairs (diagonals), which represent local alignments
• Diagonals weighted for likelihood
• Alignment built from consistent diagonals
• No gap penalties
• Independent of sequence order
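To make the idea of a diagonal concrete, the sketch below lists gap-free segment pairs between two sequences. It is a deliberate simplification and not the DIALIGN algorithm: only runs of identical residues are reported, no likelihood weighting is applied, and no consistency filtering is done. The function name `exact_diagonals` and the `min_len` threshold are assumptions.

```python
def exact_diagonals(a, b, min_len=3):
    """Return (start_a, start_b, length) for maximal runs of identical residues.

    Each run lies on one diagonal of the comparison matrix, so it is a
    gap-free segment pair between the two sequences.
    """
    found = []
    # offset = i - j identifies a diagonal of the comparison matrix.
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        run = 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    found.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:
            found.append((i - run, j - run, run))
    return found


diagonals = exact_diagonals("AAWYQQLPGKK", "TTTWYQQLPG")
print(diagonals)
```

Here the shared segment WYQQLPG is found starting at position 2 of the first sequence and position 3 of the second, with no gap penalty involved at any point.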
Fig: DiAlign
