Multiple Sequence Alignment: Submitted To: Dr. Navneet Choudhary
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
Goals of Multiple Sequence Alignment
1. To generate a concise, information-rich summary of sequence data.
Why do we do multiple alignments?
1. Simple sequence comparison
2. Conserved vs. non-conserved regions
   • Proteins: motifs/profiles
3. Evolution: phylogeny
4. Structural homology
5. Sequence differences
   • Single Nucleotide Polymorphisms (SNPs)
Steps in Multiple Alignment
• Build the multiple alignment by first aligning the most similar pair of
sequences, then the next most similar pair, and so on. Once two sequences
have been aligned, that alignment is fixed. Thus, for a set of sequences
A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D
is obtained by comparing the alignment of A and C with that of B and D,
using averaged scores at each aligned position.
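The pairwise-then-merge procedure can be sketched in Python as below. This is only a minimal illustration, not ClustalW's actual algorithm: the match, mismatch and gap scores are arbitrary assumptions, the function names (column_score, align_groups, progressive_align) are hypothetical, and the merge order is chosen greedily by raw alignment score rather than from a guide tree. Earlier alignments stay fixed because gaps are only ever inserted as whole columns.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0  # illustrative scores, not ClustalW defaults


def column_score(col_a, col_b):
    """Average pairwise score between the residues of two alignment columns."""
    total = 0.0
    for x in col_a:
        for y in col_b:
            if x != "-" and y != "-":  # a gap contributes 0 to the average
                total += MATCH if x == y else MISMATCH
    return total / (len(col_a) * len(col_b))


def align_groups(p, q):
    """Globally align two fixed alignments (lists of equal-length rows).

    Returns (score, merged_rows); rows of p come first, then rows of q.
    """
    cols_p, cols_q = list(zip(*p)), list(zip(*q))
    n, m = len(cols_p), len(cols_q)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_p[i - 1], cols_q[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back, inserting whole gap columns so earlier alignments stay fixed.
    gap_p, gap_q = ("-",) * len(p), ("-",) * len(q)
    cols = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + column_score(cols_p[i - 1], cols_q[j - 1])
        ):
            cols.append(cols_p[i - 1] + cols_q[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            cols.append(cols_p[i - 1] + gap_q)
            i -= 1
        else:
            cols.append(gap_p + cols_q[j - 1])
            j -= 1
    cols.reverse()
    return score[n][m], ["".join(row) for row in zip(*cols)]


def progressive_align(seqs):
    """seqs maps name -> sequence; returns name -> aligned row."""
    groups = [([name], [seq]) for name, seq in seqs.items()]
    while len(groups) > 1:
        best = None
        # Merge the two groups that currently align with the highest score.
        for a, b in combinations(range(len(groups)), 2):
            s, rows = align_groups(groups[a][1], groups[b][1])
            if best is None or s > best[0]:
                best = (s, a, b, rows)
        _, a, b, rows = best
        merged = (groups[a][0] + groups[b][0], rows)
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [merged]
    names, rows = groups[0]
    return dict(zip(names, rows))


if __name__ == "__main__":
    demo = {"A": "VTISCTGSSSNIGAG", "B": "LRLSCSSSGFIFSS",
            "C": "VTISCTGTSSNIGS", "D": "LSLTCTVSGTSFDD"}
    for name, row in progressive_align(demo).items():
        print(name, row)
```

Raw alignment score favours longer sequences, so a practical implementation would normalise similarity (for example as percent identity) before choosing which pair to merge.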
Multiple Sequence Alignment Tools
• BLOCKS: database of ungapped multiple alignments (blocks) of conserved protein regions
• CDD: Conserved Domain Database
• InterPro: a unified resource combining PROSITE, PRINTS, ProDom and Pfam
• iProClass database: from the Protein Information Resource
• Pfam: profile HMM library
• ClustalW: general-purpose multiple sequence alignment program
• DIALIGN: local MSA
• MultAlin: multiple sequence alignment with hierarchical clustering
• MSA: Multiple Sequence Alignment
• PileUp: general multiple sequence alignment program
• SAGA and COFFEE: Cedric Notredame's work
ClustalW for multiple alignment
• ClustalW is a general-purpose multiple alignment program for DNA or
proteins.
• ClustalW was produced by Julie D. Thompson and Toby Gibson of the European
Molecular Biology Laboratory, Germany, and Desmond Higgins of the European
Bioinformatics Institute, Cambridge, UK.
**************************************************************
******** CLUSTAL W (1.7) Multiple Sequence Alignments ********
**************************************************************
Your choice:
Running ClustalW
ClustalW accepts input sequences in the following formats:
• NBRF/PIR
• EMBL/SwissProt
• Pearson (Fasta)
• GDE
• Clustal
• GCG/MSF
• RSF
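Of these formats, Pearson (Fasta) is the simplest to produce by hand: each record is a header line starting with ">" followed by one or more sequence lines. The sketch below reads and writes that layout. The function names (parse_fasta, format_fasta) are hypothetical helpers, the 60-character line width is an assumption, and the short sequences in the example are shortened for illustration only.

```python
def parse_fasta(text):
    """Parse Pearson/FASTA text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def format_fasta(records, width=60):
    """Write a dict of name -> sequence as FASTA text, wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {"HSTNFR": "GGGAAGAGTTCCCCAGGG", "CFTNFA": "TGTCCAGACAG"}
    text = format_fasta(example)
    print(text)
    assert parse_fasta(text) == example
```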
Output of ClustalW
CLUSTAL W (1.7) multiple sequence alignment
HSTNFR       GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
SYNTNFTRP    GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
CFTNFA       -------------------------------------------TGTCCAG------ACAG
CATTNFAA     GGGAAGAG---CTCCCACATGGCCTGCAACTAATCAACCCTCTGCCCCAG------ACAC
RABTNFM      AGGAGGAAGAGTCCCCAAACAACCTCCATCTAGTCAACCCTGTGGCCCAGATGGTCACCC
RNTNFAA      AGGAGGAGAAGTTCCCAAATGGGCTCCCTCTCATCAGTTCCATGGCCCAGACCCTCACAC
OATNFA1      GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------ACAC
OATNFAR      GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------ACAC
BSPTNFA      GGGAAGAGCAGTCCCCAGGTGGCCCCTCCATCAACAGCCCTCTGGTTCAA------ACAC
CEU14683     GGGAAGAGCAATCCCCAACTGGCCTCTCCATCAACAGCCCTCTGGTTCAG------ACCC
** *
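In ClustalW output, the "*" characters under an alignment block mark columns in which every sequence has the same residue. A minimal sketch of how such a marker line could be derived from aligned rows follows. The function name conservation_line is hypothetical, and only the "*" marker is reproduced; any other conservation symbols are ignored.

```python
def conservation_line(rows):
    """Return a marker string with '*' under fully conserved, gap-free columns."""
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # Conserved means: exactly one distinct residue and no gap in the column.
        if len(residues) == 1 and "-" not in residues:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    rows = ["ACG-T", "ACGAT", "TCG-T"]
    for row in rows:
        print(row)
    print(conservation_line(rows))
```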
Blocks database and tools
• Blocks are multiply aligned ungapped segments corresponding
to the most highly conserved regions of proteins.
• The Blocks web server tools are Block Searcher, Get Blocks and
Block Maker. They aid the detection and verification
of protein sequence homology.
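To make the idea of an ungapped segment concrete, the sketch below scans an existing multiple alignment for runs of columns that contain no gap in any sequence. It is an illustration of the definition only, not the method used to build the Blocks database; the function name ungapped_blocks and the min_width threshold are assumptions, and no conservation scoring is applied.

```python
def ungapped_blocks(rows, min_width=3):
    """Find runs of gap-free columns in a multiple alignment.

    Returns a list of (start_column, segment_rows) for each run that is at
    least `min_width` columns wide. Columns are 0-indexed.
    """
    blocks = []
    start = None
    width = len(rows[0])
    # Iterate one position past the end so a run reaching the last column closes.
    for pos in range(width + 1):
        gap_free = pos < width and all(row[pos] != "-" for row in rows)
        if gap_free and start is None:
            start = pos
        elif not gap_free and start is not None:
            if pos - start >= min_width:
                blocks.append((start, [row[start:pos] for row in rows]))
            start = None
    return blocks


if __name__ == "__main__":
    alignment = ["ACDE-FGH", "ACDEKFG-", "TCDE-FGH"]
    for start, segment in ungapped_blocks(alignment, min_width=2):
        print("block at column", start)
        for row in segment:
            print("  " + row)
```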