Multiple Sequence Alignment

The document outlines the main criteria for building multiple sequence alignments (MSA), including structural, evolutionary, functional, and sequence similarity. It discusses applications of MSA such as phylogenetic analysis, structure prediction, and PCR analysis, while also providing guidelines for selecting sequences and naming them appropriately. Additionally, it highlights the importance of recognizing conserved patterns in sequences for identifying protein domains and functional sites.

Uploaded by

afsanaakter1492

Bioinformatics

Multiple Sequence Alignment
Muhammad Maqsud Hossain
Main Criteria for Building MSA
• Structural similarity: amino acids that play a similar structural role are placed in the same column
• Evolutionary similarity: amino acids (aa) or nucleotides (nt) that derive from the same aa or nt of a common ancestor are placed in the same column
• Functional similarity: residues with the same function are placed in the same column
• Sequence similarity: for closely related sequences, structural, evolutionary, and functional similarities are equivalent to sequence similarity
Main applications of MSA
• Extrapolation
• Phylogenetic analysis
• Pattern identification
• Domain identification
• DNA regulatory elements
• Structure prediction: a good MSA can give an almost perfect prediction of secondary (2D) structure for proteins and RNA, and can sometimes help in building a 3D model
• PCR analysis: an MSA can help identify the less degenerate portions of a protein family. Good site: blocks.fhcrc.org/codehop.html
Remember
• Important amino acids (or nucleotides)
are not allowed to mutate
• Less important residues change more easily, sometimes randomly and sometimes to adapt a function
Kinds of sequences you’re looking for
• Use proteins whenever possible
• Start with 10-15 sequences and avoid aligning more than 50 sequences (you can use more than 1,000 on a Linux system)
• Sequences that are less than 30 percent identical with more than half of the other sequences in the set often cause trouble
• Identical sequences never help. Avoid sequences that are more than 90 percent identical (unless you have a good reason)
• Use sequences that are roughly the same
length.
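The 90 percent rule above can be checked with a short script. This is a minimal sketch, assuming the sequences have already been aligned to the same length with `-` as the gap character; the function names and the greedy keep-first strategy are illustrative choices, not part of the slides.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two already-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def drop_redundant(aligned: dict[str, str], max_identity: float = 90.0) -> list[str]:
    """Greedily keep names whose identity to every kept sequence is <= max_identity."""
    kept: list[str] = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept
```

The same `percent_identity` function could be used to flag sequences that fall below 30 percent identity with most of the set.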
DNA or Protein?
• If you want to persist in carrying out a
phylogenetic analysis on a set of coding
DNA sequences:
▫ Translate your DNA sequences into
Proteins
▫ Perform multiple sequence alignment on
proteins
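The translation step can be sketched as follows. This assumes the standard genetic code and that every sequence starts in reading frame 1; the `translate` function and the use of `X` for ambiguous codons are illustrative choices.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

The translated proteins can then be passed to the alignment program in place of the DNA sequences.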
Choosing the right number of sequences
• Computing a big alignment is difficult: public servers have limited resources, and your job may take a very long time
• MSA programs are not very good at handling very large sets of sequences
• Displaying a big alignment is difficult: interpretation becomes impossible if the columns are longer than one page
• Tree building and structure prediction programs cannot handle big alignments easily
• Making an accurate big alignment is difficult
MSA programs don’t like
• Sequences that are very different from every other sequence in the group
• Sequences that need long insertions/deletions to be properly aligned.
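A sequence that is much longer or shorter than the rest will need long insertions or deletions, so a quick length check before aligning can flag likely troublemakers. This is a rough sketch: the `length_outliers` function and the 30 percent tolerance are assumptions chosen for illustration, not values from the slides.

```python
from statistics import median


def length_outliers(seqs: dict[str, str], tolerance: float = 0.3) -> list[str]:
    """Names of sequences whose length differs from the median by more than tolerance (a fraction)."""
    mid = median(len(s) for s in seqs.values())
    return [name for name, s in seqs.items() if abs(len(s) - mid) > tolerance * mid]
```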
Naming your sequences the right way
• Never use white spaces in your sequence names
• Do not use special symbols.
• Never use a name longer than 15 characters
• Never give the same name to two different sequences in your set. Some programs accept duplicate names, but most don’t
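The naming rules above can be applied automatically. This is a minimal sketch: the `clean_names` function is hypothetical, and treating only letters, digits, and underscores as safe characters is an assumption, since the slides do not list which symbols are allowed.

```python
import re


def clean_names(names: list[str], max_len: int = 15) -> list[str]:
    """Return names with no spaces or special symbols, at most max_len characters, all unique."""
    seen: set[str] = set()
    cleaned = []
    for name in names:
        # Replace whitespace and any character other than letters, digits, and underscore.
        base = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip()).strip("_") or "seq"
        candidate = base[:max_len]
        counter = 1
        while candidate in seen:
            # Make room for a numeric suffix so the name stays within max_len.
            suffix = f"_{counter}"
            candidate = base[:max_len - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned
```

Keep a record of the mapping from original to cleaned names so the results can be traced back to the source entries.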
Gathering sequences with BLAST
• Characterized sequences: good annotation and experimental information are available
• Uncharacterized sequences: the motivation is to distinguish between the conserved positions that cannot mutate and other, less important columns.
Interpreting MSA
• Still involves some educated guesswork
• DNA alignments are by far the most
difficult to interpret
Recognizing the good parts
• (*) an entirely conserved column
• (:) a column where the residues have roughly the same size and the same hydropathy
• (.) a column where the size or the hydropathy has been preserved in the course of evolution
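The three symbols can be reproduced from an alignment with a short script. This sketch assumes the symbols follow the ClustalW convention, in which `:` and `.` are assigned from fixed "strong" and "weak" residue groups; the group lists below are the ones commonly documented for ClustalW and should be checked against the program you use.

```python
# Residue groups as commonly documented for ClustalW (an assumption, not from the slides).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(aligned: list[str]) -> str:
    """Return one symbol per column: '*', ':', '.', or ' ' for no conservation."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    symbols = []
    for column in zip(*aligned):
        residues = {r.upper() for r in column}
        if "-" in residues:
            symbols.append(" ")  # columns containing gaps are left unmarked
        elif len(residues) == 1:
            symbols.append("*")  # entirely conserved column
        elif any(residues <= set(g) for g in STRONG_GROUPS):
            symbols.append(":")  # all residues fall in one strong group
        elif any(residues <= set(g) for g in WEAK_GROUPS):
            symbols.append(".")  # all residues fall in one weak group
        else:
            symbols.append(" ")
    return "".join(symbols)
```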
Patterns of Conservation
• W, Y, F: It is common to find conserved tryptophans
▫ Tryptophan is a large hydrophobic residue that sits deep in the core of proteins
▫ It plays an important role in stability and is difficult to mutate
▫ When tryptophan mutates, it is usually replaced by another aromatic amino acid such as phenylalanine or tyrosine
▫ Patterns of conserved aromatic amino acids constitute the most common signatures for recognizing protein domains.
• G, P: Glycine and proline
▫ Conserved glycines and prolines often coincide with the extremities of well-structured beta strands or alpha helices
• C: Cysteines are famous for making C-C
(disulfide) bridges
▫ Columns of conserved cysteines with a
specific distance provide a useful signature
for recognizing protein domains and folds
• H,S: Histidine and serine are often
involved in catalytic sites, especially those
of proteases
▫ A conserved histidine or a conserved serine is a good candidate for being part of an active site
• K, R, D, E: These charged amino acids are
often involved in ligand binding
▫ Highly conserved columns can also indicate
a salt bridge inside the core of the protein
• L: Leucines are rarely very conserved unless they’re involved in protein-protein interactions such as a leucine zipper
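The residue notes above can be turned into a simple annotation pass over an alignment. This is a minimal sketch under stated assumptions: the function names are hypothetical, only fully conserved gap-free columns are reported, and the hints are suggestions to investigate rather than predictions.

```python
# Hints paraphrased from the residue notes above; keys are one-letter amino acid codes.
HINTS = {
    "W": "aromatic, often in the protein core",
    "Y": "aromatic, often in the protein core",
    "F": "aromatic, often in the protein core",
    "G": "often at the end of a beta strand or alpha helix",
    "P": "often at the end of a beta strand or alpha helix",
    "C": "possible disulfide bridge",
    "H": "possible catalytic site",
    "S": "possible catalytic site",
    "K": "possible ligand binding or salt bridge",
    "R": "possible ligand binding or salt bridge",
    "D": "possible ligand binding or salt bridge",
    "E": "possible ligand binding or salt bridge",
    "L": "possible protein-protein interaction",
}


def conserved_columns(aligned: list[str]) -> list[tuple[int, str, str]]:
    """List (column index, residue, hint) for every fully conserved, gap-free column."""
    results = []
    for i, column in enumerate(zip(*aligned)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            residue = column[0]
            results.append((i, residue, HINTS.get(residue, "")))
    return results


def cysteine_spacing(aligned: list[str]) -> list[int]:
    """Distances, in alignment columns, between consecutive conserved cysteine columns."""
    cols = [i for i, residue, _ in conserved_columns(aligned) if residue == "C"]
    return [b - a for a, b in zip(cols, cols[1:])]
```

Column indices are zero-based, and the spacing is counted in alignment columns, so it can differ from the residue spacing in any single sequence when gaps fall between the cysteines.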
