Evolutionary Analysis
Fiona Brinkman
Simon Fraser University,
Greater Vancouver, BC, Canada
What do
• BLAST
• Protein motif searching
• Protein threading
• Multiple sequence alignment
Have in common?
Current Topics in Genome Analysis 2005
• Discoveries of fossils
accumulated
– Remains of unknown but
still living species that are
elsewhere on the planet?
– Cuvier (circa 1800): the
deeper the strata, the
less similar fossils were
to existing species
What is evolution?
Characters
• Heritable changes in features (morphology, DNA sequence, etc.) over time
A Unique Character:
Hair for Mammals
• Hair evolved only once and is “unreversed”
• Presence of hair is a strong indication that the
organism is a mammal
Homoplasy:
The formation of tails
• Tails evolved independently in the ancestors
of frogs and humans
• Presence of a tail supports no useful conclusions
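bioinformatics
bioinfortatics
bioinfortatios
oinformatios
informatios
infortation
information
[Figure: variants of the word "bioinformatics" arising by successive changes, arranged along a time axis]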
All share the same deleted sequence region, which is not found
in any other transporter examined to date
Unique character?
Non-unique.
Classification according to
characters – more characters can
be good
Classification according to
characters
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
The sole purpose of multiple sequence alignments is to place
homologous positions of homologous sequences into the same column.
in out!
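To make the "same column" idea concrete, here is a minimal Python sketch that treats the eight aligned sequences above as rows and reads the alignment column by column. The helper names `columns` and `conserved_positions` are hypothetical, chosen only for this illustration, and "conserved" is assumed to mean a gap-free column containing a single residue.

```python
# The eight aligned sequences from the slide; '-' marks a gap.
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def columns(aln):
    """Return the alignment as a list of column strings (hypothetical helper)."""
    if len({len(seq) for seq in aln}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return ["".join(col) for col in zip(*aln)]


def conserved_positions(aln):
    """Zero-based indices of gap-free columns that hold a single residue."""
    return [i for i, col in enumerate(columns(aln))
            if "-" not in col and len(set(col)) == 1]


cols = columns(alignment)
conserved = conserved_positions(alignment)
for i in conserved:
    print(i, cols[i][0])
```

In this alignment only two columns are identical in every sequence: a cysteine (C) at index 4 and a tryptophan (W) at index 20. Each column is the set of positions the alignment claims to be homologous, so a misplaced gap would shift residues into the wrong column and change every downstream comparison.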
• Examine alignment:
– Are you confident that aligned residues/bases evolved
from a common ancestor?
– Are domains of the proteins/predicted secondary
structures, etc. aligning correctly?
A phylogenetic tree
[Figure: tree with tips Human, Mouse, and Fly; one node and one clade are labeled. A second copy of the tree carries the additional labels A, B, C, and D.]
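The terms "node" and "clade" can be illustrated with a small Python sketch. It assumes, for illustration only, a topology in which Human and Mouse share a more recent common ancestor than either shares with Fly. The tree is written as nested tuples, and the helper names `tips` and `clades` are hypothetical.

```python
# Assumed topology for illustration: (Human, Mouse) group together, Fly is outside.
# A string is a tip; a tuple is an internal node holding its children.
tree = (("Human", "Mouse"), "Fly")


def tips(node):
    """Set of tip labels found below (and including) this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """One tip set per internal node: each is a clade (an ancestor plus all descendants)."""
    if isinstance(node, str):
        return []
    found = [tips(node)]
    for child in node:
        found.extend(clades(child))
    return found


for clade in clades(tree):
    print(sorted(clade))
```

Under this assumed topology, Human plus Mouse form a clade, while Mouse plus Fly do not, because no single node has exactly those two tips as its descendants.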
Phylogenetic analysis
• Organismal relationships
• Gene/Protein relationships
Organismal relationships
rRNA genes
Multiple genes
Gene/Protein Relationships
Homologs
…have common ancestry, but the way they are related can vary
(i.e., the reasons they have diverged into different sequences can
vary)
Gene Duplication
Homologs
True or False?
[Figure: Human, Mouse, Fly, Worm]
Example Problem:
• Therefore, the current best hit may be a paralog, with the true ortholog
not yet sequenced
• Assumption:
- Mouse and Human gene datasets are more complete, with more true
orthologs identified
[Figure: tree clade labels, in order: Bunch of Eukaryotes; Two bacteria; Two Eukaryotes; Bunch of Eukaryotes; A bacterium; Bunch of bacteria]
2 Forms in 1 Species
[Figure: gene presence across taxa marked with "+". Labels: "Gene present in common ancestor", "Loss", "Loss", "Gene originates here"]
[Figure label: "Gene lost here"]
Unusual Distribution - Evolutionary Rate Variation?
[Figure: gene presence marked with "+"]
Unusual Distribution -
Incomplete Data
[Figure: gene presence per taxon marked "+/-", "+/-", "+", "+"]
2004: Environmental
genomics sampling takes
centre stage
• Parsimony
• Neighbor-joining
• Maximum Likelihood
Parsimony
• “Shortest-way-from-A-to-B” method
• The tree implying the fewest changes in character states
(the most parsimonious tree) is taken as the best.
• Note:
– May get more than one equally parsimonious tree
– No branch lengths
– Uses all character data
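The "fewest changes" criterion can be sketched in Python with Fitch's small-parsimony algorithm, which counts the minimum number of state changes one character needs on a given rooted binary tree. The four taxa (`sp1` to `sp4`), their sequences, and the function names are made up for this illustration; real programs also search tree space rather than scoring three hand-written candidates.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum changes) for one character.

    node   -- a tip label (str) or a pair (left, right) of subtrees
    states -- dict mapping tip label -> observed state for this character
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:                       # children can agree: no extra change needed
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # disagreement costs one change


def tree_length(tree, characters):
    """Total changes over all characters (columns) on this tree."""
    return sum(fitch(tree, column)[1] for column in characters)


# Hypothetical data: three columns group sp1 with sp2, one column conflicts.
seqs = {"sp1": "AAGG", "sp2": "AAGC", "sp3": "CCTG", "sp4": "CCTC"}
characters = [{name: seq[i] for name, seq in seqs.items()} for i in range(4)]

candidates = [
    (("sp1", "sp2"), ("sp3", "sp4")),
    (("sp1", "sp3"), ("sp2", "sp4")),
    (("sp1", "sp4"), ("sp2", "sp3")),
]
lengths = {tree: tree_length(tree, characters) for tree in candidates}
best = min(lengths, key=lengths.get)
print(lengths, best)
```

On this toy data the three candidate trees need 5, 7, and 8 changes, so the first is the most parsimonious. The last column still requires two changes on that tree, which is a small instance of homoplasy: the same state arising more than once.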
Neighbor-joining
(and other distance matrix methods)
• “speedy-and-popular” method
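• A distance matrix is constructed
• Each distance estimates the total branch length between
a given pair of species/genes/proteins
• Neighbor-joining approach: repeatedly join the pair of
sequences that are close to each other and far from the rest
(not simply the most alike pair), replace the pair with a new
node, and recompute distances to that node until the tree is complete.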
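A minimal Python sketch of the neighbor-joining pairing step follows. It uses the standard Q criterion, which rewards a small distance between two taxa and a large distance from both to everything else. The distance matrix is hypothetical, built from an assumed tree in which A and B are neighbors but B and D sit on long branches. Only the topology is returned; branch lengths are omitted to keep the sketch short.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted topology as nested tuples.

    names -- list of taxon labels
    dist  -- dict mapping frozenset({a, b}) -> distance between a and b
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all others.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
                 for a in nodes}

        def q(pair):
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - total[a] - total[b]

        a, b = min(combinations(nodes, 2), key=q)   # pair with the lowest Q
        new = (a, b)
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))]
                    - d[frozenset((a, b))])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)


# Hypothetical additive distances from a tree ((A:1, B:4):1, (C:1, D:4)).
dist = {
    frozenset(("A", "B")): 5, frozenset(("A", "C")): 3,
    frozenset(("A", "D")): 6, frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 9, frozenset(("C", "D")): 5,
}
tree = neighbor_joining(["A", "B", "C", "D"], dist)
print(tree)
```

In this matrix the smallest distance is between A and C, yet the method joins A with B, matching the tree the distances were generated from. This is why the criterion is not simply "most alike first": it corrects for lineages that evolve at different rates.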
Maximum Likelihood
• “Inside-out” approach
• Proposes trees and then asks how probable the observed
data are under each tree.
• Gives an estimate of the likelihood of a
particular tree, given a certain model of
nucleotide substitution.
• Notes:
– All sequence info (including gaps) is used
– Based on a specific model of evolution; gives a
probability
– Very slow (unless the topology of the tree is known)
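As a hedged illustration of "likelihood given a model of nucleotide substitution", the Python sketch below evaluates the simplest possible case: two aligned, gap-free sequences joined by a single branch under the Jukes-Cantor (JC69) model. The choice of JC69, the sequences, and the grid search are assumptions made for this example. A full maximum likelihood analysis evaluates likelihoods of this kind over whole trees, which is where the cost comes from.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by branch length d.

    d is the expected number of substitutions per site (must be > 0).
    Under JC69 all bases have frequency 1/4.
    """
    same = sum(a == b for a, b in zip(seq1, seq2))
    diff = len(seq1) - same
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay     # P(base unchanged after d)
    p_diff = 0.25 - 0.25 * decay     # P(change to one specific other base)
    return same * math.log(0.25 * p_same) + diff * math.log(0.25 * p_diff)


# Hypothetical sequences: 20 sites, 4 of which differ.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "ACGAACGTGCGTATGTACCT"

# Try many branch lengths and keep the one that makes the data most probable.
grid = [i / 1000 for i in range(1, 2001)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))

# Closed-form JC69 estimate for comparison: d = -3/4 * ln(1 - 4p/3).
p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
analytic_d = -0.75 * math.log(1 - 4 * p / 3)
print(best_d, analytic_d)
```

The grid search and the closed-form estimate should agree to within the grid spacing. The point of the sketch is the direction of the question: the tree (here, one branch length) is proposed first, and the data are then scored against it.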
Bootstrapping
The number of times a
particular branch is formed
in the tree (out of the X
times the analysis is repeated,
each time on alignment columns
resampled with replacement)
can be used to estimate confidence
in that branch (its bootstrap support),
which can be indicated on a consensus tree
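The resampling idea can be sketched in Python for the smallest interesting case, four taxa, where an unrooted tree has a single internal branch and only three possible groupings. Alignment columns are drawn with replacement, the grouping is re-inferred on each replicate, and the fraction of replicates recovering each grouping is reported. The sequences, the function names, and the simple distance-based grouping rule (standing in for a full tree-building method) are all assumptions for this illustration.

```python
import random


def p_distance(s1, s2):
    """Fraction of aligned sites that differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def best_split(seqs):
    """For four taxa, pick the pairing with the smallest summed within-pair distance."""
    a, b, c, d = sorted(seqs)
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def score(pairing):
        (w, x), (y, z) = pairing
        return p_distance(seqs[w], seqs[x]) + p_distance(seqs[y], seqs[z])

    return min(pairings, key=score)


def bootstrap_support(seqs, replicates=200, seed=0):
    """Fraction of bootstrap replicates that recover each grouping."""
    rng = random.Random(seed)
    n = len(next(iter(seqs.values())))
    counts = {}
    for _ in range(replicates):
        picks = [rng.randrange(n) for _ in range(n)]   # columns, with replacement
        resampled = {name: "".join(s[i] for i in picks)
                     for name, s in seqs.items()}
        split = best_split(resampled)
        counts[split] = counts.get(split, 0) + 1
    return {split: k / replicates for split, k in counts.items()}


# Hypothetical alignment: most columns group sp1 with sp2, one column conflicts.
seqs = {
    "sp1": "AAGGTTCCAG",
    "sp2": "AAGGTTCCAC",
    "sp3": "CCTTAAGGAG",
    "sp4": "CCTTAAGGAC",
}
support = bootstrap_support(seqs)
print(support)
```

With data this one-sided, nearly every replicate recovers the same grouping. On real alignments, branches supported by only a few columns appear in far fewer replicates, which is what the value on a consensus tree is meant to flag.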
Parametric Bootstrapping
PHYLIP
http://evolution.genetics.washington.edu/phylip.html
PAUP
http://paup.csit.fsu.edu/
MEGA 2.1
www.megasoftware.net/
TREEVIEW
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
Challenges
How do we classify?
Computational Challenges
More Challenges
Remember:
Evolutionary theory is evolving…