Introduction To Molecular Evolution: Mike Thomas October 3, 2002

The document provides an introduction to molecular evolution and phylogenetic analysis. It discusses how multiple sequence alignments can be used to reconstruct the evolutionary history of genes and learn about gene structure, function and evolution. It also explains some key assumptions of phylogenetic methods including accurate sequences, homology and sufficient sampling. The document then discusses how phylogenetic trees are used to represent evolutionary relationships and analyze events like gene duplication and loss. It introduces different search strategies and optimality criteria used to evaluate phylogenetic trees, including maximum parsimony and minimum evolution. It also notes some issues with different approaches.

Introduction to

Molecular Evolution

Mike Thomas
October 3, 2002
What we can learn from multiple sequence alignments

• An alignment is a hypothesis about the relatedness of a set of genes

• This information can be used to reconstruct the evolutionary history of those
genes
• The history of the genes can provide us with information about the structure,
function, and significance of a gene or family of genes
• We can also use the reconstructed history to test hypotheses about evolution
itself:
– Rates of change
– The degree of change
– Implications of change, etc
• We can then pose and test hypotheses about the evolution of phenomena
unrelated to the genes
– Evolution of flight in insects
– Evolution of humans
– Evolution of disease
Assumptions made by phylogenetic methods:

• The sequences are correct

• The sequences are homologous
• Each position is homologous
• The sampling of taxa or genes is sufficient to resolve the
problem of interest
• Sequence variation is representative of the broader group of
interest
• Sequence variation contains sufficient phylogenetic signal (as
opposed to noise) to resolve the problem of interest
• Each position in the sequence evolved independently
How do you extract this information from an alignment?
Answer: a tree

[Figure: Haeckel’s Tree of Life, labeled with “higher” and “lower” organisms]

A phylogenetic tree is a hierarchical, graphical representation of relationships
Other Ways to Represent Phylogenies

(a) Cladogram showing the phylogenetic relationships between four species.

(b) Relationships of the same four species represented as a set of nested parentheses.

(c) Evolutionary relationships of the same four species with nine synapomorphies (shared, derived characters) plotted on the branches.
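
The nested-parentheses notation in (b) is easy to read by machine. The following is a minimal sketch only: the taxon names A to D, the example string, and the function name `parse_nested` are illustrative assumptions, and branch lengths and internal labels are not handled.

```python
def parse_nested(text):
    """Parse a nested-parentheses tree such as "((A,B),(C,D));" into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0  # index of the next unread character

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
            return tuple(children)
        # Otherwise read a leaf name up to the next delimiter
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()


four_species = parse_nested("((A,B),(C,D));")
print(four_species)  # (('A', 'B'), ('C', 'D'))
```

Each pair of parentheses becomes one internal node, so the grouping of species can be read directly from the nesting of the tuples.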
Using Phylogeny to Understand
Gene Duplication and Loss

A. A gene tree.
B. The gene tree superimposed on a species tree, allowing
identification of the duplication and loss events.
Problems with Phylogenetic Inference

1. How do we know what the potential candidate trees are?
2. How do we choose which tree is (most likely) the true tree?
Number of Possible Trees

Number of taxa or genes    Number of possible rooted trees
3                          3
4                          15
5                          105
7                          10,395

[Figure: alternative tree arrangements of taxa A, B, and C]
Recipe for reconstructing a phylogeny

1. Select an optimality criterion

2. Select a search strategy
3. Use the selected search strategy
to generate a series of trees, and
apply the selected optimality
criterion to each tree, always
keeping track of the “best” tree
examined thus far
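
The recipe amounts to a loop over candidate trees that remembers the best score seen so far. This is a generic sketch, not any particular program's search: the function name `find_best_tree`, the `lower_is_better` flag, and the toy scores are assumptions made for illustration.

```python
def find_best_tree(candidate_trees, score, lower_is_better=True):
    """Apply an optimality criterion to each candidate and keep the best one."""
    best_tree, best_score = None, None
    for tree in candidate_trees:          # trees produced by the search strategy
        s = score(tree)                   # the optimality criterion
        if best_score is None:
            is_better = True
        elif lower_is_better:
            is_better = s < best_score
        else:
            is_better = s > best_score
        if is_better:
            best_tree, best_score = tree, s
    return best_tree, best_score


# Hypothetical tree lengths for three candidate trees
toy_lengths = {"tree_1": 12, "tree_2": 6, "tree_3": 9}
print(find_best_tree(toy_lengths, toy_lengths.get))  # ('tree_2', 6)
```

Criteria that minimize a tree length use the default; a criterion that maximizes a score would pass `lower_is_better=False`.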
Search strategy: Which is the right tree?

• When m is the number of taxa, the number of possible rooted trees is:
– $\frac{(2m-3)!}{2^{m-2}\,(m-2)!}$
– For 10 taxa, the number of trees is 34,459,425
• Many trees can be discarded because they are
obviously wrong
• Sometimes, there is a general or even specific
grouping that can serve as a start for the tree search
• There are a number of approaches to tree searches
that can be used
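
The tree-count formula can be checked numerically. This short sketch evaluates it for the taxon counts quoted in these slides; the function name `num_rooted_trees` is an assumption.

```python
from math import factorial


def num_rooted_trees(m: int) -> int:
    """Number of rooted, bifurcating trees for m labeled taxa (m >= 2)."""
    if m < 2:
        raise ValueError("need at least two taxa")
    # (2m-3)! / (2^(m-2) * (m-2)!); the division is always exact
    return factorial(2 * m - 3) // (2 ** (m - 2) * factorial(m - 2))


for m in (3, 4, 5, 7, 10):
    print(m, num_rooted_trees(m))
# 3 3
# 4 15
# 5 105
# 7 10395
# 10 34459425
```

The count grows faster than exponentially, which is why exhaustive searches are only practical for small numbers of taxa.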
Search Strategies

Strategy                      Type
• Stepwise addition           Algorithmic
• Star decomposition          Algorithmic
• Exhaustive                  Exact
• Branch & bound              Exact
• Branch swapping             Heuristic
• Genetic algorithm           Heuristic
• Markov Chain Monte Carlo    Heuristic

But we still need to evaluate the trees in order to identify the one most likely to be the true tree
Choose an optimality criterion to evaluate trees

Commonalities can be found, but how can these be used to evaluate a tree?
General differences between optimality criteria

Minimum evolution
– Model based
– Can account for many types of sequence substitutions
– Works well with strong or weak sequence similarity
– Computationally fast
– Well understood statistical properties (easy to test)
– Can accurately estimate branch lengths (important for molecular clocks)

Maximum parsimony
– “Model free”
– Assumes that all substitutions are equal
– Works only when sequence similarity is high
– Computationally fast
– Poorly understood statistical properties (hard to test)
– Cannot estimate branch lengths accurately

Maximum likelihood
– Model based
– Can account for many types of sequence substitutions
– Works well with strong or weak sequence similarity
– Computationally slow
– Well understood statistical properties (easy to test)
– Can estimate branch lengths with some degree of accuracy
Maximum Parsimony

1. The parsimony score is the minimum number of required changes, or steps
2. Only shared, derived characters are used
3. The score for each character (site) is called the character score
4. The character scores summed over all sites give the tree length
5. The tree (out of all examined trees) with
the lowest tree length is the most
parsimonious tree… and most likely to
be the true tree
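
For sequence data, the character score of one site on a fixed tree can be computed with Fitch's small-parsimony algorithm, which is one standard way to count the minimum number of changes; the slides do not say which method was used. In this sketch the tree encoding (nested 2-tuples), the toy alignment, and the function names are assumptions.

```python
def fitch_site(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping each leaf name to its state at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed
        return shared, left_changes + right_changes
    # The children disagree: one change is required on this part of the tree
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Sum the character scores over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical three-site alignment of four taxa
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, alignment), tree_length(tree_2, alignment))  # 3 5
```

Here `tree_1` is the more parsimonious of the two. The third site, where only D differs, costs one step on both trees, so it does not help to choose between them.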
Example: Maximum Parsimony

[Figure: two alternative trees for taxa F, G, H, and X, with morphological character states (teeth, toes, ribs, verts, lobe shape, leg length) mapped onto the branches. Tree length: 6 steps for the first tree, 12 steps for the second.]
Simple example of parsimony with sequence data
Another example with nucleotide data

(a) Alignment of four

hypothetical DNA
sequences.

(b) Most parsimonious rooted

cladogram for this
alignment.

(c) Corresponding unrooted

cladogram.
Issues & problems with parsimony

• Multiple trees may be the most parsimonious (have the same tree length)
– A consensus tree can be constructed to visualize
the congruity & discontinuity between these
• Branch lengths (and, therefore, rates of
change) cannot be accurately estimated
• No explicit model of change is used, even
when one might be well supported
– The most parsimonious tree(s) may not be the
true tree
Minimum Evolution (Distance)

1. All data are used, even though some may not be shared, derived characters
2. The branch lengths represent distance between a taxon and an ancestor, given an assumed model of evolution
3. The pairwise distances are calculated for each pair of taxa, given an assumed model of evolution
4. The tree length is the sum of branch lengths across a tree
5. The tree (out of all examined trees) with the
lowest tree length is the minimum evolution
tree… and most likely to be the true tree
The tree is different than a parsimony tree

(a) Hypothetical evolutionary

relationships between three DNA
sequences, in which the
horizontal branch lengths are
proportional to the number of
character-state changes along the
branches.
(b) Topology of the parsimonious
cladogram that would be
constructed from the sequence
similarities produced by such an
evolutionary history if multiple
substitutions had occurred at
several sites.
Models of evolution: choosing parameters
Factors that Affect Phylogenetic Inference
1. Relative base frequencies (A,G,T,C)
2. Transition/transversion ratio
3. Number of substitutions per site
4. Number of nucleotides (or amino acids) in sequence
5. Different rates in different parts of the molecule
6. Synonymous/non-synonymous substitution ratio
7. Substitutions that are uninformative or obfuscatory
– Parallel substitutions
– Convergent substitutions
– Back substitutions
– Coincidental substitutions

In general, the more factors that are accounted for by the model (i.e., more parameters), the larger the error of estimation. It is often best to use fewer parameters by choosing the simpler model.
Some distance models: p-distance

• p = n_d/n, where n is the number of sites (nucleotides or amino acids), and n_d is the number of differences between the two sequences examined.
• Very robust when divergence times are recent and the effect of complicating phenomena is minor
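
The p-distance is a simple count of mismatches divided by the number of sites. This is a minimal sketch assuming two already-aligned sequences of equal length with no gaps or ambiguity codes; the function name and the example sequences are made up.

```python
def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2) or len(seq1) == 0:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    n = len(seq1)                                          # number of sites
    n_d = sum(1 for a, b in zip(seq1, seq2) if a != b)     # number of differences
    return n_d / n


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

The two example sequences differ at 2 of 10 sites, giving p = 0.2.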
Some distance models: Jukes-Cantor

• Used to estimate the number of substitutions per site
• The expected number of substitutions per site is $d = 3\alpha t = -\frac{3}{4}\ln\left[1 - \frac{4}{3}p\right]$, where p is the proportion of differences between the 2 sequences
• Variance can be calculated
• No corrections are made for unequal nucleotide frequencies or differential substitution rates: every substitution is assumed to occur at the same rate α

Substitution rate matrix:
     A   T   C   G
A    -   α   α   α
T    α   -   α   α
C    α   α   -   α
G    α   α   α   -
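
The Jukes-Cantor correction takes only the observed proportion of differences p as input. In this sketch the function name is an assumption, and the range check reflects the fact that the logarithm is undefined once p reaches 3/4.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - (4 / 3) * p)


d = jukes_cantor_distance(0.1)
print(round(d, 4))  # 0.1073
```

The corrected distance is slightly larger than p because some sites are expected to have changed more than once.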
Some distance models: Kimura two-parameter

• Used to estimate the number of substitutions per site
• $d = 2rt$, where r is the substitution rate (per site, per year) and t is the time since the two sequences diverged; $r = \alpha + 2\beta$, so:
• $d = 2\alpha t + 4\beta t$
• Accounts for different transition and transversion rates
– α = transition rate
– β = transversion rate
– These are treated the same for long divergence times.
• No corrections are made for unequal nucleotide frequencies; variance is greater than for Jukes-Cantor

[Figure: substitution scheme between the pyrimidines (C, T) and the purines (A, G), with arrows labeled by rate]
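
In practice the Kimura two-parameter distance is estimated from the observed proportions of transitions (P) and transversions (Q) using the standard estimator $d = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)$. That estimator is not shown on the slide and is brought in here as background. This sketch assumes aligned, upper-case DNA with no gaps or ambiguity codes; the function name and example sequences are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or len(seq1) == 0:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1    # purine <-> pyrimidine
    P = transitions / len(seq1)
    Q = transversions / len(seq1)
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


seq_a = "A" * 10 + "C" * 10
seq_b = "GG" + "A" * 8 + "C" * 9 + "A"   # 2 transitions, 1 transversion
print(round(k2p_distance(seq_a, seq_b), 4))  # 0.1702
```

With 2 transitions and 1 transversion over 20 sites, P = 0.1 and Q = 0.05, which gives a distance of about 0.17 compared with a p-distance of 0.15.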
Other models
• Hasegawa, Kishino, Yano (HKY): takes unequal nucleotide frequencies and transition/transversion bias into account
• Unrestricted model: allows different rates between
all pairs of nucleotides
• General Time Reversible model: allows different
rates between all pairs of nucleotides and corrects
for unequal nucleotide frequencies
• Many other models have been invented to correct
for specific problems
• The more parameters are introduced, the larger the
variance becomes
Ways to build trees with distance models: ME

• Minimum Evolution (ME) trees can be found by exhaustive searches or heuristic searches (starting with a reasonable tree or eliminating unlikely possible trees)
• For each tree examined, the total tree length is calculated as the sum of branch lengths calculated using a given model
• ME, like Maximum Parsimony, may generate a number of equal-scoring ME trees and may not actually result in the true tree
Ways to build trees with distance models: UPGMA

• UPGMA (unweighted pair-group method using arithmetic averages)
• Generally accurate for molecular evolution when substitution rates are relatively constant, but this can rarely be assumed to be true
• Method:
– Distances for each pair of taxa are computed using the chosen distance method
– The pair with the smallest value d is combined into a single, composite taxon
– The distances from this composite taxon to all other taxa are
computed
– The next pair with the smallest d is chosen (including
consideration of pairings with the composite taxon)
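
The UPGMA steps above can be sketched as a short clustering loop. The data layout (a list of names plus a symmetric distance matrix), the function name, and the toy distances are assumptions; only the tree topology is returned, not branch lengths.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster taxa by UPGMA and return the tree as nested 2-tuples of names."""
    # Active clusters: id -> (subtree, number of taxa in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Combine the pair with the smallest distance d into a composite taxon
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the composite taxon to each remaining cluster is the
        # size-weighted mean, i.e. the average over all original taxon pairs
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical pairwise distances for five taxa
taxa = ["a", "b", "c", "d", "e"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(upgma(taxa, distances))  # (('a', 'b'), ('c', ('d', 'e')))
```

Taxa d and e are joined first because they have the smallest distance (3), then a and b (5), then c joins the d and e composite (7.5).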
Ways to build trees with distance models: Neighbor
Joining
• Neighbor Joining (NJ) is a very robust method that is accurate
even when substitution rates are not constant, and generally
recovers the ME tree (although this is not always the case)
• Method:
– We construct a “star” tree and compute the sum of all branches, $S_0$ (this will be greater than the sum of all branches for the final tree, $S_F$)
– We then pick a pair of taxa to be “neighbors” (say, taxa 1 & 2) and compute the sum of all branches, $S_{1,2}$
– All other pairs of taxa are then placed as neighbors and the sum of all
branches computed
– The neighbors whose pairing results in the greatest reduction in the
sum of all branches will be kept
– Then, another round of neighbor joining is conducted, including using
the neighbor pair retained in the first round
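
One round of neighbor selection can be sketched as follows. Rather than computing each sum of branches directly, this sketch uses the commonly used equivalent criterion $Q_{ij} = (n-2)\,d_{ij} - \sum_k d_{ik} - \sum_k d_{jk}$; the pair with the smallest Q is the pair whose joining gives the smallest sum of branches. The Q form is not on the slide, and the function name and toy distances are assumptions.

```python
def pick_neighbors(names, matrix):
    """Return the pair of taxa to join first, and its Q value."""
    n = len(names)
    row_sums = [sum(row) for row in matrix]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            # Q corrects the raw distance for how far i and j are from everything else
            q = (n - 2) * matrix[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


# Hypothetical pairwise distances for five taxa
nj_taxa = ["a", "b", "c", "d", "e"]
nj_distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(pick_neighbors(nj_taxa, nj_distances))  # (('a', 'b'), -50)
```

In this example a and b are chosen as neighbors even though d and e have the smallest raw distance (3), which illustrates how neighbor joining allows for unequal rates along different branches.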
Example: The evolution of flight in stoneflies

• Reconstruction of the Plecoptera order (stoneflies) from 18S rRNA sequence
• Kimura 2-parameter distance used
• Tree rooted with known outgroup species
• Neighbor-Joining tree building method used to construct first tree; tree search was conducted to ensure that the NJ tree was also the ME tree
• Characters related to flight were then mapped onto the tree

[Figure: stonefly tree, with the defined outgroup taxa marked and a scale bar in substitutions/site]
Maximum Likelihood

1. The site likelihoods represent the probability of the data for one site given an assumed model of evolution
2. Overall likelihood is the product of the site likelihoods
3. Trees are evaluated by comparing log-likelihood scores
4. Likelihood scores are comparable across models as well as trees, so they provide a way of testing the goodness of fit of a model
5. The tree (out of all examined trees) with the highest likelihood score is the maximum likelihood tree… and most likely to be the true tree
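
The likelihood calculation can be illustrated in the simplest possible setting: two sequences under the Jukes-Cantor model, where the only unknown is the distance d between them. This is a sketch under those assumptions, not a full tree search; the function name, the grid of candidate distances, and the counts (100 sites, 10 differences) are made up.

```python
import math


def jc_pair_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences under Jukes-Cantor at distance d > 0."""
    e = math.exp(-4.0 * d / 3.0)
    site_same = 0.25 * (0.25 + 0.75 * e)   # site likelihood when the bases match
    site_diff = 0.25 * (0.25 - 0.25 * e)   # site likelihood for one specific mismatch
    # The product of site likelihoods becomes a sum of logs
    return (n_sites - n_diff) * math.log(site_same) + n_diff * math.log(site_diff)


# Evaluate candidate distances and keep the one with the highest log-likelihood
grid = [i / 1000 for i in range(1, 1001)]
best_d = max(grid, key=lambda d: jc_pair_log_likelihood(d, 100, 10))

# Analytic Jukes-Cantor distance for p = 10/100, for comparison
analytic_d = -0.75 * math.log(1 - (4 / 3) * 0.1)
print(best_d, round(analytic_d, 4))
```

The distance that maximizes the log-likelihood agrees with the Jukes-Cantor formula to within the grid spacing, which shows the same logic used to compare trees: compute the probability of the data under each candidate and keep the highest.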
Next Tuesday:

1. Examples of phylogenetic reconstructions

2. Uses of phylogenetic trees
3. Other research using molecular evolution

Next Thursday: exam 1

All material through next Tuesday
(10/8) will be covered by the exam
