30/5/2023
SIJ1004
Class 16
Introduction to Molecular Phylogenetics
Molecular sequence evolution
• Molecular sequences evolve by the accumulation of mutations.
• The amount of difference between two sequences should indicate their distance from the shared common ancestor.
• The closer the common ancestor (divergence in the recent past), the fewer the differences.
• This means that by comparing three or more related molecular sequences, the evolutionary relationships between them can be inferred.
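As a minimal sketch of this idea, the proportion of differing sites between aligned sequences (often called the p-distance) can be computed directly. The three short sequences and the function name `p_distance` are made up for illustration, and gaps are not treated specially.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical aligned sequences (illustration only, not real data).
sequences = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACTTGT",
}

names = sorted(sequences)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(sequences[first], sequences[second])
        print(first, second, d)
```

In this toy data A and B differ at the fewest sites, so they would be read as sharing the most recent common ancestor, with C diverging earlier.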
Phylogenetic basics
• Phylogenetics: the study of the evolutionary history of organisms.
• Traditionally based on fossil data, but more recently on molecular data (evolution at the molecular level).
• Evolution is a process of mutation with selection.
• Multiple sequence alignment (MSA) reveals similarities and divergence among related biological sequences.
• Phylogenetic trees rationalize and visualize the details from the MSA.
• Molecular phylogenetics is a fundamental aspect of bioinformatics.
• Advantages of molecular phylogenetics:
• Molecular data are more numerous than fossils
• No sampling bias involved
• More robust phylogenetic trees can be constructed
Which species are the closest living relatives of modern humans?
Pre-molecular tree (gorillas, chimpanzees, bonobos, orangutans, humans; time axis 15-30 to 0 MYA): The pre-molecular view was that the great apes (chimpanzees, gorillas and orangutans) formed a clade separate from humans, and that humans diverged from the apes at least 15-30 MYA.
Molecular tree (humans, chimpanzees, bonobos, gorillas, orangutans; time axis 14 to 0 MYA): Molecular data show that bonobos and chimpanzees are related more closely to humans than either are to gorillas.
Molecular phylogeny: nomenclature of trees
• Molecular phylogeny uses trees to depict evolutionary relationships
among organisms.
• Molecular phylogeny trees are based upon DNA and protein sequence data.
• There are two main kinds of information inherent in any tree:
• Topology
• Branch lengths.
Types of trees: unrooted vs rooted
• A rooted phylogenetic tree is a tree with a unique root node corresponding to the most recent common ancestor of all the entities at the leaves (aka tips) of the tree. In a fully resolved rooted tree, every internal node splits into exactly two descendants (a binary tree).
• Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about common ancestry. In a fully resolved unrooted tree, every internal node has three edges and every leaf has one.
Unrooted vs rooted
• Unrooted tree:
• No knowledge of common ancestor
• Relative relationships
• No evolutionary direction
• The root of the tree is not known (the common ancestor is already extinct).
• In practice, there is a need to define the root of a tree.
• Ways to define the root of a tree:
• Outgroup (a distantly related taxon; e.g. a bird for a mammal tree).
• Midpoint rooting (the root is placed at the midpoint between the two most divergent groups): divergence from root to tips is equal for both branches, following the “molecular clock” hypothesis.
• Molecular clock:
• Molecular sequences evolve at constant rates.
• The amount of accumulated mutations is proportional to evolutionary time.
• Branch lengths on a tree can be used to estimate divergence time.
• Assumes uniformity of evolutionary rates (rarely true in reality).
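To illustrate how a strict molecular clock turns a distance into a time: if two sequences differ by d substitutions per site and each lineage accumulates r substitutions per site per year, then d = 2rt (both lineages have been changing since the split), so t = d / (2r). The sketch below assumes this simple relation; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def divergence_time(distance, rate):
    """Years since two lineages split, assuming a strict molecular clock.

    distance: substitutions per site between the two sequences
    rate:     substitutions per site per year along ONE lineage (assumed constant)
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both lineages accumulate changes after the split, hence the factor 2.
    return distance / (2.0 * rate)


# Hypothetical values: 2% sequence divergence, 1e-9 substitutions/site/year.
years = divergence_time(0.02, 1e-9)
print(round(years / 1e6, 2), "million years")
```

If rates differ between lineages, as the last bullet warns, this estimate can be badly off.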
Terminology
Figure: example tree annotated with the terms clade, monophyletic, taxon/taxa (plural), node, branch, dichotomy, polytomy, lineage and root node.
• Taxa: also called Operational Taxonomic Units (OTUs).
• Node: the point where branches join.
• Dichotomy: a node that bifurcates to give two descendants.
• Polytomy: a multifurcating node.
• Lineage: a path depicting an ancestor-descendant relationship.
• Clade: a group of taxa descended from a single common ancestor.
• Monophyletic group: two taxa that share a unique common ancestor not shared by any other taxa (sister taxa to each other).
• Tree topology: the branching pattern.
Examples of clades
Lindblad-Toh et al., Nature
438: 803 (2005), fig. 10
Examples of multifurcation: failure to resolve the branching order
of some metazoans and protostomes
Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations
Compressed in Time, Science 310:1933 (2005).
Dendrogram, cladogram, phylogram
• Dendrogram is the ‘generic’ term applied to any type of diagrammatic representation of
phylogenetic trees. All four trees depicted here are dendrograms.
• Cladogram (to some biologists) is a tree in which branch lengths DO NOT represent evolutionary
time; clades just represent a hypothesis about actual evolutionary history (Not scaled).
TREE1 and TREE2 are cladograms and TREE1 = TREE2
• Phylogram (to some biologists) is a tree in which branch lengths DO represent evolutionary time;
clades represent true evolutionary history (amount of character change) (Scaled).
TREE3 and TREE4 are phylograms and TREE3 ≠ TREE4
Newick format
• Used to provide tree topology information to computer programs.
• Cladogram (topology only): (((B,C),A),(D,E));
• Phylogram (with branch lengths): (((B:1,C:2),A:2),(D:1.2,E:2.4));
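To show how a program might read such a string, here is a minimal sketch of a recursive Newick parser. It is an illustration, not a full implementation: it handles nested parentheses, plain labels and optional `:length` values, but not quoted labels or comments. The names `parse_newick` and `leaf_names` and the dict layout are assumptions made for this example.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                else:
                    raise ValueError("unexpected character: " + text[pos])
        # Optional label (empty for unnamed internal nodes).
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return root


def leaf_names(node):
    """Leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


cladogram = parse_newick("(((B,C),A),(D,E));")
phylogram = parse_newick("(((B:1,C:2),A:2),(D:1.2,E:2.4));")
print(leaf_names(cladogram))
print(phylogram["children"][1]["children"][1])  # the leaf E with its branch length
```

Both strings give the same topology; only the phylogram carries branch lengths, which appear here as the `length` field.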
Finding a tree may be difficult
• The search for a correct tree topology can sometimes be extremely difficult
and computationally demanding.
• Number of possible tree topologies is a function of the number of taxa.
• Unrooted trees:
$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
• Rooted trees:
$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
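To get a feel for how quickly the search space grows, the two formulas can be evaluated directly, reading them as (2n-5)! / (2^(n-3) (n-3)!) for unrooted trees and (2n-3)! / (2^(n-2) (n-2)!) for rooted trees, where n is the number of taxa. The function names below are chosen for this sketch.

```python
from math import factorial


def count_unrooted(n):
    """Number of fully resolved unrooted tree topologies for n >= 3 taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def count_rooted(n):
    """Number of fully resolved rooted tree topologies for n >= 2 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 10, 20):
    print(n, count_unrooted(n), count_rooted(n))
```

Four taxa give 3 unrooted and 15 rooted topologies, while ten taxa already give 2,027,025 and 34,459,425, which is why an exhaustive search soon becomes impractical.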
Tree-building methods
• Two main types of tree-building methods:
• Distance-based methods:
• Distance metric: such as the number of amino acid changes
between the sequences, or a distance score.
• UPGMA
• Neighbor-joining (NJ).
• Character-based methods:
• Maximum parsimony: the search for the tree with the fewest
amino acid (or nucleotide) changes that account for the
observed differences between taxa.
• Maximum likelihood.
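To make the parsimony criterion concrete, the sketch below counts the minimum number of changes a given tree needs to explain an alignment. It uses Fitch's algorithm, which is one standard way of doing this count and is not named on the slides. Trees are written as nested 2-tuples of taxon names; the alignment and the second topology are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree:   a taxon name (str) or a 2-tuple of subtrees (rooted binary tree)
    states: dict mapping taxon name -> character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum of minimum changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical three-site alignment for five taxa (illustration only).
alignment = {"A": "AAG", "B": "AAG", "C": "AAT", "D": "GCT", "E": "GCT"}

tree_1 = ((("B", "C"), "A"), ("D", "E"))
tree_2 = ((("B", "D"), "A"), ("C", "E"))

print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

Under maximum parsimony the topology with the lower score is preferred. A real search would compare many candidate topologies rather than two.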
Multiple Sequence Alignment: distance-based tree
Calculate the pairwise alignments; if two sequences are related, put them next to each other on the tree.
Multiple Sequence Alignment: character-based tree
Identify positions that best describe how characters (amino acids) are derived from common ancestors.
Classification of phylogenetic building methods
                COMPUTATIONAL METHOD
DATA TYPE       Optimality criterion              Clustering algorithm
Characters      PARSIMONY, MAXIMUM LIKELIHOOD
Distances       MINIMUM EVOLUTION                 UPGMA, NEIGHBOR-JOINING
Tree-building methods
[1] distance-based
[2] character-based: maximum parsimony
[3] character- and model-based: maximum likelihood
Tree-building methods: UPGMA
UPGMA:
• Unweighted pair group method using arithmetic mean.
• Distance-based method.
Step 1: Compute the pairwise distances of all the proteins.
Step 2: Find the two proteins with the smallest pairwise distance. Cluster them.
Figure: proteins 1 and 2 are joined at a new node 6.
Step 3: Find the next two proteins with the smallest pairwise distance. Cluster them.
Figure: proteins 4 and 5 are joined at a new node 7.
Step 4: Keep going. Cluster.
Figure: the tree now has internal nodes 6, 7 and 8 above the leaves, in the order 1, 2, 4, 5, 3.
Step 5: Last cluster! This is your tree.
Figure: the completed tree, with leaves in the order 1, 2, 4, 5, 3.
Tree-building methods: UPGMA
• UPGMA is a simple approach for making trees.
• A UPGMA tree is always rooted.
• An assumption of the algorithm is that the molecular
clock is constant for sequences in the tree.
• If there are unequal substitution rates, the tree may
be wrong.
• While UPGMA is simple, it is less accurate than the
neighbor-joining approach.
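The UPGMA steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the distance values for the five proteins are hypothetical (chosen so that the merge order is 1 with 2, then 4 with 5, then 3, then the final join), and the function name and input layout are assumptions. The output is a rooted tree in Newick format with branch lengths.

```python
from itertools import combinations


def upgma(labels, distances):
    """Build a rooted UPGMA tree and return it as a Newick string.

    labels:    list of taxon names
    distances: dict {(name_a, name_b): distance} with one entry per pair
    """
    # Active clusters: id -> (Newick subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    index = {name: i for i, name in enumerate(labels)}
    d = {frozenset((index[a], index[b])): float(v)
         for (a, b), v in distances.items()}
    next_id = len(labels)

    while len(clusters) > 1:
        # Find the two clusters with the smallest distance.
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        # The new node sits halfway between the two clusters (molecular clock).
        height = d[frozenset((i, j))] / 2.0
        subtree = "({}:{:g},{}:{:g})".format(
            tree_i, height - height_i, tree_j, height - height_j)
        # Distance to every other cluster: mean over all leaves (size-weighted).
        for k in clusters:
            d[frozenset((next_id, k))] = (
                size_i * d[frozenset((i, k))] + size_j * d[frozenset((j, k))]
            ) / (size_i + size_j)
        clusters[next_id] = (subtree, size_i + size_j, height)
        next_id += 1

    return next(iter(clusters.values()))[0] + ";"


# Hypothetical pairwise distances for five proteins (illustration only).
labels = ["1", "2", "3", "4", "5"]
distances = {
    ("1", "2"): 2, ("1", "3"): 10, ("1", "4"): 10, ("1", "5"): 10,
    ("2", "3"): 10, ("2", "4"): 10, ("2", "5"): 10,
    ("3", "4"): 6, ("3", "5"): 6,
    ("4", "5"): 4,
}

print(upgma(labels, distances))
```

Because every new node is placed at half the distance between the clusters it joins, all tips end up equally far from the root. This is the constant molecular clock assumption noted above, and it is why unequal substitution rates can give a wrong tree.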