Introduction - Trees - Phylogenetics
01
Phylogenetic tree: basic concepts
A useful tool for representing
evolutionary relationships between
objects arising from a common ancestor
02
Content
03
A phylogenetic tree is a combinatorial object
04
A phylogenetic tree is a combinatorial object
An X-tree is a tree that satisfies the following properties:
● no vertex has degree 2
● every vertex of degree 1 carries a distinct label
[Figure: unrooted X-tree with leaves a, b, c, d, e, f]
05
A phylogenetic tree is a combinatorial object
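A binary X-tree is an X-tree in which every vertex has degree 1 or 3.
[Figure: binary X-tree with leaves a, b, c, d, e]
A rooted X-tree relaxes the first property: exactly one vertex, called the root, has degree 2.
[Figure: rooted X-tree with root r and leaves a, b, c, d, e, f]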
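The definitions above can be checked mechanically. The following is a minimal sketch, assuming a tree is given as a list of undirected edges; the internal vertex names (u, v, w, x, s, r) and the three example topologies are hypothetical and are not taken from the figures.

```python
from collections import defaultdict

def adjacency(edges):
    """Map every vertex to the set of its neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours

def is_tree(edges):
    """A tree is connected and has exactly (number of vertices - 1) edges."""
    neighbours = adjacency(edges)
    if len(edges) != len(neighbours) - 1:
        return False
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)

def is_x_tree(edges, taxa):
    """No vertex of degree 2, and the degree-1 vertices are exactly the taxa."""
    if not is_tree(edges):
        return False
    degree = {v: len(n) for v, n in adjacency(edges).items()}
    leaves = {v for v, d in degree.items() if d == 1}
    return 2 not in degree.values() and leaves == set(taxa)

def is_binary_x_tree(edges, taxa):
    """An X-tree whose vertices all have degree 1 or 3."""
    degrees_ok = all(len(n) in (1, 3) for n in adjacency(edges).values())
    return is_x_tree(edges, taxa) and degrees_ok

# Hypothetical examples (topologies chosen for illustration only).
BINARY = [("a", "u"), ("b", "u"), ("u", "v"), ("f", "v"), ("v", "w"),
          ("e", "w"), ("w", "x"), ("c", "x"), ("d", "x")]
STAR = [("a", "s"), ("b", "s"), ("c", "s"), ("d", "s")]    # one vertex of degree 4
ROOTED = [("a", "r"), ("r", "s"), ("b", "s"), ("c", "s")]  # r has degree 2
```

With these inputs, `BINARY` passes both checks, `STAR` is an X-tree that is not binary, and `ROOTED` fails the unrooted definition because of its degree-2 root.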
06
A phylogenetic tree is a combinatorial object
A phylogenetic tree is an X-tree that is:
● labelled by gene, population or species names,
● sometimes rooted,
● often binary,
● often equipped with branch lengths.
[Figure: example phylogenetic trees, unrooted and rooted at r, with leaves labelled a to f]
07
Phylogenetic tree: common terms
[Figure: rooted tree with leaves a, b, c, d, f, annotated with the terms below]
● external branch
● external node, also called leaf, operational taxonomic unit (OTU) or taxon
● root
● internal branch
● internal node
08
Phylogenetic inference: general concepts
09
Inferring phylogenetic trees from characters
Key hypothesis:
each character under consideration evolves by descent with modification (i.e. homology)
Key idea:
evolutionary history is reconstructed by comparing the character states observed in contemporary taxa
[Figure: rooted tree annotated for one character that is comparable across taxa]
● observed character states, each drawn from the alphabet of the character: a: 0, b: 0, f: 0, e: 0, c: 1, d: 1
● putative ancestral character state at the root: r: 1
● putative evolutionary event: 1>0
10
Inferring phylogenetic trees from characters
A set of characters (e.g. number, nucleotide, binary, codon, amino acid) is observed for each taxon; each column is one character that is comparable across taxa:
taxon  character states
a:     2  G  0  T  ATG  M
b:     2  A  0  T  TTG  L
f:     2  A  0  T  CTG  L
e:     4  G  0  T  ATA  I
11
Inferring phylogenetic trees from molecular characters
Key hypothesis:
each character has evolved by descent with modification; homologous sequences are therefore similar overall, with some local differences (i.e. primary homology)
Key idea:
putative homologous sequences are gathered by similarity searches (e.g. BLAST); a multiple sequence alignment then estimates which characters are homologous by comparing all character states and optimizing a sequence similarity criterion
12
Phylogenetic data: common terms
ACGTGCATGCATGACGTGATATGCGTGACGTGAACGTGTAACGTG
ACGTGCATGCATCACGAGATATGCGTGAGGTGATCGTGTAACGTG
ACGTGCATGCTTGACGTGATATGCGTGACGTGAACCTGACCCGTG
ACGACCATGCTTGACGTGATATGCGTGACGTGAACGTGACCCGTG
ACGTGCATGCTTTACCTGATTTGRGTCACGTGA---TGATCCGTG
[Figure: the alignment above annotated with the terms below]
● a (molecular) sequence: one row of the alignment
● an aligned character or site: one column of the alignment
● a (known) character state: e.g. A, C, G or T
● a gap: -
● a degenerate character state: e.g. R
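To make these terms concrete, the sketch below scans the five aligned sequences shown on this slide column by column and reports which sites contain a gap, a degenerate character state, or more than one known state. The function name and the use of 0-based site indices are choices made for this illustration.

```python
ALIGNMENT = [
    "ACGTGCATGCATGACGTGATATGCGTGACGTGAACGTGTAACGTG",
    "ACGTGCATGCATCACGAGATATGCGTGAGGTGATCGTGTAACGTG",
    "ACGTGCATGCTTGACGTGATATGCGTGACGTGAACCTGACCCGTG",
    "ACGACCATGCTTGACGTGATATGCGTGACGTGAACGTGACCCGTG",
    "ACGTGCATGCTTTACCTGATTTGRGTCACGTGA---TGATCCGTG",
]
KNOWN_STATES = set("ACGT")  # unambiguous nucleotide states
GAP = "-"

def describe_sites(alignment):
    """Return the 0-based indices of gapped, degenerate and variable sites."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    report = {"gap": [], "degenerate": [], "variable": []}
    for i in range(n_sites):
        column = [seq[i] for seq in alignment]  # one aligned character (site)
        if GAP in column:
            report["gap"].append(i)
        # Anything that is neither a gap nor A/C/G/T is treated as degenerate.
        if any(s != GAP and s not in KNOWN_STATES for s in column):
            report["degenerate"].append(i)
        # A site is variable when at least two known states are observed.
        if len({s for s in column if s in KNOWN_STATES}) > 1:
            report["variable"].append(i)
    return report

sites = describe_sites(ALIGNMENT)
```

Here the three gap characters of the last sequence occupy sites 33 to 35, and its degenerate state R (A or G in the IUPAC code) sits at site 23.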
13
Phylogenetic reconstruction in practice
Key approach:
1. associate an optimality criterion with each tree topology
2. select the tree(s) that optimize(s) this criterion
ACGTGCATGCATGACGTGATATGCGTGACGTGAACGTGTAACGTG
ACGTGCATGCATCACGAGATATGCGTGAGGTGATCGTGTAACGTG
ACGTGCATGCTTGACGTGATATGCGTGACGTGAACCTGACCCGTG
ACGACCATGCTTGACGTGATATGCGTGACGTGAACGTGACCCGTG
ACGTGCATGCTTTACCTGATTTGRGTCACGTGA---TGATCCGTG
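The two steps can be illustrated on a toy case. The slides do not name a particular criterion here; the sketch below assumes maximum parsimony, scored with Fitch's algorithm, purely as one possible optimality criterion. The four taxa A to D and their short sequences are invented for the example.

```python
def fitch(tree, states):
    """Fitch pass at one site: return (candidate state set, number of changes).

    `tree` is a leaf name or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Step 1: the criterion, i.e. the minimum number of changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total

def best_topologies(topologies, alignment):
    """Step 2: keep the topology or topologies with the smallest score."""
    scores = {name: parsimony_score(t, alignment) for name, t in topologies.items()}
    best = min(scores.values())
    return scores, [name for name, score in scores.items() if score == best]

# Invented toy data: four taxa, four aligned sites.
TOY_ALIGNMENT = {"A": "AAGT", "B": "AAGC", "C": "CGGT", "D": "CGTC"}

# The three binary topologies on four taxa, each written as a pair of
# subtrees on either side of the internal branch. The parsimony score
# does not depend on where the tree is rooted.
TOPOLOGIES = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores, winners = best_topologies(TOPOLOGIES, TOY_ALIGNMENT)
```

On this toy alignment the three topologies need 5, 6 and 7 changes respectively, so the first one is selected. Exhaustive scoring like this is only feasible for very small numbers of taxa.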
17
How many different phylogenetic trees?
Number of distinct binary X-trees on n taxa:
t(n) := (2n−5)!! = (2n−5) × (2n−7) × (2n−9) × ... × 7 × 5 × 3
n    t(n)
4    3
5    15
6    105
7    945
8    10,395
9    135,135
10   2,027,025
11   34,459,425
12   654,729,075
13   13,749,310,575
…    …
[Figure: for n=4, the t(n) = 3 distinct binary X-trees on taxa A, B, C, D]
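The table can be reproduced with a few lines of code. This is a minimal sketch of the double factorial formula above; the function name is chosen for this illustration.

```python
def count_binary_x_trees(n):
    """Number of distinct binary X-trees on n >= 3 taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("the formula is stated for n >= 3 taxa")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5 (empty product when n = 3).
    for factor in range(3, 2 * n - 4, 2):
        count *= factor
    return count

table = {n: count_binary_x_trees(n) for n in range(4, 14)}
```

Each additional taxon multiplies the count by the next odd number, which is why exhaustive enumeration quickly becomes impractical.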
20
(Criscuolo & Gribaldo 2011)
A note on the root
21
A note on the root
For practical reasons, most phylogenetic tree reconstruction methods yield unrooted trees…
[Figure: unrooted tree on taxa A, B, C, D, E]
22
A note on the root
A binary X-tree on n taxa contains 2n−3 branches; there are therefore 2n−3 possible rootings, one on each branch.
Unfortunately, very few optimality criteria can assess the most likely root of a phylogenetic tree (e.g. probabilistic models that require substantial computing resources).
However, some practical workarounds exist...
[Figure: unrooted tree on taxa A, B, C, D, E; n=5, so 2n−3 = 7 possible root positions]
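The branch count follows from a short degree-counting argument. The derivation below assumes the tree is binary, with n leaves of degree 1 and m internal vertices of degree 3; the symbols m and |E| (number of branches) are introduced here for the derivation only.

```latex
\begin{align*}
\sum_{v} \deg(v) &= 2\,|E|
  && \text{(each branch has two endpoints)} \\
n \cdot 1 + m \cdot 3 &= 2\,(n + m - 1)
  && \text{(a tree on } n + m \text{ vertices has } n + m - 1 \text{ branches)} \\
m &= n - 2 \\
|E| &= n + m - 1 = 2n - 3
\end{align*}
```

For n = 5 this gives m = 3 internal vertices and |E| = 7 branches, hence 7 candidate root positions.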
23
Midpoint rooting
If all sequences have evolved at approximately the same substitution rate, a simple approach is to take the midpoint of the phylogenetic tree as the putative root.
The midpoint is defined as the middle of the longest path between two leaves of the tree.
24
Midpoint rooting
Of note, NJplot (the tree editor within SeaView) automatically performs midpoint rooting when displaying a phylogenetic tree.
However, this approach often yields an erroneous root when the substitution rate is far from constant...
[Figure: tree in which the midpoint root differs from the real root]
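The midpoint definition translates directly into a small procedure: find the longest leaf-to-leaf path, then walk half of its length. The sketch below is one straightforward way to do this and is not how NJplot or SeaView implement it. It reuses the branch lengths of the five-taxon tree from the NEWICK slides of this deck; the internal vertex names u, v and w are hypothetical.

```python
from collections import defaultdict

# Branches as (vertex, vertex, length); u, v, w are made-up internal names.
EDGES = [("A", "u", 0.0068), ("B", "u", 0.0820), ("u", "w", 0.0686),
         ("C", "v", 0.0218), ("D", "v", 0.0448), ("v", "w", 0.0202),
         ("E", "w", 0.1012)]

def build_graph(edges):
    """Symmetric adjacency map: graph[a][b] is the length of branch a-b."""
    graph = defaultdict(dict)
    for a, b, length in edges:
        graph[a][b] = length
        graph[b][a] = length
    return graph

def path_between(graph, start, goal):
    """Return the unique path from start to goal as a list of vertices."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in path:
                stack.append((neighbour, path + [neighbour]))
    raise ValueError("the two vertices are not connected")

def path_length(graph, path):
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def midpoint(edges):
    """Return (branch, offset): the branch holding the midpoint and the
    distance of the midpoint from the first vertex of that branch."""
    graph = build_graph(edges)
    leaves = sorted(v for v in graph if len(graph[v]) == 1)
    paths = [path_between(graph, a, b)
             for i, a in enumerate(leaves) for b in leaves[i + 1:]]
    longest = max(paths, key=lambda p: path_length(graph, p))
    half = path_length(graph, longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        if walked + graph[a][b] >= half:
            return (a, b), half - walked
        walked += graph[a][b]

branch, offset = midpoint(EDGES)
```

On this tree the longest path runs from B to E (total length 0.2518), so the midpoint falls on the internal branch between u and w, about 0.0439 away from u.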
25
Outgroup rooting
Outgroup rooting is a simple, more general and more robust approach: several homologous but distantly related sequences (the outgroup) are added to the dataset, and the root is placed on the branch that joins the outgroup to the ingroup.
Be careful: an outgroup that is too distant can cause artefactual relationships (e.g. long branch attraction).
[Figure: two trees] In both trees, the outgroup C. albicans is clearly distinct from the ingroup (Saccharomyces taxa)
(from Criscuolo 2011)
26
Storing phylogenetic trees
[Figure: unrooted tree on taxa A, B, C, D, E with branch lengths A: 0.0068, B: 0.0820, C: 0.0218, D: 0.0448, E: 0.1012, and internal branches 0.0686 and 0.0202]
27
The NEWICK format for storing a tree
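Each group of sibling leaves is written between parentheses, every leaf name being followed by a colon and its branch length:
(A:0.0068,B:0.0820)
(C:0.0218,D:0.0448)
E:0.1012
[Figure: unrooted tree on taxa A, B, C, D, E with its branch lengths]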
28
The NEWICK format for storing a tree
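The length of the internal branch leading to each group is appended after its closing parenthesis:
(A:0.0068,B:0.0820):0.0686
(C:0.0218,D:0.0448):0.0202
E:0.1012
[Figure: unrooted tree on taxa A, B, C, D, E with its branch lengths]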
29
The NEWICK format for storing a tree
The three subtrees that meet at the central node are joined with commas, enclosed in parentheses and terminated by a semicolon:
((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);
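The construction above is recursive, which makes it easy to sketch in code. The `Node` class and `to_newick` function below are hypothetical helpers written for this illustration and are not part of any phylogenetics library. Branch lengths are kept as text so that values such as 0.0820 are written back exactly as given.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str = ""                 # leaf name, or empty for an unnamed internal node
    length: Optional[str] = None   # branch length as text, None when absent
    children: List["Node"] = field(default_factory=list)

def to_newick(node):
    """Serialize a subtree: (child,child,...)name:length, omitting absent parts."""
    text = ""
    if node.children:
        text += "(" + ",".join(to_newick(child) for child in node.children) + ")"
    text += node.name
    if node.length is not None:
        text += ":" + node.length
    return text

tree = Node(children=[
    Node(length="0.0686", children=[Node("A", "0.0068"), Node("B", "0.0820")]),
    Node(length="0.0202", children=[Node("C", "0.0218"), Node("D", "0.0448")]),
    Node("E", "0.1012"),
])

newick = to_newick(tree) + ";"   # the terminating semicolon closes the tree
```

Dropping every `length` yields the topology-only string, and putting a value in the `name` of an internal node writes it after the closing parenthesis of that group.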
30
The NEWICK format for storing a tree
((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);
[Figure: the tree encoded by this string, drawn with leaves A, B, C, D, E and its branch lengths, next to the unrooted drawing of the same tree]
31
The NEWICK format for storing a tree
The NEWICK format is a simple way to store phylogenetic trees:
● with branch lengths
((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);
[Figure: tree with leaves A, B, C, D, E and branch lengths]
32
The NEWICK format for storing a tree
The NEWICK format is a simple way to store phylogenetic trees:
● without branch lengths (topology only)
((A,B),(C,D),E);
[Figure: tree topology with leaves A, B, C, D, E]
33
The NEWICK format for storing a tree
The NEWICK format is a simple way to store phylogenetic trees:
● with internal node support values
((A,B)90,(C,D)76,E);
[Figure: tree with leaves A, B, C, D, E; support 90% for the group (A,B) and 76% for the group (C,D)]
34
The NEWICK format for storing a tree
The NEWICK format is a simple way to store phylogenetic trees:
● with branch lengths and internal node supports
((A:0.0068,B:0.0820)90:0.0686,(C:0.0218,D:0.0448)76:0.0202,E:0.1012);
[Figure: tree with leaves A, B, C, D, E, branch lengths, and support 90% for the group (A,B) and 76% for the group (C,D)]
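Reading such a string back is a matter of following the parentheses. The sketch below is a minimal recursive-descent parser written for this illustration, not a complete NEWICK implementation: it assumes unquoted labels with no whitespace and no bracketed comments, and it stores the label of an internal node (here the support value) in the same `name` field as a leaf name.

```python
def parse_newick(text):
    """Parse a NEWICK string into nested dicts with keys name, length, children."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a NEWICK string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1                      # skip ")"
        start = pos                       # optional label: leaf name or support
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":              # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError("unexpected text at position %d" % pos)
    return root

def leaf_names(node):
    """List the leaf names of a parsed tree from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]

parsed = parse_newick(
    "((A:0.0068,B:0.0820)90:0.0686,(C:0.0218,D:0.0448)76:0.0202,E:0.1012);"
)
```

The same function accepts the three simpler variants shown on the previous slides, since labels and branch lengths are both optional.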
35
Conclusion
36