Introduction - Phylogenetic Trees

The document introduces basic concepts and vocabulary related to phylogenetic inference including phylogenetic trees, characters, and molecular sequences. It discusses how phylogenetic trees are inferred from character data by comparing character states among taxa and selecting trees that optimize criteria like parsimony or probability under an evolutionary model.

Uploaded by Hajar Mahir

Introduction to phylogenetic inference

Properties and vocabulary

Phylogenetic tree: basic concepts
A useful tool for representing evolutionary relationships between objects arising from a common ancestor.

Content

Phylogenetic tree: concepts and vocabulary

Phylogenetic data: concepts and vocabulary

From phylogenetic data to phylogenetic inference: general concepts

The NEWICK format for storing phylogenetic trees

A phylogenetic tree is a combinatorial object

A graph is an ordered pair G = (V, E) where

● V is a set of vertices (nodes)
● E is a set of edges (arcs), each joining two vertices from V

The degree (valency) of a vertex is the number of edges that connect to it.

A path between two vertices u and v is a sequence of edges leading from u to v, each edge sharing a vertex with the next one.

A connected component is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertex.

A cycle is a closed path that returns to its starting vertex without repeating any other vertex; any two of its vertices are therefore joined by two distinct paths.
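The graph vocabulary above can be made concrete with a short Python sketch. It is only an illustration, assuming an adjacency-map representation (each vertex mapped to the set of its neighbours); the function names are hypothetical and do not come from the slides or from a specific library.

```python
from collections import deque

def make_graph(vertices, edges):
    """Build an undirected graph as an adjacency map: vertex -> set of neighbours."""
    graph = {v: set() for v in vertices}
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def degree(graph, v):
    """Degree (valency) of v: the number of edges incident to v."""
    return len(graph[v])

def connected_components(graph):
    """Vertex sets of the connected components, each one found by breadth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Example: vertices a..e with edges a-b, b-c and d-e (two connected components).
g = make_graph("abcde", [("a", "b"), ("b", "c"), ("d", "e")])
print(degree(g, "b"), len(connected_components(g)))  # prints: 2 2
```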


A tree is a special graph satisfying the following properties:
● only one connected component
● no cycle

An X-tree is a special tree satisfying the following properties:
● no vertex of degree 2
● every vertex of degree 1 (leaf) is labelled by a distinct element of the label set X
(figure: an example X-tree with leaves labelled a, b, c, d, e, f)
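A minimal Python sketch of these two definitions, assuming the same kind of adjacency map (vertex to set of neighbours) plus a hypothetical `labels` dictionary for the leaves. It relies on a standard fact: a connected graph has no cycle exactly when it has |V| − 1 edges.

```python
from collections import deque

def is_connected(graph):
    """True if every vertex is reachable from an arbitrary start vertex (one connected component)."""
    if not graph:
        return True
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for w in graph[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(graph)

def is_tree(graph):
    """Tree: connected and without cycle. For a connected graph, having no cycle
    is equivalent to having exactly |V| - 1 edges."""
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    return is_connected(graph) and n_edges == len(graph) - 1

def is_x_tree(graph, labels):
    """X-tree: a tree with no vertex of degree 2 whose degree-1 vertices carry distinct labels.
    `labels` (vertex -> label) is a representation chosen for this sketch."""
    if not is_tree(graph) or any(len(nbrs) == 2 for nbrs in graph.values()):
        return False
    leaf_labels = [labels.get(v) for v, nbrs in graph.items() if len(nbrs) == 1]
    return None not in leaf_labels and len(set(leaf_labels)) == len(leaf_labels)
```

For instance, the tree with leaves a, b, c, d and internal vertices u, v (edges a-u, b-u, u-v, c-v, d-v) is an X-tree; subdividing the edge u-v with an extra vertex keeps it a tree but creates a vertex of degree 2, so it is no longer an X-tree.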

A binary X-tree is an X-tree whose vertices all have degree 1 or 3.
(figure: an example binary X-tree)

A rooted X-tree is an X-tree with exactly one vertex of degree 2, called the root.
(figure: an example rooted X-tree with root r)
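For a binary X-tree with n leaves and i internal vertices (all of degree 3), the sum of degrees is n + 3i = 2|E|, and a tree has |E| = n + i − 1 edges; hence i = n − 2 and |E| = 2n − 3. The sketch below checks the binary and rooted definitions on a small example, using the same adjacency-map representation as above; the vertex names u, v, x and r are hypothetical.

```python
def n_edges(graph):
    """Number of edges: half the sum of the degrees (each edge is counted at both ends)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2

def is_binary(graph):
    """Binary (unrooted) X-tree: every vertex has degree 1 (leaf) or 3 (internal node)."""
    return all(len(nbrs) in (1, 3) for nbrs in graph.values())

def root_of(graph):
    """The single vertex of degree 2 (the root), or None if there is not exactly one."""
    candidates = [v for v, nbrs in graph.items() if len(nbrs) == 2]
    return candidates[0] if len(candidates) == 1 else None

def is_rooted_binary(graph):
    """Rooted binary X-tree: one root of degree 2, all other vertices of degree 1 or 3."""
    root = root_of(graph)
    return root is not None and all(
        len(nbrs) in (1, 3) for v, nbrs in graph.items() if v != root
    )

# Unrooted binary tree ((A,B),(C,D),E) with internal vertices u, v, x.
T = {
    "A": {"u"}, "B": {"u"}, "C": {"v"}, "D": {"v"}, "E": {"x"},
    "u": {"A", "B", "x"}, "v": {"C", "D", "x"}, "x": {"u", "v", "E"},
}
# Rooting T on the branch x-E inserts a degree-2 root r in the middle of that branch.
T_ROOTED = {**T, "x": {"u", "v", "r"}, "E": {"r"}, "r": {"x", "E"}}

print(n_edges(T), is_binary(T), root_of(T_ROOTED))  # prints: 7 True r
```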

A phylogenetic tree is an X-tree:
● labelled by gene, population or species names,
● sometimes rooted,
● often binary,
● often with branch lengths.
(figure: example phylogenetic trees with leaves a to f, rooted at r)

Phylogenetic tree: current terms

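● external node, also called leaf, operational taxonomic unit (OTU) or taxon
● external branch: a branch ending at a leaf
● internal node: a node that is not a leaf
● internal branch: a branch joining two internal nodes
● root: the node from which all taxa descend (rooted trees only)
(figure: a rooted tree annotated with these terms)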

Phylogenetic inference: general concepts

Inferring phylogenetic trees from characters
Key hypothesis:
each character under consideration evolves by descent with modification (i.e. homology)
Key idea:
evolutionary history is reconstructed by comparing the character states observed in contemporary taxa

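(figure: a rooted tree on taxa a, b, c, d, e, f illustrating one character, i.e. a character that is comparable among taxa. Observed character states: a: 0, b: 0, f: 0, e: 0, c: 1, d: 1; each state is drawn from the alphabet of the corresponding character, here {0, 1}. The root r carries the putative ancestral character state 1, and a putative evolutionary event 1>0 is placed on one branch.)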
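Counting the minimum number of state changes that a character requires on a given tree is the basis of the parsimony criterion. The sketch below uses Fitch's algorithm, a standard method that the slide does not name, on the binary character of the figure. The rooted topology is an assumption, since the exact tree of the figure is not recoverable from the text.

```python
def fitch(tree, states):
    """Fitch's small-parsimony pass for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- leaf name -> observed character state
    Returns (set of candidate states at this node, minimum number of changes in this subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two children can agree: no extra change is needed at this node.
        return common, left_cost + right_cost
    # The children disagree: one change is required on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical rooted topology for the six taxa of the figure.
TREE = ((("a", "b"), ("f", "e")), ("c", "d"))
BINARY_CHARACTER = {"a": 0, "b": 0, "f": 0, "e": 0, "c": 1, "d": 1}

root_states, changes = fitch(TREE, BINARY_CHARACTER)
print(sorted(root_states), changes)  # prints: [0, 1] 1
```

On this topology a single change is enough, and the root state remains ambiguous ({0, 1}), which is consistent with a putative ancestral state 1 followed by one event 1>0.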


Each taxon is described by a set of characters (e.g. number, nucleotide, binary, codon, amino acid). Each column is a character that is comparable among taxa, and each cell is a character state drawn from the alphabet of the corresponding character:

taxon  number  nucleotide  binary (0/1)  binary (T/F)  codon  amino acid
a      2       G           0             T             ATG    M
b      2       A           0             T             TTG    L
f      2       A           0             T             CTG    L
e      4       G           0             T             ATA    I
c      6       C           1             F             AAA    K
d      6       T           1             F             AGA    R
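The character matrix can be stored with one row per taxon, a character then being a column. In this sketch, the column names and the alphabet attached to each character are assumptions made for illustration.

```python
# Character matrix of the slide: one row per taxon, one column per character.
CHARACTERS = ("number", "nucleotide", "binary (0/1)", "binary (T/F)", "codon", "amino acid")
MATRIX = {
    "a": (2, "G", 0, "T", "ATG", "M"),
    "b": (2, "A", 0, "T", "TTG", "L"),
    "f": (2, "A", 0, "T", "CTG", "L"),
    "e": (4, "G", 0, "T", "ATA", "I"),
    "c": (6, "C", 1, "F", "AAA", "K"),
    "d": (6, "T", 1, "F", "AGA", "R"),
}

NUCLEOTIDES = set("ACGT")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Alphabet of each character, written as a membership test (assumed for this sketch).
ALPHABETS = {
    "number": lambda s: isinstance(s, int) and s >= 0,
    "nucleotide": lambda s: s in NUCLEOTIDES,
    "binary (0/1)": lambda s: s in (0, 1),
    "binary (T/F)": lambda s: s in ("T", "F"),
    "codon": lambda s: len(s) == 3 and set(s) <= NUCLEOTIDES,
    "amino acid": lambda s: s in AMINO_ACIDS,
}

def character(name):
    """One character (column of the matrix): taxon -> character state."""
    j = CHARACTERS.index(name)
    return {taxon: row[j] for taxon, row in MATRIX.items()}

def states_are_valid():
    """True if every character state is drawn from the alphabet of its character."""
    return all(ALPHABETS[c](s) for c in CHARACTERS for s in character(c).values())
```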

Inferring phylogenetic trees from molecular characters
Key hypothesis:
each character has evolved by descent with modification; therefore, homologous sequences are quite similar, with some local differences (i.e. primary homology)
Key idea:
putative homologous sequences are found by similarity searches (e.g. BLAST); a multiple sequence alignment then estimates the homologous characters by comparing all character states and optimizing a sequence similarity criterion

Phylogenetic data: current terms
ACGTGCATGCATGACGTGATATGCGTGACGTGAACGTGTAACGTG
ACGTGCATGCATCACGAGATATGCGTGAGGTGATCGTGTAACGTG
ACGTGCATGCTTGACGTGATATGCGTGACGTGAACCTGACCCGTG
ACGACCATGCTTGACGTGATATGCGTGACGTGAACGTGACCCGTG
ACGTGCATGCTTTACCTGATTTGRGTCACGTGA---TGATCCGTG

● each row is a (molecular) sequence
● each column is an aligned character, or site
● A, C, G and T are known character states
● '-' is a gap
● R is a degenerate character state (IUPAC code for A or G)
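A sketch that extracts the sites of the alignment above and classifies each character state, assuming the standard IUPAC nucleotide codes for degenerate states. Indices are 0-based, so the gap sites 34 to 36 and the degenerate site 24 (counted from 1) appear as 33 to 35 and 23.

```python
# IUPAC nucleotide codes: a degenerate symbol stands for several possible bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
GAP = "-"

ALIGNMENT = [
    "ACGTGCATGCATGACGTGATATGCGTGACGTGAACGTGTAACGTG",
    "ACGTGCATGCATCACGAGATATGCGTGAGGTGATCGTGTAACGTG",
    "ACGTGCATGCTTGACGTGATATGCGTGACGTGAACCTGACCCGTG",
    "ACGACCATGCTTGACGTGATATGCGTGACGTGAACGTGACCCGTG",
    "ACGTGCATGCTTTACCTGATTTGRGTCACGTGA---TGATCCGTG",
]

def classify(state):
    """Classify one character state as 'known', 'degenerate' or 'gap'."""
    if state == GAP:
        return "gap"
    if state not in IUPAC:
        raise ValueError(f"unexpected character state {state!r}")
    return "known" if len(IUPAC[state]) == 1 else "degenerate"

def sites(alignment):
    """Aligned characters (sites): column i holds the i-th state of every sequence."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return ["".join(column) for column in zip(*alignment)]

def sites_with(alignment, kind):
    """0-based indices of the sites containing at least one state of the given kind."""
    return [i for i, col in enumerate(sites(alignment))
            if any(classify(s) == kind for s in col)]

print(len(sites(ALIGNMENT)), sites_with(ALIGNMENT, "gap"), sites_with(ALIGNMENT, "degenerate"))
# prints: 45 [33, 34, 35] [23]
```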

Phylogenetic reconstruction in practice
Key approach:
1. associating an optimality criterion to a given tree topology
2. selecting the tree(s) that optimize(s) this criterion

(figure: a phylogenetic dataset, the aligned sequences shown above, is turned into a phylogenetic tree)

Three main classes of criteria:

1. distance-based tree reconstruction
● estimating the genetic distance between each pair of aligned sequences
● searching for the tree that best represents the estimated distances
2. maximum parsimony approach
● searching for the tree(s) that minimize(s) the number of substitutions needed to explain the evolutionary history of the aligned characters
3. probabilistic criteria
● given an evolutionary model, searching for the tree that maximizes the probability of observing the aligned characters given this tree
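For the distance-based class, the simplest pairwise distance is the p-distance: the proportion of differing sites. The sketch below assumes that sites with a gap or a degenerate state in either sequence are skipped (pairwise deletion); real analyses usually correct p-distances with an evolutionary model, which this sketch does not do.

```python
from itertools import combinations

BASES = set("ACGT")

ALIGNMENT = [
    "ACGTGCATGCATGACGTGATATGCGTGACGTGAACGTGTAACGTG",
    "ACGTGCATGCATCACGAGATATGCGTGAGGTGATCGTGTAACGTG",
    "ACGTGCATGCTTGACGTGATATGCGTGACGTGAACCTGACCCGTG",
    "ACGACCATGCTTGACGTGATATGCGTGACGTGAACGTGACCCGTG",
    "ACGTGCATGCTTTACCTGATTTGRGTCACGTGA---TGATCCGTG",
]

def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences.
    Sites where either sequence has a gap or a degenerate state are skipped."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned (same length)")
    compared = differences = 0
    for x, y in zip(s1, s2):
        if x in BASES and y in BASES:
            compared += 1
            differences += int(x != y)
    if compared == 0:
        raise ValueError("no comparable site")
    return differences / compared

def distance_matrix(alignment):
    """Symmetric matrix of pairwise p-distances, with zeros on the diagonal."""
    n = len(alignment)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(alignment[i], alignment[j])
    return d
```

For instance, sequences 1 and 2 differ at 4 of 45 sites (p ≈ 0.089), while sequences 1 and 5 differ at 8 of the 41 sites that can be compared.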

In practice, a ‘good’ criterion should be:

● easy to compare and interpret, e.g. a numerical value to minimize or maximize
● fast to compute from the data for a given tree topology, e.g. sum of branch lengths (distance), number of substitution steps (parsimony)...
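The sum of branch lengths mentioned above (the total tree length) is indeed cheap to compute. As an illustration, this sketch reads it directly from a NEWICK string (the format described at the end of this document), assuming there are no quoted labels or comments, so that every number following a ':' is a branch length.

```python
import re

def tree_length(newick):
    """Sum of all branch lengths in a NEWICK string (every branch length follows a ':')."""
    return sum(float(x) for x in re.findall(r":([-+0-9.eE]+)", newick))

example = "((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);"
print(round(tree_length(example), 4))  # prints: 0.3454
```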


A naive approach could therefore proceed as follows:

● generating every possible tree topology
● computing the optimality criterion for each generated tree topology
● selecting the one that optimizes the considered criterion
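The naive approach can be sketched with stepwise addition: starting from the single tree on three taxa, each new taxon is attached to every branch of every tree built so far. A tree with m leaves has 2m − 3 branches, which is where the (2n−5)!! count comes from. The `score` argument is a hypothetical criterion function, and the internal vertex names are arbitrary.

```python
def insert_leaf(edges, edge, leaf, new_node):
    """Subdivide `edge` with a new internal vertex and attach `leaf` to it."""
    u, v = edge
    kept = [e for e in edges if e != edge]
    return kept + [(u, new_node), (new_node, v), (new_node, leaf)]

def all_unrooted_binary_trees(taxa):
    """Every unrooted binary X-tree on `taxa` (at least 3), built by stepwise addition.
    Each tree is a list of edges; internal vertices are named '#1', '#2', ..."""
    if len(taxa) < 3:
        raise ValueError("at least 3 taxa are needed")
    trees = [[("#1", taxa[0]), ("#1", taxa[1]), ("#1", taxa[2])]]
    for k, leaf in enumerate(taxa[3:], start=2):
        # A tree with m leaves has 2m - 3 edges, hence 2m - 3 ways to add the next leaf.
        trees = [insert_leaf(tree, edge, leaf, f"#{k}") for tree in trees for edge in tree]
    return trees

def naive_search(taxa, score):
    """The naive approach: score every topology and keep the best (lowest) one.
    `score` maps an edge list to a number to minimise."""
    return min(all_unrooted_binary_trees(taxa), key=score)

print([len(all_unrooted_binary_trees(list("ABCDEF"[:n]))) for n in range(3, 7)])
# prints: [1, 3, 15, 105]
```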

How many different phylogenetic trees?
Number of distinct unrooted binary X-trees on n taxa:
t(n) := (2n−5)!! = (2n−5) × (2n−7) × (2n−9) × ... × 5 × 3 × 1

n     t(n)
4     3
5     15
6     105
7     945
8     10,395
9     135,135
10    2,027,025
11    34,459,425
12    654,729,075
13    13,749,310,575
…     …

(figure: the t(4) = 3 unrooted binary trees on taxa A, B, C, D, corresponding to the groupings AB|CD, AC|BD and AD|BC)
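The table can be reproduced with a few lines of Python; this is a sketch, and `t` is a hypothetical helper mirroring the notation t(n).

```python
from math import prod

def t(n):
    """Number of unrooted binary X-trees on n >= 3 taxa: (2n-5)!! = 3 x 5 x ... x (2n-5)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    return prod(range(3, 2 * n - 4, 2))  # empty product (= 1) when n = 3

for n in range(4, 14):
    print(f"{n:>2} {t(n):>18,}")
```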

This is impossible in practice: there are far too many different tree topologies.

For example, if 1 ms is required to compute the criterion value for one tree topology on n = 13 taxa, then about 13,749,310 s (about 5 months) are required to perform the naive approach.
Parallelization leads to the same difficulties, only for larger values of n.
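The arithmetic of this estimate, as a quick check (the 1 ms per topology is the slide's assumption; a month is taken here as 365.25/12 days):

```python
from math import prod

n = 13
n_trees = prod(range(3, 2 * n - 4, 2))  # (2n-5)!! = 13,749,310,575 topologies for n = 13
seconds = n_trees * 1e-3                 # 1 ms per criterion evaluation
days = seconds / 86_400
months = days / (365.25 / 12)
print(f"{seconds:,.0f} s = {days:.0f} days = {months:.1f} months")
```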

(Criscuolo & Gribaldo 2011)
A note on the root

For practical reasons, most phylogenetic tree reconstruction methods produce unrooted trees.

(figure: an unrooted tree on taxa A, B, C, D, E)

A binary X-tree on n taxa contains 2n−3 branches; there are therefore 2n−3 possible rootings, one per branch.
Unfortunately, very few optimality criteria exist to assess the most likely root of a phylogenetic tree (e.g. probabilistic models, which require substantial computing resources).
However, there exist some tricks...

(figure: the same unrooted tree on A, B, C, D, E; with n = 5 there are 2n−3 = 7 branches, hence 7 possible rootings)

Midpoint rooting
If all sequences have evolved at approximately the same substitution rate, a simple approach is to take the midpoint of the phylogenetic tree as the putative root.
The midpoint is defined as the middle of the longest path in the tree.

(figure: x and y are the most distantly related taxa in the tree; the midpoint is the middle of the unique path joining x and y)
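A sketch of midpoint rooting on a weighted tree stored as an adjacency map (node to {neighbour: branch length}). It uses the standard double sweep: the vertex farthest from any start vertex is one end of a longest path, and the vertex farthest from that one is the other end. The example is the five-taxon tree of the NEWICK slides; the internal node names u, v and x are hypothetical. Its longest path joins B and E (0.0820 + 0.0686 + 0.1012 = 0.2518), so the midpoint lies 0.1259 from each end: on the branch between u (the parent of A and B) and the central node x, at 0.0439 from u.

```python
def from_edges(edges):
    """Weighted tree as an adjacency map: node -> {neighbour: branch length}."""
    tree = {}
    for u, v, length in edges:
        tree.setdefault(u, {})[v] = length
        tree.setdefault(v, {})[u] = length
    return tree

def farthest(tree, start):
    """Node farthest from `start`, its distance, and the parent map of the traversal."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v, length in tree[u].items():
            if v not in dist:
                dist[v] = dist[u] + length
                parent[v] = u
                stack.append(v)
    far = max(dist, key=dist.get)
    return far, dist[far], parent

def midpoint(tree):
    """Midpoint of the longest path (tree with at least two nodes). Returns (node, neighbour,
    offset): the root lies on the branch node-neighbour, at distance `offset` from `node`."""
    x, _, _ = farthest(tree, next(iter(tree)))  # one end of a longest path
    y, length, parent = farthest(tree, x)        # the other end, with the path back to x
    half, node, travelled = length / 2, y, 0.0
    while True:  # walk from y towards x until half of the path length is covered
        prev = parent[node]
        if travelled + tree[node][prev] >= half:
            return node, prev, half - travelled
        travelled += tree[node][prev]
        node = prev

EXAMPLE = from_edges([
    ("A", "u", 0.0068), ("B", "u", 0.0820), ("u", "x", 0.0686),
    ("C", "v", 0.0218), ("D", "v", 0.0448), ("v", "x", 0.0202), ("E", "x", 0.1012),
])
node, neighbour, offset = midpoint(EXAMPLE)
print(node, neighbour, round(offset, 4))  # prints: u x 0.0439
```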


Of note, NJplot (tree editor within SeaView) automatically performs midpoint rooting
when displaying a phylogenetic tree.

However, this approach often leads to an erroneous rooting when substitution rates are far from constant across lineages...
(figure: a tree whose midpoint root differs from its real root)

Outgroup rooting
Outgroup rooting is a simple, more general and more robust approach that consists in adding to the dataset several homologous but distantly related sequences (the outgroup); the root is then placed on the branch that separates the outgroup from the other sequences (the ingroup).
Be careful: when too distant, the outgroup can sometimes cause artefactual relationships (e.g. long-branch attraction).

In both trees, the outgroup C. albicans is clearly distinct from the ingroup (Saccharomyces taxa)
(from Criscuolo 2011)

Storing phylogenetic trees

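(figure: an unrooted tree on taxa A, B, C, D, E with branch lengths A: 0.0068, B: 0.0820, C: 0.0218, D: 0.0448, E: 0.1012, and internal branches of length 0.0686 and 0.0202)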

The NEWICK format for storing a tree

Each group of sibling leaves is written as a parenthesized, comma-separated list of name:branch-length pairs; the remaining leaf is written alone:
(A:0.0068,B:0.0820)
(C:0.0218,D:0.0448)
E:0.1012


Each parenthesized group then receives the length of the branch leading to it:
(A:0.0068,B:0.0820):0.0686
(C:0.0218,D:0.0448):0.0202
E:0.1012


Finally, the three parts are joined under the central node, and the string is terminated by a semicolon:

((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);
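The bottom-up construction above can be written as a short recursive function. In this sketch, the `Node` class is a hypothetical representation (not a specific library's), and branch lengths are printed with a fixed number of decimals (4 here) to reproduce the strings of the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str = ""                  # leaf name, or internal-node label such as a support value
    length: Optional[float] = None  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

def to_newick(node, digits=4):
    """Write the subtree rooted at `node`: children in parentheses, then label, then ':length'."""
    text = ""
    if node.children:
        text = "(" + ",".join(to_newick(c, digits) for c in node.children) + ")"
    text += node.name
    if node.length is not None:
        text += f":{node.length:.{digits}f}"
    return text

def write_newick(root, digits=4):
    """A complete NEWICK string is terminated by a semicolon."""
    return to_newick(root, digits) + ";"

def leaf(name, length=None):
    return Node(name, length)

TREE = Node(children=[
    Node(length=0.0686, children=[leaf("A", 0.0068), leaf("B", 0.0820)]),
    Node(length=0.0202, children=[leaf("C", 0.0218), leaf("D", 0.0448)]),
    leaf("E", 0.1012),
])
print(write_newick(TREE))
# prints: ((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);
```

Leaving `length` unset and putting a support value in `name` produces the topology-only and support variants of the format.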

The NEWICK format is a simple way to store phylogenetic trees:
● with branch lengths
((A:0.0068,B:0.0820):0.0686,(C:0.0218,D:0.0448):0.0202,E:0.1012);
● without branch lengths (topology only)
((A,B),(C,D),E);
● with internal node supports (here 90% for the group of A and B, 76% for the group of C and D)
((A,B)90,(C,D)76,E);
● with both branch lengths and internal node supports
((A:0.0068,B:0.0820)90:0.0686,(C:0.0218,D:0.0448)76:0.0202,E:0.1012);
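Reading the format back is a small recursive-descent parse. This sketch handles the four variants listed above; labels written after ')' are kept as strings, so support values such as 90 end up in `name`. Quoted labels, comments in square brackets and embedded whitespace are deliberately not supported, and the `Node` class is a hypothetical representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

def parse_newick(text):
    """Parse a NEWICK string into Node objects."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a NEWICK string must end with ';'")
    pos = 0

    def read_until(stops):
        # Consume characters up to (not including) the next delimiter in `stops`.
        nonlocal pos
        start = pos
        while text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_subtree():
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            while True:  # children are separated by ',' and closed by ')'
                pos += 1  # skip '(' or ','
                node.children.append(parse_subtree())
                if text[pos] == ")":
                    break
                if text[pos] != ",":
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
            pos += 1  # skip ')'
        node.name = read_until(",():;")  # leaf name or internal label (e.g. a support)
        if text[pos] == ":":
            pos += 1
            node.length = float(read_until(",();"))
        return node

    root = parse_subtree()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected text after position {pos}")
    return root

def leaf_names(node):
    """Leaf names from left to right."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]
```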

Conclusion
