An Introduction To Phylogeny

Conor Meehan (he/they)

[email protected]
@con_meehan
Learning outcomes
Identify the various sections of a phylogenetic tree

Define various phylogenetic terminology

Describe the process of parsimony tree building

Translate a distance matrix into a phylogeny

Discuss the drawbacks of non- and semi-parametric phylogenetic methods

Describe the process of tree space searching

Recognise the maximum likelihood and bootstrap approaches

What is a phylogeny?
The evolutionary relationship between lineages

Lineages can be:

Genes
Individuals
Populations
Species
…

Phylogenetics is the study of these relationships

Derived from the Greek meaning “tribe origin”
Often depicted as a tree with sampled taxa at the tips
Tree Terminology
Internal nodes are also called ancestors or parents
The node labelled "(internal) node" is the most recent common ancestor (MRCA) of A, B and C
Figure labels: Root; Internal branch; Peripheral branch; (internal) node; Terminal node or Taxon (plural: taxa)
Writing trees on a computer
Trees are represented in a text file using parentheses and commas (Newick format)
Every internal node is a pair of parentheses ( ) around the nodes that descend from it, and those nodes are separated by commas
A semicolon marks the end of the tree
This tree is represented as:
(((A,B),C),(D,(E,F)));
If we know the lengths of the branches, they are written after a colon following each label or )
(((A:0.5,B:0.1):0.2,C:0.5):0.3,(D:0.1,(E:0.2,F:0.2)));
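The bracket-and-comma rule can be sketched in a few lines of Python. This is a minimal illustration, not a full Newick reader: it handles topology only (no branch lengths), and the nested-tuple tree representation and the function names `to_newick` and `parse_newick` are assumptions made for this example.

```python
def to_newick(node):
    """Write a nested-tuple tree as Newick text (topology only, no ';')."""
    if isinstance(node, str):          # a leaf is just its label
        return node
    # an internal node is a pair of parentheses around its comma-separated children
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def parse_newick(text):
    """Read topology-only Newick text back into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                   # skip "("
            children = [parse()]
            while text[pos] == ",":
                pos += 1               # skip ","
                children.append(parse())
            pos += 1                   # skip ")"
            return tuple(children)
        start = pos                    # otherwise read a leaf label
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse()


tree = ((("A", "B"), "C"), ("D", ("E", "F")))
newick = to_newick(tree) + ";"
print(newick)
```

For real data, an established library reader is a safer choice than a hand-written parser, since Newick files often carry branch lengths, support values and quoted labels.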
Phylogenetic tree representation
Cladogram: Branch lengths have no meaning (sometimes drawn with straight or curved lines)
Phylogram: Branch lengths have meaning (e.g. estimated/expected/observed amount of change or time)

Unrooted: No common ancestor is known or time is not known

Phylogenetic relationships terminology

Monophyletic
A group of taxa that contains an ancestor and all its descendants
Monophyletic group is also referred to as a clade

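Polyphyletic
A group of taxa that does not include their most recent common ancestor, often brought together by convergent evolution

Paraphyletic
A monophyletic group where a subgroup has been removed (an ancestor and some, but not all, of its descendants)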
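As a rough sketch of the monophyly definition, the check below asks whether a set of taxa is exactly the set of tips under some node of a rooted tree. The nested-tuple tree and the helper names `leaves`, `clades` and `is_monophyletic` are assumptions for illustration only.

```python
def leaves(node):
    """Return the set of tip labels under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the tip set of this node and of every node below it."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if `group` is an ancestor's complete set of sampled descendants."""
    return frozenset(group) in set(clades(tree))


# Rooted tree (((A,B),C),(D,(E,F))) written as nested tuples
tree = ((("A", "B"), "C"), ("D", ("E", "F")))
print(is_monophyletic(tree, {"A", "B", "C"}))   # a clade
print(is_monophyletic(tree, {"A", "C"}))        # B is missing, so not a clade
```

A group that fails this check may be paraphyletic or polyphyletic; telling those two apart needs the extra information about where the group's common ancestor sits.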
Phylogenetic relationships terminology

(Figure source: Wikipedia)
Phylogenetic terminology

Monophyletic
M. avium

Polyphyletic
M. simiae
(M. kubicae separate)

Paraphyletic
M. simiae without M. kubicae
(M. avium shares the same ancestor)

Fedrizzi, Meehan et al., Sci Rep 2017
Phylogenetic nodes terminology
Bifurcating
A node that splits into exactly 2 descendant branches

Multifurcating (polytomy)
A node in a tree that splits into more than two descendant branches

In-group/out-group
In-group is the set of taxa of interest. Assumed to be monophyletic
Out-group is a related set of taxa to the in-group, used for rooting
Phylogenetic nodes terminology
Multifurcating node

In-group

Bifurcating node

Out-group (if we only used it to place the root)
In-group vs out-group example
Research into L8 and its relationship to L1-L7
In-group
M. canettii used as outgroup
Allows us to see placement of L8 samples

Ngabonziza et al., Nat Comms 2020
Tasks 1
Do sampled taxa sit at the end of internal or external branches of a phylogenetic
tree?

What does monophyletic mean?

How many offspring does a bifurcating node have?

How do you write this tree in newick format?

I.e. using the ( X, Y ) format
How do we create a phylogeny?
Use an optimality criterion to define how we measure the fit of the data to a
solution

Tree-search method: how do we decide between possible solutions?

Simplest is parsimony
Referred to as non-parametric
The tree that represents the minimum number of character changes between taxa is the
optimal solution
A morphological example
A morphological parsimony example
Parsimony looks for the fewest changes that explain the relationships

https://www.khanacademy.org/science/biology/her/tree-of-life/a/building-an-evolutionary-tree
The primary problem with parsimony
Convergent evolution: the evolution of the same trait multiple times
E.g. wings in bats and birds

https://www.khanacademy.org/science/biology/her/tree-of-life/a/building-an-evolutionary-tree
Use of molecular data in phylogenetics
Phylogeny is most often undertaken using molecular data
Nucleotides/DNA sequence (sometimes RNA sequences)
Amino acids/Protein sequence
Less sensitive to convergent evolution
Parsimony also can be used on molecular data
Use G, T, C, A or amino acids as the characters
Fine on very closely related isolates (e.g. a local transmission cluster separated by a couple of SNPs)
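To make "minimum number of character changes" concrete, here is a minimal sketch that scores a fixed tree with the Fitch algorithm, one common way of counting changes (the slides do not name a specific algorithm, so this choice is an assumption). The four toy sequences W, X, Y, Z and the function names are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, changes needed) for one column."""
    if isinstance(tree, str):                  # leaf: its observed character
        return {states[tree]}, 0
    left, right = tree                         # assumes a bifurcating tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:                                 # children agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1      # children disagree: one change


def parsimony_score(tree, seqs):
    """Sum the per-column change counts over the whole alignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


# Hypothetical toy alignment, nucleotides as characters
seqs = {"W": "AACG", "X": "AACT", "Y": "GGCT", "Z": "GGAT"}
tree_1 = (("W", "X"), ("Y", "Z"))
tree_2 = (("W", "Y"), ("X", "Z"))
print(parsimony_score(tree_1, seqs), parsimony_score(tree_2, seqs))
```

Under the parsimony criterion the tree with the lower score is preferred, so here the grouping of W with X and Y with Z would be chosen over the alternative.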
Early tree building used distance based methods
Semi-parametric
Get a distance between 2 sequences
Sequences with the shortest distance are assumed to be the most closely related
Most often uses UPGMA or neighbour joining method
UPGMA (sequences)

S1: TTCAG
S2: TTCGG
S3: TTTTG
S4: TTATG
S5: AACTG
UPGMA (distance matrix)
Note: count differences, not the number of characters in common
Dissimilarity is differences/total sites
E.g. S1 (TTCAG) and S2 (TTCGG) differ at 1 of 5 sites, so their dissimilarity is 1/5 = 0.2
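The matrix for the five example sequences can be filled in with a short script. This is a sketch of the "differences/total sites" rule only; the function name `p_distance` and the dictionary layout are assumptions for illustration.

```python
from itertools import combinations

seqs = {
    "S1": "TTCAG",
    "S2": "TTCGG",
    "S3": "TTTTG",
    "S4": "TTATG",
    "S5": "AACTG",
}


def p_distance(a, b):
    """Dissimilarity: number of differing sites divided by total sites."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# One entry per unordered pair, i.e. one triangle of the matrix
matrix = {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}

for (m, n), d in matrix.items():
    print(m, n, d)
```

S1-S2 and S3-S4 come out as the two smallest entries (0.2 each), and every distance to S5 is 0.6.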
UPGMA tree building steps
1. Start with every sequence in its own cluster

2. Select the smallest distance between clusters

a) Create a new cluster by joining these two
b) Branch lengths are distance/2

3. Get the distance from that cluster to all others

a) Use a proportional average distance (i.e. using size of cluster*distance)

4. Repeat steps 2 and 3 until complete

We will do this in the hands-on session
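The steps above can be sketched as a short Python function. This is an illustrative implementation under stated assumptions, not the hands-on session's code: ties are broken by taking the first smallest pair rather than choosing randomly, branch lengths are printed to two decimals, and the names `upgma` and `p_distance` are made up for this example.

```python
from itertools import combinations


def p_distance(a, b):
    """Differences divided by total sites."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Build a UPGMA tree and return it as Newick text with branch lengths."""
    # Step 1: every sequence is its own cluster -> (newick text, size, height)
    clusters = {name: (name, 1, 0.0) for name in seqs}
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    while len(clusters) > 1:
        # Step 2: pick the closest pair (first one wins on ties)
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d = dist[frozenset((a, b))]
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d / 2                     # the new node sits at distance/2
        text = f"({ta}:{height - ha:.2f},{tb}:{height - hb:.2f})"
        key = a + "+" + b
        # Step 3: size-weighted average distance to every other cluster
        for c in clusters:
            dist[frozenset((key, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        clusters[key] = (text, na + nb, height)
    return next(iter(clusters.values()))[0] + ";"


seqs = {"S1": "TTCAG", "S2": "TTCGG", "S3": "TTTTG", "S4": "TTATG", "S5": "AACTG"}
tree = upgma(seqs)
print(tree)
```

Because UPGMA places every tip at the same distance from the root, the branch leading to S5 is as long as the full depth of the other four sequences combined.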

UPGMA tree building steps
1. S1-S2 and S3-S4 are both smallest so randomly choose S1-S2

2. New distance to S3 is (0.4+0.4)/2 = 0.4
Repeat for all sequences

3. Get smallest distance and repeat 1 and 2 to build entire tree
UPGMA tree building steps
What about unobserved back mutations over long periods of time?
A->G->A in a sequence
Underestimated distances

Are all substitutions equal?
Some more likely than others
Transitions vs transversions

Need to correct distances

Use a model of evolution
See optional separate lecture
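As one hedged example of such a correction, the sketch below applies the Jukes-Cantor (JC69) formula, d = -3/4 ln(1 - 4p/3), to an observed dissimilarity p. JC69 is chosen here only because it is the simplest model; it treats all substitutions as equally likely, so it addresses hidden repeat mutations but not the transition/transversion point. The function name is an assumption.

```python
import math


def jc69_distance(p):
    """Correct an observed dissimilarity p for unobserved repeat mutations."""
    if not 0 <= p < 0.75:
        # at p >= 0.75 the sequences look random under JC69; no finite estimate
        raise ValueError("JC69 correction needs 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.2, 0.4, 0.6):
    print(p, round(jc69_distance(p), 4))
```

The corrected value is always at least as large as the observed one, and the gap widens as sequences become more divergent, which matches the slide's point that raw distances are underestimates.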
Tasks 2
Is parsimony a non-parametric or semi-parametric method?

What is the main drawback of parsimony methods?

Is UPGMA a non-parametric or semi-parametric method?

Do you pick the samples with the smallest or largest distance at each step of the
distance approach?
Break
Parametric methods of phylogenetic inference
Previously seen methods take 2 sequences and calculate a distance and continue in
a pairwise manner through the whole set of sequences
Parametric methods take a column in an alignment and calculate the optimality
criterion per position in alignment
Positions are assumed to be independent
There are two main parametric methods of phylogenetic inference:
Maximum likelihood
Bayesian analysis
Both methods:
Search tree space to find the best tree
Require an explicit model of evolution
Usually GTR
Can incorporate rate heterogeneity
Tree space searching
Imagine a blind person is dropped randomly in the world and told to find Mount Everest
(the global maximum)

They walk in a random direction until they find a section that is sloped upwards

They continue to walk upwards until every direction around them is a downwards slope

They conclude that since they are at the highest point, they must be on Everest

Thus, the highest position they stand at has the maximum likelihood of being Everest,
given the data and starting point
Tree space searching
(Figure series: likelihood on the vertical axis plotted across trees on the horizontal axis; X marks the current tree)
Random starting tree
Search for better trees
Reach the maximum likelihood tree (in theory, closest to the true tree)
Tree space searching
(Figure series repeated from a different random starting tree)
Search for better trees
Stuck in local maximum
A local maximum tree is better than those that are close by (maybe only differ by one or two branch positions) but is not the best overall tree.
Maximum likelihood tree (global maximum) compared with the local maximum
Tree space searching

The problem is that if they only walk upwards they could get stuck in a local
maximum

Computer programs will implement different strategies to try and get around
this
Multiple starting points
Multiple searches at once; can switch between searching chains
Allow large and small rearrangements
Allow some steps backwards to try to improve the score
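The local-maximum problem and the "multiple starting points" strategy can be shown on a toy landscape. This sketch is only an analogy: real tree space is not a one-dimensional list, and the landscape values and function names below are invented for illustration.

```python
def hill_climb(landscape, start):
    """Walk uphill from `start` until no neighbour has a higher score."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(landscape)]
        best = max(neighbours, key=lambda i: landscape[i])
        if landscape[best] <= landscape[pos]:
            return pos                     # every direction is downhill: stop
        pos = best


def multi_start(landscape, starts):
    """Run one climb per starting point and keep the highest peak found."""
    peaks = [hill_climb(landscape, s) for s in starts]
    return max(peaks, key=lambda i: landscape[i])


# Invented likelihood scores: a local peak at index 2, the global peak at index 6
landscape = [1, 3, 5, 4, 2, 6, 9, 7]
print(hill_climb(landscape, 0))            # stops on the local peak
print(hill_climb(landscape, 4))            # reaches the global peak
print(multi_start(landscape, [0, 4]))      # several starts recover the global peak
```

A single climb depends entirely on where it starts, which is why tree-search programs typically combine several starting trees with larger rearrangements.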
Maximum Likelihood (ML)
A tree topology is proposed
A likelihood score is calculated for each position and combined (in practice, log-likelihoods are added up) to get an overall
likelihood score for the data for a given tree topology
Uses P(v) for the proposed branches (see models of evolution lecture)
Searches tree space and tries to find the tree that has the maximum likelihood of
generating the given data
Compare topologies through optimising variables for each to fit data
Example programs: RAxML, IQ-TREE, PAUP*, PhyML

S1: TTCAG
S2: TTCGG
S3: TGTTG
S4: TGATG
S5: AACTG
Per-position scores: 100, 350, etc …
Total likelihood: 1225
Repeat for new topology
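The smallest possible case of this idea is two sequences joined by a single branch, where the only variable to optimise is the branch length. The sketch below scores each column, adds up the log-likelihoods, and picks the branch length with the highest total from a grid. It assumes the Jukes-Cantor (JC69) model for simplicity (the slides name GTR as the usual choice), and it is not how RAxML, IQ-TREE or the other programs search; the function names and the grid are illustrative.

```python
import math


def jc69_log_likelihood(a, b, t):
    """Log-likelihood of two aligned sequences separated by branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e               # probability a site is unchanged
    p_diff = 0.25 - 0.25 * e               # probability of one specific change
    # one term per column: base frequency (1/4) times the change probability
    return sum(
        math.log(0.25 * (p_same if x == y else p_diff)) for x, y in zip(a, b)
    )


def ml_branch_length(a, b, grid):
    """Return the candidate branch length with the highest log-likelihood."""
    return max(grid, key=lambda t: jc69_log_likelihood(a, b, t))


grid = [i / 1000 for i in range(1, 2001)]  # 0.001 to 2.0 substitutions per site
best = ml_branch_length("TTCAG", "TTCGG", grid)
print(best, jc69_log_likelihood("TTCAG", "TTCGG", best))
```

In a full analysis the same per-column calculation is carried out over a whole tree, and both the branch lengths and the topology are varied in the search for the highest score.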
Bootstrapping
Bootstrapping is a method to test the reliability of an inferred tree
Algorithm (for an alignment of M taxa and N columns):
Create a new alignment of length N from original, sampling columns at random with replacement
Create a new topology for this using the same method
Often in ML analysis this is done with some extra heuristics to speed up the process
Compare topology to original
Every branch that is the same is given a score of 1, any that are not present are given a score of 0
Repeat hundreds of times and report, for each branch, the percentage of replicates in which it was found

Original alignment -> resampled alignment (columns 5, 4, 3, 4, 3 of the original)
S1: TTCAG -> S1: GACAC
S2: TTCGG -> S2: GGCGC
S3: TGTTG -> S3: GTTTT
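The resampling and scoring steps can be sketched as follows. The tree-building step is left out, so this shows only how a replicate alignment is drawn and how per-branch percentages are tallied; the function names and the representation of a branch as a frozenset of taxa are assumptions for illustration.

```python
import random


def resample_columns(alignment, indices):
    """Build a new alignment from the given column indices (0-based)."""
    return {name: "".join(seq[i] for i in indices)
            for name, seq in alignment.items()}


def bootstrap_replicate(alignment, rng):
    """Sample N columns at random with replacement from an N-column alignment."""
    n = len(next(iter(alignment.values())))
    indices = [rng.randrange(n) for _ in range(n)]
    return resample_columns(alignment, indices)


def support_percent(reference_branches, replicate_branches):
    """Percentage of replicate trees that contain each reference branch."""
    total = len(replicate_branches)
    return {
        branch: 100.0 * sum(branch in rep for rep in replicate_branches) / total
        for branch in reference_branches
    }


alignment = {"S1": "TTCAG", "S2": "TTCGG", "S3": "TGTTG"}
# Columns 5, 4, 3, 4, 3 (1-based) give a replicate like the one on the slide
print(resample_columns(alignment, [4, 3, 2, 3, 2]))
# A random replicate; the seed is fixed only so the example is repeatable
print(bootstrap_replicate(alignment, random.Random(1)))
```

Each replicate keeps the same number of taxa and columns as the original, but some columns appear more than once and others not at all.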
Phylogenetic tree building summary
Can be done on morphological or molecular data
Molecular less likely to be affected by convergent evolution
Many methods for building trees exist, each with its own criteria for
the best fit for the data
Parsimony
Distance
Maximum Likelihood/Bayesian
Most methods require a model of evolution to give information on
how the sequences evolved (see models of evolution learning package)
Complex algorithms such as ML or Bayesian require efficient
searching of the tree space
Tasks 3
What are the main ways to avoid getting stuck in a local maximum in
tree searching?

Does maximum likelihood go sequence by sequence or column by column?

In ML, at each step do you change the alignment or the tree?

In bootstrapping is sampling done with or without replacement?

Learning outcomes
Identify the various sections of a phylogenetic tree

Define various phylogenetic terminology

Describe the process of parsimony tree building

Translate a distance matrix into a phylogeny

Discuss the drawbacks of non- and semi-parametric phylogenetic methods

Describe the process of tree space searching

Recognise the maximum likelihood and bootstrap approaches
