Unit IV
1. Purpose of phylogenetics :
With the aid of sequences, it should be possible to find the genealogical ties between organisms.
Experience shows that closely related organisms have similar sequences, while more distantly related
organisms have more dissimilar sequences. One objective is to reconstruct the evolutionary
relationships between species. Another objective is to estimate the time of divergence between
two organisms, i.e. the time since they last shared a common ancestor.
2. Disclaimers :
The theory and practical applications of the different models are not universally accepted. With
the same dataset, different software packages can give different results, and changes in the dataset
can also change the results. Therefore it is important to start from a good alignment.
Trees based on the alignment of a single gene represent the relationships between genes, which are
not necessarily the same as the relationships between the whole organisms. Trees calculated from
different genes of the same organisms may therefore show different relationships.
3. Terminology :
node : a node represents a taxonomic unit. This can be a taxon (an existing species) or an
ancestor (an unknown species that represents the ancestor of 2 or more species).
branch : defines the relationship between the taxa in terms of descent and ancestry.
topology : is the branching pattern.
branch length : often represents the number of changes that have occurred in that branch.
root : is the common ancestor of all taxa.
distance scale : scale which represents the number of differences between sequences (e.g. 0.1
means that two sequences differ at 10 % of their positions)
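As an illustration of the distance scale, the simplest distance between two aligned sequences is the proportion of positions at which they differ (often called the p-distance). The Python sketch below assumes gap-free sequences of equal length; the function name and the 10-site sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical 10-site alignment with a single difference (last position)
d = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(d)                        # 0.1
print(f"{d:.0%} differences")   # 10% differences
```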
• There are several bioinformatics tools and databases that can be used for phylogenetic
analysis.
• These include PANTHER, P-Pod, Pfam, TreeFam, and the PhyloFacts structural
phylogenomic encyclopedia.
• Each of these databases uses different algorithms and draws on different sources for
sequence information; therefore the trees estimated by PANTHER, for example, may differ
significantly from those generated by P-Pod or Pfam.
• As with all bioinformatics tools of this type, it is important to test different methods,
compare the results, and then determine which database works best (according to consensus
results) for studies involving different types of datasets.
Phylogenetic Models
Possible ways of drawing a tree : Trees can be drawn in different ways. There are trees with
unscaled branches and trees with scaled branches.
Unscaled branches : the branch length is not proportional to the number of changes. Sometimes the
number of changes is indicated on the branches with numbers. The nodes represent the
divergence events on a timescale.
Scaled branches : the length of the branch is proportional to the number of changes. The
distance between 2 species is the sum of the lengths of all branches connecting them.
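The rule that the distance between two species is the sum of the connecting branch lengths can be sketched as follows. The tree, its node names (including the internal node X) and its branch lengths are hypothetical, and storing each node with its parent and branch length is just one simple representation chosen for illustration.

```python
# Hypothetical scaled tree ((A:0.1,B:0.2)X:0.3,C:0.4)root;
# Each node maps to (parent, length of the branch connecting it to that parent).
TREE = {
    "A": ("X", 0.1),
    "B": ("X", 0.2),
    "X": ("root", 0.3),
    "C": ("root", 0.4),
    "root": (None, 0.0),
}


def lineage(tree, node):
    """Return node followed by all of its ancestors up to the root."""
    chain = [node]
    while tree[node][0] is not None:
        node = tree[node][0]
        chain.append(node)
    return chain


def length_to(tree, node, ancestor):
    """Sum the branch lengths walking upward from node to one of its ancestors."""
    total = 0.0
    while node != ancestor:
        parent, length = tree[node]
        total += length
        node = parent
    return total


def tree_distance(tree, a, b):
    """Distance between two species: sum of all branches on the path joining them."""
    ancestors_of_a = set(lineage(tree, a))
    # The first node on b's lineage that also lies on a's lineage is their common ancestor
    common = next(n for n in lineage(tree, b) if n in ancestors_of_a)
    return length_to(tree, a, common) + length_to(tree, b, common)


print(round(tree_distance(TREE, "A", "B"), 3))  # 0.3
print(round(tree_distance(TREE, "A", "C"), 3))  # 0.8
```

Here A and B are 0.1 + 0.2 = 0.3 apart, while A and C are 0.1 + 0.3 + 0.4 = 0.8 apart because their path runs through the root.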
It is also possible to draw these trees with or without a root. For rooted trees, the root is the
common ancestor, and for each species there is a unique path that leads from the root to that
species. The direction of each path corresponds to evolutionary time. An unrooted tree
specifies the relationships among species but does not define the direction of the evolutionary path.
Figure 2: Some possibilities for drawing a tree (these are just a few examples; many other
variations are possible).
Tree Building methods
There are two major groups of analyses to examine phylogenetic relationships between
sequences :
Phenetic methods : trees are calculated from the similarities between sequences and are based on
distance methods. The resulting tree is called a dendrogram and does not necessarily reflect
evolutionary relationships. Distance methods compress all of the individual differences
between a pair of sequences into a single number.
Cladistic methods : trees are calculated by considering the various possible pathways of
evolution and are based on parsimony or likelihood methods. The resulting tree is called a
cladogram. Cladistic methods use each alignment position as evolutionary information to
build a tree.
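Phenetic methods start from a matrix in which every pair of sequences is reduced to a single number. A minimal Python sketch, assuming a gap-free alignment and using the p-distance (proportion of differing positions) as the distance measure; the taxon names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites that differ (gap-free alignment assumed)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Reduce every pair of aligned sequences to one number: their p-distance."""
    names = sorted(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d  # distances are symmetric
    return names, matrix


# Hypothetical toy alignment (four taxa, eight sites)
ALIGNMENT = {
    "sp1": "ACGTACGT",
    "sp2": "ACGTACGA",
    "sp3": "ACGAACTA",
    "sp4": "TCGAATTA",
}

names, dm = distance_matrix(ALIGNMENT)
for row in names:
    print(row, " ".join(f"{dm[row][col]:.3f}" for col in names))
```

Real analyses usually apply a model-based correction to these raw proportions before clustering.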
From a distance matrix (the pairwise distances between all sequences), a phylogenetic tree is
calculated with clustering algorithms. These cluster methods construct a tree by linking the least
distant pair of taxa, followed by successively more distant taxa.
UPGMA clustering (Unweighted Pair Group Method using Arithmetic averages) : this is the simplest
method.
Neighbor Joining : this method tries to correct the UPGMA method for its assumption that
the rate of evolution is the same in all taxa.
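The UPGMA procedure can be sketched in Python: repeatedly join the least distant pair of clusters, place the new node at half their distance (this is where the equal-rate assumption enters), and replace the distances to the joined clusters by their size-weighted arithmetic average. The function name, the Newick output and the 4-taxon distance matrix are illustrative assumptions. Neighbor Joining would replace the plain minimum-distance choice with a criterion that corrects for unequal rates.

```python
def upgma(names, matrix):
    """
    Minimal UPGMA. `names` lists the taxa; `matrix` is the symmetric distance matrix
    in the same order. Returns the rooted, ultrametric tree as a Newick string.
    """
    # Distances between clusters, keyed by the unordered pair of cluster labels
    dist = {
        frozenset((names[a], names[b])): matrix[a][b]
        for a in range(len(names))
        for b in range(a + 1, len(names))
    }
    # Active clusters: label -> (Newick text, number of taxa, height of its node)
    clusters = {name: (name, 1, 0.0) for name in names}

    while len(clusters) > 1:
        # 1. Link the least distant pair of clusters
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        # Equal-rate assumption: the new node sits halfway between the two clusters
        height = dist.pop(pair) / 2
        label = f"({i}+{j})"  # internal bookkeeping label only
        newick = f"({tree_i}:{height - height_i:g},{tree_j}:{height - height_j:g})"

        # 2. Distance to every other cluster = average over all taxon pairs
        for k in clusters:
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((label, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)

        clusters[label] = (newick, size_i + size_j, height)

    (tree, _, _), = clusters.values()
    return tree + ";"


# Hypothetical distances between four taxa
NAMES = ["sp1", "sp2", "sp3", "sp4"]
MATRIX = [
    [0.0, 0.125, 0.375, 0.625],
    [0.125, 0.0, 0.25, 0.5],
    [0.375, 0.25, 0.0, 0.25],
    [0.625, 0.5, 0.25, 0.0],
]
print(upgma(NAMES, MATRIX))
# ((sp1:0.0625,sp2:0.0625):0.15625,(sp3:0.125,sp4:0.125):0.09375);
```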
5.2. Cladistic methods based on Parsimony : Each possible tree is given a score based on the number
of evolutionary changes needed to produce the observed sequences, counted position by position
over the whole alignment. The most parsimonious tree is the one that requires the fewest
evolutionary changes for all sequences to derive from a common ancestor. This method is more
time-consuming than the distance methods.
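For a fixed tree, the minimum number of changes at one alignment position can be counted with Fitch's algorithm, a standard small-parsimony method that the notes do not name, so treat it as an illustrative choice. The sketch below scores the three possible unrooted topologies for four taxa (each written with an arbitrary root, which does not change the parsimony score) and keeps the lowest-scoring one; the sequences are hypothetical. Exhaustive scoring is only feasible for a handful of taxa: for n taxa there are (2n-5)!! unrooted binary trees, already 2,027,025 for n = 10, which is why parsimony is slow and programs normally rely on heuristic searches.

```python
def fitch_site(tree, states):
    """
    Fitch's algorithm for one alignment column on a rooted binary tree.
    `tree` is a leaf name or a (left, right) tuple; `states` maps leaf -> base.
    Returns (candidate state set, number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree needs, summed over every alignment position."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


ALIGNMENT = {"A": "ACGTT", "B": "ACGTA", "C": "TCATA", "D": "TCATT"}
# The three possible unrooted topologies for four taxa, each written with an arbitrary root
TOPOLOGIES = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]

for topology in TOPOLOGIES:
    print(topology, parsimony_score(topology, ALIGNMENT))
best = min(TOPOLOGIES, key=lambda t: parsimony_score(t, ALIGNMENT))
print("most parsimonious:", best)
```

With these sequences, ((A,B),(C,D)) needs 4 changes versus 6 and 5 for the alternatives, so it is the most parsimonious tree.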
5.3. Cladistic methods based on Maximum Likelihood : This method also uses each position
in an alignment, evaluates all possible trees, and calculates the likelihood of each tree using
an explicit model of evolution (whereas parsimony only looks for the fewest evolutionary changes).
The likelihoods of the individual aligned positions are then multiplied to give a likelihood for each
tree. The tree with the maximum likelihood, i.e. the tree under which the observed sequences are
most probable, is chosen as the best tree. This is the slowest method of all, but it seems to give
the best results and the most information about the tree.
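In its simplest form, the procedure can be sketched for two sequences joined by a single branch of length d. The Jukes-Cantor (JC69) substitution model is assumed here as the explicit model of evolution (the notes do not name one), per-site likelihoods are multiplied by summing their logarithms, and a crude grid search stands in for the numerical optimizers that real programs use. The sequences and function names are hypothetical; a full analysis would repeat this over every candidate topology and all of its branch lengths.

```python
import math


def site_likelihood(x, y, d):
    """
    Likelihood of one aligned position for two sequences separated by branch length d
    (expected substitutions per site) under the Jukes-Cantor (JC69) model.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # probability that a base is unchanged after d
    p_diff = 0.25 - 0.25 * decay  # probability that it became one specific other base
    # 0.25 = equilibrium frequency of the starting base x
    return 0.25 * (p_same if x == y else p_diff)


def log_likelihood(seq_a, seq_b, d):
    """Multiply the per-site likelihoods, done as a sum of logs to avoid underflow."""
    return sum(math.log(site_likelihood(x, y, d)) for x, y in zip(seq_a, seq_b))


def ml_branch_length(seq_a, seq_b, step=0.001, d_max=3.0):
    """Crude grid search for the branch length with the maximum likelihood."""
    grid = [step * k for k in range(1, round(d_max / step) + 1)]
    return max(grid, key=lambda d: log_likelihood(seq_a, seq_b, d))


# Hypothetical pair of aligned sequences: 20 sites, 4 differences (p = 0.2)
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGAACGTTCGTACCTACGA"
d_hat = ml_branch_length(seq_a, seq_b)
print(f"ML branch length: {d_hat:.3f}")
```

For two sequences under JC69, the maximum-likelihood branch length coincides with the Jukes-Cantor distance, -3/4 ln(1 - 4p/3), which is about 0.233 for p = 4/20.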
The next figure shows that many more mutations may have occurred than are visible when the
sequences are compared at a given time. Even the best evolutionary models cannot fully solve this problem.
Figure 3: Two homologous DNA sequences which descended from an ancestral sequence and
accumulated mutations since their divergence from each other. Note that although 12 mutations
have accumulated, differences can be detected at only three nucleotide sites (from
Fundamentals of Molecular Evolution, Wen-Hsiung Li and Dan Graur, 1991).
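A substitution model can at least estimate how many changes are hidden. The simplest one, the Jukes-Cantor (JC69) model (an illustrative choice, not one named in the notes), converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = -3/4 ln(1 - 4p/3). The sketch below uses arbitrary p values, not data from Figure 3; as the text says, such a correction only estimates the expected number of hidden changes under the model's assumptions and cannot recover the actual mutational history.

```python
import math


def jukes_cantor_distance(p):
    """
    JC69-corrected distance from the observed proportion p of differing sites.
    Estimates expected substitutions per site, including hidden multiple hits.
    """
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Arbitrary observed proportions of differing sites
for p in (0.05, 0.2, 0.4, 0.6):
    print(f"observed p = {p:.2f}  ->  JC69 distance = {jukes_cantor_distance(p):.3f}")
```

The corrected distance stays close to p for similar sequences and grows much faster than p as sequences diverge; at p = 0.75 the formula is undefined.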
7. Graphical explanation of basic phylogenetic terms
Examples : marine mammals, bipedal mammals, flying vertebrates, trees, algae, etc.
Phylogenetic software
1. MEGA
MEGA is useful software for constructing and visualizing phylogenies, and also for data
conversion. It can easily convert alignment files to other formats such as NEXUS, PAUP, PHYLIP,
and FASTA. The MEGA Tree Explorer makes editing trees very easy, and subtrees
can also be selected and edited separately. Some tree image export options are also available.
The input formats are Newick, PHYLIP, MEGA, and NEXUS. The phylogenetic tree can also be
exported in Newick format, but MEGA falls short at converting it into other formats such as PHYLIP,
which is required in other analyses such as selection analysis.
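Newick, the tree format that MEGA and the programs below read and write, nests each clade in parentheses, appends an optional branch length after a colon, and ends the tree with a semicolon. A minimal Python sketch of a writer, assuming a hypothetical representation in which every node is a (name or list of children, branch length) pair:

```python
def to_newick(node):
    """
    Serialize a tree to Newick text (without the final semicolon).
    A leaf is (name, branch_length); an internal node is (list_of_children, branch_length).
    A branch length of None is omitted.
    """
    label, length = node
    if isinstance(label, list):
        # Internal node: children in parentheses, separated by commas
        text = "(" + ",".join(to_newick(child) for child in label) + ")"
    else:
        text = label
    return text if length is None else f"{text}:{length}"


# Hypothetical tree with branch lengths; the root has no branch length
tree = ([([("A", 0.1), ("B", 0.2)], 0.3), ("C", 0.4)], None)
print(to_newick(tree) + ";")  # ((A:0.1,B:0.2):0.3,C:0.4);
```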
2. Dendroscope
It is helpful for visualizing large trees and provides several options for exporting their graphics
from the command line. Several different views are also available, trees can be easily re-rooted, and
node labels and branches can be easily formatted. It can export trees in Newick and Nexus
format, although users have to register first to use this feature.
3. FigTree
It was originally designed to visualize trees produced by the BEAST program. Tip labels and
node labels can be easily edited. It can easily export trees in NEXUS, Newick, and JSON formats,
with several graphics export options such as EMF, PDF, SVG, and PNG.
4. Phylotree.js
It is a JavaScript-based library for visualizing and annotating trees, with further customization
options. It is widely used in Datamonkey comparative analyses. Users can upload trees to
Phylotree.js, easily select test and reference branches, and map any changes to their position on
the corresponding structure. It is also good for comparing trees with a tanglegram, in which lines
link the corresponding leaves of two trees and crossings can represent evolutionary events.
It also offers several export options and other built-in features.
5. ggtree
ggtree is an R package for phylogenetic tree visualization and annotation. Besides visualizing
trees, it displays annotation data on them. Users can annotate trees with their own
data and easily convert trees into a data frame, and many other features are available
(https://guangchuangyu.github.io/software/ggtree/).