Phylogeny

This document discusses phylogenetic analysis and molecular phylogeny. It defines phylogenetics as the study of evolutionary histories using tree diagrams, and explains that phylogenetic trees represent the evolutionary divergence and relationships between organisms. Molecular data like DNA and protein sequences can provide insights into evolution by revealing accumulated mutations over time. Molecular phylogeny analyzes these molecular sequences to reconstruct evolutionary histories and relationships based on shared ancestry.

Uploaded by

Mausam Kumravat

PHYLOGENETIC ANALYSIS

Neha Jain

School of Biotechnology, DAVV

Phylogenetics is the study of the evolutionary
history of living organisms using tree-like
diagrams to represent pedigrees of these
organisms.

The tree branching patterns representing the
evolutionary divergence are referred to as
phylogeny.

FINDING OUT WHAT PHYLOGENETIC
TREES CAN DO FOR YOU
The purpose of phylogenetics is to reconstruct the
history of life and explain the present diversity of
living creatures. This can be represented as a
huge genealogic tree (the tree of life).
The underlying principle of phylogeny is to try to
group living creatures according to their level of
similarity. In this context, we assume that the
more similar two species are (such as human and
ape), the more recently they diverged from their
common ancestor.
Biology is very much about classifying, and the
best means of classification we have is phylogeny.

Phylogenetics is a special kind of phylogeny that
relies on the comparison of equivalent genes
coming from several species for reconstructing
the genealogic tree of these species and finding
out who is the closest relative of whom in the
family.
If necessary, you can also apply phylogenetic
methods to the various genes of a gene family to
reconstruct the history of the gene family by the
same means.

WHY USE PHYLOGENETIC ANALYSIS

In the context of bioinformatic analysis, there are
three major reasons why you may want to use
phylogenetics:

Determining the closest relatives of the
organism that you're interested in: For
instance, if you're studying a new bacterium, you
can sequence its ribosomal RNA and place it on a
phylogenetic tree computed with all known
ribosomal RNAs. This can give you a fairly good
idea of who this bacterium really is.

Discovering the function of a gene: If you're
studying a gene, you can use phylogenetic trees to be
sure that the gene you're interested in is
orthologous (more about that in a minute) to
another well-characterized gene in another species.

Retracing the origin of a gene: Most genes within
a genome travel together through evolutionary time.
However, from time to time, individual genes may
jump from one species to another, for instance by
piggy-backing on a viral infection. Phylogenetic trees
are a great way to reveal such events, which are
called horizontal (or lateral) transfers.

MOLECULAR PHYLOGENY
Molecular data (DNA and protein sequences) can
provide very useful evolutionary perspectives on
existing organisms. As organisms evolve, their
genetic material accumulates mutations over
time, causing phenotypic changes.
Because genes are the medium for recording the
accumulated mutations, they can serve as
MOLECULAR FOSSILS.
Through comparative analysis of the molecular
fossils from a number of related organisms, the
evolutionary history of the genes and even the
organisms can be revealed.

ADVANTAGE OF MOLECULAR
DATA OVER FOSSILS

Molecular data are more numerous than fossil records
and easier to obtain.
More clear-cut and robust phylogenetic trees can be
constructed.
Sometimes the only information available to
researchers for reconstructing evolutionary history is
DNA.
Therefore, the field of molecular phylogenetics can be
defined as the study of evolutionary relationships of
genes and other biological macromolecules by
analyzing mutations at various positions in their
sequences and developing hypotheses about the
evolutionary relatedness of the biomolecules.

MAJOR ASSUMPTION IN
MOLECULAR PHYLOGENY
First, the molecular sequences used in phylogenetic
construction are homologous, meaning that they
share a common origin and subsequently
diverged through time.
Second, phylogenetic divergence is assumed to be
bifurcating: a parent branch splits into two
daughter branches at any given point.
Third, each position in a sequence is assumed to
have evolved independently.

TERMINOLOGY
The lines in the tree are branches.
At the tips of the branches are the present-day
species or sequences, known as taxa or
operational taxonomic units (OTUs).

The connecting point where two adjacent
branches join is called a node, which represents
an inferred ancestor of extant taxa.
The bifurcating point at the very bottom of the
tree is called the ROOT NODE, which
represents the common ancestor of all members
of the tree.

Monophyletic
A group of taxa descended from a single common
ancestor is defined as a clade or monophyletic
group. In a monophyletic group, two taxa share a
unique common ancestor not shared by any other
taxa. They are also referred to as sister taxa to
each other (e.g., taxa B and C).
When a number of taxa share more than one
closest common ancestor, they do not fit the
definition of a clade. In this case, they are
referred to as paraphyletic (e.g., taxa B and D).
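
The clade definition above can be checked mechanically. The following is a minimal Python sketch, assuming a rooted tree written as nested tuples of taxon names. The tree `example_tree` and the helper names `leaves` and `is_monophyletic` are hypothetical and do not come from the slides, whose figure is not reproduced here.

```python
def leaves(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def is_monophyletic(tree, group):
    """True if some node's complete set of descendant tips equals `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree, illustration only: ((A,(B,C)),(D,E))
example_tree = (("A", ("B", "C")), ("D", "E"))

print(is_monophyletic(example_tree, {"B", "C"}))  # True: B and C are sister taxa
print(is_monophyletic(example_tree, {"B", "D"}))  # False: no node has exactly B and D as tips
```

A group that fails this test is not a clade. The sketch does not try to distinguish between the different kinds of non-monophyletic groups.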

Apomorphy (derived trait)
= a new, derived feature
E.g., for this evolutionary transformation:
scales (ancestral feature) --------> feathers (derived feature)

Presence of feathers is an apomorphy for birds.

HOMOLOGY

Similarity resulting from common ancestry.
E.g., the forelimb bones of a bird, bat, and cat

HOMOPLASY (ANALOGY)

Similarity not due to common ancestry.
Reversal: loss of a new (apomorphic) feature, so
that the result resembles the ancestral (old)
feature. Example: legless lizards and snakes.
Convergence (parallelism): independent gain of
new, similar features. Example: the spines of
cacti and euphorbs.

[Fig. 26-18: Orthologous and paralogous genes.
(a) Orthologous genes: an ancestral gene in an ancestral species
diverges after speciation, giving orthologous genes in Species A
and Species B.
(b) Paralogous genes: gene duplication and divergence within
Species A give paralogous genes in Species A after many
generations.]

TREE TOPOLOGY

The branching pattern in a tree is called tree
topology. When all branches bifurcate on a
phylogenetic tree, it is referred to as dichotomy.
In this case, each ancestor divides and gives rise
to two descendants.
Sometimes, a branch point on a phylogenetic tree
may have more than two descendants, resulting in
a multifurcating node. A phylogeny with
multifurcating branches is called a polytomy.
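
The difference between dichotomy and polytomy can be illustrated with a short Python sketch, assuming a tree stored as nested tuples of taxon names. The function name `is_dichotomous` and both example trees are hypothetical.

```python
def is_dichotomous(tree):
    """True if every internal node of a nested-tuple tree has exactly two children."""
    if isinstance(tree, str):  # a tip has no children to check
        return True
    return len(tree) == 2 and all(is_dichotomous(child) for child in tree)


# Hypothetical examples: the first tree is fully bifurcating,
# the second has one node with three descendants (a polytomy).
bifurcating_tree = ("A", ("B", ("C", "D")))
polytomy_tree = ("A", ("B", "C", "D"))

print(is_dichotomous(bifurcating_tree))  # True
print(is_dichotomous(polytomy_tree))     # False
```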

UNROOTED AND ROOTED


A phylogenetic tree can be either rooted or
unrooted.
An unrooted phylogenetic tree does not assume
knowledge of a common ancestor, but only
positions the taxa to show their relative
relationships. There is no direction of an
evolutionary path in an unrooted tree.
In a rooted tree, all the sequences under study
have a common ancestor or root node from which
a unique evolutionary path leads to all other
nodes. To define the direction of an evolutionary
path, a tree must be rooted.

CHOOSING THE RIGHT SEQUENCES
FOR THE RIGHT TREE

When you build a phylogenetic tree, you make the
assumption that the sequences you are
comparing have a common ancestor. If your
sequences are similar enough, this is a
reasonable hypothesis.

Using DNA or protein sequences:
To establish the relationship between two
sequences, you want to measure the time that
has passed since they diverged from their
common ancestor.

If your DNA sequences are more than 70
percent identical:

You can make a DNA multiple sequence
alignment. If your sequences are coding for
proteins, however, this is not recommended.
The problem with DNA sequence
alignments is that the substitution matrices
are not very good and the alignments
take more time to compute because of the
sequence lengths.

If your DNA sequences are less than 70 percent
identical:
If your sequences code for proteins, it is much
safer to translate them into proteins and to build
the multiple sequence alignment with the
proteins. If your sequences are too similar at the
protein level, you can thread the DNA sequences
back onto the protein alignment.
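
The 70 percent rule of thumb above can be sketched in Python. This is an illustration under stated assumptions, not a definitive procedure: the two sequences are assumed to be already aligned to the same length, identity is counted over ungapped columns only, and the names `percent_identity` and `suggest_alignment_level` are hypothetical. The slides give no advice for non-coding sequences below the threshold, so the sketch returns "undetermined" in that case.

```python
def percent_identity(seq_a, seq_b):
    """Percent of ungapped aligned columns at which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":  # assumption: gap columns are skipped
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def suggest_alignment_level(seq_a, seq_b, coding, threshold=70.0):
    """Apply the rule of thumb from the text to two aligned DNA sequences."""
    if coding:
        # Coding sequences: the text recommends aligning the translated proteins.
        return "protein"
    if percent_identity(seq_a, seq_b) > threshold:
        return "dna"
    return "undetermined"  # case not covered by the text


# Made-up sequences for illustration: 9 of 10 positions match.
print(percent_identity("ATGCATGCAT", "ATGCATGGAT"))                       # 90.0
print(suggest_alignment_level("ATGCATGCAT", "ATGCATGGAT", coding=False))  # dna
```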

CHOOSING SEQUENCES TO MAKE
EITHER A GENE TREE OR A SPECIES TREE

Homologous genes are genes that derive from a common ancestor.
They can have three types of relationships:
Orthologs: They're separated only by speciation. Speciation is the
phenomenon during which a common ancestor gives rise to two
subgroups that slowly drift away from their common genetic
makeup to become distinct species.
Assuming that the genomes are not rearranged in the two new
species, two genes are orthologous when they correspond to the
same ancestral gene in the ancestral genome. Biologists usually
expect orthologs to have similar functions and structure. In
Figure 13-1, A1 and A2 are orthologs, and so are B1 and B2.

Paralogs: Paralogs are homologues separated by a
duplication event, meaning that within a genome,
a gene was duplicated. One of the duplicates
may have kept the original function while the
other duplicate could have acquired a new
function. You can expect paralogs to have different
but related functions. For instance, A1 and
B1 are paralogs in Figure

Xenologs: Xeno is a Greek word that means foreigner.
Xenologs result from a lateral transfer between two
organisms: a direct DNA transfer between two species. This
means that one of the species contains a gene that does not
have the same history as the genome in which it is inserted.
A typical case of lateral transfer (or xenologs) is the
acquisition by several bacteria of the isoleucyl-tRNA
synthetase from their host. The isoleucyl-tRNA synthetase is a
protein involved in the synthesis of other proteins, and its
acquisition by bacteria seems to help them become antibiotic
resistant. When this happens, the newly acquired
isoleucyl-tRNA synthetase is a xenolog of the other tRNA
synthetases contained in the bacteria.

When you select a group of homologous genes to
make a phylogenetic tree, you always make what
biologists call a gene tree. It is a tree that tells
the story of the genes it contains.
If you select a group of genes that are all
orthologous from different species, the gene tree
you get looks very much like a species tree,
which lets you reconstruct the speciation events that
occurred while the species you're looking at (or
their ancestors) were diverging.

The best example of this type of gene tree is the
ribosomal RNA phylogenetic tree that biologists
use to reconstruct the big tree of life. Ribosomal
RNA genes exist in every species and are clearly
orthologous between species.

The molecular clock hypothesis was proposed by Zuckerkandl
and Pauling, who noted that rates of amino acid replacement
in animal haemoglobins were roughly proportional to time, as
judged against the fossil record.

MOLECULAR CLOCK HYPOTHESIS


The molecular clock is the assumption that
molecular sequences evolve at constant rates, so
that the number of accumulated mutations is
proportional to evolutionary time.
Based on this hypothesis, branch lengths on a
tree can be used to estimate divergence time.

This assumption of uniformity of evolutionary
rates, however, rarely holds true in reality.
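
Under a strict clock, the arithmetic behind a divergence-time estimate is short. The Python sketch below assumes the common formulation in which the distance d between two sequences (substitutions per site) accumulates along both lineages at the same rate r (substitutions per site per year), so that d = 2rT and T = d / (2r). The function name `divergence_time` and the numbers are made up for illustration and are not taken from the slides.

```python
def divergence_time(distance, rate_per_lineage):
    """Estimate time since divergence, T = d / (2r), under a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate_per_lineage: substitutions per site per year along each lineage.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    # The factor 2 accounts for changes accumulating on both lineages.
    return distance / (2.0 * rate_per_lineage)


# Hypothetical values for illustration only.
example_distance = 0.2   # substitutions per site
example_rate = 1e-9      # substitutions per site per year
example_time = divergence_time(example_distance, example_rate)
print(f"{example_time:.3g} years")  # about 100 million years for these made-up numbers
```

Because real rates vary between lineages and genes, an estimate of this kind is only as good as the constant-rate assumption behind it.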

NEWICK TREE FORMAT


To provide information about tree topology to
computer programs without having to draw the
tree itself, a special text format known as the
Newick format was developed.
Trees are stored as a tree file that shows the
relationships in nested-parenthesis notation, i.e.,
a file with the line (A,(B,(C,D))); represents the
corresponding tree. Sometimes branch lengths are
also included next to the names, e.g., A:0.05. From
this information, a tree-drawing program may be
used to produce a tree representation of the data.

In this linear representation, each internal node
is represented by a pair of parentheses that
enclose all members of a monophyletic group,
separated by commas. For a tree with scaled
branch lengths, the branch lengths in arbitrary
units are placed immediately after the name of
the taxon, separated by a colon.
The tree ends with a semicolon.
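
To make the notation concrete, here is a minimal Python sketch of a reader for the subset of Newick described above: nested parentheses, comma-separated members, optional `:length` after a name or a closing parenthesis, and a final semicolon. It is not a full Newick implementation (no quoted names, comments, or embedded whitespace), and the names `parse_newick` and `leaf_names` are hypothetical. The branch length 0.10 in the second example is invented; only A:0.05 appears in the text.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with a semicolon")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                              # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1                          # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                              # consume ")"
        start = pos
        while text[pos] not in ",():;":           # optional taxon name
            pos += 1
        name = text[start:pos] or None
        length = None
        if text[pos] == ":":                      # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";" or pos != len(text) - 1:
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaf_names(node):
    """List the taxon names at the tips, left to right."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


topology = parse_newick("(A,(B,(C,D)));")
print(leaf_names(topology))                 # ['A', 'B', 'C', 'D']

with_lengths = parse_newick("(A:0.05,B:0.10);")
print(with_lengths["children"][0]["length"])  # 0.05
```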

Three methods (maximum parsimony, distance, and
maximum likelihood) are generally used
to find the evolutionary tree or trees that best
account for the observed variation in a group of
sequences.

DISTANCE METHOD
Distance methods are based on the amount of
dissimilarity between pairs of sequences,
computed from a sequence alignment.
Distance estimates attempt to estimate the
mean number of changes per site since two species
(sequences) split from each other. Distance-based
tree building is of two types:
Unweighted Pair Group Method Using
Arithmetic Average (UPGMA)
Neighbor Joining (NJ)
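
As an illustration of how pairwise distances can be computed from an alignment, the Python sketch below counts the proportion of differing sites (the p-distance) and optionally converts it to an estimated number of changes per site. The slides do not name a correction model; the Jukes-Cantor formula used here is a common choice and is an assumption of this sketch, as are the function names and the toy alignment.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of ungapped aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor estimate of changes per site: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment, corrected=True):
    """Map frozenset({name_a, name_b}) -> distance for every pair in the alignment."""
    names = list(alignment)
    matrix = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            p = p_distance(alignment[a], alignment[b])
            matrix[frozenset((a, b))] = jukes_cantor(p) if corrected else p
    return matrix


# Toy alignment, invented for illustration.
toy_alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTTT",
    "seq3": "ACGAACGTTT",
}
raw = distance_matrix(toy_alignment, corrected=False)
print(raw[frozenset(("seq1", "seq2"))])  # 0.2 (2 of 10 sites differ)
```

The corrected distance is always at least as large as the raw p-distance, because it also accounts for multiple changes at the same site.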

UNWEIGHTED PAIR GROUP METHOD
USING ARITHMETIC AVERAGE (UPGMA)

The simplest clustering method is UPGMA, which
builds a tree by a sequential clustering method.
Given a distance matrix, it starts by grouping two
taxa with the smallest pairwise distance in the
distance matrix. A node is placed at the midpoint
or half distance between them. It then creates a
reduced matrix by treating the new cluster as a
single taxon.
The same grouping process is repeated and
another newly reduced matrix is created. The
iteration continues until all taxa are placed on the
tree.
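
The clustering loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: distances are supplied as a dict keyed by `frozenset({a, b})`, the distance from a merged cluster to any other cluster is the size-weighted average of its two parts, and the result is returned as a Newick string with branch lengths. The function name `upgma` and the distance values are hypothetical.

```python
import itertools


def upgma(distances):
    """Build a UPGMA tree and return it as a Newick string with branch lengths.

    distances: dict mapping frozenset({a, b}) -> distance for every pair of taxa.
    """
    d = dict(distances)
    taxa = sorted({name for pair in d for name in pair})
    # Each cluster is keyed by its Newick text: text -> (number of taxa, node height).
    clusters = {name: (1, 0.0) for name in taxa}
    while len(clusters) > 1:
        # Group the two clusters with the smallest pairwise distance.
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: d[frozenset(pair)])
        size_a, height_a = clusters.pop(a)
        size_b, height_b = clusters.pop(b)
        # The new node sits at half the distance between the two clusters.
        height = d[frozenset((a, b))] / 2.0
        merged = f"({a}:{height - height_a:.3f},{b}:{height - height_b:.3f})"
        # Reduced matrix: treat the new cluster as a single taxon.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
    (newick,) = clusters
    return newick + ";"


# Invented distances for four taxa, chosen to be clock-like.
example_distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
print(upgma(example_distances))
# (D:5.000,(C:3.000,(A:1.000,B:1.000):2.000):2.000);
```

In the output every tip is the same total distance (5.0) from the root, which reflects the equal-rate behaviour built into the method.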

The basic assumption of the UPGMA method is
that all taxa evolve at a constant rate and that
they are equally distant from the root, implying
that a molecular clock is in effect.
Because real data rarely satisfy this assumption,
UPGMA can produce incorrect tree topologies.
However, owing to its fast speed of calculation, it
has found extensive usage in clustering analysis
of DNA microarray data.

GENE PHYLOGENY VERSUS SPECIES
PHYLOGENY

One of the objectives of building phylogenetic trees based
on molecular sequences is to reconstruct the evolutionary
history of the species involved.
However, a gene phylogeny (phylogeny inferred from a gene
or protein sequence) only describes the evolution of that
particular gene or encoded protein.
This sequence may evolve more or less rapidly than other
genes in the genome or may have a different evolutionary
history from the rest of the genome owing to horizontal
gene transfer events.
Thus, the evolution of a particular sequence does not
necessarily correlate with the evolutionary path of the
species. The evolution of a species is the combined result of
the evolution of multiple genes in a genome.

Thus, to obtain a species phylogeny, phylogenetic
trees from a variety of gene families need to be
constructed to give an overall assessment of the
species evolution.

USES OF PHYLOGENY
Phylogenetic analyses are useful in many
different contexts, for example:
To infer the evolutionary history of the
molecule used.
To infer the temporal order of other events mapped
on the phylogeny, such as gene transfers.
To study epidemiology (the study of disease).

PHYLOGENETIC ANALYSIS
PROGRAMS

PHYLIP (phylogenetic inference package)
One of the most widely used tools for phylogenetic
analysis.

PAUP (phylogenetic analysis using parsimony)

MacClade: Runs on Macintosh

Thanks
