Phylogenetics by Matti Ullah Khan Niazi


Phylogenetic Tree Construction Steps:

Phylogenetic tree construction is a complex process that involves five main steps:

1. Selection of a molecular marker

The first step in constructing a phylogenetic tree is to choose an appropriate molecular
marker. The choice depends on the characteristics of the sequences and the purpose of the
study, and either nucleotide or protein sequence data can be used. For closely related
organisms, nucleotide sequences are preferable, while for more divergent groups, slowly
evolving nucleotide sequences or protein sequences may be used. Protein sequences are
preferred in many cases because they are more conserved (many nucleotide substitutions are
synonymous and leave the encoded amino acid unchanged) and because their larger alphabet of
20 amino acids, compared with 4 nucleotides, allows more sensitive alignment. Although protein
sequences offer these benefits, DNA sequences still provide valuable information, especially
when the sequences being compared are closely related.
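To illustrate why protein sequences tend to be more conserved, the minimal Python sketch below translates two made-up coding sequences that differ only at third codon positions. It assumes the standard genetic code and ungapped sequences whose length is a multiple of 3; the function names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order,
# last position varying fastest; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate an ungapped coding sequence (length a multiple of 3) into protein."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def proportion_different(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical sequences: every difference is a synonymous third-position change
seq1 = "ATGCTGAAAGGT"
seq2 = "ATGCTCAAGGGC"
print(proportion_different(seq1, seq2))                        # 0.25
print(translate(seq1), translate(seq2))                        # MLKG MLKG
print(proportion_different(translate(seq1), translate(seq2)))  # 0.0
```

The DNA sequences differ at 3 of 12 sites, yet both encode the same protein, so a protein-level comparison sees no change at all.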
2. Multiple sequence alignment
After the molecular marker has been selected, the next step is to align the sequences from the
different species. This is the most important step, because the accuracy of the resulting
phylogenetic tree depends on the quality of the alignment. Alignment programs such as T-Coffee,
MAFFT, and ClustalW can be used. Gblocks is an automatic program that can improve an alignment
by eliminating poorly aligned positions and divergent regions.
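The idea of removing unreliable alignment positions can be sketched with a much simpler filter than the one Gblocks actually uses: the hypothetical function below just drops columns in which more than a chosen fraction of sequences have a gap ('-'). The threshold of 0.5 is an arbitrary assumption for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose fraction of gap characters ('-') exceeds max_gap_fraction.

    A simplified illustration only; this is NOT the Gblocks algorithm, which also
    considers conservation and flanking positions.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in alignment) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]

aln = [
    "ATG-CTA-G",
    "ATGACT--G",
    "ATG-CTAAG",
    "ATG-CTA-G",
]
for row in trim_gappy_columns(aln):
    print(row)
```

Columns 4 and 8 (gaps in three of four sequences) are removed, while column 7, with a single gap, is kept.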
3. Selection of a model of evolution
The third step of phylogenetic tree construction is the selection of an appropriate evolutionary
model. Evolutionary (or substitution) models are statistical models that describe how
sequences accumulate substitutions and diverge over time. Several substitution models are
available for both nucleotide and amino acid sequences.
Two commonly used substitution models for nucleotides are the Jukes-Cantor (JC) model, which
assumes that all nucleotide substitutions occur at the same rate, and Kimura's two-parameter
(K2P) model, which allows transitions and transversions to occur at different rates. There are
also many amino acid substitution models; the most commonly used are the Dayhoff (PAM) model
and the Jones-Taylor-Thornton (JTT) model.
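The two nucleotide models above can be applied as distance corrections. A minimal Python sketch, assuming aligned DNA sequences in uppercase with '-' for gaps, computes the JC distance d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, and the K2P distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def _comparable_sites(seq1, seq2):
    """Aligned pairs of bases, skipping positions with a gap in either sequence."""
    return [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]

def jukes_cantor(seq1, seq2):
    """JC69 distance. Undefined (math.log raises ValueError) once p >= 0.75."""
    pairs = _comparable_sites(seq1, seq2)
    p = sum(a != b for a, b in pairs) / len(pairs)
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(seq1, seq2):
    """K2P distance; P = transition proportion, Q = transversion proportion."""
    pairs = _comparable_sites(seq1, seq2)
    n = len(pairs)
    # A transition swaps purine<->purine or pyrimidine<->pyrimidine
    transitions = sum(
        a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES) for a, b in pairs
    )
    transversions = sum(a != b for a, b in pairs) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

s1 = "ACGTACGTAC"
s2 = "ACGTGCGTAT"   # two differences, both transitions (A/G and C/T)
print(jukes_cantor(s1, s2), kimura_2p(s1, s2))
```

Because both observed differences are transitions, K2P yields a slightly larger distance than JC for this pair, reflecting its separate treatment of the two substitution types.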
4. Construction of the phylogenetic tree
The next step is the construction of the phylogenetic tree.
The two main types of methods for constructing phylogenetic trees are distance-based and
character-based methods.
Distance-based methods rely on computing the amount of dissimilarity between sequences,
while character-based methods use the character states observed at each aligned site in the
individual taxa to infer the character states of their common ancestors.
5. Assessment of the reliability of the tree
The final step is to assess the reliability of the phylogenetic tree. This is commonly done
with a statistical method called bootstrapping, which assesses the reliability of the tree's
topology. The columns (sites) of the original alignment are resampled with replacement to
generate many new alignments of the same length, referred to as bootstrap samples. Each sample
is then used to construct a new phylogenetic tree using the same method as the original tree.
Each interior branch of the original tree that is also recovered in the new tree is assigned a
value of 1, and 0 otherwise. This process is repeated many times, and the percentage of
replicates in which each interior branch receives a value of 1 is reported as its bootstrap
value, or confidence value. These values are written as percentages on the branches of the
phylogenetic tree, and a bootstrap value of 95% or more is generally considered to indicate a
reliable topology. Besides bootstrapping, other approaches such as jackknifing and Bayesian
simulation can also be used.
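The resampling and scoring described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the alignment is a list of equal-length strings, and `infer_clades` is a hypothetical, user-supplied function that rebuilds a tree with the same method as the original and returns its clades as frozensets of sequence indices.

```python
import random

def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: sample alignment columns with replacement, keeping
    the original length and taking the same column from every sequence."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_support(alignment, clade, infer_clades, n_reps=100, seed=1):
    """Percentage of replicates whose inferred tree contains `clade`.

    `infer_clades` is hypothetical: alignment -> set of frozensets of indices.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        replicate = bootstrap_replicate(alignment, rng)
        if clade in infer_clades(replicate):   # branch recovered: scores 1
            hits += 1
    return 100.0 * hits / n_reps
```

Any tree-building routine can be plugged in as `infer_clades`, provided it reports clades in the same index-based form as the `clade` being tested.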
Phylogenetic Tree Construction Methods
The methods used to construct phylogenetic trees can be classified into two major types:
1. Distance-based methods
Distance-based tree construction methods calculate evolutionary distances between sequences
using a substitution model and arrange them in a distance matrix, from which the phylogenetic
tree is then constructed. The two most popular distance-based methods are UPGMA and
neighbor-joining (NJ).
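Building the distance matrix can be sketched in a few lines of Python. For simplicity this sketch uses the uncorrected p-distance (proportion of differing sites); a model-corrected distance such as the Jukes-Cantor distance could be passed in as `dist` instead. The sequences and function names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping positions where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(seqs, dist=p_distance):
    """Return (names, matrix) for a dict mapping name -> aligned sequence."""
    names = list(seqs)
    matrix = [[dist(seqs[a], seqs[b]) for b in names] for a in names]
    return names, matrix

seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACCTTC"}
names, matrix = distance_matrix(seqs)
for name, row in zip(names, matrix):
    print(name, row)
# A [0.0, 0.1, 0.3]
# B [0.1, 0.0, 0.2]
# C [0.3, 0.2, 0.0]
```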

a. Unweighted Pair Group Method with Arithmetic Mean (UPGMA)


UPGMA is the simplest distance-based method; it constructs a rooted phylogenetic tree by
sequential clustering. First, all sequences are compared using pairwise alignment to calculate
the distance matrix. Using this matrix, the two sequences (or clusters) with the smallest
pairwise distance are joined into a single cluster. The distances from the new cluster to every
other cluster are then recalculated as the arithmetic mean of the distances of its members, and
the process is repeated until all sequences are joined in one tree. The UPGMA method assumes
that the evolutionary rate is constant across all lineages, so that all taxa are equidistant
from the root; that is, it assumes a molecular clock.
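The clustering loop described above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it takes a list of taxon names and a symmetric distance matrix (list of lists) and returns a rooted tree in Newick format, placing each new node at half the distance between the merged clusters, as the molecular-clock assumption implies. The function name and tree encoding are illustrative choices.

```python
def upgma(names, dist):
    """UPGMA clustering; returns a rooted Newick string with branch lengths."""
    # Active clusters: id -> (Newick subtree, number of taxa, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        height = dij / 2   # clock assumption: both sides are equally deep
        merged = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        # Size-weighted average of distances to every remaining cluster
        new_d = {
            k: (size_i * d[(min(i, k), max(i, k))] + size_j * d[(min(j, k), max(j, k))])
            / (size_i + size_j)
            for k in clusters
        }
        # Forget the merged clusters, then add distances to the new one
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in new_d.items():
            d[(k, next_id)] = v   # next_id is always the larger id
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"

D = [[0.0, 0.1, 0.3],
     [0.1, 0.0, 0.2],
     [0.3, 0.2, 0.0]]
print(upgma(["A", "B", "C"], D))   # (C:0.125,(A:0.05,B:0.05):0.075);
```

A and B are joined first; C's distance to the new (A, B) cluster becomes the mean of 0.3 and 0.2, that is 0.25, so the root sits at height 0.125.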

b. Neighbor-Joining (NJ)
The neighbor-joining method is the most widely used distance-based method. Like the UPGMA
method, it builds the tree from a distance matrix; however, it does not assume a molecular
clock, and it produces an unrooted tree. The algorithm starts from a star-shaped tree in which
all sequences radiate from a single node. It uses the pairwise distances, corrected for each
sequence's total divergence from all the others, to identify the pair of closest neighbors.
Once the closest neighbors are identified, the algorithm joins them to a new internal node,
progressively resolving the star tree, and recomputes the distances from this new node to the
remaining sequences. This process is repeated until all sequences are connected in a fully
resolved tree.
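The joining procedure can be sketched in Python as below. This minimal sketch assumes a symmetric distance matrix with at least three taxa, selects neighbors with the standard Q-criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is each node's total distance to all others, and returns an unrooted tree in Newick format. The function name and the small five-taxon example matrix are illustrative.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix; returns unrooted Newick.

    Needs at least 3 taxa. On noisy data NJ can yield negative branch lengths.
    """
    nodes = list(names)                 # Newick text for each active node
    d = [list(row) for row in dist]     # working copy of the matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]     # total divergence of each node
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining node k
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[idx]] for idx, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: connect them to one central node
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    l0 = 0.5 * (d01 + d02 - d12)
    l1 = 0.5 * (d01 + d12 - d02)
    l2 = 0.5 * (d02 + d12 - d01)
    return f"({nodes[0]}:{l0:g},{nodes[1]}:{l1:g},{nodes[2]}:{l2:g});"

M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), M))   # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Taxa a and b are joined first even though their raw distance (5) is larger than that of d and e (3), because the Q-criterion accounts for how far each taxon is from all the others.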
2. Character-Based Methods
Character-based methods analyze sequence data by directly examining the sequence characters,
rather than relying on pairwise distance comparisons. These methods evaluate all sequences at
once, analyzing one character (site) at a time. Character-based methods are generally
considered more accurate than distance-based methods; however, they are more computationally
intensive and require more sophisticated statistical models. The maximum parsimony (MP) and
maximum likelihood (ML) methods are the two most commonly used character-based tree
construction methods.

a. Maximum parsimony (MP)


The maximum parsimony method is a character-based method that selects the tree requiring the
fewest evolutionary changes, that is, the shortest total tree length. First, a multiple
sequence alignment is performed to identify the positions in the sequences that correspond to
each other. Each aligned position is then analyzed to find how many evolutionary changes each
candidate tree requires to produce the observed character states. This is repeated for all
positions in the alignment, and the tree with the lowest total number of changes over all
positions is selected. This method works best for relatively similar sequences and for small
numbers of sequences.
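The per-position counting step can be illustrated with the Fitch algorithm, a classic way to compute the minimum number of changes a fixed, rooted binary tree needs at one site. In this minimal Python sketch, trees are written as nested 2-tuples of leaf names and the four-sequence alignment is hypothetical; searching over candidate trees is not shown.

```python
def fitch(tree, states):
    """Fitch small parsimony for one alignment column.

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    states: dict mapping leaf name -> character at this site.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):                 # leaf: its observed state
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                # children can agree: no change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one change

def parsimony_score(tree, alignment):
    """Total Fitch changes over all columns; alignment maps leaf -> sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

aln = {"A": "AAG", "B": "AAG", "C": "GTA", "D": "GTT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, aln), parsimony_score(tree2, aln))   # 4 6
```

Grouping A with B and C with D explains the data with 4 changes instead of 6, so maximum parsimony would prefer `tree1`.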

b. Maximum likelihood (ML)


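Maximum likelihood is a statistical method that uses a probabilistic model of evolution to
identify the tree that has the highest probability of generating the observed data. Like the
maximum parsimony method, this approach evaluates each column of a multiple sequence alignment
during the analysis. However, unlike maximum parsimony, which counts only the minimum number of
changes, ML calculates the probability of the data under a substitution model, taking into
account all possible ancestral states and the branch lengths of the tree. The likelihood of each
candidate tree is calculated, and the tree with the highest likelihood is selected as the most
likely evolutionary history of the sequences. Because the number of possible trees grows very
rapidly with the number of sequences, heuristic tree searches are used in practice rather than
evaluating every possible tree.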
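The per-column likelihood calculation can be sketched with Felsenstein's pruning algorithm. This minimal Python sketch assumes the Jukes-Cantor model (equal base frequencies), independent sites, ungapped A/C/G/T sequences, and a fixed tree with fixed branch lengths written as nested tuples `(left, t_left, right, t_right)`; the tree encoding and names are illustrative, and optimizing branch lengths or searching over topologies is not shown.

```python
import math

BASES = "ACGT"

def jc_probs(t):
    """JC69 probabilities along a branch of length t (substitutions per site):
    (P(same base at both ends), P(change to one specific other base))."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial_likelihood(node, column):
    """Pruning: [L(A), L(C), L(G), L(T)] = P(leaf states below node | node's base).

    node is a leaf name (str) or a tuple (left, t_left, right, t_right).
    """
    if isinstance(node, str):
        return [1.0 if base == column[node] else 0.0 for base in BASES]
    left, t_left, right, t_right = node
    result = [1.0] * 4
    for child, t in ((left, t_left), (right, t_right)):
        child_l = partial_likelihood(child, column)
        same, diff = jc_probs(t)
        for x in range(4):
            # Sum over every possible base y at the child node
            result[x] *= sum((same if x == y else diff) * child_l[y] for y in range(4))
    return result

def log_likelihood(tree, alignment):
    """Log-likelihood of an alignment (dict: leaf -> sequence) on a fixed tree."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        root = partial_likelihood(tree, column)
        total += math.log(sum(0.25 * lx for lx in root))   # equal base frequencies
    return total

aln = {"A": "AAG", "B": "AAG", "C": "GTA", "D": "GTT"}
tree1 = (("A", 0.1, "B", 0.1), 0.1, ("C", 0.1, "D", 0.1), 0.1)
tree2 = (("A", 0.1, "C", 0.1), 0.1, ("B", 0.1, "D", 0.1), 0.1)
print(log_likelihood(tree1, aln), log_likelihood(tree2, aln))
```

With identical branch lengths, the topology that groups A with B assigns the data a higher log-likelihood, so ML would favor it over the alternative shown.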

Applications of the phylogenetic tree


Phylogenetic trees have various practical applications, including:

1. Phylogenetic trees can be used to study the evolutionary relationships between different
species and to understand the evolutionary processes over time.
2. Phylogenetic trees can be used to study the diversity and distribution of species and to
develop conservation strategies to protect endangered species and ecosystems.
3. Phylogenetic trees can be used to identify the origins of pathogens and to track the
spread of diseases.
4. Phylogenetic trees can also be used in forensics to identify the origins of biological
samples found at crime scenes and to link suspects to crimes.
5. Phylogenetic trees are useful for organizing and classifying organisms and species
according to their DNA sequences and morphological similarities and differences.
