Bioinformatics-And-Phylogeny

Bioinformatics uses principles from computer science, statistics, and linguistics to study genomic and proteomic sequences stored in biological databases. Phylogenetic analysis studies evolutionary relationships through phylogenetic trees, using bioinformatics tools to analyze protein and DNA sequences and detect relationships. The document outlines steps for acquiring sequence data from NCBI, performing BLAST searches, building multiple sequence alignments with ClustalW, and constructing phylogenetic trees using MEGA software.

Uploaded by

Gimber Breg

Bioinformatics and Phylogenetic Analysis

Edgar Scott
Multicampus Bioinformatics
Education Specialist
What is Bioinformatics
 Interdisciplinary field that applies
principles and techniques from
computer science, probability and
statistics, and linguistics to the study of
genomic and proteomic sequences.
 Biological databases for storing and
organizing DNA and protein sequences
 Computational tools for analyzing
sequences
Phylogenetic Analysis and Bioinformatics
 Phylogenetics – study of evolutionary
relationships
 Phylogenetic trees used to represent
evolutionary relationships
 Relationships are detected from protein or DNA
sequences rather than from morphological characters
 Bioinformatics provides both sequence
repositories and sequence analysis software.
Overview
 Acquiring Data Set
 Text searching at the National Center for
Biotechnology Information (NCBI)
 Sequence similarity and homology
 Sequence similarity searching with Basic Local
Alignment Search Tool (BLAST)
 Analyzing Data Set
 Phylogenetic Analysis with Molecular Evolutionary
Genetics Analysis (MEGA) 3.1 software
 Build multiple sequence alignments of sequences using
ClustalW
 Build phylogenetic trees
Text Searching at NCBI
 NCBI provides molecular
information and bioinformatics tools to
the scientific community
 GenBank – an archival DNA and protein
sequence database
 RefSeq – a curated DNA and protein
sequence database
 Entrez Gene – a gene centered database
Sequence Similarity and Homology
 Homologs – sequences that share a common
ancestral sequence
 Paralogs – arise via gene duplication
 Orthologs – arise via speciation event
 Xenologs – arise via gene transfer
 Evolutionarily related sequences tend to be
similar.
 Sequence differences correspond to amount
of change that has occurred since they last
shared a common ancestral sequence.
Sequence Alignments
 Sequence Alignment – a process that identifies a
series of characters or character patterns that are in
the same order in both sequences.
 Pairwise Global alignment
 Pairwise Local alignment
 Optimal alignment – an alignment between
sequences in which the number of matching
characters is maximized and the number of
mismatching characters is minimized.
 Quantifying alignments
 Alignment score of the optimal alignment
 Percent identity scores
 Percent similarity scores
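The ideas on this slide can be sketched in a few lines of Python. The example below is a minimal pairwise global alignment (the Needleman-Wunsch dynamic programming method) followed by a percent identity calculation. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for illustration; they are not taken from the slides or from any particular program.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (optimal score, aligned a, aligned b). Scoring values are
    illustrative assumptions, not defaults of any real tool.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(x, y):
    """Percent of alignment columns holding the same residue in both rows."""
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x)


if __name__ == "__main__":
    s, top, bottom = global_align("ACGT", "AGT")
    print(s, top, bottom, percent_identity(top, bottom))
```

Percent identity is computed here over all alignment columns, including gapped ones; other conventions (for example, dividing by the shorter sequence length) give different numbers for the same alignment.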
Sequence Similarity Searching
 Basic Local Alignment Search Tool (BLAST)
 blastp, blastn, blastx, tblastn, and tblastx
 Local alignments are reported
 Expectation Value (E-value) – the number of
alignments an investigator can expect to find
by chance with an alignment score as good as or
better than the alignment score under consideration.
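As a rough illustration of how an expectation value relates to an alignment score, the sketch below uses the Karlin-Altschul relation in its bit-score form, E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the database length. The function name is hypothetical, and real BLAST reports use length-corrected (effective) search spaces, so this will not reproduce BLAST's numbers exactly.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Assumes raw (uncorrected) query and database lengths; BLAST itself
    applies edge-effect corrections to both.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Each extra bit of score halves the number of chance hits expected.
    for s in (30, 40, 50):
        print(s, expect_value(s, query_len=250, db_len=1_000_000_000))
```

The same score becomes less significant as the database grows, which is why E-values from searches against databases of different sizes are not directly comparable.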
Steps to Build a Tree
 Build a multiple sequence alignment of
data set.
 Analyze multiple sequence alignment
using either distance based methods or
character based methods.
Molecular Evolutionary Genetics Analysis (MEGA) 3.1
 Phylogenetic Analysis program
 Constructs multiple sequence alignment using
ClustalW
 Provides tree building methods
 Distance based Methods
 UPGMA
 Neighbor-joining method
 Minimum Evolution
 Character based Method
 Maximum Parsimony
 Provides a great help document!
Multiple Sequence Alignment
 Multiple Sequence Alignment – an alignment
between three or more sequences.
 Computationally classified as NP-hard
 Programs
 ClustalW – fast, applies a progressive method
 T-Coffee – slower, applies an advanced
progressive method
 Dialign – slow, applies an iterative method
 Combine – combines multiple sequence
alignments
Tree Building methods
 UPGMA, Neighbor-Joining, Minimum Evolution
 Distance based methods
 Analyze the multiple sequence alignment to
calculate a distance matrix.
 Clustering algorithm analyzes the distance matrix
to determine which sequences should be
clustered.
 Maximum parsimony
 Character based method
 Analyze the multiple sequence alignment to create
a tree whose tree length has been minimized.
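The two distance-based steps described above (compute a distance matrix from the alignment, then cluster) can be sketched as follows. This is a minimal UPGMA example, assuming the p-distance (proportion of differing sites) as the distance measure; MEGA offers several distance models, and the function names and toy alignment here are hypothetical.

```python
from itertools import combinations


def p_distance(s, t):
    """Proportion of differing sites, ignoring columns where either row has a gap."""
    pairs = [(a, b) for a, b in zip(s, t) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence. Returns {frozenset({x, y}): distance}."""
    return {
        frozenset((x, y)): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


def upgma(names, dist):
    """Cluster by UPGMA and return the tree as a Newick string."""
    sizes = {name: 1 for name in names}       # cluster label -> number of leaves
    heights = {name: 0.0 for name in names}   # cluster label -> node height
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        h = d[frozenset((a, b))] / 2
        new = f"({a}:{h - heights[a]:.3f},{b}:{h - heights[b]:.3f})"
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes[a] + sizes[b]
        heights[new] = h
        del sizes[a], sizes[b]
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGTTCAA"}
    print(upgma(list(toy), distance_matrix(toy)))
```

UPGMA assumes a constant rate of change along all lineages; Neighbor-Joining and Minimum Evolution use the same kind of distance matrix but drop that assumption.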
Tree Reliability
 Bootstrapping – method for assessing
the reliability of trees.
 Steps
 The columns of the original alignment are
resampled with replacement many times (e.g. 1000).
 For each resampling, a tree is built
 The trees created from the resampling
iterations are compared to the original
tree.
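The bootstrap steps above can be sketched as below. Columns of the alignment are resampled with replacement, a tree is rebuilt for each replicate, and support for a grouping is the fraction of replicates that recover it. To keep the example short, `closest_pair` is a deliberately simplified stand-in for a real tree builder: it reports only the most similar pair of sequences as a single clade. All names and the toy alignment are hypothetical.

```python
import random
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement; the result has the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for a tree builder: the pair of sequences with the fewest differences.

    Ties are broken in favor of the alphabetically first pair.
    """
    def diffs(x, y):
        return sum(a != b for a, b in zip(alignment[x], alignment[y]))
    return frozenset(min(combinations(sorted(alignment), 2), key=lambda p: diffs(*p)))


def bootstrap_support(alignment, clade, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `clade` is recovered."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == clade
        for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGTTCAA"}
    print(bootstrap_support(toy, frozenset(("A", "B"))))
```

In practice the stand-in would be replaced by the same tree-building method used for the original tree, and support would be reported for every internal branch.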
Review
 Acquiring Data Set
 Text searching at the National Center for
Biotechnology Information (NCBI)
 Sequence similarity and homology
 Sequence similarity searching with Basic Local
Alignment Search Tool (BLAST)
 Analyzing Data Set
 Phylogenetic Analysis with Molecular Evolutionary
Genetics Analysis (MEGA) 3.1 software
 Build multiple sequence alignments of sequences using
ClustalW
 Build phylogenetic trees