Alignments & Phylogenetic Trees: Lesk, A. 2nd Ed.
Chapter 4
Introduction to Sequence Alignment
Advantage
– Gives a quick pictorial statement of the relationship between
two sequences
Disadvantage
– Its ‘reach’ into the realm of distantly related sequences is
poor
In analyzing sequences, one should always look at a
dotplot to be sure of not missing anything obvious,
but be prepared to apply more subtle tools.
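For reference, a dotplot can be sketched in a few lines of Python: cell (i, j) is marked when character i of one sequence equals character j of the other, so similar regions appear as diagonal runs. This is a minimal sketch; the function name `dotplot` and the example string are illustrative assumptions, not taken from the text.

```python
def dotplot(seq_a, seq_b):
    """Return the dotplot as a list of strings, one row per character of seq_a.

    A '*' marks a matching pair of characters, a '.' marks a mismatch.
    """
    return [
        "".join("*" if a == b else "." for b in seq_b)
        for a in seq_a
    ]


if __name__ == "__main__":
    # Comparing a sequence with itself always fills the main diagonal;
    # repeats inside the sequence show up as extra off-diagonal runs.
    for row in dotplot("ABRACADABRA", "ABRACADABRA"):
        print(row)
```

Real dotplot programs usually filter the picture, for example by requiring several matches within a sliding window, which this sketch leaves out.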
Some Typical Dotplot Comparisons
Hamming distance
– Number of positions with mismatching characters;
defined between two strings of equal length
Levenshtein, or edit distance
– Minimal number of ‘edit operations’ required to
change one string into another; defined between two
strings of not necessarily equal length
An edit operation is the deletion, insertion or alteration of a single
character in either sequence
Examples
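As an illustration, both distances can be sketched in Python. The Levenshtein function below uses the standard dynamic-programming recurrence, which is one common way to compute it and is an assumption here, since the text does not prescribe an algorithm. The function names and test strings are likewise illustrative.

```python
def hamming(s, t):
    """Count mismatching positions; only defined for strings of equal length."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))


def levenshtein(s, t):
    """Minimal number of single-character deletions, insertions or alterations."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # alter a into b (free if they match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(hamming("agtc", "cgta"))         # 2
    print(levenshtein("agtcc", "cgctca"))  # 3
```

In the second case one optimal set of edits corresponds to the alignment `ag-tcc` over `cgctca`: two alterations and one insertion.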
For proteins
– A variety of scoring schemes have been proposed
Dayhoff matrices or PAM (Percent Accepted Mutation) –
a measure of sequence divergence
BLOSUM matrices
– Developed by S. Henikoff and J.G. Henikoff for scoring
substitutions in amino acid sequence comparison
– Goal was to replace the Dayhoff matrix with one that would
perform best in identifying distant relationships by making
use of the much larger amount of data that had become
available since Dayhoff’s work
– Based on the BLOCKS database of aligned protein sequences,
hence the name BLOcks SUbstitution Matrix
– BLOSUM62 is a commonly used substitution matrix
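To make the role of a substitution matrix concrete, here is a minimal Python sketch that scores an ungapped alignment by summing matrix entries over aligned residue pairs. The dictionary holds only a handful of BLOSUM62 entries, quoted from the commonly distributed matrix and worth checking against an authoritative copy before reuse. The names `BLOSUM62_EXCERPT`, `pair_score` and `ungapped_score`, and the toy peptides, are hypothetical.

```python
# A few BLOSUM62 entries, just enough for the toy peptides below
# (assumption: values as given in the commonly distributed matrix).
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("L", "I"): 2,
    ("K", "K"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("D", "E"): 2,
    ("W", "W"): 11,
}


def pair_score(a, b, matrix):
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is not in the excerpt


def ungapped_score(seq_a, seq_b, matrix):
    """Sum substitution scores over the aligned positions of two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped scoring needs sequences of equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print(ungapped_score("LKDW", "LKDW", BLOSUM62_EXCERPT))  # 26 (identical)
    print(ungapped_score("LKDW", "IRDW", BLOSUM62_EXCERPT))  # 21 (conservative changes)
```

The conservative substitutions L/I and K/R still score positively, which is what lets such matrices detect relationships that a simple match/mismatch count would miss.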