BLOSUM Matrices
Conserved blocks in alignments
AABCDA...BBCDA
DABCDA.A.BBCBB
BBBCDABA.BCCAA
AAACDAC.DCBCDB
CCBADAB.DBBDCC
AAACAA...BBCCC
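As a rough illustration of what "conserved block" means here, the sketch below scans the alignment above for maximal runs of columns in which no sequence has a gap. It assumes `.` marks a gap, and the function name `gap_free_blocks` is hypothetical. Real block databases are built with motif-finding methods, so this is only a simplified stand-in.

```python
def gap_free_blocks(rows, gap="."):
    """Return half-open (start, end) column ranges in which no row has a gap."""
    width = len(rows[0])
    assert all(len(r) == width for r in rows), "rows must have equal length"
    blocks, start = [], None
    for col in range(width):
        has_gap = any(r[col] == gap for r in rows)
        if not has_gap and start is None:
            start = col                      # a new block begins here
        elif has_gap and start is not None:
            blocks.append((start, col))      # the current block ends before this column
            start = None
    if start is not None:
        blocks.append((start, width))        # block running to the last column
    return blocks


alignment = [
    "AABCDA...BBCDA",
    "DABCDA.A.BBCBB",
    "BBBCDABA.BCCAA",
    "AAACDAC.DCBCDB",
    "CCBADAB.DBBDCC",
    "AAACAA...BBCCC",
]
blocks = gap_free_blocks(alignment)
print(blocks)  # [(0, 6), (9, 14)]
```

Columns 0 to 5 and 9 to 13 form the two ungapped blocks; columns 6 to 8 are skipped because the first sequence has a gap in each of them.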
Constructing BLOSUM r
• To avoid bias in favor of a certain protein, first
eliminate sequences that are more than r%
identical
• The elimination is done by either
– removing sequences from the block, or
– finding a cluster of similar sequences and replacing it by
a new sequence that represents the cluster.
• BLOSUM r is the matrix built from blocks whose
sequences are no more than r% identical
– E.g., BLOSUM62 is the matrix built using sequences that
are no more than 62% identical.
– Note: BLOSUM62 is the default matrix for protein
BLAST
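The clustering option above can be sketched as follows. This is a minimal illustration, not the procedure from the original slides: it assumes equal-length, ungapped block segments, uses single-linkage grouping, and follows the "more than r%" wording with a strict `>` comparison. The names `percent_identity` and `cluster_by_identity` are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length segments agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_by_identity(seqs, r):
    """Group sequence indices; a sequence joins every cluster that contains
    a member more than r% identical to it (single linkage)."""
    clusters = []
    for idx, seq in enumerate(seqs):
        linked = [c for c in clusters
                  if any(percent_identity(seq, seqs[m]) > r for m in c)]
        merged = [idx]
        for c in linked:
            merged.extend(c)       # absorb each linked cluster
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


segments = ["AAAAAAAAAA", "AAAAAAAAAB", "CCCCCCCCCC"]
clusters = cluster_by_identity(segments, 62)
print(clusters)  # [[0, 1], [2]]
```

The first two segments are 90% identical, so at r = 62 they collapse into one cluster, which would then count as a single sequence when substitution statistics are collected.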
Collecting substitution statistics
From http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
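The figure for this slide is not preserved here. As a hedged sketch of the usual counting step, the code below tallies every unordered pair of residues within each column of a block. The helper name `count_pairs` and the toy block are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def count_pairs(block):
    """Count unordered residue pairs column by column.

    Keys are alphabetically sorted tuples, so ('A', 'B') covers both
    A-B and B-A observations.
    """
    counts = Counter()
    for column in zip(*block):                  # iterate over alignment columns
        for a, b in combinations(column, 2):    # every pair of rows in the column
            counts[tuple(sorted((a, b)))] += 1
    return counts


toy_block = ["AB", "AB", "BA"]
pair_counts = count_pairs(toy_block)
print(dict(pair_counts))
```

A block with n sequences and w columns contributes w * n * (n - 1) / 2 pairs in total, which is 6 for the toy block.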
Computing probabilities
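The formulas for this slide are not preserved here. The sketch below follows the standard Henikoff and Henikoff log-odds construction, which is assumed to be what the slide showed: observed pair frequencies q(i, j), background frequencies p_i derived from them, expected pair frequencies, and scores in half-bit units rounded to integers. The function name `log_odds_scores` and the input counts are hypothetical.

```python
import math


def log_odds_scores(pair_counts):
    """Turn unordered pair counts into rounded half-bit log-odds scores."""
    total = sum(pair_counts.values())
    # Observed frequency of each unordered pair.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue: p_i = q_ii + sum_{j != i} q_ij / 2.
    p = {}
    for (a, b), freq in q.items():
        if a == b:
            p[a] = p.get(a, 0.0) + freq
        else:
            p[a] = p.get(a, 0.0) + freq / 2
            p[b] = p.get(b, 0.0) + freq / 2

    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency under independence: p_i^2, or 2 p_i p_j if i != j.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


scores = log_odds_scores({("A", "A"): 1, ("A", "B"): 4, ("B", "B"): 1})
print(scores)  # {('A', 'A'): -1, ('A', 'B'): 1, ('B', 'B'): -1}
```

In this toy input A-B pairs occur more often than chance predicts, so they score positive, while the identical pairs are rarer than expected and score negative. Pairs that were never observed are left out, since their log-odds would be undefined.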
Example
From http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
Comparison
• PAM is based on an evolutionary model
using phylogenetic trees
• BLOSUM assumes no evolutionary model;
it is derived from conserved “blocks” of proteins
Relative Entropy
$$H = \sum_{i=1}^{20} \sum_{j=1}^{i} p_i \, p_j \, s(i,j)$$
PAM versus BLOSUM
Source: http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
Superiority of BLOSUM for database searches
(according to Henikoff and Henikoff)