BLOSUM Matrices


Alignment IV

BLOSUM Matrices
BLOSUM matrices

• Blocks Substitution Matrix. Scores
for each pair of amino acids are obtained from
frequencies of substitutions in blocks
of local alignments of protein
sequences [Henikoff & Henikoff 92].
• For example, BLOSUM62 is derived
from sequence alignments with no
more than 62% identity.
2
BLOSUM Scoring Matrices

• BLOck SUbstitution Matrix


• Based on comparisons of blocks of sequences
derived from the Blocks database
• The Blocks database contains multiply aligned
ungapped segments corresponding to the most
highly conserved regions of proteins (local
alignment versus global alignment)
• BLOSUM matrices are derived from blocks in which
sequences are clustered at the percent identity given
by the BLOSUM matrix number

3
Conserved blocks in alignments

AABCDA...BBCDA
DABCDA.A.BBCBB
BBBCDABA.BCCAA
AAACDAC.DCBCDB
CCBADAB.DBBDCC
AAACAA...BBCCC

4
Constructing BLOSUM r
• To avoid bias in favor of a certain protein, first
eliminate sequences that are more than r%
identical
• The elimination is done by either
– removing sequences from the block, or
– finding a cluster of similar sequences and replacing it by
a new sequence that represents the cluster.
• BLOSUM r is the matrix built from blocks with no
more than r% identity
– E.g., BLOSUM62 is the matrix built using sequences with
no more than 62% identity.
– Note: BLOSUM62 is the default matrix for protein
BLAST
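The clustering step can be sketched in Python. This is a minimal sketch, not the original BLOSUM code. It assumes the following:

- Block rows are ungapped and of equal length.
- Clustering is single-linkage.
- Rows count as "too similar" at a `>=` r% identity test.

The function names `percent_identity`, `cluster_block` and `sequence_weights` are hypothetical. Weighting each row by 1/(cluster size) is one way to let a cluster count as a single representative sequence.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two ungapped, equal-length block rows agree."""
    if len(a) != len(b):
        raise ValueError("block rows must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def cluster_block(rows: List[str], r: float) -> List[List[int]]:
    """Single-linkage clustering: rows end up in one cluster when a chain of
    pairwise identities >= r% links them (the >= test is an assumption)."""
    parent = list(range(len(rows)))  # union-find forest over row indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if percent_identity(rows[i], rows[j]) >= r:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(rows)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def sequence_weights(rows: List[str], r: float) -> List[float]:
    """Weight each row by 1 / (size of its cluster), so a cluster counts as one sequence."""
    weights = [0.0] * len(rows)
    for cluster in cluster_block(rows, r):
        for i in cluster:
            weights[i] = 1.0 / len(cluster)
    return weights


if __name__ == "__main__":
    rows = ["ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW"]  # made-up block rows
    print(cluster_block(rows, 62))     # → [[0, 1], [2]]  (rows 0 and 1 are 90% identical)
    print(sequence_weights(rows, 62))  # → [0.5, 0.5, 1.0]
```

Raising r toward 100 splits the clusters apart, so fewer sequences are merged. Lowering r merges more aggressively, which is why low-numbered BLOSUM matrices reflect more distant relationships.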

5
Collecting substitution statistics

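1. Count amino acid pairs in each column; e.g., for
the column A, A, B, A, C, A:
– 6 AA pairs, 4 AB pairs, 4 AC, 1 BC, 0 BB, 0 CC.
– Total = 6+4+4+1 = 15 (= 6·5/2 pairs from 6 sequences)
2. Normalize results to obtain probabilities:
– qXY = observed frequency of the pair XY
– pX = qXX + ½ Σ(Y≠X) qXY
3. Compute log-odds score matrix from probabilities:
s(X,Y) = log (qXY / eXY), where the expected frequency
eXY = pX pY if X = Y and eXY = 2 pX pY if X ≠ Y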
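The three steps can be sketched in Python for a single ungapped block. This is a minimal sketch under several assumptions:

- Rows are equal-length strings.
- Every row gets equal weight; the cluster weighting from "Constructing BLOSUM r" is ignored.
- Scores are in bits (log base 2).
- Pairs that never occur are left out. A real BLOSUM build pools many blocks, so essentially all pairs are seen.

Published BLOSUM tables additionally scale scores to half-bit units and round them to integers. The names `count_pairs` and `blosum_scores` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # unordered residue pair, stored in sorted order


def count_pairs(block: List[str]) -> Counter:
    """Step 1: count unordered residue pairs within each column of an ungapped block."""
    counts: Counter = Counter()
    for column in zip(*block):  # zip(*rows) walks the block column by column
        for x, y in combinations(column, 2):  # every pair of rows, counted once
            counts[tuple(sorted((x, y)))] += 1
    return counts


def blosum_scores(block: List[str]) -> Dict[Pair, float]:
    """Steps 2 and 3: normalize pair counts and convert them to log-odds scores (bits)."""
    counts = count_pairs(block)
    total = sum(counts.values())
    # Step 2a: q_XY = observed frequency of the unordered pair {X, Y}
    q = {pair: n / total for pair, n in counts.items()}
    # Step 2b: p_X = q_XX + (1/2) * sum over Y != X of q_XY
    p: Counter = Counter()
    for (x, y), f in q.items():
        if x == y:
            p[x] += f
        else:
            p[x] += f / 2
            p[y] += f / 2
    # Step 3: s(X, Y) = log2(q_XY / e_XY), e_XY = p_X^2 if X == Y else 2 p_X p_Y
    scores: Dict[Pair, float] = {}
    for (x, y), f in q.items():
        e = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = math.log2(f / e)
    return scores


if __name__ == "__main__":
    column = ["A", "A", "B", "A", "C", "A"]  # the example column, one residue per row
    print(dict(count_pairs(column)))
    print(blosum_scores(column))
```

For the example column (A, A, B, A, C, A), `count_pairs` reproduces the counts above: 6 AA, 4 AB, 4 AC and 1 BC, for a total of 15. The frequencies come out as pA = 2/3 and pB = pC = 1/6. That gives s(A,A) = log2(0.9) ≈ -0.15 and s(A,B) = log2(1.2) ≈ 0.26.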
6
Computing probabilities

From http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
7
Computing probabilities

8
Computing probabilities

9
Example

From http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
10
Example

11
Example

12
Example

13
Comparison
• PAM is based on an evolutionary model
using phylogenetic trees
• BLOSUM assumes no evolutionary model;
it is derived instead from conserved “blocks” of proteins

14
Relative Entropy
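$$H = \sum_{i=1}^{20} \sum_{j=1}^{i} q_{ij} \, s(i,j)$$

where $q_{ij}$ is the observed frequency of the pair i, j and $s(i,j)$ is its log-odds score.
• Indicates the power of the scoring scheme to
distinguish true alignments from “background noise”
(i.e., randomness); H is in bits when s uses log base 2
• The expected score of a random alignment,
$E = \sum_{i=1}^{20} \sum_{j=1}^{20} p_i p_j \, s(i,j)$, should be negative
• Can use H to compare different scoring
matrices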
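H and the expected random-alignment score E can be computed directly from the frequencies. The sketch below uses a made-up two-letter alphabet chosen only to show the signs: H comes out positive and E comes out negative. The helper names `log_odds`, `relative_entropy` and `expected_score` are hypothetical. Scores are in bits, and pairs are stored as unordered, sorted tuples, as in the formulas above.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]  # unordered pair, residues stored in sorted order


def log_odds(q: Dict[Pair, float], p: Dict[str, float]) -> Dict[Pair, float]:
    """s(i, j) = log2(q_ij / e_ij), with e_ij = p_i^2 if i == j else 2 p_i p_j."""
    s: Dict[Pair, float] = {}
    for (i, j), f in q.items():
        e = p[i] * p[j] if i == j else 2 * p[i] * p[j]
        s[(i, j)] = math.log2(f / e)
    return s


def relative_entropy(q: Dict[Pair, float], s: Dict[Pair, float]) -> float:
    """H = sum over unordered pairs (j <= i) of q_ij * s(i, j), in bits."""
    return sum(f * s[pair] for pair, f in q.items())


def expected_score(p: Dict[str, float], s: Dict[Pair, float]) -> float:
    """E = sum over all ordered (i, j) of p_i p_j s(i, j); s must cover every pair."""
    total = 0.0
    for i, pi in p.items():
        for j, pj in p.items():
            total += pi * pj * s[tuple(sorted((i, j)))]
    return total


# Hypothetical two-letter alphabet, not real protein data.
p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.4, ("A", "B"): 0.2, ("B", "B"): 0.4}
s = log_odds(q, p)
H = relative_entropy(q, s)
E = expected_score(p, s)
print(f"H = {H:.3f} bits, E = {E:.3f} bits")
```

This prints `H = 0.278 bits, E = -0.322 bits`. H is never negative because it is a relative entropy. A negative E is what keeps local alignments of unrelated sequences from extending indefinitely.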
15
Equivalent PAM and BLOSUM
matrices (according to H)

• PAM100 ==> BLOSUM90
• PAM120 ==> BLOSUM80
• PAM160 ==> BLOSUM60
• PAM200 ==> BLOSUM52
• PAM250 ==> BLOSUM45

16
PAM versus BLOSUM

Source: http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
17
Superiority of BLOSUM for database
searches
(according to Henikoff and Henikoff)

18
