BLOSUM Matrices

BLOSUM matrices are substitution matrices derived from blocks of local sequence alignments with varying levels of identity. They provide scoring values for aligning amino acids in protein sequences. BLOSUM62, a common choice, is derived from blocks whose sequences are no more than 62% identical. The matrices are constructed by counting amino acid pairs in alignments, normalizing to probabilities, and calculating log-odds scores. Unlike PAM matrices, they do not assume an evolutionary model; they rely instead on conserved blocks in proteins.

Uploaded by

Raj Kumar Soni
Copyright
© All Rights Reserved

Alignment IV

BLOSUM Matrices

• Blocks Substitution Matrix. Scores for each position are obtained from frequencies of substitutions in blocks of local alignments of protein sequences [Henikoff & Henikoff 92].
• For example, BLOSUM62 is derived from sequence alignments with no more than 62% identity.
2
BLOSUM Scoring Matrices

• BLOcks SUbstitution Matrix
• Based on comparisons of blocks of sequences derived from the Blocks database
• The Blocks database contains multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins (local alignment versus global alignment)
• BLOSUM matrices are derived from blocks whose maximum percent identity corresponds to the BLOSUM matrix number

3
Conserved blocks in alignments

AABCDA...BBCDA
DABCDA.A.BBCBB
BBBCDABA.BCCAA
AAACDAC.DCBCDB
CCBADAB.DBBDCC
AAACAA...BBCCC

4
Constructing BLOSUM r
• To avoid bias in favor of a certain protein, first
eliminate sequences that are more than r%
identical
• The elimination is done by either
– removing sequences from the block, or
– finding a cluster of similar sequences and replacing it by
a new sequence that represents the cluster.
• BLOSUM r is the matrix built from blocks with no more than r% identity
– E.g., BLOSUM62 is the matrix built using sequences with no more than 62% identity.
– Note: BLOSUM62 is the default matrix for protein BLAST
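
The clustering step can be sketched in Python. This is only an illustration under stated assumptions: the helper names `percent_identity` and `cluster_block` are hypothetical, the segments are made-up toy strings, and single-linkage grouping with a strict "more than r%" test is one possible reading of the slide, not necessarily the exact procedure of Henikoff & Henikoff.

```python
def percent_identity(a, b):
    """Percent of positions with identical residues in two equal-length, ungapped segments."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_block(segments, r):
    """Group segments so that any two that are more than r% identical share a cluster."""
    clusters = []
    for seg in segments:
        merged, rest = [seg], []
        for cluster in clusters:
            # Single linkage (an assumption): one close member is enough to join.
            if any(percent_identity(seg, member) > r for member in cluster):
                merged.extend(cluster)
            else:
                rest.append(cluster)
        clusters = rest + [merged]
    return clusters


# Toy block: the first two segments are 5/6 = 83% identical, the third is at most 50%.
block = ["AABCDA", "AABCDB", "CCBADA"]
clusters = cluster_block(block, r=62)
print(clusters)
```

Each resulting cluster would then be reduced to a single contribution, either by dropping members or by substituting a representative sequence, as described in the bullets above.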

5
Collecting substitution statistics

1. Count amino acid pairs in each column; e.g., for a column containing A, A, B, A, C, A:
– 6 AA pairs, 4 AB pairs, 4 AC, 1 BC, 0 BB, 0 CC.
– Total = 6+4+4+1 = 15
2. Normalize results to obtain probabilities (pX's and qXY's)
3. Compute log-odds score matrix from probabilities:
s(X,Y) = log (qXY / (pX pY))
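
The three steps can be traced on the single example column with a short Python sketch. The function names are hypothetical, and two choices are assumptions rather than something the slide specifies: the logarithm is taken base 2, and each mixed pair count is split evenly between (X, Y) and (Y, X) so that qXY is a symmetric table and pX is simply its row sum.

```python
from collections import Counter
from itertools import combinations
from math import log2


def count_pairs(column):
    """Step 1: count unordered residue pairs in one alignment column."""
    counts = Counter()
    for a, b in combinations(column, 2):
        counts[tuple(sorted((a, b)))] += 1
    return counts


def pair_probabilities(counts):
    """Step 2a: normalize counts into a symmetric table q[(X, Y)]."""
    total = sum(counts.values())
    q = {}
    for (a, b), n in counts.items():
        if a == b:
            q[(a, a)] = n / total
        else:
            # Assumption: a mixed pair is shared equally by (X, Y) and (Y, X).
            q[(a, b)] = q[(b, a)] = n / (2 * total)
    return q


def residue_probabilities(q):
    """Step 2b: p[X] is the row sum of q, i.e. how often X takes part in a pair."""
    p = Counter()
    for (a, _), f in q.items():
        p[a] += f
    return dict(p)


def log_odds(q, p):
    """Step 3: s(X, Y) = log2(q_XY / (p_X * p_Y)) for every observed pair."""
    return {(a, b): log2(f / (p[a] * p[b])) for (a, b), f in q.items()}


counts = count_pairs("AABACA")  # the example column: A, A, B, A, C, A
q = pair_probabilities(counts)
p = residue_probabilities(q)
s = log_odds(q, p)
print(dict(counts))
print(p)
print(s)
```

Pairs that never occur in the column (BB and CC here) receive no score in this sketch, because the logarithm of zero is undefined; real matrices pool counts over many columns and blocks.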
6
Computing probabilities

[Slides 7 to 9: figures not recovered]
From http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
9
Example

[Slides 10 to 13: figures not recovered]
From http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
13
Comparison
• PAM is based on an evolutionary model
using phylogenetic trees
• BLOSUM assumes no evolutionary model,
but rather conserved “blocks” of proteins

14
Relative Entropy
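$$H = \sum_{i=1}^{20} \sum_{j=1}^{i} q_{ij}\, s(i,j)$$

• Here q_ij is the observed frequency of the pair (i, j), following Henikoff & Henikoff; weighting the scores by the background (expected) pair frequencies instead gives the expected score of a random alignment
• H indicates the power of a scoring scheme to distinguish true alignments from “background noise” (i.e., randomness)
• The expected score of a random alignment should be negative, whereas H itself is non-negative for unrounded log-odds scores
• Can use H to compare different scoring matrices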
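
A minimal numeric sketch of these two quantities, on a made-up two-letter alphabet. All numbers are invented for illustration, the names are hypothetical, and the sketch assumes the convention in which an unordered mixed pair has expected frequency 2·p_i·p_j and scores are base-2 log-odds.

```python
from math import log2

# Toy data (invented): observed frequencies of unordered pairs, j <= i.
q = {("A", "A"): 0.4, ("A", "B"): 0.2, ("B", "B"): 0.4}
# Background residue frequencies consistent with q.
p = {"A": 0.5, "B": 0.5}


def expected_freq(a, b):
    """Background frequency of an unordered pair (assumed convention)."""
    return p[a] * p[b] if a == b else 2 * p[a] * p[b]


# Log-odds scores in bits.
s = {pair: log2(f / expected_freq(*pair)) for pair, f in q.items()}

# Relative entropy: scores weighted by the observed pair frequencies.
H = sum(q[pair] * s[pair] for pair in q)

# Expected score of a random alignment: scores weighted by background frequencies.
E = sum(expected_freq(*pair) * s[pair] for pair in q)

print(f"H = {H:.3f} bits, expected random score = {E:.3f} bits")
```

In this toy case H comes out positive and the expected random score negative, which is the pattern a usable local-alignment scoring scheme should show.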
15
Equivalent PAM and BLOSUM
matrices (according to H)

• PAM100 ==> BLOSUM90
• PAM120 ==> BLOSUM80
• PAM160 ==> BLOSUM60
• PAM200 ==> BLOSUM52
• PAM250 ==> BLOSUM45

16
PAM versus BLOSUM

[Figure not recovered]
Source: http://www.csit.fsu.edu/~swofford/bioinformatics_spring05/lectures/lecture03-blosum.pdf
17
Superiority of BLOSUM for database
searches
(according to Henikoff and Henikoff)

18
