Scoring Matrices and The Statistical Significance of Molecular Sequence Features

This document provides instructions and exercises for analyzing molecular sequence features using scoring matrices and statistical significance. It includes generating PAM matrices from a web tool, comparing BLAST results using BLOSUM62 and PAM30 matrices, calculating alignment scores with different PAM matrices, determining the effect of database growth on distinguishing alignments from chance, calculating a scoring matrix from a phylogenetic tree, and demonstrating the units of the PAM250 matrix.

Uploaded by

Hilda Vebrina

Scoring matrices and the statistical significance of molecular sequence features

Exercises
Go to the web-site

www.bioinformaticslaboratory.nl/twiki/bin/view/BioLab/EducationBiomedicalSciences

In the exercises below we will make use of the spreadsheet ‘Exercise.xls’ which contains six tabs
(PAM10, PAM250, PAM500, Alignment, SwissProt Growth, Calculate Score Matrix). The
spreadsheet ‘Excercise+Answer.xls’ includes the answers to some of the questions below.

Exercise 1: Generation of PAM matrices


1. The web-site of the CMBI contains a tool to calculate PAM matrices. Go to
http://www.bioinformatics.nl/tools/pam.html and generate a PAM-120 matrix. Do you
understand the output?
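
As background for reading the tool's output: a PAM-n matrix is conventionally obtained by raising a PAM-1 mutation probability matrix to the n-th power and then converting the probabilities to log-odds scores. The sketch below illustrates this on a hypothetical two-letter alphabet. The names `toy_pam1` and `toy_freqs` and all numbers in them are invented for illustration, and whether the web tool follows exactly this procedure and scaling is an assumption.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pam_n(pam1, n):
    """Raise a PAM-1 mutation probability matrix to the n-th power."""
    size = len(pam1)
    # Start from the identity matrix (0 PAM: nothing has changed yet).
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result

def log_odds(mutation, freqs):
    """Convert mutation probabilities to scores in units of 10*log10."""
    size = len(freqs)
    return [[10 * math.log10(mutation[i][j] / freqs[j]) for j in range(size)]
            for i in range(size)]

# Hypothetical toy alphabet: two residue types with equal background frequency.
# Row i, column j holds P(residue i -> residue j) after 1 PAM (1% change).
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_freqs = [0.5, 0.5]

toy_pam120 = pam_n(toy_pam1, 120)
toy_scores = log_odds(toy_pam120, toy_freqs)
```

In this toy case the diagonal scores are positive and the off-diagonal scores negative, and the diagonal scores shrink towards zero as n grows, which is the same trend to look for when comparing real PAM matrices of increasing distance.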

Exercise 2: Comparison of PAM-30 and BLOSUM62 matrices


1. Retrieve the HBA (human) protein sequence from a protein database (Accession code
P01966).
2. Go to the NCBI standard protein BLAST site (blastp) and BLAST the HBA sequence against the
protein database. Use the default BLOSUM62 matrix (which is similar to the PAM-160 matrix).
3. Repeat the BLAST search with only the first 10 amino acids of the P01966 sequence, once with
the BLOSUM62 matrix and once with the PAM30 matrix. Can you explain the results? Compare
them with the results of the previous step of this exercise.
4. Repeat the BLAST search with the first 15 amino acids of P01966. Only use the BLOSUM62
matrix and make sure 'composition based statistics' is switched off. Can you explain the
results? Do you understand the output?

Exercise 3: Calculate alignment score with PAM matrices


1. In the Excel sheet the tab ‘Alignment’ contains an alignment of the first 10 amino acids of the
P01966 sequence with itself. The spreadsheet also contains the PAM-10, PAM-250 and PAM-500
matrices. Can you calculate the score for this alignment using each of these three PAM
matrices? Do you understand the result?
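
The score of an ungapped alignment is the sum of the substitution scores of its aligned residue pairs. A minimal sketch of that calculation follows. The function name `alignment_score`, the three-letter example sequences and the values in `toy_matrix` are hypothetical and are not taken from any real PAM matrix; substitute the values from the spreadsheet tabs to work on the actual exercise.

```python
def alignment_score(seq_a, seq_b, matrix):
    """Sum substitution scores over an ungapped alignment of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment needs sequences of equal length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        # The matrix is symmetric, so a pair may be stored in either order.
        total += matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
    return total

# Hypothetical toy scores for three residue types (illustration only).
toy_matrix = {
    ("A", "A"): 2, ("A", "L"): -2, ("A", "V"): 0,
    ("L", "L"): 6, ("L", "V"): 2,
    ("V", "V"): 4,
}

self_score = alignment_score("VLA", "VLA", toy_matrix)   # diagonal entries only
other_score = alignment_score("VLA", "LVA", toy_matrix)  # two substitutions
```

For a sequence aligned with itself only the diagonal entries of the matrix contribute, so the differences between the three PAM matrices come entirely from how their diagonal values change with evolutionary distance.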

Exercise 4: Significance of database searches


1. We have seen that searching the SwissProt database (40,000,000 residues) requires about
33 bits of information to distinguish an alignment from chance (assuming a typical query
protein of 250 residues). The exercise Excel sheet (tab ‘SwissProt Growth’) shows that
SwissProt grows by a factor of about 1.2 each year. What is the effect of this
growth on the number of residues we need to distinguish an alignment from chance?
Calculate this for PAM-10, PAM-120 and PAM-250.
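
One way to set up this calculation is sketched below, assuming the 33-bit figure comes from log2(m x n), with m the query length and n the number of residues in the database. The bits-per-position values in `assumed_entropy` are approximate relative entropies often quoted for these matrices; they are an assumption here and should be checked against the course material. All function and variable names are illustrative.

```python
import math

def bits_needed(query_len, db_residues):
    """Bits required to lift an alignment above chance: log2(m * n)."""
    return math.log2(query_len * db_residues)

def bits_after_growth(query_len, db_residues, growth, years):
    """Same quantity after the database has grown by `growth` per year."""
    return bits_needed(query_len, db_residues * growth ** years)

def min_alignment_length(bits, bits_per_position):
    """Aligned positions needed if each contributes bits_per_position on average."""
    return bits / bits_per_position

now = bits_needed(250, 40_000_000)
in_ten_years = bits_after_growth(250, 40_000_000, 1.2, 10)
extra_bits = in_ten_years - now

# Assumed average information per aligned position, in bits (check these values).
assumed_entropy = {"PAM-10": 3.43, "PAM-120": 0.98, "PAM-250": 0.36}
lengths_now = {name: min_alignment_length(now, h)
               for name, h in assumed_entropy.items()}
```

Because the database size enters through a logarithm, each year of growth by a factor of 1.2 adds only log2(1.2), about 0.26 bits, to the requirement. The number of extra residues this costs depends on how many bits per position the chosen matrix delivers.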

Exercise 5: Calculation of a scoring matrix

1. In this exercise you will calculate your own PAM-1 matrix from a phylogenetic tree. Open the
Excel file and try to do it.

Exercise 6: PAM250 units


A correction to the alignment score must be made, since PAM250 scores are not expressed in bits
but as logarithms to the base 10, multiplied by 10. One PAM250 unit therefore corresponds to
about 1/3 bit. Can you demonstrate this?
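
A short numerical check of the unit conversion, as a sketch: a score s in PAM250 units stands for an odds ratio of 10^(s/10), and the same ratio expressed in bits is its base-2 logarithm. The function name `pam250_units_to_bits` is illustrative.

```python
import math

def pam250_units_to_bits(score):
    """Convert a score in units of 10*log10(odds ratio) to bits."""
    odds_ratio = 10 ** (score / 10)  # undo the 10*log10 scaling
    return math.log2(odds_ratio)     # express the same ratio in bits

# Equivalent closed form: 1 / (10 * log10(2)).
bits_per_unit = pam250_units_to_bits(1)
```

The value of `bits_per_unit` is about 0.332, so dividing a PAM250 score by 3 gives a close approximation of the score in bits.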
