Bioinformatics Module 2 Notes

Uploaded by

crizjames1096

5/19/2024

• The PAM matrices for amino acids, along with the single letter
abbreviations used for genetically encoded amino acids, were
developed by Margaret Dayhoff.


PAM
🞂 A point accepted mutation (PAM), also expanded as
percent accepted mutation, is the replacement of a
single amino acid in the primary structure of
a protein with another single amino acid, which is
accepted by the processes of natural selection.
🞂 These mutations were identified by comparing
highly similar sequences with at least 85% identity.
🞂 A PAM matrix is a matrix where each column and
row represents one of the twenty standard amino
acids.


• PAM also defines a unit of evolutionary time: 1 PAM is the time in which,
on average, 1 in 100 amino acids is expected to undergo an accepted mutation.
• The PAM1 probability matrix shows the probability of the amino acid
at row i being replaced by the amino acid at column j over 1 PAM.


• PAM250 probability matrix, describing the replacement probabilities
given 250 PAM units of time.


PAM
🞂 Each entry indicates the likelihood of the amino acid
of that row being replaced with the amino acid of
that column through a series of one or more point
accepted mutations during a specified evolutionary
interval, rather than these two amino acids being
aligned due to chance.
🞂 Different PAM matrices correspond to different
lengths of time in the evolution of the protein
sequence.


PAM Matrices
🞂 PAM matrices are amino acid substitution matrices that
encode the expected evolutionary change at the amino
acid level.
🞂 Each PAM matrix is designed to compare two sequences
which are a specific number of PAM units apart.
🞂 One PAM unit is the amount of evolution that changes, on
average, 1% of the amino acid positions.
🞂 Two sequences S1 and S2 are at an evolutionary distance of 1
PAM unit if S1 has converted to S2 with an average of one
accepted amino acid substitution per 100 amino acids.
🞂 250 PAM = 250 accepted mutations per 100 amino acids, that is, 2.5
accepted mutations per amino acid on average. Because the same
position can mutate more than once, the sequences still share
about 20% identity.


PAM Matrices
🞂 When used for protein comparison, each mutation
probability is divided by the frequency expected by
chance (giving an odds ratio) and the logarithm is
taken. This lets us add the scores along a protein
instead of multiplying the probabilities. The
resulting matrix is the "log-odds" matrix, known
as the PAM matrix.
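
The log-odds step can be sketched roughly as follows. This is a minimal illustration, not Dayhoff's actual data: the three-letter alphabet, the probabilities and the background frequencies are made up, and the 10 x log10 scaling with rounding to integers is assumed here because it is the scaling usually quoted for PAM matrices.

```python
import math

# Hypothetical 3-letter alphabet with invented background frequencies.
letters = ["A", "B", "C"]
freq = {"A": 0.5, "B": 0.3, "C": 0.2}

# prob[i][j]: invented probability that letter i is replaced by letter j.
# Each row sums to 1.
prob = {
    "A": {"A": 0.80, "B": 0.12, "C": 0.08},
    "B": {"A": 0.20, "B": 0.70, "C": 0.10},
    "C": {"A": 0.20, "B": 0.15, "C": 0.65},
}


def score(i, j):
    """Log-odds score: replacement probability versus chance, scaled and rounded."""
    odds = prob[i][j] / freq[j]
    return round(10 * math.log10(odds))


def alignment_score(s1, s2):
    """Score an ungapped alignment by adding the per-position log-odds scores."""
    return sum(score(x, y) for x, y in zip(s1, s2))


log_odds = {i: {j: score(i, j) for j in letters} for i in letters}
for i in letters:
    print(i, [log_odds[i][j] for j in letters])
print(alignment_score("ABC", "ABA"))
```

Identities come out positive and substitutions negative in this toy matrix, and the score of an alignment is simply the sum of the entries for its aligned pairs.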


PAM Series
🞂 There is a whole series of matrices: PAM10, ...,
PAM250.
🞂 These matrices are extrapolated from the PAM1 matrix
by matrix multiplication: the PAM n probability
matrix is PAM1 raised to the n-th power.
🞂 The PAM120 score matrix is designed to compare
sequences that are 120 PAM units apart:
the score it gives a pair of sequences is the log of
the odds that the sequences are related by
120 PAM units of evolution rather than aligned by chance.
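
The extrapolation by matrix multiplication can be sketched as below. The 3 x 3 "PAM1" matrix and the frequencies are invented for illustration (chosen so that 1% of positions change per step); a real PAM1 matrix is 20 x 20 and comes from observed data.

```python
# Hypothetical background frequencies and PAM1-like matrix for a toy
# 3-letter alphabet. pam1[i][j] is the probability that letter i is
# replaced by letter j in one PAM unit; each row sums to 1.
freq = [0.5, 0.25, 0.25]
pam1 = [
    [0.992, 0.006, 0.002],
    [0.012, 0.984, 0.004],
    [0.004, 0.004, 0.992],
]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def unchanged(m):
    """Expected fraction of positions showing the same letter before and after."""
    return sum(f * m[i][i] for i, f in enumerate(freq))


pam250 = mat_pow(pam1, 250)
print("unchanged after 1 PAM:  ", unchanged(pam1))
print("unchanged after 250 PAM:", unchanged(pam250))
```

Because a position can change more than once, or change back, the fraction of unchanged positions after 250 steps stays well above zero even though 250 PAM corresponds to 2.5 accepted mutations per position on average.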


PAM Series
🞂 For any specific pair (Ai, Aj) of amino acids, the (i,j)
entry in the PAM n matrix reflects the frequency at
which Ai is expected to be replaced by Aj in two
sequences that have diverged by n PAM units. These
frequencies are estimated by gathering
statistics on replaced amino acids.


PAM 100


Creation of a PAM matrix


1. Construct a multiple sequence alignment.
2. From the alignment, construct a phylogenetic tree.
3. For each amino acid type, calculate the frequency with
which it is substituted by every other amino acid
(Fij).
4. Compute the mutability, mi, of each
amino acid.
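
Steps 3 and 4 can be sketched on toy data as follows. The counts and the three-letter alphabet are invented. The final scaling to 1% change, which turns the counts into a PAM1-like probability matrix, follows the usual textbook description of Dayhoff's procedure and is an assumption here, since these notes stop at step 4.

```python
# Toy data, not real protein statistics.
letters = ["A", "B", "C"]
# Fij: symmetric counts of accepted substitutions between each pair.
pair_counts = {("A", "B"): 30, ("A", "C"): 10, ("B", "C"): 10}
# Number of times each letter occurs in the aligned data.
occurrences = {"A": 5000, "B": 2500, "C": 2500}


def count(i, j):
    """Look up the substitution count for an unordered pair of letters."""
    return pair_counts.get((i, j), pair_counts.get((j, i), 0))


total = sum(occurrences.values())
freq = {a: occurrences[a] / total for a in letters}

# Step 3: how often each letter is substituted by any other letter.
changes = {a: sum(count(a, b) for b in letters if b != a) for a in letters}

# Step 4: mutability mi = substitutions involving i / occurrences of i.
mutability = {a: changes[a] / occurrences[a] for a in letters}

# Assumed normalisation: scale so that, on average, 1% of positions change.
scale = 0.01 / sum(freq[a] * mutability[a] for a in letters)

pam1 = {}
for a in letters:
    row = {b: scale * mutability[a] * count(a, b) / changes[a]
           for b in letters if b != a}
    row[a] = 1.0 - scale * mutability[a]  # probability of no change
    pam1[a] = row

for a in letters:
    print(a, [round(pam1[a][b], 4) for b in letters])
```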


Problems with PAM


🞂 Not all positions in a sequence are equally mutable.
🞂 Evolutionary rates vary greatly within a sequence.
🞂 The environment changes over evolutionary time.
🞂 It is difficult to determine ancestral relationships
among sequences.


BLOSUM
🞂 BLOSUM stands for BLOcks SUbstitution Matrix.
🞂 BLOSUM matrices were first introduced by Steven Henikoff and
Jorja Henikoff.
🞂 Only blocks of amino acid sequences with small
change between them are considered. These blocks
are called conserved blocks.
🞂 The matrices are based on local alignments.


BLOSUM
🞂 The Blocks database contains multiply aligned
ungapped segments corresponding to the most
highly conserved regions of proteins (local alignment
versus global alignment).
🞂 The Blocks database contains sequences at many different
evolutionary distances.


BLOSUM
🞂 In each alignment, sequences whose percent identity is at or above
a threshold value are clustered into groups and their contributions
averaged.
🞂 Different BLOSUM matrices differ in the % sequence identity
used in clustering.
🞂 Therefore, BLOSUM62 means that sequences at least 62% identical
were clustered, so the matrix reflects substitutions between
sequences that are less than about 62% identical.
🞂 The number in a BLOSUM matrix name is the clustering
threshold applied to the blocks it was derived from.
🞂 BLOSUM62 represents closer sequences than BLOSUM45.
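
The clustering idea can be sketched as below. The segments are invented, and the single-linkage rule (a segment joins a cluster if it reaches the threshold against any member) is an assumption made for this sketch rather than something stated in these notes.

```python
def percent_identity(a, b):
    """Percent of positions with the same letter in two ungapped, equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments in a block must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster(segments, threshold):
    """Group segment indices so that linked segments share >= threshold % identity."""
    clusters = []
    for idx, seg in enumerate(segments):
        # Existing clusters that contain at least one close enough member.
        linked = [c for c in clusters
                  if any(percent_identity(seg, segments[j]) >= threshold for j in c)]
        merged = [idx]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)


# Hypothetical block of four ungapped segments.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHAAV", "MNPQRSTVWY"]
print(cluster(block, 80))
print(cluster(block, 85))
```

Raising the threshold leaves more, smaller clusters, so more of the closely related pairs still contribute to the counts; this is why a higher BLOSUM number corresponds to more closely related sequences.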


Construction of BLOSUM
Step 0: Eliminate the sequences that are more than r%
identical.

Construction of BLOSUM
🞂 Step 3: Count the observed frequency of each amino acid
pair.
◦ ABobs = 8/60
🞂 Step 4: Calculate the expected frequency of each amino acid
pair from the individual amino acid frequencies (here 14/24 for A
and 4/24 for B).
◦ ABexp = (14/24 × 4/24) × 2
= 112/576
● The factor 2 is needed because the ancestral states are not known, so
both substitutions AB and BA are considered equiprobable.
🞂 Step 5: Calculate the log-odds ratio.
◦ 2 log2(ABobs/ABexp) = 2 log2((8/60)/(112/576))
= -1.09


Construction of BLOSUM
Pair   Observed (O)   Expected (E)   2log2(O/E)
AA     26/60          196/576         0.70
AB     8/60           112/576        -1.09
AC     10/60          168/576        -1.61
BB     3/60           16/576          1.70
BC     6/60           48/576          0.53
CC     7/60           36/576          1.80
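
The table values can be checked with a short script. The pair counts (out of 60) and the letter frequencies (A = 14/24, B = 4/24, C = 6/24) are read off the fractions in the worked example; the variable and function names are only illustrative.

```python
from fractions import Fraction
from math import log2

# Letter frequencies and observed pair counts from the worked example.
freq = {"A": Fraction(14, 24), "B": Fraction(4, 24), "C": Fraction(6, 24)}
pair_counts = {
    ("A", "A"): 26, ("A", "B"): 8, ("A", "C"): 10,
    ("B", "B"): 3, ("B", "C"): 6, ("C", "C"): 7,
}
total_pairs = sum(pair_counts.values())  # 60


def observed(x, y):
    """Observed frequency of the unordered pair (x, y)."""
    return Fraction(pair_counts[(x, y)], total_pairs)


def expected(x, y):
    """Expected frequency by chance; doubled for x != y to cover both orders."""
    e = freq[x] * freq[y]
    return e if x == y else 2 * e


def log_odds(x, y):
    """Score 2 * log2(O/E)."""
    return 2 * log2(float(observed(x, y) / expected(x, y)))


for (x, y) in pair_counts:
    print(x + y, observed(x, y), expected(x, y), round(log_odds(x, y), 2))
```

Pairs seen more often than chance predicts (AA, BB, BC, CC) get positive scores, and pairs seen less often (AB, AC) get negative scores.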


BLOSUM Matrices
🞂 No extrapolation is made in going to greater
evolutionary distances: each matrix is derived directly
from observed alignments.
🞂 High number: closely related sequences.
🞂 Low number: distant sequences.
🞂 BLOSUM62 is the most popular and a common default for general
alignment.


PAM VS BLOSUM
PAM | BLOSUM
PAM matrices are used to score alignments between closely related protein sequences. | BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences.
Based on global alignments. | Based on local alignments.
Alignments have higher similarity than BLOSUM alignments. | Alignments have lower similarity than PAM alignments.
Mutations in global alignments are very significant. | Based on highly conserved stretches of alignments.
Higher numbers in the PAM matrix naming denote greater evolutionary distance. | Higher numbers in the BLOSUM matrix naming denote higher sequence similarity and smaller evolutionary distance.
Useful at short evolutionary distances (PAM10 to PAM120). | More effective at long evolutionary distances, for example PAM250 or 20% identity.
Example: PAM250 is used for more distant sequences than PAM120. | Example: BLOSUM80 is used for more closely related sequences than BLOSUM62.
