2-Substitution Matrices and Python - 2017
2-Substitution Matrices and Python - 2017
+
Table of contents
• Scoring matrices
– PAM
– BLOSUM
• Intro to Python
Aligning Protein Sequences
• Classification
• Clustering of families
• Annotations (functional and structural)
Aligning Protein Sequences
• Proteins consist of 20 amino acids.
ACG
ATG
Mutational Distance
Assume we start with Methionine, which is
encoded by a single codon: ATG
Thr (Threonine) is encoded by AC[ACGT]
In order to mutate Met to Thr, one SNP (single
nucleotide point) mutation is enough
• 3 point mutations are required to mutate Met
to His - encoded by CA[TC]
• Therefore, His is more distant to Met.
Amino acids’ chemical properties
• Size
• Structure
• Polarity
• Charge
• Acidity (pKa)
These properties affect
mutation probabilities
Amino acids’ chemical properties
Mutations which
change functionality
(chemical properties)
of the protein, should
be less likely to occur.
Evolutionary time
• Time is another aspect which needs attention.
• Does longer time permits less or more
mutation?
• How can that be included in the scoring
system ?
Evolutionary Substitution Matrix
• A substitution matrix contains values
proportional to the probability that amino acid
mutates into amino acid for all pairs of amino
acids.
Evolutionary Substitution Matrix
• A substitution matrix contains values proportional to
the probability that amino acid mutates into amino
acid for all pairs of amino acids.
• Based on empirical observations
• Assumption: frequent substitutions reflect “safe
variations” and thus should be given a higher score,
while infrequent mutations are probably detrimental
and thus should be given lower score.
• The two major types of substitution matrices are PAM
and BLOSUM.
PAM Matrices
PAM – Percent/Point Accepted Mutations.
• The first widely used
scoring scheme used for
amino acid alignment.
• Devised by Margaret
Oakley Dayhoff and Co.
in 1978.
PAM – point accepted mutation
• Substitution of an amino acid in a protein with
another amino acid, which is accepted by the
process of natural selection.
• Silent or lethal mutations are not point
accepted mutations
PAM Matrices
• PAM matrices are noted as PAMn matrices
• PAM1 represents the time period over which
we expect 1% of the amino acids to undergo
point accepted mutations
Constructing PAM Matrices
• Examined 1572 substitutions in 71 families of
proteins (71 phylogenetic trees)
• The proteins sequences were at least 85%
identical
Constructing PAM Matrices
• Calculating - the amount of observed cases
when amino acid mutated to amino acid .
Constructing PAM Matrices
• - the amount of observed cases when amino acid
mutated to amino acid
• is the number amino acid appearances
• is a constant
• s[1:4]
– 'ell' -- chars starting at index 1 and extending up to but not including index 4
• s[1:]
– 'ello' -- omitting either index defaults to the start or end of the string
• s[:]
– 'Hello' -- omitting both always gives us a copy of the whole thing (this is the pythonic way to copy a
sequence like a string or list)
• s[1:100]
– 'ello' -- an index that is too big is truncated down to the string length
• s[-1]
– 'o' -- last char (1st from the end)
• s[-3:]
– 'llo' -- starting with the 3rd char from the end and extending to the end of the string.
If statement
if speed >= 80:
print 'License and registration please'
if mood == 'terrible' or speed >= 100: Indentation is very
print 'You have the right to remain silent.' important!
elif mood == 'bad' or speed >= 90:
print "I'm going to have to write you a ticket."
write_ticket()
else:
print "Let's try to keep it under 80 ok?"
f = open(“testfile.txt”,”w”)
f.write(“Hello World”)
f.write(“This is our new text file”)
f.write(“and this is another line.”)
f.write(“Why? Because we can.”)
f.close()
Functions
# Function definition is here
def print_info( name, age = 35 ):
print "Name: ", name
print "Age ", age
• Python 3
print('Hello, World!')
print("some text,", end="")
print(' print more text on the same line')
Python 2 vs. Python 3
• Many prefer to use Python 2 because of larger
library support
• Coding style is very similar – not hard to
transition from Python 2 to Python 3
• In the assignment you will use Python 2
Next practical session
• Blast
• Fasta