Lecture 9 Scoring Matrices

The document discusses scoring matrices in bioinformatics, focusing on PAM and BLOSUM matrices for sequence alignment. It highlights the differences between these matrices, their applications, and the importance of using appropriate scoring for evolutionary analysis. Additionally, it outlines an assignment related to the Pyruvate Decarboxylase Gene, emphasizing the use of bioinformatics tools for genetic analysis.

Uploaded by

aditya23045

Practical Bioinformatics

Lecture 9
Scoring Matrices
Which PAM matrix to use?
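Human beta globin (NP_000509.1) and chimp beta globin (XP_508242.1): 100% amino
acid identity. A low-numbered PAM matrix fits such closely related sequences; in
it, mismatches are assigned large negative scores.

Human beta globin and alpha globin: divergent, so a higher-numbered PAM matrix
is the better fit.

Most broadly useful: a general-purpose scoring matrix such as BLOSUM62.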
Twilight Zone
Dayhoff Model (Step 6): Mutation Probability to Odds

Odds ratio = 1: the substitution occurs as often as expected by chance.

Odds ratio > 1: alignment of the two residues occurs more often than expected by
chance (e.g., a conservative substitution of serine for threonine).

Odds ratio < 1: the alignment is not favored.
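As a rough illustration of this step, the sketch below divides a mutation probability by the background frequency of the replacing residue to get an odds ratio, then reads it against the three cases above. The probabilities and frequencies are made-up values, not entries from Dayhoff's tables, and the helper names odds_ratio and interpret are hypothetical.

```r
# Odds ratio: observed substitution probability / chance expectation.
# m_ij : probability that residue j is replaced by residue i (made-up value)
# f_i  : background frequency of residue i (made-up value)
odds_ratio <- function(m_ij, f_i) m_ij / f_i

# Map an odds ratio onto the three cases listed on the slide.
interpret <- function(r) {
  if (isTRUE(all.equal(r, 1))) {
    "as often as expected by chance"
  } else if (r > 1) {
    "more often than expected by chance"
  } else {
    "not favored"
  }
}

r_conservative <- odds_ratio(m_ij = 0.12, f_i = 0.06)  # hypothetical conservative swap
r_rare         <- odds_ratio(m_ij = 0.01, f_i = 0.05)  # hypothetical unfavorable swap

print(c(conservative = interpret(r_conservative), rare = interpret(r_rare)))
```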


Dayhoff Model (Step 7): log Odds as the score
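A minimal sketch of this conversion, assuming the Dayhoff convention of taking 10 times the base-10 logarithm of the odds ratio and rounding to the nearest integer. The three odds ratios are illustrative values, and log_odds_score is a hypothetical helper name.

```r
# Dayhoff-style score: 10 * log10(odds ratio), rounded to an integer.
log_odds_score <- function(odds) round(10 * log10(odds))

# Illustrative odds ratios: chance level, favored, and disfavored.
odds   <- c(neutral = 1, favored = 2, disfavored = 0.2)
scores <- log_odds_score(odds)

print(scores)
```

Because the score is a logarithm, an odds ratio of 1 maps to 0, ratios above 1 give positive scores, and ratios below 1 give negative scores. Scores of aligned positions can then be added, which corresponds to multiplying the underlying odds.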
Relatedness

IS symmetric
Salient Differences between PAM and Relatedness
PAM Matrix (Asymmetric)

● The PAM matrix P is a Markov transition matrix that describes the
probability of one amino acid changing into another over a given evolutionary
timescale.
● The elements of P, say P(A→B), represent the probability of amino acid A
mutating into B.
● Due to unequal amino acid frequencies and directional mutation rates,
P(A→B)≠P(B→A), making P asymmetric.
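The asymmetry can be checked on a toy example. The sketch below builds a small transition matrix P from made-up symmetric substitution weights and unequal background frequencies, then divides each entry by the frequency of the target residue, which is the Dayhoff-style odds normalization. The three-letter alphabet, the weights J, and the frequencies f are all assumptions chosen for illustration.

```r
# Toy three-letter alphabet; every number here is made up.
aa <- c("A", "B", "C")
f  <- c(0.5, 0.3, 0.2)                      # background frequencies, sum to 1

# Symmetric joint substitution weights between distinct residues.
J <- matrix(c(0,     0.010, 0.004,
              0.010, 0,     0.006,
              0.004, 0.006, 0),
            nrow = 3, byrow = TRUE, dimnames = list(aa, aa))

# Transition probabilities: P[a, b] = P(a -> b) = J[a, b] / f[a].
P <- sweep(J, 1, f, "/")
diag(P) <- 1 - rowSums(P)                   # each row must sum to 1

# Relatedness (odds): divide by the frequency of the target residue b.
R <- sweep(P, 2, f, "/")

print(round(P, 4))
print(isSymmetric(P))                       # unequal frequencies break symmetry
print(isTRUE(all.equal(R, t(R))))           # normalization restores it
```

In this construction P(A→B) and P(B→A) differ only because A and B have different background frequencies; dividing by the target frequency cancels that difference, which is why the relatedness matrix comes out symmetric.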
Salient Differences between PAM and Relatedness
Construction of the Relatedness Matrix (Symmetric)

● The relatedness matrix is derived from the PAM matrix using the
Using scores to align sequences

Using PAM 250

Intuition:
How will we use today’s learning?
pam100 <- read.table(system.file("matrices/pam/pam100", package = "seqinr"),
as.is = TRUE)

print(pam100)

seq1 <- "HEAGAWGHEE"

seq2 <- "PAWHEAE"


Example Alignment using PAM
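# Biostrings is a Bioconductor package, so it is installed through BiocManager.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biostrings")

library(Biostrings)

# read.table() returns a data frame; pairwiseAlignment() expects a numeric matrix
# whose row and column names are the amino acid letters.
# Note: in recent Bioconductor releases pairwiseAlignment() may live in the pwalign package.
alignment <- pairwiseAlignment(pattern = seq1, subject = seq2,
                               substitutionMatrix = as.matrix(pam100))

print(alignment)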
BLOSUM: Henikoff and Henikoff (1992, 1996)
BLOCKS database: over 500 groups of local multiple alignments (blocks) of
distantly related protein sequences.

Focused on conserved regions (blocks) of proteins that are distantly related to
each other.

Henikoffs’ score

General form of
substitution matrices
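A small sketch of a Henikoff-style log-odds score, assuming the BLOSUM convention of twice the base-2 logarithm of observed over expected pair frequency (half-bit units), rounded to an integer. The pair counts and the two-letter alphabet are made up for illustration and do not come from the BLOCKS database.

```r
# Observed counts of aligned residue pairs (made-up, two-letter alphabet A/S).
pair_counts <- c(AA = 60, AS = 10, SS = 30)
q <- pair_counts / sum(pair_counts)          # observed pair frequencies q_ij

# Background frequency of each residue implied by those pairs.
p_A <- q[["AA"]] + q[["AS"]] / 2
p_S <- q[["SS"]] + q[["AS"]] / 2

# Expected pair frequencies if residues were paired at random.
e <- c(AA = p_A^2, AS = 2 * p_A * p_S, SS = p_S^2)

# Log-odds score in half-bit units, rounded to the nearest integer.
s <- round(2 * log2(q / e))

print(s)
```

Pairs seen more often than chance predicts (here the identities) receive positive scores, and the under-represented A/S pair receives a negative one, matching the general observed-over-expected form of substitution matrices.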
BLOSUM62
Default scoring matrix for the BLAST protein search programs at NCBI

Merges all proteins in an alignment that have 62% amino acid identity or greater
into one sequence.

E.g., if aligned globin orthologs in a block share 62, 80, and 95% amino acid
identity, they are all weighted (grouped) as one sequence.

Useful for scoring proteins that share less than 62% identity, because the matrix
is weighted more heavily by proteins that share less than 62% identity.
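The grouping step can be sketched as follows. This is a simplified illustration, not the actual BLOCKS pipeline: it assumes a gap-free block of equal-length sequences, measures pairwise percent identity, and uses single-linkage clustering so that a sequence joins a group when it is at least 62% identical to any member. The sequences and the helper percent_identity are made up.

```r
# Toy gap-free block of aligned sequences (made-up, equal length).
block <- c(s1 = "ACDEFGHIKL",
           s2 = "ACDEFGHIKV",   # differs from s1 at one position
           s3 = "ACDEFGWWWW",   # differs from s1 at four positions
           s4 = "MNPQRSTVWY")   # unrelated sequence

# Percent identity between two aligned sequences of equal length.
percent_identity <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  100 * mean(x == y)
}

# Pairwise identity matrix for the whole block.
n  <- length(block)
id <- outer(seq_len(n), seq_len(n),
            Vectorize(function(i, j) percent_identity(block[[i]], block[[j]])))
dimnames(id) <- list(names(block), names(block))

# Single-linkage clustering on distance = 1 - identity, cut at the 62% threshold.
tree     <- hclust(as.dist(1 - id / 100), method = "single")
clusters <- cutree(tree, h = 1 - 0.62)

print(id)
print(clusters)
```

Sequences that land in the same cluster would then be down-weighted so that the cluster counts as a single sequence when pair frequencies are tallied.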
BLOSUM 62
Summary of Henikoffs’ Paper
● BLOSUM performed dramatically better than PAM matrices
● Especially useful for identifying weakly scoring alignments
● BLOSUM62 performed slightly better than BLOSUM60 or
BLOSUM70
● BLOSUM50 and BLOSUM90 are other commonly used scoring
matrices in BLAST searches.
● The FASTA family of sequence comparison programs uses
BLOSUM50 as a default
Salient differences between PAM and BLOSUM
1. PAM: explicit evolutionary model (i.e., replacements are counted on the
branches of a phylogenetic tree). BLOSUM: no phylogenetic tree.
2. PAM: global alignment matrix that includes both highly conserved and highly
mutable regions. BLOSUM: only highly conserved regions, in a series of
alignments forbidden to contain gaps.
3. BLOSUM: relatedness is contextual to the specific group of sequences.
4. PAM: higher numbers in the name denote larger evolutionary distance.
BLOSUM: a higher number implies higher sequence similarity and
therefore smaller evolutionary distance.
Notice the sequence
Assignment 2
● Background: the Pyruvate Decarboxylase (PDC) gene is key for increasing CO2 production
by yeast and may have a role in improving bread leavening and winemaking
● Access the NCBI database (https://www.ncbi.nlm.nih.gov/) and download the DNA
sequence of the Saccharomyces cerevisiae PDC gene.
● Use BLAST (Basic Local Alignment Search Tool) to compare the PDC gene
sequence across different Saccharomyces cerevisiae strains and identify SNPs
● Use a translation tool (e.g., ExPASy Translate tool) to translate the normal and
SNP-containing PDC gene sequences into amino acid sequences
● Select a SNP within the PDC gene. Utilize tools like SIFT (Sorting Intolerant From
Tolerant) or PolyPhen (Polymorphism Phenotyping) to predict the impact of the
SNP on the PDC protein function
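Once two aligned coding sequences are in hand, locating a SNP and the codon it falls in can be sketched in a few lines of base R. The two short sequences below are made up for illustration and are not retrieved from NCBI; the sketch also assumes both sequences are in frame, have equal length, and contain no gaps.

```r
# Hypothetical reference and variant coding sequences (same length, in frame).
ref <- "ATGTCTGAAATTACT"
alt <- "ATGTCTGAGATTACT"

ref_nt <- strsplit(ref, "")[[1]]
alt_nt <- strsplit(alt, "")[[1]]

# 1-based nucleotide positions where the two sequences differ.
snp_pos <- which(ref_nt != alt_nt)

# Codon number containing each SNP, and the codon before and after the change.
codon_index <- ceiling(snp_pos / 3)
codon_start <- 3 * (codon_index - 1) + 1
ref_codon   <- substring(ref, codon_start, codon_start + 2)
alt_codon   <- substring(alt, codon_start, codon_start + 2)

print(data.frame(snp_pos, codon_index, ref_codon, alt_codon))
```

The reference and variant codons can then be passed to a translation tool to see whether the amino acid changes before the SNP is submitted to SIFT or PolyPhen.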
Evaluation Criteria
● Understanding of the biological concepts and bioinformatics tools.
● Accuracy in data retrieval and analysis.
● Depth of analysis in the impact of SNPs on protein function.
● Quality and clarity of writing.
● Proper citation of sources and presentation of data.
