Lecture 9 Scoring Matrices

The document discusses scoring matrices in bioinformatics, focusing on PAM and BLOSUM matrices for sequence alignment. It highlights the differences between these matrices, their applications, and the importance of using appropriate scoring for evolutionary analysis. Additionally, it outlines an assignment related to the Pyruvate Decarboxylase Gene, emphasizing the use of bioinformatics tools for genetic analysis.

Uploaded by

aditya23045

Practical Bioinformatics

Lecture 9
Scoring Matrices
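Which PAM matrix to use?
Human beta globin (NP_000509.1) and chimp beta globin (XP_508242.1): 100% amino acid identity. Closely related sequences call for a low-numbered PAM matrix, in which mismatches are assigned large negative scores.

Human beta globin and alpha globin: divergent. Distantly related sequences call for a high-numbered PAM matrix, in which mismatches are penalized less severely.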

Most broadly useful scoring matrix: one such as BLOSUM62
Twilight Zone
Dayhoff Model (Step 6): Mutation Probability to Odds

= 1: Substitution occurs as often as expected by chance.

> 1: Alignment of two residues occurs more often than expected by chance (e.g., a conservative substitution of serine for threonine).

< 1: Alignment is not favored.
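
The three cases above can be sketched in R as a simple ratio: the probability of a substitution divided by the background frequency of the replacing residue. The two input numbers and the helper name `interpret_odds` are assumptions chosen for illustration, not values from Dayhoff's tables.

```r
# Hypothetical values for illustration only (not taken from Dayhoff's tables).
mutation_prob   <- 0.09  # assumed probability that a serine is replaced by threonine
background_freq <- 0.06  # assumed frequency of threonine, i.e. the chance expectation

# Odds: how much more (or less) often the substitution occurs than by chance.
odds <- mutation_prob / background_freq

# Map an odds value onto the three cases listed on the slide.
interpret_odds <- function(r) {
  if (r > 1) {
    "more often than expected by chance"
  } else if (r < 1) {
    "not favored"
  } else {
    "as often as expected by chance"
  }
}

print(odds)
print(interpret_odds(odds))
```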

Dayhoff Model (Step 7): log Odds as the score
Relatedness

IS symmetric
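
A minimal sketch of Step 7 in R, assuming a toy three-letter alphabet with made-up pair probabilities. It converts odds to scores with 10 × log10, the scaling used in the published Dayhoff tables, and checks that the resulting relatedness scores are symmetric even though the mutation probabilities are not. All numbers are hypothetical.

```r
# Toy 3-letter alphabet; the pair probabilities below are hypothetical.
aa <- c("S", "T", "W")

# Symmetric joint probabilities of seeing residue i aligned with residue j.
joint <- matrix(c(0.30, 0.10, 0.02,
                  0.10, 0.25, 0.01,
                  0.02, 0.01, 0.19),
                nrow = 3, byrow = TRUE, dimnames = list(aa, aa))
freq <- rowSums(joint)  # background frequency of each residue

# Mutation probability matrix: column j holds P(j is replaced by i).
M <- sweep(joint, 2, freq, "/")

# Odds (relatedness) matrix: R[i, j] = M[i, j] / f[i].
R <- sweep(M, 1, freq, "/")

# Log-odds score, scaled by 10 and rounded to the nearest integer.
S <- round(10 * log10(R))

print(round(M, 3))
print(S)
print(isSymmetric(M))  # mutation probabilities differ by direction
print(isSymmetric(S))  # log-odds scores do not
```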
Salient Differences between PAM and Relatedness
PAM Matrix (Asymmetric)

● The PAM matrix P is a Markov transition matrix that describes the probability of one amino acid changing into another over a given evolutionary timescale.
● The elements of P, say P(A→B), represent the probability of amino acid A mutating into B.
● Due to unequal amino acid frequencies and directional mutation rates, P(A→B) ≠ P(B→A), making P asymmetric.
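
As a rough illustration of the Markov view, the R sketch below builds a hypothetical one-step transition matrix for a three-letter alphabet and extrapolates it to 250 steps by repeated multiplication, the same idea used to derive PAM250 from PAM1. The matrix entries, the row-to-column orientation, and the helper `matrix_power` are assumptions made for this example.

```r
# Hypothetical 1-step transition matrix for a toy 3-letter alphabet.
# Row A, column B holds P(A -> B); each row sums to 1.
aa <- c("A", "B", "C")
P1 <- matrix(c(0.98, 0.015, 0.005,
               0.03, 0.96,  0.01,
               0.02, 0.02,  0.96),
             nrow = 3, byrow = TRUE, dimnames = list(aa, aa))

# Extrapolate to n steps by multiplying the matrix by itself n times.
matrix_power <- function(P, n) {
  result <- diag(nrow(P))
  dimnames(result) <- dimnames(P)
  for (i in seq_len(n)) result <- result %*% P
  result
}

P250 <- matrix_power(P1, 250)

print(round(P250, 3))
print(P1["A", "B"] == P1["B", "A"])  # FALSE: P(A -> B) differs from P(B -> A)
```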
Salient Differences between PAM and Relatedness
Construction of the Relatedness Matrix (Symmetric)

● The relatedness matrix is derived from the PAM matrix using the
Using scores to align sequences

Using PAM 250

Intuition:
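
One way to see the intuition: the score of an ungapped alignment is the sum of the substitution scores of its aligned residue pairs. The R sketch below uses a small illustrative score table in the style of PAM250 (treat the values as assumptions) and a hypothetical helper `score_alignment`.

```r
# Illustrative substitution scores for a 3-letter toy alphabet.
aa <- c("A", "H", "W")
scores <- matrix(c( 2, -1, -6,
                   -1,  6, -3,
                   -6, -3, 17),
                 nrow = 3, byrow = TRUE, dimnames = list(aa, aa))

# Score of an ungapped alignment: sum of the scores of each aligned pair.
score_alignment <- function(s1, s2, m) {
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  stopifnot(length(a) == length(b))
  sum(m[cbind(a, b)])
}

identical_score <- score_alignment("AWHA", "AWHA", scores)  # all matches
mismatch_score  <- score_alignment("AWHA", "HWAA", scores)  # two mismatches

print(identical_score)
print(mismatch_score)
```

Matches on rare residues such as tryptophan contribute most of the score, while each mismatch pulls the total down.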
How will we use today’s learning?
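# Read the PAM100 matrix from seqinr (path as given in the lecture;
# system.file() returns "" if the installed seqinr does not ship this file).
# pairwiseAlignment() expects a numeric matrix, not a data frame.
pam100 <- as.matrix(read.table(system.file("matrices/pam/pam100", package = "seqinr"),
                               as.is = TRUE, check.names = FALSE))

print(pam100)

seq1 <- "HEAGAWGHEE"

seq2 <- "PAWHEAE"

Example Alignment using PAM
# Biostrings is distributed through Bioconductor, not CRAN.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biostrings")

library(Biostrings)

# Newer Bioconductor releases provide pairwiseAlignment() in the pwalign
# package; load that package too if the call is not found.
alignment <- pairwiseAlignment(pattern = seq1, subject = seq2,
                               substitutionMatrix = pam100)

print(alignment)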
BLOSUM: Henikoff and Henikoff (1992, 1996)
BLOCKS database: over 500 groups of local multiple alignments (blocks) of distantly related protein sequences.

Focused on conserved regions (blocks) of proteins that are distantly related to each other.

Henikoffs' score

General form of substitution matrices
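
A minimal R sketch of a BLOSUM-style log-odds score, assuming a toy table of aligned-pair counts. It follows the usual description of the Henikoffs' method: observed pair frequency divided by the frequency expected from residue composition, taken as log base 2 and expressed in half-bit units. The counts are hypothetical, not drawn from the BLOCKS database.

```r
# Hypothetical counts of aligned residue pairs from a toy set of blocks.
# Diagonal: identical pairs; off-diagonal: unordered mismatched pairs (mirrored).
aa <- c("A", "S", "W")
pair_counts <- matrix(c(60, 20,  2,
                        20, 40,  1,
                         2,  1, 30),
                      nrow = 3, byrow = TRUE, dimnames = list(aa, aa))

# Observed pair frequencies q_ij over all unordered pairs.
total_pairs <- sum(diag(pair_counts)) + sum(pair_counts[upper.tri(pair_counts)])
q <- pair_counts / total_pairs

# Residue frequency: p_i = q_ii + (sum over j != i of q_ij) / 2.
p <- diag(q) + (rowSums(q) - diag(q)) / 2

# Expected pair frequencies: p_i^2 on the diagonal, 2 * p_i * p_j elsewhere.
e <- 2 * outer(p, p)
diag(e) <- p^2

# Log-odds score in half-bit units, rounded to the nearest integer.
s <- round(2 * log2(q / e))

print(s)
```

Pairs seen more often than composition predicts get positive scores, and rarely seen pairs get negative ones.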
BLOSUM62
Default scoring matrix for the BLAST protein search programs at NCBI.

Merges all proteins in an alignment that have 62% amino acid identity or greater into one sequence.

E.g., if a block of aligned globin orthologs includes sequences with 62%, 80%, and 95% amino acid identity, all are weighted (grouped) as one sequence.

Useful for scoring proteins that share less than 62% identity, because the matrix is weighted more heavily by proteins that share less than 62% identity.
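
The merging step can be sketched in R as single-linkage clustering at a 62% identity threshold. The four aligned sequences below are hypothetical toy data, and using base R's `hclust` and `cutree` here is an illustrative choice rather than the Henikoffs' original implementation.

```r
# Hypothetical ungapped block of four aligned sequences (toy data).
block <- c(seq1 = "ACDEFGHIKL",
           seq2 = "ACDEFGHIKV",
           seq3 = "ACDEFGYWKV",
           seq4 = "MCNQFTYWRV")

# Pairwise fraction of identical positions.
residues <- do.call(rbind, strsplit(block, ""))
n <- nrow(residues)
identity <- matrix(1, n, n, dimnames = list(names(block), names(block)))
for (i in seq_len(n)) for (j in seq_len(n)) {
  identity[i, j] <- mean(residues[i, ] == residues[j, ])
}

# Single linkage: a sequence joins a cluster if it is at least 62% identical
# to any member, i.e. its distance (1 - identity) is at most 0.38.
tree <- hclust(as.dist(1 - identity), method = "single")
clusters <- cutree(tree, h = 1 - 0.62)

# Each cluster counts as one sequence, so members share a total weight of 1.
weights <- 1 / as.vector(table(clusters)[as.character(clusters)])

print(round(100 * identity))
print(clusters)
print(weights)
```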
BLOSUM 62
Summary of Henikoffs’ Paper
● BLOSUM performed dramatically better than PAM matrices.
● Especially useful for identifying weakly scoring alignments.
● BLOSUM62 performed slightly better than BLOSUM60 or BLOSUM70.
● BLOSUM50 and BLOSUM90 are other commonly used scoring matrices in BLAST searches.
● The FASTA family of sequence comparison programs uses BLOSUM50 as a default.
Salient differences between PAM and BLOSUM
1. PAM: explicit evolutionary model (i.e., replacements are counted on the branches of a phylogenetic tree). BLOSUM: no phylogenetic tree.
2. PAM: global alignment matrix; includes both highly conserved and highly mutable regions. BLOSUM: only highly conserved regions, in a series of alignments forbidden to contain gaps.
3. BLOSUM: relatedness is contextual to the specific group of sequences.
4. PAM: higher numbers in the name denote larger evolutionary distance. BLOSUM: a higher number implies higher sequence similarity and therefore smaller evolutionary distance.
Notice the sequence
Assignment 2
● Background: The pyruvate decarboxylase (PDC) gene is key for increasing CO2 production by yeast and may have a role in improving bread leavening and winemaking.
● Access the NCBI database (https://www.ncbi.nlm.nih.gov/) and download the DNA sequence of the Saccharomyces cerevisiae PDC gene.
● Use BLAST (Basic Local Alignment Search Tool) to compare the PDC gene sequence across different Saccharomyces cerevisiae strains and identify SNPs.
● Use a translation tool (e.g., the ExPASy Translate tool) to translate the normal and SNP-containing PDC gene sequences into amino acid sequences.
● Select a SNP within the PDC gene. Use tools such as SIFT (Sorting Intolerant From Tolerant) or PolyPhen (Polymorphism Phenotyping) to predict the impact of the SNP on PDC protein function.
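
For checking the translation step by hand, here is a small base R sketch that translates a reference and a SNP-containing fragment with the standard genetic code and reports which amino acid position changed. The two 9-base fragments are toy inputs for illustration and the helper `translate_dna` is hypothetical; substitute the sequences downloaded from NCBI.

```r
# Standard genetic code (NCBI translation table 1), third base varying fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
amino_acids <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
genetic_code <- setNames(amino_acids, codons)

# Translate an uppercase DNA string in reading frame 1; "*" marks a stop codon.
translate_dna <- function(dna) {
  starts <- seq(1, nchar(dna) - 2, by = 3)
  paste(genetic_code[substring(dna, starts, starts + 2)], collapse = "")
}

# Toy 9-base fragments for illustration only.
reference <- "ATGTCTGAA"
variant   <- "ATGTCTAAA"  # single-base change at position 7 (G -> A)

ref_protein <- translate_dna(reference)
var_protein <- translate_dna(variant)

# Amino acid positions that differ between the two translations.
changed <- which(strsplit(ref_protein, "")[[1]] != strsplit(var_protein, "")[[1]])

print(c(ref_protein, var_protein))
print(changed)
```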
Evaluation Criteria
● Understanding of the biological concepts and bioinformatics tools.
● Accuracy in data retrieval and analysis.
● Depth of analysis in the impact of SNPs on protein function.
● Quality and clarity of writing.
● Proper citation of sources and presentation of data.
