Bio16 Computational Biology

Pairwise Sequence Alignment (PSA)

Prepared by:
Joseph Martin Q. Paet
Biology Department, College of Science
Bicol University

What do we do with sequenced data?

Find whether a gene/protein is RELATED to other genes/proteins

➢ Homologous
➢ Common function
➢ Domains or Motifs shared among groups

Pairwise Sequence Alignment

➢ The process of lining up two or more sequences to achieve
maximal levels of identity (and conservation, in the case of
amino acid sequences) for the purpose of assessing the
degree of similarity and the possibility of homology.

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Pairwise Sequence Alignment

HOMOLOGY
➢ share a common evolutionary ancestry
➢ no “degrees”; either homologous or NOT
➢ almost always share a significantly related three-dimensional structure (diverge much more slowly)
➢ usually share significant identity/similarity
➢ either PARALOGOUS or ORTHOLOGOUS

IDENTITY
➢ extent to which two amino acid (or nucleotide) sequences are invariant (unchanged) = exact matching
➢ consider 3D structure also

SIMILARITY
➢ general description of a relationship = optimal matching
➢ does not imply any reasons for the observed sameness

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Similarity in Proteins

Esquivel et al. (2013). Decoding the Building Blocks of Life from the Perspective of Quantum Information. InTech. doi: 10.5772/55160

Homology

ORTHOLOGS
➢ homologous sequences in different species that arose from a common ancestral gene during speciation

PARALOGS
➢ homologous sequences that arose by a mechanism such as gene duplication

Human Globins

Myoglobin

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Scoring Matrices

Sources of Sequence Variations

Mismatch = 0
Perfect Match = +1

Substitution
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Scoring Matrices
Sources of Sequence Variations

Gap Opening = -2
Gap = -1

Insertion/Deletion
(InDel)

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
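The scoring values on these slides can be applied to an alignment that has already been written out. The sketch below is illustrative only: it assumes that "Gap = -1" is charged for every gapped position and that "Gap Opening = -2" is charged once more at the start of each run of gaps. The function name and the example sequences are hypothetical.

```python
MATCH, MISMATCH = 1, 0      # substitution scores from the slides
GAP_OPEN, GAP = -2, -1      # gap penalties from the slides (interpretation assumed)

def score_alignment(row_a: str, row_b: str) -> int:
    """Score two equal-length aligned rows in which '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None  # which row held the gap in the previous column
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            if gap_row != prev_gap_row:
                score += GAP_OPEN   # a new run of gaps starts here
            score += GAP            # every gapped position costs GAP
            prev_gap_row = gap_row
        else:
            score += MATCH if a == b else MISMATCH
            prev_gap_row = None
    return score

# Five matches (+5), one gap run of length two (-2 - 1 - 1 = -4): total 1.
print(score_alignment("GATTACA", "GA--ACA"))
```

Other conventions exist (for example, charging the opening penalty instead of, rather than in addition to, the first gap position), so the totals depend on which one a given program uses.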

Basis of Scoring Matrices

Why penalize gaps?

The optimal alignment of two similar sequences is usually the one that

✓ maximizes the number of matches and
✓ minimizes the number of gaps

There is a tradeoff between these two:

✓ adding gaps reduces mismatches

Permitting the insertion of arbitrarily many gaps can lead to high-scoring alignments of non-homologous sequences.

Penalizing gaps forces alignments to have relatively few gaps.

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
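The effect of the gap penalty can be seen with a small experiment. The sketch below is a minimal global-alignment scorer with a single per-position gap penalty (a simplification of the opening/extension scheme on the earlier slide); the function name, sequences, and penalty values are hypothetical. With free gaps, the score is simply the length of the longest common subsequence, so even poorly matching sequences can score well.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Best global alignment score using a simple per-position gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]   # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]                           # a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

seq_a, seq_b = "ACAC", "CACA"   # no position matches when the two are stacked directly
print("free gaps     :", global_score(seq_a, seq_b, gap=0))
print("penalized gaps:", global_score(seq_a, seq_b, gap=-2))
```

With free gaps the two sequences can be shifted against each other at no cost to line up three residues; with a penalty of -2 per gap, the shift costs more than it gains.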

Protein Sequence Alignment

Identity matrix
o exact matches receive one score and non-exact matches a different score (1 on the diagonal, 0 everywhere else)

Mutation data matrix
o a scoring matrix compiled from observed protein mutation rates: some mutations are observed more often than others (PAM, BLOSUM)

Physical properties matrix
o amino acids with similar biophysical properties receive a high score

Genetic code matrix
o amino acids are scored based on similarities in the coding triplet (codon)

Przytycka, T. (2007). Scoring Matrices Position Specific Scoring Matrices Motifs (Lecture 3: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect03_Scoring_Matr_Motifs.pdf
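The simplest of these, the identity matrix, can be sketched as a lookup table keyed by residue pairs. This is an illustration only; the variable names, the helper function, and the two short peptides are hypothetical. A PAM or BLOSUM matrix would be used the same way, with different values in the table.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"   # one-letter codes of the 20 standard amino acids

# Identity matrix: 1 on the diagonal, 0 everywhere else.
identity_matrix = {
    (a, b): 1 if a == b else 0
    for a in AMINO_ACIDS
    for b in AMINO_ACIDS
}

def ungapped_score(seq_a: str, seq_b: str, matrix: dict) -> int:
    """Sum the matrix scores of residues stacked position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))

# Four of the five positions are identical (T vs S differs).
print(ungapped_score("MKTAY", "MKSAY", identity_matrix))
```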

Basis of Scoring Matrices

Accepted Point Mutation (PAM)

a replacement of one amino acid in a protein by another residue that has been accepted by natural selection

Mutation Probability Matrix at Interval of 1 (PAM1)

(unit of evolutionary divergence in which 1% of the amino acids have been changed between the two protein sequences)

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Scoring Matrices

Log-Odds Matrix for PAM10 and PAM250

(allows us to sum the scores of the aligned residues when we perform an overall alignment of two sequences)

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
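Why taking logarithms lets the scores be summed can be shown with a toy calculation. The sketch below uses a hypothetical two-letter alphabet with made-up frequencies (not values from any PAM matrix): each score is the logarithm of how much more often a pair is seen in related sequences than expected by chance, and the sum of the scores equals the logarithm of the product of those ratios.

```python
import math

# Hypothetical two-letter alphabet and made-up frequencies, for illustration only.
background = {"L": 0.5, "K": 0.5}              # chance frequency of each residue
target = {("L", "L"): 0.4, ("K", "K"): 0.4,    # frequency of each aligned pair
          ("L", "K"): 0.1, ("K", "L"): 0.1}    # in alignments of related sequences

def odds(a: str, b: str) -> float:
    """Observed pair frequency divided by the frequency expected by chance."""
    return target[(a, b)] / (background[a] * background[b])

def log_odds(a: str, b: str) -> float:
    return math.log2(odds(a, b))

def alignment_score(row_a: str, row_b: str) -> float:
    """Sum of per-column log-odds scores."""
    return sum(log_odds(a, b) for a, b in zip(row_a, row_b))

def odds_product(row_a: str, row_b: str) -> float:
    """Product of per-column odds ratios (what the sum above stands in for)."""
    product = 1.0
    for a, b in zip(row_a, row_b):
        product *= odds(a, b)
    return product

print(alignment_score("LLK", "LKK"))
print(math.log2(odds_product("LLK", "LKK")))   # same value, up to rounding
```

Pairs that occur more often than chance get positive scores and pairs that occur less often get negative scores, which is the pattern seen in published log-odds matrices.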

Basis of Scoring Matrices
Blocks Substitution Matrix (BLOSUM)

By Henikoff and Henikoff (1992, 1996)
They focused on conserved regions (blocks) of proteins that are distantly related to each other

BLOSUM62 matrix
+ values = frequent exchanges
- values = rare replacements

Al-Neman and Ali (2019). An Efficient Parallel Algorithm for Improving Multiple Sequence Alignment on Multi-core. Conference Paper. DOI: 10.1109/IEC47844.2019.8950543

Basis of Scoring Matrices

[Figure: Summary of PAM and BLOSUM matrices. Recoverable labels: Well-Conserved Proteins, Distantly Related Proteins, General Use, BLAST]

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Methods of Alignment
By Hand
➢ slide sequences on two lines of a word processor

Dot Plot
➢ Graphical matrix

Rigorous Algorithm
➢ Dynamic programming (slow, optimal)

Heuristic methods
➢ fast, approximate
➢ BLAST and FASTA = word matching and hash tables

Aligning by hand
GCCTA-TTACGTCCT
GCATACGTA-GCCCT
➢ need a scoring system to find the best alignment (e.g., % Identity = 10/14 = 71.4%)
Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf
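The percent identity quoted for the hand alignment can be checked with a few lines of code. This sketch assumes that the spaces around the gap symbol in the slide are layout residue, so each row is 15 columns long, and that the denominator is the length of the shorter ungapped sequence (14 here), which is the choice that reproduces the slide's 10/14. Other programs divide by the alignment length instead, so the function name and convention are assumptions.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity of two aligned rows; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Count columns where both rows carry the same residue (gaps never count).
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    # Denominator: length of the shorter sequence once gaps are removed (assumed convention).
    shorter = min(len(row_a.replace("-", "")), len(row_b.replace("-", "")))
    return 100.0 * matches / shorter

top = "GCCTA-TTACGTCCT"
bottom = "GCATACGTA-GCCCT"
print(f"{percent_identity(top, bottom):.1f}%")   # 10 matches over 14 residues
```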

Methods of Alignment
Dot Plot
➢ gives an overview of all possible alignments
[Figure: dot plot matrix. Horizontal axis sequence: T T A C T T G A T T. Vertical axis residues as extracted, top to bottom: A G T C A T T A G C]
Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf
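A dot plot is easy to sketch in text form: one sequence runs along the columns, the other down the rows, and a mark is placed wherever the two residues are identical. The function name and the second sequence below are hypothetical; the first sequence is the horizontal axis from the slide's figure.

```python
def dot_plot(seq_x: str, seq_y: str) -> list[str]:
    """One text row per residue of seq_y; '*' where it equals the residue of seq_x."""
    return ["".join("*" if x == y else "." for x in seq_x) for y in seq_y]

seq_x = "TTACTTGATT"   # horizontal axis (from the slide)
seq_y = "TACTTG"       # vertical axis (hypothetical example)

print("  " + seq_x)
for residue, row in zip(seq_y, dot_plot(seq_x, seq_y)):
    print(residue, row)
```

Runs of marks along a diagonal correspond to stretches where the two sequences line up without gaps; a jump from one diagonal to another suggests an insertion or deletion.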

Methods of Alignment
Reducing Noise in Dot Plot
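One common way to reduce noise, sketched here as an assumption since the slide shows only a figure, is to compare short windows instead of single residues and to place a mark only when enough positions in the window match. The function name and the `window` and `min_matches` parameters are hypothetical.

```python
def filtered_dot_plot(seq_x: str, seq_y: str, window: int = 3, min_matches: int = 3) -> list[str]:
    """Dot plot in which each cell compares two windows rather than two residues."""
    rows = []
    for j in range(len(seq_y) - window + 1):
        row = ""
        for i in range(len(seq_x) - window + 1):
            # Count identical positions between the window at i and the window at j.
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row += "*" if matches >= min_matches else "."
        rows.append(row)
    return rows

seq = "ACGTAC"
print("single residues:")
for row in filtered_dot_plot(seq, seq, window=1, min_matches=1):
    print(row)
print("windows of 3, all 3 must match:")
for row in filtered_dot_plot(seq, seq, window=3, min_matches=3):
    print(row)
```

With single residues, every chance repeat of a letter produces an off-diagonal mark; with a window, only the longer runs survive. A larger window or a higher threshold removes more noise but can also hide short, genuinely matching regions.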

Methods of Alignment

Algorithm vs Program
➢ Algorithm: a step-by-step set of instructions designed to solve a specific problem
➢ Program: a set of instructions that uses an algorithm to solve a task

Dynamic Programming vs Heuristic Programming
➢ Dynamic Programming: finding optimal alignments between sequences by considering all possible alignments and scoring them based on a scoring system
➢ Heuristic Programming: makes approximations of the best solution without exhaustively considering every possible outcome

Global Alignment vs Local Alignment
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
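Dynamic programming considers all possible alignments without listing them one by one: it fills a table in which each cell holds the best score for aligning two prefixes, then traces back through the table to recover one optimal alignment. The sketch below is a minimal global alignment in that style. It assumes a single per-position gap penalty of -1 (simpler than the opening/extension scheme on the earlier slide); the function name and example sequences are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to rebuild one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))

best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

The table has (n + 1) x (m + 1) cells, which is why the method is described as slow for long sequences even though the result is optimal for the chosen scoring system.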

Dynamic Programming Examples

Global Alignment vs Local Alignment

Global Alignment: Needleman and Wunsch (1970)
➢ contains the entire sequence of each protein or DNA molecule
➢ starts at the beginning of two sequences and adds gaps to each until the end of one is reached (end-to-end)

Local Alignment: Smith and Waterman (1981)
➢ focuses on the regions of greatest similarity between two sequences
➢ finds the region (or regions) of highest similarity between two sequences and builds the alignment outward from there

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
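The difference between the two approaches shows up clearly when two sequences share one strong region but differ elsewhere. The sketch below computes both scores with the same scoring system; the values (match +2, mismatch -1, gap -2), the function name, and the sequences are hypothetical. The local recurrence differs from the global one in two ways: a cell is never allowed to drop below zero, and the answer is the best cell anywhere in the table rather than the bottom-right corner.

```python
def alignment_scores(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (global_score, local_score) under the same scoring system."""
    n, m = len(a), len(b)
    glob = [[0] * (m + 1) for _ in range(n + 1)]
    loc = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        glob[i][0] = i * gap        # global: leading gaps are penalized
    for j in range(1, m + 1):
        glob[0][j] = j * gap        # local: first row and column stay at 0
    best_local = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + sub,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,      # local: restart instead of going negative
                            loc[i - 1][j - 1] + sub,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[n][m], best_local

# Both sequences contain GATTACA, flanked by unrelated residues.
global_score, local_score = alignment_scores("CCCCGATTACA", "GATTACAGGGG")
print("global:", global_score)
print("local :", local_score)
```

The local score reflects only the shared GATTACA region, while the global score is pulled down by the flanking residues that must also be placed in the alignment.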

Statistical Significance of PSA

Goal: maximize the sensitivity and specificity of sequence alignments

Global Alignments
➢ Z-score = use α as the threshold

Local Alignments
➢ % Identity = may not be informative; 25% at 150 or more residues or 40% at 70 residues
➢ H = relative entropy; compares the observed alignment distribution with the distribution expected by chance

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
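A Z-score expresses how far an observed alignment score lies from the scores obtained by chance, in units of standard deviation: Z = (observed - mean) / standard deviation. One common way to estimate the chance distribution, assumed here because the slide does not spell out the procedure, is to shuffle one sequence many times and re-score it against the other. All function names, parameters, and sequences below are hypothetical.

```python
import random
import statistics

def global_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Best global alignment score with a simple per-position gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def shuffled_scores(a: str, b: str, trials: int = 100, seed: int = 0) -> list[int]:
    """Scores of a against shuffled copies of b (same composition, random order)."""
    rng = random.Random(seed)       # fixed seed so the sketch is repeatable
    residues = list(b)
    scores = []
    for _ in range(trials):
        rng.shuffle(residues)
        scores.append(global_score(a, "".join(residues)))
    return scores

def z_score(observed: float, null_scores: list) -> float:
    """Standard deviations between the observed score and the mean chance score."""
    # Raises ZeroDivisionError if every shuffled score is identical.
    return (observed - statistics.mean(null_scores)) / statistics.stdev(null_scores)

seq_a = "GCCTATTACGTCCTGATTACA"
seq_b = "GCATACGTAGCCCTGATTACA"
observed = global_score(seq_a, seq_b)
print(z_score(observed, shuffled_scores(seq_a, seq_b)))
```

The resulting Z-score is then compared with the threshold implied by the chosen α; a larger Z means the observed score is less likely to have arisen by chance.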


References:
Al-Neman and Ali (2019). An Efficient Parallel Algorithm for Improving Multiple Sequence Alignment on Multi-core. Conference Paper. doi: 10.1109/IEC47844.2019.8950543

Esquivel, R. O., Molina-Espíritu, M., Salas, F., Soriano, C., Barrientos, C., Dehesa, J. S., and Dobado, J. A. (2013). Decoding the Building Blocks of Life from the Perspective of Quantum Information. InTech. doi: 10.5772/55160

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Przytycka, T. (2007). Scoring Matrices Position Specific Scoring Matrices Motifs (Lecture 3: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect03_Scoring_Matr_Motifs.pdf

Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065
