LO5 Pairwise Sequence Alignment
Prepared by:
Joseph Martin Q. Paet
Biology Department, College of Science
Bicol University
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
Pairwise Sequence Alignment
HOMOLOGY
➢ share a common evolutionary ancestry
➢ no “degrees”; either homologous or NOT
IDENTITY
➢ extent to which two amino acid (or nucleotide) sequences are invariant (unchanged) = exact matching
Esquivel et al. (2013). Decoding the Building Blocks of Life from the Perspective of Quantum Information. InTech. doi: 10.5772/55160
Homology
ORTHOLOGS
➢ homologous sequences in different species that arose from a common ancestral gene during speciation
PARALOGS
➢ homologous sequences that arose by a mechanism such as gene duplication
[Figure panels: Human Globins; Myoglobin]
Basis of Scoring Matrices
Sources of Sequence Variations
Substitution
➢ Perfect Match = +1
➢ Mismatch = 0
Insertion/Deletion (InDel)
➢ Gap Opening = -2
➢ Gap = -1
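The scoring values on this slide can be applied to an already-aligned pair of sequences. The sketch below is only an illustration: the function name `score_alignment` and the example sequences are hypothetical, and it assumes that the first position of each gap run pays the Gap Opening value while every later position in the same run pays the Gap value. The slide does not state how the two gap values combine, so other conventions are possible.

```python
MATCH = 1       # Perfect Match = +1 (from the slide)
MISMATCH = 0    # Mismatch = 0 (from the slide)
GAP_OPEN = -2   # Gap Opening = -2 (from the slide)
GAP = -1        # Gap = -1 (from the slide)


def score_alignment(top: str, bottom: str) -> int:
    """Score two aligned, equal-length strings in which '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    prev_gap_row = None  # row that carried the gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            # Assumption: a new gap run pays GAP_OPEN, a continued run pays GAP.
            total += GAP if gap_row == prev_gap_row else GAP_OPEN
            prev_gap_row = gap_row
        else:
            total += MATCH if a == b else MISMATCH
            prev_gap_row = None
    return total


# Five matches (+5), one gap opening (-2), one further gap position (-1).
print(score_alignment("GATTACA", "GAT--CA"))  # 2
```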
Protein Sequence Alignment
Identity matrix
o Exact matches receive one score and non-exact matches receive a different
score (1 on the diagonal, 0 everywhere else)
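A minimal sketch of an identity matrix over the 20 standard amino acids follows. The helper names `identity_matrix` and `ungapped_score` and the two example peptides are hypothetical and chosen only for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def identity_matrix(alphabet: str = AMINO_ACIDS) -> dict:
    """Return a dict keyed by residue pairs: 1 on the diagonal, 0 elsewhere."""
    return {(a, b): 1 if a == b else 0 for a in alphabet for b in alphabet}


def ungapped_score(seq1: str, seq2: str, matrix: dict) -> int:
    """Sum the matrix scores column by column for two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))


matrix = identity_matrix()
print(matrix[("A", "A")], matrix[("A", "W")])             # 1 0
print(ungapped_score("HEAGAWGHEE", "HEAGCWGHEQ", matrix))  # 8
```

Because every mismatch scores the same, an identity matrix cannot distinguish a conservative replacement from a drastic one.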
Przytycka, T. (2007). Scoring Matrices Position Specific Scoring Matrices Motifs (Lecture 3: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect03_Scoring_Matr_Motifs.pdf
Basis of Scoring Matrices
Blocks Substitution Matrix
(BLOSUM)
BLOSUM62 matrix
+ values = frequent exchanges (observed more often than expected by chance)
- values = rare replacements (observed less often than expected by chance)
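The sign of a BLOSUM entry comes from a log-odds ratio: how often a residue pair is observed in aligned blocks compared with how often it would be expected by chance. The sketch below shows that calculation under stated assumptions: the frequencies are made-up numbers, not values from the real BLOSUM62 data, and the factor of 2 reflects the half-bit scaling usually cited for BLOSUM62. The function name `log_odds_score` is hypothetical.

```python
import math


def log_odds_score(observed: float, expected: float) -> int:
    """Rounded log-odds score in half-bit units: round(2 * log2(observed / expected))."""
    return round(2 * math.log2(observed / expected))


# Hypothetical frequencies, for illustration only.
print(log_odds_score(observed=0.02, expected=0.005))   # pair seen 4x more often than chance: 4
print(log_odds_score(observed=0.001, expected=0.004))  # pair seen 4x less often than chance: -4
print(log_odds_score(observed=0.01, expected=0.01))    # pair seen exactly as often as chance: 0
```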
Al-Neman and Ali (2019). An Efficient Parallel Algorithm for Improving Multiple Sequence Alignment on Multi-core. Conference Paper. DOI: 10.1109/IEC47844.2019.8950543
BLAST
Methods of Alignment
By Hand
➢ slide sequences on two lines of a word processor
Dot Plot
➢ Graphical matrix
Rigorous Algorithm
➢ Dynamic programming (slow, optimal)
Heuristic methods
➢ fast, approximate
➢ BLAST and FASTA = word matching and hash tables
Aligning by hand
GCCTA - TTACGTCCT
GCATACGTA-GCCCT
➢ need a scoring system to find the best alignment (e.g., % Identity=10/14=71.4%)
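The % Identity figure quoted for the hand alignment (10/14 = 71.4%) can be reproduced with a short calculation. In this sketch the aligned pair is hypothetical, constructed only so that it has 10 identical columns out of 14, and the denominator is assumed to be the total number of alignment columns including gap columns. Other definitions of % identity use a different denominator.

```python
def percent_identity(top: str, bottom: str) -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    # Assumption: divide by the full alignment length, gap columns included.
    return 100.0 * matches / len(top)


# Hypothetical 14-column alignment with 10 identical columns.
top = "GCATACGTAGCCCT"
bottom = "GCCTACGTA-CCTA"
print(round(percent_identity(top, bottom), 1))  # 71.4
```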
Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf
Dot Plot
➢ gives an overview of all possible alignments
[Figure: dot plot; one axis reads A G T C A T T A G C, the other reads T T A C T T G A T T]
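A dot plot can be sketched as a grid with one row per residue of one sequence and one column per residue of the other, with a mark wherever the two residues are identical. The example below uses the two sequences as they can be read from the axis labels of the slide's figure; the function names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(seq_x: str, seq_y: str) -> list:
    """Return a grid where grid[i][j] is True when seq_y[i] == seq_x[j]."""
    return [[a == b for b in seq_x] for a in seq_y]


def render(seq_x: str, seq_y: str) -> str:
    """Draw the grid as text: '*' for a match, '.' for no match."""
    grid = dot_plot(seq_x, seq_y)
    lines = ["  " + " ".join(seq_x)]
    for residue, row in zip(seq_y, grid):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("TTACTTGATT", "AGTCATTAGC"))
```

Runs of marks along a diagonal correspond to stretches where the two sequences match without gaps.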
Reducing Noise in Dot Plot
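The slide does not say which noise-reduction method it illustrates. One common approach, assumed here, is to compare short windows along each diagonal and keep a dot only when enough positions in the window match. The function name `filtered_dot_plot` and the `window` and `min_matches` parameters are hypothetical.

```python
def filtered_dot_plot(seq_x: str, seq_y: str, window: int = 3, min_matches: int = 3) -> list:
    """Mark cell (i, j) only if the diagonal window starting there has enough matches."""
    rows = max(len(seq_y) - window + 1, 0)
    cols = max(len(seq_x) - window + 1, 0)
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical residues along the diagonal of length `window`.
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


grid = filtered_dot_plot("TTACTTGATT", "AGTCATTAGC")
dots = [(i, j) for i, row in enumerate(grid) for j, hit in enumerate(row) if hit]
print(dots)  # [(4, 7), (5, 0)]: the shared words ATT and TTA
```

Raising `window` or `min_matches` removes more isolated dots at the risk of hiding short or weakly conserved matches.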
Algorithm vs Program
➢ Algorithm: a step-by-step set of instructions designed to solve a specific problem
➢ Program: a set of instructions that uses an algorithm to solve a task
Dynamic Programming vs Heuristic Programming
➢ Dynamic Programming: finding optimal alignments between sequences by considering all possible alignments and scoring them based on a scoring system
➢ Heuristic Programming: makes approximations of the best solution without exhaustively considering every possible outcome
Global Alignment vs Local Alignment
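The difference between a global and a local alignment shows up directly in the dynamic programming recurrence. The sketch below computes only the optimal score, in the style of the Needleman-Wunsch (global) and Smith-Waterman (local) methods. The scoring values (+1 match, -1 mismatch, -1 per gap position) are assumptions chosen for illustration and differ from the values on the earlier scoring slide; the function name `alignment_score` is hypothetical.

```python
def alignment_score(seq1: str, seq2: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, local: bool = False) -> int:
    """Optimal global score, or optimal local score when local=True."""
    cols = len(seq2) + 1
    # Global: the first row accumulates gap penalties. Local: it stays at 0.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(seq1) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            cell = max(prev[j - 1] + sub,   # align the two residues
                       prev[j] + gap,       # gap in seq2
                       curr[j - 1] + gap)   # gap in seq1
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


print(alignment_score("ACGT", "TTACGTTT"))              # 0: four matches, four gap positions
print(alignment_score("ACGT", "TTACGTTT", local=True))  # 4: only the shared ACGT is scored
```

The global score is pulled down by the unmatched flanking residues, while the local score reports only the best-matching region.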
Dynamic Programming Examples
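As one possible worked example, the sketch below fills a full dynamic programming table for a global alignment and then traces back from the bottom-right cell to recover one optimal alignment. The scoring values (+1 match, -1 mismatch, -1 per gap position), the function name `needleman_wunsch`, and the example sequences are assumptions for illustration. When several alignments share the best score, the traceback returns just one of them.

```python
def needleman_wunsch(seq1: str, seq2: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_seq1, aligned_seq2) for one optimal global alignment."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill: each cell takes the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback: walk from the bottom-right corner back to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(seq1[i - 1]); bottom.append(seq2[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(seq1[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = needleman_wunsch("GATTACA", "GATCA")
print(best)  # 3: five matches and two gap positions
print(row1)
print(row2)
```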
Global Alignments
➢ Z-score = use α as the threshold
Local Alignments
➢ % Identity = may not be informative; 25% over 150 or more residues, or 40% over 70 residues
➢ H = relative entropy; compares the observed distribution of aligned residue pairs with the distribution expected by chance
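Both statistics can be sketched in a few lines. The Z-score below compares an observed alignment score with a set of null scores, for example scores obtained after shuffling one of the sequences; how the null scores are produced is an assumption here and is not shown. The relative entropy is computed as H = Σ q_ab · log2(q_ab / (p_a · p_b)), where q_ab is the observed frequency of an aligned pair and p_a, p_b are background frequencies. All numbers are hypothetical and the function names are chosen only for illustration.

```python
import math
import statistics


def z_score(observed_score: float, null_scores: list) -> float:
    """Number of standard deviations the observed score lies above the null mean."""
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)  # sample standard deviation
    return (observed_score - mean) / sd


def relative_entropy(pair_freqs: dict, background: dict) -> float:
    """H in bits: sum of q_ab * log2(q_ab / (p_a * p_b)) over all observed pairs."""
    return sum(q * math.log2(q / (background[a] * background[b]))
               for (a, b), q in pair_freqs.items() if q > 0)


# Hypothetical null scores and observed score.
print(z_score(20, [10, 12, 8, 11, 9]))

# Hypothetical two-letter alphabet in which identical pairs are enriched.
background = {"X": 0.5, "Y": 0.5}
pairs = {("X", "X"): 0.4, ("Y", "Y"): 0.4, ("X", "Y"): 0.1, ("Y", "X"): 0.1}
print(relative_entropy(pairs, background))
```

When the observed pair frequencies equal the product of the background frequencies, H is 0, meaning the alignment carries no information beyond chance.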
Bio16 Computational Biology
Pairwise Sequence Alignment (PSA)
References:
Al-Neman and Ali (2019). An Efficient Parallel Algorithm for Improving Multiple Sequence Alignment on Multi-core. Conference Paper. doi: 10.1109/IEC47844.2019.8950543
Esquivel, R. O., Molina-Espíritu, M., Salas, F., Soriano, C., Barrientos, C., Dehesa, J. S., and Dobado, J. A. (2013). Decoding the Building Blocks of Life from the Perspective of Quantum Information. InTech. doi: 10.5772/55160
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
Przytycka, T. (2007). Scoring Matrices Position Specific Scoring Matrices Motifs (Lecture 3: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect03_Scoring_Matr_Motifs.pdf
Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf
Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065