
LO5 Pairwise Sequence Alignment


Bio16 Computational Biology

Pairwise Sequence Alignment (PSA)

Prepared by:
Joseph Martin Q. Paet
Biology Department, College of Science
Bicol University

What do we do with sequenced data?

Find whether a gene/protein is RELATED to other genes/proteins


➢ Homologous
➢ Common function
➢ Domains or Motifs shared among groups

Pairwise Sequence Alignment


➢ The process of lining up two sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Pairwise Sequence Alignment

HOMOLOGY
➢ share a common evolutionary ancestry
➢ no “degrees”; either homologous or NOT
➢ almost always share a significantly related three-dimensional structure (structures diverge much more slowly than sequences)
➢ usually share significant identity/similarity
➢ either PARALOGOUS or ORTHOLOGOUS

IDENTITY
➢ extent to which two amino acid (or nucleotide) sequences are invariant (unchanged) = exact matching
➢ consider 3D structure also

SIMILARITY
➢ general description of a relationship = optimal matching
➢ does not imply any reasons for the observed sameness

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Similarity in Proteins

Esquivel et al. (2013). Decoding the Building Blocks of Life from the Perspective of Quantum Information. InTech. doi: 10.5772/55160

Homology

ORTHOLOGS
➢ homologous sequences in different species that arose from a common ancestral gene during speciation

PARALOGS
➢ homologous sequences that arose by a mechanism such as gene duplication

[Figure: Human Globins; Myoglobin]

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Scoring Matrices


Sources of Sequence Variations

Substitution
➢ Perfect Match = +1
➢ Mismatch = 0
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Insertion/Deletion (InDel)
➢ Gap Opening = -2
➢ Gap = -1
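The scores listed for substitutions and InDels can be combined into a single alignment score. The Python sketch below is one possible reading, not the lecture's own method: it assumes that the gap-opening score is charged once per run of consecutive gaps and that the gap score is charged for every gapped position, including the first. The function name `score_alignment` and the example sequences are hypothetical.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap_open=-2, gap=-1):
    """Score two aligned strings of equal length that use '-' for gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            if gap_row != prev_gap_row:
                score += gap_open  # assumption: charged once per run of gaps
            score += gap           # assumption: charged for every gapped position
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


print(score_alignment("GATTACA", "GA--ACA"))  # 1
```

In this example five matches contribute +5 and the single two-position gap contributes -2 - 1 - 1 = -4.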

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Scoring Matrices


Why penalize gaps?

The optimal alignment of two similar sequences is usually the one that
✓ maximizes the number of matches and
✓ minimizes the number of gaps

There is a tradeoff between these two
✓ adding gaps reduces mismatches

Permitting the insertion of arbitrarily many gaps can lead to high-scoring alignments of non-homologous sequences.

Penalizing gaps forces alignments to have relatively few gaps.

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Protein Sequence Alignment

Identity matrix
o exact matches receive one score and non-exact matches a different score (1 on the diagonal, 0 everywhere else)

Mutation data matrix
o a scoring matrix compiled from observed protein mutation rates: some mutations are observed more often than others (PAM, BLOSUM)

Physical properties matrix
o amino acids with similar biophysical properties receive a high score

Genetic code matrix
o amino acids are scored based on similarities in their coding triplets (codons)

Przytycka, T. (2007). Scoring Matrices Position Specific Scoring Matrices Motifs (Lecture 3: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect03_Scoring_Matr_Motifs.pdf

Basis of Scoring Matrices

Accepted Point Mutation (PAM)
➢ a replacement of one amino acid in a protein by another residue that has been accepted by natural selection

Mutation Probability Matrix at Interval of 1 (PAM1)
➢ unit of evolutionary divergence in which 1% of the amino acids have been changed between the two protein sequences

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.


Log-Odds Matrix for PAM10 and PAM250
➢ allows us to sum the scores of the aligned residues when we perform an overall alignment of two sequences
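Why log-odds scores can be summed may be easier to see with numbers. The Python sketch below uses a made-up two-letter alphabet: the pair frequencies in `target_freq` and the background frequencies are hypothetical values chosen for illustration and are not taken from any PAM table. The factor of 10 with a base-10 logarithm is an assumed scaling. The point it illustrates is that adding the log-odds scores of aligned pairs is the same as taking the log of the product of their odds ratios.

```python
import math

# Hypothetical frequencies for a toy alphabet {A, S}; not real PAM data.
target_freq = {("A", "A"): 0.4, ("A", "S"): 0.1,
               ("S", "A"): 0.1, ("S", "S"): 0.4}
background = {"A": 0.5, "S": 0.5}


def odds(a, b):
    """How much more often a and b are aligned than expected by chance."""
    return target_freq[(a, b)] / (background[a] * background[b])


def log_odds(a, b):
    """Log-odds score of one aligned pair (assumed scaling: 10 * log10)."""
    return 10 * math.log10(odds(a, b))


def alignment_log_odds(top, bottom):
    """Score of an ungapped alignment as the sum of per-column scores."""
    return sum(log_odds(a, b) for a, b in zip(top, bottom))


total = alignment_log_odds("AAS", "ASS")
product = odds("A", "A") * odds("A", "S") * odds("S", "S")
print(math.isclose(total, 10 * math.log10(product)))
```

Pairs that occur more often than chance (here A with A) get a positive score, and pairs that occur less often (A with S) get a negative one.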

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Basis of Scoring Matrices
Blocks Substitution Matrix (BLOSUM)
➢ by Henikoff and Henikoff (1992, 1996)
➢ they focused on conserved regions (blocks) of proteins that are distantly related to each other

BLOSUM62 matrix
➢ positive values = frequent exchanges
➢ negative values = rare replacements
Al-Neman and Ali (2019). An Efficient Parallel Algorithm for Improving Multiple Sequence Alignment on Multi-core. Conference Paper. DOI: 10.1109/IEC47844.2019.8950543


Basis of Scoring Matrices

[Figure: Summary of PAM and BLOSUM matrices. Labels: Well-Conserved Proteins; Distantly Related Proteins; General Use; BLAST]

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Methods of Alignment
By Hand
➢ slide sequences on two lines of a word processor

Dot Plot
➢ Graphical matrix

Rigorous Algorithm
➢ Dynamic programming (slow, optimal)

Heuristic methods
➢ fast, approximate
➢ BLAST and FASTA = word matching and hash tables

Aligning by hand
GCCTA-TTACGTCCT
GCATACGTA-GCCCT
➢ need a scoring system to find the best alignment (e.g., % Identity = 10/14 = 71.4%)
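The percent identity quoted for the hand alignment can be reproduced with a short Python sketch. Two assumptions are made here: the gap in the first sequence sits directly after GCCTA, as the extracted slide text suggests, and the denominator is the length of the shorter ungapped sequence (14 residues), which is the reading that matches the slide's 10/14. The function name `percent_identity` is hypothetical.

```python
def percent_identity(top, bottom):
    """Percent identity of two aligned strings that use '-' for gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both rows carry the same residue (gaps never count).
    matches = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    # Assumption: divide by the length of the shorter sequence without gaps.
    shorter = min(len(top.replace("-", "")), len(bottom.replace("-", "")))
    return 100.0 * matches / shorter


print(round(percent_identity("GCCTA-TTACGTCCT", "GCATACGTA-GCCCT"), 1))  # 71.4
```

Dividing by the full alignment length (15 columns) instead would give a lower value, so the choice of denominator should always be stated alongside a percent identity.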

Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf


Methods of Alignment

Dot Plot
[Figure: dot plot with T T A C T T G A T T along the horizontal axis and A G T C A T T A G C along the vertical axis, as listed top to bottom]
➢ gives an overview of all possible alignments
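A dot plot of this kind can be sketched in a few lines of Python. The sequences below are the axis labels recovered from the slide, and the reading direction of the vertical axis is an assumption. The function names `dot_plot` and `render` are hypothetical.

```python
def dot_plot(seq_x, seq_y):
    """Return a matrix with True wherever the two residues are identical."""
    return [[a == b for a in seq_x] for b in seq_y]


def render(seq_x, seq_y):
    """Draw the matrix as text: '*' marks a match, '.' marks no match."""
    matrix = dot_plot(seq_x, seq_y)
    lines = ["  " + " ".join(seq_x)]
    for residue, row in zip(seq_y, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


print(render("TTACTTGATT", "AGTCATTAGC"))
```

Runs of `*` along a diagonal correspond to stretches where the two sequences match without gaps, which is why the plot gives an overview of candidate alignments.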
Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf

Reducing Noise in Dot Plot
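The slide's figure is not recoverable, so the Python sketch below shows one common way to reduce noise, assumed here rather than taken from the lecture: a sliding window with a match threshold. A cell is marked only when enough residues match along the diagonal that starts at that cell. The function name `filtered_dot_plot` and the parameters `window` and `min_matches` are hypothetical.

```python
def filtered_dot_plot(seq_x, seq_y, window=3, min_matches=3):
    """Mark cell (i, j) only if at least `min_matches` of the `window`
    residue pairs on the diagonal starting at (i, j) are identical."""
    rows = max(len(seq_y) - window + 1, 0)
    cols = max(len(seq_x) - window + 1, 0)
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_y[i + k] == seq_x[j + k] for k in range(window))
            row.append(matches >= min_matches)
        plot.append(row)
    return plot


plot = filtered_dot_plot("GATTACA", "GATTACA")
print(sum(cell for row in plot for cell in row))  # 5
```

Comparing GATTACA with itself one residue at a time gives 15 dots, most of them chance matches off the main diagonal. With a window of 3 and a threshold of 3, only the 5 cells on the main diagonal remain.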

Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf


Methods of Alignment
Algorithm vs Program
➢ Algorithm: a step-by-step set of instructions designed to solve a specific problem
➢ Program: a set of instructions that uses an algorithm to solve a task

Dynamic Programming vs Heuristic Programming
➢ Dynamic programming: finding optimal alignments between sequences by considering all possible alignments and scoring them based on a scoring system
➢ Heuristic programming: makes approximations of the best solution without exhaustively considering every possible outcome

Global Alignment vs Local Alignment
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Dynamic Programming Examples

Global Alignment vs Local Alignment

Global Alignment: Needleman and Wunsch (1970)
➢ contains the entire sequence of each protein or DNA molecule
➢ starts at the beginning of two sequences and adds gaps to each until the end of one is reached (end-to-end)

Local Alignment: Smith and Waterman (1981)
➢ focuses on the regions of greatest similarity between two sequences
➢ finds the region (or regions) of highest similarity between two sequences and builds the alignment outward from there
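The difference between the two dynamic programming approaches shows up in how the score table is filled. The Python sketch below computes scores only (no traceback) and makes simplifying assumptions that are not from the lecture: a single linear gap penalty instead of separate opening and extension scores, and a mismatch score of -1 for the local case so that poorly matching regions are actually penalized. The function names `global_score` and `local_score` are hypothetical.

```python
def global_score(a, b, match=1, mismatch=0, gap=-1):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # leading gaps are penalized
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]  # bottom-right cell covers both full sequences


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)  # an alignment may start anywhere at no cost
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,                         # restart instead of going negative
                       prev[j - 1] + pair,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)                # the answer can end anywhere
        prev = curr
    return best


print(global_score("GATTACA", "GATACA"))                # 5
print(local_score("TTTGATTACATTT", "CCCGATTACACCC"))    # 7
```

In the second call the flanking residues differ completely, yet the local score still recovers the shared GATTACA core, which is the behaviour the local alignment column describes.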

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.


Statistical Significance of PSA


➢ maximize the sensitivity and specificity of sequence alignments

Global Alignments
➢ Z-score = use the significance level α as the threshold

Local Alignments
➢ % Identity = may not be informative on its own; rough thresholds are 25% over 150 or more residues, or 40% over 70 residues
➢ H = relative entropy; compares the observed alignment distribution with the distribution expected by chance
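The two statistics can be written out as follows. These are the standard textbook forms and are assumed to be what the slide intends. The symbols are not defined in the lecture: S is the score of the real alignment, μ and σ are the mean and standard deviation of scores obtained after shuffling one of the sequences, q_ij is the observed frequency of residues i and j being aligned, and p_i, p_j are their background frequencies.

```latex
% Z-score of an alignment score S against scores from shuffled sequences
Z = \frac{S - \mu}{\sigma}

% Relative entropy of a scoring matrix (in bits when the logarithm is base 2)
H = \sum_{i,j} q_{ij}\, s_{ij}, \qquad s_{ij} = \log_2 \frac{q_{ij}}{p_i\, p_j}
```

A larger Z means the real score stands further above what shuffled sequences produce, and a larger H means each aligned position carries more information for separating real alignments from chance ones.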

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.


References:
Al-Neman and Ali (2019). An Efficient Parallel Algorithm for Improving Multiple Sequence Alignment on Multi-core. Conference Paper. doi: 10.1109/IEC47844.2019.8950543

Esquivel, R. O., Molina-Espíritu, M., Salas, F., Soriano, C., Barrientos, C., Dehesa, J. S., and Dobado, J. A. (2013). Decoding the Building Blocks of Life from the Perspective of Quantum Information. InTech. doi: 10.5772/55160

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Przytycka, T. (2007). Scoring Matrices Position Specific Scoring Matrices Motifs (Lecture 3: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect03_Scoring_Matr_Motifs.pdf

Przytycka, T. (2007). Pairwise sequence alignment (Lecture 2: Principles of Computational Biology). Accessed at https://www.ncbi.nlm.nih.gov/CBBresearch/Przytycka/download/lectures/PCB_Lect02_Pairwise_allign.pdf

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: 10.1038/nrg1065
