Pairwise Sequence Alignment

CS 838
www.cs.wisc.edu/~craven/cs838.html
Mark Craven
craven@biostat.wisc.edu
January 2001

Announcements
• New optional, but recommended, reading on the
course web page: Molecular Biology for Computer
Scientists by Larry Hunter

Pairwise Alignment:
Task Definition
• Given
– a pair of sequences (DNA or protein)
– a method for scoring the similarity of a pair of
characters
• Do
– determine the correspondences between
substrings in the sequences such that the
similarity score is maximized

Motivation
• comparing sequences to gain information
about the structure/function of a query
sequence
• putting together a set of sequenced
fragments (fragment assembly)
• comparing a segment sequenced by two
different labs

The Role of Homology
• homology: similarity due to descent from a
common ancestor
• often we can infer homology from similarity
• thus we can sometimes infer
structure/function from sequence similarity

Homology
• homologous sequences can be divided into
two groups
– orthologous sequences: sequences that differ
because they are found in different species
(e.g. human α-globin and mouse α-globin)
– paralogous sequences: sequences that differ
because of a gene duplication event
(e.g. human α-globin and human β-globin,
various versions of both)

Issues in Sequence Alignment
• the sequences we’re comparing probably differ in
length
• there may be only a relatively small region in the
sequences that matches
• we want to allow partial matches (i.e. some amino
acid pairs are more substitutable than others)
• variable length regions may have been
inserted/deleted from the common ancestral
sequence

Gaps
• sequences may have diverged from a
common ancestor through various types of
mutations:
– substitutions (ACGA → AGGA)
– insertions (ACGA → ACCGA)
– deletions (ACGA → AGA)
• the latter two will result in gaps in
alignments

Insertions/Deletions and
Protein Structure

[Figure: protein structure; insertions/deletions in loop structures are not so significant]

Example Alignment
GSAQVKGHGKKVADALTNAVAHV---D--DMPNALSALSDLHAHKL
++ ++++H+ KV   + +A  ++          +L+ L+++H+ K
NNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG

• gaps depicted with –

• middle line shows matches
– identical matches shown with letters
– similar amino acids shown with +
– dissimilar amino acids/gaps indicated by space

Alignments in the Olden Days:
Dot Plots
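  G A C G G A T T A G
G * . . * * . . . . *
A . * . . . * . . * .
T . . . . . . * * . .
C . . * . . . . . . .
G * . . * * . . . . *
G * . . * * . . . . *
A . * . . . * . . * .
A . * . . . * . . * .
T . . . . . . * * . .
A . * . . . * . . * .
G * . . * * . . . . *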
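The dot plot above can be produced mechanically by marking every cell whose row and column characters are equal. A small Python sketch (the function names `dot_plot` and `render` are illustrative, and `*`/`.` are an arbitrary choice of symbols for match/mismatch):

```python
def dot_plot(x: str, y: str) -> list[list[bool]]:
    """Cell [i][j] is True when x[i] == y[j]; x labels the rows, y the columns."""
    return [[a == b for b in y] for a in x]


def render(x: str, y: str) -> str:
    """Draw the dot plot as text: '*' marks a match, '.' a mismatch."""
    lines = ["  " + " ".join(y)]
    for a, row in zip(x, dot_plot(x, y)):
        lines.append(a + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# The two sequences from the slide: rows GATCGGAATAG, columns GACGGATTAG.
print(render("GATCGGAATAG", "GACGGATTAG"))
```

Diagonal runs of `*` correspond to matching substrings, which is what a reader scans for in a dot plot.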

Types of Alignment
• global: find best match of both sequences in their
entirety
• local: find best subsequence match
• semi-global: find best match without penalizing
gaps on the ends of the alignment

Pairwise Alignment Via Dynamic
Programming
• Needleman & Wunsch, Journal of Molecular
Biology, 1970
• dynamic programming: solve an instance of a
problem by taking advantage of computed
solutions for smaller subparts of the problem
• determine alignment of two sequences by
determining alignment of all prefixes of the
sequences

Scoring Scheme Components

• substitution matrix
– s(a,b) indicates score of aligning character a
with character b
• gap penalty function
– w(k) indicates cost of a gap of length k

Linear Gap Penalty Function
• different gap penalty functions require
somewhat different DP algorithms
• the simplest case is when a linear gap
function is used

$$w(k) = g \cdot k$$
where g is a constant
• we’ll start by considering this case
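The two scoring components can be written as plain functions. This is a minimal sketch, assuming the +1/-1 match/mismatch scores and g = -2 from the worked example in these notes; because g is negative, adding w(k) to an alignment score lowers it. The names `s`, `linear_gap`, and `w` are illustrative, and real protein work would look s(a, b) up in a substitution matrix instead.

```python
from typing import Callable


def s(a: str, b: str) -> int:
    """Substitution score (assumed): +1 for identical characters, -1 otherwise."""
    return 1 if a == b else -1


def linear_gap(g: int) -> Callable[[int], int]:
    """Return the linear gap penalty function w(k) = g * k."""
    return lambda k: g * k


w = linear_gap(-2)
print(s("A", "A"), s("A", "G"), w(1), w(3))
```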

Dynamic Programming Idea

• consider last step in computing alignment of AAAC with
AGC
• three possible options; in each we’ll choose a different
pairing for end of alignment, and add this to best alignment
of previous characters
AAA C     AAAC -     AAA C
AG  C     AG   C     AGC -

in each case: best score of alignment of these prefixes + score of aligning this final pair

Dynamic Programming Idea
• given an n-character sequence x, and an m-
character sequence y
• construct an (n+1) x (m+1) matrix F
• F [ i, j ] = score of the best alignment of
x[1…i ] with y[1…j ]

Dynamic Programming Idea

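• F[i, j] is computed from three neighbouring cells:
– F[i-1, j-1] + s(x[i], y[j])   (diagonal)
– F[i-1, j] + g   (x[i] aligned with a gap)
– F[i, j-1] + g   (y[j] aligned with a gap)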

Dynamic Programming Idea
• in extending an alignment, we have 3 choices:
– align x[1…i-1] with y[1…j-1] and match x[i]
with y[j]
– align x[1…i] with y[1…j-1] and match a gap
with y[j]
– align x[1…i-1] with y[1…j] and match a gap
with x[i]
• choose highest scoring choice to fill in F [ i, j ]

DP Algorithm for Global Alignment
with Linear Gap Penalty
• one way to specify the DP is in terms of its
recurrence relation:

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + g \\ F(i, j-1) + g \end{cases}$$
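The recurrence maps almost line for line onto code. This is a minimal sketch that fills F for global alignment, assuming the +1/-1 match/mismatch scores and g = -2 used in the worked example in these notes (the function name `global_fill` is illustrative). Python strings are 0-based, so `x[i - 1]` and `y[j - 1]` play the role of x_i and y_j; the first row and column are set to multiples of g, as in the initialization step.

```python
def global_fill(x, y, match=1, mismatch=-1, g=-2):
    """Fill the (n+1) x (m+1) matrix F for global alignment with a linear gap penalty."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):      # first column: x[1..i] aligned against gaps
        F[i][0] = i * g
    for j in range(1, m + 1):      # first row: y[1..j] aligned against gaps
        F[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,   # align x_i with y_j
                F[i - 1][j] + g,       # x_i against a gap
                F[i][j - 1] + g,       # y_j against a gap
            )
    return F


for row in global_fill("AAAC", "AGC"):
    print(row)
```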
Initializing Matrix: Global
Alignment with Linear Gap Penalty
        A     G     C
    0   g     2g    3g
A   g
A   2g
A   3g
C   4g

DP Algorithm Sketch
• initialize first row and column of matrix
• fill in rest of matrix from top to bottom, left
to right
• for each F [ i, j ], save pointer(s) to cell(s)
that resulted in best score
• F [n, m] holds the optimal alignment score;
trace pointers back from F [n, m] to F [0, 0]
to recover alignment
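The sketch above translates into a complete global aligner. This is a minimal Python version, assuming the +1/-1/-2 scheme from the example; the slide's "save pointer(s)" step becomes a `ptr` table that keeps one pointer per cell (the name `needleman_wunsch` and the pointer codes are illustrative). Ties are broken by preferring the diagonal, then up, so for AAAC vs AGC it returns -AGC, which is equally optimal to the AG-C alignment shown in the example but not the same one.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, g=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # ptr[i][j]: "D" = diagonal, "U" = up (x[i] vs gap), "L" = left (gap vs y[j])
    ptr = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * g, "U"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * g, "L"
    for i in range(1, n + 1):          # top to bottom
        for j in range(1, m + 1):      # left to right
            s = match if x[i - 1] == y[j - 1] else mismatch
            options = [
                (F[i - 1][j - 1] + s, "D"),
                (F[i - 1][j] + g, "U"),
                (F[i][j - 1] + g, "L"),
            ]
            # max() returns the first of several equal maxima: ties prefer D, then U
            F[i][j], ptr[i][j] = max(options, key=lambda opt: opt[0])
    i, j = n, m
    ax, ay = [], []
    while i > 0 or j > 0:              # follow pointers back to F[0][0]
        step = ptr[i][j]
        if step == "D":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif step == "U":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("AAAC", "AGC"))
```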

DP Algorithm Example
• suppose we choose the following scoring scheme:
– s(x[i], y[j]) = +1 when x[i] = y[j]
– s(x[i], y[j]) = -1 when x[i] ≠ y[j]
– g (penalty for aligning with a gap) = -2

DP Algorithm Example
        A    G    C
    0  -2   -4   -6
A  -2   1   -1   -3
A  -4  -1    0   -2
A  -6  -3   -2   -1
C  -8  -5   -4   -1

one optimal alignment
x: A A A C
y: A G - C
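As a check on the table, here is one cell worked by hand with the scores above: F(2, 2), where x_2 = A and y_2 = G.

$$F(2,2) = \max \begin{cases} F(1,1) + s(A, G) = 1 - 1 = 0 \\ F(1,2) + g = -1 - 2 = -3 \\ F(2,1) + g = -1 - 2 = -3 \end{cases} = 0$$

The diagonal choice wins, which matches the 0 in the second A row, G column of the matrix.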

DP Comments
• works for either DNA or protein sequences,
although the substitution matrices used
differ
• finds an optimal alignment
• the exact algorithm (and computational
complexity) depends on gap penalty
function (we’ll come back to this issue)

Equally Optimal Alignments

• many optimal alignments may exist for a given
pair of sequences
• can use preference ordering over paths when
doing traceback
[Figure: the three traceback moves into a cell, ranked 1 to 3 for the highroad and in the reverse order for the lowroad]
• highroad and lowroad alignments show the two
most different optimal alignments

Highroad & Lowroad Alignments
        A    G    C
    0  -2   -4   -6
A  -2   1   -1   -3
A  -4  -1    0   -2
A  -6  -3   -2   -1
C  -8  -5   -4   -1

highroad alignment
x: A A A C
y: A G - C

lowroad alignment
x: A A A C
y: - A G C
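The slide does not spell out which arrow gets which rank. Assuming the matrix is drawn with x down the rows, ranking "up" before "diagonal" before "left" for the highroad, and the reverse for the lowroad, reproduces both alignments above. This is a sketch of traceback under that assumption, recomputing moves from F instead of storing pointers; `traceback`, `HIGHROAD`, and `LOWROAD` are illustrative names, and the +1/-1/-2 scheme is assumed.

```python
HIGHROAD = ("up", "diag", "left")  # assumed highroad preference order
LOWROAD = ("left", "diag", "up")   # the reverse order, for the lowroad


def global_fill(x, y, match=1, mismatch=-1, g=-2):
    """Global-alignment score matrix with a linear gap penalty (rows = x, columns = y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * g
    for j in range(1, m + 1):
        F[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + g, F[i][j - 1] + g)
    return F


def traceback(x, y, order, match=1, mismatch=-1, g=-2):
    """From F[n][m], repeatedly take the first move in `order` that explains the cell's score."""
    F = global_fill(x, y, match, mismatch, g)
    i, j = len(x), len(y)
    ax, ay = [], []
    while i > 0 or j > 0:
        for move in order:
            if move == "diag" and i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                if F[i][j] == F[i - 1][j - 1] + s:
                    ax.append(x[i - 1])
                    ay.append(y[j - 1])
                    i, j = i - 1, j - 1
                    break
            elif move == "up" and i > 0 and F[i][j] == F[i - 1][j] + g:
                ax.append(x[i - 1])    # x[i] against a gap
                ay.append("-")
                i -= 1
                break
            elif move == "left" and j > 0 and F[i][j] == F[i][j - 1] + g:
                ax.append("-")         # y[j] against a gap
                ay.append(y[j - 1])
                j -= 1
                break
    return "".join(reversed(ax)), "".join(reversed(ay))


print(traceback("AAAC", "AGC", HIGHROAD))
print(traceback("AAAC", "AGC", LOWROAD))
```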

Dynamic Programming Analysis

• there are

$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$

possible alignments of two sequences of length n
• e.g. two sequences of length 1000 have ≈ 10^600
possible alignments
• but the DP approach finds an optimal alignment
efficiently, without enumerating them
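The size of that count is easy to check numerically. This sketch computes the binomial coefficient exactly with `math.comb` and compares its base-10 logarithm with the approximation 2^(2n)/sqrt(πn); the function names are illustrative.

```python
import math


def exact_count(n: int) -> int:
    """C(2n, n) = (2n)! / (n!)^2, computed exactly with integer arithmetic."""
    return math.comb(2 * n, n)


def approx_log10_count(n: int) -> float:
    """log10 of the approximation 2^(2n) / sqrt(pi * n)."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)


n = 1000
print(len(str(exact_count(n))))              # 601 digits, i.e. about 10^600
print(round(approx_log10_count(n), 2))       # 600.31
print(round(math.log10(exact_count(n)), 2))  # 600.31
```

Enumerating even a tiny fraction of these is hopeless, which is why the O(mn) table is the practical route.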

Computational Complexity
• initialization: O(m), O(n)
• filling in rest of matrix: O(mn)
• traceback: O(m + n)
• hence, if the sequences have nearly the same
length, the computational complexity is
O(n^2)

Local Alignment
• so far we have discussed global alignment,
where we are looking for best match
between sequences from one end to the
other.
• more commonly, we will want a local
alignment, the best match between
subsequences of x and y.

Local Alignment Motivation
• useful for comparing protein sequences that
share a common domain but differ
elsewhere
• useful for comparing against genomic
sequences (long stretches of
uncharacterized sequence)
• more sensitive when comparing highly
diverged sequences

Local Alignment DP Algorithm

• original formulation: Smith & Waterman,
Journal of Molecular Biology, 1981
• interpretation of array values is somewhat
different
– F [ i, j ] = score of the best alignment of a
suffix of x[1…i ] and a suffix of y[1…j ]

Local Alignment DP Algorithm
• the recurrence relation is slightly different from that of the
global algorithm:

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + g \\ F(i, j-1) + g \\ 0 \end{cases}$$
Local Alignment DP Algorithm

• initialization: first row and first column initialized
with 0’s
• traceback:
– find maximum value of F(i, j); can be anywhere
in matrix
– stop when we get to a cell with value 0

Local Alignment Example
        A   A   G   A
    0   0   0   0   0
T   0   0   0   0   0
T   0   0   0   0   0
A   0   1   1   0   1
A   0   1   2   0   1
G   0   0   0   3   1

x: A A G
y: A A G
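A minimal Smith-Waterman sketch, assuming the same +1/-1 match/mismatch scores and g = -2 as the global example (with those values it reproduces the matrix above). It differs from the global version in exactly the three places the slides list: the extra 0 in the max, an all-zero first row and column, and a traceback that starts at the maximum cell and stops at a 0. The function name is illustrative, and ties in the traceback prefer the diagonal.

```python
def smith_waterman(x, y, match=1, mismatch=-1, g=-2):
    """Local alignment; returns (best score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + g, F[i][j - 1] + g, 0)
            if F[i][j] > best:                   # remember the maximum cell
                best, bi, bj = F[i][j], i, j
    i, j = bi, bj
    ax, ay = [], []
    while F[i][j] > 0:                           # stop at a cell with value 0
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + g:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTAAG", "AAGA"))
```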
