COB Sequencealignment

Uploaded by

Paddy Nji Kily
Pairwise sequence alignment
Overview
• Alignment
• Global alignment
• Local alignment
• Affine gap penalties
• Divide and conquer
• Linear space alignment
Basic question

• Given two sequences, how similar are they?


• ACTGACTG and AACCTTGG (??)
• ACTGACTG and GCTGACTG (87.5% similar)
• ACTAA versus ATCTAA
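The 87.5% figure can be reproduced with a position-by-position comparison. The following is a minimal sketch; the function name `percent_identity` is an illustrative choice, not something from the slides. It only works for sequences of equal length, which is why a pair such as ACTAA versus ATCTAA needs a proper alignment first.

```python
def percent_identity(x: str, y: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(x) != len(y):
        raise ValueError("position-wise comparison needs equal lengths")
    matches = sum(a == b for a, b in zip(x, y))
    return 100.0 * matches / len(x)

print(percent_identity("ACTGACTG", "GCTGACTG"))  # 87.5
print(percent_identity("ACTGACTG", "AACCTTGG"))  # 25.0
```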
What about these?
What about the following sequences?
Problem Statement

NPNQKIITIGSICMVTGIVSLMLQIGNMISIWVSHSIHTGNQH
QAEPISNTNFLTEKAVASVRLAGNSSLCPINGWAVYSKDNSIR
SCSHLECRTFFLTQGALLNDKHSNGTVKDRSPHRTLMSCPVGV
ASACHDGTSWLTIGISGPDNGAVAVLKYNGIITDTIKSWRNNI
FTVMTDGPSNGQASHKIFKMELLKVVKSVELDAPNYHYEECSC
HGSNRPWVSFNQNLEYQIGYICSGVFGDNPRPNDGTGSCGPVS
NGVWIGRTKSTNSRSGFEMIWDPNGWTETDSSFSVKQDIVAIT ...

NPNQKIITIGSICMVTGIVSLMLQIGNMISIWVSHSIHTGNQH
QAEPISNTNFLTEKAVASVRLAGNSSLCPINGWAVYSKDNSIR
SCSHLECRTFFLTQGALLNDKHSNGTVKDRSPHRTLMSCPVGV
ASACHOHMYGODTHEYKILLEDKENNYNNGIITDTIKSWRNNI
FTVMTDGPSNGQASHKIFKMELLKVVKSVELDAPNYHYEECSC
HGSNLPWVSHDGKSWLTIGISGPDNGAVAVLKNDGTGSCGPVS
NGVWITIGSICMVTGIVSLMLQIGNMISTDSSFSVKQDIVAIT ...

Frank Neven (Hasselt University), Pairwise Sequence Alignment


Alignment
• model for sequence similarity
• stacking of two sequences, identifying similar parts in the sequences
Scoring an alignment
• scoring matrix for matches and mismatches
• penalizing indels: gap penalty = -1.5
• scoring matrix and gap penalty together form the scoring scheme
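As a rough illustration of how a scoring scheme turns an alignment into a number, the sketch below sums one score per column. The gap penalty of -1.5 is the value given above; the +1/-1 substitution scores are an assumption standing in for the scoring matrix, which is not reproduced in the text.

```python
GAP = -1.5  # indel penalty taken from the slide

def substitution_score(a: str, b: str) -> float:
    # Hypothetical stand-in for the scoring matrix: +1 match, -1 mismatch.
    return 1.0 if a == b else -1.0

def score_alignment(top: str, bottom: str) -> float:
    """Sum the column scores of an alignment given as two equal-length gapped rows."""
    if len(top) != len(bottom):
        raise ValueError("rows of an alignment must have equal length")
    total = 0.0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot hold two gaps")
        total += GAP if "-" in (a, b) else substitution_score(a, b)
    return total

# One possible alignment of ACTAA and ATCTAA: 5 matches and 1 indel.
print(score_alignment("A-CTAA", "ATCTAA"))  # 3.5
```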
Global alignment
Alignment graph
The alignment graph G = (V, E) for two sequences x and y is defined by

• $V = \{(i, j) \mid i \in \{0, \dots, |x|\},\ j \in \{0, \dots, |y|\}\}$

• $((i, j), (i', j')) \in E$ iff $0 \le i' - i \le 1$, $0 \le j' - j \le 1$ and $(i', j') \ne (i, j)$

source = (0,0)
sink = (|x|, |y|)
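The definition can be made concrete by enumerating the vertices and edges for a small pair of sequences. This is only a sketch: the function name and the list-of-tuples representation are illustrative assumptions, and self-loops are excluded.

```python
def alignment_graph(x: str, y: str):
    """Return (vertices, edges) of the alignment graph of x and y."""
    n, m = len(x), len(y)
    vertices = [(i, j) for i in range(n + 1) for j in range(m + 1)]
    edges = []
    for i, j in vertices:
        for di, dj in ((1, 0), (0, 1), (1, 1)):  # vertical, horizontal, diagonal
            if i + di <= n and j + dj <= m:
                edges.append(((i, j), (i + di, j + dj)))
    return vertices, edges

vertices, edges = alignment_graph("ACTAA", "ATCTAA")
print(len(vertices), len(edges))  # 42 101
```

For sequences of lengths n and m this gives (n+1)(m+1) vertices and 3nm + n + m edges, so the graph itself stays small even though the number of paths through it does not.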
Alignment graph
One-to-one correspondence between alignments and paths from source to sink.

Let $x_i$ ($y_j$) denote the i-th letter of x (the j-th letter of y).

For each edge entering (i, j) on the path:

• diagonal: write $x_i$ above $y_j$
• vertical: write $x_i$ above a gap (-)
• horizontal: write a gap (-) above $y_j$
Alignment graph
From alignment to path, for each column of the alignment:

• $x_i$ above $y_j$: go diagonal
• $x_i$ above a gap (-): go vertical
• a gap (-) above $y_j$: go horizontal
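The three rules translate almost directly into code. The sketch below assumes the alignment is given as two gapped rows of equal length with "-" as the gap symbol and no column holding two gaps; the function name is illustrative.

```python
def alignment_to_path(top: str, bottom: str):
    """Map an alignment (two gapped rows) to its path in the alignment graph."""
    i = j = 0
    path = [(0, 0)]  # every path starts at the source
    for a, b in zip(top, bottom):
        if a != "-" and b != "-":
            i, j = i + 1, j + 1  # diagonal: x_i above y_j
        elif b == "-":
            i += 1               # vertical: x_i above a gap
        else:
            j += 1               # horizontal: a gap above y_j
        path.append((i, j))
    return path

print(alignment_to_path("A-CTAA", "ATCTAA"))
# [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
```

The last vertex is the sink (|x|, |y|) = (5, 6), as expected for a complete alignment.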
Graphs to the rescue

Computing the best alignment = computing an optimal path from source to sink

The score of an alignment transfers to the score of its path
Number of possible alignments/paths

grows roughly exponentially with the lengths of the sequences
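To get a feeling for this growth, the number of source-to-sink paths can itself be counted with a small table, since every path into (i, j) arrives through one of three predecessors. This is a sketch under that observation; the function name is an illustrative choice.

```python
def count_paths(n: int, m: int) -> int:
    """Number of source-to-sink paths in the alignment graph for lengths n and m."""
    # paths[i][j] = number of paths from (0, 0) to (i, j); borders have exactly one.
    paths = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1] + paths[i - 1][j - 1]
    return paths[n][m]

for n in (1, 2, 3, 10, 20):
    print(n, count_paths(n, n))  # starts 3, 13, 63 and grows quickly from there
```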


Dynamic programming

• DP = holy grail in computational biology

• Main principle:
  • do not consider every possible path separately
  • compute best paths of length n from best paths of length n-1
  • be smart: avoid recomputation
Dynamic programming to the rescue
• $A = a_1 \dots a_n$, $B = b_1 \dots b_m$

• $s(a_i, b_j)$ = (mis)match score between $a_i$ and $b_j$

• gap = the gap (indel) penalty

• S[i, j] = the score of an optimal path from the source to (i, j)

• S[0, 0] = 0
• S[i, 0] = i × gap
• S[0, j] = j × gap

Example: GTA aligned with --- has score 3 × (-3) = -9
• For i, j ≥ 1, with $s(a_i, b_j)$ the (mis)match score and gap the (negative) indel penalty:

$$S[i, j] = \max \begin{cases} S[i-1, j-1] + s(a_i, b_j) \\ S[i-1, j] + \text{gap} \\ S[i, j-1] + \text{gap} \end{cases}$$

Example scoring scheme:

match = +8

mismatch = -5

gap = -3
Dynamic programming to the rescue

Needleman-Wunsch

Time complexity: O(nm)

Space complexity: O(nm)

Not exponential!
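A compact way to see where the O(nm) bounds come from is to fill the table S directly. The sketch below uses the example scores given above (match +8, mismatch -5, gap -3); the function name `nw_table` and the list-of-lists layout are illustrative assumptions.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # example scoring scheme from the slides

def nw_table(a: str, b: str):
    """Fill the Needleman-Wunsch table; S[i][j] scores a[:i] against b[:j]."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        S[0][j] = j * GAP  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + s,  # diagonal edge
                          S[i - 1][j] + GAP,    # vertical edge
                          S[i][j - 1] + GAP)    # horizontal edge
    return S

print(nw_table("GTA", "")[3][0])            # -9, GTA against three gaps
print(nw_table("ACTAA", "ATCTAA")[5][6])    # 37, five matches and one gap
```

Each of the (n+1)(m+1) entries is computed once from three neighbours, which gives both the time and the space bound.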
Computing the alignment
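One common way to recover an alignment, and not only its score, is to walk back from the sink and check which of the three edges produced each table entry. The following is a sketch of that idea, assuming the example scores match +8, mismatch -5, gap -3; when several optimal alignments exist it returns just one of them, and the function name is illustrative.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # example scoring scheme from the slides

def global_align(a: str, b: str):
    """Return (score, top_row, bottom_row) of one optimal global alignment."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + GAP, S[i][j - 1] + GAP)

    # Traceback: from the sink to the source, following an edge that explains S[i][j].
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(global_align("ACTAA", "ATCTAA"))  # (37, 'A-CTAA', 'ATCTAA')
```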
Dynamic programming versus recursion
funct int G(i,j){
  // b > 0 is the gap penalty, s(i,j) the (mis)match score of x_i and y_j
  if (i == 0) return -b * j;
  if (j == 0) return -b * i;
  m = max( G(i-1,j) - b,
           G(i,j-1) - b,
           G(i-1,j-1) + s(i,j));
  return m;
}
G(5,5)

G(4,5) G(5,4) G(4,4)

G(3,5) G(4,4) G(3,4) G(4,4) G(5,3) G(4,3) G(3,4) G(4,3) G(3,3)
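The call tree shows the same subproblems, such as G(4,4), being solved again and again. The sketch below compares the plain recursion with a memoized version that caches each G(i,j) the first time it is computed. The sequences, the scores (+8, -5, gap penalty 3) and the function names are assumptions chosen for illustration.

```python
from functools import lru_cache

GAP = 3  # positive penalty b, subtracted as in the pseudocode
x, y = "ACTAA", "ATCTAA"

def s(i: int, j: int) -> int:
    # Assumed scores: +8 match, -5 mismatch between x_i and y_j (1-based indices).
    return 8 if x[i - 1] == y[j - 1] else -5

calls = 0  # number of invocations of the plain recursion

def G_naive(i: int, j: int) -> int:
    global calls
    calls += 1
    if i == 0:
        return -GAP * j
    if j == 0:
        return -GAP * i
    return max(G_naive(i - 1, j) - GAP,
               G_naive(i, j - 1) - GAP,
               G_naive(i - 1, j - 1) + s(i, j))

@lru_cache(maxsize=None)
def G_memo(i: int, j: int) -> int:
    if i == 0:
        return -GAP * j
    if j == 0:
        return -GAP * i
    return max(G_memo(i - 1, j) - GAP,
               G_memo(i, j - 1) - GAP,
               G_memo(i - 1, j - 1) + s(i, j))

print(G_naive(len(x), len(y)), "calls:", calls)
print(G_memo(len(x), len(y)), "distinct subproblems:", G_memo.cache_info().misses)
```

Both versions return the same score, but the memoized one evaluates each pair (i, j) at most once, which is exactly the table that dynamic programming fills.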


Dynamic programming

• takes advantage of overlapping subproblems and optimal substructure, and trades space for time to improve the runtime complexity of algorithms.

• The term stems from Richard Bellman, who used it to describe the process of solving problems where one needs to find the best decisions one after another.

• The word "programming" in "dynamic programming" has no particular connection to computer programming; it comes from the term "mathematical programming", a synonym for optimization.
Local alignment

There are $O(n^2)$ fragments of x and $O(m^2)$ fragments of y.

Global alignment on each pair of fragments results in an $O(n^3 m^3)$ algorithm.


Basic idea
• Paths for global alignment always start at the source (0,0) with start score 0
• When the best path to (i,j) has a negative score, it makes little sense to try to extend it
• Instead, use (i,j) as a new source with start score 0
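In table form, this idea amounts to adding 0 as a fourth option in the maximum, so that a path with a negative score is dropped and a fresh one starts at (i, j). The sketch below computes only the best local score and where it ends; the scores (+8, -5, -3) are assumed, reusing the global example, and the function name is illustrative.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # assumed scores, reused from the global example

def local_align_score(a: str, b: str):
    """Return (best_score, (i, j)), where (i, j) is the end of the best local alignment."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]  # borders stay 0: any cell may start a path
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            S[i][j] = max(0,                    # restart instead of extending a negative path
                          S[i - 1][j - 1] + s,
                          S[i - 1][j] + GAP,
                          S[i][j - 1] + GAP)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    return best, best_pos

# The shared fragment ACG scores 3 * 8 = 24 and ends at position 7 of a and 6 of b.
print(local_align_score("TTTTACGTTTTT", "GGGACGGGG"))  # (24, (7, 6))
```

The running time stays O(nm), compared with the $O(n^3 m^3)$ of aligning every pair of fragments separately.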
Local alignment

Smith-Waterman

[Figure: global versus local alignment]
String similarity

• Hamming distance, Levenshtein distance, Jaro-Winkler, Jaccard similarity, q-grams, ...
Divide-and-Conquer

Divide-and-conquer.
• Break up problem into several parts.
• Solve each part recursively.
• Combine solutions to sub-problems into overall solution.

Most common usage.
• Break up problem of size n into two equal parts of size n/2.
• Solve two parts recursively.
• Combine two solutions into overall solution in linear time.

Consequence.
• Brute force: $n^2$.
• Divide-and-conquer: $n \log n$.

Divide et impera. Veni, vidi, vici. - Julius Caesar
Mergesort

Mergesort.
• Divide array into two halves.
• Recursively sort each half.
• Merge two halves to make sorted whole.

John von Neumann (1945)

A L G O R I T H M S

A L G O R | I T H M S    divide: O(1)

A G L O R | H I M S T    sort: 2T(n/2)

A G H I L M O R S T      merge: O(n)
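The three steps can be sketched in a few lines. This version returns a new list rather than sorting in place, which is a simplification chosen for readability; the function names are illustrative.

```python
def merge(left: list, right: list) -> list:
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two
    out.extend(right[j:])  # still has elements left
    return out

def mergesort(items: list) -> list:
    if len(items) <= 1:
        return list(items)                 # nothing to sort
    mid = len(items) // 2                  # divide
    return merge(mergesort(items[:mid]),   # sort both halves recursively
                 mergesort(items[mid:]))   # and merge the results

print("".join(mergesort(list("ALGORITHMS"))))  # AGHILMORST
```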

Merging

Merging. Combine two pre-sorted lists into a sorted whole.

How to merge efficiently?
• Linear number of comparisons.
• Use temporary array.

A G L O R | H I M S T

A G H I

Challenge for the bored. In-place merge, using only a constant amount of extra storage. [Kronrod, 1969]
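To make the "linear number of comparisons" claim tangible, the sketch below merges into a temporary list and counts how often two elements are compared. Each comparison moves one element to the output, so the count can never exceed the total length minus one. The function name is an illustrative choice.

```python
def merge_counting(left: list, right: list):
    """Merge two sorted lists; return (merged, number_of_comparisons)."""
    out = []            # the temporary array
    i = j = comparisons = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # leftovers need no further comparisons
    out.extend(right[j:])
    return out, comparisons

merged, comparisons = merge_counting(list("AGLOR"), list("HIMST"))
print("".join(merged), comparisons)  # AGHILMORST 8
```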


A Useful Recurrence Relation

Def. T(n) = number of comparisons to mergesort an input of size n.

Mergesort recurrence.

$$T(n) \le \begin{cases} 0 & \text{if } n = 1 \\ \underbrace{T(\lceil n/2 \rceil)}_{\text{solve left half}} + \underbrace{T(\lfloor n/2 \rfloor)}_{\text{solve right half}} + \underbrace{n}_{\text{merging}} & \text{otherwise} \end{cases}$$

Solution. $T(n) = O(n \log_2 n)$.

Assorted proofs. We describe several ways to prove this recurrence. Initially we assume n is a power of 2 and replace $\le$ with $=$.
Proof by Induction

Claim. If T(n) satisfies this recurrence, then $T(n) = n \log_2 n$ (assuming n is a power of 2).

$$T(n) = \begin{cases} 0 & \text{if } n = 1 \\ \underbrace{2\,T(n/2)}_{\text{sorting both halves}} + \underbrace{n}_{\text{merging}} & \text{otherwise} \end{cases}$$

Pf. (by induction on n)
• Base case: n = 1.
• Inductive hypothesis: $T(n) = n \log_2 n$.
• Goal: show that $T(2n) = 2n \log_2 (2n)$.

$$\begin{aligned} T(2n) &= 2T(n) + 2n \\ &= 2n \log_2 n + 2n \\ &= 2n(\log_2(2n) - 1) + 2n \\ &= 2n \log_2(2n) \end{aligned}$$
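The claim can also be checked numerically for small powers of 2 by evaluating the recurrence directly. This is only a sanity check under the power-of-2 assumption, not a replacement for the proof.

```python
def T(n: int) -> int:
    """Mergesort recurrence with equality, for n a power of 2."""
    return 0 if n == 1 else 2 * T(n // 2) + n

for k in range(11):
    n = 2 ** k
    assert T(n) == n * k  # n * log2(n), since log2(2**k) = k

print(T(1024))  # 10240 = 1024 * 10
```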
Basic idea: recycle space

• S[i,j] depends only on the previous row (i-1) and the previous column (j-1)
• after computation of row i, row i-1 can be deleted
• Only one row is needed!

• S-[j'] contains S[i,j'] for j' smaller than j
• S-[j'] contains S[i-1,j'] for j' larger than j
• S[n,m] contains the optimal alignment score

• Time complexity: O(nm)
• Space complexity: O(k) for k = min{n,m}
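A minimal sketch of the single-row idea follows, assuming the example scores match +8, mismatch -5, gap -3. One extra variable keeps the diagonal value S[i-1][j-1] that would otherwise be overwritten, and the row is laid along the shorter sequence to get the min{n,m} bound. The function name is illustrative.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # assumed scores, reused from the global example

def nw_score_linear_space(a: str, b: str) -> int:
    """Optimal global alignment score using a single row of the table."""
    if len(b) > len(a):
        a, b = b, a                      # row runs along the shorter sequence
    row = [j * GAP for j in range(len(b) + 1)]   # row 0 of the table
    for i in range(1, len(a) + 1):
        diag = row[0]                    # S[i-1][0], about to be overwritten
        row[0] = i * GAP
        for j in range(1, len(b) + 1):
            up = row[j]                  # S[i-1][j], still the old value
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            row[j] = max(diag + s, up + GAP, row[j - 1] + GAP)
            diag = up                    # diagonal neighbour for column j + 1
    return row[-1]

print(nw_score_linear_space("ACTAA", "ATCTAA"))  # 37
```

Only the score survives in this version: the entries needed to trace the path back have been overwritten.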

What is the problem?


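Complexity
• Space: O(n)
• Time: proportional to the total table area computed over all passes
• nm + nm/2 + nm/4 + ... <= 2nm
• O(nm)

[Figure: fraction of the table computed per pass. First pass: 1, 2nd pass: 1/2, 3rd pass: 1/4, 4th pass: 1/8]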
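The 2nm bound follows from the geometric series. Assuming, as the pass labels suggest, that pass k recomputes a fraction $2^{-k}$ of the full table (counting the first pass as k = 0), the total work is bounded by

$$\sum_{k=0}^{\infty} \frac{nm}{2^k} = nm \cdot \frac{1}{1 - \tfrac{1}{2}} = 2nm .$$

Any finite number of passes sums to less than this limit, so the total time stays O(nm) despite the repeated passes.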
