
Sequence Alignment

This document discusses DNA sequence alignment and describes how to find the optimal alignment of two DNA sequences using dynamic programming. It first provides background on DNA sequences and similarity. It then explains the naive recursive algorithm before describing the dynamic programming approach, which builds up a table of partial solutions in order to reuse computations and run in quadratic time.

DNA Sequence Alignment

A dynamic programming algorithm


Some ideas stolen from Winter 1996 offering of 590BI at
http://www/education/courses/590bi/98wi/
See Lecture 2 by Prof. Ruzzo. Or try current quarter of CSE 527.
Those slides are more detailed and biologically accurate.
DNA Sequence Alignment (aka “Longest Common Subsequence”)
• The problem
– What is a DNA sequence?
– DNA similarity
– What is DNA sequence alignment?
– Using English words
• The Naïve algorithm
• The Dynamic Programming algorithm
• Idea of Dynamic Programming
What is a DNA sequence
• DNA: string using letters A,C,G,T
– Letter = DNA “base”
– e.g. AGATGGGCAAGATA
• DNA makes up your “genetic code”
DNA similarity
• DNA can mutate.
– Change a letter
• AACCGGTT → ATCCGGTT
– Insert a letter
• AACCGGTT → ATACCGGTT
– Delete a letter
• AACCGGTT → ACCGGTT
• A few mutations make sequences different, but “similar”
Why is DNA similarity important?
• New sequences compared to existing sequences
• Similar sequences often have similar function
• Most widely used algorithm in computational biology tools
– e.g. BLAST at http://www.ncbi.nlm.nih.gov/BLAST/
What is DNA sequence alignment?
• Match 2 sequences, with underscore ( _ ) wildcards.
• Best alignment = minimum underscores (slight simplification, but okay for 326)
• e.g. ACCCGTTT
       TCCCTTT

Best alignment (3 underscores):
A_CCCGTTT
_TCCC_TTT
Moving to English words

zasha
ashes

zash__a
_ashes_
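The two example alignments above can be checked mechanically. The following is a minimal sketch, assuming the slides' simplified model in which every column either pairs two identical letters or pairs one letter with an underscore; the function name `count_underscores` is hypothetical and not part of the original slides.

```python
def count_underscores(row1, row2):
    """Validate an alignment under the slides' simplified model and
    return its cost, i.e. the total number of underscores."""
    if len(row1) != len(row2):
        raise ValueError("both rows of an alignment must have the same length")
    for a, b in zip(row1, row2):
        if a == "_" and b == "_":
            raise ValueError("a column may not pair two underscores")
        if a != "_" and b != "_" and a != b:
            raise ValueError(f"letters {a!r} and {b!r} do not match")
    return row1.count("_") + row2.count("_")


print(count_underscores("A_CCCGTTT", "_TCCC_TTT"))  # DNA example
print(count_underscores("zash__a", "_ashes_"))      # English example
```

A lower count means a better alignment, so the function doubles as the scoring rule that the algorithms on the following slides try to minimize.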
Naïve algorithm
• Try every way to put in underscores
• If it works, and is best so far, record it.
• At end, return best solution.
Naïve Algorithm – Running Time
• Strings of size M, N: $\Theta(2^{M+N})$
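As an illustration of the naïve approach, here is a minimal brute-force sketch. It assumes the same simplified model as the slides (a column is either two equal letters or one letter against an underscore); the names `naive_best_alignment` and `explore` are hypothetical. It tries every way to place underscores, so it is only practical for very short strings.

```python
def naive_best_alignment(s1, s2):
    """Enumerate every alignment of s1 and s2 and return
    (underscores, row1, row2) for the first best one found."""
    best = None

    def explore(i, j, row1, row2, gaps):
        nonlocal best
        if i == len(s1) and j == len(s2):
            # A complete alignment: record it if it is the best so far.
            if best is None or gaps < best[0]:
                best = (gaps, row1, row2)
            return
        if i < len(s1) and j < len(s2) and s1[i] == s2[j]:
            # Pair two equal letters: no underscore needed.
            explore(i + 1, j + 1, row1 + s1[i], row2 + s2[j], gaps)
        if i < len(s1):
            # Pair the next letter of s1 with an underscore.
            explore(i + 1, j, row1 + s1[i], row2 + "_", gaps + 1)
        if j < len(s2):
            # Pair the next letter of s2 with an underscore.
            explore(i, j + 1, row1 + "_", row2 + s2[j], gaps + 1)

    explore(0, 0, "", "", 0)
    return best


print(naive_best_alignment("zasha", "ashes"))
```

Because nothing is remembered between branches, the same pair of suffixes is re-explored many times, which is the waste the table-based approach removes.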
Dynamic Approach – A table
• Table(x,y): best alignment for first x letters of string 1, and first y letters of string 2
• Decide what to do with the end of string, then look up best alignment of remainder in Table.
e.g. ‘a’ vs. ‘s’
• “zasha” vs. “ashes”. 2 possibilities for last letters:
– (1) match ‘a’ with ‘_’:
• best_alignment(“zash”, “ashes”)+1
– (2) match ‘s’ with ‘_’:
• best_alignment(“zasha”, “ashe”)+1
⇒ best_alignment(“zasha”, “ashes”)
= min(best_alignment(“zash”, “ashes”)+1,
best_alignment(“zasha”, “ashe”)+1)
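The ‘a’ vs. ‘s’ example is the mismatch case of a general recurrence. Written out as a sketch, with the notation $T(x,y)$ for Table(x,y) and $X_x$, $Y_y$ for the last letters of the two prefixes being assumptions of this note, and with the match and empty-prefix cases taken from the pseudocode later in these slides:

```latex
T(x, 0) = x, \qquad T(0, y) = y,
\qquad
T(x, y) =
\begin{cases}
  T(x-1,\, y-1) & \text{if } X_x = Y_y \\[4pt]
  \min\bigl(T(x-1,\, y),\; T(x,\, y-1)\bigr) + 1 & \text{otherwise}
\end{cases}
```

The boundary cases say that aligning a prefix against an empty string costs one underscore per letter.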
An example
         (empty)  Z  A  S  H  A
(empty)
A
S
H
E
S
Example with solution
         (empty)  Z  A  S  H  A
(empty)     0     1  2  3  4  5
A           1     2  1  2  3  4
S           2     3  2  1  2  3
H           3     4  3  2  1  2
E           4     5  4  3  2  3
S           5     6  5  4  3  4

Best alignment (4 underscores):
zasha__
_ash_es
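The filled table only gives the number of underscores; an actual alignment can be read off by walking back from the bottom-right cell. The following is a rough sketch of that walk, assuming the table is laid out as on the slide (columns for “zasha”, rows for “ashes”); the names `SLIDE_TABLE` and `traceback` are hypothetical. Ties are broken arbitrarily, so the result may differ from the alignment shown above while using the same number of underscores.

```python
# The solved table from the slide: SLIDE_TABLE[r][c] is the cost of aligning
# the first c letters of "zasha" with the first r letters of "ashes".
SLIDE_TABLE = [
    [0, 1, 2, 3, 4, 5],
    [1, 2, 1, 2, 3, 4],
    [2, 3, 2, 1, 2, 3],
    [3, 4, 3, 2, 1, 2],
    [4, 5, 4, 3, 2, 3],
    [5, 6, 5, 4, 3, 4],
]


def traceback(table, cols, rows):
    """Recover one best alignment by walking from the bottom-right cell
    back to the top-left cell of a filled table."""
    top, bottom = [], []
    r, c = len(rows), len(cols)
    while r > 0 or c > 0:
        if r > 0 and c > 0 and rows[r - 1] == cols[c - 1]:
            # Letters match: the cell was copied from the diagonal neighbour.
            top.append(cols[c - 1])
            bottom.append(rows[r - 1])
            r, c = r - 1, c - 1
        elif r == 0 or (c > 0 and table[r][c - 1] <= table[r - 1][c]):
            # Cheaper (or only) option: pair a letter of cols with '_'.
            top.append(cols[c - 1])
            bottom.append("_")
            c -= 1
        else:
            # Otherwise pair a letter of rows with '_'.
            top.append("_")
            bottom.append(rows[r - 1])
            r -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


top, bottom = traceback(SLIDE_TABLE, "zasha", "ashes")
print(top)
print(bottom)
```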
Pseudocode (bottom-up)
Given: Strings X,Y of lengths x,y; Table[0..x,0..y]

For i=0 to x do
  Table[i,0]=i
For j=0 to y do
  Table[0,j]=j
i=1, j=1
While i<=x and j<=y
  If X[i]=Y[j] Then
    // matches – no underscores
    Table[i,j]=Table[i-1,j-1]
  Else
    Table[i,j]=min(Table[i-1,j],Table[i,j-1])+1
  End If
  i=i+1
  If i>x Then
    i=1
    j=j+1
  End If
End While
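For comparison with the bottom-up pseudocode, here is a minimal Python sketch of the same table fill. It assumes 0-indexed Python strings (so letter i of X is `x[i - 1]`) and uses two nested loops in place of the single counter loop; the function name `fill_table` is hypothetical.

```python
def fill_table(x, y):
    """Fill the table bottom-up: table[i][j] is the minimum number of
    underscores needed to align the first i letters of x with the
    first j letters of y."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        table[i][0] = i  # prefix of x against the empty string
    for j in range(len(y) + 1):
        table[0][j] = j  # empty string against prefix of y
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Last letters match: no new underscore.
                table[i][j] = table[i - 1][j - 1]
            else:
                # Put an underscore against one of the two last letters.
                table[i][j] = min(table[i - 1][j], table[i][j - 1]) + 1
    return table


table = fill_table("zasha", "ashes")
print(table[len("zasha")][len("ashes")])  # cost of the best alignment
```

Each cell depends only on its left, upper, and upper-left neighbours, so any fill order that visits those three first would work equally well.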
Pseudocode (top-down)
Given: Strings X,Y , Table[0..x,0..y] (all entries initially empty)

BestAlignment (x,y)
  If Table[x,y] is already filled in Then
    Return Table[x,y]
  End If
  If x=0 or y=0 Then
    // one prefix is empty – every remaining letter needs an underscore
    Table[x,y]=x+y
  Else If X[x]=Y[y] Then
    // matches – no underscores
    Table[x,y]=BestAlignment(x-1,y-1)
  Else
    Table[x,y]=min(BestAlignment(x-1,y),BestAlignment(x,y-1))+1
  End If
  Return Table[x,y]
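The top-down version can be sketched in Python by letting a cache play the role of the table. This is one possible rendering, not the slides' own code: it assumes `functools.lru_cache` from the standard library as the memo table, and the names `best_alignment` and `best` are hypothetical.

```python
from functools import lru_cache


def best_alignment(x, y):
    """Return the minimum number of underscores needed to align x and y,
    computing each (i, j) subproblem at most once."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # best(i, j): cost for the first i letters of x and first j of y.
        if i == 0 or j == 0:
            # One prefix is empty: every remaining letter needs an underscore.
            return i + j
        if x[i - 1] == y[j - 1]:
            # Last letters match: no new underscore.
            return best(i - 1, j - 1)
        return min(best(i - 1, j), best(i, j - 1)) + 1

    return best(len(x), len(y))


print(best_alignment("zasha", "ashes"))
```

Only the subproblems actually reached get computed, which is part of why top-down is often easier to write. For long strings the recursion depth may exceed Python's default limit, in which case the bottom-up form is the safer choice.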
Running time
• Every square in table is filled in once
• Filling it in is constant time
⇒ $\Theta(n^2)$ squares
⇒ alg is $\Theta(n^2)$
Idea of dynamic programming

[Picture: Albert Q. Dynamic at Whistler mountain. Picture from PhotoDisc.com]

• Re-use expensive computations


– Identify critical input to problem (e.g. best
alignment of prefixes of strings)
– Store results in table, indexed by critical input
– Solve cells in table in terms of other cells
• Top-down often easier to program
