0% found this document useful (0 votes)
3 views

L26 DynamicProgramming III

This document discusses the longest common subsequence (LCS) problem and its solution using dynamic programming. It begins by defining the LCS problem and providing an example. It then describes a brute force solution with exponential runtime. The document introduces the dynamic programming approach, showing that the problem exhibits optimal substructure and overlapping subproblems. It presents the recursive formula and uses a table to compute the length of the LCS in quadratic time. Finally, it explains how to reconstruct the actual LCS sequence by tracing back in the table.

Uploaded by

Akash Sahu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

L26 DynamicProgramming III

This document discusses the longest common subsequence (LCS) problem and its solution using dynamic programming. It begins by defining the LCS problem and providing an example. It then describes a brute force solution with exponential runtime. The document introduces the dynamic programming approach, showing that the problem exhibits optimal substructure and overlapping subproblems. It presents the recursive formula and uses a table to compute the length of the LCS in quadratic time. Finally, it explains how to reconstruct the actual LCS sequence by tracing back in the table.

Uploaded by

Akash Sahu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

Dynamic Programming - III

Longest Common Sequence

Instructor: Ashok Singh Sairam


Lecture Plan
• Longest Common Sequence
 Motivation and Problem Definition
 Brute Force Approach
 Dynamic Programming Approach

MA512: Data Structures and Algorithms


2
Longest Common Subsequence (LCS)
• We have two strings X and Y and we want to find
the longest possible string that is a subsequence
of both X and Y.
• When we say that Z is a subsequence of X, we
mean we can derive Z from X by removing some of
its elements

MA512: Data Structures and Algorithms


3
Ex: LCS
• X = A, B, C, B, D, A, B X = A, B, C, B, D, A, B
• Y = B, D, C, A, B, A Y = B, D, C, A, B, A

• B, C, B, A and B, D, A, B are longest


common subsequences of X and Y (length = 4)

• B, C, A, however is not a LCS of X and Y

MA512: Data Structures and Algorithms


4
Practical example of LCS
• DNA string: A sequence of symbols A, C, G, T
 S= ACCGGTCGAGCTTCGAATT

• DNA analysis: two DNA string comparison

MA512: Data Structures and Algorithms


5
Practical example of LCS
• DNA strand has millions of base elements and even a
simple gene has many thousands of elements.
 Efficiency in comparison is important

• Exact match of gene unlikely, so researchers look for


common subsequences of gene and the DNA strand
 If subsequence is most of length of gene sequence,

strand is regarded as containing the gene.

MA512: Data Structures and Algorithms


6
Problem Defintion
• Given two sequences
X = x1, x2, …, xm
Y = y1, y2, …, yn
find a maximum length common subsequence
(LCS) of X and Y

MA512: Data Structures and Algorithms


7
Brute-Force Solution
• For every subsequence of X, check whether it’s a
subsequence of Y

• There are 2m subsequences of X to check


• Each subsequence takes (n) time to check
 scan Y for first letter, scan for second, and so on

• Running time: (n2m)

MA512: Data Structures and Algorithms


8
Notations
• Given a sequence X = x1, x2, …, xm we define the
i-th prefix of X, for i = 0, 1, 2, …, m

Xi = x1, x2, …, xi

• c[i, j] = the length of an LCS of the sequences


Xi = x1, x2, …, xi and Yj = y1, y2, …, yj

MA512: Data Structures and Algorithms


9
LCS DP: Step 1
• Characterize optimal substructure

MA512: Data Structures and Algorithms


10
What the theorem says?
• Case 1: If 𝑥𝑚 = 𝑦𝑛 find LCS of 𝑥𝑚−1 and 𝑦𝑛−1 ,
then append 𝑥𝑚
 Include one element into the common sequence (Z) and
solve the resulting subproblem
 One subproblem

X = A, B, D, E
Y = Z, B, E

MA512: Data Structures and Algorithms


11
What the theorem says? – contd.
• Case2: If 𝑥𝑚 ≠ 𝑦𝑛 find LCS of 𝑥𝑚−1 and 𝑦𝑛 and LCS
of 𝑥𝑚 and 𝑦𝑛−1 , take which one is longer
 Exclude an element from a string and solve the resulting
subproblem
 Results in two subproblems
X = A, B, D, G
Y = Z, B, D

• find a LCS of Xi-1 and Yj: Xi-1 = A, B, D and Yj = Z, B, D


• find a LCS of Xi and Yj-1: Xi = A, B, D, G and Yj = Z, B

MA512: Data Structures and Algorithms


12
Overlapping substructure
• Overlapping substructure
 Both LCS of 𝑥𝑚−1 and 𝑦𝑛 and
LCS of 𝑥𝑚 and 𝑦𝑛−1 ,
will need to solve LCS of 𝑥𝑚−1 and 𝑦𝑛−1

MA512: Data Structures and Algorithms


13
LCS DP: Step 2 – Recursive solution
• Recall c[i, j] = Length of LCS of sequences Xi and Yj
• Case 1: xi = yj
c[i, j] = c[i - 1, j - 1] + 1
• Case 2: xi  yj
c[i, j] = max { c[i - 1, j], c[i, j-1] }
• Recursive formula

MA512: Data Structures and Algorithms


14
Overlapping Subproblems
• To find a LCS of X and Y
 we may need to find the LCS between Xm and Yn-1 and
that of Xm-1 and Y
 Both the above subproblems has the subproblem of
finding the LCS of Xm-1 and Yn-1

• Subproblems share subsubproblems

MA512: Data Structures and Algorithms


15
LCS DP: Step 3 – Computing length
of the LCS

j
• Store the value of yj: y1 y2 yn
the c[i,j] in a table 0 xi 0 0 0 0 0 0
 Compute the 1 x1 first
0
entries in a row-
2 x2 0 second
major order i
0
0
m xm 0
0 1 2 n
MA512: Data Structures and Algorithms
16
Additional information
A matrix b[i, j]:
• For a subproblem [i, j] it 0 1 2 3 n
tells us what choice was b & c: yj: A C D F
made to obtain the 0 xi 0 0 0 0 0 0
optimal value 1 A 0
• If xi = yj
2 B
b[i, j] = “ ” 0 c[i-1,j]
i
• Else, if c[i - 1, j] ≥ c[i, j-1] 3 C 0 c[i,j-1]
b[i, j] = “  ” 0
else m D 0
b[i, j] = “  ”
j

MA512: Data Structures and Algorithms


17
LCS-LENGTH(X, Y)
1. for i ← 1 to m
The length of the LCS if one of the sequences
2. do c[i, 0] ← 0 is empty is zero
3. for j ← 0 to n
4. do c[0, j] ← 0
5. for i ← 1 to m
6. do for j ← 1 to n
7. do if xi = yj
Case 1: xi = yj
8. then c[i, j] ← c[i - 1, j - 1] + 1
9. b[i, j ] ← “ ”
10. else if c[i - 1, j] ≥ c[i, j - 1]
11. then c[i, j] ← c[i - 1, j]
12. b[i, j] ← “↑”
Case 2: xi  yj
13. else c[i, j] ← c[i, j - 1]
14. b[i, j] ← “←”
15. return c and b
Running time: (mn)
MA512: Data Structures and Algorithms
18
0 if i = 0 or j = 0
Example c[i, j] = c[i-1, j-1] + 1 if xi = yj
max(c[i, j-1], c[i-1, j]) if xi  yj
X = A, B, C, B, D, A
Y = B, D, C, A, B, A 0 1 2 3 4 5 6
yj B D C A B A
If xi = yj 0 xi 0 0 0 0 0 0 0
b[i, j] = “ ” 1 A   
0 0 0 0 1 1 1
Else if c[i - 1, j] ≥ c[i, j-1] 
2 B 0 1 1 1 1 2 2
b[i, j] = “  ”
3 C    
else 0 1 1 2 2 2 2
4 B   
b[i, j] = “  ” 0 1 1 2 2 3 3
5 D     
0 1 2 2 2 3 3
   
6 A 0 1 2 2 3 3 4
   
7 B 0 1 2 2 3 4 4

19
4. Constructing a LCS
• Start at b[m, n] and
follow the arrows 0 1 2 3 4 5 6
• When we encounter yj B D C A B A
a “ “ in b[i, j]  0 xi 0 0 0 0 0 0 0
xi = yj is an 1 A 0

0

0

0 1 1 1
element of the LCS 2 B 
0 1 1 1 1 2 2
3 C    
0 1 1 2 2 2 2
4 B   
0 1 1 2 2 3 3
5 D     
0 1 2 2 2 3 3
   
6 A 0 1 2 2 3 3 4
   
7 B 0 1 2 2 3 4 4
20
PRINT-LCS(b, X, i, j)
1. if i = 0 or j = 0 Running time: (m + n)
2. then return
3. if b[i, j] = “ ”
4. then PRINT-LCS(b, X, i - 1, j - 1)
5. print xi
6. elseif b[i, j] = “↑”
7. then PRINT-LCS(b, X, i - 1, j)
8. else PRINT-LCS(b, X, i, j - 1)
Initial call: PRINT-LCS(b, X, length[X], length[Y])
21
Improving the Code
• What can we say about how each entry c[i, j] is
computed?
 It depends on c[i -1, j - 1], c[i - 1, j], and c[i, j - 1]
 Eliminate table b and compute in O(1) which of the
three values was used to compute c[i, j]
 We save (mn) space from table b
 However, we do not asymptotically decrease the
auxiliary space requirements: still need table c

MA512: Data Structures and Algorithms


22
Improving the Code – contd.
• If we only need the length of the LCS
 LCS-LENGTH works only on two rows of c at a time

• The row being computed and the previous row

 We can reduce the asymptotic space requirements by


storing only these two rows

MA512: Data Structures and Algorithms


23
Exercise

MA512: Data Structures and Algorithms


24

You might also like