Notes 17

CS109A Notes for Lecture 3/11/96

Substrings

In ML notation, x is a substring of y if y = u^x^v
for some strings u and v.
Similar notion for lists, i.e., y = u@x@v.
Example: The substrings of aba are ε (the empty
string), a, b, ab, ba, and aba.
Note that a substring need not be proper (i.e.,
less than the whole string).
Special case: a prefix of y is any substring x
that begins at the beginning of y.
Example: Prefixes of aba are ε, a, ab, and aba.
Special case: a suffix of y is a substring that
ends at the end of y.
Example: Suffixes of aba are ε, a, ba, and aba.
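
As a minimal sketch of these definitions on lists (apply explode to use them on strings), the checks below test the prefix, suffix, and substring relations. The names isPrefix, isSuffix, and isSublist are illustrative assumptions; they do not come from the notes or from FCS.

```sml
(* isPrefix (x, y): true when y = x @ v for some list v *)
fun isPrefix (nil, _) = true
  | isPrefix (_, nil) = false
  | isPrefix (a::xs, b::ys) = (a = b) andalso isPrefix (xs, ys);

(* isSuffix (x, y): true when y = u @ x for some list u.
   Reversing both lists turns a suffix test into a prefix test. *)
fun isSuffix (x, y) = isPrefix (rev x, rev y);

(* isSublist (x, y): true when y = u @ x @ v for some u and v,
   i.e. x is a prefix of some suffix of y *)
fun isSublist (x, nil) = null x
  | isSublist (x, y as _::ys) = isPrefix (x, y) orelse isSublist (x, ys);
```

For instance, isSublist (explode "ba", explode "aba") evaluates to true, while isSublist (explode "aa", explode "aba") evaluates to false, since aa does not occur contiguously in aba.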

Subsequences

A subsequence of a string y is what we can obtain
by striking out 0 or more of the positions of y.
Example: Subsequences of aba are ε, a, b, ab, ba,
aa, and aba.
A common subsequence of x and y is a string
that is a subsequence of both.
A longest common subsequence (LCS) of x
and y is a common subsequence of x and y
that is as long as any common subsequence
of these strings.
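
The subsequence test can be sketched as a single left-to-right scan on lists. This is a hedged illustration only; the names isSubseq and isCommonSubseq are assumptions introduced here, not functions from the notes.

```sml
(* isSubseq (z, y): true when z can be obtained from y by striking out
   0 or more positions. Matching each element of z at its earliest
   possible position in y never rules out a later match. *)
fun isSubseq (nil, _) = true
  | isSubseq (_, nil) = false
  | isSubseq (z as a::zs, b::ys) =
      if a = b then isSubseq (zs, ys) else isSubseq (z, ys);

(* isCommonSubseq (z, x, y): z is a subsequence of both x and y *)
fun isCommonSubseq (z, x, y) = isSubseq (z, x) andalso isSubseq (z, y);
```

For instance, isSubseq (explode "aa", explode "aba") evaluates to true even though aa is not a substring of aba.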

Why LCS's?

Secret of the UNIX diff command (find the
differences between two files).
diff finds an LCS of the two files and assumes the changes are "everything else."
Generalizations important in matching of
DNA sequences.

An Exponential LCS Algorithm

The following assumes two lists (not strings) and
computes their LCS:

fun lcs(_, nil) = nil
  | lcs(nil, _) = nil
  | lcs(x::xs, y::ys) =
      if x = y then x :: lcs(xs, ys)   (* heads match: keep x *)
      else
        let
          val l1 = lcs(xs, y::ys)      (* drop the head of the first list *)
          val l2 = lcs(x::xs, ys)      (* drop the head of the second list *)
        in
          if length(l1) > length(l2) then l1 else l2
        end;

Problem: If size n = sum of the lengths of the
lists, then there are two recursive calls to lcs
on arguments of one smaller size.
Leads to recurrence relation $T(n) = O(n) + 2T(n-1)$,
with solution $O(2^n)$.
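
To see the blow-up concretely, one can count invocations instead of building the LCS. The sketch below mirrors the case structure of lcs; the name calls is a hypothetical helper introduced here for illustration, and it counts calls only, ignoring the O(n) cost of length.

```sml
(* calls (x, y): number of invocations the exponential lcs would make
   on lists x and y, counting the top-level call itself *)
fun calls (_, nil) = 1
  | calls (nil, _) = 1
  | calls (x::xs, y::ys) =
      if x = y then 1 + calls (xs, ys)
      else 1 + calls (xs, y::ys) + calls (x::xs, ys);
```

When the two lists share no elements, every non-basis call branches twice: calls ([1,2], [3,4]) evaluates to 11 and calls ([1,2,3,4], [5,6,7,8]) evaluates to 139.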

Dynamic Programming Solution

Recursions like this waste time because they wind
up solving the same problem repeatedly.
Example: If x = [1, 2, 3, 4] and y = [a, b, c, d], we
call lcs twice on ([2, 3, 4], [b, c, d]), at least four times on
([3, 4], [c, d]), and so on.
Dynamic programming solutions tabulate the
answers to subproblems, so they are available
for use many times.
Example: The most common example is computing
$\binom{n}{m}$ by the recursion
$\binom{n}{m} = \binom{n-1}{m-1} + \binom{n-1}{m}$
vs. computing it by Pascal's triangle (see p. 172,
FCS).
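
The contrast can be sketched as follows, assuming 0 <= m <= n. The names chooseRec, nextRow, pascalRow, and choose are illustrative assumptions, and this is not the code from FCS. The first function follows the recursion directly and recomputes shared subproblems; the second builds each row of Pascal's triangle once from the previous row.

```sml
(* Direct recursion: exponential, since the same entries are recomputed.
   Assumes 0 <= m <= n. *)
fun chooseRec (n, m) =
  if m = 0 orelse m = n then 1
  else chooseRec (n - 1, m - 1) + chooseRec (n - 1, m);

(* One row of Pascal's triangle from the previous row: each entry is
   the sum of the two entries above it. *)
fun nextRow row = ListPair.map (op +) (0 :: row, row @ [0]);

(* Row n of Pascal's triangle, built from row 0 upward *)
fun pascalRow 0 = [1]
  | pascalRow n = nextRow (pascalRow (n - 1));

(* Tabulated version: look up entry m of row n. Assumes 0 <= m <= n. *)
fun choose (n, m) = List.nth (pascalRow n, m);
```

Both chooseRec (4, 2) and choose (4, 2) evaluate to 6, but choose computes each triangle entry only once.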
For LCS, build an array L such that $L[i][j]$ is
the length of the LCS for the first i positions
of x and the first j positions of y.
Given this array, filled in, one can easily
recover an LCS (see p. 324, FCS).

Fill in order of $i + j$.
Basis: $i + j = 0$. Surely $L[0][0] = 0$.

Induction:

If either i or j is 0, then $L[i][j] = 0$.
If neither is 0, consider $a_i$ and $b_j$, the ith and
jth elements of strings x and y, respectively.
If $a_i = b_j$, $L[i][j] = 1 + L[i-1][j-1]$.
Otherwise, $L[i][j]$ is the larger of
$L[i][j-1]$ and $L[i-1][j]$.
Either way, the L entries needed have already
been computed.
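
A minimal sketch of the table-filling rule, using Array2 and Vector from the Standard ML Basis Library. The names lcsDP, tbl, same, entry, fill, and recover are assumptions made for illustration. Two departures from the notes are worth stating plainly: the sketch fills the table row by row rather than in order of i + j (either order works, because each entry depends only on entries with smaller i or smaller j), and the walk back through the table is one simple way to recover an LCS, not necessarily the method on p. 324 of FCS.

```sml
(* lcsDP (xs, ys): returns (length of an LCS, one LCS as a list) *)
fun lcsDP (xs, ys) =
  let
    val x = Vector.fromList xs
    val y = Vector.fromList ys
    val p = Vector.length x
    val q = Vector.length y
    (* tbl[i][j] = LCS length of the first i elements of x and the
       first j elements of y; row 0 and column 0 stay 0 *)
    val tbl = Array2.array (p + 1, q + 1, 0)
    (* does the ith element of x equal the jth element of y? (1-based) *)
    fun same (i, j) = (Vector.sub (x, i - 1) = Vector.sub (y, j - 1))
    fun entry (i, j) =
      if same (i, j) then 1 + Array2.sub (tbl, i - 1, j - 1)
      else Int.max (Array2.sub (tbl, i, j - 1), Array2.sub (tbl, i - 1, j))
    (* fill rows 1..p, each from column 1 to column q *)
    fun fill (i, j) =
      if i > p then ()
      else if j > q then fill (i + 1, 1)
      else (Array2.update (tbl, i, j, entry (i, j)); fill (i, j + 1))
    (* walk back from (p, q), collecting the matched elements *)
    fun recover (i, j, acc) =
      if i = 0 orelse j = 0 then acc
      else if same (i, j)
        then recover (i - 1, j - 1, Vector.sub (x, i - 1) :: acc)
      else if Array2.sub (tbl, i - 1, j) >= Array2.sub (tbl, i, j - 1)
        then recover (i - 1, j, acc)
      else recover (i, j - 1, acc)
  in
    fill (1, 1);
    (Array2.sub (tbl, p, q), recover (p, q, nil))
  end;
```

For instance, lcsDP (explode "aba", explode "ba") evaluates to (2, [#"b", #"a"]).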

Running Time of LCS

If n = sum of lengths of strings, time is $O(n^2)$.
Fill at most $(n+1)^2$ entries, each in $O(1)$ time.
