Lecture 03


Lcp-Comparisons

General (non-string) comparison-based sorting algorithms are not optimal
for sorting strings because of an imbalance between effort and result in a
string comparison: it can take a lot of time, but the result is only a bit or a
trit of useful information.

String quicksort solves this problem by processing the obtained information
immediately after each symbol comparison.
An opposite approach is to replace a standard string comparison with an
lcp-comparison, which is the operation LcpCompare(A, B, k):

• The return value is the pair (x, ℓ), where x ∈ {<, =, >} indicates the
order, and ℓ = lcp(A, B), the length of the longest common prefix of
strings A and B.

• The input value k is the length of a known common prefix, i.e., a lower
bound on lcp(A, B). The comparison can skip the first k characters.

The extra time spent in the comparison is balanced by the extra information
obtained in the form of the lcp value.
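The operation can be sketched in Python as follows (a straightforward scan; the function name and the tuple return convention are illustrative, not fixed by the lecture):

```python
def lcp_compare(A, B, k=0):
    """LcpCompare(A, B, k): compare A and B, skipping the first k
    characters, which are assumed to form a known common prefix.
    Returns (x, l) with x in {'<', '=', '>'} and l = lcp(A, B)."""
    i = k
    while i < len(A) and i < len(B) and A[i] == B[i]:
        i += 1                      # extend the common prefix
    if i < len(A) and i < len(B):
        return ('<' if A[i] < B[i] else '>'), i
    if len(A) == len(B):
        return '=', i               # the strings are equal
    # one string is a proper prefix of the other; the shorter is smaller
    return ('<' if len(A) < len(B) else '>'), i
```

With k > 0 the scan starts at position k, so the comparison cost is proportional to the new characters inspected, not to the whole strings.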
The following result shows how we can use the information from earlier
comparisons to obtain a lower bound or even the exact value for an lcp.
Lemma 1.30: Let A, B and C be strings.

(a) lcp(A, C) ≥ min{lcp(A, B), lcp(B, C)}.

(b) If A ≤ B ≤ C, then lcp(A, C) = min{lcp(A, B), lcp(B, C)}.

(c) If lcp(A, B) ≠ lcp(B, C), then lcp(A, C) = min{lcp(A, B), lcp(B, C)}.

Proof. Assume ℓ = lcp(A, B) ≤ lcp(B, C). The opposite case
lcp(A, B) ≥ lcp(B, C) is symmetric.

(a) Now A[0..ℓ) = B[0..ℓ) = C[0..ℓ) and thus lcp(A, C) ≥ ℓ.

(b) Either |A| = ℓ or A[ℓ] < B[ℓ] ≤ C[ℓ]. In either case, lcp(A, C) = ℓ.

(c) Now lcp(A, B) < lcp(B, C). If lcp(A, C) > min{lcp(A, B), lcp(B, C)}, then
lcp(A, B) < min{lcp(A, C), lcp(B, C)}, which violates (a). □

The above means that the three lcp values between three strings can never
be three different values. At least two of them are the same and the third
one is the same or bigger.
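This claim is easy to check exhaustively for short binary strings; the following sketch (helper names are illustrative) verifies that the minimum of the three pairwise lcp values always occurs at least twice:

```python
from itertools import product

def lcp(A, B):
    """Length of the longest common prefix of A and B."""
    i = 0
    while i < len(A) and i < len(B) and A[i] == B[i]:
        i += 1
    return i

# All binary strings of length < 4, and all ordered triples of them.
strings = [''.join(p) for n in range(4) for p in product('ab', repeat=n)]
for A, B, C in product(strings, repeat=3):
    values = sorted([lcp(A, B), lcp(B, C), lcp(A, C)])
    # the two smallest of the three lcp values must be equal
    assert values[0] == values[1]
```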
It is sometimes possible to determine the order of two strings without
comparing them directly.

Lemma 1.31: Let A, B, B′ and C be strings such that A ≤ B ≤ C and
A ≤ B′ ≤ C.

(a) If lcp(A, B) > lcp(A, B′), then B < B′.

(b) If lcp(B, C) > lcp(B′, C), then B > B′.

Proof. We show (a); (b) is symmetric. Assume to the contrary that B ≥ B′.
Then by Lemma 1.30, lcp(A, B) = min{lcp(A, B′), lcp(B′, B)} ≤ lcp(A, B′),
which is a contradiction. □

Intuitively, the above result makes sense if you think of lcp(·, ·) as a measure
of similarity between two strings. The higher the lcp, the closer the two
strings are lexicographically.

String Mergesort
String mergesort is a string sorting algorithm that uses lcp-comparisons. It
has the same structure as the standard mergesort: sort the first half and the
second half separately, and then merge the results.
Algorithm 1.32: StringMergesort(R)
Input: Set R = {S1 , S2 , . . . , Sn } of strings.
Output: R sorted and augmented with LCPR values.
(1) if |R| = 1 then return ((S1 , 0))
(2) m ← ⌊n/2⌋
(3) P ← StringMergesort({S1 , S2 , . . . , Sm })
(4) Q ← StringMergesort({Sm+1 , Sm+2 , . . . , Sn })
(5) return StringMerge(P, Q)

The output is of the form

((T1, ℓ1), (T2, ℓ2), . . . , (Tn, ℓn))

where ℓi = lcp(Ti, Ti−1) for i > 1 and ℓ1 = 0. In other words, ℓi = LCPR[i].
Thus we get not only the order of the strings but also a lot of information
about their common prefixes. The procedure StringMerge uses this
information effectively.
Algorithm 1.33: StringMerge(P, Q)
Input: Sequences P = ((S1, k1), . . . , (Sm, km)) and Q = ((T1, ℓ1), . . . , (Tn, ℓn))
Output: Merged sequence R
(1) R ← ∅; i ← 1; j ← 1
(2) while i ≤ m and j ≤ n do
(3) if ki > ℓj then append (Si, ki) to R; i ← i + 1
(4) else if ℓj > ki then append (Tj, ℓj) to R; j ← j + 1
(5) else // ki = ℓj
(6) (x, h) ← LcpCompare(Si, Tj, ki)
(7) if x = "<" then
(8) append (Si, ki) to R; i ← i + 1
(9) ℓj ← h
(10) else
(11) append (Tj, ℓj) to R; j ← j + 1
(12) ki ← h
(13) while i ≤ m do append (Si, ki) to R; i ← i + 1
(14) while j ≤ n do append (Tj, ℓj) to R; j ← j + 1
(15) return R
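A direct Python transcription of Algorithms 1.32 and 1.33 (0-based lists instead of 1-based sequences; lcp_compare is a plain implementation of the LcpCompare operation, included so the example is self-contained):

```python
def lcp_compare(A, B, k=0):
    """LcpCompare(A, B, k): skip the first k known-equal characters."""
    i = k
    while i < len(A) and i < len(B) and A[i] == B[i]:
        i += 1
    if i < len(A) and i < len(B):
        return ('<' if A[i] < B[i] else '>'), i
    if len(A) == len(B):
        return '=', i
    return ('<' if len(A) < len(B) else '>'), i

def string_merge(P, Q):
    """Merge two sorted (string, lcp-with-predecessor) sequences."""
    P, Q = list(P), list(Q)   # local copies; entries are updated in place
    R = []
    i = j = 0
    while i < len(P) and j < len(Q):
        (S, k), (T, l) = P[i], Q[j]
        if k > l:                         # S shares a longer prefix with
            R.append((S, k)); i += 1      # the last output string
        elif l > k:
            R.append((T, l)); j += 1
        else:                             # k == l: compare, skipping k chars
            x, h = lcp_compare(S, T, k)
            if x == '<':
                R.append((S, k)); i += 1
                Q[j] = (T, h)             # lcp of T with its new predecessor
            else:
                R.append((T, l)); j += 1
                P[i] = (S, h)
    R.extend(P[i:]); R.extend(Q[j:])
    return R

def string_mergesort(R):
    """Sort strings; return [(string, lcp with predecessor)], first lcp 0."""
    if len(R) == 1:
        return [(R[0], 0)]
    m = len(R) // 2
    return string_merge(string_mergesort(R[:m]), string_mergesort(R[m:]))
```

For example, sorting ["banana", "apple", "band", "bandana", "app"] yields the strings in order together with the LCP array values 0, 3, 0, 3, 4.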

Lemma 1.34: StringMerge performs the merging correctly.
Proof. We will show that the following invariant holds at the beginning of
each round in the loop on lines (2)–(12):

Let X be the last string appended to R (or ε if R = ∅). Then
ki = lcp(X, Si) and ℓj = lcp(X, Tj).

The invariant is clearly true in the beginning. We will show that the invariant
is maintained and the smaller string is chosen in each round of the loop.

• If ki > ℓj, then lcp(X, Si) > lcp(X, Tj) and thus

– Si < Tj by Lemma 1.31.
– lcp(Si, Tj) = lcp(X, Tj) because, by Lemma 1.30,
lcp(X, Tj) = min{lcp(X, Si), lcp(Si, Tj)}.

Hence, the algorithm chooses the smaller string and maintains the
invariant. The case ℓj > ki is symmetric.

• If ki = ℓj, then clearly lcp(Si, Tj) ≥ ki and the call to LcpCompare is safe,
and the smaller string is chosen. The update ℓj ← h or ki ← h maintains
the invariant. □
Theorem 1.35: String mergesort sorts a set R of n strings in
O(ΣLCP (R) + n log n) time.

Proof. If the calls to LcpCompare took constant time, the time complexity
would be O(n log n) by the same argument as for standard mergesort.

Whenever LcpCompare makes more than one symbol comparison, say t + 1
comparisons, one of the lcp values stored with the strings increases by t.
Since the sum of the final lcp values is exactly ΣLCP(R), the extra time
spent in LcpCompare is bounded by O(ΣLCP(R)). □


• Other comparison-based sorting algorithms, for example heapsort and
insertion sort, can be adapted for strings using the lcp-comparison
technique.
String Binary Search

An ordered array is a simple static data structure supporting queries in
O(log n) time using binary search.

Algorithm 1.36: Binary search
Input: Ordered set R = {k1, k2, . . . , kn}, query value x.
Output: The number of elements in R that are smaller than x.
(1) left ← 0; right ← n + 1 // output value is in the range [left..right)
(2) while right − left > 1 do
(3) mid ← ⌊(left + right)/2⌋
(4) if kmid < x then left ← mid
(5) else right ← mid
(6) return left
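In Python this can be sketched as follows (0-based list, 1-based positions with sentinels 0 and n + 1, as in the pseudocode; the function name is illustrative):

```python
import bisect

def rank(R, x):
    """Number of elements in the sorted list R that are smaller than x,
    following Algorithm 1.36."""
    left, right = 0, len(R) + 1
    while right - left > 1:
        mid = (left + right) // 2
        if R[mid - 1] < x:      # k_mid < x
            left = mid
        else:
            right = mid
    return left
```

The result coincides with Python's bisect.bisect_left, which computes the same quantity.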

With strings as elements, however, the query time is

• O(m log n) in the worst case for a query string of length m.

• O(log n logσ n) on average for a random set of strings.

We can use the lcp-comparison technique to improve binary search for
strings. The following is a key result.

Lemma 1.37: Let A, B, B′ and C be strings such that A ≤ B ≤ C and
A ≤ B′ ≤ C. Then lcp(B, B′) ≥ lcp(A, C).

Proof. Let Bmin = min{B, B′} and Bmax = max{B, B′}. By Lemma 1.30,

lcp(A, C) = min(lcp(A, Bmax), lcp(Bmax, C))
≤ lcp(A, Bmax) = min(lcp(A, Bmin), lcp(Bmin, Bmax))
≤ lcp(Bmin, Bmax) = lcp(B, B′) □


During the binary search of P in {S1, S2, . . . , Sn}, the basic situation is the
following:

• We want to compare P and Smid.

• We have already compared P against Sleft and Sright, and we know that
Sleft ≤ P, Smid ≤ Sright.

• By using lcp-comparisons, we know lcp(Sleft, P) and lcp(P, Sright).

By Lemmas 1.30 and 1.37,

lcp(P, Smid) ≥ lcp(Sleft, Sright) = min{lcp(Sleft, P), lcp(P, Sright)}

Thus we can skip the first min{lcp(Sleft, P), lcp(P, Sright)} characters when
comparing P and Smid.

Algorithm 1.38: String binary search (without precomputed lcps)
Input: Ordered string set R = {S1, S2, . . . , Sn}, query string P.
Output: The number of strings in R that are smaller than P.
(1) left ← 0; right ← n + 1
(2) llcp ← 0 // llcp = lcp(Sleft, P)
(3) rlcp ← 0 // rlcp = lcp(P, Sright)
(4) while right − left > 1 do
(5) mid ← ⌊(left + right)/2⌋
(6) mlcp ← min{llcp, rlcp}
(7) (x, mlcp) ← LcpCompare(P, Smid, mlcp)
(8) if x = "<" then right ← mid; rlcp ← mlcp
(9) else left ← mid; llcp ← mlcp
(10) return left

• The average case query time is now O(log n).

• The worst case query time is still O(m log n) (exercise).
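A Python sketch of Algorithm 1.38 (0-based list, 1-based positions; lcp_compare is a plain implementation of LcpCompare; as in the pseudocode, the '=' outcome falls to the else branch, so a query equal to a stored string is counted on the left side):

```python
def lcp_compare(P, S, k=0):
    i = k
    while i < len(P) and i < len(S) and P[i] == S[i]:
        i += 1
    if i < len(P) and i < len(S):
        return ('<' if P[i] < S[i] else '>'), i
    if len(P) == len(S):
        return '=', i
    return ('<' if len(P) < len(S) else '>'), i

def string_rank(R, P):
    """Number of strings in the sorted list R smaller than the query P,
    maintaining llcp = lcp(S_left, P) and rlcp = lcp(P, S_right)."""
    left, right = 0, len(R) + 1         # sentinels 0 and n+1
    llcp = rlcp = 0
    while right - left > 1:
        mid = (left + right) // 2
        # skip the characters already known to match
        x, mlcp = lcp_compare(P, R[mid - 1], min(llcp, rlcp))
        if x == '<':
            right, rlcp = mid, mlcp
        else:
            left, llcp = mid, mlcp
    return left
```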

We can improve the worst case complexity by choosing the midpoint closer
to the smaller lcp value:

• If llcp − rlcp > 1, choose the middle position closer to the right.

• This is achieved by choosing the midpoint as a weighted average of the
left position and the right position. The weights are d and ln(d + 1),
where d = llcp − rlcp.

• If rlcp − llcp > 1, choose the middle position closer to the left in a
symmetric way.

• The worst case time complexity of the resulting algorithm (shown on
the next slide) is O(m logm n). The proof is omitted here.

• The lower bound on string binary searching time has been shown to be

Θ( (m log log n) / (log log((m log log n)/(log n))) + m + log n ).

There is a complicated algorithm achieving this time complexity.
Algorithm 1.39: Skewed string binary search (without precomputed lcps)
Input: Ordered string set R = {S1, S2, . . . , Sn}, query string P.
Output: The number of strings in R that are smaller than P.
(1) left ← 0; right ← n + 1
(2) llcp ← 0 // llcp = lcp(Sleft, P)
(3) rlcp ← 0 // rlcp = lcp(P, Sright)
(4) while right − left > 1 do
(5) if llcp − rlcp > 1 then
(6) d ← llcp − rlcp
(7) mid ← ⌈((ln(d + 1)) · left + d · right)/(d + ln(d + 1))⌉
(8) else if rlcp − llcp > 1 then
(9) d ← rlcp − llcp
(10) mid ← ⌊(d · left + (ln(d + 1)) · right)/(d + ln(d + 1))⌋
(11) else
(12) mid ← ⌊(left + right)/2⌋
(13) mlcp ← min{llcp, rlcp}
(14) (x, mlcp) ← LcpCompare(P, Smid, mlcp)
(15) if x = "<" then right ← mid; rlcp ← mlcp
(16) else left ← mid; llcp ← mlcp
(17) return left

The lower bound above assumes that no other information besides the
ordering of the strings is given. We can further improve string binary
searching by using precomputed information about the lcps between the
strings in R.

Consider again the basic situation during string binary search:

• We want to compare P and Smid.

• We have already compared P against Sleft and Sright, and we know
lcp(Sleft, P) and lcp(P, Sright).

In the unskewed algorithm, the values left and right are fully determined by
mid independently of P. That is, P only determines whether the search ends
up at position mid at all, but if it does, left and right are always the same.
Thus, we can precompute and store the values

LLCP[mid] = lcp(Sleft, Smid)
RLCP[mid] = lcp(Smid, Sright)

Now we know all lcp values between P, Sleft, Smid, Sright except lcp(P, Smid).
The following lemma shows how to utilize this.
Lemma 1.40: Let A, B, B′ and C be strings such that A ≤ B ≤ C and
A ≤ B′ ≤ C.

(a) If lcp(A, B) > lcp(A, B′), then B < B′ and lcp(B, B′) = lcp(A, B′).

(b) If lcp(A, B) < lcp(A, B′), then B > B′ and lcp(B, B′) = lcp(A, B).

(c) If lcp(B, C) > lcp(B′, C), then B > B′ and lcp(B, B′) = lcp(B′, C).

(d) If lcp(B, C) < lcp(B′, C), then B < B′ and lcp(B, B′) = lcp(B, C).

(e) If lcp(A, B) = lcp(A, B′) and lcp(B, C) = lcp(B′, C), then
lcp(B, B′) ≥ max{lcp(A, B), lcp(B, C)}.

Proof. Cases (a)–(d) are symmetric; we show (a). B < B′ follows from
Lemma 1.31. Then by Lemma 1.30, lcp(A, B′) = min{lcp(A, B), lcp(B, B′)}.
Since lcp(A, B′) < lcp(A, B), we must have lcp(A, B′) = lcp(B, B′).

In case (e), we use Lemma 1.30:

lcp(B, B′) ≥ min{lcp(A, B), lcp(A, B′)} = lcp(A, B)
lcp(B, B′) ≥ min{lcp(B, C), lcp(B′, C)} = lcp(B, C)

Thus lcp(B, B′) ≥ max{lcp(A, B), lcp(B, C)}. □
Algorithm 1.41: String binary search (with precomputed lcps)
Input: Ordered string set R = {S1, S2, . . . , Sn}, arrays LLCP and RLCP,
query string P.
Output: The number of strings in R that are smaller than P.
(1) left ← 0; right ← n + 1
(2) llcp ← 0; rlcp ← 0
(3) while right − left > 1 do
(4) mid ← ⌊(left + right)/2⌋
(5) if LLCP[mid] > llcp then left ← mid
(6) else if LLCP[mid] < llcp then right ← mid; rlcp ← LLCP[mid]
(7) else if RLCP[mid] > rlcp then right ← mid
(8) else if RLCP[mid] < rlcp then left ← mid; llcp ← RLCP[mid]
(9) else
(10) mlcp ← max{llcp, rlcp}
(11) (x, mlcp) ← LcpCompare(P, Smid, mlcp)
(12) if x = "<" then right ← mid; rlcp ← mlcp
(13) else left ← mid; llcp ← mlcp
(14) return left
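A Python sketch: the LLCP and RLCP arrays are filled by replaying the same left/right recursion that the unskewed search performs, and the search consults them before falling back to LcpCompare (function names are illustrative):

```python
def lcp(A, B):
    i = 0
    while i < len(A) and i < len(B) and A[i] == B[i]:
        i += 1
    return i

def lcp_compare(P, S, k=0):
    i = k
    while i < len(P) and i < len(S) and P[i] == S[i]:
        i += 1
    if i < len(P) and i < len(S):
        return ('<' if P[i] < S[i] else '>'), i
    if len(P) == len(S):
        return '=', i
    return ('<' if len(P) < len(S) else '>'), i

def precompute_lcps(R):
    """LLCP[mid] = lcp(S_left, S_mid) and RLCP[mid] = lcp(S_mid, S_right)
    for every mid the unskewed search can visit; the sentinel positions
    0 and n+1 contribute lcp value 0."""
    n = len(R)
    LLCP, RLCP = [0] * (n + 2), [0] * (n + 2)
    def fill(left, right):
        if right - left <= 1:
            return
        mid = (left + right) // 2
        if left >= 1:
            LLCP[mid] = lcp(R[left - 1], R[mid - 1])
        if right <= n:
            RLCP[mid] = lcp(R[mid - 1], R[right - 1])
        fill(left, mid)
        fill(mid, right)
    fill(0, n + 1)
    return LLCP, RLCP

def string_rank_lcp(R, LLCP, RLCP, P):
    """Algorithm 1.41: LcpCompare is called only when the stored
    lcp values cannot decide the step."""
    left, right = 0, len(R) + 1
    llcp = rlcp = 0
    while right - left > 1:
        mid = (left + right) // 2
        if LLCP[mid] > llcp:
            left = mid
        elif LLCP[mid] < llcp:
            right, rlcp = mid, LLCP[mid]
        elif RLCP[mid] > rlcp:
            right = mid
        elif RLCP[mid] < rlcp:
            left, llcp = mid, RLCP[mid]
        else:
            x, mlcp = lcp_compare(P, R[mid - 1], max(llcp, rlcp))
            if x == '<':
                right, rlcp = mid, mlcp
            else:
                left, llcp = mid, mlcp
    return left
```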

Theorem 1.42: An ordered string set R = {S1 , S2 , . . . , Sn } can be
preprocessed in O(ΣLCP (R) + n) time and O(n) space so that a binary
search with a query string P can be executed in O(|P | + log n) time.

Proof. The values LLCP[mid] and RLCP[mid] can be computed in
O(lcp(Smid, R \ {Smid}) + 1) time. Thus the arrays LLCP and RLCP can be
computed in O(Σlcp(R) + n) = O(ΣLCP(R) + n) time and stored in O(n)
space.

The main while loop in Algorithm 1.41 is executed O(log n) times, and
everything except LcpCompare on line (11) needs constant time.

If a given LcpCompare call performs t + 1 symbol comparisons, mlcp
increases by t on line (11). Then on lines (12)–(13), either llcp or rlcp
increases by at least t, since mlcp was max{llcp, rlcp} before LcpCompare.
Since llcp and rlcp never decrease and never grow larger than |P|, the total
number of extra symbol comparisons in LcpCompare during the binary
search is O(|P|). □

Other comparison-based data structures such as binary search trees can be
augmented with lcp information in the same way (study groups).

Hashing and Fingerprints
Hashing is a powerful technique for dealing with strings based on mapping
each string to an integer using a hash function:
H : Σ∗ → [0..q) ⊂ N
The most common use of hashing is with hash tables. Hash tables come in
many flavors that can be used with strings as well as with any other type of
object with an appropriate hash function. A drawback of using a hash table
to store a set of strings is that hash tables do not support lcp and prefix
queries.

Hashing is also used in other situations, where one needs to check whether
two strings S and T are the same or not:

• If H(S) ≠ H(T), then we must have S ≠ T.

• If H(S) = H(T), then S = T and S ≠ T are both possible.
If S ≠ T, this is called a collision.

When used this way, the hash value is often called a fingerprint, and its
range [0..q) is typically large as it is not restricted by a hash table size.
Any good hash function must depend on all characters. Thus computing
H(S) needs Ω(|S|) time, which can defeat the advantages of hashing:

• A plain comparison of two strings is faster than computing the hashes.

• The main strength of hash tables is the support for constant time
insertions and queries, but for example inserting a string S into a hash
table needs Ω(|S|) time when the hash computation time is included.
Compare this to the O(|S|) time for a trie under a constant alphabet
and the O(|S| + log n) time for a ternary trie.

However, a hash table can still be competitive in practice. Furthermore,
there are situations where a full computation of the hash function can be
avoided:

• A hash value can be computed once, stored, and used many times.

• Some hash functions can be computed more efficiently for a related set
of strings. An example is the Karp–Rabin hash function.
Definition 1.43: The Karp–Rabin hash function for a string
S = s0 s1 . . . sm−1 over an integer alphabet is

H(S) = (s0 · r^(m−1) + s1 · r^(m−2) + · · · + sm−2 · r + sm−1) mod q

for some fixed positive integers q and r.

Lemma 1.44: For any two strings A and B,

H(AB) = (H(A) · r^|B| + H(B)) mod q
H(B) = (H(AB) − H(A) · r^|B|) mod q

Proof. Without the modulo operation, the result would be obvious. The
modulo does not interfere because of the rules of modular arithmetic:

(x + y) mod q = ((x mod q) + (y mod q)) mod q
(xy) mod q = ((x mod q)(y mod q)) mod q

Thus we can quickly compute H(AB) from H(A) and H(B), and H(B) from
H(AB) and H(A). We will see applications of this later.

If q and r are coprime, then r has a multiplicative inverse r^(−1) modulo q,
and we can also compute H(A) = ((H(AB) − H(B)) · (r^(−1))^|B|) mod q.
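A quick numeric check of Definition 1.43 and Lemma 1.44 in Python (the choices r = 256 and the prime q = 10^9 + 7 are illustrative, not from the lecture):

```python
def kr_hash(S, r=256, q=10**9 + 7):
    """Karp-Rabin: H(S) = (s0*r^(m-1) + ... + s_(m-1)) mod q,
    computed with Horner's rule."""
    h = 0
    for c in S:
        h = (h * r + ord(c)) % q
    return h

r, q = 256, 10**9 + 7
A, B = "hello, ", "world"

# H(AB) from H(A) and H(B):
assert kr_hash(A + B) == (kr_hash(A) * pow(r, len(B), q) + kr_hash(B)) % q
# H(B) from H(AB) and H(A):
assert kr_hash(B) == (kr_hash(A + B) - kr_hash(A) * pow(r, len(B), q)) % q
# q is prime and r < q, so r has an inverse mod q; recover H(A):
r_inv = pow(r, -1, q)
assert kr_hash(A) == ((kr_hash(A + B) - kr_hash(B)) * pow(r_inv, len(B), q)) % q
```

Python's three-argument pow computes modular powers, and pow(r, -1, q) the modular inverse, so each identity is checked in O(log |B|) multiplications.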
The parameters q and r have to be chosen with some care to ensure that
collisions are rare for any reasonable set of strings.

• The original choice is r = σ and q is a large prime.

• Another possibility is that q is a power of two and r is a small prime
(r = 37 has been suggested). This is faster in practice, because the
slow modulo operations can be replaced by bitwise shift operations. If
q = 2^w, where w is the machine word size, the modulo operations can
be omitted completely. (But a bad case for this is a Thue–Morse
sequence.)

• If q and r were both powers of two, then only the last ⌈(log q)/(log r)⌉
characters of the string would affect the hash value. More generally, q
and r should be coprime, i.e., have no common divisors other than 1.

• The hash function can be randomized by choosing q or r randomly. For
example, if q is a prime and r is chosen uniformly at random from [0..q),
the probability that two strings of length m collide is at most m/q.

• A random choice over a set of possibilities has the additional advantage
that we can change the choice if the first choice leads to too many
collisions.
Automata
Finite automata are a well known way of representing sets of strings. In this
case, the set is often called a (regular) language.
A trie is a special type of automaton.
• The root is the initial state, the leaves are accept states, ...
• A trie is generally not a minimal automaton.
• Trie techniques including path compaction can be applied to automata.

Automata are much more powerful than tries in representing languages:

• Infinite languages
• Nondeterministic automata
• Even an acyclic, deterministic automaton can represent a language of
exponential size.

Automata support set inclusion testing but not other trie operations:
• No insertions and deletions
• No satellite data, i.e., data associated to each string
Sets of Strings: Summary

Efficient algorithms and data structures for sets of strings:

• Storing and searching: trie and ternary trie and their compact versions,
string binary search, Karp–Rabin hashing.
• Sorting: string quicksort and mergesort, LSD and MSD radix sort.

Lower bounds:

• Many of the algorithms are optimal.

• General purpose algorithms are asymptotically slower.

The central role of longest common prefixes:

• LCP array LCPR and its sum ΣLCP(R).

• Lcp-comparison technique.
Selected Literature

• Trie:
Fredkin: Trie Memory. Communications of the ACM. 3(9),
1960, pp. 490–499.

• Compact trie:
Morrison: PATRICIA—Practical Algorithm To Retrieve
Information Coded in Alphanumeric. Journal of the ACM, 15(4),
1968, pp. 514–534.

• Ternary trie and string quicksort:
Bentley & Sedgewick: Fast algorithms for sorting and searching
strings. Proc. 8th Annual ACM–SIAM Symposium on Discrete
Algorithms (SODA), 1997, pp. 360–369.

• MSD radix sort in O(ΣLCP(R) + n + σ) time:
Paige & Tarjan: Three partition refinement algorithms. SIAM
Journal on Computing, 16(6), 1987, pp. 973–989.
• String mergesort:
Ng & Kakehi: Merging String Sequences by Longest Common
Prefixes. IPSJ Journal, 49(2), 2008, pp. 958–967.

• Complexity of string binary search without precomputed lcp information:
Andersson, Hagerup, Håstad & Petersson: Tight bounds for
searching a sorted array of strings. SIAM Journal on Computing,
30(5), 2000, pp. 1552–1578.

• LCP array and string binary search using lcp information:
Manber & Myers: Suffix Arrays: A New Method for On-Line
String Searches. SIAM Journal on Computing, 22(5), 1993,
pp. 935–948.

• Karp–Rabin hashing:
Karp & Rabin: Efficient randomized pattern-matching
algorithms. IBM Journal of Research and Development, 31(2),
1987, pp. 249–260.