Lecture15 String Matching

The document discusses the Rabin-Karp algorithm for string matching. It transforms the pattern and substrings of the text into integers using hash functions to reduce the problem to comparing integers. It uses modular arithmetic so that all computations can be done in single-precision arithmetic. While the worst case is still O(nm), it avoids many unnecessary string comparisons in practice.

String Matching

Pattern Matching
• Given a text string T[0..n-1] and a pattern P[0..m-1], find all occurrences of the pattern within the text.
• Example: for T = 000010001010001 and P = 0001, the occurrences are:
  • first occurrence starts at T[1]
  • second occurrence starts at T[5]
  • third occurrence starts at T[11]

Naïve algorithm

Try every shift s = 0, 1, …, n-m and compare P[0..m-1] with T[s..s+m-1] character by character.
Worst-case running time = O(nm).
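
The slide's own listing of the naïve algorithm did not survive extraction, so the following is only a minimal sketch of the usual brute-force approach, not the lecture's code. The function name `naive_match` is an assumption made for illustration.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every shift s such that text[s:s+m] equals pattern."""
    n, m = len(text), len(pattern)
    matches = []
    for s in range(n - m + 1):          # candidate shifts 0 .. n-m
        j = 0
        # Compare character by character, stopping at the first mismatch.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters agreed
            matches.append(s)
    return matches


print(naive_match("000010001010001", "0001"))  # [1, 5, 11]
```

The outer loop runs n-m+1 times and the inner loop up to m times, which is where the O(nm) worst case comes from.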


Rabin-Karp Algorithm
• Key idea:
  • Think of the pattern P[0..m-1] as a key and transform (hash) it into an equivalent integer p.
  • Similarly, transform the substrings of the text string T[] into integers: for s = 0, 1, …, n-m, transform T[s..s+m-1] into an equivalent integer $t_s$.
  • The pattern occurs at position s if and only if $p = t_s$.
• If we can compute p and $t_s$ quickly, then the pattern matching problem is reduced to comparing p with n-m+1 integers.

Rabin-Karp Algorithm …
• How to compute p?
  $p = 2^{m-1} P[0] + 2^{m-2} P[1] + \dots + 2 P[m-2] + P[m-1]$
• Using Horner's rule:
  $p = P[m-1] + 2(P[m-2] + 2(P[m-3] + \dots + 2(P[1] + 2 P[0]) \dots))$
  This takes O(m) time, assuming each arithmetic operation can be done in O(1) time.
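
As a rough illustration of Horner's rule for a binary pattern, the sketch below evaluates p with one multiplication and one addition per character. The name `pattern_to_int` is hypothetical, and the input is assumed to contain only the characters 0 and 1, as in the lecture's example.

```python
def pattern_to_int(bits: str) -> int:
    """Evaluate 2^(m-1)*P[0] + ... + 2*P[m-2] + P[m-1] by Horner's rule."""
    p = 0
    for ch in bits:
        # Shift the accumulated value left by one bit, then add the next digit.
        p = 2 * p + int(ch)
    return p


print(pattern_to_int("0001"))  # 1
print(pattern_to_int("1011"))  # 11
```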

Rabin-Karp Algorithm …
• Similarly, we can compute the n-m+1 integers $t_s$ from the text string:
  $t_s = 2^{m-1} T[s] + 2^{m-2} T[s+1] + \dots + 2 T[s+m-2] + T[s+m-1]$, for s = 0, 1, …, n-m
  This takes O((n-m+1) m) time, assuming that each arithmetic operation can be done in O(1) time.
• This is a bit time-consuming.

Rabin-Karp Algorithm
• A better method is to compute $t_0$ directly and then obtain each $t_{s+1}$ from $t_s$ in constant time:
  $t_{s+1} = 2(t_s - 2^{m-1} T[s]) + T[s+m]$
  This takes O(n+m) time, assuming that each arithmetic operation can be done in O(1) time.
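
The constant-time update can be sketched as follows: drop the leading bit of the current window, shift, and append the incoming bit. This is an illustration under the same binary-alphabet assumption as the lecture's example. The helper name `window_values` is hypothetical, and Python's unbounded integers hide the word-size issue discussed on the next slide.

```python
def window_values(text: str, m: int) -> list[int]:
    """Return the integers t_0 .. t_{n-m} for all length-m windows of text."""
    n = len(text)
    if m == 0 or m > n:
        return []
    bits = [int(ch) for ch in text]
    high = 1 << (m - 1)                 # 2^(m-1), weight of the leading bit

    t = 0
    for j in range(m):                  # t_0 by Horner's rule, O(m)
        t = 2 * t + bits[j]
    values = [t]

    for s in range(n - m):              # each update is O(1)
        # Remove T[s], shift the rest left by one bit, append T[s+m].
        t = 2 * (t - high * bits[s]) + bits[s + m]
        values.append(t)
    return values


text = "000010001010001"
vals = window_values(text, 4)
p = 1                                   # integer value of the pattern 0001
print([s for s, t in enumerate(vals) if t == p])  # [1, 5, 11]
```

Because the values are exact integers here, equality of p and a window value is equivalent to a match, so no further check is needed in this version.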

Problem
• The problem with the previous strategy is that when m is large, it is unreasonable to assume that each arithmetic operation can be done in O(1) time.
  • In fact, given a very long integer, we may not even be able to use the default integer type to represent it.
• Therefore, we will use modular arithmetic. Let q be a prime number such that 2q can be stored in one computer word.
  • This makes sure that all computations can be done using single-precision arithmetic.
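
A minimal sketch of the same computations carried out modulo q is shown below. The prime `Q = 13` is an arbitrary small value chosen only to keep the numbers readable, and the helper names `hash_mod` and `roll_mod` are assumptions, not names from the lecture. Every intermediate value stays below 2q in magnitude, which is why the slide requires 2q to fit in one word.

```python
Q = 13  # hypothetical small prime; in practice q is large, with 2*q fitting in a word


def hash_mod(bits: str, q: int = Q) -> int:
    """Horner's rule with a reduction mod q after every step."""
    h = 0
    for ch in bits:
        h = (2 * h + int(ch)) % q       # value before reduction is below 2q
    return h


def roll_mod(t: int, out_bit: int, in_bit: int, high: int, q: int = Q) -> int:
    """Slide the window by one position; high must equal 2^(m-1) mod q."""
    return (2 * (t - high * out_bit) + in_bit) % q


text, m = "000010001010001", 4
high = pow(2, m - 1, Q)                 # 2^(m-1) mod q
t = hash_mod(text[:m])
for s in range(len(text) - m):
    t = roll_mod(t, int(text[s]), int(text[s + m]), high)
    # The rolled value agrees with hashing the window from scratch.
    assert t == hash_mod(text[s + 1:s + 1 + m])
    assert 0 <= t < Q
```

Python's `%` always returns a non-negative result for a positive modulus. In a language where the remainder can be negative, such as C, the update would need q added back before the final reduction.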

• Once we use modular arithmetic, $p = t_s$ for some s no longer guarantees that P[0..m-1] is equal to T[s..s+m-1].
• Therefore, after the equality test $p = t_s$ succeeds, we should compare P[0..m-1] with T[s..s+m-1] character by character to ensure that we really have a match.
• So the worst-case running time becomes O(nm), but it avoids a lot of unnecessary string comparisons in practice.
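
Putting the pieces together, one possible sketch of the complete procedure is shown below. It is not the lecture's own listing, which did not survive extraction. The function name `rabin_karp`, the default prime `q=13`, and the restriction to strings of 0s and 1s are all assumptions made for illustration.

```python
def rabin_karp(text: str, pattern: str, q: int = 13) -> list[int]:
    """Return all shifts where pattern occurs in text (binary alphabet)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:                 # nothing to report in this sketch
        return []
    high = pow(2, m - 1, q)             # 2^(m-1) mod q

    p = t = 0
    for j in range(m):                  # hash the pattern and the first window
        p = (2 * p + int(pattern[j])) % q
        t = (2 * t + int(text[j])) % q

    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; verify character by character.
        if p == t and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:                   # roll the window to shift s+1
            t = (2 * (t - high * int(text[s])) + int(text[s + m])) % q
    return matches


print(rabin_karp("000010001010001", "0001"))       # [1, 5, 11]
print(rabin_karp("000010001010001", "0001", q=3))  # [1, 5, 11]
```

With q = 3 the window 0100 (value 4) has the same residue as the pattern 0001 (value 1), so the hash test passes at shift 3. The character-by-character check rejects this spurious hit, which is why both calls report the same shifts.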
