Lecture 15: String Matching
Pattern Matching
Given a text string T[0..n-1] and a pattern
P[0..m-1], find all occurrences of the pattern
within the text.
Naïve algorithm
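A minimal sketch of the naïve approach, assuming it means trying every shift s and comparing the pattern with the window T[s..s+m-1] character by character. The function name naive_match is illustrative and not from the lecture.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for s in range(n - m + 1):
        # Compare the pattern against the window text[s..s+m-1],
        # stopping at the first mismatch.
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(s)
    return matches
```

Each of the n-m+1 shifts can cost up to m character comparisons, so the worst case is O((n-m+1)m) comparisons.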
Rabin-Karp Algorithm
Key idea:
think of the pattern P[0..m-1] as a key, transform
(hash) it into an equivalent integer p
Similarly, we transform substrings in the text string
T[] into integers
For s = 0, 1, …, n-m, transform T[s..s+m-1] to an equivalent
integer t_s
The pattern occurs at position s if and only if p = t_s
If we compute p and t_s quickly, then the
pattern matching problem is reduced to
comparing p with n-m+1 integers
How to compute p?
p = 2^{m-1} P[0] + 2^{m-2} P[1] + … + 2 P[m-2] + P[m-1]
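The sum for p can be evaluated left to right with Horner's rule, using one doubling and one addition per character. The sketch below assumes the pattern is a string of the characters '0' and '1', which is what the powers of 2 suggest. The name pattern_value is illustrative.

```python
def pattern_value(bits: str) -> int:
    """Evaluate 2^(m-1)*P[0] + ... + 2*P[m-2] + P[m-1] by Horner's rule."""
    value = 0
    for ch in bits:
        # Shift the digits read so far one place left, then add the new digit.
        value = 2 * value + int(ch)
    return value
```

For example, pattern_value("1011") gives 8 + 0 + 2 + 1 = 11.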
Similarly, to compute the (n-m+1) integers t_s from the
text string:
t_s = 2^{m-1} T[s] + 2^{m-2} T[s+1] + … + 2 T[s+m-2] + T[s+m-1]
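Evaluating every window from scratch, in the same way as p, could look like the sketch below. It again assumes a text made of the characters '0' and '1'. The name window_values_direct is illustrative.

```python
def window_values_direct(text: str, m: int) -> list[int]:
    """Compute the integer for each window text[s..s+m-1], s = 0..n-m."""
    n = len(text)
    values = []
    for s in range(n - m + 1):
        t_s = 0
        # Horner's rule over the m characters of the current window.
        for j in range(m):
            t_s = 2 * t_s + int(text[s + j])
        values.append(t_s)
    return values
```

This takes O(m) arithmetic operations per window, so O((n-m+1)m) in total, which is no better than the naïve algorithm.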
A better method computes each t_{s+1} from t_s with a constant number of arithmetic operations:
t_{s+1} = 2 (t_s - 2^{m-1} T[s]) + T[s+m]
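A sketch of a complete matcher built on a rolling update: drop the leading digit of the current window, double what remains, and append the next digit of the text. This is the standard Rabin-Karp update, assumed here for binary strings of '0' and '1'. The name rabin_karp_binary is illustrative, and no modular reduction is applied, so the integers are exact.

```python
def rabin_karp_binary(text: str, pattern: str) -> list[int]:
    """Return every shift s with text[s..s+m-1] == pattern, by comparing integers."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Compute p and t_0 together with Horner's rule.
    p = 0
    t = 0
    for j in range(m):
        p = 2 * p + int(pattern[j])
        t = 2 * t + int(text[j])
    high = 1 << (m - 1)  # weight 2^(m-1) of the leading digit
    matches = []
    for s in range(n - m + 1):
        if p == t:
            matches.append(s)
        if s + m < n:
            # Rolling update: t_{s+1} = 2 * (t_s - 2^(m-1) * T[s]) + T[s+m]
            t = 2 * (t - high * int(text[s])) + int(text[s + m])
    return matches
```

With text "110101" and pattern "101", the window values are 6, 5, 2, 5 and p = 5, so the function returns [1, 3].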
Problem
The problem with the previous strategy is that when m
is large, it is unreasonable to assume that each
arithmetic operation can be done in O(1) time.
In fact, given a very long integer, we may not even be able to
use the default integer type to represent it.
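To make the size issue concrete, the sketch below counts how many bits the exact value of p needs. Python integers have arbitrary precision, so the code runs, but that is an illustration choice: a fixed-width type such as a 64-bit integer could not hold the value. The name bits_needed is illustrative.

```python
def bits_needed(pattern: str) -> int:
    """Number of bits in the exact integer value of a binary pattern."""
    value = 0
    for ch in pattern:
        value = 2 * value + int(ch)
    return value.bit_length()


# A pattern of 100 ones has the value 2^100 - 1, which needs 100 bits.
print(bits_needed("1" * 100))
```

In general p and t_s can need up to m bits, so each addition, subtraction, or comparison costs time that grows with m rather than O(1).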