Unit 5 String Matching 2010
1. Introduction
We formalize the string-matching problem as follows. We are given a text array T[1 . . n] of n characters and a
pattern array P[1 . . m] of m characters, where m ≤ n. The problem is to find an integer s, called a valid shift,
where 0 ≤ s ≤ n-m and T[s+1 . . s+m] = P[1 . . m]. In other words, we want to find whether P occurs in T, i.e.,
whether P is a substring of T. String matching is commonly used for pattern searching (for example in text
editors) and for DNA sequence matching.
Notation and terminology
We shall let ∑* (read “sigma-star”) denote the set of all finite-length strings formed using characters
from the alphabet ∑. In this chapter, we consider only finite-length strings. The zero-length empty
string, denoted ε, also belongs to ∑*. The length of a string x is denoted |x|. The concatenation of two
strings x and y, denoted xy, has length |x| + |y| and consists of the characters from x followed by the
characters from y. We say that a string w is a prefix of a string x, denoted w ⊏ x, if x = wy for some
string y in ∑*. Similarly, we say that a string w is a suffix of a string x, denoted w ⊐ x, if x = yw for
some string y in ∑*. For example, we have ab ⊏ abcca and cca ⊐ abcca.
****************************************************************************
Q. Write an algorithm for the naïve string matcher. What is its worst-case complexity? Show the
comparisons the naïve string matcher makes for the pattern P=0001 in the text
T=000010001010001.
NAÏVE_STRING_MATCHER (T, P)
n ← length [T]
m ← length [P]
for s ← 0 to n-m do
    if P[1 . . m] = T[s+1 . . s+m]
        then return valid shift s
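As shown in the figures above, the pattern P is compared against the text T at each shift value
s = 0, 1, . . . , n-m. The match in the figures is found for shift value s=2. Each of the n-m+1 shifts may
need up to m character comparisons, so the worst-case running time is O((n-m+1)m).
Note: refer to the class notes for the solution of the problem.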
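
The matcher above can be sketched in Python to experiment with the question's input. This is a minimal sketch, not the class-notes solution: the function name naive_string_matcher and the comparison counter are assumptions added for illustration, and it collects every valid shift instead of returning at the first one. Python strings are 0-indexed, so text[s + j] plays the role of T[s+j+1], and the reported shifts s have the same meaning as in the pseudocode.

```python
def naive_string_matcher(text, pattern):
    """Return (valid_shifts, comparisons) for pattern in text.

    comparisons counts single-character comparisons, which is the
    quantity the worst-case bound O((n-m+1)m) refers to.
    """
    n, m = len(text), len(pattern)
    shifts = []
    comparisons = 0
    for s in range(n - m + 1):
        j = 0
        # Compare left to right and stop at the first mismatch.
        while j < m:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == m:
            shifts.append(s)
    return shifts, comparisons


shifts, comparisons = naive_string_matcher("000010001010001", "0001")
print("valid shifts:", shifts)
print("character comparisons:", comparisons)
```

For T=000010001010001 and P=0001 this sketch reports the valid shifts 1, 5 and 11.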
*************************************************************************
Q. Write the Rabin-Karp algorithm for string matching. Working modulo q=11, how many spurious hits does
the Rabin-Karp matcher encounter in the text T=3151592653589793 when looking for the pattern
P=26?
Q. Explain the Rabin-Karp method with an example.
2. Rabin-Karp Algorithm
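Assume the characters are decimal digits. Let p denote the decimal value of the pattern P[1 . . m], and let
t_s denote the decimal value of the length-m substring T[s+1 . . s+m]. Then t_s = p if and only if
T[s+1 . . s+m] = P[1 . . m]; thus, s is a valid shift if and only if t_s = p.
If we can compute p and all the t_s values quickly, then the pattern matching problem is reduced to
comparing p with n-m+1 integers.
We can compute p in time O(m) using Horner’s rule:
p = P[m] + 10 (P[m-1] + 10 (P[m-2] + . . . + 10 (P[2] + 10 P[1]) . . .))
The value t_0 can be computed from T[1 . . m] in the same way.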
To compute the remaining values t_1, t_2, . . . , t_{n−m} in time Θ(n−m), it suffices to observe that t_{s+1} can be
computed from t_s in constant time, since t_{s+1} = 10 (t_s − 10^{m−1} T[s+1]) + T[s+m+1].
13 find new ts
[End of for loop s]
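
The Rabin-Karp idea (Horner's rule for the first values, then a constant-time update of the window value, all taken modulo q) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reconstruction of the pseudocode above: the function name rabin_karp, the radix parameter d=10 and the separate list of spurious hits are introduced here for illustration. A spurious hit is a shift where the values agree modulo q but the characters do not match.

```python
def rabin_karp(text, pattern, q, d=10):
    """Return (valid_shifts, spurious_shifts) for digit strings.

    d is the radix (10 for decimal digits), q is the working modulus.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)  # d^(m-1) mod q, the weight of the leading digit
    p = 0  # value of the pattern modulo q
    t = 0  # value of the current text window modulo q
    # Horner's rule, reduced modulo q at every step.
    for i in range(m):
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:
            # The values agree modulo q: confirm character by character.
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:
            # Find the new window value: drop text[s], append text[s + m].
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


valid, spurious = rabin_karp("3151592653589793", "26", q=11)
print("valid shifts:", valid)
print("spurious hits:", len(spurious), "at shifts", spurious)
```

For T=3151592653589793, P=26 and q=11, the pattern value is 26 mod 11 = 4. The windows 15, 15, 59 and 92 (shifts 1, 3, 4 and 5) also reduce to 4, so this sketch reports 4 spurious hits and one valid shift, s=6.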
Solution to the Problem
Analysis: The running time of the Knuth-Morris-Pratt algorithm is proportional to the time needed to read the
characters of the text and the pattern. In other words, the worst-case running time of the algorithm is O(m+n), and
it requires O(m) extra space for the prefix function of the pattern. It is important to note that these quantities are
independent of the size of the underlying alphabet.
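
To make the O(m+n) time and O(m) space bounds concrete, here is a minimal Python sketch of a Knuth-Morris-Pratt matcher. The function names compute_prefix_function and kmp_matcher and the 0-based indexing are assumptions made for this sketch. The only extra storage is the list pi of length m, and the variable k never increases by more than one per text character, which is where the linear bound comes from.

```python
def compute_prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    return pi


def kmp_matcher(text, pattern):
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    pi = compute_prefix_function(pattern)
    shifts = []
    k = 0  # number of pattern characters matched so far
    for i in range(n):
        while k > 0 and pattern[k] != text[i]:
            k = pi[k - 1]
        if pattern[k] == text[i]:
            k += 1
        if k == m:
            shifts.append(i - m + 1)
            k = pi[k - 1]  # keep going to find overlapping matches
    return shifts


print(compute_prefix_function("ababaca"))
print(kmp_matcher("000010001010001", "0001"))
```

The text is scanned once and never re-read, in contrast to the naïve matcher, which may re-examine the same text characters for several shifts.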
****************************************************************************
Q. Give a linear-time algorithm to determine if a text T is a cyclic rotation of another string T’. For example, arc
and car are cyclic rotations of each other. Verify that your algorithm is correct for the example “arc” and “car”.
Ans :
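- T’ is a cyclic rotation of T if and only if |T| = |T’| and T’ occurs as a substring of TT.
Algo : check that the two lengths are equal, then run the KMP matcher with TT as the text and T’ as the pattern.
For strings of length n this takes O(n) time.
Example: for T = arc and T’ = car, TT = arcarc, and car occurs in arcarc with shift s=2, so car is a cyclic
rotation of arc.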
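
The reduction to KMP can be sketched in Python as follows. This is a minimal sketch: the names kmp_contains and is_cyclic_rotation are hypothetical helpers introduced here, and a compact KMP search is included so that the block runs on its own and keeps the linear-time guarantee.

```python
def kmp_contains(text, pattern):
    """True if pattern occurs in text, in O(len(text) + len(pattern)) time."""
    m = len(pattern)
    if m == 0:
        return True
    # Prefix function of the pattern.
    pi = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    # Scan the text once, reusing the prefix function on a mismatch.
    k = 0
    for ch in text:
        while k > 0 and pattern[k] != ch:
            k = pi[k - 1]
        if pattern[k] == ch:
            k += 1
        if k == m:
            return True
    return False


def is_cyclic_rotation(t, t_prime):
    """True if t_prime is a cyclic rotation of t."""
    # Equal lengths are required; then search for t_prime inside t + t.
    return len(t) == len(t_prime) and kmp_contains(t + t, t_prime)


print(is_cyclic_rotation("arc", "car"))
print(is_cyclic_rotation("arc", "rac"))
```

The length check matters: without it, any substring of TT (for example "ar") would be accepted even though it is not a rotation of T.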