Unit 5 String Matching 2010

Uploaded by

Aesthete Tushhuu

Unit-V

String Matching Techniques

1. Introduction
We formalize the string-matching problem as follows. We are given a text array T[1 . . n] of n characters and a
pattern array P[1 . . m] of m characters, where m ≤ n. The problem is to find an integer s, called a valid shift, such that
0 ≤ s ≤ n-m and T[s+1 . . s+m] = P[1 . . m]. In other words, we want to determine whether P occurs in T, i.e., whether P is a
substring of T. String matching is commonly used for pattern searching in text editors, DNA sequence matching,
etc.
Notation and terminology
We shall let ∑* (read “sigma-star”) denote the set of all finite-length strings formed using characters
from the alphabet ∑. In this chapter, we consider only finite-length strings. The zero-length empty
string, denoted ε, also belongs to ∑*. The length of a string x is denoted |x|. The concatenation of two
strings x and y, denoted xy, has length |x| + |y| and consists of the characters from x followed by the
characters from y. We say that a string w is a prefix of a string x, denoted w ⊏ x, if x = wy for some string y ∈ ∑*. Similarly,
we say that a string w is a suffix of a string x, denoted w ⊐ x, if x = yw for some string y ∈ ∑*. For example, we have
ab ⊏ abcca and cca ⊐ abcca.

****************************************************************************
Q. Write an algorithm for the naïve string matcher. What is its worst-case complexity? Show the
comparisons the naïve string matcher makes for the pattern P=0001 in the text
T=000010001010001

1. Naïve String Matching


The naïve approach simply tests all possible placements of the pattern P[1 . . m] relative to the text T[1 . . n].
Specifically, we try each shift s = 0, 1, . . . , n-m in turn, and for each shift s we compare T[s+1 . . s+m]
to P[1 . . m].

NAÏVE_STRING_MATCHER (T, P)
    n ← length [T]
    m ← length [P]
    for s ← 0 to n-m do
        if P[1 . . m] = T[s+1 . . s+m]
            then return valid shift s
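
The procedure above can be sketched in Python as follows. This is only an illustrative sketch, not the author's listing: the function name `naive_string_matcher` and the comparison counter are assumptions added here, indices are 0-based, and the sketch collects every valid shift instead of returning the first one.

```python
def naive_string_matcher(text, pattern):
    """Return (valid shifts, number of character comparisons), 0-based shifts."""
    n, m = len(text), len(pattern)
    shifts = []
    comparisons = 0
    for s in range(n - m + 1):          # try every shift s = 0 .. n-m
        j = 0
        while j < m:
            comparisons += 1            # one character comparison
            if text[s + j] != pattern[j]:
                break                   # mismatch: give up on this shift
            j += 1
        if j == m:                      # all m characters matched
            shifts.append(s)
    return shifts, comparisons


if __name__ == "__main__":
    # Text and pattern taken from the question above.
    print(naive_string_matcher("000010001010001", "0001"))
```

For the text and pattern in the question, the sketch reports the valid shifts 1, 5 and 11.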

1 @2010 Mr. Shankar Thawkar, HOD, IT dept. HCST


Complexity: The worst-case running time is O((n-m+1) m). If m = n/2, this becomes O(n²).

Example: consider the text T = acaabc and the pattern P = aab.

The pattern P is compared against the text T for shift values s = 0, 1, . . . , n-m (here n-m = 3). The match is
found for shift value s = 2, since T[3 . . 5] = aab.

Note: refer the class notes for the solution of the problem.

*************************************************************************
Q. Write the Rabin-Karp algorithm for string matching. Working modulo q = 11, how many spurious hits does
the Rabin-Karp matcher encounter in the text T = 3151592653589793 when looking for the pattern
P = 26?
Q. Explain the Rabin-Karp method with an example.

2. Rabin-Karp Algorithm

Suppose we are given a pattern P[1 . . m] and a text T[1 . . n]. For simplicity, assume that each character is a
decimal digit, so that a string of m characters can be read as an m-digit decimal number.

In this method we transform (hash) the pattern P[1 . . m] into an equivalent integer p. Similarly, for
s = 0, 1, . . . , n-m, we transform each substring T[s+1 . . s+m] of the text into an equivalent integer t_s.

Since t_s = p if and only if T[s+1 . . s+m] = P[1 . . m], s is a valid shift if and only if t_s = p.

If we can compute p and the values t_s quickly, then the pattern-matching problem is reduced to comparing p with
n-m+1 integers.
We can compute p in time Θ(m) using Horner’s rule:

p = P[m] + 10 (P[m − 1] + 10 (P[m − 2] + · · · + 10 (P[2] + 10 P[1]) · · ·)).

The value t_0 can be similarly computed from T[1 . . m] in time Θ(m).

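To compute the remaining values t_1, t_2, . . . , t_(n−m) in time Θ(n−m), it suffices to observe that t_(s+1) can be
computed from t_s in constant time, since

t_(s+1) = 10 (t_s − 10^(m−1) T[s+1]) + T[s+m+1].

Subtracting 10^(m−1) T[s+1] removes the high-order digit from t_s, multiplying by 10 shifts the number left by one
digit position, and adding T[s+m+1] brings in the new low-order digit.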
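
As a small illustration of Horner's rule and the constant-time update, the following Python sketch computes all window values of a digit string without modular reduction. The names `horner_value` and `window_values` are hypothetical, the text is assumed to consist of decimal digits only, and indices are 0-based, so `text[s]` plays the role of T[s+1].

```python
def horner_value(digits):
    """Decimal value of a digit string, computed by Horner's rule."""
    value = 0
    for ch in digits:
        value = 10 * value + int(ch)
    return value


def window_values(text, m):
    """Values t_0 .. t_(n-m) of every length-m window of a digit string."""
    n = len(text)
    if m == 0 or m > n:
        return []
    high = 10 ** (m - 1)                 # weight of the leading digit
    t = horner_value(text[:m])           # t_0, computed in O(m)
    values = [t]
    for s in range(n - m):
        # Drop the leading digit, shift left by one digit, append the next digit.
        t = 10 * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


if __name__ == "__main__":
    print(window_values("31415", 2))     # windows 31, 14, 41, 15
```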



The problem with the above transformation is that for a long pattern the number representing
the string becomes too large to handle in constant time. This problem can be removed by working modulo q, so the
integer values of P and T are calculated as:
    p = P[1 . . m] mod q
    t_s = T[s+1 . . s+m] mod q
where q is the working modulus, chosen to be a prime number.
Once we use modular arithmetic, t_s = p no longer guarantees that P[1 . . m] = T[s+1 . . s+m]; it only shows that
the two numbers are congruent modulo q. So after the equality test p = t_s succeeds, we must compare P[1 . . m] with
T[s+1 . . s+m] character by character to ensure that we really have a match. On the other hand, if t_s ≠ p, then s is
definitely not a valid shift.
If the hash values (transformed values) match (p = t_s) but the strings do not, this is called a
spurious hit.
Algorithm:

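[Algorithm listing missing from the source; only its last step survives: step 13 finds the new t_s, followed by the
end of the for loop over s.]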
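
The method described in this section can be sketched in Python as follows. This is a minimal sketch written for these notes, not the author's listing: the function name `rabin_karp`, the radix parameter `d` and the returned spurious-hit counter are assumptions, the text is assumed to consist of decimal digits, and shifts are 0-based.

```python
def rabin_karp(text, pattern, q, d=10):
    """Return (valid shifts, number of spurious hits) for digit strings."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)                 # d^(m-1) mod q, weight of the leading digit
    p = 0                                # hash of the pattern
    t = 0                                # hash of the current text window
    for i in range(m):                   # Horner's rule, reduced modulo q
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    matches, spurious = [], 0
    for s in range(n - m + 1):
        if p == t:                       # hash hit: verify character by character
            if text[s:s + m] == pattern:
                matches.append(s)
            else:
                spurious += 1            # equal hashes, different strings
        if s < n - m:                    # roll the hash to the next window
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return matches, spurious


if __name__ == "__main__":
    # Text, pattern and modulus exactly as printed in the question above.
    print(rabin_karp("3151592653589793", "26", 11))
```

For the text exactly as printed in the question, with P = 26 and q = 11, the sketch reports one valid shift (s = 6) and four spurious hits, at the windows 15, 15, 59 and 92, each of which is congruent to 4 modulo 11, as is 26.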
Solution to the Problem



Q. Explain the Knuth-Morris-Pratt method with its algorithm.
Q. Explain the Knuth-Morris-Pratt string-matching algorithm. Write an algorithm to find the prefix
function. Calculate the prefix function π for the pattern a b b a b a.

The Knuth-Morris-Pratt algorithm


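We now present a linear-time string-matching algorithm due to Knuth, Morris, and Pratt. After a mismatch, this
algorithm uses the characters it has already matched to decide how far the pattern can be shifted, so it never moves
backwards in the text.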

The Prefix Function π


The KMP algorithm preprocesses the pattern P by computing a prefix function π that indicates the largest
possible shift s that can be made using previously performed comparisons. Specifically, for q = 1, . . . , m, the
prefix function π(q) is defined as the length of the longest prefix of P that is also a proper suffix of P[1 . . q].
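
The definition of the prefix function can be illustrated with a short Python sketch. The function name `compute_prefix_function` is an assumption, and the list it returns is 0-based, so `pi[q - 1]` holds the value written π(q) in these notes.

```python
def compute_prefix_function(pattern):
    """pi[i] = length of the longest prefix of pattern that is also a
    proper suffix of pattern[0 .. i]."""
    m = len(pattern)
    pi = [0] * m
    k = 0                                # length of the current matched prefix
    for q in range(1, m):
        # Fall back to shorter borders until the next character fits.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


if __name__ == "__main__":
    print(compute_prefix_function("abbaba"))
```

The inner while loop only undoes increments of k made in earlier iterations, which is why the whole computation takes O(m) time.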

Analysis: The running time of Knuth-Morris-Pratt algorithm is proportional to the time needed to read the
characters in text and pattern. In other words, the worst-case running time of the algorithm is O(m+n) and it
requires O(m) extra space. It is important to note that these quantities are independent of the size of the
underlying alphabet.



As an example, consider the pattern P = a b b a b a. The prefix function for this pattern is

q      1  2  3  4  5  6
P[q]   a  b  b  a  b  a
π(q)   0  0  0  1  2  1

Note: this is an answer for the problem given in question.

****************************************************************************
Q. Give a linear-time algorithm to determine whether a text T is a cyclic rotation of another string T’. For example, arc
and car are cyclic rotations of each other. Verify that your algorithm is correct for the example “arc” and “car”.

Ans :
- T’ is a cyclic rotation of T if and only if T and T’ have the same length and T’ occurs as a substring of TT
(T concatenated with itself).

Algo : we can use the KMP matcher, taking TT as the text and T’ as the pattern.

KMP-MATCHER (T, T’)

    if length[T] ≠ length[T’] then return “not a cyclic rotation”
    T ← TT                          // concatenate T with itself
    n ← length[T], m ← length[T’]
    compute the prefix function π of T’
    q ← 0
    for i ← 1 to n do
        while q > 0 and T’[q+1] ≠ T[i] do
            q ← π[q]
        if T’[q+1] = T[i] then q ← q+1
        if q = m then
            print “pattern occurs”      // T’ is a cyclic rotation of T
            q ← π[q]

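Example: for T = arc and T’ = car, both strings have length 3 and TT = arcarc. The pattern car occurs in arcarc
at shift s = 2, so car is a cyclic rotation of arc.

Since KMP is a linear-time algorithm and the text TT has length 2n, the whole test runs in O(n) time, where n = |T|.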
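
The rotation test can be sketched in Python as follows. This is an illustrative sketch only: the names `compute_prefix_function` and `is_cyclic_rotation` are assumptions, indices are 0-based, and the function returns a boolean as soon as the first occurrence is found instead of printing.

```python
def compute_prefix_function(pattern):
    """pi[i] = length of the longest proper border of pattern[0 .. i]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def is_cyclic_rotation(t, t_prime):
    """True if t_prime is a cyclic rotation of t (KMP search in t + t)."""
    if len(t) != len(t_prime):
        return False                     # rotations must have equal length
    if not t:
        return True                      # two empty strings
    pi = compute_prefix_function(t_prime)
    q = 0                                # number of pattern characters matched
    for ch in t + t:                     # scan the doubled text once
        while q > 0 and t_prime[q] != ch:
            q = pi[q - 1]
        if t_prime[q] == ch:
            q += 1
        if q == len(t_prime):
            return True                  # t_prime occurs in t + t
    return False


if __name__ == "__main__":
    print(is_cyclic_rotation("arc", "car"))
```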

