Unit 5 String Matching 2010

Uploaded by

Aesthete Tushhuu

Unit-V

String Matching Techniques

1. Introduction
We formalize the string-matching problem as follows. We are given a text array T[1 . . n] of n characters and a
pattern array P[1 . . m] of m characters, where m ≤ n. The problem is to find every integer s, called a valid
shift, such that 0 ≤ s ≤ n-m and T[s+1 . . s+m] = P[1 . . m]. In other words, we want to determine whether (and
where) P occurs in T, i.e., whether P is a substring of T. String matching is commonly used for pattern
searching in text editors, DNA sequence matching, etc.
Notation and terminology
We shall let ∑* (read “sigma-star”) denote the set of all finite-length strings formed using characters
from the alphabet ∑. In this chapter, we consider only finite-length strings. The zero-length empty
string, denoted ε, also belongs to ∑*. The length of a string x is denoted |x|. The concatenation of two
strings x and y, denoted xy, has length |x| + |y| and consists of the characters from x followed by the
characters from y. We say that a string w is a prefix of a string x, denoted w ⊏ x, if x = wy for some string
y ∈ ∑*. Similarly, we say that a string w is a suffix of a string x, denoted w ⊐ x, if x = yw for some string
y ∈ ∑*. For example, we have ab ⊏ abcca and cca ⊐ abcca.
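The prefix and suffix relations can be checked mechanically. The following is a minimal Python sketch, assuming strings are ordinary Python `str` values; the function names `is_prefix` and `is_suffix` are illustrative and do not come from the notes.

```python
def is_prefix(w: str, x: str) -> bool:
    """True when x = w + y for some (possibly empty) string y."""
    return x[:len(w)] == w


def is_suffix(w: str, x: str) -> bool:
    """True when x = y + w for some (possibly empty) string y."""
    return len(w) <= len(x) and x[len(x) - len(w):] == w


if __name__ == "__main__":
    print(is_prefix("ab", "abcca"))   # ab is a prefix of abcca
    print(is_suffix("cca", "abcca"))  # cca is a suffix of abcca
```

Note that the empty string ε is both a prefix and a suffix of every string under these definitions.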

****************************************************************************
Q. Write an algorithm for the naïve string matcher. What is its worst-case complexity? Show the
comparisons the naïve string matcher makes for the pattern P=0001 in the text
T=000010001010001.

1. Naïve String Matching


The naïve approach simply tests every possible placement of the pattern P[1 . . m] relative to the text
T[1 . . n]. Specifically, we try each shift s = 0, 1, . . . , n-m in turn and, for each shift s, compare
T[s+1 . . s+m] to P[1 . . m].

NAÏVE_STRING_MATCHER (T, P)
    n ← length [T]
    m ← length [P]
    for s ← 0 to n-m do
        if P[1 . . m] = T[s+1 . . s+m]
            then report s as a valid shift

1 @2010 Mr. Shankar Thawkar, HOD, IT dept. HCST


Complexity: The worst-case running time is O((n-m+1) m), since each of the n-m+1 shifts may require m
character comparisons. If m = n/2, this is O(n²).

Example: consider the text T=acaabc and pattern P=aab.

- The pattern P is compared against the text T at shifts s = 0, 1, . . . , n-m (here n-m = 3). A match is
found at shift s = 2, since T[3 . . 5] = aab.
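The naïve matcher can be sketched in Python as follows. This is a minimal illustration rather than the notes' own solution: it uses 0-based indexing, so `text[s:s+m]` plays the role of T[s+1 . . s+m], and the function name `naive_string_matcher` is an assumption.

```python
def naive_string_matcher(text: str, pattern: str) -> list:
    """Return every valid shift s (0-based) with text[s:s+m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # try shifts 0, 1, ..., n-m
        j = 0
        # compare character by character, stopping at the first mismatch
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    print(naive_string_matcher("acaabc", "aab"))
    print(naive_string_matcher("000010001010001", "0001"))
```

For T=acaabc and P=aab this returns [2]. For the text and pattern in the question above it returns [1, 5, 11].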

Note: refer to the class notes for the solution to the problem.

*************************************************************************
Q. Write the Rabin-Karp algorithm for string matching. Working modulo q=11, how many spurious hits does
the Rabin-Karp matcher encounter in the text T=3151592653589793 when looking for the pattern
P=26?
Q. Explain the Rabin-Karp method with an example.

2. Rabin-Karp Algorithm

- Suppose we are given a pattern P[1 . . m] and a text T[1 . . n]. For ease of exposition, assume each
character is a decimal digit, so that a string of m characters can be read as an m-digit decimal number.

- In this method we transform (hash) the pattern P[1 . . m] into an equivalent integer p. Similarly, for
s = 0, 1, . . . , n-m, we transform each substring T[s+1 . . s+m] of the text into an equivalent integer ts.

- Then ts = p if and only if T[s+1 . . s+m] = P[1 . . m]; thus, s is a valid shift if and only if ts = p.

- If we can compute p and the values ts quickly, then the pattern-matching problem is reduced to comparing
p with n-m+1 integers.

- We can compute p in time Θ(m) using Horner’s rule:

p = P[m] + 10 (P[m − 1] + 10(P[m − 2]+· · ·+10(P[2] + 10P[1]) · · ·)) .

The value t0 can be similarly computed from T[1 . . m] in time Θ(m).

- To compute the remaining values t1, t2, . . . , tn−m in time Θ(n−m), it suffices to observe that ts+1 can be
computed from ts in constant time, since

ts+1 = 10 (ts − 10^(m−1) T[s+1]) + T[s+m+1] .

Subtracting 10^(m−1) T[s+1] removes the high-order digit from ts, multiplying by 10 shifts the number left by
one digit position, and adding T[s+m+1] brings in the new low-order digit.
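Horner's rule and the constant-time update can be sketched in Python as follows. This is a minimal illustration, assuming the text consists of decimal digits and using no modulus yet; the names `horner_value` and `window_values` are hypothetical, and indices are 0-based, so `text[s]` corresponds to T[s+1].

```python
def horner_value(digits: str) -> int:
    """Decimal value of a digit string via Horner's rule, in Theta(m) time."""
    value = 0
    for ch in digits:
        value = 10 * value + int(ch)
    return value


def window_values(text: str, m: int) -> list:
    """Values t_0 .. t_{n-m} of all length-m windows, each derived from the previous one."""
    n = len(text)
    if m <= 0 or m > n:
        return []
    high = 10 ** (m - 1)              # weight of the leading digit
    t = horner_value(text[:m])        # t_0, computed directly
    values = [t]
    for s in range(n - m):
        # drop the leading digit text[s], shift left, bring in text[s + m]
        t = 10 * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


if __name__ == "__main__":
    print(window_values("31515", 2))
```

For the digit string 31515 and m = 2 the windows evaluate to 31, 15, 51, 15, each obtained from its predecessor with one subtraction, one multiplication, and one addition.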



- The problem with the above transformation is that for a long pattern the integers p and ts are too large
to handle in constant time. This problem is removed by computing the values modulo a suitable number q:
    o p = P[1 . . m] mod q
    o ts = T[s+1 . . s+m] mod q
- Here q is the working modulus, chosen to be a prime number.
- Once we use modular arithmetic, ts = p no longer guarantees that P[1 . . m] = T[s+1 . . s+m]; on the other
hand, ts ≠ p does guarantee that shift s is invalid. So whenever the test ts = p succeeds, we must compare
P[1 . . m] with T[s+1 . . s+m] character by character to ensure that we really have a match.
- If the hash values match (p = ts) but the strings do not, the shift is called a spurious hit.
Algorithm:

[Algorithm listing not recovered. Its last step, step 13, computes the new value of ts at the end of the for
loop over s.]
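The method described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of the notes' listing: the text and pattern are digit strings, the radix d defaults to 10, indices are 0-based, and the function name `rabin_karp_matcher` and its return shape (valid shifts plus a count of spurious hits) are chosen for this example.

```python
def rabin_karp_matcher(text: str, pattern: str, q: int, d: int = 10):
    """Return (valid_shifts, spurious_hits) for digit strings, working modulo q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)                  # d^(m-1) mod q, weight of the leading digit
    p = t = 0
    for i in range(m):                    # Horner's rule modulo q for p and t_0
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    shifts, spurious = [], 0
    for s in range(n - m + 1):
        if p == t:                        # hash hit: verify character by character
            if text[s:s + m] == pattern:
                shifts.append(s)
            else:
                spurious += 1             # hashes agree but strings differ
        if s < n - m:                     # find the new t_s with the rolling update
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return shifts, spurious


if __name__ == "__main__":
    print(rabin_karp_matcher("3151592653589793", "26", 11))
```

With T=3151592653589793, P=26 and q=11 as printed in the question above, the pattern hashes to 26 mod 11 = 4. The sketch reports one valid shift, s = 6, and four spurious hits: the windows 15, 15, 59 and 92, each of which is also 4 modulo 11.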
Solution to the Problem



Q. Explain the Knuth-Morris-Pratt method with its algorithm.
Q. Explain the Knuth-Morris-Pratt string matching algorithm. Write an algorithm to find the prefix
function. Calculate the prefix function π for the pattern a b b a b a.

The Knuth-Morris-Pratt algorithm


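We now present a linear-time string-matching algorithm due to Knuth, Morris, and Pratt. After a mismatch, this
algorithm uses the characters it has already matched to decide how far the pattern can safely be shifted, so the
scan never moves backward in the text.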

The Prefix Function π


The KMP algorithm preprocesses the pattern P by computing a prefix function π that indicates how far the
pattern can be shifted without missing a match, using only previously performed comparisons. Specifically, for
q = 1, 2, . . . , m, the prefix function π(q) is defined as the length of the longest prefix of P that is also a
proper suffix of P[1 . . q].
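The definition of π can be turned into a short Python sketch. It follows the standard linear-time construction and is offered as an illustration only; the function name `compute_prefix_function` is an assumption, and because Python lists are 0-based, `pi[q - 1]` holds the value π(q).

```python
def compute_prefix_function(pattern: str) -> list:
    """pi[q-1] = length of the longest prefix of pattern that is a proper suffix of pattern[:q]."""
    m = len(pattern)
    pi = [0] * m
    k = 0                                  # length of the prefix matched so far
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                  # fall back to the next shorter candidate
        if pattern[k] == pattern[q]:
            k += 1                         # the matched prefix grows by one character
        pi[q] = k
    return pi


if __name__ == "__main__":
    print(compute_prefix_function("abbaba"))
```

For the pattern a b b a b a this returns [0, 0, 0, 1, 2, 1]. The variable k only increases by one per iteration and each fallback decreases it, which is why the whole computation takes O(m) time.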

Analysis: The running time of the Knuth-Morris-Pratt algorithm is proportional to the time needed to read the
characters in the text and the pattern. In other words, the worst-case running time of the algorithm is O(m+n),
and it requires O(m) extra space for the prefix function. It is important to note that these quantities are
independent of the size of the underlying alphabet.



As an example, consider the pattern P = a b b a b a. The prefix function, computed using the above algorithm, is

q      1  2  3  4  5  6
P[q]   a  b  b  a  b  a
π(q)   0  0  0  1  2  1

Note: this is the answer to the problem given in the question.

****************************************************************************
Q. Give a linear-time algorithm to determine whether a text T is a cyclic rotation of another string T’. For
example, arc and car are cyclic rotations of each other. Verify that your algorithm is correct for the example
“arc” and “car”.

Ans :
- T’ is a cyclic rotation of T if and only if T and T’ have the same length and T’ occurs as a substring of
TT (T concatenated with itself).

Algo : we can use the KMP matcher, taking TT as the text and T’ as the pattern.

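KMP-matcher (T, T’)

    if length[T] ≠ length[T’] then return “not a cyclic rotation”
    T = TT                          // concatenate T with itself
    n = length[T], m = length[T’]
    compute the prefix function pi of T’
    q = 0                           // number of characters matched so far
    for i = 1 to n
        do while q > 0 and T’[q+1] <> T[i]
               do q = pi[q]
           if T’[q+1] = T[i] then q = q+1
           if q = m
               then print “pattern occurs”   // T’ is a cyclic rotation of T
                    q = pi[q]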

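Example: for T = arc and T’ = car, TT = arcarc. The pattern car occurs in arcarc with shift 2, so car is a
cyclic rotation of arc.

Since KMP is a linear-time algorithm, the whole procedure runs in time linear in the length of T: the text TT
is twice as long as T and the pattern T’ has the same length as T.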
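The cyclic-rotation test can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names `prefix_function`, `kmp_find` and `is_cyclic_rotation` are hypothetical, indices are 0-based, and the matcher stops at the first occurrence because only a yes/no answer is needed.

```python
def prefix_function(pattern: str) -> list:
    """pi[q-1] = length of the longest prefix of pattern that is a proper suffix of pattern[:q]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_find(text: str, pattern: str) -> int:
    """Return the first 0-based shift at which pattern occurs in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    pi = prefix_function(pattern)
    q = 0                                  # number of pattern characters matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]                  # reuse the longest border already matched
        if pattern[q] == ch:
            q += 1
        if q == m:
            return i - m + 1
    return -1


def is_cyclic_rotation(t: str, t_prime: str) -> bool:
    """True when t_prime has the same length as t and occurs inside t + t."""
    return len(t) == len(t_prime) and kmp_find(t + t, t_prime) != -1


if __name__ == "__main__":
    print(is_cyclic_rotation("arc", "car"))
```

For “arc” and “car” the doubled text is arcarc, `kmp_find` locates car at shift 2, and `is_cyclic_rotation` returns True. The length check matters: without it, any substring of TT, such as “ar”, would be accepted.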
