
BNP Unit-5 Lecture 19

The document discusses string matching algorithms, focusing on the Naive String Matching and Rabin-Karp algorithms. It explains the process of finding a substring within a larger string and provides details on the complexity and implementation of these algorithms. Additionally, it highlights the performance of the Rabin-Karp algorithm in terms of preprocessing and average-case running time.

Uploaded by

aniketpsingh2004

Design & Analysis of Algorithms

(KCS-503)
Unit-5
String matching
Course Outline:-
⮚ String Matching
⮚ Naive String Matching
⮚ Rabin-Karp String Matching
String Matching

A string matching algorithm is also called a "string searching algorithm." This vital
class of string algorithms is defined as "a method that finds the place(s) where one or
several strings occur within a larger string."
• Given a text array T[1..n] of n characters and a pattern array P[1..m] of m
characters, the problem is to find every valid shift, that is, every integer s with
0 ≤ s ≤ n − m such that T[s+1..s+m] = P[1..m].
• In other words, the task is to find whether (and where) P occurs in T, i.e., whether P is
a substring of T. The elements of P and T are characters drawn from some finite alphabet
such as {0, 1} or {A, B, ..., Z, a, b, ..., z}. Given a string T[1..n], a substring is written
T[i..j] for some 1 ≤ i ≤ j ≤ n: the string formed by the characters of T from index i to
index j, inclusive. This implies that a string is a substring of itself (take i = 1 and j = n).
Algorithms used for String Matching

There are several methods for finding a string within a text:

• The Naive String Matching Algorithm
• The Rabin-Karp-Algorithm
• Finite Automata
• The Knuth-Morris-Pratt Algorithm
The Naive String Matching Algorithm
The naive approach tests all possible placements of the pattern P[1..m] relative
to the text T[1..n]. We try the shifts s = 0, 1, ..., n − m in turn and, for each shift s,
compare T[s+1..s+m] with P[1..m].
The naive algorithm finds all valid shifts using a loop that checks the condition
P[1..m] = T[s+1..s+m] for each of the n − m + 1 possible values of s.
NAIVE-STRING-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. for s ← 0 to n -m
4. do if P [1.....m] = T [s + 1....s + m]
5. then print "Pattern occurs with shift" s
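The pseudocode translates almost line for line into Python. This is a minimal sketch, assuming 0-indexed Python strings: the shift s is the same as in the pseudocode, but the 1-indexed window T[s+1..s+m] becomes the slice T[s:s+m]. The function name naive_string_matcher is our own choice, not from the lecture.

```python
def naive_string_matcher(T: str, P: str) -> list[int]:
    """Return every valid shift s (0 <= s <= n - m) with T[s:s+m] == P."""
    n, m = len(T), len(P)
    shifts = []
    # Line 3: try every shift s = 0, 1, ..., n - m
    for s in range(n - m + 1):
        # Line 4: compare the length-m window of T after shift s with P
        if T[s:s + m] == P:
            # Line 5: report the valid shift
            print("Pattern occurs with shift", s)
            shifts.append(s)
    return shifts


naive_string_matcher("acaabc", "aab")  # prints: Pattern occurs with shift 2
```

If n < m, range(n - m + 1) is empty, so the function simply returns an empty list.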
Analysis: The for loop of lines 3 to 5 executes n − m + 1 times (the last shift must
leave at least m characters of T to compare against), and each iteration performs up to m
character comparisons. So the total running time is O((n − m + 1)m).
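The bound is reached, for instance, when P = a^m and T = a^n: every window matches, so every shift runs all m comparisons. The sketch below counts character comparisons, assuming (as an implementation choice, since line 4 does not fix an order) that each shift compares left to right and stops at the first mismatch; count_naive_comparisons is a hypothetical helper name.

```python
def count_naive_comparisons(T: str, P: str) -> int:
    """Count character comparisons made by a naive matcher.

    Assumption: each shift compares P[j] with T[s + j] for j = 0, 1, ...
    and stops at the first mismatch, so one shift costs at most m.
    """
    n, m = len(T), len(P)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if T[s + j] != P[j]:
                break  # mismatch: move on to the next shift
    return comparisons


# Worst case: P = a^m, T = a^n, so every shift performs all m comparisons.
n, m = 10, 4
print(count_naive_comparisons("a" * n, "a" * m))  # 28 == (10 - 4 + 1) * 4
```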

Show the comparisons the naive string matcher makes for the
pattern P = 0001 in the text T = 000010001010001.
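The comparisons can be traced mechanically. This sketch prints, for every shift, the text window, the number of character comparisons made (again assuming a left-to-right scan that stops at the first mismatch) and whether the shift is valid. trace_naive is a hypothetical helper name, and shifts are 0-based as in the pseudocode.

```python
def trace_naive(T: str, P: str) -> tuple[list[int], int]:
    """Print one line per shift; return (valid shifts, total comparisons)."""
    n, m = len(T), len(P)
    valid, total = [], 0
    for s in range(n - m + 1):
        window = T[s:s + m]
        used = 0
        # Compare left to right, stopping at the first mismatch
        for j in range(m):
            used += 1
            if window[j] != P[j]:
                break
        matched = window == P
        total += used
        print(f"s={s:2d}  window={window}  comparisons={used}  {'VALID' if matched else ''}")
        if matched:
            valid.append(s)
    return valid, total


valid, total = trace_naive("000010001010001", "0001")
print("valid shifts:", valid, "total comparisons:", total)
# valid shifts: [1, 5, 11] total comparisons: 31
```

Under this comparison model, the 12 shifts cost 4, 4, 3, 2, 1, 4, 3, 2, 1, 2, 1 and 4 comparisons, and the pattern occurs at shifts 1, 5 and 11.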
The Rabin-Karp-Algorithm
Rabin and Karp have proposed a string-matching algorithm that performs well in
practice and that also generalizes to other algorithms for related problems, such as
two-dimensional pattern matching. The Rabin-Karp algorithm uses Θ(m)
preprocessing time, and its worst-case running time is Θ((n - m +1)m). Based on
certain assumptions, however, its average-case running time is better.
Given a pattern P[1..m], we let $p$ denote its corresponding decimal value (assuming, for
now, that every character is a decimal digit, i.e., the alphabet is {0, 1, ..., 9}). In a
similar manner, given a text T[1..n], we let $t_s$ denote the decimal value of the length-m
substring T[s+1..s+m], for s = 0, 1, ..., n − m. Certainly, $t_s = p$ if and only if
T[s+1..s+m] = P[1..m]; thus, s is a valid shift if and only if $t_s = p$. If we could compute
$p$ in time Θ(m) and all the $t_s$ values in a total of Θ(n − m + 1) time,[1] then we could
determine all valid shifts s in time Θ(m) + Θ(n − m + 1) = Θ(n) by comparing $p$ with each of
the $t_s$ values.
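For the decimal case, the two facts this argument relies on can be written out as a short derivation (a sketch in standard notation; the sample window 31415 and next digit 2 are illustrative values, not from the lecture). Horner's rule evaluates $p$ in Θ(m) time, and each $t_{s+1}$ follows from $t_s$ in constant time by removing the high-order digit T[s+1] and appending the new low-order digit T[s+m+1]:

```latex
p = P[m] + 10\bigl(P[m-1] + 10\bigl(P[m-2] + \cdots + 10\bigl(P[2] + 10\,P[1]\bigr)\cdots\bigr)\bigr)

t_{s+1} = 10\bigl(t_s - 10^{m-1}\,T[s+1]\bigr) + T[s+m+1]

% Example: m = 5, t_s = 31415, T[s+1] = 3, T[s+m+1] = 2
t_{s+1} = 10\,(31415 - 10000 \cdot 3) + 2 = 10 \cdot 1415 + 2 = 14152
```

$t_0$ is obtained from T[1..m] by Horner's rule in the same way as $p$.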
In general, with a d-ary alphabet {0, 1, ..., d − 1}, we choose q so that $dq$ fits within a
computer word and adjust the recurrence equation to work modulo q, so that it
becomes

$$t_{s+1} = \bigl(d\,(t_s - T[s+1]\,h) + T[s+m+1]\bigr) \bmod q,$$

where $h \equiv d^{m-1} \pmod{q}$ is the value of the digit "1" in the high-order position of an
m-digit text window.
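As a quick numerical check of the modular recurrence, the sketch below computes a window's value mod q directly with Horner's rule and compares it with one rolling step. It assumes digit-string input; the helper names window_mod and roll, and the values d = 10, q = 13, m = 5, are illustrative choices, not part of the lecture.

```python
def window_mod(digits: str, d: int, q: int) -> int:
    """Value of a digit string read in radix d, reduced mod q (Horner's rule)."""
    value = 0
    for ch in digits:
        value = (d * value + int(ch)) % q
    return value


def roll(t_s: int, old_digit: int, new_digit: int, d: int, q: int, h: int) -> int:
    """t_{s+1} = (d * (t_s - T[s+1] * h) + T[s+m+1]) mod q.

    Python's % returns a value in [0, q) for positive q, so a negative
    intermediate difference needs no extra correction.
    """
    return (d * (t_s - old_digit * h) + new_digit) % q


d, q, m = 10, 13, 5
h = pow(d, m - 1, q)               # h = d^(m-1) mod q = 10000 mod 13 = 3
t = window_mod("31415", d, q)      # 31415 mod 13 = 7
print(h, t, roll(t, 3, 2, d, q, h))  # 3 7 8
```

One rolling step gives (10(7 − 3·3) + 2) mod 13 = 8, which agrees with 14152 mod 13 = 8 computed directly.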
Any shift s for which $t_s \equiv p \pmod{q}$ must be tested further to see whether s is really
valid or we just have a spurious hit. This testing can be done by explicitly checking the
condition P[1..m] = T[s+1..s+m]. If q is large enough, then we can hope that
spurious hits occur infrequently enough that the cost of the extra checking is low.
RABIN-KARP-MATCHER (T, P, d, q)
1. n ← length [T]
2. m ← length [P]
3. h ← d^(m-1) mod q
4. p ← 0
5. t_0 ← 0
6. for i ← 1 to m
7. do p ← (d·p + P[i]) mod q
8. t_0 ← (d·t_0 + T[i]) mod q
9. for s ← 0 to n-m
10. do if p = t_s
11. then if P[1..m] = T[s+1..s+m]
12. then print "Pattern occurs with shift" s
13. if s < n-m
14. then t_{s+1} ← (d(t_s - T[s+1]h) + T[s+m+1]) mod q
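A runnable version of RABIN-KARP-MATCHER might look like the following Python sketch. It assumes T and P are strings of digits (each character converted with int()), uses 0-indexed shifts so that the pseudocode's T[s+1] and T[s+m+1] become T[s] and T[s+m], and picks d = 10 and q = 13 as illustrative defaults; the function name is our own.

```python
def rabin_karp_matcher(T: str, P: str, d: int = 10, q: int = 13) -> list[int]:
    """Rabin-Karp over digit strings; returns the valid (0-indexed) shifts.

    Assumes 1 <= m and that every character is a digit in 0..d-1.
    """
    n, m = len(T), len(P)
    if m > n:
        return []
    h = pow(d, m - 1, q)                 # line 3: h = d^(m-1) mod q
    p = t = 0                            # lines 4-5
    for i in range(m):                   # lines 6-8: Horner's rule mod q
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    shifts = []
    for s in range(n - m + 1):           # line 9
        if p == t:                       # line 10: hash hit
            if T[s:s + m] == P:          # line 11: rule out a spurious hit
                print("Pattern occurs with shift", s)  # line 12
                shifts.append(s)
        if s < n - m:                    # lines 13-14: roll to t_{s+1}
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return shifts


rabin_karp_matcher("2359023141526739921", "31415")  # prints: Pattern occurs with shift 6
```

Python's % operator returns a non-negative result for a positive modulus, so the subtraction on line 14 needs no extra adjustment here; in C or Java one would typically add q before taking the remainder.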
The running time of RABIN-KARP-MATCHER is Θ((n − m + 1)m) in the worst case, since (like
the naive string-matching algorithm) the Rabin-Karp algorithm explicitly verifies every
valid shift. If $P = a^m$ and $T = a^n$, then the verifications take time Θ((n − m + 1)m),
since each of the n − m + 1 possible shifts is valid. (Note also that the computation of
$d^{m-1} \bmod q$ on line 3 and the loop on lines 6-8 take time O(m) = O((n − m + 1)m).)
Numerical
Working modulo q = 11, how many spurious hits
does the Rabin-Karp matcher encounter in the
text T = 3141592653589793 when looking for the
pattern P = 26?
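One way to check the answer is to instrument the matcher so that it records spurious hits separately from valid shifts. This Python sketch (function name hypothetical, 0-indexed shifts, digit-string input assumed) does that for q = 11:

```python
def rabin_karp_hits(T: str, P: str, d: int, q: int) -> tuple[list[int], list[int]]:
    """Return (valid shifts, spurious-hit shifts) for digit strings T and P."""
    n, m = len(T), len(P)
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):                       # Horner's rule mod q
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:                           # residues agree: a hit
            if T[s:s + m] == P:
                valid.append(s)              # genuine match
            else:
                spurious.append(s)           # same residue, different digits
        if s < n - m:                        # roll to the next window
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return valid, spurious


valid, spurious = rabin_karp_hits("3141592653589793", "26", d=10, q=11)
print("valid:", valid, "spurious:", spurious)  # valid: [6] spurious: [3, 4, 5]
```

Here p = 26 mod 11 = 4, and the windows 15, 59 and 92 also leave remainder 4 modulo 11, so the matcher encounters 3 spurious hits before it confirms the genuine occurrence of 26 at shift 6.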
The End

B N Pandey 7/5/2020
