StringMatchingAlgorithms Rabin and finite

The Rabin-Karp algorithm is a string-searching method that uses hashing to find one or more pattern strings within a text. It computes hash values for the pattern and for each m-character window of the text, so characters are compared only when the hash values match. With a good hash function the expected running time is O(n+m) over the n-m+1 possible shifts, but it can degrade to O(mn) in the worst case when there are many spurious hits.

Uploaded by

mudit6565

Rabin-Karp Algorithm

• The Rabin-Karp algorithm is a string-searching algorithm created by Richard M. Karp and Michael O. Rabin in 1987.
• The algorithm uses hashing to find a set of pattern strings in a text.
• It is another application of hashing.
• It is widely used for multiple-pattern search.

The Rabin-Karp-Algorithm

• The Rabin-Karp string-matching algorithm calculates a hash value for the pattern, as well as for each m-character substring of the text to be compared.
• If the hash values are unequal, the algorithm moves on and calculates the hash value for the next m-character substring.
• If the hash values are equal, the algorithm compares the pattern and the m-character substring character by character.
• In this way, there is only one hash comparison per text substring, and character matching is only required when the hash values match.

Rabin-Karp Algorithm

• Generate a hash of the pattern that we are looking for in the text.
• Check whether the rolling hash of the current text window matches the hash of the pattern.
• If it doesn't match, the pattern does not occur at that position, so the window moves forward by one character.
• If it does match, the pattern may be present at that position; a character-by-character comparison confirms it.

Calculating hash value in Rabin-Karp Algorithm

Step 2: Calculate hash value of Pattern

Step 3: Calculate hash value of first Text window

Step 4: Updating the hash value

• Now we remove the leading character of the window and bring in the next character of the text. The hash value is updated in the same way at every shift until we find a match.

Rabin-Karp Algorithm

Text: c c a c c a a e d b a (n = 11)
Pattern: d b a (m = 3)

Codes: a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8, i = 9, j = 10

• The pattern has 3 letters and the code table has 10 letters, so we use base 10:
  $P[1] \cdot 10^{m-1} + P[2] \cdot 10^{m-2} + P[3] \cdot 10^{m-3} = 4 \cdot 10^2 + 2 \cdot 10^1 + 1 \cdot 10^0 = 400 + 20 + 1 = 421$
• If there are more than 10 letters, we take that number as the base instead of 10.

Hash code
• First text window "cca": $3 \cdot 10^2 + 3 \cdot 10^1 + 1 \cdot 10^0 = 331$
• Pattern "dba" (m = 3): $4 \cdot 10^2 + 2 \cdot 10^1 + 1 \cdot 10^0 = 421$

Note: The letters are assigned assumed numbers; you can use actual character codes instead.
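
The hash calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the slide's letter codes (a = 1 through j = 10) and base 10; the names `CODES` and `poly_hash` are made up for this example and are not from the slides.

```python
# Letter codes from the slide: a = 1, b = 2, ..., j = 10.
CODES = {ch: i for i, ch in enumerate("abcdefghij", start=1)}

def poly_hash(s, base=10):
    """Return code(s[0])*base^(m-1) + ... + code(s[m-1])*base^0."""
    h = 0
    for ch in s:
        # Horner's rule: shift the running value one position left, add the next code.
        h = h * base + CODES[ch]
    return h

print(poly_hash("dba"))  # 421, the pattern hash
print(poly_hash("cca"))  # 331, the hash of the first text window
```

Horner's rule avoids computing the powers of the base explicitly, which is also what makes the later rolling update cheap.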
Rabin-Karp Algorithm

Text: c c a c c a a e d b a   Pattern: dba = 421
Codes: a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8, i = 9, j = 10

Rolling hash: subtract the term of the leading character, multiply the remainder by 10, and add the code of the next character.

• "cca": $3 \cdot 10^2 + 3 \cdot 10^1 + 1 \cdot 10^0 = 331$
• "cac": $331 - 3 \cdot 10^2 = 31$, then $31 \cdot 10 + 3 = 313$
• "acc": $313 - 3 \cdot 10^2 = 13$, then $13 \cdot 10 + 3 = 133$
• "cca": $133 - 1 \cdot 10^2 = 33$, then $33 \cdot 10 + 1 = 331$
• "caa": $331 - 3 \cdot 10^2 = 31$, then $31 \cdot 10 + 1 = 311$
• "aae": $311 - 3 \cdot 10^2 = 11$, then $11 \cdot 10 + 5 = 115$
• "aed": $115 - 1 \cdot 10^2 = 15$, then $15 \cdot 10 + 4 = 154$
• "edb": $154 - 1 \cdot 10^2 = 54$, then $54 \cdot 10 + 2 = 542$
• "dba": $542 - 5 \cdot 10^2 = 42$, then $42 \cdot 10 + 1 = 421$

• The last hash equals the pattern hash 421, so the pattern matches. These calculations are called the rolling hash.
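
The rolling update in this walkthrough can be reproduced with a short Python sketch. It assumes the same letter codes and base 10 as the slides, uses no modulus, and assumes the text is at least m characters long; `rolling_hashes` is a hypothetical helper name.

```python
# Letter codes from the slide: a = 1, b = 2, ..., j = 10.
CODES = {ch: i for i, ch in enumerate("abcdefghij", start=1)}

def rolling_hashes(text, m, base=10):
    """Return the hash of every m-character window of text, left to right."""
    high = base ** (m - 1)                 # weight of the leading character
    h = 0
    for ch in text[:m]:                    # hash of the first window
        h = h * base + CODES[ch]
    hashes = [h]
    for s in range(len(text) - m):
        h -= CODES[text[s]] * high         # drop the leading character
        h = h * base + CODES[text[s + m]]  # shift left and append the next character
        hashes.append(h)
    return hashes

hashes = rolling_hashes("ccaccaaedba", 3)
print(hashes)             # [331, 313, 133, 331, 311, 115, 154, 542, 421]
print(hashes.index(421))  # 8: 0-based shift where the hash of "dba" first appears
```

Each update costs a constant number of arithmetic operations, regardless of the pattern length.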
Rabin-Karp Algorithm

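• There are n-m+1 possible shifts; with few spurious hits the expected running time is O(n+m).
• Worst-case time is O(mn), when spurious (fake) hits force a character-by-character check at almost every shift.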
The Rabin-Karp-Algorithm

• For string matching working modulo q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 31415926535.......

Complexity:

• The running time of RABIN-KARP-MATCHER is O((n-m+1)m) in the worst case, but it has a good average-case running time.
• If the expected number of valid shifts is small, O(1), and the prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n+m) plus the time required to process spurious hits.
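
To make valid shifts and spurious hits concrete, here is a rough Python sketch of a Rabin-Karp matcher that hashes modulo a prime q. It is an illustration under stated assumptions, not the slides' pseudocode: the function name `rabin_karp`, its parameters, and the digit text used in the example are chosen for this sketch, and characters are decoded with `int`, so it only handles digit strings as written.

```python
def rabin_karp(text, pattern, q, base=10, code=int):
    """Return (valid_shifts, spurious_hits) as lists of 0-based shifts."""
    n, m = len(text), len(pattern)
    matches, spurious = [], []
    if m == 0 or m > n:
        return matches, spurious
    high = pow(base, m - 1, q)            # base^(m-1) mod q, weight of the leading digit
    p = t = 0
    for i in range(m):                    # hash the pattern and the first window
        p = (base * p + code(pattern[i])) % q
        t = (base * t + code(text[i])) % q
    for s in range(n - m + 1):
        if p == t:                        # hash hit: verify character by character
            if text[s:s + m] == pattern:
                matches.append(s)
            else:
                spurious.append(s)        # equal hashes, different strings
        if s < n - m:                     # roll the window one character to the right
            t = (base * (t - code(text[s]) * high) + code(text[s + m])) % q
    return matches, spurious

matches, spurious = rabin_karp("2359023141526739921", "31415", q=13)
print(matches)   # [6]
print(spurious)  # [12]: the window "67399" also hashes to 7 modulo 13
```

A larger prime q makes spurious hits rarer, which is why the expected running time stays close to O(n+m) when q is chosen large.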

Finite Automata

• The string-matching automaton is very efficient: it examines each character in the text exactly once and reports all the valid shifts in O(n) time.
Basic Idea:
• Each character in the pattern has a state (plus a start state).
• Each match sends the automaton into a new state.
• If all the characters in the pattern have been matched, the automaton enters the accepting state.
• Otherwise, the automaton returns to a suitable state, chosen from the current state and the input character, that reflects the maximum advantage we can take of the previous matching.
• The matching takes O(n) time since each character is examined once.
Finite Automata

Terminology:
• A finite automaton is a 5-tuple (Q, q0, A, ∑, δ):
• Q is a finite set of states.
• q0 ∈ Q is the start state.
• A ⊆ Q is a distinguished set of accepting states.
• ∑ (sigma) is a finite input alphabet.
• δ (delta) is a function from Q × ∑ into Q, called the transition function of M.
Finite Automata

Example:
• Q = {0, 1} is the finite set of states.
• q0 = 0 is the start state.
• A ⊆ Q is the distinguished set of accepting (or final) states.
• ∑ = {a, b} is the finite input alphabet.
• δ is the transition function, which maps the Cartesian product Q × ∑ into Q (δ: Q × ∑ -> Q). The mapping is represented through a transition table or a transition function:
  δ(0,a) = 1, δ(1,a) = 0, δ(0,b) = 0, δ(1,b) = 0

Transition Table

          Input (alphabet)
State     a     b
0         1     0
1         0     0
Finite Automata

Text = a b a b a b a c a b a   Pattern = a b a b a c a

• The first step is to build the finite automaton for the given pattern.
• While building the automaton, two concepts are involved:
• Prefix: a substring of the pattern that starts at its first character, e.g. a, ab, aba, abab, ababa, ababac, …
• Suffix: a substring of the pattern that ends at its last character, e.g. a, ca, aca, baca, abaca, babaca, …
• While building the automaton it is important to note where a prefix and a suffix match.
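
The prefix and suffix lists above, and the "where do they match" check, can be sketched in Python as follows. The helper names are hypothetical; `longest_prefix_suffix` returns the length of the longest prefix of the pattern that is also a suffix of the text read so far, which is the quantity used as the state number.

```python
def prefixes(p):
    """All non-empty prefixes of p, shortest first."""
    return [p[:k] for k in range(1, len(p) + 1)]

def suffixes(p):
    """All non-empty suffixes of p, shortest first."""
    return [p[-k:] for k in range(1, len(p) + 1)]

def longest_prefix_suffix(pattern, seen):
    """Length of the longest prefix of pattern that is also a suffix of seen."""
    for k in range(min(len(pattern), len(seen)), 0, -1):
        if seen.endswith(pattern[:k]):
            return k
    return 0

P = "ababaca"
print(prefixes(P)[:6])  # ['a', 'ab', 'aba', 'abab', 'ababa', 'ababac']
print(suffixes(P)[:6])  # ['a', 'ca', 'aca', 'baca', 'abaca', 'babaca']
print(longest_prefix_suffix(P, "ababab"))  # 4, because "abab" is a prefix of P
```

For example, after reading "ababab" the automaton cannot be in state 6, but the last four characters still match the first four of the pattern, so it falls back to state 4.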
String Matching with Finite Automata

• The string-matching automaton is a very useful tool in string-matching algorithms.
• It examines every character in the text exactly once and reports all the valid shifts in O(n) time. The goal of string matching is to find the location of a specific text pattern within a larger body of text (a sentence, a paragraph, a book, etc.).

Finite Automata:

• A finite automaton M is a 5-tuple (Q, q0, A, ∑, δ), where
• Q is a finite set of states,
• q0 ∈ Q is the start state,
• A ⊆ Q is a distinguished set of accepting states,
• ∑ is a finite input alphabet,
• δ is a function from Q × ∑ into Q, called the transition function of M.

Finite Automata:

• The finite automaton starts in state q0 and reads the characters of its input string one at a time. If the automaton is in state q and reads input character a, it moves from state q to state δ(q, a). Whenever its current state q is a member of A, the machine M has accepted the string read so far. An input that is not accepted is rejected.

• A finite automaton M induces a function ϕ, called the final-state function, from ∑* to Q such that ϕ(w) is the state M ends up in after scanning the string w. Thus, M accepts a string w if and only if ϕ(w) ∈ A.
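
As a small illustration of the final-state function, the sketch below encodes the two-state transition table from the earlier terminology slide (states 0 and 1, alphabet {a, b}) as a Python dictionary. The slides do not say which states are accepting, so the set {1} used in the example is purely an assumption, as are the function names.

```python
# delta from the two-state example: (state, symbol) -> next state
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}

def final_state(w, delta=DELTA, start=0):
    """Return the state the automaton ends up in after scanning w."""
    state = start
    for ch in w:
        state = delta[(state, ch)]
    return state

def accepts(w, accepting, delta=DELTA, start=0):
    """M accepts w if and only if the final state is an accepting state."""
    return final_state(w, delta, start) in accepting

print(final_state(""))       # 0: nothing read, still in the start state
print(final_state("ba"))     # 1
print(final_state("ab"))     # 0
print(accepts("ba", {1}))    # True, under the assumed accepting set {1}
```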

Finite Automata

Text= a b a b a b a c a b a Pattern= a b a b a c a
• The pattern has 7 letters, so the states run from 0 to 7.
• ∑ (sigma) is the finite input alphabet = {a, b, c}.
• We check every symbol in each state.
• For each letter, check for a matching prefix and suffix.
• The number of letters in the prefix-suffix match determines the state number.

[Figure: state diagram for the pattern a b a b a c a, with states 0 (initial state) through 7 (end state); the forward transitions are labeled a, b, a, b, a, c, a, with further transitions labeled a and b.]
Finite Automata

• T = a b a b a b a c a b a   P = a b a b a c a
• While building the machine:
• First check whether the input matches the next pattern character; if so, advance the state.
• If it doesn't match, check the prefix and suffix.
• If a prefix and suffix match, count the matching letters to determine the state number.
• If no prefix and suffix match, go back to state 0 and move on.
• After building the finite automaton, convert it into a transition table.
Finite State Automata (FSA)
• An FSA is defined by 5 components:
• Q is the set of states: q0, q1, q2, …, qn
• q0 is the start state
• A ⊆ Q is the set of accepting states, where |A| > 0
• Σ is the alphabet (e.g. {A, B})
• δ is the transition function from Q × Σ to Q

Q × Σ      Q
(q0, A)    q1
(q0, B)    q2
(q1, A)    q1
…


FSA operation

[Figure: state diagram with four states starting at q0; the forward transitions are labeled A, B, A and the remaining transitions are labeled B, A, A, B, B.]

An FSA starts at state q0 and reads the characters of the input string one at a time.
If the automaton is in state q and reads character a, then it transitions to state δ(q, a).
If the FSA reaches an accepting state (q ∈ A), then the FSA has found a match.
FSA operation
P = ABA

What pattern does this represent?

FSA operation
P = ABA

S = BABABBABABA
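
Stepping the automaton for P = ABA through S can be sketched in Python. The transition table below is an assumption: it is derived from the pattern itself (longest prefix of ABA that is a suffix of what has been read), because the diagram did not survive extraction. The state numbers 0 to 3 and the name `trace` are likewise illustrative.

```python
# Assumed transition table for P = "ABA": state -> {symbol: next state}
DELTA = {
    0: {"A": 1, "B": 0},
    1: {"A": 1, "B": 2},
    2: {"A": 3, "B": 0},
    3: {"A": 1, "B": 2},
}
ACCEPTING = {3}  # state 3 means all of "ABA" has just been read

def trace(s):
    """Return (states visited, 0-based indices where a match ends)."""
    state, states, ends = 0, [0], []
    for i, ch in enumerate(s):
        state = DELTA[state][ch]
        states.append(state)
        if state in ACCEPTING:
            ends.append(i)
    return states, ends

states, ends = trace("BABABBABABA")
print(states)  # [0, 0, 1, 2, 3, 2, 0, 1, 2, 3, 2, 3]
print(ends)    # [3, 8, 10]: matches start at 0-based shifts 1, 6 and 8
```

Note that the last two matches overlap; the automaton finds both because state 3 falls back to state 2 on B rather than restarting from 0.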
Finite Automata:

• The function f is defined as

Finite Automata:

• The simple loop structure of FINITE-AUTOMATON-MATCHER implies that its running time on a text string of length n is O(n).

• Computing the transition function: the following procedure computes the transition function δ from a given pattern P[1..m].
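
The procedure itself is not preserved in this transcript, so the following Python sketch is a straightforward construction written for illustration rather than the slide's own code. For every state q and symbol a it finds the longest prefix of the pattern that is a suffix of P[1..q] followed by a; the name `compute_transition_function` and the list-of-dicts representation are assumptions.

```python
def compute_transition_function(pattern, alphabet):
    """Return delta as a list indexed by state; delta[q][a] is the next state."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            seen = pattern[:q] + a          # text matched so far plus the new symbol
            k = min(m, q + 1)               # longest prefix that could possibly fit
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1                      # shrink until the prefix is a suffix of seen
            row[a] = k
        delta.append(row)
    return delta

delta = compute_transition_function("ababaca", "abc")
for q, row in enumerate(delta):
    print(q, row["a"], row["b"], row["c"])
```

This direct approach is easy to check by hand but slow, on the order of m^3 |Σ| steps in the worst case; faster constructions exist.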

• Consider a finite automaton that accepts strings with an even number of a's, where ∑ = {a, b, c}.

q0 is the initial state.
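
One possible automaton for this example can be written down directly in Python. The slide only names q0; the second state q1 ("odd number of a's so far") and the choice of q0 as the only accepting state are assumptions made for this sketch.

```python
# q0: even number of a's read so far (start, accepting); q1: odd number (assumed name)
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0", ("q0", "c"): "q0",
    ("q1", "a"): "q0", ("q1", "b"): "q1", ("q1", "c"): "q1",
}
ACCEPTING = {"q0"}

def accepts(w):
    """Return True if w over {a, b, c} contains an even number of a's."""
    state = "q0"
    for ch in w:
        state = DELTA[(state, ch)]  # raises KeyError for symbols outside {a, b, c}
    return state in ACCEPTING

print(accepts("abcab"))  # True: two a's
print(accepts("cab"))    # False: one a
```

Only the symbol a changes the state; b and c loop back to the current state.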

Finite Automata

• T = a b a b a b a c a b a
• P = a b a b a c a

[Figure: state diagram for P, with states 0 (initial state) through 7 (end state) and forward transitions labeled a, b, a, b, a, c, a.]

Transition Table

State       δ(q,a)   δ(q,b)   δ(q,c)
0 or q0     1        0        0
1 or q1     1        2        0
2 or q2     3        0        0
3 or q3     1        4        0
4 or q4     5        0        0
5 or q5     1        4        6
6 or q6     7        0        0
7 or q7     1        2        0
Finite Automata

• T = a b a b a b a c a b a (length n)
• P = a b a b a c a (length m)
• The transition function takes the current state and the next character of the text T as input.
• When the state equals the length of the pattern, a match has been found.
• The shift of the match is the index i of the text T minus the pattern length.
Finite Automata

• T = a b a b a b a c a b a (length n)
• P = a b a b a c a (length m)

i       -  1  2  3  4  5  6  7  8  9  10  11
T[i]    -  a  b  a  b  a  b  a  c  a  b   a
state   0  1  2  3  4  5  4  5  6  7  2   3

• The state reaches 7 (the pattern length) at i = 9, so 9 - 7 = 2: the pattern occurs with shift 2.
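
The matching run can be sketched in Python using the transition table from the slides. The table values are copied from the slide; the function name `finite_automaton_matcher` and the list-of-dicts layout are choices made for this illustration, and positions in the text are counted from 1 as on the slide.

```python
# Transition table for P = "ababaca" (rows are states 0..7, as on the slide)
DELTA = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
    {"a": 3, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 0},
    {"a": 5, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 6},
    {"a": 7, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
]

def finite_automaton_matcher(text, delta, m):
    """Return (states visited, shifts at which the pattern occurs)."""
    state, states, shifts = 0, [0], []
    for i, ch in enumerate(text, start=1):  # i is the 1-based index into the text
        state = delta[state][ch]
        states.append(state)
        if state == m:                      # accepting state: a match ends at i
            shifts.append(i - m)
    return states, shifts

states, shifts = finite_automaton_matcher("abababacaba", DELTA, 7)
print(states)  # [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 2, 3]
print(shifts)  # [2]
```

The loop does one table lookup per text character, which is where the linear matching time comes from.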
Finite Automata

• Preprocessing time is O(m|Σ|) with an efficient construction of δ, and matching time is Θ(n).

