Advanced String Lecture
String: “abcdeacdoe”
Pattern: “bcd”
Rabin-Karp Algorithm
● Also known as the Karp–Rabin algorithm
● Uses hashing (more specifically, a rolling hash)
● Quickly filters out positions of the text that cannot match the pattern, then
checks for a match at the remaining positions
What is Rolling hash?
● A rolling hash is a hash function where the input is hashed in a window that
moves through the input.
● Think of it as a wheel moving on an inclined plane
Steps followed in Rolling hash pattern matching
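1. Calculate the hash of the pattern
2. Calculate the hash of the first window of the text (a substring with the same
length as the pattern) and compare it with the pattern's hash
3. Slide the window one character to the right and repeat step 2 until the window
reaches the end of the text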
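The steps above can be sketched with the simplest possible rolling hash. This is a minimal sketch, not the lecture's own code: it assumes the hash is just the sum of character codes, and the function name `additive_hash_search` is made up for illustration. Equal hashes only mean "possible match", so each candidate window is still compared character by character.

```python
def additive_hash_search(text, pattern):
    """Return every start index where pattern occurs in text.

    Assumption: the "hash" is the plain sum of character codes,
    the most trivial rolling hash one can pick.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    pattern_hash = sum(ord(c) for c in pattern)   # step 1
    window_hash = sum(ord(c) for c in text[:m])   # step 2 (first window)
    matches = []
    for start in range(n - m + 1):
        # Equal hashes are only a hint, so confirm with a real comparison.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # step 3: slide the window by adding the incoming character
            # and removing the outgoing one, in constant time.
            window_hash += ord(text[start + m]) - ord(text[start])
    return matches


print(additive_hash_search("abcdeacdoe", "bcd"))  # [1]
```

The string and pattern in the usage line are the ones from the opening slide.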
Trivial Rolling Hash Drawbacks?
● What are the drawbacks of rolling hash?
Let’s consider the case
String: AABAABCABA
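One drawback can be seen directly on this string. The sketch below assumes the trivial hash is the sum of character codes and uses a hypothetical pattern "ABA" (the slide gives no pattern); under those assumptions, every rearrangement of the same letters produces the same hash, so several windows pass the hash filter without being real matches.

```python
def additive_hash(s):
    """Assumed trivial hash: the sum of the character codes."""
    return sum(ord(c) for c in s)


text = "AABAABCABA"
pattern = "ABA"  # hypothetical pattern, chosen only for illustration
m = len(pattern)

# Windows whose hash equals the pattern's hash (they pass the filter).
candidates = [i for i in range(len(text) - m + 1)
              if additive_hash(text[i:i + m]) == additive_hash(pattern)]
# Windows that really are the pattern.
matches = [i for i in candidates if text[i:i + m] == pattern]
# Windows that passed the filter but are not the pattern (collisions).
spurious = [i for i in candidates if i not in matches]

print(candidates)  # [0, 1, 2, 3, 7]
print(matches)     # [1, 7]
print(spurious)    # [0, 2, 3]
```

"AAB", "ABA" and "BAA" all sum to the same value, so three of the five candidate windows cost a full comparison for nothing.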
Rabin–Karp string search algorithm
This is a simple rolling hash function that only uses multiplications and additions
A B E D A
● There is a method of making this process of finding the hash more efficient!
Steps to follow in Rabin-Karp’s algorithm
1. Subtract the value of the character leaving the window from the old hash
2. Divide the result by the number we picked (the prime used as the base)
3. Add the value of the new character entering the window multiplied by the
base raised to the power (length of the pattern - 1)
● So in the previous example (base 3, pattern length 3):
● 52 - val(A) = 52 - 1 = 51
● 51 / prime = 51 / 3 = 17
● 17 + val(D) * 3^2 = 17 + 4 * 9 = 53
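The arithmetic in this example can be reproduced with a short sketch. Two things are inferred from the numbers on the slide rather than stated there: characters are valued A = 1, B = 2, and so on, and the character at position k of the window is weighted by 3^k. The names `val`, `poly_hash` and `roll` are illustrative.

```python
BASE = 3  # the prime used in the slide's example


def val(ch):
    """Assumed character values: 'A' -> 1, 'B' -> 2, ..."""
    return ord(ch) - ord("A") + 1


def poly_hash(window):
    """Hash of a window: sum of val(c) * BASE**k, k = position in the window."""
    return sum(val(c) * BASE ** k for k, c in enumerate(window))


def roll(old_hash, outgoing, incoming, m):
    """Slide a window of length m one character to the right."""
    h = old_hash - val(outgoing)           # 1. remove the outgoing character
    h //= BASE                             # 2. divide by the base (always exact here)
    h += val(incoming) * BASE ** (m - 1)   # 3. add the incoming character
    return h


h_abe = poly_hash("ABE")            # 1*1 + 2*3 + 5*9 = 52
h_bed = roll(h_abe, "A", "D", 3)    # (52 - 1) / 3 + 4*9 = 53
print(h_abe, h_bed, poly_hash("BED"))  # 52 53 53
```

The division in step 2 is exact because, once the outgoing character is removed, every remaining term is a multiple of the base.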
Practice:
String: “AABAACAADAABAABA”
Pattern = “AABA”
A A B A A C A A D A A B A A B A
Time Complexity
● Best-case? Worst-case?
● When is worst-case time complexity achieved?
● How to reduce worst-case scenarios?
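As a starting point for these questions, here is a hedged sketch of a complete search. It swaps in the common modular variant of the hash, which is an assumption and not the exact scheme shown earlier: the leftmost character gets the highest weight, so no division is needed, and every value is reduced modulo a large prime to keep numbers small and collisions rare. The function name, `base` and `mod` are illustrative choices. It also counts how many windows needed a full comparison.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return (match start indices, number of full comparisons made)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    high = pow(base, m - 1, mod)  # weight of the leftmost character
    p_hash = w_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    matches, checks = [], 0
    for start in range(n - m + 1):
        if w_hash == p_hash:
            checks += 1  # hashes agree, so verify character by character
            if text[start:start + m] == pattern:
                matches.append(start)
        if start + m < n:
            # Remove the leftmost character, shift, append the new one.
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches, checks


print(rabin_karp("AABAACAADAABAABA", "AABA"))  # ([0, 9, 12], 3)
print(rabin_karp("AAAAAA", "AA"))              # ([0, 1, 2, 3, 4], 5)
```

With few hash hits, the work is roughly one pass over the text plus one over the pattern, O(n + m). The second call shows the other extreme: every window hits, every hit is verified, and the cost grows to O(n * m). The same happens with a weak hash that collides often, which is why a large modulus helps.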
KMP Pattern Matching
- Knuth-Morris-Pratt Algorithm
- The basic idea behind the KMP algorithm: whenever we detect a mismatch
(after some matches), we already know some of the characters in the text of
the next window, so we do not need to compare them again
- The KMP algorithm was the first linear-time algorithm for string
matching.
● The KMP algorithm is used to find a "Pattern" in a "Text". It compares
character by character from left to right, but whenever a mismatch occurs it
uses a preprocessed table called the "Prefix Table" to skip character
comparisons while matching.
● Sometimes the prefix table is also known as the LPS table. Here LPS stands for
"Longest proper Prefix which is also Suffix".
Steps for creating LPS Table (Preprocessing)
1. Define a one-dimensional array with size equal to the length of the
Pattern. (LPS[size])
2. Define variables i & j. Set j = 0, i = 1 and LPS[0] = 0.
3. Compare the characters at Pattern[i] and Pattern[j].
4. If they match, set LPS[i] = j+1 and increment both i & j by one. Go to
Step 3.
5. If they do not match, check the value of 'j'. If it is '0', set LPS[i] = 0
and increment 'i' by one; if it is not '0', set j = LPS[j-1] and leave 'i'
unchanged. Go to Step 3.
6. Repeat the above steps until all the values of LPS[] are filled.
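The preprocessing can be sketched as a short function. This is a minimal sketch with the pointer roles used in the worked example: j is the length of the prefix matched so far and i is the position whose LPS value is being filled. The name `build_lps` is illustrative.

```python
def build_lps(pattern):
    """Return the LPS (longest proper prefix that is also a suffix) table."""
    lps = [0] * len(pattern)
    j = 0  # length of the currently matched prefix
    i = 1  # position whose LPS value is being computed
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            lps[i] = j + 1        # the matched prefix grows by one
            i += 1
            j += 1
        elif j == 0:
            lps[i] = 0            # no prefix matches at this position
            i += 1
        else:
            j = lps[j - 1]        # fall back to a shorter prefix, keep i
    return lps


print(build_lps("abcdabca"))  # [0, 0, 0, 0, 1, 2, 3, 1]
print(build_lps("abcaby"))    # [0, 0, 0, 1, 2, 0]
```

Each iteration either advances i or shrinks j, and j can only shrink as often as it has grown, so the table is built in time linear in the pattern length.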
Example
String: abcdabca
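Pattern: a b c d a b c a
Index:   0 1 2 3 4 5 6 7
Step 2: j = 0, i = 1, LPS[0] = 0
● j = 0, i = 1: pat[i] = b, pat[j] = a. pat[i] != pat[j] and j = 0, so LPS[1] = 0 and i += 1
● j = 0, i = 2: pat[i] = c, pat[j] = a. Mismatch and j = 0, so LPS[2] = 0 and i += 1
● j = 0, i = 3: pat[i] = d, pat[j] = a. Mismatch and j = 0, so LPS[3] = 0 and i += 1
● j = 0, i = 4: pat[i] = a, pat[j] = a. Match, so LPS[4] = j + 1 = 1, then i += 1 and j += 1
● j = 1, i = 5: pat[i] = b, pat[j] = b. Match, so LPS[5] = 2, then i += 1 and j += 1
● j = 2, i = 6: pat[i] = c, pat[j] = c. Match, so LPS[6] = 3, then i += 1 and j += 1
● j = 3, i = 7: pat[i] = a, pat[j] = d. Mismatch and j != 0, so j = LPS[j-1] = LPS[2] = 0
● j = 0, i = 7: pat[i] = a, pat[j] = a. Match, so LPS[7] = 1
Result:
Index: 0 1 2 3 4 5 6 7
LPS:   0 0 0 0 1 2 3 1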
Pattern: abcaby
0 1 2 3 4 5
0 0 0 1 2 0
Text: a b x a b c a b c a b y (indices 0 to 11)
● i = 0, j = 0: text[i] = a, pattern[j] = a. Match, so i and j both move to the next index
● i = 1, j = 1: text[i] = b, pattern[j] = b. Match
● i = 2, j = 2: text[i] = x, pattern[j] = c. Mismatch and j != 0, so j = LPS[j-1] = LPS[1] = 0
● i = 2, j = 0: text[i] = x, pattern[j] = a. Mismatch and j = 0, so only i moves
● i = 3, j = 0: text[i] = a, pattern[j] = a. Match
● i = 4, j = 1: text[i] = b, pattern[j] = b. Match
● i = 5, j = 2: text[i] = c, pattern[j] = c. Match
● i = 6, j = 3: text[i] = a, pattern[j] = a. Match
● i = 7, j = 4: text[i] = b, pattern[j] = b. Match
● i = 8, j = 5: text[i] = c, pattern[j] = y
● pattern[j] != text[i] (at i = 8, j = 5: text[i] = c, pattern[j] = y)
● What do we do next?
● LPS[j-1] = LPS[4] = 2. As a result j = 2, while i stays at 8.
● This is because “ab” is a prefix that is also a suffix of the part already matched
(“abcab”). We don’t need to check from the beginning.
● Since text[i] == pattern[j_new] (both are c), both i and j move to the next index.
● What if pattern[j_new] were different from text[i]?
● i = 9, j = 3: text[i] = a, pattern[j] = a. Match
● i = 10, j = 4: text[i] = b, pattern[j] = b. Match
● i = 11, j = 5: text[i] = y, pattern[j] = y. Match, and j reaches the end of the pattern
● Pattern found in the text starting at index i - j = 12 - 6 = 6
String: adsgwadsxdsgwadsgz
Pattern: dsgwadsgz
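The search phase can be sketched in the same way. This is a minimal sketch under the same conventions as the walkthrough (i indexes the text, j indexes the pattern), not a definitive implementation; `build_lps` and `kmp_search` are illustrative names, and the function returns only the first occurrence.

```python
def build_lps(pattern):
    """LPS table: longest proper prefix that is also a suffix, per position."""
    lps = [0] * len(pattern)
    j, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            lps[i] = j + 1
            i += 1
            j += 1
        elif j == 0:
            i += 1
        else:
            j = lps[j - 1]
    return lps


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    lps = build_lps(pattern)
    i = j = 0  # i scans the text and never moves backwards
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j       # whole pattern matched
        elif j == 0:
            i += 1                 # nothing matched yet, move on in the text
        else:
            j = lps[j - 1]         # reuse the prefix that is already matched
    return -1


print(kmp_search("abxabcabcaby", "abcaby"))           # 6
print(kmp_search("adsgwadsxdsgwadsgz", "dsgwadsgz"))  # 9
```

In the second call the mismatch at x makes j fall back twice (to 2, then to 0) before i advances, and the pattern is then found starting at index 9.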
Questions
1. Longest Happy Prefix
2. Find the Index of the First Occurrence in a String