Advanced String Lecture

The document discusses string searching algorithms such as Rabin-Karp and KMP. The Rabin-Karp algorithm uses hashing to quickly filter out text positions that cannot match the pattern, computing a rolling hash over a moving window. The KMP algorithm uses a prefix table to skip character comparisons after a mismatch, reusing what is already known about the characters that have matched. It first builds the prefix table in a preprocessing step and then uses it to search for the pattern in the text.

Uploaded by

Yared Tegegn

String Searching Algorithms

How would you normally try to check if there is a substring that matches the
pattern?

String: “abcdeacdoe”

Pattern: “bcd”
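
One straightforward answer is to try every starting position and compare the window with the pattern. A minimal sketch in Python (the function name `naive_search` is an assumption made for illustration):

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the first index where pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        # Compare the window text[start:start+m] with the pattern.
        if text[start:start + m] == pattern:
            return start
    return -1


print(naive_search("abcdeacdoe", "bcd"))  # 1
```

This can take up to (n - m + 1) * m character comparisons, which is the cost the algorithms in this lecture try to avoid.
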
Rabin-Karp Algorithm
● AKA Karp–Rabin algorithm
● Uses hashing (more specifically Rolling Hash)
● Helps quickly filter out positions of the text that cannot match the pattern, and
then checks for a match at the remaining positions
What is Rolling hash?
● A rolling hash is a hash function where the input is hashed in a window that
moves through the input.
● Think of it as a wheel moving on an inclined plane
Steps followed in Rolling hash pattern matching
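1. Calculate the hash of the pattern
2. Calculate the hash of the first window of the text and compare it with the pattern hash
3. Slide the window by one character, update the hash, and repeat the comparison until we get to the end of the text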
Trivial Rolling Hash Drawbacks?
● What are the drawbacks of rolling hash?
Let’s consider the case

String: AABAABCABA
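
One way to see the drawback, assuming the "trivial" hash is simply the sum of the character values (A => 1, B => 2, ...), which the slides do not spell out: windows that contain the same letters in a different order get the same hash. The helper `additive_hash` and the pattern `ABA` are hypothetical, chosen only for illustration.

```python
def additive_hash(s: str) -> int:
    # Assumed trivial hash: sum of letter values with A => 1, ..., Z => 26.
    return sum(ord(ch) - ord("A") + 1 for ch in s)


text = "AABAABCABA"
pattern = "ABA"  # hypothetical pattern for illustration
target = additive_hash(pattern)

for start in range(len(text) - len(pattern) + 1):
    window = text[start:start + len(pattern)]
    if additive_hash(window) == target:
        # Equal hashes do not guarantee equal strings.
        kind = "match" if window == pattern else "false positive"
        print(start, window, kind)
```

Under this assumption `AAB`, `ABA` and `BAA` all hash to 4, so most of the hash hits still need a full character comparison.
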
Rabin–Karp string search algorithm
This is a simple rolling hash function that uses only multiplications and additions:

H = c1*a^0 + c2*a^1 + ... + ck*a^(k-1)

where a is a constant, and c1, c2,...,ck are the input characters

● We then return H(P) = H mod d

d is preferably a large prime number. Why?

Consider the string ABEDA

Character Values: A => 1, B => 2, …. Z => 26

Let’s choose the prime number 3 as the constant a

1st Window: ABE
A B E D A
1 * 3^0 + 2 * 3^1 + 5 * 3^2
1 + 6 + 45 = 52 hash value

2nd Window: BED

A B E D A

2 * 3^0 + 5 * 3^1 + 4 * 3^2

2 + 15 + 36 = 53 hash value
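
The two window hashes above can be reproduced with a short sketch. It assumes the first character of the window is weighted by 3^0, as in the worked example; `char_value` and `window_hash` are illustrative names, and the final `mod d` step is left out so the numbers stay identical to the slides.

```python
def char_value(ch: str) -> int:
    # A => 1, B => 2, ..., Z => 26
    return ord(ch) - ord("A") + 1


def window_hash(window: str, base: int = 3) -> int:
    # c1 * base^0 + c2 * base^1 + ... + ck * base^(k-1)
    return sum(char_value(ch) * base ** i for i, ch in enumerate(window))


print(window_hash("ABE"))  # 52
print(window_hash("BED"))  # 53
```

Recomputing the hash from scratch like this costs one multiplication per character for every window.
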

● There is a method of making this process of finding the hash more efficient!
Steps to follow in Rabin-Karp’s algorithm
1. Subtract the value of the character removed from the window from the old
hash
2. Divide the result by the number we picked
3. Add the value of the new character in the window multiplied by the number we
picked raised to the power (length of the pattern - 1)
● So in the previous example:
● 52 - (val(A)) = 51
● 51/(prime) = 17
● 17 + (val(D)*3^2) = 53
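
The three update steps can be sketched as a small function. This is a minimal illustration under the same assumptions as the example (base 3, A => 1 ... Z => 26, no `mod d`); the names `char_value` and `roll` are not from the slides.

```python
def char_value(ch: str) -> int:
    # A => 1, B => 2, ..., Z => 26
    return ord(ch) - ord("A") + 1


def roll(old_hash: int, removed: str, added: str, k: int, base: int = 3) -> int:
    h = old_hash - char_value(removed)        # step 1: drop the outgoing character
    h //= base                                # step 2: exact division, every power shifts down by one
    h += char_value(added) * base ** (k - 1)  # step 3: incoming character gets the highest power
    return h


h = 52                      # hash of "ABE" from the example
h = roll(h, "A", "D", k=3)  # window becomes "BED"
print(h)  # 53
h = roll(h, "B", "A", k=3)  # window becomes "EDA"
print(h)  # 26
```

Once the hash is reduced mod d, plain division no longer works. Implementations typically either multiply by a modular inverse or weight the characters the other way round so that the update only needs multiplication.
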
Practice:
String: “AABAACAADAABAABA”

Pattern = “AABA”

A A B A A C A A D A A B A A B A
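
A minimal sketch of the full search, which can be used to check the practice exercise. Note that it swaps in the more common variant in which the first character of the window carries the highest power, so the update needs only multiplication and works under mod d; this is not the weighting used in the worked example above. The base 256 and the modulus 1_000_000_007 are assumed values, and `rabin_karp` is an illustrative name.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the leftmost character in a window

    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Equal hashes only mean a possible match, so verify the characters.
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Remove the leftmost character, shift, then append the next one.
            t_hash = (t_hash - ord(text[start]) * high) % mod
            t_hash = (t_hash * base + ord(text[start + m])) % mod
    return matches


print(rabin_karp("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```
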
Time Complexity
● Best-case? Worst-case?
● When is worst-case time complexity achieved?
● How to reduce worst-case scenarios?
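
As a hedged illustration of the worst case: every window whose hash equals the pattern hash has to be verified character by character, so the cost approaches O(n * m) when all windows hit. The sketch below counts such hits; `count_verifications` is a hypothetical helper, and `mod=1` is a deliberately bad modulus used only to force collisions.

```python
def count_verifications(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    """Count windows whose hash equals the pattern hash (each one costs O(m) to verify)."""
    m = len(pattern)

    def h(s: str) -> int:
        value = 0
        for ch in s:
            value = (value * base + ord(ch)) % mod
        return value

    target = h(pattern)
    # Hashes are recomputed per window for clarity; a real search would roll them.
    return sum(1 for s in range(len(text) - m + 1) if h(text[s:s + m]) == target)


print(count_verifications("A" * 8, "A" * 4))            # 5: every window is a real match
print(count_verifications("ABCDEFGH", "ZZZZ", mod=1))   # 5: mod 1 makes every hash 0
```

A large prime modulus keeps accidental collisions rare, which is one reason d is preferably a large prime.
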
KMP Pattern Matching
- Knuth-Morris-Pratt Algorithm
- The basic idea behind KMP’s algorithm is: whenever we detect a mismatch
(after some matches), we already know some of the characters in the text of
the next window
- KMP algorithm was the first linear time complexity algorithm for string
matching.
● KMP algorithm is used to find a "Pattern" in a "Text". This algorithm compares
character by character from left to right. But whenever a mismatch occurs, it
uses a preprocessed table called the "Prefix Table" to skip character comparisons
while matching.
● Sometimes the prefix table is also known as the LPS Table. Here LPS stands for
"Longest proper Prefix which is also Suffix".
Steps for creating LPS Table (Preprocessing)
1. Define a one dimensional array with the size equal to the length of the
Pattern. (LPS[size])
2. Define variables i & j. Set j = 0, i = 1 and LPS[0] = 0.
3. Compare the characters at Pattern[i] and Pattern[j].
4. If both are matched then set LPS[i] = j+1 and increment both i & j values by
one. Go to Step 3.
5. If both are not matched then check the value of variable 'j'. If it is '0' then set
LPS[i] = 0 and increment 'i' value by one, if it is not '0' then set j = LPS[j-1]. Go
to Step 3.
6. Repeat above steps until all the values of LPS[] are filled.
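
The preprocessing can be sketched as follows. This sketch follows the convention of the worked example that comes next: j is the length of the prefix matched so far and i is the position whose LPS value is being filled. `build_lps` is an illustrative name.

```python
def build_lps(pattern: str) -> list[int]:
    """lps[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    lps = [0] * len(pattern)
    j = 0  # length of the prefix matched so far
    i = 1  # position whose LPS value is being computed
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            lps[i] = j + 1
            i += 1
            j += 1
        elif j == 0:
            lps[i] = 0
            i += 1
        else:
            # Fall back to the next shorter prefix that is also a suffix.
            j = lps[j - 1]
    return lps


print(build_lps("abcdabca"))  # [0, 0, 0, 0, 1, 2, 3, 1]
```
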
Example

String: abcdabca

Step 1 - Define a one dimensional array

a b c d a b c a
0 1 2 3 4 5 6 7
Step 2:

i, j
a b c d a b c a
0 1 2 3 4 5 6 7

● First one is always 0

Remaining Steps until i == len(pat):

j i
a b c d a b c a

0 1 2 3 4 5 6 7

0 0

● pat[i] != pat[j]
● i += 1
j i
a b c d a b c a
0 1 2 3 4 5 6 7

0 0 0
j i
a b c d a b c a
0 1 2 3 4 5 6 7

0 0 0 0
j i

a b c d a b c a
0 1 2 3 4 5 6 7

0 0 0 0 1

● Now pat[i] == pat[j]

● LPS[i] = j + 1
● j,i += 1
j i
a b c d a b c a
0 1 2 3 4 5 6 7

0 0 0 0 1 2
j i
a b c d a b c a
0 1 2 3 4 5 6 7

0 0 0 0 1 2 3
j i
a b c d a b c a

0 1 2 3 4 5 6 7

0 0 0 0 1 2 3 ?

● What happens in this case?

j (moved here) j i
a b c d a b c a

0 1 2 3 4 5 6 7

0 0 0 0 1 2 3 1

● As mentioned in step 5, pat[i] != pat[j] and j is not 0

● So in this case we check the LPS[j-1] value and j will go to this index
● Here LPS[j-1] = LPS[2] = 0, so j = 0. Since pat[i] == pat[j], LPS[i] = j + 1
● Therefore LPS[i] = 1
j i
a b c d a b c a

0 1 2 3 4 5 6 7

0 0 0 0 1 2 3 1

● Since i is now equal to size of pat, we stop

Practice making LPS
1. String = aabaabaaa
2. String = aaacaaaaac
● But this is only the first part (Pattern Preprocessing)
● Next the String Matching phase begins
String Matching Explained
1. Start comparing the first character in the pattern with the first character in the text
and keep advancing both until there is a mismatch
2. If there is a mismatch at pattern index j (with j > 0), then we check LPS[j-1], move j to
that value, and compare pattern[j] with the same character in the text
3. If the value is different from 0 this means there is a suffix that is also a prefix
hence we don’t need to compare starting from the first character in the
pattern. This is the brilliance of KMP algorithm. If j is already 0, we simply
move to the next character in the text.
4. We continue this process until we find a match or our index runs out and
conclude we don’t have a match.
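
A minimal sketch of the matching phase, assuming i indexes the text and j indexes the pattern as in the walkthrough below. The names `build_lps` and `kmp_search` are illustrative, and the LPS builder is written in a compact form so the block is self-contained.

```python
def build_lps(pattern: str) -> list[int]:
    lps = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j
    return lps


def kmp_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    lps = build_lps(pattern)
    j = 0  # index into pattern
    for i, ch in enumerate(text):  # i never moves backwards
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]  # reuse the prefix that is already known to match
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1
    return -1


print(kmp_search("abxabcabcaby", "abcaby"))  # 6
```

Because i only moves forward and j only falls back through the LPS table, the search runs in time linear in the length of the text plus the pattern.
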
Text: abxabcabcaby

Pattern: abcaby

LPS of the pattern:

a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0

● What happens here since there is a mismatch?

i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0

● We check LPS[j-1], which is LPS[1] = 0

● This means the next comparison is going to start from the 0th index in the pattern
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0

● Since pattern[j] != text[i] and j is already 0, i += 1

i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0

● pattern[j] != text[i]
● What do we do next?
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0

● LPS[j-1] = LPS[4] = 2. As a result j = 2.
● This is because “ab” is a prefix that is also a suffix. We don’t need to check from the
beginning.
● Since text[i] == pattern[jnew], both i and j move to the next index.
● What if pattern[jnew] was different from text[i]?
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0
i
a b x a b c a b c a b y

j
a b c a b y

0 1 2 3 4 5

0 0 0 1 2 0

● We have found a match

Practice:

String: adsgwadsxdsgwadsgz

Pattern: dsgwadsgz
Questions
1. Longest Happy Prefix
2. Find the Index of the First Occurrence in a String
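
For the first question, assuming a "happy prefix" means a non-empty prefix that is also a suffix (excluding the whole string), the answer falls out of the last LPS entry. A minimal sketch with the illustrative name `longest_happy_prefix`; the second question is the KMP search itself.

```python
def longest_happy_prefix(s: str) -> str:
    lps = [0] * len(s)
    j = 0
    for i in range(1, len(s)):
        while j > 0 and s[i] != s[j]:
            j = lps[j - 1]
        if s[i] == s[j]:
            j += 1
        lps[i] = j
    # The last entry is the length of the longest proper prefix that is also a suffix.
    return s[:lps[-1]] if s else ""


print(longest_happy_prefix("abcdabca"))  # a
print(longest_happy_prefix("level"))     # l
```
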
