Lecture#8 - String Matching Algorithm

Uploaded by

Pritom Das

STRING MATCHING

CSE-237 : ALGORITHM DESIGN AND ANALYSIS

BASIC

String Matching Algorithms

STRING MATCHING

• Problem : Find whether a pattern P of length m occurs within a text T of length n.

• Given a pattern P[1..m] and a text T[1..n] with m ≤ n, find all occurrences of P in T.
• Both P and T are strings over a finite alphabet Σ, i.e. P, T ∈ Σ*.

• Solution : P occurs in T with shift s (beginning at position s+1), where 0 ≤ s ≤ n-m, if
P[1]=T[s+1], P[2]=T[s+2], …, P[m]=T[s+m].
• If so, s is called a valid shift; otherwise, s is an invalid shift.
• Example : P = ABABDE, T = ABABABDEAA. P matches T[3..8], i.e. it begins at position 3 (shift s = 2).
• Note : one occurrence may begin within another one.
• P=abab, T=abcabababbc: P occurs at s=3 and s=5.

APPLICATIONS OF STRING MATCHING

• Remember :
• Text is the string that we are searching in.
• Pattern is the string that we are searching for.
• Shift is an offset into the text at which the pattern is aligned.

• Why do we need string matching?

• String matching is used in various applications like spell checkers, keyword
matching, spam filters, search engines, plagiarism detectors,
bioinformatics & DNA sequencing, database searching etc.

NOTATION AND TERMINOLOGY

• Prefix : w is a prefix of x if x = wy for some y ∈ Σ*. Denoted as w ⊏ x.

• Suffix : w is a suffix of x if x = yw for some y ∈ Σ*. Denoted as w ⊐ x.
• For example, we have ab ⊏ abcca and cca ⊐ abcca.

• Overlapping Suffix Lemma :

Suppose x, y and z are strings such that x ⊐ z and y ⊐ z. Then
a) if |x| < |y|, then x ⊐ y;
b) if |x| > |y|, then y ⊐ x;
c) if |x| = |y|, then x = y.
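
The prefix and suffix relations can be checked directly in code. The following is a small sketch in Python; the helper names `is_prefix` and `is_suffix` are illustrative and not part of the lecture.

```python
def is_prefix(w: str, x: str) -> bool:
    """True if x = w + y for some string y."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """True if x = y + w for some string y."""
    return x.endswith(w)


# The example from the slide.
print(is_prefix("ab", "abcca"))   # True
print(is_suffix("cca", "abcca"))  # True

# Lemma, case (a): two suffixes of the same z are nested.
z = "abcca"
x, y = "ca", "cca"                # both are suffixes of z, and |x| < |y|
print(is_suffix(x, z) and is_suffix(y, z) and is_suffix(x, y))  # True
```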

BASIC CLASSIFICATION

• Naive Algorithm : performs a brute-force comparison of each character in the pattern at each possible placement of the pattern in the text.
• It is O(mn) in the worst case.

• Rabin-Karp Algorithm : compares hash values of the strings rather than the strings themselves.
• Performs well in practice and generalizes to other problems, such as 2D pattern matching.

• Knuth-Morris-Pratt Algorithm : a modification of the brute-force algorithm that runs in O(m+n) in the worst case.
• It improves the running time by taking advantage of the prefix function.

NAÏVE ALGORITHM

NAÏVE STRING MATCHING

• Naïve (Brute Force) approach : The naive algorithm finds all valid shifts using a loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n - m + 1 possible values of s (s = 0, 1, …, n-m).
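
The loop described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own pseudocode; the function name `naive_match` is an assumption. Python strings are 0-indexed, so the index `s` in the code coincides with the shift s of the slides.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every valid shift s of pattern in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # the n - m + 1 candidate shifts
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            shifts.append(s)
    return shifts


print(naive_match("abcabababbc", "abab"))   # [3, 5]
print(naive_match("ABABABDEAA", "ABABDE"))  # [2]
```

The first call reproduces the overlapping occurrences at s=3 and s=5 from the problem statement.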

NAÏVE STRING MATCHING ...

• Problem with naïve algorithm : Suppose P = ABABC, T = CABABABCD

• Whenever a mismatch occurs after several characters have matched, the comparison restarts in T at the character immediately after the one where the current attempt began, so characters of T that were already examined are compared again.
• Worst-case complexity : O(m(n-m+1)). Average-case performance is surprisingly good provided the strings are neither long nor contain many repeated letters.

• Can we do better, i.e. not go back in T?
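
To make the cost concrete, the sketch below counts the character comparisons made by the brute-force approach. The function name `count_comparisons` and the second test input are assumptions chosen for illustration; the first input is the P and T of this slide.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by brute-force matching."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break                   # mismatch: give up on this shift
    return comparisons


print(count_comparisons("CABABABCD", "ABABC"))  # 13
print(count_comparisons("a" * 10, "aaab"))      # 28 = m * (n - m + 1)
```

In the second call every one of the 7 shifts matches three characters before failing on the fourth, which is the worst case m(n-m+1).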

RABIN-KARP ALGORITHM

RABIN-KARP ALGORITHM

• The Rabin-Karp string matching algorithm is the naive approach augmented with a powerful programming technique: a hash function.
• Algorithm
• Calculate the hash value of the pattern P.
• Calculate the hash value of each m-character substring of the text T, one shift at a time.
• If the two hash values are equal, compare the pattern with that m-character sequence character by character (brute force, O(m) time) to confirm the match.
• If the hash values are unequal, skip the comparison and calculate the hash value for the next m-character sequence.

PSEUDOCODE

EXAMPLE

CALCULATING HASH VALUE

• Let’s associate one integer with every letter of the alphabet.

• Hence we can say ‘A’ corresponds to 1, ‘B’ corresponds to 2, ‘C’ corresponds to 3, …
• Similarly, every other letter corresponds to its index in the alphabet.

• The hash value of the string “CAT” will be 3*100 + 1*10 + 20 = 330

WHAT IF TWO VALUES COLLIDE

• If the hash value matches for two strings then it is called a ‘hit’.
• It is possible for two or more different strings to produce the same hash value.
String 1: “CBJ” hash code = 3*100 + 2*10 + 10 = 330
String 2: “CAT” hash code = 3*100 + 1*10 + 20 = 330
• Hence it is necessary to check whether a hit is a genuine match or not.
• Any hit has to be tested to verify that it is not spurious, i.e. that P[1..m] = T[s+1..s+m].
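
The collision above can be reproduced with a few lines of Python. This is a sketch of the letter-index hash used on these slides (A=1, B=2, …, read as base-10 digits); the function names `letter_value` and `simple_hash` are illustrative.

```python
def letter_value(ch: str) -> int:
    """'A' -> 1, 'B' -> 2, ..., 'Z' -> 26."""
    return ord(ch) - ord("A") + 1


def simple_hash(s: str, base: int = 10) -> int:
    """Treat the letter values as digits of a base-10 number."""
    value = 0
    for ch in s:
        value = value * base + letter_value(ch)
    return value


print(simple_hash("CAT"))  # 330
print(simple_hash("CBJ"))  # 330
```

Both strings hash to 330 although they differ, which is why every hit must be verified character by character.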

MATHEMATICAL RESOLUTION

• Let’s treat an m-character sequence as an m-digit number in base b. Then the text subsequence t[i .. i+m-1] is mapped to the number
x(i) = t[i]*b^(m-1) + t[i+1]*b^(m-2) + … + t[i+m-1]

• If m is large, this number becomes very large, so we reduce it modulo a prime number, say q, and use x(i) mod q as the hash value.
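
A sketch of this mapping in Python is given below. It evaluates the base-b number with Horner's rule and reduces modulo q at every step so that intermediate values stay small. The choices b = 256, q = 101, the use of `ord` as the digit value, and the name `window_hash` are assumptions made for illustration. The index `i` is 0-based here.

```python
def window_hash(text: str, i: int, m: int, b: int = 256, q: int = 101) -> int:
    """Hash of text[i:i+m] read as an m-digit base-b number, reduced mod q."""
    h = 0
    for ch in text[i:i + m]:
        h = (h * b + ord(ch)) % q       # Horner's rule, reduced at each step
    return h


# Same value as evaluating the full polynomial first and reducing afterwards.
direct = (ord("b") * 256**2 + ord("c") * 256 + ord("d")) % 101
print(window_hash("abcd", 1, 3) == direct)  # True
```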

REHASHING AND COMPLEXITY

• Rehashing : The hash at the next shift must be efficiently computable (O(1)) from the current hash value and the next character in the text. [ To rehash, we take off the most significant digit and append the new least significant digit. ]
hash(t[s+1 .. s+m]) = (d*(hash(t[s .. s+m-1]) - t[s]*h) + t[s+m]) mod q
• where
• hash(t[s .. s+m-1]) : hash value at shift s
• hash(t[s+1 .. s+m]) : hash value at the next shift (shift s+1)
• d : number of characters in the alphabet (the base b above), q : a prime number, and h = d^(m-1) mod q
• Time complexity : Best-case = O(n+m), Worst-case = O(nm)
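
Putting the hash, the verification of hits and the rehashing rule together gives the following minimal Python sketch of Rabin-Karp. It is one possible rendering, not the lecture's pseudocode. The defaults d = 256 and q = 101, the use of `ord` as the digit value, and the name `rabin_karp` are assumptions.

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return every valid shift of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                # weight of the most significant digit
    p_hash = t_hash = 0
    for j in range(m):                  # hash the pattern and the first window
        p_hash = (d * p_hash + ord(pattern[j])) % q
        t_hash = (d * t_hash + ord(text[j])) % q
    shifts = []
    for s in range(n - m + 1):
        # A hit: verify character by character to rule out a spurious match.
        if p_hash == t_hash and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:                   # roll the window one position right
            t_hash = (d * (t_hash - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("abcabababbc", "abab"))        # [3, 5]
print(rabin_karp("abcabababbc", "abab", q=3))   # [3, 5]
```

The second call uses a very small modulus, so spurious hits are frequent, yet the result is unchanged because every hit is verified. Python's `%` always returns a non-negative result for a positive modulus, so no extra correction is needed after the subtraction.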
KNUTH-MORRIS-PRATT ALGORITHM

MOTIVATION OF KMP ALGORITHM

• The Knuth-Morris-Pratt algorithm compares the pattern to the text from left to right, but shifts the pattern more intelligently than the brute-force algorithm.

• Idea : after some number of characters (say q) of P have matched T and then a mismatch occurs, the q matched characters allow us to determine immediately that certain shifts are invalid. So we go directly to the next shift that is potentially valid.
• The matched characters in T are in fact a prefix of P, so P alone is enough to determine whether a shift is invalid.

MOTIVATION OF KMP ALGORITHM ...

• Question : When a mismatch occurs, what is the most we can shift the pattern so as to avoid redundant comparisons?

• Answer : the longest proper prefix of P[1..j-1] that is also a suffix of P[1..j-1], where P[j] is the mismatched character.
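
The quantity in the answer can be computed straight from its definition. The sketch below is a deliberately simple quadratic-time version meant only to illustrate the definition; the name `longest_border` and the sample strings are assumptions.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s."""
    for k in range(len(s) - 1, 0, -1):  # try the longest candidates first
        if s[:k] == s[-k:]:
            return k
    return 0


print(longest_border("pappa"))  # 2  ("pa")
print(longest_border("ababa"))  # 3  ("aba")
print(longest_border("abc"))    # 0
```

If q characters have matched and the result is k, the pattern can be moved q - k positions to the right without missing an occurrence.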

COMPONENTS OF KMP ALGORITHM

• The prefix function, Π :

The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern ‘p’. In other words, it avoids backtracking on the text ‘T’.

• The KMP Matcher :

With text ‘T’, pattern ‘p’ and prefix function ‘Π’ as inputs, it finds the occurrence of ‘p’ in ‘T’ and returns the number of shifts of ‘p’ after which the occurrence is found.

FUNCTIONALITY OF Π

• Naive :
Step#1:
p: pappar
t: pappappapparrassanuaragh
Step#2:
p:  pappar
t: pappappapparrassanuaragh

• Smarter technique :
• We can slide the pattern ahead so that the longest PREFIX of p that we have already processed matches the longest SUFFIX of t that we have already matched.
• p:    pappar
• t: pappappapparrassanuaragh

LPS (LONGEST PREFIX SUFFIX) CALCULATION

• Initialization

• Step#1

• Step#2

LPS CALCULATION ...

• Step#3

• Step#4

LPS CALCULATION ...

• Step#5

• Step#6

LPS CALCULATION ...

• After iterating 6 times, the prefix function computation is complete:

• The running time of the prefix function is O(m).

PSEUDOCODE FOR THE PREFIX FUNCTION, Π
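
One common way to write the prefix function is sketched below in Python. It is a standard linear-time formulation offered as an illustration, not a transcription of this slide. The code is 0-indexed, so `pi[q]` refers to the prefix of length q+1. The name `compute_prefix_function` is an assumption, and the second sample pattern is a hypothetical 7-character string used only to show the output format.

```python
def compute_prefix_function(p: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of p[:q+1] that is also its suffix."""
    m = len(p)
    pi = [0] * m
    k = 0                               # length of the currently matched prefix
    for q in range(1, m):               # m - 1 iterations
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]               # fall back to the next shorter candidate
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("pappar"))   # [0, 0, 1, 1, 2, 0]
print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

The variable `k` increases by at most one per iteration and every pass of the inner loop decreases it, which is why the total work is O(m).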

PSEUDOCODE FOR KMP MATCHER
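
A possible Python rendering of the matcher is sketched below, with the prefix function computed inline so that the block is self-contained. It is an illustration of the standard algorithm under 0-based indexing, not the slide's own pseudocode. The name `kmp_search` is an assumption, and the first sample text and pattern are hypothetical strings chosen to have n = 15 and m = 7.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every valid shift of pattern in text without backing up in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    # Prefix function of the pattern (0-based).
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    shifts = []
    q = 0                               # pattern characters matched so far
    for i in range(n):                  # i only ever moves forward
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]               # reuse what is known about the pattern
        if pattern[q] == text[i]:
            q += 1
        if q == m:                      # full match ending at text[i]
            shifts.append(i - m + 1)
            q = pi[q - 1]               # keep going to find overlapping matches
    return shifts


print(kmp_search("bacbabababacaab", "ababaca"))  # [6]
print(kmp_search("abcabababbc", "abab"))         # [3, 5]
```

In the first call the match ends at the 13th character of the text, so the shift is 13 - 7 = 6.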

KMP MATCHER

• Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘T’.

• For ‘p’ the prefix function, Π will be :

KMP MATCHER ...

• Initialization : n = size of T = 15; m = size of p = 7

• Step#1 : i = 1, q = 0;
comparing p[1] with T[1]

• Step#2 : i = 2, q = 0;
comparing p[1] with T[2]

KMP MATCHER ...

• Step#3 : i = 3, q = 1;
comparing p[2] with T[3]

• Step#4 : i = 4, q = 0;
comparing p[1] with T[4]

KMP MATCHER ...

• Step#5 : i = 5, q = 0;
comparing p[1] with T[5]

• Step#6 : i = 6, q = 1;
comparing p[2] with T[6]

KMP MATCHER ...

• Step#7 : i = 7, q = 2;
comparing p[3] with T[7]

• Step#8 : i = 8, q = 3;
comparing p[4] with T[8]

KMP MATCHER ...

• Step#9 : i = 9, q = 4;
comparing p[5] with T[9]

• Step#10 : i = 10, q = 5;
comparing p[6] with T[10]

KMP MATCHER ...

• Step#11 : i = 11, q = 4;
comparing p[5] with T[11]

• Step#12 : i = 12, q = 5;
comparing p[6] with T[12]

KMP MATCHER ...

• Step#13 : i = 13, q = 6;
comparing p[7] with T[13]

• Pattern ‘p’ has been found to occur completely in text ‘T’. The total number of shifts that took place before the match was found is i - m = 13 - 7 = 6.
• The running time of the KMP-Matcher function is O(n).

PERFORMANCE

• Advantages :
• The running time of the KMP algorithm is optimal (O(m + n)), which is very fast.
• The algorithm never needs to move backwards in the input text T. This makes the algorithm good for processing very large files.

• Disadvantages :
• It does not work as well as the size of the alphabet increases, because the chance of a mismatch becomes higher.

COMPUTATIONAL GEOMETRY
Explore it on NEXT DAY
