
CS213/293 Data Structure and Algorithms 2023

Lecture 10: Pattern matching

Instructor: Ashutosh Gupta

IITB India

Compile date: 2023-09-03

cbna CS213/293 Data Structure and Algorithms 2023 Instructor: Ashutosh Gupta IITB India 1
Topic 10.1

Pattern matching problem

Pattern matching

Definition 10.1
In a pattern-matching problem, we need to find the position of all occurrences of a pattern string
P in a string T .

Usage:
▶ Text editor
▶ DNA sequencing

Example: Naïve approach for pattern matching
Example 10.1
Consider the following text T and pattern P. We try to match the pattern in every position.

T  x y z x y x x y x y p x
P  x y x y
     x y x y
       x y x y
         x y x y
           x y x y
             x y x y
               x y x y

Running time complexity is O(|T||P|).
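
The naïve approach above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function name `naive_match` is a hypothetical choice. The nested loops are what give the O(|T||P|) bound.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    found = []
    # Try each alignment of the pattern against the text.
    for start in range(len(text) - len(pattern) + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < len(pattern) and text[start + j] == pattern[j]:
            j += 1
        if j == len(pattern):
            found.append(start)
    return found


print(naive_match("xyzxyxxyxypx", "xyxy"))  # prints [6]
```

On the text and pattern of Example 10.1, the only occurrence starts at position 6 (counting from 0).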
Wasteful attempts of matching.
       i
T  x y z x y x x y x y p x
P  x y x y
     x y x y
       x y x y

Should we have tried to match the second and third positions?

No.

Let us suppose we failed to match position i of T and position 2 of P.


▶ We know that T[i − 1] = y. Therefore, there is no match starting at i − 1. (Why?)

▶ We know that T[i] ≠ x. Therefore, there is no match starting at i. (Why?)


Shifting the pattern
Let us suppose at position i of T and j of P the matching fails.

Let us suppose we want to resume the search by only updating j.

If we assign j some value k, we are shifting the pattern forward by j − k.


Exercise 10.1
What is the meaning of k = j − 1, k = 0, or k = −1?
What is a good value of k?
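We know T[i − j : i − 1] = P[0 : j − 1] and T[i] ≠ P[j].

[Figure: P aligned under T with P[j] below T[i]; a shifted copy of P is placed so that its prefix P[0 : k − 1] lies below P[j − k : j − 1] and its position k lies below position j.]

We must have P[0 : k − 1] = P[j − k : j − 1] and P[j] ≠ P[k]. (Why?)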


Exercise 10.2
Should we choose the largest k or smallest k?
The largest k implies the minimum shift

We choose the largest k such that

P[0 : k − 1] = P[j − k : j − 1] and P[j] ≠ P[k].

k only depends on P and j.

Since P is typically small, we may pre-compute array h such that h[j] = k.


Example 10.2
j   0   1   2   3   4
P   x   y   x   y
h  −1   0  −1   0   2

j   0   1   2   3   4
P   x   y   x   z
h  −1   0  −1   1   0

We can compute h in O(|P|) time. We will discuss this later.
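
The definition of h can be checked directly by brute force. The sketch below is only a literal reading of the definition, not the O(|P|) method. The name `h_by_definition` is hypothetical, and two conventions are assumptions: at j = |P| the condition P[j] ≠ P[k] is dropped (there is no P[|P|]), and h[j] = −1 when no k ≥ 0 qualifies.

```python
def h_by_definition(pattern: str) -> list[int]:
    """Brute-force h: for each j, the largest k < j satisfying the definition."""
    n = len(pattern)
    h = []
    for j in range(n + 1):
        best = -1  # assumed value when no k >= 0 qualifies
        for k in range(j - 1, -1, -1):  # try the largest k first
            # Inclusive P[0 : k-1] and P[j-k : j-1] are pattern[:k] and pattern[j-k:j].
            border = pattern[:k] == pattern[j - k:j]
            # At j = |P| there is no P[j], so only the border condition applies.
            differs = j == n or pattern[j] != pattern[k]
            if border and differs:
                best = k
                break
        h.append(best)
    return h


print(h_by_definition("xyxy"))  # prints [-1, 0, -1, 0, 2]
print(h_by_definition("xyxz"))  # prints [-1, 0, -1, 1, 0]
```

Both outputs agree with the tables in Example 10.2. This version takes cubic time in the worst case, so it is useful only for checking small patterns.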

Knuth–Morris–Pratt algorithm
Algorithm 10.1: KMP(string T, string P)
    i := 0; j := 0; found := ∅;
    h := KMPTable(P);
    while i < |T| do
        if P[j] = T[i] then
            i := i + 1; j := j + 1;
            if j = |P| then
                found.insert(i − j);
                j := h[j];
        else
            j := h[j];
            if j < 0 then
                i := i + 1; j := j + 1;
    return found

Running time complexity:
▶ The if branch executes at most |T| times.
▶ How do we bound the number of iterations where the else branch does not increment i?
    1. The else branch reduces j.
    2. Since j ≥ 0 at the loop head, the number of reductions of j ≤ the number of increments of j.
    3. i and j are incremented together.
    4. The number of reductions of j ≤ the number of increments of i.
    5. The number of reductions of j ≤ |T|.
▶ O(|T|) algorithm
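
A possible Python transcription of Algorithm 10.1 is sketched below. To keep the block self-contained, the table h is passed in by the caller instead of being computed by KMPTable; the function name `kmp_search` and this parameter are assumptions of the sketch, and a non-empty pattern is assumed.

```python
def kmp_search(text: str, pattern: str, h: list[int]) -> list[int]:
    """Follow Algorithm 10.1, with the table h supplied by the caller."""
    found = []
    i = j = 0
    while i < len(text):
        if pattern[j] == text[i]:
            i += 1
            j += 1
            if j == len(pattern):
                found.append(i - j)  # a full match starts at i - j
                j = h[j]             # keep the longest reusable prefix
        else:
            j = h[j]                 # shift the pattern; i stays put
            if j < 0:
                i += 1               # no prefix can be reused: skip T[i]
                j += 1
    return found


h_xyxy = [-1, 0, -1, 0, 2]  # table for "xyxy" from Example 10.2
print(kmp_search("xyzxyxxyxypx", "xyxy", h_xyxy))  # prints [6]
```

Note that i never decreases, which is the property the running-time argument relies on.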
Topic 10.2

How to compute array h?

Recall: the definition of h

For a pattern P, h[j] is the largest k such that

P[0 : k − 1] = P[j − k : j − 1] and P[j] ≠ P[k].

We use a KMP-like algorithm again to compute h.

Self-matching: We use KMP again for computing h

We match P against a shifted copy of itself. For the largest j such that P[i − j : i − 1] = P[0 : j − 1] and P[i] ≠ P[j], we assign h[i] := j.

Self-matching: Moving the pattern forward
After the mismatch, we need to move the pattern forward as little as possible.

We must have computed h for earlier indexes. Therefore, j := h[j].


Exercise 10.3
Why would the value of h[j] be available?
Self-matching: if no disagreement
Let us consider the case when matching continues. How should we assign h[i]?

Assigning h[i] := j may not be efficient.

Since P[0 : j] = P[i − j : i], if the text ending at the current position fails to match P[0 : i] at its last character, it will also fail to match P[0 : j].

We would then jump again to h[j]. We should directly assign h[i] := h[j].


Computing h array

Algorithm 10.2: KMPTable(string P)
    i := 1; j := 0; h[0] := −1;
    while i < |P| do
        if P[j] ≠ P[i] then
            h[i] := j;
            while j ≥ 0 and P[j] ≠ P[i] do
                j := h[j]; // Moving forward the pattern in minimum steps as in KMP
        else
            h[i] := h[j];
        i := i + 1; j := j + 1;
    h[|P|] := j;
    return h
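
Algorithm 10.2 can be transcribed into Python roughly as follows. This is a sketch under the assumption that the pattern is non-empty; the name `kmp_table` is a hypothetical choice.

```python
def kmp_table(pattern: str) -> list[int]:
    """Follow Algorithm 10.2; returns h with len(pattern) + 1 entries."""
    n = len(pattern)
    h = [-1] * (n + 1)  # h[0] = -1; the other entries are overwritten below
    i, j = 1, 0
    while i < n:
        if pattern[j] != pattern[i]:
            h[i] = j
            # Shift the pattern against itself, as in the search loop.
            while j >= 0 and pattern[j] != pattern[i]:
                j = h[j]
        else:
            # A mismatch at i would repeat at j, so reuse h[j] directly.
            h[i] = h[j]
        i += 1
        j += 1
    h[n] = j
    return h


print(kmp_table("xyxy"))  # prints [-1, 0, -1, 0, 2]
print(kmp_table("xyxz"))  # prints [-1, 0, -1, 1, 0]
```

The two printed tables match Example 10.2. As in the search loop, i only moves forward and every reduction of j is paid for by an earlier increment, which gives the O(|P|) bound.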

Topic 10.3

Problem

Exercise: compute h

Exercise 10.4
Compute array h for pattern "babbaabba".

Exercise: version of KMPTable

Exercise 10.5
Is the following version of KMPTable correct?

Algorithm 10.3: KMPTableV2(string P)
    i := 1; j := 0; h[0] := −1;
    while i < |P| do
        h[i] := j;
        while j ≥ 0 and P[j] ≠ P[i] do
            j := h[j]; // Moving forward the pattern in minimum steps as in KMP
        i := i + 1; j := j + 1;
    h[|P|] := j;
    return h

Exercise: compute h(i)

Exercise 10.6
Suppose that there is a letter z in P of length n such that it occurs in only one place, say k, which
is given in advance. Can we optimize the computation of h?

End of Lecture 10
