Lec 10 KMP
cbna CS213/293 Data Structure and Algorithms 2023 Instructor: Ashutosh Gupta IITB India 1
Topic 10.1
Pattern matching
Definition 10.1
In a pattern-matching problem, we need to find the positions of all occurrences of a pattern string P in a string T.
Usage:
▶ Text editor
▶ DNA sequencing
Example: Naïve approach for pattern matching
Example 10.1
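Consider the following text T and pattern P. We try to match the pattern at every position.
T = x y z x y x x y x y p x
P = x y x y
[Figure: P placed under T at successive shifts and compared character by character at each shift.]
The only full match starts at position 6 (counting from 0).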
The running time complexity is O(|T| · |P|).
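A minimal Python sketch of the naïve approach, assuming 0-indexed positions (the name naive_match is hypothetical, not from the slides). It tries every shift s and compares P with T[s .. s + |P| − 1] character by character; in the worst case each of the roughly |T| shifts costs up to |P| comparisons, which is where the O(|T| · |P|) bound comes from.

```python
def naive_match(t: str, p: str) -> list:
    """Return all 0-indexed shifts s where p occurs in t (naive method)."""
    n, m = len(t), len(p)
    found = []
    # Try every shift at which p still fits inside t.
    for s in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or a full match.
        while j < m and t[s + j] == p[j]:
            j += 1
        if j == m:
            found.append(s)
    return found
```

For the example above, naive_match("xyzxyxxyxypx", "xyxy") returns [6]. Note that after each mismatch the sketch restarts at the next shift and re-reads characters of T it has already seen; that repeated work is what KMP avoids.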
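Wasteful matching attempts
[Figure: T = x y z x y x x y x y p x with P = x y x y tried at consecutive shifts near position i; several of these attempts cannot succeed.]
[Figure: after P[0 : j − 1] has matched T, P can be moved forward so that its prefix P[0 : k − 1] lines up with P[j − k : j − 1].]
Example h arrays: −1 0 −1 0 2 and −1 0 −1 1 0.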
Knuth–Morris–Pratt algorithm
Algorithm 10.1: KMP(string T, string P)
  i := 0; j := 0; found := ∅;
  h := KMPTable(P);
  while i < |T| do
    if P[j] = T[i] then
      i := i + 1; j := j + 1;
      if j = |P| then
        found.insert(i − j);
        j := h[j];
    else
      j := h[j];
      if j < 0 then
        i := i + 1; j := j + 1;
  return found

Running time complexity:
▶ The if branch executes at most |T| times, since each execution increments i.
▶ How do we bound the number of iterations where the else branch does not increment i?
  1. The else branch reduces j (since h[j] < j).
  2. Since j ≥ 0 at the loop head, the no. of reductions of j ≤ the no. of increments of j.
  3. i and j are incremented together.
  4. No. of reductions of j ≤ no. of increments of i.
  5. No. of reductions of j ≤ |T|.
▶ Hence the search is an O(|T|) algorithm (not counting the cost of KMPTable(P)).
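The following is a sketch of a Python transcription of Algorithm 10.1; the function names kmp_search and kmp_table_from_definition are hypothetical. The slides' definition of h did not survive extraction, so the table here is built from an assumed definition that reproduces the example array −1 0 −1 0 2 for P = x y x y: for j < |P|, h[j] is the largest k < j with P[0 : k − 1] = P[j − k : j − 1] and P[k] ≠ P[j] (or −1 if there is none), and h[|P|] is the length of the longest proper prefix of P that is also a suffix. Building the table this way is deliberately brute force (about O(|P|^3)); only the search loop carries the O(|T|) bound argued above.

```python
def kmp_table_from_definition(p: str) -> list:
    """Brute-force h under the ASSUMED definition stated in the lead-in."""
    m = len(p)
    h = []
    for j in range(m):
        best = -1
        # Try the longest candidate k first; keep it only if P[k] != P[j].
        for k in range(j - 1, -1, -1):
            if p[:k] == p[j - k:j] and p[k] != p[j]:
                best = k
                break
        h.append(best)
    # After a full match: longest proper prefix of p that is also a suffix.
    h.append(max(k for k in range(m) if p[:k] == p[m - k:m]))
    return h


def kmp_search(t: str, p: str) -> list:
    """Algorithm 10.1: return all 0-indexed positions where p occurs in t."""
    if not p:
        raise ValueError("pattern must be non-empty")  # assumption of this sketch
    h = kmp_table_from_definition(p)
    found = []
    i = j = 0
    while i < len(t):
        if p[j] == t[i]:
            i += 1
            j += 1
            if j == len(p):          # full match ending just before t[i]
                found.append(i - j)
                j = h[j]             # continue, allowing overlapping matches
        else:
            j = h[j]                 # move the pattern forward, keep i
            if j < 0:                # no usable prefix: advance past t[i]
                i += 1
                j += 1
    return found
```

For instance, kmp_search("xyzxyxxyxypx", "xyxy") returns [6], and kmp_search("aaaa", "aa") returns [0, 1, 2], showing that resetting j to h[|P|] after a match keeps overlapping occurrences.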
Topic 10.2
Recall: the definition of h
Self-matching: We use KMP again for computing h
[Figure: P aligned against a shifted copy of itself, with position j marked.]
We assign h[i] = j.
Self-matching: Moving the pattern forward
After the mismatch, we need to move the pattern forward as little as possible.
[Figure: two alignments of P, before and after moving it forward, with position j marked in each.]
If a suffix of the matched part of T does not match P[0 : i], then it does not match P[0 : j] either.
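As a sketch of self-matching, here is one standard O(|P|) way to build the table by running KMP-style matching of P against itself (Knuth's variant of the table construction). The name kmp_table is hypothetical, and we assume it follows the same definition of h as the slides; it does reproduce the example arrays. Index i scans P as if it were the text, while k is the position reached in a second copy of P; on a mismatch, k falls back through h exactly as j does in Algorithm 10.1.

```python
def kmp_table(p: str) -> list:
    """Compute h (length |p| + 1) by matching p against itself."""
    m = len(p)
    h = [0] * (m + 1)
    h[0] = -1
    k = 0  # position reached in the second copy of p
    for i in range(1, m):
        if p[i] == p[k]:
            # Falling back to k would hit the same character, so reuse h[k].
            h[i] = h[k]
        else:
            h[i] = k
            # Move the shifted copy forward as little as possible.
            while k >= 0 and p[i] != p[k]:
                k = h[k]
        k += 1
    h[m] = k  # longest proper prefix of p that is also a suffix
    return h
```

With these assumptions, kmp_table("xyxy") gives [-1, 0, -1, 0, 2] and kmp_table("xyxz") gives [-1, 0, -1, 1, 0]. The while loop is bounded by the same amortized argument as the search: k decreases at most as often as it is incremented, so the total work is O(|P|).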
Topic 10.3
Problem
Exercise: compute h
Exercise 10.4
Compute the array h for the pattern "babbaabba".
Exercise: version of KMPTable
Exercise 10.5
Is the following version of KMPTable correct?
Exercise: compute h[i]
Exercise 10.6
Suppose that a letter z occurs in P (of length n) at exactly one position, say k, which is given in advance. Can we optimize the computation of h?
End of Lecture 10