KMP Algorithm: Engineerpro - K01
KMP Algorithm
1. Introduction
● The problem: find every occurrence of the string pattern inside the string haystack.
○ The inputs don't have to be strings; they can be any arrays of data, such as arrays of integers, bytes, etc.
● Definitions:
○ n: length of pattern
○ m: length of haystack
● Brute-force algorithm: for every position of haystack, check whether pattern occurs starting at that position.
○ Time complexity: O(n * m)
○ Why is it inefficient?
■ We have to start from the beginning of pattern for every position of haystack.
■ Can we skip some section of pattern that we already know matched?
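The brute-force idea above can be sketched as follows. The function name `bruteForceSearch` is a hypothetical helper, not something from the slides, and returning no matches for an empty pattern is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns every start index where pattern occurs in haystack.
std::vector<int> bruteForceSearch(const std::string& haystack, const std::string& pattern) {
    std::vector<int> result;
    int m = haystack.size();
    int n = pattern.size();
    if (n == 0 || n > m) return result;  // assumption: empty pattern yields no matches
    for (int start = 0; start + n <= m; start++) {
        int j = 0;
        // Every attempt restarts at pattern[0], even when earlier comparisons
        // already revealed what the next haystack characters are.
        while (j < n && haystack[start + j] == pattern[j]) j++;
        if (j == n) result.push_back(start);
    }
    return result;
}
```

For example, `bruteForceSearch("abababa", "aba")` reports the overlapping matches at 0, 2 and 4, at the cost of up to n comparisons per start position.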
● KMP table (partial match table): for a string s, the KMP table kmpTable is an array of integers with:
kmpTable[0] = 0
kmpTable[i] = max(v | 0 ≤ v ≤ i and s[0: v] == s[i - v + 1: i + 1]) for all i > 0
○ In words: kmpTable[i] is the length of the longest proper prefix of s[0: i + 1] that is also a suffix of it.
● When matching pattern against haystack, if a character mismatches, we can use pattern's KMP table to minimize the number of characters we have to backtrack.
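● Algorithm to calculate kmpTable:
#include <string>
#include <vector>
std::vector<int> calculateKMPTable(const std::string& s) {
    std::vector<int> kmpTable(s.size(), 0);  // kmpTable[0] = 0
    for (int i = 1, j = 0; i < (int)s.size(); i++) {
        while (j > 0 && s[i] != s[j]) j = kmpTable[j - 1];  // fall back to a shorter matching prefix
        if (s[i] == s[j]) kmpTable[i] = ++j;
    }
    return kmpTable;
}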
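To see what the table contains, it can help to compute it straight from the definition with a deliberately slow sketch. This assumes the convention that kmpSearch() relies on when it reads `kmpTable[j - 1]`: kmpTable[i] is the length of the longest proper prefix of s[0..i] that is also a suffix of it, with kmpTable[0] = 0. The name `naiveKMPTable` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical reference implementation, cubic in the worst case.
// Useful only for checking the fast algorithm on small inputs.
std::vector<int> naiveKMPTable(const std::string& s) {
    int n = s.size();
    std::vector<int> table(n, 0);
    for (int i = 1; i < n; i++) {
        // Try the longest candidate first; v <= i keeps the prefix proper.
        for (int v = i; v > 0; v--) {
            // Compare s[0 .. v-1] with s[i-v+1 .. i].
            if (s.compare(0, v, s, i - v + 1, v) == 0) {
                table[i] = v;
                break;
            }
        }
    }
    return table;
}
```

For s = "aabaaab" this yields 0, 1, 0, 1, 2, 2, 3: for instance, the last entry is 3 because "aab" is both a prefix and a suffix of the whole string.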
● Evaluating time complexity of calculateKMPTable():
○ Outer loop: i goes from 1 to n - 1.
○ Inner loop:
■ j gets increased by at most one per outer iteration → j can go up to n - 1.
■ j may get decreased several times in a row, but can never go below 0 → in total it cannot be decreased more than n - 1 times.
→ At most 2n - 2 addition/subtraction operations on j, equivalent to O(n) time complexity.
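The counting argument above can be checked empirically with an instrumented copy of the table construction. This is only a sketch; `countTableOps` is a hypothetical name, and the loop body is assumed to be the standard prefix-table construction.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical instrumented version: returns
// {number of times j was increased, number of fallback steps}.
std::pair<int, int> countTableOps(const std::string& s) {
    int n = s.size();
    std::vector<int> table(n, 0);
    int increases = 0;
    int fallbacks = 0;
    for (int i = 1, j = 0; i < n; i++) {
        while (j > 0 && s[i] != s[j]) {
            j = table[j - 1];  // strictly decreases j
            fallbacks++;
        }
        if (s[i] == s[j]) {
            table[i] = ++j;
            increases++;
        }
    }
    return {increases, fallbacks};
}
```

For "aaaab" (n = 5) this counts 3 increases and 3 fallback steps, 6 operations in total, within the 2n - 2 = 8 bound. The fallbacks can never outnumber the increases, because each fallback lowers j by at least one and j never drops below 0.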
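● Algorithm to find every occurrence of pattern in haystack:
std::vector<int> kmpSearch(const std::string& haystack, const std::string& pattern) {
    auto result = std::vector<int>();
    int m = haystack.size();
    int n = pattern.size();
    if (n == 0) {
        return result;  // nothing to search for; also keeps pattern[j] in bounds
    }
    auto kmpTable = calculateKMPTable(pattern);
    // j = number of pattern characters currently matched
    for (int i = 0, j = 0; i < m; i++) {
        // On a mismatch, fall back to the next shorter prefix that still matches.
        while (j > 0 && haystack[i] != pattern[j]) {
            j = kmpTable[j - 1];
        }
        if (haystack[i] == pattern[j]) {
            if (++j == n) {
                result.push_back(i - n + 1);  // start index of this match
                j = kmpTable[j - 1];          // continue, allowing overlapping matches
            }
        }
    }
    return result;
}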
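Since the introduction notes that the data need not be strings, here is a sketch of the same search over arbitrary sequences. The names `prefixTable` and `kmpSearchSeq` are hypothetical, the element type only needs `==` and `!=`, and returning no matches for an empty pattern is an assumption.

```cpp
#include <vector>

// Hypothetical generic prefix table, same construction as for strings.
template <typename T>
std::vector<int> prefixTable(const std::vector<T>& s) {
    int n = s.size();
    std::vector<int> table(n, 0);
    for (int i = 1, j = 0; i < n; i++) {
        while (j > 0 && s[i] != s[j]) j = table[j - 1];
        if (s[i] == s[j]) table[i] = ++j;
    }
    return table;
}

// Hypothetical generic search: returns all start indices of pattern in haystack.
template <typename T>
std::vector<int> kmpSearchSeq(const std::vector<T>& haystack, const std::vector<T>& pattern) {
    std::vector<int> result;
    int m = haystack.size();
    int n = pattern.size();
    if (n == 0) return result;  // assumption: empty pattern yields no matches
    std::vector<int> table = prefixTable(pattern);
    for (int i = 0, j = 0; i < m; i++) {
        while (j > 0 && haystack[i] != pattern[j]) j = table[j - 1];
        if (haystack[i] == pattern[j] && ++j == n) {
            result.push_back(i - n + 1);
            j = table[j - 1];  // keep going so overlapping matches are found
        }
    }
    return result;
}
```

Searching for {1, 2, 1} in {1, 2, 1, 2, 1} returns the overlapping matches at indices 0 and 2.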
● Evaluating time complexity of kmpSearch():
○ Building the table with calculateKMPTable(pattern) takes O(n).
○ Outer loop: i goes from 0 to m - 1.
○ Inner loop:
■ j gets increased by at most one per outer iteration → at most m increases in total.
■ j may get decreased several times in a row, but can never go below 0 → in total it cannot be decreased more times than it was increased, i.e. at most m times.
→ At most 2m addition/subtraction operations on j for the search, plus 2n - 2 for the table, equivalent to O(n + m) time complexity.
● Overall complexity of the KMP Algorithm: O(n + m).
● Visualization: https://cmps-people.ok.ubc.ca/ylucet/DS/KnuthMorrisPratt.html
2. Example
Example 1
● https://leetcode.com/problems/find-the-index-of-the-first-occurrence-in-a-string/description/
○ Well, just calculate the KMP table of the pattern, run the search, and return the first match (or -1 if there is none).
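One possible shape of a solution, as a sketch rather than a reference answer. The free function `firstOccurrence` is a hypothetical name (LeetCode wraps its own signature in a class), and returning 0 for an empty needle is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: index of the first occurrence of needle in haystack, or -1.
int firstOccurrence(const std::string& haystack, const std::string& needle) {
    int m = haystack.size();
    int n = needle.size();
    if (n == 0) return 0;  // assumption: an empty needle matches at index 0
    // Prefix table of the needle.
    std::vector<int> table(n, 0);
    for (int i = 1, j = 0; i < n; i++) {
        while (j > 0 && needle[i] != needle[j]) j = table[j - 1];
        if (needle[i] == needle[j]) table[i] = ++j;
    }
    // Scan the haystack and stop at the first full match.
    for (int i = 0, j = 0; i < m; i++) {
        while (j > 0 && haystack[i] != needle[j]) j = table[j - 1];
        if (haystack[i] == needle[j] && ++j == n) return i - n + 1;
    }
    return -1;
}
```

Unlike kmpSearch(), this version returns as soon as the first match is complete, so it never needs the fallback after a full match.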
Example 2
● https://leetcode.com/problems/longest-happy-prefix/description/
○ Well, just calculate the KMP table too: the answer is the prefix whose length is the table's last entry.
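A minimal sketch of that idea, assuming the table convention in which the last entry is the length of the longest proper prefix of s that is also a suffix of s. The function name `longestHappyPrefix` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: longest proper prefix of s that is also a suffix of s.
std::string longestHappyPrefix(const std::string& s) {
    int n = s.size();
    if (n == 0) return "";
    std::vector<int> table(n, 0);
    for (int i = 1, j = 0; i < n; i++) {
        while (j > 0 && s[i] != s[j]) j = table[j - 1];
        if (s[i] == s[j]) table[i] = ++j;
    }
    // The last table entry is exactly the length we are looking for.
    return s.substr(0, table[n - 1]);
}
```

For "ababab" the table ends in 4, so the result is "abab"; for "level" it ends in 1, giving "l".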
Example 3
● https://leetcode.com/problems/shortest-palindrome/description/
○ Well, it's the previous problem, but in reverse: the longest palindromic prefix of s is the longest prefix of s that is also a suffix of reverse(s).
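One common way to apply this, sketched under the assumption that the separator character '#' never occurs in s (the problem restricts s to lowercase letters): build the table of s + '#' + reverse(s), read the length of the longest palindromic prefix from its last entry, and prepend the reverse of whatever remains.

```cpp
#include <string>
#include <vector>

// Sketch: shortest palindrome obtainable by adding characters in front of s.
// Assumes '#' does not appear in s.
std::string shortestPalindrome(const std::string& s) {
    std::string reversed(s.rbegin(), s.rend());
    std::string combined = s + '#' + reversed;
    int len = combined.size();
    std::vector<int> table(len, 0);
    for (int i = 1, j = 0; i < len; i++) {
        while (j > 0 && combined[i] != combined[j]) j = table[j - 1];
        if (combined[i] == combined[j]) table[i] = ++j;
    }
    // k = length of the longest prefix of s that is a palindrome.
    int k = table[len - 1];
    // The part of s after that prefix must be mirrored in front.
    return reversed.substr(0, s.size() - k) + s;
}
```

For "abcd" only "a" is a palindromic prefix, so "dcb" is prepended to give "dcbabcd".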
Example 4
● https://leetcode.com/problems/repeated-string-match/description/
○ It's the classic KMP search, but with a twist: the haystack is a repeated until it is at least as long as b, plus one extra copy.
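A sketch of one way to handle the twist: repeat a until it is at least as long as b, add one more copy to cover matches that start in the middle of a copy, search for b with KMP, and derive the number of copies from where the first match ends. The helper `firstMatch` is a hypothetical name, and the results for empty inputs are assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: index of the first occurrence of p in h, or -1 (p non-empty).
int firstMatch(const std::string& h, const std::string& p) {
    int m = h.size();
    int n = p.size();
    std::vector<int> table(n, 0);
    for (int i = 1, j = 0; i < n; i++) {
        while (j > 0 && p[i] != p[j]) j = table[j - 1];
        if (p[i] == p[j]) table[i] = ++j;
    }
    for (int i = 0, j = 0; i < m; i++) {
        while (j > 0 && h[i] != p[j]) j = table[j - 1];
        if (h[i] == p[j] && ++j == n) return i - n + 1;
    }
    return -1;
}

// Minimum number of copies of a such that b is a substring, or -1.
int repeatedStringMatch(const std::string& a, const std::string& b) {
    if (b.empty()) return 0;   // assumption: nothing to cover
    if (a.empty()) return -1;  // assumption: cannot build anything from ""
    std::string repeated;
    while (repeated.size() < b.size()) repeated += a;
    repeated += a;  // one extra copy covers matches that start mid-copy
    int idx = firstMatch(repeated, b);
    if (idx < 0) return -1;
    int end = idx + static_cast<int>(b.size());  // one past the last matched character
    int aLen = static_cast<int>(a.size());
    return (end + aLen - 1) / aLen;              // copies needed to reach `end`
}
```

For a = "abcd" and b = "cdabcdab", the first match in "abcdabcdabcd" starts at index 2 and ends at 10, so three copies are needed.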
3. Homework
Homework
1. https://leetcode.com/problems/remove-all-occurrences-of-a-substring/description/
○ The implementation may look challenging, but it actually isn't!
2. https://leetcode.com/problems/form-array-by-concatenating-subarrays-of-another-array/
○ KMP on array + dynamic programming, oh boi…
3. https://leetcode.com/problems/camelcase-matching/
○ Now do this again, but with passion!