Knuth-Morris-Pratt Algorithm
The Knuth-Morris-Pratt (KMP) algorithm is a linear-time string matching algorithm that finds all occurrences of a pattern in a text. It was developed in the 1970s by Donald Knuth, Vaughan Pratt, and James Morris, who published it jointly in 1977. KMP improves on naive string searching because it never moves backward in the text after a mismatch. Its running time is O(n + m), where n is the length of the text and m is the length of the pattern: O(m) to preprocess the pattern and O(n) to scan the text. The naive approach, by comparison, takes O(n * m) in the worst case.
The core of the KMP algorithm is the prefix function (also known as the failure function, or the longest proper prefix-suffix array). This auxiliary array stores, for each prefix of the pattern, the length of the longest proper prefix of that prefix that is also its suffix. The prefix function lets the algorithm avoid re-examining text characters it has already matched. Whenever a mismatch occurs during the search, the algorithm shifts the pattern according to the value stored in the prefix function array and resumes comparing at the same text position, without backtracking in the text. A text character may be compared against more than one pattern position, but every failed comparison shortens the current match, and the match length grows by at most one per text character. The search therefore performs at most 2n comparisons in total, which gives the linear time complexity.
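To make the definition of the prefix function concrete, here is a minimal sketch that computes it on its own. The names `prefix_function` and `check_prefix_function` are hypothetical and do not appear in the listing below. The sketch also uses the common zero-based convention, in which `pi[i]` describes the prefix ending at index `i`, whereas the listing below stores the same values shifted by one position with a `-1` sentinel at index 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original listing).
// pi[i] = length of the longest proper prefix of pattern[0..i]
//         that is also a suffix of pattern[0..i].
std::vector<int> prefix_function(const std::string& pattern) {
    std::vector<int> pi(pattern.size(), 0);
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        int k = pi[i - 1];                        // border length of the previous prefix
        while (k > 0 && pattern[i] != pattern[k]) {
            k = pi[k - 1];                        // fall back to the next shorter border
        }
        if (pattern[i] == pattern[k]) {
            ++k;                                  // the border extends by one character
        }
        pi[i] = k;
    }
    return pi;
}

// Small self-check on the pattern "abacab".
void check_prefix_function() {
    // Prefixes: a, ab, aba, abac, abaca, abacab
    // Borders:  0, 0,  1,   0,    1,     2
    assert((prefix_function("abacab") == std::vector<int>{0, 0, 1, 0, 1, 2}));
}
```

For "abacab", the last value is 2 because "ab" is both a proper prefix and a suffix of the whole pattern. After a mismatch that follows six matched characters, the search can therefore resume with two pattern characters already known to match.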
/*
Petar 'PetarV' Velickovic
Algorithm: Knuth-Morris-Pratt
*/
#include <stdio.h>
#include <string>
#include <vector>
#define MAX_N 1000001
using namespace std;
typedef long long lld;
int n, m;
string needle, haystack;
int P[MAX_N];
vector<int> matches;
// Knuth-Morris-Pratt algorithm for string matching
// Complexity: O(N + M)
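// Finds every occurrence of needle (length m) in haystack (length n) and
// appends the starting indices to matches. Assumes needle is non-empty.
inline void KMP()
{
    // P[i] = length of the longest proper prefix of needle[0..i-1] that is
    // also its suffix; P[0] = -1 is a sentinel meaning "no shorter candidate".
    P[0] = -1;
    for (int i = 0, j = -1; i < m;)
    {
        // Fall back to shorter borders until needle[i] extends one of them.
        while (j > -1 && needle[i] != needle[j]) j = P[j];
        i++;
        j++;
        P[i] = j;
    }
    // Scan the text; i never moves backward, j is the current match length.
    for (int i = 0, j = 0; i < n;)
    {
        while (j > -1 && haystack[i] != needle[j]) j = P[j];
        i++;
        j++;
        if (j == m)
        {
            // Full match ending at text index i - 1.
            matches.push_back(i - m);
            // Continue from the longest border so overlapping matches are found.
            j = P[j];
        }
    }
}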
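int main()
{
    haystack = "abcabc";
    needle = "bc";
    // Derive the lengths from the strings so they cannot get out of sync.
    n = (int)haystack.size();
    m = (int)needle.size();
    KMP();
    // Prints the starting index of each match: 1 4
    for (size_t i = 0; i < matches.size(); i++) printf("%d ", matches[i]);
    printf("\n");
    return 0;
}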