Strings and Pattern Searching
Chapter 1
Output:
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
search(pat, txt);
return 0;
}
Python
Output:
txt[] = "AABCCAADDEE"
pat[] = "FAA"
The number of comparisons in the best case is O(n).
What is the worst case?
The worst case of Naive Pattern Searching occurs in the following scenarios.
1) When all characters of the text and pattern are the same.
txt[] = "AAAAAAAAAAAAAAAAAA"
pat[] = "AAAAA".
2) Worst case also occurs when only the last character is different.
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
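To make the best and worst cases concrete, the following Python 3 sketch counts the character comparisons made by a straightforward naive search. The function name naive_search and the comparison counter are illustrative assumptions; they are not taken from the original listing.

```python
def naive_search(pat, txt):
    """Return (match_indices, comparisons) for a naive pattern search.

    Illustrative sketch: the comparison counter is only there to show
    how the best and worst cases differ.
    """
    m, n = len(pat), len(txt)
    matches = []
    comparisons = 0
    # Slide the pattern over the text one position at a time.
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if txt[i + j] != pat[j]:
                break
            j += 1
        if j == m:  # all m characters matched at shift i
            matches.append(i)
    return matches, comparisons
```

With the two worst-case inputs above, every one of the n-m+1 = 14 windows needs m = 5 comparisons, so the count is 70 in both cases. With txt[] = "AABCCAADDEE" and pat[] = "FAA", each window fails on its first character and only 9 comparisons are made.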
Source
http://www.geeksforgeeks.org/searching-for-patterns-set-1-naive-pattern-searching/
Chapter 2
Output:
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 13
txt[] = "AAAAAAAAAAAAAAAAAB"
pat[] = "AAAAB"
txt[] = "ABABABCABABABCABABABC"
pat[] = "ABABAC" (not a worst case, but a bad case for Naive)
The KMP matching algorithm uses the degenerating property of the pattern (the
same sub-patterns appear more than once in the pattern) and improves the worst
case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever we
detect a mismatch (after some matches), we already know some of the characters
in the text (since they matched the pattern characters prior to the mismatch).
We take advantage of this information to avoid re-matching characters that we
know will match anyway.
The KMP algorithm preprocesses the pattern pat[] and constructs an auxiliary
array lps[] of size m (the same as the size of the pattern). The name lps stands
for longest proper prefix which is also a suffix. For each sub-pattern pat[0…i],
where i = 0 to m-1, lps[i] stores the length of the longest proper prefix of
pat[0..i] that is also a suffix of pat[0..i].
Examples:
For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]
For the pattern “AAAAA”, lps[] is [0, 1, 2, 3, 4]
For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]
For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]
Searching Algorithm:
Unlike the Naive algorithm, where we slide the pattern by one, we use a value
from lps[] to decide the next sliding position. When we compare pat[j] with
txt[i] and see a mismatch, we know that the characters pat[0..j-1] match
txt[i-j…i-1]. We also know that the first lps[j-1] characters of pat[0…j-1] are
both a proper prefix and a suffix of it, so we do not need to match these
lps[j-1] characters against the end of txt[i-j…i-1]: they are known to match.
We therefore set j = lps[j-1] and continue comparing at the same i (or advance
i when j is already 0). See KMPSearch() in the below code for details.
Preprocessing Algorithm:
In the preprocessing part, we calculate the values in lps[]. To do that, we keep
track of the length of the longest prefix suffix for the previous index (we use
the len variable for this purpose). We initialize lps[0] and len as 0. If
pat[len] and pat[i] match, we increment len by 1, assign the incremented value
to lps[i] and move to the next i. If pat[i] and pat[len] do not match and len is
not 0, we update len to lps[len-1] and compare again without advancing i. If
they do not match and len is 0, we set lps[i] to 0 and move to the next i.
See computeLPSArray() in the below code for details.
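The preprocessing and searching steps described above can be sketched compactly in Python 3. This is a minimal sketch that assumes a non-empty pattern; the names compute_lps and kmp_search are illustrative and only mirror computeLPSArray() and KMPSearch(), and the function returns the match indexes instead of printing them.

```python
def compute_lps(pat):
    """lps[i] = length of the longest proper prefix of pat[0..i]
    that is also a suffix of pat[0..i]."""
    lps = [0] * len(pat)
    length = 0  # length of the previous longest prefix suffix
    i = 1
    while i < len(pat):
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            # Fall back to a shorter candidate; do not advance i.
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(pat, txt):
    """Return all indexes where pat occurs in txt (pat must be non-empty)."""
    lps = compute_lps(pat)
    matches = []
    i = j = 0  # i indexes txt, j indexes pat
    while i < len(txt):
        if pat[j] == txt[i]:
            i += 1
            j += 1
            if j == len(pat):
                matches.append(i - j)
                j = lps[j - 1]
        elif j != 0:
            # Mismatch after j matches: reuse the known prefix/suffix overlap.
            j = lps[j - 1]
        else:
            i += 1
    return matches
```

For example, compute_lps("AABAACAABAA") gives [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5], which agrees with the first lps[] example listed above.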
C
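    while (i < N)
    {
        if (pat[j] == txt[i])
        {
            j++;
            i++;
        }
        if (j == M)
        {
            printf("Found pattern at index %d \n", i-j);
            j = lps[j-1];
        }
        // mismatch after j matches
        else if (i < N && pat[j] != txt[i])
        {
            // Do not match lps[0..lps[j-1]] characters,
            // they will match anyway
            if (j != 0)
                j = lps[j-1];
            else
                i = i+1;
        }
    }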
{
if (len != 0)
{
// This is tricky. Consider the example
// AAACAAAA and i = 7.
len = lps[len-1];
Python
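    while i < N:
        if pat[j] == txt[i]:
            i += 1
            j += 1
        if j == M:
            print("Found pattern at index " + str(i-j))
            j = lps[j-1]
        # mismatch after j matches
        elif i < N and pat[j] != txt[i]:
            # Do not match lps[0..lps[j-1]] characters, they will match anyway
            if j != 0:
                j = lps[j-1]
            else:
                i += 1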
txt = "ABABDABACDABABCABAB"
pat = "ABABCABAB"
KMPSearch(pat, txt)
Output:
Please write comments if you find anything incorrect, or you want to share more
information about the topic discussed above.
Source
http://www.geeksforgeeks.org/searching-for-patterns-set-2-kmp-algorithm/
Chapter 3
Output:
Pattern found at index 10
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
The Naive String Matching algorithm slides the pattern one by one. After each
slide, it one by one checks characters at the current shift and if all characters
match then prints the match.
Like the Naive Algorithm, the Rabin-Karp algorithm also slides the pattern one
by one. But unlike the Naive algorithm, Rabin-Karp compares the hash value of
the pattern with the hash value of the current substring of text, and only if
the hash values match does it start matching individual characters. So the
Rabin-Karp algorithm needs to calculate hash values for the following strings.
1) Pattern itself.
2) All the substrings of text of length m.
Since we need to efficiently calculate hash values for all the substrings of
size m of text, we must have a hash function with the following property: the
hash at the next shift must be efficiently computable from the current hash
value and the next character in text. In other words, hash(txt[s+1 .. s+m]) must
be efficiently computable from hash(txt[s .. s+m-1]) and txt[s+m], i.e.,
hash(txt[s+1 .. s+m]) = rehash(txt[s+m], hash(txt[s .. s+m-1])), and rehash must
be an O(1) operation.
The hash function suggested by Rabin and Karp calculates an integer value.
The integer value for a string is the numeric value of the string. For example,
if all possible characters are from 1 to 10, the numeric value of “122” will be
122. The number of possible characters is usually higher than 10 (256 in
general) and the pattern length can be large, so the numeric values cannot
practically be stored as an integer. Therefore, the numeric value is calculated
using modular arithmetic to make sure that the hash values can be stored in an
integer variable (can fit in memory words). To do rehashing, we need to take off
the most significant digit and add the new least significant digit to the hash
value. Rehashing is done using the following formula.
hash( txt[s+1 .. s+m] ) = ( d * ( hash( txt[s .. s+m-1] ) – txt[s]*h ) + txt[s+m] ) mod q
hash( txt[s .. s+m-1] ) : Hash value at shift s.
hash( txt[s+1 .. s+m] ) : Hash value at next shift (or shift s+1)
d: Number of characters in the alphabet
q: A prime number
h: d^(m-1) (in practice kept reduced mod q)
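The rehash formula can be tried out with a short Python 3 sketch. The function name rabin_karp and the default prime q = 101 are assumptions made for illustration; any prime that keeps d*q within an integer word would serve in a fixed-width language.

```python
def rabin_karp(pat, txt, d=256, q=101):
    """Return all indexes where pat occurs in txt using a rolling hash.

    d is the alphabet size and q a prime modulus (q = 101 is an
    arbitrary choice for this sketch).
    """
    m, n = len(pat), len(txt)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # d^(m-1) mod q, weight of the leading digit
    p = t = 0
    # Hash of the pattern and of the first window of text.
    for k in range(m):
        p = (d * p + ord(pat[k])) % q
        t = (d * t + ord(txt[k])) % q
    matches = []
    for s in range(n - m + 1):
        # Compare characters only when the hash values agree.
        if p == t and txt[s:s + m] == pat:
            matches.append(s)
        if s < n - m:
            # Remove the leading digit, add the trailing digit.
            # Python's % already returns a non-negative result.
            t = (d * (t - ord(txt[s]) * h) + ord(txt[s + m])) % q
    return matches
```

The explicit slice comparison guards against spurious hits, where two different strings happen to share a hash value.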
#include<stdio.h>
#include<string.h>
// d is the number of characters in input alphabet
#define d 256
// Calculate hash value for next window of text: Remove leading digit,
// add trailing digit
if ( i < N-M )
{
    t = (d*(t - txt[i]*h) + txt[i+M])%q;
    // In C, % can yield a negative value here; convert it to positive
    if (t < 0)
        t = (t + q);
}
The average and best case running time of the Rabin-Karp algorithm is O(n+m),
but its worst-case time is O(nm). The worst case occurs when all characters of
the pattern and text are the same, because the hash values of all the substrings
of txt[] then match the hash value of pat[] and every window is verified
character by character. For example, pat[] = “AAA” and txt[] = “AAAAAAA”.
References:
http://net.pku.edu.cn/~course/cs101/2007/resource/Intro2Algorithm/book6/chap34.htm
http://www.cs.princeton.edu/courses/archive/fall04/cos226/lectures/string.4up.pdf
http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
Related Posts:
Searching for Patterns | Set 1 (Naive Pattern Searching)
Searching for Patterns | Set 2 (KMP Algorithm)
Source
http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/
Chapter 4
#include<stdio.h>
#include<string.h>
while(i <= N - M)
{
int j;
Output:
Pattern found at index 4
Pattern found at index 12
Source
http://www.geeksforgeeks.org/pattern-searching-set-4-a-naive-string-matching-algo-question/
Chapter 5
Output:
Pattern found at index 10
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern searching is an important problem in computer science. When we do
search for a string in notepad/word file or browser or database, pattern searching
algorithms are used to show the search results.
We have discussed the following algorithms in the previous posts:
Naive Algorithm
KMP Algorithm
Rabin Karp Algorithm
In this post, we will discuss the Finite Automata (FA) based pattern searching
algorithm. In the FA based algorithm, we preprocess the pattern and build a 2D
array that represents a Finite Automaton. Construction of the FA is the main
tricky part of this algorithm. Once the FA is built, the searching is simple. In
search, we simply start from the first state of the automaton and the first
character of the text. At every step, we consider the next character of the
text, look up the next state in the built FA and move to the new state. If we
reach the final state, then the pattern is found in the text. The time
complexity of the search process is O(n).
Before we discuss FA construction, let us take a look at the following FA for
pattern ACACAGA.
The number of states in the FA will be M+1, where M is the length of the
pattern. The main thing in constructing the FA is to get the next state from the
current state for every possible character. Given a character x and a state k,
we can get the next state by considering the string “pat[0..k-1]x”, which is the
concatenation of the pattern characters pat[0], pat[1] … pat[k-1] and the
character x. The idea is to get the length of the longest prefix of the given
pattern such that the prefix is also a suffix of “pat[0..k-1]x”. This length
gives us the next state. For example, let us see how to get the next state from
current state 5 and character ‘C’ in the above diagram. We need to consider the
string “pat[0..4]C”, which is “ACACAC”. The length of the longest prefix of the
pattern such that the prefix is a suffix of “ACACAC” is 4 (“ACAC”). So the next
state (from state 5) is 4 for character ‘C’.
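The rule for the next state can be checked directly with a small Python 3 sketch. It is a brute-force illustration, not the listing that follows: the names get_next_state and fa_search are assumptions, and the transition table is stored as a list of dictionaries keyed only by the characters of the pattern (any other character leads to state 0).

```python
def get_next_state(pat, state, x):
    """Length of the longest prefix of pat that is a suffix of pat[0..state-1] + x."""
    s = pat[:state] + x
    # Try the longest candidate first and stop at the first hit.
    for ns in range(min(len(pat), state + 1), 0, -1):
        if s.endswith(pat[:ns]):
            return ns
    return 0


def fa_search(pat, txt):
    """Return all indexes where pat occurs in txt using the automaton."""
    m = len(pat)
    alphabet = set(pat)
    # tf[k][c] is the next state from state k on character c.
    tf = [{c: get_next_state(pat, k, c) for c in alphabet}
          for k in range(m + 1)]
    state = 0
    matches = []
    for i, c in enumerate(txt):
        state = tf[state].get(c, 0)  # characters outside the pattern reset to 0
        if state == m:
            matches.append(i - m + 1)
    return matches
```

For the pattern ACACAGA, get_next_state(pat, 5, "C") returns 4, as worked out in the example above.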
In the following code, computeTF() constructs the FA. The time complexity of
computeTF() is O(m^3*NO_OF_CHARS), where m is the length of the pattern and
NO_OF_CHARS is the size of the alphabet (total number of possible characters in
pattern and text). The implementation tries all possible prefixes, starting from
the longest possible, that can be a suffix of “pat[0..k-1]x”. There are better
implementations that construct the FA in O(m*NO_OF_CHARS) (Hint: we can use
something like the lps array construction in the KMP algorithm). We have covered
the better implementation in our next post on pattern searching.
#include<stdio.h>
#include<string.h>
#define NO_OF_CHARS 256
// Start from the largest possible value and stop when you find
// a prefix which is also suffix
for (ns = state; ns > 0; ns--)
{
if(pat[ns-1] == x)
{
for(i = 0; i < ns-1; i++)
{
if (pat[i] != pat[state-ns+1+i])
break;
}
if (i == ns-1)
return ns;
}
}
return 0;
}
/* This function builds the TF table which represents Finite Automata for a
given pattern */
void computeTF(char *pat, int M, int TF[][NO_OF_CHARS])
{
int state, x;
for (state = 0; state <= M; ++state)
for (x = 0; x < NO_OF_CHARS; ++x)
TF[state][x] = getNextState(pat, M, state, x);
}
int TF[M+1][NO_OF_CHARS];
computeTF(pat, M, TF);
{
char *txt = "AABAACAADAABAAABAA";
char *pat = "AABA";
search(pat, txt);
return 0;
}
Output:
References:
Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald
L. Rivest, Clifford Stein
Source
http://www.geeksforgeeks.org/searching-for-patterns-set-5-finite-automata/
Chapter 6
The above diagrams represent graphical and tabular representations of pattern
ACACAGA.
Algorithm:
1) Fill the first row. All entries in the first row are always 0 except the
entry for the pat[0] character. For the pat[0] character, we always need to go
to state 1.
2) Initialize lps as 0. lps for the first index is always 0.
3) Do the following for rows at index i = 1 to M (M is the length of the pattern).
…..a) Copy the entries from the row at index equal to lps.
…..b) Update the entry for the pat[i] character to i+1.
…..c) Update lps: “lps = TF[lps][pat[i]]”, where TF is the 2D array which is
being constructed.
For the last row (i = M) there is no pattern character pat[M], so only step a)
is meaningful there.
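The three steps above can be sketched in Python 3 as follows. The names compute_trans_fun and search are illustrative; the sketch assumes a non-empty pattern and characters with codes below no_of_chars, and it skips steps b) and c) for the last row, where no pattern character exists.

```python
def compute_trans_fun(pat, no_of_chars=256):
    """Build the transition table TF in O(m * no_of_chars) time."""
    m = len(pat)
    tf = [[0] * no_of_chars for _ in range(m + 1)]
    # Step 1: first row is all 0 except the entry for pat[0].
    tf[0][ord(pat[0])] = 1
    # Step 2: lps is the state we would be in for the longest proper
    # prefix that is also a suffix of what has been matched so far.
    lps = 0
    # Step 3: fill the remaining rows.
    for i in range(1, m + 1):
        tf[i] = tf[lps][:]                   # a) copy the row at index lps
        if i < m:
            tf[i][ord(pat[i])] = i + 1       # b) matching character advances
            lps = tf[lps][ord(pat[i])]       # c) update lps
    return tf


def search(pat, txt):
    """Return all indexes where pat occurs in txt."""
    tf = compute_trans_fun(pat)
    m = len(pat)
    state = 0
    matches = []
    for i, ch in enumerate(txt):
        state = tf[state][ord(ch)]
        if state == m:
            matches.append(i - m + 1)
    return matches
```

For ACACAGA this yields the same entries as the brute-force rule, for example TF[5]['C'] = 4 and TF[5]['G'] = 6.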
Implementation
Following is C implementation for the above algorithm.
#include<stdio.h>
#include<string.h>
#define NO_OF_CHARS 256
/* This function builds the TF table which represents Finite Automata for a
given pattern */
void computeTransFun(char *pat, int M, int TF[][NO_OF_CHARS])
{
int i, lps = 0, x;
// Copy values from row at index lps
for (x = 0; x < NO_OF_CHARS; x++)
TF[i][x] = TF[lps][x];
int TF[M+1][NO_OF_CHARS];
computeTransFun(pat, M, TF);
Output:
Source
http://www.geeksforgeeks.org/pattern-searching-set-5-efficient-constructtion-of-finite-automata/
Chapter 7
1) Input:
Output:
2) Input:
txt[] = "AABAACAADAABAAABAA"
pat[] = "AABA"
Output:
Pattern found at index 13
/* Program for Bad Character Heuristic of Boyer Moore String Matching Algorithm */
# include <limits.h>
# include <string.h>
# include <stdio.h>
int badchar[NO_OF_CHARS];
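        /* If the pattern is present at current shift, then index j
           will become -1 after the above loop */
        if (j < 0)
        {
            printf("\n pattern occurs at shift = %d", s);
            /* Shift the pattern so that the next character in text
               aligns with the last occurrence of it in pattern. The
               condition s+m < n is needed when the pattern occurs at
               the end of the text (m and n are assumed to be the
               pattern and text lengths). */
            s += (s+m < n)? m-badchar[txt[s+m]] : 1;
        }
        else
            /* Shift the pattern so that the bad character in text
               aligns with the last occurrence of it in pattern. The
               max function is used to make sure that we get a positive
               shift. We may get a negative shift if the last occurrence
               of bad character in pattern is on the right side of the
               current character. */
            s += max(1, j - badchar[txt[s+j]]);
    }
}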
Output:
The Bad Character Heuristic may take O(mn) time in worst case. The worst
case occurs when all characters of the text and pattern are same. For example,
txt[] = “AAAAAAAAAAAAAAAAAA” and pat[] = “AAAAA”.
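The bad character heuristic on its own can be sketched in Python 3 as below. The function name bad_char_search and the dictionary named last (playing the role of badchar[]) are illustrative assumptions; characters that do not occur in the pattern are treated as having last occurrence -1.

```python
def bad_char_search(pat, txt):
    """Boyer Moore search using only the bad character heuristic."""
    m, n = len(pat), len(txt)
    # last[c] = index of the last occurrence of c in pat.
    last = {c: i for i, c in enumerate(pat)}
    matches = []
    s = 0  # current shift of the pattern with respect to the text
    while s <= n - m:
        j = m - 1
        # Compare from right to left.
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            # Align the character just after the window with its last
            # occurrence in pat; move by 1 at the end of the text.
            s += m - last.get(txt[s + m], -1) if s + m < n else 1
        else:
            # Align the bad character with its last occurrence in pat,
            # always moving forward by at least 1.
            s += max(1, j - last.get(txt[s + j], -1))
    return matches
```

On txt[] = “AAAAAAAAAAAAAAAAAA” and pat[] = “AAAAA” every shift is by 1 and every window is fully compared, which is the O(mn) behaviour described above.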
Source
http://www.geeksforgeeks.org/pattern-searching-set-7-boyer-moore-algorithm-bad-character-heuristic/
Chapter 8
0 banana 5 a
1 anana Sort the Suffixes 3 ana
2 nana ----------------> 1 anana
3 ana alphabetically 0 banana
4 na 4 na
5 a 2 nana
Naive method to build Suffix Array
A simple method to construct suffix array is to make an array of all suffixes and
then sort the array. Following is implementation of simple method.
for (int i = 0; i < n; i++)
suffixArr[i] = suffixes[i].index;
Output:
The time complexity of the above method to build a suffix array is O(n^2 Log n)
if we use an O(n Log n) comparison sort. The sorting step itself takes
O(n^2 Log n) time, as every comparison is a comparison of two strings and each
comparison takes O(n) time.
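The naive construction is short enough to sketch in one line of Python 3. The function name build_suffix_array is an illustrative assumption; the sort key materialises each suffix, so the cost is the O(n^2 Log n) discussed above.

```python
def build_suffix_array(txt):
    """Return the starting indexes of all suffixes of txt in sorted order."""
    # Sort the suffix start positions by the suffix text itself.
    return sorted(range(len(txt)), key=lambda i: txt[i:])
```

For "banana" this returns [5, 3, 1, 0, 4, 2], the same order as the sorted suffixes shown at the start of this chapter.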
There are many efficient algorithms to build suffix array. We will soon be
covering them as separate posts.
Search a pattern using the built Suffix Array
To search a pattern in a text, we preprocess the text and build a suffix array
of the text. Since we have a sorted array of all suffixes, Binary Search can be
used to search. Following is the search function. Note that the function doesn’t
report all occurrences of the pattern; it only reports one of them.
// This code only contains search() and main. To make it a complete running
// above code or see http://ideone.com/1Io9eN
// search pat in txt using the built suffix array
search(pat, txt, suffArr, n);
return 0;
}
Output:
The time complexity of the above search function is O(m Log n). There are more
efficient algorithms to search a pattern once the suffix array is built. In
fact, there is an O(m) suffix array based algorithm to search a pattern. We will
soon be discussing an efficient algorithm for search.
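The O(m Log n) binary search can be sketched in Python 3 as follows. The names build_suffix_array and search_suffix_array are illustrative assumptions; like the function described above, the sketch returns only one occurrence (or -1 when there is none).

```python
def build_suffix_array(txt):
    """Naive suffix array: sorted starting indexes of all suffixes."""
    return sorted(range(len(txt)), key=lambda i: txt[i:])


def search_suffix_array(pat, txt, suffix_arr):
    """Return the index of one occurrence of pat in txt, or -1."""
    m = len(pat)
    lo, hi = 0, len(suffix_arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = suffix_arr[mid]
        # Compare pat with the first m characters of the middle suffix.
        window = txt[start:start + m]
        if window == pat:
            return start
        if pat < window:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1
```

For example, search_suffix_array("nan", "banana", build_suffix_array("banana")) returns 2. Each of the O(Log n) steps compares at most m characters.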
Applications of Suffix Array
Suffix array is an extremely useful data structure, it can be used for a wide
range of problems. Following are some famous problems where Suffix array can
be used.
1) Pattern Searching
2) Finding the longest repeated substring
3) Finding the longest common substring
4) Finding the longest palindrome in a string
See this for more problems where Suffix arrays can be used.
This post is a simple introduction. There is a lot to cover in Suffix arrays. We
have discussed a O(nLogn) algorithm for Suffix Array construction here. We
will soon be discussing more efficient suffix array algorithms.
References:
http://www.stanford.edu/class/cs97si/suffix-array.pdf
http://en.wikipedia.org/wiki/Suffix_array
Source
http://www.geeksforgeeks.org/suffix-array-set-1-introduction/
Chapter 9
This problem is slightly different from standard pattern searching problem, here
we need to search for anagrams as well. Therefore, we cannot directly apply
standard pattern searching algorithms like KMP, Rabin Karp, Boyer Moore,
etc.
A simple idea is to modify Rabin Karp Algorithm. For example we can keep
the hash value as sum of ASCII values of all characters under modulo of a big
prime number. For every character of text, we can add the current character to
hash value and subtract the first character of previous window. This solution
looks good, but like standard Rabin Karp, the worst case time complexity of
this solution is O(mn). The worst case occurs when all hash values match and
we one by one match all characters.
We can achieve O(n) time complexity under the assumption that the alphabet size
is fixed, which is typically true as we have a maximum of 256 possible
characters in ASCII. The idea is to use two count arrays:
1) The first count array stores frequencies of characters in the pattern.
2) The second count array stores frequencies of characters in the current
window of text.
The important thing to note is that the time complexity to compare two count
arrays is O(1), as the number of elements in them is fixed (independent of
pattern and text sizes). Following are the steps of this algorithm.
1) Store the frequencies of characters of the pattern in the first count array
countP[]. Also store the frequencies of characters in the first window of text
in array countTW[].
2) Now run a loop from i = M to N-1. Do the following in the loop.
…..a) If the two count arrays are identical, we found an occurrence.
…..b) Increment the count of the current character of text in countTW[].
…..c) Decrement the count of the first character of the previous window in countTW[].
3) The last window is not checked by the above loop, so explicitly check it.
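The steps above can be sketched in Python 3 as follows. The function name anagram_search and the parameter max_chars are illustrative assumptions; the sketch assumes character codes below max_chars and returns the indexes instead of printing them.

```python
def anagram_search(pat, txt, max_chars=256):
    """Return all indexes where some permutation of pat occurs in txt."""
    m, n = len(pat), len(txt)
    if m == 0 or m > n:
        return []
    count_p = [0] * max_chars    # frequencies of characters in pat
    count_tw = [0] * max_chars   # frequencies in the current text window
    for i in range(m):
        count_p[ord(pat[i])] += 1
        count_tw[ord(txt[i])] += 1
    found = []
    for i in range(m, n):
        # The window ending just before i starts at i - m.
        if count_p == count_tw:
            found.append(i - m)
        count_tw[ord(txt[i])] += 1       # add the current character
        count_tw[ord(txt[i - m])] -= 1   # drop the first character of the old window
    # The last window is not checked inside the loop.
    if count_p == count_tw:
        found.append(n - m)
    return found
```

Comparing the two lists costs O(max_chars) per window, which is constant with respect to the pattern and text sizes.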
Following is C++ implementation of above algorithm.
// countP[]: Store count of all characters of pattern
// countTW[]: Store count of current window of text
char countP[MAX] = {0}, countTW[MAX] = {0};
for (int i = 0; i < M; i++)
{
(countP[pat[i]])++;
(countTW[txt[i]])++;
}
Output:
Found at Index 0
Found at Index 5
Found at Index 6
This article is contributed by Piyush Gupta.
Source
http://www.geeksforgeeks.org/anagram-substring-search-search-permutations/
Chapter 10
Building a Trie of Suffixes
1) Generate all suffixes of given text.
2) Consider all suffixes as individual words and build a trie.
Let us consider an example text “banana\0” where ‘\0’ is the string termination
character. Following are all suffixes of “banana\0”:
banana\0
anana\0
nana\0
ana\0
na\0
a\0
\0
If we consider all of the above suffixes as individual words and build a Trie, we
get following.
present. To store indexes, we use a list with every node that stores indexes of
suffixes starting at the node.
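The idea of a trie of suffixes with an index list at every node can be sketched in Python 3 as below. This is an illustrative variant, not a transcription of the C++ listing: the class names mirror SuffixTrieNode and SuffixTrie, children are kept in a dictionary, the insertion is iterative, and each node is assumed to store the text position of the character that leads into it.

```python
class SuffixTrieNode:
    def __init__(self):
        self.children = {}   # character -> SuffixTrieNode
        self.indexes = []    # text positions of the character leading here


class SuffixTrie:
    def __init__(self, txt):
        """Insert every suffix of txt into the trie."""
        self.root = SuffixTrieNode()
        for i in range(len(txt)):
            node = self.root
            for offset, ch in enumerate(txt[i:]):
                node = node.children.setdefault(ch, SuffixTrieNode())
                node.indexes.append(i + offset)

    def search(self, pat):
        """Return the starting indexes of all occurrences of pat."""
        node = self.root
        for ch in pat:
            node = node.children.get(ch)
            if node is None:
                return []    # no edge for this character: pat is absent
        # Each stored index is where an occurrence ends.
        return [end - len(pat) + 1 for end in node.indexes]
```

Building the trie this way takes O(n^2) time and space for a text of length n, while a search only walks one edge per pattern character.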
Following is C++ implementation of the above idea.
SuffixTrieNode root;
public:
// Constructor (Builds a trie of suffixes of the given text)
SuffixTrie(string txt)
{
// Consider all suffixes of given string and insert
// them into the Suffix Trie using recursive function
// insertSuffix() in SuffixTrieNode class
for (int i = 0; i < txt.length(); i++)
root.insertSuffix(txt.substr(i), i);
}
// if there is an edge from the current node of suffix trie,
// follow the edge.
if (children[s.at(0)] != NULL)
return (children[s.at(0)])->search(s.substr(1));
/* Prints all occurrences of pat in the Suffix Trie S (built for text)*/
void SuffixTrie::search(string pat)
{
// Let us call recursive search function for root of Trie.
// We get a list of all indexes (where pat is present in text) in
// variable 'result'
list<int> *result = root.search(pat);
S.search("forgeeks");
return 0;
}
Output:
Source
http://www.geeksforgeeks.org/pattern-searching-using-trie-suffixes/
Chapter 11
Given a string ‘str’ of digits, find length of the longest substring of ‘str’, such
that the length of the substring is 2k digits and sum of left k digits is equal to
the sum of right k digits.
Examples:
// A simple C based program to find length of longest even length
// substring with same sum of digits in left and right
#include<stdio.h>
#include<string.h>
Output:
Dynamic Programming [O(n^2) time and O(n^2) extra space]
The above solution can be optimized to work in O(n^2) time using Dynamic
Programming. The idea is to build a 2D table that stores sums of substrings. The
following is a C based implementation of the Dynamic Programming approach.
// Driver program to test above function
int main(void)
{
char str[] = "153803";
printf("Length of the substring is %d", findLength(str));
return 0;
}
Output:
The time complexity of the above solution is O(n^2), but it requires O(n^2)
extra space.
[An O(n^2) time and O(n) extra space solution]
The idea is to use a single dimensional array to store cumulative sums.
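The cumulative sum idea can be sketched in Python 3 as follows. The function name find_length and the array name prefix are illustrative assumptions; the input is assumed to contain only decimal digits.

```python
def find_length(s):
    """Length of the longest even-length substring whose left-half digit
    sum equals its right-half digit sum."""
    n = len(s)
    # prefix[i] = sum of the first i digits, so the sum of s[a:b]
    # is prefix[b] - prefix[a].
    prefix = [0] * (n + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = prefix[i] + int(ch)
    best = 0
    for length in range(2, n + 1, 2):
        half = length // 2
        for i in range(n - length + 1):
            left = prefix[i + half] - prefix[i]
            right = prefix[i + length] - prefix[i + half]
            if left == right:
                best = max(best, length)
    return best
```

For "153803" the result is 4, coming from the substring "5380" (5 + 3 = 8 + 0). The two nested loops give O(n^2) time, and the only extra storage is the O(n) prefix array.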
}
}
return ans;
}
Output:
/* move on both sides till indexes go out of bounds */
while (r < n && l >= 0)
{
lsum += str[l] - '0';
rsum += str[r] - '0';
if (lsum == rsum)
ans = max(ans, r-l+1);
l--;
r++;
}
}
return ans;
}
Output:
Source
http://www.geeksforgeeks.org/longest-even-length-substring-sum-first-second-half/
Chapter 12
Given a string, print all possible strings that can be made by placing
spaces (zero or one) between its characters.
#include <iostream>
#include <cstring>
using namespace std;
// This function creates buf[] to store individual output string and uses
// printPatternUtil() to print all permutations.
void printPattern(char *str)
{
int n = strlen(str);
printPattern(str);
return 0;
}
Output:
ABCD
ABC D
AB CD
AB C D
A BCD
A BC D
A B CD
A B C D
Time Complexity: Since the number of gaps is n-1, there are a total of 2^(n-1)
patterns, each having a length ranging from n to 2n-1. Thus the overall
complexity would be O(n*(2^n)).
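The count of 2^(n-1) patterns follows from treating each of the n-1 gaps as one bit. The Python 3 sketch below enumerates the patterns that way; it is an iterative alternative to a recursive printPatternUtil() approach, and the function name space_patterns is an illustrative assumption.

```python
def space_patterns(s):
    """Return all strings made by placing zero or one space in each gap of s."""
    if not s:
        return []
    n = len(s)
    results = []
    # Each mask in [0, 2^(n-1)) selects which gaps receive a space.
    for mask in range(2 ** (n - 1)):
        parts = [s[0]]
        for gap in range(n - 1):
            # The highest bit controls the first gap, so the results
            # come out in the same order as the listing's output.
            if (mask >> (n - 2 - gap)) & 1:
                parts.append(" ")
            parts.append(s[gap + 1])
        results.append("".join(parts))
    return results
```

For "ABCD" this produces the eight strings shown in the output above, from "ABCD" through "A B C D".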
This article is contributed by Gaurav Sharma.
Source
http://www.geeksforgeeks.org/print-possible-strings-can-made-placing-spaces/
Chapter 13
Manacher’s Algorithm - Linear Time Longest Palindromic Substring - Part 1
Here center position is not only the actual string character position but it could
be the position between two characters also.
Consider string “abaaba” of even length. This string is palindrome around the
position between 3rd and 4th characters a and a respectively.
In these two strings, left and right side of the center positions (position 7 in 1st
string and position 6 in 2nd string) are symmetric. Why? Because the whole
string is palindrome around the center position.
If we need to calculate the Longest Palindromic Substring (LPS) at each of the
2*N+1 positions from left to right, then the palindrome’s symmetric property
could help to avoid some of the unnecessary computations (i.e. character
comparisons). If there is a palindrome of some length L centered at any position
P, then we may not need to compare all characters on the left and right side at
position P+1. We already calculated LPS at positions before P and they can help
to avoid some of the comparisons after position P.
This use of information from previous positions at a later point of time makes
Manacher’s algorithm linear. In Set 2, there is no reuse of previous information
and so that is quadratic.
Manacher’s algorithm is often considered complex to understand, so here we
will discuss it in as much detail as we can. Some of its portions may require
multiple readings to understand properly.
Let’s look at string “abababa”. In 3rd figure above, 15 center positions are
shown. We need to calculate length of longest palindromic string at each of
these positions.
• At position 2, there is no LPS at all (left and right characters a and b
don’t match), so length of LPS will be 0.
• At position 3, LPS is aba, so length of LPS will be 3.
• At position 4, there is no LPS at all (left and right characters b and a
don’t match), so length of LPS will be 0.
• At position 5, LPS is ababa, so length of LPS will be 5.
…… and so on
We store all these palindromic lengths in an array, say L. Then string S and
LPS Length L look like below:
In LPS Array L:
• LPS length value at odd positions (the actual character positions) will be
odd and greater than or equal to 1 (1 will come from the center character
itself if nothing else matches in left and right side of it)
• LPS length value at even positions (the positions between two characters,
extreme left and right positions) will be even and greater than or equal to
0 (0 will come when there is no match in left and right side)
Position and index for the string are two different things here. For
a given string S of length N, indexes will be from 0 to N-1 (total N
indexes) and positions will be from 0 to 2*N (total 2*N+1 positions).
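The LPS length array over the 2*N+1 positions can be computed by plain expansion around every position, which is a useful reference before the linear-time method. The Python 3 sketch below is quadratic and only illustrates the position/index convention; the function name lps_lengths_naive is an assumption.

```python
def lps_lengths_naive(s):
    """LPS length at each of the 2*N+1 positions, by direct expansion.

    Odd positions are characters (position p is index p // 2); even
    positions are the gaps between characters and the two ends.
    """
    n = len(s)
    lengths = []
    for p in range(2 * n + 1):
        length = p % 2             # 1 for a character center, 0 for a gap
        left = p // 2 - 1          # index of the first character on the left
        right = p // 2 + p % 2     # index of the first character on the right
        while left >= 0 and right < n and s[left] == s[right]:
            length += 2
            left -= 1
            right += 1
        lengths.append(length)
    return lengths
```

For "abababa" this gives [0, 1, 0, 3, 0, 5, 0, 7, 0, 5, 0, 3, 0, 1, 0]: odd values at odd positions, even values at even positions, and the maximum 7 at position 7.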
LPS length value can be interpreted in two ways, one in terms of index and
second in terms of position. LPS value d at position i (L[i] = d) tells that:
Now the main task is to compute LPS array efficiently. Once this array is
computed, LPS of string S will be centered at position with maximum LPS
length value.
We will see it in Part 2.
This article is contributed by Anurag Singh.
Source
http://www.geeksforgeeks.org/manachers-algorithm-linear-time-longest-palindromic-substring-part-1/
Chapter 14
Manacher’s Algorithm - Linear Time Longest Palindromic Substring - Part 2
In Manacher’s Algorithm – Part 1, we went through some of the basics and the LPS
length array.
Here we will see how to calculate the LPS length array efficiently.
To calculate the LPS array efficiently, we need to understand how the LPS length
for any position may relate to the LPS length value of any previous, already
calculated position.
For string “abaaba”, we see following:
We calculate LPS length values from left to right starting from position 0, so
if we already know the LPS length values at positions 1, 2 and 3, then we may
not need to calculate the LPS length at positions 4 and 5 because they are equal
to the LPS length values at the corresponding positions on the left side of
position 3.
If we look around position 6:
• LPS length value at position 5 and position 7 are same
• LPS length value at position 4 and position 8 are same
• centerPosition – This is the position for which LPS length is calculated
  and let’s say LPS length at centerPosition is d (i.e. L[centerPosition] = d)
• centerRightPosition – This is the position which is right of the
  centerPosition and d positions away from centerPosition (i.e.
  centerRightPosition = centerPosition + d)
• centerLeftPosition – This is the position which is left of the
  centerPosition and d positions away from centerPosition (i.e.
  centerLeftPosition = centerPosition – d)
• currentRightPosition – This is the position which is right of the
  centerPosition for which LPS length is not yet known and has to be calculated
• currentLeftPosition – This is the position on the left side of
  centerPosition which corresponds to the currentRightPosition
  centerPosition – currentLeftPosition = currentRightPosition – centerPosition
  currentLeftPosition = 2* centerPosition – currentRightPosition
• i-left palindrome – The palindrome i positions left of centerPosition, i.e.
  at currentLeftPosition
• i-right palindrome – The palindrome i positions right of centerPosition,
  i.e. at currentRightPosition
• center palindrome – The palindrome at centerPosition
When we are at centerPosition, for which the LPS length is known, we also
know the LPS length of all positions smaller than centerPosition. Let’s say the
LPS length at centerPosition is d, i.e.
L[centerPosition] = d
It means that the substring between positions “centerPosition-d” and
“centerPosition+d” is a palindrome.
Now we proceed further to calculate the LPS length of positions greater than
centerPosition.
Let’s say we are at currentRightPosition ( > centerPosition) where we need to
find the LPS length.
For this we look at the LPS length of currentLeftPosition, which is already
calculated.
If the LPS length of currentLeftPosition is less than “centerRightPosition –
currentRightPosition”, then the LPS length of currentRightPosition will be equal
to the LPS length of currentLeftPosition. So
L[currentRightPosition] = L[currentLeftPosition] if L[currentLeftPosition] <
centerRightPosition – currentRightPosition. This is Case 1.
Let’s consider below scenario for string “abababa”:
We have calculated LPS length up to position 7 where L[7] = 7. If we consider
position 7 as centerPosition, then centerLeftPosition will be 0 and
centerRightPosition will be 14.
Now we need to calculate LPS length of other positions on the right of
centerPosition.
For currentRightPosition = 8, currentLeftPosition is 6 and L[currentLeftPosition] = 0
Also centerRightPosition – currentRightPosition = 14 – 8 = 6
Case 1 applies here and so L[currentRightPosition] = L[8] = 0
Case 1 applies to positions 10 and 12, so,
L[10] = L[4] = 0
L[12] = L[2] = 0
If we look at position 9, then:
currentRightPosition = 9
currentLeftPosition = 2* centerPosition – currentRightPosition = 2*7 – 9 = 5
centerRightPosition – currentRightPosition = 14 – 9 = 5
Here L[currentLeftPosition] = centerRightPosition – currentRightPosition, so
Case 1 doesn’t apply here. Also note that centerRightPosition is the extreme
end position of the string. That means center palindrome is suffix of input
string. In that case, L[currentRightPosition] = L[currentLeftPosition]. This is
Case 2.
Case 2 applies to positions 9, 11, 13 and 14, so:
L[9] = L[5] = 5
L[11] = L[3] = 3
L[13] = L[1] = 1
L[14] = L[0] = 0
What is really happening in Case 1 and Case 2? This is just utilizing the
palindromic symmetric property and without any character match, it is finding
LPS length of new positions.
When a bigger length palindrome contains a smaller length palindrome centered
at the left side of its own center, then based on the symmetric property, there
will be another same smaller palindrome centered on the right of the bigger
palindrome’s center. If the left side smaller palindrome is not a prefix of the
bigger palindrome, then Case 1 applies, and if it is a prefix AND the bigger
palindrome is a suffix of the input string itself, then Case 2 applies.
In other words, the longest palindrome i places to the right of the current
center (the i-right palindrome) is as long as the longest palindrome i places to
the left of the current center (the i-left palindrome) if the i-left palindrome
is completely contained in the longest palindrome around the current center (the
center palindrome) and either the i-left palindrome is not a prefix of the
center palindrome (Case 1), or the i-left palindrome is a prefix of the center
palindrome and the center palindrome is a suffix of the entire string (Case 2).
In Case 1 and Case 2, i-right palindrome can’t expand more than corresponding
i-left palindrome (can you visualize why it can’t expand more?), and so LPS
length of i-right palindrome is exactly same as LPS length of i-left palindrome.
Here both i-left and i-right palindromes are completely contained in center
palindrome (i.e. L[currentLeftPosition] <= centerRightPosition –
currentRightPosition)
Now if i-left palindrome is not a prefix of center palindrome
(L[currentLeftPosition] < centerRightPosition – currentRightPosition), that
means that i-left palindrome was not able to expand up-to position
centerLeftPosition.
If we look at following with centerPosition = 11, then
Case 2: L[currentRightPosition] = L[currentLeftPosition] applies when:
we try to expand it by comparing characters in left and right side starting from
distance 4 (As up-to distance 3, it is already known that characters will match).
If we take center position 11, then Case 4 applies at currentRightPosition 15
because L[currentLeftPosition] = L[7] = 7 > centerRightPosition – currentRightPosition
= 20 – 15 = 5. In this case, it is guaranteed that L[15] will be at least
5, so in the implementation we first set L[15] = 5 and then try to expand
it by comparing characters on the left and right sides starting from distance 6 (as
up to distance 5, it is already known that the characters will match).
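The decision between the four cases depends on only two numbers, which the following sketch makes explicit. The function classify_case and its parameters are hypothetical names introduced here. Case 3 is taken to be the remaining situation where L[currentLeftPosition] equals centerRightPosition – currentRightPosition but the center palindrome is not a suffix of the string, matching the case labels used in the Part 3 listing.

```c
#include <stdio.h>

/* Hypothetical helper for illustration.
 *   lMirror = L[currentLeftPosition]
 *   diff    = centerRightPosition - currentRightPosition (assumed > 0)
 *   rIsLast = nonzero if centerRightPosition is the last position
 * Stores the guaranteed minimum of L[currentRightPosition] in *minLen and
 * returns the case number 1..4. In Cases 1 and 2 the minimum is final;
 * in Cases 3 and 4 the palindrome must still be expanded from it. */
int classify_case(int lMirror, int diff, int rIsLast, int *minLen)
{
    if (lMirror < diff)              /* i-left palindrome is not a prefix */
    {
        *minLen = lMirror;
        return 1;
    }
    if (lMirror == diff && rIsLast)  /* prefix, center palindrome is a suffix */
    {
        *minLen = lMirror;
        return 2;
    }
    if (lMirror == diff)             /* prefix, but the string continues */
    {
        *minLen = lMirror;
        return 3;
    }
    *minLen = diff;                  /* i-left palindrome sticks out on the left */
    return 4;
}
```

With the numbers from the text, classify_case(7, 5, 0, &m) reports Case 4 with m = 5 (position 15 around center 11), classify_case(5, 5, 1, &m) reports Case 2 with m = 5 (position 9 around center 7), and classify_case(0, 6, 1, &m) reports Case 1 with m = 0 (position 8 around center 7).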
Now one point left to discuss is how to choose the next center position while
we work at one center position and compute LPS lengths for different
rightPositions. We change centerPosition to currentRightPosition if the
palindrome centered at currentRightPosition expands beyond centerRightPosition.
Here we have seen four different cases of how the LPS length of a position
depends on a previous position’s LPS length.
In Part 3, we have discussed the code implementation, and we have also looked
at these four cases in a different way and implemented that too.
This article is contributed by Anurag Singh.
Source
http://www.geeksforgeeks.org/manachers-algorithm-linear-time-longest-palindromic-substring-part-2/
Chapter 15
Manacher’s Algorithm -
Linear Time Longest
Palindromic Substring -
Part 3
In Manacher’s Algorithm Part 1 and Part 2, we went through some of the basics,
understood the LPS length array and how to calculate it efficiently based on four
cases. Here we will implement the same.
We have seen that no new character comparisons are needed in Case 1 and
Case 2. In Case 3 and Case 4, new comparisons are necessary.
One way to implement this is to work on the given string itself, but then even
and odd positions must be handled appropriately.
Here we will start with the given string itself. When expansion and
character comparison are required, we will expand to the left and right one
position at a time. When an odd position is reached, the characters are compared
and, if they match, the LPS length is incremented by ONE. When an even position
is reached, no comparison is done and the LPS length is incremented by ONE (so
overall, one odd and one even position on both the left and right sides increase
the LPS length by TWO).
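The expansion step described above can be isolated into a small function. The sketch below is an illustration, not the article's listing: expand_at is a hypothetical name, and it assumes the position numbering used throughout (positions 0 to 2n for a string of length n, where odd position p stands for character text[(p - 1) / 2]).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expands the palindrome centered at position i,
 * starting from an already known LPS length `known`, and returns the
 * final LPS length. Even positions lie between characters and always
 * "match"; odd positions are compared character by character. */
int expand_at(const char *text, int i, int known)
{
    int n = (int)strlen(text);
    int N = 2 * n + 1;          /* position count */
    int len = known;

    while (i + len + 1 < N && i - len - 1 >= 0)
    {
        int right = i + len + 1;
        int left = i - len - 1;  /* same parity as right */

        /* For an odd position p, integer division p / 2 equals (p - 1) / 2 */
        if (right % 2 == 1 && text[right / 2] != text[left / 2])
            break;              /* odd positions: characters differ */
        len++;                  /* even position, or matching characters */
    }
    return len;
}
```

For example, expand_at("abaaba", 6, 0) returns 6, because the whole string is a palindrome centered at the even position 6. Assuming the Case 4 example with center position 11 refers to the string "babcbabcbaccba", expand_at("babcbabcbaccba", 15, 5) returns 5: the very first comparison at distance 6 fails, so L[15] stays at the guaranteed minimum.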
// A C program to implement Manacher’s Algorithm
#include <stdio.h>
#include <string.h>

char text[100];

void findLongestPalindromicString()
{
    int N = strlen(text);
    if(N == 0)
        return;
    N = 2*N + 1; //Position count
    int L[N]; //LPS Length Array
    L[0] = 0;
    L[1] = 1;
    int C = 1; //centerPosition
    int R = 2; //centerRightPosition
    int i = 0; //currentRightPosition
    int iMirror; //currentLeftPosition
    int expand = -1;
    int diff = -1;
    int maxLPSLength = 1; //L[1] = 1 is the longest palindrome seen so far
    int maxLPSCenterPosition = 1;
    int start = -1;
    int end = -1;

    //Uncomment it to print LPS Length array
    //printf("%d %d ", L[0], L[1]);
    for (i = 2; i < N; i++)
    {
        //get currentLeftPosition iMirror for currentRightPosition i
        iMirror = 2*C - i;
        //Reset expand - means no expansion required
        expand = 0;
        diff = R - i;
        //If currentRightPosition i is within centerRightPosition R
        if(diff > 0)
        {
            if(L[iMirror] < diff) // Case 1
                L[i] = L[iMirror];
            //Case 2: center palindrome is a suffix of the input (R is the last position)
            else if(L[iMirror] == diff && R == N-1) // Case 2
                L[i] = L[iMirror];
            else if(L[iMirror] == diff && R < N-1) // Case 3
            {
                L[i] = L[iMirror];
                expand = 1; // expansion required
            }
            else if(L[iMirror] > diff) // Case 4
            {
                L[i] = diff;
                expand = 1; // expansion required
            }
        }
        else
        {
            L[i] = 0;
            expand = 1; // expansion required
        }

        if(expand == 1)
        {
            //Attempt to expand palindrome centered at currentRightPosition i
            //Here for odd positions, we compare characters and
            //if match then increment LPS Length by ONE
            //If even position, we just increment LPS by ONE without
            //any character comparison
            while ( ((i + L[i]) < N && (i - L[i]) > 0) &&
                    ( ((i + L[i] + 1) % 2 == 0) ||
                      (text[(i + L[i] + 1)/2] == text[(i - L[i] - 1)/2] )))
            {
                L[i]++;
            }
        }

        if(L[i] > maxLPSLength) // Track maxLPSLength
        {
            maxLPSLength = L[i];
            maxLPSCenterPosition = i;
        }

        // If palindrome centered at currentRightPosition i
        // expands beyond centerRightPosition R,
        // adjust centerPosition C based on expanded palindrome.
        if (i + L[i] > R)
        {
            C = i;
            R = i + L[i];
        }
        //Uncomment it to print LPS Length array
        //printf("%d ", L[i]);
    }
    //printf("\n");
    start = (maxLPSCenterPosition - maxLPSLength)/2;
    end = start + maxLPSLength - 1;
    //printf("start: %d end: %d\n", start, end);
    printf("LPS of string is %s : ", text);
    for(i=start; i<=end; i++)
        printf("%c", text[i]);
    printf("\n");
}

int main(void)
{
    strcpy(text, "babcbabcbaccba");
    findLongestPalindromicString();
    strcpy(text, "abaaba");
    findLongestPalindromicString();
    strcpy(text, "abababa");
    findLongestPalindromicString();
    strcpy(text, "abcbabcbabcba");
    findLongestPalindromicString();
    strcpy(text, "forgeeksskeegfor");
    findLongestPalindromicString();
    strcpy(text, "caba");
    findLongestPalindromicString();
    strcpy(text, "abacdfgdcaba");
    findLongestPalindromicString();
    strcpy(text, "abacdfgdcabba");
    findLongestPalindromicString();
    strcpy(text, "abacdedcaba");
    findLongestPalindromicString();
    return 0;
}
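Output:
LPS of string is babcbabcbaccba : abcbabcba
LPS of string is abaaba : abaaba
LPS of string is abababa : abababa
LPS of string is abcbabcbabcba : abcbabcbabcba
LPS of string is forgeeksskeegfor : geeksskeeg
LPS of string is caba : aba
LPS of string is abacdfgdcaba : aba
LPS of string is abacdfgdcabba : abba
LPS of string is abacdedcaba : abacdedcaba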
This is the implementation based on the four cases discussed in Part 2. In Part
4, we have discussed a different way to look at these four cases and few other
approaches.
This article is contributed by Anurag Singh.
Source
http://www.geeksforgeeks.org/manachers-algorithm-linear-time-longest-palindromic-substring-part-3-2/
Chapter 16
Manacher’s Algorithm -
Linear Time Longest
Palindromic Substring -
Part 4
In Manacher’s Algorithm Part 1 and Part 2, we went through some of the basics,
understood the LPS length array and how to calculate it efficiently based on four
cases. In Part 3, we implemented the same.
Here we will review the four cases again, look at them differently and
implement that view.
All four cases depend on the LPS length value at currentLeftPosition (L[iMirror])
and the value of (centerRightPosition – currentRightPosition), i.e. (R – i). Both
values are known beforehand, which lets us reuse previously computed
information and avoid unnecessary character comparisons.
If we look at all four cases, we will see that we first set L[i] to the minimum
of L[iMirror] and R-i and then try to expand the palindrome in whichever case it
can expand.
This observation may look more intuitive and easier to understand and
implement, given that one understands the LPS length array, position, index,
symmetry property etc.
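// A C program to implement Manacher’s Algorithm
#include <stdio.h>
#include <string.h>

char text[100];

int min(int a, int b)
{
    int res = a;
    if(b < a)
        res = b;
    return res;
}

void findLongestPalindromicString()
{
    int N = strlen(text);
    if(N == 0)
        return;
    N = 2*N + 1; //Position count
    int L[N]; //LPS Length Array
    L[0] = 0;
    L[1] = 1;
    int C = 1; //centerPosition
    int R = 2; //centerRightPosition
    int i = 0; //currentRightPosition
    int iMirror; //currentLeftPosition
    int maxLPSLength = 1; //L[1] = 1 is the longest palindrome seen so far
    int maxLPSCenterPosition = 1;
    int start = -1;
    int end = -1;
    int diff = -1;

    for (i = 2; i < N; i++)
    {
        //get currentLeftPosition iMirror for currentRightPosition i
        iMirror = 2*C - i;
        L[i] = 0;
        diff = R - i;
        //If currentRightPosition i is within centerRightPosition R,
        //start from the minimum of L[iMirror] and R - i
        if(diff > 0)
            L[i] = min(L[iMirror], diff);

        //Attempt to expand palindrome centered at currentRightPosition i
        //Here for odd positions, we compare characters and
        //if match then increment LPS Length by ONE
        //If even position, we just increment LPS by ONE without
        //any character comparison
        while ( ((i + L[i]) < N && (i - L[i]) > 0) &&
                ( ((i + L[i] + 1) % 2 == 0) ||
                  (text[(i + L[i] + 1)/2] == text[(i - L[i] - 1)/2] )))
        {
            L[i]++;
        }

        if(L[i] > maxLPSLength) // Track maxLPSLength
        {
            maxLPSLength = L[i];
            maxLPSCenterPosition = i;
        }

        // If palindrome centered at currentRightPosition i
        // expands beyond centerRightPosition R,
        // adjust centerPosition C based on expanded palindrome.
        if (i + L[i] > R)
        {
            C = i;
            R = i + L[i];
        }
    }
    start = (maxLPSCenterPosition - maxLPSLength)/2;
    end = start + maxLPSLength - 1;
    printf("LPS of string is %s : ", text);
    for(i=start; i<=end; i++)
        printf("%c", text[i]);
    printf("\n");
}

int main(void)
{
    strcpy(text, "babcbabcbaccba");
    findLongestPalindromicString();
    strcpy(text, "abaaba");
    findLongestPalindromicString();
    strcpy(text, "abababa");
    findLongestPalindromicString();
    strcpy(text, "abcbabcbabcba");
    findLongestPalindromicString();
    strcpy(text, "forgeeksskeegfor");
    findLongestPalindromicString();
    strcpy(text, "caba");
    findLongestPalindromicString();
    strcpy(text, "abacdfgdcaba");
    findLongestPalindromicString();
    strcpy(text, "abacdfgdcabba");
    findLongestPalindromicString();
    strcpy(text, "abacdedcaba");
    findLongestPalindromicString();
    return 0;
}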
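Output:
LPS of string is babcbabcbaccba : abcbabcba
LPS of string is abaaba : abaaba
LPS of string is abababa : abababa
LPS of string is abcbabcbabcba : abcbabcbabcba
LPS of string is forgeeksskeegfor : geeksskeeg
LPS of string is caba : aba
LPS of string is abacdfgdcaba : aba
LPS of string is abacdfgdcabba : abba
LPS of string is abacdedcaba : abacdedcaba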
Other Approaches
We have discussed two approaches here: one in Part 3 and the other in the current
article. In both approaches, we worked on the given string. Here we had to handle
even and odd positions differently while comparing characters for expansion
(because even positions do not represent any character in the string).
To avoid this different handling of even and odd positions, we need to make even
positions also represent some character (actually all even positions should
represent the SAME character because they MUST match during character comparison).
One way to do this is to set some character at all even positions by modifying
the given string or by creating a new copy of it. For example, if the input string is
“abcb”, the new string should be “#a#b#c#b#” if we add # as the unique character
at even positions.
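Building the modified copy is a simple loop. The sketch below assumes the separator character does not occur in the input. The name insert_separators is hypothetical, and the caller is responsible for freeing the returned string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s with `sep`
 * placed at every even position, e.g. "abcb" -> "#a#b#c#b#".
 * Returns NULL if the allocation fails. */
char *insert_separators(const char *s, char sep)
{
    size_t n = strlen(s);
    char *t = malloc(2 * n + 2);   /* 2n + 1 characters plus '\0' */
    if (t == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
    {
        t[2 * i] = sep;            /* even position: separator */
        t[2 * i + 1] = s[i];       /* odd position: original character */
    }
    t[2 * n] = sep;                /* trailing separator */
    t[2 * n + 1] = '\0';
    return t;
}
```

In the new string every position holds a real character, so the expansion loop can compare t[i + k] with t[i - k] directly, with no odd/even distinction.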
The two approaches discussed already can be modified a bit to work on the modified
string, where different handling of even and odd positions will not be needed.
We may also add two DIFFERENT characters (not yet used anywhere in the string
at even and odd positions) at the start and end of the string as sentinels to avoid
bounds checks. With these changes the string “abcb” will look like “^#a#b#c#b#$”, where
^ and $ are sentinels.
This implementation may look cleaner, at the cost of more memory.
We are not implementing these here as it’s a simple change in the given
implementations.
Implementation of approach discussed in current article on a modified string
can be found at Longest Palindromic Substring Part II and a Java Translation
of the same by Princeton.
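For readers who want to see the modified-string variant spelled out, the following is a minimal sketch, not the linked implementation. It assumes the characters ^, # and $ do not occur in the input. The function name longest_palindrome and the array name P are made up for this illustration; P[i] plays the role of the LPS length at index i of the modified string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: finds the longest palindromic substring of s.
 * Writes its start index to *start and returns its length, or -1 if
 * memory allocation fails. Works on the copy "^#s0#s1#...#$". */
int longest_palindrome(const char *s, int *start)
{
    int n = (int)strlen(s);
    int m = 2 * n + 3;                 /* length of the modified string */
    char *t = malloc(m + 1);
    int *P = calloc(m, sizeof *P);
    if (t == NULL || P == NULL)
    {
        free(t);
        free(P);
        return -1;
    }

    t[0] = '^';                        /* left sentinel */
    for (int k = 0; k < n; k++)
    {
        t[2 * k + 1] = '#';
        t[2 * k + 2] = s[k];
    }
    t[2 * n + 1] = '#';
    t[2 * n + 2] = '$';                /* right sentinel */
    t[m] = '\0';

    int C = 0, R = 0, best = 0, bestCenter = 0;
    for (int i = 1; i < m - 1; i++)
    {
        int iMirror = 2 * C - i;
        if (R > i)                     /* start from min(P[iMirror], R - i) */
            P[i] = (P[iMirror] < R - i) ? P[iMirror] : R - i;

        /* The sentinels never match anything, so no bounds check is needed */
        while (t[i + 1 + P[i]] == t[i - 1 - P[i]])
            P[i]++;

        if (i + P[i] > R)              /* palindrome reaches past R */
        {
            C = i;
            R = i + P[i];
        }
        if (P[i] > best)
        {
            best = P[i];
            bestCenter = i;
        }
    }

    *start = (bestCenter - 1 - best) / 2;   /* map back to an index in s */
    free(t);
    free(P);
    return best;
}
```

Called on "forgeeksskeegfor", the function returns length 10 with start index 3, i.e. "geeksskeeg". The extra memory mentioned above shows up as the 2n + 3 characters of the copy in addition to the length array.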
This article is contributed by Anurag Singh.
Source
http://www.geeksforgeeks.org/manachers-algorithm-linear-time-longest-palindromic-substring-part-4/