10 String Algorithms
PROGRAMMING CONTESTS
Jaehyun Park
$\Sigma$: the alphabet (of constant size)
$P_i$: the $i$-th letter of $P$ (1-indexed)
$a$, $b$, $c$: single letters in $\Sigma$
$x$, $y$, $z$: strings
$T$ = AGCATGCTGCAGTCATGCTTAGGCTA
$P$ = GCT
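As a baseline for the example above, here is a minimal sketch of brute-force matching: try every alignment and compare letter by letter. The function name `naive_find_all` and the use of `T` for the text and `P` for the pattern are assumptions made for illustration; positions are reported 1-indexed to match the notation.

```cpp
#include <string>
#include <vector>

// Returns the 1-indexed start positions of every occurrence of P in T.
// Worst-case cost is proportional to |T| * |P|.
std::vector<int> naive_find_all(const std::string& T, const std::string& P) {
    std::vector<int> hits;
    if (P.empty() || P.size() > T.size()) return hits;
    for (std::size_t s = 0; s + P.size() <= T.size(); ++s) {
        std::size_t j = 0;
        // Extend the match at alignment s as far as it goes.
        while (j < P.size() && T[s + j] == P[j]) ++j;
        if (j == P.size()) hits.push_back(static_cast<int>(s) + 1);
    }
    return hits;
}
```

For the example strings, this reports three occurrences, at positions 6, 17 and 23.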
We
Hash Function
Hash Table
Pros:
Easy to implement
Significant speedup in practice
Cons:
Doesn't
Can
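The pros and cons above concern hashing as an aid to string matching. The scheme the slides use is not shown here, so the following is only a sketch of one common approach, a rolling polynomial hash (often called Rabin-Karp). The name `hash_find_all` and the constants `BASE` and `MOD` are assumptions chosen for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns 1-indexed start positions of P in T, using a rolling hash to
// skip most alignments. Every hash hit is verified by direct comparison,
// so a bad hash can only cost time, never correctness.
std::vector<int> hash_find_all(const std::string& T, const std::string& P) {
    const std::uint64_t MOD = 1000000007ULL, BASE = 257ULL;
    std::vector<int> hits;
    const std::size_t n = T.size(), m = P.size();
    if (m == 0 || m > n) return hits;

    std::uint64_t hp = 0, ht = 0, top = 1;  // top = BASE^(m-1) mod MOD
    for (std::size_t i = 0; i < m; ++i) {
        hp = (hp * BASE + static_cast<unsigned char>(P[i])) % MOD;
        ht = (ht * BASE + static_cast<unsigned char>(T[i])) % MOD;
        if (i + 1 < m) top = top * BASE % MOD;
    }
    for (std::size_t s = 0;; ++s) {
        if (hp == ht && T.compare(s, m, P) == 0)
            hits.push_back(static_cast<int>(s) + 1);
        if (s + m >= n) break;
        // Slide the window: drop T[s], append T[s + m].
        ht = (ht + MOD - static_cast<unsigned char>(T[s]) * top % MOD) % MOD;
        ht = (ht * BASE + static_cast<unsigned char>(T[s + m])) % MOD;
    }
    return hits;
}
```

When many windows share the pattern's hash, each one triggers a full comparison, so the worst case is no better than brute force.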
12345678901234567890123
ABC ABCDAB ABCDABCDABDE
   ABCDABD
   1234567
Mismatch at 4 again!
We define $\pi[0] = -1$
We
12345678901234567890123
ABC ABCDAB ABCDABCDABDE
    ABCDABD
    1234567
Mismatch at 11!
Mismatch at 11 again!
12345678901234567890123
ABC ABCDAB ABCDABCDABDE
           ABCDABD
           1234567
Mismatch at 18
12345678901234567890123
ABC ABCDAB ABCDABCDABDE
               ABCDABD
               1234567
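The walkthrough above can be replayed in code. This is a minimal sketch of matching in the style usually called Knuth-Morris-Pratt (KMP), assuming the table for ABCDABD is already known and passed in, with `pi[0] = -1` as a sentinel. The names `Trace` and `kmp_trace` are hypothetical; the function records every text position where a comparison fails, so the output can be checked against the captions.

```cpp
#include <string>
#include <vector>

struct Trace {
    std::vector<int> mismatches;  // 1-indexed text positions of failed comparisons
    std::vector<int> matches;     // 1-indexed start positions of full matches
};

// pi must have |P| + 1 entries, where pi[k] is the length of the longest
// proper prefix of P[1..k] that is also a suffix of it.
Trace kmp_trace(const std::string& T, const std::string& P,
                const std::vector<int>& pi) {
    Trace tr;
    const int n = static_cast<int>(T.size());
    const int m = static_cast<int>(P.size());
    int k = 0;  // number of pattern letters currently matched
    for (int i = 1; i <= n;) {
        if (T[i - 1] == P[k]) {  // compare T_i with P_{k+1}
            ++k;
            ++i;
            if (k == m) {
                tr.matches.push_back(i - m);
                k = pi[m];  // keep the longest border and continue
            }
        } else {
            tr.mismatches.push_back(i);
            if (k == 0) ++i;   // nothing matched: move to the next text letter
            else k = pi[k];    // shift the pattern by k - pi[k] letters
        }
    }
    return tr;
}
```

With T = "ABC ABCDAB ABCDABCDABDE", P = "ABCDABD" and pi = (-1, 0, 0, 0, 0, 1, 2, 0), the recorded mismatches are at text positions 4, 4, 11, 11, 11 and 18, followed by a match starting at position 16 and one last mismatch at position 23. The text index `i` never moves backwards, which is where the linear running time comes from.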
Computing $\pi$
Observation 1: if $P_1 \ldots P_{\pi[i]}$ is a suffix of $P_1 \ldots P_i$, then $P_1 \ldots P_{\pi[i]-1}$ is a suffix of $P_1 \ldots P_{i-1}$
Well, obviously
$P_1 \ldots P_{\pi[i]}$, $P_1 \ldots P_{\pi[\pi[i]]}$, $P_1 \ldots P_{\pi[\pi[\pi[i]]]}$ are all suffixes of $P_1 \ldots P_i$
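A non-obvious conclusion:
First, write $\pi^{(k)}[i]$ for $\pi[\cdot]$ applied $k$ times to $i$, e.g. $\pi^{(2)}[i] = \pi[\pi[i]]$
$\pi[i] = \pi^{(k)}[i-1] + 1$, where $k$ is the smallest integer satisfying $P_{\pi^{(k)}[i-1]+1} = P_i$
If there is no such $k$, $\pi[i] = 0$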
Implementation
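// Assumes P[1..m] holds the pattern (1-indexed) and pi has m + 1 entries.
pi[0] = -1;                  // sentinel that stops the while loop below
int k = -1;                  // invariant: k == pi[i-1] at the top of each iteration
for (int i = 1; i <= m; i++) {
    // Follow pi links until P[k+1] extends the matched prefix, or none is left.
    while (k >= 0 && P[k+1] != P[i])
        k = pi[k];
    pi[i] = ++k;             // longest proper prefix of P[1..i] that is also its suffix
}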
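To try the loop above on real input, here is a self-contained sketch that wraps it in a function. The name `compute_pi` and the padding trick (prepending one character so the pattern is 1-indexed, as the fragment expects) are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Returns pi[0..m] for the given pattern, with pi[0] = -1.
std::vector<int> compute_pi(const std::string& pattern) {
    const int m = static_cast<int>(pattern.size());
    const std::string P = " " + pattern;  // pad so that P[1..m] is the pattern
    std::vector<int> pi(m + 1);
    pi[0] = -1;
    int k = -1;
    for (int i = 1; i <= m; i++) {
        while (k >= 0 && P[k + 1] != P[i])
            k = pi[k];
        pi[i] = ++k;
    }
    return pi;
}
```

For the pattern ABCDABD this yields pi[1..7] = (0, 0, 0, 0, 1, 2, 0). Each iteration raises `k` by at most one and the inner loop only lowers it, so the total work is linear in the pattern length.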
Suffix Trie
Incremental Construction
Construction Example
[Figure: suffix trie during construction, with edge labels a, b, b, a]
3. Make a c-transition at
e.g.
= aaabbb
Pattern Matching
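A minimal sketch of pattern matching on a suffix trie: walk down from the root one pattern letter at a time, and the pattern occurs in the text exactly when the walk never gets stuck. For brevity this assumes a naive construction that inserts every suffix separately, not the incremental construction described earlier. The `SuffixTrie` type and its members are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct SuffixTrie {
    // nodes[v] maps a letter to the index of the child reached by that letter.
    std::vector<std::map<char, int>> nodes;

    explicit SuffixTrie(const std::string& text) : nodes(1) {  // node 0 is the root
        for (std::size_t start = 0; start < text.size(); ++start) {
            int cur = 0;
            for (std::size_t j = start; j < text.size(); ++j) {
                auto it = nodes[cur].find(text[j]);
                if (it != nodes[cur].end()) {
                    cur = it->second;
                } else {
                    nodes.emplace_back();  // may reallocate, so index afterwards
                    const int fresh = static_cast<int>(nodes.size()) - 1;
                    nodes[cur][text[j]] = fresh;
                    cur = fresh;
                }
            }
        }
    }

    // True if pattern is a substring of the text; takes time linear in |pattern|.
    bool contains(const std::string& pattern) const {
        int cur = 0;
        for (char c : pattern) {
            auto it = nodes[cur].find(c);
            if (it == nodes[cur].end()) return false;
            cur = it->second;
        }
        return true;
    }
};
```

Each node corresponds to one distinct substring, so the trie for aaabbb has 16 nodes including the root. That count grows quadratically for strings of this shape.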
Suffix Array
Input string: BANANA

Suffixes        Sorted suffixes
1  BANANA       6  A
2  ANANA        4  ANA
3  NANA         2  ANANA
4  ANA          1  BANANA
5  NA           5  NA
6  A            3  NANA

Suffix array: 6,4,2,1,5,3
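The BANANA example can be reproduced with a short sketch that sorts suffix start positions by comparing the suffixes directly. This is a naive construction, assumed here only for illustration, and far slower than the specialised algorithms. The name `naive_suffix_array` is hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Returns the 1-indexed start positions of the suffixes of s in sorted order.
// Each comparison may scan O(n) letters, so this costs O(n^2 log n) overall.
std::vector<int> naive_suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // Compare suffix s[a..] with suffix s[b..] lexicographically.
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    for (int& start : sa) ++start;  // switch to the 1-indexed convention
    return sa;
}
```

Calling `naive_suffix_array("BANANA")` gives 6, 4, 2, 1, 5, 3.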
Memory usage is $O(n)$
Has the same computational power as a suffix trie
Can be constructed in $O(n)$ time (!)
But