i  Suffix         i  Suffix
0  banana         5  a
1  anana          3  ana
2  nana     ⟹     1  anana
3  ana            0  banana
4  na             4  na
5  a              2  nana
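The table above can be reproduced with a few lines of code. This is only a minimal sketch that illustrates the definition of a suffix array: the function name `naiveSuffixArray` is hypothetical, and the approach (comparison sorting of whole suffixes) is a naive baseline, not the linear-time construction discussed in these slides.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort the start positions 0..n-1 by comparing the
// suffixes that begin there. Worst-case time is far from linear; this is
// meant only to illustrate what a suffix array contains.
std::vector<int> naiveSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        // lexicographic comparison of s[a..] and s[b..]
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}
```

For the input banana this should yield the order shown in the right-hand column: 5, 3, 1, 0, 4, 2.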
Applications
- full text indexing
- Burrows-Wheeler transform (bzip2 compressor)
- replacement for more complex suffix tree
Ordered alphabet
- only comparisons of characters allowed
Constant alphabet
- ordered alphabet of constant size
- multiset of characters can be sorted in linear time
Integer alphabet
- alphabet is {1, ..., σ} for integer σ ≥ 2
- multiset of k characters can be sorted in O(k + σ) time
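The O(k + σ) bound for the integer alphabet is what counting sort achieves. The following is a minimal sketch under that assumption; the function name `countingSort` and its vector-based interface are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <vector>

// Sort k characters drawn from the integer alphabet {1, ..., sigma}.
// One pass over the input (k steps) plus one pass over the counters
// (sigma steps) gives O(k + sigma) time.
std::vector<int> countingSort(const std::vector<int>& chars, int sigma) {
    std::vector<int> count(sigma + 1, 0);  // count[c] = occurrences of c
    for (int c : chars) {
        assert(1 <= c && c <= sigma);      // precondition of this sketch
        ++count[c];
    }
    std::vector<int> sorted;
    sorted.reserve(chars.size());
    for (int c = 1; c <= sigma; ++c)
        for (int i = 0; i < count[c]; ++i) sorted.push_back(c);
    return sorted;
}
```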
positions   012345  ->  135024
characters  banana  ->  aaabnn
names       213131  <-  111233
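The renaming step in this example (each character replaced by its rank among the distinct characters, so that banana becomes 213131) might be sketched as follows. The helper name `nameCharacters` is hypothetical, and the sketch uses comparison sorting for brevity rather than the linear-time sorting the alphabet models above allow.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Replace every character by its 1-based rank among the distinct
// characters of s, turning an ordered alphabet into an integer alphabet.
std::vector<int> nameCharacters(const std::string& s) {
    std::string alphabet(s);
    std::sort(alphabet.begin(), alphabet.end());
    alphabet.erase(std::unique(alphabet.begin(), alphabet.end()), alphabet.end());

    std::vector<int> names;
    names.reserve(s.size());
    for (char c : s) {
        auto it = std::lower_bound(alphabet.begin(), alphabet.end(), c);
        names.push_back(static_cast<int>(it - alphabet.begin()) + 1);
    }
    return names;
}
```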
S = banana

5  a              5  a
3  ana            3  ana
1  anana          1  anana
   +        ⟹
0  banana         0  banana
4  na             4  na
2  nana           2  nana
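Example: S = anananas.

position   0 1 2 3 4 5 6 7 8
character  a n a n a n a s .
rank       - 4 2 - 3 5 - 6 1

S^12 = 325241
recursive call: suffix array 531042
lex. names (ranks) among the suffixes at positions i mod 3 ≠ 0: 436251

Example: S = chihuahua

position   0 1 2 3 4 5 6 7 8
character  c h i h u a h u a
rank       - 3 4 - 6 2 - 5 1

S^12 = 365421 (names already unique)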
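The strings S^12 in these two examples can be recomputed by naming the character triples that start at positions i mod 3 = 1 and i mod 3 = 2. The sketch below makes several assumptions that are not in the slides: the function name `sampleTripleNames` is hypothetical, triples are padded with 0 past the end of the text, comparison sorting stands in for radix sort, and the dummy-suffix padding that a full implementation may need for some text lengths is ignored.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Names (1-based ranks) of the character triples starting at the sample
// positions, listed first for i mod 3 == 1 and then for i mod 3 == 2.
std::vector<int> sampleTripleNames(const std::string& s) {
    const int n = static_cast<int>(s.size());
    // character at position i, with 0 as padding past the end (assumption)
    auto at = [&s, n](int i) -> int {
        return i < n ? static_cast<unsigned char>(s[i]) : 0;
    };

    using Triple = std::array<int, 3>;
    std::vector<Triple> triples;
    for (int start = 1; start <= 2; ++start) {
        for (int p = start; p < n; p += 3) {
            Triple t = {{at(p), at(p + 1), at(p + 2)}};
            triples.push_back(t);
        }
    }

    // distinct triples in sorted order; a triple's name is its index + 1
    std::vector<Triple> sorted(triples);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());

    std::vector<int> names;
    names.reserve(triples.size());
    for (const Triple& t : triples) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), t);
        names.push_back(static_cast<int>(it - sorted.begin()) + 1);
    }
    return names;
}
```

When all names are distinct, as for chihuahua, they already give the order of the sample suffixes; otherwise, as for anananas., the name string is sorted recursively.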
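// lexicographic order for pairs: (a1,a2) <= (b1,b2)
inline bool leq(int a1, int a2, int b1, int b2) {
  return a1 < b1 || (a1 == b1 && a2 <= b2);
}
// lexicographic order for triples: (a1,a2,a3) <= (b1,b2,b3)
inline bool leq(int a1, int a2, int a3, int b1, int b2, int b3) {
  return a1 < b1 || (a1 == b1 && leq(a2, a3, b2, b3));
}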
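For comparison, the same lexicographic tests can be expressed with the standard library's tuple comparison. This is only an equivalent sketch, not the slides' code; the names `leqPair` and `leqTriple` are hypothetical.

```cpp
#include <tuple>

// (a1,a2) <= (b1,b2) in lexicographic order, via std::tuple comparison
inline bool leqPair(int a1, int a2, int b1, int b2) {
    return std::make_tuple(a1, a2) <= std::make_tuple(b1, b2);
}

// (a1,a2,a3) <= (b1,b2,b3) in lexicographic order
inline bool leqTriple(int a1, int a2, int a3, int b1, int b2, int b3) {
    return std::make_tuple(a1, a2, a3) <= std::make_tuple(b1, b2, b3);
}
```

The hand-written versions avoid constructing temporaries, which may matter in the inner loop of a merge; the tuple form is mainly easier to read.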
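// collect the mod 0 positions in the order given by SA12,
// then stably sort them by their first character
for (int i = 0, j = 0; i < n02; i++)
  if (SA12[i] < n0) s0[j++] = 3 * SA12[i];
radixPass(s0, SA0, s, n0, K);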
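The slide calls `radixPass` without showing it. A minimal sketch of such a pass, with the parameter meaning inferred from the call above (an assumption: `a` is the input order, `b` the output, `r` the key array indexed by position, `n` the number of items, and keys lie in 0..K), could be one stable counting-sort round:

```cpp
#include <vector>

// Stably sort a[0..n-1] into b[0..n-1], where the key of item a[i] is
// r[a[i]] and all keys are in 0..K. Runs in O(n + K) time.
void radixPass(const int* a, int* b, const int* r, int n, int K) {
    std::vector<int> c(K + 1, 0);                  // counter array
    for (int i = 0; i < n; i++) c[r[a[i]]]++;      // count occurrences
    for (int i = 0, sum = 0; i <= K; i++) {        // exclusive prefix sums
        int t = c[i];
        c[i] = sum;
        sum += t;
    }
    for (int i = 0; i < n; i++) b[c[r[a[i]]]++] = a[i];  // place items
}
```

Because the pass is stable, items with equal keys keep the relative order they had in `a`, which is what lets the loop above pre-order the mod 0 positions before sorting by first character.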
- tuning
- use larger difference covers
- external memory implementation
- parallel implementation
- combine with best algorithms for easy inputs [Manzini Ferragina 02, Schürmann Stoye 05]
Future/Ongoing Work
- Implementation (internal/external/parallel)
- Large scale applications