Suffix Trees and Suffix Arrays
Trie
• A tree representing a set of strings.
Example: { aeef, ad, bbfe, bbfg, c }
[Figure: trie for the set above; each edge is labeled by a single character]
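A minimal sketch of a trie over the example set, using nested dictionaries. The names build_trie, contains and END are illustrative assumptions, not taken from the slides.

```python
END = None  # dictionary key marking the end of a stored string (assumed convention)

def build_trie(words):
    """Build a trie as nested dicts: one dict per node, one key per outgoing edge."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # Follow the edge labeled ch, creating it if it does not exist yet.
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def contains(root, word):
    """Return True if word is one of the stored strings."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

trie = build_trie(["aeef", "ad", "bbfe", "bbfg", "c"])
print(contains(trie, "ad"), contains(trie, "bbf"))
```

A lookup costs time proportional to the length of the query string, independent of how many strings are stored.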
Trie (Cont)
• Assume no string is a prefix of another
Compressed Trie
• Compress unary nodes, label edges by strings
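[Figure: the trie before and after compression; for example, the chain e, e, f becomes one edge labeled eef, and the chain b, b, f becomes one edge labeled bbf]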
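The compression step can be sketched as follows, assuming a nested-dictionary trie and assuming no stored string is a prefix of another. The helper names build_trie, compress and END are hypothetical.

```python
END = None  # dictionary key marking the end of a stored string (assumed convention)

def build_trie(words):
    """Plain trie as nested dicts, one character per edge."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def compress(node):
    """Merge every chain of unary nodes into a single edge labeled by a string."""
    out = {}
    for ch, child in node.items():
        if ch is END:
            out[END] = True
            continue
        label = ch
        # Extend the label while the child has exactly one outgoing edge
        # and is not itself the end of a stored string.
        while len(child) == 1 and END not in child:
            (next_ch, grandchild), = child.items()
            label += next_ch
            child = grandchild
        out[label] = compress(child)
    return out

compressed = compress(build_trie(["aeef", "ad", "bbfe", "bbfg", "c"]))
print(sorted(compressed.keys()))
```

After compression every internal node has at least two children, so a compressed trie with k leaves has fewer than 2k nodes.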
Suffix tree
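Given a string s, the suffix tree of s is a
compressed trie of all suffixes of s.
A terminator $ that occurs nowhere else in s is appended,
so that no suffix is a prefix of another.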
Example: s = abab$, with suffixes { $, b$, ab$, bab$, abab$ }
[Figure: suffix tree of abab$]
Trivial algorithm to build a Suffix tree
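Put the longest suffix, abab$, in the tree.
[Figure: a single path from the root spelling a, b, a, b, $]
Put the suffix bab$ in the tree.
[Figure: two paths from the root, spelling abab$ and bab$]
Insert the remaining suffixes ab$, b$ and $ the same way.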
We will also label each leaf with the starting position of the
corresponding suffix.
[Figure: suffix tree of abab$ with leaves labeled 1 (abab$), 2 (bab$), 3 (ab$), 4 (b$), 5 ($)]
Analysis
Takes O(n^2) time to build: inserting a suffix of length k costs O(k),
and the suffix lengths sum to O(n^2).
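A minimal sketch of the trivial construction, building the compressed tree directly by splitting an edge wherever a new suffix diverges in its middle. The Node class and the function names are assumptions made for illustration; the input is assumed to end with a unique terminator.

```python
class Node:
    def __init__(self):
        self.children = {}  # first character of edge label -> (edge label, child Node)
        self.leaf = None    # 1-based start of the suffix; set on leaves only

def build_suffix_tree(s):
    """Naive O(n^2) suffix tree; s must end with a terminator that occurs nowhere else."""
    root = Node()
    for start in range(len(s)):  # longest suffix first
        node, rest = root, s[start:]
        while True:
            entry = node.children.get(rest[0])
            if entry is None:
                # No edge starts with the next character: hang a new leaf here.
                leaf = Node()
                leaf.leaf = start + 1
                node.children[rest[0]] = (rest, leaf)
                break
            label, child = entry
            k = 0  # number of characters of the edge label matched by rest
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k < len(label):
                # Mismatch inside the edge: split it at offset k.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[label[0]] = (label[:k], mid)
                child = mid
            node, rest = child, rest[k:]

    return root

def leaves(node):
    """Leaf labels in lexicographic order of the edges (DFS)."""
    if node.leaf is not None:
        return [node.leaf]
    result = []
    for ch in sorted(node.children):
        result.extend(leaves(node.children[ch][1]))
    return result

tree = build_suffix_tree("abab$")
print(leaves(tree))
```

The unique terminator guarantees that the remaining suffix never runs out in the middle of the walk, so the loop always ends by creating a leaf.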
To search for a pattern, traverse the suffix tree from the root,
following the characters of the pattern.
If we do not get stuck, the pattern occurs in the text.
Each leaf in the subtree below the point we reach corresponds
to an occurrence.
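The search described above can be sketched as follows. For brevity this assumes an uncompressed suffix trie stored as nested dictionaries (quadratic space); the traversal logic is the same on a compressed suffix tree. The names build_suffix_trie, occurrences and LEAF are hypothetical.

```python
LEAF = None  # dictionary key holding a leaf's label; never equal to a character

def build_suffix_trie(text):
    """Insert every suffix of text; each leaf stores the 1-based start of its suffix."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node[LEAF] = i + 1
    return root

def occurrences(root, pattern):
    """Return the sorted 1-based starting positions of pattern in the text."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []  # stuck: the pattern does not occur
        node = node[ch]
    # Every leaf below the reached node is one occurrence.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is LEAF:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)

trie = build_suffix_trie("abab$")
print(occurrences(trie, "ab"))
```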
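Let s = cbaaba, let s^r = abaabc be its reverse, and let m = |s| = 6 (positions are 1-based).
The maximal palindrome with center between i-1 and i is given by the LCP
of the suffix at position i of s and the suffix at position m-i+2 of s^r:
an LCP of length k yields a palindrome of length 2k.
Example: for i = 4 the two suffixes are aba and abc, their LCP is ab, and the palindrome is baab.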
Maximal palindromes algorithm
Prepare a generalized suffix tree for
s = cbaaba$ and s^r = abaabc#
[Figure: generalized suffix tree of cbaaba$ and abaabc#; each leaf is labeled with the starting position (1 to 7) of its suffix in s or in s^r]
Analysis
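O(n) time to identify all maximal palindromes, assuming the tree is built
in linear time and each LCP is answered in O(1) as a lowest common ancestor query.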
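A small sketch of the LCP idea for even-length palindromes. It compares the two suffixes directly, which takes O(n^2) time overall, instead of using a generalized suffix tree with constant-time LCA queries. It assumes 1-based positions and m = |s| without a terminator, so the matching suffix of the reverse starts at position m-i+2. The function names are illustrative.

```python
def lcp_len(a, b):
    """Length of the longest common prefix of a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k

def maximal_even_palindromes(s):
    """Map each center i (between positions i-1 and i, 1-based) to its maximal palindrome."""
    m = len(s)
    rev = s[::-1]
    result = {}
    for i in range(2, m + 1):
        # Suffix of s at position i, and suffix of the reverse at position m-i+2.
        k = lcp_len(s[i - 1:], rev[m - i + 1:])
        # The palindrome spans k characters on each side of the center.
        result[i] = s[i - 1 - k:i - 1 + k]
    return result

print(maximal_even_palindromes("cbaaba")[4])
```

Odd-length palindromes centered at position i can be handled the same way by pairing the suffix at position i of s with the suffix at position m-i+1 of the reverse.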
Drawbacks
• Suffix trees consume a lot of space
Let s = abab
Sort the suffixes lexicographically:
ab, abab, b, bab
The suffix array gives the indices of the
suffixes in sorted order
3 1 4 2
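For a quick check of the example, the suffix array can be computed by sorting the suffix start positions directly. This is a naive sketch (not linear time), and the function name is an assumption.

```python
def suffix_array(s):
    """1-based starting positions of the suffixes of s, in lexicographic order."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])

print(suffix_array("abab"))
```

The array stores only n integers, which is the space advantage over the suffix tree.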
How do we build it?
• Build a suffix tree
• Traverse the tree depth-first, following the edges out of each node
in lexicographic order, and append each leaf's label to the
suffix array as it is reached.
• O(n) time, given the suffix tree
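The traversal can be sketched as follows. To stay short, this assumes an uncompressed suffix trie built naively (quadratic, not O(n)), and a terminator "$" that sorts before every other character; the DFS itself is the step the bullets describe. All names are hypothetical.

```python
def suffix_array_via_dfs(s, terminator="$"):
    """Build a suffix trie of s + terminator, then read the leaves in lexicographic DFS order."""
    text = s + terminator
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node[None] = i + 1  # leaf marker: 1-based start of the suffix

    order = []

    def dfs(node):
        if None in node:
            order.append(node[None])
        # Visit outgoing edges in lexicographic order of their labels.
        for ch in sorted(key for key in node if key is not None):
            dfs(node[ch])

    dfs(root)
    # Drop the suffix that consists of the terminator alone.
    return [i for i in order if i <= len(s)]

print(suffix_array_via_dfs("abab"))
```

The recursion depth equals the length of the text plus one, so this sketch is only suitable for short inputs.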
How do we search for a pattern?
• If P occurs in T then all its occurrences are
consecutive in the suffix array.
• So we can binary search for P over the sorted suffixes.
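Because the occurrences are consecutive, two binary searches over the suffix array locate the whole block. The sketch below assumes the suffix array holds 1-based positions; find_occurrences is an illustrative name.

```python
def find_occurrences(text, sa, pattern):
    """Return the sorted 1-based positions where pattern occurs in text."""
    m = len(pattern)

    def prefix(idx):
        # First m characters of the idx-th smallest suffix.
        start = sa[idx] - 1
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])

print(find_occurrences("abab", [3, 1, 4, 2], "ab"))
```

Each comparison costs up to |P| character comparisons, so the search takes O(|P| log |T|) time.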
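Maintain l = LCP(P, L)
Maintain r = LCP(P, R)
where L and R are the suffixes at the left and right ends of the current binary-search interval.
[Figure: the search interval with boundary suffixes L and R; P matches the first l characters of L and the first r characters of R]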
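One simple way to exploit l and r, sketched under stated assumptions: every suffix between L and R shares at least min(l, r) characters with P, so the comparison with the middle suffix can start at that offset. This is a simpler variant and not necessarily the refinement the following slides describe; the function names are hypothetical and the suffix array is assumed to hold 1-based positions.

```python
def lcp_from(a, b, start):
    """Length of the common prefix of a and b, given that the first start characters match."""
    k = start
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k

def occurs(text, sa, pattern):
    """Return True if pattern occurs in text, maintaining l and r during binary search."""
    if not sa:
        return pattern == ""

    def suffix(idx):
        return text[sa[idx] - 1:]

    def pattern_is_smaller(suf, k):
        # pattern and suf agree on k characters and k < len(pattern).
        return k < len(suf) and pattern[k] < suf[k]

    lo, hi = 0, len(sa) - 1
    l = lcp_from(pattern, suffix(lo), 0)
    if l == len(pattern):
        return True
    if pattern_is_smaller(suffix(lo), l):
        return False  # pattern precedes the smallest suffix
    r = lcp_from(pattern, suffix(hi), 0)
    if r == len(pattern):
        return True
    if not pattern_is_smaller(suffix(hi), r):
        return False  # pattern follows the largest suffix

    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Skip the min(l, r) characters already known to match.
        k = lcp_from(pattern, suffix(mid), min(l, r))
        if k == len(pattern):
            return True
        if pattern_is_smaller(suffix(mid), k):
            hi, r = mid, k
        else:
            lo, l = mid, k
    return False

print(occurs("banana", [6, 4, 2, 1, 5, 3], "nan"))
```

This variant saves comparisons in practice but does not improve the worst-case bound by itself.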
How do we accelerate the search?
If l > r then