09 Suffix Trees
Michael T. Goodrich
University of California, Irvine
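Trie
A trie for the set of strings { aeef, ad, bbfe, bbfg, c }: each edge is labeled by one character, and each string is spelled out along a path from the root.
[Figure: trie of aeef, ad, bbfe, bbfg, c]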
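A minimal Python sketch of the trie above, assuming a dict-of-children node. The names `TrieNode`, `build_trie`, `contains` and `count_nodes` are hypothetical, not from the slides. Each string is inserted one character per edge, and membership is answered by walking down from the root.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node (one edge per character)
        self.is_end = False   # True if a stored string ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            # Reuse the existing edge for ch, or create it.
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def contains(root, w):
    node = root
    for ch in w:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_end


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())


trie = build_trie(["aeef", "ad", "bbfe", "bbfg", "c"])
print(sorted(trie.children))                          # ['a', 'b', 'c']
print(contains(trie, "bbfg"), contains(trie, "bbf"))  # True False
print(count_nodes(trie))                              # 12 (root + 11 characters)
```

Note that `bbf` is a path in the trie but not a stored string, which is why each node carries an `is_end` flag.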
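Trie (Cont)
Assume no string is a prefix of another, so that every string ends at a leaf.
Compressing every chain of single-child nodes into one edge gives a compressed trie whose edges are labeled by substrings (here a, d, eef, bbf, e, g, c).
[Figure: compressed trie of aeef, ad, bbfe, bbfg, c]
Children of each node can still be indexed by a character from the alphabet (the first character of the edge's substring).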
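One way to obtain the compressed trie, sketched in Python under the assumption that we first build the one-character-per-edge trie: follow every chain of nodes that have a single child and end no string, and merge its characters into one edge label. The result maps the first character of each edge label to `(label, subtree)`, so children stay indexed by a single alphabet character. All names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_end = False   # True if a stored string ends here


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def compress(node):
    """Return {first char: (edge label, compressed subtree)} for node's children."""
    out = {}
    for ch, child in node.children.items():
        label = ch
        # Merge a chain of single-child nodes at which no string ends.
        while len(child.children) == 1 and not child.is_end:
            (nxt, grandchild), = child.children.items()
            label += nxt
            child = grandchild
        out[ch] = (label, compress(child))
    return out


ct = compress(build_trie(["aeef", "ad", "bbfe", "bbfg", "c"]))
print(sorted(label for label, _ in ct.values()))           # ['a', 'bbf', 'c']
print(sorted(label for label, _ in ct["a"][1].values()))   # ['d', 'eef']
print(sorted(label for label, _ in ct["b"][1].values()))   # ['e', 'g']
```

The edge to `bbf` is found with the single lookup `ct["b"]`, which is the indexing-by-first-character property stated above.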
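Suffix tree
Given a string s, a suffix tree of s is a compressed trie of all suffixes of s.
Example: s = abab$ has the suffixes { $, b$, ab$, bab$, abab$ }; the terminator $ guarantees that no suffix is a prefix of another.
[Figure: suffix tree of abab$; each edge label is stored as an index pair (i, j) meaning s[i..j], counted from 0, e.g. (0,1) = ab, (2,4) = ab$, (3,3) = b, (4,4) = $]
O(n) space: the tree has n leaves and at most n internal nodes (every internal node other than the root has at least two children), and each edge label is stored as two indices, so the whole tree takes O(n) space.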
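A naive Python sketch of a suffix tree whose edge labels are stored as index pairs (i, j) into s, 0-based and inclusive as in the figure, so each edge costs two integers however long its label is. It inserts the suffixes one by one (the trivial quadratic method); `Node`, `build` and `stats` are hypothetical names. Index pairs are not unique: this code labels the root's b edge (1, 1), while the figure uses (3, 3); both spell b.

```python
class Node:
    def __init__(self):
        # first character of the edge label -> (i, j, child); label = s[i..j] inclusive
        self.children = {}


def build(s):
    """Naively build the suffix tree of s; s must end with a unique terminator like '$'."""
    root = Node()
    for start in range(len(s)):
        node, pos = root, start          # pos = first unmatched index of this suffix
        while True:
            c = s[pos]
            if c not in node.children:   # nothing matches: add leaf edge s[pos..n-1]
                node.children[c] = (pos, len(s) - 1, Node())
                break
            i, j, child = node.children[c]
            k = 0                        # how far the suffix matches along this edge
            while i + k <= j and s[i + k] == s[pos + k]:
                k += 1
            if i + k > j:                # matched the whole edge: descend
                node, pos = child, pos + k
            else:                        # mismatch inside the edge: split it
                mid = Node()
                node.children[c] = (i, i + k - 1, mid)
                mid.children[s[i + k]] = (i + k, j, child)
                node, pos = mid, pos + k
    return root


def stats(node):
    """Return (internal nodes, leaves) in the subtree rooted at node."""
    if not node.children:
        return 0, 1
    internal, leaves = 1, 0
    for _, _, child in node.children.values():
        a, b = stats(child)
        internal, leaves = internal + a, leaves + b
    return internal, leaves


s = "abab$"
root = build(s)
print(sorted((s[i:j + 1], (i, j)) for i, j, _ in root.children.values()))
# [('$', (4, 4)), ('ab', (0, 1)), ('b', (1, 1))]
print(stats(root))  # (3, 5): 3 internal nodes, 5 leaves
```

The unique terminator is what keeps `s[pos + k]` in range: a suffix can never be used up before it branches off.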
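Trivial algorithm to build a Suffix tree
Put the largest suffix (abab$) in.
Put the next largest suffix (bab$) in.
Continue with ab$, b$, and $. Each insertion walks down from the root matching the suffix; where the match stops, the edge is split if necessary, and a new leaf edge holds the rest of the suffix.
[Figure: the tree after inserting abab$, after inserting bab$, and after inserting all suffixes]
We also label each leaf with the starting point of the corresponding suffix (1 for abab$, 2 for bab$, 3 for ab$, 4 for b$, 5 for $).
[Figure: suffix tree of abab$ with leaves labeled 1 to 5]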
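The trivial algorithm, sketched in Python with edge labels kept as plain strings for readability (a hypothetical representation; index pairs would save space). Suffixes are inserted from the largest down, an edge is split where the match stops inside it, and each new leaf records the 1-based starting point of its suffix, as in the figure.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first char of edge label -> (edge label, child)
        self.leaf = leaf    # starting point of the suffix (None for internal nodes)


def insert(root, suffix, leaf):
    """Insert one suffix (ending in a unique terminator) into the compressed trie."""
    node = root
    while True:
        c = suffix[0]
        if c not in node.children:                 # no edge starts with c: new leaf
            node.children[c] = (suffix, Node(leaf))
            return
        label, child = node.children[c]
        k = 0
        while k < len(label) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                         # mismatch inside the edge: split it
            mid = Node()
            node.children[c] = (label[:k], mid)
            mid.children[label[k]] = (label[k:], child)
            child = mid
        node, suffix = child, suffix[k:]           # continue below the matched part


def naive_suffix_tree(s):
    """Insert the suffixes of s largest first; s must end with a unique terminator."""
    root = Node()
    for i in range(len(s)):
        insert(root, s[i:], i + 1)   # leaf label = 1-based starting point
    return root


def leaves(node, path=""):
    """Yield (string spelled from the root, leaf label) for every leaf below node."""
    if node.leaf is not None:
        yield path, node.leaf
    for label, child in node.children.values():
        yield from leaves(child, path + label)


print(sorted(leaves(naive_suffix_tree("abab$")), key=lambda p: p[1]))
# [('abab$', 1), ('bab$', 2), ('ab$', 3), ('b$', 4), ('$', 5)]
```

Each root-to-leaf path spells exactly the suffix that starts at the leaf's label.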
Analysis
Naively, this takes O(n^2) time to build in the worst case: inserting the suffix that starts at position i may compare all of its n - i + 1 characters.
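The count behind the bound, as a short derivation: with n = |s| (terminator included), the suffix starting at position i has n - i + 1 characters, and the naive insertion compares at most that many.

```latex
\sum_{i=1}^{n} (n - i + 1) \;=\; \sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
```

The bound is reached up to a constant factor, for example on s = a^{n-1}$: the suffix starting at position i >= 2 walks past n - i matching characters of the previously inserted a^{n-i+1}$ before branching, so the matched characters alone total (n - 1)(n - 2)/2.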
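To search for a pattern, traverse the tree from the root following the characters of the pattern.
If we do not get stuck traversing the pattern, then the pattern occurs in the text.
Each leaf in the subtree below the node we reach (or the node just below, if the pattern ends inside an edge) corresponds to an occurrence; its leaf label is the starting position.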
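A Python sketch of this search, assuming a naive string-labeled suffix tree built by inserting the suffixes one at a time (rebuilt here so the block runs on its own; all names are hypothetical). The walk fails as soon as a character does not match; otherwise it collects the leaf labels below the point where the pattern ends.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first char of edge label -> (edge label, child)
        self.leaf = leaf    # 1-based starting point (None for internal nodes)


def insert(root, suffix, leaf):
    node = root
    while True:
        c = suffix[0]
        if c not in node.children:
            node.children[c] = (suffix, Node(leaf))
            return
        label, child = node.children[c]
        k = 0
        while k < len(label) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                   # split the edge where the match stops
            mid = Node()
            node.children[c] = (label[:k], mid)
            mid.children[label[k]] = (label[k:], child)
            child = mid
        node, suffix = child, suffix[k:]


def suffix_tree(s):
    root = Node()
    for i in range(len(s)):
        insert(root, s[i:], i + 1)
    return root


def leaves(node):
    """Yield the labels of all leaves below node."""
    if node.leaf is not None:
        yield node.leaf
    for _, child in node.children.values():
        yield from leaves(child)


def occurrences(root, pattern):
    """1-based starting positions of pattern in the text, in increasing order."""
    node, i = root, 0
    while i < len(pattern):
        if pattern[i] not in node.children:
            return []                        # stuck at a node
        label, child = node.children[pattern[i]]
        k = 0
        while k < len(label) and i + k < len(pattern):
            if label[k] != pattern[i + k]:
                return []                    # stuck inside an edge
            k += 1
        node, i = child, i + k               # pattern may end inside this edge
    return sorted(leaves(node))


t = suffix_tree("abab$")
print(occurrences(t, "ab"))    # [1, 3]
print(occurrences(t, "bab"))   # [2]
print(occurrences(t, "abb"))   # []
```

The walk costs O(m) for a pattern of length m, plus time proportional to the size of the subtree whose leaves are reported.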
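Generalized suffix tree: the compressed trie of all suffixes of two strings, here abab$ and aab#, each ending with its own terminator:
{ $, b$, ab$, bab$, abab$, #, b#, ab#, aab# }
[Figure: generalized suffix tree of abab$ and aab#; leaves labeled with starting positions 1 to 5 (abab$) and 1 to 4 (aab#)]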
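A Python sketch of the generalized suffix tree, assuming the naive one-suffix-at-a-time insertion and leaf labels of the hypothetical form (string number, 1-based start). Because $ and # are distinct and occur nowhere else, no suffix of either string is a prefix of another suffix, so every suffix still ends at its own leaf.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first char of edge label -> (edge label, child)
        self.leaf = leaf    # (string number, 1-based start); None for internal nodes


def insert(root, suffix, leaf):
    node = root
    while True:
        c = suffix[0]
        if c not in node.children:
            node.children[c] = (suffix, Node(leaf))
            return
        label, child = node.children[c]
        k = 0
        while k < len(label) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                   # split the edge where the match stops
            mid = Node()
            node.children[c] = (label[:k], mid)
            mid.children[label[k]] = (label[k:], child)
            child = mid
        node, suffix = child, suffix[k:]


def generalized_suffix_tree(s1, s2):
    """s1 and s2 must end with distinct terminators that occur nowhere else."""
    root = Node()
    for sid, s in ((1, s1), (2, s2)):
        for i in range(len(s)):
            insert(root, s[i:], (sid, i + 1))
    return root


def leaves(node, path=""):
    """Yield (string spelled from the root, leaf label) for every leaf below node."""
    if node.leaf is not None:
        yield path, node.leaf
    for label, child in node.children.values():
        yield from leaves(child, path + label)


gst = generalized_suffix_tree("abab$", "aab#")
for path, label in sorted(leaves(gst), key=lambda p: p[1]):
    print(label, path)   # (1, 1) abab$ ... (2, 4) #
```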
Longest common substring (of two strings)
Every node with a leaf descendant from string s1 and a leaf descendant from string s2 represents a maximal common substring, and vice versa.
Find such a node with the largest “string depth” (the length of the string spelled from the root to that node).
[Figure: the generalized suffix tree of abab$ and aab#]
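A Python sketch of this search, assuming a naive generalized suffix tree whose leaves are labeled only with the number of the string they come from (hypothetical names throughout). A post-order walk computes which strings have leaves below each node and keeps the deepest node that has both.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first char of edge label -> (edge label, child)
        self.leaf = leaf    # 1 or 2: which string the suffix came from


def insert(root, suffix, leaf):
    node = root
    while True:
        c = suffix[0]
        if c not in node.children:
            node.children[c] = (suffix, Node(leaf))
            return
        label, child = node.children[c]
        k = 0
        while k < len(label) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                   # split the edge where the match stops
            mid = Node()
            node.children[c] = (label[:k], mid)
            mid.children[label[k]] = (label[k:], child)
            child = mid
        node, suffix = child, suffix[k:]


def longest_common_substring(s1, s2):
    """Assumes neither string contains the terminators '$' or '#'."""
    root = Node()
    for sid, t in ((1, s1 + "$"), (2, s2 + "#")):
        for i in range(len(t)):
            insert(root, t[i:], sid)
    best = ""

    def visit(node, path):
        """Return the set of string ids with a leaf below node."""
        nonlocal best
        if node.leaf is not None:
            return {node.leaf}
        ids = set()
        for label, child in node.children.values():
            ids |= visit(child, path + label)
        # Leaves from both strings below: path is a common substring.
        if ids == {1, 2} and len(path) > len(best):
            best = path
        return ids

    visit(root, "")
    return best


print(longest_common_substring("abab", "aab"))   # ab
```

Internal nodes never spell a terminator, since $ and # each end only one string, so the reported path is a genuine common substring.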
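Longest Substring that is a Palindrome
Build the generalized suffix tree of s$ and its reverse s^r#, where n = |s| and positions are counted from 1 as in the leaf labels. The substring s[i..i+k-1] is a palindrome exactly when the same string occurs in s^r starting at the mirrored position n - i - k + 2, so we look for the deepest node whose subtree holds leaf i of s together with leaf n - i - k + 2 of s^r, where k is the node's string depth.
[Figure: generalized suffix tree of a string s terminated by $ and its reverse terminated by #; leaves numbered 1 to 7 in each string]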
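A Python sketch of the mirrored-position test, assuming a naive generalized suffix tree of s$ and reverse(s)# with leaf labels (string number, 1-based start); names are hypothetical, and collecting leaf sets at every node makes this quadratic, unlike linear-time constructions. For the longest palindrome s[i..i+k-1], the suffixes at leaf i of s and leaf n - i - k + 2 of the reverse agree for exactly k characters, so their lowest common ancestor has string depth k and the test below finds it.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first char of edge label -> (edge label, child)
        self.leaf = leaf    # (string number, 1-based start); None for internal nodes


def insert(root, suffix, leaf):
    node = root
    while True:
        c = suffix[0]
        if c not in node.children:
            node.children[c] = (suffix, Node(leaf))
            return
        label, child = node.children[c]
        k = 0
        while k < len(label) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                   # split the edge where the match stops
            mid = Node()
            node.children[c] = (label[:k], mid)
            mid.children[label[k]] = (label[k:], child)
            child = mid
        node, suffix = child, suffix[k:]


def longest_palindrome(s):
    """Assumes s contains neither '$' nor '#'."""
    n = len(s)
    root = Node()
    for sid, t in ((1, s + "$"), (2, s[::-1] + "#")):
        for i in range(len(t)):
            insert(root, t[i:], (sid, i + 1))
    best = ""

    def visit(node, depth):
        """Return the leaf labels below node; depth is the node's string depth."""
        nonlocal best
        if node.leaf is not None:
            return {node.leaf}
        found = set()
        for label, child in node.children.values():
            found |= visit(child, depth + len(label))
        if depth > len(best):
            for sid, i in found:
                # s[i..i+depth-1] is a palindrome iff the reverse holds the
                # same string at the mirrored start n - i - depth + 2.
                if sid == 1 and (2, n - i - depth + 2) in found:
                    best = s[i - 1:i - 1 + depth]
                    break
        return found

    visit(root, 0)
    return best


print(longest_palindrome("forgeeksskeegfor"))   # geeksskeeg
```

The position test matters: for abacdfgdcaba, the string abacd is common to s and its reverse but is not a palindrome, and the check rules it out, leaving aba.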