09 SuffixTrees

This document discusses suffix trees, which are compressed tries representing all suffixes of a string or set of strings. Suffix trees allow for efficient exact string matching and finding the longest common substring between strings. While a naive algorithm to build a suffix tree takes O(n^2) time, more sophisticated algorithms can construct one in linear O(n) time. Suffix trees have applications in searching patterns against databases and finding all occurrences of a pattern in a text.

Uploaded by

khansara7744

Suffix Trees

Michael T. Goodrich
University of California, Irvine

Most slides adapted from http://www.cs.tau.ac.il/~bchor/CG/suffixtrees.ppt by Haim Kaplan

Trie
• A digital tree representing a set of strings, e.g. { aeef, ad, bbfe, bbfg, c }.
[Figure: the trie of { aeef, ad, bbfe, bbfg, c }]

Trie (Cont)
• Assume no string is a prefix of another.
• Each edge is labeled by a letter; no two edges outgoing from the same node are labeled the same.
• Each string corresponds to a leaf.
• The children of a node can be given in a list or a hash table (indexed by characters from the alphabet).
[Figure: the trie of { aeef, ad, bbfe, bbfg, c }]

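A minimal Python sketch of the trie just described, assuming each node's children live in a dict (the hash-table option) keyed by character. The helper names `build_trie`, `contains` and `count_leaves` are illustrative, not from the slides. Because no string is a prefix of another, a stored string is exactly a root-to-leaf path, so membership means "the walk ends at a node with no children".

```python
def build_trie(words):
    """Build a trie as nested dicts: each node maps a character to a child node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # follow or create the edge labeled ch
    return root


def contains(trie, word):
    """Return True if word is one of the stored strings.

    Assumes no stored string is a prefix of another, so every stored
    string ends at a leaf (a node with no children).
    """
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return not node  # an empty dict is a leaf


def count_leaves(node):
    """Number of leaves, i.e. the number of stored strings."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


trie = build_trie(["aeef", "ad", "bbfe", "bbfg", "c"])
print(contains(trie, "bbfg"), contains(trie, "bbf"), count_leaves(trie))  # True False 5
```

Note that `bbf` is rejected even though its path exists: it ends at an internal node, not a leaf.
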
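Compressed Trie
• Compress unary nodes, label edges by substrings.
[Figure: the trie of { aeef, ad, bbfe, bbfg, c } and its compressed version, whose edges are a, bbf and c below the root, d and eef below a, and e and g below bbf]
• Children of each node can still be indexed by a character from the alphabet (the first one in the substring).
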
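A sketch of the compression step, assuming the nested-dict trie representation (character to child) and the prefix-free assumption from the previous slide, so a unary node never marks the end of a stored string. `compress` is a hypothetical helper: it follows each chain of unary nodes, concatenates the letters into one substring label, and keys every child by the first character of its label, as the slide suggests.

```python
def build_trie(words):
    """Uncompressed trie as nested dicts (character -> child)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Compress unary nodes of a dict trie.

    Returns {first_char: (edge_label, compressed_child)}, so children are
    still indexed by the first character of their edge label.
    """
    out = {}
    for ch, child in node.items():
        label = ch
        while len(child) == 1:                   # unary node: absorb its only edge
            (next_ch, next_child), = child.items()
            label += next_ch
            child = next_child
        out[ch] = (label, compress(child))
    return out


ctrie = compress(build_trie(["aeef", "ad", "bbfe", "bbfg", "c"]))
print(sorted(label for label, _ in ctrie.values()))  # ['a', 'bbf', 'c']
```
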
Suffix tree
• Given a string s, a suffix tree of s is a compressed trie of all suffixes of s.
• To make these suffixes prefix-free, we add a special character, say $, at the end of s:
  Mississippi -> Mississippi$
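A quick check of why the terminator is needed; `suffixes` and `is_prefix_free` are illustrative helpers. Without $, a suffix such as "i" of Mississippi is a prefix of "ippi", so it would end inside an edge instead of at its own leaf. With $ appended, every suffix contains $ exactly once, at its end, so none can be a proper prefix of another.

```python
def suffixes(s):
    """All suffixes of s, longest first."""
    return [s[i:] for i in range(len(s))]


def is_prefix_free(strings):
    """True if no string in the list is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in strings for b in strings)


print(is_prefix_free(suffixes("Mississippi")))   # False: "i" is a prefix of "ippi"
print(is_prefix_free(suffixes("Mississippi$")))  # True
```
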

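Suffix tree (Example)
Let s = abab. A suffix tree of s is a compressed trie of all suffixes of s$ = abab$:
{ $, b$, ab$, bab$, abab$ }
[Figure: the suffix tree of abab$, with edge labels written as 0-based inclusive index pairs into abab$: (0,1) = ab, (2,4) = ab$, (3,3) = b, (4,4) = $]

O(n) space, because each edge label is stored as an index pair (start, end) rather than as a copy of the substring.
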
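One way to make the O(n) bound precise, assuming n counts the characters of s$ (so there are n suffixes) and n ≥ 2: every internal node of the compressed trie, including the root, has at least two children, and a rooted tree with n leaves and that property has at most n - 1 internal nodes.

```latex
\begin{aligned}
\text{leaves} &= n \quad \text{(one per suffix of } s\$\text{)}\\
\text{internal nodes} &\le n - 1 \quad \text{(each has at least two children)}\\
\text{edges} &= \text{nodes} - 1 \le 2n - 2\\
\text{space} &= O(\text{nodes}) + 2 \cdot \text{edges} \cdot O(1) = O(n)
\end{aligned}
```

For abab$ (n = 5) this gives 5 leaves, 3 internal nodes (the root, ab and b) and 7 edges, within the bounds 4 and 8.
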
Trivial algorithm to build a suffix tree
• Put the longest suffix (abab$) in.
[Figure: a single edge abab$ from the root]
• Put the next longest suffix (bab$) in.
[Figure: the tree after inserting abab$ and bab$]
• Put the third suffix (ab$) in.
[Figure: the tree after inserting abab$, bab$ and ab$]
• Put the next longest suffix (b$) in.
[Figure: the tree after inserting abab$, bab$, ab$ and b$]
• Put the last suffix ($) in.
[Figure: the tree after inserting all five suffixes]
• We also label each leaf with the starting point of the corresponding suffix.
[Figure: the suffix tree of abab$ with leaves labeled 1 to 5]

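The trivial algorithm can be sketched in Python as below, assuming the input already ends with a unique terminator such as $. Edge labels are stored as half-open index ranges [start, end) into the text (the slides use inclusive pairs), and leaves carry the 1-based starting position, matching the labels 1 to 5 above. The names `Node`, `build_naive` and `show` are illustrative, not from the slides.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start, self.end = start, end  # edge label into this node is text[start:end]
        self.children = {}                 # first character of a child's label -> child
        self.leaf = None                   # 1-based suffix start, set on leaves only


def build_naive(text):
    """Insert the suffixes of text longest first (O(n^2) time in the worst case).

    Assumes text ends with a terminator that occurs nowhere else, so each
    new suffix leaves the existing tree before it runs out of characters.
    """
    n = len(text)
    root = Node()
    for i in range(n):
        node, pos = root, i
        while True:
            child = node.children.get(text[pos])
            if child is None:                     # no edge starts with text[pos]: new leaf
                leaf = Node(pos, n)
                leaf.leaf = i + 1
                node.children[text[pos]] = leaf
                break
            k = child.start
            while k < child.end and text[pos] == text[k]:
                pos, k = pos + 1, k + 1           # walk down the edge
            if k == child.end:                    # consumed the whole edge: descend
                node = child
                continue
            mid = Node(child.start, k)            # mismatch inside the edge: split it
            child.start = k
            mid.children[text[k]] = child
            node.children[text[mid.start]] = mid
            leaf = Node(pos, n)
            leaf.leaf = i + 1
            mid.children[text[pos]] = leaf
            break
    return root


def show(node, text, depth=0):
    """Print edge labels indented by depth; leaves show their suffix number."""
    for key in sorted(node.children):
        child = node.children[key]
        tag = f"  [{child.leaf}]" if child.leaf else ""
        print("  " * depth + text[child.start:child.end] + tag)
        show(child, text, depth + 1)


show(build_naive("abab$"), "abab$")
```

For abab$ it prints $ [5] at the top level, the edge ab with leaves $ [3] and ab$ [1] below it, and the edge b with leaves $ [4] and ab$ [2] below it, i.e. the final tree of the slide.
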
Analysis
• Naively, this takes O(n^2) time to build in the worst case: each of the n insertions may walk down up to n characters.
• More sophisticated algorithms can construct a suffix tree in O(n) time… (to be continued).

The Naïve Algorithm in Practice
• The naïve construction algorithm is not usually as bad as O(n^2) time in practice.
• A worst-case example is a^(n-1)b$. This is rare.
• For example, for a random string, the naïve algorithm runs in O(n log n) expected time.
• Why?

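To see the gap between the worst case and a typical input, the sketch below counts the characters the trivial algorithm matches while inserting each suffix. Inserting suffix i walks down exactly as far as its longest common prefix with some earlier (longer) suffix, so the count can be computed from the text without building the tree. The function name `naive_cost` is hypothetical, and the random string is only an illustration, not a proof of the O(n log n) bound.

```python
import random


def naive_cost(text):
    """Total characters matched by the trivial algorithm on text.

    When suffix i is inserted, the walk follows the longest prefix of
    text[i:] already in the tree, i.e. the longest common prefix of
    text[i:] with some earlier suffix text[j:], j < i.
    """
    n = len(text)
    total = 0
    for i in range(1, n):
        best = 0
        for j in range(i):
            k = 0
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            best = max(best, k)
        total += best
    return total


n = 100
worst = "a" * (n - 1) + "b$"                      # the a^(n-1)b$ example
random.seed(0)
rand = "".join(random.choice("acgt") for _ in range(n)) + "$"
print(naive_cost(worst), naive_cost(rand))
```

For the worst case the count is 1 + 2 + … + (n - 2) = (n - 1)(n - 2)/2, which is 4851 here; the random string of the same length needs only a small fraction of that.
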
What can we do with it?
Exact string matching:
Given a text T, |T| = n, preprocess it such that when a pattern P, |P| = m, arrives, you can quickly decide whether it occurs in T.

We may also want to find all occurrences of P in T.

Exact string matching
• In preprocessing we just build a suffix tree in O(n) time.
[Figure: the suffix tree of abab$ with leaves labeled 1 to 5]
• Given a pattern P = ab, we traverse the tree according to the pattern.
[Figure: the same tree, traversed along ab]
• If we did not get stuck traversing the pattern, then the pattern occurs in the text.
• Each leaf in the subtree below the node we reach corresponds to an occurrence. For P = ab, that subtree holds leaves 1 and 3.
• By traversing this subtree we get all k occurrences in O(m + k) time: the walk costs O(m), and the subtree has k leaves and fewer than k internal nodes.

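A sketch of the query side, reusing a compact copy of the trivial builder, so preprocessing here is O(n^2) in the worst case rather than the O(n) the slide assumes. The hypothetical `find` walks the pattern down the tree, compares characters along each edge, and collects the leaf numbers in the subtree where the walk ends; sorting is only for readable output and is not part of the O(m + k) bound.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start, self.end = start, end  # edge label is text[start:end]
        self.children = {}                 # first character of edge label -> child
        self.leaf = None                   # 1-based suffix start on leaves


def build_naive(text):
    """Trivial O(n^2) construction; text must end with a unique terminator."""
    n, root = len(text), Node()
    for i in range(n):
        node, pos = root, i
        while True:
            child = node.children.get(text[pos])
            if child is None:
                node.children[text[pos]] = leaf = Node(pos, n)
                leaf.leaf = i + 1
                break
            k = child.start
            while k < child.end and text[pos] == text[k]:
                pos, k = pos + 1, k + 1
            if k == child.end:
                node = child
                continue
            mid = Node(child.start, k)         # split the edge at the mismatch
            child.start = k
            mid.children[text[k]] = child
            node.children[text[mid.start]] = mid
            mid.children[text[pos]] = leaf = Node(pos, n)
            leaf.leaf = i + 1
            break
    return root


def leaves(node):
    """Leaf numbers in the subtree of node (each one is an occurrence)."""
    if node.leaf is not None:
        return [node.leaf]
    out = []
    for child in node.children.values():
        out.extend(leaves(child))
    return out


def find(root, text, pattern):
    """1-based start positions of all occurrences of pattern in text."""
    node, i = root, 0
    while i < len(pattern):
        child = node.children.get(pattern[i])
        if child is None:
            return []                          # stuck: no occurrence
        for k in range(child.start, child.end):
            if i == len(pattern):
                break                          # pattern ends inside this edge
            if text[k] != pattern[i]:
                return []                      # stuck on a mismatch
            i += 1
        node = child
    return sorted(leaves(node))


text = "abab$"
root = build_naive(text)
print(find(root, text, "ab"), find(root, text, "b"), find(root, text, "abb"))  # [1, 3] [2, 4] []
```
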
So what can we do with it?
Matching a pattern against a database of strings:

1. Construct a suffix tree for the text.

2. Search for each pattern in the suffix tree.

Generalized suffix tree
• Given a set of strings S, a generalized suffix tree of S is a compressed trie of all suffixes of every s ∈ S.
• To make these suffixes prefix-free, we add a special char, say $, at the end of s.
• To associate each suffix with a unique string in S, add a different special char to each s.

Generalized suffix tree (Example)
Let s1 = abab and s2 = aab. Here is a generalized suffix tree for s1 and s2:
{ $, b$, ab$, bab$, abab$ } and { #, b#, ab#, aab# }
[Figure: the generalized suffix tree of abab$ and aab#, each leaf labeled with the starting position of its suffix]

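A sketch of a generalized suffix tree built the same naive way, assuming each string gets its own terminator ($ for s1, # for s2, and so on) that occurs in none of the strings. To keep the code short, edge labels are stored as plain strings rather than index pairs, so this version does not achieve the O(n) space bound; leaves record (string number, 1-based start). Searching it answers the database-matching question above: every leaf below the pattern's locus is an occurrence in some string. All names, including the `terminators` parameter, are illustrative.

```python
class Node:
    def __init__(self, label=""):
        self.label = label     # edge label into this node, as a plain string
        self.children = {}     # first character of a child's label -> child
        self.tag = None        # on leaves: (string number, 1-based start)


def insert(root, suffix, tag):
    """Insert one terminated suffix, splitting an edge where it diverges."""
    node = root
    while True:
        child = node.children.get(suffix[0])
        if child is None:                        # new leaf directly below node
            leaf = Node(suffix)
            leaf.tag = tag
            node.children[suffix[0]] = leaf
            return
        k = 0
        while k < len(child.label) and child.label[k] == suffix[k]:
            k += 1
        if k == len(child.label):                # matched the whole edge: descend
            node, suffix = child, suffix[k:]
            continue
        mid = Node(child.label[:k])              # diverged inside the edge: split
        child.label = child.label[k:]
        mid.children[child.label[0]] = child
        node.children[mid.label[0]] = mid
        leaf = Node(suffix[k:])
        leaf.tag = tag
        mid.children[suffix[k]] = leaf
        return


def build_gst(strings, terminators="$#%&"):
    """Generalized suffix tree; string number j gets terminators[j - 1].

    Assumes at most len(terminators) strings, none containing a terminator.
    """
    root = Node()
    for j, (s, end) in enumerate(zip(strings, terminators), start=1):
        t = s + end
        for i in range(len(t)):
            insert(root, t[i:], (j, i + 1))
    return root


def tags(node):
    """All leaf tags in the subtree of node."""
    if node.tag is not None:
        return [node.tag]
    return [t for child in node.children.values() for t in tags(child)]


def find_all(root, pattern):
    """(string number, 1-based start) for every occurrence of pattern."""
    node, i = root, 0
    while i < len(pattern):
        child = node.children.get(pattern[i])
        if child is None:
            return []
        k = 0
        while k < len(child.label) and i < len(pattern):
            if child.label[k] != pattern[i]:
                return []
            k, i = k + 1, i + 1
        node = child
    return sorted(tags(node))


gst = build_gst(["abab", "aab"])
print(find_all(gst, "ab"))  # [(1, 1), (1, 3), (2, 2)]
```
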
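Longest common substring (of two strings)
• Every node with a leaf descendant from string s1 and a leaf descendant from string s2 represents a maximal common substring, and vice versa.
• Find such a node with the largest "string depth" (the length of the string spelled on the path from the root to it). For s1 = abab and s2 = aab, the deepest such node spells ab.
[Figure: the generalized suffix tree of abab$ and aab#]
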
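Following the slide's recipe, a sketch that builds the generalized suffix tree of s1$ and s2# with the same naive, string-labeled insertion (copied so the block runs on its own) and returns the string of the deepest node whose subtree contains leaves from both strings. `longest_common_substring` is a hypothetical name. Internal nodes never spell a terminator, since each terminator occurs only at the end of one string, so the returned path is a genuine common substring.

```python
class Node:
    def __init__(self, label=""):
        self.label = label     # edge label into this node
        self.children = {}     # first character of a child's label -> child
        self.tag = None        # on leaves: (string number, 1-based start)


def insert(root, suffix, tag):
    """Naive insertion of one terminated suffix (splits an edge on mismatch)."""
    node = root
    while True:
        child = node.children.get(suffix[0])
        if child is None:
            node.children[suffix[0]] = leaf = Node(suffix)
            leaf.tag = tag
            return
        k = 0
        while k < len(child.label) and child.label[k] == suffix[k]:
            k += 1
        if k == len(child.label):
            node, suffix = child, suffix[k:]
            continue
        mid = Node(child.label[:k])
        child.label = child.label[k:]
        mid.children[child.label[0]] = child
        node.children[mid.label[0]] = mid
        mid.children[suffix[k]] = leaf = Node(suffix[k:])
        leaf.tag = tag
        return


def longest_common_substring(s1, s2):
    """Assumes neither string contains the terminators $ or #."""
    root = Node()
    for j, t in ((1, s1 + "$"), (2, s2 + "#")):
        for i in range(len(t)):
            insert(root, t[i:], (j, i + 1))
    best = ""

    def visit(node, path):
        """Return the set of string numbers with a leaf below node.

        path is the string spelled from the root to node; len(path) is
        the node's string depth.
        """
        nonlocal best
        if node.tag is not None:
            return {node.tag[0]}
        below = set()
        for child in node.children.values():
            below |= visit(child, path + child.label)
        if below == {1, 2} and len(path) > len(best):
            best = path                    # deeper node with leaves from both strings
        return below

    visit(root, "")
    return best


print(longest_common_substring("abab", "aab"))  # ab
```
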
Longest Substring that is a Palindrome
Let s = cbaaba$; then s^r = abaabc# (the reverse of cbaaba, followed by a different end marker).
[Figure: the generalized suffix tree of cbaaba$ and abaabc#, leaves labeled with starting positions]
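
The slide stops at the tree, so the following is one way to finish the idea, stated as an assumption rather than the slide's own method. With n = |s| and 0-based positions, s[i:i+L] is a palindrome exactly when it equals the block of s^r starting at j = n - i - L. If leaf i of s and leaf j of s^r lie below a node of string depth D, their suffixes agree on at least D characters, so any such pair with 1 ≤ n - i - j ≤ D witnesses a palindrome of length n - i - j. The index check matters: for s = abacdfgdcaba, a longest common substring of s and s^r is abacd, which is not a palindrome. The pairwise loop is for clarity, not efficiency; `longest_palindrome` is a hypothetical name.

```python
class Node:
    def __init__(self, label=""):
        self.label = label     # edge label into this node
        self.children = {}     # first character of a child's label -> child
        self.tag = None        # on leaves: (string number, 0-based start)


def insert(root, suffix, tag):
    """Naive insertion of one terminated suffix (splits an edge on mismatch)."""
    node = root
    while True:
        child = node.children.get(suffix[0])
        if child is None:
            node.children[suffix[0]] = leaf = Node(suffix)
            leaf.tag = tag
            return
        k = 0
        while k < len(child.label) and child.label[k] == suffix[k]:
            k += 1
        if k == len(child.label):
            node, suffix = child, suffix[k:]
            continue
        mid = Node(child.label[:k])
        child.label = child.label[k:]
        mid.children[child.label[0]] = child
        node.children[mid.label[0]] = mid
        mid.children[suffix[k]] = leaf = Node(suffix[k:])
        leaf.tag = tag
        return


def longest_palindrome(s):
    """Longest palindromic substring via a generalized suffix tree of s$ and s^r#."""
    n = len(s)
    root = Node()
    for j, t in ((1, s + "$"), (2, s[::-1] + "#")):
        for i in range(len(t)):
            insert(root, t[i:], (j, i))
    best = (0, 0)                              # (length, start in s)

    def visit(node, depth):
        """Return the leaf tags below node; depth is the node's string depth."""
        nonlocal best
        if node.tag is not None:
            return [node.tag]
        below = []
        for child in node.children.values():
            below += visit(child, depth + len(child.label))
        fwd = [i for which, i in below if which == 1]
        rev = [j for which, j in below if which == 2]
        for i in fwd:
            for j in rev:
                length = n - i - j             # compare s[i:i+length] with s^r[j:j+length]
                if 1 <= length <= depth and length > best[0]:
                    best = (length, i)
        return below

    visit(root, 0)
    length, start = best
    return s[start:start + length]


print(longest_palindrome("cbaaba"))  # baab
```
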
