Lecture03_SuffixTree

Uploaded by

mahmoudsharaf796

Cairo University

Faculty of Computers and Artificial Intelligence

Computer Science Department

Advanced Data Structures Tries and Suffix Trees Dr. Amin Allam

[For more details, refer to “Jewels of Stringology” by Maxime Crochemore and Wojciech Rytter]

1 Tries
A trie is a tree that stores several small strings (the dataset) and allows searching for
(retrieving) a given (query) string inside the stored dataset. The following trie stores 5 strings:
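0) AGA
1) AG
2) AAG
3) GAAG
4) TCG
[Figure: the trie for these strings. The root has children on A, G and T; the node that ends each string holds that string's ID in a square.]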

Insertions and retrievals start from the root. Each edge is labelled with one character. Edges from
a node to its children must be labelled with different characters. The ID of a dataset string is
contained in the node such that the path from the root to that node is labelled with that string (as
shown in the squares in the above figure).

Each insertion traverses exactly m edges, and each retrieval at most m edges, where m is the
string length, thus costing O(m) time assuming that O(1) time is needed to move from a node to
its correct child according to the edge label.

Suppose that the alphabet size (number of possible different characters) is |Σ|. A trie can be
implemented using one of the following methods:
• Each node contains an array of length |Σ|, whose ith element holds the pointer to the child
connected by the ith character of the alphabet. Finding a child takes O(1) time, and each node
takes O(|Σ|) space.
• Each node contains a linked list where each element contains a character and a child node
pointer. Finding a child takes O(|Σ|) time, and each child takes O(1) space.
• Each node contains a red-black tree where each element contains a character as the key, and a
child node pointer. Finding a child takes O(log(|Σ|)) time, and each child takes O(1) space.
• One hash table for the whole trie, where each element contains a character and two node
pointers: parent and child. The hash function is a function of the parent node pointer and the
character. Finding a child takes O(1) expected time, and each child takes O(1) space, but this
method suffers from cache misses.
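As a minimal sketch of insertion and retrieval, the following Python code stores the five example strings. It keeps a Python dict in every node, which is closest in spirit to the hash-table option but is not exactly any of the four layouts listed above. The names TrieNode, Trie, insert and find are illustrative assumptions, not taken from the lecture.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # character -> TrieNode (assumed per-node dict)
        self.string_id = None  # ID of the dataset string that ends here, if any


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, s, string_id):
        node = self.root
        for ch in s:  # exactly len(s) edges are traversed, created when missing
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.string_id = string_id

    def find(self, s):
        node = self.root
        for ch in s:  # at most len(s) edges are traversed
            node = node.children.get(ch)
            if node is None:
                return None
        return node.string_id


trie = Trie()
for i, s in enumerate(["AGA", "AG", "AAG", "GAAG", "TCG"]):
    trie.insert(s, i)

print(trie.find("AG"))    # 1 (this string ends at an internal node)
print(trie.find("GAAG"))  # 3
print(trie.find("GA"))    # None: a prefix of a stored string, not a stored string
```

Swapping the dict for a fixed array, a linked list, or a balanced tree changes only how a child is looked up, which is where the time and space trade-offs in the list above come from.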


2 Suffix tries
A suffix trie is a trie that stores all suffixes of a given large string of length n. A
suffix of a string is a substring that ends at the last location (n − 1). The ID of a suffix is its
starting location inside the original string. The suffix trie allows searching for a given substring
inside the original string. The following suffix trie stores all suffixes of the string banana:
012345
banana
[Figure: suffix trie of banana. The root has children on a, b and n; each node where a suffix ends holds that suffix's ID (0 to 5).]

The suffix trie requires O(n²) space and construction time, which makes it impractical. To make it
practical, nodes with one child should be removed. Before doing that, a sentinel $ is added to the
original string to make sure that no suffix ends at an internal node of the suffix trie:
0123456
banana$
[Figure: suffix trie of banana$. Every suffix now ends at a leaf, and each leaf holds its suffix ID (0 to 6).]

To search for a substring, the suffix trie is traversed from the root to a node. IDs associated with all
nodes in the subtree of the reached node are reported as locations of that substring. For example,
searching for an or ana yields {3,1}, while searching for a yields {5,3,1}.
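The search just described can be sketched in Python as follows. This is a naive O(n²) suffix trie meant only to illustrate the query; the dict-based node layout and the names build_suffix_trie and occurrences are assumptions made for this example.

```python
def build_suffix_trie(text):
    """Naive suffix trie: one node per distinct substring, O(n^2) in the worst case."""
    root = {"children": {}, "id": None}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node["children"]:
                node["children"][ch] = {"children": {}, "id": None}
            node = node["children"][ch]
        node["id"] = start  # the suffix ID is its starting location
    return root


def occurrences(root, pattern):
    """Walk down along pattern, then report every suffix ID in the reached subtree."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []  # pattern is not a substring
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur["id"] is not None:
            found.append(cur["id"])
        stack.extend(cur["children"].values())
    return sorted(found)  # sorted only to make the output easy to read


root = build_suffix_trie("banana$")
print(occurrences(root, "an"))   # [1, 3]
print(occurrences(root, "ana"))  # [1, 3]
print(occurrences(root, "a"))    # [1, 3, 5]
```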

Now, one-child nodes can be safely removed to make a suffix tree. Also, since all suffixes end
at leaves, suffix IDs need not be stored: when a traversal reaches a leaf, its suffix ID is n minus
the number of characters on the path from the root to that leaf (n here counts the sentinel).


3 Suffix trees
A suffix tree is a compact suffix trie which contains all suffixes of an original string of length n
(including $), does not contain any one-child node, and all suffixes end at leaves. Consider the
following suffix tree of the string banana$:
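0123456
banana$
root
  [6,1] $ → leaf 6
  [1,1] a
    [6,1] $ → leaf 5
    [2,2] na
      [6,1] $ → leaf 3
      [4,3] na$ → leaf 1
  [0,7] banana$ → leaf 0
  [2,2] na
    [6,1] $ → leaf 4
    [4,3] na$ → leaf 2
Each edge is written as [st,len] followed by the substring it denotes.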

After one-child nodes are removed, some edges need to be labelled with substrings, not with single
characters as in the suffix trie. To avoid O(n²) space, edges are labelled with the start location and
the length of a substring inside the original string, instead of labelling them with the substrings
themselves. Thus, the original string must be available to conduct queries. Substrings are shown
on edges in the above figure only for illustration. The substring length can also be removed and
deduced: for an edge into an internal node, subtract the edge's own start location from the smallest
start location among its child edges; for an edge into a leaf, the substring runs to the end of the string.
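As a small illustration of this labelling, the sketch below hard-codes the suffix tree of banana$ with only the start location on each edge, deduces every length, and rebuilds the root-to-leaf strings. The nested-tuple layout and the helper names are assumptions made for this example, and the length rule is checked here only on this one tree.

```python
text = "banana$"
n = len(text)

# Each edge is (st, children); an empty child list marks a leaf.
# Only start locations are stored; lengths are deduced on demand.
tree = [
    (6, []),                                  # $
    (1, [(6, []), (2, [(6, []), (4, [])])]),  # a -> $ | na -> ($ | na$)
    (0, []),                                  # banana$
    (2, [(6, []), (4, [])]),                  # na -> $ | na$
]


def edge_length(st, children):
    if not children:
        return n - st  # a leaf edge runs to the end of the string
    return min(child_st for child_st, _ in children) - st


def leaf_strings(edges, prefix=""):
    """Concatenate edge labels along every root-to-leaf path."""
    out = []
    for st, children in edges:
        path = prefix + text[st:st + edge_length(st, children)]
        if children:
            out.extend(leaf_strings(children, path))
        else:
            out.append(path)
    return out


print(sorted(leaf_strings(tree)))
# ['$', 'a$', 'ana$', 'anana$', 'banana$', 'na$', 'nana$']
```

The seven reconstructed strings are exactly the suffixes of banana$, so on this example nothing is lost by storing one integer per edge.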

Thus, each node in the suffix tree needs O(1) space, and the number of leaves equals the number
of suffixes n. The number of internal nodes is ≤ n − 1 (∗). Thus, the suffix tree needs O(n) space.

(∗) The number of internal nodes of a tree with no one-child nodes is ≤ the number of leaves − 1.
Proof: Consider a procedure which starts with the leaves and attempts to construct an arbitrary tree by
repeatedly picking at least two nodes that have no parent and creating a new internal node as their
parent. After each step, the number of nodes with no parent decreases by at least one. The procedure
stops when there is exactly one node with no parent (which is the root). The number of steps, as well
as the number of created internal nodes, therefore cannot exceed the number of leaves − 1.

To construct a suffix tree, we should not create an O(n²) suffix trie then use it to construct the suffix
tree, because O(n²) space or time is not available for large strings. Ukkonen proposed a practical
algorithm to construct the O(n) suffix tree using only O(n) space and time.

The time complexity of searching for a substring inside the suffix tree is O(m + occ), where m is the
length of the substring and occ is the number of occurrences of that substring inside the original
string. O(m) is needed for the initial traversal, then O(occ) is needed for a depth-first search
starting from the node at, or just below, the place where the traversal stopped: that subtree has occ
leaves and, since no node has exactly one child, fewer than occ internal nodes.
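The query can be sketched in Python as follows. The construction used here is a naive O(n²)-time insertion of one suffix at a time with edge splitting; it is not Ukkonen's algorithm and only serves to obtain a tree to search. It assumes the text ends with a sentinel that occurs nowhere else, and the names Node, build_suffix_tree, occurrences and count_nodes are illustrative.

```python
class Node:
    def __init__(self, st, length):
        self.st = st          # start of the edge label inside the text
        self.length = length  # length of the edge label
        self.children = {}    # first character of a child edge -> Node


def build_suffix_tree(text):
    """Naive construction; assumes text ends with a unique sentinel such as '$'."""
    assert text.count(text[-1]) == 1
    n, root = len(text), Node(0, 0)
    for start in range(n):
        node, i = root, start  # i: next unmatched position of this suffix
        while True:
            child = node.children.get(text[i])
            if child is None:  # no edge starts with text[i]: add a leaf
                node.children[text[i]] = Node(i, n - i)
                break
            k = 0  # number of characters matched along the child's edge
            while k < child.length and text[child.st + k] == text[i + k]:
                k += 1
            if k == child.length:  # whole edge matched: descend
                node, i = child, i + k
                continue
            mid = Node(child.st, k)  # mismatch inside the edge: split it
            node.children[text[i]] = mid
            child.st, child.length = child.st + k, child.length - k
            mid.children[text[child.st]] = child
            mid.children[text[i + k]] = Node(i + k, n - i - k)
            break
    return root


def occurrences(root, text, pattern):
    n, node, depth, j = len(text), root, 0, 0  # j: matched pattern characters
    while j < len(pattern):  # O(m) traversal
        child = node.children.get(pattern[j])
        if child is None:
            return []
        k = 0
        while k < child.length and j < len(pattern):
            if text[child.st + k] != pattern[j]:
                return []
            k, j = k + 1, j + 1
        node, depth = child, depth + child.length
    found, stack = [], [(node, depth)]
    while stack:  # O(occ) depth-first search below the stopping point
        cur, d = stack.pop()
        if not cur.children:
            found.append(n - d)  # suffix ID deduced from the path length
        stack.extend((c, d + c.length) for c in cur.children.values())
    return sorted(found)  # sorting is only for readable output


def count_nodes(root):
    leaves = internal = 0
    stack = [root]
    while stack:
        cur = stack.pop()
        if cur.children:
            internal += 1
        else:
            leaves += 1
        stack.extend(cur.children.values())
    return leaves, internal


text = "banana$"
root = build_suffix_tree(text)
print(occurrences(root, text, "an"))  # [1, 3]
print(occurrences(root, text, "a"))   # [1, 3, 5]
print(count_nodes(root))              # (7, 4): 7 leaves, 4 internal nodes including the root
```

Leaves store no suffix ID here; each ID is recovered as n minus the number of characters on the root-to-leaf path.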
