Tutorial Suffix Tree

The document introduces two data structures for text indexing: suffix trees and suffix arrays. A suffix tree is a compact tree that represents all suffixes of a text, with each leaf mapping to a suffix and paths spelling out suffixes. It allows searching for patterns in optimal O(m+occurrences) time by traversing the tree with the pattern. A suffix array simply stores all suffixes in lexicographic order in an array, allowing searches in O(m log n) time via binary search. Both use linear O(n) space but suffix arrays use less space in practice.

Uploaded by

Vikash Kumar


CS4311 Design and Analysis of Algorithms

Suffix Tree and Suffix Array

About this tutorial

Introduce two data structures for the text indexing problem: Suffix Tree and Suffix Array

Text Indexing
String Matching problem: Given a text T and a pattern P, how do we locate all occurrences of P in T?
The KMP algorithm solves this in O(|T|+|P|) time, which is optimal.
In some applications, T is very long and given in advance, and we will search for different patterns in it later.
E.g., T = human DNA, P = a gene
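
As a point of reference for the O(|T|+|P|) bound, here is a minimal Python sketch of KMP matching. The function name `kmp_find_all` and the 1-based reporting of positions are choices made for this illustration, not something taken from the slides.

```python
def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of every occurrence of pattern in text."""
    if not pattern:
        return []
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of pattern[:i+1]
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 2)  # 0-based end index -> 1-based start
            k = fail[k - 1]
    return hits


print(kmp_find_all("acacaac#", "ac"))  # [1, 3, 6]
```

Every query rescans all of T, which is exactly the cost an index tries to avoid when T is fixed and many patterns arrive later.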

Text Indexing
Text Indexing problem: Suppose a text T is known in advance. Can we build a data structure for T such that, for any pattern P given later, we can find all occurrences of P in T quickly?
The data structure is called an index of T.
Target: search faster than O(|T|+|P|)?

Text Indexing
Two main kinds of text indexes:
Word-Based (for texts formed by words): used by most text search engines. E.g., Inverted Files
Full-Text (for texts with no word boundaries): used in indexing DNA. E.g., Suffix Tree, Suffix Array

Suffix Tree
Let T[1..n] be a text with n characters; we assume T[n] is a unique character (one that appears nowhere else in T).
For any j, T[j..n] is called a suffix of T, so T has exactly n suffixes.
Weiner (1973) and McCreight (1976) independently invented the suffix tree: a tree formed by putting all suffixes of T together.
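
To make the definition concrete, the short Python sketch below lists the n suffixes of the example text acacaac# used in the slides. The variable names are illustrative; positions are 1-based to match T[j..n].

```python
T = "acacaac#"          # '#' is the unique last character
n = len(T)

# suffixes[j] is T[j..n] in the slides' 1-based notation
suffixes = {j: T[j - 1:] for j in range(1, n + 1)}

for j, s in suffixes.items():
    print(j, s)
```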

[Figure: Suffix Tree of acacaac#]

Definition of a Suffix Tree

A suffix tree is an edge-labeled compact tree (no degree-1 nodes) with n leaves.
Each leaf corresponds to a suffix; the leaf label is the starting position of that suffix.
If we traverse from the root to leaf k, the edge labels along the path spell out the suffix T[k..n].
The edge labels to the children of a node start with different characters.

Searching in a Suffix Tree

Theorem: If a pattern P occurs at position j in T, then P is a prefix of T[j..n].
This suggests the searching algorithm below:
1. Start from the root of the suffix tree
2. Traverse the suffix tree using P
What we are doing is matching P against all suffixes of T at the same time.

Searching in a Suffix Tree

Theorem: Pattern P occurs in T if and only if all characters of P are matched in the traversal of the searching algorithm.
Questions:
1. How to locate the occurrences? The leaves below the point where the traversal ends give the starting positions.
2. What is the searching time? O(|P|+r) time, where r = number of occurrences
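
The definition and the searching algorithm can be sketched together in Python. This is a minimal illustration, not the construction of Weiner or McCreight: it builds the tree naively in O(n^2) time by inserting one suffix at a time, and it assumes T ends with a unique character. The names `Node`, `build_suffix_tree`, and `search` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # first char of edge label -> (label, child Node)
        self.leaf = None    # 1-based start of the suffix, if this is a leaf


def build_suffix_tree(T: str) -> Node:
    """Naive construction; assumes the last character of T is unique."""
    root = Node()
    for j in range(len(T)):
        node, s = root, T[j:]
        while True:
            if s[0] not in node.children:
                leaf = Node()
                leaf.leaf = j + 1
                node.children[s[0]] = (s, leaf)
                break
            label, child = node.children[s[0]]
            k = 0  # length of the common prefix of label and s
            while k < len(label) and k < len(s) and label[k] == s[k]:
                k += 1
            if k < len(label):
                # Split the edge so that no node has exactly one child.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[s[0]] = (label[:k], mid)
                child = mid
            node, s = child, s[k:]
    return root


def search(root: Node, P: str) -> list[int]:
    """Traverse the tree using P, then report the leaves below the end point."""
    node, i = root, 0
    while i < len(P):
        entry = node.children.get(P[i])
        if entry is None:
            return []
        label, child = entry
        m = min(len(label), len(P) - i)
        if label[:m] != P[i:i + m]:
            return []
        i += m
        node = child
    hits, stack = [], [node]
    while stack:
        v = stack.pop()
        if v.leaf is not None:
            hits.append(v.leaf)
        stack.extend(c for _, c in v.children.values())
    return sorted(hits)  # sorted only to make the output easy to read


root = build_suffix_tree("acacaac#")
print(search(root, "ac"))  # [1, 3, 6]
```

The traversal touches at most |P| characters, and collecting the r leaves below the end point takes O(r) steps because every internal node has at least two children.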

Space Usage
There are O(n) nodes and O(n) edges in the suffix tree. Does this mean O(n) space?
Each edge needs to store its label, which can contain O(n) characters.
In the worst case, the labels contain O(n^2) characters in total.
Can we reduce the space usage?

Space Usage
Observation: Each edge label must be equal to some substring of T.
Clever Idea:
1. Store T, and
2. Replace each edge label by 2 integers, telling which substring it is equal to
Total space: O(n)

[Figure: Suffix Tree of acacaac#, with each edge label replaced by a pair of positions [i,j]]
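
As a small illustration of the two-integer idea, the Python sketch below decodes pairs such as [6,8] and [4,8] from the tree for acacaac# back into edge labels. The helper name `label` is hypothetical; the pair [i, j] is read as T[i..j], 1-based and inclusive.

```python
T = "acacaac#"  # stored once; every edge refers back to it


def label(i: int, j: int) -> str:
    """Decode the edge label stored as the pair [i, j], meaning T[i..j]."""
    return T[i - 1:j]


print(label(6, 8))  # ac#
print(label(4, 8))  # caac#
print(label(1, 1))  # a
```

Each edge now costs two integers regardless of how long its label is, which is what brings the total down to O(n).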

Suffix Array
Although the suffix tree takes O(n) space, the hidden constant is quite large: around 40n to 60n bytes.
Manber and Myers (1990) simplified the suffix tree and invented the suffix array: an array storing the suffixes of T in dictionary order.

Suffix Array
Suffix Array of acacaac#

j   suffix
1   #
2   aac#
3   ac#
4   acaac#
5   acacaac#
6   c#
7   caac#
8   cacaac#

The suffix array SA for T has n entries.
For any j, SA[j] stores the j-th smallest suffix in alphabetical order.
Theorem: If P occurs in T, its occurrences correspond to a consecutive region in SA.
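
A minimal Python sketch of the definition: sort the starting positions by the suffixes they denote. This is the naive approach, not the Manber and Myers construction, and it relies on '#' comparing smaller than the letters, which holds for Python's default string ordering.

```python
T = "acacaac#"

# SA holds 1-based starting positions, ordered by the suffix T[j..n].
SA = sorted(range(1, len(T) + 1), key=lambda j: T[j - 1:])

print(SA)  # [8, 5, 6, 3, 1, 7, 4, 2]
for rank, j in enumerate(SA, start=1):
    print(rank, j, T[j - 1:])
```

In this example the occurrences of ac (positions 6, 3, 1) sit next to each other at ranks 3 to 5, as the theorem predicts.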

Suffix Array
Searching for P takes O(|P| log n) time using binary search.
Space: We can represent each suffix by its starting position, giving O(n) space.
In practice, around 14n bytes
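
The binary search can be sketched as two boundary searches that locate the consecutive region of SA whose suffixes start with P. The function name `sa_search` is hypothetical, and the suffix array is built here by naive sorting only to keep the example self-contained.

```python
def sa_search(T: str, SA: list[int], P: str) -> list[int]:
    """Return the 1-based positions where P occurs, using binary search on SA."""
    m = len(P)

    def prefix(rank: int) -> str:
        # First m characters of the suffix stored at SA[rank].
        start = SA[rank] - 1
        return T[start:start + m]

    # Left boundary: first rank whose length-m prefix is >= P.
    lo, hi = 0, len(SA)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < P:
            lo = mid + 1
        else:
            hi = mid
    left = lo

    # Right boundary: first rank whose length-m prefix is > P.
    hi = len(SA)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= P:
            lo = mid + 1
        else:
            hi = mid

    return SA[left:lo]  # the consecutive region, in suffix-array order


T = "acacaac#"
SA = sorted(range(1, len(T) + 1), key=lambda j: T[j - 1:])
print(sa_search(T, SA, "ac"))  # [6, 3, 1]
```

Each of the O(log n) steps compares at most |P| characters, which gives the O(|P| log n) bound stated above.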
