CS213/293 Data Structure and Algorithms 2023

Lecture 11: Trie: storing string → Values

Instructor: Ashutosh Gupta

IITB India

Compile date: 2023-09-07

cbna CS213/293 Data Structure and Algorithms 2023 Instructor: Ashutosh Gupta IITB India 1
If keys are strings!

The problem of storing maps boils down to storing keys in an organized manner.

▶ For unordered keys, we used hash tables.
▶ For ordered keys, we used red-black trees.
▶ Now suppose our keys are strings.

Strings have more structure than just a total order. Can we exploit that structure?

Exercise 11.1
Can we define total order over strings?

Applications of string keys

▶ Web search

▶ All occurrences of a text

▶ Routing table (Keys are IP addresses)

Topic 11.1

Trie

Trie
Let letters of strings be from an alphabet Σ.

Definition 11.1
A trie is an ordered tree such that each node except the root is labeled with a letter in Σ and
has at most |Σ| children.

A trie may store a set of words. A word stored in a trie is a path from the root to a leaf.

Example 11.1
In the following trie, we store words {bear, bile, bid, bent}.

[Figure: trie with root → b; b → e, i; e → a, n; a → r; n → t; i → d, l; l → e.]
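Definition 11.1 can be sketched in a few lines. This is only an illustration, assuming each node is a Python dictionary from letters to child nodes; the names insert and words are ours, not from the lecture.

```python
def insert(root, word):
    """Add word to the trie, creating one child node per missing letter."""
    node = root
    for letter in word:
        node = node.setdefault(letter, {})


def words(node, prefix=""):
    """Yield every root-to-leaf path as a string, in alphabetical order."""
    if not node:  # a node without children is a leaf
        yield prefix
    for letter in sorted(node):
        yield from words(node[letter], prefix + letter)


trie = {}
for w in ["bear", "bile", "bid", "bent"]:
    insert(trie, w)

print(list(words(trie)))  # ['bear', 'bent', 'bid', 'bile']
```

The shared prefixes b, be, and bi are stored only once, which is the structure a total order alone does not give us.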
End marker
Sometimes a word is a prefix of another word. We need to add end markers in our trie. In our
slides, we will use $.

Example 11.2
Consider the set of words {act, actor}.

[Figure: trie a → c → t; t has children o and $; o → r → $.]
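A small sketch of why the end marker matters, again assuming dictionary nodes and assuming that $ never occurs inside a word. The helper names are hypothetical.

```python
END = "$"  # end marker, as used in the slides


def insert(root, word):
    """Insert word followed by the end marker."""
    node = root
    for letter in word + END:
        node = node.setdefault(letter, {})


def find(root, word):
    """Return True only if word itself was inserted, not merely a prefix of a stored word."""
    node = root
    for letter in word + END:
        if letter not in node:
            return False
        node = node[letter]
    return True


trie = {}
for w in ["act", "actor"]:
    insert(trie, w)

print(find(trie, "act"), find(trie, "actor"), find(trie, "ac"))  # True True False
```

Without the marker, act would end at an internal node and could not be told apart from the unstored prefix ac.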
Running times

▶ Storage is O(W), where W is the total sum of the lengths of the words.
▶ Find, insert, and delete take O(|Σ|m), where m is the length of the input word.
▶ The |Σ| factor arises because, at each node, we need to search among the children for the
  node with the next letter.
Application: word search
We may use a trie to store the positions of all words of a text. The leaves of the trie point at
▶ the first occurrence position of the word or
▶ a list of all occurrence positions.

Example 11.3
T = "lift is light"

[Figure: trie for the words of T. The path l → i → f → t → $ ends at position 0, the path
l → i → g → h → t → $ ends at position 8, and the path i → s → $ ends at position 5.]
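The word-search application might be sketched as follows. We assume words are separated by whitespace and that the node reached through $ keeps a list of all occurrence positions; build_index and positions are illustrative names.

```python
import re

END = "$"


def build_index(text):
    """Trie over the words of text; the node below '$' stores the start positions."""
    root = {}
    for match in re.finditer(r"\S+", text):
        node = root
        for letter in match.group() + END:
            node = node.setdefault(letter, {})
        # The '$' node has no letter children, so this key cannot clash with a letter.
        node.setdefault("positions", []).append(match.start())
    return root


def positions(root, word):
    """Return all start positions of word in the indexed text, or [] if absent."""
    node = root
    for letter in word + END:
        if letter not in node:
            return []
        node = node[letter]
    return node["positions"]


index = build_index("lift is light")
print(positions(index, "lift"), positions(index, "is"), positions(index, "light"))
# [0] [5] [8]
```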
Application: routing table

Example 11.4
An internet router contains a routing table that maps IP addresses to links attached to the router.

The IP address of a packet is matched with the trie. The link with the longest match will receive
the packet.

[Figure: routing trie with nodes labeled 10, 26, 15, 1, and 38, ∗ entries, and leaves link 1 to link 4.]

Exercise 11.2
Which link will receive the following IP addresses?
▶ 21.10.1.6
▶ 10.26.10.6
▶ 10.26.1.6
▶ 10.26.15.9
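Longest-match lookup can be sketched as below. The routing table here is hypothetical and is not the one in Example 11.4. We also assume that matching proceeds one dotted component at a time, whereas real routers typically match bit prefixes.

```python
def add_route(root, prefix, link):
    """Store link at the node reached by the dotted prefix, e.g. '192.168'."""
    node = root
    for part in prefix.split("."):
        node = node.setdefault("children", {}).setdefault(part, {})
    node["link"] = link


def lookup(root, address):
    """Return the link of the longest stored prefix matching address, or None."""
    node, best = root, root.get("link")
    for part in address.split("."):
        node = node.get("children", {}).get(part)
        if node is None:
            break  # no deeper match; keep the best link seen so far
        best = node.get("link", best)
    return best


# Hypothetical table for illustration only.
table = {}
add_route(table, "192.168", "link A")
add_route(table, "192.168.7", "link B")
add_route(table, "172", "link C")

print(lookup(table, "192.168.7.20"))  # link B
print(lookup(table, "192.168.1.1"))   # link A
print(lookup(table, "8.8.8.8"))       # None
```

The walk remembers the last link it passed, so a failed deeper match falls back to the longest prefix that did match.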
Topic 11.2

Compressed trie

Compressed trie

Definition 11.2
A compressed trie is a trie in which no node has a single child and nodes are labeled with
substrings of words.

It is obtained from a standard trie by compressing chains of redundant nodes.

Example 11.5
In the following compressed trie, we store words {actor, act}.

[Figure: compressed trie root → act; act has children or and $; or → $.]
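The compression step can be sketched as a pass over a standard trie. This is one possible reading, assuming dictionary nodes; to mirror the figure in Example 11.5, we choose to keep $ as its own leaf instead of merging it into a label. The names build and compress are ours.

```python
END = "$"


def build(words):
    """Standard trie with end markers, one letter per node."""
    root = {}
    for word in words:
        node = root
        for letter in word + END:
            node = node.setdefault(letter, {})
    return root


def compress(node):
    """Return a copy of the trie in which chains of single-child nodes share one label."""
    result = {}
    for label, child in node.items():
        # Extend the label while the child has exactly one child that is not the end marker.
        while label != END and len(child) == 1 and END not in child:
            letter, child = next(iter(child.items()))
            label += letter
        result[label] = compress(child)
    return result


print(compress(build(["act", "actor"])))
# {'act': {'$': {}, 'or': {'$': {}}}}
```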
The number of nodes in the compressed trie

Theorem 11.1 (Recall)
If each internal node has at least two children, then the number of internal nodes is less than the
number of leaves.

Commentary: We did not prove exactly this theorem, but we can easily modify our earlier proof.

Each leaf represents a word.

Therefore, the number of internal nodes in the compressed trie is bounded by the number of words
stored in the trie.

Does compression save space?
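One way to see Theorem 11.1 is by counting edges. This is a sketch and not necessarily the proof used earlier in the course; the symbols I (internal nodes), L (leaves), and E (edges) are our notation.

```latex
\begin{align*}
E &= I + L - 1 && \text{(every node except the root has exactly one parent)} \\
E &\ge 2I      && \text{(every internal node has at least two children)} \\
\Rightarrow\quad I + L - 1 &\ge 2I
  \quad\Rightarrow\quad I \le L - 1 < L .
\end{align*}
```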
Typical usage of trie: fast search of words on a large text
The text must have been stored separately from the trie.

We need not store the strings on the nodes.

All we need is to point at the position in the stored text and the length of the substring.

Example 11.6
T = "lift is light"

[Figure: two versions of the compressed trie for the words of T. In the first, nodes carry the
substrings li, is, ft, and ght. In the second, the same nodes carry the position:length pairs
0:2, 5:2, 2:2, and 10:3. The $ leaves hold the word positions 0, 8, and 5.]
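The position:length labels of Example 11.6 can be checked directly against the text. A tiny sketch, where resolve is a hypothetical helper:

```python
text = "lift is light"

# Each label is (start, length) into text, as in Example 11.6.
labels = [(0, 2), (5, 2), (2, 2), (10, 3)]


def resolve(text, label):
    """Recover the substring that a (start, length) label stands for."""
    start, length = label
    return text[start:start + length]


print([resolve(text, label) for label in labels])  # ['li', 'is', 'ft', 'ght']
```

Each node then needs only two integers, regardless of how long its substring is.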
Insertion and deletion on trie

Exercise 11.3
Give an algorithm for insertion and deletion in a compressed trie.

Topic 11.3

Suffix tree

Pattern search problem: Another perspective

Typical setting: We search in a (mostly) stable text T using many patterns several times.

Example 11.7
▶ A text editor, where text changes slowly and searches are performed regularly.
▶ Searching in well-known large sequences like genomes.

Can we construct a data structure from the text that allows fast search?

Suffix tree

Definition 11.3
A suffix tree of a text T is a trie built for suffixes of T.

Exercise 11.4
How many leaves are possible for a suffix tree?

Example: suffix tree
Example 11.8
The following is the suffix tree of "abaa".

[Figure: trie containing the suffixes abaa$, baa$, aa$, a$, and $.]
Usage of suffix tree

▶ Check if pattern P occurs in T.
▶ Check if P is a suffix of T.
▶ Count the number of occurrences of P in T.

Exercise 11.5
Using a suffix tree, find the longest string that repeats.
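The three usages listed above can be sketched on a naive, uncompressed suffix trie. We assume dictionary nodes and a pattern that does not contain $; all function names are ours. This construction takes quadratic time and space, so it is only meant for small examples.

```python
END = "$"


def suffix_trie(text):
    """Insert every suffix of text + '$' into a trie."""
    text += END
    root = {}
    for i in range(len(text)):
        node = root
        for letter in text[i:]:
            node = node.setdefault(letter, {})
    return root


def walk(root, pattern):
    """Return the node reached by following pattern from the root, or None."""
    node = root
    for letter in pattern:
        node = node.get(letter)
        if node is None:
            return None
    return node


def occurs(root, pattern):
    return walk(root, pattern) is not None


def is_suffix(root, pattern):
    return walk(root, pattern + END) is not None


def count(root, pattern):
    """Each occurrence of pattern starts one suffix, so count the leaves below its node."""
    node = walk(root, pattern)
    if node is None:
        return 0
    stack, leaves = [node], 0
    while stack:
        current = stack.pop()
        if not current:
            leaves += 1
        stack.extend(current.values())
    return leaves


tree = suffix_trie("abaa")
print(occurs(tree, "ba"), is_suffix(tree, "aa"), count(tree, "a"))  # True True 3
```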
Suffix links
We can solve more interesting problems if we add more structure to our suffix tree.

Definition 11.4
In each node xα, we add a pointer, called a suffix link, that points to node α.

Example 11.9
The following is the suffix tree of "abaa" with suffix links.

[Figure: the suffix tree of "abaa" from Example 11.8 with a suffix link drawn from each node xα
to node α.]
Usage of suffix links
Find the longest common substring of T and P.

▶ Walk down the suffix tree following P.
▶ At a dead end, save the current depth and follow the suffix link from the current node.
▶ After exhausting P, return the longest substring found.

Example 11.10
The following is the suffix tree of T = "abaa". Let us find the longest common substring of
"baba" and T.

[Figure: the suffix tree of "abaa" with suffix links.]
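The walk described above can be sketched on an uncompressed suffix trie, where following a suffix link lowers the depth by exactly one. The Node class, the breadth-first way of filling in the links, and the function names are our assumptions, not the lecture's construction. We also assume P does not contain $.

```python
from collections import deque


class Node:
    def __init__(self):
        self.children = {}
        self.link = None  # suffix link: the node for x·α points to the node for α


def build(text):
    """Naive suffix trie of text + '$', with suffix links filled in breadth-first."""
    text += "$"
    root = Node()
    for i in range(len(text)):
        node = root
        for letter in text[i:]:
            node = node.children.setdefault(letter, Node())
    queue = deque()
    for child in root.children.values():
        child.link = root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for letter, child in node.children.items():
            # If node spells x·α, the child spells x·α·c, and its link is the node for α·c.
            child.link = node.link.children[letter]
            queue.append(child)
    return root


def longest_common_substring(text, pattern):
    root = build(text)
    node, depth = root, 0
    best_len, best_end = 0, 0
    for i, letter in enumerate(pattern):
        # Dead end: drop the first letter of the current match via the suffix link.
        while node is not root and letter not in node.children:
            node, depth = node.link, depth - 1
        if letter in node.children:
            node, depth = node.children[letter], depth + 1
        if depth > best_len:
            best_len, best_end = depth, i + 1
    return pattern[best_end - best_len:best_end]


print(longest_common_substring("abaa", "baba"))  # aba
```

On Example 11.10 the walk matches ba, hits a dead end on the second b, follows the suffix link to the node for a, and then extends to aba.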
Topic 11.4

Constructing suffix tree

Suffix tree construction
If we have the suffix tree for T[0 : i − 1], we construct the suffix tree for T[0 : i].

In the order of the suffix links, insert T[i] at the end of each path of the tree.

Example 11.11
Let us add c in the suffix tree of "abaa".

[Figure: the suffix tree of "abaa" with c appended at the end of each path.]

Exercise 11.6
a. What is the complexity of the above algorithm?
b. What will be the complexity of suffix tree construction without suffix links?
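The incremental step might look as follows on an uncompressed suffix trie. This is a sketch of the general idea under our own assumptions: the names Node, extend, build, and contains are hypothetical, and the end marker is omitted (it can be appended to the text as one more letter).

```python
class Node:
    def __init__(self):
        self.children = {}
        self.link = None  # suffix link; stays None for the root


def extend(root, deepest, letter):
    """Append letter to the suffix paths, visiting them in suffix-link order.

    deepest is the node spelling the whole text so far; the new deepest node is returned.
    """
    node, previous = deepest, None
    # Stop as soon as a path already continues with letter: shorter suffixes do too.
    while node is not None and letter not in node.children:
        created = Node()
        node.children[letter] = created
        if previous is not None:
            previous.link = created
        previous = created
        node = node.link
    if previous is not None:
        previous.link = root if node is None else node.children[letter]
    return deepest.children[letter]


def build(text):
    root = Node()
    deepest = root
    for letter in text:
        deepest = extend(root, deepest, letter)
    return root


def contains(root, pattern):
    node = root
    for letter in pattern:
        if letter not in node.children:
            return False
        node = node.children[letter]
    return True


root = build("abaac")
print(contains(root, "baac"), contains(root, "ac"), contains(root, "ca"))
# True True False
```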
Ukkonen’s algorithm

Suffix links give us O(n^2) construction. Can we do better?

Yes.

Ukkonen’s algorithm uses more programming tricks to achieve O(n). We will not cover the
algorithm in this course.

Topic 11.5

Problems

Exercise: suffix tree

Exercise 11.7
Compute the suffix tree for abracadabra$. Compress degree 1 nodes. Use substrings as edge labels.
Put a square around nodes where a word ends. Use it to locate the occurrences of abr.

Exercise: worst-case suffix tree

Exercise 11.8
Review the argument that for a given text T, consisting of k words, the ordinary trie occupies
space which is a constant multiple of |T|. How is it that the suffix tree for a text T is of size
O(|T|^2)? Give a worst-case example.

Commentary: Milind notes

End of Lecture 11
