Week 2+3 TRIE (Student Copy)

The document describes how a trie data structure works for storing and searching strings. It allows for very fast string searches and insertions in time proportional to the length of the string.

Uploaded by

jubairahmed1678

TRIE

Prepared By
Lec Swapnil Biswas

TRIE
❏ A tree-based data structure (k-ary tree)

❏ Root is an empty node.

❏ (k=26) Each node can have up to 26 children (each child represents an alphabetic letter)

❏ Implemented as a linked data structure

❏ It allows for very fast searching and insertion operations

❏ The word TRIE comes from the word reTRIEval

❏ It refers to the quick retrieval of strings

❏ Used for storing strings, string matching, lexicographical sorting, etc.

WHY TRIE?
❑ Consider a database of strings

❑ Number of strings in the database is n

❑ What is the complexity of checking whether a given string x exists in the database?

❑ Ans: O(n × m), where m is the average length of the strings

❑ If the database is very big, finding a string in it will be time consuming
❑ Goal is to find a string x without any dependency on n
❑ TRIE solves this issue: it finds a string x in O(length(x)) time

❑ So no matter how big the database is, the time complexity of finding a string x remains O(length(x))

INSERT IN TRIE
❑ insert(“MIT”)
❑ insert(“MIST”)
❑ insert(“BUET”)
❑ insert(“MISTCE”)
❑ insert(“BUBT”)
❑ insert(“MISTME”)
❑ insert(“BUP”)
❑ insert(“CU”)
❑ insert(“MIST”) (ALREADY INSERTED)
[Figure: the TRIE after these insertions. Root has children B, C and M; the stored paths are B-U-B-T, B-U-E-T, B-U-P, C-U, M-I-T, M-I-S-T, M-I-S-T-C-E and M-I-S-T-M-E.]
❑ Is it possible to know the frequency of any string in the TRIE?
• NO
❑ But keeping a counter variable at each node can address this issue

INSERT IN TRIE (WITH COUNTER)
❑ insert(“MIT”)
❑ insert(“MIST”)
❑ insert(“BUET”)
❑ insert(“MISTCE”)
❑ insert(“BUBT”)
❑ insert(“MISTME”)
❑ insert(“BUP”)
❑ insert(“CU”)
❑ insert(“MIST”)
❑ insert(“BUP”)
[Figure: the same TRIE with a counter at each node where a word ends: MIT = 1, MIST = 2, BUET = 1, MISTCE = 1, BUBT = 1, MISTME = 1, BUP = 2, CU = 1.]

NODE REPRESENTATION
struct Node{
    int EoW;              // end-of-word counter
    Node *children[26];   // children[0] is A, children[1] is B, ..., children[25] is Z
};
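The struct above leaves its fields uninitialised. A minimal sketch of the same node with a constructor follows; the constructor and the `hasChild` helper are assumptions added for illustration and do not appear in the slides.

```cpp
#include <cassert>

// Sketch of a trie node, assuming only the uppercase letters 'A'..'Z' occur.
struct Node {
    int EoW;             // number of inserted words that end at this node
    Node *children[26];  // children[0] is 'A', ..., children[25] is 'Z'

    // Assumption: a fresh node has EoW = 0 and no children.
    Node() : EoW(0) {
        for (int i = 0; i < 26; i++) {
            children[i] = nullptr;
        }
    }
};

// Hypothetical helper (not in the slides): does u have a child for letter c?
bool hasChild(const Node *u, char c) {
    assert(c >= 'A' && c <= 'Z');  // uppercase-only assumption
    return u->children[c - 'A'] != nullptr;
}
```

Initialising every child pointer to null is what makes the "is children[r] NULL?" test during insertion and search meaningful.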

NODE REPRESENTATION
❑ insert(“CA”)
• Initially u = root; every new node starts with EoW = 0 and all 26 children NULL
• Iteration-1: does u have a child ‘C’ (u->children[2] != NULL)? NO, so v = new Node( ), u->children[2] = v, u = v
• Iteration-2: does u have a child ‘A’ (u->children[0] != NULL)? NO, so v = new Node( ), u->children[0] = v, u = v
• Iterations are completed: increment the EoW of u (u->EoW = u->EoW + 1), so EoW = 1 at the node for “CA”
❑ insert(“CZ”)
• u = root
• Iteration-1: does u have a child ‘C’ (u->children[2] != NULL)? YES, so push u down towards ‘C’: u = u->children[2]
• Iteration-2: does u have a child ‘Z’ (u->children[25] != NULL)? NO, so v = new Node( ), u->children[25] = v, u = v
• Iterations are completed: increment the EoW of u (u->EoW = u->EoW + 1), so EoW = 1 at the node for “CZ”
[Figure: each node drawn as an EoW field plus a children array indexed 0 to 25 (A to Z), with the pointers u and v moving down as described.]
INSERT IN TRIE
insert(x)
    Node pointer u ← root                  Initially pointing u at the root
    for k ← 0 to size(x) - 1               Iterates size(x) times
        r ← x[k] - 65                      r is the relative position of the current char
        if u->children[r] is NULL          No child for this char yet
            u->children[r] ← new Node( )   Creates a new node under children[r]
        u ← u->children[r]                 Pushes u down for the next iteration
    u->EoW ← u->EoW + 1                    Increments u->EoW after completing the iterations

Time complexity: O(|x|)
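The pseudocode above can be sketched in C++ as follows. This is a minimal version under stated assumptions: the root is passed as a parameter rather than being a global, the node uses default member initialisers, and the input contains only 'A' to 'Z'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Node {
    int EoW = 0;               // number of inserted words ending here
    Node *children[26] = {};   // all 26 child pointers start as nullptr
};

// Inserts x into the trie rooted at root in O(|x|) time.
// Assumption: root is not null and x contains only 'A'..'Z'.
void insert(Node *root, const std::string &x) {
    Node *u = root;                             // start at the root
    for (std::size_t k = 0; k < x.size(); k++) {
        assert(x[k] >= 'A' && x[k] <= 'Z');
        int r = x[k] - 'A';                     // same as x[k] - 65 in ASCII
        if (u->children[r] == nullptr) {
            u->children[r] = new Node();        // create the missing child
        }
        u = u->children[r];                     // push u down one level
    }
    u->EoW = u->EoW + 1;                        // one more word ends at u
}
```

Calling `insert(&root, "CA")` and then `insert(&root, "CZ")` on an empty root creates one shared node for 'C' with two children, matching the walkthrough on the NODE REPRESENTATION slide. Nodes are never freed in this sketch.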

SEARCH IN TRIE
[Figure: the TRIE built by the ten insert calls above, with counters MIT = 1, MIST = 2, BUET = 1, MISTCE = 1, BUBT = 1, MISTME = 1, BUP = 2, CU = 1.]

❑ search(“BUBT”)
• We reach a vertex with counter > 0
• Means “BUBT” exists

❑ search(“BRAC”)
• We reach NULL (‘B’ has no child ‘R’)
• Means “BRAC” doesn’t exist

❑ search(“MI”)
• We reach a node with counter = 0
• Means “MI” doesn’t exist

❑ search(“CUET”)
• We reach NULL (the node for “CU” has no child ‘E’)
• Means “CUET” doesn’t exist

❑ We don’t find a string in TRIE if
• The search ends at NULL
• The search ends at a node with counter = 0 (not the end of a word)
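The two failure cases above can be sketched as an iterative search that returns the counter of the node it ends on, or 0 when it runs into NULL. The iterative form and the `root` parameter are assumptions of this sketch; `insert` is repeated only so the block is self-contained.

```cpp
#include <cassert>
#include <string>

struct Node {
    int EoW = 0;               // number of inserted words ending here
    Node *children[26] = {};   // child pointers, initially nullptr
};

// Same insertion logic as the insert(x) pseudocode; uppercase input assumed.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char ch : x) {
        assert(ch >= 'A' && ch <= 'Z');
        int r = ch - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Returns how many times x was inserted (0 means "not found").
int search(const Node *root, const std::string &x) {
    const Node *u = root;
    for (char ch : x) {
        assert(ch >= 'A' && ch <= 'Z');
        u = u->children[ch - 'A'];
        if (u == nullptr) return 0;   // case 1: the search ends at NULL
    }
    return u->EoW;                    // case 2: 0 if x is only a prefix
}
```

With the ten insertions from the slides, `search(&root, "BUBT")` would return 1, while `"BRAC"`, `"MI"` and `"CUET"` would all return 0.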

METHODS
❑ void insert(string x)

❑ int search(string x)

❑ bool delete(string x)

❑ void lexSort( )
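The slides list a delete method but do not describe it. One possible sketch, under the assumption that deleting a word simply decrements its counter and leaves the nodes in place, is shown below. The names `removeWord` and `locate` are hypothetical (`delete` is a reserved word in C++, so it cannot be used as a function name).

```cpp
#include <cassert>
#include <string>

struct Node {
    int EoW = 0;               // number of inserted words ending here
    Node *children[26] = {};   // child pointers, initially nullptr
};

// Same insertion logic as the insert(x) pseudocode; uppercase input assumed.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char ch : x) {
        assert(ch >= 'A' && ch <= 'Z');
        int r = ch - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Hypothetical helper: the node reached by spelling x, or nullptr.
Node *locate(Node *root, const std::string &x) {
    Node *u = root;
    for (char ch : x) {
        assert(ch >= 'A' && ch <= 'Z');
        u = u->children[ch - 'A'];
        if (u == nullptr) return nullptr;
    }
    return u;
}

// Removes one occurrence of x. Returns false if x is not stored.
// Assumption: nodes are kept even when their counter drops to 0.
bool removeWord(Node *root, const std::string &x) {
    Node *u = locate(root, x);
    if (u == nullptr || u->EoW == 0) return false;
    u->EoW = u->EoW - 1;
    return true;
}
```

A fuller version might also free nodes that no longer lead to any word; that step is omitted here to keep the sketch short.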

RELATIVE POSITION OF A CHARACTER
❑ Assume the strings can only contain uppercase letters

❑ The relative position of a character is obtained by subtracting 65 (the ASCII code of ‘A’) from its ASCII code

Character  Relative Position    Character  Relative Position    Character  Relative Position

A          0                    J          9                    S          18
B          1                    K          10                   T          19
C          2                    L          11                   U          20
D          3                    M          12                   V          21
E          4                    N          13                   W          22
F          5                    O          14                   X          23
G          6                    P          15                   Y          24
H          7                    Q          16                   Z          25
I          8                    R          17

RELATIVE POSITION OF A CHARACTER
int relPos(char c){
    int ascii = (int) c;
    return ascii - 65;
}

SEARCH IN TRIE
find(x, Node pointer cur ← root, k ← 0)
    if cur is NULL
        return 0
    if k equals size(x)
        return cur->EoW
    r ← x[k] - 65
    return find(x, cur->children[r], k+1)

[Figure: a TRIE containing CU (EoW = 3), MIT (EoW = 4), MIST (EoW = 2), MISTCE (EoW = 5) and MISTME (EoW = 1); all other nodes have EoW = 0.]

❑ find(“MI”)
• find(“MI”, k=0) → find(“MI”, k=1) → find(“MI”, k=2) = 0
• The node for “MI” has EoW = 0: NOT FOUND

❑ find(“MIT”)
• find(“MIT”, k=0) → find(“MIT”, k=1) → find(“MIT”, k=2) → find(“MIT”, k=3) = 4
• The node for “MIT” has EoW = 4: FOUND 4 TIMES

❑ find(“CWC”)
• find(“CWC”, k=0) → find(“CWC”, k=1) → find(“CWC”, k=2) = 0
• ‘C’ has no child ‘W’, so cur is NULL: NOT FOUND
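The recursive find can be sketched in C++ as below. Passing the root explicitly instead of using it as a default argument, and the `insert` helper used to build the example, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Node {
    int EoW = 0;               // number of inserted words ending here
    Node *children[26] = {};   // child pointers, initially nullptr
};

// Same insertion logic as the insert(x) pseudocode; uppercase input assumed.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char ch : x) {
        assert(ch >= 'A' && ch <= 'Z');
        int r = ch - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Returns the number of times x was inserted; 0 means NOT FOUND.
int find(const std::string &x, const Node *cur, std::size_t k = 0) {
    if (cur == nullptr) {
        return 0;                    // fell off the trie
    }
    if (k == x.size()) {
        return cur->EoW;             // consumed all of x
    }
    assert(x[k] >= 'A' && x[k] <= 'Z');
    int r = x[k] - 'A';              // relative position of x[k]
    return find(x, cur->children[r], k + 1);
}
```

On a trie where "MIT" was inserted four times, `find("MIT", &root)` would return 4, while `find("MI", &root)` and `find("CWC", &root)` would return 0, as in the traces above.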

SEARCH IN TRIE (COMPLEXITY)
❑ The number of recursive calls is bounded by the length of the longest string in the TRIE, plus a constant
• Let the longest string in the TRIE be s
• So the time complexity of searching is O(|s|); for a query x it is also at most O(|x|)

LEXICOGRAPHICAL ORDER
❑ What are the strings stored in the TRIE?
BUBT
BUET
BUP
CU
MIST
MISTCE
MISTME
MIT
❑ Strings are sorted lexicographically
❑ Left to Right approach (Merging with parent)
[Figure: the TRIE with each node labelled by the prefix obtained by merging its letter with its parent's label: B, BU, BUB, BUBT, BUE, BUET, BUP, C, CU, M, MI, MIS, MIST, MISTC, MISTCE, MISTM, MISTME, MIT.]
LEXICOGRAPHICAL ORDER
void printTRIE(Node *cur = root, string s = "")
{
    // If the pointer reaches the end of a word (counter > 0), the word is printed
    if(cur->EoW > 0)
    {
        cout << s << endl;
    }
    // Traverse all the edges of a node from left to right
    for(int i = 0; i < 26; i++)
    {
        // Call the function recursively only for existing children,
        // so for a leaf node no recursive call is made
        if(cur->children[i] != NULL)
        {
            char c = char(i + 65);
            printTRIE(cur->children[i], s + c);
        }
    }
}
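To check the left-to-right traversal end to end, the sketch below builds a trie from a list of words and collects the output in a string. Writing to a `std::ostream` parameter instead of `cout`, and the wrapper `lexSort` taking a vector, are assumptions made so the result can be inspected; the slides only declare `void lexSort( )`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Node {
    int EoW = 0;               // number of inserted words ending here
    Node *children[26] = {};   // child pointers, initially nullptr
};

// Same insertion logic as the insert(x) pseudocode; uppercase input assumed.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char ch : x) {
        assert(ch >= 'A' && ch <= 'Z');
        int r = ch - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Writes every stored word once, in lexicographical order, one per line.
void printTRIE(const Node *cur, const std::string &s, std::ostream &out) {
    if (cur->EoW > 0) {
        out << s << '\n';                       // s is a complete word
    }
    for (int i = 0; i < 26; i++) {              // children from 'A' to 'Z'
        if (cur->children[i] != nullptr) {
            printTRIE(cur->children[i], s + char('A' + i), out);
        }
    }
}

// Hypothetical wrapper: sorts words by inserting them and traversing the trie.
// Nodes are not freed in this sketch.
std::string lexSort(const std::vector<std::string> &words) {
    Node root;
    for (const std::string &w : words) insert(&root, w);
    std::ostringstream out;
    printTRIE(&root, "", out);
    return out.str();
}
```

Because a word is written only when its node is visited, a string inserted several times still appears once; printing it `EoW` times instead would give a sort that preserves duplicates.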

Thank You!
