Week 2+3 TRIE (Student Copy)
Prepared By
Lec Swapnil Biswas
TRIE
❏ A tree-based data structure (k-ary tree)
❏ (k = 26) Each node has up to 26 children (each child represents one letter of the alphabet)
WHY TRIE?
❑ Consider a database of n strings
❑ What is the complexity of checking whether a given string x exists in the database?
❑ If the database is very large, a search that depends on n will be time-consuming
❑ The goal is to find a string x without any dependency on n
❑ A TRIE solves this by finding a string x in O(length(x)) time
❑ So no matter how large the database is, the time complexity of finding a string x remains O(length(x))
INSERT IN TRIE
❑ insert(“MIT”)
❑ insert(“MIST”)
❑ insert(“BUET”)
❑ insert(“MISTCE”)
❑ insert(“BUBT”)
❑ insert(“MISTME”)
❑ insert(“BUP”)
❑ insert(“CU”)
❑ insert(“MIST”) (already inserted, so no new node is created)
[Figure: TRIE after the insertions. Root has children B, C and M; the stored paths are B-U-B-T, B-U-E-T, B-U-P, C-U, M-I-T, M-I-S-T, M-I-S-T-C-E and M-I-S-T-M-E]
❑ Is it possible to know the frequency of any string in the TRIE?
• NO
❑ But keeping a counter variable at each node can address this issue
INSERT IN TRIE (WITH COUNTER)
❑ insert(“MIT”)
❑ insert(“MIST”)
❑ insert(“BUET”)
❑ insert(“MISTCE”)
❑ insert(“BUBT”)
❑ insert(“MISTME”)
❑ insert(“BUP”)
❑ insert(“CU”)
❑ insert(“MIST”)
❑ insert(“BUP”)
[Figure: the same TRIE with a counter at each node where a word ends. Final counters: BUBT = 1, BUET = 1, BUP = 2, CU = 1, MIT = 1, MIST = 2, MISTCE = 1, MISTME = 1]
NODE REPRESENTATION
struct Node{
    int EoW;              // End of Word counter: number of words that end at this node
    Node *children[26];   // children[0] is 'A', children[1] is 'B', ..., children[25] is 'Z'
};
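The node layout above can be tried out with a small sketch. The constructor and the helper `hasChild` are assumptions added for illustration (neither appears in the slides); the constructor only makes sure a fresh node starts with EoW = 0 and all 26 children set to NULL.

```cpp
#include <cassert>

// Sketch of a TRIE node, assuming uppercase-only keys as in the slides.
struct Node {
    int EoW;             // how many inserted words end at this node
    Node *children[26];  // children[0] is 'A', ..., children[25] is 'Z'

    // Constructor (an addition, not in the slide): no words, no children.
    Node() : EoW(0) {
        for (int i = 0; i < 26; i++) {
            children[i] = nullptr;
        }
    }
};

// Hypothetical helper: true if node u has an edge labelled with letter c.
bool hasChild(const Node *u, char c) {
    assert(c >= 'A' && c <= 'Z');          // uppercase-only assumption
    return u->children[c - 'A'] != nullptr;
}
```

Without such initialization the pointers of a newly created node would hold arbitrary values, and a check like u->children[2] != NULL would not be reliable.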
NODE REPRESENTATION
❑ insert(“CA”)
❑ insert(“CZ”)
❑ Start with u = root (EoW = 0, all children NULL)
❑ In each iteration, ask: does u have a child for the current letter?
• For ‘C’: u->children[2] != NULL
• For ‘A’: u->children[0] != NULL
• For ‘Z’: u->children[25] != NULL
❑ If NO, create the child: v = new Node( ), then u->children[2] = v (shown here for ‘C’)
❑ If YES, v is the existing child: v = u->children[2]
❑ Push u down towards the letter: u = v
❑ When the iterations are completed, increment EoW of u: u->EoW = u->EoW + 1
[Figure: after both insertions, root has a child at index 2 (‘C’) with EoW = 0; that node has children at index 0 (‘A’) and index 25 (‘Z’), each with EoW = 1]
INSERT IN TRIE
insert(x)
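The slide body for insert(x) is not available here, so the following is only a sketch of how insertion could look, pieced together from the u / v walk on the NODE REPRESENTATION slides. Passing `root` as a parameter and using default member initializers are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

struct Node {
    int EoW = 0;              // number of inserted words ending at this node
    Node *children[26] = {};  // all 26 slots start as nullptr
};

// Walks down from root one letter at a time, creating missing nodes on the
// way, then increments the counter of the node where the word ends.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char c : x) {
        assert(c >= 'A' && c <= 'Z');     // uppercase-only assumption
        int r = c - 'A';                  // relative position, same as c - 65
        if (u->children[r] == nullptr) {
            u->children[r] = new Node();  // v = new Node(); u->children[r] = v
        }
        u = u->children[r];               // push u down towards letter c
    }
    u->EoW = u->EoW + 1;                  // one more word ends at u
}
```

The loop runs once per character, so an insertion costs O(length(x)) regardless of how many strings are already stored. Nodes are never freed in this sketch.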
SEARCH IN TRIE
❑ insert(“MIT”)
❑ insert(“MIST”)
❑ insert(“BUET”)
❑ insert(“MISTCE”)
❑ insert(“BUBT”)
❑ insert(“MISTME”)
❑ insert(“BUP”)
❑ insert(“CU”)
❑ insert(“MIST”)
❑ insert(“BUP”)
❑ search(“BUBT”)
• We follow B, U, B, T and reach a node with counter = 1
• Means “BUBT” exists
[Figure: TRIE with counters built by the insertions above]
SEARCH IN TRIE
❑ Same TRIE as on the previous slide
❑ search(“BRAC”)
• We reach NULL (the node B has no child R)
• Means “BRAC” doesn’t exist
SEARCH IN TRIE
❑ Same TRIE as on the previous slide
❑ search(“MI”)
• We reach a node with counter = 0 (the node I under M)
• Means “MI” doesn’t exist (it is only a prefix of stored strings)
SEARCH IN TRIE
❑ Same TRIE as on the previous slide
❑ search(“CUET”)
• We reach NULL (the node U under C has no child E)
• Means “CUET” doesn’t exist
SEARCH IN TRIE
❑ We don’t find a string in TRIE if
• we reach NULL before the whole string is consumed, or
• we finish at a node with counter = 0
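The two failure cases can be made concrete with an iterative sketch. This is one possible way to write search, not necessarily the version intended in the lecture; taking `root` as a parameter and the repeated `insert` helper are assumptions made so the example stands alone.

```cpp
#include <cassert>
#include <string>

struct Node {
    int EoW = 0;              // number of inserted words ending at this node
    Node *children[26] = {};  // all 26 slots start as nullptr
};

// Same insertion walk as described on the earlier slides.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char c : x) {
        assert(c >= 'A' && c <= 'Z');
        int r = c - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Returns how many times x was inserted; 0 means x is not stored.
int search(const Node *root, const std::string &x) {
    const Node *u = root;
    for (char c : x) {
        assert(c >= 'A' && c <= 'Z');  // uppercase-only assumption
        u = u->children[c - 'A'];
        if (u == nullptr) {
            return 0;                  // case 1: we reach NULL
        }
    }
    return u->EoW;                     // case 2: 0 here means x is only a prefix
}
```

With the ten insertions from the slides, search("BRAC") and search("CUET") stop at NULL, search("MI") ends at a node whose counter is 0, and search("MIST") returns 2.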
METHODS
❑ void insert(string x)
❑ int search(string x)
❑ bool delete(string x)
❑ void lexSort( )
RELATIVE POSITION OF A CHARACTER
❑ Assume the strings contain only uppercase letters
int relPos(char c){
    int ascii = (int) c;
    return ascii - 65;   // 65 is the ASCII code of 'A', so 'A' maps to 0 and 'Z' to 25
}
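As a quick check of the mapping, here is a small sketch of relPos together with its inverse. The name `letterAt` and the range assertions are assumptions added for illustration; the inverse is the same computation as char(i + 65).

```cpp
#include <cassert>

// Maps 'A'..'Z' to 0..25; 'A' has ASCII code 65.
int relPos(char c) {
    assert(c >= 'A' && c <= 'Z');  // uppercase-only assumption
    return c - 'A';
}

// Hypothetical inverse: maps 0..25 back to 'A'..'Z'.
char letterAt(int pos) {
    assert(pos >= 0 && pos < 26);
    return char('A' + pos);
}
```

For example, relPos('C') is 2, which is why the insertion walk checks children[2] for the letter C.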
SEARCH IN TRIE
find(x, Node pointer cur ← root, k ← 0)
    if cur is NULL
        return 0
    if k equals size(x)
        return cur->EoW
    r ← x[k] - 65
    return find(x, cur->children[r], k+1)
❑ find(“MI”)
• find(“MI”, k=0) = 0 (cur = Root)
• find(“MI”, k=1) = 0 (cur = M)
• find(“MI”, k=2) = 0 (cur = I, EoW = 0)
• NOT FOUND
[Figure: TRIE with the EoW value at each node: CU = 3, MIT = 4, MIST = 2, MISTCE = 5, MISTME = 1, all other nodes 0]
SEARCH IN TRIE
❑ find(“MIT”), using the same find procedure and TRIE
• find(“MIT”, k=0) = 4 (cur = Root)
• find(“MIT”, k=1) = 4 (cur = M)
• find(“MIT”, k=2) = 4 (cur = I)
• find(“MIT”, k=3) = 4 (cur = T, EoW = 4)
• FOUND 4 TIMES
SEARCH IN TRIE
❑ find(“CWC”), using the same find procedure and TRIE
• find(“CWC”, k=0) = 0 (cur = Root)
• find(“CWC”, k=1) = 0 (cur = C)
• find(“CWC”, k=2) = 0 (cur is NULL because C has no child W)
• NOT FOUND
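The recursive find pseudocode can be turned into C++ roughly as follows. This is a sketch under a few assumptions: `cur` is passed explicitly instead of defaulting to a global root, the `insert` helper is repeated so the example is self-contained, and the uppercase check is an addition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Node {
    int EoW = 0;              // number of inserted words ending at this node
    Node *children[26] = {};  // all 26 slots start as nullptr
};

// Same insertion walk as described on the earlier slides.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char c : x) {
        assert(c >= 'A' && c <= 'Z');
        int r = c - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Returns the EoW counter of the node reached by x, or 0 if the path breaks.
int find(const std::string &x, const Node *cur, std::size_t k = 0) {
    if (cur == nullptr) {
        return 0;              // fell off the TRIE: x is not stored
    }
    if (k == x.size()) {
        return cur->EoW;       // whole string consumed: report the counter
    }
    assert(x[k] >= 'A' && x[k] <= 'Z');
    int r = x[k] - 'A';        // same as x[k] - 65
    return find(x, cur->children[r], k + 1);
}
```

Calling find("MIT", &root) on a TRIE where MIT was inserted four times returns 4, while find("MI", &root) and find("CWC", &root) return 0, matching the traces above.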
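SEARCH IN TRIE (COMPLEXITY)
❑ Each recursive call consumes one character of x and moves one level down, so the number of calls cannot exceed length(x) + 1
❑ The recursion also stops as soon as it reaches NULL, so it never goes deeper than the longest string in the TRIE
• Let the longest string in the TRIE be s
• So the time complexity of searching is O(min(|x|, |s|)), which is at most O(|s|) and independent of the number of stored strings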
LEXICOGRAPHICAL ORDER
❑ What are the strings stored in the TRIE?
• BUBT
• BUET
• BUP
• CU
• MIST
• MISTCE
• MISTME
• MIT
❑ Strings are sorted lexicographically
❑ Left to Right approach (Merging with parent)
[Figure: TRIE in which every node is labelled with the prefix obtained by merging its letter with its parent's label, e.g. B, BU, BUB, BUBT; reading the word nodes from left to right gives the sorted list above]
LEXICOGRAPHICAL ORDER
void printTRIE(Node *cur = root, string s = "")
{
    // Base case: if the pointer reaches the end of a word (EoW > 0),
    // then the word is printed
    if (cur->EoW > 0)
    {
        cout << s << endl;
    }
    // Traverse all the edges of a node from left to right ('A' to 'Z')
    for (int i = 0; i < 26; i++)
    {
        // Call the function recursively only for existing children,
        // so for a leaf node no recursive call is made
        if (cur->children[i] != NULL)
        {
            char c = char(i + 65);
            printTRIE(cur->children[i], s + c);
        }
    }
}
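To check the traversal order without reading console output, the same left-to-right walk can collect the words into a vector instead of printing them. This is a sketch, not the lecture's lexSort( ): the name `collectSorted`, the explicit `root` parameter and the repeated `insert` helper are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    int EoW = 0;              // number of inserted words ending at this node
    Node *children[26] = {};  // all 26 slots start as nullptr
};

// Same insertion walk as described on the earlier slides.
void insert(Node *root, const std::string &x) {
    Node *u = root;
    for (char c : x) {
        assert(c >= 'A' && c <= 'Z');
        int r = c - 'A';
        if (u->children[r] == nullptr) u->children[r] = new Node();
        u = u->children[r];
    }
    u->EoW = u->EoW + 1;
}

// Appends every distinct word stored below cur to out, in lexicographic order.
// s holds the prefix spelled by the path from the root to cur.
void collectSorted(const Node *cur, std::string &s, std::vector<std::string> &out) {
    if (cur->EoW > 0) {
        out.push_back(s);             // a word ends here; record it once
    }
    for (int i = 0; i < 26; i++) {    // left to right: 'A' first, 'Z' last
        if (cur->children[i] != nullptr) {
            s.push_back(char('A' + i));
            collectSorted(cur->children[i], s, out);
            s.pop_back();             // undo the letter before trying the next edge
        }
    }
}
```

A word is recorded before its children are visited, so a stored prefix such as MIST comes out ahead of MISTCE and MISTME. Pushing s once per unit of EoW instead would list repeated words as many times as they were inserted.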
Thank You!