
Tries

A Trie is a multiway tree data structure used for efficiently storing and searching strings, allowing for operations like prefix searching and autocomplete. It offers advantages over hash tables, such as efficient prefix searches and alphabetical ordering of words, while maintaining a time complexity of O(M) for search operations, where M is the length of the key. Tries are commonly used in applications like spell checkers and autocomplete features, optimizing search and retrieval processes.

Uploaded by

OML series
Tries

What is Trie?
• A trie is a type of k-ary search tree used for storing a set of keys and searching for a specific key in it. With a trie, the search cost can be brought down to the optimal limit, which is proportional to the key length.
• A trie (the name is derived from reTRIEval, which means finding something or obtaining it) is a multiway tree data structure used for storing a collection of strings over an alphabet. It can store a large number of strings, and pattern matching can be done efficiently on it.
• A trie is also known as a digital tree or prefix tree.
• A trie has the property that if two strings have a common prefix, then they share the same ancestors in the trie (the nodes along that prefix). A trie can be used to sort a collection of strings alphabetically, as well as to check whether a string with a given prefix is present in the trie.
• For example, a trie may hold words like allot, alone, ant, and, are, bat, bad. The idea is that all strings sharing a common prefix descend from a common node. Tries are used in spell-checking programs.
What is Trie?
• Preprocessing the pattern improves the performance of a pattern matching algorithm. But if the text is very large, it is better to preprocess the text instead of the pattern for efficient search.
• A trie is a data structure that supports pattern matching queries in time proportional to the pattern size.
• If we store keys in a binary search tree, a well-balanced BST needs time proportional to M * log N for a search, where M is the maximum string length and N is the number of keys in the tree (about log N key comparisons, each costing up to M character comparisons). Using a trie, a key can be searched in O(M) time. However, the penalty is the trie's storage requirements.
Need for Trie Data Structure?
• A Trie data structure is used for storing and retrieving data. The same operations can be done with another data structure, the hash table, but a Trie can perform some of them, such as prefix-based operations, more efficiently than a hash table.
• Moreover, a Trie has its own advantage over the hash table: a Trie can be used for prefix-based searching, whereas a hash table cannot be used in the same way.
Advantages of Trie Data Structure
over a Hash Table
• A trie data structure has the following advantages over a hash table:
– We can efficiently do prefix search (or auto-complete) with a Trie.
– We can easily print all words in alphabetical order, which is not easily possible with hashing.
– There is no overhead of hash functions in a Trie data structure.
– Searching for a string, even in a large collection of strings, can be done in O(L) time in a Trie, where L is the number of characters in the query string. The search can stop even earlier, after fewer than L steps, if the query string does not exist in the trie.
Properties of a Trie Data Structure
• Below are some important properties of the Trie data structure:
– There is one root node in each Trie.
– Each node of a Trie represents a string (the prefix spelled out by the path from the root) and each edge represents a character.
– Every node consists of a hashmap or an array of pointers, with each index representing a character, and a flag to indicate whether any string ends at the current node.
– A Trie can contain any characters, including letters, digits, and special characters. Here we discuss only strings with characters a-z. Therefore, only 26 pointers are needed for every node, where the 0th index represents ‘a’ and the 25th index represents ‘z’.
– Each path from the root to a node spells out a prefix; it represents a complete stored word when the flag of that node is set.
How does Trie Data Structure work?
• A Trie can contain any characters, including letters, digits, and special characters.
• Here only strings with characters a-z are considered. Therefore, only 26 pointers are needed for every node, where the 0th index represents ‘a’ and the 25th index represents ‘z’.
• Any lowercase English word can start with a-z, then the next letter of the word could be a-z, the third letter of the word again could be a-z, and so on.
• So for storing a word, we need an array (container) of size 26. Initially all its entries are empty, as there are no words yet, and it will look as shown below.

An array of pointers inside every Trie node


How does Trie Data Structure work?
• Let’s see how the words “and” and “ant” are stored in the Trie data structure:
1. Store “and” in the Trie data structure:
– The word “and” starts with “a”, so we mark the position “a” as filled in the root node, which represents the use of “a”.
– After placing the first character, there are again 26 possibilities for the second character, so from “a” there is another array of size 26 for storing the 2nd character.
– The second character is “n”, so from “a” we move to “n” and mark “n” in the 2nd array as used.
– After “n”, the 3rd character is “d”, so mark the position “d” as used in the respective array. Since “d” is the last character, the word is recorded as ending at this node.
How does Trie Data Structure work?
2. Store “ant” in the Trie data structure:
– The word “ant” starts with “a”, and the position of “a” in the root node has already been filled. So there is no need to fill it again; just move to the node ‘a’ in the Trie.
– For the second character ‘n’, we can observe that the position of ‘n’ in the ‘a’ node has already been filled. So there is no need to fill it again; just move to the node ‘n’ in the Trie.
– For the last character ‘t’ of the word, the position for ‘t’ in the ‘n’ node is not filled. So fill the position of ‘t’ in the ‘n’ node and move to the ‘t’ node.
How does Trie Data Structure work?
After storing the words “and” and “ant”, the Trie contains a single path root → a → n, which branches at the node ‘n’ into the two children ‘d’ and ‘t’.
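The two insertions above can be sketched in a few lines to show how much of the path is reused. This is only an illustration under the a-z assumption of the slides; `Node`, `insertCountingNewNodes`, and `countNodes` are hypothetical names that do not appear in the original material.

```cpp
#include <string>

// Hypothetical minimal node mirroring the slides: 26 child slots
// for 'a'..'z' and a counter of words ending at the node.
struct Node {
    Node* child[26] = {};
    int wordCount = 0;
};

// Insert a lowercase word (assumes characters in 'a'..'z') and
// return how many new nodes had to be created for it.
int insertCountingNewNodes(Node* root, const std::string& word) {
    Node* cur = root;
    int created = 0;
    for (char c : word) {
        int i = c - 'a';
        if (cur->child[i] == nullptr) {
            cur->child[i] = new Node();
            created++;
        }
        cur = cur->child[i];
    }
    cur->wordCount++;
    return created;
}

// Count every node in the trie, including the root.
int countNodes(const Node* node) {
    int total = 1;
    for (int i = 0; i < 26; i++) {
        if (node->child[i] != nullptr) total += countNodes(node->child[i]);
    }
    return total;
}
```

Under these assumptions, inserting “and” into an empty trie creates three nodes, and inserting “ant” afterwards creates only one, because the path a → n is reused. Nodes are never freed in this sketch.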
Representation of Trie node
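• Every Trie node consists of an array of child pointers (or a hashmap from characters to child nodes) and a flag that records whether a word ends at that node. If the words contain only lower-case letters (i.e. a-z), we can define the Trie node with an array instead of a hashmap. In the definition below, the counter wordCount plays the role of the flag.
#include <string> // std::string is used by the operations that follow
using namespace std;

const int ALPHABET_SIZE = 26; // lower-case letters a-z only

struct TrieNode
{
    // childNode[i] is the child for character 'a' + i, or NULL.
    TrieNode* childNode[ALPHABET_SIZE] = {};
    // Number of inserted strings that end exactly at this node.
    int wordCount = 0;
};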
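For alphabets larger than a-z, the hashmap alternative mentioned above might look like the following sketch. It is an assumption-laden illustration, not the representation used in the rest of these slides: `MapTrieNode`, `mapInsert`, and `mapSearch` are hypothetical names, and `std::unique_ptr` is used here so that child nodes are released automatically.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical map-based node: children are created only for the
// characters that actually occur, so any char value can be stored.
struct MapTrieNode {
    std::unordered_map<char, std::unique_ptr<MapTrieNode>> childNode;
    int wordCount = 0; // number of inserted strings ending at this node
};

// Insert a key, creating missing nodes along its path.
void mapInsert(MapTrieNode* root, const std::string& key) {
    MapTrieNode* cur = root;
    for (char c : key) {
        std::unique_ptr<MapTrieNode>& slot = cur->childNode[c];
        if (!slot) slot.reset(new MapTrieNode());
        cur = slot.get();
    }
    cur->wordCount++;
}

// Return true only if the exact key was inserted before.
bool mapSearch(const MapTrieNode* root, const std::string& key) {
    const MapTrieNode* cur = root;
    for (char c : key) {
        auto it = cur->childNode.find(c);
        if (it == cur->childNode.end()) return false;
        cur = it->second.get();
    }
    return cur->wordCount > 0;
}
```

The trade-off is the usual one: the map avoids 26 mostly empty slots per node, at the price of a hash lookup per character instead of a direct array index.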
Basic Operations on Trie Data
Structure
• Insertion
• Search
• Deletion
1. Insertion in Trie Data Structure
• This operation is used to insert new strings into the Trie data structure
• Let us try to Insert “and” & “ant” in this Trie:

The words “and” & “ant” share some common nodes (i.e. “an”). This is because of the property of the Trie data structure that if two strings have a common prefix then they will have the same ancestor in the trie.
1. Insertion in Trie Data Structure
• Now let us try to Insert “dad” & “do”:
Algorithm
1. Define a function insert_key(root, key) that takes two parameters: the root of the Trie and the string that we want to insert into the Trie data structure.
2. Take another pointer, currentNode, and initialize it with the root node.
3. Iterate over the characters of the given string. For each character, check whether the entry for that character in the current node's array of pointers is NULL.
    1. If it is NULL, make a new node and point the entry for the current character to this newly created node.
    2. Move currentNode to the node for the current character, whether it already existed or was just created.
4. Finally, increment the wordCount of the last currentNode; this indicates that a string ends at currentNode.
void insert_key(TrieNode* root, const string& key)
{
    // Start at the root node.
    TrieNode* currentNode = root;

    // Walk the key one character at a time.
    for (auto c : key) {
        // Index of the current character (assumes 'a'..'z').
        int index = c - 'a';

        // If no node exists for this character, create one
        // and link it from the current node.
        if (currentNode->childNode[index] == NULL) {
            currentNode->childNode[index] = new TrieNode();
        }

        // Move to the child node, whether it already existed
        // or was just created.
        currentNode = currentNode->childNode[index];
    }

    // Increment wordCount on the last node to record that
    // a string ends at currentNode.
    currentNode->wordCount++;
}
2. Searching in Trie Data Structure
• The search operation in a Trie is performed in a similar way to the insertion operation. The only difference is that whenever the array of pointers in the current node has no entry for the current character of the word, we return false instead of creating a new node for that character.
• This operation is used to check whether a string is present in the Trie data structure. There are two search approaches in the Trie data structure:
– Find whether the given word exists in the Trie.
– Find whether any word that starts with the given prefix exists in the Trie.
• Both approaches follow a similar search pattern. The first step in searching for a given word is to compare the word character by character against the trie nodes, starting from the root node. If the current character is present in the node, move forward to the corresponding child. Repeat this process until all characters are found.
2.1 Searching Prefix in Trie Data Structure
• Search for the prefix “an” in the Trie Data Structure.
Implementation of Prefix Search in Trie
data structure
bool isPrefixExist(TrieNode* root, const string& key)
{
    // Start at the root node.
    TrieNode* currentNode = root;

    // Walk the prefix one character at a time.
    for (auto c : key) {
        // Index of the current character (assumes 'a'..'z').
        int index = c - 'a';

        // If there is no node for this character, no stored
        // word starts with the given prefix.
        if (currentNode->childNode[index] == NULL) {
            return false;
        }

        // Move to the existing node for the current character.
        currentNode = currentNode->childNode[index];
    }

    // Every character was found, so the prefix exists in the Trie.
    return true;
}
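Prefix search is the building block of the autocomplete use case: once the node for the prefix is reached, every word below it is a suggestion. The following is a minimal sketch under the a-z assumption; `Node`, `insertWord`, `collect`, and `autocomplete` are hypothetical names introduced only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal node: 26 child slots and an end-of-word flag.
struct Node {
    Node* child[26] = {};
    bool isEnd = false;
};

// Insert a lowercase word (assumes characters in 'a'..'z').
void insertWord(Node* root, const std::string& word) {
    Node* cur = root;
    for (char c : word) {
        int i = c - 'a';
        if (cur->child[i] == nullptr) cur->child[i] = new Node();
        cur = cur->child[i];
    }
    cur->isEnd = true;
}

// Depth-first walk below `node`. Children are visited from 'a' to 'z',
// so the words are appended to `out` in alphabetical order.
void collect(const Node* node, std::string& current,
             std::vector<std::string>& out) {
    if (node->isEnd) out.push_back(current);
    for (int i = 0; i < 26; i++) {
        if (node->child[i] != nullptr) {
            current.push_back(static_cast<char>('a' + i));
            collect(node->child[i], current, out);
            current.pop_back();
        }
    }
}

// Return every stored word that starts with `prefix`.
std::vector<std::string> autocomplete(const Node* root,
                                      const std::string& prefix) {
    const Node* cur = root;
    for (char c : prefix) {
        int i = c - 'a';
        if (i < 0 || i >= 26 || cur->child[i] == nullptr) return {};
        cur = cur->child[i];
    }
    std::vector<std::string> out;
    std::string current = prefix;
    collect(cur, current, out);
    return out;
}
```

Calling `autocomplete` with an empty prefix lists every stored word in alphabetical order, which is the ordered-iteration property that a hash table does not offer directly. Nodes are not freed in this sketch.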
2.2 Searching Complete word in Trie
Data Structure
• It is similar to prefix search, but additionally we have to check whether a word ends at the node reached by the last character of the word.
Implementation of Search in Trie data
structure
bool search_key(TrieNode* root, const string& key)
{
    // Start at the root node.
    TrieNode* currentNode = root;

    // Walk the key one character at a time.
    for (auto c : key) {
        // Index of the current character (assumes 'a'..'z').
        int index = c - 'a';

        // If there is no node for this character, the given
        // word does not exist in the Trie.
        if (currentNode->childNode[index] == NULL) {
            return false;
        }

        // Move to the existing node for the current character.
        currentNode = currentNode->childNode[index];
    }

    // The word exists only if some string ends at this node.
    return (currentNode->wordCount > 0);
}
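The difference between the two searches is easiest to see when both share one traversal. The sketch below factors the walk into a helper and assumes lowercase a-z keys; `Node`, `insertWord`, `walk`, `hasPrefix`, and `containsWord` are hypothetical names, not functions from the slides.

```cpp
#include <string>

// Hypothetical minimal node mirroring the slides.
struct Node {
    Node* child[26] = {};
    int wordCount = 0;
};

// Insert a lowercase word (assumes characters in 'a'..'z').
void insertWord(Node* root, const std::string& word) {
    Node* cur = root;
    for (char c : word) {
        int i = c - 'a';
        if (cur->child[i] == nullptr) cur->child[i] = new Node();
        cur = cur->child[i];
    }
    cur->wordCount++;
}

// Follow `key` from the root. Returns the node reached, or nullptr
// if the path breaks before the key is exhausted.
const Node* walk(const Node* root, const std::string& key) {
    const Node* cur = root;
    for (char c : key) {
        int i = c - 'a';
        if (i < 0 || i >= 26 || cur->child[i] == nullptr) return nullptr;
        cur = cur->child[i];
    }
    return cur;
}

// Prefix search: it is enough that the path exists.
bool hasPrefix(const Node* root, const std::string& key) {
    return walk(root, key) != nullptr;
}

// Complete-word search: the path must exist AND a word must end there.
bool containsWord(const Node* root, const std::string& key) {
    const Node* node = walk(root, key);
    return node != nullptr && node->wordCount > 0;
}
```

With only “and” and “ant” stored, “an” is found by the prefix search but not by the complete-word search, since no word ends at the node ‘n’.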
3. Deletion in Trie Data Structure
• This operation is used to delete strings from
the Trie data structure. There are three cases
when deleting a word from Trie.
1. The deleted word is a prefix of other words in
Trie.
2. The deleted word shares a common prefix with
other words in Trie.
3. The deleted word does not share any common
prefix with other words in Trie.
3.1 The deleted word is a prefix of
other words in Trie
• The deleted word “an” is a complete prefix of the other words “and” and “ant”.

An easy solution to perform a delete operation for this case is to just decrement the wordCount by 1 at the ending node of the word.
3.2 The deleted word shares a common
prefix with other words in Trie
• The deleted word “and” has a common prefix with another word, “ant”. They share the prefix “an”.

The solution for this case is to delete all the nodes starting from the end of the common prefix to the last character of the given word.
3.3 The deleted word does not share any
common prefix with other words in Trie
• The word “geek” does not share any common prefix with any other word.

The solution for this case is just to delete all the nodes of the word.
Implementation of all the cases
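bool delete_key(TrieNode* root, const string& word)
{
    // An empty word is not handled here (keys are assumed non-empty).
    if (word.empty())
        return false;

    TrieNode* currentNode = root;
    // Deepest node on the path that must be kept, and the character
    // leading from it to the part of the path used only by `word`.
    TrieNode* lastBranchNode = NULL;
    char lastBranchChar = 'a';

    for (auto c : word) {
        int index = c - 'a'; // assumes 'a'..'z'

        // The word is not in the Trie.
        if (currentNode->childNode[index] == NULL) {
            return false;
        }

        // Count the children of the current node.
        int count = 0;
        for (int i = 0; i < 26; i++) {
            if (currentNode->childNode[i] != NULL)
                count++;
        }

        // Keep this node if it branches or if another word ends here.
        if (count > 1 || currentNode->wordCount > 0) {
            lastBranchNode = currentNode;
            lastBranchChar = c;
        }
        currentNode = currentNode->childNode[index];
    }

    // The path exists but no word ends here: nothing to delete.
    if (currentNode->wordCount == 0)
        return false;

    // Count the children of the last node of the word.
    int count = 0;
    for (int i = 0; i < 26; i++) {
        if (currentNode->childNode[i] != NULL)
            count++;
    }

    // Case 1: The deleted word is a prefix of other words
    // in Trie (or was inserted more than once).
    if (count > 0 || currentNode->wordCount > 1) {
        currentNode->wordCount--;
        return true;
    }

    // Case 2: The deleted word shares a common prefix with
    // other words in Trie. Unlink the part used only by `word`.
    // Note: the unlinked nodes are not freed here.
    if (lastBranchNode != NULL) {
        lastBranchNode->childNode[lastBranchChar - 'a'] = NULL;
        return true;
    }
    // Case 3: The deleted word does not share any common
    // prefix with other words in Trie.
    else {
        root->childNode[word[0] - 'a'] = NULL;
        return true;
    }
}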
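The three cases can also be handled by a single recursive routine that frees the nodes it removes. This is an alternative sketch, not the method shown in the slides; it assumes lowercase a-z keys, and `Node`, `insertWord`, `hasChildren`, `eraseFrom`, `eraseWord`, and `countNodes` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal node mirroring the slides.
struct Node {
    Node* child[26] = {};
    int wordCount = 0;
};

// Insert a lowercase word (assumes characters in 'a'..'z').
void insertWord(Node* root, const std::string& word) {
    Node* cur = root;
    for (char c : word) {
        int i = c - 'a';
        if (cur->child[i] == nullptr) cur->child[i] = new Node();
        cur = cur->child[i];
    }
    cur->wordCount++;
}

bool hasChildren(const Node* node) {
    for (int i = 0; i < 26; i++) {
        if (node->child[i] != nullptr) return true;
    }
    return false;
}

// Remove one occurrence of `word` below `node`. Returns true if `node`
// is no longer needed (no word ends at it and it has no children), so
// the caller may free it. `removed` reports whether the word was found.
bool eraseFrom(Node* node, const std::string& word, std::size_t depth,
               bool& removed) {
    if (depth == word.size()) {
        if (node->wordCount == 0) return false; // word is not stored
        node->wordCount--;
        removed = true;
    } else {
        int i = word[depth] - 'a';
        Node* next = node->child[i];
        if (next == nullptr) return false;      // word is not stored
        if (eraseFrom(next, word, depth + 1, removed)) {
            delete next;                        // prune the unused node
            node->child[i] = nullptr;
        }
    }
    return removed && node->wordCount == 0 && !hasChildren(node);
}

// Delete `word`; the root itself is never freed.
bool eraseWord(Node* root, const std::string& word) {
    bool removed = false;
    eraseFrom(root, word, 0, removed);
    return removed;
}

// Count every node in the trie, including the root.
int countNodes(const Node* node) {
    int total = 1;
    for (int i = 0; i < 26; i++) {
        if (node->child[i] != nullptr) total += countNodes(node->child[i]);
    }
    return total;
}
```

Pruning happens on the way back up the recursion: a node is removed only when no word ends at it and it has no remaining children, which covers the prefix case (nothing is pruned), the shared-prefix case (only the tail is pruned), and the no-shared-prefix case (the whole path is pruned).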
Applications of Trie data structure
1. Autocomplete Feature: Autocomplete provides suggestions based on what you type in the search box. The Trie data structure is used to implement autocomplete functionality.
2. Spell Checkers: If the word typed does not appear in the dictionary, the spell checker shows suggestions based on what you typed. It is a 3-step process that includes:
– Checking for the word in the data dictionary.
– Generating potential suggestions.
– Sorting the suggestions with higher priority on top.
A Trie stores the data dictionary and makes it easier to build an algorithm for searching for the word in the dictionary and providing the list of valid words for the suggestion.
Applications of Trie data structure
3. Longest Prefix Matching Algorithm (Maximum Prefix Length Match): This algorithm is used by routing devices in IP networking. Optimization of network routes requires contiguous masking, which bounds the complexity of the lookup time to O(n), where n is the length of the IP address in bits.
• To speed up the lookup process, multiple-bit trie schemes were developed that look up multiple bits at a time.
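The single-bit version of longest prefix matching can be sketched with a binary trie over 32-bit IPv4 addresses. This is a simplified illustration, not how any particular router implements it; `BitNode`, `addRoute`, `longestPrefixMatch`, and the integer next-hop identifiers are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical binary trie node: one child per bit value.
struct BitNode {
    BitNode* child[2] = {};
    int nextHop = -1; // -1 means no route ends at this node
};

// Store a route: the first `prefixLen` bits of `prefix` (counted from
// the most significant bit) map to `nextHop`.
void addRoute(BitNode* root, std::uint32_t prefix, int prefixLen,
              int nextHop) {
    BitNode* cur = root;
    for (int i = 0; i < prefixLen; i++) {
        int bit = (prefix >> (31 - i)) & 1u;
        if (cur->child[bit] == nullptr) cur->child[bit] = new BitNode();
        cur = cur->child[bit];
    }
    cur->nextHop = nextHop;
}

// Walk the address bit by bit and remember the last route passed.
// At most 32 steps are taken, one per address bit.
int longestPrefixMatch(const BitNode* root, std::uint32_t address) {
    const BitNode* cur = root;
    int best = cur->nextHop;
    for (int i = 0; i < 32; i++) {
        int bit = (address >> (31 - i)) & 1u;
        cur = cur->child[bit];
        if (cur == nullptr) break;
        if (cur->nextHop != -1) best = cur->nextHop;
    }
    return best;
}
```

With routes for 10.0.0.0/8 and 10.1.0.0/16 stored, an address such as 10.1.2.3 matches both prefixes and the longer one wins, while 10.2.0.1 falls back to the /8 route. A multiple-bit scheme would consume several bits per step instead of one.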
Advantages of Trie data structure
• A trie allows us to insert and find strings in O(l) time, where l is the length of a single word. This bound does not depend on the number of stored words, so a trie can be faster than both hash tables and binary search trees.
• It provides alphabetical filtering of entries by the key of the node and hence makes it easier to print all words in alphabetical order.
• A trie can take less space than a BST when many keys share prefixes, because the keys are not saved explicitly; instead, common prefixes are stored only once.
• Prefix search and longest prefix matching can be done efficiently with the help of the trie data structure.
• Since a trie does not need any hash function for its implementation, it can be faster than a hash table for small keys like integers and pointers.
• Tries support ordered iteration, whereas iteration over a hash table yields a pseudorandom order determined by the hash function, which is usually more cumbersome to work with.
• Deletion is also a straightforward algorithm with O(l) time complexity, where l is the length of the word to be deleted.
Disadvantages of Trie data structure
• The main disadvantage of the trie is that it takes a lot of memory to store all the strings. In the worst case, each node holds as many child pointers as there are characters in the alphabet.
• An efficiently constructed hash table (i.e. a good hash function and a reasonable load factor) has O(1) expected lookup time once the key has been hashed, which can be faster than the O(l) lookup of a trie, where l is the length of the string.
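The memory cost of the array-based node can be estimated with a small calculation. The sketch below assumes the 26-slot node used in these slides; `ArrayNode` and `childPointerBytes` are hypothetical names introduced for the estimate.

```cpp
#include <cstddef>

// Hypothetical array-based node: 26 child pointers plus a flag.
struct ArrayNode {
    ArrayNode* child[26] = {};
    bool isEnd = false;
};

// Bytes taken by the child-pointer arrays alone for `nodeCount` nodes,
// whether or not the slots are actually used.
std::size_t childPointerBytes(std::size_t nodeCount) {
    return nodeCount * 26 * sizeof(ArrayNode*);
}
```

On a platform with 8-byte pointers this is 208 bytes of pointers per node, so even the five-node trie for “and” and “ant” reserves 1040 bytes of pointer slots while using only four of them.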
