Trie Insertion


TRIE

PROBLEM: Given a dictionary that contains a list of strings, and a query string, we
want to check whether or not the query string is in the dictionary.
1. A hash table (also called a hash map) is a data structure that maps
keys to values in an unsorted way. In our problem, we can treat each string in
the dictionary as a key to the hash table.

Time complexity: Since a hash function needs to consider all characters of the
input string, a lookup is O(n), where n is the length of the input string.
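A minimal sketch of the hash table approach, assuming std::unordered_set plays the role of the hash table. The helper names build_dictionary and in_dictionary and the sample words are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Build the dictionary once; each word becomes a key in the hash table.
std::unordered_set<std::string> build_dictionary(const std::vector<std::string>& words) {
    return std::unordered_set<std::string>(words.begin(), words.end());
}

// Hashing reads every character of `query`, so a lookup costs O(n) on
// average for a query of length n.
bool in_dictionary(const std::unordered_set<std::string>& dict,
                   const std::string& query) {
    return dict.count(query) > 0;
}

// Usage sketch with made-up words.
void hash_lookup_demo() {
    std::unordered_set<std::string> dict = build_dictionary({"abc", "xy", "xyz"});
    assert(in_dictionary(dict, "xy"));
    assert(!in_dictionary(dict, "x"));  // a prefix of a key is not itself a key
}
```

Note that the hash table can only answer whole-key questions: it has no cheap way to tell that "x" is a prefix of stored keys.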

2. A trie, or prefix tree, is a particular kind of search tree, where
nodes are usually keyed by strings.
In a trie, a link between two nodes represents a character in the
keyed string.
[Figure: a trie for the key-value pairs ("abc", 1), ("xy", 2), ("xyz", 5),
("abb", 9), ("xyzb", 8), ("word", 5).]
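To make the key-value picture concrete, here is a small sketch of a trie used as a map. The node layout (KVNode with a std::map of children) and the function names trie_put and trie_get are assumptions chosen for illustration, not the layout used later in these notes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical node layout: one child per outgoing character, plus an
// optional value for keys that end at this node.
struct KVNode {
    bool hasValue = false;
    int value = 0;
    std::map<char, std::unique_ptr<KVNode>> children;
};

// Walk (and create) one node per character, then store the value.
void trie_put(KVNode& root, const std::string& key, int value) {
    KVNode* node = &root;
    for (char c : key) {
        std::unique_ptr<KVNode>& child = node->children[c];
        if (!child) child.reset(new KVNode());
        node = child.get();
    }
    node->hasValue = true;
    node->value = value;
}

// Returns a pointer to the stored value, or nullptr if `key` is absent.
const int* trie_get(const KVNode& root, const std::string& key) {
    const KVNode* node = &root;
    for (char c : key) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return nullptr;
        node = it->second.get();
    }
    return node->hasValue ? &node->value : nullptr;
}

// Usage sketch with the pairs listed above.
void kv_trie_demo() {
    KVNode root;
    trie_put(root, "abc", 1);
    trie_put(root, "xy", 2);
    trie_put(root, "xyz", 5);
    trie_put(root, "abb", 9);
    trie_put(root, "xyzb", 8);
    trie_put(root, "word", 5);
    assert(*trie_get(root, "xyz") == 5);
    assert(trie_get(root, "ab") == nullptr);  // "ab" is only a shared prefix
}
```

The keys "xy", "xyz" and "xyzb" share one path from the root, which is where the prefix sharing discussed under memory requirements comes from.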
Comparisons

Lookup Speed

When we look up a string in a hash table, we first calculate the hash value of the
string, which takes O(n) time. Then, it will take O(1) time to locate the hash value in
memory, assuming we have a good hash function. Therefore, the overall
lookup time complexity is O(n).
When we look up a string in a trie, we go through each character of the string and
locate its corresponding node in the trie. The overall lookup time complexity is
also O(n).
However, the trie has some more overhead to retrieve the whole string. We need
to access the memory multiple times to locate the trie nodes along the character
path. For the hash table, we only need to compute the hash value for the input
string once. Therefore, it is relatively faster to look up a whole string in the
hash table.
Memory Requirement

When we first construct a hash table, we normally pre-allocate a big chunk of
memory to avoid collisions by hashing uniformly on the size of the memory. In the
future when we insert a string into the hash table, we only need to store the string
content.
For a trie data structure, we need to store extra data such as character link
pointers and complete node flags. Therefore, the trie requires more memory to
store the string data.
However, if there are many common prefixes, the memory requirement becomes
smaller as we can share the prefix nodes.
Overall, which of the two needs more memory depends on the size of the
pre-allocated hash table memory and on the input dictionary strings.
Applications
Trie: can quickly look up prefixes of keys, enumerate all entries with
a given prefix, etc.
Trie advantages:
Predictable O(k) lookup time, where k is the length of the key
Lookup can take fewer than k steps if the key is not present
Supports ordered traversal
No need for a hash function
Deletion is straightforward
It all depends on what problem you're trying to solve. If all you
need to do is insertions and lookups, go with a hash table. If you
need to solve more complex problems such as prefix-related
queries, then a trie might be the better solution.
Trie insertion
#include <cstddef>
#include <string>
using namespace std;

struct TrieNode {
    // pointer array for the child nodes of each node
    TrieNode *childNode[26];
    int wordEndCnt;

    // constructor
    TrieNode()
    {
        // initialize the wordEndCnt variable with 0 and every index of
        // the childNode array with NULL
        wordEndCnt = 0;
        for (int i = 0; i < 26; i++)
        {
            childNode[i] = NULL;
        }
    }
};
Each TrieNode has 26 children, one for each letter from a to z, represented
by an array of TrieNode pointers.

Each node has a wordEndCnt integer variable. This variable stores the number
of strings in the trie that are equal to the prefix represented by that node.

Inside the TrieNode structure we define a constructor TrieNode(), which
initializes every index of the childNode pointer array with NULL whenever a
new node is created. It also initializes the wordEndCnt value of every node
with 0.
TrieNode* insert_key(TrieNode *root, const string &key) {
    // initialize the currentNode pointer with the root node
    TrieNode *currentNode = root;

    // store the length of the key string
    int length = key.size();

    // iterate across the length of the string
    // (X is the current character; keys are assumed to use only 'a'..'z')
    for (int i = 0; i < length; i++)
    {
        // check whether the X-'a' th index is NULL
        if (currentNode->childNode[key[i] - 'a'] == NULL)
        {
            // if NULL, make a new node
            TrieNode *newNode = new TrieNode();
            // point the X-'a' th index of the current node to the new node
            currentNode->childNode[key[i] - 'a'] = newNode;
        }

        // move the current node pointer to the child node
        // (which may have just been created)
        currentNode = currentNode->childNode[key[i] - 'a'];
    }

    // one more string ends at this node
    currentNode->wordEndCnt++;

    // return the root node
    return root;
}


Implementation of the Search Operation in a Trie Data Structure
bool search_key(TrieNode *root, const string &queryString) {
    // initialize the currentNode pointer with the root node
    TrieNode *currentNode = root;

    // store the length of the query string
    int length = queryString.size();

    for (int i = 0; i < length; i++)
    {
        // check whether the X-'a' th index is NULL
        if (currentNode->childNode[queryString[i] - 'a'] == NULL)
        {
            // if NULL, the query string is not present in the trie
            return false;
        }

        // if not NULL, move the currentNode pointer to the node
        // pointed to by the X-'a' th index of the current node
        currentNode = currentNode->childNode[queryString[i] - 'a'];
    }

    // return true if currentNode is not NULL and its wordEndCnt
    // is greater than 0, otherwise return false
    return currentNode != NULL && currentNode->wordEndCnt > 0;
}
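The insert and search routines index the child array with X - 'a', so they silently assume keys made only of the letters 'a' to 'z', and they never release the nodes they allocate. The following is a hedged sketch of one way to guard against both; the names is_lowercase_key, insert_checked, search_checked and free_trie are hypothetical helpers, not part of the code above.

```cpp
#include <cassert>
#include <string>

struct TrieNode {
    TrieNode* childNode[26];
    int wordEndCnt;
    TrieNode() : wordEndCnt(0) {
        for (int i = 0; i < 26; i++) childNode[i] = nullptr;
    }
};

// True only for keys that the 26-slot child array can index safely.
bool is_lowercase_key(const std::string& key) {
    for (char c : key)
        if (c < 'a' || c > 'z') return false;
    return true;
}

// Same walk as insert_key, but rejects keys outside 'a'..'z'.
bool insert_checked(TrieNode* root, const std::string& key) {
    if (!is_lowercase_key(key)) return false;
    TrieNode* node = root;
    for (char c : key) {
        if (node->childNode[c - 'a'] == nullptr)
            node->childNode[c - 'a'] = new TrieNode();
        node = node->childNode[c - 'a'];
    }
    node->wordEndCnt++;
    return true;
}

// Same walk as search_key; an invalid key can never be present.
bool search_checked(const TrieNode* root, const std::string& key) {
    if (!is_lowercase_key(key)) return false;
    const TrieNode* node = root;
    for (char c : key) {
        node = node->childNode[c - 'a'];
        if (node == nullptr) return false;
    }
    return node->wordEndCnt > 0;
}

// Post-order walk that frees every node, including `node` itself.
void free_trie(TrieNode* node) {
    if (node == nullptr) return;
    for (int i = 0; i < 26; i++) free_trie(node->childNode[i]);
    delete node;
}

// Usage sketch.
void checked_trie_demo() {
    TrieNode* root = new TrieNode();
    bool inserted = insert_checked(root, "word");
    bool rejected = !insert_checked(root, "Word");  // uppercase 'W' is out of range
    assert(inserted && rejected);
    assert(search_checked(root, "word"));
    assert(!search_checked(root, "wor"));  // a prefix, but not a stored key
    free_trie(root);
}
```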
Deletion

1. If key 'k' is not present in the trie, then we should not modify the trie in any way.
2. If key 'k' is not a prefix of any other key, no other key is a prefix of 'k', and the
nodes of key 'k' are not part of any other key, then all the nodes from the root node
(excluding the root node) to the leaf node of key 'k' should be deleted.
3. If key 'k' is a prefix of some other key, then the node at the end of key 'k' should
be marked as 'not the end of a key'. No node should be deleted in this case.
4. If some other key 'k1' is a prefix of key 'k', then all nodes of key 'k' which are not
part of key 'k1' should be deleted.
5. If key 'k' is not a prefix of any other key and no other key is a prefix of 'k', but
some nodes of key 'k' are shared with some other key 'k1', then the nodes of key 'k'
which are not common to any other key should be deleted and the shared nodes should
be kept intact.
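The five cases above collapse into one recursive rule: clear the end-of-key mark, then on the way back up free every node that has no children and ends no key. The following is a sketch under that reading, reusing the 26-slot node layout; delete_key, remove_rec and has_children are hypothetical names, and the destructor is an addition so that leftover nodes are released.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct TrieNode {
    TrieNode* childNode[26];
    int wordEndCnt;
    TrieNode() : wordEndCnt(0) {
        for (int i = 0; i < 26; i++) childNode[i] = nullptr;
    }
    // Deleting a node also releases everything below it.
    ~TrieNode() {
        for (int i = 0; i < 26; i++) delete childNode[i];
    }
};

void insert_key(TrieNode* root, const std::string& key) {
    TrieNode* node = root;
    for (char c : key) {
        assert(c >= 'a' && c <= 'z');  // assumption: lowercase keys only
        if (node->childNode[c - 'a'] == nullptr)
            node->childNode[c - 'a'] = new TrieNode();
        node = node->childNode[c - 'a'];
    }
    node->wordEndCnt++;
}

bool search_key(const TrieNode* root, const std::string& key) {
    const TrieNode* node = root;
    for (char c : key) {
        if (c < 'a' || c > 'z') return false;
        node = node->childNode[c - 'a'];
        if (node == nullptr) return false;
    }
    return node->wordEndCnt > 0;
}

bool has_children(const TrieNode* node) {
    for (int i = 0; i < 26; i++)
        if (node->childNode[i] != nullptr) return true;
    return false;
}

// Removes one occurrence of `key` below `node`. Returns true when `node`
// no longer ends a key and has no children, so the caller may free it.
bool remove_rec(TrieNode* node, const std::string& key, std::size_t depth) {
    if (depth == key.size()) {
        node->wordEndCnt--;  // delete_key guarantees the key is present
    } else {
        int idx = key[depth] - 'a';
        TrieNode* child = node->childNode[idx];
        if (remove_rec(child, key, depth + 1)) {
            delete child;
            node->childNode[idx] = nullptr;
        }
    }
    return node->wordEndCnt == 0 && !has_children(node);
}

// Returns false and leaves the trie untouched if `key` is absent (case 1).
bool delete_key(TrieNode* root, const std::string& key) {
    if (!search_key(root, key)) return false;
    remove_rec(root, key, 0);  // the root itself is never freed
    return true;
}

// Usage sketch with made-up keys, one check per case.
void delete_demo() {
    TrieNode root;
    const char* keys[] = {"abc", "abcd", "abx", "xy", "xyz", "word"};
    for (const char* k : keys) insert_key(&root, k);

    bool removed = delete_key(&root, "ab");       // case 1: not a key
    assert(!removed);
    removed = delete_key(&root, "xy");            // case 3: prefix of "xyz"
    assert(removed && search_key(&root, "xyz"));
    removed = delete_key(&root, "abcd");          // case 4: "abc" is a prefix
    assert(removed && search_key(&root, "abc"));
    removed = delete_key(&root, "abc");           // case 5: shares "ab" with "abx"
    assert(removed && search_key(&root, "abx"));
    removed = delete_key(&root, "word");          // case 2: whole path removed
    assert(removed && root.childNode['w' - 'a'] == nullptr);
}
```

Because wordEndCnt is a count, deleting a key that was inserted twice only removes one occurrence in this sketch; the nodes stay until the count reaches zero.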
Compressed Trie
Standard trie: The size of a trie is directly correlated to the size of all
the possible values that the trie could represent.
The first thing that we’ll notice when we look at this trie is that there
are two keys for which we have redundant nodes as well as a
redundant chain of edges.
A redundant node is one that takes up an undue amount of space
because it only has one child node of its own. We’ll see that for the
key "deck", the node for the character "e" is redundant, because it
only has a single child node, but we still have to initialize an entire
node, with all of its pointers, as a result.
Similarly, the edges that connect the key "did" are redundant, as
they connect redundant nodes that don’t really all need to be
initialized, since they each have only one child node of their own.
The redundancy of a standard trie comes from the fact that we are
repeating ourselves by allocating space for nodes or edges that
contain only one possible string or word. Another way to think about it
is that we repeat ourselves by allocating a lot of space for something
that only has one possible branch path.
RULE FOR COMPRESSED TRIE
Each internal node (every parent node) must have two or more child nodes. If a
parent has two child nodes, which means at least two branch paths to potential
leaf nodes, then it doesn’t need to be compressed, since we actually need to
allocate space and memory for both of these branch paths.
However, if a parent node only has one child node (that is to say, if it only has
one possible branch path to a leaf node), then it can be compressed. In order to
do the work of “compacting” the trie, each node that is the only child of its
parent node is merged into its parent. The parent node and the single-child node
are fused together, as are the values that they contain.
Compressed tries are also known as radix trees, radix tries, or compact prefix
trees.
A radix tree is a space-optimized version of a standard trie. Unlike regular tries, the
references/edges/pointers of a radix tree can hold a sequence of characters, and
not just a single character element.
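The merging rule can be sketched as a function that turns a standard trie into a compressed one by following every single-child chain and gluing its characters into one edge label. PlainNode, RadixNode, add_key and compress are illustrative names. The sketch assumes that a node which ends a key is never merged away (otherwise that key would be lost) and that the root is kept as is.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Standard trie node: one character per edge.
struct PlainNode {
    bool isEnd = false;
    std::map<char, std::unique_ptr<PlainNode>> kids;
};

// Compressed trie node: each edge carries a whole string.
struct RadixNode {
    bool isEnd = false;
    std::map<std::string, std::unique_ptr<RadixNode>> edges;
};

void add_key(PlainNode& root, const std::string& key) {
    PlainNode* node = &root;
    for (char c : key) {
        std::unique_ptr<PlainNode>& child = node->kids[c];
        if (!child) child.reset(new PlainNode());
        node = child.get();
    }
    node->isEnd = true;
}

// Builds the compressed copy of the subtree rooted at `node`.
std::unique_ptr<RadixNode> compress(const PlainNode& node) {
    std::unique_ptr<RadixNode> out(new RadixNode());
    out->isEnd = node.isEnd;
    for (const auto& entry : node.kids) {
        std::string label(1, entry.first);
        const PlainNode* cur = entry.second.get();
        // Fuse the chain while the node has exactly one child and
        // does not itself mark the end of a key.
        while (!cur->isEnd && cur->kids.size() == 1) {
            label += cur->kids.begin()->first;
            cur = cur->kids.begin()->second.get();
        }
        out->edges[label] = compress(*cur);
    }
    return out;
}

// Usage sketch: "deck" and "did" are the keys from the text,
// "dog" and "dogs" are added to show a key that is a prefix of another.
void radix_demo() {
    PlainNode trie;
    add_key(trie, "deck");
    add_key(trie, "did");
    add_key(trie, "dog");
    add_key(trie, "dogs");

    std::unique_ptr<RadixNode> root = compress(trie);
    assert(root->edges.size() == 1);           // single edge "d"
    const RadixNode& d = *root->edges.at("d");
    assert(d.edges.size() == 3);               // "eck", "id", "og"
    assert(d.edges.count("eck") == 1);
    assert(d.edges.count("id") == 1);
    const RadixNode& og = *d.edges.at("og");
    assert(og.isEnd);                          // "dog" ends here
    assert(og.edges.count("s") == 1);          // "dogs" continues
}
```

In the demo, the redundant chain e, c, k collapses into the single edge "eck", which is the saving the rule is after.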
PATRICIA tree
A trie’s keys could be read and processed a byte at a time, half a byte at a time, or two bits at a
time. However, there is one particular type of radix tree that processes keys in a really
interesting way, called a PATRICIA tree.
PATRICIA stands for “Practical Algorithm To Retrieve Information Coded In Alphanumeric”.

The most important thing to remember about a PATRICIA tree is that its radix is 2. Keys are
compared r bits at a time, where 2 to the power of r is the radix of the tree, so we can use
this relationship to figure out how a PATRICIA tree reads a key.

Since the radix of a PATRICIA tree is 2, we know that r must be equal to 1, since 2¹ = 2. Thus, a
PATRICIA tree processes its keys one bit at a time.
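The arithmetic can be written out as a short worked equation. The first three rows use the chunk sizes mentioned at the start of this section (a byte, half a byte, two bits); the last row is the PATRICIA case.

```latex
\[
\text{radix} = 2^{r}
\quad\Longleftrightarrow\quad
r = \log_2(\text{radix})
\]
\[
\begin{array}{lll}
\text{one byte at a time:}  & r = 8, & \text{radix} = 2^{8} = 256 \\
\text{half a byte at a time:} & r = 4, & \text{radix} = 2^{4} = 16 \\
\text{two bits at a time:}  & r = 2, & \text{radix} = 2^{2} = 4 \\
\text{PATRICIA:}            & r = 1, & \text{radix} = 2^{1} = 2
\end{array}
\]
```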
Let’s say that we want to turn our original set of keys, ["dog", "doge",
"dogs"] into a PATRICIA tree representation. Since a PATRICIA tree
reads keys one bit at a time, we’ll need to convert these strings down
into binary so that we can look at them bit by bit.
dog: 01100100 01101111 01100111
doge: 01100100 01101111 01100111 01100101
dogs: 01100100 01101111 01100111 01110011
Notice how "dog" is a prefix of both "doge" and "dogs". The binary
representation of these words is exactly the same for the first 24 digits, and the
longer keys continue from the 25th digit. Interestingly, "doge" and "dogs" also share
a binary prefix; the binary representation of these two words is the same up until
the 28th digit!

So since we know that "dog" is a prefix of "doge", we will compare them bit by bit. The
point at which they diverge is at bit 25, where "doge" has a value of 0. Since we know
that our binary radix tree can only have 0’s and 1’s, we just need to put "doge" in the
correct place. Since it diverges with a value of 0, we’ll add it as the left child node of
our root node "dog".
Now we’ll do the same thing with "dogs". Since "dogs" differs from "doge" at bit
28, we’ll compare bit by bit up until that point.
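The bit positions quoted above can be checked with a small helper. This is a sketch that assumes 8-bit ASCII characters and counts bits from 1, starting at the most significant bit of the first character; bit_at and first_differing_bit are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Bit number `pos` of `key`, counting from 1 at the most significant bit
// of the first byte. Requires 1 <= pos <= 8 * key.size().
int bit_at(const std::string& key, std::size_t pos) {
    assert(pos >= 1 && pos <= 8 * key.size());
    unsigned char byte = static_cast<unsigned char>(key[(pos - 1) / 8]);
    return (byte >> (7 - (pos - 1) % 8)) & 1;
}

// 1-based position of the first bit at which the two keys diverge.
// If one key is a binary prefix of the other, this is the first bit past
// the shorter key. Returns 0 when the keys are identical.
std::size_t first_differing_bit(const std::string& a, const std::string& b) {
    std::size_t common = 8 * std::min(a.size(), b.size());
    for (std::size_t pos = 1; pos <= common; pos++) {
        if (bit_at(a, pos) != bit_at(b, pos)) return pos;
    }
    return a.size() == b.size() ? 0 : common + 1;
}
```

With the keys from the text, first_differing_bit("dog", "doge") gives 25 and first_differing_bit("doge", "dogs") gives 28, while bit_at("doge", 25) is 0, which is why "doge" goes to the left of "dog".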
Suffix trie
A suffix tree is an extended version of a trie: it is a compressed trie
which includes all of a string's suffixes. Some string-related problems
can be solved using suffix trees, such as pattern matching, identifying
unique substrings inside a string, and determining the longest palindrome.
The suffix tree for the string S of length n is defined as a
tree that has the following properties:
1. There are exactly n leaves on the tree, numbered 1 through n.
2. Each edge is labeled with a non-empty substring of S.
3. Every internal node has at least two children, with the exception of the root.
4. No two edges that emerge from the same node can have string-labels that start
with the same character.
5. The suffix S[i..n], for i from 1 to n, is formed by concatenating all the string-labels
encountered on the path from the root to the leaf i.

Functionality of Suffix Tree


A suffix tree for a string S of length n can be created in Theta(n) time if the letters
come from an alphabet of integers in a polynomial range (in particular, this is true
for fixed-sized alphabets). For larger alphabets, the majority of the time is spent
sorting the letters into an O(n)-sized range; in general, this takes
O(n log n) time. Imagine that over the string S of length n, a suffix tree has been
constructed, then you can:
1. Look for strings:
In O(m) time, determine whether a string P of length m is a substring.
In O(m) time, find the first occurrence of each of the patterns P1...Pq, of total
length m, as substrings.
In O(m+z) time, find all z occurrences of the patterns P1...Pq, of total length m,
as substrings.
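The substring query is easiest to see on a plain, uncompressed suffix trie. The sketch below is not the linear-time suffix tree construction described above: it inserts every suffix character by character, which takes O(n^2) time and space, and only illustrates why a pattern of length m can then be checked in about m steps. SuffixNode, build_suffix_trie and contains_substring are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct SuffixNode {
    std::map<char, std::unique_ptr<SuffixNode>> next;
};

// Naive build: insert each suffix s[start..] into an ordinary trie.
std::unique_ptr<SuffixNode> build_suffix_trie(const std::string& s) {
    std::unique_ptr<SuffixNode> root(new SuffixNode());
    for (std::size_t start = 0; start < s.size(); start++) {
        SuffixNode* node = root.get();
        for (std::size_t i = start; i < s.size(); i++) {
            std::unique_ptr<SuffixNode>& child = node->next[s[i]];
            if (!child) child.reset(new SuffixNode());
            node = child.get();
        }
    }
    return root;
}

// Every substring of s is a prefix of some suffix, so the pattern occurs
// in s exactly when it can be spelled out starting from the root.
bool contains_substring(const SuffixNode& root, const std::string& pattern) {
    const SuffixNode* node = &root;
    for (char c : pattern) {
        auto it = node->next.find(c);
        if (it == node->next.end()) return false;
        node = it->second.get();
    }
    return true;
}

// Usage sketch with a made-up string.
void suffix_trie_demo() {
    std::unique_ptr<SuffixNode> root = build_suffix_trie("banana");
    assert(contains_substring(*root, "nan"));
    assert(contains_substring(*root, "ana"));
    assert(!contains_substring(*root, "nab"));
}
```

The walk visits one node per pattern character; with std::map each step costs a logarithmic factor in the alphabet size, which is constant for a fixed alphabet.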
Trie applications
Consider a web browser. Do you know how the web browser can autocomplete
your text or show you the possible completions of what you are typing? With a
trie you can do it very fast. Do you know how a spell checker
can check that every word that you type is in a dictionary? Again, a trie. You can
also use a trie to suggest corrections for words that are present in the text
but not in the dictionary.
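A possible shape for the autocomplete lookup: walk down to the node for the typed prefix, then collect every word below it. This is a sketch with hypothetical names (AcNode, add_word, complete) and made-up words; children are kept in a std::map so the suggestions come out in alphabetical order.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct AcNode {
    bool isWord = false;
    std::map<char, std::unique_ptr<AcNode>> next;
};

void add_word(AcNode& root, const std::string& word) {
    AcNode* node = &root;
    for (char c : word) {
        std::unique_ptr<AcNode>& child = node->next[c];
        if (!child) child.reset(new AcNode());
        node = child.get();
    }
    node->isWord = true;
}

// Depth-first walk; `current` always spells the path from the root.
void collect(const AcNode& node, std::string& current,
             std::vector<std::string>& out) {
    if (node.isWord) out.push_back(current);
    for (const auto& entry : node.next) {
        current.push_back(entry.first);
        collect(*entry.second, current, out);
        current.pop_back();
    }
}

// All stored words that start with `prefix`, in alphabetical order.
std::vector<std::string> complete(const AcNode& root, const std::string& prefix) {
    const AcNode* node = &root;
    for (char c : prefix) {
        auto it = node->next.find(c);
        if (it == node->next.end()) return {};  // nothing starts with prefix
        node = it->second.get();
    }
    std::vector<std::string> out;
    std::string current = prefix;
    collect(*node, current, out);
    return out;
}

// Usage sketch.
void autocomplete_demo() {
    AcNode root;
    const char* words[] = {"trie", "tree", "try", "trip", "word"};
    for (const char* w : words) add_word(root, w);

    const std::vector<std::string> expected = {"trie", "trip"};
    assert(complete(root, "tri") == expected);
    assert(complete(root, "z").empty());
}
```

A spell checker's dictionary test is the same walk without the collection step: the word is accepted only if the walk ends on a node with isWord set.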
