Trie Insertion
Time complexity: Since a hash function needs to consider all characters of the
input string, it takes O(n) time, where n is the length of the input string.
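As a minimal illustration of why hashing is O(n), here is a simple polynomial string hash in C++. The function name polyHash and the base 31 are hypothetical choices for this sketch, not the algorithm used by any particular hash table (std::hash, for example, is implementation-defined). The point is only that the loop must visit every character once.

```cpp
#include <cstdint>
#include <string>

// Hypothetical polynomial hash, for illustration only.
// Every character is read exactly once, so hashing a key of length n costs O(n).
uint64_t polyHash(const std::string &s) {
    const uint64_t base = 31;
    uint64_t h = 0;
    for (char c : s) {
        // Unsigned arithmetic wraps around modulo 2^64 instead of overflowing.
        h = h * base + static_cast<unsigned char>(c);
    }
    return h;
}
```

For example, polyHash("ab") evaluates to 97 * 31 + 98 = 3105.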
Lookup Speed
When we look up a string in a hash table, we first calculate the hash value of the
string, which takes O(n) time. Then, assuming we have a good hash function, it takes
O(1) expected time to locate that hash value in memory. Therefore, the overall
lookup time complexity is O(n).
When we look up a string in a trie, we go through each character of the string and
locate its corresponding node in the trie. The overall lookup time complexity is
also O(n).
However, the trie has more overhead when retrieving a whole string: we need
to access memory multiple times, once for each trie node along the character
path. For the hash table, we only need to compute the hash value of the input
string once. Therefore, looking up a whole string is relatively faster in a
hash table.
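To make the comparison concrete, the C++ sketch below performs the same lookup both ways. It assumes lowercase 'a'..'z' keys; the names Node, insertKey, trieContains, and hashContains are hypothetical. The hops counter records how many separate nodes the trie lookup had to visit, while the hash-table lookup hashes the key once and then probes a bucket.

```cpp
#include <array>
#include <memory>
#include <string>
#include <unordered_set>

// Minimal trie node for lowercase 'a'..'z' keys (hypothetical layout).
struct Node {
    std::array<std::unique_ptr<Node>, 26> child{};
    bool isWord = false;
};

void insertKey(Node &root, const std::string &key) {
    Node *cur = &root;
    for (char c : key) {
        auto &next = cur->child[c - 'a'];
        if (!next) next = std::make_unique<Node>();
        cur = next.get();
    }
    cur->isWord = true;
}

// Trie lookup: one pointer dereference (a separate memory access) per character.
// 'hops' reports how many nodes below the root were visited.
bool trieContains(const Node &root, const std::string &key, int &hops) {
    const Node *cur = &root;
    hops = 0;
    for (char c : key) {
        cur = cur->child[c - 'a'].get();
        if (!cur) return false;
        ++hops;
    }
    return cur->isWord;
}

// Hash-table lookup: the key is hashed once (O(n)), then a bucket is probed.
bool hashContains(const std::unordered_set<std::string> &set, const std::string &key) {
    return set.count(key) > 0;
}
```

Looking up "dog" in the trie visits three nodes, one per character, whereas std::unordered_set computes a single hash of the whole string.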
Memory Requirement
bool search(TrieNode *root, const std::string &queryString) {
    //Initialize the currentNode pointer with the root node
    TrieNode *currentNode = root;
    for (size_t i = 0; i < queryString.length(); i++) {
        //If the child for this character is NULL, the string is not in the trie
        if (currentNode->childNode[queryString[i] - 'a'] == NULL)
            return false;
        //Move the currentNode pointer to the node pointed to by the
        //(queryString[i] - 'a')th index of the current node
        currentNode = currentNode->childNode[queryString[i] - 'a'];
    }
    //Return true only if currentNode is not NULL and its wordEndCnt is greater than 0
    return currentNode != NULL && currentNode->wordEndCnt > 0;
}
Deletion
1. If key 'k' is not present in the trie, then we should not modify the trie in any way.
2. If key 'k' is neither a prefix of any other key nor has any other key as its prefix,
and none of its nodes are part of any other key, then all the nodes from the root node
(excluding the root node) to the last node of key 'k' should be deleted.
3. If key 'k' is a prefix of some other key, then the node for the last character of key 'k'
should be unmarked as the end of a word. No node should be deleted in this case.
4. If some other key 'k1' is a prefix of key 'k', then all nodes of key 'k' which are not part
of key 'k1' should be deleted.
5. If key 'k' is neither a prefix of any other key nor has any other key as its prefix, but
some nodes of key 'k' are shared with some other key 'k1', then the nodes of key 'k' which
are not common to any other key should be deleted and the shared nodes should be kept intact.
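All five cases can be handled by a single recursive routine that unmarks the end of the key and then prunes, on the way back up, every node that no longer ends a word and has no children. The C++ sketch below assumes the same node layout as the search code (a 26-slot childNode array and a wordEndCnt counter, lowercase keys only); the names insertKey, searchKey, removeKey, and removeHelper are hypothetical.

```cpp
#include <string>

// Assumed node layout, mirroring the childNode/wordEndCnt fields used by search.
struct TrieNode {
    TrieNode *childNode[26] = {nullptr};
    int wordEndCnt = 0;  // how many times a key ends at this node
};

void insertKey(TrieNode *root, const std::string &key) {
    TrieNode *cur = root;
    for (char c : key) {
        int idx = c - 'a';
        if (!cur->childNode[idx]) cur->childNode[idx] = new TrieNode();
        cur = cur->childNode[idx];
    }
    cur->wordEndCnt++;
}

bool searchKey(const TrieNode *root, const std::string &key) {
    const TrieNode *cur = root;
    for (char c : key) {
        cur = cur->childNode[c - 'a'];
        if (!cur) return false;
    }
    return cur->wordEndCnt > 0;
}

static bool hasChildren(const TrieNode *node) {
    for (TrieNode *c : node->childNode)
        if (c) return true;
    return false;
}

// Removes one occurrence of key below 'node'. Returns true if 'node' ends no word
// and has no children afterwards, so its parent may free it.
static bool removeHelper(TrieNode *node, const std::string &key, size_t depth) {
    if (depth == key.size()) {
        if (node->wordEndCnt == 0) return false;  // Case 1: key absent, change nothing
        node->wordEndCnt--;                        // Case 3: just unmark the end of key
        return node->wordEndCnt == 0 && !hasChildren(node);
    }
    int idx = key[depth] - 'a';
    TrieNode *child = node->childNode[idx];
    if (!child) return false;                      // Case 1: path missing
    if (removeHelper(child, key, depth + 1)) {
        delete child;                              // Cases 2, 4, 5: prune unshared nodes
        node->childNode[idx] = nullptr;
        return node->wordEndCnt == 0 && !hasChildren(node);
    }
    return false;                                  // child is still shared or ends a word
}

void removeKey(TrieNode *root, const std::string &key) {
    removeHelper(root, key, 0);  // the root itself is never deleted
}
```

With "dog" and "dogs" stored, removing "dog" only decrements the counter on the 'g' node (case 3), while removing "dogs" afterwards frees every node on the path.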
Compressed Trie
Standard Trie: The size of a trie is directly correlated to the size of all
the possible values that the trie could represent.
The first thing that we’ll notice when we look at this trie is that there
are two keys for which we have redundant nodes as well as a
redundant chain of edges.
A redundant node is one that takes up an undue amount of space
because it only has one child node of its own. We'll see that for the
key "deck", the node for the character "e" is redundant, because it
only has a single child node, but we still have to initialize an entire
node, with all of its pointers, as a result.
Similarly, the edges that connect the key "did" are redundant, as
they connect redundant nodes that don’t really all need to be
initialized, since they each have only one child node of their own.
The redundancy of a standard trie comes from the fact that we are
repeating ourselves by allocating space for nodes or edges that
contain only one possible string or word. Another way to think about it
is that we repeat ourselves by allocating a lot of space for something
that only has one possible branch path.
RULE FOR COMPRESSED TRIE
Each internal node (every parent node) must have two or more child nodes. If a
parent has two child nodes, that is, two branch paths to potential leaf nodes,
then it doesn't need to be compressed, since we actually need to allocate space
and memory for both of these branch paths.
However, if a parent node only has one child node — that is to say, if it only has
one possible branch path to a leaf node — then it can be compressed. In order to
do the work of “compacting” the trie, each node that is the only child of its
parent node is merged into its parent. The parent node and the single-child node
are fused together, as are the values that they contain.
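The merging rule can be sketched as a C++17 function that copies a standard trie into a compressed one. Starting from each child edge, it keeps absorbing the next node while that node has exactly one child, concatenating the edge characters into one label. One assumption beyond the text: a node that marks the end of a word is never merged away, since that would lose the key (for example "dog" when "dogs" is also stored). The names TrieNode, RadixNode, compress, and countNodes are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

// Standard trie node: one character per edge.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> child;
    bool isWord = false;
};

// Compressed node: each edge carries a whole substring label.
struct RadixNode {
    std::map<std::string, std::unique_ptr<RadixNode>> child;
    bool isWord = false;
};

void insertKey(TrieNode &root, const std::string &key) {
    TrieNode *cur = &root;
    for (char c : key) {
        auto &next = cur->child[c];
        if (!next) next = std::make_unique<TrieNode>();
        cur = next.get();
    }
    cur->isWord = true;
}

// Fuses every chain of single-child, non-word nodes into one labeled edge.
std::unique_ptr<RadixNode> compress(const TrieNode &node) {
    auto out = std::make_unique<RadixNode>();
    out->isWord = node.isWord;
    for (const auto &[c, childPtr] : node.child) {
        std::string label(1, c);
        const TrieNode *cur = childPtr.get();
        while (cur->child.size() == 1 && !cur->isWord) {
            const auto &only = *cur->child.begin();
            label += only.first;         // merge the only child into its parent
            cur = only.second.get();
        }
        out->child[label] = compress(*cur);
    }
    return out;
}

// Counts nodes (including the root) in either kind of tree.
template <typename N>
size_t countNodes(const N &n) {
    size_t total = 1;
    for (const auto &kv : n.child) total += countNodes(*kv.second);
    return total;
}
```

For the keys "deck" and "did", the standard trie has 7 nodes (root, d, e, c, k, i, d), while the compressed version has 4: the root, a "d" node with two branches, and leaves reached by the edges "eck" and "id".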
Compressed tries are also known as radix trees, radix tries, or compact prefix
trees.
A radix tree is a space-optimized version of a standard trie. Unlike in regular tries, the
references/edges/pointers of a radix tree can hold a sequence of characters (a substring),
not just a single character.
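Because an edge can carry a whole substring, inserting into a radix tree sometimes has to split an edge where a new key diverges partway through its label. The C++ sketch below shows one way to do this, under stated assumptions: children are keyed by the first character of their label (so at most one edge can match the next character), and the names RNode, commonPrefix, radixInsert, and radixContains are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

// Radix-tree node: each outgoing edge stores a substring label.
struct RNode {
    struct Edge {
        std::string label;
        std::unique_ptr<RNode> to;
    };
    std::map<char, Edge> edges;  // keyed by the label's first character
    bool isWord = false;
};

// Length of the common prefix of key[i..] and label.
static size_t commonPrefix(const std::string &key, size_t i, const std::string &label) {
    size_t n = 0;
    while (i + n < key.size() && n < label.size() && key[i + n] == label[n]) ++n;
    return n;
}

void radixInsert(RNode &root, const std::string &key) {
    RNode *cur = &root;
    size_t i = 0;
    while (i < key.size()) {
        auto it = cur->edges.find(key[i]);
        if (it == cur->edges.end()) {
            // No edge starts with this character: the whole remaining suffix becomes one edge.
            auto leaf = std::make_unique<RNode>();
            leaf->isWord = true;
            cur->edges[key[i]] = RNode::Edge{key.substr(i), std::move(leaf)};
            return;
        }
        RNode::Edge &e = it->second;
        size_t m = commonPrefix(key, i, e.label);  // m >= 1, the first character matched
        if (m < e.label.size()) {
            // The key diverges inside the label: split the edge after m characters.
            auto mid = std::make_unique<RNode>();
            std::string rest = e.label.substr(m);
            mid->edges[rest[0]] = RNode::Edge{rest, std::move(e.to)};
            e.label.resize(m);
            e.to = std::move(mid);
        }
        cur = e.to.get();
        i += m;
    }
    cur->isWord = true;
}

bool radixContains(const RNode &root, const std::string &key) {
    const RNode *cur = &root;
    size_t i = 0;
    while (i < key.size()) {
        auto it = cur->edges.find(key[i]);
        if (it == cur->edges.end()) return false;
        const std::string &label = it->second.label;
        if (key.compare(i, label.size(), label) != 0) return false;  // whole label must match
        i += label.size();
        cur = it->second.to.get();
    }
    return cur->isWord;
}
```

Inserting "dog", "doge", "dogs", and then "did" first builds a single "dog" edge, then splits it into "d" followed by "og" when "did" arrives.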
PATRICIA tree
A trie’s keys could be read and processed a byte at a time, half a byte at a time, or two bits at a
time. However, there is one particular type of radix tree that processes keys in a really
interesting way, called a PATRICIA tree.
PATRICIA stands for “Practical Algorithm To Retrieve Information Coded In Alphanumeric”.
The most important thing to remember about a PATRICIA tree is that its radix is 2. Keys in a
radix tree are compared r bits at a time, where 2 to the power of r is the radix of the tree,
so we can use this relationship to figure out how a PATRICIA tree reads a key.
Since the radix of a PATRICIA tree is 2, we know that r must be equal to 1, since 2¹ = 2. Thus, a
PATRICIA tree processes its keys one bit at a time.
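Reading a key one bit at a time can be sketched in C++ as below. The bit numbering is an assumption chosen to match the text's example: bits are counted from 1, starting at the most significant bit of the first character. The names getBit and toBits are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns bit number 'pos' of the key (1-based, most significant bit of the first byte first).
int getBit(const std::string &key, size_t pos) {
    assert(pos >= 1 && pos <= key.size() * 8);
    unsigned char byte = static_cast<unsigned char>(key[(pos - 1) / 8]);
    return (byte >> (7 - (pos - 1) % 8)) & 1;
}

// Renders the key one bit at a time, grouped into 8-bit bytes.
std::string toBits(const std::string &key) {
    std::string out;
    for (size_t pos = 1; pos <= key.size() * 8; ++pos) {
        out += static_cast<char>('0' + getBit(key, pos));
        if (pos % 8 == 0 && pos != key.size() * 8) out += ' ';
    }
    return out;
}
```

With this numbering, toBits("dog") reproduces the bit string listed for "dog", and getBit("doge", 25) is 0, the bit at which "doge" branches off from "dog".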
Let’s say that we want to turn our original set of keys, ["dog", "doge",
"dogs"], into a PATRICIA tree representation. Since a PATRICIA tree
reads keys one bit at a time, we’ll need to convert these strings
into binary so that we can look at them bit by bit.
dog: 01100100 01101111 01100111
doge: 01100100 01101111 01100111 01100101
dogs: 01100100 01101111 01100111 01110011
Notice how "dog" is a prefix of both "doge" and "dogs". The binary representations of these
words are exactly the same through the first 24 bits, and both longer keys diverge from "dog"
at the 25th bit. Interestingly, "doge" and "dogs" share an even longer binary prefix: their
binary representations are the same up until the 28th bit!
So, since we know that "dog" is a prefix of "doge", we will compare them bit by bit. They
diverge at bit 25, where "doge" has a value of 0. Since our binary radix tree can only
have 0’s and 1’s, we just need to put "doge" in the correct place. Since it diverges with
a value of 0, we’ll add it as the left child node of our root node "dog".
Now we’ll do the same thing with "dogs". Since "dogs" shares the prefix "dog" with "doge" and
differs from "doge" at bit 28, we’ll compare bit by bit up until that point.
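Finding where two keys diverge is the core comparison in this walkthrough. A minimal C++ sketch follows, using the same 1-based, most-significant-bit-first numbering as the text. The name firstDifferingBit is hypothetical, and treating "one key is a bit-prefix of the other" as a divergence at the first bit past the shorter key is an assumption that matches the "dog"/"doge" example.

```cpp
#include <algorithm>
#include <string>

// Returns the 1-based position of the first bit where keys a and b differ.
// If one key is a prefix of the other, returns the first bit past the shorter key.
// Returns 0 if the keys are identical.
size_t firstDifferingBit(const std::string &a, const std::string &b) {
    size_t common = std::min(a.size(), b.size());
    for (size_t byte = 0; byte < common; ++byte) {
        unsigned char x = static_cast<unsigned char>(a[byte] ^ b[byte]);
        if (x != 0) {
            // Scan the differing byte from its most significant bit.
            for (int bit = 0; bit < 8; ++bit)
                if (x & (0x80 >> bit)) return byte * 8 + bit + 1;
        }
    }
    if (a.size() == b.size()) return 0;
    return common * 8 + 1;
}
```

This gives 25 for "dog" versus "doge" and 28 for "doge" versus "dogs", the two divergence points used above.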
Suffix trie
A suffix tree is an extended version of a trie: it is a compressed trie that contains
all of a string's suffixes. Several string-related problems can be solved using suffix
trees, such as pattern matching, identifying unique substrings inside a string, and
finding the longest palindrome. The suffix tree for a string S of length n is defined as
a tree with the following properties (assuming S ends with a unique terminal character,
such as $, so that no suffix is a prefix of another suffix):
1. There are exactly n leaves on the tree, numbered 1 through n.
2. Each edge is labeled with a non-empty substring of S.
3. Every internal node, except the root, has at least two children.
4. No two edges emerging from the same node can have string-labels that start with
the same character.
5. For i from 1 to n, the suffix S[i..n] is formed by concatenating all the string-labels
encountered on the path from the root to leaf i.
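The properties above describe the compressed suffix tree. As a simpler illustration of the same idea, here is a naive, uncompressed suffix trie in C++ that inserts every suffix S[i..n] character by character. This takes O(n^2) time and space; a real suffix tree compresses single-child chains and is usually built in linear time (for example with Ukkonen's algorithm), which this sketch does not attempt. The names SuffixTrie, containsSubstring, and countDistinctSubstrings are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

// Naive suffix trie: every suffix of s is inserted as a path of single-character edges.
struct SuffixTrie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> child;
    };
    Node root;

    explicit SuffixTrie(const std::string &s) {
        for (size_t i = 0; i < s.size(); ++i) {
            Node *cur = &root;
            for (size_t j = i; j < s.size(); ++j) {
                auto &next = cur->child[s[j]];
                if (!next) next = std::make_unique<Node>();
                cur = next.get();
            }
        }
    }

    // Pattern matching: p occurs in s iff p is a prefix of some suffix of s,
    // i.e. iff p can be spelled out starting from the root.
    bool containsSubstring(const std::string &p) const {
        const Node *cur = &root;
        for (char c : p) {
            auto it = cur->child.find(c);
            if (it == cur->child.end()) return false;
            cur = it->second.get();
        }
        return true;
    }

    // Each non-root node is a distinct non-empty substring of s.
    size_t countDistinctSubstrings() const { return count(root) - 1; }

private:
    static size_t count(const Node &n) {
        size_t total = 1;
        for (const auto &kv : n.child) total += count(*kv.second);
        return total;
    }
};
```

For S = "banana", the trie answers that "nan" and "ana" occur while "nab" does not, and it contains 15 non-root nodes, one per distinct non-empty substring.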