Implement Trie (Prefix Tree)
A trie (pronounced "try"), or prefix tree, is a tree data structure used to efficiently store and retrieve
keys in a dataset of strings. There are various applications of this data structure, such as autocomplete
and spellchecker.
Implement the Trie class:
Trie() Initializes the trie object.
void insert(String word) Inserts the string word into the trie.
boolean search(String word) Returns true if the string word is in the trie (i.e., was
inserted before), and false otherwise.
boolean startsWith(String prefix) Returns true if there is a previously inserted
string word that has the prefix prefix , and false otherwise.
Example 1:
Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]
class Trie:
    def __init__(self):
Perf:
Runtime: 92.58%
Memory: 85.49%
Learnings
Representing the induction hypothesis for node-based data structures like Linked Lists and Trees
can often take the form: DataStructure[head:cur] .
Examples:
Linked List: List[head:cur]
Tree: Tree[root:cur]
Note that for Tree, this only works when we are going along a linear path. With this caveat
in mind, it's no surprise that they look so similar: they are essentially the same thing.
Also for node-based data structures, remember a couple of these useful concepts:
Dummy node (like root in our example)
Terminal value (like None in both our example and Linked Lists)
Sentinel value (like 'EOW' in our example)
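To make these three concepts concrete, here is a small hand-built sketch of a nested-dictionary trie holding "app" and "apple". The variable names and the literal layout are assumptions for illustration only; the notes build the structure through insert rather than by hand.

```python
# Hypothetical hand-built trie holding "app" and "apple", using nested dicts.
EOW = 'EOW'  # sentinel key marking "a word ends at this node"

root = {                              # dummy node: holds no character itself
    'a': {'p': {'p': {
        EOW: None,                    # "app" ends here
        'l': {'e': {EOW: None}},      # "apple" ends here
    }}}
}

# Terminal value: a failed lookup yields None, much like running off a linked list.
print(root.get('b'))                  # None
print(EOW in root['a']['p']['p'])     # True
print(EOW in root['a']['p'])          # False
```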
Interesting Notes
Something that initially surprised me was that the loop invariant for the read methods, search / startsWith,
was the same as for the write method, insert. Upon reflection though, it makes sense, as a lot of it is
just making sure we are keeping i and cur in sync.
Really, the main contribution of the loop invariant here is to help us traverse through our data
structures.
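One way to see the invariant at work is to assert it on every iteration. The sketch below is an illustration only: the function name walk and the path list are assumptions added to record the keys followed from the root, and they are not part of the solution itself.

```python
def walk(root: dict, word: str):
    """Follow word through a nested-dict trie, asserting the invariant each step."""
    i, cur, path = 0, root, []
    while i < len(word) and word[i] in cur:
        # Invariant: the keys followed from root to cur spell word[0:i].
        assert ''.join(path) == word[:i]
        path.append(word[i])
        i, cur = i + 1, cur[word[i]]   # right-hand side uses the old i
    assert ''.join(path) == word[:i]   # still holds when the loop exits
    return i, ''.join(path)


root = {'a': {'p': {'p': {}}}}
print(walk(root, "apx"))   # (2, 'ap')
```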
Ok. We have to implement
insert(word)
search(word)
startsWith(prefix)
I can visualize a Trie in my head. My intuition is also telling me that I should have a dummy node acting as a
root, so that we don't end up with potentially 26 disparate trees. We should probably instantiate it in the
constructor. My first thought is to create a Node class, but is there a different way to model this? Perhaps
just a bunch of nested dictionaries? Let's start with the nested dictionaries approach. I'm not even sure how
nested classes in Python work off the top of my head.
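For comparison, here is a rough sketch of the two modelling options side by side. The Node class and its field names are hypothetical (the notes never write one), and it is defined at module level, so no nested class is needed.

```python
class Node:
    """Hypothetical trie node: child links plus an end-of-word flag."""

    def __init__(self):
        self.children = {}     # maps a character to the child Node
        self.is_word = False   # True if a word ends at this node


# A trie containing only the word "a", built from Node objects.
root_as_node = Node()
root_as_node.children['a'] = Node()
root_as_node.children['a'].is_word = True

# The same trie as nested dictionaries: both fields fold into one dict,
# with an 'EOW' key standing in for the is_word flag.
root_as_dict = {'a': {'EOW': None}}
```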
I feel like search and startsWith are easier to implement. I'm wondering if there's any downside to
starting with them, but let's go with it. Ok. So how do I implement search(word)? Well, we'd have a
pointer, i, to the character, word[i]. What is the (loop) invariant? Let root[0:i] denote all nodes up
to the i-th level. Then we can express our invariant as:
root[0:i] contains word[0:i]
We can initialize the invariant with:
i = 0
Given that root[0:i] contains word[0:i], how do we make sure that root[0:i+1] contains word[0:i+1]?
Well, we can't. So the loop terminates if root[0:i+1] does not contain word[0:i+1]. If it does, then we
can execute i = i+1.
Ok, so how can we check if root[0:i+1] contains word[0:i+1]? We can check by seeing if word[i+1] in
root[i]. Ok. Now I feel as if we have to get more precise with how we are indexing into root. So, root is
a nested dictionary, which means root[key] == { ... } (some dictionary). Ok. It looks like we'll have to
maintain a different kind of index for root besides i. A kind of pointer. Let's call it cur. Then our
invariant is:
word[i] in cur
Well, the only way we can satisfy this invariant in the beginning is:
i,cur = -1,root
Loop guard:
i+1 < len(word) and word[i+1] in cur[word[i]]
I'm having a hard time with the fact that we actually can't guarantee word[0] in root. Ok. So how about
word[-1] in root, where we define word[-1] as None? Ahh. Actually yeah, that's fine. I was actually
having a hard time figuring out how to index into cur in the loop guard...but that's not what we're
supposed to do!!
Ok. Let's begin again. We have the right initialization code:
i,cur = -1,root
The loop guard is:
i+1 < len(word) and word[i+1] in cur
The body? Well, we know we have to run several commands:
i = i+1
cur = cur[i]
The loop guard guarantees that they maintain the loop invariant so...I think we're good! Then we can just
return i == len(word) after the loop. Let's put it all together:
def search(self, word: str) -> bool:
    i,cur = -1,root
    while i+1 < len(word) and word[i+1] in cur:
        i,cur = i+1,cur[word[i+1]]
    return i == len(word)-1 and word[i] in cur
As you can see, I had to modify the code in order to correct some mistakes in reasoning I caught along the
way. The code feels a bit clumsy though. Among other things, I don't like that we have to manually do an
extra check at the end... Perhaps a better way to articulate the invariant is:
word[0:i] == tree[root:cur]
Actually, by conceptualizing the invariant this way, perhaps we can get rid of that extra check at the end:
def search(self, word: str) -> bool:
    i,cur = 0,root
    while i < len(word) and word[i] in cur:
        i,cur = i+1,cur[word[i+1]]
    return i == len(word)
I actually keep seeing this kind of thing over and over again when it comes to node-based data structures
like trees and linked lists. In fact, I did something very similar when reasoning through some linked list
algorithms. But it's still not quite right: word[i+1] will be out of bounds and, even if I get that fixed,
cur[word[i+1]] will result in a KeyError if the word we are looking for isn't in our Trie. Hmmm...I get
the feeling that I'm actually not satisfying the loop invariant. Specifically, perhaps the right bound, cur , in
tree[root:cur] is not truly exclusive? Or we can just guard cur = cur[word[i+1]] lol. So simple.
Loop guard...hmmm...what do we want the postcondition of the loop to be? Let's think that through. We
know that if i == len(word) and cur == None , then we can return True . For anything else, we
return False . To explore a little more, what do the other cases correspond to?
Another approach:
i,cur = 0,self.root
while i < len(word) and cur != None:
    i,cur = i+1,cur[word[i]]
return i == len(word) and cur == None
I realized while implementing these approaches that the source of my confusion yesterday was in how I was
incorrectly "incrementing" cur . Yesterday, I was doing: cur = cur[word[i+1]] . But think about the
initial conditions when i,cur = 0,self.root ... If word[0] in self.root , then for word[1] , we
need to check if it's in self.root[word[0]] ! So I don't have to worry about word[i+1] going out of
bounds anymore...
Whoops. I just realized that in these 2 approaches, I totally forgot to check if word[i] in cur! But you
know what? I can solve this in a slick way: have the default value for a non-existent key be None! I'll have
to take care of that in the constructor and insert(), so I don't have to change anything in search().
Wait wait wait. I don't think this is quite right. Hold on. What happened to our 'EOW' sentinel value? Ok. I
think the only modification I need to make is to change the return statement:
return i == len(word) and cur != None and 'EOW' in cur
Wow. That case analysis I did earlier was off! Let's re-do it to make sure I really understand what is
happening. Let's first make sure we know exactly what values cur can be:
A dictionary
None, in the case where word[i-1] not in parent(cur)
This happens when our Trie doesn't contain word even as a prefix. Right. So if cur == None we can
definitely return False. And if cur != None, but 'EOW' not in cur, then our Trie contains word as a
proper prefix, but not as a word.
Ok. With all that, this is my preferred version:
i,cur = 0,self.root
while i < len(word) and cur != None:
    i,cur = i+1,cur.get(word[i], None)
return i == len(word) and cur != None and 'EOW' in cur
I was trying to decide between using collections.defaultdict or .get() , and I decided to go with
.get() as my "go-to" pattern so that I can use dictionary comprehensions. This also means that I won't
have to do anything special in the constructor or in insert() .
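One behavioural difference between the two options is worth keeping in mind, independent of the reason given above: indexing a missing key on a collections.defaultdict inserts that key, while .get() leaves the dictionary untouched. The small sketch below uses made-up variable names to show this; with a defaultdict, a read-only search could quietly grow the trie.

```python
from collections import defaultdict

plain = {'a': {}}
print(plain.get('z'))    # None
print('z' in plain)      # False: .get() did not add a key

auto = defaultdict(lambda: None, {'a': {}})
print(auto['z'])         # None
print('z' in auto)       # True: indexing a defaultdict inserts the missing key
```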
Ok ok. Now we also know how to implement startsWith pretty easily! It's literally the same thing, but we
just don't have to check for 'EOW' in cur .
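As a sketch of what that looks like, here is the same loop written as a standalone function. The name starts_with and the explicit root parameter are assumptions made so the snippet runs on its own; in the class it would be a method reading self.root.

```python
def starts_with(root: dict, prefix: str) -> bool:
    """Same traversal as search, minus the 'EOW' check at the end."""
    i, cur = 0, root
    while i < len(prefix) and cur is not None:
        i, cur = i + 1, cur.get(prefix[i])   # None once a character is missing
    return i == len(prefix) and cur is not None


root = {'a': {'p': {'p': {'EOW': None}}}}    # trie containing only "app"
print(starts_with(root, "ap"))     # True
print(starts_with(root, "appl"))   # False
```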
Now, let's work on insert(word). The loop invariant is:
word[0:i] == tree[root:cur]
Lol. It's the same loop invariant. But this time, instead of terminating when we encounter a key that
doesn't exist, we insert a new dictionary at that key and update cur to be that new dictionary. We
terminate when i == len(word), when we've processed the entire word.
i,cur = 0,self.root
while i < len(word):
    if word[i] not in cur:
        cur[word[i]] = {}
    i,cur = i+1,cur[word[i]]
cur['EOW'] = None
Beautiful. Now, for a final touch, I think it's good Software Engineering to make 'EOW' some kind of private
constant. I wonder how to do this in Python?
Lol. Python doesn't have true privates. Instead, you put a leading underscore to mark it for internal
use only.
https://www.geeksforgeeks.org/underscore-_-python/
Python also doesn't have constants. Instead, you capitalize all letters to communicate that the
variable should not be changed.
https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python
Ugh. I'm getting errors when using _EOW. I'm going to slate this for later.
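The notes don't record what the errors were, so the following is only a guess at one common cause. If _EOW is defined in the class body, a bare _EOW inside a method raises NameError, because class attributes are not in a method's scope; it has to be reached through self or the class name. The mark method below is a hypothetical helper used only to demonstrate the lookup.

```python
class Trie:
    _EOW = 'EOW'   # leading underscore: internal use; capitals: treat as a constant

    def __init__(self):
        self.root = {}

    def mark(self, node: dict) -> None:
        # A bare `_EOW` here would raise NameError, so qualify it with
        # self (or Trie) to reach the class attribute.
        node[self._EOW] = None


t = Trie()
t.mark(t.root)
print(Trie._EOW in t.root)   # True
```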
def __init__(self):
    self.root = { 'EOW': None }
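Putting the pieces from these notes together, one possible assembly of the class is sketched below. The _walk helper is an assumption introduced here to share the traversal between search and startsWith; the notes write the loop out in each method instead. The rest follows the snippets above, including the constructor's initial 'EOW' entry.

```python
class Trie:
    def __init__(self):
        self.root = {'EOW': None}   # as in the constructor above

    def _walk(self, s: str):
        """Hypothetical helper: return the node reached by s, or None."""
        i, cur = 0, self.root
        while i < len(s) and cur is not None:
            i, cur = i + 1, cur.get(s[i])
        return cur

    def insert(self, word: str) -> None:
        i, cur = 0, self.root
        while i < len(word):
            if word[i] not in cur:
                cur[word[i]] = {}       # create the missing child node
            i, cur = i + 1, cur[word[i]]
        cur['EOW'] = None               # mark the end of a word

    def search(self, word: str) -> bool:
        cur = self._walk(word)
        return cur is not None and 'EOW' in cur

    def startsWith(self, prefix: str) -> bool:
        return self._walk(prefix) is not None


# Replaying Example 1 from the problem statement.
trie = Trie()
trie.insert("apple")
print(trie.search("apple"))     # True
print(trie.search("app"))       # False
print(trie.startsWith("app"))   # True
trie.insert("app")
print(trie.search("app"))       # True
```

Because the root starts with an 'EOW' key, search("") returns True in this sketch; starting from an empty dictionary instead would make the empty string count as a word only after it is inserted.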