Implement Trie (Prefix Tree)

The document describes a trie data structure and provides an example Python implementation of the Trie class. A trie is a tree that stores strings to efficiently perform operations like autocomplete and spellchecking. The Trie class needs to implement insert(), search(), and startsWith() methods. insert() adds a string to the trie. search() checks if a string is in the trie. startsWith() checks if a string prefix is in the trie.

Uploaded by

frencheddonuts

A trie (pronounced "try"), or prefix tree, is a tree data structure used to efficiently store and retrieve
keys in a dataset of strings. There are various applications of this data structure, such as autocomplete
and spell checking.
Implement the Trie class:
Trie() Initializes the trie object.
void insert(String word) Inserts the string word into the trie.
boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
Example 1:
Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // return True
trie.search("app");     // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app");     // return True
Constraints:
1 <= word.length, prefix.length <= 2000
word and prefix consist only of lowercase English letters.
At most 3 * 10^4 calls in total will be made to insert, search, and startsWith.

class Trie:

    def __init__(self):

    def insert(self, word: str) -> None:

    def search(self, word: str) -> bool:

    def startsWith(self, prefix: str) -> bool:


# Your Trie object will be instantiated and called as such:
# obj = Trie()
# obj.insert(word)
# param_2 = obj.search(word)
# param_3 = obj.startsWith(prefix)
My Solution
class Trie:
    def __init__(self):
        # Dummy root node; an 'EOW' key marks the end of a word.
        self.root = {'EOW': None}

    def insert(self, word: str) -> None:
        # Invariant: word[0:i] == tree[root:cur]
        i, cur = 0, self.root
        while i < len(word):
            if word[i] not in cur:
                cur[word[i]] = {}
            i, cur = i + 1, cur[word[i]]
        cur['EOW'] = None

    def search(self, word: str) -> bool:
        i, cur = 0, self.root
        while i < len(word) and cur != None:
            # cur becomes None as soon as a character is missing
            i, cur = i + 1, cur.get(word[i], None)
        return i == len(word) and cur != None and 'EOW' in cur

    def startsWith(self, prefix: str) -> bool:
        i, cur = 0, self.root
        while i < len(prefix) and cur != None:
            i, cur = i + 1, cur.get(prefix[i], None)
        return i == len(prefix) and cur != None
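
As a quick sanity check, Example 1 can be replayed against the class. This is only a sketch: the class body is the solution above with its indentation restored, and replay is a hypothetical helper name that is not part of the problem statement.

```python
class Trie:
    def __init__(self):
        self.root = {'EOW': None}

    def insert(self, word: str) -> None:
        i, cur = 0, self.root
        while i < len(word):
            if word[i] not in cur:
                cur[word[i]] = {}
            i, cur = i + 1, cur[word[i]]
        cur['EOW'] = None

    def search(self, word: str) -> bool:
        i, cur = 0, self.root
        while i < len(word) and cur != None:
            i, cur = i + 1, cur.get(word[i], None)
        return i == len(word) and cur != None and 'EOW' in cur

    def startsWith(self, prefix: str) -> bool:
        i, cur = 0, self.root
        while i < len(prefix) and cur != None:
            i, cur = i + 1, cur.get(prefix[i], None)
        return i == len(prefix) and cur != None


def replay(ops, args):
    """Hypothetical helper: run LeetCode-style operation lists, collect results."""
    trie, results = None, []
    for op, arg in zip(ops, args):
        if op == "Trie":
            trie = Trie()
            results.append(None)          # the constructor reports null
        else:
            results.append(getattr(trie, op)(*arg))
    return results


ops = ["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
args = [[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
results = replay(ops, args)
print(results)   # [None, None, True, False, True, None, True]
```

The printed list matches the expected output of Example 1, with Python's None standing in for null.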

Perf:
Runtime: 92.58%
Memory: 85.49%

Learnings
Representing the induction hypothesis for node-based data structures like Linked Lists and Trees
can often take the form: DataStructure[head:cur] .
Examples:
Linked List: List[head:cur]
Tree: Tree[root:cur]
Note that for Tree, this only works when we are going along a linear path. With this caveat
in mind, it's no surprise that they look so similar: they are essentially the same thing.
Also for node-based data structures, remember a couple of these useful concepts:
Dummy node (like root in our example)
Terminal value (like None in both our example and Linked Lists)
Sentinel value (like 'EOW' in our example)
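
To make the List[head:cur] shape concrete, here is a minimal linked-list sketch that uses a dummy node and None as the terminal value. ListNode, from_list, to_list and remove_all are hypothetical names chosen for illustration; none of them come from the problem.

```python
from typing import Optional


class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def from_list(values) -> Optional[ListNode]:
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head: Optional[ListNode]) -> list:
    out = []
    while head is not None:       # None is the terminal value
        out.append(head.val)
        head = head.next
    return out


def remove_all(head: Optional[ListNode], target) -> Optional[ListNode]:
    dummy = ListNode(None, head)  # dummy node: gives the real head a predecessor
    cur = dummy
    # Invariant: no real node from the head up to and including cur holds target.
    while cur.next is not None:
        if cur.next.val == target:
            cur.next = cur.next.next   # unlink; cur stays, so the invariant holds
        else:
            cur = cur.next             # safe to extend the checked range
    return dummy.next


print(to_list(remove_all(from_list([1, 2, 1, 3]), 1)))   # [2, 3]
```

The dummy node plays the same role as root in the trie: the loop never needs a special case for "the first real node".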

Interesting Notes
Something that initially surprised me was that the loop invariant for the read methods,
search / startsWith , was the same as for the write method, insert . Upon reflection
though, it makes sense, as a lot of it is just making sure we are keeping i and cur in sync.
Really, the main contribution of the loop invariant here is to help us traverse our data
structures.
Ok. We have to implement
insert(word)
search(word)
startsWith(prefix)

I can visualize a Trie in my head. My intuition is also telling me that I should have a dummy node acting as a
root, so that we don't end up with potentially 26 disparate trees. We should probably instantiate it in the
constructor. My first thought is to create a Node class, but is there a different way to model this? Perhaps
just a bunch of nested dictionaries? Let's start with the nested dictionaries approach. I'm not even sure how
nested classes in Python work off the top of my head.
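
For comparison, the Node-class modelling could look roughly like the sketch below. Node, children, is_word, insert and has_path are hypothetical names; the is_word flag is an extra assumption, since any representation (including nested dictionaries) needs some way to mark where a word ends.

```python
class Node:
    """Hypothetical trie node: one object per character position."""

    def __init__(self):
        self.children = {}     # maps a character to the child Node
        self.is_word = False   # True if some inserted word ends here


def insert(root: Node, word: str) -> None:
    cur = root
    for ch in word:
        if ch not in cur.children:
            cur.children[ch] = Node()
        cur = cur.children[ch]
    cur.is_word = True


def has_path(root: Node, s: str) -> bool:
    cur = root
    for ch in s:
        if ch not in cur.children:
            return False
        cur = cur.children[ch]
    return True


root = Node()                  # the dummy root: it holds no character itself
insert(root, "apple")
insert(root, "bat")
print(sorted(root.children))   # ['a', 'b']
```

A single dummy root keeps "apple" and "bat" in one structure instead of two separate trees. The nested-dictionary approach stores the same information with each Node's children dict serving as the node itself.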
I feel like search and startsWith are easier to implement. I'm wondering if there's any downside to
starting with them, but let's go with it. Ok. So how do I implement search(word) ? Well, we'd have a
pointer, i , to the character, word[i] . What is the (loop) invariant? Let root[0:i] denote all nodes up
to the i-th level. Then we can express our invariant as: root[0:i] contains word[0:i] . We can
initialize the invariant with: i = 0
Given that root[0:i] contains word[0:i] , how do we make sure that root[0:i+1] contains word[0:i+1] ?
Well, we can't. So the loop terminates if root[0:i+1] does not contain word[0:i+1] . If it does, then we
can execute i = i+1 . Ok, so how can we check if root[0:i+1] contains word[0:i+1] ? We can check by
seeing if word[i+1] in root[i] .
Ok. Now I feel as if we have to get more precise with how we are indexing into root . So, root is a nested
dictionary, meaning root[key] == { ... } (some dictionary). Ok. It looks like we'll have to maintain a
different kind of index for root besides i . A kind of pointer. Let's call it cur . Then our invariant is:
word[i] in cur
Well, the only way we can satisfy this invariant in the beginning is: i,cur = -1,root
Loop guard: i+1 < len(word) and word[i+1] in cur[word[i]]
I'm having a hard time with the fact that we actually can't guarantee word[0] in root . Ok. So how about
word[-1] in root where we define word[-1] as None ? Ahh. Actually yeah, that's fine. I was actually having
a hard time figuring out how to index into cur in the loop guard...but that's not what we're supposed to do!!
Ok. Let's begin again. We have the right initialization code: i,cur = -1,root
The loop guard is: i+1 < len(word) and word[i+1] in cur
The body? Well, we know we have to run several commands:

i = i+1
cur = cur[i]

The loop guard guarantees that they maintain the loop invariant so...I think we're good! Then we can just
return i == len(word) after the loop. Let's put it all together:
def search(self, word: str) -> bool:
    i,cur = -1,root
    while i+1 < len(word) and word[i+1] in cur:
        i,cur = i+1,cur[word[i+1]]
    return i == len(word)-1 and word[i] in cur
As you can see, I had to modify the code in order to correct some mistakes in reasoning I caught along the
way. The code feels a bit clumsy though. Among other things, I don't like that we have to manually do an
extra check at the end... Perhaps a better way to articulate the invariant is: word[0:i] ==
tree[root:cur] Actually, by conceptualizing the invariant this way, perhaps we can get rid of that extra
check at the end:
def search(self, word: str) -> bool:
    i,cur = 0,root
    while i < len(word) and word[i] in cur:
        i,cur = i+1,cur[word[i+1]]
    return i == len(word)

I actually keep seeing this kind of thing over and over again when it comes to node-based data structures
like trees and linked lists. In fact, I did something very similar when reasoning through some linked list
algorithms. But it's still not quite right: word[i+1] will be out of bounds and, even if I get that fixed,
cur[word[i+1]] will result in a KeyError if the word we are looking for isn't in our Trie. Hmmm...I get
the feeling that I'm actually not satisfying the loop invariant. Specifically, perhaps the right bound, cur , in
tree[root:cur] is not truly exclusive? Or we can just guard cur = cur[word[i+1]] lol. So simple.

def search(self, word: str) -> bool:
    i,cur = 0,root
    while i < len(word) and word[i] in cur:
        i = i+1
        if i < len(word) and word[i] in cur:
            cur = cur[word[i]]
    return i == len(word)

Eh. The first one actually feels better.


I suspect some version of that loop will be written over and over again during the course of implementing
the rest of the methods.
The Next Day...
You know what? I'm still not happy with search() , but let's move on to startsWith() .
Perhaps by deriving it from scratch, I can get a different perspective on search() as well.
Alright. startsWith(prefix) ... We want to see if prefix exists as a path in our Trie. Oh! Oh shit. I just
realized that what I wrote for search() is actually startsWith() ! To properly implement search() ,
we actually have to make sure that cur contains or becomes some kind of sentinel value that denotes the
end of a word! Man. This is what I was kind of worried about when deciding to focus on implementing one
function at a time. Still, deciding to implement search() and startsWith() first was the right idea, as
figuring out what they need informs how we implement insert() as well.
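
The realization is easy to reproduce. The sketch below assumes a nested-dictionary trie with no end-of-word marker (build is a hypothetical helper) and uses the earlier draft loop with the index slip corrected to cur[word[i]]. With only "apple" stored, it still reports "app" as found.

```python
def build(words):
    """Hypothetical helper: nested dicts with no end-of-word marker."""
    root = {}
    for word in words:
        cur = root
        for ch in word:
            cur = cur.setdefault(ch, {})
    return root


def draft_search(root, word):
    i, cur = 0, root
    while i < len(word) and word[i] in cur:
        # The right-hand side is evaluated first, so word[i] uses the old i.
        i, cur = i + 1, cur[word[i]]
    return i == len(word)


root = build(["apple"])
print(draft_search(root, "apple"))   # True
print(draft_search(root, "app"))     # True, although "app" was never inserted
print(draft_search(root, "apx"))     # False
```

Without a marker, the structure cannot tell "a path exists" apart from "a word ends here", which is exactly the difference between startsWith() and search() .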
So what I'm realizing is that every valid word's last node will also include an end-of-word (EOW) sentinel
value. What should this sentinel value be? Perhaps something like "EOW"? I should probably initialize root
w/ it as well. The corresponding value can be None .
Ok. Actually, let's go back to implementing search() with this new insight. What do we want our invariant
to be? I really want word[0:i] == tree[root:cur] to work. Let's try it again. Initialization:
i,cur = 0,self.root

Loop guard...hmmm...what do we want the postcondition of the loop to be? Let's think that through. We
know that if i == len(word) and cur == None , then we can return True . For anything else, we
return False . To explore a little more, what do the other cases correspond to?

i < len(word) and cur == None
    Our Trie only contains a proper prefix of the word .
i == len(word) and cur != None
    Our Trie contains word as a proper prefix, but not as a whole word.
    word is a proper prefix of some other word that is in our Trie.
i < len(word) and cur != None
    This should never happen because in this state, the loop should still be processing word .
    The negation of this predicate is consistent with the 3 other states.

Here's a question: can we just use the negation of i == len(word) and cur == None as our loop guard?
i != len(word) or cur != None
Hmmm...If we go this route, we'd definitely need return False inside the loop body, since the loop
would only terminate if our Trie contains word . What would be nice about this approach is that we can
simply return True after the loop. Let's try both approaches. One approach:
i,cur = 0,self.root
while i != len(word) or cur != None:
    if i < len(word) and cur != None:
        i,cur = i+1,cur[word[i]]
    else:
        return False
return True

Another approach:
i,cur = 0,self.root
while i < len(word) and cur != None:
    i,cur = i+1,cur[word[i]]
return i == len(word) and cur == None

I realized while implementing these approaches that the source of my confusion yesterday was in how I was
incorrectly "incrementing" cur . Yesterday, I was doing: cur = cur[word[i+1]] . But think about the
initial conditions when i,cur = 0,self.root ... If word[0] in self.root , then for word[1] , we
need to check if it's in self.root[word[0]] ! So I don't have to worry about word[i+1] going out of
bounds anymore...
Woops. I just realized that in these 2 approaches, I totally forgot to check if word[i] in cur ! But you
know what? I can solve this in a slick way: have the default value for a non-existent key be None ! I'll have
to take care of that in the constructor and insert() , so I don't have to change anything in search() .
Wait wait wait. I don't think this is quite right. Hold on. What happened to our 'EOW' sentinel value? Ok. I
think the only modification I need to make is to change the return statement:
return i == len(word) and cur != None and 'EOW' in cur
Wow. That case analysis I did earlier was off! Let's re-do it to make sure I really understand what is
happening. Let's first make sure we know exactly what values cur can be:
A dictionary
None , in the case where word[i-1] not in parent(cur)
    This happens when our Trie doesn't contain word even as a prefix.
Right. So if cur == None we can definitely return False . And if cur != None , but 'EOW' not in cur ,
then our Trie contains word as a proper prefix, but not as a word.
Ok. With all that, this is my preferred version:
i,cur = 0,self.root
while i < len(word) and cur != None:
    i,cur = i+1,cur.get(word[i], None)
return i == len(word) and cur != None and 'EOW' in cur
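
The corrected case analysis can be written down as a small classifier. This is a sketch under the same assumptions as the notes (nested dictionaries, an 'EOW' key marking word ends); classify and its three labels are hypothetical names.

```python
def classify(root, word):
    i, cur = 0, root
    while i < len(word) and cur is not None:
        i, cur = i + 1, cur.get(word[i])
    if cur is None:
        return "absent"        # the path broke: not even a prefix
    if 'EOW' in cur:
        return "word"          # full path exists and a word ends here
    return "prefix only"       # full path exists, but no word ends here


# The trie after inserting just "apple", written out by hand.
root = {'EOW': None, 'a': {'p': {'p': {'l': {'e': {'EOW': None}}}}}}

print(classify(root, "apple"))    # word
print(classify(root, "app"))      # prefix only
print(classify(root, "apx"))      # absent
```

When the loop exits with cur still a dictionary, i must equal len(word) , so the i == len(word) check in the return statement only matters alongside the cur check. Under this reading, search() accepts only the "word" case and startsWith() accepts both "word" and "prefix only".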

I was trying to decide between using collections.defaultdict or .get() , and I decided to go with
.get() as my "go-to" pattern so that I can use dictionary comprehensions. This also means that I won't
have to do anything special in the constructor or in insert() .
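
One further consideration, not raised in the notes: indexing a collections.defaultdict with a missing key inserts that key, so a read-only lookup written with [] would quietly grow the trie, whereas .get() leaves a plain dict untouched. A minimal sketch (make_node, auto_root and plain_root are hypothetical names):

```python
from collections import defaultdict


def make_node():
    # Hypothetical helper: a node whose missing children are created on access.
    return defaultdict(make_node)


auto_root = make_node()
cur = auto_root
for ch in "apple":
    cur = cur[ch]                 # insertion needs no "if ch not in cur" check

before = len(auto_root)           # only 'a' is a child of the root
_ = auto_root["z"]["o"]           # a lookup by indexing...
after = len(auto_root)            # ...has added 'z' as a side effect

plain_root = {"a": {}}
missing = plain_root.get("z")     # None, and plain_root is unchanged

print(before, after, missing, "z" in plain_root)   # 1 2 None False
```

A defaultdict also supports .get() without inserting, so the side effect only appears when lookups use indexing.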
Ok ok. Now we also know how to implement startsWith pretty easily! It's literally the same thing, but we
just don't have to check for 'EOW' in cur .
Now, let's work on insert(word) . The loop invariant is:
word[0:i] == tree[root:cur]
Lol. It's the same loop invariant. But this time, instead of terminating when we encounter a key that
doesn't exist, we insert a new dictionary at that key and update cur to be that new dictionary. We
terminate when i == len(word) , when we've processed the entire word .
i,cur = 0,self.root
while i < len(word):
    if word[i] not in cur:
        cur[word[i]] = {}
    i,cur = i+1,cur[word[i]]
cur['EOW'] = None
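
To see what this loop builds, here is the same logic as a standalone function (a sketch; passing root as a parameter is my assumption, made only so the snippet runs on its own), followed by the dictionary it produces for two words that share a prefix.

```python
def insert(root, word):
    i, cur = 0, root
    while i < len(word):
        if word[i] not in cur:
            cur[word[i]] = {}          # create the missing child
        i, cur = i + 1, cur[word[i]]   # word[i] is read before i is bumped
    cur['EOW'] = None                  # mark the end of the word


root = {'EOW': None}
insert(root, "app")
insert(root, "ape")

expected = {
    'EOW': None,
    'a': {'p': {'p': {'EOW': None},
                'e': {'EOW': None}}},
}
print(root == expected)   # True
```

The shared prefix "ap" is stored once, and each word ends in a dictionary that carries the 'EOW' key.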

Beautiful. Now, for a final touch, I think it's good Software Engineering to make 'EOW' some kind of private
constant. I wonder how to do this in Python?
Lol. Python doesn't have true privates. Instead, you put a leading underscore to mark it for internal
use only.
https://www.geeksforgeeks.org/underscore-_-python/
Python also doesn't have constants. Instead, you capitalize all letters to communicate that the
variable should not be changed.
https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python
Ugh. I'm getting errors with using _EOW . I'm going to slate this for later.
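
The notes don't record what the errors were, so the cause is unknown. One arrangement that does run is a class-level attribute accessed through self ; the sketch below assumes that layout and uses for loops only for brevity. One possible culprit (a guess) is referring to a class attribute by its bare name inside a method, which raises NameError because class scope is not visible from method bodies.

```python
class Trie:
    _EOW = 'EOW'   # leading underscore: internal use; capitals: treat as constant

    def __init__(self):
        self.root = {self._EOW: None}

    def insert(self, word: str) -> None:
        cur = self.root
        for ch in word:
            cur = cur.setdefault(ch, {})
        cur[self._EOW] = None      # a bare _EOW here would raise NameError

    def search(self, word: str) -> bool:
        cur = self.root
        for ch in word:
            cur = cur.get(ch)
            if cur is None:
                return False
        return self._EOW in cur


trie = Trie()
trie.insert("app")
print(trie.search("app"), trie.search("ap"))   # True False
```

A module-level _EOW = 'EOW' defined above the class would also work, and could then be used by its bare name inside the methods.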
class Trie:
    def __init__(self):
        self.root = {'EOW': None}

    def insert(self, word: str) -> None:
        i, cur = 0, self.root
        while i < len(word):
            if word[i] not in cur:
                cur[word[i]] = {}
            i, cur = i + 1, cur[word[i]]
        cur['EOW'] = None

    def search(self, word: str) -> bool:
        i, cur = 0, self.root
        while i < len(word) and cur != None:
            i, cur = i + 1, cur.get(word[i], None)
        return i == len(word) and cur != None and 'EOW' in cur

    def startsWith(self, prefix: str) -> bool:
        i, cur = 0, self.root
        while i < len(prefix) and cur != None:
            i, cur = i + 1, cur.get(prefix[i], None)
        return i == len(prefix) and cur != None

Killed it! First submission passed.


Runtime: 92.58%
Memory: 85.49%
The core difficulty I ran into here was the implementation. Specifically, reasoning about how to develop
guards for termination for Trees. Like, what is the analogue of i < len(arr) for Trees?
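
One way to read the final solution, offered as a sketch rather than a definitive answer: the tree guard is "cur is still a node", sitting next to the usual "i is still inside the string". The two hypothetical functions below, index_of and walk , put the array loop and the trie loop side by side.

```python
def index_of(arr, target):
    i = 0
    while i < len(arr) and arr[i] != target:   # guard: still inside the array
        i += 1
    return i if i < len(arr) else -1


def walk(root, s):
    i, cur = 0, root
    # Guard: still inside the string AND still on a real node.
    # "is not None" matters: an empty dict is a valid node but is falsy,
    # so "while cur" would stop too early.
    while i < len(s) and cur is not None:
        i, cur = i + 1, cur.get(s[i])
    return cur   # the node reached, or None if the path broke


root = {'a': {'b': {}}}
print(index_of([3, 5, 7], 5))   # 1
print(walk(root, "ab"))         # {}
print(walk(root, "ax"))         # None
```

For an array, falling off the end is detected by the index. For a tree path, it is detected by the pointer becoming the terminal value.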
