Tut7 Soln
Let’s revisit a problem that we discussed at the beginning of the semester: finding missing items in an array. Given n items in no particular order, but this time possibly with duplicates, find the first missing number (counting from 1), or output “all present” if all values 1 to n are present in the input.
For example, given [8, 5, 3, 3, 2, 1, 5, 4, 2, 3, 3, 2, 1, 9], the first missing number is 6.
Bonus (no need for hash functions): can we do the same thing using O(1) extra space, i.e. in-place?
Solution: The idea here is to store all the values we’ve seen in the array in a hash set, which ignores duplicates automatically. Then, for each number v = 1, 2, ..., n in turn, we check whether v is in the hash set and output the first v that is not (or “all present” if none is missing). This takes O(n) time in expectation, under the usual assumptions on the hash function (e.g. simple uniform hashing).
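A minimal Python sketch of the hash-set approach, with Python’s built-in set standing in for the hash set. The function name first_missing_hashset is hypothetical.

```python
def first_missing_hashset(arr):
    """Return the first value in 1..n missing from arr, or "all present"."""
    seen = set(arr)                  # hash set: duplicates are stored only once
    for v in range(1, len(arr) + 1):
        if v not in seen:            # expected O(1) membership test
            return v
    return "all present"


print(first_missing_hashset([8, 5, 3, 3, 2, 1, 5, 4, 2, 3, 3, 2, 1, 9]))  # 6
```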
Note that we could just use a boolean array of size n instead to accomplish the same goal, using each element’s value directly as the array index. Values larger than n can be ignored, since they cannot affect the answer. It’s often important to realise when a simple array suffices and something more complex, such as a hash set, is unnecessary.
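The array variant could look like this in Python. It is a sketch that assumes out-of-range values are simply skipped, and the name first_missing_array is hypothetical.

```python
def first_missing_array(arr):
    """Same result as the hash-set version, using a plain boolean array."""
    n = len(arr)
    present = [False] * (n + 1)      # present[v] is True iff v occurs; index 0 unused
    for v in arr:
        if 1 <= v <= n:              # values outside 1..n cannot affect the answer
            present[v] = True
    for v in range(1, n + 1):
        if not present[v]:
            return v
    return "all present"


print(first_missing_array([8, 5, 3, 3, 2, 1, 5, 4, 2, 3, 3, 2, 1, 9]))  # 6
```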
Solution to Bonus: The solution to the bonus question does not use hash sets, but it is based on an idea similar to cuckoo hashing. We consider every item once and, given its value, try to place it at its own index (using 1-based indexing) by swapping in-place.
For example, with the input given above, we would try to place item 8 at index 8. There’s an item of value 4 there, so item 8 “kicks out” the 4. Now we try to place 4 at index 4, but there’s an item of value 3 there. So we repeat the same process with 3, and stop because 3 is already sitting at index 3. This check is also what stops duplicates from bouncing around forever. If a value lies outside the range 1 to n, we can simply ignore it, since it cannot affect the answer.
After we do this for all elements, we run through the array one more time and check whether the value i is stored at index i. We output the first index i for which this fails, or “all present” if there is no such index.
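The in-place placement can be sketched in Python as follows. This is a minimal sketch that assumes a 0-based Python list, so value v belongs at index v − 1 rather than at index v. The function name first_missing_in_place is hypothetical, and the function modifies its input list.

```python
def first_missing_in_place(arr):
    """O(1) extra space: swap every in-range value towards its home index."""
    n = len(arr)
    for i in range(n):
        # Keep kicking out the occupant of arr[i]'s home slot until arr[i]
        # is out of range or its home already holds the same value.
        while 1 <= arr[i] <= n and arr[arr[i] - 1] != arr[i]:
            j = arr[i] - 1           # home index of the value currently at i
            arr[i], arr[j] = arr[j], arr[i]
    for i in range(n):
        if arr[i] != i + 1:          # first home that did not receive its value
            return i + 1
    return "all present"


print(first_missing_in_place([8, 5, 3, 3, 2, 1, 5, 4, 2, 3, 3, 2, 1, 9]))  # 6
```

Computing j before the swap is important. Writing arr[i], arr[arr[i] - 1] = ... directly would index with the already-overwritten arr[i].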
The overall time complexity of this solution is O(n). One way to reason about this is that an
element will not be moved once it is in the correct position. So, when a swap happens, at least
one element would be moved to the correct position, and not swapped thereafter. Hence, the total
number of swaps would be at most n. Finally, we add in the cost of iterating through the array
(which involves only constant time operations per element). Therefore, we get O(n).
Problem 2. Data Structure 2.0
2. Insert(val) which inserts an item val into the set if not present.
3. Remove(val) which removes the item val from the set if present.
4. GetRandom() which returns a random element from the current set of elements. Every
element must have an equal probability of being returned.
Insert, Remove and GetRandom must work in expected O(1) time! Hint: a Hash Table might
come in handy!
Assume that the maximum number of elements present in the RandomizedSet will never exceed
a reasonable number n.
Solution: We will use a combination of an Array A and a Hash Table T, together with a counter size holding the number of items stored. The Array is used to store all our items. The Hash Table is used to check efficiently whether an item is present in the Array, and also to find the position of an item in the Array.
On Insert(val), if val is already in T, we do nothing. Otherwise, we insert val at the back of A, i.e. A[size] = val, and insert the (key, value) pair (val, size) into T to record the location of val in A. Then, we increment size by one.
On Remove(val), if val is not in T, we do nothing. Otherwise, we need to remove val from both A and T. First, we look up the location of val in A with T[val]. If val is not at the last occupied index of A (index size − 1), we swap it with the element at that index and update the moved element’s position in T. Finally, we decrement size by one (effectively deleting the last element) and remove val from T.
On GetRandom(), we simply choose an index uniformly at random from [0, size − 1], both inclusive, and return the value at that index in A.
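A minimal Python sketch of this design, assuming a Python list can stand in for the Array A (its length plays the role of size) and a dict for the Hash Table T. The method names are snake_case renderings of Insert, Remove and GetRandom.

```python
import random


class RandomizedSet:
    def __init__(self):
        self.A = []                  # the items; len(self.A) plays the role of size
        self.T = {}                  # val -> index of val in self.A

    def insert(self, val):
        if val in self.T:
            return                   # already present: do nothing
        self.T[val] = len(self.A)    # record the position before appending
        self.A.append(val)

    def remove(self, val):
        if val not in self.T:
            return                   # not present: do nothing
        i = self.T[val]
        last = self.A[-1]
        self.A[i] = last             # move the last item into val's slot
        self.T[last] = i             # (harmless no-op when val is the last item)
        self.A.pop()                 # shrink by one
        del self.T[val]

    def get_random(self):
        # Uniform over indices 0..size-1, so every element is equally likely.
        return self.A[random.randrange(len(self.A))]


s = RandomizedSet()
for x in (10, 20, 30):
    s.insert(x)
s.remove(10)                         # 30 is moved into index 0
print(s.A, s.T)                      # [30, 20] {20: 1, 30: 0}
```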
Problem 3. Data Structure 3.0
Let’s try to improve a little upon the kinds of data structures we’ve been using so far. Implement
a data structure with the following operations:
2. Modify the BST: for each node of the BST, we also store a pointer to its predecessor and
successor (let’s call the pointers “previous” and “next”).
• For insertions and deletions: perform the usual insertion or deletion procedure on the BST, and add or remove the corresponding entry in the hash table. This takes O(log n) time, assuming the BST is balanced (e.g. an AVL tree). There is also the added work of maintaining the two additional pointers. After we insert, we run the usual successor and predecessor algorithms, which take O(log n) time. Once those are found, we need to: (1) set the predecessor’s “next” to the newly inserted node, (2) set the newly inserted node’s “previous” to the predecessor and its “next” to the successor, and (3) set the successor’s “previous” to the newly inserted node. Deletion applies the same idea in reverse: before removing a node, set its predecessor’s “next” to its successor and its successor’s “previous” to its predecessor.
• For lookup queries: use the hash table to find the corresponding node in O(1) time.
• For successor and predecessor queries: first, perform a lookup query to obtain the node in
the BST, and then use either the “next” or the “previous” pointer to get the corresponding
successor/predecessor node that we need.
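The following Python sketch shows how the hash table and the “previous”/“next” pointers fit together. To keep it short, it uses a plain unbalanced BST, so the O(log n) bounds above hold only once a balancing scheme (e.g. AVL rotations) is added. The class and method names are hypothetical. Deletion moves whole nodes rather than copying keys between them, so the hash table entries and linked pointers never go stale.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None    # BST children
        self.prev = self.next = None     # in-order predecessor / successor


class LinkedBST:
    """BST + hash table + prev/next pointers (unbalanced, for brevity)."""

    def __init__(self):
        self.root = None
        self.table = {}                  # key -> Node, O(1) expected lookup

    def insert(self, key):
        if key in self.table:
            return
        node = Node(key)
        self.table[key] = node
        parent, cur, pred, succ = None, self.root, None, None
        while cur is not None:           # usual BST descent
            parent = cur
            if key < cur.key:
                succ, cur = cur, cur.left    # last left turn = successor
            else:
                pred, cur = cur, cur.right   # last right turn = predecessor
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        # Steps (1)-(3): splice the new node between pred and succ.
        node.prev, node.next = pred, succ
        if pred is not None:
            pred.next = node
        if succ is not None:
            succ.prev = node

    def delete(self, key):
        node = self.table.pop(key, None)
        if node is None:
            return
        if node.prev is not None:        # unlink from the prev/next chain
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        self.root = self._remove(self.root, key)

    def _remove(self, cur, key):
        # Recursive BST deletion; key is known to be present.
        if key < cur.key:
            cur.left = self._remove(cur.left, key)
            return cur
        if key > cur.key:
            cur.right = self._remove(cur.right, key)
            return cur
        if cur.left is None:
            return cur.right
        if cur.right is None:
            return cur.left
        right, heir = self._pop_min(cur.right)   # in-order successor node
        heir.left, heir.right = cur.left, right
        return heir

    def _pop_min(self, cur):
        # Returns (subtree without its minimum node, that minimum node).
        if cur.left is None:
            return cur.right, cur
        cur.left, smallest = self._pop_min(cur.left)
        return cur, smallest

    def successor(self, key):
        node = self.table.get(key)       # lookup query via the hash table
        return node.next.key if node and node.next else None

    def predecessor(self, key):
        node = self.table.get(key)
        return node.prev.key if node and node.prev else None


t = LinkedBST()
for k in (5, 3, 8, 4, 7):
    t.insert(k)
print(t.successor(4), t.predecessor(7))  # 5 5
t.delete(5)
print(t.successor(4), t.predecessor(7))  # 7 4
```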
Problem 4. (Priority queue)
There are situations where, given a data set containing n unique elements, we want to know
the top k highest value elements. A possible solution is to store all n elements first, sort the data
set in O(n log n), then report the right-most k elements. This works, but we can do better.
(a) Design a data structure that supports the following operation better than O(n log n):
• getKLargest(): returns the top k highest value elements in the data set.
(b) Instead of having a static data set, you could have the data streaming in. However, your data
structure must still be ready to answer queries for the top k elements efficiently. Expand on
your data structure to support the following two operations better than O(n log n):
For example, if the data set contains {1, 13, 7, 9, 8, 4} initially and we want to know the top
3 highest value elements, calling getKLargest() should return the values {13, 9, 8}.
Suppose we then add the number 11 into the data set by calling insertNext(11). The data
set now contains {1, 13, 7, 9, 8, 4, 11} and calling getKLargest() should return {13, 11, 9}.
Solution: For part (a), we can use quickselect to find the k-th largest element in expected O(n) time. Quickselect partitions the array around this element, so the k largest elements are exactly those from this element to the end of the array.
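A minimal Python sketch of part (a), using randomised quickselect with a three-way partition. The partition scheme is our choice, not something the solution prescribes, and k_largest is a hypothetical name.

```python
import random


def k_largest(arr, k):
    """Return the k largest elements (in no particular order), 1 <= k <= len(arr)."""
    a = list(arr)                     # work on a copy
    target = len(a) - k               # index of the k-th largest in sorted order
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        # Three-way partition of a[lo..hi]: < pivot | == pivot | > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        if target < lt:
            hi = lt - 1               # k-th largest lies in the "< pivot" part
        elif target > gt:
            lo = gt + 1               # k-th largest lies in the "> pivot" part
        else:
            break                     # a[target] == pivot is in its final place
    return a[target:]                 # everything from the k-th largest onwards


print(sorted(k_largest([1, 13, 7, 9, 8, 4], 3), reverse=True))  # [13, 9, 8]
```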
For part (b), the key observation is that we only ever need to store the largest k elements. We can maintain a min priority queue of at most k elements. After each insertion, if the priority queue contains more than k elements, we remove the smallest element, so its size never exceeds k. Each insertion and removal runs in O(log k) time, since the priority queue contains at most k elements. For getKLargest(), we extract and return all items from the priority queue and then insert them back, which takes O(k log k) time. Since k ≤ n, this approach supports both operations in better than O(n log n) time.
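Part (b) can be sketched with Python’s heapq module, which implements a binary min-heap on a plain list. The class name TopK is hypothetical, and insert_next/get_k_largest are snake_case renderings of insertNext and getKLargest.

```python
import heapq


class TopK:
    def __init__(self, k, initial=()):
        self.k = k
        self.heap = []                # min-heap holding the k largest seen so far
        for x in initial:
            self.insert_next(x)

    def insert_next(self, x):
        heapq.heappush(self.heap, x)  # O(log k)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # drop the smallest, O(log k)

    def get_k_largest(self):
        # Extract everything, then push it back: O(k log k), as described above.
        items = [heapq.heappop(self.heap) for _ in range(len(self.heap))]
        for x in items:
            heapq.heappush(self.heap, x)
        return items[::-1]            # largest first


top = TopK(3, [1, 13, 7, 9, 8, 4])
print(top.get_k_largest())            # [13, 9, 8]
top.insert_next(11)
print(top.get_k_largest())            # [13, 11, 9]
```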
The overall complexity over q calls to insertNext is O(q log k). Some standard library implementations (such as C++’s) use this idea to implement partial_sort. A priority queue must be used here (instead of waiting for all elements and then running a quickselect) so that the k largest elements are available at every stage of the stream, not only at the end.
Problem 5. Toothless and His Teeth
Toothless has recently lost his teeth. Fortunately, his friend Toothful told him that there are n
houses along a street numbered from 1 to n, some of which are owned by dentists.
• If Toothless visits a house i whose owner is not a dentist, the owner will give Toothless some candies, causing him to lose $|a_i|$ teeth, where $a_i < 0$.
Toothless starts with 0 teeth (because he just lost them). Being an outgoing dragon, he cares less about his teeth and more about visiting as many houses on the street as possible, in increasing order of house number. Given $A = [a_1, a_2, \ldots, a_n]$, give an O(n log n) time algorithm to help Toothless find the maximum number of houses he can visit such that the number of teeth he has is never negative at any point in time.
For example, given [4, −4, 1, −3, 1, −3], the maximum number of houses he can visit is 5,
achieved by visiting houses 1, 3, 4, 5 and 6 (ending up with 0 teeth).
Solution: Let t be the current number of teeth and h be the number of houses we are planning to visit (both initialised to 0). Also, let P be a min priority queue storing the houses we are planning to visit. For each $a_i$ from $i = 1$ to $n$, we first insert $a_i$ into P. Then,
• If $t + a_i \ge 0$, visit house i: update t to $t + a_i$ and increment h.
• If $t + a_i < 0$, extract (and remove) the most negative house $a_j$ from P and cancel the visit to that house. Visit house i instead. Update t to $t - a_j + a_i$. (Since $a_i$ was just inserted, $a_j$ may be $a_i$ itself, in which case house i is simply skipped.)
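A Python sketch of this greedy, using heapq as the min priority queue P and storing only the values a_i rather than house indices. The name max_houses is hypothetical. It returns both h and the final t.

```python
import heapq


def max_houses(a):
    """Return (h, t): max houses visitable and the teeth left afterwards."""
    t = 0                             # current number of teeth
    h = 0                             # number of houses we plan to visit
    p = []                            # min-heap P of planned houses' values
    for x in a:
        heapq.heappush(p, x)
        if t + x >= 0:
            t += x                    # visit house i
            h += 1
        else:
            worst = heapq.heappop(p)  # most negative planned house (may be x)
            t = t - worst + x         # swap it for house i; h is unchanged
    return h, t


print(max_houses([4, -4, 1, -3, 1, -3]))  # (5, 0)
```

Every house is pushed onto P once and popped at most once, so the heap stays consistent with h: after each iteration, len(p) == h.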
In each iteration, we perform an insertion (and possibly an extraction) on a priority queue whose size is bounded by n. For example, when $a_i \ge 0$ for all i, we will be doing n insertions on P. Therefore, assuming that we are using a min heap, with n iterations the algorithm runs in O(n log n) time in the worst case.
Intuitively, the algorithm visits a house whenever possible. If a house cannot be visited, the algorithm makes the best use of the current house by cancelling the visit to the most negative house and visiting the current one instead. This attempts to maximise the current number of teeth (even though the number of houses visited does not change), so that it can visit more houses in the future. It can be rigorously proven that this strategy gives us the maximum number of houses, as follows:
We claim that once the algorithm terminates, h is the maximum number of houses Toothless can
visit. In particular, the following invariants are maintained throughout the algorithm:
• t is the maximum possible number of teeth among all valid sequences of houses of length h.
• The elements of P form the unique multiset of houses with sum t and length h.
In the (i + 1)-st iteration, if $t + a_{i+1} \ge 0$, we know that there exists a subsequence of length h that can be extended to also include house i + 1. Furthermore, by the maximality of t, $t + a_{i+1}$ is the maximum possible number of teeth among all valid sequences of houses of length h + 1. By the uniqueness of P, the only way to achieve sum $t + a_{i+1}$ and length h + 1 is to include house i + 1 on top of what is already in P.
So in fact, apart from helping Toothless maximise the number of houses he visits, we have also found the maximum number of teeth he can keep after visiting this many houses.
Problem 6. Optional. No need to prepare! TAs will pick some puzzles and solve as a class!
You have two doors in front of you. One door leads to heaven, and the other to hell. There are
two male guards, one by each door. One guard always tells the truth, and the other always lies,
but you don’t know who is who.
You can only ask one question to one guard to find the door to heaven. What question would
you ask?
Solution: “If I ask the other guard which door leads to heaven, what would his answer be?”
Irrespective of which guard you ask, the answer will always point to the door to hell. The truthful guard faithfully reports the liar’s wrong answer, while the liar misreports the truthful guard’s correct answer. So pick the other door.
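The case analysis can be checked exhaustively. This small Python enumeration tries every combination of which door leads to heaven and which guard you happen to ask. The 0/1 door encoding and the function name reply are our own assumptions.

```python
from itertools import product


def reply(heaven, asked_is_truthful):
    """Answer to: 'Which door would the other guard say leads to heaven?'
    Doors are 0 and 1; `heaven` is the door that leads to heaven."""
    other_is_truthful = not asked_is_truthful
    other_would_say = heaven if other_is_truthful else 1 - heaven
    # The asked guard relays the other's answer faithfully only if truthful.
    return other_would_say if asked_is_truthful else 1 - other_would_say


for heaven, asked_is_truthful in product((0, 1), (True, False)):
    answer = reply(heaven, asked_is_truthful)
    assert answer != heaven           # the answer always points to hell
    print(heaven, asked_is_truthful, "-> take door", 1 - answer)
```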
A housewife finally has time for a break. She goes to a bar and challenges a bartender to guess
her kids’ ages:
The bartender does some calculation but claims he still doesn’t know.
“Well, guess I’m getting a free drink. It’s nice to relax. I’ve been playing chess with my youngest
every night, she’s improving too quickly.”
Solution: Of all the ways to write 72 as a product of 3 factors, only {2, 6, 6} and {3, 3, 8} share the same sum (14). Hence, the bartender couldn’t have known the individual ages until he learned that there is a single youngest child. That rules out {3, 3, 8} and leaves {2, 6, 6}.
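The factorisation claim can be checked by brute force. This Python sketch assumes ages are positive integers, lists every unordered triple with product 72, and groups the triples by sum. The name ambiguous_triples is hypothetical.

```python
from collections import defaultdict


def ambiguous_triples(product=72):
    """Group triples a <= b <= c with a*b*c == product by their sum,
    keeping only sums shared by more than one triple."""
    by_sum = defaultdict(list)
    for a in range(1, product + 1):
        for b in range(a, product + 1):
            if product % (a * b):     # a*b must divide the product
                continue
            c = product // (a * b)
            if c >= b:                # enforce a <= b <= c to avoid repeats
                by_sum[a + b + c].append((a, b, c))
    return {s: ts for s, ts in by_sum.items() if len(ts) > 1}


print(ambiguous_triples())  # {14: [(2, 6, 6), (3, 3, 8)]}
```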
Note that each response by the bartender conveys some information. The first response is somewhat ‘trivial’ (if 72 had a unique factorisation into three ages, the bartender would’ve known the breakdown immediately), but the same idea applies in the subsequent steps, which helps frame the solution.
Problem 6.c. ‘Shared’ Information - 100 green-eyed Captives
Source: TED-Ed
100 green-eyed logicians are held captive by a (un)reasonable dictator, Tdolf Ailer, on a secluded island. There is no escape, but one peculiar rule is in place: any prisoner can approach the guard at night and ask to leave. If the prisoner has green eyes, they’ll be released. Otherwise, they are tossed into the volcano.
As it happens, all 100 prisoners have green eyes! But they’ve lived there since birth. And Tdolf
has ensured they can’t learn their own eye colour. There are no reflective surfaces (even water is in
opaque containers), and most importantly, they cannot communicate among themselves. However,
they do see each other during each morning headcount. Nevertheless, they all know no one would
ever risk trying to leave without absolute certainty of success.
After much pressure from human rights groups, Tdolf reluctantly agrees to let you visit the
island and speak to the prisoners under the following conditions:
• You may only make 1 statement
• You cannot tell them any new information
You thought long and hard on what to say to avoid incurring the wrath of the dictator. Finally,
you tell the crowd: “At least one of you has green eyes!”
The dictator is suspicious but reassures himself that your statement has no consequential im-
pact. You leave, and life seemingly goes on as before. But on the 100th morning after your visit,
all 100 prisoners are gone, each having asked to leave the previous night.
Hint: Start with a much smaller group! What’s the minimum group size you can consider?
Solution: https://youtu.be/98TQv5IAtY8?t=120
This is a bit difficult to think about on such a large scale, so let’s start by looking at some smaller cases to gain some intuition. Consider just 2 captives. Each sees one person with green eyes. If both stay put on the first night, each of them acquires new information: the other person must be seeing someone with green eyes (recall that they now know at least one of them has green eyes). The only person the other captive can see is themselves, so each concludes that they have green eyes, and both leave on the 2nd night.
It’s slightly trickier with 3 captives: Andre, Jonas, and Seth. Andre knows Jonas and Seth have green eyes. Andre waits for the 2nd morning and sees everyone present, which makes sense, since Jonas and Seth each see at least one other person with green eyes. Andre waits for the 3rd morning and still sees everyone present. If instead he had blue (or any other coloured) eyes, Jonas and Seth would be in the 2-captive situation and would have left on the 2nd night. The fact that they stayed behind shows that each of them sees 2 people with green eyes. So Andre happily leaves on the 3rd night, and by the same reasoning, so do the other two.
Applying this reasoning inductively, k green-eyed captives all leave on the k-th night, so with 100 captives it takes 100 nights.
Problem 6.d. An optimization problem - Cross The Bridge
Source: TED-Ed
Quick! Your team needs to get away from a bunch of zombies fast. With you, there are the
young biologist, the middle-aged physicist, and the old mathematician.
There is only one way to safety - across an old rope bridge, spanning a massive gorge. Here’s
how fast each of your team can dash across:
• You: 1 minute
• Biologist: 2 minutes
• Physicist: 5 minutes
• Mathematician: 10 minutes
The mathematician calculated that you have just 17 minutes before the zombies catch up. So,
you only have that much time to get everyone across and cut the ropes. Sadly, the bridge can only
hold two people at a time and they need to walk at the pace of the slower one. Worse, the
place is pitch black! And you only have 1 lantern that illuminates a tiny area. You have no other
tools, and must make sure everyone is safely across before the first zombie steps onto the bridge.
Can you figure out a way to have everyone successfully escape?
(Bonus, optional) If you would like a greater challenge, try tackling the more general version
of this problem here: https://open.kattis.com/problems/bridge
Solution: https://youtu.be/7yDmGnA8Hw0?t=126
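The key idea is to minimise the time wasted by the two slowest people by having them cross together. And because you have to make a couple of return trips back and forth, you will want the fastest people available to make them. Here’s the breakdown:
• You and the biologist cross (2 minutes).
• You return with the lantern (1 minute).
• The physicist and the mathematician cross (10 minutes).
• The biologist returns with the lantern (2 minutes).
• You and the biologist cross again (2 minutes).
This takes 2 + 1 + 10 + 2 + 2 = 17 minutes in total.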
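For the general version (the Kattis bonus), a common approach is the well-known greedy for this bridge-and-torch puzzle. While more than three people remain, it moves the two slowest across using whichever of two standard patterns is cheaper. The Python sketch below assumes that strategy (its optimality is a known result that we do not prove here) and does not follow Kattis’s exact input/output format. The name bridge_schedule is hypothetical.

```python
def bridge_schedule(times):
    """Return (total time, crossings); 1-tuples are solo return trips."""
    t = sorted(times)
    total, steps = 0, []
    n = len(t)
    while n > 3:
        a, b, y, z = t[0], t[1], t[n - 2], t[n - 1]
        pair_first = a + 2 * b + z    # a,b cross; a back; y,z cross; b back
        escort = 2 * a + y + z        # a,z cross; a back; a,y cross; a back
        if pair_first <= escort:
            steps += [(a, b), (a,), (y, z), (b,)]
            total += pair_first
        else:
            steps += [(a, z), (a,), (a, y), (a,)]
            total += escort
        n -= 2                        # the two slowest are now across
    if n == 3:
        steps += [(t[0], t[2]), (t[0],), (t[0], t[1])]
        total += t[0] + t[1] + t[2]
    elif n == 2:
        steps.append((t[0], t[1]))
        total += t[1]
    elif n == 1:
        steps.append((t[0],))
        total += t[0]
    return total, steps


total, steps = bridge_schedule([1, 2, 5, 10])
print(total)   # 17
print(steps)   # [(1, 2), (1,), (5, 10), (2,), (1, 2)]
```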