Prep Doc Coding Algo
Kaiyu Zheng
January 2017
1 Ground
People say that interviews at Google will cover as much ground as possible. As a new college graduate, the ground that I must cover is the following. Part of the list is borrowed from a reddit post: https://www.reddit.com/r/cscareerquestions/comments/206ajq/my_onsite_interview_experience_at_google/#bottom-comments.
1. Data structures
2. Trees and Graph algorithms
3. Dynamic Programming
4. Recursive algorithms
5. Scheduling algorithms (Greedy)
6. Caching
Shared by TRS
7. Sorting
8. Files
9. Computability
10. Bitwise operators
11. System design
As an anticipated Machine Learning TA, I would think that they might
ask me several questions about machine learning, so I should also prepare
something for that. Operating systems is another area where you will get
asked questions, or where answering from that angle can potentially
impress your interviewer.
Google likes to ask follow-up questions that scale up the size of input.
So it may be helpful to prepare some knowledge on Big Data, distributed
systems, and databases.
The context of any algorithm problem can involve sets, arrays,
hashtables, strings, etc. It will be useful to know several well-known
algorithms for those contexts, such as the KMP algorithm for substring search.
In this document, I will also summarize my past projects, the most
difficult bugs, etc., things that might get asked. When I fly to Mountain
View, this is the only document I will bring with me. I believe that it is
powerful enough.
Contents
1 Ground
2 Knowledge Review
  2.1 Data structures
    2.1.1 Array
    2.1.2 Tuple
    2.1.3 Union
    2.1.4 Tagged union
    2.1.5 Dictionary
    2.1.6 Multimap
    2.1.7 Set
    2.1.8 Bag
    2.1.9 Stack
    2.1.10 Queue
  2.11 Computability
    2.11.1 Countability
    2.11.2 The Halting Problem
    2.11.3 Turing Machine
    2.11.4 P-NP
  2.12 Bitwise operators
    2.12.1 Facts and Tricks
  2.13 Math
    2.13.1 GCDs and Modulo
    2.13.2 Prime numbers
    2.13.3 Palindromes
    2.13.4 Combination and Permutation
    2.13.5 Series
  2.14 Concurrency
    2.14.1 Threads & Processes
    2.14.2 Locks
  2.15 System design
    2.15.1 Specification
    2.15.2 Subtyping and Subclasses
    2.15.3 Design Patterns
    2.15.4 Architecture
    2.15.5 Testing
3 Flagship Problems
  3.1 Arrays
  3.2 Strings
  3.3 Permutation
  3.4 Trees
  3.5 Graphs
  3.6 Divide and Conquer
  3.7 Dynamic Programming
  3.8 Miscellaneous
  3.9 Unsolved
4 Behavioral
  4.1 Standard
    4.1.1 introduce yourself
    4.1.2 talk about your last internship
    4.1.3 talk about your current research
    4.1.4 talk about your projects
    4.1.5 why Google?
  4.2 Favorites
    4.2.1 project?
    4.2.2 class?
    4.2.3 language?
    4.2.4 thing about Google?
    4.2.5 machine learning technique?
  4.3 Most difficult
    4.3.1 bug?
    4.3.2 design decision in your project?
    4.3.3 teamwork issue?
    4.3.4 failure?
    4.3.5 interview problem you prepared?
5 Appendix
  5.1 Java Implementation of Trie
  5.2 Python Implementation of the KMP algorithm
  5.3 Python Implementation of Union-Find
2 Knowledge Review
2.1 Data structures
2.1.1 Array
An array is used to describe a collection of elements, where each ele-
ment is identified by an index that can be computed at run time by the
program. I am familiar with this, so no need for more information.
Circular buffer A circular buffer is a single, fixed-size buffer used as
if it were connected end-to-end. It is useful as a FIFO buffer, because we
do not need to shift every element back when the first-inserted one is
consumed. A non-circular buffer is better suited as a LIFO buffer.
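To make this concrete, here is a minimal Python sketch of a fixed-size circular FIFO buffer (the class and method names are my own):

```python
class CircularBuffer:
    """Fixed-size FIFO buffer; the head advances on read, so no shifting."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0      # index of the oldest element
        self.count = 0     # number of stored elements

    def enqueue(self, item):
        if self.count == len(self.buf):
            raise OverflowError("buffer full")
        tail = (self.head + self.count) % len(self.buf)  # wrap around
        self.buf[tail] = item
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)      # wrap around
        self.count -= 1
        return item
```

Dequeuing only advances head modulo the capacity, which is exactly why no elements need to be shifted.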
2.1.2 Tuple
A tuple is a finite ordered list of elements. In mathematics, an n-tuple is an
ordered list of n elements, where n is a non-negative integer. A tuple may
contain multiple instances of the same element. Two tuples are equal
if and only if every element in one tuple equals the element at the
corresponding index in the other tuple.
2.1.3 Union
In computer science, a union is a value that may have any of several
representations or formats within the same position in memory; or it is
a data structure that consists of a variable that may hold such a value.
Think of the union data type in C, which essentially allows you
to store different data types in the same memory location.
2.1.5 Dictionary
A dictionary, also called a map or associative array, is a collection of
key, value pairs, such that each possible key appears at most once in
the collection.
There are numerous ways to implement a dictionary. This is basically
a review of CSE 332.
As more elements are inserted into the hash table, there will likely be
collisions. A collision is when two distinct keys map to the same bucket
in the list T . Here are several common collision resolution strategies:
1. Separate Chaining: If we hash multiple items to the same bucket,
store a LinkedList of those items at that bucket. Worst case insert
and delete is O(n). Average is O(1). Separate Chaining is easy
to understand and implement, but requires a lot more memory
allocation.
2. Open Addressing: Choose a different bucket when the natural
choice (the one computed by the hash function) is full. Techniques include
linear probing, quadratic probing, and double hashing. A good
open addressing technique allows (1) reproduction of the probe path
we took, (2) coverage of all slots in the table, and (3) avoidance of
putting lots of keys close together. The reasons to use open addressing
are less memory allocation and easier data representation.
When we delete an element from T , we must use lazy deletion, i.e., mark that
element as deleted, but not actually remove it. Otherwise, we won't be able
to retrace the insertion path. Linear probing can cause primary clustering,
which happens when different keys collide to form one big group1
Quadratic probing: Similar to linear probing, except that we use a
different formula to deal with collisions: (h(key) + i^2) % |T |. Theory
shows that if λ < 1/2, quadratic probing will find an empty slot in at
most |T |/2 probes (no failure of insertion)2 . Quadratic probing causes
secondary clustering, which happens when different keys hash to the
same place and follow the same probing sequence3 .
Double hashing: When there is a collision, simply apply a second,
independent hash function g to the key: (h(key) + i*g(key)) % |T |.
With a careful choice of g, we can avoid the infinite-loop problem seen
in quadratic probing. An example is h(key) = key % p, g(key) = q -
(key % q) for primes p, q with 2 < q < p.
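As a sanity check, here is a small Python sketch of the double-hashing probe sequence; the concrete choices p = |T | = 7 and q = 5 are my own:

```python
def probe_sequence(key, table_size=7, q=5, num_probes=5):
    """Indices probed by double hashing: (h(key) + i*g(key)) % |T|."""
    h = key % table_size   # primary hash h(key) = key % p, with p = |T|
    g = q - (key % q)      # secondary hash; always in 1..q, so never zero
    return [(h + i * g) % table_size for i in range(num_probes)]
```

For key 41 this yields the probe indices [6, 3, 0, 4, 1]; because g(key) is never zero, the probe always makes progress.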
Variations As we can see, the placement of keys in a hash table
depends on the hash function, which can be computed in O(1) time.
Hashing, however, leaves us without any reasonable ordering of the keys
in T . If we want the keys to be kept sorted automatically, we may use a
binary search tree to hold them instead.
In Java, besides HashMap and TreeMap, there is another popular
implementation, LinkedHashMap (there is even a LinkedHashSet). The
LinkedHashMap differs from the HashMap in that it maintains a separate
doubly-linked list running through all of its entries, which defines the
iteration order of the keys. The order is normally insertion order. Insertion
1 See page 6 of https://courses.cs.washington.edu/courses/cse332/15au/lectures/hashing-2/hashing-2.pdf.
2 λ is the load factor, defined as λ = N/|T |, where N is the number of elements in the table.
order is not affected if a key is re-inserted into the map. This information
is from the Oracle Java Documentation. So essentially, a LinkedHashMap
is no different from a normal hash table at the data-storage level.
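Python's built-in dict happens to follow the same two rules (insertion-order iteration is guaranteed since Python 3.7, and re-inserting an existing key does not move it), so it is an easy way to play with this behavior:

```python
# Python dicts iterate in insertion order, and updating an existing key
# does not change its position -- the same rules as Java's LinkedHashMap.
d = {}
for k in ["b", "a", "c"]:
    d[k] = 1
d["a"] = 2        # re-insertion: the value changes, the position does not
print(list(d))    # ['b', 'a', 'c']
```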
Bitmap Same structure as a bit array; the only difference is that each
bit is mapped to some other meaningful entity.
2.1.6 Multimap
A multimap is a generalization of an associative array in which more than
one value may be associated with, and returned for, a given key.
2.1.7 Set
A set is a collection of values without any particular order and with no
repeated values. It is basically a finite set in mathematics. Set theory
is useful for understanding what you can do with sets. The most basic
operations are union(S, T), intersection(S, T), difference(S, T),
and subset(S, T).
2.1.8 Bag
A bag, also called a multiset, is a set that allows repeated values (dupli-
cates).
2.1.9 Stack
You should be pretty familiar with this already. To implement a stack, we
can use an array and keep track of the index of the top of the stack.
2.1.10 Queue
You should be pretty familiar with this already. To implement a queue,
we can use an array and keep track of the front and rear elements of the
queue. Alternatively, we can just remember the front element and keep a
count of the elements.
ADT A priority queue ADT should at least support the following
functions. Sometimes we also like to have a decrease_key(v, p) function.
insert(v, p), find_min(), delete_min()
2.1.12 List
Linked list A linked list consists of a group of nodes that together
represent a sequence (ordered list). In the basic form, each node stores
some data and a pointer to the next node.
Doubly linked list A doubly linked list differs from singly linked list
in that each node has two references, one pointing to the next node, and
one pointing to the previous node.
The convenience of doubly linked list is that it allows traversal of the list
in either direction. In operating systems, doubly linked lists are used to
maintain active processes, threads, etc.
There is a classic problem: Convert a given binary tree to doubly
linked list (or the other way around). This problem will be discussed
later.
Unrolled linked list An unrolled linked list differs from a linked list in
that each node stores multiple elements. It is useful for increasing cache
performance while decreasing the memory overhead associated with storing
list metadata (e.g. references). It is related to the B-tree. A typical node
looks like this:
record node {
    node next
    int numElements  // number of elements in this node, up to some max limit
    array elements
}
XOR linked list An XOR linked list takes advantage of the bitwise
XOR operation, to decrease storage requirements for doubly linked lists.
An ordinary doubly linked list requires two references in a node, one for
the previous node, one for the next node. An XOR linked list uses one
address field to compress the two references, by storing the bitwise XOR
between the address of the previous node and the address of the next
node.
Example: We have XOR linked list nodes A, B, C, D, in order. So
for node B, it has a field A⊕C; for node C it has a field B⊕D, etc.
When we traverse the list, if we are at C, we can obtain the address
of D by XORing the address of B with the reference field of C, i.e.
B⊕(B⊕D)=(B⊕B)⊕D=0⊕D=D.
For the traversal to work, we can store B’s address alone in A’s field,
and store C’s address alone in D’s field, and we have to mark A as start
and D as end. This is because given an arbitrary middle node in XOR
linked list, one cannot tell the next or previous addresses of that node.
Advantages: Obviously it saves a lot of space.
Disadvantages: General-purpose debugging tools cannot follow XOR
chain. Most garbage collection schemes do not work with data structures
that do not contain literal pointers. Besides, while traversing the list, you
have to remember the address of the previous node in order to figure out
the next one. Also, XOR linked list does not have all features of doubly
linked list, e.g. the ability to delete a node knowing only its address.
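Since Python has no raw pointers, the traversal described above can still be simulated by treating integer keys into a node table as addresses (this table layout is my own device for illustration):

```python
# Simulate an XOR linked list: "addresses" are integer keys into `nodes`,
# with 0 reserved as the null address.
nodes = {}  # addr -> (value, prev_addr XOR next_addr)

def build(values):
    """Build the list A, B, C, ...; returns the head address."""
    addrs = list(range(1, len(values) + 1))
    for i, a in enumerate(addrs):
        prev_a = addrs[i - 1] if i > 0 else 0
        next_a = addrs[i + 1] if i + 1 < len(addrs) else 0
        nodes[a] = (values[i], prev_a ^ next_a)
    return addrs[0]

def traverse(head):
    """Walk the list; note we must carry the previous address along."""
    out, prev, cur = [], 0, head
    while cur != 0:
        value, link = nodes[cur]
        out.append(value)
        prev, cur = cur, prev ^ link   # next = prev XOR (prev XOR next)
    return out
```

The single line `prev ^ link` is the whole trick: XORing the stored field with the previous address cancels it out and leaves the next address.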
Self-organizing list From Wikipedia: A self-organizing list is a list that
reorders its elements based on some self-organizing heuristic to improve
average access time. The aim of a self-organizing list is to improve the
efficiency of linear search by moving more frequently accessed items towards
the head of the list. A self-organizing list achieves near-constant time
for element access in the best case, and uses a reorganizing algorithm to
adapt to various query distributions at runtime. Self-organizing lists can
be used in compilers (even for code on embedded systems) to maintain
symbol tables during compilation of program source code. Some techniques
for rearranging nodes:
Move to Front (MTF) Method: Moves the accessed element to the
front of the list. Pros: easy to implement, no extra memory. Cons: may
prioritize infrequently used nodes.
Count Method: Keep a count of the number of times each node is
accessed. Then, nodes are rearranged according to decreasing count.
Pros: realistic in representing the actual access pattern. Cons: extra
memory; unable to quickly adapt to rapid changes in access pattern.
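A minimal Python sketch of the Move to Front method (the class name is my own):

```python
class MTFList:
    """Self-organizing list using the Move-to-Front heuristic."""
    def __init__(self, items):
        self.items = list(items)

    def access(self, x):
        i = self.items.index(x)                  # linear search from the head
        self.items.insert(0, self.items.pop(i))  # move accessed item to front
        return x
```

After `access`, the touched element sits at the head, so repeated accesses to the same few items become cheap.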
Skip list A skip list is a probabilistic data structure that allows fast
search within an ordered sequence of elements. Fast search is made
possible by maintaining a linked hierarchy of subsequences, where each
subsequence skips over fewer elements than the previous one.
A skip list is built in layers. The bottom layer is the ordinary linked
list. Each higher layer is an "express lane" for the lists below, where an
element in layer i appears in layer i + 1 with some fixed probability p.
This seems fancy. How are these express lanes used in searching? How
are skip lists used?
A search for a target starts at the head element of the top layer list,
and it proceeds horizontally until the current element is greater than
or equal to the target. If equal, target is found. If greater, the search
returns to the previous element, and drops down vertically to the list
at the lower layer. The expected run time is O(logn). Skip lists can be
used to maintain sorted structures, e.g. key-value stores, in databases.
People compare skip lists with balanced trees. Skip lists have the
same asymptotic expected time bounds as balanced trees, and they are
simpler to implement, and use less space. The average time for search,
insert and delete are all O(logn). Worst case O(n).
2.1.13 Heap
Minimum-heap property: each node's key is less than or equal to its children's keys.
Binary heap One more property of binary heap is that the tree has
no gaps. Implementation using an array: parent(n) = (n-1) / 2;
leftChild(n) = 2n + 1; rightChild(n) = 2n + 2. Floyd’s build heap
algorithm takes O(n). There are several variations of binary heap. In-
sert, deleteMin, decreaseKey operations take Θ(logn) time. Merge takes
Θ(n) time.
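The index arithmetic can be checked directly in Python; heapq.heapify also gives an O(n) build, the same idea as Floyd's build-heap algorithm:

```python
import heapq

# 0-indexed array positions for a binary heap
def parent(n): return (n - 1) // 2
def left(n):   return 2 * n + 1
def right(n):  return 2 * n + 2

h = [5, 1, 4, 2]
heapq.heapify(h)   # O(n) bottom-up build, as in Floyd's algorithm
```

After heapify, h[0] is the minimum and every node satisfies the min-heap property against its parent.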
2.1.14 Graph
A graph consists of a set of vertices and a set of pairs of these vertices
as edges. If these pairs are unordered, then the graph is an undirected
graph. If these pairs are ordered pairs, then the graph is a directed
graph.
Paths A path is called simple if all its vertices are distinct from one
another. A cycle is a path {v1 , v2 , · · · , vk−1 , vk } in which for k > 2, the
first k − 1 nodes are distinct, and v1 = vk . The distance between two
nodes u and v is the minimum number of edges in a u-v path.
ADT The following are typical operations for a graph abstract data
type (ADT). During the interview, if you need to use a graph library,
you can expect it to have these functions. In Python, there are graph
libraries such as python-graph.
add_vertex(G, v), add_edge(G, u, v), neighbors(G, v),
remove_vertex(G, v), remove_edge(G, u, v), adjacent(G, v, w)
Common representations of a graph are adjacency list or adjacency
matrix. Wikipedia has a nice explanation and comparison of different
representations of a graph ADT. Check it out below.
Adjacency list: Vertices are stored as records or objects, and every
vertex stores a list of adjacent vertices. This data structure allows the
storage of additional data on the vertices. Additional data can be stored
if edges are also stored as objects, in which case each vertex stores its
incident edges and each edge stores its incident vertices.
Adjacency matrix: A two-dimensional matrix, in which the rows rep-
resent source vertices and columns represent destination vertices. Data
on edges and vertices must be stored externally. Only the cost for one
edge can be stored between each pair of vertices.
Adjacency list Adjacency matrix
Store graph O(|V | + |E|) O(|V |2 )
Add vertex O(1) O(|V |2 )
Add edge O(1) O(1)
Remove vertex O(|E|) O(|V |2 )
Remove edge O(|E|) O(1)
adjacent(G, v, w) O(|V |) O(1)
2.1.15 Tree
An undirected graph is a tree if it is connected and does not contain a
cycle (acyclic). Descendant & ancestor: we say that w is a descendant
of v if v lies on the path from the root to w. In this case, v is an ancestor of
w. There are so many different kinds of trees. Won't discuss them here.
Trie A trie is also called a prefix tree. Each edge in a trie represents
a character, and the value in each node represents the current prefix by
collecting all characters on the edges when traversing from the root (an
empty string) to that node. All the descendants of a node have the same
prefix as that node. See 5.1 for my Java implementation of Trie.
A compressed trie is a trie where non-branching paths are compressed
into a single edge. See the figure in 2.8.3 as an example.
B-Tree In computer science, a B-tree is a self-balancing tree data
structure that keeps data sorted and allows searches, sequential access,
insertions, and deletions in logarithmic time. The B-tree is a general-
ization of a binary search tree in that a node can have more than two
children (Comer 1979, p. 123). For an interview, I doubt that we need to
know the implementation details of a B-tree. Know its motivation, though.
Motivation: A self-balancing binary search tree (e.g. an AVL tree) is slow
when the height of the tree reaches a certain limit such that manipulating
nodes requires disk access. In fact, for a large dictionary, most of the data
is stored on disk. So we want a self-balancing tree that is even shallower
than an AVL tree, to minimize the number of disk accesses and exploit
the disk block size.
BIT[i+1] = BIT[i+1] + arr[i],
n = j + (j & (-j)).
4 Watch the YouTube video by Tushar Roy for a decent explanation: https://www.youtube.com/watch?v=CWDQJGaN1gY&t=13s
We add the value arr[i] to each of the affected nodes, until the
computed index n is out of bounds. The run time here is O(logn).
Here is how we use a BIT to compute the prefix sum of the first k
elements. We start at the BIT node with index k (the node covering
arr[k-1]) and add its value to our sum. Then we traverse back towards
the root: for the current node BIT[j], we compute the index p of the next
node in the chain by
p = j - (j & (-j))
and add the value of that node to the sum, stopping when the index
reaches 0. Return the sum as the result. This process also takes O(logn)
time. The space of a BIT is O(n).
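Putting the update and prefix-sum chains together, here is a minimal Python BIT sketch (1-indexed internally, as described above):

```python
class BIT:
    """Binary indexed tree for prefix sums; tree[0] is unused (1-indexed)."""
    def __init__(self, arr):
        self.tree = [0] * (len(arr) + 1)
        for i, x in enumerate(arr):
            self.update(i, x)

    def update(self, i, delta):
        """Add delta to arr[i]; touches O(log n) nodes."""
        j = i + 1                   # arr[i] corresponds to BIT index i+1
        while j < len(self.tree):
            self.tree[j] += delta
            j += j & (-j)           # next affected node

    def prefix_sum(self, k):
        """Sum of the first k elements, in O(log n)."""
        s, j = 0, k
        while j > 0:
            s += self.tree[j]
            j -= j & (-j)           # next node in the prefix chain
        return s
```

The expression `j & (-j)` isolates the lowest set bit of j, which is what makes both chains logarithmic in length.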
2.1.16 Union-Find
A union-find data structure, also called a disjoint-set data structure, is a
data structure that maintains a set of disjoint subsets (e.g. components
of a graph). It supports two operations:
1. Find(u): Given element u, it will return the name of the set that
u is in. This can be used for checking if u and v are in the same
set. Optimal run time: O(logn).
2. Union(N1 , N2 ): Given disjoint sets N1 , N2 , this operation will
merge the two components into one set. Optimal run time O(1) if
we use pointers; if not, it is O(logn).
First, we will discuss an implementation using implicit list. Assume
that all objects can be labeled by numbers 1, 2, 3, · · · . Suppose we have
three disjoint sets as shown in the upper part of the following image.
Notice that the above representation is called an explicit list, because it
explicitly connects objects within a set together.
As shown in the lower part of the image above, we can use a single
implicit list to represent the disjoint sets that can remember (1) pointers
to the canonical element (i.e. name) for a disjoint set, (2) the size of each
disjoint set. See appendix 5.3 for my Python implementation of Union-
Find, using implicit list.
When we union two sets, it is conceptually like joining two trees together,
and the root of the resulting tree is the canonical element of the set after
the union. Union-by-height is the idea that we always join the tree with
smaller height into the one with greater height, i.e. the root of the taller
tree will be the root of the new tree after the union. In my implementation,
I used union-by-size instead of height, which produces the same result in
the run time analysis5 . Path compression is a separate optimization: during
find, we point every traversed node directly at the root, flattening the tree.
The run time of union is determined by the run time of find in this
implementation. Analysis shows that with these optimizations, the amortized
run time of find is bounded by the inverse Ackermann function, which is
even better than O(logn).
There is another implementation that uses tree that is also optimal
for union. In this case, the union-find data structure is a collection of
trees (forest), where each tree is a subset. The root of the tree is the
canonical element (i.e. name) of the disjoint set. It is essentially the
same idea as implicit list.
BFS/DFS(G=(V,E), s) {
    worklist = [s]
    seen = {s}
    while worklist is not empty:
        node = worklist.remove()
        {visit node}
        for each neighbor u of node:
            if u is not in seen:
                worklist.add(u)
                seen.add(u)
}
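In Python, the same worklist pattern gives BFS with a queue (and a stack-based DFS variant if the pop side is swapped); this sketch assumes the graph is an adjacency dict:

```python
from collections import deque

def bfs_order(adj, s):
    """Visit order from s; use worklist.pop() instead of popleft() for a
    (stack-based) DFS variant."""
    worklist, seen, order = deque([s]), {s}, []
    while worklist:
        node = worklist.popleft()   # queue -> BFS
        order.append(node)
        for u in adj[node]:
            if u not in seen:       # mark when enqueued, not when visited
                seen.add(u)
                worklist.append(u)
    return order
```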
5 See https://courses.cs.washington.edu/courses/cse332/15au/lectures/union-find/union-find.pdf.
There is another way to implement DFS using recursion (From MIT
6.006 lecture):
DFS(V , Adj):
parent = {}
for s in V :
if s is not in parent:
parent[s] = None
DFS-Visit(Adj, s, parent)
DFS-Visit(Adj, s, parent):
for v in Adj[s]:
if v is not in parent:
parent[v] = s
DFS-Visit(Adj, v, parent)
BFS/DFS(G=(V,E), s) {
    worklist = [s]
    seen = {s}
    layers = {s:0}
    while worklist is not empty:
        node = worklist.remove()
        {visit node}
        for each neighbor u of node:
            if u is not in seen:
                worklist.add(u)
                seen.add(u)
                layers.put(u, layers[node]+1)
    Go through keys in layers and obtain the set of nodes for each layer.
}
in some BFS layer.
Theorem 2.3 (BFS). For BFS tree T , if there is an edge (x, y) in G
such that node x belongs to layer Li , and node y belongs to layer Lj ,
then i and j differ by at most 1.
Theorem 2.4 (DFS). Let T be a DFS tree. Let x, y be nodes in T . Let
(x, y) be an edge of G that is NOT an edge of T . Then, one of x or y is
an ancestor of the other.
Theorem 2.5 (DFS). For a given recursive call DFS(u), all nodes that
are marked visited (e.g. put into the parent map) between the invocation
and end of this recursive call are descendants of u in the DFS tree T .
Now, let us look at how BFS layers are used for the two-colorable graph
problem. A graph is two-colorable if and only if it is bipartite. A bipartite
graph is a graph whose vertices can be divided into disjoint sets U and
V (i.e. U and V are independent sets), such that every edge connects a
vertex in U to one in V .
Theorem 2.6 (No Odd Cycle). If a graph G is bipartite, then it cannot
contain an odd cycle, i.e. a cycle with odd number of edges (or nodes).
Cycle detection
Theorem 2.8. A graph G has a cycle if and only if the DFS has a
backward edge.
Here we discuss Kahn's algorithm for topological sort; the run time is
O(|V |+|E|). First, start with the set S of nodes that have no incoming
edges. Then, repeatedly remove a node of S from G; while removing a node,
we remove its outgoing edges, and whenever this causes some node's
in-degree to drop to zero, we add that node to S. We repeat these two
steps until S is empty. The order in which we remove nodes from G is the
topological order.
Kahn's algorithm can be used to test whether a graph is a DAG. After S is
empty, if the graph is a DAG, all edges should have been removed from
G. If there are still edges, then G is not a DAG.
Another, more gentle (no removal of nodes and edges) algorithm uses
DFS. The idea behind it is that when we do DFS, we start visiting a
node before its descendants in the DFS tree, and finish visiting those
descendants before we actually finish visiting the node itself.
The topological sort thus can be obtained by doing DFS and outputting
the vertices in reverse order of their finishing times6 . So if node v finishes
after node u, then v is topologically sorted before u (v may be an ancestor
of u in the tree). We must check that there is no back edge in the graph
before we output the final topological order, because a back edge means
the graph has a cycle.
6 From MIT 6.006, Fall 2011
2.2.3 Paths
Obtain BFS/DFS path from s to t When we have an unweighted
graph, we can find a path from s to t simply by BFS or DFS. BFS gives
the shortest path in this case. It is straightforward if we have a map
that can tell the parent of each node in the tree.
BFS/DFS(G=(V,E), s, t):
    worklist = [s]
    seen = {s}
    parent = {s:None}
    while worklist is not empty:
        node = worklist.remove()
        if node == t:
            return Get-Path(s, t, parent)
        for each neighbor u of node:
            if u is not in seen:
                worklist.add(u)
                seen.add(u)
                parent[u] = node
    return NO PATH

Get-Path(s, t, parent):
    v = t
    path = [t]
    while v != s:
        v = parent[v]
        path.add(v)
    return reverse(path)
Dijkstra’s-Algorithm(G=(V,E), s, t):
S = {}; d[s] = 0; d[v] = infinity for v != s
prev[s] = None
While S != V
Choose v in V-S with minimum d[v]
Add v to S
For each w in the neighborhood of v
if d[v] + c(v,w) < d[w]:
d[w] = d[v] + c(v,w)
prev[w] = v
return d, prev
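The same algorithm, sketched in Python with a binary heap instead of the linear minimum scan (stale heap entries are simply skipped; the graph is assumed to be an adjacency dict of (neighbor, cost) pairs):

```python
import heapq

def dijkstra(adj, s):
    """adj[v] = list of (w, cost); returns shortest distances from s."""
    dist = {s: 0}
    pq = [(0, s)]                  # (distance, vertex) min-heap
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, float("inf")):
            continue               # stale entry; a shorter path was found
        for w, c in adj[v]:
            nd = d + c
            if nd < dist.get(w, float("inf")):
                dist[w] = nd       # relax edge (v, w)
                heapq.heappush(pq, (nd, w))
    return dist
```

Allowing duplicate heap entries and skipping stale ones is a common substitute for the decrease_key operation, which heapq does not provide.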
Bellman-Ford(G=(V,E), s, t):
d[s] = 0; d[v] = infinity for v != s
prev[s] = None
for i = 1 to |V|-1:
for each edge (u, v) with weight w in E:
if d[u] + w < d[v]:
d[v] = d[u] + w
prev[v] = u
return d, prev
# First tuple means (state, path_to_state, cost_of_path)
worklist.push((startState, [], 0),
0 + heuristic(startState, problem))
while not worklist.isEmpty():
    state, actions, pathCost = worklist.pop()
    if state in visitedStates:
        continue
    visitedStates.add(state)
    if problem.isGoalState(state):
        return actions
    successors = problem.getSuccessors(state)
    for stepInfo in successors:
        sucState, action, stepCost = stepInfo
        sucPathCost = pathCost + stepCost
        worklist.push((sucState, actions + [action], sucPathCost),
                      sucPathCost + heuristic(sucState, problem))
Hamiltonian path In the mathematical field of graph theory, a Hamil-
tonian path (or traceable path) is a path in an undirected or directed
graph that visits each vertex exactly once. A Hamiltonian cycle (or
Hamiltonian circuit) is a Hamiltonian path that is a cycle. Determining
whether such paths and cycles exist in graphs is the Hamiltonian path
problem, which is NP-complete.
Dynamic Programming algorithm for finding a Hamiltonian path:
Bellman, Held, and Karp proposed an algorithm to find Hamiltonian
path in time O(n2 2n ). In this method, one determines, for each set S
of vertices and each vertex v in S, whether there is a path that covers
exactly the vertices in S and ends at v. For each choice of S and v, a
path exists for (S, v) if and only if v has a neighbor w such that a path
exists for (S − v, w), which can be looked up from already-computed
information in the dynamic program7 .
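A sketch of this DP in Python, representing each vertex set S as a bitmask (the 0..n-1 vertex labeling is my own convention):

```python
def has_hamiltonian_path(adj):
    """Bellman-Held-Karp bitmask DP, O(n^2 * 2^n).
    adj: dict vertex -> iterable of neighbors, vertices labeled 0..n-1."""
    n = len(adj)
    # dp[S][v]: is there a path covering exactly the set S, ending at v?
    dp = [[False] * n for _ in range(1 << n)]
    for v in range(n):
        dp[1 << v][v] = True           # single-vertex paths
    for S in range(1 << n):
        for v in range(n):
            if not dp[S][v]:
                continue
            for w in adj[v]:           # extend the path by one neighbor
                if not (S >> w) & 1:
                    dp[S | (1 << w)][w] = True
    full = (1 << n) - 1
    return any(dp[full][v] for v in range(n))
```

Iterating S in increasing order is safe because extending a path only ever sets entries for strictly larger masks.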
Prim’s Algorithm Start with a root node s and try to greedily grow
a tree from s outward. At each step, we simply add the node that can
be attached as cheaply as possible to the partial tree we already have.
If we use adjacency list to represent the graph, and use Fibonacci heap
to retrieve minimum cost edge, then the run time is O(|E| + |V |log|V |).
2.2.5 Max-flow Min-cut
Given a directed weighted graph with one source node s and one sink
node t, we can find a partition of the nodes into A, B such that the sum of
the costs of edges that go from component A to B is minimum compared to
all other possible partitions (min-cut). This sum equals the maximum
flow value from s to t9 .
To find Max-flow, we can use the Ford-Fulkerson Algorithm. Pseudo-
code is as follows.
augment(f , P ):
b = bottleneck(f , P )
for e = (u, v) in P :
if e is forward edge, then
increase f (e) in G by b
else e is a backward edge
decrease f (e) in G by b
return f
the final problem from optimal solutions to subproblems10 .
to be True.
We can think of a cell being True as saying that there is an interleaving of
i characters of x′ and j characters of y′ that makes up the first i + j
characters of string s.
If we have filled out this table, we can return True if there are some i and j
such that i is a multiple of |x| AND j is a multiple of |y| AND i + j = l AND
10 Cited from CMU class lecture note: https://www.cs.cmu.edu/~avrim/451f09/lectures/lect1001.pdf
11 This problem is from Algorithm Design, by Kleinberg and Tardos, p. 329, problem 19.
Opt[i, j] = True. This is precisely saying that we return True if s can
be composed by interleaving some repetition of x and y.
So first, our job now is to fill up this table. We can traverse j from 1
to l (traverse each row). And inside each iteration, we traverse i from 1
to l. Inside each iteration of this nested loop, we set the value of Opt[i,
j] according to the rule described above.
Then, we can come up with i and j by fixing i and incrementing j by
|y| until i + j > l (*). Then, we increment i by |x| and repeat the
previous step (*), stopping when i + j > l. We check whether i + j = l inside
each iteration, and if so, we check whether Opt[i,j] = True. If yes, we return
True. If we never return True, we return False at the end.
where 1(ai ≠ bj) is the indicator function that equals 1 if ai ≠ bj , and
leva,b (i, j) is the distance between the first i characters of a and the first
j characters of b.
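The recurrence fills a (|a|+1) × (|b|+1) table bottom-up; here is a minimal Python sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard DP table lev."""
    m, n = len(a), len(b)
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        lev[i][0] = i                 # delete all of a[:i]
    for j in range(n + 1):
        lev[0][j] = j                 # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1   # indicator 1(a_i != b_j)
            lev[i][j] = min(lev[i - 1][j] + 1,       # deletion
                            lev[i][j - 1] + 1,       # insertion
                            lev[i - 1][j - 1] + sub) # substitution / match
    return lev[m][n]
```

Each cell depends only on its three neighbors above and to the left, so the whole table fills in O(mn) time.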
operations. The first parenthesization is obviously more preferable.
Given n matrices, the total number of ways to parenthesize them is
P(n) = Ω(4^n / n^(3/2)), so brute force is impractical12 .
We use dynamic programming. First, we characterize the structure
of an optimal solution. We claim that one possible structure is the
following:
((A1:i )(Ai+1:n )) (1)
where Ai:j means matrix multiplication of Ai Ai+1 · · · Aj . In order for
the above to be optimal, the parenthesization for A1:i and Ai+1:n must
also be optimal. Therefore, we can recursively break down the problem,
till we only have one matrix. A subproblem is of the form Ai:j , with 1 ≤
i ≤ j ≤ n, which means there are O(n^2) unique subproblems (counting).
Let Opt[i, j] be the cost of computing Ai:j . If the final multiplication
of Ai:j is Ai:j = Ai:k Ak+1:j , assuming that Ai:k is pi−1 × pk , and Ak+1:j
is pk × pj , then for i < j,
Opt[i, j] = min over i ≤ k < j of ( Opt[i, k] + Opt[k+1, j] + pi−1 pk pj )
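A bottom-up sketch of this recurrence, where p is the dimension list, so Ai is p[i−1] × p[i] (the encoding is the usual one, assumed here):

```python
def matrix_chain_cost(p):
    """Minimum scalar multiplications to compute A1 A2 ... An,
    where Ai has dimensions p[i-1] x p[i]."""
    n = len(p) - 1
    opt = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # chain length of the subproblem
        for i in range(1, n - length + 2):
            j = i + length - 1
            # try every split point k of the final multiplication
            opt[i][j] = min(opt[i][k] + opt[k+1][j] + p[i-1] * p[k] * p[j]
                            for k in range(i, j))
    return opt[1][n]
```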
once. You can store these solutions in an array or hash table.
This view of Dynamic Programming is often called memoizing.
For example, the longest common subsequence (LCS) problem can be
solved with this top-down approach. Here is the pseudo-code, from the
CMU lecture note.
LCS(S,n,T,m)
{
if (n==0 || m==0)
return 0;
if (arr[n][m] != unknown)
return arr[n][m]; // memoization (use)
if (S[n] == T[m])
result = 1 + LCS(S,n-1,T,m-1);
else
result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
arr[n][m] = result; // memoization (store)
return result;
}
If we compare the above code with the bottom-up formula for LCS, we
realize that they use the same algorithm, with the same cases. The
idea both approaches share is that we only compute the value for a
particular subproblem once.
Binary Search This search algorithm runs in O(log n) time. It works
by comparing the target element with the middle element of the array,
and narrowing the search to one half of the array, until the middle
element is exactly the target element, or until the remaining range is
empty. Binary search is naturally a divide-and-conquer algorithm.
Binary-Search-Iterative(arr, target):
    lo = 0
    hi = arr.length
    while lo < hi:
        mid = lo + (hi-lo)/2
        if arr[mid] == target:
            return mid
        else if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return NOT FOUND
2. Split the set of points into two equal-sized subsets by a vertical
line x = xsplit .
3. Solve the problem recursively in the left and right subsets. This
yields the left-side and right-side minimum distances dLmin and
dRmin , respectively.
4. Find the minimal distance dLRmin among the set of pairs of points
in which one point lies on the left of the dividing vertical and the
other point lies to the right.
5. The final answer is the minimum among dLmin , dRmin , and dLRmin .
The recurrence of this algorithm is T (n) = 2T (n/2) + O(n), where O(n)
is the time needed for step 4. This recurrence solves to O(n log n). Why
can step 4 be completed in linear time? How? Suppose from step 3, we
know the current minimum distance is δ = min(dLmin , dRmin ). For step 4,
we first pick the points with x-coordinates within [xsplit − δ, xsplit + δ],
and call this the boundary zone. Suppose we have p1 , · · · , pm inside the
boundary zone, sorted by y-coordinate. Then, we have the following
magical theorem.
Theorem 2.11. If dist(pi , pj ) < δ, then j − i ≤ 15.
With this, we can write the pseudo-code for this algorithm14 :
Closest-Pair(P):
    if |P| <= 3:
        return minimum pairwise distance in P (brute force)
    L, R = SplitPointsByHalf(P)
    dL = Closest-Pair(L)
    dR = Closest-Pair(R)
    d = min(dL, dR)
    S = BoundaryZonePoints(L, R, d)   -- sorted by y-coordinate
    for i = 1, ..., |S|:
        for j = i+1, ..., min(i+15, |S|):
            d = min(dist(S[i], S[j]), d)
    return d
14 ~ckingsf/bioinfo-lectures/closepoints.pdf
algorithms and understood them, but I will save my time and not discuss
them here.
2.4.2 Backtracking
Backtracking is a general algorithm for finding all (or some) solutions to
some computational problems, notably constraint satisfaction problems,
that incrementally builds candidates to the solutions, and abandons each
partial candidate c (”backtracks”) as soon as it determines that c cannot
possibly be completed to a valid solution15 .
Relation between backtracking and DFS: DFS is a specific form of back-
tracking related to searching tree structures. Backtracking is broader
– it can be used for searching in non-tree structures, such as the Sudoku
board.
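As a concrete instance of abandoning partial candidates, here is a small N-queens counter; this is a standard textbook example, not taken from the text above.

```python
def n_queens(n):
    """Count placements of n non-attacking queens, pruning each partial
    placement as soon as it conflicts."""
    count = 0
    def place(row, cols, diag1, diag2):
        nonlocal count
        if row == n:
            count += 1
            return
        for c in range(n):
            if c in cols or (row - c) in diag1 or (row + c) in diag2:
                continue  # prune: this partial candidate cannot be completed
            # choose column c for this row, then recurse on the next row
            place(row + 1, cols | {c}, diag1 | {row - c}, diag2 | {row + c})
    place(0, set(), set(), set())
    return count
```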
CountCall(processes):
    Heap h = a max-heap of processes keyed by start time (latest start first).
    count_calls = 0
    While h is not empty:
17 This problem is from Kleinberg Algorithm Design, pp.194 14.
p = h.RemoveMax()
call_time = start time of p
count_calls += 1
For each process q in h:
If q is running at call_time:
h.Remove(q)
Return count_calls
rithm produces optimal result for 0 <= j <= k. Thus, the result
produced by the algorithm for n = k + 1 matches the optimal in
case (b), which is to NOT call status check any more.
Conclusion From the above proof of base case and induction
step, by Strong Induction, we have shown that our algorithm
works for integer n >= 0.
Indeed, induction is how you formally prove that a greedy rule works
correctly.
Justification for run time: The above algorithm is efficient. We first
construct a heap of processes, which takes O(n log n) time. Then we loop
until we have removed all items from the heap, which takes O(n log n)
time. Since we do not add any process back into the heap after removing
it, the algorithm terminates when the heap is empty. Besides, all other
operations in the algorithm are O(1). Therefore, combined, our algorithm
has an efficient runtime of O(n log n) + O(n log n) = O(n log n).
2.6 Sorting
2.6.1 Merge sort
Merge sort is a divide-and-conquer, stable sorting algorithm. Worst-case
time O(n log n); worst-case space O(n). Here is pseudo-code for non-in-place
merge sort. An in-place merge sort is possible.
Mergesort(arr):
    if arr.length <= 1:
        return arr
    mid = arr.length / 2
    l = Mergesort(arr[0:mid])
    r = Mergesort(arr[mid:length])
    return merge(l, r)
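The pseudo-code above leaves merge implicit; a complete sketch in Python, where the <= comparison is what keeps the sort stable:

```python
def mergesort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left, right = mergesort(arr[:mid]), mergesort(arr[mid:])
    # merge: repeatedly take the smaller head; <= prefers the left run,
    # which preserves the relative order of equal elements (stability)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```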
2.6.2 Quicksort
Quicksort is a divide-and-conquer, unstable sorting algorithm. Average
run time O(n log n); worst-case run time O(n^2); worst-case auxiliary
space18 O(log n) with a good implementation. (A naive implementation is
still O(n) space.) Quicksort is fast if all comparisons are done with
constant-time memory access (assumption). People have argued which
sort is the best. Here is an answer from Stackoverflow, by user11318:
18 Auxiliary Space is the extra space or temporary space used by an algorithm.
From GeeksForGeeks.
... However if your data structure is big enough to live on disk,
then quicksort gets killed by the fact that your average disk does
something like 200 random seeks per second. But that same disk
has no trouble reading or writing megabytes per second of data
sequentially. Which is exactly what merge sort does.
Therefore if data has to be sorted on disk, you really,
really want to use some variation on merge sort. (Gen-
erally you quicksort sublists, then start merging them together
above some size threshold.) ...
Here is the pseudo-code:
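A minimal in-place version in Python (Lomuto partition with the last element as pivot, an assumed choice; note this naive recursion does not achieve the O(log n) auxiliary-space bound mentioned above):

```python
def quicksort(arr, lo=0, hi=None):
    """In-place quicksort using the Lomuto partition scheme."""
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        pivot = arr[hi]
        p = lo
        for i in range(lo, hi):
            if arr[i] < pivot:
                # move elements smaller than the pivot to the front
                arr[i], arr[p] = arr[p], arr[i]
                p += 1
        arr[p], arr[hi] = arr[hi], arr[p]   # pivot lands in its final slot
        quicksort(arr, lo, p - 1)
        quicksort(arr, p + 1, hi)
    return arr
```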
2.6.4 Radix sort
Radix sort is a non-comparison sort, where the array must only contain
elements that are integers. Suppose the array arr has size n, each
value arri ∈ {1, · · · , k} with k = n^{O(1)}, and each arri is written in
base b. Then radix sort can complete this sorting task in time O(cn),
where c = log_n k.
More specifically, radix sort sorts the array of integers digit by
digit, using counting sort for each digit.
Counting sort works when the given array is of integers, each
ranging from p to q; it creates an array, say M , of q − p + 1 elements,
counts the number of occurrences of each element in the given array,
records these counts in M , and then produces the sorted order
by traversing M and repeating each element19 according to its recorded
count. The run time of this sort is O(n + (q − p)).
Suppose the numbers in arr are written in base b. The time needed to
sort by one digit is then O(n + b) using counting sort, and the number
of digits of arri is at most d = log_b k. Thus, the run time of radix sort
is d · O(n + b) = O(log_b k · (n + b)); choosing b = n gives O(cn) with
c = log_n k.
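The digit-by-digit scheme can be sketched as follows: a stable counting sort on each base-b digit, with base 10 chosen for illustration.

```python
def counting_sort_by_digit(arr, exp, base):
    """Stable counting sort of arr on the digit (x // exp) % base."""
    counts = [0] * base
    for x in arr:
        counts[(x // exp) % base] += 1
    # prefix sums give each digit value's starting position in the output
    starts = [0] * base
    for d in range(1, base):
        starts[d] = starts[d-1] + counts[d-1]
    out = [0] * len(arr)
    for x in arr:                 # traversing in order keeps the sort stable
        d = (x // exp) % base
        out[starts[d]] = x
        starts[d] += 1
    return out

def radix_sort(arr, base=10):
    """Sort non-negative integers, least significant digit first."""
    if not arr:
        return arr
    exp = 1
    while max(arr) // exp > 0:
        arr = counting_sort_by_digit(arr, exp, base)
        exp *= base
    return arr
```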
2.7 Searching
Graph search algorithms have been discussed already. Binary search
has been discussed in the divide-and-conquer section. We will only look
at quickselect here. Searching is a topic where interviewers like to ask
questions, for example, searching for an element in a sorted and rotated
array20 .
2.7.1 Quickselect
Quickselect is a method to select the element with rank k in an unsorted
array. Here is the pseudocode:
Quickselect(arr, k, lo, hi):
    if hi <= lo:
        return arr[lo]
    pivot = ChoosePivot(arr, lo, hi)
    p = Partition(arr, lo, hi, pivot)   -- p: final (absolute) index of the pivot
    if p == k:
        return arr[k]
    else if p > k:
        return Quickselect(arr, k, lo, p - 1)
    else:
        return Quickselect(arr, k, p + 1, hi)
2.8 String
2.8.1 Regular expressions
Regular expression needs no introduction. In interviews, the interviewer
may ask you to implement a regular expression matcher for a subset of
regular expression symbols. Similar problems could be asking you to
implement a program that can recognize a particular string pattern.
Here is a regex matcher written by Brian Kernighan and Rob Pike
in their book The Practice of Programming 21 .
/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp+2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp+1, text+1);
    return 0;
}
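The same matcher ports almost line for line to Python; match adds the outer loop that slides matchhere along the text. This port is my sketch, not from the book.

```python
def match(regexp, text):
    """Match c, '.', '*', '^', '$' anywhere in text."""
    if regexp.startswith('^'):
        return match_here(regexp[1:], text)
    while True:                       # try every start position, even empty text
        if match_here(regexp, text):
            return True
        if not text:
            return False
        text = text[1:]

def match_here(regexp, text):
    """Match regexp at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == '*':
        return match_star(regexp[0], regexp[2:], text)
    if regexp == '$':
        return text == ''
    if text and (regexp[0] == '.' or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False

def match_star(c, regexp, text):
    """Match zero or more occurrences of c, then the rest of regexp."""
    while True:
        if match_here(regexp, text):
            return True
        if not text or (text[0] != c and c != '.'):
            return False
        text = text[1:]
```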
Building prefix table (π table) The first thing that KMP does is to
preprocess the pattern P and create a π table. π[i] is the largest index
smaller than i such that P0 · · · Pπ[i] is a suffix of P0 · · · Pi . Consider the
following example:
i     0  1  2  3  4  5  6  7
Pi    a  b  a  b  c  a  b  a
π[i] -1 -1  0  1 -1  0  1  2
When we are filling π[i], we focus on the substring P0 · · · Pi , and see if
there is a prefix equal to the suffix in that substring. π[0], π[1], π[4] are
−1, meaning that there is no prefix equal to suffix in the corresponding
substring. For example, for π[4], the substring of concern is ababc, and
there is no valid index value for π[4] to be set. π[7] = 2, because the
substring P0 · · · P7 is ababcaba, and the prefix P0 · · · P2 , aba, is a suffix
of that substring.
Below is a pseudo-code for constructing a π table. The idea behind
the pseudo-code is captured by two observations:
1. If P0 · · · Pπ[i] is a suffix of P0 · · · Pi , then P0 · · · Pπ[i]−1 is a suffix
of P0 · · · Pi−1 as well.
2. If P0 · · · Pπ[i] is a suffix of P0 · · · Pi , then so is P0 · · · Pπ[π[i]] , and
so is P0 · · · Pπ[π[π[i]]] , etc.; a recursion of the π values.
So we can use two pointers i, j, and we are always checking whether the
prefix P0 · · · Pj−1 is a suffix of the substring P0 · · · Pi−1 . Pointer i moves
faster than pointer j: i moves up by 1 every time we finish a comparison
between Pi and Pj , and j moves up by 1 when Pi = Pj (observation 1).
At that point (Pi = Pj ), we set π[i] = j. If Pi ≠ Pj , we move j back
to π[j − 1] + 1 (+1 because π[i] is -1 when there is no matching prefix).
This guarantees that the prefix P0 · · · Pj−1 is the longest suffix of the
substring P0 · · · Pi−1 . We initialize π[−1] = −1 and π[0] = −1.
Construct-π-Table(P):
    j = 0, i = 1
    while i < |P|:
        if Pi = Pj:
            π[i] = j
            i += 1, j += 1
        else:
            if j > 0:
                j = max(0, π[j-1]+1)
            else:
                π[i] = -1
                i += 1
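A direct Python transcription of this pseudo-code, using a list whose entries default to -1 (so the separate π[−1] entry is not needed):

```python
def prefix_table(P):
    """pi[i] = largest index such that P[0..pi[i]] is a proper suffix
    of P[0..i]; -1 if no such prefix exists."""
    pi = [-1] * len(P)
    j, i = 0, 1
    while i < len(P):
        if P[i] == P[j]:
            pi[i] = j
            i += 1
            j += 1
        elif j > 0:
            j = pi[j-1] + 1   # fall back to the next-longest border
        else:
            pi[i] = -1        # no prefix of P is a suffix of P[0..i]
            i += 1
    return pi
```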
be the following, as an example. (P is the same as the above example.)
W = abccababababcaababcaba, P = ababcaba
Based on the way we construct the π table above, we have the following
rules when doing the matching. Assume the matched substring (i.e. the
substring of P before the first mismatch), starting at W [k], has length d.
1. If d = |P |, we found a match; return k.
2. Else, if d > 0 and π[d − 1] = −1, then the next comparison starts
at W [k + d].
3. Else, if d > 0 and π[d − 1] ≠ −1, then the next comparison starts
at W [k + d − π[d − 1] − 1]. Note: we don't need the −1 here if the π
table is 1-based. See Stanford slides.
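The three rules fold into the standard KMP scan: keep a count j of currently matched characters and, on a mismatch, fall back to π[j − 1] + 1 instead of restarting, which is exactly the shift by d − π[d − 1] − 1 above. A self-contained sketch (the table is rebuilt inline so the function stands alone):

```python
def kmp_search(W, P):
    """Return the index of the first occurrence of P in W, or -1."""
    # build the π table, as in the pseudo-code above
    pi = [-1] * len(P)
    j = 0
    for i in range(1, len(P)):
        while j > 0 and P[i] != P[j]:
            j = pi[j-1] + 1
        if P[i] == P[j]:
            pi[i] = j
            j += 1
    # scan W; j counts pattern characters currently matched
    j = 0
    for i, c in enumerate(W):
        while j > 0 and c != P[j]:
            j = pi[j-1] + 1          # rule 3: shift without rescanning W
        if c == P[j]:
            j += 1
        if j == len(P):
            return i - len(P) + 1    # rule 1: full match found
    return -1
```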
2.8.4 Permutation
String permutation is another topic that interviewers like to ask about.
One typically generates permutations by depth-first search (i.e. a form
of backtracking); we can imagine that all possible permutations are the
leaves of a tree, and the paths from the root to them represent the
characters chosen.
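That tree view can be sketched directly: each recursive call chooses the next character, and each leaf is one complete permutation. (This sketch assumes distinct characters; with repeats it produces duplicates.)

```python
def permutations(chars):
    """All permutations of a string of distinct characters, via DFS."""
    if not chars:
        return [""]          # one leaf: the empty permutation
    result = []
    for i, c in enumerate(chars):
        # choose c as the next character, then permute the rest
        for rest in permutations(chars[:i] + chars[i+1:]):
            result.append(c + rest)
    return result
```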
2.9 Caching
In general a cache is a location to store a small amount of data for more
convenient access. It is everywhere. Here are some common examples,
described at a high level.
CPU cache is used by the CPU to quickly access data in the main
memory. Depending on the type of memory data, e.g. regular data,
instruction, virtual address (translation lookaside buffer used by MMU),
etc., there may be different types of caches, such as data cache and
instruction cache.
Cache server (web cache) basically saves web data (e.g. web pages,
requests) and serves it when the user requests the same thing again,
in order to reduce the amount of data transmitted over the web.
it will figure out the tag on that address (e.g. by dividing by the page
size), and check whether the mapped block in the cache has the same tag,
and whether the valid bit is set. If so, then the cached data is usable.
Fully-Associative cache is one that allows any memory page to be
cached in any cache block, opposite to direct mapped cache. The advan-
tage is that it avoids the possibly constantly empty entries in a direct
mapped cache, so that the cache miss rate is reduced. The drawback
is that such cache requires hardware sophistication to support parallel
look-up of tags, because in this case, the only way to identify if an ad-
dress is cached is to compare the tag on that address with all tags in the
cache (in parallel).
In CSE 351, we adopted the set-associative cache as the reason-
able compromise between complicated hardware and the direct mapped
cache. Here, we divide the cache into sets, where each set contains sev-
eral entries. This way, we can reduce the cache miss rate compared to
direct mapped cache, but also check the tags efficiently enough in par-
allel, because there are only a few entries in a set. We say a cache is
n-way, if in each set there are n cache blocks.
For the sake of interviews, we will discuss LRU cache and LFU cache
(in software). Both of them fall under the topic of cache replacement
policies, which means that when a cache is full, how do we evict cached
data. There are numerous policies, including FIFO, LIFO, LRU (Least
Recently Used), LFU (Least Frequently Used), etc. In hardware, these
caches are usually implemented by manipulating some bits (e.g. LRU
counter for each cache block) in the block to keep track of some property
such as age. Concepts are similar.
by removing it (constant time for doubly linked list) then prepending it
to the list.
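A compact LRU sketch: Python's OrderedDict already combines the hash map with the linked-list ordering described above (returning None on a miss is my convention):

```python
from collections import OrderedDict

class LRUCache:
    """Hash map + recency ordering; OrderedDict plays the role of the
    doubly linked list (move_to_end and popitem are O(1))."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)        # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```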
where VK (s) means the utility function of agent K for a state s. The
parameter sK is the state in which agent K takes control – agent K makes
the decision of how to change from sK to some successor state. So, agent
A tries to maximize the utility, and Z tries to minimize it.
23 From Wikipedia
24 Modified from CSE 473 lecture slides, 2016 spring, by Prof. L. Zettlemoyer.
Alpha-beta is a pruning method for Minimax tree. The output of
Alpha-beta is the same as the output of Minimax. The α value represents
the assured maximum score that the agent A can get, and the β value
is the assured minimum score that agent Z can get.
Below is my Python implementation of alpha-beta, written for the
pacman assignment. Only the agent with index 0 is a maximizing agent.
score = -float("inf")
if agentIndex != 0:
    score = -score        # minimizing agents start from +infinity
nextAgent = agentIndex + 1
if nextAgent >= gameState.getNumAgents():
    nextAgent = 0
    depth -= 1
• A set of states s ∈ S,
• A set of actions a ∈ A,
• A transition function T (s, a, s′) giving the probability of transitioning
from s to s′ with action a,
• A reward function R(s, a, s′),
• A start state,
• (maybe) a terminal state.
The world for MDP is usually a grid world, where some grids have pos-
itive reward, and some have negative reward. The goal of solving an
MDP is to figure out an optimal policy π ∗ (s), the optimal action for
state s, so that the agent can take actions according to the policy in or-
der to gain the highest amount of reward possible. There are two ways to
solve MDP discussed in the undergraduate level AI class: value (utility)
iteration and policy iteration. I will just put some formulas here. For
most of the time, understanding them is straightforward and sufficient.
Definition 2.4. The utility of a state is V ∗ (s), the expected utility
starting in s and acting optimally.
Definition 2.5. The utility of a q-state25 is Q∗ (s, a), the expected util-
ity starting by taking action a in state s, and act optimally afterwards.
Using the above definition, we have the following recursive definition
of utilities. The γ value is a discount, from 0 to 1, which can let the
model prefer sooner reward, and help the algorithm converge. Note max
is different from argmax.
V*(s) = max_a Q*(s, a)
Q*(s, a) = Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V*(s′) ]
V*(s) = max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V*(s′) ]
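Iterating the V* recursion to a fixed point is exactly value iteration. A small sketch, with T and R encoded as dictionaries keyed by (s, a) and (s, a, s′) respectively (this encoding is my assumption):

```python
def value_iteration(states, actions, T, R, gamma, iters=100):
    """actions[s]: legal actions in s; T[(s, a)]: dict s' -> probability;
    R[(s, a, s')]: reward. Returns the converged V* estimates."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman update: V(s) = max_a sum_s' T [ R + gamma * V(s') ]
        V = {s: max(sum(p * (R[(s, a, s2)] + gamma * V[s2])
                        for s2, p in T[(s, a)].items())
                    for a in actions[s])
             for s in states}
    return V
```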
Policy Iteration: Start with an arbitrary policy π0 , then iteratively
evaluate and improve the current policy until policy converges. This is
better than value iteration in that policy usually converges long before
value converges, and the run time for value iteration is not desirable.
V^{πi}_{k+1}(s) ← Σ_{s′} T(s, πi(s), s′) [ R(s, πi(s), s′) + γ V^{πi}_k(s′) ]
π_{i+1}(s) = argmax_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V^{πi}(s′) ]
This is called the forward algorithm. From this, we can derive the formula
for the belief at the next time frame, given current evidence:
B′(X_{t+1}) = P(X_{t+1} | e_{1:t}) = Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
            = Σ_{x_t} P(X_{t+1} | x_t) B(x_t)
P(A | B) = P(B | A) P(A) / P(B)
(Figure: a small Bayes net with nodes A, B, D, E.)
In general,
P(x_1, x_2, · · · , x_n) = Π_{i=1}^{n} P(x_i | parents(X_i))
2.11 Computability
This section is about some fundamental theory of computer science. I
am not sure if any interview will ask questions directly related to them,
but knowing them definitely helps.
2.11.1 Countability
A set S is countable if and only if the elements of S can be mapped onto
N. The union of countably many countable sets is countable (given the
axiom of choice).
D(x):
    if x % 2 == 0:
        while(1)
    else:
        return H()  # H is an arbitrary function
3. As you can see, the output of this program is either "loop forever"
or whatever H returns.
4. If A(CODE(D), 1, 2) returns true, then D(1) ≠ D(2). So D(1)
halts, which means H halts.
5. If A(CODE(D), 1, 2) returns false, then D(1) = D(2). So D(1)
does not halt. (Basically, the only way the two outputs can satisfy
D(1) = D(2) is if both went into an infinite loop, which does not
return a value; abstractly, this "infinite loop" is the returned value.)
6. Suppose H is a program that actually plays the same role as A but
for the halting set, a set defined like this:
2.11.4 P-NP
The following definitions and theorems are provided by Algorithm De-
sign, the book by Kleinberg et al., and the CSE 332 lecture slides on
P-NP, by Adam Blank.
Definition 2.6. A complexity class is a set of problems limited by some
resource constraint (e.g. time, space).
Definition 2.7. A decision problem is a set of strings (a language
L ⊆ Σ∗). An algorithm (from Σ∗ to boolean) solves a decision problem
when it outputs True if and only if the input is in the set.
Definition 2.8. P is the set of decision problems with a polynomial-
time (in terms of the input) algorithm.
Definition 2.9. NP (1) NP is the set of decision problems with a non-
deterministic polynomial time algorithm.
Definition 2.10. A certifier for problem X is an algorithm that takes
as input (1) a string s which is an instance of X, and (2) a string w
which is a "certificate" or "witness" that s ∈ X. It must hold that
s ∈ X if and only if there exists some w for which the certifier returns
True; if s ∉ X, the certifier returns False regardless of w.
Definition 2.11. NP (2) NP is the set of decision problems with a
polynomial-time certifier.
Definition 2.12. Suppose X, Y are two different problems. We say X
is at least as hard as Y if, given a "black box" capable of solving X, we
can solve Y using a polynomial number of operations plus a polynomial
number of calls to the black box for X. This is written Y ≤P X.
In other words, X is powerful enough for us to solve Y .
Theorem 2.12. Suppose Y ≤P X. If X can be solved in polynomial
time, then Y can be as well.
Theorem 2.13. Suppose Y ≤P X. If Y cannot be solved in polynomial
time, then X cannot either.
Theorem 2.14. P ⊆ N P .
Definition 2.13. Problem X is NP-hard if for all Y ∈ N P , Y ≤P X.
Definition 2.14. Problem X is NP-Complete if and only if X ∈ N P
and for all Y ∈ N P , Y ≤P X (X is NP-hard).
Theorem 2.15. Suppose X is an NP-Complete problem. Then X is
solvable in polynomial time if and only if P = N P .
Example P-NP reduction. Multiple Interval Scheduling Problem (Al-
gorithm Design, pp.512 14): you’re given a set of n jobs, each specified
by a set of time intervals, and you want to answer the following question:
For a given number k, is it possible to accept at least k of the jobs so
that no two of the accepted jobs have any overlap in time?
Multiple Interval Scheduling Problem is NP-hard Now we show
that we can reduce an instance of the Independent Set Problem (ISP)
to the Multiple Interval Scheduling Problem (MISP). Suppose we are
given an instance of ISP: a graph G = (V, E), and a number k. We can
create an instance of MISP in polynomial time: first, for each node
vi ∈ V , we create a job li . For each edge e ∈ E with end nodes vi and
vj , we create an interval such that jobs li and lj both require that
interval. Then, we use the same number k in MISP. Now we claim that
k non-overlapping jobs can be scheduled if and only if the corresponding
graph has an independent set of size k. Suppose we have k non-overlapping
jobs scheduled. If two jobs overlap, they must both require some identical
interval; therefore, the k nodes corresponding to the k jobs have no edges
connecting them, which means they form an independent set of size k.
Conversely, suppose we have an independent set of size k. Because of the
way we constructed the MISP instance, the corresponding k jobs have no
overlapping intervals. This proves the claim.
N = 01101
N 0 = 10110
because the rightmost 1 in N was dropped and circulated onto the
leftmost bit of N 0 . Left rotate works in a similar way. Sometimes the
circulated bit is also stored in a carry flag (another single bit), which
is called rotate through carry.
x = x ^ y
y = x ^ y
x = x ^ y
(A ⊕ B) ⊕ A = (A ⊕ A) ⊕ B = 0 ⊕ B = B
However, the compiler cannot tell what you are doing with these lines
of code. So sometimes it may be better to just leave the optimization
work to the compiler.
Find Odd in Evens Given a set of numbers where all elements occur
an even number of times, except for one number, find the number that
occurs an odd number of times.
Because of XOR's properties, namely commutativity, associativity,
and self-cancellation (a ⊕ a = 0), if we XOR all numbers in the given
set, the result is exactly the odd-occurring number!
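That observation is a one-liner:

```python
from functools import reduce

def find_odd(nums):
    """XOR everything together; pairs cancel because a ^ a = 0 and
    XOR is commutative and associative."""
    return reduce(lambda acc, x: acc ^ x, nums, 0)
```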
Max Without Comparison With bit operations, we can implement
the max operation between two integers a, b without comparisons.
/*
* sign - return 1 if positive, 0 if zero, and -1 if negative
* Examples: sign(130) = 1
* sign(-23) = -1
*/
int sign(int x) {
// If x is negative, x >> 31 gives all 1s (arithmetic shift),
// which is essentially -1
int neg = x >> 31;
// need to make x ^ 0 only 0 or 1
int pos_or_zero = !(x >> 31) & !!(x ^ 0);
return neg + pos_or_zero;
}
Another way to compute sign:
int flip(int bits):
return 1 ^ bits
2.13 Math
2.13.1 GCDs and Modulo
Theorem 2.16. Euclid's Algorithm To compute gcd(n1 , n2 ), produce
a new pair of numbers that consists of min(n1 , n2 ) and the difference
|n1 − n2 |; keep doing this until the numbers in the pair are the same.
That common value is gcd(n1 , n2 ).
My implementation of gcd:
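Two sketches: the subtraction form follows Theorem 2.16 verbatim (and assumes positive inputs), while the modulo form collapses runs of subtractions into one step.

```python
def gcd_subtract(a, b):
    """Theorem 2.16 verbatim; assumes a, b > 0."""
    while a != b:
        a, b = min(a, b), abs(a - b)
    return a

def gcd_mod(a, b):
    """Standard modulo form: a % b performs many subtractions at once."""
    while b:
        a, b = b, a % b
    return a
```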
(a mod m) + (b mod m) ≡ a + b (mod m)
Modulo can be used to obtain the single digit of a number. My Python
code for solving a Google online assessment problem is as follows.
2.13.3 Palindromes
Although palindromic numbers are most often considered in the decimal
system, the concept of palindromicity can be applied to the natural
numbers in any numeral system. Formal definition of Palindromicity:
Definition 2.16. Consider a number n > 0 in base b ≥ 2, where it is
written in standard notation with k digits ai as:
n = Σ_{i=0}^{k−1} (ai b^i)
The number is palindromic if ai = a_{k−1−i} for all i.
vector<int> get_digits(int n)
{
    vector<int> result;
    do
    {
        result.push_back(n % 10);  // least significant digit first
        n /= 10;
    } while (n != 0);
    return result;
}
// base = 10
bool is_palindrome(int n)
{
vector<int> digits = get_digits(n);
int size = digits.size();
for (int i = 0; i < size / 2; i++)
{
if (digits.at(i) != digits.at(size-1-i))
{
return false;
}
}
return true;
}
C(n, k) = P(n, k) / P(k, k) = P(n, k) / k! = n! / ((n − k)! k!)
2.13.5 Series
The following description is from Wikipedia.
an = a1 + (n − 1)d
and in general
an = am + (n − m)d
A finite portion of an arithmetic progression is called a finite arithmetic
progression and sometimes just called an arithmetic progression. The
sum of a finite arithmetic progression is called an arithmetic series,
Sn = n(a1 + an )/2 = (n/2)[2a1 + (n − 1)d]
Geometric Series The n-th term of a geometric sequence with initial
value a and common ratio r is given by
an = ar^{n−1}
that is, an = r·a_{n−1} . The geometric series
1/(1 − x) = Σ_{n=0}^{∞} x^n = 1 + x + x^2 + x^3 + · · · ,
which is valid for |x| < 1, is one of the most important examples of a
power series, as is the exponential function formula
e^x = Σ_{n=0}^{∞} x^n /n! = 1 + x + x^2 /2! + x^3 /3! + · · · ,
valid for all real x.
These power series are also examples of Taylor series. The Taylor
series of a real or complex-valued function f (x) that is infinitely differ-
entiable at a real or complex number a is the power series
f(a) + (f′(a)/1!)(x − a) + (f″(a)/2!)(x − a)^2 + (f‴(a)/3!)(x − a)^3 + · · ·
which can be written in the more compact sigma notation as
Σ_{n=0}^{∞} (f^{(n)}(a)/n!) (x − a)^n
where n! denotes the factorial of n and f (n) (a) denotes the nth derivative
of f evaluated at the point a. The derivative of order zero of f is defined
to be f itself and (x − a)0 and 0! are both defined to be 1.
2.14 Concurrency
2.14.1 Threads & Processes
Typically, threads share the same address space, and processes have
independent and private address spaces. Below is the context switch
assembly code in xv6.
/* save current thread's stack pointer */
movl current_thread, %eax
movl %esp, (%eax)
/* load saved sp for next_thread */
movl next_thread, %eax
movl (%eax), %esp
/* restore next_thread's registers */
popal
/* pop return address from stack */
ret
2.14.2 Locks
Locks help us write concurrent code by preventing multiple processors
from modifying the same resource at the same time. A lock has two
operations, acquire and release. When one process acquires a lock,
other processes have to wait until the lock is released. Wherever code
accesses shared data, always lock26 .
One problem with locking is deadlock. It happens when two processes
acquire the locks for two independent resources, and wait for each other
to release their lock in order to grab the other lock and proceed. This
problem can be avoided by one simple rule: Always acquire the locks in
a predetermined order.
Lock implementation with atomic instruction:
void acquire(struct lock *l) {
for (;;) {
if (atomic_exchange(&l->locked, 1) == 0)
return;
}
}
void release(struct lock *l) {
l->locked = 0;
}
AF : R ⇒ A
RI : R ⇒ boolean
// Abstraction Function:
For a given RatTerm t, "coefficient of t" is synonymous with
t.coeff, and, likewise, "exponent of t" is synonymous with
t.expt. All RatTerms with a zero coefficient are represented
by the zero RatTerm, z, which has zero for its coefficient
AND exponent.
// Representation Invariant:
coeff != null
coeff.equals(RatNum.ZERO) ==> expt == 0
Python classes provide all the standard features of Object Ori-
ented Programming: the class inheritance mechanism allows mul-
tiple base classes, a derived class can override any methods of its
base class or classes, and a method can call the method of a base
class with the same name.
Python has several special methods that may be overridden ("customiz-
able").
class MyClass:
    def __init__(self):
        # The instantiation operation
    def __del__(self):
        # Called when the instance is about to be destroyed
    def __repr__(self):
        # Called by the repr() built-in function. Official string
        # representation.
    def __str__(self):
        # Called by the str() built-in function and print(). Informal
        # string representation.
    def __eq__(self, other):
    def __ne__(self, other):
    def __lt__(self, other):
    def __le__(self, other):
    def __gt__(self, other):
    def __ge__(self, other):
        # These are rich comparison methods, called for comparison
        # operators. Note: x == y being True does not imply that
        # x != y is False (no dependence), so define __ne__ as well.
    def __cmp__(self, other):
        # (Python 2) Called by comparison operations. Returns a negative
        # integer if self < other, zero if self == other, and a
        # positive integer if self > other.
    def __hash__(self):
        # Called by hash() and by operations on hashed collections
        # (e.g. dict and set keys).
The way inheritance works in Python is much like in Java. Subclasses can
override superclass methods and fields. In the overriding version, the sub-
class can call the superclass's method simply by referring to that method
as an attribute with a dot.
According to the documentation, Python supports class-private mem-
bers in a limited way, to avoid name clashes with subclasses' names, by
using name mangling. Any identifier of the form __spam (at least two
leading underscores, at most one trailing underscore) is textually replaced
with _classname__spam.
Bridge: Decouple an abstraction from its implementation allowing
the two to vary independently.
Composite: Compose objects into tree structures to represent part-
whole hierarchies. Composite lets clients treat individual objects and
compositions of objects uniformly.
Decorator: Attach additional responsibilities to an object dynami-
cally keeping the same interface. Decorators provide a flexible alterna-
tive to subclassing for extending functionality.
Flyweight: Use sharing to support large numbers of similar objects
efficiently.
Active Object: Decouples method execution from method invocation
for objects that each reside in their own thread of control. The goal is
to introduce concurrency, by using asynchronous method invocation and
a scheduler for handling requests.
Balking: Only execute an action on an object when the object is in
a particular state.
Double-checked locking: Reduce the overhead of acquiring a lock by
first testing the locking criterion (the ’lock hint’) in an unsafe manner;
only if that succeeds does the actual locking logic proceed. Can be
unsafe when implemented in some language/hardware combinations. It
can therefore sometimes be considered an anti-pattern.
Monitor object: An object whose methods are subject to mutual
exclusion, thus preventing multiple objects from erroneously trying to
use it at the same time.
Reactor: A reactor object provides an asynchronous interface to re-
sources that must be handled synchronously.
Scheduler: Explicitly control when threads may execute single-threaded
code.
Thread-specific storage: Static or ”global” memory local to a thread.
Lock: One thread puts a ”lock” on a resource, preventing other
threads from accessing or modifying it.
Read-write lock: Allows concurrent read access to an object, but
requires exclusive access for write operations.
Other Patterns There are other useful patterns, such as the MVC
pattern.
Model-view-controller (MVC): Model–view–controller (MVC) is a soft-
ware design pattern for implementing user interfaces on computers. The
model directly manages the data, logic, and rules of the application. A
view can be any output representation of information, such as a chart
or a diagram. Multiple views of the same information are possible, such
as a bar chart for management and a tabular view for accountants. The
third part, the controller, accepts input and converts it to commands
for the model or view.
Active Record: The active record pattern is an approach to accessing
data in a database. A database table or view is wrapped into a class.
Thus, an object instance is tied to a single row in the table. After cre-
ation of an object, a new row is added to the table upon save. Any
object loaded gets its information from the database. When an object is
updated, the corresponding row in the table is also updated. The wrapper
class implements accessor methods or properties for each column in
the table or view. Think about Rails.
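A toy active-record-style wrapper in Python over sqlite3 can make this concrete (the UserRecord class and its table are made up for illustration; a real ORM like Rails' ActiveRecord generates this machinery):

```python
import sqlite3

class UserRecord(object):
    # One shared in-memory database for the example.
    conn = sqlite3.connect(":memory:")

    @classmethod
    def setup(cls):
        cls.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def __init__(self, name, id=None):
        self.id = id
        self.name = name

    def save(self):
        # On first save a new row is added; later saves update the row.
        if self.id is None:
            cur = self.conn.execute("INSERT INTO users (name) VALUES (?)",
                                    (self.name,))
            self.id = cur.lastrowid
        else:
            self.conn.execute("UPDATE users SET name = ? WHERE id = ?",
                              (self.name, self.id))

    @classmethod
    def find(cls, id):
        row = cls.conn.execute("SELECT id, name FROM users WHERE id = ?",
                               (id,)).fetchone()
        return cls(row[1], id=row[0]) if row else None
```

Each object instance corresponds to one row, and its accessor (`name`) mirrors a column, which is the essence of the pattern.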
Data access object: a data access object (DAO) is an object that pro-
vides an abstract interface to some type of database or other persistence
mechanism. By mapping application calls to the persistence layer, the
DAO provides some specific data operations without exposing details of
the database.
2.15.4 Architecture
Software architecture is the fundamental structures of a software system,
the discipline of creating such structures, and the documentation of these
structures.28
2.15.5 Testing
Defects and Failures Software faults occur through the following
processes. A programmer makes an error (mistake), which results in a
defect (fault, bug) in the software source code. If this defect is executed,
in certain situations the system will produce wrong results, causing a
failure.
28 From Wikipedia.
Testing Methods There are several well-known testing methods, as
discussed below.
• Static vs. Dynamic testing: There are many approaches available
in software testing. Reviews, walkthroughs, or inspections are re-
ferred to as static testing, whereas actually executing programmed
code with a given set of test cases is referred to as dynamic testing.
• Black-box vs. white-box testing: Black-box testing treats the
software as a "black box", examining functionality without any
knowledge of the internal implementation and without seeing the
source code. The testers are only aware of what the software is
supposed to do, not how it does it. White-box testing (also known
as clear box testing, glass box testing, transparent box testing,
and structural testing) tests internal structures or workings of a
program, by seeing the source code, as opposed to the functionality
exposed to the end-user.
• System testing: System testing, or end-to-end testing, tests a com-
pletely integrated system to verify that the system meets its re-
quirements. For example, a system test might involve testing a
logon interface, then creating and editing an entry, plus sending
or printing results, followed by summary processing or deletion (or
archiving) of entries, then logoff.
3 Flagship Problems
It turns out that I do not have a lot of time to complete many problems
and record the solutions here. I will include the descriptions of several
important problems, and solve them after I print this out. I will update
the solutions eventually. Refer to 3.9 for these unsolved problems.
3.1 Arrays
Missing Ranges (Source. Leetcode 163) Given a sorted integer array
where the elements are in the inclusive range [lower, upper], return
its missing ranges.
For example, given [0, 1, 3, 50, 75], lower = 0 and upper =
99, return ["2", "4->49", "51->74", "76->99"].
My code:
class Solution(object):
    def __getRange(self, a, b):
        # Returns the string for the missing range strictly between
        # a and b, i.e. [a+1, b-1], or None if that range is empty.
        if b - a > 1:
            if b - a == 2:
                return str(a+1)
            else:
                return str(a+1) + "->" + str(b-1)
        else:
            return None

    def findMissingRanges(self, nums, lower, upper):
        result = []
        if len(nums) == 0:
            rg = self.__getRange(lower-1, upper+1)
            if rg is not None:
                result.append(rg)
            return result
        for i, n in enumerate(nums):
            rg = None
            if i == 0:
                # First number
                rg = self.__getRange(lower-1, nums[0])
            else:
                rg = self.__getRange(nums[i-1], nums[i])
            if rg is not None:
                result.append(rg)
        # Last number
        rg = self.__getRange(nums[-1], upper+1)
        if rg is not None:
            result.append(rg)
        return result
Merge Intervals (Source. Leetcode 56) Given a collection of intervals,
merge all overlapping intervals. For example:
Given [1,3],[2,6],[8,10],[15,18],
return [1,6],[8,10],[15,18].
My code:
class Solution(object):
def merge(self, intervals):
if len(intervals) <= 1:
return intervals
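The method above only shows the base case; a standard completion (a sketch, not the original code) sorts by start point and folds overlapping intervals:

```python
class Solution(object):
    def merge(self, intervals):
        if len(intervals) <= 1:
            return intervals
        intervals.sort(key=lambda iv: iv[0])
        merged = [list(intervals[0])]
        for iv in intervals[1:]:
            if iv[0] <= merged[-1][1]:
                # Overlap: extend the last merged interval.
                merged[-1][1] = max(merged[-1][1], iv[1])
            else:
                merged.append(list(iv))
        return merged
```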
Summary Ranges (Source. Leetcode 228) Given a sorted integer array
without duplicates, return the summary of its ranges. For example:
Given [0,1,2,4,5,7],
Return ["0->2","4->5","7"].
My code:
class Solution(object):
    def summaryRanges(self, nums):
        if len(nums) == 0:
            return []
        ranges = []
        start = nums[0]
        prev = start
        j = 1
        while j <= len(nums):
            cur = nums[-1] + 2   # sentinel to flush the last range
            if j < len(nums):
                cur = nums[j]
            if cur - prev > 1:
                if start == prev:
                    ranges.append(str(start))
                else:
                    ranges.append(str(start) + "->" + str(prev))
                start = cur
            prev = cur
            j += 1
        return ranges
3.2 Strings
Longest Absolute Image File Path (Source. Leetcode 388) Sup-
pose we abstract our file system by a string in the following manner.
The string
dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.png
represents:
dir
subdir1
file1.ext
subsubdir1
subdir2
subsubdir2
file2.png
We are interested in finding the longest (number of characters) absolute
path to a file within our file system. For example, in the example above,
the longest absolute path to an image file is
dir/subdir2/subsubdir2/file2.png
and its length is 32 (not including the double quotes).
Given a string S representing the file system in the above format, re-
turn the length of the longest absolute path to file with image extension
(png, jpg, bmp) in the abstracted file system. If there is no file in the
system, return 0.
Idea: Use a hash table to keep track of each level and the most recent
absolute path size at this level, as well as the maximum absolute path
size at this level. See code next page.
def solution(S):
    S += '\n'
    # Key: level. Value: (most recent absolute path size at this level,
    # maximum absolute path size to an image file at this level)
    sizes = {-1: (0, 0)}
    curLevelCount = 0
    curFname = ""
    for ch in S:
        if ch != '\n' and ch != '\t':
            # Append new character if it is not a special character
            curFname += ch
        elif ch == '\n':
            curFnamePathSize = sizes[curLevelCount-1][0] + len(curFname)
            if curLevelCount != 0:
                # For the slash
                curFnamePathSize += 1
            # Only paths ending in an image file count toward the answer.
            if curFname.endswith((".png", ".jpg", ".bmp")):
                fileSize = curFnamePathSize
            else:
                fileSize = 0
            if curLevelCount in sizes:
                prevMax = sizes[curLevelCount][1]
                sizes[curLevelCount] = (curFnamePathSize,
                                        max(prevMax, fileSize))
            else:
                sizes[curLevelCount] = (curFnamePathSize, fileSize)
            curFname = ""
            curLevelCount = 0
        else:  # ch == '\t'
            curLevelCount += 1
    maxPathSize = 0
    for level in sizes:
        if level >= 0:
            maxPathSize = max(maxPathSize, sizes[level][1])
    return maxPathSize
Repeated Substring Pattern (Source. Leetcode 459) Given a non-
empty string check if it can be constructed by taking a substring of it and
appending multiple copies of the substring together. You may assume
the given string consists of lowercase English letters only and its length
will not exceed 10000. Difficulty: Easy. Examples:
Input: "abab"
Output: True
--
Input: "aba"
Output: False
Idea 1: There is a greedy way to solve this using the π table from the
KMP algorithm (refer to 2.8.2 for more description). Build the π table
of the given string P, and let f = π[|P| − 1] + 1 be the length of the
longest proper prefix of P that is also a suffix. Then P is a repetition
of one of its substrings if:
• f ≥ |P|/2; basically, the longest prefix equal to a suffix must
extend beyond half of the string, and
• |P| is divisible by |P| − f, the length of the repeating unit.
Idea 2: A shorter doubling trick, used in the code below: P is a
repetition of a substring if and only if P occurs in (P + P) with the
first and last characters removed.
class Solution(object):
    def kmpTable(self, p):
        ... Check this code in appendix (5.2).

    def repeatedSubstringPattern(self, s):
        # Doubling trick: chop one character off each end of s+s,
        # then look for s in the remainder.
        q = s + s
        return q[1:len(q)-1].find(s) != -1
Valid Parentheses (Source. Leetcode 20) Given a string containing just
the characters '(', ')', '{', '}', '[' and ']', determine if the
input string is valid.
The brackets must close in the correct order: "()" and "()[]" are
valid but "(]" and "([)]" are not.
class Solution(object):
    def isValid(self, s):
        """
        :type s: str
        :rtype: bool
        """
        if len(s) % 2 != 0:
            return False
        pstart = {'(': ')', '{': '}', '[': ']'}
        if s[0] not in pstart:
            return False
        recorder = []
        for ch in s:
            if ch in pstart:
                recorder.append(ch)
            else:
                if len(recorder) == 0:
                    return False
                rch = recorder.pop()
                if pstart[rch] != ch:
                    return False
        return len(recorder) == 0
3.3 Permutation
There are several classic problems related to permutations. Some involve
strings, and some are just array of numbers.
Generate Parentheses (Source. Leetcode 22) Given n pairs of parentheses,
write a function to generate all combinations of well-formed parentheses.
For example, given n = 3, a solution set is:
[
"((()))",
"(()())",
"(())()",
"()(())",
"()()()"
]
class Solution(object):
    def generateParenthesis(self, n):
        S = {}
        solution = []
        S['('] = (1, 0)
        while len(S) > 0:
            string, (o, c) = S.popitem()
            if o == n:
                if c == n:
                    solution.append(string)
                    continue
                else:
                    S[string+')'] = (o, c+1)
            elif o == c:
                S[string+'('] = (o+1, c)
            else:
                S[string+'('] = (o+1, c)
                S[string+')'] = (o, c+1)
        return solution
Palindrome Permutation (Source. Leetcode 266) Given a string, determine
if a permutation of the string could form a palindrome. For example,
"code" -> False, "aab" -> True, "carerac" -> True.
My code:
class Solution(object):
def canPermutePalindrome(self, s):
freq = {}
for c in s:
if c in freq:
freq[c] += 1
else:
freq[c] = 1
if len(s) % 2 == 0:
for c in freq:
if freq[c] % 2 != 0:
return False
return True
else:
count = 0
for c in freq:
if freq[c] % 2 != 0:
count += 1
if count > 1:
return False
return True
Next Permutation (Source. Leetcode 31) Rearrange a list of numbers into
the lexicographically next greater permutation. If such an arrangement
is not possible, rearrange it as the lowest possible order (i.e., sorted
in ascending order). The replacement must be in-place.
My code:
class Solution(object):
    def _swap(self, p, a, b):
        t = p[a]
        p[a] = p[b]
        p[b] = t

    def nextPermutation(self, p):
        if len(p) < 2:
            return
        if len(p) == 2:
            self._swap(p, 0, 1)
            return
        i = len(p)-1
        while i > 0 and p[i-1] >= p[i]:
            i -= 1
        # Want to increase the number at p[i-1]. That number
        # should be the smallest one (but > p[i-1]) in the range
        # i to len(p)-1.
        if i > 0:
            smallest = p[i]
            smallestIndex = i
            for j in range(i, len(p)):
                if p[j] > p[i-1] and p[j] <= smallest:
                    smallest = p[j]
                    smallestIndex = j
            self._swap(p, i-1, smallestIndex)
        # Reverse p[i..len(p)-1].
        for j in range(i, i+(len(p)-i)//2):
            self._swap(p, j, len(p)-1-(j-i))
3.4 Trees
Binary Tree Longest Consecutive Sequence (Source. Leetcode
298) Given a binary tree, find the length of the longest consecutive se-
quence path.
The path refers to any sequence of nodes from some starting node
to any node in the tree along the parent-child connections. The longest
consecutive path needs to be from parent to child (it cannot be the
reverse).
Example:
1
\
3
/ \
2 4
\
5
For the example above, the longest consecutive sequence path is 3-4-5,
so the answer is 3.
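A short DFS sketch of one way to solve it (assuming a TreeNode class with val, left, and right; this is a reference sketch, not the original code):

```python
class TreeNode(object):
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def longest_consecutive(root):
    """Length of the longest parent-to-child consecutive path."""
    best = [0]
    def dfs(node, parent_val, length):
        if node is None:
            return
        # Extend the run if this node continues the sequence,
        # otherwise start a fresh run of length 1.
        length = length + 1 if node.val == parent_val + 1 else 1
        best[0] = max(best[0], length)
        dfs(node.left, node.val, length)
        dfs(node.right, node.val, length)
    if root is not None:
        dfs(root, root.val - 1, 0)
    return best[0]
```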
3.5 Graphs
Number of Islands (Source. Leetcode 200) Given a 2d grid map of '1's (land)
and ’0’s (water), count the number of islands. An island is surrounded
by water and is formed by connecting adjacent lands horizontally or
vertically. You may assume all four edges of the grid are all surrounded
by water. Example:
11000
11000
00100
00011
Answer: 3
82
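A standard approach is DFS flood fill: scan the grid, and every time an unvisited land cell is found, count one island and sink all land cells connected to it. A minimal sketch, assuming the grid rows are strings (or lists) of '1'/'0' characters:

```python
class Solution(object):
    def numIslands(self, grid):
        if not grid:
            return 0
        rows, cols = len(grid), len(grid[0])
        visited = set()

        def sink(r, c):
            # Iterative DFS that marks every cell of one island.
            stack = [(r, c)]
            while stack:
                i, j = stack.pop()
                if (0 <= i < rows and 0 <= j < cols
                        and grid[i][j] == '1' and (i, j) not in visited):
                    visited.add((i, j))
                    stack.extend([(i+1, j), (i-1, j), (i, j+1), (i, j-1)])

        count = 0
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == '1' and (r, c) not in visited:
                    count += 1
                    sink(r, c)
        return count
```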
Median of Two Sorted Arrays (Source. Leetcode 4) There are two sorted
arrays nums1 and nums2 of size n and m respectively. Find the median
of the two sorted arrays.
This is a hard problem. I have two blog posts about two different
approaches to finding the k-th smallest element in two sorted arrays:
• Recursive O(log(mn)):
http://zkytony.blogspot.com/2016/09/find-kth-smallest-element-in-two-sorted.html
• Recursive O(log k):
http://zkytony.blogspot.com/2016/09/find-kth-smallest-element-in-two-sorted_19.html
My code:
class Solution(object):
def findMedianSortedArrays(self, nums1, nums2):
n = len(nums1)
m = len(nums2)
if (n+m) % 2 == 0:
m0 = self.kth(nums1, nums2, (n+m)/2-1)
m1 = self.kth(nums1, nums2, (n+m)/2)
return (m0 + m1) / 2.0
else:
return self.kth(nums1, nums2, (n+m)/2)
    def kth(self, A, B, k):
        i = min(len(A)-1, k/2)
        j = min(len(B)-1, k-i)
        # ... (the rest follows the recursive approach in the blog posts)
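The kth helper is cut off above. For reference, here is a self-contained sketch of the recursive O(log k) idea from the second blog post (a reconstruction; variable names are my own):

```python
def kth(A, B, k):
    """k-th smallest (0-indexed) element of two sorted lists."""
    if len(A) > len(B):
        A, B = B, A              # ensure A is the shorter list
    if len(A) == 0:
        return B[k]
    if k == 0:
        return min(A[0], B[0])
    # Take i elements from A and j from B, with i + j == k + 1.
    i = min(len(A), (k + 1) // 2)
    j = (k + 1) - i
    if A[i-1] <= B[j-1]:
        # The first i elements of A are all among the k+1 smallest,
        # so they can never be the answer; discard them.
        return kth(A[i:], B, k - i)
    else:
        return kth(A, B[j:], k - j)
```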
Paint Fence (Source. Leetcode 276) There is a fence with n posts, and
each post can be painted with one of k colors. You have to paint all
the posts such that no more than two adjacent fence posts have the
same color.
Return the total number of ways you can paint the fence. Note that
n and k are non-negative integers.
My code29:
class Solution(object):
def numWays(self, n, k):
if n == 0:
return 0
if n == 1:
return k
# Now n >= 2.
# Initialize same and diff as if n == 2
same = k
diff = k*(k-1)
for i in range(3,n+1):
r_prev = same + diff # r(i-1)
same = diff # same(i)=diff(i-1)
diff = r_prev*(k-1)
return same + diff
3.8 Miscellaneous
Range Sum Query 2D - Mutable (Source. Leetcode 308) Given
a 2D matrix matrix, find the sum of the elements inside the rectangle
defined by its upper left corner (row1, col1) and lower right corner (row2,
col2). Difficulty: Hard. Example:
Given matrix = [
[3, 0, 1, 4, 2],
[5, 6, 3, 2, 1],
[1, 2, 0, 1, 5],
[4, 1, 0, 1, 7],
[1, 0, 3, 0, 5]
]
sumRegion(2, 1, 4, 3) -> 8
29 I have a blog post about this problem: http://zkytony.blogspot.com/2016/09/paint-fence.html
The above rectangle is defined by (row1, col1) = (2, 1) and (row2,
col2) = (4, 3), and contains sum = 8.
update(3, 2, 2)
sumRegion(2, 1, 4, 3) -> 10
Idea: Use a binary indexed tree. This kind of tree is built to solve
problems like this. See 2.1.15 for a more detailed explanation of how
it works. Code is below. The formula that computes the next index to
update from an index i, parent(i) = i + (i & (-i)), works not only for
a 1D array but also for the row and column indices of a 2D array.
class BinaryIndexTree():
    def __init__(self, matrix):
        if not matrix:
            return
        self.num_rows = len(matrix)+1
        self.num_cols = len(matrix[0])+1 if len(matrix) > 0 else 0
        self.matrix = [[0 for x in range(self.num_cols-1)]
                       for y in range(self.num_rows-1)]
        self.tree = [[0 for x in range(self.num_cols)]
                     for y in range(self.num_rows)]
        for r in range(self.num_rows-1):
            for c in range(self.num_cols-1):
                self.update(r, c, matrix[r][c])
j -= ((~j+1) & j)
i -= ((~i+1) & i)
return result
class NumMatrix(object):
def __init__(self, matrix):
self.BIT = BinaryIndexTree(matrix)
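The update and region-query methods are missing above. A complete, self-contained sketch of the same 2D binary indexed tree idea (names differ slightly from the snippet):

```python
class BIT2D(object):
    """2D binary indexed tree; indices inside the tree are 1-based,
    and the next index to update from i is i + (i & (-i))."""
    def __init__(self, matrix):
        self.R = len(matrix)
        self.C = len(matrix[0]) if self.R > 0 else 0
        self.matrix = [[0]*self.C for _ in range(self.R)]
        self.tree = [[0]*(self.C+1) for _ in range(self.R+1)]
        for r in range(self.R):
            for c in range(self.C):
                self.update(r, c, matrix[r][c])

    def update(self, row, col, val):
        delta = val - self.matrix[row][col]
        self.matrix[row][col] = val
        i = row + 1
        while i <= self.R:
            j = col + 1
            while j <= self.C:
                self.tree[i][j] += delta
                j += j & (-j)
            i += i & (-i)

    def _query(self, row, col):
        # Prefix sum of the rectangle [0..row-1] x [0..col-1].
        result, i = 0, row
        while i > 0:
            j = col
            while j > 0:
                result += self.tree[i][j]
                j -= j & (-j)
            i -= i & (-i)
        return result

    def sumRegion(self, row1, col1, row2, col2):
        return (self._query(row2+1, col2+1) - self._query(row1, col2+1)
                - self._query(row2+1, col1) + self._query(row1, col1))
```

Both update and sumRegion run in O(log R · log C).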
3.9 Unsolved
Longest Substring With At Most k Distinct Characters (Source.
Leetcode 340) Given a string, find the length of the longest substring T
that contains at most k distinct characters.
For example, given s = 'eceba' and k = 2, T is 'ece', which has
length 3.
I solved this problem, but my code is very hard to understand, so it
is not included here. Treat this problem as unsolved.
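For review purposes, a compact sliding-window sketch of one standard solution (not my original code):

```python
def longest_k_distinct(s, k):
    """Longest substring of s with at most k distinct characters."""
    counts = {}
    left = 0
    best = 0
    for right, ch in enumerate(s):
        counts[ch] = counts.get(ch, 0) + 1
        # Shrink from the left until at most k distinct chars remain.
        while len(counts) > k:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        best = max(best, right - left + 1)
    return best
```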
Sentence Screen Fitting (Source. Leetcode 418) Given a rows × cols
screen and a sentence represented by a list of words, find how many
times the given sentence can be fitted on the screen. A word cannot be
split into two lines, and consecutive words in a line must be separated
by a single space. Example:
Input:
rows = 3, cols = 6, sentence = ["a", "bcd", "e"]
Output:
2
Explanation:
a-bcd-
e-a---
bcd-e-
Find Minimum in sorted rotated array Suppose an array sorted
in ascending order is rotated at some pivot unknown to you beforehand.
(i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2). Find the mini-
mum element. You may assume no duplicate exists in the array.
Follow-up: What if duplicates are allowed?
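A binary-search sketch for the no-duplicates case: compare the middle element with the right end to decide which half contains the minimum.

```python
def find_min(nums):
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] > nums[hi]:
            lo = mid + 1   # minimum lies strictly to the right of mid
        else:
            hi = mid       # minimum is at mid or to its left
    return nums[lo]
```

For the follow-up with duplicates, when nums[mid] == nums[hi] the only safe move is hi -= 1, which degrades the worst case to O(n).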
The Skyline Problem (Source. Leetcode 218) Given the locations and
heights of all the buildings, write a program to output the skyline
formed by these buildings collectively. The skyline is represented as a
list of "key points" in the format [x, height], where each key point
is the left endpoint of a horizontal line segment. Note that the last
key point, where the rightmost building ends, is merely used to mark
the termination of the skyline, and always has zero height. Also, the
ground in between any two adjacent buildings should be considered part
of the skyline contour.
For instance, the skyline in Figure B should be represented as: [ [2
10], [3 15], [7 12], [12 0], [15 10], [20 8], [24, 0] ].
Notes:
1. The input list is already sorted in ascending order by the left x
position Li.
2. The output list must be sorted by the x position.
3. There must be no consecutive horizontal lines of equal height in the
output skyline. For instance, [...[2 3], [4 5], [7 5], [11
5], [12 7]...] is not acceptable; the three lines of height 5
should be merged into one in the final output as such: [...[2
3], [4 5], [12 7], ...]
Minimum Path Sum (Source. Leetcode 64) Given an m × n grid filled
with non-negative numbers, find a path from top left to bottom right
which minimizes the sum of all numbers along the path. Note: You can
only move either down or right at any point in time.
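A dynamic-programming sketch using a single row of state, where dp[c] is the minimum path sum to reach the current cell in column c:

```python
def min_path_sum(grid):
    rows, cols = len(grid), len(grid[0])
    dp = [0] * cols
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                dp[c] = grid[0][0]
            elif r == 0:
                dp[c] = dp[c-1] + grid[r][c]        # only from the left
            elif c == 0:
                dp[c] = dp[c] + grid[r][c]          # only from above
            else:
                dp[c] = min(dp[c], dp[c-1]) + grid[r][c]
    return dp[-1]
```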
Wiggle Sort (Source. Leetcode 280) Given an unsorted array nums,
reorder it in-place such that nums[0] <= nums[1] >= nums[2] <= nums[3]....
For example, given nums = [3, 5, 2, 1, 6, 4], one possible an-
swer is [1, 6, 2, 5, 3, 4].
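For this relaxed (<=, >=) version, a one-pass greedy works: walk the array and swap any adjacent pair that violates the wiggle property at that position. A sketch:

```python
def wiggle_sort(nums):
    # Fix each adjacent pair that violates the pattern
    # nums[0] <= nums[1] >= nums[2] <= nums[3] ...
    for i in range(1, len(nums)):
        bad = (i % 2 == 1 and nums[i] < nums[i-1]) or \
              (i % 2 == 0 and nums[i] > nums[i-1])
        if bad:
            nums[i-1], nums[i] = nums[i], nums[i-1]
```

The swap at position i cannot break the property at position i-1, so one left-to-right pass suffices.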
Word Squares (Source. Leetcode 425) Given a set of words (without
duplicates), find all word squares you can build from them; a sequence
of words forms a valid word square if the k-th row and the k-th column
read the same string. For example, the word sequence ["ball", "area",
"lead", "lady"] forms the word square:
b a l l
a r e a
l e a d
l a d y
All words will have the exact same length. Word length is at least 1 and
at most 5. Each word contains only lowercase English alphabet a-z.
4 Behavioral
4.1 Standard
4.1.1 introduce yourself
This question is an ice breaker. For this kind of question, the most
important points to hit are (1) what your interest in software
engineering is, and (2) a very brief description of your background;
don't be too detailed, because that will take up too much time in the
interview. Be yourself.
Be natural. I will probably say something as follows.
I major in Computer Science, and I am expected to graduate in June
2017. I am interested in backend or full-stack software develop-
ment. I also hope to do work that involves some flavor of research,
because I love doing research, and I am interested in using ma-
chine learning to solve some of the problems I will work on. I am
currently working at the Robotics State-Estimation Lab at UW;
I can talk more about that project later. [Can be omitted: the
big parts I have contributed are improving the navigation system
for our robot and building a pipeline for data collection for the
deep learning model.] Besides robotics, I am also leading a group
of four to work on the Koolio project, a website for people to
share flippable content. Last summer, I worked at CME Group
doing full-stack development of a web application that helps my
PM create JIRA subtickets for the upcoming two-week sprint. I
think I am well prepared to work at Google.
Possible talking point: the Chicago Mercantile Exchange as a physical
trading place, and the transition to high-frequency online trading.
This project is implemented with Ruby on Rails. I chose this
because I saw that GitHub and Twitter were using this framework,
it is suitable for our purpose, and it has a big enough community
behind it. I chose to use PostgreSQL. It is not very different from
MySQL, but since I had used MySQL before, I wanted to try
something different. The production server is Nginx plus Unicorn,
a quite popular combo for Rails projects, so I chose it.
(Back story. Can be omitted if you see fit.) I had this idea
in sophomore year, and there were some stories in assembling the
team. The important piece is that, when I was doing my intern-
ship in Chicago, my landlord was a student at SAIC. I persuaded
her to be the designer for this project, and she has done a fantastic
job. Having been busy recently, I have brought several new people
onto the team, with backgrounds in computer science and infor-
matics. I think this project may one day take off, so I have been
persisting with it. It is fun and satisfying to do.
4.2 Favorites
4.2.1 project?
Koolio.io. See 4.1.4 for how to describe it.
4.2.2 class?
Major classes: Machine Learning, CSE 332 (data abstractions), and
Computer Graphics; all intriguing. Non-major classes: JSIS 202 and
OCEAN 250; they broadened my vision.
4.2.3 language?
Python. Python is like math. It is my go-to language if I just want to
code something up. It is slow, but it has nice integration with C.
4.2.5 machine learning technique?
Support Vector Machines (SVM) and boosting. An SVM optimizes the
decision boundary by maximizing the margin (it is an offline method).
The math shows that maximizing the margin is equivalent to minimizing
the norm of the weight vector. Boosting is an ensemble algorithm that
combines a set of weak learners into a strong learner. Another amazing
thing about boosting is that, in practice, it is surprisingly resistant
to overfitting.
desperate situation, and I stood out and said something like, ”Greg (our
professor) said every group in this class ends up doing fantastic work. I
trust that. But why don’t we go ahead and see if that is true?” Then
we began brainstorming new ideas, and eventually went for one related
to helping small business owners choose ideal locations. We did great
in the end.
4.3.4 failure?
I interviewed with 10 companies last year for internships, and all of
them rejected me. I think I did not prepare enough for the coding
interviews. But I ended up working at the UW RSE lab, which has been
a great thing.
5 Appendix
5.1 Java Implementation of Trie
Below is my Java implementation of Trie; I wrote this when working on
project 1 in CSE 332.
public class Trie {
    private TrieNode root;

    public Trie() {
        root = new TrieNode(); // start with empty string
    }

    public void insert(String word) {
        TrieNode current = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (current.children.containsKey(c)) {
                current = current.children.get(c);
            } else {
                TrieNode newNode = new TrieNode(c);
                current.children.put(c, newNode);
                current = newNode;
            }
            if (i == word.length() - 1) {
                current.isWord = true;
            }
        }
    }

    public boolean contains(String word) {
        TrieNode current = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (current.children.containsKey(c)) {
                current = current.children.get(c);
            } else {
                return false;
            }
            if (current.isWord && i == word.length() - 1) {
                return true;
            }
        }
        return false;
    }

    public boolean startsWith(String prefix) {
        TrieNode current = root;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (current.children.containsKey(c)) {
                current = current.children.get(c);
            } else {
                return false;
            }
        }
        return true;
    }

    private static class TrieNode {
        char ch;
        boolean isWord;
        java.util.Map<Character, TrieNode> children =
            new java.util.HashMap<Character, TrieNode>();

        public TrieNode(char ch) {
            this.ch = ch;
        }

        public TrieNode() {
            this('\0');
        }
    }
}
5.2 Python Implementation of KMP
def kmpTable(p):
i, j = 1, 0
table = {-1: -1, 0: -1}
while i < len(p):
if p[i] == p[j]:
table[i] = j
j += 1
i += 1
else:
if j > 0:
j = max(0, table[j-1] + 1)
else:
table[i] = -1
i += 1
return table
# KMP rules
if d == len(P):
return k
elif d > 0 and table[d-1] == -1:
k = k + d
elif d > 0 and table[d-1] != -1:
k = k + d - table[d-1] - 1
else: # d == 0
k = k + 1
return -1
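The search routine these rules belong to is truncated above. A self-contained sketch that applies them (kmpTable is repeated so the block runs on its own; this simplified version restarts the comparison after each shift, so it is not the fully optimized O(n + m) scan, but it uses the same shift rules):

```python
def kmpTable(p):
    # Same table as in 5.2: table[i] is the end index of the longest
    # proper prefix of p that is also a suffix of p[0..i], or -1.
    i, j = 1, 0
    table = {-1: -1, 0: -1}
    while i < len(p):
        if p[i] == p[j]:
            table[i] = j
            j += 1
            i += 1
        else:
            if j > 0:
                j = max(0, table[j-1] + 1)
            else:
                table[i] = -1
                i += 1
    return table

def kmp_search(T, P):
    """Return the index of the first occurrence of P in T, or -1."""
    table = kmpTable(P)
    k = 0  # current alignment of P against T
    while k + len(P) <= len(T):
        d = 0  # number of characters of P matched at this alignment
        while d < len(P) and T[k+d] == P[d]:
            d += 1
        # KMP shift rules
        if d == len(P):
            return k
        elif d > 0 and table[d-1] == -1:
            k = k + d
        elif d > 0 and table[d-1] != -1:
            k = k + d - table[d-1] - 1
        else:  # d == 0
            k = k + 1
    return -1
```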
5.3 Python Implementation of Union-Find
Based on the details described in 2.1.16, I implemented the Union-Find
data structure in Python as follows.
class UnionFind:
"""Caveat: This implementation does not support adding
additional elements other than ones given initially."""
def __init__(self, elems):
"""
Constructs a union find data structure. Assumes that all
elements in elems are hashable.
"""
self.elems = list(elems)
self.idxmap = {}
self.impl = []
for i in range(len(elems)):
self.idxmap[self.elems[i]] = i
self.impl.append(-1)