Symbol Tables
Symbol Tables
Symbol Tables
Symbol Tables
Intro to Programming
1. Elements of Programming
1.1 Your First Program
1.2 Built-in Types of Data
1.3 Conditionals and Loops
1.4 Arrays
1.5 Input and Output
1.6 Case Study: PageRank
2. Functions
2.1 Static Methods
2.2 Libraries and Clients
2.3 Recursion
2.4 Case Study: Percolation
3. OOP
3.1 Data Types
3.2 Creating Data Types
3.3 Designing Data Types
3.4 Case Study: N-Body
3.5 Purple America
3.6 Inheritance
4. Data Structures
4.1 Performance
4.2 Sorting and Searching
4.3 Stacks and Queues
4.4 Symbol Tables
4.5 Case Study: Small World
Intro to CS
0. Prologue
5. A Computing Machine
5.1 Data Representations
5.2 TOY Machine
5.3 TOY Instruction Set
5.4 TOY Programming
5.5 TOY Simulator
6. Building a Computer
6.1 Combinational Circuits
http://introcs.cs.princeton.edu/java/44st/
1/19
11/17/2014
Symbol Tables
Web Resources
FAQ
Data
http://introcs.cs.princeton.edu/java/44st/
2/19
11/17/2014
Symbol Tables
Code
Errata
Appendices
A. Operator Precedence
B. Writing Clear Code
C. Gaussian Distribution
D. Java Cheatsheet
E. Matlab
Lecture Slides
Programming Assignments
Search
Symbol tables.
A symbol table is a data type that we use to associate values with keys. Clients can store (put) an entry into
the symbol table by specifying a key-value pair and then can retrieve (get) the value corresponding to a
particular key from the symbol table. For example, a university might associate information such as a
student's name, home address, and grades (the value) with that student's social security number (the key), so
that each student's records can be accessed by specifying a social security number. The same approach
might be used by a scientist to organize data, by a business to keep track of customer transactions, or by an
internet search engine to associate keywords with web pages.
API.
A symbol table is a collection of key-value pairs. We use a generic type Key for keys and a generic type
Value for values: every symbol-table entry associates a Value with a Key. In most applications, the keys
have some natural ordering, so (as we did with sorting) we require the key type Key to implement Java's
Comparable interface.
public class *ST<Key extends Comparable<Key>, Value>
*ST()
// create a symbol table
void put(Key key, Value v) // put key-value pair into the table
Value get(Key key)
// return value paired with key
// or null if no such value
boolean contains(Key key)
// is there a value paired with key?
3/19
11/17/2014
Symbol Tables
Comparable keys. We take advantage of key ordering to develop efficient implementations of put and
get. We also assume the keys do not change their value while in the symbol table. The simplest and
most commonly used types of keys, String and built-in wrapper types like Integer and Double, are
immutable.
Replace-the-old-value policy. If when a key-value pair is inserted into the symbol table that already
associates another value with the given key, we adopt the convention that the new value replaces the
old one (just as with an array assignment statement).
Not found. The method get() returns null if no entry with the given key has previously been put into
the table. This choice has two implications, discussed next.
Null keys and values. Clients are not permitted to use null as a key or value. This convention enables
us to implement contains() as follows:
public boolean contains(Key key) {
return get(key) != null;
}
Remove. We do not include a method for removing keys from the symbol table. Many applications
do require such a method, but we leave implementations as an exercise or for more advanced courses
in data structures in algorithms. Since clients cannot associate null with a key, one simple interface is
to take a put command with null as the value to mean remove.
Iterable. As with most collections, it is best to implement Iterable and to provide an appropriate
iterator that allows clients to visit the contents of the table. The natural order of iteration is in order of
the keys, so we implement Iterable<Key> and use get to get values, if desired.
Variations. Numerous other useful operations on symbol tables have been identified, and APIs based
on various subsets of them have been widely studied. We will consider several of these at the end of
this section.
Action
Key
Value
phone book
person's name
phone number
dictionary
look up word
word
definition
Internet DNS
website
IP address
Reverse DNS
IP address
Website
genomics
codon
amino acid
Java compiler
Variable name
stock quote
stock symbol
price
file share
song name
machine
http://introcs.cs.princeton.edu/java/44st/
4/19
11/17/2014
Symbol Tables
file system
file name
Program Lookup.java builds a set of key-value pairs from a file of comma-separated values and then
prints out values corresponding to keys read from standard input. The command-line arguments are
the file name and two integers, one specifying the field to serve as the key and the other specifying
the field to serve as the value.
Here are some sample data files: amino.csv (codons and amino acids), DJIA.csv (Dow Jones
Industrial average by date), ip.csv (hostnames and IP addresses), morse.csv (Morse code),
elements.csv (Periodic table of elements), mktsymbols.csv (market symbols and names), topleveldomain.txt (top-level domain names and their country).
Indexing. Program Index.java is a prototypical example of a symbol table client that uses an
intermixed sequence of calls to get() and put(): it reads in a list of strings from standard input and
prints a sorted table of all the different strings along with a list of integers specifying the positions
where each string appeared in the input. We have a large amount of data and want to know where
certain strings of interest are found. In this case, we seem to be associating multiple values with each
key, but we actually associating just one: a queue. Again, this approach is familiar:
Application
Action
Key
Value
book index
term
page numbers
genomics
DNA substring
locations
web search
keyword
websites
business
find transactions
customer name
transactions
Elementary implementations.
Symbol-table implementations have been heavily studied, Many different algorithms and data structures
have been invented for this purpose. We first describe two simple approaches.
Binary search implementation. Program BinarySearchST.java implements a symbol table by
maintaining two parallel arrays of keys and values, keeping them in key-sorted order. It uses binary
search for get. To maintain the keys in sorted order, it moves all of the larger elements over for put.
The get operation is logarithmic, but the put operation takes linear time per operation.
Linked list implementation. Program LinkedListST.java implements a symbol table with an
(unordered) linked list. Both put and get take linear time per operation: to search for a key, we need
to traverse its links; to put a key-value pair, we need to search for the given key.
http://introcs.cs.princeton.edu/java/44st/
5/19
11/17/2014
Symbol Tables
To develop a symbol-table implementation that is feasible for use with clients like Lookup and Index, we
need the flexibility of linked lists and the efficiency of binary search. Binary search trees, which we
consider next, provide just this combination.
This definition is like our definition of nodes for linked lists, except that it has two links, not just one.
Search in a BST. Suppose that you want to search for a node with a given key in a BST (or to get a
value with a given key in a symbol table). There are two possible outcomes: the search might be
successful (we find the key in the BST; in a symbol-table implementation we return the associated
value) or it might be unsuccessful (there is no key in the BST with the given key; in a symbol-table
implementation, we return null). A recursive algorithm is immediate: Given a BST (a reference to a
Node), first check whether the tree is empty (the reference is null). If so, then terminate the search as
unsuccessful (in a symbol-table implementation, return null). If the tree is non-empty, check whether
the key in the node is equal to the search key. If so, then terminate the search as successful (in a
symbol-table implementation, return the value associated with the key). If not, compare the search
key with the key in the node. If it is smaller, search (recursively) in the left subtree; if it is greater,
search (recursively) in the right subtree.
http://introcs.cs.princeton.edu/java/44st/
6/19
11/17/2014
Symbol Tables
Inserting into a BST. Suppose that you want to insert a new node into a BST (in a symbol-table
implementation, put a new key-value pair into the data structure). The logic is similar to searching for
a key, but the code is trickier: If the BST is empty, we create and return a new Node containing the
key-value pair; if the search key is less than the key at the root, we set the left link to the result of
inserting the key-value pair into the left subtree; if the search key is less, we set the right link to the
result of inserting the key-value pair into the right subtree; otherwise if the search key is equal, we
overwrite the existing value with the new value. Resetting the link after the recursive call in this way
is usually unnecessary, because the link changes only if the subtree is empty, but it is as easy to set
the link as to test to avoid setting it.
Program BST.java is a symbol-table implementation based on these two recursive algorithms. Here is a
useful binary search tree application.
Average case. If we insert random keys, we might expect the search times to be logarithmic as well,
because the first element becomes the root of the tree and should divide the keys roughly in half.
Applying the same argument to the subtrees, we expect to get about the same result as for the best
case. This intuition is indeed validated by careful analysis: a classic mathematical derivation shows
that the time required for put and get in a tree constructed from randomly ordered keys is logarithmic.
More precisely, the expected number of key comparisons is ~ 2 ln N for a random put or get in a key
built from N randomly ordered keys.
http://introcs.cs.princeton.edu/java/44st/
7/19
11/17/2014
Symbol Tables
Worst case. In the worst case, each node has exactly one null link, so the BST is like a linked list,
where put operations and get requests take linear time. Unfortunately this worst case is not rare in
practice - it arises, for example, when we insert the keys in order. Thus, good performance of the
basic BST implementation is dependent on the keys being sufficiently similar to random keys that the
tree is not likely to contain many long paths.
Remarkably, there are BST variants that eliminate this worst case and guarantee logarithmic performance
per operation, by making all trees nearly perfectly balanced. One popular variant is known as a red-black
tree. The Java library java.util.TreeMap implements a symbol table using this approach. Our symbol-table
implementation ST.java uses this data structure to implement our symbol-table API. It is remarkably
efficient: typically, it only accesses a small number of the nodes in the BST (those on the path from the root
to the node sought or to the leaf to which the new node is attached) and it only creates one new Node and
adds one new link for the put operation. Next, we show that put operations and get requests take
logarithmic time (under certain assumptions).
Traversing a BST.
Perhaps the most basic tree-processing function is known as tree traversal: Given a (reference to) a tree, we
want to systematically process every key-value pair in the tree. For linked lists, we accomplish this task by
following the single link to move from one node to the next. For trees, however, we have decisions to
make, because there are generally two links to follow. But recursion comes immediately to the rescue. To
http://introcs.cs.princeton.edu/java/44st/
8/19
11/17/2014
Symbol Tables
This remarkably simple method is worthy of careful study. It can be used as a basis for a toString()
implementation for BSTs and also a starting point for developing an iterator.
This approach necessitates periodically cleaning out nodes in the BST with null values, because
performance will degrade unless the size of the data structure stays proportional to the number of
key-value pairs in the table.
Range search. With a recursive method like traverse(), we can count the number of keys that fall
within a given range or return all the keys falling into a given range.
Order statistics. If we maintain an instance variable in each node having the size of the subtree
rooted at each node, we can implement a recursive method that returns the kth largest key in the BST.
9/19
11/17/2014
Symbol Tables
As a final example, we consider a data type that is simpler than a symbol table, still broadly useful, and
easy to implement with BSTs. A set is an (unordered) collection of distinct comparable keys, defined by the
following API:
A set is a symbol table with no values. We could use BST to implement SET, but a direct implementation is
simpler, and client code that uses SET is simpler and clearer than it would be to use placeholder values and
ignore them. Program SET.java implements the set API. Program DeDup.java is a SET client that reads in a
sequence of strings from standard input and prints out the first occurrence of each string (thereby removing
duplicates).
Perspective.
The use of binary search trees to implement symbol tables is a sterling example of exploiting the tree
abstraction, which is ubiquitous and familiar. Trees lie at the basis of many scientific topics, and are widely
used in computer science. We are accustomed to many tree structures in everyday life including family
trees, sports tournaments, the organization chart of a company, the tree of life, and parse trees in grammar.
Trees also arise in numerous computational applications including function call trees, parse trees, and file
systems. Many important applications are rooted in science and engineering, including phylogenetic trees in
computational biology, multidimensional trees in computer graphics, minimax game trees in economics,
and quad trees in molecular dynamics simulations.
Q + A.
17.
1.
17.
1.
Exercises
1. Modify Lookup to to make a program LookupAndPut that allows put operations to be specified on
standard input. Use the convention that a plus sign indicates that the next two strings typed are the
key-value pair to be inserted.
2. Modify Lookup to make a program LookupMultiple that handles multiple values having the same key
by putting the values on a queue, as in Index, and then printing them all out on a get request, as
http://introcs.cs.princeton.edu/java/44st/
10/19
11/17/2014
Symbol Tables
follows:
% java LookupMultiple amino.csv 3 0
Leucine
TTA TTG CTT CTC CTA CTG
3. Modify Index to make a program IndexByKeyword that takes a file name from the command line and
makes an index from standard input using only the keywords in that file. Note : using the same file
for indexing and keywords should give the same result as Index.
4. Modify Index to make a program IndexLines that considers only consecutive sequences of letters as
keys and uses line numbers instead of word position as the value and to . This functionality is useful
for programs, as follows:
% java IndexLines 6 0 < Index.java
continue 12
enqueue 15
Integer 4 5 7 8 14
parseInt 4 5
println 22
8. Draw the BST that results when you insert items with keys
E A S Y Q U E S T I O N
11/19
11/17/2014
Symbol Tables
10. Suppose that the following 31 keys appear (in some order) in a BST of height 5:
10 15 18 21 23 24 30 30 38 41
42 45 50 55 59 60 61 63 71 77
78 83 84 85 86 88 91 92 93 94
98
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
Draw the top 3 nodes of the tree (the root and its two children).
Implement toString() for BST, using a recursive helper method like traverse(). As usual, you can
accept quadratic performance because of the cost of string concatenation. Extra credit : Write a
linear-time toString() method for BST.
True or false. Given a BST, let x be a leaf node, and let p be its parent. Then either (i) the key of p is
the smallest key in the BST larger than the key of x or (ii) the key of p is the largest key in the BST
smaller than the key of x. Answer: true.
Modify the symbol-table API to use a wrapper type STentry that holds the keys and values (with
accessor methods key() and value() to access them). Your iterator should return STentry objects in
key-sorted order. Implement BST and Index as dictated by this API. Discuss the pros and cons of this
approach versus the one given in the text.
Modify the symbol-table API to handle values with duplicate keys by having get() return a iterator
for the values having a given key. Implement BST and Index as dictated by this API. Discuss the
pros and cons of this approach versus the one given in the text.
Prove that the expected number of key comparisons for a random put or get in a BST build from
randomly ordered keys is ~ 2 ln N.
Prove that the number of different BSTs that can be built from N keys is ~
Run experiments to validate the claims in the text that the put operations and get requests for Lookup
and Index are logarithmic in the size of the table when using BST.
Modify BST to implement a symbol-table API where keys are arbitrary objects. Use hashCode() to
convert keys to integers and use integer keys in the BST.
Modify BST to add methods min() and max() that return the smallest (largest) key in the table (or
null if no such key exists).
Modify BST to add a method size() that returns the number of elements in the table. Use the
approach of storing within each Node the number of nodes in the subtree rooted there.
Modify BST to add a method rangeCount() that takes two Key arguments and returns the number of
keys in a BST between the two given keys. Your method should take time proportional to the height
of the tree. Hint : First work the previous exercise.
Modify SET to add methods floor(), ceil(), and nearest() that take as argument a Key and return the
largest (smallest, nearest) element in the set that is no larger than (no smaller than, closest to) the
given Key.
Write a ST client GPA.java that creates a symbol table mapping letter grades to numerical scores, as
in the table below, then reads from standard input a list of letter grades and computes their average
(GPA).
A+
A
AB+
B
BC+
C
CD
F
4.33 4.00 3.67 3.33 3.00 2.67 2.33 2.00 1.67 1.00 0.00
12/19
11/17/2014
Symbol Tables
1. Implement the following methods for a binary tree that each take as argument a Node that is the root
of a binary tree.
size(): number of nodes in the tree.
leaves(): number of nodes whose links are both null
total(): sum of the key values in all nodes
Your methods should all run in linear time.
2. Implement a linear-time method height() that returns the maximum number of nodes on any path
from the root to a leaf node (the height of the null tree is 0; the height of a 1-node tree is 1).
3. A binary tree is heap-ordered if the key at the root is larger than the keys in all of its descendants.
Implement a linear-time method heapOrdered() that returns true if the tree is heap-ordered, false
otherwise.
4. A binary tree is balanced if both its subtrees are balanced and the height of its two subtrees differ by
at most 1. Implement a linear-time method balanced() that returns true if the tree is balanced, false
otherwise.
5. Two binary trees are isomorphic if only their key values differ (they have the same shape).
Implement a linear-time static method isomorphic() that takes two tree references as parameters and
returns true if they refer to isomorphic trees, false otherwise. Then implement a linear-time static
method eq() that takes two tree references as parameters and returns true if they refer to identical
trees (isomorphic with same key values), false otherwise.
public static boolean isomorphic(Node x, Node y) {
if (x == null && y == null) return true;
// both null
if (x == null || y == null) return false; // exactly one null
return isomorphic(x.left, y.left) && isomorphic(x.right, y.right);
}
6. Implement a linear-time method isBST() that returns true if the tree is a BST, false otherwise.
Solution : This task is a bit more difficult than it might seem. We use an overloaded recursive method
isBST() that takes two additional arguments min and max and returns true if the tree is a BST and all
its values are between min and max.
public static boolean isBST() {
return isBST(root, Integer.MIN_VALUE, Integer.MAX_VALUE);
}
private boolean isBST(Node x, int min, int max) {
if (x == null) return true;
if (x.val < min || x.val > max) return false;
return isBST(x.left, min, x.val) && isBST(x.right, x.val, max);
}
7. Write a method levelOrder() that prints BST keys in level order: First print the root, then the nodes
one level below the root, left to right, then the nodes two levels below the root (left to right), and so
forth. Hint : Use a Queue<Node>.
8. Compute the value returned by mystery() on some sample binary trees, then formulate an hypothesis
about its behavior and prove it.
http://introcs.cs.princeton.edu/java/44st/
13/19
11/17/2014
Symbol Tables
14/19
11/17/2014
Symbol Tables
case, output the list of web pages that contain at least one occurrence of each of the query words.
11. Multiple word search. Write a program that takes k words from the command-line, reads in
sequence of words from standard input, and identifies the smallest subsequence of text that contains
all of the k words (not necessarily in the same order). Don???t consider partial words.
Hint: for each index i, find the smallest subsequence [i, j] that contains the k query words. Keep a
count of the number of times each of the k query words appear. Given [i, j], compute [i+1, j'] by
decrementing the counter for word i. Then gradually increase j until the subsequence contains at least
one copy of each of the k words.
12. Repetition draw in chess. In the game of chess, if a board position is repeated three times with the
same side to move, the side to move can declare a draw. Describe how you could test this condition
using a computer program.
13. Registrar scheduling. The Registrar at a prominent northeastern University recently scheduled an
instructor to teach two different classes at the same exact time. Help the Registrar prevent future
mistakes by describing a method to check for such conflicts. For simplicity, assume all classes run
for 50 minutes starting at 9, 10, 11, 1, 2, or 3.
14. Entropy. We define the relative entropy of a text corpus with N words, k of which are distinct as
E = 1 / (N lg N) (p1 lg(k / p1) + ... + pk lg(k / pk) )
15.
16.
17.
18.
where pi is the fraction of times that word i appears. Write a program that reads in a text corpus and
prints out the relative entropy. Convert all letters to lowercase and treat punctuation marks as
whitespace.
Order statistics. Add to BST a method select() that takes an integer argument k and returns the kth
largest key in the BST. Assume that the subtree sizes are maintained in each node. The running time
should be proportional to the height of the tree.
Delete ith element. Create an ADT that supports the following operations: isEmpty(), insert(), and
delete(int i), where the deletion operation deletes and returns the ith least recently added object.
Use a BST with a counter that records the total number of elements n inserted and give the nth
element inserted the key n. Each BST node should also maintain the total number of nodes in the
subtree rooted at that node. To find the ith least recently added item, search for the ith smallest
element in the BST.
Mutable string. Create an ADT that supports the following operations on a string: get(int i),
insert(int i, char c), and delete(int i), where get returns the ith character of the string,
insert() inserts the character c and makes it the ith character, and delete() deletes the ith character.
Use a BST to implement operations in logarithmic time.
Sparse vectors and matrices. An N-by-N matrix is sparse if its number of nonzeros is proportional
to N (or less). Goal: represent sparse matrix with space proportional to N by only implicitly storing
the zero entries. Add two sparse vectors/matrices in time proportional to the number of nonzeros.
Design ADTs SparseVector.java and SparseMatrix.java that support the following APIs.
public class SparseVector
public SparseVector(int N)
public void (put i, double value)
public double get(int i)
public double dot(SparseVector b)
public SparseVector plus(SparseVector b)
public class SparseMatrix
public SparseMatrix(int N)
public void put(int i, int j, double val)
public double get(int i, int j)
http://introcs.cs.princeton.edu/java/44st/
//
//
//
//
//
0-vector of length N
a[i] = val
return a[i]
vector dot product
vector addition
// a[i][j] = val
// return a[i][j]
15/19
11/17/2014
Symbol Tables
// matrix product
// matrix addition
19. Expression parser. Write a program to parse and evaluate expressions of the following form. Store
the variable names in a symbol table.
A = 5
B = 10
C = A + B
D = C * C
Result: A = 5,
B = 10,
C = 15, D = 225
Dept
Dept ID
---------------Sales
31
Engineering 33
Clerical
34
Marketing
35
22. Molecular weight calculator. Write a program MolecularWeight.java that reads in a list of elements
and their molecular weights, and then prompts the user for the molecular description of a chemical
compound (e.g., CO.2 or Na.Cl or N.H4.N.O3 = ammonium nitrate) and prints out its molecular
weight.
Web Exercises
1. Codon usage table. Describe how to hash codons (3 consecutive nucleotides) to an integer between 0
and 63. Print out summary statistics for each codon, e.g.,
UUU 13.2(
http://introcs.cs.princeton.edu/java/44st/
724)
UCU 19.6(
1079)
UAU 16.5(
909)
UGU 12.4(
680)
16/19
11/17/2014
Symbol Tables
UUC 23.5(
UUA 5.8(
UUG 17.6(
1290)
317)
965)
UCC 10.6(
UCA 16.1(
UCG 11.8(
583)
884)
648)
UAC 14.7(
UAA 0.7(
UAG 0.2(
808)
38)
13)
CUU 21.2(
CUC 13.5(
CUA 6.5(
CUG 10.7(
1166)
739)
357)
590)
CCU 10.4(
CCC 4.9(
CCA 41.0(
CCG 10.1(
571)
267)
2251)
553)
CAU 13.3(
CAC 8.2(
CAA 24.9(
CAG 11.4(
AUU 27.1(
AUC 23.3(
AUA 5.9(
AUG 22.3(
1491)
1279)
324)
1223)
ACU 25.6(
ACC 13.3(
ACA 17.1(
ACG 9.2(
1405)
728)
940)
505)
AAU
AAC
AAA
AAG
GUU 25.7(
GUC 15.3(
GUA 8.7(
1412)
843)
478)
GCU 24.2(
GCC 12.6(
GCA 16.8(
1328)
691)
922)
UGC
UGA
UGG
8.0(
0.3(
9.5(
438)
15)
522)
733)
448)
1370)
625)
CGU 10.5(
CGC 4.2(
CGA 10.7(
CGG 3.7(
578)
233)
588)
201)
27.2(
21.0(
32.7(
23.9(
1495)
1151)
1794)
1311)
AGU 11.9(
AGC 6.8(
AGA 14.2(
AGG 2.8(
653)
374)
782)
156)
GAU 49.4(
GAC 22.1(
GAA 39.8(
2714)
1212)
2185)
GGU 11.8(
GGC 7.0(
GGA 47.2(
650)
384)
2592)
2. Functions on trees. Write a function count() that takes a Node x as an argument returns the number
of nodes in the subtree rooted at x (including x). The number of elements in an empty binary tree is 0
(base case), and the number of elements in a non-empty binary tree is equal to one plus the number of
elements in the left subtree plus the number of elements in the right subtree.
public static int count(TwoNode x) {
if (x == null) return 0;
return 1 + count(x.left) + count(x.right);
}
3. Random element. Add a symbol table function random to a BST that returns a random element.
Assume that the nodes of your BST have integer size fields that contain the number of elements in
the subtree rooted at that node. The running time should be proportional to the length of the path
from the root to the node returned.
4. Markov language model. Create a data type that supports the following two operations: add and
random. The add method should insert a new item into the data structure if it doesn't yet exists; if it
already exists, it should increase its frequency count by one. The random method should return an
element at random, where the probabilities are weighted by the frequency of each element.
5. Queue with no duplicates. Create a data type that is a queue, except that an element may only
appear on the queue at most once at any given time. Ignore requests to insert an item if it is already
on the queue.
6. Bayesian spam filter. Follow the ideas in A Plan for Spam. Here is a place to get test data.
7. Symbol table with random access. Create a data type that supports inserting a key-value pair,
searching for a key and returning the associated value, and deleting and returning a random value.
Hint: combine a symbol table and a randomized queue.
8. Random phone numbers. Write a program that takes a command line input N and prints N random
phone numbers of the form (xxx) xxx-xxxx. Use a symbol table to avoid choosing the same number
more than once. Use this list of area codes to avoid printing out bogus area codes.
9. Unique substrings of length L. Write a program that reads in text from standard input and calculate
the number of unique substrings of length L that it contains. For example, if the input is cgcgggcgcg
then there are 5 unique substrings of length 3: cgc, cgg, gcg, ggc, and ggg. Applications to data
compression. Hint: use the string method substring(i, i + L) to extract ith substring and insert into
a symbol table. Test it out on the first million digits of . or 10 million digits of .
10. The great tree-list recursion problem. A binary search tree and a circular doubly linked list are
conceptually built from the same type of nodes - a data field and two references to other nodes. Given
http://introcs.cs.princeton.edu/java/44st/
17/19
11/17/2014
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
Symbol Tables
a binary search tree, rearrange the references so that it becomes a circular doubly-linked list (in sorted
order). Nick Parlante describes this as one of the neatest recursive pointer problems ever devised.
Hint: create a circularly linked list A from the left subtree, a circularly linked list B from the right
subtree, and make the root a one node circularly linked list. Them merge the three lists.
Optimal BST.
Password checker. Write a program that reads in a string from the command line and a dictionary of
words from standard input, and checks whether it is a "good" password. Here, assume "good" means
that it (i) is at least 8 characters long, (ii) is not a word in the dictionary, (iii) is not a word in the
dictionary followed by a digit 0-9 (e.g., hello5), (iv) is not two words separated by a digit (e.g.,
hello2world)
Reverse password checker. Modify the previous problem so that (ii) - (v) are also satisfied for
reverses of words in the dictionary (e.g., olleh and olleh2world). Simple solution: insert each word
and its reverse into the symbol table.
Cryptograms. Write a program to read in a cryptogram and solve it. A cryptogram is ancient form of
encryption known as a substitution cipher in which each letter in the original message is replaced by
another letter. Assuming we use only lowercase letters, there are 26! possibilities, and your goal is to
find one that results in a message where each word is a valid word in the dictionary. Use
Permutations.java and backtracking.
Frequency counter. Write a program FrequencyCounter.java that reads in a sequence of strings from
standard input and counts the number of times each string appears.
Unordered array symbol table. Write a program SequentialSearchST.java that implements a
symbol-table using (unordered) parallel arrays.
Exception filter. The client program ExceptionFilter.java reads in a sequence of strings from a
whitelist file specified as a command-line argument, then it prints out all words from standard input
not in the whitelist.
Give the preorder traversal of some BST (not including null nodes). Ask the reader to reconstruct the
tree.
Write an ADT IterativeBST.java that implements a symbol table using a BST, but uses iterative
versions of get() and put().
Largest absolute value.
Tree reconstruction. Given the following traversals of a binary tree (only the elements, not the null
nodes), can you can you reconstruct the tree?
1. Preorder and inorder.
2. Postorder and inorder.
3. Level order and inorder.
4. Preorder and level order.
5. Preorder and postorder.
Answers
1. Yes. Scan the preorder from left to right, and use the inorder traversal to identify the left and
right subtrees.
2. Yes. Scan the postorder from right to left.
3. Yes. Scan the level order from left to right.
4. No. Consider two trees with A as the root and B as either the left or right child. The preorder
traversal of both is AB and the level order traversal of both is AB.
5. No. Same counterexample as above.
Highlighting browser hyperlinks. Browsers typically denote hyperlinks in blue, unless they've
already been visited, in which case they're depicted in purple Write a program
HyperLinkColorer.java that reads in a list of web addresses from standard input and output blue if it's
the first time reading in that string, and purple otherwise.
Spam blacklist. Insert known spam email addresses into a SET.java ADT, and use to blacklist spam.
Inverted index of a book. Write a program that reads in a text file from standard input and compiles
an alphabetical index of which words appear on which lines, as in the following input. Ignore case
http://introcs.cs.princeton.edu/java/44st/
18/19
11/17/2014
Symbol Tables
and punctuation.
It
it
it
it
was
was
was
was
the
the
the
the
best of times,
worst of times,
age of wisdom,
age of foolishness,
age 3-4
best 1
foolishness 4
it 1-4
of 1-4
the 1-4
times 1-2
was 1-4
wisdom 4
worst 2
Hint: create a symbol table whose key is a String that represents a word and whose value is a
Sequence<Integer> that represents the list of pages on which the word appears.
25. Randomly generated identities. The files names20k.csv and The file names20k-2.csv each contain
20,000 randomly generated identities (number, gender, first name, middle initial, last name, street
address, city, state, zip code, country, email address, telephone number, mother's maiden name,
birthday, credit card type, credit card number, credit card expiration date, Social security number)
from fakenamegenerator.com.
26. Distance between zip codes. Write a data type Location.java that represents a named location on the
surface of the earth (a name, latitude, and longitude). Then, write a client program that takes the
name of a ZIP code file (such as zips.txt) as a command-line argument, reads the data from the file,
and stores it in a symbol table. Then, repeatedly read in pairs of ZIP codes from standard input and
output the great-circle distance between them (in statute miles). This distance is used by the post
office to calculate shipping rates.
http://introcs.cs.princeton.edu/java/44st/
19/19