Practical 6
& APIs
Part One: Trees
This practical is aimed at giving you some practice with building and manipulating trees and
experimenting with hashtables. This will probably take you longer than two hours, but next week’s
practical is a bit shorter so there is no problem in letting this one spill over to then.
For the first part of the practical, you will start with a simple binary tree node that holds an integer
value and a method to add nodes to the tree.
[Figure: example binary tree holding the values 42, 35, 24, 17, 9, 7, 5 and 3]
The results of the in-order traversal are returned as a String that you can print out and use to check
that you are getting the correct results. IntTreeTest performs some simple tests to confirm that the
above methods work as intended.
Read through the code and ensure that you understand how it works then try adding some new
nodes in the method IntTreeTest.inorderTest and checking the results with new tests.
Try altering the order in which the initial set of nodes are added to the tree in testWalks and confirm
that the results of walking the trees with a pre-order, in-order and post-order walk are what you
expect. Note how the initial node added to the tree can skew the tree that is produced.
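To see how insertion order shapes the tree, here is a minimal sketch of a binary tree node with an add method and the three walks. The class name IntTreeSketch and its method names are assumptions made for illustration; they are not the classes supplied with the practical, which may be organised differently.

```java
// Hypothetical sketch: names are assumptions, not the practical's own classes.
class IntTreeSketch {
    int value;
    IntTreeSketch left;
    IntTreeSketch right;

    IntTreeSketch(int value) {
        this.value = value;
    }

    // Values smaller than this node go left; everything else goes right.
    void add(int v) {
        if (v < value) {
            if (left == null) left = new IntTreeSketch(v);
            else left.add(v);
        } else {
            if (right == null) right = new IntTreeSketch(v);
            else right.add(v);
        }
    }

    // Left subtree, then this node, then right subtree.
    static String inorder(IntTreeSketch n) {
        if (n == null) return "";
        return inorder(n.left) + n.value + " " + inorder(n.right);
    }

    // This node first, then left subtree, then right subtree.
    static String preorder(IntTreeSketch n) {
        if (n == null) return "";
        return n.value + " " + preorder(n.left) + preorder(n.right);
    }

    // Left subtree, then right subtree, then this node.
    static String postorder(IntTreeSketch n) {
        if (n == null) return "";
        return postorder(n.left) + postorder(n.right) + n.value + " ";
    }

    // Number of nodes on the longest path from this node down to a leaf.
    static int height(IntTreeSketch n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // The first value becomes the root; the rest are added in order.
    static IntTreeSketch build(int... values) {
        IntTreeSketch root = new IntTreeSketch(values[0]);
        for (int i = 1; i < values.length; i++) {
            root.add(values[i]);
        }
        return root;
    }

    public static void main(String[] args) {
        IntTreeSketch bushy = build(17, 7, 35, 3, 9, 24, 42);
        IntTreeSketch skewed = build(3, 7, 9, 17, 24, 35, 42);
        System.out.println("in-order:   " + inorder(bushy));
        System.out.println("pre-order:  " + preorder(bushy));
        System.out.println("post-order: " + postorder(bushy));
        System.out.println("bushy height:  " + height(bushy));
        System.out.println("skewed height: " + height(skewed));
    }
}
```

Both trees hold the same values and give the same in-order walk, but adding the values in ascending order produces a tree that is really just a linked list leaning to the right.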
Checkpoint 5.1: Once you are satisfied that these methods work, show the code for your traversal
methods and the test results to a demonstrator.
• designing your own code to make use of an existing API, and the importance of meeting the
API’s contract
• we’ll run some experiments to see the performance difference that arises from taking
shortcuts
• we’ll also have brief introductions to “random” numbers, Java annotations, and Java foreach
loops
I’ve highlighted a few useful new concepts that are worth remembering as we go. You’ve
probably not seen these before, but if you have, you already have a bit of a head start!
This part is challenging, and you should expect to take some time to complete it. However, the
tasks of implementing code to match an API are exactly the kind of thing you might encounter in a
real programming job, so take your time and enjoy!
First up, we have an example BankAccount class to represent bank accounts. You’ll see that it’s
pretty simple. There are three attributes: an ID number, a name for the account holder, and a
balance (i.e. how much money is in the account). There is a public constructor to initialise these, and
an implementation of toString() to allow a pretty version of the object to be displayed.
New Concept: the toString() method has @Override above it. This is called an
annotation. Annotations are used to tell the Java compiler about your intentions when writing
the code. This one tells Java that toString() should override a method inherited from a
superclass. “Wait,” I hear you say, “BankAccount doesn’t ‘extend’ anything!” Ah, but it does:
by default, all Java objects extend the “Object” class that comes included with Java, and this
defines several methods, including toString(), equals() and hashCode(). The
@Override annotation is really useful because it tells the compiler to check that a method really
does override something: if you accidentally typed tostring() (try it!) then you would have
made an all-new method, and not overridden the behaviour of Object.toString(); the
annotation means that the compiler would generate an error to help you spot the bug.
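As a rough illustration of the annotation at work, here is a small sketch. The nested Account class is a hypothetical stand-in for BankAccount; its field types and the format of the string are assumptions, not the code supplied with the practical.

```java
// Hypothetical stand-in for BankAccount; field types are assumptions.
class OverrideSketch {
    static class Account {
        private final int id;
        private final String name;
        private final double balance;

        Account(int id, String name, double balance) {
            this.id = id;
            this.name = name;
            this.balance = balance;
        }

        // @Override asks the compiler to check that this really replaces
        // Object.toString(). Renaming the method to tostring() while keeping
        // the annotation would be reported as a compile-time error.
        @Override
        public String toString() {
            return "Account[" + id + ", " + name + ", " + balance + "]";
        }
    }

    public static void main(String[] args) {
        // println calls toString() on the object for us.
        System.out.println(new Account(1, "Ada", 10.5));
    }
}
```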
Now take a look at the Experiment1 class: it only contains a main method, divided into three
parts. First it initialises some values, including a Random Number Generator (RNG). Then it creates
and fills an array with new BankAccount objects. Finally, it loops over this array and prints
summaries of the objects. Take a look and try to understand what is happening at each step. What
would you expect to see in the output?
New Concept: We want to generate a lot of data at random for our experiments, but
computers like precise instructions and aren’t very good at doing random things. An RNG uses
some maths to generate a sequence of numbers that look random (actually, this is called
pseudo-random). The sequence is completely predictable for a particular starting number
called a seed, but is close enough to “random” for most purposes. The seed is set like this: new
Random(1) and you’ll see that, if you try changing the number, you can make different lists of
BankAccount objects.
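The effect of the seed can be checked in isolation. This is only a sketch: the draw method and the choice of numbers below 100 are made up for illustration, and only java.util.Random itself comes from Java.

```java
import java.util.Arrays;
import java.util.Random;

class SeedSketch {
    // Draws 'count' pseudo-random numbers in the range 0..99 from a fresh RNG.
    static int[] draw(long seed, int count) {
        Random rng = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = rng.nextInt(100);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two generators built with the same seed produce the same sequence.
        System.out.println(Arrays.toString(draw(1, 5)));
        System.out.println(Arrays.toString(draw(1, 5)));
        // A different seed starts a different sequence.
        System.out.println(Arrays.toString(draw(2, 5)));
    }
}
```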
New Concept: You’ll see two kinds of for(…) loops here. Imagine there is a list of things; this
could be an array, a queue, a stack, or many other different Java types. If we wanted to do
something with each element in the list we’d have to have code like this:
for (int i = 0; i < list.length; i++) {
    System.out.println(list[i]);
}
This is like the first loop we have in the Experiment1 class. The following code is an
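alternative for looping that is a little bit easier to read. For lists of Strings:
for (String item : list) {
    System.out.println(item);
}
Much nicer! This is a Java for-each loop. It is used for looping over the elements of an array or a
Collection. (We can read the above as, “for each String item in list”). Broken down we have: the
type of each element (String), a name for the current element (item), and the thing we are
looping over (list).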
This is the second loop that’s used, when we just want to look at all the accounts in turn.
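If you want to convince yourself that the two loop styles visit the same elements in the same order, a sketch like the following may help. The class and method names are invented for this example.

```java
class ForEachSketch {
    // Classic indexed loop: we manage the counter ourselves.
    static String joinIndexed(String[] list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.length; i++) {
            sb.append(list[i]).append(' ');
        }
        return sb.toString();
    }

    // For-each loop: read as "for each String item in list".
    static String joinForEach(String[] list) {
        StringBuilder sb = new StringBuilder();
        for (String item : list) {
            sb.append(item).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] list = {"alpha", "beta", "gamma"};
        System.out.println(joinIndexed(list));
        System.out.println(joinForEach(list));
    }
}
```

The for-each form is shorter, but you lose access to the index, so the indexed loop is still the one to use when you need to know the position of each element.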
HashSets
We’re now going to make use of a HashSet, a class provided as part of Java Collections. You can
see the documentation for this here:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/HashSet.html . A HashSet
is a hashtable implementation of a “set” - a collection in which no duplicate objects are allowed (a
duplicate is when object1.equals(object2)). Items are grouped into buckets using their
hashcode, and checked against existing items in the HashSet. Attempting to add an item that is
already stored will do nothing. This is described in the documentation for the add method here:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/HashSet.html#add(E)
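The “no duplicates” behaviour is easy to try out with Strings, which already have sensible equals() and hashCode() methods. This is a small sketch with invented names; only HashSet and its add and size methods come from Java.

```java
import java.util.HashSet;
import java.util.Set;

class SetSketch {
    // Adds every item to a fresh HashSet and reports how many were kept.
    static int distinctCount(String... items) {
        Set<String> set = new HashSet<>();
        for (String item : items) {
            set.add(item);
        }
        return set.size();
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        // add() returns true if the set changed, false if the item was already there.
        System.out.println(names.add("alice"));
        System.out.println(names.add("bob"));
        System.out.println(names.add("alice"));
        System.out.println(names.size());
    }
}
```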
There is a large block of commented-out code in the Experiment1 class. Uncomment this. It will
create another array of BankAccount objects. You’ll see that it creates a new RNG using the same
seed as before, so these objects are identical to those in the first array: they will have the same IDs,
names and balances. You can check this by changing the print loop at the bottom of the class to
iterate over accounts2.
We then create a HashSet, and use a for loop to fill it with the BankAccount objects in the original
accounts array.
New Concept: Java Generics. You’ll see that after HashSet is <BankAccount>. This tells
Java that you want a HashSet for containing BankAccount objects. You don’t need to
worry about the details here, but it makes for much cleaner code and again gives the compiler
more information about your intentions.
So, we now have a HashSet filled with BankAccount objects. Let’s use our array of identical
BankAccount objects to check whether the HashSet contains them... Make sure that the final
“print summaries” loop is over accounts2. Then, within the loop, after the existing
System.out.println(), add this:
System.out.println(accountSet.contains(ba));
This will print true or false for each of the accounts. What values do you expect to see here? Will
they all be true, all false, or a mixture? Run it and see.
The reason we are seeing what we do is that we haven’t overridden equals() for our
BankAccount objects. In the Object class this method only returns true if the objects are the
same object, not just objects containing the same values. So, when the HashSet checks the
accounts2 objects against the objects that came from accounts it always comes up as false.
They are the same in one sense (the IDs), but different in another (the specific objects)!
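Here is the same situation boiled down to two objects. The Account class below is a hypothetical, stripped-down stand-in for BankAccount with neither equals() nor hashCode() overridden.

```java
import java.util.HashSet;
import java.util.Set;

class IdentitySketch {
    // Hypothetical stand-in: inherits equals() and hashCode() from Object.
    static class Account {
        final int id;

        Account(int id) {
            this.id = id;
        }
    }

    // Stores one account, then looks for a different object with the same ID.
    static boolean findsCopy() {
        Set<Account> set = new HashSet<>();
        set.add(new Account(42));
        return set.contains(new Account(42));
    }

    public static void main(String[] args) {
        Account original = new Account(42);
        Account copy = new Account(42);
        System.out.println(original.equals(original)); // same object
        System.out.println(original.equals(copy));     // same ID, different object
        System.out.println(findsCopy());
    }
}
```

Because Object.equals() only compares identity, the copy can never match the stored object, whatever bucket it happens to land in.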
To fix this, let’s implement equals() in BankAccount, so Java knows that two BankAccounts
with the same ID are to be treated the same. An implementation is provided, so you can just
uncomment it. BankAccount objects are now regarded as equal as long as they have the same ID
number. What do you expect to see if we run the code now? Will all the values be true now? Re-run
Experiment1 to find out.
Okay, there is another problem! All, or most, of those BankAccount objects are going missing.
This might seem like the basis of a money-making scam, but let’s stay on the right side of the law.
Let’s look at the documentation for equals():
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#equals(java.lang.Object)
It says: “Note that it is generally necessary to override the hashCode
method whenever this method is overridden, so as to maintain the general contract for the hashCode
method, which states that equal objects must have equal hash codes.”
What has happened is that the objects are still using the hashCode() implementation from the
Object class, but are now using the equals() method we’ve added to BankAccount. We’ve
broken the contract specified in the documentation! This means that equal objects are getting
different hashcodes, and so are ending up in different buckets in the hashtable inside HashSet.
Given an object, we don’t look at the whole hashtable for matching objects, we just look in the
appropriate bucket. So, we look in the wrong place and don’t see the matching objects.
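The broken contract can be reproduced with a tiny class that overrides equals() but not hashCode(). This Account class is a made-up stand-in, not the provided BankAccount. The hash codes printed come from Object and will vary between runs, so no particular output is promised for the last two lines.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenContractSketch {
    // Hypothetical stand-in: equals() is overridden, hashCode() is not.
    static class Account {
        final int id;

        Account(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Account)) return false;
            return id == ((Account) other).id;
        }
        // hashCode() is deliberately left as the version inherited from Object.
    }

    public static void main(String[] args) {
        Account a = new Account(42);
        Account b = new Account(42);
        Set<Account> set = new HashSet<>();
        set.add(a);

        // equals() says the two objects match...
        System.out.println("equals:     " + a.equals(b));
        // ...but their hash codes will almost certainly differ,
        System.out.println("hash codes: " + a.hashCode() + " vs " + b.hashCode());
        // so the lookup usually searches the wrong bucket.
        System.out.println("contains:   " + set.contains(b));
    }
}
```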
Let’s fix hashCode() so it is based on the same variable as equals(). Uncomment the
hashCode implementation. Rerun Experiment1. What happens now?
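For comparison, here is a sketch of a class that keeps the contract by basing both methods on the ID alone. It is one reasonable way to do it, written with invented names; the implementation provided in BankAccount may look different.

```java
import java.util.HashSet;
import java.util.Set;

class ContractSketch {
    // Hypothetical stand-in for BankAccount: identity is decided by 'id' only.
    static class Account {
        final int id;
        final String name;

        Account(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Account)) return false;
            return id == ((Account) other).id;
        }

        // Uses the same field as equals(), so equal objects get equal hash codes.
        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Stores one account and looks up another object carrying the given ID.
    static boolean lookup(int storedId, int queryId) {
        Set<Account> set = new HashSet<>();
        set.add(new Account(storedId, "stored"));
        return set.contains(new Account(queryId, "query"));
    }

    public static void main(String[] args) {
        System.out.println(lookup(42, 42)); // same ID
        System.out.println(lookup(42, 43)); // different ID
    }
}
```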
Try changing the RNG seed for the second array of BankAccount objects – what happens?
[Checkpoint 5.2] Show the demonstrator your implementation of hashCode(). Run Experiment1
once with the two RNG seeds being the same: all the items should have “true” printed after
them, showing they were correctly found in the set. Then run it with different seeds: they should
all (or mostly) be false.
So, now we have the basics. We can create a lot of BankAccount objects, and put them into a
HashSet. Let’s see the impact of good and bad hashing functions on performance. Experiment2
contains the code for a simple experiment. It will generate a large number of BankAccount
objects and put them into a HashSet. It then generates another large number of BankAccount
objects; half of them are equal to the first ones, and half are new. It measures the time taken to
check whether these are already in the HashSet. This is divided by the number of objects, so the
number displayed on the console is how long, per object, the checking takes. We might expect the
time to check each object to go up as the HashSet has more objects in it – what do you think will
happen? Open up Excel – we are going to record just how slow this thing can get.
1. Try running Experiment2 and recording the values of N and the time per object.
2. Double the value of numberOfAccounts, run Experiment2 again and record the values.
3. Repeat a few times. Each run takes longer because creating more objects takes more time.
You might find that your limit is around 8,000,000 objects (possibly due to memory limitations).
4. Make a plot of time vs N. What trend do you see?
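If you are curious how a measurement like this can be put together, the following is a rough sketch of the idea and not the code in Experiment2. It uses Integer values in place of BankAccount objects, and the method name nanosPerLookup is invented. Timings from a quick harness like this are noisy, so treat the numbers as indicative only.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class TimingSketch {
    // Fills a HashSet with n random values, then times n lookups, of which
    // roughly half are values known to be in the set.
    static double nanosPerLookup(int n, long seed) {
        Random rng = new Random(seed);
        Set<Integer> set = new HashSet<>();
        int[] stored = new int[n];
        for (int i = 0; i < n; i++) {
            stored[i] = rng.nextInt();
            set.add(stored[i]);
        }

        // Even positions reuse a stored value; odd positions are fresh random values.
        int[] queries = new int[n];
        for (int i = 0; i < n; i++) {
            queries[i] = (i % 2 == 0) ? stored[i] : rng.nextInt();
        }

        int hits = 0;
        long start = System.nanoTime();
        for (int q : queries) {
            if (set.contains(q)) hits++;
        }
        long elapsed = System.nanoTime() - start;

        // Sanity check: every reused value must have been found.
        if (hits < n / 2) {
            throw new IllegalStateException("lookups went missing");
        }
        return (double) elapsed / n;
    }

    public static void main(String[] args) {
        // Double N each time, as in the steps above.
        for (int n = 100_000; n <= 1_600_000; n *= 2) {
            System.out.println(n + "\t" + nanosPerLookup(n, 1));
        }
    }
}
```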
Imagine we had implemented hashCode() poorly and it tended to give the same value to all
objects. We can simulate this by just making BankAccount.hashCode() always return a zero.
(This doesn’t break the contract, because all objects that are equal to each other will also have the
same hashcode!) Repeat the experiment above. What trend do you see now? [If it seems to be
running but not doing anything, make numberOfAccounts smaller e.g. divide it by 10].
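The sketch below shows the same idea on a small scale. ZeroKey is a hypothetical class whose hashCode() always returns zero; it is not the practical's BankAccount. Lookups still give correct answers, because the contract is kept, but every object shares one bucket, so each lookup has far more candidates to compare against.

```java
import java.util.HashSet;
import java.util.Set;

class ZeroHashSketch {
    // Hypothetical key type with a legal but very poor hash function.
    static class ZeroKey {
        final int id;

        ZeroKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ZeroKey)) return false;
            return id == ((ZeroKey) other).id;
        }

        // Equal objects still get equal hash codes, so the contract holds,
        // but so do all unequal objects.
        @Override
        public int hashCode() {
            return 0;
        }
    }

    // Stores n keys, then times looking each one up again via a fresh object.
    static long nanosPerLookup(int n) {
        Set<ZeroKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new ZeroKey(i));
        }
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (set.contains(new ZeroKey(i))) hits++;
        }
        long elapsed = System.nanoTime() - start;
        if (hits != n) {
            throw new IllegalStateException("a stored key was not found");
        }
        return elapsed / n;
    }

    public static void main(String[] args) {
        // Kept deliberately small: this gets slow quickly.
        for (int n = 1_000; n <= 8_000; n *= 2) {
            System.out.println(n + "\t" + nanosPerLookup(n));
        }
    }
}
```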