0% found this document useful (0 votes)
79 views

Hashcode and Equals

The document discusses the hashCode and equals methods in Java. It explains what these methods do, how they are used for primitive values and objects like Strings. It also discusses how to properly implement the equals method for user-defined classes by overriding it and comparing the important fields.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
79 views

Hashcode and Equals

The document discusses the hashCode and equals methods in Java. It explains what these methods do, how they are used for primitive values and objects like Strings. It also discusses how to properly implement the equals method for user-defined classes by overriding it and comparing the important fields.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

hashCode

and equals
This article is part of Marcus Biel’s free Java course focusing on clean code principles. It concludes a
series where we go over all the methods of the java.lang.Object class. Since hashCode and
equals are two of java.lang.Object’s methods which follow a contract that binds them together, it
makes sense to talk about both methods together. In fact, they are bound in such a way that we cannot
implement one without the other and expect to have a well-written class. Knowing all the details about
these two methods is essential to becoming a better Java programmer. So what do these two methods
do?

equals method
The equals method is used to compare two 1 @Test
objects for equality, similar to the equals 2 public void primitivesShouldBeEqual() {
operator used for primitive values. In Listing 1, 3 int i = 4;
4 int j = 4;
we can see that CarTest will pass for 5 assertTrue(i == j);
primitivesShouldBeEqual, since the 6 }
equality operator == inherently works for Listing 1
primitives. The two primitive values i and j are
assigned respectively as follows, i = 4 and j = 4. Asserting with the equals operator succeeds and we
get a passing test, as expected.

In Listing 2, stringShouldBeEqual has two 1 @Test


distinct String reference variables, hello1 and 2 public void StringShouldBeEqual() {
hello2, with two distinct “Hello” String 3 String hello1 = "Hello";
4 String hello2 = "Hello";
constants assigned to them. Before we talk about 5 String hello3 = "H";
equals, let’s look at what would happen if we 6 hello3 = hello3 + "ello";
tried comparing these two Strings with the 7 System.out.println(hello3);
8 assertTrue(hello1.equals(hello3));
equality operator which is used for primitive 9 }
values. hello1 == hello2 would return true
without needing to use the equals method at all. Listing 2
This is because String values are optimized in the background for constants wherein there is actually
only one of them and both variables point to the same object in memory. The part of memory where
the variables hello1 and hello2 are stored is called a stack. On the other hand, the actual single
“Hello” object both variables point to is stored in another part of memory called the heap.

Going back to our equality operator scenario, things become a little more complicated as soon as we
add a new String variable where we initialize hello3 to “H” and re-assign hello3 to itself
concatenated with “ello”. hello1 == hello3 would evaluate to false. Printing them both out, we
would expect the hello1 == hello3 to evaluate to true. However, String has the same gotcha as

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
do other objects, i.e. we should not use the equality operator on objects, including Strings. Instead,
we use the equals method as is shown on Listing 2, line 8.

String’s equals method 1 public boolean equals(Object anObject) {


The String class has an equals 2 if (this == anObject) {
3 return true;
method as seen in Listing 3. First, it 4 }
checks if the equality operator works 5 if (anObject instanceof String) {
for the two Strings being compared. 6 String anotherString = (String) anObject;
7 int n = value.length;
This is actually checking to see if both 8 if (n == anotherString.value.length) {
variables share the same reference, i.e. 9 char v1[] = value;
pointing to the same String in 10 char v2[] = anotherString.value;
11 int i = 0;
memory – making it a performance
12 while (n-- != 0) {
optimization. 13 if (v1[i] != v2[i]) return false;
14 i++;
15 }
Afterwards, it checks if the second
16 return true;
object is actually a String. This is 17 }
achieved with the instanceof 18 }
19 return false;
operator. instanceof safely casts the
20 }
object to a String so that we can
compare both of them. An Listing 3
instanceof feature which many developers do not know about is that it includes a check for null. A
null object will never be a String and will never evaluate to true. Lastly, the method simply loops
through and compares each String’s characters for equality, returning true if and only if they are all
completely equal.

String is actually a subclass of Object and its equals method is overridden, making it work on
Strings like hello1 and hello3 from Listing 1 as expected.

User classes
Consider two instances 1 @Test
of a custom class Car in 2 public void porscheShouldBeEqual() {
Listing 4. It is an instance 3 Car myPorsche1 = new Car("Marcus", "Porsche", "silver");
4 Car myPorsche2 = new Car("Marcus", "Porsche", "silver");
of our Car class defined 5 myPorsche2 = myPorsche1;
in Listing 4 which 6 assertTrue(myPorsche1.equals(myPorsche2));
happens to be a silver 7 }
Porsche owned by Listing 4
Marcus. Both instances seem to have exactly the same attributes, so we would expect them to be the
same car. Using the equals operator on them however returns false. It doesn’t work and the test fails
because we have two distinct instances.

Consider the simple case where myPorsche1 == myPorsche1, comparing myPorsche1 to itself. This
comparison returns true as the equality operator inherently returns true whenever we compare an

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
object to itself. Moving on to myPorsche1 == myPorsche2, it would seem that we would need to use
the equals method again. In fact, running the assertion fails our test. So we try using equals by
asserting myPorsche1.equals(myPorsche2) only to find ourselves greeted with a failing test result.
This was working so nicely for Strings, so why is it not working with Car?

If we closely follow the execution of our code, we would discover that we eventually end up in the
Object class because our Car class does not define and override its own equals method. What
actually ends up happening internally is a comparison by the object with itself. In our case,
myPorsche1.equals(myPorsche2) would only compare the reference variables themselves and not
the actual Car objects. This comparison would invariably return false simply because we have two
different reference variables. At any rate, it would be possible to make this true again if we were to
assign myPorsche2 to myPorsche1. This would make it so the object initialized to myPorsche2 no
longer has a variable pointing to it. This would also make both variables point to the object myPorsche1
was initialized to. Running the comparison yet again, we would then have a successful test run because
we would have compared the same object with itself again. Unfortunately, this is not what we want.
Instead, we want to compare both objects in a way that makes sense for our program.

Let’s have a look at the Car class.


1 package com.marcusbiel.javacourse;
2
3 public class Car {
4 private String owner;
5 private String manufacturer;
6 private String color;
7
8 public Car(String owner, String manufacturer, String color) {
9 this.owner = owner;
10 this.manufacturer = manufacturer;
11 this.color = color;
12 }
13
14 @Override
15 public boolean equals(Object obj) {
16 return false;
17 }
18 }
Listing 5

Listing 5 shows a small simple class with a constructor, some attributes, and an equals method. Notice
that we do not have Car car as the parameter for our equals method. We cannot do this because of
Object’s original equals signature, so instead we want our parameter to be something like Object
obj. Later on we have to cast it to Car similar to what we saw earlier with the String class. Moreover,
it’s always a good idea to add the @Override annotation to ensure that we have properly overridden
the method. Our equals implementation is a stub that only returns false as a placeholder value
though, and we will eventually have to flesh out the details. It turns out however that implementing
equals is not so easy, so we will need to tackle some more theory before we continue.

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
Design
Most of the time, developers just skip the design of
this method and simply click on “auto-generate”
which in many cases leads to severe bugs or at least
suboptimal performance. To implement equals
correctly, however, we have to define what makes
two Car instances equal in a way that makes sense
for the context of the program it is used in. Does it
make sense in its context for two cars to be equal
when they have the same manufacturer? When
they have the same manufacturer and color? The same engine? Or the same number of wheels? The
same top speed? The same vehicle identification number (VIN)? Based on the context, we have to
decide which set of fields can be used to identify an object and which fields to compare are redundant.
Additionally, for improved performance, we need to define the order in which we want to compare the
attributes. Are there any fields with a high chance of being unequal? Do some fields compare faster than
others? Implementing a meaningful equals method requires analysis of these aspects in great detail.

In this example, we assume that the car’s top speed


relates to the type of the engine in some form,
making it a redundant field which is not so helpful
for implementing equals – so we remove it. Let’s say
our cars always have four wheels in our example.
The number of wheels would not help us to
differentiate between cars. What about the VIN?
This would again depend on the type of program we
want to create. For the police, this is probably the only proper way of identification. But what about a
program the manufacturer uses? While the car is still being built, the VIN would probably be less
important. We just cannot make broad assumptions about how we tell between cars. This is something
that should be verified with the business people involved with the program. Based on what we gather
from them, we select attributes from our two Car instances to compare and in what order. In our
upcoming example, we are the business analyst and the developer as well, so we will just go ahead and
define two cars to be different when they have different VINs.

The equals contract


First of all, there are five conditions that our equals implementation have to comply with. These are all
actually pretty straightforward, so let’s go through some examples and see what they imply:

Reflexivity
“An object must be equal to itself.”

This implies that myOldCar.equals(myOldCar) should always return true. This is pretty trivial but it
is still important to make sure we test for it.

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
Symmetry
“Two objects must agree whether or not they are equal.”

If myOldCar.equals(someOtherCar) returns true, then so should


someOtherCar.equals(myOldCar). This also sounds pretty obvious when it fact it isn’t so when it
comes to inheritance. If you have a Car class and a BMW subclass of Car, it follows that all BMWs are
cars but not every car is a BMW. This makes the rule quite tricky to follow. Make sure to cover each
contract condition with a dedicated unit test to make sure the class stays fully compliant with the
contract.

Transitivity
“If one object is equal to a second, and the second to a third, the first must be equal to the third.”

This rule is actually more straightforward than it actually looks. For instance, let’s say we have three
cars: carA, carB and carC. If carA == carB, and carB == carC, then carA == carC.

Consistency
“If two objects are equal, they must remain equal for all time, unless one of them is changed.”

This implies that two objects should not be altered in any way by the equals method. So when you
repeatedly compare the same two objects with the equals method, it should always return the same
result.

Null returns false


“All objects must be unequal to null.”

This last rule is what Josh Bloch, author of Effective Java, calls “Non-nullity”. When given a null as an
equals method parameter, we should always return false and never throw a
NullPointerException.

Collections and Hashes


In Java, similar objects are
usually put into a
“collection” for processing.
Java collections are like
more powerful arrays, or
“arrays on steroids”!
Among other things, it
allows us to look up objects not only based on position, but also by their specific values. This is where
the equals method comes into play. To speed up the lookup, the creators of Java added specific hash-
based containers which use the hash value as a grouping mechanism to reduce the number of equal
comparisons needed. Ideally, objects considered unequal by equals will return different hash codes
which are used to group objects in “buckets”. In this ideal case, we will be able to find each object
simply by a lookup based on its hash value.

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
However, there are “hash collisions” that come up
from two unequal objects sharing the same hash
code; in which case they end up in the same bucket.
If we are looking for, say, the instance dadsCar, we
have to look up the correct bucket based on the
hash code minus 391 that dadsCar will return.
With the hash collision however, we will have to do
an equals comparison on a list of two cars on top of
getting the hash codes.

The hashCode contract


Implementing hashCode comes with two rules. The first one being:

“For any two objects, return the same hash code when equals returns true.”

To achieve this, we use the same identifying attributes for both methods, in the same order. The second
condition is:

“When hashCode is invoked more than once and on the same object, it must consistently return the
same int value as long as the object is not changed.”

This rule is similar to the equals consistency rule introduced previously. Both equals and hashCode
methods must return consistent results.

To fulfill this contract you must overwrite hashCode whenever you overwrite equals and vice
versa. Also, when you add or remove attributes from your class you will have to adjust your equals and
hashCode methods. Last but not least, aim to return different hash codes when equals returns
false. Aiming to return different hash codes is not a hard and fast rule, but it will improve the
performance of your program by minimizing the number of hash collisions. Taken to its extreme,
the hashCode contract allows us to statically return a placeholder value such as 42 for all objects. As
Josh Bloch states in his book Effective Java however, this could result in quadratic rather than linear
execution time and therefore, could be the difference between working and not working.

The hashCode method is in fact a rather complicated beast. It is of such a level of complexity that
should there be new, vastly superior algorithms for hashing to a 32-bit integer as is the case with Java,
such a discovery would probably earn the highest honors and awards in computer science. Some of Josh
Bloch’s algorithms are some of the standards we use in Java for hashing as we will see in later examples.

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
hashCode and equals implemented

equals
Both equals and 1 public class Car {
hashCode use 2 private Vin vehicleIdentificationNumber;
manufacturer, engine 3 private Manufacturer manufacturer;
4 private Engine engine;
and color in their 5 private Color color;
implementation. These 6 private int numberOfWheels;
fields are arbitrarily chosen 7
8 @Override
as per the simulated design 9 public boolean equals(Object obj) {
phase in our theory 10 if (this == obj)
discussion because they 11 return true;
12 if (obj == null)
make sense for our business 13 return false;
scenarios. Consequently, we 14 if (getClass() != obj.getClass())
disregard using other fields 15 return false;
16
like numberOfWheels and
17 Car other = (Car) obj;
VIN for the two methods. 18
Note that the order in which 19 if (!manufacturer.equals(other.manufacturer))
20 return false;
they are compared is also
21 if (!engine.equals(other.engine))
arbitrary. 22 return false;
23 if (!color.equals(other.color))
For our scenario, we choose 24 return false;
to compare manufacturers 25 return true;
26 }
first, followed by engines, 27
and lastly by color. If we 28 @Override
compare two objects which 29 public int hashCode() {...}
30 }
are not equal, we want to Listing 6
leave the method as early as
possible because the earlier we leave, the faster the entire code will run. In our scenario, we know that
there is a plethora of manufacturers, so we can quickly return false for cars which are most likely to have
different manufacturers. For our example, engines are more likely to be equal as many manufacturers
use the same engines for many cars. Even though there are actually millions of colors, we make the
assumption that there are only so many paint jobs and finishes that cars can come in. This way, we avoid
the even higher probability of cars coming in the same color. What we end up having is a comparison
order where we optimize by starting from the most likely to be unequal to the least likely, i.e. comparing
manufacturers, then engines and lastly colors. Ideally, performance tests on actual sets of data will point
us towards the best implementation.

Lines 10-15 in Listing 6 come at the top for performance reasons as well. Firsly, we check if the object is
being compared to itself by comparing this to obj, as is shown on line 10. This can actually be
removed if we know that such a scenario would never happen. That being said, it is one of the cheaper
checks we can make. We then make a very important check for null on line 12, returning false when

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
passed it and fulfilling the last part of the equals contract. Doing so helps us avoid
NullPointerExceptions. Afterwards, we assert that we are comparing two Car instances. Failing
this, our cast would throw a ClassCastException. Now that we can cast obj to Car, we can
compare them by their private fields – which we can do because both are instances of the same class.
This allows us to easily say that engine.equals(other.engine), making the code much more
readable and shorter. For each pair of fields we compare, we can directly leave the method when they
are unequal. Only when we go through all the fields without returning false do we get to return
true. Note that we do not compare numberOfWheels as well as vehicleIdentificationNumber.
Firstly, this is because we have cars which always have four wheels in our example. Secondly, we could
have two physically identical cars, say both blue BMWs with the same engine, and we would still
consider them to be the same car in our program.

hashCode
We have learned so far that 1 @Override
implementing hashCode is 2 public int hashCode() {
quite complex. In order to 3 int result = 1;
4 result = 31 * result + manufacturer.hashCode();
understand the importance 5 result = 31 * result + engine.hashCode();
of the 31 constant, we need 6 return 31 * result + color.hashCode();
to go over a few details. 31 is 7
a prime number, can be Listing 7
multiplied very quickly using shift and subtraction as follows: 31 * i == (i << 5) – i, and
according to research, it provides a better key distribution which minimizes the number of hash
collisions. In Listing 7, 31 is successively multiplied with the accumulated result, adding in the hash codes
of the three fields we used earlier for equals. All of this optimizes our code performance by minimizing
the collisions.

As you can see, the method is actually not doing much other than delegating the calculation of the hash
code to the other classes representing the parts of our car. Out of these three classes, let’s have a
detailed look into Engine to see how hashCode is implemented in further detail.

The Engine Class


1 package com.marcusbiel.javacourse;
2
3 public class Engine {
4 private long type;
5 private String optionalField;
6
7 @Override
8 public boolean equals(Object obj) {
9 if (this == obj) return true;
10 if (obj == null) return false;
11 if (getClass() != obj.getClass()) return false;
12
13 Engine other = (Engine) obj;
14
15 if (type != other.type) return false;

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/
16 if (optionalField == null)
17 if (other.optionalField != null) return false;
18 else if (!optionalField.equals(other.optionalField))
19 return false;
20
21 return true;
22 }
23
24 @Override
25 public int hashCode() {
26 int result = 1;
27 result = 31 * result + (int) (type ^ (type >>> 32));
28 return 31 * result
29 + ((optionalField == null) ? 0 : optionalField.hashCode());
30 }
31 }
Listing 8

Engine’s equals method is similar to that of Car’s, a notable difference being the check for null on
the optionalField. We have to do some slightly more complicated null checks on both objects’
optionalField variables to prevent any NullPointerException. With optional primitive fields
however, we do not need to check for nulls. We do not make these same assumptions on required fields
such as manufacturer, engine and color, so we also do not make null checks for them. Doing so
would clutter the code with “rocket code”, making it extra safe but messy.

As for hashCode, we convert the long field variable type into an int as shown on Listing 8 line 27.
Since longs have 64 bits and ints only have 32, we halve type somehow to minimize collisions. To
understand how this is accomplished, we need to look at (int) (type ^ (type >>> 32)). First, we do
a bitwise shift with the operator >>> on type over 32 bits. This moves the first 32 bits of type over to
its last 32 bits. Second, we do an
1 @Override
“exclusive or” or “xor” with the ^ 2 public int hashCode() {
operator which effectively 3 int result = 0;
preserves and merges some of the 4 result = 31 * result + myByte;
5 result = 31 * result + myShort;
information from the two halves of 6 result = 31 * result + myInt;
type, ensuring that we minimize 7
collisions when we finally downcast 8 result = 31 * result
9 + (int) (myLong ^ (myLong >>> 32));
our long type to an int. Note 10 result = 31 * result
that this is the current standard for 11 + Float.floatToIntBits(myFloat);
hashing long fields. We do roughly 12
13 long temp = Double.doubleToLongBits(myDouble);
the same routine as with Car’s 14 result = 31 * result
hashCode and accumulate the 15 + (int) (temp ^ (temp >>> 32));
resulting hash. 16
17 result = 31 * result
someClass 18 + (myBoolean ? 1231 : 1237);
19 result = 31 * result + myChar;
Listing 9 shows a special example 20 return 31 * result + myString.hashCode();
which illustrates hashCode 21 }
implementation with primitive
Listing 9
© 2016, Marcus Biel, Software Craftsman
http://www.marcus-biel.com/
types in more detail. We see the same standard long field hashing being used. Floats are handled
differently in that they are converted to integer bits with a native helper function before being
accumulated into the hash. Double, having the same length as a long, is converted to a long with a
native helper function with the long being converted back and accumulated into the integer result.
Lines 17-18 look very different as they use some special syntax and two specifically chosen prime
constants, namely 1231 and 1237. Line 18 uses the ternary operator “?:” which works in a similar way
to an if-else statement, except that it is an expression that evaluates to a boolean. In this case, you
have myBoolean as the if condition where if true would return 1231, else which returns 1237. As for
1231 and 1237, these numbers are chosen for being large primes that minimize the number of hash
collisions. Lastly, char is basically a smaller integer number type, so it is trivial to accumulate it into the
hash.

equals
For primitive values, we simply use 1 @Override
the equality operator. We start with 2 public boolean equals(Object obj) {
the smallest possible values because 3 if (this == obj)
4 return true;
checking them has the highest chance 5 if (obj == null)
of making the code run faster across 6 return false;
multiple runs, i.e. we go from byte, 7 if (getClass() != obj.getClass())
8 return false;
to short, to int, to long and so on. 9
Floating types are a little more 10 SomeClass other = (SomeClass) obj;
complex to compare. float and 11
12 if (myByte != other.myByte)
double have to be converted to int 13 return false;
and long respectively before we 14 if (myShort != other.myShort)
compare them. booleans are trivial 15 return false;
16 if (myInt != other.myInt)
to compare, and we could have
17 return false;
placed them at the top for better 18 if (myLong != other.myLong)
performance. 19 return false;
20 if (Float.floatToIntBits(myFloat) !=
Thorough business analysis can reveal 21 Float.floatToIntBits(other.myFloat))
22 return false;
even better ways to arrange these 23 if (Double.doubleToLongBits(myDouble) !=
fields for performance. Again, 24 Double.doubleToLongBits(other.myDouble))
performance tests on live data are 25 return false;
26 if (myBoolean != other.myBoolean)
ultimately the best ways to eke out 27 return false;
the fastest possible runtimes when 28 if (myChar != other.myChar)
needed. Of course, all of these 29 return false;
30 return myString.equals(other.myString);
optimization techniques would only 31 }
be useful if performance were an
Listing 10
issue. Otherwise, the comparison
order probably doesn’t matter anyway.

© 2016, Marcus Biel, Software Craftsman


http://www.marcus-biel.com/

You might also like