Hashcode and Equals
This article is part of Marcus Biel’s free Java course focusing on clean code principles. It concludes a
series where we go over all the methods of the java.lang.Object class. Since hashCode and
equals are two of java.lang.Object’s methods which follow a contract that binds them together, it
makes sense to talk about both methods together. In fact, they are bound in such a way that we cannot
implement one without the other and expect to have a well-written class. Knowing all the details about
these two methods is essential to becoming a better Java programmer. So what do these two methods
do?
equals method
The equals method is used to compare two objects for equality, similar to the equality operator == used for primitive values. In Listing 1, we can see that CarTest will pass for primitivesShouldBeEqual, since the equality operator == inherently works for primitives. The two primitive values i and j are both assigned the value 4. Asserting with the equality operator succeeds and we get a passing test, as expected.

    @Test
    public void primitivesShouldBeEqual() {
        int i = 4;
        int j = 4;
        assertTrue(i == j);
    }

Listing 1
Going back to our equality operator scenario, things become a little more complicated with Strings. Suppose hello1 holds “Hello”, and we add a new String variable hello3 which we initialize to “H” and then re-assign to itself concatenated with “ello”. Printing them both out shows the same text, so we would expect hello1 == hello3 to evaluate to true. It evaluates to false, however, because String has the same gotcha as any other object type: the equality operator compares references, not contents. String is actually a subclass of Object and its equals method is overridden, making hello1.equals(hello3) work on Strings like hello1 and hello3 as expected.
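The String behaviour described above can be reproduced with a minimal sketch. The class name StringEqualityDemo and the helper buildHello are made up for illustration, and hello1 is assumed to hold the literal "Hello".

```java
class StringEqualityDemo {
    // Builds "Hello" at run time. Because hello3 is not a compile-time
    // constant, the concatenation creates a new String object.
    static String buildHello() {
        String hello3 = "H";
        hello3 = hello3 + "ello";
        return hello3;
    }

    public static void main(String[] args) {
        String hello1 = "Hello";
        String hello3 = buildHello();
        System.out.println(hello1 == hello3);      // compares references
        System.out.println(hello1.equals(hello3)); // compares characters
    }
}
```

The first comparison looks at the two references, the second at the character content, which is why only the second one matches what we see when printing both values.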
User classes
Consider two instances of a custom class Car in Listing 4, myPorsche1 and myPorsche2. Each is an instance of our Car class which happens to be a silver Porsche owned by Marcus. Both instances seem to have exactly the same attributes, so we would expect them to be the same car. Without the reassignment myPorsche2 = myPorsche1, however, comparing them with either the equality operator or the inherited equals method returns false. It doesn’t work and the test fails because we have two distinct instances.

    @Test
    public void porscheShouldBeEqual() {
        Car myPorsche1 = new Car("Marcus", "Porsche", "silver");
        Car myPorsche2 = new Car("Marcus", "Porsche", "silver");
        myPorsche2 = myPorsche1;
        assertTrue(myPorsche1.equals(myPorsche2));
    }

Listing 4
Consider the simple case where myPorsche1 == myPorsche1, comparing myPorsche1 to itself. This comparison returns true, as the equality operator inherently returns true whenever we compare an object with itself.
If we closely follow the execution of our code, we would discover that we eventually end up in the Object class because our Car class does not define and override its own equals method. What actually ends up happening internally is a reference comparison: Object’s equals returns true only when an object is compared with itself. In our case, myPorsche1.equals(myPorsche2) would only compare the reference variables themselves and not the actual Car objects. This comparison would invariably return false simply because the two reference variables point to different objects. At any rate, it would be possible to make this true again if we were to assign myPorsche1 to myPorsche2. The object originally assigned to myPorsche2 would then no longer have a variable pointing to it, and both variables would point to the object myPorsche1 was initialized with. Running the comparison yet again, we would then have a successful test run because we would have compared the same object with itself again. Unfortunately, this is not what we want. Instead, we want to compare both objects in a way that makes sense for our program.
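The following sketch illustrates the inherited behaviour. PlainCar is a hypothetical stand-in for a Car class that does not override equals; its three String fields are an assumption based on the constructor arguments used in Listing 4.

```java
class PlainCar {
    // Hypothetical fields, mirroring the constructor arguments of Listing 4.
    final String owner;
    final String manufacturer;
    final String color;

    PlainCar(String owner, String manufacturer, String color) {
        this.owner = owner;
        this.manufacturer = manufacturer;
        this.color = color;
    }
    // equals is deliberately not overridden, so Object.equals is used.
}

class DefaultEqualsDemo {
    // Two distinct objects with identical attributes.
    static boolean twoInstancesEqual() {
        PlainCar myPorsche1 = new PlainCar("Marcus", "Porsche", "silver");
        PlainCar myPorsche2 = new PlainCar("Marcus", "Porsche", "silver");
        return myPorsche1.equals(myPorsche2);
    }

    // After the reassignment both variables refer to the same object.
    static boolean sameInstanceEqual() {
        PlainCar myPorsche1 = new PlainCar("Marcus", "Porsche", "silver");
        PlainCar myPorsche2 = new PlainCar("Marcus", "Porsche", "silver");
        myPorsche2 = myPorsche1;
        return myPorsche1.equals(myPorsche2);
    }
}
```

Only the second method returns true, and only because it compares an object with itself.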
Listing 5 shows a small simple class with a constructor, some attributes, and an equals method. Notice
that we do not have Car car as the parameter for our equals method. We cannot do this because of
Object’s original equals signature, so instead we want our parameter to be something like Object
obj. Later on we have to cast it to Car similar to what we saw earlier with the String class. Moreover,
it’s always a good idea to add the @Override annotation to ensure that we have properly overridden
the method. Our equals implementation is a stub that only returns false as a placeholder value
though, and we will eventually have to flesh out the details. It turns out however that implementing
equals is not so easy, so we will need to tackle some more theory before we continue.
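To see why the parameter type matters, here is a small sketch of what can go wrong when equals is declared with the class itself as its parameter. OverloadedCar and its single color field are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class OverloadedCar {
    private final String color; // hypothetical single attribute

    OverloadedCar(String color) {
        this.color = color;
    }

    // This overloads equals instead of overriding Object.equals(Object).
    // Adding @Override here would make the compiler reject the method.
    public boolean equals(OverloadedCar other) {
        return other != null && color.equals(other.color);
    }
}

class OverloadPitfallDemo {
    // Both operands are statically typed as OverloadedCar,
    // so the compiler picks the overload.
    static boolean directCall() {
        return new OverloadedCar("silver").equals(new OverloadedCar("silver"));
    }

    // Collections call equals(Object), which is still Object's version.
    static boolean viaCollection() {
        List<OverloadedCar> cars = new ArrayList<>();
        cars.add(new OverloadedCar("silver"));
        return cars.contains(new OverloadedCar("silver"));
    }
}
```

The direct call finds the two cars equal while the collection does not, which is exactly the kind of inconsistency the @Override annotation helps us catch at compile time.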
Reflexivity
“An object must be equal to itself.”
This implies that myOldCar.equals(myOldCar) should always return true. This is pretty trivial but it
is still important to make sure we test for it.
Transitivity
“If one object is equal to a second, and the second to a third, the first must be equal to the third.”
This rule is more straightforward than it looks. For instance, let’s say we have three cars: carA, carB and carC. If carA.equals(carB) and carB.equals(carC) both return true, then carA.equals(carC) must return true as well.
Consistency
“If two objects are equal, they must remain equal for all time, unless one of them is changed.”
This implies that two objects should not be altered in any way by the equals method. So when you
repeatedly compare the same two objects with the equals method, it should always return the same
result.
This last rule is what Josh Bloch, author of Effective Java, calls “Non-nullity”. When given a null as an
equals method parameter, we should always return false and never throw a
NullPointerException.
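The rules above can be expressed as simple checks. This is only a sketch: ContractCar and the helper class EqualsContractCheck are hypothetical names, and a real test suite would more likely express these checks as unit tests.

```java
class ContractCar {
    private final String manufacturer; // hypothetical attributes
    private final String color;

    ContractCar(String manufacturer, String color) {
        this.manufacturer = manufacturer;
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        ContractCar other = (ContractCar) obj;
        return manufacturer.equals(other.manufacturer)
                && color.equals(other.color);
    }

    @Override
    public int hashCode() {
        return 31 * manufacturer.hashCode() + color.hashCode();
    }
}

class EqualsContractCheck {
    // Reflexivity: an object must be equal to itself.
    static boolean reflexive(Object a) {
        return a.equals(a);
    }

    // Transitivity: a equals b and b equals c implies a equals c.
    static boolean transitive(Object a, Object b, Object c) {
        return !(a.equals(b) && b.equals(c)) || a.equals(c);
    }

    // Consistency: repeated calls on unchanged objects give the same result.
    static boolean consistent(Object a, Object b) {
        return a.equals(b) == a.equals(b);
    }

    // Non-nullity: comparing with null returns false and does not throw.
    static boolean nonNull(Object a) {
        return !a.equals(null);
    }
}
```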
The hashCode contract consists of two conditions. The first condition is:
“For any two objects, return the same hash code when equals returns true.”
To achieve this, we use the same identifying attributes for both methods, in the same order. The second
condition is:
“When hashCode is invoked more than once and on the same object, it must consistently return the
same int value as long as the object is not changed.”
This rule is similar to the equals consistency rule introduced previously. Both equals and hashCode
methods must return consistent results.
To fulfill this contract you must override hashCode whenever you override equals and vice versa. Also, when you add or remove attributes from your class you will have to adjust your equals and hashCode methods. Last but not least, aim to return different hash codes when equals returns false. This is not a hard and fast rule, but it will improve the performance of your program by minimizing the number of hash collisions. Taken to its extreme, the hashCode contract even allows us to statically return a placeholder value such as 42 for all objects. As Josh Bloch states in his book Effective Java, however, this could result in quadratic rather than linear execution time and could therefore be the difference between working and not working.
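As a sketch of the extreme case mentioned above, the hypothetical class ConstantHashCar returns 42 for every object. The contract is still fulfilled and hash-based collections still behave correctly, but every entry lands in the same bucket.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashCar {
    private final String color; // hypothetical single attribute

    ConstantHashCar(String color) {
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        return color.equals(((ConstantHashCar) obj).color);
    }

    // Legal, because equal objects trivially share a hash code,
    // but unequal objects always collide as well.
    @Override
    public int hashCode() {
        return 42;
    }
}

class ConstantHashDemo {
    static boolean setStillWorks() {
        Set<ConstantHashCar> cars = new HashSet<>();
        cars.add(new ConstantHashCar("silver"));
        cars.add(new ConstantHashCar("red"));
        cars.add(new ConstantHashCar("silver")); // rejected as a duplicate
        return cars.size() == 2 && cars.contains(new ConstantHashCar("red"));
    }
}
```

Correctness is preserved because the set falls back to equals within the shared bucket. The cost is that each lookup has to search among all colliding entries instead of jumping to a nearly empty bucket.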
The hashCode method is in fact a rather complicated beast. Hashing arbitrary objects to a 32-bit integer, as Java does, is hard enough that discovering a new, vastly superior algorithm would probably earn the highest honors and awards in computer science. Some of the algorithms Josh Bloch describes have become standard practice for hashing in Java, as we will see in later examples.
equals
Both equals and hashCode use manufacturer, engine and color in their implementation. These fields are arbitrarily chosen as per the simulated design phase in our theory discussion because they make sense for our business scenarios. Consequently, we disregard using other fields like numberOfWheels and VIN for the two methods. Note that the order in which they are compared is also arbitrary.

     1  public class Car {
     2      private Vin vehicleIdentificationNumber;
     3      private Manufacturer manufacturer;
     4      private Engine engine;
     5      private Color color;
     6      private int numberOfWheels;
     7
     8      @Override
     9      public boolean equals(Object obj) {
    10          if (this == obj)
    11              return true;
    12          if (obj == null)
    13              return false;
    14          if (getClass() != obj.getClass())
    15              return false;
    16
    17          Car other = (Car) obj;
    18
    19          if (!manufacturer.equals(other.manufacturer))
    20              return false;
    21          if (!engine.equals(other.engine))
    22              return false;
    23          if (!color.equals(other.color))
    24              return false;
    25          return true;
    26      }
    27
    28      @Override
    29      public int hashCode() {...}
    30  }

Listing 6

For our scenario, we choose to compare manufacturers first, followed by engines, and lastly by color. If we compare two objects which are not equal, we want to leave the method as early as possible because the earlier we leave, the faster the entire code will run. In our scenario, we know that there is a plethora of manufacturers, so we can quickly return false for cars which are most likely to have different manufacturers. For our example, engines are more likely to be equal as many manufacturers use the same engines for many cars. Even though there are actually millions of colors, we make the assumption that there are only so many paint jobs and finishes that cars can come in. This way, we avoid the even higher probability of cars coming in the same color. What we end up having is a comparison order where we optimize by starting from the most likely to be unequal to the least likely, i.e. comparing manufacturers, then engines and lastly colors. Ideally, performance tests on actual sets of data will point us towards the best implementation.
Lines 10-15 in Listing 6 come at the top for performance reasons as well. Firstly, we check if the object is being compared to itself by comparing this to obj, as is shown on line 10. This can actually be removed if we know that such a scenario would never happen. That being said, it is one of the cheaper checks we can make. We then make a very important check for null on line 12, returning false when obj is null.
hashCode
We have learned so far that implementing hashCode is quite complex. In order to understand the importance of the 31 constant, we need to go over a few details. 31 is a prime number, can be multiplied very quickly using shift and subtraction as follows: 31 * i == (i << 5) - i, and according to research, it provides a better key distribution which minimizes the number of hash collisions. In Listing 7, the accumulated result is successively multiplied by 31 before adding in the hash code of each of the three fields we used earlier for equals. All of this optimizes our code performance by minimizing the collisions.

    @Override
    public int hashCode() {
        int result = 1;
        result = 31 * result + manufacturer.hashCode();
        result = 31 * result + engine.hashCode();
        return 31 * result + color.hashCode();
    }

Listing 7
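The accumulation pattern and the shift identity can be tried out with the following sketch. HashedCar is a hypothetical simplification in which manufacturer, engine and color are plain Strings rather than the dedicated classes of the article.

```java
class HashedCar {
    // Simplified to Strings for this sketch; the article uses own classes.
    private final String manufacturer;
    private final String engine;
    private final String color;

    HashedCar(String manufacturer, String engine, String color) {
        this.manufacturer = manufacturer;
        this.engine = engine;
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        HashedCar other = (HashedCar) obj;
        return manufacturer.equals(other.manufacturer)
                && engine.equals(other.engine)
                && color.equals(other.color);
    }

    // Same fields, same order as in equals.
    @Override
    public int hashCode() {
        int result = 1;
        result = 31 * result + manufacturer.hashCode();
        result = 31 * result + engine.hashCode();
        return 31 * result + color.hashCode();
    }
}

class ThirtyOneDemo {
    // Holds for every int, including values that overflow.
    static boolean shiftIdentityHolds(int i) {
        return 31 * i == (i << 5) - i;
    }
}
```

For non-null fields, this accumulation yields the same value as java.util.Objects.hash(manufacturer, engine, color), which is a shorter way to write it when the small overhead of the varargs array is acceptable.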
As you can see, the method is actually not doing much other than delegating the calculation of the hash
code to the other classes representing the parts of our car. Out of these three classes, let’s have a
detailed look into Engine to see how hashCode is implemented in further detail.
Engine’s equals method is similar to Car’s, a notable difference being the check for null on the optionalField. We have to do some slightly more complicated null checks on both objects’ optionalField variables to prevent any NullPointerException. With optional primitive fields, however, we do not need to check for nulls because primitives can never be null. Required fields such as manufacturer, engine and color are assumed never to be null, so we do not make null checks for them either. Doing so would clutter the code with “rocket code”, making it extra safe but messy.
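Since the Engine listing itself is not reproduced here, the following is only a sketch of how such null handling might look. The class name SketchEngine, the String type of optionalField and the constructor are assumptions made for illustration.

```java
class SketchEngine {
    private final long type;            // required primitive field
    private final String optionalField; // may be null (type is an assumption)

    SketchEngine(long type, String optionalField) {
        this.type = type;
        this.optionalField = optionalField;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;

        SketchEngine other = (SketchEngine) obj;

        if (type != other.type)
            return false;
        // Null checks on both sides avoid a NullPointerException.
        if (optionalField == null)
            return other.optionalField == null;
        return optionalField.equals(other.optionalField);
    }

    @Override
    public int hashCode() {
        int result = 1;
        result = 31 * result + (int) (type ^ (type >>> 32));
        // A missing optional value contributes 0 to the hash.
        result = 31 * result
                + (optionalField == null ? 0 : optionalField.hashCode());
        return result;
    }
}
```

The standard library offers java.util.Objects.equals(a, b) for the same null-safe comparison, which would shorten the last part of equals.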
As for hashCode, we convert the long field variable type into an int as shown on Listing 8 line 27. Since longs have 64 bits and ints only have 32, we have to fold type into half its size in a way that minimizes collisions. To understand how this is accomplished, we need to look at (int) (type ^ (type >>> 32)). First, we do an unsigned right shift with the operator >>> on type over 32 bits. This moves the upper 32 bits of type into the position of its lower 32 bits. Second, we do an “exclusive or” or “xor” with the ^ operator, which merges information from the two halves of type so that both halves still influence the result when we finally downcast our long type to an int. Note that this is the current standard for hashing long fields. We do roughly the same routine as with Car’s hashCode and accumulate the resulting hash.

someClass

Listing 9 shows a special example which illustrates hashCode implementation with primitive types in more detail. We see the same standard long field hashing being used. Floats are handled differently in that they are converted to integer bits with the helper method Float.floatToIntBits before being accumulated into the hash. Double, having the same length as a long, is converted to a long with the helper method Double.doubleToLongBits, and that long is then folded into an int and accumulated into the result in the same way as a long field.

Lines 17-18 look very different as they use some special syntax and two specifically chosen prime constants, namely 1231 and 1237. Line 18 uses the ternary operator “?:”, which works in a similar way to an if-else statement, except that it is an expression that yields a value. In this case, myBoolean is the condition: if it is true the expression yields 1231, otherwise it yields 1237. As for 1231 and 1237, these are the same two primes that java.lang.Boolean uses for its own hash codes, chosen to minimize the number of hash collisions. Lastly, char is basically a smaller integer number type, so it is trivial to accumulate it into the hash.

     1  @Override
     2  public int hashCode() {
     3      int result = 0;
     4      result = 31 * result + myByte;
     5      result = 31 * result + myShort;
     6      result = 31 * result + myInt;
     7
     8      result = 31 * result
     9              + (int) (myLong ^ (myLong >>> 32));
    10      result = 31 * result
    11              + Float.floatToIntBits(myFloat);
    12
    13      long temp = Double.doubleToLongBits(myDouble);
    14      result = 31 * result
    15              + (int) (temp ^ (temp >>> 32));
    16
    17      result = 31 * result
    18              + (myBoolean ? 1231 : 1237);
    19      result = 31 * result + myChar;
    20      return 31 * result + myString.hashCode();
    21  }

Listing 9

© 2016, Marcus Biel, Software Craftsman
http://www.marcus-biel.com/
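The per-type hashing steps discussed above can be isolated in a small sketch. The class PrimitiveHashDemo and its method names are hypothetical. Since Java 8 the wrapper classes expose static hashCode methods (Long.hashCode, Double.hashCode, Boolean.hashCode) that compute the same values, so the hand-written versions here are mainly useful for understanding what happens.

```java
class PrimitiveHashDemo {
    // Folds the upper 32 bits of a long into its lower 32 bits.
    static int foldLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    // A double is first converted to its 64-bit pattern, then folded.
    static int hashDouble(double value) {
        long bits = Double.doubleToLongBits(value);
        return foldLong(bits);
    }

    // The ternary expression selects one of the two prime constants.
    static int hashBoolean(boolean value) {
        return value ? 1231 : 1237;
    }

    public static void main(String[] args) {
        // Only the upper half is set, yet the hash is not zero.
        System.out.println(foldLong(1L << 32));
        System.out.println(hashDouble(1.5) == Double.hashCode(1.5));
        System.out.println(hashBoolean(true) == Boolean.hashCode(true));
    }
}
```

Without the fold, a plain cast (int) value would discard the upper 32 bits entirely, so all longs that differ only in their upper half would collide.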
equals
For primitive values, we simply use the equality operator. We start with the smallest possible values because checking them has the highest chance of making the code run faster across multiple runs, i.e. we go from byte, to short, to int, to long and so on. Floating types are a little more complex to compare. float and double have to be converted to int and long respectively before we compare them. booleans are trivial to compare, and we could have placed them at the top for better performance.

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;

        SomeClass other = (SomeClass) obj;

        if (myByte != other.myByte)
            return false;
        if (myShort != other.myShort)
            return false;
        if (myInt != other.myInt)
            return false;
        if (myLong != other.myLong)
            return false;
        if (Float.floatToIntBits(myFloat) !=
                Float.floatToIntBits(other.myFloat))
            return false;
        if (Double.doubleToLongBits(myDouble) !=
                Double.doubleToLongBits(other.myDouble))
            return false;
        if (myBoolean != other.myBoolean)
            return false;
        if (myChar != other.myChar)
            return false;
        return myString.equals(other.myString);
    }

Listing 10

Thorough business analysis can reveal even better ways to arrange these fields for performance. Again, performance tests on live data are ultimately the best ways to eke out the fastest possible runtimes when needed. Of course, all of these optimization techniques would only be useful if performance were an issue. Otherwise, the comparison order probably doesn’t matter anyway.
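One reason for converting floating-point fields before comparing them is that the equality operator treats two special values in a way that does not fit the equals contract. The sketch below, with the hypothetical class FloatCompareDemo, contrasts the two approaches.

```java
class FloatCompareDemo {
    // Comparison as used in Listing 10: compare the bit patterns.
    static boolean bitsEqual(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    // Plain comparison with the equality operator.
    static boolean operatorEqual(float a, float b) {
        return a == b;
    }

    public static void main(String[] args) {
        // NaN is never == to anything, not even to itself.
        System.out.println(operatorEqual(Float.NaN, Float.NaN)); // false
        System.out.println(bitsEqual(Float.NaN, Float.NaN));     // true

        // Positive and negative zero are == but have different bits.
        System.out.println(operatorEqual(0.0f, -0.0f)); // true
        System.out.println(bitsEqual(0.0f, -0.0f));     // false
    }
}
```

With the equality operator, an object holding NaN in a float field would not even be equal to a copy of itself, which would break reflexivity. Comparing bit patterns avoids this and also keeps equals in line with a hashCode that is based on Float.floatToIntBits.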