
Java: Creating correct equals and hashCode methods


Equals and hashcode

Writing correct equals and hashCode methods is hard. Joshua Bloch’s Effective Java, a definitive tome for Java developers, devotes two full items to the topic. Getting these two methods right is important if you’re going to use your domain objects within Java collections, particularly hash-based ones such as HashMap and HashSet.

As writing equals and hashCode methods by hand is difficult and error-prone, there are a few different techniques to help the developer with the process.

We will be using the following simple class to explore this issue:

public class Point {
    private int x;
    private int y;

    public Point(int x, int y) {
      this.x = x;
      this.y = y;
    }

    public int getX() {
      return x;
    }

    public int getY() {
      return y;
    }
}
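Because this Point overrides neither method, it inherits Object’s identity-based equals and hashCode. Two points with the same coordinates are therefore treated as different objects. Here is a minimal sketch of what that means for a hash-based collection. The class name IdentityDemo is illustrative, and Point is copied in as a nested class so the example compiles on its own:

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityDemo {
    // Copy of the Point class above: no equals/hashCode overrides,
    // so it inherits Object's identity-based versions.
    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int getX() { return x; }
        int getY() { return y; }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);

        // Object.equals compares references, so logically equal points differ.
        System.out.println(a.equals(b));        // false

        Set<Point> points = new HashSet<>();
        points.add(a);
        // b is a different instance, so the lookup cannot match a.
        System.out.println(points.contains(b)); // false
    }
}
```

Both lines print false. Equality falls back to reference identity, which is rarely what a value class like Point wants.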

Library based solution

There is at least one library that supports the generation of equals and hashCode methods: the excellent Apache Commons Lang. There is a good explanation of how to use it here, but here’s a condensed version from the javadocs, with my annotations added in the comments:

// Nick: An example using reflection to determine all of the fields of the object;
// easiest to use, but as it uses reflection it will be slower than manually
// listing the fields
@Override
public boolean equals(Object obj) {
   return EqualsBuilder.reflectionEquals(this, obj);
}

// Nick: An example explicitly choosing which fields to include within the EqualsBuilder.
// Note that it still requires some knowledge of writing correct equals methods, so it's not as idiot-proof as the previous method
@Override
public boolean equals(Object obj) {
  if (!(obj instanceof MyClass)) {
    return false;
  }
  if (this == obj) {
    return true;
  }
  MyClass rhs = (MyClass) obj;
  return new EqualsBuilder()
                // Only use appendSuper if the superclass overrides equals; Object.equals
                // is identity-based and would make distinct-but-equal objects unequal
                .appendSuper(super.equals(obj))
                .append(field1, rhs.field1)
                .append(field2, rhs.field2)
                .append(field3, rhs.field3)
                .isEquals();
}

The HashCodeBuilder works similarly:

public class Person {
   String name;
   int age;
   boolean smoker;
   ...

   @Override
   public int hashCode() {
     // you pick a hard-coded, randomly chosen, non-zero, odd number
     // ideally different for each class
     return new HashCodeBuilder(17, 37).
       append(name).
       append(age).
       append(smoker).
       toHashCode();
   }

   // Nick: Alternatively, for the lazy (use this instead of the method above, not alongside it):
   @Override
   public int hashCode() {
      return HashCodeBuilder.reflectionHashCode(this);
   }

}

Using the library is a good approach, but it also introduces a dependency that may not otherwise be necessary. There are a lot of good classes in Apache Commons Lang, but if all you are using it for is EqualsBuilder and HashCodeBuilder, you’re probably better off avoiding the dependency. In this case, you can make your IDE do the heavy lifting for you.
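Even without an IDE, the builder’s approach is easy to reproduce by hand. Below is a minimal sketch for the Person example. It assumes the three fields shown above plus a constructor, both of which are illustrative additions. It follows the same multiply-and-add scheme with the 17 and 37 seeds, but it is not guaranteed to produce the same numbers as HashCodeBuilder:

```java
public class Person {
    private final String name;
    private final int age;
    private final boolean smoker;

    public Person(String name, int age, boolean smoker) {
        this.name = name;
        this.age = age;
        this.smoker = smoker;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        Person other = (Person) obj;
        // Compare exactly the fields that hashCode() uses below.
        return age == other.age
                && smoker == other.smoker
                && (name == null ? other.name == null : name.equals(other.name));
    }

    @Override
    public int hashCode() {
        // Multiply-and-add, in the spirit of new HashCodeBuilder(17, 37):
        // start from one odd seed and fold in each field with another odd multiplier.
        int result = 17;
        result = 37 * result + (name == null ? 0 : name.hashCode());
        result = 37 * result + age;
        result = 37 * result + (smoker ? 1 : 0);
        return result;
    }
}
```

The null check on name is the kind of detail the builder handles for you. When writing these methods by hand, it is the easiest thing to forget.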

IDE based code generation

Given that IDEs like NetBeans and Eclipse do such a good job of automatically creating things like getters/setters, constructors, etc., it’s no surprise that they can be used to generate equals/hashCode methods as well. Unfortunately, they are not perfect, which prompted me to write this post in the first place.

I will be focusing on NetBeans’ implementation of equals/hashCode code generation as of version 6.9 (the most recent version at the time of writing).

When you are in NetBeans and press Ctrl+I, the IDE provides a popup menu with options for methods that it can automatically generate for you.

Generate options in NetBeans 6.9

When you choose the equals() and hashCode() option, you are presented with the following screen (where the variables will differ depending on your class, obviously).

equals() and hashCode() generation dialog

After checking all of the checkboxes and pressing generate, the IDE inserts the following two snippets of code:

@Override
public boolean equals(Object obj) {
    if (obj == null) {
        return false;
    }
    if (getClass() != obj.getClass()) {
        return false;
    }
    final Point other = (Point) obj;
    if (this.x != other.x) {
        return false;
    }
    if (this.y != other.y) {
        return false;
    }
    return true;
}

@Override
public int hashCode() {
    int hash = 3;
    hash = 97 * hash + this.x;
    hash = 97 * hash + this.y;
    return hash;
}

Perfect. Great. The IDE has done all the work for you. It’s definitely more verbose than the Apache Commons solution, but at least there are no dependencies introduced into your code. If you change your class so as to introduce more variables you wish to consider for equality and hashCode, you should delete the generated methods and regenerate them.
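A quick way to convince yourself the generated methods behave correctly is to use Point as a hash key. Here is a minimal sketch. The class name GeneratedPointDemo is illustrative, and Point is nested with equals and hashCode logic equivalent to the generated code above, so the example compiles on its own:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GeneratedPointDemo {
    // Point with equals/hashCode equivalent to the NetBeans output above.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            final Point other = (Point) obj;
            return this.x == other.x && this.y == other.y;
        }

        @Override
        public int hashCode() {
            int hash = 3;
            hash = 97 * hash + this.x;
            hash = 97 * hash + this.y;
            return hash;
        }
    }

    public static void main(String[] args) {
        Map<Point, String> labels = new HashMap<>();
        labels.put(new Point(1, 2), "A");
        // A different but equal instance finds the same entry.
        System.out.println(labels.get(new Point(1, 2))); // A

        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // equal duplicate, so not added
        System.out.println(points.size());               // 1
    }
}
```

The lookup succeeds and the set keeps a single element because the two instances agree on both equals and hashCode.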

While this is functional, there are two main problems I have with this dialog:
* Multiple checkboxes
* No linkage between equals/hashCode

I will address each in turn.

Multiple checkboxes

There is no way to enable or disable all of the fields. Any time there are (potentially) a lot of checkboxes, you should give the user the option to toggle them all at once. You can see that the NetBeans designers did just this in the Generate Getters and Setters dialog in NetBeans 6.9.

Generate getter / setters

Here you can see a checkbox next to the Point class name, which toggles all of the child nodes’ checkboxes (all of the variables). This is pretty standard UI stuff; here is the same pattern at work in GMail and Google Docs.

GMail's select/deselect options
Google Docs' select/deselect options

This is not the end of the world, as the dialog does support keyboard navigation and toggling of the checkboxes via the space bar. It is a bizarre UI feature, though, as there is absolutely no indication of which of the two panes has focus, and thus which checkbox you’re about to toggle. Because I’m familiar with focus traversal, I guessed that Tab would shift the focus between the panes, but a novice would have no way of knowing that, and nothing on screen indicates it. In the following screenshot, note that it’s impossible to tell whether I’m about to toggle the x or the y variable.

What will happen when I press space?

Lack of coupling between the equals/hashCode methods

Usually coupling is considered a bad thing in programming. However, when creating equals and hashCode methods, it’s vital that the same fields be used in the construction of both methods. For instance, if you use the variables x and y to create the equals method, you should use exactly the variables x and y when constructing the hashCode method.
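To make the danger concrete, here is a minimal sketch of what mismatched selections could produce. In this sketch, equals compares only x while hashCode mixes in both x and y. The class names are illustrative, and the hashCode body mirrors the NetBeans-style generated code:

```java
import java.util.HashSet;
import java.util.Set;

public class MismatchDemo {
    // Hypothetical result of ticking only x in the equals column
    // but both x and y in the hashCode column.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            return this.x == ((Point) obj).x; // y is ignored here...
        }

        @Override
        public int hashCode() {
            int hash = 3;
            hash = 97 * hash + this.x;
            hash = 97 * hash + this.y;        // ...but used here
            return hash;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 3);

        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // false: contract broken

        Set<Point> points = new HashSet<>();
        points.add(a);
        // The "equal" point b is not found in the set.
        System.out.println(points.contains(b));           // false
    }
}
```

The lookup misses because OpenJDK’s HashMap, which backs HashSet, compares stored hash codes before it ever calls equals. Two objects that claim to be equal can therefore still miss each other.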

Why?

This post from bytes.com does a good job of explaining this:

Overriding the hashCode method.

The contract for the equals method should really have another line saying you must proceed to override the hashCode method after overriding the equals method. The hashCode method is supported for the benefit of hash based collections.

The contract

Again from the specs:

  • Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

So equal objects must have equal hashCodes. An easy way to ensure that this condition is always satisfied is to use the same attributes used in determining equality in determining the hashCode. You should now see why it is important to override hashCode every time you override equals.

That sentence from the last paragraph sums it up: “An easy way to ensure that this condition is always satisfied is to use the same attributes used in determining equality in determining the hashCode”. Thus it’s clear that the dialog should link the equals and hashCode columns, so that toggling a row in one column toggles the corresponding row in the other. Otherwise you can create situations that are guaranteed to violate the contract of equals/hashCode, nullifying the entire point of having the IDE generate these methods for you.

For instance, see the following screenshot:
Violation of contract, allowed by the GUI

The dialog will let you continue, blithely creating the erroneous methods with no warning; the mistake only manifests itself later as subtle bugs. Either the dialog should force you to choose the variables in tandem, or at the very least it should warn that choosing mismatched variables for the equals and hashCode methods can introduce bugs into the program.
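Until the dialog enforces this, a unit test can supply the missing warning. Below is a minimal sketch built around a hypothetical helper, obeysContract, which checks a single pair of objects. It is not part of the JDK or any library. Because it only examines the pairs you give it, a passing check is evidence rather than proof.

The two nested classes are also illustrative:

* SubsetHash uses a subset of the equals fields in hashCode, which is allowed.
* ExtraFieldHash uses a field that equals ignores.

```java
public final class HashContractCheck {

    // Hypothetical test helper: true when this pair obeys "equals is
    // symmetric, and equal objects have equal hash codes".
    static boolean obeysContract(Object a, Object b) {
        boolean ab = a.equals(b);
        if (ab != b.equals(a)) {
            return false; // equals is not symmetric for this pair
        }
        // Unequal objects may legitimately share a hash code,
        // so only equal pairs need matching hash codes.
        return !ab || a.hashCode() == b.hashCode();
    }

    // equals uses x and y, hashCode uses only x: allowed, because every
    // field in hashCode is also used by equals.
    static final class SubsetHash {
        final int x, y;
        SubsetHash(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof SubsetHash
                    && ((SubsetHash) o).x == x && ((SubsetHash) o).y == y;
        }

        @Override
        public int hashCode() { return 97 * 3 + x; }
    }

    // equals uses only x, hashCode uses x and y: the mismatch the dialog permits.
    static final class ExtraFieldHash {
        final int x, y;
        ExtraFieldHash(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ExtraFieldHash && ((ExtraFieldHash) o).x == x;
        }

        @Override
        public int hashCode() { return 97 * (97 * 3 + x) + y; }
    }

    public static void main(String[] args) {
        System.out.println(obeysContract(new SubsetHash(1, 2), new SubsetHash(1, 2)));         // true
        System.out.println(obeysContract(new ExtraFieldHash(1, 2), new ExtraFieldHash(1, 3))); // false
    }
}
```

The second check fails because the two ExtraFieldHash objects are equal according to equals but hash differently. That is exactly the combination the dialog lets through silently.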

Conclusion

I’ve investigated two ways of freeing the developer from the burden of implementing correct versions of equals and hashCode: the Apache Commons Lang library and the NetBeans IDE. I’ve also detailed problems in the UI design of the NetBeans dialogs for generating these two methods.

EDIT:
Thanks to Daniel for bringing Eclipse’s dialog to my attention.
Eclipse's dialog
As you can see, Eclipse uses a single list of fields for both equals and hashCode rather than separating them, which makes a lot more sense to me.

  1. August 27, 2010 at 9:40 pm

    A+ would read again

  2. Daniel
    August 30, 2010 at 1:26 pm

    Another suggestion would be to use a static analysis tool like FindBugs, which lists the lack of either hashCode / equals as a hard error. It isn’t smart enough to detect the bug you explicitly caused in NB, and frankly I’m surprised they even allow one to customize different fields for hash/equals. Eclipse uses a single list selection when creating its hash/equals, which should be the safest and most performance-acceptable code for most cases. For the rare occasions where there’s some pre-existing bias to the grouping of the data, or for CPU / memory efficiency on immutables, the hashCode() could be made differently, but that’s an exception, not the rule.

    • i82much
      August 30, 2010 at 2:36 pm

      Good point about the static analysis. Thanks for the tip on Eclipse – I’ve updated the post with a screenshot from that program’s dialog.

  3. November 11, 2010 at 6:06 am

    good post, bookmarking it for future reference.

    Thanks,
    Goutham

  4. November 28, 2012 at 10:59 pm

    In addition, IntelliJ IDEA has Generate equals() and hashCode() too 🙂

  5. Amir
    May 15, 2015 at 12:58 am

    Hi Nick,

    Thanks for posting this subject. I’m just wondering what’s the point here? Because the rule of thumb is that we always have to generate equals() and hashCode() for each field. Basically what you have brought here is problems of NetBeans UI when generating such methods. Which is bad UI design since you are able to generate equals() for field variable x but then hashCode() for variable y, which really brings the bugs and other stuff.

    So what I really recommend and what you also have stated, always use equals() and hashCode() for each field and not one field with equals() and other field with hashCode().

    Thanks

    • i82much
      May 26, 2015 at 9:50 pm

      We are in agreement. The point is that the bad user interface makes it easy to violate the contract of equals() and hashCode(). Not much else to say.

  6. Charlie Reitzel
    July 16, 2015 at 2:31 pm

    In my experience, in most cases only the unique identifier field(s) should be included in hashCode() and equals(). In fact, using non-identifier or any changing fields in either method can cause problems! Any developer using the Java Collections API should have a good feel for what the identifiers are for any collection. If not, it’s time to stop and think a bit until you do!

    For example, it is not uncommon for an object to remain in a collection for some extended period of time (e.g. more than 1 request in a web app). Suppose an object is referenced from a HashSet and its non-identifier fields change. Later on, when an attempt is made to remove the object from the collection, it will never be found, because its hash code now points to a different bucket in the hash table!

    Another very common mistake is to make the hash code somehow based on prime numbers or even to perform bit mixing on the values.

    First, the JRE has not used modulo of a prime number as the index of the hash bucket for a number of major versions (at least not since Java 5, if not earlier). Instead, the # of hash buckets is always a power of 2 and the lower N bits of the hash code are used as the bucket index.

    Second, before taking the lower N bits, the JRE itself does a good job of mixing the bits of the hashCode() value supplied by the application. So there is no need to do it twice! Just return the simplest value that is unique-ish. If you have an integral ID value, just use it as is. If you have a String or Object ID value, just return its hashCode(). Keep it simple and fast.

    Finally, as per the spec, it is not always required to include in hashCode() all of the fields used by equals(). But the reverse is true: all fields used in hashCode() MUST be used in equals(), i.e. the fields used by equals() MUST be a superset of those used by hashCode().

    Not so hard.

  7. i82much
    July 17, 2015 at 8:42 am

    @Charlie Reitzel, thanks for the informative reply. In general I’ve switched to programming with immutable objects, so the point you raise about including mutable fields in your equals/hashCode applies less to me. But it’s definitely a good point. I’ve seen bugs in production from people putting objects in a set and then mutating them through some other reference.

