Objects, equals() and hash-values

Figure 421. Hashing principle Slide presentation Create comment in forum
Hashing principle
I want the 12p one

Figure 422. Quickly identify by simple value Slide presentation Create comment in forum
  • Where is the blond haired guy?

  • I take the pink flower.

  • The 334.50$ cellular phone.


Figure 423. Hashing in Java and equals() Slide presentation Create comment in forum

Method hashCode(): Instance 0 ⇨ o.hashCode(), of type int.

  • Same value on repeated invocation

  • Objects with identical value with respect to equals() must have identical hash values:

    true == a.equals(b)a.hashCode() == b.hashCode().

  • Conversely: Two instances differing with respect to equals() may have identical hash values.

Consequence: equals() and hashCode() must be redefined simultaneously!


Figure 424. Rectangle equals(...) and hashCode() Slide presentation Create comment in forum
public class Rectangle {
    int width, height;
    @Override public boolean equals(Object o) {
        if (o instanceof  Rectangle) {
            return width == ((Rectangle) o).width
                    && height == ((Rectangle) o).height;
        } else {
            return false;
        }
    }
    @Override public int hashCode() {
        return width + height;
    }
}

Figure 425. Rectangle hash values Slide presentation Create comment in forum
width height hash value
1 3 4
2 2 4
5 5 10
2 7 9
4 9 13

Figure 426. Better hashCode() method Slide presentation Create comment in forum
public class Rectangle {
    int width, height;
...
    @Override public int hashCode() {
        return width + 13 * height;
    }
}

exercise No. 126

Choosing a good hashCode() method Create comment in forum

Q:

We consider:

public class TimePeriod {
  private final int hours, minutes, seconds;
  /**
   * A time period within a day e.g. 4 hours, 23 minutes and 11 seconds.
   *
   * @param hours Hours ranging from 0 to 23
   * @param minutes Minutes ranging from 0 to 59
   * @param seconds Seconds ranging from 0 to 59
   */
  public TimePeriod(final int hours, final int minutes, final int seconds) {
    this.hours = hours;
    this.minutes = minutes;
    this.seconds = seconds;
  }

  @Override
  public boolean equals(final Object o) {
    if (o instanceof TimePeriod) {
      final TimePeriod other = (TimePeriod) o;
      return hours == other.hours &&
          minutes == other.minutes &&
          seconds == other.seconds;
    } else {
      return false;
    }
  }
}

Which of the two hashCode() implementations is better with respect to quickly telling whether two TimePeriod instances are having the same value or not?

Method 1 Method 2
public class TimePeriod {
...
  @Override
  public int hashCode() {
    return seconds + 60 * (minutes + 60 * hours);
  }
}
public class TimePeriod {
...
  @Override 
  public int hashCode() {
    return seconds + minutes + hours;
  }
}

Tip

Excerpt from Object.hashCode():

However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

A:

According to Figure 423, “Hashing in Java and equals() part of a hashCode() methods contract is:

true == a.equals(b)a.hashCode() == b.hashCode().

An ideal/perfect hashCode() method in addition will return different values whenever two instances a and b differ in value with respect to the underlying equals() method:

false == a.equals(b)a.hashCode() != b.hashCode().

Combining these two statements a perfect hashCode() method will have the following property with respect to its corresponding equals() method:

a.equals(b) == (a.hashCode() == b.hashCode())

Method 1 serves this purpose: Its returns a given TimePeriod instance into its total amount of seconds equivalent. Example: (1 hour, 2 minutes, 5 seconds) amounts to (1 * 60 + 2) * 60 + 5 = 3725 seconds. Due to its specification an instance of TimePeriod cannot exceed 24 hours being equivalent to 24 * 60 * 60 = 86400 seconds. Thus a 4-byte int is safe with respect to overflow issues.

Conclusion: Whenever two instances of TimePeriod differ their method 1 hashCode() values will differ as well.

This does not hold with respect to method 2. Consider two instances (1 hour, 2 minutes, 3 seconds) vs. (3 hours, 2 minutes, 1 second) returning an identical hashCode() of 1 + 2 + 3 = 3 + 2 + 1 = 6. So method 2 requiring just two additions offers (slightly) better runtime performance at the expense of a higher hash value collision rate.

exercise No. 127

String and good hashCode() implementations. Create comment in forum

Q:

In the previous exercise we found an ideal hashCode() implementation:

public class TimePeriod {
...
  @Override
  public int hashCode() {
    return seconds + 60 * (minutes + 60 * hours);
  }
}

Is this possible for instances of String as well?

Tip

Consider the possible number of different strings.

A:

It is not possible to construct a perfect hashCode() method acting on arbitrary strings. A Java String consists of individual char elements each requiring two bytes. Considering strings of fixed length we have the following number of different strings:

Number of chars Number of possible strings
1 2 16
2 2 2 × 16
3 2 3 × 16

A four byte int only offers 2 32 different values. Thus even mapping just one- and two-Unicode character strings exceeds the number of different int values thus requiring different string instances being mapped to identical hash values. Consider for example:

Code Execution result
System.out.println("hashcode of AA: " + "Aa".hashCode());
System.out.println("hashcode of BB: " + "BB".hashCode());
hashcode of AA: 2112
hashcode of BB: 2112