## Objects, equals() and hash-values

No. 133

### Choosing a “good”hashCode() method

Q:

We consider:

public class TimePeriod {
private final int hours, minutes, seconds;
/**
* A time period within a day e.g. 4 hours, 23 minutes and 11 seconds.
*
* @param hours Hours ranging from 0 to 23
* @param minutes Minutes ranging from 0 to 59
* @param seconds Seconds ranging from 0 to 59
*/
public TimePeriod(final int hours, final int minutes, final int seconds) {
this.hours = hours;
this.minutes = minutes;
this.seconds = seconds;
}

@Override
public boolean equals(final Object o) {
if (o instanceof TimePeriod t) {
return hours == t.hours &&
minutes == t.minutes &&
seconds == t.seconds;
} else {
return false;
}
}
}

Which of the two hashCode() implementations is better with respect to quickly telling whether two TimePeriod instances are having the same value or not?

Method 1 Method 2
public class TimePeriod {
...
@Override
public int hashCode() {
return seconds + 60 * (minutes + 60 * hours);
}
}
public class TimePeriod {
...
@Override
public int hashCode() {
return seconds + minutes + hours;
}
}

### Tip

Excerpt from Object.hashCode():

However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

A:

According to Figure 412, “Hashing in Java and equals() part of a hashCode() methods contract is:

true == a.equals(b)a.hashCode() == b.hashCode().

A so called perfect hashCode() method in addition will return different values whenever two instances a and b differ in value with respect to the underlying equals() method:

false == a.equals(b)a.hashCode() != b.hashCode().

Combining these two statements a perfect hashCode() method will have the following property:

a.equals(b) == (a.hashCode() == b.hashCode())

Method 1 serves this purpose: Its returns a given TimePeriod instance into its total amount of seconds equivalent. Example: (1 hour, 2 minutes, 5 seconds) amounts to (1 * 60 + 2) * 60 + 5 = 3725 seconds. Due to its specification an instance of TimePeriod cannot exceed 24 hours being equivalent to 24 * 60 * 60 = 86400 seconds. Thus a 4-byte int is safe with respect to overflow issues.

Conclusion: Whenever two instances of TimePeriod differ their method 1 hashCode() values will differ as well.

This does not hold with respect to method 2. Consider two instances (1 hour, 2 minutes, 3 seconds) vs. (3 hours, 2 minutes, 1 second) returning an identical hashCode() of 1 + 2 + 3 = 3 + 2 + 1 = 6. So method 2 requiring just two additions offers (slightly) better runtime performance at the expense of a higher hash value collision rate.

### Note

Perfect hash functions are rare with respect to real world modeling problems. In the current example Timeperiod instances are limited by 24 hours, 59 minutes and 59 seconds. This limit is equal to 89999 seconds fitting well into the count of $2 32$ different int values.

No. 134

### String and good hashCode() implementations.

Q:

In the previous exercise we found a perfect hashCode() implementation:

public class TimePeriod {
...
@Override
public int hashCode() {
return seconds + 60 * (minutes + 60 * hours);
}
}

Does a perfect hash function exist for String instances as well?

### Tip

Consider the possible number of different strings.

A:

It is not possible to construct a perfect hashCode() method acting on strings. A Java String consists of individual char elements each requiring two bytes representing $2 16$ different characters. Depending on a string's length we have:

String length (number of characters) Number of different strings
1 $2 16$
2 $2 2 × 16$
3 $2 3 × 16$

Thus considering just the union of zero (empty), one- and two character strings we have $1 + 2 16 + 2 32$ possibilities exceeding the $2 32$ count of different int values. Thus hash value clashes are inevitable.

The JDK's String.hashCode() implementation already reveals conflicts for ASCII strings of length 2:

Code Execution result
System.out.println("hashcode of Aa: " + "Aa".hashCode());
System.out.println("hashcode of BB: " + "BB".hashCode());
hashcode of Aa: 2112
hashcode of BB: 2112