Where the hashCode legs grow

Again in Java interviews are asked about hashCode and equals ? And who among the interviewees will answer the question how the Object.hashCode() and System.identityHashCode() calculated? How expensive is calling these methods? How can they be accelerated in HotSpot JVM? I bet hardly anyone will give the correct answer. Is that who reads this article.

There is a common misconception that Object.hashCode returns the address of an object in memory. Once upon a time, it probably was. For example, Dalvik VM still uses an object address that is shifted 3 bits to the right. However, such an implementation is unsuccessful: first, successively allocated objects will have consecutive hash codes; secondly, the garbage collector can move objects in memory by changing their addresses.

It so happened that last week I was twice confronted with the topic of calculating the hash code, which prompted me to write a note. At first I caught the eye of an extremely unfairly amused comment . It is SSSurkv that quite correctly suggested that a random number generator is used to calculate Object.hashCode .
- How so? - you ask. - After all, the object hash code must remain constant throughout the life of the application.

That's right. The embedded hash code is generated only once for each object during the first call to the hashCode() method, and then stored in the object header for subsequent calls. But for the first time it uses random! Convince yourself by looking at the OpenJDK sources ( get_next_hash function).
')
Probably, I would have forgotten about this case if I had not recently encountered a real problem in a real project. Profiling the application, among the hot methods, I suddenly saw IdentityHashMap.put() , which, in my opinion, is implemented quite effectively. The bottleneck turned out to be System.identityHashCode() , on which IdentityHashMap relies. And only the first identityHashCode call on the object was slow. The second and subsequent calls, as we now know, take the stored value from the header.

But every cloud has a silver lining. The fact is that in HotSpot you can choose the implementation of Object.hashCode using the command line switch -XX:hashCode=n (where n is from 0 to 5).
0 - Park-Miller RNG (default)
1 - f (address, global_state)
2 - constant 1
3 - sequential counter
4 - object address
5 - Thread-local Xorshift
The most adequate, in my opinion, is the last one - it gives a good uniform distribution using only bit operations, and, which is important for competitive algorithms, does not touch global variables.
So, just by adding the JVM -XX:hashCode=5 key -XX:hashCode=5 , I magically accelerated my algorithm by 30%! Why this option has not yet been defaulted remains a mystery ...

Finally, a funny fact: Hotspot hashCode never returns 0, because 0 is considered a sign that the hash code for this object has not yet been generated:

 if (value == 0) value = 0xBAD ;

I hope now that you have learned the whole truth about hashCode , you can not only surprise your colleagues at the interview, but also make your algorithms more efficient.

Source: https://habr.com/ru/post/165683/

All Articles

Where the hashCode legs grow

More articles: