
Changes to String in Java 7

Hello. The recent events in Ukraine kept me away from Habr for a while, but things have more or less settled down, and as I returned to my usual working rhythm I remembered a couple of posts sitting in my drafts. With Java 8 already released this post may be somewhat outdated, but it would be a shame to let good material go to waste.
So, one evening, while optimizing yet another piece of code, I happened to glance into String and found that the class is not what it used to be. Since String is probably one of the most commonly used types, I think many readers will be interested in the changes.

Optimized String.split() method

The split method has become faster when its argument is a single character that is not a regex metacharacter. In that case the method does not use regular expressions at all and simply calls indexOf in a loop.
Before:
    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

After:
    public String[] split(String regex, int limit) {
        if (((regex.value.length == 1 &&
              ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) || ...)) {
            ...
            while ((next = indexOf(ch, off)) != -1) {
                ...
            }
            ...
            return result;
        }
        return Pattern.compile(regex).split(this, limit);
    }
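To make the idea of the fast path concrete, here is a minimal sketch of splitting on one literal character with indexOf. It is not the JDK source: the class CharSplitSketch and the helper splitOnChar are hypothetical names, and the limit parameter is left out (the sketch mimics only the limit = 0 behavior, where trailing empty strings are dropped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CharSplitSketch {

    // Hypothetical helper, not the JDK implementation: splits on a single
    // literal character by scanning with indexOf, without any regex.
    static String[] splitOnChar(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = input.indexOf(delimiter, off)) != -1) {
            parts.add(input.substring(off, next));
            off = next + 1;
        }
        // No delimiter found at all: return the input unchanged.
        if (off == 0) {
            return new String[] { input };
        }
        // Add the tail that follows the last delimiter.
        parts.add(input.substring(off));
        // Like split(regex) with limit 0, drop trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a,b,,c,,";
        // Both calls should print the same array.
        System.out.println(Arrays.toString(splitOnChar(line, ',')));
        System.out.println(Arrays.toString(line.split(",")));
    }
}
```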


Two fields removed

Starting with Java 7 update 6, two fields were removed from the String class:
    private int offset;
    private int count;

As you probably remember, these fields were used by the substring method. Their purpose was to keep substring cheap: instead of allocating a new character array, the new string pointed into the already existing array. In some situations this could cause a well-known memory leak, because a short substring kept the entire original array reachable. The trade-off is that substring now copies the characters it needs. In return, a String object is 8 bytes smaller and the leak problem is solved for good.
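The difference between the two layouts can be modeled with a small sketch. SharedString below is a hypothetical class written for illustration, not the real java.lang.String: substring imitates the old offset/count view, and copyingSubstring imitates the behavior after the fields were removed.

```java
import java.util.Arrays;

final class SharedSubstringSketch {

    // Hypothetical model of the old layout: a view into a shared char[].
    static final class SharedString {
        final char[] value;
        final int offset;
        final int count;

        SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Old style: O(1), but the whole backing array stays reachable
        // for as long as this small view is alive.
        SharedString substring(int begin, int end) {
            return new SharedString(value, offset + begin, end - begin);
        }

        // New style: copies only the requested characters, so the large
        // original array can be garbage collected independently.
        SharedString copyingSubstring(int begin, int end) {
            char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
            return new SharedString(copy, 0, copy.length);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        SharedString big = new SharedString(new char[1_000_000], 0, 1_000_000);
        SharedString shared = big.substring(0, 2);
        SharedString copied = big.copyingSubstring(0, 2);
        System.out.println("shared view keeps " + shared.value.length + " chars alive");
        System.out.println("copy keeps " + copied.value.length + " chars alive");
    }
}
```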

New hash algorithm

    private transient int hash32 = 0;

    int hash32() {
        int h = hash32;
        if (0 == h) {
            // harmless data race on hash32 here.
            h = sun.misc.Hashing.murmur3_32(HASHING_SEED, value, 0, value.length);
            // ensure result is not zero to avoid recalcing
            h = (0 != h) ? h : 1;
            hash32 = h;
        }
        return h;
    }

In place of the two removed fields, one new int field appeared: hash32. It stores the new hash of the string. The new hash is used, for example, in HashMap:
    transient int hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        ...
    }

The new hashing algorithm is supposed to improve the distribution of hashes for strings (I could not find out exactly how it is better than the existing one; perhaps someone can explain in the comments). The new hash function is disabled by default, and to enable it you need to set the “jdk.map.althashing.threshold” option. However, shortly after update 6 was released, it turned out that in a highly concurrent multithreaded environment creating HashMap instances became much slower than before the update. The cause was the sun.misc.Hashing.randomHashSeed() method: it uses Random internally, which in turn is built on AtomicLong, and contention on it hurt performance.
The bug was fixed in update 40.
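The source does not say what exactly the new function improves. One weakness of the standard String.hashCode() that is often mentioned in this context (an assumption about the motivation, not something confirmed above) is that colliding keys are trivial to construct. The sketch below only demonstrates that property of the existing hashCode; HashCollisionSketch and collidingStrings are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class HashCollisionSketch {

    // "Aa" and "BB" have the same hashCode, so any concatenation of these
    // two blocks of equal length collides as well. This builds 2^blocks
    // distinct strings that all share one hashCode.
    static List<String> collidingStrings(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String key : collidingStrings(3)) {
            // Every line should show the same hash value.
            System.out.println(key + " -> " + key.hashCode());
        }
    }
}
```

A seeded hash such as the one returned by hash32() would make such precomputed key sets much harder to prepare, because the seed is not known in advance.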

Java 8 Update

As I was told, the new hashing algorithm was removed in Java 8.

More optimization for split

It so happens that most developers rarely look inside the methods of standard classes, yet there is often plenty of room for optimization there. The split method is one example. In performance-critical sections of code, instead of:
    someString.split("[_,;,-]");

you can write:
    private static final Pattern PATTERN = Pattern.compile("[_,;,-]");
    PATTERN.split(someString);

and get roughly a twofold speedup of the split.
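Here is the same advice as a self-contained sketch. The pattern above is longer than one character, so it does not take the single-character fast path and String.split compiles it on every call; a static Pattern is compiled once. The class and method names are made up for illustration, and the actual speedup will depend on the input and the JVM, so measure it on your own data.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class PrecompiledSplitSketch {

    // Compiled once when the class is loaded and reused by every call.
    private static final Pattern SEPARATORS = Pattern.compile("[_,;,-]");

    // Compiles the regular expression again on each invocation.
    static String[] splitCompiledEachCall(String s) {
        return s.split("[_,;,-]");
    }

    // Reuses the precompiled pattern.
    static String[] splitPrecompiled(String s) {
        return SEPARATORS.split(s);
    }

    public static void main(String[] args) {
        String someString = "a_b,c;d-e";
        // Both variants should produce identical results.
        System.out.println(Arrays.toString(splitCompiledEachCall(someString)));
        System.out.println(Arrays.toString(splitPrecompiled(someString)));
    }
}
```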

Source: https://habr.com/ru/post/218961/

