Hello. The recent events in Ukraine pulled me away from Habr for a while, but things have more or less settled down, and as I got back into my usual working rhythm I remembered a couple of posts sitting in my drafts. With Java 8 now released, this post may already be somewhat outdated, but it would be a shame to let it go to waste.
So, one evening, while optimizing yet another piece of code, I happened to glance into String and found that the class is not what it used to be. Since String is probably one of the most commonly used types, I think many readers will be interested in the changes.
Optimized String.split() method
The split method has become faster when its argument is a single character that is not a regex metacharacter. In that case the method does not use a regexp at all and calls indexOf in a loop.
Before:
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
After:
public String[] split(String regex, int limit) {
    if (((regex.value.length == 1 &&
          ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) || ...)) {
        ...
        while ((next = indexOf(ch, off)) != -1) {
            ...
        }
        ...
        return result;
    }
    return Pattern.compile(regex).split(this, limit);
}
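To make the idea of the fast path concrete, here is a minimal sketch of splitting on a single character with indexOf in a loop. It is not the JDK source: the class FastSplitSketch and the method splitOnChar are hypothetical names, and the sketch only mimics the behaviour of split with limit 0 (trailing empty strings are dropped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the JDK implementation: a simplified version of
// the single-character fast path that scans with indexOf instead of a regex.
final class FastSplitSketch {

    static String[] splitOnChar(String s, char ch) {
        List<String> parts = new ArrayList<>();
        int off = 0;
        int next;
        // Jump from one delimiter to the next without any regex machinery.
        while ((next = s.indexOf(ch, off)) != -1) {
            parts.add(s.substring(off, next));
            off = next + 1;
        }
        if (off == 0) {
            // No delimiter found: the result is the input itself.
            return new String[] { s };
        }
        parts.add(s.substring(off)); // remaining tail after the last delimiter

        // Like split with limit 0, drop trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnChar("a,b,,c,,", ',')));
    }
}
```

For plain delimiters such as a comma, the sketch should return the same array as "a,b,,c,,".split(","), which is a convenient way to check it.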
Two fields removed
Starting with Java 7 update 6, two fields were removed from the String class:
private int offset;
private int count;
As you probably remember, these fields were used by the substring method. Their purpose was to make substring cheap: instead of allocating a new character array, the new string referenced the existing array with a different offset and count. In some situations this could cause a known memory leak, because a short substring kept the entire original array reachable. Now a String is 8 bytes smaller and the leak is gone for good, at the cost of substring copying the characters it needs.
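The difference between the two approaches can be shown with a toy model. This is only an illustration, not the real java.lang.String: the classes SharedStr and CopyStr are hypothetical and exist purely to contrast sharing the backing array with copying it.

```java
import java.util.Arrays;

// Toy model, not the real java.lang.String: contrasts a substring that
// shares the parent's array with one that copies only what it needs.
final class SubstringSketch {

    // Old style: a view into the parent's array via offset and count.
    static final class SharedStr {
        final char[] value;
        final int offset;
        final int count;

        SharedStr(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        static SharedStr of(String s) {
            return new SharedStr(s.toCharArray(), 0, s.length());
        }

        SharedStr substring(int begin, int end) {
            // Constant time and no copy, but the whole parent array
            // stays reachable as long as this small object is alive.
            return new SharedStr(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    // New style: every string owns exactly its own characters.
    static final class CopyStr {
        final char[] value;

        CopyStr(char[] value) {
            this.value = value;
        }

        static CopyStr of(String s) {
            return new CopyStr(s.toCharArray());
        }

        CopyStr substring(int begin, int end) {
            // Linear in the substring length, but the parent array
            // can be garbage collected independently.
            return new CopyStr(Arrays.copyOfRange(value, begin, end));
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }
}
```

In the shared variant a three-character substring of a huge string still holds a reference to the huge array; in the copying variant it holds an array of length three.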
New hash algorithm
private transient int hash32 = 0;

int hash32() {
    int h = hash32;
    if (0 == h) {
        ...
    }
    ...
}
In place of the two removed fields, one new int field appeared: hash32. It caches the new string hash. The new hash is used, for example, in HashMap:
transient int hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    ...
}
The new hashing algorithm is supposed to improve the distribution of hashes for strings (I could not find out how exactly it is better than the existing one; perhaps someone can explain in the comments). The new hash function is disabled by default, and to enable it you need to set the "jdk.map.althashing.threshold" system property. However, shortly after the release of update 6 it turned out that in a highly concurrent multithreaded environment creating HashMap instances became much slower than before the update. The cause was the sun.misc.Hashing.randomHashSeed() method: it uses Random internally, which in turn is based on AtomicLong, and that is what caused the performance problems.
The bug was fixed in update 40.
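The kind of contention described above can be reproduced in isolation. The sketch below is only an illustration of the effect and makes no claim about how the JDK actually fixed the bug: SeedContentionSketch and its run method are hypothetical, and ThreadLocalRandom is used simply as a contention-free generator to compare against a single shared Random.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: many threads drawing numbers from one shared Random
// all update the same AtomicLong seed, while ThreadLocalRandom keeps the
// state per thread and avoids that shared hot spot.
final class SeedContentionSketch {

    private static final Random SHARED = new Random();

    // Returns the elapsed time in nanoseconds for the whole run.
    static long run(int threads, int drawsPerThread, boolean useShared) {
        AtomicLong sink = new AtomicLong(); // keeps the JIT from removing the loop
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long local = 0;
                for (int j = 0; j < drawsPerThread; j++) {
                    local += useShared
                            ? SHARED.nextInt()
                            : ThreadLocalRandom.current().nextInt();
                }
                sink.addAndGet(local);
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("shared Random:     " + run(threads, 1_000_000, true) + " ns");
        System.out.println("ThreadLocalRandom: " + run(threads, 1_000_000, false) + " ns");
    }
}
```

The timing here is naive (no warm-up, a single run), so treat the numbers as a rough indication only; a proper comparison would need a benchmark harness.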
Java 8 Update
As I was told, the new hashing algorithm was removed in Java 8.
More optimization for split
Most developers rarely look inside the standard library classes, yet knowing what happens in there often reveals room for optimization. The split method is a case in point. In performance-critical code, instead of:
someString.split("[_,;,-]");
you can write:
private static final Pattern PATTERN = Pattern.compile("[_,;,-]");

PATTERN.split(someString);
and get roughly a twofold speedup of the split call, because the pattern is compiled once instead of on every invocation.
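A small self-contained comparison of the two variants might look like the sketch below. The class SplitComparisonSketch and its method names are hypothetical, and the timing loop is deliberately naive (no warm-up, no benchmark harness), so it only gives a rough feel for the difference rather than a reliable measurement.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: both variants produce the same result, but the second one
// compiles the regex once instead of on every call.
final class SplitComparisonSketch {

    private static final Pattern PATTERN = Pattern.compile("[_,;,-]");

    static String[] splitAdHoc(String s) {
        // The multi-character regex misses the fast path, so the
        // pattern is compiled again on each invocation.
        return s.split("[_,;,-]");
    }

    static String[] splitPrecompiled(String s) {
        return PATTERN.split(s);
    }

    public static void main(String[] args) {
        String input = "a_b,c;d-e";
        System.out.println(Arrays.toString(splitPrecompiled(input)));

        int iterations = 1_000_000;
        int sink = 0; // consume results so the loops are not optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += splitAdHoc(input).length;
        }
        long adHocNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += splitPrecompiled(input).length;
        }
        long precompiledNanos = System.nanoTime() - t1;

        System.out.println("ad hoc:      " + adHocNanos + " ns");
        System.out.println("precompiled: " + precompiledNanos + " ns");
        System.out.println("(sink = " + sink + ")");
    }
}
```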