Hello! In this post I will show how to speed up the creation of org.springframework.util.ConcurrentReferenceHashMap with very little effort.
Interested in improving performance? Welcome!
We begin, of course, with measurements, trying to understand exactly what we are going to improve. For this we take JMH 1.21, JDK 8 and JDK 11, as well as async-profiler.
To find out how long it takes to create an empty map, let's set up a simple experiment:
@Benchmark
public Object original() {
    return new ConcurrentReferenceHashMap();
}
The profile looks like this:
55.21%  2429743  o.s.u.ConcurrentReferenceHashMap.calculateShift
20.30%   891404  o.s.u.ConcurrentReferenceHashMap$Segment.<init>
 8.79%   387198  o.s.u.ConcurrentReferenceHashMap.<init>
 3.35%   147651  java.util.concurrent.locks.ReentrantLock.<init>
 2.34%   102804  java.lang.ref.ReferenceQueue.<init>
 1.61%    70748  o.s.u.ConcurrentReferenceHashMap.createReferenceManager
 1.53%    67265  o.s.u.ConcurrentReferenceHashMap$Segment.createReferenceArray
 0.78%    34493  java.lang.ref.ReferenceQueue$Lock.<init>
 0.76%    33546  o.s.u.ConcurrentReferenceHashMap$ReferenceManager.<init>
 0.36%    15948  o.s.u.Assert.isTrue
The direction is clear, so we can proceed. As the profile shows, the lion's share of the time is spent in the calculateShift method. Here it is:
protected static int calculateShift(int minimumValue, int maximumValue) {
    int shift = 0;
    int value = 1;
    while (value < minimumValue && value < maximumValue) {
        value <<= 1;
        shift++;
    }
    return shift;
}
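To get a feel for what the method computes: it returns the number of left shifts needed for 1 to reach minimumValue, capped by maximumValue; in other words, the exponent of the smallest power of two that is at least minimumValue. A minimal standalone sketch (the method body is copied verbatim, the demo class is mine):

```java
public class CalculateShiftDemo {
    // Verbatim copy of the method under discussion, for illustration
    static int calculateShift(int minimumValue, int maximumValue) {
        int shift = 0;
        int value = 1;
        while (value < minimumValue && value < maximumValue) {
            value <<= 1;
            shift++;
        }
        return shift;
    }

    public static void main(String[] args) {
        // 16 is already a power of two: 1 << 4 = 16
        System.out.println(calculateShift(16, 1 << 30)); // 4
        // 17 rounds up to the next power of two: 1 << 5 = 32
        System.out.println(calculateShift(17, 1 << 30)); // 5
        // the second argument caps the result: here shifting stops at 16
        System.out.println(calculateShift(1 << 20, 16)); // 4
    }
}
```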
It is hard to improve anything in the method itself, so let's look at how it is used instead:
public ConcurrentReferenceHashMap(/*...*/ int concurrencyLevel, /*...*/) {
    //...
    this.shift = calculateShift(concurrencyLevel, MAXIMUM_CONCURRENCY_LEVEL);
    //...
}

// ConcurrentReferenceHashMap$Segment
public Segment(int initialCapacity) {
    this.referenceManager = createReferenceManager();
    this.initialSize = 1 << calculateShift(initialCapacity, MAXIMUM_SEGMENT_SIZE);
    this.references = createReferenceArray(this.initialSize);
    this.resizeThreshold = (int) (this.references.length * getLoadFactor());
}
Note how the Segment constructor is invoked:
int roundedUpSegmentCapacity = (int) ((initialCapacity + size - 1L) / size);
//...
for (int i = 0; i < this.segments.length; i++) {
    this.segments[i] = new Segment(roundedUpSegmentCapacity);
}
The value of roundedUpSegmentCapacity is constant throughout the loop, so the expression 1 << calculateShift(initialCapacity, MAXIMUM_SEGMENT_SIZE) executed in the Segment constructor will also always yield the same value. We can therefore move it out of the constructor and the loop.
The same holds for the expression (int) (this.references.length * getLoadFactor()): the references array is created from the initialCapacity argument, so its length is the same for every segment. This expression, too, can be hoisted out of the constructor and the loop.
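Putting both observations together, the hoisting idea can be sketched as follows. This is a simplified illustration of the approach, not the actual Spring patch; the segmentParams helper and its int[] return value are hypothetical, introduced only to keep the demo self-contained:

```java
public class HoistDemo {
    static final int MAXIMUM_SEGMENT_SIZE = 1 << 30;

    // Verbatim copy of calculateShift, needed to make the sketch runnable
    static int calculateShift(int minimumValue, int maximumValue) {
        int shift = 0;
        int value = 1;
        while (value < minimumValue && value < maximumValue) {
            value <<= 1;
            shift++;
        }
        return shift;
    }

    // Returns {initialSize, resizeThreshold}, computed once: both values are
    // identical for every segment, so there is no need to recompute them in
    // each Segment constructor call inside the loop.
    static int[] segmentParams(int initialCapacity, int segmentCount, float loadFactor) {
        int roundedUpSegmentCapacity = (int) ((initialCapacity + segmentCount - 1L) / segmentCount);
        int initialSize = 1 << calculateShift(roundedUpSegmentCapacity, MAXIMUM_SEGMENT_SIZE);
        int resizeThreshold = (int) (initialSize * loadFactor);
        return new int[] {initialSize, resizeThreshold};
    }

    public static void main(String[] args) {
        // e.g. a capacity of 64 spread over 16 segments: 4 slots per segment,
        // resize threshold 3 with a load factor of 0.75
        int[] params = segmentParams(64, 16, 0.75f);
        System.out.println(params[0] + " " + params[1]); // 4 3
    }
}
```

With such values precomputed, the loop only passes ready-made constants into each Segment.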
Now consider the createReferenceArray method:
private Reference<K, V>[] createReferenceArray(int size) {
    return (Reference<K, V>[]) Array.newInstance(Reference.class, size);
}
Using Array::newInstance here is clearly redundant: nothing prevents us from creating the array with an ordinary array-creation expression:
private Reference<K, V>[] createReferenceArray(int size) {
    return new Reference[size];
}
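The two allocations are interchangeable in result; the difference lies only in how the array is created (a reflective call versus a plain bytecode-level allocation). A self-contained comparison, using String instead of the map's Reference type for brevity:

```java
import java.lang.reflect.Array;

public class ArrayCreationDemo {
    public static void main(String[] args) {
        // Reflective allocation, as in the original createReferenceArray
        String[] viaReflection = (String[]) Array.newInstance(String.class, 10);
        // Plain allocation, as in the patched version
        String[] viaConstructor = new String[10];
        // Identical length and identical runtime type either way
        System.out.println(viaReflection.length == viaConstructor.length);         // true
        System.out.println(viaReflection.getClass() == viaConstructor.getClass()); // true
    }
}
```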
The plain allocation is not inferior in performance to the Array::newInstance call at the C2 level, but significantly outperforms it for small arrays in C1 mode (-XX:TieredStopAtLevel=1) and in the interpreter (-Xint):
// C2
             length  Mode  Cnt   Score     Error  Units
constructor      10  avgt   50     5.6 ±    0.0   ns/op
constructor     100  avgt   50    29.7 ±    0.1   ns/op
constructor    1000  avgt   50   242.7 ±    1.3   ns/op
newInstance      10  avgt   50     5.5 ±    0.0   ns/op
newInstance     100  avgt   50    29.7 ±    0.1   ns/op
newInstance    1000  avgt   50   249.3 ±    9.6   ns/op

// C1
             length  Mode  Cnt   Score     Error  Units
constructor      10  avgt   50     6.8 ±    0.1   ns/op
constructor     100  avgt   50    36.3 ±    0.6   ns/op
constructor    1000  avgt   50   358.6 ±    6.4   ns/op
newInstance      10  avgt   50    91.0 ±    2.4   ns/op
newInstance     100  avgt   50   127.2 ±    1.8   ns/op
newInstance    1000  avgt   50   322.8 ±    7.2   ns/op

// -Xint
             length  Mode  Cnt   Score     Error  Units
constructor      10  avgt   50   126.3 ±    5.9   ns/op
constructor     100  avgt   50   154.7 ±    2.6   ns/op
constructor    1000  avgt   50   364.2 ±    6.2   ns/op
newInstance      10  avgt   50   251.2 ±   11.3   ns/op
newInstance     100  avgt   50   287.5 ±   11.4   ns/op
newInstance    1000  avgt   50   486.5 ±    8.5   ns/op
The replacement will not show up in our benchmark, but it speeds up the code at application startup, before C2 has kicked in. More about this mode at the end of the article.
Let's return to the ConcurrentReferenceHashMap constructor:
ConcurrentReferenceHashMap(/*...*/) {
    Assert.isTrue(initialCapacity >= 0, "Initial capacity must not be negative");
    Assert.isTrue(loadFactor > 0f, "Load factor must be positive");
    Assert.isTrue(concurrencyLevel > 0, "Concurrency level must be positive");
    Assert.notNull(referenceType, "Reference type must not be null");
    this.loadFactor = loadFactor;
    this.shift = calculateShift(concurrencyLevel, MAXIMUM_CONCURRENCY_LEVEL);
    int size = 1 << this.shift;
    this.referenceType = referenceType;
    int roundedUpSegmentCapacity = (int) ((initialCapacity + size - 1L) / size);
    this.segments = (Segment[]) Array.newInstance(Segment.class, size);
    for (int i = 0; i < this.segments.length; i++) {
        this.segments[i] = new Segment(roundedUpSegmentCapacity);
    }
}
What is interesting for us here: replacing Array.newInstance with a plain array creation leads to a compilation error (generic array creation), so we pass it by. But the loop is very curious, more precisely, the access to the segments field. To see how destructive (at times) such an access can be for performance, I recommend Nitsan Wakart's article The volatile read surprise. The case described in that article seems to match the code in question. Let's focus on the segments:
this.segments = (Segment[]) Array.newInstance(Segment.class, size);
for (int i = 0; i < this.segments.length; i++) {
    this.segments[i] = new Segment(roundedUpSegmentCapacity);
}
Immediately after creation, the array is written to the ConcurrentReferenceHashMap.segments field, and it is this field that the loop interacts with. Inside the Segment constructor there is a write to the volatile field references:
private volatile Reference<K, V>[] references;

public Segment(int initialCapacity) {
    //...
    this.references = createReferenceArray(this.initialSize);
    //...
}
This means the access to the segments field cannot be optimized away; in other words, its contents are re-read on every iteration of the loop. How can we verify this claim? The simplest way is to copy the code into a separate package and remove volatile from the declaration of the Segment.references field:
protected final class Segment extends ReentrantLock {
    // private volatile Reference<K, V>[] references;
    private Reference<K, V>[] references;
}
Then check whether anything has changed:
@Benchmark
public Object original() {
    return new tsypanov.map.original.ConcurrentReferenceHashMap();
}

@Benchmark
public Object nonVolatileSegmentReferences() {
    return new tsypanov.map.nonvolatile.ConcurrentReferenceHashMap();
}
We observe a significant performance gain (JDK 8):
Benchmark                     Mode  Cnt  Score    Error  Units
original                      avgt  100  732.1 ±  15.8  ns/op
nonVolatileSegmentReferences  avgt  100  610.6 ±  15.4  ns/op
On JDK 11 the absolute times decreased, but the relative gap stayed about the same:
Benchmark                     Mode  Cnt  Score    Error  Units
original                      avgt  100  473.8 ±  11.2  ns/op
nonVolatileSegmentReferences  avgt  100  401.9 ±  15.5  ns/op
Of course, volatile has to be put back, and another way must be found. The bottleneck has been identified: it is the field access inside the loop. And if so, we can create a local segments variable, fill the array, and only then write it to the field:
Segment[] segments = (Segment[]) Array.newInstance(Segment.class, size);
for (int i = 0; i < segments.length; i++) {
    segments[i] = new Segment(roundedUpSegmentCapacity);
}
this.segments = segments;
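The same "fill a local, publish once" idiom in a self-contained form. Holder here is a made-up stand-in for ConcurrentReferenceHashMap, introduced only for this sketch:

```java
public class PublishOnceDemo {
    static class Holder {
        final int[] data;

        Holder(int n) {
            // Fill a cheap-to-access local variable first...
            int[] data = new int[n];
            for (int i = 0; i < data.length; i++) {
                data[i] = i * i; // no field re-read on each iteration
            }
            // ...and publish it to the field with a single write
            this.data = data;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Holder(4).data[3]); // 9
    }
}
```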
As a result, even such simple improvements give quite a good gain:
JDK 8
Benchmark                           Mode  Cnt  Score    Error  Units
originalConcurrentReferenceHashMap  avgt  100  712.1 ±   7.2  ns/op
patchedConcurrentReferenceHashMap   avgt  100  496.5 ±   4.6  ns/op
JDK 11
Benchmark                           Mode  Cnt  Score    Error  Units
originalConcurrentReferenceHashMap  avgt  100  536.0 ±   8.4  ns/op
patchedConcurrentReferenceHashMap   avgt  100  486.4 ±   9.3  ns/op
When launching Spring Boot applications from IDEA, developers often set the 'Enable launch optimizations' flag, which adds -XX:TieredStopAtLevel=1 -noverify to the VM arguments and speeds up startup by disabling profiling and C2. Let's measure with these arguments:
// JDK 8 -XX:TieredStopAtLevel=1 -noverify
Benchmark                           Mode  Cnt   Score    Error  Units
originalConcurrentReferenceHashMap  avgt  100  1920.9 ±  24.2  ns/op
patchedConcurrentReferenceHashMap   avgt  100   592.0 ±  25.4  ns/op

// JDK 11 -XX:TieredStopAtLevel=1 -noverify
Benchmark                           Mode  Cnt   Score    Error  Units
originalConcurrentReferenceHashMap  avgt  100  1838.9 ±   8.0  ns/op
patchedConcurrentReferenceHashMap   avgt  100   549.7 ±   6.7  ns/op
More than a 3-fold speed-up!
Among other things, this helps speed up queries that return projections in Spring Data JPA. A JMC profile shows that creating ConcurrentReferenceHashMap takes almost a fifth of the time needed to execute a query such as:
public interface SimpleEntityRepository extends JpaRepository<SimpleEntity, Long> {
    List<HasIdAndName> findAllByName(String name);
}
where HasIdAndName is a projection interface:
public interface HasIdAndName {
    int getId();
    String getName();
}
Besides, ConcurrentReferenceHashMap is used in dozens of places across the Spring codebase, so the improvement will certainly not be superfluous.
Changes:
https://github.com/spring-projects/spring-framework/pull/1873
https://github.com/spring-projects/spring-framework/pull/2051
Source: https://habr.com/ru/post/432824/