
What should be the size of a thread pool?

In our article Stream API & ForkJoinPool we already discussed the possibility of changing the size of the thread pool that is used by parallel Stream API pipelines and Fork/Join tasks. I hope that information came in handy when, already in the position of Senior Java Developer, you managed to improve the performance of a system you were developing by changing the default pool size. Since our courses are, on the whole, aimed at moving a step up from junior and from middle upwards, part of the program is built around the basic questions asked at interviews. One of them is: “You have an application, and in it there is a task using the Stream API or Fork/Join that can be parallelized. Under what conditions might it be reasonable to change the default size of the thread pool? What size would you propose in that case?”

You can try to answer this question yourself before reading further, to test how ready you are for such an interview right now.

To back up the theoretical reasoning with real numbers, let's run a small benchmark of the standard Arrays.parallelSort() method, which implements a variant of merge sort and executes on ForkJoinPool.commonPool(). We will run this algorithm on the same large array with different commonPool sizes and analyze the results.
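For reference, here is a minimal sketch (our addition, not part of the benchmark) that prints the number of cores and the default parallelism of the common pool; on a machine with N cores the pool typically has N - 1 workers, since the calling thread is expected to help with the work.

```
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        // On a 4-core machine this typically prints 4 and 3:
        // the common pool defaults to availableProcessors() - 1 workers.
        System.out.println("Cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("commonPool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}
```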

The machine on which the benchmark was run has 4 cores and runs Windows, an operating system that is not the best fit for server workloads and is loaded with background services, so the results show quite a bit of variation, which nevertheless does not prevent us from seeing the essence of the matter.

The benchmark is written using JMH and looks like this:

```
package ru.klimakov;

import org.openjdk.jmh.annotations.*;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@Fork(1)
@Warmup(iterations = 10)
@Measurement(iterations = 5)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        public static final int SEED = 42;
        public static final int ARRAY_LENGTH = 1_000_000;
        public static final int BOUND = 100_000;

        volatile long[] array;

        @Setup
        public void initState() {
            Random random = new Random(SEED);
            this.array = new long[ARRAY_LENGTH];
            for (int i = 0; i < this.array.length; i++) {
                this.array[i] = random.nextInt(BOUND);
            }
        }
    }

    @Benchmark
    public long[] defaultParallelSort(BenchmarkState state) {
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] twoThreadsParallelSort(BenchmarkState state) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "2");
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] threeThreadsParallelSort(BenchmarkState state) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "3");
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] fourThreadsParallelSort(BenchmarkState state) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] fiveThreadsParallelSort(BenchmarkState state) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "5");
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] tooLargePoolParallelSort(BenchmarkState state) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "128");
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] singleThreadParallelSort(BenchmarkState state) {
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "1");
        Arrays.parallelSort(state.array);
        return state.array;
    }

    @Benchmark
    public long[] serialSort(BenchmarkState state) {
        Arrays.sort(state.array);
        return state.array;
    }
}
```
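If you prefer to launch the benchmark from an IDE rather than through the JMH uber-jar, a small runner like the following can be used (the class name BenchmarkRunner is our own; the listing above does not include a main method):

```
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        // Run all benchmarks whose names match MyBenchmark.
        Options options = new OptionsBuilder()
                .include(MyBenchmark.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}
```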

Having run this code, we got the following final results (in fact, several runs were made; the numbers varied slightly, but the overall picture stayed the same):



So the winner was fourThreadsParallelSort. In this test we set the pool size equal to the number of cores on the machine, i.e. one more than the default pool. Nevertheless, winning this micro-benchmark does not mean you should set the size of commonPool in your application equal to the number of cores rather than one fewer, as the JDK developers chose. We would suggest accepting their point of view and, for most applications, never changing the size of commonPool; however, if you want to squeeze the maximum out of the machine and are ready to prove with broader macro-benchmarks that the denser CPU utilization is worth it, you can set commonPool equal to the number of cores. Do not forget that system threads, such as those of the garbage collector, will compete with your pool for CPU time, and the operating system scheduler will interrupt your threads with context switches.
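For completeness, a sketch of the two usual ways to apply this setting; the key assumption (which the benchmark above also relies on) is that the property is read when the common pool is first initialized, so it must be set before anything touches the pool:

```
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;

public class PoolSizing {
    public static void main(String[] args) {
        // Preferred: pass the size on the command line, so it is set before
        // any code can initialize the common pool:
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 ...
        //
        // Setting it programmatically only works while the common pool has
        // not been initialized yet.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");

        long[] data = new Random(42).longs(1_000_000, 0, 100_000).toArray();
        Arrays.parallelSort(data);
        System.out.println("parallelism used: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}
```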

defaultParallelSort came in second place. Pool size: 3 threads. During execution the CPU load was about 70%, as opposed to 90% in the previous case. Consequently, the reasoning that the main program thread, from which we call the parallel sort, becomes a full-fledged worker of the fork/join task turned out to be wrong.

fiveThreadsParallelSort showed how we start losing efficiency by setting the pool size just one more than the number of cores. This test lost slightly to the default, while CPU utilization was about 95%.

tooLargePoolParallelSort showed that if we keep increasing the pool size (in this case up to 128 threads), we keep losing efficiency while loading the processor even more heavily (close to 100%).

twoThreadsParallelSort came in last place and demonstrated that if we simply leave one of the cores idle for no reason, the result will be mediocre.

You may ask: what about serialSort and singleThreadParallelSort? Nothing, really. They simply cannot be compared with the other options, because inside them there is no merge sort at all but a completely different algorithm, DualPivotQuicksort, which on different input data can show results both worse and better than those of the algorithm under test. We can say more: on most of the random arrays we looked at, DualPivotQuicksort significantly outpaced Arrays.parallelSort, and this time we simply got lucky and generated an array on which parallelSort showed the better results. That is why we decided to include this particular array in the benchmark, so that you do not conclude that parallelSort never makes sense to use at all. Sometimes it does, but do not forget to check it and prove it with good tests.
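If you want to eyeball that data dependence yourself, a deliberately naive (non-JMH) comparison might look like the sketch below; treat the numbers as a sanity check only, since it has none of the warmup and isolation that JMH provides, and the array, seed, and bounds here are just an example:

```
import java.util.Arrays;
import java.util.Random;

public class QuickComparison {
    public static void main(String[] args) {
        long[] original = new Random(42).longs(1_000_000, 0, 100_000).toArray();

        for (int run = 0; run < 5; run++) {
            long[] a = original.clone();
            long[] b = original.clone();

            long t0 = System.nanoTime();
            Arrays.sort(a);          // sequential dual-pivot quicksort for primitives
            long t1 = System.nanoTime();
            Arrays.parallelSort(b);  // parallel merge sort on the common pool
            long t2 = System.nanoTime();

            System.out.printf("run %d: sort %,d us, parallelSort %,d us%n",
                    run, (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        }
    }
}
```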

Finally, let's try to answer the question: “In what cases can it be reasonable to change the size of commonPool?” We have already considered one case: if we want to utilize the CPUs more fully, it may make sense to set the size of commonPool equal to the number of cores rather than one fewer, as the default has it. Another case is the situation where other applications run alongside ours and we want to divide the processor cores between applications manually (although it seems Docker would be more appropriate for that purpose). And finally, it may turn out that the tasks we really want to put into the Fork/Join pool are not “clean” enough: they allow themselves not only to compute and read/write RAM, but also to sleep, to wait, or to read/write a blocking I/O source. In that case it is not recommended to put such tasks into commonPool, because they will occupy a precious thread from the pool, that thread will go into a waiting state, and the core this thread was supposed to keep busy will sit idle. For such sleepy tasks, therefore, it is better to create a separate, custom ForkJoinPool, as sketched below.
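A minimal sketch of that idea: the pool size of 16 and the slowCall method are purely illustrative, and the trick of submitting a parallel stream into a custom ForkJoinPool so that it runs there instead of the common pool is widely used but not formally specified behavior, so verify it on your JDK before relying on it.

```
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class SeparatePoolForBlockingTasks {
    public static void main(String[] args) throws Exception {
        // A dedicated pool for tasks that may block; the size is a guess
        // and should be tuned for the actual workload, not copied blindly.
        ForkJoinPool blockingPool = new ForkJoinPool(16);
        try {
            int total = blockingPool.submit(() ->
                    IntStream.rangeClosed(1, 100)
                             .parallel()
                             .map(SeparatePoolForBlockingTasks::slowCall)
                             .sum()
            ).get();
            System.out.println("total = " + total);
        } finally {
            blockingPool.shutdown();
        }
    }

    // Stand-in for an I/O call or anything else that sleeps or blocks.
    private static int slowCall(int i) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return i;
    }
}
```

This way the sleeping tasks tie up threads of their own pool, while commonPool stays free for CPU-bound work.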

THE END

Questions and suggestions are, as always, welcome here and at the open day. We are waiting, sir.

Source: https://habr.com/ru/post/342040/
