
[Translation] When to use parallel streams

Original source
Authors: Doug Lea, with Brian Goetz, Paul Sandoz, Aleksey Shipilev, Heinz Kabutz, Joe Bowbeer, ...

The java.util.stream framework provides data-driven operations on collections and other data sources. Most stream methods apply the same operation to each element. Using a collection's parallelStream() method, data-driven can become data-parallel when several cores are available. But when is it worth doing so?
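For reference, a minimal sketch of the two forms being compared; the collection contents and the max operation are purely illustrative:

    import java.util.Arrays;
    import java.util.List;

    public class MaxDemo {
        public static void main(String[] args) {
            List<Integer> s = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);

            // data-driven: one thread visits every element
            int seqMax = s.stream().mapToInt(Integer::intValue).max().getAsInt();

            // data-parallel: the same operation, split across available cores
            int parMax = s.parallelStream().mapToInt(Integer::intValue).max().getAsInt();

            System.out.println(seqMax + " / " + parMax);
        }
    }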


Consider using S.parallelStream().operation(F) instead of S.stream().operation(F) when the operations are independent of one another and are either computationally expensive, or applied to a large number of elements of efficiently splittable data structures, or both. More precisely:



The stream framework will not (and cannot) enforce any of the above. If the computations are not independent, running them in parallel makes no sense, or may even be harmful and lead to errors. Other criteria, derived from the above engineering constraints and tradeoffs, include:



Obtaining exact measurements of these effects can be difficult (although, with some effort, it is doable with tools such as JMH). But the cumulative effect is fairly easy to notice; to experience it yourself, run an experiment. For example, on a 32-core test machine, for small functions such as max() or sum() over an ArrayList, the break-even point is around 10,000 elements. Above that, speedups of up to 20x are observed. The running time for collections with fewer than 10,000 elements is not much less than for 10,000, and is therefore slower than sequential processing. The worst results occur with fewer than about 100 elements: the threads involved end up doing nothing useful, because the computation finishes before they even start. On the other hand, when per-element operations are time-consuming and the collection is efficiently and completely splittable, such as an ArrayList, the benefit is immediately visible.
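A rough sketch of how such a measurement could be set up with JMH; the class name, parameter values, and list contents are illustrative, and the actual break-even point depends on the machine:

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.LongStream;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    public class ParallelSumBench {
        @Param({"100", "10000", "1000000"})   // around and beyond the ~10,000 break-even point
        int size;

        List<Long> data;

        @Setup
        public void setup() {
            data = LongStream.range(0, size).boxed().collect(Collectors.toList());
        }

        @Benchmark
        public long sequentialSum() {
            return data.stream().mapToLong(Long::longValue).sum();
        }

        @Benchmark
        public long parallelSum() {
            return data.parallelStream().mapToLong(Long::longValue).sum();
        }
    }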


To put it another way, using parallel() for an unwarrantedly small amount of computation can cost around 100 microseconds, while using it otherwise should save at least that much time (or perhaps hours for very large tasks). The specific costs and benefits will vary over time, across platforms, and with context. For example, launching small parallel computations inside a sequential loop amplifies both the ups and the downs (performance microbenchmarks in which this happens may not reflect real-world situations).


Questions and answers



The framework might try, but too often its decision would be wrong. The search for fully automatic multi-core parallelism has not produced a universal solution over the past thirty years, so the framework takes a more robust approach and only asks the user to make a yes-or-no choice. This choice rests on engineering problems that are routinely encountered even in sequential programming and are unlikely ever to disappear completely. For example, you may encounter a 100-fold slowdown when finding the maximum value of a collection containing a single element, compared with just using that value directly (without a collection). Sometimes the JVM can optimize such cases away for you, but this rarely happens in sequential cases and never in parallel ones. On the other hand, one can expect that, as they evolve, tools will help users make better decisions.
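As an illustration of the kind of overhead being compared (the value is illustrative):

    long x = 42L;

    // Direct use: no collection, no stream machinery at all.
    long direct = x;

    // The same result via a one-element collection and a stream; per element,
    // this does far more work, which is the overhead described above.
    long viaStream = java.util.Collections.singletonList(x).stream()
            .mapToLong(Long::longValue)
            .max()
            .getAsLong();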



This, too, resembles problems often encountered in sequential programming. For example, the Collection method S.contains(x) is usually fast when S is a HashSet, slow when it is a LinkedList, and somewhere in between for other collections. Usually, the best way out for the author of a component that uses a collection is to encapsulate it and publish only the specific operations on it; users are then shielded from the choice. The same applies to parallel operations. For example, a component with an internal collection of prices could define a method that checks whether the size reaches a threshold, which makes sense as long as the per-element computations are not too expensive. Example:


    // prices is assumed here to be a List<Long>; MIN_PAR is the threshold above
    // which the parallel path is taken.
    public long getMaxPrice() {
        return priceStream().mapToLong(Long::longValue).max().orElse(0L);
    }

    private Stream<Long> priceStream() {
        return (prices.size() < MIN_PAR) ? prices.stream() : prices.parallelStream();
    }

This idea can be extended to other considerations about when and how to use parallelism.
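For instance, a hypothetical variant of the priceStream() helper above (not from the original text; it reuses the same assumed names and needs java.util.concurrent.ForkJoinPool) could also consult the available parallelism before choosing the parallel path:

    // Hypothetical variant: take the parallel path only when the data is large
    // enough AND the common ForkJoinPool actually has more than one worker.
    private Stream<Long> priceStream() {
        boolean worthParallel = prices.size() >= MIN_PAR
                && ForkJoinPool.getCommonPoolParallelism() > 1;
        return worthParallel ? prices.parallelStream() : prices.stream();
    }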



At one extreme are functions that do not meet the independence criteria, including I/O operations that are intrinsically sequential, accesses to locked or synchronized resources, and cases where a failure in one parallel subtask performing I/O affects the others. Parallelizing these makes little sense. At the other extreme are computations that only occasionally perform I/O or block briefly on synchronization (for example, most logging, and the use of concurrent collections such as ConcurrentHashMap); these are harmless. Everything in between requires further analysis. If each subtask may block for a significant amount of time waiting for I/O or access to a resource, CPU resources sit idle and can be used neither by the program nor by the JVM. Everybody loses. In these cases parallel stream processing is not always the right choice, but there are good alternatives, for example asynchronous I/O and the CompletableFuture approach.
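A minimal sketch of the CompletableFuture alternative, assuming a hypothetical blocking fetchPrice() call and a dedicated executor for it (class and method names are illustrative):

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.stream.Collectors;

    public class PriceFetcher {
        // A dedicated pool for blocking calls, so the common ForkJoinPool used
        // by parallel streams is not tied up waiting on I/O.
        private final ExecutorService ioPool = Executors.newFixedThreadPool(16);

        public List<Long> fetchAll(List<String> urls) {
            List<CompletableFuture<Long>> futures = urls.stream()
                    .map(u -> CompletableFuture.supplyAsync(() -> fetchPrice(u), ioPool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)   // wait for each result
                    .collect(Collectors.toList());
        }

        // Hypothetical blocking I/O call; body elided.
        private long fetchPrice(String url) {
            return 0L;
        }
    }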



Currently, the I/O-based Stream sources in the JDK (for example, BufferedReader.lines()) are mainly geared towards sequential use, processing elements one by one as they arrive. Support for high-performance bulk processing of buffered I/O is possible, but at the moment it requires writing custom Stream generators, Spliterators, and Collectors. Support for some common cases may be added in future JDK releases.
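Typical sequential use of such an I/O-based stream source looks like the following sketch (the file name and filter are illustrative):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class LogScan {
        public static void main(String[] args) throws IOException {
            try (BufferedReader in = Files.newBufferedReader(Paths.get("access.log"))) {
                long errors = in.lines()                      // elements arrive one by one
                        .filter(line -> line.contains("ERROR"))
                        .count();
                System.out.println(errors);
            }
        }
    }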



Machines usually have a fixed number of cores and cannot magically create new ones when parallel operations are performed. However, as long as the criteria for choosing parallel mode clearly hold, there is little reason to doubt the choice. Your parallel tasks will compete with others for the CPU, so you will see less speedup; in most cases this is still more effective than the alternatives. The underlying mechanism is designed so that, if no cores are available, you will notice only a slight slowdown compared with the sequential version, unless the system is so overloaded that it spends all its time context-switching instead of doing real work, or is configured on the assumption that all processing is sequential. If you have such a system, perhaps the administrator has already disabled the use of multiple threads/cores in the JVM settings; and if you are the administrator of such a system, it makes sense to do so.



Yes, at least to some extent. But bear in mind that the stream framework respects the constraints of sources and methods when choosing how to do this. In general, the fewer the constraints, the greater the potential parallelism. On the other hand, there is no guarantee that the framework will identify and exploit every available opportunity for parallelism. In some cases, if you have the time and the expertise, a hand-crafted solution can make much better use of the available parallelism.
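As one sketch of what relaxing constraints can look like (the collection and its contents are illustrative): dropping the encounter-order constraint gives the framework more freedom when parallelizing operations such as distinct().

    Set<String> uniqueNames = names.parallelStream()
            .unordered()                 // encounter order is not needed here
            .distinct()
            .collect(Collectors.toSet());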



If you follow these guidelines, the gain is usually enough to be worthwhile. Predictability is not a strength of modern hardware and systems, so there is no universal answer. Cache locality, GC characteristics, JIT compilation, memory access contention, data layout, OS scheduling policies, and the presence of a hypervisor are some of the factors with a significant influence. They affect sequential performance as well, but with parallelism the effects are often amplified: an issue that causes a 10 percent difference in sequential execution can lead to a 10-fold difference in parallel processing.


The stream framework includes some features that help improve the chances of a speedup. For example, using primitive specializations such as IntStream usually has a greater effect in parallel mode than in sequential mode, because it not only reduces resource (and memory) consumption but also improves cache locality. Using ConcurrentHashMap instead of HashMap for a parallel collect operation reduces the internal overhead. More tips and recommendations will surely emerge as people gain experience with the framework.
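The two hints above, in sketch form (data sizes and the words collection are illustrative):

    // 1) Primitive specialization: no boxing, better cache locality.
    long total = IntStream.range(0, 1_000_000).parallel().asLongStream().sum();

    // 2) A concurrent collector backed by ConcurrentHashMap reduces the merge
    //    overhead of a parallel collect().
    ConcurrentMap<Integer, List<String>> byLength = words.parallelStream()
            .collect(Collectors.groupingByConcurrent(String::length));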



We do not want to tell you what to do. Giving programmers new ways to get things wrong can seem scary; mistakes in code, architecture, and estimates will certainly happen. Decades ago, some predicted that bringing parallelism into applications would lead to great trouble. But it never came true.



Source: https://habr.com/ru/post/420805/

