The Spliterator interface has four mandatory methods: tryAdvance, trySplit, estimateSize and characteristics. There are also specializations of the spliterator for the primitive types int, long and double: they add a few extra methods to avoid boxing. A spliterator is much like an ordinary iterator. The main difference, the ability to split itself into two parts, is what underlies the parallel execution of streams. In addition, for optimization purposes, a spliterator carries a set of characteristic flags and can report its size exactly or approximately. Finally, a spliterator never modifies its data source: unlike an iterator, it has no remove method. Let's look at the methods in more detail.
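For reference, these are the four abstract methods declared in java.util.Spliterator (signatures as in the JDK; the comments are added here for orientation):

```java
// An excerpt from java.util.Spliterator (default methods omitted);
// Consumer is java.util.function.Consumer
public interface Spliterator<T> {
    boolean tryAdvance(Consumer<? super T> action); // feed the next element to the action, if any
    Spliterator<T> trySplit();                      // split off part of the elements, or return null
    long estimateSize();                            // exact or estimated number of remaining elements
    int characteristics();                          // bit mask of ORDERED, SIZED, NONNULL and so on
}
```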
tryAdvance combines the iterator methods hasNext() and next(). If the spliterator has a next element, it must call the supplied function with that element and return true; otherwise the function is not called and false is returned.

trySplit() is entitled not to split and to return null (the try prefix is not there by accident). This is usually done when there is little data left in the current spliterator (say, only one element).
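A minimal sketch of driving a spliterator by hand; the ArrayList here is only an illustration (its spliterator splits a known index range in half):

```java
List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
Spliterator<Integer> suffix = list.spliterator();
Spliterator<Integer> prefix = suffix.trySplit();       // may be null; for ArrayList it covers 1, 2
suffix.forEachRemaining(System.out::println);          // prints 3, 4
while (prefix != null && prefix.tryAdvance(System.out::println))
    ;                                                  // prints 1, 2, one element per call
```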
The characteristics() method returns a bit mask of flags describing the data source. The standard characteristics are:

- ORDERED: the source has a defined encounter order. HashSet does not have this characteristic, because the order of elements in a HashSet depends on the implementation. The absence of this characteristic automatically switches a parallel stream into unordered mode, which lets it run faster: since the source had no order, there is no need to preserve one downstream.
- DISTINCT: all elements are distinct. Any Set, or a stream after the distinct() operation, creates a spliterator with this characteristic. Thanks to it, for example, the distinct() operation on a stream built from a Set is not performed at all and therefore costs nothing.
- SORTED: the elements are sorted. The related getComparator() method returns the sorting comparator, or null for natural order. Sorted collections (for example, TreeSet) create a spliterator with this characteristic, and with it the stream operation sorted() can be skipped.
- SIZED: the spliterator knows its exact size. After some intermediate operations (say, map() or sorted()) it is preserved, and after others (say, filter() or distinct()) it is lost. It is useful for sorting or, say, for toArray(): you can allocate an array of the right size up front instead of guessing how many elements you will get.
- SUBSIZED: after splitting, both halves will also know their exact sizes. It is reported, for example, by ArrayList, because splitting simply breaks a range of known length into two ranges of known length. HashSet, however, does not report it: it splits the hash table, and it is not known how many elements end up in each half, so the child spliterators no longer report SIZED.
- NONNULL: the source is guaranteed to contain no null elements. This characteristic is reported, for example, by the spliterator of ConcurrentSkipListSet, since you cannot put null into that data structure. It is also reported by all spliterators over primitive types.
- IMMUTABLE: the data source cannot be modified. It is reported, for example, by the spliterator of Collections.singletonList(), because that list cannot be changed.
- CONCURRENT: the source may be safely modified concurrently; this is reported by the collections from java.util.concurrent. If a spliterator has neither IMMUTABLE nor CONCURRENT, it is good practice to make it fail-fast, throwing a ConcurrentModificationException if it notices that the source has changed.
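A quick way to see which of these flags a particular spliterator reports is hasCharacteristics(); the TreeSet below is just an example:

```java
Spliterator<String> s = new TreeSet<>(Arrays.asList("b", "a")).spliterator();
System.out.println(s.hasCharacteristics(Spliterator.SORTED));   // true
System.out.println(s.hasCharacteristics(Spliterator.DISTINCT)); // true
System.out.println(s.hasCharacteristics(Spliterator.ORDERED));  // true
System.out.println(s.getComparator());                          // null: natural order
```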
The estimateSize() method returns the exact number of remaining elements when it is known, and an estimate otherwise. For example, if you take a spliterator over a HashSet and split it with trySplit(), estimateSize() will report half of the original collection size for each part, although the actual number of elements in each half of the hash table may differ. If there are infinitely many elements, or they are too expensive to count, or there are more than Long.MAX_VALUE of them, you can return Long.MAX_VALUE.
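To see the estimate drift on a hash-based collection (the exact numbers printed after the split are implementation-dependent; this is only a sketch):

```java
Set<Integer> set = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
Spliterator<Integer> s = set.spliterator();
System.out.println(s.estimateSize());       // 5: the top-level spliterator knows the size
Spliterator<Integer> prefix = s.trySplit();
// after the split each part estimates roughly half of the hash table,
// regardless of how many elements actually landed in it
System.out.println(s.estimateSize());
if (prefix != null)
    System.out.println(prefix.estimateSize());
```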
Suppose, for example, that you want a stream over a DOM NodeList. There is no standard method for that, but it is easy to write one without any custom spliterators:
```java
public class XmlStream {
    static Stream<Node> of(NodeList list) {
        return IntStream.range(0, list.getLength()).mapToObj(list::item);
    }
}
```
The same approach works for any structure with fast random access (for example, org.json.JSONArray), that is, anything that can quickly return its length and an element by index.
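For instance, a similar one-liner for org.json.JSONArray might look like this (a sketch; length() and opt(int) are the library's standard accessors, and opt is used because it does not throw):

```java
public class JsonStream {
    static Stream<Object> of(JSONArray array) {
        return IntStream.range(0, array.length()).mapToObj(array::opt);
    }
}
```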
If you cannot come up with a reasonable trySplit, it is better not to write a spliterator at all. For example, one fellow writes the protonpack library while completely ignoring the existence of parallel streams: he wrote a pile of spliterators that cannot split at all. A spliterator that never splits is a bad, useless spliterator; do not do this. In such a case it is better to write an ordinary iterator and build a spliterator from it with the Spliterators.spliterator methods or, if you do not know the size of the collection, with Spliterators.spliteratorUnknownSize. These methods at least have some splitting heuristics: they read part of the iterator into an array and create a new spliterator over that array. If the stream continues with a slow enough operation, parallelization will still speed things up.
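A sketch of that approach, wrapping an existing iterator via Spliterators.spliteratorUnknownSize and StreamSupport.stream (the iterator and the ORDERED flag here are just for illustration):

```java
Iterator<String> iterator = Arrays.asList("a", "b", "c").iterator();
Stream<String> stream = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
        true); // true allows the pipeline to run in parallel
```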
If your class implements the Iterable or Collection interface, you already get a default spliterator() for free. It is, of course, worth checking whether it can be improved. One way or another, you rarely need to write your own spliterators. It can make sense if you are developing your own data structure (for example, a collection of primitives, as leventov does).
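For example (a sketch, not from the original article): a Set of boxed primitives could override the default spliterator() just to declare extra characteristics, since Set's default only adds DISTINCT:

```java
// Hypothetical: inside a Set<Integer> implementation that never contains null
@Override
public Spliterator<Integer> spliterator() {
    // Spliterators.spliterator(Collection, ...) adds SIZED and SUBSIZED on its own
    return Spliterators.spliterator(this, Spliterator.DISTINCT | Spliterator.NONNULL);
}
```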
Now let's write our own intermediate operation, pairMap, which applies a function to every pair of adjacent stream elements. Rather than fixing a particular pair type (say, a Map.Entry with the same key and value type, and so on), we will leave this choice to the user: let them decide how to combine the two adjacent values. That is, we want to create a method with the following signature:
```java
public static <T, R> Stream<R> pairMap(Stream<T> stream, BiFunction<T, T, R> mapper) {...}
```
If anyone ever needs a stream of pairs as such, it is easy to build one as a stream of Map.Entry:

```java
public static <T> Stream<Map.Entry<T, T>> pairs(Stream<T> stream) {
    return pairMap(stream, AbstractMap.SimpleImmutableEntry<T, T>::new);
}
```

Or, say, compute the differences between adjacent numbers:

```java
public static Stream<Integer> diff(Stream<Integer> stream) {
    return pairMap(stream, (a, b) -> b - a);
}
```
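A quick usage check of diff (assuming the helpers above are in scope):

```java
diff(Stream.of(1, 3, 6, 10)).forEach(System.out::println); // prints 2, 3, 4
```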
We want pairMap to behave like a normal intermediate operation, that is, no computation should happen until a terminal operation is reached. To do this, we take the spliterator of the input stream but do nothing with it until we are asked. Another small but important detail: when the new stream is closed via close(), the original stream must be closed as well. As a result, our method may look like this:
```java
public static <T, R> Stream<R> pairMap(Stream<T> stream, BiFunction<T, T, R> mapper) {
    return StreamSupport.stream(new PairSpliterator<>(mapper, stream.spliterator()),
                                stream.isParallel())
                        .onClose(stream::close);
}
```
Note that once the spliterator() method has been called, the original stream is considered consumed and nothing more can be done with it. But this is normal: the same happens to any intermediate stream when you add another operation to it. The Stream.concat() method, which merges two streams, looks much the same. It remains to write the PairSpliterator itself.
Let's start with the characteristics() method. Some of the characteristics are inherited from the source spliterator, but NONNULL, DISTINCT and SORTED have to be dropped: we cannot guarantee them after applying an arbitrary mapper function:
```java
@Override
public int characteristics() {
    return source.characteristics() & (SIZED | SUBSIZED | CONCURRENT | IMMUTABLE | ORDERED);
}
```
The tryAdvance method should be fairly simple. We just read elements from the source spliterator, remembering the previous element in an intermediate buffer, and call the mapper on the latest pair. The only extra thing to track is whether we are at the very beginning of the stream; for that there is a boolean field hasPrev, which indicates whether we already have a previous value.
The trySplit method is best implemented by calling trySplit on the source spliterator. The main difficulty is handling the pair at the junction of the two separated pieces of the original stream. That pair should be produced by the spliterator that traverses the first half. Accordingly, it has to keep the first value of the second half and, when it reaches its own end, fire one more time, passing that stored value to the mapper together with its last value.
Here are the fields and constructors of PairSpliterator:

```java
class PairSpliterator<T, R> implements Spliterator<R> {
    Spliterator<T> source;                    // the source spliterator
    boolean hasLast, hasPrev;                 // do we hold a junction element / a previous element?
    private T cur;                            // the most recently read element (previous for the next pair)
    private final T last;                     // first element of the split-off suffix, for the junction pair
    private final BiFunction<T, T, R> mapper; // how to combine two adjacent elements

    public PairSpliterator(BiFunction<T, T, R> mapper, Spliterator<T> source) {
        this(mapper, source, null, false, null, false);
    }

    public PairSpliterator(BiFunction<T, T, R> mapper, Spliterator<T> source,
                           T prev, boolean hasPrev, T last, boolean hasLast) {
        this.source = source;
        this.hasLast = hasLast;
        this.hasPrev = hasPrev;
        this.cur = prev;
        this.last = last;
        this.mapper = mapper;
    }

    // ... (tryAdvance, trySplit, estimateSize and characteristics follow)
}
```
Now the tryAdvance method (instead of a lambda to pass into the source's tryAdvance, we use a method reference to a small setter):
```java
void setCur(T t) {
    cur = t;
}

@Override
public boolean tryAdvance(Consumer<? super R> action) {
    if (!hasPrev) {
        // we are at the very beginning: read the first element, it has no pair yet
        if (!source.tryAdvance(this::setCur)) {
            return false; // the source is empty, nothing to do
        }
        hasPrev = true;
    }
    T prev = cur; // remember the previous element
    if (!source.tryAdvance(this::setCur)) {
        // the source is exhausted
        if (!hasLast)
            return false; // and there is no stored junction element either
        hasLast = false;  // use the junction element exactly once
        cur = last;
    }
    action.accept(mapper.apply(prev, cur)); // feed the mapped pair to the action
    return true;
}
```
Next, the trySplit() method:
```java
@Override
public Spliterator<R> trySplit() {
    Spliterator<T> prefixSource = source.trySplit(); // try to split the source
    if (prefixSource == null)
        return null; // the source refused to split, so do we
    T prev = cur; // the current element goes to the prefix as its "previous" value
    if (!source.tryAdvance(this::setCur)) {
        // the suffix turned out to be empty: keep working over the prefix ourselves
        source = prefixSource;
        return null;
    }
    boolean oldHasPrev = hasPrev;
    hasPrev = true; // we (the suffix) now certainly have a previous element
    // the prefix receives the first element of the suffix as its junction element
    return new PairSpliterator<>(mapper, prefixSource, prev, oldHasPrev, cur, true);
}
```
Finally, estimateSize(): if the source spliterator can estimate its size, we just need to check our flags and adjust the number one way or the other:
```java
@Override
public long estimateSize() {
    long size = source.estimateSize();
    if (size == Long.MAX_VALUE)
        return size; // unknown or too large: nothing to adjust
    if (hasLast)
        size++; // the stored junction element gives one extra pair
    if (!hasPrev && size > 0)
        size--; // n source elements produce only n - 1 pairs
    return size;
}
```
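As a quick sanity check (a usage sketch assuming the pairMap helper defined above is in scope), a parallel run also exercises the junction handling in trySplit:

```java
List<Integer> diffs = pairMap(IntStream.rangeClosed(1, 10).boxed().parallel(), (a, b) -> b - a)
        .collect(Collectors.toList());
System.out.println(diffs); // [1, 1, 1, 1, 1, 1, 1, 1, 1]
```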
Roughly this spliterator is used in the StreamEx library, except that there pairMap is not a static method but a regular operation on the stream itself.
Let's compare the performance of different solutions to a simple problem: from a list of Integer numbers, keep only those that are smaller than the number that follows them, and store the result in a new list. The task itself is trivial, so a significant portion of the time goes to overhead. There are two naive implementations: via an iterator (works with any collection) and via access by index (works only for a list with fast random access). Here is the iterator version (naiveIterator):
```java
List<Integer> result = new ArrayList<>();
Integer last = null;
for (Integer cur : input) {
    if (last != null && last < cur)
        result.add(last);
    last = cur;
}
```
And the indexed version (naiveGet):

```java
List<Integer> result = new ArrayList<>();
for (int i = 0; i < input.size() - 1; i++) {
    Integer cur = input.get(i), next = input.get(i + 1);
    if (cur < next)
        result.add(cur);
}
```
With StreamEx and pairMap (streamEx):

```java
List<Integer> result = StreamEx.of(input)
        .pairMap((a, b) -> a < b ? a : null)
        .nonNull()
        .toList();
```
A plain Java 8 stream over the indices (stream):

```java
List<Integer> result = IntStream.range(0, input.size() - 1)
        .filter(i -> input.get(i) < input.get(i + 1))
        .mapToObj(input::get)
        .collect(Collectors.toList());
```
A trick with reduce and a side effect (reduce):

```java
List<Integer> result = new ArrayList<>();
input.stream().reduce((a, b) -> {
    if (a < b)
        result.add(a);
    return b;
});
```
And a custom collector that keeps mutable state on the side and therefore refuses to run in parallel (collector):

```java
public static Collector<Integer, ?, List<Integer>> collectPrecedingValues() {
    int[] holder = { Integer.MAX_VALUE };
    return Collector.of(ArrayList::new, (l, elem) -> {
        if (holder[0] < elem)
            l.add(holder[0]);
        holder[0] = elem;
    }, (l1, l2) -> {
        throw new UnsupportedOperationException("Don't run in parallel");
    });
}

List<Integer> result = input.stream().collect(collectPrecedingValues());
```
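The measurement harness itself is not shown in this excerpt; as a rough sketch, a comparison like this could be set up with JMH along the following lines (the class name, random seed and setup here are illustrative, not the author's code):

```java
// JMH (org.openjdk.jmh.annotations.*) and StreamEx imports omitted
@State(Scope.Benchmark)
public class PairBenchmark {
    @Param({"10000", "100000", "1000000"})
    int n;
    List<Integer> input;

    @Setup
    public void setup() {
        Random r = new Random(1);
        input = r.ints(n).boxed().collect(Collectors.toList());
    }

    @Benchmark
    public List<Integer> naiveIterator() {
        List<Integer> result = new ArrayList<>();
        Integer last = null;
        for (Integer cur : input) {
            if (last != null && last < cur)
                result.add(last);
            last = cur;
        }
        return result;
    }

    @Benchmark
    public List<Integer> streamExParallel() {
        return StreamEx.of(input).parallel()
                .pairMap((a, b) -> a < b ? a : null)
                .nonNull()
                .toList();
    }
}
```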
| Algorithm | n = 10,000 | n = 100,000 | n = 1,000,000 |
|---|---|---|---|
| naiveIterator | 97.7 | 904.0 | 10592.7 |
| naiveGet | 99.8 | 1084.4 | 11424.2 |
| collector | 112.5 | 1404.9 | 14387.2 |
| reduce | 112.1 | 1139.5 | 12001.5 |
| stream | 146.4 | 1624.1 | 16600.9 |
| streamEx | 115.2 | 1247.1 | 12967.0 |
| streamParallel | 56.9 | 582.3 | 6120.5 |
| streamExParallel | 53.4 | 516.7 | 5353.4 |
Source: https://habr.com/ru/post/256905/