Trove Library. Java primitive type collections

In the standard Java library, it is not possible to operate with collections of primitive types, such as int, long, etc. Standard output - use objects of classes Integer, Long, etc.

This approach works well on a small number of elements, because, firstly, during any operation, autoboxing / autounboxing occurs and, secondly, the collection stores references to objects in the heap. The objects in the heap not only add extra overhead in memory, but also put a strain on the GC .

There is one more unobvious minus of objects in heap - caching in modern processors. The processor loads data into the cache in blocks. In the case of sequential processing of the array, several elements are loaded into the cache at once. In the case of objects scattered in heap, cache hits will be less. Read more about caching and memory hierarchy here .
')
The Trove library provides a standard collection interface for working with primitive types. For most applications, Trove collections run faster and consume less memory.

The collection includes:

Unlike jdk, Trove's Hash uses the Open addressing method for resolving collisions.

The naming principle is T <Type> <CollectionType>.
For example:
Interface TIntList - List of int, implementation of TIntArrayList:

TIntList l = new TIntArrayList();

Interface TLongLongMap - Map with long keys and long values, implementation of TLongLongHashMap:

 TLongLongMap m = new TLongLongHashMap();

In jdk collections, if the item is not found, null is returned. In Trove, “NoEntryValue” is returned, the default is 0. To learn NoEntryValue , you can specify NoEntryValue when creating a collection.

A plus of the Trove collections are the processing methods - forEach,

  public static long troveEach() { final long [] rvalue = {0}; // troveMap - TLongLongHashMap // TLongProcedure      , //    false //     troveMap.forEachValue(new TLongProcedure() { @Override public boolean execute(long value) { rvalue[0] += value; return true; } }); return rvalue[0]; }

grep , inverseGrep - list creation by condition (for TList ...) and transformValues - inplace of a change operation on elements of a collection.

A useful feature is that in the case of a HashMap / HashSet with an object (an Object heir) as a key, you can use your Interface HashingStrategy <T> hash function.

For benchmarking, an excellent benchmark tool jmh was used .
It would be great if the developers published it in the maven repository.

The conclusion had to be slightly edited to preserve the formatting, one operation - 1 million elements ( full report here ):

 $ java -version java version "1.7.0_21" Java(TM) SE Runtime Environment (build 1.7.0_21-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode) $ java -server -XX:+AggressiveOpts -Xms2048m \ -Xmx2048m -jar microbenchmarks.jar ".*Trove.*" -prof gc -i 3 -r 5s

 Benchmark Mode Thr Cnt Sec Mean Mean Error Units
 // Insert In List <Integer>
 IntListJdkInsert thrpt 1 3 5 104.950 6.756 ops / sec
 // Brute-force List <Integer>
 IntListJdkTraverse thrpt 1 3 5 774.100 71.809 ops / sec

 // Insert in TIntArrayList
 IntListTroveInsert thrpt 1 3 5 424.556 28.239 ops / sec
 // Brute force TIntArrayList
 IntListTroveTraverse thrpt 1 3 5 3548.806 7.712 ops / sec

 // Paste into HashMap <Long, Long>
 LongHashMapJdkInsert thrpt 1 3 5 24.683 1.994 ops / sec
 // search for all items in HashMap <Long, Long> by turns
 LongHashMapJdkSearch thrpt 1 3 5 67.789 1.119 ops / sec
 // full search of HashMap <Long, Long> values
 LongHashMapJdkTraverse thrpt 1 3 5 99.761 0.882 ops / sec

 // Paste to TLongLongMap
 LongHashMapTroveInsert thrpt 1 3 5 28.750 0.165 ops / sec
 // search all items in TLongLongMap by turns
 LongHashMapTroveSearch thrpt 1 3 5 145.933 0.416 ops / sec
 // exhaustive search of TLongLongMap values using forEach
 LongHashMapTroveTraverse thrpt 1 3 5 318.528 0.980 ops / sec

The amount of memory used is not so easy to calculate, but an indirect conclusion can be drawn from GC activity:

  jd  List<Integer>: Iteration 1 (5s in 1 thread): 103,950 ops/sec GC | wall time = 5,002 secs, GC time = 0,331 secs, GC% = 6,62%, GC count = +24  Trove  TIntArrayList: Iteration 1 (5s in 1 thread): 428,400 ops/sec GC | wall time = 5,002 secs, GC time = 0,019 secs, GC% = 0,38%, GC count = +32

If you look at the source code TIntArrayList, then it becomes clear from where the performance gains - the data is stored as an array

 public class TIntArrayList implements TIntList, Externalizable { static final long serialVersionUID = 1L; /** the data of the list */ protected int[] _data;

The search speed of TLongLongMap is explained by the use of forEach - no temporary objects are created and unboxing is excluded.

The same benchmark, but one operation - 1 thousand elements:

 Benchmark Mode Thr Cnt Sec Mean Mean Error Units
 IntListJdkInsert thrpt 1 3 5 239478.011 871.469 ops / sec
 IntListJdkTraverse thrpt 1 3 5 1326701.717 1649.389 ops / sec

 IntListTroveInsert thrpt 1 3 5 315562.594 2483.415 ops / sec
 IntListTroveTraverse thrpt 1 3 5 3630599.806 10822.903 ops / sec

 LongHashMapJdkInsert thrpt 1 3 5 45315.689 47.630 ops / sec
 LongHashMapJdkSearch thrpt 1 3 5 114759.789 424.996 ops / sec
 LongHashMapJdkTraverse thrpt 1 3 5 210012.550 139.001 ops / sec

 LongHashMapTroveInsert thrpt 1 3 5 33078.583 119.127 ops / sec
 LongHashMapTroveSearch thrpt 1 3 5 148311.567 267.613 ops / sec
 LongHashMapTroveTraverse thrpt 1 3 5 351955.550 901.902 ops / sec

It can be seen that when reducing the number of items in the collection, the performance gap is falling.

It can be concluded that if the collections are small and are used infrequently, then there is little point in including an additional relationship in the project.
But if the project is massively used large collections of primitive types, the use of Trove may be justified.

The project code is available on GitHub .

Ps. The values of the number of elements when creating collections were not set intentionally.

Source: https://habr.com/ru/post/187234/

All Articles

Trove Library. Java primitive type collections

More articles: