Background
When I was a third-year student, I had to implement a numerical method as part of a laboratory workshop. I wrote it in Java, and it even produced the correct result, but there was one problem: it ran too slowly. Increase the problem size just a little, and you could go off and twiddle your thumbs or attend to other important matters while it churned.
The method was iterative, and at each iteration the old values were updated. Profiling showed that working with associative arrays of the form Map<Integer, Double> took the lion's share of the time.
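To show what the profiler was complaining about, here is a minimal sketch of that kind of update loop. It is not the actual lab code; the method name and the update rule are made up for illustration. Every read and write goes through boxed Integer and Double objects, and the HashMap allocates an entry node per key.

import java.util.HashMap;
import java.util.Map;

public class BoxedIteration {
    // Hypothetical update step: scale every stored value.
    // getValue()/setValue() unbox and box a Double on every entry.
    static void relaxationStep(Map<Integer, Double> values, double omega) {
        for (Map.Entry<Integer, Double> entry : values.entrySet()) {
            double old = entry.getValue();          // unbox Double -> double
            entry.setValue(old * (1.0 - omega));    // box double -> Double
        }
    }

    public static void main(String[] args) {
        Map<Integer, Double> values = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            values.put(i, (double) i);              // autoboxing on every put
        }
        relaxationStep(values, 0.5);
        System.out.println(values.get(42));
    }
}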
The first thought, "let me just switch everything to plain arrays," was dropped almost immediately, and rewriting everything in C I did not even consider. So I went looking for ready-made libraries for collections of primitives and found two: Trove4j and Apache Commons Primitives.
I chose the first one. Its author claims better performance and lower memory consumption.
Installation
So, after following the link above and downloading the distribution and the javadocs, I started digging in.
The source code is generated from templates by a bundled generator and an Ant script. That seems reasonable: a change is made in one place and then propagates to all the relevant classes.
Getting acquainted with the naming conventions
Trove4j uses specific class naming rules.
The first letter is T, which, according to the author, marks the class as part of the Trove library.
Next comes the type name: Int, Double, Object, etc.
Then the name of the collection: ArrayList, HashSet, HashMap.
Examples: TByteArrayList, TIntHashSet, TFloatObjectHashMap, TDoubleDoubleHashMap.
THashSet and THashMap stand apart.
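To make the naming scheme concrete, here is a small sketch that puts a few of these classes to work. It assumes the Trove 3.x package layout (gnu.trove.set.hash, gnu.trove.map.hash, gnu.trove.list.array); older releases keep the classes directly in gnu.trove.

import gnu.trove.list.array.TDoubleArrayList;
import gnu.trove.map.hash.TIntDoubleHashMap;
import gnu.trove.set.hash.TIntHashSet;

public class NamingDemo {
    public static void main(String[] args) {
        // T + element type + collection kind: a set of primitive ints
        TIntHashSet ids = new TIntHashSet();
        ids.add(42);                        // no Integer boxing
        System.out.println(ids.contains(42));

        // T + key type + value type + HashMap: int -> double without boxing
        TIntDoubleHashMap values = new TIntDoubleHashMap();
        values.put(1, 3.14);
        System.out.println(values.get(1));

        // T + element type + ArrayList: a growable array of primitive doubles
        TDoubleArrayList list = new TDoubleArrayList();
        list.add(2.71);
        System.out.println(list.get(0));
    }
}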
Looking for differences from JDK Collections
The author provides a fairly flexible mechanism for controlling the hashing strategy of the hash tables. There is a default strategy, but it can be replaced with your own implementation if necessary, although, to be honest, I have never run into a situation where that was needed.
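As a hedged illustration of that mechanism, the sketch below plugs a case-insensitive string strategy into a custom set. The class and package names (gnu.trove.strategy.HashingStrategy, gnu.trove.set.hash.TCustomHashSet) are those of Trove 3.x; older versions expose the strategy hook differently.

import gnu.trove.set.hash.TCustomHashSet;
import gnu.trove.strategy.HashingStrategy;

public class CaseInsensitiveSetDemo {
    // A custom strategy: hash and compare strings ignoring case.
    static final class CaseInsensitiveStrategy implements HashingStrategy<String> {
        @Override
        public int computeHashCode(String s) {
            return s.toLowerCase().hashCode();
        }

        @Override
        public boolean equals(String a, String b) {
            return a.equalsIgnoreCase(b);
        }
    }

    public static void main(String[] args) {
        TCustomHashSet<String> set = new TCustomHashSet<>(new CaseInsensitiveStrategy());
        set.add("Trove");
        // Treated as a duplicate of "Trove" because of the custom strategy.
        System.out.println(set.add("TROVE"));  // false
        System.out.println(set.size());        // 1
    }
}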
The most important difference from the JDK Collections Framework is that in Trove4j all hash tables are built on plain arrays (open addressing) rather than on linked entry nodes. This saves a significant amount of memory, since there are no per-element link objects.
The documentation puts it roughly like this: open addressing removes the need to allocate a wrapper object for every entry, unlike the chaining used by the JDK hash tables, and Trove sets are not backed by a map, so they do not drag along the unused "values" array that java.util.HashSet carries.
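One practical consequence of the array-backed layout is that a primitive map has no null to return for a missing key; it hands back a configurable "no entry" sentinel instead. A small sketch, again assuming Trove 3.x class names:

import gnu.trove.map.hash.TIntDoubleHashMap;

public class NoEntryValueDemo {
    public static void main(String[] args) {
        // Backed by key/value arrays, so there is no Entry object and no null
        // to signal "absent": a sentinel value is returned instead.
        TIntDoubleHashMap map = new TIntDoubleHashMap();
        map.put(1, 10.0);

        System.out.println(map.get(1));               // 10.0
        System.out.println(map.get(999));             // the no-entry value (0.0 by default)
        System.out.println(map.getNoEntryValue());    // 0.0

        // Distinguish "absent" from "mapped to the sentinel" with containsKey.
        System.out.println(map.containsKey(999));     // false
    }
}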
Moving from JDK Collections to Trove4j
THashSet implements Set, so moving to it is the least painful. The remaining hash sets and all the maps do not implement the corresponding JDK interfaces; however, the gnu.trove.decorator package offers an exhaustive set of wrappers that adapt these classes to the standard interfaces.
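For example, a primitive map can be wrapped so that code expecting a java.util.Map keeps working. The sketch below assumes the Trove 3.x decorator class name (TIntDoubleMapDecorator); in older releases the decorators are named slightly differently.

import gnu.trove.decorator.TIntDoubleMapDecorator;
import gnu.trove.map.hash.TIntDoubleHashMap;

import java.util.Map;

public class DecoratorDemo {
    // Existing code that only knows about the JDK interface.
    static double sum(Map<Integer, Double> map) {
        double total = 0.0;
        for (double v : map.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        TIntDoubleHashMap primitiveMap = new TIntDoubleHashMap();
        primitiveMap.put(1, 1.5);
        primitiveMap.put(2, 2.5);

        // The decorator adapts the primitive map to java.util.Map<Integer, Double>
        // (boxing happens at the boundary, so use it only where the JDK interface is required).
        Map<Integer, Double> view = new TIntDoubleMapDecorator(primitiveMap);
        System.out.println(sum(view));  // 4.0
    }
}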
Getting the result
After I rewrote the lab using Trove4j, the computation sped up noticeably, and the assignment was safely handed in a couple of days later.
Subsequent application
Trove4j also made its way into a project at work: it quickly took root in a simulation module that performs a huge number of computational operations and became indispensable there.
Switching from HashSet and HashMap to their Trove counterparts cut memory consumption by about a third, and the code got faster as well.
PS
I wanted to publish this in the Java hub, but I don't have enough karma.