One rainy evening I was thinking about memory management in Java and how efficiently Java collections use it. So I did a simple experiment: how many entries can I insert into a map with 16 GB of RAM?
The purpose of this experiment is to study the internal memory overhead of collections, so I decided to use small keys and small values. All tests were conducted on 64-bit Linux Kubuntu 12.04, with 64-bit Oracle Java 1.7.0_09-b05 (HotSpot 23.5-b02). Compressed pointers (-XX:+UseCompressedOops) are enabled by default on this JVM.
The first test uses plain java.util.TreeMap. It keeps inserting numbers into the map until memory runs out. The JVM parameter for this test is -Xmx15G:
import java.util.*;

Map<Long, String> m = new TreeMap<>();
for (long counter = 0;; counter++) {
    m.put(counter, "");
    if (counter % 1_000_000 == 0) System.out.println(counter);
}
This example ran out of memory at 172 million entries. Toward the end, the process slowed down due to aggressive garbage-collector activity. On a second run, I replaced TreeMap with HashMap; it ended at 182 million.
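As a rough sanity check (my own back-of-the-envelope arithmetic, assuming each run consumed the whole 15 GB heap), the per-entry overhead works out to roughly 90 bytes:

// Approximate per-entry cost: heap size divided by entries inserted.
long heap = 15L * 1024 * 1024 * 1024;
System.out.println(heap / 172_000_000L); // TreeMap: ~93 bytes per entry
System.out.println(heap / 182_000_000L); // HashMap: ~88 bytes per entry

That is a lot for an 8-byte key and an empty string: most of it goes into boxed Long keys, per-entry node objects, and garbage-collector headroom.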
Default Java collections are not exactly memory efficient. So let's try a more memory-optimized one: I chose LongHashMap from MapDB, which uses primitive long keys and is optimized for a small memory footprint. JVM settings are again -Xmx15G:
import org.mapdb.*;

LongMap m = new LongHashMap();
for (long counter = 0;; counter++) {
    m.put(counter, "");
    if (counter % 1_000_000 == 0) System.out.println(counter);
}
This time the counter stopped at 276 million records. Again, near the end the process was slowed down by aggressive garbage-collector activity.
It seems this is the limit for heap-based collections; garbage collection simply brings too much overhead.
Time to roll out the real weapon :-). We can always move away from the heap, where the garbage collector will not see our data. Let me introduce MapDB: it provides TreeMap and HashMap backed by database storage. It supports various storage modes, including an off-heap option.
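To make the "off-heap" idea concrete (a minimal sketch of the general JVM mechanism, not MapDB's actual internals): data kept in direct ByteBuffers lives outside the garbage-collected heap, so the GC never scans it, and its size is limited by -XX:MaxDirectMemorySize rather than -Xmx.

import java.nio.ByteBuffer;

// One megabyte allocated outside the GC-managed heap.
ByteBuffer offHeap = ByteBuffer.allocateDirect(1024 * 1024);
offHeap.putLong(0, 42L);                // write a long at offset 0
System.out.println(offHeap.getLong(0)); // read it back: 42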
So let's run the previous example again, but this time with the Map stored off-heap. First, a few lines configure and open the database: direct-memory storage with transactions disabled. The next line creates a new Map backed by that database.
import org.mapdb.*;

DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .make();
Map m = db.getTreeMap("test");

for (long counter = 0;; counter++) {
    m.put(counter, "");
    if (counter % 1_000_000 == 0) System.out.println(counter);
}
This Map does not live on the heap, so it needs different JVM settings: -XX:MaxDirectMemorySize=15G -Xmx128M. Memory ran out at 980 million entries.
But MapDB can do better. The problem in the previous example is fragmentation: a B-Tree node changes its size on each insert, so it has to be relocated in the store over and over. The solution is to cache tree nodes before they are written (sketched after the result below), which reduces write fragmentation to a minimum. Change the DB configuration:
DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .asyncFlushDelay(100)
    .make();
Map m = db.getTreeMap("test");
Memory ran out at 1.738 billion records, after 31 minutes.
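As a rough illustration of the node-caching idea mentioned above (my own simplification, not MapDB's actual code): updated B-Tree nodes are parked in a write-behind buffer and flushed in batches, so a node that grows on every insert is written to the store only once per flush interval.

import java.util.*;

// Hypothetical write-behind cache for B-Tree nodes.
class NodeWriteCache {
    private final Map<Long, byte[]> dirtyNodes = new HashMap<>();

    // Repeated updates to the same node just overwrite the cached copy.
    void nodeUpdated(long nodeId, byte[] serializedNode) {
        dirtyNodes.put(nodeId, serializedNode);
    }

    // Called periodically, e.g. every 100 ms as with asyncFlushDelay(100).
    void flush() {
        for (Map.Entry<Long, byte[]> e : dirtyNodes.entrySet())
            writeToStore(e.getKey(), e.getValue());
        dirtyNodes.clear();
    }

    private void writeToStore(long nodeId, byte[] data) {
        // hypothetical store write; in MapDB this would hit the direct-memory store
    }
}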
MapDB can do even better: increase the B-Tree node size from 32 to 120 entries and enable transparent compression:
DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .asyncFlushDelay(100)
    .compressionEnable()
    .make();
Map m = db.createTreeMap("test", 120, false, null, null, null);
This example ran out of memory at 3.315 billion entries. It is slower due to compression, but it still completes in a few hours. With further optimizations (custom serializers and so on) I could probably push the number of entries to somewhere around 4 billion.
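For example (a hypothetical sketch, not MapDB's actual Serializer API): since every value in this test is the empty string, a special-purpose serializer could write zero bytes per value instead of the length prefix a generic string serializer needs.

import java.io.*;

// Hypothetical serializer for values known to always be "".
class EmptyStringSerializer {
    void serialize(DataOutput out, String value) throws IOException {
        // nothing to write: the value is always the empty string
    }
    String deserialize(DataInput in) throws IOException {
        return "";
    }
}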
You may ask how all those records fit into 16 GB of RAM. The answer is delta-key compression (a small sketch of the idea follows below). Also, inserting keys in increasing order is the best-case scenario for a B-Tree, and MapDB is slightly optimized for it. The worst-case scenario is inserting keys in random order.
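A minimal sketch of the delta-key idea (my own illustration, not MapDB's implementation): within a B-Tree node the keys are sorted, so each key can be stored as the difference from its predecessor, and for sequentially inserted keys that difference is a tiny number instead of a full 8-byte long.

// Sorted keys of one node, stored as deltas from the previous key.
long[] keys = {1000000L, 1000001L, 1000002L, 1000005L};
long previous = 0;
for (long key : keys) {
    long delta = key - previous; // prints 1000000, 1, 1, 3
    System.out.println(delta);   // a variable-length encoding stores the small deltas in 1 byte each
    previous = key;
}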
Delta-key compression is active by default in all the examples; Zlib compression was enabled only in the previous example. Here is the random-order test:
import java.util.*;
import org.mapdb.*;

DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .asyncFlushDelay(100)
    .make();
Map m = db.getTreeMap("test");
Random r = new Random();

for (long counter = 0;; counter++) {
    m.put(r.nextLong(), "");
    if (counter % 1_000_000 == 0) System.out.println(counter);
}
Even with random insertion order MapDB managed to store 651 million records, almost 4 times more than an ordinary heap-based collection.
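Pulling the numbers together (my own rough arithmetic, assuming every run consumed its full 15 GB limit), the approximate cost per entry of each variant was:

// Approximate bytes per entry for each run (15 GB / entries inserted).
long memory = 15L * 1024 * 1024 * 1024;
System.out.println(memory / 172_000_000L);   // java.util.TreeMap:    ~93 B
System.out.println(memory / 276_000_000L);   // LongHashMap:          ~58 B
System.out.println(memory / 980_000_000L);   // MapDB direct memory:  ~16 B
System.out.println(memory / 1_738_000_000L); // + async write cache:   ~9 B
System.out.println(memory / 3_315_000_000L); // + big nodes + Zlib:    ~5 B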
This little exercise does not prove all that much, and it shows only a few of the ways MapDB can be tuned. The most surprising part is that the insertion speed was excellent: MapDB can compete with in-memory collections.
github.com/jankotek/jdbm3