The ability to take a snapshot (or dump) of the memory of a Java virtual machine is a tool whose value is hard to overstate. The dump file contains copies of all Java objects that were in memory at the moment of the snapshot. The file format is well known, and many tools can work with it.
In my practice, analyzing JVM dumps has more than once helped me find the root cause of a complex problem.
Dumps come in different sizes, though. This time I am facing a dump 150 GB in size; my task is to analyze a problem identified in the process that produced it.
The application in which I am hunting for the problem is a hybrid of a DBMS and a continuous data processing system. All data is stored in memory as Java objects, so the heap can grow to an impressive size (my personal record is 400 GB).
I usually use JVisualVM for small dumps. But I suspect a dump of this size would be too much for JVisualVM, Eclipse Memory Analyzer, or any other profiler (although I did not actually try). Even copying a file of this size from the server to a local disk is a problem in itself.
When analyzing dumps in JVisualVM, I have often resorted to JavaScript to examine the object graph programmatically. Graphical tools are nice, but scrolling through millions of objects is not much fun; it is far more pleasant to explore an object graph with code than with a mouse.
A JVM dump is just a serialized object graph, and my task is to extract specific information from that graph. I do not really need a fancy user interface; an API for working with the object graph programmatically is the tool I actually need.
So how do you analyze a heap dump programmatically?
I began my research with the NetBeans profiler. NetBeans is open source and includes a visual heap dump analyzer (the same code is used in JVisualVM). The code for reading JVM dumps lives in a separate module, and the API it exposes is well suited for writing your own specialized analysis algorithms.
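To give a flavor of what such an analysis looks like, here is a minimal sketch built on the NetBeans heap API (the org.netbeans.lib.profiler.heap module); the raw-list cast is there because older versions of the API are not generified:

```java
import java.io.File;

import org.netbeans.lib.profiler.heap.Heap;
import org.netbeans.lib.profiler.heap.HeapFactory;
import org.netbeans.lib.profiler.heap.JavaClass;

public class ClassHistogram {

    public static void main(String[] args) throws Exception {
        // Open the .hprof file; with the stock NetBeans library this is
        // also the point where the on-disk auxiliary index gets built
        Heap heap = HeapFactory.createHeap(new File(args[0]));

        // Print a simple per-class instance histogram
        for (Object o : heap.getAllClasses()) {
            JavaClass cls = (JavaClass) o;
            System.out.println(cls.getName() + "\t" + cls.getInstancesCount());
        }
    }
}
```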
However, the NetBeans dump analyzer has a fundamental limitation. The library builds an auxiliary index of the dump in a temporary file, usually around 25% of the dump size. Worse, building this file takes time, and no query against the object graph is possible until the index is ready.
Having studied the code responsible for reading the dump, I decided that I could eliminate the temporary file by using a more compact index structure that fits in memory. My fork of the NetBeans profiler code is available on GitHub as a library. A few API functions do not work with the compact index implementation (for example, traversal of backward references), but they are not essential for my tasks.
Another major change from the original library was the addition of HeapPath notation.
HeapPath is an expression language for describing paths through an object graph; it borrows some ideas from XPath. It is useful both as a universal predicate language in graph-traversal code and as a simple tool for extracting data from a dump. HeapPath automatically converts strings, primitives, and some other simple types from JVM dump structures into regular Java objects.
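As a sketch of how this looks in practice, the snippet below pulls the path field out of every java.io.File instance in a dump. It assumes the HeapFactory.createFastHeap entry point and the HeapWalker helper from my fork; treat the exact signatures as illustrative rather than authoritative:

```java
import java.io.File;

import org.gridkit.jvmtool.heapdump.HeapWalker;
import org.netbeans.lib.profiler.heap.Heap;
import org.netbeans.lib.profiler.heap.HeapFactory;
import org.netbeans.lib.profiler.heap.Instance;
import org.netbeans.lib.profiler.heap.JavaClass;

public class FilePathDump {

    public static void main(String[] args) throws Exception {
        // createFastHeap uses the compact in-memory index,
        // so no temporary index file is written to disk
        Heap heap = HeapFactory.createFastHeap(new File(args[0]));

        JavaClass fileClass = heap.getJavaClassByName("java.io.File");
        for (Object o : fileClass.getInstances()) {
            Instance instance = (Instance) o;
            // The HeapPath expression "path" resolves the field and
            // converts the dumped String structure into a plain
            // java.lang.String
            String path = HeapWalker.valueOf(instance, "path");
            System.out.println(path);
        }
    }
}
```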
This library has been very useful in our daily work. One application was a memory-usage reporting tool for our product, which automatically measures the size of the auxiliary structures in each of its relational transformation nodes (of which there may be hundreds).
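The core of such a report can be quite small. The sketch below sums the shallow sizes of all instances of one class; the class name com.example.RelationalNode is a placeholder, not something from our actual product:

```java
import java.io.File;

import org.netbeans.lib.profiler.heap.Heap;
import org.netbeans.lib.profiler.heap.HeapFactory;
import org.netbeans.lib.profiler.heap.Instance;
import org.netbeans.lib.profiler.heap.JavaClass;

public class NodeMemoryReport {

    public static void main(String[] args) throws Exception {
        Heap heap = HeapFactory.createFastHeap(new File(args[0]));

        // "com.example.RelationalNode" is a hypothetical class name
        JavaClass nodeClass = heap.getJavaClassByName("com.example.RelationalNode");

        long totalSize = 0;
        int count = 0;
        for (Object o : nodeClass.getInstances()) {
            // Shallow size only; retained-size calculation would need
            // the backward-reference index that the compact mode omits
            totalSize += ((Instance) o).getSize();
            count++;
        }
        System.out.println(count + " nodes, " + (totalSize >> 20) + " MiB shallow size");
    }
}
```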
Of course, an API plus Java is not the best tool for interactive "free search". But it lets me do my job, and a 150 GB dump leaves no other choice.
Several iterations of Java coding, script runs, and analysis of the results, and a couple of hours later I know exactly what broke in our data structures. That is the end of the work with the dump; now the problem has to be tracked down in the code.
By the way: a single pass over the 150 GB heap takes about 5 minutes. A real analysis usually requires several passes, but even so the processing time stays acceptable.
In conclusion, I would like to show how the library can be used on less exotic software.
On GitHub, there are examples showing how to analyze heap dumps of a JBoss server.