
Understanding garbage collection and memory leak detection in Node.js

Criticism of Node.js in the press often focuses on performance problems. This does not mean that Node.js has more problems than other technologies; users simply need to keep some of its characteristics in mind. Although the technology has a gentle learning curve, the mechanisms that keep it running are quite complex. You need to understand them to avoid performance pitfalls, and if something goes wrong, you need to know how to fix it quickly. In this article, Daniel Hahn explains how Node.js manages memory and how to track down memory-related problems.



Unlike platforms such as PHP, Node.js applications are long-running processes. This has a number of advantages, for example the ability to connect to the database once and reuse that connection for all queries. But it can also cause problems. First, let's look at the basics of Node.js.


Real Austrian garbage collector
Node.js is a C++ program controlled by the V8 JavaScript engine

Google V8 is an engine that was originally written for Google Chrome but can also be used standalone. That makes it ideal for Node.js, and it is in fact the only part of the platform that "understands" JavaScript. V8 compiles JavaScript into machine code and executes it. During execution, the engine allocates and frees memory as needed. This means that when we talk about memory management in Node.js, we are really talking about V8.

Here you can see a simple example of how to use V8 from the C++ side.

V8 memory scheme

A running program is always represented by a certain amount of space allocated in memory. This space is called the Resident Set. V8 uses a scheme similar to that of the Java Virtual Machine and divides the memory into segments:

Code: the code that is currently being executed.
Stack: contains all primitive value types (such as integers or Booleans), along with pointers that reference objects on the heap and pointers that define the control flow of the program.
Heap: a memory segment for storing reference types such as objects, strings, and closures.


V8 memory scheme

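In Node.js, current memory usage data can be obtained by calling process.memoryUsage().

The function returns an object containing:

rss: the resident set size, in bytes.
heapTotal: the total size of the heap, in bytes.
heapUsed: the part of the heap that is actually in use, in bytes.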


This function can be used to record memory usage over time and plot a graph that shows how V8 manages memory.
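As a rough illustration of recording memory usage over time, here is a minimal sketch. The helper names sampleMemory and recordMemory, the sample count, and the interval are assumptions made for this example; only process.memoryUsage() comes from the text.

```javascript
// Hypothetical helper: takes one memory reading, converted to megabytes.
function sampleMemory() {
  const usage = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / 1048576) * 100) / 100;
  return {
    time: Date.now(),
    rssMB: toMB(usage.rss),
    heapTotalMB: toMB(usage.heapTotal),
    heapUsedMB: toMB(usage.heapUsed),
  };
}

// Hypothetical helper: collects `count` readings, `intervalMs` apart,
// and passes the resulting array to `done`.
function recordMemory(count, intervalMs, done) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push(sampleMemory());
    if (samples.length >= count) {
      clearInterval(timer);
      done(samples);
    }
  }, intervalMs);
}

// Print five readings, one every 200 ms, as comma-separated lines
// that a plotting tool could consume.
recordMemory(5, 200, (samples) => {
  for (const s of samples) {
    console.log([s.time, s.rssMB, s.heapTotalMB, s.heapUsedMB].join(','));
  }
});
```

In a real application the interval would run for the lifetime of the process and the readings would go to a log or a metrics system rather than to the console.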


Node.js memory usage over time

We can see that the amount of heap space in use fluctuates heavily but always stays within certain limits, so that average consumption remains constant. The process that allocates and frees this memory is called garbage collection.

Introduction to Garbage Collection

Every program that consumes memory needs a mechanism for reserving and freeing it. In C and C++, this is done with the malloc() and free() functions, as shown in the example below:

char *buffer;
buffer = (char *) malloc(42);
// Do something with buffer
free(buffer);


We see that the programmer is responsible for freeing memory that is no longer needed. If a program only allocates memory and never frees it, the heap grows until the available memory is exhausted, and the program crashes. We call this a memory leak.

As we already know, in Node.js JavaScript is compiled into native code by V8. The data structures that result from compilation have little to do with their original representation and are managed entirely by V8. This means that we cannot actively allocate or free memory in JavaScript. To solve this problem, V8 uses a well-known mechanism: garbage collection.

The principle of garbage collection is quite simple: if nothing refers to a memory segment, we can assume that it is not in use and free it. However, obtaining and maintaining this information is rather complicated, since the code may contain chained references and indirections that form a complex graph structure.


Heap graph. The red object can be deleted only once nothing references it any longer.
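The idea can be sketched with a toy model. This is not V8's implementation: the heap is modelled as a plain map from made-up object names to the names they reference, and anything that cannot be reached from the root is treated as garbage, which is how tracing collectors decide what is still in use.

```javascript
// Toy heap: each key is an object, each value lists the objects it references.
// All names are invented for this illustration.
const heap = {
  root: ['a', 'b'],
  a: ['c'],
  b: [],
  c: ['a'],             // cycle a -> c -> a, still reachable through root
  orphan1: ['orphan2'],
  orphan2: ['orphan1'], // cycle that nothing reachable points to
};

// Returns the set of objects reachable from the given roots.
function mark(heap, roots) {
  const reachable = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const id = pending.pop();
    if (reachable.has(id)) continue;
    reachable.add(id);
    pending.push(...heap[id]);
  }
  return reachable;
}

// Everything that was not marked is garbage and could be freed.
function findGarbage(heap, roots) {
  const reachable = mark(heap, roots);
  return Object.keys(heap).filter((id) => !reachable.has(id));
}

console.log(findGarbage(heap, ['root'])); // [ 'orphan1', 'orphan2' ]
```

Note that the two orphans reference each other and are still collected, because the decision is based on reachability from the root rather than on a simple reference count.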

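Garbage collection is a rather expensive process because it interrupts the execution of the application, which naturally affects performance. To mitigate this, V8 uses two types of garbage collection:

Scavenge: fast but incomplete; it only collects recently allocated objects.
Mark-Sweep: relatively slow, but frees all memory that is no longer referenced.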

An excellent post containing very detailed information about garbage collection can be found at this link.

Now, looking at the graph obtained with process.memoryUsage(), you can easily tell the two types of garbage collection apart: the sawtooth pattern is the work of Scavenge, and the sharp drops are the work of Mark-Sweep.

Using the native node-gc-profiler module, you can get even more information about the work of the garbage collector. The module subscribes to garbage collector events and exposes them to JavaScript.

The returned object indicates the type of garbage collection and its duration. Again, this data can easily be plotted to make it clearer how things work.
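The node-gc-profiler call itself is not reproduced here. As a rough alternative, assuming a reasonably recent Node.js version, the built-in perf_hooks module reports comparable data (kind and duration of each collection). The helper gcKindName and its labels are assumptions for this sketch; the labels simply map the perf_hooks constants onto the terms used in this article.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Hypothetical helper: maps the numeric GC kind to a readable label.
function gcKindName(kind) {
  switch (kind) {
    case constants.NODE_PERFORMANCE_GC_MINOR: return 'Scavenge';
    case constants.NODE_PERFORMANCE_GC_MAJOR: return 'Mark-Sweep';
    case constants.NODE_PERFORMANCE_GC_INCREMENTAL: return 'Incremental marking';
    case constants.NODE_PERFORMANCE_GC_WEAKCB: return 'Weak callbacks';
    default: return 'Unknown';
  }
}

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Newer Node.js versions expose the kind on entry.detail, older ones on entry.kind.
    const kind = entry.detail ? entry.detail.kind : entry.kind;
    console.log(gcKindName(kind) + ': ' + entry.duration.toFixed(2) + ' ms');
  }
});
observer.observe({ entryTypes: ['gc'] });

// Produce some garbage so that the collector has work to do, then stop observing.
let junk = [];
for (let i = 0; i < 200000; i++) junk.push({ index: i });
junk = null;
setTimeout(() => observer.disconnect(), 500);
```

Whether and when a collection happens during the half second is up to V8, so the number of lines printed will vary from run to run.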


Duration and frequency of launches of the garbage collector

You can clearly see that Scavenge runs much more often than Mark-Sweep. The duration varies with the complexity of the application. Notably, the graph also shows frequent, short Mark-Sweep runs whose purpose is not yet clear to me.

When something goes wrong

If the garbage collector frees memory for us, why should we worry? Because memory leaks can still easily show up in your logs.


Memory Leak Exception

Using the graph we created earlier, we can watch memory steadily filling up!


Memory leak progress

The garbage collector does everything it can to free memory. But after each run, memory consumption is higher than before, which is a clear sign of a memory leak. Now that we know how to detect a memory leak, let's see what it takes to cause one.
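The "consumption rises after every run" signal can be expressed as a small check. This is a deliberately crude sketch: the function name keepsGrowing, the minSamples parameter, and the sample numbers are all made up for illustration, and a production check would need to tolerate noise.

```javascript
// Hypothetical helper: `readings` holds heapUsed values (in MB) sampled right
// after successive garbage collections, oldest first. Returns true when every
// reading is higher than the one before it, i.e. the collector never manages
// to bring usage back down.
function keepsGrowing(readings, minSamples) {
  if (readings.length < minSamples) return false;
  for (let i = 1; i < readings.length; i++) {
    if (readings[i] <= readings[i - 1]) return false;
  }
  return true;
}

console.log(keepsGrowing([40, 52, 61, 75, 90], 5)); // true
console.log(keepsGrowing([40, 52, 41, 55, 42], 5)); // false
```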

Creating a memory leak

Some leaks are obvious, such as storing data in global variables (for example, appending the IP address of every logged-in user to an array). Others are less noticeable, such as the well-known Walmart memory leak, caused by a tiny missing statement in the Node.js core code, which took weeks to track down.

I am not going to look at errors in the core code here. Instead, let's look at a hard-to-find leak from the Meteor blog that you could easily introduce into your own code.


Introducing a leak into your JavaScript code

At first glance it looks fine. One would think that theThing is overwritten on every call to replaceThing(). The problem is that someMethod has its enclosing scope as its context. This means that someMethod() knows about unused(), and even though unused() is never called, that is enough to prevent the garbage collector from freeing the memory held by originalThing. There are simply too many levels of indirection. This is not a bug, but it can lead to memory leaks that are difficult to track down.
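A sketch of the pattern being described, reconstructed from the identifier names in the text (theThing, replaceThing, originalThing, unused, someMethod, longStr). The string size, the empty body of someMethod, and the fixed loop count are assumptions made so that the example terminates on its own.

```javascript
var theThing = null;

var replaceThing = function () {
  var originalThing = theThing;

  // Never called, but it references originalThing, so originalThing ends up
  // in the closure context shared by every function created in this scope.
  var unused = function () {
    if (originalThing) console.log('hi');
  };

  theThing = {
    longStr: new Array(1000000).join('*'),
    // someMethod shares that context with unused, so it keeps the previous
    // theThing (and its longStr) alive.
    someMethod: function () {}
  };
};

// Each call adds one more link to the chain of retained objects.
for (var i = 0; i < 20; i++) {
  replaceThing();
}
console.log('heapUsed (MB):', Math.round(process.memoryUsage().heapUsed / 1048576));
```

In a long-running process the same function would typically be invoked on a timer or per request, so the chain keeps growing for as long as the process lives.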

Wouldn't it be great if you could look into the heap and see what is in there right now? Fortunately, you can! V8 lets you dump the heap at any given moment, and v8-profiler exposes this functionality to JavaScript.

/**
 * Simple userland heapdump generator using v8-profiler
 * Usage: require('[path_to]/HeapDump').init('datadir')
 *
 * @module HeapDump
 * @type {exports}
 */

var fs = require('fs');
var profiler = require('v8-profiler');
var _datadir = null;
var nextMBThreshold = 0;

/**
 * Init and schedule heap dump runs
 *
 * @param datadir Folder to save the data to
 */
module.exports.init = function (datadir) {
    _datadir = datadir;
    setInterval(tickHeapDump, 500);
};

/**
 * Schedule a heapdump by the end of next tick
 */
function tickHeapDump() {
    setImmediate(function () {
        heapDump();
    });
}

/**
 * Creates a heap dump if the current memory threshold is exceeded
 */
function heapDump() {
    var memMB = process.memoryUsage().rss / 1048576;
    console.log(memMB + '>' + nextMBThreshold);

    if (memMB > nextMBThreshold) {
        console.log('Current memory usage: %j', process.memoryUsage());
        // Raise the bar so the next dump happens only after another 50 MB of growth
        nextMBThreshold += 50;
        var snap = profiler.takeSnapshot('profile');
        saveHeapSnapshot(snap, _datadir);
    }
}

/**
 * Saves a given snapshot
 *
 * @param snapshot Snapshot object
 * @param datadir Location to save to
 */
function saveHeapSnapshot(snapshot, datadir) {
    var buffer = '';
    var stamp = Date.now();
    snapshot.serialize(
        function iterator(data, length) {
            buffer += data;
        },
        function complete() {
            var name = stamp + '.heapsnapshot';
            fs.writeFile(datadir + '/' + name, buffer, function (err) {
                if (err) {
                    console.error('Could not write heap snapshot ' + name, err);
                    return;
                }
                console.log('Heap snapshot written to ' + name);
            });
        }
    );
}


This simple module creates a heap dump file each time memory usage (RSS) crosses the next 50 MB threshold, that is, whenever it keeps growing. Yes, there are far more sophisticated approaches to anomaly detection, but for our purposes this is enough. In the event of a memory leak, you may end up with many such files, so you need to keep a close eye on this and add alerting to the module. Chrome provides the same heap dump functionality, and you can use Chrome Developer Tools to analyze the dumps produced by v8-profiler.


Chrome Developer Tools

A single heap dump may not help, because it does not show how the heap changes over time. That is why Chrome Developer Tools lets you compare different dumps. Comparing two dumps gives a delta that shows which structures grew between them:


A comparison of the dumps reveals our leak.

Here we see our problem. The variable longStr, which contains a string of asterisks, is referenced by originalThing, which is referenced by someMethod, which is referenced by... you get the idea. It is a long chain of nested references and closure contexts that prevents longStr from being freed. Although this example produces obvious results, the process is always the same: take heap dumps at intervals, then compare them to find out what keeps growing.



Conclusion

As you can see, garbage collection is a fairly complex process, and even valid code can cause memory leaks. Using the built-in V8 functionality together with Chrome Developer Tools, you can find out what causes a leak, and if you build this functionality into your application, you will have everything you need to solve such a problem when it occurs.

One question remains: how do you fix the leak? The answer is simple: add originalThing = null; at the end of replaceThing(), so that the closure context no longer holds on to the previous object, and you are saved.
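A sketch of the fix applied to an example shaped like the one discussed above. The identifier names come from the text; the string size, the empty someMethod body, and the loop count are assumptions. It clears originalThing, the local variable captured by the shared closure context, as the last statement of replaceThing(). This breaks the chain of retained objects while theThing keeps pointing at the newest object.

```javascript
var theThing = null;

var replaceThing = function () {
  var originalThing = theThing;

  // Still never called, and still sharing a closure context with someMethod.
  var unused = function () {
    if (originalThing) console.log('hi');
  };

  theThing = {
    longStr: new Array(1000000).join('*'),
    someMethod: function () {}
  };

  // The fix: the shared context now holds null instead of the previous
  // object, so the previous object becomes unreachable and can be collected.
  originalThing = null;
};

for (var i = 0; i < 20; i++) {
  replaceThing();
}
console.log('heapUsed (MB):', Math.round(process.memoryUsage().heapUsed / 1048576));
```

Comparing heap dumps taken before and after such a change is a reasonable way to confirm that the chain of longStr instances no longer grows.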

Source: https://habr.com/ru/post/277129/

