Hello, colleagues.
Our long search for timeless bestsellers on code optimization is only beginning to yield results, but we are pleased to announce that we have just finished translating Ben Watson's legendary book "Writing High Performance .NET Code". It should reach stores around April; watch for announcements.
Today we offer you a thoroughly practical article on the most pressing kinds of memory leaks, written by Nelson Elhage of Stripe.
So, you have a program that consumes more and more memory the longer it runs. It is probably not hard for you to guess that this is a sure sign of a memory leak.
But what exactly do we mean by "memory leak"? In my experience, obvious memory leaks fall into three main categories, each with its own characteristic behavior, and each requiring its own tools and techniques to debug. In this article I want to describe all three classes and suggest how to recognize which class you are dealing with and how to find the leak.
Type (1): an allocated fragment of memory that is no longer reachable
This is the classic memory leak in C/C++. Someone allocated memory with new or malloc and never called free or delete to release it at the end of working with it.
#include <stdlib.h>

void leak_memory() {
    char *leaked = malloc(4096);  /* allocated on the heap... */
    use_a_buffer(leaked);         /* ...used... */
}                                 /* ...and never freed: the pointer is lost here */
How to determine that a leak belongs to this category
- If you write in C or C++, especially C++ without ubiquitous use of smart pointers to manage the lifetimes of allocations, this is the first option to consider.
- If the program runs in a garbage-collected environment, a leak of this type may still be caused by a native-code extension (a minimal sketch of that case follows this list), but you should first rule out leaks of types (2) and (3).
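To make the native-extension case a bit more concrete, here is a minimal sketch in Python, using ctypes as a stand-in for a real extension module (the libc path is Linux-specific, and the function name leak_native_memory is made up for illustration): the memory allocated by malloc below is invisible to Python's garbage collector, so no amount of gc.collect() will ever reclaim it.

import ctypes

# Linux-specific; other platforms name the C library differently.
libc = ctypes.CDLL("libc.so.6")
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]

def leak_native_memory(n_chunks=1000, chunk_size=4096):
    for _ in range(n_chunks):
        # The returned pointer is dropped immediately. Python's GC knows
        # nothing about this allocation, and nothing ever calls libc.free().
        libc.malloc(chunk_size)

leak_native_memory()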
How to find such a leak
- Use ASAN. Use ASAN. Use ASAN.
- Use another leak detector. I have tried Valgrind and tcmalloc's heap-checking tools; other environments have their own tools.
- Some memory allocators can dump a heap profile showing every chunk that has not yet been freed. If you have a leak, then after a while nearly all of the live allocations will come from it, so finding it is usually not hard.
- If all else fails, take a memory dump and study it as thoroughly as you can. But you should definitely not start with this.
Type (2): unplanned long-lived memory allocations
Situations of this kind are not "leaks" in the classical sense of the word, because a reference to that area of memory is still kept somewhere, so it can eventually be released (if the program gets around to it before exhausting all of its memory).
Situations in this category can arise for many specific reasons. The most common are:
- Inadvertent accumulation of state in a global structure; for example, an HTTP server that appends every Request object it receives to a global list (see the sketch after this list).
- Caches without a well-thought-out expiration policy. For example, an ORM cache that caches every loaded object and is active during a migration that loads every record in the table.
- Too much state captured in a closure. This case is especially common in JavaScript, but it can occur in other environments as well.
- More broadly, inadvertently retaining every element of an array or stream when the intent was to process those elements in a streaming fashion.
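As a minimal sketch of the first two causes (the names handle_request, load_record and the stub fetch function are hypothetical, purely for illustration): a global list and an unbounded cache both keep every object they have ever seen reachable for the lifetime of the process.

# A global "log" of requests that only ever grows.
_seen_requests = []

class Request:
    def __init__(self, body):
        self.body = body                   # possibly a large payload

def handle_request(request):
    _seen_requests.append(request)         # every request stays reachable forever
    return len(request.body)               # stand-in for real work

# An unbounded cache has the same effect: nothing is ever evicted.
_record_cache = {}

def load_record(record_id, fetch=lambda rid: bytearray(1024)):
    if record_id not in _record_cache:
        _record_cache[record_id] = fetch(record_id)
    return _record_cache[record_id]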
How to determine that a leak belongs to this category
- If the program runs in a garbage-collected environment, this is the first option to consider.
- Compare the heap size reported by the garbage collector's statistics with the amount of memory the operating system says the process is using. If the leak falls into this category, the two numbers will be comparable and, most importantly, will track each other over time.
How to find such a leak
Use whatever heap profilers or heap-dump tools exist for your environment. I know of guppy in Python and memory_profiler in Ruby, and I have used ObjectSpace directly in Ruby myself.
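With guppy (the same module used in the listing further below; guppy3 on Python 3), you can, for instance, break the live heap down by type and see what is actually accumulating. A rough sketch:

from guppy import hpy

hp = hpy()
hp.setrelheap()        # measure only what is allocated from this point on

# Simulate unplanned accumulation of long-lived objects.
retained = [bytearray(1024) for _ in range(10000)]

heap = hp.heap()
print(heap)            # live objects broken down by kind, largest first
print("total size:", heap.size, "bytes")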
Type (3): free but unused or unusable memory
This category is the hardest to characterize, but it is the most important one to understand and account for.
This type of leak lives in the gray area between memory that is considered "free" by the allocator inside the VM or language runtime, and memory that is "free" from the operating system's point of view. The most common (but not the only) cause is heap fragmentation. Some allocators also simply never return memory to the operating system once it has been requested.
A case of this kind can be seen in a short Python program:
import sys
from guppy import hpy

hp = hpy()

def rss():
    return 4096 * int(open('/proc/self/stat').read().split(' ')[23])

def gcsize():
    return hp.heap().size

rss0, gc0 = (rss(), gcsize())

buf = [bytearray(1024) for i in range(200*1024)]
print("start rss={} gcsize={}".format(rss()-rss0, gcsize()-gc0))

buf = buf[::2]
print("end rss={} gcsize={}".format(rss()-rss0, gcsize()-gc0))
We allocate 200,000 buffers of 1 KB each and then keep only every second one, printing the state of memory as seen by the operating system and as seen by Python's own garbage collector.
I get something like this on my laptop:
start rss=232222720 gcsize=11667592
end rss=232222720 gcsize=5769520
We can see that Python really did free half of the buffers, because the gcsize value dropped to nearly half of its peak, yet it could not return a single byte of that memory to the operating system. The freed memory remains available to this same Python process, but to no other process on the machine.
Such free-but-unused chunks of memory can be either problematic or harmless. If the Python program then goes on to allocate another batch of 1 KB fragments, that space is simply reused and all is well.
But if this happened during initial setup and the program allocates very little afterwards, or if every fragment allocated later is 1.5 KB and does not fit into the previously freed buffers, then all the memory allocated this way will sit idle forever, wasted. (The sketch below extends the listing above to check which case you are in.)
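A rough continuation of the listing above (the exact numbers depend on your allocator and Python version, so treat this only as an experiment to run, not a guaranteed result):

# Append to the previous listing: allocate new buffers and watch whether
# RSS grows again or the previously freed space gets reused.
small = [bytearray(1024) for _ in range(100*1024)]   # same size: likely reuses the holes
print("after 1 KB refill   rss={} gcsize={}".format(rss()-rss0, gcsize()-gc0))

big = [bytearray(1536) for _ in range(100*1024)]     # 1.5 KB: may not fit the old holes
print("after 1.5 KB allocs rss={} gcsize={}".format(rss()-rss0, gcsize()-gc0))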
Problems of this kind are especially acute in one particular environment: multi-process server setups for languages such as Ruby or Python.
Suppose we set up a system in which:
- Each server runs N single-threaded workers that handle requests concurrently. Let's take N = 10 for concreteness.
- Each worker normally uses a roughly constant amount of memory. For concreteness, let's say 500 MB.
- At some low rate we receive requests that need much more memory than the median request. For concreteness, let's say that once a minute we receive a request that needs an extra 1 GB of memory while it is being processed, and that this memory is released once processing completes.
Once a minute such a "whale" request arrives, and we assign it to one of the 10 workers, say at random. Ideally, while handling that request the worker would allocate 1 GB of RAM and, once the work is done, return that memory to the operating system so it can be reused later. To keep processing requests this way indefinitely, the server would need only 10 * 500 MB + 1 GB = 6 GB of RAM.
Now suppose that, because of fragmentation or for some other reason, the virtual machine never returns this memory to the operating system. That is, the amount of RAM it demands from the OS equals the largest amount of memory it has ever had to hold at one time. In that case, whenever a particular worker happens to serve one of these resource-hungry requests, that process's memory footprint swells by a whole gigabyte, forever.
When the server starts, you will see total memory usage of 10 * 500 MB = 5 GB. As soon as the first large request arrives, the first worker grabs 1 GB of memory and never gives it back, and total usage jumps to 6 GB. Subsequent whale requests will sometimes land on the process that has already handled one, in which case total memory usage does not change. But sometimes the whale lands on a different worker, and memory grows by another 1 GB, and so on, until every worker has handled such a large request at least once. At that point you are using up to 10 * (500 MB + 1 GB) = 15 GB of RAM, far more than the ideal 6 GB! Moreover, if you watch the fleet over time, you will see memory usage gradually creep from 5 GB toward 15 GB, which looks very much like a "real" leak. (The small simulation below reproduces this arithmetic.)
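The arithmetic is easy to reproduce with a tiny simulation, using the hypothetical numbers from the text (10 workers, a 500 MB baseline, one 1 GB "whale" per minute, and memory never returned to the OS):

import random

N_WORKERS = 10
BASELINE_MB = 500
WHALE_MB = 1024

# High-water mark of each worker's RSS, assuming freed memory is
# never returned to the operating system.
rss_mb = [BASELINE_MB] * N_WORKERS

def handle_whale():
    w = random.randrange(N_WORKERS)          # whale goes to a random worker
    rss_mb[w] = max(rss_mb[w], BASELINE_MB + WHALE_MB)

for minute in range(120):                    # two hours, one whale per minute
    handle_whale()
    if minute in (0, 10, 30, 119):
        print("minute {:3d}: fleet RSS = {} MB".format(minute, sum(rss_mb)))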
How to determine that a leak belongs to this category
- Compare the heap size reported by the garbage collector's statistics with the amount of memory the operating system says the process is using. If the leak falls into this (third) category, the numbers will diverge over time.
- I like to configure my application servers so that both of these numbers are periodically reported to my time-series infrastructure, which makes it easy to graph them.
- On Linux, get the operating system's view from field 24 of /proc/self/stat, and the allocator's view through the language- or VM-specific API. (A minimal sketch of sampling both follows this list.)
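In Python, for example, the two numbers can be sampled side by side along the lines of the earlier listing; a minimal sketch (the sys.getsizeof sum is a crude approximation of the runtime's view, and guppy's hp.heap().size is a more thorough alternative):

import gc
import os
import sys

def os_rss_bytes():
    # Field 24 (1-indexed) of /proc/self/stat is the resident set size in pages.
    pages = int(open('/proc/self/stat').read().split(' ')[23])
    return pages * os.sysconf("SC_PAGE_SIZE")

def runtime_tracked_bytes():
    # Rough size of the objects the Python runtime itself knows about.
    return sum(sys.getsizeof(o) for o in gc.get_objects())

print("OS view (RSS):      {} bytes".format(os_rss_bytes()))
print("runtime view (GC):  {} bytes".format(runtime_tracked_bytes()))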
How to find such a leak
As already mentioned, this category is somewhat more insidious than the previous ones, because the problem often arises even when every component is working "as intended". Still, there are a number of good practices that can help mitigate or reduce the impact of such "virtual leaks":
- Restart your processes more often. If the problem grows slowly, restarting all of the application's processes every 15 minutes or every hour may be all it takes.
- A more radical variation: teach the processes to restart themselves as soon as their memory footprint exceeds some threshold or grows by some specified amount. But take care that your whole server fleet cannot fall into a spontaneous, synchronized restart loop.
- Change the memory allocator. Over the long run, tcmalloc and jemalloc usually handle fragmentation much better than the default allocator, and the LD_PRELOAD environment variable makes experimenting with them very easy.
- Find out whether you have individual requests that consume far more memory than the others. At Stripe, the API servers measure RSS (resident set size) before and after serving each API request and log the delta. We can then easily query our log-aggregation systems to see whether there are endpoints or users (and whether there are patterns) that the memory-consumption spikes can be attributed to. (A sketch of this kind of per-request measurement follows this list.)
- Tune the garbage collector or memory allocator. Many of them expose parameters controlling how aggressively memory is returned to the operating system, how hard they try to avoid fragmentation, and other useful knobs. This, too, is tricky territory: make sure you understand exactly what you are measuring and optimizing, and try to find an expert on the virtual machine in question to consult.
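A sketch of that per-request measurement in Python, as a hypothetical WSGI middleware (not Stripe's actual code; the RSS helper is Linux-specific, as in the earlier listings):

import logging

def rss_bytes(page_size=4096):
    # Resident set size from /proc/self/stat, as in the earlier listings.
    return page_size * int(open('/proc/self/stat').read().split(' ')[23])

class MemoryDeltaMiddleware:
    """Log how much RSS each request adds (hypothetical example)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        before = rss_bytes()
        try:
            return self.app(environ, start_response)
        finally:
            delta = rss_bytes() - before
            logging.info("path=%s rss_delta=%d bytes",
                         environ.get("PATH_INFO"), delta)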