A few stories about memory leaks. Most such problems are fairly trivial: easily reproduced, easily detected with the appropriate tools, and easily fixed. But sometimes a problem turns out to be unusual and requires an unusual approach or solution...
Memory Code Loader
Regular QA testing revealed a small but persistent memory leak under load in a new version of the product. Reproducing it was not a problem, but an initial run of the program under Valgrind did not reveal any leaks.
After some thought, the --leak-check=full option was turned on, and Valgrind began reporting leaks. But among the expected ones - various static variables that are often never deliberately released - there was nothing resembling the real leak. Usually, after several thousand iterations it is easy to spot a comparable number of lost blocks allocated by malloc; in this case, the leakage per 10,000 requests to the server was minimal.
After analyzing the allocation stacks and filtering out the expected cases, only one candidate remained, responsible for several dozen lost blocks - a number that did not correlate with 10,000 requests at all. There was an explanation, though: the allocations occurred inside the STL string class, which actively uses a memory pool to reduce the number of allocations. As a result, instead of 10,000 lost memory blocks Valgrind reported 40+. The call stack looked like this:
==15882== 76,400 bytes in 8 blocks are definitely lost in loss record 2 of 3
==15882== at 0x401B007: operator new(unsigned int) (vg_replace_malloc.c:214)
==15882== by 0x40A40F0: std::__default_alloc_template<true, 0>::_S_chunk_alloc(unsigned int, int&) (in /usr/lib/libstdc++.so.5.0.3)
==15882== by 0x40A3FFC: std::__default_alloc_template<true, 0>::_S_refill(unsigned int) (in /usr/lib/libstdc++.so.5.0.3)
==15882== by 0x40A3B6B: std::__default_alloc_template<true, 0>::allocate(unsigned int) (in /usr/lib/libstdc++.so.5.0.3)
==15882== by 0x40A9B67: std::string::_Rep::_S_create(unsigned int, std::allocator<char> const&) (in /usr/lib/libstdc++.so.5.0.3)
==15882== by 0x40A9C98: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned int) (in /usr/lib/libstdc++.so.5.0.3)
==15882== by 0x40A7A05: std::string::reserve(unsigned int) (in /usr/lib/libstdc++.so.5.0.3)
==15882== by 0x8049826: std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char,
std::char_traits<char>, std::allocator<char> >(char const*, std::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&) (basic_string.tcc:619)
==15882== by 0x804956A: A::A(A const&) (class_a.cpp:20)
==15882== by 0x80491BC: foo(int) (test.cpp:23)
==15882== by 0x80492EA: main (test.cpp:32)
It seemed the source of the memory leak had been found, but the code looked absolutely innocent - a function call with a temporary copy of an object passed to it:
doSomething( condition ? Object( params ) : getObject() );
We were already sure that memory was being lost on this line, so we started looking at the code the compiler generated for it. Everything seemed to be in place: a call to basic_string::length(), a call to the parameter class constructor for one branch of the condition, calls to Parent::getB() and the copy constructor for the other branch, a call to the actual function A::create(), destruction of temporary objects - everything except a destructor call for the temporary object created on the stack, which held a copy of the string inside and, as a result, was never released!
110:    return A::create(b1, b2, s.length() > 0 ? B(s) : getB());
13d4:  83 ec 0c             sub    $0xc,%esp
13d7:  8d 45 d8             lea    0xffffffd8(%ebp),%eax
13da:  50                   push   %eax
13db:  e8 fc ff ff ff       call   std::basic_string<wchar_t>::length() const   <<=========
13e0:  83 c4 10             add    $0x10,%esp
13e3:  85 c0                test   %eax,%eax
13e5:  74 18                je     13ff <A::create+0x341>
13e7:  83 ec 08             sub    $0x8,%esp
13ea:  8d 45 d8             lea    0xffffffd8(%ebp),%eax
13ed:  50                   push   %eax
13ee:  8d 85 f8 fe ff ff    lea    0xfffffef8(%ebp),%eax
13f4:  50                   push   %eax
13f5:  e8 fc ff ff ff       call   B::B( std::basic_string<wchar_t> const & )   <<=========
13fa:  83 c4 10             add    $0x10,%esp
13fd:  eb 21                jmp    1420 <A::create+0x362>
13ff:  83 ec 08             sub    $0x8,%esp
1402:  83 ec 04             sub    $0x4,%esp
1405:  ff 75 0c             pushl  0xc(%ebp)
1408:  e8 fc ff ff ff       call   Parent::getB()   <<=========
140d:  83 c4 08             add    $0x8,%esp
1410:  50                   push   %eax
1411:  8d 85 f8 fe ff ff    lea    0xfffffef8(%ebp),%eax
1417:  50                   push   %eax
1418:  e8 fc ff ff ff       call   B::B( B const & )   <<=========
141d:  83 c4 10             add    $0x10,%esp
1420:  83 ec 0c             sub    $0xc,%esp
1423:  8d 85 f8 fe ff ff    lea    0xfffffef8(%ebp),%eax
1429:  50                   push   %eax
142a:  0f b6 45 f6          movzbl 0xfffffff6(%ebp),%eax
142e:  50                   push   %eax
142f:  0f b6 45 f7          movzbl 0xfffffff7(%ebp),%eax
1433:  50                   push   %eax
1434:  ff 75 0c             pushl  0xc(%ebp)
1437:  ff 75 08             pushl  0x8(%ebp)
143a:  e8 fc ff ff ff       call   A::create(bool, bool, B)   <<=========
143f:  83 c4 1c             add    $0x1c,%esp
1442:  83 ec 0c             sub    $0xc,%esp
1445:  8d 85 68 ff ff ff    lea    0xffffff68(%ebp),%eax
144b:  50                   push   %eax
144c:  e8 fc ff ff ff       call   BS<100, char>::~BS()
1451:  83 c4 10             add    $0x10,%esp
1454:  83 ec 0c             sub    $0xc,%esp
1457:  8d 45 d8             lea    0xffffffd8(%ebp),%eax
145a:  50                   push   %eax
145b:  e8 fc ff ff ff       call   std::basic_string<wchar_t>::~basic_string()
1460:  83 c4 10             add    $0x10,%esp
1463:  eb 55                jmp    14ba <A::create+0x3fc>
1465:  89 85 f0 fe ff ff    mov    %eax,0xfffffef0(%ebp)
146b:  8b b5 f0 fe ff ff    mov    0xfffffef0(%ebp),%esi
1471:  83 ec 0c             sub    $0xc,%esp
1474:  8d 85 68 ff ff ff    lea    0xffffff68(%ebp),%eax
147a:  50                   push   %eax
147b:  e8 fc ff ff ff       call   BS<100, char>::~BS()
1480:  83 c4 10             add    $0x10,%esp
1483:  89 b5 f0 fe ff ff    mov    %esi,0xfffffef0(%ebp)
1489:  eb 06                jmp    1491 <A::create+0x3d3>
148b:  89 85 f0 fe ff ff    mov    %eax,0xfffffef0(%ebp)
1491:  8b b5 f0 fe ff ff    mov    0xfffffef0(%ebp),%esi
1497:  83 ec 0c             sub    $0xc,%esp
149a:  8d 45 d8             lea    0xffffffd8(%ebp),%eax
149d:  50                   push   %eax
149e:  e8 fc ff ff ff       call   std::basic_string<wchar_t>::~basic_string()
14a3:  83 c4 10             add    $0x10,%esp
14a6:  89 b5 f0 fe ff ff    mov    %esi,0xfffffef0(%ebp)
14ac:  83 ec 0c             sub    $0xc,%esp
14af:  ff b5 f0 fe ff ff    pushl  0xfffffef0(%ebp)
14b5:  e8 fc ff ff ff       call   _Unwind_Resume
Not finding the destructor call in the generated code, we began looking for similar problems, and almost immediately found gcc 3.2 bug 9946, "object destructor not called, causing a memory leak".
The problem was in the code generated for the "?:" operator, and could be solved either by updating the compiler or by cosmetically rewriting the "?:" operator as a plain if(). A sketch of this workaround appears after the test output below.
// a function returning a reference to an existing object
const Object & getObject();
// a function taking its argument by value
void doSomething( Object obj );
// When the condition is false, a temporary copy of the object returned
// by getObject() is created for the call; with the buggy compiler its
// destructor was never invoked, and the string inside it leaked.
doSomething( condition ? Object( params ) : getObject() );
A simple test program produced the following output (note the number of instances of class A created and destroyed):
main() start
A::A( 'on stack' )
B::B()
A::A( 'static instance' )
A::A( 'copy of static instance' )
B::boo()
B::~B()
A::~A( 'on stack' )
main() end
A::~A( 'static instance' )
Class A created 3 times and destroyed 2 times
Class B created 1 times and destroyed 1 times
The problem occurred only when generating 32-bit code with gcc 3.2.3; it did not occur in 64-bit code or in code generated by later versions of the compiler.
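For reference, a minimal sketch of the if()-based workaround mentioned above, using the same simplified names as in the example:
// Workaround sketch: avoid the "?:" operator so the affected compiler
// generates an ordinary temporary in each branch and cleans it up correctly.
if ( condition )
    doSomething( Object( params ) );
else
    doSomething( getObject() );   // the by-value copy is now destroyed properly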
It Wasn't Me, and the Memory Isn't Mine
At one time I maintained and extended a program that collected data and sent it to a server. This program ran on a good dozen platforms, including Linux. Since the product was commercial, it was shipped as prebuilt executables, compiled with the compiler corresponding to the minimum supported Linux version - in our case gcc 3.3.x.
At some point our QA department registered, and even managed to reproduce (during a long test under heavy load lasting several days), a crash of the program due to running out of memory: the process ate up 3 GB of memory and died, producing a core dump of similar size. Notably, the explosive growth in memory usage happened literally 10-15 minutes before the sad end, and during that time CPU load was about 12% (the server had 4 dual-core processors, so one thread spinning in a loop would account for 12.5%).
The call stack of the crashed thread pointed to completely trivial code in a copy constructor; it was only clear that this was somehow related to exception handling.
(gdb) where
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7dbd8d0 in raise () from /lib/libc.so.6
#2  0xb7dbeff3 in abort () from /lib/libc.so.6
#3  0xb7f86da5 in dlopen () from /usr/lib/libstdc++.so.5
#4  0xb7f86de2 in std::terminate () from /usr/lib/libstdc++.so.5
#5  0xb7f85e89 in __cxa_allocate_exception () from /usr/lib/libstdc++.so.5
#6  0xb78f7f07 in Uuid::Uuid () from .../lib32/libourlibrary.so
#7  0xb782409d in ...
Attempts to reproduce the problem under a memory debugger, or with debug builds of the executables, were unsuccessful - the problem simply disappeared. Reproduction itself was costly: the process usually had to run under load for 2-4 days before the problem occurred, which delayed testing of other components that needed the same load-test infrastructure.
Searching for similar call stacks produced almost nothing, and the situation looked like a dead end. We had to somehow find the cause of the leak without a debugger.
In one implementation of a C++ memory allocation library I had once encountered, every block allocated for a new class instance in the debug build was marked with a string containing that class's name. This made it easy to tell from a core dump which objects of which types had been allocated. So I tried searching for strings in the core dump file, first running the "strings" program and then sorting the result with "sort".
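As an aside, a minimal sketch of that tagging idea (not the actual library; alignment and deallocation details are ignored for brevity) might look like this:
#include <cstdlib>
#include <cstring>
#include <new>

// Allocate a block whose first bytes hold the class name, so the name shows up
// as a readable string right before every live object of that class in a core dump.
void* debugAllocate( std::size_t size, const char* className )
{
    const std::size_t tagLen = std::strlen( className ) + 1;
    char* block = static_cast<char*>( std::malloc( tagLen + size ) );
    if ( block == 0 )
        throw std::bad_alloc();
    std::memcpy( block, className, tagLen );   // tag precedes the payload
    return block + tagLen;                     // caller sees only the payload
}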
Curiously, the file turned out to contain 32,123,751 lines of the form "++CCUNG0o" - these lines alone occupied about 275 MB of the 3 GB file. Searching for them in the dump showed that each such signature started a 96-byte block (96 B * 32,000,000 ≈ 3 GB!).
The blocks started with the reversed string "++CCUNG0o" with a leading zero byte (reversed, again), and differed only in a couple of four-byte fields in different places, apparently forming a linked list.
0x248b818: 0x00 0x2b 0x2b 0x43 0x43 0x55 0x4e 0x47 <<== «++CCUNG0o»
0x248b820: 0x30 0x6f 0xf8 0xb7 0x00 0x00 0x00 0x00
0x248b828: 0x6c 0xa8 0x78 0xb7 0x00 0x00 0x00 0x00
0x248b830: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x248b838: 0xa8 0x0e 0xd3 0xb7 0x3c 0xa2 0xfa 0xb7
0x248b840: 0xe8 0xae 0x82 0xb7 0x65 0x00 0x00 0x00
0x248b848: 0xc0 0x0e 0xd3 0xb7 0x38 0x9c 0xc8 0xb7
0x248b850: 0xf4 0x9b 0x04 0x08 0xe4 0x9c 0x04 0x08
0x248b858: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x248b860: 0x03 0x00 0x00 0x00 0xbe 0x80 0x79 0xb7
0x248b868: 0x58 0x80 0x79 0xb7 0x54 0xa9 0x78 0xb7
0x248b870: 0x98 0xb8 0x48 0x02 0x00 0x00 0x00 0x00
0x248b878: 0x00 0x2b 0x2b 0x43 0x43 0x55 0x4e 0x47 <<== «++CCUNG0o»
0x248b880: 0x30 0x6f 0xf8 0xb7 0x00 0x00 0x00 0x00
0x248b888: 0x6c 0xa8 0x78 0xb7 0x00 0x00 0x00 0x00
0x248b890: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x248b898: 0xa8 0x0e 0xd3 0xb7 0x3c 0xa2 0xfa 0xb7
0x248b8a0: 0xff 0xff 0xff 0xff 0x65 0x00 0x00 0x00
0x248b8a8: 0xc0 0x0e 0xd3 0xb7 0x38 0x9c 0xc8 0xb7
0x248b8b0: 0xf4 0x9b 0x04 0x08 0xe4 0x9c 0x04 0x08
0x248b8b8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x248b8c0: 0x03 0x00 0x00 0x00 0xbe 0x80 0x79 0xb7
0x248b8c8: 0x58 0x80 0x79 0xb7 0x54 0xa9 0x78 0xb7
0x248b8d0: 0xf8 0xb8 0x48 0x02 0x00 0x00 0x00 0x00
Searching the Internet for this string did not bring any useful results at first, but eventually a link to http://www.opensource.apple.com/ (unfortunately no longer working) turned up the following snippet:
// This is the exception class we report -- "GNUCC++\0".
const _Unwind_Exception_Class __gxx_exception_class
  = ((((((((_Unwind_Exception_Class) 'G'
            << 8 | (_Unwind_Exception_Class) 'N')
           << 8 | (_Unwind_Exception_Class) 'U')
          << 8 | (_Unwind_Exception_Class) 'C')
         << 8 | (_Unwind_Exception_Class) 'C')
        << 8 | (_Unwind_Exception_Class) '+')
       << 8 | (_Unwind_Exception_Class) '+')
      << 8 | (_Unwind_Exception_Class) '\0');
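The signature in the dump is simply this 64-bit constant as seen byte by byte. A small check (assuming a little-endian x86 host, as in this story) confirms the layout:
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    // Build the constant the same way the runtime does: 'G' in the top byte.
    std::uint64_t cls = 0;
    const char tag[8] = { 'G', 'N', 'U', 'C', 'C', '+', '+', '\0' };
    for ( int i = 0; i < 8; ++i )
        cls = ( cls << 8 ) | static_cast<unsigned char>( tag[i] );

    // Dump the in-memory representation: lowest byte first on little-endian x86.
    unsigned char bytes[8];
    std::memcpy( bytes, &cls, sizeof cls );
    for ( int i = 0; i < 8; ++i )
        std::printf( "%02x ", bytes[i] );      // prints: 00 2b 2b 43 43 55 4e 47
    std::printf( "\n" );
    return 0;
}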
Then I went back to the call stacks of the other threads and found a very suspicious one, clearly spinning in a loop at the moment of the crash - and that loop was busy with exception handling:
(gdb) thread 20
[Switching to thread 20 (process 27635)]
#0  0xb7e87921 in dl_iterate_phdr () from /lib/libc.so.6
(gdb) where
#0  0xb7e87921 in dl_iterate_phdr () from /lib/libc.so.6
#1  0x0804e837 in _Unwind_Find_FDE (pc=0xb782409c, bases=0xb70209b4) at ../../gcc/unwind-dw2-fde-glibc.c:283
#2  0x0804c950 in uw_frame_state_for (context=0xb7020960, fs=0xb7020860) at ../../gcc/unwind-dw2.c:903
#3  0x0804cfbf in _Unwind_RaiseException_Phase2 (exc=0xbfde3f38, context=0xb7020960) at ../../gcc/unwind.inc:43
#4  0x0804d397 in _Unwind_Resume (exc=0xbfde3f38) at ../../gcc/unwind.inc:220
#5  0xb78f82b0 in Uuid::Uuid () from /home/'work/lib32/libourlibrary.so
#6  0xb782409d in ...
After that we took a closer look at the source code where the exception was raised, and assumed that the problem lay in the placement new operator used in that part of the code. A simple test confirmed the assumption: an exception thrown in an object's constructor invoked via placement new led to an infinite loop and an explosive memory leak. The test itself was simple - simulating the exceptional situation in the problem part of the code - but unfortunately not at all easy to isolate from the context of our application: small standalone test programs did not reproduce the problem. It also turned out that the problem existed only in the compiler version we were using; in gcc 3.4 it was already gone. Relieved that the specific problem, which had already cost us a pretty penny, was solved, we called off any further investigation.
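For illustration, the pattern looked roughly like this (a heavily simplified sketch with hypothetical details; as noted above, a standalone program like this did not actually reproduce the endless loop outside our application):
#include <new>
#include <stdexcept>

struct Uuid
{
    // The constructor throws; with the affected gcc 3.3.x 32-bit runtime,
    // unwinding from this point inside our application looped forever,
    // allocating a fresh exception object on every iteration.
    Uuid() { throw std::runtime_error( "simulated failure" ); }
};

void constructInPlace( void* storage )
{
    new ( storage ) Uuid();   // placement new invoking the throwing constructor
}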
Not quite a leak
Once, after a software product update, pre-release testing at a client's site found a memory leak that had not occurred before.
The program ran under Solaris, under load, and its memory usage jumped sharply by 10-100 MB at a time, though rarely - sometimes once every 2-3 days, sometimes 2-3 times a day. Sooner or later the memory used by the process grew past 2 GB. And even when the load dropped to zero, the memory in use (both RSS and VSS values) never decreased.
There was no direct access to the servers (to set up libumem, for example), and QA could not reproduce the problem. Fortunately, we did manage to obtain a trimmed core dump file - a snapshot of the process memory. But its analysis gave almost nothing: the call stack showed a crash in a memory allocation that failed for lack of memory. Meanwhile, most of the memory was unused - almost the entire core dump consisted of empty 4 KB pages filled with zeros.
The situation was strange, but gradually, by eliminating various components of the system and carefully analyzing the log files, we reconstructed the picture of what was happening.
During the system update, one of the client programs had stopped sending certain messages intended for monitoring the system. As luck would have it, the missing messages were the ones that signaled transaction completion and triggered the cleanup of internal buffers. Accordingly, as the load grew, buffered messages began to accumulate, waiting for their counterparts. By coincidence, in this deployment the messages were atypically large (tens of kilobytes instead of hundreds of bytes), and the number of client processes was relatively high (several dozen instead of the usual 2-5). Still, all of these parameters fit within the limits the system supported (provided it functioned normally). And although the program did not limit the size of its internal buffers, it did have a mechanism for purging "lost" transactions: all incomplete messages older than 10 minutes were automatically deleted. It was exactly this mechanism's counter that, due to a bug, always showed 0 - the mechanism was meant for an almost unbelievable exceptional situation, was almost never triggered, and had not been tested enough. The cleanup itself worked fine, but it was not effective enough given the heavy memory load caused by the large number of clients and the large size of the non-standard messages. And diagnosing the problem turned out to be very difficult because of the faulty counter in the statistics.
Why didn't the memory come back when the load decreased? Very simple: the standard process memory allocator in Solaris only grows the process address space and keeps memory freed by the process for reuse, so from the outside the pages still look "occupied" by the process.
In our case, isolated peaks of activity from the problem clients led to periodic large memory allocations, most of them invisible because the previous maximum was not exceeded. Only when the previous maximum was exceeded did the usage take another step up, and it never came back down. Ten minutes after each peak all of that memory was freed, but this was not visible from the outside; only the memory snapshot showed that most of it was filled with zeros and unused.
The fix was simple even before the problem clients were corrected and buffer-overflow protection was added: the age threshold for "old" transactions was temporarily lowered to 30 seconds, which was more than enough to clear the buffers in time under the given load, with a large margin. But diagnosing and troubleshooting took a long time, mainly because of the incorrect statistics in the logs.
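A minimal sketch of the purge mechanism described above (hypothetical types and names, not the product's code): incomplete transactions older than a configurable age are dropped, and a statistics counter - the one that was broken in the real program - records how many were removed.
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

struct PendingTransaction
{
    Clock::time_point firstSeen;
    std::string       bufferedData;
};

std::map<long, PendingTransaction> pending;   // keyed by transaction id
long purgedCount = 0;                         // statistics counter

void purgeLostTransactions( std::chrono::seconds maxAge )
{
    const Clock::time_point now = Clock::now();
    for ( auto it = pending.begin(); it != pending.end(); )
    {
        if ( now - it->second.firstSeen > maxAge )
        {
            it = pending.erase( it );   // drop the "lost" transaction
            ++purgedCount;              // the counter that always showed 0 in the logs
        }
        else
            ++it;
    }
}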
Instead of an epilogue
If you have run into interesting cases like these in your own practice - write about them!