
Fixing memory leaks in a Python application

I recently happened to track down and fix a few memory leaks in the popular Tornado framework. It does not matter if you have never used it, because what is described below has little to do with Tornado itself. I want to tell you about the methods I used to find and fix the leaks.

All of this applies, for the most part, only to the most popular Python implementation, CPython. As you know, it has two mechanisms for freeing memory. The first is reference counting. Every time you explicitly or implicitly create a new object, its reference count is one. If you assign the object to a new variable or pass it as an argument, its reference count increases. When a function exits, the reference counts of the objects held in its local variables and arguments decrease. As soon as an object's reference count drops to zero, it is destroyed immediately.

This scheme works fine until objects that refer to each other appear. The simplest example is the nodes of a tree that hold references to their child and parent nodes. The nodes will continue to refer to each other even when there are no external references left to any of them. The most annoying part is that such nodes can refer to other data and prevent it from being released. To eliminate such circular references, Python has a second mechanism for freeing memory: the garbage collector. It runs from time to time, pausing the rest of the code, and analyzes all objects that have not yet been freed.
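A minimal sketch of this effect (my own example, not from the article): a parent/child cycle survives reference counting and is only destroyed once the collector runs. A weakref lets us observe the moment of destruction:

```python
import gc
import weakref

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

gc.disable()                  # take the cycle collector out of the picture
root = Node()
leaf = Node(parent=root)      # root <-> leaf now reference each other
probe = weakref.ref(leaf)     # lets us check whether leaf was destroyed

del root, leaf                # no external references remain...
print(probe() is None)        # False: the cycle keeps both objects alive

gc.collect()                  # ...until the collector breaks the cycle
print(probe() is None)        # True
gc.enable()
```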
Formally, circular references cannot be called leaks: garbage collection will sooner or later destroy such objects. The trouble is that Python cannot tell when it is too early and when it is already too late. In my case, the system simply killed the Python process if garbage collection did not start in time.

As the gc module documentation states, the frequency of garbage collection depends on thresholds set on the number of new objects. In every version of Python available to me, this number is 700 by default. However, a fairly simple test shows that the number of objects collected by a forced gc.collect() call can easily exceed this value.
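The thresholds mentioned above can be read and changed at runtime via gc.get_threshold() and gc.set_threshold() (the exact default may vary by build, so treat the printed values as an example):

```python
import gc

# The three numbers are the collection thresholds for generations 0, 1 and 2.
default = gc.get_threshold()
print(default)                   # e.g. (700, 10, 10) on common CPython builds

# A generation-0 collection is considered once the number of allocations
# minus deallocations since the last run exceeds the first threshold.
gc.set_threshold(100, 10, 10)    # collect more eagerly: more CPU, less memory
print(gc.get_threshold())        # (100, 10, 10)

gc.set_threshold(*default)       # restore the interpreter default
```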

```python
class Node(object):
    parent = None

    def __init__(self, *children):
        self.children = list(children)
        for node in self.children:
            node.parent = self

    @classmethod
    def tree(cls, depth=1, children=1):
        if depth == 0:
            return []
        return [cls(*cls.tree(depth - 1, children)) for _ in range(children)]

import gc
from time import time

for n in range(1, 21):
    for _ in range(n):
        # Build trees that immediately become unreachable.
        Node.tree(depth=5, children=6)
    start = time()
    print('{1} objects collected for n={0} in {2:3.6} msec'.format(
        n, gc.collect(), (time() - start) * 1000))
```

With n equal to 10 and 20, I got about 107 thousand uncollected objects. So the thresholds in the gc module are soft, and reaching them does not guarantee immediate garbage collection (Andrei Svetlov corrects me in the comments that this is not quite so, and explains in detail why this happens). Moreover, the number of objects says nothing about the memory they occupy. As a result, if objects in your application that occupy a lot of memory are not destroyed by reference counting, this can lead to dire consequences.

That is exactly what happened in my application. The code that localized the problem looked like this:

```python
from tornado import web, ioloop, gen

ioloop = ioloop.IOLoop.current()

class IndexHandler(web.RequestHandler):
    megabyte_string = "0123456789abcdef" * 64 * 1024

    @web.asynchronous
    @gen.engine
    def get(self):
        self.write("Hello, world<br>")
        yield gen.Task(self.some_task, self.megabyte_string * 20)
        self.finish()

    def some_task(self, bigdata, callback):
        self.write("some task<br>")
        callback()

application = web.Application([(r'/', IndexHandler)], debug=True)

if __name__ == "__main__":
    print("Start on 8888")
    application.listen(8888)
    ioloop.start()
```

This creates a server that handles the URL "/" in the IndexHandler.get() method. The method is asynchronous and schedules a task to which it passes a large piece of data: 20 megabytes. What the task does is not so important, because the problem is already present in this example: with each request, the memory occupied by the Python process grows by those 20 megabytes, and far from every request gives them back. As a result, the simple benchmark ab -n 100 -c 4 localhost:8888/ can consume gigabytes of memory at certain moments. But as soon as the task call is changed from yield gen.Task() to a direct call with the callback passed in, the server easily withstands the load of ab -n 1000 -c 100 localhost:8888/ while consuming no more than 50 MB of memory.

```python
    @web.asynchronous
    @gen.engine
    def get(self):
        self.write("Hello, world<br>")
        self.some_task(self.megabyte_string * 20, self.finish)
```

How do you debug such cases? It would be nice to see what exactly is not being released. The first step is to make it possible to start garbage collection manually, to verify that calling it actually frees the memory. I added another request handler that called gc.collect() and displayed the number of collected objects.

```python
class HealthHandler(web.RequestHandler):
    def get(self):
        self.write('{} objects collected'.format(gc.collect()))

application = web.Application([(r'/', IndexHandler),
                               (r'/health/', HealthHandler)], debug=True)
```

Second, you need to disable automatic garbage collection. This makes the results of the experiments stable. Third, we need information about the collected objects. The gc module already has a ready-made tool for this: the information will be printed to the console during the gc.collect() call.

```python
import gc

gc.disable()
gc.set_debug(gc.DEBUG_LEAK)
```

Now isolating the objects involved in this particular leak is quite simple: run the leaking handler and get the list of collected objects at /health/, then run the handler that does not leak and get its list, and find the objects from the first list that are missing from the second. Here they are:

```
gc: collectable <cell> × 4
gc: collectable <dict> × 3
gc: collectable <function> × 2
gc: collectable <generator>
gc: collectable <instancemethod>
gc: collectable <Runner>
gc: collectable <set>
gc: collectable <Task>
gc: collectable <tuple> × 3
```

For clarity, I have grouped identical entries. The non-built-in types are of most interest: here they are Task and Runner. This proves once again that the problem lies in the yield gen.Task() call and in garbage collection. It remains to figure out what Runner is and why it and Task refer to each other. Time to open the source code.
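Besides printing to the console, such lists can also be gathered programmatically: gc.DEBUG_SAVEALL keeps every collected object in gc.garbage, so the two lists can be diffed in code. A hedged sketch (my own helper, not from the article):

```python
import gc
from collections import Counter

def collectable_types():
    """Run a collection and return a Counter of the types of cyclic garbage."""
    gc.collect()                      # start from a clean slate
    gc.set_debug(gc.DEBUG_SAVEALL)    # keep collected objects in gc.garbage
    # --- the code suspected of leaking would run here; we fake a cycle ---
    a, b = [], []
    a.append(b)
    b.append(a)                       # a <-> b cycle, like Task <-> Runner
    del a, b
    gc.collect()
    counts = Counter(type(o).__name__ for o in gc.garbage)
    gc.set_debug(0)
    gc.garbage.clear()
    return counts

print(collectable_types())            # the two lists of the cycle show up
```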

It should be noted that all the examples are for Tornado version 3.1dev2, current at the time of my research. There is quite a lot of code in the @gen.engine decorator, but the main thing that happens there is that the original function is called, and if the result of its execution turns out to be a generator, it is passed to the Runner class (there it is). Our Task is what the generator yields. So we need to find the place in the Runner class where the generator is iterated. That is the line yielded = self.gen.send(next). From there it is easy to trace that yielded ends up in self.yield_point, and moreover, the .start() method is called on self.yield_point, which stores a reference back to the Runner. So, once the Runner.run() method has finished its work, the link needs to be broken from one side or the other. Since Runner.yield_point is just a pointer to the last element and Task.runner is a reference to the parent, it is logical to null out the pointer to the element. It only remains to understand where the Runner.run() method finishes its work. Since Tornado is asynchronous, and we are digging through the source code at its very heart, it is quite difficult to tell which way is up. The .run() method has 5 exit points and is re-invoked from all sorts of callbacks. After several attempts, I realized that the self.finished flag of the Runner object was exactly what I needed, and that self.yield_point should be set to None wherever the flag is set to True.
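The shape of the fix can be sketched with heavily simplified stand-ins (hypothetical classes, not Tornado's actual Task and Runner): once the runner finishes, null the pointer so reference counting alone can free the pair.

```python
import gc
import weakref

# Hypothetical, heavily simplified stand-ins for gen.Task and Runner.
class Task:
    def start(self, runner):
        self.runner = runner       # back-reference: Task -> Runner

class Runner:
    def __init__(self, task):
        self.yield_point = task    # forward reference: Runner -> Task
        self.finished = False
        task.start(self)           # the two objects now form a cycle

    def run(self):
        # ... drive the generator to completion ...
        self.finished = True
        self.yield_point = None    # the fix: break the cycle when finished

gc.disable()                       # prove that refcounting alone suffices
task = Task()
runner = Runner(task)
runner.run()
probe = weakref.ref(task)
del task, runner
print(probe() is None)             # True: both objects were freed
gc.enable()
```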

Check the result with ab -n 1000 -c 100 localhost:8888/. Everything is fine.

I could have finished there, but something still seemed strange to me. Why does every request leave uncollected objects in memory? Maybe something can be done about that too. It turned out that it is not every request after all, but only those whose handlers carry the @web.asynchronous decorator. And the list of un-freed objects looked like this:

```
gc: collectable <dict> × 7
gc: collectable <list> × 16
gc: collectable <tuple>
gc: collectable <instancemethod>
gc: collectable <ChunkedTransferEncoding>
gc: collectable <ExceptionStackContext>
gc: collectable <HTTPHeaders> × 2
gc: collectable <HTTPRequest>
gc: collectable <IndexHandler>
```

This time there are five non-built-in types, and it is unclear from which end to begin. I began with the IndexHandler.finish() method, in which I removed references to all the objects I could find.

```python
class IndexHandler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        self.write("Hello, world<br>")
        self.finish()

    def finish(self, chunk=None):
        super(IndexHandler, self).finish(chunk)
        for k, v in self.__dict__.iteritems():
            print '"{}":'.format(k), type(v)
        self.request = None
        self._headers = None
        self.application = None
        self._transforms = None
```

This gave a definite improvement, but did not solve the problem completely. The number of un-freed non-built-in objects decreased to two: ExceptionStackContext and the IndexHandler itself. ExceptionStackContext is created in the @web.asynchronous decorator with the argument self._stack_context_handle_exception, where self is our IndexHandler. There are no references in the opposite direction, so it looks like ExceptionStackContext refers to itself. Looking at the implementation, we see that the .__enter__() method contains the line self.new_contexts = (self.old_contexts[0], self). So self.new_contexts needs to be reset in .__exit__(), and that's it.
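The same fix pattern, condensed into a hypothetical sketch (not Tornado's actual ExceptionStackContext): a context manager that stores a self-referencing tuple in __enter__ must drop it in __exit__, or every instance needs the cycle collector to die.

```python
import gc
import weakref

class StackContext:
    """Hypothetical sketch: a context manager whose __enter__ creates a
    tuple containing the object itself, as in the Tornado bug."""
    old_contexts = ((),)

    def __enter__(self):
        # The problematic line: a tuple that refers back to self.
        self.new_contexts = (self.old_contexts[0], self)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.new_contexts = None   # the fix: drop the self-reference
        return False

gc.disable()                       # prove refcounting alone is enough
ctx = StackContext()
with ctx:
    pass
probe = weakref.ref(ctx)
del ctx
print(probe() is None)             # True: no cycle survives the with-block
gc.enable()
```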

As a result, the pull request with both changes was reviewed and merged into master within 2.5 hours, which is good motivation to keep making useful contributions. With these two patches and one more, Tornado completely stopped leaving garbage in memory after a request. This reduced memory consumption under many concurrent requests, sped things up slightly thanks to faster garbage collection and, most importantly, made memory consumption predictable.

Finding and fixing such leaks is quite difficult, especially when they are not in the code of the application itself but in the libraries it uses. Still, it is worth at least checking whether your application has them, and making sure that no heavy objects are kept in memory because of them.

Source: https://habr.com/ru/post/178637/

