gc.collect() can easily exceed this value.

```python
class Node(object):
    parent = None

    def __init__(self, *children):
        self.children = list(children)
        for node in self.children:
            node.parent = self

    @classmethod
    def tree(cls, depth=1, children=1):
        if depth == 0:
            return []
        return [cls(*cls.tree(depth - 1, children)) for _ in range(children)]


import gc
from time import time

for n in range(1, 21):
    for _ in range(n):
        # build disposable trees of nodes that reference each other
        Node.tree(depth=5, children=6)
    start = time()
    print('{1} objects collected for n={0} in {2:3.6} msec'.format(
        n, gc.collect(), (time() - start) * 1000))
```
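As a side note, here is a minimal check of my own (not part of the original benchmark) showing why such trees need the cyclic collector at all: every child holds a reference back to its parent, so reference counting alone never frees them.

```python
import gc


class Node(object):
    # same shape as the class above: parent and children reference each other
    parent = None

    def __init__(self, *children):
        self.children = list(children)
        for node in self.children:
            node.parent = self


gc.disable()                 # make the experiment deterministic
gc.collect()                 # start from a clean state
root = Node(Node(), Node())
del root                     # refcounts never reach zero inside the cycle
print(gc.collect())          # non-zero: only the cyclic collector frees them
```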
```python
from tornado import web, ioloop, gen

ioloop = ioloop.IOLoop.current()


class IndexHandler(web.RequestHandler):
    megabyte_string = "0123456789abcdef" * 64 * 1024

    @web.asynchronous
    @gen.engine
    def get(self):
        self.write("Hello, world<br>")
        yield gen.Task(self.some_task, self.megabyte_string * 20)
        self.finish()

    def some_task(self, bigdata, callback):
        self.write("some task<br>")
        callback()


application = web.Application([(r'/', IndexHandler)], debug=True)

if __name__ == "__main__":
    print("Start on 8888")
    application.listen(8888)
    ioloop.start()
```
All the work happens in the IndexHandler.get() method. The method is asynchronous and schedules a task, handing it a large piece of data: 20 megabytes. What the task actually does is not important, because the problem already shows up in this example: with every request the memory occupied by the Python process grows by those 20 megabytes, and it shrinks far less often than it grows. As a result, under a simple benchmark of ab -n 100 -c 4 localhost:8888/ the process can consume gigabytes of memory at some moments. But as soon as the task invocation is changed from yield gen.Task() to a direct call with the callback passed in, the server easily withstands the load of ab -n 1000 -c 100 localhost:8888/ while consuming no more than 50 MB of memory.

```python
@web.asynchronous
@gen.engine
def get(self):
    self.write("Hello, world<br>")
    self.some_task(self.megabyte_string * 20, self.finish)
```
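To watch the process footprint during the ab run, it is handy to expose it over HTTP as well. The handler below is my own addition, not part of the original example; it assumes a Unix system where the standard resource module is available (on Linux ru_maxrss is reported in kilobytes, on macOS in bytes).

```python
import resource

from tornado import web


class MemoryHandler(web.RequestHandler):
    def get(self):
        # peak resident set size of the current process
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        self.write('max RSS: {}'.format(rss))
```

It only needs an extra route next to IndexHandler, for example (r'/mem/', MemoryHandler).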
I added a handler that explicitly calls gc.collect() and displays the number of collected objects.

```python
class HealthHandler(web.RequestHandler):
    def get(self):
        self.write('{} objects collected'.format(gc.collect()))


application = web.Application([(r'/', IndexHandler),
                               (r'/health/', HealthHandler)],
                              debug=True)
```
I also disabled automatic garbage collection and turned on debug output, so that the list of collected objects is printed on every gc.collect() call.

```python
import gc

gc.disable()
gc.set_debug(gc.DEBUG_LEAK)
```
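Since gc.DEBUG_LEAK includes DEBUG_SAVEALL, the objects found by the collector are also retained in gc.garbage, so they can be grouped by type programmatically instead of reading the stderr dump by hand. This helper is my own sketch, not from the original post.

```python
import gc
from collections import Counter


def leak_report():
    # run a collection, then count the retained "leaked" objects by type
    gc.collect()
    counts = Counter(type(obj).__name__ for obj in gc.garbage)
    return '\n'.join('{} x {}'.format(name, n)
                     for name, n in counts.most_common())
```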
I ran the leaking variant, made a few requests, and hit /health/ to get the list of collected objects. Then I did the same for the variant that does not leak and got a list for it as well. The objects that appear in the first list but not in the second are these:

```
gc: collectable <cell> × 4
gc: collectable <dict> × 3
gc: collectable <function> × 2
gc: collectable <generator>
gc: collectable <instancemethod>
gc: collectable <Runner>
gc: collectable <set>
gc: collectable <Task>
gc: collectable <tuple> × 3
```
The most interesting entries here are Task and Runner. This proves once again that the problem lies in the yield gen.Task() call and that it is a garbage-collection problem. It remains to figure out what this Runner is and why it and the Task reference each other. Time to open the source code.
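Before digging into the source, the cycle can also be confirmed empirically. This is my own sketch, not from the article; it relies on the fact that DEBUG_LEAK has kept the uncollected objects in gc.garbage.

```python
import gc

# pick out the leaked Task objects and see who still points at them;
# a Runner's __dict__ should show up among the referrers, which is the
# back-reference that closes the Task <-> Runner cycle
leaked_tasks = [obj for obj in gc.garbage if type(obj).__name__ == 'Task']
for task in leaked_tasks:
    for referrer in gc.get_referrers(task):
        print(type(referrer))
```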
In the @gen.engine decorator there is quite a lot of code, but the main thing that happens there is that the original function is called, and if the result of that call turns out to be a generator, it is handed over to the Runner class (caught!). Our Task is exactly what the generator will yield. So we need to find the place in the Runner class where the generator is iterated. That is the line yielded = self.gen.send(next). From there it is easy to trace that yielded ends up in self.yield_point. In addition, the .start() method of self.yield_point is called, and that method stores a reference back to the Runner.
So, after the Runner.run() method finishes, the link has to be broken from one side or the other. Since Runner.yield_point is just a pointer to the last yielded element, while Task.runner is a reference to the parent, it is logical to null the pointer to the element. It only remains to understand where the Runner.run() method completes its execution. Tornado is asynchronous, and we are walking through the source at its very heart, so it is quite hard to tell which way is up and which is down. The .run() method has five exit points and is re-invoked from all sorts of callbacks. After several attempts I realized that the thing to track is not the self.finished flag of the Runner object itself but the place where it is set to True: that is exactly where self.yield_point should be nulled.
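The fix described above belongs inside Tornado's gen.py itself, but the idea can be illustrated with a monkey patch of the same shape. This is my reconstruction from the description above, not the real patch, and it assumes Runner exposes the finished and yield_point attributes exactly as discussed.

```python
from tornado import gen

_original_run = gen.Runner.run


def _run_and_release(self):
    # run as usual, then drop the Runner -> Task back-reference once the
    # runner reports itself finished, so the pair no longer has to wait
    # for the cyclic garbage collector
    _original_run(self)
    if self.finished:
        self.yield_point = None


gen.Runner.run = _run_and_release
```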
With this change in place, I re-ran ab -n 1000 -c 100 localhost:8888/. Everything is good.
I also checked a handler decorated only with @web.asynchronous. The list of un-freed objects for it looked like this:

```
gc: collectable <dict> × 7
gc: collectable <list> × 16
gc: collectable <tuple>
gc: collectable <instancemethod>
gc: collectable <ChunkedTransferEncoding>
gc: collectable <ExceptionStackContext>
gc: collectable <HTTPHeaders> × 2
gc: collectable <HTTPRequest>
gc: collectable <IndexHandler>
```
To deal with this, I overrode the IndexHandler.finish() method, in which I removed references to all the objects I had found.

```python
class IndexHandler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        self.write("Hello, world<br>")
        self.finish()

    def finish(self, chunk=None):
        super(IndexHandler, self).finish(chunk)
        # dump the remaining instance attributes, then drop the heavy ones
        for k, v in self.__dict__.iteritems():
            print '"{}":'.format(k), type(v)
        self.request = None
        self._headers = None
        self.application = None
        self._transforms = None
```
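If the Tornado version in use provides the RequestHandler.on_finish() hook, the same cleanup could live there instead of overriding finish(). This variant is my own sketch under that assumption, not part of the original post.

```python
from tornado import web


class IndexHandler(web.RequestHandler):
    @web.asynchronous
    def get(self):
        self.write("Hello, world<br>")
        self.finish()

    def on_finish(self):
        # same cleanup as above, moved into the post-response hook
        self.request = None
        self._headers = None
        self.application = None
        self._transforms = None
```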
The most interesting items in that list are ExceptionStackContext and the IndexHandler itself. ExceptionStackContext is created when the @web.asynchronous decorator fires, and it is passed self._stack_context_handle_exception as an argument, where self is that very IndexHandler. There are no links in the opposite direction, so it looks as if ExceptionStackContext refers to itself. Looking at the implementation, the .__enter__() method contains the line self.new_contexts = (self.old_contexts[0], self). So self.new_contexts needs to be nulled in .__exit__(), and the job is done.
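As an illustration of that last step, here is what such a fix could look like as a monkey patch applied from the outside. This is my own guess at the shape of the change, not the actual Tornado patch, and it assumes ExceptionStackContext keeps the new_contexts attribute described above.

```python
from tornado import stack_context

_original_exit = stack_context.ExceptionStackContext.__exit__


def _exit_and_release(self, type, value, traceback):
    # let the original context manager do its work, then drop the tuple
    # created in __enter__ that makes the object reference itself
    result = _original_exit(self, type, value, traceback)
    self.new_contexts = None
    return result


stack_context.ExceptionStackContext.__exit__ = _exit_and_release
```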
Source: https://habr.com/ru/post/178637/