Hello! We are continuing our series of publications dedicated to the launch of the “Web Developer in Python” course, and today we are sharing a translation of another interesting article.
At Zendesk, we use Python to build machine-learning products. One of the most common problems we have encountered in machine-learning applications is memory leaks and spikes. Python code is usually executed in containers via distributed processing frameworks such as Hadoop, Spark, and AWS Batch. Each container is allocated a fixed amount of memory, and as soon as code execution exceeds that limit, the container is stopped with out-of-memory errors.

You can quickly fix the problem by allocating more memory. However, this wastes resources and undermines application stability in the face of unpredictable memory spikes. Causes of memory leaks include:
- Large objects that are kept alive and never released;
- Reference cycles in the code;
- Underlying libraries / C extensions that leak memory.
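The second cause can be made concrete. Here is a minimal sketch, using only the standard library, of a reference cycle that reference counting alone cannot reclaim:

```python
import gc

# Reference counting alone cannot free these two objects because each keeps
# the other's refcount above zero; CPython's cyclic garbage collector must
# step in to reclaim them.
class Node:
    pass

a = Node()
b = Node()
a.other = b
b.other = a          # cycle: a -> b -> a

del a, b             # the objects are now unreachable, but not yet freed
collected = gc.collect()
print(collected)     # number of unreachable objects the collector reclaimed
```

If such cycles are created faster than the collector runs, memory usage can spike even though nothing is reachable.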
A useful practice is to profile an application's memory usage to gain a better understanding of how memory-efficient your code and the packages you use really are.
This article covers the following aspects:
- Profiling application memory usage over time;
- How to check the memory usage in a certain part of the program;
- Tips for debugging errors caused by memory problems.
Memory profiling over time

You can watch how memory usage varies over the execution of a Python program using the memory-profiler package.
Figure A. Memory profiling as a function of time

The include-children option also counts memory used by any child processes spawned by the parent process. Figure A reflects an iterative training process in which memory rises in cycles as batches of training data are processed, then drops as objects are freed during garbage collection.
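memory-profiler is a third-party package; as a rough standard-library sketch of the same idea, tracemalloc can sample the interpreter's allocations at points in time (the training loop below is purely illustrative):

```python
import tracemalloc

tracemalloc.start()

history = []
retained = []
for batch in range(3):                     # stand-in for a training loop
    retained.append(list(range(100_000)))  # simulate data kept across batches
    current, peak = tracemalloc.get_traced_memory()
    history.append(current)                # sample current usage, in bytes

tracemalloc.stop()

# Usage climbs on every iteration because `retained` keeps each batch alive.
assert history[0] < history[1] < history[2]
```

Plotting such samples against time gives the kind of curve shown in Figure A.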
If memory usage grows steadily instead, that is a potential sign of a memory leak. Here is a sample code reflecting this:
Figure B. Memory usage increasing over time

It is useful to set a breakpoint in the debugger as soon as memory usage exceeds a certain threshold. memory-profiler's pdb-mmem option (for example, python -m memory_profiler --pdb-mmem=100 my_script.py breaks into pdb once usage exceeds 100 MB) comes in handy during troubleshooting.
Memory dump at a specific point in time

It is useful to estimate in advance the expected number of large objects in the program and whether they will be duplicated and/or converted to different formats.
For a deeper analysis of the objects in memory, you can create a heap dump at specific lines of the program using muppy.
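muppy (part of the Pympler package) prints per-type summaries like the one in Figure C. A crude standard-library approximation of the same idea is to count the live objects tracked by the garbage collector, grouped by type:

```python
import gc
from collections import Counter

# With Pympler installed, the equivalent would be roughly:
#   from pympler import muppy, summary
#   summary.print_(summary.summarize(muppy.get_objects()))
# Below is a stdlib-only stand-in: count GC-tracked objects by type name.
counts = Counter(type(obj).__name__ for obj in gc.get_objects())
for name, n in counts.most_common(5):
    print(f"{name:<20} {n}")
```

Taking such a summary before and after a suspect code section shows which object types are accumulating.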
Figure C. Example heap-dump summary

Another useful library for memory profiling is objgraph, which lets you generate graphs to trace back where objects come from.
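objgraph renders such back-reference chains as an image (via objgraph.show_backrefs); a minimal standard-library stand-in is gc.get_referrers, which at least enumerates what is keeping an object alive:

```python
import gc

class Payload:
    pass

obj = Payload()
container = {"kept": obj}   # something that keeps obj alive

# objgraph.show_backrefs(obj) would draw this chain as a graph; the stdlib
# can enumerate the direct referrers, which include `container`.
referrers = gc.get_referrers(obj)
assert any(r is container for r in referrers)
```

Walking referrers of referrers by hand quickly becomes tedious, which is exactly the gap objgraph's graphs fill.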
Useful pointers

A useful approach is to create a small “test case” that runs only the code responsible for the leak. Consider using a randomly selected subset of the data if processing the full input takes too long.
Executing tasks with a large memory load in a separate process

Python does not necessarily return freed memory to the operating system right away. To make sure memory is actually released after a code snippet finishes, run it in a separate process. More information about the garbage collector in Python can be found here.
The debugger can add references to objects

If a breakpoint debugger such as pdb is used, any objects created and referenced by the debugger will stay in memory. This can create a false impression of a memory leak, because objects appear not to be freed in a timely manner.
Beware of packages that may leak memory

Some Python libraries can themselves leak memory; pandas, for example, has several known memory leak issues.
Have a nice leak hunt!
Useful links:
- docs.python.org/3/c-api/memory.html
- docs.python.org/3/library/debug.html

Write in the comments whether this article was helpful to you. And if you would like to learn more about our course, we invite you to the open day, which will be held on April 22.