One of the main challenges when writing (relatively) large Python programs is keeping memory consumption down. Managing memory here is easy, though, provided you care about it at all: Python allocates memory transparently, objects are managed with a reference-counting system, and memory is released when the counter drops to zero. In theory everything is fine. In practice, however, you need to know a few things about Python's memory management for your programs to use memory effectively. The first thing to understand well is the size of the basic Python objects. The second is how the language manages memory "under the hood."
Let's start with object sizes. Python has a number of basic data types: integers (int), longs (unlimited-precision integers), floating-point numbers (which are C doubles), tuples, strings, lists, dictionaries and classes.
What is the size of an int? A programmer writing in C or C++ would probably say that a machine-dependent int is about 32 bits, possibly 64, and therefore takes at most 8 bytes. But is that the case in Python?
Let's write a function that shows the size of an object (recursing into containers when needed):
import sys

def show_sizeof(x, level=0):
    print "\t" * level, x.__class__, sys.getsizeof(x), x
    if hasattr(x, '__iter__'):
        if hasattr(x, 'items'):
            for xx in x.items():
                show_sizeof(xx, level + 1)
        else:
            for xx in x:
                show_sizeof(xx, level + 1)
Now, using this function, you can explore the sizes of the main data types:
show_sizeof(None)
show_sizeof(3)
show_sizeof(2**63)
show_sizeof(102947298469128649161972364837164)
show_sizeof(918659326943756134897561304875610348756384756193485761304875613948576297485698417)
If you have 32-bit Python 2.7.x, you will see:
8 None
12 3
22 9223372036854775808
28 102947298469128649161972364837164
48 918659326943756134897561304875610348756384756193485761304875613948576297485698417
And with 64-bit Python 2.7.x, you will see:
16 None
24 3
36 9223372036854775808
40 102947298469128649161972364837164
60 918659326943756134897561304875610348756384756193485761304875613948576297485698417
Let's focus on the 64-bit version (mainly because that is the case we need more often). None takes 16 bytes. An int takes 24 bytes, three times the size of an int64_t in C, even though it is to some extent a machine-native integer. The minimum size of a long (the unlimited-precision type used to represent numbers greater than 2^63 - 1) is 36 bytes. From there the size grows linearly with the logarithm of the number being represented.
Python floating-point numbers are implementation-specific but resemble C double-precision numbers. However, they do not take up just 8 bytes:
show_sizeof(3.14159265358979323846264338327950288)
On a 32-bit platform it produces:
16 3.14159265359
And on 64-bit:
24 3.14159265359
This is again three times more than a C programmer would have thought. What about string values?
show_sizeof("") show_sizeof("My hovercraft is full of eels")
On a 32-bit platform:
21
50 My hovercraft is full of eels
And on 64-bit:
37
66 My hovercraft is full of eels
An empty string takes 37 bytes in a 64-bit environment! Memory consumption then grows with the length of the (useful) content.
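A small check makes the "fixed header plus roughly one byte per character" behaviour easy to verify on your own machine (again, the figures are illustrative and platform-dependent):

import sys

# Illustrative only: an str carries a fixed header (37 bytes on the 64-bit
# CPython 2.7 used above) plus about one byte per character.
for length in (0, 1, 10, 100, 1000):
    s = "x" * length
    print length, sys.getsizeof(s)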
Let's take a look at other frequently used structures: tuples, lists, and dictionaries. Lists (implemented as array lists, not as linked lists, with all the consequences that entails) are arrays of references to Python objects, which allows them to be heterogeneous. Their sizes:
show_sizeof([])
show_sizeof([4, "toaster", 230.1])

On a 32-bit platform:

32 []
44 [4, 'toaster', 230.1]

And on 64-bit:

72 []
96 [4, 'toaster', 230.1]
An empty list takes 72 bytes. An empty std::list() in 64-bit C++ takes only 16 bytes, 4-5 times less. What about tuples? And dictionaries?
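The dictionary output below already shows key/value pairs stored as tuples, but a direct check of tuple overhead is just as easy (illustrative and platform-dependent, as before):

import sys

# Note that getsizeof() counts only the tuple's header and its array of
# references, not the sizes of the referenced elements themselves.
print sys.getsizeof(())          # fixed header of an empty tuple
print sys.getsizeof((1, 2, 3))   # header plus one pointer (8 bytes on 64-bit) per element

Dictionaries can be measured with the same show_sizeof helper: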
show_sizeof({})
show_sizeof({'a':213, 'b':2131})
On a 32-bit platform it produces:
136 {}
136 {'a': 213, 'b': 2131}
	32 ('a', 213)
		22 a
		12 213
	32 ('b', 2131)
		22 b
		12 2131
And on 64-bit:
280 {}
280 {'a': 213, 'b': 2131}
	72 ('a', 213)
		38 a
		24 213
	72 ('b', 2131)
		38 b
		24 2131
The last example is especially interesting because it "does not add up." Each key/value pair takes 72 bytes (its components take 38 + 24 = 62 bytes, with another 10 spent on the pair itself), but the whole dictionary weighs 280 bytes (rather than the minimum required 144 = 72 x 2 bytes). A dictionary is supposed to be an efficient lookup data structure, and either likely implementation will consume more memory than that minimum. With some kind of tree, you pay for internal nodes that hold a key and two pointers to child nodes; with a hash table, good performance requires room for free entries.
The (roughly) equivalent std::map structure from C++ takes 48 bytes when created (while still empty). An empty string in C++ requires 8 bytes (after which its size grows linearly with the length of the string), and an integer takes 4 bytes (32 bits).
What does all this give us? The fact that an empty string takes 8 or 37 bytes does not make much difference. Really. But only until your project starts to grow. Then you will have to monitor the number of objects you create very carefully in order to limit the amount of memory the application consumes. For real applications this is a problem. To develop a really good memory management strategy, we need to watch not only the size of new objects but also how many of them are created and in what order. For Python programs this is very important. Let's now turn to the next key point: the internal organization of memory allocation in Python.
To speed up memory allocation (and reuse), Python uses a set of free lists for small objects. Each list holds blocks of one size: there can be one list for objects of 1 to 8 bytes, another for objects of 9-16 bytes, and so on. When a small object needs to be created, either a free block of the right size is reused or a new block is allocated.
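As a rough mental model (and nothing more: the real allocator in CPython, pymalloc, operates on raw memory blocks in C), the bookkeeping looks something like this toy sketch. All names here are invented for illustration.

# Toy model of size-class free lists; not how CPython is actually implemented.
class ToySmallObjectPools(object):
    def __init__(self, granularity=8, max_size=512):
        # one free list per size class: 1-8 bytes, 9-16 bytes, and so on
        # (requests larger than max_size are not handled in this sketch)
        self.granularity = granularity
        self.free_lists = dict((size, [])
                               for size in range(granularity, max_size + 1, granularity))

    def _size_class(self, nbytes):
        # round the request up to the nearest multiple of the granularity
        g = self.granularity
        return ((nbytes + g - 1) // g) * g

    def allocate(self, nbytes):
        free = self.free_lists[self._size_class(nbytes)]
        if free:
            return free.pop()                        # reuse a previously freed block
        return bytearray(self._size_class(nbytes))   # otherwise take a fresh block

    def release(self, block):
        # freed blocks are never handed back to the system; they simply
        # go back onto the free list for their size class
        self.free_lists[len(block)].append(block)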
There are several subtleties in how Python organizes these lists into blocks, pools and "arenas": several blocks form a pool, pools gather into an arena, and so on. We will not go deep into them here (if you wish, you can read about improving memory allocation in Python). What matters for us is that these lists never shrink.
Indeed, if an element (of size x) is removed from memory (the reference to it is erased), the space it occupied is not returned to Python's global memory pool (let alone to the system); it is merely marked free and added to the free list for elements of size x. The space occupied by a dead object will be reused when another object of a suitable size is needed; if no suitable dead object is available, a new block is allocated.
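You can often observe this reuse directly in CPython. This is a hedged demonstration only: the behaviour is an implementation detail and not guaranteed.

a = 3.14159
old_address = id(a)          # in CPython, id() is the object's memory address
del a                        # the float's slot goes onto a free list
b = 2.71828
print id(b) == old_address   # frequently True: the freed slot was reused for b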
Since the memory holding small objects is never released, we come to the inevitable conclusion that these free lists can only grow, never shrink, which means that the memory your application devotes to small objects is determined by the largest number of them alive at any point, not by how many are alive right now.
Therefore, try to keep in memory only as many small objects as the current task requires, preferring loops in which a small number of elements are created and processed at a time over patterns where a list is first built with comprehension syntax and then processed.
Although the second option is more in the spirit of Python, it is less fortunate: in the end a huge number of small objects will fill the corresponding free lists, and even once the list itself becomes dead, its elements (now all sitting on the free lists) will still occupy a lot of memory.
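A minimal sketch of the difference (the exact numbers do not matter; what matters is how many small objects are alive at once):

# Builds a million-element list first, then processes it: every intermediate
# object exists simultaneously and ends up on the small-object free lists.
total = sum([x * x for x in xrange(1000000)])

# Produces and consumes items one at a time: only a handful of small objects
# are alive at any given moment.
total = sum(x * x for x in xrange(1000000))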
Growing free lists is not a huge problem in itself, because that memory is still available to the Python program. But from the operating system's point of view, your program's size is the total amount of memory allocated to Python. Only under Windows is memory returned to the OS heap (and used to host objects other than small ones); under Linux, the total amount of memory used by your application will only grow.
Let us prove this statement using memory_profiler, a Python module (it depends on the python-psutil package; see its page on GitHub). It provides the @profile decorator for tracking the memory usage of a specific function, and it is extremely simple to use. Consider the following program:
import copy
import memory_profiler

@profile
def function():
    x = list(range(1000000))  # allocate a big list
    y = copy.deepcopy(x)
    del x
    return y

if __name__ == "__main__":
    function()

Invoke it with:

python -m memory_profiler memory-profile-me.py
On a 64-bit computer, it displays:
Filename: memory-profile-me.py

Line #    Mem usage    Increment   Line Contents
================================================
     4                             @profile
     5      9.11 MB      0.00 MB   def function():
     6     40.05 MB     30.94 MB       x = list(range(1000000)) # allocate a big list
     7     89.73 MB     49.68 MB       y = copy.deepcopy(x)
     8     82.10 MB     -7.63 MB       del x
     9     82.10 MB      0.00 MB       return y
The program creates n = 1,000,000 integers (n x 24 bytes = ~23 MB) plus a list of references to them (n x 8 bytes = ~7.6 MB), about 31 MB in total. copy.deepcopy copies both, and the copies take ~50 MB (I do not know where the extra 50 - 31 = 19 MB come from). Curiously, del x removes x, but memory consumption drops by only 7.63 MB! The reason is that del removes only the list of references; the actual integer values remain on the heap and account for the excess consumption of ~23 MB.
In this example, the total memory occupied is ~73 MB, more than twice what is needed to store a list weighing ~31 MB. As you can see, if you lose vigilance, memory consumption can spring some very unpleasant surprises!
You can get different results on other platforms and other versions of Python.
By the way, what about pickle?
Pickle is the standard way to (de)serialize Python objects to a file. What is its memory consumption? Does it make extra copies of the data, or is it smarter than that? Consider a short example:
import memory_profiler
import pickle
import random

def random_string():
    return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])

@profile
def create_file():
    x = [(random.random(),
          random_string(),
          random.randint(0, 2 ** 64))
         for _ in xrange(1000000)]
    pickle.dump(x, open('machin.pkl', 'w'))

@profile
def load_file():
    y = pickle.load(open('machin.pkl', 'r'))
    return y

if __name__ == "__main__":
    create_file()
    #load_file()
On the first run we profile the creation of the pickled data, and on the second we profile re-reading it (comment the functions in or out accordingly). Using memory_profiler, we see that a lot of memory is consumed while the data is created:
Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
================================================
     8                             @profile
     9      9.18 MB      0.00 MB   def create_file():
    10      9.33 MB      0.15 MB       x = [(random.random(),
    11                                       random_string(),
    12                                       random.randint(0, 2**64))
    13    246.11 MB    236.77 MB            for _ in xrange(1000000)]
    14
    15    481.64 MB    235.54 MB       pickle.dump(x, open('machin.pkl', 'w'))
And when reading, a little less:
Filename: test-pickle.py

Line #    Mem usage    Increment   Line Contents
================================================
    18                             @profile
    19      9.18 MB      0.00 MB   def load_file():
    20    311.02 MB    301.83 MB       y = pickle.load(open('machin.pkl', 'r'))
    21    311.02 MB      0.00 MB       return y
So pickle is very bad for memory consumption. The initial list takes about 230 MB, and serialization consumes roughly the same amount again.
On the other hand, deserialization looks more efficient. It consumes more memory than the original list (300 MB instead of 230), but at least not twice as much.
In general, it is better to avoid (de)serialization in memory-sensitive applications. What are the alternatives? Serialization preserves the entire data structure, so that you can later restore it completely from the resulting file, but that is not always necessary. If the file contains a list, as in the previous example, it may be reasonable to use a simple flat text format. Let's see what that gives us.
The simplest (naĂŻve) implementation:
import memory_profiler
import random
import pickle

def random_string():
    return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])

@profile
def create_file():
    x = [(random.random(),
          random_string(),
          random.randint(0, 2 ** 64))
         for _ in xrange(1000000)]
    f = open('machin.flat', 'w')
    for xx in x:
        print >>f, xx
    f.close()

@profile
def load_file():
    y = []
    f = open('machin.flat', 'r')
    for line in f:
        y.append(eval(line))
    f.close()
    return y

if __name__ == "__main__":
    create_file()
    #load_file()
Create a file:
Filename: test-flat.py

Line #    Mem usage    Increment   Line Contents
================================================
     8                             @profile
     9      9.19 MB      0.00 MB   def create_file():
    10      9.34 MB      0.15 MB       x = [(random.random(),
    11                                       random_string(),
    12                                       random.randint(0, 2**64))
    13    246.09 MB    236.75 MB            for _ in xrange(1000000)]
    14
    15    246.09 MB      0.00 MB       f = open('machin.flat', 'w')
    16    308.27 MB     62.18 MB       for xx in x:
    17                                     print >>f, xx
Read the file:
Filename: test-flat.py

Line #    Mem usage    Increment   Line Contents
================================================
    20                             @profile
    21      9.19 MB      0.00 MB   def load_file():
    22      9.34 MB      0.15 MB       y = []
    23      9.34 MB      0.00 MB       f = open('machin.flat', 'r')
    24    300.99 MB    291.66 MB       for line in f:
    25    300.99 MB      0.00 MB           y.append(eval(line))
    26    301.00 MB      0.00 MB       return y
Writing this way consumes much less memory. Many temporary small objects are still created (about 60 MB worth), but that is nothing compared to doubling the consumption. Reading is comparable in cost (slightly less memory is used).
This example is trivial, but it illustrates strategies in which you do not load all the data first and process it afterwards, but instead read a few items at a time, process them, and reuse the allocated memory. When loading data into a NumPy array, for example, you can first create the array, then read the file line by line and gradually fill the array in. This keeps only one copy of all the data in memory. With pickle, the data ends up in memory (at least) twice: once for pickle itself, and a second time when handing it over to NumPy.
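Here is a hedged sketch of that preallocate-and-fill pattern, reusing the machin.flat file from the example above. Parsing each line with eval and keeping only the float field are choices made purely for illustration.

import numpy as np

n_records = 1000000                              # must be known (or estimated) up front
values = np.empty(n_records, dtype=np.float64)   # allocate the array once

with open('machin.flat') as f:
    for i, line in enumerate(f):
        # each line holds the repr of a (float, string, int) tuple;
        # keep only the float so the line-sized temporaries can be reused
        values[i] = eval(line)[0]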
Or, better yet, use NumPy (or PyTables) arrays. But that is another story. In the meantime, you can read another guide on loading and saving in the Theano/doc/tutorial directory.
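For plain numeric data, NumPy's own binary format is a simple way to sidestep pickle altogether; a minimal sketch:

import numpy as np

x = np.random.random(1000000)   # ~8 MB of float64 values
np.save('machin.npy', x)        # written straight from the array's buffer
y = np.load('machin.npy')       # pass mmap_mode='r' to avoid reading it all at once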
Python's design goals are quite different from, say, C's. C is designed to give you fine control over what you are doing, at the cost of more complex and explicit programming, while Python is designed to let you write code faster while hiding most (if not all) of the implementation details from you. Although this sounds nice, ignoring the inefficiencies of the implementation in a production environment can sometimes lead to unpleasant, occasionally unrecoverable, consequences. I hope that knowing these features of how Python works with memory (its architectural features!) will help you write code that better meets production requirements and scales well, rather than turning into a blazing memory hell.
Source: https://habr.com/ru/post/336156/