
Hi, all.
Previously I worked with C-like languages, but now I have had to sit down with Python. The syntax was easy, and then came the tricky questions. Below the cut is an article about how Python stores data in memory. I do not claim to have the final truth, but I am trying to figure it out.
Looking at references
Let's start with the simplest thing. Any data in Python is an object, and any variable is a reference to an object. There is no data that is not an object. To begin with, we need to learn how to determine whether two “identical” objects are really one and the same object. This requires getting an address, which is easily done with the built-in function id() (in CPython it returns the object's address in memory). Let's try:
print(id(0))
As expected, something unintelligible is displayed. A large number, probably an address. But if every number used anywhere in a program were stored separately, there would of course never be enough memory. A short experiment:
print(id(0))
print(id(0))
Two absolutely identical numbers. So this constant, at least, is stored in memory without duplication (CPython, as far as I can tell, keeps a ready-made set of small integers, and 0 is one of them). That is logical: Python's performance is low enough that a trick like this helps save what is left of it. OK, let's try to fill all the memory with a huge list of zeros.
a = [0]
while True:
    a += [0]
The infinite loop, as expected, runs forever, but needs very little memory per element: every slot in the list is just one more reference to the same zero object. Another experiment:
a = [0, 0]
print(id(a[0]))
print(id(a[1]))
Well, yes, the same number. To confirm, I ran the same check with two different variables: the same number again, and it is even equal to id(0). So the algorithm is apparently this: when a variable is given a value, Python checks whether that value already exists in memory and, if it does, points the reference at the existing object. This behavior is needed, obviously, because an object takes up quite a lot of space in memory, and to stay compact Python reuses existing objects as much as it can. To avoid overloading the article with code, I will just say that in my tests short strings (including those obtained through a slice) and booleans behaved the same way. It is not a universal rule, though: two equal lists created separately are still two different objects. Let's make a second attempt to take up all the memory with Python:
i = 0
a = [0]
while True:
    a += [a[i]]
    i += 1
Success! Memory consumption is constantly increasing. Let's draw the first conclusions:
1. Any data in Python is an object.
2. If objects are "the same", they can be stored at the same address in memory, as happens with small integers and short strings. This is an optimization, not a guarantee: id(a) == id(b) means that a and b are one object, but a == b does not by itself imply id(a) == id(b).
3. No more complex optimization is used: even a rather simple dependency inside the list (just the rule “a[i] = i”) is not optimized in any way. However, I would be surprised if it were: that would require rather complex lexical analysis, which Python, with its step-by-step interpretation, cannot afford.
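These conclusions can be checked with a bounded sketch. It assumes CPython (where, as far as I know, the integers from -5 to 256 are preallocated) and builds its values at runtime so that the compiler cannot merge identical literals. The helper name parsed_twice_is_shared and the size N are my own, chosen only for illustration; N stands in for the infinite loops.

```python
def parsed_twice_is_shared(text):
    """Parse the same decimal string twice; True if both results are one object."""
    first = int(text)
    second = int(text)
    return first is second  # same test as id(first) == id(second)


# Conclusion 2: equality and identity are separate questions.
zero_shared = parsed_twice_is_shared("0")     # 0 is in CPython's small-int cache
large_shared = parsed_twice_is_shared("257")  # 257 is outside it
list_a, list_b = [0], [0]
lists_equal = list_a == list_b    # same contents
lists_shared = list_a is list_b   # but two separate list objects

# Conclusion 3: bounded versions of the two "fill the memory" loops.
N = 1000  # arbitrary size, replaces `while True`
first_loop = [0]
for _ in range(N):
    first_loop += [0]

second_loop = [0]
for i in range(N):
    second_loop += [second_loop[i]]

# How many distinct objects does each list actually reference?
distinct_first = len({id(item) for item in first_loop})
distinct_second = len({id(item) for item in second_loop})

print("0 shared:", zero_shared, "| 257 shared:", large_shared)
print("lists equal:", lists_equal, "| lists shared:", lists_shared)
print("distinct objects:", distinct_first, distinct_second)
```

In this sketch both lists end up referencing a single zero object, so whatever memory they consume appears to go into the lists' own reference slots rather than into new integer objects.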
Counting references
Disclaimer: from here on we work in Python's interactive mode. To count the references to an object there is the function sys.getrefcount(). Import sys:
>>> import sys
And to start with, we need to determine how accurate the data it gives us is:
>>> sys.getrefcount('There is no this string in Python')
3
>>> sys.getrefcount('9695c3716e3b801367b7eca6a3281ac9')
This tells us one funny thing: while counting references, getrefcount() creates some of its own. As we can see, for constants it creates two of them (really two; I tried it on a large amount of input data, which I am not publishing here as unnecessary), so we can simply subtract 2. For variables it apparently also creates two, but does not count the variable itself. Well, that settles how the results deviate from reality. Now a few examples:
>>> sys.getrefcount(1)
754
>>> sys.getrefcount(65)
13
>>> sys.getrefcount(67)
11
>>> sys.getrefcount('A')
4
>>> sys.getrefcount('a')
6
>>> sys.getrefcount(False)
100
>>> sys.getrefcount(True)
101
Why do references suddenly appear out of nowhere (751 in total for the number one)? Because this function counts C-level pointers, that is, it includes the ones used inside the code of Python itself. In effect, we are brazenly breaking into the part of Python that the developers try to hide from us.
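The bookkeeping described above can be checked without relying on absolute numbers, which differ between Python versions and sessions. This is a minimal sketch, assuming CPython; the names probe, alias and holders are mine and exist only for the illustration.

```python
import sys

probe = object()                    # a fresh object nobody else refers to
baseline = sys.getrefcount(probe)   # includes whatever the call itself adds

alias = probe                       # one more name for the same object
after_alias = sys.getrefcount(probe)

holders = [probe, probe]            # two more references held by a list
after_list = sys.getrefcount(probe)

del alias, holders                  # dropping them undoes the increase
after_del = sys.getrefcount(probe)

shared = sys.getrefcount(1)         # the integer 1 is used all over the interpreter

print("baseline:", baseline)
print("alias adds:", after_alias - baseline)
print("list adds:", after_list - after_alias)
print("back to baseline:", after_del == baseline)
print("count for 1 is larger:", shared > baseline)
```

As far as I know, recent CPython versions may report a fixed, very large value for objects such as 1 instead of a real tally, so only the differences between calls are worth comparing.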
Well, that is a look behind the scenes of Python. If I get around to it and can manage it, I will write about what happens when you try to change these objects by hand, for example through OllyDbg.