Pointers in Python: what’s the point?

If you have ever worked with low-level languages such as C or C++, you have probably heard of pointers. They can make some pieces of code considerably more efficient. But they can also confuse beginners, and even experienced developers, and lead to memory management bugs. So are there pointers in Python, and can you emulate them somehow?

Pointers are widely used in C and C++. In essence, they are variables that hold the memory addresses of other variables. To refresh your knowledge of pointers, read this review.

This article will give you a better understanding of Python's object model and show why pointers do not really exist in the language. For cases where you need to imitate pointer behavior, you will learn how to emulate pointers without the accompanying memory management nightmare.

With this article you will:

Find out why there are no pointers in Python.
Learn the difference between C variables and names in Python.
Learn how to emulate pointers in Python.
Use ctypes to experiment with real pointers.

Note: In this article, "Python" refers to the implementation of Python written in C, known as CPython. Everything said about the language internals holds for CPython 3.7 but may not hold in later versions.

Why are there no pointers in Python?

Honestly, I do not know. Could pointers exist natively in Python? Probably, but they seem to go against the Zen of Python, because they encourage implicit changes rather than explicit ones. Pointers are often complex rather than simple, especially for beginners. Worse, they invite mistakes and genuinely dangerous operations, such as reading from a region of memory you were never supposed to read.

Python tends to hide implementation details such as memory addresses from the user. The language usually favors ease of use over speed, so pointers do not make much sense in Python. But don't worry: by default, the language gives you some of the benefits of pointers.

To understand pointers in Python, let's take a quick look at some details of the language implementation. In particular, you need to understand:

What mutable and immutable objects are.
How variables/names work in Python.

Hold on to your memory addresses, let's go!

Objects in Python

Everything in Python is an object. For example, open the REPL and try isinstance():

    >>> isinstance(1, object)
    True
    >>> isinstance(list(), object)
    True
    >>> isinstance(True, object)
    True
    >>> def foo():
    ...     pass
    ...
    >>> isinstance(foo, object)
    True

This code demonstrates that everything in Python really is an object. Each object contains at least three pieces of data:

Reference count.
Type.
Value.

The reference count is used for memory management. For details, see Memory Management in Python. The type is used at the CPython level to ensure type safety at runtime. And the value is the actual value associated with the object.

But not all objects are the same. There is one important distinction: objects are either mutable or immutable. Understanding this distinction will help you peel back the first layer of the onion called "Python pointers."

Mutable and immutable objects

There are two types of objects in Python:

Immutable objects (cannot be changed);
Mutable objects (can be changed).

Understanding this difference is the first key to navigating the world of pointers in Python. Here is the mutability of some common types:

Type	Immutable?
int	Yes
float	Yes
bool	Yes
complex	Yes
tuple	Yes
frozenset	Yes
str	Yes
list	No
set	No
dict	No

As you can see, many of the commonly used primitive types are immutable. You can check this by writing some Python code. You will need two built-in tools:

id() returns the object's memory address (in CPython);
is returns True if and only if two objects have the same memory address.

You can run this code in the REPL environment:

    >>> x = 5
    >>> id(x)
    94529957049376

Here we bind the name x to the value 5. If you try to change the value using addition, you get a new object:

    >>> x += 1
    >>> x
    6
    >>> id(x)
    94529957049408

Although it may look as if this code simply changes the value of x, what you actually get back is a new object.

The str type is also immutable:

    >>> s = "real_python"
    >>> id(s)
    140637819584048
    >>> s += "_rocks"
    >>> s
    'real_python_rocks'
    >>> id(s)
    140637819609424

In this case too, s ends up with a different memory address after the += operation.

Bonus: The += operator translates into different method calls.

For some objects, such as list, += translates into __iadd__() (in-place add). The object modifies itself and keeps the same ID. However, str and int do not have this method, so __add__() is called instead of __iadd__().

For more information, see the Python data model documentation .
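As a minimal sketch of the note above, the snippet below compares += on a list and on a tuple. The variable names are only illustrative; the point is that the list keeps its identity while the tuple name is rebound to a new object.

```python
# list defines __iadd__, so += updates the existing object in place.
numbers = [1, 2, 3]
numbers_alias = numbers          # second name for the same list
numbers += [4]
print(numbers is numbers_alias)  # True: still the same object
print(numbers_alias)             # [1, 2, 3, 4]

# tuple has no __iadd__, so += falls back to __add__ and rebinds the name.
point = (1, 2, 3)
point_original = point
point += (4,)
print(point is point_original)   # False: point now names a new tuple
print(point_original)            # (1, 2, 3)

# The presence or absence of __iadd__ can be checked directly.
print(hasattr(list, "__iadd__"), hasattr(int, "__iadd__"), hasattr(str, "__iadd__"))
```

The last line prints True False False, which matches the explanation: only the mutable type implements the in-place variant.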

If we try to change the string s directly, we get an error:

    >>> s[0] = "R"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment

The above code fails and Python reports that str does not support this change, which corresponds to the definition of immutability of type str .

Compare this with a mutable object, such as a list:

    >>> my_list = [1, 2, 3]
    >>> id(my_list)
    140637819575368
    >>> my_list.append(4)
    >>> my_list
    [1, 2, 3, 4]
    >>> id(my_list)
    140637819575368

This code demonstrates the main difference between the two kinds of objects. Initially, my_list has an ID. Even after 4 is appended to the list, my_list still has the same ID. The reason is that the list type is mutable.

Here is another demonstration of list mutability, this time with item assignment:

    >>> my_list[0] = 0
    >>> my_list
    [0, 2, 3, 4]
    >>> id(my_list)
    140637819575368

In this code, we modified my_list by setting its first element to 0. The list kept the same ID after this operation. The next step on our journey is to understand how variables work in Python.

Understanding Variables

Variables in Python are fundamentally different from variables in C and C++. In fact, Python does not have variables at all. It has names instead.

This may sound pedantic, and for the most part it is. Most of the time you can think of Python names as variables, but you need to understand the difference. This is especially important when studying a topic as tricky as pointers.

To make this easier to understand, let's look at how variables work in C and what they represent, and then compare that with how names work in Python.

Variables in C

Take the code that defines the variable x :

 int x = 2337;

Executing this short line involves several distinct steps:

Allocating enough memory for an integer.
Writing the value 2337 to that place in memory.
Indicating that x refers to that value.

A simplified view of memory might look like this:

Here, the variable x has a fake address 0x7f1 and a value of 2337 . If you later want to change the value of x , you can do this:

 x = 2338;

This code assigns the new value 2338 to the variable x, overwriting the previous value. This means that the variable x is mutable. The updated memory for the new value:

Note that the location of x did not change, only the value itself. This is important. It tells us that x is a place in memory, not just a name.

You can also think about this in terms of ownership. In a sense, x owns its place in memory. At first, x is an empty box that can hold exactly one integer.

When you assign a value to x, you put the value in the box that belongs to x. If you want to introduce a new variable y, you can add this line:

 int y = x;

This code creates a new box called y and copies the value from x into it. Now the memory layout looks like this:

Note the new location of y: 0x7f5. Although the value of x was copied to y, the variable y owns a new address in memory. Therefore, you can overwrite the value of y without affecting x:

 y = 2339;

Now the memory layout looks like this:

Again, you changed the value of y, but not its location. In addition, you did not affect the original variable x.

With names in Python, the situation is completely different.

Python Names

Python has no variables; it has names instead. You are free to use the term "variables", but it is important to know the difference between variables and names.

Let's take the equivalent code from the above example in C and write it in Python:

 >>> x = 2337

As in C, executing this code involves several distinct steps:

A PyObject is created.
The typecode of the PyObject is set to integer.
The value of the PyObject is set to 2337.
The name x is created.
x is pointed at the new PyObject.
The reference count of the PyObject is incremented by 1.

Note: PyObject is not the same as Python's object. It is specific to CPython and represents the base structure of all Python objects.

PyObject is defined as a C struct, so if you are wondering why you cannot read the typecode or the reference count directly, the reason is that you do not have direct access to these structures. Functions such as sys.getrefcount() can give you a peek at some of the internals.
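As a rough illustration of the reference count, the sketch below uses sys.getrefcount() on a throwaway object. The absolute numbers are a CPython implementation detail (the call itself temporarily adds a reference), so only the differences are printed.

```python
import sys

payload = object()                     # a fresh object with a single name bound to it
count_before = sys.getrefcount(payload)

alias = payload                        # bind a second name to the same object
count_with_alias = sys.getrefcount(payload)

del alias                              # remove the second name again
count_after_del = sys.getrefcount(payload)

# Binding a name adds one reference; deleting the name removes it.
print(count_with_alias - count_before)  # 1
print(count_after_del - count_before)   # 0
```

Note that del removes the name, not the object; the object goes away only when nothing refers to it any more.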

In memory, it might look something like this:

Here the memory layout is very different from the C layout shown above. Instead of x owning the block of memory in which the value 2337 is stored, the newly created Python object owns the memory in which 2337 lives. The Python name x does not directly own any memory address, the way a C variable owns a fixed slot in memory.

If you want to give x a new value, try this code:

 >>> x = 2338

What happens here differs from what happens in C, but it is not too different from the original binding in Python.

In this code:

A new PyObject is created.
The typecode of the PyObject is set to integer.
The value of the PyObject is set to 2338.
x is pointed at the new PyObject.
The reference count of the new PyObject is incremented by 1.
The reference count of the old PyObject is decremented by 1.

Now the memory layout looks like this:

This illustration shows that x points to a reference to an object and does not own a region of memory, as it did in C. You can also see that the statement x = 2338 is not an assignment, but rather a binding of the name x to a reference.

In addition, the previous object (holding the value 2337) now sits in memory with a reference count of 0 and will be cleaned up by the garbage collector.
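The cleanup of an unreferenced object can be observed with a weak reference. This is only a sketch: Payload is a hypothetical stand-in class (plain int objects cannot be weakly referenced), and the immediate cleanup shown here is CPython's reference-counting behavior rather than a language guarantee.

```python
import weakref


class Payload:
    """Hypothetical stand-in for any Python object."""


x = Payload()
watcher = weakref.ref(x)          # observes the object without keeping it alive

alive_before_rebind = watcher() is not None

x = Payload()                     # rebind x; the first object loses its only reference

alive_after_rebind = watcher() is not None

print(alive_before_rebind)        # True
print(alive_after_rebind)         # False on CPython: the old object was freed
```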

You can enter a new name, y , as in the C example:

 >>> y = x

A new name will appear in memory, but not necessarily a new object:

As you can see, no new Python object is created, only a new name that points to the same object. In addition, the object's reference count has increased by 1. You can check object identity to confirm that both names refer to the same object:

    >>> y is x
    True

This code shows that x and y are the same object. But make no mistake: y is still immutable. For example, you can perform an addition on y:

    >>> y += 1
    >>> y is x
    False

After the addition, you get back a new Python object. Now memory looks like this:

A new object has been created, and y now points to it. Interestingly, we would end up in exactly the same final state if we had bound y directly to 2339:

    >>> y = 2339

After this statement, the final state of memory is the same as after the addition. To recap: in Python you do not assign variables, you bind names to references.
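The binding rules above can be checked in a few lines. This is a small sketch with illustrative names; it also contrasts identity (is) with equality (==) using a mutable list, which the walkthrough above does not cover.

```python
# Immutable object: += rebinds the name, the other name is unaffected.
x = 2337
y = x
shared_before = y is x      # both names refer to the same object
y += 1                      # y is rebound to a new object holding 2338
shared_after = y is x
print(shared_before, shared_after, x, y)   # True False 2337 2338

# Mutable object: a change made through one name is visible through the other.
a = [1, 2, 3]
b = a                       # second name, same list
b.append(4)
print(a)                    # [1, 2, 3, 4]

# Equal value does not imply same identity.
c = [1, 2, 3, 4]
print(c == a, c is a)       # True False
```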

About interned objects in Python

Now you understand how new objects are created in Python and how names are attached to them. It's time to talk about interned objects.

We have the following Python code:

    >>> x = 1000
    >>> y = 1000
    >>> x is y
    True

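As before, x and y are names pointing to the same Python object. (This holds when both assignments are compiled together, as in a script; typed one line at a time into the CPython REPL, the two literals usually produce distinct objects.) Either way, an object holding the value 1000 is not guaranteed to always have the same memory address. For example, if you add two numbers to get 1000, you end up with a different address: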

    >>> x = 1000
    >>> y = 499 + 501
    >>> x is y
    False

This time, the expression x is y returns False. If this is confusing, don't worry. Here is what happens when this code is executed:

A Python object (1000) is created.
It is given the name x.
A Python object (499) is created.
A Python object (501) is created.
These two objects are added together.
A new Python object (1000) is created.
It is given the name y.

Technical note: the steps described above take place only when this code is executed inside the REPL. If you take the example above, paste it into a file, and run it, then x is y returns True.

The reason is that the CPython compiler is smart and tries to perform peephole optimizations that save execution steps wherever possible. Details can be found in the source code of the CPython peephole optimizer.

But isn't this wasteful? Well, yes, but that is the price you pay for all the great benefits of Python. You never need to think about deleting such intermediate objects, or even know that they exist. The nice part is that these operations are relatively fast, and you would never have known about them until now.

The creators of Python wisely noticed this overhead and decided to make a few optimizations. The result is behavior that may surprise newcomers:
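The compile-time folding mentioned above can be inspected without reading the CPython source. The sketch below compiles a one-line program and looks at the constants stored in the resulting code object. This relies on CPython implementation details, so treat the result as an observation about CPython rather than a language guarantee.

```python
# Compile the statement as a whole module, the way a script file would be.
source = "y = 499 + 501\n"
code = compile(source, "<example>", "exec")

# On CPython, the sum is expected to be computed at compile time,
# so 1000 should already appear among the code object's constants.
folded_at_compile_time = 1000 in code.co_consts
print(folded_at_compile_time)

# Running the compiled code binds y to that precomputed constant.
namespace = {}
exec(code, namespace)
print(namespace["y"])  # 1000
```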

    >>> x = 20
    >>> y = 19 + 1
    >>> x is y
    True

This example is almost the same code as above, except that we get True. It is all because of interned objects. Python pre-creates a certain subset of objects in memory and keeps them in the global namespace for everyday use.

Which objects are interned depends on the Python implementation. CPython 3.7 interns the following:

Integers in the range of -5 to 256.
Strings containing only ASCII letters, digits, or underscores.

This is done because these values are used very often in many programs. By interning them, Python avoids allocating memory over and over for constantly used objects.

Strings shorter than 20 characters that contain only ASCII letters, digits, or underscores will be interned, since they are assumed to be used as identifiers:

    >>> s1 = "realpython"
    >>> id(s1)
    140696485006960
    >>> s2 = "realpython"
    >>> id(s2)
    140696485006960
    >>> s1 is s2
    True

Here, s1 and s2 point to the same address in memory. If we included a character that is not an ASCII letter, digit, or underscore, we would get a different result:

    >>> s1 = "Real Python!"
    >>> s2 = "Real Python!"
    >>> s1 is s2
    False

In this example, an exclamation point is used, so the strings are not interned and are different objects in memory.

Bonus: If you want these objects to refer to the same interned object, you can use sys.intern(). One use of this function is described in the documentation:

String interning is useful for a slight increase in performance when searching through a dictionary: if the keys in the dictionary and the desired key are interned, then the comparison of keys (after hashing) can be performed by comparing pointers rather than strings. (Source)

Interned objects are a frequent source of confusion for programmers. Just remember that whenever you are in doubt, you can always use id() and is to check object identity.
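Here is a minimal sketch of sys.intern() in action. The strings are built at run time with str.join() so that the compiler cannot share a single constant between them; the variable names are illustrative.

```python
import sys

# Build two equal strings at run time.
parts = ["Real", " ", "Python!"]
s1 = "".join(parts)
s2 = "".join(parts)

# Equality compares values and does not depend on interning.
equal_values = s1 == s2

# sys.intern() returns the canonical object for a given string value,
# so interning both strings yields one and the same object.
s1_interned = sys.intern(s1)
s2_interned = sys.intern(s2)
same_object = s1_interned is s2_interned

print(equal_values, same_object)  # True True
```

In everyday code, == remains the right way to compare strings; is only tells you whether two names refer to the same object.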

Pointer emulation in Python

The fact that Python has no native pointers does not mean that you cannot get the benefits of pointers. There are actually several ways to emulate pointers in Python. Here we look at two of them:

Using mutable types as pointers.
Using custom Python objects.

Using mutable types as pointers

You already know what mutable types are. It is their mutability that lets us emulate pointer behavior. Suppose you need to replicate this code:

    void add_one(int *x) {
        *x += 1;
    }

This code takes a pointer to an integer (*x) and increments the value by 1. Here is a main function to exercise the code:

    #include <stdio.h>

    int main(void) {
        int y = 2337;
        printf("y = %d\n", y);
        add_one(&y);
        printf("y = %d\n", y);
        return 0;
    }

In the fragment above, we assigned 2337 to y, printed the current value, incremented it by 1, and then printed the new value. The output is:

    y = 2337
    y = 2338

One way to replicate this behavior in Python is to use a mutable type. For example, apply the list and change the first element:

    >>> def add_one(x):
    ...     x[0] += 1
    ...
    >>> y = [2337]
    >>> add_one(y)
    >>> y[0]
    2338

Here add_one(x) accesses the first element and increments its value by 1. Using a list means that the caller sees the modified value. So are there pointers in Python? No. This behavior is possible only because list is a mutable type. If you try to use a tuple, you get an error:

    >>> z = (2337,)
    >>> add_one(z)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in add_one
    TypeError: 'tuple' object does not support item assignment

This code demonstrates that a tuple is immutable, so it does not support item assignment.

list is not the only mutable type; pointers are also often emulated with dict.

Suppose you have an application that should track the occurrence of interesting events. This can be done by creating a dictionary and using one of its elements as a counter:

    >>> counters = {"func_calls": 0}
    >>> def bar():
    ...     counters["func_calls"] += 1
    ...
    >>> def foo():
    ...     counters["func_calls"] += 1
    ...     bar()
    ...
    >>> foo()
    >>> counters["func_calls"]
    2

In this example, the counters dictionary is used to track the number of function calls. After calling foo(), the counter has increased by 2, as expected. And all thanks to the mutability of dict.

Keep in mind that this only emulates pointer behavior and has nothing to do with real pointers in C and C++. If anything, these operations are more expensive than they would be in C or C++.
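To see why a mutable container is needed at all, the sketch below contrasts rebinding a name inside a function with mutating an object that was passed in. The function names rebind and mutate are made up for this illustration.

```python
def rebind(number):
    # += on an int creates a new object and rebinds the *local* name only.
    number += 1
    return number


def mutate(box):
    # Item assignment changes the list object that the caller also refers to.
    box[0] += 1


y = 2337
returned = rebind(y)
print(y, returned)   # 2337 2338: the caller's y is untouched

box = [2337]
mutate(box)
print(box[0])        # 2338: the change is visible to the caller
```

This is the whole trick behind the list and dict examples: the function never changes which object the caller's name refers to, it only changes the contents of a shared mutable object.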

Using Python Objects

dict is a great way to emulate pointers in Python, but sometimes it is tedious to remember which key name you used, especially if you use the dictionary in different parts of the application. A custom Python class can help here.

Suppose you need to track metrics in an application. A good way to abstract away the annoying details is to create a class:

    class Metrics(object):
        def __init__(self):
            self._metrics = {
                "func_calls": 0,
                "cat_pictures_served": 0,
            }

This code defines the Metrics class. It still uses a dictionary to store the actual data, held in the _metrics member variable. This gives you the mutability you need. Now you just need a way to access these values. You can do this with properties:

    class Metrics(object):
        # ...

        @property
        def func_calls(self):
            return self._metrics["func_calls"]

        @property
        def cat_pictures_served(self):
            return self._metrics["cat_pictures_served"]

Here we use @property. If you are not familiar with decorators, read the article Primer on Python Decorators. In this case, the @property decorator lets you access func_calls and cat_pictures_served as if they were attributes:

    >>> metrics = Metrics()
    >>> metrics.func_calls
    0
    >>> metrics.cat_pictures_served
    0

Being able to access these names as attributes means that you have abstracted away the fact that the values are stored in a dictionary. It also makes the attribute names more explicit. Of course, you also need to be able to increment the values:

    class Metrics(object):
        # ...

        def inc_func_calls(self):
            self._metrics["func_calls"] += 1

        def inc_cat_pics(self):
            self._metrics["cat_pictures_served"] += 1

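We have added two new methods:

inc_func_calls()
inc_cat_pics()

These methods modify the values stored in the metrics dictionary. You now have a class that you can modify as if you were working with a pointer:

    >>> metrics = Metrics()
    >>> metrics.inc_func_calls()
    >>> metrics.inc_func_calls()
    >>> metrics.func_calls
    2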

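Here you can read func_calls and call inc_func_calls() from different places in your application and thereby emulate pointers in Python. This is useful when you have something like metrics that must be read and updated frequently in various parts of the application.

Note: In this class, exposing explicit inc_func_calls() and inc_cat_pics() methods instead of a @property.setter prevents users from setting these values to an arbitrary int or to an invalid value.

Here is the full source of the Metrics class: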

    class Metrics(object):
        def __init__(self):
            self._metrics = {
                "func_calls": 0,
                "cat_pictures_served": 0,
            }

        @property
        def func_calls(self):
            return self._metrics["func_calls"]

        @property
        def cat_pictures_served(self):
            return self._metrics["cat_pictures_served"]

        def inc_func_calls(self):
            self._metrics["func_calls"] += 1

        def inc_cat_pics(self):
            self._metrics["cat_pictures_served"] += 1
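One consequence of this design can be checked directly: a property without a setter is read-only, so callers can change the counter only through the increment method. The sketch below uses a trimmed-down copy of the class with a single counter, purely for illustration.

```python
class Metrics:
    """Trimmed-down copy of the article's class, with one counter only."""

    def __init__(self):
        self._metrics = {"func_calls": 0}

    @property
    def func_calls(self):
        return self._metrics["func_calls"]

    def inc_func_calls(self):
        self._metrics["func_calls"] += 1


metrics = Metrics()
metrics.inc_func_calls()

# No setter is defined for the property, so direct assignment is rejected.
try:
    metrics.func_calls = "not a number"
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

print(metrics.func_calls, assignment_blocked)  # 1 True
```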

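Real pointers with ctypes

So maybe there are pointers in Python after all, at least in CPython? With the built-in ctypes module you can create real pointers, just as in C. If you are not familiar with ctypes, take a look at Extending Python With C Libraries and the "ctypes" Module.

In practice, you need this when you have to call a function from a C library that expects a pointer. Let's go back to the add_one() function from before: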

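    void add_one(int *x) {
        *x += 1;
    }

Again, this code increments the value of x by 1. To use it, first compile it into a shared object. Assuming the code above is stored in the file add.c, you can do this with gcc:

    $ gcc -c -Wall -Werror -fpic add.c
    $ gcc -shared -o libadd1.so add.o

The first command compiles the C source file into an object file called add.o. The second command takes that object file and produces a shared object called libadd1.so.

libadd1.so should now be in your current directory. You can load it into Python with ctypes: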

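    >>> import ctypes
    >>> add_lib = ctypes.CDLL("./libadd1.so")
    >>> add_lib.add_one
    <_FuncPtr object at 0x7f9f3b8852a0>

ctypes.CDLL returns an object that represents the libadd1 shared object. Because add_one() is defined in it, you can access the function as if it were any other Python object. Before calling the function, however, you should specify its signature. This helps Python make sure that you pass the right type to the function.

In this case, the function takes a pointer to an integer, and ctypes lets you specify that as follows:

    >>> add_one = add_lib.add_one
    >>> add_one.argtypes = [ctypes.POINTER(ctypes.c_int)]

This code sets the function signature to match what C expects. Now, if you try to call the function with the wrong type, you get a clear error instead of undefined behavior: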

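    >>> add_one(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ctypes.ArgumentError: argument 1: <class 'TypeError'>: \
    expected LP_c_int instance instead of int

Python raises an error explaining that add_one() wants a pointer, not a plain integer. Fortunately, ctypes has a way to pass pointers to such functions. First, declare a C-style integer:

    >>> x = ctypes.c_int()
    >>> x
    c_int(0)

This code creates a C-style integer x with the value 0. ctypes provides byref(), which lets you pass a variable by reference.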

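Note: Passing by reference is the opposite of passing by value.

When you pass by reference, you pass a reference to the original variable, so modifications are reflected in the original. When you pass by value, a copy is made, and modifications do not affect the original.

Now you can call add_one():

    >>> add_one(ctypes.byref(x))
    998793640
    >>> x
    c_int(1)

Nice! The integer was incremented by 1. You have just used real pointers in Python.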
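If you want to experiment with ctypes pointers without compiling a shared library, the sketch below stays entirely in Python. The add_one() here is a Python function written for this illustration, not the C function loaded from libadd1.so; it only mimics the same pointer-based interface.

```python
import ctypes


def add_one(int_ptr):
    # Dereference the pointer and increment the C integer it points to.
    int_ptr.contents.value += 1


x = ctypes.c_int(2337)      # a real C int living in ctypes-managed memory
ptr = ctypes.pointer(x)     # a real pointer to that int

add_one(ptr)
print(x.value)              # 2338: the change went through the pointer

ptr[0] = 42                 # pointers also support C-style indexing
print(x.value)              # 42
```

ctypes.pointer() creates a full pointer object, while the byref() used above is a lighter-weight way to pass a reference when the pointer is only needed as a function argument.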

Conclusion

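You now have a better understanding of how Python objects relate to pointers. Even if the distinction between names and variables seems pedantic, knowing it helps you understand how Python really works.

You have also seen several ways to emulate pointers in Python:

Using mutable types as pointers.
Using custom Python objects.
Using real pointers with ctypes.

These techniques let you emulate pointers in Python without giving up its memory safety.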

Source: https://habr.com/ru/post/454324/
