
Python Inside. Objects: Head

1. Introduction
2. Objects. Head
3. Objects. Tail
4. Process structures

We continue exploring the insides of Python. Last time we saw how Python digests a simple program. Today we begin to study its object system.

As I wrote in the previous episode (which, by the way, turned out to be quite a success; thank you all, your views and comments literally keep me going!), today's post is dedicated to the implementation of objects in Python 3.x. At first I thought it would be a simple topic, but even after reading all the code that had to be read before writing this post, I can hardly call the Python object system, uhh, "simple" (nor can I claim to understand it completely). Yet I am even more convinced that the implementation of objects is a good topic to start with; in the following posts we will see how important it is, and at the same time I suspect that very few people, even among Python veterans, understand it fully. Objects are only loosely coupled to the rest of Python (while writing this post I barely looked into ./Python and spent most of my time in ./Objects and ./Include ), so it seemed easiest to me to treat the implementation of objects as if it were not connected to anything else at all: as a general-purpose C API for building an object subsystem. Perhaps it will be easier for you to think of it that way too: remember, all of this is just a set of structures and functions for managing those structures.

Everything in Python is an object: numbers, dictionaries, user-defined and built-in classes, stack frames and code objects. For a pointer to a memory location to be considered an object, at least two fields are needed, defined in ./Include/object.h : PyObject :
    typedef struct _object {
        Py_ssize_t ob_refcnt;
        struct _typeobject *ob_type;
    } PyObject;

Many objects extend this structure with the fields they need, but these two must always be present: the reference count and the type (in special debug builds a couple of extra fields are added to track references).
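For example, a simple concrete object such as a float just prepends this common header to its own payload; roughly (a simplified rendition of the definition in ./Include/floatobject.h):

    /* A concrete object: the common header (reference count + type)
       followed by the object's own data.
       Simplified from ./Include/floatobject.h. */
    typedef struct {
        PyObject_HEAD        /* the PyObject header: ob_refcnt and ob_type */
        double ob_fval;      /* the actual floating-point value */
    } PyFloatObject;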

The reference count is a number that says how many other places refer to the given object. In the code >>> a = b = c = object() , an empty object is created once and bound to three different names: a , b and c . Each name creates a new reference to the object, but the object itself is created only once: binding an object to a new name or adding it to a list creates a new reference, not a new object! There is much more to say on this topic, but it belongs to garbage collection rather than to the object system, and I would rather devote a separate post to it than sort it out here. Before leaving the subject, though, note that it is now easier to understand the macro ./Include/object.h : Py_DECREF , which we met in the first part: it merely decrements ob_refcnt (and frees the object's resources when ob_refcnt drops to zero); a simplified sketch of what that amounts to follows below, and that will be all for reference counting for now.
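Ignoring the extra bookkeeping of debug builds, the macro boils down to something like this (a sketch, not a verbatim copy of the header):

    /* Simplified sketch of Py_DECREF: decrement the reference count and,
       when it reaches zero, deallocate the object (which ultimately goes
       through the type's tp_dealloc slot). */
    #define Py_DECREF(op)                                \
        do {                                             \
            if (--((PyObject *)(op))->ob_refcnt == 0)    \
                _Py_Dealloc((PyObject *)(op));           \
        } while (0)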

It remains to deal with ob_type , a pointer to the object's type , the central concept of the Python object model (keep in mind that in Python 3 a type and a class are essentially the same thing; for historical reasons, which term is used depends on context). Every object has exactly one type, and it does not change during the object's lifetime (the type can change under extremely rare circumstances; there is no public API for it, and you would hardly be reading this article if you were working with type-changing objects). Perhaps most importantly, the type of an object (and only the type of an object) determines what can be done with it (see the example right after this paragraph). As you remember from the first part, when performing subtraction, the same function is called ( PyNumber_Subtract ) regardless of the types of the operands: for two integers, for an integer and a float, or even for complete absurdities such as subtracting an exception from a dictionary.

    # the type, and only the type, determines what can be done with an object
    >>> class Foo(object):
    ...     "I don't have __call__, so I can't be called"
    ...
    >>> class Bar(object):
    ...     __call__ = lambda *a, **kw: 42
    ...
    >>> foo = Foo()
    >>> bar = Bar()
    >>> foo()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'Foo' object is not callable
    >>> bar()
    42
    # what if we give the instance a __call__?
    >>> foo.__call__ = lambda *a, **kw: 42
    >>> foo()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'Foo' object is not callable
    # and what if we give the class Foo a __call__?
    >>> Foo.__call__ = lambda *a, **kw: 42
    >>> foo()
    42
    >>>

At first this seems strange: how can a single function support any kind of object passed to it? It receives what is effectively an opaque pointer (in fact a PyObject * , which says just as little about the data behind it), so how does it decide what to do with its argument? The answer lies in the object's type. A type is itself an object (it has a reference count and a type of its own; the type of most types is type ), but besides the two basic fields it contains many others. The structure is defined and documented in ./Include/object.h : PyTypeObject ; I recommend keeping it open while reading this article. Many fields of a type object are called slots and point to functions (or to structures holding related functions) that are executed when the corresponding Python C-API function is called on an object of that type. So although it looks as if PyNumber_Subtract works with operands of any type, in reality the operands' types are dereferenced and the subtraction function specific to that type is called. Thus the C-API functions are not truly universal: they rely on the types and abstract away the details, which merely makes it seem as if they work with any data (raising a TypeError is also part of the job).
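To make "slots" more tangible, here is a heavily abbreviated sketch of that structure (the field names are real, but I show only a handful of them and elide the rest; consult ./Include/object.h for the full picture):

    /* Heavily abbreviated sketch of PyTypeObject: every tp_* field (and the
       fields inside the tp_as_* sub-structures) is a "slot". */
    typedef struct _typeobject {
        PyObject_VAR_HEAD
        const char *tp_name;                  /* e.g. "int", "dict", "Foo" */
        Py_ssize_t tp_basicsize, tp_itemsize; /* how big instances are */

        destructor tp_dealloc;                /* run when ob_refcnt hits zero */
        reprfunc tp_repr;                     /* repr(obj) */

        PyNumberMethods *tp_as_number;        /* nb_subtract, nb_negative, ... */
        PySequenceMethods *tp_as_sequence;    /* sq_length, sq_item, ... */
        PyMappingMethods *tp_as_mapping;      /* mp_subscript, ... */

        hashfunc tp_hash;                     /* hash(obj) */
        ternaryfunc tp_call;                  /* obj(...) */
        getattrofunc tp_getattro;             /* obj.attr */
        setattrofunc tp_setattro;             /* obj.attr = value */
        /* ... dozens of further slots omitted ... */
    } PyTypeObject;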

Let's look at the details. PyNumber_Subtract calls the generic two-argument function ./Objects/abstract.c : binary_op , telling it to work with the nb_subtract slot (there are similar slots for other operations, for example nb_negative for numeric negation or sq_length for getting the length of a sequence). binary_op is an error-checking wrapper around binary_op1 , the function that does all the work. ./Objects/abstract.c : binary_op1 (reading its code is an eye-opener) receives the operands of BINARY_SUBTRACT as v and w and tries to dereference v->ob_type->tp_as_number , a structure containing pointers to the functions that implement the numeric protocol. binary_op1 expects to find in tp_as_number->nb_subtract a C function that either performs the subtraction or returns the special value Py_NotImplemented to signal that the operands are incompatible as minuend and subtrahend (which will ultimately raise a TypeError ).
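In a much-simplified form, the logic looks roughly like the sketch below; the real binary_op1 receives the slot offset as a parameter and also consults the right-hand operand's slot, giving priority to subclasses, a subject for a later post.

    /* Greatly simplified sketch of binary_op1, hard-wired to subtraction.
       The real function also tries w's slot and handles subclass priority. */
    static PyObject *
    binary_op1_sketch(PyObject *v, PyObject *w)
    {
        binaryfunc slot = NULL;

        if (v->ob_type->tp_as_number != NULL)
            slot = v->ob_type->tp_as_number->nb_subtract;

        if (slot != NULL) {
            PyObject *result = slot(v, w);   /* the type-specific subtraction */
            if (result != Py_NotImplemented)
                return result;
            Py_DECREF(result);               /* these operands don't mix */
        }
        /* no slot could handle it; the caller will turn this into TypeError */
        Py_INCREF(Py_NotImplemented);
        return Py_NotImplemented;
    }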

If you want to change the behaviour of objects, you can write a C extension that defines a PyTypeObject structure and fills in the slots however you like (a sketch of such a type follows after this paragraph). When we create new types in Python ( >>> class Foo(list): pass creates a new type ; classes and types are one and the same), we do not describe any structures by hand and do not fill in any slots. Why, then, do these types behave just like the built-in ones? Because of inheritance, in which types play a significant role. Python ships with some built-in types, such as list and dict . As mentioned, these types have particular functions filling the corresponding slots, which gives their objects the desired behaviour: a mutable sequence of values, say, or a mapping of keys to values. When you create a new type in Python, a new C structure is dynamically allocated on the heap (like any other object) and its slots are filled from those of the inherited, base , type (what about multiple inheritance, you ask? in later episodes). Since the slots are copied, the newly minted subtype and its base have almost identical functionality. Python also has a base type with no real functionality of its own, object ( PyBaseObject_Type in C), in which almost all slots are NULL and which can be extended without inheriting any behaviour.
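To make the "fill in the slots as you like" part concrete, here is a minimal hypothetical extension type that mimics the Bar class from the snippet above: its tp_call slot points to a C function that returns 42. This is only a sketch; a real extension would also need the usual module-initialisation boilerplate and a call to PyType_Ready.

    #include <Python.h>

    /* The function we will place in the tp_call slot:
       calling any CBar instance simply returns 42. */
    static PyObject *
    cbar_call(PyObject *self, PyObject *args, PyObject *kwargs)
    {
        return PyLong_FromLong(42);
    }

    /* A minimal static type. Every slot left as 0/NULL either gets a
       sensible default or is simply "not supported" by this type. */
    static PyTypeObject CBar_Type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        .tp_name = "cbar.CBar",
        .tp_basicsize = sizeof(PyObject),
        .tp_flags = Py_TPFLAGS_DEFAULT,
        .tp_call = cbar_call,               /* instances become callable */
        .tp_new = PyType_GenericNew,        /* so that CBar() works at all */
    };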

So you cannot create a type from scratch in Python: you always inherit from something (if you define a class without inheriting explicitly, it implicitly inherits from object ; in Python 2.x the same definition would create a "classic" class , which we will not consider). Naturally, you are not limited to what you inherit: you can change the behaviour of a type created in pure Python, as shown in the snippet above. By defining the special method __call__ on the class Bar , we made instances of that class callable.

Something, somewhere, during the creation of our class, notices this __call__ method and wires it into the tp_call slot. ./Objects/typeobject.c : type_new is the complex, important function where all of this happens. We will take a closer look at it in the next post; for now, let's pay attention to a line almost at the very end, after the new type has already been created but before it is returned: fixup_slot_dispatchers(type); . This function walks over all the specially named methods defined in the new type and, based on their names, wires them into the appropriate slots of the type structure (where exactly these methods are stored is a separate question).
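From memory, its body is surprisingly small (so take the exact form with a grain of salt): it walks a static table, slotdefs, that maps special method names such as __call__ or __sub__ to slot offsets such as tp_call or nb_subtract, and lets update_one_slot do the actual wiring.

    /* Rough sketch of fixup_slot_dispatchers: for every entry in the
       slotdefs table ("__call__" -> tp_call, "__sub__" -> nb_subtract, ...),
       look the name up on the freshly created type and wire the slot. */
    static void
    fixup_slot_dispatchers(PyTypeObject *type)
    {
        slotdef *p;

        init_slotdefs();                   /* prepare/sort the name->slot table */
        for (p = slotdefs; p->name; )
            p = update_one_slot(type, p);  /* returns the next entry to process */
    }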

One more puzzling point: how does defining a __call__ method on a type after its creation make instances of that type callable, even instances created before the method was defined? Easy and simple, my friends. As you remember, a type is an object, and the type of a type is type (if that breaks your head, try: >>> class Foo(list): pass ; type(Foo) ). So when we do something with a class (we could say "type" instead of "class", but since we use "type" in another sense here, let's call our type a class for a while), for example call it, subtract from it or set an attribute on it, the ob_type field of the class object is dereferenced, and it turns out that the type of the class is type . To set the attribute, the slot type->tp_setattro is then used. In other words, classes have their own attribute-setting function, and that particular function (if you want to befriend it, here is its home page: ./Objects/typeobject.c : type_setattro ) calls the same function ( update_one_slot ) that fixup_slot_dispatchers uses, fixing things up after the new attribute is set. Another piece of the puzzle falls into place!
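In spirit (again a sketch rather than the verbatim source), type_setattro amounts to the generic attribute assignment followed by a slot fix-up for the name that just changed, via the helper update_slot, which eventually reaches update_one_slot:

    /* Sketch of type_setattro: setting an attribute on a *class* is just the
       generic attribute assignment, plus re-wiring any slot whose special
       name (e.g. "__call__") matches the attribute that was just set. */
    static int
    type_setattro_sketch(PyTypeObject *type, PyObject *name, PyObject *value)
    {
        int res;

        if (!(type->tp_flags & Py_TPFLAGS_HEAPTYPE)) {
            /* built-in (static) types cannot be modified from Python code */
            PyErr_Format(PyExc_TypeError,
                         "can't set attributes of built-in/extension type '%s'",
                         type->tp_name);
            return -1;
        }
        res = PyObject_GenericSetAttr((PyObject *)type, name, value);
        if (res == 0)
            res = update_slot(type, name);  /* ends up in update_one_slot */
        return res;
    }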

That is probably a good place to finish this introduction to Python's objects. I hope you enjoyed the trip and that you are still with me. I must admit that writing this post turned out to be much harder than I expected (and without the late-night help of Antoine Pitrou and Mark Dickinson on #python-dev I would most likely have given up!). There is still plenty of interesting material ahead: which operand's slot is used in binary operations? What happens with multiple inheritance , and what about all the dreadful little details that come with it? What about metaclasses ? And __slots__ and weak references ? What goes on inside the built-in objects? How do dictionaries , lists , sets and their brethren work? And finally, what about this little miracle?

    >>> a = object()
    >>> class C(object): pass
    ...
    >>> b = C()
    >>> a.foo = 5
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'object' object has no attribute 'foo'
    >>> b.foo = 5
    >>>

How come you can just add an arbitrary attribute to b , an instance of class C which inherits from object , but you cannot do the same to a , an instance of object itself? Those in the know will say: b has a __dict__ and a does not. True. But where did this new (and far from trivial!) functionality come from, if we did not inherit it?

Ha! Questions like this make me extremely happy! There will be answers, but in the next episode.





Have a nice sleep.




Source: https://habr.com/ru/post/189986/

