
Python inside. Objects. Tail

1. Introduction
2. Objects. Head
3. Objects. Tail
4. Process structures

In the previous part we began studying Python's object system: we worked out what exactly counts as an object and how objects do their work. Here we continue that discussion.

Welcome to the third part of our series on Python internals (I strongly recommend reading the second part first if you have not done so yet; otherwise you will not understand anything). In this episode we will talk about an important concept that we still have not gotten around to: attributes. If you have ever written anything in Python, you have used them. An object's attributes are other objects associated with it that are accessible through the . (dot) operator, for example: >>> my_object.attribute_name . Let us briefly describe how Python behaves when attributes are accessed. This behavior depends on the type of the object whose attribute is being accessed (have you noticed yet that this holds for every operation on objects?).

A type can define special methods that customize access to the attributes of its instances. These methods are described here (as we already know, they are wired to the appropriate slots of the type by the fixup_slot_dispatchers function when the type is created ... you did read the previous post, right?). These methods can do anything they like: whether you write your type in C or in Python, you can write methods that store and fetch attributes from some outlandish storage; if you wish, you can transmit attributes by radio to and from the ISS, or even keep them in a relational database. Under more or less ordinary conditions, though, these methods simply write the attribute as a key-value pair (attribute name / attribute value) into the object's dictionary when the attribute is set, and return it from that dictionary when it is requested (or raise AttributeError if the dictionary has no key matching the requested name). It is all so simple and beautiful; thank you for your attention, I guess we can wrap up here.
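To make the "outlandish storage" idea concrete, here is a minimal sketch in Python. The class name ExternalStorage and its _vault dictionary are invented for this illustration; a real implementation could just as well talk to a database. It only shows that the special methods decide where attributes live.

```python
class ExternalStorage:
    """Keeps instance attributes outside the instance's own dictionary."""

    # Hypothetical storage: maps id(instance) -> {attribute name: value}.
    _vault = {}

    def __setattr__(self, name, value):
        # Called for every `obj.name = value`.
        type(self)._vault.setdefault(id(self), {})[name] = value

    def __getattr__(self, name):
        # Called only after the ordinary lookup has failed.
        try:
            return type(self)._vault[id(self)][name]
        except KeyError:
            raise AttributeError(name) from None

    def __delattr__(self, name):
        # Called for every `del obj.name`.
        try:
            del type(self)._vault[id(self)][name]
        except KeyError:
            raise AttributeError(name) from None


obj = ExternalStorage()
obj.answer = 42
# The value is reachable as an attribute, yet the instance dictionary is empty.
stored_in_instance_dict = "answer" in vars(obj)
```

Note that the sketch never cleans up the vault when an instance dies; that is fine for a demonstration, but not something to copy as is.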

Stop! My friends, the you-know-what has only just begun its rapid approach to the fan. If we are going down, we are going down together. I suggest we explore what happens in the interpreter together and, as we usually do, ask some annoying questions.

Read the code carefully, or skip straight to the textual description:

    >>> print(object.__dict__)
    {'__ne__': <slot wrapper '__ne__' of 'object' objects>, ... , '__ge__': <slot wrapper '__ge__' of 'object' objects>}
    >>> object.__ne__ is object.__dict__['__ne__']
    True
    >>> o = object()
    >>> o.__class__
    <class 'object'>
    >>> o.a = 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'object' object has no attribute 'a'
    >>> o.__dict__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'object' object has no attribute '__dict__'
    >>> class C:
    ...     A = 1
    ...
    >>> C.__dict__['A']
    1
    >>> C.A
    1
    >>> o2 = C()
    >>> o2.a = 1
    >>> o2.__dict__
    {'a': 1}
    >>> o2.__dict__['a2'] = 2
    >>> o2.a2
    2
    >>> C.__dict__['A2'] = 2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'dict_proxy' object does not support item assignment
    >>> C.A2 = 2
    >>> C.__dict__['A2'] is C.A2
    True
    >>> type(C.__dict__) is type(o2.__dict__)
    False
    >>> type(C.__dict__)
    <class 'dict_proxy'>
    >>> type(o2.__dict__)
    <class 'dict'>

Let's translate this into human language. object (the simplest built-in type, in case you forgot) has a dictionary, as we can see, and everything we can reach through its attributes is identical to what we see in object.__dict__ . It should surprise us that instances of object (for example, o ) do not support setting additional attributes and have no __dict__ at all, yet do support access to existing attributes (try o.__class__ , o.__hash__ and so on; these do return something). Next we created a new class C , derived from object , added the attribute A and saw that it is accessible both as C.A and as C.__dict__['A'] , as expected. Then we created an instance o2 of class C and saw that setting an attribute changes __dict__ and, conversely, changing __dict__ affects the attributes. After that we were surprised to learn that the class's __dict__ is read-only, even though setting attributes ( C.A2 ) works fine. Finally, we saw that the __dict__ objects of the instance and of the class have different types: the familiar dict and the mysterious dict_proxy , respectively. And if all that were not enough, recall the puzzle from the previous part: if instances of plain object (for example, o ) have no __dict__ , and C extends object without adding anything significant, why do instances of C ( o2 ) suddenly have a __dict__ ?
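The same observations can be checked with a short script instead of an interactive session. This is only a sketch: the helper names can_set_attribute and can_write_class_dict are mine, and in current Python 3 releases the read-only wrapper is exposed as types.MappingProxyType (it prints as mappingproxy), whereas the session above shows the older name dict_proxy.

```python
import types


class C:
    A = 1


def can_set_attribute(obj):
    """Return True if `obj.a = 1` succeeds, False if it raises AttributeError."""
    try:
        obj.a = 1
    except AttributeError:
        return False
    return True


def can_write_class_dict(cls):
    """Return True if item assignment on cls.__dict__ succeeds."""
    try:
        cls.__dict__["A2"] = 2
    except TypeError:
        return False
    return True


o = object()
o2 = C()

plain_object_accepts_attributes = can_set_attribute(o)   # instances of object have no __dict__
instance_accepts_attributes = can_set_attribute(o2)      # instances of C do
class_dict_is_writable = can_write_class_dict(C)         # the proxy is read-only

# Setting the attribute through the class itself is allowed.
C.A2 = 2
class_dict_type = type(C.__dict__)
instance_dict_type = type(o2.__dict__)
```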

Yes, curiouser and curiouser! But do not worry, everything in its own time. First, let's look at how a type's __dict__ is implemented. If you look at the definition of PyTypeObject (highly recommended reading!), you will see the tp_dict slot, ready to hold a pointer to a dictionary. Every type must have this slot. The dictionary is put there by a call to ./Objects/typeobject.c : PyType_Ready , which happens either when the interpreter is initialized (remember Py_Initialize ? It calls _Py_ReadyTypes , which calls PyType_Ready for every known type) or when the user creates a new type dynamically ( type_new calls PyType_Ready on each newborn type before returning it). In fact, every name you bind in a class statement ends up in the new type's __dict__ (see the line type->tp_dict = dict = PyDict_Copy(dict); in ./Objects/typeobject.c : type_new ). Do not forget that types are objects too, so they also have a type, type , whose slots hold functions that provide attribute access as appropriate. These functions use the dictionary that every type has, and that tp_dict points to, to store and look up attributes. So accessing the attributes of a type really means accessing the private dictionary of an instance of type , which the type's structure points to.

    class Foo:
        bar = "baz"

    print(Foo.bar)

In this example, the last line accesses an attribute of a type. To find the attribute bar , the attribute access function of Foo 's type, that is, of type (the one its tp_getattro slot points to), is called. Roughly the same thing happens when attributes are set or deleted (to the interpreter, by the way, "deletion" is just setting the value to NULL ). I hope everything has been clear so far; that covers attribute access on types.
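As a rough illustration, the three spellings below reach the same object. The sketch assumes that type.__getattribute__ is the Python-level face of the tp_getattro slot of type , which is how slot wrappers are normally exposed.

```python
class Foo:
    bar = "baz"


via_dot = Foo.bar                                        # the usual spelling
via_dict = Foo.__dict__["bar"]                           # straight from the type's dictionary
via_type_slot = type(Foo).__getattribute__(Foo, "bar")   # the getter of Foo's type, called by hand

# Setting and deleting go through the type of Foo as well.
Foo.bar = "qux"
value_after_set = Foo.__dict__["bar"]
del Foo.bar
still_present = "bar" in Foo.__dict__
```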

Before we move on to attribute access on instances, let me introduce a little-known (but very important!) concept: the descriptor. Descriptors play a special role in accessing instance attributes, so I need to explain what they are. An object is considered a descriptor if one or both of its type's slots tp_descr_get and tp_descr_set are filled with non-NULL values. These slots are tied to the special methods __get__ , __set__ and __delete__ (for example, if you define a class with a __get__ method, which gets bound to the tp_descr_get slot, and create an object of that class, that object is a descriptor). Finally, an object is considered a data descriptor if its type's tp_descr_set slot is filled with a non-NULL value. As we will see, descriptors play an important role in attribute access; I will give some explanations and links to the relevant documentation.
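The definition can be approximated from Python by checking which special methods the object's type provides. The helper classify below is my own name, and checking for methods with hasattr is only a stand-in for inspecting the C-level slots, so treat it as a sketch.

```python
def classify(obj):
    """Roughly classify obj by the descriptor methods found on its type."""
    tp = type(obj)
    has_get = hasattr(tp, "__get__")
    # __set__ and __delete__ both correspond to the tp_descr_set slot.
    has_set = hasattr(tp, "__set__") or hasattr(tp, "__delete__")
    if has_set:
        return "data descriptor"
    if has_get:
        return "non-data descriptor"
    return "not a descriptor"


class OnlyGet:
    def __get__(self, instance, owner):
        return "got it"


class GetAndSet(OnlyGet):
    def __set__(self, instance, value):
        raise AttributeError("read-only")


def plain_function():
    pass


results = {
    "OnlyGet()": classify(OnlyGet()),
    "GetAndSet()": classify(GetAndSet()),
    "plain_function": classify(plain_function),   # functions define __get__ only
    "property()": classify(property()),           # properties define __get__ and __set__
    "42": classify(42),
}
```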

So, we have figured out what descriptors are and how the attributes of types are accessed. But most objects are not types: their type is not type but something more prosaic, such as int , dict or a user-defined class. They all rely on generic attribute access functions that are either defined in the type or inherited from the type's parent when the type was created (we discussed this topic, slot inheritance, in "Head"). The algorithm of the generic attribute getter ( PyObject_GenericGetAttr ) looks like this:

  1. Search the dictionary of the instance's type and the dictionaries of all the type's parents. If a data descriptor is found, call its tp_descr_get function and return the result. If anything else is found, remember it just in case (say, under the name X ).
  2. Search the object's own dictionary and return the result if it is found.
  3. If nothing was found in the object's dictionary, check X , if it was set; if X is a descriptor, call its tp_descr_get function and return the result. If X is an ordinary object, return it.
  4. Finally, if nothing was found, raise AttributeError .
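The four steps can be sketched in pure Python. This is an approximation for ordinary instances, not a transcription of the C function: generic_getattr , _MISSING and the Point class are names made up for the example, and special methods are detected with hasattr rather than by looking at slots.

```python
_MISSING = object()


def generic_getattr(obj, name):
    """Pure-Python sketch of the lookup order described above."""
    tp = type(obj)

    # Step 1: search the type and its parents, in resolution order.
    found = _MISSING
    for klass in tp.__mro__:
        if name in vars(klass):
            found = vars(klass)[name]
            break
    if found is not _MISSING:
        found_type = type(found)
        is_data = hasattr(found_type, "__set__") or hasattr(found_type, "__delete__")
        if is_data and hasattr(found_type, "__get__"):
            return found_type.__get__(found, obj, tp)

    # Step 2: search the instance dictionary, if the object has one.
    try:
        instance_dict = object.__getattribute__(obj, "__dict__")
    except AttributeError:
        instance_dict = {}
    if name in instance_dict:
        return instance_dict[name]

    # Step 3: fall back to what was found on the type (this is X).
    if found is not _MISSING:
        if hasattr(type(found), "__get__"):
            return type(found).__get__(found, obj, tp)
        return found

    # Step 4: nothing anywhere.
    raise AttributeError(name)


class Point:
    dims = 2

    def __init__(self, x):
        self.x = x

    @property
    def double(self):
        return self.x * 2

    def norm(self):
        return abs(self.x)


p = Point(-3)
# Try to shadow the property through the instance dictionary.
p.__dict__["double"] = "shadow attempt"
```

With these definitions, generic_getattr(p, "double") goes through the property (step 1), generic_getattr(p, "x") comes from the instance dictionary (step 2), and generic_getattr(p, "norm") yields a bound method (step 3).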

Now we understand that descriptors can execute code when they are accessed as attributes (that is, when you write foo = o.a or o.a = foo , a gets to run code). This is powerful functionality, used to implement some of Python's "magic" features. Data descriptors are more powerful still, because they take precedence over instance attributes (if you have an object o of class C , class C has a data descriptor foo , and o has an instance attribute foo , then evaluating o.foo returns whatever the descriptor returns). Read up on the "what" and the "how" of descriptors. I especially recommend the first link ("what"): although it may look daunting at first, after a careful and thoughtful reading you will find it much simpler and shorter than my rambling. Raymond Hettinger's excellent article on descriptors in Python 2.x is also worth reading; apart from the removal of unbound methods, it is still relevant for 3.x and recommended. Descriptors are a very important concept, and I advise you to spend some time on the listed resources to understand them and absorb the idea. For brevity I will not go into more detail here, but I will give a (very simple) example of their behavior in the interpreter:
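The precedence rule is easy to verify with two toy descriptors. The class names DataDesc and NonDataDesc are invented for this sketch; the instance dictionary is filled directly so that the descriptors' own __set__ does not get in the way.

```python
class DataDesc:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner):
        return "from non-data descriptor"


class C:
    foo = DataDesc()
    bar = NonDataDesc()


o = C()
# Put same-named entries straight into the instance dictionary.
o.__dict__["foo"] = "from instance dict"
o.__dict__["bar"] = "from instance dict"

foo_result = o.foo   # the data descriptor wins over the instance dictionary
bar_result = o.bar   # the instance dictionary wins over the non-data descriptor
```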

    >>> class ShoutingInteger(int):
    ...     # __get__ fills the tp_descr_get slot
    ...     def __get__(self, instance, owner):
    ...         print('I was gotten from %s (instance of %s)'
    ...               % (instance, owner))
    ...         return self
    ...
    >>> class Foo:
    ...     Shouting42 = ShoutingInteger(42)
    ...
    >>> foo = Foo()
    >>> 100 - foo.Shouting42
    I was gotten from <__main__.Foo object at 0xb7583c8c> (instance of <class '__main__.Foo'>)
    58
    >>> # Remember: descriptors are only looked up on the type, not on the instance!
    >>> foo.Silent666 = ShoutingInteger(666)
    >>> 100 - foo.Silent666
    -566
    >>>

Note that we have just gained a complete understanding of object-oriented inheritance in Python. Since attribute lookup starts with the object's type and then proceeds through all of its parents, we understand that accessing attribute A of object O of class C1 , which inherits from C2 , which in turn inherits from C3 , can return A from O , C1 , C2 or C3 , as determined by the method resolution order, which is described well here . This way of resolving attributes, together with slot inheritance, is enough to explain most of Python's inheritance functionality (although the devil, as usual, is in the details).
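A small sketch of that chain, using the same names as the paragraph above plus a second attribute B added for contrast:

```python
class C3:
    A = "A from C3"
    B = "B from C3"


class C2(C3):
    B = "B from C2"


class C1(C2):
    pass


O = C1()

# The order in which the type dictionaries are searched.
mro_names = [klass.__name__ for klass in C1.__mro__]

a_before = O.A          # not in C1 or C2, so it comes from C3
b_before = O.B          # C2 is searched before C3

O.A = "A from O"        # a plain attribute stored on the instance
a_after = O.A           # now the instance dictionary answers first
```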

We have learned a lot today, but it is still unclear where the references to objects' dictionaries are stored. We have already seen the definition of PyObject , and it definitely contains no pointer to such a dictionary. If not there, then where? The answer is rather unexpected. If you look closely at PyTypeObject (a worthwhile pastime! Read it daily!), you will notice a field called tp_dictoffset . This field gives the byte offset within the C structures allocated for the type's instances; at that offset lives a pointer to an ordinary Python dictionary. Under normal conditions, when a new type is created, the size of the memory block needed for its instances is computed, and this size is larger than that of a bare PyObject . The extra space is typically used (among other things) to store the pointer to the dictionary (all of this happens in ./Objects/typeobject.c : type_new ; start reading from the line may_add_dict = base->tp_dictoffset == 0; ). Using gdb , we can easily break into this space and look at the object's private dictionary:

    >>> class C: pass
    ...
    >>> o = C()
    >>> o.foo = 'bar'
    >>> o
    <__main__.C object at 0x846b06c>
    >>> # break into GDB
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0012d422 in __kernel_vsyscall ()
    (gdb) p ((PyObject *)(0x846b06c))->ob_type->tp_dictoffset
    $1 = 16
    (gdb) p *((PyObject **)(((char *)0x846b06c)+16))
    $3 = {u'foo': u'bar'}
    (gdb)

We created a new class and an object of it, set an attribute on the object ( o.foo = 'bar' ), entered gdb , dereferenced the object's type ( C ) and found its tp_dictoffset (16), and then checked what sits at that offset in the object's C structure. Unsurprisingly, we found the object's dictionary with a single key, foo , mapped to the value bar . Naturally, if you check tp_dictoffset for a type whose instances have no __dict__ , such as object , you will find zero there. Gives you goosebumps, huh?
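If you do not have gdb at hand, part of this can be observed from Python itself, because type objects expose the slot as the read-only attribute __dictoffset__ . The sketch below deliberately checks only zero versus non-zero: the concrete value for an ordinary class depends on the interpreter version and build (16 in the session above), and the Slotted class is my own addition for comparison.

```python
class C:
    pass


class Slotted:
    # An empty __slots__ asks the interpreter not to reserve room for a dictionary.
    __slots__ = ()


# Map each type's name to its tp_dictoffset as seen from Python.
offsets = {tp.__name__: tp.__dictoffset__ for tp in (object, int, C, Slotted)}

# Instances have a __dict__ exactly when their type reserves room for one.
has_dict = {
    "object": hasattr(object(), "__dict__"),
    "C": hasattr(C(), "__dict__"),
    "Slotted": hasattr(Slotted(), "__dict__"),
}
```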

The fact that type dictionaries and instance dictionaries look alike while being implemented quite differently can be confusing. A few mysteries remain. Let's recap and see what we have missed: we define an empty class C inheriting from object and create an object o of that class; extra memory is allocated for the dictionary pointer at offset tp_dictoffset (the space for the pointer is reserved from the very start, but the dictionary itself is allocated only on first access of any kind; sneaky ...). Then we evaluate o.__dict__ in the interpreter; it is compiled to bytecode with a LOAD_ATTR instruction, which calls PyObject_GetAttr , which dereferences the type of o and finds the tp_getattro slot, which kicks off the standard attribute lookup described above and implemented in PyObject_GenericGetAttr . So, after all that, what is it that returns our object's dictionary? We know where the dictionary is stored, but you can see that it has no __dict__ key of its own, so we have a chicken-and-egg problem: what hands us the dictionary when we access __dict__ , if it is not in the dictionary itself?

The answer is something that takes precedence over the object's dictionary: a descriptor. Look:

    >>> class C: pass
    ...
    >>> o = C()
    >>> o.__dict__
    {}
    >>> C.__dict__['__dict__']
    <attribute '__dict__' of 'C' objects>
    >>> type(C.__dict__['__dict__'])
    <class 'getset_descriptor'>
    >>> C.__dict__['__dict__'].__get__(o, C)
    {}
    >>> C.__dict__['__dict__'].__get__(o, C) is o.__dict__
    True
    >>>

Wow! We can see there is something called getset_descriptor (see ./Objects/typeobject.c ), a group of functions that implements the descriptor protocol and that has to live in the type's __dict__ . This descriptor intercepts every attempt to access o.__dict__ on objects of this type and returns whatever it pleases; in our case, that is the dictionary pointer found at offset tp_dictoffset in o . This also explains why we saw dict_proxy a little earlier. If tp_dict is a pointer to a plain dictionary, why do we see it wrapped in an object that cannot be written to? That is the doing of the __dict__ descriptor of the type's own type, type .

    >>> type(C)
    <class 'type'>
    >>> type(C).__dict__['__dict__']
    <attribute '__dict__' of 'type' objects>
    >>> type(C).__dict__['__dict__'].__get__(C, type)
    <dict_proxy object at 0xb767e494>

This descriptor is a function that wraps the dictionary in a simple object that mimics the behavior of an ordinary dictionary, except that it is read-only. Why is it so important to keep users from meddling with a type's __dict__ ? Because a type's namespace may contain special methods such as __sub__ . When we create a type with special methods, or set them on a type through attributes, the update_one_slot function is called, which wires these methods to the type's slots, as happened with the subtraction operation in the previous post. If we could add such methods directly to the type's __dict__ , they would not be wired to slots, and we would end up with a type that looks like what we want (for example, it has __sub__ in its dictionary) but behaves differently.
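The supported path is easy to watch from Python: assigning a special method through the class attribute makes the operator work on instances that already exist, while writing to the proxy is refused. The Money class and the try_subtract helper are invented for this sketch, which only shows the visible effect, not the slot update itself.

```python
class Money:
    def __init__(self, amount):
        self.amount = amount


def try_subtract(x, y):
    """Return (x - y).amount, or None if subtraction is not supported."""
    try:
        return (x - y).amount
    except TypeError:
        return None


a, b = Money(5), Money(3)
before = try_subtract(a, b)      # no __sub__ yet, so the operator fails

# Writing into the class dictionary directly is refused by the read-only proxy.
try:
    Money.__dict__["__sub__"] = lambda self, other: Money(self.amount - other.amount)
    direct_write_allowed = True
except TypeError:
    direct_write_allowed = False

# Setting the attribute on the class goes through type's attribute setter,
# which stores the method and keeps the type's slots in sync.
Money.__sub__ = lambda self, other: Money(self.amount - other.amount)
after = try_subtract(a, b)
```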

We crossed the 2000-word mark long ago, the point beyond which a reader's attention fades rapidly, and I still have not talked about __slots__ . How about some self-study, daredevils? You have everything you need to tackle them on your own! Read the documentation at the link given, play with __slots__ in the interpreter a little, look through the sources and explore them with gdb . Enjoy. In the next installment, I think we will leave objects for a while and talk about interpreter state and thread state. I hope it will be interesting. But even if it is not, you still need to know it. What I can say for certain is that girls are terribly fond of guys who know their way around such matters.

And you know what? Not just girls. We like such guys too. Come join us; it is more fun together.

Source: https://habr.com/ru/post/190336/

