Procedure for resolving methods in Python

This note discusses the MRO C3 algorithm and some specific problems of multiple inheritance. Although the algorithm and problems are not limited to the framework of a single language, I focused my attention on Python. At the end is a list of useful links on this topic.

The method resolution order allows Python to figure out from which ancestor class the method should be called if it is not found directly in the descendant class. If each descendant has only one ancestor, then the task is trivial. There is an upward search across the entire hierarchy. If multiple inheritance is used, you may encounter specific problems, which are described below.

In the old versions of Python, the order of resolution of the methods was quite primitive: the search was carried out in all parent classes from left to right to the maximum depth. Those. if the parent class in turn did not have the necessary method, but there were parents, then the search was performed in them according to the same rules. For example, let's take the structure:

 AC
 |  |
 Bd
  \ /
    E

When referring to an instance method of a class E, such an algorithm would search sequentially in classes E, B, A, D and C. Thus, the search was conducted first in the first parent class and in all its ancestors, then in the second parent class with all the ancestors etc. This method did not cause any special complaints, until the classes had a common ancestor. However, starting with version 2.2, the base class object has appeared, from which it was recommended to inherit all user classes. Why it was introduced - a topic for a separate article. Perhaps the most significant factor is the separation of the object model and the meta-data model. Starting with version 3.0, the old classes are no longer supported, and all user-defined classes are derived from the object class by default.
')
This gave rise to the “diamond diagram” problem.

    object
    / \
   AB
    \ /
      C

If we have classes A and B, from which class C is inherited, then when searching for a method using the old algorithm (C, A, object, B), it turns out that if a method is not defined in classes C and A, it will be extracted from the object, even if it is defined in B. This creates certain inconveniences, since object has many magic methods defined, such as __init__, __str__, etc. But even if object is replaced with a certain user-defined class D, then the problem will remain - the less specific method of the ancestor class can work instead of the more specific method of the descendant class.

So, we have the class itself, a list of all its ancestors and the connections between them. From this data, we need to build an ordered list of classes in which the search method will be performed from left to right. Such a list is called class linearization. For simplicity, take the structure:

 object
    | 
    A
    |
    B

Then the linearization for class B will be the list [B, A, object]. Those. When calling B (). The something () method will first be searched in class B. If it is not found there, the search will continue in class A. If it is not there, the search will end in the object class. After going through all the classes from the linearization and not finding the desired method, Python will throw out the Attribute Error error.

To solve the diamond-shaped structure, linearization should be monotonous. This means that if in the linearization of a certain class C, class A follows class B (it has the form [C, ..., B, ..., A]), then for any of its descendant D, class B will follow A in its linearization ( it will be [D, ..., C, ..., B, ..., A]). As you can see, the old order of resolution of methods is not monotonous, since in the case of a diamond-shaped structure for class A, linearization is [A, object], for class B - [B, object], but for class C - [C, A, object, B], which violates the monotony property with respect to class B.

To satisfy the property of monotonicity, in this case, two linearizations are appropriate: [C, A, B, object] and [C, B, A, object]. Obviously, both of them do not violate monotony either with respect to class A (since object follows A in both cases) or with respect to class B (since object follows B in both cases). So which one to choose? In this case, the most intuitive way is to look at the definition of class C. If a class is declared as C (A, B), then it is reasonable to take the first linearization, since B follows A. In this case, if the class is declared as C (B, A ), it will be better to take the second linearization, in which A follows B.

Such a choice is determined by the local precedence order (local precedence ordering). The property of the order of local seniority requires respect for the parent classes in the linearization of the descendant class of the same order as when it was declared. For example, if a class is declared as D (A, B, C), then in linearization D, class A should stand before B, and class B should stand for C.

Let's sum up the intermediate results:

A class is a linearization of a list from the class itself and all its ancestors (parents and parents) in which, in order from left to right, the method will be searched.
Method Resolution Order (MRO) is the way a class is linearized.
monotony is a property that requires that the descendant class of the parent classes be followed in linearization of the same class as in the parent class.
the order of local precedence is a property that requires that the descendant class follow the same order of parent classes in linearization as in its declaration.

We want our linearization to observe both monotony and the order of local seniority.

The algorithm for ordering the resolution of methods C3.

To achieve the above goals in Python, the C3 algorithm is used. It is quite simple, however, for a better understanding, we introduce the following conventions:
[C1, C2, ... CN] - a list of the elements C1, C2, ... CN. Accordingly, [] is a list of one element C.
L [C] - linearization of class C. It is important to remember that any linearization is a list by definition.
merge (L [C1], L [C2], ..., L [CN]) - combining the elements of linearization L [C1], L [C2], ..., L [CN] into the list using some algorithm. In fact, this algorithm should order all classes from L [C1], L [C2], ..., L [CN] and eliminate duplication of classes in the final list.

Algorithm C3 is a set of the following rules:

Linearization of class C is a singleton list of the class C itself, plus the union of the linearizations of its parents and the list of all its parents. In the legend, this can be written as L [C] = [C] + merge (L [C1], L [C2], ..., L [CN], [C1, C2, ..., CN]) if the class C was declared class C (C1, C2, ..., CN). It should be noted that each linearization of L [CX] begins with the class CX, which was additionally assigned to the end of the merge list as an immediate parent of class C. Why this is done will become clear further.
Combining linearization is as follows:
1. Take the zero element from the first linearization (L [C1] [0]).
2. This element is sought in all other linearizations (from L [C2] to L [CN]).
3. If this element is found somewhere outside the beginning of the list (L [CK] [X] == L [C1] [0], X! = 0; in essence, this means that L [C1] [0] is someone else ancestor), the algorithm proceeds to the first step, taking as the zero element from the following linearization (L [C2] [0]).
4. If an element is not found anywhere in a position other than zero, then it is added to the end of the final linearization list and removed from all lists under consideration (from L [C1] to L [CN]; the same class cannot be found in the final list twice) . If after deleting an item there are empty lists, they are excluded from the union. After this, the algorithm is repeated from the very beginning (from the new element L [C1] [0]), if it exists. If it is not - the union is over.
5. If the algorithm has reached the last element of L [CN], but none of the zero elements satisfies the rules, then linearization is not possible.

To make the algorithm clearer, let’s analyze a few examples. First of all, let's see what happens with our diamond shape now. For her, we get:
L [C] = [C] + merge (L [A], L [B], [A, B])
L [A] = [A] + merge (L [object], [object])
L [B] = [B] + merge (L [object], [object])
L [object] = [object] (degenerate case)

Of greatest interest is the process of unification, so we analyze it in more detail. In cases of L [A] and L [B], the expression merge (L [object], [object]) = merge ([object], [object]) is expanded trivially. Since both the first and second lists consist of one object element, according to item 4 of the merge rule, the result will be [object].

Now let's look at L [C] = [C] + merge (L [A], L [B], [A, B]) = [C] + merge ([A, object], [B, object], [A , B]). We will write down our union according to the rules C3:

Take the zero element of the first list - A.
Let's look for it in all the other lists, i.e. in [B, object] and [A, B].
Since class A is not found in the list [B, object], and in the list [A, B] is the zero element, we add it to the final linearization list L = [] + [A] = [A]. After this, class A needs to be removed from all lists in the union. We get L [C] = [C] + [A] + merge ([object], [B, object], [B]).
Again, take the zero element of the first list, object.
Let's look for it in all other lists and find out that object is the first (not zero) element of the list [B, object]. As mentioned above, in essence, this means that the class object is an ancestor of class B. So, first, it makes sense to look for a more specific method in class B. Therefore, the algorithm justly returns to item 1.
Again, take the zero element, but this time from the second list, i.e. B from the list [B, object].
Let's look for it in other lists and find it only in the third list [B] in the zero position, which suits us perfectly. Therefore, we add it to the final list, and then remove it from all the lists in the union. We get L = [A] + [B] = [A, B] and accordingly L [C] = [C] + [A, B] + merge ([object], [object]).
The resulting merge ([object], [object]) union has already been considered above. As a result, we obtain L [C] = [C] + [A, B] + [object] = [C] + [A, B, object] = [C, A, B, object]. The list [A, B, object] is the result of our merge (L [A], L [B], [A, B]) union.

I hope that the algorithm has become clearer. However, it is not yet clear why, according to the rules, it is necessary to add a list of all parents to the end of the union. Particularly perspicacious could already notice that if we changed the linearization of L [A] and L [B] in the union, i.e. would write merge (L [B], L [A], [A, B]), then the list of parents specified in the same order as when initializing the class descendant class C (A, B) would not allow class B be searched before A. And that's true. The list of parents is necessary in order not to allow violating the rule of local seniority. But let's look at an example class C (D, E):

 object
   |
   D
   |  \
   |  E
   |  /
   C

We write the linearization of L [C]:
L [C] = [C] + merge (L [D], L [E], [D, E])
L [E] = [E] + merge (L [D], [D])
L [D] = [D] + merge (L [object], [object]) = [D, object]

Perform substitution and get:
L [E] = [E] + merge ([D, object], [D]) = [E, D, object]
L [C] = [C] + merge ([D, object], [E, D, object], [D, E])

Now look what we got. The merge ([D, object], [E, D, object], [D, E]) merge cannot be completed, because in the [D, object] and [D, E] lists the zero element is D, which is the first list item [E, D, object]. Conversely, E, which is the zero element of [E, D, object], is also the first element of [D, E]. Thus, after 3 iterations, the algorithm will arrive at point 5, after which Python throws out a TypeError error: local precedence: L [C] = [C] + merge ([D, object], [E, D, object]) = [C] + [E] + merge ([D, object], [D, object] ) = [C] + [E, D] + [object] = [C, E, D, object]. With such a linearization of the class C, the search would be conducted first in the class E, and then in the class D, although the declaration recorded C (D, E).

Solving this problem is not difficult. It is enough to rewrite the ad on class C (E, D). In this case, we get:
L [C] = [C] + merge ([E, D, object], [D, object], [E, D]) = [C] + [E] + merge ([D, object], [D] , object], [D]) = [C] + [E, D, object] = [C, E, D, object].
In fact, nothing has changed, except that in the declaration of a class, the order of enumeration of parents turned out to be the same as in the linearization of the class, i.e. the order of local seniority is respected. It should be noted that although Python hints at the order in which it would be more logical to indicate parents, it will not prevent you from seeking adventure if you declare your own MRO through the metaclass. But more about that near the end.

Python calculates linearization once during class creation. If you want to get it for self-checking or for some other purpose, use the class property __mro__ (for example, C .__ mro__). In order to consolidate the material we will analyze examples of pozakovyrister. The object class is intentionally thrown away so as not to clutter the linearization. As is known from the above, with single inheritance, the classes are simply lined up from the descendant to the ancestors, so the object will always be at the end of any chain. And further. I am not a cook or a musician, so examples are just examples. It is not necessary to focus on semantic inaccuracies in them.

        Music
       / \
    Rock Gothic ------
    / \ / \
 Metal Gothic Rock \
   |  |  \
    \ ------------------ Gothic Metal
                 |  /
                 The 69 eyes

class Music (object): pass
class Rock (Music): pass
class Gothic (Music): pass
class Metal (Rock): pass
class GothicRock (Rock, Gothic): pass
class GothicMetal (Metal, Gothic): pass
class The69Eyes (GothicRock, GothicMetal): pass

L [The69Eyes] = [The69Eyes] + merge (L [GothicRock], L [GothicMetal], [GothicRock, GothicMetal])
L [GothicRock] = [GothicRock] + merge (L [Rock], L [Gothic], [Rock, Gothic])
L [GothicMetal] = [GothicMetal] + merge (L [Metal], L [Gothic], [Metal, Gothic])
L [Rock] = [Rock, Music]
L [Gothic] = [Gothic, Music]
L [Metal] = [Metal] + [Rock, Music] = [Metal, Rock, Music]

After substitutions we get:
L [GothicRock] = [GothicRock] + merge ([Rock, Music], [Gothic, Music], [Rock, Gothic]) = [GothicRock, Rock, Gothic, Music]
L [GothicMetal] = [GothicMetal] + merge ([Metal, Rock, Music], [Gothic, Music], [Metal, Gothic]) = [GothicMetal] + [Metal, Rock, Gothic, Music] = [GothicMetal, Metal , Rock, Gothic, Music]
L [The69Eyes] = [The69Eyes] + merge ([GothicRock, Rock, Gothic, Music], [GothicMetal, Metal, Rock, Gothic, Music], [GothicRock, GothicMetal])
= [The69Eyes] + [GothicRock, GothicMetal] + merge ([Rock, Gothic, Music], [Metal, Rock, Gothic, Music])
= [The69Eyes] + [GothicRock, GothicMetal, Metal] + merge ([Rock, Gothic, Music], [Rock, Gothic, Music])
= [The69Eyes, GothicRock, GothicMetal, Metal, Rock, Gothic, Music]

       Food -------
      / \ \
     Meat Milk Flour
     |  \ \ /  
 Rabbit Pork Pasty
       \ |  /
          Pie

class Food (object): pass
class Meat (Food): pass
class Milk (Food): pass
class Flour (Food): pass
class Rabbit (Meat): pass
class Pork (Meat): pass
class Pasty (Milk, Flour): pass
class Pie (Rabbit, Pork, Pasty): pass

L [Pie] = [Pie] + merge (L [Rabbit], L [Pork], L [Pasty], [Rabbit, Pork, Pasty])
L [Rabbit] = [Rabbit] + merge (L [Meat], [Meat])
L [Pork] = [Pork] + merge (L [Meat], [Meat])
L [Pasty] = [Pasty] + merge (L [Milk], L [Flour], [Milk, Flour])
L [Meat] = [Meat] + merge (L [Food], [Food]) = [Meat, Food]
L [Milk] = [Milk] + merge (L [Food], [Food]) = [Milk, Food]
L [Flour] = [Flour] + merge (L [Food], [Food]) = [Flour, Food]

After substitutions we get:
L [Rabbit] = [Rabbit, Meat, Food]
L [Pork] = [Pork, Meat, Food]
L [Pasty] = [Pasty] + merge ([Milk, Food], [Flour, Food], [Milk, Flour]) = [Pasty] + [Milk, Flour, Food] = [Pasty, Milk, Flour, Food ]
L [Pie] = [Pie] + merge ([Rabbit, Meat, Food], [Pork, Meat, Food], [Pasty, Milk, Flour, Food], [Rabbit, Pork, Pasty])
= [Pie] + [Rabbit] + merge ([Meat, Food], [Pork, Meat, Food], [Pasty, Milk, Flour, Food], [Pork, Pasty])
= [Pie] + [Rabbit, Pork] + merge ([Meat, Food], [Meat, Food], [Pasty, Milk, Flour, Food], [Pasty])
= [Pie] + [Rabbit, Pork, Meat] + merge ([Food], [Food], [Pasty, Milk, Flour, Food], [Pasty])
= [Pie] + [Rabbit, Pork, Meat, Pasty] + merge ([Food], [Food], [Milk, Flour, Food])
= [Pie] + [Rabbit, Pork, Meat, Pasty, Milk, Flour, Food]
= [Pie, Rabbit, Pork, Meat, Pasty, Milk, Flour, Food]

How to turn to ancestors.

Multiple inheritance causes another specific problem. In fact, a direct search for a method in the parent classes is only a fraction of the benefits that can be derived from it. As in the case of single inheritance, you can often make your life easier by implementing a method in the descendant that, in addition to some actions, will call the same method of the parent. For example, quite often you can find this:

class B (A):
def __init__ (self):
# something
A. __init__ (self)

However, for the case of multiple inheritance, this approach is not suitable. And for some reason:

class C (B, A):
def __init__ (self):
# something
B. __init__ (self)
A. __init__ (self)

First, we explicitly refer to the parent classes (in fact, in the example with a single inheritance, the same thing). If we want to replace someone from the ancestors with another class or remove them altogether, then we will have to change all the functions that were applied to it. This is fraught with bugs if we miss something. But this is still half the trouble. Secondly, we know nothing about classes A and B. Perhaps they have common ancestors, to whom they refer in a similar way:

class A (P1, P2):
def __init__ (self):
# something
P1. __init__ (self)
P2. __init__ (self)

class B (P1, P2):
def __init__ (self):
# something
P1. __init__ (self)
P2. __init__ (self)

If this is so, then it turns out that the initialization of the common ancestors will work two times. It is not right. To avoid this in Python there is a class super. In version 3.0, he is finally humanized and can be addressed as follows:

class C (B, A):
def __init__ (self):
# something
super (). __init__ () # 3.0 super(C, self)

Please note that in versions 2.x, the first argument is to specify the class itself, and not any of its parents. In fact, an object of class super remembers the arguments passed to it at the time of initialization and when calling any method (super () .__ init __ (self) in the example above) goes through the linearization list of the second argument class (self .__ class __.__ mro__), trying to call this method take turns for all classes following the class in the first argument (class C), passing the first argument (self) as a parameter. Those. for our case:
self .__ class __.__ mro__ = [C, B, A, P1, P2, ...]
super (C, self) .__ init __ () => B .__ init __ (self)
super (B, self) .__ init __ () => A .__ init __ (self)
super (A, self) .__ init __ () => P1 .__ init __ (self)
As you can see, from method B .__ init__, when using super, method A .__ init__ will be called, although class A is in no way associated with it and is not its ancestor. In this case, if necessary, the methods of all ancestors will be tested once in a chain.

Thanks to this approach, you can, for example, introduce the allergen method for all components for the considered example of a pie, which will successively go through the chain of ancestors to make a list of all the allergen components to warn customers. It will be convenient instead of rewriting the “not recommended” list for each product each time simply to inherit it from the components. Then the list will be generated automatically. Similarly, it will be possible to offer the user a choice of drinks for each product, or to twist the user's weight preferences on a handwritten Internet radio station, going from a specific group to genres with a certain attenuation coefficient.

An example of a pie might look like this (for version 2.x):

class Food (object):
def drink(self):
return [ 'Water' , 'Cola' ]
def allergen(self):
return []

class Meat (Food):
def drink(self):
return [ 'Red wine' ] + super (Meat, self).drink()

class Milk (Food):
def allergen(self):
return [ 'Milk-protein' ] + super (Milk, self).allergen()

class Flour (Food): pass

class Rabbit (Meat):
def drink(self):
return [ 'Novello wine' ] + super (Rabbit, self).drink()

class Pork (Meat):
def drink(self):
return [ 'Sovinion wine' ] + super (Pork, self).drink()
def allergen(self):
return [ 'Pork-protein' ] + super (Pork, self).allergen()

class Pasty (Milk, Flour): pass

class Pie (Rabbit, Pork, Pasty):
def drink(self):
return [ 'Mineral water' ] + super (Pie, self).drink()

if __name__ == "__main__" :
pie = Pie()

print 'List of allergens: '
for allergen in pie.allergen(): print ' - ' + allergen

print 'List of recommended drinks: '
for drink in pie.drink(): print ' - ' + drink

As a result, we will see the following:
List of allergens:
- Pork-protein
- Milk-protein
List of recommended drinks:
- Mineral water
- Novello wine
- Sovinion wine
- Red wine
- Water
- Cola

Based on this list, you can understand how to manage the list of linearization. As you can see, none of the parental methods were called twice, otherwise the list of allergens or recommendations would show repetitions. Also note that in the most senior class Food both methods are defined - allergen and drink. The call to super () does not perform any checks, so if we try to access a nonexistent method, we get an error like AttributeError: 'super' object has no attribute 'allergen'.

When linearization can not be drawn.

We have already analyzed the case when it was impossible to compose a linearization using the C3 algorithm. However, then the problem was solved by changing the places of the ancestor classes in the declaration of the descendant class. There are other cases in which the compilation of linearization is not possible:

class C (A, B): pass
class D (B, A): pass
class E (C, D): pass

L [E] = [E] + merge (L [C], L [D], [C, D]) = [E] + merge ([C, A, B], [D, B, A], [C, D])
= [E] + [C, D] + merge ([A, B], [B, A])

As you see, the conflict turned out to be insoluble, since in declaration C the class A stands before B, and in declaration D it is the other way around. Well, from this place you have to go and revise the structure. But if you really need to quickly podhachit or you just know what you are doing, Python will not limit you. You can define your own linearization through metaclasses. To do this, it is enough to specify the method mro (cls) in the metaclass - in fact, override the method of the basic metaclass type - which returns the linearization you need.

class MetaMRO ( type ):
def mro(cls):
return (cls, A, B, C, D object)

Further class declaration differs for version 3.0 and 2.x:

class E (C, D): __metaclass__ = MetaMRO # 2.x
class E (C, D, metaclass = MetaMRO): pass # 3.0

After that, E .__ mro__ = [E, A, B, C, D, object]. Notice that if you take responsibility for the MRO Python does not conduct any additional checks and you can safely search in ancestors earlier than in descendants. And although this is not a desirable practice, it is possible.

Useful links:
Unifying types and classes in Python 2.2 - about the separation of the object model and the meta-data model. It also discusses the MRO, the problems of multiple inheritance and super ().
The Python 2.3 Method Resolution Order is a C3 algorithm with examples. In the end, there is an implementation of mro and merge functions on pure Python for those who perceive code better than text.
A Monotonic Superclass Linearization for Dylan - a comparison of some types of linearization.

Afterword.

Of course, it is impossible to cover all related topics, and perhaps someone has more questions than answers. For example, what are metaclasses, base classes type and object, and the like. If there is interest, then, over time, I could try to make out such topics:

introspection in python: interactive help, dir, sys, etc.
type and object: most likely as a free and abbreviated translation of this one
Python magic: __str__, __init__, __new__, __slots__, etc. In fact, there is quite a lot of it, so there may be several parts.
metaclasses

PS: Thanks to all those who helped me post this topic and personally Goodrone .

Source: https://habr.com/ru/post/62203/

All Articles

Procedure for resolving methods in Python

The algorithm for ordering the resolution of methods C3.

How to turn to ancestors.

When linearization can not be drawn.

Afterword.

More articles: