
The Python I would like to see

Everyone knows that I am not a fan of Python 3 or of the direction the language is taking. Over the past few months I have received many emails asking about my vision for Python's development, so I decided to share my thoughts with the community and, where possible, give future language developers some food for thought.

Let's state it plainly: Python is not a perfect programming language. In my opinion, its main problems stem from peculiarities of the interpreter and have little to do with the language itself; however, all these interpreter quirks gradually become part of the language, and that is why they matter so much.

I want to start our conversation with one interpreter oddity (slots) and end with the biggest mistake in the language's architecture. In essence, this series of posts is a study of the decisions built into the interpreter's architecture and of their influence on both the interpreter and the language. I believe that, from the point of view of overall language design, such articles are far more interesting than simply listing ideas for improving Python.

Language and implementation

I added this section after writing the rest of the article. In my opinion, some developers overlook how tightly Python the language and CPython the interpreter are interconnected, and believe that they are independent of each other. Yes, there is a language specification, but in many cases it either simply describes what the interpreter does or leaves certain points unspecified.
With this approach, implicit implementation details of the interpreter directly affect the architecture of the language and even force other implementations of Python to adopt them. For example, PyPy knows nothing about slots (as far as I can tell), yet it has to behave as if slots were part of it.

Slots

In my opinion, one of the biggest problems of the language is the absurd slot system. I am not talking about the __slots__ construct; I mean the internal type slots for special methods. These slots are a "feature" of the language that most people lose sight of, because few ever have to deal with them directly. And yet the existence of slots is the single biggest problem of the Python language.

So what is a slot? A slot is a side effect of the internal implementation of the interpreter. Every Python programmer knows about "magic methods" such as __add__: these methods begin and end with two underscores, with the method name enclosed between them. Every developer knows that if we write a + b in code, the interpreter will call a.__add__(b).

Unfortunately, this is not true.

Python does not really work that way. Python is completely different inside (at least in the current version). Here is how the interpreter actually works:
  1. When a type is created, the interpreter walks all the descriptors of the class and looks for magic methods such as __add__.
  2. For each special method found, the interpreter places a reference to the descriptor in a specially designated slot on the type object; for example, the magic __add__ method is associated with two internal slots: tp_as_number->nb_add and tp_as_sequence->sq_concat.
  3. When the interpreter wants to execute a + b, it calls something like TYPE_OF(a)->tp_as_number->nb_add(a, b) (in reality it is more complicated, because __add__ actually maps to several slots).

The a + b operation, then, should be something like type(a).__add__(a, b), but as we saw from the slot handling, this is not entirely true. You can easily verify this yourself by overriding __getattribute__ on a metaclass and implementing your own __add__ method: you will notice that __getattribute__ is never called for the operator.
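Here is a minimal Python 3 sketch of that experiment (the class and method names are mine, for illustration only):

    class Meta(type):
        def __getattribute__(cls, name):
            print('metaclass lookup: %s' % name)
            return type.__getattribute__(cls, name)

    class A(object, metaclass=Meta):
        def __add__(self, other):
            return 42

    a = A()
    print(a + a)      # prints 42, with no 'metaclass lookup: __add__' line:
                      # the interpreter went straight to the nb_add slot
    print(A.__add__)  # explicit attribute access does go through the metaclass

The operator never triggers the metaclass hook; only explicit attribute access does.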

In my opinion, the slot system is simply absurd. It is an optimization that helps certain data types (integers, for example), but it makes no sense at all for other objects.

To demonstrate this, I wrote this deliberately pointless class (x.py):

    class A(object):
        def __add__(self, other):
            return 42

Since we overrode the __add__ method, the interpreter will place it in a slot. But how fast is it? When we perform the operation a + b, the slot system is used, and here is what the profiling shows:

    $ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
    1000000 loops, best of 3: 0.256 usec per loop

If we perform the operation a.__add__(b) instead, the slot system is not used; the interpreter consults the instance dictionary (where it finds nothing) and then the class dictionary, where it finds the method. Here are the measurements:

    $ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
    10000000 loops, best of 3: 0.158 usec per loop

Can you believe it? The path without slots turned out to be faster than the path with slots. Magic? I am not entirely sure of the reasons for this behavior, but it has been this way for a long, long time. In fact, old-style classes (which had no slots) were much faster than new-style classes and had more features.

More features, you ask? Yes, because old-style classes could do this (Python 2.7):

    >>> original = 42
    >>> class FooProxy:
    ...     def __getattr__(self, x):
    ...         return getattr(original, x)
    ...
    >>> proxy = FooProxy()
    >>> proxy
    42
    >>> 1 + proxy
    43
    >>> proxy + 1
    43

Today we have a more complex type system than in Python 2, yet fewer features: the code above cannot be reproduced with new-style classes (a demonstration follows after the size comparison below). It gets worse once you consider just how lightweight old-style classes were:

    >>> import sys
    >>> class OldStyleClass:
    ...     pass
    ...
    >>> class NewStyleClass(object):
    ...     pass
    ...
    >>> sys.getsizeof(OldStyleClass)
    104
    >>> sys.getsizeof(NewStyleClass)
    904
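As for the proxy: under new-style classes the trick fails, because implicit special-method lookup goes through the type's slots and never consults __getattr__. A quick Python 3 sketch (the class name is mine):

    class FooProxy(object):
        def __getattr__(self, name):
            return getattr(42, name)

    proxy = FooProxy()
    print(proxy.bit_length())  # 6: ordinary attribute access is forwarded

    try:
        print(1 + proxy)
    except TypeError as exc:
        # the + operator consults type(proxy)'s slots directly,
        # so __getattr__ is never asked for __radd__
        print(exc)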

Where did the slot system come from?

All this raises the question of where slots came from. As far as I can tell, they have simply always been there. When the Python interpreter was first written, built-in types such as strings were implemented as global static structures, which had to carry all the special methods an object might need. This was before __add__ as such even existed. If we look at the earliest available version of Python, from 1990, we can see how objects were implemented back then.

Here, for example, is how the integer type looked:

    static number_methods int_as_number = {
        intadd,     /*tp_add*/
        intsub,     /*tp_subtract*/
        intmul,     /*tp_multiply*/
        intdiv,     /*tp_divide*/
        intrem,     /*tp_remainder*/
        intpow,     /*tp_power*/
        intneg,     /*tp_negate*/
        intpos,     /*tp_plus*/
    };

    typeobject Inttype = {
        OB_HEAD_INIT(&Typetype)
        0,
        "int",
        sizeof(intobject),
        0,
        free,           /*tp_dealloc*/
        intprint,       /*tp_print*/
        0,              /*tp_getattr*/
        0,              /*tp_setattr*/
        intcompare,     /*tp_compare*/
        intrepr,        /*tp_repr*/
        &int_as_number, /*tp_as_number*/
        0,              /*tp_as_sequence*/
        0,              /*tp_as_mapping*/
    };

As you can see, the tp_as_number slot existed even in the very first version of Python. Unfortunately, some old versions of the interpreter were lost due to repository corruption, so let's look at a slightly later version to see how objects were implemented. This is how the add function looked in 1993:

    static object *
    add(v, w)
        object *v, *w;
    {
        if (v->ob_type->tp_as_sequence != NULL)
            return (*v->ob_type->tp_as_sequence->sq_concat)(v, w);
        else if (v->ob_type->tp_as_number != NULL) {
            object *x;
            if (coerce(&v, &w) != 0)
                return NULL;
            x = (*v->ob_type->tp_as_number->nb_add)(v, w);
            DECREF(v);
            DECREF(w);
            return x;
        }
        err_setstr(TypeError, "bad operand type(s) for +");
        return NULL;
    }

So when did __add__ and the other methods appear? As far as I can tell, in version 1.1. I managed to compile Python 1.1 on OS X 10.9:
    $ ./python -v
    Python 1.1 (Aug 16 2014)
    Copyright 1991-1994 Stichting Mathematisch Centrum, Amsterdam

Of course, this build is not stable and not everything works as it should, but it gives you an idea of the Python of those days. For example, there was a huge difference between objects implemented in C and objects implemented in Python:

    $ ./python test.py
    Traceback (innermost last):
      File "test.py", line 1, in ?
        print dir(1 + 1)
    TypeError: dir() argument must have __dict__ attribute


As we can see, there was no introspection for built-in types such as integers back then. In fact, __add__ was supported exclusively for user-defined classes:

    >>> (1).__add__(2)
    Traceback (innermost last):
      File "<stdin>", line 1, in ?
    TypeError: attribute-less object

This is the legacy we live with in Python today. The basic layout of Python's object architecture has not changed, but it has been subjected to many, many years of modifications, tweaks and refactorings.

Modern PyObject

Today, many will argue that the difference between the built-in types implemented in C and classes implemented in pure Python is negligible. In Python 2.7, the difference was still visible: the __repr__ of a type implemented in Python said class, while for a built-in type implemented in C it said type. That difference actually indicated how the type was allocated: statically (type) or dynamically on the heap (class). In practice it did not matter, and in Python 3 it disappeared entirely: special methods are placed into slots and vice versa. It would seem that no difference between Python and C classes remains.
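For reference, this is roughly what that looked like in a Python 2.7 session:

    >>> class Foo(object):
    ...     pass
    ...
    >>> Foo
    <class '__main__.Foo'>
    >>> dict
    <type 'dict'>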

However, the difference is still there, and very noticeable. Let's figure it out.

As you know, classes in Python are "open": you can look inside them, see what they store, and add or remove methods even after the class definition is complete. This flexibility is not provided for the interpreter's built-in classes. Why is that?

There is no technical limitation preventing a new method from being added to, say, the dict type. The reason the interpreter forbids it has little to do with protecting the developer's sanity; the point is that built-in types are not allocated on the heap. To appreciate the global consequences of this, you first need to understand how Python starts its interpreter.
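Before we get to startup, here is that closedness in action, in a short sketch (the class and attribute names are mine):

    class MyClass(object):
        pass

    MyClass.hello = lambda self: 'hello'   # fine: Python classes are open
    print(MyClass().hello())               # hello

    try:
        dict.hello = lambda self: 'hello'  # built-in types refuse
    except TypeError as exc:
        print(exc)  # can't set attributes of built-in/extension type 'dict'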

The devil in the interpreter

Starting the Python interpreter is a very expensive process. When you launch the executable, you kick off a complex machinery that does close to everything. Among other things, it initializes the built-in data types and the module import mechanism, imports some required modules, talks to the operating system to set up signal handling and command-line arguments, configures the interpreter's internal state, and so on. Only when all of that is done does the interpreter run your code and shut down. This is how Python has worked for 25 years now.

Here is what it looks like in pseudocode:

    /* called once */
    bootstrap()
    /* these could, in principle, be called multiple times */
    initialize()
    rv = run_code()
    finalize()
    /* called once */
    shutdown()

The problem is that the interpreter keeps a huge amount of global state, and effectively there is exactly one interpreter. A much better architecture would be to initialize an interpreter object and run it, something like this:

    interpreter *iptr = make_interpreter();
    interpreter_run_code(iptr);
    finalize_interpreter(iptr);

This is exactly how other dynamic languages work, such as Lua and JavaScript. The key point is that you can have two interpreters, and that is a new concept.

Who needs multiple interpreters anyway? You would be surprised, but even Python needs them, or at least would benefit from them. Existing examples include applications that embed Python, such as mod_python web applications: they definitely need to run in isolated environments. Yes, Python has subinterpreters, but they run inside the main interpreter, and only because so much of Python is tied to its internal state. The biggest piece of code dealing with that internal state is also the most controversial one: the global interpreter lock (GIL). Python operates on a single-interpreter model because there is a huge amount of data shared by all subinterpreters, and exclusive access to that data requires a lock, so the lock lives in the interpreter. What data are we talking about?

If you look at the code above, you will see those big structures declared as global variables. In fact, the interpreter exposes these very structures directly to Python code, using the OB_HEAD_INIT(&Typetype) macro to give them the required object header. That header is where, for example, the object's reference count lives.

Do you see where this is going? These structures are shared by all subinterpreters. Now imagine we could modify any of them from Python code: two completely independent Python programs that should have nothing in common could affect each other's state. Imagine if JavaScript code in a tab with Facebook could change the implementation of the built-in array object, and in a tab with Google those changes would take effect immediately.

This is an architectural decision from 1990 that still shapes the modern version of the language.

On the other hand, the immutability of built-in types has been generally well received by the Python developer community: the problems of mutable built-in types are well known from other programming languages, and, let's be honest, we do not lose much.

However, there is more.

What is a vtable?

So, in Python, the built-in (C-implemented) data types are practically immutable. How else do they differ? The other difference is the "openness" of Python classes. Methods of classes written in Python are "virtual": there is no "real" virtual method table as in C++; all methods live in the class dictionary, and a lookup algorithm selects the right one. The consequences are obvious: when you inherit from a class and override a method, there is a good chance another method will indirectly change its behavior, because it calls the overridden one.

A good example are collections with convenience methods. Python dictionaries have two methods for retrieving an item: __getitem__() and get(). When you implement such a class in Python, you usually implement one method through the other, for example by returning self.__getitem__(key) from get(key).
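A minimal sketch of that pattern (the class name is mine):

    class MyMapping(object):
        def __init__(self, data):
            self._data = dict(data)

        def __getitem__(self, key):
            return self._data[key]

        def get(self, key, default=None):
            # get() is defined in terms of __getitem__, so overriding
            # __getitem__ in a subclass changes get() automatically
            try:
                return self[key]
            except KeyError:
                return default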

For the types implemented inside the interpreter, things are different. The reason, again, is the difference between slots and dictionaries. Say you want to implement a dictionary in the interpreter, and one of your goals is to reuse existing code, so you want to call __getitem__ from get. How do you do it?

A Python method in C is just a function with a particular signature, and that is the first problem. The function's primary job is to process arguments coming from Python code and convert them into something usable at the C level; at the very least, it must unpack the call arguments from the Python tuple and dictionary (args and kwargs) into local variables. So the usual pattern is: dict__getitem__ merely parses the arguments and then calls dict_do_getitem with the actual parameters. See what happens? dict__getitem__ and dict_get both call dict_do_getitem, an internal static function, and there is nothing you can do about it.
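Translated into Python terms, the structure looks roughly like this (the names are mine, mirroring the dict_do_getitem pattern described above):

    class CStyleMapping(object):
        def __init__(self, data):
            self._data = dict(data)

        def _do_getitem(self, key):
            # analogue of the internal static C function
            return self._data[key]

        def __getitem__(self, key):
            return self._do_getitem(key)

        def get(self, key, default=None):
            try:
                # calls the helper directly, just as dict_get calls
                # dict_do_getitem: an overridden __getitem__ is bypassed
                return self._do_getitem(key)
            except KeyError:
                return default

    class Loud(CStyleMapping):
        def __getitem__(self, key):
            print('intercepted %s' % key)
            return CStyleMapping.__getitem__(self, key)

    m = Loud({'a': 1})
    m['a']      # prints 'intercepted a'
    m.get('a')  # silent: the override is never consulted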

There is no good way around this limitation, and the reason is the slot system. The interpreter has no sane way to route such calls through a vtable, and the culprit is the GIL. The dictionary talks to the "outside world" through an API of atomic operations, and that property would be lost entirely if calls went through a vtable. Why? Because such a call could land in Python-level code, where the GIL no longer provides the same guarantees, and that would immediately cause huge problems.
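Those guarantees are easy to observe from Python: an individual dictionary store is atomic because the whole C-level call happens under the GIL. A small demonstration (the numbers are arbitrary):

    import threading

    d = {}

    def writer(n):
        for i in range(10000):
            d[i % 100] = n  # each store is one atomic C-level call

    threads = [threading.Thread(target=writer, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(d))  # always 100: no torn or lost keys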

Imagine the fun of a dict subclass overriding the internal dict_get function with something that triggers a lazy import: all your guarantees go out the window. But then again, maybe we should have given them up long ago?

Conclusion

In recent years, there has been a clear trend toward making the Python language more complex. I would like to see the opposite.

I would like the interpreter's internal architecture to be built on independent subinterpreters with local base types, the way it works in JavaScript. This would open up tremendous possibilities for embedding and for multithreading based on message passing. Processors are not getting any faster.
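Today the closest approximation of that model from pure Python is message passing between processes, each of which carries its own fully isolated interpreter state. A rough sketch of the idea (the worker protocol is mine):

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        # runs in a separate process: a separate, isolated interpreter state
        for item in iter(inbox.get, None):  # None is the shutdown sentinel
            outbox.put(item * 2)

    if __name__ == '__main__':
        inbox, outbox = Queue(), Queue()
        p = Process(target=worker, args=(inbox, outbox))
        p.start()
        for i in range(3):
            inbox.put(i)
        print([outbox.get() for _ in range(3)])  # [0, 2, 4]
        inbox.put(None)
        p.join()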

Instead of slots plus dictionaries acting as a vtable, let's experiment with just dictionaries. Objective-C is a language built entirely on messaging, and that has proved decisive for its speed: message dispatch in Objective-C is in many ways faster than call handling in Python. Interned strings already make Python's string comparisons quick, and I am ready to argue that the proposed approach would be no worse; even if it slowed built-in types down a little, the result would be a much simpler architecture that is easier to optimize.
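On the interning point: two interned strings are the same object, so comparing them begins with a pointer comparison:

    >>> import sys
    >>> a = sys.intern('some_identifier')
    >>> b = sys.intern('some_identifier')
    >>> a is b
    True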

Take a look at the Python source code and see how much extra code the slot system requires; it is unbelievable! I am convinced it was a bad idea and we should have abandoned it long ago. Dropping slots would even benefit PyPy, whose authors, I am sure, have to jump through hoops to make their interpreter behave like CPython.


Source: https://habr.com/ru/post/234747/
