Hi, Habr! I present to your attention a translation of the article "Toward a Kernel Python" by Glyph Lefkowitz, creator of the Twisted framework.
The magic of minimizing the standard library
Influenced by Amber Brown's talk last month at the Python Language Summit (her May report "Batteries Included, But They're Leaking" - translator's note), Christian Heimes has continued his work on slimming down the standard Python library and created PEP 594, a proposal to remove obsolete and unmaintained parts of it.
The appearance of PEP 594 ("Removing dead batteries from the standard library") is great news for Pythonistas, especially those who maintain the standard library and will now have a smaller front to defend. A brief tour of the PEP's gallery of obsolete modules slated for removal speaks for itself (sunau, xdrlib, and chunk are my personal favorites). The standard Python library contains many useful modules, but it also includes a veritable necropolis of code, a towering monument of obsolete fragments that threatens to bury its maintainers at any moment.
However, I believe this PEP takes a mistaken approach. Today the standard library is maintained in tandem with CPython, by the CPython developers. Large pieces of it are kept around in the vague hope that someday they will benefit someone. You can see this principle at work in the PEP's defense of the colorsys module. Why not remove it? Answer: "This module is needed to convert CSS colors between coordinate systems (RGB, YIQ, HSL and HSV). [It] does not impose additional costs on core development."
There was a time when Internet access was limited, and preloading Python with a whole pile of stuff may have been a good idea, but nowadays the modules for converting colors between coordinate systems are one pip install command away.
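To give a sense of how small colorsys's job is, here is the kind of conversion the PEP defends, in its entirety (colorsys is the real stdlib module; the sample channel values are my own):

```python
import colorsys

# colorsys converts between RGB and the YIQ/HLS/HSV coordinate systems.
# All channels are floats in the range 0.0-1.0.
r, g, b = 0.2, 0.4, 0.4  # a muted teal, purely illustrative

h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(round(h, 2), round(s, 2), round(v, 2))  # 0.5 0.5 0.4

# The conversion round-trips back to the original RGB triple.
r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
print(round(r2, 1), round(g2, 1), round(b2, 1))  # 0.2 0.4 0.4
```

A module this small and this stable is exactly the kind of thing that could live happily on PyPI.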
Why haven't you reviewed my pull request?
So let's examine that claim: do tiny modules like colorsys really impose no "additional costs on core development"?
The core developers have their hands full just trying to maintain the huge, ancient C codebase that is CPython itself. As Mariatta Wijaya said in her talk at North Bay Python, the question core developers hear most often is: "Why haven't you looked at my pull request yet?" And the answer? It's easier to ignore those pull requests. That's what it means to be a core developer!
One might ask: doesn't Twisted have the same problem? Twisted is also a large collection of loosely coupled modules, a sort of standard library for networking. Aren't all those clients and servers for SSH, IMAP, HTTP, TLS, and so on an attempt to cram everything into one package?
I'm forced to answer: yes. Twisted is monolithic because it comes from the same historical period as CPython, when installing components was genuinely difficult. So I sympathize with CPython's position.
Ideally, each Twisted subproject should eventually become a separate project with its own repository, continuous integration (CI), website and, of course, its own more focused developers. We are slowly but surely splitting out projects wherever a natural boundary can be drawn. Some pieces that began life inside Twisted, such as constantly and incremental, have already been split out; deferred and filepath are in the process of separation. Other projects, such as klein and treq, already live separately. We will do more once we figure out how to reduce the cost of setting up and maintaining continuous integration and release infrastructure for each of them.
But is Twisted's monolithic nature the most pressing, or even a serious, problem for the project? Let's assess.
At the time of this writing, Twisted had 5 outstanding pull requests in its review queue. The average time a ticket spends waiting for review is roughly four and a half days. The oldest ticket in the queue is dated April 22, meaning less than two months have passed since the oldest unreviewed pull request was submitted.
It is always hard to find enough developers and time to respond to pull requests. Sometimes it feels like we still hear "Why haven't you reviewed my pull request?" too often. We don't always do it perfectly, but on the whole we cope; the queue ranges between 0 and 25 or so in the worst months.
And what about the CPython core, compared to these figures?
Going to GitHub, you can see that at the moment 429 pull requests are awaiting review. The oldest has been waiting since February 2, 2018, which is almost 500 days.
How many of these concern the interpreter, and how many the stdlib? Obviously the review backlog is a problem, but would removing the stdlib help?
For a quick and unscientific assessment, I looked through the first (oldest) page of pull requests. By my subjective count, of the 25 pull requests there, 14 concerned the standard library, 10 the core language or interpreter code, and one a minor documentation issue. Extrapolating from that proportion, I would venture that roughly half of the unreviewed pull requests involve standard library code.
So the first reason the CPython core team needs to stop maintaining the standard library is that they literally lack the physical capacity to maintain it. Or, put another way, they are not maintaining it, and all that remains is to acknowledge this and start dividing up the work.
It is true that none of CPython's open pull requests touch the colorsys module. Indeed, it imposes no costs on core development. It is core development that imposes costs on it. If I wanted to update colorsys to keep up with the times (perhaps give it a Color object, perhaps add support for integer color models), I would most likely have to wait 500 days or more.
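To make that kind of update concrete, a modernized colorsys might grow something like the following. This Color class and all its method names are purely my invention, sketching the "Color object with integer color models" idea; nothing like it exists in the stdlib:

```python
from dataclasses import dataclass
import colorsys


@dataclass(frozen=True)
class Color:
    """Hypothetical modern wrapper: floats internally, integers at the edges."""
    r: float
    g: float
    b: float

    @classmethod
    def from_ints(cls, r: int, g: int, b: int) -> "Color":
        # Integer color model support: accept the familiar 0-255 channels.
        return cls(r / 255, g / 255, b / 255)

    def to_hsv(self) -> tuple:
        return colorsys.rgb_to_hsv(self.r, self.g, self.b)

    def as_ints(self) -> tuple:
        return tuple(round(c * 255) for c in (self.r, self.g, self.b))


teal = Color.from_ints(51, 102, 102)
print(teal.as_ints())    # (51, 102, 102)
print(teal.to_hsv()[0])  # 0.5 (hue in the cyan region)
```

A change this small should not take 500 days to review; a standalone package on PyPI could ship it in an afternoon.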
As a result, changing code in the standard library is harder, which makes users less interested in contributing to it. CPython's infrequent releases also slow library development and reduce the value of user feedback. It is no coincidence that almost every standard library module has an actively maintained third-party alternative, and this is not the fault of the stdlib developers. The inevitable result of this process is the stagnation of all but the most frequently used stdlib modules.
New environments, new requirements
Perhaps even more importantly, tying CPython to the standard library puts CPython itself in a privileged position relative to other implementations of the language.
Podcast after podcast, talk after talk tell us that for Python's success and growth to continue, it needs to expand into new areas: the web frontend especially, as well as mobile clients, embedded systems and console games.
These environments each require one or both of the following:
- a completely different runtime (see Brython or MicroPython)
- a modified, stripped-down version of the standard library.
In all these cases, the stumbling block is determining which modules are missing from the standard library. They have to be discovered by trial and error; above all, the process is completely different from the standard way of declaring dependencies in a Python application. There is no install_requires declaration you can put in setup.py to report that your library uses a stdlib module which the target Python runtime may omit for space reasons.
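The "trial and error" process can be sketched in a few lines. The helper name stdlib_available is my own, but importlib.util.find_spec is the real stdlib call for probing whether a module can be located at all:

```python
import importlib.util


def stdlib_available(name: str) -> bool:
    """Probe whether the current runtime can locate a module at all."""
    return importlib.util.find_spec(name) is not None


# On a full CPython these are (or historically were) all present; a trimmed
# runtime such as MicroPython, or a distro's minimal Python package, may
# lack any of them -- and no setup.py metadata would have warned us.
for mod in ("colorsys", "sunau", "xdrlib", "chunk"):
    print(mod, stdlib_available(mod))
```

This is exactly the problem: the only way to learn what a given runtime ships is to ask it at runtime, module by module.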
The problem can arise even when everything in use is stock Python on a Linux installation. Server and desktop Linux distributions have the same need for a smaller core Python package, so the standard library is already trimmed, and fairly arbitrarily at that. That trimming may not match what your Python code requires, leading to errors where even pip install won't work.
Take it all away
"What about the suggestion to tidy up a little each day? Although it sounds convincing, don't let yourself be deceived. The reason you feel the tidying never ends is precisely that you tidy a little at a time. [...] The main secret of success is this: if you tidy in one fell swoop rather than gradually, you can change your mindset and your habits forever."
Marie Kondo, "The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing" (pp. 15-16)
While gradually shrinking the standard library is a step in the right direction, gradual change alone is not enough. As Marie Kondo says, if you really want to put things in order, the first step is to take everything out where you can see it, and then put back only what is necessary.
It is time to thank the modules that no longer spark joy and send them on their way.
We need a version of Python containing only the bare minimum, so that all implementations can be consistent with one another, and so that applications, even those running in web browsers or on microcontrollers, can simply state their requirements in requirements.txt.
In some enterprise environments a huge standard library seems attractive because adding dependencies to requirements.txt is a bureaucratic ordeal, but the "standard library" in such environments has purely arbitrary boundaries anyway.
It may still be a good idea to ship some CPython binary distributions (perhaps even official ones) with a wide selection of modules from PyPI. After all, even ordinary tasks require a certain minimum of stdlib just for pip to be able to install the other necessary modules.
We already have a situation like this today: pip is distributed along with Python, but it is not developed in the CPython repository. Part of what the default Python installer ships is developed in the CPython repository, and part arrives as a separate source archive (tarball) alongside the interpreter.
To use Linux, we need bootable media with a huge set of additional programs. But that does not mean the Linux kernel itself lives in one giant repository where hundreds of applications needed for a working Linux server are developed by a single team. The Linux kernel is extremely valuable, but the operating systems that use it are built from a combination of the Linux kernel and a wide range of separately developed libraries and programs.
Conclusion
The "batteries included" philosophy was perfect for the time of its creation; like a booster rocket, it carried Python to the programming public. But as the open-source ecosystem and Python packaging have matured, this strategy has become outdated, and, like any booster, we must let it fall back to earth so that it does not drag us down with it.
New Python runtimes, new deployment tasks, and new developer audiences all provide the Python community with tremendous opportunities to reach new heights.
But to get there, we need a new, more compact, unburdened Python core. We need to dump the entire standard library out on the floor, keeping only the smallest pieces we need, so that we can say: this is truly necessary, and that is merely nice to have.
I hope I have convinced at least some of you that we need a kernel Python.
And now: who wants to write the PEP?