
Thoughts on Python 3

I would like to present a retelling of a wonderful article by Armin Ronacher, the author of Jinja2, Werkzeug and Flask and a co-author of Sphinx and Pygments. I took great pleasure in studying the source code of his projects and learned a lot from it. Armin writes excellent frameworks, and he can explain like no one else what the transition from Python 2 to Python 3 involves and why it is not so easy to carry out.



Thoughts on Python 3


Lately I have often found myself thinking about the state Python 3 is in. Though it was not love at first sight, I fell in love with Python and was more than pleased with the course it was taking. Python has been part of my life for ten years, and so far that is a big part of my life.
I warn you in advance: this is a very personal article. I counted a hundred occurrences of one particular capital letter in this text.

This is because I am very grateful for all the opportunities I have had over the past two years: to travel the world, to talk to people and to share the cooperative spirit that allows free projects such as Python to encourage innovation and make people happy. Python has a wonderful community, something I often forget to say out loud.

On the other hand, even though I love Python and love to discuss approaches and solutions, I am not bound to the project by any obligations, despite my devotion to it. When I attend meetings about the language, I immediately understand why my suggestions are met with hostility and why I am seen as just another thorn in the side: "He constantly complains and does nothing." There are so many things I would like to see in Python, but in the end I am its user, not its developer.

When you read my comments about Python 3, now that its first version has already been released, you may get the impression that I hate it and do not want to switch to it at all. I do want to, just not in the form it is in now.

Given my experience of people referring to articles long after they were written, let me first clarify the situation with Python 3 at the time of writing: version 3.2 has been released, the next version is 3.3, and there are no plans to ever release Python 2.8. Moreover, there is a PEP in which it is written in black and white: there will be no such release. PyPy is developing splendidly, but it remains a project whose architecture is so distant from everything else that no one will take it seriously for a long time. In many ways, PyPy does things that “I wouldn’t do”, and that is exactly what seems amazing to me.

Why do we use Python?

Why do we use Python? It seems to me that this is a very good question that we rarely ask ourselves. Python has a lot of flaws, but I still use it. At the party on the last day of this year's PyCodeConf, I had a long discussion with Nick Coghlan. We were both tipsy, and thanks to that the discussion was very candid. We agreed that Python is not perfect as a language, that work continues on some of its flaws, and that, on closer inspection, some of them have no excuse. The PEP about “yield from” came up as an example of reworking a dubious design (coroutines as generators) to give it a more or less workable form. But even with the changes adopted in “yield from”, all of this is still very far from the convenience of greenlets.

This conversation continued what had been said in the talk “Biased Opinion on Programming Languages”, which Gary Bernhardt gave on the same memorable conference day. We agreed that Ruby's blocks are an amazing design, but that for many reasons they would not work in Python (in its current state).

Personally, I don’t think we use Python because it is a perfect and flawless language. Moreover, if you go back in time and look at earlier versions of Python, you will see that it was very, very ugly. It is not surprising that in its early years Python went largely unnoticed.

It seems to me that the reach Python has gained since then can be considered a great miracle. And this, I think, is why we use Python: the evolution of the language was very smooth, and the ideas it embodied were the right ones. Early Python was terrible: it had no concept of iterators, and to iterate over a dictionary you had to create an intermediate list of all its keys. At one point exceptions were strings, and string methods were not methods but functions from the module of the same name (string). The syntax for catching exceptions has tormented us in every incarnation of Python 2, and Unicode arrived too late and in part never arrived at all.

However, there were many good things in it. Even if the implementation was flawed, the idea of modules with their own namespaces was amazing. The structure of the language, based on multimethods, is still largely unmatched. Every day we benefit from this decision, although we are not aware of it. The language has always done its work honestly and never hid what is happening in the interpreter (tracebacks, stack frames, opcodes, code objects, ast and so on), which, together with its dynamic nature, lets the developer debug and solve problems with an ease unattainable in other languages.

The indentation-based syntax is often criticized, but seeing how many new languages take this approach (HAML, CoffeeScript and many others come to mind) shows that it has gained acceptance.

Even when I disagree with the way Raymond writes something new for the standard library, the quality of his modules is beyond any doubt, and that is one of the main reasons I use Python. I can't imagine working with Python without the collections or itertools modules.

But the real reason I loved and idolized Python was the anticipation of every new version, like an eager child waiting for Christmas. Small, barely noticeable improvements delighted me. Even the ability to specify the starting index for the enumerate function made me thankful for a new Python release. And all of this came while preserving backward compatibility.
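As a small illustration of the kind of improvement mentioned above, here is a minimal sketch of the starting index for enumerate (the `start` argument, available since Python 2.6). The sample list is made up for the example.

```python
# The sample data is invented purely for illustration.
lines = ["first", "second", "third"]

# Without `start`, numbering begins at 0; with it, counting can begin
# at 1 without any manual "index + 1" arithmetic.
numbered = list(enumerate(lines, start=1))

for number, text in numbered:
    print("%d: %s" % (number, text))
```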

Importing from __future__ is something we sometimes hate so much, and yet it is exactly what made upgrades easy and painless. I used to use PHP and was never happy about new releases. There were no namespaces in PHP, yet new built-in functions kept appearing, and with each release I sincerely hoped to avoid name collisions (I know I could have avoided them by using prefixes, but that was long before I learned the basics of software development).

What has changed?

How did it come about that I no longer look forward to new Python releases? I can only speak for myself, but I have noticed that others have changed their attitude toward new releases too.

I never questioned what the core developers were doing for the next Python 2.x release. Of course, some things were not so well thought out, such as the implementation of abstract base classes or the particulars of their semantics. But mostly it came down to criticism of high-level functionality.

With the advent of Python 3, external factors also appeared that suddenly forced me to change my whole approach to working with the language. For a long time I did not use new language features, even though I was glad to have them, because I mostly wrote libraries. It would have been a mistake to use the latest and greatest. The Werkzeug code is still full of hacks that let it run on Python 2.3, although the minimum requirement has since risen to version 2.5. I kept workarounds for standard library bugs in the code, because some vendors (the infamous Apple) never update the interpreter until a critical vulnerability is found in it.

None of this is possible with Python 3. With it, everything turns into development for either 2.x or 3.x, and no middle ground is provided.

After Python 3 was announced, Guido always spoke with admiration about 2to3 and how it would make porting easier. It turned out that 2to3 is the worst thing that could have happened to Python.

I had tremendous difficulties porting Jinja2 with 2to3, and I later regretted doing it. Moreover, in the JSON Jinja spin-off project I removed all the hacks written to make 2to3 work correctly, and I will never use it again. Like many others, I am now trying to keep the code working on both 2.x and 3.x. Why, you ask? Because 2to3 is very slow, it is very hard to integrate into the testing process, it depends on the version of Python 3 in use, and on top of that it cannot be customized except by black magic. It is a painful process that takes all the fun out of writing libraries. I loved tinkering with Jinja2, but I stopped the moment the Python 3 port was ready, because I was afraid of breaking something in it.

Now the idea of a shared codebase runs up against the fact that I have to support Python all the way down to version 2.5.

The changes brought by Python 3 have rendered all our code obsolete, yet they in no way justify rewriting and upgrading it right away. In my deeply subjective opinion, Python 3.3 / 3.4 should be more like Python 2, and Python 2.8 should be closer to Python 3. As it stands, Python 3 is the XHTML of the programming language world. It is incompatible with what it is trying to replace, and in return it offers practically nothing except being more “correct”.

A little about Unicode

Obviously, Unicode handling was the biggest change in Python 3. It may seem that forcing Unicode on everyone and everything is a good thing. And yet this is a view of the world through rose-colored glasses, because in the real world we deal not only with bytes and Unicode but also with strings in a known encoding. Worst of all, in many ways Python 3 has become the Fisher Price of the programming language world. Some features were removed because the core developers thought it was too easy to cut yourself on them. And all of this came at the cost of removing widely used functionality.

Here is a specific example: operations with codecs in 3.x are currently limited to Unicode <-> bytes conversions. No bytes <-> bytes and no Unicode <-> Unicode. This looks reasonable, but if you look closer you will see that the removed functionality is exactly what is vitally needed.

One of the most remarkable features of the codec system in Python 2 was that it was designed to work with a huge variety of encodings and algorithms. You could use a codec to encode and decode strings, and you could also ask the codec for an object that operates on streams and other incomplete data. What is more, the codec system worked equally well with content encodings and transfer encodings. You only had to write a new codec and register it, and every part of the system learned about it automatically.

Anyone who has written HTTP libraries in Python was happy to discover that codecs can be used not only for decoding UTF-8 (an actual character encoding) but also, for example, for gzip (a compression algorithm). This applies not only to strings but also to generators and file objects, provided of course that you know how to use them.

At the moment, in Python 3, none of this works. They not only removed these functions from the string object but also removed bytes -> bytes encoding, leaving nothing in return. If I am not mistaken, it took 3 years to recognize the problem and start a discussion about bringing this functionality back in 3.3.
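For readers who want to see what this kind of codec use looks like, here is a rough sketch against the `codecs` module as it exists in later Python 3 releases (an assumption: 3.4 or newer, where bytes-to-bytes and text-to-text transforms are reachable through `codecs.encode` and `codecs.decode`). It does not reproduce the Python 3.2 behavior described above, and the sample data is invented.

```python
import codecs

payload = b"hello hello hello hello"

# bytes -> bytes: a compression algorithm exposed through the codec registry.
compressed = codecs.encode(payload, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

# str -> str: a text transform served by the same registry.
scrambled = codecs.encode("Hello", "rot_13")

# The convenience methods on bytes and str accept text encodings only,
# so the same codec name is rejected when used through bytes.decode().
try:
    payload.decode("zlib_codec")
    method_rejected = False
except LookupError:
    method_rejected = True

# Codecs can also provide objects for incomplete data: an incremental
# decoder buffers a split multi-byte sequence until it is complete.
decoder = codecs.getincrementaldecoder("utf-8")()
parts = [decoder.decode(chunk) for chunk in (b"caf", b"\xc3", b"\xa9")]
```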

Furthermore, Unicode was pushed into places where it does not belong at all. These include the file system layer and the URL module. Also, a lot of the Unicode functionality was written from the point of view of a programmer living in the 70s.

UNIX file systems are based on bytes. That is how they are built, and nothing can be done about it. Naturally, it would be great to change this, but that is effectively impossible without breaking existing code, because changing the encoding is only a small part of what a Unicode-oriented file system needs. In addition, the questions of normalization forms and of how to preserve case information once normalization has been applied remain open. Had the bytestring type remained in Python 3, these problems could have been avoided. But it is gone, and its replacement, the bytes type, does not behave the way strings do. It behaves like a data type written to punish people who work with byte data that also exists as a string. It does not seem to have been designed as a tool with which programmers could solve these problems. And the problems are more than real.

So, if you work with the file system from Python 3, a strange feeling will stay with you despite the new surrogate-escape encoding. It is a painful process, painful because there are no tools for cleaning up this mess. Python 3 seems to say to you, “Buddy, from now on your file system is Unicode”, but it does not explain where to start untangling the mess. It does not even clarify whether the file system actually supports Unicode or whether Python is faking this support. It does not disclose details about normalization or how to compare file names.
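The two mechanisms touched on here can be sketched in a few lines. This is only an illustration under stated assumptions: the file name is invented, and the comparison by NFC normalization is one possible policy, not something Python applies to file names on its own.

```python
import unicodedata

# A file name stored as Latin-1 bytes; as UTF-8 it is invalid.
raw_name = b"caf\xe9.txt"

# The "surrogateescape" error handler smuggles the undecodable byte
# through as a lone surrogate so it can be written back unchanged.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")

# Two spellings of the same visible name: precomposed and decomposed.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# They differ as code point sequences and only compare equal once both
# are brought to the same normalization form (NFC is chosen here).
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
```

Note that `as_text` is not a clean Unicode string: it contains a lone surrogate, so passing it to code that encodes strictly will raise an error.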

It works in the lab but breaks down in the field. It so happens that my Mac has an American keyboard layout, an American locale and almost everything else American, except that dates and numbers are formatted differently. As a result of all this (and, I assume, because I have been upgrading my Mac since Tiger), I ended up in the following situation: when logging into my remote server, I got a locale set to the string value “POSIX”. What is POSIX, you ask? Heaven knows. So Python, being as much in the dark as I was, decided to go with "ANSI_X3.4_1968". On that memorable day I learned that ASCII has many names, and that this is just one of them. And so my remote Python interpreter garbled the listing of a directory containing internationalized file names. How did they get there? I had put Wikipedia articles there under their original names. I did this with Python 3.1, which kept silent about what was happening to the files instead of raising exceptions or resorting to some hack.

But the file system problems are only the beginning. Python also uses environment variables (which, as you know, are full of garbage) to set the default file encoding. During the conference, I asked a couple of attendees to guess the default encoding used for text files in Python 3. More than 90% of this small sample were sure that it was UTF-8. But no! It is set according to the platform locale. As I told you, greetings from the 70s.
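A quick way to see what the interpreter would pick on a given machine, and to sidestep the guess altogether, is sketched below. The temporary file and the sample text are made up for the example; the point is only that passing `encoding=` explicitly removes the dependence on the environment.

```python
import codecs
import locale
import os
import tempfile

# What the platform locale suggests for text files on this machine.
default_encoding = locale.getpreferredencoding(False)
print("locale suggests:", default_encoding)

# The odd-looking name from the story above is just an alias for ASCII.
ascii_alias = codecs.lookup("ANSI_X3.4_1968").name

# Naming the encoding explicitly gives the same result regardless of
# locale settings or environment variables.
text = "na\u00efve caf\u00e9"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)
with open(path, "r", encoding="utf-8") as handle:
    read_back = handle.read()
```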

For fun, I logged in to both servers I administer and found that one of them had a latin1 encoding when logging in via the console, which switches to latin15 when logging in via ssh as root, and UTF-8 if I log in as my own user. Terribly entertaining, but I have only myself to blame. I have no doubt that I am not the only one whose server magically switches encodings, given that by default SSH sends the locale settings at login.

Why am I writing about this here? Because again and again I have to prove that Unicode support in Python 3 gives me much more trouble than in Python 2.

Encoding and decoding Unicode does not stand in the way of anyone who, in Python 2, follows the Zen of Python and its “explicit is better than implicit”. “Bytes in, Unicode out” is how the parts of an application that talk to other services work. This can be explained. You can explain it thoroughly in the documentation. You point out that there are good reasons for processing text internally as Unicode. You tell the user that the world around us is harsh and based on bytes, so you have to encode and decode to communicate with it. The concept may be new to the user, but once you find the right words and spell everything out properly, there is one headache fewer.
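A minimal sketch of this explicit boundary might look as follows. The function name `handle_request` and the uppercase "processing" step are hypothetical stand-ins for whatever an application really does; only the decode-at-the-edge, encode-at-the-edge shape matters.

```python
def handle_request(raw_body, charset="utf-8"):
    """Hypothetical service boundary: bytes come in, bytes go out."""
    # Decode once, at the edge, with an explicitly named encoding.
    text = raw_body.decode(charset)

    # Everything in the middle works on Unicode text only.
    reply = text.upper()

    # Encode once, at the edge, when handing data back to the outside world.
    return reply.encode(charset)


response = handle_request("caf\u00e9".encode("utf-8"))
```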

Why am I talking about this with such certainty? Because since 2006 all my software has forced Unicode on its users, and the number of Unicode-related requests is nothing compared with the flood of requests about packaging or the import system. Even with distutils2, packaging remains a much bigger problem in the realm of Python than Unicode.

Hiding Unicode away from the user in Python 3 was far from a natural development. It turned out that this made it even harder for people to picture how it all works. Do we really need implicit behavior by default? I am not so sure.

Undoubtedly, Python 3 is now on the right track. I have noticed more and more talk about bringing back some of the APIs for working with bytes. My naive idea was a third string type in Python 3, called estr or something like that. It would work exactly like str in Python 2, store bytes, and have the same set of string APIs. However, it would also carry information about its encoding, which would be used for transparent decoding into a Unicode string or for casting to a bytes object. This type would be a holy grail that could make porting easier.
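To give a feel for the idea, here is a purely hypothetical sketch of such a type. Nothing like it exists in Python; the class name `EncodedStr`, its methods and the choice to subclass bytes are all assumptions made for illustration, and a real design would need far more of the string API.

```python
class EncodedStr(bytes):
    """Hypothetical sketch of the estr idea: bytes that remember their encoding."""

    def __new__(cls, data, encoding="utf-8"):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def to_text(self):
        # Transparent decoding into a Unicode string.
        return bytes.decode(self, self.encoding)

    def to_bytes(self):
        # Cast back to a plain bytes object, dropping the encoding tag.
        return bytes(self)

    def upper(self):
        # A string-style operation that respects the stored encoding,
        # unlike bytes.upper(), which only touches ASCII letters.
        converted = self.to_text().upper().encode(self.encoding)
        return EncodedStr(converted, self.encoding)


name = EncodedStr("caf\u00e9".encode("latin-1"), "latin-1")
shouted = name.upper()
```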

But it does not exist, and the Python interpreter was not designed with room for one more string type.

We destroyed their world

Nick talked about how the Python core developers destroyed the world of web programmers. For now, the destruction reaches exactly as far as Python's backward incompatibility reaches. But our world was destroyed no more than the world of other developers. After all, we all share one world. The network is based on bytes with encodings, but that is mostly a matter of low-level protocols. Communication with most of what lies at the lower levels takes place in a language of bytes with encodings.

However, the main change was in the way of thinking required when working at these levels. In Python 2, Unicode objects were very often used to communicate with the lower levels. When necessary, the objects were encoded to bytes and back. A pleasant side effect for us was, for example, the ability to speed up some operations by encoding and decoding data at an early stage and passing it to a channel that understands Unicode. In many ways this is what lets the core serialization modules function. For example, pickle communicates with streams that support both bytes and Unicode. To some extent the same can be said of simplejson. And then Python 3 appears, in which you suddenly have to separate Unicode streams from byte streams. Many APIs will not survive the move to Python 3 without major changes to their interfaces.
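The separation can be observed directly with the standard library on Python 3. The sketch below uses the standard `json` module as a stand-in for simplejson (an assumption made so the example needs no third-party package), with an invented record.

```python
import io
import json
import pickle

record = {"name": "caf\u00e9"}

# pickle produces bytes, so it needs a binary stream.
binary_stream = io.BytesIO()
pickle.dump(record, binary_stream)

# json produces text, so it needs a text stream.
text_stream = io.StringIO()
json.dump(record, text_stream)

# Mixing them up is an error: a text stream refuses the bytes pickle writes.
try:
    pickle.dump(record, io.StringIO())
    mixed_ok = True
except TypeError:
    mixed_ok = False
```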

Yes, this is a more correct approach, but in practice it has no merits other than being more correct.

Working with the I/O functionality in Python 3, I became convinced that it is great. But in practice it cannot compare with how Python 2 worked. It may seem that I am heavily biased, because I have worked so much with Python 2 and so little with Python 3; however, having to write more code to achieve the same functionality is considered bad form. And with Python 3 I have to do exactly that, all things considered.

But porting works!

Of course, porting to Python 3 works. This has been proven more than once. But just because something is possible and passes all the tests does not mean it is well done. I am a flawed person and I make plenty of mistakes. At the same time, I take pride in polishing my favorite APIs until they shine. Sometimes I catch myself rewriting a piece of code again and again to make it more user-friendly. When working on Flask, I spent so much time honing the basic functionality that one could start talking about an obsession.

I want it to work perfectly. When I use an API for an everyday task, I want it to have the same level of perfection that is inherent in the design of a Porsche. Yes, this is only the outer layer facing the developer, but the product should be well designed from start to finish.

I can make my code work in Python 3 and still hate it. I want it to work. But at the same time, whether I use my own libraries or someone else's, I want to get the same pleasure from Python 3 that I get from Python 2.

Jinja2, for example, uses the input/output layer incorrectly in Python 3, since it is not possible to use the same code on 2.x and 3.x without switching between implementations at runtime. Right now the templates are opened in binary mode on both 2.x and 3.x, because this is the only reliable approach, and Jinja2 then decodes the data from this binary stream itself. It actually works, thanks to the normalization of newline delimiters. But I am more than sure that everyone who works on Windows and does not normalize line separators will sooner or later end up with a mix of different separators without being aware of it.
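The approach described here can be sketched roughly as follows. This is not Jinja2's actual code: `load_template` is a hypothetical helper, and the file written at the end exists only to exercise it.

```python
import os
import tempfile


def load_template(path, encoding="utf-8"):
    """Hypothetical loader: read bytes, decode explicitly, normalize newlines."""
    # Binary mode behaves the same on 2.x and 3.x and on every platform.
    with open(path, "rb") as handle:
        source = handle.read().decode(encoding)
    # Without this step, Windows and old Mac line endings would leak into
    # the template source as a mix of separators.
    return source.replace("\r\n", "\n").replace("\r", "\n")


template_path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(template_path, "wb") as handle:
    handle.write(b"line one\r\nline two\rline three\n")

template_source = load_template(template_path)
```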

Accepting Python 3

Python 3 has changed a lot, that's a fact. Without a doubt, it is the future we are heading toward. Much in Python 3 holds great promise: a significantly improved import system, the introduction of __qualname__, a new way of distributing Python packages, a unified representation of strings in memory.

But right now, porting a library to Python 3 looks like developing a library for Python 2 and then producing a (pardon my French) crappy version of it for Python 3 just to prove that it works. Jinja2 on Python 3 is, in every respect, damn ugly. It is terrible and I should be ashamed of it. For example, in the Python 3 version Jinja2 loaded two one-megabyte regular expressions into memory, and I did not care at all about ever freeing them. I just wanted it to work somehow.

So why did I have to use megabyte-sized regular expressions in Jinja2? Because the regular expression engine in Python does not support Unicode categories. With such a restriction, we had to choose the lesser of two evils: either give up on the new Unicode identifiers in Python 3 and restrict ourselves to ASCII identifiers, or build a huge regular expression by hand, writing all the necessary definitions into it.
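The second option can be illustrated with a scaled-down sketch. It is not the code Jinja2 shipped: the helper name, the category set (a simplification of the real identifier rules) and the deliberately small code point limit are all assumptions, chosen so that the generated pattern stays short. Covering all of Unicode the same way is what produces patterns of the size described above.

```python
import re
import unicodedata

# Simplified assumption: treat these general categories as "letters".
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def build_letter_class(limit=0x0500):
    """Build a regex character class for all letters below `limit`."""
    ranges = []
    start = None
    # Iterate one step past the end so the last open range gets closed.
    for code_point in range(limit + 1):
        is_letter = (
            code_point < limit
            and unicodedata.category(chr(code_point)) in LETTER_CATEGORIES
        )
        if is_letter and start is None:
            start = code_point
        elif not is_letter and start is not None:
            ranges.append((start, code_point - 1))
            start = None
    parts = []
    for first, last in ranges:
        if first == last:
            parts.append("\\u%04x" % first)
        else:
            parts.append("\\u%04x-\\u%04x" % (first, last))
    return "[" + "".join(parts) + "]"


letter_class = build_letter_class()
identifier_re = re.compile(
    "(?:_|%s)(?:_|[0-9]|%s)*\\Z" % (letter_class, letter_class)
)
```

The pattern relies on `\uXXXX` escapes being understood by the `re` module, which holds for Python 3.3 and later.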

The above is the best example of why I am not yet ready for Python 3. It does not provide the tools for working with its own innovations. Python 3 badly needs Unicode-aware regular expressions, and it needs locale APIs that are built around Unicode. It needs a more advanced path module that exposes the behavior of the underlying file system. It should more strongly enforce a single standard encoding for text files, independent of the environment. It should provide more tools for working with encoded strings. It needs IRI support, not just URLs. It needs something more than "yield from". It should have helper transcoding mechanisms for mapping URLs to the file system.

To all of the above one could add a Python 2.8 release that would be a little closer to Python 3. For me there is only one realistic way to move to Python 3: libraries and programs should be fully Unicode-aware and integrated into the new Python 3 ecosystem.

Do not let amateurs pave your way

The biggest mistake in Python 3 is its binary incompatibility with Python 2. By this I mean that Python 2 and Python 3 interpreters cannot work together in a common process space. As a result, you cannot run Gimp with both Python 2 and Python 3 scripting interfaces at the same time. The same applies to vim and Blender. We simply cannot. It is not difficult to write a bunch of hacks with separate processes and fancy IPC, but nobody needs that.

Thus, the programmers who have to master Python 3 before everyone else will be doing it under duress. And it is not even certain that such a programmer is familiar with Python at all. The reason, in all honesty, is that the money revolves around Python 2. Even if we spend all our energy on Python 3 at night, we will go back to Python 2 anyway. That is how it will be for the time being. However, if a handful of graphic designers start writing Blender scripts in Python 3, there is your adoption.

I really do not want to see the CheeseShop suffer from an abundance of shoddy Python 3 ports of libraries. I do not want to see another Jinja2, and especially not an ugly pile of code designed to run on both 2.x and 3.x. It is full of hacks such as sys.exc_info()[1] to work around syntactic differences, hacks that convert literals at runtime for compatibility with 2.x and 3.x, and much more. All of this hurts not only runtime performance but also the core tenets of Python: beautiful, readable code without hacks.
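For readers who have not seen these hacks, here is a rough sketch of the two kinds mentioned. The helper names `parse_port` and `u` are hypothetical, and the code is written to run unchanged on both 2.x and 3.x, which is exactly why it looks the way it does.

```python
import sys

PY2 = sys.version_info[0] == 2


def parse_port(value):
    """Hypothetical helper showing the exception-access hack."""
    try:
        return int(value)
    except ValueError:
        # "except ValueError, exc" is 2.x-only syntax and
        # "except ValueError as exc" needs 2.6 or newer, so the exception
        # object is fetched through sys.exc_info() instead.
        error = sys.exc_info()[1]
        return "invalid port: %s" % error


# Literal conversion at runtime: on 2.x a plain literal is a byte string
# whose \uXXXX escapes are still raw, so it is decoded on every call;
# on 3.x the literal is already text and passes through untouched.
if PY2:
    def u(literal):
        return literal.decode("unicode-escape")
else:
    def u(literal):
        return literal


good = parse_port("8080")
bad = parse_port("eighty")
label = u("caf\u00e9")
```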

Acknowledge Failure, Learn, and Adjust

Now is the time for us to get together and discuss everything people are doing to make their code work on 2.x and 3.x. Technology evolves at a fast pace, and it would pain me to watch Python collapse just because someone overlooked the dark clouds on the horizon.

Python is not "too big to be forgotten about." It can lose its popularity very quickly. Pascal and Delphi ended up in a narrow niche, despite the fact that they remained delightful languages even after the birth of C# and the .NET framework. More than anything, it was poor management that caused their decline. People still develop in Pascal, but how many of them start new projects in it? Delphi does not work on iPhone or Android. It is not very well integrated into the UNIX market. And to be honest, in some areas Python is already losing ground. Python was quite popular in computer game development, but that train left long ago. In the web community, new competitors spring up like mushrooms after the rain, and whether we like it or not, JavaScript is more and more often taking Python's place as a scripting language.

Delphi could not adapt in time and people just switched to other technologies. If 2to3 is our transition path to Python 3, then py2js is the transition path to JavaScript.

And here is what I suggest: could we make a list of everything that complicates the transition to Python 3, along with a list of solutions to these problems? Could we reopen the discussion of Python 2.8 development, if it can help with porting? Could we recognize PyPy as a valid implementation of Python, weighty enough to influence the way we write code?

Armin Ronacher,
December 7, 2011.

From the translator: After reading this article, my first impulse was to share it with others; I had a strong feeling that "the world should know." My colleague Irina Pirogova and my wife Ayla Mehdiyeva helped with the retelling, for which many thanks to them!

Source: https://habr.com/ru/post/147281/

