
Porting to Python 3. Bug fixes

Note from the translator:
I present a translation of an interesting article by Armin Ronacher, author of the web frameworks Flask and Werkzeug and the template engine Jinja2, and a well-known Pythonista, about the techniques he currently uses, and the pitfalls he has met, when adding Python 3 support to his projects. A short note on the title: it refers to Armin's earlier article “Porting to Python 3. Manual,” in which he described preparing code for automatic porting with the 2to3 utility. Practice has shown that this approach is now closer to an anti-pattern: the quality of the converted code deteriorates markedly, and such code is noticeably harder to maintain.

After the extremely painful experience of porting Jinja2 to Python 3, I had to leave the project idle for a while, because I was too afraid of breaking Python 3 support. The approach I used was to write code for Python 2 and translate it to Python 3 with 2to3 at package installation time. The most unpleasant side effect is that every change you make takes about a minute to translate, which kills your iteration speed. Fortunately, it turns out that if you target the right Python versions, you can do much better.

Thomas Waldmann from the MoinMoin project started by running Jinja2 through my python-modernize with the right parameters and ended up with a single code base that runs on 2.6, 2.7 and 3.3. With the help of a few small tools, we arrived at a pleasant code base that works with all of these Python versions and, for the most part, looks like ordinary Python code.

Inspired by this result, I went through the code several times and began converting some other code to experiment further with a combined code base.

In this article I will go over a selection of tips and tricks, in case they help someone in a similar situation.

Drop support for 2.5, 3.1 and 3.2


This is one of the most important tips. Dropping Python 2.5 is more than feasible today, since few people still use it. Dropping 3.1 and 3.2 is an easy decision, given how little Python 3 is used so far. But what is the point of dropping these versions? In short, 2.6 and 3.3 share a large amount of syntax and functionality, which lets the same code run on both.
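As a minimal sketch of this overlap (the examples are chosen here for illustration, and parse_port is a hypothetical helper), the following module should run unchanged on 2.6, 2.7 and 3.3:

```python
# Syntax shared by Python 2.6, 2.7 and 3.3+.

# The u'' prefix always existed on 2.x and is accepted again since 3.3.
text = u'caf\xe9'

# The b'' prefix is accepted since 2.6.
data = b'raw bytes'


def parse_port(value):
    """Hypothetical helper: return value as an int, or None if invalid."""
    try:
        return int(value)
    except ValueError as exc:  # the "as" form works on 2.6+ and 3.x
        return None
```

On 3.1 and 3.2 the u'' literal is a syntax error, which is the main reason those versions need every string wrapped in a function call.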


Yes, the six module will get you there, but do not underestimate the value of having clean code to look at. I simply lost interest in maintaining the Python 3 port of Jinja2 because I was horrified by its code. Back then, a combined code base either looked ugly and suffered in performance (constant six.b('foo') and six.u('foo') calls) or was stuck with the slow iteration speed of 2to3. Now that all of this is sorted out, I enjoy it again. The Jinja2 code looks very clean, and you have to search to find the Python 2 and 3 compatibility support. Only a few places contain an if PY2: check.

The rest of the article assumes that these are the versions you want to support. Supporting Python 2.5 is very painful, and I strongly recommend against it. Supporting 3.2 is possible if you are willing to wrap all your strings in function calls, which I would not recommend for aesthetic and performance reasons.

Discard six


Six is a pretty neat library, and Jinja2 started out with it. But in the end, if you count, there are not that many things in six that you actually need to get a Python 3 port going. Of course six is necessary if you want to support Python 2.5, but from 2.6 onwards there are few reasons to use it. Jinja2 has a _compat module that contains the helpers it needs. Even including a few lines that have nothing to do with Python 3, the whole compatibility module is less than 80 lines of code.

This spares you the trouble of users expecting a different version of six because of another library, and of adding another dependency to your project.

Start with Modernize


Python-modernize is a good tool to start a port with. It is a variant of 2to3 that generates code which runs on both versions of Python. Although it has its share of bugs and its default options are not optimal, it can take you a long way by doing the boring work for you. You will still have to go over the result and clean up some imports and rough edges.

Correct your tests


Before you do anything else, go over your tests and make sure that they still make sense. A large number of problems in the standard library of Python 3.0 and 3.1 came from the behavior of the test suite changing in unintended ways during porting.

Write a compatibility module


So, if you decide to drop six, can you live entirely without helpers? The answer is no. You still need a compatibility module, but it can be small enough to keep inside your package. Here is a simple example of what such a compatibility module might look like:
    import sys

    PY2 = sys.version_info[0] == 2
    if not PY2:
        text_type = str
        string_types = (str,)
        unichr = chr
    else:
        text_type = unicode
        string_types = (str, unicode)
        unichr = unichr

What goes into this module depends on how much has changed for you. In the case of Jinja2, I put a number of functions there. For example, it contains ifilter, imap and similar itertools functions that became builtins in 3.x (I kept the 2.x names so that a reader of the code understands that the iterator behavior is intended and not a bug).
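A minimal sketch of how such iterator helpers could be defined, assuming the same PY2 flag as above (the exact set of names here is an illustration, not a copy of Jinja2's module):

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # On 2.x the lazy variants live in itertools.
    from itertools import imap, ifilter, izip
else:
    # On 3.x the builtins are already lazy, so the 2.x names
    # simply become aliases for them.
    imap = map
    ifilter = filter
    izip = zip

# Usage: the 2.x name signals that laziness is intended.
doubled = imap(lambda x: x * 2, [1, 2, 3])
evens = ifilter(lambda x: x % 2 == 0, [1, 2, 3, 4])
```

Neither call does any work until the result is iterated, on either version.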

Check for 2.x, not for 3.x


At some point you will have to check whether the code is running on 2.x or 3.x. I recommend checking for Python 2 and putting the Python 3 code in the else branch, rather than the other way round. That way you will get fewer unpleasant surprises when Python 4 appears.

Good:
    if PY2:
        def __str__(self):
            return self.__unicode__().encode('utf-8')

Not so perfect:
    if not PY3:
        def __str__(self):
            return self.__unicode__().encode('utf-8')

Processing strings


The biggest change in Python 3 was, without doubt, the change to the Unicode interface. Unfortunately, these changes are quite painful in some places and were applied inconsistently across the standard library. Most of the porting time will be spent here. This really is a topic for a separate article, but here is a short list of rules that Jinja2 and Werkzeug stick to:


In addition to these simple rules, I added the text_type, unichr and string_types variables to my compatibility module, as shown above. As a result, the following changes occur:
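A sketch of how type checks typically change once these variables exist, assuming the compatibility definitions shown earlier (describe is a hypothetical function used only for illustration):

```python
import sys

PY2 = sys.version_info[0] == 2
if PY2:
    text_type = unicode
    string_types = (str, unicode)
else:
    text_type = str
    string_types = (str,)


def describe(value):
    """Hypothetical helper showing the replacement checks."""
    # 2.x-only code would have written isinstance(value, unicode)
    if isinstance(value, text_type):
        return 'text'
    # 2.x-only code would have written isinstance(value, basestring)
    if isinstance(value, string_types):
        return 'native string'
    return 'other'
```

On 3.x the second branch is never reached, because text_type and string_types both refer to str there.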

I also wrote a class decorator, implements_to_string, that helps implement classes with __unicode__ or __str__ methods:
    if PY2:
        def implements_to_string(cls):
            cls.__unicode__ = cls.__str__
            cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
            return cls
    else:
        implements_to_string = lambda x: x

The basic idea is to implement the __str__ method on both 2.x and 3.x and have it return Unicode strings (yes, this looks a bit clumsy on 2.x). On 2.x the decorator automatically renames it to __unicode__ and adds a new __str__ that calls __unicode__ and encodes the result as UTF-8. This pattern has become quite common in 2.x modules lately; Jinja2 and Django both use it.

Here is an example of use:
    @implements_to_string
    class User(object):

        def __init__(self, username):
            self.username = username

        def __str__(self):
            return self.username

Changes in metaclass syntax


Since the Python 3 syntax for defining metaclasses is incompatible with Python 2, porting becomes a little harder. six has a with_metaclass function designed to solve this problem. It creates an empty class that then shows up in the inheritance tree. I did not like this solution for Jinja2, so I changed it. The external API stays the same, but the implementation uses a temporary class to attach the metaclass. The advantages are that you pay no performance penalty for using it and the inheritance tree stays clean.

The code is somewhat confusing. The basic idea relies on the ability of a metaclass to customize class creation, which is triggered through the parent class. My solution uses a metaclass that removes its own parent from the inheritance tree when a class is derived from it. The function creates an empty class with an empty metaclass. The metaclass of that empty class has a constructor which instantiates a new class from the correct parents and assigns the correct metaclass (Translator's note: I am not sure I translated all of this correctly; the source code below is more eloquent). As a result, the empty class and the temporary metaclass are never visible.

Here's what it looks like:
    def with_metaclass(meta, *bases):
        class metaclass(meta):
            __call__ = type.__call__
            __init__ = type.__init__
            def __new__(cls, name, this_bases, d):
                if this_bases is None:
                    return type.__new__(cls, name, (), d)
                return meta(name, bases, d)
        return metaclass('temporary_class', None, {})

And here is how you use it:

    class BaseForm(object):
        pass

    class FormType(type):
        pass

    class Form(with_metaclass(FormType, BaseForm)):
        pass
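To check the claim that the temporary class and metaclass never become visible, one can inspect the resulting class. This is a sketch that repeats the helper from above so that it runs on its own; mro_names is a hypothetical variable used only for the check:

```python
def with_metaclass(meta, *bases):
    # Same helper as in the article, repeated to keep the example runnable.
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__

        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                # First call: build the temporary placeholder class.
                return type.__new__(cls, name, (), d)
            # Subclassing the placeholder: build the real class instead.
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})


class BaseForm(object):
    pass


class FormType(type):
    pass


class Form(with_metaclass(FormType, BaseForm)):
    pass


# Names of all classes in the method resolution order of Form.
mro_names = [cls.__name__ for cls in Form.__mro__]
```

If the helper behaves as described, type(Form) is FormType and mro_names contains only Form, BaseForm and object, with no trace of temporary_class.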

Dictionaries


One of the disruptive changes in Python 3 was the change to the dictionary iteration protocols. In Python 2, all dictionaries had the methods keys(), values() and items(), which returned lists, and iterkeys(), itervalues() and iteritems(), which returned iterators. In Python 3 none of that exists in this form any more: the iter* methods are gone, and keys(), values() and items() return view objects instead.

keys() returns a view object that behaves like an immutable set, values() returns a read-only iterable container (but not an iterator!), and items() returns something like an immutable set. Unlike regular sets, however, the items() view can also refer to mutable objects, in which case some methods will fail at runtime.
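A short sketch of this behavior, assuming Python 3 (on 2.x keys() returns a plain list and the set operation below fails):

```python
# Python 3 only: dictionary views.
d = {'a': 1, 'b': 2}

keys = d.keys()
# Key views support set operations and return ordinary sets.
common = keys & {'a', 'z'}

values = d.values()
try:
    next(values)            # a view is iterable, but not an iterator
    values_is_iterator = True
except TypeError:
    values_is_iterator = False

# Views are live: they reflect later changes to the dictionary.
d['c'] = 3
key_count_after_insert = len(keys)
```

To actually iterate step by step you need iter(d.values()), which is the extra temporary object discussed below.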

Although many people miss the point that view objects are not iterators, in most cases you can simply ignore this. Werkzeug and Django implement several dictionary-like objects of their own, and in both cases the decision was to ignore the existence of view objects and let keys() and friends return iterators.

At the moment this is the only sensible solution given the limitations of the Python interpreter. There are problems with:

This is what Jinja2 settled on for dictionary iteration:
    if PY2:
        iterkeys = lambda d: d.iterkeys()
        itervalues = lambda d: d.itervalues()
        iteritems = lambda d: d.iteritems()
    else:
        iterkeys = lambda d: iter(d.keys())
        itervalues = lambda d: iter(d.values())
        iteritems = lambda d: iter(d.items())

For implementing dictionary-like objects, a class decorator helps once again:

    if PY2:
        def implements_dict_iteration(cls):
            cls.iterkeys = cls.keys
            cls.itervalues = cls.values
            cls.iteritems = cls.items
            cls.keys = lambda x: list(x.iterkeys())
            cls.values = lambda x: list(x.itervalues())
            cls.items = lambda x: list(x.iteritems())
            return cls
    else:
        implements_dict_iteration = lambda x: x

In this case, all you have to do is implement keys() and friends as iterators; everything else happens automatically.

    @implements_dict_iteration
    class MyDict(object):
        ...

        def keys(self):
            for key, value in iteritems(self):
                yield key

        def values(self):
            for key, value in iteritems(self):
                yield value

        def items(self):
            ...
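For a complete, runnable picture, here is a sketch that fills in the elided parts with a hypothetical PairStore class backed by a list of (key, value) pairs. The class and its storage are assumptions made for illustration; only the decorator and iteritems come from the text above:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    iteritems = lambda d: d.iteritems()

    def implements_dict_iteration(cls):
        # Move the iterator versions to the iter* names and
        # provide list-returning keys()/values()/items() on 2.x.
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    iteritems = lambda d: iter(d.items())
    implements_dict_iteration = lambda x: x


@implements_dict_iteration
class PairStore(object):
    """Hypothetical dict-like object storing (key, value) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value

    def items(self):
        # The only method that touches the underlying storage.
        return iter(self._pairs)


store = PairStore([('a', 1), ('b', 2)])
```

Calling list(store.keys()) gives the keys on both versions: on 3.x it consumes the iterator, on 2.x it copies the list the decorator already produced.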

General iterator changes


Since iterators changed in general, a couple of helpers are needed to make this painless. In fact, the only change was the transition from next() to __next__. Fortunately, this is already handled transparently. The only thing you need to do is change x.next() to next(x), and the language takes care of the rest.
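As a small illustration, the builtin next() is available on 2.6+ and 3.x alike, so the following sketch behaves the same everywhere:

```python
numbers = iter([1, 2])

first = next(numbers)            # instead of numbers.next()
second = next(numbers)

# An optional second argument is returned instead of raising
# StopIteration once the iterator is exhausted.
after_end = next(numbers, None)
```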

If you plan to define iterators, a class decorator helps once again:

    if PY2:
        def implements_iterator(cls):
            cls.next = cls.__next__
            del cls.__next__
            return cls
    else:
        implements_iterator = lambda x: x

To implement such a class, just name the iteration step method __next__:

    @implements_iterator
    class UppercasingIterator(object):

        def __init__(self, iterable):
            self._iter = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._iter).upper()


Codec changes


One of the great features of the codec protocol in Python 2 was that it was independent of types. You could register a codec that turned a CSV file into a numpy array if you wanted to. This was not widely known, however, since the main interface to codecs was exposed on string objects. In 3.x codecs became stricter, so much of this functionality was removed in 3.0 and brought back in 3.3 because it had proved useful. Simply put, codecs that do not convert between Unicode and bytes were unavailable until 3.3. Among them are, for example, the hex and base64 codecs.

There are two use cases for these codecs: operations on strings and operations on data streams. The good old str.encode() from 2.x is gone, and because of the changed string API the code looks different if you want to support both 2.x and 3.x:

    >>> import codecs
    >>> codecs.encode(b'Hey!', 'base64_codec')
    'SGV5IQ==\n'

You will also notice that these codecs have lost their aliases in 3.3, so you have to write 'base64_codec' explicitly instead of 'base64'.

Using these codecs is preferable to using the functions from the binascii module, because they support operations on data streams through incremental encoding and decoding.
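A minimal sketch of the stream use case, assuming a Python version where the bytes-to-bytes hex_codec is available (2.x, or 3.3 and later). The hex codec is used here rather than base64 because hex output can be produced chunk by chunk without buffering; hex_encode_stream is a hypothetical helper name:

```python
import codecs


def hex_encode_stream(chunks):
    """Hypothetical helper: hex-encode an iterable of byte chunks lazily."""
    encoder = codecs.getincrementalencoder('hex_codec')()
    for chunk in chunks:
        yield encoder.encode(chunk)
    # Signal the end of input so the encoder can flush any state.
    yield encoder.encode(b'', final=True)


encoded = b''.join(hex_encode_stream([b'He', b'y!']))
```

The result is the same as encoding the joined input in one call, but the input never has to be held in memory as a whole.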

Other notes


There are also a few things for which I still have no good solution, or which are annoying but so rare that I do not want to deal with them. Some of them are, unfortunately, part of the Python 3 API and are almost invisible until you look at edge cases.


Conclusion


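A single code base for 2.x and 3.x is quite achievable today. Most of the porting time will still go into working out how APIs behave with respect to Unicode. In any case, if you are porting a library, do not bother with 2.5 or 3.0-3.2, and it will hurt far less.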

Source: https://habr.com/ru/post/185518/

