Note from the translator:
I present a translation of an interesting article by Armin Ronacher, the author of the web frameworks Flask and Werkzeug and the template engine Jinja2, and a well-known Pythonista in general, about the techniques and pitfalls he currently relies on when adding Python 3 support to his projects. A small note about the title of this article: it is a reference to Armin's earlier article "Porting to Python 3: A Guide," in which he described preparing code for automatic porting with the 2to3 utility. As practice shows, today that approach is something of an anti-pattern: on the one hand, the quality of the code degrades noticeably as a result of such conversion, and on the other, such code is noticeably harder to maintain.

After the extremely painful experience of porting Jinja2 to Python 3, I had to leave the project idle for a while, because I was too afraid of breaking the Python 3 support. The approach I used was to write code for Python 2 and translate it to Python 3 with 2to3 during package installation. The most unpleasant side effect was that any change you make takes roughly a minute to translate, killing your iteration speed. Fortunately, it turned out that if you target the right Python versions, things go significantly faster.
Thomas Waldmann from the MoinMoin project started by running Jinja2 through my
python-modernize with the right parameters and arrived at a single code base that runs under 2.6, 2.7 and 3.3. With a few small tools, we were able to get to a pleasant code base that works with all these Python versions and, for the most part, looks like ordinary Python code.
Inspired by this result, I went over the code a few more times and began converting some other code in order to experiment with a combined code base.
In this article, I will selectively review some tips and tricks I can share, in case they help someone in a similar situation.
Drop support for 2.5, 3.1 and 3.2
This is one of the most important tips. Dropping Python 2.5 support is more than feasible today, since not many people still use it. Dropping 3.1 and 3.2 is a fairly easy decision, given the low adoption of Python 3 so far. But why drop these versions at all? In short, 2.6 and 3.3 share a large amount of overlapping syntax and functionality, which lets the same code run correctly on both:
- Compatible string literals. 2.6 and 3.3 support the same string syntax: you can use `'foo'` for native string types (byte strings in 2.x, unicode strings in 3.x), `u'foo'` for unicode strings, and `b'foo'` for byte strings or bytes objects.
- Compatible `print` syntax. If you use `print`, you can add `from __future__ import print_function` and use `print` as a function, without needing a wrapper function or suffering other incompatibilities.
- Compatible exception-catching syntax. Python 2.6 introduced the `except Exception as e` syntax that is used in 3.x.
- Class decorators are available. They are extremely useful for automatically fixing up relocated interfaces without leaving traces on the class structure. For example, they can automatically rename a method from `next` to `__next__`, or from `__str__` to `__unicode__` on Python 2.x.
- The `next()` built-in, which invokes `next` on 2.x and `__next__` on 3.x. It is convenient because it runs at about the same speed as a direct method call, so you pay no performance penalty compared to runtime checks or your own wrapper function.
- Python 2.6 added the `bytearray` type with the same interface as in 3.3. This is useful because, while Python 2.6 lacks the `bytes` object of 3.x, it has a built-in of the same name that is a synonym for `str` and behaves completely differently.
- Python 3.3 brought back the bytes-to-bytes and string-to-string codecs that were missing in 3.1 and 3.2. Unfortunately their interfaces became more complicated and the aliases are gone, but this is all much closer to 2.x than before. This is especially important if you need stream-based encoding; that functionality was completely absent from 3.0 to 3.2.
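To make the overlap concrete, here is a small sketch (mine, not from the article) of code that runs unchanged on 2.6+ and 3.3+ using exactly these shared features:

```python
# Runs unchanged on Python 2.6+ and 3.3+ using only the shared
# syntax listed above. (Illustrative sketch, not from the article.)
from __future__ import print_function

text = u'hello'   # unicode on 2.x, str on 3.x (u'' came back in 3.3)
data = b'hello'   # str on 2.x, bytes on 3.x

try:
    int('not a number')
except ValueError as e:   # the 'as' form works on 2.6+ and 3.x
    print('caught:', e)

it = iter([1, 2, 3])
first = next(it)          # built-in next() instead of it.next()
print(first)
```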
Yes, the six module will help you move forward, but do not underestimate the value of being able to read clean code. I simply lost interest in maintaining the Python 3 port of Jinja2 because I was horrified by its code. The combined code base of that time either looked ugly and suffered performance-wise (constant `six.b('foo')` and `six.u('foo')` calls), or it had the slow iteration speed of 2to3. Now, having dealt with all of this, I enjoy the project again. The Jinja2 code looks very clean, and you have to search to find the Python 2/3 compatibility support; only a few pieces of code do something in the style of `if PY2:`.

The rest of this article assumes you want to support these versions of Python. Attempts to support Python 2.5 are very painful, and I strongly recommend you refrain from them. Supporting 3.2 is possible if you are willing to wrap all your string literals in function calls, which I would not recommend for both aesthetic and performance reasons.
Discard six
six is a pretty neat library, and Jinja2 started out using it. But if you actually count, six does not provide that many things you need to port to Python 3. Of course, six is necessary if you want to support Python 2.5, but from 2.6 onward there are not many reasons to use it. Jinja2 now has a `_compat` module containing the few helpers it needs; including a couple of lines that do not run on Python 3, the entire compatibility module is less than 80 lines of code.
This spares you the problems that arise when users expect a different version of the six package because of another library, and it avoids adding another dependency to your project.
Start with Modernize
python-modernize is a good tool to start porting with. It is a 2to3 variant that generates code running on both versions of Python. Although it has its share of bugs and the default options are not optimal, it can seriously move you forward by doing the boring work for you. You will still have to go over the code afterwards and clean up some imports and rough edges.
Correct your tests
Before doing anything else, go over your tests and make sure they still make sense. A large number of the problems in the Python 3.0 and 3.1 standard library were the result of unnoticed changes in test behavior during porting.
Write a compatibility module
So, if you decide to give up six, can you live without helpers? The answer is no. You still need a small compatibility module, but one small enough that you can keep it inside your package. Here is a simple example of what such a module might look like:
```python
import sys

PY2 = sys.version_info[0] == 2

if not PY2:
    text_type = str
    string_types = (str,)
    unichr = chr
else:
    text_type = unicode
    string_types = (str, unicode)
    unichr = unichr
```
What goes into this module depends on how much you need to change. In Jinja2's case I put a number of functions there, for example `ifilter`, `imap` and similar functions from itertools whose iterator behavior became the default in 3.x (I use the 2.x function names so that a reader of the code understands that the use of iterators there is deliberate and not a mistake).
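Such itertools aliases might be defined like this (a sketch; the article does not show Jinja2's exact definitions):

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    from itertools import imap, ifilter, izip
else:
    # On 3.x the builtins already return iterators,
    # so the 2.x names can simply alias them.
    imap = map
    ifilter = filter
    izip = zip
```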
Check for 2.x, not for 3.x
At some point your code will have to check whether it is running on 2.x or 3.x. In that case I recommend checking for the second version explicitly and putting the Python 3 branch in the else clause, not the other way around; that way you will get fewer unpleasant surprises when Python 4 appears.
Good:
```python
if PY2:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
```
Not so good:

```python
if not PY3:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
```
String handling
The biggest change in Python 3 is, without a doubt, the changed unicode interface. Unfortunately the changes are quite painful in some places, and they were applied inconsistently across the standard library. Most of your porting time will be spent on this stage. A full treatment is a topic for a separate article; here I will only outline the approach Jinja2 and Werkzeug take.
On top of the basic string rules, I added the variables `text_type`, `unichr` and `string_types` to my compatibility module, as shown above. With them, the following substitutions apply:
- `isinstance(x, basestring)` becomes `isinstance(x, string_types)`
- `isinstance(x, unicode)` becomes `isinstance(x, text_type)`
- `isinstance(x, str)`, when it means byte handling, becomes `isinstance(x, bytes)` or `isinstance(x, (bytes, bytearray))`
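As an illustration of how these checks combine in practice, here is a hypothetical helper (`ensure_text` is my name, not from the article) built on the same compatibility variables:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode            # noqa: F821 -- only exists on 2.x
    string_types = (str, unicode)  # noqa: F821
else:
    text_type = str
    string_types = (str,)

def ensure_text(x, encoding='utf-8'):
    # Hypothetical helper: accept text or bytes, always return text.
    if isinstance(x, text_type):
        return x
    if isinstance(x, (bytes, bytearray)):
        return text_type(x, encoding)
    raise TypeError('expected text or bytes, got %r' % type(x))
```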
I also wrote a class decorator, `implements_to_string`, that helps implement classes with `__unicode__` or `__str__` methods:
```python
if PY2:
    def implements_to_string(cls):
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
        return cls
else:
    implements_to_string = lambda x: x
```
The basic idea is to implement the `__str__` method on both 2.x and 3.x and let it return unicode strings (yes, that looks somewhat odd on 2.x). On 2.x the decorator then automatically renames it to `__unicode__` and adds a `__str__` that calls `__unicode__` and encodes the result as utf-8. This approach has become quite widespread in 2.x modules recently; Jinja2 and Django, for example, do it this way.
Here is an example of use:
```python
@implements_to_string
class User(object):
    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username
```
Changes in metaclass syntax
The Python 3 syntax for attaching a metaclass is incompatible with Python 2, which makes porting a little harder. six has a `with_metaclass` function designed to solve this problem, but it creates an extra dummy class that remains visible in the inheritance tree. I did not like that solution for Jinja2, so I changed it: the external API stays the same, but the implementation uses a temporary class to attach the metaclass. The advantages are that you pay no performance penalty for using it, and the inheritance tree stays clean.

The code of the solution is somewhat confusing. The basic idea exploits the ability of a metaclass to change a class at creation time. The function creates a dummy class with a dummy metaclass; when you subclass the dummy class, the dummy metaclass's constructor removes the dummy parent from the inheritance tree and instantiates the new class from the correct bases with the real metaclass. As a result, neither the dummy class nor the dummy metaclass is ever visible.
Here's what it looks like:
```python
def with_metaclass(meta, *bases):
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__

        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})
```

And here is how you use it:

```python
class BaseForm(object):
    pass

class FormType(type):
    pass

class Form(with_metaclass(FormType, BaseForm)):
    pass
```
Dictionaries
One of the breaking changes in Python 3 affected the dictionary iteration protocols. In Python 2, all dictionaries had the methods `keys()`, `values()` and `items()`, which returned lists, plus `iterkeys()`, `itervalues()` and `iteritems()`, which returned iterators. In Python 3 the iterator variants are gone, and the list-returning methods were replaced by methods returning view objects: `keys()` returns a view that behaves like an immutable set, `values()` returns a read-only iterable container (but not an iterator!), and `items()` returns something like an immutable set. Unlike regular sets, views can also refer to mutable objects, in which case some of their methods may fail at runtime.
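A quick Python 3 session illustrates the view behavior described above:

```python
# Python 3 only: demonstrating dictionary view behavior.
d = {'a': 1, 'b': 2}

keys = d.keys()
common = keys & {'a', 'c'}   # the keys view supports set operations

d['c'] = 3
assert 'c' in keys           # views reflect later changes to the dict

values = d.values()          # iterable, but NOT an iterator:
try:
    next(values)
except TypeError:
    pass                     # no __next__; call iter(values) first
```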
Although many people miss the point that view objects are not iterators, in most cases you can simply ignore the distinction. Werkzeug and Django both implement several dictionary-like objects of their own, and in both cases the decision was to simply ignore the existence of view objects and let `keys()` and friends return iterators.
At the moment this is the only sensible solution, given the constraints the Python interpreter imposes. The problems are these:
- The fact that the views themselves are not iterators means you create temporary objects for no particular reason.
- The set-like behavior of the built-in dictionary views cannot be reproduced in pure Python because of interpreter restrictions.
- Implementing view objects for 3.x and iterators for 2.x would mean a lot of duplicated code.
Here is where Jinja2 ended up with respect to dictionary iteration:

```python
if PY2:
    iterkeys = lambda d: d.iterkeys()
    itervalues = lambda d: d.itervalues()
    iteritems = lambda d: d.iteritems()
else:
    iterkeys = lambda d: iter(d.keys())
    itervalues = lambda d: iter(d.values())
    iteritems = lambda d: iter(d.items())
```
To implement dictionary-like objects, a class decorator helps us once again:

```python
if PY2:
    def implements_dict_iteration(cls):
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    implements_dict_iteration = lambda x: x
```
With it, all you have to do is implement the `keys()` method and its friends as generators, and everything else happens automatically:

```python
@implements_dict_iteration
class MyDict(object):
    ...

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value

    def items(self):
        ...
```
General iterator changes
Since iterators changed fundamentally, a couple of helpers are needed to patch things up. In practice the only change is the move from `next()` to `__next__`, and fortunately it is handled almost transparently: all you need to do is change `x.next()` to `next(x)`, and Python takes care of the rest.
If you plan to declare iterators, again, the class decorator will help:
```python
if PY2:
    def implements_iterator(cls):
        cls.next = cls.__next__
        del cls.__next__
        return cls
else:
    implements_iterator = lambda x: x
```
To implement such a class, simply name the iteration-step method `__next__`:

```python
@implements_iterator
class UppercasingIterator(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter).upper()
```
Codec changes
One of the great features of the encoding protocol in Python 2 was its type independence: you could register an encoding that converted, say, a csv file into a numpy array if you needed to. This capability was never widely known, however, since the main users of the codec machinery were string objects. In 3.x the codecs became stricter, so most of this functionality was removed in 3.0 and only brought back in 3.3 after it proved its usefulness. Simply put, codecs that do not convert between unicode and bytes, among them the hex and base64 codecs, were unavailable until 3.3.
There are two ways to use these codecs: on whole strings and on data streams. Good old `str.encode()` from 2.x no longer covers the non-unicode codecs, so if you want to support both 2.x and 3.x you have to go through the codecs module API:

```pycon
>>> import codecs
>>> codecs.encode(b'Hey!', 'base64_codec')
'SGV5IQ==\n'
```
You will also notice that the codecs lost their aliases in 3.3, so you have to write `'base64_codec'` explicitly instead of `'base64'`.
Using these codecs is preferable to using the functions from the binascii module, because they can operate on data streams through incremental encoding and decoding.
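For example, the incremental interface can encode a stream chunk by chunk (a sketch using the hex codec, whose incremental encoder can be fed arbitrary chunks; base64's encodes each chunk as a separate block):

```python
import codecs

# Feed a byte stream through a bytes-to-bytes codec chunk by chunk.
# (Works on 2.x and on 3.3+, where these codecs returned.)
encoder = codecs.getincrementalencoder('hex_codec')()

chunks = [b'He', b'y!']
encoded = b''.join(encoder.encode(chunk) for chunk in chunks)
# encoded now holds the hex encoding of b'Hey!'
```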
Other notes
There are also a few points for which I do not yet have a good solution, or which are annoying but rare enough that I do not want to deal with them. Some of them, unfortunately, are part of the Python 3 API and remain almost invisible until you run into the edge cases.
- Filesystem access and file IO remain annoying on Linux because they are not unicode-based. The `open()` function and the filesystem layer can have dangerous defaults: if, for example, I log in via SSH to a machine with the en_US locale from a machine with de_AT, Python likes to fall back to ASCII encoding both for filesystem access and for file operations. In general, I consider the most reliable way to work with text on Python 3, which also works fine on 2.x, to be simply opening files in binary mode and decoding explicitly. Alternatively, you can use `codecs.open` or `io.open` on 2.x and the built-in `open` on 3.x with an explicit encoding.
- URLs in the standard library are incorrectly represented as unicode, which can make some URLs unusable on 3.x.
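The binary-mode-and-explicit-decode advice above can be sketched like this (the helper names are mine, not from the article):

```python
import io

# Open in binary mode and decode explicitly -- identical behavior
# on 2.x and 3.x regardless of locale settings.
def read_text(filename, encoding='utf-8'):
    with open(filename, 'rb') as f:
        return f.read().decode(encoding)

# Alternatively, io.open exists on 2.6+ and matches the 3.x
# built-in open, so the encoding can be passed explicitly:
def read_text_io(filename, encoding='utf-8'):
    with io.open(filename, 'r', encoding=encoding) as f:
        return f.read()
```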
- Rethrowing an exception with a traceback object requires a helper function, because the syntax changed. This is not a very common problem and is easily solved with a wrapper; since the raise syntax differs, the 2.x variant has to live inside an exec block:

```python
if PY2:
    exec('def reraise(tp, value, tb):\n raise tp, value, tb')
else:
    def reraise(tp, value, tb):
        raise value.with_traceback(tb)
```
- The previous `exec` trick is useful whenever you have code whose syntax differs between versions. But since the syntax of exec itself also changed, you cannot use the statement form to execute something in an arbitrary namespace. This is not a big problem, because `eval` combined with `compile` works on both versions as a replacement. You can also define an `exec_` function on top of it:

```python
exec_ = lambda s, *a: eval(compile(s, '<string>', 'exec'), *a)
```
- If you have a C module written against the Python C API, you are in for some pain, and I am not aware of any tools that help here. Take it as an opportunity to change the way you write extension modules and rewrite them with cffi or ctypes. If that is not an option because you have something like numpy, then all you can do is humbly accept the pain. You could also try writing some abomination on top of the C preprocessor that makes porting easier.
- Use tox for local testing. Being able to run your tests under all the required Python versions at once is very handy and will save you from a lot of problems.
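A minimal tox configuration for the versions this article targets might look like this (the test command is an assumption; adjust it to your project):

```ini
[tox]
envlist = py26, py27, py33

[testenv]
deps = pytest
commands = pytest
```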
Conclusion
A single code base for 2.x and 3.x is entirely possible today. It takes some work and careful thought about your API, but if you drop support for 2.5 and 3.0-3.2 and keep the compatibility helpers small, the result stays clean and maintainable.