Shoving non-ASCII into places where it does not belong

I was sitting at home one evening, wondering what to do. And then it hit me: Python has a debugger, but its input prompt is thoroughly ugly. Why not put powerline in there? The task seems quite trivial: you just need to create your own subclass of pdb.Pdb with your own prompt property, right?

    def use_powerline_prompt(cls):
        '''Decorator that installs powerline prompt to the class
        '''
        @property
        def prompt(self):
            try:
                powerline = self.powerline
            except AttributeError:
                powerline = PDBPowerline()
                powerline.setup(self)
                self.powerline = powerline
            return powerline.render(side='left')

        @prompt.setter
        def prompt(self, _):
            pass

        cls.prompt = prompt

        return cls
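A side note on the empty setter: pdb.Pdb assigns self.prompt = '(Pdb) ' in its constructor, so a property without a setter would make instantiation fail. The sketch below shows the same decorator pattern in a self-contained form for Python 3. FakeRenderer, use_fake_prompt and FancyPdb are hypothetical names invented for this illustration; FakeRenderer only stands in for PDBPowerline and is not part of powerline.

```python
import pdb


class FakeRenderer(object):
    """Hypothetical stand-in for PDBPowerline: returns a fixed prompt."""

    def setup(self, debugger):
        self.debugger = debugger

    def render(self, side='left'):
        return '\u276f '


def use_fake_prompt(cls):
    """Install a computed ``prompt`` property on a pdb.Pdb subclass."""
    @property
    def prompt(self):
        try:
            renderer = self._renderer
        except AttributeError:
            # Create the renderer lazily, on first access.
            renderer = FakeRenderer()
            renderer.setup(self)
            self._renderer = renderer
        return renderer.render(side='left')

    @prompt.setter
    def prompt(self, _):
        # pdb.Pdb.__init__ assigns self.prompt; silently ignore that.
        pass

    cls.prompt = prompt
    return cls


@use_fake_prompt
class FancyPdb(pdb.Pdb):
    pass


debugger = FancyPdb(readrc=False)
debugger.prompt = '(Pdb) '      # ignored by the no-op setter
print(ascii(debugger.prompt))   # ascii() keeps the output printable anywhere
```

Without the setter, the assignment inside pdb.Pdb.__init__ would raise AttributeError, because a property with only a getter is read-only.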

No. On Python 3 such code may still work, but on Python 2 a problem is already waiting for us: to output the prompt, the Unicode string has to be turned into a sequence of bytes, which requires specifying an encoding. Well, that is simple:

    encoding = get_preferred_output_encoding()

    def prompt(self):
        …
        ret = powerline.render(side='left')
        if not isinstance(ret, str):
            # Python-2
            ret = ret.encode(encoding)
        return ret

It's simple and it works... until the user installs pdbpp. Now we are greeted by a series of errors caused by the fact that pdbpp can use pyrepl, and pyrepl does not work with Unicode (whether pyrepl is used somehow depends on the value of $TERM¹). Errors caused by something not wanting to see Unicode in the prompt are nothing new: even IPython tried to forbid Unicode in the rewrite prompt². But here everything is much worse: pyrepl uses from __future__ import unicode_literals and then performs various operations with ordinary string literals (turned into Unicode by that import) on the prompt string, which it explicitly converts to str at the very beginning.
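To make the failure concrete: when Python 2 combines a byte string with a unicode object, it implicitly decodes the byte string as ASCII. The sketch below performs that decoding step explicitly, so it behaves the same on Python 2 and Python 3. The helper name python2_style_coerce is hypothetical and exists only for this illustration; it is not something pyrepl or powerline defines.

```python
from __future__ import unicode_literals

# What str(prompt) would hold on Python 2: UTF-8 bytes of a non-ASCII prompt.
prompt_bytes = '\u276f '.encode('utf-8')


def python2_style_coerce(byte_string):
    """Mimic the implicit bytes -> unicode coercion done by Python 2."""
    return byte_string.decode('ascii')


try:
    python2_style_coerce(prompt_bytes)
except UnicodeDecodeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)  # UnicodeDecodeError
```

A pure ASCII prompt such as b'(Pdb) ' passes through this coercion unharmed, which is why the problem only shows up once the prompt contains powerline symbols.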

So, this is what we need:

A unicode subclass that can be converted to str without raising errors on non-ASCII characters (the conversion is done simply as str(prompt)). This part is very simple: you need to override __str__ and __new__ (in principle you can do without the second, but it makes conversion to this class from the next one more convenient and lets you specify the encoding explicitly).
A str subclass into which the previous class is converted. Here overriding two methods is categorically not enough:
1. __new__ is needed to store the encoding conveniently and to avoid the need for an explicit unicode → str conversion.
2. __contains__ and several other methods should work with unicode arguments as if the current class were unicode (for non-unicode arguments nothing needs to change). The point is that with unicode_literals in effect, '\n' in prompt throws an exception if the prompt is a byte string with non-ASCII characters, because Python tries to convert the prompt to unicode and not the other way round.
3. find and similar functions should work with unicode arguments as if they were byte strings in the current encoding. They must return correct indexes, but at the same time must not fail with errors caused by converting a byte string to unicode (and here again: why is the conversion not done the other way round?).
4. __len__ should return the length of the string in Unicode codepoints. This part is needed so that pyrepl, which calculates where the prompt ends (and positions the cursor accordingly), does not get it wrong and leave a giant gap between the prompt and the cursor. I suspect that what is really needed is not the number of codepoints but the width of the string in screen cells (what strdisplaywidth() does in Vim, for example).
5. __add__ should return our first class, the unicode subclass, when added to a Unicode string. __radd__ should do the same. Adding byte strings should yield our str subclass. More on this in the next item.
6. And finally, __getslice__ (note: __getitem__ alone will not do, str uses the deprecated __getslice__ for slices) should return an object of the same class, because at the very end pyrepl adds together an empty Unicode string, a slice of the current class and another slice of it. If we ignore this part, we again get some UnicodeError.
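The difference between the three notions of length mentioned in item 4 can be sketched as follows. This is only an approximation under stated assumptions: cell_width is a hypothetical helper written for this illustration (it is not what Vim or pyrepl actually do), it targets Python 3, and it only handles combining marks and East Asian wide characters.

```python
import unicodedata


def cell_width(text):
    """Approximate the width of ``text`` in terminal cells."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn over the previous character.
            continue
        if unicodedata.east_asian_width(char) in ('W', 'F'):
            # Wide and fullwidth characters take two cells.
            width += 2
        else:
            width += 1
    return width


# A non-ASCII prompt symbol, a space, two CJK characters and a space.
prompt = '\u276f \u65e5\u672c '
encoded = prompt.encode('utf-8')

# Bytes, codepoints, screen cells.
print(len(encoded), len(prompt), cell_width(prompt))  # 11 5 7
```

A plain str would report the first number, the overridden __len__ reports the second, and correct cursor placement strictly needs the third.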

The result is the following two monsters:

    class PowerlineRenderBytesResult(bytes):
        def __new__(cls, s, encoding=None):
            encoding = encoding or s.encoding
            self = bytes.__new__(cls, s.encode(encoding) if isinstance(s, unicode) else s)
            self.encoding = encoding
            return self

        # With unicode arguments these methods behave as if self were unicode.
        for meth in (
            '__contains__',
            'partition', 'rpartition',
            'split', 'rsplit',
            'count', 'join',
        ):
            exec((
                'def {0}(self, *args):\n'
                '    if any((isinstance(arg, unicode) for arg in args)):\n'
                '        return self.__unicode__().{0}(*args)\n'
                '    else:\n'
                '        return bytes.{0}(self, *args)'
            ).format(meth))

        # Unicode arguments are encoded first, so the search runs on bytes.
        for meth in (
            'find', 'rfind',
            'index', 'rindex',
        ):
            exec((
                'def {0}(self, *args):\n'
                '    if any((isinstance(arg, unicode) for arg in args)):\n'
                '        args = [arg.encode(self.encoding) if isinstance(arg, unicode) else arg for arg in args]\n'
                '    return bytes.{0}(self, *args)'
            ).format(meth))

        def __len__(self):
            # Length in codepoints, not in bytes.
            return len(self.decode(self.encoding))

        def __getitem__(self, *args):
            return PowerlineRenderBytesResult(bytes.__getitem__(self, *args), encoding=self.encoding)

        def __getslice__(self, *args):
            return PowerlineRenderBytesResult(bytes.__getslice__(self, *args), encoding=self.encoding)

        @staticmethod
        def add(encoding, *args):
            if any((isinstance(arg, unicode) for arg in args)):
                return ''.join((
                    arg
                    if isinstance(arg, unicode)
                    else arg.decode(encoding)
                    for arg in args
                ))
            else:
                return PowerlineRenderBytesResult(b''.join(args), encoding=encoding)

        def __add__(self, other):
            return self.add(self.encoding, self, other)

        def __radd__(self, other):
            return self.add(self.encoding, other, self)

        def __unicode__(self):
            return PowerlineRenderResult(self)


    class PowerlineRenderResult(unicode):
        def __new__(cls, s, encoding=None):
            encoding = (
                encoding
                or getattr(s, 'encoding', None)
                or get_preferred_output_encoding()
            )
            if isinstance(s, unicode):
                self = unicode.__new__(cls, s)
            else:
                self = unicode.__new__(cls, s, encoding, 'replace')
            self.encoding = encoding
            return self

        def __str__(self):
            return PowerlineRenderBytesResult(self)

(In Python 2, bytes is str.)
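The classes above only make sense on Python 2, but the core trick can be tried on Python 3 as well, where mixing str and bytes raises TypeError instead of UnicodeDecodeError. The following is a minimal sketch under that assumption: EncodedBytes is a hypothetical name made up for this illustration, it covers only a few of the methods discussed, and it is not the powerline implementation.

```python
class EncodedBytes(bytes):
    """bytes subclass that remembers its encoding (illustration only)."""

    def __new__(cls, value, encoding='utf-8'):
        if isinstance(value, str):
            value = value.encode(encoding)
        self = super().__new__(cls, value)
        self.encoding = encoding
        return self

    def __contains__(self, item):
        # Text arguments are checked against the decoded text.
        if isinstance(item, str):
            return item in self.decode(self.encoding)
        return bytes.__contains__(self, item)

    def find(self, sub, *args):
        # Text arguments are encoded, so the result is a byte index.
        if isinstance(sub, str):
            sub = sub.encode(self.encoding)
        return bytes.find(self, sub, *args)

    def __len__(self):
        # Length in codepoints, not in bytes.
        return len(self.decode(self.encoding))

    def __add__(self, other):
        if isinstance(other, str):
            return self.decode(self.encoding) + other
        return EncodedBytes(bytes.__add__(self, other), self.encoding)

    def __radd__(self, other):
        if isinstance(other, str):
            return other + self.decode(self.encoding)
        return EncodedBytes(bytes.__add__(other, self), self.encoding)


prompt = EncodedBytes('\u276f ')
print('\n' in prompt, len(prompt), prompt.find(' '))  # False 2 3
```

With a plain bytes object the first expression would raise TypeError; here text operands are converted on the fly, and concatenation with bytes keeps the subclass (and its encoding) alive.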

The result is on GitHub, for now only in my branch; later it will land in the develop branch of the main repository.
Of course, the result is not limited to pyrepl: it can be used in various places where a non-ASCII string cannot be slipped in, but you really want to.

¹ With TERM=xterm-256color I get errors from pyrepl, while with TERM= or TERM=konsole-256color I do not, and everything works fine.
² This is what you will see if you enable autocall in IPython and type int 42 (bottom line of the screenshot "Powerline IPython in and rewrite prompt").

Source: https://habr.com/ru/post/249129/
