
How I fought with encodings in the console

Once again, running my informer script for SamIzdat under Windows and seeing “mysterious characters” in the console, I said to myself: “Enough, it is finally time to build proper cross-platform logging!”



Below the cut I will try to describe that, along with how to color log output Django-style on Win32. (Everything written below applies to the Python 2.x branch.)



Task one. Correct text output to the console


Symptoms


As long as we make no “corrections” to the initialized I/O system and only use the print statement with unicode strings, everything works more or less fine regardless of the OS.



The “miracles” start later, once we change any of the encodings (see a little further on) or use the logging module to write to the screen. You set up the expected behavior on Linux, it seems, and on Windows you get utf-8 garbage. You start fixing it for Windows, and cp1251 creeps out in the console...

Theoretical excursion


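Several settings govern how characters are converted and written to the console, among them the source file encoding, the interpreter's default encoding, the locale's preferred encoding and the encoding of the output streams.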



We are looking for a solution


Obviously, to get rid of all these problems, the encodings must somehow be made consistent.

And this is where the most interesting part begins:

    # -*- coding: utf-8 -*-
    >>> import sys
    >>> import locale
    >>> print sys.getdefaultencoding()
    ascii
    >>> print locale.getpreferredencoding()  # linux
    UTF-8
    >>> print locale.getpreferredencoding()  # win32/rus
    cp1251
    # and now the console stream:
    >>> print sys.stdout.encoding  # linux
    UTF-8
    >>> print sys.stdout.encoding  # win32
    cp866


Aha! It turns out that the “system” itself lives in ASCII. As a result, any attempt to work with input/output the simple way ends in the “favorite” exceptions UnicodeEncodeError/UnicodeDecodeError.
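A minimal sketch of where these exceptions come from: the implicit conversions go through the ascii codec, and an explicit call reproduces the same failure. The sample word and the helper name try_encode are only illustrations, and the snippet is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
text = u"\u043f\u0440\u0438\u0432\u0435\u0442"  # an arbitrary Cyrillic sample word


def try_encode(value, encoding):
    """Return the encoded bytes, or the name of the exception that was raised."""
    try:
        return value.encode(encoding)
    except UnicodeError as exc:
        return type(exc).__name__


ascii_result = try_encode(text, "ascii")  # what an ascii default amounts to
utf8_result = try_encode(text, "utf-8")   # the same text with a suitable codec

print(ascii_result)
print(repr(utf8_result))
```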



Besides, as the example makes plain, while on Linux we have utf-8 everywhere, Windows has two different encodings: the so-called ANSI one (cp1251 here), used for the graphical part, and the OEM one (cp866 here), used for text output in the console. The OEM encoding has been with us since DOS and, in theory, can be changed with special commands, but in practice nobody has done that for a long time.
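To make the difference concrete, here is a small sketch that encodes one sample word with both code pages and then decodes the bytes the way a cp866 console would. The word itself is an arbitrary illustration; the snippet avoids printing non-ASCII text so that it runs on any console.

```python
# -*- coding: utf-8 -*-
text = u"\u041f\u0440\u0438\u0432\u0435\u0442"  # an arbitrary Cyrillic sample word

ansi_bytes = text.encode("cp1251")  # the "ANSI" code page on a Russian Windows
oem_bytes = text.encode("cp866")    # the "OEM" code page the console expects

# The same characters map to different bytes in the two code pages.
print(repr(ansi_bytes))
print(repr(oem_bytes))

# Bytes produced for one encoding but displayed in another become garbage:
garbled_ansi = ansi_bytes.decode("cp866")
garbled_utf8 = text.encode("utf-8").decode("cp866")

print(garbled_ansi == text)              # False: other letters appear instead
print(len(garbled_utf8), len(text))      # twice as many characters as expected
```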



Until recently, I used a common way to fix this trouble:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # ==============
    # Main script file
    # ==============
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # or
    import locale
    sys.setdefaultencoding(locale.getpreferredencoding())
    # ...


And this, by and large, worked. It worked as long as I used print. Once the screen output goes through logging, everything breaks.

Aha, I thought, since "it" uses the default encoding, I will set the default to the same encoding as the console:

 sys.setdefaultencoding(sys.stdout.encoding or sys.stderr.encoding) 


Already a little better, but:



Looking closely at the first example, it is easy to see that the desired cp866 encoding can only be obtained by checking an attribute of the corresponding stream. And that attribute is not always available.
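A hedged sketch of guarding against that: in Python 2 the encoding attribute is None when output is redirected to a file or pipe, and stream replacements may lack the attribute altogether. The helper stream_encoding and the class NoEncodingStream are hypothetical names used only for this illustration.

```python
import io
import sys


def stream_encoding(stream, fallback="utf-8"):
    """Return the stream's encoding, or `fallback` when it is missing or None."""
    return getattr(stream, "encoding", None) or fallback


class NoEncodingStream(object):
    """Hypothetical stand-in for a stream that has no `encoding` attribute."""

    def write(self, data):
        pass


print(stream_encoding(sys.stdout))
print(stream_encoding(NoEncodingStream()))
print(stream_encoding(io.BytesIO(), fallback="cp866"))
```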

The second part of the task is to keep the system encoding as utf-8 while correctly configuring the output to the console.

To customize the output, the output streams have to be wrapped like this:

    import sys
    import codecs
    sys.stdout = codecs.getwriter('cp866')(sys.stdout, 'replace')


This code kills two birds with one stone: it sets the desired encoding and protects against exceptions when printing umlauts and other typographic characters that the single-byte cp866 code page does not contain.
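The effect of the 'replace' error handler can be seen without a real console. In this sketch an in-memory io.BytesIO stands in for the console's byte stream (an assumption made purely for the demonstration), and the sample characters are arbitrary.

```python
import codecs
import io

raw = io.BytesIO()  # stands in for the console's byte stream
out = codecs.getwriter("cp866")(raw, "replace")

# One Cyrillic letter that cp866 contains, then an umlaut and a
# typographic quote that it does not contain.
out.write(u"\u0416\u00fc\u201c\n")

console_bytes = raw.getvalue()
print(repr(console_bytes))  # the two unsupported characters become "?"
```

With the default 'strict' handler the same write would raise UnicodeEncodeError instead of substituting question marks.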

It remains to make this code universal: how do I find out the OEM encoding on an arbitrary computer? Googling for ready-made ANSI/OEM encoding support in Python turned up nothing sensible, so I had to recall a bit of WinAPI

    UINT GetOEMCP(void);  // returns the OEM code page identifier
    UINT GetACP(void);    // returns the ANSI code page identifier


... and put everything together:

    # -*- coding: utf-8 -*-
    import sys
    import codecs

    def setup_console(sys_enc="utf-8"):
        reload(sys)
        try:
            # on win32 ask the system for the OEM code page
            if sys.platform.startswith("win"):
                import ctypes
                enc = "cp%d" % ctypes.windll.kernel32.GetOEMCP()  # TODO: check on win64/python64
            else:
                # on Linux, it seems, everything is simpler
                enc = (sys.stdout.encoding if sys.stdout.isatty() else
                       sys.stderr.encoding if sys.stderr.isatty() else
                       sys.getfilesystemencoding() or sys_enc)

            # set the default encoding for sys
            sys.setdefaultencoding(sys_enc)

            # rewrap the output streams if their encoding differs
            if sys.stdout.isatty() and sys.stdout.encoding != enc:
                sys.stdout = codecs.getwriter(enc)(sys.stdout, 'replace')

            if sys.stderr.isatty() and sys.stderr.encoding != enc:
                sys.stderr = codecs.getwriter(enc)(sys.stderr, 'replace')

        except Exception:
            pass  # Error? Fall back to whatever the defaults are...
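For experimenting, both code pages can be queried from Python in one go. This is only a sketch: the helper name windows_code_pages is hypothetical, and it simply returns None on other platforms, where ctypes.windll does not exist.

```python
import sys


def windows_code_pages():
    """Return (ansi, oem) codec names such as ('cp1251', 'cp866'), or None off Windows."""
    if not sys.platform.startswith("win"):
        return None
    import ctypes  # imported lazily: `ctypes.windll` exists only on Windows
    kernel32 = ctypes.windll.kernel32
    return "cp%d" % kernel32.GetACP(), "cp%d" % kernel32.GetOEMCP()


pages = windows_code_pages()
print(pages)
```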


Task two. Color the output


Looking at Django's debug output together with werkzeug, I wanted something similar for myself. Googling turns up several projects of varying maturity and convenience, from the simplest subclass of logging.StreamHandler to a package that automatically replaces the standard StreamHandler on import.



Having tried several of them, I ended up using the simplest StreamHandler subclass, taken from one of the comments on Stack Overflow, and so far I am quite pleased with it:

    import copy
    import logging

    class ColoredHandler( logging.StreamHandler ):
        def emit( self, record ):
            # Need to make an actual copy of the record
            # to prevent altering the message for other loggers
            myrecord = copy.copy( record )
            levelno = myrecord.levelno
            if( levelno >= 50 ): # CRITICAL / FATAL
                color = '\x1b[31;1m' # bold red
            elif( levelno >= 40 ): # ERROR
                color = '\x1b[31m' # red
            elif( levelno >= 30 ): # WARNING
                color = '\x1b[33m' # yellow
            elif( levelno >= 20 ): # INFO
                color = '\x1b[32m' # green
            elif( levelno >= 10 ): # DEBUG
                color = '\x1b[35m' # pink
            else: # NOTSET and anything else
                color = '\x1b[0m' # normal
            # wrap the message in the color code and a reset to normal
            myrecord.msg = (u"%s%s%s" % (color, myrecord.msg, '\x1b[0m')).encode('utf-8')
            logging.StreamHandler.emit( self, myrecord )


On Windows, however, none of this worked, of course. Earlier it was possible to “enable” support for ANSI codes in the console by dropping the “magic” ansi.dll from the symfony project somewhere deep in the Windows system folders, but starting (it seems) with Windows 7 this possibility was finally “cut out” of the system. Besides, making the user copy some dll into a system folder is somehow "not kosher" either.



We turn to Google again and, again, get several solutions. One way or another, all of them come down to replacing the output of ANSI escape sequences with WinAPI calls that control the console attributes.
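The first half of such a replacement is recognizing the escape sequences in the text. Below is a rough sketch of that step only; the names SGR_RE and split_ansi are hypothetical and are not taken from any of the projects mentioned. A real implementation would then translate each list of codes into a console attribute call such as SetConsoleTextAttribute.

```python
import re

# CSI ... m ("Select Graphic Rendition") sequences, e.g. "\x1b[31;1m"
SGR_RE = re.compile(r"\x1b\[([0-9;]*)m")


def split_ansi(text):
    """Split text into (sgr_codes, chunk) pairs; the codes apply to the chunk."""
    parts = []
    codes = []  # no attributes seen yet
    pos = 0
    for match in SGR_RE.finditer(text):
        if match.start() > pos:
            parts.append((codes, text[pos:match.start()]))
        params = match.group(1)
        # An empty parameter list means "reset", the same as code 0.
        codes = [int(p) for p in params.split(";") if p] or [0]
        pos = match.end()
    if pos < len(text):
        parts.append((codes, text[pos:]))
    return parts


segments = split_ansi("\x1b[31;1mERROR\x1b[0m done")
print(segments)
```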



After wandering through the links for a while, I came across the colorama project. Somehow I liked it more than the rest. One advantage of this particular project is that it replaces the entire console output, so you can print colored text with a simple print u"\x1b[31;40m- \x1b[0m" if you suddenly feel like getting fancy.



I will note right away that the current version, 0.1.18, contains an annoying bug that breaks the output of unicode strings. I posted the simplest fix in the issue I opened for it.



Now it only remains to combine both wishes and start using the result instead of the traditional “crutches”:

    # -*- coding: utf-8 -*-
    import sys
    import codecs
    import copy
    import logging

    #: Is ANSI printing available
    ansi = not sys.platform.startswith("win")

    def setup_console(sys_enc='utf-8', use_colorama=True):
        """
        Set sys.defaultencoding to `sys_enc` and update stdout/stderr writers to corresponding encoding

        .. note:: For Win32 the OEM console encoding will be used instead of `sys_enc`
        """
        global ansi
        reload(sys)
        try:
            if sys.platform.startswith("win"):
                #... the rest of this block is the same as shown above

        if use_colorama and sys.platform.startswith("win"):
            try:
                # try to load colorama and set the `ansi` flag if that worked
                from colorama import init
                init()
                ansi = True
            except Exception:
                pass

    class ColoredHandler( logging.StreamHandler ):
        def emit( self, record ):
            # Need to make an actual copy of the record
            # to prevent altering the message for other loggers
            myrecord = copy.copy( record )
            levelno = myrecord.levelno
            if( levelno >= 50 ): # CRITICAL / FATAL
                color = '\x1b[31;1m' # bold red
            elif( levelno >= 40 ): # ERROR
                color = '\x1b[31m' # red
            elif( levelno >= 30 ): # WARNING
                color = '\x1b[33m' # yellow
            elif( levelno >= 20 ): # INFO
                color = '\x1b[32m' # green
            elif( levelno >= 10 ): # DEBUG
                color = '\x1b[35m' # pink
            else: # NOTSET and anything else
                color = '\x1b[0m' # normal
            # wrap the message in the color code and a reset to normal
            myrecord.msg = (u"%s%s%s" % (color, myrecord.msg, '\x1b[0m')).encode('utf-8')
            logging.StreamHandler.emit( self, myrecord )


Then, in the project's startup file, we use:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    from setupcon import setup_console
    setup_console('utf-8', False)
    #...

    # or, if colored log output is wanted:
    import setupcon
    setupcon.setup_console()
    import logging
    #...
    if setupcon.ansi:
        logging.getLogger().addHandler(setupcon.ColoredHandler())




That's all. As for potential improvements, it remains to check that this works under win64 Python and, possibly, to make ColoredHandler check isatty itself, as in the more elaborate examples on the same Stack Overflow.
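One possible shape of the isatty improvement is sketched below. It is not the handler from the module: TtyColoredHandler and COLORS are hypothetical names, the sketch overrides format() rather than emit(), and it targets Python 3, where the message does not need to be encoded by hand.

```python
import io
import logging

RESET = "\x1b[0m"
COLORS = [  # (minimum level, escape sequence), highest level first
    (logging.CRITICAL, "\x1b[31;1m"),
    (logging.ERROR, "\x1b[31m"),
    (logging.WARNING, "\x1b[33m"),
    (logging.INFO, "\x1b[32m"),
    (logging.DEBUG, "\x1b[35m"),
]


class TtyColoredHandler(logging.StreamHandler):
    """Hypothetical variant that adds colors only when the stream is a terminal."""

    def format(self, record):
        message = logging.StreamHandler.format(self, record)
        isatty = getattr(self.stream, "isatty", None)
        if not (isatty and isatty()):
            return message  # file, pipe or buffer: leave the text plain
        for level, color in COLORS:
            if record.levelno >= level:
                return color + message + RESET
        return message


buffer = io.StringIO()  # not a terminal, so no escape codes are expected
logger = logging.getLogger("tty_demo")
logger.propagate = False
logger.setLevel(logging.DEBUG)
logger.addHandler(TtyColoredHandler(buffer))
logger.warning("disk is almost full")
plain_output = buffer.getvalue()
print(repr(plain_output))
```

Because the record itself is never modified, no copy is needed and other handlers attached to the same logger still see the original message.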



The final version of the resulting module can be picked up at dumpz.org

Source: https://habr.com/ru/post/117236/


