Learning to deal with mojibake

Recently I needed to download a whole bunch of documents from the web. Naturally, not by hand but with Python scripts. The trouble is that quite often the pages contain mojibake ("krakozyabry") ~~some kind of crap~~.

Of course, there are plenty of online decoders, such as 2cyr. But that is not what I need: I want to be able to repair mojibake from within scripts. I dug through a bunch of places and found nothing suitable for Python. In the end I scratched my head and reinvented the bicycle. It rides slowly, but it rides.

The resulting library is less clever than 2cyr. For example, it cannot decode mojibake like this: ирилица
In fact, this solution can do only one thing: unravel chains of successive mis-decodings back into readable text. For example, if text encoded in CP1251 is displayed as if it were KOI8-R, you get something like this: PUYNGAPPSH AKEYURE.
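To make the idea of a transcoding chain concrete, here is a minimal sketch that uses only Python's standard codecs. It is not the recoder library's API: the function names `garble`, `ungarble` and `candidates`, and the short `ENCODINGS` list, are assumptions made up for illustration. The sketch reproduces the CP1251-shown-as-KOI8-R case, reverses it, and then lists every single-step repair that succeeds without an error, so that a person (or some heuristic) can pick the readable one.

```python
from itertools import permutations

# Hypothetical shortlist of encodings to try; extend as needed.
ENCODINGS = ("cp1251", "koi8-r", "cp866", "utf-8")


def garble(text, actual, assumed):
    """Simulate text stored in `actual` but displayed as `assumed`."""
    return text.encode(actual).decode(assumed)


def ungarble(text, actual, assumed):
    """Undo one mis-decoding step: recover the bytes, then decode them correctly."""
    return text.encode(assumed).decode(actual)


def candidates(garbled):
    """Yield (actual, assumed, repaired) for every pair that reverses cleanly."""
    for actual, assumed in permutations(ENCODINGS, 2):
        try:
            yield actual, assumed, ungarble(garbled, actual, assumed)
        except (UnicodeEncodeError, UnicodeDecodeError):
            # This pair cannot explain the garbled text; skip it.
            continue


original = "кракозябры"
broken = garble(original, "cp1251", "koi8-r")
print(broken)

for actual, assumed, repaired in candidates(broken):
    print(f"{actual} shown as {assumed}: {repaired}")
```

Longer chains could be handled by applying `candidates` again to each result. Deciding automatically which candidate is "readable" needs a scoring heuristic, which this sketch deliberately leaves out.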

I am not trying to describe everything in detail or write a long post here. I just want to save some time for anyone who might find my solution useful.

So, to install it:

 $ pip install recoder

or from source:

 $ git clone https://bitbucket.org/dkuryakin/recoder.git
 $ cd recoder && python setup.py install

After that you can do this:

 $ echo   | python -mrecoder utf-8 #    .

Enjoy! And while you are at it, maybe someone on Windows can test it (:

UPD.

I polished the code a bit, and now it can handle almost all the cases from the examples on 2cyr.com. For example, "&egrave;&eth;&egrave;&euml;&egrave;&ouml;&agrave;" or "%D0%A2%D0%BE%D0%B2%D0%B0+%D0%B5+%D0%BA".
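For reference, both kinds of input can also be unpacked with the Python standard library alone. The sketch below does not show how recoder does it internally. The helper names `decode_entities` and `decode_percent` are hypothetical, and the choice of CP1251 as the real encoding behind the HTML entities is an assumption that happens to fit this sample.

```python
import html
from urllib.parse import unquote_plus


def decode_entities(text, real_encoding="cp1251"):
    """Expand HTML entities, then reinterpret the Latin-1 characters as bytes
    in the encoding the text was really written in (assumed CP1251)."""
    latin = html.unescape(text)
    return latin.encode("latin-1").decode(real_encoding)


def decode_percent(text, encoding="utf-8"):
    """Decode percent-encoded text, treating '+' as a space."""
    return unquote_plus(text, encoding=encoding)


print(decode_entities("&egrave;&eth;&egrave;&euml;&egrave;&ouml;&agrave;"))
print(decode_percent("%D0%A2%D0%BE%D0%B2%D0%B0+%D0%B5+%D0%BA"))
```

The entity case is a two-step chain: the entities are expanded to Latin-1 characters first, and only then is the encoding mix-up reversed.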

Source: https://habr.com/ru/post/216969/
