Exporting Habrahabr favorites to FB2

I hate long intros

So I won't write one, not even under a spoiler.

What is it for?
- For offline reading on e-readers.
My reader does not support FB2!
- Use a universal converter.
I want it!
1. Install Python 2.7+. Tested with Python 2.7.3.
2. Install the BeautifulSoup 4 library. The options, briefly:
  - apt-get install python-beautifulsoup4
  - easy_install beautifulsoup4
  - pip install beautifulsoup4
  - download the sources and run python setup.py install
3. Download the code from the repository (direct link to the latest version).
4. Open habrafav.py and put your login in the line username = ...
5. Run python habrafav.py (or just habrafav.py on Windows).
6. Wait. With cached data, exporting ~150 articles takes about 6 minutes and 600 MB of RAM.
7. Pick up habrahabr_favorites.fb2. Mine is about 62 MB.
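Before running the script, it can help to confirm that steps 1 and 2 worked. The following is a minimal sketch, not part of the repository: check_environment is a hypothetical helper name, and the only facts it relies on are the Python version requirement above and that BeautifulSoup 4 is imported under the package name bs4.

```python
import sys


def check_environment():
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The article asks for Python 2.7+ (tested on 2.7.3).
    if sys.version_info < (2, 7):
        problems.append("Python 2.7 or newer is required")
    # BeautifulSoup 4 is imported under the package name "bs4".
    try:
        __import__("bs4")
    except ImportError:
        problems.append("BeautifulSoup 4 is missing (pip install beautifulsoup4)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```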

This has already been done.
- I know. But:
  - PDF does not display properly everywhere;
  - I never managed to run that code.
What about comments?
- No. They are parsed, but not exported. Export is easy to bolt on, but the resulting file would then grow another two to three times.
Why FB2?
- Because it is XML. Don't believe me? See the schema description.
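Since FB2 is plain XML, a skeleton of a book can be produced with nothing but the standard library. The sketch below is an illustration, not the code from the repository: build_minimal_fb2 and q are hypothetical names, and the skeleton deliberately omits most of the description fields, so it should not be expected to pass schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace of FictionBook 2.0 documents.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"
ET.register_namespace("", FB2_NS)


def q(tag):
    """Qualify a tag name with the FB2 namespace for ElementTree."""
    return "{%s}%s" % (FB2_NS, tag)


def build_minimal_fb2(book_title, section_title, paragraphs):
    """Build a bare FB2 skeleton: a book title and one section of paragraphs."""
    root = ET.Element(q("FictionBook"))
    description = ET.SubElement(root, q("description"))
    title_info = ET.SubElement(description, q("title-info"))
    ET.SubElement(title_info, q("book-title")).text = book_title

    body = ET.SubElement(root, q("body"))
    section = ET.SubElement(body, q("section"))
    title = ET.SubElement(section, q("title"))
    ET.SubElement(title, q("p")).text = section_title
    for text in paragraphs:
        ET.SubElement(section, q("p")).text = text

    return ET.tostring(root, encoding="utf-8").decode("utf-8")


if __name__ == "__main__":
    print(build_minimal_fb2("Favorites", "First article", ["Hello.", "World."]))
```

Each exported article would map naturally onto one section element, though how the script actually lays them out is not stated here.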
Is the conversion correct?
- Not quite. The resulting files do not pass validation.
- Validation results for my favorites:
  - This element is not expected.
    <empty-line> - 287 times
    <code> - 83 times
    <emphasis> - 19 times
    <strong> - 7 times
    <subtitle> - 5 times
    <cite> - 4 times
    <a> - 3 times
    <image> - 2 times
    <sup> - 1 time
  - Character content other than whitespace is not allowed because the content type is 'element-only'. Tag: <cite>, 245 times.
  - Empty tag. Tag: <td>, 19 times.
- However, my Kindle displays the resulting file perfectly (after conversion to .mobi).
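Two of the error classes above can be spotted without a schema validator. The sketch below is a rough approximation under stated assumptions: find_suspects and local_name are hypothetical helpers, the check only looks for <cite> elements with bare text and <td> elements with no content, and it is not a substitute for validating against the real FB2 schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_suspects(xml_text):
    """Count <cite> elements with bare text and <td> elements with no content."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name == "cite":
            # Bare text is the element's own text or the tail of any child.
            pieces = [element.text] + [child.tail for child in element]
            if any(piece and piece.strip() for piece in pieces):
                counts["cite with bare text"] += 1
        elif name == "td":
            if len(element) == 0 and not (element.text or "").strip():
                counts["empty td"] += 1
    return counts


if __name__ == "__main__":
    sample = ("<body><cite>bare text<p>wrapped</p></cite>"
              "<table><tr><td/><td>x</td></tr></table></body>")
    print(dict(find_suspects(sample)))
```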
And from an informal point of view?
- UPD: Fixed. The bug was mine. Download the latest version from the repository.
  There is a strange bug where the spaces around tags inside text disappear. That is, HTML for "yet another bicycle" with inline tags around the words loses the spaces next to those tags in the output. This is probably a BeautifulSoup bug, but maybe the bug is somewhere in my code.
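One common way to lose spaces around inline tags is to strip every text node before joining them. The sketch below only illustrates that failure mode: it is an assumption, not a diagnosis of the actual bug, the markup in the example is made up because the original sample's tags are not preserved here, and it uses the standard library's ElementTree rather than BeautifulSoup to stay dependency-free.

```python
import xml.etree.ElementTree as ET


def flatten_stripped(element):
    """Buggy variant: strips every text node, so spaces next to tags vanish."""
    return "".join(piece.strip() for piece in element.itertext())


def flatten_preserving(element):
    """Keeps text nodes intact and only collapses runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


if __name__ == "__main__":
    # Hypothetical markup; the tags in the article's own example are unknown.
    paragraph = ET.fromstring("<p><i>yet</i> <b>another</b> bicycle</p>")
    print(flatten_stripped(paragraph))    # yetanotherbicycle
    print(flatten_preserving(paragraph))  # yet another bicycle
```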
Anything interesting in implementation?
- Not really. All the parsing comes down to calling the right library functions. After that, I download all the pictures and replace <img> tags with <image l:href="#image_id"/>. Then, with a small set of hacks, I rearrange the parse trees: delete some tags, replace others, insert still others. Finally, I put it all together, add a header and a footer, and write the result to a file. The only not quite trivial part is replacing <br> tags with paragraphs: see make_paragraphs in conversion.py.
This is a straight road to govnokod.ru!
- Quite possibly. I am only beginning to explore what BeautifulSoup can do, so some things are surely not done the way they should be. Share your point of view in the comments and let's discuss.
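The two tree rewrites mentioned above can be sketched roughly as follows. This is not the code from conversion.py: replace_images and split_on_br are hypothetical names, the sketch uses the standard library's ElementTree instead of BeautifulSoup, and it assumes well-formed input with <br/> as a direct child of the container. In FB2 the data behind each image id normally lives in a <binary> element, so the returned mapping is what a caller would use to download and embed the pictures.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("l", XLINK_NS)


def replace_images(root):
    """Turn every <img src=...> into <image l:href="#image_N"/>.

    Returns a dict mapping each generated image id to the original src.
    """
    sources = {}
    for index, img in enumerate(list(root.iter("img"))):
        image_id = "image_%d" % index
        sources[image_id] = img.attrib.get("src", "")
        img.tag = "image"  # renaming in place keeps the tail text
        img.attrib.clear()
        img.set("{%s}href" % XLINK_NS, "#" + image_id)
    return sources


def split_on_br(container):
    """Return a list of <p> elements, one per <br/>-separated run of content."""
    paragraphs = []
    current = ET.Element("p")
    current.text = container.text
    # The trailing None is a sentinel that flushes the last paragraph.
    for child in list(container) + [None]:
        if child is None or child.tag == "br":
            if len(current) or (current.text or "").strip():
                paragraphs.append(current)
            current = ET.Element("p")
            current.text = child.tail if child is not None else None
        else:
            current.append(copy.deepcopy(child))  # the copy keeps its tail
    return paragraphs


if __name__ == "__main__":
    html = ('<div>First line<br/>Second <b>bold</b> line<br/>'
            '<img src="http://example.com/a.png"/></div>')
    root = ET.fromstring(html)
    print(replace_images(root))
    for p in split_on_br(root):
        print(ET.tostring(p, encoding="utf-8").decode("utf-8"))
```

Empty runs between consecutive <br/> tags are dropped here; whether the real make_paragraphs does the same is not stated in the article.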
I found another bug!
- BitBucket supports forks.

Source: https://habr.com/ru/post/116982/
