
Exporting Favorites from Habr to PDF. Version 2.0

Good day, Habr readers!



I think many of you have at some point had the idea of simply saving articles from Habr. The same thought occurred to me a little over a year ago.

I present to you the new version of the program for downloading articles from Habr, Geektimes and Megamozg in PDF format.

The new project is called HabraParse.

The project consists of a library that parses the sites and a script that uses only part of the library's capabilities. The script is written in Python 3 and requires the docopt, requests and weasyprint modules (each can be installed with the command pip install name, substituting the module's name).

Currently, the script has the following features:


The --gt / --mm options let you save articles from GeekTimes.ru and Megamozg.ru.

Brief description of script parameters
Usage:
  ./habraparse.py save_favs_list [--gt|--mm] <username> <out_file>
  ./habraparse.py save_favs [--gt|--mm] [-cn --save-html --limit=N] <username> <out_dir>
  ./habraparse.py save_post [--gt|--mm] [-c --save-html] <topic_id> <out_file>

By default, all commands work with the HabraHabr.ru site.
When the --gt / --mm options are specified, the script works with GeekTimes.ru / Megamozg.ru instead.

Commands:
  save_favs_list - save to <out_file> the list of URLs of the favorites of <username>
  save_favs - save to <out_dir> the favorites of <username>
  save_post - save to <out_file> the post with the given ID
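
The site selection described above (HabraHabr.ru by default, --gt / --mm for the other two) could be sketched roughly as follows. This is only an illustration: select_site, SITES and DEFAULT_SITE are hypothetical names and are not taken from habraparse; the dictionary keys simply mirror the shape of what docopt returns for the usage text.

```python
# Hypothetical helper, not part of habraparse. The keys follow the dict
# that docopt would produce for the usage text ("--gt": True/False, ...).
SITES = {
    "--gt": "geektimes.ru",
    "--mm": "megamozg.ru",
}
DEFAULT_SITE = "habrahabr.ru"


def select_site(args: dict) -> str:
    """Return the domain a command should work with."""
    chosen = [site for flag, site in SITES.items() if args.get(flag)]
    if len(chosen) > 1:
        # The usage text writes [--gt|--mm], so the flags are mutually exclusive.
        raise ValueError("--gt and --mm cannot be combined")
    return chosen[0] if chosen else DEFAULT_SITE
```

With no flag set the function falls back to the default site, which matches the behaviour described for the commands.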



Enjoy. If you run into errors, please send me a private message or file a bug on the project's GitHub page.
If something is missing for you, write a feature request in the comments and I will try to implement it as far as I can.



Technical details


In fact, Habraparse is first and foremost a library for working with information from the websites Habrahabr.ru, GeekTimes.ru and MegaMozg.ru, which allows:

The name chosen for the library is extremely original: habr.

User information is represented by the classes HabraUser, GeektimesUser and MegamozgUser of the habr.user module and includes:


Information on articles is represented by the classes HabraTopic, MegamozgTopic and GeektimesTopic of the habr.topic module and includes:


The script uses the habr library for parsing and the weasyprint library for generating PDF. Weasyprint was chosen because it has the easiest interface to use and because it was the only library I tried that managed to generate a proper PDF file. However, as it turned out, this library is very slow.
If you know other PDF generation libraries that work better, write in the comments or send me a private message. I'll say right away, though, that development was done under Python 3 from the start, so there is no need to tell me about excellent PDF libraries for Python 2.
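
For readers unfamiliar with weasyprint, the PDF step can be as small as the sketch below. It is an assumption about how such a step might look, not the script's actual code: build_page and save_pdf are hypothetical helpers, and only HTML(string=...).write_pdf(...) is the real weasyprint call.

```python
from html import escape
from pathlib import Path


def build_page(title: str, body_html: str) -> str:
    """Wrap an already-parsed article body in a minimal standalone HTML page."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>"
        f"<body><h1>{escape(title)}</h1>{body_html}</body></html>"
    )


def save_pdf(page_html: str, out_file: Path) -> None:
    """Render an HTML string to a PDF file with weasyprint."""
    # Imported lazily so build_page stays usable without weasyprint installed.
    from weasyprint import HTML

    HTML(string=page_html).write_pdf(str(out_file))
```

A call such as save_pdf(build_page("Example post", "<p>Article text</p>"), Path("post.pdf")) would then write a single-article PDF; the title is escaped, while the body is assumed to be trusted HTML that has already been extracted from the page.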

That's all. If you like it, use it in good health! And if anyone is ready to build their own script on top of this library, with blackjack and everything else, it is all in your hands!

UPD. By popular demand, the Docker container image icoz/habraparse has been updated. Usage instructions can be found here.

Source: https://habr.com/ru/post/250027/

