Good day, Habr readers!

I think many of you have at some point had the thought, "it would be nice to save articles from Habr."
The same thought came to me two days ago. I wanted to save not every article but only the ones in my favorites, and not one at a time but all of them in bulk.
My first idea was to write a script that would pull everything down. I had already learned Python, but I had not yet dealt with PDF generation.
I was stuck... But open source and Habr saved me!
A brief summary of the article for those who are not interested in reading a lot
The article describes fav2pdf, a modified Python script. The original author of the script is vrtx, to whom many thanks.
"Usage is better than a thousand words!"
usage: fav2pdf.py [-h] [-d OUTPUT_DIR] [--from-date FROM_DATE]
                  [--to-date TO_DATE] [--all-in-one]
                  [--only-hubs [ONLY_HUBS [ONLY_HUBS ...]]] [--no-comments]
                  [--no-symlinks]
                  user

Tool for save favorite posts from habrahabr.ru in pdf's or html's

positional arguments:
  user                  habrahabr.ru username

optional arguments:
  -h, --help            show this help message and exit
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Directory for output
  --from-date FROM_DATE
                        From date
  --to-date TO_DATE     To date
  --all-in-one          Save all posts in one PDF-file
  --only-hubs [ONLY_HUBS [ONLY_HUBS ...]]
                        Save only posts from hubs. For multiple: "--only-hubs
                        Hub1 Hub2 --"
  --no-comments         Dont save comments from posts
  --no-symlinks         Dont create symlinks to posts
  --create-html         Create html's instead of pdf's
  --create-url-list     Just save user.txt with all links
What the picture is assembled from
Thanks again to Habr. The KDPV (the attention-grabbing picture at the top of the post) is assembled from two images that appeared in articles one and two.
After the first thought, a second one came to me: surely I'm not the only one who has wanted this. So I began methodically searching Habr for something useful, and fairly quickly I came across an article by vrtx, in which he described roughly what I wanted.
But, as always, there is a BUT: his fav2pdf.py script collects all the articles from the favorites and merges them into one PDF without comments, and the comments often contain far more valuable information than the article itself.
I also wanted a separate PDF for each article, and a choice: with or without comments, with or without a breakdown by hubs...
Having forked the fav2pdf.py script (for which I bow to the open source movement), I first made minor improvements so that everything would be saved in separate files.
The author was glad to hear that his work had been useful to someone.
But then I felt a little ashamed. I had patched the script just enough to do what I needed, but not all Habr readers know Python well enough to adapt the script to their needs.
So I spent a little more time and turned the fav2pdf script into a genuinely useful utility.
Now the script allows you to:
- choose the directory where the PDFs are saved (when saving a set of PDFs, the posts and hubs subfolders are created inside it)
- set a date range (for example, save only articles from 2013)
- choose whether or not to save comments
- save everything in one PDF or split it into many PDFs
- if there are many PDFs, also create a catalog of symlinks (subfolders hubs/hub_name with symlinks to posts/post_id.pdf) to make the collection easier to navigate.
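To make the symlink layout from the last item concrete, here is a minimal sketch of how such a catalog could be built. It is not the actual fav2pdf code: the function name `link_post_into_hubs` and its parameters are hypothetical, it assumes hub names are usable as directory names, and it is written for Python 3, whereas the script itself targets Python 2.

```python
import os


def link_post_into_hubs(output_dir, post_id, hub_names):
    """Create hubs/<hub_name>/<post_id>.pdf symlinks that point at posts/<post_id>.pdf.

    Hypothetical helper: names and layout follow the description in the text,
    not the real fav2pdf implementation.
    """
    created = []
    pdf_name = "%s.pdf" % post_id
    for hub in hub_names:
        hub_dir = os.path.join(output_dir, "hubs", hub)
        os.makedirs(hub_dir, exist_ok=True)
        link_path = os.path.join(hub_dir, pdf_name)
        # A relative target keeps the links valid if the whole output directory is moved.
        target = os.path.join("..", "..", "posts", pdf_name)
        # lexists is true even for a dangling link, so reruns do not fail.
        if not os.path.lexists(link_path):
            os.symlink(target, link_path)
        created.append(link_path)
    return created
```

A post that belongs to several hubs is stored once under posts/ and only linked from each hub folder, so the collection does not grow with the number of hubs.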
Script dependencies
For the script to work correctly, I had to install the following packages via pip (for Python 2):
- pisa
- reportlab
- html5lib
- requests
- lxml
Command Line Utilities
usage: fav2pdf.py [-h] [-d OUTPUT_DIR] [--from-date FROM_DATE]
                  [--to-date TO_DATE] [--all-in-one]
                  [--only-hubs [ONLY_HUBS [ONLY_HUBS ...]]] [--no-comments]
                  [--no-symlinks]
                  user

Tool for save favorite posts from habrahabr.ru in pdf's or html's

positional arguments:
  user                  habrahabr.ru username

optional arguments:
  -h, --help            show this help message and exit
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Directory for output
  --from-date FROM_DATE
                        From date
  --to-date TO_DATE     To date
  --all-in-one          Save all posts in one PDF-file
  --only-hubs [ONLY_HUBS [ONLY_HUBS ...]]
                        Save only posts from hubs. For multiple: "--only-hubs
                        Hub1 Hub2 --"
  --no-comments         Dont save comments from posts
  --no-symlinks         Dont create symlinks to posts
  --create-html         Create html's instead of pdf's
  --create-url-list     Just save user.txt with all links
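For readers who want to see how a command line like this is usually declared, the sketch below rebuilds the same options with the standard argparse module. The option names and help strings are copied from the usage text; everything else (the `build_parser` function, the absence of default values, the use of `nargs="*"` for --only-hubs) is an assumption and may differ from the real script.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the fav2pdf.py command line from its usage text."""
    parser = argparse.ArgumentParser(
        prog="fav2pdf.py",
        description="Tool for save favorite posts from habrahabr.ru in pdf's or html's",
    )
    parser.add_argument("user", help="habrahabr.ru username")
    parser.add_argument("-d", "--output-dir", help="Directory for output")
    parser.add_argument("--from-date", help="From date")
    parser.add_argument("--to-date", help="To date")
    parser.add_argument("--all-in-one", action="store_true",
                        help="Save all posts in one PDF-file")
    # nargs="*" collects every following value, which is why the usage text
    # suggests ending the hub list with "--" before the username.
    parser.add_argument("--only-hubs", nargs="*",
                        help='Save only posts from hubs. For multiple: "--only-hubs Hub1 Hub2 --"')
    parser.add_argument("--no-comments", action="store_true",
                        help="Dont save comments from posts")
    parser.add_argument("--no-symlinks", action="store_true",
                        help="Dont create symlinks to posts")
    parser.add_argument("--create-html", action="store_true",
                        help="Create html's instead of pdf's")
    parser.add_argument("--create-url-list", action="store_true",
                        help="Just save user.txt with all links")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this declaration, putting the username first (fav2pdf.py someuser --only-hubs Hub1 Hub2) avoids the need for the trailing "--" altogether.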
That's all. If you like it, use it in good health!
If something is missing, write a feature request in the comments, and I will try to implement it as far as I can.
P.S. Implemented HTML output. (The command-line descriptions in the spoilers have been updated.)
P.P.S. Implemented TXT output of a plain list of post URLs. All restrictions that you can set from the command line (dates, hubs) apply to this list. (The command-line descriptions in the spoilers have been updated.)
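As an illustration of how the date and hub restrictions might be applied before the URL list is written, here is a minimal sketch. It is not taken from fav2pdf: the post dictionaries with "url", "date" and "hubs" keys, the function names, the inclusive date bounds and the "at least one matching hub" rule are all assumptions made for the example.

```python
from datetime import date


def select_posts(posts, from_date=None, to_date=None, only_hubs=None):
    """Return the posts that pass the date and hub restrictions.

    Assumes each post is a dict with "url", "date" (datetime.date) and "hubs" keys.
    Date bounds are treated as inclusive; a post passes the hub filter if it
    belongs to at least one of the requested hubs.
    """
    selected = []
    for post in posts:
        if from_date is not None and post["date"] < from_date:
            continue
        if to_date is not None and post["date"] > to_date:
            continue
        if only_hubs and not set(only_hubs) & set(post["hubs"]):
            continue
        selected.append(post)
    return selected


def write_url_list(posts, path):
    """Write one post URL per line, in the spirit of the user.txt output."""
    with open(path, "w") as out:
        for post in posts:
            out.write(post["url"] + "\n")
```

For example, select_posts(posts, from_date=date(2013, 1, 1), to_date=date(2013, 12, 31)) would keep only the 2013 articles, and the result can be passed straight to write_url_list.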