Creating an FB2 version of the latest issue of a magazine / newspaper

Background

More and more magazines and newspapers now publish their latest issues online (Vedomosti, Expert, Esquire, etc.). That is all well and good, with one exception: you need an internet connection to read them.
The trouble is that a connection is not available everywhere (the metro, for example), and not every device can use one (most e-ink readers cannot).
Hence the idea: it would be nice to make copies of periodicals as e-books (for example, in FB2 format).

Task

Build a solution that generates an FB2 file from the part of a site containing the desired issue of a magazine (for example, Expert No. 32 for 2010).
The file must contain the pictures; a table of contents listing the articles is desirable.
Creating the file for a new issue should be (semi-)automatic, take no more than 5-10 minutes, and require no serious manual processing.

Search for a solution

Anything-to-FB2 converters

As it turned out, HTML-to-FB2 converters are few and far between, and none of them can automatically process a batch of HTML pages, build a correct table of contents, and set up the links. ~~Although maybe I searched badly or did not understand what the tools I found could do.~~
To start with, I tried all the editors described in the review of computers.

"Any to FB2" completely mangled the Cyrillic text (most likely a case of clumsy hands) and is designed to work with a single page.
"FictionBook Designer" is a powerful tool, but it has no automatic conversion feature (or I did not find one).
Web2FB2 comes closest to what is needed, but it has a 10-page limit and dumps everything into one heap without a table of contents.

Further searching led to the remarkable FeedConverter service (it has already been written about on Habr).
Testing on the first Russian-language RSS feed that came to hand showed that the service:

copes with Cyrillic
generates a table of contents as a list of entries
pulls in the pictures

That is, to get the result it is now enough to give this service a feed that contains the full text of the issue's articles.
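To make the idea concrete, here is a minimal sketch of what a feed-to-FB2 conversion involves: each RSS item becomes one FB2 section with its own title, which is what a reader uses to build the table of contents. This is not how FeedConverter itself works (its internals are not described here). The function name `rss_to_fb2` is hypothetical, the description is assumed to be plain text with one paragraph per line, and pictures and the metadata fields a fully valid FB2 file needs are left out.

```python
import xml.etree.ElementTree as ET

# Namespace of the FictionBook 2.0 format.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"


def rss_to_fb2(rss_xml: str, book_title: str) -> str:
    """Turn each <item> of an RSS 2.0 feed into one FB2 <section>."""
    ET.register_namespace("", FB2_NS)  # serialize with a default xmlns
    channel = ET.fromstring(rss_xml).find("channel")

    book = ET.Element(f"{{{FB2_NS}}}FictionBook")
    description = ET.SubElement(book, f"{{{FB2_NS}}}description")
    title_info = ET.SubElement(description, f"{{{FB2_NS}}}title-info")
    ET.SubElement(title_info, f"{{{FB2_NS}}}book-title").text = book_title

    body = ET.SubElement(book, f"{{{FB2_NS}}}body")
    for item in channel.findall("item"):
        section = ET.SubElement(body, f"{{{FB2_NS}}}section")
        title = ET.SubElement(section, f"{{{FB2_NS}}}title")
        ET.SubElement(title, f"{{{FB2_NS}}}p").text = item.findtext("title", default="")
        # Assumption: plain-text description, one paragraph per line.
        for line in item.findtext("description", default="").splitlines():
            if line.strip():
                ET.SubElement(section, f"{{{FB2_NS}}}p").text = line.strip()

    return ET.tostring(book, encoding="unicode")
```

The section titles double as the table of contents, so no separate index has to be generated.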

Full-text RSS feed

The site in question does not provide a full-text RSS feed of its issues, only annotations of the latest issue.
Yahoo Pipes is convenient for building a full-text RSS feed. We give it our feed and load the full text of each article in a loop: http://pipes.yahoo.com/pipes/pipe.edit?_id=661b8231fa3df88317939d452e772c10 . If the site provides no RSS feed at all and only publishes the articles (as Esquire does), the Yahoo Pipes engine can parse the page content, extract the links from it, and download the required articles. For this I created the pipe http://pipes.yahoo.com/pipes/pipe.edit?_id=85427a7ff66aa7c06a1fa8da677fbd25
This approach has the advantage that it can fetch any issue, not just the latest one.
To do so, just change the parameters in the call URL that set the year and the issue number within the year: http://pipes.yahoo.com/pipes/pipe.run?_id=85427a7ff66aa7c06a1fa8da677fbd25&_render=rss&number=31&year=2010 .
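If issues are fetched regularly, the call URL can be assembled by a small helper instead of being edited by hand. The sketch below only builds the string; the pipe ID and the parameter names `_id`, `_render`, `number` and `year` come from the URL above, while the function name `issue_feed_url` is made up for illustration.

```python
from urllib.parse import urlencode

PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "85427a7ff66aa7c06a1fa8da677fbd25"  # the Esquire pipe from the text


def issue_feed_url(number: int, year: int) -> str:
    """Build the RSS URL for a given issue number within a given year."""
    query = urlencode(
        {"_id": PIPE_ID, "_render": "rss", "number": number, "year": year}
    )
    return f"{PIPE_RUN_URL}?{query}"


print(issue_feed_url(31, 2010))
```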

Summary

The final algorithm for creating FB2 versions of periodicals is as follows:

Find a site with the content
Take its RSS feed or index page
Parse the page in Yahoo Pipes and pull in the full articles
Feed the pipe to FeedConverter and pick up the FB2 book
??????
PROFIT!
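For sites without RSS, the parsing step amounts to collecting article links from the index page of an issue. Outside Yahoo Pipes the same idea could look roughly like the sketch below, which uses only the Python standard library. The class and function names, the `example.com` addresses and the `path_marker` filter are all assumptions for illustration; a real site needs its own rule for telling article links from navigation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose path contains a marker."""

    def __init__(self, base_url, path_marker):
        super().__init__()
        self.base_url = base_url
        self.path_marker = path_marker  # hypothetical filter, e.g. "/articles/"
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.path_marker in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep first occurrence, preserve order
                self.links.append(url)


def article_links(index_html, base_url, path_marker):
    collector = ArticleLinkCollector(base_url, path_marker)
    collector.feed(index_html)
    return collector.links
```

Each collected URL would then be downloaded and turned into one feed entry with the full text.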

Once the pipe is configured, getting a new issue in FB2 comes down to opening the FeedConverter website and pressing the generate button.

A fly in the ointment

Because Yahoo Pipes is not very fast, generation may not complete on the first attempt. Hopefully the creators of FeedConverter will do something about this.
~~Yahoo Pipes limits how much CPU time a single pipe may consume.~~ ~~As a result, some large issues of the magazine do not fit into this Procrustean bed and fail with an error (for example, Expert No. 1 for 2010).~~ ~~What to do about this is not clear.~~ ~~Perhaps it is worth splitting the parsing and the loading of texts into separate pipes.~~
Loading of the full articles can be moved to ReadBox.info (see below)
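When generation is driven by a script rather than by hand, the "not on the first attempt" problem is usually handled by retrying with a growing pause. The helper below is a generic sketch of that pattern, not something FeedConverter or Yahoo Pipes provides; `with_retries` and its parameters are hypothetical, and `action` stands for whatever call fetches the feed or the finished book.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the pause after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # assumption: narrow this to timeouts in real use
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter is injectable so the helper can be tried out without actually waiting.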

Update 1: A great service for creating full-text feeds, ReadBox.info, was suggested in the comments below. To get a full-text feed, you give it an RSS feed and the XPath of the block containing the text. The text-loading function can thus be removed from Yahoo Pipes, which lets the pipe run more reliably.
So the process can now look like this:

Find a site with the content
Take its RSS feed or index page
Keep only the required articles, or parse the index page in Yahoo Pipes
Pull in the full text of the articles using ReadBox.info
Feed the RSS to FeedConverter and pick up the FB2 book
??????
PROFIT!
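The XPath mentioned in Update 1 simply points at the element that wraps the article text. As a rough illustration of what such an expression selects, the sketch below applies one to a page using the Python standard library. It is not ReadBox.info's implementation: `extract_text_block` is a hypothetical name, the page is assumed to be well-formed XHTML (real pages often are not), and `xml.etree.ElementTree` understands only a subset of XPath, so a fuller HTML parser would be needed in practice.

```python
import xml.etree.ElementTree as ET


def extract_text_block(xhtml: str, xpath: str) -> str:
    """Return the paragraphs inside the first element matched by xpath."""
    root = ET.fromstring(xhtml)
    block = root.find(xpath)
    if block is None:
        raise ValueError(f"no element matches {xpath!r}")
    # Join the text of each <p>, including text inside nested tags like <b>.
    paragraphs = ("".join(p.itertext()).strip() for p in block.iter("p"))
    return "\n".join(text for text in paragraphs if text)


page = (
    "<html><body>"
    "<div class='menu'><p>Navigation</p></div>"
    "<div class='article'><p>First <b>bold</b> paragraph.</p><p>Second.</p></div>"
    "</body></html>"
)
print(extract_text_block(page, ".//div[@class='article']"))
```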

Update 2: For those who are not afraid of digging through configs and reading manuals, there is an excellent program for this purpose: nmdparser. Here is an example of how it can be configured to produce an archive copy of the Auto Retrieval in FB2.

Source: https://habr.com/ru/post/102106/
