
Creating an FB2 version of the latest issue of a magazine or newspaper


More and more magazines and newspapers now publish their latest issues online (Vedomosti, Expert, Esquire, etc.). These online issues are fine with one exception: you need an Internet connection to read them.
The problem is that the Internet is not available everywhere (the metro, for example), and not every device can connect to it (most e-ink readers cannot).
Hence the idea: it would be nice to make copies of periodicals in the form of e-books (for example, in FB2 format).


The task: create a solution that generates an FB2 file from the part of a site that contains the desired issue of a magazine (for example, Expert No. 32 for 2010).
The file should contain pictures; a table of contents listing the articles is desirable.
Creating a file for a new issue should happen in (semi-)automatic mode, take no more than 5-10 minutes, and require no serious manual processing.

Search for a solution

Anything -> FB2 converter

As it turned out, HTML -> FB2 converters are few and far between. And none of them can automatically process a batch of HTML pages, correctly compile a table of contents, and set up the links. Then again, maybe I did not search well enough or did not understand the capabilities of what I found.
For a start, I tried all the editors described in the review of computers.
Further searching led me to the remarkable FeedConverter service (this service has already been written about on Habr).
Testing on the first Russian RSS feed showed that the service:
That is, to get the result it is now enough to give this service a feed that contains the full text of the issue's articles.

Full-text RSS feed

The site in question does not provide a full-text RSS feed for a given issue, only annotations for the latest issue.
Yahoo Pipes is a convenient way to create a full-text RSS feed. We give it our feed, and in a loop it loads the full text of each article - http://pipes.yahoo.com/pipes/pipe.edit?_id=661b8231fa3df88317939d452e772c10 . If the site does not provide an RSS feed at all and only publishes articles (as Esquire does), the Yahoo Pipes engine lets you parse the content of a page, extract the links from it, and download the necessary articles. For this I created the pipe http://pipes.yahoo.com/pipes/pipe.edit?_id=85427a7ff66aa7c06a1fa8da677fbd25
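The link-extraction step can also be sketched outside Yahoo Pipes. The snippet below is only an illustration in Python, not what the pipe does internally; the sample markup, the `example.com` base URL, and the `/2010/31/` path marker are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose href contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            url = urljoin(self.base_url, href)
            # Skip duplicates but keep the order in which links appear.
            if url not in self.links:
                self.links.append(url)


def extract_article_links(html, base_url, marker):
    parser = ArticleLinkParser(base_url, marker)
    parser.feed(html)
    return parser.links


# Hypothetical fragment of an issue's index page.
SAMPLE_INDEX = """
<ul>
  <li><a href="/2010/31/first-article/">First</a></li>
  <li><a href="/about/">About</a></li>
  <li><a href="/2010/31/second-article/">Second</a></li>
  <li><a href="/2010/31/first-article/">First again</a></li>
</ul>
"""

links = extract_article_links(SAMPLE_INDEX, "http://example.com/", "/2010/31/")
for link in links:
    print(link)
```

Filtering by a substring of the path is the simplest possible rule; a real index page may need a stricter condition to separate articles from navigation links.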
An advantage of this mechanism is that it lets you get any issue, not just the latest one.
To do this, you only need to change the parameters in the call URL that set the year and the issue number within the year: http://pipes.yahoo.com/pipes/pipe.run?_id=85427a7ff66aa7c06a1fa8da677fbd25&_render=rss&number=31&year=2010 .
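As a small illustration, the call URL for any issue can be assembled programmatically. The parameter names (`_id`, `_render`, `number`, `year`) and the pipe id come from the URL above; the function name `issue_feed_url` is made up for this sketch.

```python
from urllib.parse import urlencode

PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "85427a7ff66aa7c06a1fa8da677fbd25"  # pipe id taken from the article


def issue_feed_url(number, year):
    """Build the pipe URL for a given issue number within a given year."""
    params = {"_id": PIPE_ID, "_render": "rss", "number": number, "year": year}
    return PIPE_RUN_URL + "?" + urlencode(params)


url = issue_feed_url(31, 2010)
print(url)
```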


The final algorithm for creating an FB2 version of a periodical is as follows:
  1. Find a site with the content
  2. Take its RSS feed or index page
  3. Parse the page in Yahoo Pipes and pull in the full articles
  4. Feed the pipe to FeedConverter and pick up the FB2 book
  5. ??????
  6. PROFIT!
Once the pipe is configured, getting a new issue in FB2 comes down to opening the FeedConverter website and pressing the generate button.
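To make the last conversion step more concrete, here is a rough Python sketch of turning a full-text RSS feed into an FB2 skeleton with one section per article. It is not how FeedConverter works internally; the sample feed is invented, item descriptions are treated as plain text, and the result omits pictures and the extra description fields that a strict FB2 validator may require.

```python
import xml.etree.ElementTree as ET

FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"


def rss_to_fb2(rss_xml, book_title):
    """Build a minimal FB2 document: one <section> per RSS <item>."""
    channel = ET.fromstring(rss_xml).find("channel")

    book = ET.Element("FictionBook", {"xmlns": FB2_NS})
    description = ET.SubElement(book, "description")
    title_info = ET.SubElement(description, "title-info")
    ET.SubElement(title_info, "book-title").text = book_title

    body = ET.SubElement(book, "body")
    for item in channel.findall("item"):
        section = ET.SubElement(body, "section")
        # Section titles are what readers typically list as the table of contents.
        title = ET.SubElement(section, "title")
        ET.SubElement(title, "p").text = item.findtext("title", default="")
        # Assumption: the description is plain text, one paragraph per line.
        text = item.findtext("description", default="")
        for line in text.split("\n"):
            if line.strip():
                ET.SubElement(section, "p").text = line.strip()
    return ET.tostring(book, encoding="unicode")


# Invented sample feed standing in for the full-text RSS of an issue.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Sample magazine</title>
<item><title>First article</title><description>Para one.
Para two.</description></item>
<item><title>Second article</title><description>Only paragraph.</description></item>
</channel></rss>"""

fb2 = rss_to_fb2(SAMPLE_RSS, "Sample magazine, issue 31")
print(fb2)
```

Pictures would additionally need `binary` elements holding base64-encoded image data, which this sketch leaves out.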

A fly in the ointment

Update 1: In the comments below, readers suggested a great service for creating full-text feeds: ReadBox.info. To get a full-text feed, you give it an RSS feed and the XPath of the block that contains the text. This means the text-loading function can be removed from Yahoo Pipes, which lets the pipe work more reliably.
So now the process can look like this:
  1. Find a site with the content
  2. Take its RSS feed or index page
  3. Keep only the necessary articles, or parse the index page in Yahoo Pipes
  4. Pull in the full text of the articles using ReadBox.info
  5. Feed the RSS to FeedConverter and pick up the FB2 book
  6. ??????
  7. PROFIT!
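The idea of selecting the text block by XPath, as in the ReadBox.info step above, can be illustrated with a short Python sketch. This is not ReadBox.info's implementation: the sample page and the `article-text` class name are hypothetical, and the standard library's ElementTree supports only a subset of XPath and requires well-formed markup, so real pages would need a more lenient HTML parser.

```python
import xml.etree.ElementTree as ET


def extract_text_block(xhtml, xpath):
    """Return the whitespace-normalized text of the first node matching xpath."""
    root = ET.fromstring(xhtml)
    node = root.find(xpath)
    if node is None:
        return None
    # itertext() walks all nested text, so inline tags such as <b> are flattened.
    return " ".join("".join(node.itertext()).split())


# Hypothetical, well-formed article page.
SAMPLE_PAGE = """<html><body>
<div class="menu">Home | Archive</div>
<div class="article-text">
  <p>First paragraph.</p>
  <p>Second <b>bold</b> paragraph.</p>
</div>
</body></html>"""

text = extract_text_block(SAMPLE_PAGE, ".//div[@class='article-text']")
print(text)
```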

Update 2: For those who are not afraid of digging through configs and reading manuals, there is an excellent program for this purpose: nmdparser. Here is an example of how it can be configured to produce an archive copy of the Auto Retrieval in FB2.

Source: https://habr.com/ru/post/102106/
