
Background
More and more magazines and newspapers now post their latest issues online (Vedomosti, Expert, Esquire, etc.). These online issues are fine with one exception: you need the Internet to read them.
The problem with the Internet is that it is not available everywhere (the metro, for example) and not every device can connect to it (most e-ink readers cannot).
Hence the idea: it would be nice to make copies of periodicals as e-books (for example, in FB2 format).
Task
Create a solution that generates an FB2 file from the part of a site that contains the desired issue of a magazine (for example, Expert No. 32 for 2010).
The file must contain pictures; a table of contents listing the articles is desirable.
Creating a file for a new issue should happen in (semi-)automatic mode, take no more than 5-10 minutes, and require no serious manual processing.
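To make the target concrete, here is a minimal sketch of the kind of file the task asks for: one FB2 section per article, each with a title that a reader application can use for its table of contents. The function name build_fb2 and the sample data are made up for illustration. Pictures and most of the metadata that a schema-valid FB2 description needs are deliberately left out, so treat this as an outline rather than a finished generator.

```python
import xml.etree.ElementTree as ET

# Namespace used by FictionBook 2.0 documents.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"


def build_fb2(issue_title, articles):
    """Return FB2-like XML text with one <section> per article.

    `articles` is a list of (title, [paragraph, ...]) tuples.
    Hypothetical helper: metadata and images are omitted.
    """
    ET.register_namespace("", FB2_NS)  # emit FB2 as the default namespace

    def q(tag):
        return "{%s}%s" % (FB2_NS, tag)

    root = ET.Element(q("FictionBook"))
    description = ET.SubElement(root, q("description"))
    title_info = ET.SubElement(description, q("title-info"))
    ET.SubElement(title_info, q("book-title")).text = issue_title

    body = ET.SubElement(root, q("body"))
    for title, paragraphs in articles:
        section = ET.SubElement(body, q("section"))
        title_el = ET.SubElement(section, q("title"))
        ET.SubElement(title_el, q("p")).text = title
        for text in paragraphs:
            ET.SubElement(section, q("p")).text = text

    return ET.tostring(root, encoding="unicode")
```

Calling build_fb2("Expert No. 32", [("First article", ["Text..."])]) returns the XML as a string, which can then be written to a .fb2 file in UTF-8.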
Search for a solution
Anything-to-FB2 converters
As it turned out, HTML -> FB2 converters are few and far between, and none of them can automatically process a batch of HTML pages, build a correct table of contents, and set up the links.
Then again, maybe I did not search well enough or did not understand the capabilities of what I found. To start with, I tried all the editors described in the review of computers.
- "Any to FB2" - completely mangled the Cyrillic text (most likely because of clumsy hands) and is designed to work with a single page.
- "FictionBook Designer" - a powerful tool, but it has no automatic conversion function (or I did not find one).
- Web2FB2 - the closest to what is needed, but it has a 10-page limit and dumps everything into one pile without a table of contents.
Further searching led me to a remarkable service, FeedConverter (it has already been written about on Habr).
Testing on the first Russian-language RSS feed that came to hand showed that the service:
- handles Cyrillic
- generates a table of contents as a list of entries
- includes pictures
In other words, to get the result it is now enough to feed this service a feed that contains the full text of the issue's articles.
Full-text RSS feed
The site in question does not provide a full-text RSS feed for a given issue, only annotations of the latest issue.
Yahoo Pipes is convenient for creating a full-text RSS feed. We give it our feed and load the full text of each article in a loop:
http://pipes.yahoo.com/pipes/pipe.edit?_id=661b8231fa3df88317939d452e772c10
If the site does not provide an RSS feed at all and only publishes articles (as Esquire does), the Yahoo Pipes engine can parse the content of a page, extract the links from it, and download the necessary articles. For this I created a pipe:
http://pipes.yahoo.com/pipes/pipe.edit?_id=85427a7ff66aa7c06a1fa8da677fbd25
An advantage of this mechanism is that it can fetch any issue, not just the latest one. To do so, just change the parameters in the call URL that set the year and the issue number within the year:
http://pipes.yahoo.com/pipes/pipe.run?_id=85427a7ff66aa7c06a1fa8da677fbd25&_render=rss&number=31&year=2010
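Since only the number and year parameters change from issue to issue, the call URL can be assembled by a small helper. The sketch below only builds the string shown above and makes no network request; the function name issue_feed_url is made up for illustration.

```python
from urllib.parse import urlencode

PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "85427a7ff66aa7c06a1fa8da677fbd25"  # pipe id taken from the article


def issue_feed_url(number, year):
    """Build the RSS URL of the pipe for a given issue number and year."""
    params = [
        ("_id", PIPE_ID),
        ("_render", "rss"),
        ("number", number),
        ("year", year),
    ]
    return PIPE_RUN_URL + "?" + urlencode(params)
```

For example, issue_feed_url(31, 2010) reproduces the URL for issue No. 31 of 2010.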
Summary
The final algorithm for creating an FB2 copy of a periodical is as follows:
- Find a site with the content
- Take its RSS feed or index page
- Parse the page in Yahoo Pipes and pull in the full articles
- Feed the pipe to FeedConverter and pick up the FB2 book
- ??????
- PROFIT!
Once the pipe is configured, getting a new issue in FB2 comes down to opening the FeedConverter website and pressing the generate button.
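The "parse the index page" step is done inside Yahoo Pipes in this article, but the idea is easy to show outside it. The following is a rough sketch, not a description of how the pipe works internally: it collects article links from an index page using only the standard library. The class and function names and the must_contain filter are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that match a substring."""

    def __init__(self, base_url, must_contain):
        super().__init__()
        self.base_url = base_url
        self.must_contain = must_contain  # hypothetical filter, e.g. "/2010/32/"
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.must_contain in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # skip duplicates, keep page order
                self.links.append(url)


def article_links(index_html, base_url, must_contain):
    """Return the article URLs found in the HTML of an index page."""
    collector = LinkCollector(base_url, must_contain)
    collector.feed(index_html)
    return collector.links
```

Each returned URL would then be downloaded and turned into one entry of the full-text feed.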
A fly in the ointment
- Because Yahoo Pipes is not very fast, generation may not complete on the first attempt. Hopefully the creators of FeedConverter will do something about this.
- Yahoo Pipes limits how much CPU time a single pipe can consume. Because of this, some large issues of the magazine do not fit into this Procrustean bed and fail with an error (for example, Expert No. 1 for 2010). It is not clear what to do about this. Perhaps parsing and loading the texts should be split across different pipes.
- Loading the full articles can be handed off to ReadBox.info (see below).
Update 1: In the comments below, a great service for creating full-text feeds was suggested: ReadBox.info. To get a full-text feed, you give it the RSS feed and the XPath of the block that contains the text. This way the text-loading function can be removed from Yahoo Pipes, which should make it work more stably.
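For readers unfamiliar with XPath, here is a small sketch of what "the XPath of the block with the text" means. It says nothing about how ReadBox.info is implemented; it only shows a block being selected by an XPath expression. The function name extract_text and the sample markup are made up, and the standard-library parser used here requires well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET


def extract_text(xhtml, xpath):
    """Return the text of the first element matching `xpath`, or None.

    Assumes `xhtml` is well-formed XML; ElementTree supports only a
    limited subset of XPath.
    """
    root = ET.fromstring(xhtml)
    block = root.find(xpath)
    if block is None:
        return None
    # Join all text nodes and collapse runs of whitespace.
    return " ".join(" ".join(block.itertext()).split())
```

With a page where the article sits in a div with class "article", the expression ".//div[@class='article']" selects that block and skips menus and other page furniture.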
So now the process can look like this:
- Find a site with the content
- Take its RSS feed or index page
- Keep only the necessary articles, or parse the index page in Yahoo Pipes
- Pull in the full text of the articles using ReadBox.info
- Feed the RSS to FeedConverter and pick up the FB2 book
- ??????
- PROFIT!
Update 2: For those who are not afraid of digging through configs and reading manuals, there is an excellent program for our purpose: nmdparser. Here is an example of how it can be configured to get an archive copy of the Auto Retrieval in FB2.