Own selection of news. New release, new features

In the previous post you asked me to add some features to the program, so I’ll go straight to the new ones:

Support for RSS and Atom feeds (now you can subscribe to your shared items in Google Reader);
A regular HTML page can be specified as a source (now you can collect material you like from different sites and put it into a book);
Printing to fb2 format (text only so far, but the next release will definitely support pictures; print preview for fb2 is also still rough);
Scheduler for downloading updates (disabled by default; in this release its settings are not saved across restarts);
A new clear.tags field in the page parser templates (lets you remove unnecessary tags from the content; it supports Perl-like regular expressions, which can be separated from one another with the ; character);
Import of source links from a txt file (one link per line; useful for those who want to compile their own books);
Fixed several errors in fetching page contents.

Further plans include optimizing the application (at the moment loading and printing take a very long time when there are many sources), fixing errors, improving usability, and localization.

I do not plan to add new features in the next release.
You can download it from here; there are builds for Windows and Linux. The current version is 1.1.

Everything is, as usual, free and open source, but if you want to thank me, I will not say no to:

help with testing and bug reports, either here in the comments or (much better) in the project's tracker;
original icons and a splash screen designed in a consistent style;
help with localizing the application (I will let those interested know later and hand out a test version);
help with writing documentation and help files, and later with localizing them;
building packages for Linux distributions and installers for Windows;
modest donations ;)

Upd. 1 If you have trouble running the Linux build, you most likely have outdated Qt libraries installed.
The startup error is: Segmentation fault.
You can update the libraries by downloading qt-sdk and adding the path to qt/lib to the LD_LIBRARY_PATH environment variable,
or by adding a repository with fresh packages:
for example, on Ubuntu 9.10, run sudo add-apt-repository ppa:bausparfuchs/qt4.6-release (if you have another version of Ubuntu, follow the link).

Upd. 2 An example of adding your own page-parsing rules (at the request of user myther).
Suppose we want to read messages from the forums.goha.ru forum.
In the page-templates folder we create a new file, forums-goha.properties (the name can be anything; only the file extension matters).
Add the first field to the file:
rss.host=forums.goha.ru (rss.host is the site address)
Next, open any forum thread whose content we want to recognize, for example forums.goha.ru/showthread.php?t=388919 .
In the browser, choose to view the page source.
Next, we find the tags that enclose the content of interest; preferably they should be unique. In our case the entire content sits between <div id="posts"> and </div>, but inside it there are several more nested div tags, and we cannot say for sure how many, so a template built on the closing </div> would be inaccurate. However, immediately after the content div closes, a new unique tag opens, <div id="lastpost">, and we will use it instead. Thus content.pattern will look like this:
content.pattern=<div id=\"posts\">(.*)<div id=\"lastpost\">
The \ character escapes the double quotes, and the combination (.*) means any character, any number of times.
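To see what such a pattern captures, here is a minimal Python sketch. It is not the program's own parser: the page fragment is made up, and the use of re.DOTALL (so that . also matches line breaks) is my assumption about how the matching should behave on multi-line pages.

```python
import re

# Hypothetical page fragment shaped like the forum thread described above.
page = (
    '<html><body>'
    '<div id="posts">'
    '<div class="post">First message</div>\n'
    '<div class="post">Second message</div>'
    '</div>'
    '<div id="lastpost"></div>'
    '</body></html>'
)

# The pattern from the template; in Python's re, \" simply matches a plain ".
content_pattern = r'<div id=\"posts\">(.*)<div id=\"lastpost\">'

# Assumption: DOTALL is used so that (.*) can span several lines.
match = re.search(content_pattern, page, re.DOTALL)

whole_match = match.group(0)  # everything, including both boundary tags
inner = match.group(1)        # only what (.*) captured

print(whole_match)
print(inner)
```

Note that the whole match ends with the opening <div id="lastpost"> tag, while the captured group ends with the closing </div> of the posts block.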
As a result, the parser will take the content between the specified tags, including the tags themselves, which is not ideal: the last tag is only an opening tag and needs to be removed. This is very simple to do; just add this line to the file:
clear.tags=<div id=\"lastpost\">
If you need to remove more than one tag, you can list them separated by semicolons, for example:
clear.tags=<div id=\"lastpost\">;<a id=\"lastpost\">;<b id=\"lastpost\">
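Roughly, a semicolon-separated clear.tags value can be thought of as a list of regular expressions applied one after another. The Python sketch below only illustrates that idea: the function name apply_clear_tags and the second pattern are hypothetical, and the naive split assumes that no pattern contains a semicolon of its own.

```python
import re

def apply_clear_tags(content, clear_tags):
    """Remove every match of each ;-separated pattern from content."""
    for pattern in clear_tags.split(';'):
        pattern = pattern.strip()
        if pattern:  # skip empty entries such as a trailing ;
            content = re.sub(pattern, '', content)
    return content

# First pattern is from the text; the <br> pattern is a made-up second entry.
clear_tags = r'<div id=\"lastpost\">;<br */?>'

content = '<p>Hello<br/>world<br></p><div id="lastpost">'
cleaned = apply_clear_tags(content, clear_tags)
print(cleaned)
```

Because each entry is a regular expression rather than a literal string, one entry can cover several spellings of the same tag, as the <br> pattern does here.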
The last field,
clear.html, takes only two values (true or false) and enables or disables removal of all HTML tags from the text. For now we will use
clear.html=false
That is, we will not strip the tags from the text.

Thus we obtain a file with the following contents:

rss.host=forums.goha.ru
content.pattern=<div id=\"posts\">(.*)<div id=\"lastpost\">
clear.tags=<div id=\"lastpost\">
clear.html=false
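
If you want to check a template before dropping it into page-templates, the whole file can be tried out with a short Python sketch like the one below. This is only an approximation of what the program does, under several assumptions: the key=value parsing is simplified, the whole match (tags included) is taken as the content, re.DOTALL is used, and the names parse_template and extract_content as well as the sample page are made up.

```python
import re

# The template file from above, kept verbatim.
TEMPLATE = r'''
rss.host=forums.goha.ru
content.pattern=<div id=\"posts\">(.*)<div id=\"lastpost\">
clear.tags=<div id=\"lastpost\">
clear.html=false
'''

def parse_template(text):
    """Read key=value lines, splitting on the first = only."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        fields[key.strip()] = value.strip()
    return fields

def extract_content(page, fields):
    """Apply content.pattern, then clear.tags, then clear.html."""
    match = re.search(fields['content.pattern'], page, re.DOTALL)
    if match is None:
        return None
    content = match.group(0)  # assumption: boundary tags are included
    for pattern in fields.get('clear.tags', '').split(';'):
        if pattern.strip():
            content = re.sub(pattern.strip(), '', content)
    if fields.get('clear.html', 'false').lower() == 'true':
        content = re.sub(r'<[^>]+>', '', content)  # crude tag stripping
    return content

fields = parse_template(TEMPLATE)
page = '<div id="posts"><b>Hello</b></div><div id="lastpost"></div>'
text = extract_content(page, fields)
print(text)
```

Switching clear.html to true in the template would leave only the plain text of the matched fragment.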

Source: https://habr.com/ru/post/84496/
