
Your own news selection. New release, new features

In the previous topic you asked me to add some features to the program, so I'll get straight to
the new features:



Further plans include optimizing the application's performance (with a large number of sources, loading and displaying currently take a very long time), fixing bugs, improving usability, and localization.

I do not plan to add new features in the next release.
You can download it here; there are builds for Windows and Linux. The current version is 1.1.

As usual, everything is free and open source, but if you want to thank me, I won't refuse.
Upd. 1 If you have trouble running the Linux build, you most likely have outdated Qt libraries installed.
Error at startup: Segmentation fault.
You can update the libraries by downloading qt-sdk and adding the path to its qt/lib directory to the LD_LIBRARY_PATH environment variable,
or by adding a repository with fresh packages:
for example, on Ubuntu 9.10, run sudo add-apt-repository ppa:bausparfuchs/qt4.6-release (if you have a different version of Ubuntu, follow the link).


Upd. 2 An example of adding your own page-parsing rules (at the request of user myther).
Suppose we want to read posts from the forums.goha.ru forum.
In the page-templates folder, create a new file, forums-goha.properties (any name will do; only the file extension matters).
Add the first field to the file:
rss.host=forums.goha.ru (rss.host is the site address)
Next, open any forum thread whose content we want to recognize, for example forums.goha.ru/showthread.php?t=388919.
In the browser, choose to view the page source.
Next, find the tags that enclose the content of interest; preferably they should be unique. In our case, all of the content sits between <div id="posts"> and </div>, but inside it there are several nested div tags, and we cannot say for certain how many, so a template ending at </div> would be inaccurate. However, right after the content div closes, a new unique tag opens, <div id="lastpost">, and we will use that instead. The content.pattern therefore looks like this:
content.pattern=<div id=\"posts\">(.*)<div id=\"lastpost\">
The \ character escapes the double quotes, and the combination (.*) means any character, any number of times.
As a result, the parser takes the content between the specified tags, including the tags themselves, which is not ideal: the last tag is an opening one and has to be removed. That is easy to do; just add this line to the file:
clear.tags=<div id=\"lastpost\">
If you need to remove more than one tag, list them separated by semicolons, for example:
clear.tags=<div id=\"lastpost\">;<a id=\"lastpost\">;<b id=\"lastpost\">
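To make the effect of content.pattern and clear.tags easier to see, here is a minimal Python sketch of the same idea. It is not the application's own code: the sample page, the extract function, and the use of re.DOTALL are assumptions made for illustration, and the application's regex engine may behave differently.

```python
import re

# Hypothetical page fragment; the real forum markup will differ.
page = '''<html><body>
<div id="posts">
<div class="post">First post</div>
<div class="post">Second post</div>
</div>
<div id="lastpost"></div>
</body></html>'''

# Values as they would read after the \" escaping is undone.
content_pattern = r'<div id="posts">(.*)<div id="lastpost">'
clear_tags_value = '<div id="lastpost">'


def extract(html, pattern, tags_value):
    """Take the matched block, boundary tags included, then drop the listed tags."""
    # Assumption: "." should also match line breaks, hence re.DOTALL.
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return None
    # The text says the boundary tags are part of the result, so use the whole match.
    content = match.group(0)
    # clear.tags holds one or more tags separated by semicolons.
    for tag in tags_value.split(";"):
        tag = tag.strip()
        if tag:
            content = content.replace(tag, "")
    return content


result = extract(page, content_pattern, clear_tags_value)
print(result)
```

The opening <div id="posts"> stays in the result because it is closed by its own </div> inside the match; only the dangling <div id="lastpost"> is removed.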
The last field,
clear.html, takes only two values (true or false) and enables or disables the removal of all HTML tags from the text. For now we will use
clear.html=false
That is, we will not remove tags from the text.
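As a rough illustration of the difference between the two clear.html values, the Python sketch below strips tags with a naive regular expression. The clear_html function and the sample fragment are hypothetical, and the application may remove tags in a different way.

```python
import re


def clear_html(text, enabled):
    """Return text with all HTML tags removed when enabled is True."""
    if not enabled:
        return text
    # Naive tag stripper: good enough for a sketch, not a real HTML parser.
    return re.sub(r"<[^>]+>", "", text)


# Hypothetical fragment of extracted content.
fragment = '<div id="posts"><b>Hello</b>, <i>world</i></div>'

kept = clear_html(fragment, enabled=False)     # like clear.html=false
stripped = clear_html(fragment, enabled=True)  # like clear.html=true
print(kept)
print(stripped)
```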

Thus we obtain a file with the following contents:
rss.host=forums.goha.ru
content.pattern=<div id=\"posts\">(.*)<div id=\"lastpost\">
clear.tags=<div id=\"lastpost\">
clear.html=false
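For readers who want to check a template before dropping it into page-templates, the following Python sketch reads the four fields from a file like the one above. It is a simplified reader written for this example, not the loader the application uses; parse_template and its handling of the \" escape are assumptions.

```python
def parse_template(text):
    """Parse key=value lines into a dict (simplified, not a full .properties parser)."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, since the pattern itself contains "=".
        key, _, value = line.partition("=")
        # Undo the \" escaping used in the file.
        rules[key.strip()] = value.strip().replace('\\"', '"')
    return rules


template = r'''
rss.host=forums.goha.ru
content.pattern=<div id=\"posts\">(.*)<div id=\"lastpost\">
clear.tags=<div id=\"lastpost\">
clear.html=false
'''

rules = parse_template(template)
tags = [t.strip() for t in rules["clear.tags"].split(";") if t.strip()]
strip_html = rules["clear.html"] == "true"

print(rules["rss.host"])
print(rules["content.pattern"])
print(tags)
print(strip_html)
```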


Source: https://habr.com/ru/post/84496/

