rawdog - an RSS aggregator without excessive requirements

Lyrical introduction

With a new resource recently budding off from Habrahabr, I needed a convenient way to read both sites. The first thought, of course, was RSS, since the engine behind both sites supports it. Only a trifle remained: to find a good RSS aggregator that could be installed on a weak VPS (the fate of Google Reader had somewhat dampened my desire to rely on third-party services).

At first, a tip from Tsyganov_Ivan led me to the Tiny Tiny RSS aggregator, which looked like a real “silver bullet”. A closer look at its system requirements cooled my ardor, though: pile a full-fledged LAMP stack onto a little machine with, at best, 256 MB of free memory, and all that for a resource used by literally one person? Moreover, the FAQ, which links to frankly mocking answers on the project's forum, finally killed any desire to deal with tt-rss.

The first round of the search ended in failure, because the alternatives (like FeedHQ) required roughly the same thing. In desperation, I was about to write the tool myself and had started looking for suitable Python libraries (I have a weakness for Python) when I came across practically what I needed.

The very name RAWDOG hints that its author was overwhelmed by similar feelings when writing it. The utility is designed to be run manually or from cron and does exactly one thing: it parses the specified RSS feeds and writes new items to an output file using the specified template.

Installation and Setup

Since rawdog is in the Ubuntu repository, getting the package is easy. The setup, however, has its quirks.
First, you have to add the rawdog call to a crontab or to cron.* yourself. It will look something like this:

  rawdog --dir WORKDIR --log /var/log/rawdog/rawdog.log --no-lock-wait --update --write

where the --no-lock-wait option prevents a second copy of rawdog from running, and WORKDIR is the utility's working directory.
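
For illustration, a system crontab entry might look like the sketch below. The file name /etc/cron.d/rawdog, the 30-minute schedule, and the concrete working directory path are assumptions chosen for the example, not something rawdog requires.

```text
# /etc/cron.d/rawdog (hypothetical file name)
# min  hour dom mon dow user command
*/30 *    *   *   *   root rawdog --dir /var/cache/rawdog --log /var/log/rawdog/rawdog.log --no-lock-wait --update --write
```

Files in /etc/cron.d carry a user field (root here); a personal crontab edited with crontab -e omits it.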

The point is that rawdog looks for its configuration file and keeps all its temporary files in a single working directory, ~/.rawdog by default. That may be convenient on a workstation, but it goes against usual server practice. If you, like me, like order and uniformity, you can specify a different working directory with the --dir option; I used it to move the working directory to /var/cache/rawdog (since its main contents, apparently, are the cache of downloaded feeds). Since the configuration file is also looked up there (the --config option lets you specify an additional config but does not cancel the search for the main one), I replaced it with a symbolic link and moved the file itself, along with the templates, to /etc.
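
The relocation described above can be sketched as a small shell helper. The function name setup_rawdog_workdir and the config location /etc/rawdog/config are assumptions made for this example; only the working directory path comes from the text.

```shell
#!/bin/sh
# Sketch: relocate rawdog's working directory while keeping the config in /etc.
# setup_rawdog_workdir is a hypothetical helper name; /etc/rawdog/config is an
# assumed location for the real configuration file.
setup_rawdog_workdir() {
    workdir="${1:-/var/cache/rawdog}"   # state and feed cache live here
    config="${2:-/etc/rawdog/config}"   # real config file kept under /etc

    mkdir -p "$workdir" || return 1
    # rawdog looks for a file named "config" inside its working directory,
    # so point that name at the real file with a symbolic link.
    ln -s -f "$config" "$workdir/config"
}
```

Calling setup_rawdog_workdir once as root with no arguments uses the default paths; after that rawdog can be run with --dir /var/cache/rawdog.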

A well-documented sample configuration file can be found on the web, so I will only briefly list the main directives:

maxarticles N sets the maximum number of articles in the resulting feed (the output is a single page, which can be inconvenient);
maxage T sets the time interval for which entries are shown in the output feed;
expireage T sets how long entries that have disappeared from the original RSS feed are kept. If this interval is shorter than maxage, then for a frequently updated feed the outdated entries will vanish from the results before the normal period expires;
pagetemplate FILEPATH and itemtemplate FILEPATH specify template files for the page as a whole and for a single entry, respectively. With the value default, a simple built-in template is used;
outputfile FILEPATH sets where the output is written. Web server settings for serving this static page are outside the scope of this article (I use lighttpd, for example). Just make sure that rawdog can write to this file (no problem if the utility is started from cron as root) and that the web server can read it;
feed INTERVAL URL [PARAMS] adds an RSS feed to be polled at the specified interval (since rawdog is usually invoked from cron, it simply skips feeds that are not yet due if it is called earlier than expected). Parameters worth singling out are id (see below) and http_proxy, which sets a proxy server for fetching a particular feed (in case you want something odd, like aggregating RSS feeds through Tor, or just from a site that has fallen under Roskomnadzor's steamroller);
include FILEPATH includes another configuration file.
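
Put together, a minimal configuration built from these directives might look like the sketch below. The feed URLs, ids, file paths, and intervals are placeholders invented for the example; the key=value form of feed parameters follows the define_VARNAME = VALUE syntax mentioned later in this article.

```text
# Sketch of a rawdog config; every value here is an illustrative assumption.
maxarticles 100
maxage 3d
# Keep expireage at least as long as maxage so entries do not vanish early.
expireage 3d

# A custom page template (hypothetical path) and the built-in item template.
pagetemplate /etc/rawdog/page.tmpl
itemtemplate default

outputfile /var/www/rawdog/index.html

# Poll each feed at most once per interval; id ends up in the CSS class.
feed 1h http://example.org/rss id=example-org
feed 2h http://example.net/feed.xml id=example-net
```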

Configuring logrotate

Since rawdog is usually run several times a day and produces about a kilobyte of log output each time, it makes sense either to disable logging completely (by removing the --log option) or to configure logrotate. For the latter, it is enough to put a file with roughly the following content into /etc/logrotate.d/ (assuming you chose the same log file path as I did):

  /var/log/rawdog/rawdog.log {
      weekly
      missingok
      rotate 5
      compress
      delaycompress
      notifempty
  }

Making it pretty

The built-in rawdog template is minimalist, to put it mildly, so it makes sense to supply your own template files. The most important one is pagetemplate, since that is where you can define styles and include the necessary scripts. To see the default page template, use the following command (be sure to add --dir WORKDIR if you, like me, moved the working directory):

  rawdog -s pagetemplate > template.html

Any built-in template can be viewed with a similar command by replacing pagetemplate with the template's name. Templating is implemented as simple search-and-replace, although there is a conditional operator that lets you insert a placeholder when a value is missing. By the way, you can define your own variables with the directive define VARNAME VALUE (globally) or the parameter define_VARNAME=VALUE (for an individual RSS feed).
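
As a starting point for customization, both templates can be saved in one go. The sketch below wraps the command above in a hypothetical helper, dump_rawdog_templates; it assumes the item template is shown under the name itemtemplate, matching the directive of the same name, and the .html output file names are likewise my choice.

```shell
#!/bin/sh
# Sketch: save rawdog's built-in templates as files to edit.
# dump_rawdog_templates is a hypothetical helper name; the template names are
# assumed to match the pagetemplate and itemtemplate config directives.
dump_rawdog_templates() {
    workdir="$1"   # rawdog working directory (the --dir value)
    outdir="$2"    # directory that receives the template copies

    mkdir -p "$outdir" || return 1
    for name in pagetemplate itemtemplate; do
        # -s prints the named template to standard output.
        rawdog --dir "$workdir" -s "$name" > "$outdir/$name.html" || return 1
    done
}
```

For example, dump_rawdog_templates /var/cache/rawdog /etc/rawdog would write pagetemplate.html and itemtemplate.html into /etc/rawdog, ready to be referenced from the config.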

Note that by default each entry is marked with the CSS class feed-FEEDID, where FEEDID is the source id set in the feed parameters described above. This lets you style entries from different sources differently (for example, show the site icon next to the title).

Grouping feeds into separate collections

Offhand, there is one relatively easy way to maintain several coexisting feed collections, each with its own set of subscriptions, output file, and design.

To do this, put something like the following script into cron.* instead of the call above:

  #!/bin/sh
  WORKDIRS=/var/cache/rawdog
  CONFIGS=/etc/rawdog
  PLUGINS=/usr/share/rawdog/plugins
  LOGS=/var/log/rawdog

  for CFG in "$CONFIGS/"*.conf
  do
      WORKDIR="$WORKDIRS/"`basename "$CFG" .conf`
      [ -d "$WORKDIR" ] || mkdir -p "$WORKDIR"
      [ -f "$WORKDIR/config" ] || ln -s -f "$CFG" "$WORKDIR/config"
      if [ -d "$PLUGINS" ]; then
          [ -d "$WORKDIR/plugins" ] || ln -s -f "$PLUGINS" "$WORKDIR/plugins"
      fi
      rawdog --dir "$WORKDIR" --log "$LOGS/rawdog" --no-lock-wait --update --write
  done

The principle is simple: for each *.conf file in /etc/rawdog, a corresponding working subdirectory is created in /var/cache/rawdog if needed, and a link to the configuration file is placed in it. A link to the directory with shared plugins is placed there as well (if absent).
For extra convenience, you can keep common settings in a separate file (/etc/rawdog/config or /etc/default/rawdog) and pull it into the *.conf files with the include directive.
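
A per-collection file built this way might look like the sketch below. The file name habr.conf, the output path, and the feed are made up for the example; the shared file is assumed to hold directives such as maxage and pagetemplate.

```text
# /etc/rawdog/habr.conf (hypothetical collection file)
# Pull in the settings shared by all collections.
include /etc/rawdog/config

# Settings specific to this collection.
outputfile /var/www/rawdog/habr.html
feed 1h http://example.org/rss id=example-org
```

With the script above, this file would get the working directory /var/cache/rawdog/habr, since the subdirectory name is the file name without its .conf suffix.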

Extending with plugins

rawdog loads Python scripts found in the plugins subdirectory of its working directory. A number of ready-made plugins (in particular, for multi-page output and for output in RSS format) can be found on the author's website.

Source: https://habr.com/ru/post/240545/