
Filtering the Habr RSS feed with Yahoo! Pipes



Commenters often complain about the amount of unwanted content on the front page. Then again, posts cannot please everyone at once. Duh...
There is only one conclusion: we need to filter. Captain Obvious (sorry, intuition) suggests that we will filter with Yahoo! Pipes.

With pictures.


Instead of an introduction


Y!P has been written about before: one, two, and more. We will write about it again; it won't hurt. The tool is truly impressive.
And why pick Y!P at all, when this functionality is already built into many RSS readers?

Simply because it is very interesting to do something new, and something so close to the *nix philosophy. Besides, after filtering through Y!P you receive only the useful part of the feed, without loading your connection with unnecessary data.

New engine


Not long ago, Y!P got a new engine: the Yahoo! Pipes V2 engine.

In short:


Getting the feed


It all starts with an RSS feed. However, the native Habr feed does not work with Y!P. Without going too deep into the details, we take the FeedBurner feed instead:



The figure shows the Fetch Feed module. Its job is to take the feed from the specified address and pass it down the pipe. Next, as you can see, comes the Split module, which divides one feed into two identical copies.
Is this redundancy for fault tolerance?

No, it is needed for clever filtering: everything unwanted is cut out in the left channel, and only what is wanted is kept in the right one.
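Y!P is configured visually, so there is nothing to copy from the pipe itself. Purely as an illustration, the first two steps can be sketched in Python, assuming a small hypothetical RSS 2.0 document in place of the real FeedBurner URL (so this sketch parses a string rather than downloading anything). The dict keys mirror the item.title, item.link, item.description and pubDate fields that Y!P exposes.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real pipe read the FeedBurner URL instead.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Linux / Kernel news</title><link>http://example.com/1</link>
<description>linux is still cooler</description>
<pubDate>Mon, 05 Jul 2010 10:00:00 +0400</pubDate></item>
<item><title>Microsoft / New heights</title><link>http://example.com/2</link>
<description>nothing interesting</description>
<pubDate>Mon, 05 Jul 2010 11:00:00 +0400</pubDate></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Turn an RSS 2.0 document into a list of item dicts (roughly what Fetch Feed emits)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


def split(items):
    """Return two independent, identical copies of the feed (the Split step)."""
    return copy.deepcopy(items), copy.deepcopy(items)


left, right = split(parse_feed(SAMPLE_RSS))
```

Deep copies are used so that whatever happens to one channel cannot affect the other, which is the point of splitting.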

Filtering




The upper module blocks entries that match two regexps, and the lower one permits entries that match two others.
What is a RegExp?

On regular expressions: eng, rus, and also a good guide on the topic.

To keep our filter modules from turning into huge monsters, the keywords are kept separately (the text fields marked [wired], with the suspicious gray links). For now, let's look at the left side of the module. The drop-down list lets you pick the field of interest. For our task (blocking entries), the item.title field is ideal, since the title is what entries are screened by first. The item.description field contains the body of the entry and is used in the permit filter to keep topics we are interested in. For example, suppose you, %username%, blocked the word Microsoft © but allowed the word Linux. Then a post titled "Microsoft takes new heights" whose body reads "linux is still cooler" will still go straight to your RSS reader.
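The Microsoft/Linux example can be traced with a minimal Python sketch of the two channels. It assumes case-insensitive matching (so that the allowed word "Linux" matches "linux" in the body); whether Y!P's regex rule behaves exactly this way is an assumption. The `block` and `permit` helpers are hypothetical stand-ins for the two Filter modules.

```python
import re


def block(items, pattern, field="title"):
    """Drop every item whose `field` matches `pattern` (the blocking filter)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [it for it in items if not rx.search(it.get(field, ""))]


def permit(items, pattern, field="description"):
    """Keep only items whose `field` matches `pattern` (the permitting filter)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [it for it in items if rx.search(it.get(field, ""))]


post = {
    "title": "Microsoft takes new heights",
    "link": "http://example.com/ms",
    "description": "linux is still cooler",
}

left = block([post], "Microsoft")   # left channel: dropped because of its title
right = permit([post], "Linux")     # right channel: rescued by its body
survivors = left + right            # what reaches the reader once both channels are merged
```

Because the two channels work on separate copies, a post only needs to survive one of them to reach the reader.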

Assembling the result




After cutting/keeping entries, we need to merge the two feeds back into one. This is done by the Union module, which has as many as 5 inputs (a pity to leave the unused ones idle). Now we have a single feed again, but it may contain duplicates. The Unique module gets rid of these unwanted parasites: it checks all entries by the item.link field and removes the extra ones. All that remains is to make it look nice by sorting the entries by some criterion. In the picture, entries are sorted by publication date in descending order (newest first). The mandatory module at the end of any pipe is Output, and our project must end there. And so it does. Clicking the Output module lets you enjoy the fruits of your labor.
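Outside of Y!P, the merge stage can be sketched in a few lines of Python. This is an illustration under assumptions, not how Y!P is implemented: items are dicts with hypothetical values, duplicates are detected by `link` as the Unique module does with item.link, and pubDate is assumed to be an RFC 822 date as in RSS 2.0.

```python
from email.utils import parsedate_to_datetime


def union(*feeds):
    """Concatenate several feeds into one (the Union step)."""
    merged = []
    for feed in feeds:
        merged.extend(feed)
    return merged


def unique(items, key="link"):
    """Keep the first item for each distinct `key` value (the Unique step)."""
    seen = set()
    result = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            result.append(item)
    return result


def newest_first(items):
    """Sort by the RFC 822 pubDate, descending (the Sort step)."""
    return sorted(items, key=lambda it: parsedate_to_datetime(it["pubDate"]), reverse=True)


older = {"link": "http://example.com/1", "pubDate": "Mon, 05 Jul 2010 10:00:00 +0400"}
newer = {"link": "http://example.com/2", "pubDate": "Mon, 05 Jul 2010 11:00:00 +0400"}

# The same post can survive in both channels, so it appears twice after Union.
feed = newest_first(unique(union([older, newer], [older])))
```

Deduplicating before sorting keeps the first occurrence of each link; the order of the two steps does not change the final result here.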



So where are the patterns for filtering entries?


Templates





This is what the templates for deleting posts look like. The three large String Builders are there purely for looks (you could cram everything into one). Besides, splitting them helps keep the patterns somewhat organized. All entries are separated from each other by a vertical bar, and this is a very important point. String Builder simply takes all its fields and concatenates them into one string. The small String Builder collects everything from the large ones and then wraps it in a suitable wrapper. Post titles on Habr look like "blog / post", so here all the templates target specific blogs. The word "stub" protects us from the situation where the OR block ends with nothing (microsoft|linux|freebsd|), which would filter out absolutely all posts. You can see the resulting RegExp in the debugger pane after clicking the small String Builder.
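Why a dangling bar is dangerous is easy to check with Python's `re` module: an empty alternative matches the empty string, so the whole pattern matches any title. In the sketch below, `string_builder` is a hypothetical stand-in for the String Builder module, and the `^(...) /` wrapper is only an assumed example of anchoring to the blog part of a "blog / post" title; the actual wrapper in the pipe was shown in a screenshot that did not survive.

```python
import re


def string_builder(*fields):
    """Concatenate all fields into one string, as the String Builder module does."""
    return "".join(fields)


# Every entry ends with "|", so the joined text ends with an empty alternative.
raw = string_builder("microsoft|", "linux|", "freebsd|")
# Appending the "stub" word closes the last alternative.
safe = string_builder(raw, "stub")
# Hypothetical wrapper: match only the blog part of a "blog / post" title.
blog_rx = re.compile("^(" + safe + ") /", re.IGNORECASE)

# re.search(raw, title) matches every title via the empty alternative;
# re.search(safe, title) matches only titles containing a real keyword (or "stub").
```

Any word that never occurs in real titles would do as the closing alternative; "stub" is simply the author's choice.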



The second template for the blocking filter is a bit simpler; it blocks particular posts (from any blog):



The permit filter is noticeably smaller, but no less important:



It searches for the keywords from its String Builder in both the title and the body of the post. This keeps us from missing an important topic because of overly strict filtering.
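A permit rule that looks in both the title and the body amounts to "keep the item if any of these fields matches". The Python sketch below shows that logic; the `permit_any` helper, the sample items and the keyword list are all hypothetical, and case-insensitive matching is again an assumption.

```python
import re


def permit_any(items, pattern, fields=("title", "description")):
    """Keep items where `pattern` matches in at least one of `fields`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        item for item in items
        if any(rx.search(item.get(field, "")) for field in fields)
    ]


items = [
    {"title": "Python / Generators explained", "description": "..."},
    {"title": "Hardware / New laptop", "description": "runs linux out of the box"},
    {"title": "Cooking / Borscht", "description": "beets and cabbage"},
]

# Hypothetical keywords, closed with "stub" as in the blocking templates.
kept = permit_any(items, "python|linux|stub")
```

The first item is kept for its title, the second for its body, and the third for neither.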
And then what?

Subscribe to the RSS or Atom output of the resulting feed and, of course, PROFIT.

Instead of a conclusion


I hope this article helps you, %username%, master Yahoo! Pipes and implement your own ideas there.
Link to the pipe from the article: habrapipe

Source: https://habr.com/ru/post/98468/

