
RSS feeds and torrents

RSS feeds of torrent files make it possible to replace the good old Fido file echo conferences. With them, new files on a pre-selected topic "arrive by themselves": you learn about updates not by reading reviews on sites, but by sorting through the incoming files.

The convenience of this is hard to overstate. Think of it as YouTube, but in real Full HD (actual Full HD, not what YouTube calls HD), in your player of choice, without lag or advertisements... In its idle time the computer downloads all the new releases by itself, and you only choose what to watch (and what to delete). A shameless waste of bandwidth, a luxury that has become affordable only in recent years, with rising speeds and the near-universal end of metered traffic.

How does it work?


The site accompanying a tracker (mininova, rutracker, bakabt, thepiratebay, animesuki, demonoid, etc.) can generate feeds: pages in RSS format containing links to new torrent files.
Usually you can filter what interests you right on the site, when requesting the RSS link. For example, tokyotosho lets you build an RSS URL with exactly the file types you care about: tokyotosho.info/rss_customize.php .

In the good case, the link points directly to a torrent file. In the bad case it points to an HTML page which in turn links to the torrent file (we will return to this subtlety in the implementation section).

After that everything is simple: some client (a torrent client with built-in RSS support, or a specialized program) periodically downloads the RSS, fetches the torrent files from it, and downloads the contents of the torrents (or hands them to a torrent client to download). The RSS is polled every N minutes (once an hour in my case), and files appear on the disk by themselves.
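
To make the scheme concrete, here is a minimal Python 3 sketch of that loop (an illustration, not the setup described below): fetch the feed, take the link of every item and download it. It assumes the good case where the link points straight at a torrent file; the feed URL is invented, and the target directory is simply the one used later in this article.

#!/usr/bin/python3
# A minimal sketch of the scheme above, not the actual setup:
# fetch one RSS feed, take the <link> of every item and download it.
# Only handles the "good case" where the link points straight at a
# torrent file; the feed URL is a made-up example.
import os
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://tracker.example.com/rss.php?cat=anime"  # hypothetical
DEST_DIR = "/srv/unsorted-torrents"

def poll_feed(feed_url, dest_dir):
    with urllib.request.urlopen(feed_url) as resp:
        tree = ET.parse(resp)
    for item in tree.iterfind(".//item"):          # RSS 2.0: channel/item/link
        link = item.findtext("link")
        if not link:
            continue
        name = os.path.basename(link.split("?")[0]) or "unnamed.torrent"
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):                 # already fetched earlier
            continue
        urllib.request.urlretrieve(link, target)

if __name__ == "__main__":
    poll_feed(FEED_URL, DEST_DIR)                  # run from cron every N minutes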


Potential problems




Theoretical model


I have never seen this implemented anywhere in its full form, so it can be considered a "spherical model in a vacuum".

  1. An RSS fetcher downloads all feeds, separately for each user, and hands them to the download client.
  2. The download client downloads the torrent files.
  3. The client passes the downloaded torrent files to a filter that weeds out repeats and unwanted items.
  4. After filtering, the torrent files go to a torrent client running in server mode, which supports separate queues per user and can recognize "identical" items coming from different users, so they are not downloaded twice.
  5. After a download completes, hardlinks to the files are placed into the user's "incoming" directory, while the torrent client keeps its own copy in its private area until seeding is finished (see the sketch after this list).
  6. The torrent client gives each user either a web front end or a rich client application (ideally both).
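
As an illustration of step 5, here is a small sketch of the hardlink trick (all paths are invented): the finished files stay where the torrent client seeds them from, and the user's incoming directory receives hardlinks, so deleting either copy does not affect the other.

#!/usr/bin/python3
# Illustration of step 5: hardlink finished files into the user's
# "incoming" directory while the client keeps seeding from its own copy.
# Paths are hypothetical; hardlinks work only within one filesystem.
import os

def publish_to_incoming(download_dir, incoming_dir):
    for root, _dirs, files in os.walk(download_dir):
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(incoming_dir, name)
            if not os.path.exists(dst):
                os.link(src, dst)      # costs no extra disk space

publish_to_incoming("/srv/torrent-client/user1/some-release",
                    "/home/user1/incoming")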


Practical implementation


(under Linux)

Let me say right away: my current implementation is quite far from the desired one and is a compromise between the amount of work, functionality and convenience.

So, first: RSS. I use the rsstail package for this, with the -N1l flags (just print the latest items of the feed). Next, the "html link" problem is solved by wget, which can follow a link recursively (with a nesting depth of 1 we will not wander far, and if the link is already a torrent there is no recursion at all). It also handles one silly problem created by nyaatorrents, which serves torrents not as ready-made files but generates them dynamically, announcing the filename via the Content-Disposition header.

The resulting one-liner looks like this:

 rsstail -u http://www.nyaatorrents.org/?page=rss\&catid=1\&subcat=37 -N1l | grep http | wget -i - -r -l 1 -nd --content-disposition -A torrent > /dev/null 2> /dev/null

Out of natural laziness I never got around to writing proper feed-list processing, although it is needed. So the script consists of identical lines for all the trackers, differing only in the URL.
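
For completeness, a sketch of what that feed-list processing could look like: a hypothetical Python 3 wrapper that reads URLs from a plain-text file (feeds.txt is an invented name) and runs the same rsstail | wget pipeline for each of them.

#!/usr/bin/python3
# Hypothetical feed-list wrapper: run the same rsstail | wget pipeline
# for every feed URL listed in a plain-text file, one URL per line.
import shlex
import subprocess
import sys

def fetch_feed(url, dest_dir):
    pipeline = ("rsstail -u %s -N1l | grep http | "
                "wget -i - -r -l 1 -nd --content-disposition -A torrent -P %s"
                % (shlex.quote(url), shlex.quote(dest_dir)))
    subprocess.call(pipeline, shell=True,
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

if __name__ == "__main__":
    feeds_file = sys.argv[1] if len(sys.argv) > 1 else "feeds.txt"
    with open(feeds_file) as f:
        for line in f:
            url = line.strip()
            if url and not url.startswith("#"):
                fetch_feed(url, "/srv/unsorted-torrents")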

I avoid private trackers as far as possible, and there is no more than one user (at least, no more than one torrent user) on my machine, so this functionality is enough for me.

All torrents are downloaded into /srv/unsorted-torrents. From there they are filtered (for now I only have very primitive filtering with find) using the "delete the excess" method, and then moved on.
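
Since the filtering is literally "delete whatever does not match", a Python equivalent of that primitive find-based filter might look like this (the keyword list is an invented example):

#!/usr/bin/python3
# Primitive "delete the excess" filter: keep only torrents whose name
# contains one of the wanted keywords, remove the rest.
import glob
import os

WANTED = ("1080p", "bdrip", "flac")    # hypothetical keywords

for path in glob.glob("/srv/unsorted-torrents/*.torrent"):
    name = os.path.basename(path).lower()
    if not any(keyword in name for keyword in WANTED):
        os.remove(path)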

The process of catching repeats is a bit more interesting. I use a self-written program, singlemv, which in very compact form checks whether a given file has been seen before, and moves it only if it has not (after each move the "seen" database is updated).

After that, everything left over in /srv/unsorted-torrents is deleted.

Torrents are moved into the /srv/torrents-queue directory, which is configured as the watch folder of the torrent client. In my case that is Deluge, but in principle the scheme should work with any torrent client.

Future plans


lstorrent, which lets you look at the contents of torrents, has already been written in C, although without the proper wrapping around it yet (compared to the Python version it gives a threefold saving in memory and a tenfold one in speed).
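
The lstorrent code itself is not shown here, but as an illustration of what "looking at the contents of torrents" amounts to, here is a minimal Python sketch: a tiny bencode decoder that prints the file list of a .torrent.

#!/usr/bin/python3
# Not lstorrent itself: a minimal bencode decoder that prints the list
# of files inside a .torrent file.
import sys

def bdecode(data, i=0):
    c = data[i:i+1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i+1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i+1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        d = {}
        while data[i:i+1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            d[key] = value
        return d, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length

def list_files(torrent_path):
    with open(torrent_path, "rb") as f:
        meta, _ = bdecode(f.read())
    info = meta[b"info"]
    if b"files" in info:                   # multi-file torrent
        for entry in info[b"files"]:
            print(b"/".join(entry[b"path"]).decode("utf-8", "replace"))
    else:                                  # single-file torrent
        print(info[b"name"].decode("utf-8", "replace"))

if __name__ == "__main__":
    for path in sys.argv[1:]:
        list_files(path)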

In the longer term there is the gattai project (it has not yet crawled even to an alpha version), which will detect releases of the same title from different release groups / translators and automatically pick the best quality among them.

The question of the front end is not solved. On the one hand, I want a full-fledged download server; on the other hand, I do not want to lose Deluge's very pleasant UI.

Why all these difficulties?


Better to lose a day, but then fly there in an hour.

Since I finished setting this system up for myself somewhere around February, it has been working for almost half a year without problems or complaints, neatly stacking new releases into the incoming directory. Moreover, if for some reason there are problems with Deluge (for example, it gets clogged by a pile of slow torrents that cannot finish for weeks), the torrent files keep being collected, i.e. nothing from the "feed" is lost. (This is exactly what makes the scheme more interesting than the feed fetcher built into a torrent client.)

UPD: I forgot the most important thing, the singlemv script itself:

#!/usr/bin/python
import sys, os, cPickle

# minimalistic command line:
# 1st argument - memory file
# 2nd to pre-last - files to be moved
# last - destination (where to move them)

def singlemv(memory, from_list, to):
    # returns the new memory set
    to_move = frozenset(from_list) - memory
    errors = set()
    for f in to_move:
        try:
            os.rename(f, os.path.join(to, os.path.basename(f)))
        except:
            print "error moving %s to %s" % (f, to)
            errors.add(f)
    return memory | (to_move - errors)

def __main__():
    if len(sys.argv) < 4:
        print "Usage: singlemv.py memory_file from to"
        exit(-1)
    try:
        memory_file = file(sys.argv[1], "r")
        memory = cPickle.load(memory_file)
        memory_file.close()
        if type(memory) != type(frozenset()):
            print "bad memory file"
            exit(1)
    except (IOError, OSError):
        memory = frozenset()
    newmem = singlemv(memory, sys.argv[2:-1], sys.argv[-1])
    memory_file = file(sys.argv[1], "w")
    cPickle.dump(newmem, memory_file)
    memory_file.close()

__main__()

All the beauty of the program is in the line to_move=frozenset(from_list) - memory (set difference).
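
A typical invocation (with illustrative paths) would be singlemv.py /srv/torrents.memory /srv/unsorted-torrents/*.torrent /srv/torrents-queue : the memory file comes first, the destination comes last, and everything in between is moved at most once.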

Source: https://habr.com/ru/post/96410/

