
RSS feeds and torrents

RSS feeds of torrent files make it possible to replace the good old Fido file echo conferences. With them, new files on a pre-selected topic "arrive by themselves": you learn about updates not by reading reviews on sites, but by sorting through the incoming files.

The convenience of this is hard to overstate. Think of it as YouTube, but in real Full HD (actual Full HD, not what YouTube calls HD), in your player of choice, without lag or advertisements... In its idle time the computer downloads all the new releases by itself, and you only choose what to watch (and what to delete). A shameless waste of bandwidth, a luxury that has become affordable only in recent years, with rising speeds and the near-universal end of metered traffic.

How does it work?


The site accompanying a tracker (mininova, rutracker, bakabt, thepiratebay, animesuki, demonoid, etc.) can generate feeds: pages in RSS format containing links to new torrent files.
Usually you can filter what interests you right on the site, when requesting the RSS link. For example, tokyotosho lets you build an RSS URL with exactly the file types you care about: tokyotosho.info/rss_customize.php .

In the good case, the link points directly to a torrent file. In the bad case it points to an HTML page which in turn links to the torrent file (we will return to this subtlety in the implementation section).

After that everything is simple: some client (a torrent client with built-in RSS support, or a specialized program) periodically downloads the RSS, fetches the torrent files from it, and downloads the contents of the torrents (or hands them to a torrent client to download). The RSS is polled every N minutes (once an hour in my case), and files appear on the disk by themselves.
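
To make the scheme concrete, here is a minimal Python 3 sketch of that loop (an illustration, not the setup described below): fetch the feed, take the link of every item and download it. It assumes the good case where the link points straight at a torrent file; the feed URL is invented, and the target directory is simply the one used later in this article.

#!/usr/bin/python3
# A minimal sketch of the scheme above, not the actual setup:
# fetch one RSS feed, take the <link> of every item and download it.
# Only handles the "good case" where the link points straight at a
# torrent file; the feed URL is a made-up example.
import os
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://tracker.example.com/rss.php?cat=anime"  # hypothetical
DEST_DIR = "/srv/unsorted-torrents"

def poll_feed(feed_url, dest_dir):
    with urllib.request.urlopen(feed_url) as resp:
        tree = ET.parse(resp)
    for item in tree.iterfind(".//item"):          # RSS 2.0: channel/item/link
        link = item.findtext("link")
        if not link:
            continue
        name = os.path.basename(link.split("?")[0]) or "unnamed.torrent"
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):                 # already fetched earlier
            continue
        urllib.request.urlretrieve(link, target)

if __name__ == "__main__":
    poll_feed(FEED_URL, DEST_DIR)                  # run from cron every N minutes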


Potential problems




Theoretical model


I have never seen this implemented anywhere in its full form, so it can be considered a "spherical model in a vacuum".

  1. An RSS fetcher downloads all feeds, separately for each user, and hands them to the download client.
  2. The download client downloads the torrent files.
  3. The client passes the downloaded torrent files to a filter that weeds out repeats and unwanted items.
  4. After filtering, the torrent files go to a torrent client running in server mode, which supports separate queues per user and can recognize "identical" items coming from different users, so they are not downloaded twice.
  5. After a download completes, hardlinks to the files are placed into the user's "incoming" directory, while the torrent client keeps its own copy in its private area until seeding is finished (see the sketch after this list).
  6. The torrent client gives each user either a web front end or a rich client application (ideally both).
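
As an illustration of step 5, here is a small sketch of the hardlink trick (all paths are invented): the finished files stay where the torrent client seeds them from, and the user's incoming directory receives hardlinks, so deleting either copy does not affect the other.

#!/usr/bin/python3
# Illustration of step 5: hardlink finished files into the user's
# "incoming" directory while the client keeps seeding from its own copy.
# Paths are hypothetical; hardlinks work only within one filesystem.
import os

def publish_to_incoming(download_dir, incoming_dir):
    for root, _dirs, files in os.walk(download_dir):
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(incoming_dir, name)
            if not os.path.exists(dst):
                os.link(src, dst)      # costs no extra disk space

publish_to_incoming("/srv/torrent-client/user1/some-release",
                    "/home/user1/incoming")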


Practical implementation


(under Linux)

Let me say right away: my current implementation is quite far from the desired one and is a compromise between the amount of work, functionality and convenience.

So, first: RSS. I use the rsstail package for this, with the -N1l flags (just print the latest items of the feed). Next, the "html link" problem is solved by wget, which can follow a link recursively (with a nesting depth of 1 we will not wander far, and if the link is already a torrent there is no recursion at all). It also handles one silly problem created by nyaatorrents, which serves torrents not as ready-made files but generates them dynamically, announcing the filename via the Content-Disposition header.

The resulting one-liner looks like this:

 rsstail -u http://www.nyaatorrents.org/?page=rss\&catid=1\&subcat=37 -N1l | grep http | wget -i - -r -l 1 -nd --content-disposition -A torrent > /dev/null 2> /dev/null

Out of natural laziness I never got around to writing proper feed-list processing, although it is needed. So the script consists of identical lines for all the trackers, differing only in the URL.
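
For completeness, a sketch of what that feed-list processing could look like: a hypothetical Python 3 wrapper that reads URLs from a plain-text file (feeds.txt is an invented name) and runs the same rsstail | wget pipeline for each of them.

#!/usr/bin/python3
# Hypothetical feed-list wrapper: run the same rsstail | wget pipeline
# for every feed URL listed in a plain-text file, one URL per line.
import shlex
import subprocess
import sys

def fetch_feed(url, dest_dir):
    pipeline = ("rsstail -u %s -N1l | grep http | "
                "wget -i - -r -l 1 -nd --content-disposition -A torrent -P %s"
                % (shlex.quote(url), shlex.quote(dest_dir)))
    subprocess.call(pipeline, shell=True,
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

if __name__ == "__main__":
    feeds_file = sys.argv[1] if len(sys.argv) > 1 else "feeds.txt"
    with open(feeds_file) as f:
        for line in f:
            url = line.strip()
            if url and not url.startswith("#"):
                fetch_feed(url, "/srv/unsorted-torrents")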

I avoid private trackers as far as possible, and there is no more than one user (at least, no more than one torrent user) on my machine, so this functionality is enough for me.

All torrents are downloaded into /srv/unsorted-torrents. From there they are filtered (for now I only have very primitive filtering with find) using the "delete the excess" method, and then moved on.
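
Since the filtering is literally "delete whatever does not match", a Python equivalent of that primitive find-based filter might look like this (the keyword list is an invented example):

#!/usr/bin/python3
# Primitive "delete the excess" filter: keep only torrents whose name
# contains one of the wanted keywords, remove the rest.
import glob
import os

WANTED = ("1080p", "bdrip", "flac")    # hypothetical keywords

for path in glob.glob("/srv/unsorted-torrents/*.torrent"):
    name = os.path.basename(path).lower()
    if not any(keyword in name for keyword in WANTED):
        os.remove(path)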

The process of catching repeats is a bit more interesting. I use a self-written program, singlemv, which in very compact form checks whether a given file has been seen before, and moves it only if it has not (after each move the "seen" database is updated).

After that, everything left over in /srv/unsorted-torrents is deleted.

Torrents are moved into the /srv/torrents-queue directory, which is configured as the watch folder of the torrent client. In my case that is Deluge, but in principle the scheme should work with any torrent client.

Future plans


lstorrent, which lets you look at the contents of torrents, has already been written in C, although without the proper wrapping around it yet (compared to the Python version it gives a threefold saving in memory and a tenfold one in speed).
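
The lstorrent code itself is not shown here, but as an illustration of what "looking at the contents of torrents" amounts to, here is a minimal Python sketch: a tiny bencode decoder that prints the file list of a .torrent.

#!/usr/bin/python3
# Not lstorrent itself: a minimal bencode decoder that prints the list
# of files inside a .torrent file.
import sys

def bdecode(data, i=0):
    c = data[i:i+1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i+1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i+1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        d = {}
        while data[i:i+1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            d[key] = value
        return d, i + 1
    colon = data.index(b":", i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length

def list_files(torrent_path):
    with open(torrent_path, "rb") as f:
        meta, _ = bdecode(f.read())
    info = meta[b"info"]
    if b"files" in info:                   # multi-file torrent
        for entry in info[b"files"]:
            print(b"/".join(entry[b"path"]).decode("utf-8", "replace"))
    else:                                  # single-file torrent
        print(info[b"name"].decode("utf-8", "replace"))

if __name__ == "__main__":
    for path in sys.argv[1:]:
        list_files(path)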

In the longer term there is the gattai project (it has not yet crawled even to an alpha version), which will detect releases of the same title from different release groups / translators and automatically pick the best quality among them.

The question of the front end is not solved. On the one hand, I want a full-fledged download server; on the other hand, I do not want to lose Deluge's very pleasant UI.

Why all these difficulties?


Better to lose a day, but then fly there in an hour.

Since I finished setting this system up for myself somewhere around February, it has been working for almost half a year without problems or complaints, neatly stacking new releases into the incoming directory. Moreover, if for some reason there are problems with Deluge (for example, it gets clogged by a pile of slow torrents that cannot finish for weeks), the torrent files keep being collected, i.e. nothing from the "feed" is lost. (This is exactly what makes the scheme more interesting than the feed fetcher built into a torrent client.)

UPD: I forgot the most important thing, the singlemv script itself:

#!/usr/bin/python
import sys, os, cPickle

# minimalistic command line:
# 1st argument - memory file
# 2nd to pre-last - files to be moved
# last - destination (where to move them)

def singlemv(memory, from_list, to):
    # returns the new memory set
    to_move = frozenset(from_list) - memory
    errors = set()
    for f in to_move:
        try:
            os.rename(f, os.path.join(to, os.path.basename(f)))
        except:
            print "error moving %s to %s" % (f, to)
            errors.add(f)
    return memory | (to_move - errors)

def __main__():
    if len(sys.argv) < 4:
        print "Usage: singlemv.py memory_file from to"
        exit(-1)
    try:
        memory_file = file(sys.argv[1], "r")
        memory = cPickle.load(memory_file)
        memory_file.close()
        if type(memory) != type(frozenset()):
            print "bad memory file"
            exit(1)
    except (IOError, OSError):
        memory = frozenset()
    newmem = singlemv(memory, sys.argv[2:-1], sys.argv[-1])
    memory_file = file(sys.argv[1], "w")
    cPickle.dump(newmem, memory_file)
    memory_file.close()

__main__()

All the beauty of the program is in the line to_move=frozenset(from_list) - memory (set difference).
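
A typical invocation (with illustrative paths) would be singlemv.py /srv/torrents.memory /srv/unsorted-torrents/*.torrent /srv/torrents-queue : the memory file comes first, the destination comes last, and everything in between is moved at most once.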

Source: https://habr.com/ru/post/96410/

