Quite recently there were two articles on Habr about automating the download of new episodes from torrents, and the authors of both shared their applications. We have been developing such an application for a year now, and it seems to me the time has come to tell the Habr community about our small but wonderful project, Monitorrent, which may well make your life as simple and convenient as it made ours.
The web application is written in Python 2 (with Python 3 support). It lets you add torrents for monitoring, automatically downloads new episodes and adds them to a torrent client.
We have been using it on an ongoing basis since the end of last year, and on May 1, 2016 we published the first release version, which has been running without a hitch ever since on a cubietruck in a docker container.
Details of how it all works inside are under the cut.
I want to come home from work, sit down to supper, simply open Kodi, pick the latest episode of my favorite show and watch it, without spending any effort searching for it on torrent trackers and without wasting time waiting for it to download.
There are plenty of solutions for this kind of automation. At first I used a Chrome plugin that monitored changes on rutracker; I downloaded the changed torrents manually and added them to uTorrent via RDC, and later through its web application.
But after I discovered TorrentMonitor, everything became much easier. It ran on my router for over a year, and I even sent it a couple of pull requests. There were two wonderful articles about this application on Habr from its author ( one , two ). Many thanks to the author.
TorrentMonitor is beautiful, but I always had one problem: sometimes a zero-size file was downloaded, and I had to go into the database by hand and mark the episode as not yet downloaded (it seems this problem has since been fixed). Back then it also could not add downloaded torrents to the torrent client itself (Transmission in my case). Now that works too.
The next discovery for me was FlexGet. A very powerful tool. It had no support for lostfilm.tv, though, and bolting that on was a whole adventure. Otherwise it worked fine, but I never managed to teach it to follow changes of a torrent on rutracker; probably that still cannot be done. On the other hand, I had a custom rule that downloaded films of the current and previous year from rutor in 720p quality (my Internet connection did not allow more) with an imdb rating above 6.0, while excluding films from Japan (well, I don't like Japanese cinema, and its ratings are consistently high). All this was described in just a couple of lines of yaml.
For a long time both services (TorrentMonitor and FlexGet) worked alongside the router.
After I was given a cubietruck and installed a 2.5″ hard drive in it, it turned into a small but very practical NAS that consumes little electricity and dutifully pumps torrents. A power bank saves it from problems with power outages. File access speed is a stable 30 MB/s, which is enough for my tasks. TorrentMonitor and FlexGet migrated to the cubietruck.
However, the problem with zero-size torrent downloads did not go anywhere.
And so I wanted to write my own project for automating the download of new episodes. TorrentMonitor is written in PHP and calls curl to download new torrents; scheduling is done by running php via cron.
I wanted everything to work out of the box: install it, and it just works.
This is how Monitorrent appeared. I also liked the idea of writing something useful for myself in python (a small pile of scripts doesn't count).
It is a single-page web application written in Python 2. Angular 1.4 and angular-material are used on the front-end, and the back-end is just a REST service written with falcon .
All sources are on github and are distributed under a "do what you want" style license.
The following trackers are currently supported:
Downloaded torrents can be added to the following torrent clients:
This covers my needs 200% (I actively use only 3 trackers and 2 torrent clients).
Strictly speaking, it is a two-page application: one page for login and a second for everything else. A separate login page is needed only so that static files (images, css or js) cannot be downloaded before you log in. I'm probably paranoid and there's little point in this, but I like to think it's slightly safer this way.
Both pages are generated from a single index.htm file, which is transformed by the gulp-preprocess plugin.
All external js files (frameworks and libraries) are loaded from a CDN to make access to Monitorrent easier from outside when it is deployed on a home network. If you have ADSL at home with an upload speed of only 512 kbps, it is much faster to download the js from the Internet than from the home network at a throttled speed, because the channel is already saturated with seeding torrents. Only the internal js files have to be downloaded from the home network, and the browser then caches them perfectly.
And since the rest of the communication is done through REST, very little data is sent between the front-end and the back-end.
Authorization is done via JWT. It seems to me the best-suited authorization technology: it lets you avoid storing sessions on the server and does not let the client see exactly what data is stored in the token. If you are not using JWT in your applications yet, I highly recommend it; it is especially good, in my opinion, in a microservice architecture.
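To illustrate why JWT needs no server-side session, here is a minimal stdlib-only sketch of an HS256 token: the server only keeps the secret, and validity is established by re-computing the signature. This is an illustration of the mechanism, not Monitorrent's actual code (which uses a JWT library); all names here are mine.

import base64
import hashlib
import hmac
import json


def _b64(data):
    # JWT uses URL-safe base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def jwt_encode(payload, secret):
    """Build a signed HS256 token: header.payload.signature."""
    header = _b64(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode('utf-8'))
    body = _b64(json.dumps(payload).encode('utf-8'))
    signing_input = (header + '.' + body).encode('ascii')
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return header + '.' + body + '.' + _b64(signature)


def jwt_is_valid(token, secret):
    """Verify the signature; the server keeps no session state at all."""
    try:
        header, body, signature = token.split('.')
    except ValueError:
        return False
    signing_input = (header + '.' + body).encode('ascii')
    expected = _b64(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(expected, signature)


token = jwt_encode({'user': 'admin'}, b'secret-key')

Any server instance holding the same secret can validate the token, which is exactly what makes JWT attractive for microservices.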
The build is done using gulp , which replaced grunt . All js files are simply concatenated into one big bundle, which is not even minified yet. But everything concatenates in the right order, because the main file is called app.js and thus gets into the final js first; everything else works thanks to angular DI.
Today I would use webpack, but I'm not a front-end developer and knew nothing about front-end development when this project was just starting.
Among the more interesting implementation details is the angular directive we wrote for generating dynamic forms.
Settings for all plugins are simple forms; for example, this is how the connection settings form for Transmission looks:
This form consists of 2 rows, each with 2 text fields. The host element takes 80% of the width and the port element 20%; the username and password fields are 50% each. Writing such a form with angular-material is a trivial task.
However, we wanted to simplify plugin development and let authors focus on back-end logic rather than fiddle with html. A plugin should ship as a single file, with no additional markup file.
We have developed a simple format for describing form markup in the plugin code:
form = [{
    'type': 'row',
    'content': [
        {'type': 'text', 'label': 'Host', 'model': 'host', 'flex': 80},
        {'type': 'text', 'label': 'Port', 'model': 'port', 'flex': 20}
    ]
}, {
    'type': 'row',
    'content': [
        {'type': 'text', 'label': 'Username', 'model': 'username', 'flex': 50},
        {'type': 'password', 'label': 'Password', 'model': 'password', 'flex': 50}
    ]
}]
This is the description of the settings form for Transmission: three text fields and one password field. The purpose of the type and label properties is clear from their names. The name of the flex property was chosen poorly: width would have been more correct, since it defines the element's width as a percentage within the row. It got its name because angular-material uses flexbox to lay out elements on the page.
After the user fills in this form and presses the Save button, a model of the following shape is sent to the back-end:
{
    "host": "myhost",
    "port": "9091",
    "username": "username",
    "password": "******"
}
The property names of this model are taken from the model property of the form description.
This allowed us to focus on writing only the back-end logic of the plugins and simplified writing the UI. In the mobile version of the application all elements should stack one after another, i.e. elements within one row should wrap onto several lines. This is not implemented yet, but I hope it will appear in the future.
Naturally, dynamic form generation is not the most flexible solution, but I consider it correct and reasonable, although our front-end developer disagrees to this day and still argues with me about this decision.
In one of the first versions we had Websockets support: first completely hand-rolled, then via socket.io .
To work with Websockets on the python side we used the python library for socket.io . It relies on gevent to create coroutines (lightweight threads, greenlets and whatever else they are called). gevent is an excellent library for writing asynchronous applications, which applications using Websockets have to be.
Unfortunately, the python socket.io implementation requires gevent version 1.0 or later, while for home routers only gevent 0.13 is available. We really did not want to rule out running Monitorrent on routers, even though I myself have long since moved to a cubietruck. So we had to give up Websockets and replace them with long-polling requests in the REST interface. They are now used in only one place: getting the status of the current check for new episodes.
The back-end is written in python 2 using falcon . Falcon promises very high performance, and it seemed very convenient to me. Initially Monitorrent was written with cherrypy , then rewritten to flask ; there was an attempt to use bottle , but that did not work out either, and we settled on falcon .
Unfortunately, falcon is first and foremost a framework for writing REST services, and we also need to serve static files. Falcon does not provide this out of the box, unlike flask or cherrypy , so I had to implement it myself. Fortunately, falcon has all the tools needed for it.
@no_auth
class StaticFiles(object):
    def __init__(self, folder=None, filename=None, redirect_to_login=True):
        self.folder = folder
        self.filename = filename
        self.redirect_to_login = redirect_to_login

    def on_get(self, req, resp, filename=None):
        if self.redirect_to_login and not AuthMiddleware.validate_auth(req):
            resp.status = falcon.HTTP_FOUND
            resp.location = '/login'
            return
        file_path = filename or self.filename
        if self.folder:
            file_path = os.path.join(self.folder, file_path)
        if not os.path.isfile(file_path):
            raise falcon.HTTPNotFound(description='Requested page not found')
        mime_type, encoding = mimetypes.guess_type(file_path)
        etag, last_modified = self._get_static_info(file_path)
        resp.content_type = mime_type or 'text/plain'
        headers = {'Date': formatdate(time.time(), usegmt=True),
                   'ETag': etag,
                   'Last-Modified': last_modified,
                   'Cache-Control': 'max-age=86400'}
        resp.set_headers(headers)
        if_modified_since = req.get_header('if-modified-since', None)
        if if_modified_since and (parsedate(if_modified_since) >= parsedate(last_modified)):
            resp.status = falcon.HTTP_NOT_MODIFIED
            return
        if_none_match = req.get_header('if-none-match', None)
        if if_none_match and (if_none_match == '*' or etag in if_none_match):
            resp.status = falcon.HTTP_NOT_MODIFIED
            return
        resp.stream_len = os.path.getsize(file_path)
        resp.stream = open(file_path, mode='rb')

    @staticmethod
    def _get_static_info(file_path):
        mtime = os.stat(file_path).st_mtime
        return str(mtime), formatdate(mtime, usegmt=True)
Here I had to handle mimetype detection as well as the if-modified-since and if-none-match headers so that the browser caches static files correctly. I believe I borrowed this solution from either cherrypy or flask and simply rewrote it for falcon . I don't think it belongs in falcon itself, so I did not send them a pull request.
The solution seems ugly to me, but we have not found a more beautiful one yet.
Falcon's built-in WSGI web server is suitable only for development, so everything runs on the WSGI server implementation from cherrypy , which, as far as I know, is very stable:
d = wsgiserver.WSGIPathInfoDispatcher({'/': app})
server_start_params = (config.ip, config.port)
server = wsgiserver.CherryPyWSGIServer(server_start_params, d)
If someone knows a good and fast WSGI server for python, please share it in the comments. It has to be cross-platform, since Monitorrent also runs under Windows.
This is our first serious project in python, so there is a lot we don't know. Serving statics could probably be offloaded to some WSGI server, leaving only the REST requests to falcon . We would be grateful if someone tells us how to do it right.
It's hard for me to understand how one can live without a DI container, but in the python world it is not customary to use them; there have been plenty of holy wars on the topic. Unfortunately, no good solution was found, so we resorted to explicit injection of dependencies into all classes.
All trackers and torrent clients are implemented as plugins. For now these are the only plugin types, but notification plugins are coming soon: the corresponding pull request is awaiting review and will land in version 1.1.
We did not find an elegant system for loading plugins that could simply scan a folder and somehow load all the classes from it, so the implementation idea was borrowed from FlexGet.
Each plugin registers itself in the system, although it seems to me it would be more correct for the system to discover plugins itself rather than for a plugin to know how to register itself.
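Self-registration of this kind can be sketched in a few lines: a module-level registry keyed by plugin type, filled in as a side effect of importing each plugin module. Only the register_plugin name comes from the article; the rest is my illustration, not Monitorrent's actual implementation.

# A minimal sketch of plugin self-registration; the real Monitorrent
# implementation differs, only the register_plugin name is from the article.
plugins = {}


def register_plugin(plugin_type, name, instance):
    # Plugins are grouped by type ('client', 'tracker', ...) and looked up by name
    plugins.setdefault(plugin_type, {})[name] = instance


def get_plugins(plugin_type):
    return plugins.get(plugin_type, {})


class MyClientPlugin(object):
    name = "myclient"


# Executed at import time: the plugin module registers itself in the system
register_plugin('client', 'myclient', MyClientPlugin())

The downside, as noted above, is that nothing happens until the plugin module is imported, which is why a folder-scanning loader would be the more natural design.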
The plugin interface is very simple:
class MyClientPlugin(object):
    name = "myclient"
    form = [{
        ...
    }]

    def get_settings(self):
        pass

    def set_settings(self, settings):
        pass

    def check_connection(self):
        pass

    def find_torrent(self, torrent_hash):
        pass

    def add_torrent(self, torrent):
        pass

    def remove_torrent(self, torrent_hash):
        pass

register_plugin('client', 'myclient', MyClientPlugin())
The methods fall into 2 groups: one for storing torrent client settings, the other for managing torrents. The set_settings() and get_settings() methods save and read data from the database. The *_torrent() methods manage downloads. A torrent file is uniquely identified by its hash, so to delete or find an already downloaded (or still downloading) torrent it is enough to pass the torrent's hash; to add a torrent, naturally, the whole torrent has to be passed.
The library for parsing torrent files was taken from FlexGet. Where it got there from I could not find out (although I did not try very hard). A couple of small modifications were made to it, to support python 3 and to read from a raw byte array.
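The hash that identifies a torrent is the sha1 of the bencoded 'info' dictionary inside the .torrent file. The following stdlib-only sketch shows the idea with a toy bencoder; a real parser (like the one borrowed from FlexGet) must also decode arbitrary files, which is omitted here, and the sample torrent dict is of course made up.

import hashlib


def bencode(value):
    """Minimal bencoding, enough for this sketch."""
    if isinstance(value, int):
        return b'i%de' % value
    if isinstance(value, bytes):
        return b'%d:%s' % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode('utf-8'))
    if isinstance(value, list):
        return b'l' + b''.join(bencode(v) for v in value) + b'e'
    if isinstance(value, dict):
        # dictionary keys must be emitted in sorted order
        encoded = b'd'
        for key in sorted(value):
            encoded += bencode(key) + bencode(value[key])
        return encoded + b'e'
    raise TypeError('unsupported type: %r' % type(value))


def info_hash(torrent):
    """A torrent is uniquely identified by sha1 of its bencoded info dict."""
    return hashlib.sha1(bencode(torrent['info'])).hexdigest().upper()


torrent = {'announce': 'http://tracker.example/announce',
           'info': {'name': 'episode.mkv', 'length': 123456, 'piece length': 16384}}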
The form field describes the settings form for this plugin in the UI; how this works is described in the section on dynamic form generation above. Plugins are quite compact and easy to implement: the transmission plugin, for example, is only 115 lines, including a couple of comment lines and 7 imports.
In Monitorrent terms, any subscription to torrent changes is called a topic. For example, for lostfilm we follow changes of a show on its page rather than by parsing RSS, and when a new episode comes out we download a new torrent file, not a modified one. That is why calling a subscription a topic seems more fitting to me.
While the torrent client plugin contract is very simple and thus has no base class, the tracker plugin is more complicated. Let's start with the simple part of the tracker plugin interface:
class TrackerPluginBase(with_metaclass(abc.ABCMeta, object)):
    topic_form = [{
        ...
    }]

    @abc.abstractmethod
    def can_parse_url(self, url):
        pass

    def prepare_add_topic(self, url):
        pass

    def add_topic(self, url, params):
        pass

    def get_topics(self, ids):
        pass

    def save_topic(self, topic, last_update, status=Status.Ok):
        pass

    def get_topic(self, id):
        pass

    def update_topic(self, id, params):
        pass

    @abc.abstractmethod
    def execute(self, topics, engine):
        pass
There are methods for working with the settings of a particular topic, *_topic() , and a separate method get_topics() for fetching all topics.
A new torrent is added for monitoring by its topic URL. For rutracker that is the address of the forum page, for lostfilm the show's page. To find out which plugin can handle a given URL, each plugin's can_parse_url() method is called in turn; it checks via a regex whether it can work with this URL. If no such plugin is found, the user sees a message that the topic could not be added. If a plugin that understands the URL is found, its prepare_add_topic() method is called first; it returns a model with the parsed data and lets the user edit it. The editing form is described by the topic_form field. After the user edits the data and clicks the Add button, the add_topic() method is called; it receives the edited model and saves the topic into the monitoring database.
For now there is one property common to all topics: display_name , the title shown on the main page. For lostfilm you can also choose the quality of the downloaded episodes.
The biggest and most important method is execute(self, topics, engine) . It is responsible for checking for changes and downloading new episodes. It is given the list of its topics to check and a special engine object. The engine lets the plugin add new torrents to the torrent client and also provides an object for logging. There can be only one torrent client configured for downloads, and the plugin does not care which one it is: the engine is responsible for choosing the client, the plugin simply hands the downloaded torrent to the engine . When a show is distributed by adding new episodes to the same torrent, the engine deletes the previous download and adds the new one.
Since some trackers require authorization, a separate plugin type, WithCredentialsMixin , is implemented for them that can store login information. As the name implies, this class is a mixin (why exactly a mixin is explained below). Only plugins of this type currently have settings in the UI. This class adds a few more methods to the plugin interface:
class WithCredentialsMixin(with_metaclass(abc.ABCMeta, TrackerPluginMixinBase)):
    credentials_form = [{
        ...
    }]

    @abc.abstractmethod
    def login(self):
        pass

    @abc.abstractmethod
    def verify(self):
        pass

    def get_credentials(self):
        pass

    def update_credentials(self, credentials):
        pass

    def execute(self, ids, engine):
        if not self._execute_login(engine):
            return
        super(WithCredentialsMixin, self).execute(ids, engine)

    def _execute_login(self, engine):
        pass
The *_credentials() methods store and load authorization data; login() and verify() perform the login and validate it, respectively. The mixin also overrides the execute() method in order to first log in to the tracker (by calling _execute_login() ) and only then check the topics for changes. The credentials are edited through a dynamically generated form described by the credentials_form field.
Right now, checking for changes on all trackers except lostfilm is done by downloading the torrent file and comparing its hash with the previously downloaded one. If the hash differs, the new torrent is downloaded and added to the torrent client. It would probably be enough to check the page itself with a HEAD request or something similar, but this option is more reliable: the page, as it turned out, is bigger than the torrent file itself, and a mere new comment changes the page without the torrent changing. Besides, rutor did not support HEAD at all.
This logic lives in the execute() method of the ExecuteWithHashChangeMixin class. It is again a mixin, like WithCredentialsMixin . This lets you write a plugin by inheriting one or two mixins, depending on the tracker, and overriding only a couple of methods.
This is how the plugin for free-torrents.org is defined:
class FreeTorrentsOrgPlugin(WithCredentialsMixin, ExecuteWithHashChangeMixin, TrackerPluginBase):
    ...
    topic_form = [{
        ...
    }]

    def login(self):
        pass

    def verify(self):
        pass

    def can_parse_url(self, url):
        return self.tracker.can_parse_url(url)

    def parse_url(self, url):
        return self.tracker.parse_url(url)

    def _prepare_request(self, topic):
        headers = {'referer': topic.url, 'host': "dl.free-torrents.org"}
        cookies = self.tracker.get_cookies()
        request = requests.Request('GET', self.tracker.get_download_url(topic.url),
                                   headers=headers, cookies=cookies)
        return request.prepare()
As a result, only a couple of methods need overriding, while the most complex logic of checking for new torrents stays unchanged and is concentrated in WithCredentialsMixin and ExecuteWithHashChangeMixin .
The plugin for rutor.org uses only ExecuteWithHashChangeMixin :
class RutorOrgPlugin(ExecuteWithHashChangeMixin, TrackerPluginBase): pass
And the plugin for lostfilm uses only WithCredentialsMixin , because it has its own implementation for detecting changes:
class LostFilmPlugin(WithCredentialsMixin, TrackerPluginBase): pass
The lostfilm plugin is quite complex, a whole 640 lines. The login via bogi is especially tricky, but everything has been working like clockwork for more than 7 months.
In other languages this would be implemented somewhat differently, but I am glad that python allows multiple inheritance, which is only worth doing through mixins, and you must be careful to specify the inherited classes in the correct order. This is probably the only case where multiple inheritance seems to me to simplify code.
Unfortunately, distributions on rutor are sometimes deleted and you have to look for a new one. Monitorrent can track deleted distributions and highlight such topics to the user on the main screen. The execute() method is responsible for this logic as well.
The database is sqlite. We have a ticket for supporting other databases, but I don't think it is necessary: system complexity would not grow much, yet a lot of tests would have to be written against the various databases. Besides, there is now a small amount of code strictly tied to sqlite.
sqlalchemy is used as the ORM. It is a powerful and convenient ORM that supports class inheritance in its mapping to the database. It is sqlalchemy that will simplify the transition if we ever decide to add support for other databases.
Monitorrent supports data and schema migrations. Out of the box sqlalchemy lacks this functionality, but there is another project from the authors of sqlalchemy, alembic , which we use for this purpose.
The sqlite driver for python has a couple of limitations. One of them is that you cannot use transactions together with schema modifications, which is sometimes critical when migrating the database to a new version. A solution to this problem is described on the sqlalchemy site, and from there the code was moved into Monitorrent . Migrations now work without problems.
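The essence of the limitation is that pysqlite issues implicit commits around DDL, so an ALTER TABLE silently escapes the surrounding transaction. The recipe on the sqlalchemy site boils down to taking transaction control away from the driver; with the stdlib sqlite3 module the same effect is isolation_level=None plus explicit BEGIN/ROLLBACK. This is a demonstration of the quirk, not Monitorrent's exact migration code.

import sqlite3

# isolation_level=None stops pysqlite from issuing implicit BEGIN/COMMIT,
# so we control transactions ourselves and DDL stays inside them
conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE topics (id INTEGER PRIMARY KEY, url TEXT)')

conn.execute('BEGIN')
conn.execute('ALTER TABLE topics ADD COLUMN display_name TEXT')
conn.execute("INSERT INTO topics (url, display_name) VALUES ('http://x', 'X')")
conn.execute('ROLLBACK')  # sqlite DDL is transactional: the column disappears too

columns = [row[1] for row in conn.execute('PRAGMA table_info(topics)')]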
By now almost all tracker plugins have grown their own migrations, from the oldest versions to the latest release. Migration support is implemented through an upgrade method that is passed when registering the plugin. The upgrade method first determines the current version with various checks, such as the existence of columns and tables, and then migrates from that version to the latest one.
Sample migration code for rutor:
def upgrade(engine, operations_factory):
    if not engine.dialect.has_table(engine.connect(), RutorOrgTopic.__tablename__):
        return
    version = get_current_version(engine)
    if version == 0:
        upgrade_0_to_1(engine, operations_factory)
        version = 1
    if version == 1:
        upgrade_1_to_2(operations_factory)
        version = 2


def get_current_version(engine):
    m = MetaData(engine)
    t = Table(RutorOrgTopic.__tablename__, m, autoload=True)
    if 'url' in t.columns:
        return 0
    if 'hash' in t.columns and not t.columns['hash'].nullable:
        return 1
    return 2


def upgrade_0_to_1(engine, operations_factory):
    m0 = MetaData()
    rutor_topic_0 = Table("rutororg_topics", m0,
                          Column('id', Integer, primary_key=True),
                          Column('name', String, unique=True, nullable=False),
                          Column('url', String, nullable=False, unique=True),
                          Column('hash', String, nullable=False),
                          Column('last_update', UTCDateTime, nullable=True))
    m1 = MetaData()
    topic_last = Table('topics', m1, *[c.copy() for c in Topic.__table__.columns])
    rutor_topic_1 = Table('rutororg_topics1', m1,
                          Column("id", Integer, ForeignKey('topics.id'), primary_key=True),
                          Column("hash", String, nullable=False))

    def topic_mapping(topic_values, raw_topic):
        topic_values['display_name'] = raw_topic['name']

    with operations_factory() as operations:
        if not engine.dialect.has_table(engine.connect(), topic_last.name):
            topic_last.create(engine)
        operations.upgrade_to_base_topic(rutor_topic_0, rutor_topic_1, PLUGIN_NAME,
                                         topic_mapping=topic_mapping)
In one of the very first versions every plugin had its own table for storing topics. Later, common fields such as url and display_name were moved into the topics table. In code this is implemented as inheritance of all topic classes from the base Topic class.
Since many plugins needed the same migration of data into the common topics table, this logic was extracted into the helper method MonitorrentOperations.upgrade_to_base_topic :
def upgrade_to_base_topic(self, v0, v1, polymorphic_identity, topic_mapping=None, column_renames=None):
    from .plugins import Topic
    self.create_table(v1)
    topics = self.db.query(v0)
    for topic in topics:
        raw_topic = row2dict(topic, v0)
        # insert into topics
        topic_values = {c: v for c, v in list(raw_topic.items())
                        if c in Topic.__table__.c and c != 'id'}
        topic_values['type'] = polymorphic_identity
        if topic_mapping:
            topic_mapping(topic_values, raw_topic)
        result = self.db.execute(Topic.__table__.insert(), topic_values)
        # get topic.id
        inserted_id = result.inserted_primary_key[0]
        # insert into v1 table
        concrete_topic = {c: v for c, v in list(raw_topic.items()) if c in v1.c}
        concrete_topic['id'] = inserted_id
        if column_renames:
            column_renames(concrete_topic, raw_topic)
        self.db.execute(v1.insert(), concrete_topic)
    # drop original table
    self.drop_table(v0.name)
    # rename new created table to old one
    self.rename_table(v1.name, v0.name)
All database access goes through a DBSession class that supports the python with statement:
with DBSession() as db:
    cred = db.query(self.credentials_class).first()
    cred.c_uid = self.tracker.c_uid
    cred.c_pass = self.tracker.c_pass
    cred.c_usess = self.tracker.c_usess
The DBSession() idea was also borrowed from FlexGet. It is very convenient.
Checking for updates runs in the background: it is built on threading.Thread and threading.Timer , with a stop() method to cancel a scheduled check. By default topics are checked every 2 hours, and the interval can be changed in the settings.
Monitorrent is started from server.py . By default the cherrypy WSGI server listens on port 6687.
The following settings are supported:
debug - debug mode; among other things it affects how JWT is handled. Default: false.
ip - the address to listen on. Default: 0.0.0.0.
port - the port to listen on. Default: 6687.
db_path - path to the database file. Default: monitorrent.db.
config - path to the configuration file. Default: config.py.
Settings can be specified in a configuration file ( config.py by default). It is a regular python file that is read and executed via exec ; for python 3 compatibility, exec_ from the six library is used:

with open(config_path) as config_file:
    six.exec_(compile(config_file.read(), config_path, 'exec'), {}, parsed_config)
Settings can also be set via environment variables: MONITORRENT_DEBUG , MONITORRENT_IP , MONITORRENT_PORT and MONITORRENT_DB_PATH .
This is especially convenient when running in the docker container, where editing a config.py inside the container is awkward.
We try to keep the python code covered by tests at 100%. The only part not covered is server.py , the application entry point. Tests are written with python's standard unittest . Sqlite makes testing easy: the tests run against a real database created from scratch, so the database-related code is tested for real rather than against mocks.
The hardest thing to test in Monitorrent is the interaction with trackers. For this we use vcrpy , which can record real HTTP requests and responses into cassettes and then replay them by monkey-patching requests . That is, the first run of a test goes to the real back-end and records everything; afterwards the tests run against the recorded responses, quickly and without network access. For scenarios that are inconvenient to record, such as emulating 404 responses, httpretty is used: it fakes HTTP at a lower level. The idea of using vcrpy was also borrowed from FlexGet. Thanks to vcrpy and httpretty , coverage is about 97%.
Coverage is tracked by two services at once: coveralls.io and codecov.io .
There was one amusing problem with parsing lostfilm's html:
parser = None
# lxml have some issue with parsing lostfilm on Windows
if sys.platform == 'win32':
    parser = 'html5lib'
soup = get_soup(r.text, parser)
This branch never executes on Linux, where the tests run, so it broke the 100% coverage on coveralls.io and codecov.io. In the end the code was rewritten into a single line:
# lxml have some issue with parsing lostfilm on Windows, so replace it on html5lib for Windows
soup = get_soup(r.text, 'html5lib' if sys.platform == 'win32' else None)
codecov.io also has a Chrome extension that shows coverage right in the github interface, which is very handy.
There are no tests for the front-end yet; I hope they will appear someday. Thanks to vcrpy , though, the back-end tests are effectively integration tests.
Monitorrent is built on two CI services: ci.appveyor.com for Windows and travis-ci.org for Linux. Appveyor checks that everything works on Windows, while travis runs the tests and uploads coverage to coveralls.io and codecov.io.
On drone.io we build docker images for x86/x64 and ARM.
Development follows git flow on github: master contains the code of the latest release, and work on issues goes into develop.
We use Semantic Versioning ; the current release is 1.0.0.
For planning we use the ZenHub extension for Chrome & Firefox, which adds Boards and Burndown charts on top of github issues. We also tried waffle.io , but settled on ZenHub.
There are plans for further development: for example, proxy support for trackers, which should not be hard to add since all HTTP in Monitorrent goes through the requests library, plus notification plugins and UI improvements. Monitorrent runs both under Windows and on the cubietruck. If you find a problem, please file an issue on github; pull requests are welcome there as well. Give Monitorrent a try, and we hope it will make your life as simple and convenient as it made ours.
Source: https://habr.com/ru/post/305574/