Recent rumors about blocking torrent trackers (or are they no longer rumors?) prompted me to write my own parser for rutracker.org. In this article I describe the script's options and its output. Also included are a viewer with search by category and a database of torrents with descriptions as of January 16, 2016.

Why?
- When I started working on the script, rutracker had not yet published its own database.
- This database includes the torrent descriptions.
- A similar article appeared on Habr in 2013, but unfortunately that database was never updated after publication, and the repository was last updated on January 16, 2014.
- This is my first large Python application. It let me get acquainted with many interesting features of the language, from parallelizing tasks to building a Windows binary.
An example of running the script to mirror the site:
python3 ./loader.py --ids 0000001 5160000 --threads 200 --qsize 25 --resume
Description of parameters
Option | Description |
---|---|
--ids 0000001 0001000 | downloads the torrents in the specified ID range |
--ids_file file_with_ids.txt | takes the IDs of the torrents to download from the specified file |
--ids_ignore old_finish.txt | excludes from the download the torrents that are not in the specified file (for example, to skip the IDs that were absent from the previous download) |
--random | downloads the torrents in random order |
--threads 100 | number of download threads |
--proxy_file proxy.txt | file with SOCKS5 proxies (default: proxy.txt) |
--login_file login.txt | file with logins for the site (default: login.txt) |
--resume | continues the previous download (skips the torrents listed in finished.txt) |
--print | together with --resume, prints the number of downloaded / not yet downloaded torrents and exits |
--folder descriptions | name of the directory where the torrent descriptions are saved (default: descr) |
--qsize 20 | maximum size of the download queue (default: 30) |
Auxiliary text files
Format of the login file
username1 password1
username2 password2
Note: the modification date shown for a torrent depends on the time zone set in the account profile.
Proxy file format
127.0.0.1 8080
127.0.0.1 8081
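A minimal sketch of how these two files could be read, assuming one whitespace-separated pair per line as in the samples above. The function names are hypothetical and are not taken from loader.py; a password that contains spaces would not survive this simple split.

```python
def parse_pairs(text):
    """Return (first, second) for every line that holds exactly two fields."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # silently skip blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


def parse_logins(text):
    # Each line: "username password"
    return parse_pairs(text)


def parse_proxies(text):
    # Each line: "host port"; the port is converted to an integer
    return [(host, int(port)) for host, port in parse_pairs(text)]
```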
While running
The IDs of downloaded or non-existent torrents are recorded in finished.txt. The log is written both to the console and to log.txt. Cookies are stored in temp_cookies.txt so that they persist between sessions.
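The resume behaviour can be sketched roughly as follows. This assumes finished.txt holds one numeric ID per line, which the article does not state explicitly; `remaining_ids` is a hypothetical helper, not a function from the script.

```python
def remaining_ids(first_id, last_id, finished_text):
    """Return the IDs in [first_id, last_id] that are not listed as finished."""
    # Assumed format: whitespace-separated numeric IDs, e.g. "0000002"
    finished = {int(tok) for tok in finished_text.split() if tok.isdigit()}
    return [i for i in range(first_id, last_id + 1) if i not in finished]
```

For a range as large as the one in the example command, a generator would avoid building the whole list in memory.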
Result Description
The file table.txt stores information about the torrents (everything except the descriptions). For each torrent the following fields are saved, separated by \t:
- Identifier
- Torrent name
- Size (in bytes)
- Number of seeders
- Number of peers
- Hash
- Number of downloads
- Creation date
- Category, including all subcategories (separator: "|")
Example:
4130425 Mark Lutz/ - Python, 4- [2011, PDF, RUS] 12799942 390 9 B507A45DA54ED5EED13221B16E2030DF789A235F 46455 28-08-12 11:28 | |
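A row in this format could be parsed along these lines. It is only a sketch: the field names are my own, and the date is kept as a string because its exact format is not documented here.

```python
from collections import namedtuple

# Field names are illustrative; they follow the order of the list above.
Torrent = namedtuple(
    "Torrent",
    "torrent_id name size seeders peers info_hash downloads created categories",
)


def parse_row(line):
    """Parse one tab-separated line of table.txt into a Torrent tuple."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated fields, got %d" % len(fields))
    ident, name, size, seeders, peers, info_hash, downloads, created, cats = fields
    return Torrent(
        torrent_id=int(ident),
        name=name,
        size=int(size),          # bytes
        seeders=int(seeders),
        peers=int(peers),
        info_hash=info_hash,
        downloads=int(downloads),
        created=created,         # left as text
        categories=[c.strip() for c in cats.split("|") if c.strip()],
    )
```

The test row used to check this sketch has a made-up title and made-up category names.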
Torrent descriptions are saved as separate files in the descr folder (or the one given by the --folder option), in a subdirectory named after the first three digits of the torrent number.
For example, the description of torrent 04893221 is saved in descr\048\04893221.
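The path rule can be expressed in a few lines. This sketch assumes that IDs are zero-padded to eight digits, as in the example 04893221; `description_path` is a hypothetical name.

```python
import os


def description_path(torrent_id, folder="descr"):
    """Build <folder>/<first three digits>/<eight-digit id>."""
    name = "%08d" % int(torrent_id)  # assumed width: 8 digits
    return os.path.join(folder, name[:3], name)
```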
The main problems that arose while writing the script
- The site imposes limits per login and per IP address. In the end, the simplest solution was to use each IP address and each login in only one thread at a time.
- The multiprocessing module creates a worker with the fork system call, which copies the current process. So if all the configs are read and the download list is built first, the parent process takes more than 50 MB of memory and every child is just as heavy. The solution is to create the process pool first and only then read the configs.
- The same torrent information can appear in slightly different forms in the page source on different pages. Solution: whenever errors appeared, I manually added handling for the new variant to the script.
- Working with the network stack, especially through untested proxy servers, produces all sorts of errors. A proxy that causes a large number of errors is blocked; a page received with an error is added to the end of the queue to be downloaded again.
- Many assorted bugs while writing a multiprocess application. The most effective debugging method was logging absolutely everything at different verbosity levels.
- When creating the viewer, the main problem was searching the torrents. MySQL full-text search refused to match parts of a word (only whole-word matches). Brute-force search takes too long. The current trade-off is that the list is sorted by the number of seeders and searched sequentially until a specified number of results is found.
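The error handling described in the list above (block a proxy after too many errors, push a failed page to the end of the queue) can be illustrated with a simplified, single-threaded sketch. It is not the author's code: the thresholds, the round-robin proxy choice, and the `fetch` callback are all assumptions made for the example.

```python
from collections import Counter, deque


def process(ids, proxies, fetch, max_proxy_errors=3, max_attempts=5):
    """Fetch every id, requeueing failures and dropping unreliable proxies.

    fetch(torrent_id, proxy) is a caller-supplied function that returns the
    page or raises OSError on a network failure (hypothetical interface).
    """
    queue = deque(ids)
    proxy_errors = Counter()
    attempts = Counter()
    active = list(proxies)
    done = {}
    turn = 0
    while queue and active:
        torrent_id = queue.popleft()
        proxy = active[turn % len(active)]  # simple round-robin
        turn += 1
        try:
            done[torrent_id] = fetch(torrent_id, proxy)
        except OSError:
            proxy_errors[proxy] += 1
            if proxy_errors[proxy] >= max_proxy_errors:
                active.remove(proxy)          # block the bad proxy
            attempts[torrent_id] += 1
            if attempts[torrent_id] < max_attempts:
                queue.append(torrent_id)      # retry later, at the end
    return done, active
```

The real script runs many workers in parallel, so it would need shared queues and counters instead of local ones.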
Viewer
To browse the saved database and search the torrents, there is a viewer written with PyQt5. It requires the file table_sorted.tar.bz2, which contains the text file table_sorted.txt (table.txt sorted by the number of seeders). You can use this script for the conversion.
Optionally, you can also put the descr directory, containing the archived torrent descriptions, next to the script. I pack them with this script.
An example of the directory and archive structure:
descr/000/00000.tar.bz2
descr/000/00001.tar.bz2
...
descr/000/00099.tar.bz2
descr/001/00100.tar.bz2
descr/001/00101.tar.bz2
...
descr/001/00199.tar.bz2
...
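Read literally, the listing puts one hundred archives in each three-digit subdirectory. A small sketch of that naming rule follows; `archive_path` is a hypothetical helper, and how an archive number relates to the torrent IDs it contains is not stated in the article, so it is left out.

```python
import posixpath


def archive_path(archive_number, folder="descr"):
    """Map an archive number to <folder>/<number // 100>/<number>.tar.bz2."""
    subdir = "%03d" % (archive_number // 100)
    return posixpath.join(folder, subdir, "%05d.tar.bz2" % archive_number)
```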
The viewer's appearance is shown in the image at the top of the article.
Search options
- A minus sign before a word excludes torrents containing that word from the results.
- limit: 5 sets the number of results after which the search stops (the default is 20).
- In the right-hand field, you can enter words to search for in the category name.
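These options, combined with a sequential scan over a list already sorted by seeders, could look roughly like this. The sketch assumes case-insensitive substring matching and accepts the limit with or without a space after the colon; the function names and the row layout are hypothetical and are not the viewer's actual code.

```python
import re


def parse_query(query, default_limit=20):
    """Split a query into included words, excluded words and a result limit."""
    limit = default_limit
    match = re.search(r"limit:\s*(\d+)", query)
    if match:
        limit = int(match.group(1))
        query = query[:match.start()] + " " + query[match.end():]
    include, exclude = [], []
    for word in query.lower().split():
        if word.startswith("-") and len(word) > 1:
            exclude.append(word[1:])
        elif word != "-":
            include.append(word)
    return include, exclude, limit


def search(rows, query, category=""):
    """Scan (name, category) rows in order and stop once the limit is reached."""
    include, exclude, limit = parse_query(query)
    cat_words = category.lower().split()
    results = []
    for name, cat in rows:
        lname, lcat = name.lower(), cat.lower()
        if (all(w in lname for w in include)
                and not any(w in lname for w in exclude)
                and all(w in lcat for w in cat_words)):
            results.append((name, cat))
            if len(results) >= limit:
                break  # rows are sorted by seeders, so the best hits come first
    return results
```

Stopping early is what keeps the scan fast for common words; a rare word still forces a pass over the whole list.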
Double-clicking a hash copies the corresponding magnet link to the clipboard.
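A magnet link can be built from the hash alone using the standard `urn:btih` form. This is a minimal sketch; whether the viewer also appends a display name or tracker URLs is not stated here, and `magnet_link` is a hypothetical name.

```python
import re


def magnet_link(info_hash):
    """Return a bare magnet URI for a 40-character hexadecimal info hash."""
    info_hash = info_hash.strip()
    if not re.fullmatch(r"[0-9A-Fa-f]{40}", info_hash):
        raise ValueError("expected a 40-character hexadecimal info hash")
    return "magnet:?xt=urn:btih:" + info_hash.upper()
```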
Requirements
python3-pyqt5, python3-pyqt5.qtwebkit
Installation
The script runs under Python 3.
Compiled binary for Windows: Mega.nz (30 MB, including all necessary libraries).
Database as of January 16, 2016
File with the basic torrent information, sorted by the number of seeders: Mega.nz (118 MB).
Torrent descriptions (unpack into the folder with the viewer): Mega.nz (2.06 GB).
Any criticism of the code is welcome.
In the future, I would like to add libtorrent support to the viewer so that it can show the contents of a torrent and stream it to a player.
I also have saved databases from June 2014 (without categories) and from July 2015 (with categories). If there is interest, I can compute some statistics on the changes between these snapshots.
The script sources are posted on GitHub.