Torrent file: what's inside it?

Introduction

Good day.
Like many people, I use a large torrent tracker, rutracker.org, but one of its features annoys me.
It adds an ix*.rutracker.net address to the tracker list, and I do not understand what that address is for. It often (in my case, almost always) returns errors (502 Bad Gateway and 0 No Response), and the torrent client (I use Transmission) then marks the torrent as broken. That alone bothers me, and a Transmission quirk makes it worse: the client sets the torrent's status from the tracker's most recent response. So ix* is polled, it returns an error, and the torrent is marked Broken. After some minutes or seconds the next tracker in the list is polled (bt*.rutracker.org or retracker.local), it returns a success code, and the torrent becomes normal again. This back-and-forth does not make me particularly happy.

The obvious solution is to remove this bad address from the list. However, I have a lot of torrent files and did not want to edit each one by hand, nor did I want an extra manual step every time I add a new torrent. So I decided to figure out the file format and automate removing the tracker from the list.

Bencode

Bencode is the data encoding format used in .torrent files. It is used almost nowhere else, although I also noticed it in the format Transmission uses to store its own service data.
Libraries for working with this format exist for most current languages, but not for C++. Yes, of course, there is one such library, but it is pure C, and its data representation did not appeal to me. Since the format is extremely simple, I reinvented the wheel and wrote my own small implementation.
The format defines 4 data types: byte string, integer, list and associative array.

Let's go in order:

Integers are written as i<sequence of digits>e, where the digits are in their ASCII representation, i.e. 1 is written as '1' (0x31). For example, 42 is encoded as i42e, and a leading minus sign gives a negative number (i-3e). Note that this allows arbitrarily large numbers that fit in neither long nor long long, but most implementations ignore the missing limit and use 64-bit integers.
A byte string is written as <length>:<bytes>, for example 4:spam. The length is likewise an unbounded sequence of decimal digits.
A list is written as l<list items>e. An item can be of any type, including a nested list. As the format shows, the end is marked by the literal 'e'.
An associative array is written as d<elements>e. Each element has the form <byte string><value>: the byte string is the name (key) of the entry, and the value can again be anything: a list, a byte string, an associative array or a number.

That's all there is to it. The file itself is a sequence of such elements, so decoding is extremely simple:

void CTorrentFile::ReadBencElement(ifstream & fin, tree <BencElement>::pre_order_iterator & parent, string name)
{
    // fin must be opened with ios::binary, otherwise seekg() and the
    // byte-string lengths do not line up with the file contents.
    BencElement el;
    char c = fin.get();
    el.name = name;
    if (c == 'i') {
        // Integer: i<digits>e
        el.type = BencInteger;
        fin >> el.integer;
        fin.seekg(1, ios_base::cur); // skip the closing 'e' left by operator>>
        m_tree.append_child(parent, el);
    } else if (c == 'l') {
        // List: l<elements>e
        int l = fin.peek();
        el.type = BencList;
        tree <BencElement>::pre_order_iterator it = m_tree.append_child(parent, el);
        while (l != 'e' && l != ifstream::traits_type::eof()) {
            ReadBencElement(fin, it, string(""));
            l = fin.peek();
        }
        fin.seekg(1, ios_base::cur); // skip the closing 'e'
    } else if (c == 'd') {
        // Dictionary: d(<key><value>)*e, every key is a byte string
        int l = fin.peek();
        el.type = BencDict;
        tree <BencElement>::pre_order_iterator it = m_tree.append_child(parent, el);
        while (l != 'e' && l != ifstream::traits_type::eof()) {
            string key;
            int len;
            fin >> len;
            fin.seekg(1, ios_base::cur); // skip ':'
            while (len--) {
                char s = fin.get();
                key += s;
            }
            ReadBencElement(fin, it, key);
            l = fin.peek();
        }
        fin.seekg(1, ios_base::cur); // skip the closing 'e'
    } else if (c >= '0' && c <= '9') {
        // Byte string: <length>:<bytes>
        fin.seekg(-1, ios_base::cur); // put the first length digit back
        int len;
        el.type = BencString;
        fin >> len;
        el.bstr.len = len;
        fin.seekg(1, ios_base::cur); // skip ':'
        el.bstr.byteStr = new char[len + 1];
        fin.read(el.bstr.byteStr, len);
        el.bstr.byteStr[len] = 0; // zero terminator for convenient string use
        m_tree.append_child(parent, el);
    }
}
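The method above depends on the author's CTorrentFile class and the tree.hh container, so it cannot be compiled on its own. For experimenting with the format, here is a minimal self-contained sketch of the same recursive descent over an in-memory buffer. The names BValue, BDecoder and decodeBencode are hypothetical and not part of the author's code. Unlike a plain operator>>, it also rejects integers that do not fit into 64 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One bencoded value. Dictionary entries keep their file order.
struct BValue {
    enum class Type { Int, Str, List, Dict } type = Type::Int;
    std::int64_t i = 0;                                // Type::Int
    std::string s;                                     // Type::Str (raw bytes)
    std::vector<BValue> list;                          // Type::List
    std::vector<std::pair<std::string, BValue>> dict;  // Type::Dict
};

class BDecoder {
public:
    explicit BDecoder(const std::string& data) : d_(data) {}

    BValue parseValue() {
        BValue v;
        const char c = peek();
        if (c == 'i') {                        // i<digits>e
            ++pos_;
            v.type = BValue::Type::Int;
            v.i = parseNumber('e', true);
        } else if (c == 'l') {                 // l<items>e
            ++pos_;
            v.type = BValue::Type::List;
            while (peek() != 'e') v.list.push_back(parseValue());
            ++pos_;                            // consume the closing 'e'
        } else if (c == 'd') {                 // d(<key><value>)*e
            ++pos_;
            v.type = BValue::Type::Dict;
            while (peek() != 'e') {
                std::string key = parseString();
                v.dict.emplace_back(std::move(key), parseValue());
            }
            ++pos_;                            // consume the closing 'e'
        } else if (c >= '0' && c <= '9') {     // <length>:<bytes>
            v.type = BValue::Type::Str;
            v.s = parseString();
        } else {
            throw std::runtime_error("unexpected character in bencode data");
        }
        return v;
    }

private:
    const std::string& d_;
    std::size_t pos_ = 0;

    char peek() const {
        if (pos_ >= d_.size()) throw std::runtime_error("unexpected end of bencode data");
        return d_[pos_];
    }

    // Reads ASCII digits up to `terminator`, then consumes the terminator.
    // Throws instead of silently wrapping when the value exceeds 64 bits.
    std::int64_t parseNumber(char terminator, bool allowNegative) {
        bool negative = false;
        if (allowNegative && peek() == '-') { negative = true; ++pos_; }
        const std::int64_t maxValue = std::numeric_limits<std::int64_t>::max();
        std::int64_t n = 0;
        std::size_t digits = 0;
        while (peek() != terminator) {
            const char c = d_[pos_++];
            if (c < '0' || c > '9') throw std::runtime_error("invalid digit");
            const int digit = c - '0';
            if (n > (maxValue - digit) / 10) throw std::runtime_error("integer overflow");
            n = n * 10 + digit;
            ++digits;
        }
        if (digits == 0) throw std::runtime_error("empty number");
        ++pos_;
        return negative ? -n : n;
    }

    std::string parseString() {
        const std::int64_t len = parseNumber(':', false);
        if (static_cast<std::uint64_t>(len) > d_.size() - pos_)
            throw std::runtime_error("byte string runs past end of data");
        std::string s = d_.substr(pos_, static_cast<std::size_t>(len));
        pos_ += static_cast<std::size_t>(len);
        return s;
    }
};

// Decodes the first bencoded value in `data`.
BValue decodeBencode(const std::string& data) {
    BDecoder decoder(data);
    return decoder.parseValue();
}
```

To decode a .torrent file, you would read the whole file in binary mode into a std::string and pass it to decodeBencode. The result is the top-level dictionary.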

Encoding is just as easy:

void CTorrentFile::WriteBencElement(std::ofstream & fout, tree <BencElement>::sibling_iterator & el)
{
    // fout must be opened with ios::binary so that bytes are written unchanged.
    tree <BencElement>::sibling_iterator it;
    switch (el->type) {
    case BencInteger:
        fout << 'i' << el->integer << 'e';
        break;
    case BencString:
        fout << el->bstr.len << ':';
        fout.write(el->bstr.byteStr, el->bstr.len);
        break;
    case BencList:
        fout << 'l';
        it = m_tree.child(el, 0);
        for (size_t i = 0; i < m_tree.number_of_children(el); i++, ++it)
            WriteBencElement(fout, it);
        fout << 'e';
        break;
    case BencDict:
        fout << 'd';
        it = m_tree.child(el, 0); // reuse the outer iterator instead of shadowing it
        for (size_t i = 0; i < m_tree.number_of_children(el); i++, ++it) {
            // key as a byte string (write() also copes with embedded zero bytes), then the value
            fout << it->name.length() << ':';
            fout.write(it->name.data(), it->name.length());
            WriteBencElement(fout, it);
        }
        fout << 'e';
        break;
    }
}
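A torrent is identified by the SHA-1 hash of its bencoded info dictionary. A tool that edits only announce and announce-list therefore has to write info back byte for byte. Below is a self-contained sketch of an encoder that keeps dictionary entries in their stored order and copies byte strings verbatim, which is what makes such a round trip possible. The BValue type and the function names are hypothetical and are not the author's.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Value shape: dictionary entries are kept in insertion (file) order.
struct BValue {
    enum class Type { Int, Str, List, Dict } type = Type::Int;
    std::int64_t i = 0;
    std::string s;
    std::vector<BValue> list;
    std::vector<std::pair<std::string, BValue>> dict;
};

// Writes a byte string as <length>:<bytes>; embedded '\0' bytes are preserved.
void encodeString(const std::string& bytes, std::string& out) {
    out += std::to_string(bytes.size());
    out += ':';
    out += bytes;
}

// Appends the bencoded form of `v` to `out`.
void encodeBencode(const BValue& v, std::string& out) {
    switch (v.type) {
    case BValue::Type::Int:
        out += 'i';
        out += std::to_string(v.i);
        out += 'e';
        break;
    case BValue::Type::Str:
        encodeString(v.s, out);
        break;
    case BValue::Type::List:
        out += 'l';
        for (const BValue& item : v.list) encodeBencode(item, out);
        out += 'e';
        break;
    case BValue::Type::Dict:
        out += 'd';
        // Keys are written in stored order. The BitTorrent spec requires them
        // to be sorted, and a well-formed input file already is.
        for (const auto& [key, value] : v.dict) {
            encodeString(key, out);
            encodeBencode(value, out);
        }
        out += 'e';
        break;
    }
}
```

The design choice here is to preserve the input order rather than re-sorting the keys. Re-sorting would silently change the info hash of a torrent that was not sorted to begin with.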

The structure of a .torrent file.

As I wrote above, Bencode is used for encoding.
It is worth adding that whenever a byte string is meant to be read as text (keys of associative arrays, ordinary string fields), it is encoded in UTF-8.

The content is one large associative array with the following fields:

info - a nested associative array that describes the files the torrent transfers.
announce - the tracker URL. Together with info, this is a required field; everything else is optional.
announce-list - the list of trackers, if there are several. In Bencode form it is a list of lists.
creation date - the creation date, as a UNIX timestamp.
comment - a text description of the torrent. rutracker.org stores a link to the forum topic here.
created by - the program that created this torrent.

It is worth mentioning that the protocol does not deal with individual files but with pieces. The files contained in the torrent are concatenated into a single byte array, which is then split into relatively small pieces. The BitTorrent protocol works with the data in this form.

The info associative array consists of:

piece length - the size of one piece: 512 kilobytes, 1 megabyte, and so on. A piece size that is too small produces too many pieces, which inflates the .torrent file.
pieces - a string containing the concatenated SHA-1 hashes of all pieces, in order. Its length is 20 * the number of pieces.
name - the suggested name of the file (for a single-file torrent) or of the root directory (for several files). Alas, many torrent clients treat this suggestion as a hard rule.
length - present for a single-file torrent; contains the length of the file in bytes.
files - present if there are several files: a list of associative arrays, one per file.
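The relation between piece length, the total size and pieces can be checked with a little arithmetic. The number of pieces is the total size divided by piece length, rounded up (the last piece may be shorter), and pieces holds 20 bytes per piece. The total size is length for a single-file torrent, or the sum of the length fields in files for a multi-file one. The helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

constexpr std::uint64_t kSha1Size = 20;  // bytes per SHA-1 digest in "pieces"

// Number of pieces covering totalLength bytes; the last piece may be shorter.
std::uint64_t pieceCount(std::uint64_t totalLength, std::uint64_t pieceLength) {
    if (pieceLength == 0) throw std::invalid_argument("piece length must be positive");
    return (totalLength + pieceLength - 1) / pieceLength;
}

// Half-open byte range [first, second) of piece `index` in the concatenated data.
std::pair<std::uint64_t, std::uint64_t> pieceRange(std::uint64_t totalLength,
                                                   std::uint64_t pieceLength,
                                                   std::uint64_t index) {
    if (index >= pieceCount(totalLength, pieceLength)) throw std::out_of_range("piece index");
    const std::uint64_t begin = index * pieceLength;
    return {begin, std::min(begin + pieceLength, totalLength)};
}

// Raw 20-byte SHA-1 digest of piece `index`, cut out of the "pieces" string.
std::string pieceHash(const std::string& pieces, std::uint64_t index) {
    if ((index + 1) * kSha1Size > pieces.size()) throw std::out_of_range("piece index");
    return pieces.substr(index * kSha1Size, kSha1Size);
}

// True if "pieces" has exactly 20 bytes per piece.
bool piecesLengthValid(const std::string& pieces, std::uint64_t totalLength,
                       std::uint64_t pieceLength) {
    return pieces.size() == kSha1Size * pieceCount(totalLength, pieceLength);
}
```

For example, 1,500,000 bytes split into 512 KiB (524,288-byte) pieces gives 3 pieces. The last piece covers bytes 1,048,576 to 1,500,000, and pieces is 60 bytes long.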

Format of the items in the files list:

length - the length of the file in bytes.
path - a list of strings specifying the path. Each string is one component of the path relative to the root directory of the torrent. For the path a/b/c/d.jpg the list contains 4 strings: ['a', 'b', 'c', 'd.jpg'].
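Turning a files entry into a location on disk means appending the path components one by one under the directory named by info's name field. Below is a minimal sketch using C++17 std::filesystem. The function name torrentFilePath is hypothetical. Rejecting empty, ".", ".." and separator-containing components is a defensive choice of this sketch, because a crafted torrent could otherwise point outside the download directory.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Builds the location of one file of a multi-file torrent:
// <downloadDir>/<info.name>/<path[0]>/.../<path[n-1]>.
fs::path torrentFilePath(const fs::path& downloadDir, const std::string& torrentName,
                         const std::vector<std::string>& components) {
    if (components.empty()) throw std::invalid_argument("empty path list");
    fs::path result = downloadDir / torrentName;
    for (const std::string& part : components) {
        // Each list element is a single path component; refuse anything that
        // contains a separator or could climb out of the torrent's root.
        if (part.empty() || part == "." || part == ".." ||
            part.find_first_of("/\\") != std::string::npos)
            throw std::invalid_argument("unsafe path component: " + part);
        result /= part;
    }
    return result;
}
```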

That's basically it.
Right now we only need one field, announce-list (the code also checks the single announce field). We go over this list, find the unwanted tracker and cut it out:

int CTorrentFile::RemoveTracker(const char * mask)
{
    int deletedCount = 0;
    tree <BencElement>::pre_order_iterator root = m_tree.child(m_tree.begin(), 0);
    // walk the fields of the top-level dictionary
    for (tree <BencElement>::sibling_iterator it = m_tree.begin(root); it != m_tree.end(root); ++it) {
        if (it->type == BencString && !it->name.compare("announce") && it->bstr.len > 0 && it->bstr.byteStr) {
            // single tracker: blank out the URL if it matches
            if (wildcardMatch(it->bstr.byteStr, mask)) {
                it->bstr.len = 0;
                it->bstr.byteStr[0] = 0;
                deletedCount++;
            }
        } else if (it->type == BencList && !it->name.compare("announce-list")) {
            // announce-list is a list of lists of URLs. erase() returns the next
            // sibling, so the iterator is advanced only when nothing was erased;
            // a loop over a precomputed child count would skip elements.
            tree <BencElement>::sibling_iterator trackerList = m_tree.begin(it);
            while (trackerList != m_tree.end(it)) {
                if (trackerList->type != BencList) {
                    ++trackerList;
                    continue;
                }
                tree <BencElement>::sibling_iterator tracker = m_tree.begin(trackerList);
                while (tracker != m_tree.end(trackerList)) {
                    if (tracker->type == BencString && tracker->bstr.len > 0 && tracker->bstr.byteStr
                        && wildcardMatch(tracker->bstr.byteStr, mask)) {
                        tracker = m_tree.erase(tracker);
                        deletedCount++;
                    } else {
                        ++tracker;
                    }
                }
                // drop inner lists that became empty
                if (m_tree.number_of_children(trackerList) == 0)
                    trackerList = m_tree.erase(trackerList);
                else
                    ++trackerList;
            }
        }
    }
    return deletedCount;
}
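The article does not show wildcardMatch itself. The sketch below is one possible '*'/'?' glob matcher: a greedy scan that backtracks to the last '*' on a mismatch. It is paired with the same announce-list cleanup, performed on a plain std::vector of tiers (the inner lists) instead of the author's tree. Both bodies, and the name removeTrackers, are assumptions. Using std::remove_if sidesteps the pitfall of advancing an index while elements are being erased.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Glob-style match: '*' matches any run of characters (including none),
// '?' matches exactly one character, anything else matches itself.
bool wildcardMatch(const char* text, const char* mask) {
    const char* starMask = nullptr;  // mask position just after the last '*'
    const char* starText = nullptr;  // text position where that '*' started matching
    while (*text) {
        if (*mask == '*') {
            starMask = ++mask;
            starText = text;
        } else if (*mask == '?' || *mask == *text) {
            ++mask;
            ++text;
        } else if (starMask) {
            // Mismatch: let the last '*' swallow one more character and retry.
            mask = starMask;
            text = ++starText;
        } else {
            return false;
        }
    }
    while (*mask == '*') ++mask;  // trailing '*' also matches the empty rest
    return *mask == '\0';
}

// Removes every URL matching `mask` from an announce-list (a list of tiers,
// each a list of URLs), then drops tiers that became empty.
// Returns the number of URLs removed.
int removeTrackers(std::vector<std::vector<std::string>>& announceList, const char* mask) {
    int removed = 0;
    for (auto& tier : announceList) {
        auto newEnd = std::remove_if(tier.begin(), tier.end(),
            [mask](const std::string& url) { return wildcardMatch(url.c_str(), mask); });
        removed += static_cast<int>(tier.end() - newEnd);
        tier.erase(newEnd, tier.end());
    }
    announceList.erase(std::remove_if(announceList.begin(), announceList.end(),
                           [](const std::vector<std::string>& tier) { return tier.empty(); }),
                       announceList.end());
    return removed;
}
```

With the pattern from this article, http://ix*rutracker.net/*, a URL such as http://ix3.rutracker.net/ann?uk=... matches, while http://bt2.rutracker.org/... does not.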

Everything is put together in a single source file:
Download: cross-platform (Windows and *nix); boost::filesystem is required.

Usage is simple:
torrentEditor <file name> <pattern>, where the pattern is a wildcard string ('*' and '?'); in my case it is http://ix*rutracker.net/*
If you pass a directory name instead of a file name, the directory is traversed recursively and every *.torrent file in it is modified.
The backup of <name>.torrent is saved as <name>.old.
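The directory mode can be sketched with C++17 std::filesystem, which grew out of boost::filesystem and offers a very similar interface. This is not the author's code. processTorrents and its edit callback are hypothetical, and the callback stands in for the decode, remove-tracker and encode step.

```cpp
#include <filesystem>
#include <functional>
#include <vector>

namespace fs = std::filesystem;

// Applies `edit` to `target` if it is a file, or to every *.torrent file found
// recursively under it if it is a directory. Each original is first copied to
// <name>.old. Returns how many files `edit` reported as changed.
int processTorrents(const fs::path& target, const std::function<bool(const fs::path&)>& edit) {
    std::vector<fs::path> files;
    if (fs::is_directory(target)) {
        // Collect first, then modify: whether a running directory iterator
        // sees files created during the walk is unspecified.
        for (const fs::directory_entry& entry : fs::recursive_directory_iterator(target))
            if (entry.is_regular_file() && entry.path().extension() == ".torrent")
                files.push_back(entry.path());
    } else {
        files.push_back(target);
    }
    int changed = 0;
    for (const fs::path& file : files) {
        fs::path backup = file;
        backup.replace_extension(".old");  // a.torrent -> a.old
        fs::copy_file(file, backup, fs::copy_options::overwrite_existing);
        if (edit(file)) ++changed;
    }
    return changed;
}
```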

Daemons and watch-directory.

This lets us process the existing .torrent files and cut out the tracker, but what about new files?
I use a convenient feature, the watch directory: you drop a .torrent file there, and the client finds it in that folder and adds it automatically.
However, I don't want to cut out the tracker by hand at all; I want this to happen automatically.
So I wrote a simple daemon that monitors its own watch directory, removes the tracker, and moves the file into the torrent client's watch directory.
For me as a user nothing has changed: I drop files into the same folder and get the torrent in the client.

We write the daemon in C, using a wonderful facility, inotify:

// argv[1] - directory to watch, argv[2] - client's watch directory, argv[3] - tracker pattern
notifyDesc = inotify_init();
if (notifyDesc < 0)
    exit(EXIT_FAILURE);
watchDesc = inotify_add_watch(notifyDesc, argv[1], IN_CREATE);
if (watchDesc < 0)
    exit(EXIT_FAILURE);
// endless loop
while (1) {
    processEvents(notifyDesc, argv[2], argv[3], argv[1]);
}

We initialize inotify with inotify_init(), then add the directory to watch with inotify_add_watch(). We are only interested in file creation, so we pass the IN_CREATE flag. After that, we spin in an endless loop processing the directory's events.
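Below is a Linux-only C++ sketch of the same setup. It makes one deliberate change that is an assumption, not the author's choice: it watches IN_CLOSE_WRITE | IN_MOVED_TO instead of IN_CREATE. IN_CREATE is reported as soon as the file appears, possibly before whoever is writing it has finished. IN_CLOSE_WRITE fires when a file opened for writing is closed, and IN_MOVED_TO fires when a finished file is moved into the directory. The helper names watchDirectory and readNewFiles are hypothetical.

```cpp
#include <sys/inotify.h>
#include <unistd.h>

#include <stdexcept>
#include <string>
#include <vector>

// Opens an inotify instance watching `dir` for files that were closed after
// writing (IN_CLOSE_WRITE) or moved into it (IN_MOVED_TO).
int watchDirectory(const std::string& dir) {
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0) throw std::runtime_error("inotify_init1 failed");
    if (inotify_add_watch(fd, dir.c_str(), IN_CLOSE_WRITE | IN_MOVED_TO) < 0) {
        close(fd);
        throw std::runtime_error("inotify_add_watch failed");
    }
    return fd;
}

// Blocks in read() until events arrive, then returns the names of the
// non-directory entries reported in this batch.
std::vector<std::string> readNewFiles(int fd) {
    alignas(inotify_event) char buf[4096];  // inotify(7) asks for this alignment
    const ssize_t len = read(fd, buf, sizeof buf);
    if (len <= 0) throw std::runtime_error("read from inotify descriptor failed");
    std::vector<std::string> names;
    for (const char* p = buf; p < buf + len;) {
        const auto* ev = reinterpret_cast<const inotify_event*>(p);
        if (ev->len > 0 && !(ev->mask & IN_ISDIR)) names.emplace_back(ev->name);
        p += sizeof(inotify_event) + ev->len;  // events are variable-length
    }
    return names;
}
```

A daemon loop would then call readNewFiles(fd) forever and hand each name to its processing step.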

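static void processEvents(int fd, char * moveDir, char * pattern, char * watchDir)
{
#define BUF_SIZE ((sizeof(struct inotify_event) + FILENAME_MAX) * 10)
    // the buffer must be suitably aligned for struct inotify_event
    char buf[BUF_SIZE] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len, i = 0;
    // blocking read: we wake up when the directory changes
    len = read(fd, buf, BUF_SIZE);
    if (len <= 0)
        return;
    while (i < len) {
        struct inotify_event * ev = (struct inotify_event *)&buf[i];
        if (ev->len > 0) // skip events that carry no file name
            processNewFile(ev->name, moveDir, pattern, watchDir);
        // events are variable-length: header plus ev->len bytes of name
        i += sizeof(struct inotify_event) + ev->len;
    }
}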

The blocking read() call returns control to us as soon as a relevant change occurs in one of the watched directories, so we do not load the CPU at all while waiting.
The file processing itself is nothing interesting: a couple of rename() calls and one system() call.
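The article does not show the processing step. Below is a hypothetical reconstruction that matches the description: two rename() calls around one system() call. The working name <file>.part, the default editor name and the shellQuote helper are assumptions. Keep in mind that rename() only works within one filesystem, so both watch directories should live on the same one.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Quotes `s` for /bin/sh: wraps it in single quotes and turns each ' into '\''.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += '\'';
    return out;
}

// Hypothetical processNewFile(): take a new .torrent out of the watch
// directory, run the editor on it, then hand it to the client's directory.
bool processNewFile(const std::string& name, const std::string& moveDir,
                    const std::string& pattern, const std::string& watchDir,
                    const std::string& editor = "torrentEditor") {
    const std::string ext = ".torrent";
    if (name.size() <= ext.size() ||
        name.compare(name.size() - ext.size(), ext.size(), ext) != 0)
        return false;                                   // not a .torrent file
    const std::string src = watchDir + "/" + name;
    const std::string work = src + ".part";             // hypothetical working name
    const std::string dst = moveDir + "/" + name;
    // 1st rename: claim the file under a name this function ignores.
    if (std::rename(src.c_str(), work.c_str()) != 0) return false;
    // system(): run the editor as <editor> '<file>' '<pattern>'.
    const std::string cmd = editor + " " + shellQuote(work) + " " + shellQuote(pattern);
    const int status = std::system(cmd.c_str());
    // 2nd rename: deliver the file even if editing failed, so no torrent is lost.
    const bool moved = std::rename(work.c_str(), dst.c_str()) == 0;
    return moved && status == 0;
}
```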

Daemonization is also standard:

// create child process
pid = fork();
// error?
if (pid < 0)
    exit(EXIT_FAILURE);
// parent? exit and leave the child running in the background
if (pid > 0)
    exit(EXIT_SUCCESS);
// new session for the child, detached from the controlling terminal
sid = setsid();
if (sid < 0)
    exit(EXIT_FAILURE);
// change current directory (relative paths from argv no longer resolve as before)
if (chdir("/") < 0)
    exit(EXIT_FAILURE);
// close the standard descriptors
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);

Source code

Source: https://habr.com/ru/post/119753/
