
An attempt at an assessment. Just the facts.
Sauhard Sahi, a Princeton student, did a small study to assess what kind of content makes up global torrent traffic. To do this, he connected to the Mainline DHT, the main DHT used by BitTorrent, uTorrent, Transmission, and other clients (Azureus/Vuze uses a different DHT by default, but a plug-in lets it use the Mainline DHT), and collected data and file fragments for 1021 randomly selected torrents that were being actively shared.
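For readers unfamiliar with the Mainline DHT: it is specified in BEP 5, and nodes exchange small bencoded KRPC messages over UDP. The Python sketch below only builds the `get_peers` query that a client sends to ask a node for peers sharing a torrent identified by its 20-byte info hash; it does not open a socket. The `bencode` and `get_peers_query` helpers are illustrative and not part of any library, the node ID and info hash are the placeholder values from the BEP 5 example, and the source does not say exactly which DHT messages the study relied on.

```python
def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists, and dicts using BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are length-prefixed: <length>:<raw bytes>
        return b"%d:%b" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted (raw byte) order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k), v)
            for k, v in value.items()
        )
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def get_peers_query(transaction_id: bytes, node_id: bytes, info_hash: bytes) -> bytes:
    """Build a KRPC 'get_peers' query message as laid out in BEP 5."""
    if len(node_id) != 20 or len(info_hash) != 20:
        raise ValueError("node_id and info_hash must be 20 bytes each")
    return bencode({
        "t": transaction_id,  # opaque transaction ID echoed back in the reply
        "y": "q",             # message type: query
        "q": "get_peers",     # method name
        "a": {"id": node_id, "info_hash": info_hash},  # query arguments
    })


if __name__ == "__main__":
    # Placeholder IDs taken from the BEP 5 example, not real DHT data.
    msg = get_peers_query(b"aa", b"abcdefghij0123456789", b"mnopqrstuvwxyz123456")
    print(msg)
```

A real client would send these bytes in a UDP datagram to a known DHT node and decode the bencoded reply, which carries either peer addresses or closer nodes to query next.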
Note that for each file one can only say that it is among those being actively shared; nothing can be said about how popular it is or how many people are seeding or downloading it. In addition, files were not downloaded in full: only a characteristic fragment was retrieved, enough to get an idea of the file, or of the torrent's contents when the torrent contained many files.
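In BitTorrent, content is transferred in fixed-size pieces, so retrieving a fragment amounts to fetching one or more pieces rather than a whole file. In a multi-file torrent, the files are laid end to end in the order listed in the torrent's info dictionary (BEP 3) and then cut into pieces, so a single piece can straddle two files. The hypothetical Python helper below illustrates that layout only; it is not taken from the study, and the file sizes and piece length in the example are made up.

```python
from typing import List, Tuple


def files_in_piece(file_lengths: List[int], piece_length: int,
                   piece_index: int) -> List[Tuple[int, int, int]]:
    """Return (file_index, offset_in_file, byte_count) for each file a piece overlaps.

    Assumes the BEP 3 multi-file layout: files are concatenated in order and
    split into pieces of piece_length bytes (the last piece may be shorter).
    """
    total = sum(file_lengths)
    piece_start = piece_index * piece_length
    if piece_index < 0 or piece_start >= total:
        raise IndexError("piece index out of range")
    piece_end = min(piece_start + piece_length, total)

    result = []
    file_start = 0
    for i, length in enumerate(file_lengths):
        file_end = file_start + length
        # Intersect [piece_start, piece_end) with [file_start, file_end).
        lo = max(piece_start, file_start)
        hi = min(piece_end, file_end)
        if lo < hi:
            result.append((i, lo - file_start, hi - lo))
        file_start = file_end
    return result


if __name__ == "__main__":
    # Three made-up files of 300, 500 and 200 bytes with 256-byte pieces.
    print(files_in_piece([300, 500, 200], 256, 3))  # prints [(1, 468, 32), (2, 0, 200)]
```

This is why a single retrieved piece can hint at the contents of more than one file in the torrent.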
It is also worth noting that using the DHT made the analysis independent of the specifics of any particular tracker; however, it presumably excluded from the study any torrents and clients that do not use the DHT (if such exist).
The analysis produced the following results.
By content type, the sampled torrents broke down as follows:
46% - movies and video shows (excluding porn)
14% - games and software
14% - porn (videos and photos)
10% - music
1% - books and manuals
1% - pictures
14% - could not be classified
Movies and video shows
This category consists mainly of AVI files, along with a number of other formats such as RMVB (RealVideo), MPEG, raw DVD (DVD rips), and various multi-volume RAR archives containing such content. Interestingly, this segment shows a clear skew toward recent films.
Of these randomly selected films and videos, 60% were in English, 8% in Spanish, 7% in Russian, 5% in Polish, 5% in Japanese, 4% in Chinese, 3% in French, and 1% in Italian; 2% were in various other languages, and the language of 4% could not be determined.
Games and software
No single file type dominated this category. The main types were ISO images, multi-volume RAR archives, and EXE files (Windows executables). The games were for various platforms, such as the Xbox 360, Nintendo Wii, and Windows PC. 74% of games and software were in English, 12% in Japanese, 5% in Spanish, 4% in Chinese, 2% in Polish, and 1% each in Russian and French.
Porn
As in the “movies” category, the dominant format here is AVI, but MPEG and WMV files are noticeably more common. Most porn torrents contain the full video, a 1-5 minute sample clip, and a JPG poster.
Porn videos were difficult to date, but our impression was that, unlike the “movies” category with its pronounced skew toward new releases, porn is spread more evenly over time.
We found that 53% of porn videos were in English, 16% in Chinese, 15% in Japanese, 6% in Russian, 3% in German, and 2% in French; 2% could not be classified, and other languages, such as Italian, Hindi, and Spanish, accounted for no more than 1% each.
Music
The dominant file type in this category is MP3, though some albums appeared as WMA files, ISO images, or multi-volume RAR archives. There is also a consistent skew toward new releases, although it is less pronounced than for movies, perhaps because seeders keep sharing music even after it is no longer new, so these torrents remain visible in the DHT.
By language, this category breaks down as follows: 78% English, 6% Russian, 4% Spanish, 2% Japanese, and 2% Chinese; each of the remaining, rarer languages accounts for no more than 1%.
Books and manuals
Books and manuals are a clear minority: only 15 torrents could be classified as such, 13 in English, 1 in French, and 1 in Russian. The sample also included a set of national park posters, a collection of BMW car pictures (both in English), and a Japanese comic.
Copyright status
Our last classification attempts to estimate what percentage of torrents infringe copyright.
We classified a file as non-infringing if it fell into one of three categories: in the public domain, freely available from legitimate sources, or user-generated.
Based on this classification, all 476 torrents in the “movies and video shows” category were judged to be infringing. Of the 148 torrents in the “games and software” category, seven appeared to be non-infringing (including two Linux distributions, an add-on pack for a game, and some free software and beta versions). In the “porn” category, one of the 145 videos appeared to be amateur content, and we classified it as non-infringing. All 98 music torrents were infringing. Two of the 15 torrents in the “books and manuals” category appeared to be non-infringing.
In total, the authors found that only 10 of the 1021 torrents could be considered clearly non-infringing, which is approximately 1%.
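The roughly 1% figure follows directly from the per-category counts given for the copyright classification. The short Python sketch below simply redoes that arithmetic; the dictionary layout and names are illustrative. The denominator is the full sample of 1021 torrents, so torrents outside these five categories count toward the total but not toward the non-infringing tally.

```python
# Per-category counts from the study: (torrents examined, judged non-infringing).
CATEGORIES = {
    "movies and video shows": (476, 0),
    "games and software": (148, 7),
    "porn": (145, 1),
    "music": (98, 0),
    "books and manuals": (15, 2),
}
TOTAL_TORRENTS = 1021  # size of the whole random sample

# Sum the non-infringing torrents across all categories.
non_infringing = sum(n for _, n in CATEGORIES.values())
share = non_infringing / TOTAL_TORRENTS

print(f"{non_infringing} of {TOTAL_TORRENTS} torrents = {share:.2%}")  # prints "10 of 1021 torrents = 0.98%"
```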
This result should be treated with caution: the authors may have missed some files, and because, under the chosen methodology, entire files were not downloaded, the fragments available to them could have given a wrong impression of a file's copyright status. Nevertheless, the survey data lead to the conclusion that today the BitTorrent network, in the overwhelming majority of cases, is used almost exclusively to distribute illegally copied content that infringes the copyrights of creators and rights holders.
The original English text was published on the blog of Princeton’s Center for Information Technology by a member of the center's staff who supervised the student who conducted the study.