
How to run your own torrent search engine based on RuTracker?

I will try to write with as little filler as possible: a minimum of superfluous, distracting information and ranting, and a maximum of useful information and working code. I will not go into why anyone would want their own torrent search engine based on RuTracker. I do not consider myself a programming guru either; we will simply build this site together. We will use Apache + PHP, MySQL and Sphinx. A warning up front: on minimal virtual hosting the site will not be very fast.


Database


First we need the database itself. Every month RuTracker publishes a dump of its torrents here. Download and unpack it, and you will see about two dozen CSV files.

[Screenshot: the unpacked CSV files]
We only need the files that contain information about torrents; delete the rest. The file “category_info.csv” gives a hint for those who do not want to open each file (delete “category_1.csv”, “category_4.csv” and “category_36.csv”). Open any of the remaining files and you will see the following structure (I replaced each “;” separator with a line break to make it easier to read):
"1568" - section ID on RuTracker
"Cooking" - section name
"63629" - topic ID on RuTracker
"F7D7BE97A818CCDFA072C42348EB669F7883888D" - torrent hash
"(Cooking) Tasty Stories 1" - torrent name
"729927066" - size of the distribution in bytes
"2006-08-21 10:00:22" - date of the distribution

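Each line is a quoted, ";"-separated record, so PHP's built-in CSV parser can take care of the quotes instead of trimming them by hand as the import script does. A minimal sketch, assuming every torrent line has exactly the seven fields shown above; parseDumpLine() is a hypothetical helper, not part of the article's code, and its array keys simply mirror the columns of the torrents table.

```php
<?php
// Hypothetical helper: parse one line of the RuTracker dump into named fields.
// Field order follows the structure above; the keys mirror the table columns.
function parseDumpLine(string $line): ?array
{
    $line = trim($line);
    if ($line === '') {
        return null; // empty line, e.g. the newline at the end of the file
    }
    // ";" separates the fields, '"' encloses them; the escape character is passed explicitly
    $fields = str_getcsv($line, ';', '"', '\\');
    if (count($fields) !== 7) {
        return null; // not a complete torrent record
    }
    return [
        'cat_id'   => (int)$fields[0],
        'cat_name' => $fields[1],
        'topic_id' => (int)$fields[2],
        'hash'     => $fields[3],
        'name'     => $fields[4],
        'size'     => (int)$fields[5],
        'date'     => $fields[6],
    ];
}

$row = parseDumpLine('"1568";"Cooking";"63629";"F7D7BE97A818CCDFA072C42348EB669F7883888D";"(Cooking) Tasty Stories 1";"729927066";"2006-08-21 10:00:22"');
print_r($row);
```

Returning null for incomplete lines lets a caller skip them instead of inserting half-empty rows.
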
Now let's load all this information into a database. We use MySQL as the most common option. Here is the table I ended up with (note: the “hash” column is unique, and all text data is stored in utf8):

SQL table
CREATE TABLE IF NOT EXISTS `torrents` (
  `id` int(11) NOT NULL,
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `hash` varchar(40) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `date` date NOT NULL,
  -- sizes above 2 GB do not fit into a signed INT, hence BIGINT
  `size` bigint(20) unsigned NOT NULL,
  `topic_id` int(11) NOT NULL,
  `cat_id` int(11) NOT NULL,
  `cat_name` varchar(120) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf32 COLLATE=utf32_bin;

ALTER TABLE `torrents`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `hash` (`hash`);

ALTER TABLE `torrents`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;


Next, upload all the CSV files to one folder on the server (let's call it “db”, for example). The torrent information is loaded into the database by the simple script below; upload it to the same folder as the CSV files.

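File insert_to_db.php
<?php
// Loading a large file can take a while: allow up to 3 minutes
set_time_limit(180);

// Connect to MySQL and select the database (the credentials are examples)
$db = mysqli_connect("localhost", "torrent", "password", "torrent") or die("Could not connect to MySQL");
// All data is transferred in utf8
mysqli_set_charset($db, "utf8");

// The CSV file name comes from the URL parameter "f";
// basename() keeps the script from opening files outside this folder
$fp = fopen(basename($_GET['f'] ?? ''), "r") or die("Could not open file");

// Read the file line by line until the end
while (($line = fgets($fp)) !== false) {
    // Strip the trailing newline and skip empty lines
    $tmp = trim($line);
    if ($tmp === '') continue;
    // Split the line on the separator: fields are wrapped in quotes and joined by ";"
    $torrent = explode('";"', $tmp);
    if (count($torrent) < 7) continue;
    // Remove the opening quote of the first field and the closing quote of the last one
    $torrent[0] = substr($torrent[0], 1);
    $torrent[6] = substr($torrent[6], 0, -1);
    // Uncomment to inspect a parsed line before inserting anything
    //print '<pre>'; print_r($torrent); exit();
    // Insert the torrent; IGNORE skips rows whose hash is already in the table
    mysqli_query($db, "INSERT IGNORE INTO `torrents` (`name`, `hash`, `date`, `size`, `topic_id`, `cat_id`, `cat_name`) VALUES ('"
        . mysqli_real_escape_string($db, $torrent[4]) . "', '"
        . mysqli_real_escape_string($db, $torrent[3]) . "', '"
        . mysqli_real_escape_string($db, $torrent[6]) . "', "
        . (int)$torrent[5] . ", "
        . (int)$torrent[2] . ", "
        . (int)$torrent[0] . ", '"
        . mysqli_real_escape_string($db, $torrent[1]) . "')");
}
// Close the file
fclose($fp);
// Report which file has been processed
print 'complete: ' . htmlspecialchars($_GET['f']);
?>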


Open the URL "http://site.ru/db/insert_to_db.php?f=category_10.csv" in a browser, and do the same for every CSV file. Yes, all of this could be automated, but I deliberately kept it manual so that everything is as clear as possible. After this, the table contained a little over 1.6 million records, which is not a small database. MySQL cannot search this much data quickly enough, so we will entrust this task to Sphinx.
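
Opening one URL per file can be automated. A minimal sketch, assuming the import is run from the command line (for example as a hypothetical import_all.php) and the CSV files sit in a db folder next to it: readDumpFolder() is our own name, not part of the article's code. It walks every category_*.csv file and yields complete seven-field records; each one would then be inserted exactly as in insert_to_db.php.

```php
<?php
// Hypothetical CLI helper: walk every category_*.csv file in a folder and
// yield its torrent records one by one (the deleted files must not be there).
function readDumpFolder(string $dir): Generator
{
    foreach (glob(rtrim($dir, '/') . '/category_*.csv') ?: [] as $file) {
        $fp = fopen($file, 'r');
        if ($fp === false) {
            continue; // unreadable file: skip it
        }
        // fgetcsv() strips the quotes; length 0 means "no line length limit"
        while (($fields = fgetcsv($fp, 0, ';', '"', '\\')) !== false) {
            // A blank line comes back as [null]; keep only complete records
            if (count($fields) === 7) {
                yield $fields;
            }
        }
        fclose($fp);
    }
}

$count = 0;
foreach (readDumpFolder(__DIR__ . '/db') as $fields) {
    // Each $fields array would be inserted here, as in insert_to_db.php
    $count++;
}
echo "records: $count\n";
```

Using a generator keeps memory usage flat: only one record is held at a time, which matters with 1.6 million rows.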


Sphinx


Sphinx is installed differently on different systems, depending on the operating system and hardware. That topic deserves a separate article, and there are plenty of good installation guides on the Internet, including in Russian. Here we will only set up the Sphinx configuration file. In the site's root directory, create a directory, for example "cache"; all Sphinx index files for our site will be stored there. Upload the configuration file (listing below) to this folder.

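File torrents.conf
# Data source: where Sphinx takes the data to index
source torrentz
{
    # Database type and connection settings
    type = mysql
    sql_host = localhost
    sql_user = torrent
    sql_pass = password
    sql_db = torrent
    sql_port = 3306
    # Read all data in utf8
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET CHARACTER SET utf8
    # Documents to index: the first column is the document ID
    sql_query = SELECT id, name FROM torrents
    # Pause between ranged queries in milliseconds (0 = no pause)
    sql_ranged_throttle = 0
}

# Index: what Sphinx indexes and how
index torrentz
{
    # Data source declared above
    source = torrentz
    # Path and name prefix of the index files (without extension)
    path = /home/rutr/rutracker.online/www/cache/torrentz
    # Store document attributes in separate files
    docinfo = extern
    # Stemming for English and Russian
    morphology = stem_enru
    # Minimum length of an indexed word
    min_word_len = 2
    # Text encoding
    charset_type = utf-8
    # Word characters; uppercase Latin and Cyrillic letters are folded to lowercase
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    # Index infixes of 2+ characters (substring search)
    min_infix_len = 2
    # Allow the "*" wildcard in queries
    enable_star = 1
}

# Indexer settings
indexer
{
    # Memory limit for indexing
    mem_limit = 32M
}

# Search daemon settings
searchd
{
    # Address and port the daemon listens on (search.php connects here)
    listen = 127.0.0.1:3312
    # Daemon log
    log = /home/rutr/rutracker.online/www/cache/searchd.log
    # Query log
    query_log = /home/rutr/rutracker.online/www/cache/query.log
    # Client request read timeout, in seconds
    read_timeout = 5
    # Maximum number of concurrent child processes
    max_children = 30
    # PID file
    pid_file = /home/rutr/rutracker.online/www/cache/searchd.pid
    # Maximum number of best matches kept per query
    max_matches = 1000
}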


Connect to the server via SSH. Before Sphinx can search our database, it needs an index. Run the command:

 indexer --config /home/rutr/rutracker.online/www/cache/torrents.conf --all 

Indexing takes a while; the duration depends on the server's capacity. In my case it took about 10 minutes.

[Screenshot: indexer output]

When indexing finishes, check that everything works by running a search from the console (the search phrase goes after the config file):

 search --config /home/rutr/rutracker.online/www/cache/torrents.conf morrowind mod 

[Screenshot: search results in the console]

If you see something similar to the screenshot above, indexing was successful. If nothing is found, do not move on to the next command. To launch the Sphinx search daemon, run:

 searchd --config /home/rutr/rutracker.online/www/cache/torrents.conf 

[Screenshot: searchd startup output]

Note that the daemon has to be started again after every server reboot. To stop the daemon (if needed), add "--stop" to the end of the command above.
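
Because searchd does not come back by itself after a reboot, the search page would simply fail to reach it. A minimal sketch of a reachability check, assuming the listen address from torrents.conf (127.0.0.1:3312); searchdIsUp() is a hypothetical helper, and it only checks that something accepts TCP connections on that port, not that it is actually searchd.

```php
<?php
// Hypothetical helper: does anything accept TCP connections on the searchd address?
function searchdIsUp(string $host = '127.0.0.1', int $port = 3312, float $timeout = 1.0): bool
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning fsockopen() emits when the connection is refused
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}

echo searchdIsUp() ? "searchd is running\n" : "searchd is not reachable\n";
```

Such a check could let the site show a friendly "search is temporarily unavailable" message instead of an empty results page.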

Web


I did not spend long choosing a framework for the web interface. The requirements are simple: ease of use, responsive design and support for all modern browsers. The good old, if somewhat overused, Bootstrap fits them perfectly. There is no need to download the distribution: the stylesheet can be linked from a CDN. The main page is pure HTML with no PHP, so I think comments on the code would be superfluous.

File index.php
<!DOCTYPE html>
<html lang="ru">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="icon" href="/favicon.ico">
  <title>  RuTracker</title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">
  <style type="text/css">
    .inCenter { margin: auto; position: absolute; top: 0; left: 0; bottom: 0; right: 0; }
    .inCenter.isResp { width: 50%; height: 50%; min-width: 400px; max-width: 800px; padding: 40px; }
  </style>
</head>
<body>
  <div class="container">
    <div class="row">
      <div class="inCenter isResp">
        <div class="col-sm-12 col-md-10 col-md-offset-1">
          <form action="search.php" method="GET">
            <div class="form-group text-center">
              <h1>  RuTracker</h1>
            </div>
            <div class="form-group input-group">
              <input class="form-control input-lg" type="text" name="q" placeholder=""/>
              <span class="input-group-btn">
                <button class="btn btn-primary input-lg" type="submit"><i class="glyphicon glyphicon-search"></i></button>
              </span>
            </div>
          </form>
        </div>
      </div>
    </div>
  </div>
</body>
</html>


The main page turned out very minimalistic and functional.

[Screenshot: the main page]

The search script is more interesting. First we need the Sphinx API for PHP; the latest version can be found here. Briefly, the script works like this (details are in the listing): we include the API file, configure the search, run it and display the results in a convenient form. Torrents can be downloaded straight from the search results, without additional clicks.
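
Downloading straight from the results works through magnet links built from the info-hash stored in the hash column. A small sketch of that construction; magnetLink() is a hypothetical helper, and the optional dn (display name) parameter is our own addition that the article's pages do not use.

```php
<?php
// Hypothetical helper: build a magnet link from a torrent's info-hash.
// "dn" (display name) is optional; it lets a client show the name right away.
function magnetLink(string $hash, ?string $name = null): string
{
    $link = 'magnet:?xt=urn:btih:' . $hash;
    if ($name !== null && $name !== '') {
        // rawurlencode() encodes spaces as %20 and also handles UTF-8 (Cyrillic) names
        $link .= '&dn=' . rawurlencode($name);
    }
    return $link;
}

echo magnetLink('F7D7BE97A818CCDFA072C42348EB669F7883888D', '(Cooking) Tasty Stories 1'), "\n";
// magnet:?xt=urn:btih:F7D7BE97A818CCDFA072C42348EB669F7883888D&dn=%28Cooking%29%20Tasty%20Stories%201
```

When printing the result into an href attribute, wrap it in htmlspecialchars(), because the "&" must be escaped in HTML.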

File search.php
<?php
// Take the search query from the URL (PHP has already URL-decoded $_GET)
$q = trim($_GET['q'] ?? '');
// An empty query makes no sense: return to the main page
if ($q === '') { header("Location: /"); exit(); }

// Connect to MySQL and select the database (the credentials are examples)
$db = mysqli_connect("localhost", "torrent", "password", "torrent") or die("Could not connect to MySQL");
// All data is transferred in utf8
mysqli_set_charset($db, "utf8");

// Sphinx API for PHP
require_once "sphinxapi.php";
// Create a Sphinx client
$sphinx = new SphinxClient();
// Address of the Sphinx daemon; the port must match "listen" in torrents.conf
$sphinx->SetServer('localhost', 3312);
// Find documents containing any of the query words
$sphinx->SetMatchMode(SPH_MATCH_ANY);
// Most relevant results first
$sphinx->SetSortMode(SPH_SORT_RELEVANCE);
// Return the first 50 results
$sphinx->SetLimits(0, 50);
// Search all indexes ("*"); torrents.conf defines only one: torrentz
$torrents = $sphinx->Query($q, '*');
// Uncomment to debug the search
//print $sphinx->GetLastError();
//print '<br><pre>'; print_r($torrents); exit();

// Convert a size in bytes into a human-readable string
function bytesToSize($bytes, $precision = 0) {
    $kilobyte = 1024;
    $megabyte = $kilobyte * 1024;
    $gigabyte = $megabyte * 1024;
    $terabyte = $gigabyte * 1024;
    if (($bytes >= 0) && ($bytes < $kilobyte)) { return $bytes . ' B'; }
    elseif (($bytes >= $kilobyte) && ($bytes < $megabyte)) { return round($bytes / $kilobyte, $precision) . ' Kb'; }
    elseif (($bytes >= $megabyte) && ($bytes < $gigabyte)) { return round($bytes / $megabyte, $precision) . ' Mb'; }
    elseif (($bytes >= $gigabyte) && ($bytes < $terabyte)) { return round($bytes / $gigabyte, $precision) . ' Gb'; }
    elseif ($bytes >= $terabyte) { return round($bytes / $terabyte, $precision) . ' Tb'; }
    else { return $bytes . ' B'; }
}
?>
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="icon" href="/favicon.ico">
  <title><?=htmlspecialchars($q)?></title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">
  <style type="text/css"> body { padding-top: 80px; padding-bottom: 20px; } </style>
</head>
<body>
  <nav class="navbar navbar-default navbar-fixed-top">
    <div class="container">
      <div class="navbar-header">
        <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
          <span class="sr-only"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
        </button>
        <a class="navbar-brand" href="/">  RuTracker</a>
      </div>
      <div id="navbar" class="navbar-collapse collapse">
        <form action="/search.php" method="GET" class="navbar-form navbar-left">
          <div class="form-group input-group">
            <input type="text" placeholder="" value="<?=htmlspecialchars($q)?>" class="form-control" name="q">
            <span class="input-group-btn">
              <button class="btn btn-primary" type="submit"><i class="glyphicon glyphicon-search"></i></button>
            </span>
          </div>
        </form>
      </div><!--/.navbar-collapse -->
    </div>
  </nav>
  <div class="container">
    <h1><?=htmlspecialchars($q)?></h1>
    <table class="table table-striped">
      <caption>Found: <?=(int)($torrents['total_found'] ?? 0)?></caption>
      <tbody>
<?php
// Sphinx returns only document IDs (the keys of "matches"); without matches there is nothing to fetch
if (!empty($torrents['matches'])) {
    // Comma-separated ID list for SQL (cast to int to keep the query safe)
    $ids = implode(',', array_map('intval', array_keys($torrents['matches'])));
    // Fetch the found torrents, keeping the relevance order from Sphinx
    $sql = "SELECT `id`, `name`, `hash`, `date`, `size` FROM `torrents` WHERE `id` IN (" . $ids . ") ORDER BY FIELD(`id`, " . $ids . ")";
    $r = mysqli_query($db, $sql) or die("Query failed");
    // One table row per torrent
    while ($f = mysqli_fetch_assoc($r)) {
        // The date is stored as YYYY-MM-DD; show it as DD.MM.YYYY
        $torrent_date = explode('-', $f['date']);
        $torrent_date = $torrent_date[2] . '.' . $torrent_date[1] . '.' . $torrent_date[0];
?>
        <tr>
          <td width="75%"><a href="/torrent.php?id=<?=$f['id']?>"><?=htmlspecialchars($f['name'])?></a></td>
          <td width="5%"><a href="magnet:?xt=urn:btih:<?=$f['hash']?>"><i class="glyphicon glyphicon-magnet"></i></a></td>
          <td width="10%"><?=bytesToSize($f['size'])?></td>
          <td width="10%"><?=$torrent_date?></td>
        </tr>
<?php
    }
}
?>
      </tbody>
    </table>
  </div><!-- /.container -->
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
</body>
</html>
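
The most fragile step in search.php is turning the Sphinx matches into the IN (...) list. A sketch of that step as a separate function, assuming the default SphinxClient result format in which document IDs are the keys of "matches" (the listing relies on this too); matchIdList() is a hypothetical name. Returning null lets the page skip the SQL query entirely when nothing is found, because IN () is a MySQL syntax error.

```php
<?php
// Hypothetical helper: safe comma-separated ID list from a Sphinx Query() result,
// or null when there is nothing to fetch (failed query or no matches).
function matchIdList($result): ?string
{
    if (!is_array($result) || empty($result['matches'])) {
        return null;
    }
    // Document IDs are the keys of "matches"; casting to int keeps the SQL safe
    $ids = array_map('intval', array_keys($result['matches']));
    return implode(',', $ids);
}

var_dump(matchIdList(false));                                // NULL
var_dump(matchIdList(['total_found' => 0]));                 // NULL
echo matchIdList(['matches' => [17 => [], 5 => []]]), "\n";  // 17,5
```

The order of the IDs is preserved, so the same string can feed both IN (...) and ORDER BY FIELD(...).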


For users' convenience, we will also make a separate page for each torrent, in case someone wants to send a link to it.

File torrent.php
<?php
// Take the torrent id from the URL; it must be a positive integer
$id = filter_var($_GET['id'] ?? '', FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
// No valid id: return to the main page
if ($id === false) { header("Location: /"); exit(); }

// Connect to MySQL and select the database (the credentials are examples)
$db = mysqli_connect("localhost", "torrent", "password", "torrent") or die("Could not connect to MySQL");
// All data is transferred in utf8
mysqli_set_charset($db, "utf8");
// Fetch the torrent with this id ($id is already an integer)
$r = mysqli_query($db, "SELECT * FROM `torrents` WHERE `id` = " . $id) or die("Query failed");
// No torrent with this id: return to the main page
if (mysqli_num_rows($r) == 0) { header("Location: /"); exit(); }
// Read the torrent data
$torrent = mysqli_fetch_assoc($r);
// The date is stored as YYYY-MM-DD; show it as DD.MM.YYYY
$torrent_date = explode('-', $torrent['date']);
$torrent_date = $torrent_date[2] . '.' . $torrent_date[1] . '.' . $torrent_date[0];

// Convert a size in bytes into a human-readable string
function bytesToSize($bytes, $precision = 0) {
    $kilobyte = 1024;
    $megabyte = $kilobyte * 1024;
    $gigabyte = $megabyte * 1024;
    $terabyte = $gigabyte * 1024;
    if (($bytes >= 0) && ($bytes < $kilobyte)) { return $bytes . ' B'; }
    elseif (($bytes >= $kilobyte) && ($bytes < $megabyte)) { return round($bytes / $kilobyte, $precision) . ' Kb'; }
    elseif (($bytes >= $megabyte) && ($bytes < $gigabyte)) { return round($bytes / $megabyte, $precision) . ' Mb'; }
    elseif (($bytes >= $gigabyte) && ($bytes < $terabyte)) { return round($bytes / $gigabyte, $precision) . ' Gb'; }
    elseif ($bytes >= $terabyte) { return round($bytes / $terabyte, $precision) . ' Tb'; }
    else { return $bytes . ' B'; }
}
?>
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="icon" href="/favicon.ico">
  <title><?=htmlspecialchars($torrent['name'])?></title>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">
  <style type="text/css"> body { padding-top: 80px; padding-bottom: 20px; } </style>
</head>
<body>
  <nav class="navbar navbar-default navbar-fixed-top">
    <div class="container">
      <div class="navbar-header">
        <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
          <span class="sr-only"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
        </button>
        <a class="navbar-brand" href="/">  RuTracker</a>
      </div>
      <div id="navbar" class="navbar-collapse collapse">
        <form action="/search.php" method="GET" class="navbar-form navbar-left">
          <div class="form-group input-group">
            <input type="text" placeholder="" value="" class="form-control" name="q">
            <span class="input-group-btn">
              <button class="btn btn-primary" type="submit"><i class="glyphicon glyphicon-search"></i></button>
            </span>
          </div>
        </form>
      </div><!--/.navbar-collapse -->
    </div>
  </nav>
  <div class="container">
    <h1><?=htmlspecialchars($torrent['name'])?></h1>
    <table class="table table-striped">
      <tbody>
        <tr>
          <th width="20%">Download:</th>
          <td><a href="magnet:?xt=urn:btih:<?=$torrent['hash']?>"><i class="glyphicon glyphicon-magnet"></i> Magnet</a></td>
        </tr>
        <tr>
          <th width="20%">Size:</th>
          <td><?=bytesToSize($torrent['size'])?></td>
        </tr>
        <tr>
          <th width="20%">Date:</th>
          <td><?=$torrent_date?></td>
        </tr>
        <tr>
          <th width="20%">Section:</th>
          <td><a target="_blank" href="http://rutracker.org/forum/viewforum.php?f=<?=$torrent['cat_id']?>"><?=htmlspecialchars($torrent['cat_name'])?></a></td>
        </tr>
        <tr>
          <th width="20%">Topic:</th>
          <td><a target="_blank" href="http://rutracker.org/forum/viewtopic.php?t=<?=$torrent['topic_id']?>">#<?=$torrent['topic_id']?></a></td>
        </tr>
      </tbody>
    </table>
  </div><!-- /.container -->
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
</body>
</html>
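
bytesToSize() is copied verbatim into both search.php and torrent.php. A shorter, loop-based variant with the same output could live in one shared file (a hypothetical functions.php loaded with require_once). This is a sketch of that refactoring, not the author's code.

```php
<?php
// Loop-based variant of bytesToSize(): divide by 1024 until the value fits the unit.
function bytesToSize($bytes, $precision = 0)
{
    $units = ['B', 'Kb', 'Mb', 'Gb', 'Tb'];
    $i = 0;
    // Stop at the largest unit (Tb), like the original function
    while ($bytes >= 1024 && $i < count($units) - 1) {
        $bytes /= 1024;
        $i++;
    }
    // Plain bytes are printed as-is; larger units are rounded to $precision digits
    return ($i === 0 ? $bytes : round($bytes, $precision)) . ' ' . $units[$i];
}

echo bytesToSize(729927066), "\n"; // 696 Mb
echo bytesToSize(1023), "\n";      // 1023 B
```

Dividing by 1024 step by step gives the same values as dividing by the precomputed megabyte or gigabyte constants, since each step divides by a power of two.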


That's all. We now have a fully working site with the RuTracker database, fast search and a user-friendly interface. I deliberately left out category filtering, sorting, pagination and so on, to keep the code as clean as possible. If there is interest, I will cover all of this in the comments or in a separate article.

Thank you all for your attention. Ask your questions, and I will answer them all.

Source: https://habr.com/ru/post/273777/

