📜 ⬆️ ⬇️

How I determined the provider by IP

Task setting is rather trivial: it is necessary to determine the provider by the user's IP address. This data should then be used in its analytics system, and it should also be possible to verify it with data, for example Google or Ripe.

I must say that the code that will be given in the article is not perfect. The programming language used is PHP (of course, it’s better to use C or Perl for similar tasks). DB - MySQL (here it is better to choose a database that is faster and can handle a large number of selections. For example, Tarantool). But in everyday life enough data technology \ languages.

Search database and extract data from it


So let's get started. Where do we get the data on providers, and it is also desirable for free? Very quickly, the choice fell on the RIPE service. The downside was that their entire database is stored in text files . For a while “googling” I did not find a converter for MySQL. Well, that complicated the task a bit. But it does not matter, remembering a fairly well-known video clip and a phrase from it: “You are an expert,” I decided to look for the description of the database files and write a data parser from a text format in MySQL.
')
But there was a small ambush. There is a description of the fields, but I could not find any connections between the tables (maybe, of course, I looked badly). But this has become quite tangible problem.

Then I began to think in the wrong direction and tried to somehow match the fields from the files on the fly. That is, I ran the script on parsing one file (it’s very “crutch” that I remember about it like this: A holiday, you got drunk and you don’t remember anything. And the next day your friends tell you how you ran around naked and screamed and you are ashamed.)

As a result, after a few hours (yes, just a few hours), I realized that I was creating nonsense and decided to drive all the data into MySQL. The benefit in the course of studying the data was to understand what fields I need for matching.

Import data to our database


Since the fields were known, I created the following tables:

CREATE TABLE `test_inetnum` ( `sip` bigint(20) unsigned NOT NULL, `eip` bigint(20) unsigned NOT NULL, `org` varchar(255) NOT NULL, PRIMARY KEY (`sip`,`eip`), KEY `idx_org` (`org`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `test_organization` ( `organisation` varchar(255) NOT NULL, `org-name` varchar(255) NOT NULL, PRIMARY KEY (`organisation`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `test_route` ( `sip` bigint(20) unsigned NOT NULL, `eip` bigint(20) unsigned NOT NULL, `origin` varchar(255) NOT NULL, PRIMARY KEY (`sip`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE `test_aut_num` ( `aut-num` varchar(255) NOT NULL, `org` varchar(255) NOT NULL, PRIMARY KEY (`aut-num`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

The fields in the sip and eip tables are the decoded ip2long IP addresses of the beginning of the range and its end.

Class for parsing data:

 <?php namespace Ripe; class Ripe { /** * @var string -      RIPE */ public $folder; /** * @var int - ,   */ public $time = 86400; /** * Ripe constructor. * * @param string $folder */ function __construct($folder = '') { if (empty($folder)) { $folder = __DIR__ . '/../../config/ripe'; } if (!is_dir(__DIR__ . '/../../config/ripe')) { mkdir(__DIR__ . '/../../config/ripe', 0777, true); } $this->folder = $folder; } /** *    . * * @param string $file * * @return bool */ function needUpdate($file = '') { $current = time(); if ($current - filectime($this->folder . '/' . $file) > $this->time) { return true; } return false; } /** *  . * * @param string $url */ function download($url = '') { if (!empty($url) && true === $this->needUpdate($url) ) { system("cd " . realpath($this->folder) . " && wget ftp://ftp.ripe.net/ripe/dbase/split/" . $url . " && gunzip $url"); } } /** *    . * * @param string $file * @param $callback */ function read($file = '', $callback) { if (is_file($this->folder . '/' . $file)) { $f = fopen($this->folder . '/' . $file, 'r'); if (!empty($f)) { $string = []; while (($buffer = fgets($f)) !== false) { //   if ("\n" != $buffer) { $string[] = trim($buffer); } else { $blockArray = []; //     for ($i = 0; $i < $ic = count($string); $i++) { if (strpos($string[$i], ': ') === false) { break; } else { $arBlockData = explode(": ", $string[$i]); if (!empty($arBlockData)) { $key = trim($arBlockData[0]); $value = trim($arBlockData[1]); if (!empty($blockArray[$key])) { $blockArray[$key] .= $value . "\n"; } else { $blockArray[$key] = $value; } } } } // callback if (!empty($callback) && is_callable($callback) && !empty($blockArray) ) { call_user_func_array($callback, [ $blockArray, $file ]); } $string = ''; } } } } } } 

I write the data in blocks, that is, I received the block, immediately recorded it. For better performance, of course, it is better to insert data in batches. The code for downloading and writing to the organizations database looks like this:

 $ripe->download('ripe.db.organisation.gz'); $ripe->read('ripe.db.organisation', function ($block, $file) { $ripeRoute = new \Ripe\RipeOrganization(); $ripeRoute->save($block); }); 

Fields in the table are recorded for the same keys that are in the files. If you need to change the fields, then the save method has a “before insertion” handler, with which you can change the field names and write to changed ones.

Data analysis and getting the final table.


Now that the data has been received, you can create a summary table, according to which the definition itself will pass.

 CREATE TABLE `test_ripe` ( `sip` bigint(20) unsigned NOT NULL, `eip` bigint(20) unsigned NOT NULL, `org_code` varchar(100) NOT NULL, `org_name` varchar(255) NOT NULL, PRIMARY KEY (`sip`,`eip`), KEY `idx_org_name` (`org_name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

Now the simplest thing is to transfer the necessary data to the final table.

Everything is solved by a couple of requests:

 select * from test_inetnum as t1 inner join test_organization as t2 on t1.org = t2.organisation; select * from test_route as t1 inner join test_aut_num as t2 on t1.origin = t2.`aut-num` inner join test_organization as t3 on t2.org = t3.organisation; 

The resulting data is pushed into the test_ripe table and get happiness.

results


The results exceeded my expectations. The definition of the provider works quite accurately (tested on the address pool). Also, a nice bonus was that, according to this database, the definition works better than 2ip.

Well, actually determine the provider for the current table can be such a query:

 SELECT * FROM `test_ripe` WHERE `sip` <= '33554435' AND `eip` >= '33554435' ORDER BY `eip` DESC LIMIT 1 

Sorting in the query is necessarily necessary, as it happens that one range is included in another.

Source: https://habr.com/ru/post/346590/


All Articles