📜 ⬆️ ⬇️

IP Geo. SQL query optimization



Hello, Habrahabr! My name is Babichev Maxim. This is my first article on Habrahabr, please do not judge strictly.

I found an article on Habré in which there were bases in two modifications: Countries and cities and only countries . Also in these archives there are small examples of using tables in php.
')
Just want to say that this article is designed for beginners, and not for advanced gurus.

Just in case, I will attach the structure of the tables
-- --   `net_city` -- CREATE TABLE IF NOT EXISTS `net_city` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `country_id` INT(11) DEFAULT NULL, `name_ru` VARCHAR(100) DEFAULT NULL, `name_en` VARCHAR(100) DEFAULT NULL, `region` VARCHAR(2) DEFAULT NULL, `postal_code` VARCHAR(10) DEFAULT NULL, `latitude` VARCHAR(10) DEFAULT NULL, `longitude` VARCHAR(10) DEFAULT NULL, PRIMARY KEY (`id`), KEY `country_id` (`country_id`), KEY `name_ru` (`name_ru`), KEY `name_en` (`name_en`) ) ENGINE = MyISAM DEFAULT CHARSET = utf8; -- --   `net_city_ip` -- CREATE TABLE IF NOT EXISTS `net_city_ip` ( `city_id` INT(11) DEFAULT NULL, `begin_ip` BIGINT(11) DEFAULT NULL, `end_ip` BIGINT(11) DEFAULT NULL, KEY `city_id` (`city_id`), KEY `ip` (`begin_ip`) ) ENGINE = MyISAM DEFAULT CHARSET = utf8; -- --   `net_country` -- CREATE TABLE IF NOT EXISTS `net_country` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `name_ru` VARCHAR(100) DEFAULT NULL, `name_en` VARCHAR(100) DEFAULT NULL, `code` VARCHAR(2) DEFAULT NULL, PRIMARY KEY (`id`), KEY `code` (`code`), KEY `name_en` (`name_en`), KEY `name_ru` (`name_ru`) ) ENGINE = MyISAM DEFAULT CHARSET = utf8; -- --   `net_country_ip` -- CREATE TABLE IF NOT EXISTS `net_country_ip` ( `country_id` INT(11) DEFAULT '0', `begin_ip` BIGINT(11) DEFAULT NULL, `end_ip` BIGINT(11) DEFAULT '0', KEY `country_id` (`country_id`), KEY `ip` (`begin_ip`) ) ENGINE = MyISAM DEFAULT CHARSET = utf8; -- --   `net_euro` -- CREATE TABLE IF NOT EXISTS `net_euro` ( `country_id` INT(11) DEFAULT '0', `begin_ip` BIGINT(11) DEFAULT NULL, `end_ip` BIGINT(11) DEFAULT '0', KEY `country_id` (`country_id`), KEY `ip` (`begin_ip`) ) ENGINE = MyISAM DEFAULT CHARSET = utf8; -- --   `net_ru` -- CREATE TABLE IF NOT EXISTS `net_ru` ( `city_id` INT(11) DEFAULT '0', `begin_ip` BIGINT(11) DEFAULT NULL, `end_ip` BIGINT(11) DEFAULT NULL, KEY `city_id` (`city_id`), KEY `ip` (`begin_ip`) ) ENGINE = MyISAM DEFAULT CHARSET = utf8; 


I'm more interested in SQL queries.

LONG_IP_ADDRESS, the number obtained using the ip2long () function in PHP .

 --       --  (1) SELECT * FROM ( SELECT * FROM net_ru WHERE begin_ip <= LONG_IP_ADDRESS – IP , ip2long() ORDER BY begin_ip DESC LIMIT 1 ) AS t WHERE end_ip >= LONG_IP_ADDRESS – IP  long 

After the query (1), get the desired city from the net_city table:

 --  (2) SELECT * FROM net_city WHERE id = -- (   ).city_id 

PHP code example:

 <?php //     $db_host = "localhost"; $db_user = ""; $db_password = ""; $db_database = "geo"; $link = mysql_connect ($db_host, $db_user, $db_password); if ($link && mysql_select_db ($db_database)) { mysql_query ("set names utf8"); } else { die ("db error"); } // IP-,    $ip = "79.134.219.2"; //  IP   $int = sprintf("%u", ip2long($ip)); $country_name = ""; $country_id = 0; $city_name = ""; $city_id = 0; //       $sql = "select * from (select * from net_ru where begin_ip<=$int order by begin_ip desc limit 1) as t where end_ip>=$int"; $result = mysql_query($sql); if ($row = mysql_fetch_array($result)) { $city_id = $row['city_id']; $sql = "select * from net_city where id='$city_id'"; $result = mysql_query($sql); if ($row = mysql_fetch_array($result)) { $city_name = $row['name_ru']; $country_id = $row['country_id']; } else { $city_id = 0; } } 

We get rid of the subquery in the query (1), using AND.

 --   (3) SELECT `city_id` FROM `net_ru` WHERE begin_ip <= LONG_IP_ADDRESS AND end_ip >= LONG_IP_ADDRESS 

And in this case we will get rid of the subquery. But remember about BETWEEN and write the query like this:

 --   (4) SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip 


So the SQL is readable and shorter.

There is a separate request for the city. Let's unite the modified query (4) and (2).

 --   (5) SELECT * FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip ) AS `res` ON `res`.`city_id` = `city`.`id` 

It so happened that the IP address 79.134.219.2 in the net_ru database is not. But it is in the net_city_ip database.
Many resources combine requests like this:

 --   (5.1) SELECT * FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip ) AS `res` ON `res`.`city_id` = `city`.`id` UNION SELECT * FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_city_ip` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip ) AS `res` ON `res`.`city_id` = `city`.`id` 

We see that the requests are identical. Let's combine two queries inside JOIN, we get:

 --   (5.2) SELECT * FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip UNION SELECT `city_id` FROM `net_city_ip` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip ) as `res` ON `res`.`city_id` = `city`.`id` 

The modified query (5.2) is good-looking, but we do not need all the fields.
Pull out:
  1. name_ru
  2. name_en
  3. region
  4. postal_code
  5. latitude
  6. longitude


 --   (6) SELECT DISTINCT `city`.`name_ru` `city_name_ru`, `city`.`name_en` `city_name_en`, `city`.`region`, `city`.`postal_code`, `city`.`latitude`, `city`.`longitude` FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip UNION SELECT `city_id` FROM `net_city_ip` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip ) AS `res` ON `res`.`city_id` = `city`.`id` 

Next we need to select the country in which the user is located. Add a JOIN connection to the query.

 --   (7) SELECT DISTINCT `city`.`name_ru` `city_name_ru`, `city`.`name_en` `city_name_en`, `city`.`region`, `city`.`postal_code`, `city`.`latitude`, `city`.`longitude`, `country`.`name_ru` `country_name_ru`, `country`.`name_en` `country_name_en`, `country`.`code` FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip UNION SELECT `city_id` FROM `net_city_ip` WHERE LONG_IP_ADDRESS BETWEEN begin_ip AND end_ip ) AS `res` ON `res`.`city_id` = `city`.`id` JOIN `net_country` `country` ON `country`.`id` = `city`.`country_id` 

Test the request.

To do this, we’ll get the value of LANG_IP_ADDRESS using PHP:

 <?php echo ip2lang('79.134.219.2'); # : 1334237954 

We substitute it in our request and execute it in phpMyAdmin.

Modified query (7) with substituted value
 --   (7)    -- ip2lang('79.134.219.2')  LONG_IP_ADDRESS SELECT DISTINCT `city`.`name_ru` `city_name_ru`, `city`.`name_en` `city_name_en`, `city`.`region`, `city`.`postal_code`, `city`.`latitude`, `city`.`longitude`, `country`.`name_ru` `country_name_ru`, `country`.`name_en` `country_name_en`, `country`.`code` FROM `net_city` `city` JOIN ( SELECT `city_id` FROM `net_ru` WHERE 1334237954 BETWEEN begin_ip AND end_ip UNION SELECT `city_id` FROM `net_city_ip` WHERE 1334237954 BETWEEN begin_ip AND end_ip ) AS `res` ON `res`.`city_id` = `city`.`id` JOIN `net_country` `country` ON `country`.`id` = `city`.`country_id` 


The result of the query:



1 total, request took 0.3408 seconds.

JOIN joins have a great effect on the speed of query execution. Writing JOIN with WHERE

 --   (8) --  JOIN   WHERE SELECT DISTINCT `city`.`name_ru` `city_name_ru`, `city`.`name_en` `city_name_en`, `city`.`region`, `city`.`postal_code`, `city`.`latitude`, `city`.`longitude`, `country`.`name_ru` `country_name_ru`, `country`.`name_en` `country_name_en`, `country`.`code` FROM `net_city` `city` JOIN `net_country` `country` ON `country`.`id` = `city`.`country_id` WHERE `city`.`id` = ( SELECT `city_id` FROM `net_city_ip` WHERE LONG_IP_ADDRESS BETWEEN `begin_ip` AND `end_ip` UNION SELECT `city_id` FROM `net_ru` WHERE LONG_IP_ADDRESS BETWEEN `begin_ip` AND `end_ip` ) 

Test the modified query:

Modified query using WHERE
 SELECT DISTINCT `city`.`name_ru` `city_name_ru`, `city`.`name_en` `city_name_en`, `city`.`region`, `city`.`postal_code`, `city`.`latitude`, `city`.`longitude`, `country`.`name_ru` `country_name_ru`, `country`.`name_en` `country_name_en`, `country`.`code` FROM `net_city` `city` JOIN `net_country` `country` ON `country`.`id` = `city`.`country_id` WHERE `city`.`id` = ( SELECT `city_id` FROM `net_city_ip` WHERE 1334237954 BETWEEN `begin_ip` AND `end_ip` UNION SELECT `city_id` FROM `net_ru` WHERE 1334237954 BETWEEN `begin_ip` AND `end_ip` ) 


The result of the query:



1 total, request took 0.1527 sec.

The modified request was executed more than twice faster.

A similar request can be obtained for countries.

This Geo-IP database was chosen to explain JOIN / UNION connections and query optimization.
I hope this article will help beginners understand how important optimization is and how it can be achieved. I recommend reading the article by user tuta_larson .

This Geo-IP database is very old and knows very few IP addresses. But you can create your own IP database and replenish it with the help of users.



Next, make an IP rating and, based on your own rating, “guess” the city from which the user is.

GeoIP took the database and information from the article: “GeoIP Base - Countries and Cities, September 2013” . Thanks to the netload user for a fascinating article written in 2013.

Source: https://habr.com/ru/post/254461/


All Articles