
To determine a country by IP, special databases are needed consisting of ranges of IP addresses and their respective countries. Usually such databases are distributed in the form of CSV or SQL files for use in a DBMS, or binary files of a special format.
The February base GeoLite Country, the free version of GeoIP Country by MaxMind, was chosen for testing.
Several popular solutions and my “bicycle” on this topic took part in the testing.
Testing participants
Mysql
MySQL will be used as the experimental DBMS. In which the table is created, consisting of IP-ranges and numbers of countries, IP are transformed into integer and on them indexes are built. The structure of the table looks like this:
')
CREATE TABLE `ip2country ` ( `ipn1` INT(10) UNSIGNED NOT NULL, `ipn2` INT(10) UNSIGNED NOT NULL, `num` TINYINT(3) UNSIGNED NOT NULL, PRIMARY KEY (`ipn1`), INDEX `ipn2` (`ipn2`) ) ENGINE=MyISAM;
For MySQL, 3 queries will be tested.
- Simple
SELECT `num` FROM `ip2country` WHERE `ipn1` <= INET_ATON(' IP ') AND `ipn2` >= INET_ATON('IP')
- Between.
SELECT num FROM `ip2country` WHERE INET_ATON('IP') BETWEEN `ipn1` AND `ipn2`
- Subselect.
SELECT num FROM (SELECT * FROM ip2country WHERE `ipn1` <= INET_ATON('IP') ORDER BY `ipn1` DESC LIMIT 1) AS t WHERE `ipn2` >= INET_ATON('IP')
GeoIP API
GeoIP will use the native API for PHP, and the database in binary format. It will be tested in two modes:
- Standart - the default mode.
- Memory - caching the database in memory.
SxGeo v2
A few words about my "bike". About 6 years ago, after studying the country definition solutions for IP available at that time, I was impressed with the speed of the GeoIP binary format. But he, as it seemed to me, had a lack of a large number of moves around the file to find the desired IP. There was an interesting idea about its implementation. Which was quickly implemented and, surprisingly, turned out to be much faster than I expected. For a long time, Sypex Geo has been used in its projects.
Recently decided to implement some more ideas for optimization. As a result, a version of Sypex Geo 2 (abbreviated SxGeo). The database file was 25% smaller than the first version, and at the same time the speed increased 1.7-2 times.
The main advantages over GeoIP and other solutions.
- The small size of the base is just over 4 bytes per range. For example, the GeoIP binary base weighs 1.2 MB, while SxGeo 2 weighs 0.62 MB.
- Very high processing speed (see the test results).
- The minimum number of reads from disk (3 + 1 * N, where N is the number of IP).
- Additional modes for batch processing.
SxGeo will be tested in three modes:
- File - the usual mode recommended for single IP processing.
- Batch - batch processing mode, designed to process multiple IP addresses at a time.
- Batch + Memory - in this mode, the database cache is also used in memory. The fastest mode, but requires more memory, because the entire base file is loaded into memory.
Geobaza
Also out of competition was tested algorithm Geobaza. Out of the competition, because the native binary file was used, with a significantly larger number of ranges. Geobaza showed about 2000-3000 IP / s, there was a very large variation in results. If the creators of Geobaza read this article and send the file generated by the February GeoLite Country, I’m happy to add it to the testing.
Testing
For testing, a PHP script was written in which an array of 10,000 random IP addresses was generated each time it was started. After that, all algorithms were tested on this array. This testing method was chosen to ensure that the algorithms were equal.
Tested on a server running FreeBSD 8 and PHP 5.2.17. It was also tested on Win 7 x64 LAN, PHP 5.3.9, the proportions are about the same, so only the FreeBSD results are presented in the tables.
The test was run 10 times, the averaged data are shown in the graph.

The easiest MySQL queries were the slowest. The reason for such slow work becomes obvious if you look at the EXPLAIN of these queries.
EXPLAIN SELECT num FROM `ip2country` WHERE ipn1 <= INET_ATON('88.88.88.88') AND ipn2 >= INET_ATON('88.88.88.88') LIMIT 1;
id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE ip2country range PRIMARY PRIMARY 4 NULL 51059 Using where 1 SIMPLE ip2country range PRIMARY PRIMARY 4 NULL 53852 Using where 1 SIMPLE ip2country range PRIMARY,ipn2 PRIMARY 4 NULL 51587 Using where
The first result is a simple index
PRIMARY KEY (`ipn1`) , the second for a composite index
PRIMARY KEY (` ipn1`, `ipn2`) , the third for two indices
PRIMARY KEY (` ipn1`), INDEX `ipn2` (` ipn2`) . As you can see for a composite index of rows for subsequent iteration more than in other cases. I test without LIMIT 1, and in this case EXPLAIN writes that the index is not used, although this option actually works faster than with LIMIT.
The option with nested SELECT is much faster. The query shows that indexes in MySQL, when they are normally used, work very quickly, approaching specialized binary formats.
GeoIP showed that it is still preferable to use it than MySQL. I was embarrassed that the use of caching in memory gives such a small increase, less than 10%. Picked up in geoip.inc, found the culprit. It turned out to be the following code:
if ($gi->flags & GEOIP_MEMORY_CACHE) {
If you comment out the lines with mb_internal_encoding, then as a result, the speed rises to 6600 IP / s - already a more tangible increase in the use of caching in memory. The coding in this case, we do not care, maybe there were some glitches with GeoIP City.
As for SxGeo, I think there are no comments. He and in the normal mode works very quickly, and in the mode Batch + Memory allows you to get an increase of another 40%.
Those
interested can
download and potest SxGeo 2 . Wishes and bug reports are welcome.
UPD. I tested more queries in different combinations of indexes, it turned out that if in the first 2 queries I added LIMIT 1, then MySQL starts to get very blunt, somewhere 3-5 times slower.