In this topic, I want to talk about a task that is very common in web projects - determining the location by ip-address. To begin with, in order to determine the location of a user, a certain geoip base is needed. Here are two popular free options:
IpGeoBase is a very good free base, but, unfortunately, only on Russian ip-addresses.
MaxMind is a huge database of ip-addresses of all countries. Provide a free lite version of the database. The accuracy of the database on Russian ip-addresses is not as good as that of IpGeoBase. Also provide some kind of API for working with your database, which allows you to make samples very quickly.
Suppose you downloaded these databases and uploaded them to your Postgresql database tables (the download itself is somewhat out of scope, if someone has a desire - I can tell you about the COPY command in the future and what they eat with). In general, you will get a table with this structure:
startip | endip | location_id |
---|
2130706433 | 2130706433 | one |
Here:
startip is the beginning of the block of ip-addresses in the long format
endip - the end of the block of ip-addresses in the long format
location_id -
location identifier (city, region, country, etc., maxmind even contains coordinates).
')
Decision number 1
The first time I encountered this, I simply decided to do it on the machine as follows:
1. Create an index on (startip, endip)
2. Create a simple function to convert ip-address to BIGINT:
CREATE OR REPLACE FUNCTION "public"."extract_long_from_ip" (ip text)
RETURNS bigint AS
$body$
SELECT (((elements[1]::bigint * 256) + elements[2]::bigint) * 256 + elements[3]::bigint) * 256 + elements[4]::bigint
FROM (
SELECT string_to_array($1, '.') as elements
) t;
$body$ LANGUAGE 'sql' IMMUTABLE;
3. Voila:
SELECT location_id
FROM geo.ip_blocks
WHERE extract_long_from_ip('93.158.134.8') BETWEEN startip AND endip;
And everything is so simple? - you ask. But never once. This query will work correctly, but very slowly. The thing is that postgresql does not know how to use an index in queries of the form "SOMETHING BETWEEN X AND Y". If the location needs to be determined quickly (and I had just such a case) - seq scan of the entire table is simply no good.
And now what? Transfer location by ip to the application? Not necessarily - the search can be significantly accelerated. For this we use the wonderful module -
ip4r .
Installation
su -c 'yum install postgresql-ip4r'
or
sudo apt-get install postgresql-8.3-ip4r
or simply download from the site.
We look for the file ip4r.sql in the contrib directory and install it in the base we need:
psql -U user -f "...../contrib/ip4r.sql" database
Table structure
The ip4r module provides two new types: ip4 and ip4r. The first corresponds to the IPv4 address. The second - a certain range of IPv4 addresses. The type ip4r is especially interesting to us - the fact is that it is indexable.
Let's change our table:
ALTER TABLE geo.ip_blocks ADD COLUMN ip_range ip4r;
UPDATE geo.ip_blocks SET ip_range = ip4r(startip::ip4, endip::ip4);
ALTER TABLE geo.ipblocks DROP COLUMN startip;
ALTER TABLE geo.ipblocks DROP COLUMN endip;
Create an index:
CREATE INDEX ip_blocks_idx ON geo.ip_blocks USING gist (ip_range);
That's all
We use this simple request and get the acceleration hundreds of times:
SELECT location_id FROM geo.ip_blocks WHERE ip_range >>= '93.158.134.8'::ip4;
UPD:All, of course, strongly depends on the hardware and the load of the base. But, for example, I got the following results:
NOTICE: 100 queries without ip4r 00:00:14.988
NOTICE: 100 queries with ip4r 00:00:00.008
UPD2:The comments suggested another solution for quick search by ip.
Ip4r is not used, the startip index is created and we use a small trick in the request:
CREATE INDEX ip_blocks_idx ON geo.ip_blocks USING btree(startip);
SELECT CASE WHEN extract_long_from_ip('93.158.134.8') <= endip
THEN location_id
ELSE NULL END AS location_id
FROM geo.ip_blocks
WHERE startip <= extract_long_from_ip('93.158.134.8')
ORDER BY startip DESC
LIMIT 1;
UPD3:From the comments - another place where you can take a geoip base. Based on maxmind lite, but not bad reworked + comes in a SQL database format or CSV. There's also a good description of how to work with this base.
http://ipinfodb.com/ip_database.php