Such a need may arise when geo-targeting content, running map services, or analyzing site visit statistics.
The problem is solved as follows.
1. We download the whois database and store it in a well-structured form.
2. For each subnet, we determine the city.
3. We pack the database into a binary file.
But why do all this yourself when ready-made solutions already exist? I am posting one of them here (the full version) for testing and to get feedback.
Now, a little more detail on each task.
1. We download the whois database and store it in a well-structured form. For this we write a multi-threaded downloader and parser that collects all relevant information for the range 0.0.0.0-255.255.255.255.
Difficulties: correctly computing the significant subnet ranges, and reaching an agreement with the whois services so that they do not ban us and allow the corresponding load.
The result is approximately 20 GB of records.
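To illustrate the "significant ranges" difficulty: whois records usually describe a block as a first and a last address, and such a block does not always line up with a single subnet. Below is a minimal sketch, not the code of our parser, that splits one such range into CIDR subnets using Python's standard ipaddress module. The function name inetnum_to_cidrs and the "first - last" input format are assumptions made for the example.

```python
import ipaddress


def inetnum_to_cidrs(value):
    """Split a 'first - last' address range into the CIDR blocks that cover it.

    Hypothetical helper: assumes an IPv4 range written as
    'A.B.C.D - E.F.G.H', as in a whois inetnum field.
    """
    first_text, last_text = (part.strip() for part in value.split("-"))
    first = ipaddress.IPv4Address(first_text)
    last = ipaddress.IPv4Address(last_text)
    # summarize_address_range yields the minimal list of networks
    # that exactly covers [first, last].
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


print(inetnum_to_cidrs("192.0.2.0 - 192.0.2.130"))
# ['192.0.2.0/25', '192.0.2.128/31', '192.0.2.130/32']
```

A range that is not aligned to a power of two turns into several subnets, which is one reason the number of records grows so quickly.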
2. For each subnet, we determine the city. For this we write a recognizer that runs in parallel and determines the city using a dictionary of city spelling variants, telephone codes, and other signs.
Difficulties: creating and maintaining the dictionaries of variants, and the large volume of data.
The result is several million extracted and recognized subnets.
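The idea of the recognizer can be shown on a toy scale. The sketch below is only an illustration under stated assumptions: the two small dictionaries (CITY_ALIASES, PHONE_CODES), the function name recognize_city, and the +7 phone pattern are all invented for the example, while the real dictionaries are far larger and use more signs.

```python
import re

# Hypothetical sample dictionaries; real ones hold many more variants.
CITY_ALIASES = {
    "moscow": "Moscow",
    "moskva": "Moscow",
    "москва": "Moscow",
    "saint petersburg": "Saint Petersburg",
    "spb": "Saint Petersburg",
}
PHONE_CODES = {"495": "Moscow", "812": "Saint Petersburg"}


def recognize_city(record):
    """Return a city name for a whois text record, or None if nothing matches."""
    text = record.lower()
    # First sign: a known spelling variant appearing as a whole word.
    for alias, city in CITY_ALIASES.items():
        if re.search(r"(?<!\w)" + re.escape(alias) + r"(?!\w)", text):
            return city
    # Second sign: the three-digit area code after a +7 prefix.
    match = re.search(r"\+7[\s(-]*(\d{3})", text)
    if match:
        return PHONE_CODES.get(match.group(1))
    return None


print(recognize_city("address: ul. Lenina 1, Moskva, Russia"))  # Moscow
print(recognize_city("phone: +7 812 1234567"))  # Saint Petersburg
```

Matching whole words only is what keeps a street such as "ul. Lenina" from being taken for a city as long as it is not in the dictionary.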
3. We pack the database into a binary file. For this, the packer walks through all the ranges and collects the data into a binary tree that maps an IP address to a city, coordinates, region, and country, including national spellings.
Difficulties: compiling dictionaries with national spellings of city names, and optimizing the packer algorithms.
The result is a binary database of about 11 MB.
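For readers who want a feel for the packing step, here is a simplified sketch. It is not the CNGeoip format: instead of a binary tree it uses a sorted array of fixed-size records with binary search, which gives the same logarithmic lookup. The record layout (start, end, city id), the function names pack_ranges and lookup, and the assumption that ranges do not overlap are all made up for the example.

```python
import ipaddress
import struct

# Hypothetical record layout: range start, range end, city id (big-endian uint32).
RECORD = struct.Struct(">III")


def pack_ranges(ranges):
    """Pack non-overlapping (start, end, city_id) tuples into one sorted blob."""
    return b"".join(RECORD.pack(start, end, city_id)
                    for start, end, city_id in sorted(ranges))


def lookup(blob, ip):
    """Binary-search the blob for the range containing ip; return its city id."""
    target = int(ipaddress.IPv4Address(ip))
    lo, hi = 0, len(blob) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        start, end, city_id = RECORD.unpack_from(blob, mid * RECORD.size)
        if target < start:
            hi = mid
        elif target > end:
            lo = mid + 1
        else:
            return city_id
    return None


def ip_to_int(text):
    return int(ipaddress.IPv4Address(text))


blob = pack_ranges([
    (ip_to_int("192.0.2.0"), ip_to_int("192.0.2.255"), 2),
    (ip_to_int("10.0.0.0"), ip_to_int("10.0.0.255"), 1),
])
print(lookup(blob, "192.0.2.17"))  # 2
print(lookup(blob, "10.0.1.0"))    # None
```

Each record here takes 12 bytes, and the city id would point into a separate table with names, coordinates, region, and country.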
What we get in the end:
- a worldwide geo database;
- affordable price;
- an honest, not stolen, database of IP address -> city mappings;
- names of cities, countries, and regions in Russian;
- the ability to localize the database (the technology allows the use of national alphabets, especially in names);
- improved ex-USSR support (no "cities" such as ul. Lenina, etc.);
- interfaces for working with the binary database in C, PHP, and Perl;
- a proven technology for building different variants and versions of the database.
And here is the link to the full version of CNGeoip for testing:
www.cn-software.com/datastore.php?7ae24a71bad7583b551289f0b03062c9
The link is valid until June 20, 2008.
As feedback, I would like to receive suggestions for improvement, bug reports (it is better to send them here:
www.cn-software.com/ru/contacts ), and opinions on launching this as a product (is it realistic, or is only web 2.0 held in high esteem now?).
Update.
Yes, colleagues, we do not mind if, until June 20 (while the link is valid), someone other than Habr readers downloads the module and writes back with suggestions. Share the info with colleagues; it may turn out to be useful to someone.