When you need to work out a visitor's city and the tax (vehicle) code of their region from their IP address, it seems the internet is full of ready-made solutions for exactly that.
Then you look closer: some are paid, some cannot be deployed on your own servers, some can but are resource-hungry, and the rest know nothing about the regions of the Russian Federation...
And here the sick brain of a programmer with an obsessive idea hurries to the rescue:
"If nobody else has it, build it yourself."
Once you start thinking along those lines, you remember that nginx has an excellent geo module, which "is not only fast, but also optimized to the point of impossibility". The bad luck is that it does not understand any of the well-known database formats (MaxMind, Sypex, ipgeobase).
A couple of hours in the arms of Python, and there is a decent converter that pulls everything we need from ipgeobase.ru.
(Yes, there have been rumors that everyone there was laid off half a year ago, but the databases are still updated regularly, which is good news.)
To head off questions, I will walk through the code below (if you are not interested, skip straight to the configuration).
Code
1. Download the database
There is nothing complicated here, requests + zipfile:
archive = requests.get("http://ipgeobase.ru/files/db/Main/geo_files.zip")
if archive.status_code != 200:
    error("IPGeobase no answer: %s" % archive.status_code)
extracteddata = ZipFile(StringIO(archive.content))
filelist = extracteddata.namelist()
if "cities.txt" not in filelist:
    error("cities.txt not downloaded")
if "cidr_optim.txt" not in filelist:
    error("cidr_optim.txt not downloaded")
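On Python 3 the archive bytes have to go through io.BytesIO rather than StringIO, so the same sanity check might look roughly like the sketch below. The names open_geo_archive, REQUIRED and make_zip are illustrative and not part of the converter, and the demo builds a tiny archive in memory instead of contacting ipgeobase.ru.

```python
import io
import zipfile

# Files the converter expects to find inside geo_files.zip.
REQUIRED = ("cities.txt", "cidr_optim.txt")

def open_geo_archive(payload):
    """Open downloaded zip bytes and fail early if a required file is missing."""
    archive = zipfile.ZipFile(io.BytesIO(payload))
    missing = [name for name in REQUIRED if name not in archive.namelist()]
    if missing:
        raise ValueError("not downloaded: %s" % ", ".join(missing))
    return archive

def make_zip(names):
    """Build a small zip in memory (demo helper standing in for the download)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf.getvalue()

archive = open_geo_archive(make_zip(["cities.txt", "cidr_optim.txt"]))
print(sorted(archive.namelist()))
```

In a real run the payload would be the body of the HTTP response, for example archive.content from requests.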
2. We load the dictionary of regions
REGIONS = dict(l.decode("utf8").rstrip().split("\t")[::-1] for l in open("regions.tsv").readlines())
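where regions.tsv is a tab-separated list of automotive / tax region codes and region names (spelled the same way as in cities.txt), of the form:
66	Свердловская область
77	Москва
78	Санкт-Петербург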
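To make the reversed split concrete, here is a minimal Python 3 sketch that builds the same name-to-code lookup from an in-memory sample. It assumes the "code, TAB, region name" layout that the [::-1] in the one-liner implies. The helper name load_regions and the three sample rows are illustrative, not taken from the real regions.tsv.

```python
import io

def load_regions(lines):
    """Build a {region name: region code} dict from 'code<TAB>name' lines."""
    regions = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        code, name = line.split("\t", 1)
        regions[name] = code
    return regions

# Hypothetical sample; the real file lists every region of interest.
SAMPLE = io.StringIO(
    "66\tСвердловская область\n"
    "77\tМосква\n"
    "78\tСанкт-Петербург\n"
)
REGIONS = load_regions(SAMPLE)
print(REGIONS["Москва"])  # prints 77
```

Keying the dictionary by name is what lets the next step translate the region name from cities.txt straight into a code.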
3. Get a dictionary of cities
For each city we need to know its id, name and region code:
CITIES = {}
for line in extracteddata.open("cities.txt").readlines():
    cid, city, region_name, _, _, _ = line.decode("cp1251").split("\t")
    if region_name in REGIONS:
        CITIES[cid] = {'city': b64encode(city.encode("utf8")), 'reg_id': REGIONS[region_name]}
    if cid == "1199":  # the body of this branch is cut off in the listing
Note that, with an eye to the future, the UTF-8 city name is immediately encoded in base64. This widens the possible uses (for example, in nginx logs) without having to deal with transliteration.
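Whoever reads the value later (a log parser or the application) has to reverse the encoding. A minimal Python 3 sketch of that step follows. The function name decode_city is hypothetical, and returning None for empty or malformed input is just one possible convention.

```python
from base64 import b64decode, b64encode

def decode_city(value):
    """Return the city name for a base64 value, or None if empty or invalid."""
    if not value:
        return None
    try:
        return b64decode(value, validate=True).decode("utf-8")
    except ValueError:
        # Covers both bad base64 (binascii.Error) and bad UTF-8.
        return None

# Round trip with an example city name.
encoded = b64encode("Екатеринбург".encode("utf-8")).decode("ascii")
print(encoded)
print(decode_city(encoded))
```

The encoded form contains only ASCII characters, which is what makes it safe to drop into a tab-separated log line.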
4. We glue the address ranges and cities
database = {}
for line in extracteddata.open("cidr_optim.txt").readlines():
    _, _, ip_range, country, cid = line.decode("cp1251").rstrip().split("\t")
    if country == "RU" and cid in CITIES:
        # strip the whitespace inside the range before using it as a key
        database["".join(ip_range.split())] = CITIES[cid]
Obviously, if the country is not Russia, ipgeobase has no region or city for it, and we do not need such ranges for our task.
5. Generate files for the geo module
with open("region.txt", "w") as reg, open("city.txt", "w") as city:
    for ip_range in sorted(database):
        info = database[ip_range]
        city.write("%s %s;\n" % (ip_range, info['city']))
        reg.write("%s %s;\n" % (ip_range, info['reg_id']))
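One detail worth knowing: sorted() on strings orders the ranges lexicographically, so "100.x" lands before "2.x". The nginx geo module documentation advises putting addresses in ascending order to speed up loading, so a numeric sort key may be preferable. Below is a minimal Python 3 sketch. The function name range_key and the sample ranges are hypothetical.

```python
import ipaddress

def range_key(ip_range):
    """Sort key: the numeric value of the first address in 'start-end'."""
    start = ip_range.split("-", 1)[0]
    return int(ipaddress.IPv4Address(start))

# Hypothetical ranges in the same 'start-end' form the converter produces.
ranges = [
    "100.64.0.0-100.64.0.255",
    "2.60.0.0-2.60.255.255",
    "31.13.16.0-31.13.16.255",
]
print(sorted(ranges))                 # lexicographic: "100..." comes first
print(sorted(ranges, key=range_key))  # numeric: "2.60..." comes first
```

Swapping sorted(database) for sorted(database, key=range_key) would be the only change needed in the loop.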
Nginx configuration
For everything to work, nginx must have the geo module enabled (nginx.org/ru/docs/http/ngx_http_geo_module.html).
Put the generated files in a known place and add the following config to the http section:
geo $region {
    ranges;
    include geo/region.txt;
}
geo $city {
    ranges;
    include geo/city.txt;
}
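Before pointing nginx at the files, it can be handy to spot-check them offline. The Python 3 sketch below parses lines in the "start-end value;" format and looks an address up with a binary search. It only mirrors the observable behaviour (an empty string when nothing matches), not how nginx implements it, and it assumes the ranges do not overlap. The function names and sample lines are hypothetical.

```python
import bisect
import ipaddress

def to_int(ip):
    """Convert dotted IPv4 text to an integer."""
    return int(ipaddress.IPv4Address(ip))

def parse_geo_lines(lines):
    """Parse 'start-end value;' lines into sorted (start, end, value) tuples."""
    table = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ip_range, value = line.rstrip(";").split(None, 1)
        start, end = ip_range.split("-")
        table.append((to_int(start), to_int(end), value))
    table.sort()
    return table

def lookup(table, ip, default=""):
    """Return the value of the range containing ip, else default."""
    addr = to_int(ip)
    # Rebuilt on every call, which is fine for a spot check.
    starts = [row[0] for row in table]
    idx = bisect.bisect_right(starts, addr) - 1  # last range starting at or before addr
    if idx >= 0 and addr <= table[idx][1]:
        return table[idx][2]
    return default

# Hypothetical lines in the same format as region.txt.
sample = [
    "31.13.16.0-31.13.16.255 77;",
    "2.60.0.0-2.60.255.255 54;",
]
table = parse_geo_lines(sample)
print(repr(lookup(table, "31.13.16.10")))
print(repr(lookup(table, "8.8.8.8")))
```

For a real check, pass the open region.txt or city.txt file object to parse_geo_lines instead of the sample list.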
With this config in place, two variables, $city and $region, appear in nginx and can be used anywhere:
- in the log, for example:
log_format long '$time_iso8601\t$msec\t$host\t$request_method\t$uri\t$args\t$http_referer\t$remote_addr\t$http_user_agent\t$status\t$request_time\t$request_length\t$upstream_addr\t$bytes_sent\t$upstream_response_time\t$city\t$region';
- or passed on to the application:
location / {
    proxy_set_header X-City $city;
    proxy_set_header X-Region $region;
    proxy_pass http://backend;
}
Note that by default the geo module returns an empty string for any address that is not in the database, in which case the header is simply not set.
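On the application side this means both headers are optional. A minimal sketch of a Python 3 WSGI backend that copes with that is shown below. It assumes the X-City value is base64 as produced by the converter. The names read_geo and app are hypothetical, and WSGI exposes the headers as HTTP_X_CITY and HTTP_X_REGION.

```python
from base64 import b64decode, b64encode

def read_geo(environ):
    """Extract (city, region) from a WSGI environ; either may be None."""
    raw_city = environ.get("HTTP_X_CITY", "")
    region = environ.get("HTTP_X_REGION") or None
    city = None
    if raw_city:
        try:
            city = b64decode(raw_city, validate=True).decode("utf-8")
        except ValueError:
            city = None  # malformed header: treat as unknown
    return city, region

def app(environ, start_response):
    """Tiny WSGI app that echoes what nginx passed along."""
    city, region = read_geo(environ)
    body = ("city=%s region=%s\n" % (city, region)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]

# Simulated requests: one with both headers, one from an unknown address.
demo = {
    "HTTP_X_CITY": b64encode("Москва".encode("utf-8")).decode("ascii"),
    "HTTP_X_REGION": "77",
}
print(read_geo(demo))
print(read_geo({}))
```

Treating a missing header and a malformed one the same way keeps the rest of the application free of special cases.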
In practice such a setup works practically instantly, puts no load on nginx and, thanks to easily automated database updates, is fairly accurate (it all depends on how much you trust the ipgeobase.ru databases). That made me think it might be useful to someone else, so feel free to use it and, maybe, write converters for other data providers.
GitHub code (ipgeobase-importer branch)
P.S. Some time after writing the article, I rewrote everything in Go and added support for MaxMind.