
In Podolsk, near Moscow, in the Silikatnaya-2 microdistrict, there is one life hack - when it is already 9 o'clock in the courtyard, and the beer in the stores is no longer sold - just enough to cross the road to buy it. Across the Moscow road - you can find what you want to see in it.
It remains to solve only one problem - on a dark night, in an unfamiliar area, among similar high-rise buildings ... find out which side you are on.
Historically, this question is answered by a “reverse geocoder”. It is an important part of almost all cartographic APIs -
Google ,
Yandex , and even
OSM . But in most cases, his answer is intended for a person, and contains only a textual description of the location.
')
This is non-technological! And certainly impractical.
Esosedi , they ate this cactus for a couple of years, and then they just made their own reverse geocoder. The main thing is how and why.
Most recently, Habre was
searched for Death to Kashcheev (nested set and nesting of administrative headings), went to the
districts (displaying these regions on the map), and (did not)
get to the Yandex counter (direct geocoder). And now let's look at what a reverse geocoder is and why it is needed. And then we analyze the mechanics of his work.
Part 1.
So historical it happened that esosedi is "a place on the world map." Old school squares and geotagging. The only pity is that geo-coordinates for normal people are just a set of numbers. Normal people need an address.
For transfer of coordinates to the address and “reverse geocoders” are used. You are his coordinates, and he is your address. This text is enough for a person to understand the location of an object. And, humanly, I have no complaints about standard implementations. But this is not suitable for the “system” - here the text “match” itself is not stable (and will be slowed down), so the “normal” geocoders are looking for several sources at once, or at least according to various “providers”. Two requests for geographically close addresses can be found in different databases, and the answers "texts" do not match with each other.
And in the answers of the geocoder one can meet both transliteration, and spelling errors, and even different systems for assembling administrative division.
For example, the city of Moscow is sometimes encountered, which is the Region of Moscow, and sometimes also the City District of Moscow, in which the City of Moscow itself. Although technically it is right. Normalized in the image and likeness of other cities. The problem is that sometimes Moscow is just Moscow, and indeed it’s not the city for the flags in the answers. Thank you for the "Big Moscow" has not yet come up.
Another standard error of the Google geocoder is the “draw” of coordinates in the streets, not careful work with urban districts, abbreviations (Altai Republic), the use of various terrain names. Russia can be both Russia and the Russian Federation and the normal Russian Federation.
I decided not to wait for Autonomuses from the Republic of Chechnya to come after me (total of 5 spellings), and two years ago I made my geocoder based on the regions module.
Was ours - become yours. After a random
comment in the dadata topic, the geocoder became public.

Everything works simply - you give him the coordinates, and in return you get all the osmRelationId to which this point enters. With breakdown by administrative levels. Everywhere tsiferki, everywhere connection with other sources, everywhere "verifiability".
To visualize the work there is
an example -
jsfiddle.net/azvjukh8 , whose work I liked so much, that now the same thing flaunts on the main
esosedi .
The algorithm of the example is as follows:
- We take the current center of the map and send the request to the geocoder.
- We find in the response an object whose size suits us.
- Show it on the map. (Actually his parent).
- When you move the card - repeat.
Since the release of osme, many people have asked me to facilitate the search for the desired region - and finally one of the problems of
the administrative division highlighting module is being solved - the difficulty of finding the region number you need. Previously, it was proposed to go to the
data.esosedi.org face and look for its place there recursively plunging into the abysses of administrative division.
Now everything is easier - jsonp handle data.esosedi.org/geocode/v1?[lng=(ru|en)[&point=x,y[&seq=?[&callback=?] - it works 24 hours a day, 7 days a week, with a limit of 6 * requests per second using nginx ...
And it allows, on average, to make 3 times more requests than geocoders of the IPA Maps. And for those who will be able to distribute the load on the fronts (using DNS), approximately 20 RPS will be released (+ burst x4).
And most importantly - no houses and streets, to which a point may be illegally drawn. And sometimes a normal street can be located a couple of hundred kilometers from the desired point, in a completely different country. Only countries, neighborhoods, districts ...
Part 2
Geocoder works on the basis of geometry, which gives the handle regions. He just needs to find the smallest polygon where the point is included, and all the “higher” polygons. But it is not necessary to look for “higher” - this information already exists. But it was the geocoder who collected it. As already mentioned, the whole thing is based on OSM geometry. From OSM it is easy enough to pull out the geometry, but it is much more difficult to digest it. Especially glued into a clear hierarchical structure of administrative division.
In principle, the first stage didn’t really work out for me either - a lot of geometry in the current database simply does not exist, in particular, many cities for which for some reason someone (until now) has not created a relation.
The second “narrow” moment is that many of the relation have the wrong admin_level, or do not have it at all. Well, the flag is_in (who is subordinate to someone) is a rare beast. As a result, the most reliable option - to make a geometric "punching" of geometry.
At the entrance comes the landfill. Next, you need to find a certain point inside it and find all other polygons in which this point belongs. Sort them by size and re-sort them into final administrative nesting - a smaller object is always included in a larger one. With this, everything is clear.
Total task is divided into two:
1. Quickly find objects in which a point enters.
2. Quickly produce a more accurate “PointInPolygon” check.

In principle, this is the whole essence of the reverse geocoder, and its simplest implementation. But, in fact, the very first ambush will be - to find this very point, which we will then substitute in all the polygons.
Geometric center, center of mass, logarithms, integrals — it’s impossible to find a point that is guaranteed to be inside a polygon using conventional methods. The landfill can have any shape, but at least a hole exactly in its center.
Personally, I solved the problem simply - I take any point that can enter a polygon, for example, a geometric center, and check it with a "standard" algorithm.
The standard algorithm (Ray casting, O (n)) works simply - from infinity to the left a ray starts up, after which its intersections with the faces of the polygon are counted.
If crossed an odd number of times - we are inside. And the point suits us.
If even - outside.
And under the end point we use the middle of the segment from the first intersection point to the second - it is exactly in the polygon.There are a lot of algorithms in the world to determine whether a point belongs to a polygon, with completely different complexity. Some of them require preprocessing, some - not. Non-trivial options either translate a polygon into more “mathematical” forms, reducing algorithmic complexity, or simply reducing N (the number of vertices). For example, any, the most difficult, polygon can be simply cut into pieces. For this,
geojson-vt is well suited.

The second interesting point is the search for the minimum object that can cross the point. The smaller the object, the faster the check. And more precisely. Quad or R-trees work well for this purpose. But this is not really tube, so let's do it through Z codes.
Z code (N, Morton, Peano) is a number obtained from 2D coordinates (X, Y), such that objects above or to the left will always be able to do it less, and to the right and below - more. Nested set.
In fact, we can simply find all the objects in which one of the criminals knows the Z coordinate more than the desired one, and sort them by Z (the mapbox has a good library that uses this effect - an
earcut ).
Although, if it worked - the name of these codes would be different - Z code is zigzagging. When we try to search “lower and to the right” - it will lead him to the left. It saves here that under the mask of Z (orro) bit interleaving is hidden. You can always get an X or Y coordinate from the Z-code through the mask overlay, and further restrict the search by them ... Well, you can find the “exit” point (LITMAX) from the search region and the “entry point” (BIGMIN) completely
academically . By the way, this is not enough in the earcut.
In principle, using “spatial filling curves” is one of the most effective ways for 2D search, if you have a regular base with digits. For example, MySQL. Although, I still have the feeling that PostGIS ...
It remains only to load real geometry from the base, and to make more accurate calculations. Various optimizations are possible here, the main one of which is the basis of OSM data - there is no need to check the polygon as a polygon. Polygon is a relation - a set of individual segments. OSM is an almost normal "topological" database. Only small segments can be checked, often common to two likely answers.
How fast does it all work? - On a working laptop with concurrency> 10, it turns out about 600RPS. On the battle servers, on the one hand, faster, on the other hand, pings to Germany - 60 ms.
Total in the world has become another reverse geocoder more. It is not like everyone else - it is “verifiable” (the Wikipedia term), gives not only a “textual” description of the location, but also an ID, including from various databases on which to rely. He is not looking for a house, he is not looking for streets - only administrative nesting.
(And only the one that hit the current base. For example, there is no Yekaterinburg, only the urban district.)If you really need a “normal” geckoder, take
Nominatim , which works on top of OSM, or normal geocoders of normal API Maps.
... but then the "spirit of the hunt" will disappear ....If he is alive in you,% username%, then your sacrifice is an example on a feed -
jsfiddle.net/azvjukh8 .
Someone (dadata) such a geocoder may be useful for determining the area of ​​the street. Someone (delivery calculator) to filter locks.
Someone (me) he just allows you to spend your evenings.
Documentation for the response format is available at
https://github.com/esosedi/regions .
You can also read a bit of theory, but practice:
Or collect the gekoder himself:
PS: Restrictions on the sale of alcoholic products were removed on December 18, 2014 (January 5, 2015). Now the Oblast has federal regulations. But this news did not reach the shops in Podolsk, as it did to me.