
Ariadna. Why do I need another geocoder for OSM?

Hello!

I recently finished building a geocoder for my own needs: Ariadna.
Below the cut is the story of why I built it and what it can do.

This article describes how the geocoder works and the problems I ran into, rather than walking through its Go source. The code itself is on GitHub.

Background


I work at one of the taxi services in Bishkek, Kyrgyzstan. We decided to improve how we resolve addresses to coordinates so that orders can be distributed more efficiently.
What we do not have:


What we have:


What can users enter?


Users can enter addresses in different formats:


There are many such variants, for example:
Kiev 28
Kiev Soviet
5-42
5 microdistrict Soviet 42
TSUM
cafe at Ashot
barrier
And so on

Formulation of the problem


Build a geocoder that can take any input and return coordinates for it.
The only search language is Russian.

How I ended up here


Before reinventing the wheel, I decided to try what already exists.
The candidates were:


What did not suit us about Nominatim, which we were already running:


What I liked about Pelias:


As a result, I decided to abandon all three geocoders and build my own tool, for several reasons:

  1. I want to understand the data in OSM and import only what is needed for search.
  2. I can process the geodata before indexing.
  3. I don't like JavaScript and Node.js, so I had no desire to build the search on top of Pelias.

Design


I settled on the following algorithm:

  1. First, obtain the geometry of large settlements (cities, capitals, villages, residential areas).
  2. Extract all possible addresses and match each one to the residential area, city, or other settlement that contains it, setting the corresponding value.
  3. Extract all roads.
  4. Find road intersections.
  5. Put everything into the index.
  6. Search.
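
Step 2 of the list above comes down to deciding which settlement boundary contains a given address point. A minimal sketch of that check using ray casting follows. The types `Point` and `Settlement`, the functions `contains` and `settlementFor`, and the rectangle used in `main` are all hypothetical and are not taken from Ariadna; real boundaries may also need multipolygon and hole handling, which this sketch ignores.

```go
package main

import "fmt"

// Point is a hypothetical lon/lat pair; Ariadna's own types may differ.
type Point struct{ Lon, Lat float64 }

// Settlement pairs a name with the outer ring of its boundary (assumed shape).
type Settlement struct {
	Name string
	Ring []Point
}

// contains reports whether p lies inside the ring, using ray casting:
// count how many ring edges a horizontal ray starting at p crosses.
func contains(ring []Point, p Point) bool {
	inside := false
	for i, j := 0, len(ring)-1; i < len(ring); j, i = i, i+1 {
		a, b := ring[i], ring[j]
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) &&
			p.Lon < (b.Lon-a.Lon)*(p.Lat-a.Lat)/(b.Lat-a.Lat)+a.Lon {
			inside = !inside
		}
	}
	return inside
}

// settlementFor returns the name of the first settlement whose ring contains p.
func settlementFor(settlements []Settlement, p Point) (string, bool) {
	for _, s := range settlements {
		if contains(s.Ring, p) {
			return s.Name, true
		}
	}
	return "", false
}

func main() {
	// An illustrative rectangle, not a real administrative boundary.
	city := Settlement{
		Name: "Bishkek",
		Ring: []Point{{74.4, 42.7}, {74.8, 42.7}, {74.8, 43.0}, {74.4, 43.0}},
	}
	name, ok := settlementFor([]Settlement{city}, Point{74.6, 42.87})
	fmt.Println(name, ok)
}
```

With many settlements, a bounding-box pre-filter or a spatial index would avoid testing every ring for every address.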

For the implementation I chose Go, given projects such as pbf2json, golang-geo, and many others for processing geodata. I also wanted to sharpen my skills in it.

Implementation


Fetching and parsing the data from OSM is fairly straightforward. For settlements and residential areas I filter on the tags place=city, place=village, place=suburb, place=town, and place=neighbourhood. For addresses and buildings I use addr:street + addr:housenumber, amenity, and shop.
All roads can be obtained via the highway tag.
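
The tag rules above can be expressed as a small classifier over an element's tag map. This is a minimal sketch and not Ariadna's code: the `Kind` type, the `classify` function, and the order in which the checks run are assumptions made for illustration.

```go
package main

import "fmt"

// Kind is a hypothetical label for what an OSM element will be indexed as.
type Kind string

const (
	KindSettlement Kind = "settlement"
	KindRoad       Kind = "road"
	KindAddress    Kind = "address"
	KindOther      Kind = "other"
)

// settlementPlaces lists the place=* values used for settlements and residential areas.
var settlementPlaces = map[string]bool{
	"city": true, "town": true, "village": true, "suburb": true, "neighbourhood": true,
}

// classify inspects the tags of one element. The check order is an assumption:
// settlements first, then roads, then anything that looks like an address or venue.
func classify(tags map[string]string) Kind {
	if settlementPlaces[tags["place"]] {
		return KindSettlement
	}
	if _, ok := tags["highway"]; ok {
		return KindRoad
	}
	_, hasHouse := tags["addr:housenumber"]
	_, hasAmenity := tags["amenity"]
	_, hasShop := tags["shop"]
	if hasHouse || hasAmenity || hasShop {
		return KindAddress
	}
	return KindOther
}

func main() {
	fmt.Println(classify(map[string]string{"place": "suburb"}))
	fmt.Println(classify(map[string]string{"highway": "residential"}))
	fmt.Println(classify(map[string]string{"addr:street": "Example St", "addr:housenumber": "28"}))
}
```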

Searching in Russian for names that are written in English turned out to be difficult. Here is how I tried to solve it:

  1. Simple automatic transliteration into Russian. The results were absurd and incorrect. For example: City House -> Tsiti House
  2. Next I tried taking a phonetic transcription of each word and transliterating that. The result was something like Adrenaline rush -> Erdenalin Rash. Passable, but what is needed is the pronunciation with a Russian accent, the way a Russian speaker would say "adrenaline rush".
  3. I ended up with the following mechanism: transliterate all data automatically, using a replacement dictionary. Simple, dumb transliteration still works tolerably well, and the dictionary filled up quickly after a few runs over the data.
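
The third approach can be sketched as a whole-word replacement dictionary layered over a naive letter-by-letter fallback. Everything below is illustrative: the `overrides` entries, the `letters` table, and the `transliterate` function are assumptions, not the dictionary or code that Ariadna ships.

```go
package main

import (
	"fmt"
	"strings"
)

// overrides is a hypothetical replacement dictionary: whole words whose
// letter-by-letter transliteration reads badly get a hand-picked spelling.
var overrides = map[string]string{
	"city":  "сити",
	"house": "хаус",
}

// letters is a deliberately naive Latin-to-Cyrillic table used as the fallback.
var letters = map[rune]string{
	'a': "а", 'b': "б", 'c': "к", 'd': "д", 'e': "е", 'f': "ф", 'g': "г",
	'h': "х", 'i': "и", 'j': "дж", 'k': "к", 'l': "л", 'm': "м", 'n': "н",
	'o': "о", 'p': "п", 'q': "к", 'r': "р", 's': "с", 't': "т", 'u': "у",
	'v': "в", 'w': "в", 'x': "кс", 'y': "й", 'z': "з",
}

// transliterate lowercases the name, then converts it word by word:
// dictionary hit first, otherwise letter by letter. Characters that are
// not in the table (digits, Cyrillic) pass through unchanged.
func transliterate(name string) string {
	words := strings.Fields(strings.ToLower(name))
	for i, w := range words {
		if repl, ok := overrides[w]; ok {
			words[i] = repl
			continue
		}
		var b strings.Builder
		for _, ch := range w {
			if repl, ok := letters[ch]; ok {
				b.WriteString(repl)
			} else {
				b.WriteRune(ch)
			}
		}
		words[i] = b.String()
	}
	return strings.Join(words, " ")
}

func main() {
	fmt.Println(transliterate("City House"))
	fmt.Println(transliterate("Bar 24"))
}
```

Growing the dictionary then becomes a matter of running the import, reviewing the worst-looking fallbacks, and adding them to `overrides`.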

With that sorted out, at this point we have data that is:

  1. Normalized and converted to Russian
  2. Addressed in the format: country, city or village, microdistrict or residential area, street, house

The next part of the quest was finding road intersections. My first quick attempt produced a very slow implementation with O(n^2) complexity. As a temporary workaround, I use Postgres + PostGIS to find intersections until I find a good algorithm for it.
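
One possible direction for replacing the pairwise comparison, offered as an assumption and not as what Ariadna does: in OSM, roads that meet at a junction normally share a node, so indexing node ID to street names finds candidate intersections in a single pass over the ways. The `Way` and `Intersection` types and the `findIntersections` function are hypothetical. Roads that cross geometrically without sharing a node are not detected by this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Way is a hypothetical, simplified road: a name plus the IDs of its nodes.
type Way struct {
	Name    string
	NodeIDs []int64
}

// Intersection is a node shared by at least two differently named streets.
type Intersection struct {
	NodeID  int64
	Streets []string
}

// findIntersections groups street names by node ID and keeps the nodes
// that belong to two or more distinct names. Roughly linear in the
// total number of node references.
func findIntersections(ways []Way) []Intersection {
	byNode := make(map[int64]map[string]struct{})
	for _, w := range ways {
		if w.Name == "" {
			continue // unnamed roads are useless for address search
		}
		for _, id := range w.NodeIDs {
			if byNode[id] == nil {
				byNode[id] = make(map[string]struct{})
			}
			byNode[id][w.Name] = struct{}{}
		}
	}

	var result []Intersection
	for id, names := range byNode {
		if len(names) < 2 {
			continue
		}
		streets := make([]string, 0, len(names))
		for n := range names {
			streets = append(streets, n)
		}
		sort.Strings(streets)
		result = append(result, Intersection{NodeID: id, Streets: streets})
	}
	// Sort for a stable order, since map iteration order is random.
	sort.Slice(result, func(i, j int) bool { return result[i].NodeID < result[j].NodeID })
	return result
}

func main() {
	ways := []Way{
		{Name: "A Street", NodeIDs: []int64{1, 2, 3}},
		{Name: "B Street", NodeIDs: []int64{7, 2, 8}},
		{Name: "A Street", NodeIDs: []int64{3, 4}}, // same street split into two ways
	}
	for _, x := range findIntersections(ways) {
		fmt.Println(x.NodeID, x.Streets)
	}
}
```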

The result was a good parser for OSM data that puts the data into Elasticsearch. It got the simple name importer.

Automate it


Constantly downloading dumps and recreating indexes in Elasticsearch by hand gets tiresome, so the updater component appeared, along with configuration in JSON format.

Downloading the file and importing it into Elasticsearch is now automated. On top of that, the data in Elasticsearch can be updated without downtime, thanks to aliases.

How it works:

  1. The updater downloads the file
  2. Reads the current index version from the config
  3. Increments the version and creates a new index
  4. Fills it with data
  5. Switches the aliases
  6. Removes the old index
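
Step 5 is what makes the update free of downtime: Elasticsearch applies all actions in a single `POST /_aliases` request atomically, so searches never see a moment without an index behind the alias. The sketch below only builds the request body. The versioned naming scheme (`osm_data_v3`, `osm_data_v4`), the alias name, and the helper functions `indexName` and `aliasSwapBody` are assumptions for illustration, not Ariadna's actual naming or code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aliasTarget names one index/alias pair inside an alias action.
type aliasTarget struct {
	Index string `json:"index"`
	Alias string `json:"alias"`
}

// aliasAction is either an "add" or a "remove" action for the _aliases API.
type aliasAction struct {
	Add    *aliasTarget `json:"add,omitempty"`
	Remove *aliasTarget `json:"remove,omitempty"`
}

type aliasRequest struct {
	Actions []aliasAction `json:"actions"`
}

// indexName builds a versioned index name; the "_v<N>" suffix is an assumed convention.
func indexName(prefix string, version int) string {
	return fmt.Sprintf("%s_v%d", prefix, version)
}

// aliasSwapBody returns the JSON body that moves alias from oldIndex to newIndex.
func aliasSwapBody(alias, oldIndex, newIndex string) ([]byte, error) {
	req := aliasRequest{Actions: []aliasAction{
		{Remove: &aliasTarget{Index: oldIndex, Alias: alias}},
		{Add: &aliasTarget{Index: newIndex, Alias: alias}},
	}}
	return json.Marshal(req)
}

func main() {
	current := 3 // would come from the config in the scheme described above
	oldIdx := indexName("osm_data", current)
	newIdx := indexName("osm_data", current+1)

	body, err := aliasSwapBody("osm_data", oldIdx, newIdx)
	if err != nil {
		panic(err)
	}
	// This body would be sent as POST /_aliases once the new index is filled.
	fmt.Println(string(body))
}
```

Only after the swap succeeds is it safe to delete the old index, which matches the order of steps 5 and 6.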

This gives the following workflow:

  1. Write a config
  2. Run ./ariadna update
  3. Go drink coffee
  4. Get a ready, configured index

Also, for convenience, I bolted on a simple web interface with a map and search.

Automatic data enrichment


In addition to OSM, we also have many drivers and operators who enter orders.
For each of these we have a name and coordinates.
So I set up the following scheme:

  1. Driver tracks are stored in the drivers_data index
  2. OSM data is stored in the osm_data index
  3. The two are combined via the addresses alias, against which the address search is performed.

Data from drivers is recorded when the coordinates we determined are off by more than 200 meters.
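
The 200 meter rule can be sketched as a great-circle distance check between the geocoded point and the point reported by the driver. This is one reading of the rule, with hypothetical names throughout (`haversineMeters`, `shouldRecord`, `recordThresholdM`); the haversine formula and a mean Earth radius of 6,371 km are standard, but whether Ariadna measures the error this way is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

const (
	earthRadiusM     = 6371000.0 // mean Earth radius in meters
	recordThresholdM = 200.0     // threshold from the rule above
)

// haversineMeters returns the great-circle distance between two lat/lon points in meters.
func haversineMeters(lat1, lon1, lat2, lon2 float64) float64 {
	const rad = math.Pi / 180
	dLat := (lat2 - lat1) * rad
	dLon := (lon2 - lon1) * rad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*rad)*math.Cos(lat2*rad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusM * math.Asin(math.Sqrt(a))
}

// shouldRecord reports whether the driver's point is far enough from the
// geocoded point to be worth storing as a correction.
func shouldRecord(geoLat, geoLon, drvLat, drvLon float64) bool {
	return haversineMeters(geoLat, geoLon, drvLat, drvLon) > recordThresholdM
}

func main() {
	// Illustrative coordinates only.
	fmt.Println(shouldRecord(42.8700, 74.5900, 42.8705, 74.5900))
	fmt.Println(shouldRecord(42.8700, 74.5900, 42.8800, 74.5900))
}
```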

Summary


The result is a geocoder that can:

  1. Search for coordinates by synonyms, for example ShVK - Champagne Winery
  2. Search for addresses within a given radius (for my own use, I limited the search to addresses within 30 km of the city center)
  3. Search by the names of establishments (cafe at Ashot, for example)
  4. Search for intersections
  5. Search for addresses in microdistricts and residential areas
  6. Do reverse geocoding
  7. Update itself automatically with new data from drivers
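
The radius restriction in item 2 maps naturally onto Elasticsearch's `geo_distance` filter. The sketch below builds such a query body. The field names `name` and `location` are hypothetical (the latter would have to be mapped as `geo_point`), the center coordinates are only approximate, and `radiusQuery` is not a function from Ariadna.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// radiusQuery builds a bool query: full-text match on an assumed "name"
// field, filtered to documents whose assumed "location" field lies within
// km kilometers of the given center.
func radiusQuery(text string, lat, lon float64, km int) ([]byte, error) {
	q := map[string]interface{}{
		"query": map[string]interface{}{
			"bool": map[string]interface{}{
				"must": map[string]interface{}{
					"match": map[string]interface{}{"name": text},
				},
				"filter": map[string]interface{}{
					"geo_distance": map[string]interface{}{
						"distance": fmt.Sprintf("%dkm", km),
						"location": map[string]interface{}{"lat": lat, "lon": lon},
					},
				},
			},
		},
	}
	return json.Marshal(q)
}

func main() {
	// Approximate city-center coordinates, for illustration only.
	body, err := radiusQuery("cafe", 42.87, 74.59, 30)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```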

It consists of three components:

  1. Data importer
  2. Data updater
  3. Web interface

Drawbacks


  1. Tested only on Kyrgyzstan
  2. No demo
  3. Not all addressing schemes are supported

So I hope someone will help finish it and make search work well in other countries and cities.

If you found the project interesting, I welcome any criticism, pull requests, issues on GitHub, and feedback in general.

Source: https://habr.com/ru/post/277043/

