📜 ⬆️ ⬇️

Geo index to search for new acquaintances or the revolutionary discovery of scientists from Austria

As you know, Badoo is a service for finding new people. Among other things, we allow people to look around and in the “game” we also show people who are close to you. This part of the "around" is very, very important. After all, a young man from Novosibirsk is much more interesting to meet a girl who lives five minutes away from him, and not in Vladivostok.
We still have not talked publicly about how we do it. But the new discovery of Austrian scientists so pleased us that we decided to do it. Let's get down to business.
Badoo works all over the world and our search works exactly the same regardless of where in the world you are located. How to effectively search for people around among all 200+ million users?
The decision is not trivial, to be honest. We need some kind of index, some way to immediately narrow the search. In the case of the globe, the simplest partition is a grid of geographic latitudes and longitudes. However, the problem with this grid is its unevenness. The cell at the north pole and the cell at the equator have completely different forms. This asymmetric partition introduces major problems if we want to evenly distribute the load across the search cluster.

How do you break the surface of the globe so that the cells are equal to each other? At least approximately. We in Badoo remembered one of the regular polygons, namely the icosahedron. Its faces are equal regular triangles. There are only 20 facets in the icosahedron, which is too small to effectively curb the load on the search cluster. But after all, its faces can also be divided into triangles.

We project the vertices and edges of the icosahedron onto the sphere of the Earth. Recursively divide the triangles defined by the icosahedron into 4 smaller triangles. And so two times. We get a total of 320 faces. Unfortunately, no longer equal. Next, with a program on python, we will generate ~ 100,000 lines of code in C, which are able to do the lon / lat transformation -> triangle.


')
Thus, we can compare any coordinate on the globe with a triangle or, in other words, with the cluster shard, which contains all users living in a given area of ​​the globe.
Such a “big” triangle, depending on the desired search accuracy, we can further break into triangles, more and more “smaller”. When the triangles become quite small compared to the radius of the earth, one can already consider them “flat” and build distances in “heights of triangles” rather than meters.



Each of these “small” triangles is uniquely defined by a triple (a, b, c), where a, b, c are numbers of intersecting lines.

How to effectively search for people in a small triangle, using our coordinate system, we will definitely tell you in the future if you are interested, but we would like to come back today and talk in more detail about the “big” triangles.

The solution we proposed for dividing the globe has drawbacks associated with both unevenness and accuracy. Despite the 320 shard, we often encounter distortions in the load.

And so, a couple of weeks ago, our friends contacted us - employees of the Austrian Institute of Science and Technology and told about their discovery.

Using the technique of searching for symmetric sections of multidimensional lattices developed by them and resource-intensive computer calculations, they managed to construct a new symmetric partition of the sphere into hexagons with a sufficiently large number of vertices. The resulting design they called "Hexasphere." On the site with the announcement of the article laid out an interactive model of this partition. We could not insert it here, so follow the link to play with it.



During the last week, we worked closely with the Austrians, refined our algorithm, and achieved an absolutely even distribution of load across shards.

Badoo for its users has become more accurate, for us the system has become more predictable and resistant to stress, and scientists from Austria, we hope, have seen an interesting use for their ideas.
In the next article, we will certainly describe in more detail how this algorithm works and show on the charts the improvements that we were able to achieve.

Marco Kevac, research developer

Source: https://habr.com/ru/post/254643/


All Articles