📜 ⬆️ ⬇️

Server clustering of markers on the map. From theory to practice

Hi Habr. The story begins with the fact that we decided to make a geo service with the possibility of placing labels on the map by the users themselves.
And when they decided to pour 1 million markers into the database, they realized that even if you only need to request markers within a certain radius, everything works very slowly and clustering on the client is also not an option :)

And somewhere under this forest is manhattan.



')
Clustering is a very interesting task, there is a client one, there is a server one and all large map services support it. There are many types of clustering, but in my search I chose Grid clustering. Yandex employees told about it in several of their reports at conferences. It is used in Yandex maps for clustering markers.
Yandex clustering


Some theory


Grid clustering is based on the Mercator projection , in short it is the projection of the spherical surface of the globe onto a rectangular plane. This is just all atlases, maps, etc. As the name implies in this clustering, we cover the map with a grid and calculate the number of markers in each cell of this grid.

Terms:

A tile (tile) is in fact simply dividing a map into squares for a specific zoom. At the first level, our map will consist of 1 square (tile) at level 0, at level 1 of 4 squares (tiles), at the second level at 16 at the third at 64, and so on. In fact, it is 4 to the degree level.


Quadkey is a unique identifier of a tile at any level. At the first level, we number tiles 0,1,2,3 from right to left, from top to bottom. Then at the next level for each tile from the top level we do the same number, but already adding the quadkey parent, i.e. 00,01,02,03,10,11,12,13 etc. It becomes more clear from the picture:

Finally, we got a number in 4-hour numbering system. Now we have each tile has a unique identifier at any zoom level. From the zoom, latitude and longitude, you can get the quadkey tile itself; Any point can be converted to a quadkey of the corresponding zoom level. This is how quadkey will look like for specific coordinates:
longitude -79.377807617187500
latitude 43.653785705566406
quadkey 13940830302567 (base4: 03022313122033033011213)

Description of this approach in the English language from Microsoft:
Bing Maps Tile System
In this article, the implementation of all these transformations is written in C #, I quickly rewrote it in PHP:
TileSystem.php
class TileSystem { const EarthRadius = 6378137; const MinLatitude = -90; const MaxLatitude = 90; const MinLongitude = -180; const MaxLongitude = 180; const MaxZoom = 23; private static function clip( $n, $minValue, $maxValue) { return min( max( $n, $minValue), $maxValue); } public static function mapSize( $levelOfDetail) { return 256 << $levelOfDetail; } public static function GroundResolution( $latitude, $levelOfDetail) { $latitude = self::clip( $latitude, self::MinLatitude, self::MaxLatitude); return cos( $latitude * pi() / 180) * 2 * pi() * self::EarthRadius / self::mapSize( $levelOfDetail); } public static function mapScale( $latitude, $levelOfDetail, $screenDpi) { return self::groundResolution( $latitude, $levelOfDetail) * $screenDpi / 0.0254; } public static function latToPixelY( $latitude, $levelOfDetail) { $latitude = self::clip( $latitude, self::MinLatitude, self::MaxLatitude); $sinLatitude = sin( $latitude * pi() / 180); $y = 0.5 - log((1 + $sinLatitude) / (1 - $sinLatitude)) / (4 * pi()); $mapSize = self::mapSize( $levelOfDetail); $pixelY = (int) self::clip( $y * $mapSize + 0.5, 0, $mapSize - 1); return $pixelY; } public static function longToPixelX( $longitude, $levelOfDetail) { $longitude = self::clip( $longitude, self::MinLongitude, self::MaxLongitude); $x = ($longitude + 180) / 360; $mapSize = self::mapSize( $levelOfDetail); $pixelX = (int) self::clip( $x * $mapSize + 0.5, 0, $mapSize - 1); return $pixelX; } public static function pixelXToLong( $pixelX, $levelOfDetail) { $mapSize = self::mapSize( $levelOfDetail); $x = ( self::clip( $pixelX, 0, $mapSize - 1) / $mapSize) - 0.5; $longitude = 360 * $x; return $longitude; } public static function pixelYToLat( $pixelY, $levelOfDetail) { $mapSize = self::mapSize( $levelOfDetail); $y = 0.5 - (self::clip( $pixelY, 0, $mapSize - 1) / $mapSize); $latitude = 90 - 360 * atan( exp(-$y * 2 * pi())) / pi(); return $latitude; } public static function pixelYToTileY( $pixelY) { return $pixelY / 256; } public static function pixelXToTileX( $pixelX) { return $pixelX / 256; } public static function tileXYToPixelXY( $tileX, $tileY, &$pixelX, &$pixelY) { $pixelX = $tileX * 256; $pixelY = $tileY * 256; } public static function tileXYToQuadKey( $tileX, $tileY, $levelOfDetail) { if ($levelOfDetail == 0) return "0"; $quadKey = ""; for ($i = $levelOfDetail; $i > 0; $i--) { $digit = 0; $mask = 1 << ($i - 1); if (($tileX & $mask) != 0) { $digit++; } if (($tileY & $mask) != 0) { $digit++; $digit++; } $quadKey .= $digit; } return $quadKey; } public static function latLongToQuadKeyDec( $lat, $long, $zoom) { return self::quadKey4ToDec( self::latLongToQuadKey( $lat, $long, $zoom)); } public static function latLongToQuadKey( $lat, $long, $zoom) { $pixelX = \foci\utils\TileSystem::longToPixelX( $long, $zoom); $pixelY = \foci\utils\TileSystem::latToPixelY( $lat, $zoom); $tileX = \foci\utils\TileSystem::pixelXToTileX( $pixelX); $tileY = \foci\utils\TileSystem::pixelYToTileY( $pixelY); return \foci\utils\TileSystem::tileXYToQuadKey( $tileX, $tileY, $zoom); } public static function pixelXYToQuadKey( $pixelX, $pixelY, $zoom) { $tileX = \foci\utils\TileSystem::pixelXToTileX( $pixelX); $tileY = \foci\utils\TileSystem::pixelYToTileY( $pixelY); return \foci\utils\TileSystem::tileXYToQuadKey( $tileX, $tileY, $zoom); } public static function pixelXYToQuadKeyDec( $pixelX, $pixelY, $zoom) { return base_convert( self::pixelXYToQuadKey( $pixelX, $pixelY, $zoom), 4, 10); } public static function quadKeyToTileXY( $quadKey, &$tileX, &$tileY, &$levelOfDetail) { $tileX = $tileY = 0; $levelOfDetail = strlen( $quadKey); for( $i = $levelOfDetail; $i > 0; $i--) { $mask = 1 << ($i - 1); switch( $quadKey[$levelOfDetail - $i]) { case '0': break; case '1': $tileX |= $mask; break; case '2': $tileY |= $mask; break; case '3': $tileX |= $mask; $tileY |= $mask; break; default:; } } } public static function tileXYToQuadKeyDec( $tileX, $tileY, $levelOfDetail) { return base_convert( self::tileXYToQuadKey( $tileX, $tileY, $levelOfDetail), 4, 10); } public static function quadKey4ToDec( $quadkey) { return base_convert( $quadkey, 4, 10); } } 


Go to practice


Area of ​​visibility


Here it is necessary to take into account an important point, you do not need to request all the clusters on the map on a specific zoom, but you need to receive only those that are within the scope of the client (in our case, this is android and web).

Options for obtaining clusters in scope:

We chose a lazy client . Why?

What will be the logic of the server?

Storage


To store quadkey-s, we need to choose a zoom level, take 23, it will be quite enough. DB will use MySQL. The marker storage table will have the following structure:
 CREATE TABLE `marker` ( `id` int(11) NOT NULL AUTO_INCREMENT, `latitude` FLOAT(19,15) NOT NULL, `longitude` FLOAT(19,15) NOT NULL, `quadkey` bigint(20) NOT NULL DEFAULT 0, PRIMARY KEY (`id`), INDEX `quadkey ` (`quadkey `) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

The quadkey index allows us to easily sample the number of markers in a tile (cluster).

To store quadkey in MySQL, you can use bigint (20), just translate from a 4-h system into a decimal system and into a database and application we work with a 10-h quadkey.

Cluster acquisition code


The code for getting clusters will look something like this:
  public function getClusters( $quad_key, $zoom) { $q1 = $quad_key . str_repeat("0", TileSystem::MaxZoom - $zoom); $q2 = $quad_key . str_repeat("3", TileSystem::MaxZoom - $zoom); $mask = $q1; $mask_plus = 1; $mask = $quad_key . str_repeat("3", $mask_plus); $mask = $mask . str_repeat("0", TileSystem::MaxZoom - $zoom - $mask_plus); $q1 = TileSystem::quadKey4ToDec( $q1); $q2 = TileSystem::quadKey4ToDec( $q2); $mask = TileSystem::quadKey4ToDec( $mask); return $this-> findBetweenQuadKeys( $q1, $q2, $mask); } public function findBetweenQuadKeys( $quad_key_l, $quad_key_r, $mask) { $entities = []; $sql = "SELECT `quadkey`, MIN(id) AS id, COUNT(id) AS count, AVG(longitude) AS longitude, AVG(latitude) AS latitude "; $sql .= "FROM `marker ` "; $sql .= "WHERE `quadkey` BETWEEN $quad_key_l AND $quad_key_r "; $sql .= "AND `is_deleted`= '0' "; $sql .= "GROUP BY (`quadkey` & $mask) "; if( $stmt = $this->_connection->query( $sql)) { foreach( $stmt as $row) $entities[] = $this->_createEntityFromRow( $row); } return $entities; } 

The $ mask variable allows us to group the quadkey-c with a specific zoom. The variables $ quad_key_l and $ quad_key_r allow you to count quadkey-and all markers for a given parent, ie if $ quad_key_l = 03022313122033033000000 and $ quad_key_r = 03022313122033033033333 we will find in the database all markers for 030223131220330330.
This is what the resulting sql query will look like:
 SELECT `quadkey`, MIN(id) AS id, COUNT(id) AS count, AVG(longitude) AS longitude, AVG(latitude) AS latitude FROM marker WHERE quadkey between 15499682906112 and 15499683168255 GROUP BY (quadkey & 15499683151872); --   4-  -- WHERE quadkey between "03201203031012000000000" and "03201203031012333333333" -- GROUP BY (quadkey & "03201203031012330000000") 


What happened


Unfortunately, screenshots with clusters based on 1 million markers did not survive (although it was something), but there is a current small base, and this is how it looks:
web

Android


The coordinates of the clusters are in the place of the largest cluster of markers in the tile and not in its center, and all thanks to a simple averaging of the coordinates in the sql query.

results


A service using this clustering is running and running. The approach itself is very convenient for further scaling.
To avoid the constant recalculation of the number of markers in the cluster, cachesting caching is simply added.
If we consider this topic further, then tiles are actively used by map services for drawing maps. Something similar is also used in game development, etc.

Sources


www.mapbox.com/guides/how-web-maps-work
msdn.microsoft.com/en-us/library/bb259689.aspx

Source: https://habr.com/ru/post/268621/


All Articles