📜 ⬆️ ⬇️

How to find the nearest cafe, sight, free taxi through the eyes of a programmer

Services that solve any problems in the context of our location have firmly enough entered our lives. Most smartphones can, with Internet access, call us a taxi, calculate how long the bus will arrive, get directions based on traffic jams and different user preferences, or show friends nearby. Problems like finding the nearest cafes or sights have become trivial for them and can usually be solved without access to the world wide web. In this article I want to consider some tools for solving such problems and compare their performance with each other.

Formulation of the problem


You need to choose tools for developing a service where users periodically upload their location, while other users look for their “neighbors”. Examples of services that solve such problems can be taxi ordering services, social networks and games like Ingress .

The way to solve the problem


Some theoretical introduction is in the article , in more detail - in Wikipedia . Further, purely practical issues will be considered.
To solve the problem, adapter classes for several selected services will be implemented. The interface of these adapters is shown in the listing:

from abc import ABCMeta, abstractmethod class AbstractStorage(object): __metaclass__ = ABCMeta @abstractmethod def prepare_storage_for_experiment(self, test_data): pass @abstractmethod def experiment_search(self, test_data): pass @abstractmethod def experiment_update(self, test_data): pass @abstractmethod def clear_storage(self): pass 

Time measurement is performed using Profilehooks .
')

Accepted simplifications


  1. Hereinafter, all the code is written in Python; All of the tools described below can be operated from other common programming languages, unless otherwise noted. The ability to speed up the system, rewriting everything in a faster programming language like c / c ++ in this article will not be considered, although it may well be used in combat conditions if the expediency of such a solution is proved.
  2. In the given system, all requests are processed sequentially, which is equivalent to the presence of a queue of requests before the module in question and the operation of the module in one stream; when using the system in combat, the developed module will most likely process requests in several threads.
  3. All tests run on a laptop with the following hardware: 8 Gb RAM, Intel Core i5 2.6 GHz, SSD. The server hardware configuration is likely to be different.
  4. All the tools used will be used with the default configuration with the exception of the same amount of allocated RAM (where this point is configurable by standard means). Configuring the selected tools in this article will not be considered.

Implementation


A string (document or other - depending on the accepted terminology) consists of id and a pair of coordinates in the internal representation of the system. For each of the columns, an index is constructed in case the system allows doing this. In all implementations, the code responsible for the search and update will be presented. A link to the full project code on github will be given at the end of the article.

Implementation 1. MongoDB v3.2.6
Link to documentation on geo-search

Code responsible for testing search speed and updates:

 @timecall(immediate=True) def experiment_search(self, test_data): def find_point(point): cursor = self.collection.find( { MongoStorage.key_position: { '$near': { '$geometry': { 'type': "Point", 'coordinates': [point['lng'], point['lat']] }, '$maxDistance': 10000 } } } ) return cursor[0] if cursor.count() > 0 else None @timecall(immediate=True) def experiment_update(self, test_data): for t in test_data: self.collection.update_one( { MongoStorage.key_id: t["id"] }, { '$set': { MongoStorage.key_position: { 'type': "Point", 'coordinates': [t['position']['lng'], t['position']['lat']] } } } ) 

Explain for search query:

 { "queryPlanner" : { "plannerVersion" : 1, "namespace" : "taxi_geo_experiment_test_db.taxi_driver_collection", "indexFilterSet" : false, "parsedQuery" : { "key_position" : { "$near" : { "$geometry" : { "type" : "Point", "coordinates" : [ 37.3816328351611, 55.01604115262 ] }, "$maxDistance" : 10000 } } }, "winningPlan" : { "stage" : "GEO_NEAR_2DSPHERE", "keyPattern" : { "key_position" : "2dsphere" }, "indexName" : "key_position_2dsphere" }, "rejectedPlans" : [ ] }, "serverInfo" : { "host" : "host", "port" : 27017, "version" : "3.2.6", "gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25" }, "ok" : 1 } 

We see that geo index is used.

Implementation 2.1. PostgreSQL v9.5.2 using ST_DWithin
Link to documentation (postgis)

Code responsible for testing search speed and updates:

 @timecall(immediate=True) def experiment_search(self, test_data): cursor = self.db_connect.cursor() for item in test_data: request = "SELECT * FROM {table_name} " \ "WHERE ST_DWithin({column_geo},ST_MakePoint({lat},{lng}), 10000)".format( table_name=PostgreSQLStorage.table_name, column_geo=PostgreSQLStorage.column_position, lat=item["lat"], lng=item["lng"]) cursor.execute(request) search_result = cursor.fetchall() @timecall(immediate=True) def experiment_update(self, test_data): cursor = self.db_connect.cursor() for item in test_data: request = "UPDATE {table_name} set {update_column_name}=ST_MakePoint({lat},{lng}) " \ "where {id_column_name}={id}".format( table_name=PostgreSQLStorage.table_name, update_column_name=PostgreSQLStorage.column_position, id_column_name=PostgreSQLStorage.column_id, lat=item["position"]["lat"], lng=item["position"]["lng"], id=item["id"] ) cursor.execute(request) self.db_connect.commit() 

Explain for search query:

  Index Scan using geo_index on taxi_drivers (cost=0.14..10.51 rows=1 width=36) Index Cond: (driver_position && '0101000020E6100000C8EA77DD72C44B404C0305B698B04240'::geography) Filter: (('0101000020E6100000C8EA77DD72C44B404C0305B698B04240'::geography && _st_expand(driver_position, '10000'::double precision)) AND _st_dwithin(driver_position, '0101000020E6100000C8EA77DD72C44B404C0305B698B04240'::geography, '10000'::double precision, true)) 

We also see the use of geo-index.

Implementation 2.2. PostgreSQL v9.5.2 using ST_Distance
Link to documentation (postgis)

Code responsible for testing the search speed:
 @timecall(immediate=True) def experiment_search(self, test_data): cursor = self.db_connect.cursor() for item in test_data: request = "SELECT * FROM {table_name} " \ "WHERE ST_Distance({column_geo},ST_MakePoint({lat},{lng})) < 10000".format( table_name=PostgreSQLStorage.table_name, column_geo=PostgreSQLStorage.column_position, lat=item["lat"], lng=item["lng"]) cursor.execute(request) search_result = cursor.fetchall() 


The code responsible for testing the update rate does not differ from the implementation described in the previous paragraph.
Explain for search query:
  Seq Scan on taxi_drivers (cost=0.00..8241.00 rows=10000 width=60) Filter: (_st_distance(driver_position, '0101000020E61000003B2CDE5519AA4B402B1697185FED4240'::geography, '0'::double precision, true) < '10000'::double precision) 

In this case, the index is not used, all values ​​in the table will be viewed, which is much slower.
Learn more about EXPLAIN in PostgreSQL .

Implementation 3. Redis v3.2.0
Link to geofunctions documentation

Code responsible for testing search speed and updates:

 @timecall(immediate=True) def experiment_search(self, test_data): i = 0 for item in test_data: command = "GEORADIUS {key} {lng} {lat} {dist_km} km WITHCOORD WITHDIST".format( key=RedisStorage.KEY_DRIVERS, lat=item["lat"], lng=item["lng"], dist_km=10 ) result = self._redis.execute_command(command) @timecall(immediate=True) def experiment_update(self, test_data): for item in test_data: command_rm = "ZREM {key} \"{id}\"".format( key=RedisStorage.KEY_DRIVERS, id=item["id"] ) self._redis.execute_command(command_rm) command_add = "GEOADD {key} {lng} {lat} \"{id}\"".format( key=RedisStorage.KEY_DRIVERS, lat=item["position"]["lat"], lng=item["position"]["lng"], id=item["id"] ) self._redis.execute_command(command_add) 

There is no analog of explain for redis, since there is no need for such a command, redis is not intended for complex queries that may require such functionality.

It is easy to notice one more feature - in redis you cannot remove from a key (the closest analogue of a key in SQL is a table; a key can contain both a simple value, for example, a number, and a complex one, for example, many) one of the geo objects, for this you need to use internal representation: the ZREM command removes the value from the set.

Conclusion


Testing was conducted on 30,000 objects and the same number of requests. If necessary, you can test on other sets of values ​​by replacing the corresponding parameters in the code. The test results are presented in the table:
ToolTime to searchTime to update
MongoDB320.41514.275
PostgreSQL (ST_DWithin)114.8788.908
PostgreSQL (ST_Distance)1829.920-
(implementation and result are similar to PostgreSQL (ST_DWithin))
Redis1098.6045.016

All data in the table are presented in seconds, the average value of 5 tests.

Link to the project repository.

If you know another tool to effectively solve the task - write in the comments, I will consider with interest.

Update 1: Added PostgreSQL implementation using ST_Distance. This implementation does not use an index, respectively, the search takes longer.

Source: https://habr.com/ru/post/301460/


All Articles