📜 ⬆️ ⬇️

Dynamic pricing, or How Yandex.Taxi predicts high demand



Previously, to call a taxi, you had to call to different numbers of dispatching services and wait for the car to go half an hour or more. Taxi services are now well automated, and the average car delivery time for Yandex.Taxi in Moscow is about 3-4 minutes. But it is necessary to go to the rain or end a mass event, and again we may face a shortage of free cars.

My name is Anton Skokurev, I lead a platform efficiency development team at Yandex.Taxi. Today I will tell Habr's readers how we have learned how to predict high demand and additionally attract drivers so that users can find a free car at any time. You will learn how to form a coefficient that affects the cost of the order. Everything is far from being as simple as it may seem at first glance.
')

Dynamic pricing task


The most important task of dynamic pricing is to provide the ability to book a taxi at all times. It is achieved by using the surge pricing coefficient, which is calculated by the calculated price. We call him simply "Surge." It is important to say that Surge not only regulates the demand for taxis, but also helps attract new drivers to increase supply.

If you set Surge too large - we will reduce the demand too much, there will be an excess of free cars. If set too low, users will see “no free machines”. We must be able to choose the coefficient at which we will walk on thin ice between the lack of free cars and low demand.

What should this coefficient depend on? Immediately comes to mind the dependence on the number of machines and orders around the user. Now you can simply divide the number of orders by the number of drivers, get the coefficient and turn it into our Suj by some formula (possibly linear).

But there is a small problem in this task - it may be too late to count orders around the user. After all, an order is almost always an already occupied car, which means that an increase in our coefficient will always be delayed. Therefore, we consider not created orders, but intentions to order a car - pins. Pin - this is the mark "A" on the map, which puts the user launching our application.



We formulate the problem: we need to consider the instantaneous values ​​of the machines and pins at some point of the user.

We count the number of pins and cars around


When a pin's position changes (the user chooses point “A”), the user application sends new coordinates to the backend and a small sheet of additional information that helps to estimate the pin more accurately (for example, the selected rate).

We try to adhere to microservice architecture, where each microservice is engaged in separate tasks. Surge is engaged in counting Surja. It registers pins, saves them to the database, and also updates the cast of pins in the RAM, in which they fit quite well. Cache lag in such work is only a few seconds, which is acceptable in our case.

A few words about the database
When registering, each pin is asynchronously added to the MongoDb with the TTL Index , where TTL is the “lifetime” of the pin, at which we consider it active to calculate the multiplying factor. The user does not wait until we perform these actions. Even if something goes wrong, losing a pin is not such a big tragedy.



Hot cache is built with an index on geohashu . We group all pins by geohash, and then collect pins for the desired radius around the order point.

We do the same with cars, but in another service called Tracker, which Surger just goes with the question “how many drivers are in this radius”.

So we consider the instantaneous values ​​of the coefficient.



Caching


Case : you are standing in Moscow on the Garden Ring and want to order a car. In this case, the price jumps often enough and it is annoying.

Already knowing the mechanics, it can be understood that this may be due to the fact that drivers accumulate at the conditional traffic lights at the time of the request of Surge and also quickly leave from there. Because of this, the Surge and price can jump.

To avoid this, we cache the value of Surge by users. When a user comes for a Surge, we are looking at whether there is a saved value for the user in the allowed radius (a linear walk through all the saved Surges of the user). If there is, we give it away, otherwise we count the new one and also save it.

It worked well, but there are other situations.

Case : 2 users request Surge. One orders 30 seconds after the other, when the cars from the traffic light from the past case have already left. We get a picture where 2 users ordering almost simultaneously can have a markedly different Surge.

And here we go from the cache by user to cache by position. Now, instead of caching the value of surdzh only by the user, we begin to cache it on the already familiar geohash. So we almost fix the problem. Why almost? Because there may be differences on the boundaries of geocaches. But the problem is not so significant, because we have anti-aliasing.

Smoothing


Perhaps, reading a case about a traffic light, the thought occurred to you that it was somehow unfair to count an instant Surge depending on the traffic light. We also think so, so we figured out how to remedy the situation.

We decided to borrow from the machine learning method of the nearest neighbors for the regression problem in order to determine how much the value of the instant Surge differs from what is happening around.

The learning stage, as in the formal description of the method, consists in memorizing all the objects — in our case, the calculated values ​​of the Surge in the pin, we are already doing all this at the moment of loading all the pins into the cache. Things are going to be small - to calculate the instantaneous value, compare it with the value in the zone and agree that we cannot deviate from the value in the zone too much.

So we get a system with a quick response to events and allowing you to quickly read the value of the multiplying factor.

Surdz driver's card


In order to communicate with the driver, we need to be able to display the Surge map in the driver's application - a taximeter. This gives the driver feedback on whether there is demand in the zone where he is now and where he should go to get the most expensive orders. For us, this means that more drivers will arrive in a zone with high demand and settle it.



We live with the paradigm that the driver device is a rather weak device. Therefore, the rendering of the Surdzh hexagonal grid lies on the side of the backend. The client comes to the backend for the tiles. These are cut raster images for direct display on the map.

We have a separate service that periodically collects the casts of the Surger microservice pins and calculates all the meta information needed to render the hexagonal grid: where is the hex and what is the Surge in each.

Conclusion


Dynamic pricing is a constant search for a balance between supply and demand, so that free cars are always available to users, including through the mechanism of attracting additional drivers to areas with high demand. For example, we are currently working on a deeper use of machine learning for calculating surdzh. As part of one of the tasks in this area, we are learning how to determine the probability of pin conversion to an order and take this information into account. There is plenty of work here, so we are always glad to see new specialists in the team.

If you are interested in learning about some part of this large topic in more detail, please write in the comments. Feedback and ideas are welcome too!

PS In the next post, my colleague will talk about the use of machine learning to predict the expected time of arrival of a taxi.

Source: https://habr.com/ru/post/429226/


All Articles