
How does Qlean use Machine Learning?

Every day more and more orders arrive, and they have to be distributed among the cleaners somehow. It sounds simple: an order comes in, you hand it to a cleaner. But it is not as simple as it seems. Our cleaners have no fixed work schedule: they work when they want to and can decline virtually any order (which, alas, they do quite often). That makes order distribution one of the hardest problems we work on.

MAIN PROBLEMS


One of the biggest problems is manual distribution. We have a lot of orders that need to be distributed quickly. Where do these orders come from?


We call such orders “last minute”. There are quite a lot of them, and there is no point hoping that cleaners will pick them up in the app. So we distribute them manually: call-center employees phone the cleaners and offer them the orders. This is a slow, labor-intensive process.

On an ordinary day at least 50 orders are distributed manually, and in high season it gets really rough: the number can reach 300. That takes a lot of time and resources.
Another problem is unfulfilled orders. Every day we fail to fulfill about 2% of orders because we cannot find cleaners for them, or cannot find them in time. When that happens, we give the client a free cleaning. We give away about 20 free cleanings a day, and in high season this number can reach 200-300.

How do we solve these problems?

1. Cleaner app

Cleaners can pick up orders themselves in an app that shows a feed of orders for each day.

image

These feeds differ between cleaner segments. For example, novice cleaners never see orders from our regular customers. Visibility is controlled by a matrix; the numbers in it indicate how long a cleaner sees the order in the app.

image
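As a rough illustration of how such a matrix could drive the feed, here is a minimal sketch. The segment names, the numbers, and the reading of each number as "hours after creation during which the order stays visible" are all assumptions made for the example, not Qlean's actual configuration.

```python
from datetime import datetime, timedelta

# Hypothetical visibility matrix (illustrative values only):
# VISIBILITY_HOURS[cleaner_segment][customer_segment] is the number of hours
# after creation during which the order stays in the feed; 0 means "never shown".
VISIBILITY_HOURS = {
    "novice":      {"new": 24, "regular": 0},
    "experienced": {"new": 24, "regular": 48},
}


def is_visible(cleaner_segment, customer_segment, created_at, now):
    """Return True if a cleaner of this segment should see the order right now."""
    hours = VISIBILITY_HOURS[cleaner_segment][customer_segment]
    return hours > 0 and now - created_at < timedelta(hours=hours)


def build_feed(cleaner_segment, orders, now):
    """Keep only the orders that are visible to the given cleaner segment."""
    return [
        order for order in orders
        if is_visible(cleaner_segment, order["customer_segment"], order["created_at"], now)
    ]
```

With this layout, changing who sees what becomes a matter of editing one table rather than touching the feed logic.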

There are other restrictions as well; we keep an eye on them, and each one affects our metrics.

2. Auto-assignment

As the number of orders grows rapidly, so does the number of “bad” orders: those that end up not taking place or get canceled. We needed to do something about it, so we decided to hack something together and build an ML model to predict such orders.

We did not try to be too clever. We came up with about 30 features: the time since the order was created, the time since the customer's previous order, how many orders the customer has canceled, how much money the customer has brought us in total, and so on.
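To make the feature list more concrete, here is a minimal sketch that computes four of the features named above from a plain list of orders. The record layout (`order_id`, `customer_id`, `created_at`, `price`, `cancelled`) and the function name are hypothetical; the real pipeline and the remaining features are not described in this article.

```python
from collections import defaultdict
from datetime import datetime


def build_features(orders, now):
    """Return {order_id: feature dict} for a list of order records.

    Each order is a dict with hypothetical keys: order_id, customer_id,
    created_at (datetime), price (float) and cancelled (bool).
    """
    # Running per-customer history, updated as we walk orders in time order.
    history = defaultdict(
        lambda: {"last_created": None, "cancellations": 0, "revenue": 0.0}
    )
    features = {}
    for order in sorted(orders, key=lambda o: o["created_at"]):
        past = history[order["customer_id"]]
        prev = past["last_created"]
        features[order["order_id"]] = {
            "hours_since_created": (now - order["created_at"]).total_seconds() / 3600,
            # None for a customer's first order.
            "days_since_prev_order": (
                None if prev is None
                else (order["created_at"] - prev).total_seconds() / 86400
            ),
            "past_cancellations": past["cancellations"],
            "past_revenue": past["revenue"],
        }
        # Update the history only after the features are taken, so the
        # current order's own outcome never leaks into its features.
        past["last_created"] = order["created_at"]
        if order["cancelled"]:
            past["cancellations"] += 1
        else:
            past["revenue"] += order["price"]
    return features
```

The result can be turned into a table for training with `pd.DataFrame.from_dict(features, orient="index")`.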

We took off-the-shelf libraries

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

Loaded the features

    # 'cancel' is the target column: whether the order was canceled
    features = pd.read_csv('train/train.csv', header=0)
    X = features.drop(['cancel'], axis=1)
    y = features.cancel

Split the sample into training and test sets

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=1234)

Trained a gradient boosting model

    gb_clf = GradientBoostingClassifier(learning_rate=0.2, n_estimators=70,
                                        max_depth=3, verbose=False,
                                        random_state=241)
    gb_clf.fit(X_train, y_train)

Checked it on the test set

    y_true = y_test
    # probability of the positive class (order canceled)
    y_pred = gb_clf.predict_proba(X_test)[:, 1]
    roc_auc = roc_auc_score(y_true, y_pred)
    print('roc-auc:', roc_auc)

roc-auc: 0.881929812245

Made a forecast

    features = pd.read_csv('predict_features.csv', header=0, index_col='order_id')
    y_predict = gb_clf.predict_proba(features)[:, 1]
    print(y_predict)

We got a good ROC AUC of 0.88 and rolled the model out to production.

Now an estimate of each order's cancellation probability is written to the database every morning.
We chose the threshold for this probability estimate by expert judgment: 0.7.
We now actively use the model in our business processes. For example, we phone the customers behind such bad orders in advance, and these orders follow a separate flow in the app.
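Applying the threshold to the morning scores can be sketched as follows. Only the value 0.7 comes from the text; the function name, the dict of scores keyed by order id, and treating a score exactly equal to the threshold as "bad" are assumptions for the example.

```python
CANCEL_THRESHOLD = 0.7  # the expert-chosen value mentioned above


def split_by_risk(cancel_probability, threshold=CANCEL_THRESHOLD):
    """Split order ids into (risky, normal) lists by cancellation probability.

    cancel_probability maps order_id -> predicted probability of cancellation.
    """
    risky = sorted(oid for oid, p in cancel_probability.items() if p >= threshold)
    normal = sorted(oid for oid, p in cancel_probability.items() if p < threshold)
    return risky, normal


# Example: the risky orders would go to the separate flow (advance calls),
# the rest continue through normal distribution.
risky, normal = split_by_risk({101: 0.92, 102: 0.15, 103: 0.70})
```

Keeping the threshold as a single named constant makes it easy to revisit later, for instance if the advance calls turn out to be too frequent or too rare.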

We sometimes switch the model off for various tests, and this affects our business metrics.
For example, on April 7 we got a spike of unfulfilled orders because the model had been switched off shortly before.

image

We decided to go further and distribute orders automatically. But we immediately ran into the fact that we do not know which cleaners will work, or when.

We introduced a schedule for them. But, as it turned out, cleaners basically do not stick to it, and forcing them would go against our principles.

ML once again. We built a similar model, this time predicting the probability that a cleaner will come out to work. Now we have two models running, so every day we have up-to-date information about orders and cleaners and can match them.

Algorithm

We find the orders that need to be distributed and filter out the “bad” ones using the estimated cancellation probability. Then we pick a cleaner for each order. First we try the cleaners the client wants to see (their favorites) and assign one of them to the order. If all of them are busy, we pick among other cleaners who have already worked for this client but simply were not rated. If they are busy too, we just look for the nearest free cleaner.
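The steps above can be sketched roughly as follows. This is a simplified illustration, not the production code: the record fields, the flat `x`/`y` coordinates, the 0.7 threshold reused from the cancellation model, and the rule that each cleaner takes at most one order per run are all assumptions made to keep the example short.

```python
from math import hypot

CANCEL_THRESHOLD = 0.7  # assumed: same threshold as for the cancellation model


def pick_cleaner(order, cleaners, busy_ids):
    """Return the id of a cleaner for the order, or None if nobody is free."""
    free = {c["id"]: c for c in cleaners if c["id"] not in busy_ids}
    # 1. Cleaners the client explicitly wants to see.
    for cleaner_id in order["favourite_cleaners"]:
        if cleaner_id in free:
            return cleaner_id
    # 2. Cleaners who already worked for the client but were not rated.
    for cleaner_id in order["unrated_previous_cleaners"]:
        if cleaner_id in free:
            return cleaner_id
    # 3. Otherwise, the nearest free cleaner (plain Euclidean distance here).
    if not free:
        return None
    nearest = min(
        free.values(),
        key=lambda c: hypot(c["x"] - order["x"], c["y"] - order["y"]),
    )
    return nearest["id"]


def auto_assign(orders, cleaners, cancel_probability):
    """Return {order_id: cleaner_id}, skipping orders that look "bad"."""
    assignments = {}
    busy_ids = set()
    for order in orders:
        # Orders likely to be canceled are left out of auto-assignment.
        if cancel_probability.get(order["id"], 0.0) >= CANCEL_THRESHOLD:
            continue
        cleaner_id = pick_cleaner(order, cleaners, busy_ids)
        if cleaner_id is not None:
            assignments[order["id"]] = cleaner_id
            busy_ids.add(cleaner_id)
    return assignments
```

A real implementation would also have to account for time slots and for the predicted probability that a cleaner works on a given day; here availability is reduced to a simple set of busy ids.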

Results

We expected a significant reduction in manual distribution, and we did manage to cut it in half. We also thought that auto-assignment would reduce our share of unfulfilled orders, but that did not happen, because the main contribution to solving that problem had already come from the model that estimates the probability of cancellation.

Conclusions

Do not be afraid to use ML in your tasks: building a simple model is easy. Writing the models took us only a couple of days of one person's work, and now we can use them in our business processes.

Ask questions in the comments, we will be happy to answer them.

Source: https://habr.com/ru/post/354616/

