Dynamic Pricing Based on LSTM - ANN in the Home Retail

It is no secret that machine learning methods began to penetrate everywhere in various business areas, optimizing, improving and even creating new business processes. One of the important areas is the question of setting the price for a product and here, with enough data, MO helps to do what was previously difficult to achieve - to recover the multifactor demand curve from the data. Thanks to the restored demand curve, it became possible to build dynamic pricing systems that allow price optimization depending on the pricing goal - to increase revenue or profit. This article is a compilation of my dissertation work, in which the LSTM-ANN dynamic pricing model was developed and tested in practice for 4 weeks for one of the home goods retailer's products.

I want to immediately note that in this article I will not disclose the name of the company in which the research was conducted (nevertheless, this is one of the companies from the list presented in the premises), instead I will simply call it Retailer.

Prerequisites

There is a price leader in the home goods retail market - Leroy Merlin. Sales volumes of this network allow them to maintain a minimum price strategy for the entire product range, which leads to price pressure on other market players.

Revenues and profits of major retailers in St. Petersburg as of December 31, 2017
')

In this regard, in Retailer, a different pricing approach is used:

The price is set at the level of the lowest of competitors;
Restriction on the price of the bottom: the purchase price + minimum mark-up, reflecting the approximate cost per unit.

This approach is a combination of cost method of pricing and price orientation of competitors. However, it is not perfect - it does not directly take into account consumer demand.

Due to the fact that the dynamic pricing model takes into account many factors (demand, seasonality, promotions, prices of competitors), and also allows you to impose restrictions on the proposed price (for example, from the bottom - covering costs), the system potentially gets rid of all one-sidedness and disadvantages of other pricing methods.

Data

For the study, the company provided data from January 2015 to July 2017 (920 days / 131 week). These data included:

Daily sales, including weekends, for 470 products (16 product groups);
Days of promotions in the store;
Days in which discounts were provided for the goods;
Prices for each of the 470 products;
Daily data on the number of checks throughout the network in St. Petersburg;
Prices of major competitors for most of the 470 products (data were taken once a week).

In addition to this data, I also added calendar dummy variables:

Season of the year (autumn / winter / summer / spring);
Month;
Quarter;
Day of the week;
Holidays;

Also, weather variables:

Rainfall - dummy;
Temperature;
The temperature deviation from the average in the season.

Directly analyzing the daily sales of goods, I found that:
Only about 30% of the goods were sold all the time, all other goods were either introduced for sale later than 2015, or were withdrawn from sale earlier than 2017, which led to a significant restriction on the choice of goods for research and price experiments. This also leads us to the fact that due to the constant change of goods in the line of the store, it becomes difficult to create an integrated pricing pricing system, however, there are some ways to get around this problem, which will be discussed later.

Pricing system

To build a price recommendation system for a product for the next period of time based on a model predicting demand, I came up with the following scheme:

Since, having trained the model on the data, we get the model that restored the multifactor demand curve, giving the input different prices of the goods, we will get the expected sales, depending on this price. Thus, we can optimize the price to achieve the desired result - to maximize the expected revenue or expected profit. It remains only to train a model that could well predict sales.

What did not work
After selecting one of the products for research, I used XGBoost before going directly to the LSTM model.

I did this in the hope that XGBoost will help me to throw away a lot of unnecessary factors (this happens automatically), and the ones that I have left to use for the LSTM model. I used this approach deliberately, because in order to avoid unnecessary questions in my dissertation defense, I wanted to get a strong and, at the same time, a simple justification of the choice of factors for the model, on the one hand, and on the other, simplification of development. In addition, I received a ready-made, draft model on which one could quickly try out different ideas in a study. And after that, going to a final understanding of what will work and what does not, make the final LSTM model.

To understand the problem of forecasting, here is the daily sales schedule for the first selected product:

The entire time series of sales on the chart was divided into average sales for the period, so as not to disclose the real values, but keep the look.

In general, a lot of noise, while there are pronounced bursts - this is the conduct of promotions at the network level.

Since for me it was the first experience in building machine learning models, I had to spend quite a lot of time on various articles and documentation in order for me to eventually get something done.

The initial list of factors that presumably affect sales:

Data on the daily sales of other products from this group, total sales in the group in pieces and the number of checks for all the stores of the chain in St. Petersburg with lags 1, 2, 3, 7, 14, 21, 28;
Data on prices of other products from the group;
The ratio of the price of the investigated product with the prices of other products from the group;
The lowest price among all competitors (the data were taken once a week, and I made an assumption that such prices will be valid for the next week);
The ratio of the price of the investigated product with the lowest price from competitors;
Sales lags by group (in units);
Simple average and RSI based on lags of sales of group products, total sales in the group and the number of checks.

A total of 380 factors turned out. (2.42 observations per factor). Thus, the problem of clipping is not significant factors was really high, but XGBoost helped to cope with this, significantly cutting the number of factors to 23 (40 observations per factor).

The best result I could achieve using greed search is as follows:

R ^ 2-adj = 0.4 on the test sample

The data were divided into training and test samples without mixing (since this is a time series). As a metric, I used the R ^ 2 indicator adjusted deliberately, since the presentation of the final results of the work was to be carried out before the commission, including consisting of business representatives, therefore, it was used as the most famous and easy to understand.

The final results diminished my belief in success, because the result of R ^ 2-adj 0.4 meant only that the prediction system would not be able to predict demand the next day well enough, and the price recommendation would differ little from the “finger to the sky” system.

Additionally, I decided to check how much XGBoost will be effective for predicting daily sales for a group of products (in jokes) and predicting the number of checks in the whole network.

Sales by product group:

R ^ 2-adj = 0.71

Checks:

R ^ 2-adj = 0.86

I think the reason that sales data for a specific product could not be predicted is clear from the graphs presented - noise. Individual sales of goods turned out to be too susceptible to randomness, so the regression construction method was not effective. At the same time, by aggregating the data, we removed the influence of randomness and obtained good predictive capabilities.

In order to finally make sure that it is a meaningless exercise to predict demand for one day ahead, I used the SARIMAX model (statsmodels package for python) for day sales:

In fact, the results are no different from those obtained using XGBoost, which suggests that the use of a complex model in this case is not justified.

At the same time, I also want to note that neither for XGBoost nor for SARIMAX were the weather factors significant.

Construction of the final model

The solution to the quality prediction problem was to aggregate the data into a weekly level. This reduced the effect of random factors, however, significantly reduced the amount of observed data: if the daily data was 920, then weekly only 131. The situation worsened by the fact that the number of factors remained almost unchanged (dummies were excluded on the days of the week), but the number of observations of the target variable shrunk dramatically.

In addition, my task was complicated by the fact that at that time, the company decided to change the product for which the experiment will be carried out using the model, so I had to develop a model from scratch.

The change of goods occurred on goods with pronounced seasonality:

Due to the transition to weekly sales, there was a logical question: is it adequate to use the LSTM model in general on such a small amount of data? I decided to find out in practice and, first of all, reduce the number of factors (even if it carried the potential damage in reducing relevant information). I threw out all the factors that are calculated on the basis of sales lags (average, RSI), weather factors (on the daily data, the weather did not matter, and the transfer to the weekly level, especially, lost some sense). After that, I traditionally used XGBoost to cut off other non-significant factors. Later, I additionally cut a few more factors, based on the LSTM model, just by eliminating the factors one by one, teaching the model again and comparing the results.

The final list of factors is as follows:

The ratio of price per kilogram of the investigated product and primer CERESIT ST 17 10 l .;
The ratio of the price of the investigated product and the product and primer CERESIT ST 17 10 l;
The ratio of the price of the investigated product and the primer EURO PRIMER 3 liters;
The ratio of the price of the investigated product and the minimum price of competitors;
Dummy variables for three network-level promotions;
Dummy variable spring, summer, autumn seasons;
Lags 1 - 5 week sales of the investigated product.

Only 15 factors (9 observations per factor).

The final LSTM model was written using Keras, it included 2 hidden layers (25 and 20 neurons, respectively), and the activator was sigmoid.

The code for the final LSTM model using Keras:

model = Sequential() model.add(LSTM(25, return_sequences=True, input_shape=(1, trainX.shape[2]))) model.add(LSTM(20)) model.add(Dense(1, activation='sigmoid')) model.compile(loss='mean_squared_error', optimizer='adam') model.fit(trainX, trainY, epochs=40, batch_size=1, verbose=2) model.save('LSTM_W.h5')

Result:

The quality of the prediction on the test sample looked quite convincingly on the metric, however, in my opinion, it did not reach the ideal, because, despite a fairly accurate definition of the average sales level, bursts in individual weeks could be quite deviate from “ average level of sales, which gave a strong deviation of the sales forecast from reality on certain days (up to 50%). However, I have already used this model directly to experiment in practice.

It is also interesting to see what the restored price demand curve looks like. To do this, I drove the model across the price range and, based on the predicted sales, built the demand curve:

Experiment

Each week, the network provided data on sales for the previous week in St. Petersburg, as well as prices from competitors. Based on these data, I optimized the price to maximize the expected profit, saying the price that the network should set for the next week, which she did. This went on for 4 weeks (the deadline was agreed with the retailer).

Profit maximization was carried out with restrictions: the minimum price was the purchase price + fix. surcharge, the maximum price was limited by the price of the primer of the same manufacturer, only in a 10l pack.

The results of the experiment are presented in the tables below (all figures are divided by a certain value in order not to reveal the absolute values):

Sales Prediction:

Profit Prediction:

In order to assess the impact of the new pricing system on sales, I compared sales for the same period, only in previous years.

Summary results for 4 weeks:

As a result, we get a double picture: absolutely unrealistic predictions in terms of sales, but, at the same time, purely positive results in terms of economic indicators (both in profit and revenue).

The explanation, in my opinion, is that in this case, the model, incorrectly predicting sales, nevertheless caught the right thought - the price elasticity for this product was below 1, which means that the price could be increased, without fear of falling sales, which we saw (sales in units remained at about the same level as last year and the year before).

But do not forget that 4 weeks is a short-term period and the experiment was conducted on only one product. In the long run, the overpricing of goods in the store usually leads to a drop in sales in the whole store. To confirm my guess on this score, I decided, using XGBoost, to check whether consumers have a “memory” of prices for previous periods (if in the past it was more expensive “in general” than its competitors, the consumer goes to competitors). Those. will the average price level of the group for the last 1, 3 and 6 months provide for sales by groups of goods?

Indeed, the guess was confirmed: one way or another, the average price level for previous periods affects sales in the current period. This means that it is not enough to optimize the price in the current period for a single product - it is also necessary to take into account the general price level in the long term. That, in general, leads to a situation where tactics (profit maximization now) contradicts strategy (competitive survival). This, however, is already better off to marketers.

Given the results and experience in my opinion, the most optimal, pricing system based on sales forecast could look like this:

Rising from the commodity nomenclature to the half of the step above is to carry out cluster analysis and group conditional screwdrivers by similarity and forecast sales and set the price not for a single screwdriver, but for this subgroup - this way we avoid the problem of permanently removing and adding commodity nomenclatures.
Carrying out price optimization in the complex - not only for individual subgroups of goods, but also taking into account long-term effects. For this, you can use a model that predicts sales in the whole network, good, it turned out to be impressively accurate even in day sales.

Summing up the work done, I would like to say that for me, as an inexperienced person, in development in general and in MO methods in particular, it was difficult, however, everything turned out to be feasible. It was also interesting to check for yourself how these methods are applicable in reality. Having read many articles before, my eyes were burning from the fact that I would try to do everything on my own and I was in anticipation that I would get excellent results. The practice turned out to be harsh - a small amount of goods with a long sales history, noisy daytime data, mistakes in predicting sales volumes, the use of complex models is not always justified. Nevertheless, I got an unforgettable experience and learned what it means to use analytics in practice.

→ Based on the work done, I prepared a project in my repository

In the repository, you will find a dataset generated based on dependencies taken from real data, as well as a python script that allows you to conduct a virtual experiment on this generated data, suggesting you try your luck and outrun the model for the resulting profit by setting the price for the item. All you need is to download and run the script.

I hope my experience will help in determining the boundaries of the use of MO methods and will show that patience and perseverance can achieve results, even if you are not a professional in some field.

Source: https://habr.com/ru/post/421429/

All Articles