📜 ⬆️ ⬇️

Business cases use Data Mining. Part 1

Hi, Habr.
I am very glad that the topic of Data Mining is interesting to the community.

In this topic (and if you like it, in a series of topics) I will tell you what examples of using Data Mining are in Russian and not only business. Why am I writing about this? I work in a company that is closely related to the CC RAS ​​(Computing Center of the Russian Academy of Sciences), which allows us to have an excellent research department and develop new projects, applying domestic achievements in mathematics. In this topic there will be more business than science, but if the latter still interests you, then you are here: mmro.ru or here: www.machinelearning.ru

So, let's go:

')
Today, almost every big business has a huge cloud of data that has been collected and stored over the years. The main task of Data Mining is to find non-trivial dependencies in raw data that will allow solving a specific business problem.

Example1. Retail (retail networks).


A large retail chain, such as Kopeyka, Perkrestok, Pyaterochka, Auchan, has hundreds of stores throughout the Russian Federation, tens of thousands of active goods. Sales data for each product (SKU) in each store at any time (day or hour) is stored in the company's accounting system.
The trading network must order goods to its stores every day. Those. Every day in the matrix, for example [5000 X 10 000] should be the value - how much to carry this product?

If the retailer orders less than the actual demand, it will LOSE due to the shortage (and lose the marginal cost), if the network orders more goods than the actual demand, it will receive the LOSS due to the cost of storing goods in the warehouse, frozen funds , damage to the goods after the expiration date. These two types of losses are called out-of-stock and over-stock, respectively.
image

What to do? Based on the accumulated history of product movements in each store and each product, you can learn from precedents and build a predictive model that will take into account:
1. Weekly seasonality (for example, vodka is sold starting on Thursday, energy is sold more on Fridays)
2. Annual seasonality (for example, more beer is bought in summer)
3. Holidays (for example, Soviet champagne is sold 100 times more than the average of March 8, December 31, February 23)
4. Promotions (for example, a surge in sales of pesicola in September is not an annual seasonality, but the effect of a promotion with Britney :))

In any data-mining task, it is important to correctly clear the data before working and building models. Therefore, in the case of sales, it is important to be trained not on real sales, but on “restored demand”. What is it?
Suppose you sold candies evenly during the month, but in the last week they were not delivered and sales were equal to 0. This does not mean that there was no demand (and there will be no demand in the last week of the next month either :)). Therefore, in this situation, it is important to restore demand in places of no sales.

What to do if you have a new product (for which there is no history yet)? In this case, it is important to look at the history of the product group. For example, the appearance of tea “icetea lipton peach” can be predicted for the “cold teas” product group, while not forgetting to take into account the factor “how the appearance of a new product in history was reflected in sales in the group as a whole”.
The same thing happens with new stores - how many goods to order in the newly opened store? You need to find a “similar” store and predict the first time based on its history, and then smoothly switch to the history of the new store.
These are all tasks of working with data in data minnig.

Daily forecasting systems in retail process gigabytes (and somewhere terabytes) of data, make a forecast and answer the question: How many of each specific product to order at a particular store to reduce financial costs and take into account (predict) demand as much as possible.
If you have 1-2 stores and several thousand positions - a person will predict you better than any machine, but when you have WallMart and hundreds of thousands of products on the shelves - no army of analysts and commodity researchers can cope with the solutions of this problem, which is why in retail chains so closely Pay attention to the automation of business processes.
FACT: Improving the predictive model can reduce the cost of the trading network by 1-2 percent of turnover. And now think about what kind of money it is, given the fact that the turnover of the largest Russian networks is from $ 1 billion.

I think that on this example1 I’ll finish the trade networks. If interested - write questions, comments - I will answer. If you generally like it, next time I will talk about telecom, and how they solve the problem of “increasing subscriber loyalty,” taking into account tens and hundreds of millions of subscribers.

Source: https://habr.com/ru/post/65647/


All Articles