📜 ⬆️ ⬇️

RFM analysis on the knee (Excel)

Good day! In the summer of 2014, working as an ordinary analyst and suffering from procrastination, he participated in the creation of an online clothing store. Successfully “filed” for this project, a management accounting system , found in the eyes of the owner the halo of analytics god in general, and Excel in particular)) Since then, the owner, being a stupid man, although terribly lazy, has attracted me to in the slightest degree close to analytics tasks. The result of one of these tasks and I want to share. Under the cut is my version of the implementation of RFM-analysis. It will be interesting to the owners of a small B2C business who do not have a significant research budget, as well as to anyone interested in the practical use of Excel in business.


Oftop: with the RFM tag on Habré only 2 articles, and both of the corporate blogs. Strange, why so little content on the topic, because on Habré a lot of people from e-commerce related area?

However, I quit pouring water and suggest, for a start, to agree on terms. Further, RFM-analysis implies the analysis of customer value for the company. In fact, a slightly advanced version of ABC analysis , only with a focus not on products, but on customers. At the forefront is the formalization of the size of the benefits of each client for the business. In order to identify this benefit, each client is considered according to the following parameters:
')
R ecency - novelty (time since the last purchase)
F requency - frequency (frequency of purchases for the period)
M onetary - monetization (the cost of purchases for the period)

Given :

1. The history of online store sales in the form of .xlsx upload, like
image

Sic! Do not look for meaning in numbers, everything is semi-randomly changed by 1-2 orders of magnitude

2. TK from the owner, the full version of which sounds as simple as the phrase “RFM-analysis you can do?”

Result :

At first, I spent half the day thinking: “How to do all this with the help of calculated pivot table objects so that it is beautiful.” As a result, I scored on beauty and made it in an hour using an intermediate sheet and the usual formulas like "= IF", etc.

3. Intermediate calculations

To calculate the time since the last purchase, the current date is required (standard function in Excel = TDA ()) and the date of the last customer purchase. Since the unloading was an unordered data-date-customer-amount-of-purchase array, there was a difficulty in identifying the last purchase date for each of the customers. The problem was solved by sorting on the entire volume of dates in the unloading (please do not blame for the "collective farm style", but at that moment I scored on beauty, because I wanted to implement the solution that was in my head as quickly as possible). Green columns are the original information. In the first line I left formulas for understanding, and sorted by column in descending order (the column was created using concatenate)


4. Parts of the “Total” sheet

Now we collect the result of the RFM analysis on one sheet. We start from the list of clients (sorting does not matter) - we copy the list of clients from the first sheet and leave only unique entries using standard functionality (Data - Remove duplicates). In column B, using the CDF, we draw the date of the last sales order. The formula in column C counts the number of customer orders for the entire unloading. In column D, the sum of orders per customer is considered in a similar way. And column E calculates for us the number of days since the last customer purchase.

Sic! An example of a formula for column E is specified in cell K1, and in column E itself only values ​​are stored to show the result.

5. Recency (time since the last purchase)

The essence of the selected formula is as follows: we look at which of the five equal intervals from 0 to the maximum (highlighted in red in the formula) is the value of each cell in column E and put a mark from 1 (the client who bought something from us a year ago) to 5 (the client bought or recently).


6. Frequency (frequency of purchases for the period) and Monetary (price of purchases for the period).

The formulas are identical, so consider the example of Frequency. In this case, we have divided the entire set into 3 equal in terms of the number of members of the set of the gap and look to which of these gaps is the value in column C with scoring 1 (the client who buys from us less often than the rest), 3, 5 (the client who buys from us more often than the rest ).


For those who find it difficult or lazy to understand the definition of the median in Wikipedia : the median is the value dividing the data set into 2 equal parts. Example: Average arithmetic value of 5 clients who made 1, 2, 2, 2, 100 purchases = 21.4 (the average temperature in the hospital that does not tell us anything); median for the same series = 2.

Conclusion : I didn’t write about adding all the indicators together and sorting in descending order of the most right column of the Itoto sheet — I think it’s understandable)) My goal, to create a system on my knee, was fully achieved. I give "as is" . Writing these lines, I understand that my definition of the median and an example are not the easiest either (for those who did not have a university statistics). If someone offers a simpler and more understandable option, I will replace it.

Source: https://habr.com/ru/post/243219/


All Articles