
Decomposition, a problem without a complete data set, board games and marketing

In real life, we never have complete input data for problems that look purely mathematical on paper. Here is an example from one of our regional stores. In June, the radio station calls and says it is ready to repeat your ad placement at a 40% discount. That is 192 spots in two weeks. Last time you ordered this ad as a trial, because the expected profit exceeded the cost of the advertising.

The problem is that during the placement period two big things happened:

Now we need to separate one from the other and understand what worked and how. You cannot evaluate the advertising without accounting for the holiday decline, or the decline without accounting for the advertising. Here is the sales chart for the period before, during and after the holidays:

The city under study; the vertical axis shows unit sales by week

It shows that sales fall after the advertising, during the holidays. A drop during the holidays is normal for all cities. But, roughly speaking, we do not know what the chart would have looked like without the advertising. The same? A little lower? Much lower?

Without the advertising, events would have followed one of these scenarios. Which one, I do not know.

So the task is to understand how the advertising affected sales, given that at least two global factors were changing them at the same time. This problem comes up almost every time you need to evaluate the effectiveness of advertising or a promotion.

Let's start from the other side. We know that in another city the chart looked like this:


Control City 1

But this is not enough for us; we need a third point: another fully controlled source, in a third city:


Control City 2

We overlay them:


The city under study is green.

The result still looks like a tangle of colored lines. No pattern is visible, because absolute sales volumes differ between the cities.

What we need is normalization, not a combined sales chart. One of the cities has almost three times the turnover of the other two (its population is larger), so when the series are simply added together we just see a noisy signal from that one city, like this:



Let's switch to relative sales volume per cycle, that is, we normalize the charts. We get this.


In the third week of the period, there is a peak in the city under study and a decline in the controls. But later on, one of the controls shows an unexplained peak.

We return to the data for the three cities. As a reminder, we need to isolate two trends: the seasonal decline and the rise in sales in the green city after the radio advertising. First we need to clean the charts of local distortions.
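The normalization step can be sketched roughly as follows. This is a minimal illustration, not the exact calculation behind the charts: it assumes that "relative volume" means each week's sales divided by that city's average weekly sales over the period, and the city names and numbers are made up.

```python
from statistics import mean

def normalize(weekly_sales):
    """Scale a weekly series so that its mean over the period equals 1.0."""
    baseline = mean(weekly_sales)
    if baseline == 0:
        raise ValueError("cannot normalize a series with zero mean")
    return [value / baseline for value in weekly_sales]

# Hypothetical weekly unit sales; these are NOT the real figures.
sales_by_city = {
    "green": [100, 110, 140, 90, 80, 80],
    "red": [300, 310, 250, 240, 330, 370],
    "blue": [50, 52, 40, 38, 60, 60],
}

# After this step every city fluctuates around 1.0, so the big city
# no longer drowns out the two smaller ones when the lines are overlaid.
normalized = {city: normalize(series) for city, series in sales_by_city.items()}
```

Dividing by the mean is only one possible choice; dividing by the first week or by a pre-campaign average would work too, as long as all cities are treated the same way.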

Unit check


Note that the data in our export arrived grouped by week. This makes sense: one week corresponds to one daily rise-and-fall cycle, and it also absorbs delivery orders that were, for example, received in one cycle but shipped only in the next. The week has a pronounced "seasonality" of its own: for example, Monday has far fewer sales than Wednesday.

Compare our cycle with the advertising cycle. Both the radio advertising and the holiday decline last much longer than a single cycle. This means that all the sharp jumps within the cycles themselves are local distortions, which in our case is noise. We have every right to use the week as the unit.

Model checking


We build a reference profile of a typical week, removing outliers. If our assumption is correct, all the weeks in all the cities will look about the same. But it unexpectedly turns out that during the weeks when the green city's advertising ran, local spikes were recorded in the blue city (where this ad did not run). The deviation from a normal sales day is more than 40%.
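Here is roughly how such a check could look in code. It is a sketch under my own assumptions: the reference profile is taken as the per-weekday median (so that a single spike does not distort it), the 40% figure is used as the threshold, and the function names and data are hypothetical.

```python
from statistics import median

def reference_week(daily_sales):
    """Median sales for each weekday across all weeks.

    daily_sales is a list of weeks, each a list of 7 daily totals (Mon..Sun).
    """
    return [median(week[day] for week in daily_sales) for day in range(7)]

def find_outliers(daily_sales, threshold=0.4):
    """Return (week_index, day_index, deviation) for every day that deviates
    from the reference profile by more than `threshold` (0.4 means 40%)."""
    profile = reference_week(daily_sales)
    outliers = []
    for w, week in enumerate(daily_sales):
        for d, value in enumerate(week):
            if profile[d] == 0:
                continue  # relative deviation is undefined for a zero baseline
            deviation = (value - profile[d]) / profile[d]
            if abs(deviation) > threshold:
                outliers.append((w, d, round(deviation, 2)))
    return outliers

# Hypothetical daily unit sales for four weeks in one city.
weeks = [
    [10, 14, 20, 18, 22, 30, 25],
    [11, 15, 21, 17, 23, 31, 24],
    [10, 13, 19, 18, 35, 29, 26],  # unusual Friday in the third week
    [9, 14, 20, 19, 22, 30, 25],
]

suspicious_days = find_outliers(weeks)
```

On this made-up data only the Friday of the third week is flagged. Each flagged day is a question to investigate, not something to delete automatically.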

Most likely, something happened that the model does not explain. First, we test the hypothesis that the data was transferred from reality incorrectly. I chose a simple way: I called the head of the call center and asked when they last had spikes big enough to swamp the shift. Since the call center also builds reference week profiles and sizes its shifts from them, this gives me a check of my hypothesis about outliers within the weeks. Roughly speaking, if the forecast was wrong, everyone in the call center will remember the overload. In our case, yes, the call center remembers a peak in this period. So it is not, for example, a glitch in the synchronization server that for some reason loaded several days of data as a single report.

But my analysis and the call center's shift planning start from similar premises, which means the calculations may share a common error. We need to dig further and check the causes of these spikes. OK, let's switch to the site analytics: I look at the traffic-source chart for these dates. Yes, the first peak is a small effect from a post by a prominent local blogger; it can be removed from the chart as part of solving the season/radio problem. The second peak is traffic from search queries. Fortunately, judging by the shape of the chart, I know what it might be. I check the placement reports: yes, one of our games was featured in a children's program on a local TV channel. We remove that too.

Here, with my mad skillz, is roughly what it looks like:


Approximate chart with the described effects subtracted

In total, we have two control cities with similar charts after the other known factors have been separated out. Roughly speaking, we are now comparing normal sales in the red city, normal sales in the blue city (excluding the blogger and the TV program) and normal sales in the green city, which include the influence of the radio. The charts show that both control cities dip during the holidays. The green city rises during the holidays, but does not recover afterwards when they do.

OK, let's dig further. After the May holidays, people return to the city and start taking part in various summer events. The charts of the control cities show a rise: yes, in both cities we brought our board games to some major citywide events. The chart of the city under study shows nothing of the kind. We need a hypothesis that explains this. Options:
  1. A delayed seasonal shift: what if people in this city leave for the holidays later? Then our calculations do not apply as they are, and we need to compare against the control cities' charts shifted by the same amount.
  2. A local problem of some kind.
  3. A rebound after the advertising: what if we pulled in people who were still making up their minds and slightly exhausted the market?

The first and third reasons seem extremely unlikely, so let's start with the second. I look at the store's sales chart: the distribution by hour is normal. The sales staff did not notice anything strange either, except that they complained about the rain keeping people away. Aha. So we need to check the weather during the weekend events: if few people came, that explains the decline. We check: yes, indeed, a downpour. So the decline in the city under study is explained by the weather (it happens). Had it not been for that, there would have been the same rise as in the control cities.

Conclusion


We cleaned the three charts of noise, decomposed them by the different events, and got a comparison of the city with radio advertising against the cities without it. Most importantly, I have numerical data on the various factors that raise or lower sales (accumulated from three years of history), so I can say not only "it works" or "it doesn't work" but also estimate the profit from this advertising. For example, with an accuracy of 10-30%.
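One simple way to turn such a comparison into a number is sketched below. This is an illustration, not the calculation actually used here (that one relies on the accumulated history): it assumes the control cities' relative change against their own pre-campaign baseline is what the advertised city would have done without the ad. The function name, week indices and sales figures are all hypothetical.

```python
from statistics import mean

def estimate_uplift(treated, controls, baseline_weeks, campaign_weeks):
    """Estimate extra units sold in the advertised city during the campaign.

    treated: weekly unit sales in the city with advertising
    controls: list of weekly series for cities without advertising
    baseline_weeks, campaign_weeks: lists of week indices
    """
    treated_base = mean(treated[w] for w in baseline_weeks)
    uplift = 0.0
    for w in campaign_weeks:
        # Average relative change of the controls vs. their own baselines.
        control_ratio = mean(
            series[w] / mean(series[b] for b in baseline_weeks)
            for series in controls
        )
        expected = treated_base * control_ratio  # "no advertising" scenario
        uplift += treated[w] - expected
    return uplift

# Hypothetical cleaned weekly sales (known distortions already removed).
green = [100, 100, 120, 110]
red = [300, 300, 240, 270]
blue = [50, 50, 40, 45]

uplift = estimate_uplift(green, [red, blue], baseline_weeks=[0, 1], campaign_weeks=[2, 3])
```

The estimate is only as good as the cleaning that precedes it: an unremoved blogger post or a rainy weekend in any of the cities ends up inside the uplift figure.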

Why don't we take the long-term effects of advertising into account? Because in calculations like this, in cases like this, you should rely only on the direct effect on sales. Of course, the people from the radio will tell you five times over what a lasting effect it has (and for individual customers this is true), but taken as a whole there will be no effect after the placement ends.

The result? We can see that the advertising brought in more net profit than we spent on it, with an ROI (return on investment) of around 130-160%. So now we can make an informed decision: the advertising can be continued.
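For reference, the arithmetic behind an ROI figure might look like this. The exact formula behind the 130-160% is not spelled out above, so the definition here (net gain divided by cost) is an assumption, and all the numbers are invented for illustration.

```python
def roi_percent(extra_units, profit_per_unit, ad_cost):
    """ROI as net gain over cost: (incremental profit - cost) / cost * 100."""
    if ad_cost <= 0:
        raise ValueError("ad_cost must be positive")
    incremental_profit = extra_units * profit_per_unit
    return (incremental_profit - ad_cost) / ad_cost * 100

# Hypothetical inputs: extra units attributed to the ad, profit per unit
# before advertising costs, and the price of the placement.
roi = roi_percent(extra_units=60, profit_per_unit=500, ad_cost=12000)
```

With an uplift estimate that is accurate only to within 10-30%, it makes sense to run the same calculation for the low and high ends of the uplift and look at the resulting ROI range rather than a single number.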

Source: https://habr.com/ru/post/202456/

