Active analytics for your project

Let us take an e-commerce project (an online store or another B2C internet service) as our starting condition. We will equip this store with a great team: clear-headed management, marketers who make quick and honest decisions, and agile development (ready to respond quickly to changing requirements). We will give it some level of measurable success (say, from 1000 orders per day). Suppose the project is still a startup (or recently was one). Someday it will take over the world. But so far it has not managed to introduce an ERP / CRM class system for working with large volumes of orders and customers.

What usually happens in the active growth stage of such a business? Marketing is looking, by every means available, for ways to:

expand the audience and the channels for attracting customers;
improve the quality of services;
implement loyalty programs to retain good customers;
develop partner networks.

It also reads about and invents other business models of worldwide scale.

And in development we get a queue of ingenious and not-so-ingenious (but invariably urgent) tasks, affecting every process from the client's landing on the site, through the service itself, to his repeat service. The software part of the business quickly boils down into spaghetti made of the logic for registering new clients, placing orders, calculating prices and tariffs, paying for orders and services, generating personal mailings, making individual and group discount offers, running promotions and so on.

Marketing, which at the start of the business was content with reports such as “average order amount per month”, now wants “the average order amount excluding special offers, comparing two weeks in January with the same two weeks in September, for the group of customers who have placed more than one order in the last six months and registered more than 2 years ago, and please also split by customer gender, and better still only those who never called the call center, and also break it all down separately for each customer acquisition channel and by phase of the moon”...

Out of haste and hopelessness, the database becomes overgrown with ~~garbage~~ auxiliary tables and fields that nobody needs (except marketers). Happy marketing develops the successful business models and discards the unsuccessful ones. The fields and whole tables remain forever. As the customer base and sales grow, the hastily thrown-together reports soon stop fitting within reasonable limits of system load, have to be moved into background tasks (if not onto a separate server), and call for database replication. Or, even more fun: the marketer has “learned SQL” and sits directly on your server, periodically running to the admin with a request to do something about a query that, for some unknown reason, has been running for a long time.

And the structure of the database looks like this (not the worst option):

Why is that bad? Because the users-orders-payments bundle should solve one problem: the client receiving the service and paying for it. As soon as we give these entities any extraneous tasks, we increase dangerous coupling across the entire workflow.

A real-life example

To begin with, let us imagine a simplified direct sales cycle in e-commerce (from the point of view of its most important part, the client):

For this process to run well, quickly and with a minimum of logical errors (everything must go “tip-top” for the client when ordering, otherwise he will slip off the marketing hook), there should be nothing superfluous in it (and the probability of introducing bugs should be minimal).

But marketing comes and says:

After registration, a letter must be sent to the client.
And if the client came from a promotion, then replace the formal “hello” in it with an informal “hi”.
Or maybe he came via an order for a trial period of the service: then use a completely different template.
If the client has not visited for a long time, send him a reminder letter.
If the client came but quickly left, send a letter asking why.
If he came and hung around for N minutes without putting anything in the basket, offer a discount.
And pass the track-id from partners through, so that a percentage of the order can later be shared with them.

And this is just one link in the whole chain. It will only get “worse”. Do you agree that the client himself has no part in these tasks, and that these behind-the-scenes schemes do nothing to satisfy his need? And when you program all these conditions right in the class (component / controller) that handles customer visits and registrations, the code of that class grows fatter and fatter, and it accumulates more and more dependencies. And this class no longer just registers customers: it sends letters and makes difficult decisions (and will even learn to play the guitar, guaranteed). Well done if you have extracted the letters into a separate component, but that component constantly risks starting to query the customer base and order history to decide which letter to send, and will eventually learn to play the piano (I am sure of it: I once taught one myself).

And now everything works roughly like this (and this is just the beginning):

Before we continue, I hope the problem is clear and even familiar.

But suppose the clear-headed management has not only heard about this problem but has also given the “green light” to solve it. An attraction of unheard-of generosity: a whole dedicated developer, a specialist in mailings, in generating reports with charts, and in Excel exports. Or perhaps you are the happy recipient of an offer to develop a project from scratch. What should go into the technical specification so that you can “forget” about the influence of marketing activity on the main (clear and stable) business process?

There is a solution

I propose one possible solution that should satisfy developers who uphold the principles of SOLID, design patterns, TDD, DDD and other beautiful terms. It was born out of the conviction that you can always just take the marketing functionality... and put it outside the door of your important client process (and never let it back in).

And then, of course, make it happen. We take marketing, ask it to step away from our simple system into a corner, and start pushing information there:

Now every task from the “But marketing comes and says” list can be solved separately. Your main system stays untouched by marketing, and marketing can solve its problems by itself, sending letters and deciding when to do so. It has far more source data at its disposal than the “clean” project database holds, because on receiving event data the analytical module can break it down in detail and even perform all the related actions (sending a letter, in our case):

save year, month and day in separate fields to simplify future reports and slices;
accumulate history counters (for example, which view of this product is this for this customer?);
trace in detail the behavioral chain of a particular client;
determine whether he did it all in one session or whether the chain of actions should be considered broken;
send all the necessary information to external analytics;
compose the right letter based on the client's behavior history and the source he came from;
build lists of popular products / services and of what is “often bought with this product”;
and so on and so forth.
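
The first two items of this list can be sketched roughly as follows. This is only an illustration, not the author's implementation: the event fields (name, customer_id, product_id, occurred_at), the AnalyticsStore class and the handle_event function are all hypothetical names, and an in-memory structure stands in for the analytical database.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AnalyticsStore:
    """In-memory stand-in for the denormalized analytical database (hypothetical)."""
    rows: list = field(default_factory=list)                  # one row per event
    product_views: Counter = field(default_factory=Counter)   # (customer, product) -> views


def handle_event(store: AnalyticsStore, event: dict) -> dict:
    """Break one raw event down into a report-friendly row and update counters."""
    ts = datetime.fromisoformat(event["occurred_at"])
    row = {
        "name": event["name"],
        "customer_id": event["customer_id"],
        # Separate year / month / day fields make later report slices trivial.
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
    }
    if event["name"] == "product_viewed":
        key = (event["customer_id"], event["product_id"])
        store.product_views[key] += 1
        # History counter: which view of this product is this for this customer?
        row["view_number"] = store.product_views[key]
    store.rows.append(row)
    return row
```

None of these extra fields or counters ever has to appear in the main project's schema; they live only on the analytical side.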

Now imagine all these nuances being calculated on the fly (or later, in reporting scripts) and stored in the main database.

In our scheme, the conditional “send a letter” task becomes isolated and does not threaten the main process. What you get is a full-fledged CRM module (analysis, decision making, and execution of those decisions), which may even be outrageously crooked, with a database denormalized to a terrifying degree (to suit its tasks), heaps of assorted auxiliary fields in its tables, slowness, poorly written code... But none of that matters, because it is far away in the “marketing corner” (ideally, physically on another server) and does not touch the client process.

And there is nothing special about this solution. The result is the familiar “observer” pattern: events are thrown and listened for. But in our case the pattern cannot be implemented “in place”, with events heard and handled by the same process that accepted the client's request: if something breaks in the “listener”, the client will get 500 errors back just as he would without any patterns.
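
A minimal sketch of the difference, under loud assumptions: a standard-library queue and a background thread stand in here for a real message broker and a separate analytical server, and every name (emit, fragile_listener, register_customer) is hypothetical. The point is only that the client process enqueues an event and returns, while a failure in the listener stays on the listener's side.

```python
import queue
import threading

events: "queue.Queue" = queue.Queue()   # stand-in for a real message broker
processed: list = []                     # events the listener handled
failed: list = []                        # events the listener choked on


def emit(event: dict) -> None:
    """Called by the main project: enqueue the event and return immediately."""
    events.put(event)


def fragile_listener(event: dict) -> None:
    """Hypothetical marketing logic, which may well contain bugs."""
    if event["name"] == "broken":
        raise ValueError("bug in the marketing corner")
    processed.append(event["name"])


def worker() -> None:
    """Runs apart from request handling, so listener errors stay here."""
    while True:
        event = events.get()
        try:
            if event is None:            # sentinel: stop the worker
                return
            fragile_listener(event)
        except Exception:
            failed.append(event["name"])
        finally:
            events.task_done()


def register_customer(email: str) -> str:
    """The client process: does its own job, emits an event, knows nothing more."""
    emit({"name": "customer_registered", "email": email})
    return "200 OK"
```

In a real deployment the worker would run as a separate process on the analytical server and the queue would be a durable broker; the sketch keeps both in one process purely so that it stays self-contained.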

Therefore, the approach works well only when:

you make it asynchronous and fast;
the code and the database live on a separate server resource (as a separate project);
you also have a database with backup copies (in addition to the main database of the analytical module itself);
and you are prepared to support this whole structure as a full project in its own right...

Ha, you think. Is the game worth the candle? Let us try to evaluate the pros and cons of implementing a separate “analytical project”:

The result is interesting. All the disadvantages of a “separate project” come down to the need to complicate the server and software infrastructure and to implement the core of the analytical project, which will provide interfaces for reading events and storing them in its database (writing this core takes from a week to a month, in especially neglected cases). More on the synchronization drawback just below.

Of course, the decision to “do it this way” depends on the project and its business models. And it is no accident that more serious projects purchase very expensive (to implement and to support) systems for managing customers, orders, warehouses and so on, and under no circumstances entrust this work to the website itself. In the same way, we separate off “primary” marketing with all its frantic (sometimes disposable) ideas and complex reports.

And here is that separate project:

Instead of implementing complex analytics and side processes in the main project, we hand this responsibility over to the analytical project and implement only a reliable transport for events (shown in red). This needs to be done once and maintained occasionally. Maintenance will consist of emitting new events from the main project and reading them in the analytical project. But you will agree: programming these actions inside the main project would be no less work, and the risk of breaking something there would be much greater.

A couple of dessert spoons of high-quality tar

This approach has a significant disadvantage. The main project does not need to know anything about the existence of the analytical project (otherwise, what would be the point?). With the analytical project the situation is the opposite: it will most likely need to know about all the processes of the main one (since it was invented to take on all the tasks the main one did not want to solve).

Therefore, during development, firstly, it will be difficult (and often impossible) to solve tasks on both projects simultaneously. New events will appear in the main project, and the analytical one will start processing them much later. Secondly, while parsing data from the queue, the analytical server may hit a software error or turn out to denormalize too little, and by then the messages have already been read... “the train has left”.

To solve this problem, an “event database” is introduced, which stores all events from the beginning of time, including (importantly) a snapshot of the context in which each event occurred. This is needed both for peace of mind and so that, as a last resort, you can always perform a full reload of everything built so far:

Errors in the analytical project are fixed and the necessary changes are made (before deployment).
Next, the deployment script runs:
- Stop reading data from the main queue (let it accumulate).
- Completely clear the analytical database.
- Deploy all the necessary fixes.
- Launch a special script that sends all previously occurred events into a separate queue (one that keeps being read).
- Turn reading from the main queue back on.
As a result, the analytical “worker” takes data from the queue as if nothing had happened.
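
The reload procedure above might be sketched like this. It is a simplified, single-process illustration under stated assumptions: a plain list plays the role of the “event database”, a deque plays the role of the main queue, and the names rebuild and count_orders are hypothetical. Appending live events to the log inside rebuild is also an assumption about how the log is kept complete.

```python
from collections import deque


def rebuild(event_log: list, main_queue: deque, apply) -> dict:
    """Rebuild the analytical database from the full event log.

    event_log  - every event since the beginning of time (the "event database")
    main_queue - live events that accumulated while reading was stopped
    apply      - the corrected function that folds one event into the database
    """
    analytics_db: dict = {}            # start from a completely clean database
    for event in event_log:            # replay all previously occurred events
        apply(analytics_db, event)
    while main_queue:                  # then resume reading the main queue
        event = main_queue.popleft()
        event_log.append(event)        # keep the log complete for the next rebuild
        apply(analytics_db, event)
    return analytics_db


def count_orders(db: dict, event: dict) -> None:
    """Example of a corrected handler: number of orders per customer."""
    if event["name"] == "order_placed":
        customer = event["customer_id"]
        db[customer] = db.get(customer, 0) + 1
```

Because the handler only ever sees events, the same code serves both the replay and the normal flow, which is what lets the worker carry on “as if nothing had happened”.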

There is another obvious disadvantage: the complexity of implementation. If you decide to go this way and things in your project have been “bad for a long time”, the options are few and you will have to think hard about them:

Give up analytics for the past (events are accumulated only from the launch of the analytical project).
Implement the proposed scheme in parallel with the existing analytical work (the old analytics continues to live in the main project, new things go into the new one).
Load the data for all past time into the “event database” with one-off scripts (imitating, in hindsight, the life cycle of sending events).

But there are six bonuses too.

By reading the flow of events from the main system, you can implement:

convenient process logs (for technical / customer support operators: notice / warning / error);
asynchronous distribution of events to project staff (notifications, reminders; the need to call a client, check an order, etc.);
asynchronous distribution of commercial notifications (SMS, emails, push) to customers;
generation of streams of “secondary” events based on analytical conditions (the 99th client this quarter with a check of more than 1000 gets a “letter of happiness”);
simple reloading of data into existing OLAP analysis systems (QlikView, MS Business Solutions) at any time and any number of times, with no pain over building interfaces for marketing and business analysis;
simple binding and aggregation of external sources of events and data into a single database (Yandex.Metrica, etc.);
simple granting of access to external partners (such as that same call center: they will not “sit” in the core of your project);
and so on.
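
As an illustration of the “secondary events” item, here is a rough sketch. The order fields (customer_id, total, year, month), the event name lucky_customer and the function itself are hypothetical, and for simplicity it counts qualifying orders rather than distinct clients.

```python
from typing import Iterable, Iterator


def secondary_events(orders: Iterable[dict], threshold: float = 1000,
                     nth: int = 99) -> Iterator[dict]:
    """Yield a 'lucky_customer' event for the nth order above the threshold in each quarter."""
    counts: dict = {}                      # (year, quarter) -> qualifying orders so far
    for order in orders:
        if order["total"] <= threshold:
            continue                       # the check is not big enough
        quarter = (order["year"], (order["month"] - 1) // 3 + 1)
        counts[quarter] = counts.get(quarter, 0) + 1
        if counts[quarter] == nth:
            # The secondary event goes back into the flow; sending the
            # "letter of happiness" is some other listener's job.
            yield {"name": "lucky_customer",
                   "customer_id": order["customer_id"],
                   "quarter": quarter}
```

The main project never learns that such a rule exists: it only keeps emitting ordinary order events.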

Source: https://habr.com/ru/post/236943/
