How to be data-driven from the very beginning

Numbers mean a lot to us. We invest in data, listen to it and try to understand it, and we are guided by it when making decisions. Although we still have a long way to go in terms of data infrastructure, the data-driven approach has been with us from the start. This article is the story of the path we took, the lessons we learned and the mistakes we made along the way.

My name is Andrey Sytsko, and I head a product line at the fintech company ID Finance. As I said, we still have a long way to go in terms of methods and tools for working with data. The company has grown severalfold since its founding, and that growth sets a pace the analytical infrastructure struggles to match. It may also be that expectations of the data-driven approach are simply rising even faster. In the end, as we all understand, what matters is not only specific tools and technologies but the approach, the culture and the worldview.

What is a data driven culture?

What do we mean by a data-driven culture in a company? In my opinion, it is when we have agreed internally that data can give a good answer, or at least good advice, for a particular business dilemma. Such an agreement has several consequences:

We are ready to invest in working with data: extraction, storage, analysis, interpretation, visualization and more. We are ready to spend money and time on it.
We are ready to listen to the data. That is, when we need to make a business decision, we stop and tell ourselves: let's look at the numbers.
We can understand the data. It is frighteningly easy to draw the wrong conclusion even with all the necessary numbers at hand. Decision makers need at least a minimum level of analytical thinking to extract meaning from tables, graphs and charts.
We trust the data and are guided by it when making decisions. When a manager looks at a prepared analytical report and says he will act on his experience rather than on the report, he is not necessarily wrong. What if the analysts did not take into account seasonality, the results of the upcoming elections, or something else? Dialogue and mutual trust between managers and analysts are important here.

Naturally, a data-driven culture is easiest to build when the company's founders already carry it. Using data in decision making makes the process slower and more expensive, and without a firm conviction that this is the right way to work, you will not get far. We were lucky here: the right foundation had already been laid.

First infrastructure steps

The first thing you will run into on the way to ideal data-driven decision making is that you do not have enough data. In general, there will never be enough of it, for objective reasons, but you have to start somewhere.

To get started, you build the infrastructure for collecting and storing metrics. In the vast majority of projects, a replica of the production database is used at first for backend data (in our case, for example, information about customers, their loans and the payments on them). With this approach you get to fully enjoy the internal data structure of your software, which the developers designed with no thought of making the data convenient to analyze. On the other hand, the information is first-hand, so to speak. In the beginning there is usually a single database, and both the data structure and the questions you want to ask of the data are relatively simple, so this is a perfectly workable option, and investing in something more complicated does not make sense.

For frontend data (page views, interaction with controls, scrolling, clicks, input), you can use classic tools such as Google Analytics or Yandex.Metrica and, for example, HotJar to record sessions. The basic functionality is enough for marketing tasks. For product reports on funnels and A/B tests, we fairly quickly switched to working through the Google Reporting API. We have already written about this on Habr in two earlier posts.
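As an illustration of what a funnel report boils down to, here is a minimal Python sketch. It assumes the per-step user counts have already been exported from the analytics tool; no API call is shown, and the step names and numbers are made up for the example.

```python
from typing import List, Optional, Tuple


def funnel_report(steps: List[Tuple[str, int]]) -> List[dict]:
    """Compute step-to-step and top-of-funnel conversion for an ordered funnel."""
    if not steps:
        return []
    first_count = steps[0][1]
    prev_count: Optional[int] = None
    report = []
    for name, count in steps:
        # Conversion from the previous step (None for the first step).
        if prev_count is None:
            step_conv = None
        else:
            step_conv = count / prev_count if prev_count else 0.0
        # Conversion from the top of the funnel.
        total_conv = count / first_count if first_count else 0.0
        report.append({
            "step": name,
            "users": count,
            "step_conversion": step_conv,
            "total_conversion": total_conv,
        })
        prev_count = count
    return report


# Hypothetical counts for a loan application funnel.
example_steps = [
    ("landing", 10000),
    ("application_started", 4000),
    ("application_submitted", 2000),
    ("loan_issued", 500),
]

for row in funnel_report(example_steps):
    print(row)
```

Keeping both the step-to-step and the top-of-funnel conversion in one row makes it easier to see whether a drop is local to one step or accumulated along the way.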

After you have built the basic infrastructure and started collecting basic statistics, you need to make sure that the product develops in sync with its metrics.

That is, when you are about to implement a new feature in the product, you need to answer roughly the following questions:

What key business metrics will this affect?
What changes will be made to the customer journey or to backend algorithms? And how will this affect existing metrics?
Into what stages or components can the new functionality be broken down, so that by collecting metrics for each of them you can look inside the feature and analyze how it works?

Now think about whether the ability to collect all of these metrics is part of the task specification, and how exactly you will collect them once the functionality is implemented.

Next, you need to make sure that the subsystem for collecting and storing statistics is of sufficient importance for your development team and IT team. Its importance should be almost equal to the importance of the production system. For example, at the beginning we had a constant problem with Google Analytics tracking disappearing from different pages, until we discussed the importance of these things with developers. After that, the necessary shared libraries, QA guidelines, etc. appeared.

Analytics for Analysts

The availability of data does not mean its effective use. Typically, the following problems / tasks arise:

Where can this or that metric be found, and how do you get it from there?
Is it being collected correctly? (What if something does not work as intended?)
What report should be built so that conclusions can be drawn from it?
Is the result statistically significant?
Is it possible to dig up more data to better understand what is happening, or to cross-check metrics collected in one way or in one place against other metrics?
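To make the question about statistical significance more concrete, here is a minimal sketch of one common check for A/B tests on conversion: a two-sided z-test for two proportions. This is a textbook technique chosen for illustration, not a description of the tooling used in our reports, and the sample numbers are made up.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Uses the pooled-proportion standard error.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (everyone or no one converted): nothing to compare.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 200 of 5000 users converted in A, 250 of 5000 in B.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

The normal approximation behind this test is only reasonable for fairly large samples; for small samples or very rare conversions an exact test would be a safer choice.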

It turns out that this is a substantial amount of work that requires special skills and, most importantly, time. Hence the need to create an analytics department.

Our analytics department is quite large: in headcount it is almost equal to middle management. It includes both recent graduates with a good knowledge of SQL and professionals who understand well what data is needed to make business decisions and how to obtain it. The flow of requests to them traditionally exceeds their capacity.

Data lakes and data warehouses

One of the problems you are likely to face as the volume of data grows is that it lives in different places: some analysts can work with some repositories, others with others, and for some databases there may be nobody who knows offhand how to work with them. It also becomes difficult to compare data from different sources.
This problem can be solved by systems such as a data warehouse (DWH). We first thought about one when we wanted to combine data on a user's behavior on the site with data on the same user's behavior as a borrower. The principles of building a DWH are far beyond the scope of this article; I will only mention the difficulties and particularities of our case:

In each of our projects (there are now 9 in 6 countries) the data structure is slightly different, so we had to develop principles for unifying it.
We had to figure out how to combine heterogeneous data in a single store.

For example:

user behavior on the site - transitions between pages, interaction with controls
credit policy execution log - which rules were applied and with what outcome, and which branches of the logic were taken
borrower behavior - loan payments, cross-selling
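One way to picture the unification of such heterogeneous data is a single common event shape that every source is mapped into. The sketch below is only an illustration under assumed names: the `Event` fields, the source row layouts and the project code "es" are all hypothetical and do not describe our actual DWH schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Event:
    """One common shape for records coming from different sources."""
    project: str            # which project/country the record came from
    customer_id: str
    occurred_at: datetime
    source: str             # "web", "credit_policy" or "loan"
    kind: str               # e.g. "page_view", "rule_result", "payment"
    payload: Dict[str, Any] = field(default_factory=dict)


def from_web(project: str, row: Dict[str, Any]) -> Event:
    # Hypothetical web-analytics row: user_id, ts, action, page.
    return Event(project, str(row["user_id"]), datetime.fromisoformat(row["ts"]),
                 "web", row["action"], {"page": row["page"]})


def from_credit_policy(project: str, row: Dict[str, Any]) -> Event:
    # Hypothetical decision-log row: client, decided_at, rule, outcome.
    return Event(project, str(row["client"]), datetime.fromisoformat(row["decided_at"]),
                 "credit_policy", "rule_result",
                 {"rule": row["rule"], "outcome": row["outcome"]})


def from_loan(project: str, row: Dict[str, Any]) -> Event:
    # Hypothetical payment row: borrower_id, paid_at, amount.
    return Event(project, str(row["borrower_id"]), datetime.fromisoformat(row["paid_at"]),
                 "loan", "payment", {"amount": row["amount"]})


def timeline(events: List[Event], customer_id: str) -> List[Event]:
    """All events of one customer, from any source, in chronological order."""
    return sorted((e for e in events if e.customer_id == customer_id),
                  key=lambda e: e.occurred_at)


events = [
    from_loan("es", {"borrower_id": 42, "paid_at": "2019-07-20T09:00:00", "amount": 120.0}),
    from_web("es", {"user_id": 42, "ts": "2019-07-01T10:00:00",
                    "action": "page_view", "page": "/apply"}),
    from_credit_policy("es", {"client": 42, "decided_at": "2019-07-01T10:05:00",
                              "rule": "min_age", "outcome": "pass"}),
]

for event in timeline(events, "42"):
    print(event.occurred_at, event.source, event.kind, event.payload)
```

The point of the common shape is that site behavior, credit policy decisions and borrower behavior for the same customer can be placed on one timeline, while source-specific details stay in the payload.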

Now that we have more or less learned how to integrate the data and have merged it into one Data Lake, we have moved on to building data marts - pre-prepared datasets, reports and visualizations - which is what all of this was for. As a result, we expect a significant reduction in the skill requirements for our analysts and in their labor costs.

Usually, at this stage, a dedicated data engineer role appears in the company, i.e. people in charge of the data infrastructure. They are entrusted with maintaining and developing the DWH.

It’s better to hire the right people right away.

As the company grows, it turns out that not all employees immediately understand the importance of data or are able to work with it. Two questions arise: promoting the culture internally and hiring the right people.

As for internal promotion, as mentioned above, if the founders of the company carry the data culture, it passes down to top management, middle management, and so on. For example, I require my product managers to estimate the potential effect, in money or as a change in key metrics, before implementing new functionality, and to review the plan versus actual figures after it is released. The same estimates of “business value” should also guide the prioritization of work.
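A simple way to picture such an estimate and the later plan versus actual review is sketched below. The formula (users times conversion change times revenue per conversion) is a deliberately simplified assumption for illustration, and all numbers are made up; real estimates usually need more factors.

```python
from typing import Optional


def expected_monthly_effect(monthly_users: int,
                            baseline_conversion: float,
                            expected_conversion: float,
                            revenue_per_conversion: float) -> float:
    """Expected extra monthly revenue from a change in conversion."""
    uplift = expected_conversion - baseline_conversion
    return monthly_users * uplift * revenue_per_conversion


def plan_fact(plan: float, fact: float) -> dict:
    """Compare the planned effect with the measured one."""
    ratio: Optional[float] = fact / plan if plan else None
    return {
        "plan": plan,
        "fact": fact,
        "deviation": fact - plan,
        "fact_to_plan": ratio,
    }


# Hypothetical feature: conversion is expected to grow from 10% to 12%.
plan = expected_monthly_effect(10000, 0.10, 0.12, 50.0)
# Hypothetical measurement after release: conversion actually reached 11.5%.
fact = expected_monthly_effect(10000, 0.10, 0.115, 50.0)
print(plan_fact(plan, fact))
```

Writing the estimate down as an explicit calculation before release is what makes the later comparison possible: the same function is evaluated once with the expected numbers and once with the measured ones.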

We approach instilling a data-driven culture from two sides. Our IT department may require business managers to include a monetary estimate of the effect in the task description. This applies to all departments: marketing, support, accounting. We recently added the requirement that the business explicitly describe the metrics by which it will track the results of the implemented changes, and IT must ensure that these metrics are accessible in an understandable form.

It is important, of course, to check right at the hiring stage whether people are used to relying on numbers in their work and whether they know how to do it. My favorite interview questions when we discuss a candidate's experience are: how did you calculate what effect the feature would have, how did you measure the effect it actually had, and why do you think this effect should be attributed to this feature and not to something else? A good candidate will always be able to explain logically why they did it this way and not otherwise.

With the growth of business and data volumes, it becomes meaningful to use more advanced statistical techniques and more advanced application libraries - some of what is now called data science.

If we talk about data science in a broader sense than neural networks and machine learning, then, for example, we had a successful experience of moving from classic packages such as SAS for building logistic regression to in-house Python tools. This cut the time needed to develop credit scoring by a factor of 5.
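For readers who have not seen logistic regression outside a statistics package, here is a minimal self-contained sketch in plain Python, trained with batch gradient descent. It is not our in-house tooling: the single feature (number of past late payments), the tiny dataset and the hyperparameters are all assumptions made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability of the positive class (here: default)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical training data: one feature, the number of past late payments;
# label 1 means the loan defaulted.
X_train = [[0], [1], [2], [3], [4], [5], [6], [7]]
y_train = [0, 0, 1, 0, 1, 0, 1, 1]

weights, bias = fit_logistic(X_train, y_train)
for late_payments in (0, 3, 7):
    print(late_payments, round(predict_proba(weights, bias, [late_payments]), 3))
```

A production scoring model would add feature preparation, regularization and validation on held-out data; the sketch only shows the core fitting loop.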

At some point, we realized that, at sufficient data volumes, logistic regression and cluster analysis are also justified in marketing and product management, for tasks such as customer segmentation and determining the optimal product or discount strategy for each individual client.

Learn to predict the future

A peculiarity of the lending business is that it is not enough to sell the product, money on credit: you also need to manage the future cash flow. Accordingly, various predictive models and their integration into the forecast of the future P&L come to the fore. Examples of such models: future collections based on early arrears data, the average ticket based on customer segmentation data, the number of loans based on customer return data, and the like.

It is very inspiring to have a toolkit that lets you evaluate the effect of your feature on various key business metrics and predict the increase in company revenue.

To develop, maintain and implement such tools, we are now developing a department for financial planning and analysis (FP&A), whose task will be to make business decision-making even more supported by data, analysis and modeling.

Many interesting things still lie ahead of us: further development of the BI infrastructure and the creation of the departments that support it and the processes that use it.

To summarize, we can distinguish the following principles for the development of a data-driven approach, which I would adhere to:

The expected return on investment (for example, staff time saved, or greater accuracy and speed of decision making) should be commensurate with the resources spent.
Internal product management: while the infrastructure is being created and developed, the wishes and feedback of internal customers are studied and taken into account.
Infrastructure development must keep pace with the development of processes and methodologies. And all of these together should neither lag behind nor outrun the company's analytical needs.

Source: https://habr.com/ru/post/461339/
