Average hospital temperature, data clusters and project decision making

0. Intro

It's nice to see people launching so many services and applications. A lucky few find that success comes to the product all by itself. Everyone else has to assess the state of their project soberly and make the right decisions on the way to their own amusement park with backgammon and secretaries.
Here I will offer one way to assess the state of a product correctly, make decisions and avoid the “average temperature in the hospital” fallacy. Under the hood: a bit of data mining, hospital metaphors and “startup metrics”.

This is a weka bird, and it will help us with today's article.


1. The eternal question: "What to do?"

So, we launched a new web service for mass use. We were lucky: we did not fold during development, built a working project, presented it successfully and even began to attract the first users. This is quite a success, since heaps of projects never live to see a public release. To celebrate, we throw a party with cats, a cake in the shape of a huge Darth Vader and chocolate fountains.
After the party it becomes obvious that the problems are just beginning. You need to grow somehow, work with users, add new features and improve old ones. In short, something has to be done. But even if this is not your first project, the way forward is not always obvious.
Everyone solves the problem in their own way. Some thoughtlessly hire the first marketer they meet and follow his advice ("and here we will start selling a subscription to the red buckets at $2.99 per bucket per month"). Some declare that they are, damn it, visionaries, so things will be done this way because “that's how I see it” (and sometimes that even works).
A good way for most of us is to assess the situation soberly in numbers, then do the marketing and visionary work to improve those numbers, and at the end evaluate the results of our actions in numbers again. This is the way we will go.

2. Introduce the patient!

At this point I wanted to move on to a specific example from my day job: there are a lot of users there, and the “what to do” question comes up almost every day. But as soon as I reveal the numbers and details, my boss will sentence me to death by snu-snu for disclosing company information.

Therefore, we will look at all the methods using my hobby as an example: bamb.ninja, a free project for people who are learning English (there was an article about it on Geektimes). In short, the service lets you analyze books in English and predicts which words in the text will be difficult for you. It then assembles a new book for you, inserting translations of the difficult words directly into the text. The output is your personal adapted version of the text.
This is our patient for today. The patient has several thousand users, and it is absolutely unclear what to do with the service next. And since it's just a hobby, I can freely disclose the data and discuss it as I please.

3. Vital signs

When someone is admitted to hospital, the doctors start by determining vital signs. With people, everything is well known: you measure the pulse, blood pressure and weight, run the general tests and carry out an examination.
With tech projects, not everything is so obvious. All projects are different, and their indicators will be different too. People often treat the number of registrations per day, conversion and growth rate as the most important values. But these parameters are easy to influence from outside: buy a bunch of ads, people pour in, and a bunch of metrics jump. Joy? Not really. New users can come, register and never return. These metrics say nothing about the quality of the service or the real engagement of the audience.

A healthy project is a useful and necessary service that solves problems, engages users and makes people come back to it again and again.
This means it would be good to find parameters that reflect a person's meaningful activity in the service. Not abstract retention (what share of registered people came back to the service), but the number of meaningful actions each person performs. And also the time between those meaningful actions.
After digging into the database, I came to the conclusion that for my project these parameters are:

The number of texts translated by each user (the more - the better)
Average time in hours between a user's calls to text upload and translation. The longer, the better. It is good if a person periodically uploads new books and reads them, improving their English and filling their mind with knowledge. It is bad if a person translates 30 books at once and forgets about them forever.
The number of errors in the service for each user. Less is better.
The number of operations on the list of difficult words for each user. The more, the better. Work on the word list after reading a book suggests that the person got carried away and learned new words from the book.

All these parameters show quite accurately whether people use the service, whether technical problems get in their way and whether they are solving their own problems. Your project should have such parameters too: for example, the number of units of content created or consumed, the time between creation or consumption, the popularity of that content, and the errors the system produces while a person works with the service.
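As an illustration, the four vital signs above could be computed from a plain event log roughly like this. This is a minimal sketch, not the actual bamb.ninja code: the event names (`translate`, `error`, `word_list_op`), the log layout and the function `vital_signs` are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (user_id, event_type, ISO timestamp).
# The event names and the data are assumptions made for this sketch.
EVENTS = [
    ("u1", "translate", "2016-01-01T10:00"),
    ("u1", "translate", "2016-01-03T10:00"),
    ("u1", "word_list_op", "2016-01-03T11:00"),
    ("u1", "error", "2016-01-03T11:05"),
    ("u2", "translate", "2016-01-02T09:00"),
]


def vital_signs(events):
    """Return {user_id: metrics} with the four per-user vital signs."""
    per_user = defaultdict(lambda: {"times": [], "errors": 0, "word_ops": 0})
    for user, kind, stamp in events:
        record = per_user[user]
        if kind == "translate":
            record["times"].append(datetime.fromisoformat(stamp))
        elif kind == "error":
            record["errors"] += 1
        elif kind == "word_list_op":
            record["word_ops"] += 1

    result = {}
    for user, record in per_user.items():
        times = sorted(record["times"])
        # Hours between consecutive translations of the same user.
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
        result[user] = {
            "texts": len(times),
            # A user with fewer than two translations has no gap at all.
            "avg_gap_hours": sum(gaps) / len(gaps) if gaps else None,
            "errors": record["errors"],
            "word_ops": record["word_ops"],
        }
    return result


signs = vital_signs(EVENTS)
```

Note the `None` for users who translated only once: an interval simply does not exist for them, and it is better to say so explicitly than to store a zero that later gets averaged in.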

And, of course, you still need to track growth, conversion and retention. These are numbers you have to know as well.

4. What to do with it?

Now we know the key vital signs of the service, which tell us how well things are going for our users. What to do next?
I can already hear several people shouting “calculate averages and try to optimize them for all users.” The average is, of course, also a metric, but it says little about the real state of affairs.
If you measure the temperature of everyone in the hospital (including the corpses in the morgue), you can come to the wrong conclusions. For example, you can say that if you bring 100 people with acute infections and fever into the hospital, the average temperature will come out at exactly 36.6, and then everyone can be discharged immediately (including the dead), because by all the average values everything turned out well.
In the case of a hospital we all understand that optimizing average values is an absurd strategy, but in the case of software products this is not obvious to everyone. Startups love to show off amazing average values, raise a couple of million dollars on the strength of them, and then die in 3-4 years, to the great surprise of investors.
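The hospital arithmetic is easy to reproduce. Below is a small sketch with invented readings: the group sizes and temperatures are assumptions chosen only so that the overall mean lands near the "healthy" 36.6.

```python
from statistics import mean

# Made-up readings in degrees Celsius; sizes and values are assumptions.
wards = {
    "infectious ward": [39.2] * 100,  # patients with fever
    "morgue": [4.0] * 8,              # refrigerated bodies
}

everyone = [t for readings in wards.values() for t in readings]
overall = round(mean(everyone), 1)
per_group = {name: round(mean(readings), 1) for name, readings in wards.items()}

print(overall)    # one reassuring number for the whole building
print(per_group)  # the two very different groups hiding behind it
```

The overall figure rounds to 36.6, while neither group is anywhere near it: one average is 39.2 and the other is 4.0.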
If not average, then what?

5. Clusters

Users need to be grouped in such a way that the average properties of a group reflect well the properties of every user in that group. Such groups are called user clusters. Instead of several thousand people, I will only need to consider a few clusters, whose statistics reflect the typical behavior inside each cluster.

I build clusters.

I export from the database a CSV table of the most critical parameters: the number of translated books, the number of errors while processing requests, the intervals between uses of the service and the number of operations on the word list.
I take Weka, a free program for data mining. Clustering values is one of the ways to mine data. Weka is named after the weka, a flightless bird from New Zealand.
With a couple of simple operations we feed the data on our users to Weka and use a clusterer to find clusters among them. For our purposes the EM clusterer with the parameters EM -I 100 -N -1 -M 1.0E-6 -S 100 is quite sufficient (I recommend reading the Weka documentation and getting at least a little insight into what the various analyzers, clusterers and classifiers do).
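If you prefer to see the idea in code rather than in the Weka GUI, here is a rough stand-in. It is deliberately not the EM algorithm used above: it is a plain k-means with a deterministic farthest-first start, written with the standard library only. The CSV columns, the sample rows and all function names are assumptions for the sketch. Unlike Weka's EM, which with -N -1 selects the number of clusters itself, this version has to be told `k` up front.

```python
import csv
import io

# Hypothetical export; column names and values are invented for this sketch.
CSV_TEXT = """user,texts,errors,gap_hours,word_ops
a,8,3,36,25
b,9,4,30,27
c,3,0,65,28
d,4,0,70,26
e,1,0,0,0
f,1,0,0,1
"""
FEATURES = ("texts", "errors", "gap_hours", "word_ops")


def load_rows(text):
    """Parse the CSV into a list of user ids and a list of feature vectors."""
    users, rows = [], []
    for line in csv.DictReader(io.StringIO(text)):
        users.append(line["user"])
        rows.append([float(line[name]) for name in FEATURES])
    return users, rows


def scale(rows):
    """Min-max scale each column to [0, 1] so no metric dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Start from the first point, then keep adding the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=100):
    """Plain k-means: assign points to the nearest center, recompute centers, repeat."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


users, rows = load_rows(CSV_TEXT)
labels = kmeans(scale(rows), k=3)
clusters = {}
for user, label in zip(users, labels):
    clusters.setdefault(label, []).append(user)
print(clusters)
```

On this toy table the heavy users, the occasional users and the one-time visitors end up in three separate groups. On real data the result will depend on the scaling and on `k`, so treat it as a way to build intuition, not as a replacement for a proper clusterer.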

Instead of several thousand users, I got just 3 clusters with different properties, and all my users with their different behaviors fit into them surprisingly well. However, over the past few months I have fundamentally changed the project a couple of times, so for greater reliability I will take data for the last month only. It shows the current state of affairs more accurately.

6. Analyze it

Cluster #0: people who are firmly hooked on the service. They translated 8 books on average, and the average interval between calls to the translation function was 36 hours. Engagement with the word list is rather low: 25.5 operations on the list of difficult words from the text. Most of the service failures fell on these people, because they translated a lot of different texts and triggered a lot of bad situations. 9% of all my users.
Cluster #1: people who got hooked for a short time. They translated 3.2 books on average, but they have a better engagement score for new English words: 27.2. These users had no problems with the service at all: 0 errors. The average interval between calls to the main function is 65 hours, with a large standard deviation of 164. 19% of all users.
Cluster #2: people who only tried the service and went no further. They do not work with new words and translate few books. The average interval between calls to the service is 34 hours. An obvious anomaly: where does an average time between calls come from if people use the service once on average? 72% of all users.
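Per-cluster figures like these (share of users, mean, standard deviation) are simple to compute once every user has a cluster label. The sketch below uses invented rows and a hypothetical `summarize` helper. It also shows one possible explanation for the anomaly in cluster #2, which is only a guess and is not confirmed by my data: if the interval is undefined for one-time users, the cluster mean may rest on a handful of people.

```python
from statistics import mean, pstdev

# Hypothetical (cluster_label, hours_between_calls) pairs; all values are invented.
# None means the user called the service only once, so no interval exists.
ROWS = [
    (0, 30.0), (0, 42.0),
    (1, 10.0), (1, 20.0), (1, 165.0),
    (2, 34.0), (2, None), (2, None), (2, None), (2, None),
]


def summarize(rows):
    """Return share of users, mean and standard deviation of the gap per cluster."""
    total = len(rows)
    summary = {}
    for label in sorted({label for label, _ in rows}):
        gaps = [gap for lab, gap in rows if lab == label]
        known = [gap for gap in gaps if gap is not None]
        summary[label] = {
            "share": len(gaps) / total,
            "mean_gap": mean(known) if known else None,
            "stdev_gap": pstdev(known) if known else None,
            # How many members actually contributed to the mean.
            "with_gap": len(known),
        }
    return summary


report = summarize(ROWS)
for label, stats in report.items():
    print(label, stats)
```

In this toy data cluster 2 holds half of all users, yet its mean interval of 34 hours comes from a single person. Printing the `with_gap` count next to every average makes that kind of artifact visible at a glance.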

Looking at these 3 clusters raises some questions.

Even the most loyal users do not work on the study of words. Is this too hard? Badly implemented? Or is it generally an unnecessary feature? Yes, “I see this,” but maybe people think differently?
Not everyone can start using the service after registration. Too hard? Unclear principle of work? Laziness?
Users who have read 3 books each work with new words more than those who have read 7-8 books. Is this a trend or just a fluctuation? Is there a pattern here? Is it harder for people to learn words when they have a lot of books? Am I throwing too much information at them?
To keep the most loyal users, I need to solve technical problems. What problems do these people have? Do the problems share common causes?

The distribution of people across clusters shows that I have problems. After racking my brain a bit and looking at additional statistics in the database, I can understand the nature of these problems and how to solve them. If I had taken the average values, the picture would have looked quite pleasant. I would have focused on growing the audience and raising the “number of books per user” metric. In my case that is a direct road to ruin.
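To see how the averages hide this, you can blend the cluster figures back into one number. The shares and book counts below are taken from the cluster descriptions above, except for the value of 1.0 for cluster #2, which is my assumption based on "once on average".

```python
# Cluster shares and average books per user from the descriptions above.
# The 1.0 for cluster 2 is an assumption ("once on average"), not a measured value.
clusters = [
    {"name": "cluster 0", "share": 0.09, "books": 8.0},
    {"name": "cluster 1", "share": 0.19, "books": 3.2},
    {"name": "cluster 2", "share": 0.72, "books": 1.0},
]

# Weighted average: what a single "books per user" metric would report.
overall_books = sum(c["share"] * c["books"] for c in clusters)

# Share of users who went beyond a single book.
active_share = sum(c["share"] for c in clusters if c["books"] > 1.0)

print(round(overall_books, 2))
print(round(active_share, 2))
```

Under that assumption the blended metric says "about two books per user", which sounds fine, while only about 28% of users ever got past the first book.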

7. What to do with it?

Now let's talk about your project.

Identify key vital signs for your project.
Upload statistics on them and try to find clusters.
View the results. What do you like about them, what don't you like and what causes suspicions?
If you want to change something in your statistics, form hypotheses that, in your opinion, explain how certain factors influence the values in your clusters. Try to influence those factors. Go back to the first step and go through the cycle again.

Add the typical statistics to this (user growth, MAU / DAU, retention, your revenue and conversions) and you get a fairly convenient working compass for making decisions. With these numbers you can build the right marketing and test your visionary decisions without losing touch with reality.

8. Outro

If the hospital built clusters, it would become clear rather quickly that there are different categories of patients in the building. Some of them are corpses (it happens), some have broken arms and legs (they need plaster casts and X-rays), and some have already recovered and can go home.
Of course, this will not solve every problem, but on the scale of hundreds of thousands of patients it will help us understand that on this floor we have an infectious disease ward, and antibiotics should be used there. We do not know exactly who is lying there and what hurts, but even pumping this area with antibiotics will give a more adequate result.

Explore and try different approaches to analyzing statistics. Analyze cohorts, try data mining and pick out groups. Look for deviations from the averages, and for peaks and dips in activity. In general, try to look at the statistics from as many angles as possible. The main thing is to go beyond the standard startup metrics and look at things more broadly.

May the Force be with you and your deeds!

Source: https://habr.com/ru/post/295926/
