One day your mobile application grows large and is used daily by ten thousand, a hundred thousand, a million people - the exact number doesn't matter; the point is that there are a lot of real, very different people out there. What does this mean for you as a developer?
Yes, it is now much scarier to click the "Submit" button, because if you overlooked something, you can't - unlike with web applications - sit up all night surrounded by Red Bull cans and pizza boxes and fix everything: review on mobile platforms takes time, and on iOS it takes a whole week. A week is more than enough time for a previously loyal user to stop opening your application.
And, just as importantly, it means the time has come when "I like the way this screen looks" is no longer enough justification for that screen to actually be in the application.

In this article I will try to describe what we do so that a huge production application stays that way.
A note: the material in this article is not very relevant for applications that can work without an Internet connection - but in our mobile age there are fewer and fewer of those.
Part 1. Segmentation über alles
Story one: we integrated a large and very polished third-party service into our application. They have their own team, their own backend, even their own office in some sunny country that you can call to make an appointment. But at one point the whole service goes down for three days after a solid refactoring which, by a funny coincidence, broke backward compatibility. Yes, "write it yourself", "never deal with them again", "ask them to fix everything as soon as possible" are all interesting thoughts to entertain, but you need to do something right now, so that users clicking the familiar buttons don't see a constant "Sorry, we're out to lunch", or worse, information that has little to do with reality.
Story two: you roll out a large and important feature and, of course, you tested it properly - but something in it still went wrong! Who to blame, and how, we will decide later.
Story three: to learn more about the users of your application, you send a lot of useful information to your dedicated log server. But who could have known that there would suddenly be several million of those users - and now your log server happily falls over every 10-15 minutes, while a new one will only arrive in 2-3 weeks.
These are scary stories, and there are many others that are less so.
And for all of them there is one convenient and useful tool: segmentation.
In short, the flow of our application can be described in a few simple steps:
- Fetch the common config from the server
- Log in to the server
- Fetch segmentation data
- Work in the application
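The four steps above can be sketched as follows. This is purely illustrative: every function, URL and field name here is made up, and a real client would be making HTTP calls instead of returning canned dictionaries.

```python
# Illustrative sketch of the four startup steps; all names and values
# are hypothetical, a real client would perform network requests here.
def fetch_common_config():
    # Step 1: the one static URL the app knows about points at this config.
    return {"server_url": "https://api.example.com", "min_version": "2.0.0"}

def login(server_url, user):
    # Step 2: authenticate against the server address taken from the config.
    return {"server": server_url, "user_id": user}

def fetch_segments(session):
    # Step 3: the server resolves per-user segmentation parameters.
    return {"new_checkout": {"enabled": True}}

def start_app(user):
    config = fetch_common_config()
    session = login(config["server_url"], user)
    segments = fetch_segments(session)
    # Step 4: the application runs with config + segments in hand.
    return config, session, segments
```

The point of the shape, not the stubs: the client hard-codes exactly one URL, and everything else it needs arrives in steps 1 and 3.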
So now let's dwell in detail on the first and third steps and on the problems they have solved for us at various moments:
What is the common config? It is a set of application settings that is the same for all users.
What it contains:
Production Server Address
The entry point of your application. If it ever happens that you need to move to another domain or, say, switch users over to a backup server, this can be done via the config. Unlike web applications, on mobile (and desktop) platforms quite a large number of versions can be alive at the same time, which means a forced change of the production server can be done (we have had it happen, and it can happen to you) without breaking clients that are already out in the wild.
Thus the application stores exactly one static URL - the one pointing to the config file - and that single URL is the only thing you have to keep stable.
Minimum application version (optional / force)
In general, people do not like updating. Even with the advent of auto-updates on mobile platforms the situation has improved, but not entirely. For example, with our two-week release cycle we usually have 10-15 versions in use simultaneously. But sometimes so-called breaking changes happen: fundamental changes that make working on older client versions impossible or uncomfortable. In that case this parameter signals "it would be nice to update" in the soft scenario, and "no further work is possible without an update" in the hard one, which we surface to users in the UI.
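The soft/hard version gate can be sketched like this (a minimal version of the check, with made-up version strings; real semantic-version parsing has more edge cases, such as pre-release tags):

```python
def parse_version(v):
    # "2.3.0" -> (2, 3, 0); tuples compare component by component.
    return tuple(int(part) for part in v.split("."))

def check_update(app_version, optional_min, force_min):
    """Decide which update prompt to show, per the two config parameters."""
    if parse_version(app_version) < parse_version(force_min):
        return "force"  # no further work is possible without an update
    if parse_version(app_version) < parse_version(optional_min):
        return "soft"   # "it would be nice to update"
    return "ok"
```

Keeping both thresholds in the common config means you can tighten them for all live versions without shipping a new client.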
Log settings
For this we use a separate config file, which allows us to:
- Add or exclude fields sent in a specific log message.
- Set what percentage of users should actually send a given log message to the server (and, moreover, to which server - some messages go to third-party analytics services like Google Analytics, some to our internal service, and so on).
- Set the log level for the system log.
The first lets us keep log messages fairly compact and, when necessary, add the information we are missing.
The second helps us balance the load on the log server. Since you already have a lot of users, even if only 1 percent of them send a given message, that is already enough for a fairly representative picture.
The third has a slightly different benefit: sometimes crashes appear in our analytics whose circumstances are hard to reconstruct from the stack trace alone. By lowering the log level to the limit, we simply attach the last few lines of the system log to the crash report, and the next day we can reconstruct the user's actions and localize the problem (if anyone is interested, we achieve this with a combination of the CocoaLumberjack logger and HockeyApp analytics).
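Put together, a log config of this kind might look roughly like the following JSON fragment. Every field name here is hypothetical, not our actual schema:

```json
{
  "messages": {
    "payment_error": {
      "sampling_percent": 100,
      "target": "internal",
      "fields": ["user_id", "error_code", "funnel_id"]
    },
    "screen_open": {
      "sampling_percent": 1,
      "target": "google_analytics",
      "fields": ["screen_name"]
    }
  },
  "system_log_level": "warning"
}
```

One entry per message type covers all three knobs at once: which fields travel, what share of users sends the message, and where it goes.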
Keys and other settings for third-party libraries
Yes, I understand this is frowned upon, but nevertheless, in ambiguous situations with these libraries - we had to re-create our third-party accounts several times - this, again, did not break old clients. Besides, these keys can be "salted" and made reasonably secure. (But remember: an attacker who sets out to do so will sooner or later be able to statically extract the keys from the application binary anyway, so this way of storing them is not much worse - especially since extracting the encryption algorithm is harder than extracting a single key.)
In addition, in the case of especially unpleasant breaking changes in a third party, you can sometimes afford to proxy all work with it through your own server, preserving the old format for old clients.
Global A/B parameters
Some decisions have to be made for a user who is not yet registered in the system (for example, how the registration screen looks, how the registration flow goes, and much more). In this case we can store, as such a parameter, the percentage of users for whom some functionality should be enabled or disabled. To determine whether a device falls into the test group, some unique device identifier works well (as a rule, every operating system provides one), from which a two-digit hash is taken.
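The two-digit-hash idea can be sketched like this (the hash function choice and names are mine; the only requirements are that the bucket is stable for a given device and roughly uniform across devices):

```python
import hashlib

def ab_bucket(device_id):
    """Map a device identifier to a stable two-digit bucket, 0..99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def feature_enabled(device_id, rollout_percent):
    # The common config carries only the percentage; the client decides
    # locally, which works even before the user has registered.
    return ab_bucket(device_id) < rollout_percent
```

Because the bucket is deterministic, a device stays in (or out of) the test group across launches, and raising the percentage in the config only ever adds devices to the group.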
In addition, the following concept turned out to be very useful for us:
Devices and users are divided into "real" and "test". To identify the latter we tried different approaches (for example by UDID, until its use was banned; then by the advertising identifier, until restrictions were imposed on it; and in general such identifiers do not exclude collisions, so one day a real user might see something they should not). In the end we settled on a simple scheme: a small utility, when launched on the device, writes a key into the device's encrypted storage - every mobile system has one; on iOS, for example, it is the Keychain. The main application checks for this key on startup and, if it is present, treats the user as a test user.
IMPORTANT: if you introduce this kind of separation, try to ensure two things:
- A real user must never "accidentally" become a test one. Seeing test information can confuse them.
- Working as a test user must not grant any privileges or the ability to perform dangerous actions in the application. (This especially concerns game development: attackers who set out to circumvent the protection will manage it sooner or later and, worse, will share the method.)
And now that we have test users - what do we use them for?
- A debug panel that makes testers' lives easier (it replaces responses from the server, changes the log level, shows the frame rate and UI logs).
- "Stages", or test servers: test users can choose which starting config the application downloads - from production or from one of the test servers - which lets us test the application against different server and segmentation settings.
So, now let's move on to the main segmentation config.
It is focused on more pragmatic tasks: what the community commonly calls classic A/B testing.
This config is structured as a dictionary (for example, JSON) in the format id_name: {dictionary of parameters}.
Here we no longer need any percentages or other branch points: as we remember, by this time the user is already registered in the system, so, based on a guaranteed-unique user_id, the server-side segmentation module returns parameters specific to that user. In it we can rely on:
- How long the user has been using the application (newcomer, loyal, ...).
- Whether the user pays.
- Which device, which operating system version and which application version the user is on.
- The user's time zone, country and localization.
And much other data.
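As a sketch, a config in that id_name: {parameters} format might look like this (the feature and parameter names are made up for illustration):

```json
{
  "new_checkout": {
    "enabled": true,
    "button_color": "#FF6600"
  },
  "welcome_popup": {
    "enabled": false,
    "text": "Hello!",
    "delay_sec": 3
  }
}
```

The server has already applied all the segmentation rules by the time this is sent, so the client just reads the parameters - no branching logic travels over the wire.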
What is it used for?
Enabling and disabling features
As described at the beginning of the article, it is pretty scary to launch major functionality, so this way we can, first, disable a feature that does not behave as expected, and second, roll functionality out gradually (open it for 5 percent of users, observe, then for 10, 20, 50 and finally for everyone).
Parameter segmentation
You can expose a number of variables in each feature designed for A/B: the text on a pop-up, animation duration, time between impressions, button color, one of several behavior options. The more such parameters, the more experiments you can run in search of the best solution; you are limited only by your imagination. On the other hand, the more testing the functionality requires. (This volume can be spread out over time, though: after testing the application's basic operation, later on you only need a brief test, on the test server, of the parameter set you are about to roll out.)
Summarizing this part, I would like to note the following points:
- A/B testing should be built into the architecture: every feature you design, except the most monolithic ones, should assume it can be completely disabled. It is better to remove the extra branching during refactoring, once the functionality has proven itself in production. In addition, a feature must have pre-planned "injection points" that can take a set of valid values received from the server.
- Complementing the previous point: every feature must have a default config - first, so that the config sent over the wire is as small as possible (only the redefined parameters), and second, because segmentation, like everything else, can simply fail to load.
- Maintaining this kind of logic is genuinely expensive: it complicates the code, sometimes significantly.
- The volume of acceptance testing grows: in case of design errors, or simply for related features, you have to check not just the values of one parameter group but the possible interactions of several.
- With all that, it can make the application much more resistant to the mistakes of developers and of those who dream up new functionality.
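The default-config point above can be sketched as a simple merge, where the server only ever sends the parameters that differ from the built-in values (all names here are illustrative):

```python
# Built into the client: every feature works even if segmentation is down.
DEFAULTS = {
    "checkout_banner": {"enabled": False, "text": "Pay now", "delay_sec": 5},
}

def resolve_feature(name, server_segments):
    """Merge server-sent overrides over the built-in defaults."""
    params = dict(DEFAULTS.get(name, {}))      # start from the defaults
    params.update(server_segments.get(name, {}))  # overlay only what changed
    return params

# The server sends only the redefined parameters:
overrides = {"checkout_banner": {"enabled": True, "text": "Try the new checkout"}}
```

This keeps the wire format minimal and gives every feature a sane fallback when the segmentation fetch fails.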
Part 2. Hey, are you there, alive?

Ask yourself: am I building nonsense?
In the first part I wrote about what we do; in this one I will try to describe how we actually find out whether it works.
Crash analytics.
The easiest way to learn that something has gone wrong is to find out that after introducing new functionality the application simply stopped working and started crashing merrily.
For this we use HockeyApp, because it has a handy toolkit for working with existing crashes and integrates well into various deployment systems, so the information in it stays up to date automatically. In fact, there is a very decent number of such tools at the moment, for every taste and color - choose for yourself. As I wrote above, working with it became even more pleasant once we added the ability to attach a piece of the device log to the crash report.
Monitoring sessions and payments.
Perhaps the main tool of interest to the business. Here too there are plenty of different tools, but we use a mixture of home-grown and off-the-shelf ones, because not every business is willing to share payment information with third-party services - and rightly so. Existing systems let us comfortably detect both the shortest-term and the longest-term trends in the quality of the application. Given a broad set of essential functionality, the following metrics deserve observation:
- The number of sessions: if it changes dramatically, something is wrong.
- Revenue, or the number of payments: it lets us understand both the short-term effect of newly shipped functionality on users and, strategically, whether we are heading in the right direction at all.
- Session length, also a very important metric. You cannot say that longer is always better (that holds for game projects); in business applications it should rather stay close to some predictable value. (If sessions are too long, it may be worth asking what users are spending all that time on.)
- The number of sessions per day - well, here everything is clear.
Modern services allow good segmentation of these analytics (by device, application version, geolocation and a bunch of other attributes), which makes the forecasts more accurate.
Monitoring reviews.
Unfortunately, this process is poorly formalized and does not lend itself to automation, but I think its importance for any application (not only large ones) needs no explanation. It is a good idea to give users a way to send suggestions directly to the company, because a suggestion can easily be lost among a large number of store reviews.
Monitoring the life of features and stories.
For this, unfortunately, we could not find a sufficiently convenient, functional and simple public tool, so most of the application's logs and a dedicated log server are devoted to it.
We use Hadoop (and several other services) for this, for the following reasons:
- Being NoSQL, it lets us easily add or remove fields in log messages without changing the database structure on the log server - this gives the necessary flexibility.
- It can run full statistical sampling over the set of parameters we are interested in, to get the most accurate slice of information.
- We use an entity called a "story" - also known as an internal session, also known as a funnel - roughly speaking, a unique identifier of one application usage session.
Based on this, we can get a sample showing the entire sequence of a user's actions within the session we are interested in. (For example, we select the users who for some reason failed to make a payment, then for any of them pull up their funnel and get the context needed to see why it could have happened.)
- It is possible to set up "notifications" hung on scheduled scripts: for example, once an hour we run a query that computes the percentage of sessions with errors relative to the total number of sessions, and if that percentage exceeds a certain threshold, the interested parties get a message about it and proceed to a more detailed analysis of the problem.
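The hourly error-rate notification can be sketched as follows. The session structure here is my own simplification: a session is just the list of log events sharing one funnel id, and the threshold value is illustrative.

```python
def error_session_rate(sessions):
    """Percentage of sessions containing at least one error-level event.

    `sessions` is a list of event lists, as produced by grouping log
    messages by their funnel/session id (structure is illustrative)."""
    if not sessions:
        return 0.0
    with_errors = sum(
        1 for events in sessions
        if any(e.get("level") == "error" for e in events)
    )
    return 100.0 * with_errors / len(sessions)

def check_alert(sessions, threshold_percent=5.0):
    # Run on a schedule (e.g. hourly); True means "notify the interested
    # parties and start a more detailed analysis".
    return error_session_rate(sessions) > threshold_percent
```

The same pattern (aggregate over funnels, compare to a threshold) works for any metric you can compute from the log stream, not only error rates.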
I believe all these requirements are quite important, and I suspect more than one tool satisfies them.
And perhaps the most important thing that follows from having such tools: they make it possible to build formal (and sufficiently accurate) metrics for assessing the quality and relevance of functionality. If a button is not pressed, it must be removed, however beautiful it is. If, after the introduction of a super-convenient feature, users start complaining more about the application, using it less and paying less, then the feature is not super-convenient. And so on.
Summarizing all of the above, I would like to note a few final points:
- Perhaps most of this article's material seems obvious; nevertheless, we did not arrive at all of it on the first attempt, and this set of approaches works and solves problems.
- All this infrastructure is very expensive to develop and maintain, so you should not implement ALL of it if you are starting a small application that you believe will interest millions - if only because statistical methods are simply inapplicable below a certain audience size.
- However, some of these recipes are easy to keep in working order, and they save a decent amount of nerves.
- You do not need to log absolutely everything: that quickly turns into an indigestible mess. I suggest logging based on hypotheses. That is, when you start designing new functionality, write down, from a pessimistic position, what can go wrong, and then, based on that, write a minimal set of log messages with a minimal set of fields whose analytics will cover those hypotheses. And occasionally review your log messages and remove the ones that are no longer relevant.
- At the same time, remember that you can log a great variety of metrics and events: from errors in a session and the frequency of opening a screen to application load time or the time spent on a particular task.
- Try different tools for organizing segmentation. Ideally, changing segments should be done not by programmers but by the marketing and product teams - give them more power to organize segments.
- There are a number of ready-made solutions on the market for organizing A/B segmentation in mobile applications, if you want to save time on building the infrastructure - for example, Leanplum.
- And you are very lucky: you are working on a project that people find interesting :-) Thank you all.