
Organization and use of segmentation in large mobile applications

One day your mobile application grows quite large, and it is used daily by ten thousand, a hundred thousand, a million people. The exact number does not matter; what matters is that there are a lot of them, they are real, and they are all different. What does this mean for you as a developer?

For one thing, it is now much scarier to press the “Submit” button. If you overlooked something, you cannot, as with a web application, sit up all night surrounded by Red Bull cans and pizza and fix everything: review on mobile platforms takes time, and on iOS it takes a whole week. A week is more than enough for a previously loyal user to stop opening your application.

Just as important, the time has come when “I like the way this screen looks” is no longer enough justification for the screen to be in the application.


In this article I will try to describe what we do so that a huge production application stays a huge production application.

A note: this material is not well suited to applications that do not need an Internet connection. But in our mobile age there are fewer and fewer of those.

Part 1. Segmentation über alles


Story one: We have integrated a large and very beautiful third-party service into our application. They have their own team, their own backend, even their own office in some sunny country where you can call and make an appointment. But at some point the whole service goes down for three days after a thorough refactoring which, by a funny coincidence, also broke backward compatibility. Yes, “write it yourself”, “don't deal with them anymore” and “ask them to fix everything as soon as possible” are all interesting thoughts to think over, but something has to be done right now, so that users who tap the usual buttons do not see a constant “Sorry, we are out to lunch” or, worse, information that has little to do with the truth.

Story two: You roll out a large and important feature. Of course you tested it properly, and yet something in it went wrong! Whom to scold and how, we will decide later.

Story three: To learn more about the users of your application, you send a lot of useful information to your dedicated log server. But who knew that there would suddenly be several million of them? Now your log server happily falls over every 10-15 minutes, and a new one will arrive in 2-3 weeks.

These are the scary stories, and there are many more that are not so scary.
For all of them there is one convenient and useful tool: segmentation.
In short, the flow of our application can be described in a few simple steps:

Now let's dwell in detail on the first and third steps and on the problems they have solved for us at various moments:

What is the common config? It is a set of application settings that is the same for all users.
What it contains:

Production Server Address

This is the entry point of your application. If you ever need to move to another domain or, for example, switch users over to a backup server, you can do it through the config. Unlike web applications, again, mobile (and desktop) systems can have quite a large number of versions alive at the same time, so a forced change of the production server (it happened to us, and it can happen to you) can be made without breaking clients that are already in use.

Thus the application stores exactly one static URL, the address of the config file, and a single URL is something you can manage to keep track of.
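A minimal sketch of how a client might pick the server address from such a config. It is written in Python for brevity, and the URLs, the `api_base_url` field name and the fallback order are assumptions made for illustration, not our actual format.

```python
import json

# The only URL compiled into the app (hypothetical address).
CONFIG_URL = "https://config.example.com/app-config.json"

# Used when no config has ever been downloaded (hypothetical address).
BUILT_IN_API_BASE_URL = "https://api.example.com"


def resolve_api_base_url(config_text, cached_url=None):
    """Pick the production server address from the downloaded config.

    config_text: raw JSON text fetched from CONFIG_URL, or None if the
    download failed. cached_url: the address saved from the last good config.
    """
    if config_text is not None:
        try:
            config = json.loads(config_text)
        except json.JSONDecodeError:
            config = None
        if isinstance(config, dict):
            url = config.get("api_base_url")  # hypothetical field name
            if isinstance(url, str) and url.startswith("https://"):
                return url
    # Config missing or malformed: keep working with what we had before.
    return cached_url or BUILT_IN_API_BASE_URL
```

Falling back to the last known address means that a broken or unreachable config does not by itself take the application offline.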

Minimum application version (optional / force)

In general, people do not like to update. Auto-updates on mobile platforms have improved the situation but have not fixed it entirely. In our case, with two-week release cycles, there are as a rule 10-15 versions in use simultaneously. But sometimes so-called breaking changes occur: fundamental changes that make working on older client versions impossible or uncomfortable. In that case this parameter tells the client either that “it would be nice to update” (the soft scenario) or that “you cannot continue without an update” (the hard one), and we show this to the user in the UI.
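The check itself can be sketched roughly as follows, assuming dotted numeric version strings of at most three components and two separate thresholds in the config; the function names and return values are hypothetical.

```python
def parse_version(text):
    """Turn '2.10.1' into (2, 10, 1) so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # treat '2.10' the same as '2.10.0'
    return tuple(parts)


def update_requirement(current, min_optional, min_forced):
    """Return 'force', 'optional' or 'none' for the running app version."""
    version = parse_version(current)
    if version < parse_version(min_forced):
        return "force"      # hard scenario: block work until the update
    if version < parse_version(min_optional):
        return "optional"   # soft scenario: show a dismissible prompt
    return "none"
```

Comparing tuples of integers rather than raw strings matters here: as strings, "2.9.0" would sort after "2.10.0".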

Log settings

For this we use a separate config file, which allows us to:

The first helps us keep log messages fairly compact and add the information we are missing only when necessary.

The second helps us balance the load on the log server. Since you already have a lot of users, even if only 1 percent of them send these messages to the server, that is already enough to give a fairly representative picture.

The third has a slightly different benefit. Sometimes crashes appear in our analytics whose circumstances are difficult to reconstruct from the stack trace. We then lower the log level to the minimum and simply attach the last few lines of the log to the crash report, and the next day we can reconstruct the users' actions and localize the problem (if anyone is interested, we achieved this with a combination of the CocoaLumberjack logger and HockeyApp analytics).
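The sampling and log-tail ideas can be illustrated with a small sketch. It does not use CocoaLumberjack or HockeyApp; the `min_level` and `send_percent` setting names, the level names and the `LogPolicy` class are all assumptions, and `bucket` stands for any stable per-device number from 0 to 99.

```python
from collections import deque

# Numeric ranks for the level names (the names are an assumption).
LEVELS = {"debug": 10, "info": 20, "warning": 30, "error": 40}


class LogPolicy:
    """Applies remotely configured log settings on the client."""

    def __init__(self, settings, bucket, tail_size=50):
        # settings: the parsed log config; bucket: stable number 0..99.
        self.min_level = LEVELS[settings.get("min_level", "error")]
        self.send_percent = settings.get("send_percent", 0)
        self.bucket = bucket
        self.tail = deque(maxlen=tail_size)  # last lines, for crash reports

    def record(self, level, message):
        """Keep the line locally; return True if it should go to the server."""
        if LEVELS[level] < self.min_level:
            return False
        self.tail.append(f"{level}: {message}")
        return self.bucket < self.send_percent

    def crash_attachment(self):
        """Text to attach to a crash report: the most recent log lines."""
        return "\n".join(self.tail)
```

With `send_percent` set to 1, only devices whose bucket is 0 upload their messages, while every device still keeps a local tail that can be attached to a crash report.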

Keys and other settings from third-party libraries

Yes, I understand that this makes people go “eww”, but nevertheless: in various ambiguous situations with these libraries we have had to re-create our accounts with the third party several times, and this, again, did not break old clients. In addition, these keys can quite well be “salted” to make them fairly secure. (We still remember that a determined attacker can pull statically defined keys out of the application binary, so this way of storing them is not so bad, especially since extracting an encryption algorithm is harder than extracting a single key.)
In addition, in the case of especially unpleasant breaking changes on the third party's side, you can sometimes afford to proxy the work with them through your own server, preserving the old format for old clients.

Global A/B parameters

Some decisions need to be made for a user who is not yet registered in the system (for example, the appearance of the registration window, the way registration proceeds, and much more). In this case the parameter can simply be the percentage of users for whom some functionality should be enabled or disabled. To decide whether a device falls into the test group, a good fit is some unique device identifier (as a rule, every operating system offers one) reduced to a two-digit hash.

In addition, the following concept turned out to be a very useful acquisition for us:
Devices and users are divided into “real” and “test”. To identify the latter we tried different approaches (for example the UDID, until its use was banned, then the advertising identifier, until restrictions were imposed on it; and in general such identifiers do not rule out collisions, so that one day a real user will see what they should not). In the end we settled on a simple scheme: a small utility, when launched on the device, writes a key to the device's encrypted storage, which every mobile system has (on iOS, for example, it is the Keychain). The main application checks for this key on startup and, if it is present, treats the user as a test user.
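One way to get such a two-digit value is sketched below. The choice of SHA-256 and the function names are assumptions for illustration; any hash that spreads identifiers evenly would do.

```python
import hashlib


def device_bucket(device_id):
    """Map a device identifier to a stable number from 0 to 99."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_test_group(device_id, enabled_percent):
    """True if the device falls into the first enabled_percent buckets."""
    return device_bucket(device_id) < enabled_percent
```

Because the bucket depends only on the identifier, a device stays in the same group between launches, and raising the percentage in the config only ever adds devices to the group.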

IMPORTANT: If you introduce this kind of separation, try to observe two things:

And now that we had test users, what did we use them for?

Now let's move on to the main segmentation config:
It is aimed at more pragmatic tasks, at what the community commonly regards as classic A/B testing.
This config is structured as a dictionary (for example, JSON) in the format id_name: {dictionary of parameters}.
Here we no longer need percentages or other branching points: as we remember, by this time the user is already registered in the system, so the server-side segmentation module returns parameters for this specific user based on a guaranteed-unique user_id. In doing so, it can rely on:

And much other data.
What is it used for?

Enable / Disable Segmentation

As described at the beginning of the article, opening up a big piece of functionality is pretty scary, so this lets us, first, disable a feature that does not behave as expected and, second, open the functionality to users gradually (open it for 5 percent and observe, then for 10, 20, 50 and, finally, for everyone).

Parameter segmentation

Each feature designed for A/B testing can carry a number of variables: the text in a pop-up window, an animation duration, the time between impressions, a button color, one of several possible behaviors. The more such parameters there are, the more experiments you can set up in search of the best solution; you are limited only by your imagination. On the other hand, the more testing the functionality then requires. (This testing effort, however, can be spread out over time: test the basic operation of the application once, and later just run a brief check on the test server of the specific parameter set you are about to roll out.)
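On the client, reading such a per-user config might look roughly like this. The feature id `rate_popup`, its parameters and the built-in defaults are hypothetical; the point of the sketch is that the client keeps safe defaults and accepts only parameters it knows, with the type it expects.

```python
import json

# Defaults compiled into the client; used when the server sends nothing.
DEFAULTS = {
    "rate_popup": {"enabled": False, "text": "Rate us!", "delay_seconds": 30},
}


def feature_params(config_text, feature_id):
    """Merge server-side parameters for one feature over built-in defaults."""
    params = dict(DEFAULTS[feature_id])
    try:
        server = json.loads(config_text).get(feature_id, {})
    except (json.JSONDecodeError, AttributeError):
        server = {}
    if not isinstance(server, dict):
        server = {}
    # Ignore unknown keys and values of an unexpected type.
    for key, value in server.items():
        if key in params and type(value) is type(params[key]):
            params[key] = value
    return params
```

For example, a config of `{"rate_popup": {"enabled": true, "delay_seconds": 10}}` switches the feature on and shortens the delay, while a malformed config leaves the feature off.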

So, summarizing this part, I would like to note the following points:

Part 2. Hey, are you there, alive?



Ask yourself: am I doing something pointless?

In the first part I wrote about what we do; in this one I will try to describe how we can actually tell whether we are doing the right thing.

Crash analytics.

The easiest way to understand that something is going wrong is to find that, after new functionality was introduced, the application simply stopped working: it started crashing merrily.
For this we use HockeyApp, because it has quite a handy toolkit for working with crashes and, moreover, integrates well with various deployment systems, so the information in it can be kept up to date automatically. In fact, there is now a very decent number of such tools for every taste, so choose for yourself. As I wrote above, working with it became even more pleasant once we introduced the ability to attach a piece of the device log to the crash log.

Monitoring sessions and payments.

Perhaps the main tool of interest to the business. Here too there are quite a lot of different tools, but we use a mixture of home-grown and existing ones, because not every business is willing to share payment information with third-party services. And rightly so. Existing systems let us quite comfortably detect both the shortest-term and the longest-term trends in the quality of the application. For an application with a wide range of functionality, the following metrics deserve to be watched:

Modern services allow this analytics to be segmented well (by device, by application version, by geolocation, and by a bunch of other attributes), which makes forecasts more accurate.

Monitoring reviews.

Unfortunately, this process is very poorly formalized and resists automation, but I think there is no need to explain its importance for any application, not only large ones. It is a good idea to give users a way to send a review or suggestion directly to the company, because among a large number of store reviews it can be missed.

Monitoring the life of features and stories.

For this, unfortunately, we could not find a sufficiently convenient, functional and simple public tool, so most of the application logs and a dedicated log server are devoted to it.
We use Hadoop (and several other services) for this, for the following reasons:

I believe that all these requirements are quite important, and I think there is more than one tool that satisfies them.

Perhaps the most important consequence of having such tools is the ability to build formal (and sufficiently accurate) metrics for assessing the quality and relevance of functionality. If a button is not pressed, it must be removed, even if it is very beautiful. If, after a super-convenient feature was introduced, users began to complain about the application more, use it less and pay less, then the feature is not super-convenient. And so on.

Summarizing all of the above, I would like to note the following points:

Source: https://habr.com/ru/post/255941/

