
Speeding up the launch of iOS applications, using Mail.Ru's iOS Mail as an example



Nikolay Morev (Mail.Ru)


Today I will talk about our experience in speeding up the launch time of the application, and what it taught us.


Our application is an e-mail client that lets you work with any mailbox, not only those on mail.ru. We have been in the App Store since 2012, although the application's history is a little longer and goes back to the Mail.Ru Agent application.
For practically all this time we have been around 30th place in the ranking of the most popular free applications in the Russian App Store, and in 1st or 2nd place in the Productivity category. Today we will also talk about performance, though not quite that kind. For the international audience we make the same application under a slightly different design, called MyMail. And our users sometimes notice this.





Our users constantly point out problems with the application and tell us what really matters to them. Here are some examples of App Store reviews:



In addition, the analytics data we collect confirmed that there was a problem with launch time.



Here we see that for most users the launch time was about 4 seconds, even a little more. So recently we decided to pay more attention to the quality of the product rather than to new functionality: we started increasing test coverage and working on reducing the application's size, optimizing launch speed, and optimizing network usage.

First, let's look at why this problem became relevant for us and why launch speed started to worry us. You can compare these factors against your own application and decide whether this work is worth doing at all.



The first factor is that our application has a usage pattern in which users launch it many times a day, and of course, if it starts slowly, that irritates everyone.

The second reason is the familiar answer to many questions in development: "it has historically been this way." I would classify performance problems as technical debt. These problems accumulate gradually and imperceptibly as new functionality is added, and sometimes even intentionally, to cut development time; I think everyone knows such situations. Conversely, it hardly makes sense to optimize launch speed if your application is not launched that often.

Another reason is the lack of continuous performance monitoring. Accumulating technical debt is, as we all know, a natural process: new code that can affect launch speed is constantly added to the application. Part of this code really does need to run during startup, things like configuring a logging library or starting a crash-reporting library. And part of it, it turns out, ends up in the startup path by accident, i.e. by mistake. For example, historically our application configures the appearance of all its screens at startup, even those that are not shown right at the beginning.

All of this is complicated by the fact that each individual change increases start time by a very small amount. Such a regression is impossible to notice in manual testing, and even with special tools like a profiler we may miss it, because the measurement error in the profiler is larger than the regression introduced.



Here is a graph of launch speed that we have been building over the past few months. It illustrates exactly this problem: how, little by little, launch time grows with each new commit. This graph was one of the results of our work on launch speed, and later I will explain how you can build one for your own application.



But first, let's talk about how to structure the process of improving launch speed itself.

Everyone knows the main rule of optimization: premature optimization is the root of all evil. So before you start, you need to answer the main questions: what exactly are we optimizing; how will users feel the effect of our optimization; how will we know whether a change achieved its goal; and is optimization even possible in the first place, i.e. what is the maximum amount by which launch speed can be improved, given that it may depend not only on your code but also on external factors you cannot influence. Let's answer these questions in order.

What did we optimize? We chose the main, most frequent startup scenario: the application has been unloaded from memory, the user is already logged in to their account, and on launch they land on the list of messages in the Inbox folder. It looks like this:



Next: the effect users should feel.



As a result of all the optimizations, the user should stop feeling that the start is sluggish. To achieve this we approach it from two sides: we try to reduce the launch time itself, and in addition we try to improve the subjective perception of the start time.

Here I will talk only about the technical part, how we improved the time itself; in my article, which I will link to later, you can find several techniques for improving subjective perception.

Next: how do we measure that an optimization had an effect?



While searching for places that could be optimized, we used Time Profiler. To assess the overall effect of a change, we used logs embedded in the application. Why not Time Profiler? Because if you optimize or remove some small piece of the application, it is far from guaranteed to affect the total start time. And naturally, to make the measurements as useful as possible, we take all of them on the slowest device we have, and never on the simulator.



And we answered the last question (what is the theoretical limit of optimization?) as follows. We created a simple test application with minimal functionality: literally the Xcode Single View Application template, plus screens with a header, a message list, and several cells imitating the list of messages. On this application we measured the time below which we cannot, in principle, go. And we realized that, in theory, we had about two seconds of room for optimization.

Now let's get to the optimization itself, starting with the first stage of launch.



The first stage is the time that passes from clicking on the application icon to transferring control to our own code. In fact, at this stage a lot of things are happening, and it may well take a considerable amount of time.



The bad news is that for the first stage you will see almost no data in the profiler; the good news is that there are still ways to influence this time.



At WWDC this year there was an excellent session about the first stage; it discussed in detail what exactly happens there and gave recommendations on what we can do about it.

What happens here? iOS loads the application's executable code into memory and performs the necessary manipulations on it: rebasing the pointers inside our binary, binding pointers to external libraries, and checking the signatures of all executable files; then the load methods and static constructors are executed. That is the very first code that is ours rather than the operating system's. As an example, I showed a diagram of how this breaks down by stage in our application. You can get the same data for your application using the DYLD_PRINT_STATISTICS environment variable in Xcode. Accordingly, the main recommendation for speeding up the first stage is to shrink these stages. How?
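For reference, this is roughly what that looks like: the variable is set in the Xcode scheme (Edit Scheme, Run, Environment Variables), and dyld then prints a breakdown to the console at launch. The numbers below are purely illustrative, not our application's figures:

```
DYLD_PRINT_STATISTICS = 1

Total pre-main time: 1.4 seconds (100.0%)
         dylib loading time: 490 milliseconds (35.0%)
        rebase/binding time: 240 milliseconds (17.1%)
            ObjC setup time: 160 milliseconds (11.4%)
           initializer time: 510 milliseconds (36.4%)
```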



This is a slide I cut out of the WWDC session; it sums up all the recommendations in short: for the application to launch faster, it simply needs to do less at launch.

What other recommendations?



Reduce the number of dynamic frameworks in your application. Why? Because they load much more slowly than system frameworks: the loading of system dynamic frameworks is already pre-optimized by the operating system. Around 5 is cited as the optimal number of your own dynamic frameworks.

In our application we have only one dynamic framework, and we added it mainly to share code between the application and its extensions and to reduce the application's size by not duplicating that code. But if launch speed were our only concern, we could have abandoned dynamic frameworks altogether.

By the way, if you use Swift, it immediately adds several dynamic frameworks of its own, which also count toward this limit. So using Swift adds a certain overhead at startup.

The stages marked here as rebase fixups and binding fixups are affected by the number of Objective-C symbols in your application, so the main recommendation from the session is to write larger classes and larger methods. Or switch to Swift, where addresses are resolved statically and these steps are not needed, or are at least shortened.

Naturally, for an existing large application this is not a very useful recommendation: it would mean a lot of refactoring, retesting a pile of code, and the readability of the code would naturally suffer. So even for new applications I would not recommend this method of optimization.



The second stage begins once we receive control from the operating system. Here we have much more room to act, because we can change our own code, and naturally this is where we started using Time Profiler for investigation. I will not explain what Time Profiler is.



Time Profiler is a very cool and powerful tool and it helped us a lot, but here I will list a few problems and shortcomings it could not solve for us.


First, we did not find any obvious bottlenecks in the application that could simply be cut out to make everything better at once. This is a well-known development problem called "uniformly slow code", and it is a consequence of the right approach to development: first make the code working and readable, then think about optimization. Another cause of this problem can be peculiarities of the platform in use; for example, here we see that the overhead of Objective-C method dispatch is quite tangible.



The second problem with Time Profiler: in some cases we see heavy parts of the call tree, but it is not always possible to tell from them which particular view a given call belongs to, which part of the application it is. We observed this mainly when analyzing layout and the loading of views from XIBs. A XIB can contain a rather complex hierarchy, and it is not always clear which view in it is loading slowly.



The next problem is the dips on the CPU usage graph. Ideally, for everything to run as fast as possible, the main thread should be loaded at 100% the whole time: it should always be doing something. But on the graph we always see dips, smaller or larger, and Time Profiler tells us practically nothing about what causes them. There are two main reasons for them:



Another Time Profiler problem I mentioned earlier: it is hard to judge the overall effect of an optimization, because the spread between measurements can be quite large. Here I ran measurements on the same application without any changes, and you can see that the time varies greatly from launch to launch:



What else can you look for when searching for places to optimize?



The profiler gives us a lot of useful information, but our psychology makes it very easy to go down the wrong path. During analysis we tend to pay attention not to the places that really take a lot of time and could give a big win, but to those that are easy to see, easy to understand, and interesting to work on.

For example, while looking for places to optimize, I found a spot where calls to the pasteboard at the launch stage took as much as 20 ms. I started thinking: "How can I get rid of this? Maybe replace the pasteboard with something else?" But really you should step up a level and ask why we are doing this at all. In our case it happened while sending launch statistics, and we could simply move that reporting to a slightly later stage without changing anything functionally.

Naturally, we first of all want to reduce the amount of work on the main thread, and that is what we pay attention to first. But we should not forget the background threads either, because the hardware's capacity for parallelism is not unlimited. In particular, we ran into a situation where one of the libraries we initialize at startup immediately went off to a background thread and did some work there. At first we did not even look at what it was doing; we simply tried turning it off and seeing what would happen. And that gave quite a significant effect.

Also, the first thing that catches the eye in Time Profiler is that most of the time is spent drawing the UI and doing layout. But from the traces it is not always clear what exactly this UI time goes to, because there are opaque system calls (CA::something, render, and the like), and these calls can relate to any drawing on the screen. Practice shows that the most expensive parts of the UI are drawing labels, because computing their size and rendering them is relatively hard, and images of any kind, because they have to be read from disk and decoded.

From the above follows a conclusion: if you want to reduce launch time, do everything as lazily as possible. What does that mean? Do not create or configure any screens and views that are not shown immediately after launch. This is perhaps the most effective way to speed up an existing large application with no obvious bottlenecks.

For example, what did we make lazy in our application? We made the loading of images lazy when configuring the appearance of secondary screens; we removed the intermediate launch screen; we removed the creation of the background screens that live in the sidebar; and much more. This rule applies not just to the UI but to any logic: if some manager or service is initialized at application start, consider whether it can be postponed until the main user interface appears. Quite possibly there will be no functional difference.
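The "postpone until first use" idea can be sketched with Swift's `lazy` properties. The class names here are hypothetical stand-ins, not the real Mail.Ru code: the point is only that the expensive initializer does not run at launch.

```swift
import Foundation

// Hypothetical stand-in for expensive setup work (reading images from
// disk, styling secondary screens, etc.).
final class AppearanceConfigurator {
    static var initCount = 0
    init() {
        AppearanceConfigurator.initCount += 1
        // imagine: load images, configure appearance for secondary screens...
    }
}

final class AppServices {
    // Not built at launch; built on first access, e.g. when the settings
    // screen is about to be shown for the first time.
    lazy var appearance = AppearanceConfigurator()
}
```

Accessing `appearance` the first time pays the cost once; before that, launch pays nothing.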



And a few words on the controversial topic of building UI in Interface Builder versus in code. Strangely enough, XIBs are usually not the problem: creating an equivalent UI in code is only very slightly faster, and there are cases where it is even slower. Here is a link to a rather old blog post where this comparison was made. If you wish, you can download the test project, although it will take some effort to bring it up to the latest Xcode, since it was written in 2010, and see for yourself what is slower and what is faster.



Input/output. Reading from and writing to flash memory on modern devices is very fast, a few milliseconds to tens of milliseconds, so it is not always worth worrying about; but it happens that your code or third-party code misuses it and opens too many files at launch. For example, we discovered such a problem with the Flurry analytics framework, and in our own code that loads images to customize the application's appearance. Time Profiler will not show you such places; at best you will see small dips on the CPU graph. Instead you can use another instrument, I/O Activity, which lists all I/O operations and the names of the corresponding files. From a file name it is then quite easy to tell which part of the application reads it.

Similar information can be obtained not only with the I/O Activity instrument but also with a simple breakpoint on the open function. As for system frameworks and XPC, which I mentioned earlier, you can track them by paying attention to the dips on the CPU graph: in the profiler you open the Call Samples view, which lists all stack traces, and look at which calls precede the dip. That way you can work out which call leads to the delay.
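As a sketch, such a breakpoint on `open` can be set from the lldb console (this is a generic lldb technique, not the exact workflow from the talk; the trailing `1` assumes it is the first breakpoint of the session):

```
(lldb) breakpoint set --name open
(lldb) breakpoint command add --one-liner "bt 5" 1
```

Each time any file is opened, lldb stops and prints the top five stack frames, which is usually enough to see who opened it; type `continue` to resume.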



When Time Profiler does not provide enough information... I gave the layout example earlier; in such cases you can get more detail by swizzling the layoutSubviews method in all classes. I will not explain what swizzling is; Objective-C lets us do it easily. Into the swizzled layoutSubviews we simply insert logging: how long the call took, plus a pointer to the object being laid out, printed to the console. Then we copy all of this into a spreadsheet in Google Sheets and analyze it. And if after logging we do not terminate the application but pause in the debugger, we can roughly work out which views take the longest to lay out.



The optimization-search methods described above have a big drawback: they do not let you say with confidence whether a small change produced an overall improvement, because the launch sequence of a large application can be quite complex, an interweaving of different callbacks on different threads, and so on. Something you removed in one place may simply move to a later stage of loading, or give no improvement at all because its slot is filled by waiting for other work. This problem shows up especially clearly when fixes yield fairly small gains. So we arrive at the need to automate launch-time measurement and to perform a large number of measurements, so that we can quote a median time together with a known level of measurement error.

Of course, Time Profiler is not an option here: it is hard to automate, and the large amount of information it provides is not needed for this task. Instead we added debug logs to the application itself, which write the times of the various launch phases to the console and to a separate file. The logs look something like this:



Here we picked out some key points on the application's critical launch path. At each of these points we log the absolute time since launch started and the time elapsed since the previous stage. We used these logs not only to automate measurements later, but also while searching for places to optimize, alongside Time Profiler, because sometimes it is useful simply to get an idea of how much time the larger stages take; that tells us which stage deserves more attention in Time Profiler. And from such logs you can even build nice diagrams in Google Sheets, which show everything clearly:
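Coming back to the logging itself: a minimal sketch of such a phase logger might look like this (the names and format are illustrative, not the real Mail.Ru code; the real version also wrote to a separate file).

```swift
import Foundation

// Records key points of the critical launch path: for each mark, the
// absolute time since launch start and the delta since the previous mark.
final class LaunchTracer {
    private let start: Date
    private var previous: Date
    private(set) var entries: [(phase: String, total: TimeInterval, delta: TimeInterval)] = []

    init() {
        let now = Date()
        start = now
        previous = now
    }

    // Call at each key point, e.g. mark("didFinishLaunching").
    func mark(_ phase: String) {
        let now = Date()
        entries.append((phase: phase,
                        total: now.timeIntervalSince(start),
                        delta: now.timeIntervalSince(previous)))
        previous = now
    }

    // Console lines in the "absolute time + time since previous stage" style.
    var lines: [String] {
        entries.map { "\($0.phase): \(Int($0.total * 1000)) ms total (+\(Int($0.delta * 1000)) ms)" }
    }
}
```

Pasting the output of `lines` into a spreadsheet is enough to build the per-stage diagrams mentioned above.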



For example, this diagram shows how the time of the various stages was redistributed after one particular change.



That is, without such measurements you might think you had made an improvement, when in fact the time had simply been redistributed.

Or diagrams like these, which show the sequence of the different stages of the launch:



Looking at them, you can think about which places could be parallelized, where there are unnecessary dependencies between launch stages, and so on.

Let's talk about optimization.



There is a lot of talk in the developer community about Continuous Integration, TDD, and other useful practices for continuously monitoring application quality, but for some reason there is very little information on monitoring performance. We tried to fill this gap, and we consider one of the main achievements of this work to be a system that lets us continuously monitor launch time during development. Such a system solves the main problem that made us take this on in the first place: with this graph we can now clearly see how a given change affected launch speed, and we can react to those signals. The feedback loop has shrunk dramatically: where we previously learned from users that something was slow, we now see it immediately.

Naturally, as with many other useful development practices, the benefit of such a system only becomes apparent as the application evolves; at the very beginning you may not see why it is needed.

I'll tell you briefly how it is technically implemented.



For each commit, a job runs on Jenkins. It builds the application in the release configuration, with the profiling logs enabled and with the application terminating itself at the final stage, the point at which we consider the application fully launched. This build is then launched 270 times on a device dedicated to the task; at the moment it is an iPhone 5s running iOS 9.

You probably wonder where the number 270 came from. Obviously, to reduce the error this number should tend to infinity, but then each run would take infinite time. So we took 10 thousand measurements and computed the required number of launches using the sample-size formula for a normal distribution, targeting an error of about 10 ms. Because of that error, everything on our chart still jumps around a little.
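That calculation can be sketched as follows. The formula is the standard sample-size estimate n = (z * sigma / e)^2; the sigma value in the usage note is illustrative, since the talk does not give the actual figures from the 10,000-run calibration.

```swift
// Sample-size estimate for a normal distribution:
//   n = (z * sigma / e)^2
// where z is the quantile for the desired confidence level (1.96 for 95%),
// sigma the standard deviation observed in the calibration runs, and
// e the acceptable error of the mean.
func requiredRuns(sigma: Double, error: Double, z: Double = 1.96) -> Int {
    let r = z * sigma / error
    return Int((r * r).rounded(.up))
}
```

For example, a hypothetical sigma of about 83 ms and a target error of 10 ms give roughly 265 runs, in the neighbourhood of the 270 we settled on.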

By the way, going back to the graph, you can see the moment when we switched from 10 measurements to 270. The bottom line shows the minimum start time across all launches; accordingly, when we increased the number of runs, the minimum became lower.

Then, once the 270 launches are done, we process the data from all of them, compute their statistical characteristics, save them to InfluxDB, and plot the graphs from there.
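As an illustration of the kind of per-batch statistic involved: the median is robust to outlier launches caused by unrelated background activity on the device. (The real processing was done with shell scripts; this Swift sketch only shows the computation.)

```swift
// Median of a batch of launch times (e.g. in milliseconds).
// Robust to the occasional outlier run, unlike the mean.
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "need at least one sample")
    let s = samples.sorted()
    let mid = s.count / 2
    return s.count % 2 == 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid]
}
```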

You can find concrete examples of the scripts in my article; there is really nothing complicated there, the bash scripts are literally 10 lines each. Here I will only cover the main points: which tools we used.



As you all know, iOS is a closed system, so there are two options for automating tasks like installing the application, launching it, and fetching results from the device. We can either work with the undocumented USB protocol that Apple itself uses in its tools, or simply jailbreak the device and, like civilized people, ssh to it and launch the application with a single command. We settled on the latter, of course, because it is much simpler, more reliable, and more flexible. We do not need to tie the test phone to a specific build agent or connect it over USB: the phone just sits on one developer's desk, and any Jenkins agent can run measurements on it. Or if a developer needs to run something, they simply do.

Pitfalls of this approach that emerged after some time in operation:



Yes, we can now see that launch time increased at some point during development, but unfortunately the moment of the jump does not always coincide exactly with the commit that introduced the regression. This is due to measurement error and to various external factors: some system process, or something else we do not yet know about, may have been running on the device at that moment.

And even once we have identified the commit where the degradation occurred, fixing it is not so simple: you still have to do investigative work in the profiler, run comparison experiments, and analyze the code. Of course, I would love to have a methodology that shows exactly where runtime behaviour changed when a commit landed, something like a hybrid of Time Profiler and diff, but unfortunately we know of no such tool.

It also happens that performance degrades because of an update to some third-party library, and there is little we can do about that.

I will list the main conclusions we drew from all this work, the ones I have tried to convey in this talk.





In conclusion, I will show the results of the work from the product point of view rather than the developer's. Here is the application launching before and after the changes (video demonstration).



The video is slowed down 2x for clarity, and we can see that it has become better: modestly, but better. It is difficult to compute an exact figure for the speed-up we achieved, because the work went on for a long time and other unrelated changes landed in parallel. Besides, the measurement methodology itself was developed as we went, so there are no clean initial figures; but a rough before-and-after comparison shows that we reduced the launch time by about 30%.



Our analysts also have this nice statistic: the number of users whose launch took less than 2 seconds grew 10 times over this period. It may seem odd: how could it grow 10 times if launch only improved by about a third? But if we compute the weighted average across all user groups, we again get an improvement of roughly 40%, which matches the Time Profiler data.

Well, the most important metrics for mobile developers, retention and user satisfaction, also improved slightly. Here are the retention figures:



And the trend of negative reviews over time:



Although it is hard to draw conclusions from such small fluctuations, it seems to us that the acceleration work contributed here as well.

Everything I did not have time to cover in today's talk can be found in my article on this topic.


Contacts


→ github
→ twitter

This report is a transcript of one of the best talks at Highload++, a professional conference for developers of high-load systems, specifically from the "Mobile application performance" section.


Source: https://habr.com/ru/post/328908/

