Record time: how we increased the launch speed of Mail.Ru Mail application on iOS

Launch speed is a critical factor for the long-term success of an application. It is especially important for applications such as Mail.Ru Mail, which run many times a day in order to quickly check for new emails in the Inbox.

The article focuses on optimizing a large-volume application code that has a relatively long history of development and has managed to accumulate during this time a lot of visible and hidden from the user functionality. The challenge was to reduce startup time without cutting back on functionality.

Content:
')

What can and should be optimized?

By launch time, we mean the interval between clicking on the application icon and the moment when the user has the opportunity to perform actions for which he starts the application. The most important thing is the subjective perception of time by the user, so you can go for various tricks that are not directly related to code optimization. But the objective indicators, that is, the measured time of the main stages of the launch, directly affect the subjective ones. In addition, they allow you to measure the impact of a change on the result.

We consider the case when the application is unloaded from RAM, since the opening of an already running application takes much less time. Ideally, you should limit the conditions even more, eliminating the influence of various operating system caches, but then you would have to reboot the device before each measurement, which takes more time and is more difficult to automate.

The entire launch of the application can be divided into two major stages: the first is from clicking on the icon to the first call of custom code (usually these are +load methods), the second is the direct execution of application code, including actions performed by system frameworks (run run loop, screen load) launch and main UI) and your own code.

Loading the application into memory

The first stage is the loading of the executable application code into virtual memory, the loading of the used system and user dynamic libraries, the matching of addresses pointing to various characters inside and outside the main executable file, the initialization of Objective-C runtime. Some system frameworks may have their own +load methods or constructor functions that are also executed at this stage.

A very detailed story about the first stage was at WWDC this year in a wonderful 406 Optimizing App Startup Time report, a brief squeeze from which you can read on the Use Your Loaf blog - Slow App Startup Times .

I will list the main recommendations.

Run the application with the environment variable DYLD_PRINT_STATISTICS (to work on iOS 9, you must also check the Dynamic Linker API Usage checkbox) to see how long the various actions of the first stage take. Here is an example of such statistics on the iPhone 5 under iOS 9.2.1:

 total time: 2.1 seconds (100.0%) total images loaded: 309 (304 from dyld shared cache) total segments mapped: 14, into 2352 pages with 140 pages pre-fetched total images loading time: 842.08 milliseconds (39.3%) total dtrace DOF registration time: 0.19 milliseconds (0.0%) total rebase fixups: 310,006 total rebase fixups time: 51.52 milliseconds (2.4%) total binding fixups: 376,990 total binding fixups time: 598.68 milliseconds (27.9%) total weak binding fixups time: 6.55 milliseconds (0.3%) total bindings lazily fixed up: 0 of 0 total time in initializers and ObjC setup: 639.21 milliseconds (29.8%) libSystem.B.dylib : 130.68 milliseconds (6.1%) libBacktraceRecording.dylib : 1.05 milliseconds (0.0%) libc++.1.dylib : 0.16 milliseconds (0.0%) CoreFoundation : 1.74 milliseconds (0.0%) CFNetwork : 0.02 milliseconds (0.0%) vImage : 0.00 milliseconds (0.0%) libGLImage.dylib : 0.15 milliseconds (0.0%) QuartzCore : 0.02 milliseconds (0.0%) libViewDebuggerSupport.dylib : 0.11 milliseconds (0.0%) libTelephonyUtilDynamic.dylib : 0.00 milliseconds (0.0%) CoreTelephony : 0.04 milliseconds (0.0%) MRMail-Alpha-Enterprise-Shared : 275.17 milliseconds (12.8%) MRMail-Alpha-Enterprise : 224.68 milliseconds (10.5%) total symbol trie searches: 473750 total symbol table binary searches: 0 total images defining weak symbols: 26 total images using weak symbols: 63

To shorten the total images loading time , use as few dynamic frameworks as possible. Usually most of them (about 200–300) are system frameworks that come with iOS, and their load times are already optimized. But if you add your own frameworks, it can negatively affect the launch speed of the application. By the way, standard Swift libraries are also considered user-defined.
The time of rebase fixups , binding fixups and ObjC setup affected by the number of Objective-C characters: classes, selectors, categories. For existing applications it is difficult to give recommendations that would not be associated with significant refactoring or cutting back on functionality. But if you have such an opportunity, you can try to write the maximum amount of code on pure Swift, that is, without using Objective-C runtime.

These recommendations can provide a good acceleration, and they should not be neglected. But it is much more interesting to consider the second stage, when the code that we write ourselves is executed and, therefore, we can profile, investigate and improve.

Search for optimization opportunities

First of all, it is necessary to determine key indicators by which we will evaluate the utility of optimizations. We decided to focus the main efforts on one of the most common scenarios: the user has already been logged into his account and, launching the application, enters the list of letters in the Inbox. In addition, we have identified several invisible to the user stages, which are part of the internal logic of the application, in order to better understand which of them takes most of the time. Here are some of them: loading application settings, loading accounts, loading a cached list of folders, appearing a view controller list, initializing background screens.

Time profiler

Of course, any competent iOS developer, when looking for performance problems, will first of all turn to the Time Profiler tool. And we also started with him. In our view, the task should have been easily solved in approximately the following sequence of steps: we look at the profiler, we notice that most of the time is spent on two or three, well, a maximum of ten individual calls, and we concentrate all efforts on optimizing them. But in the profiler we saw the following.

There are no clear lead times in Call Tree that you can easily get rid of. Basically, this is loading UI from xibs, creating a tree of UI objects, layout. It is clear that if the goal of launching the application is to see the UI, then we cannot simply discard this code.
If we exclude the UI boot code from consideration, the rest of the time is smeared with a very thin layer across all stages and launch sub-stages. From each individual change here, it was possible to squeeze 50 ms at best.
On the graph of CPU usage, it can be seen that the processor is not continuously used to 100% (or at least 50%, corresponding to single-threaded execution), in some places dips and pauses are visible. What happens at these moments, with the help of Time Profiler, you can find out only approximately by looking in the Samples list, which stack traces precede the pause. Most often it is input / output or interaction with system services via XPC.

^{CPU usage schedule}

However, the Time Profiler was still useful.

First, it gives an idea of what code inside the system frameworks, especially in the UI Kit, takes a lot of time. Based on this, we can make decisions about the use of certain means of the UI Kit and the UI initialization sequence.

Secondly, after we have identified the key events in the startup process, we can see what exactly is being done between these events. Without the use of Time Profiler, this is not always obvious, since the launch sequence is a complex interrelation of various subsystems and classes of business logic and UI, asynchronous callbacks and code running in parallel. We used this method: we find samples that interest us in which the event occurred in the Samples list, put a label in these places on the chart (Edit> Add Flag), select the area between the two labels and study Call Tree.

Another Time Profiler allows you to find functions and methods that are themselves quickly executed, but slow down the launch due to the large number of calls. To do this, tick the Invert Call Tree and Top Functions checkboxes in the Call Tree settings, arrange the list by Self Weight or Self Count field and go through the resulting list from top to bottom, excluding uninteresting calls using the Charge XXX to callers or Prune XXX and subtrees options. So you can, for example, find out that almost 228 ms are occupied by calls to the objc_msgSend function, that is, the overhead of calling methods in Objective-C, and 106 ms are occupied by calls to CFRelease — the overhead of managing memory. It is unlikely that this will be possible to optimize, but in each particular case there is a chance to find something else interesting. In our case, these were calls to +[UIImage imageNamed:] , since the creation of appearance configurations for all application screens took place at the start.

Logs embedded in the application

Time Profiler does not provide an easy way to measure the total time spent on a complex sequence of actions, consisting of asynchronous calls. The method described above — with Samples search and labeling — cannot be automated. Therefore, we decided to use a different approach - to place logs at critical points of the launch process. Each record of such a log consists of the name of the event, the absolute time since the beginning of measurements, the relative time elapsed from the previous event. An example of a log.

 load 0 0 main 44 44 did finish launching 295 250 did init BIOD 489 194 will load accounts 1145 655 ELVC view did appear 1663 518 did load accounts 1933 269 did load cached folders 2075 142 items from cache 5547 3471 ELVC did show initial items 7146 1599 did update slide stack 7326 179 stop 7326 0

How to choose the right place for the arrangement of logs? Considering that in the process of launching individual sub-steps can be carried out in parallel, it makes sense to pay more attention to those that take more time and the end of which you have to wait to continue the launch process. For example, downloading the list of messages from the database and loading the xib controller view can be done in parallel, but you cannot download the list of messages from the database until we open the database. To form such a critical path from the events that must be executed, you can use the Xcode feature, which allows you to see not only the current stack-trace, but all previous stack-traces in the sequence of asynchronous calls. Just put a breakpoint in the line that you think is the end of the startup process, switch the Debug Navigator to View Process By Queue mode - and get a critical path leading from main or application:didFinishLaunchingWithOptions: to the final goal.

Although logs have less granularity than Time Profiler data, they have several important advantages.

Measurements using logs can be automated. This will help to quickly assess the effect of the implemented change during development, as well as to make the launch speed check part of the Continuous Integration process.

^Grafana
You may have noticed that the execution time of the same code may differ quite significantly from launch to launch. The duration is affected by various optimizations built into the operating system itself (such as disk cache), as well as other processes running in parallel at the same time on the device. In order to get more stable and comparable results, we take measurements several times and take the average or median.
Logs can be copied to Google Sheets, analyzed and compared in any convenient way, and even build beautiful diagrams that clearly show the sequence and duration of various stages.

In profiling using logs there are pitfalls. Sometimes you will notice that one of the stages takes a disproportionately long time. Do not rush to blame him. First make sure that all this time really applies to this stage. Perhaps the completion of the stage is actually slowed down by waiting for dispatch_async on the main queue, and the main queue is busy at this time with other things. If you remove dispatch_async in this place, the duration of this stage can be reduced, but the total time for the execution of all stages will simply be redistributed and remain the same.

When searching for places to optimize, it is not enough just to look at how long a particular function is performed: you also need to pay attention to how the high-level application architecture is organized. It is important to understand the dependencies between the launch stages: can they be performed in parallel or only sequentially. It is possible that acceleration is achievable due to parallelization of consecutive stages or due to postponement of optional stages for a later time. Profiling logs can help with this.

Input Output

Reading files even from flash memory takes an order of magnitude longer than working with RAM or CPU, so it makes sense to minimize the number of disk operations at the start. The inhibition associated with reading data from the disk appears in Time Profiler as failures at best, and often in no way, so other methods should be used for tracking.

The I / O Activity tool (System Usage template) only works on real devices. It shows all system calls related to input / output, paths to opened files, and stack traces of places in the code from which the corresponding operations were performed.
If the first method does not suit you, similar information can be obtained using the usual breakpoint on the __open function. In this case, the name of the file being opened can be obtained using the LLDB p (char *)$r0 command (the name of the register in which the first call parameter is located will depend on the architecture ).

Layout and stuff

Some calls that put a load on the CPU are almost impossible to analyze with the Time Profiler: their stack traces do not always contain information about which part of the application they belong to. An example is a layout walk through a hierarchy view. Often their stack-trace consists only of system methods, but we would like to know for which view layout takes the most time. Such calls can be measured and correlated with the objects on which they work, using swizzling, by adding, before and after the corresponding call, a code that displays information about the objects being processed and the execution time of each call to the console. Having such a log, it is easy to build in Google Sheets a sign with the allocation of time spent on the layout of each of the views. We used the Swing Aspects library.

 static void LayoutLoggingForClassSelector(Class cls, SEL selector) { static NSMutableDictionary *counters = nil; if (!counters) { counters = [NSMutableDictionary dictionary]; } SEL selector = NSSelectorFromString(selectorName); [cls aspect_hookSelector:selector withOptions:AspectPositionBefore usingBlock:^(id<AspectInfo> info) { TLLOG(NL(@"lob %s %p"), class_getName([[info instance] class]), (void *)[info instance]); } error:nil]; [cls aspect_hookSelector:selector withOptions:AspectPositionAfter usingBlock:^(id<AspectInfo> info) { NSValue *key = [NSValue valueWithPointer:(void *)[info instance]]; NSNumber *counter = counters[key]; if (!counter) { counter = @(0); } counter = @(counter.integerValue + 1); counters[key] = counter; TLLOG(NL(@"loa %s %p %@"), class_getName([[info instance] class]), (void *)[info instance], counter); } error:nil]; }

Optimization

We formulated the conclusions that we have made for ourselves in the course of work to accelerate the launch in the form of tips. We hope they will be useful to other developers. Often, in the process of analyzing the collected performance data, we do not quite correctly interpret certain data. By taking a false mark, we are trying to correct something that does not really require corrections. Often, attempts to optimize code make it more difficult to understand and less reliable, the number of extreme cases to be taken into account increases, and so on. Therefore, when you are trying to optimize something, it is very important to do it not blindly, but having a clear measurement method, showing stable results and helping to evaluate the usefulness of each edit.

^{Premature optimization}

Before proceeding with the optimization of the launch time, it is useful to determine which start time limit is in principle achievable if all the code are removed from the application to the maximum, except for what is needed so that the application looks at least externally (it doesn't talk about preserving the functionality here). ). In our case, such a minimal application is a single screen containing a UINavigationController with a bar and buttons, a UITableView and several cells.

Test the version of the application that is as close as possible to the build for the App Store, disable all kinds of runtime checks, logs and asserts, turn on optimization in the build settings. In our case, the assembly for internal testing contains, for example, a swizzling of a large number of methods for running time-checking of their execution on the main thread. Naturally, this has a very large impact on the results obtained.

First of all, you should optimize the code that runs on the main thread, since the ultimate goal of the entire startup process is to show something on the screen, and this can only be done from the main thread. Background threads also should not be ignored, since the hardware parallelization capabilities are not limitless: on most devices only two cores are available, and very heavy code running in the background can significantly affect the overall launch time. We encountered this when we examined the influence of one of the analytics libraries on the launch time. After the library was initialized, it worked completely in the background thread, but despite this, turning it off gave a relatively large gain.

Studying how to reduce the time spent on building a UI, we found that considerable time is spent on loading, configuring and displaying cells in the list of letters. There are several reasons: there are many cells, each of them consists of a set of subviews, of which the most voracious are various icons and labels. Icons are relatively slow because loading images in principle is slower than other actions, and the slowness of labels is caused by the necessity of calculating their sizes depending on the content.

The icons that are displayed in the letter cell are indicators of the letter attributes (unread, flagged, presence of attachments), as well as action icons that appear during the swipe. We made the creation of the action pane lazy: it is not created until the swipe happens. We did the same with attribute icons - they will not be loaded if there are no letters with such attributes in the list.

, . , , : , , , , UI . — +[UIImage imageNamed:] . , imageNamed: , — . AppearanceConfigurator . , , , , . imageNamed: , : imageNamed: imageNamed:inBundle:compatibleWithTraitCollection: , , imageNamed: . -, UIImage , .

 + (UIImage *)imageWithBlock:(UIImage *(^)(void))block { MRLazyImage *lazyImage = [(MRLazyImage *)[self alloc] initWithBlock:block]; return (UIImage *)lazyImage; } - (UIImage *)image { if (!_image && self.block) { _image = self.block(); self.block = nil; } return _image; } - (void)forwardInvocation:(NSInvocation *)invocation { invocation.target = self.image; [invocation invoke]; } - (NSMethodSignature *)methodSignatureForSelector:(SEL)sel { return [self.image methodSignatureForSelector:sel]; }

UIAppearance , . .

UILabel , AsyncDisplayKit . ASDK — , UI UI, . ASDK , , . UILabel ASTextNode . , , . , , , SDK. ASTextNode : , — . . , , , UILabel ( , CoreText).

ASDK . , , , ASTextNode UILabel .

, , Time Profiler, — , xib. , , Interface Builder . , — ? xib . , , , , . 2010 Cocoa with Love . , , , .

, xib, , — UINib . UINib , .

, , Time Profiler . , keychain-, Touch ID, Pasteboard, : , . , , — UI.

application:didFinishLaunchingWithOptions: . , , , UI.

Core Data SQLite-, , Core Data . , UI, , , . — .

Swift ? Objective-C. Swift . Swift rebase/bind , Objective-C. , Swift . Swift- ( — ). , Swift , ABI.

, . , .

.
: , ( : , - , ).
— , .

, .

- iOS Human Interface Guidelines : « — . , ». Mail.Ru, HIG .
. , .
. , UI, . — , , , .

, . , . , , . , , , .

, , .

Jenkins, , CI. iPhone. : jailbreak . jailbreak, .

USB Jenkins, . ssh.
Jenkins , .
Jenkins , USB.
.

jailbreak . , jailbreak . xcodebuild App Store Release . ENABLE_TIME_LOGGER , .

 $XCODEBUILD -project MRMail.xcodeproj -target "$TARGET" -configuration "$CONFIGURATION" \ -destination "platform=iOS" -parallelizeTargets -jobs 4 \ CODE_SIGN_IDENTITY="iPhone Developer" \ MAIN_INFOPLIST_FILE="tools/profiler/Info.plist" \ GCC_PREPROCESSOR_DEFINITIONS='$GCC_PREPROCESSOR_DEFINITIONS ENABLE_TIME_LOGGER=1 DISABLE_FLURRY=1'

Info.plist , iTunes File Sharing: jailbreak - Documents. Flurry : , Flurry , , . jailbreak ios-deploy :

 APP_BUNDLE="$PROJECT_ROOT/build/${CONFIGURATION}-iphoneos/$PRODUCT.app" $IOS_DEPLOY --bundle "$APP_BUNDLE" --id "$DEVICE_ID" \ --noninteractive --justlaunch

jailbreak .app .ipa ssh ( — localhost , ssh ). ipainstaller Cydia.

 APP_BUNDLE="$PROJECT_ROOT/build/${CONFIGURATION}-iphoneos/$PRODUCT.app" cd "$PROJECT_ROOT/build" rm -rf Payload; mkdir -p Payload cp -a "$APP_BUNDLE" Payload/ rm -f MRMail.ipa; zip -r MRMail.ipa Payload SCP_TO_DEVICE MRMail.ipa root@localhost: SSH_TO_DEVICE ipainstaller MRMail.ipa

, , . , , .

. - . jailbreak idevicedebug , USB — ifuse .

 for i in $(seq 1 $NUMBER_OF_RUNS) do $IDEVICEDEBUG --udid "$DEVICE_ID" run "$BUNDLE_ID" >/dev/null 2>/dev/null & COMPLETION_PATH="$MOUNTPOINT_PATH/$COMPLETION_INDICATOR" LOG_PATH="$MOUNTPOINT_PATH/$LOG_NAME" for j in $(seq 1 5) do sleep $MOUNT_SECONDS_PEDIOD $UMOUNT "$MOUNTPOINT_PATH" 2>/dev/null || true $MKDIR "$MOUNTPOINT_PATH" $IFUSE --documents "$BUNDLE_ID" --udid "$DEVICE_ID" "$MOUNTPOINT_PATH" sleep $AFTER_MOUNT_SECONDS_PEDIOD if [ -f "$COMPLETION_PATH" ] && [ -f "$LOG_PATH" ]; then break fi done RESULT_PATH="$LOGS_FOLDER_PATH/$(date +"%d-%m-%Y-%H-%M-%S").csv" (set -x; cp "$LOG_PATH" "$RESULT_PATH") $UMOUNT "$MOUNTPOINT_PATH" done

jailbreak . open Cydia, ssh.

 SANDBOX_PATH=`SSH_TO_DEVICE ipainstaller -i "$BUNDLE_ID" | grep '^Data: ' | awk '{print $2}'` SANDBOX_PATH="${SANDBOX_PATH//[$'\t\r\n ']}" COMPLETION_PATH="$SANDBOX_PATH/Documents/$COMPLETION_INDICATOR" LOG_PATH="$SANDBOX_PATH/Documents/$LOG_NAME" for i in $(seq 1 $NUMBER_OF_RUNS) do SSH_TO_DEVICE open "$BUNDLE_ID" sleep $MOUNT_SECONDS_PEDIOD for j in $(seq 1 5) do if SSH_TO_DEVICE test -f "$COMPLETION_PATH" && \ SSH_TO_DEVICE test -f "$LOG_PATH" then break fi sleep $AFTER_MOUNT_SECONDS_PEDIOD done RESULT_PATH="$LOGS_FOLDER_PATH/$(date +"%d-%m-%Y-%H-%M-%S").csv" (set -x; SCP_TO_DEVICE root@localhost:"$LOG_PATH" "$RESULT_PATH") done

Ruby , ( , , , , ) Influxdb , Grafana .

. , ? , (, 10000). — 270, 10 , , .

, CI, - , - iOS. , . .

^Statistics

results

, . — , .

Source: https://habr.com/ru/post/307778/

All Articles