Application Performance
An Android application runs on a mobile device with limited computing power, limited memory, and a battery with limited life, so the application must be efficient. Battery life alone is a good reason to optimize, even if the application already seems to run fast enough: battery life matters a great deal to users, and the Android platform makes it easy for them to see when an application is draining it.
Although this document mostly describes micro-optimizations, those alone will almost never make or break your application. Choosing the right algorithms and data structures should always be your first priority, but that topic is outside the scope of this document.
Intro
There are two basic rules for writing efficient code:
- Don't do work that doesn't need to be done.
- Don't allocate memory if you can avoid it.
Optimize wisely
This document describes Android-specific micro-optimizations, so it assumes that you have already used a profiler to determine exactly which code needs to be optimized, and that you already know how to measure the effect of any changes you make. Engineering time is limited, so it is important to know you are spending it wisely.
It also assumes that you have already chosen the best algorithms and data structures, and have considered the performance impact of your API decisions. Using the right data structures and algorithms will do more for performance than any of the advice here, and thinking about the performance impact of your APIs will make it easier to switch to a better implementation later (this matters more for library code than for application code).
One of the trickiest problems you will face when micro-optimizing an Android application is that it is likely to run on many different kinds of hardware. Different versions of the virtual machine run on different processors at different speeds. It is not even generally the case that you can simply say "device X is n times faster/slower than device Y" and scale the results to other devices. In particular, measurements on the emulator tell you almost nothing about performance on any real device. There is also a huge difference between devices with and without a JIT: the best code for a device with a JIT is not always the best code for a device without one.
If you want to know how your application behaves on a given device, you need to test it on that device.
Avoid creating unnecessary objects.
Object creation is never free. A generational garbage collector with per-thread allocation pools can make allocation cheap for temporary objects, but allocating memory is always more expensive than not allocating it.
If you allocate objects in a user-interface loop, you force periodic garbage collection, creating little "stutters" that the user can see. The concurrent garbage collector introduced in Gingerbread helps here, but unnecessary work should still be avoided.
Thus, you should avoid creating object instances you don't need. Here are a couple of examples of things that can help:
- If you have a method that returns a string, and you know the result will always be appended to a StringBuffer anyway, change the signature and implementation so that the method appends directly, instead of creating a short-lived temporary object.
- When extracting a string from input data, try to return a substring of the original data instead of creating a copy. A new String object is created, but it shares the char[] with the original data. (The trade-off is that if you use only a small part of the original input, you keep all of it in memory anyway if you follow this advice.)
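The first example above can be sketched as follows. All the names here are hypothetical, and StringBuilder is used in place of StringBuffer (the same pattern applies to either):

```java
public class LabelFormatter {
    // Creates a short-lived temporary String on every call.
    static String format(int id) {
        return "item-" + id;
    }

    // Appends directly into the caller's builder, avoiding the
    // temporary String entirely.
    static void formatTo(StringBuilder sb, int id) {
        sb.append("item-").append(id);
    }
}
```

A caller that is building up output would invoke `formatTo(sb, id)` rather than `sb.append(format(id))`.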
A somewhat more radical idea is to slice a multidimensional array into parallel one-dimensional arrays:
- An array of int is much better than an array of Integer objects. This generalizes: two parallel arrays of int are also much more efficient than one array of (int, int) objects. The same goes for any combination of primitive types.
- If you need to implement a container that stores (Foo, Bar) pairs, remember that two parallel arrays Foo[] and Bar[] are generally much better than a single array of (Foo, Bar) objects. (The exception is when you are designing an API for other code to use; in those cases it is usually better to trade a little speed for a good API design. But in your own internal code you should try to be as efficient as possible.)
Generally speaking, avoid creating short-lived temporary objects if you can. Fewer objects created means less-frequent garbage collection, which has a direct impact on the user experience.
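The parallel-array idea can be sketched like this (all names hypothetical): instead of an array of small pair objects, keep two primitive arrays and index them together.

```java
public class Points {
    // Object form: one allocation per point, plus an array of references.
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long sumAll(Point[] pts) {
        long sum = 0;
        for (Point p : pts) {
            sum += p.x + p.y;
        }
        return sum;
    }

    // Parallel-array form: two allocations total, no per-element objects,
    // and no pointer chasing inside the loop.
    static long sumAll(int[] xs, int[] ys) {
        long sum = 0;
        for (int i = 0; i < xs.length; ++i) {
            sum += xs[i] + ys[i];
        }
        return sum;
    }
}
```

Both methods compute the same result; the parallel-array version simply avoids creating a Point object per element.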
Performance myths
Previous versions of this document made various misleading claims. Let's address some of them here.
On devices without a JIT, invoking methods via a concrete type rather than an interface is slightly more efficient. (So, for example, calling HashMap methods through a HashMap reference is slightly cheaper than through a Map reference, even when it is the same object.) It was not 2x faster, though; the actual difference was more like 6%. Moreover, with a JIT the two are effectively indistinguishable.
On devices without a JIT, caching a field access in a local variable is about 20% faster than repeatedly accessing the field. With a JIT, field access costs about the same as local access, so this optimization is not worthwhile unless you feel it makes your code easier to read. (The same is true of final, static, and static final fields.)
Prefer static to virtual
If you don't need to access an object's fields, make the method static. Invocations will be about 15-20% faster. This is also good practice, because you can tell from the method signature that calling the method can't alter the object's state.
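For instance (a hypothetical utility, not from the original text), a helper that never touches instance state can simply be declared static:

```java
public class TextUtil {
    // Reads no instance fields, so it is declared static: the call is
    // cheaper than a virtual invocation, and the signature itself
    // documents that no object state can be modified.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```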
Avoid internal accessor methods
In native languages like C++, it is common practice to use getters (e.g. i = getCount()) instead of accessing the field directly (i = mCount). This is an excellent habit in C++, because the compiler can usually inline the access, and if you later need to restrict or debug field access, you can add the necessary code at any time.
On Android, this is a bad idea. Virtual method calls are expensive, much more so than instance field lookups. Following common object-oriented practice and having getters and setters in your public interface is reasonable, but within a class you should always access fields directly.
Without a JIT, direct field access is about 3x faster than invoking a trivial getter. With the JIT (where field access costs the same as local access), direct field access is about 7x faster than invoking a trivial getter. This is true as of Froyo, but should improve in future releases once the JIT inlines getter methods.
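A minimal sketch of the pattern (a hypothetical class): keep the getter for external callers, but use the field directly inside the class:

```java
public class Counter {
    private int mCount;

    // Accessor for code outside the class: normal OOP practice.
    public int getCount() {
        return mCount;
    }

    // Internal code touches the field directly rather than calling
    // getCount(), avoiding a virtual method call per access.
    public void incrementTwice() {
        mCount++;   // direct field access
        mCount++;
    }
}
```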
Use static final for constants.
Consider the following declaration at the beginning of the class:
```java
static int intVal = 42;
static String strVal = "Hello, world!";
```
The compiler generates a class initializer method, called <clinit>, that is executed when the class is first used. The method stores the value 42 into intVal, and extracts a reference from the classfile string constant table for strVal. When these values are referenced later on, they are accessed with field lookups.
Here's how we can change this behavior with the “final” keyword:
```java
static final int intVal = 42;
static final String strVal = "Hello, world!";
```
The class no longer requires a class initializer method, because the constants go into static field initializers in the dex file. Code that refers to intVal will use the integer value 42 directly, and accesses to strVal will use a relatively inexpensive "string constant" instruction instead of a field lookup. (Note that this optimization applies only to primitive types and String constants, not arbitrary reference types. Still, it is good practice to declare constants static final wherever possible.)
Use the enhanced for loop syntax
The enhanced for loop (also sometimes known as the "for-each" loop) can be used for collections that implement the Iterable interface, and for arrays. With collections, an iterator is allocated to make interface calls to hasNext() and next(). With an ArrayList, a hand-written counted loop is about 3x faster (with or without a JIT), but for other collections the enhanced for loop syntax is exactly equivalent to explicit iterator usage.
There are several alternatives for traversing an array:
```java
static class Foo {
    int mSplat;
}

Foo[] mArray = ...

public void zero() {
    int sum = 0;
    for (int i = 0; i < mArray.length; ++i) {
        sum += mArray[i].mSplat;
    }
}

public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;
    for (int i = 0; i < len; ++i) {
        sum += localArray[i].mSplat;
    }
}

public void two() {
    int sum = 0;
    for (Foo a : mArray) {
        sum += a.mSplat;
    }
}
```
zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length on every iteration of the loop.
one() is faster. It pulls everything into local variables, avoiding the field lookups. Only caching the array length offers a performance benefit here.
two() is fastest for devices without a JIT, and indistinguishable from one() for devices with a JIT. It uses the enhanced for loop syntax introduced in Java 1.5.
Bottom line: use the enhanced for loop by default, but consider a hand-written counted loop for performance-critical ArrayList iteration. (See also Effective Java, Item 46.)
Use package-private instead of private with inner classes
Consider the following class definition:
```java
public class Foo {
    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }

    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }

    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }
}
```
The key thing to note here is that we define a private inner class (Foo$Inner) that directly accesses a private method and a private field of the outer class. This is legal, and the code prints "Value is 27" as expected.
The problem is that the virtual machine considers direct access to Foo's private members from Foo$Inner to be illegal, because Foo and Foo$Inner are different classes, even though the Java language allows an inner class to access the private members of its outer class. To bridge this gap, the compiler generates a couple of synthetic methods:
```java
/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}
```
The inner class calls these static methods whenever it needs to access the mValue field or invoke doStuff() in the outer class. This means the code above really boils down to a case where fields are accessed through accessor methods. We already discussed how accessors are slower than direct field access, so this is an example of a language idiom resulting in an "invisible" performance hit.
If you are using code like this in a performance-critical part of your application, you can avoid the overhead by declaring the fields and methods accessed by the inner class package-private rather than private. Unfortunately, this means the fields can also be accessed directly by other classes in the same package, so you shouldn't use this technique in a public API.
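A sketch of that fix applied to the earlier example (renamed Foo2 here so it doesn't clash with the original): with package-private members, the inner class can reach them without synthetic accessors.

```java
class Foo2 {
    /* package */ int mValue;             // was private

    /* package */ void doStuff(int value) {
        System.out.println("Value is " + value);
    }

    class Inner {
        void stuff() {
            doStuff(mValue);              // no access$NNN bridge needed
        }
    }

    public void run() {
        mValue = 27;
        new Inner().stuff();
    }
}
```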
Use floating point numbers wisely
As a rule of thumb, floating-point arithmetic is about 2x slower than integer arithmetic on Android devices. This is true on a G1 (no JIT, no FPU) and on a Nexus One (with both an FPU and a JIT), although of course the absolute difference in arithmetic speed between those two devices is about 10x.
In speed terms, there is no difference between float and double on the more modern hardware. Space-wise, double is twice as large. As with desktop machines, assuming space isn't an issue, you should prefer double to float.
Also, even for integers, some chips have hardware multiply but lack hardware divide. In such cases, integer division and modulus operations are performed in software, something to think about if you're designing a hash table or doing lots of math.
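One common way to sidestep a software divide (a hypothetical sketch, echoing what power-of-two hash tables do): if the table size is a power of two, the modulus can be replaced with a bit mask.

```java
public class Buckets {
    // Portable, but may compile to a software divide on chips
    // without hardware integer division.
    static int indexByModulo(int hash, int tableSize) {
        return (hash & 0x7fffffff) % tableSize;
    }

    // Equivalent for power-of-two table sizes: a single AND instruction.
    static int indexByMask(int hash, int powerOfTwoSize) {
        return hash & (powerOfTwoSize - 1);
    }
}
```

The mask trick only works when the size is a power of two, which is why many hash-table implementations round their capacity up to one.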
Know and use libraries
In addition to all the usual reasons to prefer library code over rolling your own, bear in mind that the system is free to replace calls to library methods with hand-coded assembler, which may be faster than the best code the JIT can produce for the equivalent Java. The typical example here is String.indexOf() and related methods, which Dalvik replaces with intrinsics. Similarly, System.arraycopy() is about 9x(!) faster than a hand-coded loop on a Nexus One with the JIT. (See also Effective Java, Item 47.)
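To make the comparison concrete (the helper names are hypothetical), here are the two copies side by side. Both produce the same result, but the library call can be backed by tuned native code:

```java
public class CopyDemo {
    // Hand-written element-by-element copy.
    static int[] copyByLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; ++i) {
            dst[i] = src[i];
        }
        return dst;
    }

    // Same result via the library; the runtime can replace this
    // with hand-tuned native code.
    static int[] copyByLibrary(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }
}
```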
Use native methods wisely
Native code isn't necessarily more efficient than Java. For one thing, there's a cost associated with the Java-to-native transition, and the JIT can't optimize across these boundaries. If you're allocating native resources (memory on the native heap, file descriptors, or whatever), it can be significantly more difficult to arrange timely collection of these resources. You also need to compile your code for each architecture you wish to run on (rather than rely on the JIT). You may even have to compile multiple versions for what you consider the same architecture: native code compiled for the ARM processor in the G1 can't take full advantage of the ARM in the Nexus One, and code compiled for the ARM in the Nexus One won't run on the G1.
Native code is primarily useful when you have an existing native codebase that you want to port to Android, not for speeding up parts of a Java application. (See also Effective Java, Item 54.)
In closing
One last thing: always measure. Before you start optimizing, make sure you have a problem. Make sure that you can accurately measure the existing performance, otherwise you will not be able to measure the advantage obtained from alternative solutions.
Every claim made in this document is backed by a benchmark. The benchmark sources can be found in the "dalvik" project on code.google.com.
The benchmarks are built with the Caliper microbenchmarking framework. Microbenchmarks are hard to get right, so Caliper does the hard work for you, and even detects some cases where you're not measuring what you think you're measuring (because, say, the VM has managed to optimize all your code away). We highly recommend you use Caliper to run your own microbenchmarks.
You may also find Traceview useful for profiling, but it's important to realize that it currently disables the JIT, which may cause it to misattribute time to code that the JIT would be able to win back. It's especially important, after making changes suggested by Traceview data, to ensure that the resulting code actually runs faster when run without Traceview.