In this article I give a brief description of how Renderscript works inside Android, compare its performance with Dalvik on a concrete example running on an Android device with an Intel processor, and look at a small technique for optimizing renderscript.
Renderscript is an API for 2D/3D rendering and high-performance math computations. It lets you describe a task that performs the same kind of independent computation over a large amount of data, and split it into homogeneous subtasks that run quickly and in parallel on multi-core Android platforms.
This technology can improve the performance of a number of Dalvik applications in image processing, pattern recognition, physical modeling, cellular automata and similar areas, while keeping them hardware-independent.
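The kind of task described above can be illustrated outside Android. The sketch below is plain Java, not Renderscript: a hypothetical ParallelMap.map helper applies the same independent per-element function (the "kernel") to every element of an array, and a parallel IntStream stands in for Renderscript's own scheduler. That substitution is an assumption made only for illustration.

```java
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

// Hypothetical helper: illustrates data-parallel decomposition in plain Java.
// A parallel stream plays the role of Renderscript's scheduler (assumption).
final class ParallelMap {
    // Applies the same independent "kernel" to every element.
    // No iteration depends on another, so the runtime may split the index
    // range into chunks and process them on different cores.
    static int[] map(int[] in, IntUnaryOperator kernel) {
        int[] out = new int[in.length];
        IntStream.range(0, in.length)
                 .parallel()
                 .forEach(i -> out[i] = kernel.applyAsInt(in[i]));
        return out;
    }
}
```

Because each output element depends only on its own input element, the result is the same however the work is split between cores; that property is what makes this class of tasks a good fit for Renderscript.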
1. Renderscript technology inside Android
Here is a brief overview of how Renderscript works inside Android, along with its advantages and disadvantages.
1.1 Renderscript offline compilation
Renderscript has been supported since Honeycomb / Android 3.0 (API 11). Specifically, the platform-tools directory of the Android SDK gained llvm-rs-cc, an offline compiler that compiles a renderscript (*.rs file) into bytecode (*.bc file) and generates Java classes (*.java files) for the structures and global variables inside the renderscript and for the renderscript itself. llvm-rs-cc is based on the Clang compiler, a front-end for the LLVM compiler, with minor modifications for Android.
1.2 Renderscript run-time compilation
Android also gained a framework built on the LLVM back-end that handles run-time bytecode compilation, linking with the required libraries, and launching and monitoring renderscript execution. It consists of the following parts:
- libbcc initializes the LLVM context according to the pragmas and other metadata in the bytecode, compiles the bytecode and dynamically links it with the required libRS libraries;
- libRS contains the library implementations (math, time, drawing, ref-counting, ...) and the structures and data types (Script, Type, Element, Allocation, Mesh, various matrices, ...).
Benefits:
- Hardware independence: the renderscript bytecode packaged in the apk file is compiled at run time into machine code for the computing unit (CPU) of whichever platform the application runs on;
- Speed: execution is fast thanks to parallelized computation, run-time compiler optimization and native code execution.
Disadvantages:
- The lack of detailed documentation makes application development harder. Everything is limited to a short description of the Renderscript run-time API;
- Renderscript threads cannot be executed on the GPU or DSP. Heterogeneous execution may also bring problems with balancing threads at run time and with shared memory management.
2. Dalvik vs. Renderscript in monochrome image processing
Consider the Dalvik function Dalvik_MonoChromeFilter, which converts a color RGB image to black and white (monochrome):
```java
// Converts mBitmapIn to a monochrome image in mBitmapOut.
// Bitmap.getPixels() returns packed ARGB ints: A = bits 24-31,
// R = bits 16-23, G = bits 8-15, B = bits 0-7.
private void Dalvik_MonoChromeFilter() {
    float MonoMult[] = {0.299f, 0.587f, 0.114f}; // weights for R, G, B
    int mInPixels[] = new int[mBitmapIn.getHeight() * mBitmapIn.getWidth()];
    int mOutPixels[] = new int[mBitmapOut.getHeight() * mBitmapOut.getWidth()];
    mBitmapIn.getPixels(mInPixels, 0, mBitmapIn.getWidth(), 0, 0,
            mBitmapIn.getWidth(), mBitmapIn.getHeight());
    for (int i = 0; i < mInPixels.length; i++) {
        float r = (float)((mInPixels[i] >> 16) & 0xff);
        float g = (float)((mInPixels[i] >> 8) & 0xff);
        float b = (float)(mInPixels[i] & 0xff);
        int mono = (int)(r * MonoMult[0] + g * MonoMult[1] + b * MonoMult[2]);
        // Replicate the grey value into R, G and B and keep the original alpha.
        mOutPixels[i] = mono + (mono << 8) + (mono << 16) + (mInPixels[i] & 0xff000000);
    }
    mBitmapOut.setPixels(mOutPixels, 0, mBitmapOut.getWidth(), 0, 0,
            mBitmapOut.getWidth(), mBitmapOut.getHeight());
}
```
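To check the per-pixel arithmetic without a device, the loop body can be pulled out into plain Java that works on an int[] instead of an android.graphics.Bitmap. The class MonoChrome and its methods are hypothetical names for this sketch; it assumes the standard packed ARGB layout that Bitmap.getPixels() returns.

```java
// Hypothetical desktop-JVM version of the per-pixel conversion (no Bitmap).
final class MonoChrome {
    // Luma weights for R, G, B (ITU-R BT.601), as in Dalvik_MonoChromeFilter.
    private static final float[] MONO_MULT = {0.299f, 0.587f, 0.114f};

    // Converts one packed ARGB pixel to grey, preserving alpha.
    static int toMono(int argb) {
        float r = (argb >> 16) & 0xff; // bits 16-23
        float g = (argb >> 8) & 0xff;  // bits 8-15
        float b = argb & 0xff;         // bits 0-7
        int mono = (int) (r * MONO_MULT[0] + g * MONO_MULT[1] + b * MONO_MULT[2]);
        return (argb & 0xff000000) | (mono << 16) | (mono << 8) | mono;
    }

    // Same loop as in the article, applied to a plain int[].
    static int[] filter(int[] pixels) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = toMono(pixels[i]);
        }
        return out;
    }
}
```

For example, pure red (0xFFFF0000) becomes the grey 0xFF4C4C4C, because 255 × 0.299 ≈ 76.2 truncates to 76 (0x4C).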
What can I say? A simple loop with independent iterations, "grinding" a bunch of pixels. Let's see how fast it works!
For the experiment, take a MegaFon Mint with an Intel Atom Z2460 1.6GHz running Android ICS 4.0.4, and a 600x1024 image of a Lego robot carrying Christmas gifts.

Processing time is measured as follows:
```java
private long startnow;
private long endnow;

// Around the call being measured:
startnow = android.os.SystemClock.uptimeMillis();
Dalvik_MonoChromeFilter();
endnow = android.os.SystemClock.uptimeMillis();
Log.d("Timing", "Execution time: " + (endnow - startnow) + " ms");
```
Messages with the “Timing” tag can be read through ADB, for example with adb logcat -s Timing. We take a dozen measurements, rebooting the device before each one, and check that the spread of the results is small.
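The procedure above (repeat the measurement, then check that the spread is small) can be sketched as a small helper. TimingStats is a hypothetical name; it uses System.nanoTime() so it runs on any JVM, and it cannot reproduce the reboot between runs, so JIT warm-up and caching will make in-process repeats behave differently from the article's measurements.

```java
import java.util.Arrays;

// Hypothetical helper for repeated timing measurements.
final class TimingStats {
    // Runs the task n times and returns each run's wall time in ms.
    // System.nanoTime() replaces Android's SystemClock.uptimeMillis().
    static long[] measure(Runnable task, int n) {
        long[] ms = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            task.run();
            ms[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return ms;
    }

    // Mean of the samples, in the same unit as the input.
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Relative spread: (max - min) / mean. Small values mean the runs agree.
    static double relativeSpread(long[] samples) {
        long max = Arrays.stream(samples).max().orElse(0);
        long min = Arrays.stream(samples).min().orElse(0);
        return (max - min) / mean(samples);
    }
}
```

For instance, three runs of 350, 353 and 356 ms give a mean of 353 ms and a relative spread of about 1.7%.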
The Dalvik implementation processed the image in 353 ms.
Note: with multithreading tools (for example, the AsyncTask class, which runs tasks in separate threads) you can at best squeeze out a 2x speedup, since the Intel Atom Z2460 1.6GHz has two logical cores.
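Here is how the loop could be split between the two logical cores. The article mentions AsyncTask; this sketch swaps in a plain java.lang.Thread so it runs on any JVM, and TwoThreadMono is a hypothetical name. One half of the pixels is processed on a worker thread and the other half on the calling thread, so with two logical cores the best case is roughly a 2x speedup.

```java
// Hypothetical two-thread version of the monochrome loop (plain Thread, not AsyncTask).
final class TwoThreadMono {
    // Converts in[from, to) into out[from, to) with the same arithmetic as the filter.
    private static void convertRange(int[] in, int[] out, int from, int to) {
        for (int i = from; i < to; i++) {
            int p = in[i];
            float r = (p >> 16) & 0xff;
            float g = (p >> 8) & 0xff;
            float b = p & 0xff;
            int mono = (int) (r * 0.299f + g * 0.587f + b * 0.114f);
            out[i] = (p & 0xff000000) | (mono << 16) | (mono << 8) | mono;
        }
    }

    // Splits the independent iterations into two halves, one per logical core.
    static int[] filter(int[] in) throws InterruptedException {
        int[] out = new int[in.length];
        int mid = in.length / 2;
        Thread worker = new Thread(() -> convertRange(in, out, 0, mid));
        worker.start();
        convertRange(in, out, mid, in.length); // second half on the calling thread
        worker.join(); // join() makes the worker's writes visible here
        return out;
    }
}
```

No locking is needed because each thread writes a disjoint range of out; the only synchronization point is join().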
Now consider RS_MonoChromeFilter, the renderscript implementation of the same filter:
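```java
private RenderScript mRS;
private Allocation mInAllocation;
private Allocation mOutAllocation;
private ScriptC_mono mScript;
// …

private void RS_MonoChromeFilter() {
    // Create the Renderscript context.
    mRS = RenderScript.create(this);
    // Wrap the input bitmap in an allocation the script can read.
    mInAllocation = Allocation.createFromBitmap(mRS, mBitmapIn,
            Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
    // Output allocation of the same type (size and element format).
    mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
    // Load mono.rs (its bytecode, mono.bc, is referenced as R.raw.mono).
    mScript = new ScriptC_mono(mRS, getResources(), R.raw.mono);
    // Run the root() function of mono.rs over every element of the input.
    mScript.forEach_root(mInAllocation, mOutAllocation);
    // Copy the result back into the output bitmap.
    mOutAllocation.copyTo(mBitmapOut);
}
```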
Note: the performance of this implementation is measured the same way as for Dalvik.
The renderscript implementation processed the same image in 112 ms.
That is a 3.2x speedup over Dalvik (353 ms / 112 ms ≈ 3.2).
Note: the renderscript run time includes creating the Renderscript context, allocating and initializing the required memory, creating the script and binding it to the context, and running the root function in mono.rs.
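Since this figure mixes one-time setup with the actual kernel run, it can help to time each phase separately. The sketch below is a hypothetical PhaseTimer in plain Java; it records the time elapsed between consecutive mark() calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that splits one measurement into named phases.
final class PhaseTimer {
    private final Map<String, Long> phasesMs = new LinkedHashMap<>(); // keeps insertion order
    private long last = System.nanoTime();

    // Records the time since the previous mark (or construction) under the given name.
    void mark(String phase) {
        long now = System.nanoTime();
        phasesMs.put(phase, (now - last) / 1_000_000);
        last = now;
    }

    Map<String, Long> phases() {
        return phasesMs;
    }

    long totalMs() {
        long sum = 0;
        for (long v : phasesMs.values()) sum += v;
        return sum;
    }
}
```

In RS_MonoChromeFilter one could, for instance, call mark("context") after RenderScript.create, mark("allocations") after createTyped, mark("script") after the ScriptC_mono constructor and mark("kernel") after copyTo, which has to wait for the kernel's output. These phase boundaries are an assumption, not part of the original measurement.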
Note: the size of the resulting apk file is critical for mobile application developers. With this approach the apk grows, compared with the Dalvik implementation, only by the size of the renderscript bytecode (*.bc file). In my case the Dalvik version was 404KB and the renderscript version 406KB, of which 2KB is the renderscript bytecode (mono.bc).
3. Optimize renderscript
Renderscript performance can be improved further by giving up a little accuracy in floating-point arithmetic, which does not matter for this task. To do this, add the rs_fp_imprecise pragma (#pragma rs_fp_imprecise) to the renderscript mono.rs.
As a result, the renderscript implementation becomes more than 10% faster: 112 ms → 99 ms.
Note: the resulting monochrome image is visually the same, with no artifacts or distortions.
Note: unlike the NDK, Renderscript has no explicit mechanism for controlling compiler optimization at run time, since the compiler flags are preset inside Android for each platform (x86, ARM, ...).
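The claim that a little floating-point accuracy does not matter here can be sanity-checked on a desktop JVM, as a rough analogy only: rs_fp_imprecise is a Renderscript compiler setting and cannot be reproduced in Java. The hypothetical PrecisionCheck class below computes the grey value in float and in double and reports the largest difference in 8-bit grey levels over a grid of RGB colors.

```java
// Hypothetical analogy: float vs double precision for the grey value.
// This is NOT rs_fp_imprecise; it only shows how tolerant 8-bit output is.
final class PrecisionCheck {
    // Grey value in single precision, as in the filter code.
    static int monoFloat(int r, int g, int b) {
        return (int) (r * 0.299f + g * 0.587f + b * 0.114f);
    }

    // Same computation in double precision, used as the reference.
    static int monoDouble(int r, int g, int b) {
        return (int) (r * 0.299 + g * 0.587 + b * 0.114);
    }

    // Largest difference, in grey levels, over an RGB grid with the given step.
    static int maxDifference(int step) {
        int worst = 0;
        for (int r = 0; r < 256; r += step)
            for (int g = 0; g < 256; g += step)
                for (int b = 0; b < 256; b += step)
                    worst = Math.max(worst, Math.abs(monoFloat(r, g, b) - monoDouble(r, g, b)));
        return worst;
    }
}
```

Any difference can only come from truncation to an integer, so it is at most one grey level, which is consistent with the monochrome image looking the same.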
4. Dependence of the running time of dalvik and renderscript implementations on image sizes
The next question is how the running time of each implementation depends on the size of the processed image. To answer it, take four images of 300x512, 600x1024 (our original image with the Lego robot), 1200x1024 and 1200x2048 pixels and measure the monochrome processing time for each. The results are presented below in the graph and in the table.
| Implementation, time in ms | 300x512 | 600x1024 | 1200x1024 | 1200x2048 |
|---|---|---|---|---|
| dalvik | 85 | 353 | 744 | 1411 |
| renderscript | 75 | 99 | 108 | 227 |
| speedup (dalvik / renderscript) | 1.13 | 3.57 | 6.89 | 6.22 |
Note that the Dalvik time grows linearly with image size, unlike the Renderscript time. The difference can be explained by the fixed cost of initializing the Renderscript context.
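The linearity claim can be checked with a little arithmetic on the table. The sketch below (ScalingTable is a hypothetical name; the numbers are copied from the table) computes the speedup and the time per megapixel for each image size.

```java
// Hypothetical helper: derived metrics from the measurement table.
final class ScalingTable {
    // Image sizes and timings (ms) from the table.
    static final int[][] SIZES = {{300, 512}, {600, 1024}, {1200, 1024}, {1200, 2048}};
    static final double[] DALVIK_MS = {85, 353, 744, 1411};
    static final double[] RS_MS = {75, 99, 108, 227};

    // Speedup of Renderscript over Dalvik for size i.
    static double speedup(int i) {
        return DALVIK_MS[i] / RS_MS[i];
    }

    // Time per megapixel for size i; a roughly constant value means linear scaling.
    static double msPerMegapixel(double[] times, int i) {
        double megapixels = SIZES[i][0] * (double) SIZES[i][1] / 1e6;
        return times[i] / megapixels;
    }
}
```

For Dalvik this gives about 553, 575, 605 and 574 ms per megapixel, nearly constant, which is what linear scaling means. For Renderscript it falls from about 488 to about 90 ms per megapixel, consistent with a fixed startup cost that is amortized over larger images.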
For relatively small images the gain is insignificant, since initializing the Renderscript context takes about 50-60 ms. On medium-sized images, which are the most common on Android devices, the gain is roughly 3.6-6.9x.
Conclusion
The article reviewed Dalvik and Renderscript implementations of monochrome processing for images of different sizes. Thanks to parallelization, compiler optimization and native code execution, Renderscript significantly outperforms Dalvik on medium-sized images. With this small example, I tried to show when Renderscript can help improve the performance of applications while keeping them hardware-independent.