📜 ⬆️ ⬇️

Is the native method expensive? JNI "Secret" extension


Why do Java programmers use native methods? Sometimes, to use a third-party DLL library. In other cases, to speed up the critical algorithm due to optimized C code or assembler. For example, for streaming media processing, compression, encryption, etc.

But the call to the native method is not free. At times, the overhead of JNI is even greater than the performance gain. And all because they include:
  1. create stack frame;
  2. shifting arguments in accordance with the ABI ;
  3. wrapping links in JNI handles ( jobject );
  4. passing additional arguments JNIEnv* and jclass ;
  5. capture and release of a monitor if the method is synchronized ;
  6. "Lazy" linking of native function;
  7. tracing method input and output;
  8. transfer of a stream from in_Java to in_native and back;
  9. checking the need for a safepoint;
  10. handling possible exceptions.

But often the native methods are simple: they do not throw exceptions, do not create new objects in the heap, do not bypass the stack, do not work with handles, and are not synchronized. Is it possible for them not to do unnecessary actions?

Yes, and today I will talk about the undocumented features of HotSpot JVM for the accelerated call of simple JNI methods. Although this optimization has appeared since the first versions of Java 7, which is surprising, no one has ever written about it.

Jni how we know it


For example, consider a simple native method that takes an array of byte[] as input and returns the sum of elements. There are several ways to work with an array in JNI:

Critical native


And here is our secret tool. Outwardly, it is similar to the usual JNI method, only with the JavaCritical_ prefix instead of Java_ . There are no JNIEnv* and jclass , and instead of jbyteArray two arguments are passed: jint length - the length of the array and jbyte* data - a “raw” pointer to the elements of the array. Thus, the Critical Native method does not need to call the expensive JNI functions GetArrayLength and GetByteArrayElements - you can immediately work with an array. At the time of such a method, the GC will be delayed.
')
 JNIEXPORT jint JNICALL JavaCritical_bench_Natives_javaCriticalImpl(jint length, jbyte* buf) { return sum(buf, length); } 

As you can see, in the implementation there is nothing extra.
But in order for a method to become Critical Native, it must satisfy strict limitations:

Critical Natives was conceived as a private API of Hotspot for the JDK in order to speed up the call to the cryptographic functions implemented in the native. The maximum that can be found from the description is comments to the task in the bugtracker . Important feature: JavaCritical_ functions are called only from hot (compiled) code, therefore, in addition to the JavaCritical_ implementation, the method must also have a “spare” traditional JNI implementation. However, for compatibility with other JVM it is even better.

How many will be in grams?


Let's measure what the savings are on arrays of different lengths: 16, 256, 4KB, 64KB and 1MB. Naturally, using JMH .
Benchmark
 @State(Scope.Benchmark) public class Natives { @Param({"16", "256", "4096", "65536", "1048576"}) int length; byte[] array; @Setup public void setup() { array = new byte[length]; } @GenerateMicroBenchmark public int arrayRegion() { return arrayRegionImpl(array); } @GenerateMicroBenchmark public int arrayElements() { return arrayElementsImpl(array); } @GenerateMicroBenchmark public int arrayElementsCritical() { return arrayElementsCriticalImpl(array); } @GenerateMicroBenchmark public int javaCritical() { return javaCriticalImpl(array); } static native int arrayRegionImpl(byte[] array); static native int arrayElementsImpl(byte[] array); static native int arrayElementsCriticalImpl(byte[] array); static native int javaCriticalImpl(byte[] array); static { System.loadLibrary("natives"); } } 
results
 Java(TM) SE Runtime Environment (build 1.7.0_51-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) Benchmark (length) Mode Samples Mean Mean error Units b.Natives.arrayElements 16 thrpt 5 7001,853 66,532 ops/ms b.Natives.arrayElements 256 thrpt 5 4151,384 89,509 ops/ms b.Natives.arrayElements 4096 thrpt 5 571,006 5,534 ops/ms b.Natives.arrayElements 65536 thrpt 5 37,745 2,814 ops/ms b.Natives.arrayElements 1048576 thrpt 5 1,462 0,017 ops/ms b.Natives.arrayElementsCritical 16 thrpt 5 14467,389 70,073 ops/ms b.Natives.arrayElementsCritical 256 thrpt 5 6088,534 218,885 ops/ms b.Natives.arrayElementsCritical 4096 thrpt 5 677,528 12,340 ops/ms b.Natives.arrayElementsCritical 65536 thrpt 5 44,484 0,914 ops/ms b.Natives.arrayElementsCritical 1048576 thrpt 5 2,788 0,020 ops/ms b.Natives.arrayRegion 16 thrpt 5 19057,185 268,072 ops/ms b.Natives.arrayRegion 256 thrpt 5 6722,180 46,057 ops/ms b.Natives.arrayRegion 4096 thrpt 5 612,198 5,555 ops/ms b.Natives.arrayRegion 65536 thrpt 5 37,488 0,981 ops/ms b.Natives.arrayRegion 1048576 thrpt 5 2,054 0,071 ops/ms b.Natives.javaCritical 16 thrpt 5 60779,676 234,483 ops/ms b.Natives.javaCritical 256 thrpt 5 9531,828 67,106 ops/ms b.Natives.javaCritical 4096 thrpt 5 707,566 13,330 ops/ms b.Natives.javaCritical 65536 thrpt 5 44,653 0,927 ops/ms b.Natives.javaCritical 1048576 thrpt 5 2,793 0,047 ops/ms 


It turns out that for small arrays the cost of a JNI call is many times greater than the time of the method itself! For arrays of hundreds of bytes, the overhead is comparable to useful work. Well, for multi-kilo byte arrays the method of calling is not so important - all the time is spent on processing itself.

findings


Critical Natives is a private JNI extension in HotSpot, which appeared with JDK 7. By implementing a JNI-like function according to certain rules, you can significantly reduce the overhead of calling the native method and processing Java arrays in native code. However, for long-playing functions such a solution is not suitable, since the GC will not be able to start until Critical Native is executed.

Source: https://habr.com/ru/post/222997/


All Articles