
OpenJDK: Project Panama



Two years ago, a new project codenamed “Panama” was launched in OpenJDK. Its main research focus is a new interface for working with platform-specific libraries and with data outside the Java heap (off-heap). But the project's goals are broader: it explores the mechanisms by which the JVM interacts with "external" (non-Java) APIs.

Vladimir Ivanov (iwanowww) is a lead engineer at Oracle and works on the development team of the HotSpot Java virtual machine. He specializes in JIT compilation and support for alternative languages on the Java platform. Vladimir joined Sun Microsystems (acquired by Oracle in 2010) in 2005 and has since taken part in a large number of Java-related projects (HotSpot JVM, RTSJ, JavaFX).

JNI 2.0?


- Much of the Panama project is about working with native libraries from Java code. How can this be done today?

- Java has always been able to work with native code. Native methods were already present in the first version of Java, and the standard JNI interface appeared in version 1.1. But time passes, the platform evolves, requirements change, and looking at JNI today it is clear that working with native libraries can be made more convenient and more efficient.

JNI has a number of drawbacks related to ease of use and speed. To integrate a library into an application, you have to write a C/C++ wrapper for it and also provide builds for every supported platform. This fits poorly with the way modern Java applications are built and can be a significant barrier to adoption. In addition, because JNI is Java-centric, every call through it incurs a certain overhead, which becomes especially noticeable under intensive use, even with small methods. The Panama project is, among other things, an attempt to create a new version of JNI, a “JNI 2.0”, that is more convenient and performs better. There is already a JEP for it: JEP 191: "Foreign Function Interface".
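To make the platform-build problem concrete, here is a minimal sketch of the Java side of a JNI binding. The class, the `add` method and the library name `jnisketch` are hypothetical and exist only for illustration; the matching C wrapper is not shown and would have to be compiled separately for each platform.

```java
public class JniSketch {
    // Hypothetical native method. Its C implementation would have to be
    // compiled for every supported platform and exported under the
    // JNI-mangled name Java_JniSketch_add.
    public static native int add(int a, int b);

    // Tries to load the platform-specific library. Returns false when no
    // build of the library can be found on this machine.
    public static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (tryLoad("jnisketch")) {
            System.out.println("2 + 3 = " + add(2, 3));
        } else {
            System.out.println("No native build of 'jnisketch' for this platform");
        }
    }
}
```

The Java code compiles and runs everywhere, but the call to `add` only works where a matching native build has been shipped, which is exactly the distribution burden described above.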

- There is an opinion that JNI was deliberately made so complex that nobody would want to use it. What do you think about that?

- That is something of an "urban legend". Although the claim as a whole is, of course, wrong, there is a grain of truth in it: investing in improving JNI was not a priority. The prevailing view was that writing everything in Java is more efficient and more convenient. The existing interface covered more than 90% of user needs, and nobody saw a need to develop it further. Besides, third-party libraries can make working with JNI much simpler. Just look at JNR, which lets you work with native libraries without writing a single line of C/C++ code.

- JNI is already 20 years old. Why did the Panama project appear only now?

- I would say that we have matured enough for this project. Over the years, the strengths and weaknesses of JNI have become apparent, and the Java platform has come a long way. It has become clear that developing an application entirely in Java is not always advisable, and working with data outside the Java heap is now in much greater demand. JNI and NIO no longer cover all needs, and users have to resort to sun.misc.Unsafe. The Panama project is meant to solve a number of the problems these users face.

As John Rose (the JVM architect at Oracle who oversees the project) put it: any useful library should be easily accessible as part of the Java ecosystem, regardless of whether it is written in Java or not.

Take LAPACK, for example, a linear algebra package originally written in Fortran. A great deal of effort has gone into optimizing it, and rewriting it in Java is unlikely to gain anything. It is far more productive simply to reuse it, as C/C++ programmers do.

In general, the first attempt to “look outside” was Project Sumatra, whose goal was to study the prospects of using the GPU to execute Java programs. In theory it all sounds very attractive: run the program on a machine that has a GPU, and the JVM automatically starts using it. In practice things turned out to be less rosy, and an efficient mechanism for executing Java bytecode on modern GPUs could not be built. There are several Java libraries (Aparapi and Rootbeer) for working with GPUs from Java, but they offer a rather low-level approach similar to OpenCL/CUDA.

Panama offers a different perspective on using the GPU: it is not necessary to execute Java bytecode on the GPU; it is enough to work with libraries that know how to use the GPU. For example, some implementations of BLAS and the MAGMA linear algebra package have this capability.

- What tasks do programmers solve now with the help of JNI?

- The Java ecosystem is rich in libraries, but not all of them are written in Java. I have already mentioned linear algebra packages and LAPACK. The only way to use them from a Java program is JNI. Another example is 3D graphics: how do you work with OpenGL from Java? There is no standard Java API; there are platform implementations with the necessary functionality, but you need a way to integrate with them from Java. The answer is again JNI.

- And what successful projects are currently using JNI?

- In general, any more or less popular non-Java library has a Java wrapper, and of course all of them are implemented using JNI. In computer vision, for example, this is the OpenCV library. In 3D graphics, these are Java Binding for the OpenGL API and the Lightweight Java Game Library.

As for linear algebra packages, netlib-java provides access to the platform implementations of BLAS/LAPACK. By the way, it is included in recent versions of Apache Spark.

Among Java projects, I would mention JRuby, which does not use JNI directly but relies on JNR to work with platform-specific APIs.

Access to off-heap and work with data


- In addition to the native library call interface, the Panama project includes support for native data structures. Do you view it as a separate functionality or as a feature necessary to call native libraries?

- Both. The main problem when working with native code from Java is data exchange. A virtual machine has complete freedom in choosing how Java objects are represented, and often this format does not match what native libraries expect at all. You have to either copy the data back and forth or try to work with a single copy.

JNI offers an API for accessing the Java heap from native code, but to work with off-heap data you have to write code in the JNI wrapper. This turns out to be very expensive, both in performance and in the amount of code required.

In Panama, we are working on a new format (Layout Definition Language, LDL) that describes fairly complex data structures in a compact and flexible form. LDL descriptions can be extracted automatically from C/C++ headers, and Java code for working with the data is generated from the description on the fly. The JVM can also use this information, for example, to let the GC find pointers to Java objects. At the same time, native code can work with this data directly, without any additional adaptation.
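To show what such a layout description would save, here is a sketch of the manual approach available today with NIO: a direct (off-heap) ByteBuffer with hand-computed field offsets. The struct `point` and its offsets are assumptions chosen for illustration (they match a typical C ABI for `struct point { int32_t x; int32_t y; double weight; }`); this is not Panama or LDL code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PointLayoutSketch {
    // Hand-written layout of a hypothetical C struct:
    //   struct point { int32_t x; int32_t y; double weight; };
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 4;
    static final int WEIGHT_OFFSET = 8;
    static final int SIZE = 16;

    // Allocates off-heap storage for `count` structs in native byte order.
    public static ByteBuffer allocate(int count) {
        return ByteBuffer.allocateDirect(count * SIZE).order(ByteOrder.nativeOrder());
    }

    // Writes one struct at the given element index using absolute offsets.
    public static void set(ByteBuffer buf, int index, int x, int y, double weight) {
        int base = index * SIZE;
        buf.putInt(base + X_OFFSET, x);
        buf.putInt(base + Y_OFFSET, y);
        buf.putDouble(base + WEIGHT_OFFSET, weight);
    }

    // Reads the structs back and computes the sum of (x + y) * weight.
    public static double weightedSum(ByteBuffer buf, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            int base = i * SIZE;
            int x = buf.getInt(base + X_OFFSET);
            int y = buf.getInt(base + Y_OFFSET);
            sum += (x + y) * buf.getDouble(base + WEIGHT_OFFSET);
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer points = allocate(2);
        set(points, 0, 1, 2, 0.5);
        set(points, 1, 3, 4, 2.0);
        System.out.println(weightedSum(points, 2));
    }
}
```

Every offset here is maintained by hand and must be kept in sync with the C header, which is the kind of bookkeeping an automatically extracted layout description is meant to remove.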

In combination with pointers and explicit memory management, LDL fully covers the part of the sun.misc.Unsafe functionality that is used for off-heap solutions.

But that is not all. With proper support on the JVM side, LDL can be used to describe the structure of Java objects.

First of all, it will make it possible to control alignment and insert padding fields. In hot code, false sharing and unaligned memory accesses seriously affect execution speed. Inside the JDK there is the @Contended annotation, but for the user the only way to avoid false sharing is to manually surround the problem field with other fields, in the hope that the JVM will preserve their order.
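The manual padding trick mentioned above can be sketched as follows. This is an illustration under two assumptions: a 64-byte cache line, and a JVM that keeps the declared field order (which, as noted, is not guaranteed). The class and field names are hypothetical.

```java
public class PaddedCounterSketch {
    // The hot field is surrounded by unused long fields so that, if the JVM
    // keeps this order, no other hot data shares its 64-byte cache line.
    public static class PaddedCounter {
        long p1, p2, p3, p4, p5, p6, p7;   // 56 bytes of padding before
        public volatile long value;        // the contended field
        long q1, q2, q3, q4, q5, q6, q7;   // 56 bytes of padding after
    }

    // Two threads each update their own counter; with the padding in place
    // the two hot fields should not end up on the same cache line.
    public static long[] run(int iterations) throws InterruptedException {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) a.value++;
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) b.value++;
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = run(1_000_000);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Each counter is written by exactly one thread, so the non-atomic `value++` is safe here; the padding affects only speed, not correctness.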

But, most importantly, it will open the way to a number of exotic structures, such as fused strings (header and character array as one object) or tagged arrays (each element of the array is a primitive value, or a pointer to an object).

This part of the project has something in common with Valhalla and value-types in terms of creating compact Java structures with fast access (both arbitrary and sequential) to the data.

- Which of the project's features do you think will be most in demand among users?

- Panama has a number of independent research areas. The first is working with native code and off-heap data. This is where the new API that replaces JNI comes in. I expect this part of the project to be the most popular.

Among the other areas, I would single out the API for vector processing (Vector API). Modern processors have vector extensions (SSE and AVX on x86, NEON on ARM) with instructions that process several data elements at once (SIMD instructions). Today the JVM can vectorize code automatically during dynamic compilation, but this does not cover all the interesting cases. Work is underway on a specialized API that makes it possible to describe vector operations on data explicitly.
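As an illustration of automatic vectorization, here is a plain scalar loop of the kind a JIT compiler can typically turn into SIMD instructions: every iteration is independent and touches consecutive array elements. Whether it is actually vectorized depends on the JVM and the hardware; the class and method names are hypothetical, and this is ordinary Java, not the Vector API.

```java
public class VectorizableLoopSketch {
    // Computes y[i] = a * x[i] + y[i] for every element. Iterations do not
    // depend on each other, which makes the loop a candidate for
    // auto-vectorization by the JIT compiler.
    public static void saxpy(float a, float[] x, float[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = { 1f, 2f, 3f };
        float[] y = { 10f, 20f, 30f };
        saxpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y));
    }
}
```

Loops with more complex control flow or cross-iteration dependencies are much harder for the compiler to handle, which is where an explicit API becomes useful.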

Another area is the renewal of Java arrays, also known as Arrays 2.0. Arrays have been in Java from the very beginning, and in some respects they are seriously outdated (for example, an array is indexed by an int and so cannot hold more than about 2 billion elements). There is a need for more efficient and flexible mechanisms for describing and working with them.

- Compared with other changes to the JVM and Java, how important is the Panama project at the moment?

- Work on Panama is actively underway, but it is still in the research phase. We have yet to determine what to integrate into Java and when.

Today, the key project for JDK 9 is platform modularization ( Project Jigsaw ).

In the context of Panama, VarHandles (JEP 193: “Variable Handles”) look very interesting. They work on fields and arrays as well as on off-heap data, and they provide a number of exotic read/write modes that cannot be described in terms of the standard Java memory model. Such support is necessary for implementing non-blocking synchronization efficiently, and in JDK 9 the java.util.concurrent package has already migrated completely from sun.misc.Unsafe to VarHandles. The new primitives proposed in Panama should fit well into the paradigm of access through VarHandles, unifying access to on-heap and off-heap data.
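A minimal sketch of the JDK 9 VarHandle API illustrates the point about uniform access: one handle performs a compare-and-set on an object field, another performs volatile reads and writes on a direct (off-heap) ByteBuffer. The `Counter` class and the method names are hypothetical; the `java.lang.invoke` calls are the standard ones.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VarHandleSketch {
    public static class Counter {
        volatile int value;
    }

    // Handle to the on-heap field Counter.value.
    static final VarHandle VALUE;
    // Handle that views the bytes of a ByteBuffer as int values.
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(Counter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Lock-free increment: retry the compare-and-set until it succeeds.
    public static int incrementWithCas(Counter c) {
        while (true) {
            int current = (int) VALUE.getVolatile(c);
            if (VALUE.compareAndSet(c, current, current + 1)) {
                return current + 1;
            }
        }
    }

    // Volatile write and read of an int at byte offset 4 of off-heap memory.
    public static int roundTripOffHeap(int v) {
        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        INT_VIEW.setVolatile(buf, 4, v);
        return (int) INT_VIEW.getVolatile(buf, 4);
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(incrementWithCas(c));
        System.out.println(roundTripOffHeap(42));
    }
}
```

The same access-mode methods (`getVolatile`, `setVolatile`, `compareAndSet`) are used for both the field and the buffer; only the coordinates passed to the handle differ.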

What's next? The future JDK


- And in subsequent versions?

- Project Valhalla is also in active development. The Panama project is less well known, but in my opinion, it is no less important for the Java platform in the long run.

FFI and off-heap data have been under discussion for quite some time, but lately there has been keen interest in the Vector API. There was a funny moment at this year's JVM Language Summit: during the discussion of Panama, colleagues from Facebook asked with great interest when the Vector API would appear in Java and said that they had needed it 3 years ago. It is a pity they kept silent for so long. They should have raised the topic right away, especially since they attend JVMLS every year. Back then, support for explicit vectorization did not attract much interest.

- Do you expect some projects written in Java to be rewritten using the new API?

- JNI will remain, of course, but for people who use JNI today the new API will be much more attractive. Judging by our experience, once the new interface is available there should no longer be any need for JNI.

We are actively experimenting with the current prototype and are happy with the results: Clang is used to extract information from C/C++ headers, and the entire binding is then created by the new Panama tools. It is simple and convenient, and it saves a lot of time during migration.

- That is, the Panama project will be in demand within the Java platform?

- Of course. Calls to native code are also used heavily inside the platform, so once a more convenient and efficient mechanism is available, the JDK will gradually move to it. But Panama is not positioned as an internal project. Its goal is to create a new mechanism for working with native code and off-heap data for the Java platform, and that implies a new public API.

- That is, JNI will remain in the language as a legacy framework?

- No one is going to get rid of JNI yet. Backward compatibility is crucial for Java and support will continue. It is possible that in the future JNI will be marked “for deletion” (as deprecated), but at the moment there are no such plans.

I would also like to note that the work on the Panama project is likely to benefit JNI. Effective work with native code requires serious support on the JVM side. So rewriting JNI using new JVM primitives can provide significant performance gains.

- And what else of interest will be available in JDK 9?

- Among the lesser-known projects, I would single out the JVMCI interface (JEP 243: "Java-Level JVM Compiler Interface"). It lets you plug a third-party dynamic compiler into the JVM, and the first user of this API is Graal, developed at Oracle Labs. This is a new JIT compiler written entirely in Java.

- That is, it will be possible to replace the standard JIT compiler with Graal?

- Yes, Graal can be used either as a “last-tier” compiler that generates optimized code (replacing the C2 server compiler in HotSpot) or as the only JIT compiler in the virtual machine. This is available now, but it requires a special JVM build.

In general, experiments with “Java on Java” implementations have been going on for a long time. There was the Maxine VM project, a virtual machine written completely (!) in Java. The greatest success was achieved in the area of dynamic compilers. After all, Graal began as an attempt to rewrite the HotSpot client compiler in Java. In Maxine it was even initially called C1X. Now the time has finally come to bring these developments into the platform.

As a JVM engineer, I am extremely impressed by the trend of rewriting the virtual machine in Java. On the one hand, we get an implementation on a modern platform with convenient tooling, high-level language constructs, and an extensive standard library with excellent support for multi-threaded programming. It is also important that we are in complete control of this platform, so we can extend it in the direction we need and solve our problems ourselves. On the other hand, we add tools for solving low-level tasks to Java. And the Panama project will play a key role in this.



If you enjoy the “guts” of the JVM as much as we do, then in addition to Vladimir's talk “Native code, Off-heap data and Java” we recommend the following Joker 2016 talks:

Source: https://habr.com/ru/post/310014/

