
Java with assembly inserts



As you know, you can write Java in any language, and a Java developer's first love is writing garbage collectors and JIT compilers. Plenty of delightful questions come with this territory, for example: how do you work with machine code and assembly directly from managed code?


In addition, this article includes a small example in C#. At some point it became clear that you cannot keep learning only Java: managed languages build on a common theory and in practice wrestle with similar problems. The easiest way to advance your own work is to look at what the neighbors are doing and borrow the good parts.


Now, about assembly and machine code. Why you would need this is an open question. For example, you have heard enough about Meltdown and want to write a beautiful API for it :-) And don't forget that the people at Oracle are not gods: AVX-512 support only arrived in Java 9, direct control over hardware transactional memory is not exposed at the language level, some standard methods could be implemented better than they are in the JDK, and so on. There is always something to dig into!


The problem has two levels:



But there are a few hacks you can use, and we will get to them below.


The naive option: just take it and run it!


As you know, you can run native code using JNI.
So in C++ we can freely generate machine code dynamically and then call it.


To do this, we allocate a memory region that is writable and executable at the same time.


Here is an example for Windows using VirtualAllocEx (on POSIX there is mprotect(2), but I'm lazy).
Try to work out what it does. Without PAGE_EXECUTE_READWRITE, Data Execution Prevention in Windows would kill this code on the spot.


#include <stdio.h>
#include <windows.h>

typedef unsigned char byte;

int arg1;
int arg2;
int res1;

typedef void (*pfunc)(void);

union funcptr {
    pfunc x;
    byte* y;
};

int main( void ) {
    byte* buf = (byte*)VirtualAllocEx( GetCurrentProcess(), 0, 1<<16,
                                       MEM_COMMIT, PAGE_EXECUTE_READWRITE );

    if( buf==0 ) return 0;

    byte* p = buf;

    *p++ = 0x50; // push eax
    *p++ = 0x52; // push edx

    *p++ = 0xA1; // mov eax, [arg2]
    (int*&)p[0] = &arg2; p+=sizeof(int*);

    *p++ = 0x92; // xchg edx,eax

    *p++ = 0xA1; // mov eax, [arg1]
    (int*&)p[0] = &arg1; p+=sizeof(int*);

    *p++ = 0xF7; *p++ = 0xEA; // imul edx

    *p++ = 0xA3; // mov [res1],eax
    (int*&)p[0] = &res1; p+=sizeof(int*);

    *p++ = 0x5A; // pop edx
    *p++ = 0x58; // pop eax
    *p++ = 0xC3; // ret

    funcptr func;
    func.y = buf;

    arg1 = 123; arg2 = 321; res1 = 0;

    func.x(); // call generated code

    printf( "arg1=%i arg2=%i arg1*arg2=%i func(arg1,arg2)=%i\n",
            arg1, arg2, arg1*arg2, res1 );
}

Of course, doing this by hand is not the most sensible idea, so you would pull in something like AsmJit. Then from the Java code we pass in the specific data, use it to fill the buffer, and off it goes!


The problem is that we expect somewhat richer functionality from inline assembly. I want access to the entire context of the call, plus the ability to pull in various system pieces from the JDK. It is probably possible to build all of this yourself, but it would be long and painful. Fortunately, everything has already been stolen before us.


Java Native Interface


You can still use JNI, but in a different way.


Suppose we have this class:


public class MyJNIClass {
    public native void printVersion();
}

The idea is to name the symbol according to the JNI naming convention, and JNI will do the rest. In our case it will look something like Java_MyJNIClass_printVersion.


The symbol must be visible from other translation units; in NASM this is done with the global directive, and in FASM with public.


The assembly itself must be written with an understanding of the calling convention of the target architecture (arguments may arrive in registers, on the stack, in other memory structures, and so on). The first argument passed to the function is a pointer to JNIEnv, which in turn is a pointer to the JNI function table.


For example, in NASM for x86_64 it looks like this:


global Java_MyJNIClass_printVersion

section .text

Java_MyJNIClass_printVersion:
    mov rax, [rdi]
    call [rax + 8*4] ; pointer size in x86_64 * index of GetVersion
    ...

Where does the magic index of GetVersion come from? Very simple: the indexes are listed in the documentation.


Here is the GetVersion description:


GetVersion

jint GetVersion(JNIEnv *env);

Returns the version of the native method interface.

LINKAGE:
Index 4 in the JNIEnv interface function table.

PARAMETERS:
env: the JNI interface pointer.

RETURNS:
Returns the major version number in the higher 16 bits and the minor version number in the lower 16 bits.

In JDK/JRE 1.1, GetVersion() returns 0x00010001.
In JDK/JRE 1.2, GetVersion() returns 0x00010002.
In JDK/JRE 1.4, GetVersion() returns 0x00010004.
In JDK/JRE 1.6, GetVersion() returns 0x00010006.

As you can see, the function table is just an array of pointers. You only need to remember to multiply the index by the pointer size of the target architecture, of course.


The second argument is a reference to the class or object on which the method was called. All subsequent arguments are the parameters of the method declared as native in the Java code.


Next, assemble an object file from the assembly source: nasm -f elf64 -o GetVersion.o GetVersion.asm
Then build the shared library from the object file: gcc -shared -z noexecstack -o libGetVersion.so GetVersion.o
And finally compile the Java class itself: javac MyJNIClass.java
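For completeness, here is a minimal sketch of the Java side; the loadLibrary call and the main method are my additions for the demo, and the library name matches the libGetVersion.so built above:

public class MyJNIClass {
    static {
        // libGetVersion.so must be on java.library.path at run time
        System.loadLibrary("GetVersion");
    }

    public native void printVersion();

    public static void main(String[] args) {
        new MyJNIClass().printVersion();
    }
}

Run it with the library directory on java.library.path, for example: java -Djava.library.path=. MyJNIClass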


You can perform much more complex operations; here is an example of summing the elements of an array.


And everything would seem fine, but a few things are still missing.


First, if we are writing Java, I would like a nice syntax for building the assembly, with static checks (whatever that means in this case) and so on. Call me lazy, but I want to pick registers from IDE autocompletion instead of being afraid of mistyping a single letter. At the very least, let it be a Java API.


Second, assembling the library by hand next to the code is not a great idea, and building the files one by one is rock bottom. We need infrastructure that lets us not worry about such things: for example, the asm code could be inline, or there could be a Maven plugin, or it could ship as part of a modified JDK.


Third, it is not clear whether assembly is even the right abstraction to choose, because there are many different representations.


Libraries


I immediately wrote to the folks at Oracle who work on machine code in Java and got the answer: they don't know of any decent, nice libraries.


Still, we should ask Google; we're not that lazy. Search for "java call x86 assembly library" and meditate on the results.


The results show that the library situation really is bad. Googling turns up a few unfinished things, including The Machine Level Java.


It even has a beautiful API (as beautiful as it can be when built out of Java constructs):


public class SimpleNativeDemo extends X86InlineAssembly // X86InlineAssembly is a successor of InlineAssembly
{
    static // static initializer
    {
        InlineAssemblyHelper.initializeNativeCode_deleteExistingLibraryAndNoExceptions(new SimpleNativeDemo(System.out));
    }

    // constructor, which defines x86 architecture as a native method's target
    public SimpleNativeDemo(OutputStream debugStream)
    {
        super(Architectures.X86.architecture, false, debugStream);
    }

    // native method declaration
    public static native long multiply(int x, int y);

    // native method implementation
    @Override
    public void writeNativeCode()
    {
        parameterIn(r.EAX, IP.In0.ordinal());
        parameterIn(r.EBX, IP.In1.ordinal());
        mul.x32(r.EBX);
    }
}

It gives us the syntax (without modifying the Java parser), and the way it executes the code is fine.
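A call site is then just an ordinary static method call; here is a hypothetical usage, assuming the class above initializes its native code successfully:

public class SimpleNativeDemoUsage { // hypothetical driver class, not part of the library
    public static void main(String[] args) {
        // multiply() runs the machine code emitted by writeNativeCode() above
        System.out.println(SimpleNativeDemo.multiply(6, 7)); // expected: 42
    }
}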


Perhaps the main problem here is that inside there is very complex code that needs to be maintained. It is part of jEmbryOS, a project to build an entire operating system in Java. And from the outside the project does not look very alive: it still lives on SourceForge (not on GitHub or any other popular modern hosting), and there is an empty forum whose latest message is from 2014. The final nail in the coffin is that the 2015 release on SourceForge has no license file, and you cannot use code without a license (default copyright rules apply, which effectively make such code read-only).


Okay, that didn't work out. But we could write our own in the future; the good news is that the idea is clear.


Intrinsics


There is an entire conference talk on this subject:



If conferences are worth attending at all, it is for talks like this one. Volker is on fire.
By the way, he will be at the next JBreak in Novosibirsk with another scorcher, this time about class data sharing.


In short, OpenJDK has the magic file src/share/vm/classfile/vmSymbols.hpp. Just open it in the browser via the link and you will understand everything. We can intercept specific methods and replace them with assembly. And then rebuild OpenJDK with these changes, of course.


IMHO, if you really push hard, you could do the following: write a preprocessor for .java files that catches calls to constructs like __asm("ret"), generates an intrinsics patch from them, and automatically rebuilds OpenJDK.
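To be clear, __asm is not a real Java construct here; it is a hypothetical marker that the imagined preprocessor would recognize. A source file using it might look roughly like this:

public class AsmMarkers {
    // Hypothetical no-op marker: the imagined preprocessor would strip these calls
    // and turn the method body into an intrinsics patch for OpenJDK.
    static void __asm(String code) { }

    static int answer() {
        __asm("mov eax, 42");
        __asm("ret");
        return 0; // placeholder body, used only until the intrinsic is installed
    }

    public static void main(String[] args) {
        System.out.println(answer()); // prints 0 until the rebuilt JVM substitutes the intrinsic
    }
}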


Why does this solution not appeal to me? First, changing an intrinsic means rebuilding a large part of OpenJDK. So you will be drinking a lot of tea, smoking, drowning your sorrows, and killing time in other ways while a laptop hot as a stove grinds through the C++.


Second, intrinsics do not work quite the same way as native methods. With regular JNI the JVM can always fall back to a safepoint, but with intrinsics that does not work. You will have to sweat a bit to avoid breaking something fatally.


And third, I suspect a normal person will not enjoy dealing with this part of OpenJDK. Most of the code there is serious black magic that is easy to get bogged down in.


And what about .NET?


It came as something of a shock that the .NET folks take a completely different approach. They may not wrap an assembler at all, but run native code directly from C#!


The idea was set by an example written back in 2005. Unfortunately, the code at that link no longer works, because it immediately gets beaten up by DEP. I had to modify it a bit by dragging in bits of kernel32.dll: via P/Invoke, grab VirtualAllocEx and the flags it needs, AllocationType and MemoryProtection. This is exactly the same trick we used in the C++ example.


To keep the example simple, let there be a method that returns the answer to the ultimate question of Life, the Universe, and Everything:


using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Program
{
    [DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
    static extern IntPtr VirtualAllocEx(IntPtr hProcess, IntPtr lpAddress, uint dwSize,
        AllocationType flAllocationType, MemoryProtection flProtect);

    [Flags]
    public enum AllocationType
    {
        Commit = 0x1000,
        Reserve = 0x2000,
        Decommit = 0x4000,
        Release = 0x8000,
        Reset = 0x80000,
        Physical = 0x400000,
        TopDown = 0x100000,
        WriteWatch = 0x200000,
        LargePages = 0x20000000
    }

    [Flags]
    public enum MemoryProtection
    {
        Execute = 0x10,
        ExecuteRead = 0x20,
        ExecuteReadWrite = 0x40,
        ExecuteWriteCopy = 0x80,
        NoAccess = 0x01,
        ReadOnly = 0x02,
        ReadWrite = 0x04,
        WriteCopy = 0x08,
        GuardModifierflag = 0x100,
        NoCacheModifierflag = 0x200,
        WriteCombineModifierflag = 0x400
    }

    private delegate int IntReturner();

    private static void Main()
    {
        List<byte> bodyBuilder = new List<byte>();
        bodyBuilder.Add(0xb8);                           // mov eax, imm32
        bodyBuilder.AddRange(BitConverter.GetBytes(42)); // the answer
        bodyBuilder.Add(0xc3);                           // ret
        byte[] body = bodyBuilder.ToArray();

        IntPtr buf = VirtualAllocEx(Process.GetCurrentProcess().Handle, (IntPtr) 0,
            Convert.ToUInt32(body.Length), AllocationType.Commit, MemoryProtection.ExecuteReadWrite);
        Marshal.Copy(body, 0, buf, body.Length);

        IntReturner ptr = (IntReturner) Marshal.GetDelegateForFunctionPointer(buf, typeof(IntReturner));
        Console.WriteLine(ptr());
        Console.ReadKey();
    }
}

If we suddenly need a smarter example with parameters, Marshal can allocate unmanaged memory with AllocHGlobal, free it with FreeHGlobal, and offers a whole pack of other methods on the same topic.


With superpowers like these you can get up to real mischief, for example, replace methods in classes. Before publishing this article I looked through a large number of projects on GitHub, but unfortunately none of these hacks could do without C++, unsafe, and, saddest of all, very voluminous code. So I won't write all of that up here and will save it for a separate article on hacks in .NET.


JVM Compiler Interface


Once you turn your brain in the right direction, it becomes clear that in Java the problem can be solved in a similar way. The thing is, Java 9 implements JEP 243: Java-Level JVM Compiler Interface.


The developers of this feature understood that the JIT compiler is a serious piece of software that would be better developed separately, using everything the Java ecosystem has to offer, such as good free IDEs. Not all of that is usable inside OpenJDK: typically you open its code in an IDE and everything lights up red, flagged as an error ten times over. A monolithic architecture has some justification for subsystems that need direct access to various internal mechanisms (the bytecode interpreter or the garbage collector need this, for example), but the compiler does not really care.


Hence the idea of splitting the compiler out into a separate entity, connected conveniently, as a plug-in enabled from the command line. And so JVMCI was born.


In short, at its simplest we have an interface like this:


interface JVMCICompiler {
    byte[] compileMethod(byte[] bytecode);
}

Java bytecode comes in, native code comes out. Much like what we saw above in C#. In a sense this is even better, because here such usage is not a dirty side effect but the main usage pattern.


In reality, the bytecode alone is not enough; what actually arrives is a CompilationRequest with additional fields:


interface JVMCICompiler {
    void compileMethod(CompilationRequest request);
}

interface CompilationRequest {
    JavaMethod getMethod();
}

interface JavaMethod {
    byte[] getCode();
    int getMaxLocals();
    int getMaxStackSize();
    ProfilingInfo getProfilingInfo();
    ...
}
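To make the shape concrete, here is a toy implementation written against the simplified interfaces above; it is only a sketch, not the real jdk.vm.ci API (which involves more types and returns a result object), and the actual code generation is left out:

// A sketch against the simplified JVMCICompiler/CompilationRequest/JavaMethod
// interfaces shown above; the real jdk.vm.ci API differs in the details.
class LoggingJVMCICompiler implements JVMCICompiler {
    @Override
    public void compileMethod(CompilationRequest request) {
        JavaMethod method = request.getMethod();
        byte[] bytecode = method.getCode();

        // A real compiler would translate the bytecode into machine code here,
        // guided by getProfilingInfo(), and then install the result in the VM.
        System.out.printf("asked to compile %d bytes of bytecode (maxLocals=%d, maxStack=%d)%n",
                bytecode.length, method.getMaxLocals(), method.getMaxStackSize());
    }
}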

One way or another, once you have compiled the bytecode, you can go ahead and install it using HotSpot.installCode(...).


Among other things, this approach can solve the original problem with intrinsics: the need to rebuild OpenJDK.


The problematic part is that writing your own JVMCI implementation is not a quick or easy task. Documentation for this feature is almost nonexistent; the only comprehensive documentation is the OpenJDK C++ code, which I really do not want to read.


But here, too, everything has already been stolen before us.


Graal and Truffle


In the dark, scary basements of Oracle Labs, several cool tools are being developed that will soon change the picture for everyone extending OpenJDK. These projects are united under the common name Graal and live in this repository on GitHub.


Among them:



Interestingly, Graal and Truffle provide their own JVMCI implementation, and this support is already in OpenJDK 9: you just enable the necessary flags (for example, -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI). Enabling the flags will not, of course, spare us from rebuilding Graal itself, but it shows how seriously the developers have taken this, and that it has all been tested and matured enough to become an official feature.


Chris Seaton has explained very well how Graal works. By the way, this article was written based on his talk at Joker 2017.


Now to the question of how well all this works and how applicable it is in practice. Christian Thalinger spoke at the same Joker and said that a significant part of Twitter has already been moved to Graal. Using it as the compiler turned out to be not only practical but also improved the performance of existing code by more than 10%.


In addition, there is JEP 317: Experimental Java-Based JIT Compiler, which is likely to be included in Java 10.


In this section I wanted to include a small, triumphant example showing how to use Graal for our purposes. Unfortunately, the example is still being written, and it looks like it will take a while. That is a topic for a separate article.


What is missing here


The following things were unfairly left out: VMStructs, Java Native Runtime (JNR-x86asm in this case), and Project Panama as a whole. In April, after apangin's talk, a spin-off article will be needed to cover these topics.


Conclusion


This article has been a review of ways to launch native code directly from Java.


This is only the first article in the series. What are the next steps?


First, there will be an interview with Christian Thalinger, who understands Graal like no one else. The interview will be published on Habr soon.


By the way, he is coming to JBreak 2018 in Novosibirsk with a new talk, "Graal: how to use the new JVM JIT compiler in real life"; it is worth going to.


Subsequent articles on this topic will need to dig into the architecture and organization of Graal and Truffle and show how simple changes can deliver quick results.


In addition, we can try to connect modern material with older works that have not lost their usefulness and that influenced the design of Graal. For example, I have already published one such article (on context-sensitive trace inlining) on Habr. A large amount of material has accumulated on related projects: for instance, current Graal developer Doug Simon used to work on Maxine VM, about which there is an impressive number of publications.



Source: https://habr.com/ru/post/347200/

