How to work with JIT


In some internal systems at Badoo, we use JIT to quickly search a large bitmap. It is a very interesting topic, though not a widely known one. To remedy that unfortunate situation, I translated a useful article by Eli Bendersky on what JIT is and how to use it.

I previously published an introductory article on libjit, aimed at programmers who are already at least somewhat familiar with JIT. In that post I described JIT only briefly. In this one I will give a fuller overview of JIT and supplement it with code samples that do not require any additional libraries.

JIT Definition

JIT is an acronym for “Just In Time”. On its own, this tells us nothing, and it sounds as if it has nothing to do with programming. I think the following description of JIT comes closest to the truth:

If a program, while it is running, creates and executes new executable code that was not part of the program as stored on disk, that is JIT.

But where did this name come from? Fortunately, John Aycock of the University of Calgary wrote a very interesting paper called “A Brief History of Just-In-Time”, which looks at JIT techniques from a historical point of view. According to the paper, the first mention of generating and executing code while a program is running appeared in 1960, in McCarthy's paper on LISP. In later works (for example, Thompson's 1968 paper on regular expressions) the approach is quite explicit: regular expressions are compiled into machine code and executed on the fly.

The term JIT itself was first brought into use by James Gosling in the literature on Java. Aycock says Gosling borrowed the term from manufacturing and began using it in the early 1990s. If you are interested in the details, read Aycock's paper. And now let's see how everything described above works in practice.

JIT: generate machine code and run it

It seems to me that JIT is easier to understand if you split it into two phases right away:

Phase 1: generating machine code while the program is running
Phase 2: executing that machine code while the program is running

The first phase accounts for 99% of the total complexity of JIT. At the same time, it is the least mysterious part of the process, because it is exactly what a regular compiler does. Well-known compilers such as gcc and clang/LLVM translate C/C++ source code into machine code. The machine code is then usually saved to a file, but there is no reason it cannot simply stay in memory (in fact, both gcc and clang/LLVM have ready-made facilities for keeping code in memory for use in JIT). In this article, however, I would like to focus on the second phase.

Execution of the generated code

Modern operating systems are very picky about what a program is allowed to do while it runs. The days of the wild west ended with the advent of protected mode, which lets the operating system assign different permissions to different chunks of a process's memory. That is, in “normal” mode you can allocate memory on the heap, but you cannot simply execute code that lives on the heap without first explicitly asking the OS.

I hope everyone understands that machine code is just data, a sequence of bytes. Like this, for example:

unsigned char code[] = {0x48, 0x89, 0xf8};

To some, these three bytes are just three bytes. To others, they are the binary representation of valid x86-64 code:

 mov %rdi, %rax

Putting this machine code into memory is very easy. But how do we make it executable and then actually execute it?

Look at the code

The code samples in the rest of this article target a POSIX-compatible UNIX operating system (specifically, Linux). On other operating systems (such as Windows) the code will differ in the details, but not in the approach. All modern operating systems have convenient APIs for doing the same thing.

Without further ado, let's see how to dynamically create a function in memory and execute it. The function is intentionally very simple. In C, it looks like this:

 long add4(long num) { return num + 4; }

Here is the first attempt (the full source with the Makefile is available in the repository):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    // Allocates RWX memory of the given size and returns a pointer to it.
    // On failure, prints the error and returns NULL.
    void* alloc_executable_memory(size_t size) {
      void* ptr = mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (ptr == MAP_FAILED) {
        perror("mmap");
        return NULL;
      }
      return ptr;
    }

    void emit_code_into_memory(unsigned char* m) {
      unsigned char code[] = {
        0x48, 0x89, 0xf8,        // mov %rdi, %rax
        0x48, 0x83, 0xc0, 0x04,  // add $4, %rax
        0xc3                     // ret
      };
      memcpy(m, code, sizeof(code));
    }

    const size_t SIZE = 1024;
    typedef long (*JittedFunc)(long);

    // Allocates RWX memory directly, emits the code into it and runs it.
    void run_from_rwx() {
      void* m = alloc_executable_memory(SIZE);
      if (m == NULL) return;  // allocation failed, error already printed
      emit_code_into_memory(m);

      JittedFunc func = m;
      long result = func(2);
      printf("result = %ld\n", result);
    }

This code performs three main steps:

Use mmap to allocate a chunk of memory on the heap that can be written to, read from and executed.
Copy the machine code that implements add4 into this memory.
Execute the code in this memory by casting the pointer to a function pointer and calling through it.

Note that the third step is possible only when the chunk of memory holding the machine code has execute permission. Without the necessary permission, the call would trigger an OS error (most likely a segmentation fault). This is what happens if, for example, we allocate m with a plain call to malloc, which returns memory that is readable and writable (RW) but not executable (X).

Let's take a break for a moment: heap, malloc and mmap

Attentive readers may have noticed that I referred to the memory allocated by mmap as “heap memory”. Strictly speaking, “heap” is the name of the memory source used by malloc, free and related functions, as opposed to the stack, which the compiler manages directly.

But it is not that simple. :-) Traditionally (that is, a long time ago) malloc used only one source for the memory it handed out (the sbrk system call), but nowadays most malloc implementations use mmap in many cases. The details differ between operating systems and implementations, but usually mmap is used for large chunks of memory and sbrk for small ones. The tradeoff comes down to the relative efficiency of the two ways of obtaining memory from the operating system.

So calling the memory obtained from mmap “heap memory” is not a mistake, in my opinion, and I am going to keep using that name.

We care about security

The code above has a serious vulnerability: the block of RWX memory it allocates is a paradise for exploits. Let's be a little more responsible. Here is the slightly modified code:

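    // Allocates RW memory of the given size and returns a pointer to it.
    // On failure, prints the error and returns NULL. Unlike malloc, the
    // memory is allocated on a page boundary, so it is suitable for mprotect.
    void* alloc_writable_memory(size_t size) {
      void* ptr = mmap(0, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (ptr == MAP_FAILED) {
        perror("mmap");
        return NULL;
      }
      return ptr;
    }

    // Sets RX permission on the given memory, which must be page-aligned.
    // Returns 0 on success. On failure, prints the error and returns -1.
    int make_memory_executable(void* m, size_t size) {
      if (mprotect(m, size, PROT_READ | PROT_EXEC) == -1) {
        perror("mprotect");
        return -1;
      }
      return 0;
    }

    // Allocates RW memory, emits the code into it and switches it to RX
    // before executing.
    void emit_to_rw_run_from_rx() {
      void* m = alloc_writable_memory(SIZE);
      if (m == NULL) return;  // allocation failed, error already printed
      emit_code_into_memory(m);
      if (make_memory_executable(m, SIZE) == -1) return;

      JittedFunc func = m;
      long result = func(2);
      printf("result = %ld\n", result);
    }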

This example is equivalent to the previous one in all respects but one: the memory is first allocated with RW permissions (just like with a plain malloc). Those permissions are enough for us to write our piece of code there. Once the code is in memory, we use mprotect to change the permissions from RW to RX, which removes write access. The end result is the same, but at no point is our memory both writable and executable. This is good and proper from a security point of view.

What about malloc?

Could we use malloc instead of mmap to allocate memory in the previous code? After all, RW memory is exactly what malloc gives us. Yes, we could, but it brings more problems than conveniences. The thing is that permissions can only be set on whole pages. So if we allocated memory with malloc, we would have to make sure ourselves that the memory is aligned to a page boundary. mmap takes care of this for us, because it always returns page-aligned memory (by definition, mmap works only with whole pages).

Summing up

This article began with a general overview of JIT and of what we actually mean when we say “JIT”, and ended with code samples that demonstrate how to dynamically execute a piece of machine code from memory. The techniques presented here are essentially how real JIT systems (LLVM or libjit) do it. All that remains is the “simple” part: generating machine code from some other representation.

LLVM contains a full-fledged compiler, so it can translate C and C++ code (via LLVM IR) into machine code on the fly and execute it. libjit works at a much lower level: it can serve as a backend for a compiler. My introductory article on libjit demonstrates how to generate and execute non-trivial code using this library. But JIT is a much more general concept. You can generate code on the fly for data structures, for regular expressions, and even for accessing C from the virtual machines of various languages. I dug through the archives of my blog and found a mention of JIT in an article from eight years ago. It is about Perl code that generates other Perl code on the fly (from an XML file with a description), but the idea is the same.

That is why I believe it is important to describe JIT by separating the two phases. For the second phase (the one I described in this article), the implementation is fairly trivial and uses standard operating system APIs. For the first phase, the possibilities are endless, and what exactly goes into it ultimately depends on the specific application you are developing.

Source: https://habr.com/ru/post/321378/
