
LLVM Overview

LLVM (Low Level Virtual Machine) is a universal system for analyzing, transforming, and optimizing programs or, as its developers call it, a “compiler infrastructure”.

LLVM is not just another academic project. Its history began in 2000 at the University of Illinois, and now LLVM is used by industry giants such as Apple and Adobe. In particular, the OpenGL subsystem in Mac OS X 10.5 is built on LLVM, and the iPhone SDK uses GCC with an LLVM backend. Apple is one of the project's main sponsors, and LLVM's mastermind, Chris Lattner, now works at Apple.

LLVM is built around an intermediate representation (IR) of code, on which transformations can be performed during compilation, linking, and execution. From this representation, optimized machine code is generated for a number of platforms, both statically and dynamically (JIT compilation). LLVM supports code generation for x86, x86-64, ARM, PowerPC, SPARC, MIPS, IA-64, and Alpha.
LLVM is written in C++ and has been ported to most *nix systems and Windows. The system has a modular structure and can be extended with additional transformation algorithms (compiler passes) and code generators for new hardware platforms. A user frontend typically links against LLVM and uses the C++ API to generate and transform code. However, LLVM also includes standalone utilities.

For those who think, not without reason, that C++ is not the best language for writing compilers, an OCaml wrapper for the API has recently been included in LLVM.

To understand what can be done with LLVM, and at what level you will have to work, let's look at what LLVM IR is. In one sentence, it can be described as typed three-address code in SSA form.

Hereinafter we will use the textual form of the intermediate code, a kind of LLVM “assembly”. In practice, a compact binary representation (bitcode) is used to store the code. And generating code is usually most convenient neither in textual nor in binary form, but through a dedicated API. In that case things may never even reach bitcode: the code is built as in-memory data structures, on which all operations, up to machine-code generation, are performed.
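To give a feel for that API, here is a minimal C++ sketch (assuming a reasonably recent LLVM; header paths and method names have shifted between versions) that builds a trivial function as in-memory structures and prints its textual IR, never producing bitcode at all:

// A minimal sketch: build a function as in-memory IR and print it.
// Assumes a recent LLVM; APIs vary between versions.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("demo", Ctx);

  // int add5(int n) { return n + 5; }
  FunctionType *FT =
      FunctionType::get(Type::getInt32Ty(Ctx), {Type::getInt32Ty(Ctx)}, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "add5", &M);

  IRBuilder<> B(BasicBlock::Create(Ctx, "entry", F));
  B.CreateRet(B.CreateAdd(F->getArg(0), B.getInt32(5), "sum"));

  verifyModule(M, &errs());  // sanity-check the generated IR
  M.print(outs(), nullptr);  // textual "assembly"; bitcode is never produced
  return 0;
}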


Data types


The following primitive types are supported in LLVM: integer types of arbitrary bit width (i1, i8, i16, i32, i64, ...), floating-point types (float, double, and several platform-specific ones such as x86_fp80), and the auxiliary types void and label.
Derived types are: pointers (i32*), arrays ([16 x i32]), structures ({i32, double}), vectors (<4 x float>), and functions (i32 (i8*)).
The type system is recursive, so multidimensional arrays, arrays of structures, pointers to structures and functions, and so on can all be used.
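As an illustration of this recursiveness through the C++ API (a sketch; the variable names are ours), the nested type { i32, [4 x i8*] } can be assembled piece by piece:

// Sketch: programmatically building the nested type { i32, [4 x i8*] }.
LLVMContext Ctx;
Type *I32 = Type::getInt32Ty(Ctx);
Type *I8Ptr = PointerType::get(Type::getInt8Ty(Ctx), 0);      // i8*
Type *Arr = ArrayType::get(I8Ptr, 4);                         // [4 x i8*]
StructType *S = StructType::get(Ctx, {I32, Arr});             // { i32, [4 x i8*] }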


Operations


Most LLVM instructions take two arguments (operands) and return a single value, hence “three-address code”. Values are identified by textual names: local values are prefixed with %, global ones with @. Local values are also called registers, and LLVM itself can be thought of as a virtual machine with an infinite number of registers. Example:
%sum = add i32 %n, 5
%diff = sub double %a, %b
%z = add <4 x float> %v1, %v2        ; vector addition
%cond = icmp eq i32 %x, %y           ; comparison; the result has type i1
%success = call i32 @puts(i8* %str)
The type of the operands is always specified explicitly and uniquely determines the type of the result. The operands of an arithmetic instruction must have the same type, but the instructions themselves are “overloaded” for all numeric types and vectors.

Instead of a tedious enumeration of the instructions (there are 52 of them in total), suffice it to say that LLVM supports the full set of arithmetic operations, bitwise logical and shift operations, as well as special instructions for working with vectors.

LLVM IR is strongly typed, so you cannot get by without type conversions, which are encoded explicitly with special instructions. A set of 9 instructions covers all casts between numeric types: integer and floating-point, signed and unsigned, of different bit widths (trunc, zext, sext, fptrunc, fpext, fptoui, fptosi, uitofp, sitofp). In addition, there are instructions for converting between integers and pointers, as well as a bitcast instruction that reinterprets anything as anything, with the result left on your conscience.
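A sketch of a few of these casts through the C++ builder (continuing the assumptions above; the operand names Wide and SomePtr and the type I8PtrTy are illustrative values created earlier):

// Sketch: explicit numeric and pointer casts via IRBuilder.
Value *Short = B.CreateTrunc(Wide, B.getInt16Ty());    // i32 -> i16
Value *Back  = B.CreateSExt(Short, B.getInt32Ty());    // sign-extend to i32
Value *AsFP  = B.CreateSIToFP(Back, B.getDoubleTy());  // signed int -> double
Value *Raw   = B.CreateBitCast(SomePtr, I8PtrTy);      // reinterpret a pointer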

Now it should be clear how simple expressions are compiled in LLVM: each node of the expression tree, except the leaves (constants and variables), is replaced with an intermediate register value: the result of an instruction whose operands are the child nodes.
; x = (a + b) * c - d / e
%tmp1 = add float %a, %b
%tmp2 = mul float %tmp1, %c
%tmp3 = fdiv float %d, %e
%x = sub float %tmp2, %tmp3
Looking ahead, and beyond the scope of this article, we note that when using the LLVM API to generate code everything becomes even simpler, because the API follows the principle “an instruction is a value”. We do not have to invent unique names for intermediate values: the function that creates an instruction returns a value (a C++ object) that can be passed as an argument to other such functions.
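For example, the expression above could be built roughly like this (a sketch: A, Bv, C, D, E are assumed float-typed Value* obtained earlier, and B is an IRBuilder positioned in a block):

// Sketch: x = (a + b) * c - d / e, with no temporary names to invent.
Value *Sum  = B.CreateFAdd(A, Bv, "sum");     // a + b
Value *Prod = B.CreateFMul(Sum, C, "prod");   // (a + b) * c
Value *Quot = B.CreateFDiv(D, E, "quot");     // d / e
Value *X    = B.CreateFSub(Prod, Quot, "x");  // final value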


SSA


SSA (static single assignment form) is a form of intermediate code in which every value is assigned exactly once. Thus, it is impossible to write:
%z = add i32 %x, %y
%z = add i32 %z, 5
The new value should get a new name:
%z.1 = add i32 %z, 5
However, you may ask: what if the same variable must receive different values depending on some condition? And how do you implement a loop variable?

Let's start from a little way off. It is convenient to view code in SSA form not as a linear sequence of instructions, but as a control flow graph (CFG). The vertices of this graph are the so-called basic blocks, each containing a sequence of instructions that ends with a terminator instruction explicitly transferring control to another block. Basic blocks in LLVM are labeled, and the terminators are the following instructions: ret, br (conditional and unconditional), switch, invoke, unwind, and unreachable.
Let us return to the question of how to make variables mutable. In the SSA form, a special function φ is added alongside the ordinary instructions; it returns one of the listed values depending on which block transferred control to the current one.

In LLVM, this function corresponds to the phi instruction, which has the following form:
phi <type> [<value_1>, label <label_1>], ..., [<value_N>, label <label_N>]
As an example, consider the factorial calculation function, which could be written in C as follows:
int factorial(int n)
{
    int result = n;
    int i;
    for (i = n - 1; i > 1; --i)
        result *= i;
    return result;
}
Note: the block that execution enters at the start of a function is referred to as %0.
define i32 @factorial(i32 %n)
{
    %i.start = sub i32 %n, 1
    br label %LoopEntry
LoopEntry:
    %result = phi i32 [%n, %0], [%result.next, %LoopBody]
    %i = phi i32 [%i.start, %0], [%i.next, %LoopBody]
    %cond = icmp sle i32 %i, 1
    br i1 %cond, label %Exit, label %LoopBody
LoopBody:
    %result.next = mul i32 %result, %i
    %i.next = sub i32 %i, 1
    br label %LoopEntry
Exit:
    ret i32 %result
}
Do not be confused by the seemingly pointless jumps to a label that immediately follows the branch instruction. As we have said, a basic block must end with an explicit transfer of control. LLVM also requires that all phi instructions come at the very beginning of a block, with no other instructions before them.
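Through the C++ API a phi like %result is created in two steps, which conveniently sidesteps the problem of referencing values “from the future” (a sketch; Entry, LoopEntry and LoopBody are assumed BasicBlock* created earlier, N is the %n argument):

// Sketch: creating the %result phi of the factorial example.
B.SetInsertPoint(LoopEntry, LoopEntry->begin());  // phis must lead the block
PHINode *Result = B.CreatePHI(Type::getInt32Ty(Ctx), 2, "result");
Result->addIncoming(N, Entry);                    // %n when entering the loop
// ... generate the loop body, producing ResultNext ...
Result->addIncoming(ResultNext, LoopBody);        // %result.next on the back-edge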
At this point many will probably exclaim: but this is terribly inconvenient! Indeed, although the SSA form enables many useful transformations, generating it directly from code in an imperative language is difficult, even though well-known algorithms for converting code into SSA exist. Fortunately, when writing a compiler on top of LLVM there is no need to do this, because the system can build the SSA form itself. How, and from what, we will now find out.


Memory


In addition to register values, LLVM can also work with memory. Values in memory are addressed by typed pointers, which we discussed above. Memory can be accessed only through two instructions whose names speak for themselves: load and store. For example:
%x = load i32* %x.ptr        ; load an i32 value from %x.ptr
%tmp = add i32 %x, 5         ; add 5 to it
store i32 %tmp, i32* %x.ptr  ; store the result back
But to use pointers, you need to somehow allocate memory for the values they point to.

The malloc instruction is translated into a call to the system function of the same name and allocates memory on the heap, returning a pointer of the appropriate type. It is, of course, accompanied by a free instruction.
%struct.ptr = malloc { double, double }
%string = malloc i8, i32 %length
%array = malloc [16 x i32]
free i8* %string
There is no official recommendation against using the malloc instruction, but the developers admit that there is little point in its existence nowadays. You can call the @malloc function directly, or write your own allocator that meets your particular requirements.

The alloca instruction, on the other hand, is irreplaceable and very important. It has the same format, but allocates memory on the stack.
%x.ptr = alloca double        ; %x.ptr has type double*
%array = alloca float, i32 8  ; %array has type float*, not [8 x float]*!
Memory allocated by alloca is automatically freed on exit from the function via the ret or unwind instructions.
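The same allocations through the builder might look as follows (a sketch, continuing with the assumed IRBuilder B from the earlier examples):

// Sketch: stack allocation and memory access via the C++ API.
Value *XPtr = B.CreateAlloca(B.getDoubleTy(), nullptr, "x.ptr");
Value *Arr  = B.CreateAlloca(B.getFloatTy(), B.getInt32(8), "array");
Value *X    = B.CreateLoad(B.getDoubleTy(), XPtr, "x");   // load a double
B.CreateStore(X, XPtr);                                   // and store it back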
With alloca, load and store, we can use local variables just as in any imperative language. For example, our long-suffering factorial function:
define i32 @factorial(i32 %n)
{
    %result.ptr = alloca i32                    ; the result variable
    %i.ptr = alloca i32                         ; the i variable
    store i32 %n, i32* %result.ptr              ; result = n
    %tmp1 = sub i32 %n, 1
    store i32 %tmp1, i32* %i.ptr                ; i = n - 1
    br label %Loop
Loop:
    %i = load i32* %i.ptr                       ; load i
    %cond = icmp sle i32 %i, 1                  ; i <= 1
    br i1 %cond, label %Exit, label %LoopBody   ; exit if so
LoopBody:
    %tmp2 = load i32* %result.ptr
    %tmp3 = mul i32 %tmp2, %i
    store i32 %tmp3, i32* %result.ptr           ; result *= i
    %i.next = sub i32 %i, 1
    store i32 %i.next, i32* %i.ptr              ; --i
    br label %Loop
Exit:
    %result = load i32* %result.ptr
    ret i32 %result                             ; return result
}
Verbose enough, but tell me, where else but in an article like this would you ever write LLVM code by hand? :-)
The good news is that LLVM can build the SSA form from such code itself. The pass that does this is called “promote memory to register”. Here is what comes out of the factorial function after it runs:
define i32 @factorial(i32 %n) {
; <label>:0
    %tmp1 = sub i32 %n, 1
    br label %Loop
Loop:
    %i.ptr.0 = phi i32 [ %tmp1, %0 ], [ %i.next, %LoopBody ]
    %result.ptr.0 = phi i32 [ %n, %0 ], [ %tmp3, %LoopBody ]
    %cond = icmp sle i32 %i.ptr.0, 1
    br i1 %cond, label %Exit, label %LoopBody
LoopBody:
    %tmp3 = mul i32 %result.ptr.0, %i.ptr.0
    %i.next = sub i32 %i.ptr.0, 1
    br label %Loop
Exit:
    ret i32 %result.ptr.0
}
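On a .ll file this transformation can be applied with the standalone opt utility (in recent versions, opt -passes=mem2reg); in-process, the same thing is available as a utility function. A hedged C++ sketch, assuming a recent LLVM:

// Sketch: promoting promotable allocas of a function to SSA registers.
#include "llvm/IR/Dominators.h"
#include "llvm/Transforms/Utils/PromoteMemToReg.h"
#include <vector>

void promoteLocals(llvm::Function &F) {
  llvm::DominatorTree DT(F);
  std::vector<llvm::AllocaInst *> Allocas;
  for (llvm::Instruction &I : F.getEntryBlock())
    if (auto *AI = llvm::dyn_cast<llvm::AllocaInst>(&I))
      if (llvm::isAllocaPromotable(AI))   // e.g. its address never escapes
        Allocas.push_back(AI);
  if (!Allocas.empty())
    llvm::PromoteMemToReg(Allocas, DT);
}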

Pointer Operations


Arrays in LLVM are very similar to C arrays, but there is no address arithmetic as in C. That is, you cannot write:
%ptr = add i32* %array, i32 %index
To compute the addresses of elements of arrays, structures, and so on, with correct typing, there is a special instruction: getelementptr.
%array = alloca i32, i32 %size
%ptr = getelementptr i32* %array, i32 %index ; %ptr has type i32*
getelementptr only computes an address and does not access memory. The instruction takes an arbitrary number of indices and can drill into aggregates of any nesting depth. For example, from the following C code:
struct s {
    int n;
    char a[4][8];
};
struct s *s = ...;
char c = s->a[2][5];
the following sequence of instructions will be generated:
%ptr = getelementptr { i32, [4 x [8 x i8]] }* %s, i32 0, i32 1, i32 2, i32 5
%c = load i8* %ptr
As you can see, indexes are counted from zero.
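Through the C++ API the same address computation is a single call (a sketch; StructTy and S stand for the struct type and the pointer value assumed built earlier):

// Sketch: computing the address of s->a[2][5] and loading the char.
Value *Idx[] = {B.getInt32(0), B.getInt32(1), B.getInt32(2), B.getInt32(5)};
Value *Ptr = B.CreateGEP(StructTy, S, Idx, "ptr");
Value *C   = B.CreateLoad(B.getInt8Ty(), Ptr, "c");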
There is also a pair of instructions, extractvalue and insertvalue, which are very similar to getelementptr. They differ in that they take not a pointer to an aggregate type (array or structure) but a value of that type itself. extractvalue returns the value of the corresponding sub-element rather than a pointer to it, and insertvalue produces a new value of the aggregate type.
%n = extractvalue { i32, [4 x [8 x i8]] } %s, 0
%tmp = add i32 %n, 1
%s.1 = insertvalue { i32, [4 x [8 x i8]] } %s, i32 %tmp, 0

Built-in functions and annotations


A number of primitives are represented in LLVM not as instructions but as built-in (intrinsic) functions, for example some mathematical functions: @llvm.sqrt.*, @llvm.sin.*, etc. There are also primitives for atomic operations, and a few others.

It is interesting that calls to these functions in the intermediate code are not at all obliged to turn into function calls in the machine code, or even into inline substitution. They may simply carry service information for some compiler subsystem. For example, the generation of debug information in the DWARF format is organized this way: calls to @llvm.dbg.stoppoint (which records the correspondence between source lines and generated code), @llvm.dbg.declare (which describes a local variable), and others are inserted into the IR, with pointers to special structures passed as their arguments.

Garbage collection support is implemented similarly. LLVM does not contain any garbage collection algorithm; instead it provides an interface for writing your own accurate GC. The @llvm.gcroot, @llvm.gcread and @llvm.gcwrite primitives let you encode the information the GC needs, while a plugin interface lets the LLVM compiler generate the required data structures from this information and insert calls into the runtime.


What the LLVM optimizer can do


We will just list some of the algorithms: constant propagation, dead code elimination, global value numbering, function inlining, promotion of memory to registers (the mem2reg pass seen above), loop-invariant code motion, loop unrolling, tail call elimination, instruction combining, and more. All of them are platform-independent and transform the IR. Optimizer passes can also be invoked independently, in the desired order.
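To illustrate, a hedged sketch of running a standard pipeline in-process with the new pass manager (class names are from recent LLVM; older versions used the llvm::legacy pass managers instead):

// Sketch: running an -O2-like pipeline over a module in-process.
#include "llvm/Passes/PassBuilder.h"

void optimize(llvm::Module &M) {
  llvm::LoopAnalysisManager LAM;
  llvm::FunctionAnalysisManager FAM;
  llvm::CGSCCAnalysisManager CGAM;
  llvm::ModuleAnalysisManager MAM;

  llvm::PassBuilder PB;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  // A standard pipeline; individual passes can be scheduled instead.
  llvm::ModulePassManager MPM =
      PB.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
  MPM.run(M, MAM);
}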
Passes can be used not only for optimization but also for analysis and instrumentation; for example, LLVM can emit a function's CFG in Graphviz format.

Instead of conclusion


From this brief overview it should be clear that LLVM's intermediate representation maps closely onto code in low-level procedural languages like C. A C translator based on LLVM is fairly simple and straightforward to write, and at the same time the performance of the machine code it generates can rival that of the latest versions of GCC.

When translating high-level languages (object-oriented, functional, dynamic), you will have to perform many more intermediate transformations and also write a specialized runtime. But in this case LLVM relieves the compiler developer of the problems of generating code for a specific platform and takes on most language-independent optimizations, doing them well. In addition, we get a ready-made infrastructure for JIT compilation and the possibility of link-time optimization across different languages compiled to LLVM.

LLVM tries to strike a balance between convenience and flexibility, without imposing any particular programming paradigm and without restricting the type system.

A full-fledged frontend exists today only for C, C++, Ada and Fortran: llvm-gcc. Work is under way on clang, a C/C++ compiler independent of GCC. Both projects are supported by the core LLVM team.

Compilers for other well-known languages built on LLVM have not yet reached the level of practical applicability, but the prospects are tempting. Whether we will see a modern functional or dynamic (PyPy?) compiler on LLVM, time will tell.

to be continued…

Source: https://habr.com/ru/post/47878/

