
Tail recursion optimization in Java

For a long time now, ideas from the world of functional programming have been steadily making their way into non-functional languages. It may seem strange that lambda expressions managed to get into Java while tail recursion optimization (converting recursion into an equivalent loop) still has not, even though it is much simpler. Why not?

Let's try to figure out the reasons and see what you can do with your own hands.

First of all, an example. I propose not to torment the unfortunate factorial this time and use another function:

    static int add(int x, int y) {
        if (y == 0) return x;
        return add(x ^ y, (x & y) << 1);
    }

This is a recursive way of adding two integers. It fits the definition of tail recursion: every recursive call is immediately followed by a return. The optimization consists of not creating a new stack frame for the recursive call and reusing the current one instead: put the new arguments in place of the old parameters and make an unconditional jump to the first instruction of the method. In pseudocode it looks like this:
    static int add(int x, int y) {
        start: {
            if (y == 0) return x;
            (x, y) = (x ^ y, (x & y) << 1);
            goto start;
        }
    }

Various stylistic variations are possible from here, but in general the code after manual optimization ends up like this (it could be written more elegantly; the verbosity is only for clarity):

    static int add(int x, int y) {
        while (true) {
            if (y == 0) return x;
            int _x = x ^ y;
            int _y = (x & y) << 1;
            x = _x;
            y = _y;
        }
    }

The downside of optimizing tail recursion by hand is immediately obvious: the code gets worse, especially when there is more than one recursive call. It would be much nicer to optimize automatically.

And everything would be fine, but there is one big BUT: the optimization breaks the semantics of program execution. In Java, information about the call stack is available at any point in the code. And while the JIT is still able to sort everything out on its own when it inlines methods, once we replace a recursive call with a GOTO we lose a whole series of stack frames, together with the entry-point information that the specification says should be there.
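
To make this concrete, here is a small illustration (a hypothetical helper, not code from the agent): a method that recurses and then reports how many frames it sees on the stack.

    public class StackDepthDemo {
        // Recurses n times, then reports how many frames are on the stack.
        // Without the optimization the recursion contributes roughly n frames;
        // with the tail call turned into a GOTO, only one frame would remain.
        static int depth(int n) {
            if (n == 0) {
                return Thread.currentThread().getStackTrace().length;
            }
            return depth(n - 1);
        }

        public static void main(String[] args) {
            System.out.println(depth(1000));
        }
    }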

This is unpleasant and suggests that, most likely, we will not see this optimization in Java.

To continue the investigation, let's accept that several lines will disappear from the stack trace and assume that the gain in speed (or the beauty of the code) matters more. Let's note the other factors that can get in the way of the optimization:

  - the recursive call is located inside a try block;
  - the method can be overridden in subclasses (this is determined by its access modifiers).


There are at least two proven ways to modify the bytecode of Java classes:

  - load the modified classes through a custom class loader;
  - use a Java agent (the java.lang.instrument API).


I chose the second option and wrote a simple Java agent that optimizes tail recursion. It can only perform the optimization under the conditions listed above.
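
For reference, here is roughly how such an agent hooks into class loading: a minimal sketch of the java.lang.instrument entry point, not the actual code from the repository.

    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.Instrumentation;
    import java.security.ProtectionDomain;

    public class TailRecAgent {
        // Called by the JVM before main() when started with -javaagent:agent.jar;
        // the jar's manifest must contain a Premain-Class entry pointing here.
        public static void premain(String args, Instrumentation inst) {
            inst.addTransformer(new ClassFileTransformer() {
                @Override
                public byte[] transform(ClassLoader loader, String className,
                                        Class<?> classBeingRedefined,
                                        ProtectionDomain protectionDomain,
                                        byte[] classfileBuffer) {
                    // Rewrite tail-recursive calls here and return the new bytes,
                    // or return null to leave the class unchanged.
                    return null;
                }
            });
        }
    }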

For the impatient, here is the link to GitHub: github.com/ibessonov/java-tailrec-agent .
It contains a configured IDEA project with examples to play with. And for the patient, a short explanation with an example follows.

The check of access modifiers needs no separate explanation, since its implementation is obvious. So let's skip it, take a typical method, and see what happens to it:

    static int gcd(int n, int m) {
        try {
            if (m == 0) return n;
        } catch (Throwable t) {
            // do nothing
        }
        return gcd(m, n % m);
    }

After compilation, the method will look as follows (simplified for readability, some of the information is intentionally omitted):

    static gcd(II)I
      TRYCATCHBLOCK TryBlockStart TryBlockEnd CatchBlockStart java/lang/Throwable
    TryBlockStart:
      ILOAD 1
      IFNE ElseBlock
      ILOAD 0
    TryBlockEnd:
      IRETURN
    CatchBlockStart:
      ASTORE 2    // the caught exception is stored and ignored, execution falls through
    ElseBlock:
      ILOAD 1
      ILOAD 0
      ILOAD 1
      IREM
      INVOKESTATIC Main.gcd(II)I
      IRETURN

Every method contains a table of try block descriptions. Each entry has four components: the start of the block, the end of the block, the exception handler, and the exception class descriptor. This information lets us unambiguously determine which instructions are inside try blocks and refrain from optimizing them.
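
In terms of the ASM tree API (which I will use for the sketches below; the actual project code may look different), collecting the instructions covered by try blocks might look like this:

    import org.objectweb.asm.tree.AbstractInsnNode;
    import org.objectweb.asm.tree.MethodNode;
    import org.objectweb.asm.tree.TryCatchBlockNode;
    import java.util.HashSet;
    import java.util.Set;

    class TryBlockScanner {
        // Returns the instructions that lie inside at least one try block
        // and therefore must not take part in the optimization.
        static Set<AbstractInsnNode> instructionsInsideTryBlocks(MethodNode method) {
            Set<AbstractInsnNode> covered = new HashSet<>();
            for (TryCatchBlockNode block : method.tryCatchBlocks) {
                // block.start and block.end are the labels delimiting the protected range
                for (AbstractInsnNode insn = block.start; insn != block.end; insn = insn.getNext()) {
                    covered.add(insn);
                }
            }
            return covered;
        }
    }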

Next, we need to find all INVOKE* instructions whose owner and descriptor match the method itself (in this case we are looking for the gcd(II)I method of the Main class) and that are immediately followed by a RETURN instruction.
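
A sketch of that search in the same ASM tree API terms (again illustrative rather than the project's actual code):

    import org.objectweb.asm.Opcodes;
    import org.objectweb.asm.tree.AbstractInsnNode;
    import org.objectweb.asm.tree.MethodInsnNode;
    import org.objectweb.asm.tree.MethodNode;
    import java.util.ArrayList;
    import java.util.List;

    class TailCallFinder {
        // Finds self-calls that are immediately followed by a value-returning
        // *RETURN instruction. For simplicity, pseudo-instructions (labels,
        // line numbers) between the call and the return are not skipped here.
        static List<MethodInsnNode> findTailCalls(String owner, MethodNode method) {
            List<MethodInsnNode> result = new ArrayList<>();
            for (AbstractInsnNode insn : method.instructions.toArray()) {
                if (!(insn instanceof MethodInsnNode)) continue;
                MethodInsnNode call = (MethodInsnNode) insn;
                boolean selfCall = call.owner.equals(owner)
                        && call.name.equals(method.name)
                        && call.desc.equals(method.desc);
                AbstractInsnNode next = call.getNext();
                boolean returnsRightAway = next != null
                        && next.getOpcode() >= Opcodes.IRETURN
                        && next.getOpcode() <= Opcodes.ARETURN;
                if (selfCall && returnsRightAway) {
                    result.add(call);
                }
            }
            return result;
        }
    }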

Each INVOKESTATIC found this way must be converted from a call into an unconditional jump. We know that at the moment of the call all the arguments are already on the stack. For static methods everything is simple: store those arguments back into the parameters' local variable slots and jump unconditionally to the very beginning:

    static gcd(II)I
      TRYCATCHBLOCK TryBlockStart TryBlockEnd CatchBlockStart java/lang/Throwable
    StartLabel:
    TryBlockStart:
      ILOAD 1
      IFNE ElseBlock
      ILOAD 0
    TryBlockEnd:
      IRETURN
    CatchBlockStart:
      ASTORE 2
    ElseBlock:
      ILOAD 1
      ILOAD 0
      ILOAD 1
      IREM
      ISTORE 1
      ISTORE 0
      GOTO StartLabel
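
Expressed with the ASM tree API, the rewrite itself boils down to roughly the following. This is a simplified sketch that assumes a static method whose parameters are all ints, as in gcd; a real implementation has to pick load/store opcodes and slot sizes from the method descriptor, and the stack map frames usually need to be recomputed when the class is written back (for example with ClassWriter.COMPUTE_FRAMES).

    import org.objectweb.asm.Opcodes;
    import org.objectweb.asm.tree.InsnList;
    import org.objectweb.asm.tree.JumpInsnNode;
    import org.objectweb.asm.tree.LabelNode;
    import org.objectweb.asm.tree.MethodInsnNode;
    import org.objectweb.asm.tree.MethodNode;
    import org.objectweb.asm.tree.VarInsnNode;

    class TailCallRewriter {
        // Replaces "INVOKESTATIC self; IRETURN" with stores into the parameter
        // slots followed by a GOTO to 'start', a label that has already been
        // inserted at the very beginning of method.instructions.
        static void rewrite(MethodNode method, MethodInsnNode call, LabelNode start, int paramCount) {
            InsnList patch = new InsnList();
            // The arguments sit on the stack in declaration order, so they are
            // popped in reverse: the last parameter is stored first.
            for (int slot = paramCount - 1; slot >= 0; slot--) {
                patch.add(new VarInsnNode(Opcodes.ISTORE, slot));
            }
            patch.add(new JumpInsnNode(Opcodes.GOTO, start));

            method.instructions.remove(call.getNext()); // drop the IRETURN
            method.instructions.insert(call, patch);    // put the stores and the GOTO after the call...
            method.instructions.remove(call);           // ...and drop the call itself
        }
    }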

For non-static methods an interesting problem arises: the method is invoked on an object which, generally speaking, does not have to coincide with this. It is technically possible to find the place in the bytecode where that receiver is computed and perform the optimization only when the computation is just ALOAD 0.
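
A hypothetical source-level example of the problem (not taken from the article's code):

    class Node {
        Node next;

        // Tail-recursive, but the recursive call is made on 'next', not on 'this',
        // so blindly reusing the current frame would keep the wrong receiver.
        int length(int acc) {
            if (next == null) return acc + 1;
            return next.length(acc + 1);
        }
    }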

I took the lazy route and simply store the already computed receiver over the current this (with an ASTORE 0). Strangely enough, this approach works, but I do not know enough to guarantee that the JIT behaves correctly in this situation. I would be glad to hear in the comments whether there are any risks here.
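
For the hypothetical Node.length example above, this lazy rewrite would produce roughly the following (a sketch in the same notation as the listings above, not actual agent output):

    ElseBlock:
      ALOAD 0
      GETFIELD Node.next : LNode;   // compute the new receiver
      ILOAD 1
      ICONST_1
      IADD
      ISTORE 1                      // acc = acc + 1
      ASTORE 0                      // overwrite 'this' with the new receiver
      GOTO StartLabel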

Besides simple tests, I tried attaching the agent to existing applications. In Tomcat there was not a single method to optimize. In IntelliJ IDEA there were at least a dozen such methods, but the application crashed. Given that IDEA runs several agents and has complicated class loader logic, I would not venture to say what went wrong.

As a result, we have a tool that optimizes tail recursion in every method where it can be done without violating the semantics (apart from altering the call stack). An obvious drawback is that debugging becomes harder; on the other hand, you can always debug with the agent turned off.

Whether it is worth using anywhere is up to you.

Source: https://habr.com/ru/post/319282/

