Lambda the ripper

Although Java 9 was recently released with a new module system, many people still use the familiar eighth version, with lambdas. For six months I worked closely with it and all its innovations. While the new collection and Optional methods are straightforward, lambdas are less obvious: in particular, how they are implemented, how they affect performance and, most importantly, how they differ from the good old anonymous classes.

In this article I will not go over the syntax of Java 8; plenty of articles and books already cover it. I was more interested in how it all works, so I decided to:

- Understand the theory
- See what's inside a lambda
- Understand the performance impact

A little theory

To begin with, it helps to sort out the kinds of lambdas. It is quite simple, there are two of them:

- Non-capturing: the simplest kind, not tied to the environment. They contain no references to external variables and do not call instance methods of the enclosing object. They can call static methods.
- Capturing: they have a connection with the outside world; such functions are also called closures. They, in turn, can be divided into two subtypes: those that refer to variables of the enclosing method or class, and those that call an instance method.
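To make the classification concrete, here is a minimal sketch. The class `LambdaKinds` and all of its members are hypothetical names chosen for illustration, and `Supplier` is used instead of `Callable` only to avoid checked exceptions. It shows one non-capturing lambda and both capturing subtypes.

```java
import java.util.function.Supplier;

class LambdaKinds {
    private final int field = 5;

    static int twice(int x) {
        return 2 * x;
    }

    // Non-capturing: uses only a constant and a static method.
    static Supplier<Integer> nonCapturing() {
        return () -> twice(5);
    }

    // Capturing, first subtype: reads a local variable of the enclosing method.
    static Supplier<Integer> capturesLocal() {
        int local = 5; // effectively final; its value is copied into the lambda object
        return () -> 10 + local;
    }

    // Capturing, second subtype: calls an instance method, so it captures `this`.
    Supplier<Integer> capturesThis() {
        return () -> 10 + instanceMethod();
    }

    private int instanceMethod() {
        return field;
    }

    public static void main(String[] args) {
        System.out.println(nonCapturing().get());
        System.out.println(capturesLocal().get());
        System.out.println(new LambdaKinds().capturesThis().get());
    }
}
```

All three look almost the same in source code; the difference is only in what the lambda body reaches for.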

First approximation

I will go in order. Let's see how code with lambdas is compiled. Take the simplest example, which creates a lambda and immediately calls it:

    import java.util.concurrent.Callable;

    public class TestRun {
        public static void main(String[] args) throws Exception {
            ((Callable<Integer>) (() -> 10)).call();
        }
    }

For now, the standard tools built into the JDK are enough. To view the contents of a class file, you can use:

javap -p -c -v -constants TestRun.class

This command will output the contents of the methods and the constant pool for the class:

    Constant pool:
       #2 = InvokeDynamic      #0:#30    // #0:call:()Ljava/util/concurrent/Callable;
       #3 = InterfaceMethodref #31.#32   // java/util/concurrent/Callable.call:()Ljava/lang/Object;
       #4 = Methodref          #33.#34   // java/lang/Integer.valueOf:(I)Ljava/lang/Integer;

    public static void main(java.lang.String[]) throws java.lang.Exception;
      Code:
         0: invokedynamic #2,  0      // InvokeDynamic #0:call:()Ljava/util/concurrent/Callable;
         5: invokeinterface #3,  1    // InterfaceMethod java/util/concurrent/Callable.call:()Ljava/lang/Object;

    private static java.lang.Integer lambda$main$0() throws java.lang.Exception;
      Code:
         0: bipush        10
         2: invokestatic  #4          // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
         5: areturn

The main method shown here contains only two instructions: invokedynamic creates an instance of some class, and invokeinterface calls the call() method on the object that lies on the stack. The class also has a constant pool: entry #2 describes the call site for which the lambda will be created, and entry #3 describes the interface method. A strange method, lambda$main$0(), has also appeared, although we did not ask for it. Look closely and you will see that it contains the lambda's code: it boxes the value 10 into an Integer and returns it. Entry #2 of the constant pool refers to it indirectly, through the arguments of bootstrap method #0.

Just a couple of links:
- Invokedynamic instruction specification
- Structure description from constant pool

This example raises more questions than it answers. It is completely unclear how the call to the interface method leads us to the generated lambda$main$0(). To clarify this, we will have to climb inside the lambda.

What's inside?

To move on, I will need special tools. I would like to know what is inside the objects on which we call the method. For this purpose there is an additional JVM parameter:

-Djdk.internal.lambda.dumpProxyClasses=[dir]

If you add it, the proxy classes generated by the factory are written to the [dir] folder at runtime.

Simpler lambdas

Here too I will start with the simple case. First, I will analyze an example with a lambda that contains no references to the surrounding context:

    import java.util.concurrent.Callable;

    public class TestNonCapturing {
        public static void main(String[] args) throws Exception {
            Callable<Integer> r = () -> 10;
        }
    }

The code will generate TestNonCapturing$$Lambda$1.class, which is very simple:

    final class TestNonCapturing$$Lambda$1 implements Callable {
        private TestNonCapturing$$Lambda$1() {
        }

        @Hidden
        public Object call() {
            return TestNonCapturing.lambda$main$0();
        }
    }

This is a final class that delegates to the generated static method TestNonCapturing.lambda$main$0(). The calling code in main thus reaches a method of its own class through a wrapper that the invokedynamic instruction generates at runtime.
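The wrapper is produced by a bootstrap method; for lambdas compiled by javac this is `java.lang.invoke.LambdaMetafactory.metafactory`. As a rough sketch of what invokedynamic does on its first execution, the same linkage can be performed by hand. The class `MetafactorySketch` and the method `body` are hypothetical; `body` merely stands in for the compiler-generated lambda$main$0().

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.Callable;

class MetafactorySketch {
    // Hypothetical stand-in for the compiler-generated lambda$main$0().
    private static Integer body() {
        return 10;
    }

    @SuppressWarnings("unchecked")
    static Callable<Integer> link() throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle impl = lookup.findStatic(
                MetafactorySketch.class, "body", MethodType.methodType(Integer.class));

        CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "call",                                 // name of the interface method
                MethodType.methodType(Callable.class),  // factory type: () -> Callable
                MethodType.methodType(Object.class),    // erased signature of Callable.call
                impl,                                   // method holding the lambda body
                MethodType.methodType(Integer.class));  // signature with generics applied

        // Non-capturing case: the factory takes no arguments.
        return (Callable<Integer>) site.getTarget().invoke();
    }

    // Convenience wrapper that hides the checked Throwable.
    static int linkAndCall() {
        try {
            return link().call();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkAndCall());
    }
}
```

For a capturing lambda the factory type would take the captured values as parameters, which matches the constructor arguments visible in the dumped proxy classes.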

More complex lambdas

Now let's look inside lambdas that refer to variables from their environment. It is enough, for example, to refer to a local variable of the method in which the lambda is created:

    import java.util.concurrent.Callable;

    public class TestCapturingVariable {
        public static void main(String[] args) throws Exception {
            int methodVariable = 5;
            Callable<Integer> r = () -> 10 + methodVariable;
        }
    }

TestCapturingVariable$$Lambda$1.class will be a bit more complicated:

    final class TestCapturingVariable$$Lambda$1 implements Callable {
        private final int arg$1;

        private TestCapturingVariable$$Lambda$1(int var1) {
            this.arg$1 = var1;
        }

        private static Callable get$Lambda(int var0) {
            return new TestCapturingVariable$$Lambda$1(var0);
        }

        @Hidden
        public Object call() {
            return TestCapturingVariable.lambda$main$0(this.arg$1);
        }
    }

Here the context has already appeared: the constructor takes the argument int var1. When calling TestCapturingVariable.lambda$main$0, we pass the field arg$1, which holds a copy of the local variable. An instance of the lambda is obtained through the static factory method get$Lambda. Why a factory method is used on top of the constructor, I honestly do not know. I guess it is an implementation detail of the JVM. If you have an answer to this question, I will be glad to read it in the comments.

Let me complicate the example a bit and add a call to an instance method of the class:

    import java.util.concurrent.Callable;

    public class TestCapturingMethod {
        public static void main(String[] args) throws Exception {
            TestCapturingMethod v = new TestCapturingMethod();
            Callable<Integer> r = v::instanceMethod;
        }

        private int instanceMethod() {
            return 10;
        }
    }

Unexpectedly: Exception in thread "main" java.lang.VerifyError

In this case, the JVM was confused by the fact that instanceMethod is private and is being called from another class. You can make it public or add -noverify to the command line. The contents of TestCapturingMethod$$Lambda$1.class will be as follows:

    final class TestCapturingMethod$$Lambda$1 implements Callable {
        private final TestCapturingMethod arg$1;

        private TestCapturingMethod$$Lambda$1(TestCapturingMethod var1) {
            this.arg$1 = var1;
        }

        private static Callable get$Lambda(TestCapturingMethod var0) {
            return new TestCapturingMethod$$Lambda$1(var0);
        }

        @Hidden
        public Object call() {
            return Integer.valueOf(this.arg$1.instanceMethod());
        }
    }

As the decompiled code shows, the difference is small: arg$1 has turned from a plain value into an instance of the class whose method is called. Autoboxing has also appeared in the call() method.

How it works

Now it is more or less clear what is inside the objects themselves. Let me try to figure out how this works and whether there are differences between closures and simple lambdas in this example:

    import java.util.concurrent.Callable;

    public class LambdaRun {
        public static void main(String[] args) throws Exception {
            int local = 10;
            for (;;) {
                Callable<Integer> nonCapturing = () -> 10;
                Callable<Integer> capturing = () -> 10 + local;
                System.out.println("Non-capturing: " + nonCapturing.hashCode());
                System.out.println("Capturing: " + capturing.hashCode());
            }
        }
    }

Here capturing and non-capturing lambdas are created in a loop, and then their hash codes are printed. The output will be something like this:

    Capturing: 231987608
    Non-capturing: 1595428806
    Capturing: 1549385383
    Non-capturing: 1595428806
    Capturing: 1879451745
    Non-capturing: 1595428806

Obviously, in one case a new object is created each time, and in the other it is not. It seems the JVM has optimized something, and the lambda factory generates new objects only when it is really necessary. It is logical to reuse the object if its content does not depend on the environment. When you create a lambda that captures the context, a new object is created each time, and this case is more interesting to investigate, since it may implicitly add load on the GC. But this also turned out to be not so simple.
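The same observation can be made with a direct identity comparison instead of hash codes. This is only a sketch with hypothetical names (`LambdaIdentity`, `nonCapturing`, `capturing`), and it relies on the behavior of current HotSpot builds: the language specification does not promise that a non-capturing lambda is reused.

```java
import java.util.function.Supplier;

class LambdaIdentity {
    // One call site, evaluated every time the method is invoked.
    static Supplier<Integer> nonCapturing() {
        return () -> 10;
    }

    // Captures the parameter, so the lambda object must carry its value.
    static Supplier<Integer> capturing(int local) {
        return () -> 10 + local;
    }

    public static void main(String[] args) {
        System.out.println("Non-capturing reused: " + (nonCapturing() == nonCapturing()));
        System.out.println("Capturing reused: " + (capturing(1) == capturing(1)));
    }
}
```

Comparing references avoids relying on identity hash codes, which could in principle collide.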

To stack or not to stack

The enlightened reader will notice that if the object does not leave the scope of the method, it will most likely be caught by escape analysis: it will be created on the stack (in HotSpot, via scalar replacement), and there will be no load on the GC. But who calls a lambda in the same method that creates it? The basic idea here is that lambdas are meant for higher-order functions, that is, functions that accept or return other functions. Thus a lambda almost always leaves the method where it was created. Any book or article on Java 8 is full of such examples.

An even more enlightened reader will notice that the JIT compiler can sometimes inline methods into each other at runtime, and then escape analysis will work.

Just a couple of links on the topic:
- Escape analysis
- Method inlining

Take the following example. The previous examples were artificial, but this one is close to reality. In a loop, a closure is obtained from a separate method, which can be considered a higher-order function:

    public class CapturingLambdaLongRun {
        int i = 10;

        public static void main(String[] args) throws Exception {
            CapturingLambdaLongRun run = new CapturingLambdaLongRun();
            while (true) {
                getLambda(run).run();
            }
        }

        public static Runnable getLambda(CapturingLambdaLongRun run) {
            return () -> {
                run.i++;
            };
        }
    }

I ran this code under VisualVM for one minute.

Strangely enough, there is nothing criminal here, although a new object should be created on every getLambda call. Next I disabled inlining by adding the parameter -XX:MaxInlineLevel=0. And here the picture changes a lot.

Why was everything smooth at first, and then changed? When the JIT worked at full strength and I did not get in its way, the getLambda method was inlined into main, and the new Runnable was allocated on the method's stack. Therefore there were no problems. With inlining disabled, everything began to work exactly as it looks in the Java code: both optimizations were turned off (inlining, and escape analysis after it), and load on the GC appeared, since object creation moved from the stack to the heap.
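For readers without VisualVM at hand, a rough way to observe the same effect is to measure how many bytes the current thread allocates while the loop runs. The sketch below is an assumption-laden illustration: `AllocationProbe` and its methods are hypothetical names, it relies on the HotSpot-specific `com.sun.management.ThreadMXBean`, and the numbers depend on JIT warm-up. Running it with and without -XX:MaxInlineLevel=0 should make the difference visible.

```java
import java.lang.management.ManagementFactory;

class AllocationProbe {
    int i = 10;

    // Same shape as getLambda in the example above: returns a capturing lambda.
    static Runnable getLambda(AllocationProbe run) {
        return () -> run.i++;
    }

    // Bytes allocated by the current thread during the loop, or -1 if unsupported.
    static long allocatedBytesDuring(AllocationProbe run, int iterations) {
        java.lang.management.ThreadMXBean raw = ManagementFactory.getThreadMXBean();
        if (!(raw instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) raw;
        if (!bean.isThreadAllocatedMemorySupported() || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        for (int n = 0; n < iterations; n++) {
            getLambda(run).run();
        }
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        AllocationProbe run = new AllocationProbe();
        allocatedBytesDuring(run, 1_000_000); // warm-up pass so the JIT can compile the loop
        System.out.println("bytes allocated: " + allocatedBytesDuring(run, 1_000_000));
    }
}
```

This measures allocation rather than GC pauses, so it complements the VisualVM graphs instead of replacing them.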

In this example I turned off the optimization artificially, but I think the following situations are easy to imagine:

- As the project evolves, the method that creates the closure grows and is no longer inlined because of the size limit. By default, MaxInlineSize=35.
- During refactoring, a reference to a variable from the surrounding context is added to a lambda created in a large method, so its type changes from non-capturing to capturing and a new object is created on the heap for each call.

Summary

It's time to summarize my little research. What I found out:
- There are different types of lambda expressions: although they have the same syntax, inside they are arranged differently and work differently.
- You can switch from one type of lambda to another without noticing, thus changing the load on the GC.
- The lambda method call itself is no different from any other method call; there is no reflection here.

And a few words about the good old anonymous classes. I will try to compare them with lambda expressions:

- An anonymous class is generated at compile time, while the class for a lambda is created by a factory at run time.
- Code generation on the fly can be faster than loading from the classpath. Since accessing the classpath may cause a disk read, some tests confirm that a cold start is faster for lambdas than for anonymous classes.
- The code of a lambda is placed in a generated method of the same class where it is created, whereas all the code of an anonymous class is contained in the anonymous class itself.
- Anonymous classes have explicit syntax. We know for sure that one object will be created for each call. Non-capturing lambdas are optimized here and implicitly reuse one object.
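A small sketch of the comparison, with hypothetical names (`AnonymousVsLambda`, `anonymous`, `lambda`): the anonymous class gets its own class file at compile time and yields a new object on every evaluation, while the lambda's class appears only at run time.

```java
import java.util.function.Supplier;

class AnonymousVsLambda {
    // Anonymous class: javac emits a separate class file for it at compile time.
    static Supplier<Integer> anonymous() {
        return new Supplier<Integer>() {
            @Override
            public Integer get() {
                return 10;
            }
        };
    }

    // Lambda: the body becomes a method of this class; the wrapper class is created at run time.
    static Supplier<Integer> lambda() {
        return () -> 10;
    }

    public static void main(String[] args) {
        System.out.println(anonymous().getClass().getName());
        System.out.println(lambda().getClass().getName());
        System.out.println("Anonymous reused: " + (anonymous() == anonymous()));
    }
}
```

The printed class names differ between JDK versions for the lambda, which is itself a hint that its class is not fixed at compile time.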

How to live with this

I hope this article has helped lift the veil of secrecy and shown how lambda expressions work and what is inside them. Now, when I make a closure depend on the surrounding context, I do it consciously, knowing what it may lead to. What to do next with all this information:

- If you are developing an application that does not have strict performance requirements, you can rely on the JIT compiler. In most cases it will save you. But even here, do not forget simple rules such as keeping methods small. This affects more than readability.
- In performance-critical code, you need to be careful with lambdas. If they suddenly turn into closures, this may have consequences. Therefore:
  - Avoid references to method variables or to the class instance.
  - Prefer references to static methods.
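One possible way to follow this advice, shown here as a sketch rather than a rule, is to pass the state as an explicit argument to a static method reference instead of capturing it. The class `HotPathSketch` and its methods are hypothetical.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

class HotPathSketch {
    static int addOffset(int value, int offset) {
        return value + offset;
    }

    // Capturing: the lambda holds `offset`, so a new object may be allocated on each call.
    static int sumCapturing(int[] data, int offset) {
        IntUnaryOperator op = v -> v + offset;
        int sum = 0;
        for (int v : data) {
            sum += op.applyAsInt(v);
        }
        return sum;
    }

    // Non-capturing: the state travels as an argument to a static method reference.
    static int sumNonCapturing(int[] data, int offset) {
        IntBinaryOperator op = HotPathSketch::addOffset;
        int sum = 0;
        for (int v : data) {
            sum += op.applyAsInt(v, offset);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumCapturing(data, 10));
        System.out.println(sumNonCapturing(data, 10));
    }
}
```

Both variants compute the same result; whether the second one is measurably cheaper depends on what the JIT manages to inline, so it is worth checking with a profiler before rewriting code this way.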

Source: https://habr.com/ru/post/343624/
