Compiling nested classes: javac and ecj

As you know, Java has nested classes: classes declared inside another class. There are four kinds of them: static nested, inner, local and anonymous (this article does not touch on lambda expressions, which appeared in Java 8). All of them share one interesting feature: the Java virtual machine has no idea about the special status of these classes. From its point of view, they are ordinary classes located in the same package as the outer class. All the work of converting nested classes into regular ones falls on the compiler, and it is curious to see how different compilers cope with it. We will look at the behavior of javac 1.8.0.20 and the ecj compiler from Eclipse JDT Core 3.10 (shipped with Eclipse Luna).
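One way to see this from the language side is to ask the runtime what the four kinds of classes are called. The sketch below is only an illustration: KindsDemo and its members are made-up names, and the numbers that appear in the names of local and anonymous classes are chosen by the compiler.

```java
// Hypothetical demo class; none of these names come from the article.
class KindsDemo {
    static class StaticNested {}

    class Inner {}

    static Class<?> localClass() {
        class Local {}
        return Local.class;
    }

    static Class<?> anonymousClass() {
        return new Object() {}.getClass();
    }

    public static void main(String[] args) {
        Class<?>[] kinds = {
            StaticNested.class, Inner.class, localClass(), anonymousClass()
        };
        for (Class<?> c : kinds) {
            // getName() returns the binary name that the class file is built from
            System.out.println(c.getName()
                    + " member=" + c.isMemberClass()
                    + " local=" + c.isLocalClass()
                    + " anonymous=" + c.isAnonymousClass());
        }
    }
}
```

Each printed name starts with the name of the outer class followed by a $ sign, which is how a nested class ends up as an ordinary class file next to the outer one.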

Here are the main problems associated with compiling nested classes:

Access rights;
Passing a reference to the outer class object (irrelevant for static nested classes);
Passing local variables from the enclosing context (similar to closures).

This article covers the first two problems.

Access rights

Access rights are a real headache. We can declare a field or method of a nested class as private, and according to the Java specification, this field or method can still be accessed from the outer class. The reverse is also possible: a nested class can access a private field or method of the outer class, and one nested class can use the private members of another. However, from the point of view of the Java virtual machine, accessing private members of another class is not allowed. The same goes for accessing protected members of a superclass located in another package. To get around this restriction, compilers generate special accessor methods. They are all static, have package-private access, and have names starting with access$. ecj names them simply access$0, access$1, and so on, while javac appends at least three digits, where the last two encode the operation (read = 00, write = 02) and the leading ones encode the field or method. Accessor methods are needed for reading fields, writing fields, and calling methods.

The accessor methods for reading fields take one parameter (the object), while the methods for writing fields take two (the object and the new value). The write methods generated by ecj return void, whereas those generated by javac return the new value (the second parameter). Take the following code as an example:

    public class Outer {
        private int a;

        static class Nested {
            int b;

            void method(Outer i) {
                b = i.a;
                i.a = 5;
            }
        }
    }

If the bytecode generated by javac is translated back into Java, you get something like this:

    public class Outer {
        private int a;

        static int access$000(Outer obj) {
            return obj.a;
        }

        static int access$002(Outer obj, int val) {
            return (obj.a = val);
        }
    }

    class Outer$Nested {
        int b;

        void method(Outer i) {
            b = Outer.access$000(i);
            Outer.access$002(i, 5);
        }
    }

The code produced by ecj is similar, except that the methods are named access$0 and access$1, and the second one returns void. Everything becomes much simpler if you remove the word private: accessor methods are then no longer needed and the fields can be accessed directly.
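To check what a particular compiler generated without reading bytecode, one option is to list the synthetic methods of the outer class through reflection. This is a minimal sketch: AccessDemo, its getter a() and the helper syntheticMethods are hypothetical names introduced here, and the result depends on the compiler. With the javac version discussed in this article, names such as access$000 and access$002 are expected; with javac from JDK 11 onward, where the VM permits private access between classes of the same nest, the list is expected to be empty.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class mirroring the Outer/Nested example.
class AccessDemo {
    private int a;

    int a() {
        return a;
    }

    static class Nested {
        int b;

        void method(AccessDemo i) {
            b = i.a;   // read of a private field of the outer class
            i.a = 5;   // write to a private field of the outer class
        }
    }

    /** Collects the names of all compiler-generated (synthetic) methods of a class. */
    static List<String> syntheticMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The printed list depends on the compiler and its target version.
        System.out.println(syntheticMethods(AccessDemo.class));
    }
}
```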

Interestingly, javac handles field increments more cleverly. For example, compile this code:

    public class Outer {
        private int a;

        static class Nested {
            void inc(Outer i) {
                i.a++;
            }
        }
    }

javac will generate something like this:

    public class Outer {
        private int a;

        static int access$008(Outer obj) {
            return obj.a++;
        }
    }

    class Outer$Nested {
        void inc(Outer i) {
            Outer.access$008(i);
        }
    }

Decrement behaves the same way (the method name ends with 10), as do pre-increment and pre-decrement (04 and 06). In all these cases the ecj compiler first calls the read method, then adds or subtracts one, and then calls the write method. If you are wondering where the odd numbers went, they are used for direct access to protected fields of the outer class's superclass (for example, Outer.super.x = 2; I can't imagine where this could be useful!).
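The difference between the two strategies can be imitated in plain Java with hand-written static methods. This is a sketch under assumptions: Counter and every method name in it are made up for illustration, they merely stand in for the compiler-generated accessors and are not what either compiler actually emits.

```java
// Hypothetical stand-in for a class with a private field and its accessors.
class Counter {
    private int a;

    // Stand-ins for a read accessor and a write accessor.
    static int read(Counter obj) {
        return obj.a;
    }

    static void write(Counter obj, int val) {
        obj.a = val;
    }

    // Stand-in for a dedicated post-increment accessor.
    static int postIncrement(Counter obj) {
        return obj.a++;
    }

    /** Two-call strategy: read, add one, write. Returns the old value. */
    static int incTwoCalls(Counter obj) {
        int old = read(obj);
        write(obj, old + 1);
        return old;
    }

    /** One-call strategy: a single accessor does the whole increment. */
    static int incOneCall(Counter obj) {
        return postIncrement(obj);
    }
}
```

Both variants leave the field in the same state and yield the same value; the one-call variant simply needs one method call instead of two.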

By the way, it is curious that javac 1.7 was even smarter, generating dedicated methods for every compound assignment operator such as += and <<= (the right-hand side was evaluated and passed to the generated method as a separate parameter). A dedicated method was generated even if you applied += to an inaccessible String field. In javac 1.8 this functionality broke, apparently by accident: the corresponding code is still present in the compiler's sources.

If the programmer declares a method with the same signature (for example, access$000; never do that!), javac will refuse to compile the file with the message "the symbol (symbol) conflicts with a compiler-synthesized symbol in (class)". The ecj compiler quietly tolerates such conflicts, simply incrementing the counter until it finds a free method name.

When you call an inaccessible method, a helper static method is generated that has the same parameters and return type, plus an additional parameter for passing the object. A more interesting case is the use of a private constructor. When constructing an object, a constructor must be called. Therefore, compilers generate a new non-private constructor that calls the required private one. But how do you create a constructor whose signature is guaranteed not to clash with the existing ones? javac generates a new class for this purpose! Take this code:

    public class Outer {
        private Outer() {}

        static class Nested {
            void create() {
                new Outer();
            }
        }
    }

When compiling, not only Outer.class and Outer$Nested.class will be created, but also another class, Outer$1.class. The code created by the compiler looks like this:

    public class Outer {
        private Outer() {}

        Outer(Outer$1 ignore) {
            this();
        }
    }

    class Outer$1 {}

    class Outer$Nested {
        void create() {
            new Outer((Outer$1)null);
        }
    }

This solution is convenient in that the constructor's signature is guaranteed not to conflict with any other. The ecj compiler does without the extra class and instead adds a dummy parameter of the same class:

    public class Outer {
        private Outer() {}

        Outer(Outer ignore) {
            this();
        }
    }

    class Outer$Nested {
        void create() {
            new Outer((Outer)null);
        }
    }

If this conflicts with an existing constructor, more dummy parameters are added. For example, suppose you have three constructors:

    private Outer() {}
    private Outer(Outer i1) {}
    private Outer(Outer i1, Outer i2) {}

If you use each of them from a nested class, ecj creates three new constructors with three, four, and five Outer parameters.
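Which constructors a given compiler really added can be inspected through reflection. The following is a minimal sketch with made-up names (CtorDemo, describeConstructors); the output is compiler-dependent, and compilers targeting JDK 11 or later are expected to add no synthetic constructor at all, so only the declared private one would be listed.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class with a private constructor used from a nested class.
class CtorDemo {
    private CtorDemo() {}

    static class Nested {
        CtorDemo create() {
            return new CtorDemo();   // needs access to the private constructor
        }
    }

    /** Describes every constructor of a class: its origin and parameter count. */
    static List<String> describeConstructors(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            String origin = c.isSynthetic() ? "synthetic" : "declared";
            out.add(origin + ", " + c.getParameterCount() + " parameter(s)");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeConstructors(CtorDemo.class)) {
            System.out.println(line);
        }
    }
}
```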

Passing a reference to the outer class object

Inner classes (including local and anonymous ones) are bound to a specific instance of the outer class. To achieve this, the compiler adds a new final field to the inner class (usually named this$0) that holds a reference to the enclosing instance. A corresponding parameter is also added to every constructor. Take this simple code:

    public class Outer {
        class Nested {}

        void test() {
            new Nested();
        }
    }

The compilers (ecj and javac appear to behave the same here) turn this code into something like this (a reminder: I reconstruct the code by hand from the bytecode to make it clearer):

    public class Outer {
        void test() {
            new Outer$Nested(this);
        }
    }

    class Outer$Nested {
        final Outer this$0;

        Outer$Nested(Outer obj) {
            this.this$0 = obj;
            super();
        }
    }
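The hidden field and the extra constructor parameter are visible through reflection. Here is a minimal sketch, assuming made-up names (Owner, hiddenField, constructorParameterCount). The inner class deliberately uses the enclosing instance, because a compiler may drop the reference when it is never used.

```java
import java.lang.reflect.Field;

// Hypothetical demo class with an inner class that uses its enclosing instance.
class Owner {
    int hits;

    class Inner {
        void hit() {
            hits++;   // goes through the reference to the enclosing Owner
        }
    }

    /** Returns "name : type" for the first synthetic field, or null if there is none. */
    static String hiddenField(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                return f.getName() + " : " + f.getType().getSimpleName();
            }
        }
        return null;
    }

    /** Parameter count of the first constructor as the VM sees it. */
    static int constructorParameterCount(Class<?> type) {
        return type.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        System.out.println(hiddenField(Inner.class));
        System.out.println(constructorParameterCount(Inner.class));
    }
}
```

Although Inner declares no constructor parameters in the source, reflection reports one parameter of the outer class type.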

It is curious that this$0 is assigned before the superclass constructor is called. In ordinary Java code you cannot assign a field before the superclass constructor runs, but bytecode does not forbid it. Thanks to this, if you override a method that the superclass constructor calls, this$0 will already be initialized and you can safely access the fields and methods of the outer class.
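This ordering can be observed from ordinary code. In the sketch below (HookParent and HookHost are illustrative names, not from the article), the superclass constructor calls an overridden method that reads both a field of the outer object and a field of the inner object itself.

```java
// Hypothetical superclass that calls an overridable method from its constructor.
class HookParent {
    final String seen;

    HookParent() {
        seen = describe();   // runs before the subclass's own fields are initialized
    }

    String describe() {
        return "parent";
    }
}

class HookHost {
    private final String name;

    HookHost(String name) {
        this.name = name;
    }

    class Inner extends HookParent {
        String own = "own";   // assigned only after the superclass constructor returns

        @Override
        String describe() {
            // name is read through the enclosing instance; own is still null here
            return name + "/" + own;
        }
    }

    static String demo() {
        return new HookHost("host").new Inner().seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "host/null"
    }
}
```

The outer field is already reachable while the inner object's own field is not yet initialized, which matches the order of operations in the reconstructed constructor.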

If you create a name clash by declaring a field named this$0 in the Nested class (never do that!), the compilers will not be confused: they will name their internal field this$0$.

The Java language lets you create an instance of an inner class bound not only to this, but also to another object of the outer type:

    public class Outer {
        class Nested {}

        void test(Outer other) {
            other.new Nested();
        }
    }

An interesting point arises here: other may turn out to be null. Ideally, the code should fail right at this point with a NullPointerException. Usually the virtual machine itself makes sure you do not dereference null, but here no dereference actually happens until the outer object is used inside the Nested object, which may happen much later or not at all. The compilers have to work around this again: they insert a dummy call, turning the code into something like this:

    public class Outer {
        void test(Outer other) {
            other.getClass();
            new Outer$Nested(other);
        }
    }

The call to getClass() is safe: it must succeed for any object and takes little time. If other turns out to be null, the exception will be thrown even before the Nested object is created.
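The visible effect of this check is easy to reproduce. A minimal sketch, with NullOuter and failsFast as made-up names: the inner class never touches its enclosing instance, yet creating it from a null reference still fails immediately, as the language specification requires.

```java
// Hypothetical demo class; Inner never uses the enclosing instance.
class NullOuter {
    class Inner {}

    /** Returns true if creating an Inner bound to the given object throws NullPointerException. */
    static boolean failsFast(NullOuter other) {
        try {
            other.new Inner();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(null));              // prints "true"
        System.out.println(failsFast(new NullOuter()));   // prints "false"
    }
}
```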

If classes are nested more than one level deep, the innermost ones get additional fields: this$1 and so on. As an example, consider this:

    public class Outer {
        class Nested {
            class SubNested {
                { test(); }
            }
        }

        void test() {
            new Nested().new SubNested();
        }
    }

Here javac will generate something like this:

    public class Outer {
        void test() {
            Outer$Nested tmp = new Outer$Nested(this);
            tmp.getClass();
            new Outer$Nested$SubNested(tmp);
        }
    }

    class Outer$Nested {
        final Outer this$0;

        Outer$Nested(Outer obj) {
            this.this$0 = obj;
            super();
        }
    }

    class Outer$Nested$SubNested {
        final Outer$Nested this$1;

        Outer$Nested$SubNested(Outer$Nested obj) {
            this.this$1 = obj;
            super();
            this.this$1.this$0.test();
        }
    }

The call to getClass() could have been omitted, since we have just created this object, but the compiler does not bother. ecj, on the other hand, quite unexpectedly generates an accessor method:

    class Outer$Nested {
        final Outer this$0;

        Outer$Nested(Outer obj) {
            this.this$0 = obj;
            super();
        }

        static Outer access$0(Outer$Nested obj) {
            return obj.this$0;
        }
    }

    class Outer$Nested$SubNested {
        final Outer$Nested this$1;

        Outer$Nested$SubNested(Outer$Nested obj) {
            this.this$1 = obj;
            super();
            Outer$Nested.access$0(obj).test();
        }
    }

Very strange, given that this$0 is not private. On the other hand, ecj was smart enough to reuse the obj parameter instead of reading the field this.this$1.
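Whichever way the compiler reaches the outermost object, the chain of hidden references can be followed at run time through reflection. This is a sketch under assumptions: Level0, Level1, Level2 and EnclosingChain are hypothetical names, and the helper relies on the hidden field being synthetic with a name starting with this$, as both compilers discussed here name it.

```java
import java.lang.reflect.Field;

// Hypothetical two-level nesting; both inner classes use the outermost instance.
class Level0 {
    int calls;

    void touch() {
        calls++;
    }

    class Level1 {
        void ping() {
            touch();   // Level1 uses its Level0 instance
        }

        class Level2 {
            Level2() {
                touch();   // reaches Level0 through the enclosing Level1
            }
        }
    }
}

final class EnclosingChain {
    /** Returns the object stored in a synthetic this$N field, or null if no such field exists. */
    static Object enclosingInstance(Object inner) {
        for (Field f : inner.getClass().getDeclaredFields()) {
            if (f.isSynthetic() && f.getName().startsWith("this$")) {
                try {
                    f.setAccessible(true);
                    return f.get(inner);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return null;
    }
}
```

Calling enclosingInstance on a Level2 object is expected to return its Level1 object, and calling it again on that result is expected to return the Level0 object.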

Conclusions

Nested classes give compilers some headaches. Do not shy away from package-private access: with it, the compiler can do without auto-generated methods. Of course, modern virtual machines almost always inline these methods, but they still take up memory, bloat the class's constant pool, lengthen stack traces, and add extra steps when debugging.

Different compilers can generate very different code in similar situations: even the number of generated classes may differ. If you are writing bytecode analysis tools, you must take into account the behavior of different compilers.

Source: https://habr.com/ru/post/250029/
