
Minimal value types

This article is a translation of a specification that describes the minimum implementation of value types in Java, which has been eagerly awaited for several years. Welcome to MVT!


Notes on translation


This text is intended primarily for developers of the Java platform. We even argued a bit with the developers: how interesting could such a text be to the general public? There are too many extraneous, very precise details about internal magic.


For context, the discussion touches on the class file format and the JVM instruction set, and you may well need to keep those references open in front of you so as not to get lost in the text. Personally, I was helped through this by Vladimir Ivanov and his talks (for such cases there are discussion zones at the Joker conference).


But let's look at it from the other side. In the Russian-speaking community there is a tendency to view Java as the epitome of the bloody enterprise: unchanging, unbreakable, and stuck in stagnation almost since its very appearance. Legend has it that all the fun happens in JavaScript and the like, while Java's share is boring legacy tinkering, where a real hacker can't find anything worthy of his extraordinary abilities.


There is a community, a kind of common culture, consisting of experienced programmers and network wizards, whose history can be traced back to the first time-sharing minicomputers and the earliest experiments with the ARPAnet network. The members of this culture gave birth to the term “hacker”. Hackers have created the Internet. Hackers have made the Unix operating system what it is today. Hackers created Java and JVM.


It should be clearly understood that the modern hacker world is not limited to marginal tasks in (not yet / no longer) known languages, or to hacking the FBI site as in the movie Swordfish. Sometimes elegant problems that require a lot of strength and talent appear in such well-known things as the Java platform: in things that get written up on Habr today and that half of the world will use tomorrow. Sometimes you look at a piece of Java code and are amazed at what unknown logic led to such a sophisticated technical solution. Well, here it is: that logic is described in the following text.


This is an article for Java developers. But it can and should be read today to catch the moment. Because tomorrow everything will change.


Note: the specification is already being updated, right now! (Pot, stop boiling!) Updates based on material from the JVM Language Summit and the most recent developments will be presented in follow-up articles. Most likely, we will also publish an overview article there, giving a higher-level, more practical understanding of MVT. If you need such an article, please say so in the comments.


Introduction


Three years ago we published the first version of this document. Since then there have been three years of heated discussion and intense prototyping of the Java compiler, class-file format and virtual machine. The goal was to integrate primitives, references, and values into a common platform that would support efficient generic object-oriented programming.


Much of the discussion focused on the specialization of generics as a way to implement full parametric polymorphism in Java and the JVM. We concentrated on this on purpose, and it bore fruit: it helped to find the places where primitives do not line up with references, which, in turn, forced us to expand the bytecode model. Having dealt with List<int>, it was already much easier to move on to List<Complex<int>>.


The remaining discussions focused on the development of semantics and on specific tactics for implementing new bytecodes that can work with values. In several experiments, APIs were implemented that were close to what was needed and that did useful things such as loop vectorization.


Returning from the past to the present: at the JVM Language Summit (2016) and at the Valhalla EG meeting, we were asked many times to provide an "early access" build that could be used to experiment with vectorization, GPUs and Panama. This document outlines which subset of experimental JVM features (and, to a lesser extent, language and library features) is suitable for the first experiments with value types.


Looking back, it makes sense to assess the scale of the task: thousands of engineering hours went into working out the bright future in which we now live. Now is the best time to put that work to use and, finally, choose a first version, something like a "hello world" for a value type system.


This document offers a minimal, but still sufficiently working, subset of the functionality of value types to achieve the following goals:



In addition to the goals, we definitely will not:



In other words, before bringing value types into the light of day, we have to prototype them in a certain gray zone, somewhere between day-to-day discussion and the release of a public specification. Such a prototype, even a very limited one, would be very useful. It will make it possible to experiment with various approaches to the design and implementation of value types. Approaches that do not prove viable can always be thrown away: it is just a prototype! And as soon as advanced users begin to experiment, we can make more accurate assessments of performance and usability.


Functionality


Specific functionality for our minimal (but still working) support for value types can be formulated as follows:



Special value-capable classes that support value types can be developed with today's toolchains as ordinary POJOs. Ordinary Java source code, including generic classes and methods, can then access values only in their boxed form. At the same time, method handles and specially generated bytecode can work with values in their original, unboxed form.


This work targets the JVM, not the language. Therefore, we are not going to do the following:



While the slogan "looks like a class, works like an int" expresses our overall vision for value types, this minimal feature set delivers something more like "works like an int, if you can catch it in a box or a handle".


We have deliberately limited the scope of this work so that useful experiments on a production JVM can start much earlier than if we rolled out the entire value type feature stack at once.


JVM-level support for the new functionality will make it possible to quickly prototype new language features and tools that use it directly. But the minimal project does not depend on such language features and tools.


Value-capable classes (VCC)


A class can be marked with the special annotation @DeriveValueType (or, perhaps, an attribute). A class so marked is called a value-capable class (VCC). This means that, in addition to its own type, the class can have an associated derived value type (DVT).


Use of the annotation will be restricted in some way; for example, it may be enabled only via command-line options that tie it to an incubator module.


Example:


    @jvm.internal.value.DeriveValueType
    public final class DoubleComplex {
        public final double re, im;
        private DoubleComplex(double re, double im) {
            this.re = re;
            this.im = im;
        }
        ... // toString/equals/hashCode, accessors, math functions, etc.
    }

The semantics of the marked class will be the same as if the annotation had not been affixed. But this annotation will allow the JVM, in addition to everything else, to try using the marked class as the source for the associated derived value type.


The superclass of a VCC must be Object (the full feature set has a similar requirement: superclasses are prohibited there).


A class marked as a VCC must meet the requirements for value-based classes, since its instances will be used as boxes for values of the associated value type. In particular, the class and all its fields must be final, and the constructor must be private.


A class marked as a VCC must not use any of the methods inherited from Object on its instances, since such use would have undefined consequences for operations on the boxed version. The equals, hashCode and toString methods must be overridden completely, without calling the Object versions via super.


As an exception, you can freely use the getClass method; it behaves as if it were replaced in VCC with a method that returns a constant.
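The requirements above can be tried out in ordinary Java today. Below is a minimal sketch of a value-based class written to those rules: final class, final fields, private constructor, and equals, hashCode and toString overridden without calling super. The class name DoubleComplexSketch and the factory method of are hypothetical, and the @DeriveValueType annotation is omitted because it is not available in a standard JDK.

```java
final class DoubleComplexSketch {
    public final double re, im;

    // Private constructor: instances are only created through the factory.
    private DoubleComplexSketch(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Hypothetical static factory, used instead of a public constructor.
    public static DoubleComplexSketch of(double re, double im) {
        return new DoubleComplexSketch(re, im);
    }

    // Field-by-field comparison; never falls back to Object.equals (identity).
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DoubleComplexSketch)) return false;
        DoubleComplexSketch that = (DoubleComplexSketch) o;
        return Double.doubleToLongBits(re) == Double.doubleToLongBits(that.re)
            && Double.doubleToLongBits(im) == Double.doubleToLongBits(that.im);
    }

    // Hash computed from the fields only; never calls super.hashCode().
    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    // Textual form built from the fields; never calls super.toString().
    @Override
    public String toString() {
        return "DoubleComplex(" + re + ", " + im + ")";
    }

    public static void main(String[] args) {
        System.out.println(of(1.0, 2.0).equals(of(1.0, 2.0)));
    }
}
```

Because nothing in the class depends on object identity, two instances with the same fields are interchangeable, which is exactly what makes such a class usable as a box.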


As with all value-based classes, the remaining Object methods (clone, finalize, wait, notify and notifyAll) should not be used either. We leave this to the user, who must ensure it manually. In the full version of the functionality, we will try to find ways to enforce this automatically.


Summarizing, the JVM will perform the following structural checks on a VCC:



These structural checks are performed when the JVM creates a DVT from VCC. The phases of this process are described below.


Apart from the restrictions described above, a VCC can do everything that ordinary value-based classes can: define constructors, methods, fields and nested types, implement interfaces, and declare type variables on the class itself or on its methods. There are no special restrictions on field types.


As we will see later, derived value types contain only fields . They will contain the same set of fields as the VCC from which they are derived. But the JVM will not add methods, constructors, nested types or super-types to them. (In the full set of functionality, of course, value types will be "encoded as classes" and support all these features).
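As a rough illustration only, the restrictions stated in the text (final class, Object as superclass, private constructors, final fields) and the "fields only" shape of the derived type can be approximated with plain reflection. This is a sketch under those assumptions, not how the JVM performs its checks; the names VccCheckSketch, deriveFieldNames, Point and Mutable are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class VccCheckSketch {

    // Checks the candidate class and returns the names of the instance fields
    // that a derived value type would keep; static members are not copied.
    static List<String> deriveFieldNames(Class<?> vcc) {
        if (!Modifier.isFinal(vcc.getModifiers()))
            throw new IllegalArgumentException(vcc.getName() + " must be final");
        if (vcc.getSuperclass() != Object.class)
            throw new IllegalArgumentException(vcc.getName() + " must extend Object directly");
        for (Constructor<?> c : vcc.getDeclaredConstructors())
            if (!Modifier.isPrivate(c.getModifiers()))
                throw new IllegalArgumentException("constructors must be private");
        List<String> names = new ArrayList<>();
        for (Field f : vcc.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue; // not part of the value
            if (!Modifier.isFinal(f.getModifiers()))
                throw new IllegalArgumentException("field " + f.getName() + " must be final");
            names.add(f.getName());
        }
        return names;
    }

    // A class that satisfies the checks.
    static final class Point {
        static final int DIMENSIONS = 2;
        final int x, y;
        private Point(int x, int y) { this.x = x; this.y = y; }
        static Point of(int x, int y) { return new Point(x, y); }
    }

    // A class that violates them: not final, mutable field, implicit non-private constructor.
    static class Mutable {
        int v;
    }

    public static void main(String[] args) {
        System.out.println(deriveFieldNames(Point.class));
    }
}
```

Note that reflection does not guarantee any particular order for the returned fields, so the sketch should not be read as defining a field layout.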


Note: VCCs compiled with the standard javac compiler cannot define fields (more precisely, "inline sub-values") whose types are themselves value types. The most they can do is declare fields of the associated reference types ("L-types"). An updated version of javac would be able to declare such sub-values directly as true "Q-types". That javac version would let developers work with value types directly, without the trick of deriving a separate value type from a VCC. Therefore, if you see a field in a VCC that is itself typed as a VCC, it is most likely a mistake, since it will lead to unintended boxing of the sub-value.


Here is a slightly more detailed example of a VCC that describes a super-large long:


    @DeriveValueType
    final class Int128 implements Comparable<Int128> {
        private final long x0, x1;
        private Int128(long x0, long x1) { ... }
        public static Int128 zero() { ... }
        public static Int128 from(int x) { ... }
        public static Int128 from(long x) { ... }
        public static Int128 from(long hi, long lo) { ... }
        public static long high(Int128 i) { ... }
        public static long low(Int128 i) { ... }
        // possibly array input/output methods
        public static boolean equals(Int128 a, Int128 b) { ... }
        public static int hashCode(Int128 a) { ... }
        public static String toString(Int128 a) { ... }
        public static Int128 plus(Int128 a, Int128 b) { ... }
        public static Int128 minus(Int128 a, Int128 b) { ... }
        // more arithmetic ops, bit-shift ops
        public int compareTo(Int128 i) { ... }
        public boolean equals(Int128 i) { ... }
        public int hashCode() { ... }
        public boolean equals(Object x) { ... }
        public String toString() { ... }
    }

Similar types were used in the loop vectorization prototype. This example lives in a prototype version of the java.lang package. But the VCCs described in this minimal proposal will not be part of any public API. Their visibility will be strictly limited, for example by the module system.


Note: VCCs will first appear as extensions of numeric types such as long. Therefore, they should follow some standard from the start and have a consistent set of arithmetic and bit operations. Sooner or later, a set of interfaces will be needed that expresses the common structure of operations for numeric primitives and numeric values.


Value types and object types


When the JVM loads a VCC, it can either eagerly create the derived value type or instead set a flag on the class indicating that the value type should be created on demand. (The second approach is recommended.)
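The "create on demand" option resembles lazy per-class caching, which ordinary Java code can express with java.lang.ClassValue. The sketch below is only an analogy, not the JVM mechanism: it computes a stand-in for the derived type (here just the list of instance field names) the first time it is requested and reuses it afterwards. The names LazyDerivationSketch, derivedFields and Pair are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyDerivationSketch {
    // Counts how many times a derivation was actually computed.
    static final AtomicInteger derivations = new AtomicInteger();

    // Nothing is computed when a class is loaded; computeValue runs on first get().
    private static final ClassValue<List<String>> DERIVED = new ClassValue<List<String>>() {
        @Override
        protected List<String> computeValue(Class<?> type) {
            derivations.incrementAndGet();
            List<String> fields = new ArrayList<>();
            for (Field f : type.getDeclaredFields())
                if (!Modifier.isStatic(f.getModifiers()))
                    fields.add(f.getName());
            return Collections.unmodifiableList(fields);
        }
    };

    // First call derives, later calls return the cached result.
    static List<String> derivedFields(Class<?> vcc) {
        return DERIVED.get(vcc);
    }

    static final class Pair {
        final long hi, lo;
        private Pair(long hi, long lo) { this.hi = hi; this.lo = lo; }
    }

    public static void main(String[] args) {
        derivedFields(Pair.class);
        derivedFields(Pair.class);
        System.out.println("derivations: " + derivations.get());
    }
}
```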


Note: the minimal feature set may leave this ordering unspecified. For the full version the question is moot, because there the derived value type and the VCC are one and the same.


The VCC itself does not change at all during the loading phase. It remains an ordinary POJO, a value-based class.


The corresponding value type is created as a copy of this class, but with the following critical changes:



The name given to the DVT is hidden inside the implementation. Instead, the name of the VCC is used to designate both types, on the assumption that there is always enough information to resolve the ambiguity. In bytecode descriptors, the letters Q and L are used to reflect this difference: we denote the VCC as the L-type and the DVT as the Q-type.


DVT creation must occur at some point after the VCC is loaded, but before the first DVT instance is created. This is governed by the semantics of the specific instructions that trigger DVT initialization, in exactly the same way that existing instructions (such as getstatic or new) trigger the initialization of ordinary classes. Details are described below.


Let's start again with the DoubleComplex class:


    @jvm.internal.value.DeriveValueType
    public final class DoubleComplex {
        public final double re, im;
        ...
        double realPart() { return re; }
    }

When the JVM decides to synthesize the derived value type for DoubleComplex, it makes a fresh copy of the class, stripping out all class members except the double fields. Note that turning this synthetic class into a value type, rather than just another object type, takes special internal magic in the JVM.


Inside the JVM, the resulting value type will look something like this:


    // the box: the VCC itself, an ordinary reference type
    @jvm.internal.value.DeriveValueType
    public final class L-DoubleComplex {
        public final double re, im;
        ...
        double realPart() { return $value.re; }
    }
    // the derived value type: fields only
    public static __ByValue class Q-DoubleComplex {
        public final double re, im;
    }

The hypothetical keyword __ByValue marks a class whose instances are values rather than references. Until we seriously rework the whole stack, such things cannot be written directly in source code, but it is reasonable and useful to do this at class loading time.


Notice that the derived value type has no constructors. Normally this would be a problem, since the JVM requires a class to have at least one constructor. In this case, however, the JVM allows it because it knows about value types. (Such a restriction is not necessary for values in general, but that story is too long to fit in the margin here.) In any case, the derived value type will borrow the constructors of its VCC, as we will see in the next section.


Note: this design can be called "box-first". The point is that the JVM loads only boxes, and the value type is created as a side effect of loading. Eventually we will move to a design in which the values themselves are the starting point, but the current box-first design places far fewer demands on the tools that read and write class files, including the JVM and javac. Therefore, despite some oddity, box-first is currently the best solution.


Boxing, unboxing and borrowing


Under the hood, the JVM provides boxing and unboxing operations for converting between a VCC and its value type. The semantics of these operations is a simple bitwise copy of the fields between the two. The copy is obviously well defined, because the lists of fields are identical.


The synthetic unboxing operation allows derived value types to reach the constructors of their VCC, though not directly. Using a constructor, the programmer can create a box and then unbox it to obtain the constructed value. The JVM simply copies the fields out of the box, and then the box is discarded. (In the full version, value types can have real constructors and will not need to borrow anything from their boxes.)


The synthetic boxing operation also gives value types access to VCC methods. In the same way, a programmer can temporarily box a value, call any VCC method on the box, and throw the box away as soon as the method completes. Since the boxes are short-lived, the JVM will most likely be able to optimize them away, at least for simple methods. (In the full version, value types can have real methods and will not need to borrow anything from boxes. On the contrary, boxes will be able to borrow methods from values.)
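The borrowing idea can be emulated in plain Java to get a feel for it. In the sketch below, the "value" form is just two long fields stored inline in an array with no per-element object, "unboxing" copies the fields out of a box, "boxing" copies them into a fresh box, and a method is borrowed by boxing temporarily. This is an emulation under stated assumptions, not the MVT mechanism; BorrowingSketch, Int128Box and FlatInt128Array are hypothetical names.

```java
final class BorrowingSketch {

    // Stand-in for a VCC (the box). All behavior lives here.
    static final class Int128Box {
        final long hi, lo;
        private Int128Box(long hi, long lo) { this.hi = hi; this.lo = lo; }
        static Int128Box from(long hi, long lo) { return new Int128Box(hi, lo); }

        // 128-bit addition: add the low words, then carry into the high word.
        Int128Box plus(Int128Box other) {
            long newLo = lo + other.lo;
            long carry = Long.compareUnsigned(newLo, lo) < 0 ? 1 : 0;
            return new Int128Box(hi + other.hi + carry, newLo);
        }
    }

    // Stand-in for flattened values: two longs per element, fields only.
    static final class FlatInt128Array {
        private final long[] words;

        FlatInt128Array(int length) { words = new long[2 * length]; }

        // "Unbox": copy the fields out of the box; the box can then be dropped.
        void set(int i, Int128Box box) {
            words[2 * i] = box.hi;
            words[2 * i + 1] = box.lo;
        }

        // "Box": copy the fields into a fresh, short-lived box.
        Int128Box get(int i) {
            return Int128Box.from(words[2 * i], words[2 * i + 1]);
        }

        // Borrow the box's method: box, call, unbox the result, discard the boxes.
        void addInPlace(int i, Int128Box delta) {
            set(i, get(i).plus(delta));
        }
    }

    public static void main(String[] args) {
        FlatInt128Array arr = new FlatInt128Array(1);
        arr.set(0, Int128Box.from(0L, -1L));
        arr.addInPlace(0, Int128Box.from(0L, 1L));
        System.out.println(arr.get(0).hi + " " + arr.get(0).lo);
    }
}
```

The temporary boxes in addInPlace never escape the method, which is the kind of situation where a JIT compiler has a chance to eliminate the allocation.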


Notice that the synthetic boxing operation creates a new VCC instance without running a constructor. Normally this would be a problem, but in this case the two classes are so tightly coupled that it is safe to assume that any value was created (in the first place) by unboxing a properly constructed box. The boxing/unboxing pattern is very similar to the serialization/deserialization pattern. In both patterns, the second operation bypasses normal object construction.


The synthetic boxing operation also allows the derived value type to use the interfaces of its VCC, but not directly. Again, if a derived value type is passed somewhere an interface is expected, the developer can simply box it and pass a reference to the box. (In the full version, we want values to work with interfaces directly, without any boxing visible to the developer. This will require careful work on the interaction of values and interfaces.)


Finally, since static methods and fields are not copied into the derived value type, the programmer can access them only through the VCC, that is, through the box.


Value descriptors


When using value-capable modules (VCM), the class-file descriptor language is extended with Q-types, which directly denote value types (without boxing). The descriptor syntax is QBinaryName; where BinaryName is the internal form of the VCC name (the internal form differs in that all dots are replaced by slashes). In effect, the name of the value type is derived from the name of the corresponding VCC.
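The descriptor syntax can be made concrete with a small helper. The sketch assumes the usual JVM internal form of a binary name (dots replaced by slashes, as in ordinary L-descriptors); the helper names internalName, lDescriptor and qDescriptor are hypothetical and not part of any API.

```java
final class DescriptorSketch {

    // Internal form of a binary class name: '.' becomes '/'.
    static String internalName(String binaryName) {
        return binaryName.replace('.', '/');
    }

    // Ordinary reference (box) descriptor, e.g. Ljava/lang/String;
    static String lDescriptor(String binaryName) {
        return "L" + internalName(binaryName) + ";";
    }

    // Value descriptor as described in the text: Q, the internal name, then ';'.
    static String qDescriptor(String binaryName) {
        return "Q" + internalName(binaryName) + ";";
    }

    public static void main(String[] args) {
        System.out.println(lDescriptor("java.lang.Int128"));
        System.out.println(qDescriptor("java.lang.Int128"));
    }
}
```

Both descriptors are built from the same VCC name; only the leading letter tells the box apart from the value.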


For comparison, the standard reference type descriptor is called the L-type. C VCC, Q-, L-. , L- Q-. , , , .


Q- , VCM. ( CONSTANT_Fieldref ) ( VCM!), getfield .


( : , , , , . , .)


Q- , VCM ( , VCM, , -. , ). , , .


Q- - -, null . - ( , ) - . , MethodHandles.empty .


( , - null , false , \0 0.0 . Java , . , , ).


Q- . , , Q- ( L- ).


( CONSTANT_Methodref CONSTANT_InterfaceMethodRef ) Q- . , , , Q-.


, CONSTANT_Fieldref Q- .


, Java Q- . , . , Valhalla , Q- .



, - , , Q-, .


, , Q-. .


, «» , ( L ; ). , CONSTANT__Class ( ) , . , ?


ldc ldc_w , , CONSTANT_Class , , Class . , , CONSTANT_Dynamic ( JDK-8177279 ).


CONSTANT_Methodref CONSTANT_Fieldref , CONSTANT_Class , «» - «». , , , Q L . «» , getfield L-, vgetfield — Q-.


( : «» , JVM CONSTANT_Fieldref , getfield vgetfield , , L- Q- . . JVM, Q- L- , . , - U-).


API MethodHandles.Lookup , Q- L- ( , ) Class , API, Lookup.findGetter . findGetter Q-, . (L-), , , , .


Q-


Q- ( Q-), java.lang.Object ; JVM , . , Q- Object . , , Object , , value-based , Object .


, Object.getClass , Class , VCC.


: , getClass - , . .


, Q- vinvoke ( ).


JVM, Q-


Q-, , . :



JVM Q- . , -, .. .


Q- ( ), , , -, ( ) null L-.


, , Q-, Q-, Q- . , , Q-.


- , -, .


( : « », . - , , . , - , , / -, , — , , . . API, , . ! )


, , , Q-. L-, , Integer[].class int[].class . , - Object ( int[] ), - . , .


JVM, new , ( , - getstatic ). -, , -. - , ( ).


, ( «») Q- Q- ( -) -, , , , . , : ( , ), ( ) , «» , , . ( -). , , , ( , ), .


, DVT , . , , , vdefault vunbox . DVT VCC, DVT , , VCC.


, vunbox , , VCC, , — DVT , , .



:



, , long double (, ).


— . , , . CONSTANT_Fieldref . , vbox , vunbox vdefault .


JVM Q- Q- , «» . , JVM ( value-capable L-, - ) .


, , , . , JVM «-» - , , «» -, - .


invokevirtual , invokespecial invokeinterface Q-, Q-. , invokestatic invokedynamic Q-, . , , Q- L-, Java, .


( : , Q-, .. vinvoke , . U-, uninvoke , U-, -, Q-, L-. , , -, , , , .)



, Q- , , Q- ( L-!) .


, Q-, Q- .


( ), Q- , Q- . , Q-, Q- .


int float , Q- , - oneWord top . , . Q- L-, - ( Object , ) L- .


, vload , vstore , vreturn invoke , , Q- — pop , pop2 , swap , dup . . Q-.


vaload vastore . «-», - Q-, . .


vgetfield , getfield . - - , .


vwithfield , . - . putfield final-, , , , . VCC DVT , JVM VCC vwithfield DVT. , , VCC - vwithfield .


( : - -, — final. . JVM , putfield final , . Java , , , . , -, JVM , vwithfield . final , vwithfield .)


Lookup API VCC DVT , nestmates. vwithfield . DVT , , , .


( : JVM nestmates VM, . JVM, vwithfield nestmates -. , vwithfield «», ).


vdefault , , «» -, . - JVM , - -. , , «» vdefault . - Q-.


Q-


Q-, :



:



, JVM , Q-, . , workaround , Q- .


(, , javac. VCC Q-, javac ).


-


-, , - , java.lang.Class . -, , Class- « ». — int.class , . -, Class-. , - , Q-, — L-. int.class Integer.class , . ( U-, . ).


, , . , «», , . .


( : , , Integer.class int.class . , - , ).


-, , Class- DVT VCC, VCC , DVT — .


, jdk.experimental.value.ValueType ( ) runtime- .


ValueType Q-:


    public class ValueType<T> {
        static boolean classHasValueType(Class<T> x);
        static ValueType<T> forClass(Class<T> x);
        Class<T> valueClass(); // DVT, secondary mirror
        Class<T> boxClass();   // VCC, principal mirror
        ...
    }

classHasValueType , , Q- VCC ( L-). forClass Q- , VCC. ( IllegalArgumentException . , classHasValueType ).


valueClass boxClass , , java.lang.Class Q-, (VCC) L-.


ValueType.forClass(vt.valueClass()) vt , boxClass . , Class - ValueType .


Class.forName boxClass , . , . ( , T.class - , T , « int ».


Q- , getDeclaredMethods . , DVT, VCC, . , , .


, -, VCC ( boxClass ). ( , ). , VCC POJO, DVT .


Q- API , - ( int.class ). API ( Class java.lang.reflect ), API java.lang.invoke , MethodType MethodHandles.Lookup . , , Q- , L-, Class .


( , «crass» — «». , . — , Class.forName , . , «» — , java.lang.Class . « » ).


, API , , . Q- , , , JVM , .


, ( asType ) - , . , , DobuleComplex :


    Class<DoubleComplex> srcType = DoubleComplex.class;
    Class<DoubleComplex> qt = ValueType.forClass(srcType).valueClass();
    MethodHandle mh = identity(qt).asType(methodType(Object.class, qt));
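The snippet above depends on the experimental ValueType class, so it cannot be run on a standard JDK. The same asType conversion can be observed today with a primitive standing in for the Q-type: adapting an identity handle from (int)int to (int)Object makes the handle box its result. This is only an analogy with int and Integer; AsTypeBoxingSketch and its methods are hypothetical names, while identity, asType and methodType are the standard java.lang.invoke calls.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class AsTypeBoxingSketch {

    // identity(int.class) has type (int)int; asType adapts the return to Object,
    // which inserts a boxing conversion (int -> Integer -> Object).
    static MethodHandle boxingIdentity() {
        return MethodHandles.identity(int.class)
                .asType(MethodType.methodType(Object.class, int.class));
    }

    // Invokes the adapted handle; checked Throwable is wrapped for convenience.
    static Object box(int x) {
        try {
            return boxingIdentity().invoke(x);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        Object boxed = box(42);
        System.out.println(boxed.getClass().getName());
    }
}
```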

, MethodHandle.invoke Q-: , Java, ( ) , Q-.



Q- API, Object - . ( , , - println ), Object , . VCC L- Q- (, Q-) , , , .


, VCC ( L-), Q-. API -, , .


VCC L- Q-, toString — Java L-. ( this ), (HotSpot JVM , ).


, , . , , , println .


VCC value-based, , - , , null .


( : , , , ).


JVM ( ) , , ( escape analysis).


, . , Q- , , , Q- , , null.



, Q-, () . :



MethodHandles.Lookup API MethodHandles Q- ( Class-), , .


( : . invokedynamic , , . , «» JVM, invokedynamic ).


API :



( : , -, findVirtual , , final . — findSpecial , API findDirect , , . Java «final virtual» , - .)


L-, , , , ( ) L- , , -. , L-, , L-, Q-.


, , private static L-, , Q-. , JVM . , , , Q-. , JVM.


value-based , VCC Object . Q- , Object . , (, toString ).


, , ( ) API MethodHandle , jdk.experimental.value.ValueType .


ValueType :


    public class ValueType<T> {
        ...
        MethodHandle defaultValueConstant();
        MethodHandle substitutabilityTest();
        MethodHandle substitutabilityHashCode();
        MethodHandle findWither(Lookup lookup, Class<?> refc, String name, Class<?> type);
    }

defaultValueConstant , , - Q-. ( , ) - .


substitutabilityTest , Q- (). , , . , == Java, float double, « », .


, substitutabilityHashCode , , Q-, -, , , - .


( , - 64 . , 32- - -, -. -, , base-31 , , ).


findWither Lookup.findSetter , , , , , . , — .


«wither» , refc Q- ValueType . Q-, . . - , , winther- , . , withfield , , .


( wither «» «» -, . , . c.withRe(0) , , . , c.setRe(0) , .. , , . - , wither- . , , , , , wither', (get, set, wither). - , withRe(0) re(0) ).
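For readers unfamiliar with the naming, here is what a "wither" such as c.withRe(0) conventionally looks like in ordinary Java: instead of mutating a final field, it returns a new instance that differs in that one field. This is a sketch of the general pattern only, not of findWither or of the vwithfield instruction; WitherSketch and Complex are hypothetical names.

```java
final class WitherSketch {

    static final class Complex {
        final double re, im;

        private Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        static Complex of(double re, double im) {
            return new Complex(re, im);
        }

        // Wither: leaves this instance untouched and returns an updated copy.
        Complex withRe(double newRe) {
            return new Complex(newRe, im);
        }

        Complex withIm(double newIm) {
            return new Complex(re, newIm);
        }
    }

    public static void main(String[] args) {
        Complex c = Complex.of(1.0, 2.0);
        Complex d = c.withRe(0);
        System.out.println(c.re + " " + d.re + " " + d.im);
    }
}
```

A setter named setRe would suggest in-place mutation, which is impossible for final fields; the with- prefix signals that the original stays as it was.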


, ValueType Lookup , MethodHandles , .


, .


, -. , . , Q-, Q . QC , , -, . VT<QC> ValueType , Q-.


RC Q- , L-. , Q- (« int!») .


, Q-, . , findStaticGetter , findStaticSetter , findVirtual , findStatic , findStaticSetter , arrayElementSetter , identity , constant , , .



API - Java. , , , .


Further work


, . , . , , , , , .


, -, . . , , , , .


Q- Java


, Q- . JVM (VCC), — -. , Q- ( ).


, . , javac Q- , Java ( ).


, -. ( — final, -). , javac VCC).


, Java - -, invokedynamic , .


Q- VCC


VCC, Java, ( , ) , Q- , , VCC.


, Q- Q-. L- Q-. , ( ) , L- , Q-. JDK-8164889 .


Q- , , Q- Java. , , Valhalla, — , .




, : ( - ), , , , Q-. « » , uload , ustore , uinvoke , ucmp , u2u , . , v.


I , JVM int, short, boolean, char byte, L ( ) L-, Q ( ) K-. , , I- , L- , JVM -, . ( - long, float double). , L- — , , Q- , , «» - , ( , GC). , . , Q- — , ( Java, C, ).



-, . . , - . :



- , Q- default - . — ( TreeMap ). , default- , . , -. , , , default- - ( , null, , == , ). .


, QT[] , - I[] , I - QT . JVM, , . I L-, , — , Object[] . - QT .


Bridge-o-matic


, API, Q- . , , -, ( ) -. , equals , hashCode toString . JDK-8164891 .


(JVM-native) -


, , .. POJO . , POJO, . , -, POJO.


, , , -.


, , POJO, , POJO, - .


, - (, , IDE, ). , -. , , , . :




, L- value-based, - JVM :



, , «». , ( == , acmp ) , , , Q- . - , , . , , , … JDK-8163133 .


The authors


John Rose is a JVM engineer and architect at Oracle. He is the lead engineer of the Da Vinci Machine Project (part of OpenJDK) and of JSR 292 (Supporting Dynamically Typed Languages on the Java Platform), where he works on the specification of dynamic invocation and related topics such as type profiling and advanced compiler optimizations. Previously, he worked on inner classes, did the original HotSpot port to SPARC, worked on the Unsafe API, and developed many dynamic, parallel and hybrid languages, including Common Lisp, Scheme ("esh") and dynamic binding for C++.


Brian Goetz is a Senior Java Language Architect at Oracle and an author of Java Concurrency in Practice.


Translators


Oleg Chirukhin - at the time of writing, he works as an architect at Sberbank-Technologies, designing the architecture of automated business process management systems. Before moving to Sberbank-Technologies, he took part in the development of several state information systems, including state services and the electronic medical record, as well as in the development of online games. Speaker at JUG.ru conferences (JPoint, JBreak). Current research interests include virtual machines, compilers, and programming languages.


Thanks


, . , - :-)



Source: https://habr.com/ru/post/336378/

