A stack chasing a stack, or converting Java virtual machine bytecode into Phantom OS bytecode.

Phantom OS is an experimental operating system whose application level is a bytecode virtual machine running in persistent memory.

One of the two key paths planned for migrating existing code to Phantom OS is the conversion of Java bytecode to Phantom bytecode.

I must say that these two virtual machines are quite similar, albeit entirely by coincidence. The Phantom virtual machine was designed when I knew nothing about Java, but the similarity of goals probably led to the similarity of the decisions made.

Both machines are stack machines. Both operate on two separate stacks: an object stack (it holds only references) and a binary stack for computation. The Phantom machine also has separate stacks for function frames and exception handlers. I do not yet know how this part is arranged in the JVM, but I doubt it is radically different.

Naturally, the instruction sets of the two stack machines are in places as alike as two drops of water.

But, of course, there are very significant differences.

First, the Phantom virtual machine is designed to run application code in a less friendly environment. Java assumes that each program lives in its own address space and that everything around it is "our" code. Phantom allows direct calls between applications belonging to different users and different programs, which demands a stricter approach to some aspects of the virtual machine, including the call itself and the object interface in general. For example, we cannot rely on the called method behaving "decently": it must not be given access to the caller's stack, and the caller cannot rely on a return value being present or absent. Nor can the difference between a method, a function and a static function be guaranteed. That is, we may know what we think we are calling, but what has been "slipped" to us on the other side is unknown.

In view of all the above, the call in Phantom is completely uniform: it is always a method call (there is a this and there is a class), and a value is always returned, which for a void method is null and is explicitly destroyed by the calling code. This guarantees that whatever mistake is made in a call, and whatever turns out to be the target of the call, the call and return protocol will be observed.
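The uniform protocol can be illustrated with a toy model. This is only a sketch, not Phantom's actual implementation: the helpers call_method and call_and_discard and the Counter class are invented for illustration. The callee receives only its receiver and arguments, never the caller's stack, and exactly one value is pushed on return regardless of what the callee does.

```python
def call_method(obj_stack, receiver, method, args):
    """Uniform call: always has a receiver, always pushes exactly one result."""
    # The callee sees only its receiver and arguments, not the caller's stack.
    result = method(receiver, *args)
    # A "void" method yields None; it is pushed like any other value.
    obj_stack.append(result)


def call_and_discard(obj_stack, receiver, method, args):
    """Call a method for its side effect; the caller drops the result itself."""
    call_method(obj_stack, receiver, method, args)
    obj_stack.pop()


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):  # "void": returns None implicitly
        self.value += 1

    def get(self):
        return self.value


stack = []
counter = Counter()
call_and_discard(stack, counter, Counter.increment, [])
depth_after_void_call = len(stack)  # the null result was destroyed by the caller
call_method(stack, counter, Counter.get, [])
```

Because the stack depth changes in the same way for every call, the caller's bookkeeping does not depend on trusting the callee.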

There is also a difference in how integers are handled. Java puts them in a separate category of types, distinct from the object ("class") types: java.lang.Integer and int are different things in Java. The compiler sometimes hides this fact successfully, but inside they differ. Here too Phantom goes for maximalism: an integer is an honest object. It can be pulled out onto the integer stack and computed there in "non-object", binary form, but it returns to object form when it is assigned to a variable or passed as a parameter. Incidentally, this also follows from the requirement that the method call protocol be uniform: methods that return an integer and methods that return an object are identical as far as the protocol is concerned. (The same obviously applies to the other "integral" types: long, float, double.)
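A rough sketch of the round trip between the two stacks, under the assumption that a boxed integer is simply an object with one value field. The names IntObject and add_boxed are hypothetical and do not correspond to real Phantom opcodes or classes.

```python
class IntObject:
    """Hypothetical boxed integer: the only form seen by variables and calls."""

    def __init__(self, value):
        self.value = value


def add_boxed(obj_stack, int_stack):
    """Add the two IntObjects on top of obj_stack, leaving one new IntObject."""
    # Unbox: move both operands to the binary (integer) stack.
    right = obj_stack.pop()
    left = obj_stack.pop()
    int_stack.append(left.value)
    int_stack.append(right.value)
    # Raw arithmetic happens only on the integer stack.
    b = int_stack.pop()
    a = int_stack.pop()
    int_stack.append(a + b)
    # Box: the result returns to the object stack as an object.
    obj_stack.append(IntObject(int_stack.pop()))


objects, integers = [IntObject(2), IntObject(3)], []
add_boxed(objects, integers)
```

After the operation the integer stack is empty again, and the only thing the rest of the program can observe is an object.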

There are other differences too, for example in how what Java calls native methods are hooked up. In Phantom these are "system calls" and, again, at the level of a method call they are indistinguishable from an ordinary "honest" method. (The code of such a method contains a special instruction for dropping into the OS kernel, but this is not visible from the outside. In particular, this allows such methods to be inherited and overridden in the traditional way, by replacing the entry in the VMT, the virtual method table.)

It seems (at least, it seemed to me) that converting the bytecode of one stack machine into the bytecode of another stack machine is an elementary task. After all, both have stacks, and 90% of the operations are simply identical. There is, for instance, no difference between integer addition in Phantom bytecode and in Java bytecode: take two integers from the stack, add them, put the result on the stack.

The first approach to translation was based on converting Java bytecode into Phantom bytecode sequentially. It quickly turned out that this cannot be done linearly. At all. While parsing the Java code you have to "play out" the behavior of the stack and synthesize an intermediate representation. Part of such a translator was written and then declared unfit: the effort required exceeded all imaginable bounds. For example, locally, at the call site, it is completely impossible to find out whether the call is made on an object (the first parameter is this) or not. Java does not care, but we do. It can be figured out, but it takes a lot of effort. And that was even though only the analyzer had to be written: the compiler backend, which generates quite reliable Phantom bytecode, was already stable by then (because the compiler for Phantom's own language was ready and in steady use).
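"Playing out" the stack can be sketched as a symbolic replay of a linear instruction stream that produces a tree. The mini instruction set below (load, const, add, invoke) and the tuple-based tree nodes are hypothetical, chosen only to keep the example short; they are neither real JVM opcodes nor the intermediate representation of the abandoned translator.

```python
def build_tree(code):
    """Replay a linear stack program symbolically and return the resulting trees."""
    stack = []
    for op, *operands in code:
        if op == "load":        # push a local variable
            stack.append(("local", operands[0]))
        elif op == "const":     # push a constant
            stack.append(("const", operands[0]))
        elif op == "add":       # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(("add", left, right))
        elif op == "invoke":    # operands: name, argument count, has_receiver
            name, argc, has_receiver = operands
            args = [stack.pop() for _ in range(argc)][::-1]
            receiver = stack.pop() if has_receiver else None
            stack.append(("call", name, receiver, args))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


method_call = build_tree([
    ("load", "obj"), ("load", "x"), ("const", 1), ("add",),
    ("invoke", "put", 1, True),
])
static_call = build_tree([("const", 1), ("invoke", "f", 1, False)])
```

In the linear form the receiver of put is just whatever was pushed three instructions earlier; only in the tree does it become an explicit child of the call node, which is what the Phantom call protocol needs.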

At this point the work would have stalled, had I not come across a framework called Soot. Originally intended for static analysis and instrumentation of Java bytecode, it turned out to be ideal for the task described. Soot parses a JVM class file and produces an extremely sane internal representation: a tree of operations over a compact basis (about a dozen node types), plus type information and other metadata.

From this point the conversion becomes dramatically simpler: in effect, you need to convert one tree into another. Incidentally, along the way we also get support for Dalvik (the Android VM bytecode).

That is not to say that everything is cloudless now, although the first primitive Java classes already compile and work has begun on unit tests for the compiler. There are plenty of problems.

For example, in Phantom, inheritance from classes with an "internal" implementation was supposed to be prohibited. At the same time, Java is "used to" seeing java.lang.String as the string type, not internal.String. But that is a minor matter. Comparing objects is harder. In Java, == works differently for integers and strings: it compares values and references, respectively. The more consistent Phantom clearly distinguishes between comparing values and comparing references, which means that the seemingly simple conversion of the == and != operators causes a problem: one either has to sort out the types, or introduce into the basis a Java-specific comparison operation that behaves as described above. The latter is "unclean", but damn simple.
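The first option, sorting out the types, amounts to a small tree rewrite. The sketch below assumes that each comparison node carries the static type of its operands; the node shapes and the names value_eq and ref_eq are invented for illustration and are not Soot or Phantom identifiers.

```python
# Java primitive types, for which == compares values.
PRIMITIVE_TYPES = {"boolean", "byte", "short", "char", "int", "long", "float", "double"}


def convert_comparison(node):
    """Rewrite ("eq" | "ne", operand_type, left, right) into an explicit comparison."""
    op, operand_type, left, right = node
    # Primitives are compared by value, everything else by reference.
    kind = "value_eq" if operand_type in PRIMITIVE_TYPES else "ref_eq"
    converted = (kind, left, right)
    # != is expressed as a negated equality.
    return converted if op == "eq" else ("not", converted)


int_case = convert_comparison(("eq", "int", "a", "b"))
string_case = convert_comparison(("ne", "java.lang.String", "s", "t"))
```

The second option would instead keep a single node kind and push the same decision down into the virtual machine.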

In general, the original idea was to encapsulate the Java type system by representing it inside a java branch of the Phantom virtual machine's type tree. I have now abandoned this idea. It appears to cause more problems than it solves.

A funny problem came up with access to public fields: Phantom has none. At all. Only methods. Working around this required automatically generating and using getters and setters. That is probably problematic too: they currently get the typical "Java" names getVariable / setVariable, which may cause conflicts. Apparently the names of the generated methods should be made special and inaccessible from the ordinary method namespace, but doing so is also a little sad, since automatic generation of public setters has practical value of its own.
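The naming conflict is easy to demonstrate. This sketch assumes the generated names follow the usual getX / setX pattern; the helper names and the "$" prefix are hypothetical and only stand in for whatever reserved naming scheme might be chosen.

```python
def accessor_names(field, prefix=""):
    """Return the (getter, setter) names generated for a public field."""
    suffix = field[:1].upper() + field[1:]
    return (f"{prefix}get{suffix}", f"{prefix}set{suffix}")


def conflicting_accessors(fields, user_methods, prefix=""):
    """List generated accessor names that collide with hand-written methods."""
    generated = {name for field in fields for name in accessor_names(field, prefix)}
    return sorted(generated & set(user_methods))


fields = ["name", "size"]
methods = {"getName", "toString"}
plain_conflicts = conflicting_accessors(fields, methods)
mangled_conflicts = conflicting_accessors(fields, methods, prefix="$")
```

With plain names the generated getName collides with the hand-written one; with a reserved prefix the collision disappears, at the price of accessors that ordinary code can no longer call by their natural names.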

The next problem will be synchronization primitives. In Java, any object can serve as a synchronization point. I do not want to keep dedicated fields for this in every Phantom object, but there has to be some way to "extend" objects. And not only synchronization: the weak reference mechanism, for example, also requires additional entities to be "hung" on an object. At the moment this is supposed to be done through an object header field on which, if necessary, an object or a set of objects can be hung to serve special cases. For most ordinary objects this field will be empty; it is filled only when something special is done with the object.
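The header field idea can be sketched as a single, normally empty slot that is populated on first use. Everything here is an assumption made for illustration: the class PhantomObject, the satellites slot and the monitor_of helper are not taken from the Phantom sources.

```python
import threading


class PhantomObject:
    """Toy object: ordinary fields plus one normally empty header slot."""

    def __init__(self):
        self.fields = {}
        self.satellites = None  # stays empty for ordinary objects


def monitor_of(obj):
    """Return the object's monitor, attaching it to the header on first use."""
    # This lazy install is not itself thread-safe; a real VM would need
    # an atomic step here.
    if obj.satellites is None:
        obj.satellites = {}
    if "monitor" not in obj.satellites:
        obj.satellites["monitor"] = threading.RLock()  # reentrant, like a Java monitor
    return obj.satellites["monitor"]


plain = PhantomObject()
locked = PhantomObject()
with monitor_of(locked):
    locked.fields["x"] = 1
```

Only the object that was actually used for synchronization pays for a monitor; the other one keeps an empty header slot. A weak reference record could be hung on the same slot under another key.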

Phew. I suppose we will put a semicolon here for now.

And yes, all of this is open source. If you are interested in taking part in the work on the OS, or need a ready-made virtual machine for your project, the project is easy to find on GitHub by the keyword phantomuserland.

Source: https://habr.com/ru/post/278245/
