
[Javawatch Live] The story of one pull request. `os.version` in SubstrateVM

A year has passed since the previous trick worked: publishing a YouTube video instead of a post. "A shameful talk about singletons" got 7k views on YouTube and twice as many on Habr itself in its text version. For an article written in a thoroughly addled state and retelling the oldest of old jokes, that counts as something like success.

Today I spent all night editing a new episode. This time the topic is much fresher: the story of contributing to an experimental technology, SubstrateVM. And the degree of madness has risen to a new level.


Really looking forward to your comments! A reminder: if you really want to improve something in this post, it is best to submit your changes on GitHub . I would say "like and subscribe to the new channel ", but all its episodes will end up in your Java hub anyway, won't they?

A technical note: there is one splice in the video near the end. I was recording uncompressed video, and my M.2 SSD, a mere five hundred gigabytes in size, quickly filled up. No other hard disk could withstand that data rate. So I had to stop for half an hour and, after much toil, find an extra fifty gigabytes to record the last few minutes. I got them by deleting files collected by Google Chrome . I wrote my opinion of the recording software on FB right at the moment of recording ; there is a lot of pain there.

Another technically interesting thing: YouTube has for some reason blocked live streaming for me, even though the account has no strikes or marks against it. Let's hope it is just a glitch and everything comes back in 90 days.

This article contains quotes from code owned by Oracle. You may not use this code yourself (unless you read the original license and it permits that on some terms, for example the GPL). This is not a joke. Also, you have been warned.

Prologue (the tale itself is still ahead)


Many have already heard plenty of stories that "the new Java will be written in Java" and wondered how that could be. There is a program document, Project Metropolis , and an accompanying letter from John Rose , but both are rather vague.

It sounds like some kind of creepy, bloody magic. Yet in the thing you can try right now, not only is there no magic, but everything is as simple as the back of a shovel knocking out your teeth. Of course there are some nuances, but those will come much later.

I will show this with one instructive story that happened over the summer. Just like the school essay "How I spent my summer".

First, a small remark. The project at Oracle Labs that currently deals with ahead-of-time compilation is GraalVM. The component that actually does the good stuff and turns Java code into an executable file is SubstrateVM, or SVM for short. Do not confuse it with the same abbreviation used by the data satanists (support vector machine). It is the SVM, as the key part, that we will talk about from here on.

Formulation of the problem


So, "how I spent my summer". I was on vacation, hammering F5 on the Graal GitHub, and came across this issue :



Someone wants os.version to return the correct value.

Well, did I want to fix a bug? A man said it, a man did it.

Let's check whether our patient is lying.

    public class Main {
        public static void main(String[] args) {
            System.out.println(System.getProperty("os.version"));
        }
    }

First, what the output looks like on regular Java: 4.15.0-32-generic . Yes, this is a fresh Ubuntu LTS, Bionic.

Now let's try to do the same on the SVM:

    $ ls
    Main.java
    $ javac -cp . Main.java
    $ ls
    Main.class Main.java
    $ native-image Main
    Build on Server(pid: 18438, port: 35415)
       classlist:     151.77 ms
           (cap):   1,662.32 ms
           setup:   1,880.78 ms
    error: Basic header file missing (<zlib.h>). Make sure libc and zlib headers are available on your system.
    Error: Processing image build request failed

Well, yes. That is because I built a completely new virtual machine specially for a "clean" test.

    $ sudo apt-get install zlib1g-dev libc6 libc6-dev
    $ native-image Main
    Build on Server(pid: 18438, port: 35415)
       classlist:     135.17 ms
           (cap):     877.34 ms
           setup:   1,253.49 ms
      (typeflow):   4,103.97 ms
       (objects):   1,441.97 ms
      (features):      41.74 ms
        analysis:   5,690.63 ms
        universe:     252.43 ms
         (parse):   1,024.49 ms
        (inline):     819.27 ms
       (compile):   4,243.15 ms
         compile:   6,356.02 ms
           image:     632.29 ms
           write:     236.99 ms
         [total]:  14,591.30 ms

The absolute build times may look terrifying. But first, this is by design: some truly hellish optimizations are being applied here. And second, what do you expect from a feeble virtual machine?

And finally, the moment of truth:

    $ ./main
    null

It seems our guest did not lie: it really does not work.

The first approach: stealing properties from the host


I then ran a global search for os.version and found that all these properties live in the SystemPropertiesSupport class.

I will not write out the full path to the file, because SVM has a built-in ability to generate proper projects for IntelliJ IDEA and Eclipse. This is very cool and nothing like the torment you have to endure with OpenJDK. Let the IDE open classes for us. So:

    public abstract class SystemPropertiesSupport {
        private static final String[] HOSTED_PROPERTIES = {
            "java.version",
            ImageInfo.PROPERTY_IMAGE_KIND_KEY,
            "line.separator",
            "path.separator",
            "file.separator",
            "os.arch",
            "os.name",
            "file.encoding",
            "sun.jnu.encoding",
        };
        //...
    }

Then, without engaging my brain at all, I simply went and added one more entry to this set:

    "os.arch",
    "os.name",
    "os.version"

I rebuild, I run, and I get the cherished line 4.15.0-32-generic . Hooray!

But here is the problem: now this code prints 4.15.0-32-generic on every machine it runs on. Even where uname -a reports an older kernel version, on an old Ubuntu.

It becomes clear that these variables are written into the resulting executable at build time.
And indeed, one should read the comments carefully:

    /** System properties that are taken from the VM hosting the image generator. */
    private static final String[] HOSTED_PROPERTIES

We need a different approach.
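To make the build-time capture concrete, here is a minimal sketch of what "taking properties from the host" amounts to. It is not SVM's actual code: `HostedSnapshotSketch` and its methods are hypothetical names, and the "older kernel" value is a made-up sample. The point is only that a whitelist of keys is copied once, and the copy never looks at the machine it later runs on.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the real SystemPropertiesSupport.
final class HostedSnapshotSketch {
    private final Map<String, String> frozen = new LinkedHashMap<>();

    // "Build time": copy the whitelisted keys from the build host's properties.
    HostedSnapshotSketch(Map<String, String> hostProperties, String... keys) {
        for (String key : keys) {
            frozen.put(key, hostProperties.get(key));
        }
    }

    // "Run time": the answer no longer depends on the current machine.
    String get(String key) {
        return frozen.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> machine = new HashMap<>();
        machine.put("os.name", "Linux");
        machine.put("os.version", "4.15.0-32-generic");
        HostedSnapshotSketch image = new HostedSnapshotSketch(machine, "os.name", "os.version");

        // Pretend the image is now started on a machine with an older kernel.
        machine.put("os.version", "4.4.0-sample");
        System.out.println(image.get("os.version"));
    }
}
```

Running `main` still prints the build host's `4.15.0-32-generic`, which is exactly the symptom described above.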

findings



Second approach


If you dig further into SystemPropertiesSupport , you find a much more sensible thing:

    /** System properties that are lazily computed at run time on first access. */
    private final Map<String, Supplier<String>> lazyRuntimeValues;

Among other things, using these properties does not slow down the build of the executable. It is clear that if we cram a lot into HOSTED_PROPERTIES , everything will slow down.

Lazy properties are registered in the obvious way, via a reference to a method that returns the value:

    lazyRuntimeValues.put("user.name", this::userNameValue);
    lazyRuntimeValues.put("user.home", this::userHomeValue);
    lazyRuntimeValues.put("user.dir", this::userDirValue);
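A minimal sketch of the "computed on first access" idea, assuming a plain `HashMap` cache. The class `LazyPropertiesSketch` and its `register`/`get` methods are hypothetical and only mirror the shape of the `lazyRuntimeValues` field quoted above; the real SVM class may cache differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of lazily computed properties; not SVM's code.
final class LazyPropertiesSketch {
    private final Map<String, Supplier<String>> lazyRuntimeValues = new HashMap<>();
    private final Map<String, String> computed = new HashMap<>();

    void register(String key, Supplier<String> supplier) {
        lazyRuntimeValues.put(key, supplier);
    }

    String get(String key) {
        String cached = computed.get(key);
        if (cached != null) {
            return cached;
        }
        Supplier<String> supplier = lazyRuntimeValues.get(key);
        if (supplier == null) {
            return null; // unknown property
        }
        String value = supplier.get(); // runs on the machine executing the program
        if (value != null) {
            computed.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LazyPropertiesSketch props = new LazyPropertiesSketch();
        props.register("user.dir", () -> {
            calls[0]++;
            return System.getProperty("user.dir");
        });
        props.get("user.dir");
        props.get("user.dir");
        System.out.println("supplier calls: " + calls[0]);
    }
}
```

The supplier runs once, on the first lookup, and nothing is evaluated while the executable is being built.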

These method references ( this::userNameValue and friends) act as an interface: the same this::userDirValue is implemented for each of the supported platforms. In this case these are PosixSystemPropertiesSupport and WindowsSystemPropertiesSupport .

If, out of curiosity, you look at the Windows implementation, you will see something sad:

    @Override
    protected String userDirValue() {
        return "C:\\Users\\somebody";
    }

As you can see, Windows is not supported yet :-) The real problem, however, is that generating executables for Windows is not finished yet, so supporting these methods would be wasted effort for now.

That is, we need to register the following method:

    lazyRuntimeValues.put("os.version", this::osVersionValue);

And then implement it in the two or three available platform-specific classes.
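The plan can be sketched roughly like this. All class names here are hypothetical stand-ins (the real classes are `SystemPropertiesSupport` and its Posix and Windows subclasses), the kernel release is passed in as a `Supplier` because plain Java has no direct `uname` call, and caching is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the base-class / platform-subclass split.
final class PlatformSketch {
    abstract static class Base {
        private final Map<String, Supplier<String>> lazyRuntimeValues = new HashMap<>();

        Base() {
            // Only the reference is stored here; the method runs on lookup.
            lazyRuntimeValues.put("os.version", this::osVersionValue);
        }

        String getProperty(String key) {
            Supplier<String> supplier = lazyRuntimeValues.get(key);
            return supplier == null ? null : supplier.get();
        }

        protected abstract String osVersionValue();
    }

    static final class Posix extends Base {
        private final Supplier<String> kernelRelease; // stand-in for a uname() call

        Posix(Supplier<String> kernelRelease) {
            this.kernelRelease = kernelRelease;
        }

        @Override
        protected String osVersionValue() {
            return kernelRelease.get();
        }
    }

    static final class Windows extends Base {
        @Override
        protected String osVersionValue() {
            return "Unknown"; // placeholder until the platform is supported
        }
    }

    public static void main(String[] args) {
        Base posix = new Posix(() -> "4.15.0-32-generic");
        System.out.println(posix.getProperty("os.version"));
        System.out.println(new Windows().getProperty("os.version"));
    }
}
```

The base class knows only the property name; what remains open is what each platform actually puts inside `osVersionValue()`.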

But what to write there?

findings



A bit of archeology


The first thing that comes to mind is to peek at the implementation in OpenJDK and brazenly copy-paste it. A little archeology and looting never hurt a brave explorer!

Feel free to open any Java project in IDEA, write System.getProperty("os.version") , and ctrl+click through to the implementation of the getProperty() method. It turns out that all of this simply lives in Properties .

It would seem enough to copy the place where these Properties are filled in and, laughing defiantly, escape into the sunset. Unfortunately, we run into a problem:

 private static native Properties initProperties(Properties props); 

Noooooooooooooo.



But it all started well.

Was there a boy?


As we know, using C++ is bad. Is C++ used in SVM?

You bet! There is even a special package for it: src/com.oracle.svm.native .

And in this package, horror of horrors, there is a file getEnviron.c with something like this:

    extern char **environ;

    char **getEnviron() {
        return environ;
    }

Time to get dirty with C++


Now we dive a little deeper and open the full OpenJDK sources.

If you do not have them yet, you can browse them on the web or download them. Be warned: they are downloaded from here , still with Mercurial, and it still takes about half an hour.

The file we need is at src/java.base/share/native/libjava/System.c .

Notice that this is a path to a file and not just a class name? That's right, you can shove your new, shiny, fashionable IDEA, bought for $200 a year, you know where. You can try CLion , but to avoid irreversible mental damage it is better to just take Visual Studio Code . It highlights something, but does not yet understand what it sees (so it does not underline everything in red).

A brief retelling of System.c :

    java_props_t *sprops = GetJavaProperties(env);
    PUTPROP(props, "os.version", sprops->os_version);

These values, in turn, are filled in by src/java.base/unix/native/libjava/java_props_md.c .
Each platform has its own such file; they are switched via #define .

And here it begins. There are many platforms. Necrophilia like AIX can be ignored, because GraalVM does not officially support it (as far as I know, GNU/Linux, macOS and Windows are planned first). GNU/Linux and Windows support <sys/utsname.h> , which has ready-made functions for getting the name and version of the operating system.

But macOS has a creepy piece of crap :


    // Fallback if running on pre-10.9 Mac OS
    if (osVersionCStr == NULL) {
        NSDictionary *version = [NSDictionary dictionaryWithContentsOfFile :
                                 @"/System/Library/CoreServices/SystemVersion.plist"];
        if (version != NULL) {
            NSString *nsVerStr = [version objectForKey : @"ProductVersion"];
            if (nsVerStr != NULL) {
                osVersionCStr = strdup([nsVerStr UTF8String]);
            }
        }
    }

If the initial idea was to rewrite this by hand in good style, it quickly crashed against reality. What if I slip up somewhere in the jungle of this spaghetti, it breaks for someone, and I get hanged in the central square? No thanks. Need to copy-paste.

findings



Is copy-paste the norm?


This is an important question, on which the amount of further suffering depended. I really did not want to rewrite it by hand, but going to court for license violations would be even worse. So I went to GitHub and asked Codrut Stancu about it directly. Here is what he said :

"Reusing OpenJDK code, for example by copy-pasting it, is fine from a licensing point of view. However, you need a very good reason to do it. If the feature can be implemented by reusing the JDK code without copying, for example by patching it with a substitution, that is much better."

That sounds like official copy-paste permission!

We had a nice chat...


I started porting this piece of code but ran into my own laziness. To test it on different macOS versions, I would need to find at least one machine with the ancient 10.8 Mountain Lion. I have two Apple devices of my own and one belonging to a friend, plus I could deploy it in some VMware trial.

But I was lazy. And that laziness saved me.

I went to the chat and asked Chris Seaton which toolchain is the right one for the build: the supported version of the operating system, the C++ compiler and so on.

In response I got a surprised silence from the chat and Chris's reply that he did not understand the point of the question.

It took a hell of a long time before Chris understood what I wanted to do, and he asked me not to do it .
That is really the whole idea of SVM: SVM is pure Java, not C++ code. Nobody wants C++ code from OpenJDK. That is the last thing we want.

The example of the math libraries did not convince him. At the very least, they are written in C, while including C++ would mean bringing a completely new language into the code base. And that is just yuck.

What to do? Write it in System Java .

And if a call to the C/C++ platform SDK cannot be avoided, it must be a single system call wrapped in a C API. The data is pulled into Java, and the business logic is then written strictly in Java, even if the platform SDK has convenient ready-made ways to do it on the C++ side.
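As a rough illustration of that rule in ordinary Java, the sketch below keeps the "native" side down to one call that fills a byte buffer, and does all decisions in Java. Everything here is an assumption for the sake of the example: `RawUname` is a hypothetical interface standing in for the wrapped system call, and the 65-byte buffer size is just an assumed field length.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one thin "native" call, all logic on the Java side.
final class ThinNativeBoundarySketch {
    /** Stand-in for the single system call wrapped in a C API. */
    interface RawUname {
        /** Fills the buffer like a C function would: bytes followed by a NUL terminator. */
        void release(byte[] buffer);
    }

    /** Decoding and the fallback decision live in Java, not in C. */
    static String osVersion(RawUname uname) {
        byte[] buffer = new byte[65]; // assumed size of the C char array
        uname.release(buffer);
        int length = 0;
        while (length < buffer.length && buffer[length] != 0) {
            length++;
        }
        if (length == 0) {
            return "Unknown";
        }
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RawUname fake = buffer -> {
            byte[] src = "4.15.0-32-generic".getBytes(StandardCharsets.UTF_8);
            System.arraycopy(src, 0, buffer, 0, src.length);
        };
        System.out.println(osVersion(fake));
        System.out.println(osVersion(buffer -> { /* writes nothing */ }));
    }
}
```

With the fake call the first line printed is `4.15.0-32-generic` and the second is `Unknown`; in SVM itself the boundary is expressed with its own C interface types rather than a Java interface like this one.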

I sighed and began to study the source code in order to figure out how this can be done differently.

findings



Fiddler is not needed


A fiddler is not needed, dear. He only eats extra fuel.

Here I felt a certain sadness, because look: if we have <sys/utsname.h> on Windows and simply trust its answer, it is easy and simple.

But if it is not there, what do you do?


Fortunately, my mental anguish was interrupted by a pull request from Paul Woegerer, who fixed it all.

Interestingly, first everything got fixed in master ( os.version stopped returning null in the test), and only then did I notice the pull request. The problem is that this commit is not marked as a pull request on GitHub: it is a plain commit with the note PullRequest: graal/1885 . The thing is that the folks at Oracle Labs do not use GitHub internally; they need it only to interact with external committers. All of us who are not lucky enough to work at Oracle Labs have to subscribe to notifications about new commits to the repository and read them all.

But now we can relax and see how to implement this feature correctly .

Let's see what kind of beast this System Java is.

As I said earlier, everything is as simple as the back of a shovel when someone tries to knock your teeth out with it. And just as painful. Let's look at a quote from the pull request:

    @Override
    protected String osVersionValue() {
        if (osVersionValue != null) {
            return osVersionValue;
        }

        /* On OSX Java returns the ProductVersion instead of kernel release info. */
        CoreFoundation.CFDictionaryRef dict = CoreFoundation._CFCopyServerVersionDictionary();
        if (dict.isNull()) {
            dict = CoreFoundation._CFCopySystemVersionDictionary();
        }
        if (dict.isNull()) {
            return osVersionValue = "Unknown";
        }

        CoreFoundation.CFStringRef dictKeyRef = DarwinCoreFoundationUtils.toCFStringRef("MacOSXProductVersion");
        CoreFoundation.CFStringRef dictValue = CoreFoundation.CFDictionaryGetValue(dict, dictKeyRef);
        CoreFoundation.CFRelease(dictKeyRef);
        if (dictValue.isNull()) {
            dictKeyRef = DarwinCoreFoundationUtils.toCFStringRef("ProductVersion");
            dictValue = CoreFoundation.CFDictionaryGetValue(dict, dictKeyRef);
            CoreFoundation.CFRelease(dictKeyRef);
        }
        if (dictValue.isNull()) {
            return osVersionValue = "Unknown";
        }

        osVersionValue = DarwinCoreFoundationUtils.fromCFStringRef(dictValue);
        CoreFoundation.CFRelease(dictValue);
        return osVersionValue;
    }

In other words, we write in Java word for word what we would have written in C.
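Stripped of the CoreFoundation calls, the control flow of that method can be distilled into a few lines. This is a hypothetical sketch, not part of the pull request: a `Map` stands in for the version dictionary (with `null` meaning no dictionary was found), and the version string in `main` is a made-up sample.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical distillation of the lookup order; a Map stands in for the CFDictionary.
final class VersionLookupSketch {
    private final Map<String, String> dictionary; // null means "no dictionary found"
    private String osVersionValue;                // cached after the first call

    VersionLookupSketch(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    String osVersionValue() {
        if (osVersionValue != null) {
            return osVersionValue;
        }
        if (dictionary == null) {
            return osVersionValue = "Unknown";
        }
        // Try the older key first, then fall back to the common one.
        String value = dictionary.get("MacOSXProductVersion");
        if (value == null) {
            value = dictionary.get("ProductVersion");
        }
        return osVersionValue = (value != null ? value : "Unknown");
    }

    public static void main(String[] args) {
        Map<String, String> sample = Collections.singletonMap("ProductVersion", "10.13.6");
        System.out.println(new VersionLookupSketch(sample).osVersionValue());
        System.out.println(new VersionLookupSketch(null).osVersionValue());
    }
}
```

What the sketch cannot show is the part that makes the real code painful: every reference obtained from CoreFoundation has an ownership rule attached to it.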

Look at how DarwinExecutableName is written:

    @Override
    public Object apply(Object[] args) {
        /* Find out how long the executable path is. */
        final CIntPointer sizePointer = StackValue.get(CIntPointer.class);
        sizePointer.write(0);
        if (DarwinDyld._NSGetExecutablePath(WordFactory.nullPointer(), sizePointer) != -1) {
            VMError.shouldNotReachHere("DarwinExecutableName.getExecutableName: Executable path length is 0?");
        }
        /* Allocate a correctly-sized buffer and ask again. */
        final byte[] byteBuffer = new byte[sizePointer.read()];
        try (PinnedObject pinnedBuffer = PinnedObject.create(byteBuffer)) {
            final CCharPointer bufferPointer = pinnedBuffer.addressOfArrayElement(0);
            if (DarwinDyld._NSGetExecutablePath(bufferPointer, sizePointer) == -1) {
                /* Failure to find executable path. */
                return null;
            }
            final String executableString = CTypeConversion.toJavaString(bufferPointer);
            final String result = realpath(executableString);
            return result;
        }
    }

All these CIntPointer , CCharPointer , PinnedObject , and whatnot.

To my taste, this is inconvenient and ugly. You have to work manually with pointers that look like Java classes. You have to call the appropriate release in time so that memory does not leak.
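The two habits visible in that listing, asking for the size first and releasing the pinned buffer via try-with-resources, can be tried out in ordinary Java. The sketch below is a simulation under stated assumptions: `FakeNative` is a hypothetical stand-in that mimics the contract used above (return -1 and report the needed size when the buffer is too small), and `Pinned` is a hypothetical resource with a counter so the release is observable.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical simulation of "ask for the size, allocate, ask again" plus timely release.
final class TwoCallSketch {
    /** Stand-in for a C function with a caller-supplied buffer. */
    static final class FakeNative {
        private final byte[] path;

        FakeNative(String path) {
            this.path = path.getBytes(StandardCharsets.UTF_8);
        }

        /** Returns -1 and stores the needed size if the buffer is missing or too small, else 0. */
        int getPath(byte[] buffer, int[] size) {
            int needed = path.length + 1; // room for the NUL terminator
            if (buffer == null || size[0] < needed) {
                size[0] = needed;
                return -1;
            }
            System.arraycopy(path, 0, buffer, 0, path.length);
            buffer[path.length] = 0;
            return 0;
        }
    }

    /** Stand-in for a resource that must be released, like a pinned object. */
    static final class Pinned implements AutoCloseable {
        static int live = 0;
        final byte[] array;

        Pinned(byte[] array) {
            this.array = array;
            live++;
        }

        @Override
        public void close() {
            live--;
        }
    }

    static String executablePath(FakeNative nativeApi) {
        int[] size = {0};
        // First call: no buffer, only to learn the required size.
        if (nativeApi.getPath(null, size) != -1) {
            throw new IllegalStateException("expected the size query to return -1");
        }
        byte[] buffer = new byte[size[0]];
        // Second call: correctly sized buffer, released even on early return.
        try (Pinned pinned = new Pinned(buffer)) {
            if (nativeApi.getPath(pinned.array, size) == -1) {
                return null;
            }
            return new String(pinned.array, 0, size[0] - 1, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        System.out.println(executablePath(new FakeNative("/usr/local/bin/main")));
        System.out.println("still pinned: " + Pinned.live);
    }
}
```

Here try-with-resources guarantees the release on every exit path, which is the part that is easy to get wrong when releases are called by hand.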

But if these measures seem unjustified to you, take another look at the implementation of the GC in .NET and be horrified by what C++ leads to if you do not stop in time. Remember, it is one huge CPP file more than a megabyte in size. There are some descriptions of how it works, but they are clearly insufficient for an external contributor to understand it. The code above, ugly as it looks, is quite understandable and can be checked with static analysis tools for Java.

As for the substance of the commit, I have questions about it. At the very least, there is no Windows support. When code generation for Windows appears, I will try to take on that task.

findings



Epilogue


This battle is over, but the war is far from it.

Fighter, keep a keen eye out for new articles on Habr and join our ranks !

A reminder that Oleg Shelayev, the only official GraalVM evangelist at Oracle, is coming to the next Joker conference . Not just "the only Russian-speaking one", but "the only one at all". The title of the talk ( "Compiling Java ahead-of-time with GraalVM" ) hints that it will not do without SubstrateVM.

By the way, Oleg was recently issued a service weapon: an account on Habr, shelajev-oleg . There are no posts there yet, but you can summon him by this username.

You can chat with Oleg and Oleg in our chat room on Telegram: @graalvm_ru . Unlike the issues on GitHub, there you can communicate in any form, and no one will be banned ( but that is not certain ).

A reminder, too, that every week, together with the "Debriefing" podcast, we put out an issue of the "Java digest". For example, this was the latest digest . From time to time it also includes news about GraalVM (in fact, the only reason I do not turn the whole issue into a GraalVM news release is respect for the audience :-)

Thank you for reading this - and see you soon!

Source: https://habr.com/ru/post/420455/

