Alexandre Mutel is the creator of the fastest and most comprehensive .NET wrapper for DirectX (and the only one with Windows 8 Metro support), an R&D developer on the game engine at Silicon Studio, and a member of the French demo group FRequency.

Recently we have been hearing a lot of noise about the return of the "Going Native" idea after the era of managed languages such as Java and .NET. Last year, when WinRT had only just been introduced, short-sighted comments began to appear claiming that .NET was dead and that C++ was coming back in all its glory as the one true way to develop applications, even as JITs keep appearing in the world of scripting languages (JavaScript exploits the advantages of JIT compilation most actively). Any code becomes native one way or another before execution; the difference is only in how long the path to native code is, and how well optimized the result will be. The meaning of the word "native" has shifted slightly and become inextricably linked with the word "performance." Even as a strong advocate of a managed language [C#], I have to admit that its performance really is lower than that of a well-written C++ application. So do we just have to accept this fact and return to C++, with things like WinRT as the basis of cross-language interaction? In truth, I would like .NET to die, and this post is about why, and what should take its place.
The era of managed languages
Let's review the recent history of managed-language development and note the current problems. Remember the Java slogan? "Write once, run anywhere." It announced a new paradigm: a completely "safe" language running on a virtual machine, coupled with a rich set of APIs, would make it easy to develop applications for any OS and platform. This was the beginning of the era of managed languages. While Java was adopted quite successfully in various areas of development, it was also rejected by many developers who were aware of its memory-management quirks, its insufficiently optimized JIT (although things have improved considerably since then), and a number of poor architectural decisions, such as the lack of support for structs and for direct memory access, while calls to native code through JNI were extremely inefficient and time-consuming (and recently they even considered removing all primitive value types and making everything an object - what a terrible idea!).
Also, Java failed to deliver on the promise of its own slogan: in practice it is impossible to cover all the capabilities of every platform with a single API, which led to things like Swing, a UI framework that is, to put it mildly, not the most impressive. And Java was originally designed around a single programming language, although many saw in the JIT and bytecode an opportunity to port scripting languages to the JVM.
At the beginning of the managed-language era, Microsoft tried to enter the Java market with its own language extensions (everyone knows how that story ended) and eventually built its own managed-language platform, in some respects better designed and assembled: the bytecode, the unsafe keyword, native interop, a lightweight but quite effective JIT and NGEN, the rapidly evolving C# language, C++/CLI, and so on. Cross-language interaction was there from the start, without the burden of the Java slogan (although Silverlight on Mac OS and Moonlight were fairly good attempts).
Both platforms used a similar monolithic stack: metadata, bytecode, the JIT, and the garbage collector are all tightly coupled. Accordingly, they shared similar performance problems: the JIT implies a delay at startup, and code does not run as fast as it could. The main reasons:
- The JIT performs fewer optimizations than C++ at -O2, because it must generate code very quickly (and, unlike Java's HotSpot JVM, the .NET JIT cannot replace already-running code with more optimized code on the fly).
- .NET types such as arrays always perform bounds checks on access (except in simple loops, where the JIT can eliminate the checks if the loop termination condition is bounded by the length of the array; see the sketch after this list).
- The garbage collector stops all threads during a collection (although the new background collector in .NET 4.5 improves matters somewhat), which can lead to unpredictable pauses.
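Here is a minimal C# sketch of the bounds-check point above, under the usual caveats (the method names are illustrative, and the exact set of patterns the JIT recognizes varies by runtime version):

    static int SumChecked(int[] data, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)   // bound not provably tied to data.Length
            sum += data[i];               // bounds check on every access
        return sum;
    }

    static int SumUnchecked(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++) // canonical pattern the JIT recognizes
            sum += data[i];                   // bounds check eliminated
        return sum;
    }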
Yet even with this performance deficit, a managed ecosystem with a universal framework remains the king of developer productivity and cross-language interaction, with decent overall performance across all supported languages. The culmination of the managed-language era was probably the launch of Windows Phone and Visual Studio 2010 (which used WPF to render its interface, although WPF itself runs on top of a fair amount of native code). Managed languages were, at the time, the only way to develop applications for Windows Phone. This was not the best thing that could have happened, given .NET's long list of unresolved performance problems, long enough to encourage the "native developers" to strike back, and they had every right to do so.
As it turned out, this meant, in a sense, a retreat from .NET. I don't know much about Microsoft's internal kitchen, but judging by frequent reports there is a good deal of confrontation between its divisions. For better or worse, in recent years Microsoft seems to be losing steam on .NET (for example, there have been practically no significant improvements to the JIT or NGEN, and many performance requests remain unresolved, including things like SIMD support, which developers have been awaiting for a very long time). And it seems to me that such changes are possible only if .NET is part of a global strategy, with the strong support and participation of every division.
At the same time, Google began to promote its Native Client technology, which allows native code to run in a sandbox directly in the browser. Last year, following the "Going Native" trend, Microsoft announced that even the HTML5 in the next IE would be native! Sic.
In "Reader Q&A: When will better JITs save managed code?", Herb Sutter, one of the "Going Native" evangelists, responds to Miguel de Icaza's post "Can JITs be faster?" with some interesting arguments about what the "Going Native" philosophy thinks of JITs, with a fair number of inaccurate facts; but let's consider just the key one: even if JITs get better in the future, managed languages have already made the choice between performance and safety in favor of safety. Thus, the road to the big leagues is closed to them.
And at this moment WinRT appears, smoothing some of the sharp corners. Borrowing part of the .NET philosophy (metadata and some common types, such as strings and arrays) and the good old COM model (as a common denominator for native cross-language interaction), WinRT tries to solve the language-interop problem outside the CLR world (which means no performance loss for C++) and to provide a more modern API for the OS. Is this the answer to the ultimate question of life, the universe, and everything? Not really. WinRT has set a course toward a clear convergence of technologies, which could potentially lead to great things, but so far there is no certainty that the right path has been chosen. So what could the "right path" be?
Going Native 2.0 - Performance for All
Safety checks can hurt performance, but managed code is not doomed to spend its whole life on top of a slow JIT (for example, Mono can run C# code compiled ahead of time to native code via LLVM on iOS and Linux), and it would be fairly easy to extend the bytecode with unsafe instructions that provide controlled performance improvements (such as disabling array bounds checks), as sketched below.
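Today the closest thing C# offers to such an opt-out is its unsafe subset: pinning the array and walking a raw pointer skips the bounds checks entirely. A minimal sketch (compile with /unsafe; the method name is illustrative):

    static unsafe int SumNoBoundsChecks(int[] data)
    {
        int sum = 0;
        fixed (int* p = data)             // pin the array, obtain a raw pointer
        {
            for (int i = 0; i < data.Length; i++)
                sum += p[i];              // raw pointer access, no bounds check
        }
        return sum;
    }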
But the most obvious problem today is the lack of a common infrastructure for language compilers. The JavaScript JIT in IE10, the .NET JIT and NGEN compilers, the Visual C++ compiler (and many others) all use different code bases for what is essentially the same time-consuming and complex task: generating efficient machine code. A single shared compiler back end would be a very important step toward making high-performance code available to all languages.
Felix9 on Channel9 found signs that Microsoft may actually be working on this problem. That is definitely good news, but "performance for all" is only a small part of the big picture. In fact, the "right path" mentioned earlier is a broader integrated architecture: not just an improved LLVM-like stack, but one backed by Microsoft's many years of experience in various areas (the C++ compiler, the JIT, garbage collection, metadata, etc.), providing a fully extensible and modular "CLR" consisting of:
- A mid-level intermediate language (IL), supporting reflection, quite similar to LLVM IR or .NET bytecode, and defining common data types (primitives, strings, arrays, etc.). An API similar to System.Reflection.Emit should be available. Vectorized (SIMD) types must be first-class basic types, on a par with int and double. The IL should not be limited to the CPU: it should also allow computation on the GPU (as the C++ AMP extensions do). It should be possible to express HLSL bytecode in this IL, taking advantage of the common compiler infrastructure (see below). A typeless variant of the IL should also be available, to make it easier to port dynamic languages to it.
- Dynamically linked libraries and executables, analogous to .NET assemblies, carrying metadata and reflection-capable IL code. During development, code should link against assemblies (IL code), not against obsolete C/C++ header files.
- A compiler from IL to machine code, usable as a JIT, as a desktop or cloud compiler, or as any combination of these. This compiler should vectorize code as far as the target platform allows. IL code should be compiled to machine code at installation or deployment time, using knowledge of the target architecture (during development this can be done immediately after compiling to IL). The compilation steps should be accessible through an API and should provide extension points wherever possible (access to the IL, IL-level optimizations, custom IL-to-machine-code transformations). Optimization settings should range from fast compilation (JIT-like) to aggressive optimization (for precompiled applications, or for hot-swapping JIT-compiled code with more optimized code). An application profile could also be used to automatically tune local optimizations. The compiler should support advanced JIT scenarios such as dynamic code analysis and On-Stack Replacement (OSR, which allows the code of a long-running computation to be replaced with more optimal code at runtime), unlike the current .NET JIT, which compiles a method once, at its first invocation. Optimizations of this kind are very important in dynamic scenarios, where type inference happens after compilation (as in the case of JavaScript).
- An extensible memory allocator that supports concurrent allocations. The garbage collector would be just one possible implementation. Most applications would use it for most objects, while the most performance-critical objects would use other allocation strategies (such as the reference counting used in COM/WinRT). There should be no restriction on using several allocation strategies within a single application (which is what happens in .NET today when an application has to resort to native calls to create objects outside the CLR); a hypothetical sketch follows this list.
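To make the allocator idea concrete, here is a purely hypothetical sketch: no such API exists in .NET, and the interface and class names are invented for illustration. The point is that allocation becomes an abstraction, with the garbage collector as just one strategy behind it:

    using System;
    using System.Runtime.InteropServices;

    // Hypothetical abstraction: not part of any real CLR.
    public interface IAllocator
    {
        IntPtr Allocate(int sizeInBytes);
        void Release(IntPtr block);   // a tracing GC would implement this as a no-op
    }

    // A manual strategy on the native heap, the kind of thing
    // performance-critical objects or COM/WinRT interop would use.
    public sealed class NativeHeapAllocator : IAllocator
    {
        public IntPtr Allocate(int sizeInBytes)
        {
            return Marshal.AllocHGlobal(sizeInBytes);
        }

        public void Release(IntPtr block)
        {
            Marshal.FreeHGlobal(block);
        }
    }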
The idea is very close to the CLR stack, but it does not force applications to run on top of a JIT compiler (yes, .NET has NGEN, but it was designed to speed up startup rather than overall execution; besides, it is a black box and works only with assemblies installed in the GAC), and it allows mixed memory-allocation strategies, with and without the garbage collector.
In such a system, cross-language interaction would be simpler, without sacrificing performance for simplicity or vice versa. Ideally, the OS itself should be built on such an architecture. Perhaps this idea was (is?) at the heart of projects such as Redhawk (for the compiler) or Midori (for the OS). In such an integrated system, it is possible that only drivers would require direct access to the hardware.
Felix9 also unearthed that an intermediate bytecode, lower-level than MSIL (.NET bytecode) and called MDIL, may already be in use, and it could be exactly the intermediate bytecode described above. However, if you look at the corresponding patent, "INTERMEDIATE LANGUAGE SUPPORT FOR CHANGE RESILIENCE", the specification contains x86 instructions, which does not quite fit the definition of an architecture-independent bytecode. Perhaps they will leave MSIL unchanged and use MDIL at a lower level. We will find out soon.
So what problems does WinRT solve from this point of view? Metadata, some sandbox-friendly APIs, and cross-language interaction in an embryonic state (though with common data types and metadata). As you can see, not much: a sort of COM++. It is also clear that WinRT does not allow for advanced optimizations when we use its API. For example, we are not allowed to have a struct with inline methods. Every method call in WinRT is a virtual call going through a virtual method table (and in some cases it takes several virtual calls, for example when a static method is used). Even the simplest read/write property access requires a virtual call. This is clearly inefficient. Apparently, WinRT is aimed only at higher-level APIs and does not allow for scenarios where we would want high-performance code wherever possible, bypassing the layer of virtual calls and non-inlinable code. As a result, we get an extended COM model, which is not exactly what you would call "Building the Future". A rough illustration of the cost difference follows.
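The sketch below is not WinRT code, just the cost difference in miniature: the struct method can be inlined by the JIT down to a few instructions, while the interface call is dispatched through a virtual method table on every invocation, which is roughly the price every WinRT API call pays:

    public struct Vector2
    {
        public float X, Y;
        public float LengthSquared() { return X * X + Y * Y; } // inlinable, no call overhead
    }

    public interface IVector
    {
        float LengthSquared(); // always dispatched through a vtable, never inlined
    }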
Productivity and performance for C# 5.0
A language like C# would be an ideal candidate for such a modular CLR system and could easily be ported to the intermediate bytecode described above. But to use such a system effectively, C# needs to be improved in several ways:
- More unsafe constructs, letting us turn off "managed" behavior such as array bounds checking (up to a "super unsafe mode" in which we could, for example, use CPU cache-control instructions when accessing array elements; this kind of "advanced" thing cannot be done with managed arrays without undocumented tricks).
- A configurable new operator supporting different memory-allocation strategies.
- Vectorized types (like float4 in HLSL) added to the basic types. This has been requested for a long time (with horrible workarounds in XNA on Windows Phone to "solve" the problem); see the sketch after this list.
- Lightweight interaction with native code: in the current state of things, the transition from managed to unmanaged code is quite expensive, even when no parameters are passed. Transitions into unmanaged code should be possible without the x86/x64 prologue/epilogue instructions that the .NET JIT generates today.
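As an illustration of the vectorized-types point, here is what developers write today: a hand-rolled struct (the name Float4 is illustrative). The proposal is that such a type become a built-in whose operators map to single SIMD instructions; with the current JIT, the addition below compiles to four scalar instructions:

    public struct Float4
    {
        public float X, Y, Z, W;

        public Float4(float x, float y, float z, float w)
        {
            X = x; Y = y; Z = z; W = w;
        }

        public static Float4 operator +(Float4 a, Float4 b)
        {
            // Four scalar adds today; one SSE/NEON add if float4 were a base type.
            return new Float4(a.X + b.X, a.Y + b.Y, a.Z + b.Z, a.W + b.W);
        }
    }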
In addition to performance, there are other equally important areas:
- Generics everywhere: in constructors and in implicit type conversions, with more advanced constructs (operator constraints, etc.), closer to the flexibility of C++ templates but safer and less cluttered.
- Inheritance and finalizers for structs (to allow lightweight code to run when a method completes, without cumbersome patterns like try/finally and using).
- More metaprogramming: extension methods for static types; mixins (injecting the contents of one class into another, convenient for things like math functions); modification of classes/types/methods at compile time (for example, methods that run at compile time to add other methods or properties to a class, something like eigenclasses in Ruby, instead of using T4 templates for code generation).
- A built-in literal or type that could express a reference to a language object (a class, property, or method) with a simple construct like symbol LinkToMyMethod = @MyClass.MyMethod; instead of Linq expressions. This would make code for things like INotifyPropertyChanged more reliable and would simplify all property-based systems such as WPF (which in its current state involves a lot of duplicated code); compare the sketch after this list.
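For the last point, compare today's workaround with the proposed literal. The helper below is the classic expression-tree trick used for INotifyPropertyChanged (the names are illustrative); it extracts the property name at runtime, whereas a symbol literal would resolve the same reference at compile time:

    using System;
    using System.Linq.Expressions;

    public static class Property
    {
        // Extracts "FirstName" from: Property.NameOf(() => this.FirstName)
        public static string NameOf<T>(Expression<Func<T>> property)
        {
            return ((MemberExpression)property.Body).Member.Name;
        }
    }

    // Today:    RaisePropertyChanged(Property.NameOf(() => this.FirstName));
    // Proposed: symbol s = @MyClass.FirstName;   // hypothetical syntax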
The basic idea is that you need to add less to C# than you would need to remove from C++ in order to take full advantage of such an integrated system, raising developer productivity without the attendant performance losses. Some may argue that C++ already offers all this and more, but that is exactly why C++ is so cluttered (syntactically) and so dangerous for most developers. It allows unsafe code absolutely everywhere, while any given application has quite specific places where it is really needed (which leads to memory problems that would be easier to fix if those places were clearly marked in the code, as is done with the asm keyword). It is much easier and safer to keep track of such areas in the code than to have them everywhere.
What next?
We can hope that Microsoft has chosen a path from the general to the specific, starting with the release of WinRT, which provides a universal API for all languages and simple cross-language interaction, and that the more advanced features will arrive in the next versions of the OS. But that is the ideal scenario, and it will be interesting to see whether Microsoft can pull it off. Even though it was recently announced that .NET applications on WP8 will benefit from compilation in the cloud, we still know little about it: is it just an adapted NGEN (which, I remind you, is not performance-oriented and generates code very similar to what the JIT produces), or the not-yet-revealed RedHawk compiler?
Microsoft probably has something in store, considering its many years of work on C++ compilers, JITs, garbage collectors, and all the related R&D projects...
To summarize: .NET should die and give way to a more integrated, performance-oriented common environment in which managed code (safety and productivity) and unmanaged code (performance) are closely intertwined. And that should be a structural part of the next round of WinRT's development.