
Past and Future of JavaScript Compilation

We now take fast execution of JavaScript code in browsers for granted, and every day brings more inspiring examples of what can be built with JS. But this was not always the case. In this article we'll look at the JS engines responsible for compiling code in browsers, the historical path of their acceleration, and where they may go next.

The first engine to interpret JS code was SpiderMonkey, introduced in the Netscape 2.0 browser in 1995. The legend of its rapid creation is well documented: Brendan Eich had only 10 days to design the language and build a compiler. JavaScript was successful from the start, and by August Microsoft had already embedded its own version, JScript, in Internet Explorer 3.0. By the end of 1996 the language had been submitted for formal standardization, and in June of the following year it received the official standard ECMA-262. Since then JS support has been mandatory for every browser, and every major vendor set out to build its own engine for it. Over the years these engines have evolved, replaced one another, been renamed, and served as the basis for later engines. Tracking every version ever created is not a task for the faint of heart.


For example, few people now remember KDE's Konqueror browser and its open-source KJS engine. Apple developers later forked that project and grew it into the future WebKit core; along the way the engine changed names several times: SquirrelFish, SquirrelFish Extreme, Nitro.
The opposite also happened: some engines kept their names while everything inside them changed. In today's SpiderMonkey at Mozilla, for example, there is no trace of the code that existed in 1995.

By the mid-2000s JavaScript was standardized and very widespread, but its execution was still slow. The race for speed began in 2008, when a number of new engines appeared. In early 2008, Opera's Futhark was the fastest engine. By the summer, Mozilla had introduced TraceMonkey, and Google had launched Chrome with a new JavaScript engine, V8. Despite the abundance of names, they were all trying to do the same thing, and each project wanted to stand out on execution speed. Since 2008 the engines have kept improving their designs, and the main players have been racing to build the fastest browser.

When we talk about a JavaScript engine, we usually mean the compiler, and making the code the compiler generates run faster is the real challenge. Probably not everyone who writes JS programs understands how the compiler works.
JavaScript is a high-level language, which means it is readable and highly flexible. The compiler's job is to produce native machine code from this human-readable code.

Compilation usually proceeds in four stages:

1. During lexical analysis (scanning), the compiler scans the source code and breaks it into separate components called tokens. This is usually done with regular expressions (a sketch follows this list).
2. The parser structures the token stream into a syntax tree.
3. A translator then converts this structure into bytecode. In the simplest case, translation can be viewed as a mapping from tokens to their bytecode representations.
4. Finally, the bytecode is run through a bytecode interpreter, which executes it.
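
To make the scanning stage concrete, here is a toy tokenizer sketch in JavaScript. The token kinds and regular expressions are simplified assumptions for illustration, far cruder than anything a real engine uses:

```javascript
// A toy scanner: breaks source text into tokens using regular expressions.
// Token kinds and patterns here are illustrative assumptions only.
const TOKEN_PATTERNS = [
  ["number",     /^\d+(\.\d+)?/],
  ["identifier", /^[A-Za-z_$][\w$]*/],
  ["operator",   /^[+\-*\/=]/],
  ["punct",      /^[(){};,]/],
  ["whitespace", /^\s+/]
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let matched = false;
    for (const [kind, pattern] of TOKEN_PATTERNS) {
      const m = rest.match(pattern);
      if (m) {
        if (kind !== "whitespace") tokens.push({ kind, text: m[0] });
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError("Unexpected character: " + rest[0]);
  }
  return tokens;
}

// tokenize("x = 2 + 40;") yields tokens for: x, =, 2, +, 40, ;
```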

This is the classic compiler design, and it served for many years. But the demands of browsers are very different from those of desktop applications, and this architecture began to struggle in several ways. Resolving those tensions became the substance of the browsers' race for speed.

Fast, elegant, correct

JavaScript is a very flexible language and is quite tolerant of constructions that walk right up to the edge of legality. So how do you write a compiler for a weakly typed, dynamic language with late binding? Before you make it fast, you need to make it correct. As Brendan Eich put it:

"Fast, Slim, Correct. Pick any two, so long as one is 'Correct'."

Mozilla's Jesse Ruderman created a very useful tool, jsfunfuzz, for testing compiler correctness. Brendan called it a parody of a JavaScript compiler, since its goal is to produce the strangest possible, yet valid, constructions and feed them to the compiler for verification. The tool has uncovered numerous edge cases and bugs.

JIT compilers

The main problem with the classic architecture is that interpreting bytecode at runtime is rather slow. Performance could be improved by compiling the bytecode all the way to machine code ahead of time, but that would force users to wait a long time for web pages to load.

The solution is lazy, "on the fly" compilation: just-in-time, or JIT. As the name implies, pieces of code are compiled into machine code at the moment they are needed. JIT compilers have appeared across many technologies, with different optimization strategies. Some specialize in optimizing individual instructions, others in optimizing repeated operations such as loops and functions. A modern JavaScript engine uses several such compilers working together to improve the performance of your code.

JavaScript JIT compilers

The first JavaScript JIT compiler was Mozilla's TraceMonkey. It was a so-called tracing JIT, since it traces the most frequently repeated loops. These "hot loops" are compiled into machine code. Through this one optimization alone, Mozilla saw performance improvements of 20% to 40% over its previous engine.

Soon after TraceMonkey launched, Google released Chrome along with the new V8 engine. V8 was designed specifically for speed. Its key architectural decision was to skip bytecode generation altogether: the translator generates native code directly. Within a year of launch the team also added register allocation, improved inline caching, and completely rewrote the regular-expression engine, making it 10 times faster. Together these changes increased JavaScript execution speed by 150%. The race for speed went on!

Later, browser makers introduced compilers with an extra stage. Once a control-flow graph or syntax tree has been built, the compiler can use that data for further optimizations before generating native code. IonMonkey and Crankshaft are examples of such compilers.

The ambitious goal of all these transformations is to execute JavaScript code at native C speed. A few years ago that goal seemed incredible, but the gap in execution speed keeps narrowing.

Now for some specific techniques used in JS compilation.

Hidden Classes

Because building objects and structures in JavaScript is so easy for the developer, navigating these loosely defined structures can be very slow for the compiler. A hash table is the usual way to store and access properties in a dynamic language, and the problem with a hash table is that lookups in a very large one can be very slow.
To speed this up, both V8 and SpiderMonkey use hidden classes, an internal representation of your JavaScript objects. Google calls them maps and Mozilla calls them shapes, but they are essentially the same thing. These structures are much faster to search than a standard dictionary lookup.
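
A hedged sketch of why hidden classes matter for everyday code: objects that receive the same properties in the same order can share one hidden class, while divergent initialization forces the engine to track several. The function names are invented, and the exact behavior is an engine implementation detail:

```javascript
// Both objects gain properties in the same order, so engines like V8
// can let them share a single hidden class (map/shape), making
// property lookups a fixed-offset read.
function makePointFast(x, y) {
  const p = {};
  p.x = x;
  p.y = y;
  return p;
}

const p1 = makePointFast(1, 2);
const p2 = makePointFast(3, 4); // p1 and p2 share one hidden class

// Assigning properties in a different order (or conditionally) creates
// a different hidden-class transition chain, so code that mixes these
// objects is harder for the engine to optimize.
function makePointSlow(x, y, flipped) {
  const p = {};
  if (flipped) { p.y = y; p.x = x; }
  else         { p.x = x; p.y = y; }
  return p;
}
```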

Type inference

JavaScript's dynamic typing is what allows the same property to be a number in one place and a string in another. Unfortunately, this flexibility forces numerous extra type checks on the compiler, and code full of conditional type checks is much longer and slower than code compiled for known types.

The solution is type inference, present in all modern JS compilers. The compiler makes assumptions about the types of properties. While an assumption holds, execution goes through a typed JIT that generates fast machine code for those sections. If a type does not match, the code falls back to the untyped JIT and runs more slowly, with conditional checks.
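
A sketch of the kind of code this affects. The names are invented for illustration, and the exact thresholds and behavior vary by engine:

```javascript
function add(a, b) {
  return a + b;
}

// Called only with numbers, the engine can speculate that a and b
// are always numbers and compile a fast, typed version of add.
for (let i = 0; i < 100000; i++) {
  add(i, i + 1);
}

// Suddenly passing strings violates the type assumption: the optimized
// code is thrown away ("deoptimization") and execution falls back to
// slower generic code with runtime type checks.
add("foo", "bar");
```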

Inline caching

This is the most common optimization in modern JavaScript compilers. It is not a new technique (it was first used some 30 years ago in a Smalltalk compiler), but it is very useful.

Inline caching builds on both of the previous techniques: type inference and hidden classes. When the compiler encounters a new object structure, it caches its hidden class along with all the specific types it observed. If the same structure is encountered later, it can be compared quickly against the stored cache. If the structure or a data type has changed, execution falls back to slower generalized (generic) code, or, in some compilers, polymorphic inline caching kicks in: a separate cache entry is created for each observed data type. Read more about this in the article by Vyacheslav Egorov of the V8 team.
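
A minimal illustration of what gets cached; getX and the object shapes below are invented for the example:

```javascript
// The property access p.x is a call site the engine can cache.
function getX(p) {
  return p.x;
}

// Monomorphic site: every object has the same hidden class, so one
// cached (hidden class -> property offset) entry is enough.
getX({ x: 1, y: 2 });
getX({ x: 3, y: 4 });

// Objects with different shapes make the site polymorphic: the engine
// must keep several cache entries, or fall back to generic lookup.
getX({ x: 5 });
getX({ x: 6, y: 7, z: 8 });
```
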
Once the compiler knows the structure of the code and the types of the data in it, various further optimizations become possible. Here are just a few examples:

inline expansion, or “inlining”

Calling a function is an expensive operation because it involves a lookup, and lookups can be slow. The idea is to place the body of the function directly at the point where it is called. This avoids unnecessary branching and speeds up execution, at the cost of a larger executable.
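
A hand-written picture of the transformation; an optimizing JIT does this automatically for small, hot functions:

```javascript
function square(n) {
  return n * n;
}

function sumOfSquares(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) {
    sum += square(arr[i]); // hot call site: a good inlining candidate
  }
  return sum;
}

// After inlining, the compiler effectively treats the loop body as
//   sum += arr[i] * arr[i];
// removing the call overhead at the cost of slightly larger code.
```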

loop-invariant code motion, or “hoisting”

Loops are the prime candidate for optimization. Removing unnecessary computations from a loop can greatly improve performance. The simplest example is a for loop over an array. Recomputing the array's length on every iteration is wasteful, so that operation is carried out once, "hoisted" out of the loop.
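
The same transformation written out by hand; arr and process are placeholder names, and an optimizing compiler performs this hoisting automatically:

```javascript
const arr = [1, 2, 3, 4];
const process = (x) => console.log(x);

// Written naively, arr.length is re-read on every iteration:
for (let i = 0; i < arr.length; i++) {
  process(arr[i]);
}

// The hoisted form the compiler effectively produces: the invariant
// length computation is lifted out of the loop.
const len = arr.length;
for (let j = 0; j < len; j++) {
  process(arr[j]);
}
```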

constant folding

Constant expressions are evaluated at compile time, as are expressions involving variables that never change.
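
A small illustration, with invented names:

```javascript
// The compiler can fold this to the constant 86400 at compile time:
const SECONDS_PER_DAY = 60 * 60 * 24;

// Because the operand never changes, this whole expression can also
// be reduced to the constant 172800:
const twoDays = SECONDS_PER_DAY * 2;
```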

common subexpression elimination

Similar to constant folding, the compiler looks for expressions that repeat the same computation. Such expressions can be replaced by a variable holding the value computed once.
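
A hand-written example of the rewrite; the variable names are invented:

```javascript
const a = 3, b = 5, h = 4, margin = 2;

// The subexpression (a + b) is computed twice:
const before = (a + b) * h / 2 + (a + b) * margin;

// The compiler can effectively rewrite it to compute the value once:
const s = a + b;
const after = s * h / 2 + s * margin;
```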

dead code elimination

Code that is never used or can never be reached. There is no point optimizing the body of a function that is never called; it can simply be deleted.
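
Two common cases, sketched with invented names:

```javascript
function compute(x) {
  return x * 2;
  // Everything after an unconditional return is unreachable
  // and can be removed entirely:
  console.log("never runs");
}

// A function that is never called anywhere can likewise be dropped
// from the generated code:
function neverCalled() {
  return "wasted effort";
}
```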

This is just a small set of simple tools, showing the direction in which browser makers are working toward their ambitious goals. Many of them have made long-term bets on the concept of the web as an operating system. To get there, they have set themselves the task of executing JavaScript at native C speed, gradually erasing the difference between native and web applications.

ES.next

The next version of the ECMAScript specification (ECMAScript 6) has long been in the works, and the final version is expected this year. One of the project's stated goals is fast compilation. The set of means for achieving it is under discussion, including typing, binary data, and typed arrays. Typed code can be sent straight to the JIT, speeding up both compilation and execution.
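
Typed arrays are already usable today; here is a small sketch of why they help the compiler. Because the element type and memory layout are fixed up front, the engine can skip per-element type checks and boxing:

```javascript
// Every element is a 64-bit float, stored in one contiguous buffer.
const samples = new Float64Array(1024);

for (let i = 0; i < samples.length; i++) {
  samples[i] = Math.sin((i / 1024) * 2 * Math.PI);
}

// Unlike a plain Array, a typed array cannot become heterogeneous:
samples[0] = "oops"; // coerced to NaN, the element type is preserved
```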

Browser support for ES.next is still quite limited, but you can track it at least here, and you can start experimenting with Traceur, a compiler from ES.next to JavaScript that is itself written in JavaScript.

WebGL


JavaScript in the browser is not limited to DOM manipulation. A large number of modern browser games render directly to the page's canvas element using the standard 2D context. The fastest way to render to a canvas is WebGL, an API that gains its speed by handing expensive operations off to the GPU, leaving the CPU free for application logic.
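
A minimal bootstrap sketch; "glCanvas" is an assumed element id, and a real application would go on to compile shaders and upload geometry before drawing anything:

```javascript
const canvas = document.getElementById("glCanvas");
const gl = canvas.getContext("webgl");

if (!gl) {
  // No usable GPU/driver: fall back to the CPU-bound 2D context.
  console.warn("WebGL unavailable, falling back to 2D canvas");
} else {
  gl.clearColor(0.0, 0.0, 0.0, 1.0); // opaque black
  gl.clear(gl.COLOR_BUFFER_BIT);     // the clear happens on the GPU
}
```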

WebGL is supported in some form in most browsers, primarily Chrome and Firefox. Safari and Opera users must first enable the corresponding option. Microsoft has also recently announced WebGL support in IE11.

Unfortunately, even with full browser support you cannot guarantee that WebGL will work equally well for all your users, since it also depends on up-to-date GPU drivers. Google Chrome is the only browser offering a fallback when those drivers are missing. WebGL is a very powerful technology, but its moment has not quite arrived: besides security concerns, support on mobile devices is very uneven, and of course there is none in older browsers.

JavaScript as a compilation target

Although all modern web applications use JavaScript on the client, not all of them were written in JavaScript. Many are written in entirely different languages (Java, C++, C#) and then compiled to JS. Others were created as languages that extend JavaScript to make development more convenient, such as TypeScript.

Cross-compilation, however, has its problems. The minified code is unreadable and hard to debug; in practice debugging is only feasible in browsers that support source maps, intermediate files that preserve the mapping from the generated code back to the original language.
A couple of years ago, Microsoft's Scott Hanselman put forward the thesis that JavaScript is the compilation language for the web. His observation that a modern minified JavaScript application is barely readable is hard to dispute, but his post nonetheless sparked a lot of discussion. Many web developers got their start simply by studying source code in the browser, and now that code is almost always obfuscated. Could we lose some future developers for that reason?

An interesting demonstration of the idea is the Emscripten project, which compiles LLVM bitcode to JavaScript. LLVM (Low Level Virtual Machine) is a very popular intermediate compilation format; you can find an LLVM-based compiler for almost any language. This approach lets everyone write source code in whatever language they find convenient. The project is still at an early stage, but the team has already released some impressive demos. For example, Epic's developers ported Unreal Engine 3 to JavaScript and WebGL, using the LLVM compiler toolchain and Emscripten to compile it into asm.js code.

asm.js

A project working in the same direction. Its creators took the slogan "JavaScript as machine code" quite literally, defining a strictly limited subset of the language as a kind of JavaScript assembly. In theory you could write asm.js code by hand, but nobody wants to. To get the most out of it, you need two compilers.
The Emscripten compiler can emit asm.js code. The resulting JavaScript is unreadable, but it is correct and backward compatible. The big speedup comes when browser engines recognize the asm.js format and route such code through a separate compiler. To that end, Mozilla is working on OdinMonkey, an asm.js-optimizing compiler built into IonMonkey. Google has also announced asm.js support in Chrome. Preliminary tests have shown performance at about 50% of compiled C++, a phenomenal achievement comparable in speed to Java and C#. The team is confident the results will improve.
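
For a flavor of the format, here is a minimal hand-written asm.js-style module. Normally this code is machine-generated; this tiny sketch may not satisfy every validator's link-time rules, but it runs as ordinary JavaScript everywhere:

```javascript
function AsmModule(stdlib, foreign, heap) {
  "use asm";

  // In asm.js, types are expressed through coercions:
  // x|0 marks a 32-bit integer, +x marks a double.
  function add(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }

  return { add: add };
}

// Runs as plain JavaScript in any browser; engines that recognize
// "use asm" can compile the module ahead of time to near-native code.
const mod = AsmModule(window, null, new ArrayBuffer(0x10000));
mod.add(2, 40); // 42
```
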
Mozilla Research is really riding the crest of a wave right now. Besides Emscripten and asm.js, there is also the LLJS project (JavaScript as C), and River Trail, an ECMAScript extension for parallel computing, is being developed together with Intel. Given how much effort is going into this direction, and the results already achieved, executing JavaScript at native speed may not be as unattainable as it once seemed.

ORBX.js

There are also those who propose solving the JavaScript performance problem through full virtualization: instead of running on your machine, the application runs in the cloud. This is, of course, not a solution to the JS compilation problem itself, but an alternative for users. ORBX.js, a joint project of Mozilla and Otoy, is a video codec implemented entirely in JavaScript and capable of streaming 1080p video.
The technology is certainly impressive, but it may create more problems than it solves.

So what do you think about the future of JavaScript compilation?

Source: https://habr.com/ru/post/182802/

