The material we are publishing today in translation was prepared by Mathias Bynens and Benedikt Meurer, who work on the V8 JS engine at Google. This article covers basic mechanisms that are typical not only of V8 but of other engines as well. Familiarity with these internals helps JavaScript developers reason about the performance of their code. In particular, we will focus on the engines' optimization pipelines and on how engines speed up access to the properties of object prototypes.

Code Optimization Levels and Tradeoffs
The process of turning JavaScript source code into executable code looks roughly the same in different engines.

The process of converting source JS code to executable code

Details on this can be found here. It is also worth noting that although, at a high level, the pipelines that turn source code into executable code are very similar across engines, their code-optimization tiers often differ. Why is that? Why do some engines have more optimization tiers than others? It turns out that engines have to make a trade-off: they can either quickly generate code that is runnable but not very efficient, or spend more time producing code and, in exchange, get better performance.
Fast code preparation for execution versus optimized code that takes longer to prepare but runs faster
The interpreter can produce bytecode quickly, but such code is usually not very efficient. An optimizing compiler, on the other hand, takes longer to generate code, but in the end produces optimized, faster machine code.
This model of preparing code for execution is used in V8. The V8 interpreter is called Ignition; it is the fastest of the existing interpreters (in terms of raw bytecode execution speed). The optimizing compiler in V8 is called TurboFan; it is responsible for producing highly optimized machine code.
Ignition interpreter and TurboFan optimizing compiler

The trade-off between program startup latency and execution speed is the reason some JS engines have additional optimization tiers. For example, in SpiderMonkey there is an intermediate tier between the interpreter and the IonMonkey optimizing compiler: the Baseline compiler (it is called "The Baseline Compiler" in the Mozilla documentation, though "baseline" is not a proper name).

SpiderMonkey code optimization levels

The interpreter generates bytecode quickly, but such code runs relatively slowly. The Baseline compiler takes more time to generate code, but that code already runs faster. Finally, the IonMonkey optimizing compiler takes the most time to produce machine code, but that code can be executed very efficiently.
Let's take a look at a specific example and see how the pipelines of the various engines handle the code. The example below contains a "hot" loop, i.e. code that is executed many times.

let result = 0;
for (let i = 0; i < 4242424242; ++i) {
  result += i;
}
console.log(result);
V8 starts executing the bytecode in the Ignition interpreter. At some point the engine determines that the code is "hot" and launches the TurboFan frontend, the part of TurboFan that processes profiling data and builds a basic machine representation of the code. This is then handed to the TurboFan optimizer, which runs on a separate thread, for further improvement.

Hot code optimization in V8

While the optimizer is running, V8 continues executing the bytecode in Ignition. When the optimizer finishes, we have executable machine code, which is used from that point on.
The SpiderMonkey engine also starts by executing bytecode in the interpreter. But its extra tier, the Baseline compiler, means that "hot" code is first handed to this compiler. It generates baseline code on the main thread, and execution switches to that code as soon as it is ready.

Optimization of "hot" code in SpiderMonkey

If the baseline code runs long enough, SpiderMonkey eventually launches the IonMonkey frontend and optimizer, which works very much like what happens in V8. The baseline code keeps running while IonMonkey performs its optimizations. When the optimization completes, the optimized code is executed instead of the baseline code.
The architecture of the Chakra engine is very similar to that of SpiderMonkey, but Chakra aims for more parallelism in order to avoid blocking the main thread. Instead of doing compilation work on the main thread, Chakra copies the bytecode and the profiling data the compiler is likely to need, and sends them to a dedicated compiler process.

Optimization of "hot" code in Chakra

When the code produced by SimpleJIT is ready, the engine executes it instead of the bytecode. The same process is repeated to move on to code produced by FullJIT. The advantage of this approach is that the pauses spent copying data are usually much shorter than those caused by running a full compiler (frontend) on the main thread. The disadvantage is that the copying heuristics may miss information that would be useful for certain optimizations. Here we see another trade-off, between code quality and latency.
In JavaScriptCore, all tasks of optimizing compilation are performed in parallel with the main thread responsible for executing JavaScript code. At the same time there is no copying stage. Instead, the main thread simply invokes compilation tasks on another thread. The compiler then uses a complex locking scheme to access profiling data from the main thread.
Optimizing hot code in JavaScriptCore

The advantage of this approach is that the main thread is hardly ever blocked by code-optimization work. The disadvantages are that it requires solving complex multithreaded data-processing problems and paying the cost of locking for various operations.
We have just discussed the trade-offs engines make between generating code quickly with an interpreter and generating fast code with an optimizing compiler. But this is not the only trade-off engines face. Memory is another resource whose use involves compromises. To demonstrate this, consider a simple JS program that adds numbers.
function add(x, y) {
  return x + y;
}

add(1, 2);
Here is the bytecode for the add function, generated by V8's Ignition interpreter:

StackCheck
Ldar a1
Add a0, [0]
Return
We need not go into the meaning of this bytecode; its exact contents are not important here. What matters is that it consists of just four instructions.
When this code snippet becomes "hot," TurboFan kicks in and generates the following highly optimized machine code:
leaq rcx,[rip+0x0]
movq rcx,[rcx-0x37]
testb [rcx+0xf],0x1
jnz CompileLazyDeoptimizedCode
push rbp
movq rbp,rsp
push rsi
push rdi
cmpq rsp,[r13+0xe88]
jna StackOverflow
movq rax,[rbp+0x18]
test al,0x1
jnz Deoptimize
movq rbx,[rbp+0x10]
testb rbx,0x1
jnz Deoptimize
movq rdx,rbx
shrq rdx, 32
movq rcx,rax
shrq rcx, 32
addl rdx,rcx
jo Deoptimize
shlq rdx, 32
movq rax,rdx
movq rsp,rbp
pop rbp
ret 0x18
As you can see, compared with the four instructions above, this is a lot of code. As a rule, bytecode is far more compact than machine code, and especially than optimized machine code. On the other hand, bytecode needs an interpreter to run, while optimized code can be executed directly by the processor.
This is one of the main reasons why JavaScript engines do not optimize absolutely all code. As we saw earlier, creating optimized machine code takes a lot of time, and, moreover, as we just found out, it takes more memory to store optimized machine code.
Memory usage and optimization level

In summary, the reason JS engines have different optimization tiers is the fundamental trade-off between generating code quickly, for example with an interpreter, and generating fast code with an optimizing compiler. The more optimization tiers an engine has, the more finely it can tune the code, but this comes at the cost of engine complexity and additional load on the system. On top of that, the optimization level affects how much memory the code occupies. That is why JS engines try to optimize only "hot" functions.
Optimize access to object prototype properties
JavaScript engines optimize access to object properties by using so-called object shapes (Shape) and inline caches (Inline Cache, IC). The details can be found in this material; in a nutshell, the engine stores the shape of an object separately from the object's values.

Objects of the same shape

Object shapes enable an optimization called inline caching. Together, object shapes and inline caches speed up repeated property accesses performed from the same place in the code.
Accelerating access to an object property

Classes and prototypes
Now that we know how engines speed up access to object properties in JavaScript, let's take a look at a relatively recent JavaScript addition: classes. Here is a class declaration:
class Bar {
  constructor(x) {
    this.x = x;
  }
  getX() {
    return this.x;
  }
}
Although this may look like an entirely new concept in JS, classes are merely syntactic sugar over the prototype-based object system that has always been part of JavaScript:
function Bar(x) {
  this.x = x;
}

Bar.prototype.getX = function getX() {
  return this.x;
};
Here we store a function in the getX property of the Bar.prototype object. This works exactly like creating a property on any other object, because prototypes in JavaScript are themselves objects. In prototype-based languages such as JavaScript, methods shared by all objects of a given kind live on the prototype, while the fields of individual objects live on the instances.
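This split can be observed directly: the field x is an own property of each instance, while getX lives on the shared prototype. A small sketch (using the Bar class from the snippets above), runnable in Node:

```javascript
// Bar stores the field `x` on each instance and the method
// `getX` on the shared prototype, as described above.
class Bar {
  constructor(x) {
    this.x = x;
  }
  getX() {
    return this.x;
  }
}

const a = new Bar(1);
const b = new Bar(2);

// `x` is an own property of each instance…
console.log(a.hasOwnProperty('x'));                // true
// …while `getX` is not: it is found on the prototype.
console.log(a.hasOwnProperty('getX'));             // false
console.log(Bar.prototype.hasOwnProperty('getX')); // true
// Both instances share one and the same function object.
console.log(a.getX === b.getX);                    // true
```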
Let's look at what happens, so to speak, behind the scenes when we create a new instance of Bar and assign it to the constant foo.

const foo = new Bar(true);
After this code runs, the newly created instance has a shape containing the single property x. The prototype of the foo object is Bar.prototype, which belongs to the class Bar.

An object and its prototype

Bar.prototype has its own shape, containing the single property getX, whose value is a function that returns this.x when called. The prototype of Bar.prototype is Object.prototype, which is part of the language. Object.prototype is the root of the prototype tree, so its prototype is null.
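This chain can be inspected with Object.getPrototypeOf (a sketch using the Bar class from above):

```javascript
class Bar {
  constructor(x) { this.x = x; }
  getX() { return this.x; }
}

const foo = new Bar(true);

// foo → Bar.prototype → Object.prototype → null
console.log(Object.getPrototypeOf(foo) === Bar.prototype);              // true
console.log(Object.getPrototypeOf(Bar.prototype) === Object.prototype); // true
// Object.prototype is the root of the chain, so its prototype is null.
console.log(Object.getPrototypeOf(Object.prototype) === null);          // true
```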
Now let's see what happens if we create another Bar object:

const qux = new Bar(false);

Several objects of the same shape

As you can see, both foo and qux, being instances of Bar, use the same object shape, as discussed earlier. They also share the same prototype: the Bar.prototype object.
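Prototype sharing is easy to verify: both instances point at the same prototype object, so a change to it is visible on both. A sketch (negX is a hypothetical method added only for illustration):

```javascript
class Bar {
  constructor(x) { this.x = x; }
  getX() { return this.x; }
}

const foo = new Bar(true);
const qux = new Bar(false);

// Both instances point at one and the same prototype object…
console.log(Object.getPrototypeOf(foo) === Object.getPrototypeOf(qux)); // true

// …so a method added to Bar.prototype is instantly visible on both.
Bar.prototype.negX = function () { return !this.x; };
console.log(foo.negX()); // false
console.log(qux.negX()); // true
```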
Access to prototype properties
So now we know what happens when we declare a new class and create instances of it. But what about calling a method on an object? Consider the following code snippet:
class Bar {
  constructor(x) {
    this.x = x;
  }
  getX() {
    return this.x;
  }
}

const foo = new Bar(true);
const x = foo.getX();
The method call can be thought of as an operation consisting of two steps:

const x = foo.getX();

// is roughly equivalent to:

const $getX = foo.getX;
const x = $getX.call(foo);
The first step is loading the method, which is just a property of the prototype (a property whose value happens to be a function). The second step is calling the function with this set to the object. Let's consider the first step, in which the getX method is loaded from the foo object:
Loading the getX method from the foo object

The engine inspects the foo object and finds that there is no getX property in foo's shape. This means the engine has to walk the object's prototype chain to find the method. The engine goes to Bar.prototype and looks at the shape of this prototype object. There it finds the desired property at offset 0. Next, it reads the value stored at this offset in Bar.prototype and finds the JSFunction getX, which is exactly what we were looking for. The method lookup is complete.
The flexibility of JavaScript makes it possible to change prototype chains. For example:

const foo = new Bar(true);
foo.getX(); // true

Object.setPrototypeOf(foo, null);
foo.getX(); // Uncaught TypeError: foo.getX is not a function
In this example we call the foo.getX() method twice, but the two calls have completely different meanings and results. That is why, even though prototypes in JavaScript are just objects, speeding up access to prototype properties is even harder for JS engines than speeding up access to an object's own properties.
Looking at real-world programs, loading prototype properties turns out to be a very frequent operation: it happens every time a method is called.
class Bar {
  constructor(x) {
    this.x = x;
  }
  getX() {
    return this.x;
  }
}

const foo = new Bar(true);
const x = foo.getX();
Earlier we talked about how engines optimize loads of ordinary, own properties of objects using object shapes and inline caches. How can we optimize repeated loads of a prototype property on objects of the same shape? We saw above how such properties are loaded.

Loading the getX method from the foo object

To speed up repeated access to the method, in our case we need to know the following:
- The shape of the foo object does not contain getX and does not change. This means that nobody modifies foo by adding or removing properties or by changing property attributes.
- The prototype of foo is still the original Bar.prototype. This means that nobody changes foo's prototype with Object.setPrototypeOf() or by assigning to the special __proto__ property.
- The shape of Bar.prototype contains getX and does not change. That is, nobody modifies Bar.prototype by removing or adding properties or changing their attributes.
In general, this means performing 1 check on the object itself, plus 2 checks for each prototype up to the one holding the property we are looking for. That is 1 + 2N checks (where N is the number of prototypes involved), which does not look too bad here, since the prototype chain is quite short. However, engines often have to deal with much longer prototype chains. This is typical of ordinary DOM elements, for example. Here is an example:
const anchor = document.createElement('a');
Here we have an HTMLAnchorElement, and we call its getAttribute() method. The prototype chain of this simple element, which represents an HTML link, includes six prototypes! Most of the interesting DOM methods are not on HTMLAnchorElement's own prototype; they live in prototypes further up the chain.
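In a browser, this chain can be listed by walking Object.getPrototypeOf from the element upwards. Here is a sketch that performs the same walk on an ordinary class hierarchy, since the DOM itself is not available outside the browser (the *Like classes are stand-ins, not real DOM interfaces):

```javascript
// Walk a prototype chain and collect the constructor names along it.
function chainOf(obj) {
  const names = [];
  let p = Object.getPrototypeOf(obj);
  while (p !== null) {
    // Use the constructor name where available, for readability.
    names.push(p.constructor ? p.constructor.name : '(anonymous)');
    p = Object.getPrototypeOf(p);
  }
  return names;
}

// A small hierarchy standing in for EventTarget → Node → Element → …:
class EventTargetLike {}
class NodeLike extends EventTargetLike {}
class ElementLike extends NodeLike {}
class HTMLAnchorElementLike extends ElementLike {}

console.log(chainOf(new HTMLAnchorElementLike()));
// → ['HTMLAnchorElementLike', 'ElementLike', 'NodeLike',
//    'EventTargetLike', 'Object']
```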
Prototype chain

The getAttribute() method is found on Element.prototype. This means that every time anchor.getAttribute() is called, the engine has to do the following:
- Check that the anchor object itself has no getAttribute property.
- Establish that the object's direct prototype is HTMLAnchorElement.prototype.
- Establish that HTMLAnchorElement.prototype has no getAttribute method.
- Establish that the next prototype is HTMLElement.prototype.
- Establish that the necessary method is absent here as well.
- Finally, establish that the next prototype is Element.prototype and find the getAttribute method there.
As you can see, 7 checks are performed here. Since such code is extremely common on the web, engines apply optimizations to reduce the number of checks needed to load prototype properties.
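The 1 + 2N count can be made concrete with a small counter that tallies one check for the receiver and two checks per prototype visited. This is a rough model of the process described above, not actual engine code:

```javascript
// Rough model of the 1 + 2N check count for a prototype property load:
// one check on the receiver's shape, then two checks per prototype
// (one to identify the prototype, one to probe its shape).
function countChecks(obj, key) {
  let checks = 1; // check the receiver's own shape
  if (Object.prototype.hasOwnProperty.call(obj, key)) return checks;
  let p = Object.getPrototypeOf(obj);
  while (p !== null) {
    checks += 2; // identify the prototype + probe its shape
    if (Object.prototype.hasOwnProperty.call(p, key)) return checks;
    p = Object.getPrototypeOf(p);
  }
  return checks;
}

// Stand-in hierarchy; `getAttributeLike` is a hypothetical method
// placed three prototypes up, mirroring the anchor example.
class EventTargetLike { getAttributeLike() { return null; } }
class NodeLike extends EventTargetLike {}
class ElementLike extends NodeLike {}

// Three prototype hops before the method is found: 1 + 2×3 = 7 checks.
console.log(countChecks(new ElementLike(), 'getAttributeLike')); // 7
```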
Going back to one of the earlier examples, recall that when accessing the getX method of the foo object, we perform 3 checks:

class Bar {
  constructor(x) {
    this.x = x;
  }
  getX() {
    return this.x;
  }
}

const foo = new Bar(true);
const $getX = foo.getX;
For every object in the prototype chain, up to the one that holds the desired property, we have to check its shape just to establish that the property is absent there. It would be nice to reduce the number of checks by folding the prototype check into the shape check. And that is essentially what engines do, with one simple trick: instead of storing the prototype link on the instance itself, the engine stores it on the shape.
Storing references to prototypes

Each shape holds a link to the prototype. This also means that whenever the prototype of foo changes, the engine transitions the object to a new shape. Now checking an object's shape both tells us which properties it has and guards its prototype link.

With this approach the number of checks drops from 1 + 2N to 1 + N, which speeds up access to prototype properties. But this is still fairly expensive, since the cost grows linearly with the length of the prototype chain. Engines therefore implement additional mechanisms to make the number of checks constant, independent of chain length, especially when the same property is loaded many times.
ValidityCell property
V8 treats prototype shapes specially for exactly this purpose. Each prototype has a unique shape, not shared with any other object (in particular, not with other prototypes), and each prototype shape has a ValidityCell associated with it.

The ValidityCell

This cell is invalidated whenever the associated prototype, or any prototype above it, changes. Let's look at this mechanism in more detail.
To speed up repeated property loads from prototypes, V8 uses an inline cache with four fields: ValidityCell, Prototype, Shape, Offset.
Inline cache fields

When the inline cache "warms up" during the first run of the code, V8 remembers the offset at which the property was found in the prototype, the prototype on which it was found (Bar.prototype in this example), the shape of the object (the shape of foo in this case), and also a reference to the ValidityCell of the immediate prototype linked from the object's shape (here that is also Bar.prototype).
The next time the inline cache is hit, the engine only needs to check the object's shape and the ValidityCell. If the ValidityCell is still valid, the engine can use the previously recorded offset directly, with no additional lookups.
When the prototype changes, a new shape is allocated and the previous ValidityCell is invalidated. As a result, the inline cache misses on its next use, which leads to worse performance.
Implications of a prototype change

Returning to the DOM example, this means that any change to, say, Object.prototype invalidates inline caches not only for Object.prototype itself, but for every prototype below it in the chain, including EventTarget.prototype, Node.prototype, Element.prototype, and so on, all the way down to HTMLAnchorElement.prototype.
Implications of changing Object.prototype

In effect, modifying Object.prototype while your code is running does serious harm to performance. Do not do this.
Let's explore this with an example. Suppose we have a class Bar, and a function loadX that calls a method on objects created from Bar. We call loadX several times, passing it instances of the same class.

function loadX(bar) {
  return bar.getX();
  // The IC for 'getX' is trained on `Bar` instances here.
}

loadX(new Bar(true));
loadX(new Bar(false));
// The IC in `loadX` now links to the `ValidityCell`
// for `Bar.prototype`.

Object.prototype.newMethod = y => y;
// The `ValidityCell` in the `loadX` IC is now invalid,
// because `Object.prototype` changed.
The inline cache in loadX points at the ValidityCell for Bar.prototype. But we then mutated Object.prototype, the root of all prototypes in JavaScript, and that invalidates the ValidityCells of all prototype shapes below it, including the one our inline cache relies on. So every prototype inline cache in the program misses the next time it is hit.

Since Object.prototype sits at the top of every prototype chain, mutating it invalidates all inline caches for prototype loads that the engine has set up so far. Here is another example:
Object.prototype.foo = function() { /* … */ };
// Run critical code:
someObject.foo();
// End of critical code.
delete Object.prototype.foo;

Here we extend Object.prototype, which invalidates every prototype inline cache the engine has built up to that point. Then we run some code that uses the new prototype method; the engine has to start from scratch and build fresh inline caches for all prototype property accesses. Finally, the program "cleans up after itself" and removes the method it added. Cleaning up sounds like a good idea, right? In this case it actually makes things worse: deleting the property modifies Object.prototype yet again, so all the inline caches are invalidated once more and the engine has to start over.
In general, avoid mutating prototypes after other code has started running, and leave Object.prototype alone in particular: a change to it affects inline caches across the entire program.

Although prototypes are just objects, JS engines treat them specially in order to optimize the performance of method lookups on prototypes. So leave your prototypes alone! If you really must touch them, do it before any other code runs; that way you at least will not invalidate the engine's optimizations while your program is working.
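The "extend first, then run" advice can be sketched as follows. The describe method below is a hypothetical helper invented purely for illustration, not an API:

```javascript
// Good pattern: mutate prototypes once, up front, before the hot
// code runs, so inline caches built afterwards stay valid.
// `describe` is a hypothetical example method, for illustration only.
if (typeof Object.prototype.describe !== 'function') {
  Object.defineProperty(Object.prototype, 'describe', {
    value: function () {
      return `object with ${Object.keys(this).length} key(s)`;
    },
    writable: true,
    configurable: true,
    enumerable: false, // keep it out of for-in loops
  });
}

// All prototype inline caches built from here on remain valid,
// because the prototype is not mutated again.
const point = { x: 1, y: 2 };
console.log(point.describe()); // 'object with 2 key(s)'
```

Note that defineProperty with enumerable: false is used so the added method does not leak into for-in loops, which is the least invasive way to extend a prototype if you absolutely must.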
Results
In this article we looked at why JS engines have several code-optimization tiers, and at how they speed up access to prototype properties using object shapes, inline caches, and the ValidityCell mechanism. From all this we derived a practical piece of JavaScript advice: don't mess with prototypes (and if you really, really must, do it before any other code runs).
Dear readers! Do you take the internals of JavaScript engines into account when optimizing your JS code?
