Basics of JavaScript engines: common forms and inline caching. Part 1

Hello friends. At the end of April, we are launching a new course , Information Security . And now we want to share with you the translation of the article, which will certainly be very useful for the course. The original article can be found here .

The article describes the key fundamentals, they are common to all JavaScript engines, and not just the V8 , on which the authors of the engine ( Benedict and Mathias ) are working. As a JavaScript developer, I can say that a deeper understanding of how the JavaScript engine works will help you figure out how to write effective code.

Attention : if you like watching presentations more than reading articles, then watch this video . If not, then skip it and read on.

Pipeline (engine) JavaScript engine
')
It all starts with the fact that you write code in JavaScript. After that, the JavaScript engine processes the source code and presents it as an abstract syntax tree (AST). Based on the AST built, the interpreter can finally get to work and start generating bytecode. Fine! This moment the engine executes JavaScript code.

So that it is executed faster, you can send bytecode to the optimizing compiler along with the profiling data. The optimizing compiler makes certain assumptions based on the profiling data, then it generates highly optimized machine code.

If at some point the assumptions are wrong, the optimizing compiler deoptimizes the code and returns to the interpreter stage.

Interpreter / compiler pipelines in JavaScript engines

And now let's take a closer look at the parts of the pipeline that execute your JavaScript code, namely where the code is interpreted and optimized, and also consider some of the differences between the main JavaScript engines.

At the heart of everything is pipeline, which contains an interpreter and an optimizing compiler. The interpreter quickly generates a non-optimized bytecode, the optimizing compiler, in turn, runs longer, but the output has highly optimized machine code.

Next is the pipeline, which shows exactly how V8 works, the JavaScript engine that is used by Chrome and Node.js.

The interpreter in V8 is called Ignition, it is responsible for the generation and execution of bytecode. It collects profiling data that can be used to speed up execution in the next step while bytecode is being processed. When a function gets hot , for example, if it runs frequently, the generated bytecode and profiling data is transferred to the Turbo Fan (TurboFan), that is, to the optimizing compiler to generate highly optimized machine code based on the profiling data.

For example, the Mozilla JavaScript SpiderMonkey engine, which is used in Firefox and SpiderNode , works a little differently. There is not one but two optimizing compilers in it. The interpreter is optimized to the base compiler (Baseline compiler), which produces in some way optimized code. Together with the profiling data collected during the execution of the code, the IonMonkey compiler can generate a heavily optimized code. If speculative optimization fails, IonMonkey returns to the base code (Baseline code).

Chakra - Microsoft's JavaScript engine, used in Edge and Node-ChakraCore , has a very similar structure and uses two optimizing compilers. The interpreter is optimized in SimpleJIT (where JIT means “Just-In-Time compiler”, which produces some optimized code. Together with profiling data, FullJIT can create even more highly optimized code.

JavaScriptCore (abbreviated as JSC), Apple's JavaScript engine that is used in Safari and React Native, generally has three different optimizing compilers. LLInt is a low-level interpreter, optimized to the basic compiler, which in turn is optimized in the DFG (Data Flow Graph) compiler, and it is already optimized in the FTL (Faster Than Light) compiler.

Why do some engines have more optimizing compilers than others? Here it’s all a matter of compromise. The interpreter can quickly handle bytecode, but bytecode itself is not very efficient. An optimizing compiler, on the other hand, runs a little longer, but produces more efficient machine code. This is a trade-off between fast code generation (interpreter) or some waiting and running code with maximum performance (optimizing compiler). Some engines choose the addition of several optimizing compilers with different characteristics of time and efficiency, which allows to provide the best control over this compromise solution and to understand the cost of additional complication of the internal structure. Another trade-off relates to the use of memory, take a look at this article to get a better understanding of it.

We have just reviewed the main differences between pipelines of interpreters and optimizing compilers for various JavaScript engines. Despite these high-level differences, all JavaScript engines have the same architecture: they all have a parser and some kind of interpreter / compiler pipeline.

JavaScript object model

Let's see what else is common in the JavaScript engines and what methods do they use to speed up access to the properties of JavaScript objects? It turns out that all the main engines do it in a similar way.

The ECMAScript specification defines all objects as dictionaries with string keys mapped to property attributes.

In addition to the [[Value]] , the specification defines the following properties:

[[Writable]] determines whether a property can be reassigned;
[[Enumerable]] determines whether the property is displayed in for-in loops;
[[Configurable]] determines whether the property can be deleted.

The notation [[ ]] looks strange, but this is how the specification describes properties in JavaScript. You can still get these property attributes for any given object and property in JavaScript using the Object.getOwnPropertyDescriptor API:

 const object = { foo: 42 }; Object.getOwnPropertyDescriptor(object, 'foo'); // → { value: 42, writable: true, enumerable: true, configurable: true }

Well, so JavaScript defines objects. What about arrays?

You can imagine for yourself arrays, as special objects. The only difference is that arrays have special handling of indexes. Here the array index is a special term in the ECMAScript specification. In JavaScript, there are limitations on the number of elements in the array - up to 2³² − 1. An array index is any available index from this range, that is, any integer value from 0 to 2³² – 2.

Another difference is that arrays have the magic property length .

 const array = ['a', 'b']; array.length; // → 2 array[2] = 'c'; array.length; // → 3

In this example, the array has a length of 2 at the time of creation. Then we assign another element to index 2 and the length is automatically increased.

JavaScript defines arrays as well as objects. For example, all keys, including array indices, are represented explicitly as strings. The first element of the array is stored under the key '0'.

The length property is just another property that turns out to be non-enumerable and non-configurable.

As soon as an element is added to the array, JavaScript automatically updates the [[Value]] property of the length property.

In general, it can be said that arrays behave similarly to objects.

Property Access Optimization

Now that we know how objects are defined in JavaScript, let's take a look at how JavaScript engines allow you to work with objects efficiently.

In everyday life, access to properties is the most common operation. It's extremely important for the engine to do this quickly.

 const object = { foo: 'bar', baz: 'qux', }; // Here, we're accessing the property `foo` on `object`: doSomething(object.foo); // ^^^^^^^^^^

Forms

In JavaScript programs, a fairly common practice is to assign many objects the same property keys. It is said that such objects have the same shape (shape) .

 const object1 = { x: 1, y: 2 }; const object2 = { x: 3, y: 4 }; // `object1` and `object2` have the same shape.

Also the usual mechanics is access to the property of objects of one form:

 function logX(object) { console.log(object.x); // ^^^^^^^^ } const object1 = { x: 1, y: 2 }; const object2 = { x: 3, y: 4 }; logX(object1); logX(object2);

Knowing this, JavaScript engines can optimize access to the property of an object based on its shape. See how it works.

Suppose we have an object with x and y properties, it uses the data structure dictionary, which we talked about earlier; it contains key strings that point to their respective attributes.

If you are accessing a property, such as object.y, the JavaScript engine searches for a JSObject using the 'y' key, then loads the attributes of the properties that correspond to this request, and finally returns [[Value]] .

But where are these property attributes stored in memory? Should we store them as part of a JSObject? If we do this, we will see more objects of this form later, in this case, a waste of space - to keep a complete dictionary containing the names of properties and attributes in the JSObject itself, since the property names are repeated for all objects of the same form. This causes a lot of duplication and leads to poor memory usage. To optimize the engines keep the shape of the object separately.

This form ( Shape ) contains all property names and attributes, except [[Value]] . Instead, the form contains an offset ( offset ) of the values inside the JSObject, so the JavaScript engine knows where to look for the values. Each JSObject with a common form points to a specific form instance. Now every JSObject has to store only values unique to the object.

The advantage becomes obvious as soon as we have a lot of objects. Their number does not matter, because if they have one form, we save information about the form and property only once.

All JavaScript engines use forms as a means of optimization, but they do not call them directly shapes ( shapes ):

Academic documentation calls them Hidden Classes (similar to classes in JavaScript);
V8 calls them Maps;
Chakra calls them Types;
JavaScriptCore calls them Structures;
SpiderMonkey calls them Shapes.

In this article we will continue to call them shapes .

Transition chains and trees

What happens if you have an object of a certain shape, but do you add a new property to it? How does the JavaScript engine define a new form?

 const object = {}; object.x = 5; object.y = 6;

Forms create the so-called transition chains (transition chains) in the JavaScript engine. Here is an example:

The object initially has no properties; it corresponds with the empty form. The following expression adds the property 'x' with a value of 5 to this object, then the engine goes to the form that contains the property 'x' and the value 5 is added to the JSObject at the first offset of 0. The next line adds the property 'y' , then the engine moves to the next a form that already contains both 'x' and 'y' , and also adds the value 6 to the JSObject at offset 1.

Caution : The sequence in which properties are added affects the shape. For example, {x: 4, y: 5} will lead to a different form than {y: 5, x: 4}.

We do not even need to store the entire property sheet for each form. Instead, each form needs to know only the new property that they are trying to include in it. For example, in such a case, we do not need to store information about the 'x' in the latter form, since it can be found earlier in the chain. For this to work, the form connects with its previous form.

If you write ox in your JavaScript code, JavaScript will search for the 'x' property along the transition chain, until it finds a form that already has the 'x' property in it.

But what happens if it is impossible to create a transitional circuit? For example, what happens if you have two empty objects and add different properties to them?

 const object1 = {}; object1.x = 5; const object2 = {}; object2.y = 6;

In this case, we have a branch, and instead of the transition chain, we get a transition tree:

We create an empty object a and add the property 'x' . As a result, we have a JSObject containing a single value and two forms: empty and a form with a single property 'x' .

The second example starts with the fact that we have an empty object b , but then we add another property 'y' . As a result, here we have two chains of forms, and in the end there are three chains.

Does this mean that we always start with an empty form? Not necessary. The engines apply some optimization of object literals ( object literal ), which already contain properties. Let's say that we add x, starting with an empty object literal, or we have an object literal that already contains x :

 const object1 = {}; object1.x = 5; const object2 = { x: 6 };

In the first example, we start with an empty form and a transition to a chain that also contains x , as well as we saw earlier.

In the case of object2 it makes sense to directly create objects that already have x from the very beginning, rather than starting with an empty object and a transition.

The literal of an object that contains the property 'x' starts with a form that contains 'x' from the very beginning, and the empty form is effectively skipped. This is what (at least) what V8 and SpiderMonkey do. Optimization shortens the transition chain and makes it easier to assemble objects from literals.

A post on Benedict's blog about the amazing polymorphism of applications on React talks about how such subtleties can affect performance.

Next you will see an example of points of a three-dimensional object with the properties 'x' , 'y' , 'z' .

 const point = {}; point.x = 4; point.y = 5; point.z = 6;

As you understood earlier, we create an object with three forms in memory (not counting the empty form). To access the 'x' property of this object, for example, if you write point.x in your program, the JavaScript engine must follow a linked list: starting with the form at the bottom, and then gradually stepping up to the form that has 'x' at the top.

It turns out very slowly, especially if you do it often and with a large number of properties of the object. The time of finding the property is O(n) , that is, it is a linear function that correlates with the number of properties of the object. To speed up searching by properties, JavaScript engines add a ShapeTable data structure. ShapeTable is a dictionary where the keys are mapped in a certain way with the forms and give the desired property.

Wait a second, now we return to the dictionary search ... This is exactly what we started with when we put the forms in the first place! So why do we even care about forms?
The fact is that forms contribute to another optimization, which is called Inline Caches.

We will talk about the concept of inline caches or ICs in the second part of the article, and now we want to invite you to a free open webinar , which is already held on April 9 by a well-known virus analyst and part-time our teacher, Alexander Kolesnikov .

Source: https://habr.com/ru/post/446446/

All Articles

Basics of JavaScript engines: common forms and inline caching. Part 1

More articles: