Algorithms and solutions when developing the C # JavaScript engine

Hello, dear habrovchane!

A little less than a year ago, I also, in the sandbox, published an article about the beginning of the development of the JavaScript engine in C # . A year has passed since the creation of the project and I am pleased to present you the first version of this creation, which can be downloaded on nuget .
But in this article I will not promote, make comparisons with competitors, measure performance and so on. Here I will write about what I had to go through, what kind of blood it all was and what I had to face.

Typing conflict

This problem was one of the first. And it follows from the fact that at the very beginning the goal was set to ensure easy interaction with the .NET classes. As a result, all built-in types of the standard JavaScript library are implemented by the same types in C #.
Everyone knows that in JavaScript you can call any function in the context of any object. Some of these cases are even described in the specification. For example, what happens if the arguments object that stores the function call arguments and their number is sent to some function from Array.prototype ? That's right, it will work as with a regular array and return the correct result. And how to implement it in C # ?! It turned out there is such a way:
')

Through Reflection, get the MethodInfo of the method you want to call.
Get MethodHandle.GetFunctionPointer ()
Via Activator, create a delegate with the arguments of the types used in the original function, but add the first argument of the object type.

Now we can call this delegate, passing the first argument an object of any reference type.

As you can see, there is no magic here, standard documented functions are used. But to what comes after that, the creators of the platform, obviously, did not prepare - Visual Studio cannot decide on the type and does not even try to call ToString during debugging, and the is operator returns true when checking for the type that declared the method, even if object inherit from object only makes them related. Saves GetType , which does not cheat and returns a real type.
Understanding the danger of such an operation, I added the AllowUnsafeCall attribute, the designer of which requires Ian to indicate an alternative type that I’m ready to receive inside the method marked by it (the heirs of the specified type will also come).

Sparse Array (Sparse Array)

JavaScript arrays have a wonderful property - to store elements with indices 0, 1, 2 and 10005000, not to occupy several gigabytes of memory, and even to select existing indexes in ascending order when iterating, which is not characteristic of hash tables that you can think about. In search of a suitable data structure that has these properties, and even fast, a lot of books and websites were crammed. At some point, this whole mess of information degenerated into a “prefix binary search tree with processor cache optimization”. It sounds loud and long, but in reality the implementation of this structure takes a little more than 100 lines of code.
Three fields are created for storing objects: one array for the “tree node”, the second array for the values themselves, and an integer number of selected elements.
Each node has 4 fields:

Key to access. Unsigned integer 32 bits
The index of the first child (or zero, which is more appropriate in this case)
Index of the second descendant
Index bit. Created for optimization

By default, we assume that the object with index 0 has already been added to the array.
Addition:

Let i = 31;
If the key is equal to the key of the current element, write the added value to the array of values by the index of the current element and exit.
If the key of the current item is greater than the added key, write the added key to the current item, add the value to the array of values, then call the recursively with the previous key and value, and then exit.
Take the i-th bit of the key. If it is 0, make the current element of the first child, otherwise the second child, reduce i by 1. If the descendant is an element with index 0, add a new element, assign it to the previous child, descendants of a new element element with index 0, make the key equal to the key to be added , the bit index is i and write the value. Otherwise, go to step 2.

Receiving:
Here, without formalization, since it is the same as adding, with the only difference that there is no step 3, in the second step we return the value, not write it, and in the absence of the required descendant, the element processes the absence of the requested value, but do not add a new one.

During the experiments, it turned out that this structure (and this is pure binary tree), works not much slower than the hash table. At the same time, there is one feature: if before the formal execution of the algorithm for adding or receiving a “step” into a random element, then there is a chance that it will lead us to the desired element in fewer steps (here the value of the bit index, which was processed during add this item stored in box 4). Another nice feature was that by successively adding elements, they are stored in indices that correspond exactly to their keys, which reduces the search time to O (1).
Here is the structure turned out. When I implemented it, the time for passing SunSpider 0.9.1 decreased by 10%. In my opinion, this is not bad.

Rope String (Rope string)

On Habré already wrote about this , so here I will write only the idea. When string concatenation occurs in .NET, a new string is created, the value in which is first copied from the first line, then from the second. On the face of O (n + m). Rope String is an object that stores references to source objects, not copying their value. In essence, it is a promise that these two lines, if required, will be one line. It turns out O (1). Of course, if we call ToString for each such object, which, in this case, takes O (n + m), we won’t win, but when they create Rope from Rope and Rope from Rope ... we can combine them in linear time using stringbuilder. When I added this solution, the time for all the same SunSpider'e decreased by 5 times.

Type prediction (Prediction type)

This optimization is specific to the dynamic nature of JavaScript. Again an example:

var a = 1; var b = 2; var c = a + b;

Which of the many "+" implementations should be called in the third line? Well, of course, the addition of numbers is not worth thinking here. But what is obvious to man is not always clear to the car. And therefore, when the execution reaches the third row, the virtual machine should think and “walk around” in the huge comparison table. Type prediction is an attempt to teach a computer to analyze operations a bit closer to how a person does it. Therefore, in the third line, the addition operator of numbers will be called, which will only follow, and if we did not make a mistake in the prediction and will call the standard implementation, if this, all the same, has happened. Now this optimization is still in the process of integration, but already with the replacement of the operators "+", "<", ">", "<=", "> =" the effect is well noticeable.
Separately, I mention once again about concatenation. Now this effect is overlapped by Rope String, but sometime, replacing

 "abc" + 1234 + 567

On the multiple line merge operator (with one pass through all elements), I received double acceleration.

In conclusion, a little about the project. The engine has remained interpretive. There have been attempts to add JIT using System.Linq.Expressions, but for some reason this decision only led to a slowdown, therefore this direction is abandoned.

Source: https://habr.com/ru/post/243399/

All Articles

Algorithms and solutions when developing the C # JavaScript engine

More articles: