Implementing V8 Sorted by Google

Hi, Habr.

The world of javascript is developing at an incredible speed: new language standards, new frameworks, both in the browser and on the server and in desktop applications and so on ... But sometimes you want to dive into some more basic topic instead of learning new super-features. And dive deep into the source code.

And at this moment under my scrutiny there was an imperceptible line "native code", which somehow appears before the eyes of any JS developer in the Chrome or Node.js console:
')

[].sort.toString(); "function sort() { [native code] }"

So, who cares, what sorting implementation is hidden in V8 behind the [native code] inscription - welcome under cat.

For the study we will use the v8-git-mirror repository on github. Hereinafter I will refer to the code from version 4.6.9, which was relevant at the time of this writing.

The V8 repository is great, and I decided to start a search with a simple find command:

 find ./src -iname "*array*" ./src/array-iterator.js ./src/array.js ./src/arraybuffer.js ./src/harmony-array-includes.js ./src/harmony-array.js ...

The names of these files seem to be familiar. It seems we need exactly array.js . What is its content?

In general, this is an ordinary code for js, with the exception of various macros and define-s, which are written in UPPERCASE and defined in the macros.py file, and function calls defined already somewhere in the depths of C ++ code, starting with% or $. But you should not be afraid, everything that is not js-noye has a quite clear assignment based on the name, for example,% _CallFunction obviously calls a function, etc. So let's continue searching for the sorting code.

Let's start from the end, from the place where the sorting method is added to Array.prototype:

 // Set up non-enumerable functions of the Array.prototype object and // set their names. // ... utils.InstallFunctions(GlobalArray.prototype, DONT_ENUM, [ // ... "sort", getFunction("sort", ArraySort), "filter", getFunction("filter", ArrayFilter, 1), "forEach", getFunction("forEach", ArrayForEach, 1), // ... ]);

Yeah, we need the ArraySort function. It is defined in the same file:

 function ArraySort(comparefn) { CHECK_OBJECT_COERCIBLE(this, "Array.prototype.sort"); var array = $toObject(this); var length = TO_UINT32(array.length); return %_CallFunction(array, length, comparefn, InnerArraySort); }

It checks whether this can be converted to an object (in accordance with the specification ), and if not, a TypeError exception will be thrown. This is then converted to an object, the length is converted to uint32 and called InnerArraySort . In these 281 lines and implemented sorting in V8.

As many guess, this quick quicksort is one of the most common unstable internal sorting algorithms with medium complexity O (nlog n). There was a quick sort in V8 not immediately, but on September 25, 2008 , replacing the heapsort , to improve performance.
Not everywhere the choice of sorting stopped on quicksort; for example, SpiderMonkey (js Firefox engine) uses merge sort ( merge sort ) - source .

In order to study the implementation nuances further, it would be nice to remember what the quick sorting algorithm step is (simplified):

Select the pivot element from the array;
We reorganize the array, we reduce it to the form: the elements are smaller pivot, pivot, the elements are large pivot;
We repeat recursively for the left and right parts of the array.

Over time, the implementation of sorting inside V8 has acquired many different optimizations. First of all, optimization is associated with the method of selecting the support element, tail recursion, and the algorithm of operation on arrays of small size. Let's look at them in order of appearance.

The first refinement was the addition of sorting inserts ( Insertion sort ) for arrays of small size. Here is its source code :

 var InsertionSort = function InsertionSort(a, from, to) { for (var i = from + 1; i < to; i++) { var element = a[i]; for (var j = i - 1; j >= from; j--) { var tmp = a[j]; var order = %_CallFunction(UNDEFINED, tmp, element, comparefn); if (order > 0) { a[j + 1] = tmp; } else { break; } } a[j + 1] = element; } };

Developers for a long time could not come to an agreement on which array to be considered “small”. Many commits over the past 7 years have reduced the barrier (it all started with 22 elements ), and at the moment they have converged on a length of 10 elements .

Now about the method of selecting the reference element, which has also undergone a number of changes.

The first implementation used a random index selection for the reference element ( source ):

 var pivot_index = $floor($random() * (to - from)) + from;

Then, the median of the three (first, last, and middle element) was changed to choose.

But in the end, a simple choice of the median of three was supplemented with a very unusual method of determining the average element for the median over large arrays ( over 1000 elements ). Elements are selected from the initial array (in increments of a little more than 200) and written to a new array. Then this array is sorted, and its middle element becomes the middle element for finding the median. This function is called GetThirdIndex :

 var GetThirdIndex = function(a, from, to) { var t_array = []; // Use both 'from' and 'to' to determine the pivot candidates. var increment = 200 + ((to - from) & 15); for (var i = from + 1, j = 0; i < to - 1; i += increment, j++) { t_array[j] = [i, a[i]]; } %_CallFunction(t_array, function(a, b) { return %_CallFunction(UNDEFINED, a[1], b[1], comparefn); }, ArraySort); var third_index = t_array[t_array.length >> 1][0]; return third_index; }

And finally, about recursion. The problem of the basic algorithm with two recursive calls (on the left and right parts of the array) is that if the selection of the reference element is unsuccessful, the depth of the recursion will be commensurate with the size of the array. This not only significantly impairs the execution speed, but also translates into a memory O (n) estimate for the worst case. In the sorting implementation for V8, this problem was also eliminated : the sorting is called recursively only for the smaller subarray, and the sorting of the larger continues in the main loop. This optimization ensures an estimate of the memory consumption O (log n). Usually, the compiler can perform such tail recursion optimization, and in our case it had to be done manually with JS.

Here is such a quick sort now used in V8. But the story about sorting in the js engine would be incomplete without some points that are easier to show with examples.

Let's start with the well-known. What happens if you do not pass the function comparator:

 var arr = [1, 2, 11]; arr.sort(); [1, 11, 2]

In this case, the default comparator is used , which sorts in the lexicographical order:

 comparefn = function (x, y) { if (x === y) return 0; if (%_IsSmi(x) && %_IsSmi(y)) { return %SmiLexicographicCompare(x, y); } x = $toString(x); y = $toString(y); if (x == y) return 0; else return x < y ? -1 : 1; };

The next example is a mystery:

 var arr = [2, undefined, 1]; arr.sort(function () { return 0; });

Answer

 [2, 1, undefined]

Wat? But there is logic. This is because, before sorting, the array is reorganized by the SafeRemoveArrayHoles function, namely: all significant elements replace the undefined at the beginning of the array, thereby all the undefined are accumulated at the end.

And finally, perhaps the most fun. Array.prototype.sort not only sorts arrays and objects like arrays. Even values from the prototype can take part in sorting. All missing properties (from zero to the value of the length property) will be copied (the CopyFromPrototype function) from the prototype into the object itself, and then sorted:

 function Arr() { this[1] = 1; } Arr.prototype[0] = 2; Arr.prototype.length = 2 var arr = new Arr(); Array.prototype.sort.call(arr, function (a, b) { return a - b; }); Arr {0: 1, 1: 2}

This description of the sorting can be completed. Now we know everything that is actually behind the inscription [native code], and we can draw a number of useful conclusions:

V8 write and improve ordinary people, the same as we are with you. The same piece of code corresponded and supplemented over 7 years: not all optimizations were applied immediately, and perhaps this implementation can be somehow accelerated or rewritten again :)
No matter how they improve the search methods for the reference element, this procedure still depends on the input data. The case of the “unsuccessful” selection of the support element and the subsequent degradation of performance are possible. In general, it is worth looking at your input data; it may be more profitable for them to use, for example, sorting by counting ;
Quick sorting is intermittent ( Waxer example ). But if you need sustainability, you can do it in different ways. You can supplement the elements of the array with their initial index and change the comparator somewhat to take it into account. Or, implement some sortable sorting in js, for example, Timsort . But this is all beyond the scope of this article.

I hope you were just as interested to read this study as I do it. Good luck in learning the source!

Source: https://habr.com/ru/post/265079/

All Articles

Implementing V8 Sorted by Google

More articles: