YASS Architecture Part 2: sampling on the CSS selector

The article on modular loading was the first sign in a series of notes in which I am going to tell you step by step on what principles YASS is built on and what were the results of testing for maximum performance of each part of this library. But first things first.

Formulation of the problem

About the simplest: what do we want to achieve? We want, by specifying an arbitrary string of a CSS selector that conforms to the specification , to output an array of all the elements corresponding to this very string. It seems so far simple.
')
To illustrate the specification, the following examples can be given:

  // returns the element with the identifier my_id
 querySelectorAll ('# my_id')
 // returns all elements with class external
 querySelectorAll ('. external')
 // returns all paragraphs on the page
 querySelectorAll ('p')

However, here we can note one thing: very often we need to select just an element by its identifier or find all the elements with a specific class. These operations are quite common in all JavaScript libraries, so they should be performed as quickly as possible. It is extremely imprudent to run the entire mechanism for analyzing the input selector string simply in the case where we need to return a single element specified using an identifier. Here we can use the principle of lazy programming: “do not do what can not be done” - and to accelerate the work sufficiently enough for the simplest cases.

If you look at modern JavaScript libraries, then everywhere such a check is already done using a regular expression. And then immediately a little trick: the initialization of any regular expression is quite expensive in browsers (although the complexity of the expression begins to affect the elapsed time only if it is long), and it would be reasonable to do without it at all. And when you can just use indexOf to check for a substring, you should always use this (in elementary cases).

In the same case, if the regular expression provides correctness and replacing it does not lead to loss of code readability and incomprehensible gain (or even loss) in execution speed, you can replace, for example, exec with a bunch of test and charAt / substr : this will increase performance by about 20%. If this section of code is executed in a loop multiple times, this can be quite significant.

In YASS, this problem is solved as follows:

  // check if the selector satisfies the simple case
 if (/^[\w[:#.[[\w\RL**||=====$/.test (selector)) {
 // in the case of a positive result, initialize the variable
 // which is responsible for '#', '.'  , ':' or '[' at the beginning of the selector
	 var firstLetter = selector.charAt (0);
	 ...
 }

From simple to complex

But let's look at how the overall task of parsing CSS selectors is solved. If we take into account that the selector can be specified in the form p a.link, form input[type=radio] , then the logic of its analysis can be written schematically in the form:

Select the sequence of selectors that are between the commas. Next, we work with each sequence separately. At the output, all sequences are combined into a final array ( sets ).
In the sequence of selectors, we have a set of elementary selectors that are “nested” in each other (for our example, p a.link ). We need to break the sequence into parts and disassemble each such part, given that the parent elements for the next part will be the selected elements from the previous one. For the "transformation" of the child nodes in the parent (directly the process of growing up turns out) is an array of nodes .
Each elementary element will have to be parsed using a regular expression to isolate the parts responsible for the identifier, class, and CSS 2/3 modifiers. Disassemble the fastest with exec , and then write to the variable parts of the resulting array:
```
  single = regexp.exec (single);
 tag = single [1];
 id = single [2];
 ... 
```
And finally, the third cycle goes through all the parent elements and tries to select from them the child nodes corresponding to the parameters set in the CSS selector.

As we can see, the basic logic of this task includes at least one regular expression (using indexOf and substring will be much more resource intensive with such complexity) and 3 cycles (which should be made as fast as possible). Recently there was an article about the fastest methods of sorting elements , I will not list all the possibilities, I will just focus on some aspects.

Brute force array

Let us declare some array a , with the elements of which we perform some actions. We need to iterate over all elements strictly ascending (order is important), i.e. just while(i--) we cannot use. The most common method now is the usual for :

  var j = 0,
	 item,
	 len = a.length;
 for (var j = 0, item = a [j]; item; item = a [j ++]) {
	 item ++;
 }

Naturally, he is 30–40% slower than the following while :

  var j = 0,
	 item,
	 len = a.length;
 while (j <len) {
	 item = a [j ++];
	 item ++;
 }

However, if we need to perform any actions with an array element, then without caching it into a local variable can not do. In this case, the next option with while (through checking the existence of elements with an increment) will be even faster by 5–10%:

  var j = 0,
	 item;
 while (item = a [j ++]) {
	 item ++;
 }

It is obvious that for all three cycles in YASS it is used.

If the order of the elements is absolutely not important to us (for example, you just need to find the right one or return false ), then it will be logical to use the reverse while :

  while (idx--) {
	 sets [idx] .yeasss = null;
 }

It is this code that is used to reset an element of a flag that marks the state "selected." Let's consider why this flag is needed, and why it should be reset.

Uniqueness of elements

One of the main problems in the selection of elements for the CSS-selector is the correctness of the final set. In the case of div p , we may have nested div , and if we just iterate through all the parent elements and combine the resulting children, then the final set will contain duplicates. To avoid this kind of error, we need to somehow mark the elements that we have chosen.

This is a standard task, and it is solved, in principle, also standard: in all libraries, a certain property of DOM nodes is brought up, which is responsible for the “selected” state. This approach has several drawbacks (for example, you need to extend this solution to the case of asynchronous samples of elements and understand that every access to elements of the tree is resource-intensive), but in most cases it allows eliminating non-uniqueness of elements.

Schematically present the work of this flag can be as follows.

  for (child in children) {
	 if (! children [child] .yeasss) {
		 if (last) {
			 children [child] .yeasss = 1;
		 }
		 newNodes = children [child];
	 }
 }

In the process of enumerating the child nodes we need, we check if this descendant has the yeasss flag yeasss (of course, besides this check we have to make sure that we really need this descendant). Next, we set the flag only if we work with the last link in the chain of “children-parents”, and write the child node to the array of new nodes that will become “parents” at the next iteration of the cycle.

It is clearly seen that the yeasss flag will be set only if we work with the last link: it will never be set for intermediate links, therefore we do not extend any restrictions on the parent elements. It is possible that it would be possible to reduce the array of parents at each iteration (in order not to choose the same elements), however, this step will not increase (in the general case, only decrease) the sample performance, therefore this approach is not used.

The flag (described in the previous section) is reset for the final array of elements (after all cycles have been completed) and is necessary both for the correctness of further samples and for preventing memory leaks in IE (advanced properties of elements cause micro leaks in IE6 / 7).

Drawing the line

YASS was created and created to search for and implement the most productive methods for solving a certain range of tasks. It can be used both for educational purposes and for purely practical ones (for example, to compile a set of CSS selectors that are not used on the site — with the help of YASS this is implemented most quickly).

To be continued...

Source: https://habr.com/ru/post/49679/

All Articles