
280 squiggles, or the explosive power of regular expressions

Like probably every JavaScript programmer who is just starting out (this was two years ago), I wanted to do everything with my own hands. That is how a terrifying but very fast regular expression of 280 characters came about.

A bit of history


About a year and a half ago I learned about the yass library, which at the time was the fastest tool for finding DOM elements in JavaScript by CSS selector (link to tests).
That sparked a keen interest: I wanted to come up with something even faster. I was just then reading the second edition of J. Friedl's book "Regular Expressions" (from the "Programmer's Library" series). And so... It was summer, I was still a student, and I had plenty of time. Work began in earnest.

What is all the fuss about?


I decided to write this article precisely because of the following expression, which can almost completely parse a CSS selector query (even a slightly extended one that goes beyond the CSS3 standard):
/(?:(?:\s*[+>~,]\s*|\s+)|[^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*)|\[(?:[^\\[\]]*(?:\\.[^\\[\]]*)*|[^=]+=~?\s*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*'))\]|:[^\\:([]+(?:\\.[^\\:([]*)*(?:\((?:[^\\()]*(?:\\.[^\\()]*)*|"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')\))?/g


Let's be friends


I have to say that no normal person can make sense of the line above in this form! Not counting myself among the normal, I built a regular expression parser in JavaScript in order to write it. It ended up as a simple form: a regular expression in one field, the search string in a second, the result in a third, plus several checkboxes.
Let's write the expression in a readable form using the "x" modifier (I implemented an emulation of its essentials in JavaScript).
(?:
    (?:\s*[+>~,]\s*|\s+)
    |
    [^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*
)
|
\[(?:
    [^\\[\]]*(?:\\.[^\\[\]]*)*
    |
    [^=]+=~?\s*
    (?:
        "[^\\"]*(?:\\.[^"\\]*)*"
        |
        '[^\\']*(?:\\.[^'\\]*)*'
    )
)\]
|
:[^\\:([]+(?:\\.[^\\:([]*)*
(?:
    \((?:
        [^\\()]*(?:\\.[^\\()]*)*
        |
        "[^\\"]*(?:\\.[^"\\]*)*"
        |
        '[^\\']*(?:\\.[^'\\]*)*'
    )\)
)?
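
The article does not show how the "x" modifier emulation was implemented, so the following is only a minimal sketch of one way it could work. The helper name verboseRegExp is hypothetical; it simply deletes all whitespace from the pattern source before compiling it, which is safe for this expression only because it contains no literal spaces and no "#" comments. The example compiles the pseudo-selector branch of the expression from its multi-line form.

```javascript
// Hypothetical helper, not the author's implementation: emulates the core of
// the "x" modifier by deleting every whitespace character from the pattern
// source before compiling it. Assumptions: the pattern contains no literal
// spaces that must survive (use \s for those) and no "#" comments.
function verboseRegExp(source, flags) {
  return new RegExp(source.replace(/\s+/g, ""), flags);
}

// The pseudo-selector branch of the expression, laid out over several lines.
// String.raw keeps every backslash exactly as typed.
const pseudo = verboseRegExp(String.raw`
  :[^\\:([]+(?:\\.[^\\:([]*)*
  (?:
    \((?:
      [^\\()]*(?:\\.[^\\()]*)*
      |
      "[^\\"]*(?:\\.[^"\\]*)*"
      |
      '[^\\']*(?:\\.[^'\\]*)*'
    )\)
  )?
`, "g");

console.log("a:not(.b):hover".match(pseudo));
// → [ ':not(.b)', ':hover' ]
```

A real "x" mode would also have to leave escaped whitespace alone and strip comments; this sketch deliberately skips both.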



A bit of theory


To make things clear right away, since I am not writing this for myself or for regular expression gurus: one construction is repeated again and again in this expression, namely opening (normal characters)* (special characters (normal characters)*)* closing. It is an almost universal way to find something between delimiters, for example text between quotation marks where nested quotes are allowed as long as they are escaped. More details can be found in the book mentioned above, in the section "Building Efficient Regular Expressions".
In our case it is applied to text between quotation marks (" and '), between round and square brackets, and between the characters "+", ">", "~", ",", " " and ":".
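
The construction described above can be sketched as a small builder. The function name delimited and its parameters are hypothetical and exist only for illustration; each argument is a regular expression fragment, and the result is the pattern open normal* (special normal*)* close.

```javascript
// Hypothetical builder for: open normal* (special normal*)* close
// Every argument is a regular expression fragment given as a string.
function delimited(open, normal, special, close) {
  return new RegExp(`${open}${normal}*(?:${special}${normal}*)*${close}`, "g");
}

// Double-quoted text: "normal" is anything except a backslash or a quote,
// "special" is a backslash followed by any single character.
const doubleQuoted = delimited('"', String.raw`[^\\"]`, String.raw`\\.`, '"');

// Parenthesised text, built from the same template.
const parenthesised = delimited(
  String.raw`\(`, String.raw`[^\\()]`, String.raw`\\.`, String.raw`\)`
);

console.log(String.raw`x = "say \"hi\"" + "plain"`.match(doubleQuoted));
// two matches: "say \"hi\"" and "plain"

console.log(String.raw`f(a\)b) g(c)`.match(parenthesised));
// two matches: (a\)b) and (c)
```

Because the "normal" and "special" classes never overlap, the engine has only one way to consume each character, which is what keeps this construction fast.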

We analyze


This expression is built on the idea of splitting a CSS selector into parts. I split it like this:

Now let us match all of this against the expression.
The first part looks for "+", ">", ",", "~" or "\s+"; if it finds none of them, it matches everything in between.
The second part handles square brackets. The pattern "[^=]+=~?\s*" is there so that attribute selectors can be searched with arbitrarily complex regular expressions.
The third part matches pseudo-selectors; the parentheses are optional.
Any character can be escaped with a backslash ("\") or placed inside single or double quotes, in which case it is not treated as a control character.
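
To see the three parts at work, here is a minimal sketch that runs the expression over a selector and labels each piece. The regular expression is copied verbatim from the article; the tokenize helper and the type names ("separator", "simple", "attribute", "pseudo") are assumptions made for this illustration and are not part of the original code.

```javascript
// The expression from the article, copied verbatim.
const SELECTOR_TOKEN = /(?:(?:\s*[+>~,]\s*|\s+)|[^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*)|\[(?:[^\\[\]]*(?:\\.[^\\[\]]*)*|[^=]+=~?\s*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*'))\]|:[^\\:([]+(?:\\.[^\\:([]*)*(?:\((?:[^\\()]*(?:\\.[^\\()]*)*|"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')\))?/g;

// Hypothetical helper: splits a selector into tokens and labels each one
// by looking at its shape. The type names are invented for this sketch.
function tokenize(selector) {
  return (selector.match(SELECTOR_TOKEN) || []).map((text) => {
    let type = "simple";
    if (/^\s*[+>~,]\s*$|^\s+$/.test(text)) type = "separator";
    else if (text[0] === "[") type = "attribute";
    else if (text[0] === ":") type = "pseudo";
    return { type, text };
  });
}

for (const { type, text } of tokenize('div.item > a[href="x"]:not(.b)')) {
  console.log(type.padEnd(10), JSON.stringify(text));
}
// simple     "div.item"
// separator  " > "
// simple     "a"
// attribute  "[href=\"x\"]"
// pseudo     ":not(.b)"
```

One thing worth knowing when experimenting: the character class in the pseudo-selector part excludes only the backslash, ":", "(" and "[", so a pseudo-selector without parentheses also swallows whatever follows it. For example, "li:first-child a" yields the tokens "li" and ":first-child a".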

Conclusion


I think it is clear from here how easy it is to write a CSS3 selector parser. Anyone interested in experimenting can go here. I would be very grateful if someone helped to improve the speed or the strictness of the regular expression.
And of course, many thanks to J. Friedl, the author of a series of priceless books on regular expressions.

PS: I apologize for the rough state of the regular expression analyzer. It was created as an intermediate step (it definitely works in Chrome and FF). If something does not work: the handler is attached to the change event, so click a checkbox or simply insert a space into the regular expression field, and it should run.

Source: https://habr.com/ru/post/114156/

