
Simulating intersection, exclusion, and subtraction in ECMAScript regular expressions with lookaheads

From the translator


This is a translation of a short note written yesterday by Lea Verou. It describes an interesting, though not new, technique for solving everyday tasks.

The information in the article concerns ECMAScript, but it also applies to other RegExp engines (although those may offer a more appropriate solution).

If the examples seem difficult, I recommend playing with them in the console as you read. And I apologize in advance for the scary title.

Article


If you have been using regular expressions for a while, you have probably come across variants of the following tasks: exclusion, intersection, and negation.



Although ECMAScript has a caret (^) for negating a character class, it gives us no way to negate anything more complex.

Likewise, we have a vertical bar (|) meaning "OR", but nothing that means "AND" and nothing that means "EXCEPT". All of these operations are possible on single characters by using character classes, but they do not work on more complex sequences.

Nevertheless, we can imitate all three operations by taking advantage of the fact that lookaheads are zero-width: they do not consume characters and do not advance the match position. We can simply keep matching after a lookahead, and the rest of the pattern is applied to the same substring, because the lookahead itself consumed nothing.
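To make the zero-width behaviour concrete, here is a minimal sketch. The sample string and the variable names are made up for this illustration and do not come from the original note.

```javascript
// A lookahead asserts what follows without consuming it.
const sample = "foobar";

// (?=foo) succeeds at index 0 but consumes nothing,
// so \w+ still starts matching from the very first character.
const withLookahead = /^(?=foo)\w+/.exec(sample);

// A lookahead on its own produces an empty match
// at the position where the assertion held.
const lookaheadOnly = /(?=bar)/.exec(sample);

console.log(withLookahead[0]);    // "foobar"
console.log(lookaheadOnly[0]);    // ""
console.log(lookaheadOnly.index); // 3
```

The empty match in the second case is why every pattern in this article ends with a part that actually consumes text.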

Exclusion


As a simple example, the expression /^(?!cat)\w{3}$/ matches any three-character word other than "cat". This is the simplest form of exclusion.

Here is the solution to the exclusion problem mentioned above: /^(?!\d+[50]0)\d{3}$/. It matches three-digit numbers that do not end in 50 or 00.
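A quick way to check the three-digit expression is to run it over a few inputs. The list of candidates and the variable names below are assumptions chosen for this sketch.

```javascript
// Exclusion: accept what \d{3} matches, unless the negative
// lookahead pattern \d+[50]0 also matches from the start.
const notEndingIn50or00 = /^(?!\d+[50]0)\d{3}$/;

// Hypothetical test inputs.
const candidates = ["123", "150", "200", "505", "1234"];

const accepted = candidates.filter((s) => notEndingIn50or00.test(s));

console.log(accepted); // ["123", "505"]
```

"150" and "200" are rejected by the lookahead, while "1234" passes the lookahead but fails \d{3}$ because it has four digits.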

Intersection


For intersection (AND), we can simply chain several positive lookaheads and consume the string we need with the final pattern. (If we leave only the lookaheads, we still get the right yes/no answer, but the matched text may be wrong.) For example, the solution to the password problem mentioned above is: /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i.
If you want your regular expressions to work in Internet Explorer 8 and below, it is important to be aware that lookahead is buggy in those versions and to adjust your regular expressions accordingly.
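The chaining of positive lookaheads can be sketched as a small password check. The exact rules used here (one digit, one letter, one symbol, at least six characters) and the helper name isAcceptable are assumptions made for this illustration.

```javascript
// Intersection: every lookahead must succeed from the start of the string,
// then .{6,}$ consumes the whole string (six or more characters).
const password = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i;

// Hypothetical helper wrapping the test.
function isAcceptable(candidate) {
  return password.test(candidate);
}

console.log(isAcceptable("abc12!"));  // true
console.log(isAcceptable("abcdef1")); // false: no symbol
console.log(isAcceptable("a1!"));     // false: too short
console.log(isAcceptable("123456_")); // false: no letter
```

Each lookahead starts from the same position, so the order of the three conditions does not affect the result.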

Negation


Negation is the simplest. We just need a negative lookahead and .+ to consume the string that passed the test. For example, the solution to the negation problem mentioned above is /^(?!.*foo).+$/, which matches any non-empty string that does not contain "foo". Admittedly, of the three operations, negation is not the most useful.
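A short sketch of the negation pattern in use. The input strings are invented for this example.

```javascript
// Negation: the negative lookahead rejects any string containing "foo",
// and .+ consumes whatever passed the test.
const withoutFoo = /^(?!.*foo).+$/;

console.log(withoutFoo.test("bar baz"));      // true
console.log(withoutFoo.test("a food truck")); // false: contains "foo"
console.log(withoutFoo.test(""));             // false: .+ needs at least one character
```

If empty strings should also count as "not containing foo", .* can be used in place of .+.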


Conclusion


This technique has some caveats, mostly concerning what ends up being matched. (Make sure that the consuming pattern outside the lookaheads matches the entire string you need.)
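The difference is easy to see by comparing a pattern made only of lookaheads with one that also consumes the string. Both patterns and the input are assumptions picked for this sketch.

```javascript
// Same conditions, with and without a consuming part.
const lookaheadsOnly = /^(?=.*\d)(?=.*[a-z])/i;
const withConsumer = /^(?=.*\d)(?=.*[a-z]).+$/i;

const input = "abc123";

// Both agree on whether the string qualifies...
console.log(lookaheadsOnly.test(input)); // true
console.log(withConsumer.test(input));   // true

// ...but only the second one returns the text itself.
console.log(lookaheadsOnly.exec(input)[0]); // ""
console.log(withConsumer.exec(input)[0]);   // "abc123"
```

This matters for anything that uses the matched text rather than a yes/no answer, such as String.prototype.match or replace.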

Steven Levithan digs even deeper and uses a similar approach to mimic conditionals and atomic groups. Goodbye, brain.


A couple of bonus links


A utility that breaks regular expressions into parts and explains them.
A JS library that greatly simplifies working with regular expressions and adds functionality to them.

Source: https://habr.com/ru/post/143857/

