ECMAScript 6 introduces two new flags for regular expressions:

- `y` turns on sticky matching mode.
- `u` turns on various Unicode-related features.

This article covers the `u` flag. It will be most useful if you are already familiar with the Unicode problems in JavaScript.
Setting the `u` flag on a regular expression enables ES6 Unicode code point escape sequences (`\u{...}`) in the pattern.
Without the `u` flag, things like `\u{1234}` may technically still occur in patterns, but they are not interpreted as Unicode code point escape sequences. `/\u{1234}/` is equivalent to writing `/u{1234}/`, which matches 1234 consecutive `u` characters rather than the character with code point U+1234.
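The difference is easier to see with a small repetition count. The sketch below is illustrative only; it assumes an ES6-capable runtime with the usual web-compatible handling of unrecognized escapes in patterns without the `u` flag, and the variable names are made up for the example.

```javascript
// Without the u flag, \u{3} is read as the letter "u" followed by the quantifier {3}.
const withoutFlag = /^\u{3}$/;
// With the u flag, \u{3} is a code point escape for U+0003.
const withFlag = /^\u{3}$/u;

console.log(withoutFlag.test('uuu'));    // true
console.log(withoutFlag.test('\u0003')); // false
console.log(withFlag.test('uuu'));       // false
console.log(withFlag.test('\u0003'));    // true
```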
With the `u` flag set, things like `\a` (where `a` is not an escape character) are no longer equivalent to `a`. So although `/\a/` is treated as `/a/`, `/\a/u` throws an error, because `\a` is not a reserved escape sequence. This leaves room to extend what the `u` flag does in a future version of ECMAScript. For example, `/\p{Script=Greek}/u` throws an exception in ES6, but it can become a regular expression that matches all characters of the Greek script according to the Unicode database once the corresponding syntax is added to the specification.
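A minimal sketch of the stricter escape handling follows. The helper `compiles` is hypothetical and exists only for this example; it uses the `RegExp` constructor so that the syntax error can be caught at run time.

```javascript
// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (error) {
    // An invalid pattern raises a SyntaxError; anything else is unexpected.
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(compiles('\\a', ''));  // true: \a is treated as a
console.log(compiles('\\a', 'u')); // false: \a is rejected with a SyntaxError
console.log(/^\a$/.test('a'));     // true
```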
Without the `u` flag, `.` matches any BMP (Basic Multilingual Plane) character except line terminators. When the ES6 `u` flag is set, `.` also matches astral symbols.
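For illustration, here is a short sketch using U+1F4A9 as an arbitrary astral symbol (the choice of symbol and the variable name are assumptions made for the example):

```javascript
// U+1F4A9 lies outside the BMP, so it is stored as two UTF-16 code units.
const astral = '\u{1F4A9}';

console.log(astral.length);        // 2
console.log(/^.$/.test(astral));   // false: . sees a single code unit
console.log(/^..$/.test(astral));  // true: each surrogate half is matched separately
console.log(/^.$/u.test(astral));  // true: . matches the whole symbol
console.log(/^.$/u.test('\n'));    // false: line terminators are still excluded
```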
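The quantifiers available in JavaScript regular expressions are `*`, `+`, `?`, and `{2}`, `{2,}`, `{2,4}` and their variations. Without the `u` flag, a quantifier that follows an astral symbol applies only to the low surrogate of that symbol. With the `u` flag, it applies to the whole symbol.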
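A small sketch of the quantifier behavior, again using U+1F4A9 as an arbitrary astral symbol. The patterns are built with the `RegExp` constructor purely to keep the source ASCII-only; the variable names are illustrative.

```javascript
const pile = '\u{1F4A9}'; // same string as '\uD83D\uDCA9'

// The pattern text is the symbol followed by {2}.
const twiceNoFlag = new RegExp('^' + pile + '{2}$');
const twiceWithFlag = new RegExp('^' + pile + '{2}$', 'u');

console.log(twiceNoFlag.test(pile + pile));          // false
console.log(twiceNoFlag.test('\uD83D\uDCA9\uDCA9')); // true: only the low surrogate repeats
console.log(twiceWithFlag.test(pile + pile));        // true: the whole symbol repeats
```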
Without the `u` flag, any given character class can only match BMP characters. Things like `[bcd]` work as we expect:

    const regex = /^[bcd]$/;
    console.log(
      regex.test('a'), // false
      regex.test('b'), // true
      regex.test('c'), // true
      regex.test('d'), // true
      regex.test('e')  // false
    );

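The ES6 `u` flag allows you to use whole astral symbols in character classes. Astral symbols can then also be used in character class ranges, and everything works as expected as long as the `u` flag is set.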
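The following sketch contrasts the two modes. It assumes U+1F4A9 through U+1F4AB as an arbitrary astral range, and the variable names are invented for the example.

```javascript
const pile = '\u{1F4A9}';

// Without u, the class holds two separate surrogate code units.
const classNoFlag = new RegExp('^[' + pile + ']$');
// With u, the class holds one astral symbol.
const classWithFlag = new RegExp('^[' + pile + ']$', 'u');

console.log(classNoFlag.test(pile));     // false
console.log(classNoFlag.test('\uD83D')); // true: a lone surrogate matches
console.log(classWithFlag.test(pile));   // true

// Ranges between astral symbols work with the u flag.
const rangeWithFlag = /^[\u{1F4A9}-\u{1F4AB}]$/u;
console.log(rangeWithFlag.test('\u{1F4AA}')); // true

// Without u, the same range is read as \uDCA9-\uD83D, which is out of order.
let rangeError = null;
try {
  new RegExp('[' + pile + '-\u{1F4AB}]');
} catch (error) {
  rangeError = error;
}
console.log(rangeError instanceof SyntaxError); // true
```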
The `u` flag also affects negated character classes. For example, `/[^a]/` is equivalent to `/[\0-\x60\x62-\uFFFF]/`, which matches any BMP character except `a`. But with the `u` flag, `/[^a]/u` matches the much larger set of all Unicode characters except `a`.
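A brief sketch of the effect on negated classes, with U+1F4A9 standing in for any astral symbol:

```javascript
const pile = '\u{1F4A9}';

console.log(/^[^a]$/.test('b'));   // true
console.log(/^[^a]$/.test(pile));  // false: the class matches one BMP code unit at a time
console.log(/^[^a]$/u.test(pile)); // true: the class now covers astral symbols
console.log(/^[^a]$/u.test('a'));  // false
```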
The `u` flag affects the meaning of the `\D`, `\S`, and `\W` escape sequences. Without the `u` flag, `\D`, `\S`, and `\W` match any BMP characters that are not matched by `\d`, `\s`, and `\w`, respectively. With the `u` flag, `\D`, `\S`, and `\W` also match astral symbols.
The `u` flag does not affect their inverse counterparts `\d`, `\s`, and `\w`. It was proposed to make `\d` and `\w` (and `\b`) more Unicode-aware, but that proposal was rejected.
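To make both halves of this concrete, here is a short sketch. The sample characters (U+1F4A9, U+0663 ARABIC-INDIC DIGIT THREE, and U+00E9) are arbitrary choices for illustration.

```javascript
const pile = '\u{1F4A9}';

// The negated escapes gain astral coverage with the u flag.
console.log(/^\S$/.test(pile));  // false
console.log(/^\S$/u.test(pile)); // true
console.log(/^\D$/u.test(pile)); // true
console.log(/^\W$/u.test(pile)); // true

// \d and \w keep their ASCII-only meaning, with or without the u flag.
console.log(/^\d$/u.test('\u0663')); // false
console.log(/^\w$/u.test('\u00E9')); // false
```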
When both the `i` and `u` flags are set, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard, immediately before they are compared.

    const es5regex = /[a-z]/i;
    const es6regex = /[a-z]/iu;
    console.log(
      es5regex.test('s'), es6regex.test('s'), // true true
      es5regex.test('S'), es6regex.test('S'), // true true
      // Note: U+017F is LATIN SMALL LETTER LONG S (`ſ`).
      es5regex.test('\u017F'), es6regex.test('\u017F'), // false true
      // Note: U+212A is KELVIN SIGN (`K`).
      es5regex.test('\u212A'), es6regex.test('\u212A') // false true
    );

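Case folding is applied to the characters in the pattern as well as to the characters in the string being matched:

    console.log(
      /\u212A/iu.test('K'), // true
      /\u212A/iu.test('k'), // true
      /\u017F/iu.test('S'), // true
      /\u017F/iu.test('s')  // true
    );
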
This case-folding logic also applies to the `\w` and `\W` escape sequences, which in turn affects the `\b` and `\B` escape sequences. `/\w/iu` matches `[0-9A-Z_a-z]` but also U+017F, because U+017F in the string being matched canonicalizes to `s`. The same goes for U+212A and `k`. Thus, `/\W/iu` is equivalent to `/[^0-9a-zA-Z_\u{017F}\u{212A}]/u`.

    console.log(
      /\w/iu.test('\u017F'), // true
      /\w/iu.test('\u212A'), // true
      /\W/iu.test('\u017F'), // false
      /\W/iu.test('\u212A'), // false
      /\W/iu.test('s'),      // false
      /\W/iu.test('S'),      // false
      /\W/iu.test('K'),      // false
      /\W/iu.test('k'),      // false
      /\b/iu.test('\u017F'), // true
      /\b/iu.test('\u212A'), // true
      /\b/iu.test('s'),      // true
      /\b/iu.test('S'),      // true
      /\B/iu.test('\u017F'), // false
      /\B/iu.test('\u212A'), // false
      /\B/iu.test('s'),      // false
      /\B/iu.test('S'),      // false
      /\B/iu.test('K'),      // false
      /\B/iu.test('k')       // false
    );

Believe it or not, the `u` flag also affects HTML documents. The `pattern` attribute of `input` and `textarea` elements lets you specify a regular expression to validate user input against. The browser then gives you styling and scripting hooks to build behavior based on the validity of the input. The `u` flag is always enabled for regular expressions compiled through the HTML `pattern` attribute. Here is a demo.
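In script terms, this roughly corresponds to compiling the attribute value with the `u` flag and anchoring it so that the entire value has to match. The helper below is a hypothetical approximation for illustration, not the browser's actual implementation; the name `matchesPattern` is an assumption.

```javascript
// Hypothetical helper approximating how a value is checked against a
// pattern attribute: compile with the u flag and require a full match.
function matchesPattern(pattern, value) {
  const anchored = new RegExp('^(?:' + pattern + ')$', 'u');
  return anchored.test(value);
}

console.log(matchesPattern('.', '\u{1F4A9}')); // true: . matches an astral symbol
console.log(matchesPattern('[a-z]+', 'abc'));  // true
console.log(matchesPattern('[a-z]+', 'abc1')); // false: the whole value must match
```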
The ES6 `u` flag for regular expressions is available in stable versions of all major browsers. Browsers are gradually starting to use this functionality for the HTML `pattern` attribute.
Browser(s) | JavaScript engine | `u` flag | `u` flag for `pattern` attribute |
---|---|---|---|
Edge | Chakra | issue #1102227 + issue #517 + issue #1181 | issue #7113940 |
Firefox | SpiderMonkey | bug #1135377 + bug #1281739 | bug #1227906 |
Chrome / Opera | V8 | V8 issue #2952 + issue #5080 | issue #535441 |
WebKit | JavaScriptCore | bug #154842 + bug #151597 + bug #158505 | bug #151598 |
- Use the `u` flag for every regular expression you write.
- Do not blindly add the `u` flag to existing regular expressions, as this may subtly change their meaning.
- Avoid combining the `u` and `i` flags. It is better to explicitly include the characters of all cases in the regular expression than to suffer from implicit case folding.
- Use a transpiler so that regular expressions with the `u` flag keep working in environments that lack support for it. Let me know if you can break it.

Source: https://habr.com/ru/post/338366/