This is a for in loop, which iterates over the enumerable properties of an object. Since only the property A is declared, it is natural to assume that a message with the letter A will be displayed. Well... I was wrong. :D
for(A in {A:0}){console.log(A)}; // A
for(A in {A:0}){console.log(escape(A))}; // A%uDB40%uDD6C%uDB40%uDD77%uDB40%uDD61%uDB40%uDD79%uDB40%uDD73%uDB40%uDD20%uDB40%uDD62%uDB40%uDD65%uDB40%uDD20%uDB40%uDD77%uDB40%uDD61%uDB40%uDD72%uDB40%uDD79%uDB40%uDD20%uDB40%uDD6F%uDB40%uDD66%uDB40%uDD20%uDB40%uDD4A%uDB40%uDD61%uDB40%uDD76%uDB40%uDD61%uDB40%uDD73%uDB40%uDD63%uDB40%uDD72%uDB40%uDD69%uDB40%uDD70%uDB40%uDD74%uDB40%uDD20%uDB40%uDD63%uDB40%uDD6F%uDB40%uDD6E%uDB40%uDD74%uDB40%uDD61%uDB40%uDD69%uDB40%uDD6E%uDB40%uDD69%uDB40%uDD6E%uDB40%uDD67%uDB40%uDD20%uDB40%uDD71%uDB40%uDD75%uDB40%uDD6F%uDB40%uDD74%uDB40%uDD65%uDB40%uDD73%uDB40%uDD2E%uDB40%uDD20%uDB40%uDD4E%uDB40%uDD6F%uDB40%uDD20%uDB40%uDD71%uDB40%uDD75%uDB40%uDD6F%uDB40%uDD74%uDB40%uDD65%uDB40%uDD73%uDB40%uDD20%uDB40%uDD3D%uDB40%uDD20%uDB40%uDD73%uDB40%uDD61%uDB40%uDD66%uDB40%uDD65%uDB40%uDD21
Next, I copied the A from the object and immediately realized that the Chrome console was dealing with something hidden: the cursor got "stuck" and did not react to several left/right keystrokes.
Looking at the individual code units shows the letter A, with the code unit value 65, followed by many code units in the region of 55 thousand and 56 thousand, which console.log renders as the familiar question mark. The question mark means that the system does not know how to display such a code unit on its own.
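The code unit values described above can be listed with charCodeAt. This is a minimal sketch, not the original snippet: the sample string is an assumption, built with String.fromCodePoint from the letter A plus a single hidden character.

```javascript
// Hypothetical sample: "A" followed by one invisible character (U+E016C).
const sample = 'A' + String.fromCodePoint(0xE016C);

// length counts UTF-16 code units, not visible characters.
console.log(sample.length); // 3

// Collect the numeric value of every code unit.
const codeUnits = [];
for (let i = 0; i < sample.length; i++) {
  codeUnits.push(sample.charCodeAt(i));
}
console.log(codeUnits); // [ 65, 56128, 56684 ]
```

One visible letter, yet three code units: the two extra values are the ones in the 56 thousand range.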
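These values are the halves of surrogate pairs, which represent code points that do not fit into 16 bits (in other words, code points with a value of 65536 or more). This is necessary because Unicode itself defines 1,114,112 different code points, while the string format in JavaScript is UTF-16. That is, only the first 65536 code points of Unicode can be represented by a single code unit in JavaScript. Anything above that is encoded as two code units, and the real value is computed from the pair.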
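The arithmetic behind a surrogate pair can be sketched as follows. The formula is the standard UTF-16 decoding rule; the function name fromSurrogates is made up for this illustration, and the pair DB40/DD6C is the first one in the escaped output shown earlier.

```javascript
// Combine a high and a low surrogate into the code point they encode.
// fromSurrogates is a hypothetical helper name, not a built-in.
function fromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const codePoint = fromSurrogates(0xDB40, 0xDD6C);
console.log(codePoint); // 917868

// The built-in codePointAt performs the same calculation.
const builtIn = String.fromCharCode(0xDB40, 0xDD6C).codePointAt(0);
console.log(builtIn === codePoint); // true
```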
The for of loop iterates over the code points of a string (not over code units, like the first for loop), and so does the ... operator, which uses for of under the hood.
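A small sketch of the difference between code units and code points. The string below is an assumption built for the example: the letter A plus the first two hidden code points mentioned in this article.

```javascript
// Hypothetical sample: "A" plus two invisible code points.
const hidden = 'A' + String.fromCodePoint(917868, 917879);

// length still counts code units: 1 + 2 * 2.
console.log(hidden.length); // 5

// for of walks the string code point by code point.
const codePoints = [];
for (const ch of hidden) {
  codePoints.push(ch.codePointAt(0));
}
console.log(codePoints); // [ 65, 917868, 917879 ]

// The spread operator splits the string the same way.
console.log([...hidden].length); // 3
```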
console.log doesn't even know how to display these code points, so we need to figure out what we are dealing with.
The values 917868, 917879 and onward are part of the Unicode Variation Selectors Supplement block. Variation selectors in Unicode are used to specify standardized variation sequences for mathematical symbols, emoji, 'Phags-pa letters, and CJK unified ideographs corresponding to CJK compatibility ideographs. They are usually not meant to be used on their own.
Why does this matter? The ECMAScript specification allows identifier names to contain more than just "ordinary" characters:
    Identifier ::
        IdentifierName but not ReservedWord
    IdentifierName ::
        IdentifierStart
        IdentifierName IdentifierPart
    IdentifierStart ::
        UnicodeLetter
        $
        _
        \ UnicodeEscapeSequence
    IdentifierPart ::
        IdentifierStart
        UnicodeCombiningMark
        UnicodeDigit
        UnicodeConnectorPunctuation
        <ZWNJ>
        <ZWJ>
As the grammar shows, an identifier can consist of an IdentifierName and an IdentifierPart. The definition of IdentifierPart is the important bit. As long as they are not the first character of the identifier, all of the following characters produce perfectly valid names:
const examples = {
  // UnicodeCombiningMark example
  somethingî: 'LATIN SMALL LETTER I WITH CIRCUMFLEX',
  somethingi\u0302: 'I + COMBINING CIRCUMFLEX ACCENT',
  // UnicodeDigit example
  something١: 'ARABIC-INDIC DIGIT ONE',
  something\u0661: 'ARABIC-INDIC DIGIT ONE',
  // UnicodeConnectorPunctuation example
  something﹍: 'DASHED LOW LINE',
  something\ufe4d: 'DASHED LOW LINE',
  // ZWJ and ZWNJ example
  something\u200c: 'ZERO WIDTH NON JOINER',
  something\u200d: 'ZERO WIDTH JOINER'
}
Evaluating this expression gives the following result:
{
  somethingî: "LATIN SMALL LETTER I WITH CIRCUMFLEX",
  somethingî: "I + COMBINING CIRCUMFLEX ACCENT",
  something١: "ARABIC-INDIC DIGIT ONE",
  something﹍: "DASHED LOW LINE",
  something‌: "ZERO WIDTH NON JOINER",
  something‍: "ZERO WIDTH JOINER"
}
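Two of the keys in the result render identically. A minimal sketch for telling them apart is given here; it uses computed keys with escapes so the two spellings stay distinguishable in the source, and the normalize call is an addition for illustration that the article itself does not use.

```javascript
// Two keys that render the same but are spelled with different code units.
const lookalikes = {
  ['something\u00ee']: 'LATIN SMALL LETTER I WITH CIRCUMFLEX',
  ['somethingi\u0302']: 'I + COMBINING CIRCUMFLEX ACCENT',
};

const keys = Object.keys(lookalikes);
console.log(keys.length);             // 2
console.log(keys[0] === keys[1]);     // false
console.log(keys.map(k => k.length)); // [ 10, 11 ]

// Only after Unicode normalization do the two spellings compare equal.
const sameAfterNfc = keys[0].normalize('NFC') === keys[1].normalize('NFC');
console.log(sameAfterNfc); // true
```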
According to the ECMAScript specification, two IdentifierName that are canonically equivalent according to the Unicode standard are not equal unless they are represented by the exact same sequence of code units.
This means that two object keys can look exactly the same and still consist of different code units, in which case both end up in the object. An example is î, which corresponds to a single code unit with the value 00ee, versus the character i followed by a COMBINING CIRCUMFLEX ACCENT. They are not the same thing, so the object appears to contain duplicate properties. The same goes for keys ending in a zero-width joiner or a zero-width non-joiner. They look the same, but they are not!
Back to the topic: the Variation Selectors Supplement values found earlier belong to the UnicodeCombiningMark category, which makes them valid in identifier names (even though they are invisible). They are invisible because, most likely, the system only renders them when they are used in a valid combination.
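The category claim above can be checked in an engine that supports Unicode property escapes in regular expressions. This is a sketch under that assumption; inSupplement is a helper name invented here, and new Function is used only to parse a tiny piece of source text containing the invisible character.

```javascript
// U+E016C, the first hidden code point from this article.
const vs = String.fromCodePoint(917868);

// The Variation Selectors Supplement block spans U+E0100..U+E01EF.
const inSupplement = cp => cp >= 0xE0100 && cp <= 0xE01EF;
console.log(inSupplement(vs.codePointAt(0))); // true

// Nonspacing mark, and allowed after the first character of an identifier.
console.log(/^\p{Mn}$/u.test(vs));          // true
console.log(/^\p{ID_Continue}$/u.test(vs)); // true

// Parse an object literal whose key is "A" plus the invisible character.
const parsed = new Function('return {A' + vs + ': 0}')();
const parsedKey = Object.keys(parsed)[0];
console.log(parsedKey.length); // 3
```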
What the escape function does is walk through all code units and escape each of them. That is, it takes the initial letter A and all halves of the surrogate pairs and simply turns them into readable strings again. The invisible values get "stringified". This is how the long sequence that you saw at the beginning of the article appears.
A%uDB40%uDD6C%uDB40%uDD77%uDB40%uDD61%uDB40%uDD79%uDB40%uDD73%uDB40%uDD20%uDB40%uDD62%uDB40%uDD65%uDB40%uDD20%uDB40%uDD77%uDB40%uDD61%uDB40%uDD72%uDB40%uDD79%uDB40%uDD20%uDB40%uDD6F%uDB40%uDD66%uDB40%uDD20%uDB40%uDD4A%uDB40%uDD61%uDB40%uDD76%uDB40%uDD61%uDB40%uDD73%uDB40%uDD63%uDB40%uDD72%uDB40%uDD69%uDB40%uDD70%uDB40%uDD74%uDB40%uDD20%uDB40%uDD63%uDB40%uDD6F%uDB40%uDD6E%uDB40%uDD74%uDB40%uDD61%uDB40%uDD69%uDB40%uDD6E%uDB40%uDD69%uDB40%uDD6E%uDB40%uDD67%uDB40%uDD20%uDB40%uDD71%uDB40%uDD75%uDB40%uDD6F%uDB40%uDD74%uDB40%uDD65%uDB40%uDD73%uDB40%uDD2E%uDB40%uDD20%uDB40%uDD4E%uDB40%uDD6F%uDB40%uDD20%uDB40%uDD71%uDB40%uDD75%uDB40%uDD6F%uDB40%uDD74%uDB40%uDD65%uDB40%uDD73%uDB40%uDD20%uDB40%uDD3D%uDB40%uDD20%uDB40%uDD73%uDB40%uDD61%uDB40%uDD66%uDB40%uDD65%uDB40%uDD21
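A short sketch of escape applied to a string with one hidden character. The input is an assumption built with String.fromCodePoint; escape itself is a legacy global that is still available in browsers and Node.js.

```javascript
// Hypothetical input: "A" plus one invisible character (U+E016C).
const text = 'A' + String.fromCodePoint(0xE016C);

// escape leaves "A" alone and writes each surrogate half as %uXXXX.
const escaped = escape(text);
console.log(escaped); // A%uDB40%uDD6C
```

The result matches the first 13 characters of the long sequence above.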
// a valid surrogate pair sequence
'%uDB40%uDD6C'.replace(/u.{8}/g,[]);
// %6C
// 6C (hex) === 108 (dec) LATIN SMALL LETTER L
unescape('%6C')
// 'l'
Using an empty array [] as the replacement string is a bit cryptic. It is evaluated using toString(), that is, converted to ''. An empty string would work just as well. The reason for using [] is that this way you can bypass a quote filter or something similar.
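A minimal sketch of why [] behaves like an empty string when passed as the replacement value. The variable names are chosen for this example only.

```javascript
// An empty array converts to the empty string.
console.log(String([]) === ''); // true

// So both calls strip the matched text in exactly the same way.
const withArray = '%uDB40%uDD6C'.replace(/u.{8}/g, []);
const withString = '%uDB40%uDD6C'.replace(/u.{8}/g, '');
console.log(withArray);                // %6C
console.log(withArray === withString); // true
```

Note that the first form contains no quote characters in the replacement argument.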
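What happens overall:
- A:0 - here A includes many "hidden code units"
- escape makes these hidden code units visible
- replace strips the surrogate prefix from every escaped pair, leaving a plain %XX sequence
- unescape turns the result back into readable characters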
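The steps listed above can be sketched as a round trip. Both function names, hide and reveal, are hypothetical and introduced only for this illustration; the encoding direction in hide is an assumption about how such a key could be produced, not something taken from the original snippet.

```javascript
// Hypothetical encoder: map each character (code up to 0xEF) to the
// variation selector whose low byte equals the character code.
function hide(message) {
  return Array.from(message, ch => {
    const code = ch.charCodeAt(0);
    if (code > 0xEF) throw new RangeError('character does not fit the block');
    return String.fromCodePoint(0xE0100 + code);
  }).join('');
}

// Decoder: the escape / replace / unescape chain described in this article.
function reveal(key) {
  return unescape(escape(key).replace(/u.{8}/g, ''));
}

const secretKey = 'A' + hide('lways');
console.log(secretKey.length);  // 11
console.log(reveal(secretKey)); // Always
```

The regular expression also removes any visible letter u together with the eight characters after it, so this sketch assumes the visible part of the key contains no u.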
Source: https://habr.com/ru/post/334980/