Probably my boldest design decision was to refuse to give JSON a version number, so there is no mechanism for revising it. We are stuck with JSON: whatever it is in its current form, that's it.
Someone told the ECMA working group that the IETF had gone crazy and was going to rewrite JSON with no regard for compatibility, breaking the entire Internet, and that something urgently had to be done about this terrible situation. <...> This bears no relation to the complaints that actually prompted the IETF revision.
JSON strings may contain U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR unescaped. The JavaScript specification, however, states that string literals cannot contain line terminators (ECMA-262, 7.8.4 String Literals), and these two characters are line terminators (7.3 Line Terminators). The fact that they may appear unescaped in JSON strings but not in JavaScript string literals means that JSON is not a subset of JavaScript, despite the stated design goals.
The prefix of each test file name indicates the expected outcome:

- y (yes): the parser should accept the content;
- n (no): the parser should reject it;
- i (implementation): the behaviour depends on the implementation.

For example, n_string_unescaped_tab.json contains ["09"], an array with a string holding a raw TAB character 0x09, which MUST be escaped according to the JSON specifications. The file tests string parsing, so its name contains string rather than structure, array or object. Under RFC 7159 this is an invalid string value, so the file name starts with n.
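As a quick illustration with Python's json module (one of the parsers examined later in the article), the raw tab is rejected while the escaped form is accepted:

    import json

    # n_string_unescaped_tab.json holds a raw TAB between the quotes.
    # The grammar requires it to be escaped, and Python's json module rejects it:
    try:
        json.loads('["\t"]')            # raw 0x09 inside the string literal
    except json.JSONDecodeError as e:
        print("rejected:", e.msg)       # Invalid control character ...

    print(json.loads('["\\t"]'))        # the escaped form parses to ['\t']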
Some parsers do not accept lone values at the top level (such as "test"), so strings under test are embedded in arrays (["test"]). Lone values get their own structure tests:

- y_structure_lonely_string.json: "asd"
Trailing commas, as in [123,] or {"a":1,}, are not part of the grammar, so such files should fail, right? The catch is that RFC 7159 allows parsers to support "extensions" (section 9), without explaining what those might be. In practice, trailing commas are a common extension. Since they are not part of the JSON grammar, parsers are not required to support them, so these file names begin with n.

- n_object_trailing_comma.json: {"id":0,}
- n_object_several_trailing_commas.json: {"id":0,,,,,}
Comments are not part of JSON either, but some parsers accept them, whether trailing ([1]//xxx) or even embedded ([1,/*xxx*/2]).

- y_string_comments.json: ["a/*b*/c/*d//e"]
- n_object_trailing_comment.json: {"a":"b"}/**/
- n_structure_object_with_comment.json: {"a":/*comment*/"b"}
Some tests contain structures that are never closed, such as a lone [, or otherwise broken ones such as [1,{,3]. These are plainly errors and must not pass.

- n_structure_object_unclosed_no_value.json: {"":
- n_structure_object_followed_by_closing_object.json: {}}
Structures can be nested to arbitrary depth, for example [[[[[]]]]]. RFC 7159 allows parsers to set a limit on the maximum nesting depth (section 9). Incidentally, Xcode itself crashes when opening a file made of a huge number of opening brackets, probably because its JSON syntax highlighting does not implement a depth limit:

    $ python -c "print('['*100000)" > ~/x.json
    $ ./Xcode ~/x.json
    Segmentation fault: 11
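By contrast, CPython's json module guards its recursive descent with the interpreter's recursion limit, so a similar input fails cleanly rather than crashing (a sketch; exact behaviour varies across Python versions):

    import json

    # The recursive parser hits the interpreter's recursion limit and raises
    # RecursionError instead of crashing the process.
    deep = '[' * 100000 + ']' * 100000
    try:
        json.loads(deep)
    except RecursionError:
        print("nesting too deep for this parser")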
RFC 7159 allows 0x20 (space), 0x09 (tab), 0x0A (line feed) and 0x0D (carriage return) as whitespace, before and after the structural characters []{}:,. So 20[090A]0D will pass the tests. Conversely, a file fails if it uses whitespace that is not explicitly allowed, for example form feed 0x0C or [E281A0], the UTF-8 encoding of U+2060 WORD JOINER.

- n_structure_whitespace_formfeed.json: [0C]
- n_structure_whitespace_U+2060_word_joiner.json: [E281A0]
- n_structure_no_data.json: (empty)
The literals NaN and Infinity are not part of the JSON grammar, but some parsers accept them, treating them as "extensions" (section 9). The test files also check the negative forms -NaN and -Infinity.

- n_number_NaN.json: [NaN]
- n_number_minus_infinity.json: [-Infinity]
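Python's json module is one of the parsers that implements this extension; a quick check (the later section on parse_constant shows how to turn it off):

    import json

    # Accepted by default, so the n_ tests above would wrongly "pass" here:
    print(json.loads('[NaN, Infinity, -Infinity]'))    # [nan, inf, -inf]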
Hexadecimal numbers such as 0xFF are not allowed either, so such files must not parse.

- n_number_hex_2_digits.json: [0x42]
Some tests use numbers of extreme magnitude, such as 1e9999 or 0.0000000000000000000000000000001.

- y_number_very_big_negative_int.json: [-237462374673276894279832(...)
Other tests cover the various exponent forms, both valid ([0E0], [0e+1]) and invalid ([1.0e+], [0E] and [1eE2]).

- n_number_0_capital_E+.json: [0E+]
- n_number_.2e-3.json: [.2e-3]
- y_number_double_huge_neg_exp.json: [123.456e-789]
Nesting must be balanced: [[]] and [[[]]] pass the tests, while ] or [[]]] do not.

- n_array_comma_and_number.json: [,1]
- n_array_colon_instead_of_comma.json: ["": 1]
- n_array_unclosed_with_new_lines.json: [1,0A10A,1
{"a":1,"a":2}
, but allows parsers to decide for themselves what to do in such cases. Section 4 even mentions that “[some] implementations report an error or failure while parsing an object”, without specifying whether the parsing failure corresponds to the RFC provisions, especially this : “The JSON parser MUST accept all kinds of texts corresponding to the JSON grammar ".{"a":1,"a":1}
, as well as keys or values ​​whose sameness depends on how the strings are compared. For example, the keys may be different in binary expression, but equivalent in accordance with the normalization of Inicode NFC: {"C3A9:"NFC","65CC81":"NFD"}
, here both keys denote" Ă© ". Also included in the tests is {"a":0,"a":-0}
. y_object_empty_key.json {"":0} y_object_duplicated_key_and_value.json {"a":"b","a":"b"} n_object_double_colon.json {"x"::"b"} n_object_key_with_single_quotes.json {key: 'value'} n_object_missing_key.json {:"b"} n_object_non_string_key.json {1:1}
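Most of this is easy to observe from Python, whose dict semantics keep the last duplicate; a sketch of using object_pairs_hook (a standard json.loads parameter) to surface duplicates instead:

    import json

    # Like many parsers, Python keeps the last value for a duplicate key...
    print(json.loads('{"a":1,"a":2}'))          # {'a': 2}

    # ...but object_pairs_hook receives every (key, value) pair, so duplicates
    # can be rejected explicitly. Note that the NFC/NFD keys above would still
    # count as different here, since the comparison is plain string equality.
    def reject_duplicates(pairs):
        d = {}
        for k, v in pairs:
            if k in d:
                raise ValueError("duplicate key: %r" % k)
            d[k] = v
        return d

    try:
        json.loads('{"a":1,"a":2}', object_pairs_hook=reject_duplicates)
    except ValueError as e:
        print(e)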
Encoding-related tests include:

- y_string_utf16.json: FFFE[00"00E900"00]00
- n_string_iso_latin_1.json: ["E9"]
- n_structure_UTF8_BOM_no_data.json: EFBBBF
- n_structure_incomplete_UTF8_BOM.json: EFBB{}
- i_structure_UTF-8_BOM_empty_object.json: EFBBBF{}
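i_structure_UTF-8_BOM_empty_object.json carries the i_ prefix for a reason: BOM handling differs between parsers. A sketch of how one parser, Python's json module, treats a BOM on text input (the exact message and the handling of bytes input depend on the version):

    import json

    # A BOM at the start of a text string is rejected outright.
    try:
        json.loads('\ufeff{}')
    except json.JSONDecodeError as e:
        print("rejected:", e.msg)   # e.g. Unexpected UTF-8 BOM (decode using utf-8-sig)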
Strings must escape the control characters in the range U+0000 through U+001F (section 7). This range does not include 0x7F DEL, which nonetheless appears in other definitions of control characters (see section 4.6 on Bash JSON.sh). Therefore ["7F"], a string containing a raw DEL, must pass the tests.

- n_string_unescaped_ctrl_char.json: ["a\09a"]
- y_string_unescaped_char_delete.json: ["7F"]
- n_string_escape_x.json: ["\x00"]
["\"]
, ["\
, [\
. y_string_allowed_escapes.json ["\"\\/\b\f\n\r\t"] n_structure_bad_escape.json ["\
Any character can also be u-escaped with \u followed by four hexadecimal digits (backslash, for instance, is \u005C). The passing tests include an escaped NUL character (\u0000), which can cause trouble for C-based parsers. The failing tests include a capital U (\U005C), non-hexadecimal escape values (\u123Z) and incomplete escapes (\u123).

- y_string_backslash_and_u_escaped_zero.json: ["\u0000"]
- n_string_invalid_unicode_escape.json: ["\uqqqq"]
- n_string_incomplete_escaped_character.json: ["\u00A"]
Code points outside the Basic Multilingual Plane are represented by escaped surrogate pairs: U+1D11E becomes \uD834\uDD1E. Passing tests include lone surrogates, since they are valid as far as the JSON grammar is concerned. Erratum 3984 for RFC 7159 raises the problem of grammatically valid escaped code points that do not correspond to Unicode characters (\uDEAD), or that are noncharacters from U+FDD0 to U+10FFFE. Such tests are prefixed i_ (implementation-defined). According to the Unicode standard, invalid code points should be replaced with the U+FFFD REPLACEMENT CHARACTER. If you have already tasted the complexity of Unicode, you will not be surprised that the replacement is optional and can be performed in several ways (see Unicode PR #121: Recommended Practice for Replacement Characters). As a result, some parsers substitute replacement characters, others keep the escaped form, and others produce a non-Unicode character (see Section 5, Parsing Contents).

- y_string_accepted_surrogate_pair.json: ["\uD801\udc37"]
- n_string_incomplete_escaped_character.json: ["\u00A"]
- i_string_incomplete_surrogates_escape_valid.json: ["\uD800\uD800\n"]
- i_string_lone_second_surrogate.json: ["\uDFAA"]
- i_string_1st_valid_surrogate_2nd_invalid.json: ["\uD888\u1234"]
- i_string_inverted_surrogates_U+1D11E.json: ["\uDd1e\uD834"]
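A small illustration with Python's json module, which combines a proper surrogate pair but also happily returns a lone escaped surrogate, one of the implementation-defined outcomes discussed in section 5:

    import json

    # A proper escaped surrogate pair decodes to the intended code point...
    print(json.loads('["\\uD834\\uDD1E"]'))     # ['𝄞'], i.e. U+1D11E

    # ...while a lone escaped surrogate is returned as-is.
    s = json.loads('["\\uDEAD"]')[0]
    print(len(s), hex(ord(s)))                  # 1 0xdead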
Tests also include raw bytes that are not valid Unicode text, as opposed to escaped code points such as \uDEAD: the latter are valid in u-escaped form even though they do not decode into Unicode characters.

- y_string_utf8.json: ["€?"]
- n_string_invalid_utf-8.json: ["FF"]
- n_array_invalid_utf8.json: [FF]
RFC 7159, section 9, states: "A JSON parser MUST accept all texts that conform to the JSON grammar. A JSON parser MAY accept non-JSON forms or extensions."
An implementation may set limits on:
- the size of texts that it accepts;
- the maximum depth of nesting;
- the range and precision of numbers;
- the length and character contents of strings.
Note that per RFC 2119, MUST, like the terms "REQUIRED" and "SHALL", means an absolute requirement of the specification.
run_tests.py

The Python script run_tests.py ran each test file through each parser (or a single test, if a file was passed as an argument). The parsers were usually thin wrappers that returned 0 on success and 1 on a parsing failure; separate statuses were reserved for a crash of the parser and for a 5-second timeout. In effect, I turned the JSON parsers into JSON validators. run_tests.py then compared the return value for each test with the expected result encoded in the file-name prefix. If they did not match, or if the prefix was i (implementation-defined), run_tests.py recorded a line of the following form in a log (results/logs.txt):

    Python 2.7.10    SHOULD_HAVE_FAILED    n_number_infinity.json
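A minimal sketch of such a harness, for illustration only (this is not the author's actual run_tests.py; the directory layout and status names are assumptions):

    import glob, os, subprocess

    # Run every test file through one parser wrapper (exit code 0 = parsed,
    # 1 = rejected), with a 5-second timeout, and compare the outcome with
    # the y_/n_/i_ prefix of the file name.
    def run_suite(parser_cmd, test_dir="test_parsing"):
        for path in sorted(glob.glob(os.path.join(test_dir, "*.json"))):
            name = os.path.basename(path)
            expected = name.split("_", 1)[0]            # "y", "n" or "i"
            try:
                rc = subprocess.run([parser_cmd, path], timeout=5).returncode
                outcome = "PASS" if rc == 0 else "FAIL" # crashes count as FAIL
            except subprocess.TimeoutExpired:
                outcome = "TIMEOUT"
            if expected == "y" and outcome != "PASS":
                print(parser_cmd, "SHOULD_HAVE_PASSED", name)
            elif expected == "n" and outcome == "PASS":
                print(parser_cmd, "SHOULD_HAVE_FAILED", name)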
Finally, run_tests.py read the log and generated HTML tables with the results (results/parsing.html).
run_tests.py also has an option to output "pruned" results: when several tests give identical results, only the first one is kept. The pruned table is available at www.seriot.ch/json/parsing_pruned.html.

One of the wrappers, Obj-C SBJSON (SBJson4Parser), crashed with an assertion failure:

    *** Assertion failure in -[SBJson4Parser parserFound:isValue:], SBJson4Parser.m:150
    *** Terminating app due to uncaught exception 'NSInternalInconsistencyException',
    reason: 'Invalid parameter not satisfying: obj'
    *** First throw call stack:
    (
        0   CoreFoundation      0x00007fff95f4b4f2 __exceptionPreprocess + 178
        1   libobjc.A.dylib     0x00007fff9783bf7e objc_exception_throw + 48
        2   CoreFoundation      0x00007fff95f501ca +[NSException raise:format:arguments:] + 106
        3   Foundation          0x00007fff9ce86856 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 198
        4   test_SBJSON         0x00000001000067e5 -[SBJson4Parser parserFound:isValue:] + 309
        5   test_SBJSON         0x00000001000073f3 -[SBJson4Parser parserFoundString:] + 67
        6   test_SBJSON         0x0000000100004289 -[SBJson4StreamParser parse:] + 2377
        7   test_SBJSON         0x0000000100007989 -[SBJson4Parser parse:] + 73
        8   test_SBJSON         0x0000000100005d0d main + 221
        9   libdyld.dylib       0x00007fff929ea5ad start + 1
    )
    libc++abi.dylib: terminating with uncaught exception of type NSException
[123123e100000]
["\ud800"]
[1,]
{"a":0,}
NaN is not a valid JSON number, and JSONSerialization refuses to serialize Double.nan: it throws an Objective-C exception that the Swift do/catch below does not intercept, so the process dies with SIGABRT.

    do {
        let a = [Double.nan]
        let data = try JSONSerialization.data(withJSONObject: a, options: [])
    } catch let e {
        // never reached: the NSException is not a Swift error
    }

    SIGABRT
    *** Terminating app due to uncaught exception 'NSInvalidArgumentException',
    reason: 'Invalid number value (NaN) in JSON write'
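For comparison, a sketch of the same experiment with Python's json module: by default it writes NaN anyway, producing output that is itself invalid JSON, and only allow_nan=False makes it refuse:

    import json

    print(json.dumps([float("nan")]))                   # [NaN]
    try:
        json.dumps([float("nan")], allow_nan=False)
    except ValueError as e:
        print(e)                                        # Out of range float values ...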
Several issues with Bash JSON.sh surfaced during testing: it does not notice unclosed input such as [1, {"a": or a bare " (issue #199); it has trouble with "0e1" (issue #198); and it trips over a string such as ["\ (issue #206). JSON.sh also relies on the [:cntrl:] character class, which matches [\x00-\x1F\x7F]. JSON, however, only requires escaping of U+0000 through U+001F and does not treat 0x7F DEL as a character that needs escaping.
(An excerpt from man ascii: the control characters are 00 NUL through 1F US, plus 7F DEL; 20 is SP and the remaining codes are printable.)
["7F"]
. . JSON.sh 10 000 [. . $ python -c "print('['*100000)" | ./JSON.sh ./JSON.sh: line 206: 40694 Done tokenize 40695 Segmentation fault: 11 | parse
Python's json module accepts NaN, Infinity and -Infinity even though they are not JSON. To reject them, pass a parse_constant callback, which is invoked for exactly these literals and can raise an error:

    def f_parse_constant(o):
        raise ValueError

    o = json.loads(data, parse_constant=f_parse_constant)
JSON_Checker, the validator listed on json.org, is a pushdown automaton written in C. Surprisingly, JSON_Checker misjudges some JSON documents.
It lets through [1.] and [0.e1], which are not valid JSON, while [0e1], which is valid, gets rejected. The culprit is the automaton's state table. After reading a 0 the checker is in the ZE (zero) state, from which e or E cannot lead to the exponent state E1, so 0e1 is refused. After a decimal point, as in 0. or 1., the checker moves to the FR (fraction) state; FR both terminates a number and accepts e or E even though no digit has followed the point, which is how 1. and 0.e1 slip through. Splitting off an extra state F0 (frac0), entered right after the decimal point and requiring at least one digit, would make the automaton reject 1. and therefore [1.].
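For reference, here is how the three inputs above fare against Python's json module, which follows the grammar on these cases:

    import json

    # [0e1] is valid JSON; [1.] and [0.e1] are not.
    for doc in ['[0e1]', '[1.]', '[0.e1]']:
        try:
            print(doc, '->', json.loads(doc))       # [0e1] -> [0.0]
        except json.JSONDecodeError as e:
            print(doc, '-> rejected:', e.msg)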
What is the lesson of JSON_Checker? Perhaps that even reference parsers from json.org cannot be trusted blindly. For comparison, here is a JSON validator written as a single Ruby regular expression:

    JSON_VALIDATOR_RE = /(
      # define subtypes and build up the json syntax, BNF-grammar-style
      # The {0} is a hack to simply define them as named groups here but not match on them yet
      # I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
      (?<number>  -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
      (?<boolean> true | false | null ){0}
      (?<string>  " (?>[^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " ){0}
      (?<array>   \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
      (?<pair>    \s* \g<string> \s* : \g<json> ){0}
      (?<object>  \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
      (?<json>    \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
    ) \A \g<json> \Z /uix
["\u002c"]
["\\a"]
[True]
["09"]
JSON- JSON- .
"\uDEAD"
), ? - ? RFC 7159 .0.00000000000000000000001
-0
? , ? RFC 7159 0 –0. , .{"a":1,"a":2}
)? ( {"a":1,"a":1}
)? ? Unicode-, NFC? RFC .1.000000000000000005
In practice, the answers vary. Most parsers read 1.000000000000000005 as the double 1.0, whereas Rust 1.12.0 / json 0.10.2 preserves 1.000000000000000005. 1E-999 usually becomes the double 0.0, but Freddy keeps the string "1E-999", and Swift Apple JSONSerialization and Obj-C JSONKit behave differently again. 10000000000000000999 is parsed into a double (Swift Apple JSONSerialization), an unsigned long long (Objective-C JSONKit) or yet another representation (Swift Freddy); cJSON, for its part, produces 10000000000000002048 (a double).
, "65CC81":"NFD"
} NFC- NFD- "Ă©". , Apple JSONSerialization Freddy, .{"a":1,"a":2}
{"a":2}
(Freddy, SBJSON, Go, Python, JavaScript, Ruby, Rust, Lua dksjon), {"a":1}
(Obj-C Apple NSJSONSerialization, Swift Apple JSONSerialization, Swift Freddy) {"a":1,"a":2}
(cJSON, R, Lua JSON).{"a":1,"a":1}
{"a":1}
, cJSON, R Lua JSON {"a":1,"a":1}
.{"a":0,"a":-0}
{"a":0}
, {"a":-0}
(Obj-C JSONKit, Go, JavaScript, Lua) {"a":0, "a":0}
(cJSON, R).["A\u0000B"]
["A\u0000B"] contains a u-escaped 0x00 NUL, which is troublesome for C-based parsers. Most parsers handle it gracefully, though JSONKit and cJSON do not, and Freddy returns ["A"], truncating the string at the 0x00.
["\uD800"] contains a u-escaped U+D800, a lone surrogate that is invalid in UTF-16 but permitted by the JSON grammar. Python returns ["\uD800"], keeping the lone surrogate. Go and JavaScript substitute the replacement character U+FFFD REPLACEMENT CHARACTER and return ["EFBFBD"], R rjson and Lua dkjson emit the surrogate as UTF-8-style bytes, ["EDA080"], and R jsonlite and Lua JSON 20160728.17 return ["?"].
.["EDA080"]
U+D800
, UTF-16, . UTF-8 (. 2.5. — Unicode- ). , cJSON, R rjson jsonlite, Lua JSON, Lua dkjson Ruby, ["EDA080"]
. Go JavaScript ["EFBFBDEFBFBDEFBFBD"]
, ( ). Python 2 Unicode- ["\ud800"]
, Python 3 UnicodeDecodeError
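The Python 3 case is easy to reproduce (a sketch assuming bytes input; Python 2 instead returned a unicode string, as noted above):

    import json

    # ED A0 80 is not valid UTF-8, so decoding fails before the JSON grammar
    # is even consulted.
    try:
        json.loads(b'["\xed\xa0\x80"]')
    except UnicodeDecodeError as e:
        print("invalid UTF-8:", e)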
.["\uD800\uD800"]
. R jsonlite ["\U00010000"]
, Ruby- — ["F0908080"]
STJSON is the author's own JSON parser, written in Swift. Basic usage:

    var p = STJSONParser(data: data)

    do {
        let o = try p.parse()
        print(o)
    } catch let e {
        print(e)
    }
An alternative initializer exposes a maximum nesting depth and an option to use the Unicode replacement character:

    var p = STJSON(data: data,
                   maxParserDepth: 1024,
                   options: [.useUnicodeReplacementCharacter])
STJSON passes all of the tests except y_string_utf16.json, and that one is deliberate: STJSON reads UTF-8 only and does not attempt to detect or decode UTF-16 or UTF-32. To sum up, JSON is far less simple than it looks: even JSON_Checker, the reference validator from json.org, rejects the valid [0e1] (see 4.24).

Source: https://habr.com/ru/post/314014/