JSON that you can comment

Not every JSON consumer rejects comments (Chrome/Chromium, for example, tolerates comments in manifest.json quite well), but the standard does not provide for them. As a result, a number of functions in NodeJS do not accept JS-style comments and treat them as errors. AJAX requests with the JSON data type treat them as errors too. This makes JSON configuration files inconvenient to use as human-readable files. Sometimes that may even be a good thing. If we want a comment, we are forced to write it above or below the line as a separate "key-value" pair.

...{... "some-key_comment":"my comment for key and value", "some-key":"some-value", ...}...

But if we write no comments, following the rigor of the protocol, errors arise from another factor: a person editing the settings forgets what they mean.

 ...{... "some-key":"some-value", //- key?? ,  - ! ...}...

We will devise a JSON-like format with JS-style comments, so that the files can be executed as JS and, once the comments are stripped, read as JSON. ("TL;DR: show me the code.")

The fuss and where it came from

By the way, Douglas Crockford, who started it all, explained himself in 2012:

I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.

Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.

He posted this on G+, where a post could only be "plussed" and comments were closed. So whatever the public reaction to the explanation was, we see only the "pluses" (or can look at who shared the post).
And Crockford's quote from another place:

The main reason why I deleted the comments was that there were people who tried to parse the data based on the comments, which completely broke compatibility. I could not control them at all, so the best way out was to delete the comments.

So read on only after accepting this promise:

“I promise that I will be careful and will not use comments for parsing data!”

Everything has been done before us

The crux is that a different parser is required; otherwise there is no big problem. But the problem does not end there: sometimes you need to change the file slightly while keeping the comments. The first part (the parser) has been solved, for example, by JSON.minify(). The second part and a number of other problems (trailing commas or no commas at all, multi-line comments, keys and string values without quotes) were solved in Hjson, which spends about 750 lines of JS code on it (10-15% of them comments).

Stop, is it really necessary?

Certainly, hardcore programmers (those who write comments as key values in JSON) and robots (networked and otherwise) do not need it. Robots decode the key names in any familiar config perfectly well, and programmers have enough intellect to understand unfamiliar names and build decoding heuristics without any computer. Everyone else, including less hardcore programmers, finds comments useful and spends time not only reading them but also writing them. Crockford evidently belongs to the hardcore programmers, and the creators of YAML do not. We have to accept this and connect the worlds of robots (and hardcore programmers) and people.

There are also hackers who use a perfectly hacker-like yet valid way of writing JSON: repeating identical keys in sequence (in ES5 JS with "use strict" this gives an error). Most parsers do not keep the value of the first key, so it (and every duplicate except the last) can serve as a comment. This method, too, looks "machine-like".

 ...{... "some-key":"comments_comments", "some-key":"some-value", ...}...
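The duplicate-key trick is easy to check. This is a minimal sketch, assuming a standard JSON.parse, which keeps the last value for a repeated key; the sample string is the one from the text above.

```javascript
// The first "some-key" plays the role of a comment; the last one carries the value.
const text = '{ "some-key": "comments_comments", "some-key": "some-value" }';

const parsed = JSON.parse(text);
const keys = Object.keys(parsed);

// Only one key survives parsing, holding the value of the last duplicate.
console.log(parsed["some-key"], keys.length);
```

The "comment" is lost as soon as the file is parsed and written back, which is one reason the trick does not help when a config has to be edited by a program.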

In short, not everyone will welcome "yet another format, the 16th in a row". Any attempt to build, let alone apply, a format converter will be ignored by some share of the developers who use the format. But for those individuals in whom fragments of different worlds are strangely combined, and who have not yet settled on their nature, the conversion procedures will prove useful. At least at first, for the time being.

So the format combines the incompatible, like a computer for a blonde: on the one hand, conversion methods that only robots can use; on the other, results that appear in a more human-readable form.

Bridge between robots and humans

You could pick a plugin for Grunt / Gulp (for example, grunt-strip-json-comments, Gulp ...) to strip comments from files. But the whole operation boils down to one small (under 1 KB) regular expression, which is easier to write in Gruntfile.js than to pull in yet another plugin. Moreover, the same expression is needed in client-side JS to read the same JSON, so we cannot escape its explicit form anyway.

The format conversion methods are collected in a jsonComm object that works in a JavaScript environment. Particular tasks do not need the entire object; it does not always make sense to bring every method into a project. For example, simply deleting comments (as in gulp-strip-json-comments) is handled by a method consisting of one regular expression (jsonComm.unComment(), under 1 KB of uncompressed code; an example follows later; the test section of the jsonComm project contains tests and benchmarks for checking correctness and speed). The expression does not even need to be compiled unless you want to apply different rule settings.

Settings can be, for example, of this kind: which symbol marks the start of a comment? In a pure JS environment the confident answer is "//", while supporters of Python or YAML will say "#". Attempts to reconcile the irreconcilable lead to rule settings and to converters, which is, among other things, where we started. JS adepts need no settings and will burn every mention of "#" out of the project, because one cannot spend 36 microseconds generating a regexp for the sake of loyalty to such heresy. Loyalists will burn out the generation step too, but they will extend the regexp and spend (roughly) 0.1-0.5 microseconds not on generation but on every conversion pass. For this the Puritans hate them. After all, robots think much faster and see slowness on a different scale.

What tasks can be solved when commenting JSON?

simply read the jsonComm format (with comments) in JS or NodeJS, delete the comments, and then validate the result as ordinary JSON with JSON.parse(); this is what most projects that add comments to JSON do. It works quickly (tens to hundreds of microseconds);
read not JSON but JS files (with code) in order to take some constants from them as settings (for example, in NodeJS), while the same JS file also uses them when it is executed elsewhere (on the client); a kind of template with a simplified configuration structure;
as in the previous item, but some settings also need to be changed after reading (for example, updating the build number or adjusting the config in Node), so that the JS on the client then uses them without knowing about the change. This is the read-write analogue of the template.

These are not all conceivable tasks, but a group sufficient for simple project configuration. Instead of YAML with comments or makeshift comments in pure JSON, we write jsonComm with the *.js extension, which can either be read as JSON (on the server, when building the project, or on the client) or run as JS, with comments in JS or YAML style.

The tasks fall into 2 practical cases: when we do not need to edit our jsonComm, and when we need to edit it while keeping all the comments. When the file is only read (which includes client-side AJAX), a single method, jsonComm.unComment(), with one regexp is enough, followed by JSON.parse().
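The read-only case can be sketched in a few lines. This is not the jsonComm.unComment() code: stripComments is a hypothetical helper built on a simpler, well-known pattern (match a whole string literal or a comment, keep only the string), and it assumes comments in // and /* */ style only.

```javascript
// Hypothetical helper, not the jsonComm implementation.
function stripComments(text) {
  // Alternative 1 captures a complete JSON string so that its content is kept;
  // alternatives 2 and 3 match // and /* */ comments, which are dropped.
  const pattern = /("(?:\\.|[^"\\])*")|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;
  return text.replace(pattern, (match, str) => (str !== undefined ? str : ""));
}

const source = [
  "{",
  '  "some-key": "some-value", // comment on the pair',
  '  /* block comment */ "url": "http://example.com/*not-a-comment*/"',
  "}",
].join("\n");

// Validation of everything except comments is left to JSON.parse.
const config = JSON.parse(stripComments(source));
console.log(config["some-key"], config.url);
```

Matching strings first is what keeps "//" and "/*" inside values (such as URLs) from being treated as comments.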

The case of writing changed values or keys requires a small procedure that parses the jsonComm text (with comments, without deleting them) or JS text in order to change the required spots pointwise. The manipulation also works for "*.js" files: as long as the scripts do not touch the language code itself, all we need is not to make mistakes when writing key values. So a second method joins the required ones: jsonComm.change().

What other tasks can be solved when commenting on JSON?
Tasks of academic type:

get "legitimate" access to jsonComm comments by first translating them into "key#": "comment" pairs, taking the key base from the line near which the comment was found, and then, after parsing the now-valid JSON, processing them further (for example, transferring them to another format);
work with YAML directly (but lose the validation basis recognized by the browser / JS environment);
convert to YAML and back through the valid JSON produced above;
the same for XML; this yields a cluster of four data description languages, 2 of which are recognized in browsers and numerous computing environments.

The peculiarity of these tasks is that there is no practical need for them, but, seeing the niche, space is reserved for them (the functions toYaml, toXml, fromYaml, fromXml, to; the last one means "to jsonComm"). Without comments, such a cluster already exists in the work of other libraries. To join it with comments, one must start at least with a function that translates jsonComm comments into one of the valid, recognized formats. The obvious first candidate is JSON.

The very first look at commenting practices raises many questions: which comments attach to the pair before them, and which to the pair after? For example, a comment after the comma separator but on the same line usually refers to the previous pair, so the line break matters as well as the separator. Second, multi-line comments can logically refer to different adjacent pairs. Third, what about comments in arrays? Array keys are implicit, and it would be logical to create a parallel array alongside. But what if the array is multidimensional and sparsely filled? Fourth, a line can hold several comments, and a pair can stretch over 3 or more lines.

This whole range of problems, as contrived as it is real, requires a well-thought-out approach so that automatic conversion of comments does not make the formats less readable. So we will not rush to implement them. Let us rather deal with a formal description of the jsonComm grammar.

JsonComm grammar

The most common pairs in JSON files are written on separate lines:

 <whitespace>"<key>": <value>

The value is a quoted string or any other term allowed by the JSON rules. The key is any string, with the quotes inside it escaped. Whitespace characters may appear between elements, and pairs are separated by commas or brackets, which can stand anywhere before or after the pair, including on adjacent lines.

Our format (jsonComm) is very similar, with the difference that 2 types of comments may appear wherever whitespace characters are allowed.

It turns out to be more economical to navigate not by lines but by separators (brackets, commas). Line breaks do not matter at all for multi-line comments, but for single-line comments they mark the end. This affects which parser will work in the conversion algorithm.

Later, when solving the creative problem of which pair a comment belongs to, the position of the separator and of the line breaks will matter again. For simply deleting comments, line breaks turn out not to be important (just do not remove the line break that ends a single-line comment).

With that said, the basic construction of the jsonComm grammar looks like this:

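 (("{" | "[" | ",")<whitespace>) <comments> (<whitespace>"<key>"<whitespace>) <comments> (<whitespace>(":")<whitespace>) <comments> (<whitespace><value><whitespace>) <comments> | <comments> (<whitespace><value><whitespace>) <comments> (<whitespace>("}" | "]" | ","))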

After filtering, everything outside the parentheses is thrown out, and everything inside the parentheses remains. There are some subtleties, of course, that this simplified scheme does not show (an empty value stands for a structure value). The scheme may let through invalid JSON, and indeed any text other than comments, for example a program or the text of a book. That is fine: if validation is performed, the JSON is checked anyway; and if there is no validation and the text is parsed by the JS compiler, the comments need not be deleted at all and the scheme is not used in this mode.

A similar, more complex scheme is needed to turn comments into values (the jsonComm.comm2json() function). From jsonComm of the form

 ... ,"some-key":"some-value" //comments_comments ...}...

it creates

 ... "some-key#":"comments_comments", "some-key":"some-value", ...}...

or the same without the comment-key line. If several comments in the text region belong to the pair, they are all copied into the value of "some-key#". But if a comment is not found in the region of a pair (inside an array, or before or after all the brackets), it is ignored. All comment characters must be converted to valid JSON: for example, a tab becomes "\t", "\" becomes "\\", and so on.
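The simplest case of this conversion, a // comment that follows a pair on the same line, can be sketched as follows. The function name trailingCommentsToPairs and its regexp are assumptions for illustration; the real jsonComm.comm2json() is described as more general (several comments per pair, block comments). JSON.stringify does the escaping of tabs, quotes and backslashes.

```javascript
// Hypothetical sketch; handles only `"key": scalar, // comment` on one line.
function trailingCommentsToPairs(text) {
  // Groups: indent, quoted key, the rest of the pair, the comment text.
  const pairWithComment =
    /^([ \t]*)("(?:\\.|[^"\\])*")(\s*:\s*(?:"(?:\\.|[^"\\])*"|[^"\s,\/]+)\s*,?)[ \t]*\/\/ ?(.*)$/gm;
  return text.replace(pairWithComment, (m, indent, key, rest, comment) => {
    // Build the "key#" name from the decoded key, then re-encode it.
    const commentKey = JSON.stringify(JSON.parse(key) + "#");
    // JSON.stringify turns the raw comment into a valid JSON string.
    return `${indent}${commentKey}: ${JSON.stringify(comment)},\n${indent}${key}${rest}`;
  });
}

const input = [
  "{",
  '  "some-key": "some-value", // tab\there, "quotes" and \\ backslash',
  '  "other": 1',
  "}",
].join("\n");

const parsed = JSON.parse(trailingCommentsToPairs(input));
console.log(parsed["some-key#"]);
```

Pairs without a comment are left untouched, so the output stays parseable by JSON.parse as long as no other comments remain.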

How to maintain jsonComm in practice?

Until now, the only thing we could write without problems and plug-ins was JSON, with all the caveats about the absence of comments, or with comments present but in the form of values (or we could edit JS as text files, or store settings in a database). Now we will use jsonComm files, which are editable (from NodeJS) and have the *.js extension.

Two practical niches for jsonComm files have emerged, reading configurations decorated with comments and updating configurations, plus one academic niche, the format converter.

If the files only need to be read (client JS, etc.), read them as xhr.responseText via AJAX or as *.js, and convert the jsonComm into objects, with validation by JSON.parse().

If the files need to be modified, then for speed we use a search-and-replace algorithm over unique keys that are not repeated in the file (jsonComm.change()). Then there is no need to build a parse tree while stepping around comments (although that need not be slow either; it is just a separate, complex one-pass algorithm).
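The search-and-replace idea can be illustrated with a short sketch. changeValues is a hypothetical function, not the jsonComm.change() code; it assumes each key is unique in the file, changes only scalar values (strings, numbers, true, false, null), and leaves comments and any surrounding JS untouched.

```javascript
// Hypothetical sketch of point-wise replacement by unique key.
function changeValues(text, changes) {
  let result = text;
  for (const [key, newValue] of Object.entries(changes)) {
    // Quote the key as JSON does, then escape regexp metacharacters in it.
    const quotedKey = JSON.stringify(key).replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
    // Match: "key", a colon, then a quoted string or a bare scalar.
    const pair = new RegExp(
      "(" + quotedKey + "\\s*:\\s*)" + '(?:"(?:\\\\.|[^"\\\\])*"|[^\\s,}\\]\\/]+)'
    );
    result = result.replace(pair, (m, head) => head + JSON.stringify(newValue));
  }
  return result;
}

const source = [
  "// build settings",
  "var config = {",
  '  "build": 41, // incremented by the builder',
  '  "title": "Demo" /* shown in the header */',
  "};",
].join("\n");

const updated = changeValues(source, { build: 42, title: "Demo 2" });
console.log(updated);
```

Because only the value token is rewritten, the file can be saved back with every comment in place; the price is that a key repeated elsewhere in the text (for example inside a comment) would be matched first.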

Adding Python-style single-line comments (#comments_comments) is no problem. But then reading the file as *.js will no longer work. The project code lets you disable the "#" comment syntax (at the initial stage, when the regexp is compiled).
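Choosing the comment syntax when the regexp is compiled might look like the sketch below. The factory makeStripper and its allowHash option are assumptions made for illustration, not the jsonComm API, and the pattern is a simplified one rather than the project's expression.

```javascript
// Hypothetical factory: the regexp is built once, with or without "#" comments.
function makeStripper({ allowHash = true } = {}) {
  const lineStart = allowHash ? "(?:\\/\\/|#)" : "\\/\\/";
  const pattern = new RegExp(
    // A captured string literal is kept; line and block comments are dropped.
    '("(?:\\\\.|[^"\\\\])*")|' + lineStart + "[^\\r\\n]*|\\/\\*[\\s\\S]*?\\*\\/",
    "g"
  );
  return (text) => text.replace(pattern, (m, str) => (str !== undefined ? str : ""));
}

const stripAll = makeStripper();                        // "//", "/* */" and "#"
const stripJsOnly = makeStripper({ allowHash: false }); // "//" and "/* */" only

const sample = '{"a": "#kept", # yaml style\n"b": 2 // js style\n}';
const parsed = JSON.parse(stripAll(sample));
console.log(parsed.a, parsed.b);
```

With allowHash set to false, a "#" comment stays in the text and JSON.parse rejects it, which is the behavior a file meant to run as *.js would want.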

Simple cases where modification is needed:

* In the project builder on Grunt / Gulp / ... we calculate the new version number and store it in the same configuration file.

* In the same builder, we create project constants based on other build parameters and write them as parameters for JS.

Slightly more complicated: client-side JS can also take on the job of writing such files by sending the result to the server. This opens even more use cases (a build panel on the client) while keeping the comments in the file. To do this, the client modifies the string (the image of a multi-line file) and sends it to the server, which then writes it to a file (with the security issues addressed, of course).

Implementation

To avoid spinning loops for long, the conversion is performed in one pass with regular expressions.

Biting out comments

The converter effectively works as a loop over lines, methodically biting out comments and skipping valid JSON fragments. On its basis it is easy to build recognizers of comment text in order to save the text into special key values. In this way we allow comments into further processing, not to "break compatibility" (although that is always possible if you wish), but so that code with a comment is more convenient to write than the less readable expression of 2 pairs, "key-comment" and "key-value".

Not

  "key_comment": "comment", "key": "value",

, but

  "key": "value" //comment

The solution also distributes responsibility for the validity of the code. Everything related to comments is checked visually by the developer, with the help of syntax highlighting in the IDE. The correctness of the remaining JSON is checked by the standard JSON.parse() parser.

Let's start simple. How does regexp-based parsing work? Let's try to delete end-of-line comments. (This code is not used further; it is only an example.)

('\n'+_).replace(/(^|\r?\n)(([^"\r\n:]*"(\\"|[^"\r\n])*"|[^"\r\n:]*):?([^"\r\n]*"[^"\r\n]*"|[^"\r\n]*)[^\/"\r\n]*?|[^\/"\r\n]*?)(\/\*\*\/|\/\*([\s\S]?(?!\*\/))+.{3}|\/\/[^\r\n]*)/g,'$1$2')

To understand how it is arranged, let's pay attention to the functional parts:

(^|\r?\n) - a capturing group that keeps the preceding line break.
The next parenthesis and its partner ... [^\/"\r\n]*?) form the second capturing group, used for copying.
"(\\"|[^"\r\n])*" - a key or a quoted string; if there are no quotes, the alternative is simply [^"\r\n:]*

\s*\/\*\*\/|\s*\/\*([\s\S]?(?!\*\/))+.{3} matches a multi-line comment.
\/\/[^\r\n]* - matches a single-line comment up to the end of the line.
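The two comment recognizers can be tried on their own. A small sketch follows: the patterns are the functional parts listed above (without the leading \s*), while the sample strings and variable names are made up for the demonstration.

```javascript
// Single-line comment: "//" and everything up to the end of the line.
const singleLine = /\/\/[^\r\n]*/;
// Multi-line comment: either the empty /**/ or "/*", a body, and the closing 3 characters.
const multiLine = /\/\*\*\/|\/\*([\s\S]?(?!\*\/))+.{3}/;

const line = '"some-key": "some-value", // trailing note';
const block = '"a": 1, /* first\n second */ "b": 2';

const foundLine = line.match(singleLine)[0];
const foundBlock = block.match(multiLine)[0];

console.log(JSON.stringify(foundLine));
console.log(JSON.stringify(foundBlock));
```

Note that the closing part .{3} cannot match a line break, so in this sketch the block comment ends with at least one ordinary character before the closing asterisk and slash.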

For biting out comments at the end of a line, this simple expression is fine. Biting out asterisk comments between keys and values is harder. You could simply ignore the case and not write such comments; the "competitor" YAML has only end-of-line comments anyway. But with the functional parts in hand, a more complex expression can be built so as not to impose such restrictions. It must keep not only "everything before the comment on the line" but also what lies between comments, so the retained fragments become more complicated. The following is, in fact, the whole of jsonComm.unComment(jsonCommString). The line can be copied into Gruntfile.js instead of pulling in a module to strip comments from a JSON string.

jsonCommString.replace(/(?:(?:((?:\{|,)\s*)(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*(\s*"(?:\\"|[^\r\n"])*"\s*)(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*(\s*:\s*)(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*(\s*(?:[0-9.eE+-]+|true|false|null|"(?:\\"|[^\r\n"])*"|(?!:\{|:\[))\s*)(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*(\s*(?:\}|(?!,))\s*)?)+?|(?:((?:\[|,)\s*)(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*(\s*(?:[0-9.eE+-]+|true|false|null|"(?:\\"|[^\r\n"])*"|(?!:\{|:\[))\s*)(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*(\s*(?:\]|(?!,))\s*)?)+?|(?:(?:\s*(?:\/\/|#)[^\r\n]*(\r?\n|$))*(?:\s*\/\*\*\/|\s*\/\*(?:[\s\S]?(?!\*\/))+.{3})*)*\s*)/g,'$1$2$3$4$5$6$7$8$9$10$11$12$13$14')

Non-capturing groups are used widely here so that only the capturing ones remain, which keeps the second argument of .replace() simple. (Life hack: these lines are best read in an editor that highlights matching brackets, for example one from JetBrains.)

This expression is enough to convert a jsonComm string to JSON. As the benchmarks show, the conversion runs quite fast: execution time is tens to hundreds of microseconds per page (it depends heavily on the complexity of the parse). Things get worse with the academic scenario of moving comments into JSON, where replace() needs a function.

So we have obtained valid JSON and solved the first part of the task: reading jsonComm.

Then validation of the remaining code is, as intended, delegated to the standard JSON.parse(), after which we get a data structure in JS.


jsonComm.change(h), where h is a hash of "key"-"new value" pairs (in the form {"key": "new value"}).


● comments into JSON (jsonComm.comm2json),
● YAML,
● "jsonComm - YAML" conversion,
● "jsonComm - XML" conversion.


[Figure: jsonComm.unComment benchmark results in Firefox 34]

In Chrome the same test gives results about twice as good.

How are comments parsed (jsonComm.comm2json)? Here the replacement works through replace(regexp, function).


● jsonComm project (Github).
● JSON specifications: rfc-4627, rfc-7159 (2014).
● The question about comments in JSON on SO.
● JSON.minify() on Github.
● Plugins for grunt, gulp, broccoli; strip-json-comments (Github).
● JSON Comments.
● JSON (2011).
● JSON5.
● Hjson, the Human JSON (Hjson keeps my comments when updating a config file).

Source: https://habr.com/ru/post/247473/
