
JSON Schema and its use for validating JSON documents in C++

This article describes the JSON Schema standard and how to use it in C++, with the valijson library, to check that a document conforms to a given format.

A bit of history

First, let us recall what led to JSON widely displacing XML, and what was bad about it. XML was originally created as a meta-language for document markup, allowing a single parser and document validator to be reused. Being the first standard of its kind, and arriving during a period of rapid adoption of digital corporate information systems, XML became the basis for countless data serialization standards and communication protocols, that is, for storing and transmitting structured data, even though it was created primarily for marking up documents.

Being developed by committees, the XML standard grew many extensions that make it possible, in particular, to avoid name conflicts and to run complex queries against XML documents. Most importantly, since the resulting jumble of tags turned out to be completely unreadable for humans, the XML Schema standard was developed and widely adopted; it allows the valid content of each document to be described completely and strictly, in XML itself, for subsequent automatic verification.

Meanwhile, more and more developers, under the influence of emerging interactive web technologies, became acquainted with JavaScript and began to realize that representing structured objects as text does not require studying hundreds of pages of XML specifications. When Douglas Crockford proposed standardizing a subset of JavaScript for serializing objects (not for marking up documents!) independently of the language, the community supported the idea. Today JSON is one of the two languages (along with XML) supported by all popular programming technologies. YAML, designed to make JSON more convenient and human-readable, is not as widespread because of its complexity (that is, its breadth of features); at my company we recently had problems working with YAML from MATLAB, whereas with JSON everything is fine.

So, having started to use JSON for data representation on a large scale, developers faced the need to check document contents by hand, reinventing the validation logic in each language every time. This could not fail to infuriate people familiar with XML Schema. Gradually a similar standard, JSON Schema, did take shape; it lives at http://json-schema.org/ .

JSON Schema

Consider an example of a simple but illustrative schema that defines a dictionary of 2D or 3D geometric points in the space [-1, 1] x [-1, 1] x [-1, 1], with keys consisting of digits:

    {
        "type": "object",
        "patternProperties": {
            "^[0-9]+$": {
                "type": "object",
                "properties": {
                    "value": { "type": "number", "minimum": 0 },
                    "x": { "$ref": "#/definitions/point_coord" },
                    "y": { "$ref": "#/definitions/point_coord" },
                    "z": { "$ref": "#/definitions/point_coord" }
                },
                "required": ["value", "x", "y"]
            }
        },
        "additionalProperties": false,
        "definitions": {
            "point_coord": { "type": "number", "minimum": -1, "maximum": 1 }
        }
    }

If we forgive Crockford the annoying quotes, it should be clear from this document that we expect an object (dictionary) whose keys must consist of digits (see the regular expression) and whose values must have the fields x, y and value, and optionally z, where value is a non-negative number, and x, y and z all have the type point_coord, a number from -1 to +1. Even if JSON Schema offered nothing else (which is far from the truth), this would already be enough for many use cases.
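To make the meaning of these constraints concrete, here is a minimal hand-written sketch of the same checks in plain C++, with no JSON library involved. The names Fields, key_is_valid, is_point_coord and point_is_valid are hypothetical, and the sketch assumes that each dictionary value has already been read into a map from field name to number; it illustrates what the schema expresses, not how a schema validator works internally.

```cpp
#include <initializer_list>
#include <map>
#include <regex>
#include <string>

// Hypothetical in-memory form of one dictionary value: field name -> number.
using Fields = std::map<std::string, double>;

// "patternProperties": keys must match ^[0-9]+$ (digits only).
bool key_is_valid(const std::string& key) {
    static const std::regex digits("^[0-9]+$");
    return std::regex_search(key, digits);
}

// "#/definitions/point_coord": a number between -1 and 1, bounds included.
bool is_point_coord(double v) { return v >= -1.0 && v <= 1.0; }

// "required": value, x and y must be present; z is optional.
// "value" has "minimum": 0; x, y and z must be point_coord.
// Other fields are ignored, since the inner object does not forbid them.
bool point_is_valid(const Fields& f) {
    for (const char* name : {"value", "x", "y"})
        if (f.count(name) == 0) return false;
    if (f.at("value") < 0.0) return false;
    for (const char* name : {"x", "y", "z"}) {
        auto it = f.find(name);
        if (it != f.end() && !is_point_coord(it->second)) return false;
    }
    return true;
}
```

Even for this tiny format the checks take a couple of dozen lines, and they would have to be rewritten for every language that consumes the document, which is exactly the duplication a schema removes.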

But all this holds only if a validator has been implemented for your language or platform. With XML this question would hardly ever arise.

The http://json-schema.org/ site has a list of validation software. And this is where the immaturity of JSON Schema (and of its site) makes itself felt. For C++ there is a single (seemingly interesting) library, libvariant, for which validation is only a sideline and which is released under the troublesome LGPL license (goodbye, iOS). For C there is also a single option, also under the LGPL.

However, an acceptable solution exists, and it is called valijson. This library has everything we need (schema validation and a BSD license) and more: independence from the JSON parser. Valijson works with any JSON parser through an adapter (adapters for jsoncpp, json11, rapidjson, picojson and boost::property_tree are bundled), so you need not switch to a new JSON library (or drag another one along). In addition, it is header-only and needs no separate compilation. There is only one obvious drawback, and not everyone will see it as one: the dependency on Boost. Though there is hope of getting rid of even that.

Let us walk through creating a JSON schema for an example document and validating that document.

Schema example

Suppose we have a table of striped objects, each with a specific stripe coloring (given as a sequence of 0s and 1s, corresponding to black and white).

    {
        "0inv": { "width": 0.11, "stripe_length": 0.15, "code": "101101101110" },
        "0": { "width": 0.05, "stripe_length": 0.11, "code": "010010010001" },
        "3": { "width": 0.05, "stripe_length": 0.11, "code": "010010110001" },
        ...
    }

Here we have a dictionary with numeric keys, to which the suffix “inv” can be appended (for inverted bar codes). All values in the dictionary are objects and must have the fields “width” and “stripe_length” (strictly positive numbers) and “code” (a string of zeros and ones of length 12).

We start building the schema by specifying restrictions on the format of the top-level field names:

    {
        "comment": "Schema for the striped object specification file",
        "type": "object",
        "patternProperties": {
            "^[0-9]+(inv)?$": { }
        },
        "additionalProperties": false
    }

Here we used the patternProperties construct, which allows and constrains values whose keys match the regular expression. With "additionalProperties": false we also stated that keys not covered by the schema are forbidden. additionalProperties can do more than enable or disable such fields: given a schema as its value, it imposes restrictions on their values, like so:

    {
        "additionalProperties": {
            "type": "string",
            "pattern": "^Comment: .*$"
        }
    }
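The two patterns above can be tried out by hand with std::regex, whose default grammar is ECMAScript, the same dialect JSON Schema recommends for its regular expressions. This is only a sketch for experimenting with the patterns, not a replacement for a validator; the function names key_matches_schema and additional_value_allowed are hypothetical. Note that JSON Schema patterns are not implicitly anchored, so the sketch uses std::regex_search and relies on the ^ and $ written in the patterns themselves.

```cpp
#include <regex>
#include <string>

// Key pattern from "patternProperties" in the table schema.
static const std::regex kKeyPattern("^[0-9]+(inv)?$");

// Value pattern from the "additionalProperties" variant shown above.
static const std::regex kCommentPattern("^Comment: .*$");

// True if a top-level key is covered by "patternProperties".
// With "additionalProperties": false, any other key is rejected.
bool key_matches_schema(const std::string& key) {
    return std::regex_search(key, kKeyPattern);
}

// True if a string value would satisfy the "pattern" constraint that
// the second schema places on additional properties.
bool additional_value_allowed(const std::string& value) {
    return std::regex_search(value, kCommentPattern);
}
```

For example, key_matches_schema accepts "0" and "0inv" but rejects "inv" and "3x", and additional_value_allowed accepts "Comment: measured by hand" but rejects a bare "measured by hand".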

Next, we describe the type of the value of each object in the dictionary:

    {
        "type": "object",
        "properties": {
            "width": { "type": "number", "minimum": 0, "exclusiveMinimum": true },
            "stripe_length": { "type": "number", "minimum": 0, "exclusiveMinimum": true },
            "code": { "type": "string", "pattern": "^[01]{12}$" }
        },
        "required": ["width", "stripe_length", "code"]
    }

Here we explicitly list the allowed fields (properties) and require their presence (required), without forbidding additional properties (they are allowed by default). Our numeric properties are strictly positive, and the code string must match the regular expression.
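As a rough illustration of what these constraints mean, the following sketch re-implements them by hand for a single object. The struct StripedObject and the function names are hypothetical, and the struct assumes all three required fields have already been read, so the "required" check itself is not modelled here.

```cpp
#include <regex>
#include <string>

// Hypothetical struct mirroring one value of the dictionary.
struct StripedObject {
    double width;
    double stripe_length;
    std::string code;
};

// "minimum": 0 together with "exclusiveMinimum": true means
// strictly greater than zero: zero itself is rejected.
bool is_strictly_positive(double v) { return v > 0.0; }

// "pattern": "^[01]{12}$" means exactly twelve characters,
// each of them '0' or '1'.
bool is_valid_code(const std::string& code) {
    static const std::regex pattern("^[01]{12}$");
    return std::regex_search(code, pattern);
}

// All three constraints of the object schema combined.
bool is_valid(const StripedObject& o) {
    return is_strictly_positive(o.width)
        && is_strictly_positive(o.stripe_length)
        && is_valid_code(o.code);
}
```

With this sketch, the first entry of the example table (width 0.11, stripe_length 0.15, code "101101101110") passes, while a width of 0 or a code of any length other than 12 fails.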

In principle, all that remains is to insert the description of an individual object's type into the table schema above. But before doing so, note that we have duplicated the specification of the "width" and "stripe_length" fields. In the real code from which this example was taken there are even more such fields, so it would be useful to define this type once and then refer to it wherever needed. This is what the reference mechanism ($ref) is for. Pay attention to the definitions section in the final schema:

    {
        "comment": "Schema for the striped object specification file",
        "type": "object",
        "patternProperties": {
            "^[0-9]+(inv)?$": {
                "type": "object",
                "properties": {
                    "width": { "$ref": "#/definitions/positive_number" },
                    "stripe_length": { "$ref": "#/definitions/positive_number" },
                    "code": { "type": "string", "pattern": "^[01]{12}$" }
                },
                "required": ["width", "stripe_length", "code"]
            }
        },
        "additionalProperties": false,
        "definitions": {
            "positive_number": { "type": "number", "minimum": 0, "exclusiveMinimum": true }
        }
    }

Save it to a file and start writing the validator.
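A reference such as "#/definitions/positive_number" is simply a path inside the same schema document: the leading # means "this document", and each following /-separated token names a key to descend into. The sketch below splits such a reference into its tokens. The function split_local_ref is hypothetical and deliberately limited: it only handles same-document references and ignores the escape sequences that the pointer syntax allows, so it is an illustration of the idea rather than of how valijson resolves references.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Split a local reference such as "#/definitions/positive_number" into
// its path tokens: {"definitions", "positive_number"}.
// Only same-document references ("#/...") are handled in this sketch.
std::vector<std::string> split_local_ref(const std::string& ref) {
    if (ref.compare(0, 2, "#/") != 0)
        throw std::invalid_argument("not a local reference: " + ref);

    std::vector<std::string> tokens;
    std::string::size_type start = 2;  // skip the leading "#/"
    while (true) {
        const std::string::size_type end = ref.find('/', start);
        if (end == std::string::npos) {
            tokens.push_back(ref.substr(start));  // last token
            break;
        }
        tokens.push_back(ref.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}
```

Resolving the reference then amounts to starting at the schema root and looking up each token in turn: first "definitions", then "positive_number".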

Using valijson

As the JSON parser we use jsoncpp. We have the usual function for loading a JSON document from a file:

    #include <fstream>
    #include <stdexcept>
    #include <string>

    #include <json-cpp/json.h>

    // Load and parse a JSON document from a file; throws on a parse error.
    Json::Value load_document(std::string const& filename)
    {
        Json::Value root;
        Json::Reader reader;
        std::ifstream ifs(filename, std::ifstream::binary);
        // The third argument tells the reader not to collect comments.
        if (!reader.parse(ifs, root, false))
            throw std::runtime_error("Unable to parse " + filename + ": "
                                     + reader.getFormatedErrorMessages());
        return root;
    }

A minimal validation function that reports the location of every validation error looks like this:

    #include <sstream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    #include <valijson/adapters/jsoncpp_adapter.hpp>
    #include <valijson/schema.hpp>
    #include <valijson/schema_parser.hpp>
    #include <valijson/validation_results.hpp>
    #include <valijson/validator.hpp>

    // Validate root against schema_js; throws std::runtime_error
    // listing every validation error if the document does not conform.
    void validate_json(Json::Value const& root, Json::Value const& schema_js)
    {
        using valijson::Schema;
        using valijson::SchemaParser;
        using valijson::Validator;
        using valijson::ValidationResults;
        using valijson::adapters::JsonCppAdapter;

        // Wrap the jsoncpp values in valijson adapters.
        JsonCppAdapter doc(root);
        JsonCppAdapter schema_doc(schema_js);

        // Build the schema object from its JSON description.
        SchemaParser parser(SchemaParser::kDraft4);
        Schema schema;
        parser.populateSchema(schema_doc, schema);

        Validator validator(schema);
        validator.setStrict(false);
        ValidationResults results;
        if (!validator.validate(doc, &results))
        {
            std::stringstream err_oss;
            err_oss << "Validation failed." << std::endl;
            ValidationResults::Error error;
            int error_num = 1;
            // Pop the errors one by one and describe each of them.
            while (results.popError(error))
            {
                // The context is the path to the offending element.
                std::string context;
                std::vector<std::string>::iterator itr = error.context.begin();
                for (; itr != error.context.end(); ++itr)
                    context += *itr;

                err_oss << "Error #" << error_num << std::endl
                        << "  context: " << context << std::endl
                        << "  desc:    " << error.description << std::endl;
                ++error_num;
            }
            throw std::runtime_error(err_oss.str());
        }
    }

Note that in this example jsoncpp is included as #include <json-cpp/json.h>, while valijson/adapters/jsoncpp_adapter.hpp in the current version of valijson assumes that jsoncpp is included as #include <json/json.h>. So do not be surprised if the compiler cannot find json/json.h; just correct valijson/adapters/jsoncpp_adapter.hpp.

Now we can load and validate documents:

    Json::Value const doc = load_document("/path/to/document.json");
    Json::Value const schema = load_document("/path/to/schema.json");
    try
    {
        validate_json(doc, schema);
        ...
        return 0;
    }
    catch (std::exception const& e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return 1;
    }

That is it: we have learned to validate JSON documents. But note that we now have to think about where to store the schemas! While the document changes every time and comes, for example, from a web request or a command-line argument, the schema is fixed and must ship with the application. For small programs without an established mechanism for loading static resources, having to introduce one is a significant barrier to adopting schema-based validation. It would be great to compile the schema into the program, since a change to the schema will in any case require changing the code that processes the document.

This is possible, and even quite convenient, if we have C++11 at our disposal. The solution is primitive but works great: we simply define a string constant containing our schema. And to avoid escaping the quotes inside the string, we use a raw string literal:

    // Use the C++11 syntax R"(raw string)"
    static std::string const MY_SCHEMA = R"({
        "comment": "Schema for pole json specification",
        "type": "object",
        "patternProperties": {
            "^[0-9]+(inv)?$": {
                ...
                ...
            }
        }
        ...
    })";

    // Parse JSON from a string
    Json::Value json_from_string(std::string const& str)
    {
        Json::Reader reader;
        std::stringstream schema_stream(str);
        Json::Value doc;
        if (!reader.parse(schema_stream, doc, false))
            throw std::runtime_error("Unable to parse the embedded schema: "
                                     + reader.getFormatedErrorMessages());
        return doc;
    }

    // Validate doc (validate_json throws on failure)
    validate_json(doc, json_from_string(MY_SCHEMA));
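To see what the raw string literal buys us, compare the same schema fragment written as an ordinary literal and as a raw one. The sketch below uses only the standard library; the constant names are hypothetical. It also shows one caveat worth knowing: if the embedded text itself contains the sequence )" (which can easily happen with regular expressions inside a schema), the plain R"( ... )" form would end too early, and a custom delimiter such as R"json( ... )json" is needed.

```cpp
#include <string>

// Ordinary literal: every inner quote must be escaped by hand.
static const std::string kEscaped =
    "{ \"type\": \"string\", \"pattern\": \"^[01]{12}$\" }";

// Raw literal: the text between R"( and )" is taken verbatim.
static const std::string kRaw =
    R"({ "type": "string", "pattern": "^[01]{12}$" })";

// Raw literal with a custom delimiter ("json" is an arbitrary choice).
// The pattern below contains )" and so would terminate a plain R"( ... )".
static const std::string kDelimited =
    R"json({ "pattern": "^(a|b)" })json";
```

Here kEscaped and kRaw hold exactly the same characters, so either one could be passed to a function such as json_from_string; the raw form is simply much easier to read and to paste from a schema file.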

Thus we have a convenient cross-platform, cross-language validation mechanism for JSON documents, whose use in C++ requires neither linking external libraries with inconvenient licenses nor fiddling with paths to static resources. It can really save a lot of effort and, importantly, help to finally kill XML as an object representation format, since it is inconvenient for both people and machines.

Source: https://habr.com/ru/post/276305/

