Using JSON-Schema in API Testing and Documentation

The 2GIS reference API has been in development for 4 years. It has about 40 methods that return fairly large, hierarchically complex structures in JSON and XML. Recently I decided to share my experience and gave a talk at the DevConf conference.
One topic of the talk drew the most interest from participants: using JSON-Schema to test the format of API responses. In this article I will explain what problems this approach solves, what limitations it has, what you get out of the box, and what comes as a bonus. Let's go!

JSON Schema is the JSON counterpart of the XML Schema format. In essence, it describes the structure of a document declaratively. Nothing prevents JSON schemas from being used for testing.

It might seem that testing an API poses no difficulties. It is not some sophisticated UI, after all: execute a request and compare the result with the expected one.
What spoils the picture is the cumbersome hierarchical structures that can appear in requests or responses. When there are around 50 parameters, it is extremely hard to cover all the typical cases.
Besides doing the job it was built for, an API is expected to be reasonable and consistent, and that includes its format. Numbers should always be returned as numbers, not as strings. If an array is empty, it should not be null or missing from the response; it should be an empty array. These are small things, of course, but API users find it very unpleasant to trip over them, and they can lead to hidden errors. So the strictness of the format deserves attention.

In general, testing structure and format is not a hard problem. For each method, the structure of the request and response can be described in the JSON-Schema format.
As the saying goes, everything has already been invented before us; we just need to use it properly.

API format testing

Request and Reply

We will look at testing the format of API responses. That is simply closer to us, because our API is focused only on reading data. But when an API accepts data as complex objects, the basic approach is the same as for reading.

JSON-Schema

Format

So, JSON-Schema will help us in this hard task. A brief introduction to the format itself already exists, so we will limit ourselves to a trivial example. Take a JSON object:

{ "a":10, "b":"boo" }

To define its structure, the parameter types, and which parameters are required, it is enough to replace the parameter values with special objects. Take a look:

 {
     "a": { "type": "number", "required": true },
     "b": { "type": "string", "required": true }
 }

Quite clear, you must agree. By and large, JSON-Schema aims to be an analogue of the XML Schema format.
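
To make the idea concrete, here is a minimal Python sketch (the article's own snippets are in PHP) that checks a flat object against field descriptors like the ones in the trivial example. The function name check and the TYPES table are illustrative assumptions, and only the type and required keywords are handled; a real JSON-Schema library covers far more.

```python
import json

# Map schema type names to Python types (assumption: only these two are needed here).
TYPES = {"number": (int, float), "string": str}

def check(data, fields):
    """Return a list of readable errors; an empty list means the object is valid."""
    errors = []
    for name, rule in fields.items():
        if name not in data:
            if rule.get("required"):
                errors.append(f"{name}: required field is missing")
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPES[rule["type"]]):
            errors.append(f"{name}: expected {rule['type']}")
    return errors

fields = json.loads(
    '{"a": {"type": "number", "required": true},'
    ' "b": {"type": "string", "required": true}}'
)

print(check(json.loads('{"a": 10, "b": "boo"}'), fields))  # valid object
print(check(json.loads('{"a": "10"}'), fields))            # number sent as a string, "b" missing
```

The second call reports both problems at once, which is the kind of feedback that makes format violations cheap to find.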

PHP implementation

Of course, you do not have to use JSON-Schema to describe a response. The format is relatively young and has not fully settled, and there are fewer libraries and tools for it than for XML Schema. But for a great many developers JSON is much easier to read. And as for PHP, JSON is almost native to it thanks to the json_encode and json_decode functions.

JSON-Schema implementations are available for various languages. As you can see, the choice is not large. For PHP there are two libraries from two institutions: MIT and Berkeley.
At the time of writing, the last commit to the MIT one was in February 2012, and to the Berkeley one in June 2013. So a quick review led us to choose Berkeley.

As a fly in the ointment, note that by default objects in the response may contain not only the fields described in the schema but also entirely extraneous ones. Such a response is still considered valid, which in most cases is unacceptable. We cure this (a misunderstanding, in my opinion) with a small preprocessor that explicitly sets the special property additionalProperties to false by default.
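
Such a preprocessor might look roughly like the following Python sketch (Python is used here for illustration; the article's own tooling is PHP and is not shown). The function name close_objects and the sample schema are hypothetical. The sketch assumes schemas are plain nested dicts and treats any dict whose "type" is "object" as an object schema.

```python
def close_objects(schema):
    """Return a copy of the schema in which every object schema forbids
    unknown fields, unless it already sets additionalProperties itself."""
    if isinstance(schema, list):
        return [close_objects(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    result = {key: close_objects(value) for key, value in schema.items()}
    if result.get("type") == "object" and "additionalProperties" not in result:
        result["additionalProperties"] = False
    return result

# Hypothetical schema: a nested object and an array of deliberately open objects.
schema = {
    "type": "object",
    "properties": {
        "point": {"type": "object", "properties": {"lat": {"type": "number"}}},
        "tags": {
            "type": "array",
            "items": {"type": "object", "additionalProperties": True},
        },
    },
}

closed = close_objects(schema)
print(closed["additionalProperties"])                         # False
print(closed["properties"]["point"]["additionalProperties"])  # False
print(closed["properties"]["tags"]["items"]["additionalProperties"])  # True, explicit value kept
```

Because the function copies instead of mutating, the schema files on disk stay as the developer wrote them, and an explicit additionalProperties always wins over the default.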

Validation

There is little point in validating API responses in production: it is extra overhead. But when we run tests, it is exactly what we need. A flag in the application's configuration file is enough to switch modes.
As for the Berkeley library specifically, here is a validation example from our tests:

 $validator = new Validator();
 // $input is the data being checked, $schema is the JSON schema
 $validator->check(json_decode($input), json_decode($schema));
 $this->assertTrue($validator->isValid(), print_r($validator->getErrors(), true));

It is clear even intuitively what is going on.

Note that there is no need to write tests specifically for checking JSON schemas. We assume that functional tests for the API already exist. So if validation is turned on, the response format is checked automatically in the background whenever the tests run, as if in passing. No special effort is needed beyond writing the schemas themselves.

One small addition: for performance tests, validation should be disabled so that it does not distort the measurements. On developers' machines, on the contrary, turn it on to catch format violations quickly. Who should write the JSON schemas? In our case the developers do, since that is most convenient. The schemas live in the project code, and whenever the format changes, the developer updates the schema right away.

Documentation

We have dealt with testing: we hook validation into the tests and gain confidence that the format is correct. But there is another big bonus of using JSON schemas that we arrived at: documenting the API.

The problem with documentation is a single one: damn it, it has to be written. API documentation is the API's specification, and every nuance must be reflected in it. How else? An API does not exist for its own sake; it exists only so that its clients can work with it. The fewer surprises it holds, the clearer it is and the faster and more comfortably users can work with it. Perhaps the only case where you can skimp on documentation is when the API's developers are also its users, and that is not always so. So it is better to have documentation. But if the total number of parameters runs into the hundreds, how do you keep the documentation up to date? On top of writing code and tests, we have to keep track of the documentation too. In short, one more chore, and there are plenty of those already.

Versioning

So what about versioning the documentation? Like many teams, we have adopted the approach where each feature is developed in a separate git branch.
It is good when each branch has its own version of the documentation. That is easier for everyone, developer and tester alike: you finish the task in the branch, write the documentation right away, and "forget" about the task.

That is why it is good when the documentation lives with the code. And to make it easier to merge the same document from different tasks, it makes sense to store it in some text format: Markdown, semantic BB codes, or something else. But in essence it is still almost plain text, with assorted tables, links between them, cross-references, and so on. It is not entirely clear how to test that such documentation is correct. First, carefully checking everything by hand every time is tedious. Second, errors still slip through.

JSON-Schema

All the same, documentation cannot be abandoned completely, but the effort needed to maintain it can be greatly reduced. We already have a JSON schema. It already contains the names of all parameters in the API response, the hierarchical relationships between them, the data types, and whether each parameter is required. This is exactly the information needed to generate documentation.

So we decided to cross a bulldog with a rhinoceros and write the documentation right in the JSON schema. The JSON-Schema format already provides a description property, but according to the specification it must be a string. We would also like to add examples (a nested examples property) and some specific options, such as marking a parameter as secret (a nested hide property), and so on. An object-valued property suits this better. We chose the name "meta" for it. Example:

 "reviews_count": {
     "type": "number",
     "required": true,
     "meta": {
         "description": " ",
         "hide": true
     }
 }

Representation

Now we feed our JSON schema to a special parser, and it turns into elegant documentation. When the documentation is generated, it takes into account not only the contents of our own "meta" property but also the information in the schema's native properties.
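
The article does not show the parser itself, so the following is only a rough Python sketch of the idea (Python for illustration; the real tool is not shown). The function names, the table layout, and every sample field except reviews_count are assumptions. It combines the native type and required properties with the custom "meta" object and skips parameters marked with hide.

```python
def doc_rows(properties):
    """Turn schema properties into (name, type, required, description) rows."""
    rows = []
    for name, rule in properties.items():
        meta = rule.get("meta", {})
        if meta.get("hide"):
            continue  # hidden parameters stay out of the public documentation
        types = rule.get("type", "any")
        if isinstance(types, list):
            types = " or ".join(types)
        required = "yes" if rule.get("required") else "no"
        rows.append((name, types, required, meta.get("description", "")))
    return rows

def render_table(rows):
    """Render rows as a plain-text table with a header line."""
    header = ("Parameter", "Type", "Required", "Description")
    return "\n".join(" | ".join(row) for row in [header, *rows])

# Hypothetical schema fragment with documentation stored in "meta".
properties = {
    "name": {"type": "string", "required": True,
             "meta": {"description": "Company name"}},
    "floors": {"type": ["number", "null"], "required": True,
               "meta": {"description": "Number of floors, null if unknown"}},
    "reviews_count": {"type": "number", "required": True,
                      "meta": {"description": "Number of reviews", "hide": True}},
}

print(render_table(doc_rows(properties)))
```

Keeping row extraction separate from rendering means the same rows could feed an HTML table or any other presentation.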

The specific way of presenting the documentation may vary. We chose ordinary flat tables with links between them, though this is certainly not ideal. In any case, keeping the documentation in the JSON schema does not tie you to a particular presentation, which leaves more freedom.

For maximum flexibility, the documentation should not be built entirely from the JSON schema. Our established approach is to keep a separate text file for each documentation page and insert a link to a specific JSON schema into it. When the page is built, the JSON schema is converted to its final text form. This lets us place arbitrary text, examples, and other material in the documentation without trying to cram everything into the JSON schema.
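
A page build step of this kind could be sketched in Python as follows (for illustration only). The {{schema:...}} placeholder syntax, the build_page function, and the file name firm.json are all invented here; the article does not say how its links to schemas are written.

```python
import re

# Hypothetical placeholder syntax for a link to a schema inside a page.
PLACEHOLDER = re.compile(r"\{\{schema:([\w./-]+)\}\}")

def build_page(page_text, load_schema, render):
    """Replace each schema placeholder with the rendered text of that schema."""
    def replace(match):
        return render(load_schema(match.group(1)))
    return PLACEHOLDER.sub(replace, page_text)

# Stand-ins for schema files on disk and for the real documentation renderer.
schemas = {"firm.json": {"name": {"type": "string"}}}

def render(schema):
    return "\n".join(f"{name}: {rule['type']}" for name, rule in schema.items())

page = "Firm profile\n\n{{schema:firm.json}}\n\nSee also: search."
print(build_page(page, schemas.__getitem__, render))
```

The free text around the placeholder is untouched, so hand-written explanations and generated parameter tables can share one page.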

Self test

It turns out that the same JSON schemas are used for both tests and documentation. This means the JSON schemas are always up to date and correct; otherwise the tests would fail. So the documentation, in principle, cannot contain errors in parameter names, their types, whether they are required, the lists of allowed values, or the hierarchical relationships between parameters. Which, you must agree, is no small thing.

Examples

Documentation without usage examples greatly raises the entry threshold for API users, so examples are a must. We organize them as follows: the documentation describes a request and a sample response.

But, as you might guess, the problem of keeping the response text up to date arises. A week passes, the format is extended, and then what, redo the examples? There is a better way.

We concluded that the query results in documentation examples should be dynamic. Different approaches are possible here. Since our API responses are often very large, the documentation shows them only after a click on a specific area, and that is the moment when we execute the request. It is the simplest scheme, at the cost of a slight delay in getting the data.

If this option does not suit you, a special command can execute all the example requests and insert the responses into the documentation pages. You can run it, for example, before a release.

By the way, since we write examples anyway, they can also serve as a kind of functional test. Indeed: we collect all the requests from the documentation, turn on JSON schema validation, and check that the responses are valid. This also helps to detect broken methods or incorrectly written requests.
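
As a rough Python sketch of that check (for illustration only), the loop below runs every documented request and validates the response. The names check_examples, fetch, and missing_required, as well as the request strings, are hypothetical; fetch is a fake that stands in for a real HTTP call, and the toy validator stands in for a JSON-Schema library.

```python
def check_examples(examples, fetch, validate):
    """Run every documented request and collect failures keyed by request."""
    failures = {}
    for request, schema in examples:
        try:
            payload = fetch(request)
        except Exception as error:  # a broken method or a mistyped request
            failures[request] = [f"request failed: {error}"]
            continue
        errors = validate(payload, schema)
        if errors:
            failures[request] = errors
    return failures

def missing_required(payload, schema):
    """Toy validator: report missing required fields."""
    return [
        f"{name}: required field is missing"
        for name, rule in schema.items()
        if rule.get("required") and name not in payload
    ]

# Fake API responses used instead of real network calls.
responses = {"/firm?id=1": {"id": 1}, "/firm?id=2": {}}

def fetch(request):
    if request not in responses:
        raise LookupError("no such method")
    return responses[request]

firm_schema = {"id": {"type": "number", "required": True}}
examples = [("/firm?id=1", firm_schema), ("/firm?id=2", firm_schema), ("/broken", firm_schema)]

for request, errors in check_examples(examples, fetch, missing_required).items():
    print(request, errors)
```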

Empty values

Speaking of format validation, you need to decide what to do with parameters that have empty values in the response. We arrived at the following convention.

A parameter must always be present in the response, even if its value is empty. This makes the response structure easier to see, and there is no need to consult the documentation just to find out whether a parameter exists.

A parameter of array type returns an empty array; an object returns null. For numeric and string parameters, if zero and an empty string are meaningful values, we return them. For example, the "number of reviews" parameter may well return zero, which is logical. But for the "number of floors in the building" parameter, zero is nonsense. If the number of floors of some building is unknown, we return null.
Where null is allowed and where it is not is stated explicitly in the JSON schema, and therefore in the documentation. Your approach may differ, as long as it is consistent across all parameters; otherwise the API's end users will get a headache.
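
The convention can be expressed with type unions in the schema, as in this Python sketch (for illustration only). The field names floors, photos, and owner and both function names are hypothetical, and the checker handles only the type keyword. Every declared parameter must be present, and null passes only where the schema lists "null" among the allowed types.

```python
# Map JSON type names to Python types.
JSON_TYPES = {
    "number": (int, float), "string": str, "array": list,
    "object": dict, "boolean": bool, "null": type(None),
}

def matches(value, type_names):
    """True if the value has one of the allowed JSON types."""
    if isinstance(type_names, str):
        type_names = [type_names]
    if isinstance(value, bool):  # bool is an int in Python, so handle it first
        return "boolean" in type_names
    return any(isinstance(value, JSON_TYPES.get(name, ())) for name in type_names)

def check_empty_values(response, properties):
    """Every declared parameter must be present and of an allowed type."""
    errors = []
    for name, rule in properties.items():
        if name not in response:
            errors.append(f"{name}: must be present even when empty")
        elif not matches(response[name], rule["type"]):
            errors.append(f"{name}: {response[name]!r} is not allowed by the schema")
    return errors

properties = {
    "reviews_count": {"type": "number"},      # zero is meaningful, null is not
    "floors": {"type": ["number", "null"]},   # null means "unknown"
    "photos": {"type": "array"},              # empty list, never null
    "owner": {"type": ["object", "null"]},    # an absent object is null
}

good = {"reviews_count": 0, "floors": None, "photos": [], "owner": None}
bad = {"reviews_count": None, "photos": None, "owner": None}

print(check_empty_values(good, properties))
for error in check_empty_values(bad, properties):
    print(error)
```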

Conclusion

JSON-Schema saves valuable time in testing and documenting APIs. This is a case where a small effort brings a large return, and the larger the API, the more effort this approach saves.

However, you do have to make a one-time investment in writing a small toolkit for working with JSON schemas.

Besides saving time on tests, it becomes easier to maintain backward compatibility in the API, since the API format is expressed explicitly in the JSON schema. The heavy burden of documentation turns into light hand luggage.

Source: https://habr.com/ru/post/186768/
