Hello! At the recent Superjob IT Meetup, I gave a talk about how we at Superjob develop the API for a project with an audience of a million users and a bunch of different platforms.
In this article I would like to explain why we could not settle on any of the dozens of ready-made solutions, how painful it was to write our own, and what awaits you if you decide to follow our path. Read on if you are interested.

Instead of an intro
The history of the API at Superjob began with a clunky XML API. From there we moved to concise JSON, and later, tired of arguing about which is more correct, {success: true} or {result: true}, we adopted the JSON API. Over time we dropped some of its features, agreed on data formats, and wrote our own version of the spec, which kept backward compatibility with the original. It is this spec that the latest, third version of our API runs on, and we are gradually migrating all our services to it.
For our purposes, where most endpoints in the API accept or return objects of some kind, the JSON API turned out to be an almost perfect fit. At the heart of this spec are entities and their relationships. Entities are typed, have a fixed set of attributes and relationships, and are essentially very similar to the models we are used to working with in code. Entities are handled according to REST principles; there is no extra protocol on top of HTTP, as there is in, say, SOAP or JSON-RPC. The request format almost exactly mirrors the response format, which makes life much easier for both the server and the client. For example, a typical JSON API response looks like this:
{ "data": { "type": "resume", "id": 100, "attributes": { "position": "" }, "relationships": { "owner": { "data": { "type": "user", "id": 200 } } } }, "included": [ { "type": "user", "id": 200, "attributes": { "name": " " } } ] }
Here we see an entity of type resume with an owner relationship to an entity of type user. If the client wanted to send us such an entity, it would put exactly the same JSON in the request body.
The first steps
Initially, the implementation of our API was very naive: endpoint responses were assembled right inside the actions, data from the client was extracted with a small add-on over Yii1 (the framework our server application runs on), and the documentation lived in a separate file that was filled in by hand.
With the move to the JSON API, we turned that add-on into a full-fledged framework that handled the transformation (mapping) of models into entities and also managed the transport layer (parsing requests and building responses).
To map a model to an entity, you had to write two extra classes: a DTO for the entity and a hydrator that filled the DTO with data from the model. This approach made the mapping process quite flexible, but in practice that flexibility turned out to be evil: our hydrators gradually became overgrown with copy-paste, and the need to create two more classes for every model bloated our code base.
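To give a sense of the overhead, here is a minimal sketch of that pattern; the Resume model and both classes are hypothetical illustrations, not our actual code:

// Hypothetical example of the old DTO-plus-hydrator approach:
// two extra classes for every model we wanted to expose.
class ResumeDto
{
    public $id;
    public $position;
}

class ResumeDtoHydrator
{
    // Copies data from the model into the DTO field by field;
    // dozens of classes like this led to rampant copy-paste.
    public function hydrate(Resume $model): ResumeDto
    {
        $dto = new ResumeDto();
        $dto->id = $model->getId();
        $dto->position = $model->position;

        return $dto;
    }
}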
The transport layer was also far from ideal. The developer constantly had to think about the internals of the JSON API: just as with model mapping, full control over the process meant dragging nearly identical code from action to action.
We began to consider switching to a third-party JSON API solution. The JSON API site lists a rather impressive number of implementations of the spec in various languages, both server- and client-side. At the time of writing, there were 18 projects implementing the server side in PHP, and none of them suited us:
- First, third-party solutions had all the same problems as our own: too much boilerplate, too little automation. Some of them also imposed requirements on the models (for example, implementing a certain interface), and with our amount of code that could mean serious refactoring. And in any case, to make requests and responses work we would have had to write an adapter tying the chosen solution to Yii.
- Second, the overwhelming majority of third-party solutions supported only one-to-one mapping: you have one model, you can turn it into one entity. That is fine when the data in your models is stored exactly in the form you want to give to the client, but in reality this is not always the case. For example, our resume model has attributes holding contact details, but the client may receive those contacts only under certain conditions. It would be great to extract the contacts into a separate entity related to the resume entity itself, turning one model into several entities, but in third-party solutions this could only be done with ugly hacks.
- Third, we wanted to simplify the development of typical endpoints as much as possible, so that a programmer tasked with writing an endpoint that fetches models from the database and sends them to the client would not have to write the same boilerplate every time. Third-party solutions, however, offered no integration with a DBAL.
- Fourth and finally, we wanted to simplify the writing of documentation and tests, but third-party solutions for the most part exposed no information about which attributes and relationships a given entity has.
The need to write our own solution once again became obvious :)
Framework Development
After analyzing the shortcomings of our previous implementation and of the third-party solutions, we formed a vision of what our new framework, which received the highly original name Mapper, should look like:
- First of all, instead of writing DTOs and hydrators, we decided to describe the entire mapping in a config.
- This config, invisibly to the developer, would be compiled into PHP code, which in turn would be used to hydrate entities.
- All work with the JSON API was to happen behind the scenes: for typical endpoints, everything would boil down to describing the business logic and fetching the models.
- Finally, as mentioned above, we wanted to integrate our solution with DBAL, documentation, and tests.
Core
The framework is built around compiled hydrators, that is, objects that fill models and build entities. What does a hydrator need to know to do its job? First of all, it must know which entity it builds and from which models. It must understand which attributes and relationships the entity has and how they map onto the properties and relations of the source models.
Let's try to describe the config for such a hydrator. The config format is YAML, which is easy to write, easy to read, and easy to parse (we used symfony/yaml).
entities:
  TestEntity:
    classes:
      - TestModel
    attributes:
      id:
        type: integer
        accessor: '@getId'
        mutator: '@setId'
      name:
        type: string
        accessor: name
        mutator: name
    relations:
      relatedModel:
        type: TestEntity2
        accessor: relatedModel
      relatedModels:
        type: TestEntity3[]
        accessor: '@getRelatedModels'
Here, the TestEntity entity is assembled from the TestModel model. The entity has two attributes: id, obtained via the getId getter, and name, taken from the name property. The entity also has two relations: the single relatedModel, consisting of an entity of type TestEntity2, and the multiple relatedModels, consisting of TestEntity3 entities.
The hydrator compiled from this config looks like this:
class TestEntityHydrator extends Hydrator
{
    public static function getName(): string
    {
        return 'TestEntity';
    }

    protected function getClasses(): array
    {
        return [Method::DEFAULT_ALIAS => TestModel::class];
    }

    protected function buildAttributes(): array
    {
        return [
            'id' => (new CompiledAttribute('id', Type::INTEGER))
                ->setAccessor(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray) {
                            return $modelArray[Method::DEFAULT_ALIAS]->getId();
                        }
                    )
                )
                ->setMutator(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray, $value) {
                            $modelArray[Method::DEFAULT_ALIAS]->setId($value);
                        }
                    )
                ),
            'name' => (new CompiledAttribute('name', Type::STRING))
                ->setAccessor(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray) {
                            return $modelArray[Method::DEFAULT_ALIAS]->name;
                        }
                    )
                )
                ->setMutator(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray, $value) {
                            $modelArray[Method::DEFAULT_ALIAS]->name = $value;
                        }
                    )
                )
                ->setRequired(false),
        ];
    }

    protected function buildRelations(): array
    {
        return [
            'relatedModel' => (new CompiledRelation('relatedModel', TestEntity2Hydrator::getName()))
                ->setAccessor(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray) {
                            return $modelArray[Method::DEFAULT_ALIAS]->relatedModel;
                        }
                    )
                ),
            'relatedModels' => (new CompiledRelation('relatedModels', TestEntity3Hydrator::getName()))
                ->setAccessor(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray) {
                            return $modelArray[Method::DEFAULT_ALIAS]->getRelatedModels();
                        }
                    )
                )
                ->setMultiple(true),
        ];
    }
}
All this monstrous code does, in fact, is describe the data that makes up the entity. You will agree that writing it by hand, and for every entity in the project at that, would be no fun at all.
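On the calling side, the win is that every compiled hydrator is used uniformly. A rough sketch of what a call might look like; the registry and the extract method here are assumptions about Mapper's internals, not its actual API:

// Hypothetical usage: look the hydrator up by entity name and let it
// turn a model into an entity with no hand-written mapping code.
$hydrator = $registry->get(TestEntityHydrator::getName()); // assumed registry
$entity = $hydrator->extract([Method::DEFAULT_ALIAS => $model]); // assumed method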
To make everything described above work, we needed to implement three services: a config parser, a validator, and a compiler.
The parser watched for changes to the configs (symfony/config helped us here) and, when changes were detected, re-read all the config files, merged them, and passed the result to the validator.
The validator checked that the config was correct: first it was validated against the JSON Schema we had described for our config format (here we used justinrainbow/json-schema), and then every class it mentioned, along with its properties and methods, was checked for existence.
Finally, the compiler took the validated config and generated PHP code from it.
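Put together, the three services form a simple pipeline. A rough sketch of how they might be wired; all class and method names below are illustrative, not Mapper's real API:

// Illustrative pipeline: recompile hydrators only when the configs change.
class HydratorBuilder
{
    /** @var ConfigParser */
    private $parser;
    /** @var ConfigValidator */
    private $validator;
    /** @var HydratorCompiler */
    private $compiler;

    public function __construct(ConfigParser $parser, ConfigValidator $validator, HydratorCompiler $compiler)
    {
        $this->parser = $parser;
        $this->validator = $validator;
        $this->compiler = $compiler;
    }

    public function build(string $configDir, string $cacheDir)
    {
        $config = $this->parser->parse($configDir); // merged config, or null if unchanged
        if ($config === null) {
            return; // the cached hydrators are still fresh
        }

        $this->validator->validate($config);  // JSON Schema + class/property existence checks
        $this->compiler->compile($config, $cacheDir); // emits the PHP hydrator classes
    }
}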
DBAL Integration
For historical reasons, two DBALs coexist in our project: Yii1's standard ActiveRecord and Doctrine, and we wanted our framework to get along with both. By integration we meant that Mapper would be able both to fetch data from the database and to save it back on its own.
To achieve this, we first needed to make small changes to the config. Since, in the general case, the name of a relation in the model may differ from the name of the getter or property that returns it (this is especially true of Doctrine), we needed a way to tell Mapper the name under which the DBAL knows a given relation. For this we added an internalName parameter to the relation description. Later, the same internalName appeared on attributes as well, so that Mapper could perform selections by fields on its own.
Besides internalName, we added to the config the knowledge of which DBAL an entity belongs to: the adapter parameter held the name of a service implementing the interface through which Mapper talks to the DBAL.
The interface had the following form:
interface IDbAdapter
{
    public function statementByContext(string $className, $context, array $relationNames): IDbStatement;

    public function statementByAttributes(string $className, array $attributes, array $relationNames): IDbStatement;

    public function create(string $className);

    public function save($model);

    public function link($parent, $child, string $relationName);

    public function unlink($parent, $child, string $relationName);
}
To simplify interaction with the DBAL, we introduced the concept of a context. A context is an object from which the DBAL can work out exactly what query it needs to execute. For ActiveRecord the context is a CDbCriteria; for Doctrine, a QueryBuilder.
For each DBAL we wrote our own adapter implementing IDbAdapter. It was not without surprises: it turned out, for example, that in the entire lifetime of Yii1 not a single extension had been written that supported saving every kind of relation, so we had to write our own wrapper.
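For illustration, here is a trimmed-down sketch of what an ActiveRecord adapter could look like; the ActiveRecordStatement wrapper is an assumption, and the real implementation is more involved:

// Illustrative Yii1 ActiveRecord adapter: the context is a CDbCriteria.
class ActiveRecordAdapter implements IDbAdapter
{
    public function statementByContext(string $className, $context, array $relationNames): IDbStatement
    {
        /** @var CDbCriteria $context */
        $context->with = $relationNames; // eager-load the requested relations

        return new ActiveRecordStatement(CActiveRecord::model($className)->findAll($context));
    }

    public function statementByAttributes(string $className, array $attributes, array $relationNames): IDbStatement
    {
        $criteria = new CDbCriteria();
        $criteria->addColumnCondition($attributes); // WHERE field = value AND ...

        return $this->statementByContext($className, $criteria, $relationNames);
    }

    public function create(string $className)
    {
        return new $className();
    }

    public function save($model)
    {
        /** @var CActiveRecord $model */
        return $model->save();
    }

    public function link($parent, $child, string $relationName)
    {
        // Yii1 has no built-in way to save arbitrary relations;
        // this is where our custom wrapper (not shown) comes in.
        throw new LogicException('Relation saving is handled by a custom wrapper.');
    }

    public function unlink($parent, $child, string $relationName)
    {
        throw new LogicException('Relation saving is handled by a custom wrapper.');
    }
}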
Documentation and tests
We use Behat for integration tests and Swagger for documentation. Both tools natively support JSON Schema, which allowed us to plug Mapper into them without any problems.
Tests for Behat are written in the Gherkin language. Each test is a sequence of steps, and each step is a sentence in natural language.
We added steps that integrated JSON API and Mapper support into Behat:
When I have entity "resume"
And I have entity attributes:
  | name       | value |
  | profession |       |
And I have entity relationship "owner" with data:
  | name | value |
  | id   | 100   |
Then I send entity via "POST" to "/resume/" and get entity "resume"
In this test we create a resume entity, fill in its attributes and relationships, send the request, and validate the response. All the routine is automated: we do not need to compose the request body, since our Behat helpers do it for us, and we do not need to describe the JSON Schema of the expected response, since Mapper generates it.
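Under the hood, each such step maps onto a method of a Behat context class. A simplified sketch of how the first two steps might be defined; the EntityBuilder helper is an assumption, not our actual code:

use Behat\Behat\Context\Context;
use Behat\Gherkin\Node\TableNode;

// Hypothetical Behat context: EntityBuilder (assumed) accumulates the
// entity and later serializes it into a JSON API request body.
class EntityContext implements Context
{
    /** @var EntityBuilder */
    private $builder;

    /**
     * @When I have entity :type
     */
    public function iHaveEntity(string $type)
    {
        $this->builder = new EntityBuilder($type);
    }

    /**
     * @When I have entity attributes:
     */
    public function iHaveEntityAttributes(TableNode $table)
    {
        // Each table row becomes an attribute of the entity being built.
        foreach ($table->getHash() as $row) {
            $this->builder->setAttribute($row['name'], $row['value']);
        }
    }
}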
Things are somewhat more interesting with documentation. The JSON Schema files for Swagger were originally generated on the fly from YAML sources: as mentioned, YAML is much easier to write than JSON, but Swagger understands only JSON. We extended this mechanism so that the final JSON Schema included not only the contents of the YAML files but also the entity descriptions from Mapper. For example, we taught Swagger to understand links like:
$ref: '#mapper:resume'
Or:
$ref: '#mapper:resume.collection.response'
And Swagger would render the resume entity object or, respectively, the entire server response containing a collection of resume entities. Thanks to these links, the documentation updated automatically as soon as Mapper's config changed.
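Conceptually, resolving these links is a pre-processing pass over the schema before it reaches Swagger. A simplified sketch of the idea; the MapperSchemaGenerator name and its method are assumptions:

// Simplified idea: rewrite '#mapper:...' refs into ordinary JSON Schema
// refs and append definitions generated from Mapper's entity configs.
function resolveMapperRefs(array $schema, MapperSchemaGenerator $mapper): array
{
    array_walk_recursive($schema, function (&$value, $key) {
        if ($key === '$ref' && strpos($value, '#mapper:') === 0) {
            $value = '#/definitions/' . substr($value, strlen('#mapper:'));
        }
    });

    // Merge in the generated definitions, e.g. 'resume' or
    // 'resume.collection.response', so the rewritten refs resolve.
    $schema['definitions'] = array_merge(
        $schema['definitions'] ?? [],
        $mapper->generateDefinitions() // assumed method
    );

    return $schema;
}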
Conclusions
It took a lot of effort, but we built a tool that significantly simplified our developers' lives. To create a trivial endpoint, it is now enough to describe the entity in the config file and add a couple of lines of code. Automating the routine of writing tests and documentation lets us save time when developing new endpoints, and Mapper's flexible architecture makes it easy to extend its functionality whenever we need to.
Time to answer the main question I posed at the beginning of the article: what did building our own bicycle cost us, and should you build your own?
The intensive development phase of Mapper took about three months. We still keep adding new features, but at a much more relaxed pace. Overall, we are happy with the result: since Mapper was designed around the specifics of our project, it handles its tasks far better than any third-party solution would.
Should you follow our path? If your project is still young and its code base is small, writing your own bicycle may well be a waste of time, and integrating a third-party solution would be the better choice. But if your code has been written over many years and you are not ready for serious refactoring, you should definitely consider a solution of your own. Despite the initial development costs, it can save you a great deal of time and effort down the road.