The importance of serializing API output

In this article, the author examines what happens when the representation of data and objects changes, and the mistakes that inevitably follow such a change. He proposes using serializers in these cases, which convert one representation into another without loss. Serialization is useful for more than sending data over the network or saving it to files: with a serialization layer in place, you can change the data store at your convenience without harming the project. It also becomes easy to return responses in whatever form is convenient for the requesting party.

Disclaimer: I had just started reading habrahabr.ru/post/260975 . The translation is very clumsy and careless. I began sending the author more natural-sounding alternatives for some phrases to help him fix it, but then I realized that I was stumbling over every third phrase. So I decided to write up my own complete version. That may not be entirely fair, but it is also not very humane to make people read a translation of that quality. I would even say it reads as machine-made. Well, let the community judge. The text really is difficult; I will leave my comments in italics. I am translating without looking at drondo's translation. If something matches, I will not change it. Like drondo, I still translate English words into the closest Russian words, but the devil is in the details. Please report typos by private message and I will fix everything in the morning. Thanks.

Over the past year I have given a talk about API pitfalls what feels like a thousand times, at a number of events in 2015 alone.

The part of the talk that I call "adding a presentation layer to your data", where I talk about serialization, got a great response.

MSDN says it like this:

Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
Source: MSDN Programming Concepts

PHP developers often think of serialize() when they hear "serialization". That is one option, but not the only one. Another common way to serialize is, of course, the json_encode() function. Modern frameworks automatically convert any array returned from a controller method to JSON, so you do not even have to call json_encode() yourself.

This is very convenient, but if you are building an HTTP API (AJAX / RESTful / Hypermedia), you need to be extremely careful about what you output.

Here is the most common pitfall:

<?php

class PlaceController extends CoreController
{
    public function show($id)
    {
        return Place::find($id);
    }
}

I apologize for the extremely simplified example. The point is that we take a model (possibly through an ORM) and return the result directly.

It looks neat, but it leads to a lot of problems down the road.

Enough secrets

Every field you add to your data store will end up in the output sent to the API consumer. If this is an internal API, that may not be so bad. But if this information reaches a browser or some other external client such as a mobile device, things can go very badly.

The most obvious example is user passwords. Of course they are encrypted, but you obviously do not want to hand them to strangers. Leaking less obvious things, such as password reset tokens, can also get your users hacked.

But it can happen even less noticeably. In the "place" example (translator's note: PlaceController), the business may need to add a "contact email" for internal use. If you have spent months building unique relationships (translator's note: business relationships), you will not want those email addresses leaking to your competitors.

Yes, many ORMs let you specify a list of visible or hidden properties. But over time the chances of keeping every sensitive field hidden dwindle, especially if a junior developer joins who does not yet know that one of these fields is top secret. A tired code reviewer at the end of a long day will let the whole thing slip through unnoticed.
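To make the risk concrete, here is a minimal sketch in plain PHP. The `$user` row, its field names (`password_hash`, `reset_token`) and the `serializeUser()` helper are hypothetical stand-ins for whatever your ORM returns; the point is only the contrast between dumping the whole record and listing the public fields explicitly.

```php
<?php
// Hypothetical row as it might come back from a data store.
$user = [
    'id'            => '42',
    'name'          => 'Alice',
    'password_hash' => '$2y$10$examplehashvalue',
    'reset_token'   => 'f3a9c1',
];

// Naive approach: every field in the row is sent to the client,
// including the ones nobody meant to publish.
$leaky = json_encode($user);

// Serializer: an explicit whitelist of what the API exposes.
// A new column in the data store does not appear here by accident.
function serializeUser(array $user): array
{
    return [
        'id'   => (int) $user['id'],
        'name' => $user['name'],
    ];
}

$safe = json_encode(serializeUser($user));
```

With a whitelist, exposing a new field is a deliberate change to the serializer rather than a side effect of a schema change.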

Here is an example using Fractal, a PHP serialization library I wrote to simplify serialization in my APIs:

<?php

use League\Fractal\Manager;
use League\Fractal\Resource\Collection;

// Create a top-level instance somewhere
$fractal = new Manager();

// Ask the ORM for 10 books
$books = Book::take(10)->get();

// Turn this collection into a resource, describing each item explicitly.
// The ORM returns model objects, so the callback receives a Book, not an array.
$resource = new Collection($books, function (Book $book) {
    return [
        'id'     => (int) $book->id,
        'title'  => $book->title,
        'year'   => (int) $book->yr,
        'author' => [
            'name'  => $book->author_name,
            'email' => $book->author_email,
        ],
        'links'  => [
            [
                'rel' => 'self',
                'uri' => '/books/' . $book->id,
            ],
        ],
    ];
});

// Run the resource through the manager to get the final output
$output = $fractal->createData($resource)->toArray();

Yes, everything is extremely simplified here again, and callbacks are used instead of classes, but the basic idea is clear.

Similar tools exist for just about every language. I have worked with ActiveModel Serializers, and it is almost the same.

Regardless of the language you are using this week, I would like to explain why this matters so much. You can work out the how for yourself; this article is about the why.

Attribute Data Types

When it comes to database drivers, many languages, PHP included, are not very smart. Systems like MySQL or PostgreSQL have all sorts of data types: integers, floats, booleans and so on. But what comes out the other end is always a string.

Instead of true and false, you see "1" and "0", or even "t" and "f". A float such as -41.235 turns into "-41.235".

For a weakly typed language this may not seem very important. But strictly typed languages will fall over when such a change happens. A couple of times I have seen a numeric string turn into an integer because of some math in an ORM accessor, where "1" + "2" = 3. A change like that can easily pass your unit tests if they are not strict enough. But your iOS app will simply crash.

Rails' ActiveRecord keeps track of which data type each field should have as fields are added to the schema through migrations. But if something changes, whether in an accessor or in the type in the schema, expect trouble.

If you use serialization as in the example above, you can cast your data to explicit types and be sure on output that each value has exactly that type, and that it changes only if you change it yourself in the serializer.
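As an illustration of casting in the serializer, here is a small sketch in plain PHP. The row contents, the `toBool()` helper and `castPlace()` are assumptions made for this example; they are not part of Fractal or of any driver.

```php
<?php
// Hypothetical row as a database driver might return it: every value is a string.
$row = [
    'id'        => '7',
    'lat'       => '-41.235',
    'is_public' => 't',   // PostgreSQL-style boolean
];

// Interpret the common string spellings of a boolean column.
function toBool(string $value): bool
{
    return in_array(strtolower($value), ['1', 't', 'true'], true);
}

// The serializer pins the type of every field in the output.
function castPlace(array $row): array
{
    return [
        'id'        => (int) $row['id'],
        'lat'       => (float) $row['lat'],
        'is_public' => toBool($row['is_public']),
    ];
}

$place = castPlace($row);
```

Even if the driver or an accessor later starts returning a different type, the output stays an integer, a float and a boolean, because the cast lives in one place.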

Field name change

Renaming fields in the data store should not break your API. If updating all your tests seems painful, imagine what it means for mobile developers and other front-end teams, who have to update and deploy new applications. Perhaps you did not even manage a lock-step deploy (translator's note: I need help translating this term; the nearest Russian word, however ridiculous, turns out to be "parallelized"). In that case your users end up with broken applications, and even if you ship an update on iOS, the apps stay broken until users install it.

All you need to avoid this field-renaming hell is a serialization layer, which lets you update the reference to the field without changing its external representation.
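A rename handled in the serializer might look like the sketch below. It assumes a hypothetical migration that renamed the column `yr` to `published_year`; `serializeBook()` and the sample data are made up for illustration.

```php
<?php
// Before the rename the column was `yr`; after a hypothetical migration it is
// `published_year`. Only this function needs to know about the change.
function serializeBook(array $book): array
{
    return [
        'id'    => (int) $book['id'],
        'title' => $book['title'],
        // Internal name changed, public name stays `year`.
        'year'  => (int) $book['published_year'],
    ];
}

$book = ['id' => '3', 'title' => 'Example Title', 'published_year' => '1965'];
$json = json_encode(serializeBook($book));
```

Clients keep reading `year`, so no mobile release has to be coordinated with the schema change.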

Multiple data stores

Many of these ORM serialization approaches rest on one big assumption: all of your data lives in the same store all the time. What happens when some of your data leaves SQL and moves to Redis or somewhere else?

Even if you have not moved part of your data from SQL to Redis, maybe you split a table in two? Or started using pivot tables? Most ORM serializers will simply fall flat on their face if you try this. (translator's note: for anyone interested, the original says "will land on their face". Since the expression carries a fair amount of brutality, I translated it with a suitably strong word.)

Or maybe you use the "repository" pattern that has become so popular in Laravel instead? You can pass the data to the serializer from wherever it is stored, and the serializer takes care of keeping the output consistent.
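The sketch below shows the idea with a hypothetical `PlaceRepository`. The two arrays stand in for a SQL table and a key-value store such as Redis; the key format and field names are assumptions for this example only.

```php
<?php
// Hypothetical repository: hides where each piece of data actually lives.
final class PlaceRepository
{
    private array $sqlRows;
    private array $counters;

    public function __construct(array $sqlRows, array $counters)
    {
        $this->sqlRows  = $sqlRows;   // stand-in for a SQL table
        $this->counters = $counters;  // stand-in for a key-value store
    }

    public function find(int $id): array
    {
        $place = $this->sqlRows[$id];
        // Merge in data that has moved out of SQL; default to zero if absent.
        $place['checkins'] = $this->counters["place:{$id}:checkins"] ?? '0';
        return $place;
    }
}

// The serializer only sees a plain array and does not care about its origin.
function serializePlace(array $place): array
{
    return [
        'id'       => (int) $place['id'],
        'name'     => $place['name'],
        'checkins' => (int) $place['checkins'],
    ];
}

$repo = new PlaceRepository(
    [1 => ['id' => '1', 'name' => 'Cafe']],
    ['place:1:checkins' => '12']
);
$output = serializePlace($repo->find(1));
```

Moving `checkins` between stores changes the repository, while the serializer and the API output stay the same.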

Versions of serializers

In the past I have versioned my serializers. FooSerializer v1 and v2 can both exist, each with its own tests, and serve the differing needs of API clients. (translator's note: I did not understand what this is for. Perhaps v1 and v2 differ so much that they are treated as independent products and exist in parallel.)
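One way to picture versioned serializers is the sketch below. The class names, the `serializerFor()` helper and the difference between the versions are all hypothetical; how the client communicates the version (URL, header or something else) is left open.

```php
<?php
// Two hypothetical versions of the same serializer living side by side.
final class BookSerializerV1
{
    public function serialize(array $book): array
    {
        return [
            'id'     => (int) $book['id'],
            'author' => $book['author_name'],
        ];
    }
}

final class BookSerializerV2
{
    public function serialize(array $book): array
    {
        return [
            'id'     => (int) $book['id'],
            // v2 clients get a nested author object instead of a string.
            'author' => ['name' => $book['author_name']],
        ];
    }
}

// Pick a serializer based on the version the client asked for.
function serializerFor(int $version): object
{
    return $version >= 2 ? new BookSerializerV2() : new BookSerializerV1();
}

$book = ['id' => '5', 'author_name' => 'Jane Doe'];
$v1 = serializerFor(1)->serialize($book);
$v2 = serializerFor(2)->serialize($book);
```

Old clients keep receiving the v1 shape while new clients move to v2, and each version can be tested on its own.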

Formats of serializers

One thing Fractal has not managed yet, although it is planned for v1.0, is support for different format "adapters". The Rails community has already done this: you can send different headers and receive different response formats.

By sending a MIME type, you can tell the serializer which format to respond in, without littering all your code with potentially complex logic.
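A very reduced sketch of the adapter idea follows. `formatResponse()` is a hypothetical helper, not Fractal's API, and the second branch is only a rough approximation of a JSON:API-style envelope. Real `Accept` headers can list several types with weights; the exact string match here is a simplification.

```php
<?php
// Hypothetical format adapters: the same serialized data, two envelopes.
function formatResponse(array $data, string $type, string $accept): string
{
    if ($accept === 'application/vnd.api+json') {
        // Rough approximation of a JSON:API-style document.
        $id = (string) $data['id'];
        unset($data['id']);
        return json_encode([
            'data' => ['type' => $type, 'id' => $id, 'attributes' => $data],
        ]);
    }

    // Default: plain JSON.
    return json_encode($data);
}

$book    = ['id' => 3, 'title' => 'Example Title'];
$plain   = formatResponse($book, 'books', 'application/json');
$jsonApi = formatResponse($book, 'books', 'application/vnd.api+json');
```

The controller and the serializer stay the same; only the last formatting step depends on what the client asked for.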

Solutions

I could go on listing reasons for days, but my plane landed at five this morning and the guys in the neighboring seats kept me awake all night.

I have covered the why of serialization, but not the how. For that, I recommend looking at existing solutions.

At RailsConf 2015 I heard a great talk by my new friend João Moura called "AMS, API, Rails and a developer, a Love Story", which shows off some cool functionality.

Whichever system you choose, the underlying idea is the same.

If you are building an API of any kind, please use one.

An API is not just a proxy for SQL commands. It needs to be planned, thought through and carefully maintained. Then simply moving where your data is stored will not bring down all of your applications and services.

Source: https://habr.com/ru/post/261019/
