The Importance of Serializing API Output

I spoke at API * Lyons once in the last year. A lot of feedback and questions came up around the part where I described serialization as "adding a presentation layer to your data."

MSDN says it like this:

Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.

PHP developers often think of serialization as calling the serialize() function. Yes, that is one form of serialization, but it is not the only one. Another common approach is the json_encode() function. Modern frameworks will automatically convert any array returned from a controller method to JSON, which means you do not even need to call json_encode() yourself.
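As a quick illustration of those two forms, here is a minimal sketch using only built-in PHP functions. The `$place` record is a made-up value for the example.

```php
<?php
// A hypothetical record, used only for illustration.
$place = ['id' => 1, 'name' => 'Cafe'];

// PHP's native format: compact, but specific to PHP.
$native = serialize($place);
echo $native, PHP_EOL; // a:2:{s:2:"id";i:1;s:4:"name";s:4:"Cafe";}

// JSON: a language-neutral format, which is what HTTP API clients expect.
$json = json_encode($place);
echo $json, PHP_EOL; // {"id":1,"name":"Cafe"}

// Both forms can be reversed (deserialized) back into the original array.
var_dump(unserialize($native) === $place);     // bool(true)
var_dump(json_decode($json, true) === $place); // bool(true)
```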

This feature is convenient, but if you are building an HTTP API (AJAX / RESTful / Hypermedia), you need to be more careful about what you return.
The most common offender looks like this:

<?php

class PlaceController extends CoreController
{
    public function show($id)
    {
        return Place::find($id);
    }
}

Excuse the simplified code, but here we fetch a model (possibly through an ORM) and return the result directly.
This looks fairly innocent, but it leads to a number of problems.

No hiding

Every field ends up in the output of the API method. For an internal API that may be fine, but if this information is sent to browsers, or otherwise travels beyond a mobile device, you are going to have a bad time.

The obvious example is user passwords. Of course they are stored hashed rather than in plain text, but you clearly still do not want them falling into the wrong hands.
Slightly less obvious are password reset tokens, which can lead to user accounts being compromised if they leak.

The problem can be more subtle, too. In the Place example, confidential business information could leak when you are asked to add a contact email that is meant for internal use only. If you have spent months collecting these places and building up a set of unique contacts, you will not want all of that data handed to potential competitors.

Yes, many ORMs let you set "hidden" and "visible" properties to blacklist or whitelist fields, but as time goes on, the chance of every sensitive value staying hidden approaches zero. It only takes a junior developer who does not know that a new field should be hidden by default, and a tired code reviewer on a busy day, for the problem to slip through.
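To make the risk concrete, here is a minimal sketch in plain PHP, with no ORM involved. The user row, its column names, and the `serializeUser()` helper are all hypothetical; the point is only the contrast between dumping the whole record and listing the output fields explicitly.

```php
<?php
// Hypothetical user row as an ORM might load it; every column is present.
$user = [
    'id'          => 7,
    'name'        => 'Ada',
    'email'       => 'ada@example.com',
    'password'    => '$2y$10$hypothetical.hash.value',
    'reset_token' => 'hypothetical-reset-token',
];

// Returning the model directly: every column reaches the client.
$leaky = json_encode($user);

// Serializing explicitly: only the listed fields can ever appear, so a
// column added to the table later stays private until someone opts it in.
function serializeUser(array $user): array
{
    return [
        'id'   => (int) $user['id'],
        'name' => $user['name'],
    ];
}

$safe = json_encode(serializeUser($user));

echo $safe, PHP_EOL; // {"id":7,"name":"Ada"}
```

With this shape, forgetting to update the serializer means a new field is missing from the output, which is a much cheaper mistake than leaking it.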

Here is a simple example using Fractal, a PHP serialization library I created to help serialize the output of my API applications:

<?php

use League\Fractal\Manager;
use League\Fractal\Resource\Collection;

// Create a top level instance somewhere
$fractal = new Manager();

// Ask the ORM for 10 books
$books = Book::take(10)->get();

// Turn this collection into a resource, using a callback as the transformer.
// Each $book is an ORM model object here, so the callback takes no array type hint.
$resource = new Collection($books, function ($book) {
    return [
        'id'     => (int) $book->id,
        'title'  => $book->title,
        'year'   => (int) $book->yr,
        'author' => [
            'name'  => $book->author_name,
            'email' => $book->author_email,
        ],
        'links'  => [
            [
                'rel' => 'self',
                'uri' => '/books/'.$book->id,
            ]
        ]
    ];
});

Again, this is an oversimplified example that uses a callback function instead of classes for the logic, but it gets the general idea across.

Tools like this exist for many languages. I have worked with the ActiveModel Serializer library, which is almost identical in approach.

Regardless of the language you are using this week, I would like to explain why this matters so much. You can look into the specific solutions later; the main purpose of this article is to show why the problem is important.

Attribute Data Types

Many programming languages, PHP included, are pretty dumb when it comes to their database drivers. MySQL and PostgreSQL have plenty of data types (integer, float, boolean, and so on), but everything comes out the other end as a plain string.

Instead of true and false you will see "1" and "0", or maybe even "t" and "f". Floating-point numbers come out as "-41.235" instead of -41.235.

For a weakly typed language this may not seem to matter much, but strictly typed languages will fall over when a type changes like this. It is especially nasty when the string representation of a number silently becomes numeric because some math is done in an ORM accessor, where "1" + "2" = 3. A change like that can easily pass your unit tests if they are loose about types, and still beat your iOS app to a pulp.

ActiveRecord in Rails keeps track of which type each field should have, based on how it was added to the schema through migrations. But if a value is changed through an accessor, or the data type in the schema changes, you run into the same problem.

With a serializer like the one in the example above, we can cast data to the types we want in the output, and those types will change only when you change the serializer itself.
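A minimal sketch of that casting in plain PHP follows. The row, the `toBool()` helper, and `serializePlace()` are hypothetical names invented for the example; the row simply imitates a driver that returns every value as a string.

```php
<?php
// Hypothetical row as a database driver might return it: all strings.
$row = [
    'id'      => '42',
    'name'    => 'Cafe',
    'lat'     => '-41.235',
    'is_open' => 't', // PostgreSQL-style boolean; MySQL would give '1'
];

// Accept the common string spellings of "true" coming out of a driver.
function toBool(string $value): bool
{
    return in_array($value, ['1', 't', 'true'], true);
}

// The serializer is the single place where output types are decided.
function serializePlace(array $row): array
{
    return [
        'id'      => (int) $row['id'],
        'name'    => $row['name'],
        'lat'     => (float) $row['lat'],
        'is_open' => toBool($row['is_open']),
    ];
}

// Without the serializer every value is encoded as a JSON string.
echo json_encode($row), PHP_EOL;

// With it, id is an integer, lat a float, and is_open a boolean.
echo json_encode(serializePlace($row)), PHP_EOL;
```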

Rename Fields

Renaming a field in your data store should not break your API. If having to update all of your integration tests sounds annoying, think about how annoying it is for the mobile developers and other frontend teams who have to update and deploy new applications. Did you even remember to coordinate a lock-step deployment? If not, end users are going to be left with broken applications, and even if you rush out an update for iOS, the apps will stay broken until users update them manually.

To avoid this, handle the rename in the serialization layer: update the reference to the renamed field there, and the external representation stays exactly the same.
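As a sketch of what that looks like, assume a hypothetical column `yr` has been renamed to `published_year` in the database. The `serializeBook()` function and both column names are illustrative assumptions, not part of any library.

```php
<?php
// Hypothetical serializer for a book. The API has always exposed "year".
function serializeBook(array $book): array
{
    return [
        'id'    => (int) $book['id'],
        'title' => $book['title'],
        // The column used to be "yr"; after the rename only this line changed.
        'year'  => (int) $book['published_year'],
    ];
}

// Row shape after the (hypothetical) column rename yr -> published_year.
$row = ['id' => '3', 'title' => 'Dune', 'published_year' => '1965'];

echo json_encode(serializeBook($row)), PHP_EOL;
// {"id":3,"title":"Dune","year":1965}
```

Clients keep reading `year` and never learn that the underlying column changed.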

Multiple data stores

Most ORM-based approaches to serialization share one faulty assumption: that all of your data lives in one place. What happens when some of that data moves from SQL to Redis, or somewhere else?

Even if you never move part of your data from SQL to Redis, you might split one table into two, or start using pivot tables. In that case, most ORM serializers will fall flat on their face.

Instead, you can use the Repository pattern, which has become popular in Laravel. The repositories can pull data from whichever stores they need and pass it to the serialization layer, and the serializer will keep producing the same output.
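The idea can be sketched without any framework. In the example below, two plain arrays stand in for a SQL table and a Redis store; `findPlace()` plays the role of the repository, and every name and key format is a hypothetical choice for illustration.

```php
<?php
// Hypothetical stand-ins for two stores; in a real app these would be a SQL
// query and a Redis lookup. Arrays keep the sketch self-contained.
$sqlPlaces   = [5 => ['id' => '5', 'name' => 'Cafe']];
$redisCounts = ['place:5:checkins' => '120'];

// The repository hides where each piece of data lives.
function findPlace(int $id, array $sql, array $redis): array
{
    $place = $sql[$id];
    $place['checkins'] = $redis["place:{$id}:checkins"] ?? '0';

    return $place;
}

// The serializer only sees the combined record, so its output is unchanged
// if "checkins" later moves back into SQL or into another store.
function serializePlace(array $place): array
{
    return [
        'id'       => (int) $place['id'],
        'name'     => $place['name'],
        'checkins' => (int) $place['checkins'],
    ];
}

echo json_encode(serializePlace(findPlace(5, $sqlPlaces, $redisCounts))), PHP_EOL;
// {"id":5,"name":"Cafe","checkins":120}
```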

Versioning serializers

In the past I have versioned serializers along with major API versions. V1 and V2 of a FoodSerializer can exist side by side, each with its own tests, and each serving the needs of different API clients.
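A rough sketch of two such versions living side by side is shown below. The class names follow the FoodSerializer example from the text, but the fields, including the `kcal` column and the nested `nutrition` key, are invented for illustration.

```php
<?php
// Hypothetical serializers for two major API versions of the same record.
final class FoodSerializerV1
{
    public function serialize(array $food): array
    {
        return [
            'id'   => (int) $food['id'],
            'name' => $food['name'],
        ];
    }
}

final class FoodSerializerV2
{
    public function serialize(array $food): array
    {
        // V2 (hypothetically) nests nutrition data; V1 clients never see it.
        return [
            'id'        => (int) $food['id'],
            'name'      => $food['name'],
            'nutrition' => ['calories' => (int) $food['kcal']],
        ];
    }
}

$food = ['id' => '9', 'name' => 'Apple', 'kcal' => '52'];

echo json_encode((new FoodSerializerV1())->serialize($food)), PHP_EOL;
// {"id":9,"name":"Apple"}
echo json_encode((new FoodSerializerV2())->serialize($food)), PHP_EOL;
// {"id":9,"name":"Apple","nutrition":{"calories":52}}
```

Because each version is its own class, each can keep its own test suite, and changing V2 cannot alter what V1 clients receive.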

Serializer formats

Something Fractal has not fully achieved yet is multi-format "adapters", though the aim is to fix that in version 1.0. The Rails community has implemented this rather well: you can send different headers and get a completely different data format back.

Depending on the MIME type you send, you tell the serializer which output format you need, without littering your code with potentially complex logic.
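As a rough illustration of switching formats by MIME type, here is a small plain PHP sketch. The `render()` function is hypothetical and is not how Fractal or the Rails libraries implement this; it assumes the MIME type has already been extracted from the request (typically from the Accept header) and only handles flat arrays.

```php
<?php
// Hypothetical format switch: the same serialized array, rendered according
// to the MIME type the client asked for. Only flat arrays are handled here.
function render(array $data, string $mimeType): string
{
    switch ($mimeType) {
        case 'application/xml':
            $xml = '<item>';
            foreach ($data as $key => $value) {
                $xml .= "<{$key}>" . htmlspecialchars((string) $value) . "</{$key}>";
            }
            return $xml . '</item>';

        case 'application/json':
        default:
            // Unknown types fall back to JSON in this sketch.
            return json_encode($data);
    }
}

$place = ['id' => 5, 'name' => 'Cafe & Bar'];

echo render($place, 'application/json'), PHP_EOL;
// {"id":5,"name":"Cafe & Bar"}
echo render($place, 'application/xml'), PHP_EOL;
// <item><id>5</id><name>Cafe &amp; Bar</name></item>
```

The controller and the serializer stay the same for both formats; only the final rendering step differs.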

Solutions

I have explained why serialization is worth using, but not how to do it. For that, take a look at the following solutions:

Fractal - PHP
JMSSerializerBundle - PHP + Symfony2
Marshmallow - Python
ActiveModel Serializer - Ruby + Rails
JBuilder - Ruby + Rails
Roar - Ruby

Whichever one you choose, they all share a similar idea. If you are building any kind of API, please use one.

Remember that an API is not just a proxy for SQL commands. An API needs to be planned, carefully thought out, and maintained, and simple changes in your data store should not take down a whole network of applications and services.

Source: https://habr.com/ru/post/260975/
