
Data serialization, or the dialectics of communication: simple serializers

Good day, dear readers. In this article we will look at the most popular data serialization formats and run a little testing with them. This is the first article on the topic of data serialization, and in it we will look at simple serializers: those that do not require major code changes from the developer to integrate.

Sooner or later you, like our company, may face a situation where the number of services in your product grows dramatically, and all of them turn out to be very "talkative". Whether this happened because of a move to the currently hyped microservice architecture, or because you received a stack of minor feature requests and implemented them as a handful of services, does not matter. What matters is that from now on your product has two new problems: what to do with the increased amount of data flowing between individual services, and how to prevent chaos in developing and supporting that many services. Let me explain the second problem briefly: when the number of your services grows into the hundreds, one development team can no longer develop and maintain them all, so you hand packs of services over to different teams. The main thing then is that all these teams use a single format for their RPC; otherwise you run into the classic problems where one team cannot support another team's services, or two services simply do not fit together without the joints being generously patched up with crutches. But we will talk about that in a separate article; today we focus on the first problem, the increased volume of data, and think about what we can do about it. And, out of honest laziness, we don't want to do much at all: we want to add a couple of lines to the shared code and get an immediate payoff. That is where this article begins, namely with serializers whose adoption does not require major changes to our beautiful RPC layer.

The question of format is actually rather painful for our company, because our current products use XML to exchange information between components. No, we are not masochists; we are well aware that XML's heyday as a data exchange format was about 10 years ago, but that is exactly the point: the product itself is 10 years old and contains a lot of legacy architectural decisions that are rather difficult to "cut out" quickly. After some thought and discussion, we decided that we would use JSON for storing and transmitting data, but we need to choose one of the packed JSON options, since the size of the transmitted data is critical for us (I will explain why below).

We compiled a list of criteria by which we would choose the format that suits us:

After analyzing a decent number of options, we narrowed the list down to the following candidates:

  1. JSON
  2. BSON
  3. MessagePack
  4. CBOR

These formats do not require an IDL description of the transferred data; the schema is embedded in the data itself. This greatly simplifies the work and, in most cases, lets you add support by writing no more than 10 lines of code.

We are also well aware that some properties of a protocol or serializer depend heavily on its implementation: what packs perfectly in C++ may pack badly in JavaScript. Therefore, for our experiments we will use implementations for JS and Go and run the tests against both. For good measure, the JS implementation will be run both in the browser and on Node.js.

So, let's begin our review.

JSON


The simplest of the interaction formats under consideration. We will use it as the reference point when comparing the other formats, since in our current projects it has both proved its effectiveness and shown all of its weaknesses.

Pros:


Cons:


Let's see what we get in terms of performance. Right away we will try to account for JSON's weakness, its size, by also running tests with JSON packed using zlib. For the tests we will use the following libraries:


You can find the source code and all test results at the following links:

Go - https://github.com/KyKyPy3/serialization-tests
JS (node) - https://github.com/KyKyPy3/js-serialization-tests
JS (browser) - http://jsperv.com/serialization-benchmarks/5

We found experimentally that the test data should be as close to real data as possible, because results differ dramatically with different test data. So if picking the right format matters to you, always test on data closest to your own reality. We test on data close to ours; you can see it in the test sources.

Here is what we got for JSON speed; below are the benchmark results for each platform.
JS (Node)
Json encode               21,507 ops/sec (86 runs sampled)
Json decode                9,039 ops/sec (89 runs sampled)
Json roundtrip             6,090 ops/sec (93 runs sampled)
Json compress encode       1,168 ops/sec (84 runs sampled)
Json compress decode       2,980 ops/sec (93 runs sampled)
Json compress roundtrip      874 ops/sec (86 runs sampled)

JS (browser)
Json roundtrip             5,754 ops/sec
Json compress roundtrip      890 ops/sec

Go
Json encode                5000    391100 ns/op    24.37 MB/s    54520 B/op   1478 allocs/op
Json decode                3000    392785 ns/op    24.27 MB/s    76634 B/op   1430 allocs/op
Json roundtrip             2000    796115 ns/op    11.97 MB/s   131150 B/op   2908 allocs/op
Json compress encode       3000    422254 ns/op     0.00 MB/s    54790 B/op   1478 allocs/op
Json compress decode       3000    464569 ns/op     4.50 MB/s   117206 B/op   1446 allocs/op
Json compress roundtrip    2000    881305 ns/op     0.00 MB/s   171795 B/op   2915 allocs/op

And here is what we got for data size:
JS (Node)
Json              9482 bytes
Json compressed   1872 bytes

JS (Browser)
Json              9482 bytes
Json compressed   1872 bytes

At this point we can conclude that even though compressing JSON gives excellent size results, the loss in processing speed is simply catastrophic. Another conclusion: JS works with JSON very well, which cannot be said of Go, for example. It is quite possible that JSON processing in other languages will show results nowhere near the JS ones. For now let's set the JSON results aside and see how things go with the other formats.

BSON


This data format originated in MongoDB and is actively promoted by its authors. The format was originally designed for data storage and was not intended for transmission over the network. Honestly, after a brief search on the internet we did not find a single serious product that uses BSON internally, but let's see what this format can give us.

Pros:


Cons:


So, for example, the JSON object

{"hello": "world"}

turns into this:

\x16\x00\x00\x00           // total document size
\x02                       // 0x02 = type String
hello\x00                  // field name
\x06\x00\x00\x00world\x00  // field value
\x00                       // 0x00 = type EOO ('end of object')

The specification says that BSON was designed for fast serialization/deserialization, not least because it stores numbers as native integer types rather than wasting time parsing them from strings. Let's check. For the tests we took the following libraries:


And here are the results we obtained (for clarity, the JSON results are repeated):
JS (Node)
Json encode               21,507 ops/sec (86 runs sampled)
Json decode                9,039 ops/sec (89 runs sampled)
Json roundtrip             6,090 ops/sec (93 runs sampled)
Json compress encode       1,168 ops/sec (84 runs sampled)
Json compress decode       2,980 ops/sec (93 runs sampled)
Json compress roundtrip      874 ops/sec (86 runs sampled)
Bson encode                93.21 ops/sec (76 runs sampled)
Bson decode                  242 ops/sec (84 runs sampled)
Bson roundtrip             65.24 ops/sec (65 runs sampled)

JS (browser)
Json roundtrip             5,754 ops/sec
Json compress roundtrip      890 ops/sec
Bson roundtrip               374 ops/sec

Go
Json encode                5000    391100 ns/op    24.37 MB/s    54520 B/op   1478 allocs/op
Json decode                3000    392785 ns/op    24.27 MB/s    76634 B/op   1430 allocs/op
Json roundtrip             2000    796115 ns/op    11.97 MB/s   131150 B/op   2908 allocs/op
Json compress encode       3000    422254 ns/op     0.00 MB/s    54790 B/op   1478 allocs/op
Json compress decode       3000    464569 ns/op     4.50 MB/s   117206 B/op   1446 allocs/op
Json compress roundtrip    2000    881305 ns/op     0.00 MB/s   171795 B/op   2915 allocs/op
Bson encode               10000    249024 ns/op    40.42 MB/s    70085 B/op    982 allocs/op
Bson decode                3000    524408 ns/op    19.19 MB/s   124777 B/op   3580 allocs/op
Bson roundtrip             2000    712524 ns/op    14.13 MB/s   195334 B/op   4562 allocs/op

And here is what we got for data size:
JS (Node)
Json              9482 bytes
Json compressed   1872 bytes
Bson            112710 bytes

JS (Browser)
Json              9482 bytes
Json compressed   1872 bytes
Bson              9618 bytes

Although BSON gives us additional data types and, most importantly, the ability to partially read/modify documents, in terms of data size it is all very sad, so we have to keep looking.

MessagePack


The next format to land on our table is MessagePack. This format has been quite popular lately; I personally discovered it while tinkering with Tarantool.

If you look at the format's website, you can read that:


We'll have to check how true that is, but first let's see what the format offers.

By tradition, let's start with the pros:


Cons:


Now let's see how fast it is and how well it compresses data. The following libraries were used for the tests:


We got the following results:
JS (Node)
Json encode               21,507 ops/sec (86 runs sampled)
Json decode                9,039 ops/sec (89 runs sampled)
Json roundtrip             6,090 ops/sec (93 runs sampled)
Json compress encode       1,168 ops/sec (84 runs sampled)
Json compress decode       2,980 ops/sec (93 runs sampled)
Json compress roundtrip      874 ops/sec (86 runs sampled)
Bson encode                93.21 ops/sec (76 runs sampled)
Bson decode                  242 ops/sec (84 runs sampled)
Bson roundtrip             65.24 ops/sec (65 runs sampled)
MsgPack encode             4,758 ops/sec (79 runs sampled)
MsgPack decode             2,632 ops/sec (91 runs sampled)
MsgPack roundtrip          1,692 ops/sec (91 runs sampled)

JS (browser)
Json roundtrip             5,754 ops/sec
Json compress roundtrip      890 ops/sec
Bson roundtrip               374 ops/sec
MsgPack roundtrip          1,048 ops/sec

Go
Json encode                5000    391100 ns/op    24.37 MB/s    54520 B/op   1478 allocs/op
Json decode                3000    392785 ns/op    24.27 MB/s    76634 B/op   1430 allocs/op
Json roundtrip             2000    796115 ns/op    11.97 MB/s   131150 B/op   2908 allocs/op
Json compress encode       3000    422254 ns/op     0.00 MB/s    54790 B/op   1478 allocs/op
Json compress decode       3000    464569 ns/op     4.50 MB/s   117206 B/op   1446 allocs/op
Json compress roundtrip    2000    881305 ns/op     0.00 MB/s   171795 B/op   2915 allocs/op
Bson encode               10000    249024 ns/op    40.42 MB/s    70085 B/op    982 allocs/op
Bson decode                3000    524408 ns/op    19.19 MB/s   124777 B/op   3580 allocs/op
Bson roundtrip             2000    712524 ns/op    14.13 MB/s   195334 B/op   4562 allocs/op
MsgPack encode             5000    306260 ns/op    27.36 MB/s    49907 B/op    968 allocs/op
MsgPack decode            10000    214967 ns/op    38.98 MB/s    59649 B/op   1690 allocs/op
MsgPack roundtrip          3000    547434 ns/op    15.31 MB/s   109754 B/op   2658 allocs/op

And here is what we got for data size:
JS (Node)
Json              9482 bytes
Json compressed   1872 bytes
Bson            112710 bytes
Msgpack           7628 bytes

JS (Browser)
Json              9482 bytes
Json compressed   1872 bytes
Bson              9618 bytes
Msgpack           7628 bytes

Of course, MessagePack does not shrink the data as impressively as we would like, but at least it behaves consistently in both JS and Go. At the moment it is perhaps the most attractive candidate for our tasks, but one last patient remains to be examined.

CBOR


To be honest, this format is very similar to MessagePack in its capabilities, and it seems it was designed as a MessagePack replacement. It also supports data type extensions and full compatibility with JSON. Among the differences I noticed only support for arrays/strings of indefinite length, which, in my opinion, is a rather odd feature. If you want to know more about this format, there was a great article on Habr: habrahabr.ru/post/208690. Well, let's see how CBOR does on performance and data size.

The following libraries were used for tests:


And here, of course, are the final results of our tests, covering all the formats considered:
JS (Node)
Json encode               21,507 ops/sec ± 1.01% (86 runs sampled)
Json decode                9,039 ops/sec ± 0.90% (89 runs sampled)
Json roundtrip             6,090 ops/sec ± 0.62% (93 runs sampled)
Json compress encode       1,168 ops/sec ± 1.20% (84 runs sampled)
Json compress decode       2,980 ops/sec ± 0.43% (93 runs sampled)
Json compress roundtrip      874 ops/sec ± 0.91% (86 runs sampled)
Bson encode                93.21 ops/sec ± 0.64% (76 runs sampled)
Bson decode                  242 ops/sec ± 0.63% (84 runs sampled)
Bson roundtrip             65.24 ops/sec ± 1.27% (65 runs sampled)
MsgPack encode             4,758 ops/sec ± 1.13% (79 runs sampled)
MsgPack decode             2,632 ops/sec ± 0.90% (91 runs sampled)
MsgPack roundtrip          1,692 ops/sec ± 0.83% (91 runs sampled)
Cbor encode                1,529 ops/sec ± 4.13% (89 runs sampled)
Cbor decode                1,198 ops/sec ± 0.97% (88 runs sampled)
Cbor roundtrip               351 ops/sec ± 3.28% (77 runs sampled)

JS (browser)
Json roundtrip             5,754 ops/sec ± 0.63%
Json compress roundtrip      890 ops/sec ± 1.72%
Bson roundtrip               374 ops/sec ± 2.22%
MsgPack roundtrip          1,048 ops/sec ± 5.40%
Cbor roundtrip               859 ops/sec ± 4.19%

Go
Json encode                5000    391100 ns/op    24.37 MB/s    54520 B/op   1478 allocs/op
Json decode                3000    392785 ns/op    24.27 MB/s    76634 B/op   1430 allocs/op
Json roundtrip             2000    796115 ns/op    11.97 MB/s   131150 B/op   2908 allocs/op
Json compress encode       3000    422254 ns/op     0.00 MB/s    54790 B/op   1478 allocs/op
Json compress decode       3000    464569 ns/op     4.50 MB/s   117206 B/op   1446 allocs/op
Json compress roundtrip    2000    881305 ns/op     0.00 MB/s   171795 B/op   2915 allocs/op
Bson encode               10000    249024 ns/op    40.42 MB/s    70085 B/op    982 allocs/op
Bson decode                3000    524408 ns/op    19.19 MB/s   124777 B/op   3580 allocs/op
Bson roundtrip             2000    712524 ns/op    14.13 MB/s   195334 B/op   4562 allocs/op
MsgPack encode             5000    306260 ns/op    27.36 MB/s    49907 B/op    968 allocs/op
MsgPack decode            10000    214967 ns/op    38.98 MB/s    59649 B/op   1690 allocs/op
MsgPack roundtrip          3000    547434 ns/op    15.31 MB/s   109754 B/op   2658 allocs/op
Cbor encode               20000     71203 ns/op   117.48 MB/s    32944 B/op     12 allocs/op
Cbor decode                3000    432005 ns/op    19.36 MB/s    40216 B/op   2159 allocs/op
Cbor roundtrip             3000    531434 ns/op    15.74 MB/s    73160 B/op   2171 allocs/op

And here is what we got for data size:
JS (Node)
Json              9482 bytes
Json compressed   1872 bytes
Bson            112710 bytes
Msgpack           7628 bytes
Cbor              7617 bytes

JS (Browser)
Json              9482 bytes
Json compressed   1872 bytes
Bson              9618 bytes
Msgpack           7628 bytes
Cbor              7617 bytes

Comments are hardly needed here; everything is plainly visible in the results: in our JS tests, CBOR turned out to be the slowest of the binary formats.

Conclusions


So what conclusions did we draw from this comparison? After some thought and a look at the results, we concluded that none of the formats fully satisfied us. Yes, MsgPack proved to be quite a good option, easy to use and fairly stable, but after consulting with colleagues we decided to take a fresh look at binary data formats that are not JSON-based: Protobuf, FlatBuffers, Cap'n Proto and Avro. What came of that, and what we ultimately chose, will be the subject of the next article.

Posted by: KyKyPy3uK

Source: https://habr.com/ru/post/312320/

