NFX - Ultra Efficient Binary Serialization in the CLR

Requirements

In this article, we will consider the tasks of transferring complex objects between processes and machines. In our systems there were many places where it was required to move a large number of business objects of various structures, for example:

self-looped object graphs (trees with back references)
arrays of structures (value types)
classes / structures with readonly fields
existing .Net collections instances (Dictionary, List) that use custom-serialization internally
a large number of instances of types specialized for a specific task

It will focus on three aspects that are very important in distributed cluster systems:
')

serialization / deserialization rate
volume of objects in the serialized form
the ability to use existing objects without the need to “decorate” these objects and their fields with auxiliary attributes for serialization

Briefly consider the three aspects listed above.

The first is speed. This is very important to ensure the overall speed of the system in a distributed environment, when to perform a task (for example, a request from one user) you need to execute five to ten requests to other back end machines.

The second is the volume. When transferring / replicating a large amount of data, the budget of the communication channel between data centers should not “swell”.

The third is convenience. It is very inconvenient when only serialization / marshaling requires the creation of “extra” objects, ktr. transfer data. It is also inconvenient to force a programmer of a particular business type to write low-level code for writing an instance into an array of bytes. Maybe this can be done when you have 5-6 classes, but what to do if your system has 30 basic generic classes (ie DeliveryStrategy), each of which is combined with dozens of other classes (this gives hundreds of specific types, ie: DeliveryStrategy, DeliveryStrategy, DeliveryStrategy etc.). I would very much like to have a transparent system that can serialize almost all classes of the domain without the need for additional markup, code, etc. Of course, there are things that do not need to be serialized, for example, some unmanaged resources or delegates, but everything else is usually needed, even such elements as readonly fields of structures and classes.

This article covers the topic of binary serialization. We will not talk about JSON and other formats, as they are not designed to effectively solve the above problems.

The problems of existing serializers

Immediately make a reservation, everything that is written here is relative - depending on what to compare with. If you write / read hundreds of objects per second, then there is no problem. Another thing is when you need to handle tens or even hundreds of thousands of objects per second.

BinaryFormatter - veteran .Net. It is easy to use and better suited to requirements than the DataContractSerializer. Well supports all built-in types of collections and other BCL classes. Supports object versioning. Not interoperable between platforms. It has very large performance disadvantages. It is very slow and serialization produces very massive streams.

DataContractSerializer is a WCF engine. Works faster than BinaryFormatter in many cases. Supports interoperability and versioning. However, this serializer is not intended to solve general-purpose serialization problems per se. It requires specialized decoration of classes and fields with attributes, there are also problems with polymorphism and support for complex types. This is very explicable. The fact is that DataContractSerializer is not intended, by definition, to work with arbitrary types (hence the name).

Protobuf - super speed! Uses Google format, allows you to change the version of objects and super-fast. Interoperability between platforms. It has a major drawback - it does not “understand” all types automatically and does not support complex graphs.

Thrift is a facebook development. Uses its IDL, interoperable between languages, allows you to change the version. Disadvantages: it works rather slowly, consumes a lot of memory, does not support cyclic graphs.

Based on the above characteristics, if you do not take into account the performance, the most suitable serializer for us is BinaryFormatter. He is the most “transparent”. The fact that it does not support interoperability between platforms is not important for us, because we have one platform - Unistack. But the speed of his work is just awful. Very slow and large output.

NFX.Serialization.Slim.SlimSerializer

github.com/aumcode/nfx/blob/master/Source/NFX/Serialization/Slim/SlimSerializer.cs

SlimSerializer is a hybrid serializer with dynamic generation of ser / deser code in runtime for each specific type.

We did not try to make an absolutely universal decision, because then we would have to sacrifice something. We did not do things that. for us are not important, namely:

cross-platform
object version upgrade

Based on the foregoing, the SlimSerializer is not suitable for such tasks, where:

data is stored in storage (for example, on disk)
data is generated / received by processes not on the CLR platform, but Windows.NET - to - Linux.MONO and Linux.MONO - to - Windows.NET work great

SlimSerializer is designed for situations where:

need a high speed (hundreds of thousands of operations per second)
required to save the amount of data transferred
specialized markup for serialization is unrealistic for various reasons (for example, very many classes)

SlimSerializer supports all kinds of edge-cases, for example:

direct serialization of primitive structures and their Nullable equivalents (DateTime, Timespan, Amount, GDID, FID, GUID, MethodSpec, TypeSpec etc.)
direct serialization of basic reference types (byte [], char [], string [])
support for classes and structures with read-only fields
support for custom-serialization ISerializable, OnSerializing, OnSerialized ... etc.
cascaded nested serialization (for example, some type custom-serializes itself and must call the SlimSerializer for some field)
allows you to serialize any supported types (except delegates) to the root
normalizes graphs of any complexity and nesting
Buffer-overflow detection in deserialization (this is necessary when the stream is corrupted and an inadvertent allocation of a large piece of memory is possible)

The development is not easy and has already undergone many optimizations. The results that we have been able to achieve are not finite, we can still speed up, but this will cause a complication of the already non-trivial code.

How it works?

SlimSeralizer uses a streamer, which is taken from the injectable format github.com/aumcode/nfx/blob/master/Source/NFX/IO/StreamerFormats.cs . Streammer formats are needed in order to serialize certain types directly into a stream. For example, we by default support such types as FID, GUID, GDID, MetaHandle etc. The fact is that certain types can be cleverly pack variable-bit encoding. This gives a very large increase in speed and saves space. All integer primitives are written with variable-bit encoding. Thus, in cases when you need a super-fast support of a special type, you can inherit StreamerFormat and add WriteX / ReadX methods. The system itself collects and turns them into lambda functors, which are needed for fast serialization / deserialization.

For each type, TypeDescriptor github.com/aumcode/nfx/blob/master/Source/NFX/Serialization/Slim/TypeSchema.cs . Is built, which dynamically compiles a pair of functors for serialization and deserialization.

SlimSerializer is built on the idea of TypeRegistry and this is the main highlight of the entire serializer github.com/aumcode/nfx/blob/master/Source/NFX/Serialization/Slim/TypeRegistry.cs . Types are written as a string — the full name of the type, but if such a type has already been encountered before, then a type handle of the form “$ 123” is written. This designates the type found in the registry behind the number 123.

When we encounter a reference, we replace it with MetaHandle github.com/aumcode/nfx/blob/master/Source/NFX/IO/MetaHandle.cs , which effectively inlines either a string, if reference is string, or integer, which is the instance number object in the object graph, i.e. a kind of pseudo-pointer handle. During deserialization, everything is reconstructed in the reverse order.

Performance

All the following tests were performed on Intel Core I7 3.2 GHz on a single stream.
Performance SlimSerializer is scaled in proportion to the number of threads. We use specialized thread-static optimization in order not to copy the buffer.

Take the next type as an “experimental.” Pay attention to the various attributes that are needed for DataContractSerializer:

[DataContract(IsReference=true)] [Serializable] public class Perzon { [DataMember]public string FirstName; [DataMember]public string MiddleName; [DataMember]public string LastName; [DataMember]public Perzon Parent; [DataMember]public int Age1; [DataMember]public int Age2; [DataMember]public int? Age3; [DataMember]public int? Age4; [DataMember]public double Salary1; [DataMember]public double? Salary2; [DataMember]public Guid ID1; [DataMember]public Guid? ID2; [DataMember]public Guid? ID3; [DataMember]public List<string> Names1; [DataMember]public List<string> Names2; [DataMember]public int O1 = 1; [DataMember]public bool O2 = true; [DataMember]public DateTime O3 = App.LocalizedTime; [DataMember]public TimeSpan O4 = TimeSpan.FromHours(12); [DataMember]public decimal O5 = 123.23M; }

And now we do many times 500,000 objects:

Slim serialize: 464 252 ops / sec; size: 94 bytes
Slim deser: 331,564 ops / sec
BinFormatter serialize: 34,702 ops / sec: size: 1188 bytes
BinFormatter deser: 42 702 ops / sec
DataContract serialize: 108 932 ops / sec: size: 773 bytes
DataContract deser: 41 985 ops / sec

Slim serialization speed to BinFormatter: 13.37 times faster.
Slim deserialization speed to BinFormatter: 7.76 times faster.
Slim volume to BinFormatter: 12.63 times smaller.

Slim serialization speed to DataContract: 4.26 times faster.
Slim deserialization speed to DataContract: 7.89 times faster.
Slim volume to DataContract: 8.22 times smaller.

And now we try a complex object-graph from several dozen mutually referring objects, including arrays and sheets (many times 50,000 objects):

Slim serialize: 12,036 ops / sec; size: 4 466 bytes
Slim deser: 11,322 ops / sec
BinFormatter serialize: 2,055 ops / sec: size: 7,393 bytes
BinFormatter deser: 2,277 ops / sec
DataContract serialize: 3,943 ops / sec: size: 20,246 bytes
DataContract deser: 1,510 ops / sec

Slim serialization speed to BinFormatter: 5.85 times faster.
Slim deserialization speed to BinFormatter: 4.97 times faster.
Slim volume to BinFormatter: 1.65 times smaller.

Slim serialization speed to DataContract: 3.05 times faster.
Slim de-serialization rate to DataContract: 7.49 times faster.
Slim volume to DataContract: 4.53 times smaller.

Note the difference in the serialization of the typed class (the first case is “Perzon”) and the second (many objects). In the second case, there is a complex graph with cyclic interconnections of objects and therefore Slim begins to approach (slow down) in speed to Microsoft. However, it still exceeds the latter at least 4 times in speed and one and a half times in volume. The code for this test is: github.com/aumcode/nfx/blob/master/Source/Testing/Manual/WinFormsTest/SerializerForm2.cs#L51-104

And here is the comparison with Apache.Thrift: blog.aumcode.com/2015/03/apache-thrift-vs-nfxglue-benchmark.html .
Although these numbers are not based on pure serialization, but on the whole of NFX.Glue (which includes messaging, TCP networking, security etc), the speed is very dependent on the SlimSerializer, on which the native NFX.Glue builds.

 Each test is: 64,000 calls each returning a set of 10 rows each having 10 fields 640,000 total rows pumped Glue: took 1982 msec @ 32290 calls/sec Thrift1: took 65299 msec @ 980 calls/sec 32x slower than Glue Thrift2: took 44925 msec @ 1424 calls/sec 22x slower than Glue ================================================================= Glue is: 32 times faster than Thrift BinaryProtocol 22 times faster than Thrift CompactProtocol

Results

The NFX SlimSerializer delivers exceptionally high and predictably stable performance, saving processor and memory resources. This is what opens up opportunities for high-load technologies on the CLR platform, allowing hundreds of thousands of requests per second to be processed at each node of distributed systems.

SlimSerializer has several limitations due to the inability to create a practical “one size fits all” system. These restrictions: lack of versioning of data structures, delegate serialization, interoperability with other platforms except CLR. However, it is worth noting that in the Unistack concept (unified software stack for all nodes of the system), these restrictions are generally invisible except for the lack of versioning, i.e. SlimSerializer is not intended for long-term storage of data on the disk, if the data structure may change.

NFX.Glue's ultra-efficient native banding allows servicing 100,000 + two-way calls per second thanks to the specialized optimizations used in the serializer, while not requiring the programmer to do extra work on creating extra data-transfer types

youtu.be/m5zckEbXAaA

youtu.be.com/KyhYwaxg2xc

SlimSerializer significantly overtakes the tools built into .NET, allowing you to efficiently handle complex graphs of interrelated objects (which neither Protobuf nor Thrift can do).

Source: https://habr.com/ru/post/257247/

All Articles