Writing a serializer for a network game in C++11

I was inspired to write this post by the wonderful article “Reading and Writing Packets” on the Gaffer on Games blog, and by an irrepressible urge to automate everything (especially writing C++ code!).

Let's start with the problem statement. We are writing a network game (an MMORPG right away, of course!), and regardless of the architecture we constantly need to send and receive data over the network. Most likely we will need several different packet types (player actions, game world updates, plain authentication, after all!), and each of them needs a read function and a write function. It would seem easy enough to sit down and calmly write those two functions, but we immediately run into a number of problems.

The choice of format. If we were writing a simple JavaScript game, JSON or some homemade relative of it would do. But we are writing a serious multiplayer game that is sensitive to traffic; we cannot afford to send ~16 bytes per float instead of four. So we need a raw binary format. However, binary data complicates debugging; it would be great if we could switch formats at any time without rewriting all our read and write functions.
Security issues. The first rule of network games: do not trust data sent by the client! The reading function must be able to stop at any point and return false if something went wrong. Using exceptions for this is considered a bad idea, since they are too slow: even if a script kiddie cannot break your server, he can slow it down considerably with a continuous stream of exceptions. But hand-writing code that consists of nothing but ifs and returns is unpleasant and unaesthetic.
Duplicate code. The read and write functions are similar, but not identical. Changing the structure of a packet means changing two functions, and sooner or later you will forget to change one of them or will change them differently, which leads to bugs that are hard to catch. Gaffer on Games rightly points out the same problem.

If you want to see how Bender kept his promise and solved these problems along the way, read on.

Read and write streams

Let's start with the initial assumptions. We want to be able to read and write both text and binary formats. Let the text format be read from and written to the standard STL streams (std::basic_istream and std::basic_ostream, respectively). For the binary format we will have our own BitStream class that supports the same interface as the STL streams (at least the << and >> operators, the rdstate() method, which returns 0 if there were no read/write errors and a non-zero value otherwise, and the ability to accept manipulators). It would also be great if it could read and write data whose length is not a multiple of eight bits.

A possible interface for the BitStream class:

    #include <cstddef>
    #include <cstdint>
    #include <type_traits>

    using byte = uint8_t;

    class BitStream {
        byte* bdata;
        uint64_t position;
        uint64_t length, allocated;
        int mode;   // 0 = read, other = write
        int state;  // 0 = OK
        void reallocate(size_t);
    public:
        static const int MODE_READ = 0;   // an enum class would arguably
        static const int MODE_WRITE = 1;  // be cleaner here
        inline int get_mode(void) const noexcept { return mode; }
        BitStream(void);             // an empty stream, for writing
        BitStream(void*, uint64_t);  // a stream over an existing buffer, for reading
        ~BitStream(void);
        int rdstate(void) const;
        // write how_much bits taken from bits:
        void write_bits(char how_much, uint64_t bits);
        // read how_much bits and return them:
        uint64_t read_bits(char how_much);
        void* data(void);
        BitStream& operator<<(BitStream&(*func)(BitStream&));  // manipulators
        BitStream& operator>>(BitStream&(*func)(BitStream&));  // manipulators
    };

    template<typename Int>
    typename std::enable_if<std::is_integral<Int>::value, BitStream&>::type
    operator<<(BitStream& out, const Int& arg);  // writes 8*sizeof(Int) bits

    template<typename Int>
    typename std::enable_if<std::is_integral<Int>::value, BitStream&>::type
    operator>>(BitStream& in, Int& arg);  // reads 8*sizeof(Int) bits
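To make the idea of lengths that are not a multiple of eight bits more concrete, here is a minimal sketch of bit packing. The TinyBitBuffer name, the std::vector storage and the least-significant-bit-first order are assumptions made for illustration; this is not the BitStream implementation itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BitStream: packs values bit by bit, LSB first.
class TinyBitBuffer {
    std::vector<std::uint8_t> bytes_;
    std::uint64_t write_pos_ = 0;  // number of bits written so far
    std::uint64_t read_pos_ = 0;   // number of bits read so far
    int state_ = 0;                // 0 = OK, in the spirit of rdstate()
public:
    // Append the how_much low-order bits of bits.
    void write_bits(unsigned how_much, std::uint64_t bits) {
        if (how_much > 64) { state_ = 1; return; }
        for (unsigned i = 0; i < how_much; ++i, ++write_pos_) {
            if (write_pos_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((bits >> i) & 1u)
                bytes_.back() |= static_cast<std::uint8_t>(1u << (write_pos_ % 8));
        }
    }
    // Read how_much bits; on underrun set the error state and return 0.
    std::uint64_t read_bits(unsigned how_much) {
        if (how_much > 64 || read_pos_ + how_much > write_pos_) {
            state_ = 1;
            return 0;
        }
        std::uint64_t value = 0;
        for (unsigned i = 0; i < how_much; ++i, ++read_pos_) {
            std::uint64_t bit = (bytes_[read_pos_ / 8] >> (read_pos_ % 8)) & 1u;
            value |= bit << i;
        }
        return value;
    }
    int rdstate() const { return state_; }
    std::size_t size_bytes() const { return bytes_.size(); }
};

inline void tiny_bit_buffer_example() {
    TinyBitBuffer buf;
    buf.write_bits(3, 5u);     // 3 bits
    buf.write_bits(10, 777u);  // 10 more bits, 13 in total
    assert(buf.size_bytes() == 2u);  // 13 bits fit into two bytes
    assert(buf.read_bits(3) == 5u);
    assert(buf.read_bits(10) == 777u);
}
```

A three-bit value and a ten-bit value occupy two bytes here instead of the sixteen that two uint64_t values would take, which is the whole point of a bit-level stream.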

Why enable_if here and how does it work?

std::enable_if<condition, T> checks the condition and, if it holds (that is, evaluates to true), defines the member type std::enable_if<...>::type as the user-specified type T (void by default). If the condition does not hold, std::enable_if<...>::type is simply not defined, and referring to it is an error. Such an error stops our template from being instantiated, but it does not stop the program from compiling, because substitution failure is not an error (SFINAE): a failure while substituting arguments into a template declaration just removes that template from the set of candidates. The program compiles successfully if another operator<< with a suitable signature is defined somewhere; otherwise the compiler reports that there is simply no suitable function to call (a helpful compiler may add that it tried this template, but substitution failed).
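As a small illustration of the mechanism (a sketch with a made-up function name, takes_integer, which is not part of the serializer), two overloads with mutually exclusive std::enable_if conditions let the compiler quietly discard the one whose condition is false:

```cpp
#include <type_traits>

// Participates in overload resolution only for integral T.
template<typename T>
constexpr typename std::enable_if<std::is_integral<T>::value, bool>::type
takes_integer(const T&) { return true; }

// Participates only for non-integral T; exactly one overload survives.
template<typename T>
constexpr typename std::enable_if<!std::is_integral<T>::value, bool>::type
takes_integer(const T&) { return false; }

static_assert(takes_integer(42), "int selects the first overload");
static_assert(!takes_integer(3.14), "double selects the second overload");
```

Neither call is ambiguous and neither produces an error, even though for each argument one of the two declarations is ill-formed after substitution.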

Serializer interface

Clearly, we now need the basic building blocks of the serializer: functions or objects that can serialize and parse integers or floating-point numbers. However, we (of course!) want extensibility, so that a programmer can write a “brick” that serializes any data type of their own and use it in our serializer. What should such a brick look like? I suggest the simplest format:

    struct IntegerField {
        template<class OutputStream>
        static void serialize(OutputStream& out, int t) {
            out << t;  // the stream decides how the value is encoded
        }
        // deserialize must return bool, so that the caller can stop on bad input
        template<class InputStream>
        static bool deserialize(InputStream& in, int& t) {
            in >> t;  // the stream decides how the value is decoded
            // true if the stream reports no errors (note that for standard
            // streams rdstate() also includes eofbit)
            return !in.rdstate();
        }
    };

Just a class with two static methods and, possibly, any number of overloads of them. (So instead of one template method you may write several: one for std::basic_ostream, one for BitStream, and as many more for other streams as the programmer likes.)
For example, for serialization and parsing of a dynamic array of elements, the interface might look like this:

    template<typename T>
    struct ArrayField {
        template<class OutputStream>
        static void serialize(OutputStream& out, size_t n, const T* data);
        template<class OutputStream>
        static void serialize(OutputStream& out, const std::vector<T>& data);

        template<class InputStream>
        static bool deserialize(InputStream& in, size_t& n, T*& data);
        template<class InputStream>
        static bool deserialize(InputStream& in, std::vector<T>& data);
    };
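For the text format, the std::vector overloads could be implemented roughly as follows. This is only a sketch: the VectorField name, the “count, then elements” layout, the kMaxItems limit and the use of the stream's bool conversion instead of rdstate() are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

template<typename T>
struct VectorField {
    // Assumed upper bound, so that a hostile length cannot force a huge allocation.
    static const std::size_t kMaxItems = 1024;

    template<class OutputStream>
    static void serialize(OutputStream& out, const std::vector<T>& data) {
        out << data.size();
        for (const T& item : data) out << ' ' << item;
    }

    template<class InputStream>
    static bool deserialize(InputStream& in, std::vector<T>& data) {
        std::size_t n = 0;
        if (!(in >> n) || n > kMaxItems) return false;  // do not trust the length
        data.clear();
        for (std::size_t i = 0; i < n; ++i) {
            T item;
            if (!(in >> item)) return false;  // stop at the first bad element
            data.push_back(item);
        }
        return true;
    }
};

inline void vector_field_example() {
    std::ostringstream out;
    VectorField<int>::serialize(out, std::vector<int>{1, 2, 3});
    assert(out.str() == "3 1 2 3");
}
```

The early returns in deserialize are exactly the “ifs and returns” that the rest of the article tries to write once per field type instead of once per packet.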

Auxiliary templates `can_serialize` and `can_deserialize`

Next, we need to be able to check whether a given field can start serialization or parsing with a given set of arguments. This brings us to a more detailed discussion of variadic templates and SFINAE.

Let's start with the code:

    template<typename... Types>
    struct TypeList {  // just a compile-time list of types
        static const size_t length = sizeof...(Types);
    };

    template<typename F, typename L> class can_serialize;

    template<typename F, typename... Ts>
    class can_serialize<F, TypeList<Ts...>> {
        template <typename U>
        static char func(decltype(U::serialize(std::declval<Ts>()...))*);
        template <typename U>
        static long func(...);
    public:
        static const bool value = ( sizeof(func<F>(0)) == sizeof(char) );
    };

What is this? It is a structure that determines at compile time, given a class F and a list of types L = TypeList<Types...>, whether the function F::serialize can be called with arguments of these types. For example,

 can_serialize<IntegerField, TypeList<BitStream&, int> >::value

equals 1, like

 can_serialize<IntegerField, TypeList<BitStream&, char&> >::value

(because char& converts nicely to int), whereas

 can_serialize<IntegerField, TypeList<BitStream&> >::value

equals 0, since IntegerField does not provide a serialize method that accepts only an output stream.

How does it work? That is a more subtle question; let's figure it out.

Let's start with the TypeList class. Here we use the variadic templates promised by Bender, that is, templates with a variable number of arguments. The TypeList class template accepts an arbitrary number of type arguments, which are placed in a parameter pack named Types. (I wrote more about how to use parameter packs in the previous article.) Our TypeList class doesn't do anything useful, but in general quite a lot can be done with a parameter pack. For example, the construct

 std::declval<Ts>()...

for a parameter pack of length 4 containing the types T1, T2, T3, T4 expands at compile time into

 std::declval<T1>(), std::declval<T2>(), std::declval<T3>(), std::declval<T4>()
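The same expansion rule applies to any pattern, not only to std::declval. A minimal sketch (the PointersTo helper is made up for this example) in which the pattern Ts* is repeated for every element of the pack:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

template<typename... Types>
struct TypeList {
    static const std::size_t length = sizeof...(Types);
};

// The pattern "Ts*" is repeated for every element of the pack.
template<typename... Ts>
struct PointersTo {
    using type = std::tuple<Ts*...>;
};

static_assert(TypeList<int, char, double>::length == 3,
              "sizeof... counts the elements of the pack");
static_assert(std::is_same<PointersTo<int, char>::type,
                           std::tuple<int*, char*>>::value,
              "Ts*... expands to int*, char*");
static_assert(std::is_same<PointersTo<>::type, std::tuple<>>::value,
              "an empty pack expands to nothing");
```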

Next. We have a can_serialize template that accepts a class F and a list of types L, and a partial specialization that gives us access to the types in the list. (If you request can_serialize<F, L> where L is not a list of types, the compiler will complain about an undefined template, and rightly so.) All the magic happens in this partial specialization.

Its code contains a call to func<F>(0) inside sizeof. The compiler has to determine which overload of func is called in order to compute the size of the returned type, but it will not try to compile the call, so we do not risk errors like “I can't find your function” (or “there is some nonsense in the function body”, if there were a body). First it tries the first definition of func, which looks rather intricate:

    template <typename U>
    static char func( decltype( U::serialize( std::declval<Ts>()... ) )* );

The decltype construct yields the type of the expression in parentheses; for example, decltype(10) is the same as int. But, like sizeof, it does not evaluate the expression, and this is what makes the trick with std::declval work. std::declval is a function that pretends to return an rvalue reference of the desired type; it makes the expression U::serialize( std::declval<Ts>()... ) meaningful and lets it mimic a real call to U::serialize, even if half of the arguments have no default constructor and we cannot simply write U::serialize( Ts()... ) (not to mention that the function may require lvalue references! By the way, in that case declval yields an lvalue reference, because by the rules of C++ T& && is T&). It has no implementation, of course; writing in ordinary code

 int a = std::declval<int>();

is a bad idea.

So. If the call inside decltype is impossible (there is no function with such a signature, or substituting into it fails for some reason), the compiler considers that a substitution failure has occurred, which, as we know, is not an error (SFINAE). It calmly moves on and tries the next definition of func, which can no longer cause any problems. That other function, however, returns a result of a different size, which is easy to detect with sizeof. (Actually it is not quite that easy: sizeof(long) may well equal sizeof(char) on exotic platforms, but we omit these details; all of this is fixable.)
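Inside decltype, on the other hand, std::declval is perfectly fine, because nothing is ever evaluated. A small sketch (Widget is a made-up type that is declared but never defined):

```cpp
#include <type_traits>
#include <utility>

// Hypothetical type with no default constructor; nothing here is ever defined.
struct Widget {
    explicit Widget(int);
    double weight() const;
};

static_assert(std::is_same<decltype(10), int>::value, "decltype(10) is int");

// We can ask for the result type of weight() without constructing a Widget.
static_assert(std::is_same<decltype(std::declval<Widget>().weight()),
                           double>::value, "");

// declval<T>() is an rvalue reference; declval<T&>() collapses to an lvalue one.
static_assert(std::is_same<decltype(std::declval<int>()), int&&>::value, "");
static_assert(std::is_same<decltype(std::declval<int&>()), int&>::value, "");
```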
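One possible way to sidestep the sizeof(long) versus sizeof(char) question is to let the two overloads return std::true_type and std::false_type and read the answer from the type itself. This is a sketch of an alternative, not the article's implementation; std::ostream stands in for BitStream here so that the example is self-contained.

```cpp
#include <ostream>
#include <type_traits>
#include <utility>

template<typename... Types> struct TypeList {};

template<typename F, typename L> class can_serialize;

template<typename F, typename... Ts>
class can_serialize<F, TypeList<Ts...>> {
    // Chosen when the call U::serialize(Ts...) is well-formed.
    template<typename U>
    static std::true_type func(decltype(U::serialize(std::declval<Ts>()...))*);
    // Fallback: always viable, but the worst possible match.
    template<typename U>
    static std::false_type func(...);
public:
    static const bool value = decltype(func<F>(0))::value;
};

struct IntegerField {
    template<class OutputStream>
    static void serialize(OutputStream& out, int t) { out << t; }
};
struct NotAField {};  // has no serialize at all

static_assert(can_serialize<IntegerField, TypeList<std::ostream&, int>>::value, "");
static_assert(can_serialize<IntegerField, TypeList<std::ostream&, char&>>::value, "");
static_assert(!can_serialize<IntegerField, TypeList<std::ostream&>>::value, "");
static_assert(!can_serialize<NotAField, TypeList<std::ostream&, int>>::value, "");
```

The result no longer depends on the sizes of built-in types, only on which overload was selected.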

As food for thought, here is also the code of the can_deserialize template, which is slightly more complicated: it not only checks whether F::deserialize can be called with the specified argument types, but also makes sure that the result type is bool.

    template<typename F, typename L> class can_deserialize;

    template<typename F, typename... Ts>
    class can_deserialize<F, TypeList<Ts...>> {
        template <typename U>
        static char func(
            typename std::enable_if<
                std::is_same<decltype(U::deserialize(std::declval<Ts>()...)), bool>::value
            >::type*
        );
        template <typename U>
        static long func(...);
    public:
        using type = can_deserialize;
        static const bool value = ( sizeof(func<F>(0)) == sizeof(char) );
    };
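To see the extra return-type check at work, here is a self-contained sketch that repeats the template above and applies it to two made-up fields, ReturnsBool and ReturnsVoid (both are declared only, since nothing is ever called). It relies on sizeof(long) differing from sizeof(char), as discussed earlier.

```cpp
#include <istream>
#include <type_traits>
#include <utility>

template<typename... Types> struct TypeList {};

template<typename F, typename L> class can_deserialize;

template<typename F, typename... Ts>
class can_deserialize<F, TypeList<Ts...>> {
    template<typename U>
    static char func(
        typename std::enable_if<
            std::is_same<decltype(U::deserialize(std::declval<Ts>()...)), bool>::value
        >::type*);
    template<typename U>
    static long func(...);
public:
    static const bool value = (sizeof(func<F>(0)) == sizeof(char));
};

// Hypothetical fields for the check below.
struct ReturnsBool {
    template<class InputStream> static bool deserialize(InputStream& in, int& t);
};
struct ReturnsVoid {
    template<class InputStream> static void deserialize(InputStream& in, int& t);
};

static_assert(can_deserialize<ReturnsBool, TypeList<std::istream&, int&>>::value,
              "callable and returns bool");
static_assert(!can_deserialize<ReturnsVoid, TypeList<std::istream&, int&>>::value,
              "callable, but the result is not bool");
static_assert(!can_deserialize<ReturnsBool, TypeList<std::istream&, int>>::value,
              "an rvalue int cannot bind to int&");
```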

Assembling packets from bricks

Finally, it is time for the substantive part of the serializer. In short, we want a Schema class template that provides serialize and deserialize functions assembled from the “bricks”:

    using MyPacket = Schema<IntegerField, IntegerField, FloatField, ArrayField<float>>;
    MyPacket::serialize(std::cout, 10, 15, 0.3, 0, nullptr);

    int da, db;
    float fc;
    std::vector<float> my_vector;
    bool success = MyPacket::deserialize(std::cin, da, db, fc, my_vector);

Let's start with the simple part: the declaration of the class template (with a variable number of arguments, nya!) and the end of the recursion.

    template<typename... Fields> struct Schema;

    template<> struct Schema<> {
        template<typename OutputStream>
        static void serialize(OutputStream&) {
            // nothing left to write!
        }
        template<typename InputStream>
        static bool deserialize(InputStream&) {
            return true;  // nothing left to read, so it is a success!
        }
    };
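Before dealing with fields that take a variable number of arguments, it may help to look at the recursion over fields in isolation. The sketch below makes a simplifying assumption that the article does not make: every field consumes exactly one argument. The OneArgSchema name is made up for the example.

```cpp
#include <cassert>
#include <sstream>
#include <utility>

struct IntegerField {
    template<class OS> static void serialize(OS& out, int t) { out << t; }
};
struct CharField {
    template<class OS> static void serialize(OS& out, char t) { out << t; }
};

template<typename... Fields> struct OneArgSchema;

template<> struct OneArgSchema<> {
    template<class OS> static void serialize(OS&) {}  // end of the recursion
};

template<typename F, typename... Fields>
struct OneArgSchema<F, Fields...> {
    template<class OS, typename Arg, typename... Rest>
    static void serialize(OS& out, Arg&& arg, Rest&&... rest) {
        // Simplification: the first field takes exactly one argument...
        F::serialize(out, std::forward<Arg>(arg));
        // ...and the remaining fields get the remaining arguments.
        OneArgSchema<Fields...>::serialize(out, std::forward<Rest>(rest)...);
    }
};

inline void one_arg_schema_example() {
    std::ostringstream out;
    OneArgSchema<IntegerField, IntegerField, CharField>::serialize(out, 777, 6666, 'a');
    assert(out.str() == "7776666a");
}
```

This version cannot handle a field such as ArrayField, whose serialize takes two arguments, which is why the real Schema needs can_serialize to find out how many arguments each field accepts.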

But what should serialize look like in a schema with a non-zero number of fields? We cannot compute in advance the types accepted by the serialize functions of all these fields and concatenate them: that would require invocation type traits, which are not yet part of the standard. The only option left is to write a function with a variable number of arguments and hand each field as many of them as it can eat; this is where our hard-won can_serialize comes in handy.

For this recursion over the number of arguments we need an auxiliary class (the main Schema class will handle the recursion over the number of fields). Let's declare it without skimping on template parameters:

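    template<
        typename F,               // the field whose serialize we are going to call
        typename NextSerializer,  // the schema for the "tail" of the fields
        typename OS,              // the output stream type
        typename TL,              // the list of argument types offered to F::serialize
        bool can_serialize        // whether F::serialize can be called with them
    > struct SchemaSerializer;

Then the partial specialization of Schema, which finally implements the recursion over the number of fields, takes the form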

    template<typename F, typename... Fields>
    struct Schema<F, Fields...> {
        template<
            typename OutputStream,  // the output stream type
            typename... Types       // the types of all remaining arguments
        >
        static void serialize(OutputStream& out, Types&&... args) {
            // delegate to the helper that finds out how many arguments F can take:
            SchemaSerializer<
                F,                   // the current field
                Schema<Fields...>,   // the schema for the remaining fields
                OutputStream&,       // the output stream type
                TypeList<Types...>,  // all the argument types we have
                can_serialize<F, TypeList<OutputStream&, Types...>>::value  // !!!
            >::serialize(out, std::forward<Types>(args)...);
        }

        // . . . (deserialize is written analogously)
    };

Now let's write the recursion for SchemaSerializer. Let's start with the simple part, the end:

    template<typename F, typename NextSerializer, typename OS>
    struct SchemaSerializer<F, NextSerializer, OS, TypeList<>, false> {
        // The argument list is exhausted, and F::serialize still cannot be
        // called, not even with the stream alone. There is nothing to define
        // here, so the user gets an error along the lines of
        // "no such function serialize(...)" at the point of the call.
    };

    template<typename F, typename NextSerializer, typename OS>
    struct SchemaSerializer<F, NextSerializer, OS, TypeList<>, true> {
        // No arguments are left for the field, and yet (surprise!) F::serialize
        // can be called with the stream alone.
        template<typename... TailArgs>  // everything here goes to the next fields
        static void serialize(OS& out, TailArgs&&... targs) {
            F::serialize(out);  // the field takes nothing besides the stream
            // pass everything else on to the next serializer:
            NextSerializer::serialize(out, std::forward<TailArgs>(targs)...);
        }
    };

Here we come to the second concept promised by Bender: perfect forwarding. We received extra arguments (possibly zero, but most likely not) and want to pass them on to NextSerializer::serialize. In templates this is known as the perfect forwarding problem.

Perfect forwarding

Suppose you want to write a wrapper around a template function f that takes one argument. For example,

    template<typename T>
    void better_f(T arg) {
        std::cout << "I'm so much better..." << std::endl;
        f(arg);
    }

This looks fine, but it breaks as soon as f takes its argument by lvalue reference T& rather than by value T: T is deduced as a non-reference type, so f receives a reference to the local copy arg instead of the caller's object. The solution is simple:

    template<typename T>
    void better_f(T& arg) {
        std::cout << "I'm so much better..." << std::endl;
        f(arg);
    }

And again it breaks right away if f takes its argument by value: literals and other rvalues could be passed to the original function, but not to the new one.
We have to write both variants, so that the compiler can choose and compatibility is preserved in both cases:

    template<typename T>
    void better_f(T& arg) {
        std::cout << "I'm so much better..." << std::endl;
        f(arg);
    }

    template<typename T>
    void better_f(const T& arg) {
        std::cout << "I'm so much better..." << std::endl;
        f(arg);
    }

And this whole circus is for one function with one argument. As the number of arguments grows, the number of required overloads for a full wrapper will grow exponentially.

To combat this, C++11 introduces rvalue references and new type deduction rules. Now you can simply write

    template<typename T>
    void better_f(T&& arg) {
        std::cout << "I'm so much better..." << std::endl;
        // and now what? . . .
    }

The && modifier has a special meaning in the context of type deduction (although it is easily confused with an ordinary rvalue reference). If an lvalue of type type is passed to the function, T is deduced as type&, and T&& collapses to type&; if an rvalue of type type is passed, T is deduced as type, and the parameter has the type type&&. The last thing needed for true perfect forwarding without extra copies of the arguments is std::forward:

    template<typename T>
    void better_f(T&& arg) {
        std::cout << "I'm so much better..." << std::endl;
        f(std::forward<T>(arg));
    }

std::forward<T> leaves lvalue references untouched and casts everything else to an rvalue reference; so after the first wrapper it is an rvalue reference, not the object itself, that travels further down the chain of wrappers (if any), which eliminates unnecessary copies.
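The difference std::forward makes can be observed directly. In this sketch f is a made-up overload set that reports which kind of argument it received, and naive_f is a wrapper that forgets to forward:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload set standing in for f: reports what it received.
inline const char* f(std::string&)       { return "lvalue"; }
inline const char* f(const std::string&) { return "const lvalue"; }
inline const char* f(std::string&&)      { return "rvalue"; }

// Forwards the value category of the argument unchanged.
template<typename T>
const char* better_f(T&& arg) { return f(std::forward<T>(arg)); }

// Without std::forward: a named parameter is always an lvalue.
template<typename T>
const char* naive_f(T&& arg) { return f(arg); }

inline void forwarding_example() {
    std::string s = "x";
    const std::string cs = "y";
    assert(std::string(better_f(s)) == "lvalue");
    assert(std::string(better_f(cs)) == "const lvalue");
    assert(std::string(better_f(std::string("z"))) == "rvalue");
    assert(std::string(naive_f(std::string("z"))) == "lvalue");  // rvalue-ness lost
}
```

A single better_f template covers all three cases that previously required separate overloads.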

Back to the serializer

So, the construct

 NextSerializer::serialize(out, std::forward<TailArgs>(targs)...);

performs perfect forwarding, sending all the "extra" arguments unchanged further down the chain of serializers.

Let's continue the recursion for SchemaSerializer. The recursion step for can_serialize = false:

    template<typename F, typename NextSerializer, typename OS, typename... Types>
    struct SchemaSerializer<F, NextSerializer, OS, TypeList<Types...>, false>:
        // F::serialize cannot be called with this many arguments, so drop
        // the last one and try again; the base class provides serialize
        public SchemaSerializer<F, NextSerializer, OS,
            typename Head<TypeList<Types...>>::Result,  // the list without its last element
            can_serialize<F, typename Head<TypeList<OS, Types...>>::Result>::value  // !!!
        > {
        // nothing to add here ¯\_(ツ)_/¯
    };

The implementation of the auxiliary class Head, which cuts the last element off a type list:

    template<typename T> struct Head;

    // first we need a way to glue type lists together...
    template<typename... Ts> struct Concatenate;

    template<> struct Concatenate<> {
        using Result = TypeList<>;
    };
    template<typename... A> struct Concatenate<TypeList<A...>> {
        using Result = TypeList<A...>;
    };
    template<typename... A, typename... B>
    struct Concatenate<TypeList<A...>, TypeList<B...>> {
        using Result = TypeList<A..., B...>;
    };
    template<typename... A, typename... Ts>
    struct Concatenate<TypeList<A...>, Ts...> {
        using Result = typename Concatenate<
            TypeList<A...>,
            typename Concatenate<Ts...>::Result
        >::Result;
    };

    // Unfortunately, C++ does not let us simply write
    //     template<typename T, typename... Ts>
    //     struct Head<TypeList<Ts..., T>>,
    // because a pack that is not last in the list cannot be deduced.
    template<typename T, typename... Ts>
    struct Head<TypeList<T, Ts...>> {
        using Result = typename Concatenate<
            TypeList<T>,
            typename Head<TypeList<Ts...>>::Result
        >::Result;
    };
    template<typename T, typename Q> struct Head<TypeList<T, Q>> {
        using Result = TypeList<T>;
    };
    template<typename T> struct Head<TypeList<T>> {
        using Result = TypeList<>;
    };
    template<> struct Head<TypeList<>> {
        using Result = TypeList<>;
    };
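For comparison, the same operation can be sketched more compactly with a single Prepend helper instead of Concatenate. Prepend and the exact set of specializations are assumptions of this sketch, chosen so that no two specializations ever match the same list:

```cpp
#include <type_traits>

template<typename... Ts> struct TypeList {};

// Prepend<T, TypeList<Ts...>>::Result is TypeList<T, Ts...>.
template<typename T, typename L> struct Prepend;
template<typename T, typename... Ts>
struct Prepend<T, TypeList<Ts...>> { using Result = TypeList<T, Ts...>; };

// Head<L>::Result is L without its last element.
template<typename L> struct Head;
template<> struct Head<TypeList<>> { using Result = TypeList<>; };
template<typename T> struct Head<TypeList<T>> { using Result = TypeList<>; };
template<typename T, typename U, typename... Ts>
struct Head<TypeList<T, U, Ts...>> {
    // Keep T and recurse on the rest, which still has at least one element.
    using Result = typename Prepend<T, typename Head<TypeList<U, Ts...>>::Result>::Result;
};

static_assert(std::is_same<Head<TypeList<int, char, double>>::Result,
                           TypeList<int, char>>::value, "drops the last element");
static_assert(std::is_same<Head<TypeList<int>>::Result, TypeList<>>::value, "");
static_assert(std::is_same<Head<TypeList<>>::Result, TypeList<>>::value, "");
```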

Recursion step for can_serialize = true :

    template<typename F, typename NextSerializer, typename OS, typename... Types>
    struct SchemaSerializer<F, NextSerializer, OS, TypeList<Types...>, true> {
        template<typename... TailTypes>  // the arguments for the following fields
        static void serialize(OS& out, Types... args, TailTypes&&... targs) {
            F::serialize(out, std::forward<Types>(args)...);
            // pass everything else on to the next serializer:
            NextSerializer::serialize(out, std::forward<TailTypes>(targs)...);
        }
    };

Aaand... that's all! With this, our serializer is ready (in the most general terms), and the simplest code

    using MyPacket = Schema< IntegerField, IntegerField, CharField >;
    MyPacket::serialize(std::cout, 777, 6666, 'a');

successfully prints

7776666a

But how do we deserialize that? We still need to add spaces. A decent (that is, sufficiently abstract for true C++) way to do this is to add a field separator:

    template< class CharT, class Traits >
    std::basic_ostream<CharT, Traits>& delimiter( std::basic_ostream<CharT, Traits>& os ) {
        return os << CharT(' ');  // for std::ostream, fields are separated by a space
    }

    template< class CharT, class Traits >
    std::basic_istream<CharT, Traits>& delimiter( std::basic_istream<CharT, Traits>& is ) {
        return is;  // nothing to do: operator>> skips whitespace on its own
    }

    BitStream& delimiter(BitStream& bs) {
        return bs;  // the binary format needs no separators at all
    }

std::basic_ostream can accept functions that take and return a reference to it (how do you think std::endl and std::flush work?), so now all the serialization code is rewritten as

    serialize(OS& out, ...) {
        F::serialize(out, ...);
        out << delimiter;  // separate this field from the next one
        NextSerializer::serialize(out, ...);
    }

After that we get the properly separated (and ready for deserialization)

 777 6666 a
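The effect of the separator on reading can be checked with plain streams. In this sketch write_packet and read_packet are made-up helpers for one fixed layout (int, int, char); only delimiter is taken from the article:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

template<class CharT, class Traits>
std::basic_ostream<CharT, Traits>& delimiter(std::basic_ostream<CharT, Traits>& os) {
    return os << CharT(' ');
}

// Hypothetical helpers for one fixed packet layout: int, int, char.
inline std::string write_packet(int a, int b, char c) {
    std::ostringstream out;
    out << a << delimiter << b << delimiter << c;  // used like std::endl
    return out.str();
}

inline bool read_packet(const std::string& text, int& a, int& b, char& c) {
    std::istringstream in(text);
    in >> a >> b >> c;  // operator>> skips the separators
    return !in.fail();
}

inline void delimiter_example() {
    assert(write_packet(777, 6666, 'a') == "777 6666 a");
    int a = 0, b = 0;
    char c = 0;
    assert(read_packet("777 6666 a", a, b, c) && a == 777 && b == 6666 && c == 'a');
    // Without separators the two integers run together and the read fails.
    assert(!read_packet("7776666a", a, b, c));
}
```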

But there is still one small detail...

Nesting

Since our schemas have the same interface as simple fields, why not build a schema out of schemas?

    using MyBigPacket = Schema<MyPacket, IntegerField, MyPacket>;
    MyBigPacket::serialize(std::cout, 11, 22, 'a', 33, 44, 55, 'b');

We compile, aaand... we get no matching function for call to 'serialize'. What's the matter?

The thing is that Schema::serialize eats every argument it is given. The outer schema sees that Schema::serialize can be called with all the arguments passed in, and so it calls it. The compiler then compiles the call, sees that the last four arguments are left over (candidate function template not viable: requires 1 argument, but 5 were provided), and reports an error.

Here the advantage of SFINAE has turned into a disadvantage. The compiler does not compile a function before deciding whether it can be called with the given arguments; it only looks at its type. To eliminate this undesirable behavior, we must make Schema::serialize have an invalid type whenever unsuitable arguments are passed to it.

We will do this for Schema and SchemaSerializer at once; it is easier that way. Suppose this has already been done for Schema, and its serialize function has an invalid type for invalid arguments. Let's modify some specializations of our SchemaSerializer class:

    template<typename F, typename NextSerializer, typename OS>
    struct SchemaSerializer<F, NextSerializer, OS, TypeList<>, true> {
        template<typename... TailArgs>
        static auto serialize(OS& out, TailArgs&&... targs)
            -> decltype(NextSerializer::serialize(out, std::forward<TailArgs>(targs)...))
        {
            F::serialize(out);
            out << delimiter;
            NextSerializer::serialize(out, std::forward<TailArgs>(targs)...);
        }
    };

    template<typename F, typename NextSerializer, typename OS, typename... Types>
    struct SchemaSerializer<F, NextSerializer, OS, TypeList<Types...>, true> {
        template<typename... TailTypes>
        static auto serialize(OS& out, Types... args, TailTypes&&... targs)
            -> decltype(NextSerializer::serialize(out, std::forward<TailTypes>(targs)...))
        {
            F::serialize(out, std::forward<Types>(args)...);
            out << delimiter;
            NextSerializer::serialize(out, std::forward<TailTypes>(targs)...);
        }
    };

What happened? First, we used a new syntax. Starting with C++11, the following two ways of specifying a function's result type are equivalent:

    type func(...) { ... }
    auto func(...) -> type { ... }

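Why is this needed? Sometimes it is simply more convenient. For example, the trailing form can refer to the function's arguments by name; without it we would have to build the same expression out of std::declval calls just to name the type, which is far less readable.

So what did we actually do? Consider the call to NextSerializer::serialize: if it cannot be made with the remaining arguments, the expression NextSerializer::serialize(out, std::forward<TailTypes>(targs)...) has no valid type. (This is a substitution failure, not an error.) Consequently, SchemaSerializer::serialize has no valid type either. So, assuming that Schema::serialize of the shorter schema already behaves this way, the property carries over one level up. It remains to fix Schema itself: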
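One common reason to prefer the trailing form, shown here as a sketch with made-up function names, is that the parameters are already in scope when the return type is written, so decltype can use them directly:

```cpp
#include <type_traits>
#include <utility>

// Trailing form: a and b are visible inside decltype.
template<typename A, typename B>
auto sum_of(const A& a, const B& b) -> decltype(a + b) { return a + b; }

// Leading form: the parameters are not declared yet, so the same type
// has to be spelled with std::declval.
template<typename A, typename B>
decltype(std::declval<const A&>() + std::declval<const B&>())
sum_of_verbose(const A& a, const B& b) { return a + b; }

static_assert(std::is_same<decltype(sum_of(1, 2.5)), double>::value, "");
static_assert(std::is_same<decltype(sum_of_verbose(1, 2.5)), double>::value, "");
```

Both declarations name the same type; the first one just reads much closer to the function body.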

    template<typename F, typename... Fields>
    struct Schema<F, Fields...> {
        // an alias template via using (thank you, C++11!)
        template<class OutputStream, typename... Types>
        using Serializer = SchemaSerializer<
            F,                   // the current field
            Schema<Fields...>,   // the schema for the remaining fields
            OutputStream&,       // the output stream type
            TypeList<Types...>,  // all the argument types we have
            can_serialize<F, TypeList<OutputStream&, Types...>>::value  // !!!
        >;

        template<
            typename OS,       // the output stream type
            typename... Types  // the types of all remaining arguments
        >
        static auto serialize(OS& out, Types&&... args)
            -> decltype(Serializer<OS, Types...>::serialize(out, std::forward<Types>(args)...))
        {
            Serializer<OS, Types...>::serialize(out, std::forward<Types>(args)...);
        }

        // . . .
    };
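The effect of moving the inner call into the return type can be reproduced in miniature. In this sketch Sink, Greedy, Honest and can_call are all made-up names: Greedy plays the role of the old Schema::serialize, Honest the role of the new one.

```cpp
#include <type_traits>
#include <utility>

struct Sink {
    static void take(int) {}  // accepts exactly one int
};

// The signature accepts anything; the mistake is hidden in the body.
struct Greedy {
    template<typename... Args>
    static void call(Args&&... args) { Sink::take(std::forward<Args>(args)...); }
};

// The return type mentions the inner call, so unsuitable argument lists
// are removed by SFINAE before the body is ever looked at.
struct Honest {
    template<typename... Args>
    static auto call(Args&&... args)
        -> decltype(Sink::take(std::forward<Args>(args)...)) {
        Sink::take(std::forward<Args>(args)...);
    }
};

// Detector in the style of can_serialize.
template<typename F, typename... Args>
class can_call {
    template<typename U>
    static std::true_type test(decltype(U::call(std::declval<Args>()...))*);
    template<typename U>
    static std::false_type test(...);
public:
    static const bool value = decltype(test<F>(0))::value;
};

static_assert(can_call<Greedy, int, int>::value, "Greedy claims to accept two ints");
static_assert(!can_call<Honest, int, int>::value, "Honest rejects them up front");
static_assert(can_call<Honest, int>::value, "and still accepts a single int");
```

The detector gives a wrong answer for Greedy precisely because it only looks at the declaration, which is the situation the nested schemas ran into.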

Fine!

    using MyPacket = Schema< IntegerField, IntegerField, CharField >;
    using MyBigPacket = Schema< MyPacket, IntegerField, MyPacket >;
    MyBigPacket::serialize(std::cout, 11, 22, 'a', 33, 44, 55, 'b');
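To check that the pieces fit together, here is a condensed, serialize-only sketch of the whole construction. It takes a few liberties that are assumptions of this sketch rather than the article's code: can_serialize is written with std::true_type, Head uses a compact Prepend helper, the two can_serialize = true specializations are merged into one, and only the std::ostream delimiter is included.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <utility>

template<typename... Ts> struct TypeList {};

template<typename F, typename L> struct can_serialize;
template<typename F, typename... Ts>
struct can_serialize<F, TypeList<Ts...>> {
    template<typename U>
    static std::true_type test(decltype(U::serialize(std::declval<Ts>()...))*);
    template<typename U>
    static std::false_type test(...);
    static const bool value = decltype(test<F>(nullptr))::value;
};

// Head<L>::Result is L without its last element.
template<typename T, typename L> struct Prepend;
template<typename T, typename... Ts>
struct Prepend<T, TypeList<Ts...>> { using Result = TypeList<T, Ts...>; };
template<typename L> struct Head;
template<> struct Head<TypeList<>> { using Result = TypeList<>; };
template<typename T> struct Head<TypeList<T>> { using Result = TypeList<>; };
template<typename T, typename U, typename... Ts>
struct Head<TypeList<T, U, Ts...>> {
    using Result = typename Prepend<T, typename Head<TypeList<U, Ts...>>::Result>::Result;
};

template<class CharT, class Traits>
std::basic_ostream<CharT, Traits>& delimiter(std::basic_ostream<CharT, Traits>& os) {
    return os << CharT(' ');
}

template<typename F, typename Next, typename OS, typename TL, bool ok>
struct SchemaSerializer;
// Nothing fits: no serialize member at all.
template<typename F, typename Next, typename OS>
struct SchemaSerializer<F, Next, OS, TypeList<>, false> {};
// Too many arguments for F: drop the last one and try again.
template<typename F, typename Next, typename OS, typename... Types>
struct SchemaSerializer<F, Next, OS, TypeList<Types...>, false>
    : SchemaSerializer<F, Next, OS, typename Head<TypeList<Types...>>::Result,
          can_serialize<F, typename Head<TypeList<OS, Types...>>::Result>::value> {};
// F accepts Types...; the tail goes to the remaining fields.
template<typename F, typename Next, typename OS, typename... Types>
struct SchemaSerializer<F, Next, OS, TypeList<Types...>, true> {
    template<typename... Tail>
    static auto serialize(OS& out, Types... args, Tail&&... tail)
        -> decltype(Next::serialize(out, std::forward<Tail>(tail)...)) {
        F::serialize(out, std::forward<Types>(args)...);
        out << delimiter;
        Next::serialize(out, std::forward<Tail>(tail)...);
    }
};

template<typename... Fields> struct Schema;
template<> struct Schema<> {
    template<typename OS> static void serialize(OS&) {}
};
template<typename F, typename... Fields>
struct Schema<F, Fields...> {
    template<typename OS, typename... Types>
    using Serializer = SchemaSerializer<F, Schema<Fields...>, OS&, TypeList<Types...>,
        can_serialize<F, TypeList<OS&, Types...>>::value>;
    template<typename OS, typename... Types>
    static auto serialize(OS& out, Types&&... args)
        -> decltype(Serializer<OS, Types...>::serialize(out, std::forward<Types>(args)...)) {
        Serializer<OS, Types...>::serialize(out, std::forward<Types>(args)...);
    }
};

struct IntegerField {
    template<class OS> static void serialize(OS& out, int t) { out << t; }
};
struct CharField {
    template<class OS> static void serialize(OS& out, char t) { out << t; }
};

using MyPacket = Schema<IntegerField, IntegerField, CharField>;
using MyBigPacket = Schema<MyPacket, IntegerField, MyPacket>;

inline void nested_schema_example() {
    std::ostringstream buffer;
    std::ostream& out = buffer;
    MyBigPacket::serialize(out, 11, 22, 'a', 33, 44, 55, 'b');
    std::istringstream in(buffer.str());  // read back; operator>> skips separators
    int i1, i2, i3, i4, i5;
    char c1, c2;
    in >> i1 >> i2 >> c1 >> i3 >> i4 >> i5 >> c2;
    assert(!in.fail() && i1 == 11 && i3 == 33 && i5 == 55 && c1 == 'a' && c2 == 'b');
}
```

Since a nested schema emits its own separators and the outer one adds another, the exact amount of whitespace in the output depends on the nesting; the example therefore checks the values read back rather than the literal text.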