
All programmers have to transfer data sooner or later. It's no secret that Java serialization libraries exist in about> 9000, but in C ++ they seem to be there, but they don't seem to exist either. Fortunately for most, a few years ago Google Protobuf appeared, which brought quite a convenient way to define data structures and quickly won the love of the whole people. It was actually the first library accessible to the masses, which allows you to drive ready-made data structures across the network without being associated with something like XML. The year was 2008.
Let's go back a bit. In 2006, a simple Indian programmer (no matter how suspicious it sounded!) Anil Madhavapeddi, one of the most famous OCaml developers in the world now and the author of the freshly-running
Real World OCaml , defended
his Ph.D. thesis at Cambridge. It is about her that I will tell you today.
Anil immediately went further than Google. He immediately thought, why do people usually send some formalized data structures over the network? To implement some kind of protocol. What is a protocol? This is some kind of
state machine . And where can we take a good example of a complex, well-designed and time-tested protocol? Yes, right in the usual network stack! So, a set of network data structures and protocols were taken: Ethernet frame, IPv4, ICMP, TCP, UDP, SSH, DNS and DHCP and problem statement: most of these protocols (especially SSH and DNS) are implemented, what is called "hands", and there is a wish that there were no buffer overflows typical for C, all transitions were performed automatically, it could all be verified, and that it worked quickly, and not as usual.
')
Since no one will read a thesis, I will say straight away: it is more than possible. Based on the results, reference implementations of DNS and SSH server were written and compared with BIND and OpenSSH. OCaml-implementation gave compared with traditional performance gains from insignificant to almost twofold. In addition, an error was found in the RFC at SSH (the working group was notified and the RFC was corrected). About what has been done, and how to live with it, read under the cut.
MPL
First of all, two description languages and their translators were written on OCaml. The first language is Meta Packet Language or MPL, which describes the structure of the package. In general, it is analogous to protobuf, but not quite. First, MPL does not add any overhead to your structure. At all. No extra bits indicating the type of data or something else. On the one hand, it does not allow, as in protobuf, to easily expand the structure by adding new fields to it, on the other - you will never read the TCP header with the protobuf. Secondly, MPL immediately implements all the logic that is needed in network structures — packaging or alignment, as well as such things as a variable set of fields or field values depending on other structure fields. For an example, look at the IPv4 header:
packet ipv4 { version: bit[4] const(4); ihl: bit[4] min(5) value(offset(options) / 4); tos_precedence: bit[3] variant { |0 => Routine |1 -> Priority |2 -> Immediate |3 -> Flash |4 -> Flash_override |5 -> ECP |6 -> Internetwork_control |7 -> Network_control }; tos_delay: bit[1] variant {|0 => Normal |1 -> Low}; tos_throughput: bit[1] variant {|0 => Normal |1 -> Low}; tos_reliability: bit[1] variant {|0 => Normal |1 -> Low}; tos_reserved: bit[2] const(0); length: uint16 value(offset(data)); id: uint16; reserved: bit[1] const(0); dont_fragment: bit[1] default(0); can_fragment: bit[1] default(0); frag_offset: bit[13] default(0); ttl: byte; protocol: byte variant {|1->ICMP |2->IGMP |6->TCP |17->UDP}; checksum: uint16 default(0); src: uint32; dest: uint32; options: byte[(ihl * 4) - offset(dest)] align(32); header_end: label; data: byte[length-(ihl*4)]; }
Here the contents of the packet are described as an array of data bytes (so as not to describe all other possible protocols), but in its place there could well be a record:
classify (protocol) { |1: "ICMP"-> data: packet icmp(); |2: "TCP" -> data: packet tcp(); |3: "UDP" -> data: packet udp(); };
And then when reading an IPv4 package, we would immediately parse its contents. As a result, (de) serialization of any package turns into a fascinating activity that does not require any thoughts on how to properly package the data and where to align it. So, the almost complete implementation of the popular
MessagePack format took me about 40 lines.
Unfortunately, this language has its drawbacks. I counted them exactly two. First: recursive packages are prohibited. This just caused the impossibility of a complete implementation of the MessagePack - MPL can not be said that lists or maps can lie in a list or map, you have to describe them simply as a pack of bytes, and then parse them with another call for deserialization. This is done on purpose so that each reading of the package is strictly finite, but this is not easier for us. The second problem: you can not define your data types. Anil implemented network-standard types, such as bytes, bits, numbers, strings, or even mpint, which is present in SSH, but this list is fixed. If you suddenly want to implement the ssh-agent protocol, which uses the mpint1 type, you just have to describe it as an array of bytes and parse it in your code. The only way to expand the list of supported types is to write MPL compiler patches, which is not a trivial task.
SPL
The second language was the Statecall Policy Language or SPL. This is a language for describing finite automata, that is, the heart of our protocol. Strictly speaking, libraries for creating finite automata for all languages exist decently. There are only a few differences from the SPL from them (apart from their own description language instead of the software assignment of the automaton). First, the SPL compiler can immediately generate a PROMELA description for the SPIN
software model verifier . I will be honest, I could not figure out SPIN, so in this place I had to believe the author for the word that this is cool. Second, by using the state names
Receive_NAME and
Transmit_NAME (where
NAME is the message type from the MPL), you can tightly integrate the state machine with the data structures from the MPL. We will talk about this later, but for now let's look at an example of the description of the state machine for authorization in SSH:
automaton auth (bool success, bool failed) { Receive_Transport_ServiceAccept_UserAuth; Transmit_Auth_Req_None; Receive_Auth_Failure; do { either { always_allow (Receive_Auth_Banner) { either { Transmit_Auth_Req_Password_Request; auth_decision (success); } or { Transmit_Auth_Req_PublicKey_Request; auth_decision (success); } or { Transmit_Auth_Req_PublicKey_Check; either { Receive_Auth_PublicKeyOK; } or { Receive_Auth_Failure; } } } } or { Notify_Auth_Permanent_Failure; failed = true; } } until (success || failed); }
As you can see, the state machine in SPL allows you to write quite extensive logic, insert function calls (
auth_decision ) written in the same SPL, and even work with variables.
How to work with it?
Unfortunately, the project (the whole of it is called
Melange ) is not too rich in documentation, its main source is the dissertation itself. Therefore, I decided to write a small Proof of concept, demonstrating the work of the entire product, and at the same time being a small Quick Start guide. To do this, write some small network application. For the role of a simple application with a simple and clear protocol, I chose the good old game - sea battle. This is what our message structure will look like:
packet message { message_type: byte; message_id: uint16; classify (message_type) { | 0:"Shot" -> row: bit[4]; column: bit[4]; | 1:"ShotResult" -> result: byte variant { |0 -> Missed |1 -> Damaged |2 -> Killed }; | 2:"Disconnect" -> (); }; }
We have three types of messages: a shot, the result of a shot, and the information that for some reason we want to disconnect. Now let's take a look at the proposed protocol:
automaton seawar () { Initialize; during { multiple(1..) { Ready; either { Transmit_Message_Shot; Receive_Message_ShotResult; } or { Receive_Message_Shot; Transmit_Message_ShotResult; } } } handle { either { Transmit_Message_Disconnect; exit; } or { Receive_Message_Disconnect; exit; } } }
We start with the initialization (the initial state of our automaton), and then at each step, either we shoot, or at us. It's simple. Moreover, if for some reason a Disconnect message appears, this means that the game is over and the machine must be stopped.
Now let's see how this is used from the code. To read messages, we will use a special MPL buffer, which will be filled with data - in our case, we will take them from the network.
val mutable env_ = Mpl_stdlib.new_env (String.make 4 '\000'); val mutable tick_ = Protocol.init (); method tick state = tick_ <- Protocol.tick tick_ state; method send_message msg = ( Mpl_stdlib.reset env_; self#tick (msg#xmit_statecall :> Protocol.s); if not (Thread.wait_timed_write sock_ 10.) then self#disconnect ~exc_text:"Timeout"; Mpl_stdlib.flush env_ sock_ ); method receive_message = ( Mpl_stdlib.reset env_; if not (Thread.wait_timed_read sock_ 300.) then self#disconnect ~exc_text:"Timeout"; Mpl_stdlib.fill env_ sock_; let msg = Message.unmarshal env_ in let state = Message.recv_statecall msg in self#tick state; msg );
Please note that each message, both sent and received, necessarily causes the state machine to transition to a new state. The calls
msg # xmit_statecall and
Message.recv_statecall msg are responsible for this, which, based on messages (such as ShotResult), create the names of the corresponding states (
Transmit_Message_ShotResult and
Receive_Message_ShotResult ). Due to this, most of the potential errors of the program will be detected here, when the wrong transition of the machine will cause the exception Bad_statecall. For example, if in the case of AI everything is simple - it works in one stream, completely synchronous and there can never be any problems in such a simple task, then in the graphical interface everything can be more complicated.

For example, a simple example of how everything can easily "explode." For the graphical interface, I took the freshly released Qt 5.2 framework, for which our compatriot Dmitry Kosarev wrote
OCaml bindings (interesting enough, if there are any, I’ll tell you in a separate post). When clicking on the cell of the enemy field in a separate thread, the following code can be executed:
let send_shot col row = ignore(game#send_message (Message.Shot.t ~row:row ~column:col)); let result, state = game#receive_message in (match result with | `ShotResult x -> Board.mark opp_board row col x#result; next_turn state x#result | `Disconnect x -> game#disconnect ~send:false | _ -> game#disconnect ~exc_text:"Unexpected_Message_Type" ~raise_exc:true ~send:true)
If a double shot is not prevented, this method will be called twice - in this case, either two messages with a shot in a row can be sent, or the second message can be sent after receiving a message about a miss.
In order not to overload the article with a full review of the code, I’d rather give a link to the
source code that anyone can download and watch.
Conclusion
What's next, the reader will ask me. Well, some marginal, already 4 years unresolved tool, working in an incomprehensible language. Why do I need all this if I have Node.js, MongoDB and strawberry smoothies?
The reader is right. The tool is outdated, it has several significant flaws that I mentioned. But at the same time he shows in which direction it is necessary to develop. So, all application code, including all declarative descriptions, a graphical interface and not the dumbest AI, is 850 lines. This is certainly not "30 lines of javascript", but also not too much.
Almost 8 years ago, it was shown exactly how the networking should take place, and almost 6 years ago, Google popularized only half of it. There is no roktsaysaensa in the idea, it is true, and separately all the components have long been written. You,% USERNAME%, have a great opportunity to realize this idea in the coming New Year, to become world famous and enslave the world. Well, or something like that.