Splitting a continuous data stream into structural units

Quite often one needs to transfer data blocks over a continuous stream. The first question is then how to separate one data block from the next; the second is whether to transmit the data in binary or text form. Add to this the need to keep working despite minor distortions (lost bytes, garbage, errors by interacting nodes) and to use the channel's bandwidth efficiently. And all of this has to be solved on a simple microcontroller with limited resources.

Such tasks arise, for example, when transmitting telemetry or controlling remote equipment. On one side there is usually a simple microcontroller, on the other a computer. They may be connected by good old RS232, or by something more elaborate: for example, the microcontroller's UART output is converted to 802.11b, the radio signal travels to a radio tower, and from there Ethernet carries the data to the server.
If my home-made solution to this problem interests you, read on.

First, let's define the requirements:

The channel can be established at any time.
Both the controller and the computer can connect to the channel at an arbitrary moment, including in the middle of a data packet.
If data in the channel are corrupted, the affected data block must be discarded.
Several devices can share one channel.
Data blocks can contain any sequence of bytes and be of arbitrary length.
The resources allocated to supporting the protocol are strictly limited.

The result somewhat resembles UDP.

Let us look at several common methods, using as an example the transmission of three "voltage" numbers as one block and two "temperature" numbers as another.

A common solution is to convert all the data to text and separate the blocks (packets) with a line feed.

It might look like this:

V 1231, 2400, -231
V 1333, 2100, -232
T 36, -40
This is produced with something like sprintf(buf, "V %d, %d, %d\n", ...).

Transmission problem: sprintf uses a significant amount of stack and is rather slow.

Reception problem: on the controller, converting the characters "-232" to an int takes additional resources. There is no type checking, and you end up with countless error-checking conditions.
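To make the cost of the text approach concrete, here is a minimal C sketch of both sides for the voltage line above. The function names format_voltages and parse_voltages are hypothetical, not part of any library; note how many checks even this tiny parser needs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Transmit side: one text line per packet, e.g. "V 1231, 2400, -231\n".
   Returns the line length, or -1 if buf is too small. */
static int format_voltages(char *buf, size_t size, int16_t a, int16_t b, int16_t c)
{
    int n = snprintf(buf, size, "V %d, %d, %d\n", a, b, c);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Receive side: every field needs its own checks (type letter, digits,
   separators, value range). Returns 0 on success, -1 on malformed input. */
static int parse_voltages(const char *line, int16_t out[3])
{
    if (line[0] != 'V' || line[1] != ' ')
        return -1;
    const char *p = line + 2;
    for (int i = 0; i < 3; i++) {
        char *end;
        long v = strtol(p, &end, 10);          /* skips leading spaces */
        if (end == p || v < INT16_MIN || v > INT16_MAX)
            return -1;                         /* no digits, or out of range */
        out[i] = (int16_t)v;
        p = end;
        if (i < 2 && *p++ != ',')
            return -1;                         /* missing separator */
    }
    return (*p == '\n' || *p == '\0') ? 0 : -1;
}
```

On a microcontroller, both snprintf and strtol pull in a fair amount of library code, which is exactly the cost described above.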

On the plus side, a person can read the transmitted values with the naked eye.

If the project keeps evolving on top of such a protocol, after a while it becomes impossible to maintain, and even human readability is lost.

You can partly solve the problem by transmitting hexadecimal digits instead of decimal ones: this simplifies processing but does not remove the other problems.

To improve control, you can wrap the transmitted data in XML:

<?xml version="1.0" encoding="UTF-8"?>
<voltages>
  <voltage num="1">1231</voltage>
  <voltage num="2">2400</voltage>
  <voltage num="3">-231</voltage>
</voltages>

Or you can wrap in JSON:

{ "voltages": [1231, 2400, -231] }

The protocol becomes self-documenting. Still, there is no type checking at compile time, and the amount of extra data transmitted becomes excessive. The problem of parsing numbers and text with limited resources also remains.

If one node transmits continuously, a second node may connect in the middle of the transmitted data. In most cases a freshly connected node cannot reliably find the beginning of a packet, so it has to either wait for a line break or rely on luck. In addition, packets are not checked for correctness (for example, when a nearby thunderstorm flips one bit). These shortcomings can be addressed with an approach close to the NMEA protocol:

$GNVLT,1231,2400,-231*71

Here a packet always begins with $ and ends with an asterisk followed by a checksum (0x71) of the packet data. This solves the problem of detecting corrupted packets (the classic NMEA checksum is very simple: an XOR of the characters), but documenting the packets and checking their types remain problems.
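As a sketch of NMEA-style framing, assuming the standard NMEA 0183 rule (the checksum is the XOR of every character between $ and *, sent as two hexadecimal digits), building and checking such a sentence looks roughly like this. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* XOR of every character of the sentence body (the part between '$' and '*'). */
static uint8_t nmea_checksum(const char *body)
{
    uint8_t cs = 0;
    for (; *body != '\0'; body++)
        cs ^= (uint8_t)*body;
    return cs;
}

/* Build "$<body>*HH\r\n". Returns the sentence length, or -1 if buf is too small. */
static int nmea_build(char *buf, size_t size, const char *body)
{
    int n = snprintf(buf, size, "$%s*%02X\r\n", body, nmea_checksum(body));
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Check a received sentence: locate '$' and '*', recompute the XOR over the
   characters between them and compare it with the two hex digits after '*'. */
static int nmea_verify(const char *s)
{
    const char *start = strchr(s, '$');
    const char *star = start ? strchr(start, '*') : NULL;
    if (star == NULL)
        return 0;
    uint8_t cs = 0;
    for (const char *p = start + 1; p < star; p++)
        cs ^= (uint8_t)*p;
    unsigned int received;
    if (sscanf(star + 1, "%2X", &received) != 1)
        return 0;
    return received == cs;
}
```

An XOR catches any single flipped bit, but two errors in the same bit position of different characters cancel out, which is one reason it is weaker than a real CRC.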
So with a text stream there is a lot of overhead, little type checking, and documentation is difficult.

Now consider transmitting the data in binary form. Raw byte sets will be transmitted, so we need to define structures:

    typedef struct {
        int16_t supplyIn;
        int16_t supplyOut;
        int16_t groundPotential;
    } PackVoltages;

    typedef struct {
        int8_t internal;
        int8_t outside;
    } PackTemperature;

We have 6 bytes for voltages and 2 bytes for temperatures.

You can combine these structures into one, add 5 more bytes for future protocol extensions, and store the type of the transmitted data in the structure:

    typedef struct {
        char type;
        union {
            PackVoltages voltages;
            PackTemperature temperatures;
        } Data;
        char rezerv[5];
    } FullPack;

We get a 16-byte packet (because of non-obvious alignment padding), not 12 bytes as it might seem; the exact padding depends on the compiler and target. In our example this is not a problem, but alignment deserves care. You can enable compiler options for tight byte packing, but then another problem may appear: some processors (for example, certain ARM cores) cannot read unaligned data, and this error is not easy for a novice Jedi to find.
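The padding is easy to inspect with sizeof and offsetof. The sketch below, assuming a C99 compiler, measures it for the FullPack structure above and shows the usual portable remedy for unaligned access: copying raw bytes with memcpy into a properly aligned variable instead of dereferencing a misaligned pointer. The reported numbers depend on the compiler and target (GCC on x86-64, for example, places Data at offset 2); read_supply_in is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { int16_t supplyIn; int16_t supplyOut; int16_t groundPotential; } PackVoltages;
typedef struct { int8_t internal; int8_t outside; } PackTemperature;

typedef struct {
    char type;             /* 1 byte, usually followed by padding so the union is 2-byte aligned */
    union { PackVoltages voltages; PackTemperature temperatures; } Data;
    char rezerv[5];
} FullPack;

/* Padding bytes inserted by the compiler: total size minus the sum of the members. */
enum { FULLPACK_PADDING = sizeof(FullPack) - (1 + sizeof(PackVoltages) + 5) };

/* Read a field from a raw receive buffer at any offset. memcpy never performs an
   unaligned load, so this is safe on cores that fault on misaligned access. */
static int16_t read_supply_in(const uint8_t *raw, size_t data_offset)
{
    PackVoltages v;
    memcpy(&v, raw + data_offset, sizeof v);
    return v.supplyIn;
}
```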

From then on, the channel always carries 16-byte portions. The receiver waits for the next 16 bytes and processes them as the next packet. Note that there is no CRC yet. Let's add one:

    typedef struct {
        char type;
        union {
            PackVoltages voltages;
            PackTemperature temperatures;
        } Data;
        char rezerv[5];
        char CRC;
    } FullPack;

The packet size stays at 16 bytes. When sending, we set the packet type, calculate the CRC and put it into the packet, and then send the packet.
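A minimal sketch of the fixed-size scheme, assuming a 16-byte frame with the type in byte 0, the payload packed from byte 1 and the CRC in the last byte. The CRC-8 with polynomial 0x07 is an assumed choice (the text does not say which CRC is used), and pack_fill/pack_check are hypothetical names. Packing the payload as raw bytes sidesteps the struct padding discussed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PACK_SIZE 16   /* fixed frame size, as in the text */

/* CRC-8: polynomial x^8 + x^2 + x + 1 (0x07), initial value 0, no reflection. */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Sender: type in byte 0, payload after it, unused bytes zeroed, CRC last. */
static int pack_fill(uint8_t frame[PACK_SIZE], char type, const void *payload, size_t len)
{
    if (len > PACK_SIZE - 2)
        return -1;                                  /* does not fit between type and CRC */
    memset(frame, 0, PACK_SIZE);
    frame[0] = (uint8_t)type;
    memcpy(frame + 1, payload, len);
    frame[PACK_SIZE - 1] = crc8(frame, PACK_SIZE - 1);
    return 0;
}

/* Receiver: after collecting exactly PACK_SIZE bytes, accept them only if the CRC matches. */
static int pack_check(const uint8_t frame[PACK_SIZE])
{
    return crc8(frame, PACK_SIZE - 1) == frame[PACK_SIZE - 1];
}
```

pack_check can only reject a frame; nothing in this scheme tells the receiver where a frame starts, which leads to the shift problem below.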

The advantage of this approach is that packet processing gains type checking at compile time. There is no unnecessary conversion from numbers to text and back. A received portion of data always has the same size, which makes it convenient to allocate memory in advance.
There are many drawbacks. When the transmitted data differ greatly in length, small values carry extra overhead of up to tens of times their own size. This synchronization method suits client-server communication over a TCP channel well: nothing is lost along the way, and the beginning of a packet is always known. But when a node connects to the channel after transmission has started, the first few bytes may be missed. Then all received packets are shifted by that number of bytes, and their data are naturally wrong. At best the CRC discards them and there is simply no communication with the node; and with a probability of 1/256, "broken" packets get through. You can try to solve this by transmitting a signature byte marking the "beginning" of a packet, but since the data are binary, the same byte can also occur in the data itself, so the beginning of a packet still cannot always be determined reliably.

Another problem is the alignment of variables. The packet header takes one byte, while the data often contain 32-bit numbers, which results in periodic data shifts of 0-2 bytes. A further annoyance is that the CRC has to be calculated "manually" when sending each type of packet.

Another option is similar to the previous one, but to reduce the overhead, the actual packet length is transmitted at the beginning of the packet. The problem with this approach is that the packet must be fully prepared in advance (that is, placed in memory) before it is transmitted. This can be difficult with limited microcontroller resources, especially for large packets. In addition, transmission cannot start until the library has received the entire packet, which can hurt the channel's throughput and latency. The other advantages and disadvantages are the same as for the previous variant.
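A sketch of length-prefixed framing, assuming a one-byte length (1 to 255) followed by the payload; lp_send, lp_feed and the ByteSink test helper are hypothetical names. The sender needs the length before the first byte goes out, and the receiver trusts every length byte it sees, so a single lost byte desynchronizes all following frames.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*put_byte_fn)(uint8_t byte, void *ctx);

/* Send [len][payload...]; the whole length must be known up front. */
static int lp_send(const uint8_t *payload, size_t len, put_byte_fn put, void *ctx)
{
    if (len == 0 || len > 255)
        return -1;
    put((uint8_t)len, ctx);
    for (size_t i = 0; i < len; i++)
        put(payload[i], ctx);
    return 0;
}

typedef struct {
    uint8_t buf[255];
    size_t expected;   /* 0 means "the next byte is a length" */
    size_t received;
} LpReceiver;

/* Feed one byte. Returns the payload length when a frame completes, else 0.
   There is no way to resynchronize: a lost byte shifts every later frame. */
static size_t lp_feed(LpReceiver *rx, uint8_t byte)
{
    if (rx->expected == 0) {
        rx->expected = byte;
        rx->received = 0;
        return 0;
    }
    rx->buf[rx->received++] = byte;
    if (rx->received == rx->expected) {
        size_t n = rx->expected;
        rx->expected = 0;
        return n;
    }
    return 0;
}

/* In-memory byte sink, handy for testing the sender. */
typedef struct { uint8_t data[300]; size_t len; } ByteSink;

static void sink_put(uint8_t byte, void *ctx)
{
    ByteSink *s = ctx;
    if (s->len < sizeof s->data)
        s->data[s->len++] = byte;
}
```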

The last method considered is also the one used in my library. Its historical name is Bin Protocol, or BIN Protocol.
When sending binary data, packets can be separated by a dedicated byte. If that byte occurs in the data, it is replaced by another byte sequence, and the receiver performs the reverse substitution. This technique is called "byte stuffing" (thanks to Flexz for the name).
Calculating the CRC is also delegated to the packet transmission code.

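For the substitutions to work, three byte values must be reserved. It is best to choose values that are unlikely to occur in the data. The substitution rules for bytes inside the packet data are:
<Splitting> → <Splitting> <Splitting>
<Final> → <Splitting> <Optional>
<Optional> → <Optional> (sent unchanged)

This means that a <Final> byte can never appear inside the data stream, and a <Splitting> followed by anything other than <Splitting> or <Optional> can only mark the start of a packet.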
A packet can then be built like this:
<Splitting> protocol data <Final>
If increased reliability is needed, a packet can be built like this:
<Final> <Splitting> protocol data <Splitting> <Final>

The second form increases the share of rejected corrupted small packets by about 15%, but adds roughly the same 10-15% of overhead, so it is not considered further.

Thus, even if a node connects at an arbitrary moment, it only needs to wait for a <Splitting> byte to start receiving a packet. Only when the <Final> byte arrives is the packet checked for correctness and passed on for processing.
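A minimal sketch of the stuffing rules and the receiving state machine. The reserved values SPLIT_BYTE, FINAL_BYTE and EXTRA_BYTE (standing for <Splitting>, <Final> and <Optional>) are arbitrary assumptions, not the values used in the library, and stuff_encode/unstuff_byte are hypothetical names. The CRC check that would follow a completed packet is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved values; the library's real values may differ. */
#define SPLIT_BYTE 0xD5   /* <Splitting> */
#define FINAL_BYTE 0xD6   /* <Final> */
#define EXTRA_BYTE 0xD7   /* <Optional> */
#define MAX_FRAME  64

/* Encode one packet as <Splitting> stuffed-bytes <Final>; `out` needs 2 * len + 2
   bytes. The first byte must not be reserved, or the receiver would read
   <Splitting><first byte> as an escape. Returns the encoded length, or 0. */
static size_t stuff_encode(const uint8_t *frame, size_t len, uint8_t *out)
{
    if (len == 0 || frame[0] == SPLIT_BYTE || frame[0] == FINAL_BYTE || frame[0] == EXTRA_BYTE)
        return 0;
    size_t n = 0;
    out[n++] = SPLIT_BYTE;
    for (size_t i = 0; i < len; i++) {
        if (frame[i] == SPLIT_BYTE) {          /* <Splitting> -> <Splitting><Splitting> */
            out[n++] = SPLIT_BYTE;
            out[n++] = SPLIT_BYTE;
        } else if (frame[i] == FINAL_BYTE) {   /* <Final> -> <Splitting><Optional> */
            out[n++] = SPLIT_BYTE;
            out[n++] = EXTRA_BYTE;
        } else {                               /* everything else, <Optional> included */
            out[n++] = frame[i];
        }
    }
    out[n++] = FINAL_BYTE;
    return n;
}

typedef struct {
    uint8_t buf[MAX_FRAME];
    size_t len;
    int in_frame;      /* a packet start has been seen and not yet finished */
    int after_split;   /* the previous byte was <Splitting> */
} Unstuffer;

static void unstuff_append(Unstuffer *d, uint8_t b)
{
    if (d->len < MAX_FRAME)
        d->buf[d->len++] = b;
    else
        d->in_frame = 0;               /* too long: drop the packet */
}

/* Feed one received byte. Returns the decoded length when <Final> completes a
   packet (the bytes are in d->buf), otherwise 0. Bytes seen before the first
   packet start are ignored, which is what lets a node join mid-stream. */
static size_t unstuff_byte(Unstuffer *d, uint8_t b)
{
    if (d->after_split) {
        d->after_split = 0;
        if (b == SPLIT_BYTE || b == EXTRA_BYTE) {     /* escaped data byte */
            if (d->in_frame)
                unstuff_append(d, b == SPLIT_BYTE ? SPLIT_BYTE : FINAL_BYTE);
        } else if (b == FINAL_BYTE) {                 /* never produced by the encoder */
            d->in_frame = 0;
        } else {                                      /* lone <Splitting>: new packet */
            d->in_frame = 1;
            d->len = 0;
            unstuff_append(d, b);
        }
        return 0;
    }
    if (b == SPLIT_BYTE) {
        d->after_split = 1;
        return 0;
    }
    if (b == FINAL_BYTE) {
        size_t n = d->in_frame ? d->len : 0;
        d->in_frame = 0;
        return n;
    }
    if (d->in_frame)
        unstuff_append(d, b);
    return 0;
}
```

If a node happens to join right between an escaping <Splitting> and its second byte, the decoder can mistake what follows for a packet start; such a false packet is what the CRC then rejects.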

Now let's see what the "protocol data" consist of:

Header: 1 byte packet type, 1 byte destination address
Data: the data itself, processed by the rules above
CRC: 1 byte, also processed by the rules above
So you can send an arbitrary number of bytes: they are wrapped in packet start and end markers, and the packet type, destination address and CRC are added to them.
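The sending side can compute the CRC on the fly, so a packet never has to be assembled in memory first. A sketch under stated assumptions: hypothetical reserved byte values, a CRC-8 (polynomial 0x07) computed over type, address and data, and a byte-output callback in the spirit of the globalSendCharToExternal() function that the demo program passes to the protocol (its real signature may differ). send_frame is a hypothetical name, not the library's BP_SendMyPack().

```c
#include <stddef.h>
#include <stdint.h>

#define SPLIT_BYTE 0xD5   /* hypothetical reserved values, not the library's */
#define FINAL_BYTE 0xD6
#define EXTRA_BYTE 0xD7

typedef void (*put_byte_fn)(uint8_t byte, void *ctx);

/* One step of CRC-8 with polynomial 0x07 (an assumed choice of CRC). */
static uint8_t crc8_update(uint8_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
    return crc;
}

/* Emit one data or CRC byte with the stuffing rules applied. */
static void put_stuffed(uint8_t b, put_byte_fn put, void *ctx)
{
    if (b == SPLIT_BYTE)      { put(SPLIT_BYTE, ctx); put(SPLIT_BYTE, ctx); }
    else if (b == FINAL_BYTE) { put(SPLIT_BYTE, ctx); put(EXTRA_BYTE, ctx); }
    else                      put(b, ctx);
}

/* <Splitting> type address data... CRC <Final>. Each byte goes out as soon as it
   is known; type and address must not be reserved values. */
static int send_frame(uint8_t type, uint8_t address, const void *data, size_t len,
                      put_byte_fn put, void *ctx)
{
    if (type == SPLIT_BYTE || type == FINAL_BYTE || type == EXTRA_BYTE ||
        address == SPLIT_BYTE || address == FINAL_BYTE || address == EXTRA_BYTE)
        return -1;
    const uint8_t *p = data;
    uint8_t crc = crc8_update(crc8_update(0, type), address);
    put(SPLIT_BYTE, ctx);
    put(type, ctx);
    put(address, ctx);
    for (size_t i = 0; i < len; i++) {
        crc = crc8_update(crc, p[i]);
        put_stuffed(p[i], put, ctx);
    }
    put_stuffed(crc, put, ctx);
    put(FINAL_BYTE, ctx);
    return 0;
}

/* In-memory byte sink for testing. */
typedef struct { uint8_t data[128]; size_t len; } ByteSink;

static void sink_put(uint8_t byte, void *ctx)
{
    ByteSink *s = ctx;
    if (s->len < sizeof s->data)
        s->data[s->len++] = byte;
}
```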

For our case, this amounts to defining the data structures:

    typedef struct {
        int16_t supplyIn;
        int16_t supplyOut;
        int16_t groundPotential;
    } PackVoltages;

    typedef struct {
        int8_t internal;
        int8_t outside;
    } PackTemperature;

And the convention that the first structure is denoted by the character 'V' and the second by 'T'. These are passed to a function with three parameters: the packet type, the address of the start of the data, and the length of the data:
BP_SendMyPack('T', &packTemperature, sizeof(PackTemperature));

The transmission channel then carries the following sequence:

<Splitting> 'T' <address> <internal> <outside> <CRC> <Final>

For small packets the overhead is large, but as packets grow it becomes negligible.
The actual payload is marked in gray, the minimum auxiliary information any framing needs in white, and the overhead added by my BIN protocol in yellow.
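The overhead is easy to count. With the layout described above, every frame carries five service bytes (<Splitting>, type, address, CRC, <Final>), plus one extra byte for each <Splitting> or <Final> value that has to be stuffed in the data or CRC. So the 2-byte temperature packet takes 7 bytes on the wire (about 71% overhead), the 6-byte voltage packet 11 bytes (about 45%), and a 100-byte block 105 bytes (under 5%). A small helper, assuming hypothetical reserved values and unstuffed type and address bytes:

```c
#include <stddef.h>
#include <stdint.h>

#define SPLIT_BYTE 0xD5   /* hypothetical reserved values */
#define FINAL_BYTE 0xD6

/* Bytes on the wire for one frame: <Splitting>, type, address, stuffed data,
   stuffed CRC, <Final>. <Optional> values are sent unchanged and cost nothing. */
static size_t framed_size(const uint8_t *data, size_t len, uint8_t crc)
{
    size_t n = 5 + len;                          /* 4 framing/header bytes + CRC + payload */
    for (size_t i = 0; i < len; i++)
        n += (data[i] == SPLIT_BYTE || data[i] == FINAL_BYTE);
    n += (crc == SPLIT_BYTE || crc == FINAL_BYTE);
    return n;
}

/* Overhead as a percentage of the bytes actually sent. */
static double overhead_percent(size_t payload, size_t wire)
{
    return 100.0 * (double)(wire - payload) / (double)wire;
}
```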

The advantages of this approach: low overhead; connecting at an arbitrary moment is not a problem; good abstraction from the data, so the same functions can be used for sending and receiving on different devices, in different programs and over different protocols. Good type checking can be achieved at compile time. The data can be aligned in the program in whatever way is appropriate, and this adds no extra bytes to the transmitted packets.
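On the receiving side, the type byte selects the structure and the payload length can be checked against its sizeof, which ties each packet type to exactly one layout. A sketch, assuming the packet has already been unstuffed and its CRC verified and stripped, leaving type, address and payload; Telemetry and handle_packet are hypothetical names. Copying with memcpy into an aligned structure avoids unaligned reads on cores that fault on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { int16_t supplyIn; int16_t supplyOut; int16_t groundPotential; } PackVoltages;
typedef struct { int8_t internal; int8_t outside; } PackTemperature;

/* Hypothetical application state: the latest values of each kind. */
typedef struct {
    PackVoltages voltages;
    PackTemperature temperature;
    unsigned bad_packets;
} Telemetry;

/* pkt = type, address, payload. Returns 0 if the packet was accepted. */
static int handle_packet(Telemetry *t, const uint8_t *pkt, size_t len)
{
    if (len < 2) {
        t->bad_packets++;
        return -1;
    }
    const uint8_t *payload = pkt + 2;
    size_t n = len - 2;
    switch (pkt[0]) {
    case 'V':
        if (n != sizeof(PackVoltages))
            break;                               /* wrong size for this type */
        memcpy(&t->voltages, payload, n);
        return 0;
    case 'T':
        if (n != sizeof(PackTemperature))
            break;
        memcpy(&t->temperature, payload, n);
        return 0;
    }
    t->bad_packets++;                            /* unknown type or size mismatch */
    return -1;
}
```

Both ends must agree on the structure layout (padding and byte order) for the size check to be meaningful.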

Disadvantages: the packet type and packet address cannot equal the special characters and can take values from 1 to 254. There is only one CRC byte, so a corrupted packet goes undetected with a probability of about 1/256.

When transferring binary values between different architectures, byte order must be taken into account. If the orders differ, conversion functions that reverse the byte order must be used.
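An alternative to byte-swapping functions is to serialize every field in one fixed byte order with shifts, which behaves the same on any host. A sketch assuming little-endian order on the wire (any fixed order works if both ends agree); the helper names are hypothetical.

```c
#include <stdint.h>

typedef struct { int16_t supplyIn; int16_t supplyOut; int16_t groundPotential; } PackVoltages;

/* Write a 16-bit value as two little-endian bytes, whatever the host order is. */
static void put_le16(uint8_t *out, int16_t value)
{
    uint16_t u = (uint16_t)value;                /* well-defined modulo 2^16 */
    out[0] = (uint8_t)(u & 0xFF);
    out[1] = (uint8_t)(u >> 8);
}

/* Read two little-endian bytes back, mapping 0x8000..0xFFFF to negative
   values without relying on implementation-defined conversions. */
static int16_t get_le16(const uint8_t *in)
{
    uint16_t u = (uint16_t)(in[0] | (in[1] << 8));
    return (u <= 0x7FFF) ? (int16_t)u : (int16_t)(-(int32_t)(0xFFFFu - u) - 1);
}

static void voltages_to_wire(const PackVoltages *v, uint8_t out[6])
{
    put_le16(out + 0, v->supplyIn);
    put_le16(out + 2, v->supplyOut);
    put_le16(out + 4, v->groundPotential);
}

static void voltages_from_wire(const uint8_t in[6], PackVoltages *v)
{
    v->supplyIn = get_le16(in + 0);
    v->supplyOut = get_le16(in + 2);
    v->groundPotential = get_le16(in + 4);
}
```

As a side effect, the 6-byte wire format no longer depends on how the compiler pads PackVoltages.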

As a working illustration of the protocol, a small Qt program is attached. At startup, the program opens a TCP socket and launches another copy of itself with the parameters for connecting to this socket. So two almost identical instances of the program are created, connected to each other by a TCP socket. If needed, you can use command-line keys to start the server or the client separately.

Available keys:
-dedicated - creates a server and prints the connection parameters to the console.
-child - connects using the specified parameters:
-A: 11.22.33.44 - IP address to connect to (default is localhost)
-P: 12345 - connection port
Without keys - starts the server and the client and connects them.
The program translates user actions into binary protocol packets sent to the socket; it also listens to the return channel and performs actions based on the received data.
In the program, pressing any mouse button on the black background draws a circle that keeps expanding until the button is released. Clicking the rainbow strip at the top changes the current color. It's great fun to draw together with a kid from two different computers :)

Notes on working with the binary protocol.
All interaction between the protocol and the TCP connections is collected in the MainWindow class (instead of TCP, anything else could be used).

The constructor MainWindow::MainWindow calls the private function initBinProtocol(), which initializes the protocol. There, the address of the byte-output function globalSendCharToExternal() is also passed to the protocol. Then ReadFromParent() is installed as the handler for the signal that characters have arrived on the TCP socket; it passes all received bytes to the protocol handler one character at a time.

Once a TCP connection to the server session is established, another handler, ReadFromChild(), is attached; it likewise passes all received bytes to the protocol handler.

The PackTypes.h file contains all the types of transmitted packets; in effect, it is the protocol description. The TPackAllTypes type is introduced for convenience of processing on the computer; it does not have to be used on a microcontroller.

The PaintBox class contains the actual work with the protocol. Packets from the other instance of the program are checked by a timer every 50 ms. If desired, processing could instead be triggered when the last byte of a complete packet arrives.

Events are sent to the protocol through the BP_SendMyPack() function when mouse buttons are pressed and released, and when the "clear" button is pressed. First a structure with the binary parameter values is filled in, then it is sent. The clear command needs no data, so only the command byte is transmitted.

The function PaintBox::timerCheckPacks() periodically checks the protocol buffer for received commands (BP_IsPackReceived()) and executes them.

The Types.h file defines common basic types for cross-compilation with different compilers on different platforms; you may need to edit it for your case.

In general, the code is documented, so it's not hard to figure out.

Links to the library on GitHub:

git: git@github.com:Elvish/microkern.git
http: https://github.com/Elvish/microkern

P.S. How do you split streams into pieces? If you have questions, I will try to answer. If the topic interests you, in the next article I can publish a simplified shell with autocompletion for the simplest controllers (an indispensable tool for debugging almost any small device).

UPD.
Another splitting method, from Flexz:
This method is used, for example, in the Modbus-RTU protocol. Packets are separated by intervals of "silence": the line is held idle for the time needed to transmit several characters. In this case there is no need to process every byte to undo byte stuffing, and reception can use DMA, if it is available, of course.
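For reference, the Modbus over Serial Line specification defines the RTU inter-frame gap as at least 3.5 character times, where an RTU character is 11 bits, and recommends a fixed 1.75 ms gap above 19200 baud. A minimal sketch of computing that gap and detecting a frame start from byte arrival timestamps; the names and the free-running microsecond timer are assumptions.

```c
#include <stdint.h>

/* Inter-frame silence t3.5 in microseconds for a given baud rate. */
static uint32_t modbus_t35_us(uint32_t baud)
{
    if (baud > 19200)
        return 1750;                                   /* fixed value recommended by the spec */
    /* 3.5 characters * 11 bits * 1e6 us / baud, rounded up, in integer math */
    return (uint32_t)((35ULL * 11 * 1000000 / 10 + baud - 1) / baud);
}

typedef struct {
    uint32_t last_us;   /* timestamp of the previous byte */
    uint32_t gap_us;    /* silence that separates frames */
    int have_byte;      /* 0 until the first byte arrives */
} GapFramer;

/* Call for every received byte with the current time. Returns 1 if this byte
   starts a new frame. Unsigned subtraction keeps working across timer wrap-around. */
static int gap_is_frame_start(GapFramer *f, uint32_t now_us)
{
    int start = !f->have_byte || (uint32_t)(now_us - f->last_us) >= f->gap_us;
    f->have_byte = 1;
    f->last_us = now_us;
    return start;
}
```

At 9600 baud this gives about 4 ms of required silence, which shows why any intermediate device that buffers or regroups bytes can break this kind of framing.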

From the author: in my opinion, time-based separation works on lightly loaded lines or with very accurate clocks. It breaks down where the physical data transfer is hard to control (for example, relaying over WiFi): packets can be merged with each other or split up arbitrarily. Even some USB-RS232 adapters get this wrong: they group data into 8-byte chunks, and as a result not all hardware works through such adapters.

Using two different bytes for the beginning and end of a packet makes the protocol more robust under heavy channel losses. The protocol was originally developed for transmission over ancient radio modems on mobile equipment in very poor line-of-sight conditions. With a single marker byte, the protocol more often mistakenly joined the beginning of one packet (whose transmission had broken off) with the end of another.

Source: https://habr.com/ru/post/174115/
