Dotting the i's on C/C++ structures

Recently I got properly acquainted with C/C++ structures (struct). "Good Lord, what is there to get acquainted with?" you say. And with that you make two mistakes at once: first, I am not the Lord, and second, I too used to think that a structure is a structure wherever you go. As it turned out, not quite. I will go over a few vital details that may save some readers hours of debugging...

Field alignment in memory

Pay attention to the structure:

struct Foo { char ch; int value; };

First of all, what is the size of this structure in memory, that is, sizeof(Foo)?
It depends on the compiler settings and on the directives in your code...

In general, each field is aligned in memory to a boundary that is a multiple of its own size. That is, 1-byte fields are not aligned at all, 2-byte fields sit at even offsets, 4-byte fields at offsets that are multiples of four, and so on. In most cases (or let us just assume so for today), structures are aligned to 4 bytes. Thus sizeof(Foo) == 8. Where and how do the extra 3 bytes get stuck in? If you don't know, you will never guess...

Byte 1: ch
Byte 2: empty
Byte 3: empty
Byte 4: empty
Byte 5: value [0]
Byte 6: value [1]
Byte 7: value [2]
Byte 8: value [3]
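The layout above can be checked rather than guessed. A minimal sketch, assuming a hosted C++11 compiler; the helper name print_foo_layout is made up for this illustration, and the numbers it prints are whatever your compiler chose for your target:

```cpp
#include <cstddef>   // offsetof
#include <cstdio>    // std::printf

struct Foo {
    char ch;
    int  value;
};

// Hypothetical helper: prints the layout the compiler actually chose.
void print_foo_layout() {
    std::printf("alignof(int)         = %zu\n", alignof(int));
    std::printf("offsetof(Foo, ch)    = %zu\n", offsetof(Foo, ch));
    std::printf("offsetof(Foo, value) = %zu\n", offsetof(Foo, value));
    std::printf("sizeof(Foo)          = %zu\n", sizeof(Foo));
}
```

On a target where int is 4 bytes with 4-byte alignment, value should land at offset 4 and the structure should take 8 bytes, which matches the picture above.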

Let us now see the placement in memory of the following structure:

 struct Foo { char ch; short id; int value; };

It looks like this:

Byte 1: ch
Byte 2: empty
Byte 3: id [0]
Byte 4: id [1]
Byte 5: value [0]
Byte 6: value [1]
Byte 7: value [2]
Byte 8: value [3]

That is, whatever fits within the 4-byte alignment gets squeezed in happily, without increasing the size of the structure in memory. Now let's add one more field:

 struct Foo { char ch; short id; short opt; int value; };

Let's look at the placement of fields in memory:

Byte 1: ch
Byte 2: empty
Byte 3: id [0]
Byte 4: id [1]
Byte 5: opt [0]
Byte 6: opt [1]
Byte 7: empty
Byte 8: empty
Byte 9: value [0]
Byte 10: value [1]
Byte 11: value [2]
Byte 12: value [3]
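To see how many bytes are lost to those "empty" slots, the three variants can be compared side by side. This is only a sketch: the names FooTwo, FooThree, FooFour and the padding_* helpers are invented here, because the article reuses the name Foo for all three.

```cpp
#include <cstddef>   // std::size_t

struct FooTwo   { char ch; int value; };
struct FooThree { char ch; short id; int value; };
struct FooFour  { char ch; short id; short opt; int value; };

// Padding = total size minus the bytes the fields themselves occupy.
constexpr std::size_t padding_two() {
    return sizeof(FooTwo) - (sizeof(char) + sizeof(int));
}
constexpr std::size_t padding_three() {
    return sizeof(FooThree) - (sizeof(char) + sizeof(short) + sizeof(int));
}
constexpr std::size_t padding_four() {
    return sizeof(FooFour) - (sizeof(char) + 2 * sizeof(short) + sizeof(int));
}
```

With the layouts drawn above (2-byte short, 4-byte int) these come out as 3, 1 and 3 bytes respectively.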

All this is oh so sad, but there is a way to deal with it right from the code:

 #pragma pack(push, 1)
 struct Foo {
     // ...
 };
 #pragma pack(pop)

We set the alignment to 1 byte, described the structure, and restored the previous setting. I strongly recommend restoring the previous setting; otherwise everything can end very badly. It happened to me once: Qt crashed, because somewhere I had included one of its headers below one of my own...
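The effect of the push/pop pair can be pinned down at compile time. A small sketch, assuming a compiler that supports #pragma pack (GCC, Clang and MSVC do); the struct names are made up for the illustration:

```cpp
#include <cstddef>

// Declared before the pragma: default alignment.
struct BeforePush { char ch; int value; };

#pragma pack(push, 1)
struct PackedFoo { char ch; short id; short opt; int value; };
#pragma pack(pop)

// Declared after the pop: default alignment applies again.
struct AfterPop { char ch; int value; };

// Packed: no padding at all, the size is the plain sum of the fields.
static_assert(sizeof(PackedFoo) ==
                  sizeof(char) + 2 * sizeof(short) + sizeof(int),
              "PackedFoo should contain no padding");

// The pop restored the previous setting.
static_assert(sizeof(AfterPop) == sizeof(BeforePush),
              "alignment was not restored after pack(pop)");
```

If the pop is forgotten, the second assertion is the kind of check that would catch it before a third-party header gets compiled with the wrong packing.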

Bit fields

In the comments it was pointed out to me that, according to the standard, the layout of bit fields in structures is "implementation-defined", so it is better to avoid them. But for me the temptation is too great...

It is not just that my heart grows uneasy; I feel downright ill when I see code that fills in bit fields by hand with masks and shifts, like this:

 unsigned field = 0x00530000;
 // ...
 field &= 0xFFFF00FF;
 field |= (id) << 8;
 // ...
 field &= 0xFFFFFF83;
 field |= (proto) << 2;

All this reeks of such sadness, such bugs and such debugging sessions that I get a migraine on the spot! And here, from behind the curtain, enter bit fields. Most surprisingly, they have been around since C, yet everyone I ask is hearing about them for the first time. This mess must be fixed. From now on I will give them all a link, at the very least a link to this article.

How do you like this piece of code:

 #pragma pack(push,1)
 struct IpHeader {
     uint8_t header_length:4;
     uint8_t version:4;
     uint8_t type_of_service;
     uint16_t total_length;
     uint16_t identificator;
     // Flags
     uint8_t _reserved:1;
     uint8_t dont_fragment:1;
     uint8_t more_fragments:1;
     uint8_t fragment_offset_part1:5;
     uint8_t fragment_offset_part2;
     uint8_t time_to_live;
     uint8_t protocol;
     uint16_t checksum;
     // ...
 };
 #pragma pack(pop)

From then on we can work with these fields just as we always work with fields in C/C++; the compiler takes care of all the shifting and masking. Of course, there are some limitations... When you list several bit fields in a row that belong to the same physical field (I mean the type to the left of the bit field name), give names to all the bits up to the end of that field; otherwise you will have no access to the remaining bits. In code:

 #pragma pack(push,1)
 struct MyBitStruct {
     uint16_t a:4;
     uint16_t b:4;
     uint16_t c;
 };
 #pragma pack(pop)

The structure turned out to be 4 bytes! The two halves of the first byte are the fields a and b. The second byte cannot be reached by name, and the last 2 bytes are available under the name c. This is a very dangerous spot, all the more so because the layout is implementation-defined and another compiler may pack the same declaration differently. Once you have described a structure with bit fields, be sure to check its sizeof!
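The sizeof check is cheap to automate with static_assert. A sketch under assumptions: MyBitStructFull and the field name unused are invented here to show the "name every bit" advice, and the expected size of 4 is stated only for that fully named variant.

```cpp
#include <cstdint>

#pragma pack(push, 1)
// As in the text: only 8 of the first 16 bits have names.
struct MyBitStruct {
    std::uint16_t a : 4;
    std::uint16_t b : 4;
    std::uint16_t c;
};

// Same fields, but the first 16-bit unit is named to the end.
struct MyBitStructFull {
    std::uint16_t a : 4;
    std::uint16_t b : 4;
    std::uint16_t unused : 8;   // hypothetical name for the leftover bits
    std::uint16_t c;
};
#pragma pack(pop)

// Fails the build, instead of a debugging session, if the layout changes.
static_assert(sizeof(MyBitStructFull) == 4,
              "unexpected bit-field layout for MyBitStructFull");
```

No assertion is placed on MyBitStruct itself on purpose: with partially named bits, compilers are free to disagree about where c starts.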

Also, the order in which bit fields are placed within a byte depends on the byte order. With LITTLE_ENDIAN the bit fields are typically allocated starting from the low-order bits; with BIG_ENDIAN it is the other way around...
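Which way a given compiler goes can be probed at run time. A rough sketch; Nibbles and probe_first_nibble are names made up for this example, and it assumes the two 4-bit fields share a single byte:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy

struct Nibbles {
    std::uint8_t first  : 4;
    std::uint8_t second : 4;
};

// Sets only the first-declared field to 1 and returns the raw byte.
// 0x01: the first field sits in the low-order bits.
// 0x10: the first field sits in the high-order bits.
unsigned char probe_first_nibble() {
    static_assert(sizeof(Nibbles) == 1, "sketch assumes a one-byte struct");
    Nibbles n{};
    n.first = 1;
    unsigned char raw = 0;
    std::memcpy(&raw, &n, 1);
    assert(raw == 0x01 || raw == 0x10);
    return raw;
}
```

This is why the IP header above declares header_length before version: that order only matches the wire format on compilers that fill bytes from the low-order bits.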

Byte order

I am also saddened by calls to htons(), ntohs(), htonl() and ntohl() in C++ code. In C this is still acceptable, but not in C++. I will never accept it! Note that everything below applies to C++ only!

Well, here I will be brief. In one of my previous articles I already wrote about what to do with byte order. It is possible to describe structures that behave like numbers from the outside, while internally they decide for themselves in which order to store their bytes. Our IP header structure then looks like this:

 #pragma pack(push,1)
 struct IpHeader {
     uint8_t header_length:4;
     uint8_t version:4;
     uint8_t type_of_service;
     u16be total_length;
     u16be identificator;
     // Flags
     uint8_t _reserved:1;
     uint8_t dont_fragment:1;
     uint8_t more_fragments:1;
     uint8_t fragment_offset_part1:5;
     uint8_t fragment_offset_part2;
     uint8_t time_to_live;
     uint8_t protocol;
     u16be checksum;
     // ...
 };
 #pragma pack(pop)

Note the type of the 2-byte fields: u16be. Now the structure fields need no byte-order conversion at all. There are still problems with fragment_offset, but then, who doesn't have problems? Nevertheless, you can create a template that hides this ugliness as well, test it once, and use it confidently throughout your code.
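The real u16be comes from the earlier article and is not reproduced here. As a rough idea of what such a type might look like, here is a sketch under a deliberately different name, u16be_sketch; every member in it is an assumption of this example, not the author's implementation.

```cpp
#include <cstdint>

// Hypothetical stand-in for u16be: behaves like a 16-bit number from the
// outside, but always stores the most significant byte first.
class u16be_sketch {
public:
    constexpr u16be_sketch() : bytes_{0, 0} {}
    constexpr u16be_sketch(std::uint16_t v)
        : bytes_{static_cast<std::uint8_t>(v >> 8),
                 static_cast<std::uint8_t>(v & 0xFF)} {}

    // Reading converts back to the host representation.
    constexpr operator std::uint16_t() const {
        return static_cast<std::uint16_t>((bytes_[0] << 8) | bytes_[1]);
    }

    // Raw access to the stored (wire-order) bytes, for inspection.
    constexpr std::uint8_t raw(int i) const { return bytes_[i]; }

private:
    std::uint8_t bytes_[2];
};

static_assert(sizeof(u16be_sketch) == 2, "must overlay a 2-byte wire field");
```

Because the value is assembled from individual bytes with shifts, the same code gives the same in-memory bytes on little-endian and big-endian hosts, so no htons()/ntohs() call is needed at the point of use.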

"The C++ language is complicated enough to let us write in it simply" © Oddly enough, me

P.S. In one of the following articles I plan to publish what are, from my point of view, the ideal structures for working with the protocol headers of the TCP/IP stack. Speak up while it's not too late!

Source: https://habr.com/ru/post/142662/
