Working with structures in C#

Following the recent topic “Processing large amounts of data in memory in C#”, I present a translation of the article about structures mentioned there.

Structures are fundamental data types in C# and in most other modern programming languages. At their core structures are simple, but you may be surprised how quickly working with them becomes difficult. Problems most often arise when you have to work with structures that were created in other languages and are stored on disk or returned by calls to library functions or COM objects. This article assumes that you are familiar with the concept of a structure, know how to define one, and have basic skills in working with structures. It also assumes that you have an idea of how to call API functions using P/Invoke and of what marshaling is. If you are unsure of your knowledge, refer to the documentation.
Many of the techniques described in this article can be extended and applied to any type of data.

Layout

In most cases you can declare and use a structure without knowing how it is implemented, and in particular how its fields are laid out in memory. If you have to create a structure for use by other applications, or use someone else's structure, memory layout becomes important. What do you think the size of the following structure is?

    public struct struct1
    {
        public byte a;   // 1 byte
        public int b;    // 4 bytes
        public short c;  // 2 bytes
        public byte d;   // 1 byte
    }

The reasonable answer is 8 bytes, simply the sum of the sizes of all the fields. However, if you try to find out the size of the structure:

    int size = Marshal.SizeOf(typeof(struct1));

... then (in most cases) you will find that the structure occupies 12 bytes. The reason is that most processors work better with data that is larger than a byte and aligned to certain address boundaries. The Pentium prefers data in 16-byte blocks, aligned to address boundaries equal to the size of the data itself. For example, a 4-byte integer should be aligned on a 4-byte boundary. The details are unimportant here. What matters is that the compiler adds padding bytes to align the data within the structure. You can control this manually, but note that some processors may raise an error when unaligned data is used. This creates additional problems for users of the .NET Compact Framework (I wonder, are there many of them? Translator's note).
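The 12-byte result can be checked with a few lines of code. The following is a minimal sketch; the names PaddedSample and SizeDemo are made up for this illustration and do not come from the original article.

```csharp
using System;
using System.Runtime.InteropServices;

// Same field order as struct1 above; the type name is hypothetical.
public struct PaddedSample
{
    public byte a;   // offset 0, followed by 3 padding bytes
    public int b;    // offset 4, aligned on a 4-byte boundary
    public short c;  // offset 8
    public byte d;   // offset 10, followed by 1 trailing padding byte
}

public static class SizeDemo
{
    // Size of the structure as the marshaler sees it.
    public static int MarshaledSize()
    {
        return Marshal.SizeOf(typeof(PaddedSample));
    }

    // Offset of a named field within the marshaled layout.
    public static int OffsetOf(string field)
    {
        return Marshal.OffsetOf(typeof(PaddedSample), field).ToInt32();
    }

    public static void Main()
    {
        Console.WriteLine("size = " + MarshaledSize());
        Console.WriteLine("offset of b = " + OffsetOf("b"));
    }
}
```

Marshal.OffsetOf is used here only to make the padding visible: the gap between the offset of a and the offset of b is where the extra bytes go.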

To use the attributes described below, you need to import the InteropServices namespace:

 using System.Runtime.InteropServices;

To arrange fields in memory manually, use the StructLayout attribute. For example:

    [StructLayout(LayoutKind.Sequential)]
    public struct struct1
    {
        public byte a;   // 1 byte
        public int b;    // 4 bytes
        public short c;  // 2 bytes
        public byte d;   // 1 byte
    }

This forces the compiler to place the fields sequentially, in declaration order, which is what it does by default. The other values are Auto, which lets the compiler choose the order of the fields itself, and Explicit, which lets the programmer specify the offset of each field. Explicit is often used to obtain a sequential layout without padding, but in most cases it is easier to use the Pack parameter. It tells the compiler how the data should be aligned, and therefore how much memory is allocated. For example, if you specify Pack = 1, every field is aligned on a 1-byte boundary and the structure can be read byte by byte, i.e. no padding is needed. If you change the structure declaration:

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct struct1

... you will find that the structure now occupies exactly 8 bytes, which corresponds to the sequential arrangement of the fields in memory with no padding bytes. This is how you work with most of the structures declared in the Windows API and in C/C++. In most cases you will not need other values for Pack. If you set Pack = 2, the structure occupies 10 bytes, because one padding byte is added after each single-byte field so that the data can be read in 2-byte chunks. If you set Pack = 4, the size grows to 12 bytes, so that the structure can be read in 4-byte blocks. Larger values have no further effect, because the Pack value is ignored when it equals or exceeds the alignment used by the processor, which is 8 bytes on the Intel architecture. The layout of the structure in memory for different Pack values is shown in the figure:
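The effect of the different Pack values can also be observed directly by printing the size and field offsets. This is only an illustrative sketch; the type names Packed1, Packed2, Packed4 and the helper PackDemo.Describe are assumptions made for the example.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Packed1 { public byte a; public int b; public short c; public byte d; }

[StructLayout(LayoutKind.Sequential, Pack = 2)]
public struct Packed2 { public byte a; public int b; public short c; public byte d; }

[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct Packed4 { public byte a; public int b; public short c; public byte d; }

public static class PackDemo
{
    // Builds "size: a=.. b=.. c=.. d=.." for a struct with fields a, b, c, d.
    public static string Describe(Type t)
    {
        return string.Format("{0}: a={1} b={2} c={3} d={4}",
            Marshal.SizeOf(t),
            Marshal.OffsetOf(t, "a").ToInt32(),
            Marshal.OffsetOf(t, "b").ToInt32(),
            Marshal.OffsetOf(t, "c").ToInt32(),
            Marshal.OffsetOf(t, "d").ToInt32());
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(Packed1))); // 8: a=0 b=1 c=5 d=7
        Console.WriteLine(Describe(typeof(Packed2))); // 10: a=0 b=2 c=6 d=8
        Console.WriteLine(Describe(typeof(Packed4))); // 12: a=0 b=4 c=8 d=10
    }
}
```

With Pack = 2 the padding bytes sit after a and after d; with Pack = 4 there are three bytes after a and one after d.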

It is also worth mentioning that you can change how the structure is padded by changing the order of its fields. For example, if you reorder the fields as follows:

    public struct struct1
    {
        public byte a;   // 1 byte
        public byte d;   // 1 byte
        public short c;  // 2 bytes
        public int b;    // 4 bytes
    }

... the structure needs no padding and already occupies exactly 8 bytes.

To be precise

If you need to specify exactly where each field is placed in memory, use the Explicit layout kind. For example:

    [StructLayout(LayoutKind.Explicit)]
    public struct struct1
    {
        [FieldOffset(0)] public byte a;   // 1 byte
        [FieldOffset(1)] public int b;    // 4 bytes
        [FieldOffset(5)] public short c;  // 2 bytes
        [FieldOffset(7)] public byte d;   // 1 byte
    }

This gives you an 8-byte structure with no padding bytes. In this case it is equivalent to using Pack = 1. However, Explicit gives you complete control over memory. For example:

    [StructLayout(LayoutKind.Explicit)]
    public struct struct1
    {
        [FieldOffset(0)] public byte a;    // 1 byte
        [FieldOffset(1)] public int b;     // 4 bytes
        [FieldOffset(10)] public short c;  // 2 bytes
        [FieldOffset(14)] public byte d;   // 1 byte
    }

This structure occupies 16 bytes, including the extra bytes after field b. Before C# 2.0, the Explicit layout was mainly used to declare fixed-size buffers for calls to third-party functions. You cannot declare a fixed-length array in a structure in the ordinary way, because field initializers are prohibited.
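To see that the runtime honors the offsets given in FieldOffset, you can query them back. The sketch below uses a hypothetical type Gapped with the same offsets as the second example; it prints the total size rather than assuming it, since the trailing rounding is decided by the runtime.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type with the same offsets as the second Explicit example.
[StructLayout(LayoutKind.Explicit)]
public struct Gapped
{
    [FieldOffset(0)] public byte a;
    [FieldOffset(1)] public int b;     // bytes 1..4
    [FieldOffset(10)] public short c;  // bytes 5..9 are an unused gap
    [FieldOffset(14)] public byte d;   // bytes 12..13 are an unused gap
}

public static class ExplicitDemo
{
    // Offset of a named field, as reported by the marshaler.
    public static int OffsetOf(string field)
    {
        return Marshal.OffsetOf(typeof(Gapped), field).ToInt32();
    }

    public static void Main()
    {
        foreach (string name in new[] { "a", "b", "c", "d" })
        {
            Console.WriteLine(name + " at offset " + OffsetOf(name));
        }
        // The total includes any rounding the runtime applies after the last field.
        Console.WriteLine("total size = " + Marshal.SizeOf(typeof(Gapped)));
    }
}
```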

    public struct struct1
    {
        public byte a;
        public int b;
        byte[] buffer = new byte[10];  // field initializer: not allowed here
        public short c;
        public byte d;
    }

This code produces an error. If you need a 10-byte array, here is one way:

    [StructLayout(LayoutKind.Explicit)]
    public struct struct1
    {
        [FieldOffset(0)] public byte a;
        [FieldOffset(1)] public int b;
        [FieldOffset(5)] public short c;
        [FieldOffset(8)] public byte[] buffer;
        [FieldOffset(18)] public byte d;
    }

This leaves 10 bytes for the array. There are a number of interesting nuances. First, why use an offset of 8? The reason is that an array cannot start at an odd address. If you use an offset of 7, you will get a runtime error saying that the structure could not be loaded because of alignment problems. This matters, because with Explicit you can run into trouble if you do not understand what you are doing. The second point is that extra bytes are added to the end of the structure so that its size is a multiple of 8 bytes. The compiler is still involved in deciding how the structure is laid out in memory. Of course, in practice, any external structure that you try to map to a C# structure must be correctly aligned.
Finally, it is worth mentioning that you cannot access the 10-byte array through the array name (for example, buffer[1]), because C# considers that no value has been assigned to the array. Therefore, if you cannot use an array and it causes alignment problems, it is much better to declare the structure like this:

    [StructLayout(LayoutKind.Explicit)]
    public struct struct1
    {
        [FieldOffset(0)] public byte a;       // 1 byte
        [FieldOffset(1)] public int b;        // 4 bytes
        [FieldOffset(5)] public short c;      // 2 bytes
        [FieldOffset(7)] public byte buffer;  // first byte of the buffer
        [FieldOffset(18)] public byte d;      // 1 byte
    }

To access the array you will need pointer arithmetic, which is unsafe code. To allocate a fixed number of bytes for the structure, use the Size parameter of the StructLayout attribute:

 [StructLayout(LayoutKind.Explicit, Size=64)]
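Both ideas, a fixed overall Size and pointer access to a region that is represented by a single byte field, can be combined in a small sketch. The type RawBlock and its offsets are hypothetical, and the code must be compiled with unsafe code enabled (for example the AllowUnsafeBlocks project setting).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 64-byte block: a header byte plus a raw region starting at offset 8.
[StructLayout(LayoutKind.Explicit, Size = 64)]
public struct RawBlock
{
    [FieldOffset(0)] public byte header;
    [FieldOffset(8)] public byte buffer;  // first byte of a 10-byte region
}

public static class RawBlockDemo
{
    public const int BufferLength = 10;

    // Writes one byte into the region via pointer arithmetic and reads it back.
    public static unsafe byte WriteAndReadBack(int index, byte value)
    {
        if (index < 0 || index >= BufferLength)
            throw new ArgumentOutOfRangeException("index");

        RawBlock block = new RawBlock();
        byte* p = &block.buffer;  // a local struct does not move, so no fixed statement is needed
        p[index] = value;         // same as *(p + index) = value
        return *(p + index);
    }

    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(RawBlock)));
        Console.WriteLine(WriteAndReadBack(3, 42));
    }
}
```

The bounds check matters: nothing in the language stops the pointer from running past the end of the structure.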

Since C# 2.0, fixed-size arrays are allowed, so all of the constructs above are largely unnecessary. Note that fixed-length arrays use the same mechanism, allocating a fixed number of bytes and using pointers (which is also unsafe). If you need arrays for calls to library functions, probably the best way is to marshal the arrays explicitly, which is considered “safe”. Let's look at all three of these approaches.

API calls

As an example of a structure that requires alignment, consider the one used by the EnumDisplayDevices function, which is declared as follows:

    BOOL EnumDisplayDevices(
        LPCTSTR lpDevice,                 // device name
        DWORD iDevNum,                    // display device
        PDISPLAY_DEVICE lpDisplayDevice,  // device information
        DWORD dwFlags                     // reserved
    );

It translates to C# quite simply:

    [DllImport("User32.dll", CharSet = CharSet.Unicode)]
    extern static bool EnumDisplayDevices(
        string lpDevice,
        uint iDevNum,
        ref DISPLAY_DEVICE lpDisplayDevice,
        uint dwFlags);

The DISPLAY_DEVICE structure is defined as:

    typedef struct _DISPLAY_DEVICE {
        DWORD cb;
        WCHAR DeviceName[32];
        WCHAR DeviceString[128];
        DWORD StateFlags;
        WCHAR DeviceID[128];
        WCHAR DeviceKey[128];
    } DISPLAY_DEVICE, *PDISPLAY_DEVICE;

It clearly contains four fixed-length character arrays. Using the Explicit layout kind, we can rewrite the structure in C#:

    [StructLayout(LayoutKind.Explicit, Pack = 1, Size = 840)]
    public struct DISPLAY_DEVICE
    {
        [FieldOffset(0)] public int cb;
        [FieldOffset(4)] public char DeviceName;
        [FieldOffset(68)] public char DeviceString;
        [FieldOffset(324)] public int StateFlags;
        [FieldOffset(328)] public char DeviceID;
        [FieldOffset(584)] public char DeviceKey;  // 128 WCHARs: 584 + 256 = 840
    }

Note the use of the Size parameter to reserve the space needed to store the DeviceKey field. Now, if we use this structure when calling the function:

    DISPLAY_DEVICE info = new DISPLAY_DEVICE();
    info.cb = Marshal.SizeOf(info);
    bool result = EnumDisplayDevices(null, 0, ref info, 0);

... then all you can access directly is the first character of each array. For example, DeviceString contains the first character of the device description string. If you want the rest of the characters, you need to get a pointer to DeviceString and use pointer arithmetic to walk through the array.
In C# 2.0, the simplest solution is to use arrays in the structure:

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public unsafe struct DISPLAY_DEVICE
    {
        public int cb;
        public fixed char DeviceName[32];
        public fixed char DeviceString[128];
        public int StateFlags;
        public fixed char DeviceID[128];
        public fixed char DeviceKey[128];
    }

Note that the structure must be marked with the unsafe modifier. Now, after the API call, we can get data from the arrays without using pointers explicitly. Implicitly, however, they are still used, and any code that accesses the arrays must be marked unsafe.
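As a rough illustration of how a fixed-size buffer is read and written, here is a sketch with a cut-down structure. NameRecord and FixedBufferDemo are hypothetical names, no API is called, and the code needs unsafe code enabled at compile time.

```csharp
using System;

// Hypothetical cut-down record with a single fixed-size character buffer.
public unsafe struct NameRecord
{
    public int cb;
    public fixed char Name[32];
}

public static class FixedBufferDemo
{
    // Copies text into the fixed buffer (truncated to 31 characters plus a
    // terminating zero) and reads it back as a managed string.
    public static unsafe string StoreAndRead(string text)
    {
        NameRecord rec = new NameRecord();
        int count = Math.Min(text.Length, 31);
        for (int i = 0; i < count; i++)
        {
            rec.Name[i] = text[i];   // indexing a fixed buffer is pointer access underneath
        }
        rec.Name[count] = '\0';
        return new string(rec.Name); // string(char*) reads up to the first zero character
    }

    public static void Main()
    {
        Console.WriteLine(StoreAndRead("Primary display"));
    }
}
```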
The third and last method is custom marshaling. Many C# programmers do not realize that marshaling is not only about how data types are passed to library calls; it is also an active process that copies and modifies managed data. For example, if you want to pass a reference to a typed array, you can pass it by value, and the system will convert it to a fixed-length array and back to a managed array without any further action on your part.
In this case, all we have to do is add the MarshalAs attribute, specifying the type and size of the arrays:

    [StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
    public struct DISPLAY_DEVICE
    {
        public int cb;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 32)]
        public char[] DeviceName;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
        public char[] DeviceString;
        public int StateFlags;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
        public char[] DeviceID;
        [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
        public char[] DeviceKey;
    }

In this case, when a library function is called, the fields are passed by creating unmanaged arrays of the required length inside the copy of the structure that is handed to the call. When the function returns, the unmanaged arrays are converted into managed character arrays, and references to them are assigned to the fields of the structure. As a result, after the call you will find that the structure contains arrays of the right size, filled with data.
When calling functions from third-party libraries, custom marshaling is the best solution, since it uses safe code, although calling third-party functions through P/Invoke is not safe in the general sense.
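The conversion from an unmanaged inline array to a managed char[] can be observed without calling any Windows API. In the sketch below the native side is simulated by writing UTF-16 text into an unmanaged buffer by hand; DeviceInfo and ByValArrayDemo are hypothetical names, and the structure is a cut-down version of the one above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical cut-down structure: a size field plus one inline character array.
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
public struct DeviceInfo
{
    public int cb;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 32)]
    public char[] DeviceName;
}

public static class ByValArrayDemo
{
    // Simulates native code filling the structure, then marshals it back.
    public static string ReadBack(string name)
    {
        int size = Marshal.SizeOf(typeof(DeviceInfo));  // 4 + 32 * 2 bytes
        IntPtr buf = Marshal.AllocHGlobal(size);
        try
        {
            // AllocHGlobal does not clear memory, so zero the buffer first.
            Marshal.Copy(new byte[size], 0, buf, size);

            // Write the text where DeviceName lives (offset 4), leaving a terminator.
            char[] chars = name.ToCharArray();
            int count = Math.Min(chars.Length, 31);
            Marshal.Copy(chars, 0, IntPtr.Add(buf, 4), count);

            // The marshaler allocates a managed char[32] and copies the data into it.
            DeviceInfo info = (DeviceInfo)Marshal.PtrToStructure(buf, typeof(DeviceInfo));
            return new string(info.DeviceName).TrimEnd('\0');
        }
        finally
        {
            Marshal.FreeHGlobal(buf);
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadBack("DISPLAY1"));
    }
}
```

The managed array always has SizeConst elements, so the trailing zero characters have to be trimmed when converting to a string.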

Serialization of structures

Now that we have examined the rather complex issues of how structures are placed in memory, it is time to learn how to get all the bytes that make up a structure. In other words, how do you serialize a structure? There are many ways to do this; most often the Marshal.AllocHGlobal method is used to allocate memory on the unmanaged heap. After that, everything is done with memory functions such as StructureToPtr or Copy. Example:

    public static byte[] RawSerialize(object anything)
    {
        int rawsize = Marshal.SizeOf(anything);
        IntPtr buffer = Marshal.AllocHGlobal(rawsize);
        Marshal.StructureToPtr(anything, buffer, false);
        byte[] rawdata = new byte[rawsize];
        Marshal.Copy(buffer, rawdata, 0, rawsize);
        Marshal.FreeHGlobal(buffer);
        return rawdata;
    }

In fact, there is no need for so many steps; it is simpler to copy the structure's bytes directly into a byte array without an intermediate buffer. The key object in this approach is GCHandle. It gives you a garbage collector handle, and you can use its AddrOfPinnedObject method to get the starting address of the pinned array. The RawSerialize method can be rewritten as follows:

    public static byte[] RawSerialize(object anything)
    {
        int rawsize = Marshal.SizeOf(anything);
        byte[] rawdata = new byte[rawsize];
        GCHandle handle = GCHandle.Alloc(rawdata, GCHandleType.Pinned);
        Marshal.StructureToPtr(anything, handle.AddrOfPinnedObject(), false);
        handle.Free();
        return rawdata;
    }

This method is simpler and faster. You can use the same techniques to deserialize data from a byte array into a structure, but it is more useful to consider the problem of reading a structure from a stream.
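For completeness, the reverse direction can be sketched with the same pinning technique. The names RawSerializer and Sample are invented for this example, and the try/finally blocks are an addition that ensures the handle is released even if marshaling throws.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sample type: 1 + 4 + 2 = 7 bytes with Pack = 1.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Sample
{
    public byte tag;
    public int x;
    public short y;
}

public static class RawSerializer
{
    // Copies the marshaled bytes of a structure into a new byte array.
    public static byte[] Serialize(object value)
    {
        byte[] raw = new byte[Marshal.SizeOf(value)];
        GCHandle handle = GCHandle.Alloc(raw, GCHandleType.Pinned);
        try
        {
            Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false);
        }
        finally
        {
            handle.Free();
        }
        return raw;
    }

    // Rebuilds a structure of type T from the start of a byte array.
    public static T Deserialize<T>(byte[] raw) where T : struct
    {
        if (raw.Length < Marshal.SizeOf(typeof(T)))
            throw new ArgumentException("Buffer is too small for the structure.", "raw");

        GCHandle handle = GCHandle.Alloc(raw, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }

    public static void Main()
    {
        Sample original = new Sample { tag = 1, x = 123456, y = -7 };
        byte[] bytes = Serialize(original);
        Sample copy = Deserialize<Sample>(bytes);
        Console.WriteLine(bytes.Length + " bytes, x = " + copy.x + ", y = " + copy.y);
    }
}
```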

Reading structures from streams

Sometimes you need to read a structure, possibly written by a program in another language, into a C# structure. For example, you may need to read a bitmap file, which starts with a file header, followed by a bitmap header and then the actual pixel data. The file header structure looks like this:

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct BITMAPFILEHEADER
    {
        public Int16 bfType;
        public Int32 bfSize;
        public Int16 bfReserved1;
        public Int16 bfReserved2;
        public Int32 bfOffBits;
    }

A function that reads from a stream and returns a structure can be written without generics:

    public object ReadStruct(FileStream fs, Type t)
    {
        byte[] buffer = new byte[Marshal.SizeOf(t)];
        fs.Read(buffer, 0, Marshal.SizeOf(t));
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        Object temp = Marshal.PtrToStructure(handle.AddrOfPinnedObject(), t);
        handle.Free();
        return temp;
    }

GCHandle is used here to transfer the data. What is new in this code is the parameter that specifies the type of the structure. Unfortunately, you cannot use that type for the return value, so after calling the function you must cast its result:

    FileStream fs = new FileStream(@"c:\1.bmp", FileMode.Open, FileAccess.Read);
    BITMAPFILEHEADER bmFH = (BITMAPFILEHEADER)ReadStruct(fs, typeof(BITMAPFILEHEADER));

If we want to avoid the cast, we need a generic method:

    public T ReadStruct<T>(FileStream fs)
    {
        byte[] buffer = new byte[Marshal.SizeOf(typeof(T))];
        fs.Read(buffer, 0, Marshal.SizeOf(typeof(T)));
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        T temp = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        handle.Free();
        return temp;
    }

Notice that we now have to cast the object returned by PtrToStructure inside the method itself rather than at the call site, which now looks like this:

 BITMAPFILEHEADER bmFH = ReadStruct<BITMAPFILEHEADER>(fs);

It's nice to see how much better the generic version looks.
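A slightly more defensive variant of the generic reader is sketched below. It is not the article's code: it accepts any Stream, loops because a single Read call may return fewer bytes than requested, and is exercised with an in-memory stream instead of a real .bmp file. The names StructReader and BitmapFileHeader and the sample values are assumptions, and the check presumes a little-endian machine.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct BitmapFileHeader
{
    public short bfType;
    public int bfSize;
    public short bfReserved1;
    public short bfReserved2;
    public int bfOffBits;
}

public static class StructReader
{
    // Reads exactly sizeof(T) bytes from the stream and marshals them into T.
    public static T ReadStruct<T>(Stream stream) where T : struct
    {
        int size = Marshal.SizeOf(typeof(T));
        byte[] buffer = new byte[size];
        int total = 0;
        while (total < size)
        {
            int read = stream.Read(buffer, total, size - total);
            if (read == 0) throw new EndOfStreamException("Stream ended inside the structure.");
            total += read;
        }

        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }

    // Builds a 14-byte header in memory (made-up values) and reads it back.
    public static BitmapFileHeader ReadSampleHeader()
    {
        MemoryStream ms = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(ms);
        writer.Write((short)0x4D42);  // the characters "BM"
        writer.Write(1078);           // bfSize
        writer.Write((short)0);       // bfReserved1
        writer.Write((short)0);       // bfReserved2
        writer.Write(54);             // bfOffBits
        writer.Flush();
        ms.Position = 0;
        return ReadStruct<BitmapFileHeader>(ms);
    }

    public static void Main()
    {
        BitmapFileHeader header = ReadSampleHeader();
        Console.WriteLine("type=0x{0:X4} size={1} offBits={2}",
            header.bfType, header.bfSize, header.bfOffBits);
    }
}
```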

Manual marshaling

Marshaling works so well in the overwhelming majority of cases that you can forget about its existence altogether. However, if you run into something unusual, you may wonder what is going on when marshaling stops working. For example, some API calls require a pointer to a pointer to a structure. You already know how to pass a pointer to a structure (you simply pass it by reference), so it may seem that passing a pointer to a pointer is just as easy. However, things are more complicated than you might expect. Let's take a look.
In the AVIFileCreateStream function, the last two parameters are passed as a pointer to an IntPtr and a pointer to a structure, respectively:

    [DllImport("avifil32.dll")]
    extern static int AVIFileCreateStream(
        IntPtr pfile,
        ref IntPtr pavi,
        ref AVISTREAMINFO lParam);

To call this function, you would write:

 result = AVIFileCreateStream(pFile, ref pStream, ref Sinfo);

Based on the previous examples, it may seem simpler to replace passing the structure by reference with passing the pointer itself. What could possibly be wrong with the following declaration?

    [DllImport("avifil32.dll")]
    extern static int AVIFileCreateStream(
        IntPtr pfile,
        ref IntPtr pavi,
        IntPtr lParam);

However, if you try to pass the address of the pinned structure:

    GCHandle handle = GCHandle.Alloc(Sinfo, GCHandleType.Pinned);
    result = AVIFileCreateStream(pFile, ref pStream, handle.AddrOfPinnedObject());
    handle.Free();

... you will see an error.

The reason for this error is that although you pass a pointer to the start of the structure, the structure is located in managed memory, and unmanaged code cannot access it. We forget that standard marshaling does some additional work when creating pointers. Before the pointers are created, complete copies of all parameters passed by reference are made in unmanaged memory. After the call ends, the data is copied from unmanaged memory back to managed memory.
Writing a similar function that does the marshaler's job is easy and obviously useful:

    private IntPtr MarshalToPointer(object data)
    {
        IntPtr buf = Marshal.AllocHGlobal(Marshal.SizeOf(data));
        Marshal.StructureToPtr(data, buf, false);
        return buf;
    }

Here an IntPtr is simply returned that points to the area on the heap containing a copy of the data. The only drawback is that you must remember to free the allocated memory:

    IntPtr lpstruct = MarshalToPointer(Sinfo);
    result = AVIFileCreateStream(pFile, ref pStream, lpstruct);
    Marshal.FreeHGlobal(lpstruct);

The code above works just like standard marshaling. However, do not forget that lpstruct is passed by value, as an integer. To copy the result back into the structure, we need another method:

    private object MarshalToStruct(IntPtr buf, Type t)
    {
        return Marshal.PtrToStructure(buf, t);
    }
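The copy-in, call, copy-out cycle built from these two helpers can be tried without any native library. In the sketch below FakeNativeCall is a managed stand-in for an unmanaged function that modifies the structure through the raw pointer; it, the StreamInfo type, and the ManualMarshal class are all hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical two-field structure standing in for a real API structure.
[StructLayout(LayoutKind.Sequential)]
public struct StreamInfo
{
    public int rate;
    public int scale;
}

public static class ManualMarshal
{
    // Copies a structure into freshly allocated unmanaged memory.
    public static IntPtr MarshalToPointer(object data)
    {
        IntPtr buf = Marshal.AllocHGlobal(Marshal.SizeOf(data));
        Marshal.StructureToPtr(data, buf, false);
        return buf;
    }

    // Copies unmanaged memory back into a managed structure.
    public static object MarshalToStruct(IntPtr buf, Type t)
    {
        return Marshal.PtrToStructure(buf, t);
    }

    // Stand-in for native code: doubles the first int (the rate field) in place.
    private static void FakeNativeCall(IntPtr pInfo)
    {
        int rate = Marshal.ReadInt32(pInfo, 0);
        Marshal.WriteInt32(pInfo, 0, rate * 2);
    }

    // Copy in, call, copy out, and always free the unmanaged block.
    public static StreamInfo CallWithCopy(StreamInfo info)
    {
        IntPtr lpstruct = MarshalToPointer(info);
        try
        {
            FakeNativeCall(lpstruct);
            return (StreamInfo)MarshalToStruct(lpstruct, typeof(StreamInfo));
        }
        finally
        {
            Marshal.FreeHGlobal(lpstruct);
        }
    }

    public static void Main()
    {
        StreamInfo result = CallWithCopy(new StreamInfo { rate = 15, scale = 1 });
        Console.WriteLine("rate = " + result.rate + ", scale = " + result.scale);
    }
}
```

Putting FreeHGlobal in a finally block is a design choice of this sketch: it keeps the "remember to free" rule from depending on the call succeeding.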

Now that we have implemented manual marshaling of a pointer to a structure, we need to get a pointer to a pointer to the structure. Fortunately, we do not need to write new code, because our structure-to-pointer function can convert any data type to an unmanaged pointer, including a pointer itself.

As an example, take the AVISaveOptions function, since it takes two pointers to pointers as parameters:

    [DllImport("avifil32.dll")]
    extern static int AVISaveOptions(
        IntPtr hWnd,
        int uiFlags,
        int noStreams,
        IntPtr ppavi,
        IntPtr ppOptions);

In fact, ppavi is a pointer to a handle (which in turn is a pointer), and ppOptions is a pointer to a pointer to a structure. To call this function, we need a structure:

 AVICOMPRESSOPTIONS opts = new AVICOMPRESSOPTIONS();

The definition of this structure can be found in the AVI documentation. In the next step, we need to get a marshaled pointer to the structure:

 IntPtr lpstruct = MarshalToPointer(opts);

... and then a pointer to a pointer:

 IntPtr lppstruct = MarshalToPointer(lpstruct);

... followed by a pointer to the handle:

 IntPtr lphandle = MarshalToPointer(pStream);

Now call the function:

    result = AVISaveOptions(
        m_hWnd,
        ICMF_CHOOSE_KEYFRAME | ICMF_CHOOSE_DATARATE,
        1,
        lphandle,
        lppstruct);

... where the other parameters are of no interest here; information about them can be found in the documentation.

After calling the function, all that remains is to transfer the data from the unmanaged buffer to the structure:

 opts = (AVICOMPRESSOPTIONS) MarshalToStruct(lpstruct, typeof(AVICOMPRESSOPTIONS));

Please note that you need to use the pointer to the structure itself, not the pointer to the pointer! Finally, we free the memory:

    Marshal.FreeHGlobal(lpstruct);
    Marshal.FreeHGlobal(lppstruct);
    Marshal.FreeHGlobal(lphandle);
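The whole pointer-to-pointer sequence can be rehearsed without avifil32.dll. The sketch below swaps in Marshal.WriteIntPtr and Marshal.ReadIntPtr for the second level of indirection instead of reusing MarshalToPointer, and replaces the API with a managed stand-in; Options, FakeNativeCall and PointerToPointerDemo are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical options structure; the real AVICOMPRESSOPTIONS has more fields.
[StructLayout(LayoutKind.Sequential)]
public struct Options
{
    public int quality;
    public int keyFrameEvery;
}

public static class PointerToPointerDemo
{
    // Stand-in for a native function that takes Options** and edits the structure.
    private static void FakeNativeCall(IntPtr ppOptions)
    {
        IntPtr pOptions = Marshal.ReadIntPtr(ppOptions);  // follow the first pointer
        Marshal.WriteInt32(pOptions, 0, 7500);            // set the quality field
    }

    public static Options Run()
    {
        IntPtr lpstruct = IntPtr.Zero;
        IntPtr lppstruct = IntPtr.Zero;
        try
        {
            // Level 1: unmanaged copy of the structure.
            lpstruct = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Options)));
            Marshal.StructureToPtr(new Options(), lpstruct, false);

            // Level 2: unmanaged cell that holds the address of that copy.
            lppstruct = Marshal.AllocHGlobal(IntPtr.Size);
            Marshal.WriteIntPtr(lppstruct, lpstruct);

            FakeNativeCall(lppstruct);

            // Copy back from the structure pointer, not from the pointer to the pointer.
            return (Options)Marshal.PtrToStructure(lpstruct, typeof(Options));
        }
        finally
        {
            // FreeHGlobal ignores IntPtr.Zero, so partially completed setups are handled too.
            Marshal.FreeHGlobal(lppstruct);
            Marshal.FreeHGlobal(lpstruct);
        }
    }

    public static void Main()
    {
        Options opts = Run();
        Console.WriteLine("quality = " + opts.quality);
    }
}
```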

All this may seem complicated. Using pointers to pointers is not simple, which is why C# requires that code working with pointers be marked unsafe.
On the other hand, the general principles are quite simple. When you pass something by reference, its contents are copied to unmanaged memory, and the address of the new memory location is passed to the function.
Normally standard marshaling does this work for you. However, if you need something beyond that, you can manage all the copying manually.

Source: https://habr.com/ru/post/114953/
