Generating P/Invoke signatures in C#: misusing the Interface Definition Language and OLE Automation type libraries

This is NOT another article about what P/Invoke is.

Suppose that in some hypothetical C# project you need a technology that .NET does not expose, and all you have is the Windows SDK 8.1, which contains only C/C++ header files. You would have to declare a bunch of types, check structure alignment, and write assorted wrappers. That is a lot of routine work, with plenty of room for mistakes. You could, of course, write a header-file parser... Conceptually that is simple and clear; the only catch is the number of man-hours it would take. So we discard that option and instead try to minimize the effort needed to interact with unmanaged code.

In addition, the resulting code will not depend on process bitness, will preserve strict typing, and will be covered by automated tests.


Interaction between managed and unmanaged code

As you know, .NET offers two main ways to interact with unmanaged code:

C++/CLI: you can write wrappers that turn unmanaged calls into managed methods and manually convert native structures, strings, and arrays into managed objects. This is undoubtedly the most flexible option, but its disadvantages outweigh that.
First, it is a lot of code, some of it unmanaged, and therefore a potential source of mistakes (only gods and liars write bug-free code).
Second, the resulting assemblies are tied to a specific architecture (x64, x86, etc.). If the rest of the project is AnyCPU, you have to build the wrappers for several platforms, ship all of them, and deploy the one that matches the configuration.
~~Third, it's C++, and we don't need that.~~
P/Invoke and COM: many Windows components are implemented with COM, and .NET generally works with this technology well enough. You can declare the necessary interfaces and structures manually. If a type library is available, you can instead import them automatically with the tlbimp utility.
Functions exported from DLLs can be called by declaring extern methods with the DllImport attribute. There is even a whole website where declarations of common WinAPI functions are posted.

Let us dwell on type libraries. Type libraries, as the name suggests, contain type information. They are produced by compiling IDL (Interface Definition Language), a language whose syntax is very similar to C. Type libraries are usually shipped either as separate .tlb files or embedded in the same DLL as the objects they describe. The tlbimp utility mentioned above takes a type library and generates a special interop assembly containing the necessary .NET declarations.
Since IDL syntax is so similar to C header declarations, the first thought is: why not generate a type library somehow and then import it into the .NET project? If you could copy all the necessary declarations from the header files into an IDL file almost verbatim, without converting every DWORD to uint by hand, that would be exactly what we need. But there are several problems: first, IDL does not support everything, and second, tlbimp does not import everything. In particular:

IDL cannot use function pointers.
IDL cannot declare bit fields.
tlbimp does not use unsafe code, so the vast majority of pointers come out as untyped IntPtr.
If a structure is passed to a method by reference, tlbimp declares the argument as ref. If that argument is actually meant to receive the address of an array, we are out of luck. You can, of course, pass element zero of a pinned array by ref, and it will even work, but it looks rather hacky. In any case, because of ref we cannot pass a null pointer if the argument happens to be optional.
tlbimp converts pointers to C-style null-terminated strings (LPWSTR and the like) to string. If a misbehaving COM object decides to write something into that memory, the application will go "quack".
tlbimp imports only interfaces and structures; functions exported from DLLs must be declared manually.
tlbimp generates an assembly, not source code, although that is not so critical.

All the tlbimp problems are easily solved: we will not use this utility but write our own. IDL is more complicated, and some shamanism will be required. Fair warning: the type library is only an intermediate step. We will therefore forget about compliance with any standards, good style, and so on, and keep everything in whatever form is most convenient for us.

IDL

I will not describe the language in detail, only briefly list the key IDL elements we will use. The full IDL description is on MSDN.

The main block in an IDL file is library. All types inside it are included in the type library. Types declared outside the library block are included only if something inside the library block references them. By convention, the library block should have a name and a unique identifier. There are a number of other attributes, but we need none of them.

```
[uuid(00000000-0000-0000-0000-000000000001)]
library Import
{
}
```

If you still need to force the inclusion of a type declared outside the block, you can write this inside library:

```
typedef MY_TYPE MY_TYPE;
```

The block contains the type declarations. We need struct, union, enum, interface, and module. The first three work exactly as in C, so we will not dwell on them. Only one peculiarity is worth noting. With this declaration:

```
typedef struct tagTEST { int i; } TEST;
```

the structure's name will be tagTEST, and TEST is an alias that will eventually be replaced by that name. Many header files declare structures with assorted ugly prefixes, so it is better to take measures to avoid a mess in the names. Otherwise, just as in C, IDL lets you create any number of aliases with typedef.

Interfaces are declared with an interface block, with the methods inside it:

```
[uuid(38BF1A5B-65EE-4C5C-9BC3-0D8BE47E8A1F)]
interface IXAudio2MasteringVoice : IXAudio2Voice
{
    HRESULT GetChannelMask(DWORD* pChannelmask);
};
```

This is fairly obvious. Of the attributes, only uuid, the interface identifier, matters in our case.

There is also the module block. It can hold, for example, functions exported from a DLL, or constants.

```
[dllname("kernel32.dll")]
module NativeMethods_kernel32
{
    const UINT DONT_RESOLVE_DLL_REFERENCES = 0x00000001;

    [entry("RtlMoveMemory")]
    void RtlMoveMemory(void *Destination, const void *Source, SIZE_T Length);
}
```

The dllname and entry attributes matter here: they specify where the function is loaded from. Instead of a name, entry can also hold the function's ordinal.
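Here is a rough illustration of where such a module ends up on the C# side. This is a sketch of what a converter could emit, not the tool's actual output. Functions go into a NativeMethods class and constants into NativeConstants. One assumption to flag: void* and SIZE_T are mapped to nint/nuint here only so the sketch compiles without unsafe code enabled. The approach described in this article emits real pointers instead.

```csharp
using System.Runtime.InteropServices;

// Constants from the module: "const UINT ..." becomes a C# const.
public static class NativeConstants
{
    public const uint DONT_RESOLVE_DLL_REFERENCES = 0x00000001;
}

// Functions from the module: [dllname] supplies the DLL, [entry] the EntryPoint.
public static class NativeMethods
{
    // Assumption for this sketch: void* -> nint and SIZE_T -> nuint,
    // so it compiles without AllowUnsafeBlocks.
    [DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory")]
    public static extern void RtlMoveMemory(nint Destination, nint Source, nuint Length);
}
```

If [entry] names an ordinal instead of a function name, DllImport accepts it in the form EntryPoint = "#n".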

Declarations in IDL

Let's list what needs to be taken from a header file:

Structures and unions, including those with bit fields
Enumerations
Declarations of functions imported from DLLs
Interfaces
Constants (macros declared with #define)
Function pointers
Type aliases declared via typedef (all those DWORDs, etc.)

Now we need to decide how to carry each of these over into IDL.

Structures and unions: copy as is, optionally removing unnecessary prefixes from the names.
Enumerations: same as structures.
Declarations of functions imported from DLLs: copy as is into a module block for the corresponding DLL. Obviously, each DLL needs at least one module block.
Constants (declared via #define): these do not transfer very well, because you have to add a type. For example, the constant from the module example above is declared in the header file as:
```
 #define DONT_RESOLVE_DLL_REFERENCES 0x00000001 
```
There is little choice here: macros naturally cannot get into a type library.
Another problem is values such as GUIDs declared with DEFINE_GUID. Strictly speaking, these are not constants at all but global variables, although they are usually used as constants. Unfortunately, nothing can be done here: GUIDs can still be declared as strings, but everything else has to be handled manually.
Type aliases declared via typedef (all those DWORDs, etc.): copy as is.
Interfaces: neither C nor C++ supports interfaces natively. Most header files therefore use conditional compilation to declare each interface in two ways: as a C++ class with __declspec(uuid(x)) in one form or another, and as a C structure with a list of function pointers. We are interested in the C++ declarations. They usually look like this:
```
MIDL_INTERFACE("0c733a30-2a1c-11ce-ade5-00aa0044773d")
ISequentialStream : public IUnknown
{
public:
    virtual /* [local] */ HRESULT STDMETHODCALLTYPE Read(
        /* [annotation] */
        _Out_writes_bytes_to_(cb, *pcbRead) void *pv,
        /* [annotation][in] */
        _In_ ULONG cb,
        /* [annotation] */
        _Out_opt_ ULONG *pcbRead) = 0;

    virtual /* [local] */ HRESULT STDMETHODCALLTYPE Write(
        /* [annotation] */
        _In_reads_bytes_(cb) const void *pv,
        /* [annotation][in] */
        _In_ ULONG cb,
        /* [annotation] */
        _Out_opt_ ULONG *pcbWritten) = 0;
};
```
Everything superfluous must be removed, so that the interface looks like this:
```
[uuid(0c733a30-2a1c-11ce-ade5-00aa0044773d)]
interface ISequentialStream : IUnknown
{
    HRESULT Read(void *pv, ULONG cb, ULONG *pcbRead);
    HRESULT Write(void const *pv, ULONG cb, ULONG *pcbWritten);
};
```
If you like, you can leave the comments untouched and move the SAL annotations into an [annotation(...)] attribute.
Yes, some manual work remains. The key point, and the main idea of this article, is that we do not touch function arguments or return values. Even though the declaration changes somewhat, we can be reasonably confident that it is correct, because all types and pointer indirection levels stay the same. If we forget to clean something up, it will not compile; if it compiles, the result is correct, because the "signatures" have not changed.
Function pointers: this is where the crutches begin. We declare an interface with a single method, and when converting the type library, we turn such interfaces into delegates. This way we still do not touch the arguments, and the rest of the code that uses this pointer compiles without errors.
For example, this:
```
 typedef LRESULT (CALLBACK* WNDPROC)(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam); 
```
would look like this:
```
[uuid(C17B0B13-6E49-4268-B699-2D083BAE88F9)]
interface WNDPROC : __Delegate
{
    LRESULT WNDPROC(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);
};
```
Here __Delegate is an empty interface that we declare ourselves, so that such "function pointers" can be told apart from ordinary interfaces. The uuid attribute holds a random value (so as not to conflict with anything); without it, the declaration simply does not compile. Of course, all function pointers could be replaced with void*, but this hack preserves strict typing. For example, the WNDPROC lpfnWndProc field of the WNDCLASSEX structure is also strongly typed in the type library. All we need is the type name and the pointer indirection level; the fact that the type is declared as an interface does not matter.
Bit fields: although they belong to structures, I have given them a separate item, because they too require some cunning. Each field has to carry its bit count somehow, and arrays are one way to do this. So that the converter can recognize such a field as a bit field, we also add some otherwise unnecessary attribute. For example, this:
```
struct DWRITE_LINE_BREAKPOINT
{
    UINT8 breakConditionBefore : 2;
    UINT8 breakConditionAfter : 2;
    UINT8 isWhitespace : 1;
    UINT8 isSoftHyphen : 1;
    UINT8 padding : 2;
};
```
is declared as follows:
```
typedef struct DWRITE_LINE_BREAKPOINT
{
    [replaceable] UINT8 breakConditionBefore[2];
    [replaceable] UINT8 breakConditionAfter[2];
    [replaceable] UINT8 isWhitespace[1];
    [replaceable] UINT8 isSoftHyphen[1];
    [replaceable] UINT8 padding[2];
} DWRITE_LINE_BREAKPOINT;
```
For simplicity, we also agree that a structure containing bit fields must not contain any ordinary fields. So a declaration like this:
```
 typedef struct TEST { int i1 : 1; int i2 : 31; float f1; } TEST; 
```
has to be converted to:
```
typedef struct TEST
{
    struct
    {
        int i1 : 1;
        int i2 : 31;
    };
    float f1;
} TEST;
```
Bit fields are quite rare, though. In principle one could skip supporting them altogether, replace them with the base type, and do the rest manually in C#:
```
 typedef struct TEST { int i; float f1; } TEST; 
```
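On the converter side, the [replaceable] array trick only requires turning each array length back into a bit width. A minimal sketch of that step follows. It assumes MSVC-style allocation (the first declared field occupies the lowest bits) and a single backing integer shared by all bit fields of the structure. BitFieldLayout, BitField, and EmitProperty are hypothetical names, not part of the actual tool.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical converter helper: each [replaceable] array length is a bit width.
// Assumes the first field occupies the lowest bits and all fields share one backing integer.
public sealed record BitField(string Name, int Width, int Shift, ulong Mask);

public static class BitFieldLayout
{
    public static (string BackingType, List<BitField> Fields) Compute(
        IReadOnlyList<(string Name, int Width)> widths)
    {
        var fields = new List<BitField>();
        int shift = 0;
        foreach (var (name, width) in widths)
        {
            if (width < 1 || width > 64)
                throw new ArgumentOutOfRangeException(nameof(widths), $"{name}: bad width {width}");
            ulong mask = width == 64 ? ulong.MaxValue : (1UL << width) - 1;
            fields.Add(new BitField(name, width, shift, mask));
            shift += width;
        }

        // The private backing field's type depends on the total number of bits.
        string backing = shift switch
        {
            <= 8 => "byte",
            <= 16 => "ushort",
            <= 32 => "uint",
            <= 64 => "ulong",
            _ => throw new NotSupportedException("more than 64 bits of bit fields")
        };
        return (backing, fields);
    }

    // Emits a property that reads/writes only this field's bits of __bit_field_value.
    // Everything is widened to ulong so the emitted expressions type-check for any backing type.
    public static string EmitProperty(string backing, BitField f) =>
        $"public {backing} {f.Name} {{ " +
        $"get {{ return ({backing})(((ulong)__bit_field_value >> {f.Shift}) & {f.Mask}UL); }} " +
        $"set {{ __bit_field_value = ({backing})(((ulong)__bit_field_value & ~({f.Mask}UL << {f.Shift})) " +
        $"| (((ulong)value & {f.Mask}UL) << {f.Shift})); }} }}";
}
```

For DWRITE_LINE_BREAKPOINT the widths 2, 2, 1, 1, 2 give shifts 0, 2, 4, 5, 6 and a byte backing field.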

The above should be enough to carry into IDL everything you might need when working with native libraries. C++ classes and templates are, of course, not covered. Still, about ninety-five percent of the Windows API header contents can be transferred this way. Despite a few dirty hacks, copying declarations into IDL is still easier, faster, and safer than writing C++/CLI wrappers or typing the declarations into .NET by hand.

Declarations in C#

Now let's look at what the result should look like in C#.

We will generate unsafe code: first, to get strictly typed pointers, and second, to avoid shuttling data back and forth with Marshal.PtrToStructure. This is not so much about squeezing out performance as about the fact that with proper pointers the code is simply simpler. Marshaling complex types any other way cannot be done concisely; it takes tons of code. I tried every option and spent a long time looking for a universal approach that avoids unsafe code. There isn't one. Refusing unsafe code only puts a spoke in your own wheels: the code does not become any safer, but new problems appear.

The difference is most visible when a structure passed to a function contains a pointer to another structure, to a string, or even a recursive reference. Suppose the unmanaged code then replaces one pointer with another, and those changes must be reflected in the original structure on the managed side. In that case even custom marshaling will not help much. Incidentally, the MarshalAs attribute is not needed and will not be used.

In addition, using the imported declarations will be as close as possible to C, which may make it easier to port existing code. Note right away that in C# you can take the address of a variable only if its type is blittable, and all our structures will meet this requirement. Array fields are declared as fixed, and strings are represented as char*/byte*. Since bool is not blittable, we represent it with a structure containing an int field plus implicit conversion operators to and from bool.

Arrays inside structures deserve a closer look, because there are restrictions. First, the fixed keyword applies only to arrays of primitive types, so arrays of structures cannot be declared that way. Second, only one-dimensional arrays are supported. Regular arrays (with the MarshalAs attribute and the SizeConst option) can contain structures, but they are not blittable and are also one-dimensional only. To solve this, we generate special structures for arrays, with one private field per element. Such a structure has an indexer for element access and implicit operators for copying from and to managed arrays. Pseudo-multidimensionality is provided by indexing with several indices. For example, a 4x4 matrix becomes a structure with 16 fields. Its indexer takes the address of the first element and computes the offset as index1 * length1 + index2, where length1 is 4 and both indices range from 0 to 3.
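The array structures described above can be sketched as follows for a float[4][4] field. FLOAT_4x4 is a hypothetical name. System.Runtime.CompilerServices.Unsafe.Add stands in for the pointer arithmetic so the sketch compiles without AllowUnsafeBlocks. The indexer still offsets from the first element by index1 * length1 + index2, as described.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Sketch of a generated array structure for a float[4][4] field (hypothetical name).
[StructLayout(LayoutKind.Sequential)]
public struct FLOAT_4x4
{
    private const int Length0 = 4, Length1 = 4;

#pragma warning disable CS0169 // _e1.._e15 are reached only through Unsafe.Add
    private float _e0, _e1, _e2, _e3, _e4, _e5, _e6, _e7,
                  _e8, _e9, _e10, _e11, _e12, _e13, _e14, _e15;
#pragma warning restore CS0169

    // Offset of element [index1, index2] from the first element, row-major.
    private static int Offset(int index1, int index2)
    {
        if ((uint)index1 >= Length0 || (uint)index2 >= Length1)
            throw new IndexOutOfRangeException();
        return index1 * Length1 + index2;
    }

    public float this[int index1, int index2]
    {
        get => Unsafe.Add(ref _e0, Offset(index1, index2));
        set => Unsafe.Add(ref _e0, Offset(index1, index2)) = value;
    }

    // Copies from a managed array that must hold exactly 16 elements.
    public static implicit operator FLOAT_4x4(float[] values)
    {
        if (values == null || values.Length != Length0 * Length1)
            throw new ArgumentException("expected 16 elements", nameof(values));
        var result = default(FLOAT_4x4);
        for (int i = 0; i < values.Length; i++)
            Unsafe.Add(ref result._e0, i) = values[i];
        return result;
    }

    // Copies to a new managed array in row-major order.
    public static implicit operator float[](FLOAT_4x4 matrix)
    {
        var result = new float[Length0 * Length1];
        for (int i = 0; i < result.Length; i++)
            result[i] = Unsafe.Add(ref matrix._e0, i);
        return result;
    }
}
```

Because all elements are separate fields of one primitive type, the structure stays blittable (16 floats, 64 bytes), unlike a MarshalAs array.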

Structures and unions: structures map to structures, nothing special. Unions use LayoutKind.Explicit with FieldOffset(0) on every field. Anonymous nested structures and unions deserve special mention: type libraries do not support them, so they are given generated names starting with __MIDL__.
The structure
```
 typedef struct TEST { struct { int i; }; } TEST; 
```
actually ends up as something like this:
```
typedef struct TEST
{
    struct __MIDL___MIDL_itf_Win32_0001_0001_0001
    {
        int i;
    } __MIDL____MIDL_itf_Win32_0001_00010000;
} TEST;
```
Accordingly, if imported into C # as is, we get the following:
```
[StructLayout(LayoutKind.Sequential)]
public unsafe struct TEST
{
    [StructLayout(LayoutKind.Sequential)]
    public unsafe struct __MIDL___MIDL_itf_Win32_0001_0001_0001
    {
        public int i;
    }

    public __MIDL___MIDL_itf_Win32_0001_0001_0001 __MIDL____MIDL_itf_Win32_0001_00010000;
}
```
In principle we could live with this, but in C the field i is accessed directly, as if it were a field of the outer structure (myVar.i). Here it would be the creepy myVar.__MIDL____MIDL_itf_Win32_0001_00010000.i. That won't do, so for such cases we generate properties that give direct access to the fields of nested unnamed structures:
```
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public unsafe struct TEST
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public unsafe struct __MIDL___MIDL_itf_Win32_0001_0001_0001
    {
        public int i;
    }

    public __MIDL___MIDL_itf_Win32_0001_0001_0001 __MIDL____MIDL_itf_Win32_0001_00010000;

    public int i
    {
        get { return __MIDL____MIDL_itf_Win32_0001_00010000.i; }
        set { __MIDL____MIDL_itf_Win32_0001_00010000.i = value; }
    }
}
```
This approach may not be flawless, but it gives the closest possible match to the original declarations and correctly handles structures such as this one:
```
typedef struct TEST
{
    union
    {
        struct { int i1; int i2; };
        struct { float f1; float f2; };
    };
    char c1;
} TEST;
```
Accessing fields through these properties lets you work with the structure almost exactly as in C. The only exception is when you need the address of a nested field; then you have to spell out the full path.
Enumerations: everything is simple, with only minor differences in syntax.
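For the union example under "Structures and unions", a hand-written sketch of the corresponding C# might look like this. The union becomes an explicit-layout struct with every member at offset 0. Pass-through properties let callers write t.i1 as in C. The short nested names (__Ints, __Floats, __Union, __union) are hypothetical stand-ins for the generated __MIDL_ names, and mapping C char to sbyte is an assumption.

```csharp
using System.Runtime.InteropServices;

// Sketch for:
//   typedef struct TEST { union { struct { int i1; int i2; };
//                                 struct { float f1; float f2; }; }; char c1; } TEST;
[StructLayout(LayoutKind.Sequential)]
public struct TEST
{
    [StructLayout(LayoutKind.Sequential)]
    public struct __Ints { public int i1; public int i2; }

    [StructLayout(LayoutKind.Sequential)]
    public struct __Floats { public float f1; public float f2; }

    // The anonymous union: every member starts at offset 0.
    [StructLayout(LayoutKind.Explicit)]
    public struct __Union
    {
        [FieldOffset(0)] public __Ints __ints;
        [FieldOffset(0)] public __Floats __floats;
    }

    public __Union __union;
    public sbyte c1; // C 'char' (assumed mapping)

    // Generated pass-through properties for the anonymous members.
    public int i1 { get => __union.__ints.i1; set => __union.__ints.i1 = value; }
    public int i2 { get => __union.__ints.i2; set => __union.__ints.i2 = value; }
    public float f1 { get => __union.__floats.f1; set => __union.__floats.f1 = value; }
    public float f2 { get => __union.__floats.f2; set => __union.__floats.f2 = value; }
}
```

Writing f1 and reading i1 reinterprets the same four bytes, just as in C. The marshaled layout matches the native one: the union occupies 8 bytes, c1 sits at offset 8, and the total size is 12.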

Bit fields: these become a private integer field (its type depends on the total size of the bit fields) plus generated properties that use bit operations to read or write only the corresponding bits:

```
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode, Pack = 1)]
public unsafe struct DWRITE_LINE_BREAKPOINT
{
    private byte __bit_field_value;

    // bits 0-1
    public byte breakConditionBefore
    {
        get { return (byte)(__bit_field_value & 3); }
        set { __bit_field_value = (byte)((__bit_field_value & ~3) | (value & 3)); }
    }

    // bits 2-3: clear the old bits, then OR in the new value
    public byte breakConditionAfter
    {
        get { return (byte)((__bit_field_value >> 2) & 3); }
        set { __bit_field_value = (byte)((__bit_field_value & ~(3 << 2)) | ((value & 3) << 2)); }
    }

    ...
}
```

Functions imported from a DLL: as usual, static extern methods with the DllImport attribute in the NativeMethods class.
Type aliases declared via typedef: unless MIDL happens to add extra attributes to them, aliases are replaced by the underlying type when the type library is compiled (see here). If they do end up in the library anyway, we substitute the type they stand for.
Constants: constants in the NativeConstants class, either strings or numbers.
Function pointers (represented as the special interfaces): we generate two types, a delegate and a structure that serves as the pointer itself. The structure has a single private field of type void*. Implicit operators convert it to and from the delegate by calling Marshal.GetFunctionPointerForDelegate and Marshal.GetDelegateForFunctionPointer.
Interfaces: it would seem simple. Declare an interface with the ComImport attribute and you're done, with plenty of extra functionality in the Marshal class.
But no, that works only for COM interfaces, and a function can easily return something that does not inherit from IUnknown, for example IXAudio2Voice. Here the standard .NET mechanisms will go "quack". That's not a problem; we have a cunning knight's move in reserve. We generate the virtual method tables ourselves and call through them using Marshal.GetFunctionPointerForDelegate and Marshal.GetDelegateForFunctionPointer. There is nothing special about it: interfaces are represented by structures that contain private structures holding a set of pointers. For each interface function, the main structure gets a method that calls the corresponding pointer via Marshal.GetDelegateForFunctionPointer. It also gets a set of implicit operators that support casts along the interface inheritance chain. A full example would take too much space here, so it can be found in the attached archive.
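Both mechanisms rest on one primitive: turning a delegate into a native function pointer and back. The sketch below shows two things. First, a pointer structure for WNDPROC. Second, a vtable-based wrapper for a hypothetical interface IExample (with a derived IDerivedExample) whose only method occupies slot 0. These are illustrative assumptions: handle types are simplified to IntPtr, an IntPtr field replaces the void* field so no unsafe code is needed, and all interface, method, and slot names are made up.

```csharp
using System;
using System.Runtime.InteropServices;

// Function pointer: delegate + pointer-sized struct (HWND/WPARAM/LPARAM/LRESULT simplified to IntPtr).
[UnmanagedFunctionPointer(CallingConvention.Winapi)]
public delegate IntPtr WNDPROC(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

[StructLayout(LayoutKind.Sequential)]
public struct WNDPROC_PTR
{
    private IntPtr _ptr; // stands in for the generator's void* field

    // The caller must keep the delegate alive while native code may still call the pointer.
    public static implicit operator WNDPROC_PTR(WNDPROC d) =>
        new WNDPROC_PTR { _ptr = d == null ? IntPtr.Zero : Marshal.GetFunctionPointerForDelegate(d) };

    public static implicit operator WNDPROC(WNDPROC_PTR p) =>
        p._ptr == IntPtr.Zero ? null : Marshal.GetDelegateForFunctionPointer<WNDPROC>(p._ptr);
}

// Interface: hypothetical IExample whose GetValue is assumed to be vtable slot 0.
// STDMETHOD-style methods are __stdcall and receive 'this' as the first argument.
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate int IExample_GetValue(IntPtr self, out uint value);

[StructLayout(LayoutKind.Sequential)]
public struct IExample
{
    private IntPtr _this; // native object; its first pointer-sized slot points at the vtable

    public IExample(IntPtr nativeObject) => _this = nativeObject;

    // Reads function pointer number 'slot' from the object's vtable.
    private IntPtr Slot(int slot) =>
        Marshal.ReadIntPtr(Marshal.ReadIntPtr(_this), slot * IntPtr.Size);

    public int GetValue(out uint value) =>
        Marshal.GetDelegateForFunctionPointer<IExample_GetValue>(Slot(0))(_this, out value);
}

[StructLayout(LayoutKind.Sequential)]
public struct IDerivedExample
{
    private IntPtr _this;

    public IDerivedExample(IntPtr nativeObject) => _this = nativeObject;

    // Base-interface methods occupy the first vtable slots,
    // so the same object pointer can be viewed as the base interface.
    public static implicit operator IExample(IDerivedExample d) => new IExample(d._this);
}
```

Because nothing here requires IUnknown, the same wrapper shape works for non-COM interfaces such as IXAudio2Voice. The real slot numbers would come from the type library's function descriptions.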

Conversion utility

That's it for the theory. Now for practice.

The midl compiler included in the Windows SDK converts IDL into a type library.

Our own utility converts the type library into C# code (and it also launches the compiler).

Let's start with the latter. The contents of a type library are read through the standard ITypeLib2 and ITypeInfo2 interfaces (documentation is here); the tlbimp utility uses them too. The converter's implementation is nothing remarkable, so there is not much more to say about it. The source code is in the attached archive (and yes, I know there are libraries for generating C# code, but it's easier without them).

Now about compiling IDL.

Copy the compiler files into a separate folder. First, they will have to be modified. Second, this removes the dependency on the Windows 8.1 SDK, so no absolute paths like C:\Program Files (x86)\blablabla have to be written anywhere.
The following files are needed:
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\1033\clui.dll
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\c1.dll
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\cl.exe
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\mspdb120.dll
C:\Program Files (x86)\Windows Kits\8.1\bin\x64\midl.exe
C:\Program Files (x86)\Windows Kits\8.1\bin\x64\midlc.exe
Put everything except clui.dll into one folder; clui.dll must go into a subfolder named 1033.

midl.exe starts another process, midlc.exe, which does all the actual work.

The compiler requires a file named oaidl.idl somewhere within reach, with the IUnknown interface declared in it. For easier configuration, we create our own copy of this file and copy into it the main declarations from the original oaidl.idl and the files it references. Alternatively, you can limit it to the IUnknown interface and add other declarations as they become needed. Place the resulting file next to the compiler.
Some system types also need slight corrections. For example, BOOL and BOOLEAN should be structures with a single field. That way they are not confused with int and byte, yet they can still be converted to bool, which, as mentioned above, is not a blittable type and therefore cannot be used directly. The base interface for the types that represent function pointers is declared in the same file as well.
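A minimal sketch of what such BOOL and BOOLEAN structures could look like on the C# side. It assumes the usual Win32 sizes (4 bytes for BOOL, 1 byte for BOOLEAN) and treats any non-zero value as true, as C does.

```csharp
using System.Runtime.InteropServices;

// Blittable stand-in for Win32 BOOL (a 4-byte int); any non-zero value is true.
[StructLayout(LayoutKind.Sequential)]
public struct BOOL
{
    private int _value;

    public static implicit operator bool(BOOL value) => value._value != 0;
    public static implicit operator BOOL(bool value) => new BOOL { _value = value ? 1 : 0 };
}

// Blittable stand-in for Win32 BOOLEAN (a 1-byte BYTE).
[StructLayout(LayoutKind.Sequential)]
public struct BOOLEAN
{
    private byte _value;

    public static implicit operator bool(BOOLEAN value) => value._value != 0;
    public static implicit operator BOOLEAN(bool value) => new BOOLEAN { _value = (byte)(value ? 1 : 0) };
}
```

Each type wraps a single blittable field, so structures containing them stay blittable, while code such as `if (someBool)` still reads naturally thanks to the implicit conversions.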

~~Fixing compiler bugs~~ Working around compiler limitations

The fly in the ointment turned out to be this: http://support.microsoft.com/default.aspx?scid=kb;en-us;220137. Microsoft presents it as a feature. On the one hand, that is logical: the main purpose of type libraries is OLE Automation, which implies support for case-insensitive languages. On the other hand, the implementation is strange, to put it mildly. Argument names have nothing to do with method or type names, so why use a single global list of strings instead of separate lists for type names, for method names within each type, and so on? Either way, this "by design" behavior does not suit us. It turns the names into a monstrous mess, and it also causes problems for automated testing (see below), which requires names to match those in the source files exactly.

Even the sloppiest programmers rarely write case-insensitive string comparison from scratch, so an API function is very likely involved.

Armed with a debugger, we can confirm the behavior described in KB220137 in practice:

The compiler keeps a global dictionary of name strings. If the string "msg" appears in the file (say, as an argument of some function), it is added to the dictionary. If "Msg" appears later in the source (say, as a structure name), the compiler checks whether it is already in the dictionary using CompareStringA with the NORM_IGNORECASE flag. The check reports the strings as equal, and the text "Msg" is discarded. The compiler then writes "msg" to the type library in both places (the argument name and the structure name), even though the two are unrelated. Whether this logic runs depends on the value of a global variable.
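To make the effect concrete, here is a toy model of such a global, case-insensitive name table. It is a simulation, not midl's code. StringComparer.OrdinalIgnoreCase stands in for CompareStringA with NORM_IGNORECASE, and CaseInsensitiveNameTable is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the behavior described in KB220137: one global name table,
// compared case-insensitively, where the first spelling seen wins.
public sealed class CaseInsensitiveNameTable
{
    private readonly Dictionary<string, string> _names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Returns the spelling that would end up in the type library.
    public string Intern(string name)
    {
        if (_names.TryGetValue(name, out var existing))
            return existing;   // a later "Msg" collapses to the earlier "msg"
        _names.Add(name, name);
        return name;
    }
}
```

Switching the comparer to StringComparer.Ordinal, roughly the managed counterpart of calling CompareStringA without NORM_IGNORECASE, keeps "msg" and "Msg" as separate entries.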

In addition, the type library file is created with COM objects from oleaut32.dll (ICreateTypeLib, ICreateTypeInfo, etc.), which also use CompareStringA to check for duplicate names. For example, ICreateTypeInfo::SetVarName returns TYPE_E_AMBIGUOUSNAME in this situation.

The solution is to intercept CompareStringA and clear the NORM_IGNORECASE flag in dwCmpFlags.

midlc.exe imports CompareStringA from kernel32.dll, which in turn calls CompareStringA from kernelbase.dll, while oleaut32.dll imports CompareStringA from kernelbase.dll directly.

The interception itself is done with a hooking library, MinHook: http://www.codeproject.com/Articles/44326/MinHook-The-Minimalistic-x86-x64-API-Hooking-Libra

The hook lives in a DLL that is loaded into midlc.exe, and it is installed in DllMain.

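To make midlc.exe load that DLL, it is added to the executable's import table with CFF Explorer (http://www.ntcore.com/exsuite.php): open the exe, add the DLL on the Import Adder page, and click Rebuild Import Table.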

build-event- T4. . C# . IDL . T4 IDL midl-, T4. , , . - IDL

```
/* <#@ include file="..\InternalTools\TransformIDL.tt" #> */
```

Then set the IDL file's Custom Tool property to TextTemplatingFileGenerator.

– . C# T4- . T4 , , .

, .idl .

T4 ( ~64), “Compiling transformation: An expression is too long or complex to compile ”. :

```
// <# #>
```

Settings

, IDL . IUnknown. , . IDL .

Testing

, 32 64 .

. 99% 4 . int .

native . CLI (32 64). managed- . :

```cpp
#define STRUCT_SIZES \
{\
    { L"ARRAYDESC", sizeof(::ARRAYDESC) },\
    { L"BLOB", sizeof(::BLOB) },\
    { NULL, 0 }\
}

#define STRUCT_OFFSETS \
{\
    { L"ARRAYDESC.tdescElem", FIELD_OFFSET(::ARRAYDESC, tdescElem) },\
    { L"ARRAYDESC.tdescElem.lptdesc", FIELD_OFFSET(::ARRAYDESC, tdescElem.lptdesc) },\
    { NULL, 0 }\
}
```

```cpp
STRUCT_SIZE structSizes[] = STRUCT_SIZES;
STRUCT_OFFSET structOffsets[] = STRUCT_OFFSETS;
```

!

Dictionary<string, int>. . – ' ' , – .

32 64 , . C#. managed , Marshal.SizeOf Marshal.OffsetOf.
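The managed side of this layout test could look roughly like the sketch below. The native build produces name/value tables such as STRUCT_SIZES and STRUCT_OFFSETS. Each size entry is compared against Marshal.SizeOf. Each offset entry is compared against the sum of Marshal.OffsetOf values along its dotted path. LayoutChecker, the resolveType callback, and the INNER/OUTER sample types are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical sample types for the usage example.
[StructLayout(LayoutKind.Sequential)]
public struct INNER { public short a; public int b; }

[StructLayout(LayoutKind.Sequential)]
public struct OUTER { public byte tag; public INNER inner; }

// Compares native sizes/offsets with what the marshaler computes for the C# declarations.
public static class LayoutChecker
{
    public static List<string> Check(
        IReadOnlyDictionary<string, int> nativeSizes,
        IReadOnlyDictionary<string, int> nativeOffsets,
        Func<string, Type> resolveType)
    {
        var errors = new List<string>();

        foreach (var (name, expected) in nativeSizes)
        {
            int actual = Marshal.SizeOf(resolveType(name));
            if (actual != expected) errors.Add($"{name}: size {actual}, expected {expected}");
        }

        foreach (var (path, expected) in nativeOffsets)
        {
            // "TYPE.field.subfield": the offset is the sum of the offsets along the path.
            string[] parts = path.Split('.');
            Type type = resolveType(parts[0]);
            long actual = 0;
            for (int i = 1; i < parts.Length; i++)
            {
                FieldInfo field = type.GetField(parts[i],
                    BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
                    ?? throw new MissingFieldException(type.Name, parts[i]);
                actual += Marshal.OffsetOf(type, parts[i]).ToInt64();
                type = field.FieldType;
            }
            if (actual != expected) errors.Add($"{path}: offset {actual}, expected {expected}");
        }

        return errors;
    }
}
```

Running the same check in both a 32-bit and a 64-bit process catches padding differences that only show up on one platform.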

dll LoadLibrary GetProcAddress. , , IDL.

. #include , .

– VisualStudio 32- 64- . . , .

- . FieldOffset ( ), . Here is an example:

```
typedef struct SOCKET_ADDRESS_LIST {
    INT iAddressCount;
    SOCKET_ADDRESS Address[1];
} SOCKET_ADDRESS_LIST;
```

On x64 the Address array is at offset 8, i.e. 4 bytes of padding are required after the iAddressCount field. On x86 there is no padding. The .NET counterpart, however, will be aligned to 4 bytes on both platforms. The cunning knight's move is as follows:

```
typedef struct SOCKET_ADDRESS_LIST {
    union {
        INT iAddressCount;
        [hidden] void* ___padding000;
    };
    SOCKET_ADDRESS Address[1];
} SOCKET_ADDRESS_LIST;
```

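The union now contains a pointer, which .NET aligns to 4 bytes in a 32-bit process and to 8 bytes in a 64-bit one. As a result, the "extra" 4 bytes appear only in the 64-bit build.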
( #pragma pack(2) x86 #pragma pack(16) 64) — 99% 1 , .

Some structures, such as WSADATA, are declared differently for x86 and x64.

That's all.
, midl . VisualStudio ( 64- ).

Source: https://habr.com/ru/post/202282/
