
Generating P/Invoke signatures in C#: misusing Interface Definition Language and OLE Automation type libraries

This is NOT another article about what P/Invoke is.

So, suppose that in some hypothetical C# project you need to use a technology that is not available in .NET, and all you have is the Windows SDK 8.1, which contains only a set of C/C++ header files. You will have to declare a bunch of types, verify the alignment of structures, and write various wrappers. That is a large amount of routine work with a real risk of making mistakes. You could, of course, write a parser for header files... Everything about that option is simple and clear except the number of man-hours it requires. So we discard it and try to minimize the work needed to interact with unmanaged code some other way.

In addition, the resulting code will not depend on the bitness of the process, strong typing will be preserved, and automated testing will be applied.


Interaction between managed and unmanaged code


As is well known, .NET offers two main ways to interact with unmanaged code:
  1. C++/CLI: you can write a wrapper that turns unmanaged calls into managed methods and manually converts native structures, strings and arrays into managed objects. This is undoubtedly the most flexible approach, but it has several drawbacks.
    First, it means a lot of code, some of it unmanaged, and therefore a real risk of making mistakes (only gods and liars write without bugs).
    Second, the resulting assemblies are tied to an architecture (x64, x86, etc.), so if the whole project is AnyCPU, the wrappers have to be built for several platforms, shipped together, and the right one unpacked at run time.
    Third, it is C++, and we would rather do without it.
  2. P/Invoke and COM: many Windows components are implemented with COM, and .NET works acceptably with this technology. The necessary interfaces and structures can either be declared by hand or, if a type library is available, imported from it automatically with the tlbimp utility.
    Functions exported from dynamic libraries can be called by declaring extern methods with the DllImport attribute. There is even a whole site where declarations for the main WinAPI functions are posted.

Let us dwell on type libraries. Type libraries, as the name suggests, contain information about types and are produced by compiling IDL (Interface Definition Language), a language whose syntax is very similar to C. Type libraries are usually shipped either as separate files with the .tlb extension or embedded in the same DLL as the objects they describe. The tlbimp utility mentioned above generates from a type library a special interop assembly containing the necessary declarations for .NET.
Since IDL syntax resembles the declarations in C header files, the first thought that comes to mind is: why not generate a type library somehow and then import it into a .NET project? If all the necessary declarations can be copied from the header files into an IDL file almost as they are, without worrying about converting every DWORD to uint, that is exactly what we need. But there are a number of problems: first, IDL does not support everything, and second, tlbimp does not import everything.


All the problems with tlbimp are easily solved: we will not use this utility and will write our own instead. With IDL things are more complicated, and some hacks are required. A warning up front: since the type library is only an intermediate step, we will forget about compatibility with any standards, good manners and so on, and keep everything in whatever form is most convenient for us.

IDL


I will not describe the language in detail and will only briefly list the key elements of IDL that will be used. A full description of IDL is available on MSDN.

The main block in an IDL file is library. All types declared inside it are included in the library. Types declared outside the library block are included only if something inside the library block references them. Properly, a library block should have a name and a unique identifier. There are a number of other attributes, but we do not need any of them.
[uuid(00000000-0000-0000-0000-000000000001)] library Import { } 
If you still need to force the inclusion of a type declared outside the block, you can write the following inside the library block:
 typedef MY_TYPE MY_TYPE; 

Type declarations go inside the block. We need struct, union, enum, interface and module. The first three are exactly the same as in C, so we will not dwell on them. Only one peculiarity is worth noting: with this declaration:
 typedef struct tagTEST { int i; } TEST; 
the structure name will be tagTEST, and TEST is an alias that will eventually be replaced by that name. Since many header files decorate structure declarations with various nasty prefixes, it is better to take some measures to avoid a mess in the names. In general, though, in IDL, just as in C, you can create any number of aliases with typedef.

Interfaces are declared with the interface block. Functions are declared inside it:
 [uuid(38BF1A5B-65EE-4C5C-9BC3-0D8BE47E8A1F)]
 interface IXAudio2MasteringVoice : IXAudio2Voice
 {
     HRESULT GetChannelMask(DWORD* pChannelmask);
 };
This is fairly self-explanatory. Of the attributes, only uuid matters in our case; it is the interface identifier.

There is also the module block. It can hold, for example, functions from a DLL or some constants.
 [dllname("kernel32.dll")]
 module NativeMethods_kernel32
 {
     const UINT DONT_RESOLVE_DLL_REFERENCES = 0x00000001;

     [entry("RtlMoveMemory")]
     void RtlMoveMemory(void *Destination, const void *Source, SIZE_T Length);
 }
The dllname and entry attributes are the important ones here; they specify where the function is loaded from. The entry value can be the function's ordinal instead of its name.

Declarations in IDL


Make a list of what should be taken from the header file:

Now we need to decide how to copy all of this into IDL.

The above should be enough to carry over to IDL everything you might need when working with native libraries. Of course, C++ classes and templates are not covered, but in any case ninety-five percent of the contents of the Windows API header files can be transferred this way. Despite a few dirty hacks, copying to IDL is still easier, faster and safer than writing wrappers in C++/CLI or manually declaring types in .NET.

Declarations in C#


Now consider how this should look in C#.

We will generate unsafe code: first, for strictly typed pointers, and second, to avoid shuttling data back and forth with Marshal.PtrToStructure and the like. This is not so much about chasing performance as about the fact that with proper pointers the code is simply simpler. Marshalling of complex types cannot be expressed concisely any other way; it takes tons of code. I tried every option and spent a very long time looking for a universal way to avoid unsafe code. There is none, and giving up unsafe only puts a spoke in your own wheel: the code does not become any safer, but the problems multiply.

The difference is most visible when you need to pass a function a structure that contains a pointer to another structure, or to a string, or a recursive reference. And if unmanaged code then replaces one pointer with another and those changes must be reflected in the original structure in managed code, even custom marshaling will not help much. Incidentally, the MarshalAs attribute is not needed and will not be used.

In addition, using the imported declarations will be as close as possible to how they are used in C, which should make it easier to port code that has already been written. Note that in C#, to take the address of a variable, its type must be blittable. All our structures will meet this requirement. Fields holding arrays are declared as fixed, strings are represented as char* / byte*, and since bool is not blittable, it is represented by a structure with a single int field and implicit operators for conversion to and from bool.

Arrays inside structures deserve a closer look. There are restrictions: first, the fixed keyword applies only to arrays of primitive types, so arrays of structures cannot be declared this way; second, only one-dimensional arrays are supported. Regular arrays (with the MarshalAs attribute and the SizeConst option) can contain structures, but they are not blittable, and they too can only be one-dimensional. To solve this, for each array we create a special structure with as many private fields as there are elements. Such a structure has an indexer for accessing the elements, as well as implicit operators for copying from and to managed arrays. Pseudo-multidimensionality is provided by indexing with several indices. That is, a 4x4 matrix is a structure with 16 fields, and the indexer takes the address of the first element and computes the offset as index1 * length1 + index2, where length1 is 4 and both indices run from 0 to 3.
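The offset formula for such a pseudo-multidimensional structure can be illustrated with a small sketch. It is written in C rather than C# only to keep it short and self-checking; the generated C# structure would expose the same arithmetic through an indexer. The names Matrix4x4, matrix_flat_index, matrix_get and matrix_set are hypothetical and are not taken from the converter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 4x4 matrix stored as 16 separate fields, mirroring the
   "one field per element" structure described in the text. */
typedef struct Matrix4x4 {
    float e00, e01, e02, e03;
    float e10, e11, e12, e13;
    float e20, e21, e22, e23;
    float e30, e31, e32, e33;
} Matrix4x4;

enum { MATRIX_LENGTH1 = 4 };  /* "length1" in the formula */

/* The sketch assumes the 16 fields are laid out without padding. */
_Static_assert(sizeof(Matrix4x4) == 16 * sizeof(float),
               "fields must be contiguous");

/* Flat element number for [index1][index2]: index1 * length1 + index2. */
static size_t matrix_flat_index(size_t index1, size_t index2)
{
    assert(index1 < MATRIX_LENGTH1 && index2 < MATRIX_LENGTH1);
    return index1 * MATRIX_LENGTH1 + index2;
}

/* Reads an element by byte offset from the start of the structure. */
static float matrix_get(const Matrix4x4 *m, size_t index1, size_t index2)
{
    float value;
    const unsigned char *base = (const unsigned char *)m;
    memcpy(&value,
           base + matrix_flat_index(index1, index2) * sizeof(float),
           sizeof value);
    return value;
}

/* Writes an element by byte offset from the start of the structure. */
static void matrix_set(Matrix4x4 *m, size_t index1, size_t index2, float value)
{
    unsigned char *base = (unsigned char *)m;
    memcpy(base + matrix_flat_index(index1, index2) * sizeof(float),
           &value, sizeof value);
}
```

With this layout, writing element [2][3] lands in the field e23, which is flat element 2 * 4 + 3 = 11.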


Conversion utility


That is all for the theory. Let us move on to practice.

Converting IDL into a type library is handled by the midl compiler included in the Windows SDK.

Converting the type library into C# code is handled by a custom utility (which also launches the compiler).

I'll start with the second. The contents of a type library are read through the standard ITypeLib2 and ITypeInfo2 interfaces, which are documented on MSDN; the tlbimp utility uses them as well. There is nothing interesting in the implementation of the converter, so there is not much more to say about it. The source code is in the attached archive (and yes, I know there are libraries for generating C# code, but it is simpler without them).

Now about compiling IDL.

Copy the compiler files into a separate folder: first, because they will have to be modified, and second, to avoid depending on an installed Windows SDK 8.1 and hard-coding absolute paths like C:\Program Files (x86)\blablabla anywhere.
The following files are needed:
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\1033\clui.dll
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\c1.dll
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\cl.exe
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\mspdb120.dll
C:\Program Files (x86)\Windows Kits\8.1\bin\x64\midl.exe
C:\Program Files (x86)\Windows Kits\8.1\bin\x64\midlc.exe
Put everything except clui.dll into one folder; clui.dll must be placed in a subfolder named 1033.

The midl.exe process launches another process, midlc.exe, which does all the work.

The compiler requires a file named oaidl.idl, with the IUnknown interface declared in it, to be somewhere within reach. For ease of configuration, we create our own copy of this file and copy into it the main declarations from the original oaidl.idl and the files it references. Alternatively, you can limit it to the IUnknown interface alone and add other declarations as they are needed. Place the resulting file next to the compiler.
Some system types have to be adjusted slightly. For example, BOOL and BOOLEAN are needed as structures with a single field, so that we do not have to juggle int and byte yet can still cast such a structure to bool (which, as mentioned above, is not blittable and therefore cannot be used directly). The base interface for types that denote pointers to functions must be declared there as well.

Compiler bug fixes, or rather, bypassing compiler limitations


The following feature turned out to be the fly in the ointment: http://support.microsoft.com/default.aspx?scid=kb;en-us;220137 . Microsoft presents it as a feature. On the one hand this is logical: the main purpose of type libraries is OLE Automation, which implies support for case-insensitive languages. On the other hand, the implementation is strange, to put it mildly: argument names have nothing to do with method or type names, so why use one global list of strings instead of separate lists for type names, for method names within each type, and so on? In any case, this “by design” behavior does not suit us, because the result is a monstrous mess in the names, and it also causes problems for automatic testing (see below), which requires the names to match those in the source files exactly.

Even the sloppiest programmers rarely write case-insensitive string comparison from scratch, so an API function is most likely being used.

Armed with a debugger, we observe practical confirmation of the behavior described in KB220137:

Inside the compiler there is a global dictionary to which name strings are added. If the string “msg” is encountered in the file (for example, as an argument name of some function), it is added to the dictionary. If the string “Msg” appears later in the source file (for example, as a structure name), the dictionary is checked for it using CompareStringA with the NORM_IGNORECASE flag. The check reports that the strings are equal, the spelling “Msg” is discarded, and the compiler writes “msg” to the type library in both places (the argument name and the structure name), although the two are in no way related. Whether this logic is executed depends on the value of a global variable.
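The effect of a single global name dictionary with case-insensitive lookup can be reproduced with a toy model. This is only a sketch of the observed behavior, not midl's actual code: NameTable, intern_name and the IGNORE_CASE flag (standing in for NORM_IGNORECASE) are hypothetical names, and the comparison is a plain ASCII one rather than CompareStringA.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define IGNORE_CASE 0x1u   /* hypothetical stand-in for NORM_IGNORECASE */
#define MAX_NAMES   64

/* One global list of names shared by arguments, types, methods, etc. */
typedef struct NameTable {
    const char *names[MAX_NAMES];
    size_t      count;
    unsigned    compare_flags;
} NameTable;

/* Returns 1 if the two names are equal under the given flags. */
static int names_equal(const char *a, const char *b, unsigned flags)
{
    for (; *a && *b; ++a, ++b) {
        int ca = (unsigned char)*a;
        int cb = (unsigned char)*b;
        if (flags & IGNORE_CASE) {
            ca = tolower(ca);
            cb = tolower(cb);
        }
        if (ca != cb)
            return 0;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Returns the spelling that ends up in the output: the first one seen
   among all names that the comparison considers equal. */
static const char *intern_name(NameTable *table, const char *name)
{
    for (size_t i = 0; i < table->count; ++i) {
        if (names_equal(table->names[i], name, table->compare_flags))
            return table->names[i];
    }
    assert(table->count < MAX_NAMES);
    table->names[table->count++] = name;
    return name;
}
```

With IGNORE_CASE set, interning “msg” and then “Msg” yields “msg” both times, which is the mix-up described above. With the flag cleared, each spelling is kept as written.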

In addition, the type library file is created through COM objects from oleaut32.dll (ICreateTypeLib, ICreateTypeInfo, etc.), which also use CompareStringA to check for duplicate names. For example, ICreateTypeInfo::SetVarName can return TYPE_E_AMBIGUOUSNAME.

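The fix is to intercept CompareStringA and clear the NORM_IGNORECASE flag in its dwCmpFlags argument.

Midlc.exe calls CompareStringA from kernel32.dll, which in turn calls CompareStringA in kernelbase.dll, and oleaut32.dll calls CompareStringA in kernelbase.dll as well.

The interception is done with a ready-made hooking library, MinHook: http://www.codeproject.com/Articles/44326/MinHook-The-Minimalistic-x86-x64-API-Hooking-Libra

The hook is placed in a DLL that is loaded into midlc.exe, and it is installed in DllMain.

To make midlc.exe load this DLL, CFF Explorer can be used: http://www.ntcore.com/exsuite.php . Open the exe in CFF Explorer, add the DLL through Import Adder, and apply Rebuild Import Table.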


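The conversion is launched through T4 rather than a build event. The T4 template takes the IDL file, runs it through midl, and produces the C# code. To enable this, the following line is added to the IDL file:
 /* <#@ include file="..\InternalTools\TransformIDL.tt" #> */
and the Custom Tool of the IDL file is set to TextTemplatingFileGenerator.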

– . C# T4- . T4 , , .

, .idl .

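When a text block in a T4 template gets too large (~64), the transformation fails with the error “Compiling transformation: An expression is too long or complex to compile”. The workaround is to break the text up by inserting an empty control block inside a comment: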
 // <# #> 


Settings


, IDL . IUnknown. , . IDL .

Testing


:

, 32 64 .

. 99% 4 . int .

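The native values come from a C++/CLI assembly (built for 32 and 64 bits) that hands them over to managed code. It contains macros like these: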
 #define STRUCT_SIZES \
 {\
     { L"ARRAYDESC", sizeof(::ARRAYDESC) },\
     { L"BLOB", sizeof(::BLOB) },\
     { NULL, 0 }\
 }

 #define STRUCT_OFFSETS \
 {\
     { L"ARRAYDESC.tdescElem", FIELD_OFFSET(::ARRAYDESC, tdescElem) },\
     { L"ARRAYDESC.tdescElem.lptdesc", FIELD_OFFSET(::ARRAYDESC, tdescElem.lptdesc) },\
     { NULL, 0 }\
 }
They are then used to initialize the tables:
 STRUCT_SIZE structSizes[] = STRUCT_SIZES;
 STRUCT_OFFSET structOffsets[] = STRUCT_OFFSETS;

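On the managed side these tables end up in a Dictionary<string, int>: the key is the name of a structure or of a field within it, and the value is the corresponding size or offset.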

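The tests are run in both 32-bit and 64-bit processes. For the managed declarations in C#, the sizes and offsets are obtained with Marshal.SizeOf and Marshal.OffsetOf.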
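The native half of such a layout test can be sketched in plain C as a name-to-value table built from sizeof and offsetof. This is a minimal illustration under assumptions, not the code from the archive: SAMPLE_HEADER stands in for a real SDK structure, and LAYOUT_ENTRY, SIZE_ENTRY, OFFSET_ENTRY and layout_lookup are hypothetical names. The standard offsetof macro is used in place of FIELD_OFFSET so that the sketch compiles without the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure standing in for a type from an SDK header. */
typedef struct SAMPLE_HEADER {
    char  tag;
    void *data;
    int   length;
} SAMPLE_HEADER;

typedef struct LAYOUT_ENTRY {
    const char *name;   /* "Type" or "Type.field"; NULL ends the table */
    size_t      value;  /* result of sizeof or offsetof */
} LAYOUT_ENTRY;

/* Build table rows from the type and field names themselves, so the
   keys always match the spelling used in the declaration. */
#define SIZE_ENTRY(type)          { #type, sizeof(type) }
#define OFFSET_ENTRY(type, field) { #type "." #field, offsetof(type, field) }

static const LAYOUT_ENTRY native_layout[] = {
    SIZE_ENTRY(SAMPLE_HEADER),
    OFFSET_ENTRY(SAMPLE_HEADER, tag),
    OFFSET_ENTRY(SAMPLE_HEADER, data),
    OFFSET_ENTRY(SAMPLE_HEADER, length),
    { NULL, 0 }
};

/* Looks a name up; returns 1 and stores the value on success, else 0. */
static int layout_lookup(const LAYOUT_ENTRY *table, const char *name,
                         size_t *value)
{
    assert(table != NULL && name != NULL && value != NULL);
    for (; table->name != NULL; ++table) {
        if (strcmp(table->name, name) == 0) {
            *value = table->value;
            return 1;
        }
    }
    return 0;
}
```

A managed test would then compare each value from such a table with what Marshal.SizeOf and Marshal.OffsetOf report for the generated declaration of the same name, which is why the names have to match exactly.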

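Functions imported from DLLs are checked with LoadLibrary and GetProcAddress, which confirms that the DLL and entry point named in the IDL actually exist.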

. #include , .

– VisualStudio 32- 64- . . , .

- . FieldOffset ( ), . Here is an example:
 typedef struct SOCKET_ADDRESS_LIST
 {
     INT iAddressCount;
     SOCKET_ADDRESS Address[1];
 } SOCKET_ADDRESS_LIST;
On x64 the Address array has an offset of 8, i.e. 4 bytes of padding are required after the iAddressCount field. On x86 there should be none. The .NET equivalent would be aligned to 4 bytes on both platforms. The trick is as follows:
 typedef struct SOCKET_ADDRESS_LIST
 {
     union
     {
         INT iAddressCount;
         [hidden] void* ___padding000;
     };
     SOCKET_ADDRESS Address[1];
 } SOCKET_ADDRESS_LIST;
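Since a pointer in .NET is 4 bytes in a 32-bit process and 8 bytes in a 64-bit one, the union “adds” the missing 4 bytes only on 64-bit.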
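The equivalence between the natural C layout and the union trick can be checked with a short sketch. The types here are hypothetical stand-ins (ADDRESS_ENTRY for SOCKET_ADDRESS, NATURAL_LIST and PADDED_LIST for the two variants of the list structure), and the sketch assumes a mainstream ABI where a pointer is aligned to its own size and is at least as large as int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SOCKET_ADDRESS: a pointer plus a length. */
typedef struct ADDRESS_ENTRY {
    void *address;
    int   length;
} ADDRESS_ENTRY;

/* Layout as the C compiler sees the original declaration: the padding
   after "count" is inserted implicitly to align the pointer inside. */
typedef struct NATURAL_LIST {
    int           count;
    ADDRESS_ENTRY entries[1];
} NATURAL_LIST;

/* Layout with the explicit union: the first slot is widened to the
   size of a pointer, so no implicit padding is needed. */
typedef struct PADDED_LIST {
    union {
        int   count;
        void *padding;   /* never read; only widens the slot */
    };
    ADDRESS_ENTRY entries[1];
} PADDED_LIST;

/* Asserts that both declarations produce the same layout. */
static void verify_layouts(void)
{
    assert(offsetof(NATURAL_LIST, entries) == offsetof(PADDED_LIST, entries));
    assert(sizeof(NATURAL_LIST) == sizeof(PADDED_LIST));
}
```

On such an ABI the entries array starts at an offset equal to the pointer size in both variants, so the declaration with the union describes the same memory layout while making the padding explicit for a consumer that would not add it on its own.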
( #pragma pack(2) x86 #pragma pack(16) 64) — 99% 1 , .

Some structures are declared differently for x86 and x64, WSADATA for example.

That's all.
, midl . VisualStudio ( 64- ).

Source: https://habr.com/ru/post/202282/

