
.NET in an unmanaged environment: platform invoke or what LPTSTR is

The approach is the same as before: a minimum of explanation, a maximum of recipes. For a deeper understanding of what is going on, I recommend the MSDN documentation; that section has already been translated into Russian.

Strings and enums



We will learn from concrete examples. Naturally, as is customary, we begin with the simplest application: hello, world! To display this text we will import the MessageBox function from WinAPI and use it to take a close look at strings and encodings.

So, who said MessageBox.Show? Get up and leave the room. Come back with clever advice when we get to marshalling arrays of structures. So that nobody else is tempted, we will work in a console project, without referencing Windows.Forms.
So, example one, a trivial one. We create a static class, declare the function with extern, say which function we want to import, and from there everything is exactly as in my previous article: we describe how each parameter is marshalled with the appropriate attributes.

public static class PInvoke
{
    [DllImport("User32.dll", EntryPoint = "MessageBox", CharSet = CharSet.Auto)]
    public extern static int MsgBox(
        IntPtr hwnd, // HWND is pointer-sized, so IntPtr rather than int
        [MarshalAs(UnmanagedType.LPTStr)] string text,
        [MarshalAs(UnmanagedType.LPTStr)] string caption,
        [MarshalAs(UnmanagedType.U4)] uint type);
}


Note the EntryPoint and CharSet settings. EntryPoint is the exported name of the function we want to call, and CharSet is the encoding to use. If you are even slightly familiar with WinAPI, you know that most functions that deal with strings come in two versions, ANSI and Unicode. The former accept and return ANSI strings, the latter Unicode strings. They differ by suffix: for example, the MessageBox version that works with Unicode strings (which, by the way, is the one we are calling) is exported from the library as MessageBoxW. The .NET marshaller is smart enough to know about this WinAPI convention (it would be strange if it did not), so when it processes the DllImport attribute it appends the suffix automatically, depending on the encoding.

However, we can specify the encoding used directly. For example:
[DllImport("User32.dll", EntryPoint = "MessageBoxA", CharSet = CharSet.Ansi)]


Here we are working with the ANSI encoding. However, if you bring up a message box declared this way, you will see that only the first character of each string you passed has survived. What is going on?

The problem lies in how the strings are marshalled, or more precisely in the LPTStr type. Let us look at it in more detail.

The thing is that C++, unlike C#, where His Majesty Unicode reigns supreme, also has ANSI strings. In C++ terms, the type used for an ANSI string is LPSTR (long pointer to string), a Unicode string is LPWSTR (long pointer to wide string), and LPTSTR is a special type defined as follows:

#ifdef UNICODE
typedef LPWSTR LPTSTR;
#else
typedef LPSTR LPTSTR;
#endif


That is, depending on how we compile, one string type or the other is substituted: either Unicode or ANSI. This was done to keep source-level compatibility with OS versions that do not support Unicode. Today it certainly looks like an atavism, but Microsoft has to keep dragging this solution along for compatibility with older versions.

And what is LPTStr in .NET? MSDN helpfully explains that this type is equivalent to LPStr on Windows 98 and to LPWStr on Windows NT and later. That is why the MessageBoxA declaration above misbehaves: the explicitly specified LPTStr yields Unicode strings, and for Latin text the ANSI function stops reading at the first zero byte. In theory, then, a function declared with LPTStr will also work on Windows 98. However, if we recall that only version 1.1 of the framework is available for Win98, you can put everything I have just written out of your head and always marshal functions as Unicode, like this:

public static class PInvoke
{
    [DllImport("User32.dll", EntryPoint = "MessageBox", CharSet = CharSet.Unicode)]
    public extern static int MsgBox(
        IntPtr hwnd, // HWND is pointer-sized, so IntPtr rather than int
        [MarshalAs(UnmanagedType.LPWStr)] string text,
        [MarshalAs(UnmanagedType.LPWStr)] string caption,
        [MarshalAs(UnmanagedType.U4)] uint type);
}


And if you happen to run into a tricky library that only understands ANSI, just specify CharSet.Ansi and use LPStr for the strings. That is all the magic there is.
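
The "only the first character survives" effect can be reproduced without calling any native code at all. The sketch below is only an illustration: the TruncationDemo class and its ReadAsAnsi helper are made-up names, and ReadAsAnsi merely imitates the way a char*-based function stops at the first zero byte. In UTF-16 every Latin letter is followed by a zero byte, so an ANSI function that is handed such a buffer sees a one-character string.

```csharp
using System;
using System.Text;

public static class TruncationDemo
{
    // Hypothetical helper: reads bytes up to the first zero byte,
    // the way an ANSI (char*) function would treat the buffer.
    public static string ReadAsAnsi(byte[] buffer)
    {
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0)
        {
            length = buffer.Length;
        }
        return Encoding.ASCII.GetString(buffer, 0, length);
    }

    public static void Main()
    {
        // What a Unicode (LPWStr) marshalled string looks like in memory.
        byte[] utf16 = Encoding.Unicode.GetBytes("Hello\0");

        // Prints 48-00-65-00-6C-00-6C-00-6F-00-00-00
        Console.WriteLine(BitConverter.ToString(utf16));

        // Prints H
        Console.WriteLine(ReadAsAnsi(utf16));
    }
}
```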

By the way, EntryPoint can be omitted: by default it equals the name of the function as declared in your code, so if the two match, you can skip this parameter.

If a function takes an enumeration as an input parameter (for example, the type parameter of MessageBox), you can define a corresponding enum in C#, deriving it from the required underlying type, and pass it as the parameter, for example:

[Flags]
public enum MBoxStyle : uint
{
    // Values as defined in WinUser.h
    MB_OK = 0x00,
    MB_OKCANCEL = 0x01,
    MB_YESNOCANCEL = 0x03,
    MB_YESNO = 0x04,
    MB_RETRYCANCEL = 0x05,
    MB_ICONEXCLAMATION = 0x30,
    MB_ICONWARNING = 0x30,
    MB_ICONINFORMATION = 0x40 ...
}

[DllImport("User32.dll", EntryPoint = "MessageBox", CharSet = CharSet.Unicode)]
public extern static int MsgBox(
    IntPtr hwnd, // HWND is pointer-sized, so IntPtr rather than int
    [MarshalAs(UnmanagedType.LPWStr)] string text,
    [MarshalAs(UnmanagedType.LPWStr)] string caption,
    [MarshalAs(UnmanagedType.U4)] MBoxStyle type);


Now we can use a proper enum instead of numeric constants.
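
As a rough usage sketch, a button set and an icon can be combined with the | operator and passed straight to the imported function. The EnumDemo class and its YesNoInfo helper are names invented for this illustration, the enum is cut down to the three values the example needs, and the native call is guarded so that it only runs on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

[Flags]
public enum MBoxStyle : uint
{
    MB_OK = 0x00,
    MB_YESNO = 0x04,
    MB_ICONINFORMATION = 0x40
}

public static class EnumDemo
{
    [DllImport("user32.dll", EntryPoint = "MessageBoxW", CharSet = CharSet.Unicode)]
    private static extern int MsgBox(IntPtr hwnd, string text, string caption, MBoxStyle type);

    // Hypothetical helper: combines a button set with an icon.
    public static MBoxStyle YesNoInfo()
    {
        return MBoxStyle.MB_YESNO | MBoxStyle.MB_ICONINFORMATION;
    }

    public static void Main()
    {
        MBoxStyle style = YesNoInfo();
        Console.WriteLine("0x{0:X}", (uint)style); // prints 0x44

        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // The return value identifies the pressed button: IDYES = 6, IDNO = 7.
            int result = MsgBox(IntPtr.Zero, "Continue?", "hello, world!", style);
            Console.WriteLine(result);
        }
    }
}
```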

To receive a string back from a function, you might expect it to be enough to add out or ref to the parameter. However, there is a catch: many functions require a fixed-size buffer into which they return the string. Take, for example, the GetWindowText function, which is declared as follows:

int GetWindowText(HWND hWnd, LPTSTR lpString, INT nMaxCount);


When the function is called from C++, it is done roughly like this:

const int BUFF_SIZE = 200;
LPTSTR buff = new TCHAR[BUFF_SIZE];
GetWindowText(hwnd, buff, BUFF_SIZE);


The GetWindowText function fills the received buffer with the required data, and with the help of nMaxCount ensures that the buffer does not overflow.

Suppose we try the standard approach:

[DllImport( "User32.dll" , EntryPoint = "GetWindowText" , CharSet = CharSet.Unicode)]
public extern static void GetWindowText( int hWnd, [MarshalAs(UnmanagedType.LPWStr)] ref string lpString, int nMaxCount);


An attempt to call such a function will end in failure. The point is that strings in C# are fundamentally immutable: any operation, such as concatenation or extracting a substring, creates a new object in memory. So what do we do?

The way out is to use StringBuilder.

[DllImport("User32.dll", EntryPoint = "GetWindowText", CharSet = CharSet.Unicode)]
public extern static int GetWindowText(IntPtr hWnd, [MarshalAs(UnmanagedType.LPWStr)] StringBuilder lpString, int nMaxCount);


And call the function as follows:

StringBuilder sb = new StringBuilder (256);
PInvoke.GetWindowText(handle, sb, sb.Capacity);


By the way, if you look closely, the StringBuilder parameter has no ref modifier. It is not needed: the .NET marshaller understands that in such situations StringBuilder is used to return a value.
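
Put together, a complete call might look like the sketch below. It makes a few assumptions that are not in the text above: the window handle comes from GetConsoleWindow (kernel32), the WindowTitleDemo class and its ReadTitle helper are invented names, and the native calls are only made on Windows.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class WindowTitleDemo
{
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetConsoleWindow();

    [DllImport("user32.dll", EntryPoint = "GetWindowTextW", CharSet = CharSet.Unicode)]
    private static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    // Hypothetical helper: returns the window title, or an empty string on failure.
    public static string ReadTitle(IntPtr hWnd)
    {
        // The capacity is the size of the buffer the native side may fill.
        StringBuilder sb = new StringBuilder(256);
        int copied = GetWindowText(hWnd, sb, sb.Capacity);
        return copied > 0 ? sb.ToString() : string.Empty;
    }

    public static void Main()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("This example needs Windows.");
            return;
        }

        // GetConsoleWindow returns IntPtr.Zero when no console is attached.
        IntPtr console = GetConsoleWindow();
        Console.WriteLine(ReadTitle(console));
    }
}
```

Checking the return value matters here: GetWindowText reports the number of characters copied, and zero means there is nothing useful in the buffer.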

Marshalling the return value of a function is done in the same way as I showed in the previous article - setting the attribute [return: MarshalAs (...)].

Structures and unions



When marshalling structures, their fields must be laid out in declaration order (LayoutKind.Sequential).

If the structure contains a fixed-length string buffer, the marshaller must be told so explicitly; otherwise the structure will "sprawl" in memory and you will get who knows what instead of the string.

Let us look at both rules using the GetVersionEx function as an example. Its full definition in C++ is as follows:

typedef struct _OSVERSIONINFO
{
DWORD dwOSVersionInfoSize;
DWORD dwMajorVersion;
DWORD dwMinorVersion;
DWORD dwBuildNumber;
DWORD dwPlatformId;
TCHAR szCSDVersion[128];
} OSVERSIONINFO;

BOOL GetVersionEx(LPOSVERSIONINFO lpVersionInfo);


So, the structure contains a fixed-length string buffer. Accordingly, here is what we do. First, define the structure:

// CharSet on the structure decides the character width of ByValTStr;
// it must match the CharSet of the imported function.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct OSVERSIONINFO
{
    [MarshalAs(UnmanagedType.I4)] public int dwOSVersionInfoSize;
    [MarshalAs(UnmanagedType.I4)] public int dwMajorVersion;
    [MarshalAs(UnmanagedType.I4)] public int dwMinorVersion;
    [MarshalAs(UnmanagedType.I4)] public int dwBuildNumber;
    [MarshalAs(UnmanagedType.I4)] public int dwPlatformId;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 128)] public string szCSDVersion; // fixed-length inline buffer
}


And the function:

[DllImport("kernel32", EntryPoint = "GetVersionEx", CharSet = CharSet.Unicode)]
public static extern bool GetVersionEx2(ref OSVERSIONINFO osvi);
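
A sketch of how the call itself might look. Before calling GetVersionEx, the dwOSVersionInfoSize field has to be filled with the size of the structure; Marshal.SizeOf is assumed here as the way to obtain it. The VersionDemo class name is invented, and the native call is guarded so that it only runs on Windows. Keep in mind that on recent Windows versions GetVersionEx is deprecated and the numbers it reports may depend on the application manifest.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct OSVERSIONINFO
{
    public int dwOSVersionInfoSize;
    public int dwMajorVersion;
    public int dwMinorVersion;
    public int dwBuildNumber;
    public int dwPlatformId;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 128)]
    public string szCSDVersion;
}

public static class VersionDemo
{
    [DllImport("kernel32.dll", EntryPoint = "GetVersionExW", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool GetVersionEx(ref OSVERSIONINFO osvi);

    public static void Main()
    {
        OSVERSIONINFO osvi = new OSVERSIONINFO();

        // Five 4-byte integers plus 128 two-byte characters: 20 + 256 = 276.
        osvi.dwOSVersionInfoSize = Marshal.SizeOf(typeof(OSVERSIONINFO));
        Console.WriteLine(osvi.dwOSVersionInfoSize);

        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows) && GetVersionEx(ref osvi))
        {
            Console.WriteLine("{0}.{1} build {2} {3}",
                osvi.dwMajorVersion, osvi.dwMinorVersion,
                osvi.dwBuildNumber, osvi.szCSDVersion);
        }
    }
}
```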


However, besides structures, C++ also has such a thing as a union. A union is when one and the same chunk of binary data in memory is interpreted differently depending on some external or internal factors. The best-known union is the VARIANT type.

Unions are good because they save memory while keeping the internal representation enormously flexible.

In .NET you can create a union too. It looks like this: in the StructLayout attribute we specify LayoutKind.Explicit, which says that we determine the in-memory layout of the structure ourselves, and for each field we specify its offset in bytes from the beginning of the structure.

[StructLayout(LayoutKind.Explicit)]
public struct TestStruct
{
    [FieldOffset(0)]
    public int a;
    [FieldOffset(0)]
    public float b;
}


Using this thing looks wildly amusing.

TestStruct s = new TestStruct();
s.b = 5.2f;
Console.WriteLine(s.b);
Console.WriteLine(s.a);
s.a = 99;
Console.WriteLine(s.b);
Console.WriteLine(s.a);
Console.ReadLine();


The result of the program:

5,2
1084647014
1,387285E-43
99


It would seem that the problem of unions is solved, but no such luck. If you try to add a reference type (for example, a string) to the union

[StructLayout(LayoutKind.Explicit)]
public struct TestStruct
{
    [FieldOffset(0)]
    public int a;
    [FieldOffset(0)]
    public float b;
    [FieldOffset(0)]
    public string c;
}


you will get hit over the head with a deafening "boom":

Could not load type 'TestStruct' from assembly 'TestPInvoke, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' because it contains an object field at offset 0 that is incorrectly aligned or overlapped by a non-object field.

The problem is that reference fields are stored in memory as pointers. By overlapping a pointer with a value type, we would gain the ability to manipulate memory. For example, like this:

s.c = "";
s.a = 0xA000;
Console.WriteLine(s.c);


In this case, the contents of memory at address 0xA000, up to the first double zero byte, would be printed to the screen. By using as the reference type a class that contains value types (for example, int), we could manipulate memory directly. Naturally, the CLR cannot allow such an outrage, which is why such constructs are explicitly forbidden.

What to do? Start banging your head against the wall, because the only way out is to define as many structure types as it takes for no value field to overlap a reference field, and to call the function through overloads while reading coffee grounds to guess which kind of result it will return. Horror. Sheer horror. So, wherever possible, avoid unions in which reference fields sit next to value fields. And if that is not possible, study MSDN; the topic is too big for our overview.
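
One workaround that is sometimes used instead, and that is not the approach described above, is to keep the reference type out of the union altogether: the string is declared as a raw IntPtr, which is a value type and may legally overlap other value fields, and it is converted by hand only when a discriminator says the field really holds a string. The sketch below assumes a made-up native layout (a Kind field followed by the union at offset 8); the TaggedValue and UnionDemo names are hypothetical, and the native string is simulated with Marshal.StringToHGlobalUni.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct TaggedValue
{
    [FieldOffset(0)] public int Kind;    // hypothetical discriminator: 0 = number, 1 = text
    [FieldOffset(8)] public int Number;  // valid when Kind == 0
    [FieldOffset(8)] public IntPtr Text; // valid when Kind == 1, points to a UTF-16 string
}

public static class UnionDemo
{
    // Interprets the overlapping fields according to the discriminator.
    public static string Describe(TaggedValue value)
    {
        return value.Kind == 1
            ? Marshal.PtrToStringUni(value.Text)
            : value.Number.ToString();
    }

    public static void Main()
    {
        TaggedValue number = new TaggedValue();
        number.Kind = 0;
        number.Number = 99;
        Console.WriteLine(Describe(number)); // prints 99

        // Stand-in for a string that native code would normally own.
        IntPtr native = Marshal.StringToHGlobalUni("hello");
        try
        {
            TaggedValue text = new TaggedValue();
            text.Kind = 1;
            text.Text = native;
            Console.WriteLine(Describe(text)); // prints hello
        }
        finally
        {
            Marshal.FreeHGlobal(native);
        }
    }
}
```

The price is that the marshaller no longer manages the string for you: who allocates and who frees the memory behind the pointer has to be settled by hand.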

Working with HANDLE



Many WinAPI functions require or return a HANDLE. Put simply, a HANDLE is a pointer to an OS kernel object; in practice it is a pointer-sized integer (four bytes on 32-bit Windows, eight on 64-bit) that stays unchanged from the moment we request the object until the moment we release it.

The SafeHandle class is intended for working with HANDLE in C#. The declaration below, however, takes a plain IntPtr, so the raw handle has to be extracted from the SafeHandle by hand. We will have to improvise.

Let us take the SetSecurityInfo function as an example (see its description on MSDN). Without getting distracted by all sorts of side details, let us focus on the main thing.

[DllImport("advapi32.dll", SetLastError = true, CallingConvention = CallingConvention.Winapi)]
public static extern Int32 SetSecurityInfo
(
    IntPtr handle,
    SE_OBJECT_TYPE ObjectType,
    SECURITY_INFORMATION SecurityInfo,
    IntPtr psidOwner,
    IntPtr psidGroup,
    IntPtr pDacl,
    IntPtr pSacl
);


The scenario of working with SafeHandle is demonstrated on the example of FileStream.

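FileStream fs = new FileStream(@"c:\test.txt", FileMode.Open);
bool success = false;
// Add a reference so the handle is not released (for example, by the GC) while we use it.
fs.SafeFileHandle.DangerousAddRef(ref success);
if (success)
{
    try
    {
        // Get the handle as an IntPtr.
        IntPtr h = fs.SafeFileHandle.DangerousGetHandle();
        // Call the native function.
        PInvoke.SetSecurityInfo(h, < >); // < > stands for the remaining arguments
    }
    finally
    {
        // Release the handle; we no longer need it.
        fs.SafeFileHandle.DangerousRelease();
    }
}
else
{
    // Failed to obtain the HANDLE; deal with the error.
}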


This scenario must be followed to the letter. Working with unmanaged resources demands care and precision in order to prevent leaks.

Here we meet two DllImport parameters we have not seen before. The first, SetLastError, when set to true, makes the runtime save the error code if the call fails, after which it can be retrieved with Marshal.GetLastWin32Error. If the function's documentation says that the error value is obtained via GetLastError, this flag should be set. The second, CallingConvention, sets the calling convention: PASCAL (aka stdcall, aka Winapi) or cdecl. The only difference is who cleans up the stack, the calling code or the called function. Practically all WinAPI functions use the stdcall convention.
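
A small sketch of SetLastError in action. DeleteFile from kernel32 is used here only as a convenient function that is easy to make fail; the LastErrorDemo class and its TryDelete helper are invented names. The error code has to be read right after the call, before any other native call gets a chance to overwrite it.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class LastErrorDemo
{
    [DllImport("kernel32.dll", EntryPoint = "DeleteFileW", CharSet = CharSet.Unicode,
        SetLastError = true, CallingConvention = CallingConvention.Winapi)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool DeleteFile(string path);

    // Hypothetical helper: returns 0 on success, otherwise the Win32 error code.
    public static int TryDelete(string path)
    {
        if (DeleteFile(path))
        {
            return 0;
        }
        // Read the saved code immediately after the failed call.
        return Marshal.GetLastWin32Error();
    }

    public static void Main()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("This example needs Windows.");
            return;
        }

        // A file name that almost certainly does not exist.
        string path = @"C:\no-such-file-" + Guid.NewGuid() + ".tmp";
        int error = TryDelete(path);
        if (error != 0)
        {
            // Win32Exception turns the numeric code into a readable message.
            Console.WriteLine("{0}: {1}", error, new Win32Exception(error).Message);
        }
    }
}
```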

However, the article has already outgrown all conceivable limits, and I have to stop. Left outside our review are such interesting things as manually parsing structures from IntPtr and back, wrapping returned HANDLEs, GlobalAlloc and buffers, complex data types, and manual marshalling with the Marshal class. Honestly, this topic seems too specialized for the Habr audience, so I most likely will not cover it. The techniques shown here are quite sufficient for most typical tasks that require P/Invoke, and if you run into something more complicated, you will have to turn to MSDN.

UPD: Readers point out that .NET up to version 2.0 can be installed on Win98. This does not change much, but the fact is worth noting.

Source: https://habr.com/ru/post/58582/

