- Dad, I ran after the trolley and saved five kopecks!
- Son, if you had run after a taxi, you would have saved five rubles!
Today I want to tell you how to save 10 thousand dollars. Along the way, which is much less interesting, I will show how to intercept calls to Win32 API functions, and not only those, although they are of course the main target.
Main provisions
There are exactly two well-known methods for intercepting API functions; all the rest are variations of them.
The first method relies on the fact that calls from a module to functions in other DLLs go through the import table. The loader fills this table when the module is loaded, writing into it the addresses of all the imported functions the module may need. So, to intercept an API call, you find the import table, find in it the entry for the function you want to intercept, save the address stored there (the pointer to the function body) in a variable so that you can still call the original, and then write a pointer to your own function in its place. This has to be done for every module (exe or dll) in the process, since each of them has its own import table. In addition, to intercept functions that are called through late binding (GetProcAddress), you have to patch the export table of the module that exports the function in the same way (this time only in one module). Finally, you must prevent your DLL from being unloaded while the interception is active (for example, DllCanUnloadNow should return S_FALSE, or you take an extra lock); otherwise the interceptor address becomes invalid and you get an access violation with all its consequences.
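The pointer swap at the heart of this method can be sketched on a toy model. The block below does not parse a real PE import table: `ImportEntry`, `PatchImport`, `RealDouble` and `HookedDouble` are hypothetical names invented for the illustration, and the "table" is just an array of named function pointers. It only shows the save-then-replace step described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy stand-in for an import table slot: a name and the address called through it.
using ApiFunc = int (*)(int);

struct ImportEntry {
    const char* name;    // name of the imported function
    ApiFunc     target;  // pointer the module calls through
};

// Replaces the slot for `name` with `hook` and returns the previous pointer,
// or nullptr if the table has no such import.
ApiFunc PatchImport(ImportEntry* table, std::size_t count,
                    const char* name, ApiFunc hook)
{
    assert(table != nullptr && name != nullptr && hook != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(table[i].name, name) == 0) {
            ApiFunc original = table[i].target;  // saved so the original stays callable
            table[i].target = hook;
            return original;
        }
    }
    return nullptr;
}

// Hypothetical "API" function and its interceptor, used only for the demo.
int RealDouble(int x) { return x * 2; }

ApiFunc g_originalDouble = nullptr;

int HookedDouble(int x) { return g_originalDouble(x) + 1; }
```

In a real process the same loop would have to run once per loaded module, because each module carries its own table.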
This method has been described many times in the relevant literature, and ready-made implementations can be found, for example, on RSDN [1], [2]. Therefore, we will not dwell on it.
Much more interesting is the second method: intercepting a function by injecting code. Its idea is also quite simple and has been described many times. We overwrite the first few bytes of the original function with an unconditional jump to our interceptor function and do the necessary processing there. If we then need to call the original function, we first execute the instructions that were overwritten and then jump unconditionally into the body of the original function, skipping, of course, the overwritten beginning.
It sounds simple enough, but for someone who has spent a whole career in high-level languages it can turn into an unsolvable task. The matter is further complicated by the fact that, because of certain problems which I will discuss a little later, there are no ready-made implementations of this method. Although ... of course, I am being a little sly. Microsoft has a whole framework dedicated to exactly this problem. It is called Microsoft Detours [3], is easy to find with a search engine, and costs 10 thousand dollars for the commercial version.
Naturally, for that kind of money people buy it only when they really have to. If you do not really have to but would like to, then my implementation of the second method, which I describe here, will do. It is far from universal, but some features of the Win32 API allow it to work in our applications and successfully replace the expensive framework.
Implementation of code injection method
Let's start from the beginning and prepare a small test bench on which we will check our progress. It will be a C++ console project. I will use MS Visual Studio 2010 Beta; adjust my steps to whatever IDE you use.
int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    CloseHandle(hFile);
    return 0;
}
Our task will be to intercept the CreateFile and CloseHandle functions.
Run the program with a breakpoint set on the CreateFile call. As soon as execution stops, choose Go To Disassembly from the context menu of the code editor. This is what we see there.
HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
01138A8B mov esi,esp
01138A8D push 0
01138A8F push 80h
01138A94 push 2
01138A96 push 0
01138A98 push 0
01138A9A push 40000000h
01138A9F push offset string L"d:\\test.txt" (11415B0h)
01138AA4 call dword ptr [__imp__CreateFileW@28 (114527Ch)]
01138AAA cmp esi,esp
01138AAC call @ILT+1000(__RTC_CheckEsp) (11313EDh)
01138AB1 mov dword ptr [hFile],eax
Now step with F10 up to the instruction call dword ptr [__imp__CreateFileW@28 (114527Ch)], which is the actual function call, and press F11. We land in the body of the CreateFile function.
76D60B7D mov edi,edi
76D60B7F push ebp
76D60B80 mov ebp,esp
76D60B82 push ecx
76D60B83 push ecx
76D60B84 push dword ptr [ebp+8]
76D60B87 lea eax,[ebp-8]
76D60B8A push eax
76D60B8B call dword ptr ds:[76D11568h]
So what do we see here?
The first instruction, mov edi,edi, is nothing more than a two-byte nop (no operation). It does no useful work and simply occupies two bytes of code. It looks like a waste, but the presence of this instruction is very useful to us.
The next two instructions take three bytes, and here is what they are for. The esp register, as you know, points to the top of the stack, which holds all the parameters passed to the function with push instructions. At the top of the stack (in assembler this address is written as [esp]) is the return address, placed there by the call instruction (in our case 0x01138AAA). Above it (the stack, as you know, grows downward), at [esp + 4], is the file name, at [esp + 8] the access mode, and so on.
The stack also contains local variables that are used by the function itself. If you look closely at the code, you will see two instructions.
76D60B82 push ecx
76D60B83 push ecx
These instructions simply reserve 8 bytes on the stack, that is, they make room for two DWORD variables. We can read them this way because the function uses the stdcall calling convention (parameters are passed on the stack, not in registers as with fastcall), and ecx is a general-purpose register: if the function has not put a value into it, it holds whatever garbage the previous code left there. There is no point in passing garbage on as data, so these pushes can only be reserving space.
However, after a push instruction executes, the top of the stack moves 4 bytes down, and [esp] no longer points to the return address but to the garbage value just pushed. That is, the esp-relative offsets of the parameters passed to the function keep shifting! This cannot be allowed, so the following is done first.
76D60B7F push ebp
76D60B80 mov ebp,esp
The current value of the base register (ebp) is saved on the stack, and the current value of the stack pointer (esp) is copied into the base register. Now we can address the parameters passed to the function through the base register (at [ebp] we have the saved value of ebp, at [ebp + 4] the return address, at [ebp + 8] the file name, and so on) while manipulating the stack freely.
This pair of instructions (push ebp / mov ebp, esp) is called the standard prologue and has a mirror image, the standard epilogue, which looks like this:
mov esp,ebp
pop ebp
However, we will not find it here: it is replaced by the leave instruction, which does the same thing.
76D60BC7 leave
76D60BC8 ret 1Ch
The last instruction returns from the function and removes 0x1C bytes (seven 4-byte parameters) from the stack, as required by the stdcall convention, under which the function itself cleans up the stack when it finishes.
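The offsets and the ret operand discussed above follow from simple arithmetic, which can be written down as a small sketch. The helper names `ArgOffsetFromEbp` and `StdcallCleanupBytes` are made up for this illustration, and the sketch assumes 32-bit code in which every argument occupies exactly one 4-byte stack slot.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kSlot = 4;  // one 32-bit stack slot

// Offset of the i-th argument (0-based) from ebp after "push ebp / mov ebp,esp":
// [ebp] = saved ebp, [ebp+4] = return address, [ebp+8] = first argument.
constexpr std::uint32_t ArgOffsetFromEbp(std::uint32_t index)
{
    return 2 * kSlot + index * kSlot;
}

// Operand of "ret N" for a stdcall function whose arguments each fill one slot.
constexpr std::uint32_t StdcallCleanupBytes(std::uint32_t argCount)
{
    return argCount * kSlot;
}

static_assert(ArgOffsetFromEbp(0) == 8, "the file name is at [ebp+8]");
static_assert(StdcallCleanupBytes(7) == 0x1C, "CreateFileW ends with ret 1Ch");
```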
After analyzing other API functions, you can understand that they all start in exactly the same way:
mov edi,edi
push ebp
mov ebp,esp
That is, in 99% of cases, for our “household” interception we are guaranteed 5 bytes at the beginning of the function that we can safely replace with our own code and then execute somewhere else. This is good: it means our jump instruction may be up to 5 bytes long, and that is enough.
So, now we have figured out how the function call works and are ready to intercept it. Only one detail remains: how exactly do we intercept?
All we need is to put a jmp instruction at the beginning of the function, with an address that points to the beginning of our own function. However, it is not that simple. A jmp instruction that takes the absolute address of our function and fits in 5 bytes simply does not exist. The only direct jump that works with an absolute address is the far jump, which in 32-bit code takes 7 bytes (the opcode, a 4-byte offset and a 2-byte segment selector).
Therefore, we will use a near jump, which takes a relative address (that is, the difference between the destination address and the address of the instruction that follows the jump). To compute the operand, take the address of the jump instruction, add 5 bytes (the size of the instruction), and subtract the result from the destination address.
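Checking for this prologue comes down to comparing five bytes. The sketch below uses the 32-bit x86 encodings of the three instructions (8B FF, 55, 8B EC); the function name `HasStandardPrologue` is an assumption made for the example and is not taken from the implementation discussed in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Machine code of "mov edi,edi / push ebp / mov ebp,esp" on 32-bit x86.
constexpr std::uint8_t kStandardPrologue[5] = { 0x8B, 0xFF, 0x55, 0x8B, 0xEC };

// Returns true if the `size` bytes at `code` begin with the 5-byte prologue.
bool HasStandardPrologue(const std::uint8_t* code, std::size_t size)
{
    assert(code != nullptr);
    return size >= sizeof(kStandardPrologue) &&
           std::memcmp(code, kStandardPrologue, sizeof(kStandardPrologue)) == 0;
}
```

A function that fails this check would need its first instructions analyzed individually before it can be hooked safely.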
size_t _CalculateDispacement(void* lpFirst, void* lpSecond)
{
    // Distance from the end of the 5-byte jump at lpFirst to the target lpSecond
    return reinterpret_cast<char*>(lpSecond) - (reinterpret_cast<char*>(lpFirst) + 5);
}
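As a worked example, the same calculation can be done on plain 32-bit numbers, using addresses that appear in the debugger listings of this article. `JumpNearOperand` is a hypothetical helper introduced only for this sketch; it models addresses as `std::uint32_t` so that a backward jump wraps modulo 2^32, which matches how the processor interprets the operand.

```cpp
#include <cassert>
#include <cstdint>

// rel32 operand of a 5-byte "jmp rel32" located at `from` that must land on `to`.
constexpr std::uint32_t JumpNearOperand(std::uint32_t from, std::uint32_t to)
{
    return to - (from + 5u);
}

// Jump from the start of CreateFileW (76D60B7Dh) to the interceptor (0D13E8h).
static_assert(JumpNearOperand(0x76D60B7Du, 0x000D13E8u) == 0x89370866u,
              "backward jump wraps around");

// Jump from offset 5 of a block at 00060000h back to CreateFileW + 5 (76D60B82h).
static_assert(JumpNearOperand(0x00060005u, 0x76D60B82u) == 0x76D00B78u,
              "forward jump");
```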
Turning to the literature, we learn that the jump near opcode is 0xe9. Thus, we can perform interception as follows:
HANDLE WINAPI _My_CreateFileW(LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurity, DWORD dwCreationDisp, DWORD dwFlags, HANDLE hTemplate)
{
    OutputDebugStringA(__FUNCTION__);
    return (HANDLE)-1;
}

#pragma pack(push, 1)
struct jump_near
{
    BYTE opcode; // 0xe9
    DWORD relativeAddress;
};
#pragma pack(pop)

int _tmain(int argc, _TCHAR* argv[])
{
    HMODULE hKernel32 = GetModuleHandle(L"kernel32.dll");
    jump_near* lpFunc = reinterpret_cast<jump_near*>(GetProcAddress(hKernel32, "CreateFileW"));
    lpFunc->opcode = 0xe9;
    lpFunc->relativeAddress = _CalculateDispacement(lpFunc, &_My_CreateFileW);
    HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    CloseHandle(hFile);
    return 0;
}
To keep the compiler from inserting padding between the fields of the structure for alignment, we use the #pragma pack directive, which in our case aligns the data on a one-byte boundary (that is, does not align it at all).
We run it and ... oops, access violation. The point is that code pages are write-protected, as a defense against buffer overflows.
However, things are not so bad. The pages are protected from the outside, while we work from the inside, so we can get around this mechanism with the VirtualProtect function. Before writing the opcode, insert the call:
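The effect of the packing directive is easy to check in isolation. The sketch below redeclares the structure with fixed-width standard types (`JumpNear` and `EncodeJumpNear` are names chosen for this example, not part of the article's code) and writes the five bytes into an ordinary buffer instead of live code. It assumes a little-endian host, which holds for x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct JumpNear {
    std::uint8_t  opcode;           // 0xE9, jmp rel32
    std::uint32_t relativeAddress;  // rel32 operand
};
#pragma pack(pop)

// Without the pragma the compiler would normally pad this structure to 8 bytes.
static_assert(sizeof(JumpNear) == 5, "packing must remove the padding");

// Writes the 5 bytes of "jmp rel32" into `out`.
void EncodeJumpNear(std::uint8_t* out, std::uint32_t relativeAddress)
{
    assert(out != nullptr);
    const JumpNear jump{ 0xE9, relativeAddress };
    std::memcpy(out, &jump, sizeof(jump));
}
```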
DWORD dwProtect = PAGE_EXECUTE_READWRITE; // keep the page executable while it is writable
VirtualProtect(lpFunc, sizeof(jump_near), dwProtect, &dwProtect);
And after the write, to restore the previous protection:
VirtualProtect(lpFunc, sizeof(jump_near), dwProtect, &dwProtect);
We run it and, voila, the interception works.
Now comes the second problem: we need to call the original function. For this we must do the following:
1. Save the pointer to the beginning of the function.
2. Create a 10-byte block in memory with permission to execute code (without it, an attempt to run the code raises an access violation because of the NX-bit protection).
3. Copy the first 5 bytes of the original function there before installing the interceptor.
4. In the last 5 bytes, create a similar jump near instruction that forwards execution to the original function, skipping the 5 bytes we overwrote.
5. Save the address of the 10-byte block and cast it to the CreateFileWProc type, which is declared as follows:
typedef HANDLE (WINAPI * CreateFileWProc) (LPCWSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE);
6. Now, if we need to call the original, we simply use this pointer.
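Steps 3 and 4 can be sketched without touching live code by filling an ordinary byte buffer. `BuildTrampoline` and its parameters are hypothetical names for this illustration and do not reproduce the linked Detours.cpp; addresses are modelled as 32-bit numbers, and a real implementation would place the result in memory obtained with execute permission (for example from VirtualAlloc with PAGE_EXECUTE_READWRITE).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kPatchSize = 5;                    // bytes overwritten by the hook
constexpr std::uint32_t kTrampolineSize = 2 * kPatchSize;  // saved bytes + jump back

// Fills `trampoline` (kTrampolineSize bytes) for a function that starts with the
// bytes in `original` and lives at `functionAddress`; `trampolineAddress` is the
// address the trampoline itself will run from.
void BuildTrampoline(std::uint8_t* trampoline, const std::uint8_t* original,
                     std::uint32_t functionAddress, std::uint32_t trampolineAddress)
{
    assert(trampoline != nullptr && original != nullptr);

    // Step 3: keep a copy of the bytes the hook is about to overwrite.
    std::memcpy(trampoline, original, kPatchSize);

    // Step 4: "jmp rel32" from the end of the copy to functionAddress + 5.
    const std::uint32_t jumpAt = trampolineAddress + kPatchSize;
    const std::uint32_t target = functionAddress + kPatchSize;
    const std::uint32_t rel32  = target - (jumpAt + 5u);

    trampoline[kPatchSize] = 0xE9;
    for (std::uint32_t i = 0; i < 4; ++i) {
        // The operand is stored least significant byte first.
        trampoline[kPatchSize + 1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
    }
}
```

The copy in step 3 is only valid here because the saved instructions contain no relative operands; arbitrary instructions could not be moved this way.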
The code that implements this functionality in a more general case is available here:
pastebin.com/5gZdr6Hm (header file Detours.h)
pastebin.com/RCJ896TM (implementation of Detours.cpp)
I will briefly show how it all works in the end.
Add both files to the project, define a couple of interceptors, and run the code with a breakpoint on CreateFile.
#include "Detours.h"

typedef HANDLE (WINAPI *CreateFileWProc)(LPCWSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE);
typedef BOOL (WINAPI *CloseHandleProc)(HANDLE);

// Pointers through which the original functions are called
CreateFileWProc _Std_CreateFileW;
CloseHandleProc _Std_CloseHandle;

HANDLE WINAPI _My_CreateFileW(LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurity, DWORD dwCreationDisp, DWORD dwFlags, HANDLE hTemplate)
{
    OutputDebugStringA(__FUNCTION__);
    return _Std_CreateFileW(lpFileName, dwDesiredAccess, dwShareMode, lpSecurity, dwCreationDisp, dwFlags, hTemplate);
}

BOOL WINAPI _My_CloseHandle(HANDLE handle)
{
    OutputDebugStringA(__FUNCTION__);
    return _Std_CloseHandle(handle);
}

int _tmain(int argc, _TCHAR* argv[])
{
    HMODULE hKernel32 = GetModuleHandle(L"kernel32.dll");
    void* lpFunc = GetProcAddress(hKernel32, "CreateFileW");
    Detours::HookFunction(lpFunc, _My_CreateFileW, reinterpret_cast<void**>(&_Std_CreateFileW));
    lpFunc = GetProcAddress(hKernel32, "CloseHandle");
    Detours::HookFunction(lpFunc, _My_CloseHandle, reinterpret_cast<void**>(&_Std_CloseHandle));
    HANDLE hFile = CreateFile(L"d:\\test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    CloseHandle(hFile);
    return 0;
}
Out of habit, open the disassembly and step to the instruction
000D8AD0 call dword ptr [__imp__CreateFileW@28 (0E527Ch)]
And press F11.
Where are we?
76D60B7D jmp _My_CreateFileW (0D13E8h)
This is the jump installed by our interceptor. This means that the interception succeeded!
Press F10 a couple of times (passing through one more intermediate thunk that the compiler inserts in DEBUG builds), and ...
HANDLE WINAPI _My_CreateFileW(LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurity, DWORD dwCreationDisp, DWORD dwFlags, HANDLE hTemplate)
{
000D8910 push ebp
000D8911 mov ebp,esp
000D8913 sub esp,0C0h
000D8919 push ebx
000D891A push esi
000D891B push edi
000D891C lea edi,[ebp-0C0h]
000D8922 mov ecx,30h
000D8927 mov eax,0CCCCCCCCh
000D892C rep stos dword ptr es:[edi]
Now, the most interesting moment. Let's get to the original function call.
000D8960 call dword ptr [_Std_CreateFileW (0E4240h)]
And press F11.
00060000 mov edi,edi
00060002 push ebp
00060003 mov ebp,esp
00060005 jmp 76D60B82
We have landed on the so-called trampoline: a piece of code that performs the instructions we replaced and then transfers control to the original function. Step to the jmp, press F10, and enjoy a wonderful picture.
76D60B7D jmp _My_CreateFileW (0D13E8h)
76D60B82 push ecx
76D60B83 push ecx
This time we skipped the jmp instruction and landed directly on the first meaningful instruction, push ecx. So everything works as it should.
Potential problems and room for improvement
Unfortunately, the code is not universal: it decides whether interception is possible by checking for the standard WinAPI prologue. A universal solution is hard to build, because in general a function can begin with absolutely any instructions, including instructions that use relative addressing, which have to be corrected when they are moved. Microsoft Detours solves this problem with a table-driven disassembler and an instruction fixer.
In addition, if the function is shorter than 5 bytes, interception is simply impossible. Such functions do occur, but I have never come across a task that required intercepting them. Microsoft Detours gives up in this case.
To create the trampoline function in memory, you cannot use the new and delete operators, since they allocate memory in a data area where code execution is forbidden, and by changing the access rights of heap memory you would give attackers a way to exploit a buffer overflow. At the moment the program is wasteful, allocating 4 KB of memory for each interceptor, because that is the minimum size of a virtual memory allocation. In theory, you should write your own memory manager and use it. MS Detours does this.
Still, what I wrote is perfectly working code that will be useful if you really need it and have no money. The missing table-driven analyzer can be replaced with a special-case one: disassemble the functions you need, analyze their code, and add the corresponding signatures to _Analyze. And 4 KB of memory per interceptor, when the program has 5-6 interceptors, is not that much.
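To illustrate the kind of correction involved, take the simplest case of an instruction with a relative operand, such as jmp rel32 or call rel32. If it is copied to another address, its operand has to change by the distance moved so that it still reaches the same target. The helper `RelocateRel32` and the addresses below are invented for this sketch; it is not how Microsoft Detours is implemented, only the arithmetic behind one such fix-up.

```cpp
#include <cassert>
#include <cstdint>

// New rel32 operand for a "jmp rel32" / "call rel32" copied from oldAddress to
// newAddress that must still reach the same absolute target. The instruction
// length cancels out, so only the distance moved matters.
constexpr std::uint32_t RelocateRel32(std::uint32_t rel32,
                                      std::uint32_t oldAddress,
                                      std::uint32_t newAddress)
{
    return rel32 + oldAddress - newAddress;
}

// A jmp at 1000h that targets 2000h has rel32 = 2000h - 1005h = 0FFBh.
// Moved to 5000h it needs 2000h - 5005h, which wraps to FFFFCFFBh.
static_assert(RelocateRel32(0xFFBu, 0x1000u, 0x5000u) == 0xFFFFCFFBu,
              "operand adjusted by the distance moved");
```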
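The simplest form of such a memory manager hands out fixed-size slots from a single page. The sketch below is a minimal illustration under stated assumptions: `TrampolinePool`, the 16-byte slot size and the lack of any way to free a slot are choices made for the example, and the page is passed in by the caller (in a real program it would come from an allocation with execute permission).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPageSize = 4096;  // one page of virtual memory
constexpr std::size_t kSlotSize = 16;    // 10 bytes used, padded for alignment

// Carves fixed-size slots out of one page-sized block that it does not own.
class TrampolinePool {
public:
    explicit TrampolinePool(std::uint8_t* page) : page_(page)
    {
        assert(page != nullptr);
    }

    // Returns the next free slot, or nullptr once the page is exhausted.
    std::uint8_t* Allocate()
    {
        if (used_ + kSlotSize > kPageSize) {
            return nullptr;
        }
        std::uint8_t* slot = page_ + used_;
        used_ += kSlotSize;
        return slot;
    }

private:
    std::uint8_t* page_;
    std::size_t   used_ = 0;
};
```

With these sizes one page holds 256 slots instead of a single one.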
References
1. Barry B. Brey. The Intel Microprocessors: Architecture, Programming, and Interfacing. Sixth edition. BHV-Petersburg, 2005