Reverse engineering AGTH to build an alternative GUI

Since reverse engineering is quite a popular topic on Habr, I decided to share my own experience with it. Like many fans of visual novels, I know a program called AGTH (Anime-Game-Text-Hooker). It extracts text from visual novels for later translation (most of these games are Japanese). Development of the program apparently stopped in 2011, and the source code could not be found. Since I wanted additional features, I decided to reverse engineer the program and, based on what I learned, build an alternative shell with all the functions I was missing.

The original program consists of two parts: an executable file and a hook module implemented as a dynamic library. The program injects this library into the game process and uses it to receive text from there.
I will rewrite only the executable and keep the original hook module. There are several reasons for this. Besides the obvious complexity of the module and my own laziness, my implementation has to stay compatible with the so-called H-codes. An H-code is a set of data the hook module needs in order to install a hook correctly when the default hooks are ineffective. It contains memory addresses, register numbers and other information about where the text is located in the game. The code is unique for each game and is found by enthusiasts. So writing my own module loosely “based on” the original will not work: it would have to be fully compatible with these codes, which is a completely different level of complexity and would bring no additional benefit.

Analyzing the communication protocol between the hook module and AGTH

Obviously, the hook module inside the game and AGTH interact somehow, and to write an alternative shell you need to know how. There are quite a few ways to pass data from one program to another, from window messages to sockets. I found out which one is actually used by chance: I opened the properties of the agth.exe process in Process Explorer and decided to look at the strings the program contains.

The string "\\.\pipe\agth" immediately caught my eye. This is how a named pipe is denoted, so we can assume that AGTH uses pipes to communicate with the game. Now we know where to start looking. For debugging I will use OllyDbg, my favorite debugger.
Let's load AGTH into OllyDbg and immediately set breakpoints on the CreateNamedPipe* functions inside the kernel32 module. One of them should trigger as soon as the program tries to create a named pipe, and from that point we can get to the code that works with the pipes.

We continue execution, and on the second breakpoint hit we arrive at the right place. The string "\\.\pipe\agth" on the stack tells us that this is the place we need.

Now let's go to address 0x00AF3A64, which lies on top of the stack and should point to the code immediately after the call to CreateNamedPipeW.

 00AF3A43 > 56              PUSH ESI                                   ; 0x0
 00AF3A44 . 6A 00           PUSH 0
 00AF3A46 . 68 00000200     PUSH 20000
 00AF3A4B . 6A 00           PUSH 0
 00AF3A4D . 68 FF000000     PUSH 0FF
 00AF3A52 . 6A 06           PUSH 6
 00AF3A54 . 68 01000840     PUSH 40080001
 00AF3A59 . 68 A026AF00     PUSH agth.00AF26A0                         ; UNICODE "\\.\pipe\agth"
 00AF3A5E . FF15 4010AF00   CALL DWORD PTR DS:[<&KERNEL32.CreateName>  ; kernel32.CreateNamedPipeW
 00AF3A64 . 8BF8            MOV EDI,EAX
 00AF3A66 . EB 03           JMP SHORT agth.00AF3A6B

Here we can already see the parameters the pipe is created with:

 CreateNamedPipeW("\\.\pipe\agth", 0x40080001, 6, 0xFF, 0, 0x20000, 0, NULL);

Using the documentation, we expand the magic numbers into named constants:

 CreateNamedPipeW("\\.\pipe\agth",
                  PIPE_ACCESS_INBOUND | FILE_FLAG_OVERLAPPED | FILE_FLAG_FIRST_PIPE_INSTANCE,
                  PIPE_WAIT | PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE,
                  0xFF, 0, 0x20000, 0, NULL);
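As a quick cross-check of this expansion, the two magic numbers can be decoded mechanically. The sketch below is in Python purely for illustration (the project itself is written in Delphi). The bit values are the ones from the Windows SDK headers, and the helper name decode_flags is made up for this example. Flags whose value is zero, such as PIPE_WAIT, leave no bit to detect and are implied when their non-zero counterpart is absent.

```python
# Bit values as defined in the Windows SDK headers for CreateNamedPipe.
OPEN_MODE_BITS = {
    "PIPE_ACCESS_INBOUND": 0x00000001,
    "PIPE_ACCESS_OUTBOUND": 0x00000002,
    "FILE_FLAG_FIRST_PIPE_INSTANCE": 0x00080000,
    "FILE_FLAG_OVERLAPPED": 0x40000000,
    "FILE_FLAG_WRITE_THROUGH": 0x80000000,
}
PIPE_MODE_BITS = {
    "PIPE_NOWAIT": 0x00000001,
    "PIPE_READMODE_MESSAGE": 0x00000002,
    "PIPE_TYPE_MESSAGE": 0x00000004,
    "PIPE_REJECT_REMOTE_CLIENTS": 0x00000008,
}


def decode_flags(value, table):
    """Return the names of the bits set in value plus any unexplained remainder."""
    names = [name for name, bit in table.items() if value & bit]
    known = 0
    for bit in table.values():
        known |= bit
    return names, value & ~known


# dwOpenMode and dwPipeMode as seen in the disassembly
print(decode_flags(0x40080001, OPEN_MODE_BITS))
print(decode_flags(0x6, PIPE_MODE_BITS))
```

A remainder of zero in both cases means every bit of the original constants is accounted for by the named flags.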

Stepping through the code below, we find calls to ConnectNamedPipe and WaitForMultipleObjects, which wait for an event on the created pipe.

Now we need to find out how the data is read, or more precisely, what the size of the data block sent from the game to the application is. The PIPE_TYPE_MESSAGE flag used to create the pipe indicates that the data is transmitted in blocks rather than as a continuous byte stream.

It is easy to see that after WaitForMultipleObjects returns, a new thread is created, which presumably handles events on the newly connected pipe. Let's go to address 0x00CC5080:

Here is the ReadFile call we are looking for, made with the following parameters:

 0291D9B4   00000104  |hFile = 00000104 (window)
 0291D9B8   0291DA78  |Buffer = 0291DA78
 0291D9BC   00001FE8  |BytesToRead = 1FE8 (8168.)
 0291D9C0   0291DA14  |pBytesRead = 0291DA14
 0291D9C4   004C4168  \pOverlapped = 004C4168

I took them from the stack at the moment the breakpoint I had set in advance on the ReadFile call triggered. We are really only interested in the BytesToRead parameter, which is 8168 bytes. This is probably the size of the structure with the text that the game sends to the program.

This gives us enough information about how the interaction with the game works: AGTH implements a pipe server that receives data in 8168-byte chunks. Now we can move on to analyzing what these bytes mean.

I decided to analyze the data format inside my own program. In it I implemented my own server based on the data obtained earlier and used it to receive messages from the game. This is very convenient: you can declare a structure of the required size and read the data directly into it. As it becomes clear what each group of bytes means, the structure can be refined, and in the end you get a complete description of all the fields.

This is what arrives in the program from the game. The strings UserHookQ and Kotarou immediately catch the eye. The first is the name of the function that the original program displays, the second is the text from the game in UTF-16 encoding. The number 7 (blue selection) also stands out; as it turned out, it is always equal to the number of characters in the game text string. Looking through different data sets showed that the function name is a null-terminated string with a maximum length of 24 characters. That is, in the screenshot above, all the bytes between the green and blue selections are just garbage.

That leaves 16 bytes of data at the beginning of the structure. The first two variables were easy to identify: these are Context and Subcontext, which can also be seen in the window of the original program. The third was a little harder to figure out: it always had small values and changed only when the game was restarted. It turned out to be the ProcessID of the game. The last of the four changed constantly and had rather large values. The only clue was that this value always increased over time and never decreased. It was the time, or rather the result of a GetTickCount call.

The result was the following structure:

 TAGTHRcPckt = packed record // SizeOf = 8168 bytes
   Context: Cardinal;
   Subcontext: Cardinal;
   ProcessID: Cardinal;
   UpTime: Cardinal;
   TextLength: Cardinal;
   HookName: array [0 .. 23] of ansichar;
   Text: array [0 .. 4061] of widechar;
 end;
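To double-check that the fields really add up to 8168 bytes, and to show how one message can be decoded, here is a minimal sketch. It is written in Python for illustration only and assumes little-endian fields and UTF-16LE text, which is what the dump suggests. The name parse_packet and the sample field values are hypothetical.

```python
import struct

# Mirrors TAGTHRcPckt: five 32-bit fields, a 24-byte ANSI hook name and
# 4062 UTF-16 code units of text (5*4 + 24 + 4062*2 = 8168 bytes).
PACKET = struct.Struct("<5I24s8124s")


def parse_packet(raw):
    """Decode one 8168-byte message received from the hook module."""
    if len(raw) != PACKET.size:
        raise ValueError("expected %d bytes, got %d" % (PACKET.size, len(raw)))
    context, subcontext, pid, uptime, text_len, name, text = PACKET.unpack(raw)
    return {
        "context": context,
        "subcontext": subcontext,
        "process_id": pid,
        "uptime": uptime,
        # The name is null-terminated; whatever follows the NUL is garbage.
        "hook_name": name.split(b"\0", 1)[0].decode("ascii", errors="replace"),
        # TextLength counts characters, so the byte length is twice that.
        "text": text[: text_len * 2].decode("utf-16-le", errors="replace"),
    }


# Synthetic message with made-up context, PID and tick count values
sample = PACKET.pack(0x401000, 0, 4321, 987654, 7,
                     b"UserHookQ\0junk", "Kotarou".encode("utf-16-le"))
print(parse_packet(sample))
```

Reading a fixed-size record and slicing the text by TextLength avoids depending on the contents of the unused tail of the buffer.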

With the communication between the application and the game figured out, we now need to find out how the text hooking module gets into the game and how it learns where and how to install the hooks.

Studying the loader

Run the game (or any other application), wait until it has fully loaded and attach the debugger to it. Next, open the list of modules, select kernel32, and in the list of functions set a breakpoint on every function whose name starts with LoadLibrary. The reason is that, one way or another, the DLL will ultimately be loaded by a call to one of these functions, and if you intercept that call you can walk the stack and get to the loader itself.

We resume the program. Then we run AGTH and point it at the game process:

 agth /PN_.exe

The debugger triggers right away. In my case the breakpoint fired on the LoadLibraryW function.
Let's look at the stack:

The second value is the function argument, while the first is the return address, and it leads somewhere inside kernel32. Strange, I expected to see the address of the loader code injected into the game there. Well, let's see what lies next to the LoadLibraryW argument. We go to 0x7EF80022, and here it is!

This is the loader we were looking for, and a rather tricky one, by the way: it consists of only 4 instructions (the data starts at address 0x7EF80014).

 7EF80000    68 1E00F87E    PUSH 7EF8001E                        ; UNICODE "0"
 7EF80005    68 1400F87E    PUSH 7EF80014                        ; UNICODE "AGTH"
 7EF8000A    68 121E4D75    PUSH kernel32.LoadLibraryW
 7EF8000F  - E9 CE9755F6    JMP kernel32.SetEnvironmentVariableW
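The encoding of these four instructions is simple enough to reproduce by hand: opcode 68 is PUSH imm32, and E9 is JMP rel32, whose operand is relative to the address of the next instruction. The Python sketch below rebuilds the bytes of the listing as an illustration. The function name build_loader is hypothetical, and the address 0x754D97E2 for SetEnvironmentVariableW is not shown in the listing but derived from the JMP displacement (0x7EF80014 + 0xF65597CE modulo 2^32).

```python
import struct


def build_loader(base, load_library_w, set_env_var_w, name="AGTH", value="0"):
    """Assemble the 4-instruction loader followed by its two UTF-16 strings."""
    code_size = 3 * 5 + 5                      # three PUSH imm32 + one JMP rel32
    name_bytes = (name + "\0").encode("utf-16-le")
    value_bytes = (value + "\0").encode("utf-16-le")
    name_addr = base + code_size               # data starts right after the code
    value_addr = name_addr + len(name_bytes)
    jmp_addr = base + 15
    # rel32 is counted from the end of the 5-byte JMP instruction
    rel32 = (set_env_var_w - (jmp_addr + 5)) & 0xFFFFFFFF
    code = (
        b"\x68" + struct.pack("<I", value_addr)        # PUSH value
        + b"\x68" + struct.pack("<I", name_addr)       # PUSH name
        + b"\x68" + struct.pack("<I", load_library_w)  # PUSH "return address"
        + b"\xE9" + struct.pack("<I", rel32)           # JMP SetEnvironmentVariableW
    )
    return code + name_bytes + value_bytes


# Addresses taken from the listing (SetEnvironmentVariableW derived from the JMP)
blob = build_loader(0x7EF80000, 0x754D1E12, 0x754D97E2)
print(blob[:20].hex())
```

With the addresses from the listing, the first 20 bytes come out identical to the dump, and the strings land at offsets 0x14 and 0x1E, matching the operands of the first two PUSH instructions.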

First, the arguments of SetEnvironmentVariableW('AGTH', '0') are pushed onto the stack, followed by the address of LoadLibraryW, which serves as the return address for SetEnvironmentVariableW, because the latter is reached not via CALL but via an unconditional JMP. “So that is why LoadLibraryW was called from somewhere in the depths of kernel32 and not from the loader!” I thought. But the question of what happens after LoadLibraryW finishes gave me no rest, so I decided to see where control returns after the call. We go to address 0x754D3677 and see:

 754D3677    50               PUSH EAX
 754D3678    FF15 F0064D75    CALL DWORD PTR DS:[<&ntdll.RtlExitUserThread>]  ; ntdll.RtlExitUserThread

Apparently, after LoadLibraryW returns, RtlExitUserThread is called with the value returned by LoadLibraryW as its parameter, and the remote thread ends successfully. Everything seemed fine, but one thought kept bothering me: “Where did this address on the stack come from, and where did the program get the address of the string holding the path to the injected DLL? There is nothing like that in the loader code!” It follows that someone put these addresses on the stack before the first loader instruction executed. And then it dawned on me: remote threads are created with the CreateRemoteThread function, which, in addition to a pointer to a function, also accepts a parameter for that function. That is, when the thread starts, the stack already holds a return address leading to RtlExitUserThread, so that the thread terminates correctly after RET, and right behind it one more value, the parameter.

Once again, in brief:

CreateRemoteThread places the return address leading to RtlExitUserThread and the path to the DLL on the stack and starts the loader
the loader pushes the arguments for SetEnvironmentVariableW and the address of LoadLibraryW onto the stack and makes an unconditional jump to SetEnvironmentVariableW
SetEnvironmentVariableW takes its arguments from the stack, and on return from it the thread lands at the beginning of LoadLibraryW
LoadLibraryW takes the path to the DLL from the stack, and on return from it the thread goes to RtlExitUserThread
RtlExitUserThread ends the thread
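The chain is easier to follow when replayed on a toy stack. The following Python sketch is only a model, not real injection code: each function is a callable that pops its return address and arguments the way an stdcall function does, and the names simulate and exit_stub as well as the DLL path are invented for the illustration.

```python
def simulate(dll_path):
    """Replay the remote thread's control flow on a toy stack (top = end of list)."""
    calls = []  # (function, arguments) in execution order
    trace = []  # every piece of code that control passes through

    def stdcall(name, nargs):
        # An stdcall function pops its return address and its own arguments.
        def body(stack):
            ret = stack.pop()
            args = [stack.pop() for _ in range(nargs)]
            calls.append((name, args))
            return ret  # RET: continue at whatever address was on top
        return body

    def loader(stack):
        stack.append("0")                  # PUSH value
        stack.append("AGTH")               # PUSH name
        stack.append("LoadLibraryW")       # PUSH fake return address
        return "SetEnvironmentVariableW"   # JMP, so no return address is pushed

    def exit_stub(stack):
        calls.append(("RtlExitUserThread", []))
        return None                        # the thread is gone

    code = {
        "loader": loader,
        "SetEnvironmentVariableW": stdcall("SetEnvironmentVariableW", 2),
        "LoadLibraryW": stdcall("LoadLibraryW", 1),
        "exit_stub": exit_stub,
    }
    # State prepared at thread start: parameter below, return address on top
    stack = [dll_path, "exit_stub"]
    target = "loader"
    while target is not None:
        trace.append(target)
        target = code[target](stack)
    return trace, calls, stack


trace, calls, stack = simulate(r"C:\tools\agth.dll")  # hypothetical path
print(trace)
print(calls)
```

The model ends with an empty stack, which is why the trick works: every value placed on the stack is consumed by exactly one of the functions in the chain.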

By the way, this kind of play with the stack, where a function after RET lands not in the code that called it but in another function, is the basic trick behind return-oriented programming, or simply ROP.

So, we have figured out how the module is injected and how parameters are passed to the target process: all parameters are passed through an environment variable named “AGTH”. It follows that, when writing your own loader, it is enough to set the environment variable and load the DLL.

Loader:

 // Layout of the code and data written into the target process
 TInject = packed record
   // code
   cmd0: BYTE;
   cmd1: BYTE; cmd1arg: DWORD;
   cmd2: BYTE; cmd2arg: DWORD;
   cmd3: WORD; cmd3arg: DWORD;
   cmd4: BYTE; cmd4arg: DWORD;
   cmd5: WORD; cmd5arg: DWORD;
   cmd6: BYTE; cmd6arg: DWORD;
   cmd7: WORD; cmd7arg: DWORD;
   // data
   pLoadLibrary: Pointer;
   pExitThread: Pointer;
   pSetEnvironmentVariableW: Pointer;
   ENVName: array [0 .. 4] of WideChar;
   ENVValue: array [0 .. MAX_PATH] of WideChar;
   LibraryPath: array [0 .. MAX_PATH] of WideChar;
 end;

 const
   // opcodes used by the loader
   PUSH: BYTE = $68;
   CALL_DWORD_PTR: WORD = $15FF;
   INT3: BYTE = $CC;
   NOP: BYTE = $90;

 { Injects the DLL into the target process }
 class function THooker.InjectDll(Process: DWORD; ModulePath, HCode: WideString): boolean;
 var
   Memory: Pointer;
   CodeBase: DWORD;
   BytesWritten: SIZE_T;
   ThreadId: DWORD;
   hThread: DWORD;
   hKernel32: DWORD;
   Inject: TInject;

   function RebasePtr(ptr: Pointer): DWORD;
   // Translates the address of a field of the local Inject record
   // into the address it will have inside the target process
   begin
     Result := CodeBase + DWORD(ptr) - DWORD(@Inject);
   end;

 begin
   Result := false;
   // Allocate memory in the target process
   // that will hold both the loader code and its data
   Memory := VirtualAllocEx(Process, nil, sizeof(Inject), MEM_TOP_DOWN or MEM_COMMIT, PAGE_EXECUTE_READWRITE);
   if Memory = nil then
     Exit;
   CodeBase := DWORD(Memory);
   hKernel32 := GetModuleHandle('kernel32.dll');
   // Build the loader:
   // fill the Inject record with code and data
   FillChar(Inject, sizeof(Inject), 0);
   with Inject do
   begin
     // code
     cmd0 := NOP;
     cmd1 := PUSH;
     cmd1arg := RebasePtr(@ENVValue);
     cmd2 := PUSH;
     cmd2arg := RebasePtr(@ENVName);
     cmd3 := CALL_DWORD_PTR;
     cmd3arg := RebasePtr(@pSetEnvironmentVariableW);
     cmd4 := PUSH;
     cmd4arg := RebasePtr(@LibraryPath);
     cmd5 := CALL_DWORD_PTR;
     cmd5arg := RebasePtr(@pLoadLibrary);
     cmd6 := PUSH;
     cmd6arg := 0;
     cmd7 := CALL_DWORD_PTR;
     cmd7arg := RebasePtr(@pExitThread);
     // data
     // The function addresses are taken from our own process:
     // kernel32.dll is mapped at the same ImageBase in every process,
     // so the addresses are valid in the target process as well
     pLoadLibrary := GetProcAddress(hKernel32, 'LoadLibraryW');
     pExitThread := GetProcAddress(hKernel32, 'ExitThread');
     pSetEnvironmentVariableW := GetProcAddress(hKernel32, 'SetEnvironmentVariableW');
     lstrcpy(@LibraryPath, PWideChar(ModulePath));
     lstrcpy(@ENVName, PWideChar('AGTH'));
     lstrcpy(@ENVValue, PWideChar(HCode));
   end;
   // Write the loader into the memory of the target process
   WriteProcessMemory(Process, Memory, @Inject, SIZE_T(sizeof(Inject)), BytesWritten);
   // Run it in a remote thread
   hThread := CreateRemoteThread(Process, nil, 0, Memory, nil, 0, ThreadId);
   if hThread = 0 then
   begin
     // do not leak the allocated block if the thread could not be created
     VirtualFreeEx(Process, Memory, 0, MEM_RELEASE);
     Exit;
   end;
   // Wait for the remote thread to finish and clean up
   WaitForSingleObject(hThread, INFINITE);
   CloseHandle(hThread);
   VirtualFreeEx(Process, Memory, 0, MEM_RELEASE);
   Result := true;
 end;

Now we need to deal with the parameters, or more precisely with how the program's command line, through which the H-code is specified, is converted into the value of that environment variable.
To avoid constantly poking around in the debugger, I wrote a stub library whose only job is to read the “AGTH” variable and display it for further study.

Stub code:

 library AGTH;

 uses
   windows;

 var
   buffer: array [0 .. 255] of widechar;

 begin
   GetEnvironmentVariableW('AGTH', buffer, 256);
   MessageBoxW(0, buffer, buffer, 0);
 end.

Next, having replaced the original DLL with the stub, I began to go through all the possible command-line switches and see how they map to the environment variable. It turned out to be easy.
A list of all the switches can be found in the help built into the original program. Of these, I was only interested in the hook options.

Hook options:

 /H[X]{A|B|W|S|Q}[N][data_offset[*drdo]][:sub_offset[*drso]]@addr[:module[:{name|#ordinal}]] - select OK for more help
 /NC - don't hook child processes
 /NH - no default hooks
 /NJ - use thread code page instead of Shift-JIS for non-unicode text (should be specified for capturing non-japanese text)
 /NS - don't use subcontexts
 /S[IP_address] - send text to custom computer (default parameter: local computer)
 /V - process text threads from system contexts
 /X[sets_mask] - extended sets of hooked functions (default parameter: 1; number of available sets: 2)

Then we simply pass arbitrary command-line parameters and see how they affect the result.
For example, the set of switches '/HQN54@48693e /NH /Slocalhost' turns into '20S0:localhostUQN54@48693e', and you can immediately see that the values of the /H and /S switches are passed as is. It also turned out that the prefixes U and S0: never change and disappear completely only when the corresponding /H and /S switches are absent. All other switches affect only the first two hexadecimal digits. After playing with the switches a little more, it became clear that these are bit flags, where each switch sets a separate bit in the byte that these two digits represent.

The result was the following table:

 switch  hex  binary
 /nh     20   10 0000
 /nc     10   01 0000
 /nj     08   00 1000
 /x3     06   00 0110   // combination of /x2 and /x
 /x2     04   00 0100
 /x      02   00 0010
 /V      01   00 0001
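Reading the table in the opposite direction is a handy way to check it against values observed in the stub. The sketch below, again in Python for illustration, turns the leading hex digits of the variable back into switches. The function name switches_from_flags is made up, and only the bits listed in the table are covered.

```python
# Single-bit switches from the table; the two hook-set bits are handled separately.
SWITCH_BITS = [(0x20, "/NH"), (0x10, "/NC"), (0x08, "/NJ"), (0x01, "/V")]
HOOK_SETS = {0x02: "/X", 0x04: "/X2", 0x06: "/X3"}


def switches_from_flags(hex_digits):
    """Map the leading hex digits of the AGTH variable back to switches."""
    value = int(hex_digits, 16)
    switches = [name for bit, name in SWITCH_BITS if value & bit]
    hook_sets = value & 0x06  # /X3 is simply both hook-set bits at once
    if hook_sets:
        switches.append(HOOK_SETS[hook_sets])
    return switches


print(switches_from_flags("20"))  # flags byte from the example command line
print(switches_from_flags("06"))
```

Treating the two hook-set bits as one field keeps /X3 from being reported as two separate switches.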

Command line to H-code conversion function:

 const
   PROCESS_SYSTEM_CONTEXT = $01;
   HOOK_SET_1 = $02;
   HOOK_SET_2 = $04;
   USE_THREAD_CODEPAGE = $08;
   NO_HOOK_CHILD = $10;
   NO_DEF_HOOKS = $20;

 class function THooker.GenerateHCode(AGTHcmd: string): string;
 var
   i: Integer;
   lcmd, uFlag, sFlag: string;
   flags: BYTE;
 begin
   lcmd := lowercase(AGTHcmd);
   flags := 0;
   if pos('/nh', lcmd) > 0 then
     flags := flags or NO_DEF_HOOKS;
   if pos('/nc', lcmd) > 0 then
     flags := flags or NO_HOOK_CHILD;
   if pos('/nj', lcmd) > 0 then
     flags := flags or USE_THREAD_CODEPAGE;
   if pos('/v', lcmd) > 0 then
     flags := flags or PROCESS_SYSTEM_CONTEXT;
   if pos('/x3', lcmd) > 0 then
     flags := flags or (HOOK_SET_1 or HOOK_SET_2)
   else if pos('/x2', lcmd) > 0 then
     flags := flags or HOOK_SET_2
   else if pos('/x', lcmd) > 0 then
     flags := flags or HOOK_SET_1;

   // copy the value of the /h switch and prefix it with U
   i := pos('/h', lcmd);
   if i > 0 then
   begin
     uFlag := copy(AGTHcmd, i, length(AGTHcmd) - (i - 1)); // /h -> endstr
     delete(uFlag, 1, 2); // del /h
     i := pos(' ', uFlag);
     if i > 0 then
       delete(uFlag, i, length(uFlag) - (i - 1));
     uFlag := 'U' + uFlag;
   end
   else
     uFlag := '';

   // copy the value of the /s switch and prefix it with S0:
   i := pos('/s', lcmd);
   if i > 0 then
   begin
     sFlag := copy(AGTHcmd, i, length(AGTHcmd) - (i - 1));
     delete(sFlag, 1, 2); // del /s
     i := pos(' ', sFlag);
     if i > 0 then
       delete(sFlag, i, length(sFlag) - (i - 1));
     sFlag := 'S0:' + sFlag;
   end
   else
     sFlag := '';

   Result := IntToHex(flags, 1) + sFlag + uFlag;
 end;

With that, the format of the parameters for the library has been worked out.

The end

That's all. Only a small matter remains: implementing my own interface and adding the features I need. What was done:

Subtitles displayed over the game using layered windows
Integration with Google Translate
JS user scripts for preprocessing text before translation

Writing the rest of the code is rather trivial, so I won't include it here and will just leave a link to GitHub.

Source: https://habr.com/ru/post/262311/
