
Protecting your own software from reverse engineering is a rather old problem, which once tormented the hearts of many shareware developers and not only. Usually, a protector is used for such purposes, but no matter how tough the protector is, there will always be people who will cut it and crack it. However, recently, protectors have begun to apply code modification technologies (mutation and virtualization), which allow us to make a mess out of the original algorithm that looks like a black box. And indeed there are people who are confident that the virtualization and mutation of the executable code by modern commercial protectors is a kind of panacea. It’s clear that any security person will rather grin and disagree with this statement, because people who know the bitter price of security will most likely perceive any hints of perfect protection as a myth and a marketing fairy tale. In this article I will talk about my own experience and vision of the study of the black box of commercial protectors and possible attacks on it. I hope an understanding of the shortcomings of such technologies will help you to put them into practice more rationally and efficiently or not at all.
0x00. Parsing code protection mechanisms
First, let's define the technologies that we will explore:
')
1. Mutation is an obfuscation method of code, in which the initial control flow graph is broken by additional vertices, branches, supplemented with garbage instructions, cycles, without violating the original algorithm of the program. Often, original instructions are mutated into a subset of other instructions that perform the same work.
2. Virtualization is a code obfuscation method, in which the initial instructions of the algorithm are translated into instructions of the virtual machine generated by the protector. A code is embedded in the place of the original algorithm, which at run time passes intermediate instructions to the input to the virtual machine interpreting them.
Both methods complicate both static and dynamic analysis of executable code, and often protectors allow for a combination of methods.
Further in the article I will consider free demo versions of two protectors:
VMProtect ,
Safengine , they allow you to mutate, virtualize and combine both obfuscation methods.
For the imposition of technology mutations and code virtualization protectors provide the following methods:
1. At the development stage, through special markers (SDK)
The developer of the software marks in the source code a fragment of the protected code with special functions from the SDK of the protector, then after compilation, during the installation of the protector, these areas will be detected, cut and obfuscated.
int main () { VMProtectMutate("Critical_code_mut"); ...
Example of marking source code with VMProtect markers2. At the protection stage, via debug files
Directly during the installation of the protector, debug files (pdb, map) are read and based on them the map of the application objects is determined. Next, the developer chooses which functions need to be protected and how, after which they are completely cut from the code and processed.
Why is the protected code cut? The fact is that when a code is diluted, its size increases significantly, and therefore it is impossible to fit new instructions in the original segment, therefore, cutting the code into its own memory section is applied.
A simplified mutation scheme for the executable code of the function void Test () {printf ("Hello"); }0x01. Advantages and disadvantages of protectors
The obvious advantage of the mutation is of course the impossibility of a visual study of the algorithm. At first glance, before a researcher, there is just a plate with a mountain of spaghetti, which is extremely difficult to disassemble into a manual one, which is what modern protectors place on bets.
Equally important is the combination of various techniques of anti-debugging, anti-patching, anti-cheating, mutation, etc. In the aggregate, all this leads to a complication of the analysis processes, but does not stop them.
The disadvantages of such technologies are also missing. The first of which is the loss of performance, because the mutable code is increased hundreds or even thousands of times. Not less it concerns also virtualization, usually the virtual computer is much heavier than the mutated code. And if you combine both approaches, you get manic bloated code. Moreover, overhead costs do not always justify such obfuscation, the use of mutating technologies is one-sidedly aimed at complicating the restoration of the original graph of program execution.
Here is an example of tracing a primitive function:
void test() { printf("This is protected message #1\n"); printf("This is protected message #2\n"); }
Before mutation:
Trace log main 00405A2A CALL 004059F0 ESP = 0018FEE0
main 004059F0 PUSH EBP ESP = 0018FEDC
main 004059F1 MOV EBP, ESP EBP = 0018FEDC
main 004059F3 SUB ESP, 40 ESP = 0018FE9C
main 004059F6 PUSH EBX ESP = 0018FE98
main 004059F7 PUSH ESI ESP = 0018FE94
main 004059F8 PUSH EDI ESP = 0018FE90
main 004059F9 PUSH OFFSET 0040ED10 ESP = 0018FE8C
main 004059FE CALL DWORD PTR DS: [<& MSVCR100D.printf>] EAX = 00000012, ECX = 84CB6CA9, EDX = 7418F4B8
main 00405A04 ADD ESP, 4 ESP = 0018FE90
main 00405A07 PUSH OFFSET 0040ECF8 ESP = 0018FE8C
main 00405A0C CALL DWORD PTR DS: [<& MSVCR100D.printf>] EAX = 00000014
main 00405A12 ADD ESP, 4 ESP = 0018FE90
main 00405A15 POP EDI ESP = 0018FE94
main 00405A16 POP ESI ESP = 0018FE98
main 00405A17 POP EBX ESP = 0018FE9C
main 00405A18 MOV ESP, EBP ESP = 0018FEDC
main 00405A1A POP EBP ESP = 0018FEE0, EBP = 0018FF30
main 00405A1B RETN ESP = 0018FEE4
After mutation by Safengine protector:
Trace log main 00405A2A CALL 004059F0 ESP = 0018FEE0
main 004059F0 JMP 004C9C6A
main 004C9C6A JMP 004C8391
main 004C8391 CALL 004C82D6 ESP = 0018FEDC
main 004C82D6 LEA ESP, [ESP + 2] ESP = 0018FEDE
main 004C82DA LEA ESP, [ESP + 2] ESP = 0018FEE0
main 004C82DE PUSH EBP ESP = 0018FEDC
main 004C82DF NEG BP EBP = 001800D0
main 004C82E2 JMP 004C812E
main 004C812E MOV EBP, ESP EBP = 0018FEDC
main 004C8130 STC
main 004C8131 SUB ESP, 40 ESP = 0018FE9C
main 004C8134 CALL 004C8006 ESP = 0018FE98
main 004C8006 JMP SHORT 004C7F96
main 004C7F96 LEA ESP, [ESP + 4] ESP = 0018FE9C
main 004C7F9A PUSH EBX ESP = 0018FE98
main 004C7F9B CALL 004C7F80 ESP = 0018FE94
main 004C7F80 LEA ESP, [ESP + 4] ESP = 0018FE98
main 004C7F84 PUSH ESI ESP = 0018FE94
main 004C7F85 JMP SHORT 004C7FB6
main 004C7FB6 PUSH EDI ESP = 0018FE90
main 004C7FB7 PUSH 0040ED10 ESP = 0018FE8C
main 004C7FBC CALL DWORD PTR DS: [40E19C] EAX = 00000012, ECX = 93D2AD8F, EDX = 7418F4B8
main 004C7FC2 JMP SHORT 004C7FA0
main 004C7FA0 STC
main 004C7FA1 JMP SHORT 004C7FDA
main 004C7FDA ADD ESP, 4 ESP = 0018FE90
main 004C7FDD CALL 004C7FC4 ESP = 0018FE8C
main 004C7FC4 LEA ESP, [ESP + 4] ESP = 0018FE90
main 004C7FC8 PUSH 0040ECF8 ESP = 0018FE8C
main 004C7FCD CALL 004C7FE2 ESP = 0018FE88
main 004C7FE2 MOV BYTE PTR SS: [ESP], CH
main 004C7FE5 JMP SHORT 004C7FEB
main 004C7FEB LEA ESP, [ESP + 4] ESP = 0018FE8C
main 004C7FEF CALL DWORD PTR DS: [40E19C] EAX = 00000014
main 004C7FF5 SETPE BH EBX = 7EFD0100
main 004C7FF8 XCHG BL, BH EBX = 7EFD0001
main 004C7FFA INC EBX EBX = 7EFD0002
main 004C7FFB JMP SHORT 004C805A
main 004C805A ADD ESP, 4 ESP = 0018FE90
main 004C805D POP EDI ESP = 0018FE94
main 004C805E MOV ESI, 4B536EDD ESI = 4B536EDD
main 004C8063 MOV SI, WORD PTR SS: [ESP] ESI = 4B530000
main 004C8067 JMP SHORT 004C801E
main 004C801E LEA EBX, [CDDFCA2F] EBX = CDDFCA2F
main 004C8024 POP ESI ESP = 0018FE98, ESI = 00000000
main 004C8025 CALL 004C8008 ESP = 0018FE94
main 004C8008 POP WORD PTR SS: [ESP] ESP = 0018FE96
main 004C800C MOV BX, WORD PTR SS: [ESP + 1] EBX = CDDF0000
main 004C8011 XCHG BYTE PTR SS: [ESP], BL EBX = CDDF004C
main 004C8014 JMP SHORT 004C8040
main 004C8040 LEA ESP, [ESP + 2] ESP = 0018FE98
main 004C8044 POP EBX EBX = 7EFDE000, ESP = 0018FE9C
main 004C8045 JMP SHORT 004C802A
main 004C802A MOV ESP, EBP ESP = 0018FEDC
main 004C802C LEA EBP, [EDI + EAX] EBP = 00000014
main 004C802F MOV BP, 3200 EBP = 00003200
main 004C8033 MOV EBP, CEF73787 EBP = CEF73787
main 004C8038 JMP 004C80ED
main 004C80ED POP EBP ESP = 0018FEE0, EBP = 0018FF30
main 004C80EE RETN ESP = 0018FEE4
-------- Logging stopped
In the given example, the code mutation is performed at the lowest level of complexity. Safengine allows you to increase this complexity up to 254 times, which ultimately can inflate your code of 10 instructions into a garbage collection that is a couple of times longer than the original program size, which is quite redundant.
Also to the disadvantage can be attributed cases of damage to the program, which unfortunately in my memory were. It is one thing if such failures occur in a normal program, which will lead to a fall, and completely different if a fall occurs in the driver, which is completely unacceptable. As it is known, protectors can process various executable files (exe, dll, ocx, sys).
Marketing policy also sometimes leaves much to be desired. Technical garbage in the ears of customers creating the illusion of security is not good. After all, the developers of protectors do not write in the descriptions of their products that this technology is good, but there is such and such a flaw in it.
0x02. The problem of incomplete code protection
Finally, we turn to the question, what prevents the protector developers from writing an even more complex and durable protection? The answer is quite simple - incomplete information. Receiving a binary file as input, even with debug information, there are a number of restrictions, the violation of which will lead to non-universality of the protector or damage to the protected application. For an example of such restrictions, take a look at the structure of a regular PE application:
Raw virtual name
------------------------------------
00000000 00400000 PE header
00000200 00401000 Code sector
00000400 00402000 Data sector
00000600 00403000 Resources Sector
The code and data in the application are placed in different sections, but there are clear links between them. Thus, several different functions can refer to the same data block, and data can refer to other data and functions. Moreover, this connection is not always obvious. On this basis, protectors cannot freely manipulate the structure of the executable file: move data, expand, move functions in the source segment, etc. Although once I tried to
expand sections of the executable file, but this is a special case that is not universal. Therefore, the protectors stack their data as follows:
Raw virtual name
------------------------------------
00000000 00400000 PE header
00000200 00401000 Code sector
00000400 00402000 Data sector
00000600 00403000 Protector sector
00000800 00404000 Resources Sector
The original location of the code and data does not change, although the movement of some other sectors is possible (resources, relocation, ...). The protected code is cut out and trash instructions are placed in its place, most often these instructions are part of the mutated execution graph. The mutated graph and the virtual machine are placed in the tread sector.
Also, due to an increase in computational load, protectors cannot mutate all application code. Therefore, the choice of protected areas of the code falls on the shoulders of the programmer. But the programmer cannot always apply this overlay wisely. For example, having covered some encryption algorithm with a mutation, the programmer will forget to mutate all the calls of this cipher, finding that the researcher will be able to get the input data of this algorithm and based on them build assumptions on the cipher device and maybe even classify or reproduce it.
All this leads to the leakage of information from the mutated \ virtualized code and allows you to perform attacks on it. Knowing the approximate location of data or some functions, we can track the access to them from the black box, thus, instead of unraveling the coil, we make up a phantom model of the algorithm. Of course, this approach is far from claiming to give us a complete picture of the algorithm of the program, but in some cases this is quite enough.
0x03. Fishing in a black box
A typical black box research tool is a tracer. However, the obfuscated code track consists of 80-99% of garbage, so we need to get only useful information from this garbage. This process is a bit like fishing. Imagine that the trail is a lake, a tracer rod, and the tracing conditions are this bait. Using the above disadvantages of treads, we can find the right bait and catch the right information. Let's see how it looks in practice.
Suppose we have the following program:
void array_fill(unsigned char *buf, size_t size) { for (int i = 0; i < size; i++) { buf[i] = i; if (i > 0) { buf[i] ^= buf[i - 1]; } } } int main() { unsigned char buf[10]; array_fill(buf, sizeof(buf)); return 0; }
Let the mutation and virtualization be superimposed on the array_fill () function. Let's trace the call to the array_fill () function:
Original steps:
230Number of steps after obfuscation VMProtect:
83924Number of steps after obfuscation Safengine:
250382As you can see from the figures, it’s simply unrealistic to disassemble the route manually. Therefore, we use the method of fishing.
Imagine that we know nothing about the array_fill () function. By examining main () we can say for sure that its sub-call accepts the address of the buffer and its size at the input, after which, according to some algorithm, the buffer is filled with information. Therefore, we will give our tracer a rule according to which we will log only read / write access to the buffer transferred to the function. The result for all three variants of applications will be the same:
Trace log main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7700, ECX = 0018FF34
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7701, ECX = 0018FF35
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF34
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF35
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7701, ECX = 0018FF35
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7702, ECX = 0018FF36
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF35
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF36
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7703, ECX = 0018FF36
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7703, ECX = 0018FF37
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF36
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF37
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7700, ECX = 0018FF37
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7704, ECX = 0018FF38
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF37
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF38
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7704, ECX = 0018FF38
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7705, ECX = 0018FF39
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF38
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF39
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7701, ECX = 0018FF39
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7706, ECX = 0018FF3A
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF39
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3A
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7707, ECX = 0018FF3A
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7707, ECX = 0018FF3B
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3A
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3B
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7700, ECX = 0018FF3B
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7708, ECX = 0018FF3C
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3B
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3C
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7708, ECX = 0018FF3C
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7709, ECX = 0018FF3D
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3C
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF3D
As can be seen, regardless of the degree of code mutation and the amount of garbage, we were able to get a clean route describing all calls to the buffer. But can we restore the original algorithm from it? Let's try.
And so, if you peer into the track, a cycle becomes visible (this can be seen from the repeated calls to the code at 004B7898):
; 1 step
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7700, ECX = 0018FF34
; 2 step
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7701, ECX = 0018FF35
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF34
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF35
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7701, ECX = 0018FF35
; 3 step
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7702, ECX = 0018FF36
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF35
main 004B7A8E MOV AL, BYTE PTR DS: [ECX] EAX = 004B7A89, ECX = 0018FF36
main 004B7898 MOV BYTE PTR DS: [ECX], AL EAX = 004B7703, ECX = 0018FF36
...
and there are only 10 such steps, which corresponds to the size of our buffer. Then everything is quite simple, knowing what values are taken and what are put back, our algorithm of work lies with us almost in the palm of your hand. The only thing you need to guess here is the application of the XOR operation, but in this case it is absolutely not difficult.
Of course, this example is artificial, but in practice we have to deal with more complex algorithms with nested calls and implicit logic, obtaining information about which can be ten times more complicated. In such situations, more complex tracers, deobfuscators, DBI, etc. are used. Nevertheless, it all comes down to extracting information from traders and analyzing it. Knowing the addresses that the algorithm can access, we collect enough useful information about it.
0x04. Analysis of nested calls
Equally important information that can be pulled out of the black box is information about nested calls. These can be WinAPI calls, library functions, functions of the application itself. Such information would help us to study in more detail the internal structure and dependencies of the protected algorithm.
In the simplest case, to analyze nested calls, we can use known information about the structure of the executable file. That is, knowing in which segment the mutated code is located, it is possible to track all exits from this segment that will correspond to the calls of external functions. And this will actually work for library functions, but problems may arise for calling the program's own functions. As I said somewhere above, cutting out the protected code, in its place the protector can put a part of its own code (a mutated graph, a piece of a virtual machine). Thus, if the execution of the code goes from the protector section to the application code section, there is no guarantee that this is a call to the nested function, and not the execution of a part of the mutated graph.
The solution to this problem is again quite probabilistic. First, we need a flexible tracer, for example, you can use the
Intel Pin framework, the last time I used the tracer built into OllyDbg. Secondly, we need to create competent tracing rules that will allow us to log only nested calls, excluding garbage located in the code section. If we want to define a call to a library function, we just need to create a rule that will fix the transfer of control beyond the memory to the module under study. In most cases, this will be the transfer of control to some library, although in some non-standard situations the transfer of control can be performed with some base-independent protector code. But we will not consider such special cases. However, what to do with nested calls in the program itself?
Alternatively, nested functions can be defined by the prolog signature. Compilers usually give functions a fairly patterned look, which you can hook on.
Template function prologueThat is, when returning control from the obfuscated code (tread section), we can check the presence of Int3 and Retn instructions located before the intended prologue in the application code section and also verify the prolog with pre-prepared signatures. This will not always help, since the compiler can stuff anything into the code, especially after optimization, but this is more than nothing.
I also noticed one small flaw in obfuscation with Safengine protector, which may be present in other protectors, but not in VMProtect. The disadvantage is that if the protector mutates some function and there is a call to another mutated function, the nested function is not called in the mutated code, but through an adapter (jmp instruction) which is located at its original address. This is also a leakage of information from the mutated code and can be used to create a tracer rule. Although in VMProtect, for example, the nested mutable function will be called immediately in the tread segment, not allowing us to define the nested call in this way. Perhaps this flaw exists only in the demo version of Safengine.
I did not begin to clog the article with the source code of the tracer on Intel Pin, but if you like the article, you can write a separate article about the tracer.
0x05. Recovery graph execution
We were all around and around, trying to avoid a complete analysis of the mutated code, but deobfuscation and devirtualization are far from a myth. Just in fact, such technologies are quite complex and are not written on the knee. I think experts from universities and anti-virus laboratories with might and main have already mastered this area and have a rich set of tools for their implementation. Unfortunately, I’m not very familiar with the practice of applying such technologies and I don’t have much to say about them, except that they exist and that it’s cool :)
If you have any knowledge and experience in this matter I would be grateful if you throw a couple of references in the comments.
0x06. Conclusion
As you can see, the black box actually turns out to be not so black. By collecting bits of information, we can partially restore the logic of the protected algorithm and sometimes in sufficient volume.
In general, I think it is best to unlock the potential of using virtualization technologies and mutations can only specialized compilers. After all, the compiler, unlike the protector, has almost complete information about the protected code and can easily manipulate the location and appearance of the protected code. If you implement a set of hints for the compiler, you can help the programmer to choose the degree of obfuscation complexity for certain functions and methods, which will ensure a more efficient distribution of protection and therefore the load.
Finally, if you want to test your skills in reverse obfuscated code, I suggest solving
this crackme .
Thanks for attention.