
Dynamic instrumentation is not just easy, it is trivial*: writing yet another instrumentation for American Fuzzy Lop

(*) Well, not quite.
Many of you have probably heard of Valgrind, a debugger that can tell you where your native program leaks memory, where a branch depends on an uninitialized variable, and much more (besides memcheck, it has other tools as well). Internally, this marvel translates native code into an intermediate bytecode, instruments it, and generates new machine code that already includes run-time checks. There is one problem, though: Valgrind does not work on Windows. When I needed something like it there, a search led me to a similar utility called DrMemory, which also ships with an strace analogue. But this article is not so much about them as about the dynamic instrumentation library they are built on, DynamoRIO. At some point I became interested in the library with a view to writing my own instrumentation, went looking for documentation, came across a large number of examples, and was amazed that a simple but complete instrumentation, such as counting call instructions, can be written in literally 237 lines of code, 32 of which are the license and 8 a description. No, this is certainly not "writing a Valgrind killer in 30 lines of JavaScript", but it is far simpler than you might imagine for such a task.


As an example, let's write the fourth implementation of instrumentation for the American Fuzzy Lop fuzzer, which was recently covered on Habr.


What is AFL


AFL is a guided fuzzing tool for finding bugs and vulnerabilities, assembled from rock-solid hacks and heuristics that are implemented in the most trivial and efficient way. You look at a tool that, merely by observing the behavior of libjpeg, can synthesize valid JPEGs, and you are amazed that all of this rests on such unsophisticated mechanics. In short, for AFL to work at full strength, the target binary must be instrumented to collect edge coverage during execution. Imagine each basic block (roughly, a sequence of instructions from a label to the nearest jump instruction) as a vertex of a graph. The edges are the possible transfers of control between basic blocks. AFL, accordingly, is interested in which transitions between the program's basic blocks occurred, and how many times.
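To make "edges, not just blocks" concrete, here is a minimal sketch in C. It is an illustration rather than anything taken from AFL: the four-block program, the `edge_hits` matrix and the `record_trace` helper are all made up for this example, and a real fuzzer would use something more compact than a dense matrix.

```c
#include <stddef.h>
#include <string.h>

#define N_BLOCKS 4  /* hypothetical program with four basic blocks */

/* edge_hits[a][b] counts how many times control went from block a to block b. */
static unsigned edge_hits[N_BLOCKS][N_BLOCKS];

/* Reset all counters and replay one execution trace (a sequence of block ids,
 * each assumed to be in the range 0..N_BLOCKS-1). */
static void record_trace(const int *trace, size_t len)
{
    memset(edge_hits, 0, sizeof edge_hits);
    for (size_t i = 1; i < len; ++i) {
        edge_hits[trace[i - 1]][trace[i]]++;
    }
}
```

For instance, the traces 0,1,2,3 and 0,1,2,1,2,3 visit exactly the same set of blocks, yet only the second one takes the edge 2 -> 1 and takes 1 -> 2 twice. That is the kind of difference edge coverage is meant to expose.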


The main instrumentation method in AFL is static, at compile time, using the afl-gcc / afl-g++ wrappers or their clang counterparts. Amusingly, afl-gcc substitutes a wrapper for the as command, and that wrapper rewrites the assembly listing generated by the compiler. There is a more advanced variant, llvm mode, which is properly built into the compilation process (performed with LLVM, naturally) and should therefore, in theory, yield faster generated code. Finally, for fuzzing already compiled binaries there is qemu mode: a patch for QEMU's user-mode emulation that adds the necessary instrumentation (this QEMU mode was originally intended for running individual processes built for another architecture on top of the host kernel).


What is DynamoRIO


DynamoRIO is a dynamic instrumentation system (that is, it instruments already compiled binaries directly at run time) that runs on Windows, Linux, and Android on x86 and x86_64, as well as on ARM (AArch64 support appears in the 7.0 release candidate). Unlike QEMU, it is not meant to execute a program faithfully on a foreign architecture, but to make it easy to write your own instrumentation that modifies a program's behavior on its native architecture. One of its goals is to avoid ruining optimized code where possible. Unfortunately, I did not find a way for clients (as user-written instrumentation libraries are called) to stay unaware of the target instruction set (except for trivial cases where the cross-platform wrappers for basic instructions suffice), because there is no "machine code -> bytecode -[instrumentation]-> new bytecode -> instrumented machine code" conversion. Instead, for each basic block being translated, the client receives a list of decoded instructions, which it can modify and extend using convenient functions and macros. So you do not have to program in raw machine code, but you will most likely need to know the x86 instruction set (or that of another platform).


As a small bonus: I went to see what else of interest they have in their GitHub account and came across a curious repository, DRK. It seems to be abandoned and somewhat out of date, but the description is impressive:


DRK is DynamoRIO as a loadable Linux Kernel module. When DRK is loaded, all kernel-mode execution (system calls, interrupt & exception handlers, kernel threads, etc.) [...] mode process and doesn't touch kernel-mode execution.

Test program


First, let's see what AFL is capable of. No, we will not take a vulnerable version of some library and wait for hours or days. For the test we will write the dumbest possible program, one that dereferences a null pointer if stdin supplies a string starting with the letters NULL. This is, of course, no synthesis of a JPEG out of thin air, but on the other hand there is almost no waiting.


So, download AFL and build it. As you have probably guessed, we will be building on GNU/Linux, although other Unix-like systems and Unixes such as Mac OS X should work too. Take a small program:


    #include <stdio.h>
    #include <string.h>

    volatile int *ptr = NULL;
    const char cmd[] = "NULL";

    int main(int argc, char *argv[])
    {
        char buf[16];
        fgets(buf, sizeof buf, stdin);
        if (strncmp(buf, cmd, 4)) {
            return 0;
        }
        *ptr = 1;
        return 0;
    }

Compile it and run fuzzing:


    $ export AFL_PATH=~/tmp/build/afl-2.42b/
    $ # Build the test program, instrumenting it along the way
    $ $AFL_PATH/afl-gcc example-bug-libc.c -o example-bug-libc
    $ # Prepare the test cases (just one)
    $ mkdir input
    $ echo test > input/1
    $ # Run the fuzzer
    $ $AFL_PATH/afl-fuzz -i input -o output -- ./example-bug-libc

And what we see:



Somehow it does not work... Note the last new path line: AFL complains that after 91 thousand runs it has not found a single new path. In fact, this seems quite logical: recall that we used static instrumentation applied at the assembler stage. The main comparison is performed by a libc function, which is not instrumented, so the number of matching characters cannot be counted. That is what I thought until I decided to check, and it turned out that our binary does not import strncmp at all. Judging by the output of objdump -d, the compiler simply emitted a single prefixed instruction (repz cmpsb) in place of strncmp, rather than a loop into which instrumentation could be squeezed.


Instrumented main function with strncmp
    00000000000007f0 <.plt.got>:
     7f0:  ff 25 82 17 20 00     jmpq *0x201782(%rip)  # 201f78 <getenv@GLIBC_2.2.5>
     7f6:  66 90                 xchg %ax,%ax
     7f8:  ff 25 8a 17 20 00     jmpq *0x20178a(%rip)  # 201f88 <_exit@GLIBC_2.2.5>
     7fe:  66 90                 xchg %ax,%ax
     800:  ff 25 8a 17 20 00     jmpq *0x20178a(%rip)  # 201f90 <write@GLIBC_2.2.5>
     806:  66 90                 xchg %ax,%ax
     808:  ff 25 8a 17 20 00     jmpq *0x20178a(%rip)  # 201f98 <__stack_chk_fail@GLIBC_2.4>
     80e:  66 90                 xchg %ax,%ax
     810:  ff 25 8a 17 20 00     jmpq *0x20178a(%rip)  # 201fa0 <close@GLIBC_2.2.5>
     816:  66 90                 xchg %ax,%ax
     818:  ff 25 8a 17 20 00     jmpq *0x20178a(%rip)  # 201fa8 <read@GLIBC_2.2.5>
     81e:  66 90                 xchg %ax,%ax
     820:  ff 25 92 17 20 00     jmpq *0x201792(%rip)  # 201fb8 <fgets@GLIBC_2.2.5>
     826:  66 90                 xchg %ax,%ax
     828:  ff 25 9a 17 20 00     jmpq *0x20179a(%rip)  # 201fc8 <waitpid@GLIBC_2.2.5>
     82e:  66 90                 xchg %ax,%ax
     830:  ff 25 a2 17 20 00     jmpq *0x2017a2(%rip)  # 201fd8 <shmat@GLIBC_2.2.5>
     836:  66 90                 xchg %ax,%ax
     838:  ff 25 a2 17 20 00     jmpq *0x2017a2(%rip)  # 201fe0 <atoi@GLIBC_2.2.5>
     83e:  66 90                 xchg %ax,%ax
     840:  ff 25 aa 17 20 00     jmpq *0x2017aa(%rip)  # 201ff0 <__cxa_finalize@GLIBC_2.2.5>
     846:  66 90                 xchg %ax,%ax
     848:  ff 25 aa 17 20 00     jmpq *0x2017aa(%rip)  # 201ff8 <fork@GLIBC_2.2.5>
     84e:  66 90                 xchg %ax,%ax
    ...
    0000000000000850 <main>:
     850:  48 8d a4 24 68 ff ff  lea -0x98(%rsp),%rsp
     857:  ff
     858:  48 89 14 24           mov %rdx,(%rsp)
     85c:  48 89 4c 24 08        mov %rcx,0x8(%rsp)
     861:  48 89 44 24 10        mov %rax,0x10(%rsp)
     866:  48 c7 c1 04 6a 00 00  mov $0x6a04,%rcx
     86d:  e8 9e 02 00 00        callq b10 <__afl_maybe_log>
     872:  48 8b 44 24 10        mov 0x10(%rsp),%rax
     877:  48 8b 4c 24 08        mov 0x8(%rsp),%rcx
     87c:  48 8b 14 24           mov (%rsp),%rdx
     880:  48 8d a4 24 98 00 00  lea 0x98(%rsp),%rsp
     887:  00
     888:  53                    push %rbx
     889:  be 10 00 00 00        mov $0x10,%esi
     88e:  48 83 ec 20           sub $0x20,%rsp
     892:  48 8b 15 77 17 20 00  mov 0x201777(%rip),%rdx  # 202010 <stdin@@GLIBC_2.2.5>
     899:  48 89 e7              mov %rsp,%rdi
     89c:  64 48 8b 04 25 28 00  mov %fs:0x28,%rax
     8a3:  00 00
     8a5:  48 89 44 24 18        mov %rax,0x18(%rsp)
     8aa:  31 c0                 xor %eax,%eax
     8ac:  e8 6f ff ff ff        callq 820 <.plt.got+0x30>
     8b1:  48 8d 3d dc 06 00 00  lea 0x6dc(%rip),%rdi  # f94 <cmd>
     8b8:  b9 04 00 00 00        mov $0x4,%ecx
     8bd:  48 89 e6              mov %rsp,%rsi
     8c0:  f3 a6                 repz cmpsb %es:(%rdi),%ds:(%rsi)
     8c2:  75 45                 jne 909 <main+0xb9>
     8c4:  48 8d a4 24 68 ff ff  lea -0x98(%rsp),%rsp
     8cb:  ff
     8cc:  48 89 14 24           mov %rdx,(%rsp)
     8d0:  48 89 4c 24 08        mov %rcx,0x8(%rsp)
     8d5:  48 89 44 24 10        mov %rax,0x10(%rsp)
     8da:  48 c7 c1 2d 5b 00 00  mov $0x5b2d,%rcx
     8e1:  e8 2a 02 00 00        callq b10 <__afl_maybe_log>
     8e6:  48 8b 44 24 10        mov 0x10(%rsp),%rax
     8eb:  48 8b 4c 24 08        mov 0x8(%rsp),%rcx
     8f0:  48 8b 14 24           mov (%rsp),%rdx
     8f4:  48 8d a4 24 98 00 00  lea 0x98(%rsp),%rsp
     8fb:  00
     8fc:  48 8b 05 1d 17 20 00  mov 0x20171d(%rip),%rax  # 202020 <ptr>
     903:  c7 00 01 00 00 00     movl $0x1,(%rax)
     909:  0f 1f 00              nopl (%rax)
     90c:  48 8d a4 24 68 ff ff  lea -0x98(%rsp),%rsp
     913:  ff
     914:  48 89 14 24           mov %rdx,(%rsp)
     918:  48 89 4c 24 08        mov %rcx,0x8(%rsp)
     91d:  48 89 44 24 10        mov %rax,0x10(%rsp)
     922:  48 c7 c1 8f 33 00 00  mov $0x338f,%rcx
     929:  e8 e2 01 00 00        callq b10 <__afl_maybe_log>
     92e:  48 8b 44 24 10        mov 0x10(%rsp),%rax
     933:  48 8b 4c 24 08        mov 0x8(%rsp),%rcx
     938:  48 8b 14 24           mov (%rsp),%rdx
     93c:  48 8d a4 24 98 00 00  lea 0x98(%rsp),%rsp
     943:  00
     944:  31 c0                 xor %eax,%eax
     946:  48 8b 54 24 18        mov 0x18(%rsp),%rdx
     94b:  64 48 33 14 25 28 00  xor %fs:0x28,%rdx
     952:  00 00
     954:  75 40                 jne 996 <main+0x146>
     956:  66 90                 xchg %ax,%ax
     958:  48 8d a4 24 68 ff ff  lea -0x98(%rsp),%rsp
     95f:  ff
     960:  48 89 14 24           mov %rdx,(%rsp)
     964:  48 89 4c 24 08        mov %rcx,0x8(%rsp)
     969:  48 89 44 24 10        mov %rax,0x10(%rsp)
     96e:  48 c7 c1 0a 7d 00 00  mov $0x7d0a,%rcx
     975:  e8 96 01 00 00        callq b10 <__afl_maybe_log>
     97a:  48 8b 44 24 10        mov 0x10(%rsp),%rax
     97f:  48 8b 4c 24 08        mov 0x8(%rsp),%rcx
     984:  48 8b 14 24           mov (%rsp),%rdx
     988:  48 8d a4 24 98 00 00  lea 0x98(%rsp),%rsp
     98f:  00
     990:  48 83 c4 20           add $0x20,%rsp
     994:  5b                    pop %rbx
     995:  c3                    retq
     996:  66 90                 xchg %ax,%ax
     998:  48 8d a4 24 68 ff ff  lea -0x98(%rsp),%rsp
     99f:  ff
     9a0:  48 89 14 24           mov %rdx,(%rsp)
     9a4:  48 89 4c 24 08        mov %rcx,0x8(%rsp)
     9a9:  48 89 44 24 10        mov %rax,0x10(%rsp)
     9ae:  48 c7 c1 a8 dc 00 00  mov $0xdca8,%rcx
     9b5:  e8 56 01 00 00        callq b10 <__afl_maybe_log>
     9ba:  48 8b 44 24 10        mov 0x10(%rsp),%rax
     9bf:  48 8b 4c 24 08        mov 0x8(%rsp),%rcx
     9c4:  48 8b 14 24           mov (%rsp),%rdx
     9c8:  48 8d a4 24 98 00 00  lea 0x98(%rsp),%rsp
     9cf:  00
     9d0:  e8 33 fe ff ff        callq 808 <.plt.got+0x18>
     9d5:  66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
     9dc:  00 00 00
     9df:  90                    nop

The same, but without instrumentation
    0000000000000630 <.plt.got>:
     630:  ff 25 82 09 20 00     jmpq *0x200982(%rip)  # 200fb8 <strncmp@GLIBC_2.2.5>
     636:  66 90                 xchg %ax,%ax
     638:  ff 25 8a 09 20 00     jmpq *0x20098a(%rip)  # 200fc8 <__stack_chk_fail@GLIBC_2.4>
     63e:  66 90                 xchg %ax,%ax
     640:  ff 25 92 09 20 00     jmpq *0x200992(%rip)  # 200fd8 <fgets@GLIBC_2.2.5>
     646:  66 90                 xchg %ax,%ax
     648:  ff 25 aa 09 20 00     jmpq *0x2009aa(%rip)  # 200ff8 <__cxa_finalize@GLIBC_2.2.5>
     64e:  66 90                 xchg %ax,%ax
    ...
    0000000000000780 <main>:
     780:  55                    push %rbp
     781:  48 89 e5              mov %rsp,%rbp
     784:  48 83 ec 30           sub $0x30,%rsp
     788:  89 7d dc              mov %edi,-0x24(%rbp)
     78b:  48 89 75 d0           mov %rsi,-0x30(%rbp)
     78f:  64 48 8b 04 25 28 00  mov %fs:0x28,%rax
     796:  00 00
     798:  48 89 45 f8           mov %rax,-0x8(%rbp)
     79c:  31 c0                 xor %eax,%eax
     79e:  48 8b 15 6b 08 20 00  mov 0x20086b(%rip),%rdx  # 201010 <stdin@@GLIBC_2.2.5>
     7a5:  48 8d 45 e0           lea -0x20(%rbp),%rax
     7a9:  be 10 00 00 00        mov $0x10,%esi
     7ae:  48 89 c7              mov %rax,%rdi
     7b1:  e8 8a fe ff ff        callq 640 <.plt.got+0x10>
     7b6:  48 8d 45 e0           lea -0x20(%rbp),%rax
     7ba:  ba 04 00 00 00        mov $0x4,%edx
     7bf:  48 8d 35 ce 00 00 00  lea 0xce(%rip),%rsi  # 894 <cmd>
     7c6:  48 89 c7              mov %rax,%rdi
     7c9:  e8 62 fe ff ff        callq 630 <.plt.got>
     7ce:  85 c0                 test %eax,%eax
     7d0:  74 07                 je 7d9 <main+0x59>
     7d2:  b8 00 00 00 00        mov $0x0,%eax
     7d7:  eb 12                 jmp 7eb <main+0x6b>
     7d9:  48 8b 05 40 08 20 00  mov 0x200840(%rip),%rax  # 201020 <ptr>
     7e0:  c7 00 01 00 00 00     movl $0x1,(%rax)
     7e6:  b8 00 00 00 00        mov $0x0,%eax
     7eb:  48 8b 4d f8           mov -0x8(%rbp),%rcx
     7ef:  64 48 33 0c 25 28 00  xor %fs:0x28,%rcx
     7f6:  00 00
     7f8:  74 05                 je 7ff <main+0x7f>
     7fa:  e8 39 fe ff ff        callq 638 <.plt.got+0x8>
     7ff:  c9                    leaveq
     800:  c3                    retq
     801:  66 2e 0f 1f 84 00 00  nopw %cs:0x0(%rax,%rax,1)
     808:  00 00 00
     80b:  0f 1f 44 00 00        nopl 0x0(%rax,%rax,1)

Interestingly, it seems that AFL itself turned the optimization on, because the non-instrumented code honestly calls strncmp. As for the bloated PLT in the code compiled by afl-gcc, those are apparently the functions called by the forkserver; we will write one too, all in good time. For now, let's pretend we have not seen this and naively rewrite our example without library functions:


    #include <stdio.h>

    volatile int *ptr = NULL;
    const char cmd[] = "NULL";

    int main(int argc, char *argv[])
    {
        char buf[16];
        fgets(buf, sizeof buf, stdin);
        for (int i = 0; i < sizeof cmd - 1; ++i) {
            if (buf[i] != cmd[i])
                return 0;
        }
        *ptr = 1;
        return 0;
    }

Compile, run, and... ta-da!



Writing our own forkserver


As I said, AFL assumes instrumentation that collects information about transitions between the basic blocks of the program under test. But afl-gcc adds one more optimization: the forkserver. The idea is to avoid restarting the program with a fork + execve pair every time (redoing dynamic linking and so on); instead, the fuzzer process forks once and execs the instrumented program. How, then, do we rerun the program under test? The trick is that this launched process is not the one being tested. It runs code that, in an infinite loop, waits for a command from the fuzzer, forks, waits for the child process to finish, and reports the result back. It is the forked process that actually handles the input data and collects edge coverage information. In the case of afl-gcc, if I understand its logic correctly, the forkserver starts on the first execution of instrumented code. llvm mode additionally supports a deferred forkserver, which lets you skip not only dynamic linking but also the initialization that is the same for every run of the program under test. This can potentially speed up the whole process by orders of magnitude, but there are some nuances to keep in mind.



llvm mode also supports persistent mode, in which several test cases are executed in a row within the same forked process. This probably reduces overhead even further, but there is a danger that the program's result on the next test case will depend not only on that test case but also on the history of previous runs. By the way, after I had started writing this article, I learned that the Windows port of AFL also uses DynamoRIO for instrumentation, and that it relies on persistent mode. Well, what else can it do without fork support?


So, we need to write our own forkserver on top of DynamoRIO, but first we have to understand the protocol it must speak with the fuzzer process. Some of it can be found on the AFL author's blog, but it is easier to open the file llvm_mode/afl-llvm-rt.o.c in the AFL sources. There is a lot of interesting stuff in it, but we will start with the function __afl_start_forkserver, where everything is described in detail, comments included. We are not interested in persistent mode, and the rest is fairly clear: we are given two file descriptors with known numbers; we read from one and write to the other. We need to:


  1. Quoting the source: "Phone home and tell the parent that we're OK". Write any 4 bytes.
  2. Read 4 bytes. Since in our case (without persistent mode) the invariant child_stopped == 0 appears to hold, the value we read does not matter.
  3. Fork the child process and, in the child, close the file descriptors used to talk to the fuzzer.
  4. Write 4 bytes containing the child's PID to the file descriptor and wait for the child to finish.
  5. Having waited, write another 4 bytes containing the exit status and go back to step 2.
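The five steps above can be sketched in plain C roughly as follows. This is a simplified illustration, not the AFL runtime and not the DynamoRIO client: the function name `forkserver_loop` and its return-value convention are assumptions made for this example, and the descriptors are passed in as parameters, whereas the client code in this article hard-codes the numbers 198 and 199.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the forkserver protocol on the given descriptors.
 * Returns 0 in a freshly forked child (which should go on to run the target)
 * and -1 in the parent once the fuzzer side goes away or an error occurs. */
static int forkserver_loop(int from_fuzzer_fd, int to_fuzzer_fd)
{
    uint32_t msg = 0;

    /* Step 1: tell the fuzzer we are alive; the 4 bytes are arbitrary. */
    if (write(to_fuzzer_fd, &msg, 4) != 4) return -1;

    for (;;) {
        /* Step 2: wait for the "spawn" command; its value is ignored here. */
        if (read(from_fuzzer_fd, &msg, 4) != 4) return -1;

        /* Step 3: fork; the child drops the control descriptors. */
        pid_t child = fork();
        if (child < 0) return -1;
        if (child == 0) {
            close(from_fuzzer_fd);
            close(to_fuzzer_fd);
            return 0;
        }

        /* Step 4: report the child's PID, then wait for it to finish. */
        int32_t pid32 = (int32_t)child;
        int status = 0;
        if (write(to_fuzzer_fd, &pid32, 4) != 4) return -1;
        if (waitpid(child, &status, 0) < 0) return -1;

        /* Step 5: report the wait status and loop back to step 2. */
        if (write(to_fuzzer_fd, &status, 4) != 4) return -1;
    }
}
```

A caller would invoke this once at the chosen start point: a return value of 0 means "you are the child, carry on with the test case", while the parent never leaves the loop during normal operation.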

And now we have finally come to writing our client. A digression is in order here: although the examples in the documentation take only a couple of hundred lines, the documentation itself is still worth reading. You can start with the section that describes, among other things, what you should not do. To paraphrase a well-known literary character, "an instrumentation client is a very strange thing: it is there, and yet it is as if it were not". The documentation calls this client transparency: in particular, you need to either use your own copies of the system libraries (the private loader helps with that) or use the DynamoRIO API (memory allocation, command-line option parsing, and much more). Important information also lives in the documentation of individual API functions: for example, the description of dr_register_bb_event contains a short list of 11 requirements that the instruction sequence produced by instrumentation must satisfy.


The recommended way to build a DynamoRIO client is CMake, so that is what we will use. How to do it is described in the documentation; we move on to more interesting questions. For example, to implement a deferred forkserver we need to mark the place in the program under test where it should start, and then find that mark from DynamoRIO (admittedly, nothing seems to prevent the forkserver from being invoked as an ordinary function call inside the program under test, but that is not as interesting, is it?). Fortunately, this functionality is built into DynamoRIO under the name annotations. The client developer builds a special static library to be linked into the program under test, and at the required place the program calls a function from that library. The function's instruction sequence does nothing interesting during normal execution, but when run under DynamoRIO it is recognized and replaced with a given constant or function call.


I will not retell the official tutorial; suffice it to say that you are expected to copy a header file from the DynamoRIO distribution and adapt the "invocations" of two macros to the name and signature of your annotation (this file has to be included by the program under test), and to create a source file with one more macro invocation that generates the implementation of the annotation stub (it is built into a static library that must be linked into the program under test).


The forkserver implementation is also fairly straightforward, except that calling fork from our private copy of libc is probably not quite right: for example, in the above-mentioned article on forkserver internals, the AFL author notes that libc caches the PID. If fork is called from the application's own copy of libc, that copy will know the fork has happened, and DynamoRIO, I would like to believe, will notice it too. So I had to write something like


    module_data_t *module = dr_lookup_module_by_name("libc.so.6");
    EXIT_IF_FAILED(module != NULL, "Cannot lookup libc.\n", 1)
    fork_fun_t fork_ptr = (fork_fun_t)dr_get_proc_address(module->handle, "fork");
    EXIT_IF_FAILED(fork_ptr != NULL, "Cannot get fork function from libc.\n", 1)
    dr_free_module_data(module);

Consequently, to support the traditional mode, where the forkserver starts as soon as the program does, we would need to make sure that libc is already loaded at that point.


, : , forkserver — . dr_client_main — . . … . nm -D : , dr_client_main . — … drrun -verbose -debug . , , , . QtCreator .../build-afl-dr-Desktop_3fb6e5- , "" — .../build-afl-dr-Desktop- . , . , , , , , , , … (, - .)


Testing:


    $ ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so -- ./example-bug
    Running forkserver...
    Cannot connect to fuzzer.
    1
    $ ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so -- ./example-bug 198<&0 199>/dev/null
    Running forkserver...
    xxxx
    1
    Incorrect spawn command from fuzzer.

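Well, something is working. Now let's try it with the real AFL. For a start we will use dumb mode (no instrumentation), but with the forkserver enabled: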


    $ AFL_DUMB_FORKSRV=1 $AFL_PATH/afl-fuzz -i input -o output -n -- ./example-bug
    ... Fork server handshake failed ...
    $ AFL_DUMB_FORKSRV=1 $AFL_PATH/afl-fuzz -i input -o output -n -- ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so -- ./example-bug
    ... Timeout while initializing fork server (adjusting -t may help)
    $ # After some digging with strace...
    $ AFL_DUMB_FORKSRV=1 $AFL_PATH/afl-fuzz -i input -o output -n -m 2048 -- ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so -- ./example-bug
    ... it works: the memory limit (-m) was to blame


, . .


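One catch: with the forkserver running, the target sometimes dies with SIGSEGV. This seems to be related to calling fork from the application's libc, so the client can be told to use fork from its private libc instead. For command-line options DynamoRIO provides the droption extension: you declare variables of type droption_t<T>, and then call droption_parser_t::parse_argv(...) from dr_client_main (so yes, the client is now C++).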


The resulting code:


afl-annotations.h
    // Based on dr_annotations.h from DynamoRIO sources
    #ifndef _AFL_DR_ANNOTATIONS_H_
    #define _AFL_DR_ANNOTATIONS_H_ 1

    #include "annotations/dr_annotations_asm.h"

    /* To simplify project configuration, this pragma excludes the file from GCC warnings. */
    #ifdef __GNUC__
    # pragma GCC system_header
    #endif

    #define RUN_FORKSERVER() \
        DR_ANNOTATION(run_forkserver)

    #ifdef __cplusplus
    extern "C" {
    #endif

    DR_DECLARE_ANNOTATION(void, run_forkserver, ());

    #ifdef __cplusplus
    }
    #endif

    #endif

afl-annotations.c
    #include "afl-annotations.h"

    DR_DEFINE_ANNOTATION(void, run_forkserver, (), );

afl-dr.c
    #include <dr_api.h>
    #include <droption.h>

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #include "afl-annotations.h"

    static const int FROM_FUZZER_FD = 198;
    static const int TO_FUZZER_FD = 199;

    typedef int (*fork_fun_t)();

    #define EXIT_IF_FAILED(isOk, msg, code) \
        if (!(isOk)) { \
            dr_fprintf(STDERR, (msg)); \
            dr_exit_process((code)); \
        }

    static droption_t<bool> opt_private_fork(DROPTION_SCOPE_CLIENT, "private-fork", false,
        "Use fork function from the private libc",
        "Use fork function from the private libc");

    static void parse_options(int argc, const char *argv[]) {
        std::string parse_err;
        if (!droption_parser_t::parse_argv(DROPTION_SCOPE_CLIENT, argc, argv, &parse_err, NULL)) {
            dr_fprintf(STDERR, "Incorrect client options: %s\n", parse_err.c_str());
            dr_exit_process(1);
        }
    }

    static void start_forkserver() {
        // For references, see https://lcamtuf.blogspot.ru/2014/10/fuzzing-binaries-without-execve.html
        // and __afl_start_forkserver in llvm_mode/afl-llvm-rt.o.c from AFL sources
        static bool forkserver_is_running = false;
        uint32_t unused_four_bytes = 0;
        uint32_t was_killed;

        if (!forkserver_is_running) {
            dr_printf("Running forkserver...\n");
            forkserver_is_running = true;
        } else {
            dr_printf("Warning: Attempt to re-run forkserver ignored.\n");
            return;
        }

        if (write(TO_FUZZER_FD, &unused_four_bytes, 4) != 4) {
            dr_printf("Cannot connect to fuzzer.\n");
            return;
        }

        fork_fun_t fork_ptr;
        // Lookup the fork function from target application, so both DynamoRIO
        // and application's copy of libc know about fork
        // Currently causes crashes sometimes, in that case use the private libc's fork.
        if (!opt_private_fork.get_value()) {
            module_data_t *module = dr_lookup_module_by_name("libc.so.6");
            EXIT_IF_FAILED(module != NULL, "Cannot lookup libc.\n", 1)
            fork_ptr = (fork_fun_t)dr_get_proc_address(module->handle, "fork");
            EXIT_IF_FAILED(fork_ptr != NULL, "Cannot get fork function from libc.\n", 1)
            dr_free_module_data(module);
        } else {
            fork_ptr = fork;
        }

        while (true) {
            EXIT_IF_FAILED(read(FROM_FUZZER_FD, &was_killed, 4) == 4, "Incorrect spawn command from fuzzer.\n", 1)
            int child_pid = fork_ptr();
            EXIT_IF_FAILED(child_pid >= 0, "Cannot fork.\n", 1)
            if (child_pid == 0) {
                close(TO_FUZZER_FD);
                close(FROM_FUZZER_FD);
                return;
            } else {
                int status;
                EXIT_IF_FAILED(write(TO_FUZZER_FD, &child_pid, 4) == 4, "Cannot write child PID.\n", 1)
                EXIT_IF_FAILED(waitpid(child_pid, &status, 0) >= 0, "Wait for child failed.\n", 1)
                EXIT_IF_FAILED(write(TO_FUZZER_FD, &status, 4) == 4, "Cannot write child exit status.\n", 1)
            }
        }
    }

    DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[]) {
        parse_options(argc, argv);
        EXIT_IF_FAILED(
            dr_annotation_register_call("run_forkserver", (void *)start_forkserver, false, 0, DR_ANNOTATION_CALL_TYPE_FASTCALL),
            "Cannot register forkserver annotation.\n", 1);
    }


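Now for the instrumentation itself. First we need to find out what AFL expects from it. What afl-gcc inserts can be seen in afl-as.c and afl-as.h. The AFL documentation also describes it, in the technical details document. In essence, the inserted code is equivalent to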


    cur_location = <COMPILE_TIME_RANDOM>;
    shared_mem[cur_location ^ prev_location]++;
    prev_location = cur_location >> 1;

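(the snippet is taken from the AFL documentation). The shift by 1 is there so that the direction of a transition can be told apart: without it, A -> B and B -> A would land in the same cell. The map is 64 KB of shared memory; how to obtain it can be seen in llvm_mode/afl-llvm-rt.o.c.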
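To see what the shift in the pseudocode above buys, here is a small stand-alone sketch in C. It is not AFL source: `visit` and `reset_map` are hypothetical helper names, the block identifiers are picked by hand, and `MAP_SIZE` is set to 64 KiB, AFL's default map size.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE (1u << 16)  /* 64 KiB, AFL's default map size */

static uint8_t  shared_mem[MAP_SIZE];
static uint32_t prev_location;

/* Called on entry to a basic block with that block's id (below MAP_SIZE). */
static void visit(uint32_t cur_location)
{
    shared_mem[(cur_location ^ prev_location) % MAP_SIZE]++;
    prev_location = cur_location >> 1;
}

/* Hypothetical helper: clear all state between two demo runs. */
static void reset_map(void)
{
    memset(shared_mem, 0, sizeof shared_mem);
    prev_location = 0;
}
```

With blocks A = 0x1234 and B = 0x4321, the transition A -> B increments cell 0x4A3B, while B -> A increments cell 0x33A4. Without the shift both would hit A ^ B = 0x5115, and every tight loop A -> A would collapse into cell 0.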


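Now let's do the same with DynamoRIO. The hook we need is dr_register_bb_event. To keep threads from interfering with one another, each thread gets its own thread-local map, which is merged into the shared one when the thread exits. So, to begin with, we register the events: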


    // In dr_client_main:
    lock = dr_mutex_create();
    dr_register_thread_init_event(event_thread_init);
    dr_register_thread_exit_event(event_thread_exit);
    dr_register_bb_event(event_basic_block);
    dr_register_exit_event(event_exit);

The thread event handlers:


    typedef struct {
        uint64_t scratch;
        uint8_t map[MAP_SIZE];
    } thread_data;

    static void event_thread_init(void *drcontext) {
        void *data = dr_thread_alloc(drcontext, sizeof(thread_data));
        memset(data, 0, sizeof(thread_data));
        dr_set_tls_field(drcontext, data);
    }

    static void event_thread_exit(void *drcontext) {
        thread_data *data = (thread_data *) dr_get_tls_field(drcontext);
        dr_mutex_lock(lock);
        for (int i = 0; i < MAP_SIZE; ++i) {
            shmem[i] += data->map[i];
        }
        dr_mutex_unlock(lock);
        dr_thread_free(drcontext, data, sizeof(thread_data));
    }

Two things are worth noting. First, the memory is allocated through DynamoRIO, from a thread-specific memory pool. Second, the pointer to thread_data is stored in the TLS field.
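The handlers above accumulate into shmem, which has to point at the map shared with afl-fuzz. A minimal sketch of obtaining it in plain C could look like this, assuming the System V segment id arrives in the __AFL_SHM_ID environment variable (which is how llvm_mode/afl-llvm-rt.o.c finds it) and falling back to a private dummy map otherwise. The names `attach_coverage_map` and `dummy_map` are hypothetical, and a real DynamoRIO client may have to read the environment and call shmat differently because of client transparency.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/shm.h>

#define SHM_ENV_VAR "__AFL_SHM_ID"
#define MAP_SIZE (1u << 16)

/* Used when we are not running under afl-fuzz, so coverage goes nowhere. */
static uint8_t dummy_map[MAP_SIZE];

/* Returns the coverage map shared with the fuzzer, or dummy_map on failure. */
static uint8_t *attach_coverage_map(void)
{
    const char *id_str = getenv(SHM_ENV_VAR);
    if (id_str == NULL) {
        return dummy_map;
    }
    void *map = shmat(atoi(id_str), NULL, 0);
    if (map == (void *)-1) {
        return dummy_map;
    }
    return (uint8_t *)map;
}
```

Falling back to a dummy map keeps the client usable when it is started by hand, outside afl-fuzz, which is convenient while debugging.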


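Now the most interesting part: event_basic_block(void *drcontext, void *tag, instrlist_t *bb, bool for_trace, bool translating), which is called for basic blocks as they are translated. Its main argument is instrlist_t *bb, the list of instructions. The instrumentation below targets amd64, aka x86_64. The DynamoRIO documentation for dr_register_bb_event is worth reading here. In particular, this is what it says about the basic block: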


DR constructs dynamic basic blocks, which are distinct from a compiler's classic basic blocks. DR does not know all entry points ahead of time, and will end up duplicating the tail of a basic block if a later entry point is discovered that targets the middle of a block created earlier, or if a later entry point targets straight-line code that falls through into code already present in a block.

, , — !


Let's start with a skeleton that saves and restores the flags and the registers we are going to use:


    static dr_emit_flags_t event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                                             bool for_trace, bool translating) {
        instr_t *where = instrlist_first(bb);
        reg_id_t tls_reg = DR_REG_XDI, offset_reg = DR_REG_XDX;

        dr_save_arith_flags(drcontext, bb, where, SPILL_SLOT_1);
        dr_save_reg(drcontext, bb, where, tls_reg, SPILL_SLOT_2);
        dr_save_reg(drcontext, bb, where, offset_reg, SPILL_SLOT_3);
        dr_insert_read_tls_field(drcontext, bb, where, tls_reg);

        // The instrumentation itself goes here

        dr_restore_reg(drcontext, bb, where, offset_reg, SPILL_SLOT_3);
        dr_restore_reg(drcontext, bb, where, tls_reg, SPILL_SLOT_2);
        dr_restore_arith_flags(drcontext, bb, where, SPILL_SLOT_1);

        return DR_EMIT_DEFAULT;
    }

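tls_reg now holds the pointer to the thread's data. Next we need a counterpart of COMPILE_TIME_RANDOM. Note that event_basic_block has two more parameters: for_trace and translating. DynamoRIO may call the callback for the same code more than once: with for_trace = true when a block is being added to a trace, and with translating = true when it needs to re-create a block for address translation, in which case the result must match what was generated with translating = false. Returning DR_EMIT_STORE_TRANSLATIONS instead of DR_EMIT_DEFAULT asks DynamoRIO to store the translations itself. Either way, a truly random value will not do; the identifier has to be deterministic, so let's derive it from the address of the basic block.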


    void *app_pc = dr_fragment_app_pc(tag);
    uint32_t cur_location = ((uint32_t)(uintptr_t)app_pc * (uint32_t)33533) & 0xFFFF;

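Because of ASLR, the addresses, and with them the identifiers, change from run to run. If that is a problem, ASLR can be switched off system-wide with sysctl -w kernel.randomize_va_space=0.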
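To see how much the identifier depends on the load address, here is a small stand-alone sketch. `location_id` repeats the arithmetic from the snippet above; `location_id_relative` is a hypothetical variation, not something the client in this article does, that hashes the offset from the module base instead of the absolute address. The base addresses used in the note below are example values only.

```c
#include <stdint.h>

/* Same mixing step as in the snippet above: multiply, keep the low 16 bits. */
static uint32_t location_id(uint64_t pc)
{
    return ((uint32_t)pc * (uint32_t)33533) & 0xFFFF;
}

/* Hypothetical alternative: hash the offset from the module's load address,
 * so the id does not depend on where ASLR happened to place the module. */
static uint32_t location_id_relative(uint64_t pc, uint64_t module_base)
{
    return location_id(pc - module_base);
}
```

For the same offset 0x850 inside a module loaded at 0x555555554000 in one run and at 0x5605e71c5000 in another, `location_id` yields two different values, while `location_id_relative` yields the same one. In a DynamoRIO client the base would presumably come from the module data of the main module; that part is an assumption and is not shown here.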


Now the instrumentation itself. The API offers macros for creating instructions:


    instrlist_meta_preinsert(bb, where,
        XINST_CREATE_load(drcontext,
                          opnd_create_reg(offset_reg),
                          OPND_CREATE_MEM64(tls_reg, offsetof(thread_data, scratch))));
    instrlist_meta_preinsert(bb, where,
        INSTR_CREATE_xor(drcontext,
                         opnd_create_reg(offset_reg),
                         OPND_CREATE_INT32(cur_location)));
    instrlist_meta_preinsert(bb, where,
        XINST_CREATE_store(drcontext,
                           OPND_CREATE_MEM32(tls_reg, offsetof(thread_data, scratch)),
                           OPND_CREATE_INT32(cur_location >> 1)));
    instrlist_meta_preinsert(bb, where,
        INSTR_CREATE_inc(drcontext,
                         opnd_create_base_disp(tls_reg, offset_reg, 1, offsetof(thread_data, map), OPSZ_1)));

We run it and get a Segmentation fault. By trial and error, the culprits turn out to be XINST_CREATE_load and XINST_CREATE_store:


    $ ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so --private-fork -- ./example-bug
    Cannot get SHM id from environment.
    Creating dummy map.
    Running forkserver...
    Cannot connect to fuzzer.
    ^C
    $ # Now with load and store
    $ ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so --private-fork -- ./example-bug
    Cannot get SHM id from environment.
    Creating dummy map.
    <Application /path/to/example-bug (5058). Tool internal crash at PC 0x00005605e72bbeaa. Please report this at your tool's issue tracker. Program aborted.
    Received SIGSEGV at pc 0x00005605e72bbeaa in thread 5058
    Base: 0x00005605e71c5000
    Registers:eax=0x0000000000000001 ebx=0x00007ff6dfa12038 ecx=0x0000000000000048 edx=0x0000000000000000
        esi=0x0000000000000049 edi=0x0000000000000005 esp=0x00007ff6dfa0ebb0 ebp=0x00007ff6dfa0ebc0
        r8 =0x0000000000000003 r9 =0x0000000000000005 r10=0x0000000000000000 r11=0x00005605e72b70ef
        r12=0x0000000000000000 r13=0x0000000000000000 r14=0x000000000000000c r15=0x00005605e7368c50
        eflags=0x0000000000010202
    version 6.2.0, build 2
    -no_dynamic_options -client_lib '/path/to/libafl-dr.so;0;"--private-fork"' -code_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
    0x00007ff6dfa0ebc0 0x0000020803000000>

What to do? Run it with -debug -loglevel 1 -logdir /tmp/dynamorio/ and look at what ends up in the log:


    ERROR: Could not find encoding for: mov (%rdi)[8byte] -> %rdx
    SYSLOG_ERROR: Application /path/to/example-bug (5192) DynamoRIO usage error : instr_encode error: no encoding found (see log)
    SYSLOG_ERROR: Usage error: instr_encode error: no encoding found (see log) (/dynamorio_package/core/arch/x86/encode.c, line 2417)

, : , — .


Now let's run it under afl-fuzz:


    $ $AFL_PATH/afl-fuzz -i input -o output -m 2048 -- ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so -- ./example-bug
    afl-fuzz 2.42b by <lcamtuf@google.com>
    [+] You have 4 CPU cores and 1 runnable tasks (utilization: 25%).
    [+] Try parallel jobs - see docs/parallel_fuzzing.txt.
    [*] Checking CPU core loadout...
    [+] Found a free CPU core, binding to #0.
    [*] Checking core_pattern...
    [*] Checking CPU scaling governor...
    [*] Setting up output directories...
    [+] Output directory exists but deemed OK to reuse.
    [*] Deleting old session data...
    [+] Output dir cleanup successful.
    [*] Scanning 'input'...
    [+] No auto-generated dictionary tokens to reuse.
    [*] Creating hard links for all input files...
    [*] Validating target binary...

    [-] Looks like the target binary is not instrumented! The fuzzer depends on
        compile-time instrumentation to isolate interesting test cases while
        mutating the input data. For more information, and for tips on how to
        instrument binaries, please see docs/README.

        When source code is not available, you may be able to leverage QEMU
        mode support. Consult the README for tips on how to enable this.

        (It is also possible to use afl-fuzz as a traditional, "dumb" fuzzer.
        For that, you can use the -n option - but expect much worse results.)

    [-] PROGRAM ABORT : No instrumentation detected
             Location : check_binary(), afl-fuzz.c:6894

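Hm... what? strace shows that things never even get as far as launching drrun. So afl-fuzz decides this on its own, but how does AFL tell whether a binary is instrumented? Let's look at the place the message points to, afl-fuzz.c:6894, where we find: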


    f_data = mmap(0, f_len, PROT_READ, MAP_PRIVATE, fd, 0);
    // ...
    if (!qemu_mode && !dumb_mode &&
        !memmem(f_data, f_len, SHM_ENV_VAR, strlen(SHM_ENV_VAR) + 1)) {
        // ...
        FATAL("No instrumentation detected");
    }

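Indeed, it is simple: AFL looks for the string __AFL_SHM_ID in the binary, that is, the name of the environment variable through which the shared memory identifier is passed. One could, of course, do echo -ne "__AFL_SHM_ID\0" >> /path/to/drrun, but there is a more civilized way, the environment variable AFL_SKIP_BIN_CHECK: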


    $ # -d skips the deterministic fuzzing steps
    $ AFL_SKIP_BIN_CHECK=1 $AFL_PATH/afl-fuzz -i input -o output -m 2048 -d -- ~/soft/DynamoRIO-Linux-6.2.0-2/bin64/drrun -c libafl-dr.so -- ./example-bug
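The binary check in check_binary() quoted earlier boils down to a byte search over the mapped file. Here is a minimal sketch of the same idea in portable C. `find_bytes` is a hand-written stand-in for GNU memmem() (which afl-fuzz itself uses), and `looks_instrumented` is a hypothetical name for this example.

```c
#include <stddef.h>
#include <string.h>

#define SHM_ENV_VAR "__AFL_SHM_ID"

/* Portable stand-in for GNU memmem(): find needle in haystack, or NULL. */
static const void *find_bytes(const void *hay, size_t hay_len,
                              const void *needle, size_t needle_len)
{
    const unsigned char *h = hay;
    if (needle_len == 0) return hay;
    if (needle_len > hay_len) return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; ++i) {
        if (memcmp(h + i, needle, needle_len) == 0) return h + i;
    }
    return NULL;
}

/* Non-zero if the image contains the variable name including its trailing
 * NUL byte, mirroring the strlen(SHM_ENV_VAR) + 1 in the quoted check. */
static int looks_instrumented(const void *image, size_t len)
{
    return find_bytes(image, len, SHM_ENV_VAR, strlen(SHM_ENV_VAR) + 1) != NULL;
}
```

This also explains why appending the string plus a zero byte to drrun would satisfy the check: afl-fuzz inspects only the file it is about to execute, and drrun itself carries no such marker.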


AFL starts up, but something is off: it shows total paths: 12, where we would expect 4-5. Apparently libc is being instrumented as well, not only example-bug.



So let's restrict the instrumentation to the main module only. We obtain its module_data_t * in dr_client_main, and in event_basic_block we skip blocks that lie outside it:


    module_data_t *main_module;

    // In dr_client_main:
    main_module = dr_get_main_module();

    // In event_basic_block:
    if (!opt_instrument_everything.get_value() &&
        !dr_module_contains_addr(main_module, pc)) {
        return DR_EMIT_DEFAULT;
    }

80 ( 2 ), output/queue test NU — , .


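DynamoRIO also has the -thread_private option, which makes code caches private to each thread. In that case the value of the TLS field is known when the block is instrumented, so instead of reading it at run time it can be embedded as an immediate: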


    if (dr_using_all_private_caches()) {
        instrlist_meta_preinsert(bb, where,
            INSTR_CREATE_mov_imm(drcontext,
                                 opnd_create_reg(tls_reg),
                                 OPND_CREATE_INTPTR(dr_get_tls_field(drcontext))));
    } else {
        dr_insert_read_tls_field(drcontext, bb, where, tls_reg);
    }

-thread_private ( -c libafl-dr.so , DynamoRIO, ), 5 . , if (0 && dr_using_all_private_caches()) , — , DynamoRIO . :)


, : -disable_traces — , , , , , , . … 10-15 . , , , , , test case-.


Finally, let's try the deferred forkserver by adding the following to example-bug.c:


    ungetc('1', stdin);
    char ch;
    fscanf(stdin, "%c", &ch);
    RUN_FORKSERVER();

… 15 . , , : , - libc forkserver, , - .


. :



… . , : " API / ?" , ", , !" — , . , , , , — , .





UPD: , ( qemu_mode ). API , 5. , , .



Source: https://habr.com/ru/post/332076/

