
Creating JPEGs out of thin air

Here is an interesting demonstration of what afl can do; I was really surprised that it worked!

$ mkdir in_dir
$ echo 'hello' >in_dir/hello
$ ./afl-fuzz -i in_dir -o out_dir ./jpeg-9a/djpeg

In essence, I created a text file containing only the word "hello" and asked the fuzzer to keep feeding its output to a program that expects a JPEG image as input (djpeg is a simple utility bundled with the popular IJG jpeg graphics library; libjpeg-turbo should work just as well). Of course, my input does not look like a valid image, so the utility quickly rejects it:

$ ./djpeg '../out_dir/queue/id:000000,orig:hello'
Not a JPEG file: starts with 0x68 0x65

Normally such a fuzzing run would be completely pointless: there is essentially no chance that a traditional, format-agnostic fuzzer could ever turn the word "hello" into a valid JPEG image. The probability that dozens of random tweaks will line up one after another is astronomically small.
Fortunately, afl-fuzz can rely on lightweight assembly-level instrumentation, and within a millisecond or so it notices that although setting the first byte to 0xff does not change the externally observable output, it triggers a slightly different internal code path in the tested program. Armed with this information, it decides to use that test case as a seed for future fuzzing rounds:

$ ./djpeg '../out_dir/queue/id:000001,src:000000,op:int8,pos:0,val:-1,+cov'
Not a JPEG file: starts with 0xff 0x65
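
The feedback loop described above can be sketched in a few lines. This is a toy illustration only, not how afl-fuzz is implemented: `toy_parser()` and its branch labels are hypothetical stand-ins for an instrumented target, and the exhaustive single-byte walk is a simplification of the fuzzer's real mutation stages.

```python
def toy_parser(data: bytes) -> frozenset:
    """Hypothetical target: return the branch labels exercised by `data`."""
    hit = {"entry"}
    if len(data) >= 1 and data[0] == 0xFF:
        hit.add("first_byte_ok")
        if len(data) >= 2 and data[1] == 0xD8:
            hit.add("soi_marker_ok")
    return frozenset(hit)


def fuzz(seed: bytes):
    """Try every single-byte change; keep inputs that reach new branches."""
    queue = [seed]
    seen = set(toy_parser(seed))
    executions = 0
    index = 0
    while index < len(queue):
        base = queue[index]
        for pos in range(len(base)):
            for value in range(256):
                candidate = base[:pos] + bytes([value]) + base[pos + 1:]
                executions += 1
                new_branches = toy_parser(candidate) - seen
                if new_branches:
                    # The visible behaviour may be identical, but a new
                    # internal path was taken, so the input is worth keeping.
                    seen |= new_branches
                    queue.append(candidate)
        index += 1
    return queue, executions


queue, executions = fuzz(b"hello")
for entry in queue:
    print(entry)
print("executions:", executions)
```

In this toy setup the queue grows from "hello" to an input starting with 0xff and then to one starting with 0xff 0xd8, because each intermediate step is rewarded with a new branch. Without that feedback, both bytes would have to be right in the same attempt.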

When it later processes this second-generation test case, the fuzzer almost immediately notices that setting the second byte to 0xd8 does something even more interesting:

$ ./djpeg '../out_dir/queue/id:000004,src:000001,op:havoc,rep:16,+cov'
Premature end of JPEG file
JPEG datastream contains no image

At this point, the fuzzer has managed to synthesize a valid file header, and it actually recognized its significance. Using this output as the seed for the next fuzzing round, it quickly starts digging deeper and deeper into the heart of the program. After several hundred generations and several hundred million execve() calls, it discovers more and more of the control structures that a valid JPEG file needs: SOFs, Huffman tables, quantization tables, SOS markers, and so on:

$ ./djpeg '../out_dir/queue/id:000008,src:000004,op:havoc,rep:2,+cov'
Invalid JPEG file structure: two SOI markers
...
$ ./djpeg '../out_dir/queue/id:001005,src:000262+000979,op:splice,rep:2'
Quantization table 0x0e was not defined
...
$ ./djpeg '../out_dir/queue/id:001282,src:001005+001270,op:splice,rep:2,+cov' >.tmp; ls -l .tmp
-rw-r--r-- 1 lcamtuf lcamtuf 7069 Nov 7 09:29 .tmp
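
The queue file names in the listings above record each test case's lineage, and a small helper can pull the fields apart. The sketch below is based only on the names shown in this article. The function name `parse_queue_name` is hypothetical, and the field meanings in the comments (parent IDs in `src`, mutation type in `op`, `+cov` marking new coverage) are my reading of those names rather than an authoritative format specification.

```python
def parse_queue_name(path: str) -> dict:
    """Split a queue file name such as 'id:000001,src:000000,op:int8,...'."""
    name = path.rsplit("/", 1)[-1]          # drop any directory prefix
    info = {"new_coverage": False}
    for field in name.split(","):
        if field == "+cov":
            # Assumed meaning: this input exercised previously unseen code.
            info["new_coverage"] = True
        elif ":" in field:
            key, value = field.split(":", 1)
            # Spliced entries list two parent IDs joined by '+'.
            info[key] = value.split("+") if key == "src" else value
    return info


first = parse_queue_name(
    "../out_dir/queue/id:000001,src:000000,op:int8,pos:0,val:-1,+cov"
)
print(first)
# The signed 8-bit value -1 is the byte 0xff seen in the djpeg message.
print(hex(int(first["val"]) & 0xFF))
```

Read this way, the name of entry 000001 says that it was derived from entry 000000 by writing the 8-bit value -1 (that is, 0xff) at offset 0, which matches the "starts with 0xff 0x65" message printed by djpeg.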

The first image, produced after about six hours of fuzzing on an 8-core system, looks quite unassuming: it is a plain gray rectangle 3 pixels high and 748 pixels wide. But from the moment it is discovered, the fuzzer starts using this image as a seed and quickly produces a wide range of more interesting pictures, one for every new execution path.



Of course, synthesizing a complete image out of thin air is an extreme example, and hardly a practical one. But more prosaically, fuzzers are meant to stress-test every feature of the target program. With instrumented, evolutionary fuzzing, lesser-known features (for example, progressive, arithmetic-coded, or black-and-white JPEGs) can be discovered and exercised without a giant, high-quality corpus of diverse test cases to seed the fuzzer with.

A remarkable feature of the libjpeg case is that it works without any special preparation: there is nothing special about the "hello" string, the fuzzer knows nothing about image parsing, and it is neither designed nor tuned to work with this particular library. There are not even any command-line switches to turn on. You can point afl-fuzz at many other types of parsers with similar results: with bash, it will write valid scripts; with giflib, it will produce GIFs; with fileutils, it will create and flag ELF files, Atari 68xxx executables, x86 boot sectors, and UTF-8 with BOM. In almost all cases, the performance impact of the instrumentation is minimal, too.

Of course, not everything is so rosy. At its core, afl-fuzz is still a brute-force tool. This makes it simple, fast, and robust, but it also means that certain types of atomically executed checks with a large search space can pose an insurmountable obstacle to the fuzzer. Here is a good example:

 if (strcmp(header.magic_password, "h4ck3d by p1gZ")) goto terminate_now; 

In practice, this means that afl-fuzz is unlikely to “invent” PNG files or non-trivial HTML documents from scratch, and it will need a better starting point than just “hello”. To deal consistently with code constructs like the one shown above, a general-purpose fuzzer would need to understand the operation of the target binary on a completely different level. Researchers have made some progress here, but frameworks that can do this quickly, easily, and reliably across diverse and complex codebases are probably still years away.
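
To put rough numbers on why an all-or-nothing comparison like the strcmp() example is so much harder than the JPEG header, here is a back-of-the-envelope calculation. It is a simplification under stated assumptions: guesses are uniformly random bytes, the "incremental" case assumes the fuzzer receives coverage feedback after every correct byte, and the internals of strcmp() are ignored.

```python
MAGIC = b"h4ck3d by p1gZ"  # the string from the strcmp() example

# With per-byte feedback, each position can be solved on its own:
# at most 256 candidate values per byte.
incremental_tries = 256 * len(MAGIC)

# With a single atomic check, all bytes must be right in the same attempt.
blind_tries = 256 ** len(MAGIC)

print("bytes to guess:         ", len(MAGIC))
print("tries with feedback:    ", incremental_tries)
print("tries without feedback: ", f"{blind_tries:.3e}")
```

Under these assumptions the gap is a few thousand attempts versus 2^112, which is why a brute-force tool needs either a seed file that already contains the magic value or a deeper understanding of the target.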

Several people have asked me about symbolic execution and other work that influenced afl-fuzz; I have collected some notes in this document.

Source: https://habr.com/ru/post/328652/

