KLEE virtual machine for symbolic code execution

In this post we will try to apply the symbolic execution technique using the example of the symbolic VM KLEE to solve a simple ASCII maze. How much do you think we can find the right solutions?

Labyrinth

The labyrinth itself is a C program, the code for which you can take here . After launch, the game waits for a sequence of steps to be entered (a - step left, d - right, w - up and s - down). It looks like this:

  Player pos: 1x4
 Iteration no.  2. Action: s.
 + - + --- + --- +
 | X |  | # |
 | X |  - + |  |
 | X |  |  |  |
 | X + - |  |  |
 |  |  |
 + ----- + --- +

Despite the seeming simplicity, the maze hides within itself one secret and therefore has more than one solution.

KLEE

KLEE is a symbolic interpreter of the LLVM bitcode, i.e. he can execute code generated by any front end. The essence of symbolic execution is that all input values of the program being started contain not specific values, but symbolic restrictions on possible values. During execution, new character variables can be created (for example, when their values depend on the result of an operation with a character operand), as well as existing ones change (when branching in the program execution path). In the second case, KLEE uses the SMT solver to determine if it is possible in principle to execute a program along a given path.
If you are interested in this topic, you can look here and read it or even this .
')
The idea of the post is simple - to feed the maze program a character string and see how KLEE will find solutions.

Parse the code

The labyrinth is defined as a two-dimensional array of characters.

#define H 7 #define W 11 char maze[H][W] = { "+-+---+---+", "| | |#|", "| | --+ | |", "| | | | |", "| +-- | | |", "| | |", "+-----+---+" };

The draw () task is to draw our array on the screen.

 void draw () { int i, j; for (i = 0; i < H; i++) { for (j = 0; j < W; j++) printf ("%c", maze[i][j]); printf ("\n"); } printf ("\n"); }

The main () function starts with declaring variables that hold the position of the player, an iterator, and a 28-byte array that stores actions. Then, the initial position of the player is set equal to (1.1) and a cross is drawn on this spot.

 int main (int argc, char *argv[]) { int x, y; //Player position int ox, oy; //Old player position int i = 0; //Iteration number #define ITERS 28 char program[ITERS]; x = 1; y = 1; maze[y][x]='X';

We read from stdin what the player entered there.

  read(0,program,ITERS);

Further, at each iteration of the cycle, we save the old coordinates of the user and move it to a new location.

  while(i < ITERS) { ox = x; //Save old player position oy = y; switch (program[i]) { case 'w': y--; break; case 's': y++; break; case 'a': x--; break; case 'd': x++; break; default: printf("Wrong command!(only w,s,a,d accepted!)\n"); printf("You loose!\n"); exit(-1); }

If the player has reached the end - we congratulate him.

  if (maze[y][x] == '#') { printf ("You win!\n"); printf ("Your solution \n",program); exit (1); }

If something is wrong - we return to the place where we stood.

  if (maze[y][x] != ' ' && !((y == 2 && maze[y][x] == '|' && x > 0 && x < W))) { x = ox; y = oy; }

If you hit the wall - the player lost.

  if (ox==x && oy==y){ printf("You loose\n"); exit(-2); }

Well, if nothing happened - then we just stepped. Move the player, draw the picture, increase the iterator and go back to the beginning.

  maze[y][x]='X'; draw (); //draw it i++; sleep(1); //me wait to human }

Let's go through the maze manually

If you suddenly want to go through the game yourself, then most likely you will receive the following solution: ssssddddwwaawwddddssssssddwwww

And now with KLEE

First of all, we need LLVM, the translator C -> LLVM IR (llvm-gcc or clang) and, in fact, KLEE. In most Linux distributions, these packages are already in the repositories.
To translate mace.c in the LLVM bitcode, run the command

  $ llvm-gcc -c –emit-llvm maze.c -o maze.bc

As mentioned earlier, instead of llvm-gcc, you can use clang.

As a result of executing this command, we received the file maze.bc, which is a bitcode of the labyrinth program. It can even be run on the LLVM interpreter:

  $ lli maze.bc

To test the code with KLEE, we need to mark some variables with symbolic ones. In this case, it will be an array of actions declared at the beginning of main (). So, replace the string

  read(0,program,ITERS);

  klee_make_symbolic(program,ITERS,"program");

And do not forget to add the header file KLEE

 #include <klee/klee.h>

Now KLEE will be able to go through any possible path of execution (and the labyrinth corridor). Moreover, if these paths can lead to any error (for example, a buffer overflow), KLEE signals this to us. We carry out

  $ klee maze.bc

and observe the following picture:

As can be seen from the last 3 messages, KLEE found 321 different paths. True, this is not the 321 right decision.

  KLEE: done: total instructions = 112773
 KLEE: done: completed paths = 321
 KLEE: done: generated tests = 318

A detailed report is saved to the klee-last directory.

  $ ls klee-last /
 assembly.ll test000078.ktest test000158.ktest
 info test000079.ktest test000159.ktest
 messages.txt test000080.ktest test000160.ktest
 run.istats test000081.ktest test000161.ktest
 run.stats test000082.ktest test000162.ktest
 test000001.ktest test000083.ktest test000163.ktest
 test000075.ktest test000155.ktest warnings.txt

Each test can be viewed using the ktest-tool utility.

  $ ktest-tool klee-last / test000222.ktest
 ktest file: 'klee-last / test000222.ktest'
 args: ['maze_klee.o']
 num objects: 1
 object 0: name: 'program'
 object 0: size: 29
 object 0: data: 'ssssddddwwaawwddddssssddwwwd \ x00 ′

In this case, you can simply take the value of the program variable from here and play the test, passing this value to the maze program.

So, we have found all possible routes in the program. But do not look through every test to find all the right solutions! We need to force KLEE to highlight tests that really go to “You win!”.
In the KLEE interface, there is a function klee_assert (), which does the same thing as its similar assert () in C - calculates the value of a boolean expression and, if it is false, interrupts program execution. For our case, this is what the doctor ordered.
Add after line

 printf ("You win!\n");

unconditional call klee_assert ()

 printf ("You win!\n"); klee_assert(0); //Signal The solution!!

Now KLEE will not stop generating tests for all possible execution paths, however the tests leading to the call to klee_assert () will have .err in their name:

  $ ls -1 klee-last / | grep -A2 -B2 err
 test000096.ktest
 test000097.ktest
 test000098.assert.err
 test000098.ktest
 test000098.pc

Let's look at one of them.

  $ ktest-tool klee-last / test000098.ktest
 ktest file: 'klee-last / test000098.ktest'
 args: ['maze_klee.o']
 num objects: 1
 object 0: name: 'program'
 object 0: size: 29
 object 0: data: 'sddwddddssssddwwww \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 ′

So, the proposed KLEE solution is sddwddddssssddwwww
Hey, let me! It looks short! Let's try it on a real maze:

I knew it!!! There are fake walls in the maze! And KLEE safely passed through them! But wait, if KLEE had to find all the solutions, then where is our first? Why didn't KLEE find it?
Well, usually we only need one way to the error, if we are looking for the error itself, and we do not need alternative ways to it. Therefore, in this case we will use one of 10,000 command line options.

  $ klee –emit-all-errors maze_klee.o

We look at the result:

Now we got 4 tests, which are 4 possible solutions of our labyrinth:

  $ ktest-tool klee-last / test000097.ktest
 ktest file: 'klee-last / test000097.ktest'
 args: ['maze_klee.o']
 num objects: 1
 object 0: name: 'program'
 object 0: size: 29
 object 0: data: 'sddwddddsddw \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 ′
 $ ktest-tool klee-last / test000136.ktest
 ktest file: 'klee-last / test000136.ktest'
 args: ['maze_klee.o']
 num objects: 1
 object 0: name: 'program'
 object 0: size: 29
 object 0: data: 'sddwddddssssddwwww \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 ′
 $ ktest-tool klee-last / test000239.ktest
 ktest file: 'klee-last / test000239.ktest'
 args: ['maze_klee.o']
 num objects: 1
 object 0: name: 'program'
 object 0: size: 29
 object 0: data: 'ssssddddwwaawwddddsddw \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 \ x00 ′
 $ ktest-tool klee-last / test000268.ktest
 ktest file: 'klee-last / test000268.ktest'
 args: ['maze_klee.o']
 num objects: 1
 object 0: name: 'program'
 object 0: size: 29
 object 0: data: 'ssssddddwwaawwddddssssddwwww \ x00 ′

So, the solutions of this labyrinth:
1. ssssddddwwaawwddddssssddwwww
2. ssssddddwwaawwddddsddw
3. sddwddddssssddwwww
4. sddwddddsddw

Source: https://habr.com/ru/post/123725/

All Articles