📜 ⬆️ ⬇️

First steps with Unicorn Engine

When searching for "Unicorn Engine" on Habré, I was surprised to find that this tool has never been in the article. I will try to fill this void. Let's start with the basics, and look at an example of using an emulator in real life. In order not to reinvent the wheel, I decided to just translate this manual. Before starting, I will say that all my comments or observations will look like this .


What is the Unicorn Engine?


The developers themselves write about Unicorn Engine Unicorn Engine like this:


Unicorn is a lightweight, multi-platform and multi-architecture processor emulator.

This is not a standard emulator. It does not emulate the work of the entire program or the whole OS. It does not support system commands (such as opening a file, outputting a character to the console, etc.). You will have to independently engage in memory markup and data loading into it, and then you simply start execution from some specific address.


So why is it useful?



What do you need?



Example


For example, let's take the task with hxp CTF 2017 under the name Fibonacci . Binary can be downloaded here .


When you start the program, it starts outputting our flag to the console, but very slowly. Each next byte of the flag is considered slower and slower.


The flag is: hxp{F 

This means that in order to get the flag in a reasonable time, we need to optimize the operation of this application.


Using IDA Pro ( I personally used radare2 + Cutter ), we decompiled the code into C-like pseudo-code. Despite the fact that the code was not properly decompiled, we can still get information from it about what is going on inside.


Decompiled code
 __int64 __fastcall main(__int64 a1, char **a2, char **a3) { void *v3; // rbp@1 int v4; // ebx@1 signed __int64 v5; // r8@2 char v6; // r9@3 __int64 v7; // r8@3 char v8; // cl@3 __int64 v9; // r9@5 int a2a; // [sp+Ch] [bp-1Ch]@3 v3 = &encrypted_flag; v4 = 0; setbuf(stdout, 0LL); printf("The flag is: ", 0LL); while ( 1 ) { LODWORD(v5) = 0; do { a2a = 0; fibonacci(v4 + v5, &a2a); v8 = v7; v5 = v7 + 1; } while ( v5 != 8 ); v4 += 8; if ( (unsigned __int8)(a2a << v8) == v6 ) break; v3 = (char *)v3 + 1; _IO_putc((char)(v6 ^ ((_BYTE)a2a << v8)), stdout); v9 = *((char *)v3 - 1); } _IO_putc(10, stdout); return 0LL; } 

 unsigned int __fastcall fibonacci(int i, _DWORD *a2) { _DWORD *v2; // rbp@1 unsigned int v3; // er12@3 unsigned int result; // eax@3 unsigned int v5; // edx@3 unsigned int v6; // esi@3 unsigned int v7; // edx@4 v2 = a2; if ( i ) { if ( i == 1 ) { result = fibonacci(0, a2); v5 = result - ((result >> 1) & 0x55555555); v6 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333; } else { v3 = fibonacci(i - 2, a2); result = v3 + fibonacci(i - 1, a2); v5 = result - ((result >> 1) & 0x55555555); v6 = ((result - ((result >> 1) & 0x55555555)) >> 2) & 0x33333333; } v7 = v6 + (v5 & 0x33333333) + ((v6 + (v5 & 0x33333333)) >> 4); *v2 ^= ((BYTE1(v7) & 0xF) + (v7 & 0xF) + (unsigned __int8)((((v7 >> 8) & 0xF0F0F) + (v7 & 0xF0F0F0F)) >> 16)) & 1; } else { *a2 ^= 1u; result = 1; } return result; } 

Here is the assembler code of the main and fibonacci functions:


main
 .text:0x4004E0 main proc near ; DATA XREF: start+1Do .text:0x4004E0 .text:0x4004E0 var_1C = dword ptr -1Ch .text:0x4004E0 .text:0x4004E0 push rbp .text:0x4004E1 push rbx .text:0x4004E2 xor esi, esi ; buf .text:0x4004E4 mov ebp, offset unk_4007E1 .text:0x4004E9 xor ebx, ebx .text:0x4004EB sub rsp, 18h .text:0x4004EF mov rdi, cs:stdout ; stream .text:0x4004F6 call _setbuf .text:0x4004FB mov edi, offset format ; "The flag is: " .text:0x400500 xor eax, eax .text:0x400502 call _printf .text:0x400507 mov r9d, 49h .text:0x40050D nop dword ptr [rax] .text:0x400510 .text:0x400510 loc_400510: ; CODE XREF: main+8Aj .text:0x400510 xor r8d, r8d .text:0x400513 jmp short loc_40051B .text:0x400513 ; --------------------------------------------------------------------------- .text:0x400515 align 8 .text:0x400518 .text:0x400518 loc_400518: ; CODE XREF: main+67j .text:0x400518 mov r9d, edi .text:0x40051B .text:0x40051B loc_40051B: ; CODE XREF: main+33j .text:0x40051B lea edi, [rbx+r8] .text:0x40051F lea rsi, [rsp+28h+var_1C] .text:0x400524 mov [rsp+28h+var_1C], 0 .text:0x40052C call fibonacci .text:0x400531 mov edi, [rsp+28h+var_1C] .text:0x400535 mov ecx, r8d .text:0x400538 add r8, 1 .text:0x40053C shl edi, cl .text:0x40053E mov eax, edi .text:0x400540 xor edi, r9d .text:0x400543 cmp r8, 8 .text:0x400547 jnz short loc_400518 .text:0x400549 add ebx, 8 .text:0x40054C cmp al, r9b .text:0x40054F mov rsi, cs:stdout ; fp .text:0x400556 jz short loc_400570 .text:0x400558 movsx edi, dil ; c .text:0x40055C add rbp, 1 .text:0x400560 call __IO_putc .text:0x400565 movzx r9d, byte ptr [rbp-1] .text:0x40056A jmp short loc_400510 .text:0x40056A ; --------------------------------------------------------------------------- .text:0x40056C align 10h .text:0x400570 .text:0x400570 loc_400570: ; CODE XREF: main+76j .text:0x400570 mov edi, 0Ah ; c .text:0x400575 call __IO_putc .text:0x40057A add rsp, 18h .text:0x40057E xor eax, eax .text:0x400580 pop rbx .text:0x400581 pop rbp .text:0x400582 retn .text:0x400582 main endp 

fibonacci
 .text:0x400670 fibonacci proc near ; CODE XREF: main+4Cp .text:0x400670 ; fibonacci+19p ... .text:0x400670 test edi, edi .text:0x400672 push r12 .text:0x400674 push rbp .text:0x400675 mov rbp, rsi .text:0x400678 push rbx .text:0x400679 jz short loc_4006F8 .text:0x40067B cmp edi, 1 .text:0x40067E mov ebx, edi .text:0x400680 jz loc_400710 .text:0x400686 lea edi, [rdi-2] .text:0x400689 call fibonacci .text:0x40068E lea edi, [rbx-1] .text:0x400691 mov r12d, eax .text:0x400694 mov rsi, rbp .text:0x400697 call fibonacci .text:0x40069C add eax, r12d .text:0x40069F mov edx, eax .text:0x4006A1 mov ebx, eax .text:0x4006A3 shr edx, 1 .text:0x4006A5 and edx, 55555555h .text:0x4006AB sub ebx, edx .text:0x4006AD mov ecx, ebx .text:0x4006AF mov edx, ebx .text:0x4006B1 shr ecx, 2 .text:0x4006B4 and ecx, 33333333h .text:0x4006BA mov esi, ecx .text:0x4006BC .text:0x4006BC loc_4006BC: ; CODE XREF: fibonacci+C2j .text:0x4006BC and edx, 33333333h .text:0x4006C2 lea ecx, [rsi+rdx] .text:0x4006C5 mov edx, ecx .text:0x4006C7 shr edx, 4 .text:0x4006CA add edx, ecx .text:0x4006CC mov esi, edx .text:0x4006CE and edx, 0F0F0F0Fh .text:0x4006D4 shr esi, 8 .text:0x4006D7 and esi, 0F0F0Fh .text:0x4006DD lea ecx, [rsi+rdx] .text:0x4006E0 mov edx, ecx .text:0x4006E2 shr edx, 10h .text:0x4006E5 add edx, ecx .text:0x4006E7 and edx, 1 .text:0x4006EA xor [rbp+0], edx .text:0x4006ED pop rbx .text:0x4006EE pop rbp .text:0x4006EF pop r12 .text:0x4006F1 retn .text:0x4006F1 ; --------------------------------------------------------------------------- .text:0x4006F2 align 8 .text:0x4006F8 .text:0x4006F8 loc_4006F8: ; CODE XREF: fibonacci+9j .text:0x4006F8 mov edx, 1 .text:0x4006FD xor [rbp+0], edx .text:0x400700 mov eax, 1 .text:0x400705 pop rbx .text:0x400706 pop rbp .text:0x400707 pop r12 .text:0x400709 retn .text:0x400709 ; --------------------------------------------------------------------------- .text:0x40070A align 10h .text:0x400710 .text:0x400710 loc_400710: ; CODE XREF: fibonacci+10j .text:0x400710 xor edi, edi .text:0x400712 call fibonacci .text:0x400717 mov edx, eax .text:0x400719 mov edi, eax .text:0x40071B shr edx, 1 .text:0x40071D and edx, 55555555h .text:0x400723 sub edi, edx .text:0x400725 mov esi, edi .text:0x400727 mov edx, edi .text:0x400729 shr esi, 2 .text:0x40072C and esi, 33333333h .text:0x400732 jmp short loc_4006BC .text:0x400732 fibonacci endp 

At this stage, we have many opportunities to solve this problem. For example, we can recover the code using one of the programming languages ​​and apply optimization there, but the process of recovering the code is a very difficult task during which we can make mistakes. Well, then comparing the code to find an error is generally no good. But, if we use the Unicorn Engine, then we can skip the code reconstruction step and avoid the problem described above. Of course, we can avoid these troubles using frida or writing scripts for gdb, but this is not about that.


Before starting the optimization, we will run the emulation in the Unicorn Engine without any changes to the program. And only after a successful launch, let's move on to optimization.


Step 1: Let Virtualization Come


Let's create the fibonacci.py file and save it next to the binary.


Let's start by importing the required libraries:


 from unicorn import * from unicorn.x86_const import * import struct 

The first line loads the main binary and base Unicorn constants. The second line loads constants for the two x86 and x86_64 architectures.


Next, add some necessary functions:


 def read(name): with open(name) as f: return f.read() def u32(data): return struct.unpack("I", data)[0] def p32(num): return struct.pack("I", num) 

Here we declared the functions that we will need later:



Note: If you have pwntools installed, then you do not need to create these functions, you just need to import them:


 from pwn import * 

And so, let's finally start initializing our Unicorn Engine class for the x86_64 architecture:


 mu = Uc (UC_ARCH_X86, UC_MODE_64) 

Here we call Uc functions with the following parameters:



You can find all the constants in Cheatsheet .


As I wrote above, to use the Unicorn Engine, we need to initialize the virtual memory manually. For this example, we need to place the code and the stack somewhere in memory.


The base address (Base addr) of the binary starts at 0x400000. Let's place our stack in 0x0 and allocate 1024 * 1024 memory for it. Most likely, we do not need so much space, but it still does not hurt.


We can mark up memory by calling the mem_map method.


Add these lines:


 BASE = 0x400000 STACK_ADDR = 0x0 STACK_SIZE = 1024*1024 mu.mem_map(BASE, 1024*1024) mu.mem_map(STACK_ADDR, STACK_SIZE) 

Now we need to load the binary into its main address in the same way as the bootloader does. After that we need to put the RSP in the end of the stack.


 mu.mem_write(BASE, read("./fibonacci")) mu.reg_write(UC_X86_REG_RSP, STACK_ADDR + STACK_SIZE - 1) 

Now we can start the emulation and run the code, but we need to figure out which address to start working with and when the emulator should stop.


Take the address of the first command from main () , we can start the emulation with 0x004004e0. The end will be a call to putc ("\ n") , which is located at 0x00400575, after removing the entire flag.


 .text:0x400570 mov edi, 0Ah ; c .text:0x400575 call __IO_putc 

We can start emulation:


 mu.emu_start(0x004004e0,0x00400575) 

Now run the script:


 a@x:~/Desktop/unicorn_engine_lessons$ python solve.py Traceback (most recent call last): File "solve.py", line 32, in <module> mu.emu_start(0x00000000004004E0, 0x0000000000400575) File "/usr/local/lib/python2.7/dist-packages/unicorn/unicorn.py", line 288, in emu_start raise UcError(status) unicorn.unicorn.UcError: Invalid memory read (UC_ERR_READ_UNMAPPED) 

Oops, something went wrong, but we don't even know what. Immediately before calling mu.emu_start, we can add:


 def hook_code(mu, address, size, user_data): print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' %(address, size)) mu.hook_add(UC_HOOK_CODE, hook_code) 

This code adds a hook. We declare our own hook_code function, which is called by the emulator before each command. It takes the following parameters:



')

Source: https://habr.com/ru/post/431058/


All Articles