Greetings from the libc-free world! (Part 1)

As an exercise, I want to write a program in C that is simple enough for me to disassemble it and explain all of the resulting code to myself.

Sounds easy, right?

This post assumes the reader has experience compiling programs and working in Linux. Some ability to read assembly code is also useful.

So, here is our simplest hello world:
    jesstess@kid-charlemagne:~/c$ cat hello.c
    #include <stdio.h>

    int main()
    {
        printf("Hello World\n");
        return 0;
    }

Compile it and count the number of bytes:

    jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
    jesstess@kid-charlemagne:~/c$ wc -c hello
    10931 hello

Whoa! Where do these 11 kilobytes come from? objdump -t hello shows 79 entries in the symbol table, and the standard library is responsible for most of them.

So let's not use the standard library. We'll drop printf too, so that we can get rid of the include:

    jesstess@kid-charlemagne:~/c$ cat hello.c
    int main()
    {
        char *str = "Hello World";
        return 0;
    }

Recompile and count the bytes again:

    jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
    jesstess@kid-charlemagne:~/c$ wc -c hello
    10892 hello

Almost nothing has changed? Ha!

The problem is that gcc still uses its startup files when linking. Proof? Compile with the -nostdlib option, with which (according to the documentation) gcc "will not use the standard system libraries and startup files when linking. Only the files explicitly passed to the linker will be used."

    jesstess@kid-charlemagne:~/c$ gcc -nostdlib -o hello hello.c
    /usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8

It's only a warning, so let's look at the result anyway:

    jesstess@kid-charlemagne:~/c$ wc -c hello
    1329 hello

Looks good! We've reduced the size to something much more sane (by a whole order of magnitude!) ...

    jesstess@kid-charlemagne:~/c$ ./hello
    Segmentation fault

... and paid for it with a segfault. Darn.

For fun, let's get our program running before we start digging into the assembly.

What does the _start symbol do, given that it seems to be needed to run the program? Where is it usually defined when libc is used?

By default, from the linker's point of view, _start, rather than main, is the real entry point of the program. Normally _start is defined in the relocatable ELF file crt1.o. We can verify this by linking our hello world against crt1.o and noting that _start is now found (although other problems appear instead, because other libc startup symbols are not defined):

    # compile the source without linking
    jesstess@kid-charlemagne:~/c$ gcc -Os -c hello.c
    # now try to link
    jesstess@kid-charlemagne:~/c$ ld /usr/lib/crt1.o -o hello hello.o
    /usr/lib/crt1.o: In function `_start':
    /build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:106: undefined reference to `__libc_csu_fini'
    /build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:107: undefined reference to `__libc_csu_init'
    /build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:113: undefined reference to `__libc_start_main'

The linker errors show that on this machine _start lives in the libc source file sysdeps/x86_64/elf/start.S. This delightfully commented file exports the _start symbol, sets up the stack and some registers, and calls __libc_start_main. If you look at the very bottom of csu/libc-start.c, you can see the call to our program's main:

    /* Nothing special, just call the function */
    result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);

... and off we go.

So this is why _start is needed. To summarize what happens between _start and the call to main: a bunch of things get initialized for libc, and then main is called. And since we don't need libc, let's export our own _start symbol that does nothing but call main, and link against it:

    jesstess@kid-charlemagne:~/c$ cat stubstart.S
    .globl _start

    _start:
        call main

Compile and run our hello world with the _start assembly stub:

    jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
    jesstess@kid-charlemagne:~/c$ ./hello
    Segmentation fault

Hooray, the compilation problems are gone. But the segfault hasn't gone anywhere. Why? Let's compile with debug information and take a look in gdb. We set a breakpoint on main and step through the program until the segfault:

    jesstess@kid-charlemagne:~/c$ gcc -g -nostdlib stubstart.S -o hello hello.c
    jesstess@kid-charlemagne:~/c$ gdb hello
    GNU gdb 6.8-debian
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    This is free software:
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu"...
    (gdb) break main
    Breakpoint 1 at 0x4000f4: file hello.c, line 3.
    (gdb) run
    Starting program: /home/jesstess/c/hello

    Breakpoint 1, main () at hello.c:5
    5       char *str = "Hello World";
    (gdb) step
    6       return 0;
    (gdb) step
    7   }
    (gdb) step
    0x00000000004000ed in _start ()
    (gdb) step
    Single stepping until exit from function _start,
    which has no line number information.
    main () at helloint.c:4
    4   {
    (gdb) step

    Breakpoint 1, main () at helloint.c:5
    5       char *str = "Hello World";
    (gdb) step
    6       return 0;
    (gdb) step
    7   }
    (gdb) step

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000001 in ?? ()
    (gdb)

What? main is executed twice? ... Time to look at the assembly:

    jesstess@kid-charlemagne:~/c$ objdump -d hello
    hello:     file format elf64-x86-64
    Disassembly of section .text:

    00000000004000e8 <_start>:
      4000e8: e8 03 00 00 00          callq  4000f0 <main>
      4000ed: 90                      nop
      4000ee: 90                      nop
      4000ef: 90                      nop

    00000000004000f0 <main>:
      4000f0: 55                      push   %rbp
      4000f1: 48 89 e5                mov    %rsp,%rbp
      4000f4: 48 c7 45 f8 03 01 40    movq   $0x400103,-0x8(%rbp)
      4000fb: 00
      4000fc: b8 00 00 00 00          mov    $0x0,%eax
      400101: c9                      leaveq
      400102: c3                      retq

Heh. We'll leave a detailed analysis of the assembly for later and just note the following: after returning from the callq to main, we execute a few nops and fall straight back into main. Since this re-entry into main happened without a return instruction pointer being pushed onto the stack (which is part of the standard preparation for a function call), the second retq pops a bogus return instruction pointer off the stack, and the program crashes. We need a way to exit.

Literally. After returning from the callq, we put 1, the number of the SYS_exit system call, into %eax, and since we want to report successful completion, we put 0, the only argument to SYS_exit, into %ebx. Then we enter the kernel with the int $0x80 interrupt.

    jesstess@kid-charlemagne:~/c$ cat stubstart.S
    .globl _start

    _start:
        call main
        movl $1, %eax
        xorl %ebx, %ebx
        int $0x80
    jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
    jesstess@kid-charlemagne:~/c$ ./hello
    jesstess@kid-charlemagne:~/c$

Hooray! The program compiles, runs, and even exits normally when run under gdb.

Greetings from the libc-free world!

Stay with me: in the second part we will analyze the assembly code in detail, see what happens when we make the program more complex, and learn a little more about linking, calling conventions and the structure of ELF binaries on the x86 architecture.

Source: https://habr.com/ru/post/88101/

