📜 ⬆️ ⬇️

Compilation. 10: compile in ELF

Last time, we limited ourselves to compiling j-squeaks to a file in our own format, which required a special loader. In addition, we conceived a couple of executable code optimizations that require analysis of neighboring commands.

Further in the post:

  1. Peephole Optimization
  2. Standard features
  3. Output to ELF
  4. How it works?
  5. What happened?

Peephole Optimization


To realize the planned optimizations, firstly, we divide the common vector code into small vectors of the generated machine code for each command, and at the end we will glue them together.

Secondly, two passes are not enough for us: on the first pass, we generate machine code for each team, on the second - we perform optimizations, on the third - “polish”, filling in offsets of jumps and lines. Simultaneously with the calculation of jumps offsets, we will replace the near ( near ) jumps with short ( short ) whenever possible, so that on the third pass the code will be further reduced, and one more pass will be needed - the fourth; on it we will again correct the displacement of jumps that have changed due to the reduction of the code.
')
Each team will have to store the number of generated POP , because by machine code alone it is difficult to understand whether the last byte is a POP instruction, or simply matches the value. The number of generated PUSH not necessary to store: the first byte of the machine code is decrypted unambiguously.
  struct commandn { // ... int offset; //    int needfixat; //  JZ, ECHO int popcnt; // INPUT, ECHO std::vector<unsigned char> code;//   // ... void pop_back(int c = 1) { code.resize(code.size()-c); } void pop_front(int c = 1) { code.erase(code.begin(), code.begin()+c); } }; // ... // ""    JZ: // ,     case command::jz: if(!i->cmd.dest) // JMP off i->emit(0xe9); else { // OR dst, dst / JZ off i->emit(0x0b, 0xc0|(i->cmd.dest-1)<<3|(i->cmd.dest-1)); i->emit(0x0f, 0x84); } i->needfixat = i->code.size(); i->emit4(0); break; // ... //  ECHO  INPUT   popcnt; //   --   case command::input: foreach(rp, i->onenterp) if(*rp!=4) i->emit(0x50|(*rp-1)); i->emit(0xff, 0x55, 0); if(i->cmd.dest!=1) i->emit(0x90|(i->cmd.dest-1)); foreachr(rp, i->onenterp) if(*rp!=4) { i->emit(0x58|(*rp-1)); i->popcnt++; } break; // ... //  :   ,    int offset = 0; foreach2(i,pcode,next) { i->offset = offset; //  "POP-PUSH" while(i->popcnt && ((next->code[0]&0xfc) == 0x50) && ((i->code.back()&3) == (next->code[0]&3)) && //  :     !((next->cmd.opcode==command::echo) && (next->cmd.dest==(next->code[0]&3)+1))) { i->pop_back(); next->pop_front(); i->popcnt--; if(next->needfixat) next->needfixat--; } //  "-JZ" if((i->cmd.opcode>=command::eq) && (next->cmd.opcode==command::jz) && (i->cmd.dest==next->cmd.dest) && !next->onexitp.count((physreg)next->cmd.dest)) { char cc = i->code[i->code.size()-5]; // cond code i->pop_back(6); // SETcc / MOVZX next->code.clear(); //    next->emit(0x0f, cc^0x11); next->needfixat = next->code.size(); next->emit4(0); } offset += i->code.size(); } //  :    offset = 0; foreach(i, pcode) { i->offset = offset; if(i->cmd.opcode==command::jz) { int joffset = i->tgt->offset-(i->offset+i->code.size()); if((joffset>=-128) && (joffset<128)) { //   if(!i->cmd.dest) { // JMP SHORT i->code.clear(); i->emit(0xeb, (char)joffset); } else if(i->code[0]==0x0b && i->code[1]==0xc9) { // OR ECX, ECX i->code.clear(); i->emit(0xe3, (char)joffset); // JECXZ } else { char cc = i->code[i->code.size()-5]; // cond code i->pop_back(6); i->emit(cc^0xf0, (char)joffset);// Jcc SHORT } i->needfixat = i->code.size()-1; //     } } offset += i->code.size(); } //  :     foreach(i, pcode) if(i->needfixat) if(i->cmd.opcode==command::jz) { int joffset = i->tgt->offset-(i->offset+i->code.size()); switch(i->code[i->needfixat]-1) { case 0xeb: case 0xe3: case 0x74: case 0x75: case 0x7c: case 0x7d: case 0x7e: case 0x7f: i->code[i->needfixat] = (char)joffset; break; // short default: i->fix4(joffset); } } else if (i->cmd.opcode==command::echo) i->fix4(offsets[i->cmd.imm]); 


Standard features


Last time, the implementation of the standard functions input,echoi,echos was in our bootloader. This time there will be no bootloader; where will the functions be?

We can, in principle, insert their code into each generated binary; but it is ugly. Instead, we compile them into a separate .o file, which will then be linked to our file. The advantage of this approach is to isolate the platform dependency: all the differences between x86 and x64, which, from the point of view of our compiler, lie in the way we call ssh functions, hide it in a platform-dependent library, and we will generate a platform-independent code.

It is convenient for us that our standard functions take a parameter in ESI , and return the result to EAX . So do. Here is the implementation for x64:
 .global input,echoi,echos .text fd: .asciz "%d" fs: .asciz "%s" input: push %rax lea fd, %edi xor %eax, %eax mov %rsp, %rsi call scanf pop %rax ret echoi: lea fd, %edi echo: xor %eax, %eax jmp printf echos: movslq %esi, %rsi add %rbp, %rsi lea fs, %edi jmp echo 

And here is for x86:
 .global input,echoi,echos .text fd: .asciz "%d" fs: .asciz "%s" input: push %eax push %esp push $fd call scanf pop %eax pop2: pop %eax pop %eax ret echoi: push %esi push $fd echo: call printf jmp pop2 echos: add %ebp, %esi push %esi push $fs jmp echo 


Output to ELF


Our executable code is almost ready; it remains only to "pack" it so that the OS can link it and run it.

In memory, the code will consist of the same parts as before: code, data, communication area for calling standard functions. The difference between the .o file format and directly executable files — that in .o co-location of program parts ( sections ) in memory is unknown; therefore, the compiler cannot generate a link from one section to an address in another. Instead, the compiler generates an indication to the linker - a relocation indicating what address to calculate when linking, and how to calculate it. Therefore, the .o files in the ELF format specification are called relocated .

It would be possible to divide the data into two sections: separately initialized and unchangeable (communication area and lines), uninitialized and changeable separately (cells for poured registers); but then on x86 we would not have enough registers to constantly store the base addresses of both data sections, which means we would have to generate a relocation for each call. Simplify your life, and get along with one section of data.
For the same reason, to call standard functions, we use the communication area: to call directly we would need to create a relocation for each call. Let's make the communication area 24-byte both on x86 and x64 - again, for simplicity and cross-platform. On x64, it will simply be an array of three pointers (to input,echoi,echos ); on x86, each pointer will be followed by a 4-byte stub.

It turns out that we will always have four relocations: three addresses of standard functions in the field of communication, and the loading of the base address of data into RBP/EBP first command of the program. As a result, the generation of the “relocated” code will hardly differ from the generation of the “solid piece” last time. In some respects, it will even be simplified: since the lines will now be stored separately from the code, we can memorize their offsets right at the syntactic parsing stage; this way we will completely get rid of the “temporary identifiers of strings” and the stage of their binding before outputting the compiled code. In addition, the method of calling standard functions will no longer depend on the platform.
  typedef std::map<std::string,int> stringmap; stringmap strings; int laststr = 24; //       std::vector<stringmap::iterator> strdata; //   // ... //    VAL: ID '(' ARGS ')' if (!$1.compare("echo")) { if(!$3.size()) yyerror("Input: too many arguments"); $$ = 0; foreach(i, $3) if(!i->dest) // string if(strings.count(i->str)) emit(command::echo, 0, strings[i->str]); else { strdata.push_back(strings.insert(stringmap::value_type(i->str,laststr)).first); emit(command::echo, 0, laststr); laststr += i->str.length()+1; } else emit(command::echo, i->dest, 0); } // ... // (    -) case command::hlt: i->emit(0x5b, 0x5e, 0x5f, 0x5d); // POP EBX / POP ESI / POP EDI / POP EBP i->emit(0xc3); // RET break; case command::echo: // PUSH live / MOV EDI, dst / CALL [EBP+?] / POP live foreach(rp, i->onexitp) if(*rp!=4) i->emit(0x50|(*rp-1)); if(!i->cmd.dest) { // imm / [EBP+16] i->emit14(0xbe, i->cmd.imm); i->emit(0xff, 0x55, 16); } else { if(i->known.count(i->cmd.dest)) // imm / [EBP+4] i->emit14(0xbe, i->known[i->cmd.dest]); else // dst / [EBP+8] i->emit(0x8b, 0xf0|(i->cmd.dest-1)); i->emit(0xff, 0x55, 8); } foreachr(rp, i->onexitp) if(*rp!=4) { i->emit(0x58|(*rp-1)); i->popcnt++; } break; 

Relocations in ELF come in two formats: rel or rela . The rel format is smaller, but the linker is easier to work with rela ; therefore, when switching to the x64 platform, rel type relocations were declared “ deprecated ”. However, my version of ld supports rel in 64-bit code, so we’ll generate rel .

Generate eight standard sections: empty, .shstrtab, .strtab, .symtab, .rel.text, .rel.data, .text, .data . The ELF header and first six sections have predefined content; only .text and .data filled depending on the generated code.
 //  ELF #include <linux/elf.h> #if ELF_CLASS == ELFCLASS32 #define Elf_Shdr Elf32_Shdr #define Elf_Sym Elf32_Sym #define Elf_Rel Elf32_Rel #define ELF_R_INFO(s,t) (((s)<<8)+(unsigned char)(t)) #define R_32 R_386_32 #else #define Elf_Shdr Elf64_Shdr #define Elf_Sym Elf64_Sym #define Elf_Rel Elf64_Rel #define ELF_R_INFO(s,t) (((unsigned long)(s)<<32)+(t)) #define R_32 R_X86_64_32 #endif #define ST_GLOBAL_NOTYPE STB_GLOBAL<<4 // ... //    : //       //     // : PUSH EBP / PUSH EDI / PUSH ESI / PUSH EBX / MOV RBP, ... const char prolog[] = {0x55,0x57,0x56,0x53,0x48,0xc7,0xc5,0,0,0,0}; offset += sizeof(prolog); int alignment = ((offset+3)&~3) - offset; //   dword offset += alignment; const struct { elfhdr hdr; Elf_Shdr Shdr[8]; char shstrtab[64]; char strtab[24]; Elf_Sym symtab[6]; Elf_Rel reltext[1]; Elf_Rel reldata[3]; } elf = {{{ELFMAG0, ELFMAG1, ELFMAG2, ELFMAG3, ELF_CLASS, ELF_DATA, EV_CURRENT}, // identification ET_REL, ELF_ARCH, EV_CURRENT, 0, 0, sizeof(elfhdr), 0, sizeof(elfhdr), 0, 0, sizeof(Elf_Shdr),8, 1}, {{0, SHT_NULL}, {1, SHT_STRTAB, 0, 0, (char*)&elf.shstrtab-(char*)&elf, sizeof(elf.shstrtab), 0, 0, 1, 0}, {11, SHT_STRTAB, 0, 0, (char*)&elf.strtab-(char*)&elf, sizeof(elf.strtab), 0, 0, 1, 0}, {19, SHT_SYMTAB, 0, 0, (char*)&elf.symtab-(char*)&elf, sizeof(elf.symtab), 2, 2, 8,sizeof(Elf_Sym)}, {27, SHT_REL, 0, 0, (char*)&elf.reltext-(char*)&elf, sizeof(elf.reltext), 3, 6, 8, sizeof(Elf_Rel)}, {37, SHT_REL, 0, 0, (char*)&elf.reldata-(char*)&elf, sizeof(elf.reldata), 3, 7, 8, sizeof(Elf_Rel)}, {47, SHT_PROGBITS, SHF_ALLOC|SHF_EXECINSTR, 0, sizeof(elf), offset, 0, 0, 4, 0}, {53, SHT_PROGBITS, SHF_ALLOC|SHF_WRITE, 0, sizeof(elf)+offset, laststr+lastspill*4, 0, 0, 4, 0}}, "\0.shstrtab\0.strtab\0.symtab\0.rel.text\0.rel.data\0.text\0.data", // shstrtab "\0main\0input\0echoi\0echos", // strtab {{}, #if ELF_CLASS == ELFCLASS32 {0, 0, 0, STT_SECTION, 0, 7}, {1, 0, 0, ST_GLOBAL_NOTYPE, 0, 6}, // main {6, 0, 0, ST_GLOBAL_NOTYPE, 0, 0}, // input {12, 0, 0, ST_GLOBAL_NOTYPE, 0, 0}, // echoi {18, 0, 0, ST_GLOBAL_NOTYPE, 0, 0}}, // echos #else {0, STT_SECTION, 0, 7, 0, 0}, {1, ST_GLOBAL_NOTYPE, 0, 6, 0, 0}, // main {6, ST_GLOBAL_NOTYPE, 0, 0, 0, 0}, // input {12, ST_GLOBAL_NOTYPE, 0, 0, 0, 0}, // echoi {18, ST_GLOBAL_NOTYPE, 0, 0, 0, 0}}, // echos #endif {{7, ELF_R_INFO(1,R_32)}}, // reltext {{0, ELF_R_INFO(3,1)}, // input {8, ELF_R_INFO(4,1)}, // echoi {16, ELF_R_INFO(5,1)}} // echos }; write(1, &elf, sizeof(elf)); //   write(1, prolog, sizeof(prolog)); foreach(i, pcode) write(1, &*i->code.begin(), i->code.size()); //   dword const char zero[24] = {}; write(1, zero, alignment); //   write(1, zero, 24); //   foreach(i, strdata) write(1, (*i)->first.c_str(), (*i)->first.length()+1); //     ftruncate(1, sizeof(elf)+offset+laststr+lastspill*4); 


How it works?


In the invariable header, which we append to the generated code, a bunch of incomprehensible digits. What do they all mean?
Let's go in order.To confuse programmers, the Elf32_Sym and Elf64_Sym defined with the same fields, but in a different order. We have no choice but to write two variants of the code, and with the help of #ifdef choose one of them.

Rel type relocations consist of triples "offset of the field being relocated, symbol number, code of the binding type". The first relocation ( {7, ELF_R_INFO(1,R_32)} ) is in the prologue, at offset 7 from the beginning of the code; it specifies the base data address loaded into EBP/RBP . This relocation refers to symbol # 1, i.e. per section .data . On any architecture, it has a size of 32 bits. The binding type code is different here: R_386_32=1 for x86, and R_X86_64_32=10 for x64.
Three other relocations are in the data, getting the addresses of three imported functions by the offset of 0.8.16. These three relocations refer to the imported characters (Nos. 3, 4, 5), and use the binding code 1, always equal to the machine word ( R_386_32,R_X86_64_64 ).

Full compiler code: tyomitch.net.ru/jsk.y.elf.html

If you are interested in collecting binaries manually, I advise you to look at ways to reduce the size of ELF files , from practically useful tricks to crazy hacks; and at the same time examples of utterly terrible programs .
It turned out a file of 45 bytes in size: five times smaller than in assembler, and fifty times smaller than in C. We threw out of the file everything we could; and the fact that they could not throw out, we use simultaneously for two or three purposes.

Approximately half of the values ​​in this file somehow violate the ELF standard; a normal programmer would be ashamed to admit that such a program is the fruit of his hands. Amazingly, Linux agrees to assign a PID to this nightmare.

On the other hand, about every byte in this file I can explain why it is needed. Often you can say the same about the files you compiled?

What happened?


Now the final binary is made up of two independent components that we can compile separately.
 [tyomitch@home ~]$ as jskstd.s -o jskstd.o 
[tyomitch@home ~]$
[tyomitch@home ~]$ bison jsk.y
[tyomitch@home ~]$ c++ jsk.tab.c lex.yy.c -o jskc
[tyomitch@home ~]$
[tyomitch@home ~]$ ./jskc < test.jsk > code.o
[tyomitch@home ~]$ cc jskstd.o code.o
[tyomitch@home ~]$ ./a.out
0 1000,
500? (1=, 2=, 3=) 1
249? (1=, 2=, 3=) 3
! !
[tyomitch@home ~]$ objdump -d code.o
code.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <main>: 0: 55 push %rbp 1: 57 push %rdi 2: 56 push %rsi 3: 53 push %rbx 4: 48 c7 c5 00 00 00 00 mov $0x0,%rbp b: 33 c0 xor %eax,%eax d: b9 e8 03 00 00 mov $0x3e8,%ecx 12: 50 push %rax 13: 51 push %rcx 14: be 18 00 00 00 mov $0x18,%esi 19: ff 55 10 callq *0x10(%rbp) 1c: 59 pop %rcx 1d: 58 pop %rax 1e: 50 push %rax 1f: 51 push %rcx 20: be 00 00 00 00 mov $0x0,%esi 25: ff 55 08 callq *0x8(%rbp) 28: be 38 00 00 00 mov $0x38,%esi 2d: ff 55 10 callq *0x10(%rbp) 30: 59 pop %rcx 31: 51 push %rcx 32: be e8 03 00 00 mov $0x3e8,%esi 37: ff 55 08 callq *0x8(%rbp) 3a: be 3f 00 00 00 mov $0x3f,%esi 3f: ff 55 10 callq *0x10(%rbp) 42: 59 pop %rcx 43: 58 pop %rax 44: 3b c1 cmp %ecx,%eax 46: 0f 8f 6c 00 00 00 jg b8 <main+0xb8> 4c: 8d 14 01 lea (%rcx,%rax,1),%edx 4f: d1 fa sar %edx 51: 50 push %rax 52: 51 push %rcx 53: 52 push %rdx 54: be 64 00 00 00 mov $0x64,%esi 59: ff 55 10 callq *0x10(%rbp) 5c: 5a pop %rdx 5d: 52 push %rdx 5e: 8b f2 mov %edx,%esi 60: ff 55 08 callq *0x8(%rbp) 63: be 6c 00 00 00 mov $0x6c,%esi 68: ff 55 10 callq *0x10(%rbp) 6b: ff 55 00 callq *0x0(%rbp) 6e: 93 xchg %eax,%ebx 6f: 5a pop %rdx 70: 59 pop %rcx 71: 58 pop %rax 72: 89 85 04 01 00 00 mov %eax,0x104(%rbp) 78: 83 fb 01 cmp $0x1,%ebx 7b: 75 0b jne 88 <main+0x88> 7d: 8b 85 04 01 00 00 mov 0x104(%rbp),%eax 83: 8d 4a ff lea 0xffffffffffffffff(%rdx),%ecx 86: eb bc jmp 44 <main+0x44> 88: 83 fb 02 cmp $0x2,%ebx 8b: 75 05 jne 92 <main+0x92> 8d: 8d 42 01 lea 0x1(%rdx),%eax 90: eb b2 jmp 44 <main+0x44> 92: 8b 85 04 01 00 00 mov 0x104(%rbp),%eax 98: 83 fb 03 cmp $0x3,%ebx 9b: 75 0d jne aa <main+0xaa> 9d: be 9f 00 00 00 mov $0x9f,%esi a2: ff 55 10 callq *0x10(%rbp) a5: 5b pop %rbx a6: 5e pop %rsi a7: 5f pop %rdi a8: 5d pop %rbp a9: c3 retq aa: 50 push %rax ab: 51 push %rcx ac: be bb 00 00 00 mov $0xbb,%esi b1: ff 55 10 callq *0x10(%rbp) b4: 59 pop %rcx b5: 58 pop %rax b6: eb 8c jmp 44 <main+0x44> b8: be dd 00 00 00 mov $0xdd,%esi bd: ff 55 10 callq *0x10(%rbp) c0: 5b pop %rbx c1: 5e pop %rsi c2: 5f pop %rdi c3: 5d pop %rbp c4: c3 retq
, , , llvm. — .
[tyomitch@home ~]$ as jskstd.s -o jskstd.o
[tyomitch@home ~]$
[tyomitch@home ~]$ bison jsk.y
[tyomitch@home ~]$ c++ jsk.tab.c lex.yy.c -o jskc
[tyomitch@home ~]$
[tyomitch@home ~]$ ./jskc < test.jsk > code.o
[tyomitch@home ~]$ cc jskstd.o code.o
[tyomitch@home ~]$ ./a.out
0 1000,
500? (1=, 2=, 3=) 1
249? (1=, 2=, 3=) 3
! !
[tyomitch@home ~]$ objdump -d code.o
code.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <main>: 0: 55 push %rbp 1: 57 push %rdi 2: 56 push %rsi 3: 53 push %rbx 4: 48 c7 c5 00 00 00 00 mov $0x0,%rbp b: 33 c0 xor %eax,%eax d: b9 e8 03 00 00 mov $0x3e8,%ecx 12: 50 push %rax 13: 51 push %rcx 14: be 18 00 00 00 mov $0x18,%esi 19: ff 55 10 callq *0x10(%rbp) 1c: 59 pop %rcx 1d: 58 pop %rax 1e: 50 push %rax 1f: 51 push %rcx 20: be 00 00 00 00 mov $0x0,%esi 25: ff 55 08 callq *0x8(%rbp) 28: be 38 00 00 00 mov $0x38,%esi 2d: ff 55 10 callq *0x10(%rbp) 30: 59 pop %rcx 31: 51 push %rcx 32: be e8 03 00 00 mov $0x3e8,%esi 37: ff 55 08 callq *0x8(%rbp) 3a: be 3f 00 00 00 mov $0x3f,%esi 3f: ff 55 10 callq *0x10(%rbp) 42: 59 pop %rcx 43: 58 pop %rax 44: 3b c1 cmp %ecx,%eax 46: 0f 8f 6c 00 00 00 jg b8 <main+0xb8> 4c: 8d 14 01 lea (%rcx,%rax,1),%edx 4f: d1 fa sar %edx 51: 50 push %rax 52: 51 push %rcx 53: 52 push %rdx 54: be 64 00 00 00 mov $0x64,%esi 59: ff 55 10 callq *0x10(%rbp) 5c: 5a pop %rdx 5d: 52 push %rdx 5e: 8b f2 mov %edx,%esi 60: ff 55 08 callq *0x8(%rbp) 63: be 6c 00 00 00 mov $0x6c,%esi 68: ff 55 10 callq *0x10(%rbp) 6b: ff 55 00 callq *0x0(%rbp) 6e: 93 xchg %eax,%ebx 6f: 5a pop %rdx 70: 59 pop %rcx 71: 58 pop %rax 72: 89 85 04 01 00 00 mov %eax,0x104(%rbp) 78: 83 fb 01 cmp $0x1,%ebx 7b: 75 0b jne 88 <main+0x88> 7d: 8b 85 04 01 00 00 mov 0x104(%rbp),%eax 83: 8d 4a ff lea 0xffffffffffffffff(%rdx),%ecx 86: eb bc jmp 44 <main+0x44> 88: 83 fb 02 cmp $0x2,%ebx 8b: 75 05 jne 92 <main+0x92> 8d: 8d 42 01 lea 0x1(%rdx),%eax 90: eb b2 jmp 44 <main+0x44> 92: 8b 85 04 01 00 00 mov 0x104(%rbp),%eax 98: 83 fb 03 cmp $0x3,%ebx 9b: 75 0d jne aa <main+0xaa> 9d: be 9f 00 00 00 mov $0x9f,%esi a2: ff 55 10 callq *0x10(%rbp) a5: 5b pop %rbx a6: 5e pop %rsi a7: 5f pop %rdi a8: 5d pop %rbp a9: c3 retq aa: 50 push %rax ab: 51 push %rcx ac: be bb 00 00 00 mov $0xbb,%esi b1: ff 55 10 callq *0x10(%rbp) b4: 59 pop %rcx b5: 58 pop %rax b6: eb 8c jmp 44 <main+0x44> b8: be dd 00 00 00 mov $0xdd,%esi bd: ff 55 10 callq *0x10(%rbp) c0: 5b pop %rbx c1: 5e pop %rsi c2: 5f pop %rdi c3: 5d pop %rbp c4: c3 retq
, , , llvm. — .

Source: https://habr.com/ru/post/104104/


All Articles