
Part 0. Elf wanted for work in the Matrix. Relocation possible

Warning: contains system programming. In fact, it contains essentially nothing else.


Imagine you were given the task of writing a fantasy game. You know, about elves. And about virtual reality. You have dreamed of writing something like that since childhood, so you agree without hesitation. Soon you realize that what you know about the world of elves comes mostly from jokes on the old bashorg and other scattered sources. Oops, a mismatch. Oh well, we have been through worse... Taught by rich programming experience, you go to Google, type in "Elf specification" and follow the links. Oh! This one leads to some PDF... so, what do we have here... some Elf32_Sword, elven swords, looks like just what you need. 32 is apparently the character level, and the two fours in the next columns must be damage. Exactly what is needed, and so well systematized!..


As one olympiad programming problem put it after a couple of paragraphs of detailed text about Japan, samurai and geishas: “As you have already understood, the problem is not about that at all.” Oh yes, and the contest was, naturally, timed. Anyway, I declare the five-minute joke break over.


Today I will try to tell you about parsing a file in the 64-bit ELF format. It stores just about everything: native programs, object files (and hence the contents of static libraries), dynamic libraries, and implementation-specific things such as crash dumps... It is used, for example, on Linux and many other Unix-like systems, and they say that even on phones support for it was once actively added in patched firmware. It would seem that supporting the storage format for programs of serious operating systems must be difficult. So I thought. And it probably is. But we will support a very specific use case: loading eBPF bytecode from .o files. Why? For further experiments I will need some serious (that is, not cobbled together on my knee) cross-platform bytecode that can be obtained from C rather than written by hand, and eBPF is simple and has an LLVM backend. ELF I need to parse merely as the container into which the compiler puts this bytecode.


Just in case, let me clarify: this article is exploratory programming and does not claim to be a comprehensive guide. The ultimate goal is a loader that can read C programs compiled to eBPF with Clang (the ones I have, at least) to an extent sufficient to continue the experiments.


Header


The header sits at offset zero in the ELF file. It contains those very letters E, L, F that you can see if you open the file in a text editor, plus some global values. In fact, the header is the only structure in the file located at a fixed offset, and it contains the information needed to find the remaining structures. (Here and below I rely on the documentation for the 32-bit format and on elf.h, which knows about 64-bit as well. So if you notice errors, feel free to correct me.)


The first thing we meet in the file is the unsigned char e_ident[16] field. Remember those amusing articles from the “all of the following statements are false” series? It is about the same here: an ELF file can contain 32-bit or 64-bit code, little- or big-endian, for a dozen or so processor architectures. You were going to read it as Elf64 on little-endian? Well, good luck... This byte array is a kind of signature telling what is inside and how to parse it.


The first four bytes are simple: [0x7f, 'E', 'L', 'F']. If they do not match, there is reason to believe these are some kind of wrong bees. The next byte holds the file class, ELFCLASS32 or ELFCLASS64, that is, the bitness. For simplicity we will work only with 64-bit files (is there even a 32-bit eBPF?). If the class turns out to be ELFCLASS32, we simply exit with an error: the structure layouts would not line up anyway, and a sanity check does not hurt. The last byte of interest in this array indicates the endianness of the file; we will work only with the byte order “native” to our processor.


Just in case, let me clarify: when working with ELF in C you should not read every int at a cleverly computed offset. elf.h contains the necessary structures and even the byte indices within e_ident: EI_MAG0, EI_MAG1, EI_MAG2, EI_MAG3, EI_CLASS, EI_DATA... You just cast the pointer to the data read or mapped from the file to a pointer to the structure and read the fields.
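
The e_ident checks described above might look roughly like this. It is a minimal sketch, not the article's actual code: the names check_ident and native_ei_data are made up for illustration, and the only things taken from elf.h are Elf64_Ehdr, the EI_* indices and the ELFMAG/ELFCLASS/ELFDATA constants.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the byte order of the current CPU as an EI_DATA value. */
static unsigned char native_ei_data(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);          /* lowest-addressed byte of the value 1 */
    return first == 1 ? ELFDATA2LSB : ELFDATA2MSB;
}

/* Hypothetical helper: 0 if the buffer starts with a 64-bit, native-endian
 * ELF header, -1 otherwise. `data` must be suitably aligned for Elf64_Ehdr. */
static int check_ident(const void *data, size_t size)
{
    if (size < sizeof(Elf64_Ehdr))
        return -1;
    const Elf64_Ehdr *hdr = data;       /* just cast and read, no manual offsets */
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                      /* wrong bees */
    if (hdr->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;                      /* 32-bit layouts would not line up */
    if (hdr->e_ident[EI_DATA] != native_ei_data())
        return -1;                      /* only the native byte order is supported */
    return 0;
}
```

Returning an error code instead of asserting is just a choice made for this sketch; given the trust assumptions of the article, an assertion would do as well.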


Besides e_ident, the header contains other fields; some we merely check, and some we use for further parsing, but more on that later. Namely, we check that e_machine == EM_BPF (that is, the file is “for the eBPF processor architecture”), e_type == ET_REL, and e_shoff != 0. The last check means the following: a file may contain information for linking (the section table and sections), for execution (the program header table and segments), or both. With the last two checks we make sure that the information we need (the linking view) is present in the file. We also verify that the format version is EV_CURRENT.
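
As a rough illustration, these header checks fit into one small function. The name check_header_fields is hypothetical; the fallback definition of EM_BPF (247) is only there in case an older elf.h lacks it.

```c
#include <assert.h>
#include <elf.h>

#ifndef EM_BPF
#define EM_BPF 247              /* fallback for older elf.h versions */
#endif

/* Hypothetical helper: 0 if the header describes a relocatable eBPF object
 * that has a section table, -1 otherwise. */
static int check_header_fields(const Elf64_Ehdr *hdr)
{
    if (hdr->e_machine != EM_BPF)
        return -1;              /* not "for the eBPF processor architecture" */
    if (hdr->e_type != ET_REL)
        return -1;              /* we expect a .o file, not an executable */
    if (hdr->e_shoff == 0)
        return -1;              /* no section table, hence no linking view */
    if (hdr->e_version != EV_CURRENT)
        return -1;
    return 0;
}
```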


A caveat right away: I will not validate the file, assuming that if we load it into our own process, we trust it. In kernel code, or in other programs that work with untrusted files, this is of course completely unacceptable.


Section table


As I said, we are interested in the linking view of the file, that is, the section table and the sections themselves. The header tells where to find the section table. It also gives the table's size (the number of entries, e_shnum) and the size of one entry (e_shentsize), which may be larger than sizeof(Elf64_Shdr) (how that would affect the format version number I honestly do not know). Some of the highest section numbers are reserved and are not actually present in the table. Referring to them has a special meaning. We are apparently interested only in SHN_UNDEF (zero is reserved too, it denotes the missing section; by the way, its header is nevertheless present in the table) and SHN_ABS. A symbol “defined in section SHN_UNDEF” is in fact undefined, while one in SHN_ABS has an absolute value and is not relocated. However, I do not seem to need SHN_ABS yet.
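
Walking the section table could be sketched as follows. The helper names section_header and count_sections_of_type are invented for this example; the point is only that entries are stepped over by e_shentsize rather than by sizeof(Elf64_Shdr), and that index 0 is the reserved null section.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: header of section `index` in a file image that
 * starts at `file` (the whole file is assumed to be read or mapped). */
static const Elf64_Shdr *section_header(const void *file, size_t index)
{
    const Elf64_Ehdr *hdr = file;
    assert(hdr->e_shentsize >= sizeof(Elf64_Shdr));
    assert(index < hdr->e_shnum);
    const unsigned char *table = (const unsigned char *)file + hdr->e_shoff;
    /* Entries may be larger than the structure we know about. */
    return (const Elf64_Shdr *)(table + index * hdr->e_shentsize);
}

/* Hypothetical helper: number of sections with the given sh_type. */
static size_t count_sections_of_type(const void *file, uint32_t type)
{
    const Elf64_Ehdr *hdr = file;
    size_t count = 0;
    for (size_t i = 1; i < hdr->e_shnum; ++i)   /* skip the null section */
        if (section_header(file, i)->sh_type == type)
            ++count;
    return count;
}
```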


String table


Here we first run into string tables, the tables of strings used in the file. In essence, if const char *strtab is a string table, then the name referred to by sh_name is simply strtab + sh_name. Yes, it is just a string that starts at a given index and continues up to a zero byte. Strings may overlap (more precisely, one may be a suffix of another). Sections can have names: the e_shstrndx field in the ELF header points to the string table section (the one for section names, if there are several), and the sh_name field in a section header points to a particular string in it.


The first (zeroth) and the last bytes of a string table are null characters. The last one is obvious: like a sentry, it terminates the last string. Offset zero, in turn, denotes a missing or empty name, depending on the context.
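
Looking up a section name then takes only a few lines. This is a sketch with a made-up function name, section_name, and it assumes the whole file image is available in memory starting at `file`.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: name of section `sec`, taken from the section-name
 * string table that e_shstrndx points to. */
static const char *section_name(const void *file, const Elf64_Shdr *sec)
{
    const unsigned char *base = file;
    const Elf64_Ehdr *hdr = file;
    const Elf64_Shdr *shstr = (const Elf64_Shdr *)
        (base + hdr->e_shoff + (size_t)hdr->e_shstrndx * hdr->e_shentsize);
    const char *strtab = (const char *)base + shstr->sh_offset;
    assert(sec->sh_name < shstr->sh_size);
    return strtab + sec->sh_name;   /* runs up to the next zero byte */
}
```

With a table holding the bytes "\0.rel.text\0", offset 1 yields ".rel.text" and offset 5 yields ".text", which is exactly the suffix sharing mentioned above.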


Loading sections


The header of each section contains two addresses: sh_addr is the load address (where the section will be placed in memory), and sh_offset is the offset in the file where the section is stored. I do not know about both at once, but either of these values may be 0: in one case the section “stays on disk”, since it holds some service information. In the other, the section is not loaded from disk; for example, it only needs to be allocated and filled with zeros (.bss). Honestly, so far I have not had to handle the load address: wherever it got loaded, there it is :) Then again, our programs are, admittedly, rather specific.
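
A loader in this spirit could copy each section into freshly allocated memory, as in the sketch below. The name load_section is hypothetical. As an assumption of the sketch (not something the text above states), it recognizes "allocate and zero-fill" sections by sh_type == SHT_NOBITS, which is the standard marker for sections such as .bss, and it ignores sh_addr entirely.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the section contents,
 * or zero-filled memory for SHT_NOBITS sections. NULL if allocation fails.
 * The caller owns the result and releases it with free(). */
static void *load_section(const void *file, const Elf64_Shdr *sec)
{
    size_t size = sec->sh_size ? (size_t)sec->sh_size : 1; /* avoid a 0-byte request */
    void *mem = calloc(1, size);                           /* already zeroed */
    if (mem == NULL)
        return NULL;
    if (sec->sh_type != SHT_NOBITS)                        /* has bytes in the file */
        memcpy(mem, (const unsigned char *)file + sec->sh_offset,
               (size_t)sec->sh_size);
    return mem;
}
```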


Relocation


And now the interesting part: as is well known, safety rules forbid going into the Matrix without an operator remaining at the base. And since we still have fantasy here, the connection with the operator will be telepathic. Oh yes, I did declare the joke break over. In short, let us briefly discuss the linking process.


For my experiment I will need a piece of code compiled into an ordinary .so, loaded with the ordinary libdl. I will not even describe this in detail: just open it with dlopen, pull out symbols with dlsym, and close it with dlclose when the program finishes. However, even that is an implementation detail unrelated to our ELF loader. There is simply a context: the ability to get a pointer by name.


In general, the eBPF instruction set is a triumph of aligned machine code: an instruction always takes 8 bytes and has the structure


    struct {
        uint8_t opcode;
        uint8_t dst:4;
        uint8_t src:4;
        uint16_t offset;
        uint32_t imm;
    };

Moreover, many fields may go unused in any particular instruction: saving space in “machine” code is not our thing.


In fact, the first instruction may be immediately followed by a second one that contains no opcode and merely extends the immediate field from 32 to 64 bits. Patching such a compound instruction is the relocation called R_BPF_64_64.
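
Patching the two-slot instruction can be sketched like this. The struct mirrors the layout shown above under an invented name (ebpf_insn), patch_imm64 is a hypothetical helper, and the opcode value 0x18 is the one visible in the disassembly of the 64-bit load further down. The sketch treats the value already stored in the instruction as the addend, as REL-style relocations do.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the structure above; the name is made up for this sketch. */
struct ebpf_insn {
    uint8_t  opcode;
    uint8_t  dst:4;
    uint8_t  src:4;
    uint16_t offset;
    uint32_t imm;
};
_Static_assert(sizeof(struct ebpf_insn) == 8, "an eBPF instruction is 8 bytes");

#define EBPF_OP_LDDW 0x18   /* first half of the 64-bit immediate load */

/* Hypothetical helper: adds `addr` to the 64-bit immediate that is split
 * across insn[0].imm (low half) and insn[1].imm (high half). */
static void patch_imm64(struct ebpf_insn *insn, uint64_t addr)
{
    assert(insn[0].opcode == EBPF_OP_LDDW);
    assert(insn[1].opcode == 0);            /* the second slot has no opcode */
    uint64_t value = ((uint64_t)insn[1].imm << 32) | insn[0].imm;
    value += addr;
    insn[0].imm = (uint32_t)value;
    insn[1].imm = (uint32_t)(value >> 32);
}
```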


To perform relocations, we go through the section table once more looking for sh_type == SHT_REL. The sh_info field of the header indicates which section we are patching, and sh_link says which table to take the symbol descriptions from.


    typedef struct {
        Elf64_Addr r_offset;
        Elf64_Xword r_info;
    } Elf64_Rel;

Actually, there are two types of relocation sections, REL and RELA. The second one explicitly contains an additional addend, but I have not come across it yet, so I will just add an assertion that it really does not occur and process only REL. Then I add the address of the symbol to the value written in the instruction. And where do we get that address? Here, as we already know, there are several options, depending on the section in which the symbol is defined (recall SHN_UNDEF and SHN_ABS above).
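
One possible way to resolve the symbol behind a relocation entry is sketched below. Everything named here is an assumption made for illustration: reloc_symbol_address, the resolve_fn callback type (standing in for the "pointer by name" context, for instance a wrapper around dlsym), the stub_resolve stand-in, and the section_base array of addresses where sections were loaded. The fallback value 1 for R_BPF_64_64 is used only if elf.h does not define it.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

#ifndef R_BPF_64_64
#define R_BPF_64_64 1       /* fallback for elf.h versions without BPF relocations */
#endif

/* Hypothetical "context": returns the address of an external symbol by name. */
typedef uint64_t (*resolve_fn)(const char *name);

/* Stand-in resolver for the example; a real one might wrap dlsym. */
static uint64_t stub_resolve(const char *name)
{
    return strcmp(name, "z") == 0 ? 0x1000 : 0;
}

/* Hypothetical helper: the address to add to the instruction at rel->r_offset.
 * symtab/strtab come from the section named by sh_link of the REL section;
 * section_base[i] is where section i was loaded. */
static uint64_t reloc_symbol_address(const Elf64_Rel *rel,
                                     const Elf64_Sym *symtab,
                                     const char *strtab,
                                     void *const *section_base,
                                     resolve_fn resolve)
{
    assert(ELF64_R_TYPE(rel->r_info) == R_BPF_64_64);
    const Elf64_Sym *sym = &symtab[ELF64_R_SYM(rel->r_info)];
    switch (sym->st_shndx) {
    case SHN_UNDEF:         /* undefined here: ask the context by name */
        return resolve(strtab + sym->st_name);
    case SHN_ABS:           /* absolute value, not relocated */
        return sym->st_value;
    default:                /* defined in a loaded section: base + offset */
        return (uint64_t)(uintptr_t)section_base[sym->st_shndx] + sym->st_value;
    }
}
```

The returned address would then be added to the immediate of the instruction located at r_offset within the section named by sh_info.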



How to try it yourself


First, what to read? Besides the specification already mentioned, it makes sense to read this file, in which the iovisor team collects information about eBPF extracted from the Linux kernel.


Second, how to work with all this? First you need to get an ELF file from somewhere. As stated on Stack Overflow, the following command will help us:


 clang -O2 -emit-llvm -c bpf.c -o - | llc -march=bpf -filetype=obj -o bpf.o 

Next, you need some reference breakdown of the file into its parts. The objdump command would seem to help:


    $ objdump
    Usage: objdump <option(s)> <file(s)>
     Display information from object <file(s)>.
     At least one of the following switches must be given:
      -a, --archive-headers    Display archive header information
      -f, --file-headers       Display the contents of the overall file header
      -p, --private-headers    Display object format specific file header contents
      -P, --private=OPT,OPT... Display object format specific contents
      -h, --[section-]headers  Display the contents of the section headers
      -x, --all-headers        Display the contents of all headers
      -d, --disassemble        Display assembler contents of executable sections
      -D, --disassemble-all    Display assembler contents of all sections
          --disassemble=<sym>  Display assembler contents from <sym>
      -S, --source             Intermix source code with disassembly
      -s, --full-contents      Display the full contents of all sections requested
      -g, --debugging          Display debug information in object file
      -e, --debugging-tags     Display debug information using ctags style
      -G, --stabs              Display (in raw form) any STABS info in the file
      -W[lLiaprmfFsoRtUuTgAckK] or
      --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,
              =frames-interp,=str,=loc,=Ranges,=pubtypes,
              =gdb_index,=trace_info,=trace_abbrev,=trace_aranges,
              =addr,=cu_index,=links,=follow-links]
                               Display DWARF info in the file
      -t, --syms               Display the contents of the symbol table(s)
      -T, --dynamic-syms       Display the contents of the dynamic symbol table
      -r, --reloc              Display the relocation entries in the file
      -R, --dynamic-reloc      Display the dynamic relocation entries in the file
      @<file>                  Read options from <file>
      -v, --version            Display this program's version number
      -i, --info               List object formats and architectures supported
      -H, --help               Display this information

But in this case, it is powerless:


    $ objdump -d test-bpf.o

    test-bpf.o:     file format elf64-little

    objdump: can't disassemble for architecture UNKNOWN!

More precisely, it will show the sections, but disassembly is a problem. Here we recall that we built the file with LLVM. And LLVM has its own extended counterparts of the binutils utilities, with names like llvm-<name>. They understand LLVM bitcode, for example. They also understand eBPF; this probably depends on build options, but since the file managed to compile, it should presumably always be possible to disassemble it too. So, for convenience, I recommend creating a script:


    vim test-bpf.c
    clang -Oz -emit-llvm -c test-bpf.c -o - | llc -march=bpf -filetype=obj -o test-bpf.o
    llvm-objdump -d -t -r test-bpf.o

Then for this source:


    #include <stdint.h>

    extern uint64_t z;

    uint64_t func(uint64_t x, uint64_t y)
    {
        return x + y + z;
    }

The result will be:


    $ ./compile-bpf.sh

    test-bpf.o:     file format ELF64-BPF

    Disassembly of section .text:
    0000000000000000 func:
           0:       bf 20 00 00 00 00 00 00         r0 = r2
           1:       0f 10 00 00 00 00 00 00         r0 += r1
           2:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
                    0000000000000010:  R_BPF_64_64  z
           4:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
           5:       0f 10 00 00 00 00 00 00         r0 += r1
           6:       95 00 00 00 00 00 00 00         exit

    SYMBOL TABLE:
    0000000000000000 l    df *ABS*  00000000 test-bpf.c
    0000000000000000 l    d  .text  00000000 .text
    0000000000000000 g     F .text  00000038 func
    0000000000000000         *UND*  00000000 z

Code


Part 1. QInst: better to lose a day, then fly there in five minutes (writing instrumentation trivially)



Source: https://habr.com/ru/post/452592/

