
Embedding into the Linux kernel: intercepting functions

Intercepting kernel functions is a basic technique that lets you redefine or extend various kernel mechanisms. Because the Linux kernel is written almost entirely in C, apart from small architecture-specific parts, being able to intercept the corresponding functions is enough to embed into most kernel components.

This article continues the previously announced series on specific issues in implementing add-on protection tools and, in particular, on embedding into software systems.


The purpose of intercepting a function is to gain control at the moment it is called. What happens next depends on the task: in some cases the system implementation of an algorithm has to be replaced with your own, in others it only has to be supplemented. In either case it is important to keep the ability to call the intercepted function for your own purposes.

The traditional approach to interception is the “wrapper” concept, which allows pre- and post-processing to be implemented while preserving access to the original functionality of the intercepted function.
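The wrapper idea can be sketched in ordinary user-space C, without any patching. This is only an illustration under stated assumptions: check_permission, wrapped_permission and install_wrapper are hypothetical names, and the "original" function is reached through a saved function pointer rather than through a patched prologue.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by the original function and its wrapper. */
typedef int (*perm_fn)(int inode_id, int mask);

static perm_fn original_fn;  /* saved pointer to the original function */
static int calls_seen;       /* updated during pre-processing */
static int denials_seen;     /* updated during post-processing */

/* Hypothetical stand-in for a system function:
 * denies write access (mask bit 0x2) to inode 1. */
static int check_permission(int inode_id, int mask)
{
    return (inode_id == 1 && (mask & 0x2)) ? -13 : 0;  /* -13 is -EACCES on Linux */
}

/* The wrapper: pre-processing, call to the original, post-processing. */
static int wrapped_permission(int inode_id, int mask)
{
    int result;

    assert(original_fn != NULL);

    calls_seen++;                           /* pre-processing */
    result = original_fn(inode_id, mask);   /* original functionality */
    if (result != 0)
        denials_seen++;                     /* post-processing */

    return result;
}

/* Remember the original and hand back the wrapper to call instead. */
static perm_fn install_wrapper(perm_fn original)
{
    original_fn = original;
    return wrapped_permission;
}
```

Callers that use the pointer returned by install_wrapper see the same results as before, while the wrapper observes every call.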

Most interception methods are based on patching, that is, modifying the kernel code so that control is transferred to the interceptor function when the target function is called. Thanks to the rich instruction set of the x86 architecture, there are several ways to change the flow of execution (JMP is only one of them).

Method of interception



The method described here modifies the prologue (the beginning) of the target function so that executing it transfers control to the handler function.

In other words, for each target function we overwrite the start of its prologue with a JMP instruction, which redirects execution from the target function to the corresponding handler.

For example, suppose that before interception the inode_permission function looks like this:

    inode_permission:
       0xffffffff811c4530 <+0>:   nopl   0x0(%rax,%rax,1)
       0xffffffff811c4535 <+5>:   push   %rbp
       0xffffffff811c4536 <+6>:   test   $0x2,%sil
       0xffffffff811c453a <+10>:  mov    0x28(%rdi),%rax
       0xffffffff811c453e <+14>:  mov    %rsp,%rbp
       0xffffffff811c4541 <+17>:  jne    0xffffffff811c454a <inode_permission+26>
       0xffffffff811c4543 <+19>:  callq  0xffffffff811c4470 <__inode_permission>


Then, after interception, the prologue of the function looks like this:

    inode_permission:
       0xffffffff811c4530 <+0>:   jmpq   0xffffffffa05a60e0
    => 0xffffffff811c4535 <+5>:   push   %rbp
       0xffffffff811c4536 <+6>:   test   $0x2,%sil
       0xffffffff811c453a <+10>:  mov    0x28(%rdi),%rax
       0xffffffff811c453e <+14>:  mov    %rsp,%rbp
       0xffffffff811c4541 <+17>:  jne    0xffffffff811c454a <inode_permission+26>
       0xffffffff811c4543 <+19>:  callq  0xffffffff811c4470 <__inode_permission>


The original instructions at the start of the function are overwritten with a five-byte JMP instruction, encoded as E9 XX XX XX XX, which transfers control to the handler. This is the essence of the interception method. Next, we consider some features of its implementation in the Linux kernel.
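The four bytes after the E9 opcode are a signed 32-bit displacement counted from the end of the JMP instruction. A minimal user-space sketch of the encoding follows; encode_jmp_rel32 is a hypothetical helper written for this illustration, not the module's own x86_put_jmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5  /* 1 opcode byte + 4 displacement bytes */

/*
 * Encode "jmp rel32" into buf. `from` is the address at which the
 * instruction will be placed, `to` is the jump destination.
 * Returns 0 on success, -1 if the distance does not fit in 32 bits.
 */
static int encode_jmp_rel32(uint8_t *buf, uint64_t from, uint64_t to)
{
    /* The displacement is relative to the address of the next instruction. */
    int64_t disp = (int64_t)(to - (from + JMP_REL32_LEN));
    int i;

    assert(buf != NULL);

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    buf[0] = 0xE9;  /* JMP rel32 opcode */
    for (i = 0; i < 4; i++)  /* little-endian displacement */
        buf[1 + i] = (uint8_t)((uint32_t)disp >> (8 * i));

    return 0;
}
```

For the listing above, the jump from 0xffffffff811c4530 to 0xffffffffa05a60e0 has the displacement 0x1f3e1bab, so the bytes written are E9 AB 1B 3E 1F.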

Features of the interception of functions



As noted, patching means modifying kernel code. The main problem is that memory pages containing code cannot simply be written to: the x86 architecture has a protection mechanism under which an attempt to write to a write-protected memory area raises an exception. This mechanism is called “page protection” and underlies many kernel features, such as COW (copy-on-write). The processor's behavior in this situation is determined by the WP bit of the CR0 register, and page access permissions are described in the corresponding page table entry (PTE). When the WP bit of CR0 is set, an attempt to write to a write-protected page (the RW bit in its PTE is cleared) makes the processor raise a page fault exception (#PF).
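The interaction between the WP bit and the RW bit can be written down as a small truth-table model. This is a deliberately simplified sketch: it looks at a single PTE only and ignores the upper page-table levels and other protection features. The constant names are chosen for this illustration; the bit positions (CR0 bit 16, PTE bits 0 and 1) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_WP       (UINT64_C(1) << 16)  /* CR0.WP: write protect */
#define PTE_PRESENT  (UINT64_C(1) << 0)   /* page is present */
#define PTE_RW       (UINT64_C(1) << 1)   /* page is writable */

/* Would a supervisor-mode (kernel) write to this page fault? */
static bool supervisor_write_faults(uint64_t cr0, uint64_t pte)
{
    if (!(pte & PTE_PRESENT))
        return true;   /* not mapped at all */
    if (pte & PTE_RW)
        return false;  /* writable page */
    /* Read-only page: the kernel is stopped only while CR0.WP is set. */
    return (cr0 & CR0_WP) != 0;
}

/* The cases discussed in the text, as executable checks. */
static void wp_model_examples(void)
{
    assert(supervisor_write_faults(CR0_WP, PTE_PRESENT));            /* read-only, WP set */
    assert(!supervisor_write_faults(0, PTE_PRESENT));                /* read-only, WP clear */
    assert(!supervisor_write_faults(CR0_WP, PTE_PRESENT | PTE_RW));  /* writable page */
}
```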

A common solution is to temporarily turn off page protection by clearing the WP bit of CR0. This approach works, but it should be used with caution because, as noted, page protection underlies many kernel mechanisms. In addition, CR0 is a per-processor register: on SMP systems, a thread that has cleared the WP bit on one processor can be preempted at that point and moved to another processor, where the bit is still set.

A better and fairly universal approach is to create temporary mappings. Because of the way the MMU works, several page table entries with different attributes can refer to the same physical memory frame. This makes it possible to create a writable mapping for the target memory area. This method is used in the Ksplice project (a fork is available on GitHub). Below is the map_writable function, which creates such a mapping:

    /*
     * map_writable creates a shadow page mapping of the range
     * [addr, addr + len) so that we can write to code mapped read-only.
     *
     * It is similar to a generalized version of x86's text_poke. But
     * because one cannot use vmalloc/vfree() inside stop_machine, we use
     * map_writable to map the pages before stop_machine, then use the
     * mapping inside stop_machine, and unmap the pages afterwards.
     *
     * STOLEN from: https://github.com/jirislaby/ksplice
     */
    static void *map_writable(void *addr, size_t len)
    {
        void *vaddr;
        int nr_pages = DIV_ROUND_UP(offset_in_page(addr) + len, PAGE_SIZE);
        struct page **pages = kmalloc(nr_pages * sizeof(*pages), GFP_KERNEL);
        void *page_addr = (void *)((unsigned long)addr & PAGE_MASK);
        int i;

        if (pages == NULL)
            return NULL;

        for (i = 0; i < nr_pages; i++) {
            if (__module_address((unsigned long)page_addr) == NULL) {
                pages[i] = virt_to_page(page_addr);
                WARN_ON(!PageReserved(pages[i]));
            } else {
                pages[i] = vmalloc_to_page(page_addr);
            }
            if (pages[i] == NULL) {
                kfree(pages);
                return NULL;
            }
            page_addr += PAGE_SIZE;
        }

        vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
        kfree(pages);
        if (vaddr == NULL)
            return NULL;

        return vaddr + offset_in_page(addr);
    }


This function creates a writable mapping for any memory area. A mapping created this way is released with vunmap, the counterpart of vmap, whose argument must be the page-aligned address of the mapping. Additional information on this method of modifying write-protected pages is given in a separate article.
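The same idea, two mappings with different permissions onto the same physical pages, can be tried from user space. The sketch below is an analogy rather than kernel code: it assumes a POSIX system, uses a temporary file as the shared backing object, and the dual_map_* names are hypothetical. The read-only view plays the role of the protected kernel text, the writable view plays the role of the alias returned by map_writable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same pages. */
struct dual_map {
    const unsigned char *ro;  /* read-only view */
    unsigned char *rw;        /* writable alias of the same pages */
    size_t len;
};

/* Returns 0 on success, -1 on failure. */
static int dual_map_create(struct dual_map *m, size_t len)
{
    FILE *backing;
    void *ro, *rw;
    int fd;

    assert(m != NULL && len > 0);

    backing = tmpfile();  /* anonymous backing object */
    if (backing == NULL)
        return -1;
    fd = fileno(backing);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(backing);
        return -1;
    }

    /* Same file offset, different protections. */
    ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(backing);  /* existing mappings stay valid after close */

    if (ro == MAP_FAILED || rw == MAP_FAILED) {
        if (ro != MAP_FAILED)
            munmap(ro, len);
        if (rw != MAP_FAILED)
            munmap(rw, len);
        return -1;
    }

    m->ro = ro;
    m->rw = rw;
    m->len = len;
    return 0;
}

static void dual_map_destroy(struct dual_map *m)
{
    munmap((void *)m->ro, m->len);
    munmap(m->rw, m->len);
}
```

A byte stored through rw becomes visible through ro, although a direct store through ro would fault.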

The next important point is that patching inevitably overwrites part of the target function's prologue. This does not matter if you do not intend to use the function afterwards. However, if the algorithm implemented by the target function may still be useful after patching, you need a way to execute the “old” code despite the damaged prologue.

The figure below shows schematically how a function is intercepted while preserving access to its original functionality.

[Figure: control flow between the target function, the interceptor and the saved prologue, steps 1 to 4]


In the figure, number 1 marks the transfer of control from the target function to the interceptor (a JMP instruction); number 2, the interceptor's call to the original function through the saved part of the prologue (a CALL instruction); number 3, the jump from the saved prologue back to the unmodified part of the original function (a JMP instruction); and number 4, the return of control to the interceptor once the original function completes (a RET instruction). In this way the functionality of the intercepted function remains available.

Implementing Interception Functions



We describe each intercepted function with the following structure:

    typedef struct {
        /* target's name */
        char * name;

        /* target's insn length */
        int length;

        /* target's handler address */
        void * handler;

        /* target's address and rw-mapping */
        void * target;
        void * target_map;

        /* origin's address and rw-mapping */
        void * origin;
        void * origin_map;

        atomic_t usage;
    } khookstr_t;


Here, name is the name of the intercepted function (the symbol name), length is the length of the overwritten sequence of prologue instructions, handler is the address of the interceptor function, target is the address of the target function, target_map is the address of the writable mapping of the target function, origin is the address of the adapter function used to access the original functionality, origin_map is the address of the writable mapping of that adapter, and usage is a usage counter that tracks the number of threads currently sleeping inside the hook.

Each intercepted function must be represented by such a structure. To simplify the registration of interceptors, the DECLARE_KHOOK(...) macro is used, defined as follows:

    #define __DECLARE_TARGET_ALIAS(t) \
        void __attribute__((alias("khook_"#t))) khook_alias_##t(void)

    #define __DECLARE_TARGET_ORIGIN(t) \
        void notrace khook_origin_##t(void){\
            asm volatile ( \
                ".rept 0x20\n" \
                ".byte 0x90\n" \
                ".endr\n" \
            ); \
        }

    #define __DECLARE_TARGET_STRUCT(t) \
        khookstr_t __attribute__((unused,section(".khook"),aligned(1))) __khook_##t

    #define DECLARE_KHOOK(t) \
        __DECLARE_TARGET_ALIAS(t); \
        __DECLARE_TARGET_ORIGIN(t); \
        __DECLARE_TARGET_STRUCT(t) = { \
            .name = #t, \
            .handler = khook_alias_##t, \
            .origin = khook_origin_##t, \
            .usage = ATOMIC_INIT(0), \
        }


The auxiliary macros __DECLARE_TARGET_ALIAS(...) and __DECLARE_TARGET_ORIGIN(...) declare the interceptor and the adapter (32 NOP instructions). The structure itself is declared by the __DECLARE_TARGET_STRUCT(...) macro, which uses the section attribute to place it in a special section (.khook).
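Collecting descriptors in a dedicated section and walking them as an array can be reproduced in a user-space program. The sketch below makes several assumptions: GCC or Clang with an ELF linker, which defines __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier (hence hooktab here instead of .khook); the names hook_desc, DECLARE_HOOK and hook_for_each are hypothetical and only mirror the roles of khookstr_t, DECLARE_KHOOK and khook_for_each.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hook_desc {
    const char *name;       /* name of the hooked function */
    void (*handler)(void);  /* handler to run for it */
};

/* Place one descriptor per hook into the "hooktab" section. */
#define DECLARE_HOOK(t)                                               \
    static void hook_##t(void);                                       \
    static struct hook_desc                                           \
        __attribute__((used, section("hooktab"),                      \
                       aligned(sizeof(void *))))                      \
        hook_desc_##t = { .name = #t, .handler = hook_##t }

/* Section bounds, provided by the linker. */
extern struct hook_desc __start_hooktab[];
extern struct hook_desc __stop_hooktab[];

#define hook_for_each(p) \
    for ((p) = __start_hooktab; (p) < __stop_hooktab; (p)++)

DECLARE_HOOK(inode_permission);
DECLARE_HOOK(vfs_read);

static int hits;
static void hook_inode_permission(void) { hits += 1; }
static void hook_vfs_read(void) { hits += 10; }

/* Count the registered descriptors. */
static size_t count_hooks(void)
{
    struct hook_desc *p;
    size_t n = 0;

    hook_for_each(p)
        n++;
    return n;
}

/* Look a descriptor up by name; returns NULL if it is not registered. */
static struct hook_desc *find_hook(const char *name)
{
    struct hook_desc *p;

    assert(name != NULL);

    hook_for_each(p)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

Adding another DECLARE_HOOK line is enough to register a hook; no central table has to be edited, which is the point of the section-based approach.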

When the kernel module is loaded, all registered hooks, represented by the structures in the .khook section, are enumerated (see khook_for_each). For each of them, the address of the corresponding symbol is looked up (see get_symbol_address) and the auxiliary fields are set up, including the creation of the mappings (see map_writable):

    static int init_hooks(void)
    {
        khookstr_t * s;

        khook_for_each(s) {
            /* resolve the address of the target symbol */
            s->target = get_symbol_address(s->name);
            if (s->target) {
                /* writable mappings of the target prologue and of the adapter */
                s->target_map = map_writable(s->target, 32);
                s->origin_map = map_writable(s->origin, 32);

                if (s->target_map && s->origin_map) {
                    if (init_origin_stub(s) == 0) {
                        atomic_inc(&s->usage);
                        continue;
                    }
                }
            }

            debug("Failed to initialize \"%s\" hook\n", s->name);
        }

        /* apply patches */
        stop_machine(do_init_hooks, NULL, NULL);

        return 0;
    }


An important role is played by the init_origin_stub function, which builds the adapter used to call the original function after interception:

    static int init_origin_stub(khookstr_t * s)
    {
        ud_t ud;

        ud_initialize(&ud, BITS_PER_LONG,
                      UD_VENDOR_ANY, (void *)s->target, 32);

        while (ud_disassemble(&ud) && ud.mnemonic != UD_Iret) {
            if (ud.mnemonic == UD_Ijmp || ud.mnemonic == UD_Iint3) {
                debug("It seems that \"%s\" is not a hooking virgin\n", s->name);
                return -EINVAL;
            }

    #define JMP_INSN_LEN (1 + 4)

            s->length += ud_insn_len(&ud);
            if (s->length >= JMP_INSN_LEN) {
                memcpy(s->origin_map, s->target, s->length);
                x86_put_jmp(s->origin_map + s->length, s->origin + s->length, s->target + s->length);
                break;
            }
        }

        return 0;
    }


As you can see, the udis86 disassembler is used to determine how many instructions, and therefore how many bytes, are overwritten when the prologue is patched. In principle, any disassembler that can determine instruction length (a so-called Length-Disassembler Engine, LDE) is suitable. I use the full udis86 disassembler, which has a BSD license and has proven itself well. Once the number of bytes is known, they are copied to the origin_map address, which is the RW mapping of the 32-byte origin adapter. Finally, x86_put_jmp inserts, right after the saved instructions, a jump that returns control to the unmodified part of the target function.
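The two steps, choosing how many whole instructions to save and laying out the adapter, can be sketched in user-space C. Several things here are assumptions made for the illustration: the instruction lengths are passed in as a ready-made array instead of coming from a disassembler, addresses are plain integers, and prologue_bytes_to_save, put_jmp and build_stub are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_INSN_LEN 5   /* E9 + rel32 */
#define STUB_SIZE    32  /* size of the adapter, as in the article */

/*
 * Sum whole instruction lengths until they cover a 5-byte JMP.
 * Returns the number of bytes to save, or 0 if there are too few.
 */
static size_t prologue_bytes_to_save(const uint8_t *insn_len, size_t count)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        total += insn_len[i];
        if (total >= JMP_INSN_LEN)
            return total;
    }
    return 0;
}

/* Write "jmp rel32" at `at`; the instruction lives at `from` and jumps to `to`. */
static void put_jmp(uint8_t *at, uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - (from + JMP_INSN_LEN));
    int i;

    assert(disp >= INT32_MIN && disp <= INT32_MAX);

    at[0] = 0xE9;
    for (i = 0; i < 4; i++)
        at[1 + i] = (uint8_t)((uint32_t)disp >> (8 * i));
}

/*
 * Adapter layout: [saved prologue bytes][jmp target+len].
 * Returns 0 on success, -1 if the saved bytes do not fit.
 */
static int build_stub(uint8_t *stub, uint64_t stub_addr,
                      const uint8_t *target_code, uint64_t target_addr,
                      size_t len)
{
    if (len < JMP_INSN_LEN || len + JMP_INSN_LEN > STUB_SIZE)
        return -1;

    memcpy(stub, target_code, len);
    put_jmp(stub + len, stub_addr + len, target_addr + len);
    return 0;
}
```

For inode_permission above, the first instruction (nopl) is already five bytes long, so exactly five bytes are saved; a prologue made of 1-, 3- and 3-byte instructions would need seven.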

The last element needed to make modification of kernel code safe is the stop_machine mechanism:

    #include <linux/stop_machine.h>

    int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)


In essence, stop_machine runs the function fn on the set of processors given by the cpumask mask while all other processors are held stopped with interrupts disabled. This is exactly what makes the mechanism suitable for modifying kernel code: nothing else runs during the patch, so there is no need to track the kernel threads whose execution might touch the code being modified.

Usage



As an example, consider intercepting the inode_permission function. With the macros described above, the interception looks as follows:

    #include <linux/fs.h>

    DECLARE_KHOOK(inode_permission);
    int khook_inode_permission(struct inode * inode, int mode)
    {
        int result;

        KHOOK_USAGE_INC(inode_permission);

        debug("%s(%pK,%08x) [%s]\n", __func__, inode, mode, current->comm);

        result = KHOOK_ORIGIN(inode_permission, inode, mode);

        debug("%s(%pK,%08x) [%s] = %d\n", __func__, inode, mode, current->comm, result);

        KHOOK_USAGE_DEC(inode_permission);

        return result;
    }


For the DECLARE_KHOOK(...) macro to work, a prototype of the intercepted function must be available (linux/fs.h for inode_permission). The interceptor function itself (which has the khook_ prefix) can do anything. In this example, I print a debug message before and after calling the original inode_permission function.
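The article does not show how KHOOK_ORIGIN and the usage macros are defined. One plausible shape, given here purely as an assumption, casts the stored adapter address to the type of the hooked function, which would also explain why its prototype has to be visible. The user-space sketch below uses hypothetical DEMO_* names, C11 atomics in place of atomic_t, and an ordinary function (scale) standing in for both the target and its adapter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_hook {
    void *origin;      /* address of the code that runs the original */
    atomic_int usage;  /* number of callers currently inside the hook */
};

#define DEMO_HOOK(t)       demo_hook_##t
#define DEMO_USAGE_INC(t)  atomic_fetch_add(&DEMO_HOOK(t).usage, 1)
#define DEMO_USAGE_DEC(t)  atomic_fetch_sub(&DEMO_HOOK(t).usage, 1)

/* __typeof__(t) is the function type of t, so t's prototype must be in scope. */
#define DEMO_ORIGIN(t, ...) \
    (((__typeof__(t) *)DEMO_HOOK(t).origin)(__VA_ARGS__))

/* Hypothetical target function. */
int scale(int x, int factor);
int scale(int x, int factor)
{
    return x * factor;
}

/* No patching is done here, so origin points at the function itself. */
static struct demo_hook DEMO_HOOK(scale) = {
    .origin = (void *)scale,
    .usage = 0,
};

/* Interceptor: changes an argument and the result around the original call. */
static int hook_scale(int x, int factor)
{
    int result;

    assert(DEMO_HOOK(scale).origin != NULL);

    DEMO_USAGE_INC(scale);
    result = DEMO_ORIGIN(scale, x, factor + 1);  /* modified argument */
    DEMO_USAGE_DEC(scale);

    return -result;                              /* modified result */
}
```

A counter of this kind lets the unloading code wait until it drops back to its idle value before removing the hook, so that no thread is left executing code that is about to disappear.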

Thus, interception makes it possible to replace functions, as well as the parameters passed to them and their results, which matches the idea of embedding: the ability to redefine or extend the mechanisms of the OS kernel.

As usual, the code of the kernel module that implements function interception is available on GitHub.

Source: https://habr.com/ru/post/237089/

