Intercepting kernel functions is a basic technique for overriding or extending kernel mechanisms. Because the Linux kernel is written almost entirely in C, apart from small architecture-specific parts, being able to intercept the relevant functions is enough to embed into most kernel components.
This article continues the previously announced series on specific aspects of implementing add-on security tools and, in particular, on embedding into software systems.
The purpose of intercepting a function is to gain control at the moment it is called. What happens next depends on the task: in some cases the system implementation of an algorithm must be replaced with your own, in others it must be supplemented. Either way, it is important to keep the ability to call the intercepted function for your own purposes.
The traditional approach to interception is the “wrapper”, which allows pre- and post-processing while keeping access to the original functionality of the intercepted function.
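As a plain userspace illustration of the wrapper idea (this is not kernel code, and every name in it is hypothetical), the handler below does some pre-processing, calls the original through a saved pointer, and then post-processes the result:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an intercepted function: grants every request. */
static int real_check(int object_id, int mask)
{
    (void)object_id;
    (void)mask;
    return 0;
}

/* Saved entry point of the original; the wrapper calls through it. */
static int (*origin_check)(int, int) = real_check;

/* Number of calls observed by the wrapper (a pre-processing side effect). */
static int seen_calls;

/* Wrapper: pre-process, call the original, post-process the result. */
static int hooked_check(int object_id, int mask)
{
    int result;

    assert(origin_check != NULL);
    seen_calls++;                           /* pre-processing */
    result = origin_check(object_id, mask); /* original functionality */
    if (result == 0 && object_id == 42)     /* post-processing: veto one object */
        result = -EACCES;
    return result;
}
```

The wrapper can observe the call, change the arguments or override the result, which are the three things the rest of the article sets out to achieve inside the kernel.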
Most methods of intercepting functions are based on patching: modifying kernel code so that control is transferred to the interceptor function when the target function is called. Thanks to the rich instruction set of the x86 architecture, there are several ways to redirect the flow of execution (JMP is only one of them).
Method of interception
The essence of the method described here is to modify the prologue (the beginning) of the target function so that executing it transfers control to the handler function.
In other words, for each target function we patch the prologue by writing a JMP instruction at its start. This switches the execution flow from the target function to the corresponding handler.
For example, suppose that before interception the inode_permission function looks like this:
inode_permission:
   0xffffffff811c4530 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffff811c4535 <+5>:     push   %rbp
   0xffffffff811c4536 <+6>:     test   $0x2,%sil
   0xffffffff811c453a <+10>:    mov    0x28(%rdi),%rax
   0xffffffff811c453e <+14>:    mov    %rsp,%rbp
   0xffffffff811c4541 <+17>:    jne    0xffffffff811c454a <inode_permission+26>
   0xffffffff811c4543 <+19>:    callq  0xffffffff811c4470 <__inode_permission>
After interception, the prologue of this function looks as follows:
inode_permission:
   0xffffffff811c4530 <+0>:     jmpq   0xffffffffa05a60e0
=> 0xffffffff811c4535 <+5>:     push   %rbp
   0xffffffff811c4536 <+6>:     test   $0x2,%sil
   0xffffffff811c453a <+10>:    mov    0x28(%rdi),%rax
   0xffffffff811c453e <+14>:    mov    %rsp,%rbp
   0xffffffff811c4541 <+17>:    jne    0xffffffff811c454a <inode_permission+26>
   0xffffffff811c4543 <+19>:    callq  0xffffffff811c4470 <__inode_permission>
Control is transferred by the five-byte JMP instruction written over the original instructions: opcode E9 followed by a 32-bit relative displacement (E9 XX XX XX XX). This is the core of the interception method described here. Next, we look at some specifics of implementing it in the Linux kernel.
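To make the encoding concrete, here is a minimal userspace sketch of how such a JMP could be assembled. The function name encode_jmp_rel32 is hypothetical and is not the article's x86_put_jmp; the only facts relied upon are the E9 opcode and the rule that the displacement is counted from the end of the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_REL32_LEN 5 /* opcode E9 + 32-bit displacement */

/*
 * Encode "jmp rel32" as it would appear at address `from`, jumping to `to`.
 * Returns 0 on success, -1 if `to` is outside the +/-2 GiB range.
 */
static int encode_jmp_rel32(uint8_t buf[JMP_REL32_LEN],
                            uint64_t from, uint64_t to)
{
    /* The displacement is relative to the end of the JMP instruction. */
    int64_t disp = (int64_t)(to - (from + JMP_REL32_LEN));
    uint32_t rel;
    size_t i;

    assert(buf != NULL);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    rel = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;
    for (i = 0; i < 4; i++)             /* little-endian displacement */
        buf[1 + i] = (uint8_t)(rel >> (8 * i));
    return 0;
}
```

For the addresses in the listings above (from 0xffffffff811c4530 to 0xffffffffa05a60e0) the displacement is 0x1f3e1bab, so the five bytes are E9 AB 1B 3E 1F.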
Features of the interception of functions
As noted, patching means modifying kernel code. The main problem is that memory pages containing code cannot simply be written to: the x86 architecture has a protection mechanism under which an attempt to write to a write-protected memory area raises an exception. This mechanism is called “page protection” and underlies many kernel features, such as COW. The processor's behavior in this situation is controlled by the WP bit of the CR0 register, while page access permissions are described in the corresponding page-table entry (PTE). When the WP bit of CR0 is set, an attempt to write to a write-protected page (RW bit cleared in the PTE) makes the processor raise a page-fault exception (#PF).
A common solution is to temporarily turn off page protection by clearing the WP bit of CR0. This approach has its place, but it must be used with caution because, as noted, page protection underlies many kernel mechanisms. In addition, CR0 is per-processor: on SMP systems a thread that has cleared the WP bit on one processor can be preempted and migrated to another processor, where the bit is still set!
A better and fairly universal approach is to create temporary mappings. Because of the way the MMU works, several page-table entries with different attributes can refer to the same physical memory frame. This makes it possible to create a writable mapping for the target memory area. The method is used in the Ksplice project (a fork is available on GitHub). Below is the map_writable function, which creates such a mapping:
static void *map_writable(void *addr, size_t len)
{
	void *vaddr;
	int nr_pages = DIV_ROUND_UP(offset_in_page(addr) + len, PAGE_SIZE);
	struct page **pages = kmalloc(nr_pages * sizeof(*pages), GFP_KERNEL);
	void *page_addr = (void *)((unsigned long)addr & PAGE_MASK);
	int i;

	if (pages == NULL)
		return NULL;

	/* Collect the struct page of every page covered by [addr, addr + len). */
	for (i = 0; i < nr_pages; i++) {
		if (__module_address((unsigned long)page_addr) == NULL) {
			/* Core kernel text: directly mapped memory. */
			pages[i] = virt_to_page(page_addr);
			WARN_ON(!PageReserved(pages[i]));
		} else {
			/* Module text: allocated with vmalloc. */
			pages[i] = vmalloc_to_page(page_addr);
		}
		if (pages[i] == NULL) {
			kfree(pages);
			return NULL;
		}
		page_addr += PAGE_SIZE;
	}

	/* Map the same physical pages a second time, now writable. */
	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
	kfree(pages);
	if (vaddr == NULL)
		return NULL;

	return vaddr + offset_in_page(addr);
}
With this function a writable mapping can be created for any memory area. A region created this way is released with the vfree function (the documented counterpart of vmap is vunmap), whose argument must be the address aligned to a page boundary. Additional information on this method of modifying write-protected pages is given in this article.
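The underlying idea, one set of physical pages visible through two mappings with different permissions, can be tried out in userspace. The sketch below is only an analogy under stated assumptions: it runs on a POSIX system, a temporary file stands in for the physical frames, mmap stands in for vmap, and the function name write_through_alias is hypothetical.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the same pages twice: a read-only view and a writable alias.
 * Writes `value` through the alias and returns the byte then seen through
 * the read-only view, or -1 on failure.
 */
static int write_through_alias(unsigned char value)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *backing;
    void *ro, *rw;
    int fd, seen = -1;

    if (page <= 0)
        return -1;
    backing = tmpfile();                /* anonymous backing object */
    if (backing == NULL)
        return -1;
    fd = fileno(backing);
    if (ftruncate(fd, page) != 0)
        goto out_close;

    /* First view: read-only, like the protected kernel text. */
    ro = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (ro == MAP_FAILED)
        goto out_close;

    /* Second view of the same pages: writable alias. */
    rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED)
        goto out_unmap_ro;

    ((unsigned char *)rw)[0] = value;       /* write via the alias */
    seen = ((const unsigned char *)ro)[0];  /* read via the read-only view */

    munmap(rw, (size_t)page);
out_unmap_ro:
    munmap(ro, (size_t)page);
out_close:
    fclose(backing);
    return seen;
}
```

The read-only view never becomes writable, yet its contents change, which is the property map_writable relies on.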
The next important point is that patching inevitably overwrites part of the target function's prologue. This does not matter if you never intend to call the function again. However, if the algorithm implemented by the target function is still needed after patching, the “old” code must remain executable despite the damaged prologue.
The following scheme illustrates the process of intercepting a function while preserving access to its original functionality. In it, numeral 1 marks the transfer of control from the target function to the interceptor function (JMP instruction), numeral 2 marks the call to the original function through the saved part of the prologue (CALL instruction), numeral 3 marks the return of control to the unmodified part of the original function (JMP instruction), and numeral 4 marks the return to the interceptor once the original function has finished (RET instruction). In this way the functionality of the intercepted function remains available.
Implementation of function interception
We describe each intercepted function with the following structure:
typedef struct {
	char *name;
	int length;
	void *handler;
	void *target;
	void *target_map;
	void *origin;
	void *origin_map;
	atomic_t usage;
} khookstr_t;
Here name is the name of the intercepted function (the symbol name), length is the length of the overwritten sequence of prologue instructions, handler is the address of the interceptor function, target is the address of the target function itself, target_map is the address of the writable mapping of the target function, origin is the address of the adapter function used to access the original functionality, origin_map is the address of the writable mapping of that adapter, and usage is a usage counter that tracks the number of threads currently sleeping inside the hook.
Every intercepted function must be represented by such a structure. To simplify the registration of interceptors, the DECLARE_KHOOK(...) macro is used, defined as follows:
#define __DECLARE_TARGET_ALIAS(t) \
	void __attribute__((alias("khook_"#t))) khook_alias_##t(void)

#define __DECLARE_TARGET_ORIGIN(t) \
	void notrace khook_origin_##t(void){\
		asm volatile ( \
			".rept 0x20\n" \
			".byte 0x90\n" \
			".endr\n" \
		); \
	}

#define __DECLARE_TARGET_STRUCT(t) \
	khookstr_t __attribute__((unused,section(".khook"),aligned(1))) __khook_##t

#define DECLARE_KHOOK(t) \
	__DECLARE_TARGET_ALIAS(t); \
	__DECLARE_TARGET_ORIGIN(t); \
	__DECLARE_TARGET_STRUCT(t) = { \
		.name = #t, \
		.handler = khook_alias_##t, \
		.origin = khook_origin_##t, \
		.usage = ATOMIC_INIT(0), \
	}
The auxiliary macros __DECLARE_TARGET_ALIAS(...) and __DECLARE_TARGET_ORIGIN(...) declare the interceptor and the adapter (32 NOPs). The structure itself is declared by the __DECLARE_TARGET_STRUCT(...) macro, which uses the section attribute to place it into a special section (.khook).
When the kernel module is loaded, all registered hooks, represented by the structures in the .khook section, are enumerated (see khook_for_each). For each of them the address of the corresponding symbol is looked up (see get_symbol_address) and the auxiliary fields are set up, including the creation of the mappings (see map_writable):
static int init_hooks(void)
{
	khookstr_t *s;

	khook_for_each(s) {
		s->target = get_symbol_address(s->name);
		if (s->target) {
			s->target_map = map_writable(s->target, 32);
			s->origin_map = map_writable(s->origin, 32);

			if (s->target_map && s->origin_map) {
				if (init_origin_stub(s) == 0) {
					atomic_inc(&s->usage);
					continue;
				}
			}
		}

		debug("Failed to initialize \"%s\" hook\n", s->name);
	}

	/* Patch the prologues while the other CPUs are stopped. */
	stop_machine(do_init_hooks, NULL, NULL);

	return 0;
}
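How khook_for_each walks the section is not shown in the article. One common technique, sketched here in userspace under the assumption of a GNU-compatible ELF toolchain (GCC or Clang with GNU ld or lld), relies on the __start_ and __stop_ symbols that the linker defines for any section whose name is a valid C identifier. All names below (demo_hook_t, demo_hooks, DECLARE_DEMO_HOOK and so on) are hypothetical, and the section is called demo_hooks rather than .khook because a name with a leading dot does not get these symbols automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* symbol name of the function to intercept */
    void *target;       /* would be resolved later, at initialization */
} demo_hook_t;

/* Place one descriptor per hook into a dedicated ELF section. */
#define DECLARE_DEMO_HOOK(t)                                            \
    static demo_hook_t demo_hook_##t                                    \
        __attribute__((used, section("demo_hooks"),                     \
                       aligned(sizeof(void *)))) = { .name = #t }

/* Defined by the linker: bounds of the demo_hooks section. */
extern demo_hook_t __start_demo_hooks[];
extern demo_hook_t __stop_demo_hooks[];

#define demo_hook_for_each(p) \
    for ((p) = __start_demo_hooks; (p) < __stop_demo_hooks; (p)++)

DECLARE_DEMO_HOOK(inode_permission);
DECLARE_DEMO_HOOK(vfs_read);

/* Number of descriptors registered at link time. */
static size_t demo_hook_count(void)
{
    return (size_t)(__stop_demo_hooks - __start_demo_hooks);
}

/* Look a descriptor up by the name it was declared with. */
static demo_hook_t *demo_hook_find(const char *name)
{
    demo_hook_t *p;

    assert(name != NULL);
    demo_hook_for_each(p) {
        if (strcmp(p->name, name) == 0)
            return p;
    }
    return NULL;
}
```

The attraction of this design is that adding a hook requires only a declaration next to the handler; no central table has to be edited.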
An important role is played by the function
init_origin_stub , which initializes and builds the adapter used to call the original function after interception:
#define JMP_INSN_LEN (1 + 4)

static int init_origin_stub(khookstr_t *s)
{
	ud_t ud;

	ud_initialize(&ud, BITS_PER_LONG,
		      UD_VENDOR_ANY, (void *)s->target, 32);

	while (ud_disassemble(&ud) && ud.mnemonic != UD_Iret) {
		if (ud.mnemonic == UD_Ijmp || ud.mnemonic == UD_Iint3) {
			debug("It seems that \"%s\" is not a hooking virgin\n", s->name);
			return -EINVAL;
		}

		/* Accumulate whole instructions until a JMP fits. */
		s->length += ud_insn_len(&ud);
		if (s->length >= JMP_INSN_LEN) {
			/* Save the prologue, then jump back to the rest of the target. */
			memcpy(s->origin_map, s->target, s->length);
			x86_put_jmp(s->origin_map + s->length,
				    s->origin + s->length,
				    s->target + s->length);
			break;
		}
	}

	/* Fail if the function ended before enough bytes were collected. */
	return s->length >= JMP_INSN_LEN ? 0 : -EINVAL;
}
As you can see, the udis86 disassembler is used to determine how many whole instructions, and therefore how many bytes, are overwritten when the prologue is patched. In principle, any disassembler that can determine instruction length (a so-called Length-Disassembler Engine, LDE) is suitable. I use the full udis86 disassembler, which is BSD-licensed and has proven itself well. Once the length is known, the instructions are copied to origin_map, the RW mapping of the 32-byte origin adapter. Finally, x86_put_jmp places after the saved instructions a jump that returns control to the unmodified part of the target function.
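The two steps, choosing how many bytes to save and laying out the adapter, can be modelled in userspace without a disassembler if the instruction lengths are supplied by hand. This is only a sketch: saved_prologue_len and build_origin_stub are hypothetical names, the lengths in the usage note are read off the inode_permission listing earlier in the article, and the stub and target are assumed to lie within 2 GiB of each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_LEN 5 /* E9 + rel32 */

/*
 * Sum whole instruction lengths until at least JMP_LEN bytes are covered.
 * Returns the number of bytes to save, or 0 if there are too few.
 */
static size_t saved_prologue_len(const size_t *insn_len, size_t count)
{
    size_t total = 0, i;

    assert(insn_len != NULL);
    for (i = 0; i < count; i++) {
        total += insn_len[i];
        if (total >= JMP_LEN)
            return total;
    }
    return 0;
}

/*
 * Fill `stub`, which is assumed to execute at address stub_addr, with the
 * first `len` bytes of the target followed by "jmp target_addr + len".
 * Returns the number of bytes written.
 */
static size_t build_origin_stub(uint8_t *stub, uint64_t stub_addr,
                                const uint8_t *target_code,
                                uint64_t target_addr, size_t len)
{
    /* rel32 is counted from the end of the JMP inside the stub. */
    uint64_t after_jmp = stub_addr + len + JMP_LEN;
    uint32_t rel = (uint32_t)((target_addr + len) - after_jmp);
    size_t i;

    assert(stub != NULL && target_code != NULL);
    memcpy(stub, target_code, len);     /* saved prologue */
    stub[len] = 0xE9;                   /* jump back into the target */
    for (i = 0; i < 4; i++)
        stub[len + 1 + i] = (uint8_t)(rel >> (8 * i));
    return len + JMP_LEN;
}
```

For inode_permission the instruction lengths are 5, 1, 4, 4 and 3 bytes, so the first instruction alone (the five-byte nopl) already makes room for the JMP and only five bytes need to be saved.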
The last element needed to make the modification of kernel code safe is the stop_machine mechanism:
#include <linux/stop_machine.h>

int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
The point is that stop_machine runs the fn function on the set of processors given by the cpumask mask (NULL means any one online CPU), while all other processors are held in a busy loop with interrupts disabled. This is exactly what makes the mechanism suitable for modifying kernel code: while fn runs, nothing else executes on the machine, so there is no need to keep track of the kernel threads whose execution might touch the code being modified.
Usage
Usage is illustrated by intercepting the inode_permission function. With the macros described above, the interception looks as follows:
#include <linux/fs.h>

DECLARE_KHOOK(inode_permission);
int khook_inode_permission(struct inode *inode, int mode)
{
	int result;

	KHOOK_USAGE_INC(inode_permission);

	debug("%s(%pK,%08x) [%s]\n", __func__, inode, mode, current->comm);

	result = KHOOK_ORIGIN(inode_permission, inode, mode);

	debug("%s(%pK,%08x) [%s] = %d\n", __func__, inode, mode, current->comm, result);

	KHOOK_USAGE_DEC(inode_permission);

	return result;
}
For the DECLARE_KHOOK(...) macro to work, a prototype of the intercepted function must be available (linux/fs.h for inode_permission). Beyond that, the interceptor function (the one with the khook_ prefix) can do anything. Here, for example, a debug message is printed before and after the call to the original inode_permission function.
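The KHOOK_USAGE_INC and KHOOK_USAGE_DEC macros are not defined in the text above. A plausible reading, given the usage field of khookstr_t, is that they bracket the handler so that unloading can wait until no thread is still inside it. The userspace model below uses C11 atomics in place of the kernel's atomic_t; all names are hypothetical, and it deliberately ignores the extra increment that init_hooks performs on successful initialization.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of threads currently executing inside the handler. */
static atomic_int demo_usage;

/* Called on entry to the handler. */
static void demo_usage_inc(void)
{
    atomic_fetch_add(&demo_usage, 1);
}

/* Called on exit from the handler. */
static void demo_usage_dec(void)
{
    int prev = atomic_fetch_sub(&demo_usage, 1);

    assert(prev > 0);   /* every exit must match an earlier entry */
    (void)prev;
}

/* The hook's code may be removed only when nobody is inside it. */
static bool demo_safe_to_unload(void)
{
    return atomic_load(&demo_usage) == 0;
}
```

Without such a counter, a thread sleeping inside the original function could return into a handler whose code has already been unloaded.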
Thus, interception makes it possible to replace functions as well as the parameters passed to them and the results they return, which is exactly what the concept of embedding requires: the ability to override or extend OS kernel mechanisms.
As usual, the code of the kernel module that implements function interception is available on GitHub.