
Taking control of PageFault in the Linux kernel

Exception handling plays an important role in the operation of software systems. Responding promptly and correctly to abnormal events is one of the key tasks of an operating system and, in particular, of its kernel. As a modern kernel, Linux provides a way to control how exceptions are handled. However, because of the limitations of its interface, kernel module developers rarely use this mechanism.



Below, using the page fault (PageFault) as an example, we look at some features of the exception handling process. We then describe a method that lets you use this mechanism when developing Linux kernel modules for the x86 architecture.






Kernel exceptions





A good example of where and how exceptions are used in the kernel is copying data between kernel space and user space. This is usually done by the copy_from_user and copy_to_user functions. Unlike memcpy, they correctly handle exceptions that occur while data is transferred between different address spaces.



Indeed, when data is copied from the kernel to the user (the copy_to_user function), the page of the user process being written to may be swapped out, or it may be inaccessible to the process. In the first case, the correct response is to load the page and continue copying. In the second case, the operation must be aborted and an error code returned to the user (for example, -EFAULT).
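To make the two cases concrete, here is a toy user-space model. It is not kernel code, and every name in it (model_mm, model_copy_to_user, the page states) is hypothetical. A "user address space" of four small pages is written byte by byte:

- a swapped-out page is brought back and the copy continues;
- an inaccessible page aborts the copy with -EFAULT.

The real copy_to_user returns the number of bytes it could not copy, and callers typically turn a nonzero result into -EFAULT. The model collapses that into a single return code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: a tiny "user address space" of 4 pages. */
#define MODEL_PAGE_SIZE 16
#define MODEL_NPAGES 4

enum page_state { PAGE_PRESENT, PAGE_SWAPPED, PAGE_INACCESSIBLE };

struct model_mm {
	enum page_state state[MODEL_NPAGES];
	unsigned char mem[MODEL_NPAGES * MODEL_PAGE_SIZE];
	int swap_ins; /* how many pages were "loaded" back from swap */
};

/* Copy len bytes from src to user offset dst; returns 0 or -EFAULT. */
static int model_copy_to_user(struct model_mm *mm, size_t dst,
			      const void *src, size_t len)
{
	const unsigned char *s = src;

	if (dst + len > sizeof(mm->mem))
		return -EFAULT;
	for (size_t i = 0; i < len; i++) {
		size_t page = (dst + i) / MODEL_PAGE_SIZE;

		if (mm->state[page] == PAGE_SWAPPED) {
			/* first case: bring the page in and keep copying */
			mm->state[page] = PAGE_PRESENT;
			mm->swap_ins++;
		} else if (mm->state[page] == PAGE_INACCESSIBLE) {
			/* second case: abort; earlier bytes stay written */
			return -EFAULT;
		}
		mm->mem[dst + i] = s[i];
	}
	return 0;
}
```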



Executing an instruction that accesses an address on a missing page causes an exception, namely a page fault (#PF). At that moment, the kernel saves the context of the current task and runs the corresponding handler, do_page_fault. Once the problem has been dealt with, the kernel restores the context of the interrupted task.

However, depending on how the exception was handled, the return address may differ from the address of the instruction that caused it. In other words, the kernel provides a mechanism for associating a potentially “dangerous” instruction with an address from which execution continues if that instruction raises an exception.
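The same idea can be imitated in user space, which makes it easy to experiment with. The sketch below is only a rough analogy, not the kernel mechanism, and guarded_store, segv_handler and fixup_point are hypothetical names:

- a SIGSEGV handler plays the role of the fault handler;
- sigsetjmp()/siglongjmp() supply the alternative address at which execution resumes after the faulting store.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Where to resume if the guarded store faults (our "fixup" address). */
static sigjmp_buf fixup_point;

static void segv_handler(int sig)
{
	(void)sig;
	/* Do not return to the faulting instruction; jump to the fixup. */
	siglongjmp(fixup_point, 1);
}

/* Store val at p; return 0 on success or -EFAULT if the store faults. */
static int guarded_store(volatile int *p, int val)
{
	struct sigaction sa, old;
	int ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = segv_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	if (sigsetjmp(fixup_point, 1) == 0)
		*p = val;        /* the potentially "dangerous" instruction */
	else
		ret = -EFAULT;   /* execution continues here after a fault */

	sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
	return ret;
}
```

With a valid pointer, guarded_store returns 0 and the value is stored. With an unmapped address such as NULL, it returns -EFAULT instead of killing the process.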



Exception handling interface





To understand how this mechanism is implemented, consider the primitive that copies 4 bytes from the kernel to user space, the function __put_user_4:



    62  ENTRY(__put_user_4)
    63          ENTER
    64          mov TI_addr_limit(%_ASM_BX),%_ASM_BX
    65          sub $3,%_ASM_BX
    66          cmp %_ASM_BX,%_ASM_CX
    67          jae bad_put_user
    68          ASM_STAC
    69  3:      movl %eax,(%_ASM_CX)    <-
    70          xor %eax,%eax
    71          EXIT
    72  ENDPROC(__put_user_4)
        ...
    89  bad_put_user:
    90          CFI_STARTPROC
    91          movl $-EFAULT,%eax
    92          EXIT
        ...
    98  _ASM_EXTABLE(3b,bad_put_user)
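The range check on lines 64-67 can be restated in C. The sketch below uses a hypothetical helper, put_user_4_range_ok, which is not a kernel function. It assumes addr_limit is the first address the task may not access.

It shows why the limit is reduced by 3 before the comparison: the 4-byte store touches addr through addr + 3, so all four bytes must lie below the limit. Note what the check does not tell you: whether the page is actually mapped.

```c
#include <errno.h>
#include <stdint.h>

/*
 * C rendering of:
 *     sub $3, limit ; cmp limit, addr ; jae bad_put_user
 * The store may proceed only if addr < limit - 3, i.e. the last
 * byte written (addr + 3) is still below the limit.
 */
static int put_user_4_range_ok(uintptr_t addr, uintptr_t addr_limit)
{
	return addr < addr_limit - 3 ? 0 : -EFAULT;
}
```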




As can be seen, besides checking the address range, this function performs the actual store (the movl instruction on line 69). This is exactly where an exception can be expected. Apart from the fact that the target address lies within the user-space address range, nothing else is known about it. Next, note the _ASM_EXTABLE macro, which is defined as follows:



    # define _ASM_EXTABLE(from,to)          \
            .pushsection "__ex_table","a" ; \
            .balign 8 ;                     \
            .long (from) - . ;              \
            .long (to) - . ;                \
            .popsection




This macro adds two values, from and to, to the special section __ex_table. As is easy to see, they correspond to two addresses:

- the “suspicious” instruction on line 69;
- the instruction executed once the exception has been handled, namely bad_put_user.

Both are stored as offsets relative to their own location in the table, hence the “- .” in the macro. Adding an entry to __ex_table makes the point of failure manageable, because the kernel consults this table when handling exceptions.
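The “- .” means each field holds the distance from the field itself to the target, not the target's absolute address. Below is a minimal user-space sketch of encoding and decoding such an entry. It assumes the post-3.5 x86 layout, and encode_entry, entry_insn_addr and entry_fixup_addr are hypothetical helper names. The kernel performs an equivalent computation when it reads the table.

A 32-bit offset only works if the target lies within ±2 GiB of the entry. That holds for kernel code and its __ex_table, and for the two stack objects in the test.

```c
#include <stdint.h>

/* Same layout as struct exception_table_entry on x86 since 3.5. */
struct exception_table_entry {
	int insn, fixup;
};

/* What ".long (x) - ." stores: target minus the field's own address. */
static void encode_entry(struct exception_table_entry *e,
			 uintptr_t insn_addr, uintptr_t fixup_addr)
{
	e->insn  = (int)(insn_addr  - (uintptr_t)&e->insn);
	e->fixup = (int)(fixup_addr - (uintptr_t)&e->fixup);
}

/* Reverse step: field address plus stored offset = absolute address. */
static uintptr_t entry_insn_addr(const struct exception_table_entry *e)
{
	return (uintptr_t)&e->insn + (intptr_t)e->insn;
}

static uintptr_t entry_fixup_addr(const struct exception_table_entry *e)
{
	return (uintptr_t)&e->fixup + (intptr_t)e->fixup;
}
```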



Exception tables and their handling





So, as noted, the exception table is the central place that records the instructions whose faults must be handled specially. Looking ahead, besides the kernel's own table, each module also has a table of its own. For now, consider the structure of a table element, described by struct exception_table_entry:



    struct exception_table_entry {
            int insn, fixup;
    };




As you can see, the format of a table element matches what we saw in the _ASM_EXTABLE macro. The first field describes the instruction; the second describes the code to which control is transferred if an exception occurs.

Each time a page fault occurs, the Linux kernel, among other things, checks whether the address of the faulting instruction is in the kernel's __ex_table or in one of the tables of the loaded modules. If such an entry is found, execution resumes at its fixup address. Otherwise, the kernel falls back to its default handling (for a fault in kernel mode, typically an oops).
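The decision described above can be modelled in a few lines. The sketch below is a simplified, hypothetical model, not kernel code:

- it uses absolute addresses (the pre-3.5 layout);
- it does a linear scan, whereas the kernel searches sorted tables.

Roughly like x86's fixup_exception(), on a match it rewrites the saved instruction pointer, so returning from the exception lands on the fixup code.

```c
#include <stddef.h>

/* Hypothetical table entry with absolute addresses. */
struct model_extable_entry {
	unsigned long insn, fixup;
};

/* Only the saved instruction pointer matters for this sketch. */
struct model_regs {
	unsigned long ip;
};

enum fault_outcome { FAULT_FIXED_UP, FAULT_OOPS };

/*
 * Kernel-mode fault: if the faulting ip is listed in the table,
 * redirect the saved ip to the fixup; otherwise take the fatal path.
 */
static enum fault_outcome model_kernel_fault(struct model_regs *regs,
					     const struct model_extable_entry *tbl,
					     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].insn == regs->ip) {
			regs->ip = tbl[i].fixup;
			return FAULT_FIXED_UP;
		}
	}
	return FAULT_OOPS;
}
```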



The exception tables of kernel modules use the same element format as the kernel's own table. A module's table is reachable through the pointer THIS_MODULE->extable, and the number of its elements is stored in THIS_MODULE->num_exentries. The THIS_MODULE macro itself refers to the module's descriptor structure, struct module:



    struct module
    {
            ...
            /* Exception table */
            unsigned int num_exentries;
            struct exception_table_entry *extable;
            ...
    };




The following key kernel function searches for the table entry that matches the instruction that caused the exception:



    /* Given an address, look for it in the exception tables. */
    const struct exception_table_entry *search_exception_tables(unsigned long addr)
    {
            const struct exception_table_entry *e;

            e = search_extable(__start___ex_table, __stop___ex_table-1, addr);
            if (!e)
                    e = search_module_extables(addr);
            return e;
    }




As you can see, the search is first performed in the kernel's base table, __ex_table. Only if nothing is found there does it continue through the exception tables of the modules. If no entry matches the instruction address, the function returns NULL; otherwise, it returns a pointer to the matching element of the exception table.
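Here is a user-space model of this two-level lookup. It assumes each table is sorted by instruction address; the kernel sorts its tables so that search_extable() can use binary search. The structs and functions below are hypothetical stand-ins, and the synchronization that the real walk over module tables needs is omitted.

```c
#include <stddef.h>

struct model_extable_entry {
	unsigned long insn, fixup;
};

/* Stand-in for the two struct module fields used here. */
struct model_module {
	unsigned int num_exentries;
	const struct model_extable_entry *extable;
};

/* Binary search in a table sorted by insn; NULL if addr is absent. */
static const struct model_extable_entry *
model_search_extable(const struct model_extable_entry *first, size_t n,
		     unsigned long addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (first[mid].insn < addr)
			lo = mid + 1;
		else if (first[mid].insn > addr)
			hi = mid;
		else
			return &first[mid];
	}
	return NULL;
}

/* Kernel table first, then each module's table, as described above. */
static const struct model_extable_entry *
model_search_exception_tables(const struct model_extable_entry *core, size_t ncore,
			      const struct model_module *mods, size_t nmods,
			      unsigned long addr)
{
	const struct model_extable_entry *e = model_search_extable(core, ncore, addr);

	for (size_t i = 0; !e && i < nmods; i++)
		e = model_search_extable(mods[i].extable, mods[i].num_exentries, addr);
	return e;
}
```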



Exception handling in the kernel module





Now that the exception handling procedure is clear in general terms, as an exercise we can create a module that raises exceptions and handles them. The code I have already written is available on GitHub. Below I give a brief description of it with some comments.



Let the page fault be raised by a function that performs an ordinary NULL pointer dereference:



    static void raise_page_fault(void)
    {
            debug(" %s enter\n", __func__);
            ((int *)0)[0] = 0xdeadbeef;
            debug(" %s leave\n", __func__);
    }




Obviously, an attempt to write through a null pointer will cause a crash, which is exactly what we need. To respond to it properly, you must:
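
1. find the address of the instruction that performs the faulting store;
2. determine the address of the instruction that follows it, where execution should resume;
3. record both addresses in an element of the module's exception table.
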
The function below performs these steps with the help of the udis86 disassembler:



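    static int fixup_page_fault(struct exception_table_entry * entry)
    {
            ud_t ud;

            /* set up udis86 on the code of raise_page_fault; 128 bounds the search depth */
            ud_initialize(&ud, BITS_PER_LONG, UD_VENDOR_ANY, (void *)raise_page_fault, 128);

            /* decode instruction by instruction until the function's ret */
            while (ud_disassemble(&ud) && ud.mnemonic != UD_Iret) {
                    /* mov with a memory destination and an immediate source:
                       the store through the NULL pointer */
                    if (ud.mnemonic == UD_Imov &&
                        ud.operand[0].type == UD_OP_MEM &&
                        ud.operand[1].type == UD_OP_IMM) {
                            unsigned long address =
                                    (unsigned long)raise_page_fault + ud_insn_off(&ud);

                            extable_make_insn(entry, address);
                            /* resume at the instruction right after the store */
                            extable_make_fixup(entry, address + ud_insn_len(&ud));
                            return 0;
                    }
            }
            return -EINVAL;
    }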
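For comparison, a much cruder way to locate the store is to search the function's bytes for one specific encoding. The sketch below is hypothetical and is not the author's approach. It assumes x86-64, and that the compiler emitted movl $0xdeadbeef, 0x0 as C7 04 25 00 00 00 00 EF BE AD DE.

A raw byte match can also hit the middle of an unrelated instruction. That is exactly why decoding instruction boundaries with a disassembler such as udis86 is the more reliable choice.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed encoding of "movl $0xdeadbeef, 0x0" on x86-64. */
static const unsigned char target_insn[] = {
	0xc7, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00, /* movl $imm32, 0x0 (disp32) */
	0xef, 0xbe, 0xad, 0xde                    /* imm32 = 0xdeadbeef */
};

/* Report the offset of the store and of the instruction after it. */
static int find_store(const unsigned char *code, size_t len,
		      size_t *insn_off, size_t *fixup_off)
{
	for (size_t i = 0; i + sizeof(target_insn) <= len; i++) {
		if (memcmp(code + i, target_insn, sizeof(target_insn)) == 0) {
			*insn_off = i;                        /* faulting instruction */
			*fixup_off = i + sizeof(target_insn); /* resume right after it */
			return 0;
		}
	}
	return -EINVAL;
}
```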




In fixup_page_fault, the disassembler is first set up to start decoding at raise_page_fault. Instructions are then decoded one by one, up to a fixed search depth (the 128 passed to ud_initialize) or until a ret is reached.

The instruction being sought, into which ((int *)0)[0] = 0xdeadbeef; is compiled, is an ordinary movl $0xdeadbeef, 0. Its first operand is of type UD_OP_MEM and its second is of type UD_OP_IMM. As soon as the address of this instruction is found, a table element is filled in using the following functions:



    static void extable_make_insn(struct exception_table_entry * entry, unsigned long addr)
    {
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0)
            entry->insn = (unsigned int)((addr - (unsigned long)&entry->insn));
    #else
            entry->insn = addr;
    #endif
    }

    static void extable_make_fixup(struct exception_table_entry * entry, unsigned long addr)
    {
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0)
            entry->fixup = (unsigned int)((addr - (unsigned long)&entry->fixup));
    #else
            entry->fixup = addr;
    #endif
    }




The first of these stores the address of the instruction in the structure. The second stores the fixup address, i.e. the address of the instruction to which control will be transferred.

Note that since kernel 3.5 the exception_table_entry structure has changed slightly: on 64-bit architectures its insn and fixup fields became smaller. This reduced the memory needed to store the addresses, but it also changed how they are computed. Since 3.5, insn and fixup hold 32-bit values that are offsets of the target addresses relative to the fields themselves. For those interested, here is the commit that spoiled everything: 706276543b699d80f546e45f8b12574e7b18d952.
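The trade-off can be checked with a small user-space sketch. The two struct names are hypothetical copies of the old and new layouts. offset_fits, also hypothetical, shows the constraint the relative format introduces: the target must lie within a signed 32-bit distance of the field that refers to it.

```c
#include <stdint.h>

/* Layout before 3.5 on x86-64: absolute addresses. */
struct exception_table_entry_old {
	unsigned long insn, fixup;
};

/* Layout since 3.5: 32-bit offsets relative to each field. */
struct exception_table_entry_new {
	int insn, fixup;
};

/* Can a relative field at field_addr describe target? */
static int offset_fits(uintptr_t field_addr, uintptr_t target)
{
	intptr_t delta = (intptr_t)(target - field_addr);

	return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

On x86-64, each entry shrinks from 16 bytes to 8.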



Conclusion





This example demonstrates how a kernel module can take control of exception handling in the Linux kernel. In the test case, the exception (a page fault) was raised in a prepared environment, namely with the module's exception table (extable) already configured. This made it possible to avoid abnormal termination and to continue execution from the instruction following the faulting one.



In addition, the test case can be used to try handling some other exceptions, such as the divide error (#DE) and the undefined opcode (#UD):



    struct {
            const char * name;
            int (* fixup)(struct exception_table_entry *);
            void (* raise)(void);
    } exceptions[] = {
            {
                    .name = "0x00 - div0 error (#DE)",
                    .fixup = fixup_div0_error,
                    .raise = raise_div0_error,
            },
            {
                    .name = "0x06 - undefined opcode (#UD)",
                    .fixup = fixup_undefined_opcode,
                    .raise = raise_undefined_opcode,
            },
            {
                    .name = "0x0e - page fault (#PF)",
                    .fixup = fixup_page_fault,
                    .raise = raise_page_fault,
            },
    };

Source: https://habr.com/ru/post/196952/


