
A kosher way to modify write-protected areas of the Linux kernel

Anyone who has ever needed to change something in the kernel on the fly knows that this task requires careful study: the kernel memory pages that hold code and some data are marked read-only and are protected from writing!

For x86, the well-known solution is to temporarily disable page protection by clearing the WP bit in the CR0 register. But this should be done with caution, because page protection underpins many core kernel mechanisms. In addition, the peculiarities of SMP systems must be taken into account, where various unpleasant situations are possible.



Disabling page protection



The x86 architecture has a protection mechanism under which an attempt to write to write-protected memory areas raises an exception. This mechanism is called page protection and is the basis for many kernel features, such as COW (copy-on-write). The processor's behavior here is controlled by the WP bit of the CR0 register, while per-page access permissions are described in the corresponding PTE (page table entry). When the WP bit in CR0 is set, an attempt to write to a write-protected page (one whose RW bit is cleared in the PTE) causes the processor to raise a page-fault exception (#PF).

The simplest solution to this problem is to temporarily disable page protection by clearing the WP bit in CR0. This approach does work, but it should be used with caution because, as noted, page protection is the basis of many core kernel mechanisms. In addition, on SMP systems a thread that has cleared the WP bit on one processor can be preempted and migrated to another processor, where WP is still set!

However, if you really must do it, preemption should be disabled first, as recommended here:

static inline unsigned long native_pax_open_kernel(void)
{
	unsigned long cr0;

	preempt_disable();
	barrier();
	cr0 = read_cr0() ^ X86_CR0_WP;
	BUG_ON(unlikely(cr0 & X86_CR0_WP));
	write_cr0(cr0);
	return cr0 ^ X86_CR0_WP;
}

static inline unsigned long native_pax_close_kernel(void)
{
	unsigned long cr0;

	cr0 = read_cr0() ^ X86_CR0_WP;
	BUG_ON(unlikely(!(cr0 & X86_CR0_WP)));
	write_cr0(cr0);
	barrier();
	preempt_enable_no_resched();
	return cr0 ^ X86_CR0_WP;
}
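A minimal sketch of how this open/close pair might be used to patch a read-only kernel word; patch_word and its arguments are illustrative, not part of any kernel API:

```c
/* Illustrative helper: write a value into a read-only kernel page.
 * The write happens between open and close, with WP cleared and
 * preemption disabled, so the thread cannot migrate to another CPU. */
static void patch_word(unsigned long *target, unsigned long value)
{
	native_pax_open_kernel();   /* WP cleared, preemption off */
	*target = value;            /* write to the write-protected page */
	native_pax_close_kernel();  /* WP restored, preemption back on */
}
```

Note that the write itself must stay short and non-sleeping: the whole window runs with preemption disabled.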


Use mappings



A better and reasonably universal way is to create temporary mappings. Thanks to the way the MMU works, several page table entries with different attributes can refer to the same physical page frame. This makes it possible to create a writable mapping for the target memory area. This method is used in the Ksplice project (fork on GitHub). Below is the map_writable function, which creates such a mapping:

/*
 * map_writable creates a shadow page mapping of the range
 * [addr, addr + len) so that we can write to code mapped read-only.
 *
 * It is similar to a generalized version of x86's text_poke.  But
 * because one cannot use vmalloc/vfree() inside stop_machine, we use
 * map_writable to map the pages before stop_machine, then use the
 * mapping inside stop_machine, and unmap the pages afterwards.
 */
static void *map_writable(void *addr, size_t len)
{
	void *vaddr;
	int nr_pages = DIV_ROUND_UP(offset_in_page(addr) + len, PAGE_SIZE);
	struct page **pages = kmalloc(nr_pages * sizeof(*pages), GFP_KERNEL);
	void *page_addr = (void *)((unsigned long)addr & PAGE_MASK);
	int i;

	if (pages == NULL)
		return NULL;

	for (i = 0; i < nr_pages; i++) {
		if (__module_address((unsigned long)page_addr) == NULL) {
			pages[i] = virt_to_page(page_addr);
			WARN_ON(!PageReserved(pages[i]));
		} else {
			pages[i] = vmalloc_to_page(page_addr);
		}
		if (pages[i] == NULL) {
			kfree(pages);
			return NULL;
		}
		page_addr += PAGE_SIZE;
	}
	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
	kfree(pages);
	if (vaddr == NULL)
		return NULL;
	return vaddr + offset_in_page(addr);
}


Using this function, you can create a writable mapping for any memory region. A region created this way is released with the vfree function, whose argument must be the mapping address aligned down to a page boundary.

Stop the machine!



The last piece needed to make modification of kernel code safe is the stop_machine mechanism:

#include <linux/stop_machine.h>

int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);


In essence, stop_machine executes fn while only the processors specified by the cpumask argument remain active; all the others are stopped. This is exactly what makes the mechanism suitable for modifying kernel code: with an appropriate mask there is no need to track kernel threads whose execution might touch the code being modified.

Among the limitations of stop_machine, note that the executed function runs in atomic context, which rules out using the temporary-mapping mechanism described above (vmap) from inside it. This is not a real obstacle, however, because the required mappings can be prepared before calling stop_machine.
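A hedged sketch of how the pieces fit together: the mapping is created before stop_machine, the copy happens inside it, and the mapping is freed afterwards. patch_info, do_patch, and apply_patch are illustrative names, not kernel APIs:

```c
/* Illustrative context passed into the stopped-machine callback. */
struct patch_info {
	void *writable;   /* mapping obtained from map_writable() */
	const u8 *bytes;  /* new code bytes to install */
	size_t len;
};

/* Runs with the other CPUs stopped; must not sleep or allocate. */
static int do_patch(void *arg)
{
	struct patch_info *p = arg;

	memcpy(p->writable, p->bytes, p->len);
	return 0;
}

static int apply_patch(void *target, const u8 *bytes, size_t len)
{
	struct patch_info info = { .bytes = bytes, .len = len };
	int ret;

	/* Prepare the writable mapping before stop_machine. */
	info.writable = map_writable(target, len);
	if (!info.writable)
		return -ENOMEM;

	ret = stop_machine(do_patch, &info, NULL);

	/* vfree wants the mapping address aligned to a page boundary. */
	vfree((void *)((unsigned long)info.writable & PAGE_MASK));
	return ret;
}
```

On x86, patching live instruction bytes may additionally require flushing instruction caches or following the cross-modifying-code rules; the sketch only shows the memory-protection side.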

Source: https://habr.com/ru/post/207122/
