Replacing a system call handler

Good day to all! I am a second-year student at a technical college. A couple of months ago it was time to choose a topic for my course project. Topics like "a calculator" did not appeal to me, so I asked whether there was anything more interesting and got a yes. My topic is "Replacing the system call handler".

Introduction

An interrupt handler (or interrupt service routine) is a special routine that the processor runs to service an interrupt. Such handlers are invoked either by a hardware interrupt or by a corresponding instruction in a program, and are usually meant for interacting with devices or for calling operating system functions (wiki).

Why?

The main goal is to see how this works on a running system rather than to chew through dry theory. It is much the same as when programmers used to "do multitasking" in DOS by replacing the timer interrupt handler.

32-bit handler

How to approach?
After a little searching on the Internet (this post was especially useful) and poring over the Linux architecture manuals, I found an implementation in C that replaces a 32-bit system call handler. Everything turned out to be simpler than I thought.
Let us go through it in order.
A system call is a request from an application program to the operating system kernel to perform some operation (wiki). The kernel stores the addresses of the system call handlers in the system call table (sys_call_table). The handler at one of these addresses is called whenever a program raises interrupt 80h with the system call number in the eax register (for example, eax = 4 for the write system call, which writes to a file or output device). Knowing the address of this table and the number of the desired call, you can replace its handler with your own code.
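To make the "dispatch by number" idea concrete from user space, here is a minimal sketch that requests write by its number through the C library's syscall() wrapper instead of calling write() directly. The helper name raw_write is hypothetical; syscall() and SYS_write are assumed to come from <unistd.h> and <sys/syscall.h> on Linux. The wrapper may enter the kernel by a different instruction than interrupt 80h, but the number still selects the handler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke the write system call by its number.
 * Returns the number of bytes written, or -1 on error (errno is set). */
long raw_write(int fd, const void *buf, size_t count)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, count);
}
```

Calling raw_write(1, "It works!\n", 10) should behave like the ordinary write() on standard output.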

So, the 32-bit interrupt is sorted out.

The algorithm for replacing the handler is extremely simple:

1. find the address of the system call table (sys_call_table)
2. find the entry for the required system call in it
3. write the address of our handler there instead

After this, whenever the replaced system call is invoked, our handler is called instead.
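The three steps above can be tried out safely in user space on a toy table of function pointers. This is only a sketch of the technique, not kernel code: demo_table, demo_original, demo_hook, NR_DEMO_WRITE and the other names are all hypothetical. It also keeps the displaced address so that the entry can be forwarded to and later restored.

```c
#include <assert.h>
#include <stddef.h>

#define NR_DEMO_WRITE   1   /* hypothetical slot number in the toy table */
#define DEMO_TABLE_SIZE 4

typedef long (*handler_t)(long arg);

/* Stand-in for the original handler: doubles its argument. */
long demo_original(long arg) { return arg * 2; }

/* Toy "system call table"; only one slot is populated. */
handler_t demo_table[DEMO_TABLE_SIZE] = { NULL, demo_original, NULL, NULL };

/* Address of the handler we displaced (NULL while no hook is installed). */
handler_t saved_handler = NULL;

/* Counts how many times the replacement handler ran. */
int hook_calls = 0;

/* Replacement handler: do our extra work, then forward to the original. */
long demo_hook(long arg)
{
    hook_calls++;
    return saved_handler(arg);
}

/* Steps 2 and 3: remember the old entry, then overwrite it. */
void install_hook(void)
{
    assert(saved_handler == NULL);   /* refuse to install twice */
    saved_handler = demo_table[NR_DEMO_WRITE];
    demo_table[NR_DEMO_WRITE] = demo_hook;
}

/* Undo the replacement by writing the saved address back. */
void remove_hook(void)
{
    assert(saved_handler != NULL);
    demo_table[NR_DEMO_WRITE] = saved_handler;
    saved_handler = NULL;
}
```

Callers that go through demo_table[NR_DEMO_WRITE] see the same results before and after install_hook(); only the counter reveals that the hook ran.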
To carry out the substitution, we write a kernel module. Why a module? Simply because a module is code that can be loaded into and unloaded from the kernel as needed, so we extend the kernel's functionality without rebooting the system. The module will be written in C.

"Skeleton" of the kernel module:

static int init(void)
{
        return 0;       /* 0 means the module loaded successfully */
}

static void exit(void)
{
}

module_init(init);
module_exit(exit);

As the skeleton shows, a module must contain at least two functions: an initialization function (called when the module is loaded into memory) and a shutdown function (called when the module is unloaded).
The main problem to solve when replacing the handler is finding where the system call table is located in memory. The address of the table can be found in the file System.map-<kernel version> under /boot. The address found is then substituted into the module source before compilation. To look up the address, use the following command:

grep sys_call_table /boot/System.map-$(uname -r) |awk '{print $1}'

The command will display the found address, for example:

c05d3180

To automate the search for the system call table, you can write a small script, which is what I did.
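The same lookup can also be sketched in C. The following assumes that each System.map line has the form "address type symbol" (for example, "c05d3180 R sys_call_table", where the type letter in this sample is an assumption). The function name match_symbol is hypothetical. It compares the whole symbol name, so a symbol that merely contains "sys_call_table" is not matched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map line of the form "<hex address> <type> <symbol>".
 * Returns 1 and stores the address in *addr if the symbol name equals
 * `wanted` exactly; returns 0 for any other line.
 */
int match_symbol(const char *line, const char *wanted,
                 unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[128];

    assert(line != NULL && wanted != NULL && addr != NULL);

    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return 0;               /* not a well-formed line */
    if (strcmp(name, wanted) != 0)
        return 0;               /* some other symbol */

    *addr = value;
    return 1;
}
```

A caller would read the file line by line with fgets() and stop at the first line for which match_symbol() returns 1.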
To replace a system call completely, your code would have to implement all of its functionality. To avoid that headache, we take a different approach: when changing the address in the system call table, we save the previous value in a variable, and each time, after performing our own actions, we pass control to that address. This lets us add our own behavior without affecting the existing functionality, so the OS keeps working.
To be on the safe side, our first kernel module will simply write a message to the kernel log whenever the write function is called.

Important detail! The system call table lies in read-only memory, so replacing a handler requires bypassing write protection. We do this by clearing the WP (Write Protect) bit, bit 16 (mask 0x10000), of the CR0 control register. This bit works at the hardware level: while it is cleared, code running with kernel privileges can modify memory pages regardless of whether they are marked writable. The CR0 register is accessed with read_cr0() and write_cr0().
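The bit arithmetic involved can be checked in user space without touching the real register. The sketch below only computes the values that would be written to CR0. The helper names are hypothetical, and the sample value 0x80050033 used in the note after the code is an illustrative CR0 content, not one read from a real machine.

```c
#include <assert.h>
#include <stdint.h>

/* Write Protect is bit 16 of CR0. */
#define CR0_WP (UINT64_C(1) << 16)

static_assert(CR0_WP == 0x10000, "WP mask must match the value used in the module");

/* Value to write to CR0 so that kernel code may write to read-only pages. */
uint64_t cr0_clear_wp(uint64_t cr0) { return cr0 & ~CR0_WP; }

/* Value to write to CR0 to turn write protection back on. */
uint64_t cr0_set_wp(uint64_t cr0) { return cr0 | CR0_WP; }

/* Returns 1 if write protection is enabled in the given CR0 value. */
int cr0_wp_is_set(uint64_t cr0) { return (cr0 & CR0_WP) != 0; }
```

For the sample value 0x80050033, cr0_clear_wp() gives 0x80040033 and cr0_set_wp() restores 0x80050033. Restoring the bit with OR is idempotent, whereas XOR would toggle it and could leave protection off if applied when the bit is already set.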

Complete module code

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>
#include <asm/page.h>
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>

// TABLE is the address of the system call table (for example, c05d3180)
unsigned long *syscall_table = (unsigned long *)0xTABLE;

// pointer to the original write handler
asmlinkage ssize_t (*original_write)(unsigned int, const char __user *, size_t);

// our replacement for write
asmlinkage ssize_t new_write(unsigned int fd, const char __user *buf, size_t count)
{
        printk(KERN_ALERT "It works!\n");
        return (*original_write)(fd, buf, count);
}

static int init(void)
{
        printk(KERN_ALERT "Module init\n");
        write_cr0(read_cr0() & (~0x10000));                   // clear the WP bit
        original_write = (void *)syscall_table[__NR_write];   // save the original handler
        syscall_table[__NR_write] = (unsigned long)new_write; // install our handler
        write_cr0(read_cr0() | 0x10000);                      // set the WP bit again
        return 0;
}

static void exit(void)
{
        write_cr0(read_cr0() & (~0x10000));                        // clear the WP bit
        syscall_table[__NR_write] = (unsigned long)original_write; // restore the original handler
        write_cr0(read_cr0() | 0x10000);                           // set the WP bit again
        printk(KERN_ALERT "Module exit\n");
}

module_init(init);
module_exit(exit);

64-bit handler

The first difference showed up when I ran

grep sys_call_table /boot/System.map-$(uname -r) |awk '{print $1}'

Two addresses were output:

ffffffff81801300
ffffffff81805260

Changing the command slightly, I got this result:

grep sys_call_table /boot/System.map-$(uname -r)

ffffffff81801300 R sys_call_table
ffffffff81805260 R ia32_sys_call_table

Everything immediately became clear. For compatibility with 32-bit programs, the 64-bit architecture has two system call tables: one for 64-bit calls (sys_call_table) and one for 32-bit calls (ia32_sys_call_table).
On the 32-bit architecture __NR_write is 4 (the write system call is number 4 there), while on x64 it is 1. Since I had not worked with 64-bit assembly before, I did not understand at first what was going on, but then I learned that sys_write is number 1 on the 64-bit architecture.
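The difference in numbering can be confirmed at compile time from user space. This sketch assumes the SYS_write macro from <sys/syscall.h> on Linux; the function name write_syscall_number is hypothetical. On architectures other than the two discussed here, no value is asserted.

```c
#include <assert.h>
#include <sys/syscall.h>

/* Compile-time checks of the numbers quoted in the text. */
#if defined(__x86_64__) && !defined(__ILP32__)
static_assert(SYS_write == 1, "write is system call number 1 on x86-64");
#elif defined(__i386__)
static_assert(SYS_write == 4, "write is system call number 4 on 32-bit x86");
#endif

/* Number of the write system call for the architecture we compile for. */
long write_syscall_number(void)
{
    return SYS_write;
}
```

Building the same file with and without the compiler's 32-bit mode (for example, gcc's -m32 option, if the 32-bit headers are installed) exercises both branches.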
That, in fact, covers all the differences between the 64-bit and 32-bit write handlers that mattered to me.
Since the assignment calls for assembly language, we write the kernel module itself in C and all of its functions in assembly.

Shell script

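#!/bin/bash
# Look up the address of sys_call_table (the leading space excludes ia32_sys_call_table)
TABLE=$(grep ' sys_call_table' /boot/System.map-$(uname -r) | awk '{print $1}')
echo $TABLE
# Substitute the address for the TABLE placeholder in the module source
sed -i "s/TABLE/$TABLE/g" module.c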

Kernel module

#include <linux/init.h>
#include <linux/module.h>

unsigned long *syscall_table = (unsigned long *)0xTABLE;

extern void change(unsigned long *temp);
extern void unchange(unsigned long *temp);

static int init(void)
{
        printk(KERN_ALERT "\nModule init\n");
        change(syscall_table);
        return 0;
}

static void cleanup(void)
{
        unchange(syscall_table);
        printk(KERN_ALERT "Module exit\n");
}

module_init(init);
module_exit(cleanup);

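Auxiliary module

global unlockWP
global lockWP
global change
global unchange
extern printk

SECTION .text

; Our write handler. The arguments are passed through untouched:
;   rdi - fd (4 bytes), rsi - buf (8 bytes), rdx - count (8 bytes)
newwrite:
        mov rax, original       ; address of the saved original_write
        mov rax, QWORD [rax]    ; rax = original handler
        call rax                ; call the original handler
        push rax                ; save its return value
        xor rax, rax            ; clear rax
        mov rdi, work           ; address of the "It works!" string
        call printk             ; print it with printk
        pop rax                 ; restore rax, the original handler's return value
        ret

change:
        call unlockWP           ; disable write protection
                                ; rdi - address of the system call table
        add rdi, 8              ; rdi - syscall_table + __NR_write
        mov rax, QWORD [rdi]    ; rax - syscall_table[__NR_write]
        mov rcx, original       ; rcx is caller-saved, so it is safe to clobber
        mov QWORD [rcx], rax    ; save the original handler
        mov rax, newwrite       ; address of our handler
        mov QWORD [rdi], rax    ; write it into the table
        call lockWP             ; re-enable write protection
        ret

unchange:
        call unlockWP           ; disable write protection
                                ; rdi - address of the system call table
        add rdi, 8              ; rdi - syscall_table + __NR_write
        mov rcx, original       ; rcx - address of the saved handler
        mov rax, QWORD [rcx]    ; rax - original handler
        mov QWORD [rdi], rax    ; restore the original table entry
        call lockWP             ; re-enable write protection
        ret

unlockWP:
        mov rax, cr0
        and rax, 0xfffffffffffeffff     ; clear WP (bit 16)
        mov cr0, rax
        ret

lockWP:
        mov rax, cr0
        or rax, 0x0000000000010000      ; set WP (bit 16)
        mov cr0, rax
        ret

SECTION .data
original: DQ 0,0
work: DB "It works!",10,0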

Makefile

obj-m += kmod.o
kmod-objs := module.o main.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

module:
	nasm -f elf64 -o main.o main.asm
	make -C $(KDIR) SUBDIRS=$(PWD) modules
	make clean

clean:
	rm -f *.o *mod.c *.symvers *.order

If everything went well, the source directory now contains the file kmod.ko. This is our kernel module. To check that it works, load it into memory with the insmod _ command; to unload it, run rmmod _. To see the module's output, run dmesg, which prints the kernel message buffer to standard output.

Thanks for your attention.

PS:

A couple of screenshots.

[Screenshot: running the script]