The state of affairs
This discussion concerns the Linux kernel and is of interest to developers of kernel modules and drivers for this operating system. For everyone else, these notes are hardly of interest.
Everyone who has written even the simplest Linux kernel module knows (and every book on writing Linux drivers says so) that a module's own code can use only those names (mainly kernel API functions) that are exported by the kernel. Exporting kernel symbols is one of the most confusing notions in the Linux kernel area. For a name from kernel space to be available for binding in another module, two conditions must hold: a) the name must have global scope (in your own module, such names must not be declared static), and b) the name must be explicitly declared exportable, that is, passed as the parameter of the EXPORT_SYMBOL macro (or EXPORT_SYMBOL_GPL, which has different consequences: such a symbol is available only to modules with a GPL-compatible license).
All names known to the kernel are dynamically listed in the /proc/kallsyms pseudo-file, and there are a great many of them:
$ uname -r
3.13.0-37-generic
$ cat /proc/kallsyms | wc -l
108960
The number of names exported by the kernel (that is, available for use in the code of modules) is much smaller:
$ cat /lib/modules/`uname -r`/build/Module.symvers | wc -l
17533
It is easy to see that more than a hundred thousand names are defined in the kernel (the exact number depends on the kernel version). But only a small part of them (about 16% in this example) are exported and available for binding in the code of kernel modules.
Recall that kernel API calls are made by the absolute address of the name. Each name exported by the kernel (or by any module) is associated with an address, and that address is used for binding when a module using this name is loaded. This is the main mechanism by which a module interacts with the kernel. At run time, the module is loaded dynamically and becomes an integral part of the kernel code. This explains why a Linux kernel module can only be compiled for one specific kernel (usually the one installed on the build machine): an attempt to load such a binary module into a different kernel is normally rejected by the loader, and forcing it in can crash the operating system.
As a result of this brief excursion, we can say that the Linux kernel developers provide developers of extensions (kernel modules) with a very limited (and extremely poorly documented) set of APIs which, in their opinion, is sufficient for writing kernel extensions. But this opinion may not coincide with that of the driver developers themselves, who would like to have the entire kernel arsenal at hand. And it is quite possible to use it, as we will discuss in the rest of the text.
Address search by name
Let's look at the structure of the entry line of any kernel name (out of 108960) in /proc/kallsyms:
$ sudo cat /proc/kallsyms | grep ' T ' | grep sys_close
c1176ff0 T sys_close
This is the exported name of the handler of the (POSIX) close() system call. (In some Linux distributions, the addresses are filled in only when the file is read as root; other users see zeros in the address field.)
We could freely call sys_close() in the code of our module. But we cannot do the same with sys_open(), its perfectly symmetrical counterpart, because that name is not exported by the kernel. When building such a module, we get a warning like the following:
$ make
...
  MODPOST 2 modules
WARNING: "sys_open" [/home/olej/2011_WORK/LINUX-books/examples.DRAFT/sys_call_table/md_0o.ko] undefined!
...
But an attempt to load such a module will fail:
$ sudo insmod md_0o.ko
insmod: error inserting 'md_0o.ko': -1 Unknown symbol in module
$ dmesg
md_0o: Unknown symbol sys_open
Such a module cannot be loaded because it violates the kernel integrity rules: it contains an unresolved external symbol, one that the kernel does not export for binding. In other words, what the build tools report as a mere warning is, from the developer's point of view, a fatal error.
Does this mean that only exported kernel symbols are available to the code of our module? No. It only means that the recommended way of binding by name (by the absolute address of the name) applies only to exported names. Exporting is an additional line of control protecting the integrity of the kernel, where the slightest impropriety leads to a complete crash of the operating system, sometimes so fast that the kernel does not even have time to print an Oops message.
Since all kernel symbols are listed in the /proc/kallsyms pseudo-file, module code could take them from there. Moreover, this means that the kernel has internal methods for locating any name, and these methods can be used in your code for the same purpose. Skipping the intermediate solutions, let us consider only 2 options, 2 exported calls (all definitions are in <linux/kallsyms.h> in the kernel sources, or see lxr.free-electrons.com/source/include/linux/kallsyms.h):
The first call:
unsigned long kallsyms_lookup_name( const char *name );
Here name is the name we are looking for, and its absolute address is returned. The drawback of this option is that it became available somewhere between kernel versions 2.6.32 and 2.6.35 (roughly between distributions released in summer 2010 and spring 2011); more precisely, the function existed earlier but was not exported. For embedded and small systems, which often run older kernels, this can be a serious obstacle.
A more general call:
int kallsyms_on_each_symbol( int (*fn)(void*, const char*, struct module*, unsigned long), void *data );
This call is more complicated and needs a brief explanation. The first parameter (fn) is a pointer to your own function, which will be called sequentially (in a loop) for every symbol in the kernel table, and the second (data) is a pointer to an arbitrary block of data (parameters) that will be passed to each call of this function fn().
The prototype of the user-defined function fn, which is called in turn for each name:
int func( void *data, const char *symb, struct module *mod, unsigned long addr );
Here:
data - the parameter block filled in by the caller and passed as the 2nd argument of kallsyms_on_each_symbol(), as described above; it is a convenient place to pass the name of the symbol we are looking for;
symb - the string form of a name from the kernel symbol table, processed on the current func call;
mod - the kernel module to which the symbol being processed belongs (NULL for symbols of the kernel itself);
addr - the address of the symbol in the kernel address space (this, in fact, is what we are looking for).
The walk through the kernel name table stops at the current step (which is worth doing for efficiency once we have found the symbols we need) if the user function func returns a nonzero value.
To use the kallsyms_on_each_symbol() call, we will prepare our own wrapper function, similar in meaning to kallsyms_lookup_name():
static void* find_sym( const char *sym ) {
Here we used the trick of a nested definition of the function symb_fn(), which is a perfectly legal use of a GCC extension to the C standard; since kernel modules are compiled with GCC anyway, this is not a limitation. This avoids declaring a global intermediate variable, keeps the namespace clean and keeps the code local.
Usage example
One of the most sacred places in the Linux operating system is the sys_call_table selector table, through which every system call passes: having prepared the parameters in advance and written the system call number (selector) as the 1st parameter, the process executes the instruction that switches into the kernel: int 80h (older versions) or sysenter, which is essentially the same thing. The system call number (selector, 1st parameter) is the index into the sys_call_table table (array) of pointers to the kernel functions that handle the calls. We can look at the numbers of all system calls, for example, for the i386 architecture:
$ cat /usr/include/i386-linux-gnu/asm/unistd_32.h
...
This is the table of system call numbers (indexes) used in user address space, for example by the standard C library libc.so. An exact counterpart of this table is also present in the kernel's own header files, for use in kernel address space. Similar tables of system call numbers exist for every architecture supported by Linux (the tables for different architectures differ in size, composition and the numeric values of the indexes for the same calls!).
Starting with kernel version 2.6, the sys_call_table symbol was removed from the exported symbols, for security reasons that the kernel development team understood in a rather peculiar way (I suspect that security was meant in the sense of protecting the kernel developers' bread and butter from third-party programmers). All books on writing Linux drivers state that sys_call_table cannot be used in driver code. Here, and even more so in the later parts of this discussion, we will show that this is not so!
For quite a long time (since 2011), while working on the topics under discussion, I read many publications on this subject. Virus writers and other riff-raff who frighten themselves with the scary word "hacker" have invented all sorts of ways to find sys_call_table: dynamically disassembling dumps of the binary memory occupied by the kernel, scanning the kernel's memory sections (looking, for example, for the position of sys_close(), which is always exported). As will be shown now, all of this can be done much more simply. The secret of Linux's resilience is not that tricksters cannot find something there, but that the access rights do not allow anyone without root privileges to do anything nasty outside those rules... and nobody gives root rights to tricksters.
But back to the task of resolving non-exported kernel symbols. The first option (the mod_kct.c file) demonstrates the use of kallsyms_lookup_name() (for brevity, the header includes and the required macros such as MODULE_*() are not shown; all of this is in the archive files):
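static int __init ksys_call_tbl_init( void ) {
    /* resolve the non-exported table by name */
    void **sct = (void**)kallsyms_lookup_name( "sys_call_table" );
    printk( "+ sys_call_table address = %p\n", sct );
    if( sct ) {
        int i;
        char table[ 120 ] = "sys_call_table : ";
        /* append the addresses of the first 10 handlers; snprintf keeps the text
           inside the buffer even where %p prints 16 hex digits (64-bit kernels) */
        for( i = 0; i < 10; i++ )
            snprintf( table + strlen( table ), sizeof( table ) - strlen( table ),
                      "%p ", sct[ i ] );
        printk( "+ %s ...\n", table );
    }
    return -EPERM;   /* refuse to load on purpose: we only need the init code to run */
}
module_init( ksys_call_tbl_init );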
Here, the address of the sys_call_table table is extracted, followed by the addresses of the handlers of the first 10 system calls it contains (__NR_restart_syscall ... __NR_link):
$ sudo insmod mod_kct.ko
insmod: ERROR: could not insert module mod_kct.ko: Operation not permitted
$ dmesg | tail -n 2
[39473.496040] + sys_call_table address = c1666140
[39473.496045] + sys_call_table : c1067840 c1059280 c1055eb0 c1179ee0 c1179f70 c1178cb0 c1176ff0 c1059570 c1178d10 c1188860 ...
(The 'Operation not permitted' error should not be confusing: we did not intend to load the module, which is what the nonzero return code -EPERM signals; we simply execute our code in privileged (supervisor) mode, in processor protection ring zero.)
Let us check what the addresses found at the beginning of the sys_call_table array correspond to:
$ sudo cat /proc/kallsyms | grep c1067840
c1067840 T sys_restart_syscall
$ sudo cat /proc/kallsyms | grep c1059280
c1059280 T SyS_exit
c1059280 T sys_exit
$ sudo cat /proc/kallsyms | grep c1055eb0
c1055eb0 T sys_fork
... and so on (compare with the table of system call numbers shown earlier).
The next option is a little harder to understand; it uses the kallsyms_on_each_symbol() function, but it is also more universal (the mod_koes.c file):
static int __init ksys_call_tbl_init( void ) {
    void **sct = find_sym( "sys_call_table" );    // table sys_call_table address
    printk( "+ sys_call_table address = %p\n", sct );
    if( sct != NULL ) {
        int i;
        char table[ 120 ] = "sys_call_table : ";
        /* bounded append, as in the previous example */
        for( i = 0; i < 10; i++ )
            snprintf( table + strlen( table ), sizeof( table ) - strlen( table ),
                      "%p ", sct[ i ] );
        printk( "+ %s ...\n", table );
    }
    return -EPERM;   /* refuse to load on purpose */
}
module_init( ksys_call_tbl_init );
Textually, it almost completely repeats the previous one; all the productive work is done by the find_sym() function presented and discussed above. The result of the execution is, as expected, the same:
$ sudo insmod mod_koes.ko
insmod: ERROR: could not insert module mod_koes.ko: Operation not permitted
$ dmesg | tail -n2
[42451.186648] + sys_call_table address = c1666140
[42451.186654] + sys_call_table : c1067840 c1059280 c1055eb0 c1179ee0 c1179f70 c1178cb0 c1176ff0 c1059570 c1178d10 c1188860 ...
Discussion
A skeptic may say: "So what?" The point is that we have shown the necessary and sufficient mechanisms for using any kernel API in the code of dynamically loaded kernel modules. This technique expands the possibilities of a kernel module author by orders of magnitude! The prospects are so broad that considering them will take the following parts of this discussion.
... but so that the end of the story is not too boring, we will show one simple but impressive application: calling, from kernel module code, the handler of a system call (generally speaking, any one) that user-space libraries normally invoke.
Have you been told that kernel module code writes to the system log (printk()) and cannot write to the terminal (printf())? We will now show that this is not so... Here is a simple kernel module that writes to the terminal:
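static asmlinkage long (*sys_write)( unsigned int, const char __user *, size_t );

static int __init wr_init( void ) {
    char buf[ 80 ] = "Hello from kernel!\n";
    int len = strlen( buf ), n;
    sys_write = find_sym( "sys_write" );      /* resolve the non-exported handler by name */
    printk( "+ sys_write address = %p\n", sys_write );
    printk( "+ [%d]: %s", len, buf );
    if( sys_write != NULL ) {
        mm_segment_t fs = get_fs();           /* save the current address limit */
        set_fs( get_ds() );                   /* let the kernel buffer pass the __user check */
        n = sys_write( 1, buf, len );         /* fd 1: standard output of the calling process (insmod) */
        set_fs( fs );                         /* always restore the saved limit */
        printk( "+ printf() return : %d\n", n );
    }
    return -EPERM;   /* refuse to load on purpose */
}
module_init( wr_init );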
And here is its execution (an attempt to load it, which deliberately ends with an error return code):
$ sudo insmod mod_wrc.ko
Hello from kernel!
insmod: ERROR: could not insert module mod_wrc.ko: Operation not permitted
$ dmesg | tail -n3
[23942.974587] + sys_write address = c1179f70
[23942.974591] + [19]: Hello from kernel!
[23942.974612] + printf() return : 19
The first line of this output is produced by the write() system call. Naturally, the output goes to the controlling terminal of the insmod user process, but what matters here is that we execute the write() system call from kernel-space code. Some details may need additional explanation:
Where did I get such a "cunning" prototype for the address variable sys_write? Of course, I shamelessly copied it from the original definition of the sys_write() function in the kernel header file <linux/syscalls.h>, as noted by the comment in the code (in the full code, in the archive):
/* <linux/syscalls.h>
asmlinkage long sys_write( unsigned int fd, const char __user *buf, size_t count );
*/
This is exactly how it should be done for every non-exported kernel name you use: copy the prototypes of the implementing functions from the corresponding header files. The slightest mismatch in the prototype will lead to an immediate crash of the operating system!
What do the several similar calls get_ds(), get_fs() and set_fs() mean? This is a small trick that temporarily replaces the data segment (more precisely, the address limit) that the kernel uses for its checks. The prototype of the sys_write() system call handler has a __user qualifier, indicating that the pointer refers to data in user space. The system call code checks this ownership (only by the numeric range of the address), and if the address points into kernel space (as in our case), the call fails with an error (-EFAULT). With this trick, we tell the checking code that our address should be accepted as if it belonged to user space. In such cases the trick can be applied mechanically, without thinking too much about its meaning.
Notes
Experiments with such code, and even more so with the more elaborate cases I intend to discuss later, are fraught with trouble: even minor errors in the code instantly bring down the operating system. Even worse, the system may crash into an unstable state, and there is a small but nonzero probability that it will not recover even after a reboot.
While experimenting with such code, I kept asking myself: can it be developed and tested in a virtual machine? After all, we will later have to do very machine-dependent things, such as writing to hidden processor control registers, for example CR0.
I can report with satisfaction that all the code discussed runs correctly in virtual machines under Oracle VirtualBox, at least in relatively recent versions (starting from those of 2013).
Therefore, I strongly recommend running such code in virtual machines first, to avoid serious trouble.
The mention of Oracle VirtualBox does not mean that things will be different in other virtual machine managers; I simply did not test the code in them (almost certainly everything will be fine in QEMU/KVM, since VirtualBox borrows virtualization code from QEMU).
The archive of files (code) for the experiments mentioned in the text can be found here or here.