Identification of Linux kernel loadable modules [part 1]: source code

In this post, I describe my search for signs that show a set of source files builds a loadable Linux kernel module (LKM) rather than a regular executable.
Assume that nothing is known about the purpose of the source code, or that someone is deliberately trying to hide it.
Upd: There is more than 4 GB of code, and the task is to quickly pick out only those sources that implement kernel modules.

# 01 __KERNEL__

When the sources are built, the preprocessor symbol __KERNEL__ is defined.

As Alessandro Rubini and Jonathan Corbet write in the book “Linux Device Drivers”:

“Since the module is not linked against any of the standard libraries, its source code should not include the usual header files. Kernel modules can use only the functions that the kernel exports. All kernel-related header files are located in the include/linux and include/asm directories inside the kernel source tree (usually the /usr/src/linux directory).
Early versions of Linux (based on libc version 5 and earlier) installed symbolic links from /usr/include/linux and /usr/include/asm to the actual directories in the kernel sources, so the libc header tree could refer to the kernel header files. This made it possible to include kernel header files in user applications when the need arose.
But even now, when the kernel header files are separated from the header files used by application programs, it is still sometimes necessary to include them in user-space programs in order to use definitions that are not found in the regular header files. However, most of the definitions in the kernel header files belong exclusively to the kernel and are “invisible” to normal applications, because they are enclosed in #ifdef __KERNEL__ blocks. By the way, this is one of the reasons why the __KERNEL__ symbol has to be defined when building a module.”

For example, the line "CFLAGS = -D__KERNEL__" may be present in the makefile.
Or "-D__KERNEL__" can be found in the build logs.
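The effect of the #ifdef __KERNEL__ guard from the quote can be shown with a small sketch. The header content below is hypothetical (the names demo_request, demo_private and demo_view are invented for illustration and do not come from any real kernel header); it only demonstrates that the guarded part disappears when the build does not pass -D__KERNEL__.

```c
#include <string.h>

/* Hypothetical header content, for illustration only. */

/* Shared part: visible to user-space programs and to the kernel. */
struct demo_request {
    int id;
};

#ifdef __KERNEL__
/* Kernel-only part: compiled only when the build defines __KERNEL__. */
struct demo_private {
    int refcount;
};
#define DEMO_VIEW "kernel"
#else
#define DEMO_VIEW "user"
#endif

/* Reports which view of the "header" this translation unit was built with. */
const char *demo_view(void)
{
    return DEMO_VIEW;
}
```

Built as an ordinary program, demo_view() reports the user view; adding -D__KERNEL__ to the compiler command line switches it to the kernel view, which is why that flag is a useful sign in makefiles and build logs.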

# 02 MODULE

If the module is not statically linked into the kernel, the string "-DMODULE" will always be included in the CFLAGS variable. This preprocessor symbol must be defined before the <linux/module.h> file is included.
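Searching makefiles and build logs for these two flags can be automated. The following is a minimal sketch, not a complete scanner: the function name has_define_flag is my own, and it assumes each makefile line or logged compiler command is already available as a C string. It matches -D<symbol> only as a whole token, so that -DMODULE_NAME is not mistaken for -DMODULE.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `text` contains the compiler flag -D<symbol> as a whole
 * token, e.g. "-D__KERNEL__" or "-D__KERNEL__=1", but not "-D__KERNEL__X".
 * Simplification: it does not check what precedes the "-D". */
bool has_define_flag(const char *text, const char *symbol)
{
    size_t sym_len = strlen(symbol);
    const char *p = text;

    while ((p = strstr(p, "-D")) != NULL) {
        const char *name = p + 2;

        if (strncmp(name, symbol, sym_len) == 0) {
            char next = name[sym_len];

            /* The symbol must end here: end of string, whitespace,
             * an explicit value ("=1"), or a closing quote. */
            if (next == '\0' || next == ' ' || next == '\t' ||
                next == '\n' || next == '=' || next == '"' || next == '\'')
                return true;
        }
        p += 2; /* keep scanning after this "-D" */
    }
    return false;
}
```

A line for which both has_define_flag(line, "__KERNEL__") and has_define_flag(line, "MODULE") return true is a strong candidate for a module build.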

# 03 All names are declared as static and have a unique prefix

This way the developer avoids “polluting” the kernel's namespace; otherwise, when debugging, the module's names would have to be picked out from among all the kernel names. A prefix also removes the need to invent unique names that do not clash with names already present in the kernel namespace.
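One rough way to test for a shared prefix is to compute the longest common prefix of the function names found in a file. This is only a sketch under the assumption that the names have already been extracted by some other tool; the function name common_prefix_len and the sample names are hypothetical.

```c
#include <stddef.h>

/* Returns the length of the longest prefix shared by all `count` names,
 * or 0 if there is no shared prefix or the list is empty. */
size_t common_prefix_len(const char *const names[], size_t count)
{
    size_t len = 0;

    if (count == 0)
        return 0;

    for (;;) {
        char c = names[0][len];

        if (c == '\0')
            return len; /* the first name is exhausted */
        for (size_t i = 1; i < count; i++) {
            if (names[i][len] != c)
                return len; /* mismatch, or names[i] is exhausted */
        }
        len++;
    }
}
```

For the hypothetical names skull_open, skull_release and skull_init the result is 6, the length of "skull_". A long shared prefix combined with mostly static declarations fits this sign, although ordinary libraries use prefixes too, so it should not be used on its own.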

# 04 printk()

In the source, the printk() function is used instead of printf(). "Linux Device Drivers" says:

“The printk function is defined in the kernel and behaves much like the printf function from the standard C library. Why does the kernel need its own function? It's simple: the kernel is self-contained code that is compiled without the support of the C library.”

# 05 init_module and cleanup_module

"Linux Device Drivers" says:

“An application runs as a single task from beginning to end. A module merely registers itself with the kernel, preparing to serve future requests, and its main function finishes immediately after it is called. In other words, the task of the init_module function (the entry point) is to prepare the module's functions for later calls. It is as though the module were saying to the kernel: ‘Hey! I'm here! Here is what I can do!’ The second entry point of the module, cleanup_module, is called just before the module is unloaded. It tells the kernel: ‘I'm leaving! Don't ask me for anything else!’”

Upd: A more reliable sign is the presence of a function named cleanup_module, since functions with this name occur roughly 20 times less often than functions named init_module. Apparently the name init_module is popular not only among authors of kernel modules.

# 06 Using current->

"Linux Device Drivers" says:

"<...> Kernel code can identify the current process accessing the module through the global item current, a pointer to struct task_struct, which in the 2.4 kernel is declared in the <asm/current.h> file. The current pointer refers to the current user process. During a system call such as read or write, the process that made the call is considered current. The kernel can use information about the current process through the current pointer if the need arises. <...>
In fact, current is no longer a global kernel variable, as it was before. The developers optimized access to the structure that describes the current process by moving it onto the stack. You will find the implementation of current in the <asm/current.h> file. But before you go exploring this file, remember that Linux is an SMP-compliant system (SMP stands for symmetric multiprocessing), so a simple global variable would not work here. The implementation details remain hidden from other kernel subsystems, and a device driver can simply include the header file <linux/sched.h> and refer to the current pointer.
From the module's point of view, current is an ordinary external reference, just like printk. A module can refer to current wherever it sees fit. For example, the following statement prints the process ID and the name of the command that started the process:
printk("The process is \"%s\" (pid %i)\n", current->comm, current->pid);

The command name is stored in the field current->comm and is the name of the program file."

What other differences between a kernel module and an executable do you know at the source level?
