Learning Linux Processes

In this article I would like to talk about the life cycle of processes in Linux-family operating systems. Using theory and examples, I will look at how processes are born and die, and touch on the mechanics of system calls and signals.

This article is aimed mainly at newcomers to system programming and at those who simply want to learn a little more about how Linux processes work.

Everything written below is valid for Debian Linux with kernel 4.15.0.

Introduction

System software interacts with the kernel through special functions called system calls. In rare cases there is an alternative API, such as procfs or sysfs, implemented as a virtual file system.
Process Attributes

In the kernel, a process is represented simply as a structure with many fields (struct task_struct, defined in include/linux/sched.h in the kernel sources).
But since this article is about system programming rather than kernel development, we will abstract away a little and focus only on the process fields that matter to us:

Process ID (pid)
Open File Descriptors (fd)
Signal handlers
Current working directory (cwd)
Environment variables (environ)
Return code

Process life cycle

Process birth

Only one process in the system is born in a special way: init, which is created directly by the kernel. All other processes appear by duplicating the current process with the fork(2) system call. After fork(2) returns, we have two almost identical processes, which differ in the following points:

fork(2) returns the child's PID to the parent and 0 to the child;
The child's PPID (Parent Process ID) is set to the PID of the parent.

After fork(2), all the resources of the child process are a copy of the parent's resources. Copying a process together with all of its allocated memory pages is expensive, so the Linux kernel uses the Copy-On-Write technique.
All memory pages of the parent are marked read-only and become accessible to both the parent and the child. As soon as one of the processes modifies data on a page, the page itself is not changed; instead, a copy is made and the copy is modified. The original is “untied” from that process. Once the read-only original remains “tied” to only one process, the page is made read-write again.

An example of a simple, useless program using fork(2)

    #include <stdio.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/wait.h>
    #include <sys/types.h>

    int main() {
        int pid = fork();
        switch(pid) {
            case -1:
                perror("fork");
                return -1;
            case 0:
                // Child
                printf("my pid = %i, returned pid = %i\n", getpid(), pid);
                break;
            default:
                // Parent
                printf("my pid = %i, returned pid = %i\n", getpid(), pid);
                break;
        }
        return 0;
    }

    $ gcc test.c && ./a.out
    my pid = 15594, returned pid = 15595
    my pid = 15595, returned pid = 0

Ready state

Immediately after fork(2) completes, the process enters the “ready” state.
In effect, the process sits in a queue, waiting for the kernel scheduler to let it run on a processor.

Running state

As soon as the scheduler starts executing the process, it enters the “running” state. The process can run for the whole time slice (quantum) allotted to it, or it can give way to other processes by using the sched_yield system call.

Rebirth into another program

Some programs are built so that a parent process creates child processes to do the work: each child solves a specific task, and the parent only delegates tasks to its children. For example, a web server may create a child for each incoming connection and hand the connection over to it.
However, if you need to run a different program, you have to resort to the execve(2) system call:

 int execve(const char *filename, char *const argv[], char *const envp[]);

or the library calls execl(3), execlp(3), execle(3), execv(3), execvp(3), execvpe(3):

    int execl(const char *path, const char *arg, ... /* (char *) NULL */);
    int execlp(const char *file, const char *arg, ... /* (char *) NULL */);
    int execle(const char *path, const char *arg, ... /*, (char *) NULL, char * const envp[] */);
    int execv(const char *path, char *const argv[]);
    int execvp(const char *file, char *const argv[]);
    int execvpe(const char *file, char *const argv[], char *const envp[]);

All of the listed calls execute the program whose path is given in the first argument. On success, control is transferred to the loaded program and never returns to the original one. The loaded program keeps the fields of the process structure, with two exceptions: file descriptors marked O_CLOEXEC are closed, and signals that had handlers installed are reset to their default actions.

How do you avoid getting lost among all these calls and pick the right one? It is enough to understand the naming logic:

All calls start with exec
The fifth letter defines how the arguments are passed:
- l stands for list: all parameters are passed as arg1, arg2, ..., NULL;
- v stands for vector: all parameters are passed in a null-terminated array;
The letter p, which stands for path, may follow. If the file argument does not contain a "/" character, the file is searched for in the directories listed in the PATH environment variable
The last letter may be e, which stands for environ. In such calls, the last argument is a null-terminated array of null-terminated strings of the form key=value: the environment variables that will be passed to the new program.

Example of calling /bin/cat --help via execve

    #define _GNU_SOURCE
    #include <unistd.h>

    int main() {
        char* args[] = { "/bin/cat", "--help", NULL };
        execve("/bin/cat", args, environ);
        // Unreachable
        return 1;
    }

    $ gcc test.c && ./a.out
    Usage: /bin/cat [OPTION]... [FILE]...
    Concatenate FILE(s) to standard output.
    * *

The exec* family of calls can also run scripts that have execute permission and start with a shebang sequence (#!).

An example of running a script with a spoofed PATH using execle

    #define _GNU_SOURCE
    #include <unistd.h>

    int main() {
        char* e[] = {"PATH=/habr:/rulez", NULL};
        // The terminating NULL of a variadic call must be cast to a pointer type
        execle("/tmp/test.sh", "test.sh", (char *) NULL, e);
        // Unreachable
        return 1;
    }

    $ cat test.sh
    #!/bin/bash
    echo $0
    echo $PATH
    $ gcc test.c && ./a.out
    /tmp/test.sh
    /habr:/rulez

By convention, argv[0] is the name of the program being run, that is, it matches the first argument of the exec* family functions. However, this convention can be broken.

Example of when cat becomes a dog using execlp

    #define _GNU_SOURCE
    #include <unistd.h>

    int main() {
        // argv[0] is "dog", although the program being run is cat
        execlp("cat", "dog", "--help", (char *) NULL);
        // Unreachable
        return 1;
    }

    $ gcc test.c && ./a.out
    Usage: dog [OPTION]... [FILE]...
    * *

A curious reader may notice that the signature int main(int argc, char* argv[]) includes the number of arguments, while the exec* family of functions has nothing of the kind. Why? Because when a program starts, control is not transferred to main right away. First, startup code provided by glibc runs; among other things, it picks up argc, which the kernel obtained while processing execve by counting the entries of argv, and passes it on to main.

Waiting state

Some system calls can take a long time, I/O for example. In such cases the process goes into the “waiting” state. As soon as the system call completes, the kernel moves the process back to the “ready” state.
Linux also has a “waiting” state in which the process does not react to interrupting signals (uninterruptible sleep). In this state the process becomes “unkillable”, and all incoming signals are queued until the process leaves this state.
The kernel itself chooses which of these states to put the process into. Most often, processes that request I/O end up in the “waiting (uninterruptible)” state. This is especially noticeable when using a remote disk (NFS) over a slow network connection.

Stopped state

You can pause the process at any time by sending a SIGSTOP signal to it. The process will go to the “stopped” state and remain there until it receives a signal to continue working (SIGCONT) or die (SIGKILL). The remaining signals will be queued.

Process completion

No program can terminate by itself. It can only ask the system to do so with the _exit system call, or be terminated by the system because of an error. Even when main() returns a number, _exit is still called implicitly.
Although the system call takes an int argument, only the low byte of the number is used as the return code.

Zombie state

Immediately after a process terminates (whether normally or not), the kernel records how the process ended and moves it into the zombie state. In other words, a zombie is a process that has already terminated but whose record is still kept in the kernel.
Moreover, this is the second state in which a process can safely ignore the SIGKILL signal, because what is dead cannot die again.

Forgetting

The return code and the reason for completing the process are still stored in the kernel and need to be retrieved from there. To do this, you can use the appropriate system calls:

    pid_t wait(int *wstatus); /* same as waitpid(-1, wstatus, 0) */
    pid_t waitpid(pid_t pid, int *wstatus, int options);

All the information about how a process terminated fits into a single int. The macros described in the waitpid(2) man page are used to extract the return code and the reason for termination.

An example of normal termination and retrieving the return code

    #include <stdio.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/wait.h>
    #include <sys/types.h>

    int main() {
        int pid = fork();
        switch(pid) {
            case -1:
                perror("fork");
                return -1;
            case 0:
                // Child
                return 13;
            default: {
                // Parent
                int status;
                waitpid(pid, &status, 0);
                printf("exit normally? %s\n", (WIFEXITED(status) ? "true" : "false"));
                printf("child exitcode = %i\n", WEXITSTATUS(status));
                break;
            }
        }
        return 0;
    }

    $ gcc test.c && ./a.out
    exit normally? true
    child exitcode = 13

Example of abnormal termination

Passing NULL as argv[0] causes a crash.

    #include <stdio.h>
    #include <unistd.h>
    #include <errno.h>
    #include <sys/wait.h>
    #include <sys/types.h>

    int main() {
        int pid = fork();
        switch(pid) {
            case -1:
                perror("fork");
                return -1;
            case 0:
                // Child: deliberately pass an empty argument list (argv[0] == NULL)
                execl("/bin/cat", (char *) NULL);
                return 13;
            default: {
                // Parent
                int status;
                waitpid(pid, &status, 0);
                if(WIFEXITED(status)) {
                    printf("Exit normally with code %i\n", WEXITSTATUS(status));
                }
                if(WIFSIGNALED(status)) {
                    printf("killed with signal %i\n", WTERMSIG(status));
                }
                break;
            }
        }
        return 0;
    }

    $ gcc test.c && ./a.out
    killed with signal 6

There are cases in which the parent terminates before the child. In such cases init becomes the child's parent, and it will call wait(2) when the time comes.

After the parent has collected the information about the child's death, the kernel erases all information about the child, so that another process can soon take its place.

Thanks

Thanks to Sasha “Al” for editing and design assistance;

Thanks to Sasha “Reisse” for clear answers to difficult questions.

They bravely withstood the inspiration that struck me and the flurry of questions I struck them with.

Source: https://habr.com/ru/post/423049/
