fork() vs. vfork()

Listen!
After all, if the stars are lit, doesn't it mean that someone needs them?

V.V. Mayakovsky, 1914

I work on embedded systems, and I decided to write this article to better understand the problems of using the fork() and vfork() system calls. People often advise against the second one, but it clearly exists for a reason.

Let's look at when and why each of these calls is the better choice.
As a bonus, I will describe the vfork()/fork() implementations in our project. My main interest is in using these calls in embedded systems, and the key feature of these implementations is that they work without virtual memory. Habr readers who know system programming and embedded systems well may have advice or experience to share.

If you are interested, read on.

Let's start with the definition, that is, with the POSIX standard, which defines these functions:

fork() creates an exact copy of the process, except for a few variables. On success, it returns zero to the child process and the child's process ID to the parent. After that, the two processes "live their own lives".
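Here is a minimal sketch in ordinary POSIX C that makes this return-value contract concrete. It is not code from our project, and `fork_and_collect()` is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that immediately exits with `code`; the parent waits for it
 * and returns the child's exit status, or -1 on error. */
int fork_and_collect(int code)
{
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);   /* only the low 8 bits reach the parent */
    pid = fork();
    if (pid == -1)
        return -1;                      /* no child was created */
    if (pid == 0)
        _exit(code);                    /* child: fork() returned 0 */

    /* parent: fork() returned the child's PID */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `fork_and_collect(7)` should give 7 back to the parent.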

vfork() is defined as fork() with one restriction: the behavior is undefined if the process it creates does any of the following:

returns from the function in which vfork() was called;
calls any function other than _exit() or one of the exec*() functions;
modifies any data other than the variable that stores the value returned by vfork().
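These restrictions leave essentially one safe pattern: call vfork(), and in the child immediately call an exec*() function or _exit(). The sketch below assumes a glibc-like system where vfork() is still declared and /bin/sh exists. `run_shell_vfork()` is a hypothetical name.

```c
#define _DEFAULT_SOURCE            /* ask glibc to declare vfork() */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `/bin/sh -c cmd` using vfork(). The child does nothing except
 * exec*() or _exit(). Returns the command's exit status, or -1 on error. */
int run_shell_vfork(const char *cmd)
{
    int status;
    pid_t pid;

    assert(cmd != NULL);
    pid = vfork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: borrow the parent's memory only long enough to exec. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                /* exec failed: _exit(), never exit() or return */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls `_exit()` on the failure path because `exit()` would flush stdio buffers that the child shares with the parent.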

To understand why a system call with such strong restrictions exists at all, we need to work out what "an exact copy of the process" means.

One of the first search results on this topic in Russian describes the process cloning flags in Linux. According to it, some resources can be shared between the parent and child processes:

Address space (CLONE_VM);
File system information (CLONE_FS);
Open file table (CLONE_FILES);
Table of signal handlers (CLONE_SIGHAND);
Parent process (CLONE_PARENT).

POSIX forbids a vfork() child from modifying variables, which suggests that the address space is what gets shared. The following description confirms this assumption:

Unlike fork(), vfork() does not create a copy of the parent process. Instead, it creates an address space that is shared with the parent until _exit() or one of the exec functions is called.
The parent process is suspended during this time. All the usage restrictions follow from this: the child process cannot modify global variables, or even ordinary variables, that it shares with the parent.

In other words, if this statement is true, then after vfork() both processes see the same data.
Let's run an experiment. If the statement holds, changes that the child process makes to the data must be visible in the parent process, and vice versa.

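Code that tests the assumption:

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>

static int create_process(void)
{
    pid_t pid;
    int status;
    volatile int common_variable;   /* volatile: keep the value in memory, not in a register */

    common_variable = 0;
    pid = fork();                   /* replace with vfork() for the second run */
    if (-1 == pid) {
        return errno;
    }
    if (pid == 0) {
        /* child: modify the variable, then leave with _exit(), not exit() */
        common_variable = 1;
        _exit(EXIT_SUCCESS);
    }

    waitpid(pid, &status, 0);

    if (common_variable) {
        puts("vfork(): common variable has been changed.");
    } else {
        puts("fork(): common variable hasn't been changed.");
    }

    return EXIT_SUCCESS;
}

int main(void)
{
    return create_process();
}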

If you build and run this program, the output is:
fork(): common variable hasn't been changed.

When fork() is replaced with vfork(), the output changes:
vfork(): common variable has been changed.

Many developers rely on this property to pass data between processes, even though POSIX leaves the behavior of such programs undefined. This kind of problem is probably why people advise against vfork().

Indeed, a developer who changes a variable deliberately is in a very different position from one who forgets that the child process cannot, for example, return from the function in which vfork() was called (doing so destroys the parent's stack frame!). And even when you act deliberately, relying on undocumented behavior is, as always, at your own risk.

There are also a couple of less obvious problems:

The book “Secure Programming for Linux and Unix HOWTO” points out that even if the child does not change any data in the high-level source code, the generated machine code may still do so (for example, through hidden temporary variables).
A blog post examines what happens when vfork() is called in a multithreaded application. Consider the Linux implementation of vfork(). The manual says that the parent process is suspended, but in fact only the calling thread is suspended (which is, of course, easier to implement). So the child runs in parallel with the parent's other threads, which may, for example, change the parent's privileges. Then things go very wrong: two processes with different privileges share one address space, which opens a security hole.

Now consider the exec*() family of functions. Apart from _exit(), they are the only functions that may be called in a process created with vfork(). They create a new address space and then load the code and data from the specified file into it. The old address space is effectively destroyed.
Therefore, if a process is created with fork() and then calls exec*(), creating (copying) the address space in fork() was redundant. This is quite an expensive operation and can account for most of the time fork() takes. Wikipedia, for example, pays the most attention to exactly this point and, unlike the standard, states it directly:

Space for the child. Copy of the parent process.

Of course, on most modern systems with virtual memory no copying actually takes place. All pages of the parent's memory are simply marked copy-on-write. Even so, the whole page table hierarchy has to be walked, and that takes time.

It follows that vfork() should run faster than fork(), which the Linux man page also mentions.

Let's run another experiment to check this. We slightly modify the previous example: we add a loop that creates 1000 processes and remove the common variable and its output.

The resulting code:

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>

static int create_process(void)
{
    pid_t pid;
    int status;

    pid = vfork();   /* swap in fork() for the comparison run */
    if (-1 == pid) {
        return errno;
    }
    if (pid == 0) {
        /* child: terminate at once; use _exit(), not exit(), after vfork() */
        _exit(EXIT_SUCCESS);
    }
    waitpid(pid, &status, 0);

    return EXIT_SUCCESS;
}

int main(void)
{
    int i;

    for (i = 0; i < 1000; i++) {
        create_process();
    }

    return EXIT_SUCCESS;
}

We run it under the time command:

| time output | with fork() | with vfork() |
|---|---|---|
| real | 0m0.135s | 0m0.028s |
| user | 0m0.000s | 0m0.000s |
| sys | 0m0.052s | 0m0.016s |

The result is impressive, to put it mildly. The numbers vary slightly from run to run, but vfork() is still 4 to 5 times faster.
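The same comparison can also be done inside the program instead of through time(1). The sketch below assumes Linux/glibc, and `time_spawns()` is a hypothetical helper. The absolute numbers depend entirely on the machine.

```c
#define _DEFAULT_SOURCE            /* vfork() and CLOCK_MONOTONIC on glibc */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Creates `n` children with fork() (use_vfork == 0) or vfork() (otherwise);
 * each child exits at once. Returns elapsed wall-clock seconds, or -1.0. */
double time_spawns(int use_vfork, int n)
{
    struct timespec start, end;

    assert(n >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &start) == -1)
        return -1.0;
    for (int i = 0; i < n; i++) {
        pid_t pid = use_vfork ? vfork() : fork();
        if (pid == -1)
            return -1.0;
        if (pid == 0)
            _exit(0);              /* child: leave immediately */
        if (waitpid(pid, NULL, 0) == -1)
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) == -1)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `time_spawns(0, 1000)` and `time_spawns(1, 1000)` side by side gives a rough version of the table above.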

The conclusions are as follows:
fork() is the “heavier” call, so if vfork() can be used, it is better to use it.
vfork() is less safe and makes it easier to shoot yourself in the foot, so it should be used with care.
fork()/vfork() should be used where the process needs its own separate resources (inodes, user, working directory). Otherwise it is worth using pthreads, which are even faster.
fork() is the right choice when a separate address space is really needed. However, that is very hard to implement on small processor platforms without hardware support for virtual memory.
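For comparison, threads share the address space by design. A write in one thread is visible in another with fully defined behavior once pthread_join() has returned. This is a minimal sketch: `shared_write_via_thread()` is a hypothetical name, and older glibc versions need `-pthread` at link time.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: writes through the pointer it was given. */
static void *set_to_one(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Returns the value of a local variable after another thread has set it:
 * 1 if the write is visible, -1 if the thread could not be run. */
int shared_write_via_thread(void)
{
    int value = 0;
    pthread_t tid;

    if (pthread_create(&tid, NULL, set_to_one, &value) != 0)
        return -1;
    /* pthread_join() also orders the thread's write before our read. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    assert(value == 1);
    return value;
}
```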

Before turning to the second part of the article, note that POSIX also has a posix_spawn() function. It essentially combines vfork() and exec(), so it avoids the problems associated with vfork() and still does not re-create the address space the way fork() does.
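Here is a minimal sketch of running a command with posix_spawn() instead of writing the vfork()/exec() pair by hand. It assumes /bin/sh exists, and `run_shell_spawn()` is a hypothetical helper. How a given C library implements posix_spawn() internally is up to that library.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* POSIX: the caller declares it itself */

/* Runs `/bin/sh -c cmd` via posix_spawn() and returns its exit status,
 * or -1 on error. No fork()/vfork() appears in the calling code. */
int run_shell_spawn(const char *cmd)
{
    pid_t pid;
    int status;
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };

    assert(cmd != NULL);
    /* posix_spawn() returns an error number instead of setting errno. */
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```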

Now let's move on to our fork()/vfork() implementation without MMU support.

Vfork implementation

When implementing vfork() in our system, we assumed it should work like this. The parent goes into a waiting state, and vfork() returns first in the child process. The parent is woken up when the child calls _exit() or one of the exec*() functions. This means the child can run on the parent's stack but with its own resources of other kinds: inodes, signal table, and so on.

In our project, the various kinds of resources are stored in a task (struct task). This structure describes all the resources of a process, including available memory, inodes, and the list of threads that belong to the process. A task always has a main thread, the one created when the task is initialized. In our system, a thread is the unit of scheduling; my colleague's article covers this in more detail. Since the stack belongs to a thread rather than to a task, there are two possible ways to implement vfork():

Switch the newly created thread to the parent's stack;
“Replace” the task with a new one for the same thread of execution.

Either way, a task has to be created, or rather inherited from the parent: the signal table, environment variables, and so on are cloned. The address space, however, is not inherited.

vfork() returns twice: once in the parent and once in the child. So the registers of the stack frame from which vfork() was called must be saved somewhere. They cannot stay on the stack, because the child process may overwrite them while it runs. However, the vfork() signature provides no buffer for them. Therefore the registers are first saved on the stack and only then copied into the parent task. Saving the registers on the stack could have been done through a system call, but we decided to do without one and did it ourselves. Naturally, the vfork() function is written in assembly.
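The “returns twice” mechanism has a user-space analogue in setjmp()/longjmp(). It is only an analogy: a jmp_buf is opaque and is not the saved pt_regs. In this sketch, `resume_saved()` plays the role of ptregs_retcode_jmp(), and all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf saved_ctx;     /* stands in for the saved registers */
static int resumed_with;      /* value handed back by the second "return" */

/* Analogue of ptregs_retcode_jmp(): resume the saved context with a value. */
static void resume_saved(int value)
{
    resumed_with = value;
    longjmp(saved_ctx, 1);
}

/* Counts how many times control comes back out of setjmp(): once directly,
 * and once more after resume_saved(). Expected result: 2. */
int count_returns(void)
{
    static int returns;       /* static, so longjmp() cannot clobber it */

    returns = 0;
    if (setjmp(saved_ctx) == 0) {
        returns++;            /* first return, like vfork() returning 0 */
        resume_saved(42);     /* 42 stands in for a child PID */
    }
    returns++;                /* second return, with the value recorded */
    assert(resumed_with == 42);
    return returns;
}
```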

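Code for the i386 architecture:

vfork:
    subl    $28, %esp               /* room for trapno, err, eip, cs, eflags, esp, ss */
    pushl   %ds                     /* segment and general-purpose registers */
    pushl   %es
    pushl   %fs
    pushl   %gs
    pushl   %eax
    pushl   %ebp
    pushl   %edi
    pushl   %esi
    pushl   %edx
    pushl   %ecx
    pushl   %ebx
    movl    PT_END(%esp), %ecx      /* return address of vfork() ... */
    movl    %ecx, PT_EIP(%esp)      /* ... becomes the saved eip */
    pushf                           /* save EFLAGS */
    popl    PT_EFLAGS(%esp)
    movl    %esp, %eax              /* caller's stack pointer as it will be */
    addl    $PT_END+4, %eax         /* after vfork() returns */
    movl    %eax, PT_ESP(%esp)
    push    %esp                    /* pointer to pt_regs: argument of vfork_body() */
    call    vfork_body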

So the registers are first saved on the stack, and then the C function vfork_body() is called. It receives a pointer to the structure that holds the saved registers as its argument.

The structure in question, for i386:

typedef struct pt_regs {
    /* Pushed by SAVE_ALL. */
    uint32_t ebx;
    uint32_t ecx;
    uint32_t edx;
    uint32_t esi;
    uint32_t edi;
    uint32_t ebp;
    uint32_t eax;
    uint32_t gs;
    uint32_t fs;
    uint32_t es;
    uint32_t ds;
    /* Pushed at the very beginning of entry. */
    uint32_t trapno;
    /* In some cases pushed by processor, in some - by us. */
    uint32_t err;
    /* Pushed by processor. */
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    /* Pushed by processor, if switching of rings occurs. */
    uint32_t esp;
    uint32_t ss;
} pt_regs_t;
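The assembly uses PT_* offsets whose definitions are not shown here. Assuming they are plain byte offsets into pt_regs_t, they can be derived with offsetof() and checked against the push sequence. `subl $28` reserves 7 words (trapno through ss), and the 11 pushes fill ebx through ds, for 72 bytes in total. This C-side sketch repeats the structure so that it compiles on its own. The macro definitions are hypothetical, and in assembly such values are usually generated as numeric constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as pt_regs_t above: the last register pushed (ebx) is at
 * the lowest address, offset 0. */
typedef struct pt_regs {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
    uint32_t gs, fs, es, ds;
    uint32_t trapno, err;
    uint32_t eip, cs, eflags;
    uint32_t esp, ss;
} pt_regs_t;

/* Hypothetical definitions of the offsets used by vfork. */
#define PT_EIP    offsetof(pt_regs_t, eip)
#define PT_EFLAGS offsetof(pt_regs_t, eflags)
#define PT_ESP    offsetof(pt_regs_t, esp)
/* Just past the frame: PT_END(%esp) is the return address pushed by the
 * call to vfork, and PT_END + 4 is the caller's %esp after it returns. */
#define PT_END    sizeof(pt_regs_t)

/* 7 reserved words + 11 pushed registers, 4 bytes each. */
static_assert(sizeof(pt_regs_t) == (7 + 11) * 4, "pt_regs frame is 72 bytes");
```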

The vfork_body() code is architecture-independent. It creates the task and saves the registers needed to return from vfork().

The vfork_body() function:

void __attribute__((noreturn)) vfork_body(struct pt_regs *ptregs)
{
    struct task *child;
    pid_t child_pid;
    struct task_vfork *task_vfork;
    int res;

    /* can vfork only in single thread application */
    assert(thread_self() == task_self()->tsk_main);

    /* create task description but not start its main thread */
    child_pid = task_prepare("");
    if (0 > child_pid) {
        /* error */
        ptregs_retcode_err_jmp(ptregs, -1, child_pid);
        panic("vfork_body returning");
    }
    child = task_table_get(child_pid);

    /* save ptregs for parent return from vfork() */
    task_vfork = task_resource_vfork(child->parent);
    memcpy(&task_vfork->ptregs, ptregs, sizeof(task_vfork->ptregs));

    res = vfork_child_start(child);
    if (res < 0) {
        /* Could not start child process */
        /* Exit child task */
        vfork_child_done(child, vfork_body_exit_stub, &res);
        /* Return to the parent */
        ptregs_retcode_err_jmp(&task_vfork->ptregs, -1, -res);
    }

    panic("vfork_body returning");
}

A few explanations of the code.
First, the function checks for multithreading (the problems vfork() has with it were discussed above). Then a new task is created. If that succeeds, the registers needed to return from vfork() are saved in the parent task's vfork data.
After that, vfork_child_start() is called. As the name suggests, it “starts” the child process. The quotes are deliberate: the task may actually be launched later, depending on the specific implementation, and our project has two of them. Before describing them, let's look at the _exit() and exec*() functions.
When they are called, the parent thread must be unblocked. We will say that this is the moment when the child process is fully launched as a separate entity in the system.

The execv() function:

int execv(const char *path, char *const argv[])
{
    struct task *task;

    /* save starting arguments for the task */
    task = task_self();
    task_resource_exec(task, path, argv);

    /* if vforked then unblock parent and start execute new image */
    vfork_child_done(task, task_exec_callback, NULL);

    return 0;
}

The other exec*() functions are implemented on top of execv().
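One possible way to layer a variadic exec function on execv() is to collect its arguments into an argv array. This is a sketch, not our project's code. `execl_via_execv()` and `EXECL_MAX_ARGS` are hypothetical names, and the fixed argument limit is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <unistd.h>

#define EXECL_MAX_ARGS 64   /* hypothetical fixed limit for this sketch */

/* execl()-style wrapper built on execv(): gathers the NULL-terminated
 * variadic argument list into an argv array. Returns only on error. */
int execl_via_execv(const char *path, const char *arg0, ...)
{
    char *argv[EXECL_MAX_ARGS + 1];
    size_t argc = 0;
    const char *a = arg0;
    va_list ap;

    assert(path != NULL);
    va_start(ap, arg0);
    while (a != NULL) {
        if (argc == EXECL_MAX_ARGS) {
            va_end(ap);
            errno = E2BIG;
            return -1;
        }
        argv[argc++] = (char *)a;   /* execv() takes char *const argv[] */
        a = va_arg(ap, const char *);
    }
    va_end(ap);
    argv[argc] = NULL;

    return execv(path, argv);       /* returns -1 only if the exec failed */
}
```

For example, `execl_via_execv("/bin/sh", "sh", "-c", "exit 0", (char *)NULL)` replaces the current image. If the path does not exist, it returns -1 with errno set by execv().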

The _exit() function:

void _exit(int status)
{
    struct task *task;

    task = task_self();

    vfork_child_done(task, task_exit_callback, (void *)status);

    task_start_exit();
    {
        task_do_exit(task, TASKST_EXITED_MASK | (status & TASKST_EXITST_MASK));

        kill(task_get_id(task_get_parent(task)), SIGCHLD);
    }
    task_finish_exit();

    panic("Returning from _exit");
}

As the code above shows, the parent process is unblocked by vfork_child_done(), which takes a handler as one of its parameters. A particular algorithm is implemented by providing the following functions:

vfork_child_start(), called when cloning begins, must block the parent process;
vfork_child_done(), called when the child process is finally launched, unblocks the parent;
task_exit_callback() is the handler that terminates the child process;
task_exec_callback() is the handler that fully launches the child process.

First implementation

The first implementation uses the same thread of execution as well as the same stack. In this case, it is enough to “replace” the current thread's task with the child task until vfork_child_done() finally launches the child task.

The vfork_child_start() function:

int vfork_child_start(struct task *child)
{
    thread_set_task(thread_self(), child);

    /* mark as vforking */
    task_vfork_start(child);

    /* Restore values of the registers and return 0 */
    ptregs_retcode_jmp(&task_resource_vfork(child->parent)->ptregs, 0);

    panic("vfork_child_start returning");
    return -1;
}

Here is what happens. The function thread_set_task() binds the current thread of execution (that is, the parent's thread) to the child process; this only requires changing the corresponding pointer in the current thread's structure. From then on, when the thread accesses resources associated with its task, it refers to the child's task rather than the parent's. For example, when the thread asks which task it belongs to (task_self()), it gets the child task.

Next, the child task is marked as created by vfork(). vfork_child_done() needs this flag to behave correctly (more on this a little later).
Then the registers saved by vfork() are restored. Recall that, according to POSIX, vfork() must return zero in the child process; this is done by calling ptregs_retcode_jmp(ptregs, 0).

As already mentioned, when the child process calls _exit() or execv(), vfork_child_done() must unblock the parent thread. It must also prepare the child task to run the required handler.

The vfork_child_done() function:

void vfork_child_done(struct task *child, void * (*run)(void *), void *arg)
{
    struct task_vfork *vfork_data;

    if (!task_is_vforking(child)) {
        return;   /* exec()/_exit() without vfork(): nothing to unblock */
    }
    task_vfork_end(child);

    /* start the child's main thread; pass arg on to the handler */
    task_start(child, run, arg);

    thread_set_task(thread_self(), child->parent);
    vfork_data = task_resource_vfork(child->parent);
    ptregs_retcode_jmp(&vfork_data->ptregs, child->tsk_id);
}

The task_exit_callback() handler:

void *task_exit_callback(void *arg)
{
    _exit((int)arg);

    return arg;
}

The task_exec_callback() handler:

void *task_exec_callback(void *arg)
{
    int res;

    res = exec_call();

    return (void *)res;
}

vfork_child_done() must also handle the case where exec()/_exit() is used without vfork(). Then it simply returns: there is no parent to unblock, and the child can proceed immediately. If the process was created with vfork(), the following happens. First, task_vfork_end() clears the child task's vforking flag. Then the main thread of the child task is finally started. Its entry point is the run function, which must be one of the handlers described earlier (task_exec_callback or task_exit_callback); these handlers exist specifically for the vfork() implementation. After that, the thread is bound to the parent task again instead of the child. Finally, control returns to the parent from the vfork() call, with the child's process ID as the return value. As mentioned above, this is done by calling ptregs_retcode_jmp().
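The control flow of the first implementation can be modelled in a few lines of ordinary C. This is a toy model under stated assumptions, not our kernel code. The structures, `current`, and the `model_*` functions are hypothetical. Where the real code jumps with ptregs_retcode_jmp(), the model simply returns the value that vfork() would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel objects. */
struct task {
    int id;
    struct task *parent;
    bool vforking;
};

struct thread {
    struct task *task;     /* the task this thread currently belongs to */
};

static struct thread current;   /* the single running thread in this model */

static struct task *task_self(void) { return current.task; }

static void thread_set_task(struct thread *t, struct task *task)
{
    t->task = task;        /* rebinding is just a pointer update */
}

/* Models vfork_child_start(): the running thread now "is" the child,
 * and 0 is what vfork() would return at this point. */
static int model_child_start(struct task *child)
{
    assert(child->parent == task_self());
    thread_set_task(&current, child);
    child->vforking = true;
    return 0;
}

/* Models vfork_child_done(): does nothing unless the task was vforked;
 * otherwise clears the flag, hands the thread back to the parent and
 * yields the child's id, as the parent's vfork() would return it. */
static int model_child_done(struct task *child)
{
    if (!child->vforking)
        return -1;          /* exec()/_exit() without vfork() */
    child->vforking = false;
    /* The real code starts the child's own main thread here. */
    thread_set_task(&current, child->parent);
    return child->id;
}
```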

The second implementation of vfork

The second implementation uses the parent's stack with a new thread created together with the new task. This happens automatically if the registers previously saved in the parent thread are restored in the child thread. In this case, ordinary inter-thread synchronization can be used, as described in the article mentioned earlier. This is certainly more elegant, but also harder to implement: while the parent thread waits, its child runs on the same stack. So for the duration of the wait, the parent has to switch to an intermediate stack, where it can safely wait for the child to call _exit() or exec*().

The vfork_child_start() function in the second implementation:

int vfork_child_start(struct task *child)
{
    struct task_vfork *task_vfork;

    task_vfork = task_resource_vfork(task_self());

    /* Allocate memory for the new stack.
     * VFORK_STACK_SIZE is a hypothetical constant: sizeof(task_vfork->stack)
     * would only be the size of a pointer. */
    task_vfork->stack = sysmalloc(VFORK_STACK_SIZE);
    if (!task_vfork->stack) {
        return -EAGAIN;
    }

    task_vfork->child_pid = child->tsk_id;

    /* Set new stack and go to vfork_waiting */
    if (!setjmp(task_vfork->env)) {
        CONTEXT_JMP_NEW_STACK(vfork_waiting,
                task_vfork->stack + VFORK_STACK_SIZE);
    }

    /* current stack was broken, can't reach any old data */
    task_vfork = task_resource_vfork(task_self());

    sysfree(task_vfork->stack);
    ptregs_retcode_jmp(&task_vfork->ptregs, task_vfork->child_pid);
    panic("vfork_child_start returning");

    return -1;
}

Explanation of the code:
First, memory is allocated for the stack. Then the child's pid (process ID) is saved, because the parent will need it to return from vfork().
The setjmp() call saves the current context, so that control can later come back to this point on the stack on which vfork() was called. As already mentioned, the wait must happen on an intermediate stack. The switch is made by the CONTEXT_JMP_NEW_STACK() macro, which changes the current stack and transfers control to vfork_waiting(). That function starts the child and blocks the parent until vfork_child_done() is called.
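CONTEXT_JMP_NEW_STACK() is our project's own macro. In ordinary user space, a similar step (switch to a separately allocated stack, run a function there, come back) can be sketched with getcontext()/makecontext()/swapcontext(). These functions are no longer part of current POSIX but are still provided by glibc. All names here are hypothetical, and the stack size is an arbitrary assumption.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define ALT_STACK_SIZE (64 * 1024)   /* assumption: plenty for this toy */

static ucontext_t caller_ctx, alt_ctx;
static char *alt_stack;
static int ran_on_alt_stack;

/* Runs on the freshly allocated stack; checks that its locals live there. */
static void waiting_function(void)
{
    char marker;
    uintptr_t p = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack;

    ran_on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
    /* Returning resumes caller_ctx, because uc_link points to it. */
}

/* Switches to a heap-allocated stack, runs waiting_function() there and
 * comes back. Returns 1 if it really ran on that stack, -1 on error. */
int run_on_new_stack(void)
{
    assert(alt_stack == NULL);
    alt_stack = malloc(ALT_STACK_SIZE);
    if (alt_stack == NULL)
        return -1;
    ran_on_alt_stack = 0;
    if (getcontext(&alt_ctx) == -1) {
        ran_on_alt_stack = -1;
    } else {
        alt_ctx.uc_stack.ss_sp = alt_stack;
        alt_ctx.uc_stack.ss_size = ALT_STACK_SIZE;
        alt_ctx.uc_link = &caller_ctx;
        makecontext(&alt_ctx, waiting_function, 0);
        if (swapcontext(&caller_ctx, &alt_ctx) == -1)
            ran_on_alt_stack = -1;
    }
    free(alt_stack);                 /* safe: we are back on the original stack */
    alt_stack = NULL;
    return ran_on_alt_stack;
}
```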

The vfork_waiting() function:

static void vfork_waiting(void)
{
    struct sigaction ochildsa;
    struct task *child;
    struct task *parent;
    struct task_vfork *task_vfork;

    parent = task_self();
    task_vfork = task_resource_vfork(parent);
    child = task_table_get(task_vfork->child_pid);

    vfork_wait_signal_store(&ochildsa);
    {
        task_vfork_start(parent);

        task_start(child, vfork_child_task, &task_vfork->ptregs);

        while (SCHED_WAIT(!task_is_vforking(parent)));
    }
    vfork_wait_signal_restore(&ochildsa);

    longjmp(task_vfork->env, 1);

    panic("vfork_waiting returning");
}

As the code shows, the parent's current signal handling is saved first. Specifically, the handler for SIGCHLD is replaced. SIGCHLD is the signal sent when a child process changes state, and here it is used to unblock the parent.

The new SIGCHLD handler:

static void vfork_parent_signal_handler(int sig, siginfo_t *siginfo, void *context)
{
    task_vfork_end(task_self());
}

The handler is saved and restored with the POSIX sigaction() call.

Saving the handler:

static void vfork_wait_signal_store(struct sigaction *ochildsa)
{
    struct sigaction sa;

    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = vfork_parent_signal_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, ochildsa);
}

Restoring the handler:

static void vfork_wait_signal_restore(const struct sigaction *ochildsa)
{
    sigaction(SIGCHLD, ochildsa, NULL);
}
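The store/replace/wait/restore sequence around SIGCHLD is not specific to our kernel, and the same pattern can be tried in an ordinary POSIX program. This sketch assumes a single child and nothing else generating SIGCHLD. `wait_via_sigchld()` and `on_sigchld()` are hypothetical names. Unlike vfork_waiting(), it uses fork() and sigsuspend() rather than SCHED_WAIT().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t waiting_for_child;

/* Counterpart of vfork_parent_signal_handler(): just clear the flag. */
static void on_sigchld(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)info;
    (void)context;
    waiting_for_child = 0;
}

/* Forks a child that exits with `code`, waits for its SIGCHLD through a
 * temporarily installed handler, restores the old handler and mask, then
 * reaps the child. Returns the child's exit status, or -1 on error. */
int wait_via_sigchld(int code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255);

    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)    /* store and replace */
        return -1;

    /* Block SIGCHLD first, so it cannot slip in before we start waiting. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    waiting_for_child = 1;
    pid = fork();
    if (pid == 0)
        _exit(code);
    if (pid != -1) {
        while (waiting_for_child)
            sigsuspend(&wait_mask);   /* atomically unblock SIGCHLD and sleep */
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);             /* restore */

    if (pid == -1 || waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```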

After the signal handler is replaced, the task is marked as waiting. It stays in that state until the child task is actually launched by a call to _exit() or exec*(). The entry point of the child task is vfork_child_task(), which restores the previously saved registers and returns from vfork().

The vfork_child_task() function:

static void *vfork_child_task(void *arg)
{
    struct pt_regs *ptregs = arg;

    ptregs_retcode_jmp(ptregs, 0);
    panic("vfork_child_task returning");
}

When _exit() or exec*() is called, SIGCHLD is sent, and its handler clears the parent's waiting flag. After that, the old SIGCHLD handler is restored, and longjmp() returns control to vfork_child_start(). Keep in mind that the child process has damaged this function's stack frame, so its local variables no longer hold the right values. After the previously allocated stack is freed, vfork() returns the child task's ID.

Testing vfork

To check that vfork() behaves correctly, we wrote a set of tests covering several situations.

Two of them check that vfork() returns correctly when the child process calls _exit() or execv().

First test

TEST_CASE("after called vfork() child call exit()") {
    pid_t pid;
    pid_t parent_pid;
    int res;

    parent_pid = getpid();

    pid = vfork();

    /* When vfork() returns -1, an error happened. */
    test_assert(pid != -1);

    if (pid == 0) {
        /* When vfork() returns 0, we are in the child process. */
        _exit(0);
    }

    wait(&res);

    test_assert_not_equal(pid, parent_pid);
    test_assert_equal(getpid(), parent_pid);
}

Second test

TEST_CASE("after called vfork() child call execv()") {
    pid_t pid;
    pid_t parent_pid;
    int res;

    parent_pid = getpid();

    pid = vfork();

    /* When vfork() returns -1, an error happened. */
    test_assert(pid != -1);

    if (pid == 0) {
        close(0);
        close(1);
        close(2);
        /* When vfork() returns 0, we are in the child process. */
        if (execv("help", NULL) == -1) {
            test_assert(0);
        }
    }

    wait(&res);

    test_assert_not_equal(pid, parent_pid);
    test_assert_equal(getpid(), parent_pid);
}

Another test checks that the parent and child processes use the same stack.

Third test

TEST_CASE("parent should see stack modifications made from child") {
    pid_t pid;
    int res;
    int data;

    data = 1;

    pid = vfork();

    /* When vfork() returns -1, an error happened. */
    test_assert(pid != -1);

    if (pid == 0) {
        data = 2;
        /* When vfork() returns 0, we are in the child process. */
        _exit(0);
    }

    wait(&res);

    test_assert_equal(data, 2);
}

We also wanted to check that everything works with a real third-party program, so we chose the fairly well-known dropbear package. During configuration it checks for fork(), and if fork() is not found, it can use vfork(). I should point out that dropbear does this to support uClinux, not to improve performance.

We configured the OS accordingly (so that dropbear uses vfork()) and successfully connected over ssh with both implementations.


P.S. We have also implemented fork() itself without an MMU in our project, and an article about it is in preparation.

Source: https://habr.com/ru/post/232605/
